filtering /48 is going to be necessary
On Fri, Mar 9, 2012 at 3:23 AM, Mehmet Akcin <mehmet@akcin.net> wrote:
if you know anyone who is filtering /48 , you can start telling them to STOP doing so as a good citizen of internet6.
I had a bit of off-list discussion about this topic, and I was not going to bring it up today on-list, but since the other point of view is already there, I may as well.

Unless you are going to pay the bill for my clients to upgrade their 3BXL/3CXL systems (and similar) to XXL and then XXXL, I think we need to do two things before IPv6 up-take is really broad:

1) absolutely must drop /48 de-aggregates from ISP blocks
2) absolutely must make RIR policy so orgs can get /48s for anycasting, and whatever other purposes

If we fail to adjust RIR policy to account for the huge amount of accidental de-aggregation that can (and will) happen with IPv6, we will eventually have to do #1 anyway, but a bunch of networks will have to renumber in order to take advantage of #2 down the road.

The way we are headed right now, it is likely that the IPv6 address space being issued today will look like "the swamp" in a few short years, and we will regret repeating this obvious mistake.

We had this discussion on the list exactly a year ago. At that time, the average IPv6 origin ASN was announcing 1.43 routes. That figure today is 1.57 routes per origin ASN.

--
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts
On 2012-03-09 10:02 , Jeff Wheeler wrote:
On Fri, Mar 9, 2012 at 3:23 AM, Mehmet Akcin <mehmet@akcin.net> wrote:
if you know anyone who is filtering /48 , you can start telling them to STOP doing so as a good citizen of internet6.
I had a bit of off-list discussion about this topic, and I was not going to bring it up today on-list, but since the other point of view is already there, I may as well.
Unless you are going to pay the bill for my clients to upgrade their 3BXL/3CXL systems (and similar) to XXL and then XXXL, I think we need to do two things before IPv6 up-take is really broad:
1) absolutely must drop /48 de-aggregates from ISP blocks
See the strict filter at: http://www.space.net/~gert/RIPE/ipv6-filters.html which has been proposed for quite a long time already. Also note the existence of this awesome thing called RPSL. See also this great presentation by ras: http://www.nanog.org/meetings/nanog44/presentations/Tuesday/RAS_irrdata_N44.... and the very recent column by Geoff Huston: http://www.potaroo.net/ispcol/2012-03/leaks.html
2) absolutely must make RIR policy so orgs can get /48s for anycasting, and whatever other purposes
One can already receive those easily, generally as a /48. Also, quite a few organizations are requesting disjunct /32's per country or at least a /32 per region....

Greets,
 Jeroen
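To make the filtering idea concrete, here is a minimal sketch, in Python, of the kind of strict length-based filter the ipv6-filters page linked above recommends. The ranges and thresholds are illustrative placeholders, not the actual recommendations from that page:

```python
# A sketch of a strict IPv6 prefix-length filter: aggregates only from PA
# space, exact /48s only from PI space. Ranges here are placeholders.
import ipaddress

PI_SPACE = [ipaddress.ip_network("2620::/23")]   # example: an ARIN PI block
PA_SPACE = [ipaddress.ip_network("2000::/3")]    # global unicast, as a stand-in

def accept(prefix):
    net = ipaddress.ip_network(prefix)
    # PI blocks sit inside global unicast, so the PI check must run first.
    for pi in PI_SPACE:
        if net.subnet_of(pi):
            return net.prefixlen == 48           # PI: exact /48 assignments only
    for pa in PA_SPACE:
        if net.subnet_of(pa):
            return 19 <= net.prefixlen <= 32     # PA: aggregates only, no /48 leaks
    return False

print(accept("2001:db8::/32"))    # True: allocation-sized PA aggregate
print(accept("2001:db8:1::/48"))  # False: /48 de-aggregate from PA space
print(accept("2620:0:2d0::/48"))  # True: /48 from PI space
```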
On Fri, Mar 9, 2012 at 4:02 AM, Jeff Wheeler <jsw@inconcepts.biz> wrote:
On Fri, Mar 9, 2012 at 3:23 AM, Mehmet Akcin <mehmet@akcin.net> wrote:
if you know anyone who is filtering /48 , you can start telling them to STOP doing so as a good citizen of internet6.
I had a bit of off-list discussion about this topic, and I was not going to bring it up today on-list, but since the other point of view is already there, I may as well.
Unless you are going to pay the bill for my clients to upgrade their 3BXL/3CXL systems (and similar) to XXL and then XXXL, I think we need to do two things before IPv6 up-take is really broad:
1) absolutely must drop /48 de-aggregates from ISP blocks
2) absolutely must make RIR policy so orgs can get /48s for anycasting, and whatever other purposes
If we fail to adjust RIR policy to account for the huge amount of accidental de-aggregation that can (and will) happen with IPv6, we will eventually have to do #1 anyway, but a bunch of networks will have to renumber in order to take advantage of #2 down the road.
Hi Jeff,

We could use smarter prefix filtering than that, which was proposed to ARIN a couple of years ago. And failed.

http://lists.arin.net/pipermail/arin-ppml/2009-November/015521.html

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Jeff Wheeler <jsw@inconcepts.biz> wrote:

Hello Jeff,
On Fri, Mar 9, 2012 at 3:23 AM, Mehmet Akcin <mehmet@akcin.net> wrote:
if you know anyone who is filtering /48 , you can start telling them to STOP doing so as a good citizen of internet6.
I had a bit of off-list discussion about this topic, and I was not going to bring it up today on-list, but since the other point of view is already there, I may as well.
Unless you are going to pay the bill for my clients to upgrade their 3BXL/3CXL systems (and similar) to XXL and then XXXL, I think we need to do two things before IPv6 up-take is really broad:
1) absolutely must drop /48 de-aggregates from ISP blocks
2) absolutely must make RIR policy so orgs can get /48s for anycasting, and whatever other purposes
I used to be (or still am) on the same page as you are. I was dropping everything smaller than a /36 from PA ranges at the edge.

I recently had to relax this filter, because Cloudflare seems to insist on throwing tons of /48s from their 2400:cb00::/32 into the air without an aggregate. And guess what the popular cloud reverse proxy for IPv6 webpages is these days ... Cloudflare.

Yes, it sucks, yes, I wrote them, but no answer and no change.

Best Regards,
Bernhard
I think ARIN issues /48s for provider-independent space as the minimum allocation size, so I'm guessing we shouldn't filter below that. At least, that's what's in their current policies.

On Fri, Mar 9, 2012 at 7:50 AM, Bernhard Schmidt <berni@birkenwald.de> wrote:
Jeff Wheeler <jsw@inconcepts.biz> wrote:
Hello Jeff,
On Fri, Mar 9, 2012 at 3:23 AM, Mehmet Akcin <mehmet@akcin.net> wrote:
if you know anyone who is filtering /48 , you can start telling them to STOP doing so as a good citizen of internet6.
I had a bit of off-list discussion about this topic, and I was not going to bring it up today on-list, but since the other point of view is already there, I may as well.
Unless you are going to pay the bill for my clients to upgrade their 3BXL/3CXL systems (and similar) to XXL and then XXXL, I think we need to do two things before IPv6 up-take is really broad:
1) absolutely must drop /48 de-aggregates from ISP blocks
2) absolutely must make RIR policy so orgs can get /48s for anycasting, and whatever other purposes
I used to be (or still am) on the same page as you are. I was dropping everything smaller than a /36 from PA ranges at the edge.
I recently had to relax this filter, because Cloudflare seems to insist on throwing tons of /48s from their 2400:cb00::/32 into the air without an aggregate. And guess what the popular cloud reverse proxy for IPv6 webpages is these days ... Cloudflare.
Yes, it sucks, yes, I wrote them, but no answer and no change.
Best Regards, Bernhard
On 09.03.2012 17:04, PC wrote:
I think ARIN issues /48s for provider-independent space as the minimum allocation size, so I'm guessing we shouldn't filter below that. At least, that's what's in their current policies.
Note that I explicitly wrote:

| I used to be (or still am) on the same page as you are. I was
| dropping everything smaller than a /36 from *PA ranges* at the edge.

(Emphasis mine.) Of course I'm accepting /48 from PI ranges (in the ARIN world 2001:500::/30, 2001:504::/30 and 2620::/23), anything else would be quite brain-dead and stupid.

Bernhard
Let us not forget that there is also the issue of PA /48s being advertised (quasi-legitimately) for some end-user organizations that are multi-homed but choose not to get PI space. It is not uncommon to obtain a PA /48 from provider A and also advertise it from Provider B. Owen
On 09.03.2012 20:31, Owen DeLong wrote:

Hi,
Let us not forget that there is also the issue of PA /48s being advertised (quasi-legitimately) for some end-user organizations that are multi-homed but choose not to get PI space. It is not uncommon to obtain a PA /48 from provider A and also advertise it from Provider B.
While I agree it's not uncommon, I'm not a big fan of this setup. Also, provider A should still have his aggregate announced, which would allow strictly filtering ISPs to reach the destination anyway.

Announcing /48s from a PA block without the covering aggregate calls for trouble.

Bernhard
On Mar 9, 2012, at 12:50 PM, Bernhard Schmidt wrote:
On 09.03.2012 20:31, Owen DeLong wrote:
Hi,
Let us not forget that there is also the issue of PA /48s being advertised (quasi-legitimately) for some end-user organizations that are multi-homed but choose not to get PI space. It is not uncommon to obtain a PA /48 from provider A and also advertise it from Provider B.
While I agree it's not uncommon, I'm not a big fan of this setup. Also, provider A should still have his aggregate announced, which would allow strictly filtering ISPs to reach the destination anyway.
I'm not a big fan, either, but, I think that the concept of "be conservative in what you announce and liberal in what you accept" has to apply in this case. Since it is a common (quasi-)legitimate practice, arbitrarily filtering it is ill-advised IMHO.

The statement about the covering aggregate assumes that there are no failures in the union of {site, loop, provider A}. In the event that there is such a failure, the aggregate may not help and may even be harmful. Since one of the key purposes of this kind of multihoming is to provide coverage in the event of such a failure, filtration of the more-specific seems to defeat the purpose.
Announcing /48s from a PA block without the covering aggregate calls for trouble.
No question. However, the covering aggregate alone is also insufficient. Owen
Owen said:
I'm not a big fan, either, but, I think that the concept of "be conservative in what you announce and liberal in what you accept" has to apply in this case. Since it is a common (quasi-)legitimate practice, arbitrarily filtering it is ill-advised IMHO.
While I agree in principle, 16 bits of disaggregation has the potential for a lot of mayhem and 32 bits (accepting /64 from PA) would be catastrophic.

This would seem to be a case where upstream providers can assist the end user in obtaining their own PI space if they wish to multihome. It would be in the provider's interest as it would reduce the number of potential complaints from customers concerning multihoming problems.

I filter /32 from PA space and am currently filtering one route but since the aggregate it is from has the same next hop and since I don't see the route from anyone else, I'm not worried about it.
On Mar 9, 2012, at 7:08 PM, George Bonser wrote:
Owen said:
I'm not a big fan, either, but, I think that the concept of "be conservative in what you announce and liberal in what you accept" has to apply in this case. Since it is a common (quasi-)legitimate practice, arbitrarily filtering it is ill-advised IMHO.
While I agree in principle, 16 bits of disaggregation has the potential for a lot of mayhem and 32 bits (accepting /64 from PA) would be catastrophic. This would seem to be a case where upstream providers can assist the end user in obtaining their own PI space if they wish to multihome. It would be in the provider's interest as it would reduce the number of potential complaints from customers concerning multihoming problems.
I filter /32 from PA space and am currently filtering one route but since the aggregate it is from has the same next hop and since I don't see the route from anyone else, I'm not worried about it.
I haven't heard anyone advocate accepting less than a /48. I think /48 is a reasonable "You must be this tall to ride" barrier.

Beyond that, YMMV.

Owen
I haven't heard anyone advocate accepting less than a /48. I think /48 is a reasonable "You must be this tall to ride" barrier.
Beyond that, YMMV.
Owen
Apparently AS6939 has at various times :) I remember getting some /64 announcements from HE. I haven't seen one lately, though. I'm only filtering one /64 route these days, announced by AS4651.
On Mar 9, 2012, at 11:01 PM, George Bonser wrote:
I haven't heard anyone advocate accepting less than a /48. I think /48 is a reasonable "You must be this tall to ride" barrier.
Beyond that, YMMV.
Owen
Apparently AS6939 has at various times :) I remember getting some /64 announcements from HE. I haven't seen one lately, though. I'm only filtering one /64 route these days, announced by AS4651.
Like any other ISP, we're run by humans and humans occasionally make mistakes. If you saw anything longer than a /48 from 6939, it was the result of such an event. If you see anything longer than a /48 from 6939, please let us know and we will fix it.

We have never advocated accepting longer than /48s to my knowledge.

Owen
On Fri, Mar 9, 2012 at 1:31 PM, Owen DeLong <owen@delong.com> wrote:
Let us not forget that there is also the issue of PA /48s being advertised (quasi-legitimately) for some end-user organizations that are multi-homed but choose not to get PI space. It is not uncommon to obtain a PA /48 from provider A and also advertise it from Provider B.
What should happen is this "quasi-legitimate" method of multi-homing should just be declared illegitimate for IPv6, to facilitate stricter filtering. Instead, what should happen is that the multi-homing should be required to fit into one of 3 scenarios, so any announcement with an IPv6 prefix length other than the RIR-allocated/assigned PA or PI block size can be treated as TE and summarily discarded or deprioritized when table resources are scarce.

(1) The end user org obtains PI address space from a RIR. The end user org originates their RIR-assigned exact prefix and announces it to their upstreams, who filter and accept from the end user only routes containing an NLRI of their exact prefix, with the prefix length used by the RIR for the PI blocks from which their assignment(s) had been made.

(2) Same as (1), but the RIR provides some expedited process for the ISP to obtain and transfer PI space and AS numbers for the purpose of their customers' multihoming -- in one step, so the end user does not have to figure out the RIR application process. E.g. some RIR process provides the ISP an option to create PI blocks on demand in addition to their PA block; the ISP will not know in advance what AS number or PI block will be allocated. The ISP must follow the RIR rules for the assignment of PI blocks and educate their user as needed; obtain a signed RSA with the end user; obtain written proof the user has two ISPs, has provided a network design that includes multihoming, and a written sound justification for the multi-homing or the meeting of a criterion requiring multihoming; and provide the end user's billing contact info to the RIR, the ISP having pre-paid registration fees to the RIR. Should the end user stop using the ISP that created the block, responsibility for the PI block and AS numbers reverts to the end user org.

(3) The end user org who is multi-homed picks a 3rd party organization to assign to the end user from their PA block. The 3rd party org's overall PA block is multihomed with diverse connectivity, and the end user inherits the multihoming of the 3rd party's PA block. The 3rd party AS is the sole AS that originates the prefix, in the form of the entire PA block, into the DFZ, and then routes the individually assigned end-user block to the end user through private arrangement or peering with the end user org's upstreams. (IOW: the multi-homed user's block does not appear as a globally visible more-specific/deaggregate.)

(4) Any of the other methods of achieving multi-homing, such as originating an NLRI with a longer prefix than the RIR delegation, should be rejected by filters.
Owen

--
-JH
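A minimal sketch of the exact-match rule Jimmy describes above, assuming a toy delegation set in place of real RIR data:

```python
# Accept an announcement only when its NLRI is exactly an RIR-delegated
# prefix; any more-specific out of these blocks is treated as a TE
# de-aggregate and dropped. The delegation set is a hypothetical stand-in.
import ipaddress

DELEGATIONS = {
    ipaddress.ip_network("2001:db8::/32"),     # a PA allocation
    ipaddress.ip_network("2620:0:2d0::/48"),   # a PI assignment
}

def accept_announcement(prefix):
    # Only the exact RIR-delegated prefix passes the filter.
    return ipaddress.ip_network(prefix) in DELEGATIONS

print(accept_announcement("2001:db8::/32"))    # True: matches the delegation
print(accept_announcement("2001:db8:1::/48"))  # False: TE de-aggregate, dropped
```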
Hi,
What should happen is this "quasi-legitimate" method of multi-homing should just be declared illegitimate for IPv6, to facilitate stricter filtering. Instead, what should happen is that the multi-homing should be required to fit into one of 3 scenarios, so any announcement with an IPv6 prefix length other than the RIR-allocated/assigned PA or PI block size can be treated as TE and summarily discarded or deprioritized when table resources are scarce.
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network). Deaggregating to /48's is not a good idea, but giving an LIR a few bits (something like 3 or 4) to deaggregate makes sense.

- Sander
On Fri, Mar 9, 2012 at 5:32 PM, Sander Steffann <sander@steffann.nl> wrote:
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network). Deaggregating to /48's is not a good idea, but giving an LIR a few bits (something like 3 or 4) to deaggregate makes sense.
It does make sense to give the LIR a few bits. Note, though, I say what "should" happen. What will happen in actual fact is probably going to be identical to IPv4: end users will go to other ISPs and demand they carry their individual /64s; the resulting market pressure is more powerful than efficient design.

--
-JH
If the LIRs cannot get separate allocations from the RIR (and separate ASNs) for this usage, something is wrong. We want to make things as simple and efficient as possible, but no simpler or more efficient, because the curves go back up again at that point, and we all suffer. -george On Fri, Mar 9, 2012 at 3:32 PM, Sander Steffann <sander@steffann.nl> wrote:
Hi,
What should happen is this "quasi-legitimate" method of multi-homing should just be declared illegitimate for IPv6, to facilitate stricter filtering. Instead, what should happen is that the multi-homing should be required to fit into one of 3 scenarios, so any announcement with an IPv6 prefix length other than the RIR-allocated/assigned PA or PI block size can be treated as TE and summarily discarded or deprioritized when table resources are scarce.
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network). Deaggregating to /48's is not a good idea, but giving an LIR a few bits (something like 3 or 4) to deaggregate makes sense.
- Sander
-- -george william herbert george.herbert@gmail.com
This varies from RIR to RIR. In the ARIN region, you can get assignments or allocations for Multiple Discrete Networks, but ARIN will often register them as an aggregate in the registration database, so...

In the RIPE region (which is where I believe Sander is), only aggregates are available to the best of my knowledge.

Owen

On Mar 9, 2012, at 3:40 PM, George Herbert wrote:
If the LIRs cannot get separate allocations from the RIR (and separate ASNs) for this usage, something is wrong.
We want to make things as simple and efficient as possible, but no simpler or more efficient, because the curves go back up again at that point, and we all suffer.
-george
On Fri, Mar 9, 2012 at 3:32 PM, Sander Steffann <sander@steffann.nl> wrote:
Hi,
What should happen is this "quasi-legitimate" method of multi-homing should just be declared illegitimate for IPv6, to facilitate stricter filtering. Instead, what should happen is that the multi-homing should be required to fit into one of 3 scenarios, so any announcement with an IPv6 prefix length other than the RIR-allocated/assigned PA or PI block size can be treated as TE and summarily discarded or deprioritized when table resources are scarce.
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network). Deaggregating to /48's is not a good idea, but giving an LIR a few bits (something like 3 or 4) to deaggregate makes sense.
- Sander
-- -george william herbert george.herbert@gmail.com
Hi,

Sander wrote:
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network).
If they have two different routing policies and need two different allocations, why not just have two different LIRs? It makes things a lot easier than spending untold weeks of time trying to work out which corner cases should be supported by policy and which should not. No?

Leo
On Mar 9, 2012, at 3:45 PM, Leo Vegoda wrote:
Hi,
Sander wrote:
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network).
If they have two different routing policies and need two different allocations, why not just have two different LIRs? It makes things a lot easier than spending untold weeks of time trying to work out which corner cases should be supported by policy and which should not. No?
Leo
This may depend on where you are. Being two LIRs in the ARIN region requires setting up two complete legal entities which is a lot of overhead to carry for just that purpose. Owen
On 3/9/12 20:42 , Owen DeLong wrote:
On Mar 9, 2012, at 3:45 PM, Leo Vegoda wrote:
Hi,
Sander wrote:
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network).
If they have two different routing policies and need two different allocations, why not just have two different LIRs? It makes things a lot easier than spending untold weeks of time trying to work out which corner cases should be supported by policy and which should not. No?
Leo
This may depend on where you are. Being two LIRs in the ARIN region requires setting up two complete legal entities which is a lot of overhead to carry for just that purpose.
Owen
I'll put this as bluntly and succinctly as I can, because I find the LIR distinction arbitrary...

I have an IPv6 direct assignment from ARIN. It is sized to meet the needs of my enterprise, consistent with needs for future growth, number of POPs, prevailing ARIN policy, etc.

Because my network is discontiguous I must announce more specific routes than the assignment in order to reflect the topology I have, both in IPv4 and in IPv6. I fully expect (and have no evidence to the contrary) that my transit providers will accept the deaggregated prefixes and that their upstreams and peers will by and large do likewise.

I have no interest in the general sense of deaggregating beyond the level required by the topological considerations. Imposing arbitrary political considerations on organizations that are simply trying to operate networks, in order to preserve maximal aggregation at a given level, seems absurd on the face of it.

I am reasonably certain that every wholesale transit provider on this list that offers IPv6 transit would be willing to accept my money and route my prefixes in their current form.
I'll put this as bluntly and succinctly as I can, because I find the LIR distinction arbitrary...
I have an IPv6 direct assignment from ARIN.
I am assuming you are an enterprise in PI space and not an ISP in PA space?
It is sized to meet the needs of my enterprise, consistent with needs for future growth, number of POPs, prevailing ARIN policy, etc.
Because my network is discontiguous I must announce more specific routes than the assignment in order to reflect the topology I have, both in IPv4 and in IPv6.
I fully expect (and have no evidence to the contrary) that my transit providers will accept the deaggregated prefixes and that their upstreams and peers will by and large do likewise.
If you are in PI space, I believe most people take down to a /48, as a /48 is generally accepted to be a single "site". So let's say you were given a /40 and have several disconnected sites. Most people are going to accept a /48 from you in PI space. I would say pretty close to "everyone" is going to accept a /48 from PI space.

An ISP that has been given a /32 or larger allocation from PA space and might have 10,000 customers each assigned their own /48 could instantly more than double the size of the IPv6 routing table if they disaggregated that /32.

The problem here is that each /32 is 65,536 /48 networks. An even larger net, say a /30, that disaggregates due to a router configuration goof means the potential for a huge number of networks suddenly flooding the Internet.
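The powers-of-two arithmetic behind George's point, worked out as a quick, purely illustrative sketch:

```python
# How many /48s a single PA allocation can shatter into if fully de-aggregated.
for alloc_len in (32, 30):
    print(f"/{alloc_len} -> {2 ** (48 - alloc_len):,} possible /48s")
# /32 -> 65,536 possible /48s
# /30 -> 262,144 possible /48s
```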
On 3/9/12 22:02 , George Bonser wrote:
An ISP that has been given a /32 or larger allocation from PA space and might have 10,000 customers each assigned their own /48 could instantly more than double the size of the IPv6 routing table if they disaggregated that /32.
The problem here is that each /32 is 65,536 /48 networks. An even larger net, say a /30, that disaggregates due to a router configuration goof means the potential for a huge number of networks suddenly flooding the Internet.
I'm well into my second decade of having a v6 prefix in the dfz and am passingly familiar with powers of two...
On Sat, Mar 10, 2012 at 12:52 AM, George Bonser <gbonser@seven.com> wrote:
I'm well into my second decade of having a v6 prefix in the dfz and am passingly familiar with powers of two...

Point is that expecting people globally to take a /48 from PA space probably isn't a realistic expectation.
Exactly.... What's more realistic is you have to get a single /48 of PI space for people to carry that globally. And if you have 5 discontiguous networks, what the RIRs should do is carve a /44 out for your present and future PI allocations and issue you the 8 /48s; the PI /48 routing slots that you have justified need for -- arranged so that they fall within the same /45. -- -JH
well... we actually intend to just announce /64's and smaller as well.

i don't see the problem with that.

just get routers with enough memory...

i'm rather for a "specification" of a minimum supported route-table size (let's say something along the lines of 64GB in each border router, it's 2012 after all ;) than for putting limits on the prefix sizes announced so "old junk" can still stay connected to the internet.

let's say there are 6 billion people in the world.. if they all have 1 route table entry (average ;) i see no technical limitations on anything produced AFTER 2008 actually.

stop buying crap without sufficient ram, or just scrap it and get new stuff. (which you're going to have to do to efficiently route ipv6 -anyway- at some point, as your old stuff, simply doesn't even loadbalance trunked ethernet ports properly (layer 3 based) ;)

we can't limit the expansion of the internet, and the independence of its users, just because some people refuse to part from their cisco 7200 vxr.

On Sat, 10 Mar 2012, Jimmy Hess wrote:
On Sat, Mar 10, 2012 at 12:52 AM, George Bonser <gbonser@seven.com> wrote:
I'm well into my second decade of having a v6 prefix in the dfz and am passingly familiar with powers of two...

Point is that expecting people globally to take a /48 from PA space probably isn't a realistic expectation.
Exactly.... What's more realistic is you have to get a single /48 of PI space for people to carry that globally.
And if you have 5 discontiguous networks, what the RIRs should do is carve a /44 out for your present and future PI allocations and issue you the 8 /48s; the PI /48 routing slots that you have justified need for -- arranged so that they fall within the same /45.
-- -JH
On Sat, Mar 10, 2012 at 5:47 PM, Sven Olaf Kamphuis <sven@cb3rob.net> wrote:
just get routers with enough memory...
i'm rather for a "specification" of a minimum supported route-table size (let's say something along the lines of 64GB in each border router, it's 2012 after all ;) than for putting limits on the prefix sizes announced so "old junk" can still stay connected to the internet.
let's say there are 6 billion people in the world.. if they all have 1 route table entry (average ;) i see no technical limitations on anything produced AFTER 2008 actually.
stop buying crap without sufficient ram, or just scrap it and get new stuff. (which you're going to have to do to efficiently route ipv6 -anyway- at some point, as your old stuff, simply doesn't even loadbalance trunked ethernet ports properly (layer 3 based) ;)
Sven,

A) 7 billion people in the world, not 6.

B) 7B IPv6 routes won't fit in a 64GB radix tree once, let alone the several times they'd need to in order to be useful in a router. For that matter, 6B routes won't fit either. (Hint: FIB plus at least one RIB for each peer)

C) Big iron is either using massively parallel FIBs (many copies of the radix tree) or they're using TCAM instead of DRAM, a specialized tristate version of SRAM. In either case, you're talking 10 to 100 times the cost, ten times the power consumption and ten times the heat versus DRAM.

D) No computer presently exists on which the BGP protocol could keep up with today's update rate per prefix with 7B prefixes.

A router handling 10M routes is achievable today if we're willing to go back to $20k as the minimum cost BGP box. That's an order of magnitude more than we have now and three orders of magnitude short of where we need to be before we can stop sweating the prefix count.

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
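A back-of-the-envelope check of point (B), under a deliberately optimistic, made-up per-route cost:

```python
# Even assuming an unrealistically small per-route cost, billions of routes
# blow through 64 GB before accounting for the per-peer RIBs a DFZ router
# also has to hold.
routes = 7_000_000_000
bytes_per_route = 32                 # assumption: key + next hop only, no tree overhead
one_copy = routes * bytes_per_route
print(f"{one_copy / 2**30:.0f} GiB for a single copy")   # ~209 GiB, already > 64 GiB
```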
William Herrin wrote:
C) Big iron is either using massively parallel FIBs (many copies of the radix tree) or they're using TCAM instead of DRAM, a specialized tristate version of SRAM. In either case, you're talking 10 to 100 times the cost, ten times the power consumption and ten times the heat versus DRAM.
TCAM is a specialized version of CAM. CAM is much worse than SRAM.
A router handling 10M routes is achievable today if we're willing to go back to $20k as the minimum cost BGP box. That's an order of magnitude more than we have now and three orders of magnitude short of where we need to be before we can stop sweating the prefix count.
For 16M routes, we only need /24. With /24 aggregation, route lookup is trivially easy: a single-chip SRAM with 16M entries can perform a lookup every 3ns while consuming 1W.

That's why IPv4, or the original IPv6 proposal with 8-byte addresses, is much better than the current IPv6.

Masataka Ohta
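A sketch of the direct-indexed lookup Ohta describes: with everything aggregated to /24, the top 24 bits of an IPv4 address index a flat 16M-entry table, so a single memory read resolves the next hop. The next-hop values here are illustrative:

```python
# Flat /24 lookup table: one slot per /24, indexed by the top 24 address bits.
import ipaddress

table = [None] * (1 << 24)        # 16,777,216 slots, one per /24

def install(prefix, next_hop):
    net = ipaddress.ip_network(prefix)
    assert net.prefixlen == 24
    table[int(net.network_address) >> 8] = next_hop

def lookup(addr):
    return table[int(ipaddress.ip_address(addr)) >> 8]   # one indexed read

install("192.0.2.0/24", "nh-A")
print(lookup("192.0.2.77"))       # nh-A
```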
On 3/10/12 2:47 PM, Sven Olaf Kamphuis wrote:
well... we actually intend to just announce /64's and smaller as well.
i don't see the problem with that.
just get routers with enough memory...
i'm rather for a "specification" of a minimum supported route-table size (let's say something along the lines of 64GB in each border router, it's 2012 after all ;) than for putting limits on the prefix sizes announced so "old junk" can still stay connected to the internet.
let's say there are 6 billion people in the world.. if they all have 1 route table entry (average ;) i see no technical limitations on anything produced AFTER 2008 actually.
stop buying crap without sufficient ram, or just scrap it and get new stuff. (which you're going to have to do to efficiently route ipv6 -anyway- at some point, as your old stuff, simply doesn't even loadbalance trunked ethernet ports properly (layer 3 based) ;)
I'm under the impression from your messages in this thread that you're unaware or unfamiliar with TCAM. ~Seth
On 3/10/12 14:47 , Sven Olaf Kamphuis wrote:
let's say there are 6 billion people in the world.. if they all have 1 route table entry (average ;) i see no technical limitations on anything produced AFTER 2008 actually.
Over in ipv4 land there are ~40k entities that appear in the dfz internet... Of those, somewhat less than half (16k) announce just one prefix. The top 30 ASes by route count, on the other hand, are 10% of the table.

I don't see a problem with the small guys. I don't see the little guys as a source of fib scaling problems because, oddly enough, they aren't.

The actors causing the most impact on the size of my fib are by and large on this mailing list...

joel
On Mar 10, 2012, at 4:00 PM, Joel jaeggli wrote:
On 3/10/12 14:47 , Sven Olaf Kamphuis wrote:
let's say there are 6 billion people in the world.. if they all have 1 route table entry (average ;) i see no technical limitations on anything produced AFTER 2008 actually.
Over in ipv4 land there are ~40k entities that appear in the dfz internet... Of those, somewhat less than half (16k) announce just one prefix. The top 30 ASes by route count, on the other hand, are 10% of the table.
I don't see a problem with the small guys. I don't see the little guys as a source of fib scaling problems because, oddly enough, they aren't.
The actors causing the most impact on the size of my fib are by and large on this mailing list...
joel
I expect that many of those are not nearly as likely to create as many routes in IPv6. Hence my belief that the problem is generally solved for some time to come once we can stop carrying the bloated obsolete IPv4 table for legacy support. Owen
we also should have expanded the ASN to a minimum of 64 bits at the time it was expanded to 32 bits, for exactly the same reason btw.

there -are- some technical reasons why /64's would be practical as "end-site" stuff, and if we want to be able to make all those end-site networks independent, we'd need 64 bit asn's to go along with that.

but main thing: just get enough ram in your stuff, and stop imposing stupid limitations. (not my problem if your routers keep reloading the table or rebooting themselves because they're from 1993 ffs ;) you did buy a new iphone i bet.. why no modern routers?

On Sat, 10 Mar 2012, Jimmy Hess wrote:
On Sat, Mar 10, 2012 at 12:52 AM, George Bonser <gbonser@seven.com> wrote:
I'm well into my second decade of having a v6 prefix in the dfz and am passingly familiar with powers of two...

Point is that expecting people globally to take a /48 from PA space probably isn't a realistic expectation.
Exactly.... What's more realistic is you have to get a single /48 of PI space for people to carry that globally.
And if you have 5 discontiguous networks, what the RIRs should do is carve a /44 out for your present and future PI allocations and issue you the 8 /48s; the PI /48 routing slots that you have justified need for -- arranged so that they fall within the same /45.
-- -JH
and anyway, the average visit to facebook is still more data than the entire ipv6 route table at the moment.

we might also want to speed up bgp handling by routers a bit in the future, as some are DAMN SLOW in processing a few hundred thousand sets of data... (no people, it's NOT acceptable when a $200k box takes more than a few milliseconds to process what's basically just a few megabytes of data coming in over 10ge pipes and put it into a route table in ram ;)

time to put all those suppliers a pepper in their **** and simply stop buying their stuff if they keep selling obsolete junk.

end-to-end PI is the way to go.

--
Greetings,
Sven Olaf Kamphuis, CB3ROB LLTC.

On Sat, 10 Mar 2012, Jimmy Hess wrote:
On Sat, Mar 10, 2012 at 12:52 AM, George Bonser <gbonser@seven.com> wrote:
I'm well into my second decade of having a v6 prefix in the dfz and am passingly familiar with powers of two...

Point is that expecting people globally to take a /48 from PA space probably isn't a realistic expectation.
Exactly.... What's more realistic is you have to get a single /48 of PI space for people to carry that globally.
And if you have 5 discontiguous networks, what the RIRs should do is carve a /44 out for your present and future PI allocations and issue you the 8 /48s; the PI /48 routing slots that you have justified need for -- arranged so that they fall within the same /45.
-- -JH
On Mar 10, 2012, at 6:08 AM, Jimmy Hess wrote:
On Sat, Mar 10, 2012 at 12:52 AM, George Bonser <gbonser@seven.com> wrote:
I'm well into my second decade of having a v6 prefix in the dfz and am passingly familiar with powers of two...

Point is that expecting people globally to take a /48 from PA space probably isn't a realistic expectation.
Exactly.... What's more realistic is you have to get a single /48 of PI space for people to carry that globally.
I fail to understand what difference it makes to a router whether a /48 is from PA or PI.
And if you have 5 discontiguous networks, what the RIRs should do is carve a /44 out for your present and future PI allocations and issue you the 8 /48s;
Well, they carve out a /44 and issue you the /44 in most cases that I am aware of. Is that a problem? If so, why? Seems rather silly.
the PI /48 routing slots that you have justified need for -- arranged so that they fall within the same /45.
RIRs don't issue routing slots. They issue addressing blocks. Usually (though not always) these addressing blocks are aligned with prefixes. Sometimes those prefixes end up in one routing slot. Sometimes more. Occasionally, none at all. There is no definite relationship between network blocks issued by RIRs and prefixes and even less so between prefixes and routing slots. Owen
On Fri, Mar 9, 2012 at 11:33 PM, Joel jaeggli <joelja@bogus.com> wrote:
On 3/9/12 20:42 , Owen DeLong wrote:
Because my network is discontiguous I must announce more specific routes than the assignment in order to reflect the topology I have, both in IPv4 and in IPv6.
I fully expect (and have no evidence to the contrary) that my transit providers will accept the deaggregated prefixes and that their upstreams and peers will by and large do likewise.
I have no doubt any transit provider would be happy to provide the transit for your discontiguous network, and accept your deaggregates within their network. The unreasonable expectation is that their upstreams or peers would carry all the deaggregates in the long run.

Connectivity for your discontiguous networks is your problem to solve, and as long as router memory is at a premium, limiting what deaggregates are accepted will be important. The peers want best connectivity to you at least cost for them.
I have no interest in the general sense of deaggregating beyond the level required by the topological considerations.
You don't have such an interest, but sloppy practices prevail on average, as evidenced in IPv4 by large blocks with all the /24s showing up.
Imposing arbitrary political considerations on organizations that are
Not political considerations: technical restrictions, which are design constraints. There are already plenty of such design constraints imposed by RFCs; interconnecting networks doesn't have a reliable result without some technical ground rules that provide for interoperability, stability, and predictability.
simply trying to operate networks, in order to preserve maximal aggregation at a given level, seems absurd on the face of it.
So for any network you provide transit to, in IPv4 you would be happy to allow them to announce their /12 as 131,072 /29s, because they have 131,072 subnets, and they could reasonably expect that all your peers would be happy to propagate the /29s, for the purpose of making sure the end user's design is not constrained (although at the peers' expense for increased equipment capacity)?

There's an unwritten rule somewhere that you don't expect longer than a /24 to propagate far with a high degree of certainty. With IPv6, instead of picking some arbitrary number like /48 or /64, it should be based on the RIR allocation unit size and type of allocation, for best results. That's more rational than what we have with IPv4.

--
-JH
Deaggregating to /48's is not a good idea, but giving an LIR a few bits (something like 3 or 4) to deaggregate makes sense.
- Sander
+1

I wouldn't have a problem with a few bits of disaggregation. That seems reasonable for a network that might be subject to partitioning or doesn't have a fully meshed internal net.
Hi all,
Hi,
What should happen is this "quasi-legitimate" method of multi-homing should just be declared illegitimate for IPv6, to facilitate stricter filtering. Instead, what should happen is that the multi-homing should be required to fit into one of 3 scenarios, so any announcement with an IPv6 prefix length other than the RIR-allocated/assigned PA or PI block size can be treated as TE and summarily discarded or deprioritized when table resources are scarce.
Splitting the allocation can be done for many reasons. There are known cases where one LIR operates multiple separate networks, each with a separate routing policy. They cannot get multiple allocations from the RIR and they cannot announce the whole allocation as a whole because of the separate routing policies (which are sometimes required legally, for example when an NREN has both a commercial and an educational network). Deaggregating to /48's is not a good idea, but giving an LIR a few bits (something like 3 or 4) to deaggregate makes sense.
yes, that's my point for years now - probably filter /48s from allocations (because end-users CAN get IPv6 PI assignments now everywhere, i think), but do allow some "sub-allocations" in the DFZ for such mentioned reasons, because for the latter there are no real "nice" solutions atm. (or probably update the policies to be able to acquire multiple allocations without hassle in such cases, but OTOH it doesn't matter to the routing table, another prefix is another prefix)

It's much nicer to have, say, one /40 in the table aggregating some (routing-)separated /48 customers than to have 200 /48 PI prefixes in that AS, if each customer needs to get their own PI space because you cannot split the allocation. I thought that would be a good middle ground (combined with RIR RR based filters perhaps, of course).

...but it seems like you even need to accept /48 from everywhere nowadays based on the initial postings *sigh* Not even I like that, although i never was a big fan of strict filtering.

But it all comes down to this most likely: the internet is a distributed being, and RIRs don't control routing. So /48 will just become the new /24 and some people will give us the good old "told you so!".

--
Mit freundlichen Grüßen / Kind Regards

Sascha Lenz [SLZ-RIPE]
Senior System- & Network Architect
(4) Any of the other methods of achieving multi-homing, such as originating an NLRI with a longer prefix than the RIR delegation, should be rejected by filters.
Owen

--
-JH
It is very rare that I will quote Randy Bush. Even more so when his original quote was utterly misplaced in the original context. However, in this case I will make an exception...

"We don't need policy weenies telling us how to run our networks."
--Randy Bush (from APNIC Policy SIG discussion of Prop-098)

Owen
On Mar 9, 2012, at 1:02 AM, Jeff Wheeler wrote:
On Fri, Mar 9, 2012 at 3:23 AM, Mehmet Akcin <mehmet@akcin.net> wrote:
if you know anyone who is filtering /48 , you can start telling them to STOP doing so as a good citizen of internet6.
I had a bit of off-list discussion about this topic, and I was not going to bring it up today on-list, but since the other point of view is already there, I may as well.
Unless you are going to pay the bill for my clients to upgrade their 3BXL/3CXL systems (and similar) to XXL and then XXXL, I think we need to do two things before IPv6 up-take is really broad:
1) absolutely must drop /48 de-aggregates from ISP blocks
2) absolutely must make RIR policy so orgs can get /48s for anycasting, and whatever other purposes
Item 1 will be interesting. Item 2 is already done in ARIN and I think RIPE and APNIC. I'm not sure about AfriNIC or LACNIC.
If we fail to adjust RIR policy to account for the huge amount of accidental de-aggregation that can (and will) happen with IPv6, we will eventually have to do #1 anyway, but a bunch of networks will have to renumber in order to take advantage of #2 down the road.
Can you point to specific RIR policies that you believe need adjustment? I'm willing to help (and somewhat adept at doing so), but I'm not seeing the problem you are reporting, so, I need more data.
The way we are headed right now, it is likely that the IPv6 address space being issued today will look like "the swamp" in a few short years, and we will regret repeating this obvious mistake.
I don't think so, actually. First, I don't think the swamp was a mistake so much as a temporary problem with resource limitation on routers.

The problem in today's routing table is NOT the "swamp" (where swamp is defined as the large number of /24s issued to organizations in a non-aggregable manner, often as direct assignments from the InterNIC before CIDR and provider-assigned addressing existed). The total scope of the swamp is limited to about 65,000 prefixes.

All of the growth beyond about 65,000 prefixes is, instead, attributable to a number of other factors, not the least of which are: disaggregation for convenience (which could be an issue in IPv6), disaggregation due to ignorance (which will likely be an issue in IPv6), de-aggregation due to differences in routing policy and/or for traffic management strategies (which will also happen in IPv6), general growth of the internet (which will also happen in IPv6, but at a lower prefix-growth rate), and finally, one of the biggest causes: slow-start, growth-constrained RIR policies handing out incremental (often 1 year worth or less) growth in address blocks due to scarcity (which should not be a problem in IPv6).

In the ARIN region I think we have pretty well prevented this last issue with current policy. I tried to propose similar policy in the APNIC region, but it was not well accepted there. The folks in Asia seem to want to cling to their scarcity mentality in IPv6 for the time being. I believe RIPE is issuing reasonably generous IPv6 allocations/assignments. I don't know enough about the goings on in AfriNIC or LACNIC to comment with any certainty.
We had this discussion on the list exactly a year ago. At that time, the average IPv6 origin ASN was announcing 1.43 routes. That figure today is 1.57 routes per origin ASN.
That represents a 10% growth in prefixes/ASN for IPv6. Compare to 9.3 -> 9.96/ASN (7%) in IPv4 over that same time. While I would agree that this is a trend that merits watching, I think we're probably OK for quite some time.

The higher growth rate in IPv6 can be largely attributed to the fact that IPv6 is still in its infancy and we probably haven't seen much IPv6 traffic engineering route manipulation yet. I don't think IPv6 is at all likely, even with current policies and trends, to reach 9:1. I expect it will most likely settle in somewhere around 2.5:1.

Owen
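Checking the cited growth figures as simple ratios (routes per origin ASN, year over year), as a quick illustrative calculation:

```python
# The percentages quoted in this exchange, derived from the raw figures.
v6_growth = (1.57 / 1.43 - 1) * 100   # ~9.8%, the "10%" cited for IPv6
v4_growth = (9.96 / 9.30 - 1) * 100   # ~7.1%, the "7%" cited for IPv4
print(f"IPv6: {v6_growth:.1f}%  IPv4: {v4_growth:.1f}%")
```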
Owen wrote: [...]
In the ARIN region I think we have pretty well prevented this last issue with current policy. I tried to propose similar policy in the APNIC region, but it was not well accepted there. The folks in Asia seem to want to cling to their scarcity mentality in IPv6 for the time being. I believe RIPE is issuing reasonably generous IPv6 allocations/assignments. I don't know enough about the goings on in AfriNIC or LACNIC to comment with any certainty.
You can see the prefix distribution in charts that are updated daily on our stats web site: http://stats.research.icann.org/

HTH,

Leo
On Fri, Mar 9, 2012 at 11:27 PM, Owen DeLong <owen@delong.com> wrote:
1) absolutely must drop /48 de-aggregates from ISP blocks
2) absolutely must make RIR policy so orgs can get /48s for anycasting, and whatever other purposes
Item 1 will be interesting. Item 2 is already done in ARIN and I think RIPE and APNIC. I'm not sure about AfriNIC or LACNIC.
AfriNIC already does so. See http://www.afrinic.net/docs/policies/AFPUB-2007-v6-001.htm

--
Mukom Akong [Tamon]

"We don't LIVE in order to BREATHE. Similarly WORKING in order to make MONEY puts us on a one way street to irrelevance."

[In Search of Excellence & Perfection] - http://perfexcellence.org
[Moments of TechXcellence] - http://techexcellence.net
[ICT Business Integration] - http://ibiztech.wordpress.com
[About Me] - http://about.me/perfexcellence
On 9 Mar 2012, at 10:02 , Jeff Wheeler wrote:
The way we are headed right now, it is likely that the IPv6 address space being issued today will look like "the swamp" in a few short years, and we will regret repeating this obvious mistake.
We had this discussion on the list exactly a year ago. At that time, the average IPv6 origin ASN was announcing 1.43 routes. That figure today is 1.57 routes per origin ASN.
The IETF and IRTF have looked at the routing scalability issue for a long time. The IETF came up with shim6, which allows multihoming without BGP. Unfortunately, ARIN started to allow IPv6 PI just in time so nobody bothered to adopt shim6. I haven't followed the IRTF RRG results for a while, but at some point LISP came out of this, where we basically tunnel the entire internet so the core routers don't have to see the real routing table.

But back to the topic at hand: filtering long prefixes. There are two reasons you want to do this:

1. Attackers could flood BGP with bogus prefixes to make tables overflow
2. Legitimate prefixes may be deaggregated so tables overflow

It won't be quick or easy, but the RPKI stuff should solve 1. So that leaves the issue of deaggregating legitimate prefixes. There are around 100k prefixes given out by the RIRs and nearly 400k in the routing tables. A quick look at the IPv4 BGP table shows that unless I missed the day in school when they covered "reasons to advertise 64 /22s instead of a /16", a good percentage of those deaggregates happen without any legitimate reason.

Although the RIRs don't make this as easy as they could, it IS possible to determine the maximum prefix length for any given block of RIR space, and then simply filter on that prefix length. That takes care of the /48s and /32s deaggregating, but unfortunately not the /44s out of space used for /48s or the /20s out of space used for /32s. So the RIRs should up their game here; then we can really hold LIRs' feet to the fire and stop them from deaggregating.

That does of course leave people who do have a good reason to deaggregate in the cold. But that's also easy to solve: if you run two separate networks, you need two prefixes and two AS numbers. So the RIRs should develop policies that allow for this if it's reasonable. If that means that an organization can't have both a bunch of independently announced prefixes AND have all those prefixes be part of one aggregate for easy firewall configuration, that's too bad.

The RIRs should pick up on this, because there WILL be a moment an ISP deaggregates a /32 into 65000 /48s, with the result that half the IPv6 internet goes down.
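A sketch of the per-block maximum-length filtering Iljitsch describes, driven by data shaped like the RIRs' published delegation files. The two sample lines mimic that format (registry|cc|type|start|value|..., where value is the prefix length for ipv6 records) but are illustrative, not authoritative:

```python
# Derive a per-block maximum prefix length from delegation data, then filter
# announcements against it instead of a one-size-fits-all /48.
import ipaddress

SAMPLE = """\
arin|US|ipv6|2001:db8::|32|20120101|allocated
arin|US|ipv6|2620:0:2d0::|48|20120101|assigned
"""

max_len = {}                              # delegated block -> longest acceptable prefix
for line in SAMPLE.splitlines():
    registry, cc, kind, start, value = line.split("|")[:5]
    if kind == "ipv6":
        block = ipaddress.ip_network(f"{start}/{value}")
        max_len[block] = block.prefixlen  # strictest rule: the delegation size itself

def accept(prefix):
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(block) and net.prefixlen <= longest
               for block, longest in max_len.items())

print(accept("2001:db8::/32"))      # True: matches the delegated size
print(accept("2001:db8:1::/48"))    # False: longer than the /32 delegation
```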
On 3/11/12 08:48 , Iljitsch van Beijnum wrote:
On 9 Mar 2012, at 10:02 , Jeff Wheeler wrote:
The way we are headed right now, it is likely that the IPv6 address space being issued today will look like "the swamp" in a few short years, and we will regret repeating this obvious mistake.
We had this discussion on the list exactly a year ago. At that time, the average IPv6 origin ASN was announcing 1.43 routes. That figure today is 1.57 routes per origin ASN.
The IETF and IRTF have looked at the routing scalability issue for a long time. The IETF came up with shim6, which allows multihoming without BGP. Unfortunately, ARIN started to allow IPv6 PI just in time so nobody bothered to adopt shim6.
That's a fairly simplistic version of why shim6 failed. A better reason (apart from the fact that building an upper-layer overlay of the whole internet on an IP protocol that's largely undeployed was hard) is that it leaves the destination unable to perform traffic engineering. That, fundamentally, is the business we're in when advertising prefixes to more than one provider: ingress path selection. Sancho Panza couldn't get us out of that one.
On 11 Mar 2012, at 20:15 , Joel jaeggli wrote:
The IETF and IRTF have looked at the routing scalability issue for a long time. The IETF came up with shim6, which allows multihoming without BGP. Unfortunately, ARIN started to allow IPv6 PI just in time so nobody bothered to adopt shim6.
That's a fairly simplistic version of why shim6 failed. A better reason (apart from the fact that building an upper-layer overlay of the whole internet on an IP protocol that's largely undeployed was hard) is that it leaves the destination unable to perform traffic engineering.
I'm not saying that shim6 would have otherwise ruled the world by now; it was always an uphill battle because it requires support on both sides of a communication session/association. But ARIN's action meant it never had a chance. I really don't get why they felt the need to start allowing IPv6 PI after a decade, just when the multi6/shim6 effort started to get going but before the work was complete enough to judge whether it would be good enough.
That, fundamentally, is the business we're in when advertising prefixes to more than one provider: ingress path selection.
That's the business network operators are in. That's not the business end users who don't want to depend on a single ISP are in. Remember, shim6 was always meant as a solution that addresses the needs of a potential 1 billion "basement multihomers" with maybe ADSL + cable. The current 25k or so multihomers are irrelevant from the perspective of routing scalability. It's the other 999,975,000 that will kill the routing tables if multihoming becomes mainstream.
On 3/11/2012 3:15 PM, Iljitsch van Beijnum wrote:
But ARIN's action meant it never had a chance. I really don't get why they felt the need to start allowing IPv6 PI after a decade
Because as far back as 2003 ARIN members (and members from all the other RIRs for that matter) were saying in very clear terms that PI space was a requirement for moving to v6. No one wanted to lose the provider independence that they had gained with v4. Without that, v6 was a total non-starter.

ARIN was simply listening to its members.

Doug

--
If you're never wrong, you're not trying hard enough
Doug Barton <dougb@dougbarton.us> writes:
On 3/11/2012 3:15 PM, Iljitsch van Beijnum wrote:
But ARIN's action meant it never had a chance. I really don't get why they felt the need to start allowing IPv6 PI after a decade
Because as far back as 2003 ARIN members (and members from all the other RIRs for that matter) were saying in very clear terms that PI space was a requirement for moving to v6. No one wanted to lose the provider independence that they had gained with v4. Without that, v6 was a total non-starter.
ARIN was simply listening to its members.
It didn't help that there was initially no implementation of shim6 whatsoever. That later turned into a single prototype implementation of shim6 for linux. As much as I tried to keep an open mind about shim6, eventually it became clear that this was a Gedankenexperiment in protocol design. Somewhere along the line I started publicly referring to it as "sham6". I'm sure I'm not the only person who came to that conclusion. Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI -r
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
-r
A perfect summation. Also, given that people understand what PI space is and how it works, it does indeed pretty much just work for the end users of the space. -- Leigh Porter UK Broadband
On 12 Mar 2012, at 16:21 , Leigh Porter wrote:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
A perfect summation.
Except that it didn't happen in that order. When ARIN approved PI the shim6 effort was well underway, but it was too early to know to what degree it would solve the multihoming problem. Either earlier, when multi6 was stuck, or later, when shim6 could have been evaluated (at least as a specification, but preferably as multiple implementations), would have been a reasonable time to decide to go for PI instead. Of course, as has been the case over and over, the argument "if you give us feature X we'll implement IPv6" has never borne out.
Also, given that people understand what PI space is and how it works, it does indeed pretty much just work for the end users of the space.
The trouble is that it doesn't scale. Which is fine right now at the current IPv6 routing table size, but who knows what the next decades bring. We've been living with IPv4 for 30 years now, and IPv6 doesn't have a built-in 32-bit expiry date so it's almost certainly going to be around for much longer.
On Mar 12, 2012, at 8:56 AM, Iljitsch van Beijnum wrote:
On 12 Mar 2012, at 16:21 , Leigh Porter wrote:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
A perfect summation.
Except that it didn't happen in that order. When ARIN approved PI the shim6 effort was well underway, but it was too early to know to what degree it would solve the multihoming problem. Either earlier, when multi6 was stuck, or later, when shim6 could have been evaluated (at least as a specification, but preferably as multiple implementations), would have been a reasonable time to decide to go for PI instead.
Of course, as has been the case over and over, the argument "if you give us feature X we'll implement IPv6" has never borne out.
Except it didn't happen that way. The argument wasn't "If you give us PI, we'll implement IPv6." The argument that carried the day and is, IMHO, quite valid was "If you don't give us PI we definitely WON'T implement IPv6." The inability to obtain PI was a serious detractor from IPv6 for any organization that already had IPv4 PI. Shim6 showed no promise whatsoever of changing this even in its most optimistic marketing predictions at the time. (As you point out, it was well underway at that point and it's not as if we didn't look at it prior to drafting the policy proposal.) Frankly, I think the long term solution is to implement IDR based on Locators in the native packet header and not using map/encap schemes that reduce MTU, but that doesn't seem to be a popular idea so far.
Also, given that people understand what PI space is and how it works, it does indeed pretty much just work for the end users of the space.
The trouble is that it doesn't scale. Which is fine right now at the current IPv6 routing table size, but who knows what the next decades bring. We've been living with IPv4 for 30 years now, and IPv6 doesn't have a built-in 32-bit expiry date so it's almost certainly going to be around for much longer.
If IPv6 works out in the 1.6-2:1 prefix:ASN ratio that I expect or even as much as 4:1, we'll get at least another 30 years out of it. Since we've had IPv6 now for about 15 years, it's already half way through that original 30. :p Owen
On 3/12/12 08:56 , Iljitsch van Beijnum wrote:
On 12 Mar 2012, at 16:21 , Leigh Porter wrote:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
A perfect summation.
Except that it didn't happen in that order. When ARIN approved PI the shim6 effort was well underway, but it was too early to know to what degree it would solve the multihoming problem. Either earlier, when multi6 was stuck, or later, when shim6 could have been evaluated (at least as a specification, but preferably as multiple implementations), would have been a reasonable time to decide to go for PI instead.
Recall that from the outset (e.g. long before shim6) some of the very early PI prefixes were assigned to organizations which are not internet service providers in any traditional sense. 2001:490::/32 not an isp... 2001:420::/32 not an isp... Having received an assignment under the then-existing policy, it was not hard for large corporate or academic network operators to describe themselves as LIRs. Moreover, no one batted an eye when I deaggregated a /32 into /36s. We can hem and haw for a long time about the possible prefix count and where one draws the line, but it's been a consideration since the beginning. If the fundamental distinction for who got a PI prefix and who didn't is scale, well, there are a lot of ISPs that are small. That camel had its nose under the tent from day one.
On 12-3-2012 16:07, Robert E. Seastrom wrote:
Doug Barton <dougb@dougbarton.us> writes:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
+ Cheap End Users = IPv6 NPt (IPv6 Prefix Translation) Cheers, Seth
On Mar 12, 2012, at 8:23 AM, Seth Mos wrote:
On 12-3-2012 16:07, Robert E. Seastrom wrote:
Doug Barton <dougb@dougbarton.us> writes:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
+ Cheap End Users = IPv6 NPt (IPv6 Prefix Translation)
Cheers,
Seth
I don't get the association between cheap end users and NPT. Can you explain how one relates to the other, given the added costs of unnecessarily translating prefixes? Owen
Hi, Op 12 mrt 2012, om 18:09 heeft Owen DeLong het volgende geschreven:
+ Cheap End Users = IPv6 NPt (IPv6 Prefix Translation)
Cheers,
Seth
I don't get the association between cheap end users and NPT. Can you explain how one relates to the other, given the added costs of unnecessarily translating prefixes?
Well, to explain cheap here I would like to explain it as follows:

- The existing yumcha plastic soap box that you can buy at your local electronics store is powerful enough. It is about as fast in v6 as in v4, since it is all software anyhow. It only gets faster from there.
- Requires no cooperation from the ISP. This gets excessively worse where n > 1. Some have 8 or more for added bandwidth.
- The excessive cost associated with current ISP practices that demand you use a business connection (at reduced bandwidth and increased cost). Somehow there was a decision that you can't have PI on "consumer" connections.
- Traffic engineering is a cinch, since it is all controlled by the single box. For example round robin the connections for increased download speed. Similar to how we do it in v4 land.
- It is mighty cheap to implement in current software; a number of Cisco and Juniper releases support it, the various *BSD platforms do, and Linux support is in development.
- Not to underestimate the failover capabilities when almost all routers support 3G dongles for backup internet these days.

There are considerable drawbacks of course:

- Rewriting prefixes breaks voip/ftp again, although without the port rewriting the impact is less, but still significant. I'd really wish that h323, ftp and voip would go away, or other protocols that embed local IP information inside the datagram. But I digress.
- People balk at the idea of NAT66, not to underestimate a very vocal group here. All for solutions here. :-)
- It requires keeping state, so no graceful failover. This means dropping sessions of course, but the people that want this likely won't care, for the price they are paying.

Probably missed a bunch of arguments that people will complain about. It is probably best explained in the experimental RFC for NPt: http://tools.ietf.org/html/rfc6296

Cheers, Seth
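For the curious, here is a rough Python sketch of the checksum-neutral mapping RFC 6296 describes. The helper names are invented and the corner cases the RFC spells out (0xFFFF results, prefixes other than /48) are ignored; the sample prefixes follow the RFC's own FD01:203:405::/48 to 2001:DB8:1::/48 example, where word 3 of the address becomes 0xD550:

    import ipaddress

    def ones_add(a, b):
        # one's-complement addition of two 16-bit words
        s = a + b
        return (s & 0xffff) + (s >> 16)

    def prefix_sum(net):
        # one's-complement sum of the three 16-bit words of a /48 prefix
        p = net.network_address.packed
        total = 0
        for i in range(0, 6, 2):
            total = ones_add(total, (p[i] << 8) | p[i + 1])
        return total

    def npt66(addr, inside, outside):
        # rewrite the /48 prefix, then compensate in word 3 so transport
        # checksums (which cover the IPv6 pseudo-header) are unchanged
        a = bytearray(addr.packed)
        a[0:6] = outside.network_address.packed[0:6]
        adjust = ones_add(prefix_sum(inside), ~prefix_sum(outside) & 0xffff)
        w3 = ones_add((a[6] << 8) | a[7], adjust)
        a[6], a[7] = w3 >> 8, w3 & 0xff
        return ipaddress.IPv6Address(bytes(a))

    inside = ipaddress.ip_network("fd01:203:405::/48")
    outside = ipaddress.ip_network("2001:db8:1::/48")
    print(npt66(ipaddress.ip_address("fd01:203:405:1::1234"), inside, outside))
    # -> 2001:db8:1:d550::1234

Note the rewrite itself is stateless; the session state Seth mentions comes from the rest of the box (connection tracking, failover policy), not from the prefix translation.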
On Mar 12, 2012, at 11:53 AM, Seth Mos wrote:
Hi,
Op 12 mrt 2012, om 18:09 heeft Owen DeLong het volgende geschreven:
+ Cheap End Users = IPv6 NPt (IPv6 Prefix Translation)
Cheers,
Seth
I don't get the association between cheap end users and NPT. Can you explain how one relates to the other, given the added costs of unnecessarily translating prefixes?
Well, to explain cheap here I would like to explain it as following:
- The existing yumcha plastic soap box that you can buy at your local electronics store is powerful enough. It is about as fast in v6 as in v4, since it is all software anyhow. It only gets faster from there.
Right.
- Requires no cooperation from the ISP. This gets excessively worse where n > 1. Some have 8 or more for added bandwidth.
This one doesn't really parse for me. I'm not sure I understand what you are saying.
- The excessive cost associated with current ISP practices that demand you use a business connection (at reduced bandwidth and increased cost). Somehow there was a decision that you can't have PI on "consumer" connections.
There's a big gap between PA without NPT and NPT, however. At the consumer level, I'd rather go PA than NPT. For a business, it's a different story, but, for a business, PI seems feasible and I would think that the business connection is sort of a given.
- Traffic engineering is a cinch, since it is all controlled by the single box. For example round robin the connections for increased download speed. Similar to how we do it in v4 land.
With all the same dysfunction. Further, in v4 land this depends a great deal on support built into applications and ALGs and a lot of other bloat and hacking to glue the broken little pieces back together and make it all work. I'm truly hoping that we can move away from that in IPv6. I'd really like to see application developers free to develop robust networking code in their applications instead of having to focus all their resources on dealing with the perils and pitfalls of NAT environments.
- It is mighty cheap to implement in current software; a number of Cisco and Juniper releases support it, the various *BSD platforms do, and Linux support is in development.
Well, I guess that depends on how and where you measure cost. Sure, if you only count the cost of making the capability available in the feature set on the router, it's cheap and easy. If you count the cost and overhead of the application bloat and complexity and the support costs, the security costs, etc. it adds up pretty quickly. Sort of like it doesn't cost much to send spam, but, the cost of dealing with the never ending onslaught of unwanted email seems to go up every year. (Yes, I just compared people using NPT to spammers).
- Not to underestimate the failover capabilities when almost all routers support 3G dongles for backup internet these days.
If you care that much about failover, PI is a much better solution. I know my view is unpopular, but, I really would rather see PI made inexpensive and readily available than see NAT brought into the IPv6 mainstream. However, in my experience, very few residential customers make use of that 3G backup port.
There are considerable drawbacks of course:
- Rewriting prefixes breaks voip/ftp again, although without the port rewriting the impact is less, but still significant. I'd really wish that h323, ftp and voip would go away, or other protocols that embed local IP information inside the datagram. But I digress.
Yep.
- People balk at the idea of NAT66, not to underestimate a very vocal group here. All for solutions here. :-)
For good reason!
- It requires keeping state, so no graceful failover. This means dropping sessions of course, but the people that want this likely won't care, for the price they are paying.
True.
Probably missed a bunch of arguments that people will complain about. It is probably best explained in the experimental RFC for NPt: http://tools.ietf.org/html/rfc6296
More than likely. Hopefully we can stop trying so hard to break the internet and start working on ways to make it better soon. Owen
On 12 Mar 2012, at 19:30, Owen DeLong wrote:
I know my view is unpopular, but, I really would rather see PI made inexpensive and readily available than see NAT brought into the IPv6 mainstream. However, in my experience, very few residential customers make use of that 3G backup port.
So what assumptions do you think future IPv6-enabled homenets might make about the prefixes they receive or can use? Isn't having a PI per residential homenet rather unlikely? It would be desirable to avoid NPTv6 in the homenet scenario. Tim
On Mar 12, 2012, at 12:50 PM, Tim Chown wrote:
On 12 Mar 2012, at 19:30, Owen DeLong wrote:
I know my view is unpopular, but, I really would rather see PI made inexpensive and readily available than see NAT brought into the IPv6 mainstream. However, in my experience, very few residential customers make use of that 3G backup port.
So what assumptions do you think future IPv6-enabled homenets might make about the prefixes they receive or can use? Isn't having a PI per residential homenet rather unlikely?
Yes, but, having reasonable and/or multiple PA prefixes is very likely and there is no reason not to use that instead of cobbled solutions based on NPT.
It would be desirable to avoid NPTv6 in the homenet scenario.
Very much so. (Or any other scenario I can think of as well). Owen
On Mon, Mar 12, 2012 at 3:50 PM, Tim Chown <tjc@ecs.soton.ac.uk> wrote:
On 12 Mar 2012, at 19:30, Owen DeLong wrote:
I know my view is unpopular, but, I really would rather see PI made inexpensive and readily available than see NAT brought into the IPv6 mainstream. However, in my experience, very few residential customers make use of that 3G backup port.
So what assumptions do you think future IPv6-enabled homenets might make about the prefixes they receive or can use? Isn't having a PI per residential homenet rather unlikely?
Hi Tim,

Not at all. You just build a second tier to the routing system. BGP is at the top tier. The second tier anchors SOHO users' provider independent addresses to a dynamically mapped set of top-tier relay addresses where each address in the relay anchor set can reach the SOHO's IP. Then you put an entry relay at many/most ISPs which receives the unrouted portions of PI space, looks up the exit relay set and relays the packet.

The ingress relays have to keep some state but it's all discardable (can be re-looked up at any time). Also, they can be pushed close enough to the network edge that they aren't overwhelmed. The egress relays are stateless. Do it right and you get within a couple percent of the routing efficiency of BGP for SOHOs with only two or three ISPs.

There are some issues with dead path detection which get thorny but they're solvable. There's also an origin filtering problem: packets originating from the PI space to BGP routed space aren't relayed and the ISP doesn't necessarily need to know that one of the PA addresses assigned to customer X is acting as an inbound relay for PI space. Again: solvable.

If you want to dig in to how such a thing might work, read: http://bill.herrin.us/network/trrp.html

Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
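To make the two-tier idea concrete, here is a toy sketch. This is not TRRP itself: the prefixes, relay addresses and helper names are all invented, a plain function stands in for the mapping tier, and tuples stand in for encapsulated packets:

    import ipaddress

    def covering_prefix(addr, prefixes):
        # linear scan is fine for a toy; real code would use a radix tree
        for p in prefixes:
            if addr in p:
                return p
        return None

    class IngressRelay:
        """Entry relay at an ISP: receives packets for unrouted PI space."""
        def __init__(self, mapping_lookup):
            self.mapping_lookup = mapping_lookup
            self.cache = {}  # prefix -> relay set: discardable, re-fetchable

        def forward(self, dst, payload):
            dst = ipaddress.ip_address(dst)
            prefix = covering_prefix(dst, self.cache)
            if prefix is None:
                prefix, relays = self.mapping_lookup(dst)  # re-look-up any time
                self.cache[prefix] = relays
            exit_relay = self.cache[prefix][0]  # exit-relay choice is policy
            return ("tunnel", exit_relay, dst, payload)

    def egress_relay(tunneled):
        # the exit relay keeps no state at all: decapsulate and deliver
        _, _, dst, payload = tunneled
        return ("deliver", dst, payload)

    def toy_mapping_lookup(addr):
        # stand-in for the dynamic mapping tier; values are made up
        pi = ipaddress.ip_network("2001:db8:feed::/48")
        return pi, [ipaddress.ip_address("2001:db8:1::1"),
                    ipaddress.ip_address("2001:db8:2::1")]

    ingress = IngressRelay(toy_mapping_lookup)
    print(egress_relay(ingress.forward("2001:db8:feed::42", b"payload")))

The cache entries are exactly the "discardable state" described above: losing one costs a single extra mapping lookup, nothing more.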
On 12 Mar 2012, at 21:15 , William Herrin wrote:
Not at all. You just build a second tier to the routing system.
It's so strange how people think a locator/identifier split will solve the scalability problem. We already have two tiers: DNS names and IP addresses. So that didn't solve anything. I don't see any reason a second second tier would.
On 2012-03-12 22:14, Iljitsch van Beijnum wrote:
On 12 Mar 2012, at 21:15 , William Herrin wrote:
Not at all. You just build a second tier to the routing system.
It's so strange how people think a locator/identifier split will solve the scalability problem. We already have two tiers: DNS names and IP addresses. So that didn't solve anything. I don't see any reason a second second tier would.
Wrong analogy IMHO. Using it, you'd know how to get to a specific host in an IPv4/IPv6-centric Internet by looking up its name. Knowing a host is 'thishost.org' doesn't give you the information needed to route the IPv4/v6 packets that we still use to this specific system. You still need to look up the IP assigned to this name.

For LISP (other solutions may vary, obviously), knowing node 54.100 is available (after lookup) currently at 200.101 makes it possible for core routers to only remember the paths to 200.101/16, and not thousands of more-specifics of this prefix. This is aggregation of information at the same level of lookup execution.

The real problems for world-wide LISP adoption are currently:
- nobody sees a FIB explosion for IPv6, because
- only around 8k worth of prefixes is in the global IPv6 table

Hardly a reason for anyone to implement aggregation. If IPv6 were to reach today's IPv4 level of 400k, it would still not be a very compelling reason, apart from those SPs willing to run all their edge without MPLS and with L3 devices that have very tiny FIBs - like 2/4/8k of entries.

A typical core router today can forward 2-3M of IPv4 prefixes in hardware, and around 500k-2M of IPv6 prefixes in hardware. The ideal LISP use case would be, for example, 4M of IPv6 prefixes with steady, clearly visible growth. Aggregating this down to, for example (I've made this completely up), 200k prefixes, while still having the ability to traffic engineer the paths between source and destination at almost the level of having all 4M prefixes in the FIB, is a very compelling reason to deploy LISP.

-- "There's no sense in being precise when | Łukasz Bromirski you don't know what you're talking | jid:lbromirski@jabber.org about." John von Neumann | http://lukasz.bromirski.net
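A toy model of that aggregation, with a dict standing in for the LISP mapping system and all values invented (the EID and RLOC echo the 54.100 / 200.101/16 example above):

    import time
    import ipaddress

    # all the core has to carry: coarse aggregates covering RLOC space
    RLOC_FIB = [ipaddress.ip_network("200.101.0.0/16")]

    # stand-in for the mapping system: EID prefix -> RLOCs
    MAP_SYSTEM = {ipaddress.ip_network("54.100.0.0/24"): ["200.101.7.1"]}

    class ITR:
        """Toy ingress tunnel router: map-and-encap with a TTL'd map-cache."""
        def __init__(self, ttl=300):
            self.ttl = ttl
            self.map_cache = {}  # EID prefix -> (rlocs, expiry)

        def rlocs_for(self, eid):
            now = time.time()
            for prefix, (rlocs, expiry) in self.map_cache.items():
                if eid in prefix and now < expiry:
                    return rlocs  # cache hit
            for prefix, rlocs in MAP_SYSTEM.items():  # "map-request"
                if eid in prefix:
                    self.map_cache[prefix] = (rlocs, now + self.ttl)
                    return rlocs
            return None

        def encapsulate(self, inner_dst, payload):
            eid = ipaddress.ip_address(inner_dst)
            rlocs = self.rlocs_for(eid)
            # the outer header is addressed to an RLOC, so core FIBs only
            # ever need the few aggregates in RLOC_FIB
            return {"outer_dst": rlocs[0], "inner_dst": str(eid),
                    "payload": payload}

    itr = ITR()
    print(itr.encapsulate("54.100.0.9", b"data"))

The traffic-engineering knob survives because a mapping reply can carry several RLOCs with priorities, none of which ever appears in the core FIB.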
On Mon, Mar 12, 2012 at 5:14 PM, Iljitsch van Beijnum <iljitsch@muada.com> wrote:
On 12 Mar 2012, at 21:15 , William Herrin wrote:
Not at all. You just build a second tier to the routing system.
We already have two tiers: DNS names and IP addresses.
Hi Iljitsch,

If only that were true. The DNS doesn't sit to the side of TCP, managing the moment-to-moment layer 4 to layer 3 mapping function the way ARP sits to the side of IP. Instead, the DNS's function is actuated all the way up at layer 7. This was the crux of my complaint about the getaddrinfo/connect APIs last week. Their design makes a future introduction of a transport protocol, something which actually does interact with the name service at the proper layer, needlessly hard.

That and the common non-operation of the DNS TTL invalidate DNS's use as a routing tier.

Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
On 13/03/2012, at 8:14 AM, Iljitsch van Beijnum wrote:
On 12 Mar 2012, at 21:15 , William Herrin wrote:
Not at all. You just build a second tier to the routing system.
It's so strange how people think a locator/identifier split will solve the scalability problem. We already have two tiers: DNS names and IP addresses. So that didn't solve anything. I don't see any reason a second second tier would.
I think you have encountered an article of faith, Iljitsch :-) http://en.wikipedia.org/wiki/Indirection: Any problem can be solved by adding another layer of indirection.
On Mon, Mar 12, 2012 at 11:33 PM, Geoff Huston <gih@apnic.net> wrote:
On 13/03/2012, at 8:14 AM, Iljitsch van Beijnum wrote:
On 12 Mar 2012, at 21:15 , William Herrin wrote:
Not at all. You just build a second tier to the routing system.
It's so strange how people think a locator/identifier split will solve the scalability problem. We already have two tiers: DNS names and IP addresses. So that didn't solve anything. I don't see any reason a second second tier would.
I think you have encountered an article of faith Iljitsch :-)
http://en.wikipedia.org/wiki/Indirection: Any problem can be solved by adding another layer of indirection.
"But that usually will create another problem." Then the test must be: does any particular proposed layer of indirection solve more intractable and more valuable problems than it creates, enough more valuable to be worth the cost of implementation? Still, I concede that it would be "better" to more effectively use the indirection layer we have (DNS) rather than create another. Better, but not necessarily achievable. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
In a message written on Mon, Mar 12, 2012 at 11:07:54AM -0400, Robert E. Seastrom wrote:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
I'll also add that the Shim6 folks never made a good economic argument. It's true that having routes in the DFZ costs money, and that reducing the number of routes will save the industry money in router upgrades and such to handle more routes. However, it's also true that deploying Shim6 (or similar solutions) also has a cost in rewritten software, training for network engineers and administrators, and so on. It was never clear to me that even if it worked 100% as advertised it would be cheaper / better in the global sense. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On Mon, Mar 12, 2012 at 11:31 AM, Leo Bicknell <bicknell@ufp.org> wrote:
In a message written on Mon, Mar 12, 2012 at 11:07:54AM -0400, Robert E. Seastrom wrote:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
It was never clear to me that even if it worked 100% as advertised that it would be cheaper / better in the global sense.
Hi Leo,

When I ran the numbers a few years ago, a route had a global cost impact in the neighborhood of $8000/year. It's tough to make a case that folks who need multihoming's reliability can't afford to put that much into the system. As long as the system is largely restricted to folks who do put that much in, there's really no "problem" with the current flood-all-routers multihoming strategy: at $8k/year the demand will never again exceed the supply.

A *working* multi-addressed end user system (like shim6 attempted) could solve cheap multihoming. That could have a billion dollar a year impact as folks at the leaf nodes decide they don't need the more costly BGP multihoming. But that's not where the real money is.

Often overlooked is that multihoming through multi-addressing could solve IP mobility too. Provider-agnostic and media-agnostic mobility without levering off a "home" router. That's where the money is. Carry your voip call uninterrupted from your home wifi on the cable modem to your cell provider in the car to your employer's wired ethernet and back. Keep your SSH sessions alive on the notebook as you travel from home, to the airport, to London and to the hotel. Let folks access the web server on your notebook as it travels from home, to the airport, to Tokyo and back. The capability doesn't exist today. The potential economic impact of such a capability's creation is unbounded.

Unfortunately, shim6 didn't work in some of the boundary cases. Since single-homing works pretty well in the ordinary case, there's not much point to a multihoming protocol that fails to deliver all the boundary cases. IIRC, the main problem was that they tried to bootstrap the layer 3 to layer 2 mapping function instead of externally requesting it. That's like trying to build ARP by making a unicast request to a local router instead of a broadcast/multicast request on the LAN. What happens when the local routers no longer have MAC addresses that you know about? Fail.

Also, in complete fairness, shim6 suffered for the general lack of consumer interest in IPv6 that persists even today. Its proponents bought into the hype that new work should focus on IPv6, and they paid for it.

Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
William Herrin wrote:
When I ran the numbers a few years ago, a route had a global cost impact in the neighborhood of $8000/year. It's tough to make a case that folks who need multihoming's reliability can't afford to put that much into the system.
The cost for a bloated DFZ routing table is not so small and is paid by all the players, including those who use the DFZ but do not multihome. Those who can't pay the cost silently give up being multihomed, which is why you overlooked them. Even those who pay the cost are not using the full routing table for IGP, which makes their multihoming less capable.
A *working* multi-addressed end user system (like shim6 attempted)
Shim6 is so poorly designed that it does not work.
Often overlooked is that multihoming through multi-addressing could solve IP mobility too.
Not. What is often overlooked is the fact that they are orthogonal problems.
Carry your voip call uninterrupted from your home wifi on the cable modem to your cell provider in the car to your employer's wired ethernet and back.
Use mobile IP implemented long before shim6 was designed.
Unfortunately, shim6 didn't work in some of the boundary cases. Since single-homing works pretty well in the ordinary case, there's not much point to a multihoming protocol that fails to deliver all the boundary cases.
Just like NAT, shim6 is an intelligent intermediate entity trying to hide its existence from applications, which is why it sometimes does not work, just as NAT sometimes does not work. The only end to end way to handle multiple addresses is to let applications handle them explicitly. Masataka Ohta
2012/3/12 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
William Herrin wrote:
When I ran the numbers a few years ago, a route had a global cost impact in the neighborhood of $8000/year. It's tough to make a case that folks who need multihoming's reliability can't afford to put that much into the system.
The cost for a bloated DFZ routing table is not so small and is paid by all the players, including those who use the DFZ but do not multihome.
Hi, http://bill.herrin.us/network/bgpcost.html If you believe there's an error in my methodology, feel free to take issue with it.
Often overlooked is that multihoming through multi-addressing could solve IP mobility too.
Not.
What is often overlooked is the fact that they are orthogonal problems.
I respectfully disagree. Current mobility efforts have gone down a blind alley of relays from a home server and handoffs from one network to the next. And in all fairness, with TCP tightly bound to a particular IP address pair there aren't a whole lot of other options. Nevertheless, it's badly suboptimal. Latency and routing inefficiency rapidly increase with distance from the home node, among other major problems.

However, there's another way to imagine the problem: Networks become available. Networks cease to be available. No handoff. No home server. Just add and drop. Announce a route into the global system for each available network with priority set based on the node's best estimate of the network's bandwidth, likely future availability, etc. Cancel the announcement for any network that has left or is leaving range. Modify the announcement priority as the node's estimate of the network evolves.

This is quite impossible with today's BGP core. The update rate would crush the core, as would the prefix count. And if those problems were magically solved, BGP still isn't capable of propagating a change fast enough to be useful for mobile applications.

But suppose you had a TCP protocol that wasn't statically bound to the IP address by the application layer. Suppose each side of the connection referenced each other by name, TCP expected to spread packets across multiple local and remote addresses, and suppose TCP, down at layer 4, expected to generate calls to the DNS any time it wasn't sure what addresses it should be talking to.

DNS servers can withstand the update rate. And the prefix count is moot. DNS is a distributed database. It *already* easily withstands hundreds of millions of entries in the in-addr.arpa zone alone. And if the node gets even moderately good at predicting when it will lose availability for each network it connects to and/or when to ask the DNS again instead of continuing to try the known IP addresses, you can get to where network drops are ordinarily lossless and only occasionally result in a few packet losses over the course of a single-digit number of seconds.

Which would be just dandy for mobile IP applications.
The only end to end way to handle multiple addresses is to let applications handle them explicitly.
For connection-oriented protocols, that's nonsense. Pick an appropriate mapping function and you can handle multiple layer 3 addresses just fine at layer 4. Just like we successfully handle layer 2 addresses at layer 3.

For connectionless protocols, maybe. Certainly layer 7 knowledge is needed to decide whether each path is operational. However, I'm not convinced that can't be reliably accomplished with a hinting process where the application tells layer 4 its best guess of which send()'s succeeded or failed and lets layer 4 figure out the resulting gory details of address selection.

Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
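A crude user-space approximation of the name-bound transport sketched above might look like the following (everything here is illustrative: a real version would live down at layer 4, and session survival would need cooperation from both endpoints rather than a plain reconnect):

    import socket

    class NameBoundConnection:
        """Toy connection pinned to a name rather than an address pair."""

        def __init__(self, host, port):
            self.host, self.port = host, port
            self.sock = self._connect()

        def _connect(self):
            # ask the DNS whenever unsure what addresses to talk to
            last_err = None
            for info in socket.getaddrinfo(self.host, self.port,
                                           type=socket.SOCK_STREAM):
                try:
                    return socket.create_connection(info[4][:2], timeout=5)
                except OSError as err:
                    last_err = err
            raise last_err

        def send(self, data):
            try:
                self.sock.sendall(data)
            except OSError:
                # an address went away underneath us: re-resolve, retry once
                self.sock.close()
                self.sock = self._connect()
                self.sock.sendall(data)

    conn = NameBoundConnection("www.example.com", 80)
    conn.send(b"HEAD / HTTP/1.0\r\nHost: www.example.com\r\n\r\n")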
On Mon, Mar 12, 2012 at 8:01 PM, William Herrin <bill@herrin.us> wrote:
But suppose you had a TCP protocol that wasn't statically bound to the IP address by the application layer. Suppose each side of the connection referenced each other by name, TCP expected to spread packets across multiple local and remote addresses, and suppose TCP, down at layer 4, expected to generate calls to the DNS any time it wasn't sure what addresses it should be talking to.
DNS servers can withstand the update rate. And the prefix count is moot. DNS is a distributed database. It *already* easily withstands hundreds of millions of entries in the in-addr.arpa zone alone. And if the node gets even moderately good at predicting when it will lose availability for each network it connects to and/or when to ask the DNS again instead of continuing to try the known IP addresses, you can get to where network drops are ordinarily lossless and only occasionally result in a few packet losses over the course of a single-digit number of seconds.
Which would be just dandy for mobile IP applications.
DNS handles many millions of records, sure, but that's because it was designed with caching in mind. DNS changes are rarely done at the rapid rate I think you are suggesting, except for those who can stand the brunt of 5-minute time-to-live values. I think it would be insane to try and set a TTL much lower than that, but that would seem to work counter to the idea of sub-10-second loss. If you cut down caching as significantly as I think this idea would suggest, I would expect scaling to take a plunge.

Also consider the significantly increased load on DNS servers from handling the constant stream of dynamic DNS updates needed to make this possible, and that you have to find some reliable trust mechanism to handle these updates, because without that you just made man-in-the-middle attacks just a little bit easier.

That said, I might be misunderstanding something. I would like to see that idea elaborated.
In message <CAMcDhonQqYuzD5CLLZMBKW1tjQ5H6qmLE9LLJo4Z_H4D3coQRw@mail.gmail.com> , Josh Hoppes writes:
Also consider the significantly increased load on DNS servers from handling the constant stream of dynamic DNS updates needed to make this possible, and that you have to find some reliable trust mechanism to handle these updates, because without that you just made man-in-the-middle attacks just a little bit easier.
The DNS already supports cryptographically authenticated updates. There is a good chance that your DHCP server used one of the methods below when you got your lease. SIG(0), TSIG and GSS-TSIG all scale appropriately for this. Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
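For illustration, a TSIG-signed dynamic update is only a few lines with, say, the dnspython library; the zone, key name, secret, TTL and server address below are all placeholders:

    import dns.query
    import dns.tsigkeyring
    import dns.update

    # placeholder key; in practice it comes from your DHCP/DNS provisioning
    keyring = dns.tsigkeyring.from_text({
        "mobile-host-key.": "c2VjcmV0IGtleSBtYXRlcmlhbCBoZXJlCg==",
    })

    update = dns.update.Update("example.com",
                               keyring=keyring,
                               keyname="mobile-host-key.")
    # replace the host's address set as attachments come and go
    update.delete("laptop", "AAAA")
    update.add("laptop", 30, "AAAA", "2001:db8:1::5")  # short 30s TTL
    update.add("laptop", 30, "AAAA", "2001:db8:2::9")

    response = dns.query.tcp(update, "192.0.2.53", timeout=5)
    print(response.rcode())  # 0 (NOERROR) if the server accepted it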
On Mon, Mar 12, 2012 at 10:42 PM, Josh Hoppes <josh.hoppes@gmail.com> wrote:
On Mon, Mar 12, 2012 at 8:01 PM, William Herrin <bill@herrin.us> wrote:
Which would be just dandy for mobile IP applications.
DNS handles many millions of records, sure, but that's because it was designed with caching in mind. DNS changes are rarely done at the rapid rate I think you are suggesting, except for those who can stand the brunt of 5-minute time-to-live values. I think it would be insane to try and set a TTL much lower than that, but that would seem to work counter to the idea of sub-10-second loss. If you cut down caching as significantly as I think this idea would suggest, I would expect scaling to take a plunge.
Hi Josh, Actually, there was a study presented a few years ago. I think it was at a Fall NANOG. At any rate, a gentleman at a university decided to study the impact of adjusting the DNS TTL on the query count hitting his authoritative server. IIRC he tested ranges from 24 hours to 60 seconds. In my opinion he didn't control properly for browser DNS pinning (which would tend to suppress query count) but even with that taken into account, the increase in queries due to decreased TTLs was much less than you might expect.
Also consider the significantly increased load on DNS servers from handling the constant stream of dynamic DNS updates needed to make this possible, and that you have to find some reliable trust mechanism to handle these updates, because without that you just made man-in-the-middle attacks just a little bit easier.
That's absolutely correct. We would see a ten-factor increase in load on the naming system and could see as much as a two order of magnitude increase in load. But not on the root -- that load increase is distributed almost exclusively to the leaves. And DNS has long since proven it can scale up many orders of magnitude more than that. By adding servers to be sure... but the DNS job parallelizes trivially and well. Route processing, like with BGP, doesn't. And you're right about implementing a trust mechanism suitable for such an architecture. There's quite a bit of cryptographic work already present in DNS updates but I frankly have no idea whether it would hold up here or whether something new would be required. If it can be reduced to "hostname and DNS password," and frankly I'd be shocked if it couldn't, then any problem should be readily solvable.
That said, I might be misunderstanding something. I would like to see that idea elaborated.
From your questions, it sounds like you're basically following the concept.
I sketched out the idea a couple years ago, working through some of the permutations. And the MPTCP working group has been chasing some of the concepts for a while too, though last I checked they'd fallen into one of the major architectural pitfalls of shim6, trying to bootstrap the address list instead of relying on a mapper.

The main problem is that we "can't get there from here." No set of changes modest enough to not be another "IPv6 transition" gets the job done. We'd need to entrench smaller steps in the direction of such a protocol first. Like enhancing the sockets API with a variant of connect() which expects to take a host name and service name and return a connected protocol-agnostic socket. Today, just some under-the-hood calls to a non-blocking getaddrinfo and some parallelized connect()'s that happen to work better and be an easier choice than what most folks could write for themselves. But in the future, a socket connection call which receives all the knowledge that a multi-addressed protocol needs to get the job done without further changes to the application's code.

Or, if I'm being fair about it, doing what the MPTCP folks are doing and then following up later with additional enhancements to call out to DNS from the TCP layer.

Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
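As a sketch of what that first step might look like, here is a hypothetical connect-by-name helper in Python (the name and details are invented; this is the "parallelized connect()s" version, racing the getaddrinfo candidates and keeping the first socket that connects):

    import socket
    import concurrent.futures

    def connect_by_name(host, service, timeout=5):
        infos = socket.getaddrinfo(host, service, type=socket.SOCK_STREAM)

        def attempt(info):
            family, socktype, proto, _, sockaddr = info
            s = socket.socket(family, socktype, proto)
            try:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return s
            except OSError:
                s.close()
                raise

        winner, last_err = None, None
        with concurrent.futures.ThreadPoolExecutor(len(infos)) as pool:
            futures = [pool.submit(attempt, i) for i in infos]
            for fut in concurrent.futures.as_completed(futures):
                try:
                    s = fut.result()
                    if winner is None:
                        winner = s  # first success wins
                    else:
                        s.close()   # close the losers
                except OSError as err:
                    last_err = err
        if winner is None:
            raise last_err
        return winner

    sock = connect_by_name("www.example.com", "http")

The application never sees an address or even an address family, which is exactly the knowledge a future multi-addressed protocol would want to keep for itself.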
William Herrin wrote:
When I ran the numbers a few years ago, a route had a global cost impact in the neighborhood of $8000/year. It's tough to make a case that folks who need multihoming's reliability can't afford to put that much into the system.
The cost for a bloated DFZ routing table is not so small and is paid by all the players, including those who use the DFZ but do not multihome.
Hi,
http://bill.herrin.us/network/bgpcost.html
If you believe there's an error in my methodology, feel free to take issue with it.
Your estimate on the number of routers in the DFZ, "somewhere between 120,000 and 180,000 with the consensus number near 150,000", is a result of the high cost of routers and is inappropriate for estimating the global cost of a routing table entry. Because DFZ-capable routers are so expensive, the actual number of routers is so limited. If the number of routes in the DFZ were, say, 100, many routers and hosts would be default free.
Often overlooked is that multihoming through multi-addressing could solve IP mobility too.
Not.
What is often overlooked is the fact that they are orthogonal problems.
I respectfully disagree.
My statement is based on my experience implementing a locator/ID separation system with multi-address TCP and IP mobility. They need separate mechanisms and separate coding.
Current mobility efforts have gone down a blind alley of relays from a home server and handoffs from one network to the next. And in all fairness, with TCP tightly bound to a particular IP address pair there aren't a whole lot of other options. Nevertheless, it's badly suboptimal. Latency and routing inefficiency rapidly increase with distance from the home node, among other major problems.
That is a mobility issue of triangle elimination, which has nothing to do with TCP.
But suppose you had a TCP protocol that wasn't statically bound to the IP address by the application layer. Suppose each side of the connection referenced each other by name, TCP expected to spread packets across multiple local and remote addresses, and suppose TCP, down at layer 4, expected to generate calls to the DNS any time it wasn't sure what addresses it should be talking to.
Ignoring the fact that DNS does not work that fast, TCP becomes unsure "what addresses it should be talking to" only after a long timeout.
And if the node gets even moderately good at predicting when it will lose availability for each network it connects to and/or when to ask the DNS again instead of continuing to try the known IP addresses you can
What? A node asks the DNS for the IP addresses of its peer because the node itself is changing its IP addresses?
The only end to end way to handle multiple addresses is to let applications handle them explicitly.
For connection-oriented protocols, that's nonsense. Pick an appropriate mapping function and you can handle multiple layer 3 addresses just fine at layer 4.
It will require that applications perform the reverse mapping function when they require raw IP addresses.
For connectionless protocols, maybe.
I'm afraid you are unaware of connected UDP.
However, I'm not convinced that can't be reliably accomplished with a hinting process where the application tells layer 4 its best guess of which send()'s succeeded or failed and lets layer 4 figure out the resulting gory details of address selection.
That's annoying, which is partly why shim6 failed. It's a lot easier for UDP-based applications to directly manage multiple IP addresses. Masataka Ohta
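For readers who haven't met it, "connected" UDP is just this (the peer below is a placeholder): connect() on a datagram socket exchanges no packets, but it pins the peer address at layer 4, enables plain send()/recv(), has the kernel discard datagrams from any other source, and delivers ICMP errors to the socket:

    import socket

    s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    s.connect(("::1", 9999))  # no traffic yet: just fixes the peer address
    s.send(b"ping")           # send()/recv() now work, TCP-style

    # if either end renumbers, this socket goes deaf until the application
    # notices and connect()s again -- the addresses are managed explicitly,
    # by the application, which is Ohta's point

Which is also why a multi-address UDP application ends up juggling one such socket per address pair, or falling back to sendto()/recvfrom() and doing the bookkeeping itself.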
On Mar 13, 2:21 am, Masataka Ohta <mo...@necom830.hpcl.titech.ac.jp> wrote:
William Herrin wrote:
When I ran the numbers a few years ago, a route had a global cost impact in the neighborhood of $8000/year. It's tough to make a case that folks who need multihoming's reliability can't afford to put that much into the system.
The cost for a bloated DFZ routing table is not so small and is paid by all the players, including those who use the DFZ but do not multihome.
Hi,
If you believe there's an error in my methodology, feel free to take issue with it.
Your estimate on the number of routers in the DFZ:
somewhere between 120,000 and 180,000 with the consensus number near 150,000
is a result of the high cost of routers and is inappropriate for estimating the global cost of a routing table entry.
Because DFZ-capable routers are so expensive, the actual number of routers is so limited.
If the number of routes in the DFZ were, say, 100, many routers and hosts would be default free.
For quite some time, a sub-$2000 PC running Linux/BSD has been able to cope with DFZ table sizes and handle enough packets per second to saturate two or more of the prevalent LAN interfaces of the day. The reason current routers in the core are so expensive is because of the 40 gigabit interfaces, custom ASICs to handle billions of PPS, esoteric features, and lack of competition.

The fact that long-haul fiber is very expensive to run limits the number of DFZ routers more than anything else. Why not take a default route and simplify life when you're at the end of a single coax link? If you're lucky enough to have access to fiber from multiple providers, the cost of a router which can handle a full table is not a major concern compared with your monthly recurring charges.

-- RPM
Ryan Malayter wrote:
If the number of routes in the DFZ were, say, 100, many routers and hosts would be default free.
For quite some time, a sub-$2000 PC running Linux/BSD has been able to cope with DFZ table sizes and handle enough packets per second to saturate two or more of the prevalent LAN interfaces of the day.
What if you run windows?
The reason current routers in the core are so expensive is because of the 40 gigabit interfaces, custom ASICs to handle billions of PPS, esoteric features, and lack of competition.
The point of http://bill.herrin.us/network/bgpcost.html was that routers are more expensive because of the bloated routing table. If you deny it, you must deny its conclusion.
The fact that long-haul fiber is very expensive to run limits the number of DFZ routers more than anything else.
Given that the global routing table is bloated because of site multihoming, where the site uses multiple ISPs within a city, the costs of long-haul fiber are irrelevant.
Why not take a default route and simplify life when you're at the end of a single coax link?
That's fine.
If your lucky enough to have access to fiber from multiple providers, the cost of a router which can handle a full table is not a major concern compared with your monthly recurring charges.
As it costs less than $100 per month to have fiber from a local ISP, having fiber from multiple ISPs is negligible compared to having routers with such a bloated routing table. Masataka Ohta
On Mar 13, 2012, at 6:03 AM, Masataka Ohta wrote:
Ryan Malayter wrote:
If the number of routes in the DFZ were, say, 100, many routers and hosts would be default free.
For quite some time, a sub-$2000 PC running Linux/BSD has been able to cope with DFZ table sizes and handle enough packets per second to saturate two or more of the prevalent LAN interfaces of the day.
What if you run windows?
Why would you want to run windows on a box you're trying to use as a router? That's like trying to invade Fort Knox with a bag of plastic soldiers. Leo's point is that you can build/buy a DFZ capable router for less than $2,000. If you run windows, the box will be more expensive, less capable, and less reliable. If that's what you want, knock yourself out, but, it's hardly relevant to the discussion at hand.
The reason current routers in the core are so expensive is because of the 40 gigabit interfaces, custom ASICs to handle billions of PPS, esoteric features, and lack of competition.
The point of
http://bill.herrin.us/network/bgpcost.html
was that routers are more expensive because of the bloated routing table.
If you deny it, you must deny its conclusion.
To a certain extent you are right. I believe that Bill's analysis and his conclusions are deeply flawed in many ways. However, he is marginally correct in that the high cost of core DFZ routers is the product of the large forwarding table multiplied by the cost per forwarding entry in a high-pps high-data-rate system. Further adding to this is the fact that high-rate (pps,data) routers generally need to distribute copies of the FIB to each line card so the cost per forwarding entry is further multiplied by the number of line cards (and in some cases, the number of modules installed on each line card).
The fact that long-haul fiber is very expensive to run limits the number of DFZ routers more than anything else.
Given that the global routing table is bloated because of site multihoming, where the site uses multiple ISPs within a city, the costs of long-haul fiber are irrelevant.
Long-haul meaning anything that leaves the building. Yes, it's a poor choice of terminology, but, if you prefer, the costs of last-mile fiber apply equally to Leo's point.
Why not take a default route and simplify life when you're at the end of a single coax link?
That's fine.
If your lucky enough to have access to fiber from multiple providers, the cost of a router which can handle a full table is not a major concern compared with your monthly recurring charges.
As it costs less than $100 per month to have fiber from a local ISP, having fiber from multiple ISPs is negligible compared to having routers with such a bloated routing table.
$100/month * 2 = $200/month. $200/month pays for a DFZ capable router every year. That means that the cost of 2*fiber is quite a bit more than the cost of the router.

There is a difference between a DFZ router and a core router. I personally run a DFZ router for my personal AS. I don't personally own or run a core router for my personal AS. The fact that people conflate the idea of a DFZ router with the idea of a core router is part of the problem and a big part of where Bill's cost structure analysis breaks, as you pointed out.

Small to medium businesses that want to multihome can easily do so with relatively small investments in equipment which are actually negligible compared to the telecom costs for the multiple connections.

Owen
On Mar 13, 2:18 pm, Owen DeLong <o...@delong.com> wrote:
On Mar 13, 2012, at 6:03 AM, Masataka Ohta wrote:
Ryan Malayter wrote:
If the number of routes in the DFZ were, say, 100, many routers and hosts would be default free.
For quite some time, a sub-$2000 PC running Linux/BSD has been able to cope with DFZ table sizes and handle enough packets per second to saturate two or more of the prevalent LAN interfaces of the day.
What if you run windows?
Why would you want to run windows on a box you're trying to use as a router? That's like trying to invade Fort Knox with a bag of plastic soldiers.
Check your quoting depth... you're attributing Masataka Ohta's comments to me; he brought up running windows. I am the one who put forward the notion of a sub-$2000 DFZ router.
On Mar 13, 8:03 am, Masataka Ohta <mo...@necom830.hpcl.titech.ac.jp> wrote:
The point of http://bill.herrin.us/network/bgpcost.html was that routers are more expensive because of the bloated routing table. If you deny it, you must deny its conclusion.
Bill's analysis is quite interesting, but my initial take is that it is somewhat flawed. It assumes that the difference between what Cisco charges for a 7606 and a 3750G bears some resemblance to the actual bill of materials needed to support the larger routing table. That simply isn't the case: Cisco rightly charges what they think the market will bear for their routers and switches.

I think a more realistic approach would be to use the cost differential between a router model X that supports 1M routes and the same model configured to support 2M routes. Or perhaps we could look at the street prices for TCAM expansion modules. Either would be a better indicator of the incremental cost attributable to routing table size. The majority of costs in a mid-to-high-end Cisco/Juniper chassis are "sunk" and have nothing to do with the size of the routing table.

The expensive routers currently used by providers are expensive because the market isn't that big in quantity, so they are not commodity items. They are designed to maximize the utility of very expensive long-haul fibers and facilities to a service provider. This means providing a high density of high-speed interfaces which can handle millions to billions of packets per second. They also provide lots of features that service providers and large enterprises want, sometimes in custom ASICs. These are features which have nothing to do with the size of the DFZ routing table, but significantly impact the cost of the device.
Given that the global routing table is bloated because of site multihoming, where the site uses multiple ISPs within a city, the costs of long-haul fiber are irrelevant.
I suppose smaller multi-homed sites can and often do take a full table, but they don't *need* to do so. What they do need is their routes advertised to the rest of the internet, which means they must be in the fancy-and-currently-expensive routers somewhere upstream.

This is where the cost of long-haul fiber becomes relevant: until we can figure out how to dig cheaper ditches and negotiate cheaper rights-of-way, there will not be an explosion of the number of full-table provider edge routers, because there are only so many interconnection points where they are needed. Incremental growth, perhaps, but physical infrastructure cannot follow an exponential growth curve.
As it costs less than $100 per month to have fiber from a local ISP, having fiber from multiple ISPs is negligible compared to having routers with such a bloated routing table.
For consumer connections, a sub-$1000 PC would serve you fine with a full table, given the level of over-subscription involved. Even something like Quagga or Vyatta running in a virtual machine would suffice. Or a Linksys with more RAM.

Getting your providers to speak BGP with you on such a connection for that same $100/month will be quite a feat. Even in your contrived case, however, the monthly recurring charges exceed a $1000 router cost after a few months. Enterprises pay several thousand dollars per month per link for quality IP transit at Gigabit rates.
Given that the global routing table is bloated because of site multihoming, where the site uses multiple ISPs within a city, the costs of long-haul fiber are irrelevant.
I suppose smaller multi-homed sites can and often do take a full table, but they don't *need* to do so. What they do need is their routes advertised to the rest of the internet, which means they must be in the fancy-and-currently-expensive routers somewhere upstream.
This is where the cost of long-haul fiber becomes relevant: until we can figure out how to dig cheaper ditches and negotiate cheaper rights-of-way, there will not be an explosion of the number of full-table provider edge routers, because there are only so many interconnection points where they are needed. Incremental growth, perhaps, but physical infrastructure cannot follow an exponential growth curve.
Not entirely accurate. Most of the reduction in cost/Mbps that has occurred over the last couple of decades has come not from better digging economics (though there has been some improvement there), but rather from more Mbps per dig. As technology continues to increase the Mbps/strand, strands/cable, etc., the cost/Mbps will continue to drop.

I expect within my lifetime that multi-gigabit ethernet will become commonplace in the household LAN environment and that when that becomes reality, localized IP Multicast over multi-gigabit ethernet will eventually supplant HDMI as the primary transport for audio/video streams between devices (sources such as BD players, DVRs, computers, etc. and destinations such as receivers/amps, monitors, speaker drivers, etc.). There are already hackish efforts at this capability in the form of TiVO's HTTPS services, Sling Box, and others.
As it costs less than $100 per month to have fiber from a local ISP, having fiber from multiple ISPs is negligible compared to having routers with such a bloated routing table.
For consumer connections, a sub-$1000 PC would serve you fine with a full table, given the level of over-subscription involved. Even something like Quagga or Vyatta running in a virtual machine would suffice. Or a Linksys with more RAM. Getting your providers to speak BGP with you on such a connection for that same $100/month will be quite a feat. Even in your contrived case, however, the monthly recurring charges exceed a $1000 router cost after a few months.
Simpler solution: let the providers speak whatever they will sell you. Ideally, find one that will at least sell you a static address. Then use a tunnel to do your real routing. There are several free tunnel services, and I know at least one will do BGP.
Enterprises pay several thousand dollars per month per link for quality IP transit at Gigabit rates.
Since this isn't a marketing list, I'll let this one slide by. Owen
On Tue, 13 Mar 2012 20:13:41 PDT, Owen DeLong said:
I expect within my lifetime that multi-gigabit ethernet will become commonplace in the household LAN environment and that when that becomes reality, localized IP Multicast over multi-gigabit ethernet will eventually supplant HDMI as the primary transport for audio/video streams between devices (sources such as BD players, DVRs, computers, etc. and destinations such as receivers/amps, monitors, speaker drivers, etc.).
The only reason you got HDMI at all was because the content owners managed to get HDCP included. You won't get a replacement that doesn't do HDCP until we fix the sorry state of copyright in the US. So it's equivalent to asking if we're going to fix copyright within your lifetime... :)
-----Original Message----- From: Valdis.Kletnieks@vt.edu [mailto:Valdis.Kletnieks@vt.edu]
The only reason you got HDMI at all was because the content owners managed to get HDCP included. You won't get a replacement that doesn't do HDCP until we fix the sorry state of copyright in the US.
So it's equivalent to asking if we're going to fix copyright within your lifetime... :)
When the revolution comes, all will be fixed. -- Leigh
On Wed, Mar 14, 2012 at 04:39:21PM +0000, Leigh Porter wrote:
From: Valdis.Kletnieks@vt.edu [mailto:Valdis.Kletnieks@vt.edu]
The only reason you got HDMI at all was because the content owners managed to get HDCP included. You won't get a replacement that doesn't do HDCP until we fix the sorry state of copyright in the US.
So it's equivalent to asking if we're going to fix copyright within your lifetime... :)
When the revolution comes, all will be fixed.
Mmmmmhmmmmm. Yeah. But until then, it's equivalent to solving the halting problem. -- Mike Andrews, W5EGO mikea@mikea.ath.cx Tired old sysadmin
Mike Andrews <mikea@mikea.ath.cx> wrote:
On Wed, Mar 14, 2012 at 04:39:21PM +0000, Leigh Porter wrote:
From: Valdis.Kletnieks@vt.edu [mailto:Valdis.Kletnieks@vt.edu]
The only reason you got HDMI at all was because the content owners managed to get HDCP included. You won't get a replacement that doesn't do HDCP until we fix the sorry state of copyright in the US.
So it's equivalent to asking if we're going to fix copyright within your lifetime... :)
When the revolution comes, all will be fixed.
Mmmmmhmmmmm. Yeah. But until then, it's equivalent to solving the halting problem.
"Come the revolution, things will be different." Not necessarily -better-, but _different_. <wry grin>
On Mar 14, 2012, at 9:18 AM, <Valdis.Kletnieks@vt.edu> <Valdis.Kletnieks@vt.edu> wrote:
On Tue, 13 Mar 2012 20:13:41 PDT, Owen DeLong said:
I expect within my lifetime that multi-gigabit ethernet will become commonplace in the household LAN environment and that when that becomes reality, localized IP Multicast over multi-gigabit ethernet will eventually supplant HDMI as the primary transport for audio/video streams between devices (sources such as BD players, DVRs, computers, etc. and destinations such as receivers/amps, monitors, speaker drivers, etc.).
The only reason you got HDMI at all was because the content owners managed to get HDCP included. You won't get a replacement that doesn't do HDCP until we fix the sorry state of copyright in the US.
So it's equivalent to asking if we're going to fix copyright within your lifetime... :)
I fully expect them to develop an HDCP-or-equivalent enabled protocol to run over IP Multicast. Do you have any reason to believe that won't happen? Owen
On Wed, Mar 14, 2012 at 1:45 PM, Owen DeLong <owen@delong.com> wrote:
I fully expect them to develop an HDCP-or-equivalent enabled protocol to run over IP Multicast.
Do you have any reason to believe that won't happen?
Owen
I'm pretty sure it's already in place for IPTV solutions.
2012/3/13 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
William Herrin wrote:
http://bill.herrin.us/network/bgpcost.html
If you believe there's an error in my methodology, feel free to take issue with it.
Your estimate on the number of routers in DFZ:
somewhere between 120,000 and 180,000 with the consensus number near 150,000
is a result of the high cost of routers and is inappropriate for estimating the global cost of a routing table entry.
Hi, Please elaborate. In what way is the average cost of routers carrying the DFZ table an inappropriate variable in estimating the cost of the routing system?
Because DFZ-capable routers are so expensive, the actual number of routers is limited.
If the number of routes in the DFZ is, say, 100, many routers and hosts will be default free.
If wishes were horses, beggars would ride. The number of routes in the DFZ isn't 100 and is trending north, not south.
Often overlooked is that multihoming through multi-addressing could solve IP mobility too.
Not.
What is often overlooked is the fact that they are orthogonal problems.
I respectfully disagree.
My statement is based on my experience implementing a locator/ID separation system with multi-address TCP and IP mobility.
They need separate mechanisms and separate coding.
I've been an IRTF RRG participant and in my day job I build backend systems for mobile messaging devices used in some very challenging and very global IP and non-IP environments. If we're done touting our respective qualifications to hold an opinion, let's get back to vetting the idea itself.
Current mobility efforts have gone down a blind alley of relays from a home server and handoffs from one network to the next. And in all fairness, with TCP tightly bound to a particular IP address pair there aren't a whole lot of other options. Nevertheless, it's badly suboptimal. Latency and routing inefficiency rapidly increase with distance from the home node, among other major problems.
That is a mobility issue of triangle elimination, which has nothing to do with TCP.
Au contraire. Triangle elimination is a problem because the IP address can't change while the session survives. But that's because TCP and UDP require it. If A follows from B and B follows from C then A follows from C: TCP is at fault.
But suppose you had a TCP protocol that wasn't statically bound to the IP address by the application layer. Suppose each side of the connection referenced each other by name, TCP expected to spread packets across multiple local and remote addresses, and suppose TCP, down at layer 4, expected to generate calls to the DNS any time it wasn't sure what addresses it should be talking to.
Ignoring the fact that DNS does not work so fast, TCP becomes unsure "what addresses it should be talking to" only after a long timeout.
Says who? Our hypothetical TCP can become "unsure" as soon as the first retransmission if we want it to. It can even become unsure when handed a packet to send after a long delay with no traffic. There's little delay kicking off the recheck either way.
And if the node gets even moderately good at predicting when it will lose availability for each network it connects to and/or when to ask the DNS again instead of continuing to try the known IP addresses you can
What? A node asks the DNS for the IP addresses of its peer because the node is changing its own IP addresses?
A re-verify by name lookup kicks off in a side thread any time the target threshold for a certainty heuristic is hit. Inputs into that heuristic include things like the TTL expiration of the prior lookup, the time since successful communication with the peer and the time spent retrying since the last successful communication with the peer. If you have any communication with the peer on any address pair, he can tell you what addresses should still be on your try-me list. If there's a reasonable chance that you've lost communication with the peer, then you ask the DNS server for the peer's latest information.
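A minimal sketch of what such a certainty heuristic might look like (Python; every name, weight, and threshold here is invented for illustration, including the spawn_dns_lookup hook):

    import time

    def peer_certainty(now, ttl_expiry, last_ok, retrying_since):
        # Start fully confident and subtract for each staleness signal
        # described above: an expired DNS TTL, silence from the peer,
        # and time spent retransmitting without an answer.
        score = 1.0
        if now > ttl_expiry:
            score -= 0.4                               # prior lookup has gone stale
        score -= min(0.3, (now - last_ok) / 120.0)     # idle time erodes confidence
        if retrying_since is not None:
            score -= min(0.5, (now - retrying_since) / 5.0)  # unanswered retries
        return max(score, 0.0)

    def maybe_reverify(conn, threshold=0.5):
        # Kick off the DNS re-query in a side thread as soon as the
        # heuristic crosses the threshold; never stall the data path.
        now = time.time()
        c = peer_certainty(now, conn.ttl_expiry, conn.last_ok, conn.retrying_since)
        if c < threshold:
            conn.spawn_dns_lookup()    # hypothetical async re-resolution hook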
The only end to end way to handle multiple addresses is to let applications handle them explicitly.
For connection-oriented protocols, that's nonsense. Pick an appropriate mapping function and you can handle multiple layer 3 addresses just fine at layer 4.
It will require that the applications perform a reverse mapping function when they require raw IP addresses.
No. The application passes the IP address in a string the same way it passes a name. The layer 4 protocol figures out how it's going to map that to a name. It could do a reverse mapping. It could connect to the raw IP and request that the peer provide a name. There are several other strategies which could be used independently or as a group. But you avoid using them at the application level. Keep that operation under layer 4's control.
For connectionless protocols, maybe.
I'm afraid you are unaware of connected UDP.
Your fears are unfounded.
However, I'm not convinced that can't be reliably accomplished with a hinting process where the application tells layer 4 its best guess of which send()'s succeeded or failed and lets layer 4 figure out the resulting gory details of address selection.
It's a lot easier for UDP-based applications to directly manage multiple IP addresses.
I'll say it's fair to call that correct until disproven. However, it's worth noting that UDP is used to implement a lot of protocols which are connection-oriented but have characteristics (such as error tolerance and timely delivery requirements) which are inconsistent with TCP. Multiple address management for such protocols could almost certainly be handled below the application level, same as with other connection-oriented protocols. Regardless of the above, it might actually be worth defining a streaming data protocol to operate in parallel with UDP and TCP instead of being loaded on top of UDP. We probably know enough now. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
William Herrin wrote:
I've been an IRTF RRG participant and in my day job I build backend systems for mobile messaging devices used in some very challenging and very global IP and non-IP environments.
I know the non-IP mobile environment is heavily encumbered. So, I can understand why you insist on using DNS for mobility, only to make IP mobility as encumbered as the non-IP ones.
Au contraire. Triangle elimination is a problem because the IP address can't change while the session survives. But that's because TCP and UDP require it. If A follows from B and B follows from C then A follows from C: TCP is at fault.
If a correspondent host CH sends packets to a mobile host MH, they may be tunneled by a home agent HA or, with triangle elimination, tunneled by CH itself. In both cases, the IP addresses of the internal packets within the tunnels are those of CH and MH's home address, which TCP handles just normally.
Ignoring the fact that DNS does not work so fast, TCP becomes unsure "what addresses it should be talking to" only after a long timeout.
Says who? Our hypothetical TCP can become "unsure" as soon as the first retransmission if we want it to. It can even become unsure when handed a packet to send after a long delay with no traffic. There's little delay kicking off the recheck either way.
That may be an encumbered way of doing things in non-IP, or bell headed, mobile systems, where 0.05 seconds of voice loss is acceptable but 0.2 seconds of voice loss is significant. However, on the Internet, 0.05 seconds of packet loss can be significant, and things work end to end. In this case, your peer, a mobile host, is the proper end, because it is sure when it has lost or is losing a link. Then, the end establishes a new link with a new IP and initiates update messages for triangle elimination at the proper time, without unnecessary checking. According to the end to end argument of Saltzer et al.: The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. The mobility module of the mobile host has "the knowledge" of the proper timing to update triangle elimination, "the function in question". Masataka Ohta
2012/3/15 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
William Herrin wrote:
I've been an IRTF RRG participant and in my day job I build backend systems for mobile messaging devices used in some very challenging and very global IP and non-IP environments.
I know the non-IP mobile environment is heavily encumbered. So, I can understand why you insist on using DNS for mobility, only to make IP mobility as encumbered as the non-IP ones.
I don't understand your statement. None of the technologies I work with use the word "encumbered" in a comparable context. Perhaps you could rephrase?
Ignoring the fact that DNS does not work so fast, TCP becomes unsure "what addresses it should be talking to" only after a long timeout.
Says who? Our hypothetical TCP can become "unsure" as soon as the first retransmission if we want it to. It can even become unsure when handed a packet to send after a long delay with no traffic. There's little delay kicking off the recheck either way.
That may be an encumbered way of doing things in non-IP, or bell headed, mobile systems, where 0.05 seconds of voice loss is acceptable but 0.2 seconds of voice loss is significant.
However, on the Internet, 0.05 seconds of packet loss can be significant, and things work end to end.
Get real. Even EAPS takes 0.05 seconds to recover from an unexpected link failure and that's on a trivial Ethernet ring where knowledge propagation is far less complex than a mobile environment. For expected link failures, you can't get any fewer than zero packets lost, which is exactly what my add/drop approach delivers.
In this case, your peer, a mobile host, is the proper end, because it is sure when it has lost or is losing a link.
Correct, but...
Then, the end establishes a new link with a new IP and initiates update messages for triangle elimination at the proper time, without unnecessary checking.
This is where the strategy falls apart every time. You know when your address set changes but you don't know the destination endpoint's instant address set unless either (1) he tells you or (2) he tells a 3rd party which you know to ask. Your set and his set are both in motion so there _will_ be times when your address set changes before he can tell you the changes for his set. Hence #1 alone is an _incomplete_ solution. It was incomplete in SCTP, it was incomplete in Shim6 and it'll be incomplete in MPTCP as well. And oh-by-the-way, if you want to avoid being chatty on every idle connection every time an address set changes and you want either endpoint to be able to reacquire the other when it next has data to send then the probability your destination endpoint has lost all the IP addresses you know about goes way up. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
William Herrin wrote:
I know the non-IP mobile environment is heavily encumbered. So, I can understand why you insist on using DNS for mobility, only to make IP mobility as encumbered as the non-IP ones.
I don't understand your statement. None of the technologies I work with use the word "encumbered" in a comparable context. Perhaps you could rephrase?
OK. You are bell headed.
However, on the Internet, 0.05 seconds of packet loss can be significant, and things work end to end.
Get real. Even EAPS takes 0.05 seconds to recover from an unexpected link failure
If you keep two or more links, keep them alive, and let them know each other's IP addresses, which can be coordinated by the mobile hosts as the ends, the links can cooperate to avoid broken links for much faster recovery than 0.05s.
and that's on a trivial Ethernet ring where knowledge propagation is far less complex than a mobile environment.
The previous statement of mine merely assumes radio links with sudden link failure caused by, say, fading. It is spatial diversity arranged by the mobile hosts as the ends. If link failure is expected several seconds in advance, which is usual with radio links, mobile hosts can smoothly migrate to a new link without any packet losses, because there is much time to resend possibly lost control packets.
In this case, your peer, a mobile host, is the proper end, because it is sure when it has lost or is losing a link.
Correct, but...
Then, the end establishes a new link with a new IP and initiates update messages for triangle elimination at the proper time, without unnecessary checking.
This is where the strategy falls apart every time. You know when your address set changes but you don't know the destination endpoint's instant address set unless
Certainly, if two communicating mobile hosts, two ends, change their IP addresses simultaneously, they cannot communicate *DIRECTLY* with each other, because they cannot receive the new IP addresses of their peers. The proper end for the issue is the home agent. Just send triangle elimination messages to the home agent, without triangle elimination. With the new layer of indirection through the home agent, control messages for triangle elimination are sent reliably (though best effort). The home agent knows the reachable foreign addresses of the mobile hosts, as long as the mobile hosts can tell the home agent their new foreign addresses before they entirely lose their old links.
either (1) he tells you or (2) he tells a 3rd party which you know to ask.
(3) he tells his home agent, his first party, which you, his second party, ask for packet forwarding. Unlike DNS servers, the first party is responsible for its home agent.
Your set and his set are both in motion so there _will_ be times when your address set changes before he can tell you the changes for his set. Hence #1 alone is an _incomplete_ solution.
A difficulty in understanding the end to end principle is properly recognizing the ends. Here, you failed to recognize home agents as the essential ends that support reliable communication to mobile hosts.
It was incomplete in SCTP, it was incomplete in Shim6 and it'll be incomplete in MPTCP as well.
It is complete though shim6 is utterly incomplete.
And oh-by-the-way, if you want to avoid being chatty on every idle connection every time an address set changes and you want either endpoint to be able to reacquire the other when it next has data to send then the probability your destination endpoint has lost all the IP addresses you know about goes way up.
Idle connections may have timeouts for triangle elimination, after which they use the home agents of their peers. That's how the end to end Internet works, without any packet losses not caused by congestion or unexpected sudden link failures. Masataka Ohta
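As a toy illustration of the home-agent indirection described in the message above (Python; class and message names invented, the only property relied on is that the home agent's own address never changes):

    class HomeAgent:
        """Immobile rendezvous point; its own address never changes."""
        def __init__(self):
            self.care_of = None               # MH's current foreign address

        def register(self, addr):
            # MH re-registers before its old link is entirely lost.
            self.care_of = addr

        def forward(self, packet):
            # A CH with stale state falls back to tunneling via the HA.
            return ("tunnel", self.care_of, packet)

    class MobileHost:
        def __init__(self, ha):
            self.ha = ha
            self.addr = None

        def move(self, new_addr):
            self.addr = new_addr
            self.ha.register(new_addr)        # always deliverable: HA is immobile

    ha = HomeAgent()
    mh = MobileHost(ha)
    mh.move("2001:db8:1::5")                  # first attachment
    mh.move("2001:db8:2::7")                  # handoff
    print(ha.forward("payload"))              # ('tunnel', '2001:db8:2::7', 'payload')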
2012/3/15 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
William Herrin wrote:
I know the non-IP mobile environment is heavily encumbered. So, I can understand why you insist on using DNS for mobility, only to make IP mobility as encumbered as the non-IP ones.
I don't understand your statement. None of the technologies I work with use the word "encumbered" in a comparable context. Perhaps you could rephrase?
OK. You are bell headed.
If you want to be snippy in English, you should first gain a better command of the language. Neither of your previous statements has a meaning recognized beyond the confines of your own brain.
Your set and his set are both in motion so there _will_ be times when your address set changes before he can tell you the changes for his set. Hence #1 alone is an _incomplete_ solution.
A difficulty in understanding the end to end principle is properly recognizing the ends.
Here, you failed to recognize home agents as the essential ends that support reliable communication to mobile hosts.
A device which relays IP packets is not an endpoint, it's a router. It may or may not be a worthy part of a network architecture but it is unambiguously not an endpoint. If that isn't clear to you then don't presume to lecture me about the end to end principle. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
On Thu, 15 Mar 2012 13:31:42 EDT, William Herrin said:
2012/3/15 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
OK. You are bell headed.
If you want to be snippy in English, you should first gain a better command of the language. Neither of your previous statements has a meaning recognized beyond the confines of your own brain.
http://www.pcmag.com/encyclopedia_term/0,2542,t=Bellhead&i=38536,00.asp I don't think the term means what Masataka thinks it means, because nobody in this discussion is talking in terms of circuits rather than packet routing.
I don't think the term means what Masataka thinks it means, because nobody in this discussion is talking in terms of circuits rather than packet routing.
Geographical addressing can tend towards "bellhead thinking", in the sense that it assumes a small number (one?) of suppliers servicing all end users in a geographical area, low mobility, higher traffic volumes towards other end-users in the same or a close geography, relative willingness to renumber when a permanent change of location does occur, and simple, tightly defined interconnects where these single-suppliers can connect to the neighbouring single-supplier and their block of geography. I'm not sure he's right, but I think I understand what he's getting at. Regards, Tim.
William Herrin wrote:
A difficulty in understanding the end to end principle is properly recognizing the ends.
Here, you failed to recognize home agents as the essential ends that support reliable communication to mobile hosts.
A device which relays IP packets is not an endpoint, it's a router.
If you want to call something which may not participate in routing protocol exchanges a router, that's fine; it's your terminology. But, as long as the HA has "the knowledge" obtained through control packet exchanges with the MH, it is the end that can give "the help" to make mobile IP correct and complete.
It may or may not be a worthy part of a network architecture but it is unambiguously not an endpoint.
Even ordinary routers are ends w.r.t. routing protocols, though they also behave as intermediate systems to other routers. As LS requires less intelligence than DV, it converges faster.
If that isn't clear to you then don't presume to lecture me about the end to end principle.
Here is an exercise for you insisting on DNS, an intermediate system. What if DNS servers, including root ones, are mobile? Masataka Ohta
On Fri, 16 Mar 2012 08:31:07 +0900, Masataka Ohta said:
Here is an exercise for you insisting on DNS, an intermediate system.
What if DNS servers, including root ones, are mobile?
So, is this question more like "What if computers worked in trinary?" or "What if people show criminal negligence in misdesigning their networks?" You're asking a "what if" for a usage case that nobody sane has suggested.
Valdis.Kletnieks@vt.edu wrote:
You're asking a "what if" for a usage case that nobody sane has suggested.
If you are saying it's insane to use DNS to manage frequently changing locations of mobile hosts instead of relying on immobile home agents, I fully agree with you. Masataka Ohta
2012/3/15 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
Valdis.Kletnieks@vt.edu wrote:
You're asking a "what if" for a usage case that nobody sane has suggested.
If you are saying it's insane to use DNS to manage frequently changing locations of mobile hosts instead of relying on immobile home agents, I fully agree with you. Masataka Ohta
Non sequitur. Mobile root DNS servers are what is insane, because the queries must terminate somewhere, and ultimately there must be something that doesn't have a circular dependency -- requiring working DNS to get DNS. As for using DNS to manage frequently changing locations of mobile hosts, DNS is almost perfect for that -- it's just the sort of thing DNS is designed for, depending on what you mean by "frequently changing". -- -JH
Jimmy Hess wrote:
If you are saying it's insane to use DNS to manage frequently changing locations of mobile hosts instead of relying on immobile home agents, I fully agree with you.
and ultimately there must be something that doesn't have a circular dependency
It means there is no reason to reject having immobile home agents.
As for using DNS to manage frequently changing locations of mobile hosts, DNS is almost perfect for that -- it's just the sort of thing DNS is designed for,
Not at all, because a DNS client has no idea when its peer, a mobile host, changes its location.
depending on what you mean by "frequently changing".
Assuming a mobile host changes base stations every 5 seconds (moving at 144km/h with cell diameters of 200m) and old base stations are still usable within 1 second after the changes, a DNS client must check DNS servers every 0.5 second (reserving 0.5 second for RTT and a possible retry). Can you still say it is almost perfect? Masataka Ohta
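Working those numbers through (all figures from the paragraph above):

    speed = 144 * 1000 / 3600    # 144 km/h = 40 m/s
    cell = 200                   # cell diameter, metres
    dwell = cell / speed         # 5 s between base-station changes
    grace = 1.0                  # old base station stays usable this long, seconds
    poll = grace - 0.5           # reserve 0.5 s for RTT and one retry
    print(dwell, poll)           # 5.0 s dwell, DNS re-checked every 0.5 s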
On Fri, 16 Mar 2012 09:29:44 +0900, Masataka Ohta said:
Valdis.Kletnieks@vt.edu wrote:
You're asking a "what if" for a usage case that nobody sane has suggested.
If you are saying it's insane to use DNS to manage frequently changing locations of mobile hosts instead of relying on immobile home agents, I fully agree with you.
I'm specifically saying that "what if the root servers are mobile?" is a stupid question, because nobody sane has proposed that they be mobile. Hope that makes it clearer for you.
2012/3/15 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
Even ordinary routers are ends w.r.t. routing protocols, though they also behave as intermediate systems to other routers.
As LS requires less intelligence than DV, it converges faster.
I do believe that's the first time I've heard anybody suggest that a link state routing protocol requires "less intelligence" than a distance vector protocol.
If that isn't clear to you then don't presume to lecture me about the end to end principle.
Here is an exercise for you insisting on DNS, an intermediate system.
What if DNS servers, including root ones, are mobile?
DNS' basic bootstrapping issues don't change, nor do the solutions. The resolvers find the roots via a set of static well-known layer 3 addresses which, more and more these days, are actually anycast destinations matching diverse pieces of equipment. It makes no particular sense to enhance their mobility beyond this level. Before you jump up and down and yell "Ah ha!" realize that this is true of a mapping function at any level of the stack. ARP doesn't work without knowing the layer 2 broadcast address and IPv6's ND doesn't work without knowing a static set of multicast addresses.

Below the roots, the authoritative zone servers are no different than any other node. If you're willing to tolerate a lowered TTL for your NS server's A and AAAA records when its IP address changes, and your parent zone is willing to tolerate dynamic updates for any glue, then you can make DNS updates to the parent zone like any other mobile node.

The clients find the recursing resolvers via whatever process assigns the client's IP address, e.g. DHCP or PPP. If it is for some reason useful for the server's base address to change then assign a set of VIPs to the DNS service and route them at layer 3. On the other side of the wall, the recursing resolvers don't particularly care about their source addresses for originating queries to the authoritative servers and will move to the newly favored address with nary a hitch.

If you want an actually hard question, try this one: what do you do when fewer than all of the authoritative DNS servers for your node's name are available to receive an update? What do you do when those servers suffer a split brain where each is reachable to some clients but they can't talk to each other? How do you stop bog standard outages from escalating into major network partitions?

For that matter, how do you solve the problem with your home agent approach? Is it even capable of having multiple home agents active for each node? How do you keep them in sync?

Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
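The below-the-roots update described here is plain RFC 2136 dynamic update; a minimal sketch with the dnspython library, where the zone, key, and server values are all invented placeholders:

    import dns.update
    import dns.query
    import dns.tsigkeyring

    # TSIG key shared with the authoritative server (hypothetical values).
    keyring = dns.tsigkeyring.from_text({"mh-key.": "c2VjcmV0c2VjcmV0c2VjcmV0"})

    # Replace the mobile node's AAAA record with a short TTL so stale
    # addresses age out quickly after a move.
    update = dns.update.Update("example.net", keyring=keyring, keyname="mh-key.")
    update.replace("mh1", 30, "AAAA", "2001:db8:2::7")

    response = dns.query.tcp(update, "192.0.2.53", timeout=5)
    print(response.rcode())   # 0 (NOERROR) on success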
William Herrin wrote:
As LS requires less intelligence than DV, it converges faster.
I do believe that's the first time I've heard anybody suggest that a link state routing protocol requires "less intelligence" than a distance vector protocol.
I mean "intelligence as intermediate systems". DV is a distributed computation by intelligent intermediate systems, whereas, with LS, intermediate systems just flood and computation is by each end.
Here is an exercise for you insisting on DNS, an intermediate system.
What if DNS servers, including root ones, are mobile?
DNS' basic bootstrapping issues don't change, nor do the solutions.
The resolvers find the roots via a set of static well-known layer 3 addresses
You failed to deny that the MH knows the layer 3 address of its private HA. It's a waste of resources for the MH to have the well-known IP addresses of the root servers, the domain name of its private DNS server, and security keys for dynamic update, only to avoid knowing the IP address of its private HA.
For that matter, how do you solve the problem with your home agent approach? Is it even capable of having multiple home agents active for each node? How do you keep them in sync?
I actually designed and implemented such a system. Multiple home agents may each have multiple addresses. If some address of an HA does not work, the MH tries other addresses of the HA. If some HA cannot communicate with the MH, the CH may try to use another HA. There is nothing mobility-specific. Mobile protocols are modified just as other protocols are modified for multiple addresses. In practice, however, handling multiple addresses is not very useful, because selection of the best working address is time consuming unless hosts have default-free routing tables. Masataka Ohta
2012/3/16 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>:
William Herrin wrote:
As LS requires less intelligence than DV, it converges faster.
I do believe that's the first time I've heard anybody suggest that a link state routing protocol requires "less intelligence" than a distance vector protocol.
I mean "intelligence as intermediate systems".
DV is a distributed computation by intelligent intermediate systems, whereas, with LS, intermediate systems just flood and computation is by each end.
That's basically wrong. Both systems perform computation on each router. Link State performs much more complex computation to arrive at its export to the forwarding information base. In fact, Distance Vector's calculation is downright trivial in comparison. The difference is that Link State shares the original knowledge, which it can do before recomputing its own tables. Distance Vector recomputes its own state first and then shares each router's state with the neighbors rather than sharing the original knowledge. The result is that the knowledge propagates faster with Link State and each router recomputes only once for each change. In some cases, distance vector will have to recompute several times before the system settles into a new stable state, delaying the process even further.
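To make the contrast concrete, a toy sketch of the two computations over the same graph (Python, adjacency lists of (neighbor, cost) pairs; Dijkstra stands in for a link-state SPF run over a flooded topology, and repeated relaxation stands in for distance vector's iterate-until-stable behavior):

    import heapq

    def dijkstra(graph, source):
        # Link-state style: this node has the full flooded topology and
        # runs shortest-path-first locally, once per change.
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                      # stale heap entry
            for v, w in graph.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        return dist

    def bellman_ford(graph, source):
        # Distance-vector style: relax every edge repeatedly, mirroring
        # routers iterating on neighbors' vectors until nothing changes.
        nodes = set(graph) | {v for es in graph.values() for v, _ in es}
        dist = {n: float("inf") for n in nodes}
        dist[source] = 0
        for _ in range(len(nodes) - 1):
            for u, edges in graph.items():
                for v, w in edges:
                    if dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
        return dist

    g = {"A": [("B", 1), ("C", 4)], "B": [("C", 1)], "C": []}
    print(dijkstra(g, "A"))       # {'A': 0, 'B': 1, 'C': 2}
    print(bellman_ford(g, "A"))   # same answer, more relaxation passes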
Here is an exercise for you insisting on DNS, an intermediate system.
What if DNS servers, including root ones, are mobile?
DNS' basic bootstrapping issues don't change, nor do the solutions.
The resolvers find the roots via a set of static well-known layer 3 addresses
You failed to deny that the MH knows the layer 3 address of its private HA.
Here's a tip for effective written communication: the first time in any document that you use an abbreviation that isn't well known, spell it out.
It's a waste of resources for the MH to have the well-known IP addresses of the root servers, the domain name of its private DNS server, and security keys for dynamic update, only to avoid knowing the IP address of its private HA.
There's no reason for the Mobile Host to know the IP addresses of the root servers. Like any other host, including MH in your plan, it already knows its domain name and the IP addresses of its private DNS servers. That leaves only the security key. So, by your own accounting I swap knowledge of a topology-independent element (the security key) for a topology-dependent element (an IP address) which may change any time you adjust your home agent's required-to-be-landed network with all of today's vagaries around the renumbering problem.
For that matter, how do you solve the problem with your home agent approach? Is it even capable of having multiple home agents active for each node? How do you keep them in sync?
I actually designed and implemented such a system. Multiple home agents may each have multiple addresses.
If some address of an HA does not work, the MH tries other addresses of the HA.
If some HA cannot communicate with the MH, the CH may try to use another HA.
There is nothing mobility-specific. Mobile protocols are modified just as other protocols are modified for multiple addresses.
In practice, however, handling multiple addresses is not very useful, because selection of the best working address is time consuming unless hosts have default-free routing tables.
In your home agent architecture, it doesn't matter if they can have multiple addresses. It matters if they can have the same address. Otherwise you're pushing off the generalized continuity-of-operations problem, one which my DNS add/drop approach handles seamlessly, at the granularity of individual services on the mobile host. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
William Herrin wrote:
DV is a distributed computation by intelligent intermediate systems, whereas, with LS, intermediate systems just flood and computation is by each end.
That's basically wrong.
Please, don't demonstrate your lack of basic knowledge.
Both systems perform computation on each router.
The difference, as you can see in the above sentence of mine, is whether the computation is done as an intermediate system or an end.
Link State performs much more complex computation to arrive at its export to the forwarding information base. In fact, Distance Vector's calculation is downright trivial in comparison.
FYI, DV uses Bellman-Ford while LS can use Dijkstra, which is faster than Bellman-Ford.

http://en.wikipedia.org/wiki/Routing
  Distance vector algorithms use the Bellman-Ford algorithm.

http://en.wikipedia.org/wiki/Bellman-Ford
  The Bellman–Ford algorithm computes single-source shortest paths in a weighted digraph. For graphs with only non-negative edge weights, the faster Dijkstra's algorithm also solves the problem.

should help you a lot toward having the basic knowledge.
The difference is that Link State shares the original knowledge, which it can do before recomputing its own tables. Distance Vector recomputes its own state first and then shares each router's state with the neighbors rather than sharing the original knowledge. The result is that the knowledge propagates faster with Link State and each router recomputes only once for each change. In some cases, distance vector will have to recompute several times before the system settles into a new stable state, delaying the process even further.
That is implied in my statements. So, don't repeat it in such a verbose way only to reduce clarity.
You failed to deny that the MH knows the layer 3 address of its private HA.
Here's a tip for effective written communication: the first time in any document that you use an abbreviation that isn't well known, spell it out.
In this case, the document is the thread. And a tip for you: remember the past mails in a thread before sending mail to the thread.
Like any other host, including MH in your plan, it already knows its domain name and the IP addresses of its private DNS servers.
And, to deny the HA, your assumption must be that the private DNS servers may be mobile.
In your home agent architecture, it doesn't matter if they can have multiple addresses. It matters if they can have the same address.
That's a totally insane operation. There is no reason to have an anycast HA only to bloat the global routing table. Masataka Ohta
On Thu, 15 Mar 2012 21:52:54 +0900, Masataka Ohta said:
Get real. Even EAPS takes 0.05 seconds to recover from an unexpected link failure
If you keep two or more links, keep them alive, and let them know each other's IP addresses, which can be coordinated by the mobile hosts as the ends, the links can cooperate to avoid broken links for much faster recovery than 0.05s.
May work for detecting a dead access point in a wireless mesh, but it doesn't scale to WAN sized connections. Standard systems control theory tells us that you can't control a system in less than 2*RTT across the network. There's *plenty* of network paths where endpoint-homebase-endpoint will be over 50ms. Consider the case where one endpoint is in Austria, the other is in Boston, and the node handling the mobility is in Japan. Now a router fails in Seattle. How long will it take for the endpoints to notice? (Alternatively, explain how you locate a suitable home base node closer than Japan. Remember in your explanation to consider that you may not have a business relationship with the carrier that would be an optimum location)
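A rough sense of that floor, with invented but plausible one-way delays for the Austria/Boston/Japan scenario:

    # One-way delays in seconds; illustrative figures only.
    austria_to_japan = 0.135
    japan_to_boston = 0.090

    # The control loop runs endpoint -> mobility node -> endpoint, so the
    # relevant RTT covers both legs, there and back.
    rtt = 2 * (austria_to_japan + japan_to_boston)   # 0.45 s
    reaction_floor = 2 * rtt                         # 0.9 s per the 2*RTT rule
    print(rtt, reaction_floor)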
Valdis.Kletnieks@vt.edu wrote:
If you keep two or more links, keep them alive, and let them know each other's IP addresses, which can be coordinated by the mobile hosts as the ends, the links can cooperate to avoid broken links for much faster recovery than 0.05s.
May work for detecting a dead access point in a wireless mesh,
That's not my point. My point is to avoid dead links. Base stations try sending packets to an MH and, if that fails a few times, they forward the packets to other base stations which may have live links to the MH.
but it doesn't scale to WAN sized connections.
Regardless of whether links are wireless or wired, the coordination is necessary only within the (small number of) links to which the MH, the end, is attached, which means the coordination is local if it is coordinated by the end. There is no WAN involved in the coordination.
Consider the case where one endpoint is in Austria, the other is in Boston, and the node handling the mobility is in Japan. Now a router fails in Seattle. How long will it take for the endpoints to notice?
Huh?
(Alternatively, explain how you locate a suitable home base node
No home agent is involved in the recovery. Masataka Ohta
On 13/03/2012, at 2:31 AM, Leo Bicknell wrote:
In a message written on Mon, Mar 12, 2012 at 11:07:54AM -0400, Robert E. Seastrom wrote:
Grass-roots, bottom-up policy process + Need for multihoming + Got tired of waiting = IPv6 PI
I'll also add that the Shim6 folks never made a good economic argument. It's true that having routes in the DFZ costs money, and that reducing the number of routes will save the industry money in router upgrades and such to handle more routes. However, it's also true that deploying SHIM6 (or similar solutions) also has a cost in rewritten software, training for network engineers and administrators, and so on.
It was never clear to me that even if it worked 100% as advertised that it would be cheaper / better in the global sense.
I think that's asking too much of the IETF, Leo - Shim6 went through much the same process as most of the IETF work these days: bubble of thought, BOF sanity check, requirements work, protocol prototyping, technology specification. Yes, the economics of routing are strange, and the lack of any real strictures in the routing tables is testament to the observation that despite more than two decades of tossing the idea around we've yet to find the equivalent of a "route deaggregation tax" or a "route advertisement tax" or any other mechanism that effectively turns the routing space into a form of market that imposes some economic constraints on the activity. So after so long looking for such a framework in routing, the hope that someday we will figure it out gets smaller and smaller every day. And in some ways the routing explosion problem is one of fear rather than actuality - the growth rates of the IPv4 routing table have been sitting at around 8% - 15% p.a. for many years. While you can't route the Internet on 15-year-old hardware, the growth figures are still low enough under Moore's Law that the unit cost of routing is not escalating at levels that are notably higher than other cost elements for an ISP. It's not the routing table explosion that will cause you to raise your fees or, worse, go bankrupt tomorrow. So in some ways for Shim6 to have a "good economic argument" I suspect that Shim6 would have to have pulled out of thin air an approach that completely externalised the cost of routing, and made routing completely free for ISPs. And that is simply fantasy land! Geoff
In a message written on Tue, Mar 13, 2012 at 02:19:00PM +1100, Geoff Huston wrote:
On 13/03/2012, at 2:31 AM, Leo Bicknell wrote:
It was never clear to me that even if it worked 100% as advertised that it would be cheaper / better in the global sense.
I think that's asking too much of the IETF, Leo - Shim6 went through much the same process as most of the IETF work these days: bubble of thought, BOF sanity check, requirements work, protocol prototyping, technology specification.
I think you took my statement a bit too literally, as if I wanted a proof that shim6 would be cheaper than building larger routers. That would be asking way too much. However, shim6 for me never even passed the theoretical smell test economically.

To make routers handle more DFZ routes basically means putting more memory in routers. It may be super fancy super expensive fast TCAM to handle the job, but at the end of the day it's pretty much just more memory, which means more money. There's a wild range of estimates as to how many DFZ routers there are out there, but it seems like the low end is 50,000 and the high end is 500,000. A lot of RAM and a lot of money for sure, but as far as we can tell a tractable problem even with a growth rate much higher than we have now.

Compare and contrast with shim6, even if you assume it does everything it was billed to do. First, it assumes we migrate everyone to IPv6, because it's not an IPv4 solution. Second, it assumes we update, well, basically every single device with an IP stack. I'm guessing we're north of 5 billion IP devices in the world, and wouldn't be surprised if the number isn't more like 10 billion. Third, because it is a software solution, it will have to be patched/maintained/ported _forever_.

I'm hard pressed in my head to rationalize how maintaining software for the next 50 years on a few billion or so boxes is cheaper in the global sense than adding memory to perhaps half a million routers. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On Tue, Mar 13, 2012 at 9:48 AM, Leo Bicknell <bicknell@ufp.org> wrote:
I'm hard pressed in my head to rationalize how maintaining software for the next 50 years on a few billion or so boxes is cheaper in the global sense than adding memory to perhaps half a million routers.
For a one-order-of-magnitude increase in "routes" (upper bound of $30B/year the BGP way) it may or may not be. For a four-order increase ($30T/year) it's self-evidently cheaper to change software on the billion or so boxes. How many "routes" would a system improvement that radically reduced the cost per route add? Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
It's _WAY_ more than a billion boxes at this point. Owen On Mar 13, 2012, at 10:27 AM, William Herrin wrote:
On Tue, Mar 13, 2012 at 9:48 AM, Leo Bicknell <bicknell@ufp.org> wrote:
I'm hard pressed in my head to rationalize how maintaining software for the next 50 years on a few billion or so boxes is cheaper in the global sense than adding memory to perhaps half a million routers.
For a one-order-of-magnitude increase in "routes" (upper bound of $30B/year the BGP way) it may or may not be. For a four-order increase ($30T/year) it's self-evidently cheaper to change software on the billion or so boxes. How many "routes" would a system improvement that radically reduced the cost per route add?
Regards, Bill Herrin
-- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
Yes, the economics of routing are strange, and the lack of any real strictures in the routing tables is testament to the observation that despite more than two decades of tossing the idea around we've yet to find the equivalent of a "route deaggregation tax" or a "route advertisement tax" or any other mechanism that effectively turns the routing space into a form of market that imposes some economic constraints on the activity.
among other things, i suspect that the shadow of telco settlements makes us shy away from this. randy
On 14/03/2012, at 9:16 AM, Randy Bush wrote:
Yes, the economics of routing are strange, and the lack of any real strictures in the routing tables is testament to the observation that despite more than two decades of tossing the idea around we've yet to find the equivalent of a "route deaggregation tax" or a "route advertisement tax" or any other mechanism that effectively turns the routing space into a form of market that imposes some economic constraints on the activity.
among other things, i suspect that the shadow of telco settlements makes us shy away from this.
Agreed. It's all ugly! The shadow of telco settlement nonsense, the entire issue of route pull vs route push, and the spectre of any such payments morphing into a coerced money flow towards the so-called tier 1 networks all make this untenable. The topic has been coming up pretty regularly every 2 years since about 1994 to my knowledge, and probably earlier, and has never managed to get anywhere useful. Geoff
Yes, the economics of routing are strange, and the lack of any real strictures in the routing tables is testament to the observation that despite more than two decades of tossing the idea around we've yet to find the equivalent of a "route deaggregation tax" or a "route advertisement tax" or any other mechanism that effectively turns the routing space into a form of market that imposes some economic constraints on the activity.
among other things, i suspect that the shadow of telco settlements makes us shy away from this.
Agreed. It's all ugly!
The shadow of telco settlement nonsense, the entire issue of route pull vs route push, and the spectre of any such payments morphing into a coerced money flow towards the so-called tier 1 networks all make this untenable.
The topic has been coming up pretty regularly every 2 years since about 1994 to my knowledge, and probably earlier, and has never managed to get anywhere useful.
so we are left with

o name and shame, and we have seen how unsuccessful that has been. the polluters have no shame.

o operational incentives. peers' and general routing filters were the classic dis-incentive to deaggregate. but the droids cave in the minute the geeks leave the room (ntt/verio caved within a month or two of my departure).

o router hacks. we have had tickets open for many years asking for knob variations on 'if it is covered (from same peer, from same origin, ...), drop it.'

none of which seem to move us forward. i guess the lesson is that, as long as we are well below moore, we just keep going down the slippery, and damned expensive, slope.

randy
In a message written on Wed, Mar 14, 2012 at 07:58:30AM +0900, Randy Bush wrote:
none of which seem to move us forward. i guess the lesson is that, as long as we are well below moore, we just keep going down the slippery, and damned expensive, slope.
Bill's model for price is too simple, because the number of devices with a full table changes as the price pressure changes, and that causes other costs. Quite simply, if a box that could take a full table were 10x cheaper, more people would take a full table at the edge. More full tables at the edge probably means more BGP speakers. More BGP speakers means more churn, and churn means the core device needs more CPU.

TL;DR A savings in RAM may result in an increased need for CPU, based on a change in user behavior.

I also think the difference in the BOM to a router vendor is small for most boxes. That is, the actual cost-to-manufacture difference between a 1M route box and a 2M route box is noise; on the high end the cost of 40 and 100G optics dominates, and on the low end in a CPU switching box RAM is super-cheap. The only "proof" I can offer is the _lack_ of vendors offering different route-holding profiles, and that the few that do are stuck in the mid-range equipment. If route memory were such a big factor you would see more vendors with route memory options. Indeed, the number of boxes with route-memory options has dropped over time, and I think this is due to the fact that memory prices have dropped _much_ faster than CPU or optic prices.

TL;DR Backbone routers are on a treadmill for faster interfaces, and memory is a small fraction of their cost; edge routers are on a treadmill for more CPU for edge features, and again RAM is a fraction of their cost. It's only boxes in the middle being squeezed.

I'll note Bill used the 6509/7600 platform, which is solidly in the middle and does have route-memory options (Sup720-3C vs. Sup720-3CXL). If my theory is right, he used pretty much the _worst_ case to arrive at his $8k per route figure. The list price difference between these two cards is $12,000 to go from 256,000 routes to 1,000,000 routes. $12,000 / ~750,000 routes = 1.6 cents per route per box. That matches Bill's number (and I think is where he got it): $8000 per route / 1.6 cents per route per box = 500,000 boxes. But that box has a 5-7 year life, so it's really more like (being generous) $1600 per route per year. Priced a 100 Gig optic lately, or a long haul DWDM system? I don't think the cost of routes is "damned expensive". -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
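Re-running that arithmetic (prices and route counts as quoted above; the five-year amortization is the "being generous" figure):

    upgrade_cost = 12_000            # Sup720-3C -> Sup720-3CXL, USD list price
    extra_routes = 750_000           # ~256k -> 1M route capacity, rounded
    per_route_per_box = upgrade_cost / extra_routes     # $0.016 = 1.6 cents
    boxes = 8_000 / per_route_per_box                   # 500,000 DFZ boxes
    per_route_per_year = 8_000 / 5                      # $1,600/route/year, system-wide
    print(per_route_per_box, boxes, per_route_per_year)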
Randy Bush wrote:
none of which seem to move us forward. i guess the lesson is that, as long as we are well below moore, we just keep going down the slippery, and damned expensive, slope.
As long as we keep using IPv4, we are mostly stopping at /24 and must stop at /32. But, see the subject. It's well above moore. For high-speed (fixed-time) route lookup with 1M entries, SRAM is cheap at /24 and fine at /32, but expensive and power-consuming TCAM is required at /48. That's one reason why we should stay away from IPv6. Masataka Ohta
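A back-of-envelope reading of that claim, assuming a DIR-24-8-style direct-indexed lookup (first stage indexed by the top 24 bits, a small second stage for longer prefixes; the bytes-per-entry figure is an assumption):

    ENTRY = 2                           # bytes per next-hop entry (assumed)

    first_stage = 2**24 * ENTRY         # 32 MiB of SRAM: cheap today
    second_stage = 2**8 * ENTRY         # 512 B per /24 that has longer prefixes,
                                        # so /32 support scales with the table

    naive_48 = 2**48 * ENTRY            # ~512 TiB: hopeless to direct-index,
                                        # hence TCAM (or trees, losing fixed time)
    print(first_stage / 2**20, naive_48 / 2**40)   # 32.0 MiB, 512.0 TiB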
On Thu, Mar 15, 2012 at 01:18:04PM +0900, Masataka Ohta wrote:
As long as we keep using IPv4, we are mostly stopping at /24 and must stop at /32.
But, see the subject. It's well above moore.
For high-speed (fixed-time) route lookup with 1M entries, SRAM is cheap at /24 and fine at /32, but expensive and power-consuming TCAM is required at /48.
That's one reason why we should stay away from IPv6.
What prevents you from using http://www.nature.com/ncomms/journal/v1/n6/full/ncomms1063.html with IPv6?
Eugen Leitl wrote:
For high-speed (fixed-time) route lookup with 1M entries, SRAM is cheap at /24 and fine at /32, but expensive and power-consuming TCAM is required at /48.
That's one reason why we should stay away from IPv6.
What prevents you from using http://www.nature.com/ncomms/journal/v1/n6/full/ncomms1063.html with IPv6?
Though I didn't pay $32 to read the full paper, it looks like a proposal for geography-based addressing. So, I should ask what prevents you from using it with IPv4? Masataka Ohta
On Thu, Mar 15, 2012 at 09:57:10PM +0900, Masataka Ohta wrote:
That's one reason why we should stay away from IPv6.
What prevents you from using http://www.nature.com/ncomms/journal/v1/n6/full/ncomms1063.html with IPv6?
Though I didn't pay $32 to read the full paper, it looks like a proposal for geography-based addressing.
You can access the free full text at http://arxiv.org/pdf/1009.0267v2.pdf
So, I should ask what prevents you from using it with IPv4?
Because IPv4 will be legacy by the time something like this lands, and because IPv6 needs more bits per route, so there's more pain there.
On Thu, Mar 15, 2012 at 9:58 AM, Eugen Leitl <eugen@leitl.org> wrote:
On Thu, Mar 15, 2012 at 09:57:10PM +0900, Masataka Ohta wrote:
That's one reason why we should stay away from IPv6. What prevents you from using http://www.nature.com/ncomms/journal/v1/n6/full/ncomms1063.html with IPv6?
Though I didn't pay $32 to read the full paper, it looks like a proposal for geography-based addressing.
You can access the free full text at http://arxiv.org/pdf/1009.0267v2.pdf
Hi Eugen, Geographic routing strategies have been all but proven to irredeemably violate the recursive commercial payment relationships which create the Internet's topology. In other words, they always end up stealing bandwidth on links for which neither the source of the packet nor its destination has paid for a right to use. This is documented in a 2008 Routing Research Group thread. http://www.ops.ietf.org/lists/rrg/2008/msg01781.html If you have a new geographic routing strategy you'd like to table for consideration, start by proving it doesn't share the problem. Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
On Thu, Mar 15, 2012 at 10:25:46AM -0400, William Herrin wrote:
Geographic routing strategies have been all but proven to irredeemably violate the recursive commercial payment relationships which create the Internet's topology. In other words, they always end up stealing bandwidth on links for which neither the source of the packet nor its destination has paid for a right to use.
This is documented in a 2008 Routing Research Group thread. http://www.ops.ietf.org/lists/rrg/2008/msg01781.html
If you have a new geographic routing strategy you'd like to table for consideration, start by proving it doesn't share the problem.
I think the problem can be tackled by implementing this in wireless last-mile networks owned and operated by end users. (Obviously the /64 space is enough to carry that information. Long-range could be done via VPN overlay over the Internet). This will reduce the local chatter for route discovery and remove some of the last-mile load on wired connections, which is in ISPs' interest. I think we'll see some 1-10 GBit/s effective bandwidth in sufficiently small wireless cells. If this scenario plays out, this will inch up to low-end gear like Mikrotik and eventually move to the core. I don't think this will initially happen in the network core for the reasons you mentioned.
On Thu, Mar 15, 2012 at 10:41, Eugen Leitl <eugen@leitl.org> wrote:
On Thu, Mar 15, 2012 at 10:25:46AM -0400, William Herrin wrote:
Geographic routing strategies have been all but proven to irredeemably violate the recursive commercial payment relationships which create the Internet's topology. In other words, they always end up stealing bandwidth on links for which neither the source of the packet nor its destination has paid for a right to use.
This is documented in a 2008 Routing Research Group thread. http://www.ops.ietf.org/lists/rrg/2008/msg01781.html
I think the problem can be tackled by implementing this in wireless last-mile networks owned and operated by end users.
Interesting point, and the growth in municipal networks could help. But they are still a vast minority. Scott
On Thu, Mar 15, 2012 at 10:41 AM, Eugen Leitl <eugen@leitl.org> wrote:
On Thu, Mar 15, 2012 at 10:25:46AM -0400, William Herrin wrote:
Geographic routing strategies have been all but proven to irredeemably violate the recursive commercial payment relationships which create the Internet's topology. In other words, they always end up stealing bandwidth on links for which neither the source of the packet nor its destination has paid for a right to use.
I think the problem can be tackled by implementing this in wireless last-mile networks owned and operated by end users. (Obviously the /64 space is enough to carry that information. Long-range could be done via VPN overlay over the Internet).
If an endpoint is allowed to have multiple addresses and allowed to rapidly change addresses then a more optimal last-mile solution is dynamic topological address delegation. Each IP represents a current-best-path coreward through the ISP's network. When the path changes, so do the downstream addresses. Instead of a routing protocol you have an addressing protocol. In theory, such a thing automatically aggregates into very small routing tables. Very much a work in progress: http://bill.herrin.us/network/name/nr1.gif http://bill.herrin.us/network/name/nr2.gif http://bill.herrin.us/network/name/nr3.gif Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
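A toy sketch of that delegation idea (structure and labels invented): each node's address is its parent's prefix plus one more path label, so a path change is handled by re-delegating addresses rather than by updating routing tables:

    def delegate(parent_prefix, child_label):
        # A child's address is its parent's prefix plus one more path
        # label; the address *is* the route coreward.
        return parent_prefix + [child_label]

    core = []                            # the core owns the empty prefix
    pop = delegate(core, 3)              # PoP 3 hangs off the core
    dslam = delegate(pop, 17)            # aggregation device under that PoP
    host = delegate(dslam, 200)          # subscriber under the DSLAM

    print(host)                          # [3, 17, 200]

    # When the PoP's best path coreward changes, it re-delegates, and new
    # addresses cascade downward; forwarding stays a label-at-a-time walk.
    new_pop = delegate(core, 5)
    print(delegate(delegate(new_pop, 17), 200))   # [5, 17, 200]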
Eugen Leitl wrote:
So, I should ask what prevents you from using it with IPv4?
Because IPv4 will be legacy by the time something like this lands,
Maybe. But, IPv6 will be so before IPv4 (or, is already IMHO).
and because IPv6 needs more bits per route, so there's more pain there.
Feel free to propose filtering everything beyond /32 and get it accepted by the community. Masataka Ohta
2012/3/14 Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>: < stuff deleted >
For high-speed (fixed-time) route lookup with 1M entries, SRAM is cheap at /24 and fine at /32, but expensive and power-consuming TCAM is required at /48.
That's one reason why we should stay away from IPv6.
Masataka Ohta
I found this bit of research from 2007 ( http://www.cise.ufl.edu/~wlu/papers/tcam.pdf ). It seems to me there are probably more ways to mix and match different types of RAM to deal with this beast. james
james machado wrote:
For high-speed (fixed-time) route lookup with 1M entries, SRAM is cheap at /24 and fine at /32, but expensive and power-consuming TCAM is required at /48.
That's one reason why we should stay away from IPv6.
I found this bit of research from 2007 ( http://www.cise.ufl.edu/~wlu/papers/tcam.pdf ). It seems to me there are probably more ways to mix and match different types of RAM to deal with this beast.
But it's not fixed time. Worse, it synthesizes an IPv6 table from the current IPv4 ones, which means the number of routing table entries is a lot less than 1M. Masataka Ohta
On Mar 12, 10:07 am, "Robert E. Seastrom" <r...@seastrom.com> wrote:
It didn't help that there was initially no implementation of shim6 whatsoever. That later turned into a single prototype implementation of shim6 for linux. As much as I tried to keep an open mind about shim6, eventually it became clear that this was a Gedankenexperiment in protocol design. Somewhere along the line I started publicly referring to it as "sham6". I'm sure I'm not the only person who came to that conclusion.
I thought the IETF required two interoperable implementations for protocols. Or was that just for standards-track stuff? Anyway, the effort involved in getting Shim6 implemented globally on all devices would have been nearly as large as switching all applications over from TCP to a protocol with a "proper" session layer, like SCTP. I believe there are libraries that wrap SCTP and make it look like TCP to legacy applications; wouldn't that have been a better approach?
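The "looks like TCP" part is straightforward, because SCTP's one-to-one (SOCK_STREAM) socket style presents the ordinary stream API. A minimal sketch, assuming a Linux kernel with the SCTP module loaded (not every platform exposes socket.IPPROTO_SCTP; the address is from the 2001:db8::/32 documentation range):

import socket

# Same connect/send/recv calls as TCP; only the third argument differs.
s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_SCTP)
s.connect(("2001:db8::1", 5000))   # illustrative peer, not a real service
s.sendall(b"hello over an SCTP association, via the TCP-shaped API\n")
reply = s.recv(4096)
s.close()

# Multihoming the association (binding extra local addresses with
# sctp_bindx) needs a helper such as the pysctp package; hiding that
# behind a TCP-like API is exactly what the wrapper libraries do.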
Ryan Malayter <malayter@gmail.com> writes:
On Mar 12, 10:07 am, "Robert E. Seastrom" <r...@seastrom.com> wrote:
It didn't help that there was initially no implementation of shim6 whatsoever. That later turned into a single prototype implementation of shim6 for Linux. As much as I tried to keep an open mind about shim6, eventually it became clear that this was a Gedankenexperiment in protocol design. Somewhere along the line I started publicly referring to it as "sham6". I'm sure I'm not the only person who came to that conclusion.
I thought the IETF required two interoperable implementations for protocols. Or was that just for standards-track stuff?
Rough consensus and working code is soooooo 1993. -r
On Mar 11, 2012, at 3:15 PM, Iljitsch van Beijnum wrote:
On 11 Mar 2012, at 20:15 , Joel jaeggli wrote:
The IETF and IRTF have looked at the routing scalability issue for a long time. The IETF came up with shim6, which allows multihoming without BGP. Unfortunately, ARIN started to allow IPv6 PI just in time so nobody bothered to adopt shim6.
That's a fairly simplistic version of why shim6 failed. A better reason (apart from the fact that building an upper-layer overlay of the whole Internet on an IP protocol that's largely undeployed was hard) is that it leaves the destination unable to perform traffic engineering.
I'm not saying that shim6 would have otherwise ruled the world by now, it was always an uphill battle because it requires support on both sides of a communication session/association.
But ARIN's action meant it never had a chance. I really don't get why they felt the need to start allowing IPv6 PI after a decade, just when the multi6/shim6 effort started to get going but before the work was complete enough to judge whether it would be good enough.
As the person who led the charge in that action, I can probably answer that question... First, from my perspective at the time, SHIM6 didn't stand a chance. It was massively complex, required modifying the stack on every single end system to yield useful results, and made Windows domain administration look simple by comparison. As such, I just didn't see any probability of SHIM6 becoming operational reality. (I think LISP suffers from many, though not all, of the same problems, frankly.) I remember having this argument with you at the time, so I'm surprised you don't remember the other side of the argument from the original discussions. However, there was also tremendous pressure in the community for "We're not going to adopt IPv6 when it puts us at a competitive disadvantage by locking us in to our upstream choices while we have portability with IPv4." Like it or not, that's a reality, and it's a reality that is critically important to getting IPv6 adopted on a wider scale. Fortunately, it was a reality we were able to address through policy (though not without significant opposition from purists like yourself and larger providers that like the idea of locking in customers).
That fundamentally is the business we're in when advertising prefixes to more than one provider: ingress path selection.
That's the business network operators are in. That's not the business end users who don't want to depend on a single ISP are in. Remember, shim6 was always meant as a solution that addresses the needs of a potential 1 billion "basement multihomers" with maybe ADSL + cable. The current 25k or so multihomers are irrelevant from the perspective of routing scalability. It's the other 999,975,000 that will kill the routing tables if multihoming becomes mainstream.
It's not just about depending on a single ISP, it's also about being able to change your mind about which ISPs you are attached to without having to undertake a multi-month corporate-wide project in the process. Let's compare...
BGP multihoming with portable PI prefix:
1. Sign new contract.
2. Make new connection.
3. Bring up new BGP session.
4. Verify routes are working in both directions and seen globally.
5. --
6. --
7. --
8. --
9. Tear down old BGP session.
10. --
11. Terminate old contract.
12. --
PA-based prefix:
1. Sign new contract.
2. Make new connection.
3. Get routing working for new prefix over new connection.
4. Add new prefix to all routers, switches, provisioning systems, databases, etc.
5. Renumber every machine in the company.
6. Renumber all of the VPNs.
7. Deal with all the remote ACL issues.
8. Deal with any other fallout.
9. Turn off old prefix and connection.
10. Deal with the fallout from the things that weren't symptomatic in steps 4-9.
11. Terminate old contract.
12. Remove old prefix from all remaining equipment configurations.
By my count, that's twice as many steps to move a PA end-user organization, and let's face it: steps 5, 6, and 7 (which don't exist in the PI scenario) take the longest, and steps 7, 8, and 10 (again, non-existent in the PI scenario) are the most painful and potentially the most costly. No multihomed business in its right mind is going to accept PA space as a viable way to run its network. Owen
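To make the PA-side pain concrete, consider just steps 4 and 12: finding every place the old prefix is hardcoded. A hypothetical sweep might look like the following; the prefix and directories are made up for illustration.

# Hypothetical hunt for stale references to an old PA prefix across
# config trees (the kind of grind steps 4, 7, and 12 above turn into).
import ipaddress
import pathlib
import re

OLD_PREFIX = ipaddress.ip_network("2001:db8:1234::/48")  # example prefix
CONFIG_DIRS = ["/etc", "/srv/configs"]                   # example paths
V6_TOKEN = re.compile(r"[0-9a-fA-F:]*:[0-9a-fA-F:]+")    # crude v6 matcher

for top in CONFIG_DIRS:
    root = pathlib.Path(top)
    if not root.is_dir():
        continue
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), 1):
            for token in V6_TOKEN.findall(line):
                try:
                    addr = ipaddress.ip_address(token)
                except ValueError:
                    continue
                if addr in OLD_PREFIX:
                    print(f"{path}:{lineno}: {line.strip()}")

Every hit is an ACL, VPN endpoint, or provisioning entry that the PI scenario would never have touched.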
On Mar 11, 2012, at 3:15 PM, Iljitsch van Beijnum wrote:
On 11 Mar 2012, at 20:15 , Joel jaeggli wrote:
The IETF and IRTF have looked at the routing scalability issue for a long time. The IETF came up with shim6, which allows multihoming without BGP. Unfortunately, ARIN started to allow IPv6 PI just in time so nobody bothered to adopt shim6.
That's a fairly simplistic version of why shim6 failed. A better reason (apart from the fact that building an upper-layer overlay of the whole Internet on an IP protocol that's largely undeployed was hard) is that it leaves the destination unable to perform traffic engineering.
I'm not saying that shim6 would have otherwise ruled the world by now, it was always an uphill battle because it requires support on both sides of a communication session/association.
But ARIN's action meant it never had a chance. I really don't get why they felt the need to start allowing IPv6 PI after a decade, just when the multi6/shim6 effort started to get going but before the work was complete enough to judge whether it would be good enough.
That fundamentally is the business we're in when advertising prefixes to more than one provider: ingress path selection.
That's the business network operators are in. That's not the business end users who don't want to depend on a single ISP are in. Remember, shim6 was always meant as a solution that addresses the needs of a potential 1 billion "basement multihomers" with maybe ADSL + cable. The current 25k or so multihomers are irrelevant from the perspective of routing scalability. It's the other 999,975,000 that will kill the routing tables if multihoming becomes mainstream.
When discussing 'why shim6 failed' I think it's only fair to include a link to a (well reasoned, imho) network operator's perspective on what it did and did not provide in the way of capabilities that network operators desired. http://www.nanog.org/meetings/nanog35/abstracts.php?pt=NDQ3Jm5hbm9nMzU=&nm=nanog35 -Darrel
Joel jaeggli wrote:
That's a fairly simplistic version of why shim6 failed. A better reason (apart from the fact that building an upper-layer overlay of the whole Internet on an IP protocol that's largely undeployed was hard) is that
Shim6 failed mostly because of its complexity. It is complex mostly because its architecture is broken: it tries to hide the existence of shim6 from applications (the end systems within end hosts), which is against the end-to-end principle and impossible, and it only makes application modifications even more complicated. Other added features make shim6 even worse.
it leaves the destination unable to perform traffic engineering. That fundamentally is the business we're in when advertising prefixes to more than one provider: ingress path selection.
That's not an inherent problem of architectures with multiple addresses. Destination hosts can listen to advertisements from destination network administrators and suggest to source hosts which prefixes the administrators prefer. That is the end-to-end way of doing destination traffic engineering without bloating routing table entries. Masataka Ohta
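A rough sketch of what Ohta is describing, with the preference encoding invented here for illustration (shim6's locator-preferences option carried similar information in-band): the destination advertises its administrator's ranking of the site's prefixes, and sources simply honor it.

# The destination host relays its administrator's ranking of the site's
# prefixes (lower value = more preferred); prefixes are examples.
dest_prefixes = [
    ("2001:db8:a::", 10),   # via ISP A -- preferred inbound path
    ("2001:db8:b::", 20),   # via ISP B -- backup
]

def pick_destination(advertised, reachable):
    """Choose the most-preferred advertised prefix we can actually reach."""
    for prefix, _pref in sorted(advertised, key=lambda p: p[1]):
        if prefix in reachable:
            return prefix
    raise ConnectionError("no advertised prefix is reachable")

# If ISP A's path is down, the source falls back with no routing churn:
print(pick_destination(dest_prefixes, {"2001:db8:b::"}))  # 2001:db8:b::

The inbound traffic engineering happens host-to-host, so nothing extra ever lands in the DFZ routing table.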
On 11 Mar 2012, at 09:48, Iljitsch van Beijnum <iljitsch@muada.com> wrote:
On 9 Mar 2012, at 10:02 , Jeff Wheeler wrote:
The way we are headed right now, it is likely that the IPv6 address space being issued today will look like "the swamp" in a few short years, and we will regret repeating this obvious mistake.
We had this discussion on the list exactly a year ago. At that time, the average IPv6 origin ASN was announcing 1.43 routes. That figure today is 1.57 routes per origin ASN.
The IETF and IRTF have looked at the routing scalability issue for a long time. The IETF came up with shim6, which allows multihoming without BGP. Unfortunately, ARIN started to allow IPv6 PI just in time so nobody bothered to adopt shim6. I haven't followed the IRTF RRG results for a while, but at some point LISP came out of this, where we basically tunnel the entire internet so the core routers don't have to see the real routing table.
But back to the topic at hand: filtering long prefixes. There are two reasons you want to do this:
1. Attackers could flood BGP with bogus prefixes to make tables overflow
2. Legitimate prefixes may be deaggregated so tables overflow
It won't be quick or easy, but the RPKI stuff should solve 1.
Unless the attacker uses the same origin AS that is in the ROA. It probably won't hijack the traffic, but it may create a DoS or some other kind of problem. Regards, as
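Both points are easy to see against RFC 6483-style origin validation. A minimal sketch (prefixes and AS numbers are examples): a flood of made-up announcements comes back invalid, but an announcement that forges the authorized origin AS still validates, which is exactly the gap described above.

# Simplified route-origin validation against a ROA table.
import ipaddress

ROAS = [  # (prefix, maxLength, authorized origin AS)
    (ipaddress.ip_network("2001:db8::/32"), 48, 64496),
]

def origin_validation(prefix, origin_as):
    prefix = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_as in ROAS:
        if prefix.version == roa_prefix.version and prefix.subnet_of(roa_prefix):
            covered = True
            if prefix.prefixlen <= max_len and origin_as == roa_as:
                return "valid"
    return "invalid" if covered else "not-found"

print(origin_validation("2001:db8:1::/48", 64511))  # invalid: bogus origin
print(origin_validation("2001:db8:1::/48", 64496))  # valid -- a forged
# origin AS passes origin validation, so it can still blackhole traffic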
participants (40)
- Arturo Servin
- Bernhard Schmidt
- Darrel Lewis
- Doug Barton
- Eugen Leitl
- Geoff Huston
- George Bonser
- George Herbert
- Iljitsch van Beijnum
- james machado
- Jared Mauch
- Jeff Wheeler
- Jeroen Massar
- Jimmy Hess
- Joel jaeggli
- Josh Hoppes
- Leigh Porter
- Leo Bicknell
- Leo Vegoda
- Mark Andrews
- Masataka Ohta
- Mike Andrews
- Mukom Akong T.
- Owen DeLong
- PC
- Randy Bush
- Robert Bonomi
- Robert E. Seastrom
- Ryan Malayter
- Sander Steffann
- Sascha Lenz
- Scott Brim
- Seth Mattinen
- Seth Mos
- Sven Olaf Kamphuis
- Tim Chown
- Tim Franklin
- Valdis.Kletnieks@vt.edu
- William Herrin
- Łukasz Bromirski