Re: PI vs PA Address Space
| Stronger hierarchy leads to:
|  - strong regulation of ISPs
|  - hinders competition
|  - no incentive to solve difficult routing problems
|  - leads to governmental regulation and control

Let's revisit the economics of the global Internet.

You pay for three things, two of which are real products and one of which is an elasticity factor:

   1/ delivery of packets into the global Internet
   2/ receipt of packets from the global Internet (reachability)
   3/ warm fuzzies ("they know what they're doing; they are responsive to my needs")

Item (1) is what you get when your immediate service provider turns up your circuit and you say

   ip route 0.0.0.0 0.0.0.0 Serial0

on your router. The rate at which you can deliver packets into the Internet is the minimum of three things: the sum of egress bandwidths from your local small-i internet, the capacity of any choke points in the path to the egress points, and the width of your circuit. For example, in the simple case, if you have an E1 and your service provider has a 512kbps circuit to AlterNet, your maximum delivery rate of traffic into the global Internet is 512kbps plus any local connectivity.

The pricing for item (1) is typically the cost of the physical connection to you, plus some value which reflects the effect your bandwidth utilization is likely to have on choke points, plus a percentage.

Item (2) is what you get when your immediate service provider has arrangements in place to have their customers' prefixes carried and made reachable nearly ubiquitously. ("Nearly" covers firewalls and networks with policy constraints which are enforced via routing mechanisms.)

Until fairly recently, the guarantee of even nearly ubiquitous reachability was impossible to make, thanks to the way the AUP was enforced. However, once you had the NSFNET backbone service carrying your routing information, you generally had nearly ubiquitous routing, thanks to the fact that practically everyone defaulted to AS 690.

Then along comes Change.
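The delivery-rate rule for item (1) can be written down as a small calculation. This is only a sketch of the arithmetic in the E1 example; the function and parameter names are made up for illustration, and the E1 rate is taken as the nominal 2048kbps.

```python
def max_delivery_rate_kbps(circuit_kbps, choke_points_kbps, egress_kbps):
    """Upper bound on traffic a customer can deliver into the global Internet.

    circuit_kbps      -- width of the customer's own circuit
    choke_points_kbps -- capacities of any choke points on the way to egress
    egress_kbps       -- egress bandwidths of the provider's small-i internet
    """
    # The bound is the smallest of: the circuit, each choke point,
    # and the summed egress bandwidth.
    return min([circuit_kbps, sum(egress_kbps)] + list(choke_points_kbps))


E1_KBPS = 2048  # nominal E1 line rate

# An E1 customer whose provider has a single 512kbps circuit to AlterNet.
rate = max_delivery_rate_kbps(E1_KBPS, choke_points_kbps=[], egress_kbps=[512])
print(rate)  # 512
```

Traffic that stays inside the provider's own network ("any local connectivity") is not counted by this bound.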
The first two huge changes were the CIX and MAE-EAST, two enormous steps away from the model of AS 690 as the network to which you simply defaulted. Suddenly, rather than having PSI aggregated behind AS 690, AlterNet started hearing all their routes directly, and preferring those.

Generally speaking, the MAE-EAST participants started on a path wherein they preferred any announcement over anything heard from AS 690, which often enough was left as a default.

Over time, some of the MAE-EAST participants stopped defaulting to ANS, partly because the amount of routing information reachable only from ANS grew smaller, and partly because in several ways it's easier to manage full routing for recovery and optimization than it is to manage partial routing plus a default.

Eventually routers stopped being able to handle full routing in 16Mb of memory, and suddenly the very real cost of carrying routing information around became clear to a number of providers: how much did replacing a bunch of mostly-AGS+ routers with 64Mb Cisco 7000-series routers cost?

This was one of the big pushes behind serious deployment of CIDR.

CIDR's principal goal was to keep routing tables small by hiding detail, that is, by aggregating into bigger blocks. (Its secondary goal, full classlessness, is being played with as folks start experimenting with interdomain routing of subnets of classful networks.)

Originally the need to keep routing tables small was to prevent routers which had not been converted to 64Mb boxes, and which could not get by without knowing large amounts of routing information, from running out of memory and crashing.

Recently we have started noticing that, while memory consumption is still a real issue for a number of people in the world, those people with 64Mb boxes are finding that the amount of CPU used by carrying full routing is increasing, especially as interdomain convergence time is decreasing to the point where an update is seen by most Ciscos in the U.S. in a matter of a few seconds.

In normal operation, with the normal background noise of a few flaps per second (largely attributable to flaky network connections and people doing dynamic routing updates for dialup users, and some level of longer-term transitions), most routers talking BGP hardly notice any CPU hit at all. Even so, those routers doing significant amounts of as-path and prefix-based filtering (mostly for backup arrangements and for making sure bad things don't happen: giving or receiving accidental transit, accepting or propagating certain bad prefixes such as an announcement for one's own backbone network from external peers, and so forth) are borderline. A couple such boxes spend a constant 30-45% of their CPU handling BGP; others run at a constant 20% handling BGP.

When a big transition happens, such as when someone at MCI or Sprint types clear ip bgp * at MAE-EAST+, several routers all over the world jump from less than 10% to 100% CPU utilization for on the order of ten minutes.

As the number of prefixes increases -- and routing flap with it -- both the amount of CPU spent on normal everyday processing and the amount of real time necessary to handle a major transition increase.

One observation that has been made is that smaller prefixes are likelier to flap than larger prefixes. An analysis of what prefixes were flapping that I did for the last NANOG seemed to indicate (after much discussion with the folks originating the prefixes) that the majority of flaps were caused by /24s used by dialup customers that got introduced into the global routing system upon connection, and removed when the dialup customer hung up. Multiply this by lots of simultaneous dialup customers and you have a problem.

The problem is fixable by aggregation.
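As a rough illustration of what aggregation means here, the sketch below collapses a run of per-customer /24s into the single covering prefix using Python's standard ipaddress module. The 10.1.x.0/24 addresses are invented for the example and are not taken from any real announcement.

```python
import ipaddress

# Hypothetical dialup /24s: 64 contiguous networks, 10.1.0.0/24 .. 10.1.63.0/24.
dialup = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(64)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregates = list(ipaddress.collapse_addresses(dialup))

print(len(dialup))   # 64
print(aggregates)    # [IPv4Network('10.1.0.0/18')]
```

Announcing only the /18 means an individual customer connecting or hanging up changes nothing that the rest of the world sees.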
If you aggregate all these /24s (or /28s or whatever) into something bigger, that something bigger is much less likely to flap, and moreover can easily be set up so that it never flaps at all.

Nailing down these problems helps considerably, but the amount of CPU used by BGP in increasing numbers of routers is getting scary.

Following the line of reasoning -- which seems to hold up in practice -- that on average, smaller prefixes are likelier to flap over time than larger prefixes, one really wants to see a large reduction in the number of smaller prefixes carried globally.

That's not to say that local delegations should be big; a dialup user should get as small a chunk of address space as necessary, a dedicated line customer likewise, in an effort to avoid wasting address space, and also in an effort to assist in aggregating lots of individual connections behind a largeish (/18 or shorter) prefix.

So, on the theory that pretty much every prefix that's /18 or shorter aggregates enough links and flap-prone things within it, and with the observation that very few prefixes shorter than 18 bits flap in normal circumstances (pace one international connection that was so completely saturated that BGP kept falling over due to keepalive timeouts, which caused traffic to fall off, which allowed BGP to re-establish itself, causing the cycle to repeat -- this got fixed), several NSPs started talking about how to go about reducing the number of prefixes longer than /24 with global scope to essentially zero.

That is, while you can have a /24, /28 or /32 now or in the future, and while it can have local scope within a small-i internet (even one that's a big chunk of the big-I global Internet), right now nothing longer than /24 will have global scope at all, and ***in future blocks***, by default, nothing longer than /18 or /19 (it's /18 now, but it's not entirely inflexible, and dialogues continue) will have global scope.
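The two limits just described amount to a prefix-length check. What follows is a minimal sketch of that check, not anyone's actual router configuration: the function name and the in_future_block flag are hypothetical, and how a real filter would decide which blocks count as "future" is left open. The example prefixes are documentation or private addresses.

```python
import ipaddress

# Limits taken from the post; the names are hypothetical.
MAX_LEN_EXISTING_BLOCKS = 24  # today: nothing longer than /24 has global scope
MAX_LEN_FUTURE_BLOCKS = 18    # future blocks: /18 for now, /19 still under discussion


def has_global_scope(prefix: str, in_future_block: bool) -> bool:
    """Return True if an announcement would pass the prefix-length limit."""
    network = ipaddress.ip_network(prefix)
    limit = MAX_LEN_FUTURE_BLOCKS if in_future_block else MAX_LEN_EXISTING_BLOCKS
    return network.prefixlen <= limit


print(has_global_scope("192.0.2.0/24", in_future_block=False))   # True
print(has_global_scope("192.0.2.16/28", in_future_block=False))  # False
print(has_global_scope("10.64.0.0/18", in_future_block=True))    # True
print(has_global_scope("10.64.0.0/19", in_future_block=True))    # False
```

A prefix that fails the check can still be routed with local scope; it just has to sit inside some aggregate that passes.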
(I note *** in future blocks *** because people get really terrified that their current /24 will become useless Real Soon Now. That is not the plan, and likely won't be necessary any time soon, _especially_ if future allocations can be done right. Things are trending in the right direction.)

"Local scope" could be as small as your immediate provider, or that provider's provider, or even a largeish NSP. However, if it's not aggregatable into a larger block, it won't work for interdomain routing among several size-large NSPs.

Again, the general idea is to keep interdomain routing working in such a way that it doesn't make moving packets impossible.

Which returns us to point #2.

Arranging global reachability for a prefix is nontrivial; lots of things happen in the background at all levels in order to make global routing work. You pay your provider to pay their provider to pay their provider, etc., to work out the hard problems so that a single piece of email, an RADB object update, an addition to a router configuration, or a phone call is all that's necessary for you to announce a new network out to the world.

There's a problem, though: the cost of making some prefixes reachable is much greater than the cost for others. In fact, the cost of making everyone's nonaggregatable /28, /29, ... /32 reachable globally is so great that it is easier to say it simply cannot happen, in large part because the cost includes designing, building and deploying new router technology in several NSPs and ISPs, so that the routers of the world can actually handle enormous numbers of prefixes, especially when someone types clear ip bgp * at a large exchange point.

Finally, (3).

It's clear that people have different needs and wants and requirements from their service providers.
Generally speaking, the bulk of Sprint's customers want the global Internet to work, because their users want sex-on-demand with people in Finland and to go poking around Brandy's Babes' home pages or www.plaything.com, or whatever it is that users do.

The bulk of Sprint's customers are pretty clever and realize that while there are a lot of things that look really really ugly, even or especially from their perspective, they really are necessary in order to keep the global Internet working.

Among the things we do realize is that yes, there are side effects to proxy aggregating a size-large service provider's non-aggregated CIDR blocks, and yes, there are side effects involved in pushing for renumbering into large aggregatable blocks, and yes, there are side effects to putting up filters that block prefixes longer than 24 bits, and yes, there are side effects to rewriting our old policy of "we talk BGP with you if you're a reseller, period" to "we prefer not to talk BGP at all, unless there is a strong technical reason to do so".

However, in all these cases the position we take is that these ugly things (and yes, a whole bunch of much less ugly things) are necessary in order for the global Internet to work, and in order for us to offer you a level of service such that your customers or corporation or whatever doesn't scream bloody murder at you because things Just Don't Work because some router somewhere just keeled over because it was asked to do too much.

Moreover, it's not just Sprint taking this line with their customers -- others do too, and give their customers the warm fuzziness that their customers are willing to pay for.

So, in the final analysis, what we're pushing for does not reduce competition in an economic sense, although it does have side effects. There is plenty of room in the current marketplace for all sorts of competition, and even more room for specialization and cooperative deals, which is normal for a growth market of this magnitude.
Lastly, the people most affected by the side effects of keeping Sprint's part of the Internet up and running and connecting more than sixty countries and four hundred IP resellers are Sprint's customers and their customers. Given how little we directly compete with our customers anyway, while they are right to wish there were some other way (so does Sprint!), I think they also realize that the last thing we are trying to do is put them out of business or make it difficult for them to compete.

Healthy customers make for healthy revenues. And a healthy Internet makes for healthy customers.

That's all.

Sean.
Eventually routers stopped being able to handle full routing in 16Mb of memory, and suddenly the very real cost of carrying routing information around became clear to a number of providers: how much did replacing a bunch of mostly-AGS+ routers with 64Mb Cisco 7000-series routers cost?
This memory jump has occurred more than once. I remember 4 and 8 meg routers. 16 meg boxen were deemed large enough when they were created. The leap to 64 is just another step in the process.
nothing longer than /18 or /19 (it's /18 now, but it's not entirely inflexible, and dialogues continue) will have global scope.
As an aside, is anyone else besides Sprint behind this /18 model? I know that Sean is a big proponent but I have heard no other public comment on this. (Well, there was one, which indicated that the community had reached consensus on this point, which is why I ask.)
--bill
Eventually routers stopped being able to handle full routing in 16Mb of memory, and suddenly the very real cost of carrying routing information around became clear to a number of providers: how much did replacing a bunch of mostly-AGS+ routers with 64Mb Cisco 7000-series routers cost?
This memory jump has occurred more than once. I remember 4 and 8 meg routers. 16 meg boxen were deemed large enough when they were created. The leap to 64 is just another step in the process.
nothing longer than /18 or /19 (it's /18 now, but it's not entirely inflexible, and dialogues continue) will have global scope.
As an aside, is anyone else besides Sprint behind this /18 model? I know that Sean is a big proponent but I have heard no other public comment on this. (Well, there was one, which indicated that the community had reached consensus on this point, which is why I ask.)
--bill
My commentary?

These people are NUTS.

I do not, and will not, get behind this /18 model for EXISTING addresses. If someone wishes to put this forward for FUTURE assignments, and give us a cut-over date which we can announce to customers (prospective and current), then I might support THAT.

This proposal, implemented retroactively, serves to promote monopoly and tying arrangements to a particular provider, is not in the public interest, violates the assumptions and *statements* made by many over the last several years, and if undertaken as a collusive effort may spawn anti-trust and restraint of trade litigation.

For these reasons I believe it is *highly* ill advised to attempt to retroactively change the disposition of all the Class "C"s that have previously been delegated both from providers and directly from the Internic itself.

BTW, in case it matters, one of our customers has ALREADY been bitten by this when they attempted to leave MCSNet and attach to Sprint with addresses delegated from a netblock which Sprint assigned to us. They were first given incorrect information by Sprint's NOC personnel and then forced to renumber not only their internal hosts, but their CUSTOMERS' machines. Litigation was imminent in this case and very narrowly avoided.

Do we wish to open Pandora's box on this one? I say no way. Put a cut-over date and proposal forward for the FUTURE. Do *NOT* attempt to change, retroactively, the routability of addresses delegated over the last "N" years or you are begging for more trouble than you can imagine.

--
Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
Voice: [+1 312 248-8649]     | 7 POPs online through Chicago, all 28.8
Fax:   [+1 312 248-9865]     | Email to "info@mcs.net" for more information
ISDN: Surf at Smokin' Speed  | WWW: http://www.mcs.net, gopher: gopher.mcs.net
On Thu, 18 May 1995, Karl Denninger, MCSNet wrote:
nothing longer than /18 or /19 (it's /18 now, but it's not entirely inflexible, and dialogues continue) will have global scope.
As an aside, is anyone else besides Sprint behind this /18 model? I know that Sean is a big proponent but I have heard no other public comment on this. (Well, there was one, which indicated that the community had reached consensus on this point, which is why I ask.)
These people are NUTS.
BTW, in case it matters, one of our customers has ALREADY been bitten by this when they attempted to leave MCSNet and attach to Sprint with addresses delegated from a netblock which Sprint assigned to us. They were first given incorrect information by Sprint's NOC personnel and then forced to renumber not only their internal hosts, but their CUSTOMERS' machines.
This appears to be Sprint's policy. It happened to us when we left INSINC (Sprint Canada) for another provider and they refused to let us take a CIDR block with us. Fortunately, we had not yet allocated any of those addresses to customers.

My recommendation: If your addresses do not come directly from a NIC, then get the allocation IN WRITING and SIGNED! If your provider will not sign for it, then apply directly to the NIC.

Michael Dillon                                    Voice: +1-604-549-1036
Network Operations                                  Fax: +1-604-542-4130
Okanagan Internet Junction               Internet: michael@junction.net
http://www.junction.net - The Okanagan's 1st full-service Internet provider
In article <Pine.LNX.3.91.950518152411.10855E-100000@okjunc.junction.net>, Michael Dillon <michael@junction.net> wrote:
My recommendation: If your addresses do not come directly from a NIC, then get the allocation IN WRITING and SIGNED! If your provider will not sign for it, then apply directly to the NIC.
Or just ask to be CC'd on the SWIP.

--
Peter Berger. System Administrator, Telerama Public Access Internet
http://www.lm.com/~peterb
"His ex-wife died of stretch marks." -Johnny Cash.
My recommendation: If your addresses do not come directly from a NIC, then get the allocation IN WRITING and SIGNED!
Or just ask to be CC'd on the SWIP.
SWIP does not imply address ownership - just temporary assignment. The CIDR block still belongs to the provider.

--
jerry@mid.net
Jerry Anderson, Network Engineer
MIDnet Network Operations Center     (402) 472-0241
201 N 8th, Suite 421                 (402) 472-0240 [fax]
Lincoln NE 68508
No, I am not. I don't bend reality to adapt to my disability, but find solutions to challenges offered.

Mike

(and I don't whine but work on it)

On Thu, 18 May 1995 bmanning@ISI.EDU wrote:
Eventually routers stopped being able to handle full routing in 16Mb of memory, and suddenly the very real cost of carrying routing information around became clear to a number of providers: how much did replacing a bunch of mostly-AGS+ routers with 64Mb Cisco 7000-series routers cost?
This memory jump has occurred more than once. I remember 4 and 8 meg routers. 16 meg boxen were deemed large enough when they were created. The leap to 64 is just another step in the process.
nothing longer than /18 or /19 (it's /18 now, but it's not entirely inflexible, and dialogues continue) will have global scope.
As an aside, is anyone else besides Sprint behind this /18 model? I know that Sean is a big proponent but I have heard no other public comment on this. (Well, there was one, which indicated that the community had reached consensus on this point, which is why I ask.)
--bill
--------------------------------------------------------------------------------
Michael F. Nittmann                            nittmann@wis.com
Network Architect                              nittmann@b3.com
B3 Corporation, Marshfield, WI (CIX Member)    (715) 387 1700 xt. 158
US Cyber (SM), Washington DC                   (715) 573 2448
                                               (715) 831 7922
--------------------------------------------------------------------------------
nothing longer than /18 or /19 (it's /18 now, but it's not entirely inflexible, and dialogues continue) will have global scope.
As an aside, is anyone else besides Sprint behind this /18 model?
Why /18? Is there reasoning or research behind this choice?

In Danvers I believe the metric "implemented hosts per routing table entry" was mentioned.

--
jerry@mid.net
Jerry Anderson, Network Engineer
MIDnet Network Operations Center     (402) 472-0241
201 N 8th, Suite 421                 (402) 472-0240 [fax]
Lincoln NE 68508
Why /18? Is there reasoning or research behind this choice?
I'm not sure if there was any analytical justification (not sure how you'd go about doing that with any sort of reality). I believe it was a compromise between the people who wanted /14 and /16 and the people who wanted /19 or /24.
In Danvers I believe the metric "implemented hosts per routing table entry" was mentioned.
Measured how? Regards, -drc
bmanning@ISI.EDU writes:
As an aside, is anyone else besides Sprint behind this /18 model? I know that Sean is a big proponent but I have heard no other public comment on this. (Well, there was one, which indicated that the community had reached consensus on this point, which is why I ask.)
If Sprint wants to reach European destinations it will not fly because we allocate /19s to new service providers due to our slow-start allocation policy. Sprint knows this.

Daniel
As an aside, is anyone else besides Sprint behind this /18 model?
The hard core /18 model ("we won't accept prefixes greater than length 18") is untenable and throws out one of CIDR's features. It does not allow for a time period where a customer is migrating from provider A to provider B and will have end systems living within both provider based prefixes at any instant during the migration.

The user community should not be forced into flash cuts, and the providers can make the needed overlap period of time work for bounded time frames.

At a minimum, the model needs to be /18+E (E==entropy due to customer migrations).

peter
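To make the migration case concrete, here is a small sketch of longest-prefix matching with and without a hard /18 filter. It assumes a customer whose old /24 sits inside provider A's /18 and is announced through provider B during the overlap period. The prefixes, provider labels and helper names are all invented for illustration.

```python
import ipaddress


def best_route(table, destination):
    """Longest-prefix match: the most specific covering prefix wins."""
    address = ipaddress.ip_address(destination)
    matches = [(net, next_hop) for net, next_hop in table if address in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)[1]


def apply_length_filter(table, max_len):
    """Drop every route whose prefix is longer than max_len bits."""
    return [(net, next_hop) for net, next_hop in table if net.prefixlen <= max_len]


table = [
    (ipaddress.ip_network("10.8.0.0/18"), "provider A"),
    (ipaddress.ip_network("10.9.0.0/18"), "provider B"),
    # The migrating customer's old /24 from A's block, announced via B.
    (ipaddress.ip_network("10.8.5.0/24"), "provider B"),
]

print(best_route(table, "10.8.5.20"))                           # provider B
print(best_route(apply_length_filter(table, 18), "10.8.5.20"))  # provider A
```

With the filter in place the more specific route disappears and traffic for the old addresses follows provider A's aggregate, which is why the overlap period needs some exception (the "+E") for as long as the customer still has hosts in the old block.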
The proposal comes from a provider that has been unable to route around a cable cut for more than 24h now.

Mike

On Fri, 19 May 1995 peter@swan.lanl.gov wrote:
As an aside, is anyone else besides Sprint behind this /18 model?
The hard core /18 model ("we won't accept prefixes greater than length 18") is untenable and throws out one of CIDR's features. It does not allow for a time period where a customer is migrating from provider A to provider B and will have end systems living within both provider based prefixes at any instant during the migration.
The user community should not be forced into flash cuts, and the providers can make the needed overlap period of time work for bounded time frames.
At a minimum, the model needs to be /18+E (E==entropy due to customer migrations).
peter
--------------------------------------------------------------------------------
Michael F. Nittmann                            nittmann@wis.com
Network Architect                              nittmann@b3.com
B3 Corporation, Marshfield, WI (CIX Member)    (715) 387 1700 xt. 158
US Cyber (SM), Washington DC                   (715) 573 2448
                                               (715) 831 7922
--------------------------------------------------------------------------------
As an aside, is anyone else besides Sprint behind this /18 model?
The user community should not be forced into flash cuts, and the providers can make the needed overlap period of time work for bounded time frames.
At a minimum, the model needs to be /18+E (E==entropy due to customer migrations).
Then, if we are going to get into implementation details now, the list should figure out just how long that "needed overlap period" is, since if I remember correctly, PST's original note on this indicated annual reductions in the mask value. Right, Paul?

--
--bill
participants (11)
- bmanning@ISI.EDU
- Daniel Karrenberg
- David R Conrad
- Jerry Anderson
- jerry@mid.net
- Karl Denninger, MCSNet
- Michael Dillon
- Michael F. Nittmann
- peter@swan.lanl.gov
- peterb@telerama.lm.com
- Sean Doran