RE: The Gorgon's Knot. Was: Re: Verio Peering Question
| But, we all do, or we aren't talking BGP. The requirements here are not that
| large. A Cisco 2651 with 128 MB is a valid BGP speaker, these days. That's a
| cheap router, indeed. And, router memory is dirt cheap.

BGP is based on TCP and thus has the fun property that a big set of changes will pile up in front of a connection to a peer that is slow at processing inbound announcements & withdrawals. The slower you are at processing updates, the more likely you are to be out of sync with reality in such a way that you will begin to notice that you are forwarding some packets the wrong direction into loops or black holes. The slower you are, the greater the backlog you have to chug through to catch up, making you busier for longer periods, which in turn leads to greater backlogs. Slow down too much and the other side will help you out by resetting the session.

We've seen this in the past - it's caused MASSIVE outages affecting nearly EVERYONE for hours at a time.

Or you can say "smd is protecting his own personal interests" and carry on arguing the equivalent of "ANYBODY can build a modern router using a sufficient amount of ROM" which simply underlines the point that dynamic global routing is an expensive luxury that many people have gotten used to.

| The common good is
| promoted by allowing these folks to multihome, which would be effectively
| prohibited if all networks implemented verio-style filter policies.

Think of it as a catalyst for more experimentation with alternative ways of multihoming without the use of BGP. There are several which exist now, and several which are being discussed in multi6 which could be made to exist now without universal software changes. Some brainstorming could result in several other approaches, more or less generalized, but what's the point when the normal cheap-seeming thing to do is to announce CIDR holes to the world?

| The number of folks who multihome is large and growing. We should support
| this by promoting relatively open filtering policies and allowing /24s to be
| truly, globally routable.

I think we should encourage people to introduce individual /32s into the network and flap them around a bit, to force some issues which have been avoided because first Sprint and then Verio have been willing to take a bunch of negative PR in the act of self-protection (which has the side-effect of protecting a lot of people who generate the negative PR, and everyone else).

Sean.
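The backlog feedback loop described above can be sketched as a toy queue model. This is only an illustration: the function name, the update counts, and the use of a backlog threshold as a crude stand-in for the peer giving up and resetting the session are all assumptions, not a description of how any real BGP implementation behaves.

```python
def simulate_backlog(arrivals, capacity, reset_threshold):
    """Toy model: return (backlog per tick, tick of session reset or None).

    arrivals        -- updates arriving from the peer in each tick
    capacity        -- updates this (hypothetical) router processes per tick
    reset_threshold -- backlog size standing in for "the peer resets us"
    """
    backlog = 0
    history = []
    for tick, arriving in enumerate(arrivals):
        # Whatever cannot be processed this tick is carried forward.
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
        if backlog > reset_threshold:
            return history, tick
    return history, None


# The same burst of updates offered to a fast and a slow speaker.
burst = [100, 100, 100, 10, 10, 10]
fast_history, fast_reset = simulate_backlog(burst, capacity=120, reset_threshold=150)
slow_history, slow_reset = simulate_backlog(burst, capacity=40, reset_threshold=150)
```

In this sketch the fast speaker never accumulates a backlog, while the slow one falls further behind on every tick of the burst until it crosses the threshold.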
Date: Fri, 28 Sep 2001 15:44:38 -0700 (PDT) From: Sean M. Doran <smd@clock.org>
[ snip ]
I think we should encourage people to introduce individual /32s into the network and flap them around a bit, to force some
I've not seen anyone suggest allowing longer than /24 in this thread. However, I'll definitely admit that, with name-based hosting, some webhosts most certainly could want to announce long prefixes.
issues which have been avoided because first Sprint and then Verio have been willing to take a bunch of negative PR in the act of self-protection (which has the side-effect of protecting
So allow le 24 at the border. Allow le <whatever> internally, and tag so it doesn't redistribute. Apply appropriate dampening.
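As a rough illustration of the "le 24 at the border, le <whatever> internally, and tag" idea, here is a minimal Python sketch using the standard `ipaddress` module. The function name, the return labels, the internal limit of /30, and the example prefixes are all hypothetical; dampening is not modeled.

```python
import ipaddress

BORDER_MAX_LEN = 24    # "le 24" at the border
INTERNAL_MAX_LEN = 30  # hypothetical value for "le <whatever>" internally


def classify(prefix, from_border):
    """Return 'accept', 'accept-no-export', or 'reject' for an announcement."""
    net = ipaddress.ip_network(prefix, strict=True)
    if net.prefixlen <= BORDER_MAX_LEN:
        return "accept"
    if from_border:
        # Longer than /24 arriving from outside: filtered.
        return "reject"
    if net.prefixlen <= INTERNAL_MAX_LEN:
        # Longer internal routes are kept but tagged so they are not
        # redistributed to peers.
        return "accept-no-export"
    return "reject"
```

For example, `classify("192.0.2.128/25", from_border=True)` is rejected, while the same prefix learned internally is accepted with the no-export tag.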
a lot of people who generate the negative PR, and everyone else).
I guess that someone who never hears a route is certainly safe from flappage.

I guess that we can:

1. Continue arguing over right/wrong (nanog-l as a whole; I'm not _quite_ crazy enough to try taking on Sean publicly *grin*)
2. See which approach works in the long run (the network that dies with the most money wins)
3. Establish guidelines on what is "acceptable" table size, CPU utilization, etc., and then decide how to get there.

Consider that, with providers being pushed to use name-based hosting, NAT, etc., it's very desirable to "basement multihome".

<conspiracy_theory>
Are big providers so desperate for business that they want to prevent customers from multihoming, attempting to be the sole vendor of bandwidth?
</conspiracy_theory>

All that said, I _do_ favor IP allocation based on region. Say I connect to KSCYMO, which connects to CHCGIL or DLLSTX... IP allocation would be from a sub-ARIN entity in one of those regions. Make space portable between providers...

Wait a second. All of this sounds vaguely familiar... ;-)

Eddy
---------------------------------------------------------------------------
Brotsman & Dreger, Inc. - EverQuick Internet Division
Phone: +1 (316) 794-8922 Wichita/(Inter)national
Phone: +1 (785) 865-5885 Lawrence
---------------------------------------------------------------------------
Date: Mon, 21 May 2001 11:23:58 +0000 (GMT)
From: A Trap <blacklist@brics.com>
To: blacklist@brics.com
Subject: Please ignore this portion of my mail signature.

These last few lines are a trap for address-harvesting spambots. Do NOT send mail to <blacklist@brics.com>, or you are likely to be blocked.
On Fri, 28 Sep 2001 23:17:52 BST, "E.B. Dreger" said:
3. Establish guidelines on what is "acceptable" table size, CPU utilization, etc., and then decide how to get there.
Oh, that one's EASY.

The global routing table is hereby capped at 125K routes. After that, if you want a route, you have to pay somebody to give up theirs.

Problem solved ;)

This will have some advantages - it will make companies that want to multi-home calculate the actual benefit of doing so ("we should multihome" becomes "it would cost an estimated $nnK a year in downtime/unreachability/lost sales") so they know how much they want to bid for a routing table entry. For many companies, it may not actually make as much business sense to multihome as they thought.

ISPs will have a new thing to market - premium services to enhance reliability and uptime without a route announcement (more aggressive marketing of multihoming to 2 POPs of the same ISP for a discount off the normal price for 2 pipes?)

In the dot-bombed crash, a large number of companies will probably be willing to sell off their route for a quick infusion of cash. Route squatters will probably not be as big an issue as domain squatters.

Disadvantages? ARIN and company are unpopular enough without acting as a commodity trade market for buying and selling routes. And the SEC will of course be on the lookout for insider trading in route futures - expect investigations the first time somebody shorts on a future. ;)

It would be a strange new world - but at least the routing table wouldn't be growing. ;)

/Valdis
The global routing table is hereby capped at 125K routes. After that, if you want a route, you have to pay somebody to give up theirs.
Of course, we could adopt geographic allocations. North America is still working on (+1) in E.164 space. We could shrink the global route table to a few thousand routes.
--On Friday, 28 September, 2001 10:12 PM -0400 Sean Donelan <sean@donelan.com> wrote:
Of course, we could adopt geographic allocations. North America is still working on (+1) in E.164 space. We could shrink the global route table to a few thousand routes.
We have this at a continental level. At less than a continental level the argument against this is that at lower distances there is a poorer and poorer map between geographic proximity and (network) topological proximity. Pick any major US city without a popular peering point / private peering for a trivial example. Alex Bligh Personal Capacity
On Fri, 28 Sep 2001, E.B. Dreger wrote:
I think we should encourage people to introduce individual /32s into the network and flap them around a bit, to force some
I've not seen anyone suggest allowing longer than /24 in this thread. However, I'll definitely admit that, with name-based hosting, some webhosts most certainly could want to announce long prefixes.
Filtering on prefix size is a pretty absurd idea. A network is not automatically unimportant because it has few addresses. A.ROOT-SERVERS.NET has a single address and www.cnn.com several within something that could be a /25 and a /27. I sure want to be able to reach those as efficiently and reliably as possible. Why should they announce 4000 extra unused addresses just to avoid filtering?

On the other hand, filterers do have a point: why are there so many /24s in the global routing table? But then again, this also happens to a lesser degree for larger blocks. Have a look at the 24.x.y.z space, this is pretty ridiculous.

Obviously, some networks don't care about the size of the routing table and announce hundreds of routes. Other networks do, and filter the easy targets. (And some networks manage to fall into both categories.) The result being that a group that didn't cause the problem suffers and the problem is not really solved.
3. Establish guidelines on what is "acceptable" table size, CPU utilization, etc., and then decide how to get there.
I don't think this is going to happen. Even if we can agree on these things _today_, everybody has a different view of what is going to happen in the future and how we should prepare for that.
<conspiracy_theory> Are big providers so desperate for business that they want to prevent customers from multihoming, attempting to be the sole vendor of bandwidth? </conspiracy_theory>
I don't think they are actively doing this, but if filtering is "sound engineering" and it happens to make life harder for a lot of those annoying small competitors, well, they can't help that, can they?
All that said, I _do_ favor IP allocation based on region. Say I connect to KSCYMO, which connects to CHCGIL or DLLSTX... IP allocation would be from a sub-ARIN entity in one of those regions. Make space portable between providers...
Wait a second. All of this sounds vaguely familiar... ;-)
The problem with this and many other good ideas is that they can only work well if they are widely adopted. And there are always people who have reasons (legitimate or otherwise) why they want another solution or keep things as they are.

But I agree that some form of regional filtering and/or addressing could be beneficial. I live 30 miles from a major interconnect point. I would rather have 30k prefixes up to /24 or even larger that are reachable over this exchange point in my routing table and have a default for the rest of the world, than run full routing but only for RIR assigned blocks.

But then, I buy transit so I don't have to be defaultless. But a defaultless network could accept large prefixes at exchange points but keep them local and only propagate RIR block filtered routes throughout the network. This would work better if routes were colored with information about their origin region, though.

Even better would be if the RIRs would divvy up the world in 10 - 20 regions, and allocate a /8 - /10 to each. That way, the routers don't have to know all individual routes to some remote region, but they can simply forward the traffic to a part of the network that does know the region-specific routes.

If anybody bothers to reply to this, you will see that there are numerous reasons why this isn't "the" solution. However, it may help some people some of the time, and it doesn't impact those who don't want to use it. And, more importantly: it doesn't require universal cooperation. Just the RIRs.

Iljitsch van Beijnum
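The "local specifics plus a default" arrangement described above relies on ordinary longest-prefix matching. A minimal sketch, assuming a hypothetical table with made-up next-hop labels and documentation prefixes (none of these values come from the thread):

```python
import ipaddress


def lookup(table, address):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins over the default route.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical table: long prefixes learned at a nearby exchange point,
# plus a default route toward a transit provider for everything else.
table = [
    (ipaddress.ip_network("0.0.0.0/0"), "transit"),
    (ipaddress.ip_network("198.51.100.0/24"), "exchange-peer-a"),
    (ipaddress.ip_network("203.0.113.0/25"), "exchange-peer-b"),
]
```

Here `lookup(table, "203.0.113.5")` follows the exchange route, while `lookup(table, "203.0.113.200")` falls outside the /25 and goes to the default.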
Date: Wed, 3 Oct 2001 11:02:44 +0200 (CEST) From: Iljitsch van Beijnum <iljitsch@muada.com>
Filtering on prefix size is a pretty absurd idea. A network is
[ snip anecdotes: a.root-servers.net & www.cnn.com ]

And eBay's /24, /23, and /22 blocks; see one of my earlier posts.

Yes, I agree... I probably should reverse myself: filtering > /24 is _not_ acceptable. My reasoning was that, if some idiot announces each dialup /32, their upstream would want to use filters. Prefixes shorter than /24 could quickly chew up 100K routes (I'm too lazy to do the math, but anyone following this thread is more than capable), but I'd consider the probability to be rather low.

Providers are pretty good about distribute-list and filter-list checks... if you want to advert more than the space they provide, you contact them out-of-band. Maybe we do something similar re prefix length.

Sure, I'd love to move the route count problem to the edge. But the whole reason that places filter is because they think that the edge _isn't_ doing a good enough job. Perhaps we should enforce prefix length adverts top-down, in the same manner that IP space utilization is enforced?

Now, Verio's policy wouldn't be so bad if it correlated with something official from RIRs. e.g.:

* Globally-routable /32 in 126/8
* Globally-routable /27-/30 in 125/8
* ... /24-/26 in 124/8

My complaint is that Verio is taking rather great liberties in their filtering that, IMHO, _do not_ correlate well with allocations. Good idea in an ideal world, but they need to operate in reality. If we can change reality... all the power to them. I'll turn totally pro-filtering if we can accurately say "this is the shortest globally-routable prefix allowed in _this_ netblock".

Wait a second... swamp /24s haven't all been returned yet. No, the above paragraph just won't work. We could require justification of existing netblocks... no, that would mean pain for everyone, not just new allocations.

I guess that we'll keep going along Status Quo Road until we run out of IPv4 space, then go on a big witch-hunt. Anyone care to mark my words on this? (I only hope that I'm wrong!)
On the other hand, filterers do have a point: why are there so many /24s in the global routing table? But then again, this also happens to a lesser degree for larger blocks. Have a look at the 24.x.y.z space, this is pretty ridiculous.
No kidding. FWIW, we advert three routes: /22, /22, /23. A couple of /24s will be announced soon. Like Jeff, we'd gladly renumber, into a single /20 in our case. When we're through getting beaten up by ARIN and get a PI /20, we'll do just that. In the mean time, we'll keep advertising 3x as many routes as we should.

I know of another place that is renumbering into a PI /19, and will finally give up about 8-10 longer prefixes. More table pollution.

Current IP allocation policies are just not conducive to efficient routing tables. We little guys can't get portable space, and upstreams must use theirs efficiently. Result? Routing table fragmentation. Pre-CIDR days, anyone?
Obviously, some networks don't care about the size of the routing table and announce hundreds of routes. Other networks do, and filter the easy targets. (And some networks manage to fall into both categories.) The result being that a group that didn't cause the problem suffers and the problem is not really solved.
Yup. [ snip regional-routing discourse ]
Even better would be if the RIRs would divvy up the world in 10 - 20 regions, and allocate a /8 - /10 to each. That way, the routers don't have to know all individual routes to some remote region, but they can simply forward the traffic to a part of the network that does know the region-specific routes.
Aggregation at its finest. :-) Furthermore, if one could _know_ that a given netblock was in a specific geographical location, one could more easily correlate latency with netblocks. Sure, 202/7 is APNIC territory. Alas, that's just a best case under the current scenario.
If anybody bothers to reply to this, you will see that there are numerous reasons why this isn't "the" solution. However, it
Anyone with The Solution is free to flame anything I have said. All public lartings will be accepted.
may help some people some of the time, and it doesn't impact those who don't want to use it. And, more importantly: it doesn't require universal cooperation. Just the RIRs.
Even that could be difficult. :-( But it's definitely orders of magnitude better than universal cooperation.
Iljitsch van Beijnum
Eddy
Also sprach E.B. Dreger
FWIW, we advert three routes: /22, /22, /23. A couple of /24s will be announced soon. Like Jeff, we'd gladly renumber, into a single /20 in our case. When we're through getting beaten up by ARIN and get a PI /20, we'll do just that. In the mean time, we'll keep advertising 3x as many routes as we should.
*IF* you get a /20. I don't know how anal ARIN is about it, but looking at the rough numbers you posted, I'm not sure you technically qualify. I could be wrong... I'm at a remote pop at the moment waiting for cisco TAC to call me back, so can't double-check.

Our experience was that we had a couple of /24's, a couple of /23's and a /20. ARIN gave us another /20 without any requirements to renumber out of any of our existing blocks (which, to be quite honest, we were expecting to have to do). If they had given us a /19, we'd have been fine (not looking *forward* to the process of renumbering, but perfectly willing to do so) with the process of renumbering out of one or more of our older blocks.

Again, not only is the incentive to renumber into more aggregatable blocks not there, there's actually a *dis*incentive to do so.

--
Jeff McAdams                        Email: jeffm@iglou.com
Head Network Administrator          Voice: (502) 966-3848
IgLou Internet Services                    (800) 436-4456
On Wed, 3 Oct 2001, Iljitsch van Beijnum wrote:
Even better would be if the RIRs would divvy up the world in 10 - 20 regions, and allocate a /8 - /10 to each. That way, the routers don't have to know all individual routes to some remote region, but they can simply forward the traffic to a part of the network that does know the region-specific routes.
I'm afraid that doesn't work. It's great when there is exactly one provider and nobody multihomes. As soon as people start multihoming then they have to start announcing smaller prefixes everywhere. Then people will no longer have circuits to the previous monopoly provider so even if you routed to the /8 it won't get through.

Sift things around for a few years and you have people in that region connecting to every possible backbone provider plus most of the 2nd tiers and misc other countries.

Take a look at 203.0.0.0/10 (from memory) which is Telstra's allocation for Australia. Almost every single IP in that range is in Australia but there are hundreds of different paths as ISPs in that range have switched providers and circuits over the years.

Didn't we have this argument with 8+8 ?

--
Simon Lyall.                |  Newsmaster  | Work: simon.lyall@ihug.co.nz
Senior Network/System Admin |  Postmaster  | Home: simon@darkmere.gen.nz
ihug, Auckland, NZ          | Asst Doorman | Web: http://www.darkmere.gen.nz
On Thu, 4 Oct 2001, Simon Lyall wrote:
Even better would be if the RIRs would divvy up the world in 10 - 20 regions, and allocate a /8 - /10 to each.
I'm afraid that doesn't work. It's great when there is exactly one provider and nobody multihomes. As soon as people start multihoming then they have to start announcing smaller prefixes everywhere.
Only when multihomers routinely connect to networks that only interconnect outside the region. In other words: as long as there is at least one widely-used interconnect point in the region, this should not be a problem. (There are some (rare, IMHO) failure modes that are not fatal with current practice but are fatal in this scenario, though.)

10 to 20 regions means about three regions to a continent. That's not too unreasonable.
Sift things around for a few years and you have people in that region connecting to every possible backbone provider plus most of the 2nd tiers and misc other countries.
But Asian/Australian networks tend to connect to the US west coast, European networks to the US east coast. And even if a relatively large number of exceptions exist, savings are possible.
Didn't we have this argument with 8+8 ?
I wasn't there... But the argument shouldn't be about how much this will help, but about how much it will hurt. I don't think it will hurt anyone, so even if there is just a chance that it will help, we should do it.
Date: Sun, 7 Oct 2001 22:38:44 +0200 (CEST) From: Iljitsch van Beijnum <iljitsch@muada.com>
[ snip ]
10 to 20 regions means about three regions to a continent. That's not too unreasonable.
Furthermore, nothing says that there must be a mapping stating "this IP space is for this one region".

Let's say that, in the U.S., CHI is the base for "north", DFW for "south", D.C. for "east", and Bay area for "west". All except E/W are valid combos. (e.g.: being in KS, I could be in "north" or "south", connected to CHI or DFW.)

The number of region combos is "4 choose 2 minus 1", or 5:

+ N/E 126.0.0.0/11
+ N/W 126.32.0.0/11
+ N/S 126.64.0.0/11
+ E/S 126.96.0.0/11
+ W/S 126.128.0.0/11

Assign IP space based on one of those regions...
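The "4 choose 2 minus 1" count and the /11 assignments above can be reproduced with a short Python sketch. The variable names are illustrative only; the region labels and the 126/8 block are the ones used in the message.

```python
import ipaddress
from itertools import combinations

regions = ["N", "E", "W", "S"]
invalid = {frozenset(("E", "W"))}  # the one combo ruled out above

# All two-region pairs except the invalid one: C(4, 2) - 1 = 5.
pairs = [p for p in combinations(regions, 2) if frozenset(p) not in invalid]

# Hand out consecutive /11s from 126.0.0.0/8, one per valid pair.
blocks = ipaddress.ip_network("126.0.0.0/8").subnets(new_prefix=11)
plan = {"/".join(pair): str(block) for pair, block in zip(pairs, blocks)}
```

Five /11s use only five of the eight available in a /8, so this particular plan would leave three /11s of 126/8 unassigned.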
Sift things around for a few years and you have people in that region connecting to every possible backbone provider plus most of the 2nd tiers and misc other countries.
...rinse and repeat for E-US/W-EU, W-US/E-JP, etc.
But Asian/Australian networks tend to connect to the US west coast, European networks to the US east coast. And even if a relatively large number of exceptions exist, savings are possible.
I agree. Any comments on my above overlapping system? It's virtually impossible for one to no longer connect to one's "home" region. If "two closest points" isn't flexible enough, we can move to three closest points: "N choose 3 minus invalid_combos" is still fewer routes by far than the status quo.

Let's take this a step further. Say that we divide the US into these "major hubs": Seattle, SF Bay, LA, San Diego, Phoenix, Salt Lake, Denver, DFW, Kansas City, Saint Louis, Chicago, Atlanta, Miami, D.C., NYC, Boston, Philadelphia, Twin Cities.

Yes, I'm ignoring many cities. So what. This is an example... everyone feel free to tear it apart and improve upon it.

I count 18 different hubs. Now let's say that we divide address space such that a given netblock can be native to any of five different hubs -- "18 choose 5" different netblocks = 8568 netblocks. Now consider how many are invalid... the actual number is much lower.

Using this logic, we can divide the entire CONUS into a few thousand netblocks.

Let's say that I use 125.100.75.50/24. Let's further assume that this is in 125.96.0.0/11, which is "KC+STL+CHI+DFW+DEN". Any backbone provider servicing me in Wichita probably will connect to one of those hubs.

I announce my /24 to Savvis and GBLX. They announce to peers. Peers can agg geographic traffic as they please. Someone in NYC who uses Sprint only sees 125.96.0.0/11, and knows that Sprint can get there... and that's all that matters.

To get from NYC to Wichita, Sprint will interconnect with Savvis or GBLX in KC, STL, CHI, DFW, or DEN.

I know that this creates peering problems, and the system won't quite work as stated... but I'm trying to brainstorm to the list in hopes that _something_ will come of it.
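The arithmetic in the hub example can be checked with a short Python sketch (it needs Python 3.8 or later for `math.comb`). The variable names are illustrative; the numbers and prefixes are the ones from the message.

```python
import ipaddress
import math

hubs = 18           # hubs listed in the example
hubs_per_block = 5  # each netblock is "native" to five hubs

# Upper bound on netblocks before discarding invalid combinations.
upper_bound = math.comb(hubs, hubs_per_block)

# The example address, read as a host inside its /24, and the regional /11.
customer = ipaddress.ip_interface("125.100.75.50/24")
regional = ipaddress.ip_network("125.96.0.0/11")

# A remote network carrying only the /11 still covers the customer's /24.
covered = customer.network.subnet_of(regional)
```

Note that 125.100.75.50/24 has host bits set; the announced network would be 125.100.75.0/24, which is what `customer.network` yields.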
Didn't we have this argument with 8+8 ?
I wasn't there... But the argument shouldn't be about how much this will help, but about how much it will hurt. I don't think it will hurt anyone, so even if there is just a chance that it will help, we should do it.
Sort of... renumbering for naught is a bad thing. However, using a new, even marginally better, policy on new IP space would help.

Back to server building...

Eddy
On Sun, 7 Oct 2001, E.B. Dreger wrote:
10 to 20 regions means about three regions to a continent. That's not too unreasonable.
Furthermore, nothing says that there must be a mapping stating "this IP space is for this one region".
Let's say that, in the U.S., CHI is the base for "north", DFW for "south", D.C. for "east", and Bay area for "west". All except E/W are valid combos. (e.g.: being in KS, I could be in "north" or "south", connected to CHI or DFW.)
There are many ways this could work. I think a system where addresses are used in a smaller area would probably be better: you can always decide to accept Kansas addresses in Chicago or New York or Madrid if you want (as long as ISPs announce customer routes everywhere), but once some people in Kansas only connect to Dallas and others only to Chicago, some of the advantage is lost: you have to connect to both.
But Asian/Australian networks tend to connect to the US west coast, European networks to the US east coast. And even if a relatively large number of exceptions exist, savings are possible.
I agree. Any comments on my above overlapping system? It's virtually impossible for one to no longer connect to one's "home" region. If "two closest points" isn't flexible enough, we can move to three closest points: "N choose 3 minus invalid_combos" is still fewer routes by far than the status quo.
Suppose we are both global networks, but we interconnect only in a few places. Suppose my idea of Kansas is "north" and yours is "south". Obviously, if we could agree on an interconnect point where routes to Kansas belong, we both wouldn't have to carry more specifics than the regional aggregate outside this region. But if I accept your Kansas routes in Dallas and you mine in Chicago, everything still works, there are just no savings.
Let's take this a step further. Say that we divide the US into these "major hubs":
[...]
Let's say that I use 125.100.75.50/24. Let's further assume that this is in 125.96.0.0/11, which is "KC+STL+CHI+DFW+DEN". Any backbone provider servicing me in Wichita probably will connect to one of those hubs.
Yes, but what if there is no overlap? If two networks only know those routes in that part of the country, but don't interconnect in that region, there is a problem and more specifics have to be carried throughout a larger part of the networks. However, a network that doesn't interconnect in this region can accept Denver routes in the Bay area and Chicago routes at the Sprint NAP, if Denver and Chicago use different regional prefixes.
Didn't we have this argument with 8+8 ?
I wasn't there... But the argument shouldn't be about how much this will help, but about how much it will hurt. I don't think it will hurt anyone, so even if there is just a chance that it will help, we should do it.
Sort of... renumbering for naught is a bad thing. However, using a new, even marginally better, policy on new IP space would help.
I don't think many people will renumber for this. And it only applies to multihomers anyway, routes from well-aggregated PA space will presumably still be carried world wide. As long as we're on the subject: it wouldn't hurt if the regional registries looked at allocating bigger chunks of address space to large ISPs. There are ASes that announce hundreds of routes. That's not good. It seems like the RIRs are afraid to carve off big chunks of address space. Why? Assigning someone a /16 and keeping the next 15 /16s free in case he comes back soon doesn't mean those 15 /16s can never be allocated to anyone else any more. But giving someone a /16 and the next person the next /16 DOES mean the first one will never be able to aggregate two /16s into a /15. Iljitsch van Beijnum
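The point about adjacent allocations can be illustrated with `ipaddress.collapse_addresses` from the Python standard library. The helper name and the 10.x prefixes below are made up for the example; they are not real allocations.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefixes into the fewest covering routes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Second /16 taken from the space held free next to the first one:
# the two merge into a single /15 announcement.
adjacent = aggregate(["10.0.0.0/16", "10.1.0.0/16"])

# Second /16 allocated elsewhere because the neighbour went to someone
# else: two routes remain.
scattered = aggregate(["10.0.0.0/16", "10.16.0.0/16"])

# Adjacent but not aligned on a /15 boundary: also two routes.
misaligned = aggregate(["10.1.0.0/16", "10.2.0.0/16"])
```

The last case shows that adjacency alone is not enough: the reserved space has to sit on the right bit boundary for the later allocation to aggregate.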
participants (8)
- Alex Bligh
- E.B. Dreger
- Iljitsch van Beijnum
- Jeff McAdams
- Sean Donelan
- Simon Lyall
- smd@clock.org
- Valdis.Kletnieks@vt.edu