there has to be a limit.
A limit is needed, but to me the filtering method in question essentially says this:
if you have 64.x.x.x/15, slice it into as many /20's as you can and bloat as much as you want... we feel this is an acceptable practice.
i strongly doubt that the policy was formulated on that basis. it may or may not be equivalent to what you said, but you have not described anyone's (whom i'm aware of) actual motivations with the above formulation.
Yet, if a legitimately multihomed customer wants to push out a single /24 (1 AS, 1 prefix), that is not considered acceptable.
actually there's a loophole. nobody that i know of filters swamp /24's, since so many of the oldest salty crusty layers of the internet are built on those.
The only kind of prefix filtering I would want to implement is something that can accomplish the following:
1. Define a threshold, say /20 or /18 or, hell, even /8.
2. All prefixes longer than the threshold get held until the entire table is loaded.
3. Start looking at the longer prefixes across the entire IPv4 space, beginning with the longest and finishing at threshold+1.
4. If a prefix longer than the threshold appears as part of a larger aggregate block that *originates* from the same AS, drop it.
5. If a prefix longer than the threshold originates from a different AS than the aggregate, accept it.
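A minimal sketch of that filter in Python, assuming the fully loaded table is available as (prefix, origin AS) pairs; the names and the threshold value are purely illustrative:

    import ipaddress

    THRESHOLD = 20  # step 1: anything /20 or shorter passes unconditionally

    def filter_table(routes):
        # routes: list of (prefix_string, origin_as) -- the whole table (step 2)
        parsed = [(ipaddress.ip_network(p), asn) for p, asn in routes]
        accepted = [(n, a) for n, a in parsed if n.prefixlen <= THRESHOLD]
        longer = [(n, a) for n, a in parsed if n.prefixlen > THRESHOLD]
        # step 3: walk the long prefixes, longest first, down to threshold+1
        for net, asn in sorted(longer, key=lambda x: -x[0].prefixlen):
            same_origin_aggregate = any(
                agg != net and net.subnet_of(agg) and agg_asn == asn
                for agg, agg_asn in parsed
            )
            if not same_origin_aggregate:
                accepted.append((net, asn))  # step 5: different origin, accept
            # step 4: covered by an aggregate from the same AS -> drop
        return accepted

For example, filter_table([("64.0.0.0/15", 1), ("64.0.16.0/24", 1), ("64.0.32.0/24", 2)]) drops the AS 1 /24 as redundant deaggregation but keeps the AS 2 /24 as a multihomed more-specific.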
i wish you luck in implementing this proposal. i think that folks with multivendor global networks will find it completely impractical, but you can probably pull it off in a regional zebra-based network with no problem.
This way I could get rid of redundant information yet at the same time not cause any trouble for smaller multihomed customers. I'm not saying that we should allow /32's to be pushed everywhere either. As you said, there has to be a limit, and /24 seems to be a pretty good one if something along the lines of the above-mentioned filtering algorithm could be used.
let's do some math on this. swamp space is more or less 192/8 and 193/8 (though parts of other /8's were also cut up with a pretty fine bladed knife). if every 192.*.*/24 and 193.*.*/24 were advertised, that would be more prefixes than the entire current table shown in tony bates' reports (128K of them, vs ~100K in the table today). that is of course just the existing swamp, and it would be hard to handle but even harder to prevent, since there's no real way using today's routers to say "accept the current /24's in 192/8 and 193/8 but don't allow new ones". this is the bogey man that gives people like smd nightmares.

then there's everything else. if 20 /8's were cut up into /24's then tony bates' report would have 1.3M more things in it than are there today. if the whole IPv4 space were cut up that way then we'd see 16M routes globally. those numbers may seem unreasonable, either because current routers can't hold them, or because current routing protocols would never be able to converge, or because you just can't imagine humanity generating even 1.3M /24's, let alone 16M of them.

multihoming is a necessary property of a scalable IP economy. actually, provider independence is necessary; multihoming is just a means to that end. if you don't think there are more than 1.3M entities worldwide who would pay a little extra for provider independence, then you don't understand what's happened to *.COM over the last 10 years. in that case i'll simply ask you to take my word for it -- you make 1.3M slots available, they'll fill up.

i do not know the actual limit -- that is, where it ends. i know it's going to be higher than 1.3M though. i also know that the limit of humanity's desire for "provider independence without renumbering" (or "multihoming") is currently higher than what the internet's capital plant, including budgeted expansions, can support. and i strongly suspect that this will remain true for the next 5..10 years.
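for the record, the arithmetic behind those numbers (a /8 holds 2^(24-8) /24's):

    per_slash8 = 2 ** (24 - 8)   # 65,536 /24's in each /8
    print(2 * per_slash8)        # 192/8 + 193/8: 131,072, i.e. ~128K
    print(20 * per_slash8)       # twenty /8's:   1,310,720, i.e. ~1.3M
    print(2 ** 24)               # all of IPv4:   16,777,216, i.e. ~16M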
I'm sure that in reality there are many reasons this could not be implemented (CPU load, perhaps), but it would at least do something more than the current prefix filtering: a "gross hack" that nails some offenders but by no means all, and that impacts multihomed customers, who are only a portion of the problem, without actually solving it.
people are out there building networks using available technology. forget about CPU load and look at delta volume and convergence. the "internet backbone" is not fat enough to carry the amount of BGP traffic that it would take to represent the comings and goings of 16M prefixes. 1.3M is probably achievable by the time it comes due for natural causes. do any of our local theorists have an estimate of how much BGP traffic two adjacent core nodes will be exchanging with 1.3M prefixes? is it a full DS3's worth? more? less?

every time you change out the capital plant on a single global AS core in order to support some sea change like 10Gb/s sonet or 200K routes, it costs that AS's owner between US$200M and US$1B, depending on the density and capacity. bean counters for old line telcos used to want a 20 year payback (depreciation schedule) on investments of that order of magnitude. today a provider is lucky to get five years between core transplants. bringing the period down to two to three years would cause "the internet" to cost more to produce than its "customers" are willing to pay.

so in the meanwhile, verio (and others who aren't mentioned in this thread) are using the technology they have in order to maximize the period before their capital plant becomes obsolete. as i said in a previous note, they are certainly balancing their filters so that filtering more would result in too many customer complaints due to unreachability, but filtering less would result in too many customer complaints due to instability.

anyone who wants the point of equilibrium to move in the direction of "more routes" should be attacking the economies which give rise to the problem rather than attacking the engineering solutions which are the best current known answer to the problem. in other words, go tell cisco/juniper/whomever your cool idea for a new routing protocol / route processing engine / cheap OC768-capable backplane and maybe they'll hire you to build it for them.
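for scale, a back-of-envelope on the DS3 question -- the UPDATE size and churn rate here are pure guesses, not measurements:

    prefixes     = 1_300_000
    update_bytes = 60                 # ASSUMED average UPDATE size on the wire
    ds3_bps      = 45_000_000         # a DS3, roughly

    # worst case: a session reset replays the whole table to one peer
    print(prefixes * update_bytes * 8 / ds3_bps)     # ~13.9 seconds of a full DS3

    # steady state, ASSUMING each prefix churns twice a day on average
    print(prefixes * 2 * update_bytes * 8 / 86_400)  # ~14,400 bits/sec

on those guesses the steady state is noise and the pain is all in convergence bursts; change the churn assumption and the answer moves linearly.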
Date: 29 Sep 2001 12:39:27 -0700
From: Paul Vixie <vixie@vix.com>
[ snip ]
anyone who wants the point of equilibrium to move in the direction of "more routes" should be attacking the economies
"More routes" is too simplistic, at least for the "near future". "A greater number of useful routes" is what I think people are supporting. Given your point about many companies wanting to multihome, I agree that we can easily exceed 1M routes. See suggestion #3 below. Of course, there are screwballs such as someone who comes to mind who _claims_ OC-48 connectivity (not colo's bandwidth, but their own OC-48 line)... yet is single-homed. Supposedly they are so happy with their upstream that they have no desire to multihome. Frankly, I'd rather have tons of OC-3 to diverse backbones, but my point is that not everyone wants to multihome. How many _should_ want to? Most everyone. How _many_ do? I don't have the answer.
which give rise to the problem rather than attacking the engineering solutions which are the best current known answer to the problem. in other words go tell cisco/juniper/whomever your cool idea for a new routing protocol / route processing engine / cheap OC768-capable backplane and maybe they'll hire you to build it for them.
1. PI microallocations (e.g. /24) aligned on, for example, /19 boundaries. Need more space? Grow the subnet. One advert, because the IP space is contiguous. Cost: change of policy at the RIRs.

2. Responsibility for spam finds its way to the originating network. Why not filtering and aggregation? (No flame wars, please... the mention of spam is an analogy, not a desire to bring back certain flame wars after such a short while.) Cost: individual responsibility and interacting with adjacent ASNs.

3. I'd suggest merging "best" routes according to next-hop, but the CPU load would probably be a snag. Flapping would definitely be a PITA, as it would involve agg/de-agg of netblocks. Maybe have a waiting period before agg/de-agg when a route changes... after said wait (which should be longer than the amount of time required to damp said route), proceed with netblock consolidation. I'm mulling some refinements to this, which I'll bring up if the discussion takes off. (Good idea, bad idea, flame war, I really don't care... if we eventually make progress, that's what counts.) Cost: anyone care to estimate the resources required? Any good algorithms for merging subnets? (A sketch follows below.)

Feel free to flame me for any oversights. <excuse>I'm attempting to multitask</excuse> and am well aware that I may have omitted something.

Eddy
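A sketch of the netblock merge from suggestion 3, leaving out the damping/waiting-period machinery; it leans on Python's ipaddress.collapse_addresses, and all the names are made up:

    import ipaddress
    from collections import defaultdict

    def consolidate(rib):
        # rib: list of (prefix_string, next_hop)
        by_nh = defaultdict(list)
        for prefix, nh in rib:
            by_nh[nh].append(ipaddress.ip_network(prefix))
        fib = []
        for nh, nets in by_nh.items():
            # collapse_addresses only merges adjacent/overlapping blocks, so
            # the result covers exactly the same space per next-hop and
            # longest-match forwarding is unchanged
            for merged in ipaddress.collapse_addresses(nets):
                fib.append((merged, nh))
        return fib

    # two adjacent /24's toward "A" become one /23; "B" stays as-is:
    print(consolidate([("10.0.0.0/24", "A"), ("10.0.1.0/24", "A"),
                       ("10.0.2.0/24", "B")]))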
On Sat, 29 Sep 2001, E.B. Dreger wrote:
Given your point about many companies wanting to multihome, I agree that we can easily exceed 1M routes.
It is of course important not to underestimate the demand for multihoming. But on the other hand, after being in this business for a while, it's very easy to overestimate. In the absence of hard numbers, I assert that there are fewer than 10k real multihomers. If we assume that future multihomers will behave and only announce a single route, going from 10k to 1M multihomers is a factor of 100. Even in this business, few things grow that fast... Paul's comparison to .COM is not a good one, because getting an additional domain doesn't cost you any hardware, and multihoming does.
How many _should_ want to? Most everyone. How _many_ do? I don't have the answer.
Multihoming costs a lot of money, so I doubt we will ever see a billion multihomers (which was the upper limit nobody bothered to protest against on multi6; 100M was still considered possibly too low by some).
1. PI microallocations (e.g. /24) aligned on /19 (for example) boundaries. Need more space? Grow the subnet. One advert because IP space is contiguous.
In practice, this already happens. If you become an ISP, you get a /20 _allocated_ even if you don't get a lot of addresses _assigned_.
Cost: Change of policy at RIRs.
And many innocent IPv4 addresses suffer.
3. I'd suggest merging "best" routes according to next-hop, but the CPU load would probably be a snag. Flapping would definitely be a PITA, as it would involve agg/de-agg of netblocks. Maybe have a waiting period before agg/de-agg when a route changes... after said wait (which should be longer than the amount of time required to damp said route), proceed with netblock consolidation.
It would be an interesting project to make an algorithm that takes a BGP table and creates the shortest possible FIB that forwards traffic in accordance with that BGP table. For dual-homed networks, you should always be able to drop more than half the routes and install a default instead.

But the actual size is not really the problem: with some redesign you can put 2 GB of memory in a router. Also, it should be possible to encode this information much more efficiently, maybe even to the degree that a route takes only a few bytes of memory. (http://www.muada.com/projects/bitmaprouting.txt)

The real problem is processing the updates. This scales as O(N log N): more routes mean more updates, and each update also takes longer because it has to be done against a larger table. Fortunately, BGP is pretty brain dead in this area. See http://www.research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2000-74 to read about the count-to-infinity problem that BGP inherited from RIP. Fortunate, because a lot of improvement should be possible.

Iljitsch van Beijnum
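A toy version of the dual-homed case, with hypothetical names: install a default toward whichever next-hop carries the most routes and keep only the exceptions.

    from collections import Counter

    def minimize_fib(rib):
        # rib: list of (prefix, next_hop); returns (default_next_hop, exceptions)
        counts = Counter(nh for _, nh in rib)
        default_nh, _ = counts.most_common(1)[0]
        exceptions = [(p, nh) for p, nh in rib if nh != default_nh]
        return default_nh, exceptions

With two upstreams the exceptions can never be more than half the table; as the follow-up below points out, the trade-off is that traffic to unrouted space now follows the default instead of being rejected locally.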
Also sprach Iljitsch van Beijnum
It would be an interesting project to make an algorithm that takes a BGP table, and creates the shortest possible FIB that has traffic being forwarded in accordance with this BGP table. For dual homed networks, you should always be able to drop more than half the routes and install a default instead.
This isn't exactly true. Keep in mind that with a full BGP table and no default, if you send traffic to a bogus IP address you get a Host Unreachable from your router. Drop half the routes and put a default in, and that traffic follows the default and at least makes it to the next hop before getting a Host Unreachable. This probably isn't a major deal; about the only real operational impact is that the link your default points to gets a bit more traffic on it (whether that would even be measurable is doubtful), but it doesn't exactly fit the constraints of the project you described, as the forwarding wouldn't be in accordance with the original BGP table. </nit>

--
Jeff McAdams
participants (4)

- E.B. Dreger
- Iljitsch van Beijnum
- Jeff McAdams
- Paul Vixie