"Joseph T. Klein" <jtk@titania.net> writes:
Having hopelessly screwed up my facts ... I was trying to make a point here. So the router was worse than I thought. Retaining policies that exclude new players because of the AGS+'s inability to handle large routing flaps just does not cut it.
Sprint imposed this at a time when 7000s with 64M of memory were available. Will /19 remain policy when the majors are running with Cisco 12000s and GRFs?
I am not sure what point you are trying to make here. If you feel like grinding this axe again, I am more than willing to play in my fleeting spare moments.

An AGS+ with a CSC/4 has exactly the same CPU as a 7000 with an RP. In fact, AGS+ performance is slightly higher in some cases because of some interesting design features of the 7000 and the other AGS+ downgrade-path routers (notably the 7500 series).

I did not put in the prefix-length filter (yes, it was at /18 initially, and was changed to /19 after much discussion with the registries, especially Daniel Karrenberg at RIPE, in an attempt to harmonize Sprint's filters with slow-start allocation policies) because of the AGS+ difficulties; all the routers that were carrying full routing at the time had 64 Mbytes of RAM, and the two remaining AGS+es were there to implement historical things done principally for ICM (like a STUN connection and the PANAMSAT router).

What triggered the filter was the observation that the blocks freshly allocated by all three registries were very poorly aggregated. More annoyingly, those allocated to Sprint's principal peers (most notably Internet MCI) demonstrated the worst aggregation; in one case a /14 was announced almost exclusively as prefixes no shorter than 19 bits. After spending some time trying to chase this down -- with some success, as in the case of PSI's then-newest blocks, but not in the case of Internet MCI, who did nothing -- I decided to issue a warning: once the /8 that the InterNIC was allocating from had filled up (and, after some discussion, once RIPE and APNIC proceeded to allocate from new /8s), I would begin ignoring Sprintward announcements of any prefix longer than 18 bits within all newly allocated unicast address space. Moreover, I also announced that I would filter out any subnets of historically classful As and Bs.

The warning was several months old when people started noticing that they couldn't reach things behind Sprintlink, and a lot of time was spent explaining to people that this shouldn't have surprised them at all. Some changes happened: notably, I dropped down to 19 bits, and the registries began to explain to people that anything longer than that almost certainly would not be routable, and that allocation != routing.

This measurably flattened the growth curve of the number of prefixes seen by default-free routers, changing it from a nearly exponential function to a linear one, with a slope below that of Moore's law. In other words, it probably did as much as the initial introduction of supernetting as a concept to keep the Internet scalable while it continued to use the current set of routing protocols.
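For concreteness, here is a minimal sketch (in Python) of that kind of inbound length filter. The example /8s are made up, and the helper names are mine; none of this is Sprint's actual configuration.

    from ipaddress import IPv4Network

    # Stand-ins for the "fresh" registry blocks; the real filter tracked the
    # /8s that the InterNIC, RIPE and APNIC were newly allocating from.
    NEW_ALLOCATION_BLOCKS = [IPv4Network("205.0.0.0/8"), IPv4Network("206.0.0.0/8")]
    CLASS_A = IPv4Network("0.0.0.0/1")    # historically classful A space
    CLASS_B = IPv4Network("128.0.0.0/2")  # historically classful B space

    def accept_announcement(prefix: IPv4Network) -> bool:
        """True if an inbound announcement passes the length filter."""
        # Subnets of historically classful As and Bs are ignored; only the
        # classful network itself (/8 or /16) gets through.
        if prefix.subnet_of(CLASS_A) and prefix.prefixlen > 8:
            return False
        if prefix.subnet_of(CLASS_B) and prefix.prefixlen > 16:
            return False
        # Within newly allocated space, nothing longer than 19 bits
        # (originally 18 bits) is accepted.
        if any(prefix.subnet_of(block) for block in NEW_ALLOCATION_BLOCKS):
            return prefix.prefixlen <= 19
        return True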
If aggregation is the goal then mechanisms should be developed for exchanging CIDR blocks so the address space can be re-packed.
It is time for everyone to learn a term that, unfortunately, I did not invent: IPv4ever.

NAT and other clever gatewaying effectively provide a mechanism to extend the address lifetime expectancy not only of the IPv4 unicast address space in general, but of any given host in particular. That is, there are now mechanisms which can hide address changes from hosts that deal with address changes badly, while at the same time there is increasingly good software to assist with renumbering hosts. There are mechanisms evolving which ultimately should allow nearly any given unicast subnet of 0/0 to be perceived by everything else as having a different number than things within that subnet believe. Moreover, there are also mechanisms evolving which will cause nearly any given unicast subnet of 0/0 to renumber itself, so that all the numbered entities under that subnet renumber into a different unicast subnet of 0/0.

This alone should give rise to maximal aggregation, and, if combined with schemes which overload some addresses or which simply compress sparsely populated large subnets into densely populated smaller ones, should eliminate a large percentage of address waste. In other words, the mechanism(s) you allude to are being worked on. I would like to see them applied to the swamp within the next year or two.

The "IP addresses never change within the lifetime of a session" and "IP addresses are end-to-end" crowds, who have misthought a number of protocols, will probably fight tooth and nail to see that this never happens. Mind you, they are mostly the same people who fought tooth and nail against the idea of renumbering in the first place, so one can expect roughly the same type of "discussions".
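To make the address-hiding idea concrete, a toy sketch in Python (all names and numbers here are illustrative, not any particular product's behaviour): a NAT-style translation table presents one stable external address to the rest of 0/0, so everything behind it can renumber without any outside host noticing.

    from ipaddress import IPv4Address

    class Translator:
        """Toy NAT: maps inside (addr, port) flows onto one external address."""
        def __init__(self, external: IPv4Address):
            self.external = external
            self.next_port = 10000
            self.table = {}    # (inside_addr, inside_port) -> outside_port
            self.reverse = {}  # outside_port -> (inside_addr, inside_port)

        def outbound(self, inside_addr: IPv4Address, inside_port: int):
            # Rewrite an outgoing flow to the stable external address.
            key = (inside_addr, inside_port)
            if key not in self.table:
                self.table[key] = self.next_port
                self.reverse[self.next_port] = key
                self.next_port += 1
            return self.external, self.table[key]

        def inbound(self, outside_port: int):
            # Map a reply back to the (possibly renumbered) inside host.
            return self.reverse.get(outside_port)

If the inside subnet renumbers from, say, 10.1.0.0/16 to 10.2.0.0/16, only the translation table changes; the external address the rest of the world perceives stays put, which is exactly the decoupling that makes aggressive aggregation possible.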
The /19 policy is archaic. It creates obstacles and only partly resolves the problem. Fixing holes in CIDR blocks, exchanging fragmented blocks for contiguous blocks, and cleaning up "The Swamp" can do more for the stability and size of the routing table.
If you have an implementation of something that Sprint and its competitors (who now do precisely the same filtering) can buy, and that can cause the swamp to be aggregated into a small handful of prefixes from their point of view, then I can point you at people who would be happy to sign a cheque. The only real problem I saw in the implementation of the /19 filter was the bad press generated by people who refused to listen to the registries' warnings that long prefixes probably would not be globally routable, and possibly the lack of a tariff which would have allowed people with money to purchase exceptions in the Sprint filters.
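The payoff of such a repacking can be sketched with Python's standard library (the fragments below are invented for illustration): once exchanged blocks are contiguous, they collapse into supernets.

    from ipaddress import IPv4Network, collapse_addresses

    fragments = [
        IPv4Network("192.168.0.0/19"),
        IPv4Network("192.168.32.0/19"),
        IPv4Network("192.168.64.0/19"),
        IPv4Network("192.168.96.0/19"),
    ]

    # Four contiguous /19s become a single /17.
    print(list(collapse_addresses(fragments)))
    # [IPv4Network('192.168.0.0/17')]

The hard part, of course, is not the arithmetic but getting the holders of fragmented space to move.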
BTW - If you use a route server to do the dampening and calculation of peer routes, you can even make a wimpy-CPUed 7000 handle backbone traffic.
The wimpy 7000 still has to receive at least one copy of the NLRI and process changes into the forwarding table(s). As the number of prefixes increases, even if the level of "background" noise (the rate at which a large set of prefixes demonstrates instability considerably below what would be prevented even by very aggressive route dampening) were to remain constant, you require more CPU even in the simple case of receiving and installing modified forwarding tables.

In the absence of any feedback mechanism that holds down the total number of globally visible prefixes, the increase in CPU requirements could easily outstrip Moore's law and, in time, overwhelm even state-of-the-art processors. Note that the likelihood of keeping up with the economics of something that is CPU-bound, ill suited to parallel processing, and growing along the same slope as Moore's law or a slightly greater one is small. This describes the amount of BGP processing required prior to the installation of the first prefix-length filters at Sprint's border routers.

I was always open to suggestions that would accomplish the same result, and helped push Cisco to develop two of them (a large cleanup of some of their BGP implementation's processing, and an implementation of something very close to Curtis Villamizar's route flap dampening algorithm, sketched below); however, I have yet to see anything suggested that eliminates the need for such filters, is readily deployable, and will keep the slope of processing requirements below that of processing capability. I still am open to such suggestions, and I believe my successors at Sprint and like-minded people at other ISPs who implement prefix-length filtering are too. Until such a thing emerges, however, I continue to believe that inbound prefix-length filtering is a good policy that should be implemented universally.

Sean.
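For reference, a minimal Python sketch of the dampening idea mentioned above (penalty-and-decay in the style of Curtis Villamizar's algorithm, later written up as RFC 2439; the constants are typical defaults, not Sprint's actual settings):

    import math

    FLAP_PENALTY = 1000.0
    SUPPRESS_LIMIT = 2000.0  # suppress when the penalty climbs above this
    REUSE_LIMIT = 750.0      # re-advertise once it decays below this
    HALF_LIFE = 900.0        # seconds for the penalty to halve

    class DampenedPrefix:
        def __init__(self):
            self.penalty = 0.0
            self.last_update = 0.0
            self.suppressed = False

        def _decay(self, now: float):
            # Exponential decay: the penalty halves every HALF_LIFE seconds.
            elapsed = now - self.last_update
            self.penalty *= math.exp(-elapsed * math.log(2) / HALF_LIFE)
            self.last_update = now

        def flap(self, now: float):
            # Each withdraw/re-announce event adds a fixed penalty.
            self._decay(now)
            self.penalty += FLAP_PENALTY
            if self.penalty > SUPPRESS_LIMIT:
                self.suppressed = True

        def usable(self, now: float) -> bool:
            # A suppressed prefix is released once its penalty decays enough.
            self._decay(now)
            if self.suppressed and self.penalty < REUSE_LIMIT:
                self.suppressed = False
            return not self.suppressed

A prefix that flaps steadily stays suppressed and never burdens the downstream CPU, while an isolated flap or two never triggers suppression at all.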