> I'm pretty sure I need further explanation to "get it".
I probably still don't get it, but let me see if I understand the mechanism.
First, assign a prefix to a particular non-topological "locus", such as a metropolitan area, or a continent. Second, networks inside that locus will announce only the prefix, but with these exception bits. [Implied, but not stated: third, all these networks will exchange full information so as to be able to generate these exception bits]. Fourth, receivers of these prefixes, with the exception bits, will expand the longest-match trie (a Patricia tree is a compact representation of a trie, in common use when you have data with many nodes with just one child) so that lookups will only match in the case where there is no exception.
If I understand you, what you are trying to do is to reduce the requirement for EVERY network operating within the aggregate to carry traffic to the ENTIRE aggregate at all times. This ordinarily would require announcing more specifics. So you propose a scheme where you use an attribute instead of the more specifics. Unfortunately, your attribute will cause the same behaviour in a receiver as would the list of more specifics, and therefore is merely a compression of the representation on the line that is somewhat better than, say, gzip.
IOW, I think you are solving the wrong problem.
We really have nearly zero experience with aggregates containing disjoint topology (i.e., non-provider-based aggregation), largely because there is no obvious way to contain an explosion of more specifics when complete internal connectivity and complete transit break down.
Steve Deering does propose a (partial) solution for this, but (in my opinion) it involves a complete reversal of current financial arrangements to work, in that a sender would have to compensate a transit network for carrying its traffic to anything within that aggregate, rather than the transit network collecting from the other (or both) parties. This is only a partial solution, since even where there is an incentive to maintain complete interconnectivity and carry traffic to all the constituent subnets of the aggregate, failures will still cause black holes to arise even though other valid paths exist.
Your scheme does let one warn of black holes in this eventuality, takes a bit less bandwidth on the line, probably allows for the "slosh" to happen all at once rather than in dribs and drabs, and so forth, but it represents the same amount of work for the routers processing the attribute. That is, those routers are effectively brought inside the abstraction boundary of the "locus", and as a result the goal of hiding information from those routers is not met.
My gut feeling is that for any sizable "locus", almost all of what we consider the core of the global routing system would be contained within the new abstraction boundary, so we're no better off than not aggregating in the first place.
That is, we are MUCH better off with PA addressing.
Sean.
On Thu, 30 Aug 2001, Sean M. Doran wrote:
I probably still don't get it, but let me see if I understand the mechanism.
First, assign a prefix to a particular non-topological "locus", such as a metropolitan area, or a continent.
How this is done is important, because it influences the number of customers an ISP will have per bitmap. Assigning a prefix to a continent wouldn't be a good idea, because that way every regional ISP has to announce the very large bitmap for the entire continent, while most of it contains just zeros. Per metro area would be better. But two ISPs that have many multi-homing customers in common could use a prefix for just the two of them, regardless of geography.
Second, networks inside that locus will announce only the prefix, but with these exception bits. [Implied, but not stated: third, all these networks will exchange full information so as to be able to generate these exception bits].
The bitmaps are generated inside the source AS (presumably, iBGP will still carry regular routes) and the bitmaps are transmitted from one network to another, so there is no requirement for full interconnection at the routing level.
Fourth, receivers of these prefixes, with the exception bits, will expand the longest-match trie (a Patricia tree is a compact representation of a trie, in common use when you have data with many nodes with just one child) so that lookups will only match in the case where there is no exception.
Yes.
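To make the expansion step concrete, here is a minimal Python sketch of how a receiver might turn one prefix plus bitmap into the equivalent more specifics and then do a longest-match lookup. The bit layout is an assumption made purely for illustration, since the thread does not define an encoding: bit i, counting from the least significant bit, stands for the i-th equal-sized block inside the aggregate, and a set bit means the block is reachable through the announcer. The names expand_bitmap and lookup are hypothetical, and the linear scan stands in for a real trie.

```python
import ipaddress


def expand_bitmap(aggregate, sub_len, bitmap):
    """Return the sub-prefixes of `aggregate` whose bit is set in `bitmap`.

    Assumed layout (not from the thread): bit i, least significant bit
    first, represents the i-th /sub_len block of the aggregate in
    address order; a set bit means "reachable via the announcer".
    """
    net = ipaddress.ip_network(aggregate)
    blocks = net.subnets(new_prefix=sub_len)  # yields blocks in address order
    return [block for i, block in enumerate(blocks) if (bitmap >> i) & 1]


def lookup(routes, address):
    """Longest-prefix match by linear scan; None if nothing matches."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in routes if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)
```

With the aggregate 192.0.2.0/24 split into /26 blocks and the bitmap 0b0101, only the first and third blocks are installed. An address in the second block then finds no match instead of following the aggregate, which is the "only match where there is no exception" behaviour described above.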
If I understand you, what you are trying to do is to reduce the requirement for EVERY network operating within the aggregate to carry traffic to the ENTIRE aggregate at all times.
Yes.
This ordinarily would require announcing more specifics. So you propose a scheme where you use an attribute instead of the more specifics. Unfortunately, your attribute will cause the same behaviour in a receiver as would the list of more specifics, and therefore is merely a compression of the representation on the line that is somewhat better than, say, gzip.
IOW, I think you are solving the wrong problem.
I'm mostly trying to solve the memory problem, but it should also help with (but certainly not completely solve) the processing problem. Since an updated bitmap is always the same size and it updates many routes at a time, it should take less CPU power to process the updates. Also, you could make a certain group of routers responsible for the more specifics (this would work well if the prefixes are assigned geographically) and let the others delay processing of the bitmaps or even drop the bitmaps completely.
Your scheme does let one warn of black holes in this eventuality, takes a bit less bandwidth on the line, probably allows for the "slosh" to happen all at once rather than in dribs and drabs, and so forth, but it represents the same amount of work for the routers processing the attribute. That is, those routers are effectively brought inside the abstraction boundary of the "locus", and as a result the goal of hiding information from those routers is not met.
I think the only way to really know what the processing benefits of all of this are is to implement it or run detailed simulations, but those pretty much require an implementation as well. Note that bandwidth on the line is not an issue; BGP encodes the routing information efficiently enough.
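One way to picture "updates many routes at a time" is a sketch that compares the previous and the new bitmap for a prefix and derives every announcement and withdrawal in a single pass. It is an illustration only, under an assumed bit layout (bit i stands for the i-th block, a set bit means reachable). The helper changed_blocks is hypothetical, and the sketch makes no claim about how much CPU a real router would save.

```python
def changed_blocks(old, new, width):
    """Compare two fixed-size bitmaps and list the block indexes that changed.

    Returns (added, withdrawn): blocks that became reachable and blocks
    that stopped being reachable. `width` is the number of bits in the
    bitmap; anything above it is ignored.
    """
    diff = (old ^ new) & ((1 << width) - 1)  # 1 wherever the bitmaps differ
    added, withdrawn = [], []
    index = 0
    while diff:
        if diff & 1:
            if (new >> index) & 1:
                added.append(index)
            else:
                withdrawn.append(index)
        diff >>= 1
        index += 1
    return added, withdrawn
```

The update itself stays the same size no matter how many blocks flip, and a receiver that chooses to delay processing only needs to keep the latest bitmap, not a queue of individual route changes.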
My gut feeling is that for any sizable "locus", almost all of what we consider the core of the global routing system would be contained within the new abstraction boundary, so we're no better off than not aggregating in the first place.
That is, we are MUCH better off with PA addressing.
Suppose that every "P" would only announce a single "A". (I know, the other 300 are important too, but just for the sake of argument.) Would that solve the problem? Only if there is a limit on the number of ISPs. I don't think there is such a limit. I have my own web and mail servers at home, along with a router that can do BGP and handle incoming modem connections. So basically, I'm my own ISP. I have recently helped a medium-sized business with their BGP and they became an "ISP" so they could get a /20.
The only way we're ever going back to an 8k routing table in IPv6 is if multihoming at the host level becomes a decent alternative. There is SCTP, a transport protocol that will handle multiple source and destination IP addresses, so when one path goes down, it will use another. (SCTP is useless as a TCP replacement, though.) And there have been successful experiments with adding this kind of functionality to TCP. But the problem is that you can't just update a billion or so running TCP stacks overnight. Multihoming will be here for a while.
Filtering is coming back in style now, but it will go away when customers start to notice they can't reach certain destinations through certain networks: that's bad business. (It will also make multihoming even more attractive.)
So we either start to build better EGPs now, even if we don't have a new algorithm that will magically make everything right, or start buying Cisco and Juniper stock while it's low.
On Fri, 31 Aug 2001, Iljitsch van Beijnum wrote: [snip]
The only way we're ever going back to an 8k routing table in IPv6 is if multihoming at the host level becomes a decent alternative. There is SCTP, a transport protocol that will handle multiple source and destination IP addresses, so when one path goes down, it will use another. (SCTP is useless as a TCP replacement, though.) And there have been successful experiments with adding this kind of functionality to TCP. [snip]
I've been being good about keeping my multi6 advocacy off of nanog, but I have to correct this: SCTP can be used as a full replacement for TCP, as it is a strict superset; it can also replace UDP for many applications. As soon as the SCTP TCP-like API is finished in the Linux kernel SCTP implementation, I'll be making the minor changes to a few apps (lynx, openssh, and apache for starters) to demonstrate how easily TCP applications can be transitioned to SCTP for multihoming support. (SCTP has a number of additional advantages that would be useful, such as multiple streams, which would require more than a simple search and replace.)
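As a rough illustration of the "minor changes" meant here, the sketch below opens a stream socket over either TCP or SCTP by changing only the protocol argument. It assumes the one-to-one ("TCP-like") SCTP socket style and a kernel with SCTP support. The helper open_stream_socket is hypothetical, and 132 is the IANA protocol number for SCTP, used as a fallback where the Python build does not export the constant. Binding several local addresses for multihoming needs calls beyond the standard socket module and is not shown.

```python
import socket

# Fallback to the IANA-assigned protocol number when the constant is
# missing from this Python build.
IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)


def open_stream_socket(use_sctp):
    """Create an IPv4 stream socket over TCP or, if requested, SCTP.

    Returns None when the local system cannot create the socket, for
    example because the kernel has no SCTP support.
    """
    proto = IPPROTO_SCTP if use_sctp else socket.IPPROTO_TCP
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
    except OSError:
        return None
```

The rest of a simple client (connect, send, recv, close) looks the same in both cases, which is the sense in which a basic port is close to a search and replace.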
On Fri, 31 Aug 2001, Greg Maxwell wrote:
addresses, so when one path goes down, it will use another. (SCTP is useless as a TCP replacement, though.) And there have been successful
I've been being good about keeping my multi6 advocacy off of nanog, but I have to correct here: SCTP can be used as a full replacement of TCP as it is a strict superset, it also can replace UDP for many applications.
That is like replacing passenger trains by freight trains. After all, aren't passengers just one type of freight? SCTP has a whole bunch of features that are of no use to our current applications, that all expect TCP. It would be very unwise to switch to a new transport protocol just because it has one desirable feature that can very easily be built in TCP. Two modules that do 99% the same thing but with different code is bad software design. And SCTP is not backwards compatible with older TCP implementations or access filters or firewalls or anything.
On Sat, Sep 01, 2001 at 10:59:24AM +0200, Iljitsch van Beijnum wrote:
On Fri, 31 Aug 2001, Greg Maxwell wrote:
addresses, so when one path goes down, it will use another. (SCTP is useless as a TCP replacement, though.) And there have been successful
I've been being good about keeping my multi6 advocacy off of nanog, but I have to correct here: SCTP can be used as a full replacement of TCP as it is a strict superset, it also can replace UDP for many applications.
That is like replacing passenger trains by freight trains. After all, aren't passengers just one type of freight?
SCTP has a whole bunch of features that are of no use to our current applications, that all expect TCP. It would be very unwise to switch to a new transport protocol just because it has one desirable feature that can very easily be built in TCP.
Two modules that do 99% the same thing but with different code is bad software design. And SCTP is not backwards compatible with older TCP implementations or access filters or firewalls or anything.
s/TCP/IPv4/ s/SCTP/IPv6/
Interesting to read that way... and it explains why SCTP isn't even known to most of the folks I deal with on a daily basis, much less in any sort of wide deployment.
Me, I prefer to build a new car that's fully up to new design specs, rather than try to retrofit rocket boosters onto the old Studebaker. This isn't to claim, in any way, that "TCP is dead", mind you; but SCTP answers a fairly fundamental set of problems, with a different set of design goals than TCP and UDP were written for. Trying to mangle TCP to accommodate those goals seems likely to produce more confusion than viable code.
BTW, SCTP is just as compatible with filters and firewalls as any other IP based protocol. It has a protocol number and a public design spec. That few of these implement the more advanced matching sets that can be used for TCP is largely due to the catch-22 of router vendors not wanting to waste time on writing code for it until people demand it, and people not demanding it because said vendors don't support it, so how big can it really be? (Oh, and on a sidenote: my Linux firewall will filter it just fine, without even knowing what it is).
In any case, I agree with your assertion that TCP could be rewritten to do the same thing as SCTP. I assert, in turn, that you would end up re-writing most of the SCTP spec in the process, and have an equal amount of new (read 'buggy') code.
As for the 'SCTP isn't backwards compatible with older TCP' claim... uhm, TCP isn't backwards compatible with UDP, either. Your point?
--
Joel Baker System Administrator - lightbearer.com
lucifer@lightbearer.com http://www.lightbearer.com/~lucifer
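The point about filtering a protocol "without even knowing what it is" comes down to matching on the protocol field of the IP header. A minimal sketch, assuming raw IPv4 packets are available as bytes: the helpers make_header, ip_protocol and allow are hypothetical names, and the addresses come from documentation ranges.

```python
import socket
import struct

SCTP, TCP, UDP = 132, 6, 17  # IANA protocol numbers


def make_header(proto):
    """Build a bare 20-byte IPv4 header (checksum left at 0) for testing."""
    src = socket.inet_aton("192.0.2.1")
    dst = socket.inet_aton("198.51.100.1")
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, proto, 0, src, dst)


def ip_protocol(packet):
    """Return the protocol field, which is byte 9 of an IPv4 header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]


def allow(packet, blocked_protocols):
    """Pass the packet unless its protocol number is on the block list."""
    return ip_protocol(packet) not in blocked_protocols
```

A filter like this needs no understanding of SCTP chunks or ports. Matching on ports or connection state is the "more advanced" part that depends on vendor support.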
On Sat, 1 Sep 2001, Joel Baker wrote: [SCTP]
Me, I prefer to build a new car that's fully up to new design specs, rather than try to retrofit rocket boosters onto the old Studebaker. This isn't to claim, in any way, that "TCP is dead", mind you; but SCTP answers a fairly fundamental set of problems, with a different set of design goals than TCP and UDP were written for. Trying to mangle TCP to accommodate those goals seems likely to produce more confusion than viable code.
SCTP is a protocol designed to carry telephony signalling. Being able to use multiple IP addresses per session is not something that is inherently more appropriate for telephony signalling than for network applications that use stream-based communication. It is a nice option to have for any transport protocol. So unless there is _another_ reason why SCTP is appropriate for a certain application, it seems pretty clear to me that using TCP, which was designed to work with the protocols we use on the Net, and is the transport protocol applications expect, is much more appropriate. Extending TCP to use multiple IP addresses is not a problem. TCP has been extended in many ways in the past. And an experimental implementation has been available for four years.
As for the 'SCTP isn't backwards compatible with older TCP' claim... uhm, TCP isn't backwards compatible with UDP, either. Your point?
But nobody is proposing to have applications built for UDP run over TCP. I'm not against implementing new protocols that aren't backward compatible, but I'm merely saying that in this case the benefits are too small. And comparing this to IPv6: how many people are using IPv6 today? Sometimes it is necessary to forgo backward compatibility, but that decision should never be taken lightly.
On Sat, Sep 01, 2001 at 10:34:31PM +0200, Iljitsch van Beijnum wrote:
On Sat, 1 Sep 2001, Joel Baker wrote:
[SCTP]
Me, I prefer to build a new car that's fully up to new design specs, rather than try to retrofit rocket boosters onto the old Studebaker. This isn't to claim, in any way, that "TCP is dead", mind you; but SCTP answers a fairly fundamental set of problems, with a different set of design goals than TCP and UDP were written for. Trying to mangle TCP to accommodate those goals seems likely to produce more confusion than viable code.
SCTP is a protocol designed to carry telephony signalling.
And bears about as much resemblance to this origin as TCP does to the military's original purposes for having a network.
Being able to use multiple IP addresses per session is not something that is inherently more appropriate for telephony signalling than for network applications that use stream-based communication. It is a nice option to have for any transport protocol.
Agreed.
So unless there is _another_ reason why SCTP is appropriate for a certain application, it seems pretty clear to me that using TCP, which was designed to work with the protocols we use on the Net, and is the transport protocol applications expect, is much more appropriate. Extending TCP to use multiple IP addresses is not a problem. TCP has been extended in many ways in the past. And an experimental implementation has been available for four years.
RFC/Draft/URL/code? I have yet to see anything which allows the sort of clean and direct setup which SCTP does, but I certainly haven't made an exhaustive search of the field. Certainly, if it addresses all of the same issues while being more compatible and requiring fewer changes, I would be all for it.
As for the 'SCTP isn't backwards compatible with older TCP' claim... uhm, TCP isn't backwards compatible with UDP, either. Your point?
But nobody is proposing to have applications built for UDP run over TCP.
I might argue that, but it would degenerate into nitpicking. However, I will grant that a conversion to SCTP would affect a significantly larger portion of the network than any example I could present as a counter.
I'm not against implementing new protocols that aren't backward compatible, but I'm merely saying that in this case the benefits are too small. And comparing this to IPv6: how many people are using IPv6 today? Sometimes it is necessary to forgo backward compatibility, but that decision should never be taken lightly.
I believe that was part of my point, in starting. Both SCTP and IPv6 provide benefits. However, neither appears to be making much headway in the direction of being adopted by the majority of the Internet. I really wonder whether any such major change will, since it is no longer practical for a central agency to say "support for <X> protocol will cease as of <date>".
--
Joel Baker System Administrator - lightbearer.com
lucifer@lightbearer.com http://www.lightbearer.com/~lucifer
The bitmaps are generated inside the source AS (presumably, iBGP will still carry regular routes) and the bitmaps are transmitted from one network to another, so there is no requirement for full interconnection at the routing level.
The trouble with using 1 bit to represent 1 prefix is that there is a need to move more than 1 bit of information per route between AS's (think AS paths for loop detection, communities etc.). In iBGP the situation is worse as you have more information you want to carry (next hop, localpref), but you seem to envisage this only to replace eBGP.
So all you are doing is compressing the data stream (after making some simplifying assumptions, some of which I don't believe hold up). As you have to translate your bitmap back to/from iBGP in order to propagate announcements across the AS, you might as well consider the simpler alternative of just compressing the eBGP. However, you're increasing the processor power needed here, rather than decreasing it.
It would probably be possible to compress information on stub nodes, or nearly stub nodes, much further (but you can do that effectively with outbound route filters), and, to a limited extent, reduce their visibility in the middle of the network (think proxy-aggregation), but we have existing tools to do this.
The 'real' solution is to hierarchicalize (sp?) or indirect the routing tree such that reachability information for common multihomed configurations does not in general reach the cores of most people's networks [*]. I suspect technologies similar to mobile IP may have application here.
[*] or reduce the routes held in most of the routers in most people's networks, which, without wishing to start another flame war, is a claim occasionally made for MPLS networks in that non-edge LSR routers need not carry any BGP table, 'merely' an LIB. However, you still need to carry the prefixes in edge LSRs so, stepping neatly around the flame-fest, this seems to me an incomplete solution.
--
Alex Bligh
Personal Capacity
On Fri, 31 Aug 2001, Alex Bligh wrote:
The trouble with using 1 bit to represent 1 prefix is that there is a need to move more than 1 bit of information per route between AS's (think AS paths for loop detection, communities etc.).
I think it is possible to aggregate this information for a relatively large number of destinations. That means multihomers wouldn't be able to set communities for their routes, but at least they'd be reachable and that has to count for something.
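A rough sketch of what "aggregating this information" could look like: destinations that share identical path attributes are folded into one bitmap per attribute set, so the AS path is carried once per group instead of once per route. This is an assumption about how the idea might be realised, not something the thread specifies. The helper group_by_attributes is hypothetical, and the AS numbers come from the range reserved for documentation.

```python
from collections import defaultdict


def group_by_attributes(routes):
    """Fold routes that share an AS path into one bitmap per path.

    routes: iterable of (block_index, as_path) pairs, where as_path is
    a tuple of AS numbers. Returns a dict mapping each distinct as_path
    to an integer bitmap with bit block_index set for every block that
    uses that path.
    """
    bitmaps = defaultdict(int)
    for block_index, as_path in routes:
        bitmaps[as_path] |= 1 << block_index
    return dict(bitmaps)
```

The saving depends entirely on how many destinations really do share a path. Per-route communities would split the groups again, which is the trade-off acknowledged above.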
In iBGP the situation is worse as you have more information you want to carry (next hop, localpref), but you seem to envisage this only to replace eBGP.
I answered a bit too soon. I meant that the full information should be carried in iBGP on the originating network (and not in transit networks), but this is not really necessary either, if you use an IGP. (But some networks use iBGP rather than an IGP to carry customer routes internally.)
participants (5)
- Alex Bligh
- Greg Maxwell
- Iljitsch van Beijnum
- Joel Baker
- smd@clock.org