Re: A call for the future. Was: Re: Verio Decides what parts of the internet to drop
Around 07:22 AM 12/8/1999 -0800, rumor has it that Randy Bush said:
The phone system doesn't require anything close to millions of routes for LNP. Instead, at the time of call setup, there is a lookup that performs the translation between the portable number (which is the logical address) and the physical address (which to date is still mostly statically routed using a well-defined hierarchy based upon physical location).
and here is where the analogy breaks down. a second or two of call setup may be acceptable for establishing a phone call. it would be a disaster on a per-packet basis.
ip is a connectionless protocol. before hitting the reply key, think about that.
I thought about it. It seems to me that a router is not presented with a stream of randomly addressed packets. In any time frame, there are going to be from 1 to many hundreds of packets between the same set of addresses.
Cisco takes advantage of this with a route cache. When I was a junior monkey, one of my first tickets involved the route cache ACL bug in IOS 8.0, where routes added to the route cache were no longer checked against the appropriate ACLs. Of course, I didn't know immediately that was the real problem. It just seemed to work after I pinged (permitted in the ACL), and would continue to work (for something denied in the ACL) for a while afterward.
So, while the first packet may result in a longer lookup, the successive packets hit the cached route. So now it becomes a problem to build an adequately sized route cache for the number of simultaneous 'connections' that one might expect to process in a given time period. Which brings one back to terms and design rules that are not unlike those that apply to phone switches. The thing that is tough to grow is the access list size.
The connectionless part of IP is just a matter of whether state is maintained in the protocol or at the endpoints. An ISDN connection has state at the endpoints and at all the processing points. All the switches along the path must maintain that state. An ISDN packet in the middle of a stream does not carry enough information to recreate the state. If you miss the setup message, you can't figure out the state from the rest of the packets. By contrast, an IP packet traversing an IP network carries all the state it needs with it. However, that doesn't mean that you need to start from scratch each time you see the same src/dest pair. Methods which hold some additional state in the router for faster processing can be used to speed things up.
The full route table could be very large. Much larger than 256Meg or even 4Gig. And even moving to disk backed storage (many gigs) most likely means that access would still be in tens or hundreds of milliseconds, not seconds. However, any given router doesn't really need to use very much of it at any given time.
The problem, as I see it, is that Cisco sees everything as a router. And a typically low powered router, as compared to mid and high range unix servers. Consider the Cisco H323 (VOIP) product line. Clearly, the 5300 makes sense as a router. It processes 4 PRI or T1 CAS voice lines into h323 on IP, along with the consequent DSP chips. The 2600 & 3600 make sense for lower density DSP platforms. Clearly, translating G.711 to G.729 or others requires specialized hardware.
However, the gatekeeper (call routing) software runs on a 3600. This is a grossly underpowered platform for gatekeeper functions, and underfeatured. Gatekeeper functions belong on a general purpose machine with modular (user replaceable) software and access to a database which can handle complex and quickly changing route policies (eg, time of day routing, followme routing). One desires a machine which can scale to handle hundreds or thousands of transactions per second (between a Sun ultra2 and an SGI Challenge w 64 processors). None of these functions are particularly suited to specialized routing hardware. Yet, Cisco insists that (nearly) everything must run on IOS. The best alternative right now runs on NT, with its lack of scalability and other PC problems. But the NT platform runs circles, triangles, squares, and other complex polyhedrons around the 3600.
Policy based routing in either a voice or an IP network requires flexibility which can't be found in a specialized hardware platform. The problem is the wrong software & hardware combination to do the job, not that the route table has grown too large, or that the job just can't be done.
--Dean
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Plain Aviation, Inc dean@av8.com
LAN/WAN/UNIX/NT/TCPIP http://www.av8.com
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
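For readers following the route cache argument in the message above, here is a minimal sketch of the idea, not Cisco's implementation: the first packet to a destination pays for a slow longest-prefix match, and later packets to the same destination are answered from a cache. The class name `CachedRouter`, the prefixes, and the next-hop names are all hypothetical, and the cache here is unbounded, which sidesteps the sizing question the message raises.

```python
import ipaddress


class CachedRouter:
    """Toy router: slow longest-prefix match fronted by a per-destination cache."""

    def __init__(self, routes):
        # routes: mapping of CIDR prefix string -> next-hop name (hypothetical data)
        self.routes = [(ipaddress.ip_network(p), nh) for p, nh in routes.items()]
        # Most specific prefixes first, so the first match is the longest match.
        self.routes.sort(key=lambda r: r[0].prefixlen, reverse=True)
        self.cache = {}          # unbounded here; a real cache has a size limit
        self.slow_lookups = 0    # counts how often the full table was searched

    def _slow_lookup(self, dst):
        self.slow_lookups += 1
        addr = ipaddress.ip_address(dst)
        for net, next_hop in self.routes:
            if addr in net:
                return next_hop
        return None  # no matching route

    def forward(self, dst):
        # Only a destination not seen before triggers the slow path.
        if dst not in self.cache:
            self.cache[dst] = self._slow_lookup(dst)
        return self.cache[dst]


router = CachedRouter({
    "0.0.0.0/0": "upstream",
    "192.0.2.0/24": "customer-a",
    "192.0.2.128/25": "customer-b",
})
hops = [router.forward("192.0.2.200") for _ in range(100)]
print(hops[0], router.slow_lookups)  # customer-b 1
```

One hundred packets to the same destination cost one table search. Whether real traffic repeats destinations often enough for this to pay off is exactly what the replies in this thread dispute.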
Dean,
This is where you are focusing on the wrong things. You seem to be rambling on a variety of topics way off from the problem.
Problem: People are trying to announce improper classful (or classless) address blocks out of the "classical classful space". The initial /16s were assigned primarily to universities and the larger businesses. These days, those who obtained large quantities of this space and have no use for it see a chance to profit by "leasing" it to someone else, or through the outright sale of the company, or of some semi-existent division of the company, with the address space going along with the sale. All I have to say is "Let the buyer beware". Not everyone uses the same filtering policies, but the majority of the tier-1 providers do.
You are not in the boat of supporting this, nor really running these types of environments in your day-to-day operations. Adding memory and CPU, along with a non-caching routing solution (such as CEF, or any type of '100% cache hit' routing system), will make packet forwarding faster. The problem is that we do not have direct control over what the vendors hand us in the backbone environment. If I wish to take an OC12 and do policy based routing on EVERY packet that goes across it, it is not possible with the CPU power that is put on there. This is what the majority of people are using these days. On the smaller ds(N) sized networks it is more feasible, because of the current ability to distribute the load and the CPU power available per packet, but once you get into this type of realistic environment, you really cannot perform the type of routing that you are suggesting.
Even with this state information, if I were a router and spent all my time comparing both the src and dst within the packet, I would more than double my forwarding time and cause undue increased latency.
Now, given that there are ~4.3B IP addresses, and given memory consumption of, let's say, 512 bytes per IP, you're talking about 2.1TB of memory if you were to allow every 32-bit IP to be routed separately. This is not possible. This is why aggregation of the routes, and these tactics, are useful: looking at it from a common-sense point of view, we are saving ourselves a fair amount of money instead of asking for a 7200/GSR/M20/M40 that can route each IP separately and on a totally different path. Your cost for this box would make it really prohibitive.
- jared
-- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine. END OF LINE |
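The memory estimate in the message above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 512 bytes per entry is the figure assumed in the message, while `ASSUMED_PREFIXES` is a made-up table size chosen purely to show the effect of aggregation, not a number taken from the thread.

```python
ADDRESSES = 2 ** 32        # every possible IPv4 address
BYTES_PER_ENTRY = 512      # per-route cost assumed in the message above

# One route per host address.
per_host_bytes = ADDRESSES * BYTES_PER_ENTRY
print(per_host_bytes)              # 2199023255552
print(per_host_bytes / 2 ** 40)    # 2.0 (TiB)

# Hypothetical aggregated table; the prefix count is an illustration only.
ASSUMED_PREFIXES = 70_000
aggregated_bytes = ASSUMED_PREFIXES * BYTES_PER_ENTRY
print(aggregated_bytes)            # 35840000, roughly 34 MiB
```

Per-host routing works out to 2 TiB (about 2.2e12 bytes), in line with the roughly 2 TB figure quoted, while a table of aggregated prefixes at the same per-entry cost stays in the tens of megabytes.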
Now, given that there are ~4.3B IP addresses, and given memory consumption of, let's say, 512 bytes per IP, you're talking about 2.1TB of memory if you were to allow every 32-bit IP to be routed separately.
2.1 terabytes...wow. but think about it this way: if you did manage to build such a beast, a route cache would not be needed, as route lookups would all be O(1). :)
actually...if you *did* manage such a thing, you could get the memory usage down a lot, since you'd just need to store next hop for each ip address, and not so much stuff about linked lists and stuff.
--
|-----< "CODE WARRIOR" >-----|
codewarrior@daemon.org * "ah! i see you have the internet
twofsonet@graffiti.com (Andrew Brown) that goes *ping*!"
andrew@crossbar.com * "information is power -- share the wealth."
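The "just store the next hop for each ip address" idea above can be sketched on a toy scale. The example below assumes a single /16 instead of the whole IPv4 space, one byte per address holding an index into a small next-hop list, and hypothetical prefixes and next-hop names; it illustrates the data structure only and says nothing about whether such a box could be built.

```python
import ipaddress

# Hypothetical next hops; index 0 is reserved to mean "no route".
NEXT_HOPS = [None, "upstream", "customer-a"]

# Toy address space: one /16 (65,536 addresses), one byte per address.
BASE = int(ipaddress.ip_address("10.1.0.0"))
table = bytearray(2 ** 16)


def install(prefix, hop_index):
    """Write hop_index into every slot covered by prefix (must lie inside the /16)."""
    net = ipaddress.ip_network(prefix)
    start = int(net.network_address) - BASE
    table[start:start + net.num_addresses] = bytes([hop_index]) * net.num_addresses


def lookup(dst):
    """O(1): a single array index, with no search and no cache."""
    return NEXT_HOPS[table[int(ipaddress.ip_address(dst)) - BASE]]


install("10.1.0.0/16", 1)     # covering route
install("10.1.42.0/24", 2)    # more specific, installed last so it overwrites

print(lookup("10.1.42.7"), lookup("10.1.7.7"))  # customer-a upstream
print(2 ** 32 / 2 ** 30)      # full IPv4 space at 1 byte per address: 4.0 (GiB)
```

At one byte per address the full 32-bit space would need 4 GiB rather than terabytes. The price is that a route change rewrites every slot the prefix covers, and more specific routes must be written after the routes that cover them.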
On Wed, 8 Dec 1999, Dean Anderson wrote:
ip is a connectionless protocol. before hitting the reply key, think about that.
I thought about it. It seems to me that a router is not presented with a stream of randomly addressed packets. In any time frame, there are going to be from 1 to many hundreds of packets between the same set of addresses.
This is a major fallacy. Many promising local ISPs had experience with this when they were smaller and more local. I am sure dkatz will tell you of his experiences with trying to get caching algorithms to work well with core network flows. In short, the core network flows cause so much churn in cache memory that the working set of the cache tends to be the size of the entire FIB to get adequate performance. Caching does NOT work in the context of TCP flows at the core level. Period.
the packets. By contrast, an IP packet traversing an IP network carries all the state it needs with it. However, that doesn't mean that you need to start from scratch each time you see the same src/dest pair. Methods which hold some additional state in the router for faster processing can be used to speed things up.
Once again, at the core of promising local ISP's, this does NOT work.
The full route table could be very large. Much larger than 256Meg or even 4Gig. And even moving to disk backed storage (many gigs) most likely means that access would still be in tens or hundreds of milliseconds, not seconds. However, any given router doesn't really need to use very much of it at any given time.
Full route table size is not a problem. You can burn a hard disk as you mentioned to store it. The issue is getting data in and out of the processor, i.e. number of pins. Core flows are not amenable to caching. This approach will fail the first time you see a new packet and need to swap from hard disk.
/vijay
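The churn argument in the message above can be illustrated with a small simulation. This is a sketch under a deliberately simple assumption, namely that destinations are drawn uniformly at random, which is not how real traffic is distributed; the function name `hit_rate`, the cache sizes, and the destination counts are all hypothetical.

```python
import random
from collections import OrderedDict


def hit_rate(cache_size, num_destinations, packets, seed=1):
    """Fraction of packets whose destination was already in an LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(packets):
        dst = rng.randrange(num_destinations)   # uniform spread (assumption)
        if dst in cache:
            hits += 1
            cache.move_to_end(dst)              # mark as most recently used
        else:
            cache[dst] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)       # evict least recently used
    return hits / packets


# Few destinations behind a small cache: nearly every packet hits.
edge = hit_rate(cache_size=1_000, num_destinations=500, packets=200_000)
# Many destinations behind the same cache: entries are evicted before reuse.
core = hit_rate(cache_size=1_000, num_destinations=50_000, packets=200_000)
# Cache as large as the destination set: only cold-start misses remain.
full = hit_rate(cache_size=50_000, num_destinations=50_000, packets=200_000)
print(edge, core, full)
```

Under these assumptions the small cache does well only while the set of active destinations fits inside it. Once traffic is spread over far more destinations than cache slots, the hit rate collapses toward the ratio of cache size to destination count, and it recovers only when the cache is as large as the whole table, which matches the working-set point made above.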
On Wed, 8 Dec 1999, Vijay Gill wrote:
Full route table size is not a problem. You can burn a hard disk as you mentioned to store it. The issue is getting data in and out of the processor, i.e. number of pins. Core flows are not amenable to caching. This approach will fail the first time you see a new packet and need to swap from hard disk.
Not that it would be very economical, but what are the technical implications of using a solid state device (such as Quantum's RUSHMORE NTE series) instead of a normal hard drive?
-- Tim
--------------------------------------------------
* Timothy M. Wolfe, Chief Network Engineer *
* ClipperNet Corporation / It's a wireless world *
* tim@clipper.net 800.338.2629 x 402 *
* Sufficient for today = Inadequate for tomorrow *
--------------------------------------------------
Cisco takes advantage of this with a route cache.
And, after much machination, Cisco replaced the route cache with a full forwarding table. Caching doesn't work in the core. Really.
Tony
participants (7)
- Andrew Brown
- Dean Anderson
- Jared Mauch
- Randy Bush
- Tim Wolfe
- Tony Li
- Vijay Gill