Re: Inter-exchange media types
Hm, I think Peter was too brief to be understood by all. Let me try to expand on his major point (buffering requirements). First, however, to this: Jeremy Porter <jerry@fc.net> wrote:
Since large amounts of traffic on the Net originate from modems, which are typically plugged into terminal servers that virtually all have Ethernet interfaces, very large amounts of Internet traffic have MTUs of 1500 or smaller.
[Continues argument in the line of "if little traffic uses more than 1500 bytes MTU, ethernet will be better/cheaper/etc."]
I would claim that the average packet size doesn't really matter much -- the average packet size is usually in the order of 200-300 bytes anyway. However, restricting the MTU of an IX to 1500 bytes *will* matter for those fortunate enough to have FDDI and DS3 (or better) equipment all the way, forcing them to use smaller packets than they otherwise could. Some hosts get noticeably higher performance when they are able to use FDDI-sized packets compared to Ethernet-sized packets, and restricting the packet size to 1500 bytes will put a limit on the maximum performance these people will see. In some cases it is important to cater to these needs.

The claim that switched fast full-duplex Ethernet will perform better than switched, full-duplex FDDI for small packets doesn't really make sense -- not to me at least. I mean, it's not like FDDI doesn't use variable-sized packets...

Now, over to the rather important point Peter made. In some common cases what really matters is the behaviour of these boxes under high load or congestion. The Digital GigaSwitch is reportedly able to "steal" the token on one of the access ports if that port sends too much traffic to another port where there currently is congestion. This causes the router on the port where the token was stolen to buffer the packets it has to send until it sees the token again. Thus, the total buffering capacity of the system will be the sum of the buffering internal to the switch and the buffering in each connected router. I have a hard time seeing how similar effects could be achieved with Ethernet-type switches. (If I'm not badly mistaken, this is a variant of one of the architectural problems with the current ATM-based IXes as well.)

Thanks to Curtis Villamizar it should be fairly well known by now what insufficient buffering can do to your effective utilization under high offered load (it's not pretty), and that the requirement for buffering at a bottleneck scales approximately with the (end-to-end) bandwidth x delay product for the traffic you transport through that bottleneck.

So, there you have it: if you foresee that you will push the technology to its limits, switched Ethernet (fast or full duplex) as part of a "total solution" for an IX point seems to be at a disadvantage compared to switched FDDI as currently implemented in the Digital GigaSwitch. This doesn't render switched Ethernet unusable in all circumstances, of course.

Regards,

- Havard
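[As a rough illustration of the bandwidth x delay sizing rule referred to above, a minimal sketch follows; the link rates and the 70 ms round-trip time are illustrative assumptions, not figures from this thread.]

    # Rule of thumb: buffering needed at a bottleneck ~= bandwidth x round-trip delay.
    # Link rates and the 70 ms RTT below are assumed values for illustration only.

    LINKS_BPS = {
        "T1":   1_544_000,
        "DS3":  44_736_000,
        "OC-3": 155_520_000,
    }
    RTT_SECONDS = 0.070  # assumed cross-country round-trip time

    for name, bps in LINKS_BPS.items():
        buffer_bytes = bps * RTT_SECONDS / 8
        print(f"{name:5s}: ~{buffer_bytes / 1024:.0f} KB of buffering at the bottleneck")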
smaller packets than they otherwise could. Some hosts get noticeably higher performance when they are able to use FDDI-sized packets compared to Ethernet-sized packets, and restricting the packet size to 1500 bytes will put a limit on the maximum
Some hard figures on this would be interesting, i.e., the percentage of packets larger than 1500 bytes, the percentage of performance lost when they get fragmented, etc. I suspect that other backbone design issues (like congestion) dominate any fragmentation issue. I'm not sure a few people trying to get a little extra throughput should dictate the design of a NAP (unless they want to pay for it).
A new stdaddr draft is forthcoming shortly. This other conversation is of interest, though. We're arguing about 100Mb/s interconnect technology as if we all planned to keep using it for some significant period. That's not so. Two years from now it'll be 622Mb/s or it'll be a dead concept. PMTUD matters, and TCP MSS (therefore IP MTU) matters because it will dictate the frame rate to the routers and the end hosts. Bytes are cheap, frames cost.
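[To put rough numbers on "bytes are cheap, frames cost", the sketch below computes the frame rate a router must sustain at full line rate for the speeds and MTUs under discussion, assuming, optimistically, that every packet is full-sized.]

    # Packets per second at full line rate, assuming every packet is MTU-sized.
    # Line rates and MTU values are the ones discussed in this thread.

    LINE_RATES_BPS = {"100 Mb/s": 100_000_000, "622 Mb/s": 622_080_000}
    MTUS_BYTES = {"Ethernet (1500 B)": 1500, "FDDI (4352 B)": 4352}

    for line_name, bps in LINE_RATES_BPS.items():
        for mtu_name, mtu in MTUS_BYTES.items():
            pps = bps / (mtu * 8)
            print(f"{line_name}, {mtu_name}: ~{pps:,.0f} frames/s")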
Paul, with all due respect, there is a lot of apparent need and demand for 100mbps interconnects; one could argue that 100 100mbps interconnects are better than 3 or 4 622mbps interconnects. I think 100 interconnects are at least as likely in 2 years as 4 622mbps interconnects.

Most of the customers connecting up to these interconnects will do so via devices with MTUs of 1500 (frame relay, T-1, switched 10 and 100mbps), and all of the network below these interconnected sites runs at 1500-byte MTU and smaller. I know of maybe 100 sites around the globe that can currently send FDDI MTUs in the 4000 range. By contrast, good estimates show about 10 million dialup users connected via terminal servers which have Ethernet interfaces. In terms of total users, FDDI users would be well below .001%. In terms of total revenues, FDDI users would be well below 1%.

Based on these numbers, my recommendation to NAP/MAE builders would be to build based on 100mbps switched Ethernet, with the option to interconnect FDDI. To make the FDDI worthwhile there would have to be at least two networks attached that had routers co-located (or virtually co-located), with DS-3 rate connections, and those two networks would each need at least one DS-3 rate customer. Assuming this is the case, the upgrade to FDDI is a relatively simple thing.

Comparing the cost of 100mbps switched Ethernet to 100mbps switched FDDI, there is about a 30% to 50% difference per port on the switch, and a greater difference than that on the router. Prices on switched Ethernet are dropping much faster than those of FDDI, because its price point is below the level at which a technology is bought only where speed is the sole factor.

Speaking of interfaces and speed, I see Cisco now has an OC-3 packet interface that will do raw packets via HDLC or PPP at OC-3 rates over SONET. That seems much nicer than the ATM AIP card. You could build a fairly nice backbone with a dual-attached mesh of OC-3 routers.

--
Jeremy Porter, Freeside Communications, Inc.      jerry@fc.net
PO BOX 80315 Austin, Tx 78708  |  1-800-968-8750  |  512-339-6094
http://www.fc.net
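[For what it's worth, the user-share arithmetic above works out as follows; the 100-site and 10-million-user counts are the estimates quoted in the message, repeated here only to show the calculation.]

    # Share of users able to originate FDDI-sized frames, using the rough
    # counts quoted above (both are estimates from the message, not measurements).

    fddi_capable_sites = 100      # sites said to send ~4000-byte MTUs today
    dialup_users = 10_000_000     # dialup users behind Ethernet terminal servers

    share = fddi_capable_sites / dialup_users
    print(f"FDDI-capable share of users: {share:.4%}")  # prints 0.0010%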
Before you count out switched FDDI, note that there are other FDDI switches in the world these days (with all deference to the GigaSwitch's pioneering efforts, in whose hands our butts are held). They look to have good performance and the cost-per-port is rather lower. (Getting to be second and third, etc., is often a considerable advantage.)

It's too bad there isn't an "official" way to run a larger MTU (e.g., FDDI) with 100baseTX running point-to-point full-duplex. Some of the chips even have control bits to ignore MTU-exceeded conditions. The packet length limits are required only for Ethernet "cable mode" operation, where collisions must be detected. In point-to-point full-duplex service this is obviously a non-issue, and an FDDI MTU would work just fine.

So if there were an "official" (i.e., interoperable) large-packet mode and somebody built a 100baseTX switch that could handle jumbograms, then I think that full-duplex switched 100baseTX *would* be the medium of choice for many, many tasks - moderate-sized exchange points being one of them. Heck, I'd use it for interior wiring in my superhubs.

But there are customers for whom end-to-end MTU is a buying issue. Bogus or not, it is a very real issue.

-mo
But there are customers for whom end-to-end MTU is a buying issue. Bogus or not, it is a very real issue.
They exist, but are there enough of them to make it worth the extra money to accommodate them? Nobody seems able to say how real the issue is (i.e., hard usage and performance numbers). While larger MTUs are a good thing, they aren't very common, and as 100BT gets more popular, I expect them to become even less common. I guess the router/switch manufacturers will just have to be prepared to handle high packet rates. Or maybe we need a "reverse fragmentation" protocol, where packets to the same destination could be lumped together and routed/switched as a single packet.
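[As a thought experiment only, the "reverse fragmentation" idea might look something like the sketch below, which lumps queued packets bound for the same next hop into one larger frame. Everything here, including the names and the 4352-byte limit, is hypothetical; no such protocol exists.]

    # Hypothetical "reverse fragmentation": coalesce small packets headed for the
    # same next hop into one jumbo frame so the switch sees fewer frames.

    from collections import defaultdict

    JUMBO_LIMIT = 4352  # assumed FDDI-sized output MTU

    def coalesce(queue):
        """queue: list of (next_hop, payload bytes); returns (next_hop, bundle size) pairs."""
        bundles = defaultdict(list)
        out = []
        for next_hop, payload in queue:
            current = bundles[next_hop]
            if current and sum(len(p) for p in current) + len(payload) > JUMBO_LIMIT:
                out.append((next_hop, current))          # flush a full bundle
                bundles[next_hop] = [payload]
            else:
                current.append(payload)
        out.extend((hop, b) for hop, b in bundles.items() if b)
        return [(hop, sum(len(p) for p in b)) for hop, b in out]

    # Ten 300-byte packets to the same peer become one ~3000-byte bundle.
    packets = [("peer-A", b"x" * 300) for _ in range(10)]
    print(coalesce(packets))  # [('peer-A', 3000)]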
In message <199605040558.AAA03102@freeside.fc.net>, Jeremy Porter writes:
Paul, with all due respect, there is a lot of apparent need and demand for 100mbps interconnects; one could argue that 100 100mbps interconnects are better than 3 or 4 622mbps interconnects. I think 100 interconnects are at least as likely in 2 years as 4 622mbps interconnects.
That brings up another scaling problem. If one AS is attached to 100 or so interconnects, that's 100 border routers. That's not so bad with one backbone (i.e., one national provider in the US). If there are 2 or more (there are and will be more than 2), and each reaches X% of the 100 or so interconnects (where X% is 100% minus epsilon, say 95% for example), then they will announce the routes they hear from one interconnect to all of the other 100-or-so-minus-1 interconnects. The other backbones will hear the same set of routes 100 or so times.

If we continue to do shortest path out, that's 100 paths (I'll drop the "or so") to each prefix. That means we need to fix the "MED latching" feature and use MED. With any good IBGP implementation, if MEDs are different, the routers with less preferred routes will withhold their announcements. If so, we have another slight problem. If the primary route is withdrawn, most or all of those less preferred routes will get announced and gradually slosh around until the next best route is installed everywhere and all others are again suppressed.

This is not impossible for a router to handle, just a lot of route flap to deal with, and the coding had better be very carefully done. There is existence proof suggesting that handling high levels of route flap can be done less than perfectly, in ways that can promote sustained instability. The problem is that 100 interconnects will amplify route flap problems by a factor of 100.

Just a heads up for router vendors - expect to see high route flap loads and deal with them. Stability does not require infinite CPU power, just algorithms which don't choke when the load gets high, but rather converge at whatever rate they can sustain.

I suspect we will see both. A lot more interconnects and a lot faster interconnects.

Curtis
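[A crude way to quantify the amplification described above: with N interconnects, a single flap of a prefix's best path can expose on the order of N-1 withheld alternate paths, each generating roughly one announcement and one later withdrawal before things settle. The sketch below uses that simplified model; the flap rate is an assumed figure, not a measurement.]

    # Back-of-the-envelope route-flap amplification with N interconnect points.
    # Simplified model: each flap of a prefix's best path lets up to N-1 withheld
    # alternates be announced and later withdrawn again. Inputs are assumptions.

    def extra_updates(num_interconnects, flaps, prefixes):
        per_flap = 2 * (num_interconnects - 1)   # announce + withdraw per alternate
        return flaps * prefixes * per_flap

    # One prefix flapping 60 times an hour, seen across 100 interconnects:
    print(extra_updates(num_interconnects=100, flaps=60, prefixes=1))  # 11880 updates/hour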
participants (6)

- Curtis Villamizar
- Havard.Eidnes@runit.sintef.no
- Jeremy Porter
- jon@branch.com
- Mike O'Dell
- Paul A Vixie