Cool, I love being talked down to by old guys. It's refreshing and doesn't happen nearly often enough. I'm almost at a loss to figure out where to begin with the scattershot flamefest you sent. Almost. Let's start here:
Let's see you allocate an ESF B8ZS Clear Channel T1 over IP....
PDH is dead. POTS is only alive because you can still emulate PDH, because extracting a single DS0 from SDH is easy, and because the POTS user interface is well understood by a very large installed user base. I don't expect POTS as perceived by the end-user to change much over time.

End-to-end POTS is already dying. Worldcom is making a big deal over relatively simple technology which shuffles fax traffic over the Internet. There goes a lot of long-haul POTS right there. Deutsche Telekom is tight with Vocaltec and already has a tariff for voice-over-the-Internet. It's crusty in implementation because you end up dialling a local access number in DT land and talking to another dialler on the remote end which makes a more local phone call. However, there are neat plans for SS7 and neater plans for doing clever things with interpreting DTMF.

There is a local distribution plant problem; however, a number of people are working on aggregating local access lines up into VC11/VC12, dropping that into a POP-in-a-box at STM-16 and pulling out STM-16c to a big crunchy IP router. In this model POTS and historical telco voice and data schemes become services rather than infrastructure. However, emulating the incredibly ugly phone network is secondary to enabling the evolution of more and more complicated and interesting applications; it's also less cost-effective for the moment than running parallel networks but converting away from PDH (which, remember, is dead).

BTW, your formatting sucks to the point that your note is unreadable and unquotable without fmt(1) or fill-paragraph.
Ahhh, tag switching, I am on that particular holy grail as well.....
You might want to examine my comments on the mpls list at some point. "Holy Grail"? No. That was Noel. I know we look alike and stuff, but he sees a great deal of promise in MPLS while I am somewhat sceptical about the implementation and utility in practice.
How many parallel paths have you run on layer 3? Ever watched the variability? (*shiver*) Now, tell me parallel paths on IP are smooth with today's technology!
Hum, not more than six hours ago I believe I was telling Alan Hannan about the various interim survival techniques in the migration path from 7k+SSP -> decent routers. I guess you must have me beat experientially. All I ever did was sit down with pst and tli and hack and slash at the 10.2-viktor caching scheme to try to get traffic to avoid moving over to the stabler of the two lines between ICM-DC and RENATER. Oh, that and helping beat on CEF/DFIB packet-by-packet load balancing before my last retirement. So unfortunately I'm really not in a position to comment on today's technology or the variability of parallel paths with Cisco routers using any of the forwarding schemes from fast to cbus to route-cache to flow to fib. (To be honest I never really figured out wtf optimum was :) ).

The reason that you see "strange" or at least "unsmooth" load balancing along parallel paths is that, except with fib and slow switching, cisco had always forwarded packets towards the same destination out the same interface, and load balancing was performed by assigning (upon a cache fault) particular destinations to particular interfaces. (Leon was invented to blow away cached entries so that over time prefixes would slosh about from one interface to another as they were re-demand-filled into the cache. Points if you know who Leon and Viktor are. Hint: they're both as dead as PDH.)

With CEF these days you can load-balance on a per-packet basis. This has the side effect that you cannot guarantee that packets will remain in sequence if the one-way delay across the load-balanced paths is off by more than about half a packet transmission time. However, you also get much more even link utilization and no ugly cache/uncache/recache at frequent intervals (which really sucks because unfortunately you have to push a packet through the slow path at every recache). So anyway, as I was saying, I'm ignorant about such matters.
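For what it's worth, here's a toy contrast of the two behaviours described above. It's a sketch of my own (the interface names and the hashing are made up, this is not cisco code), just to show why per-destination caching looks lumpy while per-packet balancing looks smooth but can reorder:

import itertools

PATHS = ["Hssi0", "Hssi1"]

# Old demand-cache style: a destination is pinned to whichever path it was
# assigned when it faulted into the cache, until something like Leon
# invalidates the entry and it gets re-demand-filled.
dest_cache = {}
def per_destination(dst):
    if dst not in dest_cache:
        dest_cache[dst] = PATHS[hash(dst) % len(PATHS)]
    return dest_cache[dst]

# CEF-era per-packet style: round-robin across the equal-cost paths,
# ignoring the destination entirely.  Even utilization, but packets can
# arrive out of order if the paths' one-way delays differ enough.
rr = itertools.cycle(PATHS)
def per_packet(dst):
    return next(rr)

for _ in range(4):
    print(per_destination("10.0.0.1"), per_packet("10.0.0.1"))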
Audio sounds great with lots of variability
So if you aren't holding a full-duplex human-to-human conversation you introduce delay on the receiver side proportional to something like the 95th percentile of the observed transit times and throw away the outliers. If you're holding a full-duplex long-distance human-to-human conversation you can use POTS (which is dying but which will live on in emulation) and pay lots of money, or you can use one of a number of rather clever VON member packages and pay a lot less money but put up with little nagging problems. For local or toll-free stuff, to expect better price-performance from an end-user perspective now would require taking enormous doses of reality-altering drugs.
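As a sketch of that receiver-side trick (mine, with made-up numbers, not anybody's shipping jitter buffer): sort the observed transit times, buffer to roughly the 95th percentile, and discard whatever shows up later than that:

def playout_delay_ms(transit_ms, percentile=0.95):
    # Crude nearest-rank percentile; a real receiver would track this as a
    # running estimate rather than sorting a whole window of samples.
    ordered = sorted(transit_ms)
    idx = min(int(len(ordered) * percentile), len(ordered) - 1)
    return ordered[idx]

samples = [40 + (i % 7) for i in range(39)] + [140]   # one nasty outlier
delay = playout_delay_ms(samples)
late = [t for t in samples if t > delay]
print(f"buffer to ~{delay} ms, discard {len(late)} outlier(s)")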
You want bounded delay on some traffic profiles that approach having hard real time requirements. (Anything that has actual hard real time requirements has no business being on a statistically multiplexed network, no matter what the multiplexing fabric is).
Such as voice? Why do you think SDM was created in the first place? Or do you mean like a military application, 2ms to respond to a nuke.... That is when channel priorities come into play.
I wasn't around for the invention of statistical muxing, but I'm sure there are some people here who could clarify with first-hand knowledge (and I'll take email from you, thanks :) ). If it was created for doing voice, I'm going to be surprised, because none of the voice literature I've ever looked at was anything but circuit-modeled with TD muxing of DS0s, because that is how God would design a phone network.

Um, ok, why is it my day for running into arguments about real time. Hmm... "Real time" events are those which must be responded to by a deadline, otherwise the value of the response decays. In most real time applications the decay curve varies substantially, with the value of a response dropping to zero after some amount of time. This is "soft real time". "Hard real time" is used when the decay curve is vertical, that is, if the deadline is passed the response to the event is worthless or worse. There are very few hard real time things out there.

Anything that is truly in need of hard real time response should not be done on a statmuxed network or on a wide PDH network (especially not since PDH is dead in large part because the propagation delay is inconsistent and unpredictable thanks to bitstuffing) unless variance in propagation delay is less than the window for servicing the hard real time event. Soft real time things can be implemented across a wide variety of unpredictable media depending on the window available to service the real time events and the slope of the utility decay function. For instance, interactive voice and video have a number of milliseconds' leeway before a human audience will notice lag. Inducing a delay to avoid missing the end of the optimal window for receiving in-sequence frames or blobs of compressed voice data is wise engineering, particularly if the induced delay is adjusted to avoid it itself leading to loss of data utility.
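To put numbers on the decay-curve distinction (my own toy framing, the constants are invented): a hard real time response falls off a cliff at the deadline, while a soft real time response loses value gradually over some window:

def hard_rt_value(lateness_ms):
    # Vertical decay: past the deadline the response is worthless.
    return 1.0 if lateness_ms <= 0 else 0.0

def soft_rt_value(lateness_ms, window_ms=150.0):
    # Gradual decay: still useful for a while, then worthless.
    if lateness_ms <= 0:
        return 1.0
    return max(0.0, 1.0 - lateness_ms / window_ms)

for late in (0, 50, 100, 200):
    print(late, hard_rt_value(late), round(soft_rt_value(late), 2))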
However, I have NEVER failed to get the bandwidth "promised" in our nets.
Sure, the problem is with mixing TCP and other window-based congestion control schemes which rely on implicit feedback with a rate-based congestion control scheme, particularly when the latter relies on explicit feedback. The problem is exacerbated when the former overlaps the latter, such that only a part of the path between transmitter and receiver is congestion controlled by the same rate-based explicit feedback mechanism.

What happens is that in the presence of transient congestion, unless timing is very tightly synchronized (Van Jacobson has some really entertaining rants about this), the "outer loop" will react by either hovering around the equivalent of the CIR or by filling the pipe until the rate-based mechanism induces queue drops. In easily observable pathological cases there is a stair-step or vacillation effect resembling an old TCP sawtooth pattern rather than the much nicer patterns you get from a modern TCP with FT/FR/1323 timestamps/SACK. In other words your goodput suffers dramatically.
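Here's an entirely toy simulation of that interaction (the numbers are made up and the model is deliberately crude): an AIMD window probing for bandwidth over a segment that is rate-policed at a fixed capacity, which gives you the sawtooth and the wasted goodput described above:

pipe_pkts = 100        # what the policed segment actually carries per RTT
cwnd = 10.0
for rtt in range(20):
    offered = cwnd
    if offered > pipe_pkts:
        cwnd = cwnd / 2      # drops induced downstream: multiplicative decrease
    else:
        cwnd += 10           # coarse additive increase per RTT
    print(f"rtt {rtt:2d}: offered {offered:5.1f}, delivered {min(offered, pipe_pkts):5.1f}")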
But, doesn't that same thing happen when you over-run the receiving router ?????
Yes, and with OFRV's older equipment the lack of decent buffering (where decent output buffering is, per port, roughly the bandwidth x delay product across the network) was obvious as bandwidth * delay products increased. With this now fixed in modern equipment and WRED available, the implicit feedback is not so much dropped packets as delayed ACKs, which leads to a much nicer subtractive slow-down by the transmitter, rather than a multiplicative backing off. So, in other words, in a device properly designed to handle large TCP flows, you need quite a bit of buffering and benefit enormously from induced early drops. As a consequence, when the path between transmitter and receiver uses proper, modern routers, buffer overruns should never happen in the face of transient congestion. Unfortunately this is easily seen with many popular rate-based congestion-control schemes as they react to transient congestion.

Finally, another ABR demon is the decay of the rate at which a VS is allowed to send traffic, which in the face of bursty traffic (as one tends to see with most TCP-based protocols) throttles goodput rather dramatically. Having to wait an RTT before an RM cell returns tends to produce unfortunate effects, and the patch around this is to try to adjust the SCR contract to some decent but low value and assure that there is enough buffering to allow a VS's burst to wait to be serviced, and hope that this doesn't worsen the bursty pattern by bunching up a lot of data until an RM returns, allowing the queue to drain suddenly.
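For scale, here's the rule-of-thumb arithmetic (the example figures are mine, not a measurement of anyone's network): sizing output buffering to roughly one bandwidth x delay product per port:

# Assumed example: a DS3 port and a ~70 ms coast-to-coast round trip.
link_bps = 45_000_000
rtt_s = 0.070
bdp_bytes = link_bps * rtt_s / 8
print(f"~{bdp_bytes / 1024:.0f} KB of output buffer for that port")   # ~385 KB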
Ahhh.. We await the completion and proper interaction of RM, ILMI, and OAM. These will (and in some cases already DO) provide that information back to the router/tag switch. Now do they use it well????? That is a different story....
The problem is that you need the source to slow transmission down, and the only mechanism to do that is to delay ACKs or induce packet drops. Even translating FECN/BECN into a source quench or a drop close to the source is unhelpful, since the data in flight will already lead to feedback which will slow down the source. The two congestion control schemes are fundamentally incompatible.
Delay across any fabric of any decent size is largely determined by the speed of light.
Where in the world does this come from in the industry? Maybe I am wrong, but guys, do the math. The typical run across the North American Continent is timed at about 70ms. This is NOT being limited by the speed of light.
That would be round-trip time.
Light can travel around the world 8 times in 1 second. This means it can travel once around the world (full trip) in ~ 120 ms. Milliseconds, not micro.... So, why does one trip across North america take 70ms...
Light is slower in glass.
Hint, it is not the speed of light. Time is incurred encoding, decoding, and routing.
Kindly redo your calculation with a decent speed of light value. Unfortunately there is no vacuum between something in NYC and something in the SF Bay area.
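Done as a quick back-of-the-envelope (my assumptions: a group index of about 1.47 for single-mode fibre, and a ~6000 km routed fibre path between the coasts, since real fibre runs are much longer than the ~4100 km great circle):

C_VACUUM_KM_S = 299_792
N_FIBER = 1.47                       # assumed group index of the glass
v_fiber = C_VACUUM_KM_S / N_FIBER    # ~204,000 km/s in fibre
route_km = 6000                      # assumed routed fibre distance
one_way_ms = route_km / v_fiber * 1000
print(f"one-way ~{one_way_ms:.0f} ms, round trip ~{2 * one_way_ms:.0f} ms")
# -> roughly 29 ms one way, ~59 ms round trip, before any queueing,
#    serialization or lookups: most of a 70 ms RTT is just the glass.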
BTW this (70ms median across the US) comes from a predominantly ATM network. Actually, I am quoting Pac-Bell.
Oh now THERE's a reliable source. "Hi my name is Frank and ATM will Just Work. Hi my name is Warren and ATM is fantastic." Ugh. (kent bait kent bait kent bait)
Therefore, unless ABR is deliberately inducing queueing delays, there is no way your delay can be decreased when you send lots of traffic unless the ATM people have found a way to accelerate photons given enough pressure in the queues.
More available bandwidth = quicker transmission.
Ie: at 1000kb/s available, how long does it take to transmit 1000kb ? 1 second. Now, at 2000kb/s available, how long does it take ? 1/2 second. What were you saying ?
At higher bandwidths bits are shorter, not faster. Repeat that several times. Whether you are signalling at 300bps or at 293875983758917538924372589bps, the start of the first bit arrives at the same time.
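Sketching the arithmetic (the example values are mine: a 1500-byte packet over a 3000 km fibre path at roughly 204,000 km/s): the propagation term is fixed by the glass, and all more bandwidth does is shrink the gap between the first bit and the last one:

packet_bits = 1500 * 8
prop_ms = 3000 / 204_000 * 1000                           # ~14.7 ms either way
for rate_bps in (1_500_000, 45_000_000, 622_000_000):     # T1-ish, DS3, OC-12-ish
    serial_ms = packet_bits / rate_bps * 1000
    print(f"{rate_bps:>11} bps: first bit after ~{prop_ms:.1f} ms, "
          f"last bit {serial_ms:.3f} ms after that")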
Why do you think you have "centi"-second delays in the first place.
Because photons and electrons are slow in glass and copper.
I would check yours, but I find time for a packet to cross a router backplane to be < 1ms; route determination in a traditional router can take up to 20 ms (or more), and slightly less than 1 ms if it is in cache. When I said cross a backplane, I meant "from hardware ingress to egress", i.e. to be delivered.
You are still stuck thinking of routers as things which demand-fill a cache by dropping a packet through a slow path. This was an artefact of OFRV's (mis)design, and the subject of many long and interesting rants by Dennis Ferguson on this list a couple of years ago. Modern routers simply don't do this, even the ones from OFRV.
traceroute to cesium.clock.org (140.174.97.8), 30 hops max, 40 byte packets
6 core2-fddi3-0.san-francisco.yourtransit.net (-.174.56.2) 567 ms 154 ms 292 ms
Tell me this is a speed of light issue. From the FDDI to the HSSI on the same router.
This has nothing to do with the router's switching or route lookup mechanism. Router requirements allow routers to be selective in generating ICMP messages, and cisco's implementation on non-CEF routers will hand the task of generating ICMP time exceededs, port unreachables and echo replies to the main processor, which gets to the task as a low priority when it's good and ready. If the processor is doing anything else at the time you get rather long delays in replies, and if it's busy enough to start doing SPD you get nothing. This gets talked about quite frequently on the NANOG list. I suggest you investigate the archives. I'm sure Michael Dillon can point you at them. He's good at that.
PING cesium.clock.org (140.174.97.8): 56 data bytes
64 bytes from 140.174.97.8: icmp_seq=0 ttl=243 time=93 ms
64 bytes from 140.174.97.8: icmp_seq=1 ttl=243 time=78 ms
64 bytes from 140.174.97.8: icmp_seq=2 ttl=243 time=79 ms
64 bytes from 140.174.97.8: icmp_seq=3 ttl=243 time=131 ms
64 bytes from 140.174.97.8: icmp_seq=4 ttl=243 time=78 ms
64 bytes from 140.174.97.8: icmp_seq=5 ttl=243 time=81 ms
64 bytes from 140.174.97.8: icmp_seq=6 ttl=243 time=75 ms
64 bytes from 140.174.97.8: icmp_seq=7 ttl=243 time=93 ms
Nice and stable, huh. If this path were ATM switched (Dorian, I will respond to you in another post) it would have settled to a stable latency.
There is extraordinary congestion in the path between your source and cesium.clock.org, and cesium is also rather busy being CPU bound on occasion. There is also a spread-spectrum radio link between where it lives (the land of Vicious Fishes) and "our" ISP (Toad House), and some of the equipment involved in bridging over that is flaky. If you were ATM switching over that link you would see the same last-hop variability because of that physical level instability. It works great for IP though, and I quite happily am typing this at an emacs thrown up onto my X display in Scandinavia across an SSH connection.
Flow switching does a route determination once per flow; after that, the packets are switched down a predetermined path, "The Flow". Hence the term "flow switching". This reduces the variability of the entire flow.
Um, no it doesn't. As with all demand-cached forwarding schemes, you have to process a packet heavily when you have a cache miss. Darren Kerr did some really neat things to make it less disgusting than previous demand-cached switching schemes emanating out of OFRV, particularly with respect to gleaning lots of useful information out of the side-effects of a hand-tuned fast path that was designed to account for all the header processing one could expect. Flow switching does magic matching of related packets to cache entries which describe the disposition of the packet, which in previous caching schemes could only be determined by processing individual packets to see if they matched various access lists and the like. Its principal neat feature is that less per-packet processing means more pps throughput. MPLS is conceptually related, btw.

Flow switching does not improve queueing delays or speed up photons and electrons, however, nor does it worsen them; therefore the effect of flow switching on the variability of normal traffic is nil. Flow switching has mechanisms cleverer than Leon the Cleaner to delete entries from the cache, and consequently there are much reduced odds of a cache fault during a long-lived flow that is constantly sending at least occasional traffic. You may see this as reducing variability. I see it as fixing an openly-admitted design flaw.
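If it helps, here's a bare-bones sketch of the demand-cached idea (the class, the names and the tie-breaking are mine for illustration, not OFRV's code): a miss goes through the expensive slow path and populates the cache, a hit takes the cheap fast path, and none of it changes how fast the bits move between boxes:

# flow key: (src, dst, sport, dport, proto)
class FlowCache:
    def __init__(self):
        self.entries = {}                  # flow -> egress interface

    def forward(self, flow):
        egress = self.entries.get(flow)
        if egress is None:
            # Cache miss: full lookup, access lists, etc. -- the slow path.
            egress = self.slow_path_lookup(flow)
            self.entries[flow] = egress
        # Cache hit: cheap fast-path switching, so pps goes up, but
        # queueing and propagation delays are untouched.
        return egress

    def slow_path_lookup(self, flow):
        # Stand-in for the real longest-match plus policy processing.
        return "Serial0" if hash(flow[1]) % 2 else "Fddi0"

fc = FlowCache()
print(fc.forward(("10.0.0.1", "140.174.97.8", 1025, 23, 6)))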
However, I should also point out that much of your argument is based on TCP. Most multimedia (Voice/Audio/Video) content does not focus on TCP, but UDP/Multicast. What does your slow start algorithm get you then?
WRED and other admission control schemes are being deployed that will penalize traffic that is out of profile, i.e., that doesn't behave like a reasonable TCP behaves. Most deployed streaming technologies have taken beatings from ISPs (EUNET, for example, with CUSEEME; Vocaltec and Progressive with a wide range of ISPs) and have implemented congestion avoidance schemes that closely mimic TCP's, only in some cases there is no retransmission scheme.
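As a rough sketch of the WRED/RED idea (my simplification with made-up thresholds, not any vendor's implementation): as the average queue depth rises between a minimum and a maximum threshold, the early-drop probability ramps up, which is exactly what punishes flows that don't back off the way a reasonable TCP does:

def red_drop_probability(avg_queue, min_th=20.0, max_th=60.0, max_p=0.1):
    if avg_queue < min_th:
        return 0.0             # below the minimum threshold: no early drops
    if avg_queue >= max_th:
        return 1.0             # above the maximum threshold: drop everything
    # Linear ramp between the two thresholds.
    return max_p * (avg_queue - min_th) / (max_th - min_th)

for q in (10, 30, 50, 70):
    print(q, red_drop_probability(q))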
PS: MAC layer switching and ATM switching are apples and oranges, although one could be used to do the other. (Told you, Dorian)
Huh?

	Sean.

P.S.: You have some very entertaining and unique expansions of a number of acronyms in general and relating to ATM in particular. I'm curious how you expand "PDH". Personally I favour "Pretty Damn Historical", although other epithets come to mind.

P.P.S.: It's dead.