On Mon, Mar 16, 2009 at 09:09:35AM -0500, Leo Bicknell wrote:
> The result is that if the vendor targeted 100ms of buffer you now have 400ms of buffer, and really bad lag.
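The arithmetic behind the quoted numbers is queueing delay. A full buffer of B bytes drains at the link rate R, so it adds B*8/R seconds of delay. The quote doesn't say how the 4x inflation arose. The minimal sketch below assumes one plausible cause: a buffer sized in bytes for 100 ms at a nominal rate, running on a port a quarter as fast. The 40 Mb/s and 10 Mb/s rates are hypothetical.

```python
def buffer_bytes_for(delay_s: float, rate_bps: float) -> float:
    """Bytes a buffer must hold to represent delay_s seconds of traffic at rate_bps."""
    return delay_s * rate_bps / 8

def drain_time_s(buffer_bytes: float, rate_bps: float) -> float:
    """Worst-case queueing delay: seconds needed to drain a full buffer at rate_bps."""
    return buffer_bytes * 8 / rate_bps

# Hypothetical: the vendor sizes the buffer for 100 ms at 40 Mb/s ...
nominal_rate = 40e6
buf = buffer_bytes_for(0.100, nominal_rate)
# ... but the port actually runs at 10 Mb/s, a quarter of the nominal rate.
actual_rate = 10e6
print(f"buffer = {buf:.0f} bytes")
print(f"delay at {actual_rate / 1e6:.0f} Mb/s = {drain_time_s(buf, actual_rate) * 1000:.0f} ms")
```

Because the buffer is fixed in bytes, its delay scales inversely with the rate at which it drains. A quarter of the rate therefore means four times the lag.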
Well, this is one of the reasons why I hate the fact that we're effectively stuck in a 1500-byte MTU world. My customers are vastly concerned with the quantity of data they can transmit per unit of latency. You may be more familiar with this as "throughput". Customers beat us operators and engineers up over it every day. TCP window tuning does help there, if you can manage the side effects. A larger default layer 2 MTU (why we didn't change this when GE came out, I will never understand) would help even more by reducing the total number of frames needed to carry a given amount of data across a given wire.
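Two back-of-the-envelope calculations show why window tuning and MTU both matter here. A single TCP flow can move at most one window per round trip, so throughput is capped at window/RTT. Separately, the number of frames needed for a transfer falls as the MTU grows. This minimal sketch assumes 40 bytes of IPv4+TCP headers per packet (no options), a 64 KiB window, an 80 ms path, and a 9000-byte jumbo MTU. All of these are illustrative values, not figures from the post.

```python
import math

def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP flow's throughput: at most one window per RTT."""
    return window_bytes * 8 / rtt_s

def frames_needed(payload_bytes: int, mtu: int, overhead: int = 40) -> int:
    """Frames needed to carry payload_bytes, assuming `overhead` bytes of
    IP+TCP headers per packet (40 = IPv4 + TCP without options, an assumption)."""
    per_frame = mtu - overhead
    return math.ceil(payload_bytes / per_frame)

# Hypothetical: an untuned 64 KiB window over an 80 ms path.
print(f"{window_limited_bps(65535, 0.080) / 1e6:.1f} Mb/s max")

# Frames needed for 1 GB of payload at standard vs. jumbo MTU.
for mtu in (1500, 9000):
    print(mtu, frames_needed(10**9, mtu))
```

Under these assumptions, the jumbo MTU carries the same gigabyte in roughly one sixth as many frames. That means proportionally less per-frame header and processing overhead.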
> As network operators we have to get out of the mind set that "packet drops are bad"
Well, that's easier said than done, and arguably not realistic. I got started in this business when 1-3% packet loss was normal and expected. As the network has grown, the expectation of 0% loss in all cases has grown with it. You have to remember that in the early days, the network itself was expected to guarantee data delivery (i.e., X.25). Then the network improved and that burden was shifted onto the host devices. Technology has since continued to improve to the point where you literally can expect 0% packet loss in relatively confined areas (say, Provider X in Los Angeles to user Y in San Jose). But as you go further afield, such as from LAX to Israel, expectations have to change. Today, that mindset is not always there.

As you allude to, this has also bred applications that are almost entirely intolerant of packet loss and extremely sensitive to jitter. (VoIP people, are you listening?) Real-time gaming is a great example. Back in the days when 99% of us were on modems, any loss or varying delay between the client and the server made the difference between an enjoyable session and nothing but frustration, and it was often hit and miss. A congested or dirty link in the middle of the path destroyed the user's experience.

This is further compounded by increasingly international participation in some of these services. That means 24x7 requirements make customers and their users more and more sensitive to maintenance activities. (There can be areas where there is no "after hours" in which to do this stuff.) On top of that, as media companies expand their use of the network, customers have forced providers to write performance-based metrics into their SLAs. Rather than simple uptime, these now require often arbitrary guarantees of latency and data loss. Put it all together and you've got a real problem for operations and engineering. Techniques that can help improve network integrity are worth exploring.
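Why does a loss rate that is tolerable across town become painful across an ocean? One way to see it is the well-known Mathis et al. approximation for steady-state TCP Reno throughput: BW ~ (MSS / RTT) * (C / sqrt(p)), with C ~ sqrt(3/2). This model is swapped in for illustration and is not from the post. The 10 ms and 250 ms RTTs are hypothetical stand-ins for a Los Angeles to San Jose path and an LAX to Israel path.

```python
import math

def mathis_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. approximation of long-run TCP Reno throughput in bits/s.
    Only meaningful for a nonzero, moderate loss probability."""
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss))

mss = 1460  # assumed: 1500-byte MTU minus 40 bytes of IPv4+TCP headers
paths = {
    "LA -> San Jose (assumed 10 ms RTT)": 0.010,
    "LAX -> Israel (assumed 250 ms RTT)": 0.250,
}
for loss in (0.0001, 0.01):
    for name, rtt in paths.items():
        print(f"{name}, loss {loss:.2%}: {mathis_bps(mss, rtt, loss) / 1e6:.1f} Mb/s")
```

Under this model, the 250 ms path gets 1/25 of the throughput of the 10 ms path at the same loss rate. Moving from 0.01% to 1% loss cuts both by a factor of ten. The MSS term also shows why a larger MTU raises the ceiling.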
The difficulty is in proving these techniques under a wide array of circumstances, getting them properly adopted, and not having vendors or customers arbitrarily break them through misunderstanding, poor implementations, or bad configs (PMTUD, anyone?). Going forward, this sort of thing is going to become more and more important and harder and harder to get right.

I'm actually glad to see this particular thread appear and will be quite interested in what people have to say on the matter.

-Wayne

---
Wayne Bouchard
web@typo.org
Network Dude
http://www.typo.org/~web/