I'm not sure all of this thread has gone to the NANOG list (which I do not subscribe to), but I think my response to Marc may be of general interest. - Jim
From: Marc Slemko <marcs@znep.com>
Date: Wed, 11 Feb 1998 12:28:03 -0700 (MST)
To: Henrik Frystyk Nielsen <frystyk@w3.org>
Cc: Vern Paxson <vern@ee.lbl.gov>, jg@w3.org
Subject: Re: nanog discussion of HTTP 1.1
On Wed, 11 Feb 1998, Henrik Frystyk Nielsen wrote:
At 21:15 2/8/98 -0700, Marc Slemko wrote:
FYI, your message hasn't shown up on the nanog list yet. There are filters in place to prevent people who aren't subscribed to nanog or nanog-post or something like that from posting.
Hi Marc,
Ah - I just saw my message go through to the list...
The problem is that I do not see any of your tests being representative of the "average" Internet user who is connected via a dialup modem and may have highish latency and perhaps 20% packet loss. This average user is, of course, just in my head.
I don't claim that we have hit anything near an average user or web site. Actually, I don't think it makes sense to talk about averages for anything on the Internet; you have to talk about distributions and different scenarios, etc. I think we made that clear in our paper, so your mileage may vary.
Of course you do; however, the distribution is so weighted that it isn't that unrealistic to talk about the "average" user as someone with a low speed, high latency connection, often with moderate packet loss. What has been presented in this paper is good work and is useful, but it does not go far enough to be able to draw conclusions about the majority of people using HTTP today, IMHO. You should be more in tune with this than I am, though, and probably are.
However, I think you skipped my point that the numbers we saw varied greatly as a function of time of day but the relative difference between HTTP/1.0 and HTTP/1.1 was pretty constant, even in situations with high data loss. I know for sure that we ran WAN tests when a link suffered 25% packet loss, and HTTP/1.1 still came out as the winner.
If the traceroutes referenced on the page are accurate, then the WAN link from LBL is really very close in Internet terms. More importantly, the low bandwidth, high latency link has a _very_ short path with little room for congestion and packet loss.
The WAN RTT varied from 75-120 ms. Hmm, so you say that we should have run PPP and WAN combined? That could have been interesting, but we had enough problems with Solaris TCP bugs, especially on PPP, that getting the data in the first place was quite a task.
Yes, you need more varied tests on low speed, high latency Internet connections. Your paper talks about the "transcontinental Internet", yet none of your published tests that I can see even went over the Internet; even the link from MIT to LBL only went over one backbone and really has low latency. I would not really call any of the links, even the PPP link, a high latency connection.
Dunno what the traceroute was the day we took the tests (I think that is squirrelled away in our data); this morning's traceroute shows that it goes to NYC via BBN, and then via Sprint to the west coast. Whether this counts as two backbones rather than one is a matter for the network operators to decide.
Although I have no data to support it, my gut feeling is that higher packet loss combined with high latency of what I perceive to be the "typical" dialup connection (which, right now, includes a vast majority of users) could have a significant impact on the results and result in multiple connections still being a big win.
Do you have any data to back that up?
As I said, no I don't. I don't have the facilities or the time to carry out the necessary research. In the absence of any evidence to the contrary for this particular situation, I am inclined to be doubtful.
You are welcome to your doubts; but if you feel this way, I would encourage you and your peers to do more tests; NANOG is clearly in a better position to do extensive testing than we are. All our testing technology is packaged up for distribution, so no one need start from scratch.
While it appears clear that multiple connections can have a large negative impact and should be avoided for numerous known and reasonably understood reasons, I have not seen any research convincing me that the typical dialup user would not still get better results with multiple connections. Such measurements are, of course, far more complex to carry out in a way that yields meaningful results that can be applied in general.
Well, our tests with HTTP/1.1 showed that it did save some bandwidth and was somewhat faster for a dialup user; our tests were over real modems, not simulated, for loading a page the first time. Bigger gains for dialup users will come from compression and stylesheet technology, for first-time cache loading. As the bandwidth went up, HTTP/1.1's gains went up. For cache validation, HTTP/1.1 blew away HTTP/1.0 over a dialup (and it is better for the server as well, reducing its load).
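To make the cache-validation point concrete, here is a minimal sketch (not code from the paper; the host and resource are hypothetical) of a conditional GET over a persistent connection. On revalidation an unchanged resource comes back as a bare 304 Not Modified, so almost nothing has to cross the modem link, and the persistent HTTP/1.1 connection avoids a new TCP setup for each check.

    # Illustrative sketch only: cache validation via a conditional GET.
    # Host and resource are hypothetical; assumes the server sends a
    # Last-Modified validator on the first response.
    import http.client

    conn = http.client.HTTPConnection("www.example.com")  # persistent HTTP/1.1 connection

    # First fetch: the server returns the full entity plus a validator.
    conn.request("GET", "/style.css")
    resp = conn.getresponse()
    resp.read()                                   # body must be drained before reuse
    last_modified = resp.getheader("Last-Modified")

    # Revalidation: send the validator back on the same connection; if the
    # cached copy is still good, only a tiny 304 (headers, no body) comes back.
    conn.request("GET", "/style.css",
                 headers={"If-Modified-Since": last_modified})
    resp = conn.getresponse()
    resp.read()                                   # empty for a 304
    print(resp.status)                            # 304 if unchanged, 200 otherwise
    conn.close()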
We don't pretend to say that HTTP/1.1 only needs one connection - that is not a realistic demand, as HTTP/1.1 pipelining doesn't provide the same functionality as HTTP/1.0 using multiple TCP connections. However, the paper does show that using multiple connections is a loser in the test cases that we ran. If you get proof of situations where this is not true, then I would love to hear about it!
You do say:
we believe HTTP/1.1 can perform well over a single connection
It is true that this does not say HTTP/1.1 requires only one connection; however, it strongly suggests it.
Obviously, multiple short flows are bad. The same number of long flows are better, but would really make you wonder what you are accomplishing. When you start dealing with congestion, the simple fact is that today a single client using a small percent of the total bandwidth on a congested link will get more data through using multiple simultaneous connections. This is similar to how a modified TCP stack that doesn't properly respect congestion control will have better performance, in the isolated case where it is the only one acting that way, over such links.
You don't end up with the same number of long flows... With HTTP/1.1, you end up with many fewer flows (typically 1), each considerably longer (though not as much longer as you might naively think, since we send many fewer, larger packets, which works against long packet trains). We also end up with half as many packets that can get lost (fewer small packets, ACK packets, and packets associated with open and close).

Some of NANOG may be interested in one other tidbit from our paper: it showed the mean packet size doubled in our tests (since we get rid of so many small packets and buffer requests into large packets). Not a bad way to help fill the large pipes you all are installing, and to make it much easier for the router vendors to build the routers needed to keep up with Internet growth.

As to getting more bandwidth with multiple TCP connections, this may be true, but we (deliberately) do not have data to this effect. Our data shows that we can get higher performance over a single connection than HTTP/1.0 does over 4 connections (the typical current implementation; despite the dialog box in Navigator, it is fixed at 4 connections, said the implementer of Navigator to us last spring). We deliberately did not want to encourage "non-network-friendly" HTTP/1.1 implementations (and did not test that situation). It was better left unsaid... :-). So we "gave" HTTP/1.0 4 simultaneous connections while only using 1 ourselves. Nonetheless, HTTP/1.1 beat it in all of our tests.

Something on the order of RED is needed to solve the "unfair" advantage that multiple connections may give, and we certainly strongly encourage the deployment of RED (or other active congestion control algorithms). This is clearly outside the scope of HTTP, and we believe it is as important as HTTP/1.1 (possibly more important). Until it is deployed, application developers have an incentive to misuse the network, which game theory says is an unstable situation.

But remember, with HTTP/1.0 you have many short connections; it is clear you are usually operating TCP nowhere close to its normal congestion avoidance behavior. Since the connections are being thrown away all the time, current TCP implementations are constantly searching for the bandwidth that the network can absorb, and are therefore normally either hurting the application's performance (sending too slowly) or contributing to congestion (sending too fast) and hurting the application's performance and the network at the same time (by dropping packets due to congestion).
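Since "pipelining" may be unfamiliar to some readers, here is a minimal sketch of the single-connection behaviour described above; the host and paths are hypothetical, this is not code from the paper, and a real client must parse each response's framing (Content-Length or chunked encoding) to split the byte stream back into individual replies. The thing to notice is that all requests are written before any response is read, which is what lets an implementation batch them into a few large packets.

    # Illustrative sketch only: HTTP/1.1 pipelining over a single TCP connection.
    import socket

    HOST = "www.example.com"          # hypothetical server
    PATHS = ["/", "/style.css", "/logo.png"]

    requests = []
    for i, path in enumerate(PATHS):
        last = (i == len(PATHS) - 1)
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {HOST}\r\n"
            + ("Connection: close\r\n" if last else "")
            + "\r\n"
        )

    with socket.create_connection((HOST, 80)) as sock:
        # Write every request back-to-back before reading anything, so the
        # requests coalesce into a few large packets instead of one
        # connection (and one slow start) per object.
        sock.sendall("".join(requests).encode("ascii"))

        # For the sketch, just drain the byte stream; responses arrive in
        # the same order as the requests.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    print(len(b"".join(chunks)), "bytes received over one connection")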
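And since RED comes up above, here is a simplified sketch of the drop decision at the heart of RED-style active queue management; the parameter values are made-up examples, and real RED also accounts for idle periods and the number of packets since the last drop. Because every arriving packet faces the same drop probability, a sender opening many connections sees proportionally more drops, which removes most of the payoff from doing so.

    # Simplified sketch of RED's (Random Early Detection) enqueue decision.
    import random

    class RedQueue:
        def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
            self.min_th = min_th      # below this average depth, never drop
            self.max_th = max_th      # above this average depth, always drop
            self.max_p = max_p        # drop probability reached at max_th
            self.weight = weight      # EWMA weight for the average queue size
            self.avg = 0.0
            self.queue = []

        def enqueue(self, packet):
            # Track a smoothed (EWMA) queue depth, not the instantaneous one.
            self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)

            if self.avg < self.min_th:
                self.queue.append(packet)     # uncongested: accept
                return True
            if self.avg >= self.max_th:
                return False                  # heavily congested: drop
            # In between, drop early with probability rising linearly with the
            # average queue depth.
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if random.random() < p:
                return False
            self.queue.append(packet)
            return True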
Congested links are a fact of life today in the Internet. While the state of the network isn't an HTTP issue, nor an issue that can be solved by HTTP, it needs to be understood when trying to make HTTP network friendly. I have some doubts that network congestion will be solved in the foreseeable future without QoS and/or new queueing methods and/or metered pricing.
HTTP (1.0) is a strong contributor to the congestion and high packet loss rates, due to its (mis)use of TCP and the fact that it makes up a majority of network traffic. We, and many others, therefore believe it is an HTTP issue. TCP was never designed for the way HTTP/1.0 is using it.
I do not see much research on the current real-world interactions and problems happening to common high-latency low-bandwidth moderate-loss clients WRT HTTP. Attempting to use existing research in the construction of a new protocol to help overcome HTTP's deficiencies without taking these into consideration could result in something that does not do what users and client vendors want and will not be used.
Which is why we went and did our HTTP/1.1 work; if HTTP/1.1's performance were a net loss for the user, HTTP/1.1 would never see deployment. All the intuition in the world is not a substitute for some real measurements, over real networks.
You can dismiss this all as rantings of a clueless lunatic if you want, and you may be quite correct to do so since, as I have stated, I have nothing to support my suspicions and have not done the research in the area that you have. However, from where I stand, I have concerns.
I don't find you a lunatic; you have many/most of the worries and concerns that we had ourselves before we took the data. We had lots of intuition that 1.1 ought to be able to do better than 1.0, but wondered about the badly congested case, where things are less clear....

We ran the tests over about 6 months, over networks changing sometimes on a by-minute basis, often with terrible congestion and packet loss, and latencies often much higher than in the final dataset published in the paper. The results were always of the same general form as we published (usually much higher elapsed times, but the same general results). We picked the set we published because we finally got a single run with all data taken in a consistent fashion without hand tweaking. This run happened (probably no coincidence) to be on a day when the network paths were behaving slightly better than usual, but we were also being greedy about getting as much data as we could, right up until the publication deadline. (It is much fun finding out about, and getting fixes for, various broken TCP implementations while the underlying network is in a state of distress.)

Had we seen circumstances under which the results were in 1.0's favor, we'd be much more worried (and would probably have set up another test case to explore it). As it is, I sleep pretty well at night on this one now, where 14 months ago I was similarly suspicious, as you are.

We'd be happy to see more data on the topic. I believe it is now in the court of the doubters to show us wrong, rather than us being obligated to show more data at this point, even if we were in a position to take more data over other network paths (which we aren't, both due to time and access to more network environments). We've gone out of our way to blaze the path for others to take more data, by packaging up our scripts and tools. We'd be happy to consult with people interested in getting more data in other network environments (so you won't have to suffer through as many of the data collection problems we did, usually due to buggy TCPs).

- Jim Gettys