I'm not sure all of this thread has gone to the NANOG list (which I do not subscribe to), but I think my response to Marc may be of general interest. - Jim
From: Marc Slemko <marcs@znep.com>
Date: Wed, 11 Feb 1998 12:28:03 -0700 (MST)
To: Henrik Frystyk Nielsen <frystyk@w3.org>
Cc: Vern Paxson <vern@ee.lbl.gov>, jg@w3.org
Subject: Re: nanog discussion of HTTP 1.1
On Wed, 11 Feb 1998, Henrik Frystyk Nielsen wrote:
At 21:15 2/8/98 -0700, Marc Slemko wrote:
FYI, your message hasn't shown up on the nanog list yet. There are filters in place to prevent people who aren't subscribed to nanog or nanog-post or something like that from posting.
Hi Marc,
Ah - I just saw my message go through to the list...
The problem is that I do not see any of your tests being representative of the "average" Internet user who is connected via a dialup modem and may have highish latency and perhaps 20% packet loss. This average user is, of course, just in my head.
I don't claim that we have hit anything near an average user or web site. Actually, I don't think it makes sense to talk about averages for anything on the Internet; you have to talk about distributions and different scenarios, etc. I think we made that clear in our paper, so your mileage may vary.
Of course you do; however, the distribution is so weighted that it isn't that unrealistic to talk about the "average" user as someone with a low speed, high latency connection, often with moderate packet loss. What has been presented in this paper is good work and is useful, but it does not go far enough to be able to draw conclusions about the majority of people using HTTP today, IMHO. You should be more in tune with this than I am, though, and probably are.
However, I think you skipped my point that the numbers we saw varied greatly as a function of time of day but the relative difference between HTTP/1.0 and HTTP/1.1 was pretty constant, even in situations with high data loss. I know for sure that we ran WAN tests when a link suffered 25% packet loss, and HTTP/1.1 still came out as the winner.
If the traceroutes referenced on the page are accurate, then the WAN link from LBL is really very close in Internet terms. More importantly, the low bandwidth, high latency link has a _very_ short path with little room for congestion and packet loss.
The WAN RTT varied from 75-120 ms. Hmm, so you say that we should have run PPP and WAN combined? That could have been interesting, but we had enough problems with Solaris TCP bugs, especially on PPP, that getting the data in the first place was quite a task.
Yes, you need more varied tests on low speed, high latency Internet connections. Your paper talks about the "transcontinental Internet", yet none of your published tests that I can see even went over the Internet; even the link from MIT to LBL only went over one backbone and really has low latency. I would not really call any of the links, even the PPP link, a high latency connection.
Dunno what the traceroute was the day we took the tests (I think that is squirrelled away in our data); this morning's traceroute shows that it goes to NYC via BBN, and then via Sprint to the west coast. Whether this counts as two backbones rather than one is a matter for the network operators to decide.
Although I have no data to support it, my gut feeling is that higher packet loss combined with high latency of what I perceive to be the "typical" dialup connection (which, right now, includes a vast majority of users) could have a significant impact on the results and result in multiple connections still being a big win.
Do you have any data to back that up?
As I said, no I don't. I don't have the facilities or the time to carry out the necessary research. In the absence of any evidence to the contrary for this particular situation, I am inclined to be doubtful.
You are welcome to your doubts; but if you feel this way, I would encourage you and your peers to do more tests; NANOG is clearly in a better position to do extensive testing than we are. All our testing technology is packaged up for distribution, so no one need start from scratch.
While it appears clear that multiple connections can have a large negative impact and should be avoided for numerous known and reasonably understood reasons, I have not seen any research convincing me that the typical dialup user would not still get better results with multiple connections. Such measurements are, of course, far more complex to carry out in a way that yields meaningful results that can be applied in general.
Well, our tests with HTTP/1.1 showed that it did save some bandwidth and was somewhat faster for a dialup user; our tests were over real modems, not simulated, for loading a page the first time. Bigger gains for dialup users will come from compression and stylesheet technology, for first-time cache loading. As the bandwidth went up, HTTP/1.1's gains went up. For cache validation, HTTP/1.1 blew away HTTP/1.0 over a dialup (and it is better for the server as well, reducing its load).
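To make the cache-validation point concrete, here is a minimal sketch (not code from the paper; the host and resource are hypothetical) of a conditional GET over a persistent connection. On revalidation an unchanged resource comes back as a bare 304 Not Modified, so almost nothing has to cross the modem link, and the persistent HTTP/1.1 connection avoids a new TCP setup for each check.

    # Illustrative sketch only: cache validation via a conditional GET.
    # Host and resource are hypothetical; assumes the server sends a
    # Last-Modified validator on the first response.
    import http.client

    conn = http.client.HTTPConnection("www.example.com")  # persistent HTTP/1.1 connection

    # First fetch: the server returns the full entity plus a validator.
    conn.request("GET", "/style.css")
    resp = conn.getresponse()
    resp.read()                                   # body must be drained before reuse
    last_modified = resp.getheader("Last-Modified")

    # Revalidation: send the validator back on the same connection; if the
    # cached copy is still good, only a tiny 304 (headers, no body) comes back.
    conn.request("GET", "/style.css",
                 headers={"If-Modified-Since": last_modified})
    resp = conn.getresponse()
    resp.read()                                   # empty for a 304
    print(resp.status)                            # 304 if unchanged, 200 otherwise
    conn.close()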
We don't pretend to say that HTTP/1.1 only needs one connection - that is not a realistic demand, as HTTP/1.1 pipelining doesn't provide the same functionality as HTTP/1.0 using multiple TCP connections. However, the paper does show that using multiple connections is a loser in the test cases that we ran. If you get proof of situations where this is not true, then I would love to hear about it!
You do say:
we believe HTTP/1.1 can perform well over a single connection
It is true that this does not say HTTP/1.1 requires only one connection; however, it strongly suggests it.
Obviously, multiple short flows are bad. The same number of long flows are better, but would really make you wonder what you are accomplishing. When you start dealing with congestion, the simple fact is that today a single client using a small percent of the total bandwidth on a congested link will get more data through using multiple simultaneous connections. This is similar to how a modified TCP stack that doesn't properly respect congestion control will have better performance, in the isolated case where it is the only one acting that way, over such links.
You don't end up with the same number of long flows... With HTTP/1.1, you end up with many fewer flows (typically 1), each considerably longer (though not as much longer as you might naively think, since we send many fewer, larger packets, which works against long packet trains). We also end up with half as many packets that can get lost (fewer small packets, ACK packets, and packets associated with open and close).

Some of NANOG may be interested in one other tidbit from our paper: it showed the mean packet size doubled in our tests (since we get rid of so many small packets and buffer requests into large packets). Not a bad way to help fill the large pipes you all are installing, and to make it much easier for the router vendors to build the routers needed to keep up with Internet growth.

As to getting more bandwidth with multiple TCP connections, this may be true, but we (deliberately) do not have data to this effect. Our data shows that we can get higher performance over a single connection than HTTP/1.0 does over 4 connections (the typical current implementation; despite the dialog box in Navigator, it is fixed at 4 connections, said the implementer of Navigator to us last spring). We deliberately did not want to encourage "non-network-friendly" HTTP/1.1 implementations (and did not test that situation). It was better left unsaid... :-). So we "gave" HTTP/1.0 4 simultaneous connections while only using 1 ourselves. Nonetheless, HTTP/1.1 beat it in all of our tests.

Something on the order of RED is needed to solve the "unfair" advantage that multiple connections may give, and we certainly strongly encourage the deployment of RED (or other active congestion control algorithms). This is clearly outside the scope of HTTP, and we believe it is as important as HTTP/1.1 (possibly more important). Until it is deployed, application developers have an incentive to misuse the network, which game theory says is an unstable situation.

But remember, with HTTP/1.0 you have many short connections; it is clear you are usually operating TCP nowhere close to its normal congestion avoidance behavior. Since the connections are being thrown away all the time, current TCP implementations are constantly searching for the bandwidth that the network can absorb, and are therefore normally either hurting the application's performance (sending too slowly) or contributing to congestion (sending too fast) and hurting the application's performance and the network at the same time (by dropping packets due to congestion).
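Since "pipelining" may be unfamiliar to some readers, here is a minimal sketch of the single-connection behaviour described above; the host and paths are hypothetical, this is not code from the paper, and a real client must parse each response's framing (Content-Length or chunked encoding) to split the byte stream back into individual replies. The thing to notice is that all requests are written before any response is read, which is what lets an implementation batch them into a few large packets.

    # Illustrative sketch only: HTTP/1.1 pipelining over a single TCP connection.
    import socket

    HOST = "www.example.com"          # hypothetical server
    PATHS = ["/", "/style.css", "/logo.png"]

    requests = []
    for i, path in enumerate(PATHS):
        last = (i == len(PATHS) - 1)
        requests.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {HOST}\r\n"
            + ("Connection: close\r\n" if last else "")
            + "\r\n"
        )

    with socket.create_connection((HOST, 80)) as sock:
        # Write every request back-to-back before reading anything, so the
        # requests coalesce into a few large packets instead of one
        # connection (and one slow start) per object.
        sock.sendall("".join(requests).encode("ascii"))

        # For the sketch, just drain the byte stream; responses arrive in
        # the same order as the requests.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    print(len(b"".join(chunks)), "bytes received over one connection")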
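And since RED comes up above, here is a simplified sketch of the drop decision at the heart of RED-style active queue management; the parameter values are made-up examples, and real RED also accounts for idle periods and the number of packets since the last drop. Because every arriving packet faces the same drop probability, a sender opening many connections sees proportionally more drops, which removes most of the payoff from doing so.

    # Simplified sketch of RED's (Random Early Detection) enqueue decision.
    import random

    class RedQueue:
        def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
            self.min_th = min_th      # below this average depth, never drop
            self.max_th = max_th      # above this average depth, always drop
            self.max_p = max_p        # drop probability reached at max_th
            self.weight = weight      # EWMA weight for the average queue size
            self.avg = 0.0
            self.queue = []

        def enqueue(self, packet):
            # Track a smoothed (EWMA) queue depth, not the instantaneous one.
            self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)

            if self.avg < self.min_th:
                self.queue.append(packet)     # uncongested: accept
                return True
            if self.avg >= self.max_th:
                return False                  # heavily congested: drop
            # In between, drop early with probability rising linearly with the
            # average queue depth.
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if random.random() < p:
                return False
            self.queue.append(packet)
            return True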
Congested links are a fact of life today in the Internet. While the state of the network isn't an HTTP issue, nor an issue that can be solved by HTTP, it needs to be understood when trying to make HTTP network friendly. I have some doubts that network congestion will be solved in the foreseeable future without QoS and/or new queueing methods and/or metered pricing.
HTTP (1.0) is a strong contributor to the congestion and high packet loss rates, due to its (mis)use of TCP and the fact that it makes up a majority of network traffic. We, and many others, therefore believe it is an HTTP issue. TCP was never designed for the way HTTP/1.0 is using it.
I do not see much research on the current real-world interactions and problems happening to common high-latency low-bandwidth moderate-loss clients WRT HTTP. Attempting to use existing research in the construction of a new protocol to help overcome HTTP's deficiencies without taking these into consideration could result in something that does not do what users and client vendors want and will not be used.
Which is why we went and did our HTTP/1.1 work; if HTTP/1.1's performance were a net loss for the user, HTTP/1.1 would never see deployment. All the intuition in the world is not a substitute for some real measurements, over real networks.
You can dismiss this all as rantings of a clueless lunatic if you want, and you may be quite correct to do so since, as I have stated, I have nothing to support my suspicions and have not done the research in the area that you have. However, from where I stand, I have concerns.
I don't find you a lunatic; you have many/most of the worries and concerns that we had ourselves before we took the data. We had lots of intuition that 1.1 ought to be able to do better than 1.0, but wondered about the badly congested case, where things are less clear....

We ran the tests over about 6 months, over networks changing sometimes on a by-minute basis, often with terrible congestion and packet loss, and latencies often much higher than in the final dataset published in the paper. The results were always of the same general form as we published (usually much higher elapsed times, but the same general results). We picked the set we published because we finally got a single run with all data taken in a consistent fashion without hand tweaking. This run happened (probably no coincidence) to be on a day when the network paths were behaving slightly better than usual, but we were also being greedy about getting as much data as we could, right up until the publication deadline. (It is much fun finding out about, and getting fixes for, various broken TCP implementations while the underlying network is in a state of distress.)

Had we seen circumstances under which the results were in 1.0's favor, we'd be much more worried (and would probably have set up another test case to explore it). As it is, I sleep pretty well at night on this one now, where 14 months ago I was similarly suspicious, as you are.

We'd be happy to see more data on the topic. I believe it is now in the court of the doubters to show us wrong, rather than us being obligated to show more data at this point, even if we were in a position to take more data over other network paths (which we aren't, both due to time and access to more network environments). We've gone out of our way to blaze the path for others to take more data, by packaging up our scripts and tools. We'd be happy to consult with people interested in getting more data in other network environments (so you won't have to suffer through as many of the data collection problems we did, usually due to buggy TCPs).

- Jim Gettys