I'm not sure all of this thread has gone to the NANOG list (which I do
not subscribe to), but I think my response to Marc may be of general interest.
- Jim
> From: Marc Slemko <marcs(a)znep.com>
> Date: Wed, 11 Feb 1998 12:28:03 -0700 (MST)
> To: Henrik Frystyk Nielsen <frystyk(a)w3.org>
> Cc: Vern Paxson <vern(a)ee.lbl.gov>, jg(a)w3.org
> Subject: Re: nanog discussion of HTTP 1.1
>
> On Wed, 11 Feb 1998, Henrik Frystyk Nielsen wrote:
>
> > At 21:15 2/8/98 -0700, Marc Slemko wrote:
> >
> > >FYI, your message hasn't shown up on the nanog list yet. There are
> > >filters in place to prevent people who aren't subscribed to nanog or
> > >nanog-post or something like that from posting.
> >
> > Hi Marc,
> >
> > Ah - I just saw my message go through to the list...
> >
> > >The problem is that I do not see any of your tests being representative
> > >of the "average" Internet user who is connected via a dialup modem
> > >and may have highish latency and perhaps 20% packet loss. This
> > >average user is, of course, just in my head.
> >
> > I don't claim that we have hit anything near an average user or web site.
> > Actually, I don't think it makes sense to talk about averages for anything
> > on the Internet; you have to talk about distributions, different
> > scenarios, etc. I think we made that clear in our paper, and so your
> > mileage may vary.
>
> Of course you do; however, the distribution is so weighted that it isn't
> that unrealistic to talk about the "average" user as someone with a
> low-speed, high-latency connection, often with moderate packet loss. What has
> been presented in this paper is good work, is useful, but it does not go
> far enough to be able to draw conclusions about the majority of people
> using HTTP today, IMHO. You should be more in tune with this than I am,
> though, and probably are.
>
> > However, I think you skipped my point that the numbers we saw varied
> > greatly as a function of time of day but the relative difference between
> > HTTP/1.0 and HTTP/1.1 was pretty constant, even in situations with high
> > data loss. I know for sure that we ran WAN tests when a link suffered 25%
> > packet loss, and HTTP/1.1 still came out as the winner.
> >
> > >If the traceroutes referenced on the page are accurate, then the
> > >WAN link from LBL is really very close in terms of the Internet.
> > >More importantly, the low-bandwidth, high-latency link has a
> > >_very_ short path with little room for congestion and packet loss.
> >
> > The WAN RTT varied from 75-120 ms. Hmm, so you say that we should have run
> > PPP and WAN combined? That could have been interesting but we have enough
> > problems with Solaris TCP bugs, especially on PPP, that getting the data in
> > the first place was quite a task.
>
> Yes, you need more varied tests on low speed high latency Internet
> connections. Your paper talks about the "transcontinental Internet" yet
> none of your published tests that I can see even went over the Internet;
> even the link from MIT to LBL only went over one backbone and really has
> low latency; I would not really call any of the links, even the PPP link,
> a high-latency connection.
>
Dunno what the traceroute was the day we took the tests (I think that
is squirrelled away in our data); this morning's traceroute shows it
going to NYC via BBN, and then over Sprint to the west coast. Whether this
counts as two backbones rather than one is a matter for the network
operators to decide.
>
> >
> > >Although I have no data to support it, my gut feeling is that higher
> > >packet loss combined with high latency of what I perceive to be
> > >the "typical" dialup connection (which, right now, includes a vast
> > >majority of users) could have a significant impact on the results and
> > >result in multiple connections still being a big win.
> >
> > Do you have any data to back that up?
>
> As I said, no I don't. I don't have the facilities or the time to carry
> out the necessary research. In the absence of any evidence to the
> contrary for this particular situation, I am inclined to be doubtful.
You are welcome to your doubts; but if you feel this way, I encourage you
and your peers to do more tests. NANOG is clearly in a better
position to do extensive testing than we are. All our testing technology
is packaged up for distribution, so no one need start from scratch.
>
> >
> > >While it appears clear that multiple connections can have a large
> > >negative impact and should be avoided for numerous known and
> > >reasonably understood reasons, I have not seen any research convincing
> > >me that the typical dialup user would not still get better results
> > >with multiple connections. Such measurements are, of course, far more
> > >complex to carry out and get something with meaning that can be applied
> > >in general.
Well, our tests with HTTP/1.1 showed that it did save some bandwidth,
and was somewhat faster for a dialup user; our tests were over real modems,
not simulated ones, for loading a page the first time.
Bigger gains for dialup users will come from compression and style sheet
technology, for first-time cache loading.
As the bandwidth went up, HTTP/1.1's gains went up.
For cache validation, HTTP/1.1 blew away HTTP/1.0 over a dialup
(and it is better for the server as well, reducing its load).
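To make the cache validation point concrete, here is a rough sketch in
Python of what HTTP/1.1 validation over a single persistent connection
looks like (this is only an illustration, not the test code we actually
used; the host name, paths, and date are made up): all the conditional
GETs go out back to back in one write, and the responses, ideally all
"304 Not Modified", come back in order on the same connection.

    import socket

    HOST = "www.example.org"            # hypothetical server
    PATHS = ["/", "/style.css", "/logo.gif"]

    def validation_request(path, last=False):
        hdrs = ["GET %s HTTP/1.1" % path,
                "Host: %s" % HOST,
                "If-Modified-Since: Sat, 07 Feb 1998 00:00:00 GMT"]
        if last:
            hdrs.append("Connection: close")   # let the server end the reply stream
        return ("\r\n".join(hdrs) + "\r\n\r\n").encode("ascii")

    # Build all the conditional GETs up front and send them in one write.
    pipeline = b"".join(validation_request(p, last=(i == len(PATHS) - 1))
                        for i, p in enumerate(PATHS))

    with socket.create_connection((HOST, 80)) as s:
        s.sendall(pipeline)                    # one packet train, not N connections
        reply = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            reply += chunk

    print(reply.count(b" 304 "), "of", len(PATHS), "cached copies are still fresh")

With HTTP/1.0, each of those validations would have paid for its own TCP
open, request, response, and close.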
> >
> > We don't pretend to say that HTTP/1.1 only needs one connection - that is
> > not a realistic demand as HTTP/1.1 pipelining doesn't provide the same
> > functionality as HTTP/1.0 using multiple TCP connections. However, the
> > paper does show that using multiple connections is a loser in the test
> > cases that we ran. If you get proof of situations where this is not true
> > then I would love to hear about it!
>
> You do say:
>
> we believe HTTP/1.1 can perform well over a single connection
>
> it is true that this does not say HTTP/1.1 requires only one connection,
> but it strongly suggests it.
>
> Obviously, multiple short flows are bad. The same number of long flows
> are better, but would really make you wonder what you are accomplishing.
> When you start dealing with congestion, the simple fact is that today a
> single client using a small percent of the total bandwidth on a congested
> link will get more data through using multiple simultaneous connections.
> This is similar to how a modified TCP stack that doesn't properly respect
> congestion control will have better performance, in the isolated case
> where it is the only one acting that way, over such links.
>
You don't end up with the same number of long flows... With HTTP/1.1,
you end up with many fewer flows (typically one), each considerably
longer (though not as much longer as you might naively think, since we
send many fewer, larger packets, which works against long packet trains).
We also end up with half as many packets that can get lost (fewer small
packets: ACKs, and the packets associated with connection open and close).
Some of NANOG may be interested in one other tidbit from our paper:
it showed the mean packet size doubled in our tests (since we get rid
of so many small packets, and buffer requests into large packets). Not
a bad way to help fill the large pipes you all are installing, and to
make it much easier for the router vendors to build the routers needed
to keep up with Internet growth.
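To see roughly where the savings in small packets come from, here is some
back-of-envelope arithmetic (the object count, sizes, and ACK ratio are
invented for illustration; these are not our measured numbers):

    MSS = 536                         # illustrative segment size
    OBJECTS = 10                      # small inline objects on a page
    OBJ_SIZE = 4096                   # ~4 KB each
    data_pkts = -(-OBJ_SIZE // MSS)   # ceiling division: segments per object

    # HTTP/1.0: per object, 3 packets of handshake, 1 request, the data
    # segments, roughly one ACK per two segments, and 4 packets to close.
    http10 = OBJECTS * (3 + 1 + data_pkts + data_pkts // 2 + 4)

    # HTTP/1.1 pipelined: one handshake and close for the whole page, the
    # requests batched into a couple of large packets, same data volume.
    total_data = OBJECTS * data_pkts
    http11 = 3 + 2 + total_data + total_data // 2 + 4

    print("HTTP/1.0, one connection per object:", http10, "packets")
    print("HTTP/1.1, one pipelined connection: ", http11, "packets")

The savings are all in the small packets (handshakes, requests, closes);
the data packets are the same either way, which is why the mean packet
size goes up.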
As to getting more bandwidth with multiple TCP connections, this may be
true, but we (deliberately) do not have data to that effect. Our data
shows that we can get higher performance over a single connection than
HTTP/1.0 does over 4 connections (the typical current implementation;
despite the dialog box in Navigator, it is fixed at 4 connections, as
the implementer of Navigator told us last spring). We deliberately did
not want to encourage "non-network-friendly" HTTP/1.1 implementations
(and did not test that situation). It was better left unsaid... :-).
So we "gave" HTTP/1.0 4 simultaneous connections while only using 1
ourselves. Nonetheless, HTTP/1.1 beat it in all of our tests.
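For reference, the HTTP/1.0 side of that comparison looks roughly like the
sketch below (my own illustration, not our test harness; the host and paths
are hypothetical): one short-lived connection per object, at most 4 in
flight at a time, each paying for its own handshake and close.

    import socket
    from concurrent.futures import ThreadPoolExecutor

    HOST = "www.example.org"                     # hypothetical server
    PATHS = ["/img%d.gif" % i for i in range(10)]

    def fetch_http10(path):
        # A fresh connection (and 3-way handshake) for every single object.
        req = ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, HOST)).encode("ascii")
        with socket.create_connection((HOST, 80)) as s:
            s.sendall(req)
            body = b""
            while True:
                chunk = s.recv(4096)
                if not chunk:                    # HTTP/1.0 server closes after the reply
                    break
                body += chunk
        return path, len(body)

    # Four workers mimic the fixed four-connection limit mentioned above.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for path, size in pool.map(fetch_http10, PATHS):
            print(path, size, "bytes")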
Something on the order of RED is needed to address the "unfair" advantage that
multiple connections may give, and we certainly strongly encourage the
deployment of RED (or other active congestion control algorithms). This is
clearly outside the scope of HTTP, and we believe it is as important as
(possibly more important than) HTTP/1.1. Until it is deployed, application
developers have an incentive to misuse the network, which game theory says
is an unstable situation.
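For those who haven't looked at it, the core of RED is simple enough to
sketch in a few lines (a simplified illustration with made-up parameters;
it omits details such as the count-based drop adjustment in the published
algorithm):

    import random

    W_Q = 0.002                   # EWMA weight for the average queue length
    MIN_TH, MAX_TH = 5, 15        # thresholds, in packets (illustrative)
    MAX_P = 0.1                   # maximum early-drop probability

    avg = 0.0                     # running average queue length

    def red_drop(queue_len):
        """Decide whether to drop an arriving packet, RED-style."""
        global avg
        avg = (1 - W_Q) * avg + W_Q * queue_len   # low-pass filter the queue length
        if avg < MIN_TH:
            return False                          # short queue: never drop early
        if avg >= MAX_TH:
            return True                           # persistently long queue: drop
        # In between, drop with probability rising linearly toward MAX_P, so
        # flows sending more packets see proportionally more early drops.
        p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p

Heavier senders see proportionally more early drops, which is the property
the paragraph above is appealing to.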
But remember, with HTTP/1.0 you have many short connections; it is clear
you are usually operating TCP nowhere close to its normal congestion
avoidance behavior. Since the connections are being thrown away all the
time, current TCP implementations are constantly searching for the bandwidth
that the network can absorb, and are therefore normally either hurting
the application's performance (sending too slowly), or contributing to
congestion (sending too fast) and hurting the application's performance and
the network at the same time (by dropping packets due to congestion).
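A bit of arithmetic makes the cost of throwing connections away concrete.
Under illustrative assumptions (536-byte MSS, initial window of one segment,
no losses), a brand-new connection spends several round trips in slow start
just to move a modest reply:

    MSS = 536                            # illustrative segment size

    def slow_start_rtts(nbytes):
        """Round trips spent in slow start to move nbytes on a fresh connection."""
        segments = -(-nbytes // MSS)     # ceiling division
        cwnd, rtts = 1, 0
        while segments > 0:
            segments -= cwnd             # send a window's worth, wait for the ACKs
            cwnd *= 2                    # congestion window doubles each round trip
            rtts += 1
        return rtts

    for size in (2000, 8000, 32000):
        print("%6d-byte reply: %d round trips on a brand-new connection"
              % (size, slow_start_rtts(size)))

A persistent, pipelined connection pays that start-up cost once, instead of
once per object.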
> Congested links are a fact of life today in the Internet. While the state
> of the network isn't a HTTP issue nor an issue that can be solved by HTTP,
> it needs to be understood when trying to make HTTP network friendly. I
> have some doubts that network congestion will be solved in the foreseeable
> future without QoS and/or new queueing methods and/or metered pricing.
HTTP (1.0) is a strong contributor to congestion and high packet loss
rates, due to its (mis)use of TCP and the fact that it makes up a majority
of network traffic. We, and many others, therefore believe it is an HTTP
issue. TCP was never designed for the way HTTP/1.0 is using it.
>
> I do not see much research on the current real-world interactions and
> problems happening to common high-latency low-bandwidth moderate-loss
> clients WRT HTTP. Attempting to use existing research in the construction
> of a new protocol to help overcome HTTP's deficiencies without taking
> these into consideration could result in something that does not do what
> users and client vendors want and will not be used.
That is why we went and did our HTTP/1.1 work; if HTTP/1.1's performance were a
net loss for the user, HTTP/1.1 would never see deployment. All the intuition
in the world is not a substitute for real measurements over real
networks.
>
> You can dismiss this all as rantings of a clueless lunatic if you want,
> and you may be quite correct to do so since, as I have stated, I have
> nothing to support my suspicions and have not done the research in the
> area that you have. However, from where I stand, I have concerns.
>
I don't find you a lunatic; you have many or most of the worries and concerns
that we had ourselves before we took the data. We had lots of intuition
that 1.1 ought to be able to do better than 1.0, but wondered about the
badly congested case, where things are less clear....
We ran the tests over about 6 months, over networks changing sometimes
on a minute-by-minute basis, often with terrible congestion and packet loss,
and latencies often much higher than in the final dataset published in the
paper. The results were always of the same general form as we published
(usually much higher elapsed times, but the same general results). We picked
the set we published because we finally got a single run with all data taken
in a consistent fashion without hand tweaking. That run happened (probably
no coincidence) to be on a day when the network paths were behaving slightly
better than usual, but we were also being greedy about getting as much data
as we could, up until the publication deadline. (It is much fun finding
out about, and getting fixes for, various broken TCP implementations
while the underlying network is in a state of distress.) Had we seen
circumstances under which the results were in 1.0's favor, we'd be much
more worried (and would probably have set up another test case to explore
them). As it is, I sleep pretty well at night on this one now, whereas 14
months ago I was as suspicious as you are.
We'd be happy to see more data on the topic. I believe it is now in the
court of the doubters to show us wrong, rather than us being obligated
to show more data at this point, even if we were in a position to take
more data under other network paths (which we aren't, due both to time
and to access to other network environments).
We've gone out of our way to blaze the path for others to take more data
by packaging up our scripts and tools. We'd be happy to consult with people
interested in getting more data under other network environments (so you
won't have to suffer through as many of the data collection problems we
did, usually due to buggy TCPs).
- Jim Gettys