Re: Questions about Internet Packet Losses

13 Jan 1997

      to add to tony's constructive response...

re: host stacks.  other improvements i've heard of that are
relevant to "reacting to http" are (1) applying slow-start-
and congestion-avoidance-type algorithms to the rate at
which new tcp connections are opened and (2) having kernels
share data in the protocol control blocks relevant to the
tcp algorithms across connections to the same host (or even
"network").  there are issues with both of these ideas, but
the point is that we can do things to react to the observed
behavior resulting from the extreme popularity of the web
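
Idea (2) above can be pictured as a small per-host cache that new connections consult before they start. This is only a sketch under stated assumptions: the class names, the cached fields (smoothed RTT and slow-start threshold) and the default values are hypothetical and are not taken from any particular kernel.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Hypothetical subset of control-block data worth sharing."""
    srtt_seconds: float
    ssthresh_segments: int


class SharedPathCache:
    """Keeps the most recent path estimates per destination host."""

    def __init__(self):
        self._by_host = {}

    def record(self, host, srtt_seconds, ssthresh_segments):
        # Called when a connection to `host` closes or updates its estimates.
        self._by_host[host] = PathState(srtt_seconds, ssthresh_segments)

    def initial_state(self, host, default_srtt=3.0, default_ssthresh=64):
        # A new connection starts from what earlier connections learned,
        # falling back to assumed defaults for hosts never seen before.
        return self._by_host.get(host, PathState(default_srtt, default_ssthresh))


if __name__ == "__main__":
    cache = SharedPathCache()
    cache.record("example.net", srtt_seconds=0.25, ssthresh_segments=8)
    print(cache.initial_state("example.net"))
    print(cache.initial_state("unseen.example"))
```

One of the issues alluded to above is visible even here: a stale or unrepresentative entry would bias every later connection to that host.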

re: merit's data.  while i have the highest respect for
their motivations in collecting this data, i *do* share
tony's concern about them doing anything more than reporting
raw numbers.  and with respect to the raw numbers, some questions
and observations are:

    (1) the measurements are made nap-to-nap.  no user traffic
        goes from nap-to-nap (because ISP1 doesn't play
        transit for traffic between ISP2 and ISP3)
    (2) what percentage of traffic between large providers
        goes across public exchanges anyway?
    (3) do the collection methodologies and analysis
        techniques share community consensus? is the former
        independently verifiable?

having said all of this, i will add that some of their data
is alarming.  but for the second time today on this list, i
will say that we need to be careful about chasing stats for
their own sake .. we need to look very carefully at *exactly*
what is being measured, how it's being collected and analyzed
and what the results actually mean

/jws
...
Bob,
You quote:
"Although some
   of the packet loss is inadvertent, a large percentage of the public
   exchange point connectivity problems reflect intentional engineering
   decisions by Internet service providers based on commercial settlement
   issues."
I think that this is an _extremely_ dangerous assertion on Merit's part.
As always, ascribing intent rather than raw data requires much more
justification which I have yet to see.
Are you familiar with this packet loss data from Merit?  If not, please see
...
above URL.
Am now...  ;-)
Is Merit's packet loss data (NetNow) credible?  Do packet losses in the
   Internet now average between 2% and 4% daily?  Are 30% packet losses
   common during peak periods?  Is there any evidence that Internet packet
   losses are trending up or down?
Yes, that matches my instinctive feel.  I don't have concrete data that
corroborates or disputes theirs, or that shows high packet loss rates
or trends.
Were Merit's data correct, what would be the impact of 30% packet losses on
   opening up TCP connections?
TCP is pretty damn robust.  Opening a connection is still likely to work.
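
As a rough illustration of why connection setup tends to survive, here is a minimal sketch. It assumes each packet is lost independently with the same probability in both directions, and that one successful SYN / SYN-ACK exchange is enough to open the connection. The function name and the retry counts are illustrative assumptions, not measurements or the behavior of any particular stack.

```python
def handshake_success_probability(loss_rate: float, syn_attempts: int) -> float:
    """Chance that at least one SYN / SYN-ACK round trip gets through.

    Assumes independent loss at `loss_rate` for every packet and that each
    attempt needs the SYN and its SYN-ACK to both arrive.
    """
    per_attempt = (1.0 - loss_rate) ** 2
    return 1.0 - (1.0 - per_attempt) ** syn_attempts


if __name__ == "__main__":
    for attempts in (1, 2, 4, 6):
        p = handshake_success_probability(0.30, attempts)
        print(f"{attempts} SYN attempt(s) at 30% loss: {p:.3f}")
```

Under those assumptions a single attempt at 30% loss succeeds a little under half the time, while four attempts succeed roughly 93% of the time. The cost shows up as retransmission timeouts rather than as outright failures.
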
On TCP throughput, say through a 28.8Kbps
   modem?  On Web throughput, since so many TCP connections are involved?
   On DNS look-ups?  On email transport?
As you might imagine, that kind of packet loss rate is 'highly detrimental'
to throughput.  If you're asking for concrete numbers, I don't have them,
but I've lived through them.  Qualitatively, it means that interactive
usage is intolerable.  On the bright side, email works just fine.
How big a problem is HTTP's opening of so many TCP connections?
It's a very significant problem.  It decreases the average packet size,
thereby making routers work much harder.  It generates many more packets
than necessary, and then closes down the connection after a very short
transfer.  In short, it's a horribly inefficient use of the net.
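
To put a rough number on that inefficiency, the sketch below counts packets for a page fetched over one connection per object versus a single reused connection. Everything in it is an assumption made for illustration: a 536-byte MSS, a request that fits in one segment, one delayed ACK per two data segments, and three packets to open plus four to close a connection. It is a back-of-the-envelope model, not a trace.

```python
import math

SETUP_PACKETS = 3      # SYN, SYN-ACK, ACK
TEARDOWN_PACKETS = 4   # FIN and ACK in each direction (assumed, no piggybacking)


def packets_for_fetch(response_bytes, mss=536, new_connection=True):
    """Return (total_packets, payload_free_packets) for one HTTP fetch."""
    data_segments = math.ceil(response_bytes / mss)
    request = 1                              # request assumed to fit one segment
    acks = math.ceil(data_segments / 2)      # assumed delayed ACK, one per two segments
    overhead = (SETUP_PACKETS + TEARDOWN_PACKETS) if new_connection else 0
    total = request + data_segments + acks + overhead
    return total, acks + overhead


if __name__ == "__main__":
    objects, size = 10, 2048
    per_object_total, _ = packets_for_fetch(size, new_connection=True)
    reused_total, _ = packets_for_fetch(size, new_connection=False)
    separate = objects * per_object_total
    persistent = SETUP_PACKETS + TEARDOWN_PACKETS + objects * reused_total
    print(f"one connection per object: {separate} packets")
    print(f"one reused connection:     {persistent} packets")
```

With ten 2 KB objects this model gives 140 packets when each object gets its own connection and 77 when one connection is reused, and most of the extra packets carry no payload at all.
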
Does TCP need to operate differently than it does now when confronted
   routinely with 30% packet losses and quarter-second transit delays?
Your question presumes that we should live with the 30% losses.  We should
not.  TCP does palatably well at surviving such brown-outs and I would not
suggest changes for that cause.  Note that there are other changes that I'd
like to see, such as more use of Path MTU Discovery and fixing HTTP which
are much more important.  The quarter-second transit delays fall into two
categories: one is transient delays, mostly caused by routing transients.
Obviously we need to minimize such transients.  The second is normal
propagation delay.  Using larger windows would aid that a great deal.  I
don't think that many TCP implementations allocate sufficient buffering
today to truly be efficient.
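
The point about windows follows from simple arithmetic: a sender can have at most one window of data in flight per round trip, so throughput is capped at window / RTT. The sketch below works that out. The 8 KB window is an assumed example value, and the function names are illustrative.

```python
def max_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: window required to keep the link full."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.25                      # quarter-second round trip, as in the question
    t3_bps = 44_736_000             # T3 line rate
    print(f"8 KB window ceiling: {max_throughput_bps(8192, rtt):,.0f} bit/s")
    print(f"window to fill a T3: {window_needed_bytes(t3_bps, rtt):,.0f} bytes")
```

With an 8 KB window and a quarter-second round trip the ceiling is 262,144 bit/s no matter how fast the links are. Filling a T3 at that delay would take a window of about 1.4 MB, which is beyond the 65,535-byte limit of the unscaled TCP window field.
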
What is the proper
   response of an IP-based protocol, like TCP, as packet losses climb?  Try
   harder or back off or what?
Back off.  Slow start is the accepted algorithm.  Trying harder only
increases congestion.
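
For readers who have not seen it, the back-off behavior can be sketched as a toy simulation of the congestion window, counted in segments per round trip. This is a simplified Tahoe-style model written for illustration: the initial threshold, the rounds in which a loss occurs, and the function name are assumptions, and real stacks differ in many details.

```python
def simulate_cwnd(loss_rounds, rounds, initial_ssthresh=16):
    """Return the congestion window (in segments) at the start of each round.

    Slow start doubles the window every round trip up to ssthresh,
    congestion avoidance then adds one segment per round trip, and a loss
    halves ssthresh and restarts the window at one segment.
    """
    cwnd, ssthresh = 1, initial_ssthresh
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            ssthresh = max(cwnd // 2, 2)   # remember half of what failed
            cwnd = 1                       # back off hard, then slow start again
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh) # slow start: exponential growth
        else:
            cwnd += 1                      # congestion avoidance: linear growth
    return history


if __name__ == "__main__":
    print(simulate_cwnd(loss_rounds={6}, rounds=12))
```

The sender reduces its load sharply after a loss and then probes upward again, which is the opposite of "trying harder".
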
How robust are various widespread TCP/IP
   implementations in the face of 30% packet loss and quarter-second transit
   delays?
I have yet to see a significant problem with robustness.
Is the Internet's sometimes bogging down due mainly to packet losses or
   busy servers or what, or does the Internet not bog down?
That depends on your definitions.  "The Internet" as a whole does not bog
down.  It's a modular system and there are localized problems and
congestion which result in poor service to a wide-ranging set of users.
The causes of the problems vary.  I've seen lots of really slow servers,
congested access links, unhappy routers, congested interconnects, etc.
Where is the data on packet losses experienced by traffic that does not go
   through public exchange points?
I suspect that you'd have to ask the parties involved in the private
exchange point.  I suspect that no such statistics are currently
kept, or if so, they would not be willing to disclose them.  Thus IPPM...
If 30% loss impacts are noticeable, what should be done to eliminate the
   losses or reduce their impacts on Web performance and reliability?
Ah...  Yes, loss rates of 30% are noticeable and painful.  There are
literally hundreds of things that can and should be done to improve
things.  Let's see, just off the top of my head:
- more private interconnects are necessary in the long term to scale the
  network.  We cannot have interconnects of infinite bandwidth as hardware
  simply doesn't scale as quickly as demand.  Thus, we need to invoke
  parallelism.  I think that this is already happening in a reasonable way.
- more bandwidth.  Of course, faster is better.  OC3 SONET technology is
  quickly becoming an obvious upgrade path from today's T3 backbones.
- better routers.  Current implementations have many shortcomings which
  aggravate instability.
- accurate reporting.  There seems to be a trend to find a problem and get
  everyone hyped up over it, far in excess of reality.  We spend time
  dealing with such issues rather than doing beneficial engineering.
- improved protocols.  We have an ongoing scalability problem with our
  routing protocols.
- fixed host stacks.  Using the full MTU would be a boon.  Recent data
  indicates that >40% of the packets out there are 40 bytes.
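
On the last item: a 40-byte packet is the size of an IPv4 header plus a TCP header with no options and no payload, and router work is largely per packet. The sketch below shows how the packet count and header overhead change with MTU. The two MTU values compared and the 1,000,000-byte transfer are assumptions chosen for illustration.

```python
import math

HEADER_BYTES = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Data packets required to carry `payload_bytes` at a given MTU."""
    mss = mtu - HEADER_BYTES
    return math.ceil(payload_bytes / mss)


def header_overhead(mtu: int) -> float:
    """Fraction of each full-sized packet spent on TCP/IP headers."""
    return HEADER_BYTES / mtu


if __name__ == "__main__":
    for mtu in (576, 1500):
        print(f"MTU {mtu}: {packets_needed(1_000_000, mtu)} packets, "
              f"{header_overhead(mtu):.1%} header overhead")
```

Under these assumptions, moving from a 576-byte to a 1500-byte MTU cuts the packet count for a 1,000,000-byte transfer from 1866 to 685.
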
Are packet losses due mainly to transient queue buffer overflows of user
   traffic or to discards by overburdened routing processors or something
   else?
"mainly" is a dangerous quantifier given that there's no hard data.  My
intuition says that sheer congestion is the most serious problem, followed
closely by router implementation.
What does Merit mean when they say that some of these losses are
   intentional because of settlement issues?
I think you really need to ask Merit that.  I could find no justification
for that on their Web page.
Are ISPs cooperating
   intelligently in the carriage of Internet traffic, or are ISPs competing
   destructively, to the detriment of them and their customers?
Ummm...  I see them cooperating.  "intelligently" is in the eye of the
beholder.  Certainly there are some who are being anti-social.
Tony

John W. Stewart III