Re: Ungodly packet loss rates

Okay. I'll hopefully solve all your problems in one paragraph, versus your bible of well-written critique of the Internet: You get what you pay for or you pay what you get for. Since either you or Cisco chose a substandard provider, obviously one of you needs to seek legal action immediately. I can, of course for a fee, recommend a good attorney. If you had realized that by paying less you should obviously get less, as most people understand, then you wouldn't have bitched and moaned to the entire NANOG community. Of course, being the good Samaritan that I am, I can end your troubles with a dial-up account on our network, on which I make no guarantees of performance, and which would of course be free. The only agreement I attach to this deal is that you take this thread off-line and to the courts.
Rob
Exodus Communications Inc.
[Quotes mercilessly reordered]
I'm amazed at the attitude I'm getting from this list. You are, collectively, in the business of running a large network. I am a paying user of that network. The network is not delivering appropriate performance, as measured most importantly by the time I and others spend waiting around for characters to echo, Web pages to display, and whatnot. This time is long far more often than it's historically been, and far more often than a reasonable person might expect.
Although my immediate complaint is prompted by a specific incident, such incidents are so common as to constitute a continuing, pervasive pattern. Because of the structure of the network, this pattern affects customers of all providers, not just the immediately responsible ones. Although many problems do exist at user sites, it's clear that many problems also exist within the network itself.
So I complain, and suggest that you should look into reducing network growth to a level you can really manage, and setting standards of performance for yourselves and one another.
Do you say "Yes, that's a good idea"? No. Do you say "No, that won't work because <x>"? No. Do you say "We think we have a handle on the problem, and you can expect it to go away soon"? No. Do you say "We don't think we can make the problem go away no matter what we do, so we'll try to do a better job of explaining the expected level of service to new users (and to old users who are losing the level of service they've been used to)"? No. Do you refer me to some existing document, prepared either by my own ISP or by NANOG or some other group, describing the quality of service I'm to expect, and point out to me that what I'm asking for is more than it guarantees? No.
As far as I can tell, nobody's acknowledged that there's a problem. You really seem to believe that the quality of service provided over the Internet as a whole, as opposed to within any particular provider's network, is acceptable.
What I hear is "Quit whining", or in one case, "Quit whining, idiot".
mrbill> No, I believe the person who recommended that suggested you shop around
mrbill> for the best provider *to start out with*, not bitch, whine, and moan
mrbill> when your connection is not 100% perfect through the one you
mrbill> currently have.
I think there's a big difference between complaining about a connection "not [being] 100% perfect" and complaining about a huge packet loss rate making a path (and indeed all paths between me and at least one very major network) nearly unusable. There's even more of a difference between complaining about a single incident of such a loss rate and complaining about a pervasive pattern of such incidents.
Are you saying that I should accept bursty periods of 10-second character echo times, continuing for 4 or 5 days? I'm sorry, but that sort of congestion inside a network backbone demonstrates gross overload. It takes a lot to drive a network to that point in the presence of TCP congestion avoidance, even with lots of short connections.
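A rough sketch of why a double-digit loss rate can show up as multi-second echo delays: when a segment is lost, TCP waits for a retransmission timeout (RTO), and each further loss of the same segment doubles the wait. The model below assumes independent losses with probability `p`, a fixed initial `rto` of 1 second, and a hypothetical retry cap `max_retries`; real stacks estimate the RTO from measured round-trip times, and a keystroke echo needs both directions to get through, so the effective `p` is higher than the one-way loss. The function names are illustrative only.

```python
def expected_backoff_delay(p, rto=1.0, max_retries=12):
    """Expected extra wait (seconds) before one segment gets through.

    Assumes independent losses with probability p, a first timeout of
    `rto` seconds, and a timeout that doubles after every further loss.
    A segment lost k times waits rto * (1 + 2 + ... + 2**(k-1)),
    i.e. rto * (2**k - 1). Attempts beyond max_retries are treated as a
    dropped connection and left out of the sum.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    total = 0.0
    for k in range(max_retries + 1):
        prob_k = (p ** k) * (1.0 - p)  # k losses in a row, then a success
        total += prob_k * rto * (2 ** k - 1)
    return total


def prob_wait_at_least(p, k):
    """Probability that a segment is lost at least k times in a row."""
    return p ** k


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.2, 0.4):
        # With rto = 1 s, three or more losses means waiting 7 s or more.
        print(f"p={p:.2f}: mean extra wait {expected_backoff_delay(p):.3f} s, "
              f"P(wait >= 7 s) = {prob_wait_at_least(p, 3):.4f}")
```

Under those assumptions the mean extra wait is about a third of a second at p = 0.2 and about 1.8 seconds at p = 0.4, but the tail is what a TELNET user feels: at p = 0.2 roughly one segment in 125 waits 7 seconds or more.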
Are you suggesting that I find a provider that never gives me a path through a congested network? I'm sorry, but given the number of congested networks out there, and how quickly the congestion moves around, and the plain fact that some sites are connected *via* congested networks, I don't believe that's possible.
I also think it's unreasonable to expect users to choose their providers based on which sites they're communicating with. Users should be able to expect acceptable levels of service to any site (yes, provided that site itself has adequate capacity). ISPs are in the business of providing usable service, not providing the service it's convenient for them to provide.
Take my own case. I didn't get this connection to let me talk to Cisco; I already had facilities for that. I got it for general access to various random stuff on the Net. Unless it gives me usable connectivity to the *whole* Net (including Cisco, but only incidentally), it's not doing what I bought it for... and it's not doing what the people I bought it from sell it for, either.
If I were going to put really heavy demands on the network, I could see being told I needed to connect somewhere close to my target. That's not what's going on here; we're talking about a TELNET connection. At a more basic level, if the Net can't be made usable for at least Web access from almost anywhere to almost anywhere, then what's the point of building it at all?
mrbill> I don't see where a temporary network problem such as you describe
mrbill> should result in a message being sent to the various ISPs and the
mrbill> NANOG list.
You misunderstand my point; the message wasn't really about the immediate problem; that was merely an example.
A problem with my own stuff caused me to really rely on services I've been paying for for a long time. When I started using those services for serious interactive work, they failed me, and they continued to fail me for several days. I was reminded of how bad things on the Net at large really were, and motivated to investigate what was going on in this particular case.
Having established to a reasonable degree of certainty that the problem isn't on my end and isn't on Cisco's end, and that the problem has gone on for several days, I feel justified in complaining to the ISPs involved.
As far as the question of the problem being temporary, well, yes, it's temporary. Everything is temporary. You and I are decidedly temporary. If "temporary" in this case were 10 seconds, I'd agree with you. 4 days is, however, a ridiculously long lifetime for a double-digit drop rate in a major network backbone. When was the last time you saw a significant part of the telephone network become almost unusable for 4 days?
Having seen similar problems all too often in the past, and having heard complaints about such problems from other users, I feel justified in recommending that an industry group, presumably concerned with quality of service, consider the matter.
The issue isn't this particular failure. The issue is the industry's inability to manage the network appropriately. If this were an isolated incident, it would be acceptable, if annoying. The fact is, however, that some large part of the network is either down or degraded almost all the time. I believe that the reason for that is that the network is being grown at a faster rate than the industry can coordinate properly.
Go Web surfing. Count the number of sites you can't reach when you *know* that the problem isn't local overloading at either end of the connection. Count the number of stalls you get when you're loading the pages that *do* work. Do you really consider that an appropriate level of service? Now multiply the annoyance factor by 10, and you'll get an idea of what it's like for interactive users.
mrbill> My suggestion: quit bitching and wait for your FR connection to be
mrbill> restored,
I beg your pardon, but I think I'm entitled to "bitch" whenever a service I'm paying for isn't being delivered in a satisfactory way. I assure you that I'd expect my provider to complain very loudly if I stopped paying my bills on time.
mrbill> or reconfigure your current equipment (if you work at Cisco,
mrbill> it shouldn't be TOO hard).
Regardless of how hard it may or may not be, I shouldn't have to do it. I've paid for a service that *should*, if it were working properly, save me from having to do it. Your opinion as to whether I really need that service is irrelevant... and amazingly arrogant.
In this case, I'd have to either take down network services that some friends of mine depend on, or come up with another computer. Doing one or the other is the only way I can maintain the air gap between Cisco and the Internet.
Now, on technical issues (and my mistakes thereon):
mikedoug> How in the hell can you expect a 100% success rate over (1) a slow
mikedoug> modem link, and (2) to *ANY* site on the world. Hell, do you have
mikedoug> any *CLUE*--I know you don't--how many sites on the net have servers
mikedoug> behind 28.8 links??? How great a packet loss do you expect when you
mikedoug> access them?? Is that provider dependent??? *ANY* site--really?
Sigh. I have to admit that my language was wrong. When I said "any point" (I did not say "any site"), I meant "the edge of any ISP's network". Any IP path with a double-digit loss rate (or, generally, any single link with, say, a 5 percent loss rate) is grossly overloaded, but I can only hold ISPs responsible for capacity planning out to the edges of their own networks. In the present case, most of the loss is being introduced in the middle of Alternet's DS3 backbone.
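The step from a lossy link to a double-digit path is compounding. Assuming losses on different links are independent (an idealization), a packet gets through only if every link delivers it, so path loss is one minus the product of the per-link delivery rates. A minimal sketch; the function name is hypothetical:

```python
from functools import reduce


def path_loss(link_losses):
    """Loss rate of a path, given the loss rate of each link along it.

    Assumes losses on different links are independent: a packet is
    delivered only if every link delivers it.
    """
    delivered = reduce(lambda acc, p: acc * (1.0 - p), link_losses, 1.0)
    return 1.0 - delivered


if __name__ == "__main__":
    # Three hypothetical links at 5 percent each.
    print(f"{path_loss([0.05, 0.05, 0.05]):.4f}")  # 0.1426
    # A ping round trip is a two-leg path: the echo request and the reply.
    print(f"{path_loss([0.20, 0.20]):.2f}")        # 0.36
```

The second case is one reason ping numbers can run well above a one-way TCP retransmission rate: a ping counts as lost if either the request or the reply is dropped.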
On a well-managed network, I can and should expect a loss rate just slightly above the rate intrinsic to TCP's flow control, given that the data traffic is overwhelmingly TCP. I don't know what the intrinsic rate is, but--
1. I'd be pretty confident in guessing it's less than 5 percent.
2. It's a *lot* less than 40 percent. It's a lot less than 20 percent.
3. It doesn't create gross degradation of interactive service.
As I realized shortly after I sent my message, 1 percent really *isn't* a reasonable expectation for a TCP/IP loss rate, since TCP uses packet loss as a flow-control feedback mechanism, and will force the loss rate along any path above 1 percent. My only excuse for this error is that the networks I used to work with were either run in uncongested mode (not as uncommon as you might think), or were not pure IP networks. At the time, most hosts had even worse congestion response than they have now, and you had to overengineer the network if you wanted it to work right.
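One later way to put a number on the loss rate "intrinsic" to TCP is the steady-state approximation of Mathis, Semke, Mahdavi and Ott (1997, after this thread): throughput is approximately (MSS / RTT) × C / sqrt(p), with C about sqrt(3/2). Rearranged for p, it suggests that the loss TCP settles at depends on each flow's share of the bottleneck, so a single intrinsic rate exists only for a given mix of flows. The sketch assumes that model, a 536-byte MSS, a 100 ms RTT, and a DS3's 44.736 Mbit/s split evenly among N long-lived flows; all of these numbers are illustrative assumptions, and the function name is hypothetical.

```python
import math

MATHIS_C = math.sqrt(3.0 / 2.0)  # constant for periodic loss, no delayed ACKs


def induced_loss_rate(share_bps, mss_bytes=536, rtt_s=0.1):
    """Loss rate at which a long-lived TCP flow settles to `share_bps`.

    Inverts the Mathis et al. approximation
        throughput = (MSS / RTT) * C / sqrt(p)
    to  p = (C * MSS / (RTT * throughput)) ** 2.
    The model is meant for small p; large values mean each flow's window
    would be only a segment or two, i.e. the link is badly overloaded.
    """
    mss_bits = mss_bytes * 8
    return (MATHIS_C * mss_bits / (rtt_s * share_bps)) ** 2


if __name__ == "__main__":
    ds3_bps = 44.736e6
    for flows in (10, 100, 300):
        p = induced_loss_rate(ds3_bps / flows)
        print(f"{flows:4d} flows: p = {p:.4%}")
```

Under these assumptions, 10 flows sharing the DS3 settle near 0.014 percent loss, 100 flows near 1.4 percent, and 300 flows around 12 percent, where the small-loss model is no longer trustworthy.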
As for the rest...
jbash> > It doesn't look to me as though the loss is being introduced at the
jbash> > NAPS. If you look at the trace, you'll see that significant loss
jbash> > starts to appear within Alternet, well after MAE-west. It looks as
jbash> > though more loss appears inside BBN's network, although it's difficult
jbash> > to tell because of the already large Alternet loss.
mrbill> Traceroute is *not* a good tool to diagnose packet loss problems.
mrbill> I've had traceroute tell me that a packet loss problem was between
mrbill> two points 3-4 hops "out", when actually it was with the T-1 at
mrbill> my site, the "first hop" in the trace.
emv> Traceroute is less useful a tool than you think in the face of congestive
emv> loss. Routers can and do selectively prioritize the queueing packets
emv> based on their type, and if I were a network operator I would have no
emv> hesitation about dropping traceroute or ping packets to low priority.
Unfortunately, traceroute is what's available. Ed's point about priority queues (and fair queues, and whatever else is out there this week) is a good one, and I withdraw the assertion that the loss rate is 40 percent; obviously I can't really trust the absolute loss rates I get from ping and traceroute. Again, I plead rustiness (or maybe complete obsolescence)... my real-world experience predates useful priority queueing.
The TCP connection itself reports about a 20 percent retransmission rate in one direction, and that may be a more reasonable estimate of the actual loss than the 40 percent I get from ping and traceroute.
Given enough probes, however, traceroute should still show discontinuities in packet loss at congestion points. I think I was doing enough probes... 25 per hop, and the trace I sent wasn't the only one I took.
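The caveats about traceroute suggest a rule of thumb for reading per-hop loss: a real congestion point should raise loss at that hop and at every hop after it, while a router that merely deprioritizes or rate-limits its own ICMP replies shows loss at that hop alone. The sketch below applies that rule to per-hop counts of unanswered probes, using a binomial standard error to ask whether a jump is larger than 25 probes can resolve. The function name, the 2-standard-error threshold, and the sample counts are all hypothetical.

```python
import math


def loss_discontinuities(lost, probes=25, z=2.0):
    """Hops (1-based) where probe loss jumps and stays up downstream.

    lost[i] is how many of `probes` probes to hop i+1 went unanswered.
    A hop is flagged only if its loss exceeds the previous hop's by more
    than z binomial standard errors AND every hop from there on stays
    that far above the previous hop; loss confined to a single hop is
    more likely a router deprioritizing its own replies than congestion.
    """
    rates = [n / probes for n in lost]
    flagged = []
    for i in range(1, len(rates)):
        prev, cur = rates[i - 1], rates[i]
        # Standard error of the difference between two independent estimates.
        se = math.sqrt((prev * (1 - prev) + cur * (1 - cur)) / probes)
        margin = z * se
        if cur - prev > margin and min(rates[i:]) - prev > margin:
            flagged.append(i + 1)
    return flagged


if __name__ == "__main__":
    # Hypothetical unanswered-probe counts, 25 probes per hop.
    counts = [0, 0, 1, 0, 6, 5, 7, 9, 8]
    print(loss_discontinuities(counts))        # [5]
    # A lone lossy hop that later hops contradict is not flagged.
    print(loss_discontinuities([0, 8, 0, 0]))  # []
```

With 25 probes, the standard error of a single 20 percent loss estimate is 0.08, so small differences between neighboring hops are within noise; more probes per hop narrow that.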
In fact, I now have confirmation that most, or maybe all, of my loss is (or maybe was... loss is down quite a bit as I write this) being caused by a major overload on a link inside Alternet's backbone. Apparently some kind of routing reconfiguration (possibly by a third party) at MAE-west dumped a lot of traffic into an Alternet DS3 that wasn't overloaded before.
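As a rough lower bound on what a sudden routing shift does to a link: if more traffic is offered than the link can carry and the senders do not back off, the excess has to be discarded. A minimal sketch, assuming a DS3's nominal 44.736 Mbit/s line rate (usable payload is a little lower) and a hypothetical offered load; TCP senders would in fact slow down, so this describes only the non-responsive or short-connection case.

```python
DS3_BPS = 44.736e6  # nominal DS3 line rate; usable payload is somewhat lower


def min_drop_fraction(offered_bps, capacity_bps=DS3_BPS):
    """Fraction of offered traffic a link must discard if nobody backs off."""
    if offered_bps <= capacity_bps:
        return 0.0
    return 1.0 - capacity_bps / offered_bps


if __name__ == "__main__":
    # Hypothetical: a reroute adds 15 Mbit/s to a DS3 already carrying 40.
    print(f"{min_drop_fraction(55e6):.1%}")
```

Under that hypothetical load the DS3 must drop at least about 18.7 percent of what it is offered.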
None of which is really relevant to the basic problem, which is that this service level makes interactive sessions nearly unusable, and even Web access a bit painful... regardless of where the drops happen.
-- J. Bashinski

Hello,
> I'm amazed at the attitude I'm getting from this list. You are, collectively, in the business of running a large network. I am a paying user of that network. The network is not delivering appropriate performance, as measured most importantly by the time I and others spend waiting around for characters to echo, Web pages to display, and whatnot. This time is long far more often than it's historically been, and far more often than a reasonable person might expect.
Good paragraph... Unfortunately, I believe you have missed something of fundamental importance below...
> So I complain, and suggest that you should look into reducing network growth to a level you can really manage, and setting standards of performance for yourselves and one another.
It's all in the economics. Profit, supply, and demand in the free market have dictated that where it is, is where it should be.
> Do you say "Yes, that's a good idea"? No.
Because I/we don't think it is. The applications and users feed the network. Demand drives capacity. Ideally, the capacity planning is done maturely enough that the network is built to meet the demand. But this demand is rather large. People don't foresee it properly. Even when they do, they ignore it, hoping to increase their stretched profit margins.
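To put the capacity-planning point in numbers: under steady compound growth, the time left before a link crosses an upgrade threshold grows only with the logarithm of the headroom. A minimal sketch, assuming a constant monthly growth rate; the 50 percent starting utilization, 10 percent monthly growth, and 80 percent threshold are hypothetical, as is the function name.

```python
import math


def months_until(util_now, util_limit, monthly_growth):
    """Months until utilization grows from util_now to util_limit.

    Assumes traffic compounds at a constant monthly_growth rate (0.10 means
    10 percent per month): util_now * (1 + g) ** n = util_limit.
    """
    if util_now >= util_limit:
        return 0.0
    return math.log(util_limit / util_now) / math.log1p(monthly_growth)


if __name__ == "__main__":
    print(f"{months_until(0.5, 0.8, 0.10):.1f} months")
```

Under those assumptions the link has under five months of headroom, and doubling its capacity at the threshold (dropping utilization from 80 to 40 percent) buys only about seven more months at the same growth rate.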
> Do you say "No, that won't work because <x>"?
Erm, maybe I missed something, but I don't see your just-add-water solution to the problem. Are you suggesting we limit the number of customers that have access to the Internet?
> No. Do you say "We think we have a handle on the problem, and you can expect it to go away soon"? No.
I hope not. The problem you are seeing is an example of one or two (or maybe more) poorly connected AS entities. With providers like that, the rest of the net looks really good. I don't accept your premise that the majority of the net is broked.
> Do you say "We don't think we can make the problem go away no matter what we do, so we'll try to do a better job of explaining the expected level of service to new users (and to old users who are losing the level of service they've been used to)"? No.
No, because the Internet exists just like the global ecology. If the fools in the neighboring state want to take everyone's garbage, we'd love to give it to them. Err, my analogy is false, but the point is that the "Internet" isn't broked. Certain parts of the Internet are, and it's because they choose to be. Consider a sociological analogy. Do we say that the United States is broked because we have 4% unemployment? Because we have X murders a year?
> Do you refer me to some existing document, prepared either by my own ISP or by NANOG or some other group, describing the quality of service I'm to expect, and point out to me that what I'm asking for is more than it guarantees? No.
Actually, I will. Your contract. What, it doesn't say anything on there about quality of service? Well, why not?
> As far as I can tell, nobody's acknowledged that there's a problem. You really seem to believe that the quality of service provided over the Internet as a whole, as opposed to within any particular provider's network, is acceptable.
Indeed. I look forward to your definition of the Internet. Contributions to benchmarking the performance of the "internet" can be directed to the IPPM mailing list... [...]
> I think there's a big difference between complaining about a connection "not [being] 100% perfect" and complaining about a huge packet loss rate making a path (and indeed all paths between me and at least one very major network) nearly unusable. There's even more of a difference between complaining about a single incident of such a loss rate and complaining about a pervasive pattern of such incidents.
But, you see, this is not our problem. It is the problem of whoever is contributing the loss. There exist paths that do not have this problem. [...]
> Having seen similar problems all too often in the past, and having heard complaints about such problems from other users, I feel justified in recommending that an industry group, presumably concerned with quality of service, consider the matter.
Well, it is an election year. Perhaps you can get the whole thing regulated by a federal oversight committee. That will really solve all the problems. Or you could get another provider, and encourage the sites connected to poor providers to change as well. It's a free market. It's not designed to provide for the common welfare. It's designed to reward the quick thinking and resourceful. It's designed to endorse Darwinism. In order to have networks succeed, you must have networks fail. Networks that fail will lose customers and decrease their potential to attract new ones. It's all well documented in many economics textbooks.
-alan

On Mon, 21 Oct 1996 jbash@velvet.com wrote:
> I'm amazed at the attitude I'm getting from this list. You are, collectively, in the business of running a large network. I am a paying user of that network. The network is not delivering appropriate performance, as measured most importantly by the time I and others spend waiting around [...]
I have also been experiencing lately a large increase in packet loss to various spots on the net. This has prompted me to start tracking down these problems, using both traceroute and ping to determine which link seems to be causing the problem, and then emailing the network responsible a request. So far, I've had one reply, which in turn led to the discovery of a circular routing problem, solving my packet loss through that link. Looking for opinions here: Do I have the right, as a citizen of the internet, to phone up the NOC of another major provider to solve packet loss through their routers?
-- Billy Biggs
Ottawa, Canada
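A circular routing problem is something traceroute can reveal directly: in a forwarding loop the same router addresses reappear, typically alternating, until the TTL runs out. A minimal sketch that scans the hop addresses for a repeat; the function name and the addresses (taken from the documentation ranges 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24) are hypothetical, and a repeat is only a strong hint, not proof.

```python
def find_routing_loop(hops):
    """Return (first_hop, repeat_hop) for the first address seen twice, else None.

    hops lists the responding router address at each traceroute hop, hop 1
    first; None stands for a hop that timed out ("*"). In a forwarding loop
    the same routers answer again and again until the TTL runs out, so a
    repeated address is a strong hint of a loop, though not proof of one.
    """
    seen = {}
    for hop_number, addr in enumerate(hops, start=1):
        if addr is None:
            continue
        if addr in seen:
            return seen[addr], hop_number
        seen[addr] = hop_number
    return None


if __name__ == "__main__":
    # Hypothetical trace that settles into a two-router loop.
    trace = ["192.0.2.1", "198.51.100.9", "203.0.113.5", "198.51.100.10",
             "203.0.113.5", "198.51.100.10"]
    print(find_routing_loop(trace))  # (3, 5)
```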

On Mon, 21 Oct 1996, Billy Biggs wrote:
> Looking for opinions here: Do I have the right, as a citizen of the internet, to phone up the NOC of another major provider to solve packet loss through their routers?
You should talk to your provider, who should try to resolve the problem with the provider in question. This is the only scalable way for this to work. Having various folks call up NOCs for various problems is pretty close to a nightmare scenario for someone who is concerned with NOC staffing and response time. The fact that this approach may currently not work well means that we should fix the inter-provider cooperation, rather than routing around it by having end users call NOCs of providers. If this happened, I wouldn't be surprised to see NOC phone numbers becoming semi-secret or requiring authentication.
-dorian

On Mon, 21 Oct 1996, Billy Biggs wrote:
>> Looking for opinions here: Do I have the right, as a citizen of the internet, to phone up the NOC of another major provider to solve packet loss through their routers?
> You should talk to your provider, who should try to resolve the problem with the provider in question.
Exactly. Hierarchy is mandatory at this level, both for the physical network and for the logical structure of customers, ISPs, and NSPs.
> This is the only scalable way for this to work. Having various folks call up NOCs for various problems is pretty close to a nightmare scenario for someone who is concerned with NOC staffing and response time.
It simply means that social filters become required, in much the same way that packet filters, and route filters are now becoming standard on any well-connected network.
The fact that this approach may currently not work well means that we should fix the inter-provider cooperation, rather than routing around it by having end users call NOCs of providers.
Agreed. End users should never be calling outside their ISP; it is NOT their place to solve problems for their ISP, for multiple reasons. One of them: if any of our customers worked directly with MCI, for example, to solve a problem they saw that we were not yet aware of, and they and MCI solved the issue without involving us, we'd never realize that a problem had been developing. We'd much rather have our customers talk to US if they feel a problem is developing, let us analyze the issue, and contact the parties we feel are most appropriate. That way, we know what changes were made, and why, and we have a history to refer to later. If we simply show up and find that a peering session has been turned off, or that a specific IP block has been re-advertised through a secondary link to try to shift routing, and the change was a private effort between an individual and another ISP, we have no background and no history of why it might have been made. Communication THROUGH the hierarchy is essential if changes are to be maintained and supported. Otherwise, we'll all be working against each other, trying to second-guess and bypass each other's efforts.
If this happened, I wouldn't be surprised to see NOC phone numbers becoming semi-secret or requiring authentication.
Ask for the customer number, or for the contact info/callback number if it's another ISP. We've found that, in the case of the semi-clueless end user, asking for a customer number is enough to get them to confess that they're not really one of OUR customers. Those who try to bluff their way through by pretending to call on behalf of a neighboring ISP's NOC usually don't make it past "can I have your callback number and contact info?" This is what I mean by "social filters"; it's rather like an access list, only not quite so strict. By imposing enough of a barrier that only those who know what they're doing will pass the test, you limit the random interruptions and noise that would otherwise bog you down and cause trouble. *grin* It's almost like having to give your driver's license number to prove you're an adult. Maybe we need NANOG to issue "Clue Factor License Numbers" to network engineers and NOC employees that we can read off to each other when we call as one ISP/NSP to another... :-) Well, enough random blathering; back to the grind.
-dorian
Matt Petach doing his best to NOT respond to any of the original rantings...

I've suggested that people calling NOCs be forced to listen to a one-minute recording explaining current outages and who should be calling before they get to a real person.
This is the only scalable way for this to work. Having various folks call up NOCs for various problem is pretty close to a nightmare scenario for someone

On Mon, 21 Oct 1996, Billy Biggs wrote:
==>Looking for opinions here: Do I have the right, as a citizen of the
==>internet, to phone up the NOC of another major provider to solve packet
==>loss through their routers?

Let's put it in another perspective. You go to Safeway, and you want some oranges. Unfortunately, Safeway's provider of oranges is having a problem getting Safeway oranges that are orange enough. Do you have the right to call up Safeway's provider? Not really. And even if you did call them up, what do you think will do a better job: asking Safeway, as a paying customer, to convey your message, or calling their distributor directly? I know plenty of technical support managers who, if called by a non-customer, would not be happy that their support staff is spending time helping non-customers instead of customers, the people who make the money for the company. A friend of mine who works for an ISP says that when some of his customers called their upper-level NSP to complain, the NSP wasn't too happy and asked the ISP to please remind its customers to call their own support organization, and to filter the problems up if it began to receive a number of complaints. Just my $0.02.

On Mon, 21 Oct 1996 jbash@velvet.com wrote:
I'm amazed at the attitude I'm getting from this list. You are, collectively, in the business of running a large network. I am a paying user of that network. The network is not delivering appropriate performance, as measured most importantly by the time I and others spend waiting around for characters to echo, Web pages to display, and whatnot. This time is long far more often than it's historically been, and far more often than a reasonable person might expect.
You've provided no proof that the fault is not yours or your provider's or your employer's. The fact is that the vast majority of problems with network performance are in the last 100 feet, either at your end or at the destination end. It is counter-productive to blame the people running the network core for problems that they have not caused. It is also counter-productive to paint everyone with the same brush when a problem occurs. Everybody, large and small, has problems with the network. The way to solve those problems is NOT to grab a paintbrush and start slapping black paint all over everyone you can see. The solution is to work your way step by step to the source of the problem and get it fixed. Black paint only obscures the problem.
Although my immediate complaint is prompted by a specific incident, such incidents are so common as to constitute a continuing, pervasive pattern.
100% correct. It is a natural consequence of the huge size of the network. This is a pattern seen in every other human endeavor.
Because of the structure of the network, this pattern affects customers of all providers, not just the immediately responsible ones.
Just like contaminated Tylenol affected the lives of everyone. It's the same pattern of distribution through many levels.
Although many problems do exist at user sites, it's clear that many problems also exist within the network itself.
This is not so clear. While there are certainly *SOME* problems within the network itself, your statement implies that roughly half the problems are there. I think the split is closer to 10% within the network and 90% within the customer sites.
So I complain, and suggest that you should look into reducing network growth to a level you can really manage, and setting standards of performance for yourselves and one another.
You have obviously not been reading any computer magazines lately. Every one of them is *FULL* of Internet this and Internet that. Everyone with a computer is being urged to get on the net and those without a computer are urged to get one. This is something which nobody on this list has any control over. We cannot turn the demand off.
Do you refer me to some existing document, prepared either by my own ISP or by NANOG or some other group, describing the quality of service I'm to expect, and point out to me that what I'm asking for is more than it guarantees? No.
If you want to know what your ISP contract guarantees you, then please ask them because we haven't got a clue what it says.
As far as I can tell, nobody's acknowledged that there's a problem.
You are talking to a group of people that deal with specific problems, not with generalities. If there is a specific problem and it appears to be within their control to fix it then the people on NANOG will track down the cause of the problem and they will fix that specific problem.
You really seem to believe that the quality of service provided over the Internet as a whole, as opposed to within any particular provider's network, is acceptable.
But you appear to be upset over a more general issue that is out of the control of network engineers. If you believe that the quality of service is not acceptable, you need to talk to the managers who make the pricing and deployment decisions at the major NSPs and ISPs. But be ready to pay 3 to 5 times as much as you are now to cover the costs. Most people feel that the quality of service delivered now at current flat-rate prices is quite acceptable.

Michael Dillon - ISP & Internet Consulting
Memra Software Inc. - Fax: +1-604-546-3049
http://www.memra.com - E-mail: michael@memra.com

[Quotes mercilessly reordered]
I'm amazed at the attitude I'm getting from this list. You are, collectively, in the business of running a large network. I am a
Right
paying user of that network. The network is not delivering appropriate
No. You're a minimal-paying user of a minimal provider who has connected to our network.
performance, as measured most importantly by the time I and others spend waiting around for characters to echo, Web pages to display, and whatnot. This time is long far more often than it's historically been, and far more often than a reasonable person might expect.
Its has no apostrophe when it's not possessive.

At any rate, consider this analogy.
Some of us pay $10K/month for 100% quality
Some pay $2-$5K/month for mostly near 100% quality
Some pay $400-$2K/month for a connection to that
Some pay $10-$30 a month for a connection to a connection to that.
so shut up.
Ehud
Although my immediate complaint is prompted by a specific incident,
Whiner.

Ehud,
Go back to English class. "It's" is a contraction of either "it is" or "it has". "Its" is the third person possessive. The referenced sentence uses "it's" as a contraction of "it has" and is correctly spelled in the message.

R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net
Phone: +1 510 486-8634

Ehud,
Go back to English class. "It's" is a contraction of either "it is" or "it has". "Its" is the third person possessive. The referenced sentence uses "it's" as a contraction of "it has" and is correctly spelled in the message.
R. Kevin Oberman, Network Engineer
I generally try not to pick on people for English flaws unless they're repeated (and frequent)... But anyway, if Ehud's native language is Hebrew, the difficulty may be that Hebrew doesn't really have a present tense form of the verb 'to be'.
Energy Sciences Network (ESnet)
Avi

To all,
Sorry. My comment on Ehud's grammar was intended for Ehud alone. I messed up and sent it to the list. I had intended to chide Ehud for his bringing grammar into the discussion and instead contributed to the problem myself. Sorry again.

R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net
Phone: +1 510 486-8634

P.S. For those who don't know Ehud, American English is his native language. He will probably use it rather effectively next time he sees me. :-(

I'd say this is nitpicking, and my native language is "screaming really hard" although I've been taught Hebrew, English, and SCUBA Spanish. (That's enough Spanish to go to San Carlos to dive with :) At any rate, "it's" is a contraction for "it is". The common disgusting method of assuming that 'cause it sounds 'kay we 'kin do it doesn't mean "it's" stands for "it has". Specifically, Kevin, don't send me back to English class, pick up a large library-sized dictionary and ejamacate yourself. In any case, that has little to do with the whiny shit on nanog.
Ehud

I agree the Internet performs poorly between a lot of sites. There is much we can do to make it work better. My prediction is it is more likely to get worse than better in the near term.....:( My current "pet peeve" is that peering is more political than technical these days. There are customers that will pay more for better service, but it's a much more difficult sale. Limiting growth is very difficult, probably impossible. Many ISPs have VC or public money invested now, so the pressure to increase sales is pretty irresistible.

Best Regards,
Robert Laughlin
----------------------------------------------------------------------------
DataXchange sales: 800-863-1550 http://www.dx.net
Network Operations Center: 703-903-7412 -or- 888-903-7412
----------------------------------------------------------------------------

On Mon, 21 Oct 1996 jbash@velvet.com wrote:
I'm amazed at the attitude I'm getting from this list. You are, collectively, in the business of running a large network. I am a paying user of that network. The network is not delivering appropriate performance, as measured most importantly by the time I and others spend waiting around for characters to echo, Web pages to display, and whatnot. This time is long far more often than it's historically been, and far more often than a reasonable person might expect.

Unfortunately, traceroute is what's available. Ed's point about priority queues (and fair queues, and whatever else is out there this week) is a good one, and I withdraw the assertion that the loss rate is 40 percent; obviously I can't really trust the absolute loss rates I get from ping and traceroute.
There's something else you need to keep in mind when using traceroute, and that is that routing in the Internet is often asymmetric. So the forward path may be different from the reverse path, and the loss you *think* you see occurring at a specific location may actually be occurring somewhere else on the reverse path. Basically, it isn't sufficient to run forward traceroutes/pings/trenos/etc. to ascertain beyond a doubt where the packet loss is occurring. You have to know about the reverse path too.
mb
p.s. It really isn't the net that's broken, it's the economic paradigm that's hosed. You're probably getting pretty close to the level of service you pay for.
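To make the asymmetry point concrete, here is a small worked sketch, assuming each link drops packets independently at a fixed rate (a simplification; the link loss figures are hypothetical). A ping or traceroute reply only arrives if the probe survives the forward path and the reply survives the reverse path, so a lossy link that sits only on the reverse path produces exactly the same observed loss as one on the forward path, and a forward traceroute cannot tell them apart.

```python
from math import prod  # Python 3.8+


def round_trip_delivery(forward_loss, reverse_loss):
    """Probability that a probe and its reply both arrive, assuming every
    link drops packets independently at the given fraction (0.0-1.0)."""
    return prod(1 - p for p in forward_loss) * prod(1 - q for q in reverse_loss)


# Hypothetical case 1: a link dropping 10% sits on the forward path.
fwd_case = round_trip_delivery([0.0, 0.10, 0.0], [0.0, 0.0, 0.0])

# Hypothetical case 2: the forward path is clean, but the reply comes back
# over a different (asymmetric) path containing a link dropping 10%.
rev_case = round_trip_delivery([0.0, 0.0, 0.0], [0.0, 0.0, 0.10, 0.0])

# Both cases look identical from the sending end.
print(f"{1 - fwd_case:.0%} vs {1 - rev_case:.0%} observed loss")  # 10% vs 10% observed loss
```

Under the same independence assumption, lossy links on both paths compound: two 10% links, one in each direction, give roughly 19% observed loss rather than 10%.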
participants (14)
- alan@mindvision.com
- Avi Freedman
- Billy Biggs
- booloo@cats.ucsc.edu
- Craig A. Huegen
- Dorian R. Kim
- Ehud Gavron
- jbash@velvet.com
- jon@branch.net
- Kevin Oberman
- Matthew Petach
- Michael Dillon
- Robert Bowman
- Robert Laughlin