October 1996 - Test - lists.nanog.org

More hardware design (was Re: GigaRouter)
by Alexis Rosen 22 Oct '96

[Figured it was about time to change the subject line...]

Speaking of hardware design, I've got a few misc. questions and comments. These are geared towards building servers for remote use that are *not* routers, but rather light- to medium-load webservers and the like.

Does anyone know of a *small* rackmount case for PCs? By this I mean one that doesn't chew up quite so much vertical room as the usual boxes.

Does anyone know where to get a CPU card with integrated SCSI and video? Alternatively, integrated SCSI and ethernet. I know about the PEAK 520S. TJLS claims that it's got serious problems, and even if it does work, it uses the SIS chipset, and thus loses big on memory access. I want something that can boot NetBSD or BSDI.

One of the annoying problems using an intel box instead of a sun is that there's no real console. If it dies, the only way to kick it remotely is with a remote-control power switch. These are expensive and unwieldy, not mounting nicely in racks. In lieu of a real remote console like the one I described in a recent message, Thor and I cooked up the notion of a user-mode demon that tickles a watchdog timer every few seconds. (This is better than putting it in the kernel or init, for a whole lot of reasons.) This will at least deal with the need to reset a box when it dies. But I don't know where to get ISA cards that are just watchdog timers (many CPU cards do come with timers built in). I'll be doing some research today but if anyone knows where to get them, or has any experience using them, I'd love to hear about it. BTW, Thor's already gotten such a demon running under SCO. Doing it on NetBSD and other Unixes looks to be trivial.

Lastly, I've seen this really neat rackmount chassis from Multitech. It's got 22 ISA slots, severable into up to 9 parts, and enough drive bays to actually run 9 separate servers. If you're looking for maximal density it seems like a good bet. The only problem I can see is that you'll need CPUs with both SCSI and video on board (thus my first question) unless you're willing to run on IDE drives. I figure that for light or medium-use servers, ethernet over ISA should be fine.

To really make this thing smooth, you'd want a box that can switch a floppy cable nine different ways, since there's only room for one floppy in the case. This doesn't seem very hard, conceptually, but I don't know of anyone who makes such a device. I wonder if any existing switch could be adapted to the purpose? I don't remember how many pins are actually used by one floppy, but I suspect fewer than 25. If so, there are true 25-line switches available that might do the job. (Black Box has 6-to-1 25-pin switches for $90, and I'm sure they could do a 9-to-1 if you asked, though at a typically high price. I don't know if the floppies could stand the noise caused by all the cable changes. And you'd need DB-25 to floppy adaptor cables.)

MultiTech's working on a PCI model, too, but I don't know anything about it. All other things being equal, this would allow you to use a CPU card with enet and video, since running SCSI over PCI isn't the lose that it is over ISA.

/a
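
The user-mode watchdog tickler described in the message above could look roughly like the following. This is only a minimal sketch, not the program Thor wrote: the names `run_tickler`, `tickle`, and `healthy` are hypothetical, and how the timer card is actually poked (typically an I/O port write on ISA hardware) depends on the card and is left to the caller.

```python
import time
from typing import Callable, Optional


def run_tickler(
    tickle: Callable[[], None],
    healthy: Callable[[], bool],
    interval_s: float = 5.0,
    max_ticks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Tickle the watchdog for as long as `healthy()` returns True.

    `tickle` is a caller-supplied function that resets the hardware timer
    (hypothetical here; the real call is card-specific).
    Returns the number of tickles performed.
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        if not healthy():
            # Stop tickling on purpose: the hardware timer then expires
            # and resets the box without anyone touching a power switch.
            break
        tickle()
        ticks += 1
        sleep(interval_s)
    return ticks
```

The `healthy` check is the reason for doing this in user mode: the timer only keeps getting reset while the machine can still schedule and run ordinary processes. `interval_s` has to be comfortably shorter than the card's timeout, which is an assumption the caller must satisfy.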

Re: Trouble ticket referral
by William Allen Simpson 22 Oct '96

> From: Sean Donelan <SEAN(a)SDG.DRA.COM>
> >Agreed. End users should never be calling outside their ISP;
> >it is NOT their place to solve problems for their ISP, for
> >multiple reasons, one of which is that if any of our
> >customers worked directly with MCI, for example, to solve
> >a problem they saw that we were not yet aware of, and
> >they and MCI solved the issue without involving us, we'd
> >have no realization that a problem had been developing.
>
> Unfortunately there is no good trouble ticket referral system
> between providers. I've had MCI (acting as my provider) close
> MCI tickets and tell me to call the other provider directly
> because "it was not a MCI problem." Since MCI wasn't going to
> handle the trouble referral for me, I didn't have a choice but
> bypass MCI and contact the other ISP directly.
>

I will second this.

My Ann Arbor ISP (Merit/MichNet) has no customer 24x7 trouble reporting service. So, I've made them give me the numbers of their NOC. I have an unusual contract, and they seem to trust me to have done my homework. Most of their customers do not have that information.

On numerous occasions, the Merit NOC has verified that it is not a MichNet problem, and passed me to the MCI NOC. And on at least 2 occasions, the MCI NOC has determined it is not their problem, and passed me to the Sprint and PSI NOCs (respectively).

Other times, MCI has handled the problem under their own trouble ticket system, even when it was at another NSP, and called me back when the problem was resolved.

I reported here (this list) sometime last year how pleased I was with the MCI NOC when it was handled by BBN.... I am sorry to hear that they are backsliding a bit, but I assume they are having growth problems like everyone else.

WSimpson(a)UMich.edu
Key fingerprint = 17 40 5E 67 15 6F 31 26 DD 0D B9 9B 6A 15 2C 32
BSimpson(a)MorningStar.com
Key fingerprint = 2E 07 23 03 C5 62 70 D3 59 B1 4F 5E 1D C2 C1 A2

Hold my mail?
by Joanie Wexler 22 Oct '96

Is it possible for you to hold my mail until November 11? I'm getting nearly 100 messages a day and will not be logging on before then after today. Thank you. Joanie Wexler

Re: Ungodly packet loss rates
by Kent W. England 22 Oct '96

At 04:49 PM 21-10-96 -0700, Michael Dillon wrote:
>On Mon, 21 Oct 1996, Jon Zeeff wrote:
>
>> In other words, the big players don't like the "open" naps and
>> are deliberately not installing sufficient bandwidth to them?
>
>No, the open NAP's are bad engineering and the big players are fixing the
>topology by routing around them.
>

If you want a private interconnect to avoid having to deal with 100 peering requests per week from every Tom, Dick and Harriet's web page services, OK. But there isn't any gee-whiz technology that you can do at a private interconnect that you can't do at a NAP/MAE.

Open NAPs aren't bad engineering.

--Kent
speaking as a consultant to PacBell NAP services

Fast Networks [was Re: Ungodly packet loss rates]
by Kent W. England 22 Oct '96

At 06:54 PM 21-10-96 -0700, bmanning(a)ISI.EDU wrote:
>
> We can build bigger doors (MTU) but the conveyer belts are stuck
> at 100Mm/sec. We need faster conveyer belts!

We have them now.

>
> (waiting for HPPI-64 or G-Ether technologies w/ baited breath :)

No need to wait. (Oh, and it would be better if your breath were bated rather than baited :-)

>
> And then watch out 2-4Gbps backplanes... (that is aggregate right?)
>
>--
>--bill
>

There are technologies that already have aggregate throughput of 20+Gbps.

--Kent
speaking as a L2 agnostic and PacBell NAP consultant

You are right [was Re: Ungodly packet loss rates]
by Kent W. England 22 Oct '96

At 05:47 PM 21-10-96 -0700, jbash(a)velvet.com wrote:
>[Quotes mercilessly reordered]
>
>I'm amazed at the attitude I'm getting from this list.
>
> -- J. Bashinski
>

John, is it?

You are right. You shouldn't have to worry about network topology when picking an ISP, but you have to realize that the nanog group gets most excited about fast memory and hardware architectures. If it were any other way, the Internet would melt right now. :-)

Unfortunately, the practical solution to your problem is to shop around and the more you know about your unique topological concerns the better off you'll be. I wish there was a Net Consumers Report service that you could subscribe to, but we aren't quite there yet.

So, John, relax a little knowing you are fundamentally right and your expectations are entirely reasonable and please accept my apology that the Internet isn't quite ready yet to meet your needs. I take solace in the fact that it has never been better and that it will continue to get better.

--Kent
speaking as a long time Internet services person

Re: Ungodly packet loss rates
by Vadim Antonov 22 Oct '96

>> Oh, well.
>>
>> You do not need 2 or whatever Gbps backplanes to build
>> terabit per second networks.
> Correct. I've got a terabit network on my desk right now.
> It's kind of simple and doesn't scale well... only two ports.
> But what can you expect from strands of fiber... :)
>
>--bill

No, gigabit backplanes are not necessary for _routing IP_ at tbps. It can be done without fancy silicon. I talked that thru with many people and they agree that there is at least one way to build a tbps router in a basement.

Software is a different matter, though.

--vadim

Re: Ungodly packet loss rates
by Justin W. Newton 22 Oct '96

At 10:49 AM 10/22/96 -0400, Avi Freedman wrote:
>We explain it to people. Generally not in detail to dialup customers, though
>sometimes we do. But most providers and companies connected via dedicated
>connections can be helped to understand what's going on.

There's the rub, Avi. You have a lot of ISP customers. I have a lot of WeJustGotOffOfAOL customers. Mine take a bit more education, and could possibly be impossible to educate. This is of whom the original poster is speaking. This is getting off topic though.

Justin Newton
Network Architect
Erol's Internet Services

Re: Ungodly packet loss rates
by jbash＠velvet.com 22 Oct '96

[Quotes mercilessly reordered]

I'm amazed at the attitude I'm getting from this list.

You are, collectively, in the business of running a large network. I am a paying user of that network. The network is not delivering appropriate performance, as measured most importantly by the time I and others spend waiting around for characters to echo, Web pages to display, and whatnot. This time is long far more often than it's historically been, and far more often than a reasonable person might expect.

Although my immediate complaint is prompted by a specific incident, such incidents are so common as to constitute a continuing, pervasive pattern. Because of the structure of the network, this pattern affects customers of all providers, not just the immediately responsible ones. Although many problems do exist at user sites, it's clear that many problems also exist within the network itself.

So I complain, and suggest that you should look into reducing network growth to a level you can really manage, and setting standards of performance for yourselves and one another.

Do you say "Yes, that's a good idea"? No.

Do you say "No, that won't work because <x>"? No.

Do you say "We think we have a handle on the problem, and you can expect it to go away soon"? No.

Do you say "We don't think we can make the problem go away no matter what we do, so we'll try to do a better job of explaining the expected level of service to new users (and to old users who are losing the level of service they've been used to)"? No.

Do you refer me to some existing document, prepared either by my own ISP or by NANOG or some other group, describing the quality of service I'm to expect, and point out to me that what I'm asking for is more than it guarantees? No.

As far as I can tell, nobody's acknowledged that there's a problem. You really seem to believe that the quality of service provided over the Internet as a whole, as opposed to within any particular provider's network, is acceptable. What I hear is "Quit whining", or in one case, "Quit whining, idiot".

mrbill> No, I believe the person who recommended that suggested you shop around
mrbill> for the best provider *to start out with*, not bitch, whine, and moan
mrbill> when your connection is not 100% perfect through the one you
mrbill> currently have.

I think there's a big difference between complaining about a connection "not [being] 100% perfect" and complaining about a huge packet loss rate making a path (and indeed all paths between me and at least one very major network) nearly unusable. There's even more of a difference between complaining about a single incident of such a loss rate and complaining about a pervasive pattern of such incidents.

Are you saying that I should accept bursty periods of 10-second character echo times, continuing for 4 or 5 days? I'm sorry, but that sort of congestion inside a network backbone demonstrates gross overload. It takes a lot to drive a network to that point in the presence of TCP congestion avoidance, even with lots of short connections.

Are you suggesting that I find a provider that never gives me a path through a congested network? I'm sorry, but given the number of congested networks out there, and how quickly the congestion moves around, and the plain fact that some sites are connected *via* congested networks, I don't believe that's possible.

I also think it's unreasonable to expect users to choose their providers based on which sites they're communicating with. Users should be able to expect acceptable levels of service to any site (yes, provided that site itself has adequate capacity). ISPs are in the business of providing usable service, not providing the service it's convenient for them to provide.

Take my own case. I didn't get this connection to let me talk to Cisco; I already had facilities for that. I got it for general access to various random stuff on the Net. Unless it gives me usable connectivity to the *whole* Net (including Cisco, but only incidentally), it's not doing what I bought it for... and it's not doing what the people I bought it from sell it for, either.

If I were going to put really heavy demands on the network, I could see being told I needed to connect somewhere close to my target. That's not what's going on here; we're talking about a TELNET connection. At a more basic level, if the Net can't be made usable for at least Web access from almost anywhere to almost anywhere, then what's the point of building it at all?

mrbill> I don't see where a temporary network problem such as you describe
mrbill> should result in a message being sent to the various ISPs and the
mrbill> NANOG list.

You misunderstand my point; the message wasn't really about the immediate problem; that was merely an example.

A problem with my own stuff caused me to really rely on services I've been paying for for a long time. When I started using those services for serious interactive work, they failed me, and they continued to fail me for several days. I was reminded of how bad things on the Net at large really were, and motivated to investigate what was going on in this particular case.

Having established to a reasonable degree of certainty that the problem isn't on my end and isn't on Cisco's end, and that the problem has gone on for several days, I feel justified in complaining to the ISPs involved.

As far as the question of the problem being temporary, well, yes, it's temporary. Everything is temporary. You and I are decidedly temporary. If "temporary" in this case were 10 seconds, I'd agree with you. 4 days is, however, a ridiculously long lifetime for a double-digit drop rate in a major network backbone. When was the last time you saw a significant part of the telephone network become almost unusable for 4 days?

Having seen similar problems all too often in the past, and having heard complaints about such problems from other users, I feel justified in recommending that an industry group, presumably concerned with quality of service, consider the matter.

The issue isn't this particular failure. The issue is the industry's inability to manage the network appropriately. If this were an isolated incident, it would be acceptable, if annoying. The fact is, however, that some large part of the network is either down or degraded almost all the time. I believe that the reason for that is that the network is being grown at a faster rate than the industry can coordinate properly.

Go Web surfing. Count the number of sites you can't reach when you *know* that the problem isn't local overloading at either end of the connection. Count the number of stalls you get when you're loading the pages that *do* work. Do you really consider that an appropriate level of service? Now multiply the annoyance factor by 10, and you'll get the idea what it's like for interactive users.

mrbill> My suggestion: quit bitching and wait for your FR connection to be
mrbill> restored,

I beg your pardon, but I think I'm entitled to "bitch" whenever a service I'm paying for isn't being delivered in a satisfactory way. I assure you that I'd expect my provider to complain very loudly if I stopped paying my bills on time.

mrbill> or reconfigure your current equipment (if you work at Cisco,
mrbill> it shouldn't be TOO hard).

Regardless of how hard it may or may not be, I shouldn't have to do it. I've paid for a service that *should*, if it were working properly, save me from having to do it. Your opinion as to whether I really need that service is irrelevant... and amazingly arrogant.

In this case, I'd have to either take down network services that some friends of mine depend on, or come up with another computer. Doing one or the other is the only way I can maintain the air gap between Cisco and the Internet.

Now, on technical issues (and my mistakes thereon):

mikedoug> How in the hell can you expect a 100% success rate over (1) a slow
mikedoug> modem link, and (2) to *ANY* site on the world. Hell, do you have
mikedoug> any *CLUE*--I know you don't--how many sites on the net have servers
mikedoug> behind 28.8 links??? How great a packet loss do you expect when you
mikedoug> access them?? Is that provider dependent???

*ANY* site--really? Sigh.

I have to admit that my language was wrong. When I said "any point" (I did not say "any site"), I meant "the edge of any ISP's network". Any IP path with a double-digit loss rate (or, generally, any single link with, say, a 5 percent loss rate) is grossly overloaded, but I can only hold ISPs responsible for capacity planning out to the edges of their own networks. In the present case, most of the loss is being introduced in the middle of Alternet's DS3 backbone.

On a well-managed network, I can and should expect a loss rate just slightly above the rate intrinsic to TCP's flow control, given that the data traffic is overwhelmingly TCP. I don't know what the intrinsic rate is, but--

1. I'd be pretty confident in guessing it's less than 5 percent.
2. It's a *lot* less than 40 percent. It's a lot less than 20 percent.
3. It doesn't create gross degradation of interactive service.

As I realized shortly after I sent my message, 1 percent really *isn't* a reasonable expectation for a TCP/IP loss rate, since TCP uses packet loss as a flow-control feedback mechanism, and will force the loss rate along any path above 1 percent. My only excuse for this error is that the networks I used to work with were either run in uncongested mode (not as uncommon as you might think), or were not pure IP networks. At the time, most hosts had even worse congestion response than they have now, and you had to overengineer the network if you wanted it to work right.

As for the rest...

jbash> > It doesn't look to me as though the loss is being introduced at the
jbash> > NAPS. If you look at the trace, you'll see that significant loss
jbash> > starts to appear within Alternet, well after MAE-west. It looks as
jbash> > though more loss appears inside BBN's network, although it's difficult
jbash> > to tell because of the already large Alternet loss.

mrbill> Traceroute is *not* a good tool to diagnose packet loss problems.
mrbill> I've had traceroute tell me that a packet loss problem was between
mrbill> two points 3-4 hops "out", when actually it was with the T-1 at
mrbill> my site, the "first hop" in the trace.

emv> Traceroute is less useful a tool than you think in the face of congestive
emv> loss. Routers can and do selectively prioritize the queueing packets
emv> based on their type, and if I were a network operator I would have no
emv> hesitation about dropping traceroute or ping packets to low priority.

Unfortunately, traceroute is what's available.

Ed's point about priority queues (and fair queues, and whatever else is out there this week) is a good one, and I withdraw the assertion that the loss rate is 40 percent; obviously I can't really trust the absolute loss rates I get from ping and traceroute. Again, I plead rustiness (or maybe complete obsolescence)... my real-world experience predates useful priority queueing.

The TCP connection itself reports about a 20 percent retransmission rate in one direction, and that may be a more reasonable estimate of the actual loss than the 40 percent I get from ping and traceroute.

Given enough probes, however, traceroute should still show discontinuities in packet loss at congestion points. I think I was doing enough probes... 25 per hop, and the trace I sent wasn't the only one I took.

In fact, I now have confirmation that most, or maybe all, of my loss is (or maybe was... loss is down quite a bit as I write this) being caused by a major overload on a link inside Alternet's backbone. Apparently some kind of routing reconfiguration (possibly by a third party) at MAE-west dumped a lot of traffic into an Alternet DS3 that wasn't overloaded before.

None of which is really relevant to the basic problem, which is that this service level makes interactive sessions nearly unusable, and even Web access a bit painful... regardless of where the drops happen.

-- J. Bashinski
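
The argument above, that enough probes per hop should expose a discontinuity in loss at the congested link, can be illustrated with a short sketch. It is not the procedure actually used in the message: the function names `hop_loss` and `biggest_jump` are hypothetical, the sample counts are made up, and it assumes probes are lost independently, which (as emv points out) need not hold when routers deprioritize probe traffic.

```python
import math
from typing import List, Tuple


def hop_loss(sent: int, lost: int) -> Tuple[float, float]:
    """Return (loss rate, binomial standard error) for one hop."""
    p = lost / sent
    se = math.sqrt(p * (1.0 - p) / sent)
    return p, se


def biggest_jump(hops: List[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (hop index, rise in loss) for the largest hop-to-hop increase.

    `hops` holds one (probes sent, probes lost) pair per hop, in path order.
    The first hop is compared against an implicit loss rate of zero.
    """
    rates = [hop_loss(sent, lost)[0] for sent, lost in hops]
    best_idx, best_rise = 0, rates[0]
    for i in range(1, len(rates)):
        rise = rates[i] - rates[i - 1]
        if rise > best_rise:
            best_idx, best_rise = i, rise
    return best_idx, best_rise


# Made-up example: 25 probes per hop, loss appearing at the fourth hop.
sample = [(25, 0), (25, 1), (25, 1), (25, 9), (25, 10)]
```

With 25 probes, `hop_loss(25, 10)` gives a 40 percent loss rate with a standard error of roughly 10 percentage points, so a single trace locates a large jump reasonably well but says little about the exact rate. Repeating the trace, as the message describes, narrows that uncertainty.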

RE: [fwd] Rats take down Stanford ...
by Chris A. Icide 22 Oct '96

>Second, test redundant systems through to resumption of normal operations.
>In this case, the operators had tested to ensure that the redundant systems
>would come online in the event of a failure of the primary system. They had
>not tested to see what would happen when the primary system was restored to
>normal operation.
>
>Who would have even thought about it? I confess that I would not have.
>

Anyone who has their rear end on the line would. I spent quite a few years doing engineering at a Nuclear Power Plant. We engineered and tested everything, to the point of having drills including *ALL* of the highly possible events, a high majority of the low possibility events, and even some of the catastrophic events that aren't supposed to be able to even occur.

The difference is two-fold. One, if it breaks, does it end up killing you (or making you glow in the dark)? And two, do you take the attitude that it will break, no matter what you do?

Number one could always be rephrased into a question more like: does this affect my financial well-being? If the answer is no, then you're off the hook, and your management is on the hook. Number two is number two, and IMHO, anyone who thinks they are not susceptible to failures is deserving of what they receive...

Chris
