In message <199510041535.IAA23092@upeksa.sdsc.edu>, Hans-Werner Braun writes:
. Are all three (four?) NAPs really being used (I know they are there, but despite repeated requests to at least one NAP service provider I appear to be unable to get an answer)? I do know that the NY NAP is heavily used, including as my traffic to the Bay area sites I need access to traverses it (modulo all the losses in Sprintlink for at least weeks, reported to and confirmed by the regional network that serves SDSC, though from rumors I am hearing Sprintlink is rather not the exception, and many natives in the community are starting to get restless).
We primarily use MaeEast and the Sprint NAP with backup through E144 (FixE), and soon MaeWest. We don't connect to AADS and only use PacBell for customers not reachable by any other means.
. Is there any evidence that the NAPs are really backing each other up? Did someone test and document it, e.g., with a few "test" networks in a bunch of regional networks? What are the time delays for a switch? Does someone have consecutive traceroute outputs where a switch among the NAPs really happened?
Yes there is. The NAPs can back each other up, but traffic can be a real problem if MaeEast goes down. Since adding the gigaswitch, Sprint NAP becomes much more viable as a backup and MaeWest is promising since they too may go with switched FDDI.
. Do we have some regular examples from *any* site A initiating a connection from A to B, A to C, and A to D, where the three verifiably (via traceroute, I guess) traverse different NAPs (and hopefully only one each)?
There are tons of examples. If the load wasn't split, we'd drown in the traffic load at MaeEast.
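For anyone who wants to check this from their own site A, here is a rough sketch of the bookkeeping, not a tool we actually run. It takes traceroute hop lists that have already been collected and reports which exchange each path appears to cross. The hostnames and the marker strings in MARKERS are made up for illustration; real router names differ per provider and would have to be filled in by hand.

```python
# Hypothetical substrings that identify an exchange in a hop's hostname.
# These are assumptions for illustration, not real router naming schemes.
MARKERS = {
    "mae-east": "MaeEast",
    "sprint-nap": "Sprint NAP",
    "pacbell-nap": "PacBell NAP",
}


def exchanges_traversed(hops, markers):
    """Return the exchange names seen along a path, in order of first appearance."""
    seen = []
    for hop in hops:
        for needle, name in markers.items():
            if needle in hop.lower() and name not in seen:
                seen.append(name)
    return seen


# Made-up hop lists from site A to destinations B, C, and D.
PATHS = {
    "B": ["gw.a.example.net", "fddi0.mae-east.example.net", "b.example.org"],
    "C": ["gw.a.example.net", "h1.sprint-nap.example.net", "c.example.org"],
    "D": ["gw.a.example.net", "atm2.pacbell-nap.example.net", "d.example.org"],
}

RESULT = {dst: exchanges_traversed(hops, MARKERS) for dst, hops in PATHS.items()}
for dst, names in sorted(RESULT.items()):
    print("A ->", dst, "via", ", ".join(names) or "no known exchange")
```

Running the same comparison on consecutive traceroutes to one destination would also show when a path moves from one exchange to another, which is the kind of evidence asked for above.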
. Are there routing stability reports accessible online from the RA (or whoever else feels responsible for this) that graph fluctuations at the NAPs, including correlation among them? What are the quality metrics for routing stability?
We have very reliable statistics on the peering session stability with our peers at every interchange. We also have some very unreliable data (sorry folks, the data really isn't very good) on prefix stability. On a bad day (a few times a month) we might have a total disconnect time on a given peer of 5-15 minutes over a 24-hour period. This is the worst single peer, not the NAP as a whole. We occasionally (a few times a month) see the entire set of peers drop at MaeEast, we think due to route flap. The normal case is many days without losing a peering session *anywhere*, interrupted by a loss of a single or small number of peers lasting from a few seconds to a few minutes, followed by another few days of uninterrupted peering. The stability of the prefixes announced by those peers is another story. Unfortunately the data collection we have in place has been somewhat broken for a while now. The external route flap reporting is seen as a low priority (not officially supported, you can't get a lower priority), and I haven't had the time to fix it.
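To make the "total disconnect time on a given peer" figure concrete, here is a minimal sketch of how it could be tallied from a log of session up/down transitions. This is not our collection code; the event format (timestamp in seconds, peer name, state), the peer names, and the sample numbers are all assumptions for illustration.

```python
from collections import defaultdict


def downtime_per_peer(events, window_start, window_end):
    """Sum the seconds each peer spent down inside [window_start, window_end].

    events: iterable of (timestamp_seconds, peer, state), state is "up" or "down".
    A peer with no "down" event in the window is treated as up throughout.
    """
    down_since = {}
    totals = defaultdict(float)
    for ts, peer, state in sorted(events):
        if ts < window_start or ts > window_end:
            continue
        if state == "down":
            # Remember only the first "down" of an outage.
            down_since.setdefault(peer, ts)
        elif state == "up" and peer in down_since:
            totals[peer] += ts - down_since.pop(peer)
    # Sessions still down at the end of the window count up to window_end.
    for peer, start in down_since.items():
        totals[peer] += window_end - start
    return dict(totals)


# Made-up sample: one 24-hour window, times in seconds from its start.
EVENTS = [
    (100, "peerA", "down"), (400, "peerA", "up"),
    (1000, "peerB", "down"), (1060, "peerB", "up"),
    (5000, "peerA", "down"), (5600, "peerA", "up"),
]
TOTALS = downtime_per_peer(EVENTS, 0, 86400)
WORST = max(TOTALS, key=TOTALS.get)
print(WORST, TOTALS[WORST] / 60, "minutes down")
```

In the sample, peerA is the worst single peer with 15 minutes down over the day, which is the top of the bad-day range quoted above.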
. Do all the NAPs provide online statistics?
. Are the NAP and RA regular reports to NSF publicly (hopefully via the Web) available?
You have reporting requirements? Great. We regularly show summary information on internal routing stability and external peer stability (I think that is still publicly available). The more detailed daily summary and the incident logs are not made public, though we should be proud of our record, so I was never able to figure out why. Perhaps you (NSF) can get a copy for reference.
. Is there any way NANOG can be used to exchange status information about networks, rather than getting comments and rumors second or third hand? I understand that it is painful for a service provider to see problems on their network being posted, but if the alternative is a few bad incidents and rumors spreading that the network is always bad, I'd take a few hits and show I fix things quickly. Even better than posting (e.g., via some mailing list) would be an accessible distributed data base covering all the service providers and accessible via the network. Is someone already working on that? Would not NANOG be *the* forum to cooperate on that?
This would be great, but I can't see it happening.
I think this is prime NANOG business. Otherwise, whose problems are these? Who is or should be taking responsibility? Am I all off base here?
We should confirm that there actually was a problem and the problem duration first.

Curtis