Re: MCI and SprintLink are partitioned (fwd)

4 Oct 1995

      In message <199510041535.IAA23092@upeksa.sdsc.edu>, Hans-Werner Braun writes:
...
. Are all three (four?) NAPs really being used? (I know they are
   there, but despite repeated requests to at least one NAP service
   provider I appear to be unable to get an answer.) I do know that the
   NY NAP is heavily used, including by my traffic to the Bay Area
   sites I need access to, which traverses it (modulo all the losses in
   Sprintlink for at least weeks, reported to and confirmed by the
   regional network that serves SDSC; from rumors I am hearing,
   Sprintlink is not the exception, and many natives in the
   community are starting to get restless).
We primarily use MaeEast and the Sprint NAP, with backup through E144
(FixE), and soon MaeWest.  We don't connect to AADS and only use
PacBell for customers not reachable by any other means.
...
. Is there any evidence that the NAPs are really backing each other
   up? Did someone test and document it, e.g., with a few "test" networks
   in a bunch of regional networks? What are the time delays for a
   switch? Does someone have consecutive traceroute outputs where a
   switch among the NAPs really happened?
Yes, there is.  The NAPs can back each other up, but traffic load can
be a real problem if MaeEast goes down.  Since the GIGAswitch was
added, the Sprint NAP has become much more viable as a backup, and
MaeWest is promising since they too may go with switched FDDI.
...
. Do we have some regular examples from *any* site A initiating a
   connection from A to B, A to C, and A to D, where the three
   connections verifiably (via traceroute, I guess) traverse different
   NAPs (and hopefully only one each)?
There are tons of examples.  If the load weren't split, we'd drown in
the traffic at MaeEast.
...
. Are there routing stability reports accessible online from the RA
   (or whoever else feels responsible for this) that graph fluctuations
   at the NAPs, including correlation among them? What are the quality
   metrics for routing stability?
We have very reliable statistics on the peering session stability with
our peers at every interchange.  We also have some very unreliable
data (sorry folks, the data really isn't very good) on prefix
stability.  On a bad day (a few times a month) we might have a total
disconnect time on a given peer of 5-15 minutes over a 24-hour period.
This is the worst single peer, not the NAP as a whole.  We
occasionally (a few times a month) see the entire set of peers drop at
MaeEast, we think due to route flap.  The normal case is many days
without losing a peering session *anywhere*, interrupted by the loss of
a single peer or a small number of peers for a few seconds to a few
minutes, followed by another few days of uninterrupted peering.
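As a rough illustration of the per-peer figure above ("total
disconnect time on a given peer ... over a 24 hour period"), here is a
minimal Python sketch.  It assumes session outages are available as
hypothetical (peer, down_at, up_at) tuples in seconds; the function
name, record format, and peer names are illustrative only, not the
statistics tools referred to above.  Outages are clipped to the
reporting window and overlapping outages for the same peer are merged
so downtime is not counted twice.

```python
from collections import defaultdict


def disconnect_minutes(outages, window_start, window_end):
    """Per-peer total session downtime (minutes) within [window_start, window_end).

    outages: iterable of hypothetical (peer, down_at, up_at) tuples, in seconds.
    """
    per_peer = defaultdict(list)
    for peer, down_at, up_at in outages:
        # Clip each outage to the reporting window; drop it if nothing remains.
        start = max(down_at, window_start)
        end = min(up_at, window_end)
        if end > start:
            per_peer[peer].append((start, end))

    totals = {}
    for peer, intervals in per_peer.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching outage: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
        totals[peer] = total / 60.0
    return totals


if __name__ == "__main__":
    day = 24 * 60 * 60
    sample = [("peer-a", 3600, 3900), ("peer-a", 3800, 4200), ("peer-b", 86000, 90000)]
    totals = disconnect_minutes(sample, 0, day)
    worst = max(totals, key=totals.get)  # the "worst single peer" for the day
    print(totals, worst)
```

Reporting the worst single peer rather than an average keeps one flapping
session visible instead of diluting it across every peer at the NAP.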

The stability of the prefixes announced by those peers is another
story.  Unfortunately, the data collection we have in place has been
somewhat broken for a while now.  External route flap reporting is
seen as a low priority (it is not officially supported, and you can't
get a lower priority than that), and I haven't had the time to fix it.
...
. Do all the NAPs provide online statistics?
. Are the NAP and RA regular reports to NSF publicly available
   (hopefully via the Web)?
You have reporting requirements?  Great.

We regularly show summary information on internal routing stability
and external peer stability (I think that is still publicly
available).  The more detailed daily summary and the incident logs are
not made public; I was never able to figure out why, since we should
be proud of our record.  Perhaps you (NSF) can get a copy for
reference.
...
. Is there any way NANOG can be used to exchange status information
   about networks, rather than getting comments and rumors second or
   third hand. I understand that it is painful for a service provider to
   see problems on their network being posted, but if the alternative is
   a few bad incidents and rumors spreading that the network is always
   bad, I'd take a few hits and show I fix things quickly. Even better
   than posting (e.g., via some mailing list) would be an accessible
   distributed database covering all the service providers and
   accessible via the network. Is someone already working on that?
   Would not NANOG be *the* forum to cooperate on that?
This would be great, but I can't see it happening.
...
I think this is prime NANOG business. Otherwise, whose problem are
these? Who is or should be taking responsibility? Am I all off base
here?
We should confirm that there actually was a problem and the problem
duration first.

Curtis

Curtis Villamizar