Re: MCI and SprintLink are partitioned (fwd)
HWB - The problem is really not so much that the routing fell over but that other problems were run into that are independent of NAPs/MAEs/etc.

| . are all three (four?) NAPs really being used (I know they are
|   there, but despite repeated requests to at least one NAP service
|   provider I appear to be unable to get an answer). I do know that the
|   NY NAP is heavily used, including as my traffic to the Bay area
|   sites I need access to traverses it (modulo all the losses in
|   Sprintlink for at least weeks (reported to and confirmed by the
|   regional network that serves SDSC, though from rumors I am hearing
|   Sprintlink is rather not the exception, and many natives in the
|   community starting to get restless]

SprintLink and MCI exchange traffic at two NAPs, both MAEs, and FIX-WEST. We likely will start exchanging traffic at the PAC*Bell NAP in the very near future. SprintLink and AGIS will be exchanging traffic there even before then. Others at that NAP are either in the queue wrt negotiating a bilateral agreement with Sprint, or have not yet approached Sprint with regard to a peering for various reasons.

SprintLink and MCI exchange fairly heavy traffic at the Chicago NAP, very heavy traffic at the Pennsauken NAP, and extraordinarily heavy traffic at MAE-EAST, and we have already been looking at a very strong and purely technical need to start moving traffic between ourselves directly in several other locations.

The reason you see so much use of the Pennsauken NAP is that CERFNET has a DS3 ATM pipe terminated on a router there, and that is where CERFNET and SprintLink exchange a good chunk of traffic, principally because the bandwidth available to do that in New Yorsey has been greater than on any path on the west coast through which CERFNET and SprintLink could have exchanged traffic. I believe that Push could supply you with further details of CERFNET's near-future plans in this and other regards.
| . Is there any evidence that the NAPs are really backing each other
|   up? Did someone test and document it, e.g., with a few "test" networks
|   in a bunch of regional networks? What are the time delays for a
|   switch? Does someone have consecutive traceroute outputs where a
|   switch among the NAPs really happened?

What do you mean by backing each other up? There was never a requirement for NAPs to do that; what does fall over is the bilateral routing among each pair of peers at each touchdown point.

BGP fallover with respect to very large changes (disconnectivity at a NAP, MAE or FIX-WEST) between two very big peers adjusts in various ways. Firstly, you could have fast IGP switching, which means a convergence time, within one side, of a few seconds. Secondly, you could have an eBGP timeout or the like, which means a convergence time of a couple of minutes or less. The key problem here is that convergence eats CPU (lots of routes to be announced or withdrawn or sent to different next-hops), and very, very bad transitions can take ten to fifteen minutes, depending on the characteristics of the failures.

However, fall-over happens fairly frequently, sometimes as a result of having to make code changes and the like at edge routers (routers colocated at NAPs/MAEs/FIXes etc.), and we have long established that, done right, it's not very painful.

| . do we have some regular examples from *any* site A initiating a
|   connection from A to B, A to C, and A to D, where the three are
|   verifiably (via traceroute, I guess) would traverse different NAPs
|   (and hopefully only one each)?

Sure; if I understand the question correctly, anybody on SprintLink or MCI should be able to do this without thinking about it.

| . Are there routing stability reports accessible online from the RA
|   (or whoever else feels responsible for this) that graph fluctuations
|   at the NAPs, including correlation among them? What are the quality
|   metrics for routing stability?

Not AFAIK.
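
The traceroute checks asked about above (which exchange point a path from A to B crosses, and whether consecutive traces show a switch from one exchange point to another) could be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions: the hostname substrings in EXCHANGE_MARKERS, the function names, and the sample hop names are all hypothetical, and real router names at the NAPs/MAEs would have to be filled in by whoever runs the test. It works on hop hostnames already extracted from traceroute output and does not run traceroute itself.

```python
# Hypothetical mapping from hostname substrings to exchange points.
# These substrings are illustrative assumptions, not real router names.
EXCHANGE_MARKERS = {
    "mae-east": "MAE-EAST",
    "mae-west": "MAE-WEST",
    "fix-west": "FIX-WEST",
    "chicago-nap": "Chicago NAP",
    "pennsauken-nap": "Pennsauken NAP",
}


def exchanges_on_path(hops):
    """Return the exchange points seen along a list of hop hostnames,
    in path order and without repeats."""
    seen = []
    for hop in hops:
        name = hop.lower()
        for marker, exchange in EXCHANGE_MARKERS.items():
            if marker in name and exchange not in seen:
                seen.append(exchange)
    return seen


def find_switches(traces):
    """Given consecutive (timestamp, hops) samples toward one destination,
    return (timestamp, old, new) for each sample whose list of exchange
    points differs from the previous sample's."""
    switches = []
    previous = None
    for timestamp, hops in traces:
        current = exchanges_on_path(hops)
        if previous is not None and current != previous:
            switches.append((timestamp, previous, current))
        previous = current
    return switches


if __name__ == "__main__":
    # Made-up samples taken 60 seconds apart from site A to site B.
    samples = [
        (0, ["gw.a.example", "r1.pennsauken-nap.example", "host.b.example"]),
        (60, ["gw.a.example", "r1.pennsauken-nap.example", "host.b.example"]),
        (120, ["gw.a.example", "r7.mae-east.example", "host.b.example"]),
    ]
    for when, old, new in find_switches(samples):
        print(f"t={when}s: path moved from {old} to {new}")
```

The gap between the last sample on the old path and the first sample on the new one gives only an upper bound on the switch delay, so the sampling interval would need to be short compared with the convergence times discussed above.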
| . Do all the NAPs provide online statistics?

The Sprint NAP has a statistics package which is really nifty but not yet widely publicly available; you should tap Bilal or one of the other responsible people on the shoulder to show it off to you.

| . Are the NAP and RA regular reports to NSF publicly (hopefully via
|   the Web) available?

Not sure. It might be a good idea.

| . Is there any way NANOG can be used to exchange status information
|   about networks, rather than getting comments and rumors second or
|   third hand.

outage-request@sprint.net can put you on our (very widely-subscribed) list for announcing SprintLink/ICM outages, innages, root problems and potential solutions. I hope that you'll find that the quality of information there is reasonable and that volume is fairly light, and that it is generally what you seem to be asking for.

|   Even better
|   than posting (e.g., via some mailing list) would be an accessible
|   distributed data base covering all the service providers and
|   accessible via the network. Is someone already working on that?
|   Would not NANOG be *the* forum to cooperate on that?

I'm certainly open to the idea. The first thing you'll have to realize is that when you can't get to the database because the network is broken, it can't help you...

Sean.
Sean Doran