On Sat, 16 Nov 2002, Sean Donelan wrote:
In the 1990s the MAEs and Gigaswitches would give us an unscheduled failure of a major exchange point on a regular basis, which let us demonstrate our disaster recovery capabilities. With the improved reliability, i.e. the PAIXes haven't had a catastrophic failure, we haven't had as many opportunities to demonstrate how well we can handle a disaster at those locations.
Without creating an actual disaster, what if all the providers turned off their BGP sessions with other providers at a PAIX (or Equinix or LINX or wherever), both through the shared switch and private point-to-point links, for an hour? More than likely no one would notice, but then we would have some hard data. Individually, providers have tested parts of their own networks, but I haven't heard of any coordinated efforts to test recovery across all the service providers in a particular location.
The main problem will be coordination: you need to get all providers to do this within a tight one-hour slot. And to make this a good test, you need to ensure that the major players take part, even more so than the smaller ISPs. From what I've seen, it's difficult enough to get ISPs to make config changes within a window of a couple of weeks, so you're going to have a problem pulling this together! Also, from what I've seen, I think you'll find things have changed: reduced budgets have forced compromises on redundancy, and shutting down an exchange will have a noticeable impact on users in the region... you could argue this is all the more reason to conduct these exercises!

Steve