Todd Underwood wrote:
Sean Donelan wrote:
Todd Underwood wrote:
the general idea is: take a large peerset sending you full routes, keep every update forever, and take a reasonably long (at least a month or two) time horizon. calculate a consensus view for each prefix as to whether that prefix is reachable by some set of those peers. an outaged prefix is one that used to be reachable but no longer is. in other words, one that has been withdrawn from the full table by some sufficiently large number of peers.
This describes a partitioning, not necessarily an outage.
can you explain what you mean?
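[For concreteness, here is a minimal sketch of the consensus method Todd outlines above. The class name, the 90% withdrawal threshold, and the bookkeeping details are illustrative assumptions, not his actual implementation.]

from collections import defaultdict

# illustrative threshold (assumption): call a prefix outaged once 90% of
# peers have withdrawn it; the "sufficiently large number" is a tunable guess
WITHDRAW_FRACTION = 0.9

class ConsensusTracker:
    def __init__(self, peers):
        self.peers = set(peers)
        self.announcing = defaultdict(set)  # prefix -> peers currently announcing it
        self.ever_seen = set()              # prefixes reachable at some point in the horizon

    def announce(self, peer, prefix):
        self.announcing[prefix].add(peer)
        self.ever_seen.add(prefix)

    def withdraw(self, peer, prefix):
        self.announcing[prefix].discard(peer)

    def outaged(self):
        # outaged = used to be reachable, now withdrawn by enough of the peerset
        surviving = len(self.peers) * (1 - WITHDRAW_FRACTION)
        return [p for p in self.ever_seen
                if len(self.announcing[p]) <= surviving]

# example: 20 full-table peers announce a prefix, then 19 withdraw it
t = ConsensusTracker(range(20))
for peer in range(20):
    t.announce(peer, "192.0.2.0/24")
for peer in range(19):
    t.withdraw(peer, "192.0.2.0/24")
print(t.outaged())   # ['192.0.2.0/24']

[In a real system the announce/withdraw calls would be driven by BGP UPDATE messages from the full-table peers, with the month-or-two horizon bounding ever_seen.]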
I'm not sure if Sean's thinking the same thing I am, but let me chime in with a nickel's worth of commentary.

There are some inconsistent terms used in computer dependability research, but I prefer and use two key definitions: failure (something is offline) and outage (a customer sees the service offline). Various kinds of redundancy can hide failures from customers and keep them from becoming true outages.

Looking at the routing tables, you see failures. If a prefix goes away completely and utterly, and is truly unreachable, then anyone trying to reach it is going to see an outage. But there are a lot of intermediate cases where routes are mostly down but not completely, or where parts of the net can see a prefix while other parts can't, due to the vagaries of route propagation and partial failures. And there are situations where the route is down but the service is still up.

There are other network monitoring groups that do end-to-end connectivity tests from geographically distributed clients out to sample systems around the net, some for research and some for hire as network monitoring. I think what they do is much closer to identifying true outages than your method.

-george william herbert
gherbert@retro.com
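[As a rough sketch of the end-to-end approach George describes: the probe helper, plain TCP connects as the reachability test, and the three-way classification are all assumptions, not any particular monitoring group's method. In practice each probe would run on a geographically distributed client rather than one machine.]

import socket

TIMEOUT = 5.0  # seconds; illustrative

def probe(host, port, timeout=TIMEOUT):
    # return True if a plain TCP connection to (host, port) succeeds
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def classify(results):
    # results: {vantage_point: probe_succeeded} for one sample system
    #   all probes fail  -> "outage"  (every client sees the service down)
    #   some probes fail -> "partial" (a partition or partial failure)
    #   all probes pass  -> "up"
    up = sum(1 for ok in results.values() if ok)
    if up == 0:
        return "outage"
    if up < len(results):
        return "partial"
    return "up"

# example: three hypothetical vantage points probing one sample host
results = {vp: probe("www.example.com", 80)
           for vp in ("us-east", "eu-west", "ap-south")}
print(classify(results))

[Run from enough vantage points, the "partial" bucket is exactly the partitioning case Sean raises: some of the net sees the service, some doesn't, and only the "outage" bucket is a customer-visible failure.]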