Resilience: faults, causes, statistics, open issues

27 Jan 2005

Hi people!

I've begun research on carrier-grade (aka telecom-grade) resiliency in IP transport networks. As a first step, I would like to collect possible failure events, their causes and consequences, and statistics on downtime (mean time to repair) and mean time between failures. I would also like to identify which problems are most typical: HW bug, SW bug, cable cut, cable unplugged (link going down), or severe misconfiguration.

I think this is the perfect forum to get some feedback from real network-operational experience.

Is there anyone out there who has statistics or documents that would help me in any way?

Also, do you have any suggestions for open research issues still to be solved in this area?

Any thoughts or comments would be most welcome!

Thanks!

András

András Császár (IJ/ETH)
