Network Reliability Engineering
I'm looking for some good reference materials to do some "reliability engineering" calculations and projections.
This is to justify increased redundancy, and I want to include quantifiable numbers based on MTBF data and other reliability factors, kind of a scientific justification instead of just the typical emotional appeal using analyst/vendor FUD.
I'd appreciate references on how to do this in a network environment (what data to collect, how to collect it, how to analyze, etc). Also any data (or rules of thumb) on typical MTBFs for network events that I won't find on vendor product slicks (like what's the MTBF on IOS, or human-caused service outages of various types, etc).
If someone has put together something remotely like this that they'd care to share, that'd be incredibly helpful.
Thanks.
Pete.
Good luck. For a proper scientific analysis you'd need MTBF info on every point of failure, e.g. the physical link, CSU/DSU, power supply, ...
As a rather non-scientific observation, a couple of outages per year of 1-4 hours each seems to be quite common for a single-homed T1 or faster connection, be it from WorldCom, AT&T, Sprint...
I think the arguments in favor of dual-homing are pretty cut and dried. Tri-homing vs dual-homing would be a much tougher benefit to quantify.
Ralph Doncaster
principal, IStop.com
div. of Doncaster Consulting Inc.
On Sat, 18 May 2002, Pete Kruckenberg wrote:
I'm looking for some good reference materials to do some "reliability engineering" calculations and projections.
This is to justify increased redundancy, and I want to include quantifiable numbers based on MTBF data and other reliability factors, kind of a scientific justification instead of just the typical emotional appeal using analyst/vendor FUD.
I'd appreciate references on how to do this in a network environment (what data to collect, how to collect it, how to analyze, etc). Also any data (or rules of thumb) on typical MTBFs for network events that I won't find on vendor product slicks (like what's the MTBF on IOS, or human-caused service outages of various types, etc).
If someone has put together something remotely like this that they'd care to share, that'd be incredibly helpful.
Thanks. Pete.
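For anyone wanting to put rough numbers on the above, here is a minimal sketch of the usual availability arithmetic. It takes the anecdotal figure from the reply (two outages a year of 1-4 hours each on a single-homed link) and assumes, optimistically, that two upstream links fail independently of each other. The function names and the per-component MTBF/MTTR values are hypothetical placeholders for illustration, not measured data.

```python
HOURS_PER_YEAR = 8760.0


def availability_from_outages(outages_per_year, hours_per_outage):
    """Availability implied by an observed outage count and duration."""
    downtime = outages_per_year * hours_per_outage
    return 1.0 - downtime / HOURS_PER_YEAR


def availability_from_mtbf(mtbf_hours, mttr_hours):
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(availabilities):
    """All components must be up (link AND CSU/DSU AND power supply ...)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(availabilities):
    """At least one path must be up. ASSUMES independent failures."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Anecdotal single-homed figure from the thread: 2 outages/year, 1-4 hours each.
single_best = availability_from_outages(2, 1.0)
single_worst = availability_from_outages(2, 4.0)

# Dual-homing and tri-homing under the (optimistic) independence assumption.
dual_worst = parallel([single_worst, single_worst])
tri_worst = parallel([single_worst, single_worst, single_worst])

# HYPOTHETICAL component chain; these MTBF/MTTR hours are made-up placeholders.
chain = series([
    availability_from_mtbf(50000.0, 4.0),   # physical link (placeholder)
    availability_from_mtbf(100000.0, 2.0),  # CSU/DSU (placeholder)
    availability_from_mtbf(200000.0, 1.0),  # power supply (placeholder)
])

if __name__ == "__main__":
    for label, a in [("single, best case", single_best),
                     ("single, worst case", single_worst),
                     ("dual, worst case", dual_worst),
                     ("tri, worst case", tri_worst),
                     ("hypothetical chain", chain)]:
        print(f"{label}: {a:.8f} ({downtime_hours_per_year(a):.6f} h/yr down)")
```

With the worst-case figure, a single link is down about 8 hours a year, while two independent links would both be down for roughly 26 seconds a year. That gap is why dual-homing is easy to justify and tri-homing is not: the third link only removes those remaining seconds. In practice the independence assumption rarely holds (shared conduit, shared building power, shared router, human error), so real dual-homed numbers will be worse than this estimate.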