Am I over-thinking this?
Yes, I think so. A large component of an SLA is often the cost of compliance versus the cost of the penalty imposed. If it is cheaper to pay the occasional penalty than to build the network to meet the SLA, then the network operator will often make a purely sales/marketing decision to offer the SLA without involving engineering/OPS in the discussion.

Also, the wording often refers only to unplanned downtime, so that planned downtime doesn't count toward the non-availability measure. And sometimes you find an allowance for packet drop within a limited time period, so that if you drop a thousand packets it doesn't count if it happens during the peak hour of the day, or if all of the packets are dropped within a few minutes. Another limitation that I have seen refers to the "core" network or "core" PoPs, meaning the part of the network in the major market areas (generally the USA and Western Europe) but not the network or PoPs in "fringe" areas.

I don't believe there is any hard science behind SLAs, and I think most engineering/OPS teams don't even know what SLAs are actually being given to customers. There are engineering targets that are sometimes referred to as SLAs, but they are not the Service Level Agreements in signed customer contracts.

All that aside, it would be interesting to see some standards for measuring and reporting things like "network availability" from an engineering point of view.

--Michael Dillon
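To make the "unplanned downtime only" point and the compliance-versus-penalty trade-off concrete, here is a minimal sketch. It assumes a 30-day month, a 99.9% target, and made-up outage and cost figures; the function names are hypothetical and do not come from any real SLA or standard.

```python
"""Sketch: how an SLA's definition of downtime changes reported availability.

Assumptions (illustrative only): a 30-day month, a 99.9% target, and
made-up outage and cost figures. Function names are hypothetical.
"""

PERIOD_MIN = 30 * 24 * 60  # minutes in a 30-day month (43,200)


def availability_pct(period_min: int, unplanned_min: int, planned_min: int,
                     count_planned: bool) -> float:
    """Return availability as a percentage of the measurement period.

    If count_planned is False, planned maintenance is simply not treated
    as downtime, mirroring SLAs that only count "unplanned" outages.
    """
    down = unplanned_min + (planned_min if count_planned else 0)
    return 100.0 * (period_min - down) / period_min


def cheaper_to_pay(expected_breaches_per_year: float, penalty_per_breach: float,
                   yearly_compliance_cost: float) -> bool:
    """True if expected penalties cost less than engineering to meet the SLA."""
    return expected_breaches_per_year * penalty_per_breach < yearly_compliance_cost


if __name__ == "__main__":
    target = 99.9
    # Hypothetical month: 30 minutes of unplanned outage plus a 4-hour
    # planned maintenance window.
    for counted in (True, False):
        a = availability_pct(PERIOD_MIN, 30, 240, counted)
        print(f"planned counted={counted}: {a:.3f}% (meets {target}%: {a >= target})")
```

With these invented numbers the same month misses a 99.9% target when the maintenance window counts as downtime, but meets it when only unplanned downtime counts, which is exactly why the contract wording matters more than the engineering.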