Matthew Petach wrote:
George William Herbert <gherbert@retro.com> wrote: Matthew Petach writes:
"protected rings" are a technology of the past. Don't count on your vendor to provide "redundancy" for you. Get two unprotected runs for half the cost each, from two different providers, and verify the path separation and diversity yourself with GIS data from the two providers; handle the failover yourself. That way, you *know* what your risks and potential impact scenarios are. It adds a bit of initial planning overhead, but in the long run, it generally costs a similar amount for two unprotected runs as it does to get a protected run, and you can plan your survival scenarios *much* better, including surviving things like one provider going under, work stoppages at one provider, etc.
This completely ignores the grooming problem.
Not completely; it just gives you teeth for exiting your contract earlier and finding a more responsible provider to go with who won't violate the terms of the contract and re-groom you without proper notification.
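To make the "verify the path separation yourself" step concrete, here is a minimal sketch of the check, assuming each provider's GIS data has already been reduced to a list of conduit segment identifiers. The segment IDs and function names are hypothetical, and real GIS data would need geometric comparison rather than exact ID matching. Re-running the same check whenever a provider hands over updated route data is one way to notice a re-groom that has collapsed the two paths together.

```python
def shared_segments(route_a, route_b):
    """Return the conduit segments that appear in both routes, sorted."""
    return sorted(set(route_a) & set(route_b))


def is_diverse(route_a, route_b):
    """True if the two routes have no conduit segment in common."""
    return not shared_segments(route_a, route_b)


# Hypothetical segment IDs, standing in for data derived from each provider's GIS export.
provider_a = ["SC-01", "SC-07", "SJ-12", "SJ-30"]
provider_b = ["SC-02", "SC-07", "PA-04", "PA-19"]

print(shared_segments(provider_a, provider_b))  # ['SC-07']
print(is_diverse(provider_a, provider_b))       # False
```

Any non-empty result is a shared-fate point: one backhoe at that segment takes out both "diverse" runs.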
That's a post-facto financial recovery / liability limitation technique, not a high availability / hardening technique...
I'll admit I'm somewhat simplifying the scenario, in that I also insist on no single point of failure, so even an entire site going dark doesn't completely knock out service; those who have been around since the early days will remember my email to NANOG about the gas main cut in Santa Clara that knocked a good chunk of the area's connectivity out, *not* because the fiber was damaged, but because the fire marshal insisted that all active electrical devices be powered off (including all UPSes) until the gas in the area had dissipated. Ever since then, I've just acknowledged you can't keep a single site always up and running; there *will* be events that require it to be powered down, and part of my planning process accounts for that, as much as possible, via BCP planning.
I was less than a mile away from that, I remember it well. My corner cube even faced in that direction. I heard the noise then the net went poof. One of those "Oh, that's not good at all" combinations.
Now, I'll be the first to admit it's a different game if you're providing last-mile access to single-homed customers. But sitting on the content provider side of the fence, it's entirely possible to build your infrastructure such that having 3 or more OC192s cut at random places has no impact on your ability to carry traffic and continue functioning.
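The "any 3 cuts at random places" claim can be checked by brute force on a small topology. The sketch below is illustrative only: the five site names and the full-mesh layout are assumptions, not anyone's real network. It tests connectivity only and says nothing about whether the surviving links have enough capacity. It also treats each link as an independent failure, which is exactly the assumption that shared conduits break.

```python
from itertools import combinations


def connected(nodes, links):
    """True if every node is reachable from the first one over the given links."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(nodes)


def survives_cuts(nodes, links, k):
    """True if the topology stays connected after removing any k links."""
    for cut in combinations(range(len(links)), k):
        remaining = [link for i, link in enumerate(links) if i not in cut]
        if not connected(nodes, remaining):
            return False
    return True


# Hypothetical example: five sites with one link between every pair (10 links).
sites = ["A", "B", "C", "D", "E"]
links = list(combinations(sites, 2))

print(survives_cuts(sites, links, 3))  # True
print(survives_cuts(sites, links, 4))  # False: cutting all four links of one site isolates it
```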
You have to get out of the game the fiber owners are playing. They can't even keep score for themselves, much less accurately for the rest of us. If you count on them playing fair or right, they're going to break your heart and your business.
You simply count on them not playing entirely fair, and penalize them when they don't; and you have enough parallel contracts with different providers at different sites that outages don't take you completely offline.
The problem with grooming is that in many cases, due to provider consolidation, fiber vendor consolidation, cable swaps, and so forth, you end up with parallel contracts with different providers at different sites that all end up going through one fiber link anyway.
I had (at another site) separate vendors with fiber going northbound and southbound out of the two diverse sites. Both directions from both sites got groomed without notification. Slightly later, the northbound fiber was then rerouted a bit up the road, into a southbound bundle (the same one as our now-groomed southbound link), south to another datacenter, then north again via another path, to improve northbound route redundancy overall for the providers' customer links. And the shared link south of us was what got backhoed.
This was all in one geographical area. Diversity out of area will get you around single points like that, if you know the overall topology of the fiber networks around the US and choose locations carefully. But even that won't protect you against common-mode vendor hardware failures, or a large-scale BGP outage, or the routing chaos that comes with a very serious regional net outage (exchange points, major undersea cable cuts, etc.)....
There may be 4 or 5 nines, but the 1 at the end has your name on it.
-george william herbert
gherbert@retro.com