On Fri, 13 Sep 2002, Stephen J. Wilcox wrote:
At what point does one build redundancy into the network?
No, it doesn't necessarily use IXs. If there is no peered path across an IX, traffic flows from the originator to its upstream "tier1" over a private transit link; that "tier1" peers with the destination's upstream "tier1" over a private fat pipe, and the traffic then reaches the destination over the destination's private transit link.
But will these links have enough spare capacity so congestion doesn't happen?
I'm only aware of a few providers who transit across IXs, and I think the consensus is that it's a bad thing, so it tends to be just small players for whom the cost of a private link is relatively high.
I apologize in advance for naming names here, but I think it is important for making my point. A while back (I think last year, but I'm not sure) the AMS-IX had a huge outage because the power failed in two of the main locations. One of the locations didn't at that time have battery or generator backed up power (although they used three diversely routed inputs from the power company) and the other location only had batteries, which didn't last long. Nearly everything was still reachable over transit rather than peering, with only minor congestion. However, some networks got their transit in the same buildings where they connect to the AMS-IX, so both their peering and their transit were gone and they were unreachable.
If you think this was only true for small networks: think again. Surfnet suffered the same problem. Surfnet is one of the largest (if not _the_ largest) Dutch networks, connecting all the universities in the country at multi-gigabit speeds. However, they only connected to other networks in a single building at that time. I don't know if this is still the case.
Now this is only one big network and a few small ones that suffered. However, things could have been much worse for people in the rest of the Netherlands, because even with all the rerouting going on, almost all traffic still flowed through Amsterdam. So any outage in Amsterdam that takes down more than a single building would cripple the majority of Dutch networks. Obviously, something like this doesn't happen all the time, but luck has a tendency to run out from time to time. A plane crash (a 747 went down in an Amsterdam suburb 10 years ago) or a good sized flood (lots of stuff is below sea level in NL) will do it.
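To make the shared-fate point concrete, here is a minimal sketch of how one could check an inventory for networks that lose everything when a given set of buildings goes dark. The network and building names are made-up placeholders, not real data about AMS-IX members, and the function name is my own invention.

```python
from collections import defaultdict

# Hypothetical inventory: (network, connection type, building).
# Placeholder names only; not real data about any operator.
CONNECTIONS = [
    ("net-a", "peering", "building-1"),
    ("net-a", "transit", "building-1"),
    ("net-b", "peering", "building-1"),
    ("net-b", "transit", "building-2"),
    ("net-c", "peering", "building-2"),
    ("net-c", "transit", "building-3"),
]

def unreachable_after(connections, failed_buildings):
    """Return the networks whose every external connection sits in a failed building."""
    failed = set(failed_buildings)
    surviving = defaultdict(int)
    for network, _kind, building in connections:
        surviving[network] += 0  # make sure every network appears in the result set
        if building not in failed:
            surviving[network] += 1
    return sorted(name for name, count in surviving.items() if count == 0)

# One building down: only the network with peering and transit co-located is cut off.
print(unreachable_after(CONNECTIONS, ["building-1"]))
# Two buildings down: a network that spread over just those two is gone as well.
print(unreachable_after(CONNECTIONS, ["building-1", "building-2"]))
```

The check is deliberately crude: it only looks at where connections terminate and ignores shared fiber paths or shared power feeds between buildings.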
I suspect the catch would be that, if major switching nodes were taken out, there would be considerable congestion on the transit links and most likely on the tier1s' private peering as well.
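The congestion question can be roughed out with a back-of-the-envelope check: does the total offered load still fit on whatever links survive? The sketch below assumes traffic can be spread perfectly over the surviving links, which is optimistic, and the link names, capacities and loads are invented for illustration.

```python
def congested_after_failure(links, failed):
    """Report whether total load exceeds the capacity of the surviving links.

    links:  dict mapping link name -> (capacity_mbps, load_mbps)
    failed: iterable of link names that are down
    Assumes perfect load balancing across survivors (an optimistic simplification).
    """
    down = set(failed)
    total_load = sum(load for _cap, load in links.values())
    surviving_capacity = sum(
        cap for name, (cap, _load) in links.items() if name not in down
    )
    return total_load > surviving_capacity

# Hypothetical numbers: one exchange port and two transit circuits.
LINKS = {
    "ix-peering": (1000, 600),
    "transit-1": (622, 200),
    "transit-2": (622, 200),
}

print(congested_after_failure(LINKS, ["ix-peering"]))
print(congested_after_failure(LINKS, ["ix-peering", "transit-1"]))
```

In practice rerouted traffic rarely balances this neatly, so a link set that passes this check can still congest on individual circuits.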
I'm more worried about long distance fiber running through rural areas. Much more bang for your backhoe renting buck.
not sure I'd call it a "poor job" for not planning all possible failure modes, or for not having links in place for them.
Well, the trouble is that in the real world we can't have the budgets we'd like to implement our plans, and we end up compromising. There's the catch.
I don't think it's just a matter of money. In 1999, I helped roll out a completely new network. EVERYTHING in it, except the ports customers connect to, had a backup. Management originally wanted to connect every location to at least three others. (We got this requirement dropped because it essentially means you're buying a third circuit that doesn't do anything useful until the two others are down; traffic engineering for both regular operation and the different failure modes is too complex.) Still, I couldn't convince them to move the second transit connection to another city where both our network and the transit network were also present in the same building. A year or so after I left, I was in the building where that entire network connects to its transit network, over two independent routers at both ends, when the power went down and they couldn't get the generators online... Eventually the utility power came back online before the batteries were empty. All of this is on the ground floor in a place that's below sea level, only a block or so from a river.