On Fri, 13 Sep 2002, Iljitsch van Beijnum wrote:
On Fri, 13 Sep 2002, Stephen J. Wilcox wrote:
At what point does one build redundancy into the network?
No, it doesn't necessarily use IXes. If there is no peered path across an IX, traffic flows from the originator to their upstream "tier 1" over a private transit link; that tier 1 peers with the destination's upstream tier 1 over a private fat pipe, and from there the traffic reaches the destination over their own private transit link.
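To make that fallback concrete, here is a minimal Python sketch of the preference logic involved (the names, local-preference values and data structure are invented for illustration, not anyone's actual policy): routes learned over peering are preferred while the session is up, and when the IX path disappears the route via the transit/tier-1 chain takes over.

    # Minimal sketch of peering-preferred path selection with transit fallback.
    # Names and local-preference values are illustrative, not real policy.

    PEER_PREF = 200     # routes learned over IX / private peering
    TRANSIT_PREF = 100  # routes learned from the upstream "tier 1"

    def best_path(paths):
        """Pick the available path with the highest local preference."""
        available = [p for p in paths if p["up"]]
        return max(available, key=lambda p: p["local_pref"]) if available else None

    paths_to_dest = [
        {"via": "IX peering with DestISP", "local_pref": PEER_PREF, "up": True},
        {"via": "transit -> Tier1A -> Tier1B -> DestISP", "local_pref": TRANSIT_PREF, "up": True},
    ]

    print(best_path(paths_to_dest)["via"])  # peering wins while the IX path is up

    paths_to_dest[0]["up"] = False          # IX outage: peering session drops
    print(best_path(paths_to_dest)["via"])  # traffic reroutes over the transit chain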
But will these links have enough spare capacity so congestion doesn't happen?
Well, the policy among major ISPs tends to be around 50% maximum utilisation per circuit, so they should have capacity to reroute. You're most likely to hit issues on the local ISP's transit connection, which is unlikely to have the capacity to absorb a large amount of their peered traffic, although medium-sized ISPs can probably reroute a large amount to another IXP anyway.
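As a back-of-the-envelope illustration of that 50% rule (the circuit sizes and traffic figures below are invented), a circuit run at half its capacity can absorb roughly its own normal load again before congesting, while a transit pipe that is already running hot cannot take on much rerouted peering traffic:

    # Back-of-the-envelope headroom check under the ~50% utilisation rule.
    # Capacities and traffic volumes are invented for illustration.

    def congested(capacity_gbps, normal_load_gbps, rerouted_gbps):
        """True if normal traffic plus rerouted peering traffic exceeds capacity."""
        return normal_load_gbps + rerouted_gbps > capacity_gbps

    # Major ISP backbone circuit kept at ~50% utilisation:
    print(congested(capacity_gbps=10, normal_load_gbps=5, rerouted_gbps=4))    # False: fits in the headroom

    # Small ISP transit link already running hot, with a lot of peered traffic to shift:
    print(congested(capacity_gbps=1, normal_load_gbps=0.7, rerouted_gbps=0.8)) # True: congestion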
I'm only aware of a few providers who transit across IXes, and I think the consensus is that it's a bad thing, so it tends to be just small players for whom the cost of a private link is relatively high.
I apologize in advance for naming names here, but I think it is important for making my point.
A while back (I think last year, but I'm not sure) the AMS-IX had a huge outage because the power failed in two of the main locations. One of the locations didn't at that time have battery- or generator-backed power (although it used three diversely routed feeds from the power company), and the other location only had batteries, which didn't last long.
Nearly everything was still reachable over transit rather than peering, with only minor congestion. However, some networks got their transit in the same buildings where they connect to the AMS-IX, so both their peering and their transit were gone and they were unreachable. If you think this was only true for small networks: think again. Surfnet suffered the same problem. Surfnet is one of the largest (if not _the_ largest) Dutch networks, connecting all the universities in the country at multi-gigabit speeds. However, they only connected to other networks in a single building at that time. I don't know if this is still the case.
Yes, there is a large amount of that happening in London, where I'm more familiar with individual ISPs' networks. They tend to exist in one or two locations and pass traffic through a single location because of economies in bandwidth scaling, although I don't know of any medium or large ones like that. I personally have always maintained multiple sites with sufficient capacity to handle the failure of another site since day one, but perhaps I was lucky enough to be able to draw on a company with enough cash to be willing to do that. I regularly (every month or two) see something major happen at a site, and on the whole things continue working just fine around it! Steve
Now this is only one big network and a few small ones that suffered. However, things could have been much worse for people in the rest of the Netherlands, because even with all the rerouting going on, almost all traffic still flowed through Amsterdam. So any outage in Amsterdam that takes down more than a single building would cripple the majority of Dutch networks. Obviously, something like this doesn't happen all the time, but luck has a tendency to run out from time to time. A plane crash (a 747 went down in an Amsterdam suburb 10 years ago) or a good-sized flood (lots of stuff is below sea level in NL) will do it.
I suspect the catch would be that, in the event of major switching nodes being taken out, there would be considerable congestion on the transit links, and most likely on the tier 1s' private peering as well.
I'm more worried about long distance fiber running through rural areas. Much more bang for your backhoe renting buck.
I'm not sure I'd call it a "poor job" for not planning for all possible failure modes, or for not having links in place for them.
Well, the trouble is that in the real world we can't have the budgets we'd like to implement our plans, so we end up compromising.. there's the catch.
I don't think it's just a matter of money. In 1999, I helped roll out a completely new network. EVERYTHING in it, except the ports customers connect to, had a backup. Management originally wanted to connect every location to at least three others. (We got this requirement dropped because it essentially means you're buying a third circuit that doesn't do anything useful until the other two are down; traffic engineering for both regular operation and all the different failure modes is too complex.) Still, I couldn't convince them to move the second transit connection to another city where both our network and the transit network were also present in the same building.
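As a toy illustration of that planning burden (the counting here is my own, not part of the original design discussion): every extra circuit per site multiplies the number of partial-failure scenarios you have to capacity-plan for.

    # Toy count of the "some circuits down, site still up" scenarios to traffic-engineer per site.
    from itertools import combinations

    def failure_scenarios(num_circuits):
        """All ways 1..n-1 circuits can be down while the site stays reachable."""
        circuits = range(num_circuits)
        scenarios = []
        for k in range(1, num_circuits):
            scenarios.extend(combinations(circuits, k))
        return scenarios

    print(len(failure_scenarios(2)))  # 2 scenarios per dual-homed site
    print(len(failure_scenarios(3)))  # 6 scenarios once a third circuit is added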
A year or so after I left, I was in the building where that entire network connects to its transit network over two independent routers at both ends, and the power went down and they couldn't get the generators online... Eventually utility power came back before the batteries were empty. All of this is on the ground floor, in a place that's below sea level, only a block or so from a river.