Steve Meuse <smeuse@bbnplanet.com> wrote:
On the other hand, you can choose to build a box that can handle thousands of customers, and decrease the traffic load, but also increase the likelihood of a failure that can directly affect a larger percentage of customers.
Dan Rabb <danr@dbn.net> wrote:
Routers will inevitably fail. The question becomes how much exposure do you want when it does?
First, you have to stop thinking of routers as "black boxes" and expose the internal structure of large boxes so you can compare it with clusters. In this respect, the big router designs I know of are eminently more reliable than clusters of traditional routers, for a number of reasons:
1) The connectivity between components ("elementary routers") is significantly richer, with many diverse paths between components.
2) The design is inherently simpler than that of a multi-vendor and multi-standard cluster, with significantly fewer different components and a much more regular topology. Simplicity directly translates into reliability.
3) There is built-in support for extensive fault tolerance and self-diagnostics at a level simply unachievable with standard routing protocols (which by their nature do not have the foggiest idea of the internal structure and diagnostic possibilities of the routers, and do not provide any support for state mirroring).
4) The individual failure blocks are much smaller (i.e. one "line card" vs. an entire router, at least in the Pluris design: the line card interface is not a bus but a serial line with a protocol that cannot be screwed up by a misbehaving line card, unlike any known bus protocol).
5) Power supplies are distributed (the Pluris box simply has a separate DC-DC converter on every card).
6) At least one vendor (Pluris) has all card cages completely isolated electrically.
7) The last (but not least) aspect of terabit routing is its inherent reliance on inverse multiplexing over multiple parallel channels, which allows service to degrade gracefully in case of individual channel or path failures without any need to make the problem visible at the IP level, and therefore without being limited by the performance of distributed routing algorithms.
Alex Zinin <zinin@amt.ru> wrote:
"Have more bigger boxes rather than less smaller ones"-approach is not for everybody and not for every case. If you have clusters sitting in one room, powered from the same source, sharing the same ceiling that can fall, running the same version of software, using the same config, etc., then yes it's ok, because they will more likely crash at the same moment.
A big router does not have to be all in one place physically. The Pluris design allows hundreds of feet of component separation with optical cabling.
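To make the inverse-multiplexing point (reason 7 in the list above) a bit more concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions, not a description of how any real box (Pluris or otherwise) does it: the Channel class, the stripe function and the round-robin policy are all hypothetical. The idea it shows is that when one of several parallel channels dies, traffic is simply restriped over the survivors and the logical link loses capacity instead of going down, so nothing has to be reported at the IP level.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """One of several parallel internal channels (hypothetical model)."""
    name: str
    up: bool = True
    sent: list = field(default_factory=list)


def stripe(packets, channels):
    """Spread packets round-robin over whichever channels are currently up.

    Returns the fraction of nominal capacity still available.
    """
    live = [c for c in channels if c.up]
    if not live:
        raise RuntimeError("no channels left: the logical link is down")
    for i, pkt in enumerate(packets):
        live[i % len(live)].sent.append(pkt)
    return len(live) / len(channels)


channels = [Channel(f"ch{i}") for i in range(4)]
full = stripe(range(8), channels)           # all four channels carry traffic
channels[2].up = False                      # one channel fails
degraded = stripe(range(8, 16), channels)   # traffic restriped over three
```

In this toy run the logical link drops from full capacity to three quarters of it, every packet is still delivered, and the failed channel receives nothing after the failure.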
Also, even if you do use a large box, you probably don't wanna know all the details about its connections at some level of your network.
The whole premise of big box design is that its internal capacity is so much bigger than interface capacity that from outside it looks like a single point w/o any need to optimize routing inside. From the perspective of network management, of course, big boxes have to provide detailed internal status info. A sane design for a big router has an out-of-band diagnostic network within the box.
to eliminate updates which "do not matter" unlike SPF-based algorithms which have to inform everyone about local topology changes.
In SPF-based protocols we have areas for this purpose---we do not propagate topology information across the area boundaries.
Across boundaries which have to be configured _manually_. DV and diffuse algorithms tend to squelch topology updates automatically _within_ an area if a same-metric alternative path is found. SPF has to have a coherent picture of network topology at all times, so a flap can easily kill it off. Diffuse algorithms, by design, work well in a network with rapidly changing topology.
--vadim
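A rough sketch of the squelching behaviour described above, in Python. This is a simplified distance-vector style model under my own assumptions, not the logic of any particular protocol implementation: the table layout (destination mapped to per-neighbour metrics) and the handle_link_loss function are hypothetical. It shows that a router which loses one next hop but still has another at the same metric has nothing new to tell its neighbours.

```python
def handle_link_loss(routes, dest, failed_next_hop):
    """React to losing one next hop for `dest`.

    `routes[dest]` maps each neighbour to the metric it offers
    (hypothetical table layout). Returns the metric to advertise,
    or None if the neighbours need not be told anything.
    """
    candidates = routes[dest]
    old_best = min(candidates.values())
    del candidates[failed_next_hop]
    if not candidates:
        return float("inf")   # destination unreachable: must advertise
    new_best = min(candidates.values())
    if new_best == old_best:
        return None           # same-metric alternative: stay quiet
    return new_best           # metric changed: advertise the new value


routes = {"10.0.0.0/8": {"A": 3, "B": 3, "C": 5}}
first = handle_link_loss(routes, "10.0.0.0/8", "A")    # B still offers 3
second = handle_link_loss(routes, "10.0.0.0/8", "B")   # only C (5) remains
third = handle_link_loss(routes, "10.0.0.0/8", "C")    # nothing remains
```

Here the first failure produces no update at all, the second produces one because the best metric got worse, and the third advertises the destination as unreachable.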