RE: OSPF multi-level hierarchy: side question
Steve Meuse <smeuse@bbnplanet.com> wrote:
On the other hand, you can choose to build a box that can handle thousands of customers and decrease the traffic load, but you also increase the likelihood of a failure that directly affects a larger percentage of customers.
Dan Rabb <danr@dbn.net> wrote:
Routers will inevitably fail. The question becomes: how much exposure do you want when one does?
First, you have to stop thinking of routers as "black boxes" and expose the internal structure of the large boxes so you can compare them with clusters. In this respect, the big router designs I know of are eminently more reliable than clusters of traditional routers, for a number of reasons:

1) The connectivity between components ("elementary routers") is significantly richer, with many diverse paths between components.

2) The design is inherently simpler than that of a multi-vendor, multi-standard cluster, with significantly fewer kinds of components and a much more regular topology. Simplicity translates directly into reliability.

3) There is built-in support for extensive fault tolerance and self-diagnostics at a level simply unachievable with standard routing protocols (which by their nature do not have the foggiest idea of the internal structure and diagnostic possibilities of the routers, and do not provide any support for state mirroring).

4) The individual failure blocks are much smaller (i.e. one "line card" vs. an entire router, at least in the Pluris design -- the line card interface is not a bus but a serial line with a protocol which cannot be screwed up by a misbehaving line card, unlike any known bus protocol).

5) Power supplies are distributed (the Pluris box simply has a separate DC-DC converter on every card).

6) At least one vendor (Pluris) has all card cages completely isolated electrically.

7) The last (but not least) aspect of terabit routing is its inherent reliance on inverse multiplexing over multiple parallel channels, which allows service to degrade gracefully when individual channels or paths fail -- without any need to make the problem visible at the IP level, and therefore not limited by the performance of distributed routing algorithms. (A rough sketch of this idea follows the list.)
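A rough sketch of the inverse-multiplexing point in 7), in Python. This is a hedged toy model, not the Pluris implementation; the class and method names are illustrative assumptions. Packets are striped round-robin over whichever parallel channels are currently healthy, so a channel failure shrinks capacity instead of taking the aggregate link down or becoming visible at the IP level:

class InverseMux:
    def __init__(self, channels):
        # channels: list of send callables, one per parallel physical path
        self.channels = list(channels)
        self.healthy = set(range(len(self.channels)))
        self.next_idx = 0

    def mark_failed(self, idx):
        # called by per-channel keepalives/diagnostics on failure
        self.healthy.discard(idx)

    def mark_recovered(self, idx):
        self.healthy.add(idx)

    def send(self, packet):
        if not self.healthy:
            raise RuntimeError("all parallel channels down")
        # round-robin over the surviving channels only; losing a channel
        # just reduces aggregate bandwidth, invisibly to the IP layer
        for _ in range(len(self.channels)):
            idx = self.next_idx
            self.next_idx = (self.next_idx + 1) % len(self.channels)
            if idx in self.healthy:
                return self.channels[idx](packet)

# Example: three parallel channels, one of which fails mid-stream.
log = []
mux = InverseMux([lambda p, i=i: log.append((i, p)) for i in range(3)])
for n in range(3):
    mux.send("pkt%d" % n)
mux.mark_failed(1)            # diagnostics detect a dead path
for n in range(3, 6):
    mux.send("pkt%d" % n)     # traffic continues on channels 0 and 2
print(log)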
"Have more bigger boxes rather than less smaller ones"-approach is not for everybody and not for every case. If you have clusters sitting in one room, powered from the same source, sharing the same ceiling that can fall, running the same version of soft, using the same config., etc., than yes it's ok, because they will more likely crash at the same moment.
A big router does not have to be all in one place physically. The Pluris design allows hundreds of feet of separation between components, using optical cabling.
Also, even if you do use a large box, you probably don't want to know all the details about its connections at some level of your network.
The whole premise of the big-box design is that its internal capacity is so much bigger than its interface capacity that, from the outside, it looks like a single point, without any need to optimize routing inside. From the network-management perspective, of course, big boxes have to provide detailed internal status information. A sane design for a big router has an out-of-band diagnostic network within the box.
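To make the out-of-band diagnostics point concrete, here is a minimal hypothetical sketch in Python (the LineCard/poll names are assumptions, not any vendor's interface) of a management processor sweeping per-card status over a channel separate from the forwarding path:

class LineCard:
    def __init__(self, slot):
        self.slot = slot
        self.alive = True

    def poll(self):
        # in a real box this query travels over the dedicated diagnostic
        # network, not the data plane, so a wedged forwarding path cannot
        # hide a failing card
        return {"slot": self.slot, "alive": self.alive, "temp_c": 42}

def health_sweep(cards):
    # one pass of the out-of-band monitor; returns slots needing attention
    suspect = []
    for card in cards:
        status = card.poll()
        if not status["alive"] or status["temp_c"] > 70:
            suspect.append(status["slot"])
    return suspect

cards = [LineCard(slot) for slot in range(8)]
cards[3].alive = False          # simulate a dead card
print("suspect slots:", health_sweep(cards))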
to eliminate updates which "do not matter", unlike SPF-based algorithms, which have to inform everyone about local topology changes.
In SPF-based protocols we have areas for this purpose -- we do not propagate topology information across area boundaries.
Across boundaries which have to be configured _manually_. DV and diffusing algorithms tend to squelch topology updates automatically _within_ an area when a same-metric alternative path is found. SPF has to maintain a coherent picture of the network topology at all times, so route flap can easily kill it off. Diffusing algorithms are, by design, well suited to a network with rapidly changing topology. --vadim
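A toy illustration of the squelching difference, in Python. This is a hedged sketch, not code from any real protocol implementation: the link-state node floods every topology change, while the distance-vector node advertises only when the metric it offers its neighbors actually changes (a same-metric alternative path absorbs the failure silently):

class LinkStateNode:
    def on_link_change(self, link, up):
        # SPF needs a coherent map everywhere, so every change is flooded
        # to all neighbors, even if no best path actually moves
        return [("LSA", link, up)]

class DistanceVectorNode:
    def __init__(self, routes):
        # routes: dest -> list of (next_hop, metric) candidate paths
        self.routes = routes

    def best(self, dest):
        return min(m for _, m in self.routes[dest])

    def on_path_lost(self, dest, next_hop):
        before = self.best(dest)
        self.routes[dest] = [(nh, m) for nh, m in self.routes[dest]
                             if nh != next_hop]
        after = self.best(dest)
        # advertise only if our own best metric changed; an equal-cost
        # alternative squelches the update entirely
        return [("UPDATE", dest, after)] if after != before else []

ls = LinkStateNode()
dv = DistanceVectorNode({"D": [("A", 10), ("B", 10)]})
print("link-state sends:", ls.on_link_change(("R1", "R2"), up=False))
print("distance-vector sends:", dv.on_path_lost("D", "A"))   # [] -- squelched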