New subject: few big monolithic PEs vs many small PEs

27 Jun 2019

      On Wed, 19 Jun 2019 at 21:23, <adamv0025@netconsultings.com> wrote:
...
Hi folks,
Recently I ran into a peculiar situation where we had to cap a couple of PEs
even though merely half of the rather big chassis was populated with
cards, the reason being that the central RE/RP was not able to cope with the
combined number of routes/VRFs/BGP sessions/etc.
So this made me think about the best strategy in building out SP-Edge
nowadays (yes I'm aware of the centralize/decentralize pendulum swinging
every couple of years).
The conclusion I came to was that *currently the best approach would be to
use several medium to small (fixed) PEs to replace a big monolithic
chassis-based system.
So what I was thinking is,
Yes it will cost a bit more (a router is more expensive than an LC)
Will end up with more prefixes in IGP, more BGP sessions etc. -don't care.
But the benefits are fewer eggs in one basket, simplified and hence faster
testing in case of specialized PEs and obviously better RP CPU/MEM to port
ratio.
Am I missing anything please?
*currently,
Yes some old chassis systems or even multi-chassis systems used to support
additional RPs and offloading some of the processes (e.g. BGP) onto those
-problem is these are custom hacks and still a single OS which needs
rebooting LC/ASICs when being upgraded -so the problem of too many eggs in
one basket still exists (yes Cisco NCS6k and recent ASR9k Lightspeed LCs are
an exception)
And yes there is the "node-slicing" approach from Juniper where one can
offload CP onto multiple x86 servers and assign LCs to each server (virtual
node) - which would solve my chassis full problem -but honestly how many of
you are running such a setup? Exactly. And that's why I'd be hesitant to
deploy this solution in production just yet. I don't know of any other
vendor solution like this one, but who knows maybe in 5 years this is going
to be the new standard. Anyways I need a solution/strategy for the next 3-5
years.
Would like to hear what are your thoughts on this conundrum.
adam
netconsultings.com
::carrier-class solutions for the telecommunications industry::
Hi Adam,

Over the years I have been bitten multiple times by having fewer big
routers with either far too many services/customers connected to them
or too much traffic going through them. These days I always go for
more, smaller routers rather than fewer, larger ones.

One thing I have learned is that when there is an outage on a large
PE, even one that still has spare capacity, the business impact
can be too much to handle (the support desk is overwhelmed, customers
become irate if you can't quickly tell them which services are
impacted and when service will be restored, the NMS has so many
alarms it’s not clear what the problem is or where it's coming from,
etc.).
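
To put rough numbers on the impact argument above, here is a minimal
sketch. The customer and PE counts are made-up illustrative figures, not
from any real network, and it assumes customers are spread evenly across
PEs and that a single PE fails at a time.

```python
from dataclasses import dataclass


@dataclass
class EdgeDesign:
    """Hypothetical edge design; all figures are illustrative assumptions."""
    name: str
    pe_count: int
    total_customers: int

    def customers_per_pe(self) -> float:
        # Assumes customers are spread evenly across all PEs.
        return self.total_customers / self.pe_count

    def blast_radius(self) -> float:
        # Fraction of all customers affected when exactly one PE fails.
        return 1 / self.pe_count


# Same (made-up) customer base, two different ways of building the edge.
few_large = EdgeDesign("few large PEs", pe_count=4, total_customers=2000)
many_small = EdgeDesign("many small PEs", pe_count=20, total_customers=2000)

for design in (few_large, many_small):
    print(
        f"{design.name}: {design.customers_per_pe():.0f} customers per PE, "
        f"{design.blast_radius():.0%} of customers hit by one PE outage"
    )
```

The point is only that the number of customers calling the support desk
during a single PE outage scales with customers per PE, regardless of how
much spare capacity the box still has.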

I’ve seen networks place a change freeze on devices, with the exception
of changes that migrate customers or services off of the PE, because
any outage would create too great an impact to the business, or risk
customers terminating their contracts. I’ve also seen change
freezes placed upon large PEs because the complexity was too great:
trying to work out the impact of a change on one of the original PEs
from when the network was first built, which is somehow linked to
virtually every service on the network in some obscure and
unforeseeable way, is simply too hard.

This doesn’t mean there isn’t a place for large routers. For example,
in a typical network, by the time we get to the P nodes layer in the
core we tend to have high levels of redundancy, i.e. any PE is
dual-homed to two or more P nodes and will have 100% redundant
capacity. Down at the access layer customers may be connected to a
single access layer device or the access layer device might have a
single backhaul link. So technically we have lots of customers,
services and traffic passing through larger P node devices, but these
devices have a low rate of changes / low touch, perform a low number
of functions, they are operationally simple, and are highly redundant.
Conversely, at the service edge, which I guess is your main concern
here, I’m all about more, smaller devices, each dedicated to a single
service.

I’ve tried to write some of my experiences here
(https://null.53bits.co.uk/index.php?page=few-larger-routers-vs.-many-smaller...).
The tl;dr version though is that there’s rarely a technical
restriction on having fewer large routers; it’s an
operational/business impact problem.

I'd like to hear from anyone who has had great success with fewer larger PEs.

Cheers,
James.
