RE: few big monolithic PEs vs many small PEs
From: James Bensley <jwbensley@gmail.com> Sent: Thursday, June 27, 2019 9:56 AM
One experience I have had is that when there is an outage on a large PE, even when it still has spare capacity, the business impact can be too much to handle (the support desk is overwhelmed, customers become irate if you can't quickly tell them what all the impacted services are or when service will be restored, the NMS has so many alarms it's not clear what the problem is or where it's coming from, etc.).
I see what you mean. My hope is to address these challenges with a "single source of truth" provisioning system that will include, among other things, a HW-to-customer/service mapping, so the Ops team will be able to say that if a particular line card X fails then customers/services X, Y, Z will be affected. But yes, I agree that with smaller PEs any failure fallout is proportionally smaller.
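A minimal sketch of what such a HW-to-service mapping could look like in practice; all the PE, line-card, and service names below are invented for illustration:

```python
# Hypothetical "single source of truth" lookup: a flat mapping from
# physical assets (PE, line card) to the services that depend on them,
# so Ops can answer "what breaks if line card X on PE Y fails?".
hw_service_map = {
    ("pe1", "lc0"): ["cust-a-l3vpn", "cust-b-l2vpn"],
    ("pe1", "lc1"): ["cust-c-internet"],
    ("pe2", "lc0"): ["cust-a-l3vpn"],  # customer A's backup leg
}

def impacted_services(pe: str, lc: str) -> list:
    """Return the services affected by a failure of line card `lc` on `pe`."""
    return hw_service_map.get((pe, lc), [])

print(impacted_services("pe1", "lc0"))  # -> ['cust-a-l3vpn', 'cust-b-l2vpn']
```

The hard part, as the rest of the thread points out, is not the lookup itself but keeping such a mapping accurate and complete.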
This doesn't mean there isn't a place for large routers. For example, in a typical network, by the time we get to the P-node layer in the core we tend to have high levels of redundancy, i.e. any PE is dual-homed to two or more P nodes and will have 100% redundant capacity.
Exactly. While the service edge topology might be dynamic as a result of horizontal scaling, the core, on the other hand, should be fairly static and scaled vertically. That is, I wouldn't want to scale core routers horizontally and, as a result, have the core topology changing with every P scale-out iteration at any POP; that would be bad news for capacity planning and traffic engineering...
I've tried to write up some of my experiences here: https://null.53bits.co.uk/index.php?page=few-larger-routers-vs.-many-smaller-routers. The tl;dr version, though, is that there's rarely a technical restriction to having fewer, larger routers; it's an operational/business-impact problem.
I'll give it a read, cheers. adam
On Thu, 27 Jun 2019 at 12:46, <adamv0025@netconsultings.com> wrote:
Hi Adam,

My experience is that it is much more complex than that (although it also depends on what sort of service you're offering); one can't easily model the inter-dependency between multiple physical assets like links, interfaces, line cards, racks, DCs etc. and logical services such as VRFs/L3VPNs, cloud-hosted proxies and the P&T edge.

Consider this, in my opinion, relatively simple example: three PEs in a triangle. Customer is dual-homed to PE1 and PE2, and their link to PE1 is their primary/active link. Transit is dual-homed to PE2 and PE3, and your hosted filtering service cluster is also dual-homed to PE2 and PE3 to be near the Internet connectivity. How will you record the inter-dependency that an outage on PE3 impacts Customer? When Customer sends traffic to PE1 (let's say all their operations are hosted in a public cloud provider), and PE1 has learned the shortest path to 0/0 or ::0/0 from PE2, the Internet traffic is sent from PE1 to PE2, and from PE2 into your filtering cluster. When the traffic comes back into PE2 after passing through the filters, it is then sent to PE3, because the transit provider attached to PE3 has a better route to Customer's destination (AWS/Azure/GCP/whatever) than the one directly attached to PE2.

That to me is a simple scenario, and it can be mapped with a dependency tree. But in my experience, and maybe it's just me, things are usually a lot more complicated than this. The root cause is probably a bad design introducing too much complexity, which is another vote for smaller PEs from me. With more service-dedicated PEs one can reduce or remove the possibility of piling multiple services and more complexity onto the same PE(s).

Most places I've seen (managed service providers) simply can't map the complex inter-dependencies between the physical and logical infrastructure they have without some super-bespoke and also complex asset management / CMDB / CI system.

Cheers,
James.
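The triangle scenario can be illustrated with a toy model: a naive dependency record built from physical homing alone says Customer only depends on PE1 and PE2, while walking the actual forwarding path described above reveals the hidden dependency on PE3. All node names here are invented for illustration:

```python
# The forwarding path for Customer's outbound Internet traffic in the
# triangle scenario: in via PE1, hairpin through the filtering cluster
# on PE2, then out via the transit attached to PE3.
internet_path = ["PE1", "PE2", "filter-cluster", "PE2", "PE3", "transit-B"]

def impacted_by(failed_node: str, path: list) -> bool:
    """A flow is impacted if any hop on its forwarding path fails."""
    return failed_node in path

# Physical homing alone says Customer only touches PE1 and PE2:
customer_homing = {"PE1", "PE2"}
print("PE3" in customer_homing)           # naive model: False
print(impacted_by("PE3", internet_path))  # path-aware model: True
```

The point being that the dependency data has to be derived from (frequently changing) routing state, not just from the static physical inventory.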
On 27/Jun/19 14:48, James Bensley wrote:
That to me is a simple scenario, and it can be mapped with a dependency tree. But in my experience, and maybe it's just me, things are usually a lot more complicated than this. The root cause is probably a bad design introducing too much complexity, which is another vote for smaller PEs from me. With more service dedicated PEs one can reduce or remove the possibility of piling multiple services and more complexity onto the same PE(s).
Which is one of the reasons we - painfully to the bean counters - insist that routers are deployed per function. We won't run peering and transit services on the same router. We won't run SP and Enterprise on the same router as Broadband. We won't run supporting services (DNS, RADIUS, WWW, FTP, portals, NMS, etc.) on the same router where we terminate customers. This level of distribution, although quite costly initially, means you reduce the inter-dependency of services at the hardware level and can safely keep things apart, so that when bits fail you aren't committing other services to the same fate. Mark.
On 27 June 2019 16:31:27 BST, Mark Tinka <mark.tinka@seacom.mu> wrote:
Agreed. This has worked well for me over time. It's costly in the initial capex outlay, but these boxes will have different upgrade/capacity-increase timelines and price points, so over time everything spreads out. Massive iron upgrades require biblical business cases and epic battles to get the funds approved; periodic small-to-medium PE upgrades are nicer on the annual budget and the forecasting. Cheers, James.
From: Mark Tinka Sent: Thursday, June 27, 2019 4:31 PM
If the PEs are sufficiently small, I'd even go further and split L3VPN PEs from L2VPN PEs etc., mostly because of the streamlined/simplified HW and code certification testing. But as with all the decentralize-centralize swings, one has to strike the balance just right and weigh the aggregation pros against the too-many-eggs-in-one-basket cons. adam
On 28/Jun/19 10:35, adamv0025@netconsultings.com wrote:
If the PEs are sufficiently small I'd even go further as to L3VPNs-PE vs L2VPNs-PE services etc..., it's mostly because of streamlined/simplified hw and code certification testing. But as with all the decentralize-centralize swings one has to strike the balance just right and weight the aggregation pros against too many eggs in one basket cons.
On the VPN side, we sell more l2vpn than l3vpn. In fact, I don't believe we've actually sold an l3vpn service, apart from the one we built to deliver voice services. l3vpn is a dying service in Africa; with everything in the cloud now, everybody just wants a simple IP service. Mark.
Hi James,
From: James Bensley <jwbensley+nanog@gmail.com> Sent: Thursday, June 27, 2019 1:48 PM
Hi Adam,
My experience is that it is much more complex than that (although it also depends on what sort of service you're offering), one can't easily model the inter-dependency between multiple physical assets like links, interfaces, line cards, racks, DCs etc and logical services such as a VRFs/L3VPNs, cloud hosted proxies and the P&T edge.
Consider this, in my opinion, relatively simple example: Three PEs in a triangle. Customer is dual-homed to PE1 and PE2 and their link to PE1 is their primary/active link. Transit is dual-homed to PE2 and PE3 and your hosted filtering service cluster is also dual-homed to PE2 and PE3 to be near the Internet connectivity.
I agree, the scenario you proposed seems simple but might contain a high degree of complexity in terms of traffic patterns. Thinking about this, I'd propose separating the problem into two parts.

The simpler part to solve is the physical resource allocation. This is where a hierarchical record of physical assets could give us the right answers to "what happens if this card fails?" (an example hierarchy: POP -> PE -> line card -> physical port(s) -> physical port(s) -> aggregation switch -> physical port(s) -> customer/service).

The other part of the problem is much harder and has two sub-parts:
- The first sub-part is to model the interactions between a number of protocols in order to accurately predict traffic patterns under various failure conditions. (I'd argue that this, to some extent, should be part of the design documentation and be well understood and tested during POC testing for a new design - although entropy...)
- The tricky sub-part is being able to map individual customer->service / service->customer traffic flows onto the first sub-part. (This sub-part I haven't given much thought, so I can't possibly comment.)

adam
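The simpler, physical sub-part of the problem can be sketched as a tree walk over an asset hierarchy; the topology and names below are invented for illustration:

```python
# Hypothetical asset hierarchy: POP -> PE -> line card -> either a
# customer/service (leaf) or an aggregation switch whose own ports
# lead to more customers. Walking the subtree under a line card
# answers "who is affected if this card fails?".
topology = {
    "pop-lon/pe1/lc0": ["cust-a", "agg-sw1"],
    "pop-lon/pe1/lc1": ["cust-b"],
    "agg-sw1":         ["cust-c", "cust-d"],
}

def affected_customers(asset: str) -> list:
    """Depth-first walk: expand intermediate devices, collect leaves."""
    result = []
    for child in topology.get(asset, []):
        if child in topology:      # intermediate device (e.g. agg switch)
            result.extend(affected_customers(child))
        else:                      # leaf: a customer/service
            result.append(child)
    return result

print(affected_customers("pop-lon/pe1/lc0"))  # -> ['cust-a', 'cust-c', 'cust-d']
```

The harder, logical sub-parts (protocol interactions and per-flow mapping) don't fit a static tree like this, which is exactly the difficulty James describes.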
participants (3):
- adamv0025@netconsultings.com
- James Bensley
- Mark Tinka