few big monolithic PEs vs many small PEs
Hi folks,

Recently I ran into a peculiar situation where we had to cap a couple of PEs even though merely half of the rather big chassis was populated with cards, the reason being that the central RE/RP was not able to cope with the combined number of routes/VRFs/BGP sessions/etc. So this made me think about the best strategy for building out the SP edge nowadays (yes, I'm aware of the centralize/decentralize pendulum swinging every couple of years).

The conclusion I came to was that *currently the best approach would be to use several medium to small (fixed) PEs to replace a big monolithic chassis-based system.

So what I was thinking is:
Yes, it will cost a bit more (a router is more expensive than an LC).
We will end up with more prefixes in IGP, more BGP sessions, etc. -don't care.
But the benefits are fewer eggs in one basket, simplified and hence faster testing in the case of specialized PEs, and obviously a better RP CPU/MEM-to-port ratio.
Am I missing anything, please?

*currently:
Yes, some old chassis systems or even multi-chassis systems used to support additional RPs and offloading some of the processes (e.g. BGP) onto those -the problem is these are custom hacks, and it's still a single OS which needs rebooting of LCs/ASICs when being upgraded -so the problem of too many eggs in one basket still exists (yes, Cisco NCS6k and the recent ASR9k Lightspeed LCs are an exception).
And yes, there is the "node-slicing" approach from Juniper, where one can offload the CP onto multiple x86 servers and assign LCs to each server (virtual node) -which would solve my chassis-full problem -but honestly, how many of you are running such a setup? Exactly. And that's why I'd be hesitant to deploy this solution in production just yet. I don't know of any other vendor solution like this one, but who knows, maybe in 5 years this is going to be the new standard. Anyways, I need a solution/strategy for the next 3-5 years.

Would like to hear your thoughts on this conundrum.

adam
netconsultings.com
::carrier-class solutions for the telecommunications industry::
Hi Adam,

Depends on how big of a router you need for your "small PE". Taking Juniper as an example, the MX204 is pretty unbeatable cost-wise if you can make do with its 4*QSFP28 & 8*SFP+ interfaces. There's a very big gap between the MX204 and the first chassis-based router in the MX lineup, even if you only try to replicate the port configuration at first.

Best regards,
Martijn

PS, take note of the MX204 port profiles, not every combination of interface speeds is possible: https://apps.juniper.net/home/port-checker/
On Wed, 19 Jun 2019 at 23:25, <adamv0025@netconsultings.com> wrote:
The conclusion I came to was that *currently the best approach would be to use several medium to small (fixed) PEs to replace a big monolithic chassis-based system.
For availability I think the best approach is to do many small edge devices. Because software is terrible, and will always be terrible. People are bad at operating the devices and always will be. Hardware is something we think about a lot when we think about redundancy, but it's not that common a reason for an outage. With more, smaller boxes the inevitable human cockup and software defects will affect fewer customers. Why I believe this to be true is that the events are sufficiently rare, and once they happen we find a solution, or at the very least a workaround, rather fast. Assuming full independence you could argue that having A3 versus B1+B2 is the same amount of aggregate outage: while an outage in B affects fewer customers, there are two B nodes, each with an equal probability of outage. But I argue that the events are not independent, they are dependent, so the probability calculation isn't straightforward. Once we get some rare software defect or operator mistake on B1, we usually solve it before it triggers on B2, making the aggregate downtime of the entire system lower.
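To put that dependence argument in rough numbers, here is a toy back-of-envelope sketch (all counts and probabilities are made-up assumptions for illustration, not measurements): if a defect that is relevant to your network would sooner or later trigger on every node running that code, then finding the workaround after the first small-PE incident spares the second one.

```python
import random

# Toy model of the dependence argument above. All numbers are made-up
# assumptions: N_DEFECTS latent software defects / operator mistakes exist
# over the life of the platform; with probability P_RELEVANT a given defect
# is actually relevant to this network (config, traffic, procedure) and
# would sooner or later trigger on every node running that code.
N_DEFECTS = 200
P_RELEVANT = 0.05
OUTAGE_H = 4          # hours of downtime per triggered incident
TRIALS = 5000

def avg_customer_outage_hours(share_fix: bool):
    """Return (big box A3 with 100% of customers, two boxes B1+B2 with 50% each)."""
    a_sum = b_sum = 0.0
    for _ in range(TRIALS):
        for _ in range(N_DEFECTS):
            if random.random() >= P_RELEVANT:
                continue
            # Scenario A: the single big node takes the hit, everyone is down.
            a_sum += OUTAGE_H
            # Scenario B: the defect hits B1 first; if the workaround is found
            # before it triggers on B2, B2 is spared.
            hits_on_b = 1 if share_fix else 2
            b_sum += OUTAGE_H * 0.5 * hits_on_b
    return a_sum / TRIALS, b_sum / TRIALS

print("independent (no fix sharing):", avg_customer_outage_hours(False))
print("dependent (fix shared first):", avg_customer_outage_hours(True))
```

With no fix sharing the two scenarios come out roughly equal in customer-weighted downtime; with the fix shared before the second node is hit, the two-box design roughly halves it.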
Yes, it will cost a bit more (a router is more expensive than an LC).
Several of my employers have paid only for the LC. I don't think the CAPEX difference is meaningful, but operating two separate devices may have significant OPEX implications in electricity, rack space, provisioning, maintenance, etc.
And yes, there is the "node-slicing" approach from Juniper, where one can offload the CP onto multiple x86 servers and assign LCs to each server (virtual node) -which would solve my chassis-full problem -but honestly, how many of you are running such a setup? Exactly. And that's why I'd be hesitant to deploy this solution in production just yet. I don't know of any other vendor solution like this one, but who knows, maybe in 5 years this is going to be the new standard. Anyways, I need a solution/strategy for the next 3-5 years.
Node slicing indeed seems like it can be a sufficient compromise here between OPEX and availability. I believe (not know) that the shared software risks are meaningfully reduced and that bringing down the whole system is sufficiently rare to allow an availability upside compared to a single large box. -- ++ytti
hey,
For availability I think the best approach is to do many small edge devices.
This is also great for planned maintenance. ISSU has not really worked out for any of the vendors and with two small devices you can upgrade them independently. Great for aggregation, enables you to dual-home access devices into two separate PEs that will never be down at the same time be it failure or planned maintenance (excluding the physical issues like power/cooling but dual-homing to two separate sites is always problematic for eyeball networks). -- tarko
Hey,
From: Tarko Tikan Sent: Thursday, June 20, 2019 8:28 AM
hey,
For availability I think the best approach is to do many small edge devices.
This is also great for planned maintenance. ISSU has not really worked out for any of the vendors and with two small devices you can upgrade them independently.
Yup, I guess no one is really using ISSU in production, and even with ISSU, currently, most of the NPUs on the market need to be power-cycled to load a new version of microcode, so there's packet loss on the data plane anyway.
Great for aggregation, enables you to dual-home access devices into two separate PEs that will never be down at the same time be it failure or planned maintenance (excluding the physical issues like power/cooling but dual-homing to two separate sites is always problematic for eyeball networks).
Actually this is an interesting point you just raised.

(Note: the assumption for the below is single-homed customers, as a dual-homed customer would probably want to be at least site-diverse and pay a premium for that service.)

So what is the primary goal of us using the aggregation/access layer? It's to achieve better utilization of the expensive router ports right? (hence called aggregation) And indeed there are cases where we connect customers directly on to the PEs, but then it's somehow ok for a line-card to be part of just a single chassis (or a PE).

Now let's take it a step further: what if the line-card is not inside the chassis anymore, because it's a fabric-extender or a satellite card? Why, all of a sudden, would we be uncomfortable again having it be part of just a single chassis (and there are tons of satellite/extender topologies to prove that this is a real concern among operators)?

So to circle back to a standalone aggregation device -should we try and complicate the design by creating this "fabric" (PEs as "spine" and aggregation devices as "leaf") in an attempt to increase resiliency, or shall we treat each aggregation device as a unitary, indivisible part of a single PE, as if it were a card in a chassis -because if the economics worked, it would be a card in a chassis?

adam
hey,
So what is the primary goal of us using the aggregation/access layer? It's to achieve better utilization of the expensive router ports right? (hence called aggregation)
I'm in the eyeball business so saving router ports is not a primary concern. Aggregation exists to aggregate downstream access devices like DSLAMs, OLTs, etc. First of all, they have interfaces that are not available in your typical PEs. Secondly, they are physically located further downstream, closer to the customers. It is not economical or even physically possible to have an MPLS device next to every DSLAM, hence the aggregation. Eyeball network topologies are very much driven by fiber layout that might have been built 10+ years ago following TDM network best practices (rings).

Ideally (and if your market situation and finances allow this) you want your access device (or, in the PON case, perhaps even an OLT linecard) to be the only SPOF. If you now uplink this access device to a PE, the PE linecard becomes a SPOF for many access devices, let's say 40, as this is a typical port count. If you don't want this to happen you can use a second fiber pair for a second uplink, but you typically don't have fiber to a second aggregation site. So your only option is to build on the same fiber (so that's a SPOF too) to the same site. If you now uplink to the same PE, you will still lose both uplinks during software upgrades. Two devices will help with that, making aggregation upgrades invisible to customers, thus improving customer satisfaction.

Again, it very much depends on the market; here the customers get noisy if they have more than one or two planned maintenances in a year (and this is not for some premium L3VPN service but just internet).

-- tarko
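A quick back-of-envelope on the failure domains involved; the subscriber and port counts below are illustrative assumptions only, not the actual numbers from the network described above.

```python
# Back-of-envelope blast radius for the failure domains described above.
# All counts are illustrative assumptions.
SUBS_PER_ACCESS_DEV = 500        # subscribers behind one DSLAM/OLT
ACCESS_DEVS_PER_LINECARD = 40    # typical port count on the PE linecard

print("access device (or its uplink) fails :",
      SUBS_PER_ACCESS_DEV, "subs offline")
print("PE linecard fails / PE is upgraded  :",
      SUBS_PER_ACCESS_DEV * ACCESS_DEVS_PER_LINECARD, "subs offline")

# With access devices dual-homed to two separate PEs, a planned upgrade of
# either PE takes ~0 subscribers offline, at the cost of a second uplink
# (often over the same fibre, so it removes the node SPOF, not the path SPOF).
```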
" It is not economical or even physically possible to have an MPLS device next to every DSLAM, hence the aggregation." https://mikrotik.com/product/RB750r2 MSRP $39.95 I readily admit that this device isn't large enough for most cases, but you can get cheap and small MPLS routers. ----- Mike Hammett Intelligent Computing Solutions Midwest Internet Exchange The Brothers WISP ----- Original Message ----- From: "Tarko Tikan" <tarko@lanparty.ee> To: adamv0025@netconsultings.com, nanog@nanog.org Sent: Friday, June 21, 2019 2:51:20 AM Subject: Re: few big monolithic PEs vs many small PEs hey,
On 21/Jun/19 09:36, adamv0025@netconsultings.com wrote:
And indeed there are cases where we connect customers directly on to the PEs, but then it's somehow ok for a line-card to be part of just a single chassis (or a PE).
We'd typically do this for very high-speed ports (100Gbps), as it's cheaper to aggregate 10Gbps-and-slower via an Ethernet switch trunking to a router line card.
Now let's take it a step further: what if the line-card is not inside the chassis anymore, because it's a fabric-extender or a satellite card? Why, all of a sudden, would we be uncomfortable again having it be part of just a single chassis (and there are tons of satellite/extender topologies to prove that this is a real concern among operators)?
I never quite saw the use-case for satellite ports. To me, it felt like vendors trying to find ways to lock you into their revenue stream forever, as many of these architectures do not play well with the other kids. I'd rather keep it simple and have 802.1Q trunks between router line cards and affordable Ethernet switches. We are currently switching our Layer 2 aggregation ports in the data centre from Juniper to Arista, talking to a Juniper edge router. I'd have been in real trouble if I'd fallen for Juniper's satellite system, as they have a number of shortfalls in the Layer 2 space, I feel.
So to circle back to a standalone aggregation device -should we try and complicate the design by creating this "fabric" (PEs as "spine" and aggregation devices as "leaf") in an attempt to increase resiliency, or shall we treat each aggregation device as a unitary, indivisible part of a single PE, as if it were a card in a chassis -because if the economics worked, it would be a card in a chassis?
See my previous response to you. Mark.
From: Mark Tinka Sent: Friday, June 21, 2019 9:07 AM
On 21/Jun/19 09:36, adamv0025@netconsultings.com wrote:
And indeed there are cases where we connect customers directly on to the PEs, but then it's somehow ok for a line-card to be part of just a single chassis (or a PE).
We'd typically do this for very high-speed ports (100Gbps), as it's cheaper to aggregate 10Gbps-and-slower via an Ethernet switch trunking to a router line card.
Now let's take it a step further: what if the line-card is not inside the chassis anymore, because it's a fabric-extender or a satellite card? Why, all of a sudden, would we be uncomfortable again having it be part of just a single chassis (and there are tons of satellite/extender topologies to prove that this is a real concern among operators)?
I never quite saw the use-case for satellite ports. To me, it felt like vendors trying to find ways to lock you into their revenue stream forever, as many of these architectures do not play well with the other kids. I'd rather keep it simple and have 802.1Q trunks between router line cards and affordable Ethernet switches.
We are currently switching our Layer 2 aggregation ports in the data centre from Juniper to Arista, talking to a Juniper edge router. I'd have been in real trouble if I'd fallen for Juniper's satellite system, as they have a number of shortfalls in the Layer 2 space, I feel.
I'd actually like to hear more on that if you don't mind.
So to circle back to a standalone aggregation device -should we try and complicate the design by creating this "fabric" (PEs as "spine" and aggregation devices as "leaf") in an attempt to increase resiliency, or shall we treat each aggregation device as a unitary, indivisible part of a single PE, as if it were a card in a chassis -because if the economics worked, it would be a card in a chassis?
See my previous response to you.
You actually haven't answered the question, I'm afraid :) So would you connect the Juniper, now Arista, aggregation switch to at least two PEs in the POP (or to all PEs in the POP, "fabric-style"), or would you consider a 1:1 mapping between an aggregation switch and a PE, please?

adam
On 21/Jun/19 10:46, adamv0025@netconsultings.com wrote:
I'd actually like to hear more on that if you don't mind.
What part, Juniper's Ethernet switching portfolio?
You actually haven't answered the question, I'm afraid :) So would you connect the Juniper, now Arista, aggregation switch to at least two PEs in the POP (or to all PEs in the POP, "fabric-style"), or would you consider a 1:1 mapping between an aggregation switch and a PE, please?
Each edge router connects to its own aggregation switch (one or more, depending on the number of ports required). The outgoing EX4550's we used were set up in a VC for ease of management when we needed more ports on a router-switch pair. But since Arista don't support VC's, each switch would have an independent port to the edge router. Based upon experience with VC's and the EX4550, that's not necessarily a bad thing, as what you provision and what you actually get and can use are totally different things.

We do not dual-home aggregation switches to edge routers; that's just asking for STP issues (which we once faced when we thought we should be fancy and provide VRRP services between 2 edge routers and their associated aggregation switches).

Mark.
I was reading this and thought, ....planet earth is a single point of failure. ...but, I guess we build and design and connect as much redundancy (logic, hw, sw, power) as the customer requires and pays for.... and that we can truly accomplish. -Aaron
On Fri, Jun 21, 2019 at 09:01:38AM -0500, Aaron Gould wrote:
I was reading this and thought, ....planet earth is a single point of failure.
...but, I guess we build and design and connect as much redundancy (logic, hw, sw, power) as the customer requires and pays for.... and that we can truly accomplish.
Fate sharing is also an important concept in system design.
On 6/21/19 10:01 AM, Aaron Gould wrote:
I was reading this and thought, ....planet earth is a single point of failure.
...but, I guess we build and design and connect as much redundancy (logic, hw, sw, power) as the customer requires and pays for.... and that we can truly accomplish.
-Aaron
I don't know about you, but we keep two earths in active/standby. Sure, the power requirements are through the roof, but hey -- it's worth it.
Hey Saku,
From: Saku Ytti <saku@ytti.fi> Sent: Thursday, June 20, 2019 7:04 AM
On Wed, 19 Jun 2019 at 23:25, <adamv0025@netconsultings.com> wrote:
The conclusion I came to was that *currently the best approach would be to use several medium to small (fixed) PEs to replace a big monolithic chassis-based system.
For availability I think the best approach is to do many small edge devices. Because software is terrible, and will always be terrible. People are bad at operating the devices and always will be. Hardware is something we think about a lot when we think about redundancy, but it's not that common a reason for an outage. With more, smaller boxes the inevitable human cockup and software defects will affect fewer customers. Why I believe this to be true is that the events are sufficiently rare, and once they happen we find a solution, or at the very least a workaround, rather fast. Assuming full independence you could argue that having A3 versus B1+B2 is the same amount of aggregate outage: while an outage in B affects fewer customers, there are two B nodes, each with an equal probability of outage. But I argue that the events are not independent, they are dependent, so the probability calculation isn't straightforward. Once we get some rare software defect or operator mistake on B1, we usually solve it before it triggers on B2, making the aggregate downtime of the entire system lower.
Yup, I agree. Just on the human cockups though, we're putting more and more automation in to help address the problem of human imperfections. But automation can actually go both ways; some say it helps with the small day-to-day problems but occasionally creates a massive one. So considering the B1 & B2 correlation: if operations on these are automated then, depending on how the automation system is designed/operated, one might not get the chance to reflect/assess on B1 before B2 is touched -so this might further complicate the equation for the aggregate system downtime computation.
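For illustration, a minimal sketch of a rollout that keeps the assess-B1-before-B2 property; the helper functions and the soak value are hypothetical placeholders, not any particular automation product.

```python
import time

# Hypothetical sketch only: a staged rollout that preserves the "assess B1
# before touching B2" property. upgrade_node() and node_is_healthy() stand in
# for whatever the real automation calls; SOAK_SECONDS is an assumed policy
# value (hours or days in practice, shortened here).
SOAK_SECONDS = 5

def upgrade_node(node: str) -> None:
    print(f"upgrading {node} ...")        # placeholder for the actual push

def node_is_healthy(node: str) -> bool:
    return True                           # placeholder for real telemetry checks

def staged_rollout(nodes: list) -> None:
    for node in nodes:
        upgrade_node(node)
        time.sleep(SOAK_SECONDS)          # soak: give defects a chance to show on B1
        if not node_is_healthy(node):
            raise RuntimeError(f"{node} unhealthy after upgrade, halting rollout")
    print("rollout complete")

staged_rollout(["B1", "B2"])
```

If the automation pushes B1 and B2 back-to-back with no soak/health gate, the dependence benefit described earlier largely disappears.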
Yes, it will cost a bit more (a router is more expensive than an LC).
Several of my employers have paid only for the LC. I don't think the CAPEX difference is meaningful, but operating two separate devices may have significant OPEX implications in electricity, rack space, provisioning, maintenance, etc.
And yes, there is the "node-slicing" approach from Juniper, where one can offload the CP onto multiple x86 servers and assign LCs to each server (virtual node) -which would solve my chassis-full problem -but honestly, how many of you are running such a setup? Exactly. And that's why I'd be hesitant to deploy this solution in production just yet. I don't know of any other vendor solution like this one, but who knows, maybe in 5 years this is going to be the new standard. Anyways, I need a solution/strategy for the next 3-5 years.
Node slicing indeed seems like it can be a sufficient compromise here between OPEX and availability. I believe (not know) that the shared software risks are meaningfully reduced and that bringing down the whole system is sufficiently rare to allow an availability upside compared to a single large box.
I tend to agree, though as you say it's a compromise nevertheless. If one needs to switch to a new version of the fabric in order to support new line-cards, or upgrade code on the base system for that matter, the whole thing (NFVI) needs to be power-cycled.

adam
On Fri, 21 Jun 2019 at 10:09, <adamv0025@netconsultings.com> wrote:
Just on the human cockups though, we're putting more and more automation in to help address the problem of human imperfections.
With automation we break far, far less often, but when we do, we break far, far more. MTTR is also increased due to skill rot: in a CLI-jockey network you break something every day and have to troubleshoot and fix it, so even fixing complex problems becomes routine. With automation years may pass without complex outages; when they happen, people panic and are less able to act logically and focus on a single problem. I am absolutely PRO automation. But I'm saying there is a cost. -- ++ytti
On 19/Jun/19 22:22, adamv0025@netconsultings.com wrote:
Yes, it will cost a bit more (a router is more expensive than an LC).
I found the reverse to be true... chassis are cheap. Line cards are costly.
Would like to hear your thoughts on this conundrum.
So this depends on where you want to deliver your service, and the function, in my opinion.

If you are talking about an IP/MPLS-enabled Metro-E network, then having several, smaller routers spread across one or more rings is cheaper and more effective.

If you are delivering services to large customers from within a data centre, large edge routers make more sense, particularly given the rising costs of co-location.

If you are providing BNG services, it depends on how you want to balance ease of management vs. scale vs. cost. If you have the cash to spend, de-centralizing your BNG's across a region/city/country will give you more scale and better redundancy, but could be more costly depending on your per-box sizing as well as an increase in management time. If you want to improve management, you can have fewer boxes to cover large parts of your region/city/country. But this may mean buying a very large box to concentrate more users in fewer places.

If you are trying to combine Enterprise, Service Provider and Consumer services in one chassis, well, as the saying goes, "If you are competitor, I approve of this message" :-).

Mark.
Hey Mark,
From: Mark Tinka Sent: Thursday, June 20, 2019 3:27 PM
On 19/Jun/19 22:22, adamv0025@netconsultings.com wrote:
Yes, it will cost a bit more (a router is more expensive than an LC).
I found the reverse to be true... chassis are cheap. Line cards are costly.
Well, yes, but if, say, I compare just a single line-card cost to a standalone fixed-format 1RU router with similar capacity, the card will always be cheaper, and then as I start adding cards on the left-hand side of the equation things should start to even out gradually (problem is, this gradual increase is just a theoretical exercise -there are no fixed PE products to do this with). Yes, I can compare an MPC7 with an MX204, or an ASR9901 with some Tomahawk card(s) -probably not apples to apples though?

But if I were to venture above 1/2RU then I'm back in chassis-based systems, paying extra for REs/RPs and fabric and fans and PSUs with every small PE I'm putting in, so then it's a question of adding two new cards to an existing chassis versus adding two new cards to a new chassis.

Also, one interesting CAPEX factor to consider is the connectivity back to the core: with many small PEs in a POP one would need a lot of ports on the core routers, and the aggregation factor is once again somewhat lost in doing so. Where I'd have just a couple of PEs with 100G back to the core, now I'd need a bunch of 10s (bundled) or 40s -and would probably need additional cards in the core routers to accommodate the need for PE ports in the POP.
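To make the port/cost trade-off concrete, a back-of-envelope sketch; every price and port count in it is a made-up assumption purely for illustration, not a quote for any real product.

```python
import math

# Back-of-envelope: "one big chassis + linecards" vs "many small fixed PEs",
# including the extra core-facing ports the small-PE design needs. Every
# price and port count below is a made-up assumption purely for illustration.
edge_ports_needed = 96                     # 10G customer-facing ports in the POP
core_port_price = 8_000                    # per 100G core-router port

big_chassis = {
    "chassis_re_fabric": 150_000,
    "linecard_price": 80_000, "ports_per_linecard": 48,
    "core_uplinks": 2,                     # 2x100G back to the core for the chassis
}
small_pe = {
    "router_price": 45_000, "ports_per_router": 16,
    "core_uplinks_per_router": 2,          # each small PE needs its own uplinks
}

lc_count = math.ceil(edge_ports_needed / big_chassis["ports_per_linecard"])
big_cost = (big_chassis["chassis_re_fabric"]
            + lc_count * big_chassis["linecard_price"]
            + big_chassis["core_uplinks"] * core_port_price)

pe_count = math.ceil(edge_ports_needed / small_pe["ports_per_router"])
small_cost = (pe_count * small_pe["router_price"]
              + pe_count * small_pe["core_uplinks_per_router"] * core_port_price)

print(f"big chassis: {lc_count} LCs, {big_chassis['core_uplinks']} core ports -> ${big_cost:,}")
print(f"small PEs  : {pe_count} routers, "
      f"{pe_count * small_pe['core_uplinks_per_router']} core ports -> ${small_cost:,}")
```

With these assumed numbers the small-PE option comes out somewhat more expensive, and the core-facing port count is where a lot of that difference hides.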
Would like to hear your thoughts on this conundrum.
So this depends on where you want to deliver your service, and the function, in my opinion.
If you are talking about an IP/MPLS-enabled Metro-E network, then having several, smaller routers spread across one or more rings is cheaper and more effective.
Well, playing devil's advocate, having the metro rings built as dumb L1 or L2 with a pair of PEs at the top is cheaper -although not much cheaper nowadays; the economics in this sector have changed significantly over the past years.
If you are delivering services to large customers from within a data centre, large edge routers make more sense, particularly given the rising costs of co-location.
So this particular case, the major POPs, is actually where we ran into the problem of the RE/RP becoming full (too many VRFs/routes/BGP sessions) halfway through the chassis. Hence I'm considering whether it's actually better to go with multiple small chassis and/or fixed-form PEs in the rack as opposed to a half/full-rack chassis.

adam
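The kind of back-of-envelope control-plane budgeting behind that conclusion might look like the sketch below; all the per-VRF/route/session constants are illustrative assumptions, the real limits come from vendor guidance and your own scale testing.

```python
# Rough control-plane budgeting: how many "average customers" fit behind one
# RE before memory runs out, vs how many the chassis could physically take.
# All constants are illustrative assumptions.
RE_MEMORY_MB = 32_000            # usable RIB/process memory on the RE
MB_PER_VRF = 2.0
MB_PER_1K_ROUTES = 1.5
MB_PER_BGP_SESSION = 0.5

VRFS_PER_CUSTOMER = 1            # assumed "average customer" profile
ROUTES_PER_CUSTOMER = 4_000
SESSIONS_PER_CUSTOMER = 2

mb_per_customer = (VRFS_PER_CUSTOMER * MB_PER_VRF
                   + ROUTES_PER_CUSTOMER / 1000 * MB_PER_1K_ROUTES
                   + SESSIONS_PER_CUSTOMER * MB_PER_BGP_SESSION)

customers_per_re = int(RE_MEMORY_MB * 0.7 / mb_per_customer)   # keep 30% headroom

CUSTOMERS_PER_LINECARD = 400
SLOTS = 12
print(f"one RE carries ~{customers_per_re} customers "
      f"(~{customers_per_re / CUSTOMERS_PER_LINECARD:.1f} linecards' worth "
      f"out of {SLOTS} slots)")
```

With these assumed numbers the RE runs out of headroom at roughly half the slots, which is exactly the "chassis capped at 50% populated" situation described at the start of the thread.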
On 21/Jun/19 10:32, adamv0025@netconsultings.com wrote:
Well, yes, but if, say, I compare just a single line-card cost to a standalone fixed-format 1RU router with similar capacity, the card will always be cheaper, and then as I start adding cards on the left-hand side of the equation things should start to even out gradually (problem is, this gradual increase is just a theoretical exercise -there are no fixed PE products to do this with). Yes, I can compare an MPC7 with an MX204, or an ASR9901 with some Tomahawk card(s) -probably not apples to apples though?
Yes, you can't always do that because not many vendors create 1U router versions of their line cards. The MX204 is probably one of those that comes reasonably close. I'm not sure deciding whether you get an MPC7 line card or an MX204 will be a meaningful exercise. You need to determine what your use-case fits.

For example, rather than buy MPC7 line cards to support 100Gbps customers in our MX480's, it is easier to buy an MX10003. That way, we can keep the MPC2 line cards in the MX480 chassis to support up to N x 10Gbps of customer links (aggregated to an Ethernet switch, of course) and not pay the cost of trying to run 100Gbps services through the MX480. The MX10003 would then be dedicated for 100Gbps customers (and 40Gbps), meaning we can manage the ongoing operational costs of each type of customer for a specific box.

We have thought about using MX204's to support 40Gbps and 100Gbps customers, but there aren't enough ports on it for it to make sense, particularly given those types of customers will want the routers they connect to to have some kind of physical redundancy, which the MX204 does not have.

Our use-case for the MX204 is:
- Peering.
- Metro-E deployments for customers needing 10Gbps in the Access.
Also, one interesting CAPEX factor to consider is the connectivity back to the core: with many small PEs in a POP one would need a lot of ports on the core routers, and the aggregation factor is once again somewhat lost in doing so. Where I'd have just a couple of PEs with 100G back to the core, now I'd need a bunch of 10s (bundled) or 40s -and would probably need additional cards in the core routers to accommodate the need for PE ports in the POP.
Yes, that's not a small issue to scoff at, and you raise a valid concern that could be easily overlooked if you adopted several smaller edge routers in the data centre in favour of fewer large ones. That said, you could do what we do and have a Layer 2 core switching network, where you aggregate all routers in the data centre, so that you are not running point-to-point links between routers and your core boxes. For us, because of this, we still have plenty of slots left in our CRS-8 chassis 5 years after deploying them, even though we are supporting several 100's of Gbps worth of downstream router capacity.
Well, playing devil's advocate, having the metro rings built as dumb L1 or L2 with a pair of PEs at the top is cheaper -although not much cheaper nowadays; the economics in this sector have changed significantly over the past years.
A dumb Metro-E access with all the smarts in the core is cheap to build, but expensive to operate. You can't run away from the costs. You just have to decide whether you want to pay costs in initial cash or in long-term operational headache.
So this particular case, the major POPs, is actually where we ran into the problem of the RE/RP becoming full (too many VRFs/routes/BGP sessions) halfway through the chassis. Hence I'm considering whether it's actually better to go with multiple small chassis and/or fixed-form PEs in the rack as opposed to a half/full-rack chassis.
Are you saying that even the fastest and biggest control plane on the market for your chassis is unable to support your requirements (assuming their cost did not stop you from looking at them in the first place)? Mark.
From: Mark Tinka <mark.tinka@seacom.mu> Sent: Friday, June 21, 2019 1:27 PM
On 21/Jun/19 10:32, adamv0025@netconsultings.com wrote:
So this particular case, the major POPs, is actually where we ran into the problem of the RE/RP becoming full (too many VRFs/routes/BGP sessions) halfway through the chassis. Hence I'm considering whether it's actually better to go with multiple small chassis and/or fixed-form PEs in the rack as opposed to a half/full-rack chassis.
Are you saying that even the fastest and biggest control plane on the market for your chassis is unable to support your requirements (assuming their cost did not stop you from looking at them in the first place)?
I believe it would, for a time, but it would require a SW upgrade, testing, etc. Even newer SW in itself gave us better resource management and performance optimizations. However, even with a powerful CP and streamlined SW we'd still just be buying time while pushing the envelope. Hence decentralization at the edge seems like a natural strategy to exit the ouroboros paradigm.

adam
On 27/Jun/19 14:03, adamv0025@netconsultings.com wrote:
I believe it would, for a time, but it would require a SW upgrade, testing, etc. Even newer SW in itself gave us better resource management and performance optimizations. However, even with a powerful CP and streamlined SW we'd still just be buying time while pushing the envelope. Hence decentralization at the edge seems like a natural strategy to exit the ouroboros paradigm.
Well, this is one area where I can't meaningfully add value, since you know your environment better than anyone else on this list. Mark.
I've run into many providers where they had routers in the top 10 or 15 markets... and that was it. If you wanted a connection in South Bend or Indianapolis or New Orleans or Ohio or... you were backhauled potentially hundreds of miles to a nearby big market.

More, smaller POPs reduce the tromboning.

More, smaller POPs mean that one POP's outage isn't as disastrous for the traffic rerouting around it.

-----
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP
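As a rough sense of what that backhaul costs in latency alone (the fibre speed and route factor below are illustrative assumptions):

```python
# Rough added round-trip time from backhauling a customer hundreds of miles
# to the nearest "big market" POP and back. Distances and the route factor
# are illustrative assumptions.
C_IN_FIBER_KM_PER_MS = 200       # ~2/3 of c, i.e. roughly 200 km per millisecond
ROUTE_FACTOR = 1.4               # fibre rarely follows the straight line

for miles in (100, 300, 600):
    km = miles * 1.609 * ROUTE_FACTOR
    rtt_ms = 2 * km / C_IN_FIBER_KM_PER_MS
    print(f"{miles:>4} mile backhaul ~ {rtt_ms:.1f} ms extra RTT")
```

And that is one direction of the trombone; traffic between two locally connected parties in the same small market pays it twice.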
On 28/Jun/19 01:23, Mike Hammett wrote:
I've run into many providers where they had routers in the top 10 or 15 markets... and that was it. If you wanted a connection in South Bend or Indianapolis or New Orleans or Ohio or... you were backhauled potentially hundreds of miles to a nearby big market.
More, smaller POPs reduce the tromboning.
More, smaller POPs mean that one POP's outage isn't as disastrous for the traffic rerouting around it.
I really dislike centralized routing. Mark.
Big routers also mean they're a lot more expensive. You have to squeeze more life out of them because they cost you hundreds of thousands of dollars. You run them longer than you really should. If you run more, smaller, $20k or $30k routers, you'll replace them on a more reasonable cycle.

-----
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP
participants (9):
- Aaron Gould
- adamv0025@netconsultings.com
- Anderson, Charles R
- Bryan Holloway
- i3D.net - Martijn Schmidt
- Mark Tinka
- Mike Hammett
- Saku Ytti
- Tarko Tikan