Hi

If I need to speak BGP with a customer that only has 1G, I will simply make an MPLS L2VPN to one of my edge routers. We use the ZTE 5952E switch with 48x 1G plus 4x 10G for the L2VPN end point. If that is not enough, the ZTE 8900 platform will provide a ton of ports that can do MPLS.

The tunnel is automatically redundant and will propagate link down events, so there is not really any downside to doing it this way on low bandwidth peers.

Regards

Baldur

On 16 Jun 2016 at 09:52, "Saku Ytti" <saku@ytti.fi> wrote:

Hey,

I've been poking around a bit trying to find a reasonable option for a 1GE L3 full BGP table aggregator. It seems vendors are mostly pushing Satellite/Fusion for this application. I don't really like the added complexity and tight coupling Satellite/Fusion forces on me. I'd prefer standards-based routing redundancy to reduce the impact of defects. ASR9001 and MX104 are not options, due to control-plane scale. New boxes in the vendor pipeline are completely ignoring 1GE. I've casually talked with other people, and it seems I'm not really alone here.

My dream box would be 96xSFP + 2xQSFP28, with pretty much full edge features (BGP, LDP, ISIS, +1M FIB, +5M RIB, per-interface VLANs, IPFIX or sFlow, at least per-port QoS with a shaper, Martini pseudowires).

With tinfoil hat tightly fit on my head, I wonder why vendors are ignoring 1GE. Are business cases now entirely driven by Amazon, Google, Facebook and the likes? Are SP volumes so insignificant in comparison that it does not make sense to produce boxes for them? Heck, even 10GE is starting to become problematic if your application is anything other than DC, because you can't choose arbitrary optics.

--
++ytti
On 16 June 2016 at 22:36, Baldur Norddahl <baldur.norddahl@gmail.com> wrote: Hey,
If I need to speak BGP with a customer that only has 1G, I will simply make an MPLS L2VPN to one of my edge routers. We use the ZTE 5952E switch with 48x 1G plus 4x 10G for the L2VPN end point. If that is not enough, the ZTE 8900 platform will provide a ton of ports that can do MPLS.
I wonder if you'd do this, if you could do L3 to the edge. And why is termination technology dependent on termination rate?
The tunnel is automatically redundant and will propagate link down events, so there is not really any downside to doing it this way on low bandwidth peers.
When you say redundant, do you mean that label can take any path between access port and termination IRB/BVI? Or do you actually have termination redundancy? If you don't have termination redundancy, you have two SPOFs, the access port and the termination. If you do have termination redundancy, you're spending control-plane resources on two devices, doubling your control-plane scale/cost. I'm not saying it's a bad solution, I know a lot of people do it. But I think people only do it because L3 at the port isn't offered by vendors at lower rates.

--
++ytti
On 16 June 2016 at 22:27, Saku Ytti <saku@ytti.fi> wrote:
On 16 June 2016 at 22:36, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hey,
If I need to speak BGP with a customer that only has 1G, I will simply make an MPLS L2VPN to one of my edge routers. We use the ZTE 5952E switch with 48x 1G plus 4x 10G for the L2VPN end point. If that is not enough, the ZTE 8900 platform will provide a ton of ports that can do MPLS.
I wonder if you'd do this, if you could do L3 to the edge. And why is termination technology dependent on termination rate?
The ZTE 5952E (routing switch) can do L3VPN including BGP. But it is limited to about 30k routes. It is usable if the customer wants a default route solution, but not if he wants the full default-free zone. The ZTE M6000S-2S4 (carrier grade router) will do all you want; however, it is more expensive. We use the MPLS routing switch because it is a $2k device, compared to the router which is more like $15k. As a small ISP we have two edge routers (the slightly larger M6000-S3, which is about $20k). Our customers are spread out throughout the city and we have 26 PoPs, so it is much more cost-effective to have the cheaper device put the traffic in a tunnel and haul it back to the big iron.
The tunnel is automatically redundant and will propagate link down events, so there is not really any downside to doing it this way on low bandwidth peers.
When you say redundant, do you mean that label can take any path between access port and termination IRB/BVI? Or do you actually have termination redundancy?
Our PoPs are connected in a ring topology (actually multiple rings). If a link goes down somewhere, or an intermediate device crashes, the L2VPN will reconfigure and find another path.
If you don't have termination redundancy, you have two SPOFs, the access port and the termination.
For a BGP customer I could offer two tunnels, one to each of our provider edge routers. But very few of our customers are BGP customers, they just want normal internet. For them we do VRRP between the two provider edge routers and have the one tunnel go to both.
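For illustration, the gateway side of that setup would look roughly like this (an IOS-classic-style VRRP sketch; addresses and group numbers are placeholders, and the actual ZTE M6000 CLI differs):

  ! PE1 - VRRP master for the customer subnet delivered over the L2VPN
  interface GigabitEthernet0/0/1.100
   encapsulation dot1Q 100
   ip address 198.51.100.2 255.255.255.0
   vrrp 10 ip 198.51.100.1
   vrrp 10 priority 110
  !
  ! PE2 - VRRP backup for the same virtual gateway, lower priority
  interface GigabitEthernet0/0/1.100
   encapsulation dot1Q 100
   ip address 198.51.100.3 255.255.255.0
   vrrp 10 ip 198.51.100.1
   vrrp 10 priority 100

Customers simply default-route to 198.51.100.1; if PE1 fails, PE2 takes over the virtual address with no change needed at the access end.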
If you do have termination redundancy, you're spending control-plane resources on two devices, doubling your control-plane scale/cost.
The M6000 devices can handle 64k tunnels and are generally way overpowered for our current business. It is true that I might be limited to 1x 64k customers instead of 2x 64k customers, but with that many customers I would need to upgrade anyway.
I'm not saying it's a bad solution, I know a lot of people do it. But I think people only do it because L3 at the port isn't offered by vendors at lower rates.
We actually moved away from a hybrid solution with L3 termination at the customer edge to simply backhauling everything in L2VPNs. We did this because the L2VPN tunnels are needed anyway for other reasons and it is easier to have one way to do things. Regards, Baldur
On 16/Jun/16 23:24, Baldur Norddahl wrote:
The ZTE 5952E (routing switch) can do L3VPN including BGP. But it is limited to about 30k routes. It is usable if the customer wants a default route solution, but not if he wants the full default-free zone.
Might be worthwhile to ask ZTE to develop their own implementation of BGP Selective Download.
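For reference, Cisco's implementation of this (which comes up again later in the thread) is roughly the following on IOS/IOS-XE; the prefix-list, route-map name and ASN are placeholders:

  ip prefix-list FIB-KEEP seq 5 permit 0.0.0.0/0
  !
  route-map BGP-SD permit 10
   match ip address prefix-list FIB-KEEP
  !
  router bgp 64500
   address-family ipv4
    table-map BGP-SD filter

With the "filter" keyword, only routes matching the route-map are downloaded into the RIB/FIB; the rest stay in the BGP table, so the box can still carry and re-advertise full feeds while holding just a default (plus whatever else you whitelist) in hardware.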
Our PoPs are connected in a ring topology (actually multiple rings). If a link goes down somewhere, or an intermediate device crashes, the L2VPN will reconfigure and find another path.
Which is what would happen anyway with your IGP if the service were delivered in the Access, but with fewer moving parts and less inter-dependence if the problem went beyond just ring failure or device crash.
For a BGP customer I could offer two tunnels, one to each of our provider edge routers. But very few of our customers are BGP customers, they just want normal internet. For them we do VRRP between the two provider edge routers and have the one tunnel go to both.
If your BGP customer-count grows, while managing 2 eBGP sessions per customer is not life-threatening, it certainly won't go unnoticed from an operational perspective, especially if you are doing this as a matter of course (for redundancy), rather than in response to a revenue-generating request by the customer to increase their SLA.
We actually moved away from a hybrid solution with L3 termination at the customer edge to simply backhauling everything in L2VPNs. We did this because the L2VPN tunnels are needed anyway for other reasons and it is easier to have one way to do things.
I've never been one to support the confluence of infrastructure tunnels with customer service tunnels. That's why we avoid infrastructure tunnels in general, e.g., creating a tunnel from a data centre to a peering point over which you will run peering traffic because the device at the data centre can't support peering, or running a tunnel between two PoPs to handle inter-PoP traffic, etc.

When you have all these infrastructure tunnels running around for their own sake, side-by-side with revenue-generating customer tunnels (like a site-to-site l2vpn you've sold to a customer), it can get hairy at scale, I think. Too much inter-dependence, too many lines coming together. But again, that's just me.

Mark.
On 18 June 2016 at 13:07, Mark Tinka <mark.tinka@seacom.mu> wrote:
Our PoPs are connected in a ring topology (actually multiple rings). If a link goes down somewhere, or an intermediate device crashes, the L2VPN will reconfigure and find another path.
Which is what would happen anyway with your IGP if the service were delivered in the Access, but with fewer moving parts and less inter-dependence if the problem went beyond just ring failure or device crash.
Is the claim about fewer moving parts actually true? Yes if you are comparing to a plain native single-stack network with IPv4 (or IPv6) directly on the wire. But we are doing MPLS, so in our case it is L2VPN vs L3VPN. Both will reroute using the exact same mechanism, so no difference here. I found that I could remove large parts of the configuration on the access edge devices when we went from L3VPN to L2VPN. Some people will find the network easier to understand when all major configuration is in only two devices, and those two devices are mostly a mirror of each other. I agree that L3VPN is the better solution, at least in principle. That is why we started by implementing L3VPN. But in practice the L2VPN solution we have now is actually easier. Regards, Baldur
On 18/Jun/16 21:31, Baldur Norddahl wrote:
Is the claim about fewer moving parts actually true? Yes if you are comparing to a plain native single-stack network with IPv4 (or IPv6) directly on the wire. But we are doing MPLS, so in our case it is L2VPN vs L3VPN. Both will reroute using the exact same mechanism, so no difference here.
I'm talking about all services. We deliver Internet Access/IP Transit, l2vpn's, and l3vpn's on the same chassis in the Access depending on what the customer wants. We are only touching one box in this case (the Access switch), and not more than one when delivering any of these services. This is what I mean by fewer moving parts - and if a problem were to occur in the Access or the core, provided that at least one path of the fibre was fine, that problem would be masked from the Access. When you have dependence between far-end devices (such as an l2vpn from the Access terminating on a centralized IP gateway for onward services), the coupling is too tight and makes things more fragile.
I found that I could remove large parts of the configuration on the access edge devices when we went from L3VPN to L2VPN. Some people will find the network easier to understand when all major configuration is in only two devices, and those two devices are mostly a mirror of each other.
While I like lean configurations as much as the next guy, you can't run away from configuration. Removing lines from one device means you add more on another. Standardization of configurations means you know what to expect (and what not to expect) regardless of the number of devices. With 2016 being all about the "automation" and "zero touch deployment" rage, it is now possible to operate the network without having to struggle with what the configuration on each device is. I'd rather invest in that than in centralized routers.
I agree that L3VPN is the better solution, at least in principle. That is why we started by implementing L3VPN. But in practice the L2VPN solution we have now is actually easier.
We don't run l3vpn's for infrastructure requirements. We only run them if a customer wants an l3vpn service. Mark.
On 2016-06-20 08:50, Mark Tinka wrote:
We don't run l3vpn's for infrastructure requirements. We only run them if a customer wants an l3vpn service. Mark.
For a long time we only had one l3vpn customer: ourselves. It is a good way to separate the control network from the internet. So our config was "vrf default" = IGP and remote access to devices, "vrf internet" = the thing we deliver to customers.

There are two reasons we are not doing l3vpn with ip termination at the access edge devices anymore:

1) We have our own GPON switches and this is our original business. We later connected to the ILEC to resell DSL service on their DSLAMs. The ILEC delivers customers as Q-in-Q with one VLAN per customer. Unfortunately our access edge devices do not support layer 3 Q-in-Q termination, so we had no other choice than to backhaul the DSL customers in an l2vpn. We then reconfigured our GPON service to emulate the same Q-in-Q, one-VLAN-per-customer model, so we only have one way to do things.

2) IP address scarcity. We used to allocate IP addresses to the edge devices in blocks of 64 (/26 subnet). But this still creates inefficiency where one area has free address space and another area is out. Also it is a lot of work to constantly allocate new address blocks. It is easier with the centralized solution because customers can be pooled together irrespective of where they actually live, using the supervlan feature.

We also had trouble with a bad IPv6 implementation that made the network unstable when we did IPv6 termination at the access edge. This has since been solved. But it is a reminder that we sometimes end up with different solutions than planned due to bugs and other unforeseen trouble.

Regards,

Baldur
On 20/Jun/16 16:07, Baldur Norddahl wrote:
On 2016-06-20 08:50, Mark Tinka wrote:
We don't run l3vpn's for infrastructure requirements. We only run them if a customer wants an l3vpn service. Mark.
For a long time we only had one l3vpn customer: ourselves. It is a good way to separate the control network from the internet. So our config was "vrf default" = IGP and remote access to devices, "vrf internet" = the thing we deliver to customers.
Okay. Internally, we use l3vpn's for equipment management as well, but not for other services except customer l3vpn requirements. So we don't do Internet in the VRF, for example.
There are two reasons we are not doing l3vpn with ip termination at the access edge devices anymore:
1) We have our own GPON switches and this is our original business. We later connected to the ILEC to resell DSL service on their DSLAMs. The ILEC delivers customers as Q-in-Q with one VLAN per customer. Unfortunately our access edge devices do not support layer 3 Q-in-Q termination, so we had no other choice than to backhaul the DSL customers in an l2vpn. We then reconfigured our GPON service to emulate the same Q-in-Q, one-VLAN-per-customer model, so we only have one way to do things.
2) IP address scarcity. We used to allocate IP addresses to the edge devices in blocks of 64 (/26 subnet). But this still creates inefficiency where one area has free address space and another area is out. Also it is a lot of work to constantly allocate new address blocks. It is easier with the centralized solution because customers can be pooled together irrespective of where they actually live, using the supervlan feature.
So these sound like BNG deployments, which I'm okay with centralizing for reasons I mentioned before. The issue we were talking about was general Business or IP Transit customers following the same topology. At any rate, it's your network, so you know best. I just wouldn't centralize things for these types of customers, for the reasons I mentioned before.
We also had trouble with a bad IPv6 implementation that made the network unstable when we did IPv6 termination at the access edge. This has since been solved. But it is a reminder that we sometimes end up with different solutions than planned due to bugs and other unforeseen trouble. Regards, Baldur
A day in the life of a network operator. But happy to hear your IPv6 deployment has gone well. Mark.
Hello

I'm curious about the overall recommendation when selecting a small class BGP router for IPv6 (with 1gig ports). We can see the current IPv4 routing table is around 615k routes and the IPv6 routing table is sitting around ~31k routes. In our case, we advertise a single /24 from our head office to 2 upstream providers. The routing is 100% for redundancy.

Somebody mentioned that the Brocade CER-RT was once a best seller. Brocade are now offering the CER 4X-RT version with 256K IPv6 routes supported (1.5M IPv4 routes). We don't have immediate plans for IPv6, but I do foresee this in a few years. Question is - is 256k IPv6 routes suitable?

Thanks

Dave
On 22/Jun/16 22:04, David Charlebois wrote:
Hello I'm curious about the overall recommendation when selecting a small class BGP router for IPv6 (with 1gig ports). We can see the current IPv4 routing table is around 615k routes and the IPv6 routing table is sitting around ~31k routes.
In our case, we advertise a single /24 from our head office to 2 upstream providers. The routing is 100% for redundancy.
Somebody mentioned that the Brocade CER-RT was once a best seller. Brocade are now offering the CER 4X-RT version with 256K IPv6 routes supported (1.5M IPv4 routes). We don't have immediate plans for IPv6, but I do foresee this in a few years. Question is - is 256k IPv6 routes suitable?
The CER/CES NetIron boxes from Brocade are reasonable. That said, BGP-SD implementations apply both to IPv4 and IPv6. So in a Metro-E Access deployment scenario, the number of IPv6 routes would not matter, as we only download into FIB the minimum necessary to keep the box alive. Mark.
If it’s 100% for redundancy, why not just ECMP defaults and not take a full table? That will allow you to use a MUCH cheaper router with a much simpler configuration. Owen
On Jun 22, 2016, at 13:04 , David Charlebois <dcharlebois@gmail.com> wrote:
Hello I'm curious about the overall recommendation when selecting a small class BGP router for IPv6 (with 1gig ports). We can see the current IPv4 routing table is around 615k routes and the IPv6 routing table is sitting around ~31k routes.
In our case, we advertise a single /24 from our head office to 2 upstream providers. The routing is 100% for redundancy.
Somebody mentioned that the Brocade CER-RT was once a best seller. Brocade are now offering the CER 4X-RT version with 256K IPv6 routes supported (1.5M IPv4 routes). We don't have immediate plans for IPv6, but I do foresee this in a few years. Question is - is 256k IPv6 routes suitable?
Thanks Dave
On 23/Jun/16 08:07, Owen DeLong wrote:
If it’s 100% for redundancy, why not just ECMP defaults and not take a full table?
Well, firstly, ring length may be different on either end. So you can't always guarantee ECMP of traffic to/from the device (not without a lot of extra work, such as MPLS-TE). You also can't do hop-by-hop routing based on 0/0 or ::/0 when the ring contains multiple devices also doing the same thing. You'll just create a loop. MPLS-based forwarding is your friend here. But yes, if your device is not in a ring, then your suggestion is fine. Mark.
On Jun 22, 2016, at 23:17 , Mark Tinka <mark.tinka@seacom.mu> wrote:
On 23/Jun/16 08:07, Owen DeLong wrote:
If it’s 100% for redundancy, why not just ECMP defaults and not take a full table?
Well, firstly, ring length may be different on either end. So you can't always guarantee ECMP of traffic to/from the device (not without a lot of extra work, such as MPLS-TE).
Unless the difference is HUGE, you usually don’t really care.
You also can't do hop-by-hop routing based on 0/0 or ::/0 when the ring contains multiple devices also doing the same thing. You'll just create a loop. MPLS-based forwarding is your friend here.
Who said anything about a ring. He is advertising a /24 to 2 upstream providers. Likely these are two separate transit circuits.
But yes, if your device is not in a ring, then your suggestion is fine.
Even if you’re in a ring, if you’ve got two transit providers at some random point on the ring, it still probably doesn’t make a meaningful difference between full feeds from each vs. ECMP, because it’s pretty unlikely that the AS PATH length is affected by the ring length. Owen
On 23/Jun/16 08:22, Owen DeLong wrote:
Unless the difference is HUGE, you usually don’t really care.
Agree. We are in that scenario, and mostly don't care as well. There is enough link capacity.
Who said anything about a ring. He is advertising a /24 to 2 upstream providers.
Which is what I said at the end of my reply to you. The ring angle came up as part of a wider discussion earlier in this thread, where protecting the FIB makes sense.
Even if you’re in a ring if you’ve got two transit providers at some random point on the ring, it still probably doesn’t make a meaningful difference between full feeds from each vs. ECMP, because it’s pretty unlikely that the AS PATH length is affected by the ring length.
In my experience, rings are normally on-net backbones (Metro-E, etc.). The terminating devices on the core side at each end of the ring will be your own equipment, and not another AS. Two links to your upstream won't matter whether it's in a ring or just plain point-to-point circuits, as there is no IGP relevance on such tails. Mark.
On Jun 22, 2016, at 23:32 , Mark Tinka <mark.tinka@seacom.mu> wrote:
On 23/Jun/16 08:22, Owen DeLong wrote:
Unless the difference is HUGE, you usually don’t really care.
Agree.
We are in that scenario, and mostly don't care as well. There is enough link capacity.
Who said anything about a ring. He is advertising a /24 to 2 upstream providers.
Which is what I said at the end of my reply to you.
The ring angle came up as part of a wider discussion earlier in this thread, where protecting the FIB makes sense.
Even if you’re in a ring if you’ve got two transit providers at some random point on the ring, it still probably doesn’t make a meaningful difference between full feeds from each vs. ECMP, because it’s pretty unlikely that the AS PATH length is affected by the ring length.
In my experience, rings are normally on-net backbones (Metro-E, e.t.c.). The terminating devices on the core side at each end of the ring will be your own equipment, and not another AS.
Two links to your upstream won't matter whether it's in a ring or just plain point-to-point circuits, as there is no IGP relevance on such tails.
Mark.
Hence my confusion about your ring comments in the context of the message I was replying to. Owen
On 22 June 2016 at 22:04, David Charlebois <dcharlebois@gmail.com> wrote:
In our case, we advertise a single /24 from our head office to 2 upstream providers. The routing is 100% for redundancy.
The full table is in many cases overrated. If both your transits are good service providers, you do not gain much by trying to get even better routing than what the single-homed customers of each provider are getting. And that is basically what you are trying to do by taking in full tables.

The only thing to beware of is some so-called Tier 1 providers that have bad interconnectivity with other Tier 1 providers. For example, neither Cogent nor HE will give you a full view of the IPv6 network, because these two are in a peering war, so they miss the routes from the other network. Taking in full tables allows you to select the correct provider for the (relatively few) trouble routes, but note that you will still have a problem if one link is down. The fix is to use smaller regional transit providers, with each provider having multiple transits of its own.

For a feed with a default route you can use the most basic BGP-speaking switch. Those are available for 1k USD or less. The ZTE switches we use are in that range with copper ports and no 10G. Or you can get a Mikrotik RB2011 for $99. Or you can keep the full feed and use a Linux/BSD server for routing with BIRD or Quagga. At 1G speed a server is going to do the job trivially. If you want to be advanced, get two servers, one for each transit. Redundancy on the LAN side can be provided by VRRP.

Regards,

Baldur
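As a sketch of the server option (BIRD 1.x-style configuration; the ASNs, neighbor address and the /24 being originated are placeholders):

  router id 203.0.113.1;

  protocol kernel {
          export all;    # push learned routes into the kernel FIB
  }

  protocol device {
  }

  protocol static {
          route 203.0.113.0/24 reject;    # covering route so BGP has the /24 to announce
  }

  protocol bgp transit_a {
          local as 64500;
          neighbor 192.0.2.1 as 64496;
          import filter {
                  if net = 0.0.0.0/0 then accept;    # take only the default
                  reject;
          };
          export filter {
                  if net = 203.0.113.0/24 then accept;    # announce only our /24
                  reject;
          };
  }

A second "protocol bgp transit_b" block pointing at the other provider completes the redundancy, and dropping the import filter gives you the full feed instead.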
On 16/Jun/16 22:27, Saku Ytti wrote:
I'm not saying it's a bad solution, I know a lot of people do it. But I think people only do it because L3 at the port isn't offered by vendors at lower rates.
A lot of people did it because there really wasn't a cheap, dense solution until about 2010. And even then, the traditional strategy had become so entrenched that running IP all the way in the Access was a foreign concept, and one that was most certainly a lot more expensive than the incumbent Layer 2-based Access models. I feel this has since changed with the current offerings from Cisco, Juniper and Brocade. The problem now is how to scale the low-speed port density up, as well as add 10Gbps port density, without increasing the cost or size of the platforms. Mark.
On 16/Jun/16 21:36, Baldur Norddahl wrote:
Hi
If I need to speak BGP with a customer that only has 1G, I will simply make an MPLS L2VPN to one of my edge routers. We use the ZTE 5952E switch with 48x 1G plus 4x 10G for the L2VPN end point. If that is not enough, the ZTE 8900 platform will provide a ton of ports that can do MPLS.
The tunnel is automatically redundant and will propagate link down events, so there is not really any downside to doing it this way on low bandwidth peers.
Personally (and at work), I stay away from such topologies. Centralizing IP connectivity like this may seem sexy and cheap at the start, but it has serious scaling and operational issues down the line, IMHO.

We push IP/MPLS all the way into the Metro-E Access using a team of Cisco ASR920's and ME3600X's. The value of being able to instantiate an IP service or BGP session directly in the Metro-E Access simplifies network operations a great deal for us. Needless to say, not having to deal with eBGP Multi-Hop drama does not hurt.

Centralizing is just horrible, but that's just me. The goal is to make all these unreliable boxes work together to offer a reliable service to your customers, so making them too inter-dependent on each other has the potential to take that away in the long run.

Mark.
On Sat, Jun 18, 2016 at 01:04:49PM +0200, Mark Tinka wrote:
Centralizing is just horrible, but that's just me. The goal is to make all these unreliable boxes work together to offer a reliable service to your customers, so making them too inter-dependent on each other has the potential to take that away in the long run.
One issue with pushing IP transit (L3-wise) with small boxes down to the metro is that if a particular customer comes under attack, any DDoS in excess of 10-30 Gbps is going to totally destroy the remote site down to the floor and then some, until NOC intervenes to restore service.

A Big Expensive Router at head-end site fed with big pipes to your IP core just needs a subscriber line rate policer configured on the customer EVC off the NNI facing your metro transport network, largely protecting your metro POP during an attack. There are also issues with control-plane policing (or the limited options thereof) with some of these low-end platforms.
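The head-end policer being described might look something like this per customer (an IOS-style sketch on the Big Expensive Router's NNI-facing subinterface; names, VLAN, addressing and the 1G rate are placeholders):

  policy-map POLICE-CUST-1G
   class class-default
    police cir 1000000000 conform-action transmit exceed-action drop
  !
  interface TenGigabitEthernet0/0/5.100
   description customer EVC off the NNI facing the metro transport
   encapsulation dot1Q 100
   ip address 198.51.100.1 255.255.255.252
   service-policy input POLICE-CUST-1G
   service-policy output POLICE-CUST-1G

The output (toward-the-metro) policer is the one that keeps a DDoS aimed at the customer from saturating the ring; support for egress policing varies by platform, so some boxes would want a shaper there instead.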
We push IP/MPLS all the way into the Metro-E Access using a team of Cisco ASR920's and ME3600X's. The value of being able to instantiate an IP service or BGP session directly in the Metro-E Access simplifies network operations a great deal for us. Needless to say, not having to deal with eBGP Multi-Hop drama does not hurt.
BGP Selective Download has its own drawbacks too--in that, it's largely meant to be used in a single-tailed environment with the FIB having only a single point of egress. Consider a topology where an ASR920 in the metro is dual-homed to two peering sites using variably distant dark fiber (say 30km to Site A, 90km to Site B), with IGP costs configured to conform to fiber distances. How will you guarantee that the best path the ASR920 chooses for your customer taking a full table is actually congruent with the box's real forwarding path? Your 920 may choose site A as best path, only to see the FIB-programmed default route force it out on site B. If you're doing active-standby on your fiber uplinks, then it would not be an issue; or maybe in a metro environment where latency differences are minimal (<1ms), you probably don't care in practice to be bothered with that.

Yes, there are some operational complexities and issues with L2vpn'ing customers to head-end router -- such as, link-state propagation needs to be properly validated; and now you're burning two ports instead of one, one at the terminus, one at the access, doubling SPOF and maintenance liabilities.

At the end of the day, it's lack of full-featured ports at reasonable cost that drives centralization to head-ends. Spamming Small Expensive Routers (ASR9001/MX104) in every small metro site doesn't scale (btdt myself), but neither does hacking up BGP to work on platforms that aren't really meant to function as heavy L3 routers (e.g. ASR920, ME3600), IMHO.

James
On 18 June 2016 at 18:37, James Jun <james.jun@towardex.com> wrote: Hey,
One issue with pushing IP transit (L3-wise) with small boxes down to the metro is that if a particular customer comes under attack, any DDoS in excess of 10-30 Gbps is going to totally destroy the remote site down to the floor and then some, until NOC intervenes to restore service.
A Big Expensive Router at head-end site fed with big pipes to your IP core just needs a subscriber line rate policer configured on the customer EVC off the NNI facing your metro transport network, largely protecting your metro POP during an attack.
This is a weak rationale. The flip side of it is that the centralised aggregation, when attacked, will bring down all the 'remote sites'. Now, which is the more typical reason for an outage, I don't know. But of course the L3 situation can be policed in many places: at network ingress, or between upper-level aggregation and downstream aggregation.

I do understand the centralised aggregation; particularly, as Baldur explained, if only very few customers will have IP transit, it's silly to pay 5k-10k for a full DFZ box when you can probably get an L2VPN box for hundreds of bucks. In my case almost all of the customers would have IP transit with full BGP.

But I do think that if L3 to the edge had no commercial problems, people would universally choose to do it. L2VPN is just a workaround to a commercial problem. Sometimes (residential access) to a technical problem (how do I share my IPv4 space effectively).
There are also issues with control-plane policing (or the limited options thereof) with some of these low-end platforms.
I'm not really looking for a cheap pipeline box, I'm looking for a run-to-completion NPU box with a 1GE edge. The Huawei NE20E-S2F proposal was fine; ASR9001 and MX104 are not OK (both having a less than beefy control plane, while the forwarding plane in all of them is fine). ALU SR would be fine, but I have a specific configuration-management need not supported by TimOS today.

But even higher-end kit usually has plenty of vectors for collateral damage, particularly if the attacker is one of the customers. For example, on the ASR9k you can't really protect customerA from customerB doing an eBGP/ICMP/ARP flood; customerB does not have to be malicious, it might just be an internal L2 loop causing a high rate of packets at the provider port.

--
++ytti
On 19/Jun/16 10:17, Saku Ytti wrote:
But I do think that if L3 to the edge had no commercial problems, people would universally choose to do it. L2VPN is just a workaround to a commercial problem. Sometimes (residential access) to a technical problem (how do I share my IPv4 space effectively).
I think we are getting there now, with the ASR920 and friends. And who knows, with Arista now playing in the IP/MPLS space, they might make a switch worth its name against the traditional routing vendors.

You must also consider that there are a number of engineers that generally prefer tunnels. I know it sounds silly, but I've come across several engineers who prefer the idea of centralizing services over a tunnel to a single box where the intelligence happens. I suppose the passion is much the same as with engineers who like MPLS vs. those that don't.

But because Layer 2 switches will always be cheaper than IP/MPLS switches, I don't see this problem going away, even if an IP/MPLS switch cost US$200/unit vs. a Layer 2 switch which cost US$150/unit.

Mark.
On 18/Jun/16 17:37, James Jun wrote:
One issue with pushing IP transit (L3-wise) with small boxes down to the metro is that if a particular customer comes under attack, any DDoS in excess of 10-30 Gbps is going to totally destroy the remote site down to the floor and then some, until NOC intervenes to restore service.
A DoS/DDoS attack on a central IP gateway is no safer from such a scenario. In fact, given the level of aggregation on a centralized router, the level of impact is likely to be higher. Moreover, the attack would still spread from the centralized router to the end customer until the NOC intervenes. So you aren't really gaining much by centralizing the router.
A Big Expensive Router at head-end site fed with big pipes to your IP core just needs a subscriber line rate policer configured on the customer EVC off the NNI facing your metro transport network, largely protecting your metro POP during an attack.
So what do you do when an attack is coming from one of your Metro-E customers to another Metro-E customer? In such a case, you've just unnecessarily involved your centralized router in something it should not be aware of.
There are also issues with control-plane policing (or the limited options thereof) with some of these low-end platforms.
On the ASR920's we use, CoPP is reasonable. But as with everything else, you have to make some compromises if you want to keep your costs down without sacrificing too much in operation. Given the spread you'd get with IP/MPLS in the Access, the problem is broken down into smaller, more manageable chunks, which appeals more to me.
BGP Selective Download has its own drawbacks too--in that, it's largely meant to be used in a single-tailed environment with the FIB having only a single point of egress.
Consider a topology where an ASR920 in the metro is dual-homed to two peering sites using variably distant dark fiber (say 30km to Site A, 90km to Site B), with IGP costs configured to conform to fiber distances.
For us, that is not a Metro-E Access ring. It's too wide and would normally be broken up by some kind of PE Aggregation. If you're building Metro-E Access rings that wide, you're going to end up in trouble sooner rather than later. We build Metro-E Access rings within 1ms, and while 1ms will give you a 100km cable radius, we'd never build a Metro-E Access ring that wide. So we're typically running the rings at between 1km and 10km. And since we do latency-based IGP cost, there is no difference whether a ring is 1km or 10km wide (or even in your example, 30km vs. 90km).
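As an aside, latency-based IGP cost can be as simple as deriving each interface metric from measured delay (an IOS-style ISIS sketch; the convention of metric = one-way delay in microseconds is just one possible scaling, not necessarily the one in use here):

  router isis CORE
   net 49.0001.0000.0000.0001.00
   metric-style wide
  !
  interface TenGigabitEthernet0/1
   description ring-east, ~150 us one-way delay
   ip router isis CORE
   isis metric 150 level-2
  !
  interface TenGigabitEthernet0/2
   description ring-west, ~450 us one-way delay
   ip router isis CORE
   isis metric 450 level-2

With that, ring arms of different lengths pick up proportional costs automatically, with no per-ring tuning needed.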
How will you guarantee that the best path the ASR920 chooses for your customer taking a full table is actually congruent with the box's real forwarding path? Your 920 may choose site A as best path, only to see the FIB-programmed default route force it out on site B. If you're doing active-standby on your fiber uplinks, then it would not be an issue; or maybe in a metro environment where latency differences are minimal (<1ms), you probably don't care in practice to be bothered with that.
Not sure I get your scenario. We use BGP-SD where our Metro-E devices have iBGP sessions to our RR's. We download 0/0 + ::/0 + some other routes into the FIB, and keep the rest in the control plane. We do not see any discrepancy in NEXT_HOP data between the control and data planes when we run BGP-SD. Have you run into the issues you describe when you've turned on BGP-SD?
Yes, there are some operational complexities and issues with L2vpn'ing customers to head-end router -- such as, link-state propagation needs to be properly validated; and now you're burning two ports instead of one, one at the terminus, one at the access, doubling SPOF and maintenance liabilities.
The only use-case we've seen for centralizing routing is in a BNG scenario. I once attempted a distributed BNG design, but the issue was load on the control plane of each router (DHCP, session management, etc.). One could deploy a dedicated edge router for BNG terminations alongside an edge router for Business/Wholesale customers, but that gets pricey, quickly. But even with centralized BNG's, we are now looking at ways to distribute them further with current virtualization technologies, into smaller islands that can each handle a decent amount of aggregate bandwidth.
At the end of the day, it's lack of full-featured ports at reasonable cost that drives centralization to head-ends. Spamming Small Expensive Routers (ASR9001/MX104) in every small metro site doesn't scale (btdt myself), but neither does hacking up BGP to work on platforms that aren't really meant to function as heavy L3 routers (e.g. ASR920, ME3600), IMHO.
I disagree. Adding high-end BGP support to the ME3600X/3800X might have been an afterthought, but we are lucky that it had sufficient resources to support it. On the ASR920, that was designed in from Day One, and we've been happy running full BGP there as well (in addition to doing it on the ME3600X). We've been doing this since 2010, and it's going well.

The level of simplicity we enjoy, how quickly we can turn up a customer service, the decoupling/independence of devices, and the ability to run maintenance activities in a controlled way that minimizes aggregate impact are benefits I'd never trade for a centralized router approach.

Mark.
On 18 June 2016 at 13:04, Mark Tinka <mark.tinka@seacom.mu> wrote:
We push IP/MPLS all the way into the Metro-E Access using a team of Cisco ASR920's and ME3600X's. The value of being able to instantiate an IP service or BGP session directly in the Metro-E Access simplifies network operations a great deal for us. Needless to say, not having to deal with eBGP Multi-Hop drama does not hurt.
Just want to point out that there is no eBGP multi-hop involved. These are L2 tunnels so the devices appear to be directly connected on the layer 3 level.

The advantage of using L2VPN is that you can connect the customer to whatever can handle the requirements. You are not limited to what your access edge devices can do. 99% of our customers are not BGP customers, so it would be silly to spend cash on equipment that will support full table BGP at each PoP.

The major downsides are that a) hops are invisible to traceroute, and b) some traffic might travel longer than necessary. We are a residential ISP and we find that traffic between customers is minimal. We choose to accept that traffic between two neighbors might be backhauled to a central location and back instead of staying local.

Regards,

Baldur
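For the curious, the access-side half of such a backhaul is just a port-to-pseudowire stitch, along these lines (an IOS-style EoMPLS sketch; the loopback address and VC ID are placeholders, and the ZTE CLI differs):

  interface GigabitEthernet0/5
   description 1G BGP customer, backhauled to the head-end PE
   no ip address
   xconnect 10.0.0.1 205 encapsulation mpls

The matching pseudowire on the head-end terminates into a bridge-domain/IRB (or equivalent) that carries the customer's point-to-point subnet and the eBGP session, which is why both ends see each other as directly connected.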
On 18/Jun/16 21:55, Baldur Norddahl wrote:
Just want to point out that there is no eBGP multi-hop involved. These are L2 tunnels so the devices appear to be directly connected on the layer 3 level.
Agree, but there is still a disconnect between what the network knows is the actual physical path vs. what the actual physical path is. This might not be that big of an issue for most, or in cases where the l2vpn does not have to travel very far or via several links. But for us, it adds some complexity we'd rather do without.
The advantage of using L2VPN is that you can connect the customer to whatever can handle the requirements. You are not limited to what your access edge devices can do. 99% of our customers are not BGP customers, so it would be silly to spend cash on equipment that will support full table BGP at each PoP.
In the Access, 99% of our customers are Internet Access, i.e., not IP Transit, so they don't need BGP. We have come across Internet Access customers that need to do BGP for redundancy, which turns into a private ASN job with them announcing our own routes back to us. But that tends to be the exception, about 0.2% of our deliveries.

That notwithstanding, touching only one box to deliver an Internet Access service is a major win for us vs. touching more than one. And we are only burning one port in lieu of more than one. And there is only one place for us to look when troubleshooting issues instead of more than one. It's simple and brain-dead, which is what we like.
The major downsides are that a) hops are invisible to traceroute, and b) some traffic might travel longer than necessary.
For us, both of these are major drawbacks, and avoiding them is a huge advantage we gain by taking IP/MPLS into the Access.
We are a residential ISP and we find that traffic between customers is minimal. We choose to accept that traffic between two neighbors might be backhauled to a central location and back instead of staying local.
Which I accept in Broadband/BNG scenarios. But outside of that, well, you know my views by now :-)... Mark.
participants (6)
- Baldur Norddahl
- David Charlebois
- James Jun
- Mark Tinka
- Owen DeLong
- Saku Ytti