hi

I would like to enquire about the pros and cons of running a full internet routing table in a VRF, and the potential challenges of operating it in a VPN across a large network that does peering and provides transit. I am not a fan of supporting it in a VRF. I am looking for a list of operational and technical challenges, specifically around:

1) control plane (route reflectors)
2) forwarding plane (recursive lookup issues)
3) operational
4) DDoS
5) BCPs and RFCs that would break, e.g. "BGP-SEC in today's draft does not support checking prefixes within the VPN"
6) vendor specifics
On 03/07/13 22:22 +0200, beavis daniels wrote:
We decided against deploying our internet routes via vpnvX. Two major holdups for us were:

- Each route inside a vpnv4 table will consume more CAM (96 bits versus 32), which adds up when taking full routes.
- Brocade XMR does not support distributing routes via vpnv6, or it did not when we were designing our MPLS network.

One of the benefits of distributing internet routes inside a VRF is that it logically separates those routes from your IGP routing tables (your P routers don't see internet routes). Keeping internet routes inside your default VRF may lead to unexpectedly leaking IGP routes out to your BGP sessions, so BGP filters are important, as is using unique (RIR-assigned) addresses inside your MPLS mesh.

-- Dan White
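To put rough numbers on the CAM point above, here is a back-of-the-envelope sketch in Python. The 32-bit vs 96-bit key widths come from the thread; the ~450k-prefix table size is an assumed 2013-era figure, so substitute your own.

```python
# Back-of-the-envelope CAM key consumption: plain IPv4 vs. vpnv4.
# A vpnv4 key is 96 bits (64-bit route distinguisher + 32-bit prefix)
# versus 32 bits for a plain IPv4 key. Table size is an assumption.
FULL_TABLE_PREFIXES = 450_000  # rough 2013-era full-table size

def cam_bits(prefixes: int, key_bits: int) -> int:
    """Total CAM key bits consumed by `prefixes` entries of `key_bits` each."""
    return prefixes * key_bits

plain = cam_bits(FULL_TABLE_PREFIXES, 32)
vpnv4 = cam_bits(FULL_TABLE_PREFIXES, 96)
print(f"plain IPv4: {plain / 8 / 2**20:.1f} MiB of CAM keys")
print(f"vpnv4:      {vpnv4 / 8 / 2**20:.1f} MiB of CAM keys ({vpnv4 // plain}x)")
```

The 3x key-width overhead is before any vendor-specific table compression, which, as noted later in the thread, can soften the penalty considerably.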
I've done this on multiple vendor platforms, including full routes, and haven't had any issues. Resource consumption varies by vendor and implementation, but I've observed that it's not as punitive as I thought it would be, due to various optimizations. Granted, in most of my cases it was in a VRF, but I was not running MPLS.
Hi

1) control plane (route reflectors)
- You can either run a separate control-plane infrastructure for the inet VRF or use common RRs; that depends on your hardware capabilities (or you can run a separate BGP process for reflecting the inet VRF).
- No need to worry about the data plane, as VPN routes are not installed into the FIB on RRs.
- As was mentioned already, porting inet prefixes into the VPN table increases control-plane demands.

2) forwarding plane (recursive lookup issues)
- For the inet VRF I'd recommend per-CE/per-next-hop labels instead of per-prefix labels, to conserve label space.
- A per-next-hop label still points directly to the outgoing interface, so there are no recursive lookups.
- Recursive lookups are only needed with a per-VRF label, but I would not recommend that, as it can introduce loops in some scenarios when PIC is used.

3) operational
- I find it operationally complex to keep the inet table on the P-core boxes/vrf-default.

4) DDoS
- As I mentioned, you can run a separate infrastructure for the inet VRF, i.e. a dedicated box or SDR for inet PEs and inet RRs.
- Or you can use separate BGP processes, so if some university decides to test some special attribute on their BGP advertisements it will not reload your VPN BGP process.
- Or you can deploy enhanced BGP error handling on the edges and hope for the best (actually this is what should be implemented first in any case).

adam
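Adam's label-space point can be made concrete with a toy count of labels a PE must allocate for one VRF under the three common allocation modes. The prefix and next-hop counts are invented illustrative numbers.

```python
# Toy comparison of MPLS label consumption for an internet VRF under the
# three common allocation modes. All counts below are invented examples.
def labels_needed(prefixes: int, next_hops: int, vrfs: int, mode: str) -> int:
    """Labels a PE must allocate for one VRF under a given allocation mode."""
    if mode == "per-prefix":   # one label per VPN route
        return prefixes
    if mode == "per-ce":       # one label per CE / next hop
        return next_hops
    if mode == "per-vrf":      # one aggregate label (forces an IP lookup)
        return vrfs
    raise ValueError(f"unknown mode: {mode}")

full_table, edge_next_hops, inet_vrfs = 450_000, 200, 1
for mode in ("per-prefix", "per-ce", "per-vrf"):
    print(f"{mode:10s}: {labels_needed(full_table, edge_next_hops, inet_vrfs, mode)}")
```

With a full table, per-prefix labels burn through the 20-bit label space fastest, which is why per-CE/per-next-hop allocation is the recommendation above.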
Internet in a VRF is doable on most platforms and definitely adds a lot of flexibility.

1) control plane (route reflectors)
This is really dependent on your platform and on whether you are doing multiple RDs or not. If you divide your transit into regions and filter based upon RT, you can tier your route reflectors to get plenty of scalability.

2) forwarding plane (recursive lookup issues)
Most platforms program prefixes with associated labels more slowly, so your base convergence will suffer. In addition, if you want to run PIC you will likely be left with a bit of custom engineering to make it work. VPNs hide the next hop behind the loopback of the PE, so next-hop failure awareness of an edge tie will be lost. If you can stomach the double lookup you can run per-VRF labels (per-prefix isn't feasible on most platforms), weight up your edge ties, and force a bounce back to another PE; otherwise you will be stuck with BGP control-plane-based convergence with per-CE labels.

3) operational
It's definitely harder to train operations people on how to look in a VRF.

4) DDoS
It's actually much easier to design a DDoS filtering system if everything is in VRFs. If you create separate VRFs for transit and subscription you can have extreme flexibility in DDoS filtering. The import/export flexibility allows for injection of /32s or /128s into your transit VRF, and you can simply hang your DDoS mitigation systems between the transit and subscription VRFs.

5) BCPs and RFCs that would break, e.g. "BGP-SEC does not support in today's draft to check prefixes within the VPN"
We haven't found any significant functionality we would want to use that it would break, other than PIC, and there was a workaround for that.

6) vendor specifics
You are probably OK with most vendors, but a few still have issues with table carving, and a few don't support 6VPE.
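A minimal sketch of the RT import mechanics behind the DDoS idea above: a /32 divert route exported from a mitigation context gets pulled into the transit VRF because their route-target sets intersect. The RT values and VRF names are invented for illustration.

```python
# Sketch of route-target import: a VRF imports a VPN route when its
# import RT set intersects the route's attached RTs. Values are made up.
from dataclasses import dataclass, field

@dataclass
class VpnRoute:
    prefix: str
    route_targets: frozenset

@dataclass
class Vrf:
    name: str
    import_rts: frozenset
    table: list = field(default_factory=list)

    def maybe_import(self, route: VpnRoute) -> bool:
        """Install the route if any of its RTs match our import policy."""
        if self.import_rts & route.route_targets:
            self.table.append(route.prefix)
            return True
        return False

transit = Vrf("transit", frozenset({"65000:100", "65000:999"}))
# /32 exported by a hypothetical mitigation VRF with the divert RT:
divert = VpnRoute("192.0.2.1/32", frozenset({"65000:999"}))
print(transit.maybe_import(divert), transit.table)
```

This is the whole trick: tagging a host route with the right export RT steers just that victim's traffic through the scrubbing path, without touching the rest of the transit table.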
-----Original Message-----
From: beavis daniels [mailto:beavis.daniels@gmail.com]
Sent: Thursday, March 07, 2013 2:23 PM
To: nanog@nanog.org
Subject: internet routing table in a vrf
On (2013-03-08 16:40 +0000), Matt Newsom wrote:
2) forwarding plane (recursive lookup issues) Most platforms program prefixes with associated labels more slowly, so your base convergence will suffer.
Do you have any reference you could share? What level of penalty per prefix have you observed in each platform tested?
In addition, if you want to run PIC you will likely be left with a bit of custom engineering to make it work. VPNs hide the next hop behind the loopback of the PE, so next-hop failure awareness of an edge tie will be lost. If you can stomach the double lookup you can run per-VRF labels (per-prefix isn't feasible on most platforms), weight up your edge ties, and force a bounce back to another PE; otherwise you will be stuck with BGP control-plane-based convergence with per-CE labels.
PIC is about converging every prefix at the same time. It makes no statement about where the next hop points, whether it is loop0 (next-hop-self in INET) or the edge CE. If your IGP carries all edge links and you don't run next-hop-self, the far-end PE can converge faster in the INET scenario. But current efforts are not to fix this; current efforts are to make the local PE do hitless repair when an arriving frame points to a dead edge interface. It seems to be very rare to run INET in this way; the majority don't carry edge links in the IGP and do run next-hop-self. -- ++ytti
If you run PIC and hide the next-hop information behind a loopback, which is what will happen in a VPN environment, you will lose awareness of the failure of an edge link on a remote PE. The remote PE will continue to send traffic to the PE with the failed link until it has completely converged, both at the control plane and written to the FIB. If the remote PE has PIC running, it can bounce that traffic back to its backup path via another PE. Some percentage of your traffic will then form a transient micro-loop, though, because that remote PE will have its primary path through the failed link (due to shortest AS-path length etc.) and will not yet have converged around the failure on the remote PE, of which it has no awareness. One possible solution to this is to guarantee that a PE will never use another PE for a primary transit route. This can be accomplished via metrics such as weight etc. Again, one of the downsides of this is that you need to run per-VRF labels so that a local IP lookup can be done on the PE with the failed link, and it can execute a local repair when it sees the link drop.
On (2013-03-08 18:17 +0000), Matt Newsom wrote:
If you run PIC and hide the next-hop information behind a loopback, which is what will happen in a VPN environment
A typical SP network has next-hop-self in INET BGP, and does not carry edge links in the IGP. You don't want to have a lot of prefixes in the IGP.
If the remote PE has PIC running he can bounce that traffic back to his backup path via another PE.
PIC merely makes sure that the FIB is hierarchical, and it guarantees that all prefixes sharing a next hop converge at the same time. Local repair can be done with or without PIC; it just means you have local information on how to deliver the frame to an alternate destination without waiting for convergence.
There will be some percentage of your traffic that will then form a transient micro loop though because that remote PE will have his primary path through the failed link due to shortest as path length etc
Only if the egress PE does an IP lookup, which it typically does not do (per-prefix or per-CE, the default config on 7600, JunOS, IOS-XR), as the egress PE's label adjacency entry has the egress rewrite information. The faulted edge PE can local-repair and get the frame delivered without having to wait for BGP to converge for the customer. A transient loop can occur only if both of the edges have faulted. -- ++ytti
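The hierarchical-FIB behaviour behind PIC can be sketched as follows; `NextHopGroup` is a hypothetical stand-in for the shared next-hop object a real FIB would use.

```python
# Sketch of the hierarchical-FIB idea behind PIC: prefixes point at a
# shared next-hop object, so rewriting that one object "converges" every
# dependent prefix in a single step, independent of table size.
class NextHopGroup:
    def __init__(self, primary: str, backup: str):
        self.active = primary
        self.backup = backup

    def fail_over(self):
        self.active = self.backup  # O(1), regardless of prefix count

fib = {}
nh = NextHopGroup(primary="PE1", backup="PE2")
for i in range(500):                      # stand-in for a full table
    fib[f"10.{i // 256}.{i % 256}.0/24"] = nh  # all share one next-hop object

nh.fail_over()
print("all prefixes now via", nh.active)
```

This is why PIC convergence time is flat as the table grows: only the shared next-hop object is rewritten, never the per-prefix entries.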
On Mar 8, 2013, at 5:55 PM, Saku Ytti <saku@ytti.fi> wrote:
PIC merely makes sure that the FIB is hierarchical, and it guarantees that all prefixes sharing a next hop converge at the same time. Local repair can be done with or without PIC; it just means you have local information on how to deliver the frame to an alternate destination without waiting for convergence.
Unfortunately Cisco made things confusing by naming their "BGP FRR" feature "BGP PIC Edge."
There's some fundamental misunderstanding here. By default, with the vpnv4 and vpnv6 address families, next-hop-self is set by the PE. Local repair and label retention were around many years before PIC came along; they worked nicely with eiBGP multipath and allowed the primary PE to work around a failed PE-CE link and send traffic to an alternate PE that advertised the same prefix. The added value with PIC is that you don't have to have equal attributes in order to have an alternate path installed into the FIB.

There are no micro-loops involved on an alternate PE. During normal operation, a packet incoming on the primary PE would be label-switched based on the per-prefix or per-CE label out the PE-CE link, as directed by the Layer 2 rewrite in the FIB. In case of a local PE-CE link failure, PIC or local repair will just swap the incoming label for the label advertised by the alternate PE. Once the alternate PE receives the labeled packet, it will just label-switch it out its PE-CE link. During normal operation or during failure there is no recursive lookup done, just label switching.

As Ytti pointed out already, you don't want the PE-CE links to be carried by the IGP, as you can fast-reroute around their failure and perform a "local repair" until BGP converges and the ingress PE starts forwarding traffic to the alternate PE/next hop. The only case in which you experience an excessive loss of connectivity is when the egress PE itself fails; in that case you need to rely on the speed of IGP convergence to inform the ingress PE to switch to a preprogrammed backup path/next hop (PIC core). There are already some RFCs that propose having the P core fast-reroute to an alternate PE in case the primary PE fails. Can't wait :).
adam
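Adam's failure walk-through can be sketched as a toy forwarding function. The label values are invented, and a real LFIB is hardware state, not Python; this only shows the steady-state vs local-repair label handling.

```python
# Sketch of per-CE-label local repair: in steady state the primary PE
# forwards out the PE-CE link with no IP lookup; on link failure it swaps
# to the label the alternate PE advertised for the same CE and forwards
# back into the core until BGP converges. Label values are made up.
PRIMARY_VPN_LABEL = 24001    # label the primary PE advertised for the CE
ALTERNATE_VPN_LABEL = 24002  # label the alternate PE advertised, same CE

def forward(label: int, pe_ce_link_up: bool) -> str:
    """Return the forwarding action the primary PE takes for a labeled packet."""
    if label != PRIMARY_VPN_LABEL:
        return "drop"
    if pe_ce_link_up:
        return "forward out PE-CE link (no IP lookup)"
    # Local repair: relabel towards the alternate PE until the ingress PE
    # converges and starts using the alternate next hop itself.
    return f"swap to {ALTERNATE_VPN_LABEL}, forward to alternate PE"

print(forward(24001, True))
print(forward(24001, False))
```

Note that both paths are pure label switching, which is the thread's point: no recursive IP lookup happens on the repair path with per-prefix or per-CE labels.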
participants (7)

- Adam Vitkovsky
- beavis daniels
- Dan White
- Matt Newsom
- PC
- Phil Bedard
- Saku Ytti