LAGing backbone links
Hello All,

I was wondering if anyone had any thoughts on best practices for running multiple backbone links between two routers. In the past we've added additional links as needed, then simply enabled IS-IS on them when they were good to go, and let IS-IS handle load balancing the traffic over the links. But I know that others out there set up a LAG once they have more than one link between two routers.

Is there a best practice? Does it matter? Any implications for an MPLS setup?

Thanks
Payam,

On Apr 4, 2011, at 18:17 MDT, Payam Chychi wrote:
Hello All,
I was wondering if anyone had any thoughts on best practices for running multiple backbone links between two routers. In the past we've added additional links as needed, then simply enabled IS-IS on them when they were good to go, and let IS-IS handle load balancing the traffic over the links. But I know that others out there set up a LAG once they have more than one link between two routers. Is there a best practice? Does it matter? Any implications for an MPLS setup?
In general, if you're using relatively modern, medium- to higher-end equipment, it should "just work". Some things to watch out for, in order of importance:

1) Be mindful of the number of component-links you can put into a single LAG. This varies by platform. In general, higher-end routers/switches support at least 16 component-links in a single LAG. In the last couple of years, several vendors have started shipping equipment and/or software that takes this up to 64 component-links in a single LAG. (Depending on platform, LAGs may allow you to build larger virtual links between adjacent devices than ECMP, which may be limited to 8 component-links in a single ECMP group ... but, again, that all depends on the platform type.)

2) The distribution of flows across the component-links in a single LAG can vary dramatically depending on the type of traffic you're pushing. Specifically, for /Internet/ (IPv4 or IPv6) over MPLS traffic, you will most likely get very good load distribution, given the pseudo-randomness of IP addresses and Layer-4 port information (in particular, source ports from a client toward a server). OTOH, if you have traffic in [very large] PWs, then typically LSRs/switches/routers can't look past the MPLS labels and inner Layer-2 encapsulation to find granular input keys for the load-balancing hash. Thus, the load-balancing hash will place all traffic for a single PW VC on a single, effectively arbitrary, component-link in the LAG. The good news is that there is hope on the horizon in the form of:

http://tools.ietf.org/html/draft-ietf-pwe3-fat-pw-05

... which, in short, expects the ingress PE to [try to] find granular input keys in the incoming traffic (e.g., input keys from an IP header contained within an Ethernet frame that will be transported as a PW VC over your MPLS core) and create a hash of them that gets placed into a "FAT PW" label that sits below the PW VC label. The idea is that core LSRs would still load-balance based on the MPLS label stack, from the bottom-most to the top-most label, which should result in more even load distribution of PW VC flows over component-links in a LAG. This feature is just starting to appear in one vendor's equipment and will hopefully show up in others soon, as well. (Please bug your vendors for this! ;-)

3) Depending on the vendor, you may specifically have to configure the device to do load-balancing over LAGs or ECMP paths (e.g., Juniper & Brocade, possibly others). Generally, you have to tell the device which input keys to look for and/or how many MPLS labels to look past for those input keys, e.g., in Juniper you configure forwarding-options -> hash-key -> family mpls -> label-1, label-2, payload -> ip, etc.

Some other things to look out for:

4) Some vendors may use different hash algorithms for LAG vs. ECMP, so you may get "better" load-balancing from one compared to the other. Ask your vendor for details, as this may not be obvious from lab testing.

5) Some vendors may have a limit on the maximum number of MPLS labels that they can look past to find, say, an IP payload that can be used as input keys for the load-balancing hash algorithm. This used to be a concern several years ago, but in general most medium- to high-end equipment can look past /at least/ 3 MPLS labels, which should cover you in the more common cases where either:
a) You have IP/LDP/RSVP/RSVP-FRR, where the outermost label is an RSVP Bypass label when you're [briefly] running on a Bypass; or,
b) You have VPN-label/LDP/RSVP, where you're moving IPVPN or 6PE, etc. traffic and using LDP-over-RSVP tunneling.

Anyway, HTH,

-shane
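To make point 2 above concrete, here is a minimal Python sketch of hash-based member selection. Everything in it is an assumption made for illustration: real LSRs use vendor-specific hardware hashes (SHA-256 is only a convenient stand-in), the label value and flow counts are made up, and the flow label is simplified to 20 bits of a hash over the inner IP keys. It compares three cases: IP keys visible to the hash, a single PW where only the PW VC label is visible, and the same PW carrying a per-flow label in the spirit of the FAT PW draft.

```python
import hashlib
import random
from collections import Counter

NUM_LINKS = 4        # component-links in the hypothetical LAG
NUM_FLOWS = 10_000   # customer flows to simulate
PW_LABEL = 300123    # made-up PW VC label value


def stable_hash(keys):
    """Stand-in for a vendor hash: SHA-256 of the keys, truncated to 32 bits."""
    digest = hashlib.sha256(repr(keys).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def distribute(flows, num_links=NUM_LINKS):
    """Return how many flows hash onto each component-link."""
    counts = Counter(stable_hash(keys) % num_links for keys in flows)
    return [counts.get(link, 0) for link in range(num_links)]


rng = random.Random(42)

# Case 1: IP over MPLS. The LSR can see (src IP, dst IP, src port, dst port).
ip_flows = [
    (rng.getrandbits(32), rng.getrandbits(32), rng.randrange(1024, 65536), 443)
    for _ in range(NUM_FLOWS)
]

# Case 2: the same flows inside one PW. Only the PW VC label is visible,
# so every flow presents identical input keys to the hash.
pw_flows = [(PW_LABEL,) for _ in ip_flows]

# Case 3: the ingress PE hashes the inner keys into a 20-bit flow label
# placed below the PW VC label, so core LSRs see varying label stacks.
fat_pw_flows = [(PW_LABEL, stable_hash(keys) & 0xFFFFF) for keys in ip_flows]

ip_counts = distribute(ip_flows)
pw_counts = distribute(pw_flows)
fat_counts = distribute(fat_pw_flows)

print("IP over MPLS:      ", ip_counts)
print("Single PW:         ", pw_counts)
print("PW with flow label:", fat_counts)
```

In this model the single PW always lands on exactly one member, while the other two cases spread roughly evenly. How close a real platform gets to that depends on its hash and on which keys it is configured to use.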
On 05/04/2011 16:30, Shane Amante wrote:
1) Be mindful of the number of component-links you can put into a single LAG. This varies by platform. In general, for higher-end routers/switches the minimum number of component-links in a single LAG is 16.
Some older equipment will unequally prefer certain links over others, depending on the number of members in the LAG. E.g., a 2-member LAG might load balance equally under ideal conditions, but a 3-member LAG might naturally load balance 2:2:1. This is particularly a problem if you have, say, an 8-member LAG and you lose a single member, which could drop your overall throughput to the equivalent of 4 members.

Nick
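One plausible mechanism for this behaviour, which is an assumption here and not something stated in the thread, is hardware that spreads a fixed, power-of-two number of hash buckets over the LAG members. The Python sketch below assumes 8 buckets assigned round-robin (both the bucket count and the assignment order are hypothetical) and computes the aggregate throughput at which the busiest member saturates.

```python
from fractions import Fraction


def bucket_shares(num_members, num_buckets=8):
    """Assign hash buckets to members round-robin; return each member's traffic share."""
    counts = [0] * num_members
    for bucket in range(num_buckets):
        counts[bucket % num_members] += 1
    return [Fraction(count, num_buckets) for count in counts]


def usable_capacity(num_members, num_buckets=8):
    """Aggregate throughput, in units of one member link, when the busiest member fills up."""
    return 1 / max(bucket_shares(num_members, num_buckets))


for members in (2, 3, 7, 8):
    shares = ", ".join(str(share) for share in bucket_shares(members))
    print(f"{members} members: shares [{shares}], usable {usable_capacity(members)} links")
```

Under these assumptions, 8 members give 8 links' worth of throughput, but with 7 members one link carries two of the eight buckets and saturates at 4 links' worth, which matches the 8-member example above. With 8 buckets a 3-member LAG splits 3:3:2 and not the 2:2:1 quoted, so the exact ratios depend on the hardware's real bucket count and assignment.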
On Tue, Apr 05, 2011 at 08:05:59PM +0100, Nick Hilliard wrote:
Some older equipment will unequally prefer certain links over others, depending on the number of members in the LAG. I.e. a 2-member LAG might load balance equally under ideal conditions, but a 3-member LAG might naturally load balance 2:2:1.
Even newer gear does that. TurboIron 24X for example. Some Force10 switch model(s) as well, no clue how old though.

LAGs have one big advantage over ECMP: with gear implementing the "minimum-links" feature, you can make sure the LAG is removed from the IGP topology as soon as its bandwidth falls below a certain capacity, so that redundant (full!) capacity elsewhere can automatically kick in. With ECMP, traffic engineering and capacity/redundancy planning become... "interesting". That is on top of the operational problems with troubleshooting (traceroute and mtr do love such ECMP hells) and the operational consequences of having a lot of adjacencies and links.

For all those reasons, I usually prefer LAGs (with LACP) over ECMP, even when that means "more bugs" (vendors tend to not properly test all their features on LAGs too).

Best regards,
Daniel

--
CLUE-RIPE -- Jabber: dr@cluenet.de -- dr@IRCnet -- PGP: 0xA85C8AA0
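The "minimum-links" behaviour described above can be sketched as a simple threshold check. This is only an illustrative Python model: the `Lag` class, its field names, and the 4x10G figures are hypothetical and do not correspond to any vendor's configuration syntax.

```python
from dataclasses import dataclass


@dataclass
class Lag:
    """Hypothetical model of a LAG with a minimum-links threshold."""
    member_gbps: float
    min_links: int

    def is_up(self, active_members: int) -> bool:
        """The bundle stays up only while enough members are active."""
        return active_members >= self.min_links

    def advertised_gbps(self, active_members: int) -> float:
        """Capacity the IGP can rely on; zero once the bundle is taken down."""
        if not self.is_up(active_members):
            return 0.0
        return active_members * self.member_gbps


# A 4x10G bundle that must keep at least 3 members to stay in the topology.
lag = Lag(member_gbps=10.0, min_links=3)
for active in (4, 3, 2):
    state = "up" if lag.is_up(active) else "down"
    print(f"{active} active members: {state}, {lag.advertised_gbps(active)} Gbps")
```

With two members left, the bundle is declared down and its adjacency disappears, so the IGP moves traffic to the redundant path instead of squeezing it onto 20 Gbps. With plain ECMP over individual links, the remaining adjacencies would stay up and keep attracting traffic.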
On 6 Apr 2011, at 23:17, Daniel Roesen <dr@cluenet.de> wrote:
On Tue, Apr 05, 2011 at 08:05:59PM +0100, Nick Hilliard wrote:
Some older equipment will unequally prefer certain links over others, depending on the number of members in the LAG. I.e. a 2-member LAG might load balance equally under ideal conditions, but a 3-member LAG might naturally load balance 2:2:1.
Even newer gear does that. TurboIron 24X for example.
I believe this has been fixed in s/w version 4.2.00 on the TurboIron, and that it can now support arbitrary numbers of LAG members. Haven't tested it though...

Nick
On Thu, Apr 07, 2011 at 07:45:20AM +0100, Nick Hilliard wrote:
I.e. a 2-member LAG might load balance equally under ideal conditions, but a 3-member LAG might naturally load balance 2:2:1.
Even newer gear does that. TurboIron 24X for example.
I believe this has been fixed on s/w version 4.2.00 on the turboiron,
Interesting, as Fou^WBrocade's statement was that this is unfixable due to a chipset (which is Broadcom) limitation.

Best regards,
Daniel

--
CLUE-RIPE -- Jabber: dr@cluenet.de -- dr@IRCnet -- PGP: 0xA85C8AA0
participants (4)
- Daniel Roesen
- Nick Hilliard
- Payam Chychi
- Shane Amante