interesting troubleshooting
I just ran into an issue that I thought was worth sharing with the NANOG community. With recently increased visibility on keeping the Internet running smoothly, I thought that sharing this small experience could benefit everyone.

I was contacted by my NOC to investigate a LAG that was not distributing traffic evenly among the members, to the point where one member was congested while the utilization on the LAG was reasonably low. Looking at my netflow data, I was able to confirm that this was caused by a single large flow of ESP traffic. Fortunately, I was able to shift this flow to another path that had enough headroom available so that the flow could be accommodated on a single member link.

With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.

Please take this message in the spirit in which it was intended and refrain from the snarky "just upgrade your links" comments.

-- Nimrod
On Fri, Mar 20, 2020 at 05:33:31PM -0400, Nimrod Levy wrote:
With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.
Do we know which specific VPN technologies are harder to hash in a meaningful way for load balancing purposes than others? If the outcome of this troubleshooting is a list of recommendations about which VPN approaches to use, and which ones to avoid (because of the issue you described), that'll be a great outcome.

Kind regards,

Job
On Mar 20, 2020, at 5:50 PM, Job Snijders <job@ntt.net> wrote:
On Fri, Mar 20, 2020 at 05:33:31PM -0400, Nimrod Levy wrote:
With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.
Do we know which specific VPN technologies are harder to hash in a meaningful way for load balancing purposes than others?
If the outcome of this troubleshooting is a list of recommendations about which VPN approaches to use, and which ones to avoid (because of the issue you described), that'll be a great outcome.
It’s the protocol 50 IPSEC VPNs. They are very sensitive to path changes and reordering as well. If you’re tunneling more than 5 or 10Gb/s of IPSEC, it’s likely going to be a bad day when you find a low-speed link in the middle.

Generally providers with these types of flows have both sides on the same network vs going off-net, as they’re not stable on peering links that might change paths. You also need to watch out to ensure you’re not on some L2VPN type product that bumps up against a barrier.

I know it’s a stressful time for many networks and systems people as traffic shifts. Good luck out there!

- Jared
On Fri, Mar 20, 2020 at 05:57:19PM -0400, Jared Mauch wrote:
You also need to watch out to ensure you’re not on some L2VPN type product that bumps up against a barrier. I know it’s a stressful time for many networks and systems people as traffic shifts.
A few years ago we did a presentation about what can happen if hashing for load balancing purposes doesn't work well (be it either IP or L2VPN traffic). I think some of the information is still relevant, as there really isn't much difference between the problem existing in the underlay network's implementation of algorithms or in the properties of the envelope that encompasses the overlay network packet.

video of younger job + jeff: https://www.youtube.com/watch?v=cXSwoKu9zOg
slides: https://archive.nanog.org/meetings/nanog57/presentations/Tuesday/tues.genera...

Kind regards,

Job
(skipping up the thread some) On Fri, Mar 20, 2020 at 5:58 PM Jared Mauch <jared@puck.nether.net> wrote:
It’s the protocol 50 IPSEC VPNs. They are very sensitive to path changes and reordering as well.
If you’re tunneling more than 5 or 10Gb/s of IPSEC it’s likely going to be a bad day when you find a low speed link in the middle. Generally providers with these types of flows have both sides on the same network vs going off-net as they’re not stable on peering links that might change paths.
A bunch of times the advice given to folks in this situation is: "Add more entropy", which for ipsec/gre/etc. VPNs really means more endpoints. For instance, adding 3 more IPs on either side for tunnel egress/ingress will make the flows (ideally) smaller and more likely to hash across different links in the intermediary network(s). This also moves the load balancing back behind the customer prem, so ideally perhaps even the NxM flows are now balanced a little better as well.

Sometimes this works, sometimes it's hard to accomplish :(
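The "add more entropy via more endpoints" idea can be illustrated with a toy model of a per-flow hash (a sketch only: the hash choice and the documentation-prefix addresses here are my own, not any router's actual algorithm). ESP carries no ports, so the hash typically sees only addresses and the protocol number, and a single tunnel endpoint pair yields exactly one hash result:

```python
import hashlib

def hash_to_member(src, dst, proto, n_links):
    # Toy stand-in for a router's flow hash: ESP (protocol 50) has no
    # ports, so only addresses and protocol feed the computation.
    key = f"{src}|{dst}|{proto}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % n_links

N_LINKS = 4  # a 4-member LAG

# One endpoint pair: the whole tunnel rides a single member.
single = {hash_to_member("192.0.2.1", "198.51.100.1", 50, N_LINKS)}

# Four tunnel IPs per side: up to 16 distinct ESP "flows", which can
# land on different members.
spread = {hash_to_member(f"192.0.2.{s}", f"198.51.100.{d}", 50, N_LINKS)
          for s in range(1, 5) for d in range(1, 5)}

print(len(single), len(spread))
```

More endpoint pairs don't guarantee an even spread (the hash can still collide), but they give the intermediary networks something to work with.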
On 3/20/20 5:57 PM, Jared Mauch wrote:
It’s the protocol 50 IPSEC VPNs. They are very sensitive to path changes and reordering as well.
Is there a reason these are so sensitive to re-ordering or path changes? ESP should just encap whatever is underneath it on a packet-by-packet basis and be relatively stateless on its own, unless folks are super strictly enforcing sequence numbering (maybe this is common?). I can understand that some of the underlying protocols in use, especially LAN protocols like SMB/CIFS, might not really like re-ordering or public-Internet-like jitter and delay changes, but that's going to be the case with any transparent VPN and is one of SMB/CIFS's many flaws.

For LAGs where both endpoints are on the same gear (either the same box/chassis or a multi-chassis virtual setup where both planes are geographically local) and all links traverse the same path, i.e. the LAG is purely for capacity, I've always wondered why round-robin isn't more common. That will re-order by at worst the number of links in the LAG, and if the links are much faster and well utilized compared to the sub-flows, I'd expect the re-ordering to be minimal even then, though I haven't done the math to show it and might be wrong.

I'd argue that any remote access VPN product that can't handle minor packet re-ordering is sufficiently flawed as to be useless. Systems designed for very controlled deployment on a long-term point-to-point basis are perhaps excepted, here.

-- 
Brandon Martin
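The bounded-reordering intuition for round-robin can be checked with a toy simulation (assumptions mine: equal-size packets sent one per tick, each LAG member modeled as a fixed extra latency; real serialization and queueing are messier). Even when member latencies differ by a few packet times, no packet moves far from its original position in the arrival order:

```python
def roundrobin_arrivals(n_pkts, link_delays):
    """Spray packets 0..n-1 round-robin over LAG members with fixed
    per-member delays and return the sequence numbers in arrival order."""
    events = [(seq + link_delays[seq % len(link_delays)], seq)
              for seq in range(n_pkts)]  # one departure per tick
    return [seq for _, seq in sorted(events)]

# 4-member LAG whose members differ by up to 3.5 packet times.
delays = [0.0, 3.5, 1.0, 2.0]
order = roundrobin_arrivals(100, delays)

# How far does any packet move from its original position?
displacement = max(abs(pos - seq) for pos, seq in enumerate(order))
print(displacement)
```

With these numbers the worst displacement stays within the LAG width, consistent with the claim above; a congested member (a growing queue) would break the fixed-delay assumption, though.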
On Fri, Mar 20, 2020 at 3:07 PM Job Snijders <job@ntt.net> wrote:
Do we know which specific VPN technologies are harder to hash in a meaningful way for load balancing purposes than others?
I would expect it to be true of any site to site VPN data flow. The whole idea is for the guy in the middle to be unable to deduce anything about the flow. If the technology provides hints about which packets match the same subflow, it isn't doing a very good job. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
Hey Nimrod,
I was contacted by my NOC to investigate a LAG that was not distributing traffic evenly among the members to the point where one member was congested while the utilization on the LAG was reasonably low. Looking at my netflow data, I was able to confirm that this was caused by a single large flow of ESP traffic. Fortunately, I was able to shift this flow to another path that had enough headroom available so that the flow could be accommodated on a single member link.
With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.
This problem is called an elephant flow. Some vendors have a solution for this: dynamically monitoring utilisation and remapping the hashResult => egressInt table to create bias to offset the elephant flow.

One particular example: https://www.juniper.net/documentation/en_US/junos/topics/reference/configura...

Ideally VPN providers would be defensive and would use SPORT for entropy, like MPLSoUDP does.

-- 
  ++ytti
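The SPORT-for-entropy idea — the tunnel endpoint deriving the outer UDP source port from the inner flow, as MPLS-over-UDP (RFC 7510) does — can be sketched like this (a toy illustration; the hash choice and field layout are my own, not any product's):

```python
import hashlib

DYN_PORT_MIN, DYN_PORT_MAX = 49152, 65535  # dynamic/private port range

def entropy_sport(src, dst, proto, sport, dport):
    """Derive the outer UDP source port from the *inner* flow's 5-tuple,
    so transit routers hashing only on the outer header still see
    per-inner-flow entropy."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    h = int(hashlib.blake2s(key).hexdigest(), 16)
    return DYN_PORT_MIN + h % (DYN_PORT_MAX - DYN_PORT_MIN + 1)

# Two inner flows between the same sites get (very likely) different
# outer source ports, and therefore different ECMP/LAG members.
p1 = entropy_sport("10.0.0.1", "10.0.1.1", 6, 33000, 443)
p2 = entropy_sport("10.0.0.2", "10.0.1.1", 6, 41000, 443)
print(p1, p2)
```

Because the port is a pure function of the inner 5-tuple, each inner flow keeps a stable outer port (no intra-flow re-ordering) while the aggregate spreads. An encrypting tunnel has to compute this before encryption, since nothing in the middle can recover the inner keys.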
On Fri, Mar 20, 2020 at 3:09 PM Saku Ytti <saku@ytti.fi> wrote:
Hey Nimrod,
I was contacted by my NOC to investigate a LAG that was not distributing traffic evenly among the members to the point where one member was congested while the utilization on the LAG was reasonably low. Looking at my netflow data, I was able to confirm that this was caused by a single large flow of ESP traffic. Fortunately, I was able to shift this flow to another path that had enough headroom available so that the flow could be accommodated on a single member link.
With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.
This problem is called an elephant flow. Some vendors have a solution for this: dynamically monitoring utilisation and remapping the hashResult => egressInt table to create bias to offset the elephant flow.
One particular example:
https://www.juniper.net/documentation/en_US/junos/topics/reference/configura...
Ideally VPN providers would be defensive and would use SPORT for entropy, like MPLSoUDP does.
-- ++ytti
There are *several* caveats to doing dynamic monitoring and remapping of flows; one of the biggest challenges is that it puts extra demands on the line cards tracking the flows, especially as the number of flows rises to large values. I recommend reading https://www.juniper.net/documentation/en_US/junos/topics/topic-map/load-bala... before configuring it.

"Although the feature performance is high, it consumes significant amount of line card memory. Approximately, 4000 logical interfaces or 16 aggregated Ethernet logical interfaces can have this feature enabled on supported MPCs. However, when the Packet Forwarding Engine hardware memory is low, depending upon the available memory, it falls back to the default load balancing mechanism."

What is that old saying? Oh, right--There Ain't No Such Thing As A Free Lunch. ^_^;;

Matt
Hey Matthew,
There are *several* caveats to doing dynamic monitoring and remapping of flows; one of the biggest challenges is that it puts extra demands on the line cards tracking the flows, especially as the number of flows rises to large values. I recommend reading https://www.juniper.net/documentation/en_US/junos/topics/topic-map/load-bala... before configuring it.
You are confusing two features: stateful and adaptive. I was proposing adaptive, which just remaps the table, which is free; it is not flow-aware. The number of hash results is a small, bounded number, while the number of flow states is a very large, unbounded number.

-- 
  ++ytti
On Sat, Mar 21, 2020 at 12:53 AM Saku Ytti <saku@ytti.fi> wrote:
Hey Matthew,
There are *several* caveats to doing dynamic monitoring and remapping of flows; one of the biggest challenges is that it puts extra demands on the line cards tracking the flows, especially as the number of flows rises to large values. I recommend reading
https://www.juniper.net/documentation/en_US/junos/topics/topic-map/load-bala...
before configuring it.
You are confusing two features: stateful and adaptive. I was proposing adaptive, which just remaps the table, which is free; it is not flow-aware. The number of hash results is a small, bounded number, while the number of flow states is a very large, unbounded number.
Ah, apologies--you are right, I scanned down the linked document too quickly, thinking it was a single set of configuration notes. Thanks for setting me straight on that. Matt
-- ++ytti
Once upon a time, Nimrod Levy <nimrodl@gmail.com> said:
With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.
Last week I ran into an issue where traffic between my home and work networks had high latency, but only to certain IPs (even different IPs on the same server). Since my work network peers with my home provider, I was able to go to the provider's NOC, and they were very helpful (they ended up turning up more bandwidth). I expect this was also a case of one LAG member being congested, and my problem IP pairs were hashing to that member. My traffic wasn't VPN (SSH, with ping/mtr for testing), but it is possible that somebody else's was - I didn't get detailed with the other NOC. -- Chris Adams <cma@cmadams.net>
Was that large flow in a single LSP? Is this something that FAT LSP would fix?

-Steve

On Fri, Mar 20, 2020 at 5:33 PM Nimrod Levy <nimrodl@gmail.com> wrote:
I just ran into an issue that I thought was worth sharing with the NANOG community. With recently increased visibility on keeping the Internet running smoothly, I thought that sharing this small experience could benefit everyone.
I was contacted by my NOC to investigate a LAG that was not distributing traffic evenly among the members to the point where one member was congested while the utilization on the LAG was reasonably low. Looking at my netflow data, I was able to confirm that this was caused by a single large flow of ESP traffic. Fortunately, I was able to shift this flow to another path that had enough headroom available so that the flow could be accommodated on a single member link.
With the increase in remote workers and VPN traffic that won't hash across multiple paths, I thought this anecdote might help someone else track down a problem that might not be so obvious.
Please take this message in the spirit in which it was intended and refrain from the snarky "just upgrade your links" comments.
-- Nimrod
On Sat, 21 Mar 2020 at 04:20, Steve Meuse <smeuse@mara.org> wrote:
Was that large flow in a single LSP? Is this something that FAT LSP would fix?
No.

FAT adds an additional MPLS label for entropy: the ingress PE calculates a flow hash based on traditional flow keys and injects that flow number as an MPLS label, so a transit LSR can use the MPLS labels for balancing without being able to parse the frame. Similarly, a VPN provider could do that and inject that flow hash as the SPORT at the time of tunneling, by looking at the inside packet. And any defensive VPN provider should do this, as it would be a competitive advantage.

Now, for some vendors like Juniper and Nokia, a transit LSR can look inside the pseudowire L3 packet for flow keys, so you don't even need FAT for this. Some others, like ASR9k, cannot, and you'll need FAT for it.

But all of this requires that there is entropy to use; if it's truly just a single fat flow, then you won't balance it. Then you have to create bias in the hashResult=>egressInt table, which by default is fair (each egressInt has the same number of hashResults); for elephant flows you want the congested egressInt to be mapped to a smaller number of hashResults.

-- 
  ++ytti
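The hashResult=>egressInt biasing described here can be sketched in a few lines (a toy model under my own assumptions — bucket count and headroom-proportional weighting are illustrative, not any vendor's algorithm):

```python
from collections import Counter

def build_bias_table(n_buckets, link_load_bps, link_capacity_bps):
    """Map hash buckets to egress members in proportion to each member's
    remaining headroom, rather than the default equal split."""
    headroom = [max(cap - load, 0.0)
                for load, cap in zip(link_load_bps, link_capacity_bps)]
    total = sum(headroom)
    table = []
    for link, h in enumerate(headroom):
        table.extend([link] * round(n_buckets * h / total))
    # Rounding may leave us short; give leftovers to the emptiest member.
    while len(table) < n_buckets:
        table.append(max(range(len(headroom)), key=lambda i: headroom[i]))
    return table[:n_buckets]

# Member 0 carries a 9G elephant flow on a 10G link; members 1-2 are quiet.
table = build_bias_table(256, [9e9, 1e9, 1e9], [10e9, 10e9, 10e9])
print(Counter(table))
```

The congested member ends up owning far fewer hash buckets than its idle peers, offsetting the elephant flow without keeping any per-flow state.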
On 21/Mar/20 09:58, Saku Ytti wrote:
No.
FAT adds an additional MPLS label for entropy: the ingress PE calculates a flow hash based on traditional flow keys and injects that flow number as an MPLS label, so a transit LSR can use the MPLS labels for balancing without being able to parse the frame. Similarly, a VPN provider could do that and inject that flow hash as the SPORT at the time of tunneling, by looking at the inside packet. And any defensive VPN provider should do this, as it would be a competitive advantage. Now, for some vendors like Juniper and Nokia, a transit LSR can look inside the pseudowire L3 packet for flow keys, so you don't even need FAT for this. Some others, like ASR9k, cannot, and you'll need FAT for it.
But all of this requires that there is entropy to use; if it's truly just a single fat flow, then you won't balance it. Then you have to create bias in the hashResult=>egressInt table, which by default is fair (each egressInt has the same number of hashResults); for elephant flows you want the congested egressInt to be mapped to a smaller number of hashResults.
So the three or four times we tried to get FAT going (in a multi-vendor network), it simply didn't work. Have you (or anyone else) had any luck with it, in practice? Mark.
On Sat, 21 Mar 2020 at 18:19, Mark Tinka <mark.tinka@seacom.mu> wrote:
So the three or four times we tried to get FAT going (in a multi-vendor network), it simply didn't work.
Yeah, we run it in a multivendor network (JNPR, CSCO, NOK); it works.

I would also recommend people exclusively use CW+FAT and disable LSR payload heuristics (heuristics are the JNPR default, but by default it won't do them when CW is present; it can be made to do them with CW too).

-- 
  ++ytti
On 21/Mar/20 18:25, Saku Ytti wrote:
Yeah we run it in a multivendor network (JNPR, CSCO, NOK), works.
I would also recommend people exclusively use CW+FAT and disable LSR payload heuristics (heuristics are the JNPR default, but by default it won't do them when CW is present; it can be made to do them with CW too).
We weren't as successful (MX480 ingress/egress devices transiting a CRS core). In the end, we updated our policy to avoid running LAG's in the backbone, and going ECMP instead. Even with l2vpn payloads, that spreads a lot more evenly. Mark.
On Sun, 22 Mar 2020 at 09:41, Mark Tinka <mark.tinka@seacom.mu> wrote:
We weren't as successful (MX480 ingress/egress devices transiting a CRS core).
So you're not even talking about multivendor, as both ends are JNPR? Or are you confusing entropy label with FAT? Transit doesn't know anything about FAT, FAT is PW specific and is only signalled between end-points. Entropy label applies to all services and is signalled to adjacent device. Transit just sees 1 label longer label stack, with hope (not promise) that transit uses the additional label for hashing.
In the end, we updated our policy to avoid running LAG's in the backbone, and going ECMP instead. Even with l2vpn payloads, that spreads a lot more evenly.
You really should be doing CW+FAT.

And looking at your other email, dear god, don't do per-packet outside some unique application where you control the TCP stack :). Modern Windows, Linux, and MacOS TCP stacks consider out-of-order as packet loss. This is not inherent to TCP; if you can change the TCP congestion control, you can make reordering entirely irrelevant to TCP. But in most cases of course we do not control the TCP algo, so per-packet will not work one bit. Like OP, you should enable adaptive.

This thread is conflating a few different balancing issues, so I'll take the opportunity to classify them.

1. Bad hashing implementation

1.1 Insufficient amount of hash-results

Think say 6500/7600: what if you only have 8 hash-results and 7 interfaces? You will inherently have 2x more traffic on one interface.

1.2 Bad algorithm

Different hashes have different use-cases, and we often reach for a golden hammer (like we tend to use bad hashes for password hashing, like SHA etc., when the goal of SHA is to be fast in HW, which is the opposite of the goal of a password hash, where you want it to be slow). Equally, since day 1 of ethernet silicon we've had CRC in the silicon, and it has since been grandfathered in as the load-balancing hash. But CRC goals are completely different from hash-algo goals: CRC does not try, and does not need, to have good diffusion quality, whereas a load-balancing hash needs perfect diffusion and nothing else matters. CRC has terrible diffusion quality; instead of implementing a specific good-diffusion hash in silicon, vendors do stuff like rot(crcN(x), crcM(x)), which greatly improves diffusion but is still very bad compared to hash algos designed for perfect diffusion. Poor diffusion means you get different flow counts on the egressInts.
As I can't do math, I did a Monte Carlo simulation to see what type of bias we should expect even with _perfect_ diffusion. Here we have 3 egressInts, and we run Monte Carlo until we stop getting worse bias (of course, if we wait for the heat death of the universe, we will eventually see every flow on a single egressInt, even with perfect diffusion). In a normal situation, if you see worse bias than this, you should blame the poor diffusion quality of the vendor's algo; if you see this bias or lower, it's probably not diffusion you should blame.

Flows | MaxBias | Example Flow Count per Int
1k    | 6.9%    | 395, 341, 264
10k   | 2.2%    | 3490, 3396, 3114
100k  | 0.6%    | 33655, 32702, 33643
1M    | 0.2%    | 334969, 332424, 332607

2. Elephant flows

Even if we assume perfect diffusion, so each egressInt gets exactly the same number of flows, the flows may still be wildly different in bps, and there is nothing we can do by tuning the hash algo to fix this. The prudent fix here is to have a mapping table between hash-result and egressInt, so that we can inject bias: not a fair distribution between hash-results and egressInts, but fewer hash-results pointing to the congested egressInt. This is easy and ~free to implement in HW. JNPR does it; NOK is happy to implement it should customers want it. This of course also fixes bad algorithmic diffusion, so it's a really great tool to have in your toolbox, and I think everyone should be running this feature.

3. Incorrect key recovery

Balancing is a promise that we know which keys identify a flow. In the common case this is a simple problem, but there is a lot of complexity, particularly in MPLS transit. The naive/simple problem everyone knows about is a pseudowire flow in transit being parsed as an IPv4/IPv6 flow when the DMAC starts with 4 or 6.
Some vendors (JNPR, Huawei) do additional checks, like perhaps the IP checksum or IP packet length, but this actually makes the situation worse: the problem triggers far less often, but when it does trigger it is so much more exotic, as now you have an underlying frame where by luck the supposed IP packet length is also correct. So you can end up in weird situations where the end customer's network works perfectly, then they implement IPSEC from all hosts to a concentrator, still riding over your backbone, and suddenly one customer host stops working after enabling IPSEC while everything else works. The chances that this trouble ticket ever even ends up on your table are low, and the possibility that, based on the problem description, you'd blame the backbone is negligible. The customer will just end up renumbering the host or replacing its DMAC or something, and no one will ever know why it was broken.

So it's crucial not to do payload heuristics in MPLS transit, as it cannot be done correctly by design. FAT and entropy labels solve this problem correctly, moving the hash-result generation to the edge, where you still can do it correctly.

-- 
  ++ytti
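The perfect-diffusion bias table quoted earlier in this message can be approximated with a few lines of Monte Carlo (a sketch; I read "MaxBias" as the worst deviation of any member's flow count from the fair share, expressed as a fraction of total flows, which matches the quoted numbers):

```python
import random

def max_bias(n_flows, n_links, trials=50, seed=1):
    """Worst-case imbalance over `trials` runs when a perfect hash
    assigns equal-size flows to links uniformly at random."""
    rng = random.Random(seed)
    fair = n_flows / n_links
    worst = 0.0
    for _ in range(trials):
        counts = [0] * n_links
        for _ in range(n_flows):
            counts[rng.randrange(n_links)] += 1
        worst = max(worst, max(abs(c - fair) for c in counts) / n_flows)
    return worst

for flows in (1_000, 10_000, 100_000):
    print(flows, f"{100 * max_bias(flows, 3):.1f}%")
```

With only 50 trials the numbers come out a little below the quoted ones (which were run until the bias stopped getting worse), but the trend is the same: expected imbalance shrinks roughly with the square root of the flow count, so a large bias on a link carrying many flows points at poor hash diffusion rather than bad luck.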
On 22/Mar/20 11:52, Saku Ytti wrote:
So you're not even talking about multivendor, as both ends are JNPR? Or are you confusing entropy label with FAT?
Some cases were MX480 to ASR920, but most were MX480 to MX480, either transiting CRS.
Transit doesn't know anything about FAT, FAT is PW specific and is only signalled between end-points. Entropy label applies to all services and is signalled to adjacent device. Transit just sees 1 label longer label stack, with hope (not promise) that transit uses the additional label for hashing.
So the latter. We used both FAT + entropy to provide even load balancing of l2vpn payloads in the edge and core, with little success.
You really should be doing CW+FAT.
Yeah - just going back to basics with ECMP worked well, and I'd prefer to use solutions that are as unexotic as possible.
And looking your other email, dear god, don't do per-packet outside some unique application where you control the TCP stack :). Modern Windows, Linux, MacOS TCP stack considers out-of-order as packet loss, this is not inherent to TCP, if you can change TCP congestion control, you can make reordering entirely irrelevant to TCP. But in most cases of course we do not control TCP algo, so per-packet will not work one bit.
Like I said, that was 2014. We tested it for a couple of months, mucked around as much as we could, and decided it wasn't worth the bother.
Like OP, you should enable adaptive.
That's what I said we are doing since 2014, unless I wasn't clear. Mark.
On Sun, 22 Mar 2020 at 16:25, Mark Tinka <mark.tinka@seacom.mu> wrote:
So the latter. We used both FAT + entropy to provide even load balancing of l2vpn payloads in the edge and core, with little success.
You don't need both. My rule of thumb: green field, go with entropy label and get all the services in one go; brown field, go FAT and target just PWs, ensure you also have CW, then let the transit LSR balance MPLS-IP. With entropy label you can entirely disable transit LSR payload heuristics.

-- 
  ++ytti
On 22/Mar/20 19:17, Saku Ytti wrote:
You don't need both. My rule of thumb: green field, go with entropy label and get all the services in one go; brown field, go FAT and target just PWs, ensure you also have CW, then let the transit LSR balance MPLS-IP. With entropy label you can entirely disable transit LSR payload heuristics.
We moved to our current strategy back in 2015/2016, after running through multiple combinations of FAT and entropy. I'm curious to give it another go in 2020, but if I'm honest, I'm pleased with the simplicity of our current setup. Mark.
Saku Ytti Sent: Saturday, March 21, 2020 4:26 PM
On Sat, 21 Mar 2020 at 18:19, Mark Tinka <mark.tinka@seacom.mu> wrote:
So the three or four times we tried to get FAT going (in a multi-vendor network), it simply didn't work.
Yeah we run it in a multivendor network (JNPR, CSCO, NOK), works.
I would also recommend people exclusively using CW+FAT and disabling LSR payload heuristics (JNPR default, but by default won't do with CW, can do with CW too).
And I'd add entropy labels too - for L3VPN traffic. Using all this, you know where to look (at the PE edge) for any hashing-related problems.

adam
Mark Tinka wrote on 21/3/20 18:15:
So the three or four times we tried to get FAT going (in a multi-vendor network), it simply didn't work.
Have you (or anyone else) had any luck with it, in practice?
Mark.
Only between Cisco boxes. I still don't understand why the vendors cannot make it work in one direction only (the low-end platform would only need to remove an extra label, no need to inspect traffic). That would help us a lot, since the majority of our traffic is downstream to the customer. -- Tassos
On Sat, 21 Mar 2020 at 18:55, Tassos Chatzithomaoglou <achatz@forthnet.gr> wrote:
I still don't understand why the vendors cannot make it work in one direction only (the low-end platform would only need to remove an extra label, no need to inspect traffic). That would help us a lot, since the majority of our traffic is downstream to the customer.
It is signalled separately for TX and RX and some vendors do allow you to signal it separately. -- ++ytti
Saku Ytti wrote on 21/3/20 19:04:
On Sat, 21 Mar 2020 at 18:55, Tassos Chatzithomaoglou <achatz@forthnet.gr> wrote:
I still don't understand why the vendors cannot make it work in one direction only (the low-end platform would only need to remove an extra label, no need to inspect traffic). That would help us a lot, since the majority of our traffic is downstream to the customer.

It is signalled separately for TX and RX and some vendors do allow you to signal it separately.
Yep, the RFC gives this option. Does Juniper MX/ACX series support it? I know for sure Cisco doesn't. -- Tassos
Hey Tassos, On Sat, 21 Mar 2020 at 22:51, Tassos Chatzithomaoglou <achatz@forthnet.gr> wrote:
Yep, the RFC gives this option. Does Juniper MX/ACX series support it? I know for sure Cisco doesn't.
I only run bidir. Which Cisco do you mean? ASR9k allows you to configure it:

  both       Insert/Discard Flow label on transmit/receive
  code       Flow label TLV code
  receive    Discard Flow label on receive
  transmit   Insert Flow label on transmit

JunOS as well:

  flow-label-receive          Advertise capability to pop Flow Label in receive direction to remote PE
  flow-label-receive-static   Pop Flow Label from PW packets received from remote PE
  flow-label-transmit         Advertise capability to push Flow Label in transmit direction to remote PE
  flow-label-transmit-static  Push Flow Label on PW packets sent to remote PE

RP/0/RP0/CPU0:r14.labxtx01.us.(config-l2vpn-pwc-mpls)#do show l2vpn xconnect interface Te0/2/0/3/7.1000 detail
..
 PW: neighbor 204.42.110.29, PW ID 1290, state is up ( established )
   PW class ethernet-ccc, XC ID 0xa0000025
   Encapsulation MPLS, protocol LDP
   Source address 204.42.110.15
   PW type Ethernet, control word disabled, interworking none
   PW backup disable delay 0 sec
   Sequencing not set
   LSP : Up
   Load Balance Hashing: src-dst-ip
   Flow Label flags configured (Tx=1,Rx=0), negotiated (Tx=1,Rx=0)
....

ytti@r28.labxtx01.us.bb# run show l2circuit connections interface et-0/0/54:3.0
...
Neighbor: 204.42.110.15
    Interface                 Type  St     Time last up          # Up trans
    et-0/0/54:3.0(vc 1290)    rmt   Up     Mar 20 04:06:45 2020           7
      Remote PE: 204.42.110.15, Negotiated control-word: No
      Incoming label: 585, Outgoing label: 24003
      Negotiated PW status TLV: No
      Local interface: et-0/0/54:3.0, Status: Up, Encapsulation: ETHERNET
        Description: BD: wmccall ixia 1-1
      Flow Label Transmit: No, Flow Label Receive: Yes
...

I didn't push bits, but at least I can signal unidir between ASR9k and PTX1k.

-- 
  ++ytti
On 20/03/2020 21:33, Nimrod Levy wrote:
I was contacted by my NOC to investigate a LAG that was not distributing traffic evenly among the members to the point where one member was congested while the utilization on the LAG was reasonably low.
I don't know how well-known this is, and it may not be something many people would want to do, but Enterasys switches, now part of Extreme's portfolio, allow "round-robin" as a load-sharing algorithm on LAGs. see e.g. https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-L... This may not be the only product line supporting this.
On 22/Mar/20 10:08, Adam Atkinson wrote:
I don't know how well-known this is, and it may not be something many people would want to do, but Enterasys switches, now part of Extreme's portfolio, allow "round-robin" as a load-sharing algorithm on LAGs.
see e.g.
https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-L...
This may not be the only product line supporting this.
So Junos does support both per-flow and per-packet load balancing on LAG's on Trio line cards. We tested this back in 2014 for a few months, and while the spread is excellent (obviously), it creates a lot of out-of-order frame delivery conditions, and all the pleasure & joy that goes along with that. So we switched back to per-flow load balancing, and more recently, where we run LAG's (802.1Q trunks between switches and an MX480 in the data centre), we've gone 100Gbps so we don't have to deal with all this anymore :-). Mark.
participants (14)
- Adam Atkinson
- adamv0025@netconsultings.com
- Brandon Martin
- Chris Adams
- Christopher Morrow
- Jared Mauch
- Job Snijders
- Mark Tinka
- Matthew Petach
- Nimrod Levy
- Saku Ytti
- Steve Meuse
- Tassos Chatzithomaoglou
- William Herrin