LAG/ECMP and 'exact-route'

Hey-o,

Which platform/software has a command to show which interface will be used for forwarding with given keys?

ASR9k has 'cef exact-route', and I see references to it in c-nsp, reddit and cisco.com forums stressing how useful a debugging tool it has been. This is despite it not actually working: it is just RE software and doesn't talk to the EZchip/Lightspeed. Unless it has been fixed in the past couple of years, it certainly hasn't worked in the timeline of the various forums finding it useful.

MX has 'jsim' https://www.juniper.net/documentation/en_US/day-one-books/TW_MX3D_PacketWalk... which I think actually works, but it is quite involved. I have some (false?) memory of seeing in a release note that this had been productised a bit more into a CLI command, but I'm failing to find anything to support this memory.

There is also RFC5837, which is actually implemented in QFX5k, but not for TTL exceeded. We've opened an ER to get it supported on MX and PTX, and for TTL exceeded. This RFC allows programmatic, platform-agnostic discovery of the actual interface used, without relying on platform-specific magic. So please do ask your vendors to implement it.

--
++ytti
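For readers who have not looked at RFC 5837: the ICMP extension it defines carries an Interface Information Object whose C-Type byte says which interface is being described and which fields follow. The sketch below decodes only that byte. It is based on my reading of the RFC's bit layout (interface role in the two high-order bits, then ifIndex, IPAddr, name and MTU flags in the four low-order bits); the function and constant names are made up for illustration, and the layout should be checked against the RFC before relying on it.

```python
# Sketch only: decode the C-Type byte of an RFC 5837 Interface Information
# Object (ICMP extension Class-Num 2). All names here are illustrative.

INTERFACE_ROLES = {
    0: "incoming IP interface",
    1: "sub-IP component of incoming IP interface",
    2: "outgoing IP interface",
    3: "IP next hop",
}


def decode_ctype(ctype: int) -> dict:
    """Split the C-Type byte into the interface role and the presence flags."""
    if not 0 <= ctype <= 0xFF:
        raise ValueError("C-Type is a single byte")
    return {
        "role": INTERFACE_ROLES[ctype >> 6],  # two high-order bits
        "has_ifindex": bool(ctype & 0x08),    # ifIndex field present
        "has_ipaddr": bool(ctype & 0x04),     # IP Address Sub-Object present
        "has_name": bool(ctype & 0x02),       # Interface Name Sub-Object present
        "has_mtu": bool(ctype & 0x01),        # MTU field present
    }
```

A traceroute-style tool would call something like `decode_ctype(0x0A)` on each object it finds in a returned ICMP error, and then read the ifIndex and interface name that the far end says it received the packet on.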

Hi.

On XR platforms you can use the "bundle-hash" command to try and calculate which member some traffic will be distributed to:

  RP/0/RSP0/CPU0:lab-pe2#bundle-hash bundle-ether 198 location 0/0/CPU0
  Wed Aug 13 11:59:05.046 CEST
  Calculate Bundle-Hash for L2 or L3 or sub-int based: 2/3/4 [3]:

/SR

For JUNOS I think that you are looking for:

  user@lab> show forwarding-options load-balance ?
  Possible completions:
    destination-address  Destination IP address
    destination-port     Destination port
    family               Layer 3 family
    ingress-interface    Ingress Logical Interface
    packet-dump          Raw packet dump in hex without '0x'
    source-address       Source IP address
    source-port          Source port
    tos                  Type of Service field
    transport-protocol   Transport layer protocol

Nitzan

Thanks Nitzan, that was what I was thinking of. That is quite recent (to me) and I suspect it is syntactic sugar for 'jsim'?

Unfortunately 'show forwarding-options load-balance' doesn't allow giving it an MPLS label stack, which greatly limits its utility for SP networks.

Steinar, in your experience does bundle-hash give correct results? Is it actually injecting packets into the EZchip/Lightspeed and getting results from the HW? ('cef exact-route' is not doing this, at least.)

Thanks to Pedro Prado for sharing that Arista has a command for this, and indeed in Arista, like in Juniper, the packet is actually injected into the hardware to get the result.

I think none of them allow giving an MPLS stack though? So mostly useful for cloudy people, not SP people. RFC5837 would more reliably give us the correct answer.

--
++ytti

Hi Saku,

On Thursday, August 14th, 2025 at 08:26, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
> Thanks to Pedro Prado for sharing that Arista has a command for this, and indeed in Arista like in Juniper packet is actually injected to the hardware to get the result.

I'm curious to know what command was provided.

On EOS one can use "show ip hardware ale routes x.x.x.x/xx" to see the next-hop + adjacency details. If I use "show ip hardware ale routes vrf XXX x.x.x.x/xx" (seeing as you're talking about MPLS) then I can also see the egress MPLS encap details that will be used. For example:

  L2Adj Id: 114, Adjs: 1, State: Installed, Plat. FEC: 91904, Hierarchy Depths: 0, Via nextHopHierarchical, Weight: 1, EncapType: MPLS, Operation: push, Label Stack: 116384, L3Intf: IS-IS SR tunnel index 3 (DyTun7340032.3, FecId 432468709529878531), Via L2Adj: 209

This uses L2 adj 114, which then recurses via another adj, 209, so I need to dig through the recursion stack using "show ip hardware ale adj x" until I get to the bottom to see the whole thing.

One can also use "show ip hardware fib vrf XXX routes x.x.x.x/xx". The first command provides the related adjacency; this second one only returns the route, without the related adjacency.

This isn't querying hardware AFAIK, but there is the command "show ip hardware fib diff", which shows me there is no diff between software and hardware, so the output is as reliable as it can get (you can never know if the output of a command is 100% correct because it's all closed source).

Neither of these lets me query for any input tuple I want though; I'm specifying the destination address only.

There is the command "show forwarding destination", which allows you to specify some of the header fields, but:

* it expects quite a specific set of headers (e.g. it must be VLAN tagged),
* it doesn't support MPLS,
* it doesn't allow you to specify a VRF (so you can only query the default routing table as far as I can see),
* it produces the result of an incoming packet lookup. It doesn't tell you what egress port + encap would be used, so it needs to be executed on the receiving device, not the sending device.

I would be interested to know if anyone has used this command with success, as it does seem to be querying the hardware.

There is also the command "show port-channel load-balance jericho2 fields" (replace jericho2 with your chip name) and "show load-balance port-channel sand fields" (aliases of each other), but this only tells you which fields are parsed; you can't "ask" a question for a user-provided tuple.

I think what you actually want to achieve (your original question) is possible if you drop into a BASH shell and use the various undocumented debug tools there. I haven't had time to work out the exact syntax yet, but I believe all the required commands are there (you'd need to string a few things together; you can't just write the interface name, source/dest IP etc. into a single command and get the result). I would be interested to know if anyone has already taken the time to do this. I have already managed to capture the lookup of packets transiting the hardware, so lookups can be captured; I just need to get control of triggering that lookup.

Cheers,
James.
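The manual "dig through the recursion stack" step amounts to a simple pointer chase, which can be sketched as below. The table is hypothetical: only adjacency IDs 114 and 209 and the pushed label come from the output in this message, while the contents of adjacency 209 (and the idea of having the adjacencies in a Python dict at all) are invented purely for illustration. On a real box each hop would be another "show ip hardware ale adj x".

```python
from typing import Optional

# Hypothetical adjacency table: id -> (description, next adjacency id or None).
# Only the IDs 114/209 and label 116384 appear in the thread; the rest is made up.
ADJACENCIES: dict[int, tuple[str, Optional[int]]] = {
    114: ("MPLS push 116384 via IS-IS SR tunnel", 209),
    209: ("hypothetical bottom-level adjacency", None),
}


def walk_adjacency(adj_id: int, table: dict) -> list:
    """Follow 'Via L2Adj' pointers until an adjacency has no further recursion."""
    chain = []
    seen = set()
    current: Optional[int] = adj_id
    while current is not None:
        if current in seen:  # guard against a looping table
            raise ValueError(f"adjacency loop at {current}")
        seen.add(current)
        description, next_id = table[current]
        chain.append((current, description))
        current = next_id
    return chain


for adj, description in walk_adjacency(114, ADJACENCIES):
    print(adj, description)
```

Note that this only follows one branch per level. Where an adjacency has several members, picking the branch is exactly the hash question the thread is about.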

Hey James, the command is “show load-balance”. This is the article I linked for Saku:

https://www.arista.com/en/support/toi/eos-4-17-0f/13816-ecmp-hash-visibility

  show load-balance destination ip ingress-interface <interface>
      { src-ipv4-address <ipv4-address> dst-ipv4-address <ipv4-address> |
        src-ipv6-address <ipv6-address> dst-ipv6-address <ipv6-address> flow-label <label> }
      ip-protocol <protocol> [ ip-ttl <ttl> ]
      { { inner src-ipv4-address <ipv4-address> inner dst-ipv4-address <ipv4-address> inner ip-protocol <proto> } |
        { inner src-ipv6-address <ipv6-address> inner dst-ipv6-address <ipv6-address> inner ip-protocol <proto> }
      [src-l4-port <port#> dst-l4-port <port#> ] [vlan-id <vlan>]

It does use the hardware and as far as I can tell it indeed doesn’t support MPLS as of now.

HTH,
Pedro Martins Prado

On Thursday, August 14th, 2025 at 15:20, Pedro Prado wrote:
> Hey James

Hi Pedro,

Thanks for the response.

Ah OK, this looks like it is the same as the command I mentioned, "show forwarding destination":

  #show forwarding destination ?
    dst-ipv4                 Destination IPv4 Address
    dst-ipv6                 Destination IPv6 Address
    dst-l4-port              L4 Destination Port
    dst-mac                  MAC address for the destination
    edit                     Edit the configured packet
    eth-type                 Ethertype
    flow-label               IPv6 Flow Label
    gre-checksum             Generate GRE Checksum field
    gre-key                  GRE Key
    gre-sequence             GRE Sequence
    gre-type                 GRE type
    hop-limit                IPv6 Hop Limit
    ingress-interface        Ingress Interface
    inner                    Inner packet configuration
    inner-vlan               Inner VLAN ID
    ip-protocol              IP Protocol
    ip-ttl                   IP TTL
    next-header              IPv6 Next Header
    nvgre-flow-id            NVGRE Flow ID
    nvgre-virtual-subnet-id  NVGRE Virtual Subnet ID
    packet-type              Packet type
    src-ipv4                 Source IPv4 Address
    src-ipv6                 Source IPv6 Address
    src-l4-port              L4 Source Port
    src-mac                  MAC address of the source
    vlan                     Identifier for a Virtual LAN
    >                        Redirect output to URL
    >>                       Append redirected output to URL
    |                        Command output pipe filters
    <cr>

The command you mentioned doesn't exist on my platform:

  #show load-balance destination ?
  % Unrecognized command
  #show load-balance ?
    ecmp          Show ECMP parameters
    port-channel  Show port-channel parameters
    profile       Show load-balance profile

The version on my platform takes the same input as yours (more or less) but, as I wrote, I've found it to be mostly useless because, for example, it expects a double VLAN tagged frame and it's not possible to specify untagged... /me shrugs

Cheers,
James.

> Unfortunately the 'show forwarding-options load-balance' doesn't allow giving MPLS label stack to it which greatly limits utility for SP networks.

I *think* that you can use the packet-dump option to paste a packet in hex and it will give the proper result with the label stack considered. It was a couple of years ago that I tried this, and my memory is fuzzy on whether it worked correctly or not. Even if it did work, it's obviously clunky as hell to have to slap a hex dump in there.

I did find a note that I asked for an ER to allow label IDs on the CLI, but I can't find anything further on whether they said yes or no. I'll ask.
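If packet-dump does honour the label stack, producing the hex by hand is the clunky part, so a small script helps. This is a sketch under assumptions: the thread does not say at which header the dump is expected to start (Ethernet, MPLS, or IP), so the example simply starts at the MPLS label stack, and the function names are made up for illustration. The label stack entry layout itself (20-bit label, 3-bit TC, bottom-of-stack bit, 8-bit TTL) is the standard MPLS encoding.

```python
import socket
import struct


def mpls_label_stack(labels, tc=0, ttl=64) -> bytes:
    """Encode labels as 32-bit stack entries: label(20) | TC(3) | S(1) | TTL(8)."""
    entries = []
    for position, label in enumerate(labels):
        if not 0 <= label < 2 ** 20:
            raise ValueError(f"label {label} does not fit in 20 bits")
        bottom = 1 if position == len(labels) - 1 else 0  # S bit on the last entry only
        entries.append(struct.pack("!I", (label << 12) | (tc << 9) | (bottom << 8) | ttl))
    return b"".join(entries)


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement checksum over a 20-byte IPv4 header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_udp(src, dst, sport, dport, payload=b"") -> bytes:
    """Minimal IPv4 + UDP packet; the UDP checksum is left at 0 (not computed)."""
    udp = struct.pack("!HHHH", sport, dport, 8 + len(payload), 0) + payload
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp), 0, 0, 64, 17, 0,
                         socket.inet_aton(src), socket.inet_aton(dst))
    header = header[:10] + struct.pack("!H", ipv4_checksum(header)) + header[12:]
    return header + udp


def packet_hex(labels, src, dst, sport, dport) -> str:
    """Hex string without a '0x' prefix, as the packet-dump help text asks for."""
    return (mpls_label_stack(labels) + ipv4_udp(src, dst, sport, dport)).hex()


# Label 116384 is the one from the EOS output earlier in the thread; the outer
# label 16 and the addresses (documentation prefixes) are arbitrary examples.
print(packet_hex([16, 116384], "192.0.2.1", "198.51.100.7", 40000, 443))
```

If the command turns out to want an Ethernet header in front, prepending 14 bytes with EtherType 0x8847 (MPLS unicast) would be the obvious next thing to try.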

This is something that's been asked for as a feature more times than I can count, on more platforms than I can count either.

First, I'll assert that this is completely possible. Second, I'll assert that it's not even really "hard" -- but it is a lot of work.

The way almost all of these mechanisms work is by identifying a set of key fields from the packet and the associated metadata, then applying some hash function to those fields, and then using that result to index into a table of possible next hops.

This is relatively simple if (for example) you’re ONLY doing “pick a member link from this etherchannel / LAG bundle”. If you have 5 members in the bundle, you hash the key fields, take that result modulo 5, and use the remainder as the index into the table from [0:4]; the answer is which port you go out.

HOWEVER, this starts to get really complex as soon as you start dealing with multiple levels of recursion. As a much more complex example you might have a network where you’re doing Inter-AS VPN, and you might be using something like MPLS-TE within it. Now your resolution “tree” could include at least the following decisions that need to be made:

1. Packet arrives, I need to do a FIB lookup to identify which ASBR exit point to use. So that’s one hash calculation and an index into the group for the ASBR.
2. Once I have that, I have multiple tunnels/paths to get to that ASBR. So I need to do another hash (no guarantee that it’s the same number of parallel paths) and do another index into another table. This tells me which tunnel I’m going to use (and the encap/label for that tunnel...). Once I have THAT...
3. Now I might find out that the output interface for my tunnel is actually a bundle/LAG, which means I need to do (yet) another hash calculation, and index into (yet) another table to pick which actual ethernet interface I send the packet on.
   (I also have to get the encap for this, to figure out source/destination MAC addresses for this particular link...)

SO.... tired yet? Because we ain’t done...

If it’s possible for the packet to arrive on any kind of virtual interface (like an MPLS/GRE tunnel, or a pseudowire, or whatever) then I probably have to do some extra digging. Let’s use GRE: in this case somewhere in the forwarding code I have to make sure that I’m using the correct source and destination IP addresses for my hashing.... because hashing on the tunnel address isn’t going to have NEARLY as much entropy as hashing on the actual end-station IP source/dest. UNLESS, of course, someone really does want to hash the whole tunnel together, which means now I have to implement both forwarding flows, AND the knob to select which to use. See? Fun.

All of this has to be kept in sync by the control plane across what might be over a hundred discrete forwarding chips/NPUs. If any of those chips don’t get the memo, instead of a router you have a great big doorstop.

SO... To figure out exactly what the output interface is for any given packet, you need some combination of (at least) the following info:

* the packet itself, often parsed into at least:
  * IP source / dest
  * layer4 source / dest
  * TOS
  * Entropy / flow labels
* ALL of the metadata that might be used to make ANY of the hashing decisions...
  * at least some systems use the input interface as an input into the hash.
  * some use TOS, some don’t.
  * some have different hash generator algorithms, so you have to know which one
  * usually there’s some additional hash seed for more entropy (such as the router-ID) – you have to know if this is in play and if so what it is.
* you need to know exactly which NPU(s) are in the forwarding path, because there’s no guarantee that they use the same algorithms.
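The "hash, modulo, index" step and the three-level resolution tree described above can be sketched as follows. This is an illustration only: CRC32 stands in for whatever hash a real NPU uses, mixing a per-level value into the hash is an assumption, and every name in it (HashQuery, resolve, the ASBR/tunnel/port names) is hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HashQuery:
    """Packet fields plus the metadata a platform might mix into the hash."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int
    ingress_interface: str = ""  # used as hash input on some systems
    seed: int = 0                # e.g. something derived from the router-ID


def flow_hash(q: HashQuery, level: int) -> int:
    """Stand-in hash (CRC32); real hardware uses its own functions."""
    fields = (q.src_ip, q.dst_ip, q.proto, q.src_port, q.dst_port,
              q.ingress_interface, level)
    return zlib.crc32("|".join(map(str, fields)).encode(), q.seed & 0xFFFFFFFF)


def pick(members, q: HashQuery, level: int):
    """Hash the keys, take the result modulo the group size, index the table."""
    return members[flow_hash(q, level) % len(members)]


def resolve(q: HashQuery, asbrs, tunnels, lag_members):
    """Walk the tree: ASBR exit point, then tunnel to it, then LAG member."""
    asbr = pick(asbrs, q, level=1)
    tunnel = pick(tunnels[asbr], q, level=2)
    port = pick(lag_members[tunnel], q, level=3)
    return asbr, tunnel, port


# Hypothetical topology: 2 ASBRs, 2-3 tunnels each, 5-member LAG per tunnel.
ASBRS = ["asbr1", "asbr2"]
TUNNELS = {"asbr1": ["te1", "te2", "te3"], "asbr2": ["te4", "te5"]}
LAG_MEMBERS = {t: [f"{t}-member{i}" for i in range(5)]
               for group in TUNNELS.values() for t in group}

query = HashQuery("198.51.100.7", "192.0.2.1", 6, 40000, 443,
                  ingress_interface="et-0/0/1", seed=42)
print(resolve(query, ASBRS, TUNNELS, LAG_MEMBERS))
```

Leaving out or mis-stating any field of the query (seed, ingress interface, which level uses which keys) still produces an answer, just not necessarily the one the hardware would give, which is the maintenance and usability problem discussed in the rest of the thread.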
Once you’ve collected all of this information, and if you assume that you either maintain constantly or can query each of the NPUs for the current state of its local resolution tree(s), now you can compute which output interface will be chosen for that given packet.

But to tell the control plane how to do all this math, you’re going to have a VERY long input CLI, or you’re going to just have to feed your command the hex from the raw packet and let it do the decoding for you. And don’t forget that you have to explicitly tell it all the things like “what input interface are we simulating” and “are we using the entropy label or not” and all that.

So. Anyway. In my newfound role as head apologist for people who build big systems... the main reason that these commands don’t exist on most systems is not because we don’t know how to implement them, and not because we don’t see the value in implementing them. It’s because the cost to implement (and maintain!!!) them is actually really high, and people have decided (with their wallets) that they want other things more than they want this.

</apologies>

--lj

Hey LJ,
> * the packet itself, often parsed into at least:
>   * IP source / dest
>   * layer4 source / dest
>   * TOS
>   * Entropy / flow labels
> * ALL of the metadata that might be used to make ANY of the hashing decisions...
>   * at least some systems use the input interface as an input into the hash.
>   * some use TOS, some don’t.
>   * some have different hash generator algorithms, so you have to know which one
>   * usually there’s some additional hash seed for more entropy (such as the router-ID) – you have to know if this is in play and if so what it is.
> * you need to know exactly which NPU(s) are in the forwarding path, because there’s no guarantee that they use the same algorithms.
Ingress interface is also a common hash key. Also, for tunneling (MPLS, GRE, IPIP, GTP) you may look at the bottom headers as well. And for ICMP error packets, such as those PMTUD relies on, you should actually hash on the embedded packet, not on the actual headers. This is rarely if ever implemented (despite being relatively simple to implement), which breaks PMTUD in ECMP cases and causes customers to implement weird workarounds (https://blog.cloudflare.com/path-mtu-discovery-in-practice/).

Anyhow, if you have to know which NPU you are using, you misunderstood the assignment. Such an implementation will work once, when it gets written, and over time it will go wrong, because different people maintain the EZchip/LS code and the RE hash-calculation command, so it is guaranteed to end up feeding bad information to the user. This is basically where Cisco is today: there is code (cef exact-route), but it doesn't talk to the HW, and it gives results that people use but which are not correct.

I know that Juniper MX (not PTX) injects the packet into the HW lookup engine, runs the normal ucode and yoinks the answer. So it will be correct, and no one has to maintain it. I understood from other contributors that this is how the Arista implementation works too, but it also appears to have platform gaps.

Of course, even if this is implemented correctly, for the reasons you give there is an extremely large risk that users simply do not give the right set of keys. They'll still get results, and again end up confidently working with bad data.

For these reasons RFC5837 is so much better: the far-end system simply tells you where it received the frame, removing all guesswork and fragility. So it might be best for the standard case to be that users rely on RFC5837 to glean this information, with the 'exact-route' solution on the NOS as the exception, for when you simply do not have the ability to generate those packets for real right now.
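To make the ICMP point concrete, here is a minimal sketch of deriving the hash key from the packet embedded in an ICMP error instead of from the outer headers. It handles IPv4 only and treats ICMP types 3, 11 and 12 as errors. Reversing the embedded source/destination, so that the error hashes like the flow travelling towards the ICMP packet's destination, is my reading of what is needed when the hash is not symmetric. All function names are illustrative.

```python
import socket
import struct

ICMP_ERROR_TYPES = {3, 11, 12}  # dest unreachable, time exceeded, parameter problem


def ipv4(src, dst, proto, payload: bytes) -> bytes:
    """Helper to build example packets (header checksum left at 0)."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0, 64,
                         proto, 0, socket.inet_aton(src), socket.inet_aton(dst))
    return header + payload


def parse_ipv4(pkt: bytes):
    """Return (proto, src, dst, payload) without validating anything."""
    ihl = (pkt[0] & 0x0F) * 4
    return (pkt[9], socket.inet_ntoa(pkt[12:16]),
            socket.inet_ntoa(pkt[16:20]), pkt[ihl:])


def ports(proto: int, l4: bytes):
    """Source/destination port for TCP and UDP, else zeros."""
    if proto in (6, 17) and len(l4) >= 4:
        return struct.unpack("!HH", l4[:4])
    return (0, 0)


def hash_key(pkt: bytes):
    """5-tuple to hash on; ICMP errors use the embedded packet, reversed."""
    proto, src, dst, payload = parse_ipv4(pkt)
    if proto == 1 and len(payload) >= 8 + 20 and payload[0] in ICMP_ERROR_TYPES:
        in_proto, in_src, in_dst, in_l4 = parse_ipv4(payload[8:])
        in_sport, in_dport = ports(in_proto, in_l4)
        # The embedded packet was sent BY the host this error is going TO,
        # so swap the ends to match the flow heading towards that host.
        return (in_dst, in_src, in_proto, in_dport, in_sport)
    sport, dport = ports(proto, payload)
    return (src, dst, proto, sport, dport)


# Client -> server data packet, and the server's reply that was too big.
data = ipv4("198.51.100.7", "192.0.2.1", 6, struct.pack("!HH", 40000, 443) + bytes(16))
reply = ipv4("192.0.2.1", "198.51.100.7", 6, struct.pack("!HH", 443, 40000) + bytes(16))
# A router on the path reports "fragmentation needed" (type 3, code 4) to the server.
icmp = bytes([3, 4, 0, 0]) + struct.pack("!HH", 0, 1400) + reply[:28]
error = ipv4("203.0.113.9", "192.0.2.1", 1, icmp)

print(hash_key(data) == hash_key(error))
```

With this key the ICMP error and the client's own packets select the same ECMP/LAG member, so the error reaches the server that actually owns the connection.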
> So. Anyway. In my newfound role as head apologist for people who build big systems... the main reason that these commands don’t exist on most systems is not because we don’t know how to implement them, and not because we don’t see the value in implementing them. It’s because the cost to implement (and maintain!!!) them is actually really high, and people have decided (with their wallets) that they want other things more than they want this.
If this was true, you would have implemented RFC5837.

The real reason why things are not implemented is that no one dangled a fat RFQ gated on the request. This is how features get implemented, even when they are absolutely stupid features which should not be implemented, and where customers should instead be educated about why what they ask for introduces fragility that cannot be justified because superior options already exist. But doing things the right way and having a good business case do not always go hand in hand. These absolutely stupid features increase technical debt and cause fragility for all users, but of course they help win that RFQ, so they get implemented.

--
++ytti
participants (7)
- James Bensley
- LJ Wobker (lwobker)
- Nitzan Tzelniker
- Pedro Prado
- Saku Ytti
- steinar.rimestad@lyse.no
- Tom Beecher