EVPN-MPLS MH-ES DF Election Behavior and BUM forwarding
Good day everyone.

For those of you that are using EVPN-MPLS, although this likely applies equally to VXLAN-based transport, I have a question for you based on your observations in your production networks.

I have a basic configuration as follows:
* Two PE routers that provide connectivity to a single downstream device via an all-active multi-homed LAG connection.
* The downstream device could be a single aggregation switch, or something like an OLT. Either way, the downstream device is configured with an uplink port to each PE, and the ports in question use LACP.
* The two PE routers do not have any physical connections between each other, but instead have redundant connections to a pair of core routers. These uplinks carry MPLS traffic.
* The upstream core routers are acting as route reflectors.
* MRAI is set to 0 for route exchange between the core and PE routers.

What I am curious to get feedback on is BUM traffic forwarding in the brief moment between the start and conclusion of a DF election, and the risk of BUM packets being forwarded back into the same ES from whence they came.

Scenario of concern:
* PE1 and PE2 are both in a steady state in the network; full routing tables are already propagated.
* The port between PE1 and the CE device is active, with LACP negotiated, and with PE1 having announced the relevant EVPN routes, specifically including the type 1 and 4 routes; the port from PE2 to the CE is not currently active.
* Upon the link between the CE device and PE2 becoming active, with LACP negotiated, PE2 should announce the relevant type 1 and 4 routes and the DF election should commence.

Before the conclusion of the DF election, we expect the following to happen, but we only have a single vendor's implementation for reference, and we know that differences in RFC interpretation can lead to implementation differences.

* There will likely be a very brief moment wherein LACP is up and the port on PE2 can send and receive traffic, but the EVPN routes have not yet propagated from PE2 to PE1, so PE1 cannot yet include PE2's ESI label in the BUM traffic it sends to PE2 (traffic that may still need to be delivered to ports unrelated to this ES in the same EVI on PE2). The result is that those BUM packets could be forwarded by PE2 back to the CE device. I'm not clear that this is avoidable, but I expect the propagation and processing period here is very short.
* Whereas, if PE2 received a BUM packet from the newly activated ES interface that was destined for PE1, it would include PE1's ESI label in the stack. As PE2, in this scenario, already knows this label value prior to the activation of the ES port, I suspect there was never really a risk of PE1 receiving a BUM packet from PE2, from the same ES, that it then forwarded back into the ES.
* In this moment, we believe that PE2 should not assume the DF role, for no other reason than that it had clearly received routes for this ES indicating that another PE was already active. My reading of RFC7432 in this regard does not seem 110% explicit, but I don't know why PE2 would assume anything other than a non-DF role prior to the conclusion of the election.
* As soon as the type 1 and 4 routes from PE2 reach PE1 and are processed, all future BUM packets sent to PE2 should have PE2's ESI label on the stack, at which point PE2 should not forward BUM traffic into the ES, as long as it didn't assume the DF role prior to the conclusion of the election.
* After the election timer has concluded:
  - the DF role may stay with PE1, at which point nothing really changes other than the shared knowledge of that. All is good.
  - the DF role could move to PE2, but both PE1 and PE2 have ESI labels for each other already, and it's really just the rest of the network adjusting where it sends BUM packets relative to this ES. I guess there's a chance that there was already a packet in flight to PE1 for this ES, and PE1 may not forward the packet into the ES; I'm not clear on this, but this isn't an area of concern right now.

Other scenarios:

I'm frankly not worried about other scenarios, as I suspect most platforms have a holddown timer that can be used to suppress forwarding of BUM packets into an ES until routes have had a chance to propagate and the DF election has concluded.

What I'm concerned about in asking for this feedback is largely the interpretation of section 8.5 (Designated Forwarder Election) of RFC7432, and how it, for instance, doesn't explicitly say that ahead of step 4 the new PE should assume a non-DF role; and what operators see in their production networks. Do the major manufacturers of network gear and network operating systems all do the same thing? Are there systemic problems related to ES-looped BUM traffic prior to the conclusion of the DF election that we just need to accept?

Hopefully this all makes sense. If there is something I neglected to comment on or consider, or just got wrong, I'm happy to receive some education.

Thanks in advance,
Graham
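The transient described above can be sketched as a small Python model of PE2's egress decision for a BUM packet that arrives from the MPLS core after entering the network from this same ES via PE1. It is only an illustration under stated assumptions, not any vendor's implementation: it assumes ingress replication with a downstream-assigned ESI label, and it assumes that a PE forwards core BUM traffic into an ES only when it considers itself DF and the packet does not carry its own ESI label for that ES. The names `BumPacket` and `forwards_into_es` and the label value 3001 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BumPacket:
    # ESI (split-horizon) label pushed by the ingress PE, or None if the
    # ingress PE has not yet learned the egress PE's label for this ES.
    esi_label: Optional[int]


def forwards_into_es(is_df: bool, local_esi_label: int, pkt: BumPacket) -> bool:
    """Would the egress PE send this core-received BUM packet into the ES?"""
    if not is_df:
        # Assumption: a non-DF PE drops BUM traffic towards the ES.
        return False
    if pkt.esi_label == local_esi_label:
        # Split horizon: the packet originated on this same ES.
        return False
    return True


PE2_ESI_LABEL = 3001  # hypothetical label PE2 advertises for this ES

# Enumerate the four combinations of PE2's assumed role and whether PE1
# has already learned PE2's ESI label.
looping = [
    (is_df, pe1_knows_label)
    for is_df in (False, True)
    for pe1_knows_label in (False, True)
    if forwards_into_es(
        is_df,
        PE2_ESI_LABEL,
        BumPacket(PE2_ESI_LABEL if pe1_knows_label else None),
    )
]
print(looping)  # [(True, False)]
```

Under these assumptions the only looping case is the one where PE2 treats itself as DF before PE1 has learned its ESI label, which is why the role PE2 assumes before the election concludes matters.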
What is “MRAI”? What dictates DF election? Which PE wins and why?

Aaron
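On the question of which PE wins: a minimal Python sketch of the default election in RFC 7432 section 8.5 (often called modulo-based service carving), as I understand it, is below. Each PE orders the IP addresses of all PEs attached to the ES in increasing numeric value, and the PE at position (V mod N) is the DF for Ethernet Tag V, where N is the number of PEs. The function name `elect_df` and the addresses are hypothetical, and the sketch assumes all addresses are of the same family; other election algorithms may be in use on a given platform.

```python
import ipaddress


def elect_df(pe_addresses, ethernet_tag):
    """Return the PE address that is DF for ethernet_tag under (V mod N)."""
    # Sort numerically, not as strings, so 192.0.2.9 precedes 192.0.2.10.
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[ethernet_tag % len(ordered)]


PE1 = "192.0.2.1"  # hypothetical originator addresses
PE2 = "192.0.2.2"

# Before PE2's type 4 route is known, PE1 is the only candidate.
print(elect_df([PE1], 101))       # 192.0.2.1

# Once both PEs are candidates, the DF role is split per Ethernet Tag.
print(elect_df([PE1, PE2], 100))  # 192.0.2.1
print(elect_df([PE1, PE2], 101))  # 192.0.2.2
```

In this sketch, with two PEs the even tags land on the lower address and the odd tags on the higher one, so a PE joining the ES can take over the DF role for some VLANs once the election completes.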
On Feb 11, 2026, at 3:34 PM, Graham Johnston via NANOG <nanog@lists.nanog.org> wrote:
[...]
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/2HKUYUYK...
Hi Graham,

My understanding is that the peering timer postpones the EVPN startup for the new link, giving time for the DF election. Never verified in a lab, mind you.

Brian

On Wed, Feb 11, 2026, 22:34 Graham Johnston via NANOG <nanog@lists.nanog.org> wrote:
[...]
My gut says an ESI member should not be able to forward until its role in EVPN is clear. It's not a regular L2 port that gets "promoted" to an ESI member and can forward in the meantime. (I actually hope that's the case.)

*Pedro Martins Prado*
pedro.prado@gmail.com / +353 83 036 1875 (FaceTime & WhatsApp)

On Thu, 12 Feb 2026 at 15:15, brian turnbow via NANOG <nanog@lists.nanog.org> wrote:
[...]
_______________________________________________ NANOG mailing list
https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/V2SMCTJO...
participants (4): Aaron1, brian turnbow, Graham Johnston, Pedro Prado