
In practice what happened after Juniper enabled that infrastructure is that we started to get a lot of bugs where after a network event we had a blackholing event. These were largely caused by software omitting to reprogram hardware: when something happens fast enough that software didn't have time to invalidate the best option, software will prune the invalid+valid pair before it enters hardware. Which is a good optimisation, unless you've now added the capability in hardware to invalidate an adjacency without software.
The deltas in hardware programming speed actually become apparent and problematic if you have linecards more than a few generations apart in the same chassis. Assuming all MPCs get the next-hop updates at the same time, the newer card will finish programming first and start forwarding traffic to a next hop on the older card, which may not be done processing that update yet, so you get a micro-blackhole until it catches up. A trickier problem around distributed chassis for sure, but when it combines with software choices like this, it makes for a fun time trying to run it down. But it's another notch as to why making the SPF marginally faster doesn't matter so much.
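To make that window concrete, here's a toy timeline in Python; all numbers are invented, real programming latencies depend on ASIC generation and FIB size:

# Toy timeline of the micro-blackhole window described above.
NEW_CARD_DONE_MS = 5    # newer MPC finishes programming the NH update
OLD_CARD_DONE_MS = 40   # older MPC is still churning through it

def packet_fate(t_ms):
    """Fate of a packet entering the newer card that should exit via
    a next hop living on the older card."""
    if t_ms < NEW_CARD_DONE_MS:
        return "forwarded on pre-update state"
    if t_ms < OLD_CARD_DONE_MS:
        return "microBH: new card steers to a NH the old card has not programmed yet"
    return "forwarded on post-update state"

for t in (2, 10, 25, 45):
    print(f"t={t:>2}ms: {packet_fate(t)}")

The blackhole window is exactly the gap between the two completion times, which is why mixing card generations widens it.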
On Wed, Aug 20, 2025 at 2:34 AM Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:

On Mon, 18 Aug 2025 at 21:22, Matthew Petach via NANOG <nanog@lists.nanog.org> wrote:
I don't know of many networks that choose link costs to ensure the cumulative cost through each path is unique. Indeed, ECMP is taken to be an assumption for most IGPs we use in the real world.
That is funny, and of course we can beat Dijkstra massively if we can make assumptions for specific environments, which is arguably what engineering is: taking advantage of environment constants that allow for assumptions which yield optimisations.
How is SPF run today? I have no clue, because the modern approach to convergence is not to converge fast, but to converge before the fault. Which is not something Dijkstra does. The naive approach would be to just run SPF many, many times, removing failed nodes and edges from the topology to recover the post-convergence topology and loop-free alternative paths. But absolutely there exists some domain-specific solution which is cheaper when you need to recover both the best current path and the best post-convergence paths. Whether such an algorithm is actually used, or whether the much more antifragile approach is taken of throwing compute at it and running SPF as many times as it takes, I have no idea.
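As a sketch of that naive approach, plain Dijkstra re-run once per pruned link; the topology and names are invented purely for illustration:

import heapq

def spf(graph, src):
    """Plain Dijkstra over graph = {node: {neighbour: cost}}.
    Returns the best-path cost to every reachable node."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def without_link(graph, a, b):
    """The topology with the a<->b link pruned."""
    return {u: {v: c for v, c in nbrs.items() if {u, v} != {a, b}}
            for u, nbrs in graph.items()}

def post_convergence(graph, src):
    """The naive approach: one extra SPF run per link that could fail."""
    base = spf(graph, src)
    backups = {}
    for a in graph:
        for b in graph[a]:
            if a < b:  # visit each undirected link once
                backups[(a, b)] = spf(without_link(graph, a, b), src)
    return base, backups

# Invented four-node topology.
topo = {"A": {"B": 1, "C": 2}, "B": {"A": 1, "D": 1},
        "C": {"A": 2, "D": 1}, "D": {"B": 1, "C": 1}}
base, backups = post_convergence(topo, "A")
print(base)                 # {'A': 0, 'B': 1, 'C': 2, 'D': 2}
print(backups[("A", "B")])  # costs once the A-B link has failed

Note the cost is quadratic-ish in links, which is exactly why a domain-specific algorithm could be cheaper, and also why just throwing compute at it still works at most topology sizes.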
In Junos a few years back they enabled out-of-the-box the infrastructure for this post-fault convergence, regardless of whether or not you chose to install the backup paths. How this is implemented in practice is that the same structure ECMP uses is used for backup paths, just with the backup path programmed in hardware at a worse weight, so it is excluded as an ECMP option during lookup. However, because the infrastructure is still enabled, if for example an interface flaps, the hardware will invalidate the best ECMP option and the next-best (if any) becomes valid.
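Roughly modelled in Python, with hypothetical field names (the real PFE structures obviously look nothing like this):

from dataclasses import dataclass

@dataclass
class NHEntry:
    """One member of a next-hop group, as modelled here."""
    interface: str
    weight: int        # lower is better; backups carry a worse weight
    valid: bool = True

class NHGroup:
    def __init__(self, entries):
        self.entries = entries

    def lookup(self):
        """ECMP only among valid entries at the best weight; the
        pre-installed backup becomes eligible the instant the best
        entries are invalidated."""
        live = [e for e in self.entries if e.valid]
        if not live:
            return []  # nothing valid left: blackhole
        best = min(e.weight for e in live)
        return [e for e in live if e.weight == best]

    def link_down(self, interface):
        """What hardware does on a flap, without software involved."""
        for e in self.entries:
            if e.interface == interface:
                e.valid = False

group = NHGroup([NHEntry("et-0/0/0", weight=1),
                 NHEntry("et-0/0/1", weight=1),      # ECMP pair
                 NHEntry("et-0/0/2", weight=65535)]) # backup path
print([e.interface for e in group.lookup()])  # ['et-0/0/0', 'et-0/0/1']
group.link_down("et-0/0/0")
group.link_down("et-0/0/1")
print([e.interface for e in group.lookup()])  # ['et-0/0/2']

The point of the design is that the switchover to the backup needs no software at all, which is also what makes the MPLS case further down work at hardware speed.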
In practice what happened after Juniper enabled that infrastructure is that we started to get a lot of bugs where after a network event we had a blackholing event. These were largely caused by software omitting to reprogram hardware: when something happens fast enough that software didn't have time to invalidate the best option, software will prune the invalid+valid pair before it enters hardware. Which is a good optimisation, unless you've now added the capability in hardware to invalidate an adjacency without software. To our surprise, the Junos code has suffered so much technical debt that Juniper doesn't actually know every place in the code where this could happen. We raised a separate issue to figure out why so many similar bugs occurred to us, and Juniper came out with an answer which can be paraphrased as 'we just have to find all the bugs where this can happen'. Naively you'd want all of these to go through one function call, so you could fix the bug once there, but apparently the codebase is far less clean, so they cannot deterministically say whether all of those cases are fixed or not. In my experience it used to be super rare in Junos for HW and SW to disagree, while it was extremely common in PFC3. We've not seen this type of bug in a year or two, so maybe most are fixed.
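My reading of that coalescing race, as an illustration only, not Juniper's actual code paths:

hw_valid = {"et-0/0/0": True}   # hardware's view of the adjacency

def hw_link_down(ifname):
    """The new capability: hardware invalidates on its own, no software."""
    hw_valid[ifname] = False

def sw_flush(pending):
    """Software coalesces a rapid down+up flap into a no-op and never
    reprograms hardware. Sound while software is the only writer of
    hardware state; wrong once hardware invalidates entries itself."""
    events = list(pending)
    for ifname in {i for _, i in events}:
        if ("down", ifname) in events and ("up", ifname) in events:
            events.remove(("down", ifname))
            events.remove(("up", ifname))
    for ev, ifname in events:
        hw_valid[ifname] = (ev == "up")

# The race: the interface flaps faster than software reacts.
hw_link_down("et-0/0/0")                              # HW invalidates immediately
sw_flush([("down", "et-0/0/0"), ("up", "et-0/0/0")])  # SW prunes the pair, writes nothing
print(hw_valid)  # {'et-0/0/0': False} -- HW and SW now disagree; blackhole

And because the prune can happen anywhere software batches events, not behind one function call, each call site is its own potential bug, which matches Juniper's answer.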
But certainly if you are running MPLS you can have 100% coverage for all faults: if a post-convergence path exists, you can utilise it immediately after hardware detects the fault (link down), without waiting for software. This makes SPF performance quite uninteresting, if rapid convergence is the goal.
--
++ytti