Breaking the Sorting Barrier for Directed Single-Source Shortest Paths

I was scrolling LinkedIn and came across a post that mentioned a research paper: Breaking the Sorting Barrier for Directed Single-Source Shortest Paths

TLDR: Dijkstra got defeated after 40 years. It will be interesting to see what convergence times will look like with this implemented.

Different formats of the same research paper:
* https://arxiv.org/pdf/2504.17033
* https://dl.acm.org/doi/pdf/10.1145/3717823.3718179

Kind regards,
Ryan Hamel

On Mon, 18 Aug 2025 at 06:57, Ryan Hamel via NANOG <nanog@lists.nanog.org> wrote:
I was scrolling LinkedIn and came across a post that mentioned a research paper: Breaking the Sorting Barrier for Directed Single-Source Shortest Paths
TLDR: Dijkstra got defeated after 40 years. It will be interesting to see what convergence times will look like with this implemented.
I don't think it has practical implications for IS-IS/OSPF networks. On a small network with under 300 nodes, IS-IS SPF costs something like 25 ms, and flooding takes longer. I'm not sure how much cheaper this is; maybe I'm butchering this. If we assume 300 nodes and 900 edges:

irb(main):002:0> (900+300) * Math.log(300)  # Dijkstra; the paper states O(m + n log n), while the IS-IS book by Abe Martey claims O(m log m), or O(m log n) for non-highly-meshed topologies, which is what I used to estimate in real life with good results, and which is somewhat less than what the paper suggests for Dijkstra
=> 6844.538969587441
irb(main):001:0> 900 * 2/3*Math.log(300)  # rough stab at the paper's O(m log^(2/3) n) bound
=> 3422.2694847937205

Absolutely insane improvement, if (doubtful) I'm not butchering this. But it is still largely immaterial in practice in this application, though I'm sure in many other Dijkstra applications this kind of win can be meaningful. I don't understand the paper and can't tell what we are losing in terms of CPU time or DRAM needed, as the improvement makes assumptions about the environment which may not be true.

-- ++ytti
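For what it's worth, a slightly more careful back-of-the-envelope comparison, plugging the same 300-node/900-edge topology into Dijkstra's O(m + n log n) bound and the paper's claimed O(m log^(2/3) n) bound. A sketch only; big-O constants, heap behaviour and memory access patterns matter far more than these raw counts:

# Rough operation counts for a hypothetical 300-node, 900-edge IGP topology.
# Assumes Dijkstra at O(m + n log n) and the new algorithm at O(m * (log n)^(2/3));
# ignores all constants, so treat the ratio as illustrative only.
n = 300
m = 900

dijkstra  = m + n * Math.log(n)           # ~2611
new_bound = m * Math.log(n) ** (2.0 / 3)  # ~2873

printf("Dijkstra  O(m + n log n)   : %8.0f\n", dijkstra)
printf("New bound O(m log^(2/3) n) : %8.0f\n", new_bound)
printf("ratio new/Dijkstra         : %8.2f\n", new_bound / dijkstra)

# At n = 300 the asymptotically better bound is not even ahead yet, which
# only reinforces the point that SPF runtime is not the bottleneck here.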

TLDR: Dijkstra got defeated after 40 years. It will be interesting to see what convergence times will look like with this implemented.
Good case study of why you can't TLDR science. From the paper directly, emphasis on the last sentence mine:

Dijkstra’s algorithm also produces an ordering of vertices by distances from the source as a byproduct. A recent contribution by Haeupler, Hladík, Rozhoň, Tarjan and Tětek [HHR+24] showed that Dijkstra’s algorithm is optimal if we require the algorithm to output the order of vertices by distances. If only the distances and not the ordering are required, a recent result by Duan, Mao, Shu and Yin [DMSY23] provided an O(m √(log n · log log n))-time randomized SSSP algorithm for undirected graphs, better than O(n log n) in sparse graphs. ***** However it remains to break such a sorting barrier in directed graphs. *****
1. Dijkstra wasn't 'defeated'. There have been many algorithms that outperformed Dijkstra's under specific cases.
2. OSPF and IS-IS are directed graphs. This algorithm outperforms Dijkstra on *undirected* graphs.

On Sun, Aug 17, 2025 at 11:57 PM Ryan Hamel via NANOG <nanog@lists.nanog.org> wrote:
I was scrolling LinkedIn and came across a post that mentioned a research paper: Breaking the Sorting Barrier for Directed Single-Source Shortest Paths
TLDR: Dijkstra got defeated after 40 years. It will be interesting to see what convergence times will look like with this implemented.
Different formats of the same research paper:
* https://arxiv.org/pdf/2504.17033
* https://dl.acm.org/doi/pdf/10.1145/3717823.3718179
Kind regards,
Ryan Hamel

You stopped reading too soon. ;-P

This paper does indeed work on directed graphs; the paragraph you cited was laying down previous work upon which this paper builds.

However, the weakness in this paper lies further down in the heart of it on page 4:

Total order of Paths. Like in many papers on shortest path algorithms, we make the following assumption for an easier presentation of our algorithm: Assumption 2.1. All paths we obtain in this algorithm have different lengths.

I don't know of many networks that choose link costs to ensure resulting uniqueness of the cumulative cost through the path. Indeed, ECMP is taken to be an assumption for most IGPs we use in the real world.

It's a cute paper, and reading through the algorithms presented, it takes an interesting hybrid approach to identifying 'pivot' vertices that are more interesting to explore first to speed up pruning of uninteresting edges; but it relies on the assumption given above that all paths have different lengths so you can unambiguously prune vertices that are greater than the current best path length. In our highly-ECMP-dependent networks, that assumption falls flat on its face, and the pruning step fails.

Note further that the data structure proposed in Lemma 3.3 at the bottom of page 6 makes the assumption explicit, that updating the structure involves only a single insertion or update at each iteration because path lengths are unique, so trying to relax the uniqueness assumption would involve tossing out their insertion-efficient storage structure as well.

Interesting, but not really currently applicable to modern IP networks.

Unlike what another old-timer might sometimes say, I do *not* encourage my competitors to implement this algorithm in their routing hardware. ;-P

Matt
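To make the ECMP point concrete, here is a minimal Dijkstra sketch over a hypothetical four-node topology that keeps every equal-cost predecessor instead of assuming a strict total order on path lengths. The tie branch below is exactly the case Assumption 2.1 defines away, and it is the case IGPs rely on for ECMP:

# Minimal Dijkstra sketch over a made-up topology, keeping all
# equal-cost predecessors (ECMP) instead of assuming unique path lengths.
# Plain array scan instead of a heap; fine for illustration, not performance.

INF = Float::INFINITY

# adjacency: node => { neighbour => cost }
graph = {
  a: { b: 10, c: 10 },
  b: { d: 10 },
  c: { d: 10 },
  d: {}
}

dist  = Hash.new(INF)
preds = Hash.new { |h, k| h[k] = [] }   # ECMP: a set of equal-cost parents
dist[:a] = 0
unvisited = graph.keys

until unvisited.empty?
  u = unvisited.min_by { |v| dist[v] }
  break if dist[u] == INF
  unvisited.delete(u)

  graph[u].each do |v, cost|
    alt = dist[u] + cost
    if alt < dist[v]            # strictly better: replace the predecessor set
      dist[v] = alt
      preds[v] = [u]
    elsif alt == dist[v]        # tie: the case Assumption 2.1 excludes
      preds[v] << u
    end
  end
end

p dist          # {:a=>0, :b=>10, :c=>10, :d=>20}
p preds[:d]     # [:b, :c]  -- two equal-cost parents, i.e. ECMP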
On Mon, Aug 18, 2025, 07:27 Tom Beecher via NANOG <nanog@lists.nanog.org> wrote:

Good case study of why you can't TLDR science. From the paper directly, emphasis on the last sentence mine:

***** However it remains to break such a sorting barrier in directed graphs. *****

1. Dijkstra wasn't 'defeated'. There have been many algorithms that outperformed Dijkstra's under specific cases.
2. OSPF and IS-IS are directed graphs. This algorithm outperforms Dijkstra on *undirected* graphs.

Mea culpa here. Tom Strickx pointed out to me that I misread part of the intro. I incorrectly assumed that the DMSY23 algorithm (which does not beat Dijkstra on directed graphs) was the subject of the paper. It is not; that was prior work by the same authors. The paper does claim that their new algorithm beats Dijkstra on directed graphs, and I wanted to make sure I wasn't promulgating false info from my mistake.

Saku's point with respect to materiality is still applicable though, especially that flooding takes way more time than SPF.

On Mon, Aug 18, 2025 at 10:26 AM Tom Beecher <beecher@beecher.cc> wrote:
1. Dijkstra wasn't 'defeated'. There have been many algorithms that outperformed Dijkstra's under specific cases.
2. OSPF and IS-IS are directed graphs. This algorithm outperforms Dijkstra on *undirected* graphs.

You stopped reading too soon. ;-P
This paper does indeed work on directed graphs; the paragraph you cited was laying down previous work upon which this paper builds.
Sorry Matt, just seeing that you called this out as well. Apparently Gmail thought you were a dirty spammer. :)

On Mon, Aug 18, 2025 at 2:22 PM Matthew Petach <mpetach@netflight.com> wrote:
You stopped reading too soon. ;-P
This paper does indeed work on directed graphs; the paragraph you cited was laying down previous work upon which this paper builds.
However, the weakness in this paper lies further down in the heart of it on page 4:
Total order of Paths. Like in many papers on shortest path algorithms, we make the following assumption for an easier presentation of our algorithm: Assumption 2.1. All paths we obtain in this algorithm have different lengths.
I don't know of many networks that choose link costs to ensure resulting uniqueness of the cumulative cost through the path. Indeed, ECMP is taken to be an assumption for most IGPs we use in the real world.
It's a cute paper, and reading through the algorithms presented, it takes an interesting hybrid approach to identifying 'pivot' vertices that are more interesting to explore first to speed up pruning of uninteresting edges; but it relies on the assumption given above that all paths have different lengths so you can unambiguously prune vertices that are greater than the current best path length. In our highly-ECMP-dependent networks, that assumption falls flat on its face, and the pruning step fails.
Note further that the data structure proposed in Lemma 3.3 at the bottom of page 6 makes the assumption explicit, that updating the structure involves only a single insertion or update at each iteration because path lengths are unique, so trying to relax the uniqueness assumption would involve tossing out their insertion-efficient storage structure as well.
Interesting, but not really currently applicable to modern IP networks.
Unlike what another old-timer might sometimes say, I do *not* encourage my competitors to implement this algorithm in their routing hardware. ;-P
Matt

On Mon, 18 Aug 2025 at 21:22, Matthew Petach via NANOG <nanog@lists.nanog.org> wrote:
I don't know of many networks that choose link costs to ensure resulting uniqueness of the cumulative cost through the path. Indeed, ECMP is taken to be an assumption for most IGPs we use in the real world.
That is funny, and of course we can beat Dijkstra massively if we can make assumptions for specific environments, which is arguably what engineering is: take advantage of environment constants that allow for assumptions which yield optimisation.

How is SPF run today? I have no clue, because the modern approach to convergence is not to converge fast, but to converge before the fault, which is not something Dijkstra does. The naive approach would be to just run SPF many, many times, removing failed nodes and edges from the topology to recover the post-convergence topology and loop-free alternative paths. But there surely exists some domain-specific solution which is cheaper when you need to recover both the best current paths and the best post-convergence paths. Whether such an algorithm is actually used, or whether the much more antifragile approach is taken of throwing compute at it and running SPF as many times as it takes, I have no idea.

In Junos a few years back they enabled out of the box the infrastructure for this post-fault convergence, regardless of whether you chose to install the backup paths. How this is implemented in practice is that the same structure ECMP uses is used for backup paths; the backup path is just programmed in the hardware at a worse weight, so it is excluded as an ECMP option in the lookup result. But because the infrastructure is still enabled, if for example an interface flaps, the hardware will invalidate the best ECMP option and the next-best (if any) becomes valid.

In practice, what happened after Juniper enabled that infrastructure is that we started to get a lot of bugs where, after a network event, we had a blackholing event. These were largely caused by software omitting hardware reprogramming when something happens sufficiently fast that software didn't have time to invalidate the best option; software then prunes the invalid+valid set before it enters hardware. Which is a good optimisation, unless you've now added the capability for hardware to invalidate an adjacency without software. To our surprise, the Junos code has accrued so much technical debt that Juniper doesn't actually know every place in the code where this could happen. We raised a separate issue to figure out why so many similar bugs occurred to us, and Juniper came back with an answer which paraphrases as 'we just have to find all the bugs where this can happen'. Naively you'd want all of these to go through one function call, so you fix the bug once there, but apparently the codebase is far less clean, so they cannot deterministically say whether all of those cases are fixed or not. In my experience this kind of HW/SW disagreement used to be super rare in Junos, while it was extremely common in PFC3. We've not seen this type of bug in a year or two, so maybe most are fixed.

But certainly if you are running MPLS you can have 100% coverage for all faults: if a post-convergence path exists, you can utilise it immediately after hardware detects the fault (link down), without waiting for software. This makes SPT performance quite uninteresting, if rapid convergence is the goal.

-- ++ytti
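The naive approach is easy to sketch. Below is a toy version in plain Ruby (hypothetical topology, linear-scan Dijkstra, no heap): run SPF on the full graph for the primary next hops, then re-run it with a link on the primary path removed to pre-compute a loop-free post-convergence next hop. Real implementations use LFA/TI-LFA style machinery rather than brute-force re-runs, so treat this purely as an illustration of the idea:

# Brute-force sketch of "converge before fault": run SPF on the full topology,
# then re-run it with a link of the primary path removed to pre-compute a
# post-convergence backup next hop. Topology is made up for illustration.

INF = Float::INFINITY

def spf(graph, src)
  dist = Hash.new(INF)
  nhop = {}                       # first hop from src toward each node
  dist[src] = 0
  todo = graph.keys
  until todo.empty?
    u = todo.min_by { |v| dist[v] }
    break if dist[u] == INF
    todo.delete(u)
    graph[u].each do |v, cost|
      next unless dist[u] + cost < dist[v]
      dist[v] = dist[u] + cost
      nhop[v] = (u == src ? v : nhop[u])
    end
  end
  [dist, nhop]
end

def without_link(graph, a, b)     # copy of the graph with the a<->b link removed
  g = graph.transform_values(&:dup)
  g[a].delete(b)
  g[b].delete(a)
  g
end

graph = {
  a: { b: 10, c: 20 },
  b: { a: 10, d: 10 },
  c: { a: 20, d: 10 },
  d: { b: 10, c: 10 }
}

dist, nhop = spf(graph, :a)
p [dist[:d], nhop[:d]]                      # [20, :b]  primary path a-b-d

bdist, bnhop = spf(without_link(graph, :a, :b), :a)
p [bdist[:d], bnhop[:d]]                    # [30, :c]  pre-computed backup via c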

Many data centers don't use SPF; instead they use RFC 7938.

On Tue, Aug 19, 2025 at 11:34 PM Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
How is SPF run today? I have no clue, because the modern approach to convergence is not to converge fast, but to converge before the fault.

On Thu, 21 Aug 2025 at 00:16, Anoop Ghanwani <anoop@alumni.duke.edu> wrote:
Many data centers don't use SPF, instead they use RFC 7938.
Thanks Anoop, but that doesn't answer the question of how the SPT is computed. And because of the weaknesses of RFC 7938, we've come full circle with RFC 9815.

-- ++ytti

On Wed, Aug 20, 2025 at 2:34 AM Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:

In practice, what happened after Juniper enabled that infrastructure is that we started to get a lot of bugs where, after a network event, we had a blackholing event. These were largely caused by software omitting hardware reprogramming when something happens sufficiently fast that software didn't have time to invalidate the best option; software then prunes the invalid+valid set before it enters hardware. Which is a good optimisation, unless you've now added the capability for hardware to invalidate an adjacency without software.
The deltas in hardware programming speed actually become apparent and problematic if you have linecards more than a few generations apart in the same chassis. Assuming all MPCs get the next-hop updates at the same time, the newer card will finish programming and start forwarding traffic toward a next hop on the older card, which may not be done processing that update yet, so you get a micro-blackhole until it catches up. It's a trickier problem around distributed chassis for sure, but when it combines with software choices like this, it makes for a fun time to try to run down.

But another notch as to why making SPF marginally faster doesn't matter so much.
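A toy model of the mechanism Saku described upthread, with a backup member pre-installed at a worse weight and promoted when hardware marks the best members invalid. Interface names and weights are made up for illustration; this is not how any particular vendor's FIB structures actually look:

# Sketch of an ECMP-style next-hop group with a backup at a worse weight:
# only valid members at the best weight are used; if hardware invalidates
# them (link down), the next weight tier takes over without software help.
group = [
  { nh: "ge-0/0/0", weight: 1,  valid: true },
  { nh: "ge-0/0/1", weight: 1,  valid: true },
  { nh: "ge-0/0/2", weight: 10, valid: true },   # pre-installed backup
]

def active(group)
  usable = group.select { |m| m[:valid] }
  best = usable.map { |m| m[:weight] }.min
  usable.select { |m| m[:weight] == best }.map { |m| m[:nh] }
end

p active(group)                                  # ["ge-0/0/0", "ge-0/0/1"]  ECMP pair
group[0][:valid] = group[1][:valid] = false      # hardware invalidates both on link down
p active(group)                                  # ["ge-0/0/2"]  backup promoted in hardware
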
participants (5)
- Anoop Ghanwani
- Matthew Petach
- Ryan Hamel
- Saku Ytti
- Tom Beecher