
The SPF discussion reminded me of a question I've been thinking about.
Why do we use a distance-vector EGP? Why do we advertise prefixes?
BGP made sense when we didn't have to worry about degenerates, when the Internet was largely academic. A prefix is configured once, at the site where it exists, and no one else has to do anything. Very optimal.
But is that sensible today, when we also have to configure the prefix out-of-band, locally, at every site, potentially three times: RPKI (maybe via RTR), a prefix-list (for BGP) and an access-list (for anti-spoofing)? If we discover the ASN/prefix association out-of-band anyhow, why do we need to see 1M+ prefixes in-band?
What if the EGP flooded link states? What would we win? What would we lose?
Potential wins:
- Flooded link states could be signed, so we could verify that both the AS1->AS2 and the AS2->AS1 link states exist with valid signatures. You couldn't hijack an ASN; the entire path could be validated.
- Initial convergence would be 50-100 times faster.
- A lot less signalling/flapping.
- Loop-free alternates for rapid convergence.
We could see some problems: for TE reasons I might advertise different prefixes from different sites with the same AS. I'm not sure that is a legitimate concern; those are niche cases, and for them we could just register more ASNs and move the ASNs instead of the prefixes. But I'm sure there are more obvious weaknesses that don't immediately spring to mind.
-- ++ytti
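The out-of-band ASN/prefix association mentioned above is what RPKI route origin validation consumes. A minimal sketch of the RFC 6811 classification, assuming a locally held list of ROAs; `Roa` and `validate_origin` are hypothetical names, AS 0 ROAs and AS_SET origins are ignored for brevity, and the data uses documentation prefixes and private ASNs:

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """A simplified Route Origin Authorization: prefix, maxLength and authorized origin AS."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int
    origin_as: int


def validate_origin(prefix: str, origin_as: int, roas: list[Roa]) -> str:
    """Return 'valid', 'invalid' or 'not-found' for a (prefix, origin AS) pair."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        # Only ROAs of the same address family whose prefix covers the route are relevant.
        if roa.prefix.version != route.version or not route.subnet_of(roa.prefix):
            continue
        covered = True
        # A covering ROA matches if the origin agrees and the route is not too specific.
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid. Not covered at all: unknown.
    return "invalid" if covered else "not-found"


roas = [Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # valid
print(validate_origin("192.0.2.0/25", 64500, roas))     # invalid (longer than maxLength)
print(validate_origin("192.0.2.0/24", 64501, roas))     # invalid (wrong origin)
print(validate_origin("198.51.100.0/24", 64500, roas))  # not-found
```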

On Sat, 23 Aug 2025, Saku Ytti via NANOG wrote:
But I'm sure there are more obvious weaknesses that don't immediately spring to mind.
Bootstrap problem. You need the ASN/prefix list to go onto the box before it can do anything, and you need to keep updating it. And yes, the TE aspect is going to be a big pushback, as this would mean lots of tricks people use today wouldn't work anymore. We have a deaggregation ratio of close to 3 versus optimal, and your proposal would basically take away that tool. -- Mikael Abrahamsson email: swmike@swm.pp.se
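One way to compute a deaggregation ratio like the one above, assuming "optimal" simply means the smallest set of prefixes covering the same address space. Real aggregation would also have to respect origin AS and policy, which this sketch ignores, and `deaggregation_ratio` is a hypothetical helper:

```python
import ipaddress


def deaggregation_ratio(advertised: list[str]) -> float:
    """Advertised prefix count divided by the size of the minimal aggregated set."""
    nets = {ipaddress.ip_network(p) for p in advertised}
    # collapse_addresses() refuses mixed address families, so aggregate each family separately.
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    minimal = list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))
    return len(nets) / len(minimal)


# A /24 announced together with both of its /25 halves collapses to the /24 alone.
print(deaggregation_ratio(["192.0.2.0/24", "192.0.2.0/25", "192.0.2.128/25"]))  # 3.0
```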

On Sat, 23 Aug 2025 at 18:18, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
Bootstrap problem. You need the ASN/prefix list to go onto the box before it can do anything, and you need to keep updating it.
That would come from the IGP routes, as it does today.
And yes, the TE aspect is going to be a big pushback, as this would mean lots of tricks people use today wouldn't work anymore. We have a deaggregation ratio of close to 3 versus optimal, and your proposal would basically take away that tool.
Agreed. Though this is status quo bias: we think what we have is a requirement because it was what was possible with the limitations we had. If we had had another set of limitations, we'd have another set of solutions, which we'd then believe to be requirements. -- ++ytti

Performance. You can't have someone destroying the whole internet by advertising 99999999 link states, and you can't limit the number of links someone can have to prevent that, either. BGP is considered a path-vector protocol, as it uses the AS path to avoid loops.
On 23 August 2025 16:49:27 CEST, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
-- ++ytti _______________________________________________ NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/L2FW4MX2...

On second thought, the real reason is that link-state protocols are distributed algorithms that require all nodes to execute the same algorithm on the same data, so there's no room to apply policy that wasn't baked into the design of the protocol.

On Sat, 23 Aug 2025 at 18:38, nanog--- via NANOG <nanog@lists.nanog.org> wrote:
Performance. You can't have someone destroying the whole internet by advertising 99999999 link states, and you can't limit the number of links someone can have to prevent that, either.
I don't think this tracks. We already need prefix limits today, both before and after filtering, to avoid abuse; we'd need to continue doing the same. -- ++ytti
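A minimal sketch of the "limit before and after filtering" guard, assuming the limited unit is whatever the neighbor advertises (prefixes today, link-state records in the hypothetical EGP). `SessionLimits` and its fields are illustrative names, not any vendor's configuration knobs:

```python
from dataclasses import dataclass, field


@dataclass
class SessionLimits:
    """Hypothetical per-neighbor limits, applied before and after import policy."""
    max_received: int  # everything the neighbor sends, pre-policy
    max_accepted: int  # what survives import policy, post-policy
    received: set[str] = field(default_factory=set)
    accepted: set[str] = field(default_factory=set)
    torn_down: bool = False

    def on_update(self, item: str, passes_policy: bool) -> None:
        """Record one advertised item (a prefix or a link-state record)."""
        if self.torn_down:
            return
        self.received.add(item)
        if passes_policy:
            self.accepted.add(item)
        # Exceeding either limit tears the session down instead of growing state further.
        if len(self.received) > self.max_received or len(self.accepted) > self.max_accepted:
            self.torn_down = True


limits = SessionLimits(max_received=3, max_accepted=2)
limits.on_update("a", True)
limits.on_update("b", False)
limits.on_update("c", True)
print(limits.torn_down)  # False: 3 received, 2 accepted, both within limits
limits.on_update("d", False)
print(limits.torn_down)  # True: 4 received exceeds the pre-policy limit
```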

On Sat, 23 Aug 2025 at 18:54, nanog--- via NANOG <nanog@lists.nanog.org> wrote:
On second thought, the real reason is that link-state protocols are distributed algorithms that require all nodes to execute the same algorithm on the same data, so there's no room to apply policy that wasn't baked into the design of the protocol.
It doesn't really matter, in the sending direction, which egress they choose, as long as it doesn't loop. So even in this SPT future, I can choose a longer upstream path over a shorter one by local policy, just like today.
The big difference is that the receiver cannot cherry-pick which prefixes to receive on which eBGP session; you have to be able to receive all prefixes on all eBGP sessions of a given ASN. Announcements are not always consistent today, and the inconsistent ones would need to be replaced by registering multiple ASNs.
NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/D6VUEYYE...
-- ++ytti

Losses: Privacy. Telling your competitors what all your links and private peerings are may not be what you want. You might not advertise all your prefixes to some of your neighbors, but you still need the link for other prefixes. If you are only advertising the link, then any neighbor could send you traffic that you don't want to provide transit for. So you drop it. How does your neighbor know? You send him the routes for the traffic you are willing to transit. Or you advertise relationships with the links. Then you get soBGP. Kind Regards, Jakob Heitz.

On 8/23/25 11:57, Saku Ytti via NANOG wrote:
On Sat, 23 Aug 2025 at 18:54, nanog--- via NANOG <nanog@lists.nanog.org> wrote:
On second thought, the real reason is that link-state protocols are distributed algorithms that require all nodes to execute the same algorithm on the same data, so there's no room to apply policy that wasn't baked into the design of the protocol.
The above dances around the opposite side of a conversation John Scudder and I were having about the properties of the system we've evolved. A BGP rib-out is effectively the output of a hidden state machine for your entire network.
It doesn't really matter, in the sending direction, which egress they choose, as long as it doesn't loop. So even in this SPT future, I can choose a longer upstream path over a shorter one by local policy, just like today.
The big difference is that the receiver cannot cherry-pick which prefixes to receive on which eBGP session; you have to be able to receive all prefixes on all eBGP sessions of a given ASN. Announcements are not always consistent today, and the inconsistent ones would need to be replaced by registering multiple ASNs.
This hits part of the above as well. What you lose with traditional link-state-type mechanisms is the ability to do policy. Operators like their policies for all sorts of reasons. In order to implement something that resembles the hop-by-hop policy you can do in BGP in something that is link-state, it becomes necessary to distribute a portion of that policy into the link-state distribution machinery and run it as part of your calculations for a large number of hops. The easy way to picture some of the impacts of that is to consider what it'd take to distribute "at the boundary of AS X->Y, don't distribute prefix P". Traditional valley-free routing starts to require careful management of large metrics. Etc. It gets very gross, very quickly. You touch on some of the issues. -- Jeff
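Jeff's example can be sketched as a per-prefix SPF in which the policy removes directed edges. This is a hypothetical illustration (graph, costs, and policy table are made up, and costs are assumed symmetric). The point is that the policy table becomes an input to every node's computation, so it would have to be flooded and applied identically everywhere:

```python
import heapq


def constrained_spf(graph, origin, blocked=frozenset()):
    """Dijkstra from the origin of a prefix, skipping blocked directed boundaries.

    graph maps AS -> {neighbor AS: cost}. blocked holds (X, Y) pairs meaning
    "at the boundary X->Y, don't distribute this prefix".
    Returns each AS's cost to reach the prefix; unreachable ASes are absent.
    """
    dist = {origin: 0}
    heap = [(0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if (node, nbr) in blocked:
                continue
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


graph = {1: {2: 1, 3: 1}, 2: {1: 1, 4: 1}, 3: {1: 1, 4: 5}, 4: {2: 1, 3: 5}}
# Hypothetical per-prefix policy table; every node would need an identical copy.
policy = {"P": {(2, 4)}}
print(constrained_spf(graph, 1)[4])               # 2, via AS2
print(constrained_spf(graph, 1, policy["P"])[4])  # 6, forced around via AS3
```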

On Sun, 24 Aug 2025 at 05:52, Jeffrey Haas <jhaas@pfrc.org> wrote:
The easy way to picture some of the impacts of that is to consider what it'd take to distribute "at the boundary of AS X->Y, don't distribute prefix P".
If we imagine that on day one we had been concerned about people abusing BGP, and that we would need to distribute >1M prefixes, we would likely have concluded that we need out-of-band data for validation reasons alone. So we would have evolved a very different-looking system. And the limitations that system would have, and how to work with them, would now look like requirements to us, when they were just the best solution we could come up with, given the tools we had in front of us. I suspect all the legitimate disjoint-advertisement problems would be addressed by registering more ASNs and moving the ASNs between sites as needed. We already try to avoid disjoint advertisements today, and many companies have a peering policy that forbids or recommends against them. But it absolutely still happens, and in my experience it happens more today, because CDNs use disjoint advertisements to enforce their policies when customers override those policies with local-prefs. I'm very confident this system would just work, and Internet end users would be none the wiser. But I have no confidence that it would be worthwhile. -- ++ytti
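The workaround of registering more ASNs and moving ASNs rather than prefixes implies a simple count. This sketch assumes that, in the hypothetical link-state EGP, every external session of an ASN must carry the same prefixes. Prefixes advertised from the same set of sites can share an ASN, and each distinct set needs its own. Site names and prefixes are invented:

```python
from collections import defaultdict


def asn_groups(site_adverts: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Group prefixes by the exact set of sites that advertise them."""
    sites_for_prefix: dict[str, set[str]] = defaultdict(set)
    for site, prefixes in site_adverts.items():
        for prefix in prefixes:
            sites_for_prefix[prefix].add(site)
    groups: dict[frozenset[str], set[str]] = defaultdict(set)
    for prefix, sites in sites_for_prefix.items():
        # Prefixes with an identical advertisement footprint can share one ASN.
        groups[frozenset(sites)].add(prefix)
    return dict(groups)


adverts = {
    "site-a": {"192.0.2.0/24", "198.51.100.0/24"},
    "site-b": {"192.0.2.0/24", "203.0.113.0/24"},
}
groups = asn_groups(adverts)
print(len(groups))  # 3: one shared prefix plus one disjoint prefix per site
```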

On Sat, 23 Aug 2025 at 23:32, Jakob Heitz via NANOG <nanog@lists.nanog.org> wrote:
Losses: Privacy. Telling your competitors what all your links and private peerings are may not be what you want. You might not advertise all your prefixes to some of your neighbors, but you still need the link for other prefixes.
Disjoint advertisement is a legitimate argument, but as explained elsewhere, we could address it by registering more ASNs and moving ASNs, not prefixes. Privacy appears to be the same argument as disjoint advertisements.
If you are only advertising the link, then any neighbor could send you traffic that you don't want to provide transit for. So you drop it. How does your neighbor know? You send him the routes for the traffic you are willing to transit.
The links you advertise are the ASme-ASyou links you carry traffic for. You don't advertise links you don't carry traffic for. So I would advertise ASme-ASme, ASme-AScustomer and ASprovider-ASme to my upstream, but I would not advertise ASme-ASupstream to my upstream.
My upstream would similarly advertise ASupstream-ASme to their peers and upstreams.
This would allow anyone to validate those paths, because they expect ASme to have an ASprovider-ASme adjacency, and they expect ASprovider to corroborate that by having an ASprovider-ASme adjacency too. Both link states are signed, and the signatures are verifiable by some out-of-band mechanism.
I do think that in an alternate reality, where we had anticipated BGP abuse and 1M+ prefixes, we would have landed somewhere entirely different from where we are today. And in that reality, whatever limitations that feature has, we would have learned to live with them and started to think of them as requirements, because they are requirements there: we can only build solutions on top of what works with that stack. I have full confidence we could have made this link-state-based reality work, and the Internet would work just the same for Internet users. I have no confidence that it would be worthwhile. It would be different, and whatever it enables would seem like requirements to us now, while they were just the solutions we ended up with given the limitations we had. -- ++ytti
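A sketch of the bilateral corroboration described above, where one AS claims an adjacency and the other end confirms it. Everything here is an assumption: the record encoding, the `links` table, and especially the use of HMAC with shared keys. HMAC is only a stand-in for the out-of-band-verifiable public-key signatures the proposal would actually need:

```python
import hashlib
import hmac

# Stand-in signing keys. A real design would use public-key signatures checked against
# certificates distributed out-of-band (as RPKI does); HMAC only keeps the sketch runnable.
KEYS = {64500: b"secret-64500", 64501: b"secret-64501", 64502: b"secret-64502"}


def sign_link(signer: int, a: int, b: int) -> bytes:
    """Signature by `signer` over the adjacency record a-b (hypothetical encoding)."""
    return hmac.new(KEYS[signer], f"{a}-{b}".encode(), hashlib.sha256).digest()


def corroborated(a: int, b: int, sig_a: bytes, sig_b: bytes) -> bool:
    """Accept the a-b link only if both endpoints signed the same record."""
    return (hmac.compare_digest(sig_a, sign_link(a, a, b))
            and hmac.compare_digest(sig_b, sign_link(b, a, b)))


def path_valid(path: list[int], links: dict[tuple[int, int], tuple[bytes, bytes]]) -> bool:
    """Every consecutive AS pair on the path must be a corroborated link."""
    for a, b in zip(path, path[1:]):
        sigs = links.get((a, b))
        if sigs is None or not corroborated(a, b, *sigs):
            return False
    return True


links = {(64500, 64501): (sign_link(64500, 64500, 64501), sign_link(64501, 64500, 64501))}
print(path_valid([64500, 64501], links))  # True: both ends vouch for the link
# 64502 claims a link to 64500 and signs both halves itself, so the link is rejected.
links[(64502, 64500)] = (sign_link(64502, 64502, 64500), sign_link(64502, 64502, 64500))
print(path_valid([64502, 64500, 64501], links))  # False
```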

I do think that in an alternate reality, where we had anticipated BGP abuse and 1M+ prefixes, we would have landed somewhere entirely different from where we are today. And in that
Thinking about this a bit more: if we had ended up with a solution like this, which enforces a joint AS:prefix relation, our lookup engines would likely be very different, because we could have gotten away with some sort of inter-domain MPLS with 'AS labels', doing exact-match AS lookups in transit instead of IP LPM, with LPM only at the edges. In this future, it would likely have looked not worthwhile to develop fast LPM lookup engines with deep FIBs; instead, inside an ASN we'd assign egress-port labels to work with the HW we would have developed. So possibly HW-based LPM simply wouldn't exist. I'm not saying that would be desirable at all, but it seems likely it would have had a profound impact on how we decided to build forwarding. -- ++ytti
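In the hypothetical "AS label" design, longest-prefix match would happen once at the ingress edge, and transit routers would only do exact-match lookups on the label. This toy sketch uses invented tables, labels, and port names. A real FIB would use a trie or TCAM rather than a linear scan:

```python
import ipaddress

# Hypothetical ingress-edge table: destination prefix -> destination AS label.
EDGE_FIB = {
    ipaddress.ip_network("0.0.0.0/0"): 64499,
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("192.0.2.128/25"): 64501,
}

# Hypothetical transit table: AS label -> egress port, exact match only.
TRANSIT = {64499: "port-1", 64500: "port-2", 64501: "port-3"}


def edge_lookup(dst: str) -> int:
    """Longest-prefix match, done once at the edge (linear scan for clarity, not speed)."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in EDGE_FIB if addr in net), key=lambda net: net.prefixlen)
    return EDGE_FIB[best]


def transit_lookup(as_label: int) -> str:
    """Exact-match lookup in transit: a plain dictionary hit, no prefix logic."""
    return TRANSIT[as_label]


label = edge_lookup("192.0.2.200")
print(label, transit_lookup(label))  # 64501 port-3
```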

On Sat, Aug 23, 2025 at 7:49 AM Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
Why do we use a distance-vector EGP? Why do we advertise prefixes?
Because we like to get paid. The problem with a link-state protocol is that every link in the system is a valid transit for every routed prefix. We can control the link transit costs, prefer some paths over others, but there's no way to express that even though a particular prefix is attached to a node on the other side of this link, we're not under any circumstances permitted to use this link to reach it. As a result, we can only use a link-state protocol in a system where the subscriber has paid for the right to use -all- of the links in the system. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
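In BGP, the "not permitted to use this link for that prefix" decision is made per neighbor, at export time. The usual textbook model of that policy is the Gao-Rexford export rule, sketched below. The relationship labels are assumptions about how a given network classifies its neighbors, and real policies have many more exceptions. A flooded link-state database has no per-neighbor export step where this rule could live:

```python
from enum import Enum
from typing import Optional


class Rel(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Optional[Rel], export_to: Rel) -> bool:
    """Gao-Rexford export rule. learned_from=None means the route is locally originated."""
    if export_to is Rel.CUSTOMER:
        return True  # paying customers receive every route
    # Peers and providers only get our own routes and our customers' routes,
    # so we never carry traffic between two parties who don't pay us.
    return learned_from is None or learned_from is Rel.CUSTOMER


print(may_export(Rel.PEER, Rel.PROVIDER))  # False: no free transit for a peer's routes
print(may_export(Rel.CUSTOMER, Rel.PEER))  # True
```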

No, you can't, because your upstream's shortest route leads back to you and that's a loop. Any difference in route calculation between two nodes in a link-state protocol is likely to create a loop.
On 23 August 2025 17:57:10 CEST, Saku Ytti <saku@ytti.fi> wrote:
On Sat, 23 Aug 2025 at 18:54, nanog--- via NANOG <nanog@lists.nanog.org> wrote:
On second thought, the real reason is that link-state protocols are distributed algorithms that require all nodes to execute the same algorithm on the same data, so there's no room to apply policy that wasn't baked into the design of the protocol.
It doesn't really matter, in the sending direction, which egress they choose, as long as it doesn't loop. So even in this SPT future, I can choose a longer upstream path over a shorter one by local policy, just like today.
The big difference is that the receiver cannot cherry-pick which prefixes to receive on which eBGP session; you have to be able to receive all prefixes on all eBGP sessions of a given ASN. Announcements are not always consistent today, and the inconsistent ones would need to be replaced by registering multiple ASNs.
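The claim that any difference in route calculation between nodes is likely to create a loop is easy to reproduce. In this toy three-node sketch, the node names, metrics, and "local tweak" are invented. Each node runs the same SPF, but B computes on a locally modified copy of the database, and A and B end up pointing at each other:

```python
import heapq


def next_hop(lsdb, source, dest):
    """First hop on source's shortest path to dest, computed from source's own LSDB copy."""
    dist = {source: 0}
    first = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, cost in sorted(lsdb[node].items()):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Direct neighbours are their own first hop; others inherit their parent's.
                first[nbr] = nbr if node == source else first[node]
                heapq.heappush(heap, (nd, nbr))
    return first.get(dest)


def forward(views, source, dest, max_hops=10):
    """Hop-by-hop forwarding where every node consults only its own view."""
    path = [source]
    while path[-1] != dest and len(path) <= max_hops:
        hop = next_hop(views[path[-1]], path[-1], dest)
        if hop is None:
            break  # unreachable in this node's view
        path.append(hop)
    return path


truth = {"A": {"B": 1, "D": 5}, "B": {"A": 1, "D": 1}, "D": {"A": 5, "B": 1}}
# B alone applies a local tweak raising the B-D metric to 10.
tweaked = {"A": {"B": 1, "D": 5}, "B": {"A": 1, "D": 10}, "D": {"A": 5, "B": 10}}

print(forward({"A": truth, "B": truth}, "A", "D"))    # ['A', 'B', 'D']
print(forward({"A": truth, "B": tweaked}, "A", "D"))  # A and B bounce the packet between them
```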

On 24 August 2025 08:34:51 CEST, Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Sat, 23 Aug 2025 at 23:32, Jakob Heitz via NANOG <nanog@lists.nanog.org> wrote:
Losses: Privacy. Telling your competitors what all your links and private peerings are may not be what you want. You might not advertise all your prefixes to some of your neighbors, but you still need the link for other prefixes.
Disjoint advertisement is a legitimate argument, but as explained elsewhere, we could address it by registering more ASNs and moving ASNs, not prefixes. Privacy appears to be the same argument as disjoint advertisements.
If you are only advertising the link, then any neighbor could send you traffic that you don't want to provide transit for. So you drop it. How does your neighbor know? You send him the routes for the traffic you are willing to transit.
The links you advertise are the ASme-ASyou links you carry traffic for. You don't advertise links you don't carry traffic for. So I would advertise ASme-ASme, ASme-AScustomer and ASprovider-ASme to my upstream, but I would not advertise ASme-ASupstream to my upstream.
But link-state protocols are global-shared-state gossip protocols and don't support split horizon. If you have a customer with two upstreams and you hide something from your upstream, they'll find out about it anyway via your customer and its other upstream. I don't know what you mean by "links you carry traffic for". All links are presumably intended to carry traffic, so you advertise all links. Lying in a link-state routing protocol is a good way to create routing loops. They fundamentally rely on every node having an identical set of information and running an identical algorithm.
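The split-horizon objection can be shown with plain flooding. This sketch assumes every node other than the originator follows the flooding rule faithfully. The topology and names are hypothetical, and the U1-U2 link stands in for any path between the two upstreams:

```python
from collections import deque


def flood(adjacency, origin, withheld=frozenset()):
    """Flood one LSA; each node forwards it to every neighbour except over `withheld` links.

    withheld holds directed links (sender, receiver) on which a node tries to hide the LSA.
    Returns the set of nodes that end up holding it.
    """
    have = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if (node, nbr) in withheld or nbr in have:
                continue
            have.add(nbr)
            queue.append(nbr)
    return have


# "me" hides an LSA from upstream U1, but the dual-homed customer C floods it to U2,
# which floods it to U1.
topology = {"me": ["U1", "C"], "C": ["me", "U2"], "U2": ["C", "U1"], "U1": ["me", "U2"]}
print("U1" in flood(topology, "me", withheld={("me", "U1")}))  # True
```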

On Sun, 24 Aug 2025 at 13:09, <nanog@immibis.com> wrote:
No, you can't, because your upstream's shortest route leads back to you and that's a loop. Any difference in route calculation between two nodes in a link-state protocol is likely to create a loop.
The sender will know whether it loops or not, i.e. whether they can choose a non-shortest path that will not loop: an LFA (loop-free alternate). -- ++ytti
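The LFA condition is defined in RFC 5286: a neighbor N of S is loop-free for destination D if Dist(N, D) < Dist(N, S) + Dist(S, D). This sketch applies that inequality to an AS-level graph with AS10, upstreams AS21-AS23, and destination AS42. The costs are invented, and treating ASes as nodes with metrics is itself an assumption of the hypothetical link-state EGP. The primary next hop also satisfies the inequality, so it appears in the result:

```python
import heapq


def distances(graph, src):
    """Plain Dijkstra: shortest-path cost from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def loop_free_neighbors(graph, s, d):
    """Neighbours N of S with Dist(N, D) < Dist(N, S) + Dist(S, D) (RFC 5286, inequality 1)."""
    inf = float("inf")
    from_s = distances(graph, s)
    result = set()
    for n in graph[s]:
        from_n = distances(graph, n)
        # Strict inequality: N's own shortest path to D cannot come back through S.
        if from_n.get(d, inf) < from_n.get(s, inf) + from_s.get(d, inf):
            result.add(n)
    return result


# AS10 with upstreams AS21-AS23 reaching AS42; AS23's best path to AS42 runs via AS10.
graph = {
    10: {21: 1, 22: 1, 23: 1},
    21: {10: 1, 42: 1},
    22: {10: 1, 42: 2},
    23: {10: 1, 42: 5},
    42: {21: 1, 22: 2, 23: 5},
}
print(sorted(loop_free_neighbors(graph, 10, 42)))  # [21, 22]
```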

On 8/24/25 02:23, Saku Ytti wrote:
On Sun, 24 Aug 2025 at 05:52, Jeffrey Haas <jhaas@pfrc.org> wrote:
The easy way to picture some of the impacts of that is to consider what it'd take to distribute "at the boundary of AS X->Y, don't distribute prefix P".
If we imagine that on day one we had been concerned about people abusing BGP, and that we would need to distribute >1M prefixes, we would likely have concluded that we need out-of-band data for validation reasons alone. So we would have evolved a very different-looking system.
It's worth remembering that such validation systems were considered very early. The origins of the IRR and route servers were partly about dealing with scaling along with validating routes. It's only with this iteration, the RPKI, that we've gotten a flavor of such a database that has had some teeth to it.
And the limitations that system would have, and how to work with them, would now look like requirements to us, when they were just the best solution we could come up with, given the tools we had in front of us.
... and similarly what the security landscape would resemble. BGPsec still resembles most of the important bits of S-BGP for such reasons. And, rather similarly, the systems actually getting deployed have properties more like soBGP than S-BGP. To your point, where we're at is exactly the same type of story I generally tell about BGP: we got here one step at a time, because this has always been a story about successful incremental deployments. Did my elders think about doing everything in the flavor of link-state at the beginning? They certainly were aware of it, and somewhat frightened of it: CPU scale at the time made even lower-scale SPFs challenging. These days we have much larger CPUs, although the CPUs available in routers still remain pathetic compared to desktop computers. Would link state make more sense these days? I think those of you on this list running planetary-scale IGPs have some opinions about how well even internal networks are able to keep up. So... probably not for the scale of the Internet.
I suspect all the legitimate disjoint-advertisement problems would be addressed by registering more ASNs and moving the ASNs between sites as needed.
RFC 1925, §2.(6): it is easier to move a problem around than it is to solve it. The amount of state stays largely the same. A simplification I use when discussing this problem is that you can treat an AS effectively as one very large router. The underlying problem is that, given how ASes work, you can't pretend that a route entering one interface of this very large router is guaranteed to exit everywhere else, which is how we'd expect a link-state implementation to generally work. Similarly, you can't expect that we're going to originate routes from that AS uniformly from that single very large router. These things already push us out of classical link-state solutions. The very large router is a black box, and the Internet is the sum of how all of those black boxes operate based on the preferences of each party running their AS. Which is a pity in some respects: as you note, if it were closer to link state, forwarding and convergence would start to look very different. -- Jeff

On Sun, 24 Aug 2025 at 13:09, <nanog@immibis.com> wrote:
No, you can't, because your upstream's shortest route leads back to you and that's a loop. Any difference in route calculation between two nodes in a link-state protocol is likely to create a loop.
The sender will know whether it loops or not, i.e. whether they can choose a non-shortest path that will not loop: an LFA (loop-free alternate).
To give a specific example.
I am AS10. I have upstream transits AS2[123]. I have downstream stubby customers AS3[123].
For every AS other than AS10 and AS3[123], I can freely choose any permutation of AS2[123] to send traffic to, _per-prefix_.
Let's say I see /some/ AS42 path through each of AS2[123]; now I can have a local egress policy for each AS42 prefix to send it through any permutation of AS2[123], ECMP or not.
In fact, the BGP topology is mostly a tree; it's mostly non-loopy, so LFA would be mostly there already. And this is so because of inherent business reasons (upstream/downstream), and because we actually have pretty poor loop-prevention hygiene: we filter the RIB with different policies, some dropping more-specifics and some not. From a theory POV that is a big no-no, as now you can't guarantee you don't loop. But we do it, because we understand how _this_ implementation looks in practice, and we don't use the solutions that don't work in _this_ implementation.
In fact, even internally in our AS, we would almost certainly loop if we didn't do MPLS, because for specific policy and TE reasons we filter advertisements differently in _iBGP-IN_. This is also kind of a big no-no, and if we did IP lookups in core transit I could not at all guarantee we wouldn't loop; but because we can guarantee that the edge decision is honored all the way to the other edge, we can get away with it.
The ability to use policy to affect egressing traffic wouldn't be much affected. The ability to affect ingress traffic would be radically different, and we would risk walking towards a future where we are suddenly looking at a very large number of ASNs, because of perceived or real needs for disjoint advertisements. So my confidence remains very low that this would be worthwhile, though we certainly could make it go. -- ++ytti

On 8/24/25 10:40, Saku Ytti via NANOG wrote:
In fact, even internally in our AS, we would almost certainly loop if we didn't do MPLS, because for specific policy and TE reasons we filter advertisements differently in _iBGP-IN_. This is also kind of a big no-no, and if we did IP lookups in core transit I could not at all guarantee we wouldn't loop; but because we can guarantee that the edge decision is honored all the way to the other edge, we can get away with it.
This practical matter regularly comes up when we're doing BGP extensions. We need to design the protocol to be safe on a hop by hop basis so that iBGP safely works - especially with reflectors. You can get away with all sorts of murder if you promise you tunnel edge to edge. -- Jeff

On 24 August 2025 16:40:20 CEST, Saku Ytti <saku@ytti.fi> wrote:
On Sun, 24 Aug 2025 at 13:09, <nanog@immibis.com> wrote:
No, you can't, because your upstream's shortest route leads back to you and that's a loop. Any difference in route calculation between two nodes in a link-state protocol is likely to create a loop.
The sender will know whether it loops or not, i.e. whether they can choose a non-shortest path that will not loop: an LFA (loop-free alternate).
To give a specific example.
I am AS10. I have upstream transits AS2[123]. I have downstream stubby customers AS3[123].
For every AS other than AS10 and AS3[123], I can freely choose any permutation of AS2[123] to send traffic to, _per-prefix_.
Let's say I see /some/ AS42 path through each of AS2[123]; now I can have a local egress policy for each AS42 prefix to send it through any permutation of AS2[123], ECMP or not.
It has to be a shortest path or at least you have to know their shortest path doesn't go back through you. Perhaps AS21's shortest path to AS23 is through you. In a link-state protocol you can't do shit to stop them using you as transit, besides outright blocking their traffic (breaking the internet) or splitting your AS in 3. How many times do I have to say it, maybe with big enough letters? ***A LINK STATE ROUTING PROTOCOL IS A DISTRIBUTED CONSENSUS ALGORITHM. ALL NODES MUST RUN THE IDENTICAL ALGORITHM ON IDENTICAL INPUT DATA OR THE NETWORK BREAKS.*** Perhaps you've invented a new type of algorithm where that's not the case. In this case I suggest ceasing to call it "link state", and writing a detailed paper about it instead of vague hints.
In fact BGP topology is mostly a tree, it's mostly non-loopy
Not even remotely true. Customer relationships are almost always a DAG, and that's all we can say. Locally, on any given router, you see a tree, but each router has its own tree and the interconnection of all the trees is not a tree. Loop prevention often happens anyway as a matter of policy, but BGP explicitly prevents loops by using the AS_PATH attribute.
so LFA would be mostly there already. And this is so because of inherent business reasons (upstream/downstream), and because we actually have pretty poor loop-prevention hygiene: we filter the RIB with different policies, some dropping more-specifics, some not dropping them. Which from a theory POV is a big no-no
Only in a link-state protocol! Luckily, BGP is not a link-state protocol.
, as now you can't guarantee you don't loop. But we do it, because we understand how _this_ implementation looks in practice, and we don't use the solutions that don't work in _this_ implementation.
In fact, even internally in our AS, we would almost certainly loop if we didn't do MPLS, because for specific policy and TE reasons we filter advertisements differently in _iBGP-IN_. This is also kind of a big no-no, and if we did IP lookups in core transit I could not at all guarantee we wouldn't loop; but because we can guarantee that the edge decision is honored all the way to the other edge, we can get away with it.
The ability to use policy to affect egressing traffic wouldn't be affected that much. The ability to affect ingress traffic would be radically different, and we would risk walking towards a future where we are suddenly looking at a very large number of ASNs, because of perceived or real needs for disjoint advertisements. So my confidence remains very low that this would be worthwhile, although we certainly could make it work.
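For reference, RFC 5286 defines the loop-free alternate condition in its "Inequality 1". A neighbor N of the computing router S is a safe alternate towards destination D if dist(N, D) < dist(N, S) + dist(S, D). In other words, N's own shortest path does not come back through S. S can only evaluate this by running SPF rooted at its neighbors over the same topology database, which is the identical-input requirement argued above. Below is a minimal sketch on a small hypothetical undirected graph. It illustrates the inequality only; it is not a proposal for how an inter-AS protocol would compute alternates.

```python
import heapq


def build_graph(edges):
    """Undirected graph: {node: {neighbour: cost}}."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def spf(graph, root):
    """Plain Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist


def loop_free_alternates(graph, s, d):
    """Neighbours of s, other than primary next hops, meeting RFC 5286 Inequality 1."""
    dist_s = spf(graph, s)
    alternates = []
    for n, link_cost in graph[s].items():
        dist_n = spf(graph, n)           # SPF rooted at the neighbour
        if link_cost + dist_n[d] == dist_s[d]:
            continue                     # n is already a primary next hop
        if dist_n[d] < dist_n[s] + dist_s[d]:
            alternates.append(n)
    return sorted(alternates)


# N1's shortest path to D goes back through S, so it is not an LFA; N2's does not.
G = build_graph([("S", "D", 1), ("S", "N1", 1), ("N1", "D", 5),
                 ("S", "N2", 1), ("N2", "D", 1)])
print(loop_free_alternates(G, "S", "D"))  # ['N2']
```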

On Mon, 25 Aug 2025 at 03:44, <nanog@immibis.com> wrote:
It has to be a shortest path or at least you have to know their shortest path doesn't go back through you. Perhaps AS21's shortest path to AS23 is through you. In a link-state protocol you can't do shit to stop them using you as transit, besides outright blocking their traffic (breaking the internet) or splitting your AS in 3.
How many times do I have to say it, maybe with big enough letters? ***A LINK STATE ROUTING PROTOCOL IS A DISTRIBUTED CONSENSUS ALGORITHM. ALL NODES MUST RUN THE IDENTICAL ALGORITHM ON IDENTICAL INPUT DATA OR THE NETWORK BREAKS.***
Perhaps you've invented a new type of algorithm where that's not the case. In this case I suggest ceasing to call it "link state", and writing a detailed paper about it instead of vague hints.
Oh, I'm definitely not writing a paper. But I'm not sure a novel algorithm is needed (nor am I sure it is not needed). Certainly the graph cannot be a symmetric directed graph; that is, the arrows must represent direction. You have edges which are reachable through you (customers) and you have edges which can be used to reach your customers (upstreams). So my link-state would have AS2[123] edges as reachable through me and AS3[123] as edges that can be used to reach those AS2[123] edges. So an arbitrary node further down the network wouldn't use me to reach AS2[123] because of the direction of the arrow.
Only in a link-state protocol! Luckily, BGP is not a link-state protocol.
Of course it is easy to end up with loopy BGP configurations. But then we change the configuration and come up with something else. -- ++ytti
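One way to read the directed-arrow idea above is the well-known "valley-free" (Gao-Rexford) rule. Under that rule, a usable path climbs zero or more customer-to-provider hops, crosses at most one peering link, and then only descends provider-to-customer. This is a swapped-in, standard formulation, not the design proposed in the thread. The sketch encodes the AS10 example (upstreams AS2[123], customers AS3[123]) and adds a hypothetical AS31-AS32 peering. It shows that a path using AS10 to get from one of its upstreams to another is rejected.

```python
def build_relationships(provider_customer, peers):
    """REL[(a, b)] is the direction of the hop a -> b."""
    rel = {}
    for provider, customer in provider_customer:
        rel[(customer, provider)] = "up"      # customer to its provider
        rel[(provider, customer)] = "down"    # provider to its customer
    for a, b in peers:
        rel[(a, b)] = rel[(b, a)] = "peer"
    return rel


def valley_free(path, rel):
    """True if path is: up*, then at most one peer hop, then down*."""
    descending = False
    for a, b in zip(path, path[1:]):
        hop = rel[(a, b)]
        if hop == "up":
            if descending:
                return False                  # climbing after descending: a valley
        elif hop == "peer":
            if descending:
                return False                  # a peer hop is only allowed at the top
            descending = True
        else:  # "down"
            descending = True
    return True


# AS10 buys transit from AS2[123] and sells it to AS3[123].
# The AS31-AS32 peering is hypothetical, added to exercise the peer rule.
REL = build_relationships(
    [(up, "AS10") for up in ("AS21", "AS22", "AS23")]
    + [("AS10", down) for down in ("AS31", "AS32", "AS33")],
    peers=[("AS31", "AS32")],
)
print(valley_free(["AS21", "AS10", "AS31"], REL))  # True: down, down
print(valley_free(["AS21", "AS10", "AS22"], REL))  # False: AS10 as transit between upstreams
```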

Have you ever looked at soBGP or Path State Vectors? Happy to hang out and explain if it would be helpful, but these are/were effectively BGP security efforts that were ultimately driving towards a DAG overlay. They failed because the community became extremely focused on securing "BGP operation" rather than securing the base topology information. :-) /r
------ Original Message ------ From "Saku Ytti via NANOG" <nanog@lists.nanog.org> To nanog@immibis.com Cc "North American Network Operators Group" <nanog@lists.nanog.org>; "Saku Ytti" <saku@ytti.fi> Date 8/25/2025 02:04:15 Subject Re: Link-state EGP
-- ++ytti _______________________________________________ NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/2AFXLTXO...
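Two claims in this thread can be checked mechanically with a topological sort: that customer relationships are almost always a DAG, and that soBGP and Path State Vectors were driving towards a DAG overlay. Below is a minimal sketch using Kahn's algorithm over customer-to-provider arcs. The relationships are the hypothetical ones from the AS10 example. The cyclic case is invented purely for illustration.

```python
from collections import deque


def topological_order(customer_provider):
    """Kahn's algorithm over customer->provider arcs.

    Returns one topological order (customers before their providers),
    or None if the relationships contain a cycle.
    """
    nodes = sorted({asn for arc in customer_provider for asn in arc})
    indegree = {n: 0 for n in nodes}
    providers = {n: [] for n in nodes}
    for customer, provider in customer_provider:
        providers[customer].append(provider)
        indegree[provider] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for p in providers[n]:
            indegree[p] -= 1
            if indegree[p] == 0:
                queue.append(p)
    return order if len(order) == len(nodes) else None


ARCS = (
    [("AS10", up) for up in ("AS21", "AS22", "AS23")]
    + [(down, "AS10") for down in ("AS31", "AS32", "AS33")]
)
print(topological_order(ARCS))
# ['AS31', 'AS32', 'AS33', 'AS10', 'AS21', 'AS22', 'AS23']

# Hypothetical: AS21 (AS10's upstream) becomes a customer of AS31 (AS10's customer).
print(topological_order(ARCS + [("AS21", "AS31")]))  # None: a provider cycle
```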

I have not, I will take a peek, thanks. Securing topology information would be a big win with link-state.
On Mon, 25 Aug 2025 at 17:43, 7riw77@gmail.com <7riw77@gmail.com> wrote:
Have you ever looked at soBGP or Path State Vectors. Happy to hang out and explain if it would be helpful, but these are/were effectively BGP security efforts that were ultimately driving to a DAG overlay.
They failed because the community became extremely focused on securing "BGP operation" rather than securing the base topology information.
:-) /r
-- ++ytti
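Securing topology information would presumably start from the two-way check that intra-domain link-state protocols already apply during SPF. An adjacency is used only if both ends advertise it, so an AS1->AS2 claim needs a matching AS2->AS1 claim. The sketch assumes that the signature on each advertisement has already been verified elsewhere; that step is not modeled here. The AS666 entry is a hypothetical forged, one-sided claim.

```python
def two_way_links(claims):
    """claims: {origin AS: set of neighbour ASes it advertises}.

    Returns the undirected links that BOTH endpoints advertise. Signature
    verification of each claim is assumed to have happened already.
    """
    links = set()
    for origin, neighbours in claims.items():
        for nbr in neighbours:
            if origin in claims.get(nbr, set()):
                links.add(frozenset((origin, nbr)))
    return links


CLAIMS = {
    "AS10": {"AS21", "AS31"},
    "AS21": {"AS10"},
    "AS31": {"AS10"},
    "AS666": {"AS10"},   # hypothetical one-sided claim; AS10 never confirms it
}
print(sorted(sorted(link) for link in two_way_links(CLAIMS)))
# [['AS10', 'AS21'], ['AS10', 'AS31']]
```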

Saku Ytti via NANOG wrote on 2025-08-24 03:27:
Thinking a bit more about this: if we had ended up with a solution something like this, which enforces a joint AS:prefix relation, our lookup engines would likely be very different, because we could have gotten away with some sort of inter-domain MPLS with 'AS labels', doing exact-match AS lookups instead of LPM IP lookups while in transit, and LPM only at the edges.
This can be done already, relatively easily, inside one AS, but it will definitely hit a scalability barrier if expanded between ASes globally. If implemented, it will create more instability, because if the path inside a neighbor AS changes to another outgoing interface towards the same next AS, it will trigger a label change upstream or downstream, depending on point of view. In any case, LPM will still be necessary on every ingress node to find the prefix for a particular /32, or on the next aggregation node if the ingress node uses a default route.
Regarding link-state, the number of objects in this database will be much bigger than in the current BGP table. The number of BGP paths now is roughly the same as the number of prefixes in the DFZ. It will probably be multiplied by the number of transit links to each AS, plus some peering links, etc.
I see that underneath it's a neat idea to use AS numbers as the principal routing objects, but in current reality it's an IP address that identifies the destination, so the lookup to find the AS for a particular prefix has to be done, and the routing table for this lookup has to be maintained. The idea of how it could scale better is to a certain extent present in v6, but the existing implementation is in reality very far from that. It could, though, be a principle for a new IP version if one is ever invented.
Kind regards, Andrey
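Below is a minimal sketch of the lookup split discussed above. Longest-prefix match happens once at the ingress to find the destination's origin AS. After that, transit only needs an exact-match lookup on the AS "label". The prefixes (documentation ranges), AS numbers and next-hop table are hypothetical. The sketch does not model label distribution or the instability described above. A real LPM engine would use a trie or TCAM rather than this linear scan.

```python
import ipaddress

# Hypothetical prefix -> origin AS table, needed only at the ingress edge.
# Documentation prefixes; 198.51.100.128/25 is a more-specific with its own origin.
ORIGIN_AS = {
    ipaddress.ip_network("198.51.100.0/24"): 42,
    ipaddress.ip_network("198.51.100.128/25"): 43,
    ipaddress.ip_network("203.0.113.0/24"): 31,
}
# Hypothetical transit table keyed by AS "label": exact match, no prefixes.
AS_NEXT_HOP = {42: "AS21", 43: "AS22", 31: "AS31"}


def ingress_lpm(destination):
    """Longest-prefix match of a destination address to its origin AS (linear scan)."""
    addr = ipaddress.ip_address(destination)
    best_len, best_as = -1, None
    for net, asn in ORIGIN_AS.items():
        if addr in net and net.prefixlen > best_len:
            best_len, best_as = net.prefixlen, asn
    return best_as


def transit_next_hop(as_label):
    """Transit routers only do an exact-match lookup on the AS label."""
    return AS_NEXT_HOP[as_label]


for dst in ("198.51.100.10", "198.51.100.200", "203.0.113.5"):
    label = ingress_lpm(dst)
    print(f"{dst} -> AS{label} -> {transit_next_hop(label)}")
```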
participants (8)
- 7riw77@gmail.com
- Andrey Kostin
- Jeffrey Haas
- jheitz@cisco.com
- Mikael Abrahamsson
- nanog@immibis.com
- Saku Ytti
- William Herrin