
On 24 August 2025 16:40:20 CEST, Saku Ytti <saku@ytti.fi> wrote:
On Sun, 24 Aug 2025 at 13:09, <nanog@immibis.com> wrote:
No, you can't, because your upstream's shortest route leads back to you and that's a loop. Any difference in route calculation between two nodes in a link-state protocol is likely to create a loop.
The sender will know if it loops or not, if they can choose a non-shortest path that will not loop. I.e. LFA, loop free alternative.
To give a specific example.
I am AS10. I have upstream transit AS2[123] and downstream stubby customers AS3[123].
For every AS other than AS10 and AS3[123], I can freely choose any permutation of AS2[123] to send traffic to, _per-prefix_.
Let's say I see /some/ AS42 path through each of AS2[123]. Now I can have a local egress policy for each AS42 prefix to send it through any permutation of AS2[123], ECMP or not.
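A minimal sketch of the per-prefix egress choice described above, assuming AS2[123] stands for AS21, AS22 and AS23. The prefixes, the `policy` table and the `egress_set` helper are hypothetical names made up for illustration, not anyone's real configuration.

```python
# Hypothetical upstreams of AS10 (the AS2[123] shorthand expanded).
UPSTREAMS = {"AS21", "AS22", "AS23"}

def egress_set(prefix, policy, reachable_via):
    """Return the upstreams to use for a prefix; more than one means ECMP.

    policy        : local egress policy, prefix -> preferred upstreams
    reachable_via : upstreams that actually advertise a path for the prefix
    """
    preferred = policy.get(prefix, UPSTREAMS)   # no policy entry: any upstream
    usable = preferred & reachable_via.get(prefix, set())
    return sorted(usable)

# Illustrative AS42 prefixes (documentation address space).
policy = {
    "192.0.2.0/24": {"AS21"},                   # pin to one upstream
    "198.51.100.0/24": {"AS22", "AS23"},        # ECMP over two upstreams
}
reachable_via = {
    "192.0.2.0/24": {"AS21", "AS22", "AS23"},
    "198.51.100.0/24": {"AS21", "AS22", "AS23"},
    "203.0.113.0/24": {"AS21", "AS23"},
}

for prefix in reachable_via:
    print(prefix, egress_set(prefix, policy, reachable_via))
```

The choice is purely local to AS10: nothing here depends on what the upstreams compute, only on which of them advertise a path.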
It has to be a shortest path or at least you have to know their shortest path doesn't go back through you. Perhaps AS21's shortest path to AS23 is through you. In a link-state protocol you can't do shit to stop them using you as transit, besides outright blocking their traffic (breaking the internet) or splitting your AS in 3. How many times do I have to say it, maybe with big enough letters? ***A LINK STATE ROUTING PROTOCOL IS A DISTRIBUTED CONSENSUS ALGORITHM. ALL NODES MUST RUN THE IDENTICAL ALGORITHM ON IDENTICAL INPUT DATA OR THE NETWORK BREAKS.*** Perhaps you've invented a new type of algorithm where that's not the case. In this case I suggest ceasing to call it "link state", and writing a detailed paper about it instead of vague hints.
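The consensus argument can be illustrated with a toy sketch, assuming a three-node topology (A, B, D) and link costs invented for this example. Each node runs the same Dijkstra computation, but when A and B feed it different link costs, each picks the other as its next hop towards D.

```python
import heapq

def make_graph(edges):
    """Build a symmetric adjacency map from {(a, b): cost}."""
    graph = {}
    for (a, b), cost in edges.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def next_hop(graph, src, dst):
    """Dijkstra from src; return the first hop on the shortest path to dst."""
    dist = {src: 0}
    heap = [(0, src, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        if node == dst:
            return hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr, nbr if hop is None else hop))
    raise ValueError(f"{dst} unreachable from {src}")

def forward(views, src, dst):
    """Forward hop by hop, each node using its own view; report loops."""
    path, node = [src], src
    while node != dst:
        node = next_hop(views[node], node, dst)
        if node in path:
            return path + [node], True    # revisited a node: loop
        path.append(node)
    return path, False

# Identical input on every node: no loop.
same = make_graph({("A", "B"): 1, ("A", "D"): 5, ("B", "D"): 5})
consistent = {"A": same, "B": same, "D": same}

# A and B disagree about the costs of the links to D.
view_a = make_graph({("A", "B"): 1, ("A", "D"): 10, ("B", "D"): 1})
view_b = make_graph({("A", "B"): 1, ("A", "D"): 1, ("B", "D"): 10})
inconsistent = {"A": view_a, "B": view_b, "D": same}

print(forward(consistent, "A", "D"))      # (['A', 'D'], False)
print(forward(inconsistent, "A", "D"))    # (['A', 'B', 'A'], True)
```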
In fact BGP topology is mostly a tree, it's mostly non-loopy
Not even remotely true. Customer relationships are almost always a DAG, and that's all we can say. Locally, on any given router, you see a tree, but each router has its own tree and the interconnection of all the trees is not a tree. Loop prevention often happens anyway as a matter of policy, but BGP explicitly prevents loops by using the AS_PATH attribute.
so LFA would be mostly there already. And this is so because of inherent business reasons (upstream/downstream), and because we actually have pretty poor loop prevention hygiene: we filter RIB with different policies, some dropping more-specifics, some not dropping them. Which from theory POV is a big no-no
Only in a link-state protocol! Luckily, BGP is not a link-state protocol.
, as now you can't guarantee you don't loop. But we do it, because we understand how _this_ implementation looks in practice, and we don't use the solutions that don't work in _this_ implementation.
In fact, even internally in our AS we would almost certainly loop if we didn't do MPLS, because for specific policy and TE reasons we filter advertisements differently in _iBGP-IN_. This is also kind of a big no-no, and if we did IP lookup in core transit I cannot at all guarantee we wouldn't loop. But because we can guarantee that the edge decision is honored all the way to the other edge, we can get away with it.
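A toy sketch of why inconsistent filtering is only safe when the edge decision is tunnelled, assuming a made-up three-router AS (edges E1 and E2, core C) and invented FIB contents. E1 keeps a more-specific /25 whose exit is behind E2, while C has filtered it and follows the covering /24 back towards E1.

```python
import ipaddress

def lpm(fib, dst):
    """Longest-prefix match: return the next hop for dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), nh) for p, nh in fib.items()
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def hop_by_hop(fibs, start, dst, max_hops=8):
    """Plain IP forwarding: every router does its own lookup."""
    path, node = [start], start
    for _ in range(max_hops):
        nh = lpm(fibs[node], dst)
        if nh is None or nh == "EXIT":
            return path, False            # dropped, or left the AS
        if nh in path:
            return path + [nh], True      # revisited a router: loop
        path.append(nh)
        node = nh
    return path, True

# Hypothetical FIBs. "EXIT" means the packet leaves the AS at that router.
fibs = {
    "E1": {"203.0.113.0/24": "EXIT", "203.0.113.0/25": "C"},
    "C":  {"203.0.113.0/24": "E1"},       # the /25 was filtered here
    "E2": {"203.0.113.0/25": "EXIT"},
}

# Tunnelled case: only the ingress edge does an IP lookup and picks an
# egress edge; the core label-switches along a fixed path (assumed LSP).
egress_choice = {"E1": {"203.0.113.0/24": "E1", "203.0.113.0/25": "E2"}}
lsp = {("E1", "E2"): ["E1", "C", "E2"], ("E1", "E1"): ["E1"]}

def tunnelled(ingress, dst):
    egress = lpm(egress_choice[ingress], dst)
    return lsp[(ingress, egress)]

print(hop_by_hop(fibs, "E1", "203.0.113.10"))   # (['E1', 'C', 'E1'], True)
print(tunnelled("E1", "203.0.113.10"))          # ['E1', 'C', 'E2']
```

With per-hop lookup the packet bounces between E1 and C; with the tunnel, C never consults its filtered table, so the disagreement is harmless.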
The ability to use policy to affect egressing traffic wouldn't be that much affected. The ability to affect ingress traffic would be radically different, and we would risk walking towards a future where we are suddenly looking at a very large number of ASNs, because of perceived or real needs for disjoint advertisements. So my confidence remains very low that this would be worthwhile, while certainly we could make it go.