
On Tue, Oct 7, 2025 at 9:55 AM Martin Tonusoo via NANOG <nanog@lists.nanog.org> wrote:
Hi.
2. p3619 : "Then each new prefix will be propagated in parallel."
Not really. Even if you assume the AS A sent a single UPDATE with 1 NLRI for each prefix, ASes B C D are going to aggregate multiple NLRI changes in a single UPDATE message to each other. This isn't going to cause the amplification claimed.
Perhaps the authors meant that each UPDATE message sent by AS A has unique path attributes, which would ensure that ASes B, C and D cannot aggregate multiple NLRIs into a single UPDATE message.
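
To make the packing argument concrete, here is a minimal Python sketch, not any router's actual implementation. It assumes a speaker packs every prefix that shares an identical set of path attributes into one UPDATE and ignores the 4096-byte message size limit. The function name pack_updates and the attribute tuple (AS_PATH, NEXT_HOP, MED) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pack_updates(routes):
    """Group prefixes that share identical path attributes.

    routes: iterable of (prefix, attrs) pairs, where attrs is a hashable tuple.
    Returns a list of (attrs, [prefixes]) pairs, one per UPDATE message.
    """
    grouped = defaultdict(list)
    for prefix, attrs in routes:
        grouped[attrs].append(prefix)
    return list(grouped.items())


# Hypothetical attribute tuple: (AS_PATH, NEXT_HOP, MED)
shared = (("65001",), "192.0.2.1", 0)

# Case 1: five prefixes with the same attributes can share one UPDATE.
same_attrs = [(f"10.0.{i}.0/24", shared) for i in range(5)]

# Case 2: five prefixes whose MED differs each need their own UPDATE.
unique_attrs = [
    (f"10.1.{i}.0/24", (("65001",), "192.0.2.1", i)) for i in range(5)
]

print("same attributes:  ", len(pack_updates(same_attrs)), "UPDATE(s)")
print("unique attributes:", len(pack_updates(unique_attrs)), "UPDATE(s)")
```

Under these assumptions the message count follows the number of distinct attribute sets rather than the number of prefixes, which is why unique path attributes per prefix would defeat the packing.
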
I tried to replicate the "BGP Vortices Delay Network Convergence" test demonstrated in paragraph 5.3. The setup (drawing: https://gist.github.com/tonusoo/1cced39aa6ae53143d12623a05f02331) is very similar to figure 4b on page 3621, but all my routers are running BIRD 3 (single thread mode). Router "rY" (ingress) injects a real BGP feed into the lab setup, router "rX" (upstream) periodically advertises and withdraws 50 routes, and router "rK" injects 5k prefixes for the BGP vortex. Running a packet capture on the Linux bridge connecting, for example, the "rN" and "rM" routers confirms that the BGP vortex is ongoing, and I'm seeing well over 10k UPDATE messages per second. I might be doing something wrong, but I don't see the delays shown in figure 5a on page 3622. That is, the 50 routes advertised or withdrawn by "rX" are propagated to "rZ" within a few hundred milliseconds and not delayed for 10+ seconds.
Looking at figure 6, the larger component appears to be the time between when the BGP update message arrived at the bystander-AS and when FRR finished logging the update message in its logs. As the methodology claims:

"By subtracting the time a route advertisement arrived at the bystander-AS from when it was logged in the FRR’s BGP log, we computed the processing time on the bystander-AS."

As someone who has dealt with logging of debugging output from programs that need to be as real-time as possible, I can say the logging functions are generally written to be asynchronous and separate from the main processing path, so that delays in the logging subsystem don't hold up the real work the program is doing. Using the appearance of a log message as an indicator of precisely when a RIB update happened is handwavy at best, and flat-out wrong at worst. The timestamp at which the zlog subsystem of FRR got the BGP update log message is unlikely to be the same timestamp at which the RIB itself was updated.

Indeed, when I researched FRR logging timestamps, this is what came up:

"Performance impact: Debug-level logging can significantly increase the load on the system and may not capture precise, real-time updates without impacting performance, especially for frequent RIB updates."

So, you end up with a double-whammy: turning on debug logging to see the logs for the routing updates significantly increases the load on the box running FRR, which in turn slows down the rate at which it can process incoming update messages.

I think we've all known for years the perils of turning on extensive debug messages on routers. How many of us have had the awkward moment of a partner shaking us awake in bed saying "What happened? You were shouting 'undebug all! undebug all!' in your sleep. Were you having a nightmare?"

I suspect that if you turn on verbose debug logging on "rZ", you might find that route updates to the RIB suddenly slow down noticeably.
This has less to do with the actions of a route vortex, and much more to do with hitting the CPU of your router over the head repeatedly with the blunt hammer of sprintf. ^_^;;

Matt
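
The point about log timestamps can be illustrated with a toy model, sketched below in Python. This is not how FRR's zlog is actually implemented. It simply assumes that the RIB update and the log write are decoupled, that log messages drain from a FIFO queue, and that writing one log line costs more than processing one update. The function name simulate and the per-message costs are made-up values for illustration only.

```python
def simulate(n_updates, proc_cost, log_cost):
    """Return (rib_time, log_time) pairs for n_updates in a toy model.

    Assumption: updates are processed back to back, each taking proc_cost
    seconds, while a separate logger drains a FIFO queue and needs log_cost
    seconds per message.
    """
    pairs = []
    logger_free_at = 0.0
    for i in range(n_updates):
        rib_time = (i + 1) * proc_cost          # moment the RIB is updated
        start = max(rib_time, logger_free_at)   # wait until the logger is idle
        logger_free_at = start + log_cost       # moment the log line is written
        pairs.append((rib_time, logger_free_at))
    return pairs


# Hypothetical costs: 1 ms to process an update, 5 ms to format and write a log line.
pairs = simulate(n_updates=1000, proc_cost=0.001, log_cost=0.005)

first_lag = pairs[0][1] - pairs[0][0]
last_lag = pairs[-1][1] - pairs[-1][0]
print(f"log lag for first update: {first_lag:.3f} s")
print(f"log lag for last update:  {last_lag:.3f} s")
```

In this model the gap between the RIB update and its log line grows with every queued message, so a processing time derived from log timestamps would mostly measure the logging backlog rather than the route processing itself.
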