Dear Mike, Ytti, others,

First of all and most importantly: congratulations Mike! I thank you and your team for having constructed a great mechanism that helps honor the routing intentions everyone publishes in the RPKI.

On Tue, Jun 16, 2020 at 09:08:41AM +0300, Saku Ytti wrote:
On Tue, 16 Jun 2020 at 07:51, Mike Leber via NANOG <nanog@nanog.org> wrote:
These prefix filters are updated automatically both through a system of daily updates and real time updates to prevent RPKI INVALID routes from being carried in our routing table.
What does real time mean in this context? Does it mean exactly 0s leak of INVALID, or 99% less than 30s? Or how do you define it?
My measurement (samplesize = 1) appears to indicate it took less than a minute between AS 6939 receiving (and accepting) an RPKI invalid route announcement, and that same route announcement being removed from the AS 6939 routing tables. Subsequently BGP withdraw messages were sent (for that RPKI invalid route via 6939) to all their peers, which took a few more minutes to be processed and converge in the global routing system.

I think it is important for the community to understand that the mechanism 6939 currently uses is a different approach from what other network operators are doing. Most RPKI ROV deployments are set up in such a way that a-priori all EBGP routers are primed with a full set of VRPs. Feeding the routers the VRPs through the RPKI-To-Router (RTR) protocol allows those BGP speakers to reject an RPKI invalid route - before - installing it in the Loc-RIB.

At the same time, we should recognize and praise anyone who managed to deploy a reactive mechanism due to the lack of RTR support on a device. The "route collector -> script -> add prefix list to denylist" approach cannot be avoided if you have gear in the network that does not support RPKI OV as specced out in RFC 6811.

The reactive mechanism must be viewed in context of other protection mechanisms that are deployed such as Peerlock, Maximum Prefix Limits, and IRR+RPKI+WHOIS based explicit allowlists, all of which 6939 has done. I actually had to jump through some hoops in the IRR system to trick 6939 into accepting my RPKI invalid route announcement. :-)

Since it is with words that we construct the magic of our reality, let's assign a name specific to this engineering effort:

Reactive RPKI ROV
=================

Reactive RPKI ROV means that a network operator has set up an RPKI-capable route collector which peers with all BGP nodes that do not support RPKI.
The route collector logs all RPKI invalid route announcements it receives, and these messages can be used as input to an automated process to update prefix-list filters on the BGP node that received the RPKI invalid route announcement. The free OpenBGPD or BIRD software can be used as such route collectors. As is evident from my 'samplesize=1' study, that whole process can be completed in under one minute.

The alternative to the "Reactive RPKI ROV" approach is what we've already done for years: emailing a NOC and requesting manual intervention to block a problematic route. At the best of times the 'calling the NOC' approach takes hours. As such, Reactive RPKI ROV is obviously far preferable to manual approaches.

It would be awesome if the community openly shares notes on how to construct Reactive RPKI ROV deployments to improve routing for everyone. Maybe at some point some open source software pops up somewhere to make it easier for everyone? The future is bright, I'm optimistic we can tame the Default-Free Zone beast :)

So Mike, please consider submitting a presentation proposal to one of the network operator groups to outline in as much detail as possible how you did it. I'd love to learn from your experience!
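
To make the "route collector -> script -> add prefix list to denylist" loop a bit more tangible, here is a minimal Python sketch of what the script in the middle might do. It is not how 6939 built theirs (I don't know those details); it only classifies announcements against a set of VRPs following the RFC 6811 rules and groups the invalid prefixes per router. The names `VRP`, `validate` and `denylist_entries`, the router names, and the shape of the collector input are all assumptions made up for illustration. Pushing the resulting entries to a device is left out entirely.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """Hypothetical in-memory form of a Validated ROA Payload."""
    prefix: str
    max_length: int
    asn: int


def validate(prefix, origin_asn, vrps):
    """Return 'valid', 'invalid' or 'notfound' for one route (RFC 6811)."""
    route = ip_network(prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP covers the route if the route is equal to, or more
        # specific than, the VRP prefix (same address family).
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP matches if the origin AS agrees and the route
        # is not longer than the VRP's maxLength. AS 0 never matches.
        if (vrp.asn != 0 and vrp.asn == origin_asn
                and route.prefixlen <= vrp.max_length):
            return "valid"
    return "invalid" if covered else "notfound"


def denylist_entries(announcements, vrps):
    """Group RPKI invalid prefixes by the router that received them.

    `announcements` is assumed to be an iterable of
    (router, prefix, origin_asn) tuples taken from the collector's log.
    """
    deny = {}
    for router, prefix, origin_asn in announcements:
        if validate(prefix, origin_asn, vrps) == "invalid":
            deny.setdefault(router, set()).add(prefix)
    return deny
```

Only routes covered by at least one VRP and matched by none end up on the denylist; 'notfound' routes are left alone, which is the same outcome a router doing RFC 6811 origin validation with a reject-invalids policy would give.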
So my definition of real time here would be 99% <5min.
I think it should be 99% <1 min, because that's how high 6939 set the bar :-)

Kind regards,

Job