Hurricane Electric has reached 0 RPKI INVALIDs in our routing table
I'm pleased to announce Hurricane Electric has completed our RPKI INVALID filtering project and we now have 0 RPKI INVALIDs in our routing table. Hurricane Electric has 29021 BGP sessions with 22109 prefix filters with 7191 networks directly and 8239 networks including Internet exchanges. We filter all BGP sessions using prefix filters based on IRR and RPKI. These prefix filters are updated automatically both through a system of daily updates and real time updates to prevent RPKI INVALID routes from being carried in our routing table.
absolutely awesome Mike! Can you put on the roadmap to enable irr based filters for customers with bgp communities?
Congratulations HE team! On Mon, Jun 15, 2020 at 9:56 PM TJ Trout <tj@pcguys.us> wrote:
absolutely awesome Mike!
Can you put on the roadmap to enable irr based filters for customers with bgp communities?
On Tue, 16 Jun 2020 at 07:51, Mike Leber via NANOG <nanog@nanog.org> wrote: Hey,
These prefix filters are updated automatically both through a system of daily updates and real time updates to prevent RPKI INVALID routes from being carried in our routing table.
What does real time mean in this context? Does it mean exactly 0s leak of INVALID, or 99% less than 30s? Or how do you define it? I'm trying to think of an ideal way to do this in Junos which does a few second ephemeral config commits. I could have an always-on SSH session to each device to amortise login time, but even then if I can do this cycle in 5s, I'd have to wait for BGP propagation delay in DFZ, which is measured in minutes not seconds. So my definition of real time here would be 99% <5min. -- ++ytti
Dear Mike, Ytti, others, First of all and most importantly: congratulations Mike! I thank you and your team for having constructed a great mechanism that helps honor the routing intentions everyone publishes in the RPKI. On Tue, Jun 16, 2020 at 09:08:41AM +0300, Saku Ytti wrote:
On Tue, 16 Jun 2020 at 07:51, Mike Leber via NANOG <nanog@nanog.org> wrote:
These prefix filters are updated automatically both through a system of daily updates and real time updates to prevent RPKI INVALID routes from being carried in our routing table.
What does real time mean in this context? Does it mean exactly 0s leak of INVALID, or 99% less than 30s? Or how do you define it?
My measurement (samplesize = 1) appears to indicate it took less than a minute between AS 6939 receiving (and accepting) an RPKI invalid route announcement, and that same route announcement being removed from the AS 6939 routing tables. Subsequently BGP withdraw messages were sent (for that RPKI invalid route via 6939) to all their peers, which took a few more minutes to be processed and converge in the global routing system.

I think it is important for the community to understand that the mechanism 6939 currently uses is a different approach to what other network operators are doing. Most RPKI ROV deployments have set it up in such a way that a-priori all EBGP routers are primed with a full set of VRPs. Feeding the routers the VRPs through the RPKI-To-Router (RTR) protocol allows those BGP speakers to reject an RPKI invalid route - before - installing it in the Loc-RIB.

At the same time, we should recognize and praise anyone who managed to deploy a reactive mechanism due to the lack of RTR support on a device. The "route collector -> script -> add prefix list to denylist" approach cannot be avoided if you have gear in the network that does not support RPKI OV as specced out in RFC 6811.

The reactive mechanism must be viewed in context of other protection mechanisms that are deployed such as Peerlock, Maximum Prefix Limits, and IRR+RPKI+WHOIS based explicit allowlists, all of which 6939 has done. I actually had to jump through some hoops in the IRR system to trick 6939 into accepting my RPKI invalid route announcement. :-)

Since it is with words that we construct the magic of our reality, let's assign a name specific to this engineering effort:

Reactive RPKI ROV
=================

Reactive RPKI ROV means that a network operator has set up an RPKI-capable route collector which peers with all BGP nodes that do not support RPKI. The route collector logs all RPKI route announcements it receives, and these messages can be used as input to an automated process to update prefix-list filters on the BGP node that received the RPKI invalid route announcement. The free OpenBGPD or BIRD software can be used as such route collectors. As is evident from my 'samplesize=1' study, that whole process can be completed in under one minute.

The alternative to the "Reactive RPKI ROV" approach is what we've already done for years: emailing a NOC and requesting manual intervention to block a problematic route. At the best of times the 'calling the NOC' approach takes hours. As such, Reactive RPKI ROV is obviously far preferable to manual approaches.

It would be awesome if the community openly shares notes on how to construct Reactive RPKI ROV deployments to improve routing for everyone. Maybe at some point some open source software pops up somewhere to make it easier for everyone? The future is bright, I'm optimistic we tame the Default-Free Zone beast :)

So Mike, please consider submitting a presentation proposal to one of the network operator groups to outline in as much detail as possible how you did it. I'd love to learn from your experience!
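To make the "route collector -> script -> add prefix list to denylist" idea concrete, here is a minimal Python sketch. It is not how 6939 does it (that has not been described in this thread). It assumes the collector log is already parsed into (node, prefix, origin ASN) tuples and the VRPs are available as a list. The names `VRP`, `validate` and `deny_entries`, the node names, and the documentation prefixes and ASNs are all made up for illustration. The validation logic follows the RFC 6811 states (valid, invalid, not-found).

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    """Validated ROA Payload: prefix, maximum length, authorised origin ASN."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Return 'valid', 'invalid' or 'not-found' in the sense of RFC 6811."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        # A VRP covers the route if the route is equal to or inside the ROA prefix.
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering VRP matches if the origin agrees and the route is not too specific.
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


def deny_entries(collector_log, vrps):
    """Group invalid prefixes by the BGP node that sent them to the collector."""
    deny = {}
    for node, prefix, origin_asn in collector_log:
        if validate(prefix, origin_asn, vrps) == "invalid":
            deny.setdefault(node, set()).add(prefix)
    return {node: sorted(prefixes) for node, prefixes in deny.items()}


# Hypothetical data: one ROA, four announcements seen via two nodes.
vrps = [VRP("192.0.2.0/24", 24, 64496)]
log = [
    ("edge1", "192.0.2.0/24", 64496),     # valid
    ("edge1", "192.0.2.0/25", 64496),     # invalid: longer than max_length
    ("edge2", "192.0.2.0/24", 64511),     # invalid: wrong origin ASN
    ("edge2", "198.51.100.0/24", 64511),  # not-found: no covering VRP
]
print(deny_entries(log, vrps))
```

The per-node result is what an automated process would turn into prefix-list deny entries and push to that node; the push itself is vendor specific and left out here.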
So my definition of real time here would be 99% <5min.
I think it should be 99% <1 min, because that's how high 6939 set the bar :-) Kind regards, Job
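For anyone who wants to check their own deployment against a "99% < 1 min" or "99% < 5 min" definition, a small Python sketch follows. It assumes you already record, per incident, the number of seconds between accepting an invalid route and removing it. The function names and the sample numbers are hypothetical, not measurements from any network.

```python
def fraction_within(latencies_s, threshold_s):
    """Fraction of removal latencies strictly below the threshold (in seconds)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    return sum(1 for x in latencies_s if x < threshold_s) / len(latencies_s)


def meets_target(latencies_s, threshold_s, target=0.99):
    """True if at least `target` of the samples finished below the threshold."""
    return fraction_within(latencies_s, threshold_s) >= target


# Hypothetical samples: 199 removals in 40 s and one slow removal in 400 s.
samples = [40.0] * 199 + [400.0]
print(meets_target(samples, 60))   # 99% < 1 min?
print(meets_target(samples, 300))  # 99% < 5 min?
```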
On 16/Jun/20 22:07, Job Snijders wrote:
Since it is with words that we construct the magic of our reality, let's assign a name specific to this engineering effort:
Reactive RPKI ROV
=================
Reactive RPKI ROV, it is, then :-). A great effort by HE for a network that may not yet completely support RFC 6811. We're quickly running out of reasons. Mark.
Let's say someone makes an announcement that creates an RPKI invalid and it is determined to be a mistake. They then go back and add ROA objects to fix the problem. Will this reactive RPKI approach then continue to block the route because filters were already generated and pushed out to routers? Or in other words, if the system can insert the filter in less than 60 seconds, how long does it take to get rid of the filter again when someone publishes a valid ROA? Regards, Baldur
Dear Baldur, On Wed, Jun 17, 2020 at 01:42:36PM +0200, Baldur Norddahl wrote:
Let's say someone makes an announcement that creates an RPKI invalid and it is determined to be a mistake. They then go back and add ROA objects to fix the problem. Will this reactive RPKI approach then continue to block the route because filters were already generated and pushed out to routers? Or in other words, if the system can insert the filter in less than 60 seconds, how long does it take to get rid of the filter again when someone publishes a valid ROA?
What you describe here is what I'd call a "Garbage Collection" process. Garbage collection has to happen periodically. Probably not slower than once an hour. See the following link for an attempt to document that type of aspect of RPKI ROV deployments: https://tools.ietf.org/html/draft-ietf-sidrops-rpki-rov-timing-00.html Maybe HE can comment on their current timers? Kind regards, Job
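A rough Python sketch of what such a periodic garbage collection pass could look like: re-check every deny entry against the current VRP set and release the ones that are no longer invalid. This is only an illustration of the idea; the function names, the tuple layout (prefix, max length, ASN) and the documentation prefixes and ASNs are assumptions, and HE's actual process and timers are not known from this thread.

```python
import ipaddress


def is_invalid(prefix, origin_asn, vrps):
    """True if some VRP covers the route but none matches it (RFC 6811 'invalid')."""
    route = ipaddress.ip_network(prefix)
    covering = []
    for roa_prefix, max_length, asn in vrps:
        roa_net = ipaddress.ip_network(roa_prefix)
        if roa_net.version == route.version and route.subnet_of(roa_net):
            covering.append((max_length, asn))
    matched = any(
        asn == origin_asn and route.prefixlen <= max_length
        for max_length, asn in covering
    )
    return bool(covering) and not matched


def garbage_collect(denied, vrps):
    """Split deny entries into those to keep and those to release."""
    keep = [entry for entry in denied if is_invalid(entry[0], entry[1], vrps)]
    release = [entry for entry in denied if entry not in keep]
    return keep, release


# Hypothetical state: two (prefix, origin ASN) entries were denied earlier.
denied = [("192.0.2.0/25", 64511), ("203.0.113.0/24", 64500)]

# Since then a ROA for 192.0.2.0/25 originated by AS64511 has been published.
current_vrps = [
    ("192.0.2.0/24", 24, 64496),
    ("203.0.113.0/24", 24, 64499),
    ("192.0.2.0/25", 25, 64511),
]

keep, release = garbage_collect(denied, current_vrps)
print("keep:", keep)
print("release:", release)
```

How often this runs is the timer in question: the delay between a fixed ROA and the filter disappearing is bounded below by the validator refresh plus this pass.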
On Mon, 15 Jun 2020, Mike Leber via NANOG wrote:
I'm pleased to announce Hurricane Electric has completed our RPKI INVALID filtering project and we now have 0 RPKI INVALIDs in our routing table.
Hurricane Electric has 29021 BGP sessions with 22109 prefix filters with 7191 networks directly and 8239 networks including Internet exchanges.
The flip side of this though is that every time an IP space owner publishes an ROA for an aggregate IP block and overlooks the fact that they have customers BGP originating a subnet of the aggregate with an ASN not permitted by an ROA, HE has "less than a full table". :(

i.e. I'm questioning whether the system is mature enough and properly used widely enough for dropping RPKI invalids to be a good idea?

----------------------------------------------------------------------
 Jon Lewis, MCP :)           |  I route
 StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
On Wed, 17 Jun 2020 at 17:28, Jon Lewis <jlewis@lewis.org> wrote:
The flip side of this though is that every time an IP space owner publishes an ROA for an aggregate IP block and overlooks the fact that they have customers BGP originating a subnet of the aggregate with an ASN not permitted by an ROA, HE has "less than a full table". :(
It's hard to imagine RPKI doing its MVP function as a flip side. If this argument is against RPKI fundamentally, I can understand it, but that ship has sailed. -- ++ytti
On 17/Jun/20 16:25, Jon Lewis wrote:
The flip side of this though is that every time an IP space owner publishes an ROA for an aggregate IP block and overlooks the fact that they have customers BGP originating a subnet of the aggregate with an ASN not permitted by an ROA, HE has "less than a full table". :(
This is a known business use-case and it's incumbent upon the address and AS holders to co-ordinate this. We dropped some prefixes due to this in October of last year. Once we raised the issue with the remote network, it was fixed in 30 minutes.
i.e. I'm questioning whether the system is mature enough and properly used widely enough for dropping RPKI invalids to be a good idea?
Well, if we don't deploy, nothing matures. The problems we hit in the field will help to make the entire system better. Mark.
On 17/Jun/20 16:25, Jon Lewis wrote:
The flip side of this though is that every time an IP space owner publishes an ROA for an aggregate IP block and overlooks the fact that they have customers BGP originating a subnet of the aggregate with an ASN not permitted by an ROA, HE has "less than a full table". :(
This is a known business use-case and it's incumbent upon the address and AS holders to co-ordinate this.
We dropped some prefixes due to this in October of last year. Once we raised the issue with the remote network, it was fixed in 30 minutes.
Mark.
How did you know? Is there some monitoring system available to let you know or do you have your own? -Tim
Mark Tinka wrote on 18/06/2020 11:16:
On 17/Jun/20 21:16, Tim Warnock wrote:
How did you know? Is there some monitoring system available to let you know or do you have your own?
The usual way - a customer complained :-).
The customer monitoring system is very reliable and often superior to in-house solutions. Nick
On 18/Jun/20 12:51, Nick Hilliard wrote:
The customer monitoring system is very reliable and often superior to in-house solutions.
What really made the experience great for us is that directly contacting the remote network (somewhere in Eastern Europe) and getting them to fix the issue was far more effective than the usual, "Get your customer to log a case with our customer, who can then log a case with us, since we have no commercial contract with you". We had a completely separate second case caused by us rejecting an Invalid route. It got fixed in 30 minutes as well. Invalid routes being dropped creates downtime. People respond to downtime a lot more eagerly. Mark.
Dear Jon, group, On Wed, Jun 17, 2020 at 10:25:14AM -0400, Jon Lewis wrote:
On Mon, 15 Jun 2020, Mike Leber via NANOG wrote:
I'm pleased to announce Hurricane Electric has completed our RPKI INVALID filtering project and we now have 0 RPKI INVALIDs in our routing table.
Hurricane Electric has 29021 BGP sessions with 22109 prefix filters with 7191 networks directly and 8239 networks including Internet exchanges.
The flip side of this though is that every time an IP space owner publishes an ROA for an aggregate IP block and overlooks the fact that they have customers BGP originating a subnet of the aggregate with an ASN not permitted by an ROA, HE has "less than a full table". :(
Do you remember the old BSD paradigm? ... "less is more" I think it applies here. We are now in a time where a *smaller* routing table entry list count is preferable to a 'full' table, because the fullest table is likely to also include problematic BGP routing information.

It is important to recognise that RPKI ROA creation is an *OPTIONAL* protection mechanism. If you create ROAs, you indeed can harm your network, but at the same time, if you create the ROAs correctly, you will gain massive benefits.

RPKI ROA creation is a big hammer. Everyone needs to think carefully about each ROA they create and if it will positively or negatively impact their network. NTT spent *months* creating ROAs for all the prefixes, researching for each BGP announcement if the ROA would be good or bad. We now have virtually all our space covered by ROAs, it's nice.
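The "research each BGP announcement before creating the ROA" step can be partly automated. Below is a minimal Python sketch of such a pre-flight check for the aggregate-plus-customer case Jon describes: given a proposed ROA, the VRPs that already exist, and the announcements you currently see, it lists the announcements that the proposed ROA covers and that no VRP would match. This is not NTT's tooling; the function names, tuple layout and the documentation prefixes and ASNs are assumptions for illustration.

```python
import ipaddress


def covers(vrp, net):
    """True if the VRP prefix equals or contains the announced network."""
    roa_net = ipaddress.ip_network(vrp[0])
    return roa_net.version == net.version and net.subnet_of(roa_net)


def matches(vrp, net, origin_asn):
    """True if the VRP covers the route, the origin agrees and the length is allowed."""
    return covers(vrp, net) and vrp[2] == origin_asn and net.prefixlen <= vrp[1]


def invalidated_by(proposed, existing_vrps, announcements):
    """Announcements covered by the proposed ROA that no VRP would match."""
    harmed = []
    for prefix, origin_asn in announcements:
        net = ipaddress.ip_network(prefix)
        if not covers(proposed, net):
            continue  # the proposed ROA says nothing about this route
        if any(matches(v, net, origin_asn) for v in [proposed, *existing_vrps]):
            continue  # the route would be valid
        harmed.append((prefix, origin_asn))
    return harmed


# Hypothetical case: a ROA for the aggregate, and a customer originating a /25.
proposed = ("192.0.2.0/24", 24, 64496)  # (prefix, max length, origin ASN)
announcements = [
    ("192.0.2.0/24", 64496),     # the holder's own aggregate
    ("192.0.2.128/25", 64500),   # customer more-specific from another ASN
    ("198.51.100.0/24", 64501),  # unrelated route
]
print(invalidated_by(proposed, [], announcements))

# Publishing a ROA for the customer first removes the problem.
customer_roa = ("192.0.2.128/25", 25, 64500)
print(invalidated_by(proposed, [customer_roa], announcements))
```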
i.e. I'm questioning whether the system is mature enough and properly used widely enough for dropping RPKI invalids to be a good idea?
Yes. "We made an impossible bird, and it was able to fly". :-)

The global deployment of RPKI ROV in the BGP Default-Free Zone already is a fact, we made it work! All carriers that keep the Internet connected together, and care about preventing routing incidents - are committed to this effort. Thousands of people are now involved at this point.

What now remains.. is polishing away some of the sharp edges [1][2][3][4], and bikeshedding about some of the colors :-)

The below links are like an 'ala carte menu', anyone can engage in discussions about RPKI at any level they feel comfortable with. Many people are looking for feedback and input through different forums on what and how to build it. Pick a platform you enjoy engaging on and participate (and stick around on this mailing list, all good)! :)

Kind regards,

Job

[1]: https://www.youtube.com/watch?v=oBwAQep7Q7o
[2]: https://mailarchive.ietf.org/arch/msg/sidrops/ayCQbKvJZmE5TGq9IxL9qUM-zQ4/
[3]: https://github.com/RIPE-NCC/rpki-validator-3/issues/158
[4]: https://twitter.com/routinator3000/status/1255439035553779713
Do you remember the old BSD paradigm? ... "less is more"
s/bsd/mies/ credit where due.
We are now in a time where a *smaller* routing table entry list count is preferable to a 'full' table, because the fullest table is likely to also include problematic BGP routing information.
do you have measurement of that? i would be *really* interested. randy
participants (11)
- Baldur Norddahl
- Job Snijders
- Jon Lewis
- Mark Tinka
- Mehmet Akcin
- Mike Leber
- Nick Hilliard
- Randy Bush
- Saku Ytti
- Tim Warnock
- TJ Trout