Hi all. Just to let this group know that we've started the process of activating the dropping of Invalids for all our eBGP customers. We're starting off with our Juniper edge routers. Once those are done, we'll move on to our Cisco ASR1006 routers, finishing off with our Cisco ASR920 routers. I'll let you know if anything catches fire :-). Mark.
Mark, Invalid according to RPKI or IRR? Or both? Regards, as On Tue, 10 Dec 2019, 18:22 Randy Bush, <randy@psg.com> wrote:
mark,
Just to let this group know that we've started the process of activating the dropping of Invalids for all our eBGP customers.
cool. any stats and lessons appreciated.
randy
Dear Arturo, group, On Tue, Dec 10, 2019 at 20:51 Arturo Servin <arturo.servin@gmail.com> wrote:
Invalid according to RPKI or IRR? Or both?
In this context the use of the word “invalid” refers to the result of validation procedure described in RFC 6811 - which is to match received BGP updates to the RPKI and attach either of “valid”, “invalid”, or “not-found”. In IRR, the challenge has always been that “route:” objects describe a state of the network that may exist, but the semantics of “route:” objects don’t allow extrapolation towards what should definitely *not* exist in the BGP Default-Free Zone. RPKI ROAs (compared to IRR objects) carry different meaning: the existence of a ROA (both by definition and common implementation) supersedes other data sources (IRR, LOAs, or comments in whois records, etc), and as such can be used on any type of EBGP session for validation of the received Internet routing information. Kind regards, Job
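For readers who want the RFC 6811 procedure spelled out, here is a minimal sketch in Python. It is an illustration only, not any router's or validator's implementation: the `VRP` type, the `validate_origin` function, and the sample prefixes and ASNs (taken from the documentation ranges) are all assumptions made for this example.

```python
from ipaddress import ip_network
from typing import NamedTuple, Sequence


class VRP(NamedTuple):
    """Validated ROA Payload: prefix, maximum length, authorized origin ASN."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: Sequence[VRP]) -> str:
    """Return "valid", "invalid", or "not-found" for one received route."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        roa_net = ip_network(vrp.prefix)
        # A VRP covers the route if the route is equal to or more specific
        # than the VRP prefix (same address family).
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering VRP matches if the origin ASN agrees and the route is
        # not more specific than the VRP's maximum length.
        if origin_asn == vrp.asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    # Covered by no VRP at all: not-found.
    return "invalid" if covered else "not-found"


# Hypothetical data: one ROA for 192.0.2.0/24, max length 24, origin AS64496.
EXAMPLE_VRPS = [VRP("192.0.2.0/24", 24, 64496)]
```

With this data, the same /24 from AS64496 is valid, the /24 from any other origin is invalid, a /25 from AS64496 is invalid because it exceeds the maximum length, and a prefix with no covering ROA is not-found. Dropping Invalids means rejecting only the second and third cases.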
RPKI ROAs (compared to IRR objects) carry different meaning: the existence of a ROA (both by definition and common implementation) supersedes other data sources (IRR, LOAs, or comments in whois records, etc), and as such can be used on any type of EBGP session for validation of the received Internet routing information.
Which brings me to my favorite possible RPKI-IRR integration: a ROA that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP. Rubens
On Tue, Dec 10, 2019 at 7:32 PM Rubens Kuhl <rubensk@gmail.com> wrote:
RPKI ROAs (compared to IRR objects) carry different meaning: the existence of a ROA (both by definition and common implementation) supersedes other data sources (IRR, LOAs, or comments in whois records, etc), and as such can be used on any type of EBGP session for validation of the received Internet routing information.
Which brings me to my favorite possible RPKI-IRR integration: a ROA that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP.
Is this required? or a crutch for use until a network can publish all of their routing data in the RPKI? -chris
Which brings me to my favorite possible RPKI-IRR integration: a ROA that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP.
Is this required? or a crutch for use until a network can publish all of their routing data in the RPKI?
It provides an adoption path based on the information already published in IRRs by operators for some years. It also covers for the fact that RPKI currently is only origin-validation. Rubens
On Wed, Dec 11, 2019 at 5:52 AM Rubens Kuhl <rubensk@gmail.com> wrote:
Which brings me to my favorite possible RPKI-IRR integration: a ROA that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP.
Is this required? or a crutch for use until a network can publish all of their routing data in the RPKI?
It provides an adoption path based on the information already published in IRRs by operators for some years. It also covers for the fact that RPKI currently is only origin-validation.
I would think that if you(royal you) already are publishing: "these are the routes i'm going to originate (and here are my customer lists)" and you (royal you) are accepting the effort to publish 1 'new' thing in the RPKI. you could just as easily take the 'stuff I'm going to publish in IRR' and 'also publish in RPKI'. Right? So adoption path aside, because that seems like a weird argument (since your automation to make IRR data appear can ALSO just send rpki updates), your belief is that: "Hey, this irr object is really, really me" is still useful/required/necessary/interesting? -chris
Right, but you’re also taking a strong, cryptographically-authenticated system and making it sign non-authenticated data. Please don’t do that. If you want to add the data to RPKI, there should be a way to add the data to RPKI, not sign away control of your number resources to unauthenticated sources.
On Dec 11, 2019, at 10:17, Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Wed, Dec 11, 2019 at 5:52 AM Rubens Kuhl <rubensk@gmail.com> wrote:
Which brings me to my favorite possible RPKI-IRR integration: a ROA that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP.
Is this required? or a crutch for use until a network can publish all of their routing data in the RPKI?
It provides an adoption path based on the information already published in IRRs by operators for some years. It also covers for the fact that RPKI currently is only origin-validation.
I would think that if you(royal you) already are publishing: "these are the routes i'm going to originate (and here are my customer lists)"
and you (royal you) are accepting the effort to publish 1 'new' thing in the RPKI.
you could just as easily take the 'stuff I'm going to publish in IRR' and 'also publish in RPKI'. Right? So adoption path aside, because that seems like a weird argument (since your automation to make IRR data appear can ALSO just send rpki updates), your belief is that: "Hey, this irr object is really, really me" is still useful/required/necessary/interesting?
-chris
On Wed, Dec 11, 2019 at 11:35 AM Matt Corallo <nanog@as397444.net> wrote:
Right, but you’re also taking a strong, cryptographically-authenticated system and making it sign non-authenticated data. Please don’t do that. If you want to add the data to RPKI, there should be a way to add the data to RPKI, not sign away control of your number resources to unauthenticated sources.
I don't think that's what I was saying, at all, actually. I was saying: "I assume you must have some system to create IRR data, that system knows: '1.0.1.0/24 ASFOO MAINT-FOOBAR' is ok." that system could now add '1.0.1.0/24 ASFOO' to the RPKI. Where does that say: "make it sign unauthenticated data" ?
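As a rough illustration of the point above (a system that already generates IRR data could emit the same facts for the RPKI), here is a small Python sketch. Everything in it is hypothetical: the function names, the sample objects, the maintainer and source names, and the choice to set the maximum length equal to the prefix length. How the resulting tuples would actually be submitted to an RIR or a hosted CA is out of scope and not shown.

```python
import re
from ipaddress import ip_network

# Hypothetical RPSL input, using documentation prefixes and ASNs.
SAMPLE_RPSL = """\
route:   192.0.2.0/24
origin:  AS64496
mnt-by:  MAINT-EXAMPLE
source:  EXAMPLEIRR

route6:  2001:db8::/32
origin:  AS64497
mnt-by:  MAINT-EXAMPLE
source:  EXAMPLEIRR
"""


def parse_route_objects(rpsl_text):
    """Split RPSL text into objects and keep those with a route and an origin."""
    objects = []
    for block in re.split(r"\n\s*\n", rpsl_text.strip()):
        attrs = {}
        for line in block.splitlines():
            if ":" not in line or line.startswith(("#", "%")):
                continue
            # Partition on the first colon so IPv6 prefixes stay intact.
            key, _, value = line.partition(":")
            attrs.setdefault(key.strip().lower(), value.strip())
        if "origin" in attrs and ("route" in attrs or "route6" in attrs):
            objects.append(attrs)
    return objects


def to_roa_requests(objects):
    """Turn route objects into sorted (prefix, max_length, asn) tuples.

    Assumption of this sketch: max_length equals the prefix length, so the
    ROA authorizes exactly what the route object describes and nothing
    more specific.
    """
    requests = set()
    for attrs in objects:
        net = ip_network(attrs.get("route") or attrs["route6"])
        asn = int(attrs["origin"].upper()[2:])  # strip the leading "AS"
        requests.add((str(net), net.prefixlen, asn))
    return sorted(requests)
```

Calling `to_roa_requests(parse_route_objects(SAMPLE_RPSL))` yields one tuple per route object. Note that nothing here signs IRR data; the RPKI side only receives prefix and origin pairs that the operator's own system already considers correct.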
Ah, right. Fair. I was responding, I suppose, to Rubens' original description, which was exactly this. On 12/11/19 5:08 PM, Christopher Morrow wrote:
On Wed, Dec 11, 2019 at 11:35 AM Matt Corallo <nanog@as397444.net> wrote:
Right, but you’re also taking a strong, cryptographically-authenticated system and making it sign non-authenticated data. Please don’t do that. If you want to add the data to RPKI, there should be a way to add the data to RPKI, not sign away control of your number resources to unauthenticated sources.
I don't think that's what I was saying, at all, actually.
I was saying: "I assume you must have some system to create IRR data, that system knows: '1.0.1.0/24 ASFOO MAINT-FOOBAR' is ok."
that system could now add '1.0.1.0/24 ASFOO' to the RPKI.
Where does that say: "make it sign unauthenticated data" ?
On Wed, Dec 11, 2019 at 12:16 PM Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Wed, Dec 11, 2019 at 5:52 AM Rubens Kuhl <rubensk@gmail.com> wrote:
Which brings me to my favorite possible RPKI-IRR integration: a ROA
that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP.
Is this required? or a crutch for use until a network can publish all of their routing data in the RPKI?
It provides an adoption path based on the information already published in IRRs by operators for some years. It also covers for the fact that RPKI currently is only origin-validation.
I would think that if you(royal you) already are publishing: "these are the routes i'm going to originate (and here are my customer lists)"
and you (royal you) are accepting the effort to publish 1 'new' thing in the RPKI.
you could just as easily take the 'stuff I'm going to publish in IRR' and 'also publish in RPKI'. Right? So adoption path aside, because that seems like a weird argument (since your automation to make IRR data appear can ALSO just send rpki updates), your belief is that: "Hey, this irr object is really, really me" is still useful/required/necessary/interesting?
The history of development of BGP path-validation standards does not give much hope so far... people never seem to be able to agree on how to do it. OTOH, people seem comfortable publishing those relations in IRR... and some are using that for prefix-filter building, including AS 15169, which presented yesterday at an IX conference and said they prefer using IRR over RPKI to automate prefix filtering. Frankly, I'll take any form of authenticated path-validation that gets traction in the DFZ, whether it's pretty or not. Pure RPKI for both origin and path validation looks much better to me, but will it fly? Rubens
On Wed, Dec 11, 2019 at 11:52 AM Rubens Kuhl <rubensk@gmail.com> wrote:
On Wed, Dec 11, 2019 at 12:16 PM Christopher Morrow <morrowc.lists@gmail.com> wrote:
On Wed, Dec 11, 2019 at 5:52 AM Rubens Kuhl <rubensk@gmail.com> wrote:
Which brings me to my favorite possible RPKI-IRR integration: a ROA that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP.
Is this required? or a crutch for use until a network can publish all of their routing data in the RPKI?
It provides an adoption path based on the information already published in IRRs by operators for some years. It also covers for the fact that RPKI currently is only origin-validation.
I would think that if you(royal you) already are publishing: "these are the routes i'm going to originate (and here are my customer lists)"
and you (royal you) are accepting the effort to publish 1 'new' thing in the RPKI.
you could just as easily take the 'stuff I'm going to publish in IRR' and 'also publish in RPKI'. Right? So adoption path aside, because that seems like a weird argument (since your automation to make IRR data appear can ALSO just send rpki updates), your belief is that: "Hey, this irr object is really, really me" is still useful/required/necessary/interesting?
The history of development of BGP path-validation standards does not give much hope so far... people never seem to be able to agree on how to do it.
I think path-validation is .. BGPsec - rfc8205 right? so clearly some folks agreed on the process/path/etc for validating paths in bgp. Will that get deployed? unclear to me... To get it deployed though we'd need to start at the beginning of the story and get RPKI data published.
OTOH, people seem comfortable publishing those relations in IRR... and some are using that for prefix-filter building, including AS 15169, which presented yesterday at an IX conference and said they prefer using IRR over RPKI to automate prefix filtering.
oh, someone presented yesterday, cool... err... I think their presentation says (or should have said?) something like: "today we plan to use IRR data, we'll be adding RPKI data as it's available and we're comfy with the integration efforts..." (that's what this preso said anyway: https://docs.google.com/presentation/d/1-SIa98o-QrMALW3maAvu_W_PHTJRAr0Nv-kO... when i presented it 2x) I think that's still the project plan...
Frankly, I'll take any form of authenticated path-validation that gets traction in the DFZ, whether it's pretty or not. Pure RPKI for both origin and path validation looks much better to me, but will it fly ?
<insert will it blend meme> I think the RPKI adoption and usage in the DFZ and near-to-dfz is picking up (says the graphs, etc). I'd love to see it more picked up, but folk don't REALLY have a need for it 'yet', once more large players start publishing and requiring though :) the goal of the slide deck above, and job's efforts and mark's efforts and jay/nimrod's efforts (and cloudflare/maaaaahtin/jerome/etc .. plus a host of others) is to make it apparent: "Hey, get your data straight, publish in RPKI, start validating!" -chris
Rubens
Christopher Morrow wrote on 11/12/2019 03:45:
On Tue, Dec 10, 2019 at 7:32 PM Rubens Kuhl <rubensk@gmail.com> wrote:
Which brings me to my favorite possible RPKI-IRR integration: a ROA that says that IRR objects on IRR source x with maintainer Y are authoritative for a given number resource. Kinda like SPF for BGP.
Is this required? or a crutch for use until a network can publish all of their routing data in the RPKI?
it sounds like a great idea which is a terrible idea. Each operator will make their own choice about what RPKI TALs to accept. Once they're loaded up on the rpki caches, do you really want to push more complexity down to the router control plane and start making per-device choices about how to handle the trust level of each individual ROA? The internet dfz is already being killed with complexity. Configuring per-prefix trust levels at a per-device level is nuts - and wholly non-scalable. If you don't want to see ROAs from a specific source, then don't import their TAL. Nick
[ found in old emacs buffer. might have already been sent ]
Invalid according to RPKI or IRR? Or both?
In this context the use of the word “invalid” refers to the result of validation procedure described in RFC 6811 - which is to match received BGP updates to the RPKI and attach either of “valid”, “invalid”, or “not-found”.
In IRR, the challenge has always been that “route:” objects describe a state of the network that may exist, but the semantics of “route:” objects don’t allow extrapolation towards what should definitely *not* exist in the BGP Default-Free Zone.
RPKI ROAs (compared to IRR objects) carry different meaning: the existence of a ROA (both by definition and common implementation) supersedes other data sources (IRR, LOAs, or comments in whois records, etc), and as such can be used on any type of EBGP session for validation of the received Internet routing information.
do not disagree with your pedantry. but ... as i am pretty sure arturo knows all that. i suspect he was wondering if mark is gonna throw irr data in the mix the way chris says google will (or does?). and if so, how? seems a useful question. irr acls scale poorly in routers. but mark said customer-facing, which could be reasonable depending on the platform. e.g. ntt uses irr-based acls toward customers. but i am cheered if mark is dropping rpki-based origin validation invalids. it's a big step. randy
On 16/Dec/19 23:49, Randy Bush wrote:
as i am pretty sure arturo knows all that. i suspect he was wondering if mark is gonna throw irr data in the mix the way chris says google will (or does?). and if so, how? seems a useful question.
No, we'll be focusing solely on RPKI.
irr acls scale poorly in routers.
One of the reasons we have been putting all our energy into RPKI since 2014.
but mark said customer-facing, which could be reasonable depending on the platform. e.g. ntt uses irr-based acls toward customers.
So we have 4 main systems for customer edge termination:
- MX480's, primarily used in the data centre.
- ASR1006's, also primarily used in the data centre for non-Ethernet customers (waning, over time).
- ASR920's, used in the Metro.
- MX204's, used in the Metro.
Mark.
but mark said customer-facing, which could be reasonable depending on the platform. e.g. ntt uses irr-based acls toward customers.
So we have 4 main systems for customer edge termination:
- MX480's, primarily used in the data centre.
- ASR1006's, also primarily used in the data centre for non-Ethernet customers (waning, over time).
- ASR920's, used in the Metro.
- MX204's, used in the Metro.
so junos and xr support rov sufficiently for production. cool! randy
On 18/Dec/19 00:35, Randy Bush wrote:
and how does that work out at scale when roa changes need previous bgp to be run against them?
If I'm honest, not something I've studied in great detail. For the moment, we are running RPKI on IOS XE boxes that are doing just peering. We have not had any routing issues on those, and I do know of a few networks that had fat-fingered their ROA's that led them to get dropped on our end due to being Invalid. The issue cleared up after they fixed their error, and there was no manual intervention needed on these routers. The customer edge is where we shall be dropping Invalids on this code base on a much larger scale. Notes to take; plenty... Mark.
So just an update on this. We've since completed the roll-out of dropping Invalids on eBGP sessions with customers as well. It also included some Cisco ME3600X routers that will ultimately be replaced this year by Cisco ASR920 routers. All in all, no major drama. 2 main issues I'd like to highlight:
* We came across a number of customers whose routes were marked as Invalid due to inconsistent route origination, i.e., they had their routes originated by them and one or more other ASN's who had not created corresponding ROA's for the same.
* In IOS XE, all iBGP routes are marked as Valid by default. This is not a big problem in practice, however, because all eBGP points are checked for RPKI state, and anything marked as Invalid is dropped. So whatever will appear in the iBGP would have already been scraped. Of course, IOS XE doing this is not ideal at all, and they are breaking the RFC mandate, but it doesn't cause any real harm.
Mark.
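The first issue (a prefix originated by several ASNs, only some of which have ROAs) can be checked for ahead of time. The following Python sketch is one hedged way to do that; the function name, the input shapes, and the sample data (documentation prefixes and ASNs) are assumptions for illustration, and it is not the tooling used in the roll-out described above.

```python
from collections import defaultdict
from ipaddress import ip_network


def unauthorized_origins(announcements, roas):
    """Report origins that would be dropped as Invalid.

    announcements: iterable of (prefix, origin_asn) seen in BGP.
    roas: iterable of (prefix, max_length, asn) from a validator.
    Returns {prefix: [origin ASNs with no matching ROA]} for prefixes
    that are covered by at least one ROA.
    """
    parsed_roas = [(ip_network(p), m, a) for p, m, a in roas]
    missing = defaultdict(set)
    for prefix, origin in announcements:
        net = ip_network(prefix)
        covering = [
            (m, a) for r, m, a in parsed_roas
            if r.version == net.version and net.subnet_of(r)
        ]
        if not covering:
            continue  # not-found: no ROA covers this prefix, nothing to flag
        if not any(a == origin and net.prefixlen <= m for m, a in covering):
            missing[prefix].add(origin)
    return {p: sorted(o) for p, o in missing.items()}


# Hypothetical case: the customer (AS64496) has a ROA, but a second
# originator of the same prefix (AS64497) does not.
SEEN = [("192.0.2.0/24", 64496), ("192.0.2.0/24", 64497), ("198.51.100.0/24", 64498)]
ROAS = [("192.0.2.0/24", 24, 64496)]
```

Here `unauthorized_origins(SEEN, ROAS)` flags AS64497 for 192.0.2.0/24, while 198.51.100.0/24 is left alone because no ROA covers it. The fix on the customer side is a second ROA for the additional origin.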
Hello Mark, On Fri, 10 Jan 2020 at 13:39, Mark Tinka <mark.tinka@seacom.mu> wrote:
So just an update on this.
We've since completed the roll-out of dropping Invalids on eBGP sessions with customers as well.
It also included some Cisco ME3600X routers that will ultimately be replaced this year by Cisco ASR920 routers.
Thanks for sharing all this. Regarding those 2 platforms specifically, what release are you using here that does not blow up? IIRC you had some RPKI related crash bugs at some point in time?
In IOS XE, all iBGP routes are marked as Valid by default. This is not a big problem in practice, however, because all eBGP points are checked for RPKI state, and anything marked as Invalid is dropped. So whatever will appear in the iBGP would have already been scraped. Of course, IOS XE doing this is not ideal at all, and they are breaking the RFC mandate, but it doesn't cause any real harm.
Apparently though there are real life issues with this, specifically when:
- there is no ROA, so prefixes are supposed to be UNKNOWN on all nodes
- but IOS-XE prefers VALID over UNKNOWN (changing best path selection)
- iBGP is *always* VALID (even if it's really UNKNOWN), eBGP is showing UNKNOWN, so iBGP is preferred over eBGP which breaks a lot of assumptions and "hot potato" concepts (possible temporary routing loops, other than of course different egress behavior)
Here's a blog post about this: http://schoolsysadmin.blogspot.com/2019/07/securing-internet-routing-rpki-ov... Apparently there is an IOS feature "Announce RPKI Validation State to Neighbors" to transmit the *real* RPKI state in iBGP (so as opposed to defaulting to VALID for all iBGP neighbors), I'm not sure if that fixes this problem or not. It doesn't really address the root cause (which is: unwanted and not configurable interference with the best path selection algorithm) - but it can at least hide its symptoms. RPKI implementations should not touch best path selection. Dropping RPKI invalids is the real use-case here, and if someone wants to loc-pref based on RPKI status we should allow it (even if it doesn't make a lot of sense), but having the RPKI implementation intervene in the best path selection without the possibility to disable it is ... frustrating. How much do you rely on "hot potato" routing for peers/transit and customers? How does that work for you with RPKI unknowns? Thanks for sharing your experiences, Lukas
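The interaction described above can be illustrated with a toy best-path model in Python. This is a deliberately simplified sketch, not Cisco's algorithm: the `Path` type, the tie-break order, and the sample paths are assumptions. The only thing taken from the description is that validation state is compared somewhere before the eBGP-over-iBGP step; where exactly it sits relative to local preference does not matter for this example, because the two paths are otherwise equal.

```python
from dataclasses import dataclass

# Lower rank is preferred when validation state takes part in selection.
STATE_RANK = {"valid": 0, "not-found": 1, "invalid": 2}


@dataclass(frozen=True)
class Path:
    name: str
    learned_via: str   # "ebgp" or "ibgp"
    rpki_state: str    # state as the router sees it
    local_pref: int = 100
    as_path_len: int = 2


def best_path(paths, rpki_in_selection):
    """Pick the best path from a simplified tie-break list."""
    def key(p):
        rpki = STATE_RANK[p.rpki_state] if rpki_in_selection else 0
        prefer_ebgp = 0 if p.learned_via == "ebgp" else 1
        # Highest local-pref, then shortest AS path, then eBGP over iBGP.
        return (rpki, -p.local_pref, p.as_path_len, prefer_ebgp)
    return min(paths, key=key)


# No ROA exists for the prefix, so both paths are really not-found,
# but the iBGP copy is treated as valid by default.
LOCAL_EBGP = Path("local-ebgp-exit", "ebgp", "not-found")
REMOTE_IBGP = Path("remote-ibgp-exit", "ibgp", "valid")
```

With `rpki_in_selection=True` the model picks `remote-ibgp-exit`; with `rpki_in_selection=False` it picks `local-ebgp-exit`, which is the hot-potato result most operators expect.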
On 10/Jan/20 16:15, Lukas Tribus wrote:
Thanks for sharing all this. Regarding those 2 platforms specifically, what release are you using here that does not blow up?
On the ASR920, we are on 16(11)01a. On the ME3600X, we are on 15.6(2)SP6.
IIRC you had some RPKI related crash bugs at some point in time?
Yes, that was the first time we were deploying RPKI in 2014 and the code back then crashed the ME3600X. No such problem this time around.
- there is no ROA, so prefixes are supposed to be UNKNOWN on all nodes
- but IOS-XE prefers VALID over UNKNOWN (changing best path selection)
- iBGP is *always* VALID (even if it's really UNKNOWN), eBGP is showing UNKNOWN, so iBGP is preferred over eBGP which breaks a lot of assumptions and "hot potato" concepts (possible temporary routing loops, other than of course different egress behavior)
So your timing on this is ominous. In the last day or so, we had an issue with a customer on one of our ASR1006 edge routers that fell victim to this IOS XE stupidity. An alternate path toward them learned from a peer was sent back to the edge router they are connected to, which chose it over the local one because, well, it was an iBGP route.

We didn't notice this issue with this customer since enabling ROV on this box weeks ago, which means the alternate route became available in the last 2 - 3 days, e.g., perhaps they turned up an alternative provider, or changed their routing toward them for us to see another path.

Since this IOS XE stupidity is not configurable, what we've decided to do is disable ROV on all ASR1006 boxes for now. This is not a big issue for us. We've only got 2 customers using them as these boxes only carry non-Ethernet customers.

While this should be an issue for our ASR920 and ME3600X routers also, it isn't because we run BGP-SD on those, i.e., even if the RIB will have all iBGP routes marked as Valid, they won't be installed in FIB, whereas the eBGP routes learned locally from the customer will.

Having to create a ROA to solve this, while feasible, is inappropriate for a solution, especially when Juniper do it correctly. Randy and I complained to Cisco about this years back, and AFAIK, it was only fixed in IOS XR. That this is still going on in 2020 is silly, especially when it's clear that they are in violation of the RFC.
Apparently there is an IOS feature "Announce RPKI Validation State to Neighbors" to transmit the *real* RPKI state in iBGP (so as opposed to defaulting to VALID for all iBGP neighbors), I'm not sure if that fixes this problem or not. It doesn't really address the root cause (which is: unwanted and not configurable interference with the best path selection algorithm) - but it can at least hide its symptoms.
I've not tried communicating RPKI state between routers via BGP communities. One of the reasons I like RPKI is because it is a feature that works on each router independent of another. Each router has a discrete RTR session to a validator, and can make its own RPKI decisions without any regard for the rest of the network. And yet all routers in the network can do this and equally have a converged RPKI state, without ever speaking to one another. So the idea of having routers co-ordinate RPKI information through communities is one I am not so keen on, if I'm honest. Not only do you need to worry about inter-op issues between vendors, there is potential for problems when code changes over time. I'd rather not deal with that, especially since what Cisco are doing with IOS XE is simply a broken implementation. That said, if there is anyone out there who has done this and sees it as a solution to the problem, I'm sure this list would like to hear about it.
RPKI implementations should not touch best path selection. Dropping RPKI invalids is the real use-case here, and if someone wants to loc-pref based on RPKI status we should allow it (even if it doesn't make a lot of sense), but having the RPKI implementation intervene in the best path selection without the possibility to disable it is ... frustrating.
Agreed. At least, if IOS XE had a knob that could "set rpki [valid|notfound|invalid]" this would somewhat help. But alas, they don't :-(. You can only match on existing RPKI state. You cannot manually set RPKI state in IOS XE routing policy. I mean, how dumb is that? It's pretty presumptuous of Cisco to automatically apply policy for you re: RPKI in IOS XE, but then show they can do it right in IOS XR. Unreal!
How much do you rely on "hot potato" routing for peers/transit and customers? How does that work for you with RPKI unknowns?
As above, we've disabled RPKI on all ASR1006 edge routers. It affects only 2 of our customers, so not an issue. I'll go through a few bottles of wine and some music this weekend as I summon up the energy to write to Cisco again to fix this. If I don't, I'll just keep sending money Juniper's way. Simple, and simpler! Mark.
participants (9)
- Arturo Servin
- Christopher Morrow
- Job Snijders
- Lukas Tribus
- Mark Tinka
- Matt Corallo
- Nick Hilliard
- Randy Bush
- Rubens Kuhl