Re: CloudFlare issues?

Francois Lecavalier

4 Jul 2019

3:22 p.m.

Hi Mark,

Following that Verizon debacle I got onboard with ROV; after a bit of research I settled on the ....drum roll.... CloudFlare GoRTR (https://github.com/cloudflare/gortr). If you trust them enough they provide an updated JSON of the global RIR aggregate every 15 minutes. I'll see down the road if we'll fetch them ourselves but at least it got us up and running in less than an hour. It was also easy for us to deploy as the routers and the servers are on the same PoP directly connected, so we don't need the whole encryption recipe they provide for mass distribution.

But I also have a question for all the ROA folks out there. So far we are not taking any action other than lowering the local-pref - we want to make sure this is stable before we start denying prefixes. So the question is: is it safe as of this date to 1. Accept valid, 2. Accept unknown, 3. Reject invalid? Has any large network that implemented it dealt with unreachable destinations? I'm wondering, as I haven't found any blog mentioning anything in this regard, and the CloudFlare docs only show examples for valid and invalid, but nothing for unknown.

My assumption is that 1. Accept valid, 2. Accept unknown, 3. Reject invalid shouldn't break anything.

Thanks,
-Francois

This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized. If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately.
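Francois mentions that GoRTR can consume a JSON file of validated ROA payloads (VRPs) that CloudFlare publishes. The sketch below parses such a file into typed records. It assumes a top-level `roas` array whose entries carry `prefix`, `maxLength` and `asn` (the ASN written either as `"AS64500"` or as a bare integer); that shape is an assumption about the export, so check it against the real file. The `VRP` class and function names are hypothetical, and the prefixes and ASNs are documentation values.

```python
import ipaddress
import json
from dataclasses import dataclass
from typing import List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class VRP:
    """One validated ROA payload: a prefix, its maxLength and the authorised origin AS."""
    prefix: IPNetwork
    max_length: int
    asn: int


def parse_asn(value) -> int:
    # Assumption: the export may write the ASN as "AS64500" or as the integer 64500.
    if isinstance(value, int):
        return value
    text = str(value).strip().upper()
    return int(text[2:]) if text.startswith("AS") else int(text)


def load_vrps(document: str) -> List[VRP]:
    """Parse JSON shaped like {"roas": [{"prefix": ..., "maxLength": ..., "asn": ...}]}."""
    vrps = []
    for entry in json.loads(document).get("roas", []):
        prefix = ipaddress.ip_network(entry["prefix"])  # raises ValueError if host bits are set
        max_length = int(entry["maxLength"])
        # maxLength must lie between the prefix length and the address size (32 or 128).
        if not prefix.prefixlen <= max_length <= prefix.max_prefixlen:
            raise ValueError(f"invalid maxLength {max_length} for {prefix}")
        vrps.append(VRP(prefix, max_length, parse_asn(entry["asn"])))
    return vrps


sample = """{"roas": [
  {"prefix": "192.0.2.0/24", "maxLength": 24, "asn": "AS64500"},
  {"prefix": "2001:db8::/32", "maxLength": 48, "asn": 64501}
]}"""

for vrp in load_vrps(sample):
    print(vrp.prefix, vrp.max_length, vrp.asn)
```

Rejecting a malformed maxLength at load time keeps a bad entry from silently turning into a wrong validation verdict later.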

Job Snijders

4 Jul

3:33 p.m.

New subject: CloudFlare issues?

Dear Francois, On Thu, Jul 04, 2019 at 03:22:23PM +0000, Francois Lecavalier wrote:

...

Following that Verizon debacle I got onboard with ROV; after a bit of research I settled on the ....drum roll.... CloudFlare GoRTR (https://github.com/cloudflare/gortr). If you trust them enough they provide an updated JSON of the global RIR aggregate every 15 minutes.

At this point in time I think the ideal deployment model is to perform the validation within your administrative domain and run your own validators. You can combine routinator with gortr, or use cloudflare's octorpki software https://github.com/cloudflare/cfrpki

...

I'll see down the road if we'll fetch them ourselves but at least it got us up and running in less than an hour. It was also easy for us to deploy as the routers and the servers are on the same PoP directly connected, so we don't need the whole encryption recipe they provide for mass distribution.

yeah, that is true!

...

But I also have a question for all the ROA folks out there. So far we are not taking any action other than lowering the local-pref - we want to make sure this is stable before we start denying prefixes. So the question is: is it safe as of this date to 1. Accept valid, 2. Accept unknown, 3. Reject invalid? Has any large network that implemented it dealt with unreachable destinations? I'm wondering, as I haven't found any blog mentioning anything in this regard, and the CloudFlare docs only show examples for valid and invalid, but nothing for unknown.

I believe at this point in time it is safe to accept valid and unknown (combined with an IRR filter), and reject RPKI invalid BGP announcements at your EBGP borders. Large examples of other organisations who already are rejecting invalid announcements are AT&T, Nordunet, DE-CIX, YYCIX, XS4ALL, MSK-IX, INEX, France-IX, Seacom, Workonline, KPN International, and hundreds of others. You can run an analysis yourself to see how traffic would be impacted in your network using pmacct or Kentik, see this post for more info: https://mailman.nanog.org/pipermail/nanog/2019-February/099522.html
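The three-step policy Francois proposes and Job endorses rests on the origin validation procedure of RFC 6811: a route is Valid if at least one VRP covers its prefix with a matching origin AS and a maxLength no shorter than the route's prefix length, Invalid if some VRP covers it but none matches, and NotFound (unknown) if no VRP covers it. The sketch below is a plain linear scan for illustration only (real validators and routers use prefix tries); the `VRP` tuple and function names are hypothetical, and it ignores corner cases such as AS_SET origins. As Job says, it complements an IRR filter rather than replacing it.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    prefix: str      # e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the ROA authorises
    asn: int         # authorised origin AS


def validate(route_prefix: str, origin_as: int, vrps: Iterable[VRP]) -> str:
    """Return 'valid', 'invalid' or 'notfound' following the RFC 6811 rules."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # subnet_of() raises TypeError across address families, so compare versions first.
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        if vrp.asn == origin_as and route.prefixlen <= vrp.max_length:
            return "valid"  # one matching VRP is enough
    return "invalid" if covered else "notfound"


def accept(route_prefix: str, origin_as: int, vrps: Iterable[VRP]) -> bool:
    """Accept Valid, accept NotFound (unknown), reject Invalid."""
    return validate(route_prefix, origin_as, vrps) != "invalid"


vrps = [VRP("192.0.2.0/24", 24, 64500)]
print(validate("192.0.2.0/24", 64500, vrps))     # matching origin and length
print(validate("192.0.2.0/25", 64500, vrps))     # longer than maxLength
print(validate("192.0.2.0/24", 64511, vrps))     # wrong origin AS
print(validate("198.51.100.0/24", 64500, vrps))  # no covering VRP
```

Note that `accept()` treats Valid and NotFound identically, which is exactly the "accept valid, accept unknown" half of the policy.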

...

My assumption is that 1. Accept valid, 2. Accept unknown, 3. Reject invalid shouldn't break anything.

Correct! Let us know how it went :-) Kind regards, Job

Ben Maddison

3:50 p.m.

Hi Francois, On Thu, 2019-07-04 at 17:33 +0200, Job Snijders wrote:

...

Dear Francois,

On Thu, Jul 04, 2019 at 03:22:23PM +0000, Francois Lecavalier wrote:

...
At this point in time I think the ideal deployment model is to perform the validation within your administrative domain and run your own validators.

...

...
But I also have a question for all the ROA folks out there. So far we are not taking any action other than lowering the local-pref - we want to make sure this is stable before we start denying prefixes. So the question is: is it safe as of this date to 1. Accept valid, 2. Accept unknown, 3. Reject invalid? Has any large network that implemented it dealt with unreachable destinations? I'm wondering, as I haven't found any blog mentioning anything in this regard, and the CloudFlare docs only show examples for valid and invalid, but nothing for unknown.

Mark Tinka

5:17 p.m.

On 4/Jul/19 17:50, Ben Maddison via NANOG wrote:

...

We have been dropping Invalids since April, and have had only a (single-digit) handful of support requests related to those becoming unreachable.

We've had 2 cases where customers could not reach a prefix. Both were mistakes (as we've found most Invalid routes to be), which were promptly fixed. One of them was where a cloud provider decided to originate a longer prefix on behalf of their content-producing customer, using their own AS as opposed to the one the customer had used to create the ROA for the covering block. Mark.
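Mark's second case, a cloud provider originating a more-specific from its own AS inside a block whose ROA names the customer's AS, fails origin validation because the origin AS does not match; if the ROA's maxLength equals the covering block's length, it also fails on length. A small diagnostic can help when triaging such tickets by listing, for every VRP covering the route, which check failed. This is a hypothetical sketch: the function name, the prefixes and the ASNs (AS64500 for the customer, AS64510 for the cloud provider) are illustrative only.

```python
import ipaddress
from typing import List, NamedTuple


class VRP(NamedTuple):
    prefix: str
    max_length: int
    asn: int


def explain_invalid(route_prefix: str, origin_as: int, vrps: List[VRP]) -> List[str]:
    """List, per covering VRP, why it fails to match. An empty list means Valid or NotFound."""
    route = ipaddress.ip_network(route_prefix)
    reasons = []
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        problems = []
        if vrp.asn != origin_as:
            problems.append(f"origin AS{origin_as} is not the ROA's AS{vrp.asn}")
        if route.prefixlen > vrp.max_length:
            problems.append(f"/{route.prefixlen} exceeds maxLength {vrp.max_length}")
        if not problems:
            return []  # a matching VRP exists, so the route is Valid
        reasons.append(f"{vrp.prefix}: " + "; ".join(problems))
    return reasons


# Hypothetical: the customer's ROA covers 192.0.2.0/24 for AS64500 with maxLength 24,
# and the cloud provider originates 192.0.2.128/25 from its own AS64510.
customer_roa = [VRP("192.0.2.0/24", 24, 64500)]
for line in explain_invalid("192.0.2.128/25", 64510, customer_roa):
    print(line)
```

One way such a case could be resolved is a ROA for the more-specific that names the cloud provider's AS; the thread does not say which fix was applied.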

Francois Lecavalier

6:46 p.m.

...

...
At this point in time I think the ideal deployment model is to perform the validation within your administrative domain and run your own validators.

...

+1

We'll definitely look into this shortly. I definitely don't want to leave a security measure in the hands of a third party, but with my team being so busy it was a quick temporary fix.

...

The larger challenge has been related to vendor implementation choices and bugs, particularly on ios-xe. Happy to go into more detail if anyone is interested.

We are on Juniper MX204s at the edge and they have been solid for the last 60 weeks - we ran into a long list of bugs on other platforms but not on these.

So I had about 4200 routes marked as invalid. After looking at a sample of them, it looks like most of them have a valid ROA with an improper mask length, so there is ultimately a route to these prefixes, and at worst this would result in "suboptimal" routing (or should I say: the remote network can't control its route propagation anymore). In most cases they are stub networks with a single /24 reassigned from the upstream provider. I have no traffic going directly to these networks and I don't expect any to go there anytime soon.

It's been close to 3 hours now since I dropped them - radio silence.

Whoever fears implementing RPKI/ROA/ROV, simply don't. It's very easy to implement, validate and troubleshoot.

-----Original Message-----
From: Ben Maddison <benm@workonline.africa>
Sent: Thursday, July 4, 2019 11:51 AM
To: job@ntt.net; Francois Lecavalier <Francois.Lecavalier@mindgeek.com>
Cc: nanog@nanog.org
Subject: [External] Re: CloudFlare issues?

Hi Francois, On Thu, 2019-07-04 at 17:33 +0200, Job Snijders wrote:

...

Dear Francois,

On Thu, Jul 04, 2019 at 03:22:23PM +0000, Francois Lecavalier wrote:

...
At this point in time I think the ideal deployment model is to perform the validation within your administrative domain and run your own validators.

...

...
But I also have a question for all the ROA folks out there. So far we are not taking any action other than lowering the local-pref - we want to make sure this is stable before we start denying prefixes. So the question is: is it safe as of this date to 1. Accept valid, 2. Accept unknown, 3. Reject invalid? Has any large network that implemented it dealt with unreachable destinations? I'm wondering, as I haven't found any blog mentioning anything in this regard, and the CloudFlare docs only show examples for valid and invalid, but nothing for unknown.

We have been dropping Invalids since April, and have had only a (single-digit) handful of support requests related to those becoming unreachable. The larger challenge has been related to vendor implementation choices and bugs, particularly on ios-xe. Happy to go into more detail if anyone is interested. I would recommend *not* taking any policy action that distinguishes Valid from Unknown. If you find that you have routes for the same prefix/len with both statuses, then that is a bug and/or misconfiguration which you could turn into a loop by taking policy action on that difference. Cheers, Ben
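Ben's point that one prefix/length showing up as both Valid and Unknown signals a bug or misconfiguration follows from how RFC 6811 defines the states: whether any VRP covers a route depends only on its prefix, so with one consistent VRP set a prefix is either NotFound everywhere or covered (Valid or Invalid) everywhere. Valid and Invalid for the same prefix can legitimately coexist when the origins differ. A hypothetical check over per-router observations, for example collected from each edge router, might look like this; the input format and names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

COVERED = {"valid", "invalid"}  # states that imply some VRP covers the prefix


def inconsistent_prefixes(observations: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Return prefixes seen as NotFound somewhere and as covered (Valid/Invalid) elsewhere.

    observations: (router, prefix, state) tuples; state is 'valid', 'invalid' or 'notfound'.
    Prefix strings are assumed to be normalised (e.g. by ipaddress.ip_network).
    """
    states: Dict[str, Set[str]] = defaultdict(set)
    for _router, prefix, state in observations:
        states[prefix].add(state)
    return {p: s for p, s in states.items() if "notfound" in s and s & COVERED}


observations = [
    ("edge1", "192.0.2.0/24", "valid"),
    ("edge2", "192.0.2.0/24", "notfound"),    # same prefix, contradictory verdicts
    ("edge1", "198.51.100.0/24", "valid"),
    ("edge2", "198.51.100.0/24", "invalid"),  # different origins: can be legitimate
]
print(inconsistent_prefixes(observations))
```

The check only surfaces the inconsistency for investigation; preferring Valid over Unknown in policy is the step Ben warns could turn it into a loop.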

Ben Maddison

6:54 p.m.

Welcome to the club!

From: Francois Lecavalier <Francois.Lecavalier@mindgeek.com> Sent: Thursday, July 4, 2019 8:46:46 PM To: Ben Maddison; job@ntt.net Cc: nanog@nanog.org Subject: RE: CloudFlare issues?

Job Snijders

6:57 p.m.

On Thu, Jul 4, 2019 at 8:46 PM Francois Lecavalier <Francois.Lecavalier@mindgeek.com> wrote:

...

It's been close to 3 hours now since I dropped them - radio silence.

I am going to assume that "radio silence" for you means that your network is fully functional and none of your customers have raised issues! :-)
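Francois's reasoning, that most of his roughly 4200 Invalid routes were more-specifics still covered by another route, so dropping them should at worst change the path rather than break reachability, can be checked before flipping the policy. The sketch below takes a snapshot mapping prefixes to RPKI states (the input format and function name are hypothetical; export it from your routers however suits you) and lists, for each Invalid prefix, the non-Invalid less-specifics that would still carry its traffic. An empty list flags a destination that would become unreachable unless a default route covers it. It does not weigh traffic volumes; Job's suggestion of pmacct or Kentik covers that.

```python
import ipaddress
from typing import Dict, List


def covering_fallbacks(rib: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each Invalid prefix to the non-Invalid less-specific prefixes that cover it.

    rib maps a prefix string to its RPKI state: 'valid', 'invalid' or 'notfound'.
    """
    nets = {prefix: ipaddress.ip_network(prefix) for prefix in rib}
    fallbacks = {}
    for prefix, state in rib.items():
        if state != "invalid":
            continue
        net = nets[prefix]
        covers = [
            other
            for other, other_net in nets.items()
            if rib[other] != "invalid"
            and other_net.version == net.version
            and other_net.prefixlen < net.prefixlen
            and net.subnet_of(other_net)
        ]
        # Longest match first: the route traffic would follow once the Invalid is dropped.
        fallbacks[prefix] = sorted(covers, key=lambda p: nets[p].prefixlen, reverse=True)
    return fallbacks


rib = {
    "203.0.113.0/24": "valid",
    "203.0.113.128/25": "invalid",  # too-specific announcement under a valid aggregate
    "198.51.100.0/24": "invalid",   # nothing else covers it
}
for prefix, covers in covering_fallbacks(rib).items():
    print(prefix, "->", covers or "UNREACHABLE without a default route")
```

The quadratic scan is fine for a sample of a few thousand Invalids; a full table would call for a prefix trie.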

...

Whoever fears implementing RPKI/ROA/ROV, simply don't. It's very easy to implement, validate and troubleshoot.

Thank you for sharing your report. I believe it is good to share rpki stories with each other, not just to celebrate the deployment of an exciting technology, but also to help provide debugging information ahead of time should there be issues between provider A and B due to a ROA misconfiguration. Announcing to the public that one has deployed RPKI - in this stage of the lifecycle of the tech - probably is a productive action to consider. Anyway, you can now enjoy https://rpki.net/s/rpki-test even more! :-) Kind regards, Job

Job Snijders

7:20 p.m.

...

Anyway, you can now enjoy https://rpki.net/s/rpki-test even more! :-)

my apologies, I fumbled the ball on typing in that URL, I intended to point here: https://www.ripe.net/s/rpki-test

Mark Tinka

6:59 p.m.

On 4/Jul/19 20:46, Francois Lecavalier wrote:

...

It's been close to 3 hours now since I dropped them - radio silence.

Whoever fears implementing RPKI/ROA/ROV, simply don't. It's very easy to implement, validate and troubleshoot.

Well done! Congrats! Mark.

Mark Tinka

5:14 p.m.

On 4/Jul/19 17:33, Job Snijders wrote:

...

At this point in time I think the ideal deployment model is to perform the validation within your administrative domain and run your own validators.

In essence, this is also my thought process. I think Cloudflare are very well-intentioned in making it as painless as possible to support other operators to get RPKI deployed (and more power to them for going to such lengths to do so), but you have to determine whether you are willing to let a service such as this run outside of your domain. Every year, someone asks me whether I'd be willing to outsource my route reflector VNFs to AWS/Azure/etc. My answer to that falls within the realms of handling RPKI for your network :-). Mark.

Mark Tinka

5:10 p.m.

On 4/Jul/19 17:22, Francois Lecavalier wrote:

...

Following that Verizon debacle I got onboard with ROV; after a bit of research I settled on the ….drum roll…. CloudFlare GoRTR (https://github.com/cloudflare/gortr). If you trust them enough they provide an updated JSON of the global RIR aggregate every 15 minutes. I’ll see down the road if we’ll fetch them ourselves but at least it got us up and running in less than an hour. It was also easy for us to deploy as the routers and the servers are on the same PoP directly connected, so we don’t need the whole encryption recipe they provide for mass distribution.

Funny you should mention this... I was speaking with Tom today during an RPKI talk he gave at MyNOG, about whether we'd be willing to trust their RTR streams. But, I'm glad you found a quick solution to get you up and running. Welcome to the club.

...

But I also have a question for all the ROA folks out there. So far we are not taking any action other than lowering the local-pref – we want to make sure this is stable before we start denying prefixes. So the question is: is it safe as of this date to 1. Accept valid, 2. Accept unknown, 3. Reject invalid? Has any large network that implemented it dealt with unreachable destinations? I’m wondering, as I haven’t found any blog mentioning anything in this regard, and the CloudFlare docs only show examples for valid and invalid, but nothing for unknown.

My assumption is that 1. Accept valid, 2. Accept unknown, 3. Reject invalid shouldn’t break anything.

Well, the Valid and NotFound states implicitly mean that the routes can be used for routing/forwarding. In that case, the only policy we create and apply is against Invalid routes, which is to DROP them. Mark.

Nick Hilliard

5:13 p.m.

Francois Lecavalier wrote on 04/07/2019 16:22:

...

My assumption is that 1. Accept valid, 2. Accept unknown, 3. Reject invalid shouldn’t break anything.

Accepting valid ROAs is a better idea after checking that the source AS is legitimate from the peer. Nick
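Nick's caveat is that origin validation only checks the origin AS against the ROAs; it says nothing about whether the neighbor sending the route should be carrying it at all. A sketch of combining both checks, assuming a hypothetical per-peer set of expected origin ASNs (for example expanded from the peer's IRR as-set by whatever tooling you already use):

```python
from typing import Dict, Set


def accept_from_peer(peer_as: int, origin_as: int, rpki_state: str,
                     expected_origins: Dict[int, Set[int]]) -> bool:
    """Accept only if the route is not RPKI Invalid AND its origin is expected from this peer.

    expected_origins is hypothetical input: peer AS -> origin ASNs that peer may announce.
    """
    if rpki_state == "invalid":
        return False
    # Unknown peers get an empty set, so the filter fails closed.
    return origin_as in expected_origins.get(peer_as, set())


expected = {64496: {64496, 64500}}  # hypothetical: peer AS64496 and its customer AS64500
print(accept_from_peer(64496, 64500, "valid", expected))    # expected origin, not Invalid
print(accept_from_peer(64496, 64511, "valid", expected))    # Valid ROA, but not this peer's to send
print(accept_from_peer(64496, 64500, "invalid", expected))  # rejected by ROV
```

The second call is the case Nick points at: the route is RPKI Valid, yet accepting it from this peer would accept a leak.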

participants (5)

Ben Maddison
Francois Lecavalier
Job Snijders
Mark Tinka
Nick Hilliard