
don't filter your customers. when they leak the world to you, it will get you a lot of free press and your marketing department will love you. just ask telstra. randy

Who once said there is no such thing as bad press?

http://www.smh.com.au/technology/technology-news/dodo-takes-blame-for-intern...
http://www.itnews.com.au/News/291364,telstra-router-causes-major-internet-ou...

"Dodo has revealed a "minor hardware issue" was behind a Telstra outage that impacted multiple service providers and internet services nationwide"

Does anyone have any additional details?

Christian

-----Original Message-----
From: Randy Bush [mailto:randy@psg.com]
Sent: Wednesday, February 22, 2012 9:42 PM
To: North American Network Operators' Group
Subject: do not filter your customers

don't filter your customers. when they leak the world to you, it will get you a lot of free press and your marketing department will love you. just ask telstra. randy

"Dodo has revealed a "minor hardware issue" was behind a Telstra outage that impacted multiple service providers and internet services nationwide"
bs, trying to blame it on a vendor. a customer leaked a full table to smellstra, and they had not filtered. hence the $subject. and things went further downhill from there, when telstra also did not filter what they announced to their peers, and the peers went over prefix limits and dropped bgp. randy

and things went further downhill from there, when telstra also did not filter what they announced to their peers, and the peers went over prefix limits and dropped bgp. Oh! so protections worked!
imiho, prefix count is too big a hammer. it would have been better if optus had irr-based filters in place on peerings with telstra. then they would not have dropped the sessions and their customers could still reach telstra customers. of course, if telstra did not publish accurately in an irr instance, not much optus could do. randy
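
[As a rough sketch of the irr-based filtering randy describes: build a per-customer prefix filter from registered route objects and accept only announcements covered by them. The ASNs and route objects below are invented placeholders, not real RADb data.]

```python
import ipaddress

# Hypothetical IRR data: route objects registered for a customer ASN,
# standing in for what you'd pull out of RADb.
IRR_ROUTES = {
    "AS65001": ["192.0.2.0/24", "198.51.100.0/22"],
}

def build_filter(asn, max_length=24):
    """Turn registered route objects into (network, max_length) filter entries."""
    return [(ipaddress.ip_network(p), max_length) for p in IRR_ROUTES.get(asn, [])]

def accept(announcement, flt):
    """Accept a prefix only if it falls within a registered route object
    and is no more specific than the allowed maximum length."""
    net = ipaddress.ip_network(announcement)
    return any(net.subnet_of(entry) and net.prefixlen <= maxlen
               for entry, maxlen in flt)

flt = build_filter("AS65001")
print(accept("198.51.100.0/24", flt))  # True: inside a registered /22
print(accept("203.0.113.0/24", flt))   # False: not registered, dropped
```

With such a filter on the customer session, a leaked full table would be discarded prefix-by-prefix instead of tearing the session down; it only works, of course, if the irr data is accurate.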

IOS-XR On 2/23/12, Randy Bush <randy@psg.com> wrote:
and things went further downhill from there, when telstra also did not filter what they announced to their peers, and the peers went over prefix limits and dropped bgp. Oh! so protections worked!
imiho, prefix count is too big a hammer.
it would have been better if optus had irr-based filters in place on peerings with telstra. then they would not have dropped the sessions and their customers could still reach telstra customers.
of course, if telstra did not publish accurately in an irr instance, not much optus could do.
randy
-- Warm Regards, Peter (CCIE 23782).

Haha! Funny (Sent from my mobile device) Anurag Bhatia http://anuragbhatia.com On Feb 23, 2012 12:27 PM, "Randy Bush" <randy@psg.com> wrote:
and things went further downhill from there, when telstra also did not filter what they announced to their peers, and the peers went over prefix limits and dropped bgp. Oh! so protections worked!
imiho, prefix count is too big a hammer.
it would have been better if optus had irr-based filters in place on peerings with telstra. then they would not have dropped the sessions and their customers could still reach telstra customers.
of course, if telstra did not publish accurately in an irr instance, not much optus could do.
randy

On Thu, Feb 23, 2012 at 1:57 AM, Randy Bush <randy@psg.com> wrote:
and things went further downhill from there, when telstra also did not filter what they announced to their peers, and the peers went over prefix limits and dropped bgp. Oh! so protections worked!
imiho, prefix count is too big a hammer.
sure. aspath-filter! :)
it would have been better if optus had irr-based filters in place on peerings with telstra. then they would not have dropped the sessions and their customers could still reach telstra customers.
really, both parties need/should-have filters, right? both parties should have their 'irr data' up-to-date... both parties should also filter outbound prefixes (so they don't leak internals, or ...etc) telstra seems to have ~8880 or so prefixes registered in IRRs (via radb whois lookup) optus has ~1217 or so prefixes registered in IRRs (again via the same lookup to radb)
of course, if telstra did not publish accurately in an irr instance, not much optus could do.
it's not clear how accurate the data is :( I do see one example that's not telstra (and which I don't see through telstra from one host I tested from): 203.59.57.0/24, a REACH customer, supposedly, registered by REACH on behalf of the customer... the whole /16 there is allocated to the same entity, not REACH though, so that's a tad confusing. -chris

On Feb 23, 2012, at 1:44 AM, Randy Bush wrote:
a customer leaked a full table to smellstra, and they had not filtered. hence the $subject.
Ahh, this is I think the customer "leak" problem I'm trying to illustrate that an RPKI/BGPSEC-enabled world alone (as currently prescribed) does NOT protect against. If it can happen by accident, it can certainly serve as smoke screen or enable an actual targeted attack quite nicely by those so compelled.
and things went further downhill from there, when telstra also did not filter what they announced to their peers, and the peers went over prefix limits and dropped bgp.
Prefix limits are rather binary and indiscriminate, indeed. -danny

a customer leaked a full table to smellstra, and they had not filtered. hence the $subject.
Ahh, this is I think the customer "leak" problem I'm trying to illustrate that an RPKI/BGPSEC-enabled world alone (as currently prescribed) does NOT protect against.
the problem is that you have yet to rigorously define it and how to unambiguously and rigorously detect it. lack of that will prevent anyone from helping you prevent it. randy

On Feb 23, 2012, at 10:42 PM, Randy Bush wrote:
the problem is that you have yet to rigorously define it and how to unambiguously and rigorously detect it. lack of that will prevent anyone from helping you prevent it.
You referred to this incident as a "leak" in your message: "a customer leaked a full table" I was simply agreeing with you -- i.e., looked like a "leak", smelled like a "leak" - let's call it a leak. I'm optimistic that all the good folks focusing on this in their day jobs, and expressly funded and resourced to do so, will eventually recognize what I'm calling "leaks" is part of the routing security problem. -danny

On Feb 24, 2012, at 7:46:40 AM, Danny McPherson wrote:
On Feb 23, 2012, at 10:42 PM, Randy Bush wrote:
the problem is that you have yet to rigorously define it and how to unambiguously and rigorously detect it. lack of that will prevent anyone from helping you prevent it.
You referred to this incident as a "leak" in your message:
"a customer leaked a full table"
I was simply agreeing with you -- i.e., looked like a "leak", smelled like a "leak" - let's call it a leak.
I'm optimistic that all the good folks focusing on this in their day jobs, and expressly funded and resourced to do so, will eventually recognize what I'm calling "leaks" is part of the routing security problem.
Sure; I don't disagree, and I don't think that Randy does. But just because we can't solve the whole problem, does that mean we shouldn't solve any of it? As Randy said, we can't even try for a strong technical solution until we have a definition that's better than "I know it when I see it". --Steve Bellovin, https://www.cs.columbia.edu/~smb

On Fri, 24 Feb 2012, Steven Bellovin wrote:
Sure; I don't disagree, and I don't think that Randy does. But just because we can't solve the whole problem, does that mean we shouldn't solve any of it?
that is often the way things are argued in engineering circles. the solution is imperfect therefore it is useless. this philosophy is reflected in the shoddy state of networks today. -Dan

goemon@anime.net wrote:
On Fri, 24 Feb 2012, Steven Bellovin wrote:
Sure; I don't disagree, and I don't think that Randy does. But just because we can't solve the whole problem, does that mean we shouldn't solve any of it?
that is often the way things are argued in engineering circles.
the solution is imperfect therefore it is useless.
this philosophy is reflected in the shoddy state of networks today.
-Dan
Due to which side winning the debate? Joe

On Feb 24, 2012, at 1:10 PM, Steven Bellovin wrote:
But just because we can't solve the whole problem, does that mean we shouldn't solve any of it?
Nope, we most certainly should decompose the problem into addressable elements, that's core to engineering and operations.

However, simply because the currently envisaged solution doesn't solve this problem doesn't mean we shouldn't acknowledge it exists.

The IETF's BGP security threats document [1] "describes a threat model for BGP path security", which constrains itself to the carefully worded SIDR WG charter, which addresses route origin authorization and AS_PATH "semantics" -- i.e., this "leak" problem is expressly out of scope of a threats document discussing BGP path security - eh?

How the heck we can talk about BGP path security and not consider this incident a threat is beyond me, particularly when it happens by accident all the time. How we can justify putting all that BGPSEC and RPKI machinery in place and not address this "leak" issue somewhere in the mix is, err.., telling.

Alas, I suspect we can all agree that experiments are good and the market will ultimately decide.

-danny

[1] draft-ietf-sidr-bgpsec-threats-02

On Fri, Feb 24, 2012 at 2:26 PM, Danny McPherson <danny@tcb.net> wrote:
happens by accident all the time. How we can justify putting all that BGPSEC and RPKI machinery in place and not address this "leak" issue somewhere in the mix is, err.., telling.
I think if we asked telstra why they didn't filter their customer, we'd get some answer like:
1) we did, we goofed, oops!
2) we don't, it's too hard
3) filters? what?
I suspect in the case of 1 it's a software problem that needs more belts/suspenders.
I suspect in the case of 2 it's a problem that could be shown to be simpler with some resource-certification in place.
I suspect 3 is not likely... (or I hope so).
So, even without defining what a leak is, providing a tool to better create/verify filtering would be a boon.

On Feb 24, 2012, at 2:29 PM, Christopher Morrow wrote:
I think if we asked telstra why they didn't filter their customer, we'd get some answer like: 1) we did, we goofed, oops! 2) we don't, it's too hard 3) filters? what?
I suspect in the case of 1 it's a software problem that needs more belts/suspenders I suspect in the case of 2 it's a problem that could be shown to be simpler with some resource-certification in place I suspect 3 is not likely... (or I hope so).
So, even without defining what a leak is, providing a tool to better create/verify filtering would be a boon.
Yes, I agree! What I'd hate to see is: 4) We fully deployed BGPSEC, and RPKI, and upgraded our infrastructure, and retooled provisioning, operations and processes to support it all fully, and required our customers and peers to use it, and even then this still happened - WTF was the point? This "leak" thing is a key vulnerability that simply can't be brushed aside - that's the crux of my frustration with the current effort. -danny

I think if we asked telstra why they didn't filter their customer, we'd get some answer like: 1) we did, we goofed, oops! 2) we don't, it's too hard 3) filters? what?
I suspect in the case of 1 it's a software problem that needs more belts/suspenders I suspect in the case of 2 it's a problem that could be shown to be simpler with some resource-certification in place I suspect 3 is not likely... (or I hope so).
So, even without defining what a leak is, providing a tool to better create/verify filtering would be a boon.
Yes, I agree!
What I'd hate to see is:
4) We fully deployed BGPSEC, and RPKI, and upgraded our infrastructure, and retooled provisioning, operations and processes to support it all fully, and required our customers and peers to use it, and even then this still happened - WTF was the point?
I think this is the point: <https://twitter.com/#!/atoonk/status/165245731429564416>
This "leak" thing is a key vulnerability that simply can't be brushed aside - that's the crux of my frustration with the current effort.
You seem to think that there's some extension/modification to BGPSEC that would fix route leaks in addition to the ASPATH issues that BGPSEC addresses right now. Have you written this up anywhere? I would be interested to read it. --Richard

On Feb 24, 2012, at 2:49 PM, Richard Barnes wrote:
You seem to think that there's some extension/modification to BGPSEC that would fix route leaks in addition to the ASPATH issues that BGPSEC addresses right now. Have you written this up anywhere? I would be interested to read it.
I don't, actually -- as I haven't presupposed that "BGPSEC" is the answer to all things routing security related, nor have I excluded it. I didn't realize it was unacceptable to acknowledge a problem exists without having solved it already. I might have that backwards though. -danny

On Feb 24, 2012, at 2:26:14 PM, Danny McPherson wrote:
On Feb 24, 2012, at 1:10 PM, Steven Bellovin wrote:
But just because we can't solve the whole problem, does that mean we shouldn't solve any of it?
Nope, we most certainly should decompose the problem into addressable elements, that's core to engineering and operations.
However, simply because the currently envisaged solution doesn't solve this problem doesn't mean we shouldn't acknowledge it exists.
The IETF's BGP security threats document [1] "describes a threat model for BGP path security", which constrains itself to the carefully worded SIDR WG charter, which addresses route origin authorization and AS_PATH "semantics" -- i.e., this "leak" problem is expressly out of scope of a threats document discussing BGP path security - eh?
How the heck we can talk about BGP path security and not consider this incident a threat is beyond me, particularly when it happens by accident all the time. How we can justify putting all that BGPSEC and RPKI machinery in place and not address this "leak" issue somewhere in the mix is, err.., telling.
I repeat -- we're in violent agreement that route leaks are a serious problem. No one involved in BGPSEC -- not me, not Randy, not anyone -- disagrees. Give us an actionable definition and we'll try to build a defense. Right now, we have nothing better than what Justice Potter Stewart once said in an opinion: "I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it..." Again -- *please* give us a definition. --Steve Bellovin, https://www.cs.columbia.edu/~smb P.S. It was routing problems, including leaks between RIP and either EIGRP or OSPF (it's been >20 years; I just don't remember), that got me involved in Internet security in the first place. I really do understand the issue.
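
[For what it's worth, one candidate starting point for the definition Steve asks for is the "valley-free" property from Gao and Rexford's work on inter-AS relationships: a route learned from a provider or peer should only be propagated to customers. A toy checker along those lines, with invented ASNs and relationships, might look like this:]

```python
# Toy relationship table; ASNs and relationships are invented.
# (customer, provider) pairs, plus settlement-free peer pairs.
PROVIDERS = {("AS64500", "AS64510"), ("AS64500", "AS64530")}
PEERS = {frozenset(("AS64510", "AS64520"))}

def hop(a, b):
    """Classify the propagation hop a -> b (a announces the route to b)."""
    if (a, b) in PROVIDERS:
        return "up"       # customer announcing to its provider
    if (b, a) in PROVIDERS:
        return "down"     # provider announcing to its customer
    if frozenset((a, b)) in PEERS:
        return "across"   # peer announcing to peer
    return "unknown"

def is_leak(path):
    """Valley-free rule: once a route has gone 'down' or 'across', it must
    never go 'up' or 'across' again.  A violating path is a leak."""
    descended = False
    for a, b in zip(path, path[1:]):
        h = hop(a, b)
        if descended and h in ("up", "across"):
            return True
        if h in ("down", "across"):
            descended = True
    return False

# AS64500 re-announces a route learned from provider AS64510 to its
# other provider AS64530: down then up -- a leak.
print(is_leak(["AS64510", "AS64500", "AS64530"]))  # True
# Normal customer route: up to a provider, then across to a peer.
print(is_leak(["AS64500", "AS64510", "AS64520"]))  # False
```

The obvious catch, and probably why nobody has nailed the definition down: no third party reliably knows the business relationships, so the relationship table itself is the hard part.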

1. Make your customers register routes, then filter them. (may be time for big providers to put routing tools into open source for the good of the community - make it less hard?)
2. Implement the "1-hop" hack to protect your BGP peering.
98% of problem solved on the Internet today
3. Implement a "# of routes-type" filter to make your peers (and transit customers) phone you if they really do want to add 500,000 routes to your session (or the wrong set of YouTube routes...).
99.9% of problem solved.
4. Implement BGP-Sec
99.91% of "this" problem solved.

Because #1 is 'just too hard' and because #4 is just too sexy as an academic pursuit we all suffer the consequences. It's a shame that tier one peering agreements didn't evolve with a 'filter your customers' clause (aka do the right thing) as well as a 'like for like' (similar investments) clause in them.

I'm not downplaying the BGP-SEC work, I think it's valid and may one day save us from some smart bunny who wants to make a name for himself by bringing the Internet to a halt. I don't believe that's what we're battling here. We're battling the operational cost of doing the right thing with the toolset we have versus waiting for a utopian solution (foolproof and free) that may never come.

jy

ps. my personal view.

On 25/02/2012, at 6:26 AM, Danny McPherson <danny@tcb.net> wrote:
On Feb 24, 2012, at 1:10 PM, Steven Bellovin wrote:
But just because we can't solve the whole problem, does that mean we shouldn't solve any of it?
Nope, we most certainly should decompose the problem into addressable elements, that's core to engineering and operations.
However, simply because the currently envisaged solution doesn't solve this problem doesn't mean we shouldn't acknowledge it exists.
The IETF's BGP security threats document [1] "describes a threat model for BGP path security", which constrains itself to the carefully worded SIDR WG charter, which addresses route origin authorization and AS_PATH "semantics" -- i.e., this "leak" problem is expressly out of scope of a threats document discussing BGP path security - eh?
How the heck we can talk about BGP path security and not consider this incident a threat is beyond me, particularly when it happens by accident all the time. How we can justify putting all that BGPSEC and RPKI machinery in place and not address this "leak" issue somewhere in the mix is, err.., telling.
Alas, I suspect we can all agree that experiments are good and the market will ultimately decide.
-danny
[1] draft-ietf-sidr-bgpsec-threats-02

On Fri, Feb 24, 2012 at 8:24 PM, Jeffrey S. Young <young@jsyoung.net> wrote:
1. Make your customers register routes, then filter them. (may be time for big providers to put routing tools into open source for the good of the community - make it less hard?)
not a big provider, but ras@e-gerbil did release irr-tools no?
2. Implement the "1-hop" hack to protect your BGP peering.
98% of problem solved on the Internet today
which problem? GTSH only protects your actual bgp session, not the content of the session(s) or the content across the larger network.
3. Implement a "# of routes-type" filter to make your peers (and transit customers) phone you if they really do want to add 500,000 routes to your session ( or the wrong set of YouTube routes...).
max-prefix already exists... sometimes it works, sometimes it's a burden. It doesn't tell you anything about the content of the session though (the YT routes example doesn't actually work that way)
99.9% of problem solved.
? not sure about that number
4. Implement BGP-Sec
99.91% of "this" problem solved.
Because #1 is 'just too hard' and because #4 is just too sexy as an academic pursuit we all suffer the consequences. It's
there are folks working on the #4 problem, not academics even. It's not been particularly sexy though :(
a shame that tier one peering agreements didn't evolve with a 'filter your customers' clause (aka do the right thing) as well as a 'like for like' (similar investments) clause in them.
I'm missing something here... it's not clear to me that 'tier1' providers matter a whole lot in the discussion. Many of them have spoken up saying: "Figuring out the downstream matrix in order to put a prefix-list on my SFP peer is not trivial, and probably not workable on gear today." (shane I think has even said this here...)
I'm not downplaying the BGP-SEC work, I think it's valid and may one day save us from some smart bunny who wants to make a name for himself by bringing the Internet to a halt. I don't believe that's what we're battling here. We're battling the operational cost of doing the right thing with the toolset we have
right, so today you have to do a lot of math/work to figure out if your customer's prefixes are hers, and if they should be permitted into your RIB. Tomorrow you COULD get a better end result with less work and more assurance given a populated resource certification system. Extending some into the land of BGPSEC you COULD also know that the route you hear originated from the correct ASN and later you'd be able to tell that path the route travel was the same as the ASPATH in the route... -chris
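
[The resource-certification check chris describes is roughly the origin-validation algorithm then being worked on in SIDR (later published as RFC 6811): compare each route's prefix and origin AS against the covering ROAs. A simplified sketch, with invented ROAs and ASNs:]

```python
import ipaddress

# Toy ROA set: (prefix, max_length, origin_asn).  Values are invented.
ROAS = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64500),
    (ipaddress.ip_network("198.51.100.0/22"), 24, 64501),
]

def validate(prefix, origin):
    """RFC 6811-style origin validation: a route is Valid if some ROA
    covers the prefix (within max_length) with a matching origin ASN,
    Invalid if ROAs cover the prefix but none match, NotFound otherwise."""
    prefix = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, maxlen, asn in ROAS:
        if prefix.subnet_of(roa_net):
            covered = True
            if prefix.prefixlen <= maxlen and asn == origin:
                return "Valid"
    return "Invalid" if covered else "NotFound"

print(validate("192.0.2.0/24", 64500))    # Valid
print(validate("192.0.2.0/24", 64999))    # Invalid: covered, wrong origin
print(validate("203.0.113.0/24", 64500))  # NotFound: no covering ROA
```

Note this validates only the origin; a leaked route with a legitimate origin AS still comes back Valid, which is exactly danny's complaint.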

On Feb 25, 2012, at 8:59 AM, Christopher Morrow wrote:
max-prefix already exists... sometimes it works, sometimes it's a burden.
Some sort of throttle - i.e., allow only X number of routing updates within Y number of [seconds? milliseconds? BGP packets?] would be more useful, IMHO. If the configured rate is exceeded, maintain the session but stop accepting further updates until either manually reset or the rate of updates falls back within acceptable parameters. ----------------------------------------------------------------------- Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com> Luck is the residue of opportunity and design. -- John Milton

On 25/02/12 13:12, Dobbins, Roland wrote:
On Feb 25, 2012, at 8:59 AM, Christopher Morrow wrote:
max-prefix already exists... sometimes it works, sometimes it's a burden.
Some sort of throttle - i.e., allow only X number of routing updates within Y number of [seconds? milliseconds? BGP packets?] would be more useful, IMHO. If the configured rate is exceeded, maintain the session but stop accepting further updates until either manually reset or the rate of updates falls back within acceptable parameters.
JunOS does have "out-delay", but that's not quite a solution although it does help stem some prefix flapping issues.

On Fri, Feb 24, 2012 at 9:12 PM, Dobbins, Roland <rdobbins@arbor.net> wrote:
On Feb 25, 2012, at 8:59 AM, Christopher Morrow wrote:
max-prefix already exists... sometimes it works, sometimes it's a burden.
Some sort of throttle - i.e., allow only X number of routing updates within Y number of [seconds? milliseconds? BGP packets?] would be more useful, IMHO. If the configured rate is exceeded, maintain the session but stop accepting further updates until either manually reset or the rate of updates falls back within acceptable parameters.
it seems to me that most of the options discussed for this are .. bad, in one dimension or another :(

typical max-prefix today will dump a session, if you exceed the number of prefixes on the session... good? maybe? bad? maybe? did the peer fire up a full table to you? or did you just not pay attention to the log messages saying: "Hey, joe's going to need an update shortly..."

X prefixes/packets in Y seconds/milliseconds doesn't keep the peer from blowing up your RIB, it does slow down convergence :(

If you have 200 peers on an edge device, dropping the whole device's routing capabilities because of one AS7007/AS1221/AS9121 .. isn't cool to your network nor the other customers on that device :( max-prefix as it exists today at least caps the damage at one customer. The knobs available are sort of harsh all the way around though today :(

-chris

On Feb 25, 2012, at 9:39 AM, Christopher Morrow wrote:
it seems to me that most of the options discussed for this are .. bad, in one dimension or another :(
Concur.
X prefixes/packets in Y seconds/milliseconds doesn't keep the peer from blowing up your RIB,
How so? If the configured parameters are exceeded, stop accepting/inserting updates until this is no longer the case. Exceptions would be made for peering session establishment, it would take effect after that.
it does slow down convergence :(
Yes, but is this always necessarily a Bad Thing? For example, in this particular circumstance (and many like it, cf. the AS7007 incident, et al.), it could be argued that [incorrect? undesirable? premature? pessimal?] convergence led to a poor result, could it not?
If you have 200 peers on an edge device, dropping the whole device's routing capabilities because of one AS7007/AS1221/AS9121 .. isn't cool to your network nor the other customers on that device :(
Apologies for being unclear; I wasn't suggesting dropping or removing anything, but rather refusing to further accept/insert updates from a given peer until the update rate from said peer slowed to within configured parameters.
max-prefix as it exists today at least caps the damage at one customer.
But it doesn't, really, does it? The effects cascade in an anisotropic manner throughout a potentially large transit cone.
The knobs available are sort of harsh all the way around though today :(
Concur again, sigh.

Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

On Fri, Feb 24, 2012 at 10:52 PM, Dobbins, Roland <rdobbins@arbor.net> wrote:
X prefixes/packets in Y seconds/milliseconds doesn't keep the peer from blowing up your RIB,
How so? If the configured parameters are exceeded, stop accepting/inserting updates until this is no longer the case. Exceptions would be made for peering session establishment, it would take effect after that.
if the rate is 1/ms ... I can fill the rib in 2million ms ... ~30mins? Rate alone isn't the problem :( size matters.
it does slow down convergence :(
Yes, but is this always necessarily a Bad Thing? For example, this particular circumstance (and many like it, c.f. AS7007 incident, et. al.) it could be argued that in this particular case, [incorrect? undesirable? premature? pessimal?] convergence led to a poor result, could it not?
it's not clear, to me at least, that slowing convergence is good. it seems to me that folk do all manner of 'interesting' things in order to limit convergence time. People aren't trying to actively make convergence take longer, that I've seen at least.
If you have 200 peers on an edge device, dropping the whole device's routing capabilities because of one AS7007/AS1221/AS9121 .. isn't cool to your network nor the other customers on that device :(
Apologies for being unclear; I wasn't suggesting dropping or removing anything, but rather refusing to further accept/insert updates from a given peer until the update rate from said peer slowed to within configured parameters.
yup, I think I jumped a bit around, my penalizing every other customer was a reference to not having any limiting system in place.
max-prefix as it exists today at least caps the damage at one customer.
But it doesn't, really, does it? The effects cascade in an anisotropic manner throughout a potentially large transit cone.
dropping a single customer sucks, dropping an entire edge device is far far worse.
The knobs available are sort of harsh all the way around though today :(
Concur again, sigh.
hurray! sort of. thanks! -chris

On Feb 25, 2012, at 2:15 PM, Christopher Morrow wrote:
if the rate is 1/ms ... I can fill the rib in 2million ms ... ~30mins? Rate alone isn't the problem :( size matters.
Sure; the idea is that some sort of throttling, coupled with overall size limitations, might be useful.
People aren't trying to actively make convergence take longer, that I've seen at least.
Yes, and in most cases, the goal is to speed up convergence. I'm positing that in these particular circumstances, fast convergence is not necessarily desirable, and that 'these particular circumstances' generally involve large numbers of updates which are not associated with turning up a new peering session being received over a short period of time. What about routing update transmission throttling, instead? Does that make any more sense, in terms of being liberal with what we accept and conservative in what (or how much, how quickly) we send?
dropping a single customer sucks, dropping an entire edge device is far far worse.
I agree; I don't mean to imply that anything should be dropped. Again, apologies for being unclear.

Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>
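
[Roland's refuse-don't-drop throttle, combined with an overall size cap per chris's point that rate alone isn't enough, could be sketched roughly as a token bucket per peer. The class name and parameters here are invented for illustration:]

```python
class UpdateThrottle:
    """Accept at most `rate` updates/second (token bucket, up to `burst`)
    and at most `cap` prefixes total from one peer; excess updates are
    refused rather than tearing the session down."""
    def __init__(self, rate, burst, cap):
        self.rate, self.burst, self.cap = rate, burst, cap
        self.tokens = burst   # start with a full bucket
        self.count = 0        # prefixes accepted so far
        self.last = 0.0       # timestamp of the last refill

    def allow(self, now):
        # refill tokens based on elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.count >= self.cap or self.tokens < 1:
            return False      # refuse this update, keep the session up
        self.tokens -= 1
        self.count += 1
        return True
```

A real implementation would also need the exception Roland mentions for initial session establishment, and some way to age `count` back down as withdrawals arrive.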

On Fri, 24 Feb 2012 21:39:37 EST, Christopher Morrow said:
The knobs available are sort of harsh all the way around though today :(
So what would be a good knob if it was available? I've seen about forty-leven people say the current knobs suck, but no real proposals of "what would really rock is if we could...."

On 25/02/12 17:20, Valdis.Kletnieks@vt.edu wrote:
On Fri, 24 Feb 2012 21:39:37 EST, Christopher Morrow said:
The knobs available are sort of harsh all the way around though today :(
So what would be a good knob if it was available? I've seen about forty-leven people say the current knobs suck, but no real proposals of "what would really rock is if we could...."
I've suggested before that a configured increase limit, expressed as a percentage, might be *slightly* more intelligent than the current hard limit settings (i.e. max-prefixes). Typically you're going to get, what, 100 routes? Maybe less, maybe more. If that rises by 100%, drop the session. Weird customer? 200%. Tom
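
[Tom's percentage-based limit is simple enough to sketch; the function name and numbers here are invented:]

```python
def over_limit(baseline, current, pct_increase=100):
    """Trip when a session grows more than pct_increase percent over the
    baseline prefix count, instead of comparing against a hard number."""
    return current > baseline * (1 + pct_increase / 100)

print(over_limit(100, 150))                    # False: within 100% growth
print(over_limit(100, 250))                    # True: more than doubled
print(over_limit(100, 250, pct_increase=200))  # False: weird customer, 200% allowed
```

The open question is where the baseline comes from: a stale baseline on a fast-growing customer trips it spuriously, an inflated one defeats the point.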

Hello,

If anyone from TINET (AS 3257) is on this list, can you please contact us? We have one of their customers announcing one of our blocks and I need to get them to stop doing that. :)

Thank you in advance.

Faisal Imtiaz
Snappy Internet & Telecom
7266 SW 48 Street
Miami, Fl 33155
Tel: 305 663 5518 x 232
Helpdesk: 305 663 5518 option 2
Email: Support@Snappydsl.net

On Sat, Feb 25, 2012 at 12:20 PM, <Valdis.Kletnieks@vt.edu> wrote:
On Fri, 24 Feb 2012 21:39:37 EST, Christopher Morrow said:
The knobs available are sort of harsh all the way around though today :(
So what would be a good knob if it was available? I've seen about forty-leven people say the current knobs suck, but no real proposals of "what would really rock is if we could...."
I'm not sure... here's a few ideas though to toss on the fire of thought:
1) break the process up inside the router, provide another set of places to block and tackle the problem.
2) better metric the problem for operations staff
3) automate the problem 'better' (inside a set of sane boundaries)

I think in 1 I want to be able to be assured that inbound data to a bgp peer will not cause problems for all other peers on the same device. Keep the parsing, memory and cpu management separate from the main routing management inside the router, provide controls on these points configurable at a per-peer level. That way you could limit things like:
- each peer able to take a maximum amount of RAM, start discarding routes over that limit, alarm at a configurable percentage of the limit.
- each peer could consume only a set percentage of CPU resources; better would be the ability to pin bgp peer usage to a particular CPU (or set of CPUs) and other route processing on another CPU/set-of-CPUs.
- interfaces between the bgp speaker, receiver, ingest and databases could all be standardized, simple and auditable as well.

If the peer sent a malformed update, only that peering session would die; if the parsing of the update caused a meltdown, again only the single peer would be affected. The interface between the code speaking to the peer and the RIB could be more robust and more resilient to errors.

For 2, I think having more data available about avg rate of increase, max rate of increase, average burst size and predicted time to overrun would be helpful. Most of this one could gather with some smart SNMP tricks I suspect... on the other hand, just reacting to the syslog messages in a timely fashion works :)

For 3, automate the reaction to syslog/snmp messages, increasing the thresholds if there hasn't been an increase in the last X hours and the limit is not above Y percent of a full table already. (and send a note to the NOC ticket system for historical preservation).
These too have flaws... I'm not sure there's a good answer to this though :( -chris

On Feb 26, 2012, at 7:55 AM, Christopher Morrow wrote:
I'm not sure... here's a few ideas though to toss on the fire of thought:
Concur with this general approach, which is a longer-term effort - but it would be nice if there was some discrete, limited-scope knob which could conceivably be added as a point-feature request, thereby having some chance of actually making it into shipping code at some point before the next millennium, and which won't cause more harm than good. ;>

Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

On 25/02/2012, at 12:59 PM, Christopher Morrow wrote:
On Fri, Feb 24, 2012 at 8:24 PM, Jeffrey S. Young <young@jsyoung.net> wrote:
1. Make your customers register routes, then filter them. (may be time for big providers to put routing tools into open source for the good of the community - make it less hard?)
not a big provider, but ras@e-gerbil did release irr-tools no?
And other providers out there have extensive tool sets from which we could all benefit. I'll let them chime in if they choose.
2. Implement the "1-hop" hack to protect your BGP peering.
98% of problem solved on the Internet today
which problem? GTSH only protects your actual bgp session, not the content of the session(s) or the content across the larger network.
The security problem, but it was a hedge on my part.
3. Implement a "# of routes-type" filter to make your peers (and transit customers) phone you if they really do want to add 500,000 routes to your session ( or the wrong set of YouTube routes...).
max-prefix already exists... sometimes it works, sometimes it's a burden. It doesn't tell you anything about the content of the session though (the YT routes example doesn't actually work that way)
Depends on how many /24's the Pakistan(?) Telecom guy let into the network to block the YT content... but you're right, the example would have been better in support of #1. (had PT been forced to register routes before sending them and his upstream been filtering based on those routes we'd have never heard about it.)
99.9% of problem solved.
? not sure about that number
4. Implement BGP-Sec
99.91% of "this" problem solved.
Because #1 is 'just too hard' and because #4 is just too sexy as an academic pursuit we all suffer the consequences. It's
there are folks working on the #4 problem, not academics even. It's not been particularly sexy though :(
Point was that the problem is mostly operational. We have tools to deal with the problem but the operational costs are high. For fifteen (below) years we've treated this (route leak) as "not my problem" because it's too costly. Every 6-12 months it comes back to bite us. If the cost of an outage every 6 months+ is low compared to solving the problem, the community will endure the outage. If we want it to stop today we can make it stop but stopping it has a cost. “...a glitch at a small ISP... triggered a major outage in Internet access across the country. The problem started when MAI Network Services ...passed bad router information from one of its customers onto Sprint.” -- news.com, April 25, 1997 jy

Steve, On Feb 24, 2012, at 11:10 AM, Steven Bellovin wrote:
On Feb 24, 2012, at 7:46 40AM, Danny McPherson wrote:
On Feb 23, 2012, at 10:42 PM, Randy Bush wrote:
the problem is that you have yet to rigorously define it and how to unambiguously and rigorously detect it. lack of that will prevent anyone from helping you prevent it.
You referred to this incident as a "leak" in your message:
"a customer leaked a full table"
I was simply agreeing with you -- i.e., looked like a "leak", smelled like a "leak" - let's call it a leak.
I'm optimistic that all the good folks focusing on this in their day jobs, and expressly funded and resourced to do so, will eventually recognize what I'm calling "leaks" is part of the routing security problem.
Sure; I don't disagree, and I don't think that Randy does. But just because we can't solve the whole problem, does that mean we shouldn't solve any of it?
Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this. As has been discussed in the SIDR WG, BGPSEC will _increase_ state in BGP (more DRAM needed in PEs and RRs, crypto processors to verify sigs, more UPDATE traffic for beaconing). And, at the end of the day, ISPs are going to go to their customers and say to them:
- BGP convergence may be slower than in the past, because we're shipping sigs around in BGP now;
- we can prevent a malicious attack from a random third party (in the right part of the topology);
- *but* I can't protect you from a 20+ year old problem of a transit customer accidentally -or- maliciously stealing/dropping your traffic if they leak routes from one provider to another provider?
As Randy said, we can't even try for a strong technical solution until we have a definition that's better than "I know it when I see it".
The first step is admitting that we have a problem, then discussing it collectively to try to determine a way to prevent said problem from happening. -shane

On Fri, Feb 24, 2012 at 3:04 PM, Shane Amante <shane@castlepoint.net> wrote:
Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.
I don't think anyone's ignoring the problem... I think lots of people have said an equivalent of:

1) "How do I know that this path: A - B - C - D is a 'leak'?"

Followed by:

2) "Tell me how to answer this programmatically given the data we have today in the routing system" (bgp data on the wire, IRR data, RIR data)

so far... both of the above questions haven't been answered (well, 1 was answered with "I will know it when I see it", which isn't helpful at all in finding a solution) -chris
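One commonly proposed (and admittedly incomplete) answer to question 1 is the "valley-free" test: a path that descends toward customers and then climbs again looks like a leak. The sketch below assumes a `rel()` oracle that knows every adjacent pair's business relationship, which is precisely the data the thread says we don't reliably have:

```python
# Valley-free check, one oft-proposed heuristic for question 1 above.
# rel(a, b) must return 'c2p' (a is a customer of b), 'p2c', or 'p2p';
# where that data would come from is exactly the unsolved part.

def looks_like_leak(as_path, rel):
    """True if the path violates valley-free routing (a candidate leak)."""
    # A valid path is zero or more c2p (uphill) hops, at most one p2p
    # hop, then zero or more p2c (downhill) hops. Climbing again after
    # the peak is a "valley" -- e.g. a customer re-exporting one
    # provider's routes to another.
    seen_peak = False  # have we passed the top of the path yet?
    for a, b in zip(as_path, as_path[1:]):
        if rel(a, b) == 'c2p':
            if seen_peak:
                return True  # uphill after downhill/peering: a valley
        else:  # 'p2p' or 'p2c'
            seen_peak = True
    return False
```

Even granting perfect relationship data, this flags intentional arrangements (mutual transit during a disaster, per Leo downthread) just as readily as accidents, which is part of why "I'll know it when I see it" persists.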

On 25/02/2012, at 7:54 AM, Christopher Morrow wrote:
On Fri, Feb 24, 2012 at 3:04 PM, Shane Amante <shane@castlepoint.net> wrote:
Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.
I don't think anyone's ignoring the problem... I think lots of people have said an equivalent of: 1) "How do I know that this path: A - B - C - D is a 'leak'?"
If you are receiving a path of the form (A B C D), and the origination of the prefix at D is good, then the only way you can figure out this is a leak, as compared to the intentional operation of BGP, is not by looking at the operation of the protocol per se, but by looking at the routing policy intentions of A, B, C and D and working out if what you are seeing is intentional within the scope of the routing policies of these entities. RPSL is one approach to describing such policy in a manner that one could perform some basic computation over. It exposes a broader issue here about the difference between routing intent and protocol correctness. From the perspective of protocol correctness, regardless of whether the information was intended to be propagated, a protocol correctness tool should be able to tell you that the information has been faithfully propagated, but cannot tell you whether such propagation was intentional or not.
Followed by: 2) "Tell me how to answer this programatically given the data we have today in the routing system" (bgp data on the wire, IRR data, RIR data)
I wish.
so far ... both of the above questions haven't been answered (well 1 was answered with: "I will know it when i see it" which isn't helpful at all in finding a solution)
Some longstanding problems are longstanding because we have not quite managed to apply the appropriate analytical approach to the problem. Others are longstanding problems because they are damn difficult, and this makes me wonder if we really understand the nature of the space we are working in. For example, if you think about routing not as a topology and reachability tool, but as a distributed algorithm to solve a set of simultaneous equations (policies), would that provide a different insight into the way in which routing policies and routing protocols interact? Geoff

In a message written on Fri, Feb 24, 2012 at 01:04:20PM -0700, Shane Amante wrote:
Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.
Not all "leaks" are bad. I remember when there was that undersea landslide in Asia that took out a bunch of undersea cables. Various providers quickly did mutual transit and other arrangements to route around the problem, getting a number of things back up quite quickly. These did not match IRR records though, and likely would not have matched BGPSEC information, at least not initially.

There are plenty of cases where someone "leaks" more specifics with NO_EXPORT to only one of their BGP peers for the purposes of TE.

The challenge of securing BGP isn't crypto, and it isn't enough ram/cpu/whatever to process it. The challenge is getting a crypto scheme that operators can use to easily represent the real world. It turns out the real world is quite messy though, often full of temporary hacks, unusual relationships and other issues.

I'm sure it will be solved, one day. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/

On Fri, Feb 24, 2012 at 3:59 PM, Leo Bicknell <bicknell@ufp.org> wrote:
In a message written on Fri, Feb 24, 2012 at 01:04:20PM -0700, Shane Amante wrote:
Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.
Not all "leaks" are bad.
I remember when there was that undersea landslide in Asia that took out a bunch of undersea cables. Various providers quickly did mutual transit and other arrangements to route around the problem, getting a number of things back up quite quickly. These did not match IRR records though, and likely would not have matched BGPSEC information, at least not initially.
well.... for bgpsec so if the paths were signed, and origins signed, why would they NOT pass BGPSEC muster? I can see that if the IRR data didn't match up sanely prefix-lists/filters would need some cajoling, but that likely happened anyway in this case. -chris

In a message written on Fri, Feb 24, 2012 at 04:07:28PM -0500, Christopher Morrow wrote:
well.... for bgpsec so if the paths were signed, and origins signed, why would they NOT pass BGPSEC muster?
I honestly have trouble keeping the BGP security work straight. There is work to secure the sessions, work to authenticate route origin, work to authenticate the AS-Path, the peer relationships, and so on.

I believe BGPSEC authenticates the AS-Path, and thus turning up a new peer requires them to each sign each other's "path object". During the time period between when the route propagates and the signature propagates, these routes appear to be a leak. I don't believe the signature data is moved via BGP. Worse, in this case, imagine if one of the parties was "cut off" from the signature distribution system. They would need to bring up their (non-validating) routes to reach the signature distribution system before their routes would be accepted!

In fact, this happens today with those who strict IRR filter. Try getting a block from ARIN, and then service from a provider who only uses IRR filters. The answer is to go to some other already up and working network to submit your IRR data to the IRR server, before your network can come up and be accepted! On a new turn-up for an end-user, not a big deal. When you look at the problems that might occur in the face of natural or man-made disasters, though, like the cable cut, it could result in outages that could have been fixed in minutes in a non-validating system taking hours to fix in a validating one.

That may be an acceptable trade-off to get security; but it depends on exactly what the trade-off ends up being. To date, I personally have found "insecure" BGP, even with the occasional leaks, to be a better overall solution. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/

On Fri, Feb 24, 2012 at 4:29 PM, Leo Bicknell <bicknell@ufp.org> wrote:
In a message written on Fri, Feb 24, 2012 at 04:07:28PM -0500, Christopher Morrow wrote:
well.... for bgpsec so if the paths were signed, and origins signed, why would they NOT pass BGPSEC muster?
I honestly have trouble keeping the BGP security work straight.
yes
There is work to secure the sessions, work to authenticate route origin, work to authenticate the AS-Path, the peer relationships, and so on.
I believe BGPSEC authenticates the AS-Path, and thus turning up a new peer requires them to each sign each others "path object".
well currently it doesn't do anything (really) but the PLAN is that you'd be able to look at the origin, view some transitive community/attribute and say: "That validates with the roa data" - some cert-check/hash-check/etc. then later on you'd be able to say for each AS in the ASPATH: "Yes, the route is signed by AS1, the signature validates. Yes the route is signed by AS2, the signature validates (wash/rinse/repeat for the whole path)"
During the time period between when the route propagates and the signature propagates, these routes appear to be a leak. I don't
signatures follow inside the announcement as currently draft-spec'd.
believe the signature data is moved via BGP. Worse, in this case, imagine if one of the parties was "cut off" from the signature distribution system. They would need to bring up their (non-validating) routes to reach the signature distribution system before their routes would be accepted!
the sig data for an NLRI follows along inside the announcement. the cache of data is probably updated inside of a day... there's likely some skew, but provided the origins don't change and no one has to emergency-release new key materials, I think it's not important for this discussion. you simply start hearing routes with the same origin as previously on different paths. "new customers" essentially pop up en masse. This isn't a problem as long as the customers are the same origin-as as before... it'd mean some rejiggering of prefix-lists (as I said before) but... you'd be doing that anyway.
In fact, this happens today with those who strict IRR filter. Try getting a block from ARIN, and then service from a provider who only uses IRR filters. The answer is to go to some other already up and working network to submit your IRR data to the IRR server, before your network can come up and be accepted!
right, there's some lag between publication and acceptance/update. I think in the case of, for example, L(3), the lag is ~6hrs in the worst case.
On a new turn-up for an end-user, not a big deal. When you look at the problems that might occur in the face of natural or man-made disasters though, like the cable cut, it could result in outages that could have been fixed in minutes in a non-validating system taking hours to fix in a validating one.
I don't think that's really the case, but walking through the processes/requirements seems like a sane thing to do.
That may be an acceptable trade off to get security; but it depends on exactly what the trade off ends up being. To date, I personally have found "insecure" BGP, even with the occasional leaks, to be a better overall solution.
how's that chinese leak of F-root doing for you? :) -chris
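The per-hop checking Chris describes can be sketched abstractly: each AS signs the prefix, the AS it forwards to, and the previous hop's signature, so the chain breaks if any hop is altered. This is illustrative only; it uses HMAC with shared keys as a stand-in and is not the actual BGPSEC wire format or signature scheme:

```python
# Abstract sketch of chained per-hop path signing, as described above.
# HMAC stands in for real public-key signatures; this is NOT the
# BGPSEC protocol, just the shape of the validation walk.
import hashlib
import hmac

def sign(key, payload):
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def announce(prefix, path, keys):
    """Build sigs: AS path[i] signs (prefix, path[i+1], previous sig)."""
    sigs, prev = [], ''
    for i, asn in enumerate(path[:-1]):
        payload = f'{prefix}|{asn}|{path[i + 1]}|{prev}'.encode()
        prev = sign(keys[asn], payload)
        sigs.append(prev)
    return sigs

def validate(prefix, path, sigs, keys):
    """Re-derive each hop's signature; any mismatch fails the path."""
    prev = ''
    for i, asn in enumerate(path[:-1]):
        payload = f'{prefix}|{asn}|{path[i + 1]}|{prev}'.encode()
        expect = sign(keys[asn], payload)
        if expect != sigs[i]:
            return False
        prev = expect
    return True
```

Because each hop's signature covers the hop that follows it, splicing a different AS into the middle of a signed path invalidates everything downstream - which is the property Chris is pointing at, and which sigs-in-the-announcement deliver without a separate distribution channel.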

-----Original Message----- From: Leo Bicknell Sent: Friday, February 24, 2012 1:00 PM
There are plenty of cases where someone "leaks" more specifics with NO_EXPORT to only one of their BGP peers for the purposes of TE.
The challenge of securing BGP isn't crypto, and it isn't enough ram/cpu/whatever to process it. The challenge is getting a crypto scheme that operators can use to easily represent the real world. It turns out the real world is quite messy though, often full of temporary hacks, unusual relationships and other issues.
I'm sure it will be solved, one day.
I can think of a way to do it, but it would require some trust, and it would require that people actually *used* it.

What one would do is feed the routes they are proposing to send to a BGP peer to a RIR front-end. The receiving peer would "sign off" on the proposal, and the routes would then be entered into the RIR. That is the step that is currently missing. Anyone can enter practically anything into an RIR, and the receiving side never gets to "sanity check" the information before it actually gets written to the database.

Once you have this base of information, route filtration generated from the database becomes more reliable. In fact, a network might have several "canned" profiles of different route packages registered in the front end: a "transit" package, a "customer routes" package, and maybe some specialized packages for peering at various private/public exchange points. If you pick up a new peer at a transit point, you select the package for that point, it proposes that to the peer, the peer approves it, and they can both generate their route filters from that information.

It could even highlight some glaring errors automatically, to spot what might be a typo or even attempted nefarious activity. The receiver of a proposed change might be alerted to the fact that the new route(s) being offered are inconsistent with the database information (routes already being sourced by an AS that the proposed sender is not peering with), which could be overridden by the receiver (or just ignored), but having something show up in some way that highlights a possible inconsistency might generate a closer look at that proposal and head off problems later.

But the fundamental problem is that the current system is "open loop".
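The "closed loop" idea above can be sketched as a minimal data model: nothing reaches the database until the receiving peer signs off, and conflicts surface at proposal time. All class and method names are invented for illustration; no real RIR exposes such an interface:

```python
# Minimal sketch of the propose/approve loop described above.
# Invented names throughout; illustrative only.
class RouteProposal:
    def __init__(self, sender_as, receiver_as, routes):
        self.sender_as, self.receiver_as = sender_as, receiver_as
        self.routes, self.approved = set(routes), False

class Registry:
    def __init__(self):
        self.db = {}       # prefix -> origin AS: the registry proper
        self.pending = []  # proposals awaiting receiver sign-off

    def propose(self, proposal):
        # Proposals queue up; nothing is written to the db yet.
        self.pending.append(proposal)

    def conflicts(self, proposal):
        """Flag routes already registered to a different origin AS --
        the 'possible inconsistency' the receiver should eyeball."""
        return {p for p in proposal.routes
                if self.db.get(p) not in (None, proposal.sender_as)}

    def approve(self, proposal):
        # Only the receiving peer's sign-off commits to the db,
        # closing the loop that today's IRRs leave open.
        proposal.approved = True
        for p in proposal.routes:
            self.db[p] = proposal.sender_as
        self.pending.remove(proposal)
```

The point of the sketch is the ordering: conflict checks and receiver approval happen *before* the write, whereas today anyone can write first and the receiver filters (or doesn't) afterward.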

On 24/02/2012 20:59, Leo Bicknell wrote:
It turns out the real world is quite messy though, often full of temporary hacks, unusual relationships and other issues.
... and, if you create a top-down control mechanism to be superimposed upon the current fully distributed control mechanism, you will soon find that politicians and regulators will take a very keen interest in BGP once they realise that they can turn off specific prefixes from a single point. Whatever about temporary hacks and unusual relationships, the entropy introduced by layers 9 through 12 is almost always insufferable. Nick

On 24/02/2012 20:04, Shane Amante wrote:
Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.
I'd be interested to hear your opinions on exactly how rpki in its current implementation would have prevented the optus/telstra problem. Could you elaborate? Here's a quote from draft-ietf-sidr-origin-ops:
As the BGP origin AS of an update is not signed, origin validation is open to malicious spoofing. Therefore, RPKI-based origin validation is designed to deal only with inadvertent mis-advertisement.
Origin validation does not address the problem of AS-Path validation. Therefore paths are open to manipulation, either malicious or accidental.
An optus/telstra style problem might have been mitigated by an rpki based full path validation mechanism, but we don't have path validation. Right now, we only have a draft of a list of must-have features - draft-ietf-sidr-bgpsec-reqs. This is only the first step towards designing a functional protocol, not to mind having running code. Nick
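For contrast, what origin validation *does* give us today can be sketched against ROA triples of (prefix, maxLength, origin AS), with the valid / invalid / not-found outcomes the quoted draft's semantics imply. This is a simplified model, not a real RPKI validator, and per the quote it says nothing about the path:

```python
# Simplified model of RPKI origin validation: a route is checked
# against ROAs of the form (prefix, max_length, origin_as).
# Illustrative only -- not a real validator.
import ipaddress

def validate_origin(route_prefix, origin_as, roas):
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa_prefix, max_len, roa_as in roas:
        roa = ipaddress.ip_network(roa_prefix)
        if route.subnet_of(roa):
            covered = True  # some ROA covers this prefix
            if origin_as == roa_as and route.prefixlen <= max_len:
                return 'valid'
    # Covered by a ROA but wrong origin or too specific: invalid.
    # Not covered by any ROA: the validator simply has no opinion.
    return 'invalid' if covered else 'notfound'
```

Note what this can and cannot catch: a leaked path with the *correct* origin AS at the end, the Optus/Telstra case, comes back 'valid', which is Nick's point that origin validation alone would not have helped here.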

Nick, On Feb 24, 2012, at 4:16 PM, Nick Hilliard wrote:
On 24/02/2012 20:04, Shane Amante wrote:
Solving for route leaks is /the/ "killer app" for BGPSEC. I can't understand why people keep ignoring this.
I'd be interested to hear your opinions on exactly how rpki in its current implementation would have prevented the optus/telstra problem. Could you elaborate?
I apologize if I misled you, but I did not claim that the RPKI, in its current ROA implementation, *would* have prevented this specific route leak related to Optus/Telstra. OTOH, I would completely agree with Geoff's comment that the policy language of RPSL has the ability to express routing _policy_, a.k.a. "intent", recursively across multiple ASNs... (please note that I'm specifically talking about the technical capability of the policy language of RPSL, not the actual _data_ contained in the IRR). Or, to put it a different way, the reachability information carried in BGP is the end result/output of policy. One needs to understand the *input*, a.k.a. the policy/intent, if they are to validate the output, namely the reachability information carried in BGP. Unfortunately, denying this reality is not going to make it "go away".
Here's a quote from draft-ietf-sidr-origin-ops:
As the BGP origin AS of an update is not signed, origin validation is open to malicious spoofing. Therefore, RPKI-based origin validation is designed to deal only with inadvertent mis-advertisement.
Origin validation does not address the problem of AS-Path validation. Therefore paths are open to manipulation, either malicious or accidental.
An optus/telstra style problem might have been mitigated by an rpki based full path validation mechanism, but we don't have path validation. Right now, we only have a draft of a list of must-have features - draft-ietf-sidr-bgpsec-reqs. This is only the first step towards designing a functional protocol, not to mind having running code.
As one example, those "must-have features" have not, yet[1], accounted for the various "kinky" things we all do to manipulate the AS_PATH in the wild, for lots of very important business reasons, namely: ASN consolidation through knobs like "local-as alias" in JUNOS-land and "local-as no-prepend replace-as" in IOS-land, which have existed in shipping code for several years and are in active, widespread use and will continue to remain so[2]. Furthermore, given the current design proposal on the table of a BGPSEC transmitter forward-signing the "Target AS", as learned from a receiver in the BGP OPEN message, this could make it impossible to do ASN consolidation in the future (unless I'm misunderstanding something). -shane

[1] I asked at the last SIDR WG meeting in Taipei specifically for this to be accounted for, but I don't see this in the current rev of the draft you cite. Perhaps others should chime in on the SIDR WG mailing list if they are aware of the use of ASN-consolidation knobs and consider them a critical factor during the design process, particularly so they are looked at during the earliest stages of the design.

[2] I haven't heard of any vendors stating that they intend to EOL or stop supporting those features, but it would be amusing to see the reaction they would get if they tried. :-)

On 25/02/2012 06:07, Shane Amante wrote:
OTOH, I would completely agree with Geoff's comment that the policy language of RPSL has the ability to express routing _policy_, a.k.a. "intent", recursively across multiple ASN's ... (please note that I'm specifically talking about the technical capability of the policy language of RPSL, not the actual _data_ contained in the IRR).
routing policy concerns the interaction of two classes of object (prefixes and ASNs) as handled between ASNs. Problem is, while you can describe AS interaction between ASNs and some prefix stuff between ASNs, rpsl doesn't really have proper support to link the two - i.e. tying prefixes to specific paths and all that jazz. Then again, neither do most routers. It hardly matters - without a secure means of path validation, the path is purely advisory and you can only barely trust the peer ASN in the path.

So RPSL isn't really a solution for describing how prefixes ought to be handled with respect to inter-ASN connectivity, and even if it were, and routers could handle AS->prefix mapping properly, our routers couldn't handle it for large-scale interconnection links due to configuration management limitations. Put simply, managing enormous lists of prefixes and piles of ASN paths (in regex form) causes router RPs to asplode. So from the point of view of prefix distribution control, some sort of live query system is required.

To this end, rpki with AS path validation (if we actually had an implementation which checked all the boxes in the draft list of requirements) might work. My point was that at the moment it's vapour, and it's not clear at this point that it will ever change into something more solid, particularly given the challenging feature list that we want it to cope with, and given the constraints of what people already do with their policy routing. And even if it does ever work, it immediately opens up an exquisitely ugly can of worms at layers 9 and above. Call me conservative, but I have not been convinced that RPKI solves more problems than it creates.

Your other concerns about AS path validation implementation are indeed difficult to address. Nick

Let me chime in and attempt to explain why a couple of solutions I've seen so far in this thread won't work:

- rate-limiting/throttling updates: BGP by protocol does not repeat updates; if an update is sent, the sender assumes that the receiver has received it and will remember it until a change or a withdrawal. If you rate-limit announcements, either you hold things off in a buffer, which would need to be a very large buffer, or you drop updates, which would lead to inconsistent views on the two sides of the session. What if a legitimate update was among the large burst?

- max-prefix: it is currently used to prevent large bursts of updates, but it wouldn't have stopped the YouTube incident, which was more targeted. Perhaps the YT incident falls into a different category from 'route leaks', but without a clear definition of the latter we simply cannot say. Also, max-prefix causes problems with slowly-increasing peers, or peers with new large customers and people not bothering to adjust the max-prefix value accordingly.

- max-prefix in the form of a percentage: some peers are actually very stable in the number of prefixes they announce, and some are not. Both are probably valid depending on your business model/requirements. X% may be too lax for one company but too tight for another. Figuring out the right number (or even a ballpark) is probably a lot harder than picking a simple max-prefix value. I have seen ASes that announce hundreds to tens of thousands of prefixes on a periodic basis. Percentages also don't work so well for ASes with a single-digit or low-double-digit number of prefixes.

Dongting
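Dongting's point about percentages can be made concrete with invented numbers: a fixed growth percentage is simultaneously too twitchy for tiny peers and too forgiving for huge ones.

```python
# Illustration of the percentage-threshold problem described above.
# All numbers are invented for the example.
def percent_trigger(baseline, announced, pct=20):
    """Alarm if the peer grows more than pct% over its baseline."""
    return announced > baseline * (1 + pct / 100)

# A 5-prefix peer legitimately adding two prefixes trips a 20% trigger...
small_peer_alarm = percent_trigger(baseline=5, announced=7)
# ...while a 100k-prefix peer can leak 15k routes without tripping it.
big_peer_alarm = percent_trigger(baseline=100_000, announced=115_000)
```

The small peer alarms on routine growth while the large peer absorbs a 15,000-route leak silently, which is why a single percentage is hard to defend as policy.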

On Feb 26, 2012, at 5:39 AM, Dongting Yu wrote:
you drop updates, which would lead to inconsistent views on the two sides of the session.
Views are inconsistent by design - there is no state synchronization. All a sender knows is that he sent the updates, not what (if anything) was done with them by the receiver.
What if a legitimate update was among the large burst?
Presumably, soft-reset would be initiated after the throttling. But per previous email, if any sort of throttling is to be done at all, it's probably best that it is done by the sender. Roland Dobbins <rdobbins@arbor.net>

Solving for route leaks is /the/ "killer app" for BGPSEC.
as would be solving world hunger, war, bad cooking, especially bad cooking.

route leaks, as much as i understand them
o are indeed bad ops issues
o are not security per se
o are a violation of business relationships
o and 20 years of fighting them have not given us any significant increase in understanding, formal definition, or prevention.

i would love to see progress on the route leak problem. i do not confuddle it with security.

randy

On Feb 25, 2012, at 7:49 AM, Randy Bush wrote:
i would love to see progress on the route leak problem. i do not confuddle it with security.
Availability is a key aspect of security - the most important one, in many cases/contexts. The availability of the control plane itself (i.e., being stable/resilient enough to continue doing its job even under various forms of duress), as well as the availability of the information about paths it propagates in order to allow the routing of transit traffic, both fall squarely within the rubric of security, IMHO.

The disruption of transit traffic routing often caused by route leaks, as in this particular case, has a negative impact on the overall availability of affected networks/endpoints/applications/services/data. However, route leaks are only one potential cause of such hits to availability - and while there are several BCPs which can and should be adopted in order to protect against control-plane disruption, they are in many cases honored more in the breach than in the observance, due to complexity, opex (as is the case with many - some would say most - security-related BCPs), and so forth.

The single best thing which could be done to improve the stability/resiliency of the control plane on IP networks in general would be to change the nature of the control plane (not just BGP, but the IGPs as well) from in-band to out-of-band, IMHO. I know this will probably never happen, but wanted to be sure that the point was made in relation to this specific topic for the sake of completeness, if nothing else. Roland Dobbins <rdobbins@arbor.net>

On Feb 24, 2012, at 5:49 PM, Randy Bush wrote:
Solving for route leaks is /the/ "killer app" for BGPSEC.
as would be solving world hunger, war, bad cooking, especially bad cooking.
route leaks, as much as i understand them
o are indeed bad ops issues
o are not security per se
o are a violation of business relationships
o and 20 years of fighting them have not given us any significant increase in understanding, formal definition, or prevention.
i would love to see progress on the route leak problem. i do not confuddle it with security.
So, it is not OK for traffic to be /intentionally/ diverted through a malevolent AS, but it is OK for traffic to be /unintentionally/ diverted through a (possibly) malevolent AS? Who's to judge the security exposure[1] of the latter is not identical (or, worse) than the former? -shane [1] dropped traffic, traffic analysis, etc.

So, it is not OK for traffic to be /intentionally/ diverted through a malevolent AS
traffic? i do not hold the fantasy that traffic is highly correlated to the control plane. see http://archive.psg.com/optometry.pdf if you need a disproof of the fantasy.
but it is OK for traffic to be /unintentionally/ diverted through a (possibly) malevolent AS?
intent? how the hell do i know intent? i can barely read my own mind let alone telstra's. and i very much doubt telstra thought they were _attacking_ optus. randy

as would be solving world hunger, war, bad cooking, especially bad cooking.
route leaks, as much as i understand them
o are indeed bad ops issues
o are not security per se
o are a violation of business relationships
o and 20 years of fighting them have not given us any significant increase in understanding, formal definition, or prevention.
let me try to express how i see the problem. to do this rigorously, i would need to form the transitive closure of the business policies of every inter-provider link on the internet.

why i say it is per-link, and not just inter-as (which would be hard enough), is that i know a *lot* of examples where two ASes have different business policies on different links. [ i'll exchange se asian routes with you in hong kong, but only sell you transit in tokyo. we have two links in frankfurt, one local peering and one international transit. ] it is not just one-hop, because telstra was 'supposed to' pass some customers' customers' routes to optus.

i find this daunting. but i would *really* like to be able to rigorously solve it. please please please explain to me how it is simpler than this.

randy

I'm optimistic that all the good folks focusing on this in their day jobs, and expressly funded and resourced to do so, will eventually recognize what I'm calling "leaks" is part of the routing security problem.
Sure; I don't disagree, and I don't think that Randy does. But just because we can't solve the whole problem, does that mean we shouldn't solve any of it?
is it a *security* problem? it is a violation of business intent. and one we would like to solve. but it is not clear to me that 'leaks' are really a security issue. randy

On Feb 24, 2012, at 9:00 AM, Danny McPherson wrote:
Prefix limits are rather binary and indiscriminate, indeed.
AS-PATH filters and max-length filters, OTOH, are not. Also, it's important that network operators understand that flap-dampening has been iatrogenic for many years, now. Roland Dobbins <rdobbins@arbor.net>

Also, it's important that network operators understand that flap-dampening has been iatrogenic for many years, now.
wellllll, ... https://datatracker.ietf.org/doc/draft-ymbk-rfd-usable/ randy

I'm laughing now, but it wasn't funny a couple of hours ago. Seems a lot of the .au govt needs to learn some carrier diversity... On 23/02/2012, at 4:41 PM, Randy Bush <randy@psg.com> wrote:
don't filter your customers. when they leak the world to you, it will get you a lot of free press and your marketing department will love you.
just ask telstra.
randy

not just the .au govt C On 23 Feb 2012, at 07:54, Jay Mitchell wrote:
I'm laughing now, but it wasn't funny a couple of hours ago. Seems a lot of the .au govt needs to learn some carrier diversity...
On 23/02/2012, at 4:41 PM, Randy Bush <randy@psg.com> wrote:
don't filter your customers. when they leak the world to you, it will get you a lot of free press and your marketing department will love you.
just ask telstra.
randy

Speaking of leaking the world, I remember one of our transit peers, during their nightly maintenance, decided they needed people to talk to, so they shared some love by passing ~350k routes, causing a meltdown.

As a lesson learned, we included a combination of prefix-list & maximum-prefix filters as part of our config script. When the received count reaches a certain percentage of the hard limit, we get alerted that the neighbor is approaching the limit.

regards, /virendra

On 02/22/2012 09:41 PM, Randy Bush wrote:
don't filter your customers. when they leak the world to you, it will get you a lot of free press and your marketing department will love you.
just ask telstra.
randy
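The config-script logic virendra describes can be sketched as below: size a per-neighbor maximum-prefix hard limit with headroom over the expected count, and alert when the received count crosses a warning threshold, before the session is torn down. The IOS-style syntax and all numbers are illustrative, not taken from the post:

```python
def max_prefix_stanza(neighbor_ip, expected_prefixes,
                      headroom=1.5, warn_percent=80):
    """Size the hard limit with headroom over the expected prefix count
    and emit an IOS-style 'maximum-prefix <limit> <warn%>' line."""
    limit = int(expected_prefixes * headroom)
    return f"neighbor {neighbor_ip} maximum-prefix {limit} {warn_percent}"

def approaching_limit(received, limit, warn_percent=80):
    """Alert condition: the neighbor has crossed the warning threshold."""
    return received >= limit * warn_percent / 100

print(max_prefix_stanza("192.0.2.1", 1000))
# neighbor 192.0.2.1 maximum-prefix 1500 80
print(approaching_limit(1250, 1500))   # True: time to page someone
```

The point of the warning percentage is exactly the one made in the thread: it gives you a chance to investigate a leak before the hard limit fires and the session drops.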
participants (26)
- Anurag Bhatia
- Christian de Larrinaga
- Christian Nielsen
- Christopher Morrow
- Danny McPherson
- Dobbins, Roland
- Dongting Yu
- Faisal Imtiaz
- Geoff Huston
- George Bonser
- goemon@anime.net
- Jay Mitchell
- Jeff Young
- Jeffrey S. Young
- Joe Maimon
- Julien Goodwin
- Leo Bicknell
- Nick Hilliard
- Peter Ehiwe
- Randy Bush
- Richard Barnes
- Shane Amante
- Steven Bellovin
- Tom Hill
- Valdis.Kletnieks@vt.edu
- virendra rode