Max Prefix Out, was Re: Verizon 701 Route leak?


Michael Still

29 Aug 2017

6:41 p.m.

I agree a max-prefix outbound could potentially be useful and would hopefully not be too terribly difficult to implement for most vendors. Perhaps RFC4486 would need to be updated to reflect this as a possibility as well?

On Mon, Aug 28, 2017 at 5:41 PM, Julien Goodwin <nanog@studio442.com.au> wrote:

...

On 28/08/17 18:34, Job Snijders wrote:

...
Finally, it may be worthwhile exploring if we can standardize and promote maximum prefix limits applied on the _sending_ side. This way you protect your neighbor (and the Internet at large) by self-destructing when you inadvertently announce more than what you'd expect to announce. BIRD has this functionality http://bird.network.cz/?get_doc&f=bird-3.html#proto-export-limit however I am not aware of other implementations. Feedback welcome!

Having just dug up the reference for some strange reason...

Back at NANOG38 (2006) Tom Scholl mentioned in a talk on max prefix: "Perhaps maximum-prefix outbound? (Suggested by Eric Bell years ago)" https://www.nanog.org/meetings/nanog38/presentations/scholl-maxpfx.pdf

Notably Juniper does now have prefix-export-limit, but only for readvertisement into IS-IS or OSPF: https://www.juniper.net/documentation/en_US/junos/topics/reference/configura...

-- [stillwaxin@gmail.com ~]$ cat .signature cat: .signature: No such file or directory [stillwaxin@gmail.com ~]$


Alejandro Acosta

31 Aug 2017

5:01 a.m.

What a terrific idea..., simple & useful

On 29/8/17 at 1:41 p.m., Michael Still wrote:

...

I agree a max-prefix outbound could potentially be useful and would hopefully not be too terribly difficult to implement for most vendors.

Perhaps RFC4486 would need to be updated to reflect this as a possibility as well?

On Mon, Aug 28, 2017 at 5:41 PM, Julien Goodwin <nanog@studio442.com.au> wrote:

...
On 28/08/17 18:34, Job Snijders wrote:

...
Finally, it may be worthwhile exploring if we can standardize and promote maximum prefix limits applied on the _sending_ side. This way you protect your neighbor (and the Internet at large) by self-destructing when you inadvertently announce more than what you'd expect to announce. BIRD has this functionality http://bird.network.cz/?get_doc&f=bird-3.html#proto-export-limit however I am not aware of other implementations. Feedback welcome!

Having just dug up the reference for some strange reason...

Back at NANOG38 (2006) Tom Scholl mentioned in a talk on max prefix: "Perhaps maximum-prefix outbound? (Suggested by Eric Bell years ago)" https://www.nanog.org/meetings/nanog38/presentations/scholl-maxpfx.pdf

Notably Juniper does now have prefix-export-limit, but only for readvertisement into IS-IS or OSPF: https://www.juniper.net/documentation/en_US/junos/topics/reference/configura...

Jörg Kost

10:50 a.m.

Hi,

but isn't peer A prefix-out a synonym for peer B prefix-in, that will lead to the same result, e.g. a BGP teardown?

I just feel that this will add another factor, that people will not use or abuse: neigh $x max-out infinite

What about adding an option to the BGP session that A & B do agree on a fixed number of prefixes in both directions, so B's prefix-in could be A's prefix-out automatically?

Jörg

On 31 Aug 2017, at 7:01, Alejandro Acosta wrote:

...

What a terrific idea..., simple & useful

On 29/8/17 at 1:41 p.m., Michael Still wrote:

...
I agree a max-prefix outbound could potentially be useful and would hopefully not be too terribly difficult to implement for most vendors.

Perhaps RFC4486 would need to be updated to reflect this as a possibility as well?

Job Snijders

11:06 a.m.

Dear Jörg,

On Thu, Aug 31, 2017 at 12:50:58PM +0200, Jörg Kost wrote:

...

but isn't peer A prefix-out a synonym for peer B prefix-in, that will lead to the same result, e.g. a BGP teardown?

I just feel that this will add another factor, that people will not use or abuse: neigh $x max-out infinite

I feel you may be overlooking a key aspect here: Currently all of us rely on our peer's 'inbound maximum prefix limit', and obviously these are not always set correctly. An 'outbound maximum prefix limit' offers networks that care about the rest of the world the option to 'self-destruct' the EBGP session in order to protect others.

An 'outbound maximum prefix limit' is a 'permissionless' feature in that you do not require cooperation or support from your peering partner at the other end of the session in order to deploy the 'self-destruct to protect' mechanism. If you don't want to use it, then don't.

If people configure "neighbor $x max-out infinite" that is fine by me: at least they made a conscious choice, it is no worse than today, and it is clearly documented in the running-configuration what the ramifications of the EBGP session could be.

...

What about adding an option to the BGP session that A & B do agree on a fixed number of prefixes in both directions, so B's prefix-in could be A's prefix-out automatically?

I prefer unilateral permissionless mechanisms. Adding new negotiable options to BGP sessions is a lot of work and requires both parties to run software that supports the new feature, whatever it is. Anything that can be done without requiring your peer's cooperation will be more robust.

Kind regards,

Job

Jörg Kost

1:21 p.m.

Hi,

but in reality you will factorise and summarize outbound and inbound numbers, create spare room for sessions and failover scenarios, and therefore leaks, especially partial leaks, can still occur.

In another example scenario, the BGP process may shut down not only the session to B that has run into an outbound warning, but all other sessions as well, to prevent "leaks". As a last resort the router will judge only by the number of prefixes and could therefore shut itself down by accident, especially if this router was not the origin. That could be a global headache ;-)

Jörg

On 31 Aug 2017, at 13:06, Job Snijders wrote:

...

Dear Jörg,

On Thu, Aug 31, 2017 at 12:50:58PM +0200, Jörg Kost wrote:

...
but isn't peer A prefix-out a synonym for peer B prefix-in, that will lead to the same result, e.g. a BGP teardown?

I just feel that this will add another factor, that people will not use or abuse: neigh $x max-out infinite

I feel you may be overlooking a key aspect here: Currently all of us rely on our peer's 'inbound maximum prefix limit', and obviously these are not always set correctly. An 'outbound maximum prefix limit' offers networks that care about the rest of the world the option to 'self-destruct' the EBGP session in order to protect others.

Michael Still

2:02 p.m.

I think what this is, is just a (potentially) new knob that can be turned. If you don't want to turn it, that's your deal; you run your network how you want. There's been no suggestion that there be some explicit default value, or even that it's turned on by default, so behavior won't change unless configured, and if you configure it, you are on the hook for knowing how that might affect the behavior of your network.

I would expect BGP speakers (router vendors / software devs) to implement this in a way such that it would syslog or otherwise trigger when the number of outbound prefixes reaches a specific percentage (of the configured limit) or a hard number, so that either an engineer could respond or automation could take place to do something in response.

On Thu, Aug 31, 2017 at 9:21 AM, Jörg Kost <jk@ip-clear.de> wrote:

...

Hi,

but in reality you will factorise and summarize outbound and inbound numbers, create spare room for sessions and failover scenarios, and therefore leaks, especially partial leaks, can still occur.

In another example scenario, the BGP process may shut down not only the session to B that has run into an outbound warning, but all other sessions as well, to prevent "leaks". As a last resort the router will judge only by the number of prefixes and could therefore shut itself down by accident, especially if this router was not the origin. That could be a global headache ;-)

Jörg

On 31 Aug 2017, at 13:06, Job Snijders wrote:

...
Dear Jörg,

On Thu, Aug 31, 2017 at 12:50:58PM +0200, Jörg Kost wrote:

...
but isn't peer A prefix-out a synonym for peer B prefix-in, that will lead to the same result, e.g. a BGP teardown?

I just feel that this will add another factor, that people will not use or abuse: neigh $x max-out infinite

I feel you may be overlooking a key aspect here: Currently all of us rely on our peer's 'inbound maximum prefix limit', and obviously these are not always set correctly. An 'outbound maximum prefix limit' offers networks that care about the rest of the world the option to 'self-destruct' the EBGP session in order to protect others.

-- [stillwaxin@gmail.com ~]$ cat .signature cat: .signature: No such file or directory [stillwaxin@gmail.com ~]$
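The behaviour Michael expects from vendors (nothing changes unless configured, a warning at some percentage of the configured limit, and a teardown beyond it) can be sketched as a small decision function. This is a hypothetical model, not any vendor's or BIRD's implementation; the function name, the 80% default warning threshold, and the choice to tear down only when the limit is exceeded (rather than reached) are all assumptions.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    OK = "ok"
    WARN = "warn"          # e.g. log to syslog or hand off to automation
    TEARDOWN = "teardown"  # "self-destruct" the EBGP session


def check_max_out(advertised: int, max_out: Optional[int],
                  warn_pct: int = 80) -> Action:
    """Evaluate a hypothetical outbound maximum-prefix limit for one neighbor.

    advertised -- prefixes we are about to announce to this neighbor
    max_out    -- configured limit; None models "neighbor $x max-out infinite"
    warn_pct   -- percentage of max_out at which a warning is raised (assumed)
    """
    if max_out is None:
        return Action.OK  # knob not configured: behaviour unchanged
    if advertised > max_out:
        return Action.TEARDOWN
    if advertised * 100 >= max_out * warn_pct:
        return Action.WARN
    return Action.OK


print(check_max_out(700, 1000))    # Action.OK
print(check_max_out(850, 1000))    # Action.WARN
print(check_max_out(1001, 1000))   # Action.TEARDOWN
```

Because the check runs entirely on the sending side, it needs no support from the peer, which is the "permissionless" property Job describes.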

Leo Bicknell

3:24 p.m.

In a message written on Thu, Aug 31, 2017 at 12:50:58PM +0200, Jörg Kost wrote:

...

What about adding an option to the BGP session that A & B do agree on a fixed number of prefixes in both directions, so B's prefix-in could be A's prefix-out automatically?

As others have pointed out, that's harder to do, but there's a different reason it may not be desirable. If a peer sets a limit to tear down the session with no automatic reset, forcing a call to their NOC to get a human to reset it, then it may be advantageous to set your side to tear down at N-1 prefixes. That way you can ensure restoration at the speed of your NOC, and not at the speed of your peer's.

-- 
Leo Bicknell - bicknell@ufp.org
PGP keys at http://www.ufp.org/~bicknell/
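Leo's N-1 idea, as a minimal sketch: undercut the peer's inbound limit by one so that your own outbound limit is always the one that fires. It assumes both limits trigger when the count exceeds the configured value; real implementations may differ on whether the limit itself is inclusive, in which case the margin would need adjusting. The function names are hypothetical.

```python
def own_outbound_limit(peer_inbound_limit: int) -> int:
    """Undercut the peer's inbound limit by one prefix (Leo's "N-1")."""
    if peer_inbound_limit < 1:
        raise ValueError("peer limit must be positive")
    return peer_inbound_limit - 1


def first_to_trip(advertised: int, our_max_out: int, peer_max_in: int) -> str:
    """Report whose limit fires first; both are assumed to fire when exceeded."""
    if advertised > our_max_out:
        return "us"    # we tear down, so restoration runs at our NOC's speed
    if advertised > peer_max_in:
        return "peer"  # peer tears down; we wait for their NOC
    return "none"


ours = own_outbound_limit(1000)            # 999
print(first_to_trip(1000, ours, 1000))     # us
print(first_to_trip(1000, 1000, 1000))     # none
print(first_to_trip(1001, 2000, 1000))     # peer
```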

Christopher Morrow

3:57 p.m.

On Thu, Aug 31, 2017 at 11:24 AM, Leo Bicknell <bicknell@ufp.org> wrote:

...

In a message written on Thu, Aug 31, 2017 at 12:50:58PM +0200, Jörg Kost wrote:

...
What about adding an option to the BGP session that A & B do agree on a fixed number of prefixes in both directions, so B's prefix-in could be A's prefix-out automatically?

As others have pointed out, that's harder to do, but there's a different reason it may not be desirable.

If a peer sets a limit to tear down the session with no automatic reset, forcing a call to their NOC to get a human to reset it, then it may be advantageous to set your side to tear down at N-1 prefixes. That way you can ensure restoration at the speed of your NOC, and not at the speed of your peer's.

Generally controlling your own destiny is preferred, I agree with that. I think also being able to say: "I shouldn't ever send more than 477 routes, let's round up for ops reasons to 1k max" seems like a great way to make your network safer for the rest of the network.

Yes, people (as Job and others noted) could set 'too high' limits... ok, that's their decision to make.

Yes, maybe in the 523 prefixes that leak in my example there could be some affected party... I think it's pretty unlikely that there will be wide-scale damage from a small number of routes leaking; there are certainly plenty of documented cases of wide-scale problems from full table leaks though :)

Yes, your sessions might bounce or stay down... it's probably better to go down on some peers and have control to get back up on your side than to cause wide-scale outages due to a full table leak.

I'd be in favor of an output max-prefix limit knob.
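Chris's arithmetic can be written down as a sketch: round the expected outbound count up to the next 1,000 and see how many extra prefixes could leak before the limit trips. The step size and the "always strictly above" rounding rule are assumptions; the 477 and 523 figures come from his example.

```python
def rounded_max_out(expected: int, step: int = 1000) -> int:
    """Round an expected outbound count up to the next multiple of `step`.

    Always lands strictly above `expected`, so an exact multiple still gets headroom.
    """
    if expected < 0 or step <= 0:
        raise ValueError("expected must be >= 0 and step > 0")
    return (expected // step + 1) * step


expected = 477
limit = rounded_max_out(expected)
print(limit, limit - expected)   # 1000 523
```

The second number is the worst-case partial leak the limit tolerates; a full-table leak is still caught.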

Randy Bush

1 Sep 2017

9:26 a.m.

i have 142 largish bgp customers, a large enough number that the number of prefixes i receive from them varies annoyingly. how do i reasonably automate setting of my outbound prefix limit? randy

Patrick W. Gilmore

11:56 a.m.

On Sep 1, 2017, at 5:26 AM, Randy Bush <randy@psg.com> wrote:

...

i have 142 largish bgp customers, a large enough number that the number of prefixes i receive from them varies annoyingly. how do i reasonably automate setting of my outbound prefix limit?

First, it seems you know the inbound, so automating the outbound is simple arithmetic. But even if that is unruly, setting the outbound to, say, 300K or so would keep you from spilling a full table. Not perfect, but better than nothing.

Orrrrrr, perhaps this feature is not for you?

-- TTFN, patrick
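Patrick's "simple arithmetic" could look roughly like this: toward a peer that should only see your customer cone, sum your own prefixes and what you accept from each customer, add headroom, and fall back to a fixed ceiling when those counts are too unruly to use. The 20% default headroom, the function name, and the customer names are assumptions; the 300K fallback is his example.

```python
from typing import Mapping, Optional

FALLBACK_LIMIT = 300_000  # Patrick's "say, 300K": still stops a full-table spill


def outbound_limit(own_prefixes: int,
                   customer_counts: Optional[Mapping[str, int]],
                   headroom_pct: int = 20) -> int:
    """Outbound limit toward a peer that should only see our customer cone.

    customer_counts maps a (hypothetical) customer name to the number of
    prefixes we accept from it; None means those counts are too unruly to use.
    """
    if customer_counts is None:
        return FALLBACK_LIMIT
    cone = own_prefixes + sum(customer_counts.values())
    return cone * (100 + headroom_pct) // 100


print(outbound_limit(50, {"cust-a": 100, "cust-b": 350}))  # 600
print(outbound_limit(50, None))                             # 300000
```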

Christopher Morrow

3:21 p.m.

On Fri, Sep 1, 2017 at 7:56 AM, Patrick W. Gilmore <patrick@ianai.net> wrote:

...

On Sep 1, 2017, at 5:26 AM, Randy Bush <randy@psg.com> wrote:

...
i have 142 largish bgp customers, a large enough number that the number of prefixes i receive from them varies annoyingly. how do i reasonably automate setting of my outbound prefix limit?

First, it seems you know the inbound so automating the outbound is simple arithmetic.

I would have said the same... you ought to know high-water marks for your inbound peer count(s), and can work out a +20% outbound... you also probably can survey your outbound peerings and +20% some high-water mark there, or make a 'meet in the middle' between the inbound-math and outbound-math.

...

But even if that is unruly, setting the outbound to, say, 300K or so would keep you from spilling a full table. Not perfect, but better than nothing.

Orrrrrr, perhaps this feature is not for you?

-- TTFN, patrick
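Chris's two estimates and the "meet in the middle" between them can be sketched as below. Interpreting "meet in the middle" as a plain average of the inbound-derived and outbound-derived limits is an assumption, as are the function names and the sample numbers.

```python
from typing import Iterable, Sequence


def plus_pct(value: int, pct: int) -> int:
    """Add pct percent of headroom, rounding down."""
    return value * (100 + pct) // 100


def meet_in_middle(customer_high_water: Iterable[int],
                   outbound_samples: Sequence[int],
                   pct: int = 20) -> int:
    """Average the inbound-derived and outbound-derived limits for one peering."""
    inbound_math = plus_pct(sum(customer_high_water), pct)
    outbound_math = plus_pct(max(outbound_samples), pct)
    return (inbound_math + outbound_math) // 2


# Hypothetical numbers: three customers' high-water marks, and polled outbound counts.
print(meet_in_middle([400, 300, 300], [900, 1100, 1050]))  # 1260
```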

Randy Bush

2 Sep 2017

3:40 a.m.

...

...
...
i have 142 largish bgp customers, a large enough number that the number of prefixes i receive from them varies annoyingly. how do i reasonably automate setting of my outbound prefix limit?

First, it seems you know the inbound so automating the outbound is simple arithmetic.

I would have said the same... you ought to know high-water marks for your inbound peer count(s), and can work out a +20% outbound...

you just assumed that the transitive closure of everybody's cones implement and propagate count. ain't gonna happen.

Job Snijders

7:05 a.m.

On Sat, 2 Sep 2017 at 05:41, Randy Bush <randy@psg.com> wrote:

...

...
...
...
i have 142 largish bgp customers, a large enough number that the number of prefixes i receive from them varies annoyingly. how do i reasonably automate setting of my outbound prefix limit?

First, it seems you know the inbound so automating the outbound is simple arithmetic.

I would have said the same... you ought to know high-water marks for your inbound peer count(s), and can work out a +20% outbound...

you just assumed that the transitive closure of everybody's cones implement and propagate count. ain't gonna happen.

I am not sure what the issue here is. If I can tell my peering partner a recommended maximum prefix value for them to set on their side, surely I can configure that same value on my side as the upper outbound limit.

Kind regards,

Job

Randy Bush

7:27 a.m.

...

...
...
...
...
i have 142 largish bgp customers, a large enough number that the number of prefixes i receive from them varies annoyingly. how do i reasonably automate setting of my outbound prefix limit?

First, it seems you know the inbound so automating the outbound is simple arithmetic.

I would have said the same... you ought to know high-water marks for your inbound peer count(s), and can work out a +20% outbound...

you just assumed that the transitive closure of everybody's cones implement and propagate count. ain't gonna happen.

I am not sure what the issue here is. If I can tell my peering partner a recommended maximum prefix value for them to set on their side, surely I can configure that same value on my side as the upper outbound limit.

which is why i do not tell peers a max count.

this stuff works for small isps, in the lab, ... but not at scale; especially when you have isps as customers. i wish it did.

bgp at scale is rather dynamic. i suspect your $dayjob's irr filters being exact help a bit.

randy

Job Snijders

8:16 a.m.

On Sat, Sep 02, 2017 at 04:27:03PM +0900, Randy Bush wrote:

...

...
I am not sure what the issue here is. If I can tell my peering partner a recommended maximum prefix value for them to set on their side, surely I can configure that same value on my side as the upper outbound limit.

which is why i do not tell peers a max count.

I think you'll find that some of your peers will make an educated guess and set an inbound limit anyway. Actively requesting that no limit is applied may make one part of a fringe minority. Most networks publish a baseline number via a rendezvous point like PeeringDB, this makes it easy to signal to larger groups what the recommended values are.

...

this stuff works for small isps, in the lab, ... but not at scale; especially when you have isps as customers. i wish it did.

In this context "small ISPs" may account for the majority of the target audience. It appears there are about 50,000 "origin only" ASNs [1]; for the majority of those it'll be straightforward to decide on a sensible max-out value. BGP-speaking CDN caching nodes are also low-hanging fruit. But even for a network like NTT I can see benefits of a max-out limit in a number of scenarios.

...

bgp at scale is rather dynamic. i suspect your $dayjob's irr filters being exact help a bit.

Yes, BGP is dynamic, but these days a lot of the topology at the wholesale level has been firmly pinned down through mechanisms like 'peerlock' [2]. Speaking as an ISP for ISPs: NTT/2914 applies an inbound maximum-prefix limit on each and every EBGP session.

Kind regards,

Job

[1]: http://bgp.potaroo.net/cgi-bin/plota?file=%2fvar%2fdata%2fbgp%2fas2%2e0%2fbgp-as-term%2etxt&descr=Origin%20only%20ASes&ylabel=Origin%20only%20ASes&with=step
[2]: http://instituut.net/~job/peerlock_manual.pdf

Christopher Morrow

4:08 p.m.

(from earlier randy)

...

you just assumed that the transitive closure of everybody's cones implement and propagate count. ain't gonna happen.

well, I was thinking that you can survey your customers to know their approximate inbound number, and you can implement a max-prefix in from them with that (ideally you're already doing that). You can figure out the output from you as well in a similar fashion.

In either case you're not implementing a limit that's 1% larger than the actual number; you're hedging the number, for at least operational overhead reasons, by 20-40%. Even a large ISP is sending (today) fewer than 100k prefixes when the peer isn't asking for 'full routes'.

So, I'd imagine you bucket your customers as:

- default only: limit 10
- customer prefixes only: limit +30% of your customer routes set
- full transit: +20% of current full table

(yes, you may have more buckets than me, meh)

and those are good starting points; if you keep these bucketed you can just ratchet up the limits as time requires. The prefix-limits (in or out) aren't there to stop jim-isp from sending 2 of jane-isp's routes; they're there to keep jim-isp from making a bad situation very bad. You (ideally!) have prefix-lists to limit jim from sending jane's routes.

On Sat, Sep 2, 2017 at 4:16 AM, Job Snijders <job@instituut.net> wrote:

...

On Sat, Sep 02, 2017 at 04:27:03PM +0900, Randy Bush wrote:

...
...
I am not sure what the issue here is. If I can tell my peering partner a recommended maximum prefix value for them to set on their side, surely I can configure that same value on my side as the upper outbound limit.

which is why i do not tell peers a max count.

I think you'll find that some of your peers will make an educated guess and set an inbound limit anyway. Actively requesting that no limit is applied may make one part of a fringe minority.

This is a quick survey of your peers and setting the buckets from above at 'sane' limits, right?

...

Most networks publish a baseline number via a rendezvous point like PeeringDB, this makes it easy to signal to larger groups what the recommended values are.

...
this stuff works for small isps, in the lab, ... but not at scale; especially when you have isps as customers. i wish it did.

In this context "small ISPs" may account for the majority of the target audience. It appears there are about 50,000 "origin only" ASNs [1]; for the majority of those it'll be straightforward to decide on a sensible max-out value. BGP-speaking CDN caching nodes are also low-hanging fruit. But even for a network like NTT I can see benefits of a max-out limit in a number of scenarios.

...
bgp at scale is rather dynamic. i suspect your $dayjob's irr filters being exact help a bit.

Yes, BGP is dynamic, but these days a lot of the topology at the wholesale level has been firmly pinned down through mechanisms like 'peerlock' [2].

Speaking as an ISP for ISPs: NTT/2914 applies an inbound maximum-prefix limit on each and every EBGP session.

you can answer this if you want, or not.. but I'm curious, is this tuned-per-peer? or via some bucket form as I proposed above? I expect ntt COULD do per-peer, since I think almost-all-config is auto-generated, but I'd be curious still if you decided at the bucket level instead because it's saner to think about it that way (for me anyway) or just went 'current +N%' for each peer?

...

Kind regards,

Job

[1]: http://bgp.potaroo.net/cgi-bin/plota?file=%2fvar%2fdata%2fbgp%2fas2%2e0%2fbgp-as-term%2etxt&descr=Origin%20only%20ASes&ylabel=Origin%20only%20ASes&with=step
[2]: http://instituut.net/~job/peerlock_manual.pdf
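Chris's three starting buckets translate directly into a lookup. This is a sketch of his scheme only; the enum and function names are hypothetical, and the customer-route and full-table sizes in the usage line are made-up inputs rather than measured values.

```python
from enum import Enum


class Bucket(Enum):
    DEFAULT_ONLY = "default only"
    CUSTOMER_ROUTES = "customer prefixes only"
    FULL_TRANSIT = "full transit"


def bucket_limit(bucket: Bucket, customer_routes: int, full_table: int) -> int:
    """Prefix limit for a session, following Chris's three starting buckets."""
    if bucket is Bucket.DEFAULT_ONLY:
        return 10
    if bucket is Bucket.CUSTOMER_ROUTES:
        return customer_routes * 130 // 100   # +30% of the customer-route set
    if bucket is Bucket.FULL_TRANSIT:
        return full_table * 120 // 100        # +20% of the current full table
    raise ValueError(f"unknown bucket: {bucket}")


# Hypothetical current sizes; re-run as they grow to "ratchet up" the limits.
for b in Bucket:
    print(b.value, bucket_limit(b, customer_routes=80_000, full_table=650_000))
```

Keeping sessions grouped this way means only two inputs need refreshing as the table grows, rather than one limit per session.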

Job Snijders

5:41 p.m.

On Sat, Sep 02, 2017 at 12:08:41PM -0400, Christopher Morrow wrote:

...

...
I think you'll find that some of your peers will make an educated guess and set an inbound limit anyway. Actively requesting that no limit is applied may make one part of a fringe minority.

This is a quick survey of your peers and setting the buckets from above at 'sane' limits, right?

yes

...

...
Speaking as an ISP for ISPs: NTT/2914 applies an inbound maximum-prefix limit on each and every EBGP session.

you can answer this if you want, or not.. but I'm curious, is this tuned-per-peer? or via some bucket form as I proposed above? I expect ntt COULD do per-peer, since I think almost-all-config is auto-generated, but I'd be curious still if you decided at the bucket level instead because it's saner to think about it that way (for me anyway) or just went 'current +N%' for each peer?

I can contribute two examples:

NTT (AS 2914):

We use both approaches. For downstream customers a simple bucket system is used (currently with just one bucket: 31000 for IPv4, 2000 for IPv6). On the peering side of things the announcement count for each peer is polled at regular intervals and a +N% limit is set. In both approaches an override option is available in case someone emails the NOC "hey, we are about to turn up something big, can you ensure there is sufficient headroom".

Coloclue (AS 8283):

For every peering partner, data is fetched from the PeeringDB API and the fields visible here https://www.peeringdb.com/asn/2914 as 'IPv4 Prefixes' and 'IPv6 Prefixes' are used as input into the router configuration process. Coloclue's formula is simple: if the field's value is less than 100, set the limit to 100; if the value is over 100, add 10% to whatever value was published. This process is executed every 12 hours. In case no PDB record for the ASN exists: set 10,000 for IPv4 / 1,000 for IPv6. A manual override mechanism exists.

If I compare the two: NTT's method emphasizes business continuity and has no external dependencies, while Coloclue (being a network for experimentation) explores how to avoid explicit NOC-to-NOC coordination and relies on self-published data being kept up to date.

Whatever your cooking method, maximum prefix limits should never get in the way of normal operations (e.g. organic growth), but exist only to try to nip obvious route leaks in the bud. This means one can be quite generous when picking values.

Kind regards,

Job
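The Coloclue formula Job describes fits in a few lines. This sketch covers only the arithmetic; fetching the 'IPv4 Prefixes' / 'IPv6 Prefixes' values from the PeeringDB API is left out. Two details the description does not pin down are assumptions here: a published value of exactly 100 maps to the floor of 100, and "+10%" rounds down.

```python
from typing import Optional

# Used when the peer has no PeeringDB record (values from Job's description).
NO_RECORD_DEFAULT = {4: 10_000, 6: 1_000}


def coloclue_style_limit(published: Optional[int], afi: int) -> int:
    """Max-prefix limit from a PeeringDB-published prefix count.

    published -- the peer's 'IPv4 Prefixes' or 'IPv6 Prefixes' value, or None
    afi       -- 4 or 6
    """
    if published is None:
        return NO_RECORD_DEFAULT[afi]
    if published <= 100:      # assumption: exactly 100 also maps to the floor
        return 100
    return published * 110 // 100   # +10%, rounded down (rounding is an assumption)


print(coloclue_style_limit(None, 4))   # 10000
print(coloclue_style_limit(42, 6))     # 100
print(coloclue_style_limit(1500, 4))   # 1650
```

Running this every 12 hours, as described, means a peer's limit tracks whatever it last published, which is why the approach depends on that data being kept up to date.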

Theodore Baschak

3 Sep 2017

12:05 a.m.

...

On Sep 2, 2017, at 12:41 PM, Job Snijders <job@instituut.net> wrote:

Coloclue (AS 8283):

For every peering partner, data is fetched from the PeeringDB API and the fields visible here https://www.peeringdb.com/asn/2914 as 'IPv4 Prefixes' and 'IPv6 Prefixes' are used as input into the router configuration process. Coloclue's formula is simple, if the field's value is less than 100, set the limit to 100, if the value is over 100: add 10% to whatever value was published. This process is executed every 12 hours. In case no PDB record for the ASN exists: set 10,000 for IPv4 / 1,000 for IPv6. A manual override mechanism exists.

If I compare the two: NTT's method emphasizes business continuity and has no external dependencies, Coloclue (being a network for experimentation) explores how to avoid explicit noc-to-noc coordination and relies on self-published data being kept up to date.

How has the Coloclue max-prefix method described worked out? This sounds pretty effective for this type of network. How often has manual intervention (beyond a pre-arranged manual override) been required?

Theodore Baschak - AS395089 - Hextet Systems
https://bgp.guru/ - https://hextet.net/
http://mbix.ca/ - http://mbnog.ca/

Randy Bush

3 a.m.

...

well, I was thinking that you can survey your customers to know their approximate inbound number, you can implement a max-prefix in from them with that (ideally you're already doing that).

You can figure out the output from you as well in a similar fashion.

In either case you're not implementing a limit that's 1% larger than the actual number; you're hedging the number, for at least operational overhead reasons, by 20-40%. Even a large ISP is sending (today) fewer than 100k prefixes when the peer isn't asking for 'full routes'.

So, I'd imagine you bucket your customers as:

- default only: limit 10
- customer prefixes only: limit +30% of your customer routes set
- full transit: +20% of current full table

(yes, you may have more buckets than me, meh)

and those are good starting points; if you keep these bucketed you can just ratchet up the limits as time requires. The prefix-limits (in or out) aren't there to stop jim-isp from sending 2 of jane-isp's routes; they're there to keep jim-isp from making a bad situation very bad. You (ideally!) have prefix-lists to limit jim from sending jane's routes.

first, i have no magic bullet. sure wish i did. and i do not mean ill using ntt as an example; after all, job assures us they are very very important and very smart :)

even pulling from peeringdb, which is about as well-maintained as the irr (a race to the bottom), as job suggests, this relies on manual maintenance. it assumes the same count at all peerings, etc. etc. and the registered counts are horrifyingly approximate; ntt could leak 10k prefixes and not hit the limit as published. that they are gross approximations shows that they are not at all rigorous, calculated, ...

this is not to say that any reasonable prefix count would have allowed the full-table goog leak to vz. and vz could have used an as-path filter not allowing _goog_(lotso-tier-ones)_ (which ntt uses, for example).

but without a rigorous source of ground truth, prefix count limits will be approximate upper bounds and hence allow large mis-announcements. it is one tool in a sadly sparse toolbox, and not a strong one.

randy
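Randy's alternative, an AS-path filter rejecting routes in which Google's ASN is immediately followed by a tier-1, is shown below as a Python sketch of the matching logic rather than router configuration. 15169 is Google's primary ASN; the set of transit ASNs is a partial, illustrative list, not an authoritative tier-1 roster; the function name is hypothetical. On a router this would be an as-path regex in the style Randy writes, e.g. a Cisco-style _15169_(174|701|1299|2914|3356)_.

```python
from typing import Collection, Sequence

GOOGLE_ASN = 15169
# Illustrative subset of large transit networks; not an authoritative tier-1 list.
BIG_TRANSIT = frozenset({174, 701, 1299, 2914, 3356})


def peer_leaks_big_transit(as_path: Sequence[int], peer_asn: int,
                           big_transit: Collection[int]) -> bool:
    """True if peer_asn is immediately followed by a big-transit ASN in the path.

    AS_PATH is listed nearest-first, so [peer, big, ...] means the peer is
    re-announcing a route it learned from that big network.
    """
    return any(left == peer_asn and right in big_transit
               for left, right in zip(as_path, as_path[1:]))


print(peer_leaks_big_transit([15169, 3356, 64500], GOOGLE_ASN, BIG_TRANSIT))  # True
print(peer_leaks_big_transit([15169, 64496], GOOGLE_ASN, BIG_TRANSIT))        # False
```

Unlike a prefix count, this check does not depend on anyone publishing accurate numbers; it encodes a topology assumption, which is the idea behind the 'peerlock' approach Job references.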

Tassos Chatzithomaoglou

31 Aug 2017

5:20 p.m.

I guess you're looking into something similar to https://tools.ietf.org/html/draft-keyur-idr-bgp-prefix-limit-orf.

-- Tassos

Jörg Kost wrote on 31/8/17 13:50:

...

What about adding an option to the BGP session that A & B do agree on a fixed number of prefixes in both directions, so B's prefix-in could be A's prefix-out automatically?

Jörg


participants (11)

Alejandro Acosta
Christopher Morrow
Job Snijders
Job Snijders
Jörg Kost
Leo Bicknell
Michael Still
Patrick W. Gilmore
Randy Bush
Tassos Chatzithomaoglou
Theodore Baschak