any dangers of filtering every /24 on full internet table to preserve FIB space ?
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
What do you think about this approach ?
Also maybe you know - some advices for edge routers that have at least 8x100G interfaces and "good" memory for prefix count ?
Thanks
Sounds good to me, especially if you are prioritizing IPv6 routes in your FIB.
What is your use case? Selling BGP transit might be tricky, since you will not be sending specifics to your downstreams. If you are an edge network like me, taking a default from your upstream solves all problems, and you can filter and TE as you wish.
On Mon, Oct 10, 2022 at 05:58:45PM +0300, Edvinas Kairys <edvinas.email@gmail.com> wrote a message of 35 lines which said:
But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc.
I don't think this is true, even in theory, specially for legacy prefixes. There is probably somewhere a Geoff Huston survey on /24 without a covering route.
On Mon, Oct 10, 2022 at 05:20:33PM +0200, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote a message of 10 lines which said:
I don't think this is true, even in theory, specially for legacy prefixes.
I even found an example on my employer's network :-) 192.93.0.0/24
Heho,
Let alone $all the /24 assigned under the RIPE waiting list policy.
In the Geoff Huston spirit, I quickly took a look how less specifics for /24s looks in my table:
8 {'no_less_specific': 16, 'has_less_specific': 0, 'sum': 16, 'least_specific_length': {}}
9 {'no_less_specific': 9, 'has_less_specific': 4, 'sum': 13, 'least_specific_length': {'8': 4}}
10 {'no_less_specific': 38, 'has_less_specific': 0, 'sum': 38, 'least_specific_length': {}}
11 {'no_less_specific': 98, 'has_less_specific': 4, 'sum': 102, 'least_specific_length': {'10': 4}}
12 {'no_less_specific': 269, 'has_less_specific': 31, 'sum': 300, 'least_specific_length': {'9': 5, '11': 23, '8': 1, '10': 2}}
13 {'no_less_specific': 490, 'has_less_specific': 98, 'sum': 588, 'least_specific_length': {'11': 46, '8': 2, '12': 48, '10': 2}}
14 {'no_less_specific': 1022, 'has_less_specific': 188, 'sum': 1210, 'least_specific_length': {'13': 99, '8': 4, '12': 44, '11': 33, '10': 8}}
15 {'no_less_specific': 1641, 'has_less_specific': 476, 'sum': 2117, 'least_specific_length': {'14': 210, '13': 95, '12': 87, '11': 37, '8': 21, '9': 3, '10': 23}}
16 {'no_less_specific': 10319, 'has_less_specific': 3286, 'sum': 13605, 'least_specific_length': {'15': 577, '14': 527, '12': 470, '13': 548, '9': 14, '11': 446, '8': 173, '10': 531}}
17 {'no_less_specific': 4474, 'has_less_specific': 3942, 'sum': 8416, 'least_specific_length': {'16': 1816, '14': 536, '12': 276, '13': 343, '8': 44, '15': 324, '9': 4, '10': 181, '11': 418}}
18 {'no_less_specific': 6926, 'has_less_specific': 7179, 'sum': 14105, 'least_specific_length': {'17': 888, '14': 1367, '16': 2394, '15': 776, '12': 289, '13': 487, '9': 15, '11': 514, '8': 108, '10': 341}}
19 {'no_less_specific': 15056, 'has_less_specific': 10151, 'sum': 25207, 'least_specific_length': {'17': 813, '16': 2561, '15': 1113, '13': 758, '14': 1373, '12': 544, '18': 1213, '8': 198, '9': 28, '11': 770, '10': 780}}
20 {'no_less_specific': 19592, 'has_less_specific': 24430, 'sum': 44022, 'least_specific_length': {'17': 1319, '14': 3435, '16': 6868, '12': 1216, '11': 1568, '13': 2221, '15': 1919, '18': 1450, '19': 2465, '9': 45, '8': 374, '10': 1550}}
21 {'no_less_specific': 22889, 'has_less_specific': 30065, 'sum': 52954, 'least_specific_length': {'17': 1886, '16': 5234, '14': 2569, '13': 1346, '19': 5019, '18': 1717, '12': 2011, '9': 78, '20': 3210, '15': 1760, '8': 513, '11': 3001, '10': 1721}}
22 {'no_less_specific': 59137, 'has_less_specific': 51280, 'sum': 110417, 'least_specific_length': {'17': 3787, '16': 10049, '13': 5469, '19': 4100, '14': 3784, '21': 3287, '18': 3128, '11': 2965, '12': 3428, '20': 4152, '15': 3157, '8': 1018, '9': 166, '10': 2790}}
23 {'no_less_specific': 41052, 'has_less_specific': 60043, 'sum': 101095, 'least_specific_length': {'17': 3844, '21': 3382, '14': 3324, '16': 10032, '22': 13207, '19': 5658, '15': 3007, '18': 2973, '11': 2243, '13': 1645, '12': 1752, '9': 277, '20': 4941, '8': 1260, '10': 2498}}
24 {'no_less_specific': 257032, 'has_less_specific': 295714, 'sum': 552746, 'least_specific_length': {'22': 38330, '17': 19319, '16': 51487, '21': 16799, '23': 14813, '13': 10067, '14': 14328, '20': 26634, '18': 19216, '12': 9440, '11': 10001, '15': 14437, '9': 2700, '19': 31119, '10': 7992, '8': 9032}}
So it seems like there is a healthy amount (~260k) prefixes which lack a less specific.
With best regards,
Tobias
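A count like the one above can be approximated with Python's standard `ipaddress` module. This is only a minimal sketch, not the script used for the numbers in this thread: the function name `less_specific_stats` is made up for illustration, the input is assumed to be a plain list of announced prefixes, and a default route is deliberately not counted as a covering prefix.

```python
import ipaddress

def less_specific_stats(prefixes):
    """Count, per prefix length, how many prefixes have a covering less-specific."""
    table = {ipaddress.ip_network(p) for p in prefixes}
    stats = {}
    for net in table:
        row = stats.setdefault(
            net.prefixlen, {"no_less_specific": 0, "has_less_specific": 0}
        )
        # Try every shorter length; the default route (/0) is deliberately ignored.
        covered = any(
            net.supernet(new_prefix=length) in table
            for length in range(net.prefixlen - 1, 0, -1)
        )
        row["has_less_specific" if covered else "no_less_specific"] += 1
    return stats

if __name__ == "__main__":
    demo = ["10.0.0.0/16", "10.0.1.0/24", "192.0.2.0/24"]
    for length, row in sorted(less_specific_stats(demo).items()):
        print(length, row)
```

Each prefix costs at most 31 set lookups for IPv4, so a full table should be workable on a laptop, although a real run would read the prefixes from a RIB dump rather than a literal list.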
On 11 Oct 2022, at 4:23 am, Tobias Fiebig <tobias@reads-this-mailinglist.com> wrote:
Heho, Let alone $all the /24 assigned under the RIPE waiting list policy.
In the Geoff Huston spirit, I quickly took a look how less specifics for /24s looks in my table:
[…]
So it seems like there is a healthy amount (~260k) prefixes which lack a less specific.
I also looked using a slightly different approach - namely looking for /24s where there was no spanning aggregate that matched the /24’s AS Path. In my local table there are 224,580 of them. Geoff
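One possible reading of the AS-path variant, as a hedged sketch: a /24 only counts as covered when some shorter prefix carries exactly the same AS path. The function name and the input shape (a dict of prefix to AS-path tuple) are assumptions for illustration; how the 224,580 figure was actually computed is not stated in the thread.

```python
import ipaddress

def slash24s_without_matching_aggregate(routes):
    """routes maps prefix string -> AS path (sequence of ASNs).

    Returns the /24s for which no shorter covering prefix has the same AS path.
    """
    table = {ipaddress.ip_network(p): tuple(path) for p, path in routes.items()}
    orphans = []
    for net, path in table.items():
        if net.prefixlen != 24:
            continue
        # Look at every possible covering prefix from /23 up to /1.
        has_match = any(
            table.get(net.supernet(new_prefix=length)) == path
            for length in range(23, 0, -1)
        )
        if not has_match:
            orphans.append(str(net))
    return sorted(orphans)
```

With this stricter test, a /24 whose only covering route points through a different path is treated as uncovered, which matters for the blackholing cases described later in the thread.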
The OP can always take the provider's address space plus their customers' routes and use a default route to fill in the blanks. I did this at a provider years ago, when the global routing table was growing faster than they could spend money on upgrades, and it worked out well. I think it was two upstreams and a connection into a TIE with good peering.
-richey
I frequently do this (accept peers' prefixes and their customers' prefixes), and it works out well. Then you can choose where you want the rest of it to go. With multiple peers in your country this works out quite well.
On 10/10/22 9:20 AM, Stephane Bortzmeyer wrote:
But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. I don't think this is true, even in theory, specially for legacy prefixes. There is probably somewhere a Geoff Huston survey on /24 without a covering route.
There are lots of legacy blocks out there that are only /24 in size... Like my primary IPv4 block, for example. I believe you'll also find them referred to as 'portable' blocks.
--
Brielle Bruns
The Summit Open Source Development Group
http://www.sosdg.org / http://www.ahbl.org
< rant > there once used to be 'swamp' space, down in the low 190s where /24s were expected. and folk/rirs tried to keep shorter aggregates, e.g. /19s, as the norm above swamp (negotiated at ietf/danvers). in those days, one could actually filter above swamp on /19. for a while, one could even have classful filters in A and B space. and a soda pop used to be a nickel or a dime. those days are long gone. i expect that we will accept prefixes longer than /24 as things get tighter. and i doubt we can socialize constraining that fragmentation to one section of the space. it is a tragedy that cidr and an open market has helped us more than ipv6 has. randy
I can't find the original message, so replying to the wrong spot in the thread, but... no, filtering /24s is a bad idea if you want (more or less) all your packets to get to their destinations.
If you filter all /24s you will lose reachability to 4x /24s I publish that have no covering route because they are not contiguous and not part of any larger logical aggregate. Then there's the 10-20 legacy /24s I *don't* currently publish - if I start advertising them, you won't be able to reach them, either, because they're in the same boat: discontiguous singletons.
There are a LOT of legacy discontiguous IPv4 singletons assigned out of the old Class-C space to small/medium businesses, schools, etc. in the pre-ARIN days, and I would guess that the vast majority of them do not have a correct covering /23 or larger - certainly none of the ones I'm currently working with/aware of do. I believe there's at least a couple of DNS servers running in my /24s, so you could potentially lose access to much more than those /24s.
Your packet will *probably* hit a next-hop carrier who happens to have the more-specific /24, and it will *probably* eventually reach me, but I thought everyone more-or-less agreed that internet routing was already nondeterministic enough as it is?
IMHO, if you don't want all the /24s in your FIB (or even RIB!), just pick a carrier, set a default route, and stop worrying about all the headaches BGP provides.
Alternately, a valid technique is to have a default route AND a partial BGP feed (a filtered full feed is by definition a partial feed). That helps optimize outbound routing a little bit, you still get the advantage - mostly - of multiple inbound carriers; but you still have to pick one carrier to do the heavy lifting for you. And you are paying them to route for you, so that's not an unfair shifting of the routing burden, unlike relying on covering routes. Note that this approach does NOT provide any redundancy, unlike having full BGP feeds.
Separately, I don't know if Geoff has produced such a survey/article, but if not he can probably type it from memory by now :-).
Adam Thompson Consultant, Infrastructure Services MERLIN 100 - 135 Innovation Drive Winnipeg, MB R3T 6A8 (204) 977-6824 or 1-800-430-6404 (MB only) https://www.merlin.mb.ca Chat with me on Teams: athompson@merlin.mb.ca
On 10/20/22 17:50, Adam Thompson wrote:
Alternately, a valid technique is to have a default route AND a partial BGP feed (a filtered full feed is by definition a partial feed). That helps optimize outbound routing a little bit, you still get the advantage - mostly - of multiple inbound carriers; but you still have to pick one carrier to do the heavy lifting for you. And you are paying them to route for you, so that's not an unfair shifting of the routing burden, unlike relying on covering routes. Note that this approach does NOT provide any redundancy, unlike having full BGP feeds.
As a note, you can get redundancy (but still none of the best-path advantages of having multiple transits) by asking your transits to originate default in their BGP feed and then selectively accepting it. You can either ECMP it or pick priority with localpref. You need multiple full-view transits for this to work, though. -- Brandon Martin
I can't believe that never occurred to me in all the time I was doing that, 'way back when... <facepalm>
Thanks for pointing that out!
-Adam
Been doing exactly this for a couple ASNs for a few years now with surprisingly good results (thanks to advice way far back from my good friend Brandon Martin above, coincidentally). One of them is even on an L3 switch with something like 96k max routes.
Taking defaults from two upstream providers and ECMPing between them. This particular AS is a pretty predictable network, so after running netflow for a while, compiling a list of the top ~1000 outbound ASes we talk to, and then creating route filters to allow any prefixes from this AS list into our forwarding table, it now has something like 98% of its traffic by volume covered by specifics from all our upstreams, and of course ECMP defaults to fall back on for the remaining 2%.
Not pretty, but have had surprisingly zero issues or traffic weirdness over a few years now - when customers want to play bgp but refuse to buy actual routers you have to get creative :)
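The "top ~1000 ASes by volume" selection described above can be sketched roughly as follows. This is illustrative only: the function name, the `(destination ASN, bytes)` record shape, and the sample numbers are assumptions, and real netflow exports would need their own parsing before reaching this step.

```python
from collections import Counter

def top_asn_coverage(flows, top_n):
    """flows: iterable of (destination_asn, byte_count) records.

    Returns (list of the top_n ASNs by volume, fraction of bytes they cover).
    """
    volume = Counter()
    for asn, nbytes in flows:
        volume[asn] += nbytes
    total = sum(volume.values())
    top = [asn for asn, _ in volume.most_common(top_n)]
    covered = sum(volume[asn] for asn in top)
    return top, (covered / total if total else 0.0)

if __name__ == "__main__":
    # Hypothetical sample data using documentation ASNs.
    sample = [(64500, 700), (64501, 200), (64502, 50), (64500, 30), (64503, 20)]
    asns, share = top_asn_coverage(sample, top_n=2)
    print(asns, f"{share:.0%} of bytes covered by specifics")
```

The resulting ASN list would then feed whatever mechanism builds the inbound route filters; everything outside the list falls back to the ECMP defaults.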
There are 69,055 pure /24s allocated or assigned directly from the RIRs. At least the c, d, e, and g root servers only have /24s allocated to them. Major services like Cloudflare only advertise the /24 without advertising an aggregate.
Unless you're also getting a default from upstream, it sounds like you're going to end up wasting the money you saved on chasing down subtle brokenness.
If you filter out /23 or longer you cut the v4 table size about in half. I have done this with some edge and eyeball network clients that had really old or underpowered routing gear and upgrades were just not in the budget, and they could barely spell BGP. I know of a number of ASNs with SUP720 era gear still in production this way in 2022 (the power bill is usually someone else’s budget!).
Be sure to take default from a couple upstreams and allow /24s for the peers on your IXP links that matter (CDN, etc) and your traffic is mostly fine. Maybe not always taking the most direct return path, but it gets there. Inbound traffic distribution isn’t affected and that is all most eyeball networks care about.
-- Jim Troutman, jamesltroutman@gmail.com Pronouns: he/him/his 207-514-5676 (cell)
On Mon, Oct 10, 2022 at 7:58 AM Edvinas Kairys <edvinas.email@gmail.com> wrote:
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
What do you think about this approach ?
If you have a default route that works and you don't have any downstream customers which expect a full routing table, this is fine. You just won't get as good results with the /24s. Beware that MOST Internet /24 routes are NOT covered by a shorter prefix so unless you specifically cover them they will be lost. This will severely impact your Internet connectivity. The Internet FIB is around 900k IPv4 routes. You have years before exhausting a 2.2M table. Regards, Bill Herrin -- For hire. https://bill.herrin.us/resume/
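To put rough numbers on "years before exhausting a 2.2M table", here is a back-of-the-envelope sketch. The 900k and 2.2M figures come from this thread; the growth rates are purely hypothetical placeholders, and the estimate ignores IPv6 and internal routes, which also consume FIB entries.

```python
import math

def years_linear(current, capacity, new_routes_per_year):
    """Years until the table is full if it grows by a fixed number of routes per year."""
    return (capacity - current) / new_routes_per_year

def years_compound(current, capacity, annual_growth):
    """Years until the table is full if it grows by a fixed percentage per year."""
    return math.log(capacity / current) / math.log(1.0 + annual_growth)

if __name__ == "__main__":
    current, capacity = 900_000, 2_200_000  # figures quoted in the thread
    # Both rates below are assumptions, not measurements.
    print(f"linear, +50k routes/year: {years_linear(current, capacity, 50_000):.1f} years")
    print(f"compound, +5%/year: {years_compound(current, capacity, 0.05):.1f} years")
```

Plugging in growth figures from your own table history gives a more meaningful answer than either placeholder.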
Feasibility of adding some middleware that culls unneeded routes (existing more specific and aggregate routes pointing to the same next hop), when that table starts to fill? Not great for passing downstream, but should fill a need internally.
----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com
nanog@ics-il.net (Mike Hammett) wrote:
Feasibility of adding some middleware that culls unneeded routes (existing more specific and aggregate routes pointing to the same next hop), when that table starts to fill?
Well... if that covering prefix goes away, let's hope you still have a default.
I've (been forced to) cull long prefixes on some memory-starved routers, and given that all of them have defaults, for our networks (former employers' and certainly the current one's) I've seen moderate to no traffic shifting, and this approach gave the museum gear another lease on life - they were fine with bandwidth. I've even gone down to stripping anything longer than a /20 in v4, and a /40 in v6.
If you run a backbone that needs to know the best exit for a prefix in order to throw traffic out locally and not pay good money for sightseeing capacity, you might fare better with beefier routing engines.
El Mare.
My assumption is that it's not a one-and-done scenario - that the middleware continually adjusts.
----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com
On Mon, Oct 10, 2022 at 8:37 AM Mike Hammett <nanog@ics-il.net> wrote:
Feasibility of adding some middleware that culls unneeded routes (existing more specific and aggregate routes pointing to the same next hop), when that table starts to fill?
This is called "FIB aggregation." It exists and works but is not widely adopted. Regards, Bill Herrin -- For hire. https://bill.herrin.us/resume/
There have been a number of efforts to implement FIB (actually BGP RIB) compression. There’s a white paper from MS Research; I recall Spotify talking about running off-box BGP compression software and re-injecting a summarized BGP RIB; Volta Networks had an implementation of full BGP table compression to about 370K routes with no connectivity loss and reasonably fast reaction to topology changes/disaggregation when needed (it even won some Intel prize for innovation). Not sure what happened to it (Volta was acquired by IBM some time ago).
To my memory, IOS-XR allows off-box custom best-path logic and re-injection of routes into the BGP RIB.
Cheers, Jeff
On Mon, Oct 10, 2022 at 11:18 AM Jeff Tantsura <jefftant.ietf@gmail.com> wrote:
There has been a number of efforts to implement FIB (actually BGP RIB) compression. There’s a white paper from MS research; I recall Spotify talking of running off-box BGP compression SW and re-injecting summarized BGP RIB;
Hi Jeff, Actually, I was talking about FIB aggregation/compression where you take a full BGP RIB but optimize the prefixes placed in the local FIB so that you don't need a FIB entry for every RIB entry. Solid FIB compression can give you a 50% savings in FIB entries without altering the next hop selected for any reachable destination IP address. Costs main processor time computing the FIB from the RIB of course, which can be challenging in a large change event (e.g. a link loss). Excepting source aggregation and terminal default routes, RIB compression and reinjection is dangerous on multiple levels from accidental leaks to connectivity loss and (in my opinion) should not be done. Regards, Bill Herrin -- For hire. https://bill.herrin.us/resume/
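As a rough illustration of the idea, the simplest form of FIB aggregation drops any prefix whose next hop is identical to that of its nearest covering prefix, since longest-prefix match then falls through to the same answer. This is a minimal sketch with made-up function names, not any vendor's algorithm, and it makes no claim about reaching the 50% savings mentioned above.

```python
import ipaddress

def compress_fib(rib):
    """rib maps prefix string -> next hop. Returns a smaller table with the
    same forwarding result for every destination address."""
    table = {ipaddress.ip_network(p): nh for p, nh in rib.items()}

    def parent_next_hop(net):
        # Next hop of the longest covering prefix in the ORIGINAL table, if any.
        for length in range(net.prefixlen - 1, -1, -1):
            nh = table.get(net.supernet(new_prefix=length))
            if nh is not None:
                return nh
        return None

    # Keep a prefix only when it changes the answer given by its parent.
    return {str(n): nh for n, nh in table.items() if parent_next_hop(n) != nh}

def lookup(fib, address):
    """Plain longest-prefix match, used here only to check the result."""
    addr = ipaddress.ip_address(address)
    matches = [(ipaddress.ip_network(p), nh) for p, nh in fib.items()]
    matches = [(n, nh) for n, nh in matches if addr in n]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None
```

The filter has to be recomputed whenever a covering prefix or a next hop changes, which is the main-processor cost during large change events mentioned above.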
On 10/10/22 16:58, Edvinas Kairys wrote:
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
I wouldn't bank on that. I am confident I have seen /24's with no covering route, more so for PI space from RIR's that may only be able to allocate a /24 and nothing shorter. It would be one heck of an experiment, though :-). Mark.
I may or may not have done something like this at $PREVIOUS_DAY_JOB.
We (might have) discovered some interesting brokenness on the Internet in doing so; in one case, a peer was sending a /20 across exchange peering sessions with us, along with some more specific /24s. After filtering out the /24s, traffic rightly flowed to the covering /20. Peer reached out in an outraged huff; the /24s were being advertised from non-backbone-connected remote sites in their network, that suddenly couldn't fetch content from us anymore. Traceroutes from our side followed the /20 back to their "core", and then died. They explained the /24s were being advertised from remote sites without backbone connections to the site advertising the /20, and we needed to stop sending traffic to the /20, and send it directly to the /24 instead. We demurred, and let them know we were correctly following the information in the routing table.
They became even more huffy, insisting that we were breaking the internet by not following the correct routing for the more-specific /24s which were no longer present in our tables. No amount of trying to explain to them that they should not advertise an aggregate route if no connectivity to the more specific constituents existed seemed to get the point across. In their eyes, advertising the /24s meant that everyone should follow the more specific route to the final destination directly.
So, even seeing a 'covering route' in the table is no guarantee that you won't create subtle and not-so-subtle breakage when filtering out more specifics to save table space. ^_^;
Having (possibly) done this once in the past, I'd strongly recommend looking for a different solution--or at least be willing to arm your front-end response team with suitable "No, *you* broke the Internet" asbestos suits before running a git commit to push your changes out to all the affected devices in your network. ;)
Matt
On Oct 10, 2022, at 6:37 PM, Matthew Petach <mpetach@netflight.com> wrote:
On Mon, Oct 10, 2022 at 8:44 AM Mark Tinka <mark@tinka.africa> wrote: On 10/10/22 16:58, Edvinas Kairys wrote:
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
I wouldn't bank on that.
I am confident I have seen /24's with no covering route, more so for PI space from RIR's that may only be able to allocate a /24 and nothing shorter.
It would be one heck of an experiment, though :-).
Mark.
I may or may not have done something like this at $PREVIOUS_DAY_JOB.
We (might have) discovered some interesting brokenness on the Internet in doing so; in one case, a peer was sending a /20 across exchange peering sessions with us, along with some more specific /24s. After filtering out the /24s, traffic rightly flowed to the covering /20. Peer reached out in an outraged huff; the /24s were being advertised from non-backbone-connected remote sites in their network, that suddenly couldn't fetch content from us anymore. Traceroutes from our side followed the /20 back to their "core", and then died. They explained the /24s were being advertised from remote sites without backbone connections to the site advertising the /20, and we needed to stop sending traffic to the /20, and send it directly to the /24 instead. We demurred, and let them know we were correctly following the information in the routing table.
We encountered similar behavior, but not from a network disaggregating their own address space like this. Rather, it was a network (actually a network services vendor) who had a PA /24 from a colo provider that they were no longer interconnected with.
We had to filter /24s on transit (our network does not resell transit) due to issues with spanslogic inefficiency on Nexus 7k. When trying to turn up a demo with this vendor, connections were not establishing. We found that they had an older PA /24 in the FIB but we were following a /20 or some such route to their old upstream/colo.
We ended up doing a bunch of work to find other such “possibly disconnected /24s” based mainly on origin ASN, and put in exceptions to our filtering until we could complete some hardware upgrades.
In situations like this, we of course did have functioning default routes from our upstream, but that doesn’t help since the /20 from a peer was attracting and blackholing the traffic.
As IPv4 continues to disaggregate and get resold and otherwise optimized, I imagine this will become more common. Not a problem for a multi-homed stub network with multiple default routes coming from upstream, unless they have peering and don’t micromanage it with this in mind.
Ryan
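One way to approximate the "possibly disconnected /24s" search described above is to flag every /24 whose origin ASN differs from the origin of its closest covering route. This is a guess at the general idea, not the method actually used; the function name `suspect_24s`, the prefixes, and the documentation-range ASNs are all hypothetical, and the input is assumed to be IPv4 prefixes mapped to origin ASNs.

```python
import ipaddress

def suspect_24s(routes):
    """routes maps prefix -> origin ASN. Flag /24s whose nearest covering
    route is originated by a different ASN."""
    nets = {ipaddress.ip_network(p): asn for p, asn in routes.items()}
    flagged = []
    for net, asn in nets.items():
        if net.prefixlen != 24:
            continue
        covers = [(c, a) for c, a in nets.items()
                  if c.prefixlen < 24 and net.subnet_of(c)]
        if not covers:
            continue  # no covering route at all: a separate problem
        cover, cover_asn = max(covers, key=lambda c: c[0].prefixlen)
        if cover_asn != asn:
            flagged.append((str(net), asn, str(cover), cover_asn))
    return flagged

# Hypothetical table; 64500 and 64501 are documentation ASNs.
routes = {
    "10.0.0.0/20": 64500,
    "10.0.3.0/24": 64501,   # different origin than the /20: candidate exception
    "10.0.5.0/24": 64500,   # same origin as the /20: not flagged
}
print(suspect_24s(routes))
# -> [('10.0.3.0/24', 64501, '10.0.0.0/20', 64500)]
```

A differing origin does not prove the /24 is disconnected from the aggregate, so the output is a list of candidates to check by hand before adding filter exceptions.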
They became even more huffy, insisting that we were breaking the internet by not following the correct routing for the more-specific /24s which were no longer present in our tables. No amount of trying to explain to them that they should not advertise an aggregate route if no connectivity to the more specific constituents existed seemed to get the point across. In their eyes, advertising the /24s meant that everyone should follow the more specific route to the final destination directly.
So, even seeing a 'covering route' in the table is no guarantee that you won't create subtle and not-so-subtle breakage when filtering out more specifics to save table space. ^_^;
+1
Having (possibly) done this once in the past, I'd strongly recommend looking for a different solution--or at least be willing to arm your front-end response team with suitable "No, *you* broke the Internet" asbestos suits before running a git commit to push your changes out to all the affected devices in your network. ;)
Matt
On 10/11/22 00:37, Matthew Petach wrote:
They became even more huffy, insisting that we were breaking the internet by not following the correct routing for the more-specific /24s which were no longer present in our tables. No amount of trying to explain to them that they should not advertise an aggregate route if no connectivity to the more specific constituents existed seemed to get the point across. In their eyes, advertising the /24s meant that everyone should follow the more specific route to the final destination directly.
Certainly, an interesting, half-technical angle to consider when thinking of doing something like this. Folk that are pushing out /24's with the expectation of the rest of the Internet steering traffic a certain way toward them, being surprised by the "brokenness" that can be created due to the decision to override "longest match" in favour of spending less cash.
Who has the right to complain the least, or the most, in such a situation?
A: "So why don't you have a bigger router that can take our /24?"
B: "Well, we don't have the money to afford taking a /24."
A: "Ummh... but you are breaking BGP, and..."
B: "Yeah... it's my Autonomous System. Sorry!"
As my South African friend would say, "It's wild".
Mark.
On Mon, Oct 10, 2022 at 3:37 PM Matthew Petach <mpetach@netflight.com> wrote:
They became even more huffy, insisting that we were breaking the internet by not following the correct routing for the more-specific /24s which were no longer present in our tables. No amount of trying to explain to them that they should not advertise an aggregate route if no connectivity to the more specific constituents existed seemed to get the point across. In their eyes, advertising the /24s meant that everyone should follow the more specific route to the final destination directly.
Hi Matthew,
They were correct. If the /24 was reaching your network, traffic should not have been following the /20. In your version, they would have to disaggregate the /20 into 16 /24s just because you didn't want to honor most-specific path routing. That's not what anybody wants. Least of all you.
One of my service providers has multiple disconnected sites. At each site they advertise the Internet's full BGP table to me -except for- the routes to their other sites. They insist they're doing the right thing but they're not. The BGP table they send me is not full, and when I need to talk to many of -their- servers my traffic ends up routing through one of my other service providers. I'm not paying them enough to make a big stink about it but if I was you can bet I'd take them to task.
Regards, Bill Herrin
-- For hire. https://bill.herrin.us/resume/
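The "16 /24s" figure follows from the prefix lengths: a /20 splits into 2^(24-20) = 16 /24s. A minimal check with the standard `ipaddress` module, using a made-up private-range /20 as the aggregate:

```python
import ipaddress

aggregate = ipaddress.ip_network("10.0.0.0/20")  # hypothetical aggregate
more_specifics = list(aggregate.subnets(new_prefix=24))

print(len(more_specifics))                       # -> 16, i.e. 2 ** (24 - 20)
print(more_specifics[0], more_specifics[-1])     # -> 10.0.0.0/24 10.0.15.0/24
```

So withdrawing the aggregate and announcing only the pieces would replace one table entry with sixteen, which is the cost being argued over here.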
On Tue, Oct 11, 2022 at 7:41 AM William Herrin <bill@herrin.us> wrote:
On Mon, Oct 10, 2022 at 3:37 PM Matthew Petach <mpetach@netflight.com> wrote:
They became even more huffy, insisting that we were breaking the internet by not following the correct routing for the more-specific /24s which were no longer present in our tables. No amount of trying to explain to them that they should not advertise an aggregate route if no connectivity to the more specific constituents existed seemed to get the point across. In their eyes, advertising the /24s meant that everyone should follow the more specific route to the final destination directly.
Hi Matthew,
They were correct. If the /24 was reaching your network, traffic should not have been following the /20. In your version, they would have to disaggregate the /20 into 16 /24s just because you didn't want to honor most-specific path routing. That's not what anybody wants. Least of all you.
I disagree. To illustrate why, let's take your case a step further, shall we?
Wouldn't that same argument mean that every ISP that isn't honoring my /26 announcement, but is instead following the covering /24, or /20, or whatever sized prefix is equally in the wrong?
And what about Fred's /27 announcement? Gosh, and now Cindy wants to announce a dozen /30's--is it everyone else's error for not listening to those announcements?
What makes /24 boundaries magically "OK" to filter on, such that if you announce something smaller than a /24 that gets filtered, and traffic goes to the covering aggregate, everyone says "well, that's just how the Internet works, and of course traffic would be expected to flow towards the covering announcement", but if I set the boundary at a different, but still arbitrarily-sized point, like /23, suddenly the announcing party is right, and I'm wrong?
If the stance is "it doesn't matter if there's a covering prefix, that announcement doesn't mean you can reach all the prefixes contained within it, you *must* listen to all the smaller announcements in order to have reachability", then A) you're redefining how BGP works in a fundamental way, and B) we should all buy stock in router memory manufacturers, because they're going to be the next oil companies.
BGP 101 says that if I announce a covering prefix, I'm making a statement into the BGP routing table that says "you can reach everything contained within this covering route via me", and that's how the forwarding tables treat it; any time there's nothing more specific in the table, even due to a brief transient change on the Internet, traffic for those prefixes will be forwarded to the router announcing the covering prefix.
If I announce 0/1 into the DFZ and drop any traffic destined for it on the floor, I'm not going to get much sympathy by saying "well, it's your fault, you should have been listening to all the more specifics and not trusting the covering route to actually have reachability to the prefixes contained within it." (though that does make me think that if you're a content-heavy shop looking to balance your traffic flows, it might be an interesting way to make the point in a very real way to everyone on the Internet...)
To wrap up--I disagree with your assertion because it depends entirely on a 'magic' /24 boundary that makes it OK to filter more specifics smaller than it, but not OK to filter larger than that and depend instead on covering prefixes, without actually being based on anything concrete in BGP or published standards. "But that's how we've always done it" is not the same as "but that's how the protocol works." ^_^;
Thank you for the discussion!
Matt
On Tue, Oct 11, 2022 at 1:15 PM Matthew Petach <mpetach@netflight.com> wrote:
Wouldn't that same argument mean that every ISP that isn't honoring my /26 announcement, but is instead following the covering /24, or /20, or whatever sized prefix is equally in the wrong?
What makes /24 boundaries magically "OK" to filter on,
Hi Matthew, /24 is the consensus filtering level for Internet-wide routes and it has been for decades. It became the consensus as a holdover from "class C" and remains the consensus because too many people would have to cooperate to change it. Indeed, a little over a decade ago some folks tried to change it to /19 and then /20 for prefixes outside "the swamp" and, well, they failed. Likewise, more than a few folks announce /26's to their immediate transit providers and they simply don't move very deep into the system -- nobody has any expectation that they will.
To wrap up--I disagree with your assertion because it depends entirely on a 'magic' /24 boundary that makes it OK to filter more specifics smaller than it, but not OK to filter larger than that and depend instead on covering prefixes, without actually being based on anything concrete in BGP or published standards.
Got any better reasons besides disliking the consensus? Regards, Bill Herrin -- For hire. https://bill.herrin.us/resume/
The /24 is as small as it will get before it cuts into profits for the tiny bit of administration it would take to announce /25, /26. This argument is almost as old as my kids. Is it fair or just? Probably not, but that's the way the consensus seems to want it.
Richard
Richard Golodner
Infratection IT Services
On Tue, Oct 11, 2022 at 1:59 PM William Herrin <bill@herrin.us> wrote:
On Tue, Oct 11, 2022 at 1:15 PM Matthew Petach <mpetach@netflight.com> wrote:
Wouldn't that same argument mean that every ISP that isn't honoring my /26 announcement, but is instead following the covering /24, or /20, or whatever sized prefix is equally in the wrong?
What makes /24 boundaries magically "OK" to filter on,
Hi Matthew,
/24 is the consensus filtering level for Internet-wide routes and it has been for decades. It became the consensus as a holdover from "class C" and remains the consensus because too many people would have to cooperate to change it. Indeed, a little over a decade ago some folks tried to change it to /19 and then /20 for prefixes outside "the swamp" and, well, they failed. Likewise, more than a few folks announce /26's to their immediate transit providers and they simply don't move very deep into the system -- nobody has any expectation that they will.
Yes, I know. I was there when smd was pointing out the arbitrary lines being drawn in the sand, and decided to draw his own line.
The first salvo was fired in 1996, with a customer complaining their /24 wasn't being accepted by everyone, leading to a *very* long chorus of people chiming in with different thoughts on where the line could and should be drawn: https://archive.nanog.org/mailinglist/mailarchives/old_archive/1996-01/msg00...
My point is that it's not a feature of BGP, it's a purely human convention, arrived at through the intersection of pain and laziness. There's nothing inherently "right" or "wrong" about where the line was drawn, so for networks to decide that /24 is causing too much pain, and moving the line to /23 is no more "right" or "wrong" than drawing the line at /24. A network that *counts* on its non-connected sites being reachable because they're over a mythical /24 limit is no more right than a customer upset that their /25 announcements aren't being listened to.
To wrap up--I disagree with your assertion because it depends entirely on a 'magic' /24 boundary that makes it OK to filter more specifics smaller than it, but not OK to filter larger than that and depend instead on covering prefixes, without actually being based on anything concrete in BGP or published standards.
Got any better reasons besides disliking the consensus?
Absolutely. Let BGP work as it's supposed to work.
If there's a covering prefix being announced, according to BGP, it's a valid pathway to reach all the prefixes contained within it. If that's not how your network is constructed, don't send out your announcements that way. Only announce prefixes for which you *do* have actual reachability.
Consensus isn't a guarantee. "SHOULD" in an RFC is still just a recommendation, and not following it isn't an error. If you're worried about memory in your routers, and you decide to move the line from /24 to /23 or /22, that's not an error, that's not breaking BGP, that's just moving an arbitrary line that was set by stressed and busy network engineers nearly 3 decades ago.
If a network engineer feels the need to filter out longer prefixes to deal with a memory shortage in their devices, that's their decision; my anecdote was to point out you'll likely run into people who don't understand BGP very well, and mistakenly think there's some magical guarantee that /24 or shorter prefixes will always work, while longer prefixes won't. And that's just not at all true. BGP simply looks for the longest match in the available table, whatever that might be, and uses whatever the "most specific" match is, no matter how long or short it might be.
Networks should always keep that in mind when announcing prefixes; don't announce a prefix you aren't prepared to handle the traffic for, no matter what traffic engineering tweaks you might be attempting to steer traffic away. You should always assume that for whatever reason, if you announce a prefix, there's a good chance that other networks will see that as the best match and make use of it. If you don't want it used for traffic, don't announce it.
Thanks!
Matt
On Tue, Oct 11, 2022 at 5:32 PM Matthew Petach <mpetach@netflight.com> wrote:
My point is that it's not a feature of BGP, it's a purely human convention, arrived at through the intersection of pain and laziness. There's nothing inherently "right" or "wrong" about where the line was drawn, so for networks to decide that /24 is causing too much pain, and moving the line to /23 is no more "right" or "wrong" than drawing the line at /24.
Hi Matthew, If you defy convention in a manner which causes things that normally work to break, your implementation is "wrong" for a fairly important definition of "wrong."
Let BGP work as it's supposed to work.
If there's a covering prefix being announced, according to BGP, it's a valid pathway to reach all the prefixes contained within it. If that's not how your network is constructed, don't send out your announcements that way. Only announce prefixes for which you *do* have actual reachability.
All TCP/IP routing is more-specific route first. That is the expected behavior. I honestly don't fathom your view that BGP is or should be different from that norm. If the origin of a covering route has no problem sinking the traffic when the more-specific is offline, I don't see the problem. You shouldn't be taking them offline with route filtering. Regards, Bill Herrin -- For hire. https://bill.herrin.us/resume/
On Tue, Oct 11, 2022 at 7:03 PM William Herrin <bill@herrin.us> wrote:
On Tue, Oct 11, 2022 at 5:32 PM Matthew Petach <mpetach@netflight.com> wrote: [...] All TCP/IP routing is more-specific route first. That is the expected behavior. I honestly don't fathom your view that BGP is or should be different from that norm. If the origin of a covering route has no problem sinking the traffic when the more-specific is offline, I don't see the problem. You shouldn't be taking them offline with route filtering.
*facepalm*
Right. That's the entire point I started off the subthread with.
The problem lay with an organization that *did* have a problem sinking the traffic when the more-specific was not available. They had chunked up their allocation into smaller pieces which were distributed to different island locations with no internal network connectivity to the island sites.
They were announcing a covering prefix for all the more specifics, where the covering less specific announcement had no reachability to the more specifics; so when a network filtered out the more specifics, the traffic fell on the floor, because it was sent to a location that was announcing the supernet that had no reachability to the correct destination.
Their assumption that *everyone* would hear the more specifics, and thus the traffic would flow to the right island location was the "failure to understand BGP" that I was commenting on, and noting that while it is entirely correct to decide if you want to filter prefixes of an arbitrary length from entering your network, you may discover in the process that other networks that do not understand BGP and routing in general may complain that you have Broken The Internet(tm) by doing so.
Assuming that your announcement of more specifics will always pull traffic away from a less-specific announcement is overly-optimistic. While it may *often* work, you should still be prepared to deal with traffic arriving at your least-specific announcement as well.
This turned out to be something that not every network on the Internet fully grasps, and my original message was warning that filtering on /24s would potentially bring complaints from networks like those.
It took a roundabout path, but I'm glad we eventually both ended up at the same place. :)
Thanks!
Matt
On Sun, Oct 16, 2022 at 1:01 AM Matthew Petach <mpetach@netflight.com> wrote:
Their assumption that *everyone* would hear the more specifics, and thus the traffic would flow to the right island location was the "failure to understand BGP" that I was commenting on, and noting that while it is entirely correct to decide if you want to filter prefixes of an arbitrary length from entering your network, you may discover in the process that other networks that do not understand BGP and routing in general may complain that you have Broken The Internet(tm) by doing so.
Matthew,
We studied aggregation to death back in the IRTF Routing Research Group. The bottom line is that you can aggregate at the source and you can aggregate at the BGP leaf nodes (transits, no downstreams or peers) but RIB aggregation anywhere else in the interdomain protocol breaks the network. You may wish that you could filter those more-specific prefixes but you are quite mistaken: that is NOT how BGP works. In point of fact, we couldn't come up with any theoretical interdomain routing protocol in which it was possible to filter conventionally legitimate prefixes and have the system operate reasonably. As near as we could determine, no such thing exists.
When I design a covering route, I include a VPN to the site with the more-specific to catch the occasional misrouted packet. But then I also parse the TCP SYN packets and reduce the MSS because there are knuckleheads which think they can filter ICMP and have TCP merrily work without functional path MTU discovery. Those folks are wrong too, TCP doesn't work the way they think, but I'd rather keep the customer than win the argument.
Regards, Bill Herrin
Assuming that your announcement of more specifics will always pull traffic away from a less-specific announcement is overly-optimistic. While it may *often* work, you should still be prepared to deal with traffic arriving at your least-specific announcement as well.
This turned out to be something that not every network on the Internet fully grasps, and my original message was warning that filtering on /24s would potentially bring complaints from networks like those.
It took a roundabout path, but I'm glad we eventually both ended up at the same place. :)
Thanks!
Matt
-- For hire. https://bill.herrin.us/resume/
This situation isn’t helped by RIR policies that require you to announce the aggregate in region even if the more specifics are scattered around the world. The whole territorial exclusivity game played by some RIRs may well cause more harm than good at this point. Yes, I realize this is a reversal of my previous views on the subject. I’m becoming more aware of more circumstances in which this idea is fraught and causing problems for legitimate users more than for policy forum shoppers and leasing companies. Owen
On Oct 16, 2022, at 01:01, Matthew Petach <mpetach@netflight.com> wrote:
On Tue, Oct 11, 2022 at 7:03 PM William Herrin <bill@herrin.us> wrote: On Tue, Oct 11, 2022 at 5:32 PM Matthew Petach <mpetach@netflight.com> wrote: [...] All TCP/IP routing is more-specific route first. That is the expected behavior. I honestly don't fathom your view that BGP is or should be different from that norm. If the origin of a covering route has no problem sinking the traffic when the more-specific is offline, I don't see the problem. You shouldn't be taking them offline with route filtering.
*facepalm*
Right. That's the entire point I started off the subthread with.
The problem lay with an organization that *did* have a problem sinking the traffic when the more-specific was not available. They had chunked up their allocation into smaller pieces which were distributed to different island locations with no internal network connectivity to the island sites.
They were announcing a covering prefix for all the more specifics, where the covering less specific announcement had no reachability to the more specifics; so when a network filtered out the more specifics, the traffic fell on the floor, because it was sent to a location that was announcing the supernet that had no reachability to the correct destination.
Their assumption that *everyone* would hear the more specifics, and thus the traffic would flow to the right island location was the "failure to understand BGP" that I was commenting on, and noting that while it is entirely correct to decide if you want to filter prefixes of an arbitrary length from entering your network, you may discover in the process that other networks that do not understand BGP and routing in general may complain that you have Broken The Internet(tm) by doing so.
Assuming that your announcement of more specifics will always pull traffic away from a less-specific announcement is overly-optimistic. While it may *often* work, you should still be prepared to deal with traffic arriving at your least-specific announcement as well.
This turned out to be something that not every network on the Internet fully grasps, and my original message was warning that filtering on /24s would potentially bring complaints from networks like those.
It took a roundabout path, but I'm glad we eventually both ended up at the same place. :)
Thanks!
Matt
Matthew Petach писал(а) 2022-10-11 20:33:
My point is that it's not a feature of BGP, it's a purely human convention, arrived at through the intersection of pain and laziness. There's nothing inherently "right" or "wrong" about where the line was drawn, so for networks to decide that /24 is causing too much pain, and moving the line to /23 is no more "right" or "wrong" than drawing the line at /24. A network that *counts* on its non-connected sites being reachable because they're over a mythical /24 limit is no more right than a customer upset that their /25 announcements aren't being listened to.
IMO this line wasn't arbitrary; it was (and still is) the smallest possible network size allocated by RIRs. So it's just common sense to receive everything down to /24 to have the complete data about all Internet participants.
-- Kind regards, Andrey
On Wed, Oct 12, 2022 at 7:54 AM Andrey Kostin <ankost@podolsk.ru> wrote:
IMO this line wasn't arbitrary, it was (and it still is) a smallest possible network size allocated by RIRs. So it's just a common sense to receive everything down to /24 to have the complete data about all Internet participants.
Hi Andrey,
Filtering of routes longer than /24 came first and is the cause here, while the RIR minimum assignment is an effect. The RIRs stay at /24 because it would be implicitly wasteful to assign addresses in units smaller than can be routed on the public Internet. Of the things that would have to change to make longer prefixes routeable on the Internet, the RIR policies are the easiest.
The /24 boundary is simply a holdover from pre-CIDR times when the smallest routing unit was a "class C." Folks wanted to make sure CIDR didn't make their routing woes worse, so they filtered and it stuck.
Regards, Bill Herrin
-- For hire. https://bill.herrin.us/resume/
Andrey, On Oct 12, 2022, at 7:54 AM, Andrey Kostin <ankost@podolsk.ru> wrote:
My point is that it's not a feature of BGP, it's a purely human convention, arrived at through the intersection of pain and laziness. There's nothing inherently "right" or "wrong" about where the line was drawn, so for networks to decide that /24 is causing too much pain, and moving the line to /23 is no more "right" or "wrong" than drawing the line at /24. A network that *counts* on its non-connected sites being reachable because they're over a mythical /24 limit is no more right than a customer upset that their /25 announcements aren't being listened to.
IMO this line wasn't arbitrary, it was (and it still is) a smallest possible network size allocated by RIRs.
There was a period in the mid- to late-90s where some of the RIRs allocated longer than /24s, i.e., to match the amount of address space justified by the requester, even if that meant (say) a /29. This didn’t last very long as one of the (at the time) 800 lb gorillas (Sprint) decided to start filtering at /19 (which IIRC was the default prefix length RIPE-NCC chose to allocate to LIRs) to keep their routers from falling over. In this context, any prefix length, including /24, is arbitrary.
Today, filtering on /24 will probably drop some number of perfectly valid and perhaps better routes to specific destinations (I’m too lazy to look to see). That’s fine as long as there is some covering route that allows the traffic to get from here to there. It feels to me like the responsibility should be on the announcer to ensure there is some covering less-specific for stuff that has "a good chance" of being filtered.
So it's just a common sense to receive everything down to /24 to have the complete data about all Internet participants.
Given infinite resources, sure. However, I believe the issue here, as it was in the mid- to late-90s, is hardware limitations. Having a partial view with (potentially) non-optimal less specifics is better than having your routers fall over. Regards, -drc
David Conrad писал(а) 2022-10-12 11:39:
Andrey,
There was a period in the mid- to late-90s where some of RIRs allocated longer than /24s, i.e., to match the amount of address space justified by the requester, even if that meant (say) a /29. This didn’t last very long as one of the (at the time) 800 lb gorillas (Sprint) decided to start filtering at /19 (which IIRC was the default prefix length RIPE-NCC chose to allocate to LIRs) to keep their routers from falling over.
I'm looking at it only from a practical side. I worked for different ISPs in the RIPE region in the 2000s and never saw anything like that. There was a requirement from RIPE to create an object for any assigned /29 subnet and larger for IP space usage documentation, but from an allocated block. Anyway, even if it's true, it doesn't change anything. There are /24 PI blocks, and if /24s are filtered, a default route must be in use to have functional Internet connectivity.
Thanks, Andrey
On Wed, 12 Oct 2022, Andrey Kostin wrote:
Matthew Petach писал(а) 2022-10-11 20:33:
My point is that it's not a feature of BGP, it's a purely human convention, arrived at through the intersection of pain and laziness. There's nothing inherently "right" or "wrong" about where the line was drawn, so for networks to decide that /24 is causing too much pain, and moving the line to /23 is no more "right" or "wrong" than drawing the line at /24. A network that *counts* on its non-connected sites being reachable because they're over a mythical /24 limit is no more right than a customer upset that their /25 announcements aren't being listened to.
IMO this line wasn't arbitrary, it was (and it still is) a smallest possible network size allocated by RIRs. So it's just a common sense to receive everything down to /24 to have the complete data about all Internet participants.
Nope. I first did some work on this topic in early 2008 and remembered writing a blog entry about it. https://web.archive.org/web/20060926140659/https://www.ripe.net/ripe/docs/ri... RIPE, at least back in 2008, would allocate as long as /29 from several /8s. I have no idea how many sub-/24 allocations they did or what the recipients tried doing with the space. Even then, despite RIPE saying "we'll allocate as long as /29", I set the filter cut-off [arbitrarily] at /24 and made sure we had defaults pointing at ISPs that had "fuller" tables. And just for the record, despite having been bitten by it more than once, I'm very much in the camp of "if you advertise a covering aggregate, you're offering to get packets there, regardless of whether or not more specifics exist." You have no business demanding what routes someone else's network receives/accepts. All you can reasonably control is what you advertise and what you accept.
----------------------------------------------------------------------
Jon Lewis, MCP :) | I route
StackPath, Sr. Neteng | therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
On Wed, Oct 12, 2022 at 11:51:13AM -0400, Jon Lewis wrote: [snip]
And just for the record, despite having been bitten by it more than once, I'm very much in the camp of "if you advertise a covering aggregate, you're offering to get packets there, regardless of whether or not more specifics exist." You have no business demanding what routes someone else's network receives/accepts. All you can reasonably control is what you advertise and what you accept.
This. People will come up with all sorts of creative topologies, and as long as they are *internal* that's A-OK. The distance outside of one's AS one can expect "interesting" TE to travel is equivalent to the reach of your $s and/or contracts. Cheers, Joe -- Posted from my personal account - see X-Disclaimer header. Joe Provo / Gweep / Earthling
On 10/10/22 07:58, Edvinas Kairys wrote:
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
What do you think about this approach ?
Are you multi-homed? If not you can simply take a default. If so, a better approach might be to apply a max AS rule and take full tables plus a default from both (all). Something like "bgp maxas-limit 4" will optimize routing down to /24 but drop routes with long AS paths and punt to default, reducing your table size at the cost of sub-optimal routing to destinations that are going to take a convoluted path anyway. -- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
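A rough sketch of the effect described above: routes whose AS path is longer than the limit are discarded and their traffic follows the default instead. The prefixes and private-use ASNs are made up, and path length is simply counted as the number of entries in the list (prepends included), which is an assumption about how the limit is applied rather than a statement of any vendor's exact behaviour.

```python
def apply_maxas_limit(routes, limit):
    """Split routes into those kept (AS path length <= limit) and those dropped."""
    kept = {prefix: path for prefix, path in routes.items() if len(path) <= limit}
    dropped = sorted(set(routes) - set(kept))
    return kept, dropped

# Hypothetical received routes: prefix -> AS path.
routes = {
    "198.51.100.0/24": [64500, 64501],
    "192.0.2.0/24": [64500, 64506, 64507, 64508],
    "203.0.113.0/24": [64500, 64502, 64503, 64504, 64505],
}

kept, dropped = apply_maxas_limit(routes, limit=4)
print("installed:", sorted(kept))
print("via default:", dropped)
```

Running the same function over a real table dump with several limits would show how much FIB space each setting saves before committing to one.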
On 2022-10-10 09:39, Jay Hennigan wrote:
On 10/10/22 07:58, Edvinas Kairys wrote:
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
If so, a better approach might be to apply a max AS rule and take full tables plus a default from both (all). Something like "bgp maxas-limit 4" will optimize routing down to /24 but drop routes with long AS paths and punt to default, reducing your table size at the cost of sub-optimal routing to destinations that are going to take a convoluted path anyway.
And run something like netflow to determine high traffic AS paths, and optimize those into your filtering.
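One way to sketch that step, assuming flow records have already been exported and reduced to hypothetical `(as_path, bytes)` pairs (the record format here is invented for illustration and is not any collector's actual output):

```python
from collections import Counter

def top_as_paths(flow_records, keep):
    """Sum bytes per AS path and return the `keep` busiest paths, heaviest first."""
    totals = Counter()
    for as_path, byte_count in flow_records:
        totals[tuple(as_path)] += byte_count
    return [path for path, _ in totals.most_common(keep)]

# Hypothetical flow summary using private-use ASNs.
flows = [
    ((64500, 64510), 9_000_000),
    ((64500, 64502, 64503, 64504, 64505), 7_500_000),
    ((64500, 64520, 64521), 40_000),
    ((64500, 64502, 64503, 64504, 64505), 2_500_000),
]

# Paths worth exempting from the filter even though one of them is long.
exempt = top_as_paths(flows, keep=2)
print(exempt)
```

Here the five-AS path carries the most traffic, so it would be a candidate for an explicit permit ahead of the AS-path-length filter.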
I like that idea. ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com ----- Original Message ----- From: "Jay Hennigan" <jay@west.net> To: nanog@nanog.org Sent: Monday, October 10, 2022 10:39:06 AM Subject: Re: any dangers of filtering every /24 on full internet table to preserve FIB space ? On 10/10/22 07:58, Edvinas Kairys wrote:
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
What do you think about this approach ?
Are you multi-homed? If not you can simply take a default. If so, a better approach might be to apply a max AS rule and take full tables plus a default from both (all). Something like "bgp maxas-limit 4" will optimize routing down to /24 but drop routes with long AS paths and punt to default, reducing your table size at the cost of sub-optimal routing to destinations that are going to take a convoluted path anyway. -- Jay Hennigan - jay@west.net Network Engineering - CCIE #7880 503 897-8550 - WB6RDV
----- Original Message -----
From: "Randy Bush" <randy@psg.com> To: "Edvinas Kairys" <edvinas.email@gmail.com> Subject: Re: any dangers of filtering every /24 on full internet table to preserve FIB space ?
we're thinking to deny all /24s to save the memory
i recommend this to all my competitors
So good to know things haven't changed whilst I was in hiding... Cheers, -- jra -- Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://www.bcp38.info 2000 Land Rover DII St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274
There are most definitely a number of organizations that have /24s that are not part of a larger aggregate. If you don’t have a default route to some router that takes the full table on your behalf, then you will lose connectivity to/from those entities. Owen
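Before applying such a filter, the exposure can be estimated from a table dump. The following is a minimal sketch, assuming an IPv4-only list of prefixes as strings; the sample prefixes are documentation/benchmark ranges, not real announcements, and the quadratic scan is only suitable for small samples.

```python
import ipaddress

def uncovered_24s(prefixes):
    """Return the /24s that have no shorter (less-specific) covering prefix in the list."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    shorter = [n for n in nets if n.prefixlen < 24]
    return [
        str(n)
        for n in nets
        if n.prefixlen == 24 and not any(n.subnet_of(s) for s in shorter)
    ]

# Hypothetical sample of a routing table.
sample = ["198.18.0.0/22", "198.18.1.0/24", "203.0.113.0/24", "192.0.2.0/24"]

# These destinations would be unreachable after the filter unless a default exists.
print(uncovered_24s(sample))
```

For a full table, the same check would need a prefix trie or a sorted sweep rather than comparing every /24 against every shorter prefix.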
On Oct 10, 2022, at 07:58 , Edvinas Kairys <edvinas.email@gmail.com> wrote:
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
What do you think about this approach ?
Also maybe you know - some advices for edge routers that have at least 8x100G interfaces and "good" memory for prefix count ? Thanks
You’ll need to be very selective about the IP ranges you apply that filter to, or more likely, just do it and make sure you have one or more default routes to devices/providers that carry full tables. As for alternate devices, have you looked at the Arista 7280, particularly the Jericho >1 versions? Sent from my iPhone
On Oct 10, 2022, at 10:59 AM, Edvinas Kairys <edvinas.email@gmail.com> wrote:
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
What do you think about this approach ?
Also maybe you know - some advices for edge routers that have at least 8x100G interfaces and "good" memory for prefix count ? Thanks
I already had this idea, I even implemented it in the desperate time of the 512K "bug". And with that I can tell you: Do not do it! You will be bothered! But if you want to go this way, what I can recommend is to try not to put routes in the FIB that match your Default. Talking about having a default route other than /dev/null is already a problem at first... Because a Transit Provider is not expected to use the Default route. But I'm not even going to get into that (many flames will arise). If you really decide to use a default route and choose what will and what will not be applied to the FIB, you must be prepared for a certain complexity in these choices. And the more Peers and DFZ views you have, the more complex it will be. In a very simplified hypothetical example of dual-homed DFZ, take the best routes from link B, and leave the default by link A. There are even tools that, comparing flow analysis and routes exported from the RIB, "choose" the routes with more matching packets, and apply that for you. But thinking in this way, the transition to the dark side of the force is already beginning to be made. Walking through the valley of death until arriving in the land of the Route Optimizers. My memory is not helping me... But I think the name of one of the projects that did this magic was rt-flow or flow-rt. Something like that.
On Mon, Oct 10, 2022 at 12:01, Edvinas Kairys <edvinas.email@gmail.com> wrote:
Hello,
We're considering to buy some Cisco boxes - NCS-55A1-24H. That box has 24x100G, but only 2.2mln route (FIB) memory entries. In a near future it will be not enough - so we're thinking to deny all /24s to save the memory. What do you think about that approach - I know it could provide some misbehavior. But theoretically every filtered /24 could be routed via smaller prefix /23 /22 /21 or etc. But of course it could be a situation when denied /24 will not be covered by any smaller prefix.
What do you think about this approach ?
Also maybe you know - some advices for edge routers that have at least 8x100G interfaces and "good" memory for prefix count ? Thanks
-- Douglas Fernando Fischer, Control and Automation Engineer
participants (32)
- Adam Thompson
- Andrey Kostin
- Brandon Martin
- Brie
- Ca By
- David Bass
- David Conrad
- Douglas Fischer
- Edvinas Kairys
- Elmar K. Bins
- Geoff Huston
- Jay Hennigan
- Jay R. Ashworth
- Jeff Tantsura
- Jim Troutman
- Joe Provo
- John Gilmore
- Jon Lewis
- Jon Sands
- Mark Tinka
- Matthew Petach
- Mike Hammett
- Nick Suan
- Owen DeLong
- Randy Bush
- Raymond Burkholder
- Richard Golodner
- richey goldberg
- Ryan Rawdon
- Stephane Bortzmeyer
- Tobias Fiebig
- William Herrin