Hi,
Considering http://thyme.apnic.net:
Total number of prefixes smaller than registry allocations: 113220 !!!!!
/20: 17046
/21: 16106
/22: 20178
/23: 21229
/24: 126450
That says to me that a significant number of these smaller prefixes are due to de-aggregation of PA space and not PI announcements.
My question is: how can I construct a filter / route map that will filter out any more-specific prefix where a less-specific one exists in the BGP table? If my conclusion above is correct, a significant portion (~47%) of the prefixes in the table could be argued to be unnecessary at one level or another.
Is such a filter easily possible, or would it have to be explicitly declared? Any chance of a process that automatically tracks and publishes a list of offending specifics, similar to Team Cymru's Bogon BGP feed?
As a transit consumer, why would I want to carry all this cr*p in my routing table? I would still be getting a BGP route to the larger prefix anyway - let my transit feeds sort out which route they use & traffic engineering.
Thoughts anyone?
Kind Regards
Ben
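The offline check asked about here (which more-specifics have a less-specific in the same table?) can be sketched in a few lines. This is only an illustration, assuming an IPv4 prefix list exported from a BGP table as text; the function name `covered_specifics` is hypothetical and nothing here corresponds to a router feature or an existing published feed.

```python
import ipaddress


def covered_specifics(prefixes):
    """Return the prefixes that have a less-specific covering prefix in the same set.

    `prefixes` is an iterable of IPv4 prefix strings such as "204.91.0.0/17".
    """
    nets = sorted(
        {ipaddress.ip_network(p) for p in prefixes},
        key=lambda n: (n.prefixlen, int(n.network_address)),
    )
    seen = set()
    covered = []
    for net in nets:  # shortest prefixes first, so any covering prefix is already in `seen`
        for plen in range(net.prefixlen - 1, -1, -1):
            if net.supernet(new_prefix=plen) in seen:
                covered.append(str(net))
                break
        seen.add(net)
    return covered


if __name__ == "__main__":
    table = ["204.91.0.0/16", "204.91.0.0/17", "204.91.128.0/17", "10.1.0.0/24"]
    print(covered_specifics(table))
```

Note that a default route (0.0.0.0/0) in the input would mark every other prefix as covered, so it should be excluded before running a check like this.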
On Tue, Jan 15, 2008 at 04:11:36PM -0000, Ben Butler wrote:
As a transit consumer - why would I want to carry all this cr*p in my routing table, I would still be getting a BGP route to the larger prefix anyway - let my transit feeds sort out which route they use & traffic engineering.
Well, you could always just take "Customer" routes from each of your providers (since you're running BGP I presume you're actually multihomed and not adding to the pollution) and point default at one/both providers for the other networks (or take default from one or both of them). - jared -- Jared Mauch | pgp key available via finger from jared@puck.nether.net clue++; | http://puck.nether.net/~jared/ My statements are only mine.
Hi,
Default won't work - I do care about my transit provider's network becoming partitioned, or IXPs having problems, or fiber cuts etc. So I need my router to see the full reachability of a prefix in BGP so that it knows which transit to send it to. Defaults won't work because a routing decision has to be made; my transit originating a default, or me pointing a default at them, does not guarantee the reachability of all prefixes.
But if I can see the /19 in the table, do I care about a load of /24s? The whole of the /19 should be reachable, as the origin AS is announcing it somewhere in their network and it is being received by a transit.
OK, I can dream up a few emergencies where it might be helpful to pin a /24 as well as the /19 - but I am sure there aren't 100K+ emergencies happening continuously in the route table. On the whole it is just a general "whatever", because there is no incentive to stop de-aggregating once you have started.
If they are only announcing the de-aggregated /24s and no summary /19 then my question doesn't apply, as I only want to drop the more specifics where a less specific exists.
I am struggling to see a defensible position for why just shy of 50% of all routes appear to consist mostly of de-aggregated routes, when aggregation is one of the aims RIRs make the LIRs strive to achieve. If we can't clean the mess up because there is no incentive, then can't I simply ignore the duplicates?
Regards
Ben
On 15-Jan-2008, at 11:40, Ben Butler wrote:
Defaults won't work because a routing decision has to be made; my transit originating a default, or me pointing a default at them, does not guarantee the reachability of all prefixes.
Taking a table that won't fit in RAM similarly won't guarantee reachability of anything :-)
Filter on assignment boundaries and supplement with a default. That ought to mean that you have a reasonable shot at surviving de-peering/partitioning events, and the defaults will pick up the slack in the event that you don't.
For extra credit, supplement with a bunch of null routes for bogons so packets with bogon destination addresses don't leave your network, and maybe make exceptions for "golden prefixes".
I am struggling to see a defensible position for why just shy of 50% of all routes appear to consist mostly of de-aggregated routes, when aggregation is one of the aims RIRs make the LIRs strive to achieve. If we can't clean the mess up because there is no incentive, then can't I simply ignore the duplicates?
You can search the archives I'm sure for more detailed discussion of this. However, you can't necessarily always attribute the presence of covered prefixes to incompetence. Joe
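Joe's suggestion (filter on assignment boundaries, then let a default catch whatever was filtered) can be sketched roughly as follows. The per-block limits in `MAX_LEN_BY_BLOCK` are placeholders on private address space, not values taken from any RIR policy, and the helper names are hypothetical; a real filter would be a router prefix list built from the registries' published minimum allocation sizes.

```python
import ipaddress

# Hypothetical limits: longest prefix accepted inside each block.
# Placeholder blocks and values for illustration only.
MAX_LEN_BY_BLOCK = {
    ipaddress.ip_network("10.0.0.0/8"): 20,
    ipaddress.ip_network("172.16.0.0/12"): 22,
}
DEFAULT_MAX_LEN = 24  # assumed fallback for blocks not listed above


def accept(prefix):
    """Accept a prefix only if it is no longer than the limit for its block."""
    net = ipaddress.ip_network(prefix)
    for block, limit in MAX_LEN_BY_BLOCK.items():
        if net.subnet_of(block):
            return net.prefixlen <= limit
    return net.prefixlen <= DEFAULT_MAX_LEN


def build_table(announcements, default_next_hop):
    """Install accepted (prefix, next_hop) pairs plus a default route."""
    table = {ipaddress.ip_network("0.0.0.0/0"): default_next_hop}
    for prefix, next_hop in announcements:
        if accept(prefix):
            table[ipaddress.ip_network(prefix)] = next_hop
    return table


def lookup(table, address):
    """Longest-prefix match; the default route guarantees a result for IPv4."""
    addr = ipaddress.ip_address(address)
    best = max((n for n in table if addr in n), key=lambda n: n.prefixlen)
    return table[best]


if __name__ == "__main__":
    feed = [
        ("10.1.0.0/19", "transit-a"),
        ("10.1.4.0/24", "transit-b"),   # longer than the block limit, filtered
        ("192.0.2.0/24", "transit-b"),
    ]
    table = build_table(feed, "default-via-a")
    for ip in ("10.1.4.1", "192.0.2.5", "198.51.100.1"):
        print(ip, "->", lookup(table, ip))
```

In this sketch the filtered /24 is still reachable through its covering /19, and a destination with no surviving route falls through to the default, which is the trade-off being discussed in the thread.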
Hi,
Agreed, that is why I have lots of RAM - but that doesn't mean I should carry on upgrading my Tower of Babel to make it ever higher if there is a better way of doing things.
I still don't see how a default route to a partitioned PoP is going to help in the slightest - you are saved by getting the prefixes from an alternate transit, and the default doesn't get used. Where it does help is in capturing anything that has been filtered out completely, where there is no prefix from the alternate transit provider anyway - so whichever default gets used takes its chances.
Bogons - obviously.
My question was whether what I was asking is possible.
Kind Regards
Ben
Ben,
I think I understand what you want, and you don't want it. If you receive routes for, say, 204.91.0.0/16, 204.91.0.0/17, and 204.91.128.0/17, you want to drop the /17s and just care about the /16. But a change in topology does not generally result in a complete update of the BGP table. Route changes result in route adds and withdrawals, not a flood event. So if you forgot about the /17s and just kept the /16, and the /16 was subsequently withdrawn, your router would not magically remember that it had /17s to route to as well. You'd drop traffic, unless you had a default, in which case you'd just route it suboptimally.
-Dave
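Dave's point about incremental updates can be shown with a toy model. The sketch below assumes a router that discards a more-specific at receipt time whenever a covering route is already held; the class `FilteringRouter` and its methods are hypothetical and do not model any real BGP implementation.

```python
import ipaddress


class FilteringRouter:
    """Toy router that drops a more-specific if a covering route is already held."""

    def __init__(self):
        self.routes = set()

    def _covered(self, net):
        return any(net != r and net.subnet_of(r) for r in self.routes)

    def announce(self, prefix):
        """Process an announcement; return True if the route was kept."""
        net = ipaddress.ip_network(prefix)
        if self._covered(net):
            return False  # filtered: the route is discarded and forgotten
        self.routes.add(net)
        return True

    def withdraw(self, prefix):
        """Process a withdrawal; nothing that was filtered earlier comes back."""
        self.routes.discard(ipaddress.ip_network(prefix))

    def reachable(self, address):
        addr = ipaddress.ip_address(address)
        return any(addr in r for r in self.routes)


if __name__ == "__main__":
    r = FilteringRouter()
    r.announce("204.91.0.0/16")
    r.announce("204.91.0.0/17")     # dropped, covered by the /16
    r.announce("204.91.128.0/17")   # dropped, covered by the /16
    print(r.reachable("204.91.200.1"))  # True, via the /16
    r.withdraw("204.91.0.0/16")
    print(r.reachable("204.91.200.1"))  # False: the /17s were never stored
```

The neighbor still considers the /17s announced and has no reason to send them again, so without a default the destination is simply unreachable after the withdrawal.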
Hi Dave,
Yes, that is what I was thinking I want to do. So, guessing here: I think what we are saying is that the /17s never get re-added when the /16 is withdrawn, because the withdrawal of a prefix does not - for very good reasons, when I think about it - cause the filter to be re-evaluated. A prefix only gets checked when it is newly announced - or maybe by the odd table scan in the code?? But basically the /17s just sit there and continue to be filtered. Is that approximately correct?
So, umm, yes, a default would be needed.
Is it even technically possible to achieve easily, though?
Ben
The /17 isn't sitting there still being filtered; it was never there to begin with. Your router heard the /17, saw that it didn't want it because of your filter settings, and promptly forgot it. You can tell your router to remember routes it doesn't install; it's called soft reconfiguration on a Cisco and is the normal mode of operation for a Juniper. But if you do that, you're not saving memory; an inactive route does not take less RAM than an active one.
I am pretty sure that there isn't a way to match a route on whether a larger aggregate exists using the current route map/policy statement verbiage on the routers I have worked with. Doing so would be a reasonably simple code tweak, but without a purpose it isn't a tweak you're going to see any time soon.
-Dave
On Jan 15, 2008 12:51 PM, Dave Israel <davei@otd.com> wrote:
I think I understand what you want, and you don't want it. If you receive a route for, say, 204.91.0.0/16, 204.91.0.0/17, and 204.91.128.0/17, you want to drop the /17s and just care about the /16. But a change in topology does not generally result in a complete update of the BGP table. Route changes result in route adds and draws, not a flood event. So if you forgot about the /17s and just kept the /16, and the /16 was subsequently withdrawn, your router would not magically remember that it had /17s to route to as well.
Dave,
That's half-true.
The "routing table" is comprised of two components: the Routing Information Base (RIB) and the Forwarding Information Base (FIB). The RIB sits in slow, cheap memory and contains routes and metrics for every route as announced by every neighbor. The FIB sits in fast, expensive memory and contains the currently "best" route for each destination. The FIB is built by choosing the best routes from the RIB. Packet-forwarding decisions are made by consulting the FIB.
Opportunistically filtering routes from the RIB would have exactly the problem you point out: routing updates are incremental. The knowledge that the /16 has been withdrawn may not accompany the knowledge that the /17s are available.
Opportunistically filtering more-specific routes from the FIB, however, could be very valuable at the edge of the DFZ. If Cisco supported such filtering, those Sup2s could have another few years of life left in them. With 512 MB of RAM in a two-transit-provider scenario a Sup2 could handle upwards of 1M routes in the RIB. Unfortunately, they can only handle 244k routes in the FIB.
Ben, coming back to your question: I don't think there is a way to make the software filter the routes inserted into the FIB. I don't see a reason why it couldn't be programmed to do that. But the fine folks at Cisco didn't see fit to write that software. It's a pity 'cause it would be very useful.
The next best thing you can do is statically filter /8s from distant regions. You're posting to NANOG, so I assume that the RIPE and APNIC regions are distant for you. Go to IANA's web site and download the list of /8s assigned exclusively to each of those registries. For each, create a set of /8 static routes towards each of your transit providers, with a route target address picked from an address block that will disappear or become distant if your link to that transit provider is severed. Then use prefix lists to filter more specific routes within those /8s.
That should give you a result that's almost as good as if you carried all the routes while cutting a bunch of routes from your table.
Regards,
Bill Herrin
--
William D. Herrin herrin@dirtside.com bill@herrin.us
3005 Crane Dr. Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
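The FIB-side filtering Bill wishes for can be sketched as a rebuild step: keep every route in the RIB, and install a more-specific in the FIB only when it forwards differently from the nearest covering entry already installed. This is a hypothetical illustration (the function `build_fib` and the next-hop labels are made up), not a description of any vendor's software.

```python
import ipaddress


def build_fib(rib):
    """Derive a reduced FIB from a RIB of best paths.

    `rib` maps prefix strings to next hops. A prefix is skipped when the
    nearest covering prefix already in the FIB has the same next hop, since
    installing it would not change any forwarding decision.
    """
    nets = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
    fib = {}
    for net in sorted(nets, key=lambda n: n.prefixlen):  # shortest first
        next_hop = nets[net]
        parent = None
        for plen in range(net.prefixlen - 1, -1, -1):
            candidate = net.supernet(new_prefix=plen)
            if candidate in fib:
                parent = candidate
                break
        if parent is not None and fib[parent] == next_hop:
            continue  # redundant for forwarding purposes
        fib[net] = next_hop
    return {str(net): nh for net, nh in fib.items()}


if __name__ == "__main__":
    rib = {
        "204.91.0.0/16": "A",
        "204.91.0.0/17": "A",     # same next hop as the /16, kept out of the FIB
        "204.91.128.0/17": "B",   # different next hop, installed
    }
    print(build_fib(rib))
    del rib["204.91.0.0/16"]      # the /16 is withdrawn
    print(build_fib(rib))         # both /17s are installed again
```

Because the RIB still holds the /17s, rebuilding after the /16 is withdrawn restores them, which avoids the problem with filtering at receipt time. A real implementation would update the FIB incrementally rather than rebuild it from scratch.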
William Herrin wrote:
On Jan 15, 2008 12:51 PM, Dave Israel <davei@otd.com> wrote:
I think I understand what you want, and you don't want it. If you receive a route for, say, 204.91.0.0/16, 204.91.0.0/17, and 204.91.128.0/17, you want to drop the /17s and just care about the /16. But a change in topology does not generally result in a complete update of the BGP table. Route changes result in route adds and draws, not a flood event. So if you forgot about the /17s and just kept the /16, and the /16 was subsequently withdrawn, your router would not magically remember that it had /17s to route to as well.
Dave,
That's half-true.
[discussion of FIB vs RIB deleted] But, as you said yourself:
Ben, coming back to your question: I don't think there is a way to make the software filter the routes inserted into the FIB. I don't see a reason why it couldn't be programmed to do that. But the fine folks at Cisco didn't see fit to write that software. It's a pity 'cause it would be very useful.
Ergo, why I didn't discuss the FIB in my email. If you want to filter routes, you generally have to filter them at the RIB. How you move data from the RIB to the FIB is one of those questions that keep router engineers up all night. The transfer must be fast, reliable, and cheap on the CPU. Often, this means keeping logic out of it. A paradigm is decided upon early, and if it takes ten years to actually come back to haunt them, they haven't done too badly. Fixing something that far down in the nuts and bolts isn't easy. (I am not saying the presence of a revenue-generating hardware fix doesn't influence the decision not to make a risky change to the software; I'm just saying there's a lot of grey area to play in.) -Dave
But if I can see the /19 in the table, do I care about a load of /24s? The whole of the /19 should be reachable, as the origin AS is announcing it somewhere in their network and it is being received by a transit.
The "presumption" in cases like this is that the /24 may take a different path than the /19 in some or all cases. If you have only a single provider you can safely dump more specifics -- but then, you could just point default. If you *are* multihomed and the /19 and /24 both have the same primacy (first choice in a routing decision and same path) you can safely drop the more specific. The "presumption" is that in some cases the /24 would take a different path than the /19 in a routing fight. How much cost you want to incur for these is your choice. If enough people drop the more specifics, they will go away as well -- if they provided no benefit, fewer would exist. Some of this originates from the peering-contests where folks have "x number of prefixes" which makes them bigger than "y number of prefixes". I'd be interested to see any metrics on rate of growth of allocations longer than RIR limits since Verio instituted then dropped mandatory prefix filters. (vs the rate of growth of prefixes overall). I would guess that they accelerated. Deepak
Hi,
It is late and I am just checking email. But...
The /24 is more specific than the /19, therefore the /24 takes priority. In my opinion AS path length became somewhat redundant with the rise of confederations, and BGP doesn't understand bandwidth, latency and congestion. But I didn't write it, I am not that clever, and it works and is what we have today.
But... I don't care about the remote de-aggregating AS's local traffic engineering. I care about the reachability of the IP my customer has requested, and the /19 is a valid route in the route table: the origin AS put it there and it is in my local transit feed. Why should I pay in my router for the de-aggregating AS's traffic engineering, which doesn't benefit me? I care about my transit and peers, as long as the /19 is reachable. Personally, I think it is the de-aggregating AS's problem if they have poor transit and peering, not mine. Maybe if they took ownership of their problem rather than trying to make it everyone else's, we would not find ourselves in the mess we are currently in, with no sign of the problem diminishing or fixing itself.
This is not about my router or processor - it is fine, thank you, with plenty of capacity, transits and peers - but that doesn't excuse the generation of dross in the table. I refuse to believe there are justifiable reasons for anywhere near the majority of those 100K+ suspect routes.
As a rough rule of thumb, more specifics (if any) for peering should only be announced to peers + customers, not back up into transit providers. The RIPE RIR rules say don't de-aggregate - period. These ASes should not expect others to carry their extra x prefixes just because they want to stretch the size of their table in a router-waving contest.
I know I can dump them, for identical origin ASes, and things will continue to work for me - the trick, and my question, is how to dynamically classify them so that it is possible to think about dropping them.
The question was how? The answer seems to be that it can't be done. The best workaround I have heard seems to be filtering PA space on RIR minimum allocation sizes, plus defaults to capture the very small number of unique PA prefixes smaller than the minimum allocation size that would get filtered. That is as I understand it; it is top of the reading list on my desk tomorrow.
The idea is, as much as possible, to go with what is in the routing table rather than pin default routes all over the place, and to simply try to drop a slice of the dross "easily with minimum maintenance" without impacting customer experience.
Thank you to all who suggested solutions.
Ben
On 15 Jan 2008, at 16:11, Ben Butler wrote:
As a transit consumer - why would I want to carry all this cr*p in my routing table, I would still be getting a BGP route to the larger prefix anyway - let my transit feeds sort out which route they use & traffic engineering.
Maybe you don't get covering aggregates. That causes holes. Whether you care is a matter of local policy, though. :-)
On Sat, 19 Jan 2008, Andy Davidson wrote:
On 15 Jan 2008, at 16:11, Ben Butler wrote:
As a transit consumer - why would I want to carry all this cr*p in my routing table, I would still be getting a BGP route to the larger prefix anyway - let my transit feeds sort out which route they use & traffic engineering.
Maybe you don't get covering aggregates. That causes holes. Whether you care is a matter of local policy, though. :-)
There's no maybe about it. Filtering on RIR minimums will result in the more clue-deprived networks disappearing from your view, and a loss of reachability unless you have default pointed at some network not using such a filter.
Since I've gotten several requests recently for the "latest version" of what I posted back in September (I guess others are starting to get worried/close to their limits), I decided this was a good time to set up some blog software (late last night)...and I posted the filter and a brief intro to http://jonsblog.lewis.org/
----------------------------------------------------------------------
Jon Lewis | I route
Senior Network Engineer | therefore you are
Atlantic Net |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
participants (8): Andy Davidson, Ben Butler, Dave Israel, Deepak Jain, Jared Mauch, Joe Abley, Jon Lewis, William Herrin