Route table growth and hardware limits...talk to the filter
This evolved from a thread on another list. I think it's more appropriate for nanog, so here it is. Since many of you probably aren't on the other list, some context is lost, but it shouldn't matter.

The prefix-list presented below should be considered a proof-of-concept / work-in-progress. As stated below, no testing has been done to verify it will cause no loss in connectivity (i.e. due to networks deaggregating their space and announcing it only as longer subnets than their RIR states are the minimum allocation size for the range from which their CIDRs were carved). OTOH, this _should_ be a relatively safe way for networks under the gun to upgrade (especially those running 7600/6500 gear with anything less than Sup720-3bxl) to survive on an internet with >~240k routes and get by with these filtered routes, either buying more time to get upgrades done or putting off upgrades for perhaps a considerable time.

Here's what I ended up with (so far) based on Barry Greene's work at
ftp://ftp-eng.cisco.com/cons/isp/security/Ingress-Prefix-Filter-Templates/T-ip-prefix-filter-ingress-strict-check-v18.txt

While working on this, I noticed a bunch of inconsistencies in the expected RIR minimum allocations in ISP-Ingress-In-Strict and in the data actually published by the various RIRs. I've adjusted the appropriate entries, and as previously mentioned, flipped things around so that for each of the known RIR /8 or shorter prefixes, prefixes longer than the RIR-specified minimums (or /24 in cases where the RIR specifies longer than /24) are denied.

Due to the number of minimum acceptable allocation inconsistencies, I recollected all the data on the number of routes shaved per RIR filter. For some reason, today I started out with fewer routes (228289...yesterday, I started with 230686) with no filtering.

RIR filter section      Reduction in routes
APNIC                         16690
ARIN                          41070
RIPE                          16981
LACNIC                         4468
AFRINIC                        1516
-----------------------------
TOTAL                         80725

The end result of applying all the RIR minimum allocation filters was 147564 BGP routes. I haven't checked to make sure there was no loss in reachability...this is just an idle 7206/NPE225 with nothing but its ethernet uplink.

The prefix-list I'm using for this experiment is:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! APNIC http://www.apnic.net/db/min-alloc.html !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 4000 deny 58.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4001 deny 59.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4002 deny 60.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4004 deny 116.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4008 deny 120.0.0.0/7 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4009 deny 122.0.0.0/7 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4011 deny 124.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4012 deny 125.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4013 deny 126.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4014 deny 202.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 4016 deny 210.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4018 permit 218.100.0.0/16 ge 17 le 24
ip prefix-list ISP-Ingress-In-Strict SEQ 4019 deny 218.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4021 deny 220.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict seq 4023 deny 222.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! http://www.arin.net/reference/ip_blocks.html#ipv4    !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 5000 deny 24.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5001 deny 63.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5002 deny 64.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5004 deny 66.0.0.0/6 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5008 deny 70.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5010 deny 72.0.0.0/6 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5014 deny 76.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5015 deny 96.0.0.0/6 ge 21
! these ge 25's are redundant, but left in for accounting purposes
ip prefix-list ISP-Ingress-In-Strict SEQ 5020 deny 198.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5022 deny 204.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5023 deny 206.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5032 deny 208.0.0.0/8 ge 23
ip prefix-list ISP-Ingress-In-Strict SEQ 5033 deny 209.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5034 deny 216.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! RIPE NCC https://www.ripe.net/ripe/docs/ripe-ncc-managed-address-space.html !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 6000 deny 62.0.0.0/8 ge 20
ip prefix-list ISP-Ingress-In-Strict SEQ 6001 deny 77.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6002 deny 78.0.0.0/7 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6004 deny 80.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 6006 deny 82.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 6007 deny 83.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6008 deny 84.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6012 deny 88.0.0.0/7 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6014 deny 90.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6015 deny 91.0.0.0/8 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 6016 deny 92.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6020 deny 193.0.0.0/8 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 6021 deny 194.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 6023 deny 212.0.0.0/7 ge 20
ip prefix-list ISP-Ingress-In-Strict SEQ 6025 deny 217.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! LACNIC - http://lacnic.net/en/registro/index.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 7000 deny 189.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 7001 deny 190.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 7002 deny 200.0.0.0/8 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 7003 deny 201.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! AFRINIC http://www.afrinic.net/index.htm !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 8000 deny 41.0.0.0/8 ge 23
ip prefix-list ISP-Ingress-In-Strict SEQ 8001 deny 196.0.0.0/8 ge 23
!
ip prefix-list ISP-Ingress-In-Strict seq 10200 permit 0.0.0.0/0 le 24

Just to show where a lot of the dropped routes are, here's a "show ip prefix-list detail" after the session is up and "full routes" have been received, right after clearing the prefix-list counters and then clearing the BGP session.
ip prefix-list ISP-Ingress-In-Strict: count: 51, range entries: 51, sequences: 4000 - 10200, refcount: 3
   seq 4000 deny 58.0.0.0/8 ge 22 (hit count: 609, refcount: 6)
   seq 4001 deny 59.0.0.0/8 ge 21 (hit count: 662, refcount: 1)
   seq 4002 deny 60.0.0.0/7 ge 21 (hit count: 2014, refcount: 2)
   seq 4004 deny 116.0.0.0/6 ge 22 (hit count: 616, refcount: 4)
   seq 4008 deny 120.0.0.0/7 ge 22 (hit count: 370, refcount: 3)
   seq 4009 deny 122.0.0.0/7 ge 22 (hit count: 1153, refcount: 1)
   seq 4011 deny 124.0.0.0/8 ge 21 (hit count: 1040, refcount: 3)
   seq 4012 deny 125.0.0.0/8 ge 21 (hit count: 1302, refcount: 1)
   seq 4013 deny 126.0.0.0/8 ge 21 (hit count: 0, refcount: 1)
   seq 4014 deny 202.0.0.0/7 ge 25 (hit count: 0, refcount: 6)
   seq 4016 deny 210.0.0.0/7 ge 21 (hit count: 4776, refcount: 4)
   seq 4018 permit 218.100.0.0/16 ge 17 le 24 (hit count: 4, refcount: 1)
   seq 4019 deny 218.0.0.0/7 ge 21 (hit count: 1285, refcount: 3)
   seq 4021 deny 220.0.0.0/7 ge 21 (hit count: 2164, refcount: 2)
   seq 4023 deny 222.0.0.0/8 ge 21 (hit count: 679, refcount: 1)
   seq 5000 deny 24.0.0.0/8 ge 21 (hit count: 1889, refcount: 1)
   seq 5001 deny 63.0.0.0/8 ge 21 (hit count: 2818, refcount: 2)
   seq 5002 deny 64.0.0.0/7 ge 21 (hit count: 8420, refcount: 1)
   seq 5004 deny 64.0.0.0/6 ge 21 (hit count: 7878, refcount: 4)
   seq 5008 deny 70.0.0.0/7 ge 21 (hit count: 1426, refcount: 1)
   seq 5010 deny 72.0.0.0/6 ge 21 (hit count: 4637, refcount: 2)
   seq 5014 deny 76.0.0.0/8 ge 21 (hit count: 255, refcount: 3)
   seq 5015 deny 96.0.0.0/6 ge 21 (hit count: 23, refcount: 1)
   seq 5020 deny 198.0.0.0/7 ge 25 (hit count: 0, refcount: 3)
   seq 5022 deny 204.0.0.0/7 ge 25 (hit count: 0, refcount: 2)
   seq 5023 deny 206.0.0.0/7 ge 25 (hit count: 0, refcount: 1)
   seq 5032 deny 208.0.0.0/8 ge 23 (hit count: 3322, refcount: 2)
   seq 5033 deny 209.0.0.0/8 ge 21 (hit count: 4661, refcount: 1)
   seq 5034 deny 216.0.0.0/8 ge 21 (hit count: 5734, refcount: 2)
   seq 6000 deny 62.0.0.0/8 ge 20 (hit count: 1428, refcount: 1)
   seq 6001 deny 77.0.0.0/8 ge 22 (hit count: 447, refcount: 1)
   seq 6002 deny 78.0.0.0/7 ge 22 (hit count: 97, refcount: 1)
   seq 6004 deny 80.0.0.0/7 ge 21 (hit count: 2394, refcount: 4)
   seq 6006 deny 82.0.0.0/8 ge 21 (hit count: 994, refcount: 2)
   seq 6007 deny 83.0.0.0/8 ge 22 (hit count: 596, refcount: 1)
   seq 6008 deny 84.0.0.0/6 ge 22 (hit count: 3197, refcount: 1)
   seq 6012 deny 88.0.0.0/7 ge 22 (hit count: 1933, refcount: 3)
   seq 6014 deny 90.0.0.0/8 ge 22 (hit count: 32, refcount: 2)
   seq 6015 deny 91.0.0.0/8 ge 25 (hit count: 0, refcount: 1)
   seq 6016 deny 92.0.0.0/6 ge 22 (hit count: 0, refcount: 1)
   seq 6020 deny 193.0.0.0/8 ge 25 (hit count: 0, refcount: 2)
   seq 6021 deny 194.0.0.0/7 ge 25 (hit count: 0, refcount: 1)
   seq 6023 deny 212.0.0.0/7 ge 20 (hit count: 4190, refcount: 1)
   seq 6025 deny 217.0.0.0/8 ge 21 (hit count: 1690, refcount: 1)
   seq 7000 deny 189.0.0.0/8 ge 21 (hit count: 253, refcount: 2)
   seq 7001 deny 190.0.0.0/8 ge 21 (hit count: 1841, refcount: 1)
   seq 7002 deny 200.0.0.0/8 ge 25 (hit count: 0, refcount: 2)
   seq 7003 deny 201.0.0.0/8 ge 21 (hit count: 2390, refcount: 1)
   seq 8000 deny 41.0.0.0/8 ge 23 (hit count: 378, refcount: 1)
   seq 8001 deny 196.0.0.0/8 ge 23 (hit count: 1136, refcount: 1)
   seq 10200 permit 0.0.0.0/0 le 24 (hit count: 147571, refcount: 1)

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
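For anyone who would rather regenerate this sort of list from the RIRs' published minimum-allocation data than maintain it by hand, a short script is enough. The sketch below is only illustrative: the block list and minimum lengths are placeholder values rather than a complete or verified dataset, and the list name simply reuses ISP-Ingress-In-Strict. It emits deny entries in the same style as above, capping the cutoff at "ge 25" so that /24 and shorter announcements always fall through to the final permit.

# Hypothetical generator, not part of the original post.  Input is a list of
# (aggregate, RIR minimum allocation length) pairs gathered from the RIRs.
blocks = [
    ("58.0.0.0/8", 21),    # placeholder values for illustration only
    ("62.0.0.0/8", 19),
    ("193.0.0.0/8", 29),
]

def deny_lines(name="ISP-Ingress-In-Strict", start_seq=4000):
    seq = start_seq
    for aggregate, min_len in blocks:
        # accept anything at or shorter than the RIR minimum, but never
        # filter /24 or shorter: cap the deny cutoff at "ge 25"
        cutoff = min(min_len, 24) + 1
        yield "ip prefix-list %s seq %d deny %s ge %d" % (name, seq, aggregate, cutoff)
        seq += 1

if __name__ == "__main__":
    for line in deny_lines():
        print(line)
    print("ip prefix-list ISP-Ingress-In-Strict seq 10200 permit 0.0.0.0/0 le 24")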
It's been pointed out that in my attempt to aggregate some of the rules, I missed a few chances to do aggregation and screwed up in one place deleting the wrong line after aggregating nearby lines. On the bright side, the way this prefix-list works, such an omission is harmless in that it only lets more routes through. The missing line was 68.0.0.0/7 from the ARIN region...so the route savings filtering ARIN space by min-allocation size is even greater than the numbers I previously posted.

Here's an updated version:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! APNIC http://www.apnic.net/db/min-alloc.html !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 4000 deny 58.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4001 deny 59.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4002 deny 60.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4004 deny 116.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4008 deny 120.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4011 deny 124.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4013 deny 126.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4014 deny 202.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 4016 deny 210.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4018 permit 218.100.0.0/16 ge 17 le 24
ip prefix-list ISP-Ingress-In-Strict SEQ 4019 deny 218.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4021 deny 220.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict seq 4023 deny 222.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! http://www.arin.net/reference/ip_blocks.html#ipv4    !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 5000 deny 24.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5001 deny 63.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5002 deny 64.0.0.0/6 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5006 deny 68.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5008 deny 70.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5010 deny 72.0.0.0/6 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5014 deny 76.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5015 deny 96.0.0.0/6 ge 21
! these ge 25's are redundant, but left in for accounting purposes
ip prefix-list ISP-Ingress-In-Strict SEQ 5020 deny 198.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5022 deny 204.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5023 deny 206.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5032 deny 208.0.0.0/8 ge 23
ip prefix-list ISP-Ingress-In-Strict SEQ 5033 deny 209.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5034 deny 216.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! RIPE NCC https://www.ripe.net/ripe/docs/ripe-ncc-managed-address-space.html !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 6000 deny 62.0.0.0/8 ge 20
ip prefix-list ISP-Ingress-In-Strict SEQ 6001 deny 77.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6002 deny 78.0.0.0/7 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6004 deny 80.0.0.0/7 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 6006 deny 82.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 6007 deny 83.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6008 deny 84.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6012 deny 88.0.0.0/7 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6014 deny 90.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6015 deny 91.0.0.0/8 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 6016 deny 92.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6020 deny 193.0.0.0/8 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 6021 deny 194.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 6023 deny 212.0.0.0/7 ge 20
ip prefix-list ISP-Ingress-In-Strict SEQ 6025 deny 217.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! LACNIC - http://lacnic.net/en/registro/index.html
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 7000 deny 189.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 7001 deny 190.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 7002 deny 200.0.0.0/8 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 7003 deny 201.0.0.0/8 ge 21
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! AFRINIC http://www.afrinic.net/index.htm !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
ip prefix-list ISP-Ingress-In-Strict SEQ 8000 deny 41.0.0.0/8 ge 23
ip prefix-list ISP-Ingress-In-Strict SEQ 8001 deny 196.0.0.0/8 ge 23
!
ip prefix-list ISP-Ingress-In-Strict seq 10200 permit 0.0.0.0/0 le 24

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
In a message written on Fri, Sep 07, 2007 at 07:14:01PM -0400, Jon Lewis wrote:
For some reason, today I started out with fewer routes (228289...yesterday, I started with 230686) with no filtering.
RIR filter section      Reduction in routes
APNIC                         16690
ARIN                          41070
RIPE                          16981
LACNIC                         4468
AFRINIC                        1516
-----------------------------
TOTAL                         80725
The end result of applying all the RIR minimum allocation filters was 147564 BGP routes. I haven't checked to make sure there was no loss in reachability...this is just an idle 7206/NPE225 with nothing but its ethernet uplink.
The CIDR report states that we have 235647 routes that could be aggregated to 154503 routes. While not the same metric, I'd be surprised at 147,564 routes if you did not have reachability issues.
The prefix-list I'm using for this experiment is:
One idea I've seen tossed around is to allow for a small amount of deaggregation. For instance, if in a /8, the RIR allocates down to a /20, you might allow a /21 (break it into two blocks) or a /22 (break it into four blocks). Yes, that allows people with bigger allocations to break into more blocks, but it also allows everyone to do some TE without letting them do an unlimited amount.

I fear some filtering is in our future. I'm not really opposed to it, either. However I'm afraid your results show the currently available filters to be too aggressive.

-- 
       Leo Bicknell - bicknell@ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/
Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
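To make the relaxed variant concrete with an entry from the list above: RIPE's minimum allocation in 62/8 is a /20, so the strict filter denies anything /21 or longer. Allowing two bits of deaggregation just moves the ge cutoff out by two, so /21s and /22s survive while /23 and longer are still dropped. (ISP-Ingress-In-Relaxed is a made-up name for illustration; the strict line is verbatim from Jon's list.)

! strict: drop anything longer than the /20 minimum allocation
ip prefix-list ISP-Ingress-In-Strict seq 6000 deny 62.0.0.0/8 ge 21
!
! relaxed: tolerate up to two bits of deaggregation (/21 and /22 pass)
ip prefix-list ISP-Ingress-In-Relaxed seq 6000 deny 62.0.0.0/8 ge 23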
On Fri, 7 Sep 2007, Leo Bicknell wrote:
The CIDR report states that we have 235647 routes that could be aggregated to 154503 routes. While not the same metric, I'd be surprised at 147,564 routes if you did not have reachability issues.
If everyone behaved and announced their CIDRs as allocated (or even just deagged down to RIR minimum allocation size), those 147,564 routes would get you to everyone (in some cases suboptimally). Obviously, anyone with PA-using BGP customers would need to punch some holes to allow those customer subnets through. The trouble is, it turns out there are a number of networks where CIDR isn't spoken. They get their IP space from their RIR, break it up into /24s, and announce those /24s (the ones they're using anyway) into BGP as /24s with no covering CIDR. So, use of this prefix-list without a default route will cut off portions of the internet.
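As a minimal sketch of the hole-punching, assume a hypothetical BGP customer announcing 192.0.2.0/24 out of another provider's aggregate (a documentation prefix, not from the post): a permit ahead of the RIR deny entries lets that more-specific through, while a default route, as noted, covers whatever else the filter drops.

! punch a hole for the customer's PA more-specific ahead of the RIR denies
ip prefix-list ISP-Ingress-In-Strict seq 100 permit 192.0.2.0/24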
One idea I've seen tossed around is to allow for a small amount of deaggregation. For instance, if in a /8, the RIR allocates down to a /20, you might allow a /21 (break it into two blocks) or a /22 (break it into four blocks). Yes, that allows people with bigger allocations to break into more blocks, but it also allows everyone to do some TE without letting them do an unlimited amount.
I'm not crazy about that, but certainly it'd work, and there would still be some savings. Due to the above mentioned stupidity, you'd still have no routes for some parts of the internet.
I fear some filtering is in our future. I'm not really opposed to it, either. However I'm afraid your results show the currently available filters to be too aggressive.
If filtering is inevitable, I think it's worth reviving the CIDR police and perhaps scaring some clue into the networks that stand to be filtered off the net by anyone needing to do any level of filtering.

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
I'm not crazy about that, but certainly it'd work, and there would still be some savings. Due to the above mentioned stupidity, you'd still have no routes for some parts of the internet.
what i think it boils down to is that many folks seem to run default-free because they can, because it's cool, because it's what tier-1 folks do, because (insert cool/uber reason why here), but not necessarily because they HAVE TO.

even if you're a content-provider in North America and want to ensure an "optimal path" of traffic, generally speaking, you could accept prefixes (as-is) from ARIN allocations but for (say) APNIC and RIPE do either some degree of filtering or just push it via a default.

having a full feed may be cool, but i'm not sure what cost folks are willing to pay for that 'cool' factor. filtering and/or default-to-one-place may be so 90s, but that doesn't mean it's a bad thing.

cheers,

lincoln.
On Sat, 8 Sep 2007, Lincoln Dale wrote:
what i think it boils down to is that many folks seem to run default-free because they can, because it's cool, because it's what tier-1 folks do, because (insert cool/uber reason why here), but not necessarily because they HAVE TO.
Consider a regional or local ISP providing BGP to a customer. The customer also has a connection to a "Tier 1". The customer may start asking questions when they notice they get 250k routes from one provider and 50k to 80k fewer routes from you. I suppose some "Tier 1"s got away with this in the past though...so maybe there are acceptable answers.
even if you're a content-provider in North America and want to ensure an "optimal path" of traffic, generally speaking, you could accept prefixes (as-is) from ARIN allocations but for (say) APNIC and RIPE do either some degree of filtering or just push it via a default.
I actually suggested this yesterday to a friend who runs an ISP and has just run into his 7500s running out of RAM and crashing when turning up a new transit provider with full BGP routes. Filtering the APNIC and RIPE regions and adding a default will very likely let him fit "mostly full routes" on his router and put off the inevitable fork-lift upgrade a while longer.

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
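For reference, the configuration that buys that breathing room is small, assuming a hypothetical transit session (the AS numbers and addresses below are documentation values): the RIR filter goes inbound on the full-feed neighbor, and a static default toward that upstream covers whatever gets filtered out.

router bgp 64496
 neighbor 203.0.113.1 remote-as 64510
 neighbor 203.0.113.1 prefix-list ISP-Ingress-In-Strict in
!
! catch-all for whatever the filter drops
ip route 0.0.0.0 0.0.0.0 203.0.113.1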
what i think it boils down to is that many folks seem to run default-free because they can, because it's cool, because it's what tier-1 folks do, because (insert cool/uber reason why here), but not necessarily because they HAVE TO.
Consider a regional or local ISP providing BGP to a customer. The customer also has a connection to a "Tier 1". The customer may start asking questions when they notice they get 250k routes from one provider and 50k to 80k fewer routes from you.
It is all in the education. Educated right, you could claim that you're providing a superior service by _filtering_ what announcements you accept. Better yet, you can claim that you're saving the customer money - THEY don't have to invest in more RAM / larger routers / larger TCAMs. OR, the money dynamics will be that you charge a higher price for customers that want a 'full feed', with the higher price based on the higher price you have to pay to run a default-free network.

The reality is that for many end customers (even multi-homed ones), receiving a 'default' route from an upstream rather than a ton of more-specific routes is perfectly acceptable. They can filter out that 0/0 if they don't want it; otherwise "things still work" if they accept it.

In short: folks THINK they NEED a full routing table, but it is a MYTH. Most folks don't.

cheers,

lincoln.
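And on the customer side, dropping an upstream-supplied default really is just a couple of lines if they decide they don't want it (illustrative IOS, names and addresses invented for the example, not from the thread):

ip prefix-list NO-DEFAULT seq 5 deny 0.0.0.0/0
ip prefix-list NO-DEFAULT seq 10 permit 0.0.0.0/0 le 32
!
router bgp 64500
 neighbor 198.51.100.1 prefix-list NO-DEFAULT in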
Jon Lewis wrote:
If filtering is inevitable, I think it's worth reviving the CIDR police and perhaps scaring some clue into the networks that stand to be filtered off the net by anyone needing to do any level of filtering.
I agree.

The first step would be figuring out the needed aggregate announcements, contacting the providers or upstreams.

Who is willing to run a database to coordinate the effort?

In North America, most everybody has returned from holidays. Let's make September the month of CIDR improvement! And October 1st the deadline....

I do not agree the filters as originally proposed are "too aggressive". Traffic engineering with one's peers is all very well and good, but at the second AS (or overseas) it's not acceptable.
On Sat, Sep 08, 2007 at 08:22:24AM -0400, William Allen Simpson wrote:
Jon Lewis wrote:
If filtering is inevitable, I think it's worth reviving the CIDR police and perhaps scaring some clue into the networks that stand to be filtered off the net by anyone needing to do any level of filtering.

I agree.
The first step would be figuring out the needed aggregate announcements, contacting the providers or upstreams.
Who is willing to run a database to coordinate the effort?
In North America, most everybody has returned from holidays. Let's make September the month of CIDR improvement! And October 1st the deadline....
I do not agree the filters as originally proposed are "too aggressive". Traffic engineering with one's peers is all very well and good, but at the second AS (or overseas) it's not acceptable.
I think this is the most important point so far. There are a lot of providers that think that their announcements need to be global to manage link/load balancing with their peers/upstreams. Proper use of no-export (or similar) on the more specifics and the aggregate being sent out will reduce the global noise significantly.

Perhaps some of the providers to these networks will nudge them a bit more to use proper techniques.

I'm working on routing leaks this month. There have already been over 2600 leak events today that could have been prevented with as-path filters of some sort, either on a customer or peer. (This would obviously be in addition to prefix-list filters.)

	- Jared

-- 
Jared Mauch  | pgp key available via finger from jared@puck.nether.net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.
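A sketch of the no-export approach Jared describes, for a network that originates an aggregate and only needs its TE more-specifics seen by the adjacent upstream; every prefix, AS number and neighbor address below is illustrative, and the route-map/prefix-list names are made up.

! more-specifics inside the aggregate, used only for TE toward this upstream
ip prefix-list TE-SPECIFICS seq 5 permit 192.0.2.0/24 ge 25
!
route-map UPSTREAM-OUT permit 10
 match ip address prefix-list TE-SPECIFICS
 set community no-export additive
route-map UPSTREAM-OUT permit 20
!
router bgp 64496
 network 192.0.2.0 mask 255.255.255.0
 neighbor 203.0.113.1 remote-as 64510
 neighbor 203.0.113.1 send-community
 neighbor 203.0.113.1 route-map UPSTREAM-OUT out

The upstream still hears the more-specifics for its own path selection, but with no-export attached it shouldn't propagate them any further, so only the aggregate occupies a slot in everyone else's tables.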
On Sat, Sep 08, 2007, Jared Mauch wrote:
I do not agree the filters as originally proposed are "too aggressive". Traffic engineering with one's peers is all very well and good, but at the second AS (or overseas) it's not acceptable.
I think this is the most important point so far. There are a lot of providers that think that their announcements need to be global to manage link/load balancing with their peers/upstreams. Proper use of no-export (or similar) on the more specifics and the aggregate being sent out will reduce the global noise significantly.
Perhaps some of the providers to these networks will nudge them a bit more to use proper techniques.
Maybe some publicly documented case studies would be nice? Or is that all just too "top secret" to tell your competitors how to do it?

That said, my second-to-last employer wouldn't actually handle BGP communities that they documented they'd accept... with that kind of consistency, are you surprised that the slightly-less-than-"nanog"-cluey crowd of network administrators don't bother with BGP juju and stick to what works?

Adrian

(And no, they still don't accept the communities they document they would; nor tag traffic with DSCP bits they documented they would. Fun times.)
On Sat, Sep 08, 2007 at 09:17:16AM -0400, Jared Mauch wrote:
On Sat, Sep 08, 2007 at 08:22:24AM -0400, William Allen Simpson wrote: [snip]
I do not agree the filters as originally proposed are "too aggressive". Traffic engineering with one's peers is all very well and good, but at the second AS (or overseas) it's not acceptable.
I think this is the most important point so far. There are a lot of providers that think that their announcements need to be global to manage link/load balancing with their peers/upstreams. Proper use of no-export (or similar) on the more specifics and the aggregate being sent out will reduce the global noise significantly.
Perhaps some of the providers to these networks will nudge them a bit more to use proper techniques.
Any policing effort will require co-ordination, and it will need to be stated publicly (here and elsewhere) that it is a Good Thing. At a previous employer, I managed the network-wide memory carefully for years with such filtering techniques and received intense pushback from remote networks that were broken. The obvious lack of clue, lack of care, and constant arguments and pushback were so disgusting that it contributed to me departing that position.
From direct experience of chasing and hounding on my own time, apathy far outweighed ignorance most of the time. The fact that you can trivially operate more effectively using the same basic toolset (synchronized and well-maintained prefix lists, for a start) needed to clean up your external announcements was ignored. Even with a direct cookbook provided, lots of folks will still think you are asking too much of them.
Some large transit providers also encourage customers to deaggregate and just announce prefixes in use, rather than aggregate allocations, as a so-called security measure. The ongoing pollution is also a way to both squeeze competitors out of the marketplace and abuse longest match as a revenue stream.

The mental image that came to me of the so-called tier 1s is a multi-party game of chicken, where the exhaustion of routing slots in their own gear is the point of collision. Or maybe a series of cartoon planes in a nose-dive contest to see who can pull up closest to the last second is more appropriate, as the crunch is inevitable.

Cheers!

Joe

-- 
RSUC / GweepNet / Spunk / FnB / Usenix / SAGE
Thus spake "Jon Lewis" <jlewis@lewis.org>
The trouble is, it turns out there are a number of networks where CIDR isn't spoken. They get their IP space from their RIR, break it up into /24s, and announce those /24s (the ones they're using anyway) into BGP as /24s with no covering CIDR.
IMHO, such networks are broken and they should be filtered. If people doing this found themselves unable to reach a significant fraction of the Net (or certain key sites), they would add the covering route even if they were hoping people would accept their incompetent/TE /24s.

S

Stephen Sprunk        "God does not play dice."  --Albert Einstein
CCIE #3723            "God is an inveterate gambler, and He throws the
K5SSS                  dice at every possible opportunity."  --Stephen Hawking
On Mon, Sep 10, 2007 at 10:16:17AM -0500, Stephen Sprunk wrote:
Thus spake "Jon Lewis" <jlewis@lewis.org>
The trouble is, it turns out there are a number of networks where CIDR isn't spoken. They get their IP space from their RIR, break it up into /24s, and announce those /24s (the ones they're using anyway) into BGP as /24s with no covering CIDR.
IMHO, such networks are broken and they should be filtered. If people doing this found themselves unable to reach a significant fraction of the Net (or certain key sites), they would add the covering route even if they were hoping people would accept their incompetent/TE /24s.
well, your assumption about how prefixes are used might be tempered with the thought that some /24s are used for interconnecting ISPs at exchanges... and for that matter, it seems lazy for an ISP to pass the buck on "routability" to an org that runs no transit infrastructure. RIRs (well, ARIN anyway) have NEVER assured routability of a delegated prefix. Tracking/filtering based on RIR delegation policy seems like a leap to me...

--bill
On 8/09/2007, at 3:45 PM, Leo Bicknell wrote:
In a message written on Fri, Sep 07, 2007 at 07:14:01PM -0400, Jon Lewis wrote:
For some reason, today I started out with fewer routes (228289...yesterday, I started with 230686) with no filtering.
RIR filter section      Reduction in routes
APNIC                         16690
ARIN                          41070
RIPE                          16981
LACNIC                         4468
AFRINIC                        1516
-----------------------------
TOTAL                         80725
The end result of applying all the RIR minimum allocation filters was 147564 BGP routes. I haven't checked to make sure there was no loss in reachability...this is just an idle 7206/NPE225 with nothing but its ethernet uplink.
The CIDR report states that we have 235647 routes that could be aggregated to 154503 routes. While not the same metric, I'd be surprised at 147,564 routes if you did not have reachability issues.
The difference is roughly 3% of the total prefixes ((154503-147564)/235647*100).

It wouldn't be hard to run some form of netflow, and gauge the amount of traffic to those prefixes. If it's as insignificant as the number of prefixes, get/use a 0/0 route.

-- 
Nathan Ward
[changed subject because I'm back to the original subject...] On 8-sep-2007, at 1:14, Jon Lewis wrote:
The prefix-list presented below should be considered a proof-of-concept / work-in-progress. As stated below, no testing has been done to verify it will cause no loss in connectivity (i.e. due to networks deaggregating their space and announcing it only as longer subnets than their RIR states are the minimum allocation size for the range from which their CIDRs were carved).
I think this is a good idea. Others have pointed out some problems with people who use PA to multihome or do traffic engineering with longer prefixes. A good way to go forward with this type of filtering without hurting people who rely on these types of deaggregation too much would be to only apply this filter to address space from RIRs in other regions. I.e., someone in the US would apply the RIR allocation size filters for address space from all RIRs except ARIN. This way, multihoming and traffic engineering still work for the most part. If I multihome with PA or traffic engineer here in the Netherlands and someone from the US doesn't see my more specifics, they at least see my aggregate or my ISP's aggregate, and when the traffic arrives in RIPE land the more specifics kick in.
While working on this, I noticed a bunch of inconsistencies in the expected RIR minimum allocations in ISP-Ingress-In-Strict and in the data actually published by the various RIRs.
I have a few more for you by dredging up the actual minimum size allocations from the allocation records on their FTP sites:
ip prefix-list ISP-Ingress-In-Strict SEQ 4000 deny 58.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 4001 deny 59.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 4002 deny 60.0.0.0/7 ge 21
I see /21, /19, /20 and /21 for 58, 59, 60 and 61.
ip prefix-list ISP-Ingress-In-Strict SEQ 4008 deny 120.0.0.0/7 ge 22
Hm, I don't see 120, probably not in use yet.
ip prefix-list ISP-Ingress-In-Strict SEQ 4009 deny 122.0.0.0/7 ge 22
/21
ip prefix-list ISP-Ingress-In-Strict SEQ 4011 deny 124.0.0.0/8 ge 21
/24
ip prefix-list ISP-Ingress-In-Strict SEQ 4013 deny 126.0.0.0/8 ge 21
/8 (!)
ip prefix-list ISP-Ingress-In-Strict SEQ 4014 deny 202.0.0.0/7 ge 25
/24
ip prefix-list ISP-Ingress-In-Strict SEQ 4016 deny 210.0.0.0/7 ge 21
210 = /24, 211 = /19
ip prefix-list ISP-Ingress-In-Strict SEQ 4019 deny 218.0.0.0/7 ge 21
218 = /24, 219 = /20
ip prefix-list ISP-Ingress-In-Strict seq 4023 deny 222.0.0.0/8 ge 21
Don't see it.
ip prefix-list ISP-Ingress-In-Strict SEQ 5000 deny 24.0.0.0/8 ge 21
/24
ip prefix-list ISP-Ingress-In-Strict SEQ 5004 deny 66.0.0.0/6 ge 21
66 = /20; 67, 68, 69 = /24
ip prefix-list ISP-Ingress-In-Strict SEQ 5008 deny 70.0.0.0/7 ge 21
70 = /22, 71 has a block of 20480 addresses as its smallest allocation
ip prefix-list ISP-Ingress-In-Strict SEQ 5010 deny 72.0.0.0/6 ge 21
72 = /22, 73 = /8; 74, 75 = /20
ip prefix-list ISP-Ingress-In-Strict SEQ 5014 deny 76.0.0.0/8 ge 21
/22
ip prefix-list ISP-Ingress-In-Strict SEQ 5015 deny 96.0.0.0/6 ge 21
96 = /19, rest /16
! these ge 25's are redundant, but left in for accounting purposes
ip prefix-list ISP-Ingress-In-Strict SEQ 5020 deny 198.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5022 deny 204.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5023 deny 206.0.0.0/7 ge 25
ip prefix-list ISP-Ingress-In-Strict SEQ 5032 deny 208.0.0.0/8 ge 23
ip prefix-list ISP-Ingress-In-Strict SEQ 5033 deny 209.0.0.0/8 ge 21
ip prefix-list ISP-Ingress-In-Strict SEQ 5034 deny 216.0.0.0/8 ge 21
/24, /24, /24, /20, /20, /24 I suspect the /21 vs /22 discrepancy is where ARIN gives out /22s but reserves /21s. (oh and I mistyped /20 a few places that need to be /22 but don't want to go back and correct, mail me for the raw data per /8 if you want it)
ip prefix-list ISP-Ingress-In-Strict SEQ 6001 deny 77.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6002 deny 78.0.0.0/7 ge 22
/21
ip prefix-list ISP-Ingress-In-Strict SEQ 6004 deny 80.0.0.0/7 ge 21
/20
ip prefix-list ISP-Ingress-In-Strict SEQ 6006 deny 82.0.0.0/8 ge 21
/20
ip prefix-list ISP-Ingress-In-Strict SEQ 6007 deny 83.0.0.0/8 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6008 deny 84.0.0.0/6 ge 22
ip prefix-list ISP-Ingress-In-Strict SEQ 6012 deny 88.0.0.0/7 ge 22
/21
ip prefix-list ISP-Ingress-In-Strict SEQ 6014 deny 90.0.0.0/8 ge 22
/17
ip prefix-list ISP-Ingress-In-Strict SEQ 6015 deny 91.0.0.0/8 ge 25
/24
ip prefix-list ISP-Ingress-In-Strict SEQ 6016 deny 92.0.0.0/6 ge 22
/13
ip prefix-list ISP-Ingress-In-Strict SEQ 6020 deny 193.0.0.0/8 ge 25
/29
ip prefix-list ISP-Ingress-In-Strict SEQ 6021 deny 194.0.0.0/7 ge 25
194 = /28, 195 = block of 26 addresses
ip prefix-list ISP-Ingress-In-Strict SEQ 6025 deny 217.0.0.0/8 ge 21
/20
ip prefix-list ISP-Ingress-In-Strict SEQ 7000 deny 189.0.0.0/8 ge 21
/12
ip prefix-list ISP-Ingress-In-Strict SEQ 7001 deny 190.0.0.0/8 ge 21
/20
ip prefix-list ISP-Ingress-In-Strict SEQ 7002 deny 200.0.0.0/8 ge 25
/24
ip prefix-list ISP-Ingress-In-Strict SEQ 7003 deny 201.0.0.0/8 ge 21
/20
ip prefix-list ISP-Ingress-In-Strict SEQ 8000 deny 41.0.0.0/8 ge 23
/22
ip prefix-list ISP-Ingress-In-Strict SEQ 8001 deny 196.0.0.0/8 ge 23
/24
! ip prefix-list ISP-Ingress-In-Strict seq 10200 permit 0.0.0.0/0 le 24
Legacy class A space is all /8, except that some of those are de facto used as RIR space, I think 4/8 and 12/8. Non-RIR class B space should all be /16, but:

| various | 128 |    32768 |
| various | 129 |    20480 |
| various | 131 |    32768 |
| various | 133 | 16777216 |
| various | 137 |     8192 |
| various | 142 |    32768 |
| various | 143 |    32768 |
| various | 146 |    32768 |
| various | 158 |    32768 |
| various | 161 |     8192 |
| various | 166 |     8192 |
| various | 167 |     2304 |
| various | 169 |      256 |
| various | 170 |    32768 |
| various | 172 |  1638400 |
| various | 188 |    65536 |
| various | 192 |      256 |
| various | 198 |      256 |

Also, there are only a few very small blocks, so either ignoring those or making special-case exceptions would allow even tighter filters:

+---------+-----------------+------+
| rir     | descr           | num  |
+---------+-----------------+------+
| ripencc | 193.58.0.0      |   16 |
| ripencc | 193.58.0.16     |    8 |
| ripencc | 193.58.0.24     |   16 |
| ripencc | 193.58.0.40     |    8 |
| ripencc | 193.58.0.48     |    8 |
| ripencc | 193.58.0.56     |    8 |
| ripencc | 193.188.134.64  |   16 |
| ripencc | 193.188.134.160 |    8 |
| ripencc | 193.188.134.200 |    8 |
| ripencc | 193.188.134.208 |    8 |
| ripencc | 193.188.134.216 |    8 |
| ripencc | 193.188.134.224 |   16 |
| ripencc | 193.188.134.240 |   16 |
| ripencc | 193.192.15.0    |   16 |
| ripencc | 193.192.15.16   |   16 |
| ripencc | 193.219.15.0    |   16 |
| ripencc | 194.149.71.192  |   16 |
| ripencc | 194.149.71.208  |   16 |
| ripencc | 194.149.71.224  |   16 |
| ripencc | 194.149.71.240  |   16 |
| ripencc | 195.95.173.0    |   26 |
+---------+-----------------+------+
21 rows in set (0.16 sec)

select rir, count(*) from addrspace where type='ipv4' and num < 256 group by rir;
+---------+----------+
| rir     | count(*) |
+---------+----------+
| ripencc |      200 |
+---------+----------+
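The minimum-allocation numbers above come from the allocation records the RIRs publish; for anyone who wants to reproduce this kind of per-/8 summary, something along the lines of the sketch below works against the standard delegated-stats files the RIRs put on their FTP sites (e.g. delegated-ripencc-latest). The file name, the decision to count only "allocated"/"assigned" records, and the output format are assumptions to check, not part of Iljitsch's method.

# hypothetical helper: smallest IPv4 delegation per /8 in a delegated-stats file
# record format assumed: registry|cc|type|start|value|date|status
import math
import sys
from collections import defaultdict

def min_alloc_per_slash8(path):
    smallest = defaultdict(lambda: None)          # first octet -> block size
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("|")
            if len(fields) < 7 or fields[2] != "ipv4":
                continue
            if fields[6] not in ("allocated", "assigned"):
                continue
            octet = int(fields[3].split(".")[0])
            size = int(fields[4])                 # number of addresses in the block
            if smallest[octet] is None or size < smallest[octet]:
                smallest[octet] = size
    return smallest

if __name__ == "__main__":
    for octet, size in sorted(min_alloc_per_slash8(sys.argv[1]).items()):
        if size & (size - 1) == 0:                # power of two -> single prefix
            note = "/%d" % (32 - int(math.log2(size)))
        else:
            note = "not a single CIDR prefix"
        print("%d/8: smallest block %d addresses (%s)" % (octet, size, note))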
participants (11): Adrian Chadd, bmanning@vacation.karoshi.com, Iljitsch van Beijnum, Jared Mauch, Joe Provo, Jon Lewis, Leo Bicknell, Lincoln Dale, Nathan Ward, Stephen Sprunk, William Allen Simpson