Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready-made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASNs that announce a lot of unnecessary /24 prefixes and that happen to be far away from us. I would filter those to something like /20 and then just have a default route to catch all.
Thanks,
Baldur
What about these ones?
https://teamarin.net/2019/05/13/taking-a-hard-line-on-fraud/
On Wed, May 15, 2019 at 01:43:30PM +0200, Baldur Norddahl wrote:
> Or maybe we have a list of worst offenders?
Those numbers were subject to fraudulent acquisition. Some end users of these subject prefixes are victims. This blanket approach victimizes them further IMHO. My guess is this direction is why ARIN didn't post the prefixes in their blog post. They are however in the court docs. I don't recommend acting now. I could be wrong? Follow the registry, IMHO. John?
Best,
-M<
On Wed, May 15, 2019 at 08:25 Anderson, Charles R <cra@wpi.edu> wrote:
> What about these ones?
> https://teamarin.net/2019/05/13/taking-a-hard-line-on-fraud/
We recently filtered out >=/24 prefixes since we're impacted by 768k day. I'm attaching our lightly researched list of exceptions. I'm interested in what others' operational experience is with filtering in this way. Filtering /24s cut our table down to around 315K.
-- Dan White BTC Broadband Network Admin Lead Ph 918.366.0248 (direct) main: (918)366-8000 Fax 918.366.6610 email: dwhite@mybtc.com http://www.btcbroadband.com
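A length-based filter with an exception list, as described in the message above, could be sketched roughly like this. It is only an illustration: the function name, the sample prefixes (documentation and benchmark ranges) and the exception entry are assumptions, not the list that was attached to the original mail, and the sketch handles IPv4 only.

```python
import ipaddress

def filter_long_prefixes(prefixes, max_len=23, exceptions=()):
    """Return the prefixes that survive a length filter.

    A prefix is kept if it is no longer than max_len, or if it falls
    inside one of the exception networks (IPv4 only in this sketch).
    """
    allowed = [ipaddress.ip_network(e) for e in exceptions]
    kept = []
    for text in prefixes:
        net = ipaddress.ip_network(text)
        if net.prefixlen <= max_len or any(net.subnet_of(a) for a in allowed):
            kept.append(text)
    return kept

# Documentation and benchmark prefixes, not real announcements.
table = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "198.18.0.0/15"]
exceptions = ["203.0.113.0/24"]  # hypothetical exception, e.g. an anycast DNS prefix
kept = filter_long_prefixes(table, max_len=23, exceptions=exceptions)
print(kept)
```

Traffic to the dropped /24s then follows whatever less specific route or default remains, which is why this approach only makes sense together with a default route.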
> We recently filtered out >=/24 prefixes since we're impacted by 768k day.
What kind of network are you running? Doing such prefix filtering on an eyeball network strikes me as insane - you'd be cutting off customers from huge swathes of the Internet (including small companies like us) that don't have large IPv4 sequential allocations.
On 05/15/19 13:44 +0000, Phil Lavin wrote:
> What kind of network are you running?
We're an eyeball network. We accept default routes from our transit providers so in theory there should be no impact on reachability. I am somewhat concerned about problems I may not be aware of that stem from inefficient routing, e.g. customers hitting a public anycast DNS server in the wrong location, resulting in geolocation issues.
-- Dan White
> We accept default routes from our transit providers so in theory there should be no impact on reachability.
Ah! Understood. The default route(s) was the bit I missed. Makes a lot of sense if you can't justify buying new routers.
Have you seen issues with Anycast routing thus far? One would assume that routing would still be fairly efficient unless you're picking up transit from non-local providers over extended L2 links.
On 05/15/19 13:58 +0000, Phil Lavin wrote:
> Have you seen issues with Anycast routing thus far?
We've had no issues so far but this was a recent change. There was no noticeable change to outbound traffic levels.
-- Dan White
On Wed, May 15, 2019 at 7:27 AM Dan White <dwhite@olp.net> wrote:
> We've had no issues so far but this was a recent change. There was no noticeable change to outbound traffic levels.
+1, there is no issue with this approach. I have been taking “provider routes” + default for a long time, and it works great. This makes sure you use each provider’s “customer cone” and SLA to the max while reducing your route load / churn.
IMHO, you should only take full routes if your core business is providing full BGP feeds to downstream transit customers.
You can't do uRPF if you're not taking full routes.
You also have a more limited set of information for analytics if you don't have full routes.
-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com
Midwest-IX
http://www.midwest-ix.com
----- Original Message -----
From: "Ca By" <cb.list6@gmail.com>
To: "Dan White" <dwhite@olp.net>
Cc: nanog@nanog.org
Sent: Wednesday, May 15, 2019 1:50:41 PM
Subject: Re: BGP prefix filter list
> IMHO, you should only take full routes if your core business is providing full BGP feeds to downstream transit customers.
On Wed, May 15, 2019 at 11:52 AM Mike Hammett <nanog@ics-il.net> wrote:
> You can't do uRPF if you're not taking full routes.
I would never do uRPF; I am not a transit shop, so no problem there. BCP38 is as sexy as I get.
> You also have a more limited set of information for analytics if you don't have full routes.
Yep, I don’t run a sophisticated internet CDN either. Just pumping packets from eyeballs to clouds and back, mostly.
Speaking as an eyeball network myself, I think you'll probably want to look at those things. You don't need to run a CDN to know where your bits are going.
-----
Mike Hammett
----- Original Message -----
From: "Ca By" <cb.list6@gmail.com>
To: "Mike Hammett" <nanog@ics-il.net>
Cc: "Dan White" <dwhite@olp.net>, nanog@nanog.org
Sent: Wednesday, May 15, 2019 2:14:21 PM
> Yep, I don’t run a sophisticated internet CDN either. Just pumping packets from eyeballs to clouds and back, mostly.
Ca, taking a self-originated default route (with or without an additional partial view of the global routing table) from your transit provider's edge router seems to make the assumption that your transit provider's edge router either has a full table or a working default route itself. In the case of transit provider outages (planned or unplanned), the transit provider's edge router that you peer with may be up and reachable (and generating a default route to your routers), but may not have connectivity to the greater internet.
Put another way, if your own routers don't have a full routing table then they don't have enough information to make intelligent routing decisions and are offloading that responsibility onto the transit provider. IMHO, what's the point of being multi-homed if you can't make intelligent routing decisions and provide routing redundancy in the case of a transit provider outage?
Mike Hammett wrote on 5/15/2019 2:19 PM:
> As an eyeball network myself, you'll probably want to look at those things. You don't need to run a CDN to know where your bits are going.
On Thu, May 16, 2019, at 16:38, Blake Hudson wrote:
> offloading that responsibility onto the transit provider. IMHO, what's the point of being multi-homed if you can't make intelligent routing decisions and provide routing redundancy in the case of a transit provider outage?
Speaking of "intelligent routing", this is why targeting what you filter by criteria other than prefix length or AS-path length is a good idea. Do it either manually every once in a while (just make sure that you check the situation at least every few weeks), or in an automated manner (better). You just need more data (usually *flow/IPFIX based) to be able to make good decisions.
You can use traffic levels (or better, lack of traffic), traffic criticality, and prefix count savings as criteria.
-- R-A.F.
Radu-Adrian Feurdean wrote on 5/17/2019 5:10 AM:
> You can use traffic levels (or better, lack of traffic), traffic criticality, and prefix count savings as criteria.
From my perspective one's ability to intelligently route IP traffic is directly correlated to the data they have available (their routing protocol and table). For example, with static default routes one can only make the simplest of routing decisions; with dynamic default routes one can make more informed decisions; with a partial view of the internet one can make even better decisions; with a full view of the internet one can make good decisions; and with a routing protocol that takes into account bandwidth, latency, loss, or other metrics one can make the very best decisions. Determining how intelligent one wants his or her decisions to be, and how much he or she is willing to spend to get there, is an exercise for the reader. Not all routers need a full view of the internet, but some do. The cost of routers that hold a full routing table in FIB is generally more than those that do not, but overall is not cost prohibitive (in my opinion) for the folks that are already paying to be multihomed. Single homed networks (or those with a single transit provider and additional peers), probably won't benefit from holding more than a default route to their transit provider and therefore may be able to get by with a less capable router. Each network is different and the choices driven by the needs for redundancy, availability, performance, and cost will come out differently as well.
I wanted to mention one additional important point in all this monitoring discussion. Right now, Google services have stopped working for one of my subnets. Why? Because it seems that someone in Russia did a BGP hijack, BUT exclusively toward Google services (most likely via some kind of peering). Quite by chance, I noticed that the traceroute from Google Cloud to this subnet goes through Russia, although my country has nothing to do with Russia at all, not even transit traffic through them.
Sure, I mailed noc@google, but reaching someone at a big company is not the easiest job; you need to search for a contact that answers. And good luck with real-time communications.
And all large CDNs have their own "internet": although they speak BGP, they often interpret it in their own way, which no one but them can monitor or keep a history of. No looking glass either, for sure.
If your network is announced by a malicious party from another country, you will not even know about it, but your requests (actually, the answers from the service) will go through this party.
Did this get resolved? If not, please email me directly.
On Fri, May 17, 2019 at 9:46 AM Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
> Right now, for one of my subnets Google services stopped working.
On Fri, May 17, 2019, at 15:28, Blake Hudson wrote:
> From my perspective one's ability to intelligently route IP traffic is directly correlated to the data they have available (their routing protocol and table). For example, with static default routes one can
For me, the routing table and the available routing protocols are not the only things needed for intelligent routing. And the router is not the only component involved in "intelligent routing". Not these days, not anymore.
One thing that can help immensely in an internet environment is knowing where the data goes and where it comes from. Knowing your "important" traffic sources and destinations is part of it.
You can say "I can no longer keep all the routes in FIB, so I'll drop the /24s", then come to the conclusion that you have loads of traffic towards an anycast node located in a /24, or that you exchange voice with a VoIP provider that announces a /24. You have just lost the ability to do something proper with your important destination. On the other hand, you may easily leave to the default route (in extreme cases even drop) traffic to several /16s from Mulgazanzar Telecom, with which you barely exchange a few packets per day except the quarterly wave of DDoS/spam/scans/[name your favorite abuse]. Or you may just drop a few hundred more-specific routes for a destination that you do care about, but where you cannot do much because network-wise it is too far away.
Of course, such an approach involves human intervention, either for selecting the important and non-important destinations or for writing the code that does it automagically. Or both. There is no magic potion. (As a Friday afternoon remark, there used to be such a potion in France, the "green powder", but they permanently ran out of stock in 2004; see http://poudreverte.org/, site in fr_FR.)
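The "code that does it automagically" is not given in the thread. One very rough sketch of the selection step, assuming per-origin-AS traffic volumes have already been derived from flow data, might look like the following. The `OriginStats` fields, the thresholds and the AS numbers (taken from the documentation range) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OriginStats:
    asn: int
    routes: int            # routes from this origin currently in our table
    bytes_per_day: int     # traffic exchanged, e.g. derived from flow data
    critical: bool = False # manually flagged as important (VoIP, anycast DNS, ...)

def pick_cull_candidates(stats, max_bytes_per_day, routes_needed):
    """Pick low-traffic, non-critical origins until enough routes are saved."""
    candidates = [s for s in stats
                  if not s.critical and s.bytes_per_day <= max_bytes_per_day]
    # Biggest route savings first; among equals, the quietest origin first.
    candidates.sort(key=lambda s: (-s.routes, s.bytes_per_day))
    chosen, saved = [], 0
    for s in candidates:
        if saved >= routes_needed:
            break
        chosen.append(s.asn)
        saved += s.routes
    return chosen, saved

stats = [
    OriginStats(64496, routes=1200, bytes_per_day=5_000),
    OriginStats(64497, routes=300, bytes_per_day=9_000_000_000),
    OriginStats(64498, routes=800, bytes_per_day=2_000, critical=True),
    OriginStats(64499, routes=450, bytes_per_day=0),
]
chosen, saved = pick_cull_candidates(stats, max_bytes_per_day=1_000_000,
                                     routes_needed=1500)
print(chosen, saved)
```

The output is only a candidate list; as the message says, a human (or more code) still has to decide whether those destinations are really unimportant.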
Radu-Adrian Feurdean wrote on 5/17/2019 9:15 AM:
> Of course, such an approach involves human intervention, either for selecting the important and non-important destinations or for writing the code that does it automagically. Or both. There is no magic potion.
Radu, you're absolutely correct that BGP does not include the metrics often needed to make the best routing decisions. I mentioned metrics like bandwidth, delay, and loss (which some other routing protocols do consider); and you mentioned metrics like importance (I assume for business continuity or happy eyeballs) or the amount or frequency of data exchanged with a given remote AS/IP network. BGP addresses some problems (namely routing redundancy), but it has some intentional shortcomings when choosing the cheapest path, best performing path, or load balancing (not to mention its security shortcomings). Some folks choose to improve upon BGP by using BGP "optimizers", manual local pref adjustments, or similar configurations. And as this discussion has shown, other folks choose to introduce their own additional shortcomings by ignoring part of what BGP does have to offer. Perhaps in the future we will be able to agree on a replacement to (or improvements upon) BGP that addresses some of these shortcomings; we may also find that technology solves the limitations that currently force some folks to discard potentially valuable routing information.
On Fri, May 17, 2019 at 3:28 PM Blake Hudson <blake@ispn.net> wrote:
> From my perspective one's ability to intelligently route IP traffic is directly correlated to the data they have available (their routing protocol and table)
One point perhaps being missed by some is that routing decisions are not always best made at the very last moment, when you have a packet and need to decide on the destination. The culling of the routing table I wanted to do is on a full feed from my upstream providers. I am not taking a default, but I may add a default manually.
Think about it this way to save at least half the size of the FIB with two transit providers: find out which provider has the most prefixes going their way. Make a default to them and a route-map that drops every route. For the other provider, keep only the routes where they have better routing. This way you only use FIB space for the smaller provider. Everything else goes by default through the larger provider.
Now doing that in practice is hard because router vendors generally did not make route-map or similar constructs flexible enough for the needed logic.
But we can do other things, some of which have already been proposed in this thread. Like before, have a default to the "best" of your transit providers and use culling to drop routes. Are we not all doing something like that already, with route maps to give some routes higher priority instead of always going strictly by shortest AS path? The only difference is that you can fully drop the routes from the FIB if you install defaults to handle it instead.
Or what if I know that one of my transit providers is really good with Asia? I just want traffic to Asia to go to them by default. I can install my own covering routes for the APNIC address space and then save a ton of FIB space by dropping routes within that space. I can have exceptions if needed.
The above does not give you poorer routing decisions and may give you better ones.
Regards,
Baldur
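The two-provider idea can be illustrated offline, outside any router. The sketch below assumes "better routing" simply means a shorter AS path, which is a simplification of real BGP best-path selection, and all provider names and prefixes are made up. It also keeps a route via the default provider wherever a retained less specific route would otherwise capture the traffic through longest-prefix match, a detail the message does not mention but that follows from how forwarding lookups work. IPv4 only.

```python
import ipaddress
from collections import Counter

def plan_fib(routes):
    """routes maps prefix -> {provider: AS-path length}.

    Returns (default_provider, fib), where fib holds only the explicit
    routes still needed once a default points at default_provider.
    """
    # Best provider per prefix: shortest AS path, name as tie-breaker.
    best = {prefix: min(paths, key=lambda prov: (paths[prov], prov))
            for prefix, paths in routes.items()}
    counts = Counter(best.values())
    default_provider = max(sorted(counts), key=lambda prov: counts[prov])
    # Keep explicit routes only where the other provider wins.
    fib = {p: prov for p, prov in best.items() if prov != default_provider}
    # A dropped prefix covered by a kept less specific route would follow
    # that route instead of the default, so it has to stay in the FIB.
    kept_nets = [ipaddress.ip_network(p) for p in fib]
    for prefix, prov in best.items():
        if prov == default_provider:
            net = ipaddress.ip_network(prefix)
            if any(net != k and net.subnet_of(k) for k in kept_nets):
                fib[prefix] = prov
    return default_provider, fib

routes = {
    "198.51.100.0/24": {"A": 2, "B": 4},
    "203.0.113.0/24": {"A": 3, "B": 3},
    "192.0.2.0/24": {"A": 5, "B": 2},
    "198.18.0.0/15": {"A": 4, "B": 1},
    "198.18.5.0/24": {"A": 1, "B": 3},
    "100.64.0.0/10": {"A": 2},
}
default_provider, fib = plan_fib(routes)
print(default_provider, fib)
```

In this made-up table six routes shrink to three explicit entries plus the default. Whether a real table shrinks by half depends on how the best paths are actually split between the providers.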
Baldur Norddahl wrote on 5/17/2019 11:05 AM:
> Find out which provider has the most prefixes going their way. Make a default to them and a route-map that drops every route. For the other provider, keep only the routes where they have better routing.
Baldur, I believe most routing platforms already make use of clever shortcuts or techniques to reduce their FIB usage, but I don't think anyone has found a good, reliable method of reducing their RIB at zero cost. For example, what happens in your above configuration when your "better/default" transit provider is down due to maintenance or outage and your equipment continues to use its default route to direct traffic that direction?
What happens if the transit provider that you normally only retain the best paths for becomes the best path for all destinations (for example if your connection to the better/default transit provider is down for maintenance or there is an upstream peering change) and your router that normally only has a few thousand routes in RIB suddenly gets tasked with a 768k-1M route RIB?
I would argue that one can generally safely add information to his or her router's RIB (such as adding a local preference, weight, or advertising with prepends to direct traffic toward a better performing, less utilized, or lower cost peer), but that removing information from a router's RIB always comes at some cost (and some may find this cost perfectly acceptable).
On Fri, May 17, 2019 at 9:44 PM Blake Hudson <blake@ispn.net> wrote:
Baldur, I believe most routing platforms already make use of clever shortcuts or techniques to reduce their FIB usage, but I don't think anyone has found a good, reliable method of reducing their RIB at zero cost. For example, what happens in your above configuration when your "better/default" transit provider is down due to maintenance or outage and your equipment continues to use its default route to direct traffic that direction?
You will of course have two default routes, one to each transit provider. Using route priorities to program which one is actually used. If that link goes down, that default becomes invalid and the router will use the other one. A more advanced setup can use triggers, such as ping, bfd or BGP, to mark the route as valid or invalid.
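The failover behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `DefaultRoute`, `active_default`, the "lower priority value wins" convention and the next-hop addresses are all hypothetical, and the `link_up` flag stands in for whatever trigger (link state, ping, BFD or BGP) marks a route valid.

```python
from dataclasses import dataclass


@dataclass
class DefaultRoute:
    next_hop: str
    priority: int        # assumption: lower value is preferred
    link_up: bool = True  # stand-in for link state, ping, BFD or BGP tracking


def active_default(routes):
    """Return the preferred default among those still valid, or None."""
    valid = [r for r in routes if r.link_up]
    return min(valid, key=lambda r: r.priority) if valid else None


primary = DefaultRoute(next_hop="192.0.2.1", priority=10)
backup = DefaultRoute(next_hop="198.51.100.1", priority=20)

print(active_default([primary, backup]).next_hop)  # primary is used
primary.link_up = False
print(active_default([primary, backup]).next_hop)  # falls back to backup
```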
What happens if the transit provider that you normally only retain the best paths for becomes the best path for all destinations (for example if your connection to the better/default transit provider is down for maintenance or there is an upstream peering change) and your router that normally only has a few thousand routes in RIB suddenly gets tasked with a 768k-1M route RIB?
I am not sure I am following that question. Nothing happens, you will have a default plus a bunch of redundant routes, but not any more than you had before the primary transit went down.
I would argue that one can generally safely add information to his or her router's RIB (such as adding a local preference, weight, or advertising with prepends to direct traffic toward a better performing, less utilized, or lower cost peer), but that removing information from a router's RIB always comes at some cost (and some may find this cost perfectly acceptable).
One needs to remember that removing information from RIB is how BGP works. If you have the common setup of two BGP edge routers, each with a directly connected transit provider link, the routers will only tell the other one about the routes it actually uses. Neither router has a complete view. Regards, Baldur
I would argue that one can generally safely add information to his or her router's RIB (such as adding a local preference, weight, or advertising with prepends to direct traffic toward a better performing, less utilized, or lower cost peer), but that removing information from a router's RIB always comes at some cost (and some may find this cost perfectly acceptable).
One needs to remember that removing information from RIB is how BGP works. If you have the common setup of two BGP edge routers, each with a directly connected transit provider link, the routers will only tell the other one about the routes it actually uses. Neither router has a complete view.
I manage a network like you describe: Two BGP edge routers, both routers accept a full eBGP feed from transit, both share routing information via iBGP. Both edge routers in my network have a complete view. If one transit provider is down or there is an upstream peering change, both still have a complete view. The only time they wouldn't have a complete view is during convergence or when there is a simultaneous outage of both transit providers at different physical facilities. I could certainly use a default route (configured statically or received via BGP) instead, but that reduces my network's ability to make informed decisions. When one of my upstream transit providers is performing maintenance and loses a peer, I want that to be reflected in my routing so that traffic can be directed via the shortest path. When my transit provider's edge router loses upstream connectivity, but maintains connectivity to my equipment, I want that reflected in my routing so that traffic doesn't go towards the path that leads to the bit bucket. I can't detect those conditions and route around them if my router only has a default route.
On Fri, May 17, 2019 at 10:43 PM Blake Hudson <blake@ispn.net> wrote:
I manage a network like you describe: Two BGP edge routers, both routers accept a full eBGP feed from transit, both share routing information via iBGP. Both edge routers in my network have a complete view. If one transit provider is down or there is an upstream peering change, both still have a complete view. The only time they wouldn't have a complete view is during convergence or when there is a simultaneous outage of both transit providers at different physical facilities.
What I mean by not having a complete view, is that your two routers do not have the same information. One router has all the routes from the transit directly connected, but only a subset of routes from the other transit provider. And vice versa for the other router. Therefore the two routers might not make the same routing decisions.
Let me show you an example from two routers in our network:
albertslund-edge1#show bgp vpnv4 unicast vrf internet detail 8.8.8.0 255.255.255.0
BGP routing table entry for 8.8.8.0/24
  20w0d received from 193.239.117.141 (66.249.94.118), path-id 0
    Origin i, nexthop 193.239.117.141, metric 100, localpref 500,weight 0, rtpref 200, best, block best, selected,
    Community 60876:34307
    As path [15169]
    As4 path
    Received label notag
  Imported from 185.24.168.254 (185.24.168.254); Route Distinguisher:60876:0 (default for vrf internet)
    Origin i, nexthop 185.24.168.254, metric 100, localpref 500,weight 0, rtpref 200,
    Community 60876:34307
    As path [15169]
    As4 path
    Route target:60876:0
    Received label 164540
---
ballerup-edge1#show bgp vpnv4 unicast vrf internet detail 8.8.8.0 255.255.255.0
BGP routing table entry for 8.8.8.0/24
  43w1d received from 193.239.117.141 (66.249.94.118), path-id 0
    Origin i, nexthop 193.239.117.141, metric 100, localpref 500,weight 0, rtpref 200, best, block best, selected,
    Community 60876:34307
    As path [15169]
    As4 path
    Received label notag
  Imported from 185.24.171.254 (185.24.171.254); Route Distinguisher:60876:0 (default for vrf internet)
    Origin i, nexthop 185.24.171.254, metric 100, localpref 500,weight 0, rtpref 200,
    Community 60876:34307
    As path [15169]
    As4 path
    Route target:60876:0
    Received label 164140
  29w2d received from 216.66.83.101 (216.218.252.202), path-id 0
    Origin i, nexthop 216.66.83.101, metric 100, localpref 450,weight 0, rtpref 200,
    Community 60876:6939
    As path [6939 15169]
    As4 path
    Received label notag
  43w2d received from 149.6.137.57 (154.26.32.142), path-id 0
    Origin i, nexthop 149.6.137.57, metric 200, localpref 100,weight 0, rtpref 200,
    Community 174:21100 174:22010 60876:174
    As path [174 6453 15169]
    As4 path
    Received label notag
---
One router knows about 2 paths, the other about 4 paths. Why? Because BGP only advertises the route that is in use. Everyone here of course knows this, I am just pointing it out because culling information before allowing it to be redistributed within your network is what BGP is already doing anyway. It is possible to remove some of that information from the local FIB too without losing anything at all.
Using a default also gives you a dramatically shorter convergence time if one of the transits goes down. Having 800k routes can be harmful to your network even with equipment that can handle it. Yes I am aware that I am not doing what I am preaching here, but I am considering it :-).
Regards
Baldur
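The 2-versus-4 path count can be reproduced with a small Python model. It is a sketch under simplifying assumptions: best-path selection here looks only at local preference, each router advertises only its single best path over iBGP, and the "Imported" entry in the output is taken to be that iBGP-learned path. The next hops and localpref values come from the output above; the function names are hypothetical.

```python
def best(paths):
    # Simplification: choose the path with the highest local preference.
    return max(paths, key=lambda p: p["localpref"])


def view(own_ebgp, peer_ebgp):
    # A router sees all of its own eBGP paths plus the one best path
    # that its iBGP peer advertises.
    return list(own_ebgp) + [best(peer_ebgp)]


albertslund_ebgp = [
    {"nexthop": "193.239.117.141", "localpref": 500},
]
ballerup_ebgp = [
    {"nexthop": "193.239.117.141", "localpref": 500},
    {"nexthop": "216.66.83.101", "localpref": 450},
    {"nexthop": "149.6.137.57", "localpref": 100},
]

print(len(view(albertslund_ebgp, ballerup_ebgp)))  # 2
print(len(view(ballerup_ebgp, albertslund_ebgp)))  # 4
```

The two lower-preference paths on ballerup-edge1 never reach albertslund-edge1, which is the culling-by-design that the message describes.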
Baldur Norddahl wrote on 5/18/2019 3:57 AM:
... One router knows about 2 paths, the other about 4 paths. Why? Because BGP only advertises the route that is in use. Everyone here of course knows this, I am just pointing it out because culling information before allowing it to be redistributed within your network is what BGP is already doing anyway. It is possible to remove some of that information from the local FIB too without losing anything at all.
Using a default also gives you a dramatically shorter convergence time if one of the transits goes down. Having 800k routes can be harmful to your network even with equipment that can handle it. Yes I am aware that I am not doing what I am preaching here, but I am considering it :-).
Thanks for the clarification. Yes, you are correct that each router will have its own unique view. By full view I meant that a router has at least one route for every prefix advertised into the DFZ. One should also expect that each transit provider will provide a slight variation in the routes provided via its "full BGP feed" because each transit provider has its own unique view and may include routes in its feeds that are not advertised into the DFZ. Appreciate the discourse my friend, --B
On Fri, May 17, 2019 at 9:06 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Think about this way to save at least half the size of the FIB with two transit providers: Find out which provider has the most prefixes going their way. Make a default to them and a route-map that drops every route. For the other provider, keep only the routes where they have better routing. This way you only use FIB space for the smaller provider. Everything else goes by default through the larger provider.
Hi Baldur,
The technique you describe was one variant of FIB Compression. It got some attention around 8 years ago on the IRTF Routing Research Group and some more attention about 5 years ago when several researchers fleshed out the possible algorithms and projected gains. As I recall they found a 30% to 60% reduction in FIB use depending on which algorithm was chosen, how many peers you had, etc.
As far as I know there are no production implementations. Likely the extra complexity needed to process RIB updates into FIB updates outweighs the cost of simply adding more TCAM. Another downside is that you lose the implicit discard default route, which means that routing loops become possible.
Regards,
Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
Brocade (now Extreme) does this on their SLX platform to market 1M FIB boxes as 1.3M FIB boxes after compression. We went with the Juniper MX platform instead, the relatively small FIB size on the SLX being one of the main sticking points for me personally. Nowadays there are also some SLX models with a larger FIB, which don't need compression algorithms to accommodate the routing table growth for a couple of years. Best regards, Martijn On 20 May 2019 23:05:45 BST, William Herrin <bill@herrin.us> wrote:
On Fri, May 17, 2019 at 9:06 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Think about this way to save at least half the size of the FIB with two transit providers: Find out which provider has the most prefixes going their way. Make a default to them and a route-map that drops every route. For the other provider, keep only the routes where they have better routing. This way you only use FIB space for the smaller provider. Everything else goes by default through the larger provider.
Hi Baldur,
The technique you describe was one variant of FIB Compression. It got some attention around 8 years ago on the IRTF Routing Research Group and some more attention about 5 years ago when several researchers fleshed out the possible algorithms and projected gains. As I recall they found a 30% to 60% reduction in FIB use depending on which algorithm was chosen, how many peers you had, etc.
As far as I know there are no production implementations. Likely the extra complexity needed to process RIB updates into FIB updates outweighs the cost of simply adding more TCAM. Another downside is that you lose the implicit discard default route, which means that routing loops become possible.
Regards, Bill Herrin
On 5/20/19 3:05 PM, William Herrin wrote:
The technique you describe was one variant of FIB Compression. It got some attention around 8 years ago on the IRTF Routing Research Group and some more attention about 5 years ago when several researchers fleshed out the possible algorithms and projected gains. As I recall they found a 30% to 60% reduction in FIB use depending on which algorithm was chosen, how many peers you had, etc.
A good start would be killing any /24 announcement where a covering aggregate exists.
On Mon, May 20, 2019 at 4:09 PM Seth Mattinen <sethm@rollernet.us> wrote:
On 5/20/19 3:05 PM, William Herrin wrote:
The technique you describe was one variant of FIB Compression. It got some attention around 8 years ago on the IRTF Routing Research Group and some more attention about 5 years ago when several researchers fleshed out the possible algorithms and projected gains. As I recall they found a 30% to 60% reduction in FIB use depending on which algorithm was chosen, how many peers you had, etc.
A good start would be killing any /24 announcement where a covering aggregate exists.
Only when the routes are identical -- same origin, same path. Otherwise you're potentially throwing away your only path to that destination. And if you lose the aggregate, the /24 has to be reintroduced to the FIB. Which means you have to interlink the routes in the RIB data structure so that the update algorithm dealing with the aggregate knows there's an associated /24. There's some real subtlety a FIB Compression implementor must take into account.
Regards,
Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
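The "identical routes only" condition can be sketched in Python with the standard `ipaddress` module. This is an illustration, not any vendor's FIB compression algorithm: the function name, the route table layout (prefix mapped to next hop and AS path) and the sample routes are hypothetical.

```python
import ipaddress


def redundant_more_specifics(routes):
    """routes maps prefix string -> (next_hop, as_path tuple).

    A more-specific is redundant only if its nearest covering prefix
    carries exactly the same next hop and AS path.
    """
    nets = {ipaddress.ip_network(p): attrs for p, attrs in routes.items()}
    redundant = []
    for net, attrs in nets.items():
        covers = [
            c for c in nets
            if c != net and c.version == net.version and net.subnet_of(c)
        ]
        if not covers:
            continue
        # The nearest cover is the one with the longest prefix length.
        parent = max(covers, key=lambda c: c.prefixlen)
        if nets[parent] == attrs:
            redundant.append(str(net))
    return sorted(redundant)


routes = {
    "10.0.0.0/16": ("192.0.2.1", (64500, 64501)),
    "10.0.1.0/24": ("192.0.2.1", (64500, 64501)),  # same path: redundant
    "10.0.2.0/24": ("192.0.2.1", (64500, 64502)),  # different origin: keep
}
print(redundant_more_specifics(routes))
```

The sketch is a one-shot computation. As noted above, a real implementation also has to reinstall the /24 when the aggregate is withdrawn, which requires linking the two entries in the RIB.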
On 5/15/19 2:52 PM, Mike Hammett wrote:
You can't do uRPF if you're not taking full routes.
You also have a more limited set of information for analytics if you don't have full routes.
Or instead of uRPF (loose) on transit links, just take a BOGON feed? -- inoc.net!rblayzor XMPP: rblayzor.AT.inoc.net PGP: https://pgp.inoc.net/rblayzor/
Would also cut out anyone who uses /24s for anycast, or just general traffic control... Or as you put it, an insane amount of important stuff.
On May 15, 2019, at 7:44 AM, Phil Lavin <phil.lavin@cloudcall.com> wrote:
We recently filtered out >=/24 prefixes since we're impacted by 768k day.
What kind of network are you running? Doing such prefix filtering on an eyeball network strikes me as insane - you'd be cutting off customers from huge swathes of the Internet (including small companies like us) that don't have large IPv4 sequential allocations.
If you have multiple transit providers and still want to be able to push traffic to the best path (rather than relying on a default route alone), then maybe use a filter per transit provider that accepts only routes with an AS path length of 2 or 3 or shorter, and a default route for the rest. You will get significantly fewer prefixes, and BGP path selection will work “locally”. For far away prefixes though (more than 4 ASes away), you will not (always) pick the best path.
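A minimal Python sketch of the AS-path-length filter suggested above. The function name, the default limit of 3 and the sample routes are hypothetical; the AS numbers and prefixes are from the documentation ranges.

```python
def filter_by_as_path(routes, max_len=3):
    """routes maps prefix -> AS path as a tuple of ASNs, nearest AS first.

    Keeps only routes whose AS path has at most max_len entries; everything
    else is expected to follow a default route.
    """
    return {
        prefix: path for prefix, path in routes.items() if len(path) <= max_len
    }


routes = {
    "192.0.2.0/24": (64496,),
    "198.51.100.0/24": (64496, 64497, 64498),
    "203.0.113.0/24": (64496, 64497, 64498, 64499, 64500),
}
kept = filter_by_as_path(routes)
print(sorted(kept))
```

Note that this counts every entry in the path, so a route with prepends looks longer than the number of distinct networks it crosses and may be filtered even though it is nearby.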
On 15 May 2019, at 16:36, Dan White <dwhite@olp.net> wrote:
We recently filtered out >=/24 prefixes since we're impacted by 768k day. I'm attaching our lightly researched list of exceptions. I'm interested in what others' operational experience is with filtering in this way.
Filtering /24s cut our table down to around 315K.
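A rough Python sketch of filtering /24 and longer IPv4 prefixes while honouring an exception list. The function name, the exact-match exception semantics and the sample prefixes are assumptions for illustration; the attached exception list from this message is not reproduced here.

```python
import ipaddress


def accept(prefix, exceptions, max_len=23):
    """Accept IPv4 prefixes up to /max_len, plus explicitly excepted ones.

    IPv6 prefixes are passed through untouched in this sketch.
    """
    net = ipaddress.ip_network(prefix)
    if net.version != 4 or net.prefixlen <= max_len:
        return True
    # Assumption: exceptions are matched exactly, not as covering ranges.
    return net in exceptions


exceptions = {ipaddress.ip_network("192.0.2.0/24")}

print(accept("10.0.0.0/16", exceptions))      # short enough, accepted
print(accept("198.51.100.0/24", exceptions))  # /24 without exception, dropped
print(accept("192.0.2.0/24", exceptions))     # /24 on the exception list
```

Anything dropped this way still needs a covering route or a default, otherwise the destinations behind the filtered /24s become unreachable.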
On 05/15/19 13:43 +0200, Baldur Norddahl wrote:
Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
Thanks,
Baldur
-- Dan White BTC Broadband Network Admin Lead Ph 918.366.0248 (direct) main: (918)366-8000 Fax 918.366.6610 email: dwhite@mybtc.com http://www.btcbroadband.com <punt-768k-day.txt>
What is the most common platform people are using with such limitations? How long ago was it deprecated?
-----
Mike Hammett
Intelligent Computing Solutions http://www.ics-il.com
Midwest-IX http://www.midwest-ix.com
----- Original Message -----
From: "Baldur Norddahl" <baldur.norddahl@gmail.com>
To: nanog@nanog.org
Sent: Wednesday, May 15, 2019 6:43:30 AM
Subject: BGP prefix filter list
Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
Thanks,
Baldur
On Wed, 15 May 2019, Mike Hammett wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
One network's deprecated router is another network's new [bargain priced] core router. :) ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
You have no idea how sad and true this is. On Wed, May 15, 2019 at 10:16 AM Jon Lewis <jlewis@lewis.org> wrote:
On Wed, 15 May 2019, Mike Hammett wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
One network's deprecated router is another network's new [bargain priced] core router. :)
---------------------------------------------------------------------- Jon Lewis, MCP :) | I route | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
On 5/15/19 7:26 AM, Dovid Bender wrote:
You have no idea how sad and true this is.
On Wed, May 15, 2019 at 10:16 AM Jon Lewis <jlewis@lewis.org> wrote:
On Wed, 15 May 2019, Mike Hammett wrote:
> What is the most common platform people are using with such limitations? How long ago was it deprecated?
One network's deprecated router is another network's new [bargain priced] core router. :)
This is very true. I picked up a nicely equipped juniper mx240 - waayyyy overkill for my current operation - for far, far cheaper than anything I could have otherwise afforded new. Absolutely killer could not be happier, and J has won a convert. But, I find this seems to be the thing - needing capacity/feature sets/etc just to be able to stand still, but not having the revenue stream to actually pay new for what these vendors want to charge for their gear/licenses/etc. Mike-
On 15/May/19 19:20, Mike wrote:
This is very true. I picked up a nicely equipped juniper mx240 - waayyyy overkill for my current operation - for far, far cheaper than anything I could have otherwise afforded new. Absolutely killer could not be happier, and J has won a convert. But, I find this seems to be the thing - needing capacity/feature sets/etc just to be able to stand still, but not having the revenue stream to actually pay new for what these vendors want to charge for their gear/licenses/etc.
It is a quagmire, isn't it? The revenue from capacity (Ethernet, IP, DWDM, SDH) is falling every year, to a point where it stops becoming a primary revenue source for any telecoms provider. However, the cost of equipment is not following suit, be it on the IP, Transport or Mobile side, terrestrial, marine or wireless. Work that is going on in the open space around all of this for hardware and software needs to pick its pace up, otherwise this disconnect between the loss of revenue and the cost of capex will remain. Mark.
Eh... you'll find it hard to get that past me. I know hundreds of self-funded ISPs that don't have route table size issues. ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com ----- Original Message ----- From: "Jon Lewis" <jlewis@lewis.org> To: "Mike Hammett" <nanog@ics-il.net> Cc: nanog@nanog.org Sent: Wednesday, May 15, 2019 9:14:57 AM Subject: Re: BGP prefix filter list On Wed, 15 May 2019, Mike Hammett wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
One network's deprecated router is another network's new [bargain priced] core router. :) ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
On 5/15/2019 9:10 AM, Mike Hammett wrote:
Eh... you'll find it hard to get that past me. I know hundreds of self-funded ISPs that don't have route table size issues.
Lots of good non-big vendor options these days - times have changed for sure. I'm running an EdgeRouter Infinity with BGP feeds for v4 and v6 at home - very reasonably priced router with lots of ports and functionality. Even the old EdgeRouter Lite supported multiple BGP tables - and that was 7 years ago at a ~ $100 price point. But for under $200 you can get an ER4, which will do most of the things the $1000+ routers will do. 'Tik, white box Linux/BSD, etc all offer good options at varying price points.
-- Brielle Bruns The Summit Open Source Development Group http://www.sosdg.org / http://www.ahbl.org
On 15/05/2019 17:28, Brielle Bruns wrote:
Lots of good non-big vendor options these days - times have changed for sure.
Indeed.
'Tik, white box Linux/BSD, etc all offer good options at varying price points.
Any pointers and/or references, when looking into speeds *above* what is possible with aggregated 10G links?
On 5/15/2019 9:46 AM, Hansen, Christoffer wrote:
'Tik, white box Linux/BSD, etc all offer good options at varying price points.
Any pointers and/or references, when looking into speeds *above* what is possible with aggregated 10G links?
That's a good question - I've not gotten past 10G yet. Cheaply, you could get ConnectX-3 40G PCIe cards and throw them in your favorite Dell/HP/Supermicro/other rack mount server with your Linux/BSD distro of choice, or VyOS for that matter. There are instructions online on converting the IB versions of the Mellanox cards to their Ethernet counterparts, if you want to cut some cost even more. -- Brielle Bruns The Summit Open Source Development Group http://www.sosdg.org / http://www.ahbl.org
If you're going whitebox, I would check out Netgate's new product called TNSR. It uses VPP for the data plane, which does all its processing in user space, thus avoiding the inefficiencies of the kernel network stack. That's particularly important at higher speeds like 40G or 100G. Disclaimer: I have not tried it myself but I've only heard good things. On Wed, May 15, 2019 at 12:01 PM Brielle Bruns <bruns@2mbit.com> wrote:
On 5/15/2019 9:46 AM, Hansen, Christoffer wrote:
'Tik, white box Linux/BSD, etc all offer good options at varying price points.
Any pointers and/or references, when looking into speeds *above* what is possible with aggregated 10G links?
That's a good question - I've not gotten past 10G yet.
Cheaply, you could get ConnectX-3 40G PCIe cards and throw them in your favorite Dell/HP/Supermicro/other rack mount server with your Linux/BSD distro of choice, or VyOS for that matter.
There are instructions online on converting the IB versions of the Mellanox cards to their Ethernet counterparts, if you want to cut some cost even more.
-- Brielle Bruns The Summit Open Source Development Group http://www.sosdg.org / http://www.ahbl.org
Hello On Wed, May 15, 2019 at 3:56 PM Mike Hammett <nanog@ics-il.net> wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
We are a small network with approx 10k customers and two core routers. The routers are advertised as 2 million FIB and 10 million RIB.
This morning at about 2 AM CET our iBGP session between the two core routers started flapping every 5 minutes. This is how long it takes to exchange the full table between the routers. The eBGP sessions to our transits were stable and never went down.
The iBGP session is a MPLS multiprotocol BGP session that exchanges IPv4, IPv6 and VRF in a single session.
We are working closely together with another ISP that has the same routers. His network went down as well.
Nothing would help until I culled the majority of the IPv6 routes by installing a default IPv6 route together with a filter that drops every IPv6 route received on our transits. After that I could not do any more experimentation. Need to have a maintenance window during the night.
These routers have shared IPv4 and IPv6 memory space. My theory is that the combined prefix count is causing the problem. But it could also be some IPv6 prefix first seen this night that triggers a bug. Or something else.
Regards,
Baldur
Hello Baldur, What routers are you running? -Mike
On May 15, 2019, at 11:22, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hello
On Wed, May 15, 2019 at 3:56 PM Mike Hammett <nanog@ics-il.net> wrote: What is the most common platform people are using with such limitations? How long ago was it deprecated?
We are a small network with approx 10k customers and two core routers. The routers are advertised as 2 million FIB and 10 million RIB.
This morning at about 2 AM CET our iBGP session between the two core routers started flapping every 5 minutes. This is how long it takes to exchange the full table between the routers. The eBGP sessions to our transits were stable and never went down.
The iBGP session is a MPLS multiprotocol BGP session that exchanges IPv4, IPv6 and VRF in a single session.
We are working closely together with another ISP that has the same routers. His network went down as well.
Nothing would help until I culled the majority of the IPv6 routes by installing a default IPv6 route together with a filter that drops every IPv6 route received on our transits. After that I could not do any more experimentation. Need to have a maintenance window during the night.
These routers have shared IPv4 and IPv6 memory space. My theory is that the combined prefix count is causing the problem. But it could also be some IPv6 prefix first seen this night that triggers a bug. Or something else.
Regards,
Baldur
My purpose is not to shame the vendor, but anyway these are ZTE M6000. We are currently planning to implement Juniper MX204 instead, but not because of this incident. We just ran out of bandwidth and brand new MX204 are cheaper than 100G capable shelves for the old platform.
Regards,
Baldur
On Wed, May 15, 2019 at 8:42 PM <mike.lyon@gmail.com> wrote:
Hello Baldur,
What routers are you running?
-Mike
On May 15, 2019, at 11:22, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hello
On Wed, May 15, 2019 at 3:56 PM Mike Hammett <nanog@ics-il.net> wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
We are a small network with approx 10k customers and two core routers. The routers are advertised as 2 million FIB and 10 million RIB.
This morning at about 2 AM CET our iBGP session between the two core routers started flapping every 5 minutes. This is how long it takes to exchange the full table between the routers. The eBGP sessions to our transits were stable and never went down.
The iBGP session is a MPLS multiprotocol BGP session that exchanges IPv4, IPv6 and VRF in a single session.
We are working closely together with another ISP that has the same routers. His network went down as well.
Nothing would help until I culled the majority of the IPv6 routes by installing a default IPv6 route together with a filter that drops every IPv6 route received on our transits. After that I could not do any more experimentation. Need to have a maintenance window during the night.
These routers have shared IPv4 and IPv6 memory space. My theory is that the combined prefix count is causing the problem. But it could also be some IPv6 prefix first seen this night that triggers a bug. Or something else.
Regards,
Baldur
I wouldn't call it shaming the vendor. There are a ton of platforms out there by nearly every vendor that can't accommodate modern table sizes.
-----
Mike Hammett
Intelligent Computing Solutions http://www.ics-il.com
Midwest-IX http://www.midwest-ix.com
----- Original Message -----
From: "Baldur Norddahl" <baldur.norddahl@gmail.com>
To: nanog@nanog.org
Sent: Wednesday, May 15, 2019 1:47:24 PM
Subject: Re: BGP prefix filter list
My purpose is not to shame the vendor, but anyway these are ZTE M6000. We are currently planning to implement Juniper MX204 instead, but not because of this incident. We just ran out of bandwidth and brand new MX204 are cheaper than 100G capable shelves for the old platform.
Regards,
Baldur
On Wed, May 15, 2019 at 8:42 PM <mike.lyon@gmail.com> wrote:
Hello Baldur,
What routers are you running?
-Mike
On May 15, 2019, at 11:22, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hello
On Wed, May 15, 2019 at 3:56 PM Mike Hammett <nanog@ics-il.net> wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
We are a small network with approx 10k customers and two core routers. The routers are advertised as 2 million FIB and 10 million RIB.
This morning at about 2 AM CET our iBGP session between the two core routers started flapping every 5 minutes. This is how long it takes to exchange the full table between the routers. The eBGP sessions to our transits were stable and never went down.
The iBGP session is a MPLS multiprotocol BGP session that exchanges IPv4, IPv6 and VRF in a single session.
We are working closely together with another ISP that has the same routers. His network went down as well.
Nothing would help until I culled the majority of the IPv6 routes by installing a default IPv6 route together with a filter that drops every IPv6 route received on our transits. After that I could not do any more experimentation. Need to have a maintenance window during the night.
These routers have shared IPv4 and IPv6 memory space. My theory is that the combined prefix count is causing the problem. But it could also be some IPv6 prefix first seen this night that triggers a bug. Or something else.
Regards,
Baldur
Hi Baldur, Have you tried disabling storage of received updates from your upstream on your edge/PE or Border? Just remove *soft-reconfiguration inbound* for eBGP peering with your upstream/s. This will resolve your issue. If you have multiple links to different upstream providers and you want to simplify your network operation, you might want to introduce a pair of route reflectors to handle all your IP and MPLS VPN routes... Cheers, Ahad On Thu, May 16, 2019 at 4:24 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hello
On Wed, May 15, 2019 at 3:56 PM Mike Hammett <nanog@ics-il.net> wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
We are a small network with approx 10k customers and two core routers. The routers are advertised as 2 million FIB and 10 million RIB.
This morning at about 2 AM CET our iBGP session between the two core routers started flapping every 5 minutes. This is how long it takes to exchange the full table between the routers. The eBGP sessions to our transits were stable and never went down.
The iBGP session is an MPLS multiprotocol BGP session that exchanges IPv4, IPv6 and VRF in a single session.
We are working closely together with another ISP that has the same routers. His network went down as well.
Nothing would help until I culled the majority of the IPv6 routes by installing a default IPv6 route together with a filter that drops every IPv6 route received on our transits. After that I could not do any more experimentation. Need to have a maintenance window during the night.
These routers have shared IPv4 and IPv6 memory space. My theory is that the combined prefix count is causing the problem. But it could also be some IPv6 prefix first seen this night that triggers a bug. Or something else.
Regards,
Baldur
Can you check the actual FIB usage? With 2M IPv4 entries divided between v4 and v6, and multiplied by Fast ReRoute, you could hit the limit.
On Wed, May 15, 2019 at 20:24, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hello
On Wed, May 15, 2019 at 3:56 PM Mike Hammett <nanog@ics-il.net> wrote:
What is the most common platform people are using with such limitations? How long ago was it deprecated?
We are a small network with approx 10k customers and two core routers. The routers are advertised as 2 million FIB and 10 million RIB.
This morning at about 2 AM CET our iBGP session between the two core routers started flapping every 5 minutes. This is how long it takes to exchange the full table between the routers. The eBGP sessions to our transits were stable and never went down.
The iBGP session is an MPLS multiprotocol BGP session that exchanges IPv4, IPv6 and VRF in a single session.
We are working closely together with another ISP that has the same routers. His network went down as well.
Nothing would help until I culled the majority of the IPv6 routes by installing a default IPv6 route together with a filter that drops every IPv6 route received on our transits. After that I could not do any more experimentation. Need to have a maintenance window during the night.
These routers have shared IPv4 and IPv6 memory space. My theory is that the combined prefix count is causing the problem. But it could also be some IPv6 prefix first seen this night that triggers a bug. Or something else.
Regards,
Baldur
On Wed, 15 May 2019, Baldur Norddahl wrote:
Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
This may be too old to be terribly useful other than as a starting point, but we went through essentially the same thing a little more than 10 years ago:
http://jonsblog.lewis.org/2008/01/19#bgp
----------------------------------------------------------------------
Jon Lewis, MCP :) | I route | therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
Hi,
did you find https://labs.ripe.net/Members/emileaben/768k-day-will-it-happen-did-it-happe... ? It has further links at the end as well.
If you hit the 768k issue for IPv4, you might look at IPv6 as well, as there might be a 64k limit on some TCAM profiles. If there is no IPv6 in use (very sad face), there might be the option to switch to a 1M IPv4 route profile.
Using a default route might influence Reverse Path Forwarding on the device. But you can apply an outbound ACL on upstream ports as well.
The weekly routing table report has lists of the worst offenders when it comes to deaggregation, or see https://www.cidr-report.org/as2.0/
Karsten
On Wed, May 15, 2019 at 13:45, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
Thanks,
Baldur
On Wed, May 15, 2019, at 13:44, Baldur Norddahl wrote:
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
Hi,
You can start here: http://www.cidr-report.org/as2.0/#Gains
You will have to do some manual work in order to identify how to optimally filter, but you may save some space.
But the more important questions are:
- how long will it last after one round of clean-up?
- can't you afford to use a default route?
You can use tools like AS-Stats (or the more expensive and much more powerful alternatives), if your hardware allows it, in order to find the ASes that you have close to no traffic towards and leave those via default.
Or, if you can afford a dedicated internet border router, there are models that are starting to reach decent price levels on the refurbished market (the ASR9001 comes to mind, which should be pretty cheap these days).
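As a rough illustration of the kind of gain the CIDR report lists, here is a minimal Python sketch using the standard `ipaddress` module. The prefixes are made up for the example, and the function name `aggregation_gain` is hypothetical. It also assumes all prefixes share the same origin AS and path, which is what makes collapsing them safe; the CIDR report does this per origin AS, and this sketch does not check it.

```python
import ipaddress

def aggregation_gain(prefixes):
    """Return (announced_count, aggregated_count, aggregated_networks) for IPv4.

    Assumption: every prefix has the same origin AS and AS path, so merging
    adjacent prefixes does not change where traffic goes.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    v4 = [n for n in nets if n.version == 4]
    # collapse_addresses merges adjacent and overlapping networks
    aggregated = list(ipaddress.collapse_addresses(v4))
    return len(v4), len(aggregated), aggregated

# Made-up example: sixteen contiguous /24s plus two isolated /24s
announced = [f"10.20.{i}.0/24" for i in range(16)]
announced += ["10.30.0.0/24", "10.30.2.0/24"]

before, after, nets = aggregation_gain(announced)
print(f"{before} announced -> {after} after aggregation")
for n in nets:
    print(" ", n)
```

The sixteen contiguous /24s collapse into a single /20, while the two isolated /24s cannot be merged. A real filter would then accept the /20 and drop the more-specifics, with the default route catching anything that falls through.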
At a previous company, about 10-ish years ago, I had the same problem due to equipment limitations, and wasn't able to get the dollars to upgrade anything.
The most effective thing for me at the time was to start dumping any prefix with an as-path length longer than 10. For our business then, if you were that 'far away', there wasn't any good reason for us to keep your route. Following default was going to be good enough.
It's still a reasonable solution, I think, in a lot of cases to filter out a lot of the unnecessary prepend messes out there today.
On Wed, May 15, 2019 at 7:45 AM Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
Thanks,
Baldur
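The AS-path length approach from the reply above (drop anything with a path longer than 10 and follow default instead) can be sketched in a few lines of Python. This is only a model of the policy, not router configuration: the `Route` class, the function name and the sample routes are all hypothetical, and the ASNs come from the documentation range. Prepends are counted as part of the length here, which is an assumption about how the original filter behaved.

```python
from dataclasses import dataclass

MAX_AS_PATH_LEN = 10  # threshold mentioned in the message; tune per network

@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # ASNs, nearest neighbor first, prepends included

def split_by_path_length(routes, max_len=MAX_AS_PATH_LEN):
    """Split routes into (kept, dropped) by AS-path length.

    Dropped prefixes are expected to be reached via the default route.
    """
    kept, dropped = [], []
    for route in routes:
        if len(route.as_path) > max_len:
            dropped.append(route)
        else:
            kept.append(route)
    return kept, dropped

# Hypothetical sample routes using documentation ASNs (64496-64511)
routes = [
    Route("192.0.2.0/24", (64496, 64497, 64498)),
    # heavily prepended path: 2 transit hops + origin repeated 10 times
    Route("198.51.100.0/24", (64496, 64499) + (64500,) * 10),
]

kept, dropped = split_by_path_length(routes)
print("kept:   ", [r.prefix for r in kept])
print("dropped:", [r.prefix for r in dropped])
```

The second route has a path length of 12 only because of prepending, which is exactly the "prepend mess" case the message describes.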
Hello,
As a comment, after receiving several complaints and looking at many cases, we evaluated which is better: cutting the table size by filtering "big" networks or "small" networks. Of course this is a difficult scenario and I guess there are mixed opinions about this; however, we concluded that the networks that are least affected are those that learn small network prefixes (such as /24, /23, /22, /21 in the v4 world).
If you learn, let's say, up to /22 (v4), and someone hijacks a /21, you will learn both the legitimate prefix and the hijacked prefix. Now the owner of the legitimate prefix wants to defend their routes by announcing /23s or /24s; of course those prefixes won't be learned if they are filtered.
We published this some time ago (sorry, in Spanish): http://w4.labs.lacnic.net/site/BGP-network-size-filters
That's it, my two cents.
Alejandro,
On 5/15/19 7:43 AM, Baldur Norddahl wrote:
Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
Thanks,
Baldur
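The point about prefix-size filters and hijack defense can be shown with a small longest-prefix-match model in Python. This is a sketch under stated assumptions: the prefixes and origin labels are made up, the function names are hypothetical, and the scenario is slightly adapted so that the hijacker announces a /22. With a same-length hijack the tie would be broken by BGP best-path selection, which is not modelled here.

```python
import ipaddress

def accept(prefix, max_len):
    """Prefix-length filter: accept only prefixes no longer than max_len."""
    return ipaddress.ip_network(prefix).prefixlen <= max_len

def best_origin(dest, routes, max_len):
    """Longest-prefix match over the routes that pass the length filter."""
    addr = ipaddress.ip_address(dest)
    candidates = []
    for prefix, origin in routes:
        net = ipaddress.ip_network(prefix)
        if accept(prefix, max_len) and addr in net:
            candidates.append((net, origin))
    if not candidates:
        return None  # would follow the default route
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

# Made-up scenario: owner's /21, a hijacker's /22, owner defends with /24s
routes = [
    ("10.0.0.0/21", "legit"),
    ("10.0.0.0/22", "hijacker"),
    ("10.0.0.0/24", "legit"),
    ("10.0.1.0/24", "legit"),
    ("10.0.2.0/24", "legit"),
    ("10.0.3.0/24", "legit"),
]

print("filter at /22:", best_origin("10.0.1.5", routes, 22))
print("filter at /24:", best_origin("10.0.1.5", routes, 24))
```

With the filter at /22, the defensive /24s never make it into the table and the hijacker's /22 wins. With the filter at /24, the owner's more-specific announcement takes the traffic back.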
This discussion is very interesting. I didn't know about this problem; it has implications for our work on routing security, thanks!
On Sat, May 18, 2019 at 11:37 AM Alejandro Acosta <alejandroacostaalamo@gmail.com> wrote:
If you learn, let's say, up to /22 (v4), and someone hijacks a /21, you will learn both the legitimate prefix and the hijacked prefix. Now the owner of the legitimate prefix wants to defend their routes by announcing /23s or /24s; of course those prefixes won't be learned if they are filtered.
I wonder if this really is a consideration to avoid filtering small prefixes (e.g. /24):
- attackers are quite likely to do sub-prefix hijacks (or say a specific /24), so I'm not sure this 'hits' defenders more than it 'hits' attackers
- I think we're talking only/mostly about small providers here, right? Larger providers probably will not have such problems of tables exceeding router resources. I expect such small providers normally connect through several tier-2 or so providers... if these upper-tier providers get hijacked, the fact you've prevented this at the stub/multihome ISP may not help much. We showed how this happens with ROV in our NDSS paper on it: https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/are-we-there-yet...
Amir Herzberg
Comcast professor for security innovation
Dept. of Computer Science and Engineering, University of Connecticut
Foundations of Cybersecurity: https://www.researchgate.net/project/Lecture-notes-on-Introduction-to-Cyber-...
Homepage: https://sites.google.com/site/amirherzberg/home
Hello Amir,
On 5/18/19 1:08 PM, Amir Herzberg wrote:
This discussion is very interesting. I didn't know about this problem; it has implications for our work on routing security, thanks!
You're welcome. I have wanted to present our findings in English for a long time.
On Sat, May 18, 2019 at 11:37 AM Alejandro Acosta <alejandroacostaalamo@gmail.com> wrote:
If you learn, let's say, up to /22 (v4), and someone hijacks a /21, you will learn both the legitimate prefix and the hijacked prefix. Now the owner of the legitimate prefix wants to defend their routes by announcing /23s or /24s; of course those prefixes won't be learned if they are filtered.
I wonder if this really is a consideration to avoid filtering small prefixes (e.g. /24):
My position is exactly the opposite.
- attackers are quite likely to do sub-prefix hijacks (or say a specific /24), so I'm not sure this 'hits' defenders more than it 'hits' attackers
Yes, you are right, but anyhow, IMHO, this is still better than not learning small prefixes at all.
- I think we're talking only/mostly about small providers here, right? Larger providers probably will not have such problems of tables exceeding router resources. I expect such small providers normally connect through several tier-2 or so providers... if these upper-tier providers get hijacked, the fact you've prevented this at the stub/multihome ISP may not help much. We showed how this happens with ROV in our NDSS paper on it: https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/are-we-there-yet...
You are right here. Thanks for the link, I will take a look.
Alejandro,
Amir Herzberg Comcast professor for security innovation Dept. of Computer Science and Engineering, University of Connecticut
Foundations of Cybersecurity: https://www.researchgate.net/project/Lecture-notes-on-Introduction-to-Cyber-...
Gracias Alejandro, I had never considered anti-hijack, anti-DoS, or RTBH advertisements in this equation.
Another knock against filtering based on prefix size is that it may not have the intended outcome on some platforms. As I recall reading about one vendor's platform (the ASR9k perhaps?) and its TCAM organization process, it stored /32 routes in a dedicated area for faster lookups and did the same for /24 routes. If one were to remove just the /24 routes from their RIB, the result would free up space in the storage area dedicated for /24's, but would consequently put more pressure on the areas reserved for prefixes between /0 and /23 as covering routes are installed into FIB. The result of removing /24's from the RIB on this platform would, unintuitively, put the user in a worse position with regard to TCAM utilization - not a better one.
If one is going to filter routes from his or her router's RIB, doing so based on subnet size seems to be a poor way. Doing so based on AS depth (your second solution) has fewer disadvantages in my opinion. As others have mentioned, there are even more intelligent ways of filtering but they rely on outside knowledge like cost, bandwidth, delay, or the importance to your customers of reaching a given destination - stuff not normally known to BGP.
Alejandro Acosta wrote on 5/18/2019 10:35 AM:
Hello,
As a comment, after receiving several complaints and looking at many cases, we evaluated which is better: cutting the table size by filtering "big" networks or "small" networks. Of course this is a difficult scenario and I guess there are mixed opinions about this; however, we concluded that the networks that are least affected are those that learn small network prefixes (such as /24, /23, /22, /21 in the v4 world).
If you learn, let's say, up to /22 (v4), and someone hijacks a /21, you will learn both the legitimate prefix and the hijacked prefix. Now the owner of the legitimate prefix wants to defend their routes by announcing /23s or /24s; of course those prefixes won't be learned if they are filtered.
We published this some time ago (sorry, in Spanish): http://w4.labs.lacnic.net/site/BGP-network-size-filters
That's it, my two cents.
Alejandro,
On 5/15/19 7:43 AM, Baldur Norddahl wrote:
Hello
This morning we apparently had a problem with our routers not handling the full table. So I am looking into culling the least useful prefixes from our tables. I can hardly be the first one to take on that kind of project, and I am wondering if there is a ready made prefix list or similar?
Or maybe we have a list of worst offenders? I am looking for ASN that announces a lot of unnecessary /24 prefixes and which happens to be far away from us? I would filter those to something like /20 and then just have a default route to catch all.
Thanks,
Baldur
From: NANOG <nanog-bounces@nanog.org> On Behalf Of Blake Hudson
Sent: Monday, May 20, 2019 4:35 PM
As I recall reading about one vendor's platform (the ASR9k perhaps?) and its TCAM organization process, it stored /32 routes in a dedicated area for faster lookups and did the same for /24 routes.
Yes, that was true for the first-generation (Trident-based) line cards and is no longer the case.
adam
adamv0025@netconsultings.com wrote on 5/22/2019 3:23 AM:
From: NANOG <nanog-bounces@nanog.org> On Behalf Of Blake Hudson
Sent: Monday, May 20, 2019 4:35 PM
As I recall reading about one vendor's platform (the ASR9k perhaps?) and its TCAM organization process, it stored /32 routes in a dedicated area for faster lookups and did the same for /24 routes.
Yes, that was true for the first-generation (Trident-based) line cards and is no longer the case.
adam
Thanks Adam! For the life of me I could not remember where I read that information or what platform it applied to. I do recall it being a very transparent view into TCAM organization and I appreciated the insight. It was also a good reminder that it pays to understand your platform as I had previously (naively) thought that a 1M capacity FIB could hold 1M entries with any mask size, whether those be 1M /32 entries (a BRAS with 1M PPP/BNG subscribers) or 1M /24 or bigger entries (a BGP edge router). This was obviously not the case on that platform.
participants (31)
- adamv0025@netconsultings.com
- Ahad Aboss
- Alejandro Acosta
- Amir Herzberg
- Anderson, Charles R
- Antonios Chariton
- Baldur Norddahl
- Blake Hudson
- Brielle
- Brielle Bruns
- Ca By
- Christopher Morrow
- Dan White
- Denys Fedoryshchenko
- Dovid Bender
- Hansen, Christoffer
- i3D.net - Martijn Schmidt
- Jon Lewis
- Karsten Elfenbein
- Mark Tinka
- Martin Hannigan
- Mike
- Mike Hammett
- mike.lyon@gmail.com
- Phil Lavin
- Radu-Adrian Feurdean
- Robert Blayzor
- Ross Tajvar
- Seth Mattinen
- Tom Beecher
- William Herrin