RE: Newbies Question: Do I really need to sacrifice Prefix-aggregation to do BGP Load-sharing?
Dear all,

Before all else: thank you all for the lightning-fast responses (even taking the time zone advantage into account). I really, really, really appreciate all your recommendations.

Virtually all of you recommend prepending as the first choice. I also get the feeling that you guys consider de-aggregation “distasteful” (at the least) but sometimes unavoidable.

I have considered prepending myself, but dare not implement it yet, for fear that the BGP (human) community will burn me alive, witch-hunt style, for the following reasons:
1. I can see from looking glass(es) that my upstreams already practice prepending (on some paths) at their level (at least 3 more hops [x4]), supposedly to “balance” their bandwidth.
2. Should I start prepending mine, I might upset their balance, causing them to prepend more, thus starting a “prepend war”. [I imagine that x20+ prepending starts out this way.]

The way I see it, prepending (or maybe even the whole BGP path thing) is a local-optimization problem: it’s only best for someone, not globally. And the higher tiers (lower tier numbers) will always “engineer” me in the end.

Worse yet, I might be out-voted by de-aggregation insider “cultists” anyway. That forces me to proactively ask you guys about ROV overlapping and the ROV “hijack gap” soon, in another posting with a separate “Subject:”.

Again, thank you.

Cheers,
Pirawat.

P.S. [Off-topic] Any comment on the “SCION” system? Is it any good (I will even take "academically")? [Reference: https://scion-architecture.net/]
If your upstream (transit provider) prepends your routes without you asking or authorizing it to do so, you should SERIOUSLY consider switching providers!

In the other email I talked about traffic-engineering BGP communities. If those prepends were made because of some community you were applying... OK, that's great! Even better if you could apply a community that did something like "apply 2 prepends for South America only".

But a transit provider arbitrarily changing the AS-PATH (beyond the mandatory hop) without your consent is not what good people do.

P.S. Your email replies are breaking threads in email readers. I suggest you review your email client.

-- 
Douglas Fernando Fischer
Control and Automation Engineer
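As a sketch of the kind of action community Douglas describes, here is how an upstream might map customer-set communities to region-specific prepends. The upstream ASN 64500, the customer ASN 64511, the community values, and the region names are all hypothetical placeholders; real providers publish their own community lists and implement them in router policy, not Python.

```python
# Hypothetical sketch of upstream "action communities" that request
# selective prepending. ASN 64500 and every community value below are
# invented placeholders; real values come from your provider's documentation.

UPSTREAM_ASN = 64500  # placeholder upstream ASN (documentation range)

# (asn, value) -> (region the action applies to, number of extra prepends)
ACTION_COMMUNITIES = {
    (UPSTREAM_ASN, 5101): ("south-america", 1),
    (UPSTREAM_ASN, 5102): ("south-america", 2),
    (UPSTREAM_ASN, 5202): ("europe", 2),
}


def export_path(customer_path, communities, region):
    """Return the AS path the upstream announces toward neighbors in `region`.

    The upstream always adds its own ASN once (the mandatory hop). It adds
    extra copies only when the customer asked for them with an action
    community that matches this region.
    """
    extra = 0
    for community in communities:
        action = ACTION_COMMUNITIES.get(community)
        if action is not None and action[0] == region:
            extra = max(extra, action[1])
    return [UPSTREAM_ASN] * (1 + extra) + list(customer_path)


if __name__ == "__main__":
    tagged = [(UPSTREAM_ASN, 5102)]
    print(export_path([64511], tagged, "south-america"))  # [64500, 64500, 64500, 64511]
    print(export_path([64511], tagged, "europe"))         # [64500, 64511]
```

The point of the design is that the extra prepends are scoped: neighbors outside the tagged region keep seeing the normal one-hop path.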
Reading between the lines, this network’s current lack of diverse providers is consistent with a geographic/monopoly disadvantage. I do agree that your transit provider is in bad form to pad your routes, but it does happen. A phone call or email to understand their limitations may be helpful. Trying to fit all of your traffic into an upstream’s own uplink that is far too small does not provide the best user experience. It could be a bug in the route-map. Speaking of bugs, trying to use communities can cause you to observe bugs in other networks’ route-maps (with great power comes great…).

Padding much past three usually has little effect. Splitting your advertisement into, say, four smaller announcements and starting to advertise them one at a time through your preferred provider is a good place to start. Traffic will prefer the more specific route. With luck that was done last night 😊

Once you have balanced this out somewhat, you have bought yourself time. The next fun thing is to understand how this works when one provider fails, or similar. Traffic can prefer the oldest route, so a small bump down the road can cause unanticipated traffic changes at the next nightly peak. Or, to put it another way, this is how the sausage is made.

P.S. Both of us top posting is also bad form.

Kevin Burke
802-540-0979
Burlington Telecom
200 Church St, Burlington, VT
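A minimal Python sketch of Kevin's staged approach, using the standard `ipaddress` module: split a /22 into four /24s, then show that longest-prefix match sends traffic for the one /24 you moved via the provider carrying it, while everything else follows the covering /22. The prefix 198.18.0.0/22 and the provider labels are placeholders, and the sketch assumes the /22 stays announced as the fallback route.

```python
import ipaddress


def split_announcement(aggregate, new_prefix):
    """Split an aggregate into equal-size more-specific prefixes."""
    return list(ipaddress.ip_network(aggregate).subnets(new_prefix=new_prefix))


def best_route(destination, routes):
    """Longest-prefix match over a {prefix: label} table; None if nothing covers it."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), label)
        for prefix, label in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    parts = split_announcement("198.18.0.0/22", 24)
    print([str(p) for p in parts])
    # ['198.18.0.0/24', '198.18.1.0/24', '198.18.2.0/24', '198.18.3.0/24']

    # Step one: move only the first /24 to provider A; the /22 stays
    # announced (here seen via provider B) as the covering route.
    routes = {"198.18.0.0/22": "provider B", "198.18.0.0/24": "provider A"}
    print(best_route("198.18.0.10", routes))  # provider A
    print(best_route("198.18.2.10", routes))  # provider B
```

Moving the remaining /24s one at a time, and watching the traffic shift after each step, is the incremental part of the advice.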
1. Prepending by itself isn’t bad. Prepending past the point where it is effective in accomplishing anything is what you generally want to avoid. Even then, in most cases it’s not nearly as big a deal as some make it out to be.

2. De-aggregation has its uses and its place. Have a /20, but announce all the component /24s, even though you aren’t doing anything different with any of them? Bad practice. You’re just inflating the global table for no good reason. However, perhaps you have a set of hosts in a single /24 that you want to try to protect from a prefix hijack. Announce the /20 and that single /24. It’s not perfect protection, but it provides some cover and isn’t that big a deal.

The answer to all of these questions is really: “It depends on what you are trying to do.” There are generally accepted solutions to certain problems, and there are plenty of dumb solutions that are the only thing possible due to circumstances, so sometimes that’s what you have to do too.

Don’t worry about the pitchforks so much. :)
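The arithmetic behind Tom's second point, as a small sketch: fully de-aggregating a /20 into /24s adds 2^(24-20) = 16 routes, while the targeted version announces only the /20 plus the one /24 you want to cover. The containment check reflects that the /20 should remain the covering route if the /24 is filtered somewhere. The prefixes are placeholders.

```python
import ipaddress


def full_deaggregation_count(aggregate, more_specific_len=24):
    """Number of more-specific routes produced by fully de-aggregating `aggregate`."""
    agg = ipaddress.ip_network(aggregate)
    return 2 ** (more_specific_len - agg.prefixlen)


def targeted_plan(aggregate, protected):
    """The aggregate plus a few chosen more-specifics, each checked to sit inside it."""
    agg = ipaddress.ip_network(aggregate)
    plan = [agg]
    for prefix in protected:
        net = ipaddress.ip_network(prefix)
        # Check the version first: subnet_of() raises TypeError on mixed families.
        if net.version != agg.version or not net.subnet_of(agg):
            raise ValueError(f"{net} is not covered by {agg}")
        plan.append(net)
    return plan


if __name__ == "__main__":
    print(full_deaggregation_count("198.18.0.0/20"))  # 16
    print([str(p) for p in targeted_plan("198.18.0.0/20", ["198.18.5.0/24"])])
    # ['198.18.0.0/20', '198.18.5.0/24']
```

As Tom notes, this is cover rather than protection: a hijacker announcing the same /24 still competes on equal prefix length.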
On Thu, 20 Oct 2022, Tom Beecher wrote:
1. Prepending by itself isn’t bad. Prepending past the point that it is effective in accomplishing anything is what you generally want to avoid. Even then, it’s not nearly as big a deal as some make it out to be in most cases.
To me, it's somewhat comical to see routes prepended 10-20 or more times. If one or two prepends don't do it, 10-20 isn't likely to either.

AFAIK, it's pretty common to use localpref to prefer peering (free) routes over transit (paid) paths, and in cases where remote networks see your prepended path via peering, "no amount" of prepends is going to change the fact that they prefer the free path.

While writing this, though, two things occurred to me.

1) Are there any networks with routing policy that looks at prepends and says "if we see a peering path with >X prepends (or maybe just path length >X), demote the localpref to transit or lower"? i.e. "They obviously don't want us using this path; turn it into a backup path."

2) Particularly back when it was found that some BGP implementations broke when encountering unusually long AS paths, I think it became somewhat common to reject routes with "crazy long" AS paths. If such policy is still in place in many networks, excessive prepending would actually have the desired effect for those networks, i.e. the excessive prepends would get that path rejected, keeping it from being used.

----------------------------------------------------------------------
 Jon Lewis, MCP :)           |  I route
 StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
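Jon's localpref point can be illustrated with a deliberately simplified best-path comparison: LOCAL_PREF is compared before AS_PATH length, so a peering path carrying ten prepends still beats a short transit path. The local-pref values (200 for peering, 100 for transit) and the ASNs are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Path:
    via: str
    local_pref: int
    as_path: list


def best_path(paths):
    """Simplified BGP decision: highest LOCAL_PREF first, then shortest AS_PATH.

    Real routers apply more tie-breakers (origin, MED, eBGP over iBGP, IGP
    cost, ...) and count an AS_SET as one hop; this sketch ignores all that.
    """
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


if __name__ == "__main__":
    # Origin AS 64511 prepends itself 10 times (11 entries in total).
    peering = Path("peering", local_pref=200, as_path=[64511] * 11)
    transit = Path("transit", local_pref=100, as_path=[64500, 64496, 64511])
    print(best_path([peering, transit]).via)  # peering: LOCAL_PREF is compared first
```

Only when the two local-prefs are equal does path length, and therefore prepending, decide.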
1) Are there any networks with routing policy that looks at prepends and says "if we see a peering path with >X number of prepends (or maybe just path length >X), demote the localpref to transit or lower"? "i.e. They obviously don't want us using this path, turn it into a backup path."
Yes. At a previous job, this is exactly what I did. If the path length was X or longer, I set localpref to our last-resort value. If the path length was Y or longer, I dropped the route completely; at that point, following defaults was just as good. Maybe once I hit something that caused a performance problem, but an email to that AS was all it took to fix it; they didn't realize they were prepending that much and corrected it.

I have firsthand knowledge of some other networks that do similar things.
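A sketch of the two-tier import rule Tom describes, plus a helper that counts prepends (back-to-back repeats of an ASN), the other metric Jon asked about. Tom did not state his X and Y values, so `DEPREF_PATH_LEN = 8`, `DROP_PATH_LEN = 30`, and the last-resort local-pref are hypothetical.

```python
from itertools import groupby

# Tom did not give his actual thresholds; these values are hypothetical.
DEPREF_PATH_LEN = 8          # "X": at or above this, last-resort local-pref
DROP_PATH_LEN = 30           # "Y": at or above this, reject the route
LAST_RESORT_LOCAL_PREF = 10  # below every normal peering/transit value


def prepend_count(as_path):
    """Extra back-to-back copies of an ASN in the path, i.e. prepends."""
    return len(as_path) - sum(1 for _ in groupby(as_path))


def import_local_pref(as_path, normal_local_pref):
    """Return the local-pref to assign on import, or None to drop the route."""
    length = len(as_path)  # path length counts every prepended copy
    if length >= DROP_PATH_LEN:
        return None
    if length >= DEPREF_PATH_LEN:
        return LAST_RESORT_LOCAL_PREF
    return normal_local_pref


if __name__ == "__main__":
    path = [64496] + [64511] * 7    # origin 64511 prepended 6 times
    print(prepend_count(path))      # 6
    print(import_local_pref(path, 200))  # 10
```

Keying on total length is simpler; keying on `prepend_count` would target the origin's intent more precisely, since a long path without prepends is not a request to avoid it.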
On Thu, Oct 20, 2022 at 6:23 AM Jon Lewis <jlewis@lewis.org> wrote:
[...] While writing this though, two things occurred to me.
1) Are there any networks with routing policy that looks at prepends and says "if we see a peering path with >X number of prepends (or maybe just path length >X), demote the localpref to transit or lower"? "i.e. They obviously don't want us using this path, turn it into a backup path."
2) Particularly back when it was found some BGP implementations broke when encountering unusually long as-paths, I think it became somewhat common to reject routes with "crazy long" as-paths. If such policy is still in place in many networks, excessive prepending would actually have the desired effect for those networks. i.e. The excessive prepends would get that path rejected, keeping it from being used.
At a previous job, I explicitly crafted policies that were structured such that:

    if PREFIXLENGTH > MAXPREFIXLENGTH then
        reject
    if ASPATH > MAXASPATH then
        reject
    strip_internal_communities
    if ASPATH > MAX_VALID_PATH then
        set localpref = TRANSIT_DEPREF_LOCALPREF
        set communities DEPREF_TRANSIT
        blah blah blah
    if match external_signal_communities then
        set localpref
        set internal propagation communities
        set external propagation communities
        blah blah blah
    accept

That way, if the prefix is too small or the AS path is too long (>100), it gets dropped before even bothering to evaluate communities; save every bit of CPU and memory you can.

Then, strip your internal communities off everything else that has a reasonable path length.

Set a lower threshold for what you consider a "reasonable" Internet diameter to be, including a reasonable 3x prepend at one or two levels; if a path is longer than that, it's a backup path at best, so treat it that way (below standard transit level).

Finally, on all the remaining routes, evaluate your external signalling communities, apply internal signalling communities as appropriate, and process normally.

There's a clear tradeoff between trying to ensure maximum reachability to the rest of the Internet and protecting your CPU and memory from unnecessary work and state-keeping.

As mentioned in another thread, what each network decides its MAXPREFIXLENGTH is will depend on its relationships and the capabilities of its hardware. It doesn't necessarily have to be /24 and /48, but it should be set at the longest value your network can happily support, unless you want to chase down odd connectivity issues in other people's networks. ^_^;

Thanks!

Matt
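Matt's pseudocode translates fairly directly into a runnable sketch of the same evaluation order. Only the /24 and /48 limits and the >100 path-length cutoff come from his message; `OUR_ASN`, the local-pref values, `MAX_VALID_PATH = 12`, and the community numbers are hypothetical placeholders. The pseudocode leaves open whether external signals may override the depref step; this sketch assumes they may only lower local-pref.

```python
import ipaddress
from dataclasses import dataclass, field

# From Matt: /24 and /48 limits, reject paths longer than 100.
# Everything else here is a hypothetical placeholder.
OUR_ASN = 64500
MAX_PREFIX_LENGTH = {4: 24, 6: 48}   # MAXPREFIXLENGTH per address family
MAX_AS_PATH = 100                    # MAXASPATH: hard reject above this
MAX_VALID_PATH = 12                  # MAX_VALID_PATH: "reasonable diameter"
TRANSIT_DEPREF_LOCAL_PREF = 50       # below the normal transit value (100)
DEPREF_TRANSIT = (OUR_ASN, 900)      # internal community marking deprefed paths
EXTERNAL_SIGNALS = {(OUR_ASN, 70): 70, (OUR_ASN, 80): 80}  # community -> local-pref


def is_internal(community):
    """Internal communities are OUR_ASN:900-999 in this sketch."""
    asn, value = community
    return asn == OUR_ASN and 900 <= value <= 999


@dataclass
class Route:
    prefix: str
    as_path: list
    communities: set = field(default_factory=set)
    local_pref: int = 100


def import_policy(route):
    """Return the accepted (possibly modified) route, or None if rejected."""
    net = ipaddress.ip_network(route.prefix)
    # 1. Cheap rejects first, before spending any effort on communities.
    if net.prefixlen > MAX_PREFIX_LENGTH[net.version]:
        return None
    if len(route.as_path) > MAX_AS_PATH:
        return None
    # 2. Neighbors must not be able to inject our internal communities.
    communities = {c for c in route.communities if not is_internal(c)}
    local_pref = route.local_pref
    # 3. Long-but-legal paths become backup paths, below standard transit.
    if len(route.as_path) > MAX_VALID_PATH:
        local_pref = TRANSIT_DEPREF_LOCAL_PREF
        communities.add(DEPREF_TRANSIT)
    # 4. Honor external signalling communities (assumption: lower-only).
    for community in communities:
        if community in EXTERNAL_SIGNALS:
            local_pref = min(local_pref, EXTERNAL_SIGNALS[community])
    return Route(route.prefix, list(route.as_path), communities, local_pref)


if __name__ == "__main__":
    print(import_policy(Route("198.18.0.0/25", [64511])))                  # None: too specific
    print(import_policy(Route("198.18.0.0/24", [64511] * 13)).local_pref)  # 50: backup path
```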
On Thu, Oct 20, 2022 at 5:13 AM Pirawat WATANAPONGSE via NANOG <nanog@nanog.org> wrote:
I have considered the prepending myself, but dare not implement it yet for the fear that BGP (Human) Community will burn me alive, witch-hunt style, because of the following reasons: 1. I can see from looking glass(es) that my upstreams already practice prepending (some paths) at their level (at least 3 more hops [x4]), supposedly to “balance” their bandwidth. 2. Should I start prepending mine, I might upset their balance, causing them to prepend more, thus starting a “prepend war”. [I imagine that x20+ prepending starts out this way]
The way I see it, prepending (or maybe even the whole BGP-Path thing) is a local-optimization problem: it’s only best for someone, not globally. And the Higher-Tiers (Lower Tier-Numbers) will always “engineer” me in the end.
Worse yet, I might be out-voted by de-aggregation insider “cultists” anyway.
Hi Pirawat,

You asked the experts how it's done. It's done with prepends. Do you really want to argue with the answer?

De-aggregation is a last resort, the bluntest tool in the toolchest. And it costs other people money, so they don't appreciate you doing it unless you absolutely have to. https://bill.herrin.us/network/bgpcost.html

As others have said, no one is going to yell at you because you prepended your AS two or three or even five times. If you don't get the desired effect after five, you're running up against a problem prepends won't solve.

The typical problem is that your upstream has used "localprefs" to prefer a particular path to you, overriding AS path length as the deciding factor. Competent upstreams that employ this technique also allow you to set a "BGP community" on your advertisement that overrides this behavior.

A "BGP community" is a 32-bit number, often expressed as two 16-bit numbers, the first of which is by convention the ISP's AS number. When the router detects it, the number causes the router to apply some locally chosen rule to the route. If you ask, the ISP will provide you with a list of "BGP communities" (numbers) they allow you to set on your route advertisement, along with the action they will take if they see each number.

Regards,
Bill Herrin

-- 
For hire. https://bill.herrin.us/resume/
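Bill's description of a standard community as a 32-bit number written as two 16-bit halves can be sketched as plain bit packing. The community 64500:120 is a hypothetical placeholder; what any value actually does is defined only by the ISP that publishes it.

```python
def encode_community(asn, value):
    """Pack a standard (RFC 1997) community: high 16 bits ASN, low 16 bits value."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("each half of a standard community must fit in 16 bits")
    return (asn << 16) | value


def decode_community(raw):
    """Split a 32-bit community back into (asn, value)."""
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("a standard community is a 32-bit number")
    return raw >> 16, raw & 0xFFFF


def format_community(raw):
    """Render in the usual ASN:value notation."""
    asn, value = decode_community(raw)
    return f"{asn}:{value}"


if __name__ == "__main__":
    raw = encode_community(64500, 120)  # hypothetical "prepend" community
    print(raw)                          # 4227072120
    print(format_community(raw))        # 64500:120
```

Because each half is only 16 bits, a 4-byte ASN cannot go in the first half of a standard community; large communities (RFC 8092), made of three 32-bit fields, exist for that case.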
participants (7)
- Douglas Fischer
- Jon Lewis
- Kevin Burke
- Matthew Petach
- Pirawat WATANAPONGSE
- Tom Beecher
- William Herrin