Howdy, Does anyone have suggestions for dealing with networks who ignore my BGP route prepends? I have a primary ingress with no prepends and then several distant backups with multiple prepends of my own AS number. My intention, of course, is that folks take the short path to me whenever it's reachable. A few years ago, Comcast decided it would prefer the 5000 mile, five-prepend loop to the short 10 mile path. I was able to cure that with a community telling my ISP along that path to not advertise my route to Comcast. Today it's Centurylink. Same story; they'd rather send the packets 5000 miles to the other coast and back than 10 miles across town. I know they have the correct route because when I withdraw the distant ones entirely, they see and use it. But this time it's not just one path; they prefer any other path except the one I want them to use. And Centurylink is not a peer of those ISPs, so there doesn't appear to be any community I can use to tell them not to use the route. I hate to litter the table with a batch of more-specifics that only originate from the short, preferred link but I'm at a loss as to what else to do. Advice would be most welcome. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
Prepend contraction is becoming more common. You can’t really stop providers from doing it, and it reduces BGP table size, which I’ve heard as a secondary rationale. I’d love to see the statistics on that though. BGP Communities seem to be the only alternative, and that limits your engineering reach to mostly immediate peers. Another problem is providers that hide multiple router hops inside MPLS, which appears as a single IP hop in traceroutes, making it impossible to know the true path geographically. The Internet is lying to itself, and that’s not a situation that can persist forever. -mel via cell
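To make Mel's point concrete, here is a minimal Python sketch of what "prepend contraction" might look like, assuming it simply means collapsing consecutive repeats of the same ASN before comparing path lengths (the `contract_prepends` helper is illustrative, not any vendor's feature). Applied to the two AS paths Bill quotes later in the thread, contraction turns the 9-entry path into a 4-entry one, tying it with the path he actually wants used:

```python
from itertools import groupby


def contract_prepends(as_path: list[int]) -> list[int]:
    """Collapse runs of identical consecutive ASNs (i.e. prepends) into one entry."""
    return [asn for asn, _run in groupby(as_path)]


# AS paths as quoted later in this thread (leftmost = AS3356, rightmost = origin).
preferred = [3356, 1299, 20473, 11875]
chosen = [3356, 47787, 47787, 47787, 47787, 53356, 11875, 11875, 11875]

for name, path in (("preferred", preferred), ("chosen", chosen)):
    print(f"{name}: raw length {len(path)}, contracted length {len(contract_prepends(path))}")
```

Once contracted, both paths are four ASNs long, so the prepends alone would no longer separate them and some later tie-breaker would decide.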
On Jan 22, 2024, at 4:52 AM, William Herrin <bill@herrin.us> wrote:
Howdy,
Does anyone have suggestions for dealing with networks who ignore my BGP route prepends?
I have a primary ingress with no prepends and then several distant backups with multiple prepends of my own AS number. My intention, of course, is that folks take the short path to me whenever it's reachable.
A few years ago, Comcast decided it would prefer the 5000 mile, five-prepend loop to the short 10 mile path. I was able to cure that with a community telling my ISP along that path to not advertise my route to Comcast. Today it's Centurylink. Same story; they'd rather send the packets 5000 miles to the other coast and back than 10 miles across town. I know they have the correct route because when I withdraw the distant ones entirely, they see and use it. But this time it's not just one path; they prefer any other path except the one I want them to use. And Centurylink is not a peer of those ISPs, so there doesn't appear to be any community I can use to tell them not to use the route.
I hate to litter the table with a batch of more-specifics that only originate from the short, preferred link but I'm at a loss as to what else to do.
Advice would be most welcome.
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
The Internet is lying to itself, and that’s not a situation that can persist forever.
I am not sure I agree. First, prepends are a suggestion. Perhaps a request. It has never (or at least not for the 3 decades I’ve been doing this) been a guarantee. In the situation below, perhaps the 5K mile backup path is through a provider who pays Centurylink (Lumen?). Standard practice is to localpref your customers up, which makes prepends irrelevant. Why would anyone expect different behavior? As for hiding hops, that is not lying. What happens inside my network is my business. If I give the world some info, say with in-addrs on hops, that’s fine. If I do not, I am not “lying”. This is perfectly sustainable, nothing will break (IMHO). In fact, I would argue without tools like MPLS, the Internet would have broken a long time ago. -- TTFN, patrick
On Jan 22, 2024, at 08:13, Mel Beckman <mel@beckman.org> wrote:
Prepend contraction is becoming more common. You can’t really stop providers from doing it, and it reduces BGP table size, which I’ve heard as a secondary rationale. I’d love to see the statistics on that though.
BGP Communities seem to be the only alternative, and that limits your engineering reach to mostly immediate peers.
Another problem is providers that hide multiple router hops inside MPLS, which appears as a single IP hop in traceroutes, making it impossible to know the true path geographically.
The Internet is lying to itself, and that’s not a situation that can persist forever.
-mel via cell
On Mon, Jan 22, 2024 at 5:24 AM Patrick W. Gilmore <patrick@ianai.net> wrote:
Standard practice is to localpref your customers up, which makes prepends irrelevant. Why would anyone expect different behavior?
It gives me, your paying customer, less control over my routing through your network than if I wasn't your paying customer. That seems... backwards. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
* bill@herrin.us (William Herrin) [Mon 22 Jan 2024, 15:05 CET]:
On Mon, Jan 22, 2024 at 5:24 AM Patrick W. Gilmore <patrick@ianai.net> wrote:
Standard practice is to localpref your customers up, which makes prepends irrelevant. Why would anyone expect different behavior?
It gives me, your paying customer, less control over my routing through your network than if I wasn't your paying customer. That seems... backwards.
Most sellers of IP transit offer a "treat as peer" BGP community which will flatten your localpref to that of peers rather than a customer. -- Niels.
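A minimal sketch of how such a "treat as peer" community could work on the provider's import side, assuming a hypothetical provider in documentation ASN 64500 with a made-up community value 64500:70 (every real provider publishes its own values, and the ASNs in the example routes are documentation ASNs too). Tagged customer routes drop to peer LOCAL_PREF, so AS_PATH length becomes the deciding factor again:

```python
# Hypothetical policy values for an illustrative transit provider (AS64500).
CUSTOMER_PREF = 200
PEER_PREF = 100
TREAT_AS_PEER = (64500, 70)  # made-up community, written as (ASN, value)


def import_local_pref(relationship: str, communities: frozenset) -> int:
    """Customer routes get CUSTOMER_PREF unless tagged TREAT_AS_PEER.

    This sketch only models customer and peer sessions.
    """
    if relationship == "customer" and TREAT_AS_PEER not in communities:
        return CUSTOMER_PREF
    return PEER_PREF


def pick(routes):
    """routes: (as_path, relationship, communities) tuples.

    Highest LOCAL_PREF wins, then shortest AS_PATH.
    """
    return min(routes, key=lambda r: (-import_local_pref(r[1], r[2]), len(r[0])))


peer_route = ((64501, 64502), "peer", frozenset())
customer_route = ((64503, 64503, 64503, 64502), "customer", frozenset())
tagged_route = ((64503, 64503, 64503, 64502), "customer", frozenset({TREAT_AS_PEER}))

print(pick([peer_route, customer_route]) is customer_route)  # True: customer pref wins
print(pick([peer_route, tagged_route]) is peer_route)        # True: tie, shorter path wins
```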
On Mon, 22 Jan 2024, William Herrin wrote:
On Mon, Jan 22, 2024 at 5:24 AM Patrick W. Gilmore <patrick@ianai.net> wrote:
Standard practice is to localpref your customers up, which makes prepends irrelevant. Why would anyone expect different behavior?
It gives me, your paying customer, less control over my routing through your network than if I wasn't your paying customer. That seems... backwards.
Not at all. Think like a service provider. "I've got packets to deliver. I've got 3 different classes of paths I can use. One of them, I get paid to use. One is cost neutral. The last one, I pay to use." Which path would you pick (assuming you're trying to maximize revenue from your network)? ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
----- Original Message -----
From: "Jon Lewis" <jlewis@lewis.org>
On Mon, 22 Jan 2024, William Herrin wrote:
It gives me, your paying customer, less control over my routing through your network than if I wasn't your paying customer. That seems... backwards.
Not at all. Think like a service provider.
"I've got packets to deliver. I've got 3 different classes of paths I can use. One of them, I get paid to use. One is cost neutral. The last one, I pay to use."
Which path would you pick (assuming you're trying to maximize revenue from your network)?
And here, you nail it, Jon: The Internet stopped being an engineering construct many years ago, to its--and our--detriment; things work much more poorly, and are harder to understand and diagnose and fix, because of this. His example, of packets going from Miami to Ft Lauderdale via One Wilshire, is a classic example. Cheers, -- jra -- Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://www.bcp38.info 2000 Land Rover DII St Petersburg FL USA BCP38: Ask For It By Name! +1 727 647 1274
On Wed, 24 Jan 2024, Jay R. Ashworth wrote:
And here, you nail it, Jon:
The Internet stopped being an engineering construct many years ago, to its--and our--detriment; things work much more poorly, and are harder to understand and diagnose and fix, because of this.
His example, of packets going from Miami to Ft Lauderdale via One Wilshire, is a classic example.
It can be a whole lot worse. At a previous job, running an anycast CDN, we had POPs originating the same prefixes all over the world. Cogent was one of our transit providers in most POPs (i.e. all the POPs in North America and Europe). Toward the end of my time there, Cogent started making some progress breaking into the transit market in Asia. So, we saw some eyeball networks in Asia hitting our anycast IPs via Cogent. Trouble was, the established "tier 1's" in Asia wouldn't peer with Cogent in Asia (for business reasons - i.e. they didn't want Cogent coming into their market and upsetting their apple carts). Our Asian POPs had lots of peering (IX and private) and transit from established Asian tier 1's. So this traffic from Cogent's Asian customers would land in our LA and San Jose POPs. As you can imagine, the RTT from an eyeball in Tokyo is "a bit higher" when talking to our LA POP vs our Tokyo POP.

Cogent has some BGP community controls available, but nothing that says "keep this route in-region". IIRC, the closest to it they had was lower localpref when sharing with region X. Lowering localpref doesn't matter if region X has no path other than the one received from an out-of-region customer session. Our options were "stop advertising anycast to Cogent globally" or "connect to Cogent in Asia so we can serve that traffic locally from our Asian POPs."

In one of his messages, William complained that the big bad networks are breaking the BGP rules by ignoring as-path length. That's nonsense. If you look at the BGP best path decision algorithm, there are several attributes considered before as-path length. Localpref is one of them...and since most networks exist to make money, it's standard practice to use localpref to make sure you route traffic economically rather than efficiently (via the shortest as-path, which may still not be the shortest actual path). For traffic you care about, obviously there's a balance between cost and performance.
If you've made poor/cheap choices in your transit providers, nobody cares that your traffic takes the scenic route. At least not the networks carrying your traffic that you're not directly paying...and you're likely to find, as above, even when you are directly paying, their interests are likely to outweigh yours. ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
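Jon's observation that lowering LOCAL_PREF does nothing when a region has only one candidate path can be sketched in a few lines of Python. The session names and numeric LOCAL_PREF values below are hypothetical stand-ins for his Cogent example: a route learned from the CDN's customer session in LA and de-preferred by community, versus one learned over a new customer session in Tokyo.

```python
from typing import Optional


def best(candidates: dict[str, int]) -> Optional[str]:
    """Return the candidate with the highest LOCAL_PREF, or None if there is none.

    candidates maps a (hypothetical) session name to the LOCAL_PREF
    the provider assigned to the route learned on that session.
    """
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# A provider router in Asia sees only the route carried over from the LA
# customer session, already lowered with a "lower localpref in region X" community.
asia_today = {"customer session, Los Angeles": 80}
print(best(asia_today))  # still the LA session: lowering pref removes no path

# After also connecting to the provider in Asia:
asia_with_local_session = {**asia_today, "customer session, Tokyo": 200}
print(best(asia_with_local_session))  # the Tokyo session
```

With a single candidate, any LOCAL_PREF value still wins; only adding a second, in-region path changes the outcome, which matches the option Jon's team ended up facing.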
On Wed, Jan 24, 2024 at 7:02 AM Jon Lewis <jlewis@lewis.org> wrote:
In one of his messages, William complained that the big bad networks are breaking the BGP rules by ignoring as-path length.
To be clear, I don't really care whether you're "breaking the rules." Moreover, if my words suggested that I thought using BGP's local pref capability was "breaking the rules," then either you misunderstood me or I chose my words poorly. What I did say, and stand behind, was that applying local prefs moves BGP's route selection off the _defaults_, and if Centurylink was routing to me based instead on the defaults they'd have made a _good_ route selection instead of a _bad_ one.

I do care whether you're routing packets in a reasonable way. When you pick the 10-AS path over the 3-AS path because the 10-AS path arrives from a customer, the odds that you're routing those packets in a _good_ way are very low. I get that a lot of you do that. I'm telling you that when you do, you're doing a _bad_ job. If you think you're justified, well, it's your business. But don't doubt for a second that you've served your customers poorly.

And before you suggest that I'm not your customer, let me point out what should be obvious: if none of your paying customers were trying to reach my network, I wouldn't notice which direction you routed my packets, let alone care. It's not about serving me, it's about serving your paying customers. My packets are their packets, and when you send _their_ packets along the scenic route, you have done a bad job. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On Wed, Jan 24, 2024 at 07:25:42AM -0800, William Herrin wrote: [ snip ]
or I chose my words poorly. What I did say, and stand behind, was that applying local prefs moves BGP's route selection off the _defaults_, and if Centurylink was routing to me based instead on the defaults they'd have made a _good_ route selection instead of a _bad_ one.
This cuts both ways, Bill. First, 3356 is making an intended route selection; their customer who interconnects directly into 3356 demands this. That customer who connects into 3356 probably had no idea that you (AS11875) would someday decide to take IP transit from a downstream AS of theirs, and your situation was likely never a consideration in their network planning. _You_ want better connectivity from 3356 to 11875 for the explicit benefit of 11875, which _you_ operate and control. That's good, so let's continue.
I do care whether you're routing packets in a reasonable way. When you pick the 10-AS path over the 3-AS path because the 10-AS path arrives from a customer, the odds that you're routing those packets in a _good_ way are very low. I get that a lot of you do that. I'm telling you that when you do, you're doing a _bad_ job. If you think you're justified, well, it's your business. But don't doubt for a second that you've served your customers poorly.
Conversely at the same time, the below is also equally true: You (AS11875) have an operational need for good connectivity into 3356 but, you made a poor purchasing decision by buying IP transit for 11875 from a provider who has 10-AS path into 3356 instead of <=3 AS path. You've done a _bad_ job here in selecting an inferior pathway into 3356, and what you SHOULD have done is to select an IP transit provider who has an optimal AS-path into 3356 to meet your operational need of having good connectivity into 3356.
And before you suggest that I'm not your customer, let me point out what should be obvious: if none of your paying customers were trying to reach my network, I wouldn't notice which direction you routed my packets, let alone care. It's not about serving me, it's about serving your paying customers. My packets are their packets, and when you send _their_ packets along the scenic route, you have done a bad job.
We can do this all day long. You (AS11875) also have the responsibility to yourself and your end-users to select and award business to an IP transit provider and to make every reasonable effort to ensure that 11875 has good connectivity into 3356 as your operational needs require. You've abrogated that responsibility in your own AS, decided to spew nonsense about LOCAL_PREF, the most critical and important knob in BGP-4, which outranks AS_PATH and has been around since the NSFNET days, and are now telling us that we're doing a poor job. Your argument fails. The internet works upon the principle of "best-effort." What you're describing is the net effect of that "best-effort", and you, as the operator and controller of AS11875, which is involved in the path, are just as culpable and responsible. Moreover, you, by being the operator of an AS in the problematic path, have the wherewithal and commercial ability to fix it, without involving the rest of us. The answer is right in front of you. James
On Wed, Jan 24, 2024 at 8:11 AM James Jun <james.jun@towardex.com> wrote:
You (AS11875) have an operational need for good connectivity into 3356 but, you made a poor purchasing decision by buying IP transit for 11875 from a provider who has 10-AS path into 3356 instead of <=3 AS path. You've done a _bad_ job here in selecting an inferior pathway into 3356, and what you SHOULD have done is to select an IP transit provider who has an optimal AS-path into 3356 to meet your operational need of having good connectivity into 3356.
Sophistry. I buy IP transit from 3 providers, one of which has a 3 AS path to 3356. -Bill -- William Herrin bill@herrin.us https://bill.herrin.us/
On Wed, Jan 24, 2024 at 08:16:56AM -0800, William Herrin wrote:
Sophistry. I buy IP transit from 3 providers, one of which has a 3 AS path to 3356.
Again you omit context. We've already established, per the RFC, that the calculation of degree of preference (the Phase 1 decision) takes precedence over and overrides AS_PATH. Therefore, let's rephrase what you've just said above: You're buying IP transit from 3 providers, two of which are configured with the following known constraints:
- 20473, which buys from 1299, whose route has a lower degree of preference into 3356, as 1299 and 3356 are peering partners (settlement-free or paid).
- 53356, which buys from 47787 as a prioritized downstream customer; 47787 in turn connects into 3356 as a prioritized downstream customer.
It's obviously clear that the 53356 path you've bought has a priority ticket into 3356 no matter how inferior or long its AS_PATH may be, and the solution is right in front of you. Next. James
On Wed, Jan 24, 2024 at 8:39 AM James Jun <james.jun@towardex.com> wrote:
On Wed, Jan 24, 2024 at 08:16:56AM -0800, William Herrin wrote:
Sophistry. I buy IP transit from 3 providers, one of which has a 3 AS path to 3356.
Again you omit context.
What you're calling context, I call deceptive. For one thing, Centurylink's process is, like a spammer, opt-out rather than opt-in. 3356 enables the local pref unless told through a BGP community not to. There's no evidence that 47787 even knows that Centurylink is preferring them despite shorter AS paths elsewhere, let alone desires that behavior. Indeed, given the prepends that 47787 added, it's quite possible they desire the opposite. For another, a key implication in your "context" is that if one customer intentionally pays 3356 to intentionally send another customer's packets on a longer, slower trip than 3356 otherwise would, that's a legitimate above-board business transaction. Not obviously corrupt. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On Wed, Jan 24, 2024 at 09:22:06AM -0800, William Herrin wrote:
What you're calling context, I call deceptive.
For one thing, Centurylink's process is, like a spammer, opt-out rather than opt-in.
Nope. Your allegation that Lumen (Centurylink)'s "process" is opt-out like a spammer is factually and historically incorrect. Rather, Lumen's practice is compliant with best common practices and experience as documented in RFC 4277 and provided for by RFC 4271. Lumen/Centurylink's alleged "opt-out spamming" practice predates their very existence and was established during the NSFNET era, with an operational need at the time to differentiate commercial networks from R&E networks. Just as R&E networks needed to treat commercial network traffic differently in the NSFNET days, commercial operators of the Internet are also expected, and indeed required, to prioritize their paying customers' traffic over that of non-paying parties.
3356 enables the local pref unless told through a BGP community not to. There's no evidence that 47787 even knows that Centurylink is preferring them despite shorter AS paths elsewhere, let alone desires that behavior. Indeed, given the prepends that 47787 added, it's quite possible they desire the opposite.
The evidence is widely documented in the best common practices of every major ASN exercising routing policy, and in the subsequent RFCs and BCPs published concerning the discussions herein. Internet standards and documented, widely accepted current practices exist for a good reason. Any failure, ignorance, or omission on your part (or, allegedly, on 47787's part) in being informed of how current practices work does not make you any less responsible for identifying the problem at hand. Your allegation and arguments that currently adopted and documented inter-AS traffic engineering practices are deceptive and "opt-out" in a bad-faith nature are simply too tenuous a connection and amount to reductio ad absurdum. You are, however, welcome to participate in the IETF process to propose altering the way BGP practices work for the better, as you wish. That's what's so great about community input-based policy development processes.
For another, a key implication in your "context" is that if one customer intentionally pays 3356 to intentionally send another customer's packets on a longer, slower trip than 3356 otherwise would, that's a legitimate above-board business transaction. Not obviously corrupt.
False. None of the parties described herein, neither 47787 nor 3356, is liable for "intentionally" sending another customer's traffic on a longer, less efficient path. What they are likely liable for, however, are the contractual obligations and commercial expectations of the bilateral parties engaged in an ongoing transaction. You fit into the chain by buying from 53356 without understanding the underlying infrastructure and connectivity relationships that 53356 has toward 3356. And you're now litigating that it's corrupt and is possibly some kind of a coordinated scheme or a racket without your consent. You gave your consent by agreeing to run BGP with 53356 as your vendor, which you awarded that business to, and by beginning to advertise your prefix. It's not working the way you want, so engage with your vendor to fix it, or fire them. This is not hard. James
On Mon, Jan 22, 2024 at 06:02:53AM -0800, William Herrin wrote:
It gives me, your paying customer, less control over my routing through your network than if I wasn't your paying customer. That seems... backwards.
Nope, that is not at all backwards. Have you actually wondered what would happen if every major ISP stopped classifying routes with localpref, and treated every route received by them (including customers and external peers) with the same local-pref, so your AS prepending could work easily?

Some 21 years ago, there was this little-known story during the early stages of IPv6 development, called 6bone. Aside from the lack of native IPv6 (where everything had to be tunneled), the #1 issue that guaranteed IPv6 sucked many times worse than IPv4 back in the day was the lack of BGP clue among most IPv6 DFZ participants at that time, where nobody classified any of their routes accordingly with localpref and communities. Not classifying your routes with local-pref leads to complete operational chaos, including world-tour hair-pin sightseeing becoming very common with IPv6 during the 6bone days (which led to the rise of as30071/occaid, which dominated the IPv6 DFZ for several years as the way for many to transition out of 6bone).

Not classifying routes with local-pref means you do not care whether a particular peer is a settlement-free peer or a customer; this lack of relationship classification leads to operational harm: A customer may be paying you $/bits expecting you to deliver your on-net traffic onto them over the paid peering (or transit) link they bought from you, only to find you preferring an IX peer (e.g. Hurricane Electric, etc. over IX) as best-path, even without any AS Path prepending involved.

Further, not classifying routes with local-pref and ident communities means you are entirely at the mercy of prefix-lists applied on your export policy. A very common occurrence was a rookie ISP that appeared to be giving "transit" to a major Tier-1 backbone on a route that was supposed to be a customer-originated route, because this network selected the AS-Path via its upstream provider as best-path, instead of the direct connection into the said customer.
This happens a lot on a route that is a "downstream of a downstream" customer route, where that customer is also multi-homed with the said rookie ISP's upstream Tier-1 provider, thereby resulting in equidistant AS-Paths to what is supposed to be a customer-originated route. Scale this up to many routes and you have complete chaos and breakdown of your BGP routing table.

So, as a customer, you actually SHOULD be demanding your ISPs to positively identify and categorize their routes using local-pref and communities. In fact, I will never purchase IP transit with BGP from a provider who doesn't categorize routes with local-pref. As a customer, if you want more control over your network's incoming traffic, you need to instead ask your upstream providers about their BGP routing policy and how well they support BGP communities to let you steer traffic, and use those communities to make absolute traffic decisions.

Always remember this #1 rule of the BGP decision process: AS Path is a **tie-breaker** to local-pref classification. When you prepend AS Path, your goal is to try to steer traffic among routes that are in the same category (i.e. customer or peer) as you. When your goal is absolute steering (i.e. absolute as in, do not advertise to a particular peer, or make your connection a standby backup where no traffic ever comes until there is a complete outage on the other path, etc.), you absolutely SHOULD be using BGP communities provided by your upstream IP provider. If your IP transit provider does not provide extensive BGP communities to meet your requirements, cancel their service and give your business to someone else.

A rookie BGP mistake that is commonly made by those without real-world experience is the assumption that AS Path prepending should deliver absolute traffic steering -- it does not, and should NOT, by design. The BGP Best-Path Selection Algorithm is taught very well in the CCIE curriculum, but last I looked, they don't teach you the _why_, only the how.
So it's common to see enterprise CCIEs working for VARs falling into the false assumption about AS Path. See https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/13... Hope this clarifies. James
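James's point about exporting by relationship rather than by prefix-list can be illustrated with the widely used rule "own and customer routes go to everyone, everything else goes only to customers." This Python sketch assumes each route was tagged on import with the relationship of the neighbor it was learned from (in practice usually via internal communities); the function name and labels are illustrative:

```python
def may_export(learned_from: str, send_to: str) -> bool:
    """Relationship-based export check.

    learned_from: "self", "customer", "peer" or "transit" (tagged on import).
    send_to: "customer", "peer" or "transit".
    Own and customer routes may go to every neighbor; routes learned from
    peers or transit providers may go only to customers.
    """
    if learned_from in ("self", "customer"):
        return True
    return send_to == "customer"


# The rookie-ISP case: the customer's prefix was best via the upstream provider.
# A prefix-list matching that prefix would still export it toward a Tier-1;
# the relationship tag blocks the leak.
print(may_export(learned_from="transit", send_to="peer"))      # False
print(may_export(learned_from="transit", send_to="transit"))   # False
print(may_export(learned_from="customer", send_to="transit"))  # True
```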
On Mon, Jan 22, 2024 at 10:19 AM James Jun <james.jun@towardex.com> wrote:
So, as a customer, you actually SHOULD be demanding your ISPs to positively identify and categorize their routes using local-pref and communities.
Hi James, The best path to me from Centurylink is: 3356 1299 20473 11875 The path Centurylink chose is: 3356 47787 47787 47787 47787 53356 11875 11875 11875 Do you want to tell me again how that's a reasonable path selection, or how I'm supposed to pass communities to either 20473 or 53356 which tell 3356 to behave itself? Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On Jan 22, 2024, at 14:35, William Herrin <bill@herrin.us> wrote:
The best path to me from Centurylink is: 3356 1299 20473 11875
The path Centurylink chose is: 3356 47787 47787 47787 47787 53356 11875 11875 11875
Do you want to tell me again how that's a reasonable path selection, or how I'm supposed to pass communities to either 20473 or 53356 which tell 3356 to behave itself?
This certainly seems like a reasonable path selection, in the context that 47787 is likely a 3356 customer. AS53356 (Free Range Cloud Hosting) appears to have some limited BGP communities that may help. https://docs.freerangecloud.com/en/bgp/communities implies that you sending 53356:19014 would block announcements to 47787. That may turn into a game of whack a mole, but the knobs appear to be there to try something other than prepending to influence 3356’s selection. -- Andrew Hoyos hoyosa@gmail.com
On Mon, Jan 22, 2024 at 1:11 PM Andrew Hoyos <hoyosa@gmail.com> wrote:
AS53356 (Free Range Cloud Hosting) appears to have some limited BGP communities that may help. https://docs.freerangecloud.com/en/bgp/communities
implies that you sending 53356:19014 would block announcements to 47787.
At which point Centurylink chooses 40676 7489 11875 11875 11875 11875 11875 11875 11875.
This certainly seems like a reasonable path selection, in the context that 47787 is likely a 3356 customer.
That's -why- 3356 chooses the paths. 40676 and 47787 are customers, 1299 is a peer. You're telling me with a straight face that you think that's *reasonable* routing?
That may turn into a game of whack a mole, but the knobs appear to be there to try something other than prepending to influence 3356’s selection.
Whack-a-mole is not a reasonable solution to anything. Besides, I don't want to drop the path to 53356 via 47787. If the path through 20473 fails, the path through 53356 is the next best and I want Centurylink to use it. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
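The "whack-a-mole" dynamic can be simulated from the paths quoted in this part of the thread, assuming 3356 ranks customer routes above peer routes (47787 and 40676 as customers, 1299 as a peer, per Bill's message) and then compares AS_PATH length. The LOCAL_PREF numbers are hypothetical, and removing a winner stands in for suppressing that announcement upstream, for example with a provider's action community:

```python
# Hypothetical LOCAL_PREF by relationship; only the ordering matters here.
PREF = {"customer": 200, "peer": 100}

# Routes 3356 holds for Bill's prefix, keyed by neighbor AS (paths as quoted in the thread).
candidates = {
    47787: ("customer", [47787] * 4 + [53356] + [11875] * 3),
    40676: ("customer", [40676, 7489] + [11875] * 7),
    1299: ("peer", [1299, 20473, 11875]),
}


def winner(routes):
    """Highest LOCAL_PREF first, then shortest AS_PATH."""
    return min(routes, key=lambda asn: (-PREF[routes[asn][0]], len(routes[asn][1])))


remaining = dict(candidates)
order = []
while remaining:
    chosen = winner(remaining)
    order.append(chosen)
    del remaining[chosen]  # stand-in for "do not announce to this neighbor" upstream

print(order)  # [47787, 40676, 1299]
```

The peer path via 1299 is selected only once every customer-learned path has been suppressed, which is why blocking one neighbor at a time just exposes the next customer route, and why doing so would also discard the backup paths Bill wants to keep.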
William Herrin wrote on 22/01/2024 21:26:
At which point Centurylink chooses 40676 7489 11875 11875 11875 11875 11875 11875 11875. [...] You're telling me with a straight face that you think that's *reasonable* routing?
yep, looks pretty reasonable, if you're Centurylink and 40676 is a Centurylink customer.
Besides, I don't want to drop the path to 53356 via 47787. If the path through 20473 fails, the path through 53356 is the next best and I want Centurylink to use it.
You have your own ASN, you have control over your own routing policy. Centurylink probably aren't going to be interested in engaging with you if you're not a customer. It's a pickle.
Nick
On Mon, Jan 22, 2024 at 1:55 PM Nick Hilliard <nick@foobar.org> wrote:
You have your own ASN, you have control over your own routing policy. Centurylink probably aren't going to be interested in engaging with you if you're not a customer. It's a pickle.
It's not a pickle for me. I'll announce three prefixes instead of one, and you get to pay for the extra two TCAM slots. It offends my pride to handle it this way, but -you- shoulder the cost. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
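The three-prefix workaround relies on longest-prefix match: a remote router forwards on the most specific route covering the destination, so more-specifics announced only over the preferred link win over the covering route no matter what local-pref that covering route carries. A minimal sketch with Python's standard `ipaddress` module. The real prefix isn't given in the thread, so the documentation prefix 2001:db8::/47 stands in, split into two /48s because announcements longer than /48 (or /24 in IPv4) are commonly filtered. The route labels are hypothetical.

```python
import ipaddress

# Placeholder covering prefix (IPv6 documentation space); the real one isn't given.
covering = ipaddress.ip_network("2001:db8::/47")

# The two more-specifics, each one bit longer than the covering route.
more_specifics = list(covering.subnets(prefixlen_diff=1))

# A remote network's view: the covering route is learned via the distant,
# locally preferred path; the more-specifics only via the short primary link.
routes = {covering: "distant backup path"}
for net in more_specifics:
    routes[net] = "short primary link"


def lookup(destination: str) -> str:
    """Longest-prefix match over the simulated routing table."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    print([str(net) for net in more_specifics])
    print(lookup("2001:db8::1"), "/", lookup("2001:db8:1::1"))
```

If the primary link fails and the more-specifics are withdrawn, lookups fall back to the covering route, which is why the covering prefix stays announced over every link.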
On Mon, Jan 22, 2024 at 02:03:48PM -0800, William Herrin wrote:
It offends my pride to handle it this way, but -you- shoulder the cost.
You're misdiagnosing the issue at hand. CL is choosing 3356 47787[x3] 53356 11875[x3] over better path via 1299: What you need to be doing is reaching out to AS53356 (your upstream provider supposedly) to assist with traffic engineering. Given the # of prepends that 53356 added themselves, it looks like you're using their communities to prepend on top of your own prepends (wasted effort), or they've attempted to help you by prepending manually, but to no avail (see our prior discussion). The next level of escalation is for 53356 to now work with 47787 to implement the correct traffic engineering policy facing 3356. This is really something your IP transit providers should be assisting you with. You're misdiagnosing and complaining about something which BGP is supposed to be doing, instead of escalating with the right parties who are in the best position to be assisting you. Believe it or not, there are small-medium IP transit providers who are _very good_ at assisting their BGP customers in traffic engineering efforts, especially with extensive BGP community options, competent network engineers, automation and the likes. Your upstream providers need to step up their game to help you out here. This is not a Lumen/CenturyLink/Level 3 problem. HTH, James
On Mon, Jan 22, 2024 at 5:59 PM James Jun <james.jun@towardex.com> wrote:
CL is choosing 3356 47787[x3] 53356 11875[x3] over better path via 1299: This is not a Lumen/CenturyLink/Level 3 problem. What you need to be doing is
Hi James, My solution has been to add two more-specific routes to -your- routing table so that my one prefix now consumes three routes. If you and the others defending Centurylink's behavior are satisfied with that solution, then we're done here. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On Mon, Jan 22, 2024 at 6:43 PM William Herrin <bill@herrin.us> wrote:
On Mon, Jan 22, 2024 at 5:59 PM James Jun <james.jun@towardex.com> wrote:
CL is choosing 3356 47787[x3] 53356 11875[x3] over better path via 1299: This is not a Lumen/CenturyLink/Level 3 problem. What you need to be doing is
Hi James,
My solution has been to add two more-specific routes to -your- routing table so that my one prefix now consumes three routes. If you and the others defending Centurylink's behavior are satisfied with that solution, then we're done here.
Of course, I'll probably have to do the same thing with my v6 prefix too. But hey, if that works for you I'll conquer my irritation at the inefficiency. -Bill -- William Herrin bill@herrin.us https://bill.herrin.us/
At which point Centurylink chooses 40676 7489 11875 11875 11875 11875 11875 11875 11875.
This certainly seems like a reasonable path selection, in the context that 47787 is likely a 3356 customer.
That's -why- 3356 chooses the paths. 40676 and 47787 are customers, 1299 is a peer. You're telling me with a straight face that you think that's *reasonable* routing?
The reasons why have been pointed out by others: This is perfectly reasonable routing _if you're 3356_ In this profit-driven world, expecting 3356 to do something that's unprofitable for them just because it happens to be convenient for you is, well, unreasonable. Deaggregation offers one loophole out of this Layer 8 problem though, making TCAM slots just the price we pay for "my network, my rules". Convincing 53356 and 47787 to add 3356:70 to your route is another. Have you asked them? I know I would look into it if a customer comes to me with a similar request. Alex
On Mon, Jan 22, 2024 at 3:34 PM Alex Le Heux <alexlh@funk.org> wrote:
This is perfectly reasonable routing _if you're 3356_
In this profit-driven world, expecting 3356 to do something that's unprofitable for them just because it happens to be convenient for you is, well, unreasonable.
Hi Alex, Every packet has two customers: the one sending it and the one receiving it. 3356 is providing a service to its customers. ALL of its customers. Not just 47787. Sending the packet an extra 5,000 miles harms every one of 3356's customers -except for- 47787. In this case, I am the customer on both ends. 3356's choice to route my packet via 47787 serves me poorly. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
On Jan 23, 2024, at 00:43, William Herrin <bill@herrin.us> wrote:
On Mon, Jan 22, 2024 at 3:34 PM Alex Le Heux <alexlh@funk.org> wrote:
This is perfectly reasonable routing _if you're 3356_
In this profit-driven world, expecting 3356 to do something that's unprofitable for them just because it happens to be convenient for you is, well, unreasonable.
Every packet has two customers: the one sending it and the one receiving it. 3356 is providing a service to its customers. ALL of its customers. Not just 47787. Sending the packet an extra 5,000 miles harms every one of 3356's customers -except for- 47787.
In this case, I am the customer on both ends. 3356's choice to route my packet via 47787 serves me poorly.
Packets don't have customers, ISPs do. And in this case you're not a customer of the ISP making the routing decision and 3356 is doing precisely what its customer tells it to do by adding (or not adding) specific communities to what is announced. In other words, 3356 is doing precisely what its customer pays it to do. You can build a shorter backup path, deaggregate, get 53356 and 47787 to propagate your routes differently or change your transit mix. There aren't many other options. Fact is that all prepending does is provide a vague hint to other networks about what you would like them to do. And this is only one of the many things those networks take into account when formulating their routing policies. This is why many ASes build extensive community lists to set things like localpref and limit route propagation in other ways. Perhaps you can try adding 53356:47787 to your announcement although it's anyone's guess how that'll affect things. Alex
On Mon, Jan 22, 2024 at 4:16 PM Alex Le Heux <alexlh@funk.org> wrote:
On Jan 23, 2024, at 00:43, William Herrin <bill@herrin.us> wrote: Every packet has two customers: the one sending it and the one receiving it. 3356 is providing a service to its customers. ALL of its customers. Not just 47787. Sending the packet an extra 5,000 miles harms every one of 3356's customers -except for- 47787.
In this case, I am the customer on both ends. 3356's choice to route my packet via 47787 serves me poorly.
Packets don't have customers, ISPs do. And in this case you're not a customer of the ISP making the routing decision
Incorrect. I am a customer of 3356. A residential customer, not a BGP customer. I'm paying them to route my packets too, and they're routing them poorly. Also incorrect: every packet in your network is linked to either one or two customers. Never more. Never less. Routing my packet via 47787 in this case serves neither of us: my Internet access is severely degraded and 47787 is charged money for a packet they need not have handled. Charging your customers to make their service worse doesn't seem like a good business model to me, but maybe that's why I'm not a CEO.
Fact is that all prepending does is provide a vague hint to other networks about what you would like them to do.
Until they tamper with it using localpref, BGP's default behavior with prepends does exactly the right thing, at least in my situation. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
As I already explained, neither the primary nor any of the backup providers directly peer with Centurylink, thus have no communities for controlling announcements to Centurylink.
No, but they do have an option to not announce to 47787. https://docs.freerangecloud.com/en/bgp/communities 53356:19014 would deny announcements to 47787, which would seem to be the 'problematic' intermediate ASN in your case. You could try that and see what other upstream paths are taken, and see if that gets you over an upstream that lines up more with your performance expectations. Otherwise, you either have to deal with more specifics, or try to get better connected to 3356 some other way. 3356 isn't doing anything wrong here, as much as you seem to want to believe that to be true. This is all pretty standard customer / peer preference handling.
Packets don't have customers, ISPs do. And in this case you're not a customer of the ISP making the routing decision
Incorrect. I am a customer of 3356. A residential customer, not a BGP customer. I'm paying them to route my packets too, and they're routing them poorly.
Oh, you should have said that right away, or perhaps I missed it. In that case it’s simple: Stop giving them money for bad service. By continuing to give them money you’re incentivizing them to continue breaking your internet, making you the architect of your own misery ;)
Also incorrect: every packet in your network is linked to either one or two customers. Never more. Never less. Routing my packet via 47787 in this case serves neither of us: my Internet access is severely degraded and 47787 is charged money for a packet they need not have handled.
Nonsense. 47787 is clearly telling 3356 they *want* to handle that traffic and even paying for the privilege. Apparently there is a conflict between what you want and what 47787 wants. As you both seem to be paying customers, you should probably ask 3356 to resolve that instead of us random internet folks.
Fact is that all prepending does is provide a vague hint to other networks about what you would like them to do.
Until they tamper with it using localpref, BGP's default behavior with prepends does exactly the right thing, at least in my situation.
Try giving your money to someone who runs BGP with just its default settings and no policies, see how well that works out. Cheers, Alex
Apparently there is a conflict between what you want and what 47787 wants. As you both seem to be paying customers, you should probably ask 3356 to resolve that instead of us random internet folks.
Calling 3356 and saying "I know your global routing policy is to prefer a customer learned route over a peer route, but can you change that for me please?" probably won't see much success.
William Herrin <bill@herrin.us> wrote: Until they tamper with it using localpref, BGP's default behavior with prepends does exactly the right thing, at least in my situation.
I feel your pain Bill, but from a slightly different angle. For years the large CDNs have been disregarding prepends. When a source AS disregards BGP best path selection rules, it sets off a chain reaction of silliness not attributable to the transit AS's. At the terminus of that chain are destination / eyeball AS's now compelled to do undesirable things out of necessity such as:
1) Advertise specifics towards select peers - i.e. inconsistent edge routing policy & littering global table
2) Continue prepending a ridiculous amount anyway
Gotta wonder how things would be if everyone just abided by the rules.
I feel your pain Bill, but from a slightly different angle. For years the large CDNs have been disregarding prepends. When a source AS disregards BGP best path selection rules, it sets off a chain reaction of silliness not attributable to the transit AS's. At the terminus of that chain are destination / eyeball AS's now compelled to do undesirable things out of necessity such as:
1) Advertise specifics towards select peers - i.e. inconsistent edge routing policy & littering global table
2) Continue prepending a ridiculous amount anyway
Gotta wonder how things would be if everyone just abided by the rules.
What 'rule' are you asserting is being broken here?
On Jan 22, 2024, at 6:53 PM, Jeff Behrns via NANOG <nanog@nanog.org> wrote:
William Herrin <bill@herrin.us> wrote: Until they tamper with it using localpref, BGP's default behavior with prepends does exactly the right thing, at least in my situation.
I feel your pain Bill, but from a slightly different angle. For years the large CDNs have been disregarding prepends. When a source AS disregards BGP best path selection rules, it sets off a chain reaction of silliness not attributable to the transit AS's. At the terminus of that chain are destination / eyeball AS's now compelled to do undesirable things out of necessity such as:
1) Advertise specifics towards select peers - i.e. inconsistent edge routing policy & littering global table
2) Continue prepending a ridiculous amount anyway
Gotta wonder how things would be if everyone just abided by the rules.
One might argue that the global routing system should allow for sites to signal their ingress traffic engineering preferences to remote sites in ways other than bloating the global routing table. But that ship seems to have sailed. Regards, -Darrel
All,
But that ship seems to have sailed.
The problem is well known and it consists of two orthogonal aspects:
#1 - Ability to signal the preference of which return path to choose by arbitrary remote ASN
#2 - Actually applying this preference by remote ASN.
For #1, some time back I proposed a new set of well-known wide communities defined in section 2.2.4 of this draft: https://datatracker.ietf.org/doc/html/draft-ietf-idr-registered-wide-bgp-com... Perhaps one day this will surface such that operators will be able to signal their preference without extending AS-PATH or trashing the table with more specifics.
For #2 it is quite likely that the economic aspect plays a role here. So it could be that accepting such a preference may not be for free. But before that happens BGP for obvious reasons should be secured and updates should be signed. And we all know how fast that is going to happen.
Kind regards, Robert
I’d bet that 47787 is a paying CenturyLink customer. As such, despite the ugliness of the path, CL probably local prefs everything advertised by them higher than any non-paying link. I’m willing to bet 1299 is peered and not paying CL. Sending bits for revenue is almost always preferable to sending bits for free, so… Owen
On Jan 22, 2024, at 12:37, William Herrin <bill@herrin.us> wrote:
On Mon, Jan 22, 2024 at 10:19 AM James Jun <james.jun@towardex.com> wrote:
So, as a customer, you actually SHOULD be demanding your ISPs to positively identify and categorize their routes using local-pref and communities.
Hi James,
The best path to me from Centurylink is: 3356 1299 20473 11875
The path Centurylink chose is: 3356 47787 47787 47787 47787 53356 11875 11875 11875
Do you want to tell me again how that's a reasonable path selection, or how I'm supposed to pass communities to either 20473 or 53356 which tell 3356 to behave itself?
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
I’d bet that 47787 is a paying CenturyLink customer. As such, despite the ugliness of the path, CL probably local prefs everything advertised by them higher than any non-paying link. I’m willing to bet 1299 is peered and not paying CL.
It's almost as if you've done this before. :)
  Community      : 3356:3 3356:22 3356:100 ==> 3356:123 <++ 3356:575 3356:903 3356:2011 3356:11918
                   47787:1020 47787:3090 47787:3690 47787:30000
  Cluster        : 0.0.7.15 0.0.7.19
  Originator Id  : 4.69.181.14
  Peer Router Id : 4.69.130.10
  Fwd Class      : None
  Priority       : None
  Flags          : Used Valid Best IGP Group-Best
  Route Source   : Internal
  AS-Path        : 47787 47787 47787 47787 53356 11875 11875 11875
3356:123 = Customer
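A small sketch of reading that looking-glass community list programmatically to confirm the route carries the customer tag. The meaning of 3356:123 comes from the reply above; the helper names are hypothetical, and only plain 'ASN:VALUE' standard communities are handled (no large or extended communities).

```python
# Communities copied from the looking-glass output above ("==>"/"<++" markers dropped).
LG_COMMUNITIES = (
    "3356:3 3356:22 3356:100 3356:123 3356:575 3356:903 3356:2011 3356:11918 "
    "47787:1020 47787:3090 47787:3690 47787:30000"
)

# Per the reply above, 3356:123 marks a route as customer-learned.
CUSTOMER_TAG = (3356, 123)


def parse_communities(text: str) -> list[tuple[int, int]]:
    """Split whitespace-separated 'ASN:VALUE' standard communities."""
    parsed = []
    for token in text.split():
        asn, value = token.split(":")
        parsed.append((int(asn), int(value)))
    return parsed


def is_customer_route(communities: list[tuple[int, int]]) -> bool:
    """True if the route carries the customer-learned tag."""
    return CUSTOMER_TAG in communities


def with_first_field(communities: list[tuple[int, int]], asn: int) -> list[tuple[int, int]]:
    """Communities whose first field is asn (by convention, that AS's namespace)."""
    return [c for c in communities if c[0] == asn]


if __name__ == "__main__":
    comms = parse_communities(LG_COMMUNITIES)
    print(is_customer_route(comms), len(with_first_field(comms, 47787)))
```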
William Herrin writes:
The best path to me from Centurylink is: 3356 1299 20473 11875
The path Centurylink chose is: 3356 47787 47787 47787 47787 53356 11875 11875 11875
Do you want to tell me again how that's a reasonable path selection, or how I'm supposed to pass communities to either 20473 or 53356 which tell 3356 to behave itself?
What you want to do is pass communities to 3356 so they apply the same local-pref to routes from both paths, enabling as-path-length-based path selection to work. That means lowering their local-pref on the currently-chosen customer path via 47787 to match the local-pref on their 1299 peer path.
as3356's TE communities are listed in their IRR aut-num: AS3356 object:
  remarks: ----------------------------------------------------
  remarks: customer traffic engineering communities - LocalPref
  remarks: ----------------------------------------------------
  remarks: 3356:70 - set local preference to 70
  remarks: 3356:80 - set local preference to 80
  remarks: 3356:90 - set local preference to 90
  remarks: ----------------------------------------------------
Those communities look like RFC1998. Thus presumably 3356's peer local-pref is 80, and you'll want to signal using 3356:80. As you make signaling changes you should use as3356's looking glass to confirm.
as47787 and as53356 should pass your 3356:80 community along to as3356. If they don't do so, complain to them or vote with your feet.
Jay B.
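To illustrate the effect Jay describes, a simplified sketch of an RFC 1998-style import policy at the receiving network. The default local-pref values are assumptions: the thread only guesses that 80 is 3356's peer value, and 100 for customers is a placeholder. Only local-pref and AS_PATH length are modeled; real routers apply further tie-breakers.

```python
from dataclasses import dataclass, field

# Assumed defaults: 80 for peers is the thread's guess; 100 for customers is a placeholder.
DEFAULT_LOCAL_PREF = {"customer": 100, "peer": 80}

# Overrides listed in the AS3356 aut-num remarks quoted above.
LOCAL_PREF_COMMUNITIES = {"3356:70": 70, "3356:80": 80, "3356:90": 90}


@dataclass
class Route:
    as_path: list[int]
    relationship: str  # "customer" or "peer", as seen by the receiving AS
    communities: set[str] = field(default_factory=set)


def local_pref(route: Route) -> int:
    """Relationship default, overridden by a customer-supplied community."""
    pref = DEFAULT_LOCAL_PREF[route.relationship]
    if route.relationship == "customer":
        overrides = [v for c, v in LOCAL_PREF_COMMUNITIES.items() if c in route.communities]
        if overrides:
            pref = min(overrides)  # if several are attached, take the lowest (arbitrary choice)
    return pref


def best(routes: list[Route]) -> Route:
    """Highest local-pref wins; AS_PATH length only breaks local-pref ties."""
    return max(routes, key=lambda r: (local_pref(r), -len(r.as_path)))


if __name__ == "__main__":
    # The two paths quoted in this thread, as received by 3356.
    via_1299 = Route([1299, 20473, 11875], "peer")
    via_47787 = Route([47787] * 4 + [53356] + [11875] * 3, "customer")
    print(best([via_1299, via_47787]).as_path[0])  # customer route wins on local-pref
    via_47787.communities.add("3356:80")
    print(best([via_1299, via_47787]).as_path[0])  # tie at 80; shorter path wins
```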
On Jan 23, 2024, at 10:47, Jay Borkenhagen <jayb@braeburn.org> wrote:
William Herrin writes:
The best path to me from Centurylink is: 3356 1299 20473 11875
The path Centurylink chose is: 3356 47787 47787 47787 47787 53356 11875 11875 11875
Do you want to tell me again how that's a reasonable path selection, or how I'm supposed to pass communities to either 20473 or 53356 which tell 3356 to behave itself?
What you want to do is pass communities to 3356 so they apply the same local-pref to routes from both paths, enabling as-path-length-based path selection to work. That means lowering their local-pref on the currently-chosen customer path via 47787 to match the local-pref on their 1299 peer path.
as3356's TE communities are listed in their IRR aut-num: AS3356 object:
  remarks: ----------------------------------------------------
  remarks: customer traffic engineering communities - LocalPref
  remarks: ----------------------------------------------------
  remarks: 3356:70 - set local preference to 70
  remarks: 3356:80 - set local preference to 80
  remarks: 3356:90 - set local preference to 90
  remarks: ----------------------------------------------------
Those communities look like RFC1998. Thus presumably 3356's peer local-pref is 80, and you'll want to signal using 3356:80. As you make signaling changes you should use as3356's looking glass to confirm.
as47787 and as53356 should pass your 3356:80 community along to as3356. If they don't do so, complain to them or vote with your feet.
The catch to all of that, however, is that he’s not directly peered with 3356 and many AS operators strip communities. Owen
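Owen's catch as a toy model: a community set by the origin only reaches 3356 if every intermediate AS passes it along. Whether 53356 or 47787 actually strip communities isn't stated in the thread, so the set of stripping ASes is a parameter here, and the function name is hypothetical.

```python
def communities_after_transit(origin_communities: set[str], transit_path: list[int],
                              strippers: set[int]) -> set[str]:
    """Communities still attached after each transit AS re-announces the route.

    Coarse model: an AS in `strippers` removes all communities; real networks
    often strip selectively (for example only their own action communities).
    """
    remaining = set(origin_communities)
    for asn in transit_path:
        if asn in strippers:
            remaining = set()
    return remaining


if __name__ == "__main__":
    # Origin 11875 -> 53356 -> 47787 -> 3356, as in the path quoted in the thread.
    tagged = {"3356:80"}
    print(communities_after_transit(tagged, [53356, 47787], strippers=set()))
    print(communities_after_transit(tagged, [53356, 47787], strippers={47787}))
```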
On Tue, Jan 23, 2024 at 11:45 AM Owen DeLong via NANOG <nanog@nanog.org> wrote:
The catch to all of that, however, is that he’s not directly peered with 3356 and many AS operators strip communities.
And even if I didn't, the problem isn't just one ISP localprefing to prefer distant routes. Centurylink most directly impacts me, but as others have pointed out: many ISPs do the same darn thing. The only workable solution available to me appears to be tripling my presence in the DFZ tables. Because big operators think it reasonable to localpref distant routes ahead of nearby ones so long as the distant routes arrive from customers. I'll remember that the next time folks complain about the size of the routing table. This one you did to yourselves. Regards, Bill -- William Herrin bill@herrin.us https://bill.herrin.us/
Once upon a time, William Herrin <bill@herrin.us> said:
Because big operators think it reasonable to localpref distant routes ahead of nearby ones so long as the distant routes arrive from customers. I'll remember that the next time folks complain about the size of the routing table. This one you did to yourselves.
This isn't some "big operators" conspiracy... it's how lots of networks with BGP customers work (even small networks). BGP has no knowledge of the distance you keep emphasizing, and path prepends have always been known to be down the decision tree. When you receive a route over a paid link, it's not unreasonable to assume it's because your paying customer wants that traffic from you. It's been pretty standard practice to localpref up routes from your customers for a long time, and then (often but not always) provide communities for said customers to override the localpref. Being a customer of a customer makes that harder, but then it's basically on you to choose your connections with that in mind. -- Chris Adams <cma@cmadams.net>
* bill@herrin.us (William Herrin) [Tue 23 Jan 2024, 21:02 CET]:
On Tue, Jan 23, 2024 at 11:45 AM Owen DeLong via NANOG <nanog@nanog.org> wrote:
The catch to all of that, however, is that he’s not directly peered with 3356 and many AS operators strip communities.
And even if I didn't, the problem isn't just one ISP localprefing to prefer distant routes. Centurylink most directly impacts me, but as others have pointed out: many ISPs do the same darn thing. The only workable solution available to me appears to be tripling my presence in the DFZ tables.
Why do you buy from ISPs when you don't want to receive traffic via them? Have you tried asking that upstream to interconnect more locally with certain other networks? Why do you buy from ISPs that strip TE communities from your announcements that don't affect them in the first place?
Because big operators think it reasonable to localpref distant routes ahead of nearby ones so long as the distant routes arrive from customers. I'll remember that the next time folks complain about the size of the routing table. This one you did to yourselves.
BGP, while a distance vector protocol, famously does not take latency into account when making routing decisions. -- Niels.
On Tue, Jan 23, 2024 at 12:34 PM Niels Bakker <niels=nanog@bakker.net> wrote:
BGP, while a distance vector protocol, famously does not take latency into account when making routing decisions.
Unless overridden, BGP takes -distance- into account where distance = AS path length. Centurylink has overridden that with a localpref so that it DOES NOT take distance into account. Which rather defeats the function of a distance vector protocol. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
Unless overridden, BGP takes -distance- into account where distance = AS path length.
An AS_PATH length of 10 could be a physical distance of 1 mile. An AS_PATH length of 1 could be a physical distance of 1000 miles. BGP TE communities exist to provide signalling in the event that the standards implemented by a provider don't align with the desires of an ASN. They are certainly imperfect, but they are a very useful tool in the toolbox that can solve problems exactly like the one you are experiencing. If you choose not to even attempt to use them, for whatever your reasons may be, I guess that's all there is to say at this point.
On Tue, Jan 23, 2024 at 3:27 PM Tom Beecher <beecher@beecher.cc> wrote:
Unless overridden, BGP takes -distance- into account where distance = AS path length.
An AS_PATH length of 10 could be a physical distance of 1 mile.
An AS_PATH length of 1 could be a physical distance of 1000 miles.
Nevertheless, in the protocol's design, the one expressed in the RFCs, AS path length = distance. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
Once upon a time, William Herrin <bill@herrin.us> said:
Nevertheless, in the protocol's design, the one expressed in the RFCs, AS path length = distance.
The RFC doesn't make any equivalence between AS path length and distance. You are the one trying to make that equivalence, but that's not how BGP is used on the Internet. You're about 30 years too late to have any influence on that. -- Chris Adams <cma@cmadams.net>
On Tue, Jan 23, 2024 at 4:00 PM Chris Adams <cma@cmadams.net> wrote:
Once upon a time, William Herrin <bill@herrin.us> said:
Nevertheless, in the protocol's design, the one expressed in the RFCs, AS path length = distance.
The RFC doesn't make any equivalence between AS path length and distance. You are the one trying to make that equivalence,
Respectfully Chris, you are mistaken. https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2 "a) Remove from consideration all routes that are not tied for having the smallest number of AS numbers present in their AS_PATH attributes." So literally, the first thing BGP does when picking the best next hop is to discard all but the routes with the shortest AS path. It also says that BGP implementations are -allowed- to use other selection criteria. And there are many situations where doing so is well advised and improves the result. But AS path length is unambiguously the default, off which a user has to move it. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
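For reference, here is a minimal Python sketch of step (a) taken on its own. The AS_PATH representation (a list of `(segment_type, asns)` pairs) and the helper names `as_path_length` and `step_a` are hypothetical. The AS_SET rule comes from the full text of step (a) in RFC 4271. The sketch assumes its inputs are already tied on degree of preference, because section 9.1.2.2 only applies to such routes. Confederation segments are not modeled.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical representation: an AS_PATH is a list of (segment_type, ASNs) pairs.
Segment = Tuple[str, Sequence[int]]


def as_path_length(segments: Iterable[Segment]) -> int:
    """AS_PATH length as counted by RFC 4271 section 9.1.2.2 (a)."""
    total = 0
    for seg_type, asns in segments:
        if seg_type == "AS_SEQUENCE":
            total += len(asns)   # every ASN counts, prepends included
        elif seg_type == "AS_SET":
            total += 1           # a set counts as 1 no matter how many ASes it holds
        else:
            # Confederation segments are not modeled in this sketch.
            raise ValueError(f"unsupported segment type: {seg_type}")
    return total


def step_a(candidates: List[dict]) -> List[dict]:
    """Keep only the candidates tied for the shortest AS_PATH.

    Assumes every candidate already has the same degree of preference.
    """
    shortest = min(as_path_length(c["as_path"]) for c in candidates)
    return [c for c in candidates if as_path_length(c["as_path"]) == shortest]


# Documentation ASNs (64496-64511) used as placeholders.
routes = [
    {"via": "aggregated", "as_path": [("AS_SEQUENCE", [64500]), ("AS_SET", [64501, 64502, 64503])]},
    {"via": "prepended", "as_path": [("AS_SEQUENCE", [64500, 64496, 64496, 64496])]},
]
print([r["via"] for r in step_a(routes)])  # ['aggregated']
```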
Bill,
https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2
"a) Remove from consideration all routes that are not tied for having the smallest number of AS numbers present in their AS_PATH attributes."
So literally, the first thing BGP does when picking the best next hop is to discard all but the routes with the shortest AS path.
Not really. I have never seen a BGP implementation which would do that. That section 9 you are referring to is just informational - no specific order in there is mandated. Shortest AS-PATH is used as step 4 or 5 in best path selection - not to mention Cost Communities, which the links below do not even consider: https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/13... https://www.juniper.net/documentation/us/en/software/junos/vpn-l2/bgp/topics... Thx, R.
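Robert's point is that shipping implementations evaluate AS_PATH length several steps into best-path selection. The sketch below is a simplified sort key, loosely modeled on Cisco's documented order: weight, LOCAL_PREF, locally originated, AS_PATH length, origin. AIGP, MED, eBGP-over-iBGP, IGP cost to the next hop, and the final tie-breakers are deliberately omitted. The `Route` class and all values are hypothetical, and `weight` only exists on Cisco-style implementations.

```python
from dataclasses import dataclass

# Origin ranking from RFC 4271: IGP < EGP < INCOMPLETE (lower is preferred).
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Route:
    name: str
    weight: int = 0                  # Cisco-specific, local to one router
    local_pref: int = 100            # common vendor default
    locally_originated: bool = False
    as_path: tuple = ()              # flat ASN list; AS_SETs are not modeled
    origin: str = "igp"


def preference_key(r: Route):
    """Smaller tuple = more preferred. A simplified approximation only."""
    return (
        -r.weight,                 # highest weight first
        -r.local_pref,             # then highest LOCAL_PREF
        not r.locally_originated,  # then locally originated routes
        len(r.as_path),            # only now: shortest AS_PATH
        ORIGIN_RANK[r.origin],     # then lowest origin code
    )


def best(routes):
    return min(routes, key=preference_key)


short = Route("short", as_path=(64500, 64496))
long_but_preferred = Route("long", local_pref=200, as_path=(64501,) + (64496,) * 6)
print(best([short, long_but_preferred]).name)  # long
```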
On Tue, Jan 23, 2024 at 10:12:33PM -0800, William Herrin wrote:
Respectfully Chris, you are mistaken.
https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2
"a) Remove from consideration all routes that are not tied for having the smallest number of AS numbers present in their AS_PATH attributes."
So literally, the first thing BGP does when picking the best next hop is to discard all but the routes with the shortest AS path.
Not true. Read the whole RFC--you've omitted Sections 9.1 and 9.1.1, which are very critical. Discarding all but the routes with shortest AS path is _not_ literally the first thing BGP does as you stated above. The first thing BGP does is to calculate the degree of preference whenever BGP receives a new route, withdrawn route or replacement route (See Section 9.1.1). The determination of the degree of preference is considered to be a local matter for each Autonomous System exercising route policy, typically expressed using LOCAL_PREF, to execute upon the configured administrative policy to class the incoming routes. After completion of 9.1.1, sections 9.1.2 and 9.1.2.2, which you cited, begin (Phase 2: Route Selection). Route selection under 9.1.2 is only invoked after degree of preference is determined (called 'Phase 1' decision) as clearly described in Section 9.1. In fact, even in 9.1.2.2 that you cited above, it clearly states: "In its Adj-RIBs-In, a BGP speaker may have several routes to the same destination that have the same degree of preference. [ snip ] The following tie-breaking procedure assumes that, for each candidate route, all the BGP speakers within an autonomous system can ascertain the cost of a path (interior distance) to the address depicted by the NEXT_HOP attribute of the route, and follow the same route selection algorithm. The tie-breaking algorithm begins by considering all equally preferable routes to the same destination, and then selects routes to be removed from consideration. The algorithm terminates as soon as only one route remains in consideration. The criteria MUST be applied in the order specified. [ snip ] a) Remove from consideration all routes that are not tied for having the smallest number of AS numbers present in their AS_PATH attributes. Note that when counting this number, an AS_SET counts as 1, no matter how many ASes are in the set."
So you see, the comparison of AS_PATH, and therefore the route selection process, could only begin after routes are first resolved by their degree of preference, typically exercised by LOCAL_PREF across the AS (or another similar import knob, such as Cisco's "weight" parameter, which is applied before LOCAL_PREF and is locally significant to the router where it's been configured). The route selection process, including the elimination of routes with inferior AS paths, is a tie-breaker algorithm after degree of preference is first calculated, which is what we've been trying to tell you. So no, AS_PATH comparison is not literally the first thing BGP does. You're ignoring Section 9.1.1 in its entirety, which chronologically begins before Section 9.1.2.2 (the section you cited), which also clearly specifies that the route selection process described in it (including AS_PATH comparison) is a tie-breaking procedure.
It also says that BGP implementations are -allowed- to use other selection criteria.
Further followed by the following clause immediately afterwards: "BGP implementations MAY use any algorithm that produces the __same results__ as those described here." And restricted by the following clause in the preceding paragraph: "The criteria MUST be applied in the order specified." And clarified by Section 9.1: "as long as the implementations support the described functionality and they exhibit the same externally visible behavior."
And there are many situations where doing so is well advised and improves the result. But AS path length is unambiguously the default, off which a user has to move it.
So, when a BGP implementation is written into router software, how does the manufacturer know whether your network is going to need to apply a lot of degrees of preference, or none? The vendors have no idea, and the RFC also clarifies that degree of preference is a local policy matter. Therefore, the default behavior is to assume the same LOCAL_PREF for every route until a policy is configured, which typically has been '100' across many vendor implementations. In this instance, since all routes have the same degree of preference of 100, Section 9.1.2.2 you cited then begins to tie-break the routes of same preference, starting with the AS_PATH comparison, but it is by no means the first thing BGP does, at all. The first thing BGP does, as clearly specified in the RFC, is to determine the degree of preference to meet local routing policy. The degree of preference differs greatly depending on what type of network you run. If you're an edge consumer ASN (such as a multi-homed stub enterprise running BGP), without providing any downstream IP transit to other BGP customers, and not peering with other networks (at an IX or otherwise), then your network probably doesn't have a lot of need to apply administrative policy to determine a degree of preference, and you can be happy fiddling with just AS_PATH. But if you're running a network which provides transit to other ASNs and peers with other networks, then suddenly, applying administrative policy is not only desirable, but operationally required. This isn't solely a revenue/greed problem as some have cynically stated; it's actually also a critical service availability and reliability issue, because not having degree of preference pursuant to established routing policy in an IP network completely eliminates the ability to implement a desired predictability in traffic engineering to meet capacity planning objectives for network interconnections.
Are there exceptions, pitfalls to this, where poorly designed or thought-out networks suffer in certain routing situations? Absolutely. But that's the Internet: it's not perfect, but it works very well most of the time for most situations. Your desired 'policy-free, AS_PATH-only' world may solve your particular complaint at hand, but it absolutely would break the rest of the Internet, with no effective ways to implement routing policy for the large-scale network interconnections that make the Internet tick. BGP exists to provide anchors to apply routing policy into the path selection process at scale. It is wrong to assume that AS_PATH is the first thing and the only thing which matters in BGP, through incorrect and out-of-context parsing of the RFC to fit your desired narrative. In operational reality, backed by the history and the RFCs themselves, the single most important and influential knob in BGP is arguably LOCAL_PREF, more so than AS_PATH. Sadly, most people won't get to experience this until they've run or dealt with the operational realities of managing a large IP network. The problem you're complaining about is an exception, primarily caused by your poor selection of IP transit provider at the data center where you're running AS11875, and you're demanding that everyone else take responsibility for the purchasing decision you've made. There are some good proposals, such as commonly accepted wide communities for commonly encountered traffic-engineering scenarios, to help improve upon this and make BGP a better experience for the end-user in situations like the one you're having, but we're not quite there today, and it's understandably not going to be a quick process. In the meantime, in the immediate short term, glad to hear that your route pollution announcement solved the issue for you. In the medium term, you should get a new transit provider for AS11875 with better connectivity into 3356.
Long-term, perhaps commonly accepted wide communities could become a standard some day to improve knobs in situations like this. James
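James's argument is that section 9.1.2.2 is a tie-breaker that only runs after Phase 1, and that with no policy configured every route gets the same default LOCAL_PREF. A minimal Python sketch of that ordering follows. It models only LOCAL_PREF and AS_PATH length, and the route dicts and the `prefer_customers` policy are hypothetical.

```python
DEFAULT_LOCAL_PREF = 100  # common vendor default when no import policy sets one


def select(routes, import_policy=None):
    """Two-phase selection, simplified from RFC 4271 section 9.1.

    Phase 1 (9.1.1): compute each route's degree of preference.
    Phase 2 (9.1.2.2): tie-break, starting with AS_PATH length, only among
    the routes that share the highest degree of preference.
    """
    def degree(route):
        return import_policy(route) if import_policy else DEFAULT_LOCAL_PREF

    top = max(degree(r) for r in routes)
    tied = [r for r in routes if degree(r) == top]
    return min(tied, key=lambda r: len(r["as_path"]))


# Hypothetical routes to the same prefix (documentation ASNs).
primary = {"name": "primary", "learned_from": "peer", "as_path": [64510, 64496]}
backup = {"name": "backup", "learned_from": "customer",
          "as_path": [64511, 64512] + [64496] * 6}

# No local policy: every route gets 100, so AS_PATH length decides.
print(select([primary, backup])["name"])  # primary


def prefer_customers(route):
    return 200 if route["learned_from"] == "customer" else 100


# With a relationship-based policy, Phase 2 never sees the primary.
print(select([primary, backup], prefer_customers)["name"])  # backup
```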
Once upon a time, William Herrin <bill@herrin.us> said:
Respectfully Chris, you are mistaken.
https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2
"a) Remove from consideration all routes that are not tied for having the smallest number of AS numbers present in their AS_PATH attributes."
So literally, the first thing BGP does when picking the best next hop is to discard all but the routes with the shortest AS path.
That's literally not the first thing - you skipped section 9.1.1. It also literally says nothing about distance. -- Chris Adams <cma@cmadams.net>
On Wed, Jan 24, 2024 at 5:23 AM Chris Adams <cma@cmadams.net> wrote:
That's literally not the first thing - you skipped section 9.1.1.
Phase 1 is local pref. That's what 9.1.1 says. As implied by the word "local," it's set locally by the local operator, not by the origin, though many providers offer haphazard mechanisms that sometimes have some impact if the origin doesn't mind playing whack-a-mole with BGP communities. Unless locally configured to selectively change the local pref off the default, all routes have the same local pref. So it moves to phase 2 (section 9.1.2). This matches what I've been saying for the entire thread: unless the operator intentionally makes the route worse, it follows the shortest AS path. Per the RFC.
It also literally says nothing about distance.
BGP is a distance-vector protocol. BGP's authors preferred different terminology so they used different terminology. Nevertheless, BGP is a distance-vector protocol and when you ask what it uses to determine distance, the answer is the AS path length because all the other criteria are policy functions not distance functions. Want to go another few rounds with pedantry over word choice, or can we leave it there? Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
The basic disconnect here is that you seem to think that BGP is to be used to dictate policy to other networks on how to reach your network. That is not and has never been the case. When I learned BGP back in the 1990s, it was explicitly said that you control your outbound traffic with your BGP policy, but that all you can do is try to influence the decisions of other networks for your inbound traffic (using a combination of prepends, communities, and sometimes other tricks), but sometimes they'll take a path that isn't what you'd prefer (and you just have to accept that). Just like your outbound policy is 100% in your control, so it is with every other network. We always took that kind of thing into account when choosing where to buy transit. When not buying from a "big guy" with a well-connected nationwide network, we'd check BGP announcements and traceroutes to see where things went. -- Chris Adams <cma@cmadams.net>
On Tue, Jan 23, 2024 at 03:37:25PM -0800, William Herrin wrote:
Nevertheless, in the protocol's design, the one expressed in the RFCs, AS path length = distance.
Bill, The protocol was also developed at a time when everyone utilized the same transit provider, and all other ASes were regional or local in scope. Still, I'm not sure your assertion is true. There are senior network engineers on this list who weren't even alive when RFC 1105 was published, and express contemplation of AS path as a tiebreaker doesn't come into it until RFC 1164: "1. An AS can minimize the number of transit ASs. (Shorter AS paths can be preferred over longer ones.)" Note the "can"... hardly a MUST, or a SHOULD. AS hop count was never intended as a large hammer, and it has never been one in practice, since most people are making their decisions based on local preference, which for the last couple of decades is typically set based on internal community tagging. --msa
BGP is more of a PDVP (Policy Distance Vector Protocol). Policy will always override Distance in BGP and is pretty much the key difference between an EGP and an IGP. Once you recognize that, the rest makes much more sense. Owen
On Wed, Jan 24, 2024 at 12:55 AM Owen DeLong <owen@delong.com> wrote:
BGP is more of a PDVP (Policy Distance Vector Protocol).
Hi Owen, That's a distinction without a difference. All but the most rudimentary implementation of a distance-vector protocol supports policy definition and enforcement. BGP has more policy knobs than most, but at its heart it's still a distance-vector protocol and until pushed off its default settings its first differentiator for distance is the length of the AS path. Only link-state protocols tend to lack policy knobs since all nodes must agree about the correct full path, not just the next closest hop. When you twist a policy knob to move BGP off its defaults, you take responsibility for making a better routing choice. And for correcting that choice if it should prove faulty. What I've seen here in this thread is a bunch of folks abdicating that responsibility. That's not unexpected, but it is disappointing. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
When you twist a policy knob to move BGP off its defaults, you take responsibility for making a better routing choice. And for correcting that choice if it should prove faulty. What I've seen here in this thread is a bunch of folks abdicating that responsibility. That's not unexpected, but it is disappointing.
Better is in the eye of the beholder. From your perspective, better is the lowest latency. From almost any ISP's perspective, better is the revenue-positive path, followed by the revenue-neutral path, with last choice being the revenue-negative path. From 3356's perspective, they ARE choosing the best route… the route that pays them. Owen
Because big operators think it reasonable to localpref distant routes ahead of nearby ones so long as the distant routes arrive from customers. I'll remember that the next time folks complain about the size of the routing table. This one you did to yourselves.
That has absolutely nothing to do with it, at all. 3356 is following common practice: use customer routes before peer routes. This is not some Illuminati-based conspiracy; it's pretty standard stuff. Nobody at 3356 is doing some magic latency-based twerking to mess with you. You kinda have a lousy upstream IMO. It just so happens that their customer (47787) takes a *physical* pathway that is less performant than you'd prefer. Two people (myself and Andrew Hoyos) went and looked, and found that the upstream you use (53356) provides TE communities that you can use to prevent your advertisement from being sent to 47787, thus avoiding that poorly performing pathway and hopefully landing on someone better. Again, for reference (https://docs.freerangecloud.com/en/bgp/communities). You can:
1. Experiment with 53356's TE communities to prevent them from announcing to upstreams that give you poor performance to 3356.
2. See if 47787 will talk to you about their path to 3356. (Doubtful, since you aren't a direct customer of theirs.)
3. Pick an upstream that has better/more direct connectivity to 3356, and use them instead of/in parallel with 53356.
4. Get yourself connected to 3356 directly.
5. Keep yelling at the clouds about 3356, even though they are doing the same thing that (to the best of my knowledge) every large transit provider does.
On Tue, Jan 23, 2024 at 3:02 PM William Herrin <bill@herrin.us> wrote:
On Tue, Jan 23, 2024 at 11:45 AM Owen DeLong via NANOG <nanog@nanog.org> wrote:
The catch to all of that, however, is that he’s not directly peered with 3356 and many AS operators strip communities.
And even if I didn't, the problem isn't just one ISP localprefing to prefer distant routes. Centurylink most directly impacts me, but as others have pointed out: many ISPs do the same darn thing. The only workable solution available to me appears to be tripling my presence in the DFZ tables.
Because big operators think it reasonable to localpref distant routes ahead of nearby ones so long as the distant routes arrive from customers. I'll remember that the next time folks complain about the size of the routing table. This one you did to yourselves.
Regards, Bill
-- William Herrin bill@herrin.us https://bill.herrin.us/
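Tom's first suggestion relies on an upstream honoring an action community when it exports routes to its own neighbors. The following minimal sketch shows how such an export filter could work. It uses a hypothetical `(upstream ASN, 0, target ASN)` encoding shaped like a large community (RFC 8092). Real providers, 53356 included, document their own schemes, and the ASNs here are documentation placeholders. As Bill and Owen note, the instruction only takes effect at the upstream that honors it. Communities are often stripped further along, so this does nothing about a more distant network choosing another path.

```python
from typing import List, Set, Tuple

Community = Tuple[int, int, int]  # shaped like a large community (RFC 8092)

UPSTREAM_ASN = 64500  # hypothetical upstream that offers the action community
NO_ANNOUNCE = 0       # hypothetical action code meaning "do not announce to"


def export_allowed(communities: Set[Community], neighbor_asn: int) -> bool:
    """Upstream's export check: suppress the route toward a tagged neighbor."""
    return (UPSTREAM_ASN, NO_ANNOUNCE, neighbor_asn) not in communities


def exported_to(communities: Set[Community], neighbors: List[int]) -> List[int]:
    return [asn for asn in neighbors if export_allowed(communities, asn)]


tags = {(UPSTREAM_ASN, NO_ANNOUNCE, 64511)}
print(exported_to(tags, [64511, 64512, 64513]))  # [64512, 64513]
```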
On Tue, Jan 23, 2024 at 12:38 PM Tom Beecher <beecher@beecher.cc> wrote:
1. Experiment with 53356's TE communities to prevent them from announcing to upstreams that give you poor performance to 3356.
Respectfully, I rejected that approach because it doesn't address the other few hundred instances of this problem, nor does it even resolve the current issue, since Centurylink is demonstrated to then switch to yet another customer via a different one of my upstreams, which would require yet another community, if there is one.
2. See if 47787 will talk to you about their path to 3356.
Haha. You're funny.
3. Pick an upstream that has better / more direct connectivity to 3356, use them instead of /in parallel with 53356.
Haha. You're funny.
4. Get yourself connected to 3356 directly.
I am, just not as a BGP customer. And I won't be as a BGP customer. Opening a ticket with them has not yielded results. Or any response from network engineering at all. Just the frontline support who wants me to reboot my modem. :(
5. Keep yelling at the clouds about 3356, even though they are doing the same thing that (to the best of my knowledge) every large transit provider does.
6. Pollute the DFZ because in light of what "every large transit provider does," that's the solution that actually works. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
William Herrin wrote:
Nevertheless, in the protocol's design, the one expressed in the RFCs, AS path length = distance.
Since we're opening RFCs now, and somehow it is being opined that LOCAL_PREF is a profit-driven conspiracy and a coordinated scheme concocted by commercial networks to tamper with, or "override" AS_PATH desires of the majority, let us review factually about what LOCAL_PREF actually does and why it was implemented into BGP in the first place: RFC 4277 entitled "Experience with the BGP-4 Protocol", Section 20: The NSFNET program used EGP, and then BGP, to provide external routing information. It was the NSF policy of offering different prices and providing different levels of support to the Research and Education (RE) and the Commercial (CO) networks that led to BGP's initial policy requirements. In addition to being charged more, CO networks were not able to use the NSFNET backbone to reach other CO networks. The rationale for higher prices was that commercial users of the NSFNET within the business and research entities should subsidize the RE community. Recognition that the Internet was evolving away from a hierarchical network to a mesh of peers led to changes away from EGP and BGP-1 that eliminated any assumptions of hierarchy. Enforcement of NSF policy was accomplished through maintenance of the NSF Policy Routing Database (PRDB). The PRDB not only contained each networks designation as CO or RE, but also contained a list of the preferred exit points to the NSFNET to reach each network. This was the basis for setting what would later be called BGP LOCAL_PREF on the NSFNET. Tools provided with the PRDB generated complete router configurations for the NSFNET. RFC 4271 entitled "A Border Gateway Protocol 4 (BGP-4)" (supersedes RFC 1771), Section 5.1.5: A BGP speaker SHALL calculate the degree of preference for each external route based on the locally-configured policy, and include the degree of preference when advertising a route to its internal peers. The higher degree of preference MUST be preferred. 
A BGP speaker uses the degree of preference learned via LOCAL_PREF in its Decision Process (see Section 9.1.1). It is clear from the experiences of NSFnet and the early days of the Internet that AS_PATH alone is insufficient to meet interconnection policy objectives. In fact, this LOCAL_PREF "conspiracy" was actually concocted by Research and Education (R&E) networks to make evil commercial networks pay, but in reality, NSFnet and early R&E networks had actual operational and demonstrated reasons for this, and a path vector routing protocol where cross-border interconnection policies must be applied cannot simply rely on AS_PATH as its decision mechanism. Otherwise, it'd have been easier to just scale up RIP into a global routing protocol instead of using BGP. This is where your argument and the basis of your claim fail: a parameter to express administrative policy preference was required even in the early days of NSFnet, and that is why LOCAL_PREF was put in there in the first place, despite your assertions claiming it is broken and being used to "override" AS_PATH for small guys for bad-faith reasons. This was not some later "add-on" for conspiracy by commercial networks; LOCAL_PREF, in fact, was one of the principal features and reasons for developing BGP-4. You're 29 years late to this conversation, buddy.
4. Get yourself connected to 3356 directly.
I am, just not as a BGP customer. And I won't be as a BGP customer. Opening a ticket with them has not yielded results. Or any response from network engineering at all. Just the frontline support who wants me to reboot my modem. :(
I get that you are not in the position to buy from 3356, and to that extent, that is a completely respectable and reasonable position (commercial reasons, personal experience/preference or otherwise, you are the customer here). But you have a voice as a customer on which BGP transit provider you're purchasing from on the other end (the far-end location or data center where your ASN is operating and taking transit). Take it as a lesson learned going forward: when choosing a smaller/nimble or blended bandwidth IP provider, make sure to ask: what can the provider do to help you achieve better connectivity into 3356 or any other network you're trying to get to? It's your transit provider's business to make sure your ASN's connectivity works to your expectations. Otherwise, why would you, the customer, choose to do business with a middle-man when you could just buy direct from 3356 at the data center for your ASN instead? It is incumbent upon your IP transit provider to help you better meet your connectivity requirements (especially for retail and small-traffic customers in data centers like yourself who are not subject to capacity or commercial interconnection disputes), and going forward, this is one requirement to check during the RFQ process for procuring enterprise BGP-based IP transit.
5. Keep yelling at the clouds about 3356, even though they are doing the same thing that (to the best of my knowledge) every large transit provider does.
6. Pollute the DFZ because in light of what "every large transit provider does," that's the solution that actually works.
We've done our part to factually inform you on why BGP works the way it does with LOCAL_PREF, and why it is a very bad idea, and actually operationally harmful, to assume that AS_PATH should be the only metric that matters. Assuming that AS_PATH is the only metric that would matter demonstrates a complete misunderstanding and misrepresentation of the facts regarding the history of BGP and what the protocol is supposed to do. Your answer to these facts is to claim that we're defending 3356, that perhaps there is some kind of a coordinated scheme by an old school boyz club or something, and that you'll simply pollute everyone's routing table (either to spite or because that is the only option that works for you) because, in your view, BGP is broken or is being "tampered with" for profit-driven reasons. You're welcome to do what you feel you'd need to do to meet your traffic-engineering requirements and hold whatever opinions you so desire, but you're not entitled to your own set of facts. I'm sure this will be an amusing case example for FIB compression algorithms to automatically filter out your said 'polluting' route, but that's a different conversation entirely. ;-) Regards, James
On Mon, 22 Jan 2024, William Herrin wrote:
Howdy,
Does anyone have suggestions for dealing with networks who ignore my BGP route prepends?
I have a primary ingress with no prepends and then several distant backups with multiple prepends of my own AS number. My intention, of course, is that folks take the short path to me whenever it's reachable.
A few years ago, Comcast decided it would prefer the 5000 mile, five-prepend loop to the short 10 mile path. I was able to cure that with a community telling my ISP along that path to not advertise my route to Comcast. Today it's Centurylink. Same story; they'd rather send the packets 5000 miles to the other coast and back than 10 miles across town. I know they have the correct route because when I withdraw the distant ones entirely, they see and use it. But this time it's not just one path; they prefer any other path except the one I want them to use. And Centurylink is not a peer of those ISPs, so there doesn't appear to be any community I can use to tell them not to use the route.
I hate to litter the table with a batch of more-specifics that only originate from the short, preferred link but I'm at a loss as to what else to do.
In my experience, it's pretty common for service providers to use localpref to differentiate paid/free/customer routes (with LP increasing in this order). Since LP trumps as-path length, no amount of prepending will get around this. You may be limited to seeing if your backup providers have community controls that would let you tell them "don't share with Centurylink" or seeing if your primary has similar controls that would let you advertise both the aggregate and more specifics, but have them not propagate the more specifics except to those networks (i.e. Centurylink) that you need to see them to get them off your backup paths. ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route Blue Stream Fiber, Sr. Neteng | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
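Jon's point that "LP trumps as-path length, no amount of prepending will get around this" can be checked mechanically. This minimal sketch assumes hypothetical LOCAL_PREF tiers of customer 120, peer 90, and transit 80. Real values differ per network; only the ordering matters here. It searches for the fewest extra prepends on a backup that would make the primary strictly preferred.

```python
# Hypothetical LOCAL_PREF tiers: customer > peer > transit ("get paid, free, pay").
TIER_LOCAL_PREF = {"customer": 120, "peer": 90, "transit": 80}


def key(candidate):
    """candidate = (relationship, as_path_len); smaller key = more preferred."""
    relationship, path_len = candidate
    return (-TIER_LOCAL_PREF[relationship], path_len)


def best_path(candidates):
    # Highest LOCAL_PREF wins; AS_PATH length only breaks ties within a tier.
    return min(candidates, key=key)


def prepends_that_work(primary, backup_rel, backup_len, max_prepends=50):
    """Fewest extra prepends on the backup that make the primary strictly
    preferred, or None if no amount up to max_prepends does."""
    for extra in range(max_prepends + 1):
        if key(primary) < key((backup_rel, backup_len + extra)):
            return extra
    return None


primary = ("peer", 2)  # short path via the preferred ingress
print(prepends_that_work(primary, "customer", 2))  # None: the tier decides
print(prepends_that_work(primary, "peer", 2))      # 1: same tier, length decides
```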
On Mon, Jan 22, 2024 at 5:23 AM Jon Lewis <jlewis@lewis.org> wrote:
You may be limited to seeing if your backup providers have community controls that would let you tell them "don't share with Centurylink"
As I already explained, neither the primary nor any of the backup providers directly peer with Centurylink, thus have no communities for controlling announcements to Centurylink. I hate to litter the table with a batch of more-specifics that only originate from the short, preferred link but I'm not hearing any practical alternatives. Treating my distant links as equivalent even though I told you with prepends that they are not leaves me with few knobs I can turn. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
I really really wish there were a couple of well-known and globally respected communities which you could set to say either "this is a route of last resort" or "this is my preferred route". I feel like it would avoid many of us doing exactly what you're about to do which is pollute the routing tables with extra, more specific routes to do basic traffic engineering. (Resulting in 3 routes where one would do). I'm not talking fine level control here, just being able to say "hey this route is better than nothing, but not much" or "treat this as backup". I understand the resistance to honoring various route engineering tactics, but being able to effectively do the exact same thing that announcing more specifics does without having to resort to announcing more specifics would be a good thing as far as the global bgp table size goes. On Mon, Jan 22, 2024, 1:16 PM William Herrin <bill@herrin.us> wrote:
On Mon, Jan 22, 2024 at 5:23 AM Jon Lewis <jlewis@lewis.org> wrote:
You may be limited to seeing if your backup providers have community controls that would let you tell them "don't share with Centurylink"
As I already explained, neither the primary nor any of the backup providers directly peer with Centurylink, thus have no communities for controlling announcements to Centurylink.
I hate to litter the table with a batch of more-specifics that only originate from the short, preferred link but I'm not hearing any practical alternatives. Treating my distant links as equivalent even though I told you with prepends that they are not leaves me with few knobs I can turn.
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
On Jan 22, 2024, at 21:34, Forrest Christian (List Account) <lists@packetflux.com> wrote:
I really really wish there were a couple of well-known and globally respected communities which you could set to say either "this is a route of last resort" or "this is my preferred route".
You're not the first to wish for this: https://datatracker.ietf.org/doc/html/draft-dickson-idr-last-resort-05 Alex
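To make Forrest's wish concrete, here is a minimal sketch of how a receiving network might honor a "route of last resort" signal on import. It drops the route below every normal relationship tier, so the tier no longer overrides the origin's intent. The community string, the LOCAL_PREF values, and the function name are all hypothetical placeholders. No such community is globally honored today, and this string is not the encoding proposed in draft-dickson-idr-last-resort.

```python
# Hypothetical placeholder, not a real or proposed community value.
LAST_RESORT = "hypothetical:last-resort"
LAST_RESORT_LOCAL_PREF = 50  # assumed to sit below every normal tier

# Hypothetical relationship tiers, as a receiving network might configure them.
TIER_LOCAL_PREF = {"customer": 120, "peer": 90, "transit": 80}


def import_local_pref(relationship: str, communities: set) -> int:
    lp = TIER_LOCAL_PREF[relationship]
    if LAST_RESORT in communities:
        # Honor the origin's signal by dropping below every normal tier.
        lp = min(lp, LAST_RESORT_LOCAL_PREF)
    return lp


backup = import_local_pref("customer", {LAST_RESORT})  # 50
primary = import_local_pref("peer", set())             # 90
print("primary" if primary > backup else "backup")     # primary
```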
And now you are faced with an object lesson as to why TE routes are so prevalent. More-specifics are your only functional alternative here. In most cases, you shouldn’t need more than two per prefix.
Owen
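A minimal Python sketch of the more-specifics approach discussed in this thread, assuming a hypothetical aggregate 198.18.0.0/23 (taken from the 198.18.0.0/15 benchmarking block purely for illustration). The aggregate stays announced everywhere, prepended backups included, while its two halves are announced only from the preferred link. A remote router's longest-prefix match then picks the halves no matter how it ranks the aggregate. The sketch splits a /23 rather than a /24 because many networks filter IPv4 announcements longer than /24.

```python
import ipaddress

# Hypothetical aggregate, chosen from the benchmarking block for illustration only.
AGGREGATE = ipaddress.ip_network("198.18.0.0/23")


def split_for_te(aggregate, extra_bits=1):
    """Return the more-specifics to announce only from the preferred link."""
    return list(aggregate.subnets(prefixlen_diff=extra_bits))


def longest_match(dest, table):
    """Return the (prefix, next_hop) entry with the longest prefix covering dest."""
    addr = ipaddress.ip_address(dest)
    covering = [entry for entry in table if addr in entry[0]]
    if not covering:
        return None
    return max(covering, key=lambda entry: entry[0].prefixlen)


halves = split_for_te(AGGREGATE)

# A remote network that ignores your prepends may still install the aggregate
# via a distant backup, but it also learns the halves via the local primary.
normal_table = [(AGGREGATE, "distant-backup")] + [(net, "local-primary") for net in halves]

# If the primary link fails, the halves are withdrawn and only the aggregate remains.
failover_table = [(AGGREGATE, "distant-backup")]

print([str(net) for net in halves])                     # ['198.18.0.0/24', '198.18.1.0/24']
print(longest_match("198.18.1.10", normal_table)[1])    # local-primary
print(longest_match("198.18.1.10", failover_table)[1])  # distant-backup
```

This is the "three routes where one would do" cost: one aggregate plus two halves in every full table.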
To expand on what others have said here, I find it helpful to think of BGP as a policy enforcement protocol, rather than as a distance vector routing protocol. To that end, there’s a generally expected hierarchy of routes, and then a lot of individuality between networks. Having done traffic engineering for some global CDNs, I’ve found there’s a bunch of inbound traffic control you can do by letting an understanding of how most other providers think about this guide your transit and peering policies, and a remaining portion that generally needs to be solved through discussions, negotiations, or commercial arrangements with the sending party or their upstreams.
For the general rules, local-preference trumps everything else. The number of AS path hops comes after local-preference. Other things being equal, networks usually like to hand off traffic to a short AS path, and at the closest point to its origination (there are valid performance reasons for this), but local-preference policies will override both of those.
Local-preferences usually have three default tiers: customer, peering, and transit. In other words, get paid, hand off for free, and pay. There are often some additional peers that can be selected for traffic engineering reasons, either internally or by customers using BGP communities. BUT, those BGP communities don’t transit to other ASes, so even if you manage to signal one hop upstream, you may still find your upstream provider announcing your routes to those who have different ideas.
One example of this from the early days of anycasted DNS root servers involved k.root-servers.net installing a node in Delhi, which pulled 60% of its traffic from North America. This was clearly non-optimal. They had attempted to get routing diversity by getting transit from different providers in different parts of the world, but their Delhi node was, if I recall correctly, a customer of a customer of a customer of Level3. Oops.
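To make that ordering concrete, here is a heavily simplified Python model of the two steps Steve describes: local-preference first, AS path length only as a tie-breaker. The tier values 300/200/100 and the ASNs (65000 as the origin, 64500 and 64501 as intermediate networks) are hypothetical, and real best-path selection has more steps (weight, origin, MED, eBGP vs. iBGP, IGP cost, router ID) that this ignores.

```python
from dataclasses import dataclass

# Hypothetical local-preference tiers: get paid > hand off for free > pay.
LOCAL_PREF = {"customer": 300, "peer": 200, "transit": 100}


@dataclass
class Route:
    name: str
    learned_from: str  # relationship with the neighbor: customer, peer or transit
    as_path: list      # AS path as received; prepends count as extra hops


def best_path(routes):
    """Highest local-pref wins; a shorter AS path only breaks ties."""
    return max(routes, key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)))


ORIGIN = 65000
short_via_transit = Route("nearby primary", "transit", [64500, ORIGIN])
long_via_customer = Route("distant backup", "customer", [64501] + [ORIGIN] * 6)
prepended_via_transit = Route("distant backup", "transit", [64501] + [ORIGIN] * 6)

# Different tiers: the prepended customer route still wins.
print(best_path([short_via_transit, long_via_customer]).name)      # distant backup
# Same tier: now the prepends matter and the short path wins.
print(best_path([short_via_transit, prepended_via_transit]).name)  # nearby primary
```

Under this model, prepending only influences networks that see both paths in the same local-preference tier. That would account for Bill's Centurylink experience if Centurylink reaches the backups through a more-preferred relationship than the one carrying the primary.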
So, what do you do about this? If you’re a global network operator, you probably attempt to maintain consistent peering/transit relationships across sites. That way, AS paths and local-preferences should be fairly even, and you can let nearest-exit routing do its thing.
If you have a smaller network, but have multiple interconnection locations that are far enough apart to make a performance difference, make the same transit and peering relationships at each one. Make exceptions only for peers (not transit providers) whose customers or services only exist in one of the areas, and make sure they don’t announce your routes to their upstreams. That way you won’t trombone traffic.
If you’ve done all that, and traffic is still coming in the wrong place, then you start talking to people. “Hey, I’m buying transit from you in both Asia and the Western US, and all my traffic from Asian-country-x is coming into San Jose. Why?” “Well, they only have a 100 Mb/s interconnection to us in Asia. We have to traffic engineer around it.” And then you have to figure out how to convince some national telco to want to talk to you more than they want to talk to your transit provider.
I think in your case, I would be asking why you have a 5,000 mile, five-prepend loop to get to a provider ten miles away. It suggests that your network is doing things 5,000 miles away that are inconsistent with what you're doing locally, or that you have upstreams who aren’t interconnecting locally or aren’t maintaining sufficient capacity or sufficient political relationships on those paths. All of those would predictably have this result.
The solution is likely to take a look at your transit relationships, ask your transit providers about their transit relationships, and either supplement or switch to a set of transit providers who can provide the routing you want.
-Steve
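A similar toy model, again in Python with made-up numbers, of why consistent relationships at every site matter. When local-preference and AS path length are equal, the sender's own internal (IGP) cost decides, so traffic leaves at the nearest exit. An inconsistent relationship at one site raises local-preference there and overrides proximity. The `Exit` fields and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exit:
    name: str
    local_pref: int
    as_path_len: int
    igp_cost: int  # sender's internal cost to reach this exit point


def chosen_exit(exits):
    """Higher local-pref, then shorter AS path, then lowest IGP cost (nearest exit)."""
    return min(exits, key=lambda e: (-e.local_pref, e.as_path_len, e.igp_cost)).name


# Same transit provider at both of your sites: only proximity differs.
consistent = [Exit("across town", 100, 2, 10), Exit("other coast", 100, 2, 500)]

# The far site is reached via a customer-of-a-customer chain, so the sender
# ranks it higher despite a longer path and a much higher internal cost.
inconsistent = [Exit("across town", 100, 2, 10), Exit("other coast", 300, 4, 500)]

print(chosen_exit(consistent))    # across town
print(chosen_exit(inconsistent))  # other coast
```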
On Jan 22, 2024, at 4:49 AM, William Herrin <bill@herrin.us> wrote:
Howdy,
Does anyone have suggestions for dealing with networks who ignore my BGP route prepends?
I have a primary ingress with no prepends and then several distant backups with multiple prepends of my own AS number. My intention, of course, is that folks take the short path to me whenever it's reachable.
A few years ago, Comcast decided it would prefer the 5000 mile, five-prepend loop to the short 10 mile path. I was able to cure that with a community telling my ISP along that path to not advertise my route to Comcast. Today it's Centurylink. Same story; they'd rather send the packets 5000 miles to the other coast and back than 10 miles across town. I know they have the correct route because when I withdraw the distant ones entirely, they see and use it. But this time it's not just one path; they prefer any other path except the one I want them to use. And Centurylink is not a peer of those ISPs, so there doesn't appear to be any community I can use to tell them not to use the route.
I hate to litter the table with a batch of more-specifics that only originate from the short, preferred link but I'm at a loss as to what else to do.
Advice would be most welcome.
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/
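Bill's earlier Comcast fix relied on an upstream offering a "do not announce to AS X" community. Here is a minimal Python sketch of how such an export filter might behave on the upstream's side, assuming a hypothetical scheme loosely modelled on the 0:<peer-asn> convention that many IXP route servers use (each provider publishes its own values, so check theirs). The ASNs 64510 and 64511 are documentation-range placeholders.

```python
# Communities are modelled as (high, low) pairs, e.g. (0, 64510) for "0:64510".
DO_NOT_ANNOUNCE = 0  # hypothetical "high" value meaning "do not announce to <low>"


def should_export(route_communities, neighbor_asn):
    """Upstream's export check: suppress the route toward neighbor_asn if tagged."""
    return (DO_NOT_ANNOUNCE, neighbor_asn) not in route_communities


# Customer tags its route so the upstream withholds it from neighbor 64510.
tags = {(DO_NOT_ANNOUNCE, 64510)}

print(should_export(tags, 64510))  # False: that direct neighbor never sees the route
print(should_export(tags, 64511))  # True: every other neighbor still gets it
```

The check only runs on the upstream's own sessions. Once 64511 passes the route on to some network the upstream does not connect to, that network applies its own policy, which is why no such community could reach Centurylink.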
You can use the ultimate BOFH BGP tool, which is to include the ASN of the network you don't want those announcements to reach in the AS path.
Let's say your ASN is 65000, and the network you don't want routing through that path is 65001.
For the path you want that network to use, announce this AS path: 65000 65000 65000 65000 65000
For the path you don't want that network to use, announce this AS path: 65000 65001 65000
Your announcements still carry your own AS as both the first AS (the one your peer expects to see) and the origin AS. But 65001's loop detection will discard the second announcement, regardless of local preference or AS path length.
Rubens
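A minimal Python sketch of why Rubens's AS path poisoning works: standard eBGP loop prevention makes a router discard any received route whose AS path already contains its own ASN. The ASNs follow his example (65000 is you, 65001 is the network being steered); 64500, standing in for a transit network between your backup link and 65001, is a hypothetical addition. Caveats: 65001's single-homed customers also lose the backup path, upstreams that filter customer announcements against an expected AS-path list may reject the poisoned path, and a router configured to accept its own ASN in paths (for example with Cisco's allowas-in) would not drop it.

```python
MY_ASN = 65000
STEERED_ASN = 65001  # the network that should not use the backup path
TRANSIT_ASN = 64500  # hypothetical network between the backup link and 65001


def accepts(receiver_asn, as_path):
    """eBGP loop prevention: reject a route whose AS path contains our own ASN."""
    return receiver_asn not in as_path


# The two announcements as Rubens describes them, seen after one transit hop.
preferred = [TRANSIT_ASN] + [MY_ASN] * 5               # 64500 65000 65000 65000 65000 65000
poisoned = [TRANSIT_ASN, MY_ASN, STEERED_ASN, MY_ASN]  # 64500 65000 65001 65000

print(accepts(STEERED_ASN, preferred))  # True
print(accepts(STEERED_ASN, poisoned))   # False: dropped regardless of local-pref
print(accepts(64511, poisoned))         # True: other networks still see the backup
```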
participants (21)
- Alex Le Heux
- Andrew Hoyos
- behrnsjeff@yahoo.com
- Chris Adams
- Darrel Lewis
- Forrest Christian (List Account)
- James Jun
- Jay Borkenhagen
- Jay R. Ashworth
- Jon Lewis
- Majdi S. Abbas
- Mel Beckman
- Nick Hilliard
- Niels Bakker
- Owen DeLong
- Patrick W. Gilmore
- Robert Raszuk
- Rubens Kuhl
- Steve Gibbard
- Tom Beecher
- William Herrin