On Tue, Jan 23, 2024 at 10:12:33PM -0800, William Herrin wrote:
Respectfully Chris, you are mistaken.
https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2
"a) Remove from consideration all routes that are not tied for having the smallest number of AS numbers present in their AS_PATH attributes."
So literally, the first thing BGP does when picking the best next hop is to discard all but the routes with the shortest AS path.
Not true. Read the whole RFC; you've omitted Sections 9.1 and 9.1.1, which are critical. Contrary to what you stated above, discarding all but the routes with the shortest AS path is _not_ literally the first thing BGP does. The first thing BGP does is calculate the degree of preference whenever it receives a new route, a withdrawn route, or a replacement route (see Section 9.1.1). The degree of preference is considered a local matter for each Autonomous System exercising route policy. It is typically expressed using LOCAL_PREF, so that the AS can apply its configured administrative policy to classify incoming routes. Only after 9.1.1 completes does Section 9.1.2 (Phase 2: Route Selection) begin, including the Section 9.1.2.2 you cited. As Section 9.1 clearly describes, route selection under 9.1.2 is invoked only after the degree of preference has been determined (the 'Phase 1' decision).

In fact, even Section 9.1.2.2, which you cited above, clearly states:

   In its Adj-RIBs-In, a BGP speaker may have several routes to the same destination that have the same degree of preference. [ snip ]

   The following tie-breaking procedure assumes that, for each candidate route, all the BGP speakers within an autonomous system can ascertain the cost of a path (interior distance) to the address depicted by the NEXT_HOP attribute of the route, and follow the same route selection algorithm.

   The tie-breaking algorithm begins by considering all equally preferable routes to the same destination, and then selects routes to be removed from consideration. The algorithm terminates as soon as only one route remains in consideration. The criteria MUST be applied in the order specified. [ snip ]

   a) Remove from consideration all routes that are not tied for having the smallest number of AS numbers present in their AS_PATH attributes. Note that when counting this number, an AS_SET counts as 1, no matter how many ASes are in the set.
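To make the ordering concrete, here is a minimal Python sketch of the two phases for routes to one destination. It assumes the degree of preference is simply LOCAL_PREF (default 100), which is a simplification: RFC 4271 leaves that calculation to local policy. It implements only tie-break rule a), counting an AS_SET as 1. NEXT_HOP resolvability, loop checks, and the later tie-breakers are omitted. The `Route` class, `as_path_length`, and `select_best` are hypothetical names for illustration, and 64500/64501 are documentation ASNs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# AS_PATH modeled as segments: ("SEQ", [asns]) for AS_SEQUENCE, ("SET", [asns]) for AS_SET.
Segment = Tuple[str, List[int]]

@dataclass
class Route:
    name: str
    local_pref: int = 100  # assumption: degree of preference == LOCAL_PREF, default 100
    as_path: List[Segment] = field(default_factory=list)

def as_path_length(route: Route) -> int:
    """Count AS numbers per RFC 4271 9.1.2.2 a): an AS_SET counts as 1."""
    return sum(1 if kind == "SET" else len(asns) for kind, asns in route.as_path)

def select_best(routes: List[Route]) -> List[Route]:
    # Degree of preference (computed in Phase 1, Section 9.1.1) decides first:
    # keep only the routes with the highest value.
    top_pref = max(r.local_pref for r in routes)
    tied = [r for r in routes if r.local_pref == top_pref]
    # Tie-break (Section 9.1.2.2 rule a) only): keep routes tied for shortest AS_PATH.
    shortest = min(as_path_length(r) for r in tied)
    return [r for r in tied if as_path_length(r) == shortest]

via_transit = Route("transit", local_pref=100, as_path=[("SEQ", [3356])])
via_customer = Route("customer", local_pref=200, as_path=[("SEQ", [64500, 64501, 3356])])
print([r.name for r in select_best([via_transit, via_customer])])  # ['customer']
```

The route with the longer AS_PATH wins here because AS_PATH length is never consulted: the degree of preference has already eliminated the other candidate.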
So you see, the AS_PATH comparison, and therefore the route selection process, can only begin after routes are first resolved by their degree of preference. That preference is typically exercised via LOCAL_PREF across the AS, or via a similar import knob such as Cisco's "weight" parameter, which is evaluated before LOCAL_PREF and is significant only to the router on which it has been configured. The route selection process, including the elimination of routes with inferior AS paths, is a tie-breaking algorithm that runs after the degree of preference has been calculated, which is what we've been trying to tell you. So no, AS_PATH comparison is not literally the first thing BGP does. You're ignoring Section 9.1.1 in its entirety, which chronologically precedes Section 9.1.2.2 (the section you cited). Section 9.1.2.2 itself also clearly specifies that the route selection process it describes, including the AS_PATH comparison, is a tie-breaking procedure.
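The Cisco weight ordering described above can be sketched as a lexicographic comparison: weight first, then LOCAL_PREF, then AS_PATH length. This is a rough Python illustration under stated assumptions. Each route is reduced to three precomputed numbers, and every later tie-breaker is left out. The `Candidate` class and `best` function are hypothetical. Weight is vendor-specific, local to one router, and not carried in BGP UPDATEs.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    name: str
    weight: int       # Cisco-style, router-local; never advertised to peers
    local_pref: int   # AS-wide degree of preference
    as_path_len: int  # already counted, with an AS_SET counting as 1

def preference_key(c: Candidate) -> Tuple[int, int, int]:
    # Higher weight wins, then higher LOCAL_PREF, then shorter AS_PATH.
    return (c.weight, c.local_pref, -c.as_path_len)

def best(candidates: List[Candidate]) -> Candidate:
    # Simplified: remaining ties fall to the first candidate encountered,
    # not to the further tie-breakers a real implementation would apply.
    return max(candidates, key=preference_key)

short_path = Candidate("short-path", weight=0, local_pref=100, as_path_len=1)
weighted = Candidate("weighted", weight=100, local_pref=100, as_path_len=4)
print(best([short_path, weighted]).name)  # weighted
```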
It also says that BGP implementations are -allowed- to use other selection criteria.
Which is immediately followed by this clause: "BGP implementations MAY use any algorithm that produces the __same results__ as those described here." It is restricted by the clause that precedes it: "The criteria MUST be applied in the order specified." And it is clarified by Section 9.1: "as long as the implementations support the described functionality and they exhibit the same externally visible behavior."
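A short Python sketch shows what "same results" means in practice. Routes are reduced to hypothetical `(local_pref, as_path_len)` tuples. Applying the criteria one at a time in order and taking a single lexicographic maximum are different algorithms with the same externally visible result. Reordering the criteria, with AS_PATH first, changes the outcome. The function names are illustrative only.

```python
import random
from typing import List, Tuple

Route = Tuple[int, int]  # hypothetical simplified route: (local_pref, as_path_len)

def sequential(routes: List[Route]) -> List[Route]:
    # Apply criteria one at a time, in the specified order, removing losers.
    top = max(lp for lp, _ in routes)
    stage1 = [r for r in routes if r[0] == top]
    shortest = min(n for _, n in stage1)
    return [r for r in stage1 if r[1] == shortest]

def lexicographic(routes: List[Route]) -> List[Route]:
    # A different algorithm: one combined key, same externally visible result.
    best_key = max((lp, -n) for lp, n in routes)
    return [r for r in routes if (r[0], -r[1]) == best_key]

def as_path_first(routes: List[Route]) -> List[Route]:
    # Violates "MUST be applied in the order specified": AS_PATH before preference.
    shortest = min(n for _, n in routes)
    stage1 = [r for r in routes if r[1] == shortest]
    top = max(lp for lp, _ in stage1)
    return [r for r in stage1 if r[0] == top]

rng = random.Random(0)
for _ in range(1000):
    routes = [(rng.choice([80, 100, 200]), rng.randint(1, 6)) for _ in range(rng.randint(1, 5))]
    assert sorted(sequential(routes)) == sorted(lexicographic(routes))

example = [(100, 1), (200, 3)]
print(sequential(example))     # [(200, 3)]
print(as_path_first(example))  # [(100, 1)]
```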
And there are many situations where doing so is well advised and improves the result. But AS path length is unambiguously the default, off which a user has to move it.
So when a vendor writes a BGP implementation into its router software, how does it know whether your network will need to apply a lot of degrees of preference, or none? The vendor has no idea, and the RFC makes clear that the degree of preference is a local policy matter. Therefore, the default behavior is to assume a uniform LOCAL_PREF until a policy is configured, and that default has typically been 100 across many vendor implementations. In this case, since all routes have the same degree of preference of 100, the Section 9.1.2.2 procedure you cited tie-breaks the routes of equal preference, starting with the AS_PATH comparison. But that is by no means the first thing BGP does. As the RFC clearly specifies, the first thing BGP does is determine the degree of preference to meet local routing policy.

The degree of preference differs greatly depending on what type of network you run. Consider an edge consumer ASN, such as a multi-homed stub enterprise running BGP, that provides no downstream IP transit to other BGP customers and doesn't peer with other networks (at an IX or otherwise). That network probably has little need to apply administrative policy to determine a degree of preference, and you can be happy fiddling with just AS_PATH. But if you run a network that provides transit to other ASNs and peers with other networks, applying administrative policy suddenly becomes not only desirable but operationally required. This isn't solely a revenue/greed problem, as some have cynically stated. It's also a critical service availability and reliability issue. Without a degree of preference set according to established routing policy, an IP network loses any ability to engineer traffic predictably enough to meet capacity-planning objectives for its interconnections.
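The difference between the vendor default and a configured policy can be sketched in a few lines of Python. The relationship-based values (customer 200, peer 150, transit 100) are an assumed, illustrative import policy, not anything the RFC or a vendor mandates. `RELATIONSHIP_PREF`, `degree_of_preference`, and `best_route` are hypothetical names. With no policy, every route gets the default of 100 and the shortest AS_PATH wins. With the policy applied, the customer route wins despite its longer path.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_LOCAL_PREF = 100  # common vendor default when no policy is configured

# Hypothetical import policy keyed by neighbor relationship; values are illustrative.
RELATIONSHIP_PREF: Dict[str, int] = {"customer": 200, "peer": 150, "transit": 100}

# (neighbor relationship, AS_PATH length) for routes to the same prefix
Route = Tuple[str, int]

def degree_of_preference(route: Route, policy: Optional[Dict[str, int]]) -> int:
    if policy is None:
        return DEFAULT_LOCAL_PREF
    return policy.get(route[0], DEFAULT_LOCAL_PREF)

def best_route(routes: List[Route], policy: Optional[Dict[str, int]]) -> Route:
    # Highest degree of preference first; shortest AS_PATH only breaks ties.
    return max(routes, key=lambda r: (degree_of_preference(r, policy), -r[1]))

routes = [("transit", 2), ("peer", 3), ("customer", 4)]
print(best_route(routes, policy=None))               # ('transit', 2)
print(best_route(routes, policy=RELATIONSHIP_PREF))  # ('customer', 4)
```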
Are there exceptions and pitfalls, where poorly designed or poorly thought-out networks suffer in certain routing situations? Absolutely. But that's the Internet: it's not perfect, but it works very well most of the time for most situations. Your desired 'policy-free, AS_PATH-only' world may solve your particular complaint, but it would absolutely break the rest of the Internet. It would leave no effective way to implement routing policy for the large-scale network interconnections that make the Internet tick. BGP exists to provide anchors for applying routing policy to the path selection process at scale. Parsing the RFC incorrectly and out of context to fit your desired narrative does not make AS_PATH the first thing, or the only thing, that matters in BGP. In operational reality, backed by history and by the RFCs themselves, the single most important and influential knob in BGP is arguably LOCAL_PREF, more so than AS_PATH. Sadly, most people won't experience this until they've run a large IP network, or dealt with the operational realities of managing one.

The problem you're complaining about is an exception. It is primarily caused by your poor choice of IP transit provider at the data center where you run AS11875, and you're demanding that everyone else take responsibility for the purchasing decision you made. There are some good proposals that could improve on this, such as commonly accepted wide communities for frequently encountered traffic-engineering scenarios, and make BGP a better experience for end users in situations like yours. But we're not quite there today, and understandably it won't be a quick process. In the immediate short term, I'm glad to hear that your route pollution announcement solved the issue for you. In the medium term, you should get a new transit provider for AS11875 with better connectivity into 3356.
In the long term, perhaps commonly accepted wide communities could become a standard someday and provide better knobs for situations like this.

James