BGP Path Attribute Filtering, YES or NO?

I would like to gather the wider community's current views on BGP path attribute filtering (specifically discarding selected attributes, not treat-as-withdraw) as an addition to the long list of standard conditioning tools. That list ranges from a maximum AS_PATH length limit and a limit on the number of communities all the way to running a separate iBGP infrastructure for Internet prefixes, apart from the one carrying customers' L3/L2VPN prefixes.

I appreciate the topic is somewhat contentious and there's no simple yes or no answer.

My view is that in a stub AS there should be no harm in discarding unused BGP path attributes.

On transit ASes I'd expect two opposing views:

One might be: "I have a business to run and don't care about some university experiments, so unless one of my customers specifically asks for some attribute, I'll drop all reserved, unassigned and deprecated ones, and might even drop some not widely used ones just to stay on the well-trodden, bug-free path."

The other might be: "This experimental work is of great value to the community, and there's now a process to announce and manage these experiments. What about net neutrality? Besides, modern BGP implementations should handle well-formatted attributes, and if that's not the case, it's good that these flaws are being exposed and fixed."

Please let me know your thoughts.

adam
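A minimal Python sketch of the import conditioning being discussed, assuming a simplified in-memory route model. The `Route` class, `condition()` and the limit values are hypothetical, not any vendor's configuration. Routes over an AS_PATH length or community-count limit are rejected outright, while selected optional attributes are silently dropped and the route kept. That second action is "attribute discard" rather than "treat-as-withdraw" (both terms come from RFC 7606 error handling; here they are applied as policy).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy knobs; real limits are an operator's choice.
MAX_AS_PATH_LEN = 50
MAX_COMMUNITIES = 100
# Hypothetical set of optional attribute type codes to discard (not withdraw).
DISCARD_TYPE_CODES = {255}


@dataclass
class Route:
    prefix: str
    as_path: list[int]  # flattened; AS_SET counting rules are ignored here
    communities: list[tuple[int, int]] = field(default_factory=list)
    # Other optional path attributes: type code -> raw value bytes.
    other_attrs: dict[int, bytes] = field(default_factory=dict)


def condition(route: Route) -> Optional[Route]:
    """Return a conditioned copy of the route, or None to reject it."""
    if len(route.as_path) > MAX_AS_PATH_LEN:
        return None  # reject the whole route
    if len(route.communities) > MAX_COMMUNITIES:
        return None
    # Attribute discard: drop the selected attributes but keep the route.
    kept = {t: v for t, v in route.other_attrs.items() if t not in DISCARD_TYPE_CODES}
    return Route(route.prefix, list(route.as_path), list(route.communities), kept)


r = Route("192.0.2.0/24", [64500, 64501], [(64500, 100)], {128: b"\x00", 255: b"\x01"})
print(condition(r).other_attrs)  # {128: b'\x00'}
```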

On 8/Jan/20 14:44, adamv0025@netconsultings.com wrote:
> I would like to gather the wider community's current views on BGP path attribute filtering (specifically discarding selected attributes, not treat-as-withdraw) as an addition to the long list of standard conditioning tools. That list ranges from a maximum AS_PATH length limit and a limit on the number of communities all the way to running a separate iBGP infrastructure for Internet prefixes, apart from the one carrying customers' L3/L2VPN prefixes.
> I appreciate the topic is somewhat contentious and there's no simple yes or no answer.
> My view is that in a stub AS there should be no harm in discarding unused BGP path attributes.
> On transit ASes I'd expect two opposing views:
> One might be: "I have a business to run and don't care about some university experiments, so unless one of my customers specifically asks for some attribute, I'll drop all reserved, unassigned and deprecated ones, and might even drop some not widely used ones just to stay on the well-trodden, bug-free path."
> The other might be: "This experimental work is of great value to the community, and there's now a process to announce and manage these experiments. What about net neutrality? Besides, modern BGP implementations should handle well-formatted attributes, and if that's not the case, it's good that these flaws are being exposed and fixed."
> Please let me know your thoughts.
From our side, on peering links, we rewrite all MEDs to 0, scrub all communities, and replace them with our own.

On customer links, we rewrite MED to 0. While we don't scrub our customers' own communities, we do ensure they cannot use our internal communities beyond the ones we've authorized them to use.

Mark.
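The policy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Mark's actual configuration: `MY_ASN`, the tag added to peer-learned routes, and the set of own-ASN communities customers may send are all hypothetical, and only standard (16-bit ASN:value) communities are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional

MY_ASN = 64500  # hypothetical stand-in for the operator's own ASN
PEER_TAG = (MY_ASN, 1000)  # hypothetical "learned from peer" community
# Hypothetical own-ASN communities customers are authorized to send.
CUSTOMER_ALLOWED = {(MY_ASN, 70), (MY_ASN, 80), (MY_ASN, 90)}


@dataclass(frozen=True)
class Route:
    prefix: str
    med: Optional[int]
    communities: tuple[tuple[int, int], ...] = ()  # (ASN, value) pairs


def import_from_peer(route: Route) -> Route:
    # Peering: reset MED and replace every community with our own tag.
    return replace(route, med=0, communities=(PEER_TAG,))


def import_from_customer(route: Route) -> Route:
    # Customer: reset MED; leave foreign communities alone, but keep
    # communities carrying our ASN only if customers may use them.
    kept = tuple(c for c in route.communities if c[0] != MY_ASN or c in CUSTOMER_ALLOWED)
    return replace(route, med=0, communities=kept)


r = Route("198.51.100.0/24", 120, ((64511, 1), (64500, 70), (64500, 666)))
print(import_from_peer(r).communities)      # ((64500, 1000),)
print(import_from_customer(r).communities)  # ((64511, 1), (64500, 70))
```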

On Wed, 8 Jan 2020 at 15:09, Mark Tinka <mark.tinka@seacom.mu> wrote:
> From our side, on peering links, we rewrite all MEDs to 0, scrub all communities, and replace them with our own.

If you rewrite MED, you SHOULD also rewrite ORIGIN (which the RFC prohibits, incorrectly). I can understand the rationale for rewriting MED: you don't want to cold-potato, which is fair and certainly cannot be argued to be objectively wrong, since there may be a market where people will abuse your cold-potato routing to save on their own infrastructure costs. But if you rewrite MED and not ORIGIN, you're not really accomplishing anything.

--
++ytti

On Wed, 8 Jan 2020 at 15:24, Mark Tinka <mark.tinka@seacom.mu> wrote:
> Hmm, now I'm curious... please explain why rewriting MED but not ORIGIN doesn't help.

If you reset MED in an effort to stop me from transferring my infrastructure costs to your network, I can still set ORIGIN and force cold-potato routing in your network.

--
++ytti
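The point follows from the order of the BGP decision process: ORIGIN (IGP preferred over EGP, EGP over INCOMPLETE) is compared before MED and before the IGP cost to the next hop. A simplified Python sketch, assuming both paths come from the same neighboring AS at two exits; `Path`, `best()` and the exit names are hypothetical, and the eBGP-vs-iBGP step and later tie-breakers are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

IGP, EGP, INCOMPLETE = 0, 1, 2  # ORIGIN codes; lower is preferred


@dataclass
class Path:
    exit: str
    local_pref: int
    as_path_len: int
    origin: int
    med: int
    igp_cost: int  # our IGP cost to this path's BGP next hop


def best(paths: list[Path]) -> Path:
    # Simplified order: highest LOCAL_PREF, shortest AS_PATH, lowest ORIGIN,
    # lowest MED, lowest IGP cost. Assumes one neighboring AS, so MED is
    # comparable; eBGP-vs-iBGP and later tie-breakers are omitted.
    return min(paths, key=lambda p: (-p.local_pref, p.as_path_len, p.origin, p.med, p.igp_cost))


# The peer wants us to carry its traffic to the far exit ("us-east").
paths = [
    Path("eu-west", 100, 2, IGP, med=0, igp_cost=10),
    Path("us-east", 100, 2, IGP, med=0, igp_cost=100),
]
print(best(paths).exit)  # eu-west: MEDs reset to 0, IGP cost decides (hot potato)

paths[0].origin = INCOMPLETE  # the peer marks the near exit INCOMPLETE
print(best(paths).exit)  # us-east: ORIGIN is compared before MED and IGP cost
```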

On 8/Jan/20 15:49, Saku Ytti wrote:
> If you reset MED in an effort to stop me from transferring my infrastructure costs to your network, I can still set ORIGIN and force cold-potato routing in your network.

Okay, I see how this could be abused in a scenario where you have multiple peering locations with a single network.

Looking at our own network specifically, I'm not immediately seeing a risk of this (I'll look a little deeper in the next couple of hours), as the only region where we may peer with the same network in multiple locations across a vast geographic area is Europe: specifically AMS-IX, DE-CIX, ECIX, France-IX, LINX and NL-IX. Considering the scope of our European backbone, I don't see any obvious benefit to a multi-homed peer, given the rather contained latency between these cities and the average cost of capacity on the continent.

We specifically refuse to connect to any one upstream in multiple transit locations, partly because of this and also because we already have a transit complement that works well for us. So each of our transit providers is connected in a single location.

The other region where we peer with the same provider in multiple locations is South Africa (Johannesburg, Cape Town, Durban). Since we have a selective peering policy, we can manage any issues that crop up here, and they would surface very quickly and noticeably, considering South Africa has two major exit points: west via Cape Town and east via KwaZulu-Natal. I could see a peer deciding to cold-potato us between the three major cities over land, but there are "social" reasons why this is unlikely (buy me a beer, hehe), apart from it being quickly noticed by our NOC and customers (and by theirs).

I can see how one transit provider could use the ORIGIN attribute to force my network to send more traffic toward them vs. my other transit providers. However, that would require one or more of the other transit providers to set their ORIGIN code to EGP or INCOMPLETE, so that the default of IGP works toward their objective. Otherwise, setting anything other than IGP, when the rest leave it at the default, actually increases their chances of losing my traffic.

As for customers trying this, I'm not sure I really care, since they pay us for any and all ports they have with us if they're multi-homed to us.

I'm still debating whether going back and retrofitting the network with an explicit "set origin igp" is worth it. For the moment, not yet, in our specific case (it might be different if we had a large North American network), but I'll keep chewing on it.

Mark.
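The reasoning about ORIGIN among several transit providers, and the effect of an explicit "set origin igp" on import, can be sketched like this. The model is hypothetical: it assumes LOCAL_PREF and AS_PATH length are already equal and MED has been reset to 0, leaving only ORIGIN and IGP cost as tie-breakers. `Path`, `best()`, `set_origin_igp()` and the provider names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is preferred


@dataclass(frozen=True)
class Path:
    provider: str
    origin: str
    igp_cost: int  # our IGP cost to this provider's exit


def best(paths: list[Path]) -> Path:
    # Only ORIGIN and IGP cost are modeled; LOCAL_PREF and AS_PATH length are
    # assumed equal and MED already reset to 0.
    return min(paths, key=lambda p: (ORIGIN_RANK[p.origin], p.igp_cost))


def set_origin_igp(path: Path) -> Path:
    # Models an import policy that normalizes ORIGIN ("set origin igp").
    return replace(path, origin="igp")


# T3 is the farthest exit, but T1 and T2 send INCOMPLETE while T3 sends IGP.
paths = [Path("T1", "incomplete", 10), Path("T2", "incomplete", 20), Path("T3", "igp", 30)]
print(best(paths).provider)                               # T3
print(best([set_origin_igp(p) for p in paths]).provider)  # T1
```

The last case in the test below mirrors the "otherwise" point: a provider that sends anything other than IGP while the others leave the default only makes itself less preferred.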

On Wed, Jan 08, 2020 at 03:06:45PM +0200, Mark Tinka wrote:
> From our side, on peering links, we rewrite all MEDs to 0, scrub all communities, and replace them with our own.
> On customer links, we rewrite MED to 0.

[ snip ]

I get that you'd want to reset MED on peering sessions, but is there any particular rationale for rewriting MED to 0 on customer sessions?

I would argue that giving customers the ability to transfer backhaul costs onto their transit provider is one of the compelling commercial reasons *for* IP transit vs. other modes of IP interconnection.

Conversely, I would also argue that a transit provider *should* forward meaningful MED values on its route advertisements to customers. If a customer wants to cold-potato his outbound traffic on his own network, that's entirely his call; he has the option of rewriting MED to 0 if he wants the closest exit to his transit instead.

Most transit providers (at least in the US; I can't imagine it's much different in the EU) will permit downstream customers to cold-potato traffic through their network.

James
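A sketch of the MED behaviour described above, assuming a customer with two ports on its transit provider. It follows the RFC 4271 rule that MED is only compared between paths from the same neighboring AS (no "always-compare-med"). The names, costs and MED values are hypothetical, and all other attributes are assumed equal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    neighbor_as: int
    exit: str
    med: int
    igp_cost: int  # cost from the ingress router to this exit


def best(paths: list[Path], honor_med: bool = True) -> Path:
    """Choose an exit among paths that tie on LOCAL_PREF, AS_PATH and ORIGIN."""
    candidates = list(paths)
    if honor_med:
        # Drop a path if another path from the SAME neighbor AS has a lower MED.
        candidates = [
            p for p in candidates
            if p.med == min(q.med for q in candidates if q.neighbor_as == p.neighbor_as)
        ]
    # Remaining tie-break: closest exit (hot potato).
    return min(candidates, key=lambda p: p.igp_cost)


# Customer AS 64510 has ports in "nyc" and "lax"; the destination sits near
# "lax", so the customer sends a lower MED there.
customer = [Path(64510, "nyc", med=100, igp_cost=5), Path(64510, "lax", med=10, igp_cost=80)]
print(best(customer).exit)                   # lax: provider carries it (cold potato)
print(best(customer, honor_med=False).exit)  # nyc: MED rewritten to 0, hot potato
```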

On 8/Jan/20 16:26, James Jun wrote:
> I get that you'd want to reset MED on peering sessions, but is there any particular rationale for rewriting MED to 0 on customer sessions?
> I would argue that giving customers the ability to transfer backhaul costs onto their transit provider is one of the compelling commercial reasons *for* IP transit vs. other modes of IP interconnection.
> Conversely, I would also argue that a transit provider *should* forward meaningful MED values on its route advertisements to customers. If a customer wants to cold-potato his outbound traffic on his own network, that's entirely his call; he has the option of rewriting MED to 0 if he wants the closest exit to his transit instead.
> Most transit providers (at least in the US; I can't imagine it's much different in the EU) will permit downstream customers to cold-potato traffic through their network.

We provide customers with a ton of LOCAL_PREF options they can activate in our network via communities:

http://as37100.net/?bgp

As I mentioned to Saku re: the ORIGIN attribute, I don't mind customers using this on us, since we have sufficient backbone capacity in all markets and they pay us to provide them with a port in each market. So if customers want to change our LOCAL_PREF values to push traffic one way or another, we are okay with that, since it's $$.

Mark.
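Community-driven LOCAL_PREF of this kind (RFC 1998 style) boils down to a lookup from a customer-set community to a LOCAL_PREF value. The values below are hypothetical placeholders, not the ones published at the URL above, and "lowest requested value wins" is an assumption of this sketch rather than any provider's documented behaviour.

```python
MY_ASN = 64500  # hypothetical stand-in for the provider's ASN

# Hypothetical customer-settable communities -> LOCAL_PREF values.
LOCAL_PREF_COMMUNITIES = {
    (MY_ASN, 70): 70,  # e.g. route of last resort
    (MY_ASN, 80): 80,  # e.g. backup
    (MY_ASN, 90): 90,  # e.g. equal to (hypothetical) peer-learned LOCAL_PREF
}
DEFAULT_CUSTOMER_LOCAL_PREF = 100


def local_pref_for(communities):
    """LOCAL_PREF for a customer route given its (ASN, value) communities."""
    requested = [LOCAL_PREF_COMMUNITIES[c] for c in communities if c in LOCAL_PREF_COMMUNITIES]
    # Assumption of this sketch: if several are set, the lowest one wins.
    return min(requested, default=DEFAULT_CUSTOMER_LOCAL_PREF)


print(local_pref_for([]))                         # 100
print(local_pref_for([(64500, 80), (64511, 5)]))  # 80
```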

On Wed, Jan 08, 2020 at 04:36:29PM +0200, Mark Tinka wrote:
> We provide customers with a ton of LOCAL_PREF options they can activate in our network via communities:
> As I mentioned to Saku re: the ORIGIN attribute, I don't mind customers using this on us, since we have sufficient backbone capacity in all markets and they pay us to provide them with a port in each market. So if customers want to change our LOCAL_PREF values to push traffic one way or another, we are okay with that, since it's $$.

I see. LOCAL_PREF and RFC 1998-style communities, however, are not the right tool for signalling exit locations: it does not scale. Sure, it's a useful hammer to hard-enforce a baseline preference on a given route (e.g. route of last resort, backup, or equalizing to the same baseline level as peer-learned routes), but for signalling optimal exit locations at scale, MED is exactly the right tool for the job (and networks would typically derive MED values from IGP metrics).

I'm not concerned about the ORIGIN attribute, as that's abuse of interconnection, so a slightly different situation. But denying customers who have ports at multiple locations the ability to use MED isn't ideal.

James
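Deriving MED from IGP metrics can be sketched as follows: each border router advertises the prefix with a MED equal to its own IGP cost to where the prefix lives, so the values track the topology without any per-prefix policy. The topology, site names and metrics are hypothetical, the IGP is modeled with a plain Dijkstra over a symmetric metric graph, and `igp_costs()` / `med_per_border()` are illustrative helpers, not a router feature.

```python
import heapq


def igp_costs(graph, source):
    """Dijkstra over an IGP topology given as {node: {neighbor: metric}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in graph.get(node, {}).items():
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def med_per_border(graph, borders, prefix_node):
    """MED each border router would advertise: its IGP cost to prefix_node."""
    # Raises KeyError if prefix_node is unreachable from a border router.
    return {b: igp_costs(graph, b)[prefix_node] for b in borders}


# Hypothetical symmetric topology with metrics in both directions.
topo = {
    "lon": {"ams": 10},
    "ams": {"lon": 10, "fra": 8},
    "fra": {"ams": 8, "jnb": 150},
    "jnb": {"fra": 150},
}
print(med_per_border(topo, ["lon", "fra"], "jnb"))  # {'lon': 168, 'fra': 150}
```

A customer honoring these MEDs would hand traffic for the prefix to the "fra" port, the one closer to the destination, without either side maintaining per-exit communities.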

On 8/Jan/20 16:52, James Jun wrote:
> I see. LOCAL_PREF and RFC 1998-style communities, however, are not the right tool for signalling exit locations: it does not scale. Sure, it's a useful hammer to hard-enforce a baseline preference on a given route (e.g. route of last resort, backup, or equalizing to the same baseline level as peer-learned routes), but for signalling optimal exit locations at scale, MED is exactly the right tool for the job (and networks would typically derive MED values from IGP metrics).

Two solutions, two methods, same result, IMHO. It's been scaling very well for us, and it offers customers explicit control that comes with a flip-switch cover over the, well, switch :-). If you know of any reason why LOCAL_PREF doesn't scale, I'd like to hear it, since I'd imagine that if closely maintaining exit paths is important to you, you don't want to leave it to chance anyway.

Mark.

On Wed, 8 Jan 2020 at 14:46, <adamv0025@netconsultings.com> wrote:
> The other might be: "This experimental work is of great value to the community, and there's now a process to announce and manage these experiments. What about net neutrality? Besides, modern BGP implementations should handle well-formatted attributes, and if that's not the case, it's good that these flaws are being exposed and fixed."

This is my position. Unfortunately, it's a pipe dream, as it only takes a very few deciding that filtering is needed to ruin the utility. Some specific examples:

- don't clean up communities which don't belong to you (
- don't clean up the TOS byte (I may want to communicate QoS over the Internet between my islands)
- don't clean up BGP attributes (128 would have utility if it transited, but due to old issues, it often does not)
- don't drop ICMP (ICMP TS would be of high utility if not filtered)

I think we need a specific good reason to mangle/filter, and if you cannot come up with one, don't do it. If you can come up with one, consider whether it's persistent or a workaround to deal with a specific active defect.

--
++ytti

From: Saku Ytti <saku@ytti.fi> Sent: Wednesday, January 8, 2020 1:09 PM
> On Wed, 8 Jan 2020 at 14:46, <adamv0025@netconsultings.com> wrote:
>> The other might be: "This experimental work is of great value to the community, and there's now a process to announce and manage these experiments. What about net neutrality? Besides, modern BGP implementations should handle well-formatted attributes, and if that's not the case, it's good that these flaws are being exposed and fixed."
> This is my position. Unfortunately, it's a pipe dream, as it only takes a very few deciding that filtering is needed to ruin the utility.

In an ideal world that would be my position too, but I suppose it depends on the context. Imagine:

CTO: Could you have prevented this major financial and market loss, and the damage to our reputation, resulting from this major network outage? Is this something that had never happened before and couldn't be foreseen?
Me: Nah, it has happened before, and sure, I could have simply dropped the offending BGP attribute 254 in this case, since it's not used anyway.
CTO: What the ..., why haven't you done so then?!?!?
Me: Well, because "this experimental work is of great value to the community, and there's now a process to announce and manage these experiments. What about net neutrality? Besides, modern BGP implementations should handle well-formatted attributes, and if that's not the case, it's good that these flaws are being exposed and fixed."
CTO: You mean exposed like this? Like breaking my network?!?!? Get the ... out of here, you're fired!!!!
> Some specific examples:
> - don't clean up communities which don't belong to you (

Agreed; whatever you do, only condition communities with your AS# on them.

> - don't clean up the TOS byte (I may want to communicate QoS over the Internet between my islands)

Agreed. I'll dump it all into the best-effort or scavenger class in my MPLS backbone anyway, along with all the SD-WAN super-high-priority stuff... That falls under "do not touch transit traffic unless under DoS".

> - don't clean up BGP attributes (128 would have utility if it transited, but due to old issues, it often does not)

Looking at https://www.iana.org/assignments/bgp-parameters/bgp-parameters.xhtml (I don't know how well maintained it actually is), I see 128 assigned to ATTR_SET [RFC6368], so if I filtered only the Unassigned, Deprecated and Reserved codes from that list, that shouldn't do any harm, right?

> - don't drop ICMP (ICMP TS would be of high utility if not filtered)

So obvious that you shouldn't even need to mention it, but again it falls under "do not touch transit traffic unless under DoS". Traffic (ICMP included) destined to your infrastructure, well, that's subject to the iACL policies.
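Filtering "only Unassigned, Deprecated and Reserved" codes amounts to a lookup against the IANA registry. A Python sketch, assuming the registry has been reduced to a `{type_code: status}` dict. `REGISTRY_EXCERPT` is a small hand-picked illustrative subset, not a copy of the registry, and any code missing from it is treated as unassigned; `should_discard()` and `filter_attrs()` are hypothetical helpers.

```python
DROP_STATUSES = {"unassigned", "deprecated", "reserved"}

# Illustrative subset only, NOT a copy of the IANA registry; refresh from
# the registry before relying on any of it.
REGISTRY_EXCERPT = {
    0: "reserved",
    1: "assigned",    # ORIGIN
    2: "assigned",    # AS_PATH
    3: "assigned",    # NEXT_HOP
    4: "assigned",    # MULTI_EXIT_DISC
    5: "assigned",    # LOCAL_PREF
    8: "assigned",    # COMMUNITIES
    128: "assigned",  # ATTR_SET [RFC6368]
    255: "reserved",
}


def should_discard(type_code, registry):
    # Codes missing from the registry dict are treated as unassigned.
    return registry.get(type_code, "unassigned") in DROP_STATUSES


def filter_attrs(attrs, registry):
    """Drop attributes whose registry status is in DROP_STATUSES."""
    return {t: v for t, v in attrs.items() if not should_discard(t, registry)}


attrs = {1: b"\x00", 128: b"\x01", 200: b"\x02", 255: b"\x03"}
print(sorted(filter_attrs(attrs, REGISTRY_EXCERPT)))  # [1, 128]
```

Note that with this approach an experiment using a code that is later assigned only passes once the local copy of the registry is refreshed.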
> I think we need a specific good reason to mangle/filter, and if you cannot come up with one, don't do it. If you can come up with one, consider whether it's persistent or a workaround to deal with a specific active defect.

Well, BGP-attribute-induced outages have precedent, and they had quite a positive fallout in terms of enhanced BGP error handling, etc. My pipe dream is to have the time to shoot random stuff at BGP, see what happens, and then report my findings back to vendors... But no, this is not part of our software certification test suite.

adam
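For the "shoot random stuff at BGP" idea, the building block is encoding arbitrary path attributes in the RFC 4271 (section 4.3) wire format: a flags octet (Optional 0x80, Transitive 0x40, Partial 0x20, Extended Length 0x10), a type-code octet, then a 1-octet length, or a 2-octet length when Extended Length is set. A minimal Python sketch for isolated lab use only; `encode_attr()` and `random_optional_transitive()` are hypothetical helpers, and wrapping the attributes into a full UPDATE message and BGP session is out of scope.

```python
import random
import struct

OPTIONAL, TRANSITIVE, PARTIAL, EXTENDED_LENGTH = 0x80, 0x40, 0x20, 0x10


def encode_attr(flags, type_code, value):
    """Encode one BGP path attribute: flags, type code, length, value."""
    if len(value) > 0xFFFF:
        raise ValueError("attribute value too long")
    if len(value) > 0xFF:
        flags |= EXTENDED_LENGTH  # length needs two octets
    if flags & EXTENDED_LENGTH:
        return struct.pack("!BBH", flags, type_code, len(value)) + value
    return struct.pack("!BBB", flags, type_code, len(value)) + value


def random_optional_transitive(rng, type_codes):
    """A well-formed but arbitrary optional transitive attribute (lab use only)."""
    type_code = rng.choice(type_codes)
    value = bytes(rng.randrange(256) for _ in range(rng.randrange(300)))
    return encode_attr(OPTIONAL | TRANSITIVE, type_code, value)


rng = random.Random(2020)  # seeded so a failing case can be replayed
print(encode_attr(OPTIONAL | TRANSITIVE, 200, b"\x01\x02").hex())  # c0c8020102
attr = random_optional_transitive(rng, [200, 201])
```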
Participants (4): adamv0025@netconsultings.com, James Jun, Mark Tinka, Saku Ytti