Hi, I'm looking for information about the way networks use communities in BGP. It seems that many of the larger networks only use communities to supply their customers with a mechanism to adjust the local preference to indicate which connection is preferred when a customer connects over more than one link (something that can also be done with the MED). How many networks are there that use communities to indicate where (which interconnect point) a route was learned? And how many networks use this information if their upstream provides it? And how about things like congestion? Is there any need for more "well known" communities? TIA, Iljitsch van Beijnum
Date: Mon, 15 Oct 2001 11:55:27 +0200 (CEST) From: Iljitsch van Beijnum <iljitsch@muada.com>
I'm looking for information about the way networks use communities in BGP.
It seems that many of the larger networks only use communities to supply their customers with a mechanism to adjust the local preference to indicate which connection is preferred when a customer connects over more than one link (something that can also be done with the MED).
Remember that local-pref has higher priority than as-path length; MED is the lowest priority before router ID.

For instance, I match "_asnthatIdontlike_" and penalize local-pref to [try to] avoid routing traffic over an ASN that I think has poor performance. If I penalize AS65000, then

    me 3549 65500 65432 65432 65432 65123

will be preferred over

    me 6347 65000 65123

This is one reason that redistributing one's upstream routes via BGP can be bad despite as-path length: If someone uses local-pref, it's quite conceivable that one will take the erroneous path that some edge idiot[1] leaked into the table.

[1] I'm an edge-dweller. I can insult them. Note, however, that upstreams _should_ filter their downstreams to prevent improper adverts... but the root of the problem is the one at the edge.

[ snip ]
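The ordering described above (local-pref first, then as-path length, then MED) can be illustrated with a small sketch. This is a heavily simplified model, not a real BGP implementation: the `Route` class, the default local-pref of 100, and the penalty of 20 are assumptions made for illustration, and real routers apply more tie-breakers and compare MED only under certain conditions.

```python
from dataclasses import dataclass


@dataclass
class Route:
    as_path: list          # AS numbers, nearest neighbor first
    local_pref: int = 100  # assumed default, chosen for illustration
    med: int = 0


def penalize(route, disliked_asn, penalty=20):
    """Lower local-pref when the disliked ASN appears anywhere in the path."""
    if disliked_asn in route.as_path:
        return Route(route.as_path, route.local_pref - penalty, route.med)
    return route


def best(routes):
    """Highest local-pref wins, then shortest AS path, then lowest MED."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


# The two paths from the example above (without the leading "me").
long_path = Route([3549, 65500, 65432, 65432, 65432, 65123])
short_path = Route([6347, 65000, 65123])

# Without any policy, the shorter AS path is selected.
unpenalized = best([long_path, short_path])

# After penalizing AS65000, the longer path is selected instead.
penalized = best([penalize(r, 65000) for r in (long_path, short_path)])
print(unpenalized.as_path)
print(penalized.as_path)
```

The point of the sketch is only that a local-pref change is evaluated before path length, so a six-hop path can beat a three-hop one.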
And how about things like congestion?
How do you mean?
Is there any need for more "well known" communities?
I wish that providers would set a community indicating route ingress. I know, for instance, that GBLX does this... but their system with hundreds of communities leaves some to be desired, IMHO. I'd like to see providers tag "route learned in this region" at various granularity levels.

As for providers listening to communities, I like selective as-path padding... I'd have to dig up the thread, but this has been discussed in the past few months.

Eddy
---------------------------------------------------------------------------
Brotsman & Dreger, Inc. - EverQuick Internet Division
Phone: +1 (316) 794-8922 Wichita/(Inter)national
Phone: +1 (785) 865-5885 Lawrence
---------------------------------------------------------------------------
Date: Mon, 21 May 2001 11:23:58 +0000 (GMT)
From: A Trap <blacklist@brics.com>
To: blacklist@brics.com
Subject: Please ignore this portion of my mail signature.

These last few lines are a trap for address-harvesting spambots. Do NOT send mail to <blacklist@brics.com>, or you are likely to be blocked.
First, thanks to everyone who replied. On Mon, 15 Oct 2001, E.B. Dreger wrote:
For instance, I match "_asnthatIdontlike_" and penalize local-pref to [try to] avoid routing traffic over an ASN that I think has poor performance. If I penalize AS65000, then
me 3549 65500 65432 65432 65432 65123
will be preferred over
me 6347 65000 65123
This is one reason that redistributing one's upstream routes via BGP can be bad despite as-path length: If someone uses local-pref, it's quite conceivable that one will take the erroneous path that some edge idiot[1] leaked into the table.
I don't understand what you mean. Redistributing upstream routes where/into what? How can this be "despite" as-path length?
And how about things like congestion?
How do you mean?
Well, let me provide a real-world example. It's not really congestion, but close enough for these purposes.

When Telehouse had problems in Manhattan after the attacks on September 11, one of our transit networks issued a warning that they might lose lots of routes if the power went down (which seemed likely at the time). Since they use ATM, the BGP session to the router at the other side of the connection would have to time out if this happened, creating a temporary black hole. For us, the lost routes wouldn't be a problem, since we multihome. But a black hole is a multihomer's worst nightmare.

So a community that indicates "you don't want to use this route unless you absolutely have to--trust us" would have been very welcome. Such a community would be especially useful in the face of congestion: multihomers can route around the congested area and since some traffic is rerouted the congestion would be less for the traffic that remains.
Is there any need for more "well known" communities?
I wish that providers would set a community indicating route ingress. I know, for instance, that GBLX does this... but their system with hundreds of communities leaves some to be desired, IMHO.
I'd like to see providers tag "route learned in this region" at various granularity levels.
I would love to be able to see where a route originated and how bad the detour getting here was. But is it worth the trouble to try to "standardize" communities for this? Iljitsch
Date: Mon, 15 Oct 2001 19:56:02 +0200 (CEST) From: Iljitsch van Beijnum <iljitsch@muada.com>
(This is more like two messages in one... I'm posting as a single message in sort of a "self digest" mode.) [ snip ] *** Message #1 ***
I don't understand what you mean. Redistributing upstream routes where/into what? How can this be "despite" as-path length?
Hypothetical example with real names:

Let's say that I have transit from 6347 and 2914. Now let's say that I'm stupid, and start advertising routes that I learn from 2914 into 6347, and that 6347 isn't filtering my as-paths or netblocks. [Note: 6347 does know better in the real world.]

Now a customer ("Network X") of 6347 and 1239 will see 2914 netblocks via

    6347 19358 2914
    6347 { 701 | 1239 | 3561 } 2914
    1239 2914

assuming that:

+ 1239/2914 directly connect
+ 6347/2914 do not directly connect
+ 6347 obtains transit to 2914 via 701, 1239, and 3561.

6347 learns 2914 routes from 701; 1239; 3561; and (wrongly) me, 19358... then chooses a best route to redistribute. Because 6347 sells transit to me, they'll give my routes higher local-pref than their peers or upstreams. Thus, for any 2914 netblock, I become the preferred egress from 6347. Problem #1.

Now let's say that Network X uses local-pref to penalize

    _1239_.*_2914

Network X sees:

    6347 19358 2914
    1239 2914

Network X's local-pref policies in their route-maps make the latter one undesirable. Problem #2, and the [extreme] example in my prior post.

Some old-timers help me out: IIRC, 3561 got blackholed in 1997 by bad BGP from another well-known network... but I don't want to say more in case my memory is bad.

*** Message #2 ***
Well, let me provide a real-world example. It's not really congestion, but close enough for these purposes.
When Telehouse had problems in Manhattan after the attacks on
[ snip ]
So a community that indicates "you don't want to use this route unless you absolutely have to--trust us" would have been very welcome. Such a community would be especially useful in the face of congestion:
I see and agree. Good idea, IMHO.
But is it worth the trouble to try to "standardize" communities for this?
I should think that this would be trivial. 0x0000:* and 0xffff:* are reserved per RFC1997... release a new RFC with your "you don't want this route!" communities added, participants would benefit, non-participants would observe no change, and there would be no interoperability troubles. I think I like this better than my prior geography-based post... you're suggesting that MED-like info be advertised via standard communities. And who would know better than the originating provider? Makes sense to me... Eddy
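For readers unfamiliar with the encoding referred to above: an RFC 1997 community is a 32-bit value, conventionally written as two 16-bit halves, and the ranges with a high half of 0x0000 or 0xFFFF are reserved. The sketch below shows that layout together with the three well-known values RFC 1997 defines. The `AVOID_IF_POSSIBLE` value is entirely hypothetical; it is made up here to show where a new well-known community of the kind proposed in this thread could live, and it is not an assigned value.

```python
# Well-known communities defined in RFC 1997.
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NO_EXPORT_SUBCONFED = 0xFFFFFF03


def encode(high, low):
    """Pack the two 16-bit halves (usually ASN:value) into one 32-bit community."""
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (high << 16) | low


def decode(value):
    """Split a 32-bit community into its (high, low) 16-bit halves."""
    return value >> 16, value & 0xFFFF


def is_reserved(value):
    """True for the 0x0000:* and 0xFFFF:* ranges reserved by RFC 1997."""
    high, _ = decode(value)
    return high in (0x0000, 0xFFFF)


# HYPOTHETICAL value for "don't use this route unless you must".
# Invented for this sketch only; not assigned by any RFC.
AVOID_IF_POSSIBLE = encode(0xFFFF, 0x1000)

print(decode(NO_EXPORT))
print(is_reserved(encode(3549, 100)), is_reserved(AVOID_IF_POSSIBLE))
```

A router that does not know such a value would simply carry or ignore it like any other community, which is the "non-participants would observe no change" property mentioned above.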
Hypothetical example with real names:
Let's say that I have transit from 6347 and 2914. Now let's say that I'm stupid, and start advertising routes that I learn from 2914 into 6347, and that 6347 isn't filtering my as-paths or netblocks. [Note: 6347 does know better in the real world.]
Gee, this is already something that can easily be solved - route-maps are your friends. The moment you do something like this you *will* get filtered.
Now a customer ("Network X") of 6347 and 1239 will see 2914 netblocks via
    6347 19358 2914
    6347 { 701 | 1239 | 3561 } 2914
    1239 2914
assuming that:
+ 1239/2914 directly connect
+ 6347/2914 do not directly connect
+ 6347 obtains transit to 2914 via 701, 1239, and 3561.
6347 learns 2914 routes from 701; 1239; 3561; and (wrongly) me, 19358... then chooses a best route to redistribute. Because 6347 sells transit to me, they'll give my routes higher local-pref than their peers or upstreams. Thus, for any 2914 netblock, I become the preferred egress from 6347. Problem #1.
You are missing a few little things: if 6347 does not filter and you redistribute 2914 routes to 6347, you will redistribute the entire view of the world from the perspective of 2914, since 2914 is your upstream provider as well. Since 6347 prefers your routes, you will become the exit point for all non-customer traffic of 6347, which is going to be detected immediately. All of this is of course an exercise in typing, since everyone sane has some knobs that they set to make sure that their customers do not blow up their entire network.
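The kind of "knob" mentioned above can be sketched as a per-customer inbound filter. This is a minimal illustration, assuming a customer with a single ASN and no downstream customers of its own; the function name, its parameters, and the example prefixes (taken from the documentation ranges) are all hypothetical and not any vendor's route-map syntax.

```python
import ipaddress


def accept_from_customer(prefix, as_path, customer_asn, allowed_prefixes):
    """Return True only for routes the customer is entitled to announce."""
    # Every hop must be the customer's own ASN (prepending is still allowed),
    # so anything learned from another provider and leaked is rejected.
    if not as_path or any(asn != customer_asn for asn in as_path):
        return False
    net = ipaddress.ip_network(prefix)
    # The prefix must be equal to, or contained in, a registered block.
    return any(
        net.version == block.version and net.subnet_of(block)
        for block in map(ipaddress.ip_network, allowed_prefixes)
    )


allowed = ["192.0.2.0/24"]  # hypothetical registered customer block

# The customer's own route is accepted.
print(accept_from_customer("192.0.2.0/24", [19358], 19358, allowed))
# A leaked upstream route (path continues through 2914) is rejected.
print(accept_from_customer("198.51.100.0/24", [19358, 2914], 19358, allowed))
```

With either check in place (as-path or prefix), the leak in the example never reaches 6347's best-path selection, so the customer local-pref never comes into play.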
Now let's say that Network X uses local-pref to penalize
_1239_.*_2914
Network X sees:
    6347 19358 2914
    1239 2914
Network X's local-pref policies in their route-maps makes the latter one undesirable. Problem #2, and the [extreme] example in my prior post.
Some old-timers help me out: IIRC, 3561 got blackholed in 1997 by bad BGP from another well-known network... but I don't want to say more in case my memory is bad.
The 7007 problem was different. The issue was that 7007 redistributed EGP into a classful IGP, which got redistributed back into EGP, which of course broke AS_PATH loop detection in addition to creating a set of higher-specificity routes. Alex
On Mon, 15 Oct 2001, E.B. Dreger wrote:
Let's say that I have transit from 6347 and 2914. Now let's say that I'm stupid, and start advertising routes that I learn from 2914 into 6347, and that 6347 isn't filtering my as-paths or netblocks. [Note: 6347 does know better in the real world.]
Ok, I understand. There was a problem along these lines a few weeks ago. "Sorry guys, a circuit came into service unexpectedly, we hadn't installed any filters yet." (AS#s withheld to protect the guilty.) But then the question is: which is worse, having traffic flow over an inferior path, or taking the chance that two people who both should know better screw up?
*** Message #2 ***
[ snip ]
So a community that indicates "you don't want to use this route unless you absolutely have to--trust us" would have been very welcome. Such a community would be especially useful in the face of congestion:
I see and agree. Good idea, IMHO.
But is it worth the trouble to try to "standardize" communities for this?
I should think that this would be trivial. 0x0000:* and 0xffff:* are reserved per RFC1997... release a new RFC with your "you don't want this route!" communities added, participants would benefit, non-participants would observe no change, and there would be no interoperability troubles.
Yes, why not. If anyone has something to contribute or wants to co-author such a draft or RFC, contact me off-list.
I think I like this better than my prior geography-based post... you're suggesting that MED-like info be advertised via standard communities. And who would know better than the originating provider? Makes sense to me...
I've been thinking about other information that could be conveyed in communities. For instance, bandwidth, delay and packet loss. If each router along the way modifies such a community (should probably be an extended one) then a much richer set of information would be available to multihomers to aid in route selection. Iljitsch
* iljitsch@muada.com (Iljitsch van Beijnum) [Tue 16 Oct 2001, 10:48 CEST]:
I've been thinking about other information that could be conveyed in communities. For instance, bandwidth, delay and packet loss. If each router along the way modifies such a community (should probably be an extended one) then a much richer set of information would be available to multihomers to aid in route selection.
And generate a route flap every time a link gets used more or less? That would be suboptimal to say the least (the word `countereffective' seems more applicable to me). -- Niels.
On Tue, 16 Oct 2001, Niels Bakker wrote:
I've been thinking about other information that could be conveyed in communities. For instance, bandwidth, delay and packet loss.
And generate a route flap every time a link gets used more or less? That would be suboptimal to say the least (the word `countereffective' seems more applicable to me).
Using dynamic data for this is not going to work in BGP, so this would have to be static information (hm, packet loss is not too static, hopefully). Static system-derived or configured information would already help a lot. You can then easily select the route with the highest potential bandwidth or the lowest speed-of-light delay, without the need to know a lot about the internals of a transit network.

Introducing "metrics" like this is not contrary to BGP design philosophy: the way in which an AS selects the best route is not defined in the RFC and the length of the AS path is certainly not the best possible criterion. The processing along the way would be limited to a simple addition (delay), compare/replace (bandwidth) or multiplication (packet loss) without introducing anything SPF-like.

Iljitsch
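The three per-hop operations named above can be sketched as follows. This is only an illustration of the arithmetic, not a proposed attribute format: the `PathMetrics` class, its field names, and the hop values are assumptions. Packet loss is carried as a delivery probability so that the per-hop update is a plain multiplication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    delay_ms: float = 0.0                 # accumulated by addition
    bandwidth_mbps: float = float("inf")  # bottleneck, by compare/replace
    delivery: float = 1.0                 # delivery probability, by multiplication


def traverse(m, hop_delay_ms, hop_bandwidth_mbps, hop_loss):
    """Return the metrics after crossing one more hop."""
    return PathMetrics(
        delay_ms=m.delay_ms + hop_delay_ms,
        bandwidth_mbps=min(m.bandwidth_mbps, hop_bandwidth_mbps),
        delivery=m.delivery * (1.0 - hop_loss),
    )


# Two hypothetical hops: (delay in ms, bandwidth in Mbps, loss fraction).
m = PathMetrics()
for hop in [(5.0, 155.0, 0.01), (40.0, 45.0, 0.02)]:
    m = traverse(m, *hop)

print(m.delay_ms, m.bandwidth_mbps, 1.0 - m.delivery)
```

Each update depends only on the incoming value and the local hop, so no router needs a view of the whole topology, which is the "nothing SPF-like" property claimed above.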
* iljitsch@muada.com (Iljitsch van Beijnum) [Tue 16 Oct 2001, 12:11 CEST]:
I've been thinking about other information that could be conveyed in communities. For instance, bandwidth, delay and packet loss.
On Tue, 16 Oct 2001, Niels Bakker wrote:
And generate a route flap every time a link gets used more or less? That would be suboptimal to say the least (the word `countereffective' seems more applicable to me). Using dynamic data for this is not going to work in BGP, so this would have to be static information (hm, packet loss is not too static, hopefully).
Indeed.
Static system-derived or configured information would already help a lot. You can then easily select the route with the highest potential bandwidth or the lowest speed-of-light delay, without the need to know a lot about the internals of a transit network.
Introducing "metrics" like this is not contrary to BGP design philosophy: the way in which an AS selects the best route is not defined in the RFC and the length of the AS path is certainly not the best possible criterion.
Setting communities based on a prefix's entry point into an ASN is doable with today's technologies (slight understatement). What's needed besides a standard numbering scheme for those communities is a way in all routers to route packets not merely destination-based but also based on a community set by the customer advertising the prefix to its upstream provider. As already noted, currently communities are mostly used to control advertisements of one's announcements by upstream providers, and not for outbound routing.

Example: Customer A has a connection to upstream B and speaks BGP with B. B has two different paths to C: one cheap and slow, one fast and expensive. (This seems to be a business opportunity - devise lines that are both cheap and fast.)

Now B can set communities on routes received from C based on where a certain prefix was received. If they overlap, however, only the best route out of the two will be passed on to customer A. If this obstacle is overcome, A still faces the problem of getting B to discern between packets meant for either exit point to C. B could reengineer its network to basically consist of two separate entities (a cheap one and an expensive one) and let customers like A connect to both, or extend all its routers to have a per-prefix source+destination routing table entry to decide where to send packets.

This seems to need quite some engineering work. :-)

Or A could buy B and do it themselves.

On a side note, A's possibilities of influencing inbound routing decisions - given that B acts on communities set by A, like `Prepend own ASN a few times before sending over just this link' or `Don't announce to D at all' - are already technically possible. Frankly, if I were B I'm not sure I'd be all that happy with customers influencing my routing decision process. They hand me their packets (or not); that should be it.

Regards,

-- Niels.
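The customer-signalled actions mentioned in the side note (prepend towards one neighbor, or do not announce to it at all) can be sketched as an export step on B's side. Everything concrete here is an assumption for illustration: the ASN 64500 for B, the community values in `ACTIONS`, and the neighbor labels "D" and "E" follow no real network's scheme.

```python
B_ASN = 64500  # hypothetical ASN for provider B (documentation range)

# HYPOTHETICAL action communities B might publish to its customers:
# community -> (action, neighbor it applies to, extra prepends)
ACTIONS = {
    (B_ASN, 100): ("suppress", "D", 0),
    (B_ASN, 101): ("prepend", "D", 1),
    (B_ASN, 103): ("prepend", "D", 3),
}


def export_path(as_path, communities, neighbor):
    """Return the AS path B would send to `neighbor`, or None if suppressed."""
    prepends = 0
    for community in communities:
        action = ACTIONS.get(community)
        if action is None or action[1] != neighbor:
            continue  # unknown community, or one aimed at another neighbor
        kind, _, count = action
        if kind == "suppress":
            return None
        prepends = max(prepends, count)
    # B adds its own ASN once on export, plus any extra copies requested.
    return [B_ASN] * (1 + prepends) + list(as_path)


customer_path = [64496]  # hypothetical customer A
print(export_path(customer_path, {(B_ASN, 103)}, "D"))
print(export_path(customer_path, {(B_ASN, 103)}, "E"))
print(export_path(customer_path, {(B_ASN, 100)}, "D"))
```

Note that this only shapes how A's prefixes are announced, and therefore inbound traffic towards A; it does nothing for the outbound case discussed in the example above, where B would need to forward A's packets differently from everyone else's.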
On Tue, 16 Oct 2001, Niels Bakker wrote:
As already noted, currently communities are mostly used to control advertisements of one's announcements by upstream providers, and not for outbound routing,
I'm sure it's used more for the former than the latter; however, there are networks that look at communities for outbound routing. A little more than I expected, even. This seems to happen mostly at multihomed networks. For instance we (AS12854) set a lower metric for routes that come in over a certain exchange point and a lower local preference for routes learned somewhere across the Atlantic.
Customer A has a connection to upstream B and speaks BGP with B. B has two different paths to C: one cheap and slow, one fast and expensive. (This seems to be a business opportunity - devise lines that are both cheap and fast.)
Well, lines used to be both expensive and slow, so at least there is progress...
Now B can set communities on routes received from C based on where a certain prefix was received. If they overlap, however, only the best route out of the two will be passed on to customer A.
Yes, this is always the problem with BGP. If I like low delay, but my upstream prefers a high bandwidth route that is also available for that destination, I don't get to see that nice low delay route I would have liked to use.
If this obstacle is overcome, A still faces the problem of getting B to discern between packets meant for either exit point to C. B could reengineer its network to basically consist of two separate entities (a cheap one and an expensive one) and let customers like A connect to both, or extend all its routers to have a per-prefix source+destination routing table entry to decide where to send packets.
This seems to need quite some engineering work. :-)
B could also do away with layer 3 and sell layer 2 (or layer 1) connectivity to C, where each customer can select the appropriate quality levels. Other options are for B to focus on one selling point and try to optimize the network for that selling point, or use their expertise to find the perfect middle ground, or run several parallel networks.
Date: Tue, 16 Oct 2001 13:28:41 +0200 From: Niels Bakker <niels=nanog@bakker.net>
(Too lazy^H^H^H^Hrushed to rewrap quoted lines <= 72 char) [ snip ]
Customer A has a connection to upstream B and speaks BGP with B. B has two different paths to C: one cheap and slow, one fast and expensive. (This seems to be a business opportunity - devise lines that are both cheap and fast.)
Now B can set communities on routes received from C based on where a certain prefix was received. If they overlap, however, only the best route out of the two will be passed on to customer A. If this obstacle is overcome, A still faces the problem of getting B to discern between packets meant for either exit point to C. B could reengineer its network to basically consist of two separate entities (a cheap one and an expensive one) and let customers like A connect to both, or extend all its routers to have a per-prefix source+destination routing table entry to decide where to send packets.
This seems to need quite some engineering work. :-)
I've thought about this before. Let's say that I have a DS3 to 701 and a 4xDS1 to 7018. I might sell a DS1 to NetX, and advert their routes[1] to both upstreams. If I sell a 15Mbps frac-DS3 to NetZ, I'd better use my 7018 connection for backup only on their routes. In short, we start looking at multiple FIBs. It's not really that much more difficult; it's more of a scalability issue. I know that Zebra can run multiple router processes, but I've not played with this feature... perhaps that's a start. Or, if you want to get ugly, you could have your upstreams speak multihop EBGP selectively with your downstreams. *ducking and running* [1] Ignore issue of table fragmentation for now. That's another thread...
Or A could buy B and do it themselves.
On a side note, A's possibilities of influencing inbound routing decisions - given that B acts on communities set by A, like `Prepend own ASN a few times before sending over just this link' or `Don't announce to D at all' - are already technically possible. Frankly, if I were B
Correct. And a few upstreams allow this.
I'm not sure I'd be all that happy with customers influencing my routing decision process. They hand me their packets (or not); that should be it.
I disagree. Let's say that you sell me transit, and purchase yours from 701 and 1239. Would you complain if I fill my pipe to you with traffic to/from 701? No. If I fill it with traffic to/from 1239? No. Why, then, would you complain if I set a community to _prefer_ 701 over 1239 or vice-versa? By giving your downstreams fine-grained tuning, you allow them to tinker for a system that they like... and you don't reach the extreme cases that are possible even without fine-grained tuning. Eddy
* eddy+public+spam@noc.everquick.net (E.B. Dreger) [Tue 16 Oct 2001, 18:09 CEST]:
Let's say that I have a DS3 to 701 and a 4xDS1 to 7018. I might sell a DS1 to NetX, and advert their routes[1] to both upstreams. If I sell a 15Mbps frac-DS3 to NetZ, I'd better use my 7018 connection for backup only on their routes.
In an ideal world. Not sure how many networks engineer their external connections so that each one equals the maximum amount of data sent out on all of them, in case all except one fail...
In short, we start looking at multiple FIBs. It's not really that much more difficult; it's more of a scalability issue. I know that Zebra can run multiple router processes, but I've not played with this feature... perhaps that's a start.
Zebra doesn't actually forward packets. Ciscos with newer IOS can do this (12.0T and onwards) with different VRFs. I've seen companies who have something like that in production; packets hit the same router a few times in a row in a traceroute.
Or, if you want to get ugly, you could have your upstreams speak multihop EBGP selectively with your downstreams. *ducking and running*
The "less hassle" part of having a limited amount of upstream providers to deal with certainly diminishes in this particular scenario, yes.
Frankly, if I were B I'm not sure I'd be all that happy with customers influencing my routing decision process. They hand me their packets (or not); that should be it. I disagree. Let's say that you sell me transit, and purchase yours from 701 and 1239. Would you complain if I fill my pipe to you with traffic to/from 701? No. If I fill it with traffic to/from 1239? No.
Yes, I would complain if you sent me packets with source addresses you shouldn't be sourcing (i.e., not your own). Traffic from 701 or 1239 should not pass you to reach me (if I were B and you customer A).
Why, then, would you complain if I set a community to _prefer_ 701 over 1239 or vice-versa? By giving your downstreams fine-grained tuning, you allow them to tinker for a system that they like... and you don't reach the extreme cases that are possible even without fine-grained tuning.
This is about packets from the world via me to you, not from you to the outside world. The case you just described already exists; I wrote so before (albeit in a bit broken English). The only routing decision customer A can force upon B is "Send packets destined for these netblocks <here's a BGP announcement> to me," and enforces this via a contract both parties enter in and A (presumably) pays B for. Regards, -- Niels.
Date: Tue, 16 Oct 2001 18:30:05 +0200 From: Niels Bakker <niels=nanog@bakker.net>
In short, we start looking at multiple FIBs. It's not really that much more difficult; it's more of a scalability issue. I know that Zebra can run multiple router processes, but I've not played with this feature... perhaps that's a start.
Zebra doesn't actually forward packets. Ciscos with newer IOS can do
Correct. It edits the *ix kernel's FIB, adding and deleting routes. However, Zebra running on a single machine can have multiple BGP processes running... which is along the same lines.
this (12.0T and onwards) with different VRFs. I've seen companies who have something like that in production; packets hit the same router a few times in a row in a traceroute.
Interesting. I was unaware of this.
Frankly, if I were B I'm not sure I'd be all that happy with customers influencing my routing decision process. They hand me their packets (or not); that should be it. I disagree. Let's say that you sell me transit, and purchase yours from 701 and 1239. Would you complain if I fill my pipe to you with traffic to/from 701? No. If I fill it with traffic to/from 1239? No.
Yes, I would complain if you sent me packets with source addresses you shouldn't be sourcing (i.e., not your own). Traffic from 701 or 1239 should not pass you to reach me (if I were B and you customer A).
Whoa! Where did I say spoofed packets? If 701 is one of your upstreams or peers, then I can exchange traffic with 701 all day long. I never indicated using improper source addresses. Please reread my post.

    me <--> you <--> 701
    me <--> you <--> 1239

Both are valid.
Why, then, would you complain if I set a community to _prefer_ 701 over 1239 or vice-versa? By giving your downstreams fine-grained tuning, you allow them to tinker for a system that they like... and you don't reach the extreme cases that are possible even without fine-grained tuning.
This is about packets from the world via me to you, not from you to the outside world. The case you just described already exists; I wrote so before (albeit in a bit broken English).
The only routing decision customer A can force upon B is "Send packets destined for these netblocks <here's a BGP announcement> to me," and
In your scenario. But this is arbitrary; it is not borne of necessity due to the technology.
enforces this via a contract both parties enter in and A (presumably) pays B for.
Let's say that I'm strictly a Web host. Inbound traffic is negligible. I send any and all 701-bound traffic via you; any and all other traffic goes through <some other upstreams>. No complaint there -- and I can do this in your aforementioned scheme. Why do you balk at a community that says "I dislike 1239"[1], thus _preferring_ 701, when I could simply route _all_ non-701 traffic over another one of my upstreams? IMHO, your dislike of tuning is illogical... I can sway the balance _far_ more with coarse-grained routing when you don't provide fine-grained controls. Not providing fine-grained tuning accomplishes nothing positive, and can be a negative thing. Offering it provides benefit, and is not difficult.[2] [1] Reminder: Hypothetical example. Interpret accordingly. I used 701 and 1239 in my original example, and don't care to change the scenario. [2] Yes, more maintenance with communities. But a few dozen is all it takes to handle many ASen with a few different lengths... both the initial effort and upkeep are negligible. Search the archives for this discussion. Eddy
* eddy+public+spam@noc.everquick.net (E.B. Dreger) [Tue 16 Oct 2001, 21:09 CEST]:
In short, we start looking at multiple FIBs. It's not really that much more difficult; it's more of a scalability issue. I know that Zebra can run multiple router processes, but I've not played with this feature... perhaps that's a start. Zebra doesn't actually forward packets. Ciscos with newer IOS can do Correct. It edits the *ix kernel's FIB, adding and deleting routes. However, Zebra running on a single machine can have multiple BGP processes running... which is along the same lines.
Except that Zebra currently does not have any provisions to be able to tell the forwarding engine it's running on (i.e. any Unix) a rule to the effect of "If packets originate from this peer [this interface] and are destined for this prefix, route them over that particular interface instead of the interface that would've been taken for all packets from all other prefixes." Which is, in effect, what multiple FIBs mean in practice.
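The rule quoted above ("if packets come from this peer and are destined for this prefix, use that interface instead") can be sketched as a lookup that consults a per-ingress table before the default one. This is an illustration of the idea only, not how Zebra, a Unix kernel, or a VRF implementation works; the `Fib` class, the `forward` function, and the interface names are all hypothetical.

```python
import ipaddress


class Fib:
    """A toy forwarding table with longest-prefix match."""

    def __init__(self):
        self.routes = []  # list of (network, egress interface name)

    def add(self, prefix, egress):
        self.routes.append((ipaddress.ip_network(prefix), egress))

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [
            (net, egress)
            for net, egress in self.routes
            if net.version == addr.version and addr in net
        ]
        if not matches:
            return None
        # The most specific (longest) matching prefix wins.
        return max(matches, key=lambda m: m[0].prefixlen)[1]


def forward(dst, ingress, default_fib, per_ingress_fibs):
    """Use the ingress-specific table if it has a match, else the default."""
    fib = per_ingress_fibs.get(ingress)
    if fib is not None:
        egress = fib.lookup(dst)
        if egress is not None:
            return egress
    return default_fib.lookup(dst)


default = Fib()
default.add("0.0.0.0/0", "to-701")

cust_a = Fib()  # override that applies only to packets arriving from cust-A
cust_a.add("203.0.113.0/24", "to-7018")

tables = {"cust-A": cust_a}
print(forward("203.0.113.9", "cust-A", default, tables))
print(forward("203.0.113.9", "cust-Z", default, tables))
print(forward("198.51.100.1", "cust-A", default, tables))
```

The scalability concern raised earlier in the thread shows up directly here: every customer with its own preferences adds another table that has to be populated and kept consistent.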
Frankly, if I were B I'm not sure I'd be all that happy with customers influencing my routing decision process. They hand me their packets (or not); that should be it. I disagree. Let's say that you sell me transit, and purchase yours from 701 and 1239. Would you complain if I fill my pipe to you with traffic to/from 701? No. If I fill it with traffic to/from 1239? No. Yes, I would complain if you sent me packets with source addresses you shouldn't be sourcing (i.e., not your own). Traffic from 701 or 1239 should not pass you to reach me (if I were B and you customer A). Whoa! Where did I say spoofed packets? If 701 is one of your upstreams or peers, then I can exchange traffic with 701 all day long. I never indicated using improper source addresses. Please reread my post.
Sorry, I misread you. Let me restate my previous statement before that a bit then: Yes, I would mind them attempting to choose which exit point into AS701 their packets would take. This could lead to suboptimal performance for all B's customers and a loss of control over the bills sent to B by its upstream providers. In addition to having to monitor its own network for long-term bottlenecks B will have to stay on a continuous alert for customers clogging one link.
    me <--> you <--> 701
    me <--> you <--> 1239
Both are valid.
Used above by me:

    customer A <--> B <--> AS701 (West Buttmunch)
    customer A <--> B <--> AS701 (East Buttmunch)

(numbers of course hold no discernible relationship to reality)
Why, then, would you complain if I set a community to _prefer_ 701 over 1239 or vice-versa? By giving your downstreams fine-grained tuning, you allow them to tinker for a system that they like... and you don't reach the extreme cases that are possible even without fine-grained tuning. This is about packets from the world via me to you, not from you to the outside world. The case you just described already exists; I wrote so before (albeit in a bit broken English). The only routing decision customer A can force upon B is "Send packets destined for these netblocks <here's a BGP announcement> to me," and In your scenario. But this is arbitrary; it is not borne of necessity due to the technology.
Actually, yes. The technology exists today for customer A to tell B to announce A's prefixes only to some peers/upstream providers of B, but not to route packets from A all via some peers/upstream providers of B and not via the others, even though B would choose those routes for its own packets (and thus has installed them into the FIBs of their routers).
enforces this via a contract both parties enter in and A (presumably) pays B for. Let's say that I'm strictly a Web host. Inbound traffic is negligible. I send any and all 701-bound traffic via you; any and all other traffic goes through <some other upstreams>. No complaint there -- and I can do this in your aforementioned scheme.
Why do you balk at a community that says "I dislike 1239"[1], thus _preferring_ 701, when I could simply route _all_ non-701 traffic over another one of my upstreams? IMHO, your dislike of tuning is illogical... I can sway the balance _far_ more with coarse-grained routing when you don't provide fine-grained controls.
Because then you introduce C into the mix, another upstream provider of A. That's cheating. :-) I thought the whole discussion was about B having multiple exit points and A influencing what exit points from B's network A's packets would take?
Not providing fine-grained tuning accomplishes nothing positive,
Simplicity for its own sake also has value (even aside from benefits like easier troubleshooting in case of failures, no need to generate transient outages while fiddling with the tuning knobs, etc.). Regards, -- Niels.
On Wed, 17 Oct 2001, Niels Bakker wrote:
Except that Zebra currently does not have any provisions to be able to tell the forwarding engine it's running on (i.e. any Unix) a rule to the effect of "If packets originate from this peer [this interface] and are destined for this prefix, route them over that particular interface instead of the interface that would've been taken for all packets from all other prefixes." Which is, in effect, what multiple FIBs mean in practice.
Yes it does. Sort of. Under Linux, you can tell each zebra instance to populate a different routing table, and then use the ip command to set route policies to direct inbound traffic to these tables based on whatever you like (source address, ingress port, TOS bits, etc.).
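As a rough illustration of the idea above (one routing table per policy, with a rule that picks the table before the normal lookup), here is a minimal Python sketch. It is not Zebra or Linux code; the table names, interface names, and prefixes are all invented for the example.

```python
import ipaddress

# Hypothetical routing tables: prefix -> egress interface.
TABLES = {
    "main": {"0.0.0.0/0": "eth1", "192.0.2.0/24": "eth1"},
    "customer_a": {"0.0.0.0/0": "eth1", "192.0.2.0/24": "eth2"},
}

# Hypothetical policy rules: ingress interface -> table to consult.
RULES = {"eth0": "customer_a"}


def lookup(ingress, dst):
    """Pick a table by ingress interface, then do a longest-prefix match in it."""
    table = TABLES[RULES.get(ingress, "main")]
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, egress in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, egress)
    return best[1] if best else None
```

Packets arriving on eth0 consult the customer_a table, so the same destination can leave over a different interface than it would for everyone else. That is the "multiple FIBs" behaviour discussed in this thread, in miniature.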
At 17:52 17/10/01 +0200, Niels Bakker wrote:
Been there. We tried to put together a simple RFC for communities to influence routing as can be found at: http://www.att.net.il/~hank/bgp-glob-comm.txt
IDR never accepted it so it died a quiet death.
-Hank
* eddy+public+spam@noc.everquick.net (E.B. Dreger) [Tue 16 Oct 2001, 21:09 CEST]:
In short, we start looking at multiple FIBs. It's not really that much more difficult; it's more of a scalability issue. I know that Zebra can run multiple router processes, but I've not played with this feature... perhaps that's a start.
Zebra doesn't actually forward packets. Ciscos with newer IOS can do
Correct. It edits the *ix kernel's FIB, adding and deleting routes. However, Zebra running on a single machine can have multiple BGP processes running... which is along the same lines.
Except that Zebra currently does not have any provisions to be able to tell the forwarding engine it's running on (i.e. any Unix) a rule to the effect of "If packets originate from this peer [this interface] and are destined for this prefix, route them over that particular interface instead of the interface that would've been taken for all packets from all other prefixes." Which is, in effect, what multiple FIBs mean in practice.
Frankly, if I were B I'm not sure I'd be all that happy with customers influencing my routing decision process. They hand me their packets (or not); that should be it.
I disagree. Let's say that you sell me transit, and purchase yours from 701 and 1239. Would you complain if I fill my pipe to you with traffic to/from 701? No. If I fill it with traffic to/from 1239? No.
Yes, I would complain if you sent me packets with source addresses you shouldn't be sourcing (i.e., not your own). Traffic from 701 or 1239 should not pass you to reach me (if I were B and you customer A).
Whoa! Where did I say spoofed packets? If 701 is one of your upstreams or peers, then I can exchange traffic with 701 all day long. I never indicated using improper source addresses. Please reread my post.
Sorry, I misread you. Let me restate my previous statement before that a bit then: Yes, I would mind them attempting to choose which exit point into AS701 their packets would take. This could lead to suboptimal performance for all B's customers and a loss of control over the bills sent to B by its upstream providers. In addition to having to monitor its own network for long-term bottlenecks B will have to stay on a continuous alert for customers clogging one link.
me <--> you <--> 701
me <--> you <--> 1239
Both are valid.
Used above by me,
customer A <--> B <--> AS701 (West Buttmunch)
                  <--> AS701 (East Buttmunch)
(numbers hold of course no discernable relationship to reality)
Why, then, would you complain if I set a community to _prefer_ 701 over 1239 or vice-versa? By giving your downstreams fine-grained tuning, you allow them to tinker for a system that they like... and you don't reach the extreme cases that are possible even without fine-grained tuning.
This is about packets from the world via me to you, not from you to the outside world. The case you just described already exists; I wrote so before (albeit in a bit broken English). The only routing decision customer A can force upon B is "Send packets destined for these netblocks <here's a BGP announcement> to me," and
In your scenario. But this is arbitrary; it is not born of necessity due to the technology.
Actually, yes. The technology exists today for customer A to tell B to announce A's prefixes only to some peers/upstream providers of B, but not to route packets from A all via some peers/upstream providers of B and not via the others, even though B would choose those routes for its own packets (and thus has installed them into the FIBs of their routers).
enforces this via a contract both parties enter in and A (presumably) pays B for.
Let's say that I'm strictly a Web host. Inbound traffic is negligible. I send any and all 701-bound traffic via you; any and all other traffic goes through <some other upstreams>. No complaint there -- and I can do this in your aforementioned scheme.
Why do you balk at a community that says "I dislike 1239"[1], thus _preferring_ 701, when I could simply route _all_ non-701 traffic over another one of my upstreams? IMHO, your dislike of tuning is illogical... I can sway the balance _far_ more with coarse-grained routing when you don't provide fine-grained controls.
Because then you introduce C into the mix, another upstream provider of A. That's cheating. :-)
I thought the whole discussion was about B having multiple exit points and A influencing what exit points from B's network A's packets would take?
Not providing fine-grained tuning accomplishes nothing positive,
Simplicity for its own sake also has value (even aside from benefits like easier troubleshooting in case of failures, no need to generate transient outages while fiddling with the tuning knobs, etc.).
Regards,
-- Niels.
On Wed, 17 Oct 2001, Hank Nussbacher wrote:
Been there. We tried to put together a simple RFC for communities to influence routing as can be found at: http://www.att.net.il/~hank/bgp-glob-comm.txt
IDR never accepted it so it died a quiet death.
Any particular reason why they didn't accept it?
At 14:00 18/10/01 +0200, Iljitsch van Beijnum wrote:
On Wed, 17 Oct 2001, Hank Nussbacher wrote:
Been there. We tried to put together a simple RFC for communities to influence routing as can be found at: http://www.att.net.il/~hank/bgp-glob-comm.txt
IDR never accepted it so it died a quiet death.
Any particular reason why they didn't accept it?
Not really sure. Best to ask them. -hank
This seems to need quite some engineering work. :-)
I've thought about this before.
Let's say that I have a DS3 to 701 and a 4xDS1 to 7018. I might sell a DS1 to NetX, and advert their routes[1] to both upstreams. If I sell a 15Mbps frac-DS3 to NetZ, I'd better use my 7018 connection for backup only on their routes.
Router (A) policy 701
Router (B) policy 7018
Router (C) your policy (701 && 7018 && whatever else you have)
Where is the problem again? We, Netaxs (AS4969), do this in multiple locations with multiple OC-12s to different transit providers and our own network, where some customers want to always use a specific path or not use any path at all, while the others do not want to be bothered about which path can be used. That is all done today with Cisco and Juniper gear and confederations, without need for any random changes to the BGP protocol.
On a side note, A's possibilities of influencing inbound routing decisions - given that B acts on communities set by A, like `Prepend own ASN a few times before sending over just this link' or `Don't announce to D at all' - are already technically possible. Frankly, if I were B
Correct. And a few upstreams allow this.
It is very simple to do. Create a set of 'advertise-me' communities and 'pad-me' communities. Alex
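To make the 'advertise-me' / 'pad-me' idea concrete, here is a small Python sketch of how provider B might act on customer-set communities when exporting a route. The provider ASN (64500), the customer ASN (64501), and the community values are all made up for illustration; every network that offers this publishes its own scheme. The upstreams 701 and 7018 are borrowed from the example earlier in the thread.

```python
MY_ASN = 64500  # hypothetical provider B

# Hypothetical community scheme: community -> (action, target upstream, prepend count).
POLICY = {
    "64500:1001": ("suppress", 701, 0),
    "64500:1002": ("suppress", 7018, 0),
    "64500:2001": ("prepend", 701, 2),
    "64500:2002": ("prepend", 7018, 2),
}


def export_path(as_path, communities, neighbor):
    """Return the AS path B would announce to `neighbor`, or None if suppressed."""
    prepends = 0
    for community in communities:
        action, target, count = POLICY.get(community, (None, None, 0))
        if target != neighbor:
            continue  # community is unknown or aimed at a different upstream
        if action == "suppress":
            return None
        if action == "prepend":
            prepends = max(prepends, count)
    # B always adds its own ASN once; 'pad-me' adds extra copies.
    return [MY_ASN] * (1 + prepends) + list(as_path)
```

For example, a customer route tagged 64500:2002 goes to 7018 with two extra copies of 64500 on the path, and to 701 unchanged; a route tagged 64500:1001 is not announced to 701 at all.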
On Tue, Oct 16, 2001 at 01:00:45PM -0400, alex@yuriev.com wrote:
On a side note, A's possibilities of influencing inbound routing decisions - given that B acts on communities set by A, like `Prepend own ASN a few times before sending over just this link' or `Don't announce to D at all' - are already technically possible. Frankly, if I were B
Correct. And a few upstreams allow this.
It is very simple to do. Create a set of 'advertise-me' communities and 'pad-me' communities.
Although it did a few other things as well, an attempt at standardizing parts of this failed: draft-bonaventure-bgp-redistribution-01
This draft included:
  IDRP style DIST_LIST_INCL, DIST_LIST_EXCL
  Proxied NO_EXPORT
  Proxied Prepending
The IDRP-style DIST_LISTs seem to generate most of the heat. We never got a firm feel for why the other two components were disliked.
Geoff Huston proposed draft-huston-nopeer-00.txt to attempt to address some of the route propagation issues that the DIST_LISTs were intended to address.
Alex
-- Jeff Haas NextHop Technologies
Jeffrey Haas wrote:
draft-bonaventure-bgp-redistribution-01
This draft included:
  IDRP style DIST_LIST_INCL, DIST_LIST_EXCL
  Proxied NO_EXPORT
  Proxied Prepending
The IDRP-style DIST_LISTs seem to generate most of the heat. We never got a firm feel for why the other two components were disliked.
Geoff Huston proposed draft-huston-nopeer-00.txt to attempt to address some of the route propagation issues that the DIST_LISTs were intended to address.
The problem, as I saw it, was that in attempting to specify a subset of the routing space the authors specified an enumerated list of ASes that formed the boundary of this subset. The two major problems, as I see it, are that you may not have up-to-date information about which ASes are on the boundary of the subset to which you want to apply the redistribution, and that each remote AS that is not on the boundary has no knowledge of whether it is intended to be inside or outside this set.
The alternative approach was to specify a common condition which characterized all members of the subset, allowing each remote AS to use local knowledge to see if it met the originator-defined constraints or not. This was the basis of the no-peer approach.
Geoff Huston Telstra
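The contrast between the two approaches can be sketched in a few lines of Python. This is a simplification based only on the description in this thread, not on the actual encodings in either draft; the function names and ASNs are invented.

```python
def may_send_enumerated(neighbor_asn, excluded_asns):
    """Enumerated approach: the originator lists the ASes that must not get the route.

    Only works if the originator's list is complete and up to date.
    """
    return neighbor_asn not in excluded_asns


def may_send_nopeer(route_has_nopeer, neighbor_is_peer):
    """Condition approach: each AS applies its own local knowledge.

    The originator only sets a tag; whether a neighbor counts as a
    peer is decided by the AS doing the export.
    """
    return not (route_has_nopeer and neighbor_is_peer)
```

In the first function the originator has to know the remote topology; in the second it only states the condition and leaves the evaluation to whoever is exporting the route.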
On Mon, Oct 15, 2001 at 11:55:27AM +0200, Iljitsch van Beijnum wrote:
How many networks are there that use communities to indicate where (which interconnect point) a route was learned?
How feasible is it for me to provide this information in any meaningful way if I have tens or even hundreds of interconnect points in my network? Obviously I can assign a unique community to each such point on my network, and tag all routes I learn there with that community, but what is the benefit of my doing so? Unless you have some way of knowing whether interconnect point A is "better" than interconnect point B, how would you use that information?
This isn't to say that there isn't a reason to do this. I can think of several *internal* uses for such a scheme, including distance-sensitive billing applications, traffic engineering, etc. But is there a benefit to revealing this information to customers or peers?
And how many networks use this information if their upstream provides it?
Without having a clear understanding of each upstream's network topology and routing policy, how would you use such information to label one route as "better" than another? What problem(s) are you trying to solve, and are you sure that BGP communities are the right tool for the job? --Jeff
Date: Mon, 15 Oct 2001 12:34:24 -0400 From: Jeff Aitken <jaitken@aitken.com>
How feasible is it for me to provide this information in any
[ snip ]
This isn't to say that there isn't a reason to do this. I can think
[ snip ]
Without having a clear understanding of each upstream's network topology and routing policy, how would you use such information to label one route as "better" than another?
Let's take a simple example. Say that I connect to AS65123 in DFW and AS65456 in Chicago. Assume that both ASen have similar peering with other networks. Now, using only as-path length, where do I send traffic? Is as-path length the best metric? No.
If I need traffic headed for MSP, it should go through CHI. If I need traffic to go to Houston, it should be routed through Dallas. How does one do this now? Static entries based on RADB or similar?! If that's acceptable, then why don't we just static route, period?!
Real example: If I'm a 6347 downstream and I know that 6347 has transit via 701, 1239, 3561 near me, I'm going to use a route-map. That's easy.
Now let's take 3967... in most places, peering with 6347 seems better than with 3549. I send 3967 traffic via 6347. But it's not perfect... I'd rather send certain regions via 3549. Without regional tagging, how do I do that? Hypothetical example made into real example.
Furthermore, define "clear understanding". If I test different traffic paths, I can get a pretty clear understanding. Not as good as a detailed network map, but enough to tune routes better than leaving them up to nature.
What problem(s) are you trying to solve, and are you sure that
See above.
BGP communities are the right tool for the job?
Sure that they're the right tool, no. Sure that they're the best tool -- until someone shows me a better one.
Eddy
Brotsman & Dreger, Inc. - EverQuick Internet Division
On Mon, Oct 15, 2001 at 05:26:54PM +0000, E.B. Dreger wrote:
Let's take a simple example. Say that I connect to AS65123 in DFW and AS65456 in Chicago. Assume that both ASen have similar peering with other networks. [...] If I need traffic headed for MSP, it should go through CHI.
Perhaps, but in order to be sure we need to know more. First of all, we need to know who the third network is -- the upstream for the end user in MSP. Let's assume it's AS65535. You've decided on geography alone that it's better to hand MSP-destined traffic to AS65456 in ORD. Do AS65456 and AS65535 even have peering in that area? What if the only two places they peer are on the west and east coasts?
You also didn't identify the starting point on your network. Let's assume it's Dallas. You have two choices:
1. You carry the bits from DFW to ORD on your network and hand them to AS65456. AS65456 then carries them to wherever they peer with AS65535 and hands them over for final delivery.
2. You hand off to AS65123 in DFW. AS65123 carries the bits to a location where they peer with AS65535, who takes them the rest of the way.
Without any knowledge of the topology, routing policy, backbone capacity, and peering placement and density of the three networks involved, how can you say for sure which option is "better"? I'd be inclined to ask why you're paying AS65123 for service if you can do a better job of carrying bits to MSP than they can, personally.
If I need traffic to go to Houston, it should be routed through Dallas.
Not necessarily. See above. Where is the starting point on your network? If it's Dallas, maybe. But what about from other points on your network?
Furthermore, define "clear understanding".
Clear understanding means:
1. You need to know each provider's backbone. Just because two cities are close together on a map or have fiber between them doesn't mean that they're connected at layer 3.
2. You need to know each provider's routing policy. Just because two networks peer here doesn't mean they won't exchange bits there.
3. You need to know where each provider has peering, and with whom. Just because a provider has a POP in a given city doesn't mean that they peer there. Just because two providers have routers in the same room in the same building in the same city doesn't mean they peer there.
If you don't have this information, then what you're doing is guessing based on nothing more than geographical information. If you're going to do that, how granular do you want the data? I don't think city-level is a good idea -- too often you'll make the wrong choice. Regional-level might work, but for what definition of "region"?
As a provider, I'd rather hear from my customers that there is a problem so that I can fix the root cause, rather than telling them to hack around it.
--Jeff
On Mon, 15 Oct 2001, Jeff Aitken wrote:
How many networks are there that use communities to indicate where (which interconnect point) a route was learned?
How feasible is it for me to provide this information in any meaningful way if I have tens or even hundreds of interconnect points in my network?
Hm, are there hundreds of interconnect points, even worldwide?
Obviously I can assign a unique community to each such point on my network, and tag all routes I learn there with that community, but what is the benefit of my doing so? Unless you have some way of knowing whether interconnect point A is "better" than interconnect point B, how would you use that information?
If two networks are both rather large and interconnect in many places, it may be hard to put this information to use. But for multihoming customers this shouldn't be a big problem. For instance, we are in Europe and we assign a lower local preference to routes our upstreams receive in the US. So if there is a route over an interconnect point in Europe, we prefer it, regardless of AS path length. Obviously this will not guarantee selection of the best path, but there are cases when it prevents a transcontinental detour.
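A minimal Python sketch of the policy just described: routes tagged by the upstream as learned in the US get a lower local preference than routes learned in Europe, and local preference is compared before AS path length. The community values, preference numbers, and ASNs are invented, and the selection function covers only these two steps of the real BGP decision process.

```python
# Hypothetical region tags set by two upstreams (64496 and 64497).
REGION_BY_COMMUNITY = {
    "64496:10": "EU", "64496:20": "US",
    "64497:10": "EU", "64497:20": "US",
}
LOCAL_PREF = {"EU": 100, "US": 80}  # assumed values
DEFAULT_PREF = 90                   # routes without a region tag


def local_pref(route):
    """Derive local preference from the first recognised region community."""
    for community in route["communities"]:
        region = REGION_BY_COMMUNITY.get(community)
        if region:
            return LOCAL_PREF[region]
    return DEFAULT_PREF


def best(routes):
    """Higher local preference wins; AS path length only breaks ties."""
    return max(routes, key=lambda r: (local_pref(r), -len(r["as_path"])))
```

With a two-hop route tagged as US-learned and a four-hop route tagged as EU-learned for the same prefix, best() picks the longer European path, which is the "regardless of AS path length" behaviour described above.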
And how many networks use this information if their upstream provides it?
Without having a clear understanding of each upstream's network topology and routing policy, how would you use such information to label one route as "better" than another?
Give your multihomed customers some credit. They know how the traceroute program works. If part of your network or an exchange point is congested, your customers will know. Why not give them the tools to route around the problem?
What problem(s) are you trying to solve, and are you sure that BGP communities are the right tool for the job?
The problem is that the BGP route selection algorithm is far from perfect. Setting the local preference based on the AS path and communities is the only tool (apart from a big bag of money that makes all the problems disappear). Iljitsch
participants (9)
- alex@yuriev.com
- E.B. Dreger
- Geoff Huston
- Greg Maxwell
- Hank Nussbacher
- Iljitsch van Beijnum
- Jeff Aitken
- Jeffrey Haas
- Niels Bakker