how is cold-potato done?

Ralph Doncaster

26 Jun 2002

5:52 p.m.

If I peer with network X in cities A and B, and receive the same route in both cities with an AS-path of X, how do I know which city to use for an exit? I can understand how if X uses communities to tag the geographic origin of the traffic, but I'm not aware of many networks that do this. Lots of networks claim to use cold-potato routing though, so how do they do it?

Ralph Doncaster
principal, IStop.com
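For the community-based approach Ralph mentions, a minimal sketch in Python follows. Everything concrete in it is an assumption for illustration: the community values (`64500:1001` and so on, using a documentation ASN), the city names, and the distance table. Networks that do tag geographic origin publish their own schemes, if they tag at all.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical community -> origin-city scheme (AS 64500 is a documentation ASN).
ORIGIN_COMMUNITIES = {
    "64500:1001": "Chicago",
    "64500:1002": "Los Angeles",
    "64500:1003": "New York",
}


@dataclass
class Route:
    prefix: str
    exit_city: str                 # city where this copy of the route was learned
    communities: tuple[str, ...]   # communities attached by network X


def origin_city(route: Route) -> str | None:
    """Return the origin city tagged on the route, or None if untagged."""
    for community in route.communities:
        if community in ORIGIN_COMMUNITIES:
            return ORIGIN_COMMUNITIES[community]
    return None


def pick_exit(copies: list[Route], distance_km: dict[tuple[str, str], int]) -> Route | None:
    """Pick the copy learned closest to the tagged origin city.

    distance_km maps (exit_city, origin_city) to an assumed distance.
    Returns None when the route carries no known origin tag, leaving the
    choice to the router's normal best-path rules.
    """
    origin = origin_city(copies[0])
    if origin is None:
        return None
    return min(copies, key=lambda r: distance_km[(r.exit_city, origin)])
```

With the route tagged as originating in Chicago and learned in both Los Angeles and New York, the New York copy wins because it is closer to Chicago in the assumed distance table.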


Jared Mauch

26 Jun

5:54 p.m.

On Wed, Jun 26, 2002 at 01:52:08PM -0400, Ralph Doncaster wrote:


If I peer with network X in cities A and B, and receive the same route in both cities with an AS-path of X, how do I know which city to use for an exit? I can understand how if X uses communities to tag the geographic origin of the traffic, but I'm not aware of many networks that do this. Lots of networks claim to use cold-potato routing though, so how do they do it?

they use the MED sent on the route (aka metric) from the other provider to determine which exit where they both interconnect is the "shortest".

this can at times provide undesired results because of aggregation.

- jared

--
Jared Mauch | pgp key available via finger from jared@puck.nether.net
clue++; | http://puck.nether.net/~jared/ My statements are only mine.

Ralph Doncaster

6:07 p.m.

If I peer with network X in cities A and B, and receive the same route in both cities with an AS-path of X, how do I know which city to use for an exit? I can understand how if X uses communities to tag the geographic origin of the traffic, but I'm not aware of many networks that do this. Lots of networks claim to use cold-potato routing though, so how do they do it?

they use the MED sent on the route (aka metric) from the other provider to determine which exit where they both interconnect is the "shortest".

this can at times provide undesired results because of aggregation.

Besides aggregation, wouldn't this lead to a lot of ties? Let's say the cities are LA & Manhattan, and the route from X originates in Chicago. I would think it would be a common occurrence for the route to have the same metric in LA & Manhattan.

-Ralph

Greg Maxwell

5:57 p.m.

On Wed, 26 Jun 2002, Ralph Doncaster wrote:


If I peer with network X in cities A and B, and receive the same route in both cities with an AS-path of X, how do I know which city to use for an exit? I can understand how if X uses communities to tag the geographic origin of the traffic, but I'm not aware of many networks that do this. Lots of networks claim to use cold-potato routing though, so how do they do it?

MEDs are one way. External traceroute kungfu feeding a route server is another.
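The traceroute-plus-route-server option could look roughly like the sketch below: take a delay measurement per exit toward one destination prefix (collected by some external probe; the input format here is an assumption) and turn it into LOCAL_PREF values that a route server would push to the border routers. The `base` and `step` values are arbitrary assumptions.

```python
from __future__ import annotations


def localpref_from_rtt(rtt_ms: dict[str, float], base: int = 100, step: int = 10) -> dict[str, int]:
    """Turn measured RTTs per exit into LOCAL_PREF values.

    rtt_ms maps an exit (peering city) to a delay measured toward one
    destination prefix. Higher LOCAL_PREF is preferred in BGP, so the
    lowest-RTT exit gets the highest value and each slower exit one step less.
    """
    ranked = sorted(rtt_ms, key=lambda exit_name: rtt_ms[exit_name])  # fastest first
    top = base + step * (len(ranked) - 1)
    return {exit_name: top - step * i for i, exit_name in enumerate(ranked)}
```

Because LOCAL_PREF is compared before MED and IGP cost, values like these override both for the prefixes they are applied to, which is also why stale or noisy measurements can move a lot of traffic at once.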

Clayton Fiske

6:07 p.m.

On Wed, Jun 26, 2002 at 01:52:08PM -0400, Ralph Doncaster wrote:


If I peer with network X in cities A and B, and receive the same route in both cities with an AS-path of X, how do I know which city to use for an exit? I can understand how if X uses communities to tag the geographic origin of the traffic, but I'm not aware of many networks that do this. Lots of networks claim to use cold-potato routing though, so how do they do it?

If they are really doing cold-potato routing, they are listening to the BGP MEDs (metrics) sent by their peer(s) and making the routing decision based on that. If the MEDs are the same for both routes, the IGP metric for each BGP next-hop is likely making the decision.

http://www.nanog.org/mtg-9811/ppt/avi/tsld010.htm

Those are the criteria, in order, which BGP uses to make its decision. I am assuming synchronization, route to next hop, and router-local decisions (IBGP vs EBGP, weight) are non-issues in this scenario. Since localpref would be set internally, and AS path is the same (as I would assume origin code is), that leaves the MED as the first criterion, followed by shortest next-hop metric (IGP metric, typically).

-c
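The ordering described above can be sketched as a sort key in Python. This is a simplified model under stated assumptions, not any vendor's implementation: it keeps only LOCAL_PREF, AS path length, origin, MED, IGP cost to the next hop and a final lowest-router-ID tie-break, and it compares MED directly because every path comes from the same neighbor AS (network X). The `Path` fields are hypothetical.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}  # lower origin code is preferred


@dataclass
class Path:
    exit_city: str
    local_pref: int
    as_path: tuple[int, ...]
    origin: str
    med: int
    igp_cost: int     # IGP metric to the BGP next hop
    router_id: str


def preference_key(p: Path) -> tuple:
    """Sort key for the simplified decision order; the smallest key wins.

    Weight, synchronization, eBGP-vs-iBGP and vendor-specific steps are left out.
    """
    return (
        -p.local_pref,                       # highest LOCAL_PREF
        len(p.as_path),                      # shortest AS path
        ORIGIN_RANK[p.origin],               # lowest origin code
        p.med,                               # lowest MED
        p.igp_cost,                          # lowest IGP metric to next hop
        ipaddress.IPv4Address(p.router_id),  # lowest router ID (vendors differ on this last step)
    )


def best_path(paths: list[Path]) -> Path:
    return min(paths, key=preference_key)
```

In Ralph's LA/Manhattan case, equal MEDs fall through to the IGP cost, and only a full tie on IGP cost reaches the router-ID step. Router IDs are compared as addresses rather than strings, so `10.0.0.9` sorts below `10.0.0.10`.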

E.B. Dreger

6:10 p.m.

RD> Date: Wed, 26 Jun 2002 13:52:08 -0400 (EDT)
RD> From: Ralph Doncaster
RD> If I peer with network X in cities A and B, and receive the same route in
RD> both cities with an AS-path of X, how do I know which city to use for an
RD> exit? I can understand how if X uses communities to tag the geographic
RD> origin of the traffic, but I'm not aware of many networks that do
RD> this. Lots of networks claim to use cold-potato routing though, so how do
RD> they do it?

MEDs

Eddy
--
Brotsman & Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 (785) 865-5885 Lawrence and [inter]national
Phone: +1 (316) 794-8922 Wichita

Leo Bicknell

6:35 p.m.

In a message written on Wed, Jun 26, 2002 at 01:52:08PM -0400, Ralph Doncaster wrote:


If I peer with network X in cities A and B, and receive the same route in both cities with an AS-path of X, how do I know which city to use for an exit? I can understand how if X uses communities to tag the geographic origin of the traffic, but I'm not aware of many networks that do this. Lots of networks claim to use cold-potato routing though, so how do they do it?

Wow, I'm amazed at the wrong answers here. The vendors even document this, as do the RFCs; see http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/bgp.htm

More to your question, cold-potato uses MEDs to determine the best exit. Generally they do not work for a peer's large aggregates, since those are spread out across the peer's network; clueful peers set the outgoing MEDs on their aggregates all to the same value.

If the MEDs are set to the same value, clobbered on inbound, or absent, the routers inside your network will choose the closest exit based on your IGP cost. This is "hot potato" routing.

If, by strange chance, you have equal IGP costs to two peering points with equal MEDs, then it will choose the one with the lower router ID.

As you can see, there are many other steps to the selection process, as documented in the link above.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
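Seen from the announcing peer's side, the aggregation point can be sketched in Python. The POP names, the cost table and the constant 100 are assumptions; the rule of sending one identical MED for aggregates comes from the message above, while deriving per-prefix MEDs from IGP cost is one common choice (Daniel Golding mentions it later in the thread), not the only one.

```python
from __future__ import annotations

AGGREGATE_MED = 100  # assumed constant; what matters is that it is identical everywhere


def outbound_med(prefix_pops: set[str], peering_pop: str,
                 igp_cost: dict[tuple[str, str], int]) -> int:
    """MED a peer might attach when announcing a prefix at one peering POP.

    prefix_pops: POPs where the prefix is actually homed.
    igp_cost: (from_pop, to_pop) -> the peer's IGP cost, including (p, p) -> 0.

    A prefix homed at one POP gets the IGP cost from the peering POP to it,
    which lets the neighbor exit near the destination (cold potato). An
    aggregate spread across several POPs gets the same MED everywhere, so the
    neighbor falls back to its own IGP cost (hot potato).
    """
    if len(prefix_pops) == 1:
        (home,) = prefix_pops
        return igp_cost[(peering_pop, home)]
    return AGGREGATE_MED
```

A more-specific homed in New York is announced with MED 0 in New York and a higher MED elsewhere, so a MED-honoring neighbor carries it to New York itself; the aggregate looks the same at every peering point.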

dre

7:22 p.m.

Shortest-exit is the default because of the BGP decision process. This tends to favor heavy-content providers, because the bulk of the data travels only a short distance inside the content-sending AS before it is handed to the AS that delivers it to its eyeballs.

Shortest-exit is caused by IGP metrics (which shouldn't ever be the same for two paths, unless you actually want that to happen). IGP metrics are generally set by the length of fiber paths or by delay values. Provider backbones set these manually as IS-IS or OSPF costs.

There are many ways to do best-exit. People are always coming up with strange ways to do routing (ToS routing, MPLS-TE, DS-TE), and they can sometimes apply these techniques to best-exit.

For those looking for something simple and standard, the two ways were already mentioned earlier in the thread: outbound MEDs, and delay-based routing from `traceroute' information. There are quite a few problems with these as well, documented in various papers on the matter, e.g.:
http://www.ietf.org/internet-drafts/draft-ietf-idr-route-oscillation-01.txt

For MEDs, Avi spoke to the methods used in the following talks:
http://www.nanog.org/mtg-9901/ppt/bgp102/index.htm
http://www.nanog.org/mtg-9811/ppt/avi/index.htm

One thing Avi mentioned here I never quite understood:
http://www.nanog.org/mtg-9811/ppt/avi/sld031.htm
He says "set MED's in one direction only", but he doesn't say which direction or why.

As for the aggregation problem that makes outbound MEDs insignificant, there is some work on solving it using communities (NO-PEER, supercommunities, redistribution, cost communities, link-bw, et al.). Some of these are believed (probably rightly so) to be overcomplicated and possibly even oscillatory, just like the other methods.

I enjoy the simple approach that RFC 3272 takes (surprisingly simple inter-domain traffic engineering, coming from authors who recommend super-complex intra-domain TE based on MPLS etc.). They have some suggestions on setting local_pref and inbound MEDs that I found to be very clueful.
http://www.ietf.org/rfc/rfc3272.txt (Section 7.0)

"Inter-domain TE is inherently more difficult than intra-domain TE under the current Internet architecture. The reasons for this are both technical and administrative."

So maybe best practice today for doing best-exit is simply having the technical data (communities, tags, traffic, etc.) and talking directly with the administrators of your peer AS to find a solution (or reading their minds without their data, or inferring it, or guessing).

I guess the final question is: why is anyone concerned about best-exit at all? Doesn't shortest-exit still get the traffic there? I'm willing to bet there are a lot of different answers to all these questions.

-dre
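The shortest-exit default comes down to each router picking the peering point with the lowest IGP distance. A minimal sketch in Python, using Dijkstra over a made-up four-POP topology whose link costs stand in for fiber length (the unit, roughly hundreds of km, is an arbitrary assumption):

```python
from __future__ import annotations

import heapq


def igp_distances(links: dict[str, dict[str, int]], source: str) -> dict[str, int]:
    """Dijkstra shortest-path costs from source over IGP link metrics."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry; a shorter path was already found
        for neighbor, cost in links.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def shortest_exit(links: dict[str, dict[str, int]], source: str, exits: list[str]) -> str:
    """Hot-potato choice: the peering exit nearest to source by IGP cost."""
    dist = igp_distances(links, source)
    return min(exits, key=lambda e: dist.get(e, float("inf")))
```

With peering in LA and NYC, a router in Denver hands traffic off in LA while a router in Chicago hands the same traffic off in NYC, which is exactly the "closest exit" behavior. When two exits come out at the same distance, `min` simply returns the first listed, whereas BGP would move on to its later tie-break steps.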

Ralph Doncaster

7:31 p.m.


I guess the final question is -- why is anyone concerned about best-exit at all? Doesn't shortest-exit still get the traffic there? I'm willing to bet there are a lot of different answers to all these questions.

Some networks will supposedly relax their peering requirements if you do best-exit. Also, for some networks shortest-exit results in pipes with large traffic flows in one direction and not the other, so using best-exit may not require any increase in backbone capacity. -Ralph

Stephen J. Wilcox

10:14 p.m.


I guess the final question is -- why is anyone concerned about best-exit at all? Doesn't shortest-exit still get the traffic there? I'm willing to bet there are a lot of different answers to all these questions.

-dre

Hmm, I have this: paths of equal length in terms of geography and hops through my network, but capacity is an issue. Traffic sometimes becomes imbalanced, and I'd like to be able to indicate which way I want to be receiving it... trouble is, it seems no one listens to my MEDs!

Steve

Daniel Golding

27 Jun

5:37 p.m.

Andre,

What Avi meant is that when you use routing policy (like route-maps or the equivalent) to set additive MEDs between POPs, only do it on egress from all POPs or on ingress to all POPs. Don't do it on routes both ways. Look at slide 35: it has all the MEDs being added as "from" route-maps, as opposed to both "from" and "to".

Here is an example: I have POPs in NYC, Chicago, and Seattle. I have routes in BGP being announced from NYC, with a MED of +100 being tacked on as they leave the NYC POP. I then add an additional MED of +200 when they leave the Chicago POP, heading for Seattle. This is a cost metric, so higher is "worse". If I had route-maps adding more MED cost upon ingress to the Chicago and Seattle POPs, in addition to on egress from the NYC and Chicago POPs, I would be adding twice as much to the metrics. It just doesn't make much sense, and it is twice the number of values to control when you are adjusting them.

Of course, this is all about generating meaningful MEDs on your own network for your own purposes, and for those of your customers and peers. It doesn't really have to do with cold-potato routing of others' traffic on your network (although it does let people cold-potato route your traffic on THEIR networks).

Another valid approach for doing this sort of thing is setting your MEDs to be the same as your IGP metrics to the next hops of the BGP routes; there are "shortcut" commands for doing this. Of course, your mileage may vary.

- Daniel Golding
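Daniel's NYC/Chicago/Seattle arithmetic can be checked with a short Python sketch. It is not router configuration: the POP abbreviations, the `link_cost` table and the two flags are hypothetical stand-ins for route-maps applied in one direction or in both, and it assumes the ingress route-maps would add the same per-hop cost as the egress ones.

```python
from __future__ import annotations


def med_at(path: list[str], link_cost: dict[tuple[str, str], int], *,
           start_med: int = 0, on_egress: bool = True, on_ingress: bool = False) -> int:
    """MED a route carries after crossing the POPs in path, in order.

    link_cost[(a, b)] is the cost added for the hop from POP a to POP b.
    on_egress models route-maps applied as the route leaves a POP;
    on_ingress models route-maps applied as it enters the next POP.
    Enabling both counts every hop twice.
    """
    med = start_med
    for from_pop, to_pop in zip(path, path[1:]):
        cost = link_cost[(from_pop, to_pop)]
        if on_egress:
            med += cost   # added leaving from_pop
        if on_ingress:
            med += cost   # added again entering to_pop
    return med
```

Egress-only gives 100 in Chicago and 300 in Seattle for a route announced from NYC; turning on both directions doubles Seattle's value to 600 and leaves twice as many knobs to keep in sync, which is the point of applying MEDs in one direction only.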


Mathew Richardson

26 Jun

10:12 p.m.

Leo Bicknell <bicknell@ufp.org> [Wed, Jun 26, 2002 at 02:35:55PM -0400]:

If, by strange chance, you have equal IGP costs to two peering points with equal MEDS, then it will choose the one with the lower router ID.

<snip>

In the interest of accuracy, it's worth noting that some vendors will choose the one with the lower router ID, and others will choose the route that was learned first (at least by default), despite documentation to the contrary.

mrr

Nick Feamster

1 Jul

5:44 p.m.

More detail on how Cisco does this at:
http://www.cisco.com/warp/public/459/25.shtml

Specifically, see step 10:

"10. When both paths are external, prefer the path that was received first (the oldest one). This step minimizes route-flap, since a newer path won't displace an older one, even if it was the preferred route based on additional decision criteria, as described in steps 11, 12, and 13.

Skip this step if any of the following is true:

* The bgp best path compare-routerid command is enabled.
  Note: This command was introduced in Cisco IOS® Software Releases 12.0.11S, 12.0.11SC, 12.0.11S3, 12.1.3, 12.1.3AA, 12.1.3.T, and 12.1.3.E.
* The router ID is the same for multiple paths, since the routes were received from the same router.
* There is no current best path. An example of losing the current best path occurs when the neighbor offering the path goes down."

-Nick
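The vendor difference Mathew and Nick describe can be sketched as a single switch in Python. This models only the step 10 text quoted above plus a lowest-router-ID fallback, under the assumption that every earlier step (including MED and IGP cost) already tied; the `ExternalPath` fields and flag names are hypothetical, and this is not a description of any particular IOS release.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class ExternalPath:
    exit_city: str
    router_id: str
    received_at: float   # when the path was learned; smaller means older


def final_tiebreak(paths: list[ExternalPath], *, has_current_best: bool = True,
                   compare_router_id: bool = False) -> ExternalPath:
    """Choose among external paths that tie on every earlier step.

    Prefer the oldest path unless the compare-routerid knob is on, all paths
    share a router ID, or there is no current best path; in those cases fall
    through to the lowest router ID. Later steps are not modeled.
    """
    same_router_id = len({p.router_id for p in paths}) == 1
    if compare_router_id or same_router_id or not has_current_best:
        return min(paths, key=lambda p: ipaddress.IPv4Address(p.router_id))
    return min(paths, key=lambda p: p.received_at)
```

With an older path from router ID 10.0.0.2 and a newer one from 10.0.0.1, the default keeps the older path, while enabling the router-ID comparison (or losing the current best path) selects the lower router ID. That is the same pair of outcomes Mathew reports seeing from different vendors.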


12 comments


participants (11)

Clayton Fiske
Daniel Golding
dre
E.B. Dreger
Greg Maxwell
Jared Mauch
Leo Bicknell
Mathew Richardson
Nick Feamster
Ralph Doncaster
Stephen J. Wilcox