Re: route for linx.net in Level3?
Leo Bicknell wrote:
Even if the exchange does not advertise the exchange LAN, it's probably the case that it is in the IGP (or at least IBGP) of everyone connected to it, and by extension all of their customers with a default route pointed at them.

Actually, that may not be the case, and probably *should* not be the case. Here's why, in a nutshell:

If two regional ISPs on opposite sides of the planet point default to the same Global ISP, even if they do not peer with that ISP, by using the IX next-hop at IX A (for ISP A) and at IX B (for ISP B), then the Global ISP is now giving free on-net transit to A and B.

So it turns out that pretty much the only way to prevent this at a routing level is to not carry IXP networks (in IGP or IBGP), but rather to do next-hop-self. The other way is to filter at a packet level on ingress, based on Layer 2 information, which on many kinds of IX-capable hardware is actually impossible.

So, when it comes to IXPs: Next-Hop-Self.

(BCP 38 actually doesn't even enter into it, oddly enough.)

Brian
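One way to read the argument above is as a plain longest-prefix-match question: does a core router of the default-free Global ISP have any route covering an address on the remote exchange LAN? The sketch below is only an illustration under assumed values. The prefixes are documentation ranges, not real IXP ranges, and the router name and the `lookup` helper are hypothetical.

```python
import ipaddress

def lookup(table, dst):
    """Longest-prefix match: return the next hop for dst, or None if no route."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Hypothetical addressing (documentation prefixes, not real IXP ranges).
IX_B_LAN = "198.51.100.0/24"      # exchange LAN at IX B
ISP_B_ON_IX_B = "198.51.100.7"    # ISP B's address on that LAN
PEER_PREFIX = "203.0.113.0/24"    # a prefix learned from an actual peer at IX B

# Core router of the Global ISP with the IX LAN carried internally (IGP or IBGP).
carries_ix_lan = {
    IX_B_LAN: "border-router-at-IX-B",
    PEER_PREFIX: "border-router-at-IX-B",
}

# Same router when the border does next-hop-self: peer routes only, no IX LAN.
next_hop_self = {
    PEER_PREFIX: "border-router-at-IX-B",
}

# A packet addressed to ISP B's exchange address is carried across the backbone
# in the first case and has no route (default-free network) in the second.
print(lookup(carries_ix_lan, ISP_B_ON_IX_B))
print(lookup(next_hop_self, ISP_B_ON_IX_B))
# Routes learned from real peers still resolve normally.
print(lookup(next_hop_self, "203.0.113.9"))
```

The point of the sketch is only that removing the exchange LAN from the internal tables removes reachability to exchange addresses without touching routes learned from peers; it does not model L2 filtering or any particular vendor's behaviour.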
On 2013-04-04, at 15:53, Brian Dickson <brian.peter.dickson@gmail.com> wrote:
Leo Bicknell wrote:
Even if the exchange does not advertise the exchange LAN, it's probably the case that it is in the IGP (or at least IBGP) of everyone connected to it,
I have experience of several networks where that is not the case. IGP carries routes for loopback and internal-facing interfaces; external-facing interface routes are only known to the local router; pervasive next-hop-self for IBGP. So, no great survey, but don't assume that everybody does things the same way. Joe
Even if the exchange does not advertise the exchange LAN, it's probably the case that it is in the IGP (or at least IBGP) of everyone connected to it,
yikes! this is quite ill-advised and i don't know anyone who does this, but i think all my competitors should.
I have experience of several networks where that is not the case. IGP carries routes for loopback and internal-facing interfaces;
i have seen some carry external because, for some reason, they do not want to re-write next-hop at the border. randy
On Thu, Apr 4, 2013 at 1:43 PM, Randy Bush <randy@psg.com> wrote:
Even if the exchange does not advertise the exchange LAN, it's probably the case that it is in the IGP (or at least IBGP) of everyone connected to it,
yikes! this is quite ill-advised and i don't know anyone who does this, but i think all my competitors should.
It's more common than uncommon.

At WIX (Wellington), 64 out of 93 members will carry packets destined to APE (Auckland Exchange). (source: http://conference.apnic.net/__data/assets/pdf_file/0018/50706/apnic34-mike-j...) and this is just New Zealand!

Just checked a few exchanges; not only are the IXP ranges being carried, they're being leaked:

Equinix SG:

$ bgpctl show rib 202.79.197.0/24
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination        gateway  lpref  med  aspath origin
      202.79.197.0/24             100    0    13335 23947 23947 ?
      202.79.197.0/24             100    0    13335 10026 i

Any2 LA:

$ bgpctl show rib 206.223.143.0/24
flags: * = Valid, > = Selected, I = via IBGP, A = Announced
origin: i = IGP, e = EGP, ? = Incomplete

flags destination        gateway  lpref  med  aspath origin
      206.223.143.0/24            100    0    13335 9304 i
      206.223.143.0/24            100    0    13335 9304 i
      206.223.143.0/24            100    0    13335 4635 9304 i
      206.223.143.0/24            100    0    13335 9304 i
I have experience of several networks where that is not the case. IGP carries routes for loopback and internal-facing interfaces;
i have seen some carry external because, for some reason, they do not want to re-write next-hop at the border.
randy
On Thu, Apr 4, 2013 at 1:43 PM, Randy Bush <randy@psg.com> wrote:
Even if the exchange does not advertise the exchange LAN, it's probably the case that it is in the IGP (or at least IBGP) of everyone connected to it,
yikes! this is quite ill-advised and i don't know anyone who does this, but i think all my competitors should.
It's more common than uncommon.
At WIX (Wellington), 64 out of 93 members will carry packets destined to APE (Auckland Exchange). (source: http://conference.apnic.net/__data/assets/pdf_file/0018/50706/apnic34-mike-j...) and this is just New Zealand!
Just checked a few exchanges; not only are the IXP ranges being carried, they're being leaked:
i am not unhappy by the exchange mesh being carried within a member and being propagated to their customer cone, see my nanog preso of feb 1997 and leo's recent post. it's putting such things in one's igp that disgusts me. as joe said, igp is just for the loopbacks and other interfaces it takes to make your ibgp work. randy
In a message written on Fri, Apr 05, 2013 at 10:01:34AM +0900, Randy Bush wrote:
it's putting such things in one's igp that disgusts me. as joe said, igp is just for the loopbacks and other interfaces it takes to make your ibgp work.
While your method is correct for probably 80-90% of the ISP networks, the _why_ people do that has almost been lost to the mists of time. I'm sure Randy knows what I'm about to type, but for the rest of the list...

The older school of thought was to put all of the edge interfaces into the IGP, and then carry all of the external routes in BGP. This caused a one-level recursion in the routers:

eBGP Route->IXP w/IGP Next Hop->Output Interface

The Internet then became a thing, and there started to be a lot of BGP-speaking customers (woohoo! T1s for everyone!), and thus lots of edge /30s in the IGP. The IGP convergence time quickly got very, very bad. I think a network or two may have even broken an IGP.

The "solution" was to take edge interfaces (really "redistribute connected" for most people) and move them from the IGP to BGP, and to make that work BGP had to set "next-hop-self" on the routes. The exchange /24 would now appear in BGP with a next hop of the router loopback; the router itself knew it was directly connected. A side effect is that this caused a two-step lookup in BGP:

eBGP-Route->IXP w/Router Loopback Next Hop->Loopback w/IGP Next Hop->Output Interface

IGPs went from O(bgp_customers) routes to O(router) routes, and stopped falling over and converged much faster. On the flip side, every RIB->FIB operation now has to go through an extra step of recursion for every route, taking BGP resolution from O(routes) to O(routes * 1.1ish).

Since all this happened, CPUs have gotten much faster and RAM has gotten much larger. Most people have never revisited the problem, the scaling of IGPs, or what hardware can do today. There are plenty of scenarios where the "old way" works just spiffy, and can have some advantages. For a network with a very low number of BGP speakers the faster convergence of the IGP may be desirable.

Not every network is built the same, or has the same scaling properties. What's good for a CDN may not be good for an access ISP, and vice versa, for example.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
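The two lookup chains described in the message above can be mimicked with a toy table that maps each entry to whatever it resolves through. This is a rough sketch, not how any real router stores its RIB: the entry names, the `resolve` and `igp_route_count` helpers, and the router and edge-link counts are all made up for illustration, and internal links are ignored in the IGP count.

```python
def resolve(rib, key):
    """Follow next hops until reaching something with no further entry (the output interface)."""
    chain = [key]
    while key in rib:
        key = rib[key]
        chain.append(key)
    return chain

# "Old way": the exchange LAN sits in the IGP and resolves straight to an interface.
old_way = {
    "eBGP route": "IXP address",
    "IXP address": "output interface",
}

# Next-hop-self way: the exchange /24 is in BGP with the border router's
# loopback as next hop, and only the loopback is in the IGP.
nhs_way = {
    "eBGP route": "IXP address",
    "IXP address": "border loopback",
    "border loopback": "output interface",
}

def recursion_steps(rib, key):
    """Intermediate hops between the route itself and the final output interface."""
    return len(resolve(rib, key)) - 2

def igp_route_count(routers, edge_links, edges_in_igp):
    """Very rough IGP size: one loopback per router, plus edge links if they are carried."""
    return routers + (edge_links if edges_in_igp else 0)

print(" -> ".join(resolve(old_way, "eBGP route")))
print(" -> ".join(resolve(nhs_way, "eBGP route")))

# Hypothetical network: 50 routers, 2000 customer edge links.
print(igp_route_count(50, 2000, edges_in_igp=True))
print(igp_route_count(50, 2000, edges_in_igp=False))
```

The trade-off in the message shows up directly: the next-hop-self table needs one more resolution step per route, while the IGP shrinks from roughly the number of customer links to roughly the number of routers.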
The older school of thought was to put all of the edge interfaces into the IGP, and then carry all of the external routes in BGP.

I thought people were doing it because the IGP converged faster than iBGP, so that in case of an external link failure the ingress PE was informed via the IGP that it had to find an alternate next-hop. Though now, with the advent of BGP PIC, this is not an argument anymore.
adam
In a message written on Fri, Apr 05, 2013 at 09:32:52AM +0200, Adam Vitkovsky wrote:
I thought people were doing it because IGP converged faster than iBGP and in case of an external link failure the ingress PE was informed via IGP that it has to find an alternate next-hop. Though now with the advent of BGP PIC this is not an argument anymore.
You're talking about stuff that's all 7-10 years after the decisions were made that I described in my previous e-mail. Tag switching (now MPLS) had not yet been invented/deployed when the first "next-hop-self" wave occurred; it was all about scaling both the IGP and BGP.

In some MPLS topologies it may speed re-routing to have edge interfaces in the IGP due to the faster convergence of IGPs. YMMV, Batteries not Included, Some Assembly Required.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
participants (6)
- Adam Vitkovsky
- Brian Dickson
- Joe Abley
- Leo Bicknell
- Randy Bush
- Tom Paseka