Hi all,
Happy new year...
I have a question regarding multi-homing, mostly from stub network's operational point of view. My big question is: what kind of failures do you usually see from your providers? Link down? Link up, but withdraw some routes? Link up, no route change, but blackholing partial or all traffic? Anything else?
Let's say that I have two local routers (Ra and Rb) connecting to two providers, A and B. If router Ra sees provider A with problems of the first two cases (link down, link up but withdraw routes), the Rb can easily step up. My question is, if I am using provider A as the default, but provider A has the third problem (link up, no route change, but blackholing traffic), how can I detect it and switch provider automatically?
To state this problem in detail: I use a static default route on Ra to forward traffic to provider A, or receive 0/0 from provider A via BGP. For some reason, provider A can no longer reach a /24. My network cannot be notified (unless, I receive a full internet routing table). In this case, all I know is that my traffic to /24 is blackholed through provider A. In this case, is there an automatic way for my stub network to switch over to provider B? Do I have to do the detection and switch over manually? I don't think VRRP can help here, right?
Thanks.
-Simon
Simon Chen wrote:
Hi all,
Happy new year...
I have a question regarding multi-homing, mostly from stub network's operational point of view. My big question is: what kind of failures do you usually see from your providers? Link down? Link up, but withdraw some routes? Link up, no route change, but blackholing partial or all traffic? Anything else?
I am a multihomed network with no downstream customers. Speaking only for myself over the last 5 years I have only had loss of link conditions as the majority problem such as:
* DLCI deleted (LEC "accidentally canceled" a FRT1 once)
* Loss of signal (almost always LEC problems)
* Loss of frame (almost always long haul problems)
It's worth noting all my circuits are T1, T3, or OC-x and less likely to have an "up/up but not passing traffic" state like an Ethernet handoff could do.
And only once:
* Sprint vs. Cogent peering spat (I'm a Sprint customer)
The last one would have been a huge problem for default route or single homed users - and why I always recommend full tables - but for me I didn't care since the affected paths disappeared via Sprint but were still there via my other upstream.
To state this problem in detail: I use a static default route on Ra to forward traffic to provider A, or receive 0/0 from provider A via BGP. For some reason, provider A can no longer reach a /24. My network cannot be notified (unless, I receive a full internet routing table). In this case, all I know is that my traffic to /24 is blackholed through provider A. In this case, is there an automatic way for my stub network to switch over to provider B? Do I have to do the detection and switch over manually? I don't think VRRP can help here, right?
You're asking for what BGP does. You could ping every prefix you care about and do it by hand, I guess. If this is a major concern for you I'd say full tables are in your best interest so you can let BGP do what it does best.
(Disclaimer: there may be some trick I'm not aware of because I always prefer to let BGP do its job.)
~Seth
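The "ping every prefix you care about" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anyone's production method: the `TARGETS` table, the probe port, and the helper names are hypothetical, the addresses come from documentation ranges, and it assumes that traffic sourced from `source_ip` is routed out via provider A (for example by policy routing). It uses a TCP connect rather than ICMP so it can run without raw-socket privileges.

```python
import socket
from typing import Callable, Dict, List

# Hypothetical table: one known-responsive host per prefix you care about.
# Documentation addresses only; replace with real targets.
TARGETS: Dict[str, str] = {
    "192.0.2.0/24": "192.0.2.10",
    "198.51.100.0/24": "198.51.100.10",
}


def tcp_probe(host: str, source_ip: str, port: int = 443,
              timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds from source_ip.

    Assumption: packets sourced from source_ip leave via provider A.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=(source_ip, 0)):
            return True
    except OSError:
        return False


def unreachable_prefixes(targets: Dict[str, str],
                         probe: Callable[[str], bool]) -> List[str]:
    """Return the prefixes whose probe target did not answer, sorted."""
    return sorted(prefix for prefix, host in targets.items()
                  if not probe(host))
```

A caller would pass something like `lambda host: tcp_probe(host, "203.0.113.5")` as `probe` and then decide what to do with the returned prefixes, such as alerting or installing more specific routes toward provider B. A single failed probe is weak evidence, so some repetition before acting is probably wise.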
Simon-
We do exactly what you are trying to accomplish. We have two routers and two providers. Provider A is our primary and we receive partial routes from them (no static route). Then Router B is connected to Provider B with no default route (basically it looks like we are not advertising to them). Our AS on Router B is prepended several times. Router A and B are connected via iBGP to each other. Then, using interface tracking (we are a Cisco shop) we can fail to Provider B. So, about the only failure we cannot automatically recover from is if our Router A interface / layer 1 to Provider A starts to fail and we get enough traffic through to keep BGP up, but errors make IP traffic fail.
This failover has worked several times while in production. Mostly we see our BGP drop from Provider A, but we have also seen link down from Provider A. In testing we failed links and routers, which always recovered just fine. But we all know the lab can be completely different from the real world.
If you want to see how this works for us, go to bgplay.com and enter the following:
Network: 67.135.55.0/24
Start: 26/12/2009 20:00:00 End: 27/12/2009 07:00:00
Pull out 19629 (ME) 209 (Qwest, provider A) 7263 (GoFast. Dba Sungard, provider B)
At about 20:11 you see the routes start failing over to AS7263 and then at about 6:23 the next day they start failing back.
This example happened when Qwest lost an edge router in Minnesota. Link status was up, but BGP tables were lost, so we had no route out to Qwest.
Dylan Ebner, Network Engineer Consulting Radiologists, Ltd. 1221 Nicollet Mall, Minneapolis, MN 55403 ph. 612.573.2236 fax. 612.573.2250 dylan.ebner@crlmed.com www.consultingradiologists.com
-----Original Message-----
From: Simon Chen [mailto:simonchennj@gmail.com]
Sent: Wednesday, December 30, 2009 11:03 AM
To: nanog@nanog.org
Subject: question regarding multi-homing
If you are using Cisco...
http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6554/ps6599/ps8...
-- To him who is able to keep you from falling and to present you before his glorious presence without fault and with great joy
On Wed, Dec 30, 2009 at 12:02 PM, Simon Chen <simonchennj@gmail.com> wrote:
I have a question regarding multi-homing, mostly from stub network's operational point of view. My big question is: what kind of failures do you usually see from your providers? Link down? Link up, but withdraw some routes? Link up, no route change, but blackholing partial or all traffic? Anything else?
Two more failure modes:
Link up, receiving all routes but provider stops propagating your announcement outward.
Link up but unusably high packet loss to some or all destinations.
Regards, Bill Herrin
-- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
Link up, receiving all routes but provider stops propagating your announcement outward.
Longer AS path prepending on your secondary connection should take care of this, eh? Might end up with asymmetric routing but better than no traffic being returned.
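As a rough illustration of why prepending on the secondary helps here, the sketch below models a remote network that simply prefers the shortest AS path. This is a deliberate simplification: real BGP evaluates local preference before AS path length, and a remote operator's policy can override either. The AS numbers (documentation range) and the `best_path` helper are hypothetical.

```python
from typing import Dict, List, Optional


def best_path(paths: Dict[str, List[int]]) -> Optional[str]:
    """Pick the upstream with the shortest AS path (ties broken by name).

    Simplified model: only AS path length is compared.
    """
    if not paths:
        return None
    return min(sorted(paths), key=lambda name: len(paths[name]))


# Our AS is 64500 (hypothetical). Provider A is 64510, provider B is 64511.
# The announcement via B is prepended three extra times.
seen_remotely = {
    "A": [64510, 64500],
    "B": [64511, 64500, 64500, 64500, 64500],
}

normal = best_path(seen_remotely)

# Provider A silently stops propagating our announcement.
del seen_remotely["A"]
after_failure = best_path(seen_remotely)
```

In this model return traffic arrives via provider A while both announcements are visible, and shifts to provider B once A's copy disappears, which matches the asymmetric-but-working outcome described above.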
Link up but unusably high packet loss to some or all destinations.
Assuming Cisco hardware, what is the best way to handle this? Set up some IP SLAs and bind them to tracking objects? Use EEM and Tcl scripting?
Thanks!
Jason
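Whatever mechanism does the probing, the "unusably high packet loss" case usually comes down to a tracked up/down state driven by a loss threshold, with some hysteresis so the path does not flap. The Python sketch below shows that logic only. It is not Cisco IP SLA or EEM syntax, and the `LossTracker` class, window size, and thresholds are assumptions chosen for illustration.

```python
from collections import deque


class LossTracker:
    """Track a path as up or down from a sliding window of probe results."""

    def __init__(self, window: int = 20, down_loss: float = 0.3,
                 up_loss: float = 0.05) -> None:
        self.results = deque(maxlen=window)  # True = probe answered
        self.down_loss = down_loss           # go down at or above this loss
        self.up_loss = up_loss               # come back at or below this loss
        self.up = True

    def record(self, success: bool) -> bool:
        """Add one probe result and return the current up/down state."""
        self.results.append(success)
        # Only judge once the window is full, to avoid reacting to one probe.
        if len(self.results) == self.results.maxlen:
            loss = self.results.count(False) / len(self.results)
            if self.up and loss >= self.down_loss:
                self.up = False
            elif not self.up and loss <= self.up_loss:
                self.up = True
        return self.up
```

The gap between `down_loss` and `up_loss` is the hysteresis: the path is declared down quickly when loss is high, but only restored once loss has stayed low for most of a window. Detection time is then roughly the probe interval times the number of lost probes needed to cross the threshold.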
On Wed, Dec 30, 2009 at 1:25 PM, William Herrin <herrin-nanog@dirtside.com> wrote:
On Wed, Dec 30, 2009 at 12:02 PM, Simon Chen <simonchennj@gmail.com> wrote:
I have a question regarding multi-homing, mostly from stub network's operational point of view. My big question is: what kind of failures do you usually see from your providers? Link down? Link up, but withdraw some routes? Link up, no route change, but blackholing partial or all traffic? Anything else?
Two more failure modes:
Link up, receiving all routes but provider stops propagating your announcement outward.
Link up but unusably high packet loss to some or all destinations.
Regards, Bill Herrin
-- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
Thank you all for the replies! It seems to me that Cisco performance based routing and other commercial solutions can probably handle the potential problems.
How about operators that deal with this on their own? Is there a standard detection and recovery procedure? How long does it usually take, with or without scripting?
Thanks!
-Simon
participants (6)
- Dylan Ebner
- Jason Shearer
- Seth Mattinen
- Simon Chen
- Steven Fischer
- William Herrin