So, we have two upstreams, both coming in on Ethernet. One of our switches crashed and rebooted itself. Although we have other paths to egress out of the network, because the router's Ethernet interface didn't go down, our router's BGP didn't realize the neighbor was down until the default BGP timeout was reached. Our upstream connectivity was out for a couple of minutes.
I am looking for ways to detect a neighbor being down faster so traffic can be re-routed faster. I can do BFD internally, but the issue is how the upstream is going to detect the outage and stop routing our traffic to that downed link. I have asked both of my upstreams; one said they don't do anything like that, and I am still waiting on an answer from the second.
My question is, do other carriers do BFD or use any other means to detect the neighbor being down faster than normal BGP will allow? (Both upstreams are major telcos [AT&T and Qwest], so I think they are less flexible than some others.)
Or, has anyone succeeded in getting something done with those two carriers?
Thanks!
In my experience this is a pretty common problem with carrier Ethernet links where the interface is always "up" unless the directly connected switch/mux fails. Even then, it may still keep the port up through reboots. I like how Ethernet is cheap, but I hate how it lacks simple things like "link is down if any segment of the L1 or L2 between endpoints faults" that you get without silly tricks on a DSx or OC-x. (Then again, I suppose you're paying for that capability if it's important enough.) ~Seth
Yes, I understand BFD. The question is, do carriers usually do BFD with customers? And if they say no, are there other remedies? AT&T doesn't seem to be willing even to change BGP timers. If anyone has been able to talk AT&T or Qwest into doing so, it would really help to find out how they convinced them. They are such big bureaucracies that it's frustrating to get anything done that makes sense, although Qwest seems a lot more responsive than AT&T.
On Tue, May 11, 2010 at 5:59 PM, Randy Bush <randy@psg.com> wrote:
I am looking for ways to detect neighbor being down faster so traffic can be re-routed faster.
BFD
On Tue, May 11, 2010 at 09:31:51PM -0400, Jay Nakamura wrote:
Yes, I understand BFD. The question is, do carriers usually do BFD with customers? And if they say no, are there other remedies? AT&T doesn't seem to be willing even to change BGP timers. If anyone has been able to talk AT&T or Qwest into doing so, it would really help to find out how they convinced them. They are such big bureaucracies that it's frustrating to get anything done that makes sense, although Qwest seems a lot more responsive than AT&T.
Slow-as-the-Titanic carriers won't do anything innovative for anyone, regardless of the benefit. Try a clueful carrier and they'll be happy to run BFD with you. Of course, after promoting it for more than a year now, we have something like 5 peers and 0 customers using it (mostly because of broken vendor implementations), but hey, it's never too late to start. :)
--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
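To put rough numbers on the trade-off discussed above, here is a minimal sketch comparing worst-case failure detection under plain BGP timers with BFD. The thread does not state the timer values in use: the 180-second hold time is a commonly cited vendor default that fits the "couple of minutes" outage, and the 10-second proposed hold time and the 300 ms x 3 BFD settings are assumptions chosen purely for illustration. The negotiation rule (each side uses the smaller of the two advertised hold times) is from RFC 4271; the BFD formula is simplified to interval times multiplier.

```python
def negotiated_hold_time(local_s: int, peer_s: int) -> int:
    """RFC 4271: a session uses the smaller of the two advertised hold times.

    A hold time of zero (keepalives disabled) is not handled in this sketch.
    """
    return min(local_s, peer_s)


def bgp_worst_case_detection_s(hold_s: int) -> int:
    """With no link-down signal, a silent peer is noticed at hold-timer expiry."""
    return hold_s


def bfd_detection_s(interval_ms: int, multiplier: int) -> float:
    """Simplified BFD detection time: packet interval times detect multiplier."""
    return interval_ms * multiplier / 1000.0


if __name__ == "__main__":
    # Assumed values, not taken from the thread.
    default_hold = 180           # seconds, commonly cited vendor default
    lowered_hold = negotiated_hold_time(local_s=10, peer_s=default_hold)

    print("default BGP hold timer :", bgp_worst_case_detection_s(default_hold), "s")
    print("hold time lowered to 10:", bgp_worst_case_detection_s(lowered_hold), "s")
    print("BFD at 300 ms x 3      :", bfd_detection_s(300, 3), "s")
```

Because the smaller hold time wins the negotiation, lowering it on the customer side may shorten detection in both directions even when the carrier leaves its own configuration alone. Whether a given carrier accepts a low value, or enforces a minimum, is not something this thread confirms.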
What about IP SLA with some EEM? This link may give you some ideas: http://blog.ioshints.info/2008/01/ospf-default-route-based-on-ip-sla.html
Frank
-----Original Message-----
From: Jay Nakamura [mailto:zeusdadog@gmail.com]
Sent: Tuesday, May 11, 2010 1:35 PM
To: NANOG
Subject: BGP and convergence time
So, we have two upstreams, both coming in on Ethernet. One of our switches crashed and rebooted itself. Although we have other paths to egress out of the network, because the router's Ethernet interface didn't go down, our router's BGP didn't realize the neighbor was down until the default BGP timeout was reached. Our upstream connectivity was out for a couple of minutes.
I am looking for ways to detect a neighbor being down faster so traffic can be re-routed faster. I can do BFD internally, but the issue is how the upstream is going to detect the outage and stop routing our traffic to that downed link. I have asked both of my upstreams; one said they don't do anything like that, and I am still waiting on an answer from the second.
My question is, do other carriers do BFD or use any other means to detect the neighbor being down faster than normal BGP will allow? (Both upstreams are major telcos [AT&T and Qwest], so I think they are less flexible than some others.)
Or, has anyone succeeded in getting something done with those two carriers?
Thanks!
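The idea behind the IP SLA plus EEM suggestion is probe-based tracking: send a periodic reachability probe toward the upstream, and trigger an action once enough consecutive probes fail. The sketch below models only that decision logic in Python. It is not IOS configuration; the class name, thresholds, and probe results are all hypothetical, and the probes are simulated rather than sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeTracker:
    """Declare a peer down or up after N consecutive probe failures or successes."""
    down_after: int = 3   # hypothetical threshold: failures before declaring down
    up_after: int = 2     # hypothetical threshold: successes before declaring up
    state: str = "up"
    fails: int = 0
    oks: int = 0

    def record(self, probe_ok: bool) -> Optional[str]:
        """Feed one probe result; return "down" or "up" on a state change, else None."""
        if probe_ok:
            self.oks += 1
            self.fails = 0
            if self.state == "down" and self.oks >= self.up_after:
                self.state = "up"
                return "up"
        else:
            self.fails += 1
            self.oks = 0
            if self.state == "up" and self.fails >= self.down_after:
                self.state = "down"
                return "down"
        return None


if __name__ == "__main__":
    tracker = ProbeTracker()
    # Simulated probe results: one success, three failures, two successes.
    for result in [True, False, False, False, True, True]:
        event = tracker.record(result)
        if event is not None:
            # On a real router this is where the session would be torn down or restored.
            print("state change:", event)
```

Requiring several consecutive results in each direction keeps a single lost probe from flapping the session. Note that this covers only the customer's outbound decision; as raised in the original question, the upstream still needs its own way to notice the failure and stop sending traffic down the dead link.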
I believe I have narrowed the problem down to layer 2. A ping to address 224.0.0.5 shows no reply. I believe the problem has to do with multicast being blocked.
Regards, Shake
On Fri, May 14, 2010 at 5:28 AM, Frank Bulk <frnkblk@iname.com> wrote:
What about IP SLA with some EEM? This link may give you some ideas: http://blog.ioshints.info/2008/01/ospf-default-route-based-on-ip-sla.html
Frank
Apologies, kindly ignore my earlier response.
Rgrds, Shake
On Fri, May 14, 2010 at 3:46 PM, shake righa <ssrigha@gmail.com> wrote:
I believe I have narrowed the problem down to layer 2.
A ping to address 224.0.0.5 shows no reply.
I believe the problem has to do with multicast being blocked.
Regards, Shake
participants (6)
- Frank Bulk
- Jay Nakamura
- Randy Bush
- Richard A Steenbergen
- Seth Mattinen
- shake righa