endpoint liveness (RE: Do ATM-based Exchange Points make sense anymore?)
BGP keepalive/hold timers are configurable even down to granularity of link or PVC level keepalives, but for session stability reasons, it appears that most ISPs at GigE exchanges choose not to tweak them down from the defaults. IIRC, Juniper is 30/90 and Cisco is 60/180. My gut feel was that even something like 10/30 would be reasonable, but nobody seems compelled that this is much of an issue.
Cheers, -Lane
-----Original Message----- From: Petri Helenius [mailto:pete@he.iki.fi] Sent: Friday, August 09, 2002 3:07 PM To: Mikael Abrahamsson; nanog@merit.edu Subject: Re: Do ATM-based Exchange Points make sense anymore?
What functionality does PVC give you that the ethernet VLAN does not?
That's quite easy. Endpoint liveness. An IPv4 host on a VLAN has no idea if the guy on the "other end" died until the BGP timer expires.
FR has LMI, ATM has OAM. (and ILMI)
Pete
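For a rough sense of what the timer pairs mentioned above mean for noticing a dead peer, here is a minimal sketch. It assumes an otherwise idle session where only keepalives restart the hold timer, and it ignores timer jitter and transit delay. The function name `detection_window` is made up for illustration, and the vendor defaults are the ones quoted from memory in the thread, not verified values.

```python
def detection_window(keepalive_s, hold_s):
    """Return (shortest, longest) seconds between a peer dying silently
    and the local hold timer expiring.

    Shortest: the peer dies just before its next keepalive was due, so the
    hold timer has already been running for about keepalive_s.
    Longest: the peer dies right after sending a keepalive, so the full
    hold time still has to elapse.
    """
    if keepalive_s <= 0 or hold_s <= keepalive_s:
        raise ValueError("need 0 < keepalive < hold")
    return (hold_s - keepalive_s, hold_s)


# Keepalive/hold pairs in seconds, as quoted in the thread (unverified).
TIMERS = {
    "Cisco default (IIRC)": (60, 180),
    "Juniper default (IIRC)": (30, 90),
    "suggested 10/30": (10, 30),
}

for label, (keepalive, hold) in TIMERS.items():
    shortest, longest = detection_window(keepalive, hold)
    print(f"{label}: dead peer noticed after {shortest}-{longest} s")
```

Under these assumptions, even the 10/30 setting leaves a window of 20 to 30 seconds during which traffic is still sent toward a dead neighbour.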
It makes little sense to detect transient glitches. Any possible reaction to those glitches (i.e. withdrawal of exterior routes with subsequent reinstatement) is more damaging than the glitches themselves.
--vadim
On Fri, 9 Aug 2002, Lane Patterson wrote:
BGP keepalive/hold timers are configurable even down to granularity of link or PVC level keepalives, but for session stability reasons, it appears that most ISPs at GigE exchanges choose not to tweak them down from the defaults. IIRC, Juniper is 30/90 and Cisco is 60/180. My gut feel was that even something like 10/30 would be reasonable, but nobody seems compelled that this is much of an issue.
Cheers, -Lane
-----Original Message----- From: Petri Helenius [mailto:pete@he.iki.fi] Sent: Friday, August 09, 2002 3:07 PM To: Mikael Abrahamsson; nanog@merit.edu Subject: Re: Do ATM-based Exchange Points make sense anymore?
What functionality does PVC give you that the ethernet VLAN does not?
That's quite easy. Endpoint liveness. An IPv4 host on a VLAN has no idea if the guy on the "other end" died until the BGP timer expires.
FR has LMI, ATM has OAM. (and ILMI)
Pete
Thus spake "Vadim Antonov" <avg@exigengroup.com>
It makes little sense to detect transient glitches. Any possible reaction to those glitches (i.e. withdrawal of exterior routes with subsequent reinstatement) is more damaging than the glitches themselves.
(Ignoring BGP for the moment, which has no clue of the reliability of its links) That's due to the "slow down, fast up" nature of IETF protocols. Do you really want a link or routing protocol claiming your link is "up" if it passes only 33% of your keepalives?
IMHO, the key to fast-response protocols is reversing this behavior: require (say) 10 keepalives in a row for a link to be "up", and missing one forces it "down".
S
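The "fast down, slow up" rule proposed above can be sketched as a small state machine. This is only an illustration of the idea as stated in the message, not any protocol's actual behaviour; the class name `FastDownSlowUp` and the `observe` method are hypothetical, and the threshold of 10 is the "(say) 10" from the text.

```python
class FastDownSlowUp:
    """Liveness state: N keepalives in a row bring the link up,
    a single missed keepalive takes it down again."""

    def __init__(self, up_threshold=10):
        self.up_threshold = up_threshold
        self.consecutive_ok = 0   # keepalives received since the last miss
        self.is_up = False

    def observe(self, received):
        """Record one keepalive interval; return the resulting link state."""
        if received:
            self.consecutive_ok += 1
            if self.consecutive_ok >= self.up_threshold:
                self.is_up = True
        else:
            # One miss is enough to declare the link down and restart the count.
            self.consecutive_ok = 0
            self.is_up = False
        return self.is_up


# A clean link comes up on the 10th keepalive and drops on the first miss.
clean = FastDownSlowUp()
clean_states = [clean.observe(True) for _ in range(10)]
after_miss = clean.observe(False)

# A link passing only one keepalive in three never comes up at all.
lossy = FastDownSlowUp()
lossy_states = [lossy.observe(ok) for ok in [True, False, False] * 20]

print(clean_states[-1], after_miss, any(lossy_states))
```

The trade-off is the one raised earlier in the thread: a rule this strict reacts to every transient glitch, so it belongs in a link-level liveness check rather than in something that withdraws routes on each transition.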
On Fri, 9 Aug 2002, Lane Patterson wrote:
BGP keepalive/hold timers are configurable even down to granularity of link or PVC level keepalives, but for session stability reasons, it appears that most ISPs at GigE exchanges choose not to tweak them down from the defaults.
Endpoint liveness may also start to become more of an issue as more networks choose to private peer, or reach ethernet exchanges, over L2 pseudowires.
When the router at the far end goes away for whatever reason - the router has really gone away, the MPLS provider in the middle is banjaxed, etc - this isn't immediately visible to the other end, which will still see "link up" from the PE.
I think someone (can't remember who, maybe Riverstone) is implementing a method of dropping link on the ethernet ports at both ends of a pseudowire if something goes bang in the middle, and end-to-end connectivity fails.
But, how does that work when you may be delivering multiple q-tags on a single GigE port (for example)? If only one tag is affected, you don't want to drop link, right?
So, we're back to detection at layer 3, can I ping it, do I have adjacency, etc.
Some sort of lower-level heartbeat (maybe like OAM), not dependent on IP reachability, would be a bonus - and it's probably low in the tax stakes, if it can be made simple enough.
Mike
Mike Hughes wrote:
But, how does that work when you may be delivering multiple q-tags on a single GigE port (for example)? If only one tag is affected, you don't want to drop link, right?
So, we're back to detection at layer 3, can I ping it, do I have adjacency, etc.
Some sort of lower-level heartbeat (maybe like OAM), not dependent on IP reachability, would be a bonus - and it's probably low in the tax stakes, if it can be made simple enough.
I think pseudowire liveness (in the case of ethernet pseudowires, which are by nature multipoint and multi-vlan) does not really make sense, but as you conclude, L3 liveness does. Obviously one can repeat the exercise for everything that needs liveness, but it would make more sense to have a generic way to determine L3 reachability in a robust manner.
Pete
On Fri, Aug 09, 2002 at 03:22:00PM -0700, Lane Patterson wrote:
BGP keepalive/hold timers are configurable even down to granularity of link or PVC level keepalives, but for session stability reasons, it appears that most ISPs at GigE exchanges choose not to tweak them down from the defaults. IIRC, Juniper is 30/90 and Cisco is 60/180. My gut feel was that even something like 10/30 would be reasonable, but nobody seems compelled that this is much of an issue.
Your Cisco router (say a GSR) will go foobar if you use 10/30 second timers: an IGP topology change, causing a new next-hop interface for 100k routes, will cause processes (probably CEF related) to run for so long that you will lose your BGP keepalives, thus lose sessions, and everything will go *BOOM* - so please be nice and don't do that without real testing.
/Jesper
--
Jesper Skriver, jesper(at)skriver(dot)dk - CCIE #5456
Senior network engineer @ AS3292, TDC Tele Danmark
One Unix to rule them all, One Resolver to find them, One IP to bring them all and in the zone to bind them.
Jesper Skriver wrote:
Your Cisco router (say a GSR) will go foobar if you use 10/30 second timers: an IGP topology change, causing a new next-hop interface for 100k routes, will cause processes (probably CEF related) to run for so long that you will lose your BGP keepalives, thus lose sessions, and everything will go *BOOM* - so please be nice and don't do that without real testing.
This is the exact reason why you want your liveness to be detected out of band of the actual routing protocol keepalives, which might also be stuck behind a queue of incoming updates which you need to read off the socket before you can see the HELLO coming in.
Of course you're toast either way if your interface queues are large enough and you don't do preferential queueing for BGP.
Pete
participants (6)
- Jesper Skriver
- Lane Patterson
- Mike Hughes
- Petri Helenius
- Stephen Sprunk
- Vadim Antonov