We had a similar issue and changed our BGP peering to use one BGP session per provider, since we had multiple neighbours for one of our peers. That seems to have resolved this particular issue for us.

I would love to hear how others are actively probing their peers' networks using an NMS to verify connectivity.

Sam Roche - Supervisor of Network Operations - Lakeland Networks
sroche@lakelandnetworks.com | Office: 705-640-0086 | Cell: 705-706-2606 | www.lakelandnetworks.com

-----Original Message-----
From: Christopher Morrow [mailto:morrowc.lists@gmail.com]
Sent: October-23-13 11:06 PM
To: JRC NOC
Cc: nanog list
Subject: Re: BGP failure analysis and recommendations

On Wed, Oct 23, 2013 at 10:40 PM, JRC NOC <nospam-nanog@jensenresearch.com> wrote:
> Is this just an unavoidable issue with scaling large networks?

nope... sounds like (to me at least) the forwarding plane and control plane are non-congruent in your provider's network :( so as you said, if the forwarding-plane is dorked up between you and 'the rest of their network', but the edge device you are connected to thinks next-hops for routes are still valid... oops :(

> Is it perhaps a known side effect of MPLS?

nope.

> Have we/they lost something important in the changeover to converged multiprotocol networks? Is there a better way for us edge networks to achieve IP resiliency in the current environment?

sadly I bet not, aside from active probing and disabling paths that are non-functional.
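
For what it's worth, the "active probing and disabling paths" idea could be sketched roughly as below. This is only an illustration, not anything a poster in this thread described running: the `PathMonitor` class, the `tcp_probe` helper, and the thresholds are all hypothetical, and it assumes the host has policy routing in place so that traffic sourced from a given address actually leaves via the provider being tested. The action taken when a path is declared down (shutting a session, withdrawing a route) is device-specific and is left to the caller.

```python
import socket
from dataclasses import dataclass
from typing import Callable, List, Tuple

Target = Tuple[str, int]  # (host, TCP port) to probe; hypothetical targets


def tcp_probe(target: Target, source_ip: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to target succeeds from source_ip.

    Assumes source-based policy routing steers source_ip out one provider.
    """
    try:
        with socket.create_connection(
            target, timeout=timeout, source_address=(source_ip, 0)
        ):
            return True
    except OSError:
        return False


@dataclass
class PathMonitor:
    """Tracks one provider path, with hysteresis to avoid flapping."""

    name: str
    source_ip: str
    targets: List[Target]
    probe: Callable[[Target, str], bool] = tcp_probe
    min_ok_ratio: float = 0.5   # fraction of targets that must answer
    fail_rounds: int = 3        # consecutive bad rounds before marking down
    recover_rounds: int = 5     # consecutive good rounds before marking up
    up: bool = True
    _bad: int = 0
    _good: int = 0

    def __post_init__(self) -> None:
        if not self.targets:
            raise ValueError("at least one probe target is required")

    def run_round(self) -> bool:
        """Probe every target once and return the path's current state."""
        ok = sum(1 for t in self.targets if self.probe(t, self.source_ip))
        healthy = ok / len(self.targets) >= self.min_ok_ratio
        if healthy:
            self._good += 1
            self._bad = 0
            if not self.up and self._good >= self.recover_rounds:
                self.up = True
        else:
            self._bad += 1
            self._good = 0
            if self.up and self._bad >= self.fail_rounds:
                self.up = False
        return self.up
```

Probing several targets beyond the provider's edge, rather than the directly connected neighbour, is the point here: the neighbour can stay reachable and keep the BGP session up while forwarding deeper in the provider's network is broken. The separate down and up thresholds are a design choice to keep a marginal path from being toggled on every round.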