Arista “IP-SLA” / Active Probing

20 Dec 2023

Hello all,

We are trying to solve a requirement: we would like to test the viability
of our paths to the internet and tear down the BGP session on any path that
is determined to be faulty. We recently had an issue where we did not lose
link or BGP, but the carrier lost the ability to route traffic to the
internet for us. Our existing automatic detection and remediation
strategies failed to detect this condition, and we lost customer packets.

Conceptually, we have a pair of DCS-7050QX switches landing a fiber each
from two ISPs, with default routes via BGP, at a dozen POPs around the US.

One of the ISPs is our primary transit, and one is predominantly for peered
customers, but we can use it for transit during issues with the primary
circuits.

I did some research on this, and it seems the best option available to us
might be an on-boot event handler that launches a Python daemon to actively
probe out each ISP circuit and make config changes in response to transit
failures.
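
To make the idea concrete, here is a rough sketch of the decision logic
such a daemon might use. It is not a tested EOS implementation. The class
and function names, the thresholds, and the on_change hook are all
hypothetical, and ping_ok assumes the Linux iputils ping is available on
the box. The actual config change (for example, shutting down or
re-enabling a BGP neighbor) is left to whatever mechanism you prefer and
is deliberately not shown.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


def ping_ok(target: str, source: Optional[str] = None, timeout_s: int = 1) -> bool:
    """Send one ICMP echo with Linux iputils ping; True if a reply came back.

    Assumption: `source` is an interface name or address that forces the
    probe out the circuit under test (iputils `-I`).
    """
    cmd = ["ping", "-c", "1", "-W", str(timeout_s)]
    if source:
        cmd += ["-I", source]
    cmd.append(target)
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


@dataclass
class PathMonitor:
    """Tracks one ISP path and reports state changes with hysteresis."""
    name: str
    targets: List[str]               # hypothetical probe targets beyond the ISP
    probe: Callable[[str], bool]     # returns True if the target answered
    fail_threshold: int = 3          # consecutive failed rounds before "down"
    recover_threshold: int = 5       # consecutive good rounds before "up"
    healthy: bool = True
    _fails: int = 0
    _oks: int = 0

    def poll(self) -> Optional[str]:
        """Run one probe round; return "down" or "up" on a change, else None."""
        # The path counts as working if ANY target answers, so one dead
        # target on the far side does not take the circuit out of service.
        if any(self.probe(t) for t in self.targets):
            self._oks += 1
            self._fails = 0
        else:
            self._fails += 1
            self._oks = 0
        if self.healthy and self._fails >= self.fail_threshold:
            self.healthy = False
            return "down"
        if not self.healthy and self._oks >= self.recover_threshold:
            self.healthy = True
            return "up"
        return None


def run(monitor: PathMonitor, on_change: Callable[[str, str], None],
        interval_s: float = 5.0, rounds: Optional[int] = None) -> None:
    """Poll forever (or for `rounds` rounds), calling on_change on each flip."""
    done = 0
    while rounds is None or done < rounds:
        event = monitor.poll()
        if event:
            on_change(monitor.name, event)  # hypothetical hook for the config change
        done += 1
        time.sleep(interval_s)
```

Requiring several consecutive failures across more than one target, and a
longer streak of successes before recovery, is meant to keep a single lost
probe or a flapping upstream from repeatedly tearing down and restoring
the session.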

However, I thought I’d reach out to the broader community to see if anyone
knows of a better way to solve this, has an example script, or has
recommendations for methods of active monitoring to protect against this
sort of failure.

Thanks in advance for any insight and time.

*Alex Buie*
Senior Cloud Operations Engineer

450 Century Pkwy # 100 Allen, TX 75013
<https://maps.google.com/?q=450+Century+Pkwy+STE+100+%7C+Allen,+TX+%7C+75013&entry=gmail&source=g>
D: 469-884-0225 | www.cytracom.com
