Definitely will be interesting to read the list
discussion about this. My first reaction was why would you even need
this, so def curious.
There are two main target situations. Firstly, when a router
unexpectedly drops off an IXP platform, this won't be explicitly
signaled to the other routers on the fabric, which can mean that packets
to that device will be black-holed until all the others' BGP hold
timers expire. This would typically happen after 90-180s (e.g.
keepalive interval: 30-60s). The second
situation would be to deal with forwarding plane incongruence on IXPs,
i.e. where router A can reach the RS, router B can reach the RS, but
router A cannot reach router B due to a problem on the IXP fabric itself.
Thankfully
this style of problem has become quite unusual over the last several
years.
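
To put rough numbers on the black-hole window in the first situation,
here is a minimal sketch. It assumes the common convention that the BGP
hold time is three times the keepalive interval; the function name and
the default multiplier are mine, for illustration only.

```python
def bgp_hold_time(keepalive_s, multiplier=3):
    """Worst-case time (seconds) before a silent peer is declared dead.

    Assumption: hold time = multiplier * keepalive interval, with the
    conventional multiplier of 3. Real sessions negotiate the hold time.
    """
    return keepalive_s * multiplier


# Keepalive intervals taken from the 30-60s range mentioned above.
for keepalive in (30, 60):
    print(f"keepalive {keepalive}s -> hold time {bgp_hold_time(keepalive)}s")
```

With those inputs the window comes out at 90s and 180s, which is the
range quoted above.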
I'm not sure it's an all-round good solution to either of these problems,
in the "be careful what you wish for, because you might get it" sense.
There are going to be router platforms out there which won't handle
hundreds of BFD sessions reliably. So if the protocol were widely
supported, it's not clear whether it would help or harm interdomain
routing stability, particularly in situations where all the
sessions could be triggered simultaneously.
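
As a back-of-the-envelope illustration of the scaling concern, the
sketch below estimates the steady-state BFD control packet rate a router
would have to sustain. The session count and the 300ms interval are
made-up example values, and it assumes both ends of every session use
the same transmit interval, so the receive rate mirrors the send rate.

```python
def bfd_control_pps(sessions, tx_interval_ms):
    """Control packets per second sent across all BFD sessions.

    Assumption: every session runs at the same transmit interval, and
    the peer sends at the same rate, so the router also receives this
    many packets per second.
    """
    return sessions * 1000.0 / tx_interval_ms


# Hypothetical IXP port: 500 peers, 300ms transmit interval.
rate = bfd_control_pps(sessions=500, tx_interval_ms=300)
print(f"{rate:.0f} packets/s in each direction")
```

This only covers the steady state. It says nothing about the burst of
work when every session changes state at once, which is the harder case.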
As a separate issue, hold timers should generally be of a comparable
order of magnitude to the non-availability effect they're attempting to
mitigate. Inter-domain routing convergence is often measured in minutes
rather than seconds. So even if the protocol layer worked at IXPs
without causing control plane meltdown, it's still a mechanism which
has a trigger timer two orders of magnitude faster than the
general case of DFZ reconvergence. I can't see that this would help
overall inter-domain routing stability.
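
For what it's worth, the "two orders of magnitude" figure can be
sanity-checked with a quick calculation. The numbers here are
assumptions chosen for illustration: a BFD detection time of three
missed packets at 300ms, and a DFZ reconvergence time of two minutes.

```python
import math


def orders_of_magnitude(slow_s, fast_s):
    """Base-10 log of the ratio between two time scales."""
    return math.log10(slow_s / fast_s)


# Assumed values, not measurements.
bfd_detect_s = 3 * 0.3   # detect multiplier 3, 300ms interval
dfz_reconverge_s = 120.0  # "minutes rather than seconds"

gap = orders_of_magnitude(dfz_reconverge_s, bfd_detect_s)
print(f"gap: about {gap:.1f} orders of magnitude")
```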