Saku speaks from the privileged position of an infrastructure owner. We assume that interface connectivity is provided by L1 links, with OEO in transit nodes as a worst case.

But the clever, budget-conscious among us have deployed router links over provider MPLS-based L2 services as critical infrastructure. We have an invisible WAN. In the absence of L1 PM statistics, how do we validate service over other networks? 802.1ag and Y.1731 attempt to answer that question.
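
To make the idea concrete, here is a rough sketch of counter-based loss measurement between two endpoints, in the spirit of what those OAM tools do. It is not the Y.1731 PDU format or any vendor's implementation; the function name, arguments, and the 32-bit counter width are assumptions for illustration only.

```python
# Hypothetical sketch: frames lost between two endpoints, computed from
# the sender's Tx counter and the receiver's Rx counter sampled at two
# points in time. Counter width is an assumption; adjust to your platform.

def frame_loss(tx_prev, tx_curr, rx_prev, rx_curr, counter_bits=32):
    """Return (frames_lost, loss_ratio) for one measurement interval."""
    mod = 1 << counter_bits
    # Modular subtraction handles a single counter wrap per interval.
    sent = (tx_curr - tx_prev) % mod
    received = (rx_curr - rx_prev) % mod
    lost = sent - received
    ratio = lost / sent if sent else 0.0
    return lost, ratio


if __name__ == "__main__":
    lost, ratio = frame_loss(1000, 2000, 900, 1890)
    print(f"lost={lost} ratio={ratio:.2%}")
```

Both samples on each side have to refer to the same instant in the frame stream, which is why the real protocols carry the counters in-band rather than polling them separately.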

Of course, the real answer is to buy a wire.

On Fri, Jan 10, 2025, 1:33 AM Saku Ytti <saku@ytti.fi> wrote:

On Fri, 10 Jan 2025 at 00:34, David Zimmerman via NANOG <nanog@nanog.org> wrote:

> Towards Saku's, Tore's, and Tom's comments about watching error counters, I'll keep that in mind, though I expect I'll want to cover situations where frames are simply lost rather than errored. For example (tapping into Alex's point) on an L2VPN circuit with carrier underlay congestion where the last-mile circuits are otherwise clean.

Explain how this could happen?

Like if we are thinking of a scenario where the far-end didn't even
send it, because the link is full, then we will of course see the link
being full (if we adjust SNMP stats to L1 speed, we will know if it is
full or not).
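
As a minimal sketch of that adjustment: interface octet counters normally exclude the preamble, start-of-frame delimiter and inter-frame gap, which add 20 bytes per frame on the wire. The helper below is hypothetical, and whether the octet counter includes the 4-byte FCS is platform-dependent, so that is exposed as an assumed flag.

```python
# Hypothetical helper: estimate L1 utilisation from SNMP counter deltas
# (e.g. ifHCOutOctets and the sum of the ifHCOut*Pkts counters).

PER_FRAME_L1_OVERHEAD = 20  # bytes: 7 preamble + 1 SFD + 12 inter-frame gap
FCS_BYTES = 4


def l1_utilisation(delta_octets, delta_packets, interval_s, link_bps,
                   fcs_counted=True):
    """Return the fraction of L1 line rate used over the polling interval."""
    overhead = PER_FRAME_L1_OVERHEAD + (0 if fcs_counted else FCS_BYTES)
    l1_bits = (delta_octets + delta_packets * overhead) * 8
    return l1_bits / (interval_s * link_bps)


if __name__ == "__main__":
    # 64-byte frames at 14,880,952 pps on 10GE for one second.
    octets, packets = 952_380_928, 14_880_952
    print(f"L2 view: {octets * 8 / 10e9:.1%}")
    print(f"L1 view: {l1_utilisation(octets, packets, 1, 10_000_000_000):.1%}")
```

With small frames the unadjusted counters show the link at roughly three quarters full while it is in fact saturated, which is the case the adjustment is meant to catch.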

If we are thinking of a situation where the far end didn't even send
it, and the link isn't full, we are getting into the weeds and we
probably shouldn't try to optimise for that, as trying to solve it
systematically may cause more problems than addressing each case
individually.

If the far end did send a frame, but the frame didn't arrive, and
we're talking about a point-to-point link, there is no reason to
optimise for cases where this isn't visible.
If the far end did send a frame, but the frame didn't arrive, and
we're talking about OEO transport devices between us, it may not be a
sufficiently common scenario to optimise for. There is a specific
exception here, and it is RFI assertion: OEO transport may lose the
RFI assertion instead of tunneling it, making interface-down detection
slow, and it is impossible to prove it works now (without bringing
the service down), even if it worked during provisioning of the service.
But this should be well covered by BFD/OAM.

--
++ytti