
Oh. And this is not getting better, it is getting worse. In Juniper you can do flow -> logical -> physical -> NPU level admission control. LPTS is NPU level, so collateral damage is very expensive. There was 'lpts punt excessive-flow-trap', which was retired, and we couldn't get Cisco to understand why a replacement is needed. For example, the customer on interface1 has an L2 loop and offers us an excessive amount of ARP. The other interfaces on the same NPU are dead too; you used to be able to address this with excessive-flow-trap.
Further, it is impossible to expect customers to understand LPTS when Cisco does not. We had PE-CE BGP flaps in case 690279616, where TAC was focused on fixing our MQC config, despite LPTS not being subject to MQC at all. It took an escalation to Xander, who initially thought an ingress ACL could be used to discriminate here, until I reminded him how LPTS works. Luckily he didn't try to gaslight me like TAC did, but immediately agreed that LPTS is not subject to ingress ACLs either (apparently it was at some point, which is why Xander was confused for a while). So when LPTS does have gaps or collateral damage, you can't even add an ACL or ingress MQC policy to tactically address the offending interface.
So a lot more complexity would be needed to make LPTS functional, but the complexity is already higher than what the vendor can support. And complexity is being reduced (removal of flow-trap) without understanding why it was actually needed.
On Sun, 16 Mar 2025 at 10:11, Saku Ytti <saku@ytti.fi> wrote:
LPTS is not really competitive with the Juniper offering. But because Juniper needs configuration and LPTS does not, in practice LPTS ends up having a better outcome. Granted, the outcome is terrible and easy to bypass, but it is still better than the typical Juniper outcome.
I could explain many gaps in it, both absolute gaps and gaps relative to Juniper. But one particular problem is that the dimensioning is all wrong: the device has no idea whether it can handle what LPTS admits. For example, we regularly had 1/8th of our BGP peers go down because some XIPC worker was congested: LPTS admitted too many packets to it, and it ended up doing software drops. LPTS does a poor job of deciding what should and should not be admitted, at what rate it should be admitted, and of ensuring that the rate of session 1 does not overpower session 2.
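To make the collateral-damage argument concrete, here is a minimal Python sketch of hierarchical admission control built from simple per-tick token buckets. It is not how Junos or LPTS is implemented. The `Bucket` class, the rates and the traffic mix are all hypothetical. It also collapses the flow -> logical -> physical -> NPU hierarchy down to two levels (logical interface and NPU) and compares that against a single NPU-wide policer. In the scenario, one looped customer floods ARP on if1 while a well-behaved BGP customer on if2 shares the same NPU.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical per-tick token bucket allowing `rate` packets per tick."""
    rate: int
    tokens: int = 0

    def refill(self) -> None:
        # Simplification: no burst carry-over, the bucket resets every tick.
        self.tokens = self.rate


def admit(path: list[Bucket]) -> bool:
    """Admit a packet only if every level on its path still has a token.

    Tokens are consumed only on admission, so a packet rejected at its own
    interface's level never eats into the shared NPU budget.
    """
    if all(b.tokens > 0 for b in path):
        for b in path:
            b.tokens -= 1
        return True
    return False


def arrivals() -> list[str]:
    """One tick of traffic: 1000 ARP packets from a looped customer on if1,
    with 5 BGP packets from a well-behaved customer on if2 spread among them."""
    pkts = []
    for i in range(1000):
        pkts.append("if1")
        if i % 200 == 199:
            pkts.append("if2")
    return pkts


def run(hierarchical: bool, ticks: int = 10) -> int:
    """Return how many if2 (BGP) packets were admitted over `ticks` ticks."""
    npu = Bucket(rate=100)                                      # shared NPU budget
    per_ifl = {"if1": Bucket(rate=50), "if2": Bucket(rate=50)}  # per interface
    admitted_bgp = 0
    for _ in range(ticks):
        npu.refill()
        for bucket in per_ifl.values():
            bucket.refill()
        for ifl in arrivals():
            path = [per_ifl[ifl], npu] if hierarchical else [npu]
            if admit(path) and ifl == "if2":
                admitted_bgp += 1
    return admitted_bgp


print("NPU-only policer:   ", run(hierarchical=False), "BGP packets admitted")
print("per-interface + NPU:", run(hierarchical=True), "BGP packets admitted")
```

With only the shared NPU budget, the ARP flood on if1 uses up every token before if2's BGP packets arrive, so none of them are admitted. When a per-interface stage sits in front, if1 is capped at its own 50 packets per tick and if2's BGP traffic gets through. Per-flow and physical-port levels would just be more buckets in `path`. A per-flow level is also what would stop session 1 from overpowering session 2.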
The XIPC congestion problem above is particularly hilarious, because the CPU was being consumed by BGP, which meant XIPC had fewer CPU cycles to handle what LPTS admitted. And because XIPC doesn't have priority over BGP, this of course meant that XIPC couldn't hand the packets to BGP, causing more pressure and CPU cycle demand on BGP. If XIPC had had priority over BGP, BGP processing would have been slowed down, but XIPC could have handed it the work it was going to need to do, reducing overall CPU time. eXR works better, but that's mostly by luck, not by design. Cisco marketed cXR as a real-time OS and stressed that being real-time was crucial for a mission-critical system. Yet Cisco ran everything at flat priority. Cisco did try to introduce priorities internally in cXR, but it just made things worse, due to an incomplete understanding of what customers are doing and how. Losing 1/8th of BGP sessions regularly was a known problem to Cisco, and Cisco explicitly decided not to try to address it, other than 'maybe it'll work better on eXR'.
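The priority argument can be illustrated with a toy CPU model in Python. Every part of the model is hypothetical: the cycle budget, the LPTS admission rate, the XIPC queue depth, and treating flat priority as an equal time share between two busy processes. The sketch only captures the drop side of the problem. It does not model the extra work those drops cause downstream (retransmissions, sessions going down), and that extra work is what the overall-CPU-time point is about.

```python
CPU_CYCLES = 100   # hypothetical cycles available per tick
ADMITTED = 60      # hypothetical packets LPTS admits to XIPC per tick
QUEUE_DEPTH = 100  # hypothetical XIPC queue limit; overflow = software drop


def simulate(xipc_priority: bool, ticks: int = 20) -> dict[str, int]:
    """Toy model: XIPC spends one cycle per packet handing it to BGP,
    and a busy BGP process uses every cycle XIPC does not."""
    queue = 0
    delivered = dropped = bgp_cycles = 0
    for _ in range(ticks):
        # LPTS admits packets with no idea whether XIPC can keep up.
        accepted = min(ADMITTED, QUEUE_DEPTH - queue)
        dropped += ADMITTED - accepted
        queue += accepted

        # Strict priority: XIPC may use the whole CPU before BGP runs.
        # Flat priority (assumed here to be an equal split): XIPC gets half.
        xipc_budget = CPU_CYCLES if xipc_priority else CPU_CYCLES // 2
        served = min(queue, xipc_budget)
        queue -= served
        delivered += served
        bgp_cycles += CPU_CYCLES - served  # BGP takes whatever is left
    return {"delivered": delivered, "dropped": dropped, "bgp_cycles": bgp_cycles}


for mode in (False, True):
    print("xipc priority" if mode else "flat priority", simulate(xipc_priority=mode))
```

Over 20 ticks, giving XIPC priority leaves BGP fewer raw cycles (800 vs 1000), but BGP receives every admitted packet. Under flat priority, XIPC falls behind and its queue fills, so 150 packets are dropped in software.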
On Sun, 16 Mar 2025 at 09:01, Jakob Heitz (jheitz) via NANOG <nanog@lists.nanog.org> wrote:
Hi Saku,
Search the Internet for “IOS-XR LPTS” for one way to protect the control plane.
Regards, Jakob.
most others don't even have a way to protect the control plane.
NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/TFPR5TJH...
-- ++ytti