On Fri, 29 Sept 2023 at 23:43, William Herrin <bill@herrin.us> wrote:
> My understanding of Juniper's approach to the problem is that instead of employing TCAMs for next-hop lookup, they use general purpose CPUs operating on a radix tree, exactly as you would for an all-software

They use proprietary NPUs with a proprietary instruction set, called 'Trio'. A single Trio can have hundreds of PPEs (packet processing engines), all identical. Packets are sprayed across the PPEs; the PPEs do not run in constant time, so reordering always occurs.

Juniper is a pioneer in putting the FIB in DRAM, and has patented it to a degree. The downside is that it takes a very long time to get an answer from memory. To amortise this, the PPEs have a lot of threads, and while one packet is waiting for memory, another packet is worked on. But there is no pre-emption, and no shuffling of registers/memory around or cache misses as a function of FIB size: a PPE does all the work it has, requests an answer from memory, goes to sleep, and comes back when the answer arrives to do all the work it has, never pre-empted.

There is a lot more complexity here, though. In the original Trio the memory was RLDRAM, which was a fairly simple setup. Once they changed to HMC, they added a cache in front of the memory, a proprietary chip called CAE. IFLs (logical interfaces) were dynamically allocated to one of multiple CAEs used to access memory, and a single CAE did not have 'wire rate' performance. So with a pathological setup, say two IFLs, you could get unlucky: on some boots both IFLs would be assigned to the same CAE instead of being spread across two CAEs, and on those boots you would see lower PPS performance than on others, because you were hot-banking that CAE. This is the only type of cache problem I can recall related to Juniper. But these devices are entirely proprietary, things move relatively fast, and complexity increases all the time.
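
To make the latency-hiding idea concrete, here is a toy run-to-completion model in plain Python. The work time, memory latency, and stage count are made-up illustrative numbers, not Juniper figures (the real PPE microarchitecture is proprietary); the point is only that once there are enough threads, the memory wait is fully hidden and throughput is set by the work itself.

# Toy model of one PPE: many threads, run-to-completion, no pre-emption.
# All numbers are made up for illustration only.
import heapq

WORK_NS = 40          # assumed work per stage, in ns
MEM_LATENCY_NS = 120  # assumed DRAM/HMC answer latency, in ns
STAGES = 3            # assumed work units per packet, with a memory wait between them

def simulate(threads, packets):
    engine_free_at = 0   # the PPE pipeline is busy while a thread is doing work
    done = 0
    finish = 0
    # heap of (wake_time, thread_id, remaining_stages)
    ready = [(0, t, STAGES) for t in range(threads)]
    heapq.heapify(ready)
    while done < packets:
        wake, tid, remaining = heapq.heappop(ready)
        start = max(wake, engine_free_at)
        finish = start + WORK_NS          # thread runs its work, never pre-empted
        engine_free_at = finish
        if remaining == 1:
            done += 1                     # packet finished, thread grabs the next one
            heapq.heappush(ready, (finish, tid, STAGES))
        else:
            # request an answer from memory, go to sleep until it arrives
            heapq.heappush(ready, (finish + MEM_LATENCY_NS, tid, remaining - 1))
    return 1e3 * packets / finish         # packets per microsecond

for t in (1, 2, 3, 4, 8):
    print(t, "threads:", round(simulate(t, 10_000), 2), "pkt/us")

With one thread you are serialised on the memory latency; with a handful of threads the waits overlap and throughput flattens out at the work-bound rate, independent of how slow the memory answer is.
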
> router. This makes each lookup much slower than a TCAM can achieve. However, that doesn't matter much: the lookup delays are much shorter than the transmission delays so it's not noticeable to the user. To

In DRAM lookups, like what Juniper does, most of the time you are waiting for the memory. With DRAM, FIB size is a trivial engineering problem; memory bandwidth and latency are the hard problems. Juniper does not do TCAMs on its service-provider-class devices.

--
  ++ytti
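
A back-of-envelope sketch of that bandwidth/latency point, with purely illustrative numbers (the line rate, memory touches per lookup, and DRAM latency below are assumptions, not Juniper specs):

# Why the hard part is keeping enough lookups in flight, not storing the FIB.
# All numbers are illustrative assumptions.
line_rate_bps = 400e9
wire_bytes = 64 + 20                   # minimum frame plus preamble/IPG
pps = line_rate_bps / 8 / wire_bytes   # ~595 Mpps worst case
touches = 3                            # assumed memory reads per lookup
latency_s = 100e-9                     # assumed DRAM answer latency
in_flight = pps * touches * latency_s  # Little's law: concurrency needed
print(round(pps / 1e6), "Mpps ->", round(in_flight), "memory reads in flight")

Under those assumptions you need on the order of a couple of hundred memory reads outstanding at all times, which is the kind of pressure that pushes a design toward many PPEs with many threads rather than a single faster lookup path.
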