On Fri, 29 Sept 2023 at 23:43, William Herrin <bill@herrin.us> wrote:
> My understanding of Juniper's approach to the problem is that instead of employing TCAMs for next-hop lookup, they use general purpose CPUs operating on a radix tree, exactly as you would for an all-software

They use proprietary NPUs with a proprietary instruction set, called 'Trio'. A single Trio can have hundreds of PPEs (packet processing engines), all identical. Packets are sprayed across the PPEs; the PPEs do not run in constant time, so reordering always occurs.

Juniper is a pioneer in putting the FIB in DRAM, and has patented it to a degree. The downside is that it takes a very long time to get an answer from memory. To amortise this, the PPEs have a lot of threads, and while one packet is waiting for memory, another packet is worked on. But there is no pre-emption, and no shuffling of registers/memory around or cache misses as a function of FIB size: a PPE does all the work it has, requests an answer from memory, goes to sleep, and comes back when the answer arrives to do all the work it has, never pre-empted.

There is a lot more complexity here, though. In the original Trio the memory was RLDRAM, which was a fairly simple setup. Once they changed to HMC, they added a cache in front of the memory, a proprietary chip called CAE. IFLs (logical interfaces) were dynamically allocated to one of multiple CAEs used to access memory, and a single CAE did not have 'wire rate' performance. So with a pathological setup, say two IFLs, you could get unlucky: on some boots both IFLs would be assigned to the same CAE instead of being spread across two CAEs, and on those boots you would see lower PPS performance than on others, because you were hot-banking that CAE. This is the only type of cache problem I can recall related to Juniper. But these devices are entirely proprietary, things move relatively fast, and complexity increases all the time.
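
To make the latency-hiding idea concrete, here is a toy run-to-completion model in plain Python. The work time, memory latency, and stage count are made-up illustrative numbers, not Juniper figures (the real PPE microarchitecture is proprietary); the point is only that once there are enough threads, the memory wait is fully hidden and throughput is set by the work itself.

# Toy model of one PPE: many threads, run-to-completion, no pre-emption.
# All numbers are made up for illustration only.
import heapq

WORK_NS = 40          # assumed work per stage, in ns
MEM_LATENCY_NS = 120  # assumed DRAM/HMC answer latency, in ns
STAGES = 3            # assumed work units per packet, with a memory wait between them

def simulate(threads, packets):
    engine_free_at = 0   # the PPE pipeline is busy while a thread is doing work
    done = 0
    finish = 0
    # heap of (wake_time, thread_id, remaining_stages)
    ready = [(0, t, STAGES) for t in range(threads)]
    heapq.heapify(ready)
    while done < packets:
        wake, tid, remaining = heapq.heappop(ready)
        start = max(wake, engine_free_at)
        finish = start + WORK_NS          # thread runs its work, never pre-empted
        engine_free_at = finish
        if remaining == 1:
            done += 1                     # packet finished, thread grabs the next one
            heapq.heappush(ready, (finish, tid, STAGES))
        else:
            # request an answer from memory, go to sleep until it arrives
            heapq.heappush(ready, (finish + MEM_LATENCY_NS, tid, remaining - 1))
    return 1e3 * packets / finish         # packets per microsecond

for t in (1, 2, 3, 4, 8):
    print(t, "threads:", round(simulate(t, 10_000), 2), "pkt/us")

With one thread you are serialised on the memory latency; with a handful of threads the waits overlap and throughput flattens out at the work-bound rate, independent of how slow the memory answer is.
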
> router. This makes each lookup much slower than a TCAM can achieve. However, that doesn't matter much: the lookup delays are much shorter than the transmission delays so it's not noticeable to the user. To

In DRAM lookups, like what Juniper does, most of the time you are waiting for the memory. With DRAM, FIB size is a trivial engineering problem; memory bandwidth and latency are the hard problems. Juniper does not do TCAMs on its service-provider-class devices.

--
  ++ytti
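
A back-of-envelope sketch of that bandwidth/latency point, with purely illustrative numbers (the line rate, memory touches per lookup, and DRAM latency below are assumptions, not Juniper specs):

# Why the hard part is keeping enough lookups in flight, not storing the FIB.
# All numbers are illustrative assumptions.
line_rate_bps = 400e9
wire_bytes = 64 + 20                   # minimum frame plus preamble/IPG
pps = line_rate_bps / 8 / wire_bytes   # ~595 Mpps worst case
touches = 3                            # assumed memory reads per lookup
latency_s = 100e-9                     # assumed DRAM answer latency
in_flight = pps * touches * latency_s  # Little's law: concurrency needed
print(round(pps / 1e6), "Mpps ->", round(in_flight), "memory reads in flight")

Under those assumptions you need on the order of a couple of hundred memory reads outstanding at all times, which is the kind of pressure that pushes a design toward many PPEs with many threads rather than a single faster lookup path.
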