On Sat, 6 Aug 2022 at 17:08, <ljwobker@gmail.com> wrote:
For a while, GSR and CRS type systems had linecards where each card had a bunch of chips that together built the forwarding pipeline. You had chips for the L1/L2 interfaces, chips for the packet lookups, chips for the QoS/queueing math, and chips for the fabric interfaces. Over time, we integrated more and more of these things together until you (more or less) had a linecard where everything was done on one or two chips, instead of a half dozen or more. Once we got here, the next step was to build linecards where you actually had multiple independent things doing forwarding -- on the ASR9k we called these "slices". This again multiplies the performance you can get, but now both the software and the operators have to deal with the complexity of having multiple things running code where you used to only have one. Now let's jump into the 2010's where the silicon integration allows you to put down multiple cores or pipelines on a single chip, each of these is now (more or less) it's own forwarding entity. So now you've got yet ANOTHER layer of abstraction. If I can attempt to draw out the tree, it looks like this now:
1) you have a chassis or a system, which has a bunch of linecards. 2) each of those linecards has a bunch of NPUs/ASICs 3) each of those NPUs has a bunch of cores/pipelines
Thank you for this. I think we may have some ambiguity here. I'll ignore multichassis designs, as those went out of fashion, for now. And describe only 'NPU' not express/brcm style pipeline. 1) you have a chassis with multiple linecards 2) each linecard has 1 or more forwarding packages 3) each package has 1 or more NPUs (Juniper calls these slices, unsure if EZchip vocabulary is same here) 4) each NPU has 1 or more identical cores (well, I can't really name any with 1 core, I reckon, NPU like GPU pretty inherently has many many cores, and unlike some in this thread, I don't think they ever are ARM instruction set, that makes no sense, you create instruction set targeting the application at hand which ARM instruction set is not, but maybe some day we have some forwarding-IA, allowing customers to provide ucode that runs on multiple targets, but this would reduce pace of innovation) Some of those NPU core architectures are flat, like Trio, where a single core handles the entire packet. Where other core architectures, like FP are matrices, where you have multiple lines and packet picks 1 of the lines and traverses each core in line. (FP has much more cores in line, compared to leaba/pacific stuff) -- ++ytti