Re: Distributed Router Fabrics

1 Jan 2025

      (sorry I should have also mentioned one other thing below)

On Wed, Jan 1, 2025 at 3:30 PM Christopher Morrow
<morrowc.lists@gmail.com> wrote:
...
On Tue, Dec 24, 2024 at 8:26 AM Mike Hammett <nanog@ics-il.net> wrote:
...
In the articles I've read and videos I've watched, they have mentioned varying amounts of reduced power. I didn't commit them to memory because that wasn't the part I was interested in at the moment.
I'd think that, especially as data rates climb, the power consumption is going to really get important fast.
When a single device requires ~50 kW to run ... I think you'll want to make sure you have space/power to deal with that :(
I'm not sure that distributed fabric plans make that problem better? (maybe it's all the same problem in the end, because the fabric interconnect is going to be distance-limited/etc. too)
One thing to really think about here is:
   "What is the traffic pattern you expect to see?"

I mean here:
  * "I have jewels in the center of my network, and everyone comes
through the edge to the jewel(s), and then back out"
    -  "just have a ton of pizza boxes, all traffic is edge-to-core
and I'm not spending optics/etc in the local edge area bouncing
traffic around"
  * "I have no jewels, I'm providing any-to-any connectivity, folk
might just bounce traffic through me and right back out the same
location to someone else"
    -  "oops, now I need to spray traffic in the local pop/metro and
do not need as much core-facing capacity out of that local area"
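
A rough way to put numbers on those two patterns is the sketch below. It is illustrative only: the function name, the `local_fraction` parameter, and the 100T demand figure are assumptions for the example, not measurements from any real network.

```python
def capacity_split(edge_demand_tbps: float, local_fraction: float) -> dict:
    """Split edge demand into traffic that turns around locally
    versus traffic that has to ride core-facing links.

    local_fraction is the (assumed) share of traffic that enters and
    leaves through the same pop/metro: ~0.0 for the "jewels in the
    center" case, higher for the any-to-any case.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    local = edge_demand_tbps * local_fraction
    core = edge_demand_tbps - local
    return {"local_tbps": local, "core_tbps": core}


# "Jewels" pattern: everything goes edge-to-core.
jewels = capacity_split(100, 0.0)
# Any-to-any pattern, assuming half the traffic stays in the metro.
any_to_any = capacity_split(100, 0.5)
print(jewels, any_to_any)
```

With `local_fraction` near zero, all of the spend goes to core-facing capacity; as it grows, the capacity need shifts to the local fabric instead.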

anyway, interesting conversation :)
...
...
Management of the things is a big thing I've been concerned about going into more modern systems. So often there's hand-waving regarding the orchestration piece of non-traditional systems. From what I've seen (and I would love to be wrong), you either build it in-house (not a small lift) or you buy something that ends up taking away all of the cost advantages that path had.
You almost certainly get into (pretty quickly) something that smells a bunch like:
  "here's my pile of ansible recipes for...."
  (choice of ansible here for example only, s/ansible/<whatever>/ of course to whatever you feel like)
That's maybe fine if that's your jam? I think it's hard to reason/plan/build without some automation plan 'now',
and it looks like a ton of folk start without that then try to retrofit once: "omg this is very large now... ugh" happens.
  (1-10 devices? sure fine do it by hand, 5-><bunches more> you really ought to have had an automation plan at ~5 ... my opinion clearly)
...
Failure domain stuff is part of what I'm trying to learn more about, which goes back to more about the fundamentals of how the fabric works.
Yeah... this part (reasoning about failure domains) I assume is also a tad hard.
A scenario is:
  "I built this 200T fabric, I interconnect to the outside with ~100T max and internally with ~100T"
Now that ~100T breaks and (ideally!) everything on the outside re-routes around to a different front door... oops, are you prepared for an extra ~100T arriving?
How do you deal with parts (fabric parts) failing in part? "oops only 50T of my 100T can get through here and ... I also am still telling my external neighbors all's good"
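
The arithmetic behind both questions can be sketched as follows. This is a minimal model under stated assumptions: the site names, the proportional-to-capacity redistribution, and the 0.75 withdraw threshold are all hypothetical, and real re-routing will follow whatever the external routing actually does.

```python
def headroom_after_failure(sites: dict, failed: str) -> dict:
    """Return spare capacity (Tbps) per surviving site after `failed` goes away.

    sites maps a site name to (capacity_tbps, current_load_tbps).
    ASSUMPTION: the failed site's load lands on the survivors in
    proportion to their capacity. Negative headroom means overload.
    """
    _, displaced = sites[failed]
    survivors = {name: v for name, v in sites.items() if name != failed}
    total_capacity = sum(cap for cap, _ in survivors.values())
    result = {}
    for name, (cap, load) in survivors.items():
        extra = displaced * (cap / total_capacity)
        result[name] = cap - (load + extra)
    return result


def should_withdraw(usable_tbps: float, nominal_tbps: float,
                    min_ratio: float = 0.75) -> bool:
    """Decide whether a partially failed front door should stop telling
    neighbors all is good. min_ratio is an assumed policy knob."""
    return usable_tbps / nominal_tbps < min_ratio


# Two 100T front doors, both full: losing one leaves the other 100T short.
print(headroom_after_failure({"A": (100, 100), "B": (100, 100)}, "A"))
# Only 50T of 100T gets through: below the assumed threshold, so withdraw.
print(should_withdraw(50, 100))
```

The point of writing it down is that the "extra ~100T arriving" question has a checkable answer per site, and the partial-failure case needs an explicit policy for when to stop advertising.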
Really, that failure-domain problem is tightly linked to the 'manage a ton of things' problem too... at least for containing damage quickly.