
I've noticed that the whitebox hardware vendors are pushing distributed router fabrics, where you can keep buying pizza boxes and hooking them into a larger and larger fabric. Obviously, at some point, buying a big chassis makes more sense. Does it make sense building up to that point? What are your thoughts on that direction?

-----
Mike Hammett
Intelligent Computing Solutions | Midwest Internet Exchange | The Brothers WISP

When you say distributed router fabrics, are you thinking of the OCP concept with an interconnect switch using ATM-like cell relay (after flowery speeches about "not betting against Ethernet", of course)?

https://www.youtube.com/watch?v=l_hyZwf6-Y0
https://www.ufispace.com/company/blog/what-is-a-distributed-disaggregated-ch...

It's mostly advocated by DriveNets. It has been a while, but from what I remember, the argument, and it has a lot of merit, is that you can scale to a much bigger "chassis" than you could with any big-iron device. If you look at Broadcom's latest interconnect specs, https://www.broadcom.com/products/ethernet-connectivity/switching/stratadnx/..., you can build a pretty big PoP, and while they are mostly trying to appeal to the AI cluster crowd, one could build aggregation services with that, or something smaller. You get incremental scaling and possibly higher availability, since everything is separated and you could even get enough RPs for proper consensus. I admit I have never seen it outside of a lab environment, but AT&T appears to like it. Plus, all the mechanics of getting through the fabric are still handled by the vendor, and you manage it like a single node.

One could argue that with chassis systems you can still scale incrementally, use different line card ports for access and aggregation, and your leaf/interconnect is purely electrical, so you are not spending money on optics. So it does not exactly invalidate the chassis setup, and that is why every big vendor will sell you both, especially if you are not of AT&T scale.

There is of course the other design, with normal Ethernet fabrics based on Fat Tree or some other topology and all the normal protocols between the devices, but then you are in charge of setting up, traffic engineering, and scaling those protocols. The IETF has done interesting things with these scaling ideas, and some vendors may have even implemented them to the point that they work. :) But the "too many devices" argument starts creeping in.

Yan
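To put rough numbers on the scaling argument, here is a back-of-the-envelope sketch in Python. Every per-box figure is an invented placeholder for illustration, not a Broadcom or DriveNets spec:

import math

# Back-of-the-envelope DDC sizing: line-card boxes (NCPs) plus fabric
# boxes (NCFs) for a target front-panel port count. All per-box
# figures below are invented placeholders, not vendor specs.
def ddc_boxes(target_ports, ports_per_ncp=40, fabric_links_per_ncp=13,
              links_per_ncf=48):
    """Boxes needed for a two-stage, cell-sprayed fabric."""
    ncp = math.ceil(target_ports / ports_per_ncp)
    # A single NCP can usually run standalone, with no fabric stage.
    ncf = 0 if ncp == 1 else math.ceil(ncp * fabric_links_per_ncp / links_per_ncf)
    return ncp, ncf

for ports in (40, 400, 2000, 8000):
    ncp, ncf = ddc_boxes(ports)
    print(f"{ports:>5} ports -> {ncp:>3} NCP + {ncf:>2} NCF boxes")

The point of the shape, not the numbers: port capacity grows by adding NCPs, and the fabric stage grows in smaller steps behind it, instead of being bounded by the biggest chassis a vendor ships.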

Yeah, UfiSpace is where I had first seen it, but then I saw it elsewhere.

-----
Mike Hammett
Intelligent Computing Solutions | Midwest Internet Exchange | The Brothers WISP

It's just tradeoffs. Many of the benefits (smaller failure domains, power savings, incremental expandability) can be counterbalanced by increased operational complexity.
From my experience, if you don't have proper automation/tooling for management/configuration and fault detection, it's a nightmare. If you do have those things, then the benefits can be substantial.
I think every network will have a tipping point at which such a model starts to make more sense, but at smaller scales I think fat chassis are still likely a better place to be.

Oh, so you're saying that small networks benefit more from a traditional chassis than a distributed fabric? I would have thought it the other way around, in that you could start with a single pizza box, then add another appropriate to the need, then another appropriate to the need, as opposed to trying to figure out whether you needed a 4, 8, 13, 16, or 20 slot chassis and then end up either over- (or under-) buying.

I guess it also depends on one's definition of small.

I guess it also depends on what tooling is available. So often, I see platforms offer a bunch of programmability, but then no one commercially (or open source) provides that tooling - they expect you to build it yourself. Most anyone can sit down at XYZ chassis and figure it out, but if it's an obscure distributed system without centralized tooling, that could be tricky. Well, if you have more than a handful of boxes.

-----
Mike Hammett
Intelligent Computing Solutions | Midwest Internet Exchange | The Brothers WISP

Maybe more like medium, but if you know that you won't grow beyond a certain size and growth trajectory, a chassis would make life easier. If you are dealing with some compute and you know how many racks you have, same thing. In fact, with small networks you are actually starting out with more than what you need, since you have to install "line card" and "backplane" boxes, plus route processors. So if you are going beyond the capacity of a single pizza box (or half of one), you are effectively starting with a chassis.

If you are going down the road of pizza boxes, it could be easier to standardize deployments on a single type of device, and not think about which chassis to buy.
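The day-one footprint point in a tiny sketch (Python). The box roles and minimum counts are illustrative assumptions about a disaggregated-chassis bill of materials, not any vendor's actual minimums:

# Day-one device count: a DDC needs line-card boxes, fabric boxes, and
# route-processor servers before it forwards its first packet. Counts
# below are assumptions for illustration.
MIN_DDC = {
    "line-card boxes (NCP)": 2,   # assumed minimum for redundancy
    "fabric boxes (NCF)":    2,   # assumed minimum for redundancy
    "route processors":      2,   # external control-plane servers
}
STANDALONE = {"pizza box": 1}

for name, bom in (("standalone", STANDALONE), ("minimal DDC", MIN_DDC)):
    total = sum(bom.values())
    parts = ", ".join(f"{v}x {k}" for k, v in bom.items())
    print(f"{name}: {total} devices ({parts})")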

True. Small networks would just have a single pizza box and call it a day.

I haven't looked that deeply yet. I was assuming you could just start with a single pizza box and add more on as requirements matured. It certainly gets more complicated quickly if you can't do that.

-----
Mike Hammett
Intelligent Computing Solutions | Midwest Internet Exchange | The Brothers WISP

> I haven't looked that deeply yet. I was assuming you could just start with a single pizza box and add more on as requirements matured. It certainly gets more complicated quickly if you can't do that.
Fabric, fabric, fabric.

For a modular chassis, the fabric capacity has to be sized so it's fully non-blocking for a fully populated set of linecards. Even if you're only using 2 of 10 slots, it still needs to be sized for all 10.

For a distributed chassis, you can get away with sizing your fabric stage for exactly what you need, but you still have to be aware of expansion, port allocations, etc. You normally would pre-plan the max size, since you can't easily just shuffle those 'internal links' around. You also tend to want to buy switches for the full fabric in one shot anyway; mixing buffer sizes at the same stage/level is death.

When the vendor builds it in, you don't have to think about all of these things. But if you build it yourself, you really have to understand it.
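The sizing point as a few lines of arithmetic (Python). The per-slot capacity and the headroom factor are assumptions chosen for illustration:

# Chassis fabric is bought for full population on day one; a
# distributed fabric stage is sized for installed capacity plus
# whatever headroom you pre-plan into the cabling.
SLOT_GBPS = 14_400        # assumed forwarding capacity per slot / per box
TOTAL_SLOTS = 10
USED_SLOTS = 2

chassis_fabric = TOTAL_SLOTS * SLOT_GBPS       # sized for all 10, always
HEADROOM = 1.5                                 # pre-planned growth factor (assumed)
distributed_fabric = USED_SLOTS * SLOT_GBPS * HEADROOM

print(f"chassis fabric provisioned:     {chassis_fabric:,} Gb/s")
print(f"distributed fabric provisioned: {distributed_fabric:,.0f} Gb/s")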

"Obviously, at some point, buying a big chassis..."

Actually, as I read more about it and watch more videos about it, it seems like that isn't necessarily true. The claims they have at the top end surpass what any chassis platform I've seen is capable of, though I don't know that they have actually pushed the upper bounds of what's possible in the real world.

-----
Mike Hammett
Intelligent Computing Solutions | Midwest Internet Exchange | The Brothers WISP

From experience I can tell you that once you fully operationalize the pizza box model, you will never go back to the chassis model. Why would you trade an open, standards-based model for interconnect (OSPF and BGP work great at scale) for proprietary black boxes that do stupid router tricks to make a bunch of discrete components pretend to be one, along with giving you the benefit of a huge blast radius when the software inevitably breaks? Distributed ARP/ND: solved. Actually distributed BFD (not "it's all running on one line card, because customers like LACP bundles spread between line cards and that's really hard to distribute reliably"): solved.

The pizza box model means the boxes are fungible. So you can competitively bid between multiple suppliers and pick and choose who you want to buy from depending on what is the most important thing at the time (delivery dates? price? which of them is annoying you the least at that moment?). They are also infinitely more scalable(*) than any big chassis model. State of the art 5 years ago had Internet edge systems deploying with 8k 400G ports and datacenter deployments with 65k 400G ports using the same fundamental design.

The real downside: vendors don't like the flexibility it affords the customer, or how meaningless it makes differentiation between vendors.

David

(*) - Among the critical things to get right from the outset is what peak scale you want the fabric to reach, because recabling is not something to be taken lightly...
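The scalability claim is essentially folded-Clos port math; a quick Python sketch shows it. Mapping the 8k/65k figures to 32- and 64-port building blocks is my inference, not David's stated design:

# Non-blocking folded Clos (fat tree) of k-port switches: k^2/2
# edge-facing ports at two tiers, k^3/4 at three.
def fat_tree_ports(k, tiers):
    """Edge-facing ports of a non-blocking folded Clos of k-port switches."""
    if tiers == 2:
        return k * k // 2
    if tiers == 3:
        return k ** 3 // 4
    raise ValueError("sketch covers 2 or 3 tiers only")

for k in (32, 64):
    print(f"{k}-port boxes: 2-tier = {fat_tree_ports(k, 2):>6,}  "
          f"3-tier = {fat_tree_ports(k, 3):>6,}")

At three tiers, 32-port boxes give 8,192 ports and 64-port boxes give 65,536, which lines up with the deployment sizes mentioned above.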

On Tue, 24 Dec 2024 at 16:50, David Sinn <dsinn@uw.edu> wrote:

> Actually distributed BFD (not "it's all running on one line card, because customers like LACP bundles spread between line cards and that's really hard to distribute reliably"): solved.
Isn't the solution here the same? Have all LACP member ports in the same chip?

--
++ytti

> Isn't the solution here the same? Have all LACP member ports in the same chip?
Which any self-respecting operator usually doesn't want to do, because of failure domains.

On Tue, 24 Dec 2024 at 17:00, Tom Beecher <beecher@beecher.cc> wrote:
> > Isn't the solution here the same? Have all LACP member ports in the same chip?
>
> Which any self-respecting operator usually doesn't want to do, because of failure domains.
I'm just struggling to understand the difference between stack-of-pizza and chassis here.

--
++ytti

It's possible I substituted 'chip' in my head with a different meaning than you intended, and I am answering a different question.

I generally won't put all LAG members on the same ASIC, or even the same linecard, for failure domain reasons. I also don't really care about possible challenges with BFD there, because I just use micro-BFD on members + min-links.

On Tue, 24 Dec 2024 at 17:22, Tom Beecher <beecher@beecher.cc> wrote:

> It's possible I substituted 'chip' in my head with a different meaning than you intended, and I am answering a different question.
>
> I generally won't put all LAG members on the same ASIC, or even the same linecard, for failure domain reasons. I also don't really care about possible challenges with BFD there, because I just use micro-BFD on members + min-links.
Quite, it depends what is important in your case. You may want to put all members in one chip for better feature parity in terms of QoS, counters, et al., especially if you want them to fail as one, because you're doing it purely for capacity, not for redundancy. And indeed, without uBFD you're going to run LACP toward at most one interface in one chip anyhow, and with uBFD each member is going to run its own session anyhow.

So I wonder, what benefits is the OP seeing here when it comes to pizza boxes? To me the pizza box seems identical here to a chassis box with a LAG spanning only a single chip.

--
++ytti
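A toy model of the placement tradeoff being discussed (Python; the member count, min-links value, and chip numbering are made up for illustration):

# All LAG members on one chip (uniform QoS/counters, fails as a unit)
# versus spread across chips (partial capacity survives a chip loss).
MEMBERS = 4        # e.g. a 4x100G bundle (assumed)
MIN_LINKS = 3      # bundle stays up only with >= 3 live members (assumed)

def members_after_chip_loss(placement, failed_chip):
    """Count LAG members not on the failed forwarding chip."""
    return sum(1 for chip in placement if chip != failed_chip)

for name, placement in (("single chip", [0, 0, 0, 0]),
                        ("spread", [0, 1, 2, 3])):
    left = members_after_chip_loss(placement, failed_chip=0)
    state = "up" if left >= MIN_LINKS else "down"
    print(f"{name:<11} after chip 0 fails: {left}/{MEMBERS} members, LAG {state}")

Which outcome you want depends on whether the bundle exists for capacity (fail as one) or redundancy (degrade gracefully).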

> So I wonder, what benefits is the OP seeing here when it comes to pizza boxes? To me the pizza box seems identical here to a chassis box with a LAG spanning only a single chip.

Yeah, that's a good question.

"What benefits is the OP seeing here when it comes to pizza boxes?"

I'm more learning and questioning than stating. I've thoroughly enjoyed the thread.

One of the main advantages I saw from the outset was that I could start with a single box and then grow if needed. Other than recabling, if not planned for accordingly, it seems like I can still do that. You would have an increased cost once you had to add a fabric box, but you've already reached some amount of scale to get there. With a chassis system, you have the larger cost up front, before you even know how you're going to scale. It's more difficult to plan what size solution to buy, and no matter what you do, you'll probably pick the wrong one.

-----
Mike Hammett
Intelligent Computing Solutions | Midwest Internet Exchange | The Brothers WISP
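That cost-curve intuition sketched in Python. Every price and step size below is an invented placeholder, not a quote; the point is the shape of the two curves, not the dollar values:

# Chassis cost is mostly up front; distributed cost arrives as you
# grow, with a step when the first dedicated fabric boxes appear.
CHASSIS_UPFRONT = 250_000   # empty chassis + RPs + full fabric (assumed)
CHASSIS_LINECARD = 40_000   # per line card (assumed)
PIZZA_BOX = 25_000          # per line-card box (assumed)
FABRIC_BOX = 30_000         # per fabric box (assumed)

def chassis_cost(units):
    return CHASSIS_UPFRONT + units * CHASSIS_LINECARD

def distributed_cost(units):
    # Standalone up to 2 boxes, then a pair of fabric boxes per 4 more (assumed).
    fabric = 0 if units <= 2 else 2 * ((units - 3) // 4 + 1)
    return units * PIZZA_BOX + fabric * FABRIC_BOX

for n in (1, 2, 4, 8, 12):
    print(f"{n:>2} units: chassis ${chassis_cost(n):>9,}   "
          f"distributed ${distributed_cost(n):>9,}")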

It has been an interesting discussion. I'm always willing to see what others are doing in this space and evaluate it against what we're doing / thinking about going forward.

We were just quoted $250k+ for a Cisco ASR9902 with one route processor card (list, not our price). There can be 2 RPs, but they don't talk to each other -- who thought that was a good idea? We were just looking for something that could take a couple of full tables and had 2 - 4 100G connections.

We're still using ASR920s (12 10G ports, rather uncommon) at most of our PoPs, but have seen the writing on the wall that 10G will not be enough at some point and 100G will be necessary.

I did have an interesting conversation with Ribbon (we have a C15 phone switch) about their Neptune platform. Surprisingly affordable when you look at what Cisco is charging, though they didn't have as many 100G interfaces as I'd like -- I can't see using them as a BGP-speaking router, but for internal transport stuff it was definitely attractive.

Again, it's always nice to see what others are using / considering for similar stuff. Too often all we hear about are the 'really big guys' and how they're deploying X for 400G now, etc.

Shawn

From: Shawn L via NANOG <nanog@nanog.org>
Sent: Wednesday, 25 December 2024 9:07 am
Subject: Re: Distributed Router Fabrics
> We were just quoted $250k+ for a Cisco ASR9902 with one route processor card (list, not our price). There can be 2 RPs, but they don't talk to each other -- who thought that was a good idea? We were just looking for something that could take a couple of full tables and had 2 - 4 100G connections.
>
> We're still using ASR920s (12 10G ports, rather uncommon) at most of our PoPs, but have seen the writing on the wall that 10G will not be enough at some point and 100G will be necessary.
The Nokia 7750 SR-2s would do what you want at a much better price point, I would suggest.

On Tue, 24 Dec 2024 at 21:38, Mike Hammett <nanog@ics-il.net> wrote:

"what benefits is OP seeing here when it comes to pizzabox"
Thank you for sharing that, Mike; however, I was curious about the specific LACP case the OP stated. -- ++ytti

Much of this is right, but again with caveats.

- The boxes are fungible, to a point. Differences in ASICs, buffers, etc. can really create traffic problems if you mix wrong. You don't want to be yolo'ing whatever is cheap this month in there.
- You're going to eventually have a feature need that commercial management software doesn't account for. Can they build it for you, and how much is that? If you built your own software to manage it, how much does it cost you to build it?
- You're very correct about how initial mistakes or things you didn't know can bite you hard later. The wrong growing pain can really hurt if you're not prepared for it.
- You really have to think about the internals and the design. There are some companies who have presented on how they built these things, and when you listen to their protocol design, it makes your head hurt how much overcomplication was built in.

Like I said before, distributed fabrics CAN be amazing, but there are always tradeoffs. There are some things you don't have to care about with a big chassis, but you do with a DF. And the other way around as well. It's about picking which set of things you WANT to deal with, or are better for you to deal with than the other.

On Tue, Dec 24, 2024 at 9:50 AM David Sinn <dsinn@uw.edu> wrote:
From experience I can tell you that once you fully operationalize the pizza box model, you will never go back to the chassis model.

Why would you trade an open, standards-based interconnect model (OSPF and BGP work great at scale) for proprietary black boxes that do stupid router tricks to make a bunch of discrete components pretend to be one, along with giving you the "benefit" of a huge blast radius when the software inevitably breaks? Distributed ARP/ND: solved. Actually distributed BFD (not "it's all running on one line card because customers like LACP bundles spread between line cards and that's really hard to distribute reliably"): solved.

The pizza box model means the boxes are fungible. So you can competitively bid between multiple suppliers and pick and choose who you want to buy from depending on what is the most important thing at the time (delivery dates? price? which of them is annoying you the least at that moment?). They are also infinitely more scalable(*) than any big chassis model. State of the art 5 years ago had Internet edge systems deploying with 8k of 400G ports and datacenter deployments with 65k 400G ports using the same fundamental design.
The real downside: vendors don't like the flexibility it affords the customer, nor the way it renders vendor differentiation meaningless and pushes operators to avoid it.
David
(*) - Among the critical things to get right from the outset is the peak scale you want the fabric to reach, because recabling is not something to be taken lightly...
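[As a rough sanity check on the scale claims above, here is a small Python sketch of the textbook arithmetic for a folded two-tier Clos built from identical R-port boxes at 1:1 oversubscription. The radixes are illustrative and this is not David's actual design.]

    def two_tier_clos_usable_ports(radix):
        # Each leaf splits its radix: half down (usable), half up to spines.
        # A full spine layer supports `radix` leaves, so usable ports top
        # out at radix * radix/2 = radix^2 / 2.
        leaves = radix
        usable_per_leaf = radix // 2
        return leaves * usable_per_leaf

    for r in (32, 64, 128):
        print(f"{r}-port boxes -> {two_tier_clos_usable_ports(r):,} usable ports")
    # 128-port boxes land at 8,192 -- the ballpark of the "8k of 400G ports"
    # edge systems mentioned above; the 65k-port datacenters need a third stage.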
On Dec 23, 2024, at 7:15 AM, Mike Hammett <nanog@ics-il.net> wrote:
"Obviously, at some point, buying a big chassis..."
Actually, as I read more about it and watch more videos about it, it seems like that isn't necessarily true. The claims they have at the top end surpass what any chassis platform I've seen is capable of, though I don't know that they actually have pushed the upper bounds of what's possible in the real world.
-----
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP

Every port has two costs associated with it: the port itself and the optical pluggable. Historically, the proportion was even bigger than 10:1, but the optics people have managed to preserve their margin better. I have not tried to calculate it for a couple of years, but recently the cost of a router port was comparable to a single-mode pluggable; multi-mode was cheaper, and copper was cheaper still. (It was funny to see a discount being demanded from the networking vendor while the biggest payment was going to the optical vendor, especially for switches.)

Two hops through a Clos architecture is 4 ports and 4 pluggables -- all typically (in telco) single-mode, for unification. In the case of a chassis, the crossbar is electrical and costs very little money. Strictly speaking, the chassis-based router also has 4 ports on the traffic path, but single-mode pluggables sit only on the external ports; the internal ports are cost-comparable to copper pluggables. Hence, the chassis-based router has a natural advantage: less spend on SFP/QSFP/etc.

Strictly speaking, pluggables are not the networking vendor's business, especially for new high-speed interfaces. That a pizza box (even from a respected vendor, with all features available) is actually cheaper per port than a chassis is attributable to non-technical reasons, primarily stronger competition.

Note that a "modern switch" is largely a package for "optical pluggables". Hence, DC people make very strange designs (with a lot of compromises, typically oversubscription) to downgrade multi-mode pluggables to copper. Then they claim it as a big achievement. The money is there.

PS: I agree (with Tom) that the feature list is important even for the pizza box.

Eduard
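[Eduard's port-versus-pluggable arithmetic is easy to sketch in Python. All prices below are made-up placeholders purely to show the shape of the comparison: a two-hop Clos path pays for single-mode optics on every port, while a chassis pays for them only on the external ports.]

    PORT = 1_000       # cost of a router/switch port (hypothetical)
    SMF_OPTIC = 800    # single-mode pluggable (hypothetical)
    COPPER = 100       # copper/DAC-class interconnect (hypothetical)

    def clos_path_cost(hops=2):
        # Two hops through the fabric = 4 ports + 4 single-mode pluggables.
        ends = 2 * hops
        return ends * (PORT + SMF_OPTIC)

    def chassis_path_cost():
        # Also 4 ports on the path, but the fabric crossing is electrical,
        # so only the 2 external ports carry single-mode optics.
        return 2 * (PORT + SMF_OPTIC) + 2 * (PORT + COPPER)

    print(clos_path_cost())     # 7200
    print(chassis_path_cost())  # 5800 -- the chassis' "natural advantage"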

"Differences in ASICs, buffers, etc can really create traffic problems if you mix wrong" This is why I liked to create this thread. In the information I've read so far, the marketing speak was more or less that it worked and to just move on. It's good to learn that there are caveats that need to be explored. "makes your head hurt how much overcomplication" Aren't there memes about Silicon Valley re-inventing things we already have in a more complicated and cumbersome way? ----- Mike Hammett Intelligent Computing Solutions Midwest Internet Exchange The Brothers WISP ----- Original Message ----- From: "Tom Beecher" <beecher@beecher.cc> To: "David Sinn" <dsinn@uw.edu> Cc: "Mike Hammett" <nanog@ics-il.net>, "NANOG" <nanog@nanog.org> Sent: Tuesday, December 24, 2024 9:12:17 AM Subject: Re: Distributed Router Fabrics Much of this is right, but again with caveats. - The boxes are fungible, to a point. Differences in ASICs, buffers, etc can really create traffic problems if you mix wrong. You don't want to be yolo'ing whatever is cheap this month in there. - You're going to eventually have a feature need that commercial management software doesn't account for. Can they build it for you, and how much is that? If you built your own software to manage it, how much does it cost you to build it? - You're very correct about how initial mistakes or things you didn't know can bite you hard later. The wrong growing pain can really hurt if you're not prepared for it. - Really have to think about the internals and the design. There are some companies who have presented on how they built these things, and when you listen to their protocol design, it makes your head hurt how much overcomplication was built in. Like I said before, distributed fabrics CAN be amazing, but there are always tradeoffs. There are some things you don't have to care about with a big chassis, but you do with a DF. And the other way around as well. It's about picking which set of things you WANT to deal with, or are better for you to deal with than the other. On Tue, Dec 24, 2024 at 9:50 AM David Sinn < dsinn@uw.edu > wrote:

All major vendors espouse both chassis and fabrics, depending on what you are doing. I'm typically more of a fan of fabric-based models, but as others mentioned, it depends on what you are doing. When I say fabric, I mean something using Ethernet and a standard control plane, not the proprietary interconnects and fabric encapsulation you see in some BRCM-based solutions. Those are basically virtual/multi-chassis systems, which have their own pros and cons. What vendor/box you choose is mostly dependent on the feature set you need.

There is a giant list of pros and cons between traditional chassis and distributed "fabrics", but here are a few.

- Management and control plane scale can be an issue until that gets figured out. Doing a 1:1 replacement of traditional large chassis with fabrics can add a lot of routers to a network.
- Power depends on the chassis and power distribution design. However, you can build a fabric as you need it; it doesn't require day-one power like a traditional chassis.
- Upgrading chassis switch fabrics and moving/mixing generations of line cards is almost always a painful experience.

Phil

On Tue, Dec 24, 2024 at 2:00 PM Phil Bedard <bedard.phil@gmail.com> wrote:
Power depends on the chassis and power distribution design. However, you can build a fabric as you need, it doesn’t require the power day one like a traditional chassis.
Power is a *huge* part of the equation that I think many people overlook. When you look at what a really big chassis takes in terms of power feeds, it's not uncommon to need relatively specialized 3-phase 240V feeds for the very-high-end chassis boxes that give you the same kind of high-speed port density a pizza-box folded-Clos fabric can yield. (Not to pick on any vendor, but here's an example of the types of power feeds a large chassis can require:)

"AC Power Distribution Modules (PDMs)

The (REDACTED MODEL #) supports connection of a single-phase or three-phase (delta or wye) AC PDM. Four AC PDM models are available: three-phase delta, three-phase wye, seven-feed single-phase, and nine-feed single-phase.

- Each three-phase AC PDM requires two three-phase feeds to be connected. Each phase from each of the two feeds is distributed among one or two PSMs. One feed has each phase going to two PSMs, and the other feed has each phase going to a single PSM.
- The single-phase AC PDM provides an AC input connection from the single-phase AC power source, and also provides an input power interface to the PSM through a system power midplane. The single-phase AC PDMs accept seven or nine AC power cords from a single-phase AC source.
- Each AC input is independent and feeds one PSM. Up to nine PSMs can be connected through the AC PDM."

Generally speaking, you're getting a licensed electrician to run three-phase power feeds for that; you're not just asking the colo provider for a couple more outlets. The chassis listed above takes 4 power distribution modules, each with two 3-phase AC feeds, for a total of 8 3-phase AC connections, 4 primary and 4 secondary. That's a lot of custom electrical work to feed your chassis, not to mention 2x12kW of provisioned power.

By comparison, you can get a similar amount of port density and fabric throughput with a folded-Clos design using 1RU 24-port 400G rack switches, which each require a redundant 10A, 120V power feed: absolutely standard, and your normal rack PDU handles them quite well. Start with the 12 switches for your spine, add leaf switches as needed to scale up, and by the time you've hit the same leaf port density as the big chassis box, you've provisioned only half the power.

Over time, that difference in provisioned power can make a huge difference in operational costs. At this point, I'd be hard-pressed to find a reason to recommend a big single-chassis solution to anyone other than an enterprise customer that wants to outsource most of its network support needs to a vendor. In that model, yes, the one-big-chassis model can make sense. But for everyone else, it's seriously time to look at the scalability and operational cost benefits of clustered pizza boxes.

Thanks!

Matt
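[A quick Python sketch of the pay-as-you-grow side of Matt's power argument, hedged heavily: the chassis figure comes from his 2 x 12 kW example and the 12-switch spine follows his description, but the per-box draw is a placeholder. Where the curves actually cross depends on real hardware; the robust part is that the fabric's power shows up leaf by leaf rather than on day one.]

    CHASSIS_KW = 2 * 12.0   # from the example above: provisioned on day one

    BOX_KW = 0.65           # assumed typical draw of one 1RU box (placeholder)
    SPINES = 12             # the day-one spine from the example above

    def fabric_kw(leaves):
        # The fabric's power footprint grows with the leaves you deploy.
        return (SPINES + leaves) * BOX_KW

    for leaves in (0, 6, 12, 24):
        print(f"{leaves:2d} leaves: {fabric_kw(leaves):4.1f} kW "
              f"(chassis: {CHASSIS_KW:.0f} kW from day one)")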

On Wed, 25 Dec 2024 at 05:34, Matthew Petach <mpetach@netflight.com> wrote:
Power is a *huge* part of the equation that I think many people overlook. When you look at what a really big chassis takes in terms of power feeds, it's not uncommon to need relatively specialized 3-phase 240V power feeds for the very-high-end chassis box that give you the same type of high speed port densities that a pizza-box fabric folded Clos model can yield. (not to pick on any vendor, but here's an example of the types of power feeds a large chassis can require:)
This, and most of the differences, are implementation details, not fundamental. If we assume some fantastical world where you can make anything appear out of thin air at no cost, no one would use the same interfaces to connect customers as to build the interconnect between chips, because serdes fabric interfaces are lower power, pin count, thermal, and cost than real customer-facing ports, and you can have a higher density of them. At a very high (and impractical) level, that's the difference: how do you interconnect chips? Do you use a specialised solution that understands both ends are chips in the same rack, or a generic solution that makes no assumptions about what is connected on the other end?

In practice, people build these stacks of switches because they want right-sized platforms and there isn't quite the right size available from anyone they care to deploy. So the right-sized stack ends up being commercially more viable and more energy efficient, because the right box for the application has no commercial availability.

Luckily for most of us, these problems do not matter, as chip densities are front-running even most hyperscalers. When Amazon presented their stack-of-switches solution a couple of years ago, giving the densities and how many front-facing ports were 'wasted' on internal interconnect, vendors were already shipping single-chip solutions matching the stack's non-interconnect port density; i.e., already that hyperscaler solution could have been a single-chip device, not a stack of switches, not a fabric box. And this is true for almost every buyer in the market: you need just a single-chip box today; densities are absurd for almost everyone.

-- ++ytti

On Mon, Dec 23, 2024 at 10:15 AM Mike Hammett <nanog@ics-il.net> wrote:
Actually, as I read more about it and watch more videos about it, it seems like that isn't necessarily true. The claims they have at the top end surpass what any chassis platform I've seen is capable of, though I don't know that they actually have pushed the upper bounds of what's possible in the real world.
I wonder how large a failure domain folk are willing to accept. I also don't know that it's actually better to have 1 thing vs N things, since management of the things is probably the expensive part (once you get past space/power, which don't seem to be part of the calculations here -- not in my brief read of the thread at least).

-chris

In the articles I've read and videos I've watched, they have mentioned varying amounts of reduced power. I didn't commit them to memory because that wasn't the part I was interested in at the moment.

Management of the things is a big thing I've been concerned about in going to more modern systems. So often there's hand-waving regarding the orchestration piece of non-traditional systems. From what I've seen (and I would love to be wrong), you either build it in-house (not a small lift) or you buy something that ends up taking away all of the cost advantages that path had.

Failure domain stuff is part of what I'm trying to learn more about, which goes back to the fundamentals of how the fabric works.

-----
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP

----- Original Message -----
From: "Christopher Morrow" <morrowc.lists@gmail.com>
To: "NANOG" <nanog@nanog.org>
Sent: Monday, December 23, 2024 6:05:12 PM
Subject: Re: Distributed Router Fabrics

In the articles I've read and videos I've watched, they have mentioned varying amounts of reduced power. I didn't commit them to memory because that wasn't the part I was interested in at the moment.
The aggregate power load from the 1U boxes is by itself generally (but not always) going to be less than the equivalent-sized big chassis. (If you start having to add middle stages, that can sometimes not hold true.) On top of that, many designs allow DACs to be used for a large percentage of connections, which also brings significant power savings.

On Tue, Dec 24, 2024 at 8:26 AM Mike Hammett <nanog@ics-il.net> wrote:
In the articles I've read and videos I've watched, they have mentioned varying amounts of reduced power. I didn't commit them to memory because that wasn't the part I was interested in at the moment.
I'd think that, especially as data rates climb, the power consumption is going to get important really fast. When a single device requires ~50 kW to run... I think you'll want to make sure you have the space/power to deal with that :( I'm not sure that distributed fabric plans make that problem better? (Maybe it's all the same problem in the end, because the fabric interconnect is going to be distance-limited/etc too.)
Management of the things is a big thing I've been concerned about going into more modern systems. So often there's hand waiving regarding the orchestration piece of non-traditional systems. From what I've seen (and I would love to be wrong), you either build it in-house (not a small lift) or you buy something that ends up taking away all of the cost advantages that path had.
You almost certainly get into (pretty quickly) something that smells a bunch like: "here's my pile of ansible recipes for..." (choice of ansible here for example only; s/ansible/<whatever>/ of course, to whatever you feel like)

That's maybe fine if that's your jam? I think it's hard to reason/plan/build without some automation plan 'now', and it looks like a ton of folk start without that, then try to retrofit once "omg this is very large now... ugh" happens. (1-10 devices? sure, fine, do it by hand; beyond that you really ought to have had an automation plan at ~5 ... my opinion, clearly)
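[For flavor, a minimal Python sketch of the "pile of recipes" style of automation Chris describes: one template plus a device table rendering per-box configs. The template text, field names, ASN, and addresses are all invented; the point is that the table, not a human at a CLI, becomes the source of truth.]

    from string import Template

    CONFIG = Template(
        "hostname $name\n"
        "router bgp $asn\n"
        " neighbor $spine_ip remote-as $asn\n"
    )

    # The device table would normally live in YAML or a database, not inline.
    devices = [
        {"name": "leaf1", "asn": 64512, "spine_ip": "10.0.0.1"},
        {"name": "leaf2", "asn": 64512, "spine_ip": "10.0.0.3"},
    ]

    for dev in devices:
        # Render one config per box; push via whatever transport you trust.
        print(CONFIG.substitute(dev))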
Failure domain stuff is part of what I'm trying to learn more about, which goes back to more about the fundamentals of how the fabric works.
yea... This part (reasoning about failure domains) I assume is also a tad hard. A scenario: "I built this 200T fabric; I interconnect to the outside with ~100T max and internally with ~100T." Now that ~100T breaks and (ideally!) everything on the outside re-routes around to a different front door... oops, are you prepared for an extra ~100T arriving? How do you deal with parts (fabric parts) failing in part? "oops, only 50T of my 100T can get through here and... I am also still telling my external neighbors all's good."

Really, that failure-domain problem is tightly linked to the 'manage a ton of things' problem too... at least for containing damage in a quick manner.
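[Chris's scenario reduces to a check that is easy to state and hard to wire into routing: compare surviving internal capacity against the external capacity you are still advertising. A toy Python sketch, using the hypothetical Tbps figures from his example:]

    def still_safe_to_attract(external_tbps, internal_tbps):
        # Should this site keep drawing its full external load?
        return internal_tbps >= external_tbps

    external = 100.0
    internal = 100.0
    print(still_safe_to_attract(external, internal))   # True

    internal = 50.0   # half the fabric paths fail in part...
    # ...while BGP is still "telling my external neighbors all's good":
    print(still_safe_to_attract(external, internal))   # False
    # Closing that gap -- conditioning what you advertise on measured
    # fabric health -- is where the failure-domain problem meets the
    # 'manage a ton of things' problem.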

(sorry, I should have also mentioned one other thing below)

On Wed, Jan 1, 2025 at 3:30 PM Christopher Morrow <morrowc.lists@gmail.com> wrote:
I'd think that, especially as data rates climb, the power consumption is going to really get important fast. When a single device requires ~50kw to run ... I think you'll want to make sure you have space/power to deal with that :(
I'm not sure that distributed fabric plans make that problem better? (maybe it's all the same problem in the end because the fabric interconnect is going to be distance limited/etc too)
One thing to really think about here is: "What is the traffic pattern you expect to see?" I mean here:

* "I have jewels in the center of my network, and everyone comes through the edge to the jewel(s), and then back out" - "just have a ton of pizza boxes; all traffic is edge-to-core, and I'm not spending optics/etc in the local edge area bouncing traffic around"

* "I have no jewels, I'm providing any-to-any connectivity; folk might just bounce traffic through me and right back out the same location to someone else" - "oops, now I need to spray traffic in the local pop/metro and don't need as much core-facing capacity out of that local area"

anyway, interesting conversation :)

I don't find this explained in any of the literature I've looked at so far. In a distributed fabric, where is the traditional control plane run? Say I've got 100 BGP sessions of upstream, peer, and downstream across ten routers. Is each pizza box grinding this out on its own, or is the work done on the x86 box mentioned in the larger installations? If each box is doing it on its own, are there route reflectors somewhere making all of the decisions?

-----
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP

In a distributed fabric, where is the traditional control plane run? Say I've got 100 BGP sessions of upstream,peer, and downstream across ten routers. Is each pizza box grinding this out on its own, or is the work done on the x86 box mentioned in the larger installations?
one way to think of it is that each pizza box (customer facing ports) recognizes control plane messages (e.g. port 179) and "punts" them to the control plane box, aka routing engine. randy

one way to think of it is that each pizza box (customer facing ports) recognizes control plane messages (e.g. port 179) and "punts" them to the control plane box, aka routing engine.
fwiw, that is pretty much what line cards on a big-box fabric do, punt to the RE. randy

*nods* Yeah, I knew that's how a traditional chassis worked. In a distributed setup, you have the option of a single "line card", which obviously doesn't happen in the traditional chassis world.

I did see a DDCv2 document where they briefly mention 2 compute boxes, so now that makes sense. I had to look up some of the acronyms because the document didn't define them within itself.

-----
Mike Hammett
Intelligent Computing Solutions
Midwest Internet Exchange
The Brothers WISP

Again, it depends. DFs at the edge, as you're talking about, are tricky. We worked on some designs a couple of years ago. FIB management can become really difficult with a lot of big peers and/or connections to the DFZ. If you do it wrong, you can get nasty hotspotting or bouncing issues with your N/S traffic. It's doable, of course, but in many circumstances I think these make the most sense down in the aggregation layers of a design.

On Thu, Dec 26, 2024 at 9:30 PM Mike Hammett <nanog@ics-il.net> wrote:

On 12/26/24 14:46, Randy Bush wrote:
one way to think of it is that each pizza box (customer facing ports) recognizes control plane messages (e.g. port 179) and "punts" them to the control plane box, aka routing engine.
This is similar to the way I think about it. In a router (switch) with a bunch of line cards (1 or more), there is a set of match rules (ACLs, if you will) which match traffic bound for the control plane and forward it via a management port to the control-plane CPU; conveniently, this is also where you implement your control-plane protection. If you substitute an Ethernet switch for the line card, you are broadly in a similar place conceptually, except that the main control-plane processor is at some remove from the switch that is now a line card. At some point you need to encapsulate the control-plane messages, because the management next hop is remote rather than local and the neighbor is also a switch.

For me, the realization a decade ago was that enclosing a large number of ASICs in sheet metal, with the associated midplane and glue logic, was a higher capital risk and reduced the size of the addressable market versus smaller switches which could be assembled piecemeal into a larger switch. The white-box vendors and the ODMs are loath to build something with limited market addressability, so over time it becomes more attractive to build large assemblies of boxes that leverage the atom of switch the 1/2RU pizza box can enclose, and just buy more of them.

That said, the maximum radix of a single large switch ASIC right now is 512 x 100G ports, so you need to be able to build a box that can enclose that many ports, or the 64 x 800G that it maps to, which is still a very hefty pizza box.
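A quick check of that radix arithmetic; the 51.2 Tb/s figure follows from the two port breakouts in the post rather than from any particular part number:

    # 512 x 100G and 64 x 800G are the same ASIC capacity, broken out
    # at different port speeds.
    ASIC_CAPACITY_G = 512 * 100        # 51,200 Gb/s of switching capacity
    for speed_g in (100, 800):
        print(f"{ASIC_CAPACITY_G // speed_g} x {speed_g}G ports")
    # prints: 512 x 100G ports
    #         64 x 800G ports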
participants (13)
- Christopher Morrow
- David Sinn
- joel jaeggli
- Matthew Petach
- Mike Hammett
- Phil Bedard
- Randy Bush
- Saku Ytti
- Shawn L
- Tom Beecher
- Tony Wicks
- Vasilenko Eduard
- Yan Filyurin