Guys, I am wondering how many of you are doing layer 3 to top of rack switches and what the pros and cons are. Also, if you are doing layer 3 to top of rack do you guys have any links to published white papers on it? Thanks, Raj Singh
We are heading towards that type of deployment beginning next year, with Juniper EX4200 switches in a redundant configuration. These will be pure Layer 2 on the switches and will "uplink" to Juniper M10i's for Layer 3... The power savings, space savings, etc. over traditional Cisco 6500 chassis (plus all the cabling between cabinets, which in our case is a nightmare) made this a pretty easy choice... and price too ;) Somewhere on Juniper's website, in the product info section, they have deployment whitepapers on this kind of stuff if that's of interest. Hope this helps.. Paul -----Original Message----- From: Raj Singh [mailto:raj.singh@demandmedia.com] Sent: November-12-09 2:49 PM To: 'nanog@nanog.org' Subject: Layer 2 vs. Layer 3 to TOR
We are actually looking at going Layer 3 all the way to the top of rack and making each rack its own /24. This gives us flexibility when doing maintenance (spanning-tree). Also, troubleshooting during outages is much easier using common tools like ping and traceroute. I want to make sure this is something other people are doing out there, and to know whether anyone has run into any issues with this setup. Thanks, Raj Singh | Director Network Engineering _________________________________ Demand Media | eNom, Inc. Direct: 425.974.4679 15801 NE 24th St. Bellevue, WA 98008 Raj.Singh@DemandMedia.com -----Original Message----- From: Paul Stewart [mailto:pstewart@nexicomgroup.net] Sent: Thursday, November 12, 2009 11:53 AM To: Raj Singh; nanog@nanog.org Subject: RE: Layer 2 vs. Layer 3 to TOR
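To put numbers on the one-/24-per-rack idea (and on the address-waste objection raised in the replies), here is a minimal Python sketch. The aggregate 10.10.0.0/16, the rack names and the 40-servers-per-rack figure are hypothetical, and it assumes one address per rack is consumed by the TOR's gateway interface.

```python
import ipaddress
from itertools import islice


def per_rack_subnets(aggregate, racks, prefixlen=24):
    """Carve one subnet per rack out of an aggregate block (hypothetical plan)."""
    block = ipaddress.ip_network(aggregate)
    chosen = list(islice(block.subnets(new_prefix=prefixlen), racks))
    if len(chosen) < racks:
        raise ValueError("aggregate too small for the requested number of racks")
    return {f"rack{i + 1:02d}": net for i, net in enumerate(chosen)}


def utilization(subnet, hosts_in_rack, reserved=1):
    """Return (used, unused) host addresses in an IPv4 subnet.

    `reserved` counts addresses taken by the network itself, e.g. the TOR gateway.
    """
    usable = subnet.num_addresses - 2  # exclude network and broadcast addresses
    used = hosts_in_rack + reserved
    if used > usable:
        raise ValueError("rack does not fit in this subnet")
    return used, usable - used


def smallest_fitting_prefix(hosts_in_rack, reserved=1):
    """Longest IPv4 prefix (smallest subnet) that still holds the rack."""
    for prefixlen in range(30, -1, -1):
        if 2 ** (32 - prefixlen) - 2 >= hosts_in_rack + reserved:
            return prefixlen
    raise ValueError("too many hosts for IPv4")


if __name__ == "__main__":
    plan = per_rack_subnets("10.10.0.0/16", racks=4)
    for rack, net in plan.items():
        used, unused = utilization(net, hosts_in_rack=40)
        print(rack, net, "used", used, "unused", unused)
    print("smallest fitting prefix: /%d" % smallest_fitting_prefix(40))
```

With 40 servers per rack, each /24 leaves 213 of its 254 usable addresses idle, and a /26 (62 usable) would have been enough. That is the trade-off between uniform, easy-to-read /24s and address efficiency.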
Hi, on 12.11.2009 at 21:04, Raj Singh wrote:
We are actually looking at going Layer 3 all the way to the top of rack and make each rack its own /24.
What a waste of IPs, and an unnecessary loss of flexibility!
This provides us flexibility when doing maintenance (spanning-tree).
If you use a simple aggregation setup, you do not need xSTP. Even with redundancy, RTG (Redundant Trunk Group; Cisco calls it Flex Links) will be sufficient. Spanning L2 over more than one rack gets dirty when you do L3 on the TORs, because you need to build a Virtual Chassis or VPLS tunnels (not sure whether the EX4200 does that as of today).
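A toy Python model of the active/backup uplink idea behind RTG / Flex Links, to show why no spanning tree is needed: only one uplink forwards at any time, so the topology never contains an active loop. The class and link names are hypothetical, and the model assumes the group reverts to the primary as soon as it recovers; real platforms have their own preemption behaviour, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class RedundantUplinkGroup:
    """Toy active/backup uplink pair, in the spirit of RTG / Flex Links.

    Exactly one healthy uplink forwards at a time, so there is never an active
    loop for spanning tree to break. Assumes immediate revert to the primary.
    """

    primary: str
    backup: str
    link_up: dict = field(default_factory=dict)

    def __post_init__(self):
        # Both uplinks start out healthy unless told otherwise.
        self.link_up.setdefault(self.primary, True)
        self.link_up.setdefault(self.backup, True)

    def set_link(self, name, up):
        if name not in (self.primary, self.backup):
            raise KeyError(f"{name!r} is not a member of this group")
        self.link_up[name] = up

    def active(self):
        """Return the uplink that should forward right now, or None if both are down."""
        if self.link_up[self.primary]:
            return self.primary
        if self.link_up[self.backup]:
            return self.backup
        return None


if __name__ == "__main__":
    group = RedundantUplinkGroup(primary="uplink-a", backup="uplink-b")
    print(group.active())  # uplink-a
    group.set_link("uplink-a", False)
    print(group.active())  # uplink-b
```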
Also, troubleshooting during outages is much easier by using common tools like ping and trace routes.
Oh, c'mon. Yes, Layer 2 is a wild jungle compared to clean routing, but tracing there isn't black magic either. You have LLDP, MAC address tables, ARP tables...
I want to make sure this is something other people are doing out there and want to know if anyone ran into any issues with this setup.
From a design POV, doing L3 on the TOR switches is a clean and nice concept, but in real life it doesn't work very well. Every time I played with such a setup, with every vendor I've seen, I reached the same conclusion: let routers route and let switches switch. Switches that are supposed to do routing never scale, almost always provide immature implementations of common L3 features, and run into capacity problems far too fast (tables too small for firewall rules and route entries, no full IPv6 capabilities, sometimes expensive licenses needed for things like IS-IS...). I understand the wish to keep broadcast domains small and network paths deterministic and clean, but the switches you can buy today for not-too-much-money aren't ready yet. So my hint is: look at model #4 from the mentioned NANOG presentation. My 2 Euro-Cents, .m
I believe TRILL will render this discussion moot. It should be shipping on gear from various vendors within the next year. -----Original Message----- From: Malte von dem Hagen [mailto:mvh@hosteurope.de] Sent: Thursday, November 12, 2009 1:09 PM To: Raj Singh Cc: nanog@nanog.org Subject: Re: Layer 2 vs. Layer 3 to TOR
Raj Singh wrote:
We are actually looking at going Layer 3 all the way to the top of rack and make each rack its own /24. This provides us flexibility when doing maintenance (spanning-tree). Also, troubleshooting during outages is much easier by using common tools like ping and trace routes.
I'm confused about where STP fits into this. If you're doing /24s to each switch, why even bring STP into the picture? Do /31s to each TOR switch and use OSPF or IS-IS. I don't know many people who have not had an awful experience with STP at some point.
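A minimal Python sketch of /31-per-uplink addressing as suggested above (point-to-point /31s per RFC 3021), over which the TORs would then run OSPF or IS-IS. The pool 10.255.0.0/24 and the link names are hypothetical, and putting the lower address on the aggregation side is just a convention assumed here.

```python
import ipaddress
from itertools import islice


def p2p_links(pool, uplinks):
    """Assign one /31 per uplink from `pool`.

    Returns {uplink_name: (aggregation_side_ip, tor_side_ip)}; the lower address
    goes to the aggregation side by (assumed) convention.
    """
    block = ipaddress.ip_network(pool)
    nets = list(islice(block.subnets(new_prefix=31), len(uplinks)))
    if len(nets) < len(uplinks):
        raise ValueError("pool too small for the requested uplinks")
    return {name: (net[0], net[1]) for name, net in zip(uplinks, nets)}


if __name__ == "__main__":
    # Two uplinks per TOR (one to each aggregation box), hypothetical names.
    names = [f"tor{t}-agg{a}" for t in range(1, 4) for a in (1, 2)]
    for name, (agg_ip, tor_ip) in p2p_links("10.255.0.0/24", names).items():
        print(f"{name}: agg {agg_ip}/31  tor {tor_ip}/31")
```

A /24 holds 128 such /31s, i.e. enough for 64 dual-homed TORs.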
On Nov 12, 2009, at 2:48 PM, Raj Singh wrote:
Guys,
I am wondering how many of you are doing layer 3 to top of rack switches and what the pros and cons are. Also, if you are doing layer 3 to top of rack do you guys have any links to published white papers on it?
Dani Roisman gave an excellent talk on this subject at NANOG 46 in Philadelphia: http://www.nanog.org/meetings/nanog46/abstracts.php?pt=MTQwOCZuYW5vZzQ2&nm=nanog46 Steve
Steve Feldman wrote:
Dani Roisman gave an excellent talk on this subject at NANOG 46 in Philadelphia:
http://www.nanog.org/meetings/nanog46/abstracts.php?pt=MTQwOCZuYW5vZzQ2&nm=nanog46
I'd always wondered how you make a subnet available across racks with L3 rack switching. It seems that you don't. ~Seth
On Thu, Nov 12, 2009 at 12:19:36PM -0800, Seth Mattinen wrote:
I'd always wondered how you make a subnet available across racks with L3 rack switching. It seems that you don't.
~Seth
It's possible, with prior planning. You can have the uplinks be layer 2 trunks, with a layer 3 SVI in the trunk acting as your actual routed uplink. It requires a lot of advance planning regarding which VLANs are trunked where, etc. It allows you to do layer 3 termination at the top of rack for single servers, while still offering VLANs that span multiple layer 3 switches, with HSRP at the distribution layer, as an option for systems/services that require a common broadcast domain. -- Brandon Ewing (nicotine@warningg.com)
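The "which VLANs are trunked where" planning can be sketched as a small Python check. Everything here is hypothetical: the VLAN IDs, the TOR names, and the assumption that each TOR's routed uplink uses its own SVI VLAN, which must not collide with the VLANs spanned across racks (with HSRP at distribution). It only derives per-uplink allowed-VLAN lists; it does not generate any vendor configuration.

```python
def trunk_plan(spanned_vlans, uplink_svi_vlan):
    """Derive the VLANs each TOR uplink trunk must carry.

    spanned_vlans:   {vlan_id: set of TORs that need that shared L2 VLAN}
    uplink_svi_vlan: {tor: vlan_id of the SVI acting as its routed uplink}
    Returns {tor: sorted list of VLAN IDs allowed on that TOR's uplink trunk}.
    """
    svi_ids = list(uplink_svi_vlan.values())
    if len(set(svi_ids)) != len(svi_ids):
        raise ValueError("each TOR needs its own routed-uplink VLAN")
    allowed = {tor: {vid} for tor, vid in uplink_svi_vlan.items()}
    for vid, tors in spanned_vlans.items():
        if vid in svi_ids:
            raise ValueError(f"VLAN {vid} is already a routed-uplink VLAN")
        for tor in tors:
            if tor not in allowed:
                raise KeyError(f"VLAN {vid} references unknown TOR {tor!r}")
            allowed[tor].add(vid)
    return {tor: sorted(vids) for tor, vids in allowed.items()}


if __name__ == "__main__":
    plan = trunk_plan(
        spanned_vlans={200: {"tor1", "tor2"}, 201: {"tor2"}},
        uplink_svi_vlan={"tor1": 901, "tor2": 902, "tor3": 903},
    )
    for tor, vids in plan.items():
        print(tor, vids)
```

Catching an unknown TOR or a VLAN ID collision on paper is far cheaper than discovering it as a broadcast-domain leak in production.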
If you use stackable switches, you can stack across cabinets (up to 3 with 1-meter Cisco 3750 StackWise cables) and uplink on the ends. It's a pretty solid layout if you plan your port needs properly based on NIC density and cabinet size, plus you can cable cleanly to an adjacent cabinet's switch if necessary. Slightly off-topic: consider offloading 100Mb connections like PDUs, DRAC/iLO, etc. to lower-cost switches to get the most out of your premium ports. -Tim -----Original Message----- From: Seth Mattinen [mailto:sethm@rollernet.us] Sent: Thursday, November 12, 2009 3:20 PM To: 'nanog@nanog.org' Subject: Re: Layer 2 vs. Layer 3 to TOR
On Thu, Nov 12, 2009 at 2:40 PM, Bulger, Tim <Tim_Bulger@polk.com> wrote:
If you use stackable switches, you can stack across cabinets (up to 3 with 1 meter Cisco 3750 Stackwise), and uplink on the ends. It's a pretty solid layout if you plan your port needs properly based on NIC density and cabinet size, plus you can cable cleanly to an adjacent cabinet's switch if necessary.
Slightly off-topic.. Consider offloading 100Mb connections like PDUs, DRAC/iLO, etc. to lower cost switches to get the most out of your premium ports.
Agreed. We use unmanaged Netgear gigabit switches for what Tim suggests, saving the higher-cost-per-port switchports for server gear. -brandon
-- Brandon Galbraith Mobile: 630.400.6992 FNAL: 630.840.2141
On 12/11/2009 20:40, Bulger, Tim wrote:
Slightly off-topic.. Consider offloading 100Mb connections like PDUs, DRAC/iLO, etc. to lower cost switches to get the most out of your premium ports.
Not just that: you can also use lower-cost switches to move your management fully out of band with respect to your production traffic. This can work well in times of catastrophe. Nick
On Thu, Nov 12, 2009 at 9:40 PM, Bulger, Tim <Tim_Bulger@polk.com> wrote:
If you use stackable switches, you can stack across cabinets (up to 3 with 1 meter Cisco 3750 Stackwise), and uplink on the ends. It's a pretty solid layout if you plan your port needs properly based on NIC density and cabinet size, plus you can cable cleanly to an adjacent cabinet's switch if necessary.
Juniper claims their switches can be clustered using Ethernet cabling, yet the cluster behaves as a single system image configuration-wise. That should allow for very flexible cabling and operations for TOR switches. I have never tried it, however. /Kinkie
On Wed, Nov 18, 2009 at 4:04 PM, Kinkie <gkinkie@gmail.com> wrote:
Juniper claims their switches can do clustering using ethernet cabling, yet a cluster behaves as a single-system-image configuration-wise. Should allow for very flexible cabling and operations-wise for TOR switches. I have never tried it however.
The EX4200 can be stacked via the Ethernet expansion ports, either 4 x 1G or 2 x 10G. And yes, it behaves as a single switch with multiple line cards.
On Wed, Nov 18, 2009 at 04:34:11PM +0200, Eugeniu Patrascu wrote:
The Ex4200 can be stacked by the ethernet expansion ports, either 4 x 1G or 2 x 10G. And yes, it behaves as single switch with multiple line cards.
Yes, up to 10 EX4200 switches can be interconnected into a "Virtual Chassis" using either the rear Virtual Chassis Ports (VCPs; 32 Gbps ingress + 32 Gbps egress for each of the 2 ports) with VCP cables up to 5 meters long, or using SFP, XFP or SFP+ fiber links (not sure whether it works with copper SFPs, but it might). You can mix and match interconnection types within the same VC.
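Working through the numbers in the post above: two rear VCPs at 32 Gbps each way give 64 Gbps per direction per member, or 128 Gbps counting both directions. The tiny Python sketch below just encodes those quoted figures and the 10-member limit as assumptions; it says nothing about how traffic actually spreads across the members of a VC.

```python
# Figures quoted in the post above (EX4200 Virtual Chassis); treat as assumptions.
MAX_MEMBERS = 10
VCP_PORTS_PER_MEMBER = 2
GBPS_PER_VCP_PER_DIRECTION = 32


def vcp_bandwidth_per_member():
    """Return (Gbps per direction, Gbps counting both directions) over the rear VCPs."""
    per_direction = VCP_PORTS_PER_MEMBER * GBPS_PER_VCP_PER_DIRECTION
    return per_direction, 2 * per_direction


def check_vc_size(members):
    """Reject a planned Virtual Chassis larger than the quoted member limit."""
    if not 1 <= members <= MAX_MEMBERS:
        raise ValueError(f"a Virtual Chassis holds 1..{MAX_MEMBERS} members, got {members}")
    return members


if __name__ == "__main__":
    print(vcp_bandwidth_per_member())  # (64, 128)
```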
Seth Mattinen wrote:
I'd always wondered how you make a subnet available across racks with L3 rack switching. It seems that you don't.
You could route /32s within your L3 environment, or maybe even leverage something like VPLS - not sure of any TOR-level switches that can MPLS-pseudowire a port into a VPLS cloud, though. Kinda makes L3 and spanning tree sound like a great option, doesn't it?
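To make the "route /32s" idea concrete: a host route is more specific than the per-rack /24, so longest-prefix match sends that one address to whichever TOR announces the /32, letting a single server sit in another rack while keeping its address. The Python sketch below is a toy routing table with hypothetical names and prefixes; a real deployment would carry these routes in the IGP and is bounded by the switches' route-table sizes.

```python
import ipaddress


class RouteTable:
    """Toy IPv4 routing table with longest-prefix-match lookup."""

    def __init__(self):
        self.routes = []  # list of (network, next_hop)

    def add(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, address):
        """Return the next hop of the most specific matching route, or None."""
        ip = ipaddress.ip_address(address)
        matches = [(net, hop) for net, hop in self.routes if ip in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    rib = RouteTable()
    rib.add("10.20.1.0/24", "tor-rack1")   # rack 1's subnet
    rib.add("10.20.3.0/24", "tor-rack3")   # rack 3's subnet
    rib.add("10.20.1.50/32", "tor-rack3")  # one rack-1 address living in rack 3
    for addr in ("10.20.1.50", "10.20.1.51", "10.20.3.7", "192.0.2.1"):
        print(addr, "->", rib.lookup(addr))
```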
Excerpts from David Coulson's message of Thu Nov 12 13:07:35 -0800 2009:
You could route /32s within your L3 environment, or maybe even leverage something like VPLS - Not sure of any TOR-level switches that MPLS pseudowire a port into a VPLS cloud though.
I was recently looking into this (a top-of-rack VPLS PE box). There don't seem to be any obvious options, though the new Juniper MX80 sounds like it can do this. It's 2 RU, and looks like it can take a DPC card or comes in a fixed 48-port GigE variety. I like the idea of doing IP routing to a top-of-rack or edge device, but have found others to be skeptical. Are there any applications that absolutely *have* to sit on the same LAN/broadcast domain and can't be configured to use unicast or multicast IP? --j
Jonathan Lassoff wrote:
I was recently looking into this (top-of-rack VPLS PE box). Doesn't seem to be any obvious options, though the new Juniper MX80 sounds like it can do this. It's 2 RU, and looks like it can take a DPC card or comes in a fixed 48-port GigE variety.
The MX series are pretty nice. The MX80 should be able to act as a VPLS PE, though I've never tried it on that box; the MX240 did it pretty well last time I tried. I've no clue how the cost of that switch compares to a Cisco 4900 or similar (not that a 4900 is anything special - L3 is all in software).
Are there any applications that absolutely *have* to sit on the same LAN/broadcast domain and can't be configured to use unicast or multicast IP?
The biggest hurdle we hit when trying to do TOR L3 (Cisco 4948s with a /24 routed to each one) was devices that required multiple physical Ethernet connections, which we typically bundle with LACP, and environments that do IP takeover for redundancy. Both are obviously easy to work around if you run an IGP on your servers, but that was just insanely complex for our environment. It's hard to convince people that an HP-UX box now needs to work like a router. So now we have a datacenter full of 4948s doing pure L2 and spanning tree... What a waste :-)
On 2009-11-12 22:37, David Coulson wrote:
The MX-series are pretty nice. That should be able to do VPLS PE, however I've never tried it - MX240 did it pretty well last time I tried. I've no clue how the cost of that switch compares to a cisco 4900 or something (not that a 4900 is anything special - L3 is all in software).
For both the 4948/4948-10GE and the 4900M, L3 is in hardware. On the 4948/4948-10GE IPv6 is in software; on the 4900M it's in hardware. -- "Everything will be okay in the end. | Łukasz Bromirski If it's not okay, it's not the end. | http://lukasz.bromirski.net
I would suggest doing a VC with the TOR switches. That way you can have "one" switch for a lot of racks (I believe 10 would be the upper limit with Juniper). With a VC you could do L3 and L2 where needed on every rack the VC covers. // Olof 2009/11/13 Łukasz Bromirski <lukasz@bromirski.net>:
Hi, on 12.11.2009 at 22:29, Jonathan Lassoff wrote:
Are there any applications that absolutely *have* to sit on the same LAN/broadcast domain and can't be configured to use unicast or multicast IP?
Yes. There are at least some implementations of iSCSI and its accompanying management services (e.g., for redundancy) that do not work well over routed connections. Storage services in general can be difficult to route. Further, some aspects of VMware clusters, including management, "need" L2 connectivity, for example when you want to move VMs transparently from one hardware node to another, and so on and so forth. The same applies to several load-balancing and/or redundancy/failover mechanisms. rgds, .m
* Jonathan Lassoff
Are there any applications that absolutely *have* to sit on the same LAN/broadcast domain and can't be configured to use unicast or multicast IP?
FCoE comes to mind. -- Tore Anderson Redpill Linpro AS - http://www.redpill-linpro.com/ Tel: +47 21 54 41 27
On Fri, 2009-11-13 at 09:44 +0100, Tore Anderson wrote:
* Jonathan Lassoff
Are there any applications that absolutely *have* to sit on the same LAN/broadcast domain and can't be configured to use unicast or multicast IP?
FCoE comes to mind.
...and in a similar vein, ATAoE: either the Coraid stuff or the free implementation in the Linux kernel. It's heavily used in some shops that run virtual farms with SANs, as it's cheap/free and works over existing hardware, but only at layer 2. I even run it at home (!) and it's a surprisingly easy way to have a shelf of storage hanging off the back of a server, with 4GB of cache for each set of 4 disks per box. Stand too close and you can feel the wind from it, especially if RAIDed. It depends: is there much call for VM-ing in your shop in the future? Gord -- NNNN
Tore Anderson writes:
* Jonathan Lassoff
Are there any applications that absolutely *have* to sit on the same LAN/broadcast domain and can't be configured to use unicast or multicast IP?
FCoE comes to mind.
Doesn't FCoE need even more than that, i.e. "lossless" Ethernet with end-to-end flow control, such as IEEE DCB? As far as I understand, traditional switched Ethernets don't fit the bill anyway. On the other hand, iSCSI should be fine over routed IP paths, though Malte's mail suggests that there are (broken?) implementations that aren't. -- Simon.
2009/11/12 David Coulson <david@davidcoulson.net>
You could route /32s within your L3 environment, or maybe even leverage something like VPLS - Not sure of any TOR-level switches that MPLS pseudowire a port into a VPLS cloud though.
Just to let you know: the Juniper EX4200 series only supports a single-label stack, and RSVP but not LDP; plus it has a restricted BGP table size, so VPLS is out of the question. Matthew Walster
i have seen no mention of arista as a tor switch/router, yet folk tell me it is one of the hottest on the block today. is there anyone who is actually using it who would care to report? randy
Good point about Arista - Doug Gourlay, of [ex-]Cisco fame, is probably the person to ask all possible questions about those solutions. Cisco UCS is also missing: looking at a Nexus deployment as a ToR solution (2K + 5K, even 1KV given the needs of virtualization), with all the benefits of both traditional ToR and EoR/MoR, will definitely shed some light on the debate over whether L3 in the ToR makes any sense at all (e.g. how would you VMotion across racks?!? How would you sync SANs across L3 in the DC (tunnel?!?), etc.). Here are some interesting articles on technologies in new DC designs that allow some rethinking of the L3 question: http://www.internetworkexpert.org/ - search for the ToR and VMotion articles (actually, poke around the whole blog - it is very good) http://blogstu.wordpress.com/2009/10/05/fcoe-ecosystem/ (start from part 1, of course) ...etc. ***Stefan On Fri, Nov 13, 2009 at 7:33 AM, Randy Bush <randy@psg.com> wrote:
I've been using Arista 7124S switches in a ToR deployment for a new build-out for a high-frequency trading client I've been engaged with. For the aggregation layer I went with Cisco 4900Ms, and I have had much success with this deployment, especially with the Aristas.
From a colleague here at NASA (high-performance computing area):
"We are currently using our three Arista switches as an extremely economical way to get a 10G non-blocking testbed for our various test areas. We have every intention of looking at them as an option for their routing capabilities, but have been buried with setup and testing of our testbed equipment and getting ready for Super Computing 2009. They seem to have a number of very promising possibilities and have so far proven to be very capable switches." Paul Lang Joe
On Nov 13, 2009, at 4:14 AM, Matthew Walster wrote:
Just to let you know - the Juniper EX4200 series only support a single label stack, and RSVP not LDP - plus they have a restricted BGP table size, so VPLS is out of the question.
If you want something to do this, it's called the MX series. The EX is a switch... L3, but still a switch.
Disagree, the EX is a very capable L3 router for LANs. On Nov 13, 2009, at 1:17 PM, Cord MacLeod wrote:
I believe the issue will become moot in the next 12 months, when vendors begin to ship switches with TRILL. TRILL is basically a layer 2 routing protocol that will replace spanning tree. It will allow you to connect several uplinks, utilize all of the uplinks' bandwidth, prevent loops, and find the best path to the destination through the switch fabric. Think of it like OSPF for layer 2. It should be shipping within the next 6 to 9 months.
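The "OSPF for layer 2" analogy rests on link-state shortest-path-first computation with all equal-cost paths in use, instead of a single spanning tree that blocks redundant links. The Python sketch below is a generic Dijkstra SPF that keeps every equal-cost first hop; it is not an implementation of TRILL itself (which builds on IS-IS) and ignores everything TRILL adds on top. Node names and link costs are hypothetical, and costs are assumed to be positive.

```python
import heapq


def spf_ecmp(graph, source):
    """Dijkstra SPF that keeps every equal-cost first hop.

    graph: {node: {neighbor: positive_cost}} (adjacencies listed in both directions)
    Returns (dist, first_hops): dist[n] is the path cost from `source`, and
    first_hops[n] is the set of `source`'s neighbors on some shortest path to n.
    """
    dist = {source: 0}
    first_hops = {source: set()}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            # Leaving the source, the first hop is the neighbor itself;
            # otherwise inherit the first hops that reached `node`.
            hops = {nbr} if node == source else first_hops[node]
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                first_hops[nbr] = set(hops)
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                first_hops[nbr] |= hops  # another equal-cost path
    return dist, first_hops


if __name__ == "__main__":
    # Hypothetical fabric: two TORs, each wired to two aggregation switches.
    fabric = {
        "tor1": {"agg1": 1, "agg2": 1},
        "tor2": {"agg1": 1, "agg2": 1},
        "agg1": {"tor1": 1, "tor2": 1},
        "agg2": {"tor1": 1, "tor2": 1},
    }
    dist, hops = spf_ecmp(fabric, "tor1")
    print(dist["tor2"], sorted(hops["tor2"]))  # 2 ['agg1', 'agg2']
```

Where spanning tree would block one of tor1's two uplinks, the SPF result keeps both as equal-cost next hops toward tor2.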
participants (27): Brandon Ewing; Brandon Galbraith; Bulger, Tim; Chuck Anderson; Cord MacLeod; David Coulson; Eugeniu Patrascu; George Bonser; gordon b slater; Joe Loiacono; Jonathan Lassoff; Kinkie; Malte von dem Hagen; Matthew Walster; Nick Hilliard; Olof Kasselstrand; Paul Stewart; Raj Singh; Randy Bush; rodrick brown; Seth Mattinen; Shane Ronan; Simon Leinen; Stefan; Steve Feldman; Tore Anderson; Łukasz Bromirski