Thanks, working on a similar design and now know what to avoid :) Rgds, dan

On Feb 27, 2012, at 4:22 PM, David Swafford wrote:
Hi Everyone!
I had several requests for more feedback on our FCoE experience, based on my comments from a thread last week, so I'm writing here with a bit more background on our project in hopes that it saves some pain for others :-).
I'm with a sizable health insurance provider in the Midwest, and we've typically focused on technology vs. headcount as an overall strategy. Because of that, we upgrade much more often than some of our peers in the industry, since technology is still cheaper than long-term staffing costs.
Last fall, we were faced with both power and rack capacity constraints in our primary datacenter, which is just three years old now. Several ideas were on the table, including taking out a section of IT cubes to expand the DC, but the most appealing was to consolidate our server and network infrastructure into what was coined our "High Density Row".
We transitioned from Cat6500s at the access layer to a Nexus 5K deployment, using 5Ks as both distribution and access for the new HD row. We didn't like how oversubscription is handled on the 2K FEXs when it comes to 10G links, so for our situation an all-5K design made the most sense. Our capacity needs couldn't justify 7Ks, and while they would have been cool to have, we didn't want to blow money just because.
Our SAN is an EMC Symmetrix with Cisco MDS switches in between it and the hosts (Fibre Channel). In the new row, we deployed all hosts with CNAs (converged network adapters), which combine both FCoE storage and network in a single 10Gb connection. Since FCoE was new to all of us, we used a phased approach that the Nexus offered, where we brought straight Fibre Channel connections into our distribution-layer 5Ks and used the Nexus' FCoE proxy functionality to convert between true FC and FCoE.
From the host's perspective, it was only aware of FCoE connectivity to the Nexus. VSANs had to be created on the Nexus to map back to the FC VSANs on the MDS side, virtual Fibre Channel (VFC) interfaces were created on the Nexus side, and a few other settings had to be configured.
Overall, though, the config wasn't huge, but the biggest hurdle for us was that, as the network guys, we had to learn the storage side to be able to set this up properly. That meant new terms like WWN (World Wide Name), the FLOGI database, VSAN (essentially a VLAN for storage), etc. Also, on the Nexus side, you have to enable the FCoE feature, as NX-OS is very modular and leaves most options disabled during the initial setup.
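For anyone curious, here's roughly what the Nexus side of that looks like. This is only a minimal sketch, not our exact config: the interface numbers and VLAN/VSAN IDs are made up for illustration, and the MDS-side zoning and the native FC uplinks are left out entirely.

  feature fcoe                          ! FCoE is off by default on NX-OS
  vsan database
    vsan 100                            ! matches the existing VSAN on the MDS side
  vlan 100
    fcoe vsan 100                       ! dedicate a VLAN to carry that VSAN as FCoE
  interface vfc10
    bind interface Ethernet1/10         ! tie the virtual Fibre Channel interface to the CNA-facing port
    no shutdown
  vsan database
    vsan 100 interface vfc10            ! put the VFC into the VSAN
  interface Ethernet1/10
    switchport mode trunk
    switchport trunk allowed vlan 1,100 ! the FCoE VLAN has to be trunked down to the host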
The painful part, which is probably what might be of most interest here, is that we hit a very strange and catastrophic issue specific to QLogic's 8242 copper-based (twinax) CNA adapter. As part of the burn-in testing, we were working with our server team to simulate the loss of a link/card/switch (all hosts were dual-connected with dual CNAs to separate 5Ks). We were using the Cisco-branded twinax cabling and QLogic's 8242 card (brand new HP DL580s in this case, new card, new 5K, new cabling). When a single link was dropped/disconnected PHYSICALLY (a shut/no shut is not the same here), the host's throughput on BOTH storage and network went to crap.
Our baseline was showing nearly 400 MB/s on storage (raw disk I/O) tests prior to a link drop and 1-8 MB/s after! This situation would not recover until you fully rebooted/power cycled the server. We had the same results across every HP DL580 tested, which was 5-6 of them I believe. We replaced CNAs, cables, and even moved ports across 5Ks. It didn't matter which cable, 5K, port, or card we used, all reacted the same! The hosts were all Windows 2008 Datacenter, similar hardware, Nexus 5K on current code, twinax cabling.
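If you're doing similar burn-in testing, the standard NX-OS show commands below will tell you what the 5K itself thinks is going on when a path is pulled (interface numbers again just made up for illustration):

  show interface vfc10                   ! is the virtual FC interface, and its trunked VSAN, still up?
  show flogi database                    ! did the CNA's WWPN log back into the fabric?
  show fcoe database                     ! FCoE enode MAC / FCID / interface mapping
  show interface ethernet 1/10 counters  ! errors or drops on the physical CNA-facing port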
This situation led to a sev 2 with Cisco and the equivalent with HP, EMC, and QLogic. We used both the straight QLogic 8242 and the HP OEM'd version, and the results were identical. QLogic acknowledged the issue but could not resolve it, due to not being able to grab a hardware-level trace of the connection (it required some type of test equipment that they couldn't provide and we didn't have).
As part of our trial/error testing, we had our reseller ship us the fiber versions of the same QLogic cards, because we eventually got down to a gut instinct that this was a copper/electrical anomaly. That instinct was dead-on. Switching to the fiber versions, with fiber SFPs on the 5K side, resolved the situation entirely. We are now able to drop a link with NO noticeable degradation, back and forth, and everything is consistent again.
We originally went the twinax route because it was significantly cheaper than the fiber, but in retrospect, as a whole, the danger posed was not worth it. You might ask, well... why would you intentionally drop the cable? Think about doing a code upgrade on the 5K: since it's not a dual-sup box, you physically go through a reboot to upgrade it. That reboot right there would have hosed our entire environment (keep in mind, the HD row's intent was to replace a significant portion of our production environment). You could also have a HW failure on a 5K. It kind of defeats the point of all this redundancy if your throughput goes to hell when losing a single path. As our storage guys best put it, "I'd rather lose a path than have bad performance through it... based on how things alert, I'd know right away if a path were down, but not if it were severely degraded."
Btw, we've been rock solid on the fiber-connected CNAs ever since. We're still using copper on our connections to the HP blade chassis though, which go to FlexFabric cards, as we couldn't reproduce the problem on those. For those wondering, we did rebuild several of the DL580s from scratch (all of this was a new deployment, thankfully!), and we also went through many iterations of driver updates/changes/etc.
Lots of head-banging and teamwork eventually got us squared away! This situation is a good example of why network guys NEED to have a great relationship with both the server and storage guys (we're all really close where I'm at). Had there been tension between the teams, this would have been significantly harder to resolve.
Hope this helps, and sorry for the long-winded email :-), but I think those interested will find it beneficial.
David.