On 2015-05-13 19:42, nanog@cdl.asgaard.org wrote:
Greetings,
Do we really need them to be swappable at that point? The reason we swap HDDs (if we do) is because they are rotational, and mechanical things break.
Right.
Do we swap CPUs and memory hot?
Nope. Usually just toss the whole thing. Well, I keep spare RAM around because it's so cheap. But if a CPU goes, chuck it in the e-waste pile in the back.
Do we even replace memory on a server that's gone bad, or just pull the whole thing during the periodic "dead body collection" and replace it?
Usually swap memory. But yeah, oftentimes the hardware ops folks just cull old boxes on a quarterly basis and backfill with the latest batch of inbound kit. At large scale (which many on this list operate at), you have pallets of gear sitting in the to-deploy queue, and another couple of pallets' worth racked up but not even imaged yet. (This is all supposition of course. I'm used to working with $HUNDREDS of racks' worth of gear.) Containers, Moonshot-type things etc. are certainly on the radar.
Might it not be more efficient (and space saving) to just add 20% more storage to a server than the design goal, and let the software use the extra space to keep running when an SSD fails?
Yes. Also, a few months ago I read an article about several SSD brands having $MANY terabytes written to them. Can't find it just now. But they seem to take quite a long time (in terms of data written/number of writes) to fail.
When the overall storage falls below tolerance, the unit is dead. I think we will soon need to (if we aren't already) stop thinking about individual components as FRUs. The server (or rack, or container) is the FRU.
Christopher
Yes. Agree. Most of the very large scale shops (the ones I've worked at) are massively horizontally scaled, cookie-cutter. Many boxes replicating/extending/expanding a set of well-defined workloads.
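
To put rough numbers on the "20% more storage than the design goal" idea quoted above, here is a minimal Python sketch. It assumes equal-sized SSDs and ignores replication, rebuild traffic and filesystem overhead; the function names and the 12 x 1000 GB example are hypothetical, not anything from the thread.

```python
def tolerable_failures(num_ssds: int, ssd_size_gb: int, design_goal_gb: int) -> int:
    """How many equal-sized SSDs can fail before capacity drops below the design goal.

    Assumption: capacity is simply the sum of the surviving SSDs.
    """
    if num_ssds <= 0 or ssd_size_gb <= 0 or design_goal_gb <= 0:
        raise ValueError("all arguments must be positive")
    spare_gb = num_ssds * ssd_size_gb - design_goal_gb
    # No spare capacity means no failure can be absorbed.
    return max(spare_gb // ssd_size_gb, 0)


def unit_is_dead(num_ssds: int, ssd_size_gb: int, design_goal_gb: int, failed: int) -> bool:
    """True once surviving capacity falls below the design goal (the 'tolerance')."""
    surviving_gb = max(num_ssds - failed, 0) * ssd_size_gb
    return surviving_gb < design_goal_gb


# Hypothetical box: design goal 10,000 GB, built with 20% extra (12 x 1000 GB).
print(tolerable_failures(12, 1000, 10_000))
print(unit_is_dead(12, 1000, 10_000, failed=3))
```

With these made-up figures the box absorbs two SSD failures in place, and the third takes it below the design goal, at which point the whole server is treated as the FRU.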