George Herbert <george.herbert@gmail.com> said:
I worked for a Sun clone vendor (Axil) for a while and took some of our systems and storage to Comdex one year in the 90s. We had a RAID unit (Mylex controller) we had just introduced. Beforehand, I made REALLY REALLY SURE that the pull-the-disk and pull-the-redundant-power tricks worked. And showed them to people with the "Please keep in mind that this voids the warranty, but here we *rip* go...". All of the other server vendors were giving me dirty looks for that one. Apparently I sold a few systems that way.
:) Nice. Thanks.

Many years ago, I worked for one of DEC's research groups. We built a network using FDDI 4B/5B link technology based on AMD TAXI chips. (They were state of the art back then.) The switches were 3U(?) boxes with 12 ports. It took a rack of 6 or 8 of them in the phone closet to cover a floor. Workstations had 2 cables plugged into different switches. In theory, we covered any single point of failure.

My office was near the phone closet. I got to watch my boss give demos to visiting VIPs. He was pretty good at it. In the middle of explaining things, he would grab a power cord and yank it. Blinka-blinka-blinka and the remaining switches would reconfigure and go back to work. (It took under a second.)

It was interesting to watch the VIPs. Most of them got it: the network really could recover quickly. The interesting ones had a telco background. They were really surprised. The concept of disrupting live traffic for something as insignificant as a demo was off scale in their culture.

It was just a research lab. We were used to eating our own dog food.

----------

"Greg D. Moore" <mooregr@greenms.com> said:
If folks have not read it, I would suggest reading Normal Accidents by Charles Perrow.
+1
The "it can't happen" is almost guaranteed to happen. ;-) And when it does, it'll often interact in ways we can't predict or sometimes even understand.
My memory of that sort of event is roughly... (see above for context)

The hardware broke and turned a vanilla packet into a super-long packet. My FPGA code was supposed to catch that case and do something sane. It was never tested and didn't work. It poured crap all over memory. Needless to say, things went downhill from there.

Easy to spot in hindsight. None of us thought that was an interesting case while we were testing.

-- 
These are my opinions. I hate spam.