
In a message written on Mon, Mar 04, 2013 at 09:31:13AM +0200, Saku Ytti wrote:
> Probably the only thing you could have done to plan against this would
> have been a solid dual-vendor strategy: presume that sooner or later a
> software defect will take one vendor completely out. And maybe they did
> plan for it, but decided dual-vendor costs more than the rare outages.
From what I have heard so far, there is something else they could have done: hire higher-quality people. Any competent network admin would have stopped and questioned a 90,000+ byte packet and done more investigation. Competent programmers writing their internal tools would have flagged that data as out of range.

I can't tell you how many times I've sat in a post-mortem meeting about some issue and the answer from senior management is "why don't you just provide a script to our NOC guys, so the next time they can run it and make it all better." Of course it's easy to say that once the smart people have diagnosed the problem! You can buy these "scripts" for almost any profession. There are manuals on how to fix everything on a car, and treatment plans for almost every disease. Yet most people intuitively understand that you take your car to a mechanic and your body to a doctor for the proper diagnosis. The primary thing you're paying for is expertise in what to fix, not how to fix it. That takes experience and training.

But somehow it doesn't sink in with networking. I would not be at all surprised to hear that someone over at Cloudflare right now is saying "let's make a script to check the packet size," as if that will fix the problem. It won't. Next time the issue will be different, and the same undertrained person who missed the packet size this time will miss the next issue as well. They should all be sitting around asking, "how can we hire competent network admins for our NOC?" But that would cost real money.

-- 
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
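[As an aside, the sanity check being described is trivial to write. A minimal sketch, assuming a hypothetical internal tool that ingests reported packet sizes; `validate_packet_size` and the tool itself are illustrative, not anything Cloudflare actually runs. The one hard fact it leans on is that the IP total-length field is 16 bits, so no IP packet can exceed 65,535 bytes:]

```python
# Hypothetical sanity check for an internal NOC tool that ingests
# reported packet sizes. The IP total-length field is 16 bits, so
# 65535 bytes is the hard maximum; a 90,000+ byte "packet" is garbage
# data and should be flagged for investigation, not silently passed on.

MAX_IP_PACKET = 65535  # 16-bit total-length field (IPv4 and IPv6 alike)

def validate_packet_size(size_bytes: int) -> bool:
    """Return True if the reported size is plausible, False if it
    is out of range and needs a human to investigate."""
    if size_bytes < 0 or size_bytes > MAX_IP_PACKET:
        print(f"suspicious packet size {size_bytes}: "
              f"exceeds the {MAX_IP_PACKET}-byte IP maximum, investigate")
        return False
    return True
```

[Of course, that is exactly the point of the paragraph above: the check itself is ten lines; knowing that it was the thing worth checking is what you pay experienced people for.]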