RE: Redundant Data Center Architectures

28 Oct 2009

      ...
-----Original Message-----
From: Darren Bolding [mailto:darren@bolding.org]
Sent: Wednesday, October 28, 2009 4:57 PM
To: Roland Dobbins
Cc: NANOG list
Subject: Re: Redundant Data Center Architectures
Also, commercial solutions from F5 (their GTM product and their old 3-DNS product).

Using CDNs is also a way of handling this, but you need to be prepared for all your traffic to come from their source IPs, or do creative things with X-Forwarded-For etc.

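To illustrate the X-Forwarded-For point, here is a minimal sketch of recovering the original client address when requests arrive via a CDN. It assumes the CDN publishes its egress ranges; the prefixes below are documentation ranges standing in for a real list, and `client_ip` and `TRUSTED_PROXIES` are hypothetical names, not part of any product mentioned in this thread.

```python
import ipaddress

# Hypothetical CDN egress ranges (documentation prefixes, not a real CDN's).
TRUSTED_PROXIES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_trusted(addr):
    """Return True if addr falls inside one of the trusted proxy ranges."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr, x_forwarded_for):
    """Pick the client address for a request.

    The header is only believed when the TCP peer is a trusted proxy.
    Hops are walked right to left, skipping trusted proxies, so an
    address the client prepended itself is never returned.
    """
    if not x_forwarded_for or not is_trusted(remote_addr):
        return remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            # Malformed entry: fall back to the TCP peer address.
            return remote_addr
    return remote_addr
```

Walking the list from the right matters because everything to the left of the first untrusted hop is supplied by the client and can be forged.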
Making an active/active datacenter design work (or preferably one with enough sites that more than one can be down without seriously impacting service) is a serious challenge.  Lots of people will tell you about (and sell you solutions for) parts of the puzzle.  My experience has been that the best case is when the architecture of the application and infrastructure has been designed with these challenges in mind from the get-go.  I have seen that done on the network and server side, but never on the software side; that has always required significant effort when the time came.

The "drop in" solutions for this (active/active database replication, middleware solutions, proxies) are always expensive in one way or another and frequently have major deployment challenges.

The network side of this can frequently be the easiest to resolve, in my experience.  If you are serving up content that does not require synchronized data on the backend, then that will make your life much easier, and GSLB, a CDN or similar may help a great deal.

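As a rough illustration of the GSLB idea for content with no backend synchronization, the sketch below picks which site addresses a DNS responder would hand out based on a simple TCP health check. The site names, addresses, and the `answer` and `is_healthy` helpers are all hypothetical; commercial products use far richer health checks and load metrics.

```python
import socket

# Hypothetical sites: name -> (service IP, health-check port).
SITES = {
    "east": ("192.0.2.10", 443),
    "west": ("192.0.2.20", 443),
}


def is_healthy(ip, port, timeout=1.0):
    """Crude health check: can a TCP connection be opened to the service?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def answer(sites, check=is_healthy):
    """Return the addresses to serve in a DNS response.

    Only healthy sites are returned. If every check fails, all sites
    are returned (fail open), on the assumption that the monitor is
    more likely to be broken than every site at once.
    """
    all_ips = [ip for ip, _port in sites.values()]
    healthy = [ip for ip, port in sites.values() if check(ip, port)]
    return healthy or all_ips
```

Failover speed in this model is bounded by the DNS TTL and by resolvers that ignore it, which is one reason it suits stateless content better than synchronized back ends.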
Thanks to everyone who has responded so far.

I should have clarified my intent a bit in the original email.  I am definitely interested in architectures which support synchronized data between data center locations in as close to real-time as possible.  More specifically, I am interested in designs which support zero down-time during failures, or as close to zero down-time as possible.  GSLB, Anycast, CDNs... those types of approaches certainly have their place, especially where the pull-model is employed (DNS, Netflix, etc.).

However, what types of solutions are being used for synchronized data and even network I/O on back-end systems?  I've been looking at the VMware vSphere 4 Fault Tolerance stuff to synchronize the data storage and network I/O across distributed virtual machines, but I am still worried about the consequences of doing this across WAN links with high latency.

From the thread I get the feeling that L2 interconnects (VPLS, PWs) are generally considered a bad thing.  I gathered as much, since I figured there would be lots of unintended consequences with regard to designated router elections and other weirdness.  Besides connecting sites via L3 VPNs, what other approaches are others using?  I would also appreciate any comments on the synchronization items above.
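To put a rough number on the WAN latency concern, here is a back-of-the-envelope sketch. It assumes light travels through fiber at about 200,000 km/s (roughly 5 microseconds per km one way) and that each synchronous commit waits for one full round trip; the function names are hypothetical, and real paths add routing detours and equipment delay on top of this floor.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber, ~2/3 of c


def min_rtt_ms(distance_km):
    """Lower bound on round-trip time over a fiber path of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_serial_commits_per_sec(distance_km):
    """Upper bound on back-to-back synchronous commits in one stream,
    assuming each commit waits one round trip for the remote ack."""
    return 1000.0 / min_rtt_ms(distance_km)


for km in (100, 1000, 4000):
    print(km, "km:", min_rtt_ms(km), "ms RTT,",
          max_serial_commits_per_sec(km), "commits/s per stream")
```

At 1000 km of fiber the floor is 10 ms per round trip, so a single serialized stream cannot exceed about 100 synchronous commits per second no matter how much bandwidth the link has.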

Thanks,

--
Stefan Fouant