Re: Does anyone use anycast DHCP service?

13 Aug 2012

In a message written on Mon, Aug 13, 2012 at 08:54:09AM -0500, Ryan Malayter wrote:
> ...
> 1) No third-party "witness" service for the cluster, making
> split-brain scenarios a very real possibility.

The ISC implementation is designed to continue to work with a "split
brain".  I believe the Microsoft solution is as well, but I know
less about it.  There's no need to detect whether the redundant pair
can communicate, because service continues either way.  (With some
caveats; see below.)

> ...
> 2) Multi-master databases are quite challenging in practice. This one
> appears to rely on timestamps from the system clock for conflict
> detection, which has been shown to be unreliable time and again in the
> application space.

You are incorrect.  The ISC implementation divides the free addresses
between the two servers.  The client will only interact with the
first to respond (literally, no timestamps involved).  Clients
talking to each half of a split brain can continue to receive
addresses from that server's share of the range; no timestamps are
needed to resolve conflicts, because the pool was split before
server-to-server communication was lost.
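
A toy Python model of the pool split described above may make the
argument concrete.  The `Server` class and `split_pool` helper are
hypothetical names for illustration only; this is a sketch under
simplifying assumptions, not ISC dhcpd's actual failover protocol,
which tracks considerably more lease state.

```python
import ipaddress


class Server:
    """A DHCP server that may only lease addresses it owns (toy model)."""

    def __init__(self, name, free):
        self.name = name
        self.free = list(free)   # addresses this server owns and may lease
        self.leases = {}         # client MAC -> leased address

    def offer(self, mac):
        """Lease an address to a client, or return None if this share is exhausted."""
        if mac in self.leases:
            return self.leases[mac]
        if not self.free:
            return None
        addr = self.free.pop(0)
        self.leases[mac] = addr
        return addr


def split_pool(network):
    """Divide the usable hosts of a subnet between two servers, in advance."""
    hosts = list(ipaddress.ip_network(network).hosts())
    half = len(hosts) // 2
    return Server("primary", hosts[:half]), Server("secondary", hosts[half:])


# Split the pool while the servers can still talk to each other...
primary, secondary = split_pool("192.0.2.0/28")

# ...then lose server-to-server communication.  Each side keeps serving
# whichever clients reach it first, with no coordination and no timestamps.
a = [primary.offer(f"aa:00:00:00:00:{i:02x}") for i in range(5)]
b = [secondary.offer(f"bb:00:00:00:00:{i:02x}") for i in range(5)]

# Because the shares were disjoint, no address was handed out twice.
assert not set(a) & set(b)
print(a[0], b[0])  # 192.0.2.1 192.0.2.8
```

The conflict-freedom comes entirely from deciding ownership before the
partition, which is why no clock comparison is needed afterwards.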

There is a downside to this design: if half of the brain goes
away, its half of the free addresses becomes unusable until it
resynchronizes.  This can be mitigated by oversizing the pools.
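
The oversizing arithmetic can be sketched as follows.  The function
name and the assumptions (an even two-way split, existing leases
staying valid, the survivor allocating only from its own share) are
illustrative, not taken from any particular implementation.

```python
def min_pool_size(active_leases: int, new_clients: int) -> int:
    """Smallest pool that still serves `new_clients` after one of two servers is lost.

    Toy assumptions: free addresses are split evenly between the two servers,
    existing leases stay valid, and the survivor can only allocate from its own
    half of the free addresses until the pair resynchronizes.
    """
    # The survivor's half of the free addresses must cover the new demand,
    # so the free portion of the pool has to be twice that demand.
    return active_leases + 2 * new_clients


# A subnet with 200 leases in use that might see 50 new clients during an outage
print(min_pool_size(200, 50))  # 300 addresses, versus 250 with no split
```
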

> ...
> 3) There are single points of failure. You've traded hardware as a
> single point of failure for "bug-free implementation of clustering
> code on both DHCP servers" as a single point of failure. In general,
> software is far less reliable than hardware.

Fair enough.

However, I suspect most folks are protecting not against hardware
or software failures, but against circuit failures between the
clients and the DHCP servers.

I've actually never been a huge fan of large, centralized DHCP
servers, clustered or otherwise.  Too many eggs in one basket.  I
can see how they may make administration a bit easier, but that
comes at the cost of a lot of resiliency.  Push them out to the
edge, and make each one responsible for a local network or two; the
impact of an outage is much lower.  If the router provides DHCP,
the failure modes coincide: when the router goes down, so does the
DHCP server.

I think a lot of organizations only worry about the redundancy of
DHCP servers because the entire company is dependent on one server
(or cluster), and the rest of their infrastructure is largely
non-redundant.

-- 
       Leo Bicknell - bicknell@ufp.org - CCIE 3440
        PGP keys at http://www.ufp.org/~bicknell/