In a message written on Mon, Aug 13, 2012 at 08:54:09AM -0500, Ryan Malayter wrote:
> 1) No third-party "witness" service for the cluster, making split-brain scenarios a very real possibility.
The ISC implementation is designed to continue to work with a "split brain". I believe the Microsoft solution is as well, but I know less about it. There's no need to detect that the redundant pair can't communicate, because things continue to work. (With some caveats, see below.)
> 2) Multi-master databases are quite challenging in practice. This one appears to rely on timestamps from the system clock for conflict detection, which has been shown to be unreliable time and again in the application space.
You are incorrect. The ISC implementation divides the free addresses between the two servers, and the client interacts only with the first one to respond (literally, no timestamps involved). Clients talking to either half of a split brain can continue to receive addresses from the shared range. No timestamps are needed to resolve conflicts, because the pool was split before server-to-server communication was lost. There is a downside to this design: if half the brain goes away, half of the free addresses become unusable with it until it resynchronizes. This can be mitigated by oversizing the pools.
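To make the pool-splitting idea concrete, here is a rough Python sketch. It is a toy model and not ISC's code: the Server class, the split_pool helper, the even 50/50 split, and the 192.0.2.x addresses are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for one DHCP server in a failover pair."""
    name: str
    free: list = field(default_factory=list)    # addresses this server may hand out
    leased: dict = field(default_factory=dict)  # client id -> leased address

    def offer(self, client):
        """Lease from this server's own share; return None once it is exhausted."""
        if client in self.leased:
            return self.leased[client]
        if not self.free:
            return None
        addr = self.free.pop(0)
        self.leased[client] = addr
        return addr


def split_pool(addresses, a, b):
    """Divide the free addresses while the two servers can still communicate."""
    half = len(addresses) // 2
    a.free = list(addresses[:half])
    b.free = list(addresses[half:])


pool = [f"192.0.2.{i}" for i in range(10, 20)]  # ten example addresses
primary, secondary = Server("primary"), Server("secondary")
split_pool(pool, primary, secondary)

# Split brain: the servers stop talking, but each keeps answering its clients.
from_primary = [primary.offer(f"client-a{i}") for i in range(5)]
from_secondary = [secondary.offer(f"client-b{i}") for i in range(5)]

# The two sets of leases cannot collide, and no clock was consulted.
no_conflicts = set(from_primary).isdisjoint(from_secondary)

# The downside: a server that has used up its half has nothing left to offer.
exhausted = primary.offer("client-a-extra") is None
```

In this model the primary runs dry after five leases even though the pool holds ten addresses, which is the reason for oversizing the pools.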
> 3) There are single points of failure. You've traded hardware as a single point of failure for "bug-free implementation of clustering code on both DHCP servers" as a single point of failure. In general, software is far less reliable than hardware.
Fair enough. However, I suspect most folks are not protecting against hardware or software failures, but rather against circuit failures between the client and the DHCP servers.

I've actually never been a huge fan of large, centralized DHCP servers, clustered or otherwise. Too many eggs in one basket. I see how it may make administration a bit easier, but it comes at the cost of a lot of resiliency. Push them out to the edge, and make each one responsible for a local network or two. The impact of an outage is much lower. If the router provides DHCP, the failure modes work together: when the router goes down, so does the DHCP server.

I think a lot of organizations only worry about the redundancy of DHCP servers because the entire company is dependent on one server (or cluster), and the rest of their infrastructure is largely non-redundant.

-- 
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/