On Mon, Aug 13, 2012 at 9:10 AM, Leo Bicknell <bicknell@ufp.org> wrote:
The ISC implementation is designed to continue to work with a "split brain". I believe the Microsoft solution is as well, but I know ... You are incorrect. The ISC implementation divides the free addresses between the two servers. The client will only interact with the first server to respond (literally; no timestamps are involved). Clients talking to each half of a split brain can continue to receive addresses from the shared range. No timestamps are needed to resolve conflicts, because the pool was split before server-to-server communication was lost.
There is a downside to this design: if half the brain goes away, half of the free addresses become unusable along with it until it resynchronizes. This can be mitigated by oversizing the pools.
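The pool-splitting idea described above can be illustrated with a toy model. This is a minimal sketch, not ISC dhcpd's actual code or configuration: the names `split_free_pool` and `PoolOwner`, the 50/50 split, and the "first free address wins" allocation order are all hypothetical. It shows why a partition causes no conflicts (each server only hands out addresses from its own disjoint slice) and why losing one server strands its share of the free pool.

```python
import ipaddress


def split_free_pool(network: str, ratio: float = 0.5):
    """Divide the free addresses of a subnet between two servers.

    Hypothetical model: each server owns a disjoint slice of the free
    pool, so either side can allocate alone without asking its partner.
    """
    hosts = list(ipaddress.ip_network(network).hosts())
    cut = int(len(hosts) * ratio)
    return hosts[:cut], hosts[cut:]


class PoolOwner:
    """One half of a failover pair; allocates only from addresses it owns."""

    def __init__(self, name, free):
        self.name = name
        self.free = list(free)
        self.leases = {}  # client id -> leased address

    def allocate(self, client_id):
        # Renewal: return the existing binding if this server knows it.
        if client_id in self.leases:
            return self.leases[client_id]
        # This server's share is exhausted; it never borrows the partner's.
        if not self.free:
            return None
        addr = self.free.pop(0)
        self.leases[client_id] = addr
        return addr


# Split a /28 (14 usable hosts) and let each side serve clients independently,
# as if the server-to-server link were down.
primary_free, secondary_free = split_free_pool("192.0.2.0/28")
primary = PoolOwner("primary", primary_free)
secondary = PoolOwner("secondary", secondary_free)
print(primary.allocate("client-a"), secondary.allocate("client-b"))
```

Because the two slices never overlap, both halves can keep allocating during a partition without any timestamp-based reconciliation. The flip side is visible too: once `primary` has handed out its seven addresses it returns `None`, even though `secondary` still holds free ones. That is the stranded capacity that oversizing the pools compensates for.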
Glad to hear it is a better design than my first skim of the documentation indicated. Essentially, an ISC DHCPD cluster is two independent servers, with the added optimization of replicating reservations from one system to the other so that each can answer renewals when possible. I still wonder what happens when a client renews during failover, the original server then comes back online, and a renewal of the same address arrives while it is still starting up. Hopefully any node joining a cluster waits until it is fully synchronized before answering queries. I've seen so many two-node "HA pair" setups go horribly sideways during my IT career that I usually assume the worst: firewalls, load balancers, stackable switches, databases, SANs, you name it. They usually all survive the "pull the plug on one node" test during QA, but that's about it.
-- RPM