On Fri, Mar 19, 2010 at 05:10:04PM -0700, Mike wrote:
> With all due respect and acknowledgment of the tremendous contributions of ISC and you yourself Mr. Hankins, I have to comment that failover in isc-dhcp is broken by design because it requires the amount of handholding and operator thinking in the event of a failure that you explained to us at length is required. Failure needs to be handled automatically and without any intervention at all, otherwise you might as well not have it and I think most network operators would agree.

First let me say that I wasn't involved in failover's design; I'm only a sort of "maintainer," so the criticism doesn't offend me in the slightest. :)

Failover definitely busied itself with the cross-country, geographically diverse DHCP server situation, hoping that by solving that it would also give the "HA", heartbeat-cable types of folks a tool they could use, although it isn't explicitly designed for that purpose alone. That does tend to leave this community a little under-served and unhappy, which was my motivation for the failover features in 4.2 that try to support their needs better (auto partner-down, greater endurance in comms-interrupted).

What you describe as an alternative (although I will criticize it slightly in suggesting you are under-estimating DHCP's needs; the question of message delivery is really not relevant) are the building blocks for something I would refer to as "DHCP Server Clustering". I fully endorse it. That is a set of separate programs that work together to appear from the outside to be a single DHCP server (as those terms are defined in RFC), along with the ways in which you can build in redundancy and self-healing (self-restarting components, component failures that only affect a subset of services, redundant processes that cover gaps in coverage, etc).

In short, you're describing one of our key motivations for migrating ISC DHCP to the BIND 10 framework. That gives us a complete set of tools. Within the same rack, you will ultimately be able to implement what is, to all outside observation, a "single server" that is actually implemented in a redundant way across (N+1) systems* or CPUs within one system, while still maintaining a failover ability to tie two such geographically diverse clusters together (not to mention co-habitation with BIND 10's DNS services in the same configuration and monitoring plane). The two ends of that failover relationship don't actually have to be clusters if you don't want all that baggage either. So everyone's happy.

Unfortunately at the moment we are still collecting sponsors for the DHCP-in-BIND-10 project, and no shovels have been turned. But I'm confident the work will proceed (and if anyone wishes to help as a sponsor or a participant, please contact us! We are in Anaheim this week, and there is also a link in my signature you can click).

In the meantime, failover is a tool we have whereas DHCP clustering software is so far only a tool we want to create.

* Some objects in the future-mirror may be further away than they appear.

-- 
David W. Hankins                    BIND 10 needs more DHCP voices.
Software Engineer                   There just aren't enough in our heads.
Internet Systems Consortium, Inc.   http://bind10.isc.org/