As with all things, there's no "right answer"... a lot of it depends on three things:

- what you are hoping to achieve
- what your budget is
- how many qualified staff you have at your disposal to both implement and support the chosen solution

Those are the main business-level factors. At a technical level, two key factors (although, of course, there are many others to consider) are:

- whether you are after an active/active or an active/passive solution
- what the underlying application(s) are (e.g. you might have other options, such as anycast with DNS)

Anyway, there's a lot to consider. And despite all the expertise on NANOG, I would still suggest the original poster do their fair share of their own homework. :)

----- Original Message ----
From: Jim Wise <jwise@draga.com>
To: gb10hkzo-nanog@yahoo.co.uk
Cc: nanog@nanog.org
Sent: Wednesday, 3 June, 2009 15:42:24
Subject: Re: Facility wide DR/Continuity

gb10hkzo-nanog@yahoo.co.uk writes:
> On the subject of DNS GSLB, there's a fairly well-known article that anyone considering implementing it should read at least once... :)
>
> http://www.tenereillo.com/GSLBPageOfShame.htm
> and part 2:
> http://www.tenereillo.com/GSLBPageOfShameII.htm
>
> Yes, it was written in 2004. But all the "food for thought" it provides is still very much applicable today.
One thing about this paper that has bugged me in the past is that, in arguing that multiple A records are a better solution than a single GSLB-managed A record, the paper assumes that browsers and other common Internet clients will actually cache multiple A records and fail over between them if the earlier A records fail. The first of the two pages explicitly touts this as a high-availability solution.

However, I haven't observed this behavior from browsers, media players, and similar programs 'in the wild'. As far as I've been able to tell, most client software picks one A record from those returned (possibly, though not usually, skipping those found to be unreachable), and then holds onto that IP address until the record times out of the cache and a new request is made.

Have I been unlucky in my observations? Are there client programs which do fail over between multiple A records returned for a single name, presumably sticking with one IP for session-affinity purposes until a failure is detected?

If clients do not behave this way, then the paper's observations about GSLB for HA purposes don't seem to hold. In my limited experience, though, the paper's other point (that geographic dispatch is Hard) seems much more accurate, which makes GSLB a better HA solution than it is a load-sharing solution, again at least in my experience.

Or am I missing something?

--
Jim Wise
jwise@draga.com
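To make the client behavior in question concrete, here is a minimal Python sketch of a client that keeps the full set of A records, pins one address for session affinity, and only moves to the next record when a connection attempt fails. It assumes plain TCP and an IPv4-only lookup; `StickyFailoverClient`, `resolve_a_records` and the pluggable `connector` are hypothetical names for illustration, not the logic of any real browser or media player. It also ignores TTL expiry: a real client would refresh the address list once the cached record times out.

```python
import socket
from typing import Callable, List, Optional, Sequence

# Hypothetical hook: opens a connection to one (ip, port) and raises OSError on failure.
Connector = Callable[[str, int, float], socket.socket]


def tcp_connect(ip: str, port: int, timeout: float) -> socket.socket:
    """Open a plain TCP connection to one specific address."""
    return socket.create_connection((ip, port), timeout=timeout)


def resolve_a_records(name: str, port: int) -> List[str]:
    """Return the distinct IPv4 addresses for a name, in resolver order."""
    infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


class StickyFailoverClient:
    """Hypothetical client: sticks to one A record until it fails, then fails over."""

    def __init__(self, addresses: Sequence[str], port: int,
                 timeout: float = 2.0, connector: Connector = tcp_connect):
        if not addresses:
            raise ValueError("need at least one address")
        self.addresses: List[str] = list(addresses)
        self.port = port
        self.timeout = timeout
        self.connector = connector
        self.current = 0  # index of the pinned (session-affinity) address

    @property
    def pinned(self) -> str:
        return self.addresses[self.current]

    def connect(self) -> socket.socket:
        last_error: Optional[OSError] = None
        # Try the pinned address first, then the others in order, wrapping around.
        for step in range(len(self.addresses)):
            idx = (self.current + step) % len(self.addresses)
            try:
                sock = self.connector(self.addresses[idx], self.port, self.timeout)
            except OSError as exc:
                last_error = exc  # remember why this address failed, try the next one
                continue
            self.current = idx  # stick with whichever address worked
            return sock
        raise ConnectionError(
            f"all {len(self.addresses)} addresses failed") from last_error
```

Usage would look something like `StickyFailoverClient(resolve_a_records("www.example.com", 80), 80).connect()`. For comparison, Python's own `socket.create_connection()` given a hostname already tries each resolved address in turn until one succeeds, but it keeps no memory of which address worked, so it offers failover without stickiness.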