RE: Facility wide DR/Continuity
On the subject of DNS GSLB, there's a fairly well known article on the subject that anyone considering implementing it should read at least once.... :) http://www.tenereillo.com/GSLBPageOfShame.htm and part 2 http://www.tenereillo.com/GSLBPageOfShameII.htm Yes it was written in 2004. But all the "food for thought" that it provides is still very much applicable today.
gb10hkzo-nanog@yahoo.co.uk writes:
On the subject of DNS GSLB, there's a fairly well known article on the subject that anyone considering implementing it should read at least once.... :)
http://www.tenereillo.com/GSLBPageOfShame.htm and part 2 http://www.tenereillo.com/GSLBPageOfShameII.htm
Yes it was written in 2004. But all the "food for thought" that it provides is still very much applicable today.
One thing I've noticed about this paper in the past that kind of bugs me: in arguing that multiple A records are a better solution than a single GSLB-managed A record, the paper assumes that browsers and other common internet clients will actually cache multiple A records, and fail over between them if the earlier A records fail. The first of the two pages explicitly touts this as a high-availability solution.

However, I haven't observed this behavior from browsers, media players, and similar programs "in the wild" -- as far as I've been able to tell, most client software picks an A record from those returned (possibly, but not usually, skipping those found to be unreachable), and then holds onto that choice of IP address until the record times out of cache and a new request is made.

Have I been unlucky in my observations? Are there client programs which do fail over between multiple A records returned for a single name -- presumably sticking with one IP for session-affinity purposes until a failure is detected?

If clients do not behave this way, then the paper's observations about GSLB for HA purposes don't seem to hold -- though in my limited experience the paper's other point (that geographic dispatch is Hard) seems much more accurate, making GSLB a better HA solution than it is a load-sharing solution, again, at least in my experience.

Or am I missing something?

--
Jim Wise
jwise@draga.com
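[The failover behavior the paper assumes -- a client walking the full list of A records until one answers -- can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thread; the function name is made up, and real clients (browsers, media players) may or may not behave this way, which is exactly the question above.]

```python
import socket

def connect_any(host, port, timeout=3):
    """Try every address returned for `host`, in resolver order, until one
    accepts the TCP connection. This is the multi-A-record failover behavior
    the Tenereillo paper assumes common clients implement."""
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            s = socket.socket(family, socktype, proto)
            s.settimeout(timeout)
            s.connect(sockaddr)
            return s  # first reachable address wins
        except OSError as err:
            s.close()
            last_err = err  # unreachable: fall through to the next record
    raise last_err or OSError("no addresses returned for %s" % host)
```

A client that holds onto one chosen IP until its cached record expires -- the behavior Jim reports observing -- would skip the loop entirely after the first successful connect.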
As with all things, there's no "right answer"..... a lot of it depends on three things:

- what you are hoping to achieve
- what your budget is
- what you have at your disposal in terms of numbers of qualified staff available to both implement and support the chosen solution

Those are the main business-level factors. At the technical level, two key factors (although, of course, there are many others to consider) are:

- whether you are after an active/active or active/passive solution
- what the underlying application(s) are (e.g. you might have other options such as anycast with DNS)

Anyway, there's a lot to consider. And despite all the expertise on Nanog, I would still suggest the original poster does their fair share of their own homework. :)

----- Original Message ----
From: Jim Wise <jwise@draga.com>
To: gb10hkzo-nanog@yahoo.co.uk
Cc: nanog@nanog.org
Sent: Wednesday, 3 June, 2009 15:42:24
Subject: Re: Facility wide DR/Continuity
On Wed, Jun 3, 2009 at 10:53 AM, <gb10hkzo-nanog@yahoo.co.uk> wrote:
- whether you are after an active/active or active/passive solution
In practice, active/passive DR solutions often fail. You rarely need to fail over to the passive system. When you finally do need to fail over, there are a dozen configuration changes that didn't make it from the active system, so the passive system isn't in a runnable state.

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Tell me about it...... "failover test.... what failover test" ;-)

----- Original Message ----
From: William Herrin <herrin-nanog@dirtside.com>
To: gb10hkzo-nanog@yahoo.co.uk
Cc: nanog@nanog.org
Sent: Wednesday, 3 June, 2009 16:05:15
Subject: Re: Facility wide DR/Continuity
On Jun 3, 2009, at 10:05 PM, William Herrin wrote:
You rarely need to fail over to the passive system.
And management will never, ever let you do a full-up test, nor will they allow you to spend the money to build a scaled-up system which can handle the full load, because they can't stand the thought of hardware sitting there gathering dust.

Concur 100%.

Active/passive is an obsolete 35-year-old mainframe paradigm, and it deserves to die the death. With modern technology, there's just really no excuse not to go active/active, IMHO.

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Unfortunately, inefficiency scales really well.

-- Kevin Lawton
On Wed, Jun 3, 2009 at 11:15 AM, Roland Dobbins<rdobbins@arbor.net> wrote:
Active/passive is an obsolete 35-year-old mainframe paradigm, and it deserves to die the death. With modern technology, there's just really no excuse not to go active/active, IMHO.
Roland,

Sometimes you're limited by the need to use applications which aren't capable of running on more than one server at a time. In other cases, it's obscenely expensive to run an application on more than one server at a time. Nor is the split-brain problem in active/active systems a trivial one.

There are still reasons for using active/passive configurations, but be advised that active/active solutions have a noticeably better success rate than active/passive ones.

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
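[A toy illustration of why the split-brain problem Bill mentions is non-trivial: the standard defense is a majority-quorum rule, sketched below. This is a simplification, not from the thread -- real cluster managers add fencing, tie-breakers, and configurable quorum policies on top of it.]

```python
def may_stay_active(reachable_members, cluster_size):
    """Majority-quorum rule: a partition keeps serving only if it can reach
    a strict majority of the configured cluster (counting itself). Two sides
    of a network split can never both hold a strict majority, which prevents
    split-brain -- at the cost that an even split halts *both* sides."""
    return reachable_members > cluster_size // 2

# A 5-node cluster partitioned 3 / 2: only the larger side stays active.
print(may_stay_active(3, 5), may_stay_active(2, 5))  # True False
```

Note the unhappy corner case: in a 4-node cluster split 2 / 2, neither side has a strict majority, so a rule meant to protect the data takes the whole service down -- one reason active/active is harder than it first looks.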
On Jun 3, 2009, at 10:36 PM, William Herrin wrote:
Sometimes you're limited by the need to use applications which aren't capable of running on more than one server at a time.
All understood - which is why it's important that app devs/database folks/sysadmins are all part of the virtual team working to uplift legacy siloed OS/app stacks into more modern and flexible architectures. ;>

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Unfortunately, inefficiency scales really well.

-- Kevin Lawton
Roland Dobbins wrote:
On Jun 3, 2009, at 10:05 PM, William Herrin wrote:
You rarely need to fail over to the passive system.
And management will never, ever let you do a full-up test, nor will they allow you to spend the money to build a scaled-up system which can handle the full load, because they can't stand the thought of hardware sitting there gathering dust.
Concur 100%.
Active/passive is an obsolete 35-year-old mainframe paradigm, and it deserves to die the death. With modern technology, there's just really no excuse not to go active/active, IMHO.
There's always one good reason: money. Some things just don't active/active nicely on a budget. Then you're trying to explain why you want to spend money on a SAN when they really want to spend the money on new "green" refrigerators. (That's not a joke, it really happened.)

~Seth
On Jun 3, 2009, at 10:38 PM, Seth Mattinen wrote:
Some things just don't active/active nicely on a budget.
Sure, because of inefficient legacy design choices. Distribution and scale are ultimately an application architecture issue, with networking and ancillary technologies playing an important supporting role.

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Unfortunately, inefficiency scales really well.

-- Kevin Lawton
participants (5)

- gb10hkzo-nanog@yahoo.co.uk
- Jim Wise
- Roland Dobbins
- Seth Mattinen
- William Herrin