On Wed, Jun 3, 2009 at 7:09 AM, Drew Weaver <drew.weaver@thenap.com> wrote:
Hi All,
I'm attempting to devise a method which will provide continuous operation of certain resources in the event of a disaster at a single facility.
The types of resources that need to be available in the event of a disaster are ecommerce applications and other business critical resources.
Some of the questions I keep running into are:
Should the additional sites be connected to the primary site (and/or the Internet directly)? What is the best way to handle the routing? Obviously two devices cannot occupy the same IP address at the same time, so how do you provide that instant 'cut-over'? I could see using application balancers to do this but then what if the application balancers fail, etc?
Any advice from folks on list or off who have done similar work is greatly appreciated.
Thanks, -Drew
In environments where a DR site is deemed critical, my experience is that the critical business applications also have a test or development environment alongside production. Seen that way, a DR site that hosts the test/devel systems, with one "instance" of production always available, is challenging mainly in terms of data sync. Various SAN solutions address that (SAN syncing over WAN/MAN/etc.).

Virtualization of critical systems may also help: clone the critical VMs at the DR site and, with the storage available there, you'll be able to bring up those machines in no time. Just make sure you have some sort of L2 available between the sites, maybe EoS or tunneling over L3 connectivity. There is tons of info out there if you search for virtual machine mobility and inter-site connectivity.

Voice has to be considered, too. For PSTN, make arrangements with your provider to re-route (8xx) numbers in case of disaster. VoIP may add some extra reachability over the Internet in case your DR site cannot accommodate everyone. C/S people, for example, who are critical for interfacing with customers during a disaster (no information means a bigger loss, plus perception issues), have to be able to connect even from home.

As far as an "immediate" switch from one site to the other goes, DNS is the primary concern (unless some wise people have hardcoded IPs all over), but there are other issues people tend to forget, at the core of some clients. Take the Oracle "fat" client and its TNS names: I've seen those associated with IPs instead of host names, etc.

Disclaimer: the above covers only one of many aspects. I have seen DNS comments already, so I won't repeat those.

HTH,
--
***Stefan
http://twitter.com/netfortius
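
P.S. On the hardcoded-IP point: a minimal sketch, in Python, of how one might audit client configuration text (a tnsnames.ora-style file, for instance) for literal IPv4 addresses before relying on a DNS-based cut-over. The function name, the sample entries, and the host names are all made up for illustration; this is not an Oracle tool, and it will also flag dotted version strings that happen to look like addresses.

```python
import ipaddress
import re

# Candidate dotted quads. Each one is validated with the ipaddress module
# below, so out-of-range values such as 999.1.1.1 are rejected.
_CANDIDATE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")


def find_hardcoded_ips(text):
    """Return (line_number, address) pairs for each literal IPv4 address."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _CANDIDATE.finditer(line):
            try:
                ipaddress.IPv4Address(match.group(1))
            except ValueError:
                continue  # looked like an address but is not a valid one
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical sample: one entry pinned to an IP, one using a host name.
SAMPLE_TNSNAMES = """\
ORDERS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.2.3)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orders))
  )
BILLING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = billing-db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = billing))
  )
"""

if __name__ == "__main__":
    for lineno, address in find_hardcoded_ips(SAMPLE_TNSNAMES):
        print(f"line {lineno}: hardcoded address {address}")
```

Entries that turn up in such a scan are the ones that would keep pointing at the failed site after a DNS change, so they are worth converting to host names ahead of time.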