On Sun, Feb 26, 2012 at 7:02 PM, Randy Carpenter <rcarpen@network1.net> wrote:
On Feb 26, 2012, at 4:56 PM, Randy Carpenter wrote:
1. Full redundancy with instant failover to other hypervisor hosts upon hardware failure (I thought this was a given!)
This is actually a much harder problem to solve than it sounds, and gets progressively harder depending on what you mean by "failover".
At the very least, having two physical hosts capable of running your VM requires that your VM be stored on some kind of SAN (usually iSCSI based) storage system. Otherwise, the second host has no way of accessing your VM's data if the first were to die. This makes things an order of magnitude or more expensive.
This does not have to be true at all. Even having a fully fault-tolerant SAN in addition to spare servers should not cost much more than having separate RAID arrays inside each of the servers, when you are talking about 1,000s of servers (which Rackspace certainly has).
Randy,

You're kidding, right? SAN storage costs the better part of an order of magnitude more than server storage, which itself is several times more expensive than workstation storage.

That's before you duplicate the SAN and set up the replication process so that cabinet and room level failures don't take you out. DR sites then create a ferocious (read: expensive) bandwidth challenge. Data can't flush from the primary SAN's write cache until the DR SAN acknowledges receipt. If you don't have enough bandwidth to keep up under the heaviest daily loads, the cache quickly fills and the writes block.

I maintain 50ish VMs with about 30 different providers at the moment. Not one of them attempts to do anything like what you describe.
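To put rough numbers on the write-cache point, here is a back-of-the-envelope sketch. It assumes a simple model in which the cache fills at the difference between the write rate and the replication link rate, and it ignores acknowledgement latency and compression. The function name and all figures are hypothetical and do not describe any particular SAN.

```python
def seconds_until_writes_block(cache_bytes, write_bytes_per_s, link_bytes_per_s):
    """Estimate how long a write cache lasts when the replication link lags.

    Returns None when the link keeps up with the write load, otherwise the
    number of seconds until the cache is full and new writes start to block.
    """
    backlog_bytes_per_s = write_bytes_per_s - link_bytes_per_s
    if backlog_bytes_per_s <= 0:
        return None  # the DR site acknowledges as fast as data arrives
    return cache_bytes / backlog_bytes_per_s


# Hypothetical figures: 8 GB of cache, 200 MB/s peak writes,
# and a 1 Gbit/s (125 MB/s) link to the DR site.
peak = seconds_until_writes_block(8e9, 200e6, 125e6)
print(f"cache full after about {peak:.0f} s of peak load")
```

Under this model the only ways to avoid blocking are to buy more bandwidth or to keep peak writes under the link rate, which is where the cost comes from.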
NetApp. HA heads. Done. Add a DR site with replication, and you can survive a site failure, and be back up and running in less than an hour. I would think that the big datacenter guys already have this type of thing set up.
That's expensive and VMs are sold primarily on price. You want high reliability, you start with the dedicated colo server. Customers who want DR in a VM environment buy two VMs and build data replication at the app layer.

On Mon, Feb 27, 2012 at 9:31 AM, Max <perldork@webwizarddesign.com> wrote:
Linode.com is not cloud based, but they offer IP failover between VPS instances at no additional charge. Their pricing is excellent, I have had no downtime issues with them in 3+ years with 3 different customers using them, and they have nice OOB and programmatic API access for controlling VPS instances as well.
Hi Max,

I have had superb results from Linode and highly recommend them. However, they're facilitating application level failover not keeping your VM magically alive. And:

http://library.linode.com/linux-ha/ip-failover-heartbeat-pacemaker-ubuntu-10...

"Both Linodes must reside in the same datacenter for IP failover"

So they don't support a full DR capability even if you're smart at the app level.

On Mon, Feb 27, 2012 at 9:39 AM, Jared Mauch <jared@puck.nether.net> wrote:
Is the DNS service authoritative or recursive? If auth, you can solve this a few ways, for example by giving the DNS name people point to multiple AAAA (and A) records pointing at a diverse set of instances. DNS is designed to work around a host being down. Same goes for MX and several other services. While it may make the service slightly slower, it's certainly not the end of the world.
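As a minimal sketch of the client-side behaviour the multiple-record approach relies on: the client resolves every A and AAAA record for the name and tries each address until one answers. The helper names `resolve_all` and `connect_first_available` are hypothetical, and the injectable `connect` parameter exists only so the logic can be exercised without a network. The approach only helps when the client actually walks the list.

```python
import socket


def resolve_all(hostname, port):
    """Return every A and AAAA address the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def connect_first_available(addresses, port, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # this instance is down or unreachable; try the next record
    raise OSError(f"no address reachable, last error: {last_error}")
```

A caller would combine the two, for instance `connect_first_available(resolve_all("www.example.com", 80), 80)`. Each dead address costs up to `timeout` seconds before the next one is tried, which is the "slightly slower" part.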
Hi Jared,

How DNS is designed to work and how it actually works are not the same. Look up "DNS Pinning" for example. For most kinds of DR you need IP level failover, where the IP address is rerouted to the available site.

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004