You don't say who the "clients" are. I presume this is a web-based application, so essentially you are trying to migrate a service in flight to another set of servers within the TCP/HTTP session timeout, without the client missing a beat? If it is another kind of client, does it also have auto-reconnect/retry logic built in to restore service if the connection times out? Is the session/host state worth preserving only for communication between the servers in the cluster, or between the clients and the service as well?

I know of people who have been able to do this on LANs, using SANs to store shared host state and having a new VM pick up the connections. On an Internet-wide scale, though, you are likely looking at only a probabilistic guarantee, assuming that your routing always converges in time and packets start flowing to the Disaster Recovery (DR) site. This is much easier if you can stick within a single AS, of course. Others will be able to answer whether these routing changes will attract dampening penalties if you have to pick providers in different ASes.

Assuming none of that matters, a somewhat cleaner way to do this would be to advertise a less specific route from the DR location, covering the more specific route of the primary location. If the primary route is withdrawn, voila: traffic starts moving to the less specific route automatically, without you having to scramble at the time of the outage to inject a new route.

Andrew Warfield <andrew.warfield@cl.cam.ac.uk> wrote:

I've got a bit of a network reconfiguration question that I'm wondering if anyone on NANOG might be able to provide a bit of advice on:

I'm working on a project to provide failover of entire cluster-based (and so multi-host) applications to a geographically distinct backup site. The general idea is that as one datacentre burns down, a live service may be moved over to an alternate site without any interruption to clients.
All of the host-state migration is done using virtual machines and associated magic; I'm trying to get a clearer understanding of what is involved in moving the IPs, and how fast it can potentially be done.

I'm fairly sure that what I would like to do is to arrange what is effectively dual-homing, but with two geographically distinct homes: Assuming that I have an in-service primary site A and an emergency backup site B, each with a distinct link into a common provider AS, I would configure B's link as redundant into the stub AS for A -- as if the link to B were the redundant link in a (traditional single-site) dual-homing setup. B would additionally host its own IP range, used for control traffic between the two sites in normal operation.

When I desire to migrate hosts to the failover site, B would send a BGP update advertising that the redundant link should become preferred, and (hopefully) the IGP in the provider AS would seamlessly redirect traffic. Assuming that everything works okay with the virtual machine migration, connections would continue as they were and clients would be unaware of the reconfiguration.

Does the routing reconfiguration story here sound plausible? Does anyone have any insight as to how long such a reconfiguration would reasonably take, and/or whether it is something that I might be able to negotiate an SLA for with a provider if I wanted to actually deploy this sort of redundancy as a service? Is anyone aware of similar high-speed failover schemes in use on the network today?

Thoughts appreciated, I hope this is reasonably on-topic for the list.

best,
a.
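
For anyone following the thread: the covering-route suggestion in the reply at the top relies on longest-prefix matching, and a toy lookup makes the behaviour easy to see. This is only a sketch under stated assumptions, not a model of BGP itself: the prefixes, the site labels "primary" and "dr", and the helper best_route are all made up for illustration, and convergence time (the real question in this thread) is ignored entirely.

```python
import ipaddress


def best_route(table, dst):
    """Return the (prefix, site) pair whose prefix is the longest match
    for dst, or None when no prefix in the table covers dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, site) for net, site in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)


# Hypothetical announcements: the primary site originates the more
# specific /24, the DR site originates a covering /23 all the time.
routes = {
    ipaddress.ip_network("198.51.100.0/24"): "primary",
    ipaddress.ip_network("198.51.100.0/23"): "dr",
}

# Normal operation: the /24 wins because it is the longer match.
print(best_route(routes, "198.51.100.10")[1])

# Primary site fails and its /24 is withdrawn; nothing new is injected.
del routes[ipaddress.ip_network("198.51.100.0/24")]

# The same destination now falls through to the covering /23.
print(best_route(routes, "198.51.100.10")[1])
```

The point of the design is that the DR announcement is already in place before the outage, so failover depends only on the withdrawal propagating, not on a new route being injected and accepted.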