I'm trying to get a clearer understanding of what is involved in moving the IPs, and how fast it can potentially be done.
can we presume that separate ip spaces and changing dns, i.e. maybe ten minutes at worst, is insufficiently fast?
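As a rough worked estimate of the DNS approach: the worst case is a resolver that cached the old record just before the change, so it keeps handing out the old address for a full TTL after the new record is published. The sketch below assumes resolvers honor the TTL (some clients and caches do not), and the detection, update, and TTL figures are purely illustrative.

```python
def worst_case_dns_failover(detect_s: float, update_s: float, ttl_s: float) -> float:
    """Seconds until the last TTL-respecting cache picks up the new address.

    Assumes a cache refreshed the old record just before the new one was
    published, so it keeps serving the stale answer for a full TTL."""
    return detect_s + update_s + ttl_s


# Illustrative values only: 60 s to detect the failure, 30 s to publish
# the new record, and a 300 s TTL.
print(worst_case_dns_failover(60, 30, 300))  # 390 (seconds, i.e. 6.5 minutes)
```

Even with a short TTL, connections that are already open stay pinned to the old address, so DNS alone cannot keep them alive.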
Absolutely. We are trying to explore the (arguably insane) idea of failing things over fast enough (and statefully) that open connections remain completely functional.
I'm fairly sure that what I would like to do is to arrange what is effectively dual-homing, but with two geographically distinct homes:
uh, that kinda inverts what we normally mean by 'multi-homing'. that's usually two upstream providers for a single site.
Yep, which is what I want -- it's just that the single site is going to move. ;) Consider a traditional (single-site) dual-homed setup, where I'm not doing any kind of balancing across the links. In that case, as I understand it, I would use a private stub AS with the two upstream links going to the common provider AS, and advertise a changed preference on the backup link (e.g. via AS-path prepending or MED) when I wanted a switch to happen. (If the primary failed, this would presumably happen automatically as its link disappeared.) In this new scheme, I want to make _everything_ redundant. The backup link goes to a geographically distinct site, and all of the hosts in the primary site are actively mirrored to the backup site: OS, applications, TCP connection state and all. So it's _kind of_ dual-homing -- two upstream links for a single (virtual) site.
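To illustrate the single-site case, here is a heavily simplified model of how the provider might choose between the two announcements. It is a sketch, not a BGP implementation: it applies only three of BGP's tie-breakers (highest LOCAL_PREF, shortest AS_PATH, lowest MED), and the stub AS number (64512, from the private range) and the link names are hypothetical. The backup is made less preferred here by AS-path prepending, which is one common way to do it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    link: str               # hypothetical name of the upstream link
    as_path: tuple          # AS numbers as seen by the provider
    local_pref: int = 100   # higher is preferred
    med: int = 0            # lower is preferred


def best_path(routes):
    """Simplified BGP selection: highest LOCAL_PREF, then shortest AS_PATH,
    then lowest MED. Returns None if no route is available."""
    if not routes:
        return None
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


STUB_AS = 64512  # hypothetical private stub AS

primary = Route("primary", (STUB_AS,))
# Backup announced with the stub AS prepended twice, so its path looks longer.
backup = Route("backup", (STUB_AS, STUB_AS, STUB_AS))

print(best_path([primary, backup]).link)  # primary
# If the primary link fails, its route is withdrawn and the backup wins.
print(best_path([backup]).link)           # backup
# Deliberate switch: move the prepending from the backup to the primary.
moved = [Route("primary", (STUB_AS,) * 3), Route("backup", (STUB_AS,))]
print(best_path(moved).link)              # backup
```

The same model carries over to the two-site idea: the only difference is that the "backup" announcement originates from the second site rather than a second link at the same site.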
... i am sure others can come up with more clever hacks. beware if they're too clever.
I completely agree with your comments regarding clever hacks, which is why I'm trying to draw an analogy to dual-homing, a technique that's known, trusted, and clearly not fraught with corner cases and devilish complexity. ;) Seriously though, I'm trying to convince myself that there is a reasonable approach here that is within the means of datacenter operators and their ISPs, and that would allow a switchover with reconfiguration time on the order of seconds.
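One way to see why a switch on the order of seconds is plausible for open connections: a TCP sender does not give up on the first lost segment, it retransmits with exponential backoff. The sketch below adds up that backoff schedule. The parameters (0.2 s initial RTO, 120 s cap, 15 retransmissions) are assumptions chosen to roughly mirror Linux defaults; real stacks start from a measured RTO, and NAT or firewall state timeouts along the path can matter too.

```python
def tolerated_outage(initial_rto: float, retries: int, max_rto: float) -> float:
    """Total time a sender keeps trying before aborting the connection.

    The original transmission plus each of `retries` retransmissions waits
    one RTO; the RTO doubles after every timeout (exponential backoff) but
    never exceeds max_rto."""
    total, rto = 0.0, initial_rto
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


# Assumed, Linux-like parameters (see lead-in).
print(round(tolerated_outage(0.2, 15, 120.0), 1))  # 924.6
```

Under these assumptions an outage of a few seconds costs a handful of retransmissions rather than the connection. The catch is that traffic may not resume until the next backed-off retransmission fires after the path comes back.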
persistent tcp connections from clients would not fare well unless you actually did the hacks to migrate the sessions, i.e. tcp sequence numbers and all the rest of the tcp state. hard to do.
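A small illustration of why the TCP state has to move exactly: a receiver only accepts a segment whose sequence number falls inside its current receive window, computed modulo 2^32. The function below is a simplified form of the RFC 793 acceptability test (it checks only the segment's first sequence number and ignores the zero-window special cases). A replacement host that did not carry over the connection's sequence numbers would almost certainly fail it.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if `seq` lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32.

    Simplified acceptability check: only the first sequence number of the
    segment is tested, and a zero window rejects everything."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


print(seq_in_window(100, 100, 65535))             # True: next expected byte
print(seq_in_window(5, SEQ_SPACE - 10, 100))      # True: window wraps past zero
print(seq_in_window(99, 100, 65535))              # False: already acknowledged
```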
Since we move the entire OS, the TCP state goes with it. We've done this in the past on the local link by migrating the host and sending an unsolicited ARP reply to notify the switch that the IP has moved to a new MAC (http://www.cl.cam.ac.uk/~akw27/papers/nsdi-migration.pdf). I think that order-of-seconds reconfiguration should allow the same sort of migration to work at a larger scope.
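For reference, the local-link trick amounts to broadcasting a single ARP frame. The sketch below only builds the bytes of an unsolicited ARP reply (Ethernet header plus ARP payload, 42 bytes, not padded to the 60-byte Ethernet minimum); actually sending it needs a raw link-layer socket and the corresponding privileges, which is left out. Setting the target addresses equal to the sender's is a common convention for gratuitous replies rather than a requirement, and the MAC and IP used in the example are placeholders.

```python
import socket
import struct


def gratuitous_arp_reply(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying an unsolicited ARP reply that
    announces `ip` is now reachable at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    proto = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth = broadcast + hw + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,          # HTYPE: Ethernet
        0x0800,     # PTYPE: IPv4
        6, 4,       # HLEN, PLEN
        2,          # OPER: reply
        hw, proto,  # sender MAC / IP: the migrated host
        hw, proto,  # target MAC / IP: same values (common gratuitous convention)
    )
    return eth + arp


frame = gratuitous_arp_reply("02:00:00:00:00:01", "192.0.2.10")  # placeholders
print(len(frame))  # 42
```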
well, you left out any mention of US legislative follies and telco and cable greed. but maybe you can get away with a purely technical question once if you promise not to do it again. :-)
Thanks! And thanks everyone for the feedback -- incredibly helpful. I'll try for follies and greed next time. ;) a.