On Mon, Mar 31, 2008 at 8:24 AM, <michael.dillon@bt.com> wrote:
Here is a little hint - most distributed applications in traditional jobsets tend to work best when they are close together. Unless you can map those jobsets onto truly partitioned algorithms that work on a local copy, this is a _non-starter_.
Let's make it simple and say it in plain English. The users of services have made the decision that it is "good enough" to be a user of a service hosted in a data center that is remote from the client. Remote means in another building in the same city, or in another city.
Try reading for comprehension. The users of services have made the decision that it is good enough to be a user of a service hosted in a datacenter, and thanks to the wonders of AJAX and pipelining, you can even get snappy performance. What the users haven't signed up for is the massive amounts of scatter-gathers that happen _behind_ the front end. E.g., I click on a web page to log in. The login process then kicks off a few authentication sessions with servers located halfway around the world. Then you do the data gathering, two-phase locks, distributed file systems with the masters and lock servers all over the place. Your hellish user experience, let me SHOW YOU IT.
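To put rough numbers on the scatter-gather point, here is a minimal sketch of how sequential back-end round trips add up behind a single front-end click. Everything in it is an assumption for illustration: the function name, the round-trip times, and the count of back-end steps are hypothetical, not measurements from any real deployment.

```python
def request_latency_ms(sequential_round_trips: int, rtt_ms: float,
                       server_time_ms: float = 0.0) -> float:
    """Latency of one user request whose back-end steps run one after another.

    Each step costs one network round trip plus some server processing time.
    """
    return sequential_round_trips * (rtt_ms + server_time_ms)


# Hypothetical figures, chosen only to show the shape of the problem.
LOCAL_RTT_MS = 0.5        # assumed RTT between machines under one roof
REMOTE_RTT_MS = 150.0     # assumed RTT to a server halfway around the world
BACKEND_ROUND_TRIPS = 20  # assumed auth + lock + data-gathering steps per click

local_ms = request_latency_ms(BACKEND_ROUND_TRIPS, LOCAL_RTT_MS)
remote_ms = request_latency_ms(BACKEND_ROUND_TRIPS, REMOTE_RTT_MS)

print(f"same datacenter:   {local_ms:.1f} ms")
print(f"spread worldwide:  {remote_ms:.1f} ms")
```

Under these assumed figures the same click goes from 10 ms of back-end network time to 3 seconds. Steps that can run in parallel would shrink the count, which is exactly the "truly partitioned algorithms" caveat.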
Now, given that context, many of these "good enough" applications will run just fine if the "data center" is no longer in one physical location, but distributed across many. Of course, as you point out, one should not be stupid when designing such distributed data centers or when setting up the applications in them.
Other than that minor handwaving, we are all good. Turns out that designing such distributed datacenters and setting up the applications, which you just handwaved away, is a bit harder than it looks. I eagerly await papers on distributed database transactions with cost estimates for a distributed datacenter model vs. a traditional model.
I would assume that every data center has local storage available using some protocol like iSCSI and probably over a separate network from the external client access. That right there solves most of your problems of traditional jobsets. And secondly, I am not suggesting that everybody should shut down big data centers or that every application should be hosted across several of these distributed data centers.
See above. That right there doesn't quite solve most of the problems of traditional jobsets, but it's kind of hard to hear with the wind in my ears.
There will always be some apps that need centralised scaling. But there are many others that can scale in a distributed manner, or at least use distributed mirrors in a failover scenario.
Many many others indeed.
No matter how much optical technology you have, it will tend to be more expensive to run, have higher failure rates, and use more power, than simply running fiber or copper inside your datacenter. There is a reason most people, who are backed up by sober accountants, tend to cluster stuff under one roof.
Frankly I don't understand this kind of statement. It seems obvious to me that high-speed metro fibre exists and corporate IT people already have routers and switches and servers in the building, connected to the metro fiber. Also, the sober accountants do tend to agree with spending money on backup facilities to avoid the risk of single points of failure. Why should company A operate two data centers, and company B operate two data centers, when they could outsource it all to ISP X running one data center in each of the two locations (Company A and Company B)?
I guess I can try to make it clearer by example: look at the cross-sectional bandwidth availability of a datacenter, now compare and contrast what it would take to pull it apart by a few tens of miles and conduct the cost comparison. /vijay
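A rough way to set up that comparison, as a sketch only: estimate the bandwidth that crosses an even split of the servers, then count how many metro links it would take to carry the same traffic between two sites. The server count, NIC speed, link speed, and both function names are hypothetical placeholders; real figures and per-link costs would have to come from an actual design.

```python
import math


def bisection_bandwidth_gbps(servers: int, nic_gbps: float,
                             oversubscription: float = 1.0) -> float:
    """Bandwidth crossing an even split of the servers, in Gb/s.

    oversubscription=1.0 models a non-blocking fabric; 4.0 models a 4:1 design.
    """
    return (servers / 2) * nic_gbps / oversubscription


def metro_links_needed(bisection_gbps: float, link_gbps: float) -> int:
    """Number of inter-site links needed to carry the bisection bandwidth."""
    return math.ceil(bisection_gbps / link_gbps)


# Hypothetical inputs, for illustration only.
SERVERS = 10_000
NIC_GBPS = 1.0          # assumed per-server network interface speed
METRO_LINK_GBPS = 10.0  # assumed capacity of one metro wavelength or link

bisection = bisection_bandwidth_gbps(SERVERS, NIC_GBPS)
links = metro_links_needed(bisection, METRO_LINK_GBPS)

print(f"bisection bandwidth: {bisection:.0f} Gb/s")
print(f"metro links needed:  {links}")
```

Multiplying the link count by a per-link cost for optics, fiber, and power, and setting that against the cost of in-building cabling for the same capacity, gives the comparison described above.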
In addition, there is a trend to commoditize the whole data center. Amazon, with EC2 and S3, is not the only example of a company that does not offer any kind of colocation but lets you host your apps out of its data centers. I believe that this trend will pick up steam and that as the corporate market begins to accept running virtual servers on top of a commodity infrastructure, there is an opportunity for network providers to branch out and not only be specialists in the big consolidated data centers, but also in running many smaller data centers that are linked by fast metro fiber.
--Michael Dillon