On Mon, Mar 31, 2008 at 8:24 AM, <michael.dillon@bt.com> wrote:
>> Here is a little hint - most distributed applications in
>> traditional jobsets tend to work best when they are close
>> together. Unless you can map those jobsets onto truly
>> partitioned algorithms that work on a local copy, this is a
>> _non-starter_.
> Let's make it simple and say it in plain English. The users
> of services have made the decision that it is "good enough"
> to be a user of a service hosted in a data center that is
> remote from the client. Remote means in another building in
> the same city, or in another city.
Try reading for comprehension. The users of services have decided that it is good enough to use a service hosted in a datacenter, and thanks to the wonders of AJAX and pipelining, you can even get snappy performance. What the users haven't signed up for is the massive amount of scatter-gather that happens _behind_ the front end. E.g., I click on a web page to log in. The login process then kicks off a few authentication sessions with servers located halfway around the world. Then come the data gathering, the two-phase locks, and the distributed file systems with their masters and lock servers scattered all over the place. Your hellish user experience, let me SHOW YOU IT.
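To put rough numbers on the scatter-gather point, here is a back-of-the-envelope sketch. The round-trip times and call counts are illustrative assumptions, not measurements from any real deployment:

```python
# Illustrative sketch: user-perceived latency when one login fans out
# into sequential backend round trips. All numbers are assumptions.

LOCAL_RTT_MS = 0.5    # assumed round trip inside one datacenter
WAN_RTT_MS = 150.0    # assumed round trip to a server halfway around the world

def login_latency_ms(backend_calls: int, rtt_ms: float) -> float:
    """Total added latency if each backend call must finish before the next."""
    return backend_calls * rtt_ms

# One login: say 3 authentication round trips plus 5 data-gathering ones.
print(login_latency_ms(8, LOCAL_RTT_MS))   # 4.0 ms behind a local front end
print(login_latency_ms(8, WAN_RTT_MS))     # 1200.0 ms when the backend is scattered
```

Same page, same click; the only thing that changed is where the backend lives.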
> Now, given that context, many of these "good enough" applications
> will run just fine if the "data center" is no longer in one
> physical location, but distributed across many. Of course,
> as you point out, one should not be stupid when designing such
> distributed data centers or when setting up the applications
> in them.
Other than that minor handwaving, we are all good. Turns out that designing such distributed datacenters and setting up the applications in them (the part you just handwaved away) is a bit harder than it looks. I eagerly await papers on distributed database transactions with cost estimates for a distributed datacenter model vs. a traditional model.
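For one concrete piece of that cost estimate, here is a sketch of the commit latency of textbook two-phase commit under assumed round-trip times; the RTT values are hypothetical round numbers:

```python
# Hedged sketch: textbook two-phase commit costs roughly two round trips
# from coordinator to participants (prepare/vote, then commit/ack).
# RTT values below are assumptions for illustration only.

def two_pc_commit_ms(rtt_ms: float) -> float:
    return 2 * rtt_ms  # phase 1 round trip + phase 2 round trip

INTRA_DC_RTT_MS = 0.5   # assumed: participants under one roof
METRO_RTT_MS = 2.0      # assumed: participants a few tens of miles apart

for rtt in (INTRA_DC_RTT_MS, METRO_RTT_MS):
    latency = two_pc_commit_ms(rtt)
    # A hot record that must commit serially is capped at 1000/latency tx/s.
    print(f"RTT {rtt} ms -> {latency} ms per commit -> {1000 / latency:.0f} serial tx/s")
```

Pull the participants tens of miles apart and the ceiling on serialized commits drops by the same factor the RTT grows.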
> I would assume that every data center has local storage available
> using some protocol like iSCSI, probably over a separate network
> from the external client access. That right there solves most of
> the problems of traditional jobsets. And secondly, I am not suggesting
> that everybody should shut down big data centers or that every
> application should be hosted across several of these distributed
> data centers.
See above. That right there doesn't quite solve most of the problems of traditional jobsets, but it's kind of hard to hear with the wind in my ears.
> There will always be some apps that need centralised scaling. But
> there are many others that can scale in a distributed manner, or
> at least use distributed mirrors in a failover scenario.
Many many others indeed.
>> No matter how much optical technology you have, it will tend
>> to be more expensive to run, have higher failure rates, and
>> use more power than simply running fiber or copper inside
>> your datacenter. There is a reason most people, who are
>> backed up by sober accountants, tend to cluster stuff under one roof.
> Frankly I don't understand this kind of statement. It seems
> obvious to me that high-speed metro fibre exists and corporate
> IT people already have routers and switches and servers in the
> building, connected to the metro fibre. Also, the sober accountants
> do tend to agree with spending money on backup facilities to
> avoid the risk of single points of failure. Why should company A
> operate two data centers, and company B operate two data centers,
> when they could outsource it all to ISP X running one data center
> in each of the two locations (company A's and company B's)?
I guess I can try to make it clearer by example: look at the cross-sectional bandwidth available within a datacenter; now compare and contrast what it would take to pull it apart by a few tens of miles, and run the cost comparison.
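A sketch of that comparison, with every figure assumed (server count, NIC speed, oversubscription ratio, and wavelength capacity are hypothetical round numbers, not data from any real facility):

```python
# Assumed round numbers: how much metro capacity it takes to match the
# east-west (cross-sectional) bandwidth inside a single datacenter.

SERVERS = 2000
NIC_GBPS = 1             # assumed per-server link speed
OVERSUBSCRIPTION = 4     # assumed fabric oversubscription ratio
WAVE_GBPS = 10           # assumed capacity of one DWDM wavelength

bisection_gbps = SERVERS * NIC_GBPS / OVERSUBSCRIPTION
waves_needed = bisection_gbps / WAVE_GBPS

print(f"intra-DC cross-sectional bandwidth: {bisection_gbps:.0f} Gbps")
print(f"10G metro waves needed to match it: {waves_needed:.0f}")
```

Dozens of lit, amplified, redundantly-pathed metro wavelengths to replace what a room full of switches does for the cost of patch cables: that is the line item the sober accountants notice.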
/vijay
> In addition, there is a trend to commoditize the whole data center.
> Amazon, with EC2 and S3, is not the only example of a company that
> offers no colocation at all, yet still lets you host your apps out
> of its data centers. I believe that this trend will pick up
> steam, and that as the corporate market begins to accept running
> virtual servers on top of a commodity infrastructure, there is
> an opportunity for network providers to branch out and be
> specialists not only in big consolidated data centers, but also
> in running many smaller data centers linked by fast metro
> fibre.
> --Michael Dillon