Eric Germann <ekgermann@cctec.com> writes:
At 04:31 PM 9/18/97 -0400, Sean M. Doran wrote:
Perhaps you could explain to me how you can find the shortest path between A and B using ping times, traceroute hop counts, and AS_PATHS observed at C, assuming that traffic between A and B is not exchanged through C?
You're not trying to find it between A and B. A connects to B, but B has every intention of redirecting to C1, C2, or C3, etc.
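Mechanically, that redirect step is trivial; picking the right C is the hard part. Here is a toy sketch in Python, where the mirror URLs and the prefix-to-mirror table are invented, and filling in that table well is precisely the open problem:

    # Toy sketch of the redirect idea: B answers every request with a
    # 302 pointing at whichever replica (C1, C2, C3, ...) it believes
    # is nearest the client.  Mirror names and the prefix->mirror
    # table are hypothetical, for illustration only.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    MIRRORS = ["http://c1.example.net", "http://c2.example.net",
               "http://c3.example.net"]

    # Hypothetical precomputed map: client address prefix -> mirror index.
    PREFIX_TO_MIRROR = {"192.0.2.": 0, "198.51.100.": 1}

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            client = self.client_address[0]
            # Fall back to the first mirror if the client is unknown.
            idx = next((m for p, m in PREFIX_TO_MIRROR.items()
                        if client.startswith(p)), 0)
            self.send_response(302)
            self.send_header("Location", MIRRORS[idx] + self.path)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), Redirector).serve_forever()

Everything interesting hides inside PREFIX_TO_MIRROR; the reply below is largely about why building and maintaining such a proximity map is hard.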
The originally proposed idea was that search engines would present a list of hits sorted by proximity to the client that made the query. What you are doing is rehashing the IBM scheme. Unfortunately, the IBM scheme requires:

  (1) an initial connection to a server;
  (2) each server to know the location of its replicated data; and
  (3) each server to be able to determine the proximity of its replicated data to the client in reasonable time, without placing too much additional load on the server, the client, or the replicating sites.

Point (1) means that for short transactions, preserving locality is a net loss. If it is more work to redirect than to serve directly, there is no point in not serving directly.

Point (2) means that copies of the data that may be nearer the client may not be known about. This is also a scaling problem.

Point (3) is a difficult engineering challenge, not least because determining proximity (assuming proximity is a function of bandwidth, delay, and infrastructure) without inducing traffic load is hard.

What is really needed is a scheme in which:

  * the client learns the identity of the datum desired, and its ultimate, authoritative source, through user interaction (typing in a URL) or through a search database;
  * the client or its proxy asks increasingly distant infrastructure whether replicas are available (Van Jacobson suggests this is a good application for multicast, and I'm inclined to agree somewhat);
  * the client or its proxy retrieves the replicated datum if a replica exists and is found within a reasonable time or scope of multicast search; otherwise, the client or its proxy turns to the ultimate source for a copy; and
  * should the client or its proxy hear a query for this same datum, it offers it up for retrieval to other clients; that is, everything in your client's or proxy's local cache may be served up to nearby clients.

Note that the squid caching system emulates this scheme to a large degree. The difficulty is then reduced to constructing a spanning tree for all multicast queries and to determining whether something the client or its proxy finds topologically nearby is a good choice for serving up the data. These problems seem much more tractable than the ones the IBM scheme needs to solve, and a great deal of theoretical work has already gone into this sort of model.
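A minimal sketch of the expanding-scope search, assuming an invented multicast group, port, and one-line wire format (squid's actual ICP is richer, and a real deployment still needs the multicast spanning tree noted above):

    # Sketch of an expanding-scope multicast replica search, in the
    # spirit of the scheme above.  Group address, port, and "HIT"
    # reply format are hypothetical.

    import socket

    GROUP = "239.255.77.77"   # hypothetical admin-scoped group
    PORT = 4827               # hypothetical query port
    TIMEOUT = 0.5             # seconds to wait at each scope

    def find_replica(url, max_ttl=32):
        """Ask increasingly distant caches for url; return the first
        responder's address, or None to fall back to the origin."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(TIMEOUT)
        for ttl in (1, 2, 4, 8, 16, max_ttl):   # widen scope each round
            sock.setsockopt(socket.IPPROTO_IP,
                            socket.IP_MULTICAST_TTL, ttl)
            sock.sendto(url.encode(), (GROUP, PORT))
            try:
                data, addr = sock.recvfrom(512)  # reply from a cache
                if data == b"HIT":
                    return addr                  # nearby replica found
            except socket.timeout:
                continue                         # nobody this close
        return None                              # use ultimate source

    if __name__ == "__main__":
        replica = find_replica("http://example.com/object")
        print("fetch from", replica or "the authoritative source")

Widening the TTL one round at a time is what keeps the search local: a nearby cache answers before the query ever reaches distant infrastructure or the authoritative source.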
Anyone know of some reasonably available methods for measuring end-to-end "performance" that are almost universally implemented?
No. See the CAIDA or IPPM efforts for why not.

	Sean.

P.S.: Curtis Villamizar had another interesting approach, which involved pushing content far afield to machines sharing the same network-layer (IP) addresses and relying upon closest-exit routing to connect each client to the topologically closest replication machine. Unfortunately, while this could be really cool for NSPs wanting to offload traffic towards peering points (public or private), it also has some poor scaling properties and is uncomfortably reliant upon the stability of routing. If he's done any more thinking about the idea, I'd love to hear about it, though.
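A toy model of why closest-exit routing does the replica selection for free; the topology, link costs, and node names are all invented. Each replica "announces" the same shared address, so a client simply reaches whichever one is nearest:

    # Toy model of the replicated-address idea: several replicas
    # announce one shared address, and ordinary shortest-path
    # (closest-exit) routing delivers each client to the nearest one.

    import heapq

    # Hypothetical undirected hop-cost graph of clients, transit, replicas.
    GRAPH = {
        "clientA":      {"transit1": 1},
        "clientB":      {"transit2": 1},
        "transit1":     {"clientA": 1, "transit2": 3, "replica-east": 1},
        "transit2":     {"clientB": 1, "transit1": 3, "replica-west": 1},
        "replica-east": {"transit1": 1},
        "replica-west": {"transit2": 1},
    }
    REPLICAS = {"replica-east", "replica-west"}  # same address everywhere

    def nearest_replica(src):
        """Dijkstra from src; the first replica popped is where the
        shared address 'routes to' for this client."""
        dist, heap, seen = {src: 0}, [(0, src)], set()
        while heap:
            d, node = heapq.heappop(heap)
            if node in REPLICAS:
                return node, d
            if node in seen:
                continue
            seen.add(node)
            for nbr, cost in GRAPH[node].items():
                if d + cost < dist.get(nbr, float("inf")):
                    dist[nbr] = d + cost
                    heapq.heappush(heap, (d + cost, nbr))
        return None, None

    for client in ("clientA", "clientB"):
        print(client, "->", nearest_replica(client))

The model also shows the fragility: perturb one link cost and a client silently lands on a different replica mid-connection, which is why the scheme is uncomfortably reliant on routing stability.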