Eric Germann <ekgermann@cctec.com> writes:
At 04:31 PM 9/18/97 -0400, Sean M. Doran wrote:
Perhaps you could explain to me how you can find the shortest path between A and B using ping times, traceroute hop counts, and AS_PATHS observed at C, assuming that traffic between A and B is not exchanged through C?
You're not trying to find it between A and B. A connects to B, but B has every intention of redirecting to C1, C2, or C3, etc.
The originally proposed idea was that search engines would present a list of hits sorted by proximity to the client that made the query. What you are doing is rehashing the IBM scheme. Unfortunately, the IBM scheme requires that:

* there be an initial connection to a server
* each server know the location of its replicated data
* each server be able to determine the proximity of its replicated data to the client in reasonable time, without placing too much additional load on the server, the client, or the replicating sites

Point (1) means that for short transactions, preserving locality is a net loss. If it is more work to redirect than to serve directly, there is no point in not serving directly. Point (2) means that copies of the data which may be nearer the client may not be known about. This is also a scaling problem. Point (3) is a difficult engineering challenge, not least because determining proximity (assuming proximity is a function of bandwidth, delay, and infrastructure) without inducing traffic load is hard.

What is really needed is a scheme in which:

* the client learns the identity of the datum desired, and the ultimate, authoritative source of the datum, through user interaction (typing in a URL) or through a search database
* the client or its proxy asks increasingly distant infrastructure whether replicas are available; Van Jacobson suggests this is a good application for multicast, and I'm inclined to agree somewhat
* the client or its proxy retrieves the replicated datum if a replica exists and is found within a reasonable time or scope of multicast search, or otherwise turns to the ultimate source for a copy
* should the client or its proxy hear a query for this same datum, it offers it up for retrieval to other clients; that is, everything in your client's or proxy's local cache may be served up to nearby clients

Note that the squid caching system emulates this scheme to a large degree.
The difficulty is now reduced to constructing a spanning tree for all multicast queries and determining whether something found by the client or its proxy to be topologically nearby is a good choice for serving up the data. These problems seem much more tractable than the ones the IBM scheme needs to solve, and a great deal of theoretical work has gone into this sort of model already.
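The expanding search described above can be sketched roughly as follows. This is a hypothetical illustration only: the names (`find_replica`, `CACHES`), the scope rings, and the fallback behavior are invented for the sketch and stand in for a real multicast query mechanism.

```python
# Hypothetical sketch of an expanding-scope replica search: ask
# increasingly distant infrastructure for a replica, and fall back
# to the authoritative source if the search is exhausted.
# All names and the scope model are illustrative assumptions.

AUTHORITATIVE = "origin.example.net"

# Each cache sits at some "distance" (think multicast scope / TTL ring).
CACHES = {
    1: [],                      # nothing on the local segment
    2: ["proxy.site.example"],  # a site proxy holds a replica
    4: ["regional.example"],
}

def has_replica(cache, datum):
    # Stand-in for an actual multicast query; here, only the site
    # proxy is assumed to hold the datum.
    return cache == "proxy.site.example"

def find_replica(datum, max_scope=4):
    """Query nearby caches first, then widen the scope; turn to the
    ultimate source if no replica is found within max_scope."""
    for scope in sorted(CACHES):
        if scope > max_scope:
            break
        for cache in CACHES[scope]:
            if has_replica(cache, datum):
                return cache
    return AUTHORITATIVE

print(find_replica("http://example.net/page"))   # proxy.site.example
print(find_replica("x", max_scope=0))            # origin.example.net
```

The point of the structure is that the cheap, nearby queries happen first, so a cache hit never pays the cost of a wide search.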
Anyone know of some reasonably available methods for measuring end-to-end "performance" which are almost universally implemented?
No. See the CAIDA or IPPM efforts for why not.

Sean

P.S.: Curtis Villamizar had another interesting approach which involved pushing content far afield to machines with the same network-layer (IP) addresses, relying upon closest-exit routing to connect one to the topologically-closest replication machine. Unfortunately, while this could be really cool for NSPs to offload stuff towards peering points (public or private), it also has some poor scaling properties and is uncomfortably reliant upon the stability of routing. If he's done any more thinking about the idea, I'd love to hear about it though.
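The closest-exit idea in the P.S. can be modeled in a few lines: several machines share one service address, and ordinary shortest-path forwarding delivers the client to whichever replica is topologically nearest. The topology and link costs below are invented for illustration; the sketch is a toy model, not any provider's actual configuration.

```python
# Toy model of replicas sharing one IP under closest-exit routing:
# run Dijkstra from the client, and the first replica reached is the
# one plain shortest-path forwarding would select.
import heapq

# graph[node] = {neighbor: link_cost} -- invented example topology
GRAPH = {
    "client":       {"pop-east": 2, "pop-west": 7},
    "pop-east":     {"client": 2, "replica-east": 1, "pop-west": 5},
    "pop-west":     {"client": 7, "replica-west": 1, "pop-east": 5},
    "replica-east": {"pop-east": 1},
    "replica-west": {"pop-west": 1},
}

REPLICAS = {"replica-east", "replica-west"}  # same address, different boxes

def nearest_replica(src):
    """Dijkstra from src; the first replica popped off the heap is the
    topologically closest one."""
    heap, seen = [(0, src)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in seen:
            continue
        seen.add(node)
        if node in REPLICAS:
            return node, d
        for nbr, cost in GRAPH[node].items():
            if nbr not in seen:
                heapq.heappush(heap, (d + cost, nbr))
    return None, None

print(nearest_replica("client"))  # ('replica-east', 3)
```

The reliance on routing stability that Sean mentions falls straight out of the model: change one link cost and an established TCP session can suddenly land on a different replica that knows nothing about it.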
I don't know about Curtis, but others have solved this problem (in theory) recently. We at Net Access have figured out a way (we believe) to get around the stability-of-routing issue for already-established TCP sessions in the above approach (multiple machines with the same IP externally, plus an internally different IP, each running gated to announce their /32(s) to your IGP) - hint: a question I asked on NANOG a few days back. And Alec Peterson (now of Erols) has figured out an even arguably slicker way to do it.

I'll see if Merit wants to have Alec and me do a presentation on the methods @ NANOG. We should be able to implement our various solutions by then...

Avi
The cyclic stuff in News recently has hammered home that the search for elegance pays off. I "initially" solved the problem of having multiple machines with the same IP by postulating IP-stack hackery: forward any packets that come in for sockets that don't exist to a central database-forwarder machine that keeps track of all open TCP sessions on all replicated machines. It turns out, though, that there are some much easier solutions that don't require stack hackery.

Avi
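The stack-hackery idea Avi postulates can be sketched as a lookup against a shared session table: a replica that receives a packet for a TCP session it doesn't own asks the central database which machine does, and forwards it there. Everything here (`SESSION_DB`, the replica names, the return strings) is invented for illustration.

```python
# Sketch of a central session-database forwarder: each replica serves
# sessions it owns and forwards packets for sessions owned elsewhere.
# All names are hypothetical.

SESSION_DB = {}  # (client_ip, client_port) -> owning replica

def register(conn, replica):
    """Record which replica owns an open TCP session."""
    SESSION_DB[conn] = replica

def handle_packet(replica, conn):
    """Serve locally if this replica owns (or newly accepts) the
    session; otherwise forward toward the recorded owner."""
    owner = SESSION_DB.get(conn)
    if owner is None or owner == replica:
        register(conn, replica)
        return f"served by {replica}"
    return f"forwarded to {owner}"

register(("10.0.0.1", 1025), "replica-a")
print(handle_packet("replica-b", ("10.0.0.1", 1025)))  # forwarded to replica-a
print(handle_packet("replica-b", ("10.0.0.2", 2048)))  # served by replica-b
```

The central table is exactly the scaling sore spot: every session on every replica has to be registered and looked up there, which is presumably why the non-stack-hackery solutions looked more attractive.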
Going through some very back email on this list... In message <yt7mcel11t.fsf@cesium.clock.org>, "Sean M. Doran" writes:
P.S.: Curtis Villamizar had another interesting approach which involved pushing content far afield to machines with the same network-layer (IP) addresses, relying upon closest-exit routing to connect one to the topologically-closest replication machine. Unfortunately, while this could be really cool for NSPs to offload stuff towards peering points (public or private), it also has some poor scaling properties and is uncomfortably reliant upon the stability of routing.
If he's done any more thinking about the idea, I'd love to hear about it though.
I'd like to hear what the scaling problems were. The replicated routes are advertised via IBGP only and aggregated at the borders, so I'm not sure what is "uncomfortably reliant upon the stability of routing" either.

If you point traffic toward an ANS aggregate, some of it is probably going to web servers served by this scheme. We have at least one fairly large customer using this, though not one of the major sources of web content. We see their replicated web server subnet as a /27 internally; after aggregation you see it only as part of a /14.

Those a bit more timid announce the same subnet from multiple sites, with us setting differing local-pref, and use a DNS-based load split, using the replicated prefix for fast fallback. This does nothing to improve proximity of access, but it means a down tail circuit is mended in under a minute rather than after DNS times out.

Curtis
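The fast-fallback variant Curtis describes reduces to ordinary BGP path selection on local-pref: when the preferred site's announcement vanishes, the next-highest local-pref takes over immediately, while DNS handles only the coarse load split. The sketch below is a hypothetical simplification (site names and preference values invented) that models just that one selection step.

```python
# Sketch of fast fallback via differing local-pref: among sites whose
# announcement of the replicated prefix is still present, the highest
# local-pref wins, BGP-style. Values are illustrative.

ANNOUNCEMENTS = {
    # site: (local_pref, announcement still present?)
    "site-a": (200, True),   # preferred site
    "site-b": (100, True),   # fallback site
}

def best_path(announcements):
    """Return the site with the highest local-pref among live
    announcements, or None if nothing is announced."""
    live = {site: lp for site, (lp, up) in announcements.items() if up}
    if not live:
        return None
    return max(live, key=live.get)

print(best_path(ANNOUNCEMENTS))         # site-a
ANNOUNCEMENTS["site-a"] = (200, False)  # tail circuit to site-a dies
print(best_path(ANNOUNCEMENTS))         # site-b
```

This is why the repair happens at BGP convergence speed (under a minute) rather than waiting for cached DNS answers to expire.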
The biggest drawback we saw when we evaluated this approach (way back when) was that it depends on routing protocols, which are quite inadequate for predicting what a user is going to experience as the "best" site. It also creates some problems (perhaps not significant) for stateful sessions, such as web servers that track who the user is through an entire session. I agree that it might be adequate for simply reducing backbone traffic. I believe AGIS uses this approach for their colocation service.
participants (4)
-
Avi Freedman
-
Curtis Villamizar
-
jon@branch.net
-
Sean M. Doran