How do you plan to accumulate a priori knowledge of distant topology and connectivity using current routing protocols and the current transport addressing scheme?
AS_PATH was the first idea - other such tools could include ping times and traceroute hop counts. It's been pointed out to me that IBM supposedly did something like this for the '96 Olympics.
Idea: what about a search engine that understands a BGP table?
Whose BGP table? Remember that you want to determine what is most local to the client or its proxies.
True, having a search engine look at its own BGP table is not the best indicator of distance, especially if the search client is "distant" (many ASes away) from the engine. However, given the prevalence of things like the Merit tools that show the BGP exchanges at major NAPs, it's conceivable that a search engine could grab these tables on a regular basis, and from there it becomes pretty much an SPF tree computed over the graph of ASes. I do concur, though, that ping and traceroute are probably more sensible metrics to use.
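A minimal sketch of the "SPF tree through ASes" idea, in Python. It assumes the collected tables have already been reduced to lists of AS numbers, one list per observed AS_PATH (parsing Merit-style dumps is not shown), and it treats every adjacent pair in a path as an undirected link of cost 1. That deliberately ignores routing policy, so the resulting hop counts are optimistic. The function names and the private-range AS numbers in the example are hypothetical.

```python
import heapq
from collections import defaultdict


def collapse_prepends(as_path):
    """Drop consecutive repeats caused by AS_PATH prepending."""
    out = []
    for asn in as_path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def build_as_graph(as_paths):
    """Treat each adjacent pair in an observed AS_PATH as an undirected AS link."""
    graph = defaultdict(set)
    for path in as_paths:
        path = collapse_prepends(path)
        for a, b in zip(path, path[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def spf(graph, source):
    """Dijkstra with unit link cost: AS-hop distance from source to every reachable AS."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr in graph[node]:
            nd = d + 1
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


if __name__ == "__main__":
    # Hypothetical AS_PATHs gathered from several collectors.
    paths = [
        [65001, 65002, 65003],
        [65001, 65004, 65005],
        [65002, 65002, 65006],  # prepended path
    ]
    print(spf(build_as_graph(paths), 65003))
```

With these three paths, AS 65005 comes out 4 AS hops from AS 65003 (via 65002, 65001 and 65004). A real graph would need policy-aware edges before those numbers meant much.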
1) Perform the query.
2) If your query returns multiple places to get the same page:
   a) look at the AS_PATH for the querying IP address;
   b) look at the AS_PATHs for the found pages;
   c) determine and return the "closest" one, perhaps the one whose AS_PATH is most like that of the querying host.
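One way to make step (c) concrete, sketched in Python under stated assumptions: all AS_PATHs come from the search engine's own table (nearest AS first, origin AS last, which is how BGP orders them), and "most like" is read as the longest shared leading segment. Two paths that agree for k hops split k hops out from the engine, so the estimated distance between the client and a copy is the number of hops each path still has after the split. The function names and URLs are hypothetical.

```python
def collapse_prepends(as_path):
    """Remove consecutive duplicate ASNs introduced by AS_PATH prepending."""
    out = []
    for asn in as_path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def shared_prefix_len(a, b):
    """Number of leading ASes two paths have in common (counted from the engine side)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def estimated_distance(client_path, server_path):
    """AS hops from client to server via the point where their paths split.

    Assumes both paths come from the same (engine's) BGP table, so together
    they form a tree rooted at the engine.
    """
    a = collapse_prepends(client_path)
    b = collapse_prepends(server_path)
    k = shared_prefix_len(a, b)
    return (len(a) - k) + (len(b) - k)


def closest_copy(client_path, candidates):
    """Pick the URL whose host's AS_PATH is 'most like' the client's.

    candidates maps URL -> AS_PATH of the host serving that copy.
    Ties go to whichever candidate appears first.
    """
    return min(candidates, key=lambda url: estimated_distance(client_path, candidates[url]))


if __name__ == "__main__":
    client = [65010, 65020, 65030]
    copies = {
        "http://mirror-a.example/page": [65010, 65020, 65031],  # splits after 65020
        "http://mirror-b.example/page": [65010, 65040],         # splits after 65010
        "http://mirror-c.example/page": [65050, 65060, 65070],  # no shared hops
    }
    print(closest_copy(client, copies))  # prints http://mirror-a.example/page
```

Because this sees only the engine's own best paths, two hosts that sit right next to each other but are reached through different upstreams of the engine will still look far apart.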
(c) is full of landmines thanks to such nifty things as aggregation, the single-view propagation feature, deliberately non-unique addresses, and moment-to-moment change and instability in the intermediate topology.
Agreed.
Anybody out there have any spare venture capital? :)
Since you are trying to make this work with an addressing scheme that only very weakly encodes topological information, with the lossy distance-vector (DV) approach to propagating routing information (as opposed to a map-exchanging scheme), with three huge churny databases (the mapping of information to URL, the mapping of hostname to IP address, and the mapping of IP addresses to paths), and while attempting to come up with a nonexistent database or workable heuristics (the mapping of n observed paths to a graph of connectivity among m endpoints), I would say that you need the level of funding you could only raise from such a lucrative business as the Psychic Friends Network.
Just for the record, I *was* kidding. I don't actually think I have the time or expertise to make it work. However, I think the idea is worth looking at. Two of the "three huge churny databases" (info to URL, URL to IP) are already in place, and I bet the overhead of an uncached IP lookup is a lot more than that of an SPF walk through a BGP tree.
Meanwhile, I suggest you look at Dave Clark's distributed database work (I think I remember Van Jacobson commenting, in more detail than his "How to Kill the Internet" viewgraphs, on how to apply this to the WWW). Rather than a database that centralizes searches over a weak data architecture, consider a better architecture plus a scheme that treats every reference into it as a search for the most local copy; that would be a better development direction.
Will do, thanks.
Sean.
eric