osborne@terra.net writes:
So perhaps what we need is a way for search engines to determine what's "close": geographically, politically, or speed-wise. This isn't particularly easy to do, but if it were implemented and worked only, say, 15% of the time, it'd still make things look that much faster.
How do you plan to accumulate a priori knowledge of distant topology and connectivity using current routing protocols and the current transport addressing scheme?
Idea: what about a search engine that understands a BGP table?
Whose BGP table? Remember that you want to determine what is most local to the client or its proxies.
1) Perform the query.
2) If your query returns multiple places to get the same page:
   a) look at the AS_PATH for the querying IP address;
   b) look at the AS_PATHs for the found pages;
   c) determine and return the "closest" one, perhaps the one whose AS_PATH is most like that of the querying host.
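For concreteness, here is a minimal Python sketch of steps (a) through (c), under loudly stated assumptions: a single vantage point's BGP table held as a dict, invented prefixes and AS numbers from the documentation ranges, and one possible reading of "most like", namely joining the client's and the mirror's paths where they diverge and counting the remaining AS hops. The helper names (`as_path_for`, `estimated_distance`, `closest_mirror`) are hypothetical, not any search engine's or router's API.

```python
import ipaddress
from typing import Dict, List, Optional

# HYPOTHETICAL BGP table as seen from a single vantage point (the search
# engine's border router). Prefixes are RFC 5737 documentation ranges and AS
# numbers come from the RFC 5398 documentation range; none of this is real data.
# Each AS_PATH is listed neighbor-first, origin-last.
BGP_TABLE: Dict[str, List[int]] = {
    "192.0.2.0/24": [64496, 64500, 64510],
    "198.51.100.0/24": [64496, 64500, 64511],
    "203.0.113.0/24": [64497, 64505, 64509],
}


def as_path_for(ip: str, table: Dict[str, List[int]]) -> Optional[List[int]]:
    """Steps (a) and (b): longest-prefix match of an address against the table."""
    addr = ipaddress.ip_address(ip)
    best_len, best_path = -1, None
    for prefix, path in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_path = net.prefixlen, path
    return best_path


def common_prefix_len(a: List[int], b: List[int]) -> int:
    """Number of leading ASes two paths share, counted from the vantage point."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def estimated_distance(client_path: List[int], server_path: List[int]) -> int:
    """Guess at client-to-server AS hops by joining the paths where they diverge.

    This reflects only what the vantage point can see; a shorter path over
    links invisible from here is not captured.
    """
    shared = common_prefix_len(client_path, server_path)
    return (len(client_path) - shared) + (len(server_path) - shared)


def closest_mirror(client_ip: str, mirror_ips: List[str],
                   table: Dict[str, List[int]]) -> Optional[str]:
    """Step (c): pick the mirror whose AS_PATH is 'most like' the client's."""
    client_path = as_path_for(client_ip, table)
    if client_path is None:
        return None
    scored = []
    for ip in mirror_ips:
        path = as_path_for(ip, table)
        if path is not None:
            scored.append((estimated_distance(client_path, path), ip))
    return min(scored)[1] if scored else None


print(closest_mirror("192.0.2.55", ["198.51.100.7", "203.0.113.9"], BGP_TABLE))
# prints 198.51.100.7
```

The resulting number is only as good as one router's view of the topology; two paths that diverge early here may in fact be joined by a short link this router never hears about.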
(c) is full of landmines thanks to such nifty things as aggregation, the single-view propagation feature (each BGP speaker passes on only its one best path), deliberately non-unique addresses, and the moment-to-moment change and instability of intermediate topology.
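A toy Python illustration of the aggregation landmine. Everything in it is hypothetical: an upstream AS announces only a covering /16 to the search engine's router, so two customers in different origin ASes look identical from that vantage point, and an AS_PATH-similarity heuristic would rate them as neighbors. The `ACTUAL_ORIGIN` table stands in for ground truth the search engine cannot see.

```python
import ipaddress

# HYPOTHETICAL table at the search engine's vantage point. The upstream AS64500
# announces only an aggregate; its customers' more-specific /24s never arrive.
VISIBLE_TABLE = {
    "10.20.0.0/16": [64496, 64500],
}

# HYPOTHETICAL ground truth: who actually originates each /24 (invisible to us).
ACTUAL_ORIGIN = {
    "10.20.1.0/24": 64510,
    "10.20.2.0/24": 64511,
}


def longest_match(ip, table):
    """Return the value for the most specific prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [
        (ipaddress.ip_network(p), v)
        for p, v in table.items()
        if addr in ipaddress.ip_network(p)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


client, mirror = "10.20.1.5", "10.20.2.9"

# Identical AS_PATHs: the heuristic would call these two hosts "adjacent".
seen_client = longest_match(client, VISIBLE_TABLE)
seen_mirror = longest_match(mirror, VISIBLE_TABLE)

# Yet they sit in different origin ASes, which may not peer directly at all.
origin_client = longest_match(client, ACTUAL_ORIGIN)
origin_mirror = longest_match(mirror, ACTUAL_ORIGIN)

print(seen_client == seen_mirror, origin_client, origin_mirror)
# prints: True 64510 64511
```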
Anybody out there have any spare venture capital? :)
Since you are trying to make it work correctly with an addressing scheme that only very weakly encodes topological information, the lossy distance-vector (DV) approach to propagating routing information (as opposed to a map-exchanging scheme), and three huge, churny databases (the mapping of information to URL, the mapping of hostname to IP address, and the mapping of IP addresses to paths), while also attempting to come up with a nonexistent database or workable heuristics (the mapping of n observed paths to a graph of connectivity among m endpoints), I would say that you need the level of funding you could only raise from a business as lucrative as the Psychic Friends Network.
Meanwhile, I suggest you look at Dave Clark's distributed database work (I think I remember Van Jacobson commenting, in more detail than in his "How to Kill the Internet" viewgraphs, on how to apply it to the WWW). Rather than a database that centralizes searches over a weak data architecture, a better development direction would be a better data architecture plus a scheme that treats every reference into it as a search for the most local copy.
Note that since this seems achievable by accreting features onto the current practice of aggressively intercepting WWW queries, you probably want to consider whether time-to-market pressures should lead you to develop on that type of platform. (Several people reading this message are already heavily into researching that sort of thing, btw.)
Sean.