Re: Traffic Engineering (fwd)
osborne@terra.net writes:
So perhaps what we need is a way for search engines to determine what's "close" - geographically, politically, or speed-wise. This isn't particularly easy to do, but if it was implemented and only worked, say, 15% of the time, it'd still make things look that much faster.
How do you plan to accumulate a priori knowledge of distant topology and connectivity using current routing protocols and the current transport addressing scheme?
Idea: what about a search engine that understands a BGP table?
Whose BGP table? Remember that you want to determine what is most local to the client or its proxies.
1) Perform the query.
2) If your query returns multiple places to get the same page:
   a) look at the AS_PATH for the querying IP address
   b) look at the AS_PATHs for the found pages
   c) determine and return the "closest" one - perhaps the one whose AS_PATH is most like that of the querying host.
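For concreteness, a minimal sketch of what step (c) might look like in Python; the AS paths, URLs, and the similarity measure (count of shared ASes) are invented for illustration, not something any search engine actually exposes:

# Sketch: pick the "closest" of several copies of a page by comparing the
# AS_PATH seen toward each candidate server with the AS_PATH seen toward
# the querying client.  Paths and the similarity measure are assumptions.

def as_path_similarity(path_a, path_b):
    """Crude similarity: how many ASes the two paths have in common."""
    return len(set(path_a) & set(path_b))

def closest_copy(client_path, candidates):
    """candidates maps a URL to the AS_PATH observed toward its server."""
    return max(candidates,
               key=lambda url: as_path_similarity(client_path, candidates[url]))

# Hypothetical paths as seen from the search engine's own BGP view.
client_path = [701, 1239, 3561]                     # toward the querying host
candidates = {
    "http://mirror-a.example/page": [701, 1239, 174],
    "http://mirror-b.example/page": [3561, 2914, 5511],
}
print(closest_copy(client_path, candidates))        # -> the mirror-a URL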
(c) is full of landmines thanks to such nifty things as aggregation, the single-view propagation feature, deliberately non-unique addresses and change and instability of intermediate topology from moment to moment.
Anybody out there have any spare venture capital? :)
Since you are trying to get it to work correctly with an addressing scheme which only very weakly encodes topological information, the lossy DV approach to propagating routing information (as opposed to a map-exchanging scheme), and three huge churny databases (the mapping of information to URL, the mapping of hostname to IP address, and the mapping of IP addresses to paths), and since you are also attempting to come up with a non-existent database or workable heuristics (the mapping of n observed paths to a graph of connectivity among m endpoints), I would say that you need the level of funding you could only raise from such a lucrative business as the Psychic Friends Network.

Meanwhile, I suggest you look at Dave Clark's distributed database work (I think I remember Van Jacobson commenting in more detail than his "How to Kill the Internet" viewgraphs on how to apply this to the WWW) and consider that, rather than a database which centralizes searches over a weak data architecture, a better development direction would be a better architecture and a scheme which treats every reference into it as a search for the most local copy.

Note that since this seems to be possible through feature accretion upon the current practice of aggressive interception of WWW queries, you probably want to think about whether time-to-market issues lead you into developing on that type of platform. (Several people reading this message are heavily into researching that sort of thing already, btw.)

Sean.
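As an aside on the "n observed paths to a graph of connectivity" mapping, a back-of-the-envelope sketch; the AS paths are invented, and the resulting graph only contains the edges that some collected view happened to traverse, which is exactly why such a database is hard to build:

# Sketch: infer an AS-level adjacency graph from a pile of observed
# AS_PATHs.  Paths are invented; a real collection would still only show
# the edges that some view happened to cross.
from collections import defaultdict

def paths_to_graph(as_paths):
    graph = defaultdict(set)
    for path in as_paths:
        for left, right in zip(path, path[1:]):
            graph[left].add(right)
            graph[right].add(left)
    return graph

observed = [
    [701, 1239, 3561],
    [701, 174, 3356],
    [1239, 2914, 5511],
]
graph = paths_to_graph(observed)
print(sorted(graph[701]))    # neighbours of AS 701 visible in these views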
How do you plan to accumulate a priori knowledge of distant topology and connectivity using current routing protocols and the current transport addressing scheme?
AS_PATH was the first idea - other such tools could include ping times and traceroute hop counts. It's been pointed out to me that IBM supposedly did something like this for the '96 Olympics.
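A rough sketch of the ping-time version, with a TCP connect standing in for an ICMP ping so it needs no raw sockets; the hostnames are placeholders:

# Sketch: rank candidate servers by measured round-trip time.  A TCP
# connect to port 80 stands in for an ICMP ping; hostnames are placeholders.
import socket
import time

def rtt(host, port=80, timeout=2.0):
    start = time.time()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return time.time() - start
    except OSError:
        return float("inf")          # unreachable counts as infinitely far

def fastest(hosts):
    return min(hosts, key=rtt)

print(fastest(["mirror-a.example", "mirror-b.example"]))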
Idea: what about a search engine that understands a BGP table?
Whose BGP table? Remember that you want to determine what is most local to the client or its proxies.
True - having a search engine look at its own BGP table is not the best indicator of distance, especially if the search client is "distant" (many AS's away) from the engine. However, given the prevalence of things like the Merit tools that show the BGP exchanges at major NAPs, it's conceivable that a search engine could grab these tables on a regular basis, and from there it becomes pretty much an SPF tree through AS's. I do concur, though, that ping and traceroute are probably more sensible metrics to use.
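If you did manage to stitch the route-server views into a single AS-level graph, the "SPF tree through AS's" step itself is simple; a sketch with an invented graph, treating every inter-AS edge as cost 1:

# Sketch: AS-hop distance via breadth-first search over an AS adjacency
# graph (every inter-AS edge treated as cost 1).  The graph is invented.
from collections import deque

def as_hops(graph, src, dst):
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None                       # no path visible in the views we have

graph = {701: {1239, 174}, 1239: {701, 3561}, 174: {701, 3356},
         3561: {1239}, 3356: {174}}
print(as_hops(graph, 701, 3561))      # -> 2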
1) Perform the query.
2) If your query returns multiple places to get the same page:
   a) look at the AS_PATH for the querying IP address
   b) look at the AS_PATHs for the found pages
   c) determine and return the "closest" one - perhaps the one whose AS_PATH is most like that of the querying host.
(c) is full of landmines thanks to such nifty things as aggregation, the single-view propagation feature, deliberately non-unique addresses and change and instability of intermediate topology from moment to moment.
Agreed.
Anybody out there have any spare venture capital? :)
Since you are trying to get it to work correctly with an addressing scheme which only very weakly encodes topological information, the lossy DV approach to propagating routing information (as opposed to a map-exchanging scheme), and three huge churny databases (the mapping of information to URL, the mapping of hostname to IP address, and the mapping of IP addresses to paths), and since you are also attempting to come up with a non-existent database or workable heuristics (the mapping of n observed paths to a graph of connectivity among m endpoints), I would say that you need the level of funding you could only raise from such a lucrative business as the Psychic Friends Network.
Just for the record, I *was* kidding. I don't actually think I have the time or expertise to make it work. However, I think the idea is worth looking at. Two of the "three huge churny databases" (info to URL, URL to IP) are already in place, and I bet the overhead involved in an un-cached IP lookup is a lot more than that of an SPF walk through a BGP tree.
Meanwhile, I suggest you look at Dave Clark's distributed database work (I think I remember Van Jacobson commenting in more detail than his "How to Kill the Internet" viewgraphs on how to apply this to the WWW) and consider that, rather than a database which centralizes searches over a weak data architecture, a better development direction would be a better architecture and a scheme which treats every reference into it as a search for the most local copy.
Will do, thanks.
Sean.
eric
map-exchanging scheme), three huge churny databases (the mapping of information to URL, the mapping of hostname to IP address and the mapping of IP addresses to paths)
Just for the record, I *was* kidding. I don't actually think I have the time or expertise to make it work. However, I think the idea is worth looking at. Two of the "three huge churny databases" (info to URL, URL to IP) are already in place, and I bet the overhead involved in an un-cached IP lookup is a lot more than that of an SPF walk through a BGP tree.
Some people already have a database of IP address to path that is used to more effectively route traffic from caching proxy hierarchies over multiple international connections.

********************************************************
Michael Dillon                   voice: +1-650-482-2840
Senior Systems Architect           fax: +1-650-482-2844
PRIORI NETWORKS, INC.             http://www.priori.net
        "The People You Know.  The People You Trust."
********************************************************
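A toy version of such an address-to-path database, assuming nothing fancier than longest-prefix match over a few invented prefixes, each mapped to a preferred outbound link:

# Sketch: a toy IP-address-to-path database.  Longest-prefix match over a
# few invented prefixes, each mapped to a preferred outbound link.
import ipaddress

PATH_DB = {
    ipaddress.ip_network("192.0.2.0/24"):    "link-1 (domestic peer)",
    ipaddress.ip_network("198.51.100.0/24"): "link-2 (trans-Atlantic)",
    ipaddress.ip_network("0.0.0.0/0"):       "default transit",
}

def best_path(addr):
    """Most specific covering prefix wins."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in PATH_DB if ip in net]
    return PATH_DB[max(matches, key=lambda net: net.prefixlen)]

print(best_path("198.51.100.7"))     # -> link-2 (trans-Atlantic)
print(best_path("203.0.113.9"))      # -> default transit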
Since you are trying to get it to work correctly with an addressing scheme which only very weakly encodes topological information, the lossy DV approach to propagating routing information (as opposed to a map-exchanging scheme),
DV = distance vector; BGP is a path-vector variant of the idea, carrying AS paths rather than a full topology map. For more info on maps, check out the big-internet archives for April: ftp://munnari.oz.au/big-internet/list-archive/1997-04-Apr
Note that since this seems to be possible through feature accretion upon the current practice of aggressive interception of WWW queries,
This is essentially what a Squid http proxy cache does. And now companies like mirror-image and Cisco are coming out with transparent proxy caches that intercept port 80 traffic, so this technology is likely to become more widespread even in North America. If only those vendors would make their software compatible with Squid's parent/sibling protocols for sharing cache contents, then it would be even easier to offload a significant amount of web traffic onto caching proxies.

Michael Dillon
PRIORI NETWORKS, INC.
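The parent/sibling idea reduces to a simple control flow; here is a sketch of the logic only (not Squid's actual ICP wire format): ask each sibling whether it already holds the object, take the first hit, otherwise fall through to the parent.

# Sketch of the control flow behind Squid-style parent/sibling hierarchies:
# ask siblings whether they already hold an object, take the first HIT,
# otherwise fall through to the parent (or origin).  This is not Squid's
# actual ICP wire protocol, only the idea.

def sibling_has(sibling, url):
    """Stand-in for an ICP query; pretend each sibling knows its contents."""
    return url in sibling["cache"]

def fetch(url, siblings, parent):
    for sibling in siblings:
        if sibling_has(sibling, url):
            return "HIT from sibling %s" % sibling["name"]
    return "MISS, fetched via parent %s" % parent

siblings = [
    {"name": "cache-1.example", "cache": {"http://example.com/a"}},
    {"name": "cache-2.example", "cache": set()},
]
print(fetch("http://example.com/a", siblings, "parent.example"))
print(fetch("http://example.com/b", siblings, "parent.example"))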