Re: Transparent Caching Solutions - Information Wanted
Interesting discussion - not many comments from ISPs currently using caching regarding the effectiveness - cache hit rates or bandwidth "saved". Any strategies for populating caches (in addition to user hits)?

Bill

At 12:08 PM 2/3/98 -0800, you wrote:
Or, if you are a "Cheap Bastard (tm)", you can take a copy of Squid (http://squid.nlanr.net), an OS which can handle transparent proxying (Solaris, Linux, etc), and a high-performance RAID, and have yourself the equivalent. Squid is a hierarchical web cache/proxy/accelerator available for free.
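For reference, the kernel-level redirection being alluded to looks roughly like this on Linux; the interface name and ports below are illustrative, and 1998-era kernels used ipfwadm/ipchains rather than the later iptables form shown here:

```shell
# Intercept outbound HTTP from clients on the inside interface (eth1,
# illustrative) and hand it to a local Squid listening on port 3128.
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
         -j REDIRECT --to-port 3128
```

Squid then needs to be told that its port receives intercepted traffic (`http_port 3128 intercept` in current releases). This is a configuration sketch under those assumptions, not a tested recipe.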
i love squid. i use it as a nontransparent proxy and also as the web server accelerator for all of the content we publish. squid is really cool, especially squid-novm.
i worry a bit about the supposed proxy capabilities of these other os's. we don't think of ourselves as idiots here when it comes to kernel programming, and so the fact that it's taken a year of wizard level kernel muckraking to get a system that can do thousands of simultaneous transparent sessions makes me think that it's actually a hard problem.
we got basic kernel transparency working in an afternoon. the devil is in the details though.
U-NET Limited - Global Transit Division Transit for ISPs from PAIX, MaNAP or Telehouse London mailto:world@u-net.net http://world.u-net.net Tel 44 1925 484444 Fax 44 1925 484466
On Wed, 4 Feb 1998 at around 09:49:53, "BU" == Bill Unsworth penned:
BU> Interesting discussion - not many comments from ISPs currently using
BU> caching regarding the effectiveness - cache hit rates or bandwidth "saved".
BU> Any strategies for populating caches (in addition to user hits)?

Take the stats from some of the large (top-level, I should say) caches for the most popular sites and use wget over them after setting the http_proxy env var. NLANR publish the logfiles from their parent caches.

BU> Bill

Cheers,
Lyndon

--
Penis Envy is a total Phallusy.
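That recipe can be sketched as a short shell pipeline. The log lines and hostnames below are illustrative stand-ins for a real NLANR/Squid access.log (whose seventh whitespace-separated field is the requested URL in Squid's native format):

```shell
#!/bin/sh
# Tiny illustrative access log standing in for a real top-level cache log.
cat > access.log <<'EOF'
886543200.101 212 10.1.2.3 TCP_MISS/200 14382 GET http://www.netscape.com/ - DIRECT/www.netscape.com text/html
886543201.245 98 10.1.2.4 TCP_HIT/200 14382 GET http://www.netscape.com/ - NONE/- text/html
886543202.310 150 10.1.2.5 TCP_MISS/200 6120 GET http://www.yahoo.com/ - DIRECT/www.yahoo.com text/html
886543203.422 101 10.1.2.3 TCP_HIT/200 14382 GET http://www.netscape.com/ - NONE/- text/html
886543204.518 187 10.1.2.6 TCP_MISS/200 8804 GET http://www.altavista.digital.com/ - DIRECT/www.altavista.digital.com text/html
886543205.604 110 10.1.2.4 TCP_HIT/200 6120 GET http://www.yahoo.com/ - NONE/- text/html
EOF

# Rank URLs by request count and keep the most popular ones.
awk '{ print $7 }' access.log | sort | uniq -c | sort -rn \
    | awk '{ print $2 }' | head -3 > top-urls.txt

# To actually populate the cache, pull the list through the proxy, e.g.:
#   http_proxy=http://localhost:3128/ wget -q -O /dev/null -i top-urls.txt
cat top-urls.txt   # most popular first
```

The proxy address and list length are assumptions; the point is only that popularity data from a bigger cache can drive the prefetch.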
> Any strategies for populating caches (in addition to user hits)?
caches should be populated with objects that future users are probably going to want to fetch. usually caches use past popularity as an indicator of future popularity, where "past popularity" is some combination of the recency of the last fetch, the number of times an object has been fetched, and the number of different users who did the fetching (if indeed you can tell them apart; client IP address is not as useful as we thought).

folks who have preloaded caches with data that they _thought_ was going to be popular, or which actually was popular _last_week_ when they made the tape, have lost horribly. the thing you need is knowledge about what objects are popular among your own and similar user communities.

that's what multilevel caching does for you: if you have a secondary cache of some kind, and all cache misses from a moderate to large number of primary caches get pulled through or by the secondary cache, then the chance of a future primary cache miss being turned into a secondary cache hit goes up by a lot. multilevel caching also has the benefit of keeping N primary caches in the same region from all having to go fetch from an origin server, and the secondary caches usually have more disk space on them. but the big advantage is sharing of object interest factor among a lot of different primary caches.
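In Squid's own terms, the multilevel arrangement described above is a parent `cache_peer` relationship; the hostname and the choice to force all misses through the parent are illustrative assumptions, not anyone's actual configuration:

```
# squid.conf fragment on a primary (child) cache -- hostname illustrative.
# 3128 is the peer's HTTP port, 3130 its ICP port.
cache_peer secondary.example.net parent 3128 3130

# send misses through the parent instead of straight to origin servers,
# so every primary's misses seed the shared secondary cache
never_direct allow all
```

With several primaries pointed at the same parent, an object fetched once by any of them becomes a hit for all the others, which is exactly the "sharing of object interest" effect described above.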
participants (3)

- Bill Unsworth
- Lyndon Levesley
- Paul A Vixie