All of these problems are solvable using common, well-known techniques.

Daniel Senie <dts@senie.com> writes:
It might be worth thinking about the problem from the other end. From a web site owner's perspective, caching is a major annoyance. Here are the arguments you may encounter from a web site owner or web developer:
1. It interferes with content in many cases (web site visitors may see cached pages instead of current content). I know cache products claim this doesn't happen, but it has, and often.
Most (all?) reasonable caching products will honor whatever expiration information you put in the page, such as the Cache-Control header and Expires header. Where I've made careful use of these, I've never had problems with stale content, even from browser caches.
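As a rough illustration of the expiration information mentioned above, here is a minimal Python sketch that builds a matching pair of Cache-Control and Expires headers. The function name `freshness_headers` and its parameters are made up for this example, and the one-hour lifetime is an arbitrary assumption, not a recommendation.

```python
from email.utils import formatdate


def freshness_headers(max_age_seconds, now):
    """Build Cache-Control and Expires headers that agree with each other.

    `now` is a Unix timestamp; passing it in keeps the function easy to test.
    Both the function name and its arguments are hypothetical.
    """
    return {
        # HTTP/1.1 caches read max-age (a lifetime in seconds).
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Older HTTP/1.0 caches read Expires (an absolute HTTP date in GMT).
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


if __name__ == "__main__":
    import time

    # Example: let caches keep this response for one hour (assumed value).
    for name, value in freshness_headers(3600, time.time()).items():
        print(f"{name}: {value}")
```

Sending both headers is a belt-and-braces choice: a cache that understands Cache-Control uses max-age, and one that does not can fall back to Expires.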
2. The website owner loses information on how many visitors are coming to the site.
A common technique for counting Web page hits is to embed a small image in the page with <img src> (the image itself marked uncacheable) and count requests for that image. Another is to make the page itself uncacheable while leaving the images (which account for most of the load time) cacheable. Having the page itself be dynamic and uncacheable, while the images can always be cached, can be a big win all around; dynamic images are fairly rare (except from MRTG. :) )
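The split described above (uncacheable page, cacheable images) can be sketched in Python roughly as follows. Everything here is an assumption for illustration: the names `cache_policy`, `handle_request`, and `page_hits`, the list of image extensions, and the one-day image lifetime. A real server would set these headers in its own configuration.

```python
import posixpath
from collections import Counter

# Hypothetical list of extensions treated as static images.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}

# Hits per page, counted only for requests that reach the origin server.
page_hits = Counter()


def is_image(path):
    """Return True if the URL path looks like a static image."""
    return posixpath.splitext(path)[1].lower() in IMAGE_EXTENSIONS


def cache_policy(path):
    """Pick response headers: images may be cached, pages may not."""
    if is_image(path):
        # Assumed lifetime of one day; images carry most of the bytes.
        return {"Cache-Control": "public, max-age=86400"}
    # "no-cache" makes a cache check back with the origin before reuse;
    # "Expires: 0" is treated as already expired by older caches.
    return {"Cache-Control": "no-cache", "Expires": "0"}


def handle_request(path):
    """Count page views at the origin and return the headers to send."""
    if not is_image(path):
        page_hits[path] += 1
    return cache_policy(path)
```

Because every page view still comes back to the origin, `page_hits` stays accurate even when the images are served from a cache somewhere upstream.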
3. The website owner loses the demographics on where visitors are coming from, and especially the number of unique visitors. (It's not helpful to know that one cache engine visited, if that cache engine equated to 10,000 visits in an hour).
You can use the X-Forwarded-For header that many caches provide to gather this same information. In the future, you may be able to use the protocol described in RFC 2227 to get more detailed information.
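A minimal Python sketch of the X-Forwarded-For idea, assuming the common convention that the header is a comma-separated list with the original client first. The function names and the sample addresses are hypothetical, and since the header is supplied by whoever sends the request, the result is a hint for statistics rather than something to trust for access control.

```python
def original_client(x_forwarded_for, peer_address):
    """Best guess at the visitor's address for one request.

    x_forwarded_for: the raw header value, or None if it was absent.
    peer_address: the address that actually connected (often a cache).
    """
    if not x_forwarded_for:
        return peer_address
    # By convention the leftmost entry is the original client.
    first = x_forwarded_for.split(",")[0].strip()
    return first or peer_address


def unique_visitors(requests):
    """Return the set of distinct visitor addresses.

    `requests` is an iterable of (x_forwarded_for, peer_address) pairs.
    """
    return {original_client(xff, peer) for xff, peer in requests}


if __name__ == "__main__":
    # Made-up log entries: three requests arrive via the same cache.
    log = [
        ("10.0.0.1, 192.0.2.1", "192.0.2.1"),
        ("10.0.0.2", "192.0.2.1"),
        (None, "198.51.100.7"),
        ("10.0.0.1", "192.0.2.1"),
    ]
    print(len(unique_visitors(log)), "unique visitors")
```

Without the header, the three requests through the cache at 192.0.2.1 would look like a single visitor; with it, they resolve to two distinct clients.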
4. Banner advertising may or may not display properly when caching is involved, thereby costing the website money.
I've never experienced this; I've been viewing the Web through a cache or a hierarchy of caches for 2 years now, and I've never noticed anything weird with banner ads. At least nothing an "Expires: 0" wouldn't solve.
5. There's NOTHING in it for the website owner, other than the possibility that SOME pages might display faster for SOME users.
If folks running networks really think website designers and owners should care about caching, then there needs to be some sort of benefit (perhaps paid in dollars) to those affected. Otherwise, there's little reason for them to care.
I don't understand this. Having Web pages that are effectively cached around the world significantly reduces the load on your servers (especially as more and more ISPs start to cache) and saves you significant bandwidth. This lets you buy fewer servers for your farm and less upstream bandwidth. Right now, having a cache-friendly site can save you money in a big way, while also saving ISPs money, making your pages display the way you want (since the ISPs are already deploying caching, whether your pages are friendly to it or not), and making pages load faster for quite a few users. How is that not a benefit? How is that not paid in dollars?

In the future, if Web server operators took effective cache performance (while maintaining correct display) into account when configuring their servers, and made sure that page designers did the same, caches could become more ubiquitous, which would push people to set up large-scale cache hierarchies. It could get to the point where all of the non-dynamic content from an infinitely large Website could be served by an old desktop computer over a 28.8 modem, since it would just have to send its content once to the UUNet cache, once to the MCI/Worldcom cache, once to the Sprint cache, etc. Of course, that's still a ways off. :)

Just my 2 cents,
------ScottG.