Jamie Scheinblum <jamie@fast.net> wrote:
While this thread is slowly drifting, I disagree with your assertion that so much of the web traffic is cacheable. (NLANR's caching effort, if I remember, only got around a 60% cache hit rate, pooled over a large number of clients. That is probably the correct percentage of cacheable content on the net.)
Well, there is a big difference between being cacheable and being delivered more than once to the same locale during some period of time (a.k.a. a cache hit). A lot of things are cacheable but never result in a cache hit simply because they are in low demand. Actually, a 60% hit rate is rather impressive; it is much better than you can get if you restrict the time window for content replication to a few milliseconds. (In the multicasting world, the equivalent of a cache hit is a packet being duplicated.)
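The cacheable-vs-hit distinction is easy to see in a toy simulation (a sketch with made-up parameters: an unbounded cache and a Zipf-like popularity distribution). Every object below is cacheable, yet only the popular ones ever produce hits:

```python
import random
from collections import Counter

random.seed(0)

N_OBJECTS = 1000
N_REQUESTS = 10000

# Zipf-like popularity: object i is requested with weight 1/(i+1).
weights = [1.0 / (i + 1) for i in range(N_OBJECTS)]
requests = random.choices(range(N_OBJECTS), weights=weights, k=N_REQUESTS)

seen = set()
hits = 0
for obj in requests:
    if obj in seen:
        hits += 1        # already cached: a hit
    else:
        seen.add(obj)    # cacheable and now stored, but no hit yet

counts = Counter(requests)
never_hit = sum(1 for c in counts.values() if c == 1)

print(f"hit rate: {hits / N_REQUESTS:.0%}")
print(f"objects cached but never hit: {never_hit}")
```

Objects requested exactly once are stored but never hit; the aggregate hit rate stays high because demand concentrates on a few popular objects.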
If anything, the net is moving to be *more* dynamic.
Unlikely. Transport is becoming cheaper; content production is not. This means that the same content is likely to be viewed by a larger audience.
The problem is that web sites are putting unrealistic expires on images and html files because they're being driven by ad revenues.
The big thing byte-wise is images. The ad agencies could do much better (and annoy people much less) if they provided cacheable images (probably embedded in non-cacheable HTML pages a few dozen bytes long, to ensure accountability). If they won't cooperate, at some point the disparity between the delivery speed of cached content (which does not stress source servers) and that of centrally-served ads will become large enough for people to start using ad filters (several are already available) to get significant perceived performance improvements. It is in the best interest of advertisers to make sure ads are delivered fast. Of course, the caches may help by providing adequate reporting, when requested.
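The proposed split could look roughly like this (a sketch only; the hostnames and paths are invented, and the Cache-Control header shown alongside Expires is HTTP/1.1 syntax):

```
# Tiny, non-cacheable HTML wrapper -- every view reaches the ad server,
# preserving accountability:
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT

<img src="http://ads.example.com/banner42.gif">

# The heavy bytes -- a cacheable image any intermediate cache can serve:
HTTP/1.1 200 OK
Content-Type: image/gif
Cache-Control: public, max-age=604800
```

The wrapper costs a few dozen bytes per view; the image, the bulk of the traffic, is served from nearby caches.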
The only way caching is going to truly be viable in the next 5 years is either by a commercial company stepping in and working with commercial content providers (which is happening now), or webserver software vendors work with content companies on truly embracing a hit reporting protocol.
So far the driving force behind cache deployment has been ISPs. For them caching represents significant savings and improves perceived quality of service by reducing content retrieval latency. With more and more high-bandwidth content, the edge web servers will run out of gas and become the bottlenecks. (The current bottleneck is still in the packet transport, which is going to be fixed with DWDM and massively parallel routers.) In other words, the real effect of caching is not in bandwidth savings (significant, but not really important) but in spreading popular content, thus removing hot spots and increasing real network performance (which is, btw, measured in user-visible latency, not in bandwidth). Existing caching experiments show as much as a 3x improvement in median file retrieval times.
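The latency argument can be made concrete with a back-of-the-envelope model (all numbers below are invented for illustration, not measurements from any experiment):

```python
# Expected retrieval latency with and without a nearby cache.
# Hypothetical figures, chosen only to illustrate the argument.

hit_rate = 0.6     # fraction of requests served from the cache
t_cache = 0.05     # seconds: nearby cache, a few hops away
t_origin = 0.60    # seconds: distant, possibly overloaded origin server

t_with_cache = hit_rate * t_cache + (1 - hit_rate) * t_origin
t_without = t_origin

print(f"without cache: {t_without:.2f}s")
print(f"with cache:    {t_with_cache:.2f}s")
print(f"speedup:       {t_without / t_with_cache:.1f}x")
```

Even a modest hit rate cuts expected latency by more than half in this model, and the benefit grows as the origin servers become more loaded (i.e., as t_origin rises).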
So basically, my assertion is that L4 caching on any protocol will not work if the content provider is given any control of TTL and metrics.
Well, messing up caches is against the objective interests of the content providers. They may be ignorant of that (some of them are... I talked once with the chief technical guy of one of the largest internet advertisement companies), but they are going to be reminded of the realities when caching becomes ubiquitous and the difference between caching-friendly and caching-averse advertisers becomes apparent to a large audience. And, anyway, advertisers are shifting from eyeball accounting to click-through accounting.
The only way web caching *really* works is when people get aggressive and ignore the expire tags from a network administrator point of view, not a content company's.
That's only because of the lack of a useful reporting protocol. That is a technical issue and can be fixed.
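No such protocol exists in this discussion; as a purely hypothetical sketch, a cache could periodically push aggregated hit counts back to the origin server. The message format and field names below are invented for illustration:

```python
import json
from datetime import datetime, timezone

# Hypothetical hit report a cache might send back to an origin server.
# Every field name here is made up; no real protocol is being described.
def build_hit_report(cache_id, counters):
    """Aggregate per-URL hit counts into a single report message."""
    return json.dumps({
        "cache": cache_id,
        "generated": datetime.now(timezone.utc).isoformat(),
        "hits": [{"url": url, "count": n}
                 for url, n in sorted(counters.items())],
    })

report = build_hit_report(
    "cache1.isp.example",
    {"http://ads.example.com/banner42.gif": 1307},
)
print(report)
```

Such a report would let content providers keep their view counts while still marking the heavy objects cacheable.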
Further, the more you rely on L4 implementations for caching, the more it seems you would be open to broken implementations... Although that is a broad statement...
That is a valid statement. However, there are no fundamental problems. --vadim