RE: Is anyone actually USING IP QoS?
While this thread is slowly drifting, I disagree with your assertion that so much of the web traffic is cacheable. NLANR's caching effort, if I remember correctly, only got around a 60% hit rate in the cache, pooled over a large number of clients; that is probably close to the real percentage of cacheable content on the net. If anything, the net is moving to be *more* dynamic. The problem is that web sites are putting unrealistic expires on images and html files because they're being driven by ad revenues. I doubt that any of the US-based commercial websites are interested in losing the entries in their hit logs. Caching is the type of thing that is totally broken by session-IDs (sites like amazon.com and cdnow).

The only way caching is going to truly be viable in the next 5 years is either a commercial company stepping in and working with commercial content providers (which is happening now), or webserver software vendors working with content companies on truly embracing a hit reporting protocol. So basically, my assertion is that L4 caching on any protocol will not work if the content provider is given any control of TTL and metrics.

The only way web caching *really* works is when people get aggressive and ignore the expire tags, approaching it from a network administrator's point of view rather than a content company's. From what I remember, that was the only way some Australian ISPs were able to make very aggressive caching work for them. Further, the more you rely on L4 implementations for caching, the more it seems you would be open to broken implementations... although that is a broad statement...

-jamie@networked.org
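To make the session-ID point concrete, here is a minimal sketch (Python, purely illustrative; the store URL and the "sessionid" parameter name are made up) of how a per-user token in the URL turns one cacheable page into a different cache key for every client, and how a cache aggressive enough to strip it gets its hits back:

# Hypothetical sketch, not any particular proxy's code: why per-user session
# IDs in URLs defeat a shared cache, and what a cache that normalizes them
# away would see instead.

from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def strip_session(url, ignored=("sessionid",)):
    """Drop per-user query parameters before using the URL as a cache key."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in ignored]
    return urlunparse(parts._replace(query=urlencode(kept)))

def hit_rate(requests, key_fn):
    """Infinite cache: first request for a key is a miss, repeats are hits."""
    cache, hits = set(), 0
    for url in requests:
        key = key_fn(url)
        if key in cache:
            hits += 1
        else:
            cache.add(key)
    return hits / len(requests)

# 100 users all fetching the same product page, each with a unique session ID.
requests = ["http://store.example/item/42?sessionid=u%d" % n for n in range(100)]

print("keyed on the full URL: %.0f%% hits" % (100 * hit_rate(requests, lambda u: u)))
print("session ID stripped:   %.0f%% hits" % (100 * hit_rate(requests, strip_session)))

Whether stripping a given parameter is safe is exactly the judgement call that puts the network administrator at odds with the content provider.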
-----Original Message-----
From: Vadim Antonov [SMTP:avg@kotovnik.com]
Sent: Tuesday, June 15, 1999 4:23 PM
To: Brett_Watson@enron.net; nanog@merit.edu
Subject: Re: Is anyone actually USING IP QoS?
99% of Web content is write-once. It does not need any fancy management. The remaining 1% can be delivered end-to-end.
(BTW, i do consider intelligent cache-synchronization development efforts seriously misguided; there's a much simpler and much more scalable solution to the cache performance problem. If someone wants to invest, i'd like to talk about it :)
even if i assume caching is as efficient, or more so, than multicast, i'm still just trading one set of security/scalability concerns for others. caching is no more a silver bullet than multicast.
It is not that caching is a silver bullet, it is rather that multicasting is unusable at a large scale.
i won't deny the potential scalability problems but i think you're generalizing/oversimplifying to say caching just works and has no security or scalability concerns.
Well, philosophical note: science is _all_ about generalizing. To an inventor of a perpetuum mobile, a modern physicist's flat refusal to look into the details while asserting that it will not work surely looks like oversimplification. After all, the details of the actual construction surely are a lot more complex than the second law of thermodynamics.
In this case, i just do not care to go into details of implementations. The L2/L3 mcasting is not scalable and _cannot be made_ scalable for reasons having nothing to do with deficiencies of protocols.
Caching algorithms do not have similar limitations, solely because they do not rely on distributed computations. So they have a chance of working. Of course, nothing "just works".
--vadim
PS To those who point out that provider ABC already sells mcast service: there's an old saying at NASA that with enough thrust even pigs can fly. However, no reactively propelled hog is likely to make it into orbit all on its own.
Jamie writes:
While this thread is slowly drifting, I disagree with your assertion that so much of the web traffic is cacheable. NLANR's caching effort, if I remember correctly, only got around a 60% hit rate in the cache, pooled over a large number of clients; that is probably close to the real percentage of cacheable content on the net. If anything, the net is moving to be *more* dynamic. The problem is that web sites are putting unrealistic expires on images and html files because they're being driven by ad revenues. I doubt that any of the US-based commercial websites are interested in losing the entries in their hit logs. Caching is the type of thing that is totally broken by session-IDs (sites like amazon.com and cdnow).
The only way caching is going to truly be viable in the next 5 years is either a commercial company stepping in and working with commercial content providers (which is happening now), or webserver software vendors working with content companies on truly embracing a hit reporting protocol.
The workshop results from the last IRCACHE workshop have some interesting data on hit rates in a variety of caches (http://workshop.ircache.net/ for the main program). In general, it is even worse than you assert; it is often as bad as a 40 percent hit rate, even for a cache serving a large number of users. There has, however, been a fair amount of work to determine which algorithms for cache replacement are effective; John Dilley, in particular, has implemented several for Squid (the IRCACHE group's example cache engine).

Like Jamie, I tend to believe that the current caching paradigm is broken. It relies on a community of users having sufficiently similar patterns of use to populate a cache with resources which will be re-used; in most cases, that doesn't happen often enough to make it worth it, except in instances where the resources are very expensive to get (trans-oceanic links, etc.) or where the cache and the aggregated user community are very large indeed.

At a BOF at the last IRCACHE workshop, a group of us discussed the idea of creating a caching system that acts on behalf of the content providers rather than the user (an outward-facing "surrogate" instead of an inward-facing "proxy"). This paradigm relies on the fairly well documented phenomenon of "flash crowds" or "cnn events" to presume that the users accessing a particular content provider will tend to have a high overlap for short time intervals. This reflects my experience as a NASA web guy, as well as the experience of some of the web hosting providers in the room at the time. You won't always get the high overlap rates of a CNN event, of course, but it seems worth checking to see if we can get better than the rates for proxy caches. Surrogates have their own problems, of course, but they do solve some of the traditional proxy issues like hit metering and authentication (since the surrogate operator has a prior business relationship with the content provider).

This discussion and work continues on a mailing list "surrogates@equinix.com" (majordomo syntax to the -request address). The URL of the original BOF info is http://workshop.ircache.net/BOFs/bof2.html, for those who are interested.

regards,
Ted Hardie
Equinix
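To illustrate the overlap argument, here is a toy sketch (Python; the request distributions below are invented for illustration, not drawn from the IRCACHE workshop data) comparing the hit rate an inward-facing proxy sees when its users' requests are spread over the whole web with what an outward-facing surrogate sees during a flash crowd concentrated on one provider's pages:

# Toy comparison with made-up request distributions: a forward proxy whose
# users' interests barely overlap vs. a content-provider-side surrogate
# during a "flash crowd" where most clients want the same few objects.

import random

def hit_rate(requests):
    """Infinite cache: first request for an object is a miss, repeats are hits."""
    cache, hits = set(), 0
    for url in requests:
        if url in cache:
            hits += 1
        else:
            cache.add(url)
    return hits / len(requests)

random.seed(1)

# Forward proxy: 1000 requests spread thinly over a large, diverse URL space.
proxy_reqs = ["site%d/page%d" % (random.randrange(500), random.randrange(200))
              for _ in range(1000)]

# Surrogate during a flash crowd: 1000 requests for ~20 hot objects
# from a single provider.
surrogate_reqs = ["provider/page%d" % random.randrange(20) for _ in range(1000)]

print("forward proxy hit rate:     %.0f%%" % (100 * hit_rate(proxy_reqs)))
print("flash-crowd surrogate rate: %.0f%%" % (100 * hit_rate(surrogate_reqs)))

Even with an infinite cache and no expiry, the proxy's hit rate stays near zero simply because its users rarely ask for the same thing twice, while the surrogate's approaches 100% because the overlap is in the workload, not in the cache policy.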
I quite agree - about 60-80% of Web content is cacheable. But there is a great difference - the Web consists of a lot of small pieces, while multimedia stores usually consist of a small number of big pieces. See the difference. And then, caching is the second step, after replication. And the third step is multicasting.
participants (3)
- Alex P. Rudnev
- hardie@equinix.com
- Jamie Scheinblum