On Sun, Aug 19, 2012 at 5:37 PM, Mark Andrews <marka@isc.org> wrote:
> As for the original problem. LRU replacement will keep "hot" items in the cache unless it is seriously undersized.
Maybe.

This discussion is reminiscent of the Linux swappiness debate. Early in the 2.x series of Linux kernels, the guy responsible for the virtual memory manager changed it to allow the disk cache to push program code and data out of RAM if all other disk cache had been touched more recently than the program data. Previously, the disk cache would only consume otherwise free memory; programs would only get pushed to swap by memory demands from other programs.

The users went ape. Suddenly, if you copied a bunch of data from one disk to another, your machine would be sluggish and choppy for minutes or hours afterward as programs recovered swapped pages from disk and ran just long enough to hit the next section needing to be recovered from swap. Some folks ditched swap entirely to get around the problem.

The guy insisted the users were wrong. He had the benchmarks, meticulously collected data and careful analysis to prove that the machines were more efficient with pure LRU swap. The math said he was right. 2+2=4. But it didn't. In the very common copy-a-bunch-of-files case, simple LRU expiration of memory pages was the wrong answer: it caused the machine to behave badly. More work was required until a tuned and weighted LRU algorithm solved the problem.

Whether John's solution of limiting the cache by zone subtree is useful or not, he started an interesting discussion. Consider, for example, what happens when you ask for www.google.com. You get a 7-day CNAME record pointing to a 5-minute www.l.google.com A record, and the resolver gets 2-day google.com NS records naming ns1.google.com, 4-day A records for ns1.google.com, 2-day com NS records naming a.gtld-servers.net, etc. Those authority records don't get touched again until www.l.google.com expires. With a hypothetical, simple least recently used (LRU) algorithm, the 4-minute-old A record for ns1.google.com was last touched longer ago than the 3-minute-old A record for 5.6.7.8.rbl.antispam.com.
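To make the pure LRU case concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the cache counts records rather than bytes, ignores TTLs, holds just five entries, and the minute-by-minute timing is made up. `LRUCache`, `touch` and `lru_order` are hypothetical names, not any resolver's API; the record names come from the example above.

```python
from collections import OrderedDict


class LRUCache:
    """Plain LRU: when full, evict whichever entry was used least recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used first

    def touch(self, key):
        """Look up or insert `key`, marking it most recently used.

        Returns the list of keys evicted to make room (empty on a hit).
        """
        if key in self.entries:
            self.entries.move_to_end(key)
            return []
        evicted = []
        while len(self.entries) >= self.capacity:
            victim, _ = self.entries.popitem(last=False)
            evicted.append(victim)
        self.entries[key] = True  # record data omitted in this toy model
        return evicted

    def lru_order(self):
        """Keys from next victim to most recently used."""
        return list(self.entries)


# Hypothetical cache sized for five records.
cache = LRUCache(capacity=5)

# Minute 0: resolving www.google.com touches the lookup chain.
for key in [("google.com", "NS"), ("ns1.google.com", "A"),
            ("www.google.com", "CNAME"), ("www.l.google.com", "A")]:
    cache.touch(key)

# Minute 1: a DNSBL query.
cache.touch(("5.6.7.8.rbl.antispam.com", "A"))

# Minute 2: the browser asks for www.google.com again. Only the answer
# records are hit; the authority records are not touched.
cache.touch(("www.google.com", "CNAME"))
cache.touch(("www.l.google.com", "A"))

# Minute 4: another DNSBL query needs room.
evicted = cache.touch(("4.3.2.1.rbl.antispam.com", "A"))

print("evicted:", evicted)
print("next victims:", cache.lru_order()[:2])
```

The google.com NS record goes first, and the ns1.google.com A record is next in line, ahead of the more recently touched 5.6.7.8.rbl.antispam.com record.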
So when the resolver needs more cache for 4.3.2.1.rbl.antispam.com, which record gets kicked? Then, of course, when www.l.google.com expires after five minutes, the entire chain has to be refetched because ns1.google.com was already LRU'd out of the cache. This is distinctly slower than just refetching www.l.google.com from the already known address of ns1.google.com, and the user sees a slight pause in their web browser while it happens.

Would a smarter, weighted LRU algorithm work better here? Something where rarely used leaf data doesn't tend to push out equally rarely used but much more important data from the lookup chain?

Regards,
Bill Herrin

-- 
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
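One possible shape for the weighted LRU asked about above, sketched in Python purely as an assumption (it is not John's zone-subtree limit, the tuned Linux algorithm, or any resolver's actual policy): score each entry by its idle time divided by a per-record weight and evict the highest score. `WeightedLRUCache`, `weight_of`, the `INFRASTRUCTURE` set and the weight of 4 are all hypothetical; a real resolver would have to work out for itself which records are delegation data. The lookup sequence replays the www.google.com / rbl.antispam.com example with made-up minute timestamps.

```python
class WeightedLRUCache:
    """Evict the entry with the largest (idle time / weight).

    A weight of 1 behaves like plain LRU; a higher weight lets an entry sit
    idle proportionally longer before it becomes the preferred victim.
    """

    def __init__(self, capacity, weight_of):
        self.capacity = capacity
        self.weight_of = weight_of  # callable: key -> weight (>= 1)
        self.last_touch = {}  # key -> time of most recent use

    def touch(self, key, now):
        """Record a use of `key` at time `now`; return keys evicted."""
        evicted = []
        if key not in self.last_touch:
            while len(self.last_touch) >= self.capacity:
                victim = max(
                    self.last_touch,
                    key=lambda k: (now - self.last_touch[k]) / self.weight_of(k),
                )
                del self.last_touch[victim]
                evicted.append(victim)
        self.last_touch[key] = now
        return evicted


# Hypothetical weights: delegation data (NS records and nameserver
# addresses) counts four times as much as ordinary leaf answers.
INFRASTRUCTURE = {("google.com", "NS"), ("ns1.google.com", "A")}


def weight_of(key):
    return 4.0 if key in INFRASTRUCTURE else 1.0


cache = WeightedLRUCache(capacity=5, weight_of=weight_of)
for key in [("google.com", "NS"), ("ns1.google.com", "A"),
            ("www.google.com", "CNAME"), ("www.l.google.com", "A")]:
    cache.touch(key, now=0)  # minute 0: resolve www.google.com
cache.touch(("5.6.7.8.rbl.antispam.com", "A"), now=1)
cache.touch(("www.google.com", "CNAME"), now=2)  # browser re-asks
cache.touch(("www.l.google.com", "A"), now=2)
evicted = cache.touch(("4.3.2.1.rbl.antispam.com", "A"), now=4)

print("evicted:", evicted)
```

At minute 4 the ns1.google.com A record has been idle 4 minutes but scores 4/4 = 1, while the 5.6.7.8.rbl.antispam.com record scores 3/1 = 3, so the leaf record goes instead. The weighting only delays eviction: authority records left idle long enough still get pushed out. The linear scan in `touch` is fine for a sketch; a real cache would need a cheaper structure.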