The cache needs to be big enough that it has a thrashy part that is churning all the time. Those are the records that go into the cache and then die without ever being queried again. If the problem is that there's some other record in there that might be queried again, but that isn't queried often enough to stay alive — say, a host that comes back every two hours against a one-hour TTL, which misses every time anyway — then the additional cost of the occasional recursive lookup is just not that big a deal.
A big part of the problem is that TTLs reflect the server's estimate of how long it might be until the data changes, not the client's likelihood of reusing the data. I publish a BL (DNS blocklist) of hosts in Korea. The data changes maybe once a month, so from my point of view a large TTL would make sense, but most of the hits are for bots spraying spam at random, so the client is not likely to reuse the data even if it sticks around for many hours.

A plausible alternative to a partitioned cache might be TTL capping: some way to tell your cache that no matter what the TTL is on data from a range of names, don't keep it for more than a few seconds.

Another thing that would be useful for figuring out the BL cache question would be traces of incoming IP addresses and timestamps from mail servers larger than my tiny system, so I could try out some cache models; there's a sketch of what I mean in the PS below.

R's, John
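PS: Here's the sort of toy cache model I mean, as a minimal sketch in Python. Everything in it is an assumption, not anyone's actual resolver: the trace format (one "unix-timestamp ip" pair per line), the plain LRU-with-TTL cache, and the made-up size and TTL numbers. For what it's worth, Unbound's cache-max-ttl and BIND's max-cache-ttl already cap TTLs this way, but globally rather than for a range of names, so the ttl_cap knob below stands in for the per-range version.

import sys
from collections import OrderedDict

def simulate(trace, cache_size, ttl, ttl_cap=None):
    # Replay (timestamp, key) pairs through a fixed-size LRU cache
    # with TTL expiry; returns (hits, misses).
    # ttl_cap stands in for hypothetical per-name-range TTL capping.
    effective_ttl = min(ttl, ttl_cap) if ttl_cap is not None else ttl
    cache = OrderedDict()   # key -> expiry time, kept in LRU order
    hits = misses = 0
    for now, key in trace:
        expiry = cache.get(key)
        if expiry is not None and expiry > now:
            hits += 1
            cache.move_to_end(key)          # refresh LRU position
        else:
            misses += 1                     # absent or expired: recursive lookup
            cache[key] = now + effective_ttl
            cache.move_to_end(key)
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used;
                                            # stale entries age out this way too
    return hits, misses

def read_trace(f):
    # Each line: "<unix timestamp> <querying IP>"
    for line in f:
        ts, ip = line.split()
        yield float(ts), ip

if __name__ == "__main__":
    hits, misses = simulate(read_trace(sys.stdin),
                            cache_size=100_000, ttl=3600, ttl_cap=None)
    total = hits + misses
    if total:
        print(f"{hits} hits, {misses} misses, {hits / total:.1%} hit rate")

Run a trace through it once with ttl_cap=None and once with ttl_cap=5 or so; if the miss count barely moves, then capping BL answers at a few seconds costs almost nothing, which is the point.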