DNS caches that support partitioning?
Are there DNS caches that allow you to partition the cache for subtrees of DNS names? That is, you can say that all entries from say, in-addr.arpa, are limited to 20% of the cache. The application I have in mind is to see if it helps to keep DNSBL traffic, which caches poorly, from pushing other stuff out of the cache, but there are doubtless others. R's, John
On Fri, Aug 17, 2012 at 04:13:09PM -0000, John Levine wrote:
The application I have in mind is to see if it helps to keep DNSBL traffic, which caches poorly, from pushing other stuff out of the cache, but there are doubtless others.
If it's getting evicted from cache because other things are getting used more often, why do you want to put your thumb on that scale? The other queries are presumably benefitting just as much from the caching. Best, A -- Andrew Sullivan Dyn Labs asullivan@dyn.com
On Fri, 17 Aug 2012 15:32:11 -0400, Andrew Sullivan said:
On Fri, Aug 17, 2012 at 04:13:09PM -0000, John Levine wrote:
The application I have in mind is to see if it helps to keep DNSBL traffic, which caches poorly, from pushing other stuff out of the cache, but there are doubtless others.
If it's getting evicted from cache because other things are getting used more often, why do you want to put your thumb on that scale? The other queries are presumably benefitting just as much from the caching.
I think John's issue is that he's seeing those other queries *not* benefiting from the caching because they get pushed out by DNSBL queries that will likely not ever be used again. You don't want your cached entry for www.google.com to get pushed out by a lookup for a dialup line somewhere in Africa.
On 08/17/2012 01:32 PM, valdis.kletnieks@vt.edu wrote:
On Fri, 17 Aug 2012 15:32:11 -0400, Andrew Sullivan said:
On Fri, Aug 17, 2012 at 04:13:09PM -0000, John Levine wrote:
The application I have in mind is to see if it helps to keep DNSBL traffic, which caches poorly, from pushing other stuff out of the cache, but there are doubtless others.
If it's getting evicted from cache because other things are getting used more often, why do you want to put your thumb on that scale? The other queries are presumably benefitting just as much from the caching.
I think John's issue is that he's seeing those other queries *not* benefiting from the caching because they get pushed out by DNSBL queries that will likely not ever be used again. You don't want your cached entry for www.google.com to get pushed out by a lookup for a dialup line somewhere in Africa.
If the dnsbl queries are not likely to be used again, why don't they set their ttl way down?
In any case, DNSBL's use of DNS has always been a hack. If v6 causes the hack to blow up, they should create their own protocol rather than ask how we can make the global DNS accommodate their misuse of DNS. Mike
On 8/17/12, Michael Thomas <mike@mtcc.com> wrote:
If the dnsbl queries are not likely to be used again, why don't they set their ttl way down?
Because the DNSBLs don't tune the TTLs for individual responses; they likely still benefit from extended caching, high TTLs for responses make sense for controlling load on the DNSBL servers, and your cache efficiency is not the DNSBL operators' problem. There are /some/ IP addresses that generate a lot of mail, and I would suggest there are /plenty/ of low-traffic recursive DNS servers in the world that may still make DNSBL queries.
The /real/ problem to be addressed is the poor caching discipline used by recursive DNS servers. A recursive server with intelligent caching would not simply allocate a chunk of heap space, hold each record for its TTL, and evict the oldest entry when space runs out; Least-Recently Hit, LRU, Least-Recent RR Response, Least-Recently Queried, and Lowest-Remaining TTL are all naive policies. An intelligent recursive DNS caching system would leverage both RAM and disk when appropriate, store the cache efficiently and persistently, and keep track of the cache evictions that occurred. The number of times an RR was requested would factor not only into avoiding its eviction; for popular RRs it would also make sense for the recursive server to issue a pre-emptive query to refresh a record that is about to expire, so that response latency doesn't go up whenever that RR happens to be queried just after the cache entry expired.
And I say that because some very popular RRs have insanely low TTLs. Case in point:
www.l.google.com. 300 IN A 74.125.227.148
www.l.google.com. 300 IN A 74.125.227.144
www.l.google.com. 300 IN A 74.125.227.146
www.l.google.com. 300 IN A 74.125.227.145
www.l.google.com. 300 IN A 74.125.227.147
www.l.google.com. 300 IN A 74.125.227.148
In any case, DNSBL's use of DNS has always been a hack. If v6 causes the hack to blow up, they should create their own protocol rather than ask how we can make the global DNS accommodate their misuse of DNS.
Mike
[snip]
-- -JH
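As a rough illustration of the pre-emptive refresh idea above, here is a hypothetical sketch in Python; it is not code from any shipping resolver, and the refresh window and popularity threshold are invented numbers. The cache counts hits per name and, on lookup, tells the caller when a popular entry is close to its TTL so it can be re-queried in the background.

# Hypothetical sketch of TTL-aware prefetching; not from any real resolver.
import time

class PrefetchingCache:
    """Flags popular records that are close to TTL expiry for refresh."""

    def __init__(self, refresh_window=30, popularity_threshold=5):
        self.entries = {}                      # name -> [rrset, expires_at, hits]
        self.refresh_window = refresh_window   # seconds before expiry
        self.popularity_threshold = popularity_threshold

    def put(self, name, rrset, ttl):
        self.entries[name] = [rrset, time.monotonic() + ttl, 0]

    def get(self, name):
        """Return (rrset, needs_refresh); (None, False) on a miss or expiry."""
        entry = self.entries.get(name)
        if entry is None:
            return None, False
        rrset, expires_at, hits = entry
        now = time.monotonic()
        if now >= expires_at:
            del self.entries[name]             # TTL ran out; treat as a miss
            return None, False
        entry[2] = hits + 1
        # Popular and about to expire: the caller should re-query upstream in
        # the background so no future client sees the expiry latency.
        needs_refresh = (entry[2] >= self.popularity_threshold
                         and expires_at - now <= self.refresh_window)
        return rrset, needs_refresh

A resolver loop would call get() on every query and, whenever needs_refresh comes back true, schedule an upstream lookup for that name without blocking the client.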
On Aug 18, 2012, at 8:44, Jimmy Hess <mysidia@gmail.com> wrote:
And I say that, because some very popular RRs have insanely low TTLs.
Case in point: www.l.google.com. 300 IN A 74.125.227.148 www.l.google.com. 300 IN A 74.125.227.144 www.l.google.com. 300 IN A 74.125.227.146 www.l.google.com. 300 IN A 74.125.227.145 www.l.google.com. 300 IN A 74.125.227.147 www.l.google.com. 300 IN A 74.125.227.148
Different people have different points of view. IMHO, if Google loses a datacenter and all users are stuck waiting for a long TTL to run out, that is Very Bad. In fact, I would call even 2.5 minutes (average of 5 min TTL) Very Bad. I'm impressed they are comfortable with a 300 second TTL. You obviously feel differently. Feel free to set your TTL higher. -- TTFN, patrick
What Patrick said. For large sites that offer services in multiple data centers on multiple IPs that can individually fail at any time, 300 seconds is actually a bit on the long end. -C On Aug 18, 2012, at 3:43 PM, Patrick W. Gilmore <patrick@ianai.net> wrote:
On Aug 18, 2012, at 8:44, Jimmy Hess <mysidia@gmail.com> wrote:
And I say that, because some very popular RRs have insanely low TTLs.
Case in point: www.l.google.com. 300 IN A 74.125.227.148 www.l.google.com. 300 IN A 74.125.227.144 www.l.google.com. 300 IN A 74.125.227.146 www.l.google.com. 300 IN A 74.125.227.145 www.l.google.com. 300 IN A 74.125.227.147 www.l.google.com. 300 IN A 74.125.227.148
Different people have different points of view.
IMHO, if Google loses a datacenter and all users are stuck waiting for a long TTL to run out, that is Very Bad. In fact, I would call even 2.5 minutes (average of 5 min TTL) Very Bad. I'm impressed they are comfortable with a 300 second TTL.
You obviously feel differently. Feel free to set your TTL higher.
-- TTFN, patrick
In message <DDF607B5-415B-41E8-9222-EB549D3DBB0C@semihuman.com>, Chris Woodfield writes:
What Patrick said. For large sites that offer services in multiple data centers on multiple IPs that can individually fail at any time, 300 seconds is actually a bit on the long end.
-C
Which is why the DNS supports multiple address records. Clients don't have to wait minutes to fail over to a second address. One doesn't have to point all the addresses returned to the closest data center. One can get sub-second failover in clients as HE code shows. As for the original problem. LRU replacement will keep "hot" items in the cache unless it is seriously undersized. Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
On Sun, Aug 19, 2012 at 5:37 PM, Mark Andrews <marka@isc.org> wrote:
As for the original problem. LRU replacement will keep "hot" items in the cache unless it is seriously undersized.
Maybe. This discussion is reminiscent of the Linux swappiness debate. Early in the 2.x-series Linux kernels, the guy responsible for the virtual memory manager changed it to allow the disk cache to push program code and data out of RAM if all other disk cache was more recently touched than the program data. Previously, the disk cache would only consume otherwise free memory; programs would only get pushed to swap by memory demands from other programs.
The users went ape. Suddenly, if you copied a bunch of data from one disk to another, your machine would be sluggish and choppy for minutes or hours afterward as programs recovered swapped pages from disk and ran just long enough to hit the next section needing to be recovered from swap. Some folks ditched swap entirely to get around the problem.
The guy insisted the users were wrong. He had the benchmarks, meticulously collected data and careful analysis to prove that the machines were more efficient with pure LRU swap. The math said he was right. 2+2=4. But it didn't work out that way: in the very common case of copy-a-bunch-of-files, simple LRU expiration of memory pages was the wrong answer. It caused the machine to behave badly. More work was required until a tuned and weighted LRU algorithm solved the problem.
Whether John's solution of limiting the cache by zone subtree is useful or not, he started an interesting discussion. Consider, for example, what happens when you ask for www.google.com. You get a 7-day CNAME record pointing to a 5-minute www.l.google.com A record, and the resolver gets 2-day NS records for google.com, 4-day A records for ns1.google.com, 2-day NS records for com from a.gtld-servers.net, etc. Those authority records don't get touched again until www.l.google.com expires.
With a hypothetical simple least-recently-used (LRU) algorithm, the 4-minute-old A record for ns1.google.com was last touched longer ago than the 3-minute-old A record for 5.6.7.8.rbl.antispam.com. So when the resolver needs more cache for 4.3.2.1.rbl.antispam.com, which record gets kicked? Then, of course, when www.l.google.com expires after five minutes, the entire chain has to be refetched because ns1.google.com was already LRU'd out of the cache. This is distinctly slower than just refetching www.l.google.com from the already-known address of ns1.google.com, and the user sees a slight pause at their web browser while it happens.
Would a smarter, weighted LRU algorithm work better here? Something where rarely used leaf data doesn't tend to expire the also rarely used but much more important data from the lookup chain?
Regards, Bill Herrin -- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
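To make Bill's weighted-LRU question concrete, here is a purely illustrative sketch; the record classes and weights are invented, not measured. Records needed to walk the delegation chain age more slowly than leaf answers, so a burst of one-shot leaf lookups cannot evict them ahead of their time.

# Illustrative weighted LRU; the weights below are arbitrary, not tuned values.
import time

# Infrastructure records (NS, glue) age at a quarter of the rate of leaf answers.
WEIGHTS = {"NS": 0.25, "A-glue": 0.25, "A": 1.0, "PTR": 1.0, "TXT": 1.0}

class WeightedLRU:
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = {}                      # key -> (rtype, last_used)

    def _effective_age(self, rtype, last_used, now):
        # Scale real age down for infrastructure records so a leaf answer of
        # the same real age always looks older than an NS or glue record.
        return (now - last_used) * WEIGHTS.get(rtype, 1.0)

    def touch(self, key, rtype):
        """Record a use (or insertion), evicting the 'oldest' weighted entry."""
        now = time.monotonic()
        if key not in self.entries and len(self.entries) >= self.max_entries:
            victim = max(self.entries, key=lambda k: self._effective_age(
                self.entries[k][0], self.entries[k][1], now))
            del self.entries[victim]
        self.entries[key] = (rtype, now)

With these weights, an A record used only as glue for ns1.google.com would have to sit unused roughly four times as long as the rbl.antispam.com answer before it became the eviction victim.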
On 8/19/12, Mark Andrews <marka@isc.org> wrote:
As for the original problem. LRU replacement will keep "hot" items in the cache unless it is seriously undersized.
[snip]
Well, that's the problem. Items that are not relatively "hot" will be purged, even though they may be very popular RRs. Cache efficiency is not defined as "keeping the hot items".
Efficient caching is defined as maximizing the hit percentage. The DNS cache may see a load of DNSBL queries, so all of those entries will be cold. The problem in that case is not low utilization; it's high utilization by queries that are useless to cache, because those questions will only be asked once, and with plain LRU there is no buffer maintaining an eviction history. An example alternative strategy: you have a cache of XX RR buckets and keep a list of the last YY cache replacements (not necessarily the entire RRs, just the label and a 1-byte count of evictions), and your replacement policy picks the entry whose TTL has expired, or failing that, the least-recently-used or least-queried entry with the lowest eviction count. -- -JH
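Jimmy's "list of YY cache replacements" is essentially the ghost list used by 2Q/ARC-style algorithms (named in the next message). A rough sketch of the policy as he describes it, with all sizes invented:

# Rough sketch of an eviction-history ("ghost list") policy; sizes are arbitrary.
import time
from collections import OrderedDict

class GhostListCache:
    def __init__(self, max_entries=10000, history_size=50000):
        self.entries = OrderedDict()   # name -> (rrset, expires_at); LRU ordered
        self.max_entries = max_entries
        self.evicted = OrderedDict()   # name -> times evicted (bounded history)
        self.history_size = history_size

    def get(self, name):
        entry = self.entries.get(name)
        if entry is None or entry[1] < time.monotonic():
            return None                # miss, or expired (reclaimed by _evict)
        self.entries.move_to_end(name) # mark as recently used
        return entry[0]

    def put(self, name, rrset, ttl):
        if name not in self.entries and len(self.entries) >= self.max_entries:
            self._evict()
        self.entries[name] = (rrset, time.monotonic() + ttl)
        self.entries.move_to_end(name)

    def _evict(self):
        now = time.monotonic()
        # Prefer anything whose TTL has already run out.
        victim = next((k for k, v in self.entries.items() if v[1] < now), None)
        if victim is None:
            # Otherwise, among the least-recently-used entries, evict the one
            # with the fewest past evictions: a name we have never wanted back.
            oldest = list(self.entries)[:32]
            victim = min(oldest, key=lambda k: self.evicted.get(k, 0))
        del self.entries[victim]
        # Remember the eviction so a name that keeps coming back is spared later.
        self.evicted[victim] = min(self.evicted.get(victim, 0) + 1, 255)
        self.evicted.move_to_end(victim)
        if len(self.evicted) > self.history_size:
            self.evicted.popitem(last=False)

One-shot DNSBL lookups never accumulate an eviction count, so they stay at the front of the eviction queue, while names that have been evicted and re-fetched before get another chance to stay resident.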
Re: LRU badness
One approach is called adaptive replacement cache (ARC), which is used by Oracle/Sun in ZFS and was used in PostgreSQL for a time (slightly modified, as I recall, to be more like 2Q due to concerns over the IBM patent on the algorithm). Unfortunately, we do not have any implementations of the OPT (aka clairvoyant) algorithm, so something like 2Q might be an interesting approach to experiment with. Gary
While I hesitate to argue DNS with Mark, I feel this needs a response. On Aug 19, 2012, at 17:37 , Mark Andrews <marka@isc.org> wrote:
In message <DDF607B5-415B-41E8-9222-EB549D3DBB0C@semihuman.com>, Chris Woodfield writes:
What Patrick said. For large sites that offer services in multiple data centers on multiple IPs that can individually fail at any time, 300 seconds is actually a bit on the long end.
Which is why the DNS supports multiple address records. Clients don't have to wait minutes to fail over to a second address. One doesn't have to point all the addresses returned to the closest data center. One can get sub-second failover in clients as HE code shows.
I'm afraid I am not familiar with "HE code", so perhaps I am being silly here. But I do not think returning multiple A records for multiple datacenters is as useful as lowering the TTL. Just a few reasons off the top of my head:
* How do you guarantee the user goes to the closer location if you respond with multiple addresses? Forcing users to go to farther away datacenters half the time is likely a poor trade-off for the occasional TTL problem when a DC goes down.
* How many applications are even aware multiple addresses were returned?
* How do you guarantee sub-second failover when most apps will wait longer than one second to see if an address responds?
Etc. And that doesn't begin to touch things such as cache efficiency that affect companies like Google, CDNs, etc.
As for the original problem. LRU replacement will keep "hot" items in the cache unless it is seriously undersized.
This was covered well by others. -- TTFN, patrick
On Aug 20, 2012, at 5:24 PM, Patrick W. Gilmore wrote:
But I do not think returning multiple A records for multiple datacenters is as useful as lowering the TTL.
Some folks do this via various GSLB mechanisms which selectively respond with different records based on the assumed relative topological distance between the querying resolver and various server/service instantiations in different locations. ----------------------------------------------------------------------- Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com> Luck is the residue of opportunity and design. -- John Milton
On Aug 20, 2012, at 06:49 , "Dobbins, Roland" <rdobbins@arbor.net> wrote:
On Aug 20, 2012, at 5:24 PM, Patrick W. Gilmore wrote:
But I do not think returning multiple A records for multiple datacenters is as useful as lowering the TTL.
Some folks do this via various GSLB mechanisms which selectively respond with different records based on the assumed relative topological distance between the querying resolver and various server/service instantiations in different locations.
"Some folks" == "more than half of all traffic on broadband modems" these days. However, I think you missed a post or two in this thread. The original point was you need a low TTL to respond with a single A record or multiple A records which all point to the same datacenter in case that node / DC goes down. Mark replied saying you can respond with multiple A records pointing at multiple DCs, thereby allowing a much longer TTL. My question above is asking Mark how you guarantee the user/application selects the A record closest to them and only use the other A record when the closer one is unavailable. -- TTFN, patrick
On Aug 20, 2012, at 5:56 PM, Patrick W. Gilmore wrote:
My question above is asking Mark how you guarantee the user/application selects the A record closest to them and only use the other A record when the closer one is unavailable.
I understand - my point was that folks using a GSLB-type solution would generally include availability probing in the GSLB stack, so that a given instance won't be included in answers if it's locally unavailable (obviously, the GSLB can't know about all path elements between the querying resolver and the desired server/service). ----------------------------------------------------------------------- Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com> Luck is the residue of opportunity and design. -- John Milton
On Aug 20, 2012, at 10:07 , "Dobbins, Roland" <rdobbins@arbor.net> wrote:
On Aug 20, 2012, at 5:56 PM, Patrick W. Gilmore wrote:
My question above is asking Mark how you guarantee the user/application selects the A record closest to them and only use the other A record when the closer one is unavailable.
I understand - my point was that folks using a GSLB-type solution would generally include availability probing in the GSLB stack, so that a given instance won't be included in answers if it's locally unavailable
How does that allow for a long TTL? If you set a 3600 second TTL when the DC is up, and the DC goes down 2 seconds later, what do you do?
(obviously, the GSLB can't know about all path elements between the querying resolver and the desired server/service).
Says who? :) -- TTFN, patrick
In message <0D919D57-BDA0-4FDA-873D-3DC0CD5745C0@ianai.net>, "Patrick W. Gilmore" writes:
On Aug 20, 2012, at 06:49 , "Dobbins, Roland" <rdobbins@arbor.net> wrote:
On Aug 20, 2012, at 5:24 PM, Patrick W. Gilmore wrote:
But I do not think returning multiple A records for multiple datacenters is as useful as lowering the TTL.
Some folks do this via various GSLB mechanisms which selectively respond with different records based on the assumed relative topological distance between the querying resolver and various server/service instantiations in different locations.
"Some folks" == "more than half of all traffic on broadband modems" these days.
However, I think you missed a post or two in this thread.
The original point was you need a low TTL to respond with a single A record or multiple A records which all point to the same datacenter in case that node / DC goes down. Mark replied saying you can respond with multiple A records pointing at multiple DCs, thereby allowing a much longer TTL.
My question above is asking Mark how you guarantee the user/application selects the A record closest to them and only use the other A record when the closer one is unavailable.
You can't, but a GSLB also can't know if the path from the client to the DC selected by the GSLB will work. There is a high probability that it will, but no certainty. By returning addresses for multiple DCs you increase the probability that a client will be able to connect in the presence of network errors not visible to the GSLB control algorithms. If you want to add weights etc., then you need to use something like SRV to pass this information to the client. The GSLB can then adjust these. Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
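For reference, SRV already carries the priority and weight fields Mark is alluding to. A purely hypothetical zone fragment (names, port and targets are invented) would look like this; a GSLB could rewrite the priority/weight values as data-centre health changes, but the client still has to look the records up and honour them.

; SRV RDATA is: priority weight port target.  Lower priority is tried first;
; weight splits load among records sharing a priority.
_http._tcp.www.example.com.  3600 IN SRV 10 80 80 dc1.example.com.
_http._tcp.www.example.com.  3600 IN SRV 10 20 80 dc2.example.com.
_http._tcp.www.example.com.  3600 IN SRV 20  0 80 backup.example.com.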
Patrick W. Gilmore <patrick@ianai.net> wrote:
On Aug 19, 2012, at 17:37 , Mark Andrews <marka@isc.org> wrote:
Which is why the DNS supports multiple address records. Clients don't have to wait minutes to fail over to a second address. One doesn't have to point all the addresses returned to the closest data center. One can get sub-second failover in clients as HE code shows.
I'm afraid I am not familiar with "HE code", so perhaps I am being silly here.
Mark is referring to "happy eyeballs": http://www.isc.org/community/blog/201101/how-to-connect-to-a-multi-homed-ser... Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first. Rough, becoming slight or moderate. Showers, rain at first. Moderate or good, occasionally poor at first.
On Aug 20, 2012, at 08:25 , Tony Finch <dot@dotat.at> wrote:
Patrick W. Gilmore <patrick@ianai.net> wrote:
On Aug 19, 2012, at 17:37 , Mark Andrews <marka@isc.org> wrote:
Which is why the DNS supports multiple address records. Clients don't have to wait minutes to fail over to a second address. One doesn't have to point all the addresses returned to the closest data center. One can get sub-second failover in clients as HE code shows.
I'm afraid I am not familiar with "HE code", so perhaps I am being silly here.
Mark is referring to "happy eyeballs": http://www.isc.org/community/blog/201101/how-to-connect-to-a-multi-homed-ser...
Oh. Yep, I was being silly, thinking only of v4. (I'm sleep deprived of late - yes, more than usual.) Unfortunately, whether we like it or not, 99+% of traffic on the 'Net is still v4, as were the examples given. Even with HE, though, there is no (not yet a?) way in DNS to signal "use this A record first, then that one if the first doesn't work / is slow / whatever". Any chance of getting MX-style weights for A records? :) Even then, it would not solve the original problem of low TTLs. Just as a simple example, when traffic ramps quickly, a provider may want to move some users off a node to balance traffic. With a long TTL, that's not really possible barring really bad hacks like DoS'ing some users to hope they use the next A record, which would lead to massive complaints. We could go on, but hopefully the point is clear that low TTLs are useful in many instances despite the ability to return multiple A records. -- TTFN, patrick
Once upon a time, Patrick W. Gilmore <patrick@ianai.net> said:
* How many applications are even aware multiple addresses were returned?
Most anything that supports IPv6 should handle this correctly, since getaddrinfo() will return a list of addresses to try.
* How do you guarantee sub-second failover when most apps will wait longer than one second to see if an address responds?
That's a bigger issue. Also, for web services, the application might wait, but the end-user usually won't (if the site doesn't come up in a second, they move on to something else). -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble.
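A minimal sketch of the pattern Chris describes, using Python's wrapper around getaddrinfo(); the host, port and per-attempt timeout are placeholders:

# Try each address getaddrinfo() returns, in order, until one connects.
import socket

def connect_first_working(host, port, per_attempt_timeout=2.0):
    last_error = None
    for family, stype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        s = socket.socket(family, stype, proto)
        s.settimeout(per_attempt_timeout)
        try:
            s.connect(addr)
            s.settimeout(None)
            return s                     # first address that answers wins
        except OSError as exc:
            last_error = exc             # remember the failure, try the next address
            s.close()
    raise last_error or OSError("no usable addresses for %s" % host)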
On Aug 20, 2012, at 08:47 , Chris Adams <cmadams@hiwaay.net> wrote:
Once upon a time, Patrick W. Gilmore <patrick@ianai.net> said:
* How many applications are even aware multiple addresses were returned?
Most anything that supports IPv6 should handle this correctly, since getaddrinfo() will return a list of addresses to try.
Ah, the amazing new call which destroys any possibility of randomness or round robin or other ways of load balancing between A / AAAA records. Yes, all of us returning more than one A / AAAA record are hoping that gets widely deployed instantly. Or not. -- TTFN, patrick
Patrick W. Gilmore <patrick@ianai.net> wrote:
On Aug 20, 2012, at 08:47 , Chris Adams <cmadams@hiwaay.net> wrote:
Most anything that supports IPv6 should handle this correctly, since getaddrinfo() will return a list of addresses to try.
Ah, the amazing new call which destroys any possibility of randomness or round robin or other ways of load balancing between A / AAAA records. Yes, all of us returning more than one A / AAAA record are hoping that gets widely deployed instantly. Or not.
The problem is RFC 3484 address selection; getaddrinfo is just the usual place this is implemented. I had believed that there was work in progress to fix this problem with the specs but it seems to have stalled. http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05 Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first. Rough, becoming slight or moderate. Showers, rain at first. Moderate or good, occasionally poor at first.
On 8/20/12 10:11 AM, Tony Finch wrote:
Patrick W. Gilmore <patrick@ianai.net> wrote:
On Aug 20, 2012, at 08:47 , Chris Adams <cmadams@hiwaay.net> wrote:
Most anything that supports IPv6 should handle this correctly, since getaddrinfo() will return a list of addresses to try.
Ah, the amazing new call which destroys any possibility of randomness or round robin or other ways of load balancing between A / AAAA records. Yes, all of us returning more than one A / AAAA record are hoping that gets widely deployed instantly. Or not.
The problem is RFC 3484 address selection; getaddrinfo is just the usual place this is implemented. I had believed that there was work in progress to fix this problem with the specs but it seems to have stalled. http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05
Tony.
It's in the RFC editor queue actually: http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/?include_text=1 http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/history/ --Shumon.
Shumon Huque <shuque@upenn.edu> wrote:
On 8/20/12 10:11 AM, Tony Finch wrote:
The problem is RFC 3484 address selection; getaddrinfo is just the usual place this is implemented. I had believed that there was work in progress to fix this problem with the specs but it seems to have stalled. http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05
It's in the RFC editor queue actually:
http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/?include_text=1 http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/history/
Excellent :-) Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first. Rough, becoming slight or moderate. Showers, rain at first. Moderate or good, occasionally poor at first.
On 20 Aug 2012, at 16:39, Tony Finch <dot@dotat.at> wrote:
Shumon Huque <shuque@upenn.edu> wrote:
On 8/20/12 10:11 AM, Tony Finch wrote:
The problem is RFC 3484 address selection; getaddrinfo is just the usual place this is implemented. I had believed that there was work in progress to fix this problem with the specs but it seems to have stalled. http://tools.ietf.org/html/draft-ietf-6man-rfc3484-revise-05
It's in the RFC editor queue actually:
http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/?include_text=1 http://datatracker.ietf.org/doc/draft-ietf-6man-rfc3484bis/history/
Excellent :-)
Tony.
I'll ask the IETF admins to link -revise to the bis draft so this continuation is clearer in the tools pages. Tim
On 20/08/2012 14:18, Patrick W. Gilmore wrote:
On Aug 20, 2012, at 08:47 , Chris Adams <cmadams@hiwaay.net> wrote:
Most anything that supports IPv6 should handle this correctly, since getaddrinfo() will return a list of addresses to try.
Ah, the amazing new call which destroys any possibility of randomness or round robin or other ways of load balancing between A / AAAA records.
well, "new" as in about 16 years old.
Yes, all of us returning more than one A / AAAA record are hoping that gets widely deployed instantly. Or not.
inet_addr() is zomfg brain damaged, ipv4 only, non-thread-safe and needs to die horribly in a fire. getaddrinfo() puts at least the possibility of address selection into the hands of the developer which is going to be completely necessary for happy eyeballs / rfc3484bis / etc. Nick
In message <15BF47FF-9B3D-498B-A58B-F03853F0BA08@ianai.net>, "Patrick W. Gilmore" writes:
On Aug 20, 2012, at 08:47 , Chris Adams <cmadams@hiwaay.net> wrote:
Once upon a time, Patrick W. Gilmore <patrick@ianai.net> said:
* How many applications are even aware multiple addresses were returned?
Most anything that supports IPv6 should handle this correctly, since getaddrinfo() will return a list of addresses to try.
Ah, the amazing new call which destroys any possibility of randomness or round robin or other ways of load balancing between A / AAAA records.
Yes, all of us returning more than one A / AAAA record are hoping that gets widely deployed instantly. Or not.
The problem is not getaddrinfo(). The problem is that there isn't a DNS type that reliably returns both IPv4 and IPv6 addresses, so there is no way for the DNS server to sort these addresses together. AAAA could have done it if there had been agreement to publish mapped A records from the very start. That would also have helped with turning off IPv4. One can fudge around the lack of this type with SRV records. Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
In message <20120820124734.GA14383@hiwaay.net>, Chris Adams writes:
Once upon a time, Patrick W. Gilmore <patrick@ianai.net> said:
* How many applications are even aware multiple addresses were returned?
Most anything that supports IPv6 should handle this correctly, since getaddrinfo() will return a list of addresses to try.
* How do you guarantee sub-second failover when most apps will wait longer than one second to see if an address responds?
That's a bigger issue. Also, for web services, the application might wait, but the end-user usually won't (if the site doesn't come up in a second, they move on to the something else).
You file RFE / bug reports against the clients for having crappy failover behaviour. It isn't hard to write TCP-based code that fails over to the next available server. You don't have to wait for connect to fail before you attempt to connect to the next address. You just use a smarter connect loop. UDP code is a little harder, as the work needs to be spread more through the code than just replacing the dumb connect loop with a smart one. Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
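A sketch of the kind of "smarter connect loop" Mark means (illustrative only, Unix-flavoured error handling; the 300 ms stagger and 10 s budget are arbitrary choices): launch non-blocking connects to successive addresses without waiting for earlier attempts to fail, and use the first one that completes.

# Staggered, non-blocking connect loop; a sketch, not production code.
import errno
import selectors
import socket
import time

def connect_any(host, port, stagger=0.3, total_timeout=10.0):
    candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    sel = selectors.DefaultSelector()
    pending = []
    deadline = time.monotonic() + total_timeout
    try:
        while time.monotonic() < deadline and (candidates or pending):
            if candidates:
                family, stype, proto, _, addr = candidates.pop(0)
                s = socket.socket(family, stype, proto)
                s.setblocking(False)
                err = s.connect_ex(addr)
                if err == 0:                      # connected immediately
                    return s
                if err in (errno.EINPROGRESS, errno.EWOULDBLOCK):
                    sel.register(s, selectors.EVENT_WRITE)
                    pending.append(s)
                else:
                    s.close()                     # hard failure; try the next one
                    continue
            if not pending:
                continue
            # Give pending connects up to `stagger` seconds before starting
            # another candidate in parallel.
            for key, _ in sel.select(timeout=stagger):
                s = key.fileobj
                sel.unregister(s)
                pending.remove(s)
                if s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                    s.setblocking(True)
                    return s                      # first successful connect wins
                s.close()
        raise OSError("could not connect to %s:%s" % (host, port))
    finally:
        for s in pending:                         # abandon the losers
            sel.unregister(s)
            s.close()
        sel.close()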
On Sat, 18 Aug 2012, Patrick W. Gilmore wrote:
IMHO, if Google loses a datacenter and all users are stuck waiting for a long TTL to run out, that is Very Bad. In fact, I would call even 2.5 minutes (average of 5 min TTL) Very Bad. I'm impressed they are comfortable with a 300 second TTL.
Google is very aggressive about reducing user-visible latency, of which they cite DNS as a significant contributing factor. They may be choosing to strike a different balance of faster when everything is working correctly vs faster to recover when something breaks. -- Aaron
On Fri, Aug 17, 2012 at 04:32:45PM -0400, valdis.kletnieks@vt.edu wrote:
I think John's issue is that he's seeing those other queries *not* benefiting from the caching because they get pushed out by DNSBL queries that will likely not ever be used again. You don't want your cached entry for www.google.com to get pushed out by a lookup for a dialup line somewhere in Africa.
Oh, yes, I see. You're right, I misread it. But the proposed solution still seems wrong to me. If the entry for www.google.com gets invalidated by a new cache candidate that is never going to get used again, the cache is simply too small (or else it doesn't have enough traffic, and you shouldn't have a cache there at all). The cache needs to be big enough that it has a thrashy bit that is getting changed all the time. Those are the records that go into the cache and then die without being queried again. If the problem is that there's some other record in there that might be queried again, but that doesn't get queried often enough to keep it alive, then the additional cost of the recursive lookup is just not that big a deal. Best, A -- Andrew Sullivan Dyn Labs asullivan@dyn.com
The cache needs to be big enough that it has a thrashy bit that is getting changed all the time. Those are the records that go into the cache and then die without being queried again. If the problem is that there's some other record in there that might be queried again, but that doesn't get queried often enough to keep it alive, then the additional cost of the recursive lookup is just not that big a deal.
A big part of the problem is that TTLs reflect the server's estimate of how long it might be until the data changes, but not the client's likelihood of reusing the data. I publish a BL of hosts in Korea. The data changes maybe once a month so from my point of view a large TTL would make sense, but most of the hits are for bots spraying spam at random, so the client's not likely to reuse the data even if it sticks around for many hours. A plausible alternative to a partitioned cache might be TTL capping, some way to say to your cache that no matter what the TTL is on data from a range of names, don't keep it for more than a few seconds. Another thing that would be useful to me for figuring out the BL cache stuff would be traces of incoming IP addresses and timestamps for mail servers larger than my tiny system, so I could try out some cache models. R's, John
Hi!
The cache needs to be big enough that it has a thrashy bit that is getting changed all the time. Those are the records that go into the cache and then die without being queried again. If the problem is that there's some other record in there that might be queried again, but that doesn't get queried often enough to keep it alive, then the additional cost of the recursive lookup is just not that big a deal.
A big part of the problem is that TTLs reflect the server's estimate of how long it might be until the data changes, but not the client's likelihood of reusing the data. I publish a BL of hosts in Korea. The data changes maybe once a month so from my point of view a large TTL would make sense, but most of the hits are for bots spraying spam at random, so the client's not likely to reuse the data even if it sticks around for many hours.
A plausible alternative to a partitioned cache might be TTL capping, some way to say to your cache that no matter what the TTL is on data from a range of names, don't keep it for more than a few seconds.
Another thing that would be useful to me for figuring out the BL cache stuff would be traces of incoming IP addresses and timestamps for mail servers larger than my tiny system, so I could try out some cache models.
About any caching nameserver supports forwarding. So if you REALLY are afraid this will bite you, fire up another instance, forward all the reverse DNS to that, and you are set. That other instance will handle your reverse DNS with a cache that you will specify. The other ones are not impacted at all. When you use forwarding it doesn't cache the entry ('forward only' option in BIND, for example). So: Caching resolver -> Caching resolver dedicated to reverse DNS. Your first resolver does not cache reverse DNS and you are safe there. On the second you do the reverse DNS. I talked with Paul Vixie about doing this internally inside BIND but that was not something they would be delighted to do (at least not now). If you could define how large your cache pool was for certain objects, that would fix it also. Reverse DNS isn't the only issue here. There are many sites that give each user a subdomain. And if I look at my top talkers on some busy resolvers I do see that that's doing about 25-30% of the lookups currently. akamai.net, amazonaws.com and so on all make nice use of DNS for this. Those have literally millions of entries in DNS also. And that's what is currently doing the load on resolvers... Oh, and don't forget DNSSEC. Traffic already went up by a factor of 7 due to that. Those objects are also much larger to store. Bye, Raymond.
On Aug 18, 2012, at 5:35, Raymond Dijkxhoorn <raymond@prolocation.net> wrote:
Reverse DNS isn't the only issue here. There are many sites that give each user a subdomain. And if I look at my top talkers on some busy resolvers I do see that that's doing about 25-30% of the lookups currently.
akamai.net, amazonaws.com and so on all make nice use of DNS for this. Those have literally millions of entries in DNS also. And that's what is currently doing the load on resolvers...
Akamai has no "users". So not really sure what you mean by that. There are a /lot/ of hostnames on *.akamai.net. That may have something to do with the 1000s of companies that use Akamai to deliver approximately 20% of all the traffic going down broadband modems. Which fits nicely in your DNS lookup percentage. -- TTFN, patrick
Raymond Dijkxhoorn <raymond@prolocation.net> wrote:
When you use forwarding it doesn't cache the entry ('forward only' option in BIND, for example).
That's incorrect. Try configuring a forwarded zone and observe the TTLs you get in responses. The "forward only" option disables recursion but not cacheing.
I talked with Paul Vixie about doing this internally inside BIND but that was not something they would be delighted to do (at least not now). If you could define how large your cache pool was for certain objects, that would fix it also.
You might be able to hack it using a combination of forwarding zones, views, max-cache-ttl, and attach-cache. Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first. Rough, becoming slight or moderate. Showers, rain at first. Moderate or good, occasionally poor at first.
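One way that combination might be wired up. This is an untested sketch: the option names are real BIND 9 options, but the overall layout is a guess, and the behaviour (in particular, what the first view ends up caching for the forwarded answers, and for how long) would need checking against the ARM and some testing. The main view forwards the poorly-caching subtree to a second view, reached via a loopback address the same server listens on, and that view gets its own smaller, shorter-lived cache. Raymond's two-instance setup is the same idea with a separate named process instead of a second view.

// Untested sketch; addresses, sizes and the TTL cap are invented values.
options {
    listen-on { 192.0.2.53; 127.0.0.2; };
};

view "main" {
    match-destinations { 192.0.2.53; };
    attach-cache "main-cache";
    // Hand the poorly-caching subtree to the second view instead of
    // resolving it here.
    zone "in-addr.arpa" {
        type forward;
        forward only;
        forwarders { 127.0.0.2; };
    };
};

view "rdns" {
    match-destinations { 127.0.0.2; };
    attach-cache "rdns-cache";   // views get separate caches; this just names it
    max-cache-size 32M;          // cap the memory reverse DNS may consume
    max-cache-ttl 300;           // and how long its answers may be kept
};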
* John Levine:
Are there DNS caches that allow you to partition the cache for subtrees of DNS names? That is, you can say that all entries from say, in-addr.arpa, are limited to 20% of the cache.
You can build something like that using forwarders and most DNS caches. But it won't result in an exact split because cross-subtree CNAMEs and DNS delegations will cause caching outside the subtree. However, for in-addr.arpa, that's probably what you want anyway.
participants (19)
- Aaron Hopkins
- Andrew Sullivan
- Chris Adams
- Chris Woodfield
- Dobbins, Roland
- Florian Weimer
- Gary Buhrmaster
- Jimmy Hess
- John Levine
- Mark Andrews
- Michael Thomas
- Nick Hilliard
- Patrick W. Gilmore
- Raymond Dijkxhoorn
- Shumon Huque
- Tim Chown
- Tony Finch
- valdis.kletnieks@vt.edu
- William Herrin