How common is lack of DNS server diversity?
Men & Mice found that 38% of the .COM domains surveyed had all their name servers on the same subnet, and 75% had one or more configuration errors.

http://www.menandmice.com/dnsplace/healthsurvey.html

DNS, like most databases, suffers from information entropy. In other words, it takes a lot of energy to keep information correctly updated while it is being changed. Anyone who has been Hostmaster for even a moderately sized ISP knows there is an amazing number of ways for people to mess up any of the pieces of data required to make the whole thing work.

As several people pointed out, you can't really assume that numerically close IP addresses are in fact topologically close on the network. For example, if you look at the name servers for GENUITY.NET

Domain servers in listed order:

DNSAUTH1.SYS.GTEI.NET 4.2.49.2
DNSAUTH2.SYS.GTEI.NET 4.2.49.3
DNSAUTH3.SYS.GTEI.NET 4.2.49.4

They appear to be closely related. However, the addresses are in fact routed to very diverse locations on Genuity's network.

You will find the same thing if you look at the name servers for UU.NET

Domain servers in listed order:

AUTH00.NS.UU.NET 198.6.1.65
AUTH60.NS.UU.NET 198.6.1.181

These servers are also geographically diverse.

So I'm not sure if the 38% number is a true indication of how much diversity DNS servers have.
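The "same subnet" test behind a figure like the 38% can be sketched in a few lines. This is only an illustration: it assumes "subnet" means a /24, which the thread does not confirm, and the helper name `distinct_subnets` is made up for this example. It shows why the Genuity and UUNET servers listed above would be counted as non-diverse by such a purely numeric test.

```python
import ipaddress

def distinct_subnets(addresses, prefix_len=24):
    """Return the set of /prefix_len networks the addresses fall into.

    prefix_len=24 is an assumption; the survey's definition of
    "subnet" is not given in the thread.
    """
    return {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }

# Addresses as listed in the message above.
genuity = ["4.2.49.2", "4.2.49.3", "4.2.49.4"]
uunet = ["198.6.1.65", "198.6.1.181"]

for name, addrs in (("GENUITY.NET", genuity), ("UU.NET", uunet)):
    nets = distinct_subnets(addrs)
    print(name, "->", len(nets), "distinct /24(s):", sorted(str(n) for n in nets))
```

Both domains land in a single /24, so a numeric check flags them even though the servers are reported to sit in different locations. The check says nothing about topology, which is the point being made above.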
And what happens if the 4.0.0.0/8 route is flapped from the routing table? No more DNS. So you still want route diversity that isn't in the same block or aggregated block.

Then I guess you try and get a bunch of /24's for your name servers but they might get filtered elsewhere by someone else.

Thomas

Sean Donelan wrote:
Men & Mice found that 38% of the .COM domains surveyed had all their name servers on the same subnet, and 75% had one or more configuration errors.
http://www.menandmice.com/dnsplace/healthsurvey.html
DNS, like most databases, suffers from information entropy.
In other words, it takes a lot of energy to keep information correctly updated while it is being changed. Anyone who has been Hostmaster for even a moderately sized ISP knows there is an amazing number of ways for people to mess up any of the pieces of data required to make the whole thing work.
As several people pointed out, you can't really assume close IP addresses are in fact topologically close on the network.
For example, if you look at the name servers for GENUITY.NET
Domain servers in listed order:
DNSAUTH1.SYS.GTEI.NET 4.2.49.2
DNSAUTH2.SYS.GTEI.NET 4.2.49.3
DNSAUTH3.SYS.GTEI.NET 4.2.49.4
They appear to be closely related. However, the addresses are in fact routed to very diverse locations on Genuity's network.
You will find the same thing if you look at the name servers for UU.NET
Domain servers in listed order:
AUTH00.NS.UU.NET 198.6.1.65
AUTH60.NS.UU.NET 198.6.1.181
These servers are also geographically diverse.
So I'm not sure if the 38% number is a true indication of how much diversity DNS servers have.
Thomas Kernen wrote:
And what happens if the 4.0.0.0/8 route is flapped from the routing table? No more DNS. So you still want route diversity that isn't in the same block or aggregated block.
You know, some folks simply decide that, for the cost and complexity of managing a box in someone else's space (not to mention potential security issues, et al, for some), the loss of a DNS server is fairly irrelevant if the entire rest of their netblock is offline.

"Gee, DNS says that www.joebobsisp.com is over here... but I can't get there with the route yoyo-ing like mad". Have you *really* gained much, in this situation?

(Note that I'm not claiming in the least that there aren't situations in which having off-AS servers is worthwhile, and if you have multiple ASes from acquisitions or the like, it would certainly seem wise to make use of that fact, but there ARE issues, and for some number of folks, those issues can easily outweigh the (often limited) benefits gained...)

<Soapbox>
Remember: one of the most important things about knowing the rules is that it makes it possible to evaluate whether breaking the rules is worth the consequences.
</Soapbox>

--
***************************************************************************
Joel Baker
System Administrator - lightbearer.com
lucifer@lightbearer.com
http://www.lightbearer.com/~lucifer
[ On Saturday, January 27, 2001 at 01:08:38 (-0800), lucifer@lightbearer.com wrote: ]
Subject: Re: How common is lack of DNS server diversity?
Thomas Kernen wrote:
And what happens if the 4.0.0.0/8 route is flapped from the routing table? No more DNS. So you still want route diversity that isn't in the same block or aggregated block.
You know, some folks simply decide that, for the cost and complexity of managing a box in someone else's space (not to mention potential security issues, et al, for some), the loss of a DNS server is fairly irrelevant if the entire rest of their netblock is offline.
Well, maybe, but it depends on what your offered services are too. If you're offering e-mail and you've published your addresses as <user@subdomain.yourdomain.tld> but you've got no DNS to hand MX records back, then there's a good chance that many improperly implemented mailers, and/or the DNS resolver libraries that those mailers might use, will bounce any of your e-mail instead of keeping it in their queues and retrying at regular intervals.

Whether this is worse than just being off the air temporarily or not depends on many factors.

Of course if you're doing DNS for many zones then, as others have already pointed out, having all the nameservers routing into one AS is definitely going to be less reliable than some of your users might think it should be.

--
Greg A. Woods
+1 416 218-0098 VE3TCP <gwoods@acm.org> <robohack!woods>
Planix, Inc. <woods@planix.com>; Secrets of the Weird <woods@weird.com>
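The difference between a mailer that queues and one that bounces comes down to how it treats a failed MX lookup. The following is a minimal sketch of the commonly expected behaviour, not the logic of any particular mailer; the `Lookup` enum and `mail_action` function are hypothetical names introduced for illustration.

```python
from enum import Enum

class Lookup(Enum):
    """Simplified outcomes of an MX lookup (hypothetical categories)."""
    ANSWER = "answer"      # records came back
    NXDOMAIN = "nxdomain"  # authoritative "no such domain"
    SERVFAIL = "servfail"  # a server answered but could not resolve
    TIMEOUT = "timeout"    # no name server could be reached

def mail_action(result):
    """Decide what to do with a queued message after the MX lookup."""
    if result is Lookup.ANSWER:
        return "deliver"
    if result is Lookup.NXDOMAIN:
        # Only an authoritative negative answer is a permanent failure.
        return "bounce"
    # Unreachable or failing name servers are a temporary condition:
    # keep the message in the queue and retry later.
    return "defer"

for outcome in Lookup:
    print(outcome.name, "->", mail_action(outcome))
```

A mailer of the kind described above effectively returns "bounce" for the TIMEOUT case, which is why losing every name server can cost mail rather than merely delay it.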
On Fri, Jan 26, 2001 at 11:17:06PM -0400, Thomas Kernen wrote:
And what happens if the 4.0.0.0/8 route is flapped from the routing table? No more DNS. So you still want route diversity
Then it probably doesn't matter if you resolve their DNS, because you won't be getting to any of their services anyway.
On Sat, Jan 27, 2001 at 08:17:00AM -0500, Shawn McMahon wrote:
On Fri, Jan 26, 2001 at 11:17:06PM -0400, Thomas Kernen wrote:
And what happens if the 4.0.0.0/8 route is flapped from the routing table? No more DNS. So you still want route diversity
Then it probably doesn't matter if you resolve their DNS, because you won't be getting to any of their services anyway.
Only if all of their services are in 4.0.0.0/8.

What if they're providing DNS services to customers who are not in 4.0.0.0/8 space, and whose route hasn't flapped?

--
John Payne
http://www.sackheads.org/jpayne/
john@sackheads.org
http://www.sackheads.org/uce/
Fax: +44 870 0547954
To send me mail, use the address in the From: header
All:

I have a related question that may be dated at this point but about which I'm curious. Some time ago we had a problem with a DNS server we located on a totally separate network to achieve DNS server diversity. At one point there was a failure on that network so that our DNS server located there could not be reached. It appeared from the reports/complaints we received that a number of client systems/resolvers had decided to only request data from the nonfunctional DNS server and, despite failing on that, wouldn't ask our other listed DNS servers. They therefore could not resolve addresses for otherwise functional network assets.

I seem to remember this was somehow related to systems running Microsoft OS's. Am I confused or could it be that Microsoft knows something about this?

Chuck Scott
I have heard from numerous sources that even if you provide multiple name servers in the Windows 9x TCP/IP config, only the first is used. Not sure how true it is currently. It seems from below you are talking about resolvers, not authoritative name servers??

Brian

On Sat, 27 Jan 2001, Charles Scott wrote:
All: I have a related question that may be dated at this point but about which I'm curious. Some time ago we had a problem with a DNS server we located on a totally separate network to achieve DNS server diversity. At one point there was a failure on that network so that our DNS server located there could not be reached. It appeared from the reports/complaints we received that a number of client systems/resolvers had decided to only request data from the nonfunctional DNS server and, despite failing on that, wouldn't ask our other listed DNS servers. They therefore could not resolve addresses for otherwise functional network assets. I seem to remember this was somehow related to systems running Microsoft OS's. Am I confused or could it be that Microsoft knows something about this?
Chuck Scott
On Sat, 27 Jan 2001, Brian wrote:
I have heard from numerous sources that even if you provide multiple name servers in the Windows 9x TCP/IP config, only the first is used. Not sure how true it is currently. It seems from below you are talking about resolvers, not authoritative name servers??
Brian:

Yes, talking about resolvers. If I remember the incident correctly, at the time a number of other nearby providers were using NT servers and it was those which failed to ask another authoritative name server for the domain when a particular server was not reachable. In our case, it was actually the second server listed for our domains.

Chuck
On Sat, 27 Jan 2001, Charles Scott wrote:
On Sat, 27 Jan 2001, Brian wrote:
I have heard from numerous sources that even if you provide multiple name servers in the Windows 9x TCP/IP config, only the first is used. Not sure how true it is currently. It seems from below you are talking about resolvers, not authoritative name servers??
Brian: Yes, talking about resolvers. If I remember the incident correctly, at the time a number of other nearby providers were using NT servers and it was those which failed to ask another authoritative name server for the domain when a particular server was not reachable. In our case, it was actually the second server listed for our domains.
i experienced this exact same thing, and it was the secondary ns that NT was "fixating" on when making queries. (the secondary was up and down for a few weeks until a new one was shipped out--yes, off-site and off-AS ;)

i had a VERY hard time explaining to NT professionals that their email to our domains shouldn't be bouncing, and that 99% of the internet could get mail to our domains just fine with one operating nameserver. i also didn't have any proof that NT didn't do The Right Thing, and no one wanted to help me prove it by hanging on the phone with me after complaining that "your nameservers are down."

is this misbehavior of NT documented anywhere? is it fixable? i don't know d*ck about NT, but i'd love to be able to at least suggest a fix and give someone a URL.

thanks!

deeann m.m. mikula
network administrator
telerama internet -- http://www.telerama.com
abuse@telerama.com/spam@telerama.com
1.877.688.3200x501
i experienced this exact same thing, and it was the secondary ns that NT was "fixating" on when making queries. (the secondary was up and down for a few weeks until a new one was shipped out--yes, off-site and off-AS ;)
i had a VERY hard time explaining to NT professionals that their email to our domains shouldn't be bouncing, and that 99% of the internet could get mail to our domains just fine with one operating nameserver. i also didn't have any proof that NT didn't do The Right Thing, and no one wanted to help me prove it by hanging on the phone with me after complaining that "your nameservers are down." is this misbehavior of NT documented anywhere? is it fixable? i don't know d*ck about NT, but i'd love to be able to at least suggest a fix and give someone a URL.
The DNS resolver for normal run-of-the-mill lookups handles failover properly. If anything, it is too ambitious. The algorithm suggested in RFC 1035 is to "wait 5 seconds" for a timeout before trying another server, while with WinSock-2 resolvers, the timeout threshold is one second, and then multiple unique queries are sent shotgun-fashion to ALL of the other servers simultaneously.

The aggressiveness level is a matter of administrative taste: when a query is for a name in a slow remote zone, the shotgun approach is annoying. When the server is kaput, five seconds can be too long.

The NT4 DNS server is not this aggressive when it does failover queries against remote zones. It waits a few seconds for responses to come back and even ignores ICMP Destination Unreachable (Port Unreachable) errors, which are generated when the DNS service is administratively down but the host is still running. Note that ignoring ICMP errors is not uncommon: the stock Linux resolver also does it, while Solaris and a few others do the right thing.

Anyway, it is possible to get into a situation where the DNS resolver on a WinSock-2 system aggressively fails out while the local DNS server is still searching for an answer. In truth everything is doing what it is supposed to do; it's just that the resolver does it too fast sometimes.

--
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
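The trade-off between the two failover styles described above can be made concrete with a small timing model. This is a simplified sketch, not a description of any real resolver's code: the 5-second and 1-second thresholds are taken from the message above, the function names are hypothetical, and retries, caching and overall give-up limits are ignored.

```python
import math

def sequential_time(response_times, timeout=5.0):
    """Ask servers one at a time, moving on after `timeout` seconds of silence.

    response_times: seconds each server takes to answer (math.inf if dead).
    Returns seconds until the first answer, or math.inf if none ever arrives.
    """
    elapsed = 0.0
    for rt in response_times:
        if rt <= timeout:
            return elapsed + rt
        elapsed += timeout
    return math.inf

def shotgun_time(response_times, first_timeout=1.0):
    """Ask the first server; after `first_timeout`, query all the others at once."""
    first, rest = response_times[0], response_times[1:]
    if first <= first_timeout:
        return first
    # The first query is still outstanding, so a late answer from it counts too.
    candidates = [first] + [first_timeout + rt for rt in rest]
    return min(candidates)

# Hypothetical zone: first-listed server dead, two healthy servers.
servers = [math.inf, 0.2, 0.3]
print("sequential:", sequential_time(servers))
print("shotgun:   ", shotgun_time(servers))
```

With a dead first server the sequential model takes about 5.2 seconds and the shotgun model about 1.2 seconds. With a slow but live first server, say `[3.0, 0.2, 0.3]`, the shotgun model still answers sooner but only by sending extra queries to every other server, which is the annoyance noted above.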
On Sat, 27 Jan 2001, deeann mikula wrote:
i experienced this exact same thing, and it was the secondary ns that NT was "fixating" on when making queries. (the secondary was up and down for a few weeks until a new one was shipped out--yes, off-site and off-AS ;)
Deeann:

Yep, that sounds right. In our case it was also the secondary, and I'm pretty sure it was in fact some NT servers and, as you say, they were "fixating" on the secondary and wouldn't ask the primary even though they had the NS data. Guess I wasn't losing it after all.

Chuck
For example, if you look at the name servers for GENUITY.NET
Domain servers in listed order:
DNSAUTH1.SYS.GTEI.NET 4.2.49.2
DNSAUTH2.SYS.GTEI.NET 4.2.49.3
DNSAUTH3.SYS.GTEI.NET 4.2.49.4
They appear to be closely related. However, the addresses are in fact routed to very diverse locations on Genuity's network.
However, the 4/8 route is what is advertised to the world, and there are certainly occasions where that route fails to be propagated. It's more diverse than adjacent nodes on an ethernet, but hardly as diverse as would be ideal.

Ideally, all DNS servers for a site shouldn't be in the same autonomous system.

--jhawk
(who recently made the observation that there are VBNS-connected root nameservers, but not VBNS-connected gtld servers, so a hypothetical site with a VBNS connection and a commodity connection has great difficulty using their VBNS connection to resolve VBNS names when the commodity connection goes down)
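The shared-route concern can be checked mechanically with a longest-prefix match against a routing table. This is a minimal sketch under stated assumptions: the table below is hypothetical, with 4.0.0.0/8 taken from the thread and 198.6.0.0/16 invented for illustration, and `covering_route` and `shared_fate` are made-up helper names. It does not look at autonomous systems, only at which advertised prefix each address depends on.

```python
import ipaddress

def covering_route(addr, routes):
    """Longest-prefix match: the most specific route containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [r for r in routes if ip in r]
    return max(matches, key=lambda r: r.prefixlen, default=None)

def shared_fate(addresses, routes):
    """True if every address is reached via one and the same route."""
    return len({covering_route(a, routes) for a in addresses}) == 1

# Hypothetical view of the global table; only 4.0.0.0/8 comes from the thread.
routes = [ipaddress.ip_network(p) for p in ("4.0.0.0/8", "198.6.0.0/16")]

genuity = ["4.2.49.2", "4.2.49.3", "4.2.49.4"]
print("GENUITY.NET shares one route:", shared_fate(genuity, routes))
print("mixed set shares one route:  ", shared_fate(["4.2.49.2", "198.6.1.65"], routes))
```

Under this table all three Genuity servers depend on the single 4.0.0.0/8 advertisement, however far apart they are inside the network, so a flap of that one route takes out every listed name server at once.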
participants (11)
- Brian
- Charles Scott
- deeann mikula
- Eric A. Hall
- John Hawkinson
- John Payne
- lucifer@lightbearer.com
- Sean Donelan
- Shawn McMahon
- Thomas Kernen
- woods@weird.com