dns authority changes and lame servers
I find it exceptionally annoying that there is no process whereby the root servers and/or registrars can inform us of new/modified/removed delegations. The end result is that we serve a lot of stale zones long after they leave us.

In the past I've hacked out some perl to audit our BIND configs and find the stuff that's moved, but it's ugly. And really, it's only partially dependable. For example, does the lack of root server records mean that:

1) the customer abandoned the zone and no longer wishes us to host it

- or -

2) the customer forgot to pay for the zone today, and tomorrow will bitch like hell if my script removes it overnight

There are sub-problems of this, mostly related to customers who move and change their company names every six months. So now I have a customer whose zone has expired from the roots (no more email to them) and whose phone number has changed (no way to call and find out what their real intentions re: the expired zone are). It's not worth our time to physically drive to their site to answer a question that has little to no real financial implications for us (thanks to the free hosting of up to three domains with order of T1 service).

So questions:

1) Does anyone else find this flaw in the DNS system as annoying as I do? If authority is to be regularly moved around between ISPs (who may be hosting thousands of customer domains), some automated process is needed to allow the ISP to make intelligent choices about when to remove a customer zone (authority transfers to another provider are likely the thing I'd key on, while non-payment removals would probably have a 30 day grace period since the aforementioned physical moves are the most likely cause of non-payment expiration).

2) Does anyone have a better way of cleaning out the dreck than some home-grown scripts?

I've used sleep() judiciously to avoid beating on any external servers more than necessary, but the output is less than 100% predictable and often hand audits are required before I can really generate automatic removals.

We used to get bitch notices from someone about zones we were supposed to be authoritative for and weren't. This was even more annoying, since often the whole point was that the customer was "parking" it on our servers but had used their 3 freebies and had no real immediate use for it, so neglected to tell us of it. Fine. But give us some notification, from somebody, so we can stick an empty placeholder in there and be ready when it is deployed.

For extra fun, this week a customer simply added their new provider's DNS servers to their zone, without removing ours, or asking us to remove our config. So things were kinda whacky for them until someone called us and asked WTF was going on.
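A minimal sketch of the first step of such an audit: pulling the zone names out of a BIND config so each one can then be checked against its parent. It is written in Python rather than perl, and `zones_in_config` and the sample text are hypothetical. It assumes double-quoted zone names at the start of a line and does not follow `include` statements. The lookups themselves would come afterwards, paced with something like `time.sleep()` as described above.

```python
import re

# Matches declarations such as:  zone "example.com" IN {
# Assumption: the zone name is double-quoted and the statement starts its line.
ZONE_RE = re.compile(r'^[ \t]*zone[ \t]+"([^"]+)"', re.MULTILINE)


def zones_in_config(config_text):
    """Return zone names found in named.conf-style text.

    Names are lowercased and stripped of a trailing dot; duplicates and
    the root zone (".") are skipped.
    """
    names = []
    for match in ZONE_RE.finditer(config_text):
        name = match.group(1).rstrip(".").lower()
        if name and name not in names:
            names.append(name)
    return names


# Hypothetical sample config; the commented-out zone is not picked up
# because the line does not start with the word "zone".
SAMPLE = '''
zone "example.com" { type master; file "db.example.com"; };
zone "Example.NET." IN { type master; file "db.example.net"; };
// zone "old.example.org" { type master; };
'''

print(zones_in_config(SAMPLE))
```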
1) Does anyone else find this flaw in the DNS system as annoying as I do? If authority is to be regularly moved around between ISPs (who may be hosting thousands
As an operator of both free and paid DNS services, I wish there was a quick and easy way to pull a list of all of the zones that were delegated to a specific IP address. I say IP because people can now register their own DNS name servers at the registrar and use our IP addresses, so using the "official" hostname isn't even fool-proof. Being able to pull such an "official" list for forward DNS zones would certainly make life easier.

We also have home-grown scripts that figure out whether a domain is delegated to us or not and flag the ones that aren't. In the case of the free service we flag them for two weeks and if they still aren't delegated to us after that period we disable them on the DNS servers but leave the domain in their account. In the case of the paid service we make a note of the status in the database but do not make any changes to the account (they're paying us, after all, to have it there). We don't do recursive lookups so it's not an issue (even though it's technically an RFC violation, if I remember correctly).

I suppose the problem with having an official list to query would be getting all of the various registries to participate and keep it regularly updated. I personally qualify this as a slight inconvenience, but I'm not sure I would call it a flaw in the DNS system.

-Justin Scott
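The free-versus-paid handling described in this message could be sketched roughly as below. This is only an illustration of the stated policy, not the poster's actual script: the function name `action_for`, the service labels, and the returned action strings are all made up for the example, and the delegation check itself is assumed to happen elsewhere.

```python
from datetime import date, timedelta

# From the message: free zones are flagged for two weeks before being disabled.
FREE_GRACE = timedelta(weeks=2)


def action_for(service, delegated_to_us, flagged_since, today):
    """Decide what to do with one zone.

    service:         "free" or "paid" (hypothetical labels)
    delegated_to_us: result of a separate delegation check
    flagged_since:   date the zone was first seen undelegated, or None
    today:           date of this audit run
    """
    if delegated_to_us:
        return "clear_flag"
    if service == "paid":
        # Paid zones are only annotated; the account is never changed.
        return "note_status"
    if flagged_since is None:
        return "flag"
    if today - flagged_since >= FREE_GRACE:
        # Disabled on the DNS servers, but left in the customer's account.
        return "disable"
    return "keep_flagged"


print(action_for("free", False, date(2007, 10, 1), date(2007, 10, 18)))
```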
Justin Scott wrote:
I suppose the problem with having an official list to query would be getting all of the various registries to participate and keep it regularly updated. I personally qualify this as a slight inconvenience, but I'm not sure I would call it a flaw in the DNS system.
If we just call DNS a distributed database, then it is easy to see that when the keys (glue at root) get updated, the relations to those keys *should* all reflect that change.

The flaw is that the system creates cruft almost continuously. I'd love to see a graph of the cruft on a global scale, because I'm positive that over time it is growing (though in ways that are not always operationally impactful since most of it will be dead and abandoned zones still sitting in our named.conf).

And I'll admit, I'm not sure how to properly fix it either. My first thought was a BIND directive to "expire-stale-zones <interval>;" so that every <interval> the server might check to be sure it is still auth, and if it has found authority changed, would stop giving out AAs for it. But I see all kinds of operational issues arising from that too (such as, how do we gracefully setup new customer's zone before it has transitioned here).

Really, in my ideal Internet, once my server was notified that it was no longer authoritative, it would have an option to do a reverse xfer to the new auth servers (who would then be free to accept/reject the old information as necessary - can't count the number of times I've tried to get customers to provide zone file records in advance and failed because they don't know how/where to get them from). But that's an ideal Internet that will never exist, I know.
mike@rockynet.com (Mike Lewinski) writes:
Justin Scott wrote:
I suppose the problem with having an official list to query would be getting all of the various registries to participate and keep it regularly updated. I personally qualify this as a slight inconvenience, but I'm not sure I would call it a flaw in the DNS system.
If we just call DNS a distributed database, then it is easy to see that when the keys (glue at root) get updated, the relations to those keys *should* all reflect that change. ...
And I'll admit, I'm not sure how to properly fix it either. My first thought was a BIND directive to "expire-stale-zones <interval>;" so that every <interval> the server might check to be sure it is still auth, and if it has found authority changed, would stop giving out AAs for it. But I see all kinds of operational issues arising from that too (such as, how do we gracefully setup new customer's zone before it has transitioned here).
as duane said, it's possible to accomplish this with creative nagios plugins. however, i agree that it's something BIND should do, to be comprehensive. if someone is excited enough about this to consider sponsoring the work, please contact me (vixie@isc.org) to discuss details.
Really, in my ideal Internet, once my server was notified that it was no longer authoritative, it would have an option to do a reverse xfer to the new auth servers (who would then be free to accept/reject the old information as necessary - can't count the number of times I've tried to get customers to provide zone file records in advance and failed because they don't know how/where to get them from). But that's an ideal Internet that will never exist, I know.
it's because we didn't know exactly how to scope this problem that RFC 2136 does not permit the insertion or deletion of authority zones. noting that the ideal internet you want is within our grasp if we can only define it and sponsor it, i recommend taking up this thread on namedroppers@ops.ietf.org or dns-operations@lists.oarci.net.

--
Paul Vixie
On Friday 19 October 2007 01:03, Paul Vixie wrote:
i agree that it's something BIND should do, to be comprehensive. if someone is excited enough about this to consider sponsoring the work, please contact me (vixie@isc.org) to discuss details.
Sounds like a really bad idea to me.

The original problems sound like management issues mostly. Why are they letting customers who don't understand DNS update their NS records, and if they do, why is it a problem for them (and not just the customer who fiddled and broke stuff)?

Similarly we'll provide authoritative DNS for a zone as instructed (and paid for), even if it isn't delegated, if that is what the customer wants.

For as long as one doesn't mix authoritative and recursive servers, it matters not a jot what a server believes it is authoritative for, only what is delegated. Hence one can't "graph the mistakes" as one would have to be psychic to find them.

Perhaps they need to provide DNS status reports to clients, so the clients know if things are misconfigured? Monitoring/measuring is the first step in managing most things.

But I think it is far more important to find and fix what is broken than to try and let the machines prune it down when something is wrong, although I guess breaking things that are misconfigured is a good way to get them fixed ;)
Justin Scott wrote:
As an operator of both free and paid DNS services, I wish there was a quick and easy way to pull a list of all of the zones that were delegated to a specific IP address. I say IP because people can now register their own DNS name servers at the registrar and use our IP addresses, so using the "official" hostname isn't even fool-proof. Being able to pull such an "official" list for forward DNS zones would certainly make life easier.
How annoying or frustrating is it for people? Is it so annoying that you'd be willing to pay for a list of every public-facing NS record pointed at a given IP? I should also mention the related work starting over here: http://www.nanog.org/mtg-0710/presentations/Vixie-lightning.pdf -David
How annoying or frustrating is it for people?
Is it so annoying that you'd be willing to pay for a list of every public-facing NS record pointed at a given IP?
Nope. As I mentioned earlier, I qualify this as a minor inconvenience on the servers that I manage. It may be for someone who manages more zones than I do though. -Justin Scott
davidu@everydns.net (David Ulevitch) writes:
I should also mention the related work starting over here: http://www.nanog.org/mtg-0710/presentations/Vixie-lightning.pdf
indeed. while i don't have even a tenth of the analysis expertise of someone like robt, wessels, florian, or april, i am most assuredly going to gather the raw data and make it available to those folks and similar folks. (noting that i've got 5Mbit/sec now and am hoping for 1000X that much a year from now, and noting that robt, wessels, florian, april, paul laudanski, and jeff chan have already got dedicated or shared hosts connected to the rebroadcast switch, and that more are welcome.)

we may yet publish a top-500-domains web page, since that's a fairly easy thing to build using this raw data. current zonecuts, and nameserver name or address deltas, may also come from us, though i think it'll come sooner from wessels, april, or florian.

if you're not submitting data yet, i hope you'll decide to do so, and drop me some e-mail (vixie@isc.org) to discuss details.

--
Paul Vixie
Justin Scott wrote:
We also have home-grown scripts that figure out whether a domain is delegated to us or not and flag the ones that aren't. In the case of the free service we flag them for two weeks and if they still aren't delegated to us after that period we disable them on the DNS servers but leave the domain in their account. In the case of the paid service we make a note of the status in the database but do not make any changes to the account (they're paying us, after all, to have it there). We don't do recursive lookups so it's not an issue (even though it's technically an RFC violation, if I remember correctly).
We use home-grown scripts to follow the NS trail and verify that we are listed in some form or fashion. If we aren't, we handle the problem based on the criteria. If the domain is listed elsewhere, we immediately remove and notify. If the domain isn't listed in the TLD, we notify but hold the domain for, I think, 30 days before removing it, unless the status changes.

Jack
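The three outcomes described in this message can be expressed as a small classification step. The sketch below is an assumption about how such a script might be shaped, not the poster's code: `classify` and its action names are hypothetical, and fetching the NS set from the TLD servers is left to whatever resolver library is in use. Note that it compares host names only, so it would miss the case raised earlier in the thread where a customer registers their own name server names on top of the provider's IP addresses.

```python
def classify(parent_ns, our_servers):
    """Pick an action for one zone.

    parent_ns:   NS host names the TLD servers return for the zone
                 (empty if the zone is not delegated at all)
    our_servers: host names of our own name servers
    """
    parent = {name.rstrip(".").lower() for name in parent_ns}
    ours = {name.rstrip(".").lower() for name in our_servers}

    if not parent:
        # Not listed in the TLD: notify, then hold (about 30 days per the
        # message) in case the status changes.
        return "notify_and_hold"
    if parent & ours:
        # We are listed "in some form or fashion".
        return "ok"
    # Delegated, but only to someone else.
    return "remove_and_notify"


OURS = ["ns1.isp.example", "ns2.isp.example"]  # hypothetical server names
print(classify(["NS1.ISP.EXAMPLE.", "ns.other.example."], OURS))
```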
On Thu, 18 Oct 2007, Jack Bates said:
We use home-grown scripts to follow the NS trail and verify that we are
I do something similar with a nagios plugin (perl script). It reports lameness and serial mismatch. I've put it online here: http://www.life-gone-hazy.com/src/nagios/check_zone_auth Duane W.
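For readers who have not written a Nagios plugin, the two conditions mentioned here (lameness and serial mismatch) map naturally onto plugin exit codes. The following Python sketch is not a port of the perl plugin linked above. The function name, the input format, and the choice of CRITICAL for lameness and WARNING for a serial mismatch are all assumptions made for illustration, and the per-server answers are assumed to have been collected already.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_zone(zone, answers):
    """Return (exit_code, status_line) for one zone.

    answers maps server name -> (is_authoritative, soa_serial), i.e. whether
    the server set the AA bit for the zone and which SOA serial it returned.
    """
    if not answers:
        return UNKNOWN, f"UNKNOWN: no answers collected for {zone}"

    lame = sorted(s for s, (aa, _) in answers.items() if not aa)
    if lame:
        return CRITICAL, f"CRITICAL: {zone} lame on {', '.join(lame)}"

    serials = sorted({serial for _, serial in answers.values()})
    if len(serials) > 1:
        return WARNING, f"WARNING: {zone} serial mismatch {serials}"

    return OK, f"OK: {zone} serial {serials[0]}"


# A real plugin would print the status line and exit with the code.
code, line = check_zone(
    "example.com",
    {"ns1.example.net": (True, 2007101801), "ns2.example.net": (True, 2007101701)},
)
print(code, line)
```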
This report used to be quite useful in that regard: http://www.cymru.com/DNS/lame.html Perhaps Rob needs a coffee injection to get that going again? (BTW: Need/want some more of our famous "Colo Blend" Mr. Thomas?) --chuck
Hi, Chuck!
This report used to be quite useful in that regard:
http://www.cymru.com/DNS/lame.html
Perhaps Rob needs a coffee injection to get that going again?
Oh, my, I'd totally forgotten about that report. I do need to get that going again. I'll dig around now to see what we can produce in short order.
(BTW: Need/want some more of our famous "Colo Blend" Mr. Thomas?)
That was some of the best joe I've had, and I'd welcome another batch! Just don't tell the rest o' Team Cymru about it - it's mine, all mine! Muahaha! :)

Thanks!
Rob.
--
Rob Thomas
Team Cymru
http://www.cymru.com/
cmn_err(do_panic, "Out of coffee!");
On Thu, Oct 18, 2007 at 12:27:35PM -0600, Mike Lewinski wrote:
I find it exceptionally annoying that there is no process whereby the root servers and/or registrars can inform us of new/modified/removed delegations.
Why can't you just query the other side of the zone cut once a day/week/month/youpick and compare the NS set from the delegating side to the NS set you have as the presumed authority side? That combined with a bit of information only you would have about which of your mismatches are changes you're currently managing, and which are surprises, would surely give you the data you need?

At the same time, I'll point out that registries, at least, are under some pressure not to release too much information about this sort of thing. Nevertheless, various third parties are obtaining regular zone snapshots, and then making some sort of business out of their conclusions from the zone data. I'd (personally, not speaking for my employer) love to be able to offer such services, but any time a registry operator suggests anything of the sort, people get angry.

To answer specific questions:
1) Does anyone else find this flaw in the DNS system as annoying as I do?
I don't think this is a "flaw in the DNS system" as much as it is a consequence of the funny economics currently on display among domain name registrars, DNS operators, and ISPs.
2) Does anyone have a better way of cleaning out the dreck than some home-grown scripts?
If you pay someone else to operate your DNS, then you get to offload the dreck-cleaning to them! But other than that, no.

Best regards,
A

--
----
Andrew Sullivan
Afilias Canada
<andrew@ca.afilias.info>
204-4141 Yonge Street
Toronto, Ontario Canada M2P 2A8
+1 416 646 3304 x4110
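The periodic comparison suggested in this message, delegating side against presumed-authority side, reduces to a set difference per zone plus a list of zones already known to be in flux. The sketch below assumes both NS sets have been fetched elsewhere; `ns_mismatch`, `surprises`, and the sample data are hypothetical names and values for the example.

```python
def normalize(names):
    """Lowercase host names and drop any trailing dot so sets compare cleanly."""
    return {name.rstrip(".").lower() for name in names}


def ns_mismatch(parent_ns, child_ns):
    """Return (only_in_parent, only_in_child) for one zone."""
    parent, child = normalize(parent_ns), normalize(child_ns)
    return parent - child, child - parent


def surprises(zones, in_progress):
    """Report mismatched zones that are not known pending changes.

    zones:       dict of zone -> (parent_ns, child_ns)
    in_progress: set of zones whose delegation change we are managing
    """
    report = {}
    for zone, (parent_ns, child_ns) in sorted(zones.items()):
        only_parent, only_child = ns_mismatch(parent_ns, child_ns)
        if (only_parent or only_child) and zone not in in_progress:
            report[zone] = {
                "only_in_parent": sorted(only_parent),
                "only_in_child": sorted(only_child),
            }
    return report


ZONES = {  # hypothetical audit input
    "a.example": (["ns1.isp.example."], ["ns1.isp.example"]),
    "b.example": (["ns.other.example"], ["ns1.isp.example"]),
    "c.example": (["ns.new.example"], ["ns1.isp.example"]),
}
print(surprises(ZONES, in_progress={"c.example"}))
```

Only `b.example` is reported here: `a.example` matches once names are normalized, and `c.example` is a change the operator already knows about.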
Andrew Sullivan wrote:
I don't think this is a "flaw in the DNS system" as much as it is a consequence of the funny economics currently on display among domain name registrars, DNS operators, and ISPs.
I suppose it is a social problem at the very bottom here. If my users were educated enough to notify me when they moved authority I wouldn't have this problem. Maybe it's not fair to ask the Registrars/Roots to provide updates when it's really incumbent on their customers to do so. But then I start to balk -- any process that involves duplicate updates of one piece of information in two disparate systems is inefficient at best, and inherently prone to these kinds of errors even with good intentions.

There is an economic factor at play in our smaller scale operation. It's barely worth the time of billing to track all these "free" dns hostings. If we charged for it, the customers might be more attentive and notify us in order to be released from the charges (but likely we can't charge enough to really even make it worth their time either).

At one level this is all a minor nuisance. Then I hear of the customer who, doing business with another former customer in the same building, spent a year printing out and walking over their emails because they were too lazy to call us and find out why they weren't getting through. I can pretty fairly claim that's "not our fault" that no one bothered to ask us to remove the cruft, but the customers on the receiving end of the DNS black hole just know that our DNS server was "broken" and "didn't get an update" and next week they'll be calling me asking me to "update my cache" when they can't get to foobar.com.
On Thu, 18 Oct 2007, Mike Lewinski wrote:
At one level this is all a minor nuisance. Then I hear of the customer who, doing business with another former customer in the same building, spent a year printing out and walking over their emails because they were too lazy to call us and find out why they weren't getting through. I can pretty fairly claim that's "not our fault" that no one bothered to ask us to remove the cruft, but the customers on the receiving end of the DNS black hole just know that our DNS server was "broken" and "didn't get an update" and next week they'll be calling me asking me to "update my cache" when they can't get to foobar.com.
Sounds like the real problem is that your authoritative and caching DNS servers are mixed up.

If they are split then it doesn't really matter if you still host a lame record because (since it's lame) nobody will ask you about it.

--
Simon J. Lyall | Very Busy | Web: http://www.darkmere.gen.nz/
"To stay awake all night adds a day to your life" - Stilgar | eMT.
Simon Lyall wrote:
Sounds like the real problem is that your authoritative and caching DNS servers are mixed up.
Understood. I've worked to turn off recursion to the world and made it through that without too much pain (except for the people who transport statically configured laptops on and off our network). The next step isn't trivial since it's a matter of updating quite a lot of data. It's important and we're working on it for the benefit of the customers, but this will be an operational issue for us for a while.

I'm sure I'll get a response telling me to just change the glue at root for the NS and be done, but that won't help any other externally registered names pointing to my DNS with their own glue at root. Then there are the ARPAs, all with "interesting" pedigrees and various processes (true, they are least likely to be the problem, but now I have to split the zone management onto more than one server so it's not as simple as just changing my glue at root).

And there's the case in the last few years of $REAL_BIG_ILEC who provides DSL service and has the same configuration we do. It took some legalish threats all the way to their CEO to get a stale zone removed, after 9 months of attempting to work through the "regular" channels (even their former customer couldn't get the request processed!). Their policy is apparently to not remove zones, ever.

So no matter how quickly I transition my network, this is still going to affect your customers some day, because there are a lot of other people in the same boat I am - lots of statically configured DNS resolvers aren't going to change themselves and if the same caching servers are also hosting thousands of zones that were added incrementally over the last 12+ years....

We gave up long ago trying to get our technical contacts listed on each customer domain whois / registrar role account, because we couldn't get better than 50% response rate.
If they are split then it doesn't really matter if you still host a lame record because (since it's lame) nobody will ask you about it.
It's still cruft and ideally should still be cleaned up automatically based on the external authority changing.
On 20/10/2007, at 1:24 PM, Mike Lewinski wrote:
Simon Lyall wrote:
Sounds like the real problem is that your authoritative and caching DNS servers are mixed up.
Understood. I've worked to turn off recursion to the world and made it through that without too much pain (except for the people who transport statically configured laptops on and off our network). The next step isn't trivial since it's a matter of updating quite a lot of data. It's important and we're working on it for the benefit of the customers, but this will be an operational issue for us for a while.
I've yet to try it, but if you're running BIND you should be able to split it up into views:

- View A takes queries from your end users (based on source IP) and acts as a recursive cache.
- View B takes queries from everyone else (catchall) and answers authoritatively.

You'll probably run into a couple of problems where an end user needs an authoritative answer for a name you are authoritative for, but that'll be a small percentage I expect.

Again, I haven't tested this, but I can't see any obvious reason why it wouldn't work.
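The view idea can be modelled in a few lines to show the selection logic: views are tried in order and the first one whose client list matches the source address handles the query. The Python below is only a model of that behaviour, not BIND configuration. The view names, the 192.0.2.0/24 customer range, and `select_view` are assumptions for the example, and it says nothing about the edge case mentioned above where an end user needs an authoritative answer.

```python
import ipaddress

# Hypothetical view table, checked top to bottom; the first match wins.
# Each entry: (view name, client networks, recursion allowed)
VIEWS = [
    ("internal", [ipaddress.ip_network("192.0.2.0/24")], True),   # end users: recursive cache
    ("external", [ipaddress.ip_network("0.0.0.0/0")], False),     # everyone else: authoritative only
]


def select_view(client_ip):
    """Return (view_name, recursion_allowed) for a query source address."""
    addr = ipaddress.ip_address(client_ip)
    for name, networks, recursion in VIEWS:
        if any(addr in net for net in networks):
            return name, recursion
    # No view matched (here: any IPv6 client, since only IPv4 ranges are listed).
    return None, False


for ip in ("192.0.2.10", "198.51.100.7"):
    print(ip, select_view(ip))
```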
If they are split then it doesn't really matter if you still host a lame record because (since it's lame) nobody will ask you about it.
It's still cruft and ideally should still be cleaned up automatically based on the external authority changing.
Maybe. Note that the same is true of MTA and MX servers. (ie. MX record points at the same place for domains you host, as your customers do to send mail to domains you don't host). -- Nathan Ward
The correct way to change a delegation is to:

* add the new servers as stealth servers for the current zone.
* if the old master is to be removed, make it a slave of the new master.
* add the new NS records to the zone.
* wait for all the slaves to have the new zone.
* inform the parent zone of the new NS records.
* wait until all the old NS RRsets have expired from caches (implies waiting for the parent's changes to propagate).
* remove the old NS records from the zone.
* wait for all the slaves to update.
* inform the parent zone of the new NS records.
* wait until all the intermediate NS RRsets have expired from caches (implies waiting for the parent's changes to propagate).
* any slaves that are not being removed and that are still using the old master (or a slave that is going away) need to be configured to use the new master by this point.
* stop serving the zone on the old servers.

Note: all through this process the nameservers listed in the NS records are answering for the zone in a consistent manner.

Note: even if the parents informed you that the delegation was removed you still have to wait for the records to expire from caches *before* you can stop serving the zone.

One can collapse the above slightly by informing the parent of the final NS RRset, rather than the intermediate NS RRset, but that won't work with registrars that check the child's NS RRset.

One way to get around this would be to charge a cleanup fee that only gets charged when the client fails to notify you in advance that they are going to change delegations.

Mark
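The "wait until the old NS RRsets have expired from caches" steps come down to simple arithmetic once two things are known: when the last server stopped handing out the old RRset, and that RRset's TTL. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the TTL values are examples only, and it assumes caches honour TTLs, so in practice some extra margin would be sensible.

```python
from datetime import datetime, timedelta


def expired_by(last_served_at, ttl_seconds):
    """Latest moment a cache could still hold an RRset last served at that time."""
    return last_served_at + timedelta(seconds=ttl_seconds)


def earliest_shutdown(parent_propagated_at, parent_ns_ttl,
                      slaves_updated_at, child_ns_ttl):
    """Earliest time the old servers can stop serving the zone.

    parent_propagated_at: when every parent server had the new delegation
    parent_ns_ttl:        TTL (seconds) on the old delegation NS RRset
    slaves_updated_at:    when every zone server had the new NS RRset
    child_ns_ttl:         TTL (seconds) on the old in-zone NS RRset
    """
    return max(
        expired_by(parent_propagated_at, parent_ns_ttl),
        expired_by(slaves_updated_at, child_ns_ttl),
    )


# Example values only: a two-day delegation TTL and a one-day in-zone TTL.
when = earliest_shutdown(
    datetime(2007, 10, 20, 0, 0), 172800,
    datetime(2007, 10, 19, 12, 0), 86400,
)
print(when.isoformat())
```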
participants (14)
- Andrew Sullivan
- chuck goolsbee
- David Ulevitch
- Duane Wessels
- Jack Bates
- Justin Scott
- Mark Andrews
- Mike Lewinski
- Nathan Ward
- Paul Vixie
- Rob Thomas
- schahzad@gol.net.pk
- Simon Lyall
- Simon Waters