Unix machines set up by anyone with half a brain run a local caching server and use forwarders. I.e., the nameserver process can establish a persistent TCP connection to its trusted forwarders, if we just let it. That's the old sneer we used to use against Windows users: not having a "full featured host" and all. Windows stub resolvers multiplex through AD to an MS DNS server, which can easily use TCP to its trusted forwarders. The exception is when they have no DC, which is not so common; in that case they just use standard queries, presumably to a patched ISP host (often a Nominum box). In both cases, the fix is in the local server, which serves only a few machines (and in a "full featured host" only one) and uses TCP to its forwarder, with the chain repeating itself upstream. I don't see the problem with going to TCP for the recursive queries here. It's akin to the CDN scaling model, which has worked pretty well.
-----Original Message-----
From: Joe Greco [mailto:jgreco@ns.sol.net]
Sent: Sunday, August 10, 2008 3:14 PM
To: Chris Paul
Cc: nanog@merit.edu
Subject: Re: maybe a dumb idea on how to fix the dns problems i don't know....
But we only care about TCP connection setup time in *interactive* sessions (a human using something like the web). If you have a persistent connection from the DNS resolver on your browser machine to your DNS server, you just send the request.... no TCP setup there at all. You can even pool connections. We do this stuff in LDAP all the time.
How does TCP resolution work in most resolver libraries? A TCP connection for each lookup? That is kind of dumb isn't it, speaking of dumb.... I actually don't know. Not much of a coder, so I'll let you coders check your code and get back to me on that...
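For anyone who wants to poke at this, here is a rough Python sketch of what "one persistent connection, many lookups" looks like on the wire. It relies only on the fact that DNS over TCP prefixes each message with a two-byte length (RFC 1035, section 4.2.2). It says nothing about what any particular resolver library actually does; the function names (build_query, frame, read_message, lookup_many) are made up for illustration.

```python
import socket
import struct


def build_query(name: str, txid: int, qtype: int = 1) -> bytes:
    """Encode a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte length for TCP transport."""
    return struct.pack("!H", len(message)) + message


def read_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since TCP may deliver them in pieces."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def read_message(sock: socket.socket) -> bytes:
    """Read one length-prefixed DNS message from the stream."""
    (length,) = struct.unpack("!H", read_exact(sock, 2))
    return read_exact(sock, length)


def lookup_many(sock: socket.socket, names: list[str]) -> list[bytes]:
    """Send one query per name over a single, already-open TCP socket.

    Hypothetical helper: waits for each reply before sending the next
    query, so there is no pipelining and no connection setup per lookup.
    """
    replies = []
    for txid, name in enumerate(names, start=1):
        sock.sendall(frame(build_query(name, txid)))
        replies.append(read_message(sock))
    return replies
```

Against a real forwarder you would open the socket once, e.g. socket.create_connection(("192.0.2.53", 53)) with that placeholder address swapped for your own server, and hand it to lookup_many for as many names as you like. Whether the server keeps the connection open between queries is up to the server.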
well.. maybe i'll fire up snort or wireshark and check it out later with some different dns libs....
Pretending for a moment that it was even possible to make such large scale changes and get them pushed into a large enough number of clients to matter, you're talking about meltdown at the recurser level, because it isn't just one connection per _computer_, but one connection per _resolver stub_ per _computer_ (which, on a UNIX machine, would tend to gravitate towards one connection per process), and this just turns into an insane number of sockets you have to manage.
For your average ISP recurser where they only have 50,000 people online at any given time, this could still be half a million open sockets. We already know this sort of thing doesn't scale well.
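The arithmetic behind that figure, as a back-of-the-envelope sketch in Python. The 10 connections per user is not stated in the message; it is simply what half a million sockets over 50,000 users implies, so treat it as an assumption. The second line shows the same count if each user held a single connection instead.

```python
def open_sockets(users_online: int, connections_per_user: int) -> int:
    """Sockets the recurser holds if every connection stays open."""
    return users_online * connections_per_user


USERS_ONLINE = 50_000          # figure from the message
STUBS_PER_USER = 10            # assumption: implied by 500,000 / 50,000

per_stub = open_sockets(USERS_ONLINE, STUBS_PER_USER)  # one per resolver stub
per_host = open_sockets(USERS_ONLINE, 1)               # one per user machine

print(per_stub, per_host)
```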
This is very broken in any number of other ways. This message is not intended to imply otherwise.
... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.