> i experienced this exact same thing, and it was the secondary ns that NT was "fixating" on when making queries. (the secondary was up and down for a few weeks until a new one was shipped out--yes, off-site and off-AS ;)
> i had a VERY hard time explaining to NT professionals that their email to our domains shouldn't be bouncing, and that 99% of the internet could get mail to our domains just fine with one operating nameserver. i also didn't have any proof that NT didn't do The Right Thing, and no one wanted to help me prove it by hanging on the phone with me after they had complained that "your nameservers are down." is this misbehavior of NT documented anywhere? is it fixable? i don't know d*ck about NT, but i'd love to be able to at least suggest a fix and give someone a URL.
The DNS resolver for normal run-of-the-mill lookups handles failover properly. If anything, it is too aggressive. The algorithm suggested in RFC 1035 is to wait 5 seconds for a timeout before trying another server, while with WinSock-2 resolvers the timeout threshold is one second, after which multiple unique queries are sent shotgun-fashion to ALL of the other servers simultaneously.

The right level of aggressiveness is a matter of administrative taste: when a query is for a name in a slow remote zone, the shotgun approach is annoying, but when a server is kaput, five seconds can be too long.

The NT4 DNS server is not this aggressive when it does failover queries against remote zones. It waits a few seconds for responses to come back, and it even ignores ICMP Destination Unreachable (Port Unreachable) errors, which are generated when the DNS service on a remote host is administratively down but the host itself is still running. Ignoring ICMP errors is not uncommon; the stock Linux resolver also does it, while Solaris and a few others do the right thing.

Anyway, it is possible to get into a situation where the DNS resolver on a WinSock-2 system aggressively gives up while the local DNS server is still searching for an answer. In truth everything is doing what it is supposed to do; the resolver just does it too fast sometimes.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols        http://www.oreilly.com/catalog/coreprot/
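The two retry policies described above (RFC 1035-style "wait 5 seconds, then try the next server" versus the WinSock-2 "wait 1 second, then query everybody") can be compared with a toy timing model. This is a hypothetical sketch, not the actual WinSock-2 or NT4 DNS server logic: it assumes each server has a fixed response latency, ignores retransmissions and ICMP handling, and invents an overall `deadline` after which the stub resolver reports failure. The function names are made up for illustration.

```python
from typing import Optional, Sequence

# Each entry is one configured nameserver's response latency in seconds,
# or None if that server never answers (dead host, stopped service, etc.).
Latencies = Sequence[Optional[float]]


def sequential_failover(latencies: Latencies, timeout: float = 5.0) -> Optional[float]:
    """Try servers one at a time, waiting `timeout` seconds for each.

    Returns the elapsed time at which an answer arrives, or None if every
    server times out.
    """
    elapsed = 0.0
    for latency in latencies:
        if latency is not None and latency <= timeout:
            return elapsed + latency
        # No answer within the window: give up on this server and move on.
        elapsed += timeout
    return None


def shotgun_failover(
    latencies: Latencies,
    first_timeout: float = 1.0,
    deadline: Optional[float] = None,
) -> Optional[float]:
    """Query the first server; if it has not answered after `first_timeout`
    seconds, query all remaining servers at once and take the earliest reply.

    `deadline` is a hypothetical total time after which the resolver gives
    up and reports failure to the application (None means wait forever).
    """
    if not latencies:
        return None
    first = latencies[0]
    if first is not None and first <= first_timeout:
        return first
    # Replies that could still arrive: the first server's late answer, plus
    # every other server, all queried simultaneously at t = first_timeout.
    arrivals = [first] if first is not None else []
    arrivals += [first_timeout + lat for lat in latencies[1:] if lat is not None]
    if deadline is not None:
        arrivals = [t for t in arrivals if t <= deadline]
    return min(arrivals) if arrivals else None
```

With a dead first server and a healthy second one, `sequential_failover([None, 0.5])` gets an answer at 5.5 seconds while `shotgun_failover([None, 0.5])` gets one at 1.5 seconds, which is the "five seconds can be too long" case. The failure described in the last paragraph corresponds to something like `shotgun_failover([4.0], deadline=3.0)`: the local DNS server would answer after 4 seconds of searching, but the resolver has already given up and returns None.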