It's usually interesting to be proven wrong, but perhaps not in this case. I was among the first to point out that the 11-second DNS poisoning claim made by Vixie only worked out to about a week of concentrated attack after the patch. This was a number I extrapolated purely from Paul's 11-second figure and the factor of ~65,000 introduced by port randomization.

I am very, very, very disheartened to be shown to be wrong. As if 8 days wasn't bad enough, a concentrated attack has been shown to be effective in 10 hours. See http://www.nytimes.com/2008/08/09/technology/09flaw.html

With modern data rates being what they are, I believe that this is still a severe operational hazard, and would like to suggest a discussion of further mitigation strategies. On my list of concepts:

1) Use of multiple IP addresses for queries (reduce success rate somewhat)

2) Rate-limiting of query traffic, since I really doubt many sites actually have recursors that need to be able to spike to many times their normal traffic.

3) Forwarding of failed queries (which I believe BIND doesn't currently allow) to a "backup" server (which would seem to be interesting in combination with 2)

4) I wonder if it wouldn't make sense to change the advice for large-scale recursors to run multiple instances of BIND and internally distribute the requests (random pf/ipfw load balancing) to present a version of 1) that would render smaller segments of the user base vulnerable in the event of success. It would represent more memory, more CPU, and more requests, but a smaller victory for attackers.

5) Modify BIND to report mismatched QIDs. Not a log report per hit, but some reasonable strategy. Make the default installation instructions include a script to scan for these - often - and mail hostmaster.

6) Have someone explain to me the reasoning behind allowing the corruption of in-cache data, even if the data would otherwise be in-bailiwick. I'm not sure I quite get why this has to be. It would seem to me to be safer to discard the data. (This does not eliminate the problem, but would seem to me to reduce it.)

7) Have someone explain to me the repeated claims I've seen that djbdns and Nominum's server are not vulnerable to this, and why that is.

It would seem that the floor is wide open to a large number of possibilities for mitigating this beyond the patch.

... JG

--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam (CNN)
With 24 million small businesses in the US alone, that's way too many apples.
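The week-long figure follows directly from the two inputs above; a quick check, assuming port randomization contributes the full 16-bit space (which slightly overstates the ~65,000 factor):

```python
# Sanity check of the "about a week" extrapolation: pre-patch poisoning
# time multiplied by the extra search space from source-port randomization.
# Assumes the full 2**16 port space; real usable pools are somewhat smaller.

baseline_seconds = 11        # Vixie's pre-patch figure
port_factor = 2 ** 16        # ~65,000x from port randomization

total = baseline_seconds * port_factor
print(f"{total} s = {total / 86400:.1f} days")   # 720896 s = 8.3 days
```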
jgreco@ns.sol.net (Joe Greco) writes:
I am very, very, very disheartened to be shown to be wrong. As if 8 days wasn't bad enough, a concentrated attack has been shown to be effective in 10 hours. See http://www.nytimes.com/2008/08/09/technology/09flaw.html
that's what theory predicted. guessing a 30-or-so-bit number isn't "hard."
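Where the "30-or-so bits" comes from, roughly: a 16-bit query ID plus whatever entropy the source-port pool provides. The pool size below is an assumption for illustration; smaller or partially predictable pools pull the total down toward 30 bits:

```python
import math

# Entropy an off-path spoofer must guess per outstanding query.
qid_bits = 16                      # 16-bit DNS query ID
port_pool = 64512                  # assumed usable ports, 1024-65535

bits = qid_bits + math.log2(port_pool)
print(f"~{bits:.0f} bits -> on the order of {2 ** bits:.1e} forged replies "
      "for good odds of a hit")
# ~32 bits -> on the order of 4.2e+09 forged replies
```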
With modern data rates being what they are, I believe that this is still a severe operational hazard, and would like to suggest a discussion of further mitigation strategies. ...
i have two gripes here.

first, can we please NOT use the nanog@ mailing list as a workshop for discussing possible DNS spoofing mitigation strategies? namedroppers@ops.ietf.org already has a running gun battle on that topic, and dns-operations@lists.oarci.net would be appropriate. but unless we're going to talk about deploying BCP38, which would be the mother of all mitigations for DNS spoofing attacks, it's offtopic on nanog@.

second, please think carefully about the word "severe". any time someone can cheerfully hammer you at full-GigE speed for 10 hours, you've got some trouble, and you'll need to monitor for those troubles. 11 seconds at 10Mbit/sec fits my definition of "severe". 10 hours at 1000Mbit/sec doesn't.

--
Paul Vixie
On Aug 9, 2008, at 6:23 PM, Paul Vixie wrote:
second, please think carefully about the word "severe". any time someone can cheerfully hammer you at full-GigE speed for 10 hours, you've got some trouble, and you'll need to monitor for those troubles. 11 seconds at 10Mbit/sec fits my definition of "severe". 10 hours at 1000Mbit/sec doesn't.
I think what we're seeing here is the realization that DNS hosting, like web hosting, is no longer something that can simply be done by tossing a machine on the internet and leaving it there; it needs professional management, monitoring, and updates. That's always a hard transition for some people to make, but it's one that has to be made; that's the world we live in.

Kee Hinckley
CEO/CTO, Somewhere Inc.
Somewhere: http://www.somewhere.com/
TechnoSocial: http://xrl.us/bh35i

I'm not sure which upsets me more; that people are so unwilling to accept responsibility for their own actions, or that they are so eager to regulate those of everybody else.
* Joe Greco:
I am very, very, very disheartened to be shown to be wrong. As if 8 days wasn't bad enough, a concentrated attack has been shown to be effective in 10 hours. See http://www.nytimes.com/2008/08/09/technology/09flaw.html
Note that the actual bandwidth utilization on that GE link should be somewhere between 10% and 20% if you send minimally sized replies during spoofing. In fact, the theoretically predicted time to 50% success probability for a 100 Mbit/s attack is below one day. This also matches the numbers posted here:

<http://tservice.net.ru/~s0mbre/blog/devel/networking/dns/2008_08_08.html>
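That prediction can be reproduced with a simple per-packet model. Everything below is an illustrative assumption: ~100 bytes per forged reply on the wire, a race window held open continuously (the Kaminsky technique), and 2^32 query-ID/port combinations:

```python
import math

# Each forged reply is modeled as hitting independently with probability 1/N.
link_bps = 100e6                 # 100 Mbit/s of spoofed traffic
reply_bytes = 100                # minimally sized reply, roughly
n = 2 ** 32                      # query-ID x source-port combinations

rate = link_bps / (reply_bytes * 8)     # forged replies per second
t50 = math.log(2) * n / rate            # time to 50% success probability
print(f"{rate:.0f} replies/s, 50% success after {t50 / 3600:.1f} hours")
# -> 125000 replies/s, 50% success after 6.6 hours
```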
1) Use of multiple IP addresses for queries (reduce success rate somewhat)
You must implement this carefully. Just using a load-balanced DNS setup doesn't work, for instance. The attacker could trigger the cache misses through a CNAME he controls, so he'd know which instance to attack in each round.
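A toy calculation of the point: instance diversity only adds bits against a blind attacker. The bit counts and instance count below are illustrative assumptions:

```python
import math

# If the resolver chases a CNAME through a nameserver the attacker runs,
# the attacker observes which instance (source IP) performs the lookup in
# that round, so the IP factor drops out of his search space.
base_bits = 30            # QID + port entropy, roughly
instances = 4             # load-balanced resolver source IPs

blind = base_bits + math.log2(instances)   # attacker can't see the source IP
steered = base_bits                        # attacker-controlled CNAME leaks it
print(f"blind attacker: {blind:.0f} bits; CNAME-steered: {steered:.0f} bits")
```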
2) Rate-limiting of query traffic, since I really doubt many sites actually have recursors that need to be able to spike to many times their normal traffic.
The problem with that is that 130,000 queries over a 10-hour period (as in Evgeniy's experiment) are often lost in the noise. The attacker only benefits from high query rates if the authoritative servers are close to your recursor in RTT terms.
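In numbers, using the figures just quoted:

```python
# Average extra query load during Evgeniy's reported 10-hour attack.
queries, hours = 130_000, 10
print(f"{queries / (hours * 3600):.1f} queries/second on average")  # ~3.6 qps
```

A rate limiter tuned low enough to catch an extra 3.6 queries per second would also trip on perfectly legitimate traffic bursts.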
3) Forwarding of failed queries (which I believe BIND doesn't currently allow) to a "backup" server (which would seem to be interesting in combination with 2)
I don't think any queries fail in this scenario.
4) I wonder if it wouldn't make sense to change the advice for large-scale recursors to run multiple instances of BIND and internally distribute the requests (random pf/ipfw load balancing) to present a version of 1) that would render smaller segments of the user base vulnerable in the event of success. It would represent more memory, more CPU, and more requests, but a smaller victory for attackers.
User-specific DNS caches are interesting from a privacy perspective, too. But I don't think they'll work, except when the cache is in the CPE.
5) Modify BIND to report mismatched QIDs. Not a log report per hit, but some reasonable strategy. Make the default installation instructions include a script to scan for these - often - and mail hostmaster.
Yes, better monitoring is crucial. Recent BIND 9.5 has a counter for mismatched replies, which should provide at least one indicator. Due to the diversity of potential attacks, it's very difficult to set up generic monitoring.
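In the spirit of item 5, a cron-able sketch of a scan-and-mail script. The log path, match pattern, and threshold are placeholders to adapt, not actual BIND log format:

```python
#!/usr/bin/env python3
"""Count mismatched-reply log lines and mail hostmaster when they spike.

LOGFILE and PATTERN below are placeholders, not real BIND output --
adjust both to whatever your named version logs on a QID/port mismatch.
"""
import re
import smtplib
from email.message import EmailMessage

LOGFILE = "/var/log/named.log"     # assumption: adjust to your setup
PATTERN = re.compile(r"mismatch")  # placeholder pattern
THRESHOLD = 100                    # alert above this many hits per run

hits = 0
with open(LOGFILE) as f:
    for line in f:
        if PATTERN.search(line):
            hits += 1

if hits > THRESHOLD:
    msg = EmailMessage()
    msg["Subject"] = f"possible DNS spoofing: {hits} mismatched replies"
    msg["From"] = "named-monitor@localhost"
    msg["To"] = "hostmaster@localhost"
    msg.set_content(f"{hits} mismatched replies seen in {LOGFILE}; investigate.")
    with smtplib.SMTP("localhost") as s:
        s.send_message(msg)
```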
6) Have someone explain to me the reasoning behind allowing the corruption of in-cache data, even if the data would otherwise be in-bailiwick. I'm not sure I quite get why this has to be. It would seem to me to be safer to discard the data. (This does not eliminate the problem, but would seem to me to reduce it.)
The idea is that the delegated zone can introduce additional servers not listed in the delegation. (It's one thing that gets you a bit of IPv6 traffic.) Unfortunately, it's likely that performance would suffer for some sites if resolvers discarded that data rather than updating the cache.
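A sketch of what item 6's tightening might look like in a resolver's cache-acceptance logic; the data structure and function names are hypothetical, not BIND's implementation:

```python
# Bailiwick-style acceptance check with Joe's proposed tightening: never
# let in-bailiwick additional data replace a record already in the cache.

cache = {("ns1.example.com", "A"): "192.0.2.53"}

def in_bailiwick(name, zone):
    return name == zone or name.endswith("." + zone)

def accept(name, rtype, rdata, zone, protect_cached=True):
    if not in_bailiwick(name, zone):
        return False                     # classic bailiwick rule
    if protect_cached and (name, rtype) in cache:
        return False                     # item 6: don't overwrite live data
    cache[(name, rtype)] = rdata
    return True

# A forged in-bailiwick record fails to displace the cached entry.
print(accept("ns1.example.com", "A", "203.0.113.66", "example.com"))  # False
```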
7) Have someone explain to me the repeated claims I've seen that djbdns and Nominum's server are not vulnerable to this, and why that is.
For djbdns, see:

<http://article.gmane.org/gmane.network.djbdns/13371>

Nominum has published a few bits about their secret sauce:

<http://nominum.com/news_events/security_vulnerability_update.php>

TCP fallback on detected attack attempts is expected to be sufficiently effective that you can get away with a smaller source port pool. Even if it's not, on some platforms a smallish pool is the only way to cope with the existing load until you can bring in more servers, so it's better than nothing.

The TCP fallback idea was posted to namedroppers in 2006, in response to one of Bert's early drafts which evolved into the forgery resilience document, so it should not be encumbered. The heuristics for deciding when an attack is underway could be, though.
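A minimal sketch of the TCP-fallback idea; the threshold and the reply plumbing are hypothetical, not Nominum's actual heuristics:

```python
from dataclasses import dataclass

@dataclass
class Reply:
    qid_matches: bool    # did query ID and source port check out?
    data: str

MISMATCH_LIMIT = 3       # hypothetical: forged-looking replies tolerated

def resolve(udp_replies, tcp_resolve):
    """Accept the first matching UDP reply; a burst of mismatches suggests
    a spoofing flood, so redo the query over TCP, where the handshake
    defeats off-path forgery."""
    mismatches = 0
    for reply in udp_replies:
        if reply.qid_matches:
            return reply.data
        mismatches += 1
        if mismatches >= MISMATCH_LIMIT:
            break
    return tcp_resolve()

# Toy demo: a flood of forged replies triggers the fallback.
flood = [Reply(False, "forged 192.0.2.1")] * 10
print(resolve(iter(flood), lambda: "authentic answer via TCP"))
```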
Joe Greco wrote:
6) Have someone explain to me the reasoning behind allowing the corruption of in-cache data, even if the data would otherwise be in-bailiwick. I'm not sure I quite get why this has to be. It would seem to me to be safer to discard the data. (This does not eliminate the problem, but would seem to me to reduce it.)
I had this question in my post weeks ago. No one bothered to reply. Older poisoning attacks are why the auth data must be within the same zone to be cached, but apparently no one bothered to question the wisdom of altering existing cache data. Wish they'd just fix the fault in the logic and move on. Talking until everyone is blue in the face about protocol changes and encryption doesn't serve operations.

There are recursive resolvers that work just fine without the issues some standard resolvers have. The protocol seems to work; some vendors just need to change how they use it and tighten up on cache integrity.
7) Have someone explain to me the repeated claims I've seen that djbdns and Nominum's server are not vulnerable to this, and why that is.
PowerDNS has this to say about their non-vulnerability status:

http://mailman.powerdns.com/pipermail/pdns-users/2008-July/005536.html

I know some very happy providers that haven't had to patch. I hope to be one of them on the next round.

Jack
In a message written on Mon, Aug 11, 2008 at 09:41:54AM -0500, Jack Bates wrote:
7) Have someone explain to me the repeated claims I've seen that djbdns and Nominum's server are not vulnerable to this, and why that is.
PowerDNS has this to say about their non-vulnerability status:
http://mailman.powerdns.com/pipermail/pdns-users/2008-July/005536.html
I know some very happy providers that haven't had to patch. I hope to be one of them on the next round.
It's not that they are immune to the attack, and I think a few people deserve to be smacked around for the language they use.

Let's be perfectly clear: without DNSSEC or an alteration to the DNS protocol, THERE IS NO WAY TO PREVENT THIS ATTACK. There are only ways to make the attack harder. So what PowerDNS, DJB, and others are telling you is not that you are immune, it is that you're not the low-hanging fruit. A more direct way of stating their press releases would be: everyone else figured out it took 3 minutes to hack their servers and implemented patches to make it take 2 hours; our server always had the logic to make it take 2 hours, so we were ahead of the game.

Great. If your vendor told you that you are not at risk, they are wrong, and need to go re-read the Kaminsky paper. EVERYONE is vulnerable; the only question is if the attack takes 1 second, 1 minute, 1 hour, or 1 day. While possibly interesting for short-term problem management, none of those are long-term fixes. I'm not sure your customers care, when .COM is poisoned, whether it took the attacker 1 second or 1 day.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Leo Bicknell wrote:
If your vendor told you that you are not at risk, they are wrong, and need to go re-read the Kaminsky paper. EVERYONE is vulnerable; the only question is if the attack takes 1 second, 1 minute, 1 hour, or 1 day. While possibly interesting for short-term problem management, none of those are long-term fixes. I'm not sure your customers care, when .COM is poisoned, whether it took the attacker 1 second or 1 day.
EVERYONE with a CACHE MIGHT be vulnerable. Have studies been done to determine if existing cached records will be overwritten on ALL caching resolvers? Poisoning has always been and will always be possible until DNSSEC, but the question isn't whether you can poison a few off-the-wall records; it's whether you can poison the resolver in any meaningful way.

If the cache isn't passively overwritten, then the only records you could poison would be records that aren't cached. The operational impact would be a much smaller scope. .COM will be cached constantly, and to poison it the attacker would have to forge the packet in the small window between cache expiry and renewal. This can be mitigated even more if sites give out auth on negative responses, which means that for that specific domain, the attacker gets one shot to spoof and then the auth info is cached. Obviously there is a downside to sending larger packets, but that is a decision for the domain holder.

I'll be happy to add DNSSEC to my operational list as soon as it's actually useful (other people can argue over who signs what).

Jack
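Jack's expiry-window point can be quantified with a simplified model: the attacker gets one race per TTL rather than a continuous stream of attacker-triggered races. All numbers below are illustrative assumptions:

```python
# One spoofing opportunity per cache expiry, assuming the cache will not
# overwrite a live record. Figures are illustrative.
ttl = 172_800            # two-day TTL, as on the .COM delegation NS records
n = 2 ** 32              # QID x port combinations to guess
window_replies = 200     # forged replies landed per race window
                         # (assumption: depends on RTT to the real server)

races_needed = n / window_replies
years = races_needed * ttl / (365 * 86400)
print(f"~{races_needed:.1e} races, ~{years:.0f} years at one race per TTL")
```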
participants (6)
- Florian Weimer
- Jack Bates
- Joe Greco
- Kee Hinckley
- Leo Bicknell
- Paul Vixie