On Oct 3, 2011, at 11:20 AM, Leo Bicknell wrote:
Thus the impact on valid names should be minimal, even in the face of longer timeouts.
If you're performing validation on a recursive name server (or similar resolution process) and you expect a signed response, yet the response you receive is either unsigned or doesn't validate (i.e., bogus), you have to decide:

1) Do you ask other authorities? How many? How frequently? At what cost?
2) What are the implications for the _entire_ chain of trust?
3) What do you tell the client?
4) What do you cache (e.g., the zone cut from whom you asked), and for how long?
5) Other?

"Minimal" is not what I was thinking...
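To make that operational surface concrete, here's a rough sketch of those decision points. query_authority() and dnssec_validate() are hypothetical stand-ins, and the policy knobs are invented purely for illustration:

import time
from dataclasses import dataclass
from typing import Optional

MAX_AUTHORITIES = 3    # 1) how many other authorities to try is a policy choice
BOGUS_TTL = 60         # 4) how long to remember the failure is another policy choice
SERVFAIL = "SERVFAIL"  # 3) what the client ultimately sees

bogus_cache: dict = {}  # 4) what to cache (here: just the failed servers and an expiry)

@dataclass
class Response:
    signed: bool
    valid: bool
    data: Optional[str] = None

def query_authority(name: str, server: str) -> Response:
    """Hypothetical stand-in for an actual query to one authority."""
    return Response(signed=True, valid=False)

def dnssec_validate(resp: Response) -> bool:
    """Hypothetical stand-in; 2) in reality the whole chain of trust is implicated."""
    return resp.signed and resp.valid

def resolve_with_validation(name: str, authorities: list) -> str:
    failed = []
    for server in authorities[:MAX_AUTHORITIES]:   # 1) ask other authorities?
        resp = query_authority(name, server)
        if dnssec_validate(resp):
            return resp.data or "NOERROR"
        failed.append(server)                      # each retry adds latency and load
    bogus_cache[name] = (failed, time.time() + BOGUS_TTL)
    return SERVFAIL                                # 3) tell the client *something*

Every one of those choices has latency, load, and exposure consequences, which is the point: the failure path is anything but free.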
Network layer integrity and secure routing don't help the majority of end users. At my house I can choose Comcast or AT&T service. They will not run BGP with me, so I cannot apply RPKI, secure BGP, or any other such method to those connections. They may well do NXDOMAIN remapping on their resolvers, or even try to transparently rewrite DNS answers. Indeed, some ISPs have even experimented with transparently injecting data into port 80 traffic!
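As an aside, that sort of NXDOMAIN remapping is easy to probe for from the edge. A minimal sketch, assuming the dnspython package (>= 2.0) and a resolver address you supply:

import uuid
import dns.exception
import dns.resolver

def looks_remapped(resolver_ip: str) -> bool:
    """Ask for a name that should not exist; getting an answer instead of
    NXDOMAIN suggests the resolver is rewriting negative responses."""
    probe = uuid.uuid4().hex + ".example.com"   # effectively guaranteed nonexistent
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = [resolver_ip]
    try:
        r.resolve(probe, "A")
    except dns.resolver.NXDOMAIN:
        return False                            # honest negative answer
    except (dns.resolver.NoAnswer, dns.resolver.NoNameservers, dns.exception.Timeout):
        return False                            # inconclusive; treat as not remapped
    return True                                 # an address came back for a bogus name

# e.g. print(looks_remapped("192.0.2.53"))      # placeholder resolver address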
Secure networks only help if users have a choice and choose not to use "bad" networks. If you want to be able to connect at Starbucks, or the airport, or even the conference-room Wi-Fi at a client's site, you need to assume it's a rogue network in the middle.
The only way for a user to know what they are getting is end-to-end crypto. Period.
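Concretely, that's what TLS gives the application: a verified handshake tells the client who it actually reached, regardless of what the path or the resolvers did to its DNS. A minimal Python sketch (example.com is just a placeholder target):

import socket
import ssl

HOST = "example.com"                 # placeholder target

ctx = ssl.create_default_context()   # verifies the chain and the hostname
with socket.create_connection((HOST, 443), timeout=5) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()
        print(tls.version(), cert["subject"])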
I'm not sure how "end to end" crypto helps end users in the face of connectivity and *availability* issues resulting from routing brokenness in an upstream network which they do not control. "Crypto," OTOH, depending on what it is and where in the stack it's applied, might well align with my "network layer integrity" assertion.
As for the speed of detection, it's either instantaneous (DNSSEC validation fails) or it doesn't matter how long it takes (minutes, hours, days). The real problem is the time to resolve. It doesn't matter if we can detect in seconds or minutes when it may take hours to get the right people on the phone and resolve it. Consider this weekend's activity: it happened on a weekend for both an operator based in the US and a provider based in China, so you're dealing with weekend staff and a 12-hour time difference.
If you want to ensure the accuracy of the data, you need DNSSEC, period. If you want to ensure low-latency access to the root, you need multiple anycast instances, because at any point in time a particular one may be "bad" (a node near you down for maintenance, a routing issue, who knows), which is part of why there are 13 root servers. Those two things together can make for resilience, security, and high performance.
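As a rough illustration of both halves, a probe sketch (assuming dnspython >= 2.0 and the usual a through m root-servers.net names; not production monitoring code) that measures latency to each root letter and checks whether the answer comes back signed:

import time
from typing import Optional, Tuple

import dns.message
import dns.query
import dns.rdatatype
import dns.resolver

ROOT_LETTERS = [chr(c) + ".root-servers.net" for c in range(ord("a"), ord("m") + 1)]

def probe_root(hostname: str) -> Optional[Tuple[float, bool]]:
    """Return (rtt_seconds, signed) for one root letter, or None on failure."""
    try:
        addr = dns.resolver.resolve(hostname, "A")[0].address
        query = dns.message.make_query(".", dns.rdatatype.SOA, want_dnssec=True)
        start = time.monotonic()
        resp = dns.query.udp(query, addr, timeout=2)
        rtt = time.monotonic() - start
        signed = any(rrset.rdtype == dns.rdatatype.RRSIG
                     for section in (resp.answer, resp.authority)
                     for rrset in section)
        return rtt, signed
    except Exception:
        return None   # unreachable, timed out, or otherwise "bad" right now

for letter in ROOT_LETTERS:
    print(letter, probe_root(letter) or "unreachable")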
You miss the point here, Leo. If the operator of a network service can't detect issues *when they occur* in the current system in some automated manner, whether unintentional or malicious, they won't be alerted, they certainly can't "fix" the problem, and the potential exposure window can be significant. Ideally, the trigger for the alert and detection function is more mechanized than "notification by services consumer," and the network service operators, or other network operators aware of the issue, have some ability to institute reactive controls to surgically deal with that particular issue, rather than being captive to the [s]lowest common denominator of all involved parties and dealing with additional non-deterministic failures or exposure in the interim.

Back to my earlier point: for *resilience*, network layer integrity techniques and secure routing infrastructure are the only preventative controls here, and they necessarily augment DNSSEC's authentication and integrity functions at the application layer. Absent these, rapid detection that enables reactive controls to mitigate the issue is necessary.

-danny