I happened to notice the following at three separate sites around the US and one site in Europe:

$ dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
"pek2a.f.root-servers.org"

and:

$ dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
"pek2b.f.root-servers.org"

After running a couple of traceroutes it appears that he.net has a route for F's anycast IPv6 address (2001:500:2f::f) towards Beijing. According to https://www.isc.org/community/f-root/sites the Beijing node should be a "Local Node" (without IPv6, but I suppose the list is not up to date).

I believe this means that a lot of DNS queries from IPv6-enabled sites in the US and other countries are going to Beijing. I wonder if this is intentional? The Chinese government (CNNIC) seems to be in the path.

All my sites seem to have he.net somewhere in the IPv6 connectivity path. I wonder if this is specific to he.net or a more widespread routing anomaly?

I have notified the he.net NOC and F-root @ ISC.

Best Regards,
--
Janne Snabb / EPIPE Communications
snabb@epipe.com - http://epipe.com/
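A minimal way to reproduce this kind of check from another vantage point (a sketch, assuming a dual-stack host with dig and traceroute6 installed): the -4/-6 flags force the address family used to reach the anycast service, and the traceroute shows where the IPv6 route actually leads.

# Which F-root instance answers over IPv4 vs. over IPv6 from this host?
$ dig -4 +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
$ dig -6 +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
# Trace the IPv6 path toward F's anycast address
$ traceroute6 2001:500:2f::f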
I see similar, intermittently:

# dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
"pek2a.f.root-servers.org"

# dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
"ord1b.f.root-servers.org"

On Sun, Oct 2, 2011 at 12:40 PM, Janne Snabb <snabb@epipe.com> wrote:
I happened to notice the following at three separate sites around the US and one site in Europe:
$ dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
"pek2a.f.root-servers.org"
and:
$ dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
"pek2b.f.root-servers.org"
After running a couple of traceroutes it appears that he.net has a route for F's anycast IPv6 address (2001:500:2f::f) towards Beijing. According to https://www.isc.org/community/f-root/sites the Beijing node should be a "Local Node" (without IPv6 but I suppose the list is not up to date).
I believe this means that a lot of DNS queries from IPv6 enabled sites in US and other countries are going to Beijing. I wonder if this is intentional? Chinese government (CNNIC) seems to be in the path.
All my sites seem to have he.net somewhere in the IPv6 connectivity path. I wonder if this is specific to he.net or more wide-spread routing anomaly?
I have notified he.net NOC and F-root @ ISC.
Best Regards, -- Janne Snabb / EPIPE Communications snabb@epipe.com - http://epipe.com/
-- -JH
On Sun, 2 Oct 2011 17:40:23 +0000 (UTC), Janne Snabb wrote
I happened to notice the following at three separate sites around the US and one site in Europe:
Getting Palo Alto from the east coast.

 3  10gigabitethernet1-2.core1.atl1.he.net (2001:470:0:1b5::2)  8.166 ms  8.135 ms  8.103 ms
 4  2001:470:0:ce::2 (2001:470:0:ce::2)  77.881 ms  77.866 ms  77.909 ms
 5  iana.r1.atl1.isc.org (2001:500:61:6::1)  77.885 ms  77.924 ms  77.896 ms
 6  int-0-5-0-1.r1.pao1.isc.org (2001:4f8:0:1::49:1)  76.846 ms  75.854 ms  75.819 ms
 7  f.root-servers.net (2001:500:2f::f)  75.788 ms  75.756 ms  75.726 ms
In a message written on Sun, Oct 02, 2011 at 05:40:23PM +0000, Janne Snabb wrote:
I happened to notice the following at three separate sites around the US and one site in Europe:
ISC has verified our PEK2 route was being leaked further than intended, and for the moment we have pulled the route until we can get confirmation from our partners that the problem has been resolved. Service should be back to normal, but if anyone is still having problems noc@isc.org will open a ticket.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
leo, all,

in the past, name servers that operated inside of china were subject to arbitrary rewriting or blocking of their results by the Great Firewall. this is obviously bad for Chinese citizens but it's *dramatically* worse for people outside of china who end up reaching a root server in china by mistake, no? people who ostensibly live free of this kind of interference and censorship are now subject to it by mistake.

a previous time this happened renesys did a good write-up on it: http://www.renesys.com/blog/2010/06/two-strikes-i-root.shtml

i guess my questions now are:

1) how long was this happening?
2) can any root server operator who serves data inside of china verify that the data that they serve have not been rewritten by the great firewall?
3) does ISC (or <Insert Root Operator Here>) have a plan for monitoring route distribution to ensure that this doesn't happen again (without prompt detection and mitigation)?

i'm not really singling out ISC here--this is a serious problem for anyone who chooses to operate a root server node on untrustworthy or malicious network infrastructure (which is one appropriate way of thinking of a rewriting firewall from the perspective of a root server operator).

cheers,

t

On Sun, Oct 2, 2011 at 3:08 PM, Leo Bicknell <bicknell@ufp.org> wrote:
In a message written on Sun, Oct 02, 2011 at 05:40:23PM +0000, Janne Snabb wrote:
I happened to notice the following at three separate sites around the US and one site in Europe:
ISC has verified our PEK2 route was being leaked further than intended, and for the moment we have pulled the route until we can get confirmation from our partners that the problem has been resolved. Service should be back to normal, but if anyone is still having problems noc@isc.org will open a ticket.
-- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On Sun, 02 Oct 2011 17:30:37 EDT, Todd Underwood said:
2) can any root server operator who serves data inside of china verify that the data that they serve have not been rewritten by the great firewall?
DNSSEC should help this issue dramatically. This however could be problematic if the Chinese govt (or any repressive regime) decides to ban the use of technology that allows a user to identify when they're being repressed.
3) does ISC (or <Insert Root Operator Here>) have a plan for monitoring route distribution to ensure that this doesn't happen again (without prompt detection and mitigation)?
Leaked routes happen. External monitors and looking glasses and filters and communities are all things we should probably be doing more of, in order to minimize routing bogosity. But when all is said and done, there's no real way to have a dynamic routing protocol like BGP and at the same time *guarantee* that some chucklehead NOC monkey won't bollix things up. At best, we'll be able to get to "less than N brown-paper-bag moments per Tier-[12] per annum" for some value of N.
valdis, all, On Sun, Oct 2, 2011 at 6:02 PM, <Valdis.Kletnieks@vt.edu> wrote:
On Sun, 02 Oct 2011 17:30:37 EDT, Todd Underwood said:
2) can any root server operator who serves data inside of china verify that the data that they serve have not been rewritten by the great firewall?
DNSSEC should help this issue dramatically. This however could be problematic if the Chinese govt (or any repressive regime) decides to ban the use of technology that allows a user to identify when they're being repressed.
sure, but DNSSEC is still basically unused.
3) does ISC (or <Insert Root Operator Here>) have a plan for monitoring route distribution to ensure that this doesn't happen again (without prompt detection and mitigation)?
Leaked routes happen. External monitors and looking glasses and filters and communities are all things we should probably be doing more of, in order to minimize routing bogosity. But when all is said and done, there's no real way to have a dynamic routing protocol like BGP and at the same time *guarantee* that some chucklehead NOC monkey won't bollix things up. At best, we'll be able to get to "less than N brown-paper-bag moments per Tier-[12] per annum" for some value of N.
yep. this is a *great* argument *against* running critical information services on known-malicious network infrastructure, right?

i.e.: if you are sure you're going to be interfered with regularly and you're positive you can't restrict the damage of that interference narrowly to the people who were already suffering such interference, perhaps you should choose to not locate your critical network information resource on that network.

yes, i'm (again) suggesting that people take seriously not doing root name service inside of china as long as the great firewall exists.

t
Todd Underwood <toddunder@gmail.com> wrote:
sure, but DNSSEC is still basically unused.
If you are running BIND 9.8 there is really no reason not to turn on DNSSEC validation, then you won't have to worry about anycast routes leaking from behind the great firewall.

  dnssec-validation auto;
  dnssec-lookaside auto;

Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Viking, North Utsire: Southerly veering southwesterly 6 to gale 8, occasionally severe gale 9 at first in northwest Viking. Moderate or rough becoming very rough or high. Rain then squally showers. Moderate or good, occasionally poor.
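A quick way to confirm that validation is actually in effect on a resolver configured this way (a sketch; @localhost assumes you are querying the resolver you just reconfigured, and dnssec-failed.org is an intentionally broken test zone):

# A validating resolver sets the "ad" (authenticated data) flag on answers it has verified
$ dig +dnssec @localhost . SOA +noall +comments | grep flags
# A zone that deliberately fails validation should come back SERVFAIL instead of an answer
$ dig @localhost www.dnssec-failed.org A +noall +comments | grep status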
On Oct 3, 2011, at 7:29 AM, Tony Finch wrote:
If you are running BIND 9.8 there is really no reason not to turn on DNSSEC validation, then you won't have to worry about anycast routes leaking from behind the great firewall.
User Exercise: What happens when you enable integrity checking in an application (e.g., 'dnssec-validation auto') and datapath manipulation persists? Bonus points for analysis of implementation and deployment behaviors and resulting systemic effects.

Network layer integrity techniques and secure routing infrastructure are all that's going to fix this. In the interim, the ability to detect such incidents at some rate faster than the speed of mailing lists would be ideal.

-danny
User Exercise: What happens when you enable integrity checking in an application (e.g., 'dnssec-validation auto') and datapath manipulation persists? Bonus points for analysis of implementation and deployment behaviors and resulting systemic effects.
i agree with danny here.

ignoring randy (and others) off-topic comments about hypocrisy, this situation is fundamentally a situation of bad (or different) network policy being applied outside of its scope. i would prefer that china not censor the internet, sure. but i really require that china not censor *my* internet when i'm not in china.

t
On Mon, Oct 03, 2011 at 10:30:47AM -0400, Todd Underwood wrote:
User Exercise: What happens when you enable integrity checking in an application (e.g., 'dnssec-validation auto') and datapath manipulation persists? Bonus points for analysis of implementation and deployment behaviors and resulting systemic effects.
i agree with danny here.
ignoring randy (and others) off-topic comments about hypocrisy, this situation is fundamentally a situation of bad (or different) network policy being applied outside of its scope. i would prefer that china not censor the internet, sure. but i really require that china not censor *my* internet when i'm not in china.
t
well, not to disagree - BUT.... the sole reason we have BGP and use ASNs the way we do is to ensure/enforce local policy. It is, after all, an AUTONOMOUS SYSTEM number. One sets policy at its boundaries on what/how to accept/reject/modify traffic crossing the boundary. If you don't -like- the ASN policy - then don't use/traverse that ASN.

and RPKI has the same problems as DNSSEC: lack of uniform use/implementation is going to be a huge party - full of fun & games.

/bill
ignoring randy (and others) off-topic comments about hypocrisy
actually, if you had followed the thread in its sad detail, at that point of jingoism they were on.
this situation is fundamentally a situation of bad (or different) network policy being applied outside of its scope.
kink is gonna leak. rfc1918 is gonna leak. ula-foo is gonna leak. pakistani kink is gonna leak. anycast 'local' cones are gonna leak. chinese kink is gonna leak. american kink is gonna leak. s/are gonna/has already/g

are people gonna stop doing kink? sadly, not likely. so all we are left with is

Danny McPherson wrote:
Network layer integrity techniques and secure routing infrastructure are all that's going to fix this.
and Danny McPherson wrote:
In the interim, the ability to detect such incidents at some rate faster than the speed of mailing lists would be ideal.
is not a lot of good unless you insert "and fix." watching train wrecks is about as fun as reading pontification on nanog. qed :) randy
On 3 okt 2011, at 16:30, Todd Underwood wrote:
ignoring randy (and others) off-topic comments about hypocrisy, this situation is fundamentally a situation of bad (or different) network policy being applied outside of its scope. i would prefer that china not censor the internet, sure. but i really require that china not censor *my* internet when i'm not in china.
Most if not all European operators today force rewriting or blocking of DNS lookups. Belgium added a fairly large site today. There is virtually no way that this can be contained just inside a country. This problem is waaaay beyond root servers, China etc. Filtering on the net is becoming common, and was pushed for quite hard at the Internet Governance Forum last week, by Interpol and the MPAA.

Best regards,

- kurtis -
In a message written on Mon, Oct 03, 2011 at 09:27:46AM -0400, Danny McPherson wrote:
User Exercise: What happens when you enable integrity checking in an application (e.g., 'dnssec-validation auto') and datapath manipulation persists? Bonus points for analysis of implementation and deployment behaviors and resulting systemic effects.
I think this is a (to some on the list) cryptic way of asking "If all your routes to the server go to someone masquerading, what happens when you try to validate that data?" The question being: if you configure your nameserver to validate the root, but don't get signed answers back, will your nameserver refuse to serve up any data, effectively taking you and your users offline?

The answer should be no. This is part of why there are 13 root servers. If a nameserver is told the root is signed and it gets unsigned answers from one of the 13, it should ignore them and move on. I do not off the top of my head know all the timeouts and implementation-dependent behaviors, but also remember that a caching resolver that is up will make approximately 1 query to the root per day for _valid_ names, but many queries per day for invalid names. Thus the impact to valid names should be minimal, even in the face of longer timeouts.

Is there enough operational experience with DNSSEC? No. Can we fix that by saying it's not good enough yet? No. Run it. The people who write nameserver software are committed to fixing any issues as quickly as possible, because it is our best way to secure DNS.
Network layer integrity techniques and secure routing infrastructure are all that's going to fix this. In the interim, the ability to detect such incidents at some rate faster than the speed of mailing lists would be ideal.
Network layer integrity and secure routing don't help the majority of end users. At my house I can choose Comcast or AT&T service. They will not run BGP with me; I could not apply RPKI, secure BGP, or any other method to the connections. They may well do NXDOMAIN remapping on their resolvers, or even try and transparently rewrite DNS answers. Indeed some ISPs have even experimented with injecting data into port 80 traffic transparently!

Secure networks only help if the users have a choice, and choose to not use "bad" networks. If you want to be able to connect at Starbucks, or the airport, or even the conference room Wifi on a client's site, you need to assume it's a rogue network in the middle. The only way for a user to know what they are getting is end to end crypto. Period.

As for the speed of detection, it's either instantaneous (DNSSEC validation fails), or it doesn't matter how long it is (minutes, hours, days). The real problem is the time to resolve. It doesn't matter if we can detect in seconds or minutes when it may take hours to get the right people on the phone and resolve it. Consider this weekend's activity; it happened on a weekend for both an operator based in the US and a provider based in China, so you're dealing with weekend staff and a 12-hour time difference.

If you want to ensure accuracy of data, you need DNSSEC, period. If you want to ensure low-latency access to the root, you need multiple anycasted instances, because at any one point in time a particular one may be "bad" (node near you down for maintenance, routing issue, who knows), which is part of why there are 13 root servers. Those two things together can make for resilience, security and high performance.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
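The fallback behaviour Leo describes earlier in this message (a validating resolver discarding a bogus answer and trying another root) can be illustrated from the command line, as a sketch not taken from the thread: ask one root server directly for the signed root SOA with the DO bit set; a genuine instance returns the RRSIG alongside the record, and it is the absent or non-validating signature that makes a validating resolver discard the answer and move on.

# Query F-root directly for the root SOA plus its signature (+dnssec sets the DO bit)
$ dig +norec +dnssec @f.root-servers.net . SOA
# The answer section should contain both the SOA record and an RRSIG SOA record.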
On Oct 3, 2011, at 11:20 AM, Leo Bicknell wrote:
Thus the impact to valid names should be minimal, even in the face of longer timeouts.
If you're performing validation on a recursive name server (or similar resolution process) expecting a signed response, yet the response you receive is either unsigned or doesn't validate (i.e., bogus), you have to:

1) ask other authorities? how many? how frequently? impact?
2) consider implications on the _entire_ chain of trust?
3) tell the client something?
4) cache what (e.g., zone cut from who you asked)? how long?
5) other?

"minimal" is not what I was thinking..
Network layer integrity and secure routing don't help the majority of end users. At my house I can choose Comcast or AT&T service. They will not run BGP with me; I could not apply RPKI, secure BGP, or any other method to the connections. They may well do NXDOMAIN remapping on their resolvers, or even try and transparently rewrite DNS answers. Indeed some ISPs have even experimented with injecting data into port 80 traffic transparently!
Secure networks only help if the users have a choice, and choose to not use "bad" networks. If you want to be able to connect at Starbucks, or the airport, or even the conference room Wifi on a client's site, you need to assume it's a rogue network in the middle.
The only way for a user to know what they are getting is end to end crypto. Period.
I'm not sure how "end to end" crypto helps end users in the advent of connectivity and *availability* issues resulting from routing brokenness in an upstream network which they do not control. "crypto", OTOH, depending on what it is and where in the stack it's applied, might well align with my "network layer integrity" assertion.
As for the speed of detection, it's either instantaneous (DNSSEC validation fails), or it doesn't matter how long it is (minutes, hours, days). The real problem is the time to resolve. It doesn't matter if we can detect in seconds or minutes when it may take hours to get the right people on the phone and resolve it. Consider this weekend's activity; it happened on a weekend for both an operator based in the US and a provider based in China, so you're dealing with weekend staff and a 12-hour time difference.
If you want to ensure accuracy of data, you need DNSSEC, period. If you want to ensure low-latency access to the root, you need multiple anycasted instances, because at any one point in time a particular one may be "bad" (node near you down for maintenance, routing issue, who knows), which is part of why there are 13 root servers. Those two things together can make for resilience, security and high performance.
You miss the point here, Leo. If the operator of a network service can't detect issues *when they occur* in the current system in some automated manner, whether unintentional or malicious, they won't be alerted, they certainly can't "fix" the problem, and the potential exposure window can be significant.

Ideally, the trigger for the alert and detection function is more mechanized than "notification by services consumer", and the network service operators or other network operators aware of the issue have some ability to institute reactive controls to surgically deal with that particular issue, rather than being captive to the [s]lowest common denominator of all involved parties, and dealing with additional non-deterministic failures or exposure in the interim.

Back to my earlier point: for *resilience*, network layer integrity techniques and secure routing infrastructure are the only preventative controls here, and are necessary to augment DNSSEC's authentication and integrity functions at the application layer. Absent these, rapid detection enabling reactive controls that mitigate the issue is necessary.

-danny
On Mon, Oct 3, 2011 at 12:38 PM, Danny McPherson <danny@tcb.net> wrote:
If the operator of a network service can't detect issues *when they occur* in the current system in some automated manner, whether unintentional or malicious, they won't be alerted, they certainly can't "fix" the problem, and the potential exposure window can be significant.
Ideally, the trigger for the alert and detection function is more mechanized than "notification by services consumer", and the network service operators or other network operators aware of the issue have
Does ISC (or any other anycast root/*tld provider) have external polling methods that can reliably tell when, as was the case here, local-anycast-instances are made global? (or when the cone of silence widens?)

Given that in the ISC case the hostname.bind query can tell you at least the region + instance #, it seems plausible that some system of systems could track the current mappings / changes in the mappings, no? and either auto-action some 'fix' (SHUT DOWN THE IAD INSTANCE, IT'S ROGUE!) or at least log and notify a high-priority operations fixer.

Given something like the unique-AS work Verisign has been behind, you'd think monitoring route origins and logging 'interesting' changes could accomplish this as well?

(I suppose I'm not prescribing solutions above, just wondering if something like these is/could be done feasibly)

-chris
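A rough sketch of the sort of external polling Chris describes, using nothing but dig and the shell; the five-minute interval and the plain echo "alert" are arbitrary placeholders, and a real deployment would need probes inside the catchments being watched:

#!/bin/sh
# Poll F-root's declared instance identity and report when it changes.
PREV=""
while true; do
    CUR=$(dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT)
    if [ -n "$PREV" ] && [ "$CUR" != "$PREV" ]; then
        echo "$(date -u) F-root instance changed: $PREV -> $CUR"
    fi
    PREV="$CUR"
    sleep 300   # sample every five minutes
done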
In a message written on Mon, Oct 03, 2011 at 12:38:25PM -0400, Danny McPherson wrote:
1) ask other authorities? how many? how frequently? impact?
2) consider implications on the _entire_ chain of trust?
3) tell the client something?
4) cache what (e.g., zone cut from who you asked)? how long?
5) other?
"minimal" is not what I was thinking..
I'm asking the BIND team for a better answer, however my best understanding is this will query a second root server (typically next best by RTT) when it gets a non-validating answer, and assuming the second best one validates just fine there are no further follow on effects. So you're talking one extra query when a caching resolver hits the root. We can argue if that is minimal or not, but I suspect most end users behind that resolver would never notice.
You miss the point here Leo. If the operator of a network service can't detect issues *when they occur* in the current system in some automated manner, whether unintentional or malicious, they won't be alerted, they certainly can't "fix" the problem, and the potential exposure window can be significant.
In a message written on Mon, Oct 03, 2011 at 01:09:17PM -0400, Christopher Morrow wrote:
Does ISC (or any other anycast root/*tld provider) have external polling methods that can reliably tell when, as was in this case, local-anycast-instances are made global? (or when the cone of silence widens?)
Could ISC (or any other root operator) do more monitoring? I'm sure, but let's scope the problem first. We're dealing here with a relatively widespread leak, but that is in fact the rare case.

There are 39,000 ASNs active in the routing system. Each one of those ASNs can affect its path to the root server by:

1) Bringing up an internal instance of a root server, injecting it into its IGP, and "hijacking" the route.
2) Turning up or down a peer that hosts a root server.
3) Turning up or down a transit provider.
4) Adding or removing links internal to their network that change their internal selection to use a different external route.

The only way to make sure a route was correct, everywhere, would be to have 39,000+ probes, one on every ASN, and check the path to the root server. Even if you had that, how do you define when any of the changes in 1-4 are legitimate? You could DNSSEC-verify to rule out #1, but #2-4 are local decisions made by the ASN (or one of its upstreams).

I suppose, if someone had all 39,000+ probes, we could attempt to write algorithms that determined if too much "change" was happening at once; but I'm reminded of events like the earthquake that took out many Asian cables a few years back. There's a very real danger in such a system shutting down a large number of nodes during such an event due to the magnitude of changes, which I'd suggest is the exact opposite of what the Internet needs to have happen in that event.
(I suppose i'm not prescribing solutions above, just wondering if something like these is/could-be done feasibly)
Not really. Look, I chase down several dozen F-Root leaks a year. You never hear about them on NANOG. Why? Well, it's some small ISP in the middle of nowhere leaking to a peer who believes them, and thus they get a 40ms response time when they should have a 20ms response time by believing the wrong route. Basically, almost no one cares; generally it takes some uber-DNS nerd at a remote site to figure this out and contact us for help.

This has taught me that viewpoints are key. You have to be on the right network to detect it has hijacked all 13 root servers; you can't probe that from the outside. You also have to be on the right network to see you're getting the F-Root 1000 miles away rather than the one 500. Those 39,000 ASNs are providing a moving playing field, with relationships changing quite literally every day, and every one of them may be a "leak".

This one caught attention not because it was a bad leak. It was IPv6 only. Our monitoring suggests this entire leak siphoned away 40 queries per second, at its peak, across all of F-Root. In terms of a percentage of queries it doesn't even show visually on any of our graphs. No, it drew attention for totally non-technical reasons: US users panicking that the Chinese government was hijacking the Internet, which is just laughable in this context.

There really is nothing to see here. DNSSEC fixes any security implications from these events. My fat fingers have dropped more than 40qps on the floor more than once this year, and you didn't notice. Bad events (like earthquakes and fiber cuts) have taken any number of servers from any number of operators multiple times this year. Were it not for the fact that someone posted to NANOG, I bet most of the people here would have never noticed their 99.999% working system kept working just fine.

I think all the root ops can do better, use more monitoring services, detect more route hijacks faster, but none of us will ever get 100%. None will ever be instantaneous. Don't make that the goal; make the system robust in the face of that reality.

My own resolution is better IPv6 monitoring for F-root. :)

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
On Oct 3, 2011, at 1:34 PM, Leo Bicknell wrote:
I'm asking the BIND team for a better answer, however my best understanding is this will query a second root server (typically next best by RTT) when it gets a non-validating answer, and assuming the second best one validates just fine there are no further follow on effects. So you're talking one extra query when a caching resolver hits the root. We can argue if that is minimal or not, but I suspect most end users behind that resolver would never notice.
I'm not talking "one extra query", and it's not simply about subsequent transaction attempts either - so conjecture aiming to marginalize the impact isn't particularly helpful. I.e., have that look, get back to us... :-) -danny
Leo, On Mon, Oct 3, 2011 at 7:34 PM, Leo Bicknell <bicknell@ufp.org> wrote:
The only way to make sure a route was correct, everywhere, would be to have 39,000+ probes, one on every ASN, and check the path to the root server. Even if you had that, how do you define when any of the changes in 1-4 are legitimate? You could DNSSEC verify to rule out #1, but #2-4 are local decisions made by the ASN (or one of its upstreams).
I suppose, if someone had all 39,000+ probes, we could attempt to write algorithms that determined if too much "change" was happening at once; but I'm reminded of events like the earthquake that took out many Asian cables a few years back. There's a very real danger in such a system shutting down a large number of nodes during such an event due to the magnitude of changes, which I'd suggest is the exact opposite of what the Internet needs to have happen in that event.
This sounds an awful lot like the notary concept:
- http://perspectives-project.org/
- http://convergence.io/

Furthermore, changing network paths used to reach information probably should not be a reason to shut down a service, in general. More interesting than which path is used, I suppose, is whether or not the data being returned has been changed in some unexpected/undesired way.

Regards,
Martin
In a message written on Tue, Oct 04, 2011 at 07:00:52AM +0900, Randy Bush wrote:
cool. then we can get rid of dynamic routing. it always has been a pain in the ass.
If we went back to hosts.txt this pesky DNS infrastructure would be totally unnecessary.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
On Oct 3, 2011, at 2:44 PM, Martin Millnert wrote:
Leo,
On Mon, Oct 3, 2011 at 7:34 PM, Leo Bicknell <bicknell@ufp.org> wrote:
<snip>
This sounds an awful lot like the notary concept:
- http://perspectives-project.org/
- http://convergence.io/
Furthermore, changing network paths used to reach information probably should not be reason to shut down a service, in general. More interesting than which path is used, I suppose, is whether or not the data being returned has been changed in some unexpected/undesired way.
Actually, some other related work that's been around for 3-6 years includes:
- http://vantage-points.org/
- http://secspider.cs.ucla.edu/

The former has a tech report (listed on its page, http://techreports.verisignlabs.com/tr-lookup.cgi?trid=1110001 ) that presents candidate closed-form analysis of how much faith you can gain using network path diversity.

Eric
On Oct 3, 2011, at 1:09 PM, Christopher Morrow wrote:
Given that in the ISC case the hostname.bind query can tell you at least the region + instance#, it seems plausible that some system of systems could track current/changes in the mappings, no? and either auto-action some 'fix' (SHUT DOWN THE IAD INSTANCE IT's ROGUE!) or at least log and notify a hi-priority operations fixer.
That sort of capability at the application layer certainly seems prudent to me, noting that it does assume you have a measurement node within the catchment in question and are measuring at a high enough frequency to detect objective incidents.
Given something like the unique-as work Verisign has been behind you'd think monitoring route origins and logging 'interesting' changes could accomplish this as well?
I'm a fan of both routing system && consumer-esque monitoring, and do believe that a discriminator in the routing system associated with globally anycasted prefixes makes this simpler - for both detection, and possibly even reactive or preventative controls IF necessary. A unique origin AS is not the only place you can do this in the routing system, as I'm sure some will observe, but it seems an ideal location to me. -danny
On 2011-10-03, at 13:39, Danny McPherson wrote:
On Oct 3, 2011, at 1:09 PM, Christopher Morrow wrote:
Given that in the ISC case the hostname.bind query can tell you at least the region + instance#, it seems plausible that some system of systems could track current/changes in the mappings, no? and either auto-action some 'fix' (SHUT DOWN THE IAD INSTANCE IT's ROGUE!) or at least log and notify a hi-priority operations fixer.
That sort of capability at the application layer certainly seems prudent to me, noting that it does assume you have a measurement node within the catchment in question and are measuring at a high enough frequency to detect objective incidents.
In principle there seems like no reason that a DNS client sending queries to authority-only servers couldn't decide to include the NSID option and log changes in declared server identity between subsequent queries (or take some other configured action).

We support RFC 5001 on L-Root (which runs NSD), for what that's worth, as well as HOSTNAME.BIND/CH/TXT, VERSION.BIND/CH/TXT, ID.SERVER/CH/TXT and VERSION.SERVER/CH/TXT, but those require separate queries. I appreciate NSID support is not universal, but perhaps that's ok in the sense of "better than nothing".
I'm a fan of both routing system && consumer-esque monitoring, and do believe that a discriminator in the routing system associated with globally anycasted prefixes makes this simpler - for both detection, and possibly even reactive or preventative controls IF necessary. A unique origin AS is not the only place you can do this in the routing system, as I'm sure some will observe, but it seems an ideal location to me.
Whether it's the right-most entry in the AS_PATH or a bigger substring, you still need more measurement points than you have if you want to catch every leak. Joe
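For what it's worth, the NSID option Joe mentions can be exercised from stock dig (a sketch; +nsid asks the server to return its RFC 5001 name server identifier in the same transaction, avoiding the separate CHAOS-class lookup):

# Root SOA query with the NSID option set; the server's identity comes back in the OPT record
$ dig +norec +nsid @l.root-servers.net . SOA
# The equivalent separate identity query
$ dig +short +norec @l.root-servers.net HOSTNAME.BIND CHAOS TXT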
----- Original Message -----
From: "Valdis Kletnieks" <Valdis.Kletnieks@vt.edu>
DNSSEC should help this issue dramatically. This however could be problematic if the Chinese govt (or any repressive regime) decides to ban the use of technology that allows a user to identify when they're being repressed.
We won't be permitted to see the repression inherent in the system?

Cheers,
-- jr 'Run Away!' a
--
Jay R. Ashworth                  Baylink                       jra@baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274
On Sun, Oct 2, 2011 at 11:11 PM, Jay Ashworth <jra@baylink.com> wrote:
----- Original Message -----
From: "Valdis Kletnieks" <Valdis.Kletnieks@vt.edu>
DNSSEC should help this issue dramatically. This however could be problematic if the Chinese govt (or any repressive regime) decides to ban the use of technology that allows a user to identify when they're being repressed.
We won't be permitted to see the repression inherent in the system?
help, help! I'm being repressed! phil
On Sun, Oct 2, 2011 at 10:11 PM, Jay Ashworth <jra@baylink.com> wrote:
DNSSEC should help this issue dramatically. This however could be problematic if the Chinese govt (or any repressive regime) decides to ban the use of technology that allows a user to identify when they're being repressed.

We won't be permitted to see the repression inherent in the system?
You actually think China will be the first to ban DNSSEC? Maybe, but it will probably be banned first indirectly, by governments legislating requirements of SPs that are incompatible with DNSSEC.

The repression is at home, in the form of the US PROTECT IP bill that will provide a framework for DNS authorities, domain registries, and ISPs/operators of non-authoritative nameservers to be sent letters requiring them to modify DNS responses for other organizations' domains based on allegations/suspicions.

-- -JH
china nukes 120,000 domains for going against the policy of the state.

oops! that wasn't china, was it?

perhaps, we should postpone telling others what to do until our side of the street is clean?

randy
120K domains - basically cnnic seems to have finally got tired of russian botmaster types registering thousands of domains at a time, and put in a rule that says you need business registration in China / ID in China to register a .cn

Beyond that, that's one ccTLD - however large. There are multiple gTLDs that have already done a great job of cleanup (biz, info for example) and so far I haven't heard of .us having an infestation of botmasters / spammers. And of course there are all the registrars out there that need to be reached out to / handled etc etc - but that's another kettle of fish.

We're discussing two different things here - apples and oranges, though it does look like they're all part of the same fruit salad.

1. Action by different registrars / registries [in .cn's case, a government controlled registry, to be sure]
2. State policy to route internet access and DNS through an inspecting + rewriting firewall that blocks or replaces politically unacceptable content

--srs

On Mon, Oct 3, 2011 at 11:17 AM, Randy Bush <randy@psg.com> wrote:
china nukes 120,000 domains for going against the policy of the state.
oops! that wasn't china, was it?
perhaps, we should postpone telling others what to do until our side of the street is clean?
randy
-- Suresh Ramasubramanian (ops.lists@gmail.com)
On Mon, 03 Oct 2011 11:29:43 +0530, Suresh Ramasubramanian said:
120K domains - basically cnnic seems to have finally got tired of russian
No, I think Randy was referring to this sort of thing: http://www.theregister.co.uk/2011/02/18/fed_domain_seizure_slammed/
Sure - but what was being discussed in this thread was transparent / on-the-fly rewrites of root server responses getting exposed to people beyond china. Whether these responses should be altered / censored within china or not is a different can of worms, and that too has nothing at all to do with either registry policy or law-enforcement-mandated domain seizure.

On Mon, Oct 3, 2011 at 11:42 AM, <Valdis.Kletnieks@vt.edu> wrote:
No, I think Randy was referring to this sort of thing:
http://www.theregister.co.uk/2011/02/18/fed_domain_seizure_slammed/
-- Suresh Ramasubramanian (ops.lists@gmail.com)
----- Original Message -----
From: <Valdis.Kletnieks@vt.edu>

On Mon, 03 Oct 2011 11:29:43 +0530, Suresh Ramasubramanian said:
120K domains - basically cnnic seems to have finally got tired of russian
No, I think Randy was referring to this sort of thing:
http://www.theregister.co.uk/2011/02/18/fed_domain_seizure_slammed/

"Our government has gone rogue on us," Eric Goldman, a professor at Santa Clara University School of Law, said. "Our government is going into court with half-baked facts and half-baked legal theories and shutting down operations. This is exactly what we thought the government couldn't do. I'm scratching my head why we aren't grabbing the pitchforks." ®

I.C.E., our very own Gestapo-Without-Borders. Makes me proud. <sigh>
In a message written on Sun, Oct 02, 2011 at 05:30:37PM -0400, Todd Underwood wrote:
i guess my questions now are:
1) how long was this happening?
2) can any root server operator who serves data inside of china verify that the data that they serve have not been rewritten by the great firewall?
3) does ISC (or <Insert Root Operator Here>) have a plan for monitoring route distribution to ensure that this doesn't happen again (without prompt detection and mitigation)?
I can't answer #1 with precision yet, but will attempt to get a precise answer soon. I'd like to partially address #2 and #3.

ISC can verify that the responses sent from F-Root boxes are always the same, regardless of which server returns the answer. That is, there is no filtering or rewriting on any ISC root servers.

We do know there are a number of locations around the world that have various rewriting and blocking systems employed. We have found networks where a query sent to F-Root never reaches an ISC-run server. As a root operator we hate this, and believe the best way to solve the problem is DNSSEC. Short of providing a method like DNSSEC to verify the answer is legitimate, we know of no other countermeasure. There are in fact networks in the world that impersonate all 13 root servers, and we don't know of a lever to make them stop (short of local empowerment).

In the case of transparent re-writers or blockers between us and the end users there is no practical way for us to detect that the modifications are happening, and thus I don't think anyone could answer your second question with precision. DNSSEC will at least let every user do the verification from their own vantage point, which is part of why it is so important.

Regarding #3, ISC does monitor for leaked routes. Unfortunately these monitors are only as good as the vantage points they occupy, and so with upwards of 40,000 ASNs I don't know of any way to cover them all with any certainty. In this case it was even harder, as the leak (appears to have been) IPv6 only. There are a lot fewer IPv6 monitors, and folks are generally sloppy with their IPv6 configs so there is more leaking. The situation is improving rapidly.
i'm not really singling out ISC here--this is a serious problem for anyone who chooses to operate a root server node on untrustworthy or malicious network infrastructure (which is one appropriate way of thinking of a rewriting firewall from the perspective of a root server operator).
I think the problem goes a lot further than root operators. The fact of the matter is that there are networks that tamper with your packets, from the benign NAT to the full-on transparent content filter/blocker. Most places that tamper with root queries also tamper with lots of other things. Without some sort of reliable end-to-end crypto you really have no way of knowing.

The root zone is signed. You can enable DNSSEC validation in your caching resolvers. There are plugins for popular browsers that attempt to do DNSSEC validation and show the results to the end user in some pleasing way. Much more work needs to be done in this area, but the technology is usable today. If you care about authentic responses, use it.

Lastly, for some reason a ton of people always jump to the conclusion that these sort of events are the plot of $insert_bad_guy. I've chased down many leaks of F-Root during my time, and 100% of them to date have been an accident. The clueless NOC monkey. The poorly written route map. Someone not reading the documentation. Even if $insert_bad_guy wanted to hijack F-Root (or any other root), doing it in this way is very visible and easy to work around. It just doesn't make sense to even try.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
On Sun, Oct 02, 2011 at 04:06:44PM -0700, Leo Bicknell <bicknell@ufp.org> wrote a message of 107 lines which said:
We have found networks where a query sent to F-Root never reaches an ISC run server.
For details on such behavior, I highly recommend the excellent paper "Identifying and Characterizing Anycast in the Domain Name System" <http://www.isi.edu/~xunfan/research/anycast_Tech_Report_ISI_TR_671.pdf>, which shows, among other things, that such masquerading (by a false root name server) happens.
On Sun, 02 Oct 2011 12:08:35 PDT, Leo Bicknell said:
ISC has verified our PEK2 route was being leaked further than intended, and for the moment we have pulled the route until we can get confirmation from our partners that the problem has been resolved.
So Leo - you don't have to give us a full reveal of the root cause, but did the phrase "chuckleheaded NOC monkey" enter at all into the saga? ;)
On Sun, Oct 02, 2011 at 05:40:23PM +0000, Janne Snabb <snabb@epipe.com> wrote a message of 32 lines which said:
I happened to notice the following at three separate sites around the US and one site in Europe:
Good analysis at <http://bgpmon.net/blog/?p=540>
On 03/10/2011 09:03, Stephane Bortzmeyer wrote:
On Sun, Oct 02, 2011 at 05:40:23PM +0000, Janne Snabb <snabb@epipe.com> wrote a message of 32 lines which said:
I happened to notice the following at three separate sites around the US and one site in Europe:
Good analysis at <http://bgpmon.net/blog/?p=540>
We used DNSMON data to analyse this event, and found an earlier leak on 29 and 30 September:
https://labs.ripe.net/Members/emileaben/f-root-route-leak-the-dnsmon-view

best regards,
Emile Aben
RIPE NCC
On Sun, Oct 02, 2011 at 05:40:23PM +0000, Janne Snabb <snabb@epipe.com> wrote a message of 32 lines which said:
$ dig +short +norec @F.ROOT-SERVERS.NET HOSTNAME.BIND CHAOS TXT
"pek2a.f.root-servers.org"
The next time, I suggest also running "data" queries such as "A www.facebook.com" or "A www.twitter.com" to see if there is hard evidence of an actual security problem. (Most articles on this case mentioned that "we have no proof there was a rewriting of answers from the F-root instance".)
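A sketch of the check being suggested, reusing the example names from the message above: a genuine root server is not authoritative for these names, so with recursion disabled it should return no answer, only a referral to the .com servers; an A record appearing in the answer section would be hard evidence of rewriting somewhere in the path.

$ dig +norec @F.ROOT-SERVERS.NET www.facebook.com A
$ dig +norec @F.ROOT-SERVERS.NET www.twitter.com A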
participants (21)
- bmanning@vacation.karoshi.com
- Christopher Morrow
- Danny McPherson
- Emile Aben
- Eric Osterweil
- Janne Snabb
- Jay Ashworth
- Jimmy Hess
- Joe Abley
- Leo Bicknell
- Lindqvist Kurt Erik
- Martin Millnert
- Michael Painter
- Phil Dyer
- Randy Bush
- Randy McAnally
- Stephane Bortzmeyer
- Suresh Ramasubramanian
- Todd Underwood
- Tony Finch
- Valdis.Kletnieks@vt.edu