On Wed, 15 Mar 2006, Simon Waters wrote:
This behavior is unfortunately not unique.
Alas, what other people's servers do shouldn't be an issue for you. Your problem is that they can be coerced into a DoS attack, not that the data is stale.
Actually, DoS attack aside, the interesting thing is that lots of people (the original poster perhaps included) believe that TTLs are adhered to except in some marginal cases. I think Rodney's point is that they are not adhered to anywhere near as much as we would all like to believe :(
So, if you, or the original poster, are going to move ${important_resource} around IP-wise, keep in mind that your ${important_thing} may have to answer on more than one IP address for a period much longer than your tuned TTL :(
Thanks all for the responses. I do understand we may need to support the old IP addresses for some time. I was hoping someone out there had performed a study to determine what the ratio might be for supporting an old IP address (I know our traffic profile will be unique to us, so it would only give us a general idea). For example, if we change IP addresses, will we need to plan on 20% of traffic at the old site on day 1, 10% on day 2, 5% on day 3, and so on? There are also issues related to proxy servers and browser caching that are independent of DNS, which we will need to quantify to understand the full risk. The more data we have, the better it will drive our decisions. Thanks again, Steve
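Absent a published decay curve, one way to get site-specific numbers is to keep the old IP answering in parallel and measure the decay directly from the old server's access log. A minimal sketch of that measurement (the cutover date, log lines, and common-log timestamp format are assumptions; adjust for your own server):

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical cutover time; replace with your actual move date.
CUTOVER = datetime(2006, 3, 15)

# Apache common-log timestamp, e.g. [15/Mar/2006:10:01:22 +0000]
TS = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")

def stale_hits_per_day(lines):
    """Count old-server hits bucketed by days elapsed since cutover."""
    per_day = Counter()
    for line in lines:
        m = TS.search(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
        day = (when - CUTOVER).days
        if day >= 0:
            per_day[day] += 1
    return dict(per_day)

# Toy log lines standing in for the old server's access log.
log = [
    '1.2.3.4 - - [15/Mar/2006:10:01:22 +0000] "GET / HTTP/1.1" 200 512',
    '1.2.3.4 - - [16/Mar/2006:09:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '5.6.7.8 - - [16/Mar/2006:11:30:00 +0000] "GET / HTTP/1.1" 200 512',
]
print(stale_hits_per_day(log))  # {0: 1, 1: 2}
```

Dividing each day's stale count by that day's total across both servers gives the "20% on day 1, 10% on day 2" style percentages for your own traffic profile.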
(re-sending because I wasn't on nanog-post)
For example, if we change IP addresses, will we need to plan on 20% of traffic at the old site on day 1, 10% on day 2, 5% on day 3, and so on? There are also issues related to proxy servers and browser caching that are independent of DNS, which we will need to quantify to understand the full risk. The more data we have, the better it will drive our decisions.
You might consider the following paper from IMC 2004: "On the Responsiveness of DNS-based Network Control" by Jeffrey Pang, Aditya Akella, Anees Shaikh, Balachander Krishnamurthy, and Srinivasan Seshan, http://www.imconf.net/imc-2004/papers/p21-pang.pdf It sheds some light on how widely DNS TTLs are adhered to. The CDF graphs on the 4th page suggest that you should be fairly safe after a day, though I don't see whether the paper specifically states what the largest recorded violation was. Sharad.
On Thursday 16 Mar 2006 04:23, you wrote:
You might consider the following paper from IMC 2004: "On the Responsiveness of DNS-based Network Control" by Jeffrey Pang, Aditya Akella, Anees Shaikh, Balachander Krishnamurthy, and Srinivasan Seshan, http://www.imconf.net/imc-2004/papers/p21-pang.pdf
The results are greatly at odds with my experience. As they imply, the problem may be specifically misconfigured ISP DNS servers, which might explain why we see fewer violations if our sites aren't popular with those ISPs' users.

However, I wouldn't trust any report where control of the authoritative DNS itself wasn't explicitly monitored and reported. They may think they have updated the authoritative answers (and TTL), but in my experience, when you find violators you often find that the authoritative DNS servers didn't all update as, or when, expected, or that earlier records were returned with a longer TTL from those servers. Certainly that was the experience of moving many sites last week, where you can check the logs in real time and find which domains we messed up on by the traffic still arriving.

Looking at the 4 long-term violators for one site:

Hits  Source IP
   8  198.78.130.68   <--- ??
   1  212.95.252.16   <--- lager.netcraft.com
  15  66.147.154.3    <--- IBM Almaden Research Center
   5  70.42.51.10     <--- Fast Search & Transfer

During this period (starting 3 days after moving a 10-minute TTL) we saw 27234 hits for that site on the correct server (okay, not exactly a busy site). So roughly 1 in 1000 hits during days 3 to 6 went to the old web server, and this domain had the most lost hits; most of the moved domains don't show in the old server's log at all.

I think we can safely exclude at least 21 of the 29 stale hits as "non-human" (sorry, IBM Research, if you were deeply interested in proofreading), and I'm guessing those clients have made a deliberate effort to cache stale data for their own reasons. So I can put an upper estimate, for our sites, of 1 in 1000 hits of interest going to the wrong server during days 3 to 6. The most popular site moved had only two DNS violators in days 3 to 6, the most notable being the same "Fast Search & Transfer" IP above.
It may be that popular sites have a far worse problem by dint of exercising more caching code, but this site is far from being our most popular. These sites were moved by reducing the TTL to a low value (10 minutes) and keeping it there for a long period before we actually performed the move.
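The per-IP violator tally shown above can be reproduced with a few lines over the old server's access log. A sketch, assuming common-log format where the client IP is the first whitespace-separated field (the sample log lines are illustrative, not real data):

```python
from collections import Counter

def violators(lines):
    """Tally stale hits by client IP from old-server access log lines."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    # Most-active stale clients first, like the Hits / Source IP table above.
    return counts.most_common()

# Toy log lines standing in for the old server's access log.
log = [
    '66.147.154.3 - - [18/Mar/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '66.147.154.3 - - [18/Mar/2006:10:05:00 +0000] "GET / HTTP/1.1" 200 512',
    '70.42.51.10 - - [19/Mar/2006:12:00:00 +0000] "GET / HTTP/1.1" 200 512',
]
print(violators(log))  # [('66.147.154.3', 2), ('70.42.51.10', 1)]
```

A reverse-DNS lookup on each surviving IP (as in the table's annotations) then helps separate crawlers and monitors from real users.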
:: > So, if you, or the original poster, is going to move ${important_resource}
:: > around ip-wise keep in mind that your ${important_thing} may have to
:: > answer to more than 1 ip address for a period much longer than your tuned
:: > TTL :(
::
:: Thanks all for the responses. I do understand we may need to support the
:: old IP addresses for some time. I was hoping someone had performed a
:: study out there to determine what the ratio might be for supporting an old
:: IP address (I know our traffic profile will be unique to us, so it
:: would only give us a general idea).
::
:: For example, if we change IP addresses, will we need to plan on 20% of
:: traffic at the old site on day 1, 10% on day 2, 5% on day 3, and so on? There are
:: also issues related to proxy servers and browser caching that are
:: independent of DNS, which we will need to quantify to understand the full risk. The
:: more data we have, the better it will drive our decisions.

In my not-so-scientific "studies" with changing IPs for a fairly large-volume site, I found that 90% of the people will use the new IP within an hour of TTL expiration, 99.999% of the people within 3 days, and that remaining 0.001% may take years... As someone said earlier, some parts of the 'net are just broken beyond your control...

-igor
participants (4)
- Igor Gashinsky
- Sharad Agarwal
- Simon Waters
- Thurman, Steven