DNS Reliability


Phil Fagan

12 Sep 2013

8:03 p.m.

Everything else remaining equal, is there a standard or expectation for DNS reliability?

98% 99% 99.5% 99.9% 99.99% 99.999%

Measured in queries completed vs. queries lost.

What's the consensus?

-- Phil Fagan Denver, CO 970-480-7618


Bryan Tong

12 Sep

8:11 p.m.

To me, anything below 99.99% is unacceptable. At 99.9%, that is 100 failures out of 100,000 queries, which still seems like a lot, especially if it's not network related. So I would say 99.999% would be what I would look for.

Thanks

-- Bryan Tong Nullivex LLC | eSited LLC (507) 298-1624
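To put the percentages from the original question side by side, here is a minimal sketch of the arithmetic. It assumes reliability is measured exactly as proposed (queries completed vs. queries lost); the function name `expected_failures` and the 100,000-query sample size are chosen only for illustration.

```python
from fractions import Fraction

# Reliability levels listed in the original question, kept as strings so
# that the arithmetic below stays exact.
LEVELS = ["98", "99", "99.5", "99.9", "99.99", "99.999"]


def expected_failures(reliability_percent: str, queries: int) -> Fraction:
    """Expected number of lost queries at a given reliability level."""
    loss_rate = 1 - Fraction(reliability_percent) / 100
    return loss_rate * queries


if __name__ == "__main__":
    for level in LEVELS:
        lost = expected_failures(level, 100_000)
        print(f"{level:>7}% -> {float(lost):g} lost per 100,000 queries")
```

At 99.9% this gives 100 lost queries per 100,000; 99.99% gives 10 and 99.999% gives 1.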

Beavis

8:14 p.m.

I go with 99.999%, given that you have a good number of DNS servers (anycasted).

-- () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

Disclaimer: http://goldmark.org/jeff/stupid-disclaimers/

Phil Fagan

8:25 p.m.

It's a good point about the anycast; 99.999% should be expected.

-- Phil Fagan Denver, CO 970-480-7618

Glen Wiley

8:40 p.m.

Remember, though, that anycast only solves for availability in one layer of the system, and it is not difficult to create a less available anycast presence if you do silly things with the way you manage your routes. A system is only as available as the least available layer in that system.

For example, if you use an automated system that changes your route advertisements and that system encounters a defect that breaks your announcements, then although a well-built anycast footprint might achieve 99.999%, a poorly implemented management system that is less available and creates an outage would reduce the number.

-- Glen Wiley KK4SFV

"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away." - Antoine de Saint-Exupery
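One way to make the layering point concrete: if every layer must work for a query to be answered, and layer failures are assumed to be independent, the availabilities multiply, so the result is never better than the weakest layer. The figures below (a 99.999% anycast footprint and a 99.9% route-management system) are illustrative assumptions, not measurements.

```python
from math import prod


def serial_availability(layers: dict[str, float]) -> float:
    """Availability of a system in which every layer must be up.

    Assumes layer failures are independent, which is a simplification.
    """
    return prod(layers.values())


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the point.
    layers = {
        "anycast footprint": 0.99999,
        "route management automation": 0.999,
    }
    total = serial_availability(layers)
    weakest = min(layers.values())
    print(f"combined: {total:.5%}, weakest layer: {weakest:.5%}")
```

Under these assumptions the combined figure lands just below the 99.9% of the weaker layer, however good the anycast footprint is.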

Phil Fagan

8:49 p.m.

Thumbs up on this one; my entire path, and the chain of management of that path, need to be equally fault tolerant. Awesome.

-- Phil Fagan Denver, CO 970-480-7618

Rubens Kuhl

8:39 p.m.

ICANN new gTLD agreements specified 100% availability for the service, meaning at least 2 DNS IP addresses answered 95% of requests within 500 ms (UDP) or 1500 ms (TCP) for 51+% of the probes, or 99% availability for a single name server, defined as 1 DNS IP address.

Rubens
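The thresholds quoted above can be turned into a small check. This is only a sketch of one reading of those numbers: a name server IP counts as responsive for a given probe when at least 95% of that probe's queries were answered within 500 ms over UDP or 1500 ms over TCP. The data layout and function names are assumptions for illustration, not ICANN's monitoring implementation, and the cross-probe 51% rule is left out.

```python
from typing import Optional, Sequence

# Round-trip-time limits in milliseconds, as quoted in the message above.
RTT_LIMIT_MS = {"udp": 500.0, "tcp": 1500.0}
REQUIRED_FRACTION = 0.95


def fraction_timely(rtts_ms: Sequence[Optional[float]], transport: str) -> float:
    """Fraction of queries answered within the limit for this transport.

    A value of None stands for a query that got no answer at all.
    """
    if not rtts_ms:
        return 0.0
    limit = RTT_LIMIT_MS[transport]
    timely = sum(1 for rtt in rtts_ms if rtt is not None and rtt <= limit)
    return timely / len(rtts_ms)


def probe_sees_server_up(rtts_ms: Sequence[Optional[float]], transport: str) -> bool:
    """True if this probe's samples meet the 95% threshold."""
    return fraction_timely(rtts_ms, transport) >= REQUIRED_FRACTION
```

Note that the same 900 ms response counts as timely over TCP but late over UDP, so the transport has to be recorded with every sample.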

Phil Fagan

8:48 p.m.

Good reference; thank you.

-- Phil Fagan Denver, CO 970-480-7618

Eric Brunner-Williams

13 Sep

2:32 a.m.

On 9/12/13 1:39 PM, Rubens Kuhl wrote:

...

ICANN new gTLD agreements specified 100% availability for the service, meaning at least 2 DNS IP addresses answered 95% of requests within 500 ms (UDP) or 1500 ms (TCP) for 51+% of the probes, or 99% availability for a single name server, defined as 1 DNS IP address.

unless phil happens to be building out (or spec'ing out $provider's offered sla) for one of the happy thousand or so celebrants of 2014, a surprisingly large fraction of which are tenant plays on existing infrastructure, the bogie above, uninterpreted, is not a controlling authority.

additionally, was phil asking for a metric for an authoritative server, serving a zone delegated directly from the iana root? was he asking for a metric for a caching server? and if the metric is "queries completed vs. queries lost", from where to where? (that is the "uninterpreted" bit from the bogie rubens quotes, as we did have to correct some assumptions of the requirement author: where is the measurement being performed?)

i'm with randy on this. dns is a service; the better question is what fails as query response degrades, in the presence of hierarchical caching and the protocol being used as designed under best effort of infrastructure and application.

eric

Randy Bush

12 Sep

9:35 p.m.

...

Everything else remaining equal...is there a standard or expectation for DNS reliability? ... Measured in queries completed vs. queries lost.

this is the wrong question. the protocol is designed assuming query failures. randy
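A rough illustration of why a per-query loss figure says little on its own: resolvers retry, typically against more than one server. Under the simplifying assumption that each attempt fails independently with the same probability, the chance that a lookup fails outright shrinks geometrically with the number of attempts. The loss rate and attempt counts below are made-up values.

```python
def lookup_failure_probability(per_query_loss: float, attempts: int) -> float:
    """Probability that every attempt of a single lookup is lost.

    Assumes attempts fail independently, which real outages often violate.
    """
    if not 0.0 <= per_query_loss <= 1.0:
        raise ValueError("per_query_loss must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return per_query_loss ** attempts


if __name__ == "__main__":
    # Hypothetical: 2% of individual UDP queries go unanswered.
    for attempts in (1, 2, 3, 4):
        p = lookup_failure_probability(0.02, attempts)
        print(f"{attempts} attempt(s): lookup fails with probability {p:.2e}")
```

The independence assumption is the weak point: a routing or server outage makes consecutive attempts fail together, and that case is exactly what a raw completed/lost ratio does not distinguish.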

George William Herbert

10:26 p.m.

On Sep 12, 2013, at 2:35 PM, Randy Bush <randy@psg.com> wrote:

...

...
Everything else remaining equal...is there a standard or expectation for DNS reliability? ... Measured in queries completed vs. queries lost.

this is the wrong question. the protocol is designed assuming query failures.

randy

I think it's part of the right answer. Capacity and server connectivity issues, what this metric will mostly measure, do matter.

The other part, more likely to get you on CNN and Reddit and the front pages of the NY Times and WSJ, is the area represented by MTBF / MTTR / etc.: how often is DNS for your domain DOWN - or WRONG - and how fast did you recover?

The other subthread about routability plays into that. For BIGPLACE environments, you should be considering how many AS numbers independently host DNS instances for you, in how many geographical regions, and whether you have a backup registrar available and spun up...

-george william herbert

Sent from Kangphone
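For the MTBF / MTTR view, the usual steady-state relation is availability = MTBF / (MTBF + MTTR). A minimal sketch follows; the function names and the numbers used with them are invented for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and
    mean time to repair (both in the same unit)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf must be positive and mttr non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per (non-leap) year at a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical service: one failure every 999 hours, one hour to fix.
    a = availability(999.0, 1.0)
    print(f"availability {a:.3%}, about {downtime_minutes_per_year(a):.0f} min/year down")
```

Read the other way round, 99.999% leaves a little over five minutes of downtime per year, so a single slow recovery can use up the whole budget.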

George Michaelson

10:34 p.m.

we're already outside our operating envelope, if these community expectation figures are believable.

a wise man once said to me that when setting formal conformance targets it's a good idea to only set ones you can honestly achieve, otherwise you're setting yourself up to be measured to fail.

I don't think that necessarily competes with 'aim high' ('be all you can be') but...

Randy Bush

10:39 p.m.

...

we're already outside our operating envelope

not really. just some folk seem not to understand things such as udp datagrams and the dns protocols. randy

George William Herbert

11:02 p.m.

On Sep 12, 2013, at 3:39 PM, Randy Bush <randy@psg.com> wrote:

...

...
we're already outside our operating envelope

not really. just some folk seem not to understand things such as udp datagrams and the dns protocols.

randy

Statistically, UDP sometimes arrives after an internet wide round trip. Honest!

The worry is bimodal. Most small sites, two or three servers, stop worrying. Most medium sites, watch your server load and run external monitoring. Most big sites are not sufficiently paranoid / redundant here.

-george william herbert

Sent from Kangphone

George Michaelson

11:36 p.m.

you removed a clause in that sentence, randy: "we're already outside our operating envelope, if these community expectation figures are believable"

there is a point to that clause. it's the same as your answer in some respects.

Christopher Morrow

13 Sep

2 a.m.

On Thu, Sep 12, 2013 at 6:26 PM, George William Herbert <george.herbert@gmail.com> wrote:

...

The other subthread about routeability plays into that. For BIGPLACE environments, you should be considering how many AS numbers independently host DNS instances for you, in how many geographical regions, and do you have a backup registrar available spun up...

here's an interesting point... if you are a BIGPLACE, do you want to trust your fate to some third party hosting your dns for you? What about how your internal name service stuff is managed?

say you have a practice of using rsh to effect updates across your 4 main dns nodes. adding a 5th or Nth outside, where rsh is not possible/desired, means adding additional processes and cruft to your update process. is this acceptable?

Take, for instance, the FBI.gov domain 3 days ago: some set of updates happened, their ipv4 servers were answering with a consistent response, their ipv6 nodes were answering with a variety of not correct answers :(

In the case of the FBI.gov domain, all of it is handled outside 'fbi.gov hands' (all servers hosted externally) but...

-chris
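The FBI.gov case suggests a simple consistency check: collect the answer each authoritative server gives for the same name and flag the servers that disagree with the most common answer. The sketch below covers only the comparison step; how the answers are gathered (for example with `dig` against each server's IPv4 and IPv6 address) is left out, and the server labels and addresses are placeholders.

```python
from collections import Counter


def find_inconsistent_servers(
    answers: dict[str, frozenset[str]],
) -> dict[str, frozenset[str]]:
    """Return the servers whose answer set differs from the most common one."""
    if not answers:
        return {}
    majority, _count = Counter(answers.values()).most_common(1)[0]
    return {srv: ans for srv, ans in answers.items() if ans != majority}


if __name__ == "__main__":
    # Placeholder data using a documentation address range.
    observed = {
        "ns1 (IPv4)": frozenset({"192.0.2.10"}),
        "ns2 (IPv4)": frozenset({"192.0.2.10"}),
        "ns1 (IPv6)": frozenset({"192.0.2.99"}),
        "ns2 (IPv6)": frozenset(),
    }
    for server, answer in find_inconsistent_servers(observed).items():
        print(f"{server} disagrees: {sorted(answer) or 'no answer'}")
```

Majority voting is only a heuristic: if most servers picked up a bad update, the correct ones are the ones that get flagged, so the output is a prompt to investigate rather than a verdict.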

Valdis.Kletnieks＠vt.edu

12:45 a.m.

Remember to factor in Duane Wessels's work that showed that something like 98% of the DNS traffic at the root servers was totally bogus?

Maybe you need to factor in "broken queries not answered, and offenders slapped around with a large trout"? Because if it's busted requests you're sending towards the root, they're going to count against your completed/lost ratio in a really bad way.

Anybody know if people have cleaned up their collective acts since Duane did that paper?

Sebastian Castro

16 Sep

8:45 p.m.

Wearing a different hat, I had the chance to rerun that analysis with data from 2008 (the original paper is from 2003), and the numbers were still around 98%:

http://www.caida.org/publications/presentations/2008/wide_castro_root_server...

Cheers,

-- Sebastian Castro DNS Specialist .nz Registry Services (New Zealand Domain Name Registry Limited) desk: +64 4 495 2337 mobile: +64 21 400535
