Recommended DNS server for a medium-sized 20-30k-user ISP

Hello all, do you have any recommendations for recursive DNS servers for a medium-sized (20-30k users) ISP? We have used PowerDNS and Unbound, but sometimes find the cache response times a bit on the high side. Any suggestions between these two, or anything new? We would also appreciate pointers on how much to tune the settings, and the pros and cons if any. Thank you /DP

We find SimpleDNSPlus (https://simpledns.plus/) scales quite well. It runs under Windows, but as long as you operate behind a good firewall, it's not a security issue. It can also host zones, so it can double as a recursive name server and as a primary or secondary name server for your hosted domains. It includes a nice graphical display of performance metrics. -mel via cell

For millions of customers I would use Knot Resolver: https://www.knot-resolver.cz/ - Andrew "lathama" Latham

Maybe I'm naive here, but would ISC BIND not be a reasonably good choice? -Rusty

We ran BIND on Linux boxes for decades, but just got tired of the tedious maintenance tasks and error-prone zone file maintenance. SimpleDNSPlus has a great GUI, simple software-update procedures, excellent real-time performance monitoring, and paid technical support. The pointless tedium of bare-bones BIND doesn't make a lot of sense for resilient infrastructure. SimpleDNSPlus is just one of many packaged DNS products; you choose one based on the features and scalability you need. -mel via cell

BrbOS: https://brbytelatam.com/brbos

Webmin is a GUI for BIND if you're managing a zone. For 30k users it sounds more like a caching server, which BIND/named would be a great option for.

On Thu, 2025-08-07 at 22:02 -0400, Josh Luthman via NANOG wrote:
Webmin is a gui for bind if you're managing a zone.
For 30k users it sounds more like a cache server, which bind/named would be a great option for.
You can of course spread the load out across several servers for redundancy. DNS clients typically round-robin requests between servers. On Linux boxes, you can also set up client caches, which will reduce requests across the network; Windows and macOS clients have a similar caching service. BIND caches responses from remote zones as well. I am not familiar with other DNS servers, but I bet they all have similar functionality. It may just come down to ease of administration. -- Smoot Carl-Mitchell System/Network Architect voice: +1 480 922-7313 cell: +1 602 421-9005 smoot@tic.com
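On Linux, the client-side cache mentioned above can be as simple as pointing clients at the local systemd-resolved stub. A minimal sketch; the upstream addresses are documentation placeholders standing in for the ISP's resolvers:

```
# /etc/systemd/resolved.conf -- hedged sketch; 203.0.113.x are placeholders
[Resolve]
DNS=203.0.113.10 203.0.113.11
Cache=yes
```

Clients then query the 127.0.0.53 stub listener, and repeated lookups are answered locally without crossing the network.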

It's surprising that you didn't get the performance you hoped for out of PowerDNS. You already tried the suggestions in their tuning guide[0], I'm assuming? You may also want to load entire zones into the hot cache[1]. And there's always horizontal scaling; sometimes you just plain hit limits on vertical scale. I haven't tried it yet, but dnsdist[2] should let you do this. (Or keepalived and/or HAProxy, or any load balancer that can handle raw TCP and UDP.) dnsdist in particular seems explicitly targeted toward a large set of untrusted clients, with additional optional "safeguarding/consumer protection" features. Quad9 uses it in some fashion, if I recall correctly. [0] https://doc.powerdns.com/recursor/performance.html [1] https://docs.powerdns.com/recursor/lua-config/ztc.html [2] https://www.dnsdist.org/index.html
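The hot-cache loading referenced above ("zone to cache") is a one-liner in the Recursor's Lua configuration. A hedged sketch, assuming you want the root zone preloaded:

```lua
-- PowerDNS Recursor lua-config sketch: fetch the root zone over HTTPS
-- into the record cache, refreshed according to the zone's SOA timers.
zoneToCache(".", "url", "https://www.internic.net/domain/root.zone")
```

This goes in the file named by the Recursor's lua-config-file setting; the same call works for any zone you can fetch by AXFR, URL, or local file.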

You beat me to it - dnsdist is an exceptionally robust solution for front-ending recursive (or authoritative) servers. Quad9 is indeed using it for all our recursive systems, and we split traffic on the back end between PowerDNS Recursor and Unbound. dnsdist has a packet-cache feature which handles much of the load once warmed, and it answers on DoT/DoH as well as providing a very rich set of tooling for managing unwanted behaviors.

The combination of dnsdist plus a good recursive resolver should easily handle 30k users on a single modest chassis, though of course there are very good reasons to have several similarly configured systems in fail-over models using ECMP or your favorite routing protocol. (Hot caches work better - try not to spread load too much.)

At this point, I can't imagine running a recursive system that is open to anything other than a tiny number of users without putting dnsdist in front of it - it's exactly the right thing and has been sandblasted by a lot of trial and error to make it fast and reliable, with lots of features for ISP environments. If a decent-sized system doesn't seem fast, there may be some other underlying issue at the root of the perceived speed problem. There is useful data that can be pulled out of dnsdist with Prometheus-style outputs - I would suggest instrumenting things and seeing where the problems are.

Now, the original question of "points on how much we tune the settings" - that is a much longer discussion, but honestly you can get to 80% goodput without too much fiddling. JT
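The split-backend-plus-packet-cache setup described here can be sketched in a few lines of dnsdist's Lua configuration. All addresses, ports, and certificate paths below are placeholders, not anyone's production config:

```lua
-- dnsdist config sketch: listen for clients, cache answers, and spread
-- the remaining load across a PowerDNS Recursor and an Unbound backend.
setLocal("198.51.100.53:53")                        -- client-facing UDP/TCP
addTLSLocal("198.51.100.53:853",
            "/etc/dnsdist/cert.pem", "/etc/dnsdist/key.pem")  -- DoT listener

newServer({address = "127.0.0.1:5301", name = "pdns-recursor"})
newServer({address = "127.0.0.1:5302", name = "unbound"})
setServerPolicy(leastOutstanding)                   -- pick the least-loaded backend

pc = newPacketCache(1000000, {maxTTL = 86400})      -- answer repeats from memory
getPool(""):setCache(pc)
```

Once the packet cache is warm, most repeat queries never reach the resolvers at all, which is where much of the single-chassis headroom comes from.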

Not a lot of detail on your needs, but you may consider just providing service through one of the very big DNS providers. The expense of building, managing, and supporting your own infrastructure is not insignificant. You may also be able to offer add-on services through a big provider that would be difficult to roll on your own, like security features, safe search, parental controls, etc.

*NEVER* use an off-net resolving DNS server for an ISP. ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com

On Fri, Aug 8, 2025, 00:41 John Todd <jtodd@loligo.com> wrote:
Thanks, John! I was considering evaluating and deploying dnsdist for our own customers, and this has me convinced that's a solid direction; if it works well for y'all at Quad9, it'd definitely work for us. Cheers!

On Aug 7, 2025, at 9:41 PM, John Todd via NANOG <nanog@lists.nanog.org> wrote:
we split traffic on the "back-end" between PowerDNS recursor and Unbound
Using multiple products is definitely best practice. At my company, half of our (anycasted) authoritative DNS servers run BIND and the other half run PowerDNS. If you don't do this, you can be vulnerable to something like CVE-2025-40775, where an attacker can terminate all your DNS servers simultaneously by sending each a malicious packet. Or maybe there's some other bug in the software that makes it randomly crash at a certain time. If that happens, you want to make sure that only half of them go offline. -- Robert L Mathews

On 8 Aug 2025 at 00:44, DurgaPrasad - DatasoftComnet via NANOG wrote:
The university I worked at ran ISC BIND 9 on two machines as a combined recursive resolver and authoritative server. You can run several of them and assign them randomly to your customers if the load is too high. -- Regards, Marco Send unsolicited bulk mail to 1754606680muell@cartoonies.org
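A minimal BIND 9 recursive-resolver configuration in that spirit might look like the fragment below; the ACL prefixes and cache size are placeholders to adapt:

```
// named.conf sketch: recursion for customers only, with a bounded cache.
acl customers { 192.0.2.0/24; 2001:db8::/32; };

options {
    recursion yes;
    allow-recursion { customers; };
    allow-query { customers; };
    max-cache-size 2048M;   // cap the resolver cache
};
```

Restricting recursion to your own prefixes matters: an open resolver quickly becomes a reflection-attack amplifier.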

Howdy, For 30k users, a pair of BIND 9 servers will do just fine without any special performance tuning. Whether you use BIND 9 or any other DNS server software, the key things are that these should be bare metal, not virtual machines, and they should be dedicated to the DNS task. VMs or competing workloads introduce latency which will be perceptible in your DNS performance. You'll observe that the CPU is lightly used on these machines, and that's the result you want to see. This is true even if, for some reason, the bulk of your users do not use DoH to a public server for their web browsers' DNS lookups. On Thu, Aug 7, 2025 at 7:17 PM Smoot Carl-Mitchell via NANOG <nanog@lists.nanog.org> wrote:
DNS clients typically round robin requests between servers.
They do not. DNS resolvers may round-robin requests between authoritative servers, but clients usually talk to resolvers in the order configured. It's something to keep in mind if you want to spread the load between the DNS resolvers; 30k users is not enough for it to make much difference. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
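For stub resolvers that honor glibc's resolver options, the ordered-by-default behavior (and its override) looks like this; the addresses are placeholders:

```
# /etc/resolv.conf sketch -- glibc queries nameservers in listed order
# unless "rotate" is set; short timeouts keep failover snappy.
nameserver 203.0.113.10
nameserver 203.0.113.11
options rotate timeout:2 attempts:2
```

Without `rotate`, the second server sees traffic only when the first times out, which is exactly the uneven load Bill describes.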

Hi, We use PowerDNS Recursor together with dnsdist to handle millions of DNS requests per day for more than 100k users. In our experience, a small server such as one with an Intel E-22xx series CPU and 32 GB of RAM is sufficient for this setup. Beyond that, you only need to install dnsdist for load balancing and implement per-IP rate limiting. Best regards, David
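The per-IP rate limiting mentioned here is a one-liner in dnsdist's configuration; a hedged sketch, where the 50 qps threshold is an arbitrary example value:

```lua
-- dnsdist sketch: drop queries from any client source address that
-- exceeds roughly 50 queries per second.
addAction(MaxQPSIPRule(50), DropAction())
```

MaxQPSIPRule tracks per-source rates, so well-behaved customers are unaffected while a single abusive or infected host is contained.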

Anycast Unbound, preferably on something more mature than Linux, like FreeBSD or OpenBSD. The crucial part is _anycast_, so you don't have to pay protection money to the likes of HAProxy or F5 but can still have good service availability. The troublesome thing with resolver service is that clients tend to wait painfully long before they try No. 2 in the resolver list, so fast answers from the first one are kind of important.

My one piece of advice on anycast is to make _certain_ that the routing reflects service availability on individual nodes - i.e. a node that can't answer queries MUST stop advertising the resolver /128 (or /32 if you have that).

I have built this several times at various organisations. It is solid, as in "it just works". Also, since I made certain my resolvers speak IPv6, resolution is much snappier; authoritative DNS service has a very good v6 rollout status overall.

On tuning, you have a metric ton of options in Unbound - considerably more than in BIND. OTOH, since I learnt of Unbound I have avoided BIND for recursive service, so there might have been some evolution there. With that said, the people at CZ.NIC (Knot Resolver) are quite competent, so I would follow the advice given and look at their offering too. Of course you can run anycast with Knot Resolver as well.

-- Måns Nilsson primary/secondary/besserwisser/machina MN-1334-RIPE SA0XLR +46 705 989668 Hmmm ... a PINHEAD, during an EARTHQUAKE, encounters an ALL-MIDGET FIDDLE ORCHESTRA ... ha ... ha ...
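The withdraw-on-failure rule can be sketched as a small health-check state machine. This is an illustrative Python sketch; the probe itself (e.g. a query against 127.0.0.1) and the hook that actually toggles the route advertisement are assumptions left to the reader's routing daemon:

```python
# Hedged sketch: a node that cannot answer queries must stop advertising
# the anycast prefix. Hysteresis avoids flapping on a single lost probe.

class AnycastHealth:
    def __init__(self, fail_threshold: int = 3, ok_threshold: int = 2):
        self.fail_threshold = fail_threshold  # withdraw after N failed probes
        self.ok_threshold = ok_threshold      # re-advertise after M good probes
        self.fails = 0
        self.oks = 0
        self.advertising = True

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return whether the /128 should be advertised."""
        if probe_ok:
            self.oks += 1
            self.fails = 0
            if not self.advertising and self.oks >= self.ok_threshold:
                self.advertising = True
        else:
            self.fails += 1
            self.oks = 0
            if self.advertising and self.fails >= self.fail_threshold:
                self.advertising = False
        return self.advertising
```

A cron- or daemon-driven loop would probe the local resolver, feed `record()`, and enable or disable the routing daemon's anycast protocol accordingly.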

On Fri, 8 Aug 2025 at 12:19, Måns Nilsson via NANOG <nanog@lists.nanog.org> wrote:
my one advice on anycast is to make _certain_ that the routing reflects service availability on individual nodes -- i.e a node that can't answer queries MUST stop advertising the resolver /128 (or /32 if you have that).
If you do this in a single ASN, where you can guarantee that preferences are honored, then instead of pulling the advertisement, deprefer it. Eventually you will manage to cause an issue where all advertisements are falsely pulled. The same strategy works in any domain where you are testing whether something works, like conditioning a default route on pinging 8.8.8.8: don't pull, depref. -- ++ytti
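The depref-instead-of-pull idea might be sketched in a bird2 export filter like the one below; the prefix, LOCAL_PREF values, and the DEGRADED constant (which a health checker would rewrite before reconfiguring the daemon) are all assumptions for illustration:

```
# Hedged bird2 sketch of "depref, don't pull": the route stays advertised
# even when degraded, so a buggy health check cannot blackhole everything.
define DEGRADED = false;

filter anycast_export {
    if net = 192.0.2.53/32 then {
        if DEGRADED then bgp_local_pref = 50;   # still present, just less preferred
        else bgp_local_pref = 200;
        accept;
    }
    reject;
}
```

If every node's checker falsely reports failure, all nodes end up depreferred equally and service continues, which is the failure mode Saku is optimizing for.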

Eventually you will manage to cause an issue, where all advertisements are falsely pulled.
Good advice. -- Måns Nilsson primary/secondary/besserwisser/machina MN-1334-RIPE SA0XLR +46 705 989668 Do you like "TENDER VITTLES"?

Saku Ytti via NANOG wrote on 08/08/2025 10:23:
Eventually you will manage to cause an issue, where all advertisements are falsely pulled.
Someone up-thread mentioned firewalling DNS servers. Withdrawing DNS service workers due to firewall state overloading can cause a cascading service failure that takes out an entire DNS infrastructure within milliseconds. Don't ask me how I know this. It also, obviously, works when n=1. tl;dr: packet filters only for DNS, preferably in hardware. Don't ever use state tracking. Nick
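In the spirit of "packet filters only, no state tracking", a stateless nftables sketch might look like this; the customer prefix is a placeholder and the rules live at raw priority so DNS never enters conntrack:

```
# nftables sketch: filter DNS statelessly and keep it out of conntrack.
table inet dns_filter {
    chain pre {
        type filter hook prerouting priority raw; policy accept;
        udp dport 53 notrack       # never create conntrack state for DNS
        tcp dport 53 notrack
        udp dport 53 ip saddr != 192.0.2.0/24 drop   # customers only
    }
}
```

Because no per-flow state is kept, a query flood can exhaust CPU at worst, never a state table, which removes the cascading-failure mode described above.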

Nick Hilliard said: "Withdrawing DNS service workers due to firewall state overloading can cause cascading service failure which can take out an entire DNS infrastructure within milliseconds. Don't ask me how I know this. Also obviously works when n=1. tl;dr: packet filters only for DNS, preferably in hardware. Don't ever use state tracking."

Nick, Appropriately sized, HA firewall pairs mitigate this pretty handily. In my opinion, the days of not firewalling critical infrastructure are pretty much over. There are just too many potential vulnerabilities to expect packet filters alone to address them. If necessary, you can use multiple segregated firewalled networks for redundancy to mitigate cascading service failures. -mel

We do do it. No problems in ten years. We upgrade the firewalls to cheaper, faster, more reliable models every few years. In the meantime, DNS traffic has actually declined, probably due to DoH. I'm happy to hear your war stories 🙂 -mel

On Aug 8, 2025, Nick Hilliard <nick@foobar.org> replied, quoting Mel Beckman:
Appropriately sized, HA firewall pairs mitigate this pretty handily.
Mel, Please don't let me stop you from doing this. The failure modes are really quite entertaining, at least from a distance. Anyone got popcorn? Nick

Yeah. As a person who in my $dailyjob builds hardware firewalls (so-called NGFWs, but also "SP class" boxes), I can assure you properly configured DNS servers can absolutely defend themselves. If they need protection, you're doing it wrong. And there are design choices (load balancers, ECMP/UCMP, anycast) that make these designs scale and switch over without any problems, if additional "capabilities" don't get in their way.

Adding a stateful firewall in front of them is a waste of good hardware. Moreover, if you insist on doing so, you'll likely suffer from state exhaustion or self-DDoS at some point. That typically leads you to blame the firewall vendor, and not your poor thinking, design, or planning skills. Don't do that. KISS is decent design practice. Doing "tricks" with a firewall may be relevant to Enterprise types of deployment, where "fusing" DNS info with other pieces (identity, data plane telemetry, etc.) is typically an element of your security architecture (and defense).

What is way more useful for layered defence is applying QoS on the upstream switch/router, if it is enforced in hardware. "QoS" as expressed in maximum packets/second (which are roughly requests), not as in bits/second (which is pretty useless). That is, if you do know the rough levels beyond which your server behaves in a less stable/predictable way. This is hardly unique or innovative though.

I did deploy myself, and helped others to deploy, FreeBSD-based BIND and nsd+unbound anycasted DNS servers. The biggest one (two pairs of Xeon-based servers) was handling requests from ~3 million users while mostly idling, last time I checked. And that was a couple of years ago. I know it's still in production and handling "more". The only firewall they have is pf with a pretty generic set of rules to drop host attacks and protect management access; DNS traffic is unfiltered, as filtering it doesn't make any sense. -- ./

Subject: Re: Recommended DNS server for a medium 20-30k users isp Date: Fri, Aug 08, 2025 at 05:19:39PM +0100 Quoting Nick Hilliard via NANOG (nanog@lists.nanog.org):
Mel Beckman wrote on 08/08/2025 17:08:
Appropriately sized, HA firewall pairs mitigate this pretty handily.
Mel,
Please don't let me stop you from doing this. The failure modes are really quite entertaining, at least from a distance. Anyone got popcorn?
I suppose you bring the beer then, because it's going to take both to endure the cringefest that is "cascading resource exhaustion in DNS / firewall setup" -- it can pretty fast end up snowballing completely out of hand. Don't ask me how I know without picking up the bar tab. /Måns -- Måns Nilsson primary/secondary/besserwisser/machina MN-1334-RIPE SA0XLR +46 705 989668 Am I accompanied by a PARENT or GUARDIAN?

Sheesh! People claim firewalling DNS is bad, but hide the receipts behind "pay my bar tab" evasion. Here's the real bar talk: put up or shut up. LOL! Data or it never happened. -mel via cell

On Sat, 9 Aug 2025 at 15:42, Måns Nilsson via NANOG <nanog@lists.nanog.org> wrote:
I suppose you bring the beer then, because it's going to take both to endure the cringefest that is "cascading resource exhaustion in DNS / firewall setup" -- it can pretty fast end up snowballing completely out of hand. Don't ask me how I know without picking up the bar tab.
I can share lessons from personal mistakes.

a) A FW is always an additional fuse in front of the service; the failure modes are the union of FW and service, so MTBF is lower and MTTR is higher:
- state establishment rate is reduced
- state count is reduced
- either the FW has protocol intelligence and occasionally (as protocols evolve or more exotic use cases exist) drops valid protocol packets, or it is protocol-unintelligent and adds nothing over a stateless HW-based filter on the edge router
- any service protected by a FW is easier to DoS than the same service without one

b) Even if a FW is run (like in front of a corporate LAN, which doesn't have to deal with denial-of-service issues, and where a regulator, PCI, or equivalent may require a FW), the valid configurations in my mind are:
- if 2 == cluster, 1 == single, and + == routing separation
- 1, 1+1, and 2+1 are valid configurations
- 2 and 2+2 are invalid configurations
- every time I've run '2', eventually there has been a case where the cluster is dead and MTTR is high, as the vendor needs to be engaged, and depending on the hour the people at the vendor who can actually troubleshoot the issue are not at work (used to be US hours; now increasingly the experts are on India time)
- so if you can only afford 2 devices, have two devices separated by routing; you'll lose state during a failure, but you'll have fewer failures. Even if you can afford 4 devices, don't buy two clusters, since the problem that breaks one cluster may affect both clusters

Generally a FW is needed if what is behind it has dubious and unknown state (like a user LAN). But if what is behind the FW is a well-thought-out DNS or HTTP service, the FW adds no utility and a lot of liability. -- ++ytti

Saku, Thanks for the well-delineated examples. I agree with them. You clearly illustrate wrong configurations that can cause unanticipated failure modes. Thus it's best to follow established design patterns, rather than cooking your own recipe.

But how is this different than using a firewall to protect any other service? Firewalls can fail, and thus require resiliency considerations. But they also can do a lot to insulate underlying services from attacks — source IP flooding, for example, or the myriad of sequence attacks — the kinds of attacks that are difficult to protect against in the pure IP stack.

I submit that one major firewall advantage is consistency of implementation. People who are protecting their DNS by cleverly hardening them using packet filters and load balancing are doing so with error-prone manual methods. Human error, as HAL says, is always a problem. Firewall code, on the other hand, goes through certification processes and deep regression testing before being deployed. Firewall developers are dedicated to the protection mission, while people standing up DNS at many enterprises, including ISPs, are not DNS experts. DNS is just one of many services they must manage.

I appreciate your anecdotes, but as every good scientist knows, the plural of anecdote is not data. I need to see some data backing up these claims about the relative unreliability of firewalls. -mel

Firewalls have a long history of breaking DNS. They have been known to:

Throw away UDP fragments. This breaks responses that exceed the path MTU. There is a myth that IPv6 doesn't have fragments so they can just be blocked, which makes IPv6 particularly bad in this respect.

Drop ICMP PTB. This breaks PMTU discovery, which partially affects IPv6 UDP responses getting through, as the sender needs to fragment. It also stops TCP responses where the MSS and PMTU don't align. MSS fix-up wouldn't be needed if ICMP PTB weren't blocked and were consistently generated.

Filter out every query type but a handful that are magically blessed. The firewalls doing this are often years behind the current query mix, and DNS servers don't need this service anyway. DNS servers know how to return "this record does not exist." Additionally, if you have added the record to the zone, you don't need a firewall second-guessing your desires.

Block DNS over TCP. DNS has ALWAYS used both UDP and TCP for normal queries. There have been plenty of times where UDP responses have said "retry over TCP" because the answer is too big, only for the TCP request to be blocked because of the myth that DNS is UDP-only.

Run out of state tracking. Recursive servers make hundreds of queries per incoming query when their caches are empty. We've seen connection tracking tables overwhelmed often.

Be stupid firewalls that "know" that this bit is 0, or that this type never appears in this section, or that there aren't any EDNS options in requests, or that drop requests with unknown EDNS options. Nameservers have rules for dealing with the unknown, and they are infinitely better than dropping the request.

I'm sure there are other stupidities I've seen firewalls do. Juniper were particularly bad until we complained enough to get the defaults changed. -- Mark Andrews
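Mark's point that DNS has always used both UDP and TCP can be sketched with a minimal, stdlib-only resolver stub: send the query over UDP, and if the response comes back with the TC (truncated) bit set, retry over TCP with the two-byte length prefix that RFC 1035 requires. This is an illustrative sketch, not production code; the resolver address 9.9.9.9 is just an example.

```python
import socket
import struct

def build_query(name, qtype=1, txid=0x1234):
    """Minimal DNS query (RD=1) for `name`; qtype 1 = A record."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + qname + b"\x00" + struct.pack(">HH", qtype, 1)

def is_truncated(response):
    """True if the TC bit (0x0200 in the DNS flags word) is set."""
    flags = struct.unpack(">H", response[2:4])[0]
    return bool(flags & 0x0200)

def resolve(name, server="9.9.9.9", timeout=2.0):
    """Query over UDP first; fall back to TCP when the answer is truncated."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(4096)
    if not is_truncated(response):
        return response
    # RFC 1035 section 4.2.2: TCP DNS messages carry a two-byte length prefix.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack(">H", len(query)) + query)
        size = struct.unpack(">H", tcp.recv(2))[0]
        buf = b""
        while len(buf) < size:
            buf += tcp.recv(size - len(buf))
        return buf
```

A firewall that blocks TCP/53 silently breaks the fallback branch above, which is exactly the failure mode being described.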

But how is this different than using a firewall to protect any other service? Firewalls can fail, and thus require resiliency considerations. But they also can do a lot to insulate underlying services from attacks — source IP flooding, for example, or the myriad of sequence attacks — the kinds of attacks that are difficult to protect against in the pure IP stack.

For a DNS service, a few network ACLs upstream, combined with very standard protection mechanisms on the host, is more than sufficient.

I submit that one major firewall advantage is consistency of implementation. People who are protecting their DNS by cleverly hardening them using packet filters and load balancing are doing so with error-prone manual methods.

It has been trivial for many years now to manage entire fleets of servers with automation and tooling to maintain consistent configurations.

Firewall code, on the other hand, goes through certification processes and deep regression testing before being deployed. Firewall developers are dedicated to the protection mission, while people standing up DNS at many enterprises, including ISPs, are not DNS experts. DNS is just one of many services they must manage.

The large firewall vendors do way less testing than you are assuming here. Some are better than others, but there isn't a single one that is releasing excellent quality code on a timely basis. It's also fair to say that most enterprises aren't doing their own extensive testing of firewall code and operation themselves before deploying.

I need to see some data backing up these claims about the relative unreliability of firewalls.

I'm not sure anyone is making the claim that firewalls are unreliable. The statement is that putting stateful firewalls in front of a DNS service can cause said DNS service to become unreliable, because of the way stateful firewalls function and the nature of DNS traffic and operation.

On Sat, Aug 9, 2025 at 1:46 PM Mel Beckman via NANOG <nanog@lists.nanog.org> wrote:

On Fri, Aug 8, 2025 at 2:24 AM Saku Ytti via NANOG <nanog@lists.nanog.org> wrote:
On Fri, 8 Aug 2025 at 12:19, Måns Nilsson via NANOG <nanog@lists.nanog.org> wrote:
my one advice on anycast is to make _certain_ that the routing reflects service availability on individual nodes -- i.e a node that can't answer queries MUST stop advertising the resolver /128 (or /32 if you have that).
If you do this in a single ASN, where you can guarantee preferences are honored, then instead of pulling advertisement, deprefer it.
Eventually you will manage to cause an issue, where all advertisements are falsely pulled.
Same strategy works in any domain where you are testing if something works, like default route by pinging 8.8.8.8, don't pull, depref.
Having been bitten by this in the past...never base your determination of "healthy" or "working" on a single external data reference. It can be tempting to just assume 8.8.8.8 will always be "up" and "pingable" to verify your internet connectivity is good...right up to the point where Google has a routing snafu, and your DNS infrastructure goes into cascading failure as every one of your sites begins depreferencing its announcements based on the failure of the external health check, and the load begins shifting to a smaller and smaller number of serving sites that were slower at detecting and depreferencing their route announcements, often to the point where the final site is so overwhelmed by all the traffic slamming it that it can't perform healthcheck/depreferencing anymore.

Always have at least 3 external probe destinations or health check sites, operated by different entities, and only depreference upon failure to reach 3/3 or 2/3. Do not make decisions about the health of your network based upon the health of a single external entity (unless they are your only upstream provider, or you otherwise share fate with them). If you're pinging someone else to make sure the internet is still alive, ping several, like 8.8.8.8, 1.1.1.1, and 9.9.9.9, and don't react unless you see failures to reach multiple of them. Otherwise, it's likely to be their failure, not yours, and there's no reason to make things worse by changing your systems based on their problems.

...so many painful lessons learned the hard way over the years... ^_^; Matt
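Matt's rule (at least three independently operated probe targets, and react only when a quorum of them fails) can be sketched in a few lines. This is a hedged sketch: the probe targets are examples from the thread, the Linux iputils `ping` flags are assumed, and `should_depref`/`connectivity_ok` are hypothetical names.

```python
import subprocess

# Example probe targets; pick several operated by different entities.
PROBE_TARGETS = ["8.8.8.8", "1.1.1.1", "9.9.9.9"]

def probe(target, timeout_s=2):
    """One ICMP probe via the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def should_depref(results, min_failures=2):
    """Depreference our anycast announcement only when at least
    `min_failures` of the independent probes fail; a single unreachable
    target is treated as their problem, not ours."""
    return sum(1 for ok in results if not ok) >= min_failures

def connectivity_ok():
    return not should_depref([probe(t) for t in PROBE_TARGETS])
```

With `min_failures=2` out of three targets, one provider's routing snafu (or ping ACL) never triggers a withdrawal on its own.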

On Mon, Aug 11, 2025 at 3:08 PM Matthew Petach via NANOG <nanog@lists.nanog.org> wrote:
often to the point where the final site is so overwhelmed by all the traffic slamming it that it can't perform healthcheck/depreferencing anymore.
Hi Matthew, The unix "nice" command helps in this situation. It's counterintuitive to run the critical Internet-facing service at a below-normal priority, but it works. Under normal load there's no difference in performance but when the server is overloaded administrative access and health checks have priority access to the CPU. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/
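Bill's suggestion amounts to launching the daemon with `nice -n 10` (or equivalent). A minimal Python sketch of the same idea, with the hypothetical helper `start_deprioritized` and the niceness value chosen arbitrarily:

```python
import os
import subprocess

SERVICE_NICENESS = 10  # below-normal; health checks and admin shells stay at 0

def start_deprioritized(argv):
    """Start the daemon at below-normal CPU priority. Under normal load the
    niceness is invisible; under CPU saturation, the niceness-0 health
    checks and administrative sessions win the scheduler contention."""
    return subprocess.Popen(argv, preexec_fn=lambda: os.nice(SERVICE_NICENESS))
```

For example, `start_deprioritized(["unbound", "-d"])` would behave like `nice -n 10 unbound -d`.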

On Mon, Aug 11, 2025 at 3:40 PM William Herrin <bill@herrin.us> wrote:
On Mon, Aug 11, 2025 at 3:08 PM Matthew Petach via NANOG <nanog@lists.nanog.org> wrote:
often to the point where the final site is so overwhelmed by all the traffic slamming it that it can't perform healthcheck/depreferencing anymore.
Hi Matthew,
The unix "nice" command helps in this situation. It's counterintuitive to run the critical Internet-facing service at a below-normal priority, but it works. Under normal load there's no difference in performance but when the server is overloaded administrative access and health checks have priority access to the CPU.
Oh--I wasn't talking about the CPU having issues. I was talking about DDoSing your own site, with all the inbound worldwide traffic focusing in on the last remaining site, hammering the network links to the point of absolute congestion. At that point, trying to send update messages to depref the anycast routes for the site generally fails, leading to an extended outage as all the traffic gets stuck trying to reach that last site.

It's helpful to set a minimum number of anycast sites in your topology automation systems, such that sites will no longer remove themselves from rotation/distribution if doing so would reduce the count of active sites below the minimum required site count. Dynamic systems are great things, but as with most things in the world, "all things in moderation" is a good motto to keep in mind. Allow sites to dynamically adjust, but only within reasonably set bounds. Don't let too many sites decide they need to shed load at once; the first several, sure; but if the conditions continue, have a floor below which the system stops trying to react, and instead holds steady while paging a human to look at the bigger-picture problem, before the entire system goes offline due to the lemmings of automation all chasing one another off the proverbial cliff.

Fortunately for me, the search engine caches have long since purged the evidence of how some of these lessons were learned. ^_^;; Matt
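The minimum-site-count floor described above reduces to a small guard in the withdrawal logic. A sketch under stated assumptions: the floor value, the `may_withdraw` helper, and the `page_oncall` placeholder are all hypothetical names for illustration.

```python
MIN_ACTIVE_SITES = 3  # hypothetical floor for this anycast deployment

def page_oncall(message):
    """Placeholder: wire this to your real alerting system."""
    print(f"PAGE: {message}")

def may_withdraw(site, active_sites, min_active=MIN_ACTIVE_SITES):
    """Allow a site to withdraw its anycast route only while the count of
    active sites stays at or above the floor. At the floor, hold steady
    and page a human rather than automating the last sites off the air."""
    if site not in active_sites:
        return False  # already withdrawn; nothing to do
    if len(active_sites) - 1 < min_active:
        page_oncall(f"site {site} unhealthy but holding: at minimum site count")
        return False
    return True
```

The first few unhealthy sites shed load normally; once withdrawing would drop the system below the floor, the automation stops reacting and escalates instead.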

On Mon, Aug 11, 2025 at 6:16 PM Matthew Petach <mpetach@netflight.com> wrote:
Oh--I wasn't talking about the CPU having issues. I was talking about DDoSing your own site, with all the inbound traffic worldwide traffic focusing in on the last remaining site, hammering the network links to the point of absolute congestion. At that point, trying to send update messages to depref the anycast routes for the site generally fails, leading to an extended outage as all the traffic gets stuck trying to reach that last site.
Howdy. Why wouldn't the server itself be originating the announcement so that the high-pref route goes away when the routing session collapses?
It's helpful to set a minimum number of anycast sites in your topology automation systems, such that sites will no longer remove themselves from rotation/distribution if doing so would reduce the count of active sites below the minimum required site count.
Treading dangerous territory, since the participants can't necessarily know the difference between a site that's down and a site that's inaccessible to them (but not to other people). It might be safer for the system's components to intentionally collapse to the neutral routing preference at that point, rather than waiting for the failure cascade to push the system there. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/

On Mon, Aug 11, 2025 at 3:08 PM Matthew Petach via NANOG < nanog@lists.nanog.org> wrote:
Having been bitten by this in the past...never base your determination of "healthy" or "working" on a single external data reference. It can be tempting to just assume 8.8.8.8 will always be "up" and "pingable" to verify your internet connectivity is good...right up to the point where Google has a routing snafu
... No need for a routing snafu... 8.8.8.8 is currently getting a steady-state 27 Mpps (million packets/second) of ICMP ECHO_REQUEST. Internet connectivity checking is not a service we offer, and there is no SLA for it, therefore it may go away at any time. There is a very real risk of me running an April 1st experiment of "what would happen if I just ACL off all the pings?". I might have guessed I'd light up a couple dozen pagers and start a nanog@ flamewar... but if anyone is basing routing decisions on that, it will be a "fun" day indeed! Damian -- Damian Menscher :: Security Reliability Engineer :: Google :: AS15169

This here has always been my biggest concern with external monitoring. If the chosen site decides to deny ping one day, then your monitoring tool is broken. You can do a quick DNS lookup via a DNS server instead, since they shouldn't turn that off. But what happens when they notice the same site doing the same lookup(s) every x minutes?

In the past I've utilized the root DNS servers as a good measurement tool. The majority are anycast. All are dual-stack, so I get both IPv4 and IPv6 verification. If 60% of them are responding, we should be good. Again, this is load they aren't expecting, though I assume they know it is happening. I can rotate through doing a DNS lookup for .com, .net, .org, .gov, etc., so that I'm not doing the same thing over and over, and I'm utilizing something they are designed and prepared to handle.

David -- https://dprall.net On 8/11/2025 8:08 PM, Damian Menscher via NANOG wrote:
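David's approach (rotate the queried name across probes, spread the probes over the root servers, and call things healthy at a 60% response rate) can be sketched with the stdlib alone. Hedged sketch: the helper names are hypothetical, and the root server addresses listed are a small subset; take the full, current list from a root hints file rather than trusting these to stay accurate.

```python
import itertools
import socket
import struct

# Subset of root server IPv4 addresses, for illustration only.
ROOT_SERVERS = ["198.41.0.4", "192.33.4.12", "193.0.14.129"]

# Rotate the question so we're not repeating the exact same lookup.
QUERY_NAMES = itertools.cycle(["com.", "net.", "org.", "gov."])

def build_probe_query(name, txid=0x2222):
    """Minimal non-recursive DNS query for the NS records of `name`."""
    header = struct.pack(">HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + qname + b"\x00" + struct.pack(">HH", 2, 1)  # NS, IN

def probe(server, name, timeout=2.0):
    """True if `server` answers a single UDP query within the timeout."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(build_probe_query(name), (server, 53))
            s.recvfrom(4096)
        return True
    except OSError:
        return False

def meets_threshold(results, threshold=0.6):
    """Healthy when at least `threshold` of the probes got answers."""
    return sum(results) / len(results) >= threshold

def connectivity_ok():
    return meets_threshold(
        [probe(srv, next(QUERY_NAMES)) for srv in ROOT_SERVERS]
    )
```

The 60% threshold tolerates individual root instances rate-limiting or dropping a probe without declaring your own connectivity broken.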

You would be surprised at what percentage of DNS recursive resolution traffic is "a.root-servers.net" and "www.example.com" and other more specific names like "connectivitycheck.gstatic.com" (which I know has different purposes).

Related: there is a draft at the IETF about probing for "reachability" using the DNS, rather than picking random names (which tends to skew data or present unnecessary costs in various ways) or using ICMP echo. Since query-based status checking seems to be a thing that people do anyway, maybe it should be formalized so everyone can use/expect the same methods. https://datatracker.ietf.org/doc/draft-sst-dnsop-probe-name/

Despite the flippant comment below about an "april 1st experiment" with the largest global resolver, there is a significant risk associated with the concentration of measurements on systems with unintentional shared-fate issues. I expect there is a large community of services which expect correct DNS resolution and ICMP echo response from "a.root-servers.net" and "www.google.com" as indicators of general network accessibility. If (for example) the services in .com/.net/.org were to be offline, this would probably create much larger impact than their localized outage, since both those services would be offline, which would trigger undetermined failure behaviors in many network monitoring/automation or application software stacks. Using IP addresses for service-check destinations is slightly better, but as noted, ICMP is rarely a service with an SLA, and ICMP echo is frequently blocked or heavily rate-limited.

I will comment with my Quad9 hat on that there is no risk of us doing an April 1st experiment of turning off ICMP echo packets to 9.9.9.9. There are, however, real risks of ICMP having increased failure rates in DDoS conditions in any network, either locally or at the receiving end.

As another DNS-oriented friend of mine has in his .sig: "The Prudent Mariner never relies solely on any single aid to navigation."
JT

On 12 Aug 2025, at 7:15, David Prall via NANOG wrote:
This here has always been my biggest concern with external monitoring. If the chosen site decides to deny ping one day then your monitoring tool is broken.
Can do a quick DNS lookup via a DNS server, since they shouldn't turn that off. But, what happens when they notice the same site doing the same lookup(s) every x minutes.
In the past I've utilized the root DNS servers as a good measurement tool. Majority are anycast. All are dual-stack so I get both IPv4 and IPv6 verification. If 60% of them are responding we should be good. But again this is load they aren't expecting, but I assume they know is happening. I can rotate through doing a DNS lookup for .com, .net, .org, .gov, etc. so that I'm not doing the same thing over and over and I'm utilizing something they are designed and prepared to handle.
David
On 8/11/2025 8:08 PM, Damian Menscher via NANOG wrote:
On Mon, Aug 11, 2025 at 3:08 PM Matthew Petach via NANOG < nanog@lists.nanog.org> wrote:
Having been bitten by this in the past...never base your determination of "healthy" or "working" on a single external data reference. It can be tempting to just assume 8.8.8.8 will always be "up" and "pingable" to verify your internet connectivity is good...right up to the point where Google has a routing snafu
...
No need for a routing snafu... 8.8.8.8 is currently getting a steady-state 27Mpps (million packets/second) of ICMP ECHO_REQUEST. Internet connectivity checking is not a service we offer, and there is no SLA for it, so it may go away at any time. There is a very real risk of me running an April 1st experiment of "what would happen if I just ACL off all the pings?". I might have guessed I'd light up a couple dozen pagers and start a nanog@ flamewar... but if anyone is basing routing decisions on that, it will be a "fun" day indeed!
Damian
_______________________________________________ NANOG mailing list https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/YIM6ZS3Z...

On Tue, Aug 12, 2025 at 10:48 AM David Prall via NANOG <nanog@lists.nanog.org> wrote:
Can do a quick DNS lookup via a DNS server, since they shouldn't turn that off. But, what happens when they notice the same site doing the same lookup(s) every x minutes.
I think they won't notice, because that kind of query volume is orders of magnitude less than the average usage of one internet-connected device; that is, if you are running 2 or 3 queries every 3 or 4 minutes. Meanwhile, the average web-surfing user connects to websites that easily cause 20+ DNS queries over the span of a couple of seconds in order to load a whole web page, with all its JS frameworks, CSS, and fonts being remote-loaded from various domains.

Querying the service on the IP with an actual query is the best test, but it should be: use a few common FQDNs on different domains to run the lookup on, not just one FQDN. If any of the lookups succeed, the resolver is deemed "alive and working / available". If you only query one FQDN per resolver, you might not always be able to easily distinguish between a failure of the target authoritative domain you are querying and a lack of responsiveness by that resolver in general.

-- -JA
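The multi-FQDN rule above is easy to sketch; this is a minimal, hedged illustration (the probe names are placeholders, and the actual lookup transport is left as a pluggable callable rather than a specific resolver library):

```python
def check_resolver(resolve, fqdns):
    """Classify a resolver as 'alive' if any one of several independent
    lookups succeeds, and 'down' only when every lookup fails.

    `resolve` is any callable fqdn -> bool (e.g. a wrapper around your
    stub-resolver query), so the probe transport stays pluggable and
    the decision logic stays testable."""
    results = {name: resolve(name) for name in fqdns}
    state = "alive" if any(results.values()) else "down"
    return state, results

# A single failed name (say, an authoritative outage for one domain)
# does not mark the resolver down, which is the point of probing
# several unrelated domains.
probes = ["www.example.com", "www.example.net", "www.example.org"]
```

The `results` map is worth keeping: when a resolver is "alive" but one name consistently fails, you have evidence the problem is on the authoritative side, not the resolver.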

On Tue, 12 Aug 2025 at 01:08, Matthew Petach <mpetach@netflight.com> wrote:
If you're pinging someone else to make sure the internet is still alive, ping several, like 8.8.8.8, 1.1.1.1, and 9.9.9.9, and don't react unless you see failures to reach multiple of them. Otherwise, it's likely to be their failure, not yours, and there's no reason to make things worse by changing your systems based on their problems.
I am repeating myself a bit, apologies. But do also ensure that your health check is demoting, not removing; for example, by changing the admin weight to an inferior value. This way, if everything 'fails' because the health check itself is bogus, you are back to square one. Very easy to do with IP SLA/tracking and the like, yet most examples, even in vendor documentation, remove instead of demote. -- ++ytti
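As an illustration of "demote, not remove": an IOS-style sketch (all addresses, object numbers, and distances below are placeholders, and exact syntax varies by platform and release). The tracked static is withdrawn when the probe fails, but a floating copy of the same route at a worse administrative distance keeps the path usable if it turns out the health check itself is what broke:

```
ip sla 1
 dns a.root-servers.net name-server 198.41.0.4
 frequency 60
ip sla schedule 1 life forever start-time now
track 1 ip sla 1 reachability
! Preferred route, withdrawn when the probe fails...
ip route 0.0.0.0 0.0.0.0 192.0.2.1 track 1
! ...and the same next hop again at AD 250, so a bogus health check
! demotes the path instead of removing it outright.
ip route 0.0.0.0 0.0.0.0 192.0.2.1 250
```

If every better path really is gone, the floating route still forwards; if only the probe is broken, nothing user-visible changes.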

On Fri, Aug 8, 2025 at 2:17 AM Måns Nilsson via NANOG <nanog@lists.nanog.org> wrote:
anycast unbound, preferably on something more mature than Linux, so like FreeBSD or OpenBSD.
You don't need anycast DNS for 30k users. Stay away from anycast unless you really, really, really know what you're doing. DNS is also TCP and no commodity DNS software environment implements an anycast TCP stack, only the normal unicast stack. Route splitting shows up in the most unexpected places and it won't just give you a bad day, it'll give you a bad month with intractable and seemingly (but not really) intermittent problems that are challenging to nail down. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/

I do Anycast for much much smaller. It's great to reboot one server and have the other take all of the load. 0 customer interruption, not even a single DNS query lost. On Fri, Aug 8, 2025, 12:21 PM William Herrin via NANOG < nanog@lists.nanog.org> wrote:
On Fri, Aug 8, 2025 at 2:17 AM Måns Nilsson via NANOG <nanog@lists.nanog.org> wrote:
anycast unbound, preferably on something more mature than Linux, so like FreeBSD or OpenBSD.
You don't need anycast DNS for 30k users. Stay away from anycast unless you really, really, really know what you're doing.
DNS is also TCP and no commodity DNS software environment implements an anycast TCP stack, only the normal unicast stack. Route splitting shows up in the most unexpected places and it won't just give you a bad day, it'll give you a bad month with intractable and seemingly (but not really) intermittent problems that are challenging to nail down.
Regards, Bill Herrin
-- William Herrin bill@herrin.us https://bill.herrin.us/

On Fri, Aug 8, 2025 at 9:42 AM Josh Luthman <josh@imaginenetworksllc.com> wrote:
I do Anycast for much much smaller. It's great to reboot one server and have the other take all of the load. 0 customer interruption, not even a single DNS query lost.
Hi Josh, You don't need anycast routing to do that, or more precisely you don't need the route to persist in an anycast state for more than a few seconds during the handoff. You can implement dynamic but still unicast routing to the DNS servers without incurring the wrath of the anycast gods. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/

Subject: Re: Recommended DNS server for a medium 20-30k users isp Date: Fri, Aug 08, 2025 at 10:09:04AM -0700 Quoting William Herrin via NANOG (nanog@lists.nanog.org):
On Fri, Aug 8, 2025 at 9:42 AM Josh Luthman <josh@imaginenetworksllc.com> wrote:
I do Anycast for much much smaller. It's great to reboot one server and have the other take all of the load. 0 customer interruption, not even a single DNS query lost.
Hi Josh,
You don't need anycast routing to do that, or more precisely you don't need the route to persist in an anycast state for more than a few seconds during the handoff. You can implement dynamic but still unicast routing to the DNS servers without incurring the wrath of the anycast gods.
The elephant in the room is cascading failures. Other than that, I'd not want to be without anycast, for its service level record. I don't have to be up in the middle of the night to patch my resolvers. I can take the most loaded one out of service at any time by shutting down BGP; after a couple of seconds it will be completely drained of requests, and I can reboot. No customer or end user is going to notice.

Regarding TCP, yes, this is a potential issue. You can think about it and it will grow in your mind, or you can do some observations and conclude that unless you messed your routing up really badly (which is not DNS' fault, but still on-topic here), the mean length of a client-to-first-hop-resolver TCP session is going to be orders of magnitude shorter than the time between routing updates that make a given router change its mind about which anycast node is closest.

Further, I'd make an educated guess that the recursion traffic going from resolver to auth server is much more likely to hit TCP, and that is unicast all the way. Also, EDNS0: we usually have ~1200 bytes to play with, not 512. YMMV.

-- Måns Nilsson primary/secondary/besserwisser/machina MN-1334-RIPE SA0XLR +46 705 989668 YOW!! Everybody out of the GENETIC POOL!

On Sat, Aug 9, 2025 at 5:38 AM Måns Nilsson <mansaxel@besserwisser.org> wrote:
Regarding TCP, yes, this is a potential issue. You can think about it and it will grow in your mind, or you can do some observations and conclude that unless you messed your routing up really badly (which is not DNS' fault but still on-topic here) the mean session length for a client-to 1st hop resolver TCP session is going to be orders of magnitude shorter than the times between routing updates that make a certain router change its mind about which anycast node is the closest one.
Hi Måns, This is a case of misunderstanding what the numbers are telling you. Yes, the failure rate is low, but it's not random. It's not a case of 99 queries working and 1 failing, where you try again and it works. It's a case of queries working for 99 people while 1 person, with just the wrong connections in the network graph, experiences persistent failures. And then your front-line customer support blames the customer for your error, because obviously it's working for everybody else. If it doesn't work in the corner cases, then it doesn't work. Regards, Bill Herrin -- William Herrin bill@herrin.us https://bill.herrin.us/

On 8/7/25 20:44, DurgaPrasad - DatasoftComnet via NANOG wrote:
Hello all, Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP. We have used powerdns and unbound but sometimes find the caching times a bit on upper side. Any suggestions between these two or anything new?

I've been happy with PowerDNS Recursor. What sort of latency were you seeing, and at what loading?
-- Bryan Fields 727-409-1194 - Voice http://bryanfields.net

Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP. We have used powerdns and unbound but sometimes find the caching times a bit on upper side. Any suggestions between these two or anything new? I've been happy with PowerDNS Recursor. What sort of latency were you seeing in it and at what loading?
I wonder if we're trying to answer the wrong question. "sometimes find the caching times a bit on upper side" could certainly be interpreted to refer to caching according to the TTL specified for the zone on the *authoritative* server. If so - it's possible that the user simply needs to be able to specify a maximum TTL, independent of the setting on the authoritative server. Steinar Haug, AS2116

On Fri, Aug 08, 2025 at 12:44:40AM +0000, DurgaPrasad - DatasoftComnet via NANOG wrote:
Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP.
Yes. ISC BIND, running on OpenBSD. Performs well on minimal hardware, plus the OpenBSD firewall implementation ("pf") is excellent. And since both can be configured and operated from the command line, this setup readily lends itself to revision control, scripting, and synchronization. ---rsk

And completely the opposite in every possible way from running some GUI-dependent security nightmare on Windows, a platform renowned for its amazing scheduler. If you want authoritative-only on OpenBSD, then NSD works well and can be synced from BIND, if you want to present only OpenBSD to the internet.
On 8 Aug 2025, at 14:20, Rich Kulawiec via NANOG <nanog@lists.nanog.org> wrote:
On Fri, Aug 08, 2025 at 12:44:40AM +0000, DurgaPrasad - DatasoftComnet via NANOG wrote:
Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP.
Yes. ISC BIND, running on OpenBSD. Performs well on minimal hardware, plus the OpenBSD firewall implementation ("pf") is excellent. And since both can be configured and operated from the command line, this setup readily lends itself to revision control, scripting, and synchronization.
---rsk

At $lastjob, we had 60k subs using a pair of BSD boxes running a pretty simple BIND instance with no issues. We used FRR to advertise a VIP for customers to query against, but otherwise it was a pretty simple install/config.
On Aug 7, 2025, at 19:45, DurgaPrasad - DatasoftComnet via NANOG <nanog@lists.nanog.org> wrote:
Hello all, Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP. We have used powerdns and unbound but sometimes find the caching times a bit on upper side. Any suggestions between these two or anything new? Also need points on how much we tune the settings pros and cons if any.
Thank you /DP

On 8/8/25 02:44, DurgaPrasad - DatasoftComnet via NANOG wrote:
Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP. We have used powerdns and unbound but sometimes find the caching times a bit on upper side. Any suggestions between these two or anything new? Also need points on how much we tune the settings pros and cons if any.
In my experience, with ~700k DSL customers before 2010 and DC setups after that, the default PowerDNS Recursor settings do not really need tuning, apart from limiting the number of entries in cache [0], which directly corresponds to memory usage. The amount of memory required per entry depends on your platform and has changed over time, so you should monitor resource usage and adjust accordingly. I also usually limit max-negative-ttl to 10 minutes instead of the 1 hour default [1], which helps with recovery after some misconfiguration out there.

For monitoring these and other metrics, I can recommend prometheus/grafana via the provided metrics endpoint. [2] The average response latency in particular can let you know when the quality of your recursive nameserver's network connection deteriorates.

Since there is also dnsdist [3] these days, I can wholeheartedly recommend putting your recursive DNS service behind it, or behind an HA setup of them, so you can seamlessly switch between nodes or even implementations. dnsdist also provides a /metrics endpoint. [4]

[0] https://doc.powerdns.com/recursor/settings.html#max-cache-entries
[1] https://doc.powerdns.com/recursor/settings.html#max-negative-ttl
[2] https://doc.powerdns.com/recursor/metrics.html#using-prometheus-export
[3] https://www.dnsdist.org/index.html
[4] https://www.dnsdist.org/statistics.html
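For concreteness, the two Recursor settings mentioned above look roughly like this in recursor.conf (the values are illustrative only; size the cache to your own query volume and available memory):

```
# recursor.conf -- illustrative values, not a recommendation
max-cache-entries=2000000   # cap the cache; memory per entry varies by platform
max-negative-ttl=600        # cache negative answers at most 10 minutes
```

Watching cache hit rate and memory after a change tells you whether the cap is too tight for your population.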

Responding to the thread in general, not any particular person. For those of you with a firewall in front of your DNS servers, what are you having the firewall do? ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com
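One common answer to the question above is simply restricting port 53 to your own customer space. A minimal pf sketch (the table contents are placeholders, and a production ruleset would add rate limiting and management access):

```
# /etc/pf.conf -- minimal sketch; replace table contents with your prefixes
table <customers> { 192.0.2.0/24, 2001:db8::/32 }
set skip on lo
block in all
# DNS over UDP and TCP from customers only; state is kept by default
pass in proto { udp tcp } from <customers> to any port 53
pass out all
```

Allowing TCP as well as UDP matters: truncated responses and large answers fall back to TCP/53.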

We have used powerdns and unbound but sometimes find the caching times a bit on upper side.
By default, unbound caches an entry for the duration of the TTL received from the authoritative server. You can modify this:

- cache-min-ttl: if the TTL is less than this, cache the entry for at least this long
- cache-max-ttl: if the TTL is more than this, only cache it for this long

There are many other config options in unbound that allow you to tune cache behavior to your desired use case. A cursory look at the PowerDNS docs shows it has similar options. I'd suggest first working with the software you already have to see if it can be configured to meet your requirements. Likely to be less effort. On Thu, Aug 7, 2025 at 8:45 PM DurgaPrasad - DatasoftComnet via NANOG < nanog@lists.nanog.org> wrote:
Hello all, Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP. We have used powerdns and unbound but sometimes find the caching times a bit on upper side. Any suggestions between these two or anything new? Also need points on how much we tune the settings pros and cons if any.
Thank you /DP
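The TTL clamps described above look like this in unbound.conf (values are illustrative; both options are real unbound settings, but pick bounds that match your tolerance for stale data):

```
server:
    # Floor and ceiling on cached TTLs, overriding what the
    # authoritative server handed back.
    cache-min-ttl: 60      # never cache for less than a minute
    cache-max-ttl: 3600    # never cache for more than an hour
```

Note that raising cache-min-ttl deliberately violates the authoritative TTL, so keep it small if upstream operators rely on fast record changes.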

I've had good luck with knot-resolver, combined with ExaBGP and a health check script controlling the announcements upstream, among other things. Ryan Hamel ________________________________ From: DurgaPrasad - DatasoftComnet via NANOG <nanog@lists.nanog.org> Sent: Thursday, August 7, 2025 5:44 PM To: nanog@lists.nanog.org <nanog@lists.nanog.org> Cc: DurgaPrasad - DatasoftComnet <dp@datasoftcomnet.com> Subject: Recommended DNS server for a medium 20-30k users isp Hello all, Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP. We have used powerdns and unbound but sometimes find the caching times a bit on upper side. Any suggestions between these two or anything new? Also need points on how much we tune the settings pros and cons if any. Thank you /DP

Use nameservers that support DNS COOKIE (RFC 7873) and enable it if it is not already on by default. If the nameserver vendor you are currently using doesn't support DNS COOKIE, find a better nameserver. DNS COOKIE provides cheap protection against off-path DNS spoofing, but only if both server and client support it. It's been 9 years since RFC 7873 was published, and in that time just about all of the servers with broken EDNS implementations that failed to ignore unknown EDNS options, as per RFC 6891, have been replaced with ones that are RFC compliant. If you disabled sending DNS COOKIE requests in the past, it is time to re-enable it. Mark
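For reference, the COOKIE option rides inside the OPT pseudo-RR as EDNS option code 10 with an 8-byte client cookie. A minimal sketch of the wire format in Python (the query ID and UDP payload size here are arbitrary choices, not requirements):

```python
import os
import struct

def query_with_cookie(qname: str) -> bytes:
    """Build a DNS A query for qname carrying a DNS COOKIE (RFC 7873):
    EDNS option code 10 with an 8-byte client cookie in the OPT record."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 1)
    # Question: length-prefixed labels, then QTYPE=A(1), QCLASS=IN(1)
    labels = b"".join(
        bytes([len(l)]) + l.encode("ascii")
        for l in qname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)
    # OPT pseudo-RR: root name, TYPE=41, CLASS=UDP payload size, TTL=0
    cookie = os.urandom(8)                         # fresh client cookie
    rdata = struct.pack("!HH", 10, len(cookie)) + cookie
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, len(rdata)) + rdata
    return header + question + opt

pkt = query_with_cookie("example.com")
```

A cookie-aware server echoes the client cookie back (appending its server cookie), which is what makes blind off-path spoofing expensive.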
On 8 Aug 2025, at 10:44, DurgaPrasad - DatasoftComnet via NANOG <nanog@lists.nanog.org> wrote:
Hello all, Do you have any recommendations for recursive DNS servers for a medium sized (20-30k users) ISP. We have used powerdns and unbound but sometimes find the caching times a bit on upper side. Any suggestions between these two or anything new? Also need points on how much we tune the settings pros and cons if any.
Thank you /DP
-- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
participants (32)
- Andrew Latham
- brent saner
- Bryan Fields
- Crist Clark
- Damian Menscher
- David Guo
- David Prall
- DurgaPrasad - DatasoftComnet
- Jay Acuna
- John Todd
- Josh Luthman
- Marco Moock
- Mark Andrews
- Matthew Petach
- Mel Beckman
- Mike Hammett
- Mike Simpson
- Måns Nilsson
- Nick Hilliard
- Rich Kulawiec
- Robert L Mathews
- Rusty Dekema
- Ryan Hamel
- Saku Ytti
- Smoot Carl-Mitchell
- Stefan Schmidt
- sthaug@nethelp.no
- Tim Burke
- Tom Beecher
- Uesley Correa
- William Herrin
- Łukasz Bromirski