Re: Memory leak cause of Comcast DNS problems


Fergie (Paul Ferguson)

17 Apr 2005

5:01 p.m.

...

Not to my knowledge, or at least, none that has been publicly acknowledged.

From a Washington Post article yesterday (posted via Yahoo! News), Comcast said that the problem manifested itself when they were in the process of upgrading their DNS servers:

http://story.news.yahoo.com/news?tmpl=story&ncid=1212&e=3&u=/washpost/20050416/tc_washpost/a56223_2005apr15&sid=96168964

- ferg

-- Florian Weimer <fw@deneb.enyo.de> wrote:

...

Regardless of whether it actually _was_ a memory leak, or not, it appears that the impact was on a rather large enough scale.

Have other service providers been affected, too?

--
"Fergie", a.k.a. Paul Ferguson
Engineering Architecture for the Internet
fergdawg@netzero.net or fergdawg@sbcglobal.net
ferg's tech blog: http://fergdawg.blogspot.com/


Steven M. Bellovin

17 Apr

6:18 p.m.


In message <20050417.100203.11740.378954@webmail23.lax.untd.com>, "Fergie (Paul Ferguson)" writes:

...

Not to my knowledge, or at least, none that has been publicly acknowledged.

...
From a Washington Post article yesterday (posted via Yahoo! News), Comcast said that the problem manifested itself when they were in the process of upgrading their DNS servers:

http://story.news.yahoo.com/news?tmpl=story&ncid=1212&e=3&u=/washpost/20050416/tc_washpost/a56223_2005apr15&sid=96168964

At least in my neighborhood, Comcast appears to be running BIND 9.2.4rc6

--Prof. Steven M. Bellovin, http://www.cs.columbia.edu/~smb

Martin J. Levy

7:35 p.m.


Steve (and all),

...

At least in my neighborhood, Comcast appears to be running BIND 9.2.4rc6

Ah... Then there are two possible paths...

1) There was a real memory-leak bug and this was an unfortunate operations event. The CHANGES file for 9.3.1 and bind-9.2.5rc1 show various bug fixes related to memory leak issues. I leave it to someone else to comment on the potential of being tickled within a Comcast environment.

-or- (And on a much more cynical note.)

2) Someone checked the latest CHANGES file for bind and realized that saying it was a memory leak was a good cover (see quick pseudo-grep of file below. Note that not all the bugs affect the running bind name server code).

Whichever it was, I wonder how it could affect so many name servers at only one provider and all at the same time. This is just plain strange. I would have thought that best practices for a DNS service would recommend staggered upgrades, heck, even forced different s/w releases, etc. etc.

Martin

---------------------------------------
awk '
# Stop at the first 9.2.0 to 9.2.3 release marker (leading whitespace may be a tab).
/^[ \t]*--- 9\.2\.[0123][^ ]* released ---/ { print; exit; }
# Print every other release marker as a section heading.
/^[ \t]*--- [^ ]* released ---/ { print; next; }
# A blank line ends an entry: print it only if it mentioned memory.
/^[ \t]*$/ { if (memory) { print all; } all = ""; memory = 0; next; }
/[mM]emory/ { memory = 1; }
# Accumulate the lines of the current entry.
{ all = all "\n" $0; next }
' < bind-9.3.1/CHANGES
---------------------------------------

--- 9.3.1 released ---
--- 9.3.1rc1 released ---
--- 9.3.1beta2 released ---
--- 9.3.1beta1 released ---
--- 9.3.0 released ---
--- 9.3.0rc4 released ---
--- 9.3.0rc3 released ---
--- 9.3.0rc2 released ---
1683. [bug] dig +sigchase could leak memory. [RT #11445]
--- 9.3.0rc1 released ---
1643. [bug] dns_db_closeversion() could leak memory / node references. [RT #11163]
--- 9.3.0beta4 released ---
1635. [bug] Memory leak on error in query_addds().
--- 9.3.0beta3 released ---
1599. [bug] Fix memory leak on error path when checking named.conf.
--- 9.3.0beta2 released ---
--- 9.3.0beta1 released ---
1562. [bug] isc_socket_create() and isc_socket_accept() could leak memory under error conditions. [RT #10230]
1561. [bug] It was possible to release the same name twice if named ran out of memory. [RT #10197]
1547. [bug] Named wasted memory recording duplicate lame zone entries. [RT #9341]
1545. [bug] It was possible to leak memory if named was unable to bind to the specified transfer source and TSIG was being used. [RT #10120]
1364. [func] Log file name when unable to open memory statistics and dump database files. [RT# 3437]
1235. [func] Report 'out of memory' errors from openssl.
1143. [bug] When a trusted-keys statement was present and named was built without crypto support, it would leak memory.
982. [func] If "memstatistics-file" is set in options the memory statistics will be written to it.
--- 9.2.3rc1 released ---
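
The staggered-upgrade point above can be illustrated with a minimal Python sketch. It is not anything Comcast is known to have used: the function name, the host names and the number of waves are all hypothetical. It only shows that a round-robin split leaves part of the resolver fleet on the old release while each wave is upgraded.

```python
def plan_staggered_upgrade(servers, waves):
    """Split `servers` round-robin into at most `waves` upgrade batches."""
    if waves < 1:
        raise ValueError("waves must be at least 1")
    batches = [[] for _ in range(waves)]
    for index, server in enumerate(servers):
        batches[index % waves].append(server)
    # Drop empty batches when there are fewer servers than waves.
    return [batch for batch in batches if batch]


if __name__ == "__main__":
    # Hypothetical resolver names, for illustration only.
    resolvers = [
        "ns1.example.net",
        "ns2.example.net",
        "ns3.example.net",
        "ns4.example.net",
        "ns5.example.net",
    ]
    for number, batch in enumerate(plan_staggered_upgrade(resolvers, 3), start=1):
        print(f"wave {number}: {', '.join(batch)}")
```

Upgrading one wave at a time, and checking that it still answers queries before starting the next, would limit a bad release to a fraction of the servers.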

Daniel Golding

18 Apr

5:13 p.m.


Several of the servers that were down are not BIND, at least these:

prospero:~/Desktop/fpdns-0.9.1 dgold$ ./fpdns.pl 68.87.66.196
fingerprint (68.87.66.196, 68.87.66.196): Cisco CNR

I ran fpdns against them between outages. They now respond differently.

prospero:~/Desktop/fpdns-0.9.1 dgold$ ./fpdns.pl 68.87.66.196
fingerprint (68.87.66.196, 68.87.66.196): q0r?1,IQUERY,0,0,1,1,0,0,REFUSED,0,0,0,0

These are the Comcast "national" DNS servers. (I am using plural, because there are several reverse DNS entries for this IP address - ns.cmc.co.denver.comcast.net and ns.inflow.pa.bo.comcast.net)

I wouldn't rush to blame BIND for this. For purposes of investigation, does anyone have DNS servers from those periods of downtime other than the ones above? Comcast is quite a patchwork, thanks to the incomplete integrations of MediaOne, AT&T Broadband, etc. It would be interesting to see data on other DNS servers during the downtime periods.

Many folks on various forums were suggesting the use of ns1. and ns2.level3. Of course, logic suggests that the vast majority of folks, having no Internet access, could not have read the advice.

----

There have been three explanations given for the outage -

1) Upgrade issues
2) Memory leak/software issue
3) DDoS

There is also the possibility of some combination of the above. There are a number of possible permutations.

- Dan

On 4/17/05 2:18 PM, "Steven M. Bellovin" <smb@cs.columbia.edu> wrote:

...

In message <20050417.100203.11740.378954@webmail23.lax.untd.com>, "Fergie (Paul Ferguson)" writes:

...
Not to my knowledge, or at least, none that has been publicly acknowledged.

...
From a Washington Post article yesterday (posted via Yahoo! News), Comcast said that the problem manifested itself when they were in the process of upgrading their DNS servers:

http://story.news.yahoo.com/news?tmpl=story&ncid=1212&e=3&u=/washpost/20050416/tc_washpost/a56223_2005apr15&sid=96168964

At least in my neighborhood, Comcast appears to be running BIND 9.2.4rc6

--Prof. Steven M. Bellovin, http://www.cs.columbia.edu/~smb

--
Daniel Golding
Network and Telecommunications Strategies
Burton Group
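
The remark about several reverse DNS entries for one address can be checked with a short Python sketch. It uses only the standard library. `ptr_query_name` builds the in-addr.arpa name a PTR query would ask for, and `reverse_names` (which needs network access) asks the local resolver. The function names are illustrative, and `socket.gethostbyaddr` may not return every PTR record, so treat an incomplete list as a limitation of the sketch rather than as evidence.

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Return the reverse-lookup name for an IP address (no query is sent)."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_names(ip):
    """Ask the local resolver for the names mapped to `ip` (needs network)."""
    try:
        primary, aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        # No reverse entry, or the resolver could not be reached.
        return []
    return [primary, *aliases]


if __name__ == "__main__":
    address = "68.87.66.196"  # the address fingerprinted in the message above
    print(ptr_query_name(address))
    print(reverse_names(address))
```

A tool such as dig would be the more reliable way to see all PTR records. The sketch only shows where the names in the message come from.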

Florian Weimer

7:49 p.m.


* Daniel Golding:

...

I wouldn't rush to blame BIND for this.

Maybe the leak wasn't in the DNS service, but some other software component which company policy required on each server (think of Tivoli, antivirus software, or CSA). Who knows? The possibilities are endless.

Jason Frisvold

8:07 p.m.


On 4/18/05, Florian Weimer <fw@deneb.enyo.de> wrote:

...

Maybe the leak wasn't in the DNS service, but some other software component which company policy required on each server (think of Tivoli, antivirus software, or CSA). Who knows? The possibilities are endless.

There was, at one time, a fairly serious memory leak in Cisco CNR... I believe I saw a post indicating that CNR was possibly in use?

--
Jason 'XenoPhage' Frisvold
XenoPhage0@gmail.com

Christopher L. Morrow

2:17 a.m.


On Sun, 17 Apr 2005, Fergie (Paul Ferguson) wrote:

...

Not to my knowledge, or at least, none that has been publicly acknowledged.

...
From a Washington Post article yesterday (posted via Yahoo! News), Comcast said that the problem manifested itself when they were in the process of upgrading their DNS servers:

http://story.news.yahoo.com/news?tmpl=story&ncid=1212&e=3&u=/washpost/20050416/tc_washpost/a56223_2005apr15&sid=96168964

-- Florian Weimer <fw@deneb.enyo.de> wrote:

...
Regardless of whether it actually _was_ a memory leak, or not, it appears that the impact was on a rather large enough scale.

So, 'wide scale' because they, presuming of course the article is on the level, upgraded all devices at approximately the same time...
