Re: TCP session disconnection caused by Code Red?

George William Herbert

6 Aug 2001

6:57 p.m.

mike harrison <meuon@highertech.net> wrote

...

Blaz Zupan <blaz@amis.net> wrote:

...
For the last few days, our network seems to be basically unreachable from the outside. Most incoming TCP sessions (web requests, incoming mail, telnet sessions, etc.) often fail with a simple "Connection refused" like nobody is

Your routers are brain dead from the load. Routers that are used to handling a few thousand connections are being asked to handle tens of thousands. One good 1000+ address scan from an ISDN user kills my Lucent/Ascend TNT unless we filter for it.

I've been told (but not given permission to forward details of who/how/what) that some major sites with a single router and relatively flat network topology are dying due to the ARP request flood that is being generated by Code Red scans on the inside of their border router choking the router. Check the rate of ARP requests coming off your border router and see if it seems excessive; if so, that may be it.

-george william herbert gherbert@retro.com

Kevin Gannon

6 Aug

6:51 p.m.

New subject: TCP session disconnection caused by Code Red?

Some things that are worth looking at if you are running Cisco equipment (I believe the original poster was): http://www.cisco.com/warp/public/63/ts_codred_worm.html

Regards, Kevin

Alex Bligh

7:50 p.m.

...

I've been told (but not given permission to forward details of who/how/what) that some major sites with a single router and relatively flat network topology are dying due to the ARP request flood that is being generated by Code Red scans on the inside of their border router choking the router. Check the rate of ARP requests coming off your border router and see if it seems excessive; if so, that may be it.

2 points:

1. RFC826 appears to mandate only positive ARP caching. I can't see a reason why negative ARP caching shouldn't work this way:

Keep only one ARP request in flight at a time. Retry ARPs a maximum of [5] times, separated by at least [1] second. After that, cache non-existence of a h/w address for that IP address for normal positive caching time. If you see any IP traffic inbound on that interface with that IP address, remove the negative cache. However, to get a positive cache entry you still need a valid ARP response (promiscuous or not).

More formally, when address resolution is required:

a) Look up IP address in ARP table
   i) If entry is PRESENT (i.e. h/w address OK) return this value.
   ii) If entry is NEXIST return ARP failure immediately (i.e. as a router, drop into the code where no route is found - on Cisco this would be rate-limited unreachables)
   iii) If entry is INCOMPLETE[\d] go to (b), performing further packet transmission (i.e. transmitting an ARP packet) ONLY if the entry is fully aged (i.e. otherwise perform your RFC826 compatible / current operation without transmitting another ARP packet)
   iv) If entry is absent, transmit ARP packet as normal, set entry to INCOMPLETE[0] and go to (b)

b) [this is the action we perform if we don't yet know the h/w address]. RFC826 suggests returning and allowing a higher layer to retransmit, though I suppose blocking is theoretically possible

If a valid ARP response is received (promiscuous or otherwise), remove any existing entry, and generate a PRESENT entry.

If /any/ packet is received from a host with a valid IP address, remove an NEXIST entry if present (on the ARP table for the interface on which it was received only) [this check is arguably too thorough as it will remove valid NEXIST entries for IP addresses that exist, but behind a router on the current subnet, rather than on it directly, though this is (a) better than nothing, and (b) required to support proxy ARP properly; note that you can't rely on the MAC address being that of the IP though - still have to ARP]

Age INCOMPLETE[n] states to INCOMPLETE[n+1] states after [t1] seconds (probably about 1 second), for n<N, and to NEXIST for n>=N (N is probably about 5)

Age NEXIST state to deleted after about [t2] seconds (where t2 is probably close to the arp timeout - i.e. about 300)

INCOMPLETE essentially means PENDING

2. It has been observed that Cisco products in particular do not handle ARP storms well. Even worse is the Catalyst 5[50]00. This may have been fixed since I saw it. The application in which I saw it seriously merited having a linux box or similar 'proxy'-arp all non-existent addresses to null. You can probably achieve the same result with static arp entries to a non-existent h/w address.

Alex Bligh
Personal Capacity
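For readers who want to experiment with the scheme proposed above, here is a minimal sketch of the PRESENT / INCOMPLETE[n] / NEXIST table in Python. It is an illustration under stated assumptions, not anything a router vendor ships: the class and method names (`NegativeArpCache`, `resolve`, `on_arp_reply`, `on_ip_packet_from`) are hypothetical, aging is done lazily at lookup time rather than by a timer, and aging of PRESENT entries is left out because the proposal does not change it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class State(Enum):
    PRESENT = "present"        # h/w address known
    INCOMPLETE = "incomplete"  # request outstanding (i.e. pending)
    NEXIST = "nexist"          # negative entry: nobody answered


@dataclass
class Entry:
    state: State
    updated: float                 # time of the last state change
    retries: int = 0               # the n in INCOMPLETE[n]
    hw_addr: Optional[str] = None


class NegativeArpCache:
    """Hypothetical sketch; t1, t2 and max_retries follow the values suggested in the post."""

    def __init__(self, t1: float = 1.0, t2: float = 300.0, max_retries: int = 5):
        self.t1, self.t2, self.max_retries = t1, t2, max_retries
        self.table: Dict[str, Entry] = {}

    def resolve(self, ip: str, now: float) -> Tuple[str, Optional[str], bool]:
        """Return (result, hw_addr, send_arp) for one lookup at time `now`."""
        entry = self.table.get(ip)
        # Lazy aging (an assumption): drop a negative entry older than t2.
        if entry and entry.state is State.NEXIST and now - entry.updated >= self.t2:
            del self.table[ip]
            entry = None
        if entry is None:                                  # (a)(iv) absent
            self.table[ip] = Entry(State.INCOMPLETE, now)
            return ("pending", None, True)
        if entry.state is State.PRESENT:                   # (a)(i)
            return ("ok", entry.hw_addr, False)
        if entry.state is State.NEXIST:                    # (a)(ii)
            return ("fail", None, False)
        # (a)(iii) INCOMPLETE[n]: transmit again only once the entry has aged t1.
        if now - entry.updated < self.t1:
            return ("pending", None, False)
        if entry.retries < self.max_retries:               # INCOMPLETE[n] -> [n+1]
            entry.retries += 1
            entry.updated = now
            return ("pending", None, True)
        entry.state, entry.updated = State.NEXIST, now     # n >= N -> NEXIST
        return ("fail", None, False)

    def on_arp_reply(self, ip: str, hw_addr: str, now: float) -> None:
        # A valid ARP response replaces whatever entry was there.
        self.table[ip] = Entry(State.PRESENT, now, hw_addr=hw_addr)

    def on_ip_packet_from(self, ip: str) -> None:
        # Any inbound IP traffic from this address clears a negative entry,
        # but does not create a positive one: we still have to ARP.
        entry = self.table.get(ip)
        if entry and entry.state is State.NEXIST:
            del self.table[ip]
```

With the defaults, one lookup per second against a dead address produces the initial request plus five retries, after which lookups fail immediately with no further broadcasts until the negative entry expires or traffic from that address is seen.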

Craig Partridge

7:55 p.m.

RFC 1122 mandates that you query for a particular ARP destination no more frequently than once per second.

RFC 1122 also notes a number of reasons why people may want to make the positive ARP cache timeout long -- if one suppresses ARP queries for that same length of time, then a popular host that goes down for a while is effectively off the network for a long period while waiting for ARP negative caches to time out. Probably a bad idea. Rate limiting, as RFC 1122 suggests, would seem to be much better.

Craig
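As a rough illustration of the rate-limiting alternative, the sketch below allows at most one ARP request per destination per interval. The names (`ArpRateLimiter`, `may_send`) are hypothetical, and the one-second default simply reflects the figure quoted above; this is not taken from any particular stack.

```python
from typing import Dict


class ArpRateLimiter:
    """Hypothetical per-destination limiter: one ARP request per min_interval."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self.last_sent: Dict[str, float] = {}   # destination IP -> time of last request

    def may_send(self, ip: str, now: float) -> bool:
        last = self.last_sent.get(ip)
        if last is not None and now - last < self.min_interval:
            return False                        # too soon, suppress this request
        self.last_sent[ip] = now
        return True
```

Unlike a negative cache, a host that comes back is retried within one interval. Note that this sketch still stores one timestamp per probed address, so a real implementation would have to bound or expire that table.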

Eric A. Hall

10:14 p.m.

Alex Bligh wrote:

...

1. RFC826 appears to mandate only positive ARP caching. I can't see a reason why negative ARP caching shouldn't work this way:

Keep only one ARP request in flight at a time. Retry ARPs a maximum of [5] times, separated by at least [1] second. After that, cache non-existence of a h/w address for that IP address for normal positive caching time.

The immediate problem with this is that it requires a *MUCH* larger ARP cache. Rather than needing enough memory for a couple of thousand active entries (the current norm for middle-of-the-road routers), you need enough room for every possible address on every attached segment.

[unsubstantiated conjecture] This may be what's killing the cable networks. If they are making room in the NAS ARP caches for the addresses that are being probed, then they are making room by flushing the "real" ARP entries, resulting in a constant flush/load cycle. [/uc, but exemplary of the problem with negative ARP caching.]

---
Eric A. Hall http://www.ehsco.com/
Internet Core Protocols http://www.oreilly.com/catalog/coreprot/
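The flush/load cycle in the conjecture above can be reproduced with a toy model. The sketch assumes a fixed-size cache with least-recently-used eviction that also stores an entry for every probed address; both the eviction policy and the names (`LruArpCache`, `real_host_misses`) are assumptions for illustration, and nothing here says how any real NAS behaves.

```python
from collections import OrderedDict
from typing import List


class LruArpCache:
    """Toy fixed-size cache with LRU eviction (an assumed policy)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: "OrderedDict[str, bool]" = OrderedDict()

    def lookup(self, ip: str) -> bool:
        """Return True on a hit; on a miss, insert and evict the oldest entry."""
        if ip in self.entries:
            self.entries.move_to_end(ip)
            return True
        self.entries[ip] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False


def real_host_misses(capacity: int, real_hosts: List[str],
                     scan_per_round: int, rounds: int) -> int:
    """Count cache misses for real hosts while a scan probes fresh addresses."""
    cache = LruArpCache(capacity)
    misses = 0
    next_probe = 0
    for _ in range(rounds):
        for host in real_hosts:
            if not cache.lookup(host):
                misses += 1
        for _ in range(scan_per_round):
            cache.lookup(f"probe-{next_probe}")   # each probe takes a cache slot
            next_probe += 1
    return misses
```

In this model, three real hosts in a four-entry cache miss only on first use when there is no scan, but miss on every round once the scan inserts four new addresses per round, which is the constant reload the conjecture describes.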

Daniel Senie

10:37 p.m.

At 06:14 PM 8/6/01, Eric A. Hall wrote:

...

Alex Bligh wrote:

...
1. RFC826 appears to mandate only positive ARP caching. I can't see a reason why negative ARP caching shouldn't work this way:

Keep only one ARP request in flight at a time. Retry ARPs a maximum of [5] times, separated by at least [1] second. After that, cache non-existence of a h/w address for that IP address for normal positive caching time.

The immediate problem with this is that it requires a *MUCH* larger ARP cache. Rather than needing enough memory for a couple of thousand active entries (the current norm for middle-of-the-road routers), you need enough room for every possible address on every attached segment.

[unsubstantiated conjecture] This may be what's killing the cable networks. If they are making room in the NAS ARP caches for the addresses that are being probed, then they are making room by flushing the "real" ARP entries, resulting in a constant flush/load cycle. [/uc, but exemplary of the problem with negative ARP caching.]

Adding to this conjecture, I'm seeing VERY high ARP rates (ARP broadcast packets) arriving via the cable modem in my office. Also seeing a high rate of Code Red type attacks attempted at the machines attached. Firewall is just catching and logging them.

-----------------------------------------------------------------
Daniel Senie dts@senie.com
Amaranth Networks Inc. http://www.amaranth.com

David Schwartz

11:50 p.m.

...

The immediate problem with this is that it requires a *MUCH* larger ARP cache. Rather than needing enough memory for a couple of thousand active entries (the current norm for middle-of-the-road routers), you need enough room for every possible address on every attached segment.

...

Eric A. Hall http://www.ehsco.com/

Weigh that against the advantages, however. If you have a large address space for the segment with few attached hosts (the case where this is a problem), you're better off with a lot of negative entries cached than with a lot of active ARP attempts.

One thing I see a lot of on segments with large address spaces is that the quantity of ARP traffic can get high. Each ARP request causes an interrupt on each attached host on the segment. I'd rather the router have a larger ARP cache than the network have more broadcast traffic.

I'm curious what kind of algorithms my routers currently use. If it's one packet per second with five retries -- consider a network with a /22 that's only half full. You could see as many as 512 broadcast packets a second just from one router. Sounds like an interesting technique for getting amplification by a factor of 5 -- 5 broadcast packets for every unicast packet you send.

Smarter rate limiting sounds like a win.

DS
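The 512 figure above can be checked with a few lines of arithmetic. This is a sketch under the post's own assumptions (one ARP request per second for every unoccupied address being probed); the function name and the 10.0.0.0/22 prefix are made up for the example, and like the post it counts the network and broadcast addresses as ordinary addresses.

```python
import ipaddress


def unanswered_arp_rate(prefix: str, fill_fraction: float,
                        per_address_rate: float = 1.0) -> float:
    """Worst-case ARP broadcasts per second if every unused address is being probed."""
    total = ipaddress.ip_network(prefix).num_addresses   # 2 ** (32 - prefix length) for IPv4
    unused = total - int(total * fill_fraction)
    return unused * per_address_rate


# Hypothetical /22 that is half full: 1024 addresses, 512 of them unused.
rate = unanswered_arp_rate("10.0.0.0/22", 0.5)
```

With five ARP attempts triggered by each unicast probe to a dead address, the amplification factor is the retry count itself, 5.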

participants (7)

Alex Bligh
Craig Partridge
Daniel Senie
David Schwartz
Eric A. Hall
George William Herbert
Kevin Gannon