Greetings,
It looks like all hell is breaking loose on some of the nation's backbones. http://www.internethealthreport.com
The port counters on my AT&T DS3 were reading in the 250 megabit range; that is a DS3, mind you.
Any source IPs I can add to the circular file would be appreciated. Any ranges I find I'll echo back to the list.
Regards,
Christopher J. Wolff, VP CIO
Broadband Laboratories, Inc.
http://www.bblabs.com
On Sat, 25 Jan 2003, Christopher J. Wolff wrote:
Greetings,
It looks like all hell is breaking loose on some of the nation's backbones. http://www.internethealthreport.com
The port counters on my AT&T DS3 were reading in the 250 megabit range, that is a DS3, mind you.
Any source IPs I can add to the circular file would be appreciated. Any ranges I find I'll echo back to the list.
It's an MS SQL worm that is sending and receiving UDP on 1434. http://www.nextgenss.com/advisories/mssql-udp.txt appears to be relevant.
Anyone want to get involved in some sort of real-time chat (like IRC) to discuss strategies? We're seeing some pretty big traffic, and related problems in multiple colos worldwide.
Doug
-- 
"We have known freedom's price. We have shown freedom's power. And in this great conflict, ... we will see freedom's victory." - George W. Bush, President of the United States
State of the Union, January 28, 2002
On Sat, 25 Jan 2003, Doug Barton wrote:
Anyone want to get involved in some sort of real-time chat (like IRC) to discuss strategies? We're seeing some pretty big traffic, and related problems in multiple colos worldwide.
What's to discuss? If you put something like
access-list 150 deny udp any any eq 1434 log-input
access-list 150 permit ip any any
on all your customer-facing ports you get to
1. filter out the disruptive traffic
2. see which customer systems are infected
This works well even on relatively underpowered Cisco 7200 boxes.
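As a rough illustration of what the two-entry ACL above does, here is a minimal Python model (not IOS itself): entries are checked top to bottom, the first match wins, UDP to port 1434 is dropped and counted per ingress interface (standing in for what the log-input lines would reveal), and everything else hits the trailing permit. The packet fields and interface names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Packet:
    interface: str            # hypothetical ingress interface name
    protocol: str             # "udp", "tcp", ...
    dst_port: Optional[int]

@dataclass(frozen=True)
class AclEntry:
    action: str               # "deny" or "permit"
    protocol: str             # "ip" matches any protocol
    dst_port: Optional[int]   # None means any port
    log: bool = False

# Models: access-list 150 deny udp any any eq 1434 log-input
#         access-list 150 permit ip any any
ACL_150 = [
    AclEntry("deny", "udp", 1434, log=True),
    AclEntry("permit", "ip", None),
]

def matches(entry: AclEntry, pkt: Packet) -> bool:
    if entry.protocol != "ip" and entry.protocol != pkt.protocol:
        return False
    return entry.dst_port is None or entry.dst_port == pkt.dst_port

def evaluate(acl: List[AclEntry], packets: List[Packet]) -> Tuple[List[Packet], Counter]:
    """First match wins; returns permitted packets and logged denies per interface."""
    permitted: List[Packet] = []
    logged_denies: Counter = Counter()
    for pkt in packets:
        for entry in acl:
            if matches(entry, pkt):
                if entry.action == "permit":
                    permitted.append(pkt)
                elif entry.log:
                    logged_denies[pkt.interface] += 1
                break
        # A packet matching no entry would hit IOS's implicit deny (not counted here).
    return permitted, logged_denies

traffic = [
    Packet("Serial1/0", "udp", 1434),
    Packet("Serial1/0", "udp", 1434),
    Packet("Serial2/0", "tcp", 80),
    Packet("Serial2/0", "udp", 53),
]
permitted, logged = evaluate(ACL_150, traffic)
print(len(permitted), dict(logged))  # 2 {'Serial1/0': 2}
```

The per-interface counter is the "see which customer systems are infected" half of the argument: the interface with worm hits points at the customer to call.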
Hi, NANOGers.
] access-list 150 deny udp any any eq 1434 log-input
Be _very_ careful about enabling such logging. Some of the worm flows have filled GigE pipes. I doubt you really want to log that; Netflow is a better option in this case. Too much logging will raise the CPU utilization to the point of creating a DoS on the router.
Thanks,
Rob.
-- 
Rob Thomas
http://www.cymru.com
ASSERT(coffee != empty);
On Sat, 25 Jan 2003, Rob Thomas wrote:
] access-list 150 deny udp any any eq 1434 log-input
Be _very_ careful about enabling such logging. Some of the worm flows have filled GigE pipes. I doubt you really want to log that; Netflow is a better option in this case. Too much logging will raise the CPU utilization to the point of creating a DoS on the router.
As a general rule, yes. But:
"Access list logging does not show every packet that matches an entry. Logging is rate-limited to avoid CPU overload. What logging shows you is a reasonably representative sample, but not a complete packet trace. Remember that there are packets you're not seeing.
Access lists and logging have a performance impact, but not a large one. Be careful on routers running at more than about 80 percent CPU load, or when applying access lists to very high-speed interfaces."
(http://www.cisco.com/warp/public/707/22.html)
There doesn't seem to be a noticeable impact on CPU usage for a C12000 GigE linecard. Can you do Netflow rather than CEF on such a beast without a performance penalty?
On Sat, 25 Jan 2003, Iljitsch van Beijnum wrote:
On Sat, 25 Jan 2003, Rob Thomas wrote:
] access-list 150 deny udp any any eq 1434 log-input
Be _very_ careful about enabling such logging. Some of the worm flows have filled GigE pipes. I doubt you really want to log that; Netflow is a better option in this case. Too much logging will raise the CPU utilization to the point of creating a DoS on the router.
As a general rule, yes. But:
" Access list logging does not show every packet that matches an entry. Logging is rate-limited to avoid CPU overload. What logging shows you is a reasonably representative sample, but not a complete packet trace. Remember that there are packets you're not seeing.
either way, the logging for this, ESPECIALLY with log-input, is a dangerous proposition. One thing to keep in mind is that the S-train platforms are different in handling logging than the normal trains... so S-train rate-limits (and bumps out those annoying messages about rate-limited messages) while others punt as much to the route processor as possible and happily saturate it :( (Don't log on, say, a 7500 if the packet rates are over about 5 kpps...)
Access lists and logging have a performance impact, but not a large one. Be careful on routers running at more than about 80 percent CPU load, or when applying access lists to very high-speed interfaces. "
right, or on platforms not built to scale :) (like 7500 or smaller boxen)
( http://www.cisco.com/warp/public/707/22.html )
There doesn't seem to be a noticeable impact on CPU usage for a C12000 GigE linecard. Can you do Netflow rather than CEF on such a beast without a performance penalty?
One thing to keep in mind is that perhaps you don't care about the logging :) Just drop it and make your customers fix their borked boxes...
On Sat, 25 Jan 2003, Christopher L. Morrow wrote:
" Access list logging does not show every packet that matches an entry. Logging is rate-limited to avoid CPU overload.
either way, the logging for this, ESPECIALLY with log-input, is a dangerous proposition.
Are you saying that I shouldn't believe Cisco's own documentation? Obviously, it's going to take _some_ CPU cycles, but I would expect the box to remain operational.
One thing to keep in mind is that the S-train platforms are different in handling logging than the normal trains...
Ok, I've been working with Cisco equipment for 8 years now and I can configure them in my sleep, but all the version/image/train/feature set is still voodoo to me. Obviously, the router caches the information it wants to log for a while and then counts hits against the cache until it actually logs. This should work very well, and it does as per my tests on a heavily loaded 4500 router. So why would one type of IOS do this right and another version that isn't immediately recognizable by the version number as inferior do it wrong?
possible and happily saturate it :( (Don't log on, say, a 7500 if the packet rates are over about 5 kpps...)
I think today's events show that CPU-based routers have no business handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4 Gbps coming in so the box must be able to handle it to some usable degree.
There doesn't seem to be a noticeable impact on CPU usage for a C12000 GigE linecard. Can you do Netflow rather than CEF on such a beast without a performance penalty?
One thing to keep in mind is that perhaps you don't care about the logging :) Just drop it and make your customers fix their borked boxes...
That's why I want the logging: to see which customer is spewing out the garbage. (-:
On Sat, 25 Jan 2003, Iljitsch van Beijnum wrote:
On Sat, 25 Jan 2003, Christopher L. Morrow wrote:
" Access list logging does not show every packet that matches an entry. Logging is rate-limited to avoid CPU overload.
either way, the logging for this, ESPECIALLY with log-input, is a dangerous proposition.
Are you saying that I shouldn't believe Cisco's own documentation? Obviously, it's going to take _some_ CPU cycles, but I would expect the box to remain operational.
Yes, you'd expect this to remain operational... but real-world 'testing' shows that not to be the case. If the attack has highly random sources or destinations, a log message gets generated for each packet :( This causes a little pain (or a lot, if you count dropping routing protocols as a lot) on the router :( CPU spikes due to logging large floods are quite common. This I know from very personal experience.
One thing to keep in mind is that the S-train platforms are different in handling logging than the normal trains...
Ok, I've been working with Cisco equipment for 8 years now and I can configure them in my sleep, but all the version/image/train/feature set is still voodoo to me. Obviously, the router caches the information it
me too.
wants to log for a while and then counts hits against the cache until it
only for identical packets... so source A:123 -> Dest B:80 x500000 packets gets logged 'once': one log for the first packet and update logs at 5-minute intervals (which may be settable with some IOS command, which may only exist in S-train code). If the attack is randomized in sources, destinations, or ports... there is effectively a new 'flow' for each packet and thus a new log message for each... (again, in S-train code or 12.0(21)+ code this is rate-limited to the RP and thus to the logs... somewhat, at least)
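To illustrate why randomized traffic defeats the per-flow log cache described above, here is a toy Python model: a log line is emitted only the first time a (source, source port, destination, destination port) key is seen, and later hits just bump a counter. The 5-minute update messages are left out, and the key layout and packet generator are assumptions for illustration, not IOS internals.

```python
import random

def count_log_messages(packets):
    """Emit one log message per previously unseen flow key; count repeats."""
    cache = {}
    messages = 0
    for key in packets:
        if key not in cache:
            cache[key] = 1
            messages += 1          # first packet of a new flow -> log line
        else:
            cache[key] += 1        # repeat hit -> counter only, no log line
    return messages

N = 100_000
# One fixed flow: A:123 -> B:80, repeated N times.
fixed = [("10.0.0.1", 123, "10.0.0.2", 80)] * N

# Worm-style: fixed source, random 32-bit destination address per packet.
rng = random.Random(1)
randomized = [("10.0.0.1", 3303, rng.getrandbits(32), 1434) for _ in range(N)]

print(count_log_messages(fixed))       # 1
print(count_log_messages(randomized))  # close to N
```

With random destinations almost every packet is a "new flow", so the cache provides essentially no damping and the log volume tracks the packet rate.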
actually logs. This should work very well, and it does as per my tests on a heavily loaded 4500 router. So why would one type of IOS do this right and another version that isn't immediately recognizable by the version number as inferior do it wrong?
S-train code has specific features that don't get propagated to other trains because they aren't 'required' there or aren't applicable, or not asked for.
possible and happily saturate it :( (Don't log on, say, a 7500 if the packet rates are over about 5 kpps...)
I think today's events show that CPU-based routers have no business handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4 Gbps coming in so the box must be able to handle it to some usable degree.
that may be, but CPE isn't normally vendor J for t1/t3/oc3 customers... never mind dsl/dial/cable customers, eh? The vast majority is CPU-based equipment. Whether or not that's a good thing is immaterial; no one is going to upgrade all routing gear overnight :( (or in 2 years, as we've seen)
There doesn't seem to be a noticeable impact on CPU usage for a C12000 GigE linecard. Can you do Netflow rather than CEF on such a beast without a performance penalty?
One thing to keep in mind is that perhaps you don't care about the logging :) Just drop it and make your customers fix their borked boxes...
That's why I want the logging: to see which customer is spewing out the garbage. (-:
well, then.. log vs log-input :) cause log-input is more processing and thus more pain. (and if it's 'inbound' on interfaces the 'log-input' is kinda pointless, eh?)
On Sat, 25 Jan 2003, Christopher L. Morrow wrote:
wants to log for a while and then counts hits against the cache until it
only for identical packets... so source A:123 -> Dest B:80 x500000 packets gets logged 'once': one log for the first packet and update logs at 5-minute intervals (which may be settable with some IOS command, which may only exist in S-train code). If the attack is randomized in sources, destinations, or ports... there is effectively a new 'flow' for each packet and thus a new log message for each... (again, in S-train code or 12.0(21)+ code this is rate-limited to the RP and thus to the logs... somewhat, at least)
It seems the flow recognition isn't that strict but I might just have been lucky.
actually logs. This should work very well, and it does as per my tests on a heavily loaded 4500 router. So why would one type of IOS do this right and another version that isn't immediately recognizable by the version number as inferior do it wrong?
S-train code has specific features that don't get propagated to other trains because they aren't 'required' there or aren't applicable, or not asked for.
Lovely when others decide what you require.
I think today's events show that CPU-based routers have no business handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4 Gbps coming in so the box must be able to handle it to some usable degree.
that may be, but CPE isn't normally vendor J for t1/t3/oc3 customers...
CPE for T1 would be 2500, T3 3600, OC3 7200 or some such. All are fine for day-to-day stuff but don't pack enough power to handle today's events at line rate. But the difference is small enough that it can be remedied by simply using faster CPUs. Those were available at the time the boxes were introduced, but I assume a faster CPU would have increased the cost price too much.
never mind dsl/dial/cable customers, eh?
Those are slow enough to be done in software easily.
The vast majority is CPU-based equipment. Whether or not that's a good thing is immaterial; no one is going to upgrade all routing gear overnight :( (or in 2 years, as we've seen)
People are buying GE equipment left, right, and center too. It doesn't make much sense to have more computing power in the ethernet chip (GE over UTP takes a lot of processing power) than in the chip doing the routing. Maybe it's possible to find some middle ground, for instance by doing some basic flow recognition and rate limiting in hardware but the actual routing in software. That way, you can build a GE CPE router that can do 100 kpps, which is enough for regular traffic, but still have some protection when there is a 1.4 Mpps DoS attack that would otherwise have killed the CPU.
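A minimal Python sketch of the kind of per-flow rate limiting proposed above, using a token bucket per flow key: packets within a flow's budget would continue to the software routing path, and excess packets are dropped before they cost CPU. The rate, burst, and flow key are hypothetical, and real hardware would implement this very differently.

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    rate: float      # tokens (packets) added per second
    burst: float     # bucket capacity
    tokens: float
    last: float      # time of last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

class FlowLimiter:
    """One token bucket per flow key (no expiry of idle flows, for brevity)."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.buckets = {}

    def allow(self, flow, now: float) -> bool:
        bucket = self.buckets.get(flow)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, self.burst, now)
            self.buckets[flow] = bucket
        return bucket.allow(now)

# Hypothetical policy: 1,000 pps per flow with a burst of 100 packets.
limiter = FlowLimiter(rate=1000.0, burst=100.0)
flood = ("10.0.0.1", "udp", 1434)
# 10,000 packets of one flow arriving within one second (10 kpps offered).
passed = sum(limiter.allow(flood, i / 10_000) for i in range(10_000))
print(passed)  # roughly burst + rate * duration, i.e. about 1,100
```

Keying on the full flow tuple has the same weakness discussed earlier for log caching: a flood with randomized addresses creates a new bucket per packet, so a coarser key (per ingress port, or per destination port) may be the more practical choice.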
That's why I want the logging: to see which customer is spewing out the garbage. (-:
well, then.. log vs log-input :) cause log-input is more processing and thus more pain. (and if it's 'inbound' on interfaces the 'log-input' is kinda pointless, eh?)
Good point. The reason it's there is that I didn't know what I was dealing with when I enabled this logging and I wanted to see the MAC addresses in case the source IP addresses were spoofed.
From: "Iljitsch van Beijnum"
Are you saying that I shouldn't believe Cisco's own documentation? Obviously, it's going to take _some_ CPU cycles, but I would expect the box to remain operational.
Actually, Cisco's documentation is not always accurate, and it heavily depends on IOS version, train, feature set, and hardware.
One thing to keep in mind is that the S-train platforms are different in handling logging than the normal trains...
Ok, I've been working with Cisco equipment for 8 years now and I can configure them in my sleep, but all the version/image/train/feature set is still voodoo to me. Obviously, the router caches the information it wants to log for a while and then counts hits against the cache until it actually logs. This should work very well, and it does as per my tests on a heavily loaded 4500 router. So why would one type of IOS do this right and another version that isn't immediately recognizable by the version number as inferior do it wrong?
As stated above, it depends on the code. When logging high volume, I recommend turning off all logging facilities except the one you plan to use. Multiple logging facilities will create a multiplied effect on the CPU for some trains and versions, i.e., logging to console and syslog and running a term mon is a very, very bad thing under heavy logging. This also depends on what you are logging. Narrow the scope as much as possible, i.e., log only a narrow customer selection at a time, then try the next.
possible and happily saturate it :( (Don't log on, say, a 7500 if the packet rates are over about 5 kpps...)
I think today's events show that CPU-based routers have no business handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4 Gbps coming in so the box must be able to handle it to some usable degree.
Actually, you wouldn't expect to see 4 Gbps coming in. That would be full saturation, which would imply serious performance degradation. Most networks that I've dealt with stick to a 70-80% saturation rule.
In addition, many of the problems concerning this traffic weren't throughput issues. Each router has a bandwidth limitation and a pps limitation. The worst DDOS I've had to deal with didn't even show as a bandwidth spike on my circuits but exceeded the pps of the router. Luckily, such attacks are easily dealt with using access-lists as the router is optimized to block more pps than it is designed to switch.
This worm had both. The packets were small and the bandwidth utilization was high. Blocking the packets would lower CPU utilization to a manageable degree, while the bandwidth usage on each infected circuit was localized to that circuit. The type of circuit determined how well it dealt with the loading, as different L2 protocols handle saturation differently. ATM is the ideal medium as the latency remains lower than FE or GE at peak saturation.
One's responsibility is only to the edge of their controllable network, though. If you can't shut off the ethernet port to an infected server, the customer is responsible for that equipment. Ideally, you have one customer per each circuit that you control.
Jack Bates
Network Engineer
BrightNet Oklahoma
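A quick back-of-the-envelope check of why this worm hurt on both the bandwidth and the pps axis: assuming the 404-byte IP packets seen in the captures posted in this thread, and ignoring layer-2 framing overhead (so real line-rate figures are somewhat lower), the packet rate needed to fill common circuit speeds is:

```python
def pps_to_fill(link_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a link, ignoring layer-2 overhead."""
    return link_bps / (packet_bytes * 8)

# IP total length from the captures: 20 (IPv4) + 8 (UDP) + 376 (payload) bytes.
WORM_IP_LEN = 404

LINKS = [("T1", 1.544e6), ("DS3", 44.736e6), ("FastEthernet", 100e6), ("GigE", 1e9)]
for name, bps in LINKS:
    print(f"{name:>12}: {pps_to_fill(bps, WORM_IP_LEN):>9,.0f} pps")
```

That works out to roughly 478 pps for a T1, 13,842 pps for a DS3, 30,941 pps for FastEthernet, and 309,406 pps for GigE, so a single FE port flooded with these packets already offers several times the ~5 kpps figure mentioned earlier for logging on a 7500.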
On Sat, 25 Jan 2003, Jack Bates wrote:
I think today's events show that CPU-based routers have no business handling anything more than 1 x 100 Mbps in and 1 x 100 Mbps out. If a box has 40 FE interfaces or 4 GE interfaces, at some point you'll see 4 Gbps coming in so the box must be able to handle it to some usable degree.
Actually, you wouldn't expect to see 4 Gbps coming in.
You wouldn't expect it, but it simply happens anyway.
That would be full saturation, which would imply serious performance degradation. Most networks that I've dealt with stick to a 70-80% saturation rule.
Unfortunately worms (or denial of service attackers) don't play nice.
In addition, many of the problems concerning this traffic weren't throughput issues. Each router has a bandwidth limitation and a pps limitation. The worst DDOS I've had to deal with didn't even show as a bandwidth spike on my circuits but exceeded the pps of the router.
That's my point: if you can exceed the router's pps while staying within the aggregate bandwidth for all ports on the box, you'll find yourself in trouble at some point.
Luckily, such attacks are easily dealt with using access-lists as the router is optimized to block more pps than it is designed to switch. This worm had both.
First of all, I don't want to have to install a filter to make a router usable again. Second, this one was easy to filter. We can't count on always being that lucky.
circuit determined how well it dealt with the loading, as different L2 protocols handle saturation differently. ATM is the ideal medium as the latency remains lower than FE or GE at peak saturation.
??? Latency is strictly a function of the average queue size, which is a function of the number of bits coming in vs the number of bits going out per unit of time.
Iljitsch van Beijnum
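Iljitsch's point can be written as a simple fluid model: the queue grows at (input rate - output rate) until the buffer is full, and the delay a new packet sees is that backlog divided by the output rate. A small Python sketch; the 1 MB buffer and offered loads are hypothetical numbers chosen only to show the arithmetic, and the model deliberately has no term for the layer-2 technology.

```python
def queueing_delay_s(backlog_bits: float, output_bps: float) -> float:
    """Delay seen by a new arrival: the backlog ahead of it over the drain rate."""
    return backlog_bits / output_bps

def backlog_after(input_bps: float, output_bps: float,
                  seconds: float, buffer_bits: float) -> float:
    """Fluid model: the queue grows at (input - output) until the buffer is full."""
    growth = max(0.0, input_bps - output_bps) * seconds
    return min(growth, buffer_bits)

# Hypothetical numbers, chosen only to show the arithmetic.
BUFFER_BITS = 8e6          # 1 MB of buffer
DS3_BPS = 44.736e6

# Offered load below line rate: no standing queue, so no queueing delay.
print(queueing_delay_s(backlog_after(40e6, DS3_BPS, 1.0, BUFFER_BITS), DS3_BPS))
# Offered load above line rate: the buffer fills and every packet waits behind it.
print(queueing_delay_s(backlog_after(60e6, DS3_BPS, 1.0, BUFFER_BITS), DS3_BPS))
```

The second case comes out at about 0.179 s: once input exceeds output, delay is set by buffer size and drain rate rather than by how far over line rate the offered load is.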
On 1/25/03 2:00 AM, "Christopher J. Wolff" <chris@bblabs.com> wrote:
Greetings,
It looks like all hell is breaking loose on some of the nation's backbones. http://www.internethealthreport.com
The port counters on my AT&T DS3 were reading in the 250 megabit range, that is a DS3, mind you.
Any source IPs I can add to the circular file would be appreciated. Any ranges I find I'll echo back to the list.
Regards, Christopher J. Wolff, VP CIO Broadband Laboratories, Inc. http://www.bblabs.com
You need a filter similar to this (in junos format):
show configuration firewall filter filter-012503
term deny-dos {
    from {
        packet-length 404;
        protocol udp;
        destination-port 1434;
    }
    then {
        count codered-4;
        discard;
    }
}
term allow-rest {
    then accept;
}
--Phil ISPrime
--On Saturday, January 25, 2003 12:00:47 AM -0700 "Christopher J. Wolff" <chris@bblabs.com> wrote:
Greetings,
It looks like all hell is breaking loose on some of the nation's backbones. http://www.internethealthreport.com
The port counters on my AT&T DS3 were reading in the 250 megabit range, that is a DS3, mind you.
Outbound? (can't imagine inbound counters breaking that badly)
Any source IPs I can add to the circular file would be appreciated. Any ranges I find I'll echo back to the list.
Forget IPs. Just block port 1434 protocol UDP in *and* out.
On Sat, 25 Jan 2003, Christopher J. Wolff wrote: Hi,
It looks like all hell is breaking loose on some of the nation's backbones. http://www.internethealthreport.com
You are not the only one.. I've been sitting here since 06:30 now. So far I have discovered that a lot of Windows boxes send out UDP packets of 376 bytes to random addresses.
09:36:51.711380 802.1Q vlan#50 P0 213.136.0.251.3303 > 239.103.224.157.1434: udp 376 [ttl 1] (id 10818, len 404)
0x0000   0032 0800 4500 0194 2a42 0000 0111 e78e   .2..E...*B......
0x0010   d588 00fb ef67 e09d 0ce7 059a 0180 81db   .....g..........
0x0020   0401 0101 0101 0101 0101 0101 0101 0101   ................
0x0030   0101 0101 0101 0101 0101 0101 0101 0101   ................
0x0040   0101 0101 0101 0101 0101 0101 0101 0101   ................
0x0050   0101 ..
-- 
Sabri Berisha
www.cluecentral.net
"I route, therefore you are"
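The header fields in the dump above can be checked by hand. A short Python sketch using the standard struct and ipaddress modules, assuming the capture layout shown: the first two 16-bit words appear to be the tail of the 802.1Q tag (TCI 0x0032, i.e. VLAN 50, then EtherType 0x0800 for IPv4), followed by a 20-byte IPv4 header and the 8-byte UDP header.

```python
import ipaddress
import struct

# First 48 bytes of the capture (offsets 0x0000-0x002f), hex only.
DUMP = (
    "0032 0800 4500 0194 2a42 0000 0111 e78e "
    "d588 00fb ef67 e09d 0ce7 059a 0180 81db "
    "0401 0101 0101 0101 0101 0101 0101 0101"
)
raw = bytes.fromhex(DUMP)

# Tail of the 802.1Q tag: tag control info, then the encapsulated EtherType.
vlan_tci, ethertype = struct.unpack("!HH", raw[0:4])

# 20-byte IPv4 header (IHL = 5, no options).
(ver_ihl, tos, total_len, ident, frag, ttl, proto, csum,
 src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[4:24])

# 8-byte UDP header.
sport, dport, udp_len, udp_csum = struct.unpack("!HHHH", raw[24:32])

print("vlan", vlan_tci & 0x0FFF, "ethertype", hex(ethertype))
print("IPv%d ihl=%d len=%d id=%d ttl=%d proto=%d"
      % (ver_ihl >> 4, ver_ihl & 0xF, total_len, ident, ttl, proto))
print(ipaddress.IPv4Address(src), sport, "->", ipaddress.IPv4Address(dst), dport)
print("udp length", udp_len, "payload", udp_len - 8)
```

It should report a 404-byte IP packet with TTL 1 from port 3303 to UDP/1434 and a 384-byte UDP length (376 bytes of payload), matching the tcpdump summary line.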
SB> Date: Sat, 25 Jan 2003 09:43:24 +0100 (CET)
SB> From: Sabri Berisha
SB> You are not the only one.. I've been sitting here since 06:30
SB> now. So far I have discovered that a lot of Windows boxes
SB> send out UDP packets of 376 bytes to random addresses.
The main body of the worm contains an infinite loop that spews a 0x178-byte (376-byte) payload.
Eddy
-- 
Brotsman & Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 (785) 865-5885 Lawrence and [inter]national
Phone: +1 (316) 794-8922 Wichita
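Eddy's 0x178 figure ties the other numbers in this thread together: 0x178 is 376 in decimal, matching the "udp 376" in Sabri's capture, and adding the 8-byte UDP header and a 20-byte IPv4 header (assuming no IP options) gives the 404-byte packet length that Phil's JunOS filter matches on. As a trivial check:

```python
PAYLOAD = 0x178        # worm payload size from Eddy's note
UDP_HEADER = 8
IPV4_HEADER = 20       # assumes no IP options (IHL = 5, as in the capture)

udp_length = PAYLOAD + UDP_HEADER           # value carried in the UDP length field
ip_total_length = udp_length + IPV4_HEADER  # value carried in the IP total length

print(PAYLOAD, udp_length, ip_total_length)  # 376 384 404
```

Matching on the exact 404-byte length keeps the filter narrow, but a variant with a different payload size would slip past it, which is the "we can't count on always being that lucky" concern raised earlier.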
Someone already posted this, but it's some crazy wormy thingy on port 1434 udp.
On Sat, 25 Jan 2003, Christopher J. Wolff wrote:
Greetings,
It looks like all hell is breaking loose on some of the nation's backbones. http://www.internethealthreport.com
The port counters on my AT&T DS3 were reading in the 250 megabit range, that is a DS3, mind you.
Any source IPs I can add to the circular file would be appreciated. Any ranges I find I'll echo back to the list.
Regards, Christopher J. Wolff, VP CIO Broadband Laboratories, Inc. http://www.bblabs.com
Hi
Any ranges I find I'll echo back to the list.
not sure if you've received any nanog mail yet. don't worry about source IPs, unless you're going to deny '0.0.0.0'. block anything with a destination of udp 1434, find hosts pushing extreme amounts of traffic, get them patched (http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/...) and then wait for the rest of the internet to catch up...
--Rob
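Finding the "hosts pushing extreme amounts of traffic" is essentially a top-talkers query over flow data, the Netflow approach Rob Thomas suggested earlier in place of ACL logging. A minimal Python sketch, assuming flow records have already been exported into simple (source IP, protocol, destination port, packets) tuples; the record format and the documentation-range sample addresses are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

FlowRecord = Tuple[str, str, int, int]  # (src_ip, protocol, dst_port, packets)

def top_udp1434_sources(records: Iterable[FlowRecord], n: int = 10) -> List[Tuple[str, int]]:
    """Sum packets per source for UDP/1434 flows and return the n largest."""
    totals: Counter = Counter()
    for src, proto, dport, packets in records:
        if proto == "udp" and dport == 1434:
            totals[src] += packets
    return totals.most_common(n)

flows = [
    ("192.0.2.10", "udp", 1434, 250_000),
    ("192.0.2.10", "udp", 1434, 240_000),
    ("198.51.100.7", "udp", 1434, 90_000),
    ("203.0.113.5", "tcp", 80, 1_000_000),   # heavy, but not worm traffic
]
print(top_udp1434_sources(flows, n=2))
# [('192.0.2.10', 490000), ('198.51.100.7', 90000)]
```

Filtering on the worm's port before ranking keeps legitimately busy hosts (like the TCP/80 source above) off the list of machines to chase for patching.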
participants (11)
- Christopher J. Wolff
- Christopher L. Morrow
- Doug Barton
- E.B. Dreger
- fingers
- Iljitsch van Beijnum
- Jack Bates
- John Payne
- Phil Rosenthal
- Rob Thomas
- Sabri Berisha