Re: D/DoS mitigation hardware/software needed.

Roger Marquis

10 Jan 2010

2:03 a.m.

Dobbins, Roland wrote:

...

...
Firewalls do have their place in DDoS mitigation scenarios, but if used as the "ultimate" solution you're asking for trouble.

In my experience, their role is to fall over and die, without exception.

That hasn't been my experience but then I'm not selling anything that might have a lower ROI than firewalls, in small to mid-sized installations.

...

I can't imagine what possible use a stateful firewall has being placed in front of servers under normal conditions, much less during a DDoS attack; it just doesn't make sense.

Firewalls are not designed to mitigate large scale DDoS, unlike Arbors, but they do a damn good job of mitigating small scale attacks of all kinds including DDoS. Firewalls actually do a better job for small to medium sites whereas you need an Arbor-like solution for large scale server farms.

Firewalls do a good job of protecting servers, when properly configured, because they are designed exclusively for the task. Their CAM tables, realtime ASICs and low latencies are very much unlike the CPU-driven, interrupt-bound hardware and kernel-locking, multi-tasking software on a typical web server.

IME it is a rare firewall that doesn't fail long, long after (that's after, not before) the hosts behind them would have otherwise gone belly-up. Rebooting a hosed firewall is also considerably easier than repairing corrupt database tables, cleaning full log partitions, identifying zombie processes, and closing their open file handles.

Perhaps a rhetorical question but, does systems administration or operations staff agree with netop's assertion they 'don't need no stinking firewall'?

Roger Marquis


Dobbins, Roland

10 Jan 2010

2:21 a.m.


On Jan 10, 2010, at 9:03 AM, Roger Marquis wrote:

...

That hasn't been my experience but then I'm not selling anything that might have a lower ROI than firewalls, in small to mid-sized installations.

I loudly evinced this position when I worked for the world's largest firewall vendor, so that dog won't hunt, sorry.

Think about it; firewalls go down under DDoS *much more quickly than the hosts themselves*; Arbor's and other vendors' IDMSes protect many, many firewalls unwisely deployed in front of servers, worldwide. Were I that sort of person (and I'm not, ask anyone who knows me), it's in my naked commercial interest to *promote* firewall deployments, so that *more* sites will go down more easily and require IDMSes, heh.

...

Firewalls are not designed to mitigate large scale DDoS, unlike Arbors, but they do a damn good job of mitigating small scale attacks of all kinds including DDoS.

Not been my experience at all - quite the opposite.

...

Firewalls actually do a better job for small to medium sites whereas you need an Arbor-like solution for large scale server farms.

No, S/RTBH and/or flow-spec are a much better answer for sites which don't need IDMS, read the thread. And they essentially cost nothing from a CAPEX perspective, and little from an OPEX perspective, as they leverage the existing network infrastructure.

...

Firewalls do a good job of protecting servers, when properly configured, because they are designed exclusively for the task.

No, they don't, and no, they aren't.

...

Their CAM tables, realtime ASICs and low latencies are very much unlike the CPU-driven, interrupt-bound hardware and kernel-locking, multi-tasking software on a typical web server. IME it is a rare firewall that doesn't fail long, long after (that's after, not before) the hosts behind them would have otherwise gone belly-up.

Completely incorrect on all counts. Sales propaganda regurgitated as gospel.

...

Rebooting a hosed firewall is also considerably easier than repairing corrupt database tables, cleaning full log partitions, identifying zombie processes, and closing their open file handles.

Properly-designed server installations don't have these problems. Firewalls don't help, either - they just go down.

...

Perhaps a rhetorical question but, does systems administration or operations staff agree with netop's assertion they 'don't need no stinking firewall'?

I've been a sysadmin, thanks. How about you?

You can assert that the sun rises in the West all you like, but that doesn't make it true. All the assertions you've made above are 100% incorrect, as borne out by the real-world operational experiences of multiple people who've commented on this thread, not just me.

I've worked inside the sausage factory, FYI, and am quite familiar with how modern firewalls function, what they can do, and their limitations. And they've no place in front of servers, period.

;>

-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>

Injustice is relatively easy to bear; what stings is justice.

-- H.L. Mencken

George Bonser

6:29 a.m.


...

Firewalls are not designed to mitigate large scale DDoS,

Generally speaking, if it didn't bring the firewall to its knees, it wasn't a DoS. It was just sort of an annoying attempt at a DoS. I think that more or less the definition of a DoS is one that exploits the resource limitations of the firewall to deny service to everything behind it.

The ultimate DoS, though, is simply filling the pipe with traffic from "legitimate" data transfer requests. Nothing you are going to do is going to mitigate that because to stop it you have to DoS yourself. Imagine thousands of requests per second from all around the internet for a legitimate URL. How do you use a firewall to separate the wheat from the chaff?

So let's say you have some client software that you want people to download. Suddenly you are getting more download requests than you can handle. Nobody is flooding you with SYN or ICMP packets. They are sending a single packet (a legitimate URL) that results in you sending thousands of packets to real IP addresses that are simply copying the traffic to what amounts to /dev/null.

Now when your download server gets slow, things get worse because connections begin to take longer to clear. The kernel on the web server is able to handle the TCP/IP setup fairly quickly but getting the file actually shipped out takes time. As connections build up on the firewall, it finally reaches a point where it is out of RAM from storing all those NAT translations and connection state. Now you start noticing that services not under attack are starting to slow down because the firewall has to sort through an increasingly large connection table when doing stateful inspection of traffic going to other services. All the while, there really isn't anything the firewall can do to mitigate the traffic because it is all correct and "legitimate". Basically you are being Slashdotted or experiencing the Drudge Effect, but in this case you are being botnetted.

If you have the server capacity to keep up, now your outbound pipe to the Internet is filling up, you are dropping packets, TCP/IP connections begin to back off, and connections back up even more. At some point the firewall just gives up by failing over to the secondary, which then promptly fails back to the primary, and you bounce back and forth in that state for a while until finally it just gets hung someplace and the whole thing is stuck. And during the entire incident there was no "illegal traffic" that your firewall could have done a thing to block.

Oh, and rate limiting connections isn't going to fix things either unless you can do it on a per-URL basis. Maybe the rate of requests for /really-big-file.tgz that clogs your system is way different than the rate of requests for /somewhat-smaller-file.tgz or /index.html
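To make the per-URL point concrete, here is a minimal sketch of a token-bucket limiter keyed on the request path rather than on the connection. It is an illustration only: the class names, the limits, and the idea of running this in front of the web server are assumptions for the example, not something described in the thread, and it does nothing about a pipe that is already full.

```python
class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate        # tokens added per second
        self.burst = burst      # maximum number of stored tokens
        self.tokens = burst     # start full
        self.last = now         # time of the last refill

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at burst.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUrlLimiter:
    """One bucket per URL path, so an expensive file can have a tight limit."""

    def __init__(self, limits, default):
        self.limits = limits    # path -> (rate, burst); hypothetical values
        self.default = default  # (rate, burst) for paths not listed
        self.buckets = {}

    def allow(self, path, now):
        bucket = self.buckets.get(path)
        if bucket is None:
            rate, burst = self.limits.get(path, self.default)
            bucket = self.buckets[path] = TokenBucket(rate, burst, now)
        return bucket.allow(now)


# Illustrative limits: the big download is throttled hard, everything else is not.
limiter = PerUrlLimiter(
    limits={"/really-big-file.tgz": (1.0, 2.0)},
    default=(100.0, 100.0),
)
big_at_t0 = [limiter.allow("/really-big-file.tgz", now=0.0) for _ in range(3)]
index_at_t0 = [limiter.allow("/index.html", now=0.0) for _ in range(50)]
big_at_t1 = [limiter.allow("/really-big-file.tgz", now=1.0) for _ in range(2)]
```

With these made-up numbers the third request for the large file in the same instant is refused while requests for /index.html are unaffected, which is the distinction a plain per-connection rate limit cannot make.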

Joe Greco

2:40 p.m.


...

Firewalls do a good job of protecting servers, when properly configured, because they are designed exclusively for the task. Their CAM tables, realtime ASICs and low latencies are very much unlike the CPU-driven, interrupt-bound hardware and kernel-locking, multi-tasking software on a typical web server. IME it is a rare firewall that doesn't fail long, long after (that's after, not before) the hosts behind them would have otherwise gone belly-up.

Then you need to get rid of that '90's antique web server and get something modern. When you say "interrupt-bound hardware," all you are doing is showing that you're not familiar with modern servers and quality operating systems that are designed to mitigate things like DDoS attacks.

"Stateful filtering" is to firewalls what "interrupt-based packet processing" is to web servers. Both are recipes for disaster.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.

Roger Marquis

4:19 p.m.


...

Then you need to get rid of that '90's antique web server and get something modern. When you say "interrupt-bound hardware," all you are doing is showing that you're not familiar with modern servers and quality operating systems that are designed to mitigate things like DDoS attacks.

"Modern" servers? IP is processed in the kernel on web servers, regardless of OS. Have you configured a kernel lately? Noticed there are ~3,000 lines in the Linux config file alone? _Lots_ of device drivers in there, which are interrupt driven and have to be timeshared. No servers I know do realtime processing (RT kernels don't) or process IP in ASICs.

What configurations of Linux / BSD / Solaris / etc does web / email / ntp / sip / iptables / ipfw / ... and doesn't have issues with kernel locking? Test it on your own servers by mounting a damaged DVD on the root directory, and dd'ing it to /dev/null. Notice how the ATA/SATA/SCSI driver impacts the latency of everything on the system. How would you replicate that on a firmware and ASIC-driven appliance?

Roger Marquis

Joe Greco

5:09 p.m.


...

...
Then you need to get rid of that '90's antique web server and get something modern. When you say "interrupt-bound hardware," all you are doing is showing that you're not familiar with modern servers and quality operating systems that are designed to mitigate things like DDoS attacks.

"Modern" servers? IP is processed in the kernel on web servers, regardless of OS. Have you configured a kernel lately?

Yes, pretty much every time I install a server.

...

Noticed there are ~3,000 lines in the Linux config file alone?

Well, that explains a lot.

    % wc -l /sys/i386/conf/WEBX4
         324 /sys/i386/conf/WEBX4

I probably haven't noticed that there are ~3,000 lines in the Linux config file alone because I use a different OS; ~3,000 lines of config would just be another example of why I generally consider Linux to be a little broken. I can see why admins would be hesitant to challenge such a thing.

...

_Lots_ of device drivers in there, which are interrupt driven and have to be timeshared. No servers I know do realtime processing (RT kernels don't) or process IP in ASICs.

Roger, meet FreeBSD. FreeBSD, meet Roger. FreeBSD, would you please show Roger how IP is handled without excessive interrupts?

    % systat -vm
    (snipped from larger display)
    Interrupts
      2208 total
           stray irq7
           mux irq9
           em5 irq5
        85 ata0 irq14
           mux irq11
           fdc0 irq6
           atkbd0 irq
           sio0 irq4
      1995 clk irq0
       128 rtc irq8

    % netstat 1
                input        (Total)           output
       packets  errs      bytes    packets  errs      bytes colls
         58991     0   54547321      58975     0   54523849     0
         59492     0   58297208      59475     0   58388027     0
         65828     0   62105928      65856     0   62081922     0
         60257     0   56781863      60219     0   56809674     0
         62547     0   61254034      62583     0   61231514     0
         58188     9   55536734      58103     0   55560822     0
         73870     0   70245952      73959     0   70223249     0
         61436     0   58766122      61429     0   58786292     0
         61390     0   59050710      61336     0   59029298     0
         61447     0   58701312      61502     0   58725356     0
         63934     0   60801413      63932     0   60777621     0
         60187     0   56724030      60189     0   56751946     0
         60247     0   55544082      60036     0   55522162     0
         66472     0   63061572      66635     0   63033232     0
         66415     0   62876955      66438     0   62854488     0
         66612     0   63270235      66355     0   63335538     0
         66020     0   60478426      66293     0   60454874     0
         67696     0   63512069      67692     0   63534500     0
         66342     0   60462142      66353     0   60439239     0

That's 60Kpps being handled with 2K interrupts per second. It'll be 2K interrupts per second at 0pps or 200Kpps or whatever.

    % ipfw l | wc -l
         620

It's doing nontrivial amounts of firewalling while doing this.

    % top
    last pid: 83148;  load averages:  0.31,  0.28,  0.23    up 459+08:00:24  12:00:33
    51 processes:  3 running, 42 sleeping, 6 stopped
    CPU states: 14.8% user,  0.0% nice, 19.1% system, 13.3% interrupt, 52.7% idle

    % cat /var/run/dmesg.boot
    [...]
    CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2994.90-MHz 686-class CPU)
      Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
    [...]

Ewww, but it *is* a 2004-vintage Pentium Prescott CPU on a legacy PCI mobo, so it is actually a little disadvantaged compared to modern hardware.
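The arithmetic behind "60Kpps with 2K interrupts" can be checked directly from the figures posted above. This is only a back-of-the-envelope sketch: the sample values are copied from the input column of the `netstat 1` output, and the packets-per-clock-tick figure assumes the NIC is being serviced from the clock tick (as with device polling), which the post implies but does not state outright.

```python
# Input packets per second, copied from the `netstat 1` samples above.
INPUT_PPS = [
    58991, 59492, 65828, 60257, 62547, 58188, 73870, 61436, 61390, 61447,
    63934, 60187, 60247, 66472, 66415, 66612, 66020, 67696, 66342,
]

# Interrupt rates from the `systat -vm` display above.
TOTAL_IRQ_PER_SEC = 2208   # all sources combined
CLK_IRQ_PER_SEC = 1995     # clk irq0 only


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


avg_pps = mean(INPUT_PPS)

# Inbound packets handled per interrupt of any kind.
pkts_per_interrupt = avg_pps / TOTAL_IRQ_PER_SEC

# Inbound packets handled per clock tick (assumption: polling on the tick).
pkts_per_clock_tick = avg_pps / CLK_IRQ_PER_SEC

print(f"average input rate:     {avg_pps:,.0f} pps")
print(f"packets per interrupt:  {pkts_per_interrupt:.1f}")
print(f"packets per clock tick: {pkts_per_clock_tick:.1f}")
```

A strictly one-interrupt-per-packet design would need roughly as many interrupts per second as the average packet rate itself, so the gap between that figure and 2208 is the saving being claimed.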

...

What configurations of Linux / BSD / Solaris / etc does web / email / ntp / sip / iptables / ipfw / ... and doesn't have issues with kernel locking?

That's like saying "what cars cannot be crashed into a wall." A much better question is "what combination of driver and vehicle can I get that significantly reduces the chances of my being involved in a crash." Driver is important because even the best vehicle can be driven into a wall; vehicle is important because even the best driver is severely limited by a decrepit old car. It's when you get a great driver in a great vehicle that you get the good results.

...

Test it on your own servers by mounting a damaged DVD on the root directory, and dd'ing it to /dev/null. Notice how the ATA/SATA/SCSI driver impacts the latency of everything on the system.

As soon as a remote attacker is able to insert a damaged DVD into one of my servers (maybe via specially crafted IP options in a TCP packet?), you will witness my posterior emit a large number of blocks of ceramic material (used in masonry construction).

Until then, I am unfazed by this because it isn't particularly relevant to the discussion. I can cause excessive latency simply by switching off gear too.

I *strongly* suggest you go and look over

http://info.iet.unipi.it/~luigi/polling/

/and note its date/ before you compose any reply; device polling has been around for a *long* time and its usefulness as a DDoS mitigator in the server arena is hard to refute.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.

Valdis.Kletnieks＠vt.edu

5:25 p.m.


On Sun, 10 Jan 2010 08:19:27 PST, Roger Marquis said:

...

...
Then you need to get rid of that '90's antique web server and get something modern. When you say "interrupt-bound hardware," all you are doing is showing that you're not familiar with modern servers and quality operating systems that are designed to mitigate things like DDoS attacks.

"Modern" servers? IP is processed in the kernel on web servers, regardless of OS. Have you configured a kernel lately? Noticed there are ~3,000 lines in the Linux config file alone? _Lots_ of device drivers in there, which are interrupt driven and have to be timeshared.

Yes, but all the fast network adapters are able to do a lot of stuff like interrupt coalescing so you don't need to take an interrupt on every packet.

And "have you configured a kernel lately" is another red herring - yes, there are indeed 4,533 lines in the current Fedora .config. But that's because that config turns on everything under the sun. I just checked, and my current kernel config has only 960 '=y' lines, and another 220 '=m' lines - and a large portion of those could easily be turned off. I have a minimal config file that comes in under 730 non-comment lines.
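For anyone who wants to run the same count on their own kernel config, here is a small sketch that tallies '=y', '=m', and other non-comment lines. The function name and the sample text are made up for illustration; in practice you would pass in the contents of whatever .config file your distribution or build tree uses.

```python
def summarize_kconfig(text):
    """Count built-in (=y), modular (=m), and other set options in a .config."""
    counts = {"builtin": 0, "module": 0, "other": 0}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks and comments, including "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME=value line
        if value == "y":
            counts["builtin"] += 1
        elif value == "m":
            counts["module"] += 1
        else:
            counts["other"] += 1  # numeric or string values
    counts["non_comment"] = sum(counts.values())
    return counts


# Tiny made-up sample, not taken from any real system.
SAMPLE = """\
# Automatically generated file; DO NOT EDIT.
CONFIG_SMP=y
CONFIG_E1000=m
# CONFIG_DEBUG_KERNEL is not set
CONFIG_HZ=1000

CONFIG_INET=y
"""

summary = summarize_kconfig(SAMPLE)
print(summary)
```

Reading a real file is then just `summarize_kconfig(open(path).read())`, with `path` pointing at the config in question.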

...

No servers I know do realtime processing (RT kernels don't) or process IP in ASICs.

That's because in general, processing the IP in an ASIC simply Does Not Work as well as you might hope. Alan Cox did a nice discussion of some of the issues here:

http://lkml.indiana.edu/hypermail/linux/kernel/0307.1/2116.html

One should read his last paragraph carefully, and note that what he wrote back in 2003 is still true today:

http://www.internet2.edu/lsr/history.html

...

What configurations of Linux / BSD / Solaris / etc does web / email / ntp / sip / iptables / ipfw / ... and doesn't have issues with kernel locking?

So let me get this straight - you perceive a problem with locking inside the kernel, where if you're lucky the lock is in an already-hot cache line and your biggest worry is cache line ping-ponging, and if you're unlucky you actually have to go out the southbridge and hit main memory, at main memory access speeds. And to fix this, you're going to move one of the things contending for the lock off the CPU, so now every time the lock is contended, it has to go out through the PCI bridge to an external card? How the heck is that supposed to help? You're suggesting the same "go talk to another card" solution that the router vendors learned is the *last* thing you want to do - calling out to the supervisor card rather than handling it onboard the line card is guaranteed performance death.

...

Test it on your own servers by mounting a damaged DVD on the root directory, and dd'ing it to /dev/null. Notice how the ATA/SATA/SCSI driver impacts the latency of everything on the system. How would you replicate that on a firmware and ASIC-driven appliance?

There's two little things you went astray on here:

1) I've in fact had to do this while doing data recovery. It doesn't do squat to the latency of anything that doesn't have to go through the same controller as the DVD. Everything else works just fine. Heck, it isn't even enough to cause audio playback skips (and those are noticeable even at the millisecond level).

2) Your latency hit is because the controller is *busy* while trying to re-read and error-correct a bad block. So yeah - trying to do I/O through a controller that's taking a several-second time-out dealing with bad media will cause a latency hit *for that I/O*. What's your point?
