On Sun, 10 Jan 2010 08:19:27 PST, Roger Marquis said:
Then you need to get rid of that '90s antique web server and get something modern. When you say "interrupt-bound hardware," all you are doing is showing that you're not familiar with modern servers and quality operating systems that are designed to mitigate things like DDoS attacks.
"Modern" servers? IP is processed in the kernel on web servers, regardless of OS. Have you configured a kernel lately? Noticed there are ~3,000 lines in the Linux config file alone? _Lots_ of device drivers in there, which are interrupt driven and have to be timeshared.
Yes, but all the fast network adapters can do things like interrupt coalescing, so you don't need to take an interrupt on every packet. And "have you configured a kernel lately" is another red herring. Yes, there are indeed 4,533 lines in the current Fedora .config, but that's because that config turns on everything under the sun. I just checked, and my current kernel config has only 960 '=y' lines and another 220 '=m' lines, and a large portion of those could easily be turned off. I have a minimal config file that comes in under 730 non-comment lines.
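For anyone who wants to reproduce that kind of tally on their own box, here is a minimal sketch in Python. It assumes the usual .config layout, where enabled options look like `CONFIG_FOO=y` or `CONFIG_FOO=m` and disabled ones appear as `# CONFIG_FOO is not set` comments; the function name `summarize_config` is hypothetical, and the counts are roughly what `grep -c '=y$' .config` and friends would give you.

```python
from collections import Counter


def summarize_config(text: str) -> Counter:
    """Tally option states in a Linux kernel .config (illustrative sketch)."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith("#"):
            # "# CONFIG_FOO is not set" is a comment to the parser, but it
            # marks an explicitly disabled option, so count it separately.
            if line.endswith(" is not set"):
                counts["not_set"] += 1
            continue
        counts["non_comment"] += 1
        if line.endswith("=y"):
            counts["y"] += 1  # built into the kernel image
        elif line.endswith("=m"):
            counts["m"] += 1  # built as a loadable module
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_config.py /path/to/.config
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print(dict(summarize_config(f.read())))
```

String- and number-valued options (e.g. `CONFIG_HZ=1000`) land in `non_comment` without being counted as `y` or `m`, which is why the non-comment total is larger than the sum of the two.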
No servers I know do realtime processing (RT kernels don't) or process IP in ASICs.
That's because in general, processing the IP in an ASIC simply Does Not Work as well as you might hope. Alan Cox wrote a nice discussion of some of the issues here: http://lkml.indiana.edu/hypermail/linux/kernel/0307.1/2116.html One should read his last paragraph carefully, and note that what he wrote back in 2003 is still true today: http://www.internet2.edu/lsr/history.html
What configuration of Linux / BSD / Solaris / etc. does web / email / ntp / sip / iptables / ipfw / ... and doesn't have issues with kernel locking?
So let me get this straight: you perceive a problem with locking inside the kernel, where if you're lucky the lock is in an already-hot cache line and your biggest worry is cache line ping-ponging, and if you're unlucky you actually have to go all the way out to main memory, at main memory access speeds. And to fix this, you're going to move one of the things contending for the lock off the CPU, so that every time the lock is contended, it has to go out through the PCI bridge to an external card? How the heck is that supposed to help? You're suggesting the same "go talk to another card" solution that the router vendors learned is the *last* thing you want to do: calling out to the supervisor card rather than handling it onboard the line card is guaranteed performance death.
Test it on your own servers by mounting a damaged DVD on the root directory and dd'ing it to /dev/null. Notice how the ATA/SATA/SCSI driver impacts the latency of everything on the system. How would you replicate that on a firmware- and ASIC-driven appliance?
There are two little things you went astray on here:

1) I've in fact had to do this while doing data recovery. It doesn't do squat to the latency of anything that doesn't have to go through the same controller as the DVD. Everything else works just fine. Heck, it isn't even enough to cause audio playback skips (and those are noticeable even at the millisecond level).

2) Your latency hit is because the controller is *busy* while trying to re-read and error-correct a bad block. So yeah, trying to do I/O through a controller that's taking a several-second time-out dealing with bad media will cause a latency hit *for that I/O*. What's your point?
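If you'd rather measure than argue, a crude user-space probe is enough to see whether unrelated work is being delayed. The sketch below (plain Python, with a hypothetical `sleep_overshoot_ms` helper) repeatedly asks for a 1 ms sleep and records how late each wakeup is. Run it once on an idle system and once while the dd of the damaged disc is grinding away, then compare the median and worst case. It only measures CPU scheduling latency for a process that does no I/O to that controller, and the absolute numbers depend on the OS timer granularity, so treat it as a rough indicator rather than a benchmark.

```python
import statistics
import time


def sleep_overshoot_ms(samples: int = 200, interval_s: float = 0.001) -> list[float]:
    """Return how many ms each short sleep overran its requested interval."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Anything beyond the requested interval is wakeup/scheduling delay.
        overshoots.append((elapsed - interval_s) * 1000.0)
    return overshoots


if __name__ == "__main__":
    data = sleep_overshoot_ms()
    print(f"median {statistics.median(data):.3f} ms, worst {max(data):.3f} ms")
```

If the worst-case numbers barely move while the DVD is being read, that's consistent with the latency hit being confined to I/O that shares the busy controller.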