On Thu, Mar 28, 2002 at 03:14:53PM -0800, Gironda, Andre wrote:
> Why do you say that? In the 10/100 range, yes, no problems. But at the
> gigabit range (say with two GbE cards or a single OC-48 card) on an x86
> box with IDE disks (or even SCSI RAID0), doesn't disk I/O become a
> severe problem? Scaling disk I/O seems relatively easy, with Veritas
> Foundation Suite under Solaris or GFS under Linux:
> http://www.sistina.com/products_gfs.htm
Capturing packets for realtime analysis is an attainable goal using cheap off the shelf hardware and a little bit of clue. Storing many Gbps of data on a hard drive is a much harder task. Even using 160 gig drives, 1Gbps fills one in about 20 minutes (10 if you're recording full duplex). Unless you're the FBI, I really don't think you want to store that much data for any reason. Be smart about what you write to disk, and how you write it.
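(Back of the envelope, in case anyone wants to check the 20 minutes: 1Gbps is 125 Mbytes/sec, and 160e9 bytes / 125e6 bytes/sec is 1280 seconds, a bit over 21 minutes. Record both directions of a full duplex link and you're down around 10.)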
> However, I don't think Linux or Solaris can handle the packet capture
> capabilities like FreeBSD and BPF can. I've heard things about the new
> LPF capabilities and turbopacket, but it's just hard to believe coming
> from such a joke/toy operating system.
The data capture mechanism of BPF is pretty simple (the filter language is what's complex); I doubt even Linux can get it too wrong. All you need is a buffer in the kernel (FreeBSD defaults to 4096 bytes; libpcap turns it up to 32768, I believe, but doesn't expose the value to the user -- you should probably turn that up a bit if you want to capture at high speed). Read data from the NIC, copy it into the buffer (or preferably have the NIC be responsible for transferring it into the buffer :P), and increment the offset. Then when someone comes along to read more data, copy the buffer out into the userland buffer, use the offset value to indicate the total length, and reset the offset to 0. If you need more than 20 lines to do that part, you're probably doing it wrong. :)

In normal use of BPF the data is copied 3 times: from the NIC to an mbuf, from the mbuf to the BPF buffer if there is a configured BPF reader, and then from the BPF buffer to the user-supplied buffer when the user does a read() on the BPF descriptor. Fortunately multiple packets are batched into a single copy in stages 2 and 3.

If you want to eliminate some of those copies, you have to make a dedicated reader mechanism. Malloc the memory in userland so you get a nice page-aligned chunk, allocate the counters in userland too, and pass it all in via a character device similar to BPF. You probably want to go with a ring structure: use 2 counters as a producer and a consumer index. The kernel updates the producer index, and you update the consumer index as you process data. When both values are equal, the ring is empty. When the producer index is 1 below the consumer index, the ring is full. With an intelligent card, you pass the address of your userland-allocated memory as the place you want the RX data DMA'd. The kernel updates the producer index, discarding any data the consumer can't keep up with. Then you just have your userland program constantly scanning the ring for new data; put a usleep(1); in there and you'll stay below 0.01% CPU.

Think there would be a benefit to writing this as an extension to BPF?

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
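P.S. Since libpcap won't turn the buffer up for you, here's a rough sketch of doing it by hand (untested as typed; the interface name is just an example, and the kernel will clamp the size to its own maximum and hand the granted value back in blen):

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	u_int blen = 524288;	/* ask for 512k instead of the 32k default */
	struct ifreq ifr;
	char *buf;
	ssize_t n;
	int fd;

	if ((fd = open("/dev/bpf0", O_RDONLY)) < 0)
		err(1, "open");
	/* BIOCSBLEN has to happen before the interface is attached */
	if (ioctl(fd, BIOCSBLEN, &blen) < 0)
		err(1, "BIOCSBLEN");
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "fxp0", sizeof(ifr.ifr_name)); /* example nic */
	if (ioctl(fd, BIOCSETIF, &ifr) < 0)
		err(1, "BIOCSETIF");
	if ((buf = malloc(blen)) == NULL)
		err(1, "malloc");
	/* reads on bpf must be exactly the buffer size; each read returns
	 * a batch of packets, each prefixed with a struct bpf_hdr */
	while ((n = read(fd, buf, blen)) > 0)
		printf("got %ld bytes of packets\n", (long)n);
	return (0);
}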
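And since I claimed 20 lines for the store/read part, here it is in pseudo-C (made-up names, not the actual bpf.c code; locking, wakeups, and the per-packet bpf_hdr framing are left out):

struct cap_buf {
	char	*store;		/* the kernel buffer */
	u_int	 size;		/* its size */
	u_int	 offset;	/* how much is currently used */
	u_int	 drops;
};

/* called per packet from the tap point */
void
cap_append(struct cap_buf *b, const u_char *pkt, u_int len)
{
	if (b->offset + len > b->size) {
		b->drops++;			/* full; drop and count it */
		return;
	}
	memcpy(b->store + b->offset, pkt, len);
	b->offset += len;
}

/* the read(2) side: one copyout of everything buffered so far */
int
cap_read(struct cap_buf *b, char *ubuf, u_int *ulen)
{
	int error;

	if ((error = copyout(b->store, ubuf, b->offset)) == 0) {
		*ulen = b->offset;		/* total length for the caller */
		b->offset = 0;			/* reset for the next batch */
	}
	return (error);
}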
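The userland side of the ring looks about like this (again just a sketch; the ring layout is hypothetical, handle_packet() is a placeholder for whatever you actually do with the data, and size has to be a power of 2 for the masking to work -- the kernel would mask its producer index the same way):

#include <stdint.h>
#include <unistd.h>

struct slot {
	uint32_t	len;
	unsigned char	data[2048];
};

struct ring {
	volatile uint32_t prod;		/* written only by the kernel */
	volatile uint32_t cons;		/* written only by us */
	uint32_t	  size;		/* number of slots, power of 2 */
	struct slot	  slots[];	/* the RX DMA target */
};

void handle_packet(const unsigned char *p, uint32_t len);	/* placeholder */

void
consume(struct ring *r)
{
	for (;;) {
		if (r->cons == r->prod) {	/* equal means empty */
			usleep(1);		/* keeps us under 0.01% cpu */
			continue;
		}
		/* full is ((prod + 1) & (size - 1)) == cons, i.e. the
		 * producer sitting 1 below the consumer; the kernel checks
		 * that and bumps a drop counter instead of overwriting */
		handle_packet(r->slots[r->cons].data, r->slots[r->cons].len);
		r->cons = (r->cons + 1) & (r->size - 1);
	}
}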