On Thu, Apr 12, 2012 at 4:18 PM, Ian McDonald <iam@st-andrews.ac.uk> wrote:
> You'll need to build an array that'll random read/write upwards of 200MB/s if you want to get a semi-reliable capture to disk. That means SSD if you're very rich, or many spindles
Hey,

Saving packet captures to a file is roughly 98% asynchronous write, 2% read, and about 95% sequential activity. You might also think about applying some variant of header compression to the packets during capture, trading a little CPU and some extra RAM for storage efficiency; the PCAP approach of saving raw packet header bits directly to disk is not necessarily among the most I/O- or space-efficient on-disk formats you could pick (there's a minimal capture sketch at the end of this reply).

Random writes should only occur if you are saving your captures to a fragmented filesystem, which is not recommended; avoiding fragmentation is important. Random reads aren't involved in archiving the data, only in analyzing it. Do you actually make random reads into your saved capture files? You're more likely to be doing a sequential scan, even during analysis; random reads imply you have already indexed the dataset and are seeking a small number of specific records to collect information about them. Read requirements are totally dependent on your analysis workload, e.g. table scan vs. index search. Depending on what the analysis is, it may even make sense to keep extra filtered copies of the data, using more disk space in order to avoid a random access pattern. And if you are building a database of analysis results from the raw data, you can use a separate random-I/O-optimized disk subsystem for the stats database.

If you really need approximately 200 MB/s with some random read performance for analysis, you should probably be looking at building a RAID50 from several 4-drive sets with 1 GB+ of writeback cache. RAID10 makes more sense when the write workload is not sequential, when the external storage is actually shared with multiple applications, or when a disk drive failure has to be truly transparent, but there is a huge capacity sacrifice in choosing mirroring over parity (for example, twelve 2 TB drives yield roughly 18 TB usable as three 4-drive RAID5 sets striped together, versus 12 TB as RAID10).

There is also a time vs. cost tradeoff in the analysis of the data. When your 'analysis tools' start reading data, the reads increase disk access times and therefore reduce write performance, so the reads should be throttled; the more headroom you build into the disk subsystem, the higher the cost. Performing your analysis ahead of time, or at least indexing newly captured data in small chunks on a continuous basis, may be useful to minimize the amount of searching of the raw dataset later. A small SSD, or a separate mirrored drive pair dedicated to that function, would avoid adding load to the "raw capture storage" disk system, if your analysis requirements are amenable to that pattern. Modern OSes also cache recent filesystem data in RAM, so if the capture server has sufficient RAM, analyzing data while it's still hot in the page cache and saving that analysis in an efficient index for later use can be worthwhile.
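To make the capture path concrete, here is a minimal libpcap sketch of streaming truncated packets to a pcap file. The interface name, the 96-byte snaplen, and the output filename are placeholders, and truncating via snaplen is only the crude cousin of real header compression, but it shows both the sequential-append write pattern and the storage saving from dropping payload.

/* capture_headers.c -- minimal sketch: stream packet headers to a pcap file.
 * Interface, snaplen, and output filename are placeholders for illustration.
 * Build: cc capture_headers.c -lpcap
 */
#include <pcap.h>
#include <stdio.h>

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];

    /* A 96-byte snaplen keeps the L2-L4 headers and drops most payload,
     * trading completeness for a large reduction in bytes written. */
    pcap_t *ph = pcap_open_live("eth0", 96, 1, 1000, errbuf);
    if (ph == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }

    /* pcap_dump_open() writes the standard pcap file format; every record
     * is a sequential append, which is why the disk workload is almost
     * entirely sequential. */
    pcap_dumper_t *out = pcap_dump_open(ph, "capture.pcap");
    if (out == NULL) {
        fprintf(stderr, "pcap_dump_open: %s\n", pcap_geterr(ph));
        return 1;
    }

    /* Loop forever, appending each captured packet to the file. */
    pcap_loop(ph, -1, pcap_dump, (u_char *)out);

    pcap_dump_close(out);
    pcap_close(ph);
    return 0;
}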
> (preferably 15k's) in a stripe/raid10 if you're building from your scrap pile. Bear in mind that write cache won't help you, as the io isn't going to be bursty, rather a continuous stream.
Not really... A good read cache is more important for the analysis, but an effective write cache on your array, together with the OS page cache, is still highly beneficial. It can ensure that your RAID subsystem performs full-stripe writes, for maximal efficiency of sequential write activity, and it can delay the media write until the optimal moment based on platter position and sequence the read/write requests. As long as the storage behind the cache can, on average, drain the cache faster than you can fill it with data a sufficient amount of the time, the write cache serves an important function. Your I/O may be a continuous stream, but there are most certainly variations and spikes in both the packet rate and the performance of mechanical disk drives.
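To illustrate the full-stripe-write point, here is a rough sketch of the coalescing that a writeback cache does on your behalf: incoming data is buffered until a full stripe's worth has accumulated, then written in one sequential burst. The 64 KiB chunk size, the 3-data-disk geometry, and the output filename are assumptions for the example, not properties of any particular array.

/* stripe_coalesce.c -- sketch of coalescing writes into full-stripe bursts,
 * roughly what an array's writeback cache does for you. The RAID geometry
 * (64 KiB chunk, 3 data disks per set) and the output file are assumed.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES  (64 * 1024)                 /* assumed RAID chunk size  */
#define DATA_DISKS   3                           /* e.g. one 4-drive RAID5   */
#define STRIPE_BYTES (CHUNK_BYTES * DATA_DISKS)  /* full-stripe write size   */

int main(void)
{
    unsigned char *buf = malloc(STRIPE_BYTES);
    unsigned char pkt[1500];                     /* stand-in captured packet */
    size_t fill = 0;
    FILE *out = fopen("capture.bin", "wb");      /* placeholder output file  */

    if (buf == NULL || out == NULL)
        return 1;
    memset(pkt, 0, sizeof pkt);

    /* Stand-in for the capture loop: packets of varying size arrive and are
     * copied into the stripe buffer; the buffer is written out only when it
     * holds exactly one full stripe, so the array never has to do a
     * read-modify-write for parity. */
    for (int i = 0; i < 100000; i++) {
        size_t len = 64 + (size_t)(i % 1400);    /* fake variable sizes */
        size_t off = 0;

        while (off < len) {
            size_t space = STRIPE_BYTES - fill;
            size_t n = (len - off < space) ? len - off : space;

            memcpy(buf + fill, pkt + off, n);
            fill += n;
            off  += n;

            if (fill == STRIPE_BYTES) {          /* flush one whole stripe */
                fwrite(buf, 1, STRIPE_BYTES, out);
                fill = 0;
            }
        }
    }

    if (fill > 0)                                /* final partial flush */
        fwrite(buf, 1, fill, out);

    fclose(out);
    free(buf);
    return 0;
}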
> Aligning your partitions with the physical disk geometry can produce surprising speedups, as can stripe block size changes, but that's generally empirical, and depends on your workload.
For RAID systems, partitions should absolutely be aligned if the OS installer's defaults don't align them correctly; on a modern OS the defaults are normally OK. An unaligned or improperly aligned partition is simply a misconfiguration: a track crossing on every other sector read is an easy way to double the size of small I/Os. You won't notice it in this particular use case, though. When you are writing a 100 MB chunk asynchronously, you won't notice a 63 kB offset, which is well under 0.1% of your transfer size; alignment is primarily a concern during analysis or database searching, which may involve small random reads and small synchronous random writes. In other words, you will probably get away with just ignoring partition alignment and filesystem block size here, and there are other aspects of the configuration to be more concerned about (YMMV).

-- 
-JH
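P.S. If you ever do want to sanity-check alignment rather than trust the installer, Linux exports each partition's start sector under sysfs. A quick sketch follows; the device path and the 256 KiB stripe width are placeholders, so substitute the real device and array geometry.

/* align_check.c -- sketch: compare a partition's starting byte offset against
 * an assumed stripe width. The sysfs path and the 256 KiB stripe width are
 * placeholders only.
 */
#include <stdio.h>

#define SECTOR_BYTES 512ULL            /* sysfs 'start' is in 512-byte units */
#define STRIPE_BYTES (256ULL * 1024)   /* assumed chunk size * data disks    */

int main(void)
{
    const char *path = "/sys/block/sda/sda1/start";   /* placeholder device */
    unsigned long long start_sector;
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        perror(path);
        return 1;
    }
    if (fscanf(f, "%llu", &start_sector) != 1) {
        fprintf(stderr, "could not parse %s\n", path);
        fclose(f);
        return 1;
    }
    fclose(f);

    unsigned long long offset = start_sector * SECTOR_BYTES;
    printf("partition starts at byte %llu (%llu KiB)\n", offset, offset / 1024);
    printf("aligned to %llu KiB stripe: %s\n", STRIPE_BYTES / 1024,
           offset % STRIPE_BYTES == 0 ? "yes" : "NO");
    return 0;
}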