Hello Everyone, Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds? Best, Ali
Depends on the duration and goals of your capture... 1 TB is roughly 2.2 hours at 1 Gb/s (a quick worked version of that figure follows the quoted message below). If you need to capture it all and store it forever, well, sorry. If you just need the flows and not the packets, sampled NetFlow can reduce your requirements by many orders of magnitude; ultimately it really depends on your goals. If you need to capture more data for a shorter duration, write speed rather than capacity is probably the issue: 20 Gb/s is 2.5 GB/s, which requires a pretty healthy disk array to write to disk... On 4/12/12 13:25, Maverick wrote:
Hello Everyone,
Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds?
Best, Ali
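The arithmetic behind the 1 TB / 1 Gb/s figure above is straightforward; here is a quick sketch (the line rate is the only input, and 1 Gb/s is just an example value):

    # hours of capture that fit in 1 TB at a given line rate:
    # 1 TB = 8000 gigabits; divide by the rate (Gb/s) and by 3600 s/hour
    rate_gbps=1
    echo "scale=1; 8000 / ($rate_gbps * 3600)" | bc    # prints 2.2

At 10 Gb/s the same terabyte lasts only about 13 minutes, which is why the replies below focus as much on trimming what gets written as on adding disk.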
Ali, Do you need to capture the whole packet, including the payload? You will save a lot of space by just capturing the headers. For example, tcpdump doesn't capture the whole packet by default anyway. You may not be able to capture at line rate anyway, depending on what you are using to capture with (drivers, libraries, software, etc.). See the -s option in the tcpdump man page for info (a small illustrative example follows this message). Good luck, Mike On Thu, 2012-04-12 at 16:25 -0400, Maverick wrote:
Hello Everyone,
Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds?
Best, Ali
-- ************************************************************ Michael J. McCafferty CEO M5 Hosting http://www.m5hosting.com Like us on Facebook for updates and photos: https://www.facebook.com/m5hosting ************************************************************
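As a concrete illustration of the -s option Mike mentions, a header-only capture might look like this; the interface name and snap length are illustrative assumptions, not something specified in the thread:

    # keep only the first 96 bytes of each packet (roughly Ethernet + IP +
    # TCP headers); eth0 and 96 are example values
    tcpdump -i eth0 -s 96 -w /captures/headers.pcap

A snap length of 96 bytes versus full 1500-byte frames cuts the storage requirement by more than an order of magnitude for full-size packets, at the cost of losing payloads.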
Thank you very much for your suggestions. 1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS. 2) I am storing just the headers and the initial few bytes, but it still fills up quite quickly. 3) The NetFlow approach is nice, but I also want to have traces available for the reasons mentioned in 1). 4) Are there any issues with using external storage as a solution for this problem? Best, Ali On Thu, Apr 12, 2012 at 5:06 PM, Michael J McCafferty <mike@m5computersecurity.com> wrote:
Ali, Do you need to capture the whole packet, including the payload? You will save a lot of space by just capturing the headers. For example, tcpdump doesn't capture the whole packet by default anyway. You may not be able to capture at line rate anyway, depending on what you are using to capture with (drivers, libraries, software, etc.). See the -s option in the tcpdump man page for info.
Good luck, Mike
On Thu, 2012-04-12 at 16:25 -0400, Maverick wrote:
Hello Everyone,
Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds?
Best, Ali
-- ************************************************************ Michael J. McCafferty CEO M5 Hosting http://www.m5hosting.com
Like us on Facebook for updates and photos: https://www.facebook.com/m5hosting ************************************************************
In that case, just keep adding disks to your capture system, or use a NAS to do it. --John On 4/12/2012 2:16 PM, Maverick wrote:
Thank you very much for your suggestions.
1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS.
2) I am storing just the headers and the initial few bytes, but it still fills up quite quickly.
3) The NetFlow approach is nice, but I also want to have traces available for the reasons mentioned in 1).
4) Are there any issues with using external storage as a solution for this problem?
Best, Ali
On Thu, Apr 12, 2012 at 5:06 PM, Michael J McCafferty <mike@m5computersecurity.com> wrote:
Ali, Do you need to capture the whole packet, including the payload? You will save a lot of space by just capturing the headers. For example, tcpdump doesn't capture the whole packet by default anyway. You may not be able to capture at line rate anyway, depending on what you are using to capture with (drivers, libraries, software, etc.). See the -s option in the tcpdump man page for info.
Good luck, Mike
On Thu, 2012-04-12 at 16:25 -0400, Maverick wrote:
Hello Everyone,
Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds?
Best, Ali
-- ************************************************************ Michael J. McCafferty CEO M5 Hosting http://www.m5hosting.com
Like us on Facebook for updates and photos: https://www.facebook.com/m5hosting ************************************************************
On Thu, 12 Apr 2012 14:18:30 -0700, "John T. Yocum" said:
In that case, just keep adding disks to your capture system, or use a NAS to do it.
On Thu, 12 Apr 2012 13:43:49 -0700, Joel jaeggli said:
1 TB is roughly 2.2 hours at 1 Gb/s
If he's got a gigabit of traffic, he's going to be adding another shelf of 12 1T drives to that NAS - every day. If he gets the high-density shelves with 60 drives, he's only adding one a week. He's going to have to work smarter, not harder.
On 4/12/2012 2:34 PM, Valdis.Kletnieks@vt.edu wrote:
On Thu, 12 Apr 2012 14:18:30 -0700, "John T. Yocum" said:
In that case, just keep adding disks to your capture system, or use a NAS to do it.
On Thu, 12 Apr 2012 13:43:49 -0700, Joel jaeggli said:
1 TB is roughly 2.2 hours at 1 Gb/s
If he's got a gigabit of traffic, he's going to be adding another shelf of 12 1T drives to that NAS - every day. If he gets the high-density shelves with 60 drives, he's only adding one a week.
He's going to have to work smarter, not harder.
He did indicate he's only storing the headers and a few bytes, not the full payload. --John
If this is just for post-analysis and you have another system (an IDS) to identify the timeframe, a tape-based system might be a better approach, especially if you want to retain it forever. Maybe "Library LTFS". ----- Original Message ----- From: "John T. Yocum" <john.yocum@fluidhosting.com> To: "Valdis Kletnieks" <Valdis.Kletnieks@vt.edu> Cc: nanog@nanog.org Sent: Thursday, April 12, 2012 5:37:38 PM Subject: Re: Network Storage On 4/12/2012 2:34 PM, Valdis.Kletnieks@vt.edu wrote:
On Thu, 12 Apr 2012 14:18:30 -0700, "John T. Yocum" said:
In that case, just keep adding disks to your capture system, or use a NAS to do it.
On Thu, 12 Apr 2012 13:43:49 -0700, Joel jaeggli said:
1 TB is roughly 2.2 hours at 1 Gb/s
If he's got a gigabit of traffic, he's going to be adding another shelf of 12 1T drives to that NAS - every day. If he gets the high-density shelves with 60 drives, he's only adding one a week.
He's going to have to work smarter, not harder.
He did indicate he's only storing the headers and a few bytes, not the full payload. --John
1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS.
Take a look at "Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic" https://www.usenix.org/conference/imc-05/building-time-machine-efficient-rec...
You can also look at a machine like this: http://www.supermicro.com/products/chassis/4U/417/SC417E16-R1400U.cfm Jared Mauch On Apr 12, 2012, at 5:47 PM, Matthew Luckie <mjl@luckie.org.nz> wrote:
1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS.
Take a look at "Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic"
https://www.usenix.org/conference/imc-05/building-time-machine-efficient-rec...
On Thu, Apr 12, 2012 at 3:19 PM, Jared Mauch <jared@puck.nether.net> wrote:
You can also look at a machine like this:
http://www.supermicro.com/products/chassis/4U/417/SC417E16-R1400U.cfm
Jared Mauch
On Apr 12, 2012, at 5:47 PM, Matthew Luckie <mjl@luckie.org.nz> wrote:
1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS.
Take a look at "Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic"
https://www.usenix.org/conference/imc-05/building-time-machine-efficient-rec...
Just FYI, it's somewhat of a tossup on large arrays between 3.5" and 2.5" models. Equivalent 3.5" units hold 36-48 HDDs, and drive sizes for enterprise SAS drives are 3 TB in 3.5" vs 1 TB in 2.5" now, so you get more per box with 3.5" drives. Also a lot cheaper in the end. About six months ago I purchased two similar boxes for nearline backup purposes (lower bandwidth) with 3.5" drives; 34 x 3 TB plus a couple of much faster 2.5" 15k boot drives; post-RAID-10-and-hotspare-and-filesystem usable space was about 42 TB. About $22k each. One can go somewhat cheaper than that, but the VAR had a good support story and "just fixed it" the next day when a RAID card model didn't quite work out. -- -george william herbert george.herbert@gmail.com
If you want something from a Tier 1 vendor, the new Dell R720XDs will take 24x 900GB SAS disks and have 16 cores. If you order it with a SAS6 HBA you can add up to 8 trays of 24 x 900GB SAS disks to provide 194TB of raw space at quite a reasonable cost. Alternatively, you could have a couple of "probe" servers connected to a nice fast SAN backend with redundant controllers. This will provide failover at the probe and storage levels but will cost a fair bit more :) Regards, Andrew On 16/04/2012 11:18 a.m., George Herbert wrote:
On Thu, Apr 12, 2012 at 3:19 PM, Jared Mauch<jared@puck.nether.net> wrote:
You can also look at a machine like this:
http://www.supermicro.com/products/chassis/4U/417/SC417E16-R1400U.cfm
Jared Mauch
On Apr 12, 2012, at 5:47 PM, Matthew Luckie<mjl@luckie.org.nz> wrote:
1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS. Take a look at "Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic"
https://www.usenix.org/conference/imc-05/building-time-machine-efficient-rec... Just FYI, it's somewhat of a tossup on large arrays between 3.5" and 2.5" models. Equivalent 3.5" units hold 36-48 HDDs, and drive sizes for enterprise SAS drives are 3 TB in 3.5" vs 1 TB in 2.5" now, so you get more per box with 3.5" drives. Also a lot cheaper in the end.
About six months ago I purchased two similar boxes for nearline backup purposes (lower bandwidth) with 3.5" drives; 34 x 3 TB plus a couple of much faster 2.5" 15k boot drives; post-RAID-10-and-hotspare-and-filesystem usable space was about 42 TB. About $22k each. One can go somewhat cheaper than that, but the VAR had a good support story and "just fixed it" the next day when a RAID card model didn't quite work out.
Andrew Thrift writes:
If you want something from a Tier 1 vendor, the new Dell R720XDs will take 24x 900GB SAS disks
or 12x 2TB 3.5" cheap & slow SATA disks, or 12x 3TB 3.5" more expensive & slightly faster SAS disks - if you take the (cheaper) 3.5"-disk variant of the R720xd chassis. Or 12x 3TB 3.5" cheap & slow SATA disks if you buy them directly rather than from Dell. (Presumably you'd have to buy Dell "hot-swap trays".) -- Simon.
and have 16 cores. If you order it with a SAS6-HBA you can add up to 8 trays of 24 x 900GB SAS disks to provide 194TB of raw space at quite a reasonable cost.
I'd like to point out that you can actually do 26 2.5" disks on an R720xd if you use the flex bay, plus an SD card for your OS install, if you're being a maximalist. =) -Drew -----Original Message----- From: Simon Leinen [mailto:simon.leinen@switch.ch] Sent: Monday, April 16, 2012 5:38 AM To: Andrew Thrift Cc: nanog@nanog.org Subject: Re: Network Storage Andrew Thrift writes:
If you want something from a Tier 1 vendor, the new Dell R720XDs will take 24x 900GB SAS disks
or 12x 2TB 3.5" cheap & slow SATA disks, or 12x 3TB 3.5" more expensive & slightly faster SAS disks - if you take the (cheaper) 3.5"-disk variant of the R720xd chassis. Or 12x 3TB 3.5" cheap & slow SATA disks if you buy them directly rather than from Dell. (Presumably you'd have to buy Dell "hot-swap trays".) -- Simon.
and have 16 cores. If you order it with a SAS6-HBA you can add up to 8 trays of 24 x 900GB SAS disks to provide 194TB of raw space at quite a reasonable cost.
more in-line... On Thu, 2012-04-12 at 17:16 -0400, Maverick wrote:
Thank you very much for your suggestions.
1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS.
The poor man's way to do this is to use the space you have and use the -C and -W options in tcpdump. You have as much history as you have disk space. Maybe make 500 MB files with a count of 1800, to use 900 GB of disk space. When you have an event, you copy the files that are relevant to the time period of the event off to a workstation. Another option is -G, for rotating the files by time instead of size.
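A minimal sketch of that ring buffer, matching the 500 MB x 1800 sizing above (the interface, snap length, and path are assumptions):

    # rolling ~900 GB of header-only capture: rotate at ~500 MB per file
    # (-C counts millions of bytes) and cap the ring at 1800 files (-W)
    tcpdump -i eth0 -s 96 -C 500 -W 1800 -w /captures/ring.pcap

tcpdump numbers the rotated files and overwrites the oldest once the -W limit is reached.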
2) I am storing just the headers and the initial few bytes, but it still fills up quite quickly.
You can use the -z option to gzip-compress the files to save space. However, I don't know how this will affect your disk I/O... whether it will be fast enough to keep up with writing the raw data while doing a concurrent gzip of the last file. If you have enough hardware performance but are limited on space, it's worth a shot.
3) The NetFlow approach is nice, but I also want to have traces available for the reasons mentioned in 1).
4) Are there any issues with using external storage as a solution for this problem?
There is also some advice in the man page for tcpdump regarding the -z option. You can have it run a shell script that takes the capture file as its only argument, to do whatever else you want done... in this case, copy the file off to another drive. It could be a network location too... of course, don't forget to not capture *that* traffic (feedback!).
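A sketch of that -z hook; the script path, mount point, and capture options are assumptions, and tcpdump invokes the command with the just-closed savefile as its only argument:

    #!/bin/sh
    # archive-capture.sh (hypothetical): compress the finished capture and
    # move it off to external/network storage
    gzip "$1" && mv "$1.gz" /mnt/nas/captures/

    # example invocation: rotate hourly into timestamped files and hand each
    # finished file to the script above
    tcpdump -i eth0 -s 96 -G 3600 -w '/captures/trace-%Y%m%d-%H%M%S.pcap' \
        -z /usr/local/bin/archive-capture.sh

If /mnt/nas is reached over the same network being captured, the capture filter should exclude that copy traffic, as Mike warns above.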
Best, Ali
On Thu, Apr 12, 2012 at 5:06 PM, Michael J McCafferty <mike@m5computersecurity.com> wrote:
Ali, Do you need to capture the whole packet, including the payload? You will save a lot of space by just capturing the headers. For example, tcpdump doesn't capture the whole packet by default anyway. You may not be able to capture at line rate anyway, depending on what you are using to capture with (drivers, libraries, software, etc.). See the -s option in the tcpdump man page for info.
Good luck, Mike
On Thu, 2012-04-12 at 16:25 -0400, Maverick wrote:
Hello Everyone,
Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds?
Best, Ali
-- ************************************************************ Michael J. McCafferty CEO M5 Hosting http://www.m5hosting.com
Like us on Facebook for updates and photos: https://www.facebook.com/m5hosting ************************************************************
-- ************************************************************ Michael J. McCafferty CEO M5 Hosting http://www.m5hosting.com Like us on Facebook for updates and photos: https://www.facebook.com/m5hosting ************************************************************
In a message written on Thu, Apr 12, 2012 at 05:16:27PM -0400, Maverick wrote:
1) My goal is to store the traffic, maybe forever, and analyze it in the future for security-related incidents detected by an IDS/IPS.
Let's just assume you have enough disk space that you can write out every packet, or even just every packet header. That's a hard problem, but you've received plenty of suggestions on how to go down that path. Once you have that data, how are you going to process it? Yes, disk reads are faster than disk writes, but not by that much. If it takes you 24 hours to write a day of data to disk, it might take you 12 hours just to read it all back off and process it. Processing a week's worth of back data could take days. I'm also not even starting to count the CPU and memory necessary to build state tables and statistical analysis tables to generate useful data. There's a reason why most network traffic tools summarize early, as early as on the network device when using NetFlow-type collection. It's not just to save storage space on disk; it's to make the processing of the data fast enough that it can be done in a short enough time that the data is still relevant when the processing is complete. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
Hi,
You'll need to build an array that'll random read/write upwards of 200 MB/s if you want to get a semi-reliable capture to disk. That means SSD if you're very rich, or many spindles (preferably 15k's) in a stripe/RAID 10 if you're building from your scrap pile. Bear in mind that write cache won't help you, as the I/O isn't going to be bursty, but rather a continuous stream.
Another great help is scoping what you're looking for and pre-processing so that you write out only the 'interesting' bits, thus reducing the I/O requirement. It does depend on what you're trying to do, as headers can be adequate for many applications.
Aligning your partitions with the physical disk geometry can produce surprising speedups, as can stripe block size changes, but that's generally empirical and depends on your workload.
-- ian
-----Original Message----- From: Maverick Sent: 12/04/2012, 21:27 To: nanog@nanog.org Subject: Network Storage
Hello Everyone, Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds? Best, Ali
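One way to do the pre-processing Ian describes is to filter at capture time so only the traffic of interest ever reaches the disk; the interface and filter expression below are purely illustrative:

    # write only DNS and web traffic, headers only; eth0 and the port list
    # are example choices
    tcpdump -i eth0 -s 96 -w /captures/filtered.pcap 'port 53 or port 80 or port 443'

The same idea works with any BPF expression, so the filter can be tuned to whatever the IDS/IPS alerts are most likely to reference.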
On Thu, Apr 12, 2012 at 4:18 PM, Ian McDonald <iam@st-andrews.ac.uk> wrote:
You'll need to build an array that'll random read/write upwards of 200MB/s if you want to get a semi-reliable capture to disk. That means SSD if you're very rich, or many spindles
Hey, Saving packet captures to file is a ~98% asynchronous write, 2% read; ~95% sequential activity. And maybe think about applying some variant of header compression to the packets during capture, to trade a little CPU and increased RAM requirements for storage efficiency. The format used by PCAP, saving raw packet header bits directly to disk, is not necessarily among the most I/O- or space-efficient on-disk storage formats to pick.
Random writes should only occur if you are saving your captures to a fragmented file system, which is not recommended; avoiding fragmentation is important. Random reads aren't involved for archiving data, only for analyzing it. Do you make random reads into your saved capture files? Possibly you're more likely to be doing a sequential scan, even during analysis; random reads imply you have already indexed a dataset and you are seeking a smaller number of specific records, to collect information about them. Read requirements are totally dependent on your analysis workload, e.g. table scan vs. index search. Depending on what the analysis is, it may make sense to even make extra filtered copies of the data, using more disk space, in order to avoid a random access pattern. If you are building a database of analysis results from raw data, you can use a separate random-I/O-optimized disk subsystem for the stats database.
If you really need approximately 200 MB/s with some random read performance for analysis, you should probably be looking at building a RAID 50 with several 4-drive sets and 1 GB+ of writeback cache (a rough mdadm sketch follows this message). RAID 10 makes more sense in situations where write requirements are not sequential, when external storage is actually shared with multiple applications, or when there is a requirement for a disk drive failure to be truly transparent, but there is a huge capacity sacrifice in choosing mirroring over parity.
There is a Time vs. Cost tradeoff with regard to the analysis of the data. When your 'analysis tools' start reading data, the reads increase the disk access time and therefore reduce write performance, so the reads should be throttled. The higher the capacity of the disk subsystem, the higher the cost. Performing your analysis ahead of time via pre-caching, or at least indexing newly captured data in small chunks on a continuous basis, may be useful, to minimize the amount of searching of the raw dataset later. A small SSD or separate mirrored drive pair for that function would avoid adding load to the "raw capture storage" disk system, if your analysis requirements are amenable to that pattern. Modern OSes cache some recent filesystem data in RAM. So if the server capturing data has sufficient SDRAM, analyzing data while it's still hot in the page cache, and saving that analysis in an efficient index for later use, can be useful.
(preferably 15k's) in a stripe/ raid10 if you're building from your scrap pile. Bear in mind that write >cache won't help you, as the io isn't going to be bursty, rather a continuous stream.
Not really... A good read cache is more important for the analysis, but an efficient write cache on your array and the OS page cache are still highly beneficial, especially because they can ensure that your RAID subsystem is performing full-stripe writes, for maximal efficiency of sequential write activity, and they can delay the media write until the optimal moment based on platter position and sequence the read/write requests; as long as the storage system behind the cache can, on average, drain the cache faster than you can fill it with data a sufficient amount of the time, the write cache serves an important function. Your I/O may be a continuous stream, but there are most certainly variations and spikes in the rate of packets and in the performance of mechanical disk drives.
Aligning your partitions with the physical disk geometry can produce surprising speedups, as can >stripe block size changes, but that's generally empirical, and depends on your workload.
For RAID systems, partitions should absolutely be aligned if the OS install defaults don't align them correctly; on a modern OS, the defaults are normally OK. Having an unaligned or improperly aligned partition is just a misconfiguration; a track crossing for every other sector read is an easy way of doubling the size of small I/Os. You won't notice with this particular use case: when you are writing large blocks, say a 100Mb chunk, asynchronously, you won't notice a 63kB difference; it's a tiny fraction of your transfer size. This is primarily a concern during analysis or database searching, which may involve small random reads and small synchronous random writes. In other words, you will probably get away with just ignoring partition alignment and filesystem block size, so there are other aspects of the configuration to be more concerned about (YMMV). -- -JH
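For readers without a hardware controller, a rough software-RAID sketch of the "several 4-drive RAID5 sets, striped together" layout described above could look like the following with mdadm; the device names are assumptions, and the message above has a controller with writeback cache in mind rather than software RAID:

    # two 4-drive RAID5 sets...
    mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
    # ...striped together into a RAID 50
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

More 4-drive sets can be included in the outer stripe to scale capacity and sequential throughput.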
Storage capable of keeping up with 10G/20G packet capture doesn't have to be extremely expensive... We build this with a commodity host, multiple 10G NICs, and multiple SAS HBAs, each attached to a JBOD enclosure of at least 36 4TB 7.2k commodity SATA3 disks. In our configuration, this delivers 58 TB per JBOD enclosure. Properly tuned, and with a little commodity SSD cache, it delivers synchronous sequential reads and writes over 2.5 GB/sec (and incredible random speeds which I can't recall off the top of my head), all for under $25k. It could yield less or much more, depending on your redundancy/striping choices. Run out of room? Fill another JBOD shelf for ~$18k. You could opt for lower parity than we did, or fewer stripes. Either one would stretch the space out by quite a bit (at least 20 TB). I didn't want to be constantly changing drives out, however. (A hypothetical pool layout in this spirit is sketched after the quoted message below.) On Apr 13, 2012 1:46 AM, "Jimmy Hess" <mysidia@gmail.com> wrote:
On Thu, Apr 12, 2012 at 4:18 PM, Ian McDonald <iam@st-andrews.ac.uk> wrote:
You'll need to build an array that'll random read/write upwards of 200MB/s if you want to get a semi-reliable capture to disk. That means SSD if you're very rich, or many spindles
Hey, Saving packet captures to file is a ~98% asynchronous write, 2% read; ~95% sequential activity. And maybe think about applying some variant of header compression to the packets during capture, to trade a little CPU and increased RAM requirements for storage efficiency.
The format used by PCAP, saving raw packet header bits directly to disk, is not necessarily among the most I/O- or space-efficient on-disk storage formats to pick.
Random writes should only occur if you are saving your captures to a fragmented file system, which is not recommended; avoiding fragmentation is important. Random reads aren't involved for archiving data, only for analyzing it.
Do you make random reads into your saved capture files? Possibly you're more likely to be doing a sequential scan, even during analysis; random reads imply you have already indexed a dataset and you are seeking a smaller number of specific records, to collect information about them.
Read requirements are totally dependent on your analysis workload, e.g. Table scan vs Index search. Depending on what the analysis is, it may make sense to even make extra filtered copies of the data, using more disk space, in order to avoid a random access pattern.
If you are building a database of analysis results from raw data, you can use a separate random-I/O-optimized disk subsystem for the stats database.
If you really need approximately 200 MB/s with some random read performance for analysis, you should probably be looking at building a RAID50 with several 4-drive sets and 1gb+ of writeback cache.
RAID10 makes more sense in situations where write requirements are not sequential, when external storage is actually shared with multiple applications, or when there is a requirement for a disk drive failure to be truly transparent, but there is a huge capacity sacrifice in choosing mirroring over parity.
There is a Time vs Cost tradeoff with regards to the analysis of the data.
When your 'analysis tools' start reading data, the reads increase the disk access time and therefore reduce write performance, so the reads should be throttled. The higher the capacity of the disk subsystem, the higher the cost.
Performing your analysis ahead of time via pre-caching, or at least indexing newly captured data in small chunks on a continuous basis may be useful, to minimize the amount of searching of the raw dataset later. A small SSD or separate mirrored drive pair for that function, would avoid adding load to the "raw capture storage" disk system, if your analysis requirements are amenable to that pattern.
Modern OSes cache some recent filesystem data in RAM. So if the server capturing data has sufficient SDRAM, analyzing data while it's still hot in the page cache, and saving that analysis in an efficient index for later use, can be useful.
(preferably 15k's) in a stripe/ raid10 if you're building from your scrap pile. Bear in mind that write >cache won't help you, as the io isn't going to be bursty, rather a continuous stream.
Not really... A good read cache is more important for the analysis, but an efficient write cache on your array and the OS page cache are still highly beneficial, especially because they can ensure that your RAID subsystem is performing full-stripe writes, for maximal efficiency of sequential write activity, and they can delay the media write until the optimal moment based on platter position and sequence the read/write requests;
as long as the performance of the storage system behind the cache is such that the storage system can on average successfully drain the cache at a faster rate than you can fill it with data a sufficient amount of the time, the write cache serves an important function.
Your I/O may be a continuous stream, but there are most certainly variations and spikes in the rate of packets and the performance of mechanical disk drives.
Aligning your partitions with the physical disk geometry can produce surprising speedups, as can >stripe block size changes, but that's generally empirical, and depends on your workload.
For RAID systems, partitions should absolutely be aligned if the OS install defaults don't align them correctly; on a modern OS, the defaults are normally OK. Having an unaligned or improperly aligned partition is just a misconfiguration; a track crossing for every other sector read is an easy way of doubling the size of small I/Os.
You won't notice with this particular use case: when you are writing large blocks, say a 100Mb chunk, asynchronously, you won't notice a 63kB difference; it's a tiny fraction of your transfer size. This is primarily a concern during analysis or database searching, which may involve small random reads and small synchronous random writes.
In other words, you will probably get away with just ignoring partition alignment and filesystem block size, so there are other aspects of the configuration to be more concerned about (YMMV).
-- -JH
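Returning to the JBOD build Kyle describes above: one hypothetical way to lay it out, assuming ZFS (the thread does not name a filesystem; the pool name, raidz2 stripe width, and device names are all illustrative):

    # several 6-disk raidz2 stripes plus a commodity SSD as a read cache;
    # add more raidz2 vdevs to fill out a 36-disk shelf
    zpool create capture \
        raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg \
        raidz2 /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm \
        cache /dev/sdy

Wider stripes or single-parity vdevs trade redundancy for more usable space, which is the "lower parity or fewer stripes" knob mentioned in the message above.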
On Thu, 12 Apr 2012, Maverick wrote:
Hello Everyone,
Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds?
I have done this two ways in the past. The first is the simple way: an LSI RAID card with lots of disks and some nice 10 gig capture cards. The second way is to use Gluster over a large number of hosts, with InfiniBand connecting them together.
<> Nathan Stratton nathan at robotics.net http://www.robotics.net
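A minimal sketch of the Gluster approach Nathan mentions, assuming two hosts each contributing one brick and an InfiniBand (RDMA) transport; hostnames and brick paths are illustrative:

    # create and start a distributed volume over RDMA across two bricks
    gluster volume create capture transport rdma host1:/bricks/capture host2:/bricks/capture
    gluster volume start capture

More hosts and bricks can be added later to scale out both capacity and aggregate write bandwidth.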
On 13/04/12 06:25, Maverick wrote:
Can you please comment on what the best solution is for storing network traffic? We have been graciously granted access by our network administrator to capture traffic, but the one terabyte of disk space is no match for the data that we are seeing, so it fills up quickly. We can't get additional space on the server itself, so I am looking for some external solutions. Can you please suggest something that would be best for Gbps speeds?
In terms of tools, something shiny that I've not had a chance to play with yet that is designed for this is Security Onion, an Ubuntu-based Linux distribution that groups a bunch of tools for doing this sort of thing. http://securityonion.blogspot.com/
participants (18)
- Andrew Thrift
- Dan Olson
- Drew Weaver
- George Herbert
- Ian McDonald
- Jared Mauch
- Jimmy Hess
- Joel jaeggli
- John T. Yocum
- Julien Goodwin
- Kyle Creyts
- Leo Bicknell
- Matthew Luckie
- Maverick
- Michael J McCafferty
- Nathan Stratton
- Simon Leinen
- Valdis.Kletnieks@vt.edu