Got a call at 4am - RAID Gurus Please Read
Server down..... Got to the colo at 4:39 and an old IBM X346 node with a ServeRAID-7k has failed. Opened it up to find a swollen cache battery that has bent the card along three different axes. Separated the battery. (i) Inspect card and plug back in, (ii) reboot, and got (code 2807) Not functioning.... Returned to (i) three times with the same result. Dusted her off, let it sit plugged in for a while, rebooted to see if I could get her into write-through mode, and the disks started spinning. Hooray.

Plan of action (and the reason for my post):

* Can I change from an active (i.e., disks with data) RAID 5 to RAID 10? There are 4 drives in the unit, and I have two on the shelf that I can plug in.
* If so, will I have less of a performance impact with RAID 10 + write-through than RAID 5 + write-through?
* When the new RAID card comes in, can I just plug it in without losing my data?

I would: i) RAID 10, ii) write-through, iii) replace card.

The new card is probably coming with a bad battery that would put us kind of back at square one. New batteries are 200+ if I can find them. Best case scenario is to move it over to RAID 10 + write-through and feel less of the performance pinch.

Given I can move from RAID 5 to RAID 10 without losing data, how long should I anticipate downtime for this process? Is there heavy sector re-arranging happening here? And the same for write-through: is it done quickly?

I'm going to go lay down just for a little while.

Thanks in advance,

Nick from Toronto.
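A quick capacity sanity check on that first question (back-of-envelope only, assuming all six drives are the same size D and the array is carved into a single volume):

4-drive RAID 5:  usable space = 3 x D  (what the box has today)
4-drive RAID 10: usable space = 2 x D  (only fits if less than 2 x D is actually in use)
6-drive RAID 10: usable space = 3 x D  (the 4 in the unit plus the 2 shelf spares)

So the two spare drives are not optional for the move; without them a RAID 10 made of the same disks is a third smaller than the current array.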
If the serveraid7k cards are LSI and not Adaptec based (I think they are) you should just be able to plug in a new adapter and import the foreign configuration. You do have a good backup, yes? Switching to write-through has already happened (unless you specified WriteBackModeEvenWithNoBBU - not the default) - these (LSI) cards by default only WB when "safe". If WT, RAID10 much better perf. BUT you just can't migrate from R5 to R10 non-destructively. - Michael from Kitchener
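If the replacement does turn out to be an LSI MegaRAID-family card, the foreign-configuration import Michael describes is usually just a couple of commands once the OS is up (a sketch only - adapter index 0 is assumed, and none of this applies if the ServeRAID-7k is actually Adaptec/IPS based):

MegaCli -CfgForeign -Scan -a0     # does adapter 0 see a foreign (carried-over) config on the disks?
MegaCli -CfgForeign -Import -a0   # import it so the existing arrays come back online

Many of these cards expose the same import from the boot-time WebBIOS screen as well, which avoids needing a running OS at all.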
+1 on the most important statement below, from my point of view: RAID 5 and RAID 10 are totally separate animals, and while you can set up a separate RAID 10 array and migrate your data to it (as soon as possible!!!) you cannot migrate from 5 to 10 in place absent some utter magic that I am unaware of.

10 requires more raw drive space but offers significant write performance advantages when correctly configured (which isn't really too difficult). 5 is fine for protection against losing one drive, but 5 requires much more internal processing of writeable data before it begins the writes and, not too long ago, was considered completely inappropriate for applications with high numbers of writes, such as a transactional database. Still, 5 is often used for database systems in casual installations just because it's easy, cheap (relatively) and modern fast boxes are fast enough.

Ok, getting down off my RAID soapbox - good luck.

..Allen
On Dec 9, 2014, at 17:22, Michael Brown <michael@supermathie.net> wrote:
If the serveraid7k cards are LSI and not Adaptec based (I think they are) you should just be able to plug in a new adapter and import the foreign configuration.
You do have a good backup, yes?
Switching to write-through has already happened (unless you specified WriteBackModeEvenWithNoBBU - not the default) - these (LSI) cards by default only WB when "safe".
If WT, RAID10 much better perf. BUT you just can't migrate from R5 to R10 non-destructively.
- Michael from Kitchener
symack wrote on 9-12-2014 22:03:
* Can I change from an active (ie, disks with data) raid 5 to raid 10. There are 4 drives
Dump and restore. I've used Acronis successfully in the past, and today they have a bootable ISO. Also, if you have the option, they have Universal Restore so you can restore Windows on another piece of hardware (you provide the drivers).
in the unit, and I have two on the shelf that I can plug in. * If so, will I have less of a performance impact with RAID 10 + write-thru than RAID 5 + write-through
Raid10 is the only valid raid format these days. With disks as big as they get these days, silent corruption is a real possibility, and with 4TB+ disks that is a real thing. Raid 6 is ok, if you accept rebuilds that take a week, literally - although the rebuild rate on our 11-disk raid 6 SSD array (2TB) is less than a day. If it accepts SATA drives, consider just using SSDs instead. They're just 600 euros for an 800GB drive (Intel S3500).
Given I can move from RAID 5 to RAID 10 without losing data. How long to anticipate downtime for this process? Is there heavy sector re-arranging happening here? And the same for write-thru, is it done quick?
Heavy sector re-arranging, yes, so just dump and restore - it's faster and more reliable. Also, you then have a working bare-metal restore backup. Regards, Seth
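Seth's bootable-image approach is one way to do the dump and restore; a file-level sketch with stock tools looks roughly like this (hostnames and paths are made up for illustration, and it assumes the box can go offline and there is somewhere to park the data):

# 1. copy everything off to a spare machine while the old RAID 5 array is still readable
rsync -aHAX --numeric-ids /data/ backupbox:/backups/x346-data/
# 2. rebuild the array as RAID 10 in the controller, recreate the filesystem, then copy back
rsync -aHAX --numeric-ids backupbox:/backups/x346-data/ /data/

For the OS volume itself, a bare-metal image (as Seth suggests) is the safer route than a plain file copy.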
The subject is drifting a bit but I'm going with the flow here: Seth Mos <seth.mos@dds.nl> writes:
Raid10 is the only valid raid format these days. With the disks as big as they get these days it's possible for silent corruption.
How do you detect it? A man with two watches is never sure what time it is. Unless you have a filesystem that detects and corrects silent corruption, you're still hosed, you just don't know it yet. RAID10 between the disks in and of itself doesn't help.
And with 4TB+ disks that is a real thing. Raid 6 is ok, if you accept rebuilds that take a week, literally. Although the rebuild rate on our 11 disk raid 6 SSD array (2TB) is less than a day.
I did a rebuild on a RAIDZ2 vdev recently (made out of 4TB WD Reds). It took nowhere near a day, let alone a week. Theoretically it takes 8-11 hours if the vdev is completely full, proportionately less if it's not, and I was at about 2/3 in use. -r
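On the "how do you detect it" point: the usual ZFS answer is a periodic scrub, which re-reads every allocated block, verifies its checksum, and repairs from redundancy where it can. A minimal sketch (the pool name tank is a placeholder):

zpool scrub tank      # walk all allocated blocks and verify checksums, repairing from parity/mirror copies
zpool status -v tank  # scrub progress, per-device checksum error counts, and any unrecoverable files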
I'm just going to chime in here since I recently had to deal with bit-rot affecting a 6TB Linux RAID 5 setup using mdadm (6x 1TB disks). We couldn't rebuild because of 5 URE sectors on one of the other disks in the array after a power/UPS issue rebooted our storage box. We are now using ZFS RAIDZ, and the question I ask myself is: why wasn't I using ZFS years ago?

+1 for ZFS and RAIDZ

On Wed, Dec 10, 2014 at 8:40 AM, Rob Seastrom <rs@seastrom.com> wrote: [snip]
I hope you are NOT using RAIDZ. The chances of an error showing up during a resilver are uncomfortably high, and there are no automatic tools to fix pool corruption with ZFS. Ideally use RAIDZ2 or RAIDZ3 to provide more appropriate levels of protection. Errors introduced into a pool can cause substantial unrecoverable damage to the pool, so you really want the bitrot detection and correction mechanisms to be working "as designed." ... JG -- Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net "We call it the 'one bite at the apple' rule. Give me one chance [and] then I won't contact you again." - Direct Marketing Ass'n position on e-mail spam (CNN) With 24 million small businesses in the US alone, that's way too many apples.
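For reference, the difference between single and double/triple parity is just a keyword at pool creation; a sketch with placeholder FreeBSD-style device names:

zpool create tank raidz2 da0 da1 da2 da3 da4 da5              # 6 disks, any two may fail
zpool create bigtank raidz3 da0 da1 da2 da3 da4 da5 da6 da7   # wider vdev, any three may fail

An existing raidz1 vdev can't be converted in place, though - like the RAID 5 to RAID 10 case earlier in the thread, it's a backup, rebuild, and restore job.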
On Thu, Dec 11, 2014 at 2:25 AM, Randy Bush <randy@psg.com> wrote:
We are now using ZFS RAIDZ and the question I ask myself is, why wasn't I using ZFS years ago?
because it is not production on linux,
Well, it depends on what you mean by "production". Certainly the ZFS on Linux group has said in some forums that it is "production ready", although I would say that their definition is not exactly the same as what I mean by the term.
which i have to use because freebsd does not have kvm/ganeti.
There is bhyve, and virt-manager can support bhyve in later versions (but is disabled by default as I recall). Not exactly the same, of course.
want zfs very very badly. snif.
Anyone who really cares about their data wants ZFS. Some just do not yet know that they (should) want it. There is always Illumos/OnmiOS/SmartOS to consider (depending on your particular requirements) which can do ZFS and KVM.
zfs and ganeti -- Phones are not computers and suck for email On December 11, 2014 2:39:19 PM GMT+09:00, Gary Buhrmaster <gary.buhrmaster@gmail.com> wrote: [snip]
Gary Buhrmaster <gary.buhrmaster@gmail.com> writes:
There is always Illumos/OnmiOS/SmartOS to consider (depending on your particular requirements) which can do ZFS and KVM.
2.5-year SmartOS user here. Generally speaking pretty good though I have my list of gripes like everything else I touch. -r
From: Randy Bush <randy@psg.com>
We are now using ZFS RAIDZ and the question I ask myself is, why wasn't I using ZFS years ago?
because it is not production on linux, which i have to use because freebsd does not have kvm/ganeti. want zfs very very badly. snif.
I keep reading zfs vs btrfs articles and... inconclusive. My problem with both is that I need quotas, both file space and "inode" count, and both are weaker than ext4 on that; zfs is very weak here - you can only sort of simulate them. -- -Barry Shein The World | bzs@TheWorld.com | http://www.TheWorld.com Purveyors to the Trade | Voice: 800-THE-WRLD | Dial-Up: US, PR, Canada Software Tool & Die | Public Access Internet | SINCE 1989 *oo*
Barry Shein <bzs@world.std.com> writes: [snip]
By file, you mean "disk space used"? By whom and where? Quotas and reservations on a per-dataset basis are pretty darned well supported in ZFS. As for inodes, well, since there isn't really such a thing as an inode in ZFS... what exactly are you trying to do here? -r
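For the "disk space used" half, the per-dataset knobs Rob mentions look like this (pool and dataset names are made up; note that quota counts snapshots and descendants, while refquota does not):

zfs create tank/home/alice
zfs set quota=50G tank/home/alice        # hard cap for the dataset, including snapshots/descendants
zfs set reservation=10G tank/home/alice  # guarantee the dataset at least 10G of the pool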
Disk space by uid (by group is a plus but not critical), like BSD and EXTn. And the reason I put "inode" in quotes was to indicate that they may not (certainly not) be called inodes, but rather an upper limit on the total number of files and directories, typically to stop a runaway script or certain malicious or grossly irresponsible behavior.
From my reading, the closest you can get to disk space quotas in ZFS is limiting on a per-directory (dataset, mount) basis, which is similar but different.
On December 11, 2014 at 16:57 rs@seastrom.com (Rob Seastrom) wrote: [snip]
-- -Barry Shein The World | bzs@TheWorld.com | http://www.TheWorld.com Purveyors to the Trade | Voice: 800-THE-WRLD | Dial-Up: US, PR, Canada Software Tool & Die | Public Access Internet | SINCE 1989 *oo*
On Thu, Dec 11, 2014 at 9:05 PM, Barry Shein <bzs@world.std.com> wrote: [snip]
From my reading the closest you can get to disk space quotas in ZFS is by limiting on a per directory (dataset, mount) basis which is similar but different.
This is the normal type of quota within ZFS. It is applied to a dataset and limits the size of the dataset, such as home/username. You can have as many datasets ("filesystems") as you like (within practical limits), which is probably the way to go in regards to home directories.

But another option is:

zfs set groupquota@groupname=100GB example1/blah
zfs set userquota@user1=200MB example1/blah

This would be available on the Solaris implementation. I am not 100% certain that this is available under the BSD implementations, even if QUOTA is enabled in your kernel config. In the past.... the BSD implementation of ZFS never seemed to be as stable, functional, or performant as the OpenSolaris/Illumos version.

-- -JH
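The matching read-side commands, using the same dataset name as Jimmy's example (whether a given FreeBSD or ZFS-on-Linux release of the day supported them was exactly the open question):

zfs userspace example1/blah             # per-user space consumed vs. userquota
zfs groupspace example1/blah            # the same, per group
zfs get userquota@user1 example1/blah   # read back a single quota setting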
On Thu, 11 Dec 2014, Jimmy Hess wrote:
I am not 100% certain that this is available under the BSD implementations, even if QUOTA is enabled in your kernel config.
In the past.... the BSD implementation of ZFS never seemed to be as stable, functional, or performant as the OpenSolaris/Illumos version.
That's a scary low bar for comparison. OpenSolaris (or even Solaris 11), ZFS, Stable. Pick one. Maybe two. Three? Yeah right. Anyone who's used it hard, under heavy load, should understand. ---------------------------------------------------------------------- Jon Lewis, MCP :) | I route | therefore you are _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
Jon Lewis <jlewis@lewis.org> writes:
OpenSolaris (or even Solaris 11), ZFS, Stable. Pick one. Maybe two. Three? Yeah right. Anyone who's used it hard, under heavy load, should understand.
The most recent release of OpenSolaris was over 5 years ago. You're working from (extremely) dated information. The current FOSS Solaris ecosystem forked when Oracle brought stuff back in-house. Significant development has happened over the intervening half-decade. Anyone who's using Nexentastor (or hosted in Joyent Cloud) is getting all three (supra). -r
That might be close enough. I need to set up a test system and play around with zfs and btrfs. Thanks. On December 11, 2014 at 21:29 mysidia@gmail.com (Jimmy Hess) wrote: [snip]
-- -Barry Shein The World | bzs@TheWorld.com | http://www.TheWorld.com Purveyors to the Trade | Voice: 800-THE-WRLD | Dial-Up: US, PR, Canada Software Tool & Die | Public Access Internet | SINCE 1989 *oo*
Are you running ZFS and RAIDZ on Linux or BSD? On 10 Dec 2014 23:21, "Javier J" <javier@advancedmachines.us> wrote: [snip]
ZFS on BSD or a Solaris-like OS
On Dec 11, 2014, at 10:06 AM, Bacon Zombie <baconzombie@gmail.com> wrote:
Are you running ZFS and RAIDZ on Linux or BSD?
+1 on both. Mostly SmartOS, some FreeNAS (which is FreeBSD underneath). -r Ryan Brooks <ryan@hack.net> writes:
ZFS on BSD or a Solaris-like OS
Hey guys, I am running it on FreeBSD (NAS4Free). It's my understanding that when a resilver happens in a zpool, only the blocks that have actually been written get read, not the whole array the way a traditional RAID 5 rebuild does, reading even empty blocks.

I know I should be using RAIDZ2 for an array this size, but I have daily backups off of this array, and also this is a lab, not a production environment. In a production environment I would use raidz2 or raidz3. The bottom line is that even plain raidz1 is way better than any RAID 5 hardware/software solution I have come across. A single disk with ZFS can apparently survive 1/8 of the disk being destroyed. ZFS itself has many protections against data corruption. Also, I have scheduled a zpool scrub to run twice a week to catch bit rot early (a cron sketch of that schedule is below the quoted question).

Anyway, I have been using Linux RAID since it has been available, and I ask myself: why haven't I used ZFS seriously before now?

- J

On Thu, Dec 11, 2014 at 11:06 AM, Bacon Zombie <baconzombie@gmail.com> wrote:
Are you running ZFS and RAIDZ on Linux or BSD?
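A twice-a-week scrub like Javier describes can be driven by a plain cron entry (pool name and times are placeholders; the NAS appliance web UIs generally expose the same scheduling):

# root crontab on the storage box: scrub at 03:00 every Sunday and Wednesday
0 3 * * 0,3 /sbin/zpool scrub tank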
As for conversion between RAID levels: usually dump and restore is your best bet. Even if your controller/HBA supports a RAID level migration, for a small array hosted in a server, dump and restore is your least risky path to successful execution. You really need to dump anyway: even on a controller that supports clever RAID level migrations (the ServeRAID does not fall into this category), there is the possibility that the operation fails, leading to data loss, so back up first.

On Wed, Dec 10, 2014 at 2:49 AM, Seth Mos <seth.mos@dds.nl> wrote:
symack wrote on 9-12-2014 22:03: [snip] Raid10 is the only valid raid format these days. With the disks as big as they get these days it's possible for silent corruption.
No! Mistake. It depends. RAID6, RAID60, RAID-DP, RAIDZ3, and a few others are perfectly valid RAID formats, with sufficient sparing. You get fewer extra average random write IOPS per spindle, but better survivability, particularly in the event of a simultaneous double failure, or even a simultaneous triple or quadruple failure (with appropriate RAID group sizing), which are not necessarily as rare as one might intuitively expect.

And silent corruption can be addressed partially via surface scanning and built-in ECC on the hard drives. Then also, for non-SATA SAS/FC drives, the decent array subsystems low-level format the disks with a larger sector size at initialization time and slip additional error correction data into each chunk's metadata, so silent corruption or bit-flipping isn't necessarily so silent on a decent piece of storage equipment.

If you need a configuration of fewer than 12 disk drives, where you require good performance for many small random reads and writes, and only cheap controllers are an option, then yeah, you probably need RAID10, but not always. In case you have a storage chassis with 16 disk drives, an integrated RAID controller, a solid 1 to 2 GB NVRAM cache and a few gigabytes of read cache, then RAID6 or RAID60, or (maybe) even RAID50 could be a solid option for a wide number of use cases.

You really just need to calculate an upper bound on the right number of spindles spread over the right number of host ports for the workload, adjusted for which RAID level you pick, with sufficient cache (taking into account the caching policy and including a sufficiently large safety factor to cover the inherent uncertainties in spindle performance and the level of variability in your specific overall workload). -- -JH
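To put rough numbers on the "fewer random write IOPS per spindle" trade-off, a back-of-envelope using the usual small-random-write penalties (2 back-end I/Os per write for RAID 10, 4 for RAID 5, 6 for RAID 6) and an assumed ~150 random IOPS per 10k spindle - this is the uncached worst case only; a controller with NVRAM write-back cache will do considerably better:

16 spindles x 150 IOPS = 2400 back-end random IOPS raw
RAID 10: 2400 / 2 = ~1200 small random writes/s  (usable capacity: 8 of 16 drives)
RAID 5:  2400 / 4 = ~600 small random writes/s   (usable capacity: 15 of 16 drives)
RAID 6:  2400 / 6 = ~400 small random writes/s   (usable capacity: 14 of 16 drives)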
On 2014-12-09, symack <symack@gmail.com> wrote:
Server down..... Got to colo at 4:39 and an old IBM X346 node with Serveraid-7k has failed. Opened it up to find a swollen cache battery that has bent the card along three different axes.
* Can I change from an active (ie, disks with data) raid 5 to raid 10.
Even if the hw/firmware supports it, raid level migration is risky enough at the best of times, and totally insane on a known-bad controller.
participants (15)
- Allen McKinley Kitchen (gmail)
- Bacon Zombie
- Barry Shein
- Gary Buhrmaster
- Javier J
- Jimmy Hess
- Joe Greco
- Jon Lewis
- Michael Brown
- Randy Bush
- Rob Seastrom
- Ryan Brooks
- Seth Mos
- Stuart Henderson
- symack