Date: Fri, 20 Jun 2014 08:41:14 -0700
From: Freddie Cash <fjwcash@gmail.com>
To: Graham Allan <allan@physics.umn.edu>
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: Large ZFS arrays?
Message-ID: <CAOjFWZ7XJKLJdrX2R3hsDe6vkem+jisTOLrysWR_zHx06Px2+Q@mail.gmail.com>
In-Reply-To: <53A44A23.6050604@physics.umn.edu>
References: <1402846139.4722.352.camel@btw.pki2.com> <53A44A23.6050604@physics.umn.edu>
On Fri, Jun 20, 2014 at 7:50 AM, Graham Allan <allan@physics.umn.edu> wrote:

> <snippage>
>
> Would be interesting to hear a little about experiences with the drives
> used... For our first "experimental" chassis we used 3TB Seagate desktop
> drives - cheap but not the best choice, 18 months later they are dropping
> like flies (luckily we can risk some cheapness here as most of our data can
> be re-transferred from other sites if needed). Another chassis has 2TB WD
> RE4 enterprise drives (no problems), and four others have 3TB and 4TB WD
> "Red" NAS drives... which are another "slightly risky" selection but so far
> have been very solid (also in some casual discussion with a WD field
> engineer he seemed to feel these would be fine for both ZFS and hadoop use).

We've had good experiences with WD Black drives (500 GB, 1 TB, and 2 TB). These tend to last the longest, and provide the nicest failure modes. It's also very easy to understand the WD model numbers.

We've also used Seagate 7200.11 and 7200.12 drives (1 TB and 2 TB). These perform well, but fail in weird ways, and they tend to fail sooner than the WDs. Thankfully, the RMA process with Seagate is fairly simple and the turn-around time is fairly quick. Unfortunately, trying to figure out exactly which model of Seagate drive to order is becoming more of a royal pain as time goes on. They keep changing their marketing model names and the actual model numbers; there are now something like 8 separate product lines to pick from and 6+ different models in each line, times 2 for 4K vs 512-byte sectors.

We started out (3? 4? years ago) using WD Blue drives because they were inexpensive (almost half the price of WD Black) and figured all the ZFS goodness would work well on them. We quickly found out that desktop drives really aren't suited to server work, especially when being written to for 12+ hours a day.
:)

We were going to try some Toshiba drives in our next setup, but we received such an exceptionally good price on WD Black drives in our last tender ($80 CDN for 1 TB) that we decided to stick with those for now. :D After all, they work well, so why rock the boat?

We haven't used any drives larger than 2 TB as of yet.

> Tracking drives for failures and replacements was a big issue for us. One
> of my co-workers wrote a nice perl script which periodically harvests all
> the data from the chassis (via sg3utils) and stores the mappings of chassis
> slots, da devices, drive labels, etc. into a database. It also understands
> the layout of the 847 chassis and labels the drives for us according to
> some rules we made up - we do some prefix for the pool name, then "f" or
> "b" for front/back of chassis, then the slot number, and finally (?) has
> some controls to turn the chassis drive identify lights on or off. There
> might be other ways to do all this but we didn't find any, so it's been
> incredibly useful for us.

We partition each drive into a single GPT partition (starting at 1 MB, covering the whole disk), and label that partition with the chassis/slot that it's in. Then we use the GPT label to build the pool (/dev/gpt/diskname). That way, all the metadata in the pool, and any error messages from ZFS, tell us exactly which disk, in which chassis, in which slot, is having issues. No external database required. :)

Currently we're using smartmontools and the periodic scripts to alert us of pending drive failures, plus a custom cron job that checks the health of the pools to alert us to actual drive failures. It's not pretty, but with only 4 large servers to monitor, it works for us. I'm hoping to eventually convert those scripts to Nagios plugins, and let our existing monitoring setup keep track of the ZFS pools as well.

-- 
Freddie Cash
fjwcash@gmail.com
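For illustration, the partition-and-label scheme described above could be sketched roughly as follows. This is a hypothetical example, not the poster's actual script: the device name (da12) and label (pool1-f12) are invented, and each command is prefixed with echo so the sketch is a dry run that prints the commands instead of touching any hardware.

```shell
#!/bin/sh
# Dry-run sketch of the one-GPT-partition-per-drive labelling scheme
# (FreeBSD gpart syntax). Names are invented examples.
DISK="da12"          # hypothetical da device for the drive
LABEL="pool1-f12"    # pool name + front/back of chassis + slot number

# Create a GPT scheme on the disk:
echo gpart create -s gpt "$DISK"
# Add a single freebsd-zfs partition starting at 1 MB, covering the
# rest of the disk, carrying the chassis/slot label:
echo gpart add -t freebsd-zfs -b 1m -l "$LABEL" "$DISK"
# The pool is then built from the label rather than the da device,
# so ZFS error messages identify the physical slot directly:
echo zpool create tank raidz2 "/dev/gpt/$LABEL" ...
```

Because the pool references /dev/gpt/pool1-f12, the label survives the drive moving to a different da number after a reboot or controller change.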
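A minimal version of the pool-health cron job might look like the sketch below. This is an assumption about the approach, not the poster's script: it relies on the fact that `zpool status -x` prints the single line "all pools are healthy" when nothing is wrong; here STATUS is a canned example string so the logic is visible without a live pool.

```shell
#!/bin/sh
# Sketch of a cron job alerting on actual pool failures.
# On a real system this would be:  STATUS=$(zpool status -x)
# Canned example output used here so the script runs anywhere:
STATUS="all pools are healthy"

if [ "$STATUS" = "all pools are healthy" ]; then
    RESULT="ok"
else
    # e.g. mail the full status to the admin:
    # zpool status | mail -s "ZFS pool problem on $(hostname)" root
    RESULT="alert"
fi
echo "$RESULT"
```

Dropped into /etc/crontab (or converted to a Nagios plugin, as mentioned above), this catches outright drive failures, while smartd covers the "pending failure" side.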