Date:      Fri, 20 Jun 2014 08:41:14 -0700
From:      Freddie Cash <fjwcash@gmail.com>
To:        Graham Allan <allan@physics.umn.edu>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: Large ZFS arrays?
Message-ID:  <CAOjFWZ7XJKLJdrX2R3hsDe6vkem+jisTOLrysWR_zHx06Px2+Q@mail.gmail.com>
In-Reply-To: <53A44A23.6050604@physics.umn.edu>
References:  <1402846139.4722.352.camel@btw.pki2.com> <53A44A23.6050604@physics.umn.edu>

On Fri, Jun 20, 2014 at 7:50 AM, Graham Allan <allan@physics.umn.edu> wrote:

> <snippage>
>
> Would be interesting to hear a little about experiences with the drives
> used... For our first "experimental" chassis we used 3TB Seagate desktop
> drives - cheap but not the best choice, 18 months later they are dropping
> like flies (luckily we can risk some cheapness here as most of our data
> can be re-transferred from other sites if needed). Another chassis has
> 2TB WD RE4 enterprise drives (no problems), and four others have 3TB and
> 4TB WD "Red" NAS drives... which are another "slightly risky" selection
> but so far have been very solid (also in some casual discussion with a WD
> field engineer he seemed to feel these would be fine for both ZFS and
> hadoop use).


We've had good experiences with WD Black drives (500 GB, 1 TB, and 2 TB).
These tend to last the longest, and provide the nicest failure modes.
It's also very easy to understand the WD model numbers.

We've also used Seagate 7200.11 and 7200.12 drives (1 TB and 2 TB).  These
perform well, but fail in weird ways.  They also tend to fail sooner than
the WD drives.  Thankfully, the RMA process with Seagate is fairly simple
and turn-around time is fairly quick.  Unfortunately, figuring out exactly
which model of Seagate drive to order is becoming a royal pain as time goes
on.  They keep changing both their marketing model names and the actual
model numbers.  There are now something like 8 separate product lines to
pick from and 6+ different models in each line, times 2 for 4K vs 512-byte
sectors.

We started out (3? 4? years ago) using WD Blue drives because they were
inexpensive (almost half the price of WD Black) and we figured all the ZFS
goodness would work well on them.  We quickly found out that desktop drives
really aren't suited to server work, especially when being written to for
12+ hours a day.  :)

We were going to try some Toshiba drives in our next setup, but we received
such an exceptionally good price on WD Black drives in our last tender
($80 CDN for 1 TB) that we decided to stick with those for now.  :D  After
all, they work well, so why rock the boat?

We haven't used any drives larger than 2 TB yet.

> Tracking drives for failures and replacements was a big issue for us. One
> of my co-workers wrote a nice perl script which periodically harvests all
> the data from the chassis (via sg3utils) and stores the mappings of
> chassis slots, da devices, drive labels, etc into a database. It also
> understands the layout of the 847 chassis and labels the drives for us
> according to some rules we made up - we do some prefix for the pool name,
> then "f" or "b" for front/back of chassis, then the slot number, and
> finally (?) has some controls to turn the chassis drive identify lights
> on or off. There might be other ways to do all this but we didn't find
> any, so it's been incredibly useful for us.
>

We partition each drive into a single GPT partition (starting at 1 MB and
covering the whole disk), and label that partition with the chassis/slot
it's in.  Then we use the GPT label to build the pool (/dev/gpt/diskname).
That way, all the metadata in the pool, and any error messages from ZFS,
tell us exactly which disk, in which chassis, in which slot, is having
issues.  No external database required.  :)
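
Roughly, setting up a new disk looks like this (just a sketch; the
"c0-f-s05"-style labels and the two-disk mirror below are made-up
placeholders, not our exact naming scheme or layout):

    # wipe any existing partitioning (destructive!), then create a fresh GPT
    gpart destroy -F da0
    gpart create -s gpt da0
    # one freebsd-zfs partition, 1 MB aligned, labelled with chassis/slot
    gpart add -t freebsd-zfs -a 1m -l c0-f-s05 da0
    # the label shows up under /dev/gpt/ and is what we feed to zpool
    zpool create tank mirror gpt/c0-f-s05 gpt/c0-f-s06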

We're currently using smartmontools and the periodic(8) scripts to alert us
to pending drive failures, and a custom cron job that checks the health of
the pools to alert us to actual drive failures.  It's not pretty, but with
only 4 large servers to monitor, it works for us.  I'm hoping to eventually
convert those scripts to Nagios plugins and let our existing monitoring
setup keep track of the ZFS pools as well.
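
The health check is only a few lines of sh (a rough sketch of the idea
rather than our exact script; "zpool status -x" prints "all pools are
healthy" when everything is fine, and details only for pools with
problems):

    #!/bin/sh
    # mail root if any pool is degraded/faulted; run from cron
    status=$(zpool status -x)
    if [ "${status}" != "all pools are healthy" ]; then
        printf '%s\n' "${status}" | mail -s "ZFS pool problem on $(hostname)" root
    fi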

-- 
Freddie Cash
fjwcash@gmail.com


