Date:      Mon, 9 Nov 2015 11:08:55 -0800
From:      Tim Gustafson <tjg@ucsc.edu>
To:        freebsd-fs@freebsd.org
Subject:   ZFS RAID 0+1 Throwing Checksum Errors
Message-ID:  <CAPyBAS7oYvp6vvzetcGmrXy0_Qn0fXBN_d510w41CguDZCzMxw@mail.gmail.com>

I have a FreeBSD 10.1 server configured as root-on-zfs with the
following pool configuration:

NAME            STATE     READ WRITE CKSUM
tank            ONLINE       0     0     0
  mirror-0      ONLINE       0     0     0
    gpt/zfs0    ONLINE       0     0     0
    gpt/zfs1    ONLINE       0     0     0
  mirror-1      ONLINE       0     0     0
    gpt/zfs2    ONLINE       0     0     0
    gpt/zfs3    ONLINE       0     0     0

Each disk is a 1 TB Samsung 850 EVO SSD, attached to a Dell PERC RAID
controller (mrsas driver) configured in "RAID Disabled" mode.

I run "zpool scrub" every weekend, and every weekend the scrub finds a
handful of checksum errors (usually between 1 and 10) per disk.  The
scrub repairs them, I clear the counters with "zpool clear", and
everything seems fine.  As far as I know, I do not have any corrupt or
missing data.
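To keep a record of the per-disk numbers before clearing them (so they can be shared on the list or compared week to week), a small script along these lines could be run after each scrub. It is only a sketch: it assumes Python 3.7+ on the server and the default `zpool status` column layout shown above, and the helper names (`parse_count`, `checksum_errors`, `snapshot`) are hypothetical, made up for illustration.

```python
import re
import subprocess
from typing import Dict

# One row of the NAME/STATE/READ/WRITE/CKSUM table printed by `zpool status`.
# The header row and lines such as "state: ONLINE" do not match this pattern.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(?P<read>\S+)\s+(?P<write>\S+)\s+(?P<cksum>\S+)"
)

# Assumption: large counters are abbreviated (e.g. "1.2K") in powers of 1024.
SUFFIX = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_count(field: str) -> int:
    """Turn a zpool error counter such as '7' or '1.2K' into an int."""
    if field[-1] in SUFFIX:
        return int(float(field[:-1]) * SUFFIX[field[-1]])
    return int(field)


def checksum_errors(status_text: str) -> Dict[str, int]:
    """Map each pool, vdev and disk name to its CKSUM counter."""
    counts: Dict[str, int] = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            counts[m.group("name")] = parse_count(m.group("cksum"))
    return counts


def snapshot(pool: str = "tank") -> Dict[str, int]:
    """Read the live counters; meant to run after the scrub, before `zpool clear`."""
    out = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=True
    )
    return checksum_errors(out.stdout)
```

Appending each `snapshot()` result to a dated log would show whether the errors stay spread across all four SSDs or cluster on one disk or one mirror.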

The server is a fairly busy web and database server, handling about 5
million hits per day.

I'm wondering whether this is a race: the scrub calculates the
checksum for a block on gpt/zfs0, Apache or MySQL updates that data in
the meantime, and the checksum then calculated for the copy on
gpt/zfs1 no longer matches, so the scrub reports an error.  Is that
possible?

If that's not it, could this be a bug?  Or should I be worried about
my SSDs?  What additional data should I share to help diagnose this?

-- 

Tim Gustafson
Technical Lead, Baskin School of Engineering
tjg@ucsc.edu
831-459-5354
Baskin Engineering, Room 313A
