Date:      Fri, 17 May 2013 23:45:57 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Steven Hartland" <killing@multiplay.co.uk>, "Ajit Jain" <ajit.jain@cloudbyte.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: seeing data corruption with zfs trim functionality
Message-ID:  <93D0677B373A452BAF58C8EA6823783D@multiplay.co.uk>
References:  <CAA71u6Y5dKZ9O0rqxCpx-9t7DYgTnPZSoNy-iHOnmzrOUYp%2Bvw@mail.gmail.com> <60316751643743738AB83DABC6A5934B@multiplay.co.uk> <20130429105143.GA1492@icarus.home.lan> <3AD1AB31003D49B2BF2EA7DD411B38A2@multiplay.co.uk> <C6AA4D0A7C49469ABB3C7440B1BCC108@multiplay.co.uk> <CAA71u6Zh7BbbdC=utqfR2MD1Nn=9euUDXHKqqu9NyBG-Jx%2B=Ow@mail.gmail.com> <9681E07546D348168052D4FC5365B4CD@multiplay.co.uk> <CAA71u6ZuO9CF0ECFS4z07-E5qPea-6SfNwkvhr_g6pFT5MV5yQ@mail.gmail.com> <CAA71u6YKGHDRVg6W_xnCNaA68bJvAZ2Lkp-UisiPqb1vKjJhfA@mail.gmail.com> <3E9CA9334E6F433A8F135ACD5C237340@multiplay.co.uk> <CAA71u6YZAKrmfTLU32f8UmYecmydwiqRT-OrR1ukZ9V6PGsU%2Bw@mail.gmail.com> <A05ACD84EB974E80B7142CE9982E479C@multiplay.co.uk>

----- Original Message ----- 
From: "Steven Hartland" <killing@multiplay.co.uk>
> 
> After initially seeing no issues, our overnight monitoring started moaning
> big time on the test box. So we checked and there was zpool corruption as well
> as a missing boot loader and a corrupt GPT, so I believe we have reproduced
> your issue.
> 
> After recovering the machine I created 3 pools on 3 different disks each
> running a different delete_method.
> 
> We then re-ran the tests which resulted in the pool running with delete_method
> WS16 being so broken it had suspended IO. A reboot resulted in it once again
> reporting no partition table via gpart.
> 
> A third test run again produced a corrupt pool for WS16.
> 
> I've conducted a preliminary review of the CAM WS16 code path along with SBC-3
> spec which didn't identify any obvious issues.
> 
> Given we're both using LSI 2008 based controllers it could be a FW issue specific
> to WS16, but that's just speculation atm, so I'll continue to investigate.
> 
> If you could re-test on your end without using WS16 to see if you can reproduce
> the problem with either UNMAP or ATA_TRIM, that would be a very useful data point.

After much playing I narrowed this down to a test case of a single delete which
was causing disk corruption for us (it deleted the partition table instead of
data in the middle of the disk).

The conclusion is that an LSI 2008 HBA with FW below P13 will eat the data on
your SATA disks if you use WS16, due to the following bug:-
SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't support
SCT write same may write wrong region.

After updating here to P16 (which we would generally be running, but the test
box was new and hadn't been updated yet), the corruption issue is no longer
reproducible.

So Ajit, please check your FW version; I'm hoping to hear you're on something
below P13, P12 possibly?

If so then this is your issue; to fix it simply update to P16 and the problem
should be gone.
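
For anyone wanting to script the check across several boxes, here is a minimal
sketch of the "below P13" test. It assumes the firmware version is available as
a dotted string such as "12.00.00.00" and that the first field is the phase
number (so P16 would read "16.00.00.00"); both are assumptions on my part, and
the function names are hypothetical. How you obtain the string (boot messages,
a vendor tool, etc.) is left out.

```python
def firmware_phase(version: str) -> int:
    """Return the phase number from a dotted firmware version string.

    Assumption: the first dotted field is the phase, e.g. "16.00.00.00" -> 16.
    """
    return int(version.strip().split(".")[0])


def affected_by_ws16_bug(version: str, fixed_phase: int = 13) -> bool:
    """True if the firmware phase is below the phase where the bug is fixed.

    fixed_phase defaults to 13 because the corruption was reported for
    firmware below P13.
    """
    return firmware_phase(version) < fixed_phase


if __name__ == "__main__":
    # Illustrative version strings only, not taken from real hardware.
    for v in ("12.00.00.00", "13.00.00.00", "16.00.00.00"):
        status = "affected, update FW" if affected_by_ws16_bug(v) else "ok"
        print(v, status)
```

A box flagged as affected should avoid WS16 as its delete_method until the
firmware has been updated.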

    Regards
    Steve





