Date:      Wed, 29 May 2013 12:49:46 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Ajit Jain" <ajit.jain@cloudbyte.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: seeing data corruption with zfs trim functionality
Message-ID:  <2C2F5CAAE72B4658BFA09E4694A21375@multiplay.co.uk>
References:  <CAA71u6Y5dKZ9O0rqxCpx-9t7DYgTnPZSoNy-iHOnmzrOUYp%2Bvw@mail.gmail.com> <60316751643743738AB83DABC6A5934B@multiplay.co.uk> <20130429105143.GA1492@icarus.home.lan> <3AD1AB31003D49B2BF2EA7DD411B38A2@multiplay.co.uk> <C6AA4D0A7C49469ABB3C7440B1BCC108@multiplay.co.uk> <CAA71u6Zh7BbbdC=utqfR2MD1Nn=9euUDXHKqqu9NyBG-Jx%2B=Ow@mail.gmail.com> <9681E07546D348168052D4FC5365B4CD@multiplay.co.uk> <CAA71u6ZuO9CF0ECFS4z07-E5qPea-6SfNwkvhr_g6pFT5MV5yQ@mail.gmail.com> <CAA71u6YKGHDRVg6W_xnCNaA68bJvAZ2Lkp-UisiPqb1vKjJhfA@mail.gmail.com> <3E9CA9334E6F433A8F135ACD5C237340@multiplay.co.uk> <CAA71u6YZAKrmfTLU32f8UmYecmydwiqRT-OrR1ukZ9V6PGsU%2Bw@mail.gmail.com> <A05ACD84EB974E80B7142CE9982E479C@multiplay.co.uk> <93D0677B373A452BAF58C8EA6823783D@multiplay.co.uk> <CAA71u6bZ_4fb9FxYSwcrHBBApkZog30iQJGyTERi-xFMksud1g@mail.gmail.com> <35ABA7AAEB7F4D86A1ED54C4C47FEB49@multiplay.co.uk> <CAA71u6ahzRai=uUp5L6nDQxxEZC=d5jd4jBBfPNa2k29OwTZDg@mail.gmail.com>

Sorry to pester, but any update on this Ajit?

I ask as it's currently blocking the MFC of TRIM to stable/8 & 9, and I've
been unable to reproduce this issue, even with your testing code, on working
FW versions.

    Regards
    Steve

----- Original Message ----- 
From: "Ajit Jain" <ajit.jain@cloudbyte.com>


> Sure Steven,
> I'll apply the patches and update ASAP.
> 
> thanks
> ajit
> 
> 
> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
> 
>> I've attached the two patch sets I'm looking to MFC to stable-9: one
>> adds the BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>
>> They should both apply cleanly to stable-9. Could you test with
>> those on your machine and let me know how it goes?
>>
>>    Regards
>>    Steve
>>
>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>
>>
>>> Hi Steven,
>>>
>>> The FW version on the setup is P15.
>>> I will upgrade the FW to P16, but I think my best bet will be to
>>> update the code base to stable/9 as, unlike you, I was seeing
>>> corruption for all three delete methods.
>>>
>>> thanks
>>> ajit
>>>
>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>
>>>> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>>>>
>>>>
>>>>> After initially seeing no issues, our overnight monitoring started
>>>>> moaning big time on the test box. So we checked and there was zpool
>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>> believe we have reproduced your issue.
>>>>>
>>>>> After recovering the machine I created 3 pools on 3 different disks,
>>>>> each running a different delete_method.
>>>>>
>>>>> We then re-ran the tests, which resulted in the pool running with
>>>>> delete_method WS16 being so broken it had suspended IO. A reboot
>>>>> resulted in it once again reporting no partition table via gpart.
>>>>>
>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>
>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>> with the SBC-3 spec, which didn't identify any obvious issues.
>>>>>
>>>>> Given we're both using LSI 2008 based controllers, it could be a FW
>>>>> issue specific to WS16, but that's just speculation atm, so I'll
>>>>> continue to investigate.
>>>>>
>>>>> If you could re-test on your end without using WS16 to see if you can
>>>>> reproduce the problem with either UNMAP or ATA_TRIM, that would be a
>>>>> very useful data point.
>>>>>
>>>>>
>>>> After much playing I narrowed down a test case of one delete which was
>>>> causing disk corruption for us (it deleted the partition table instead
>>>> of data in the middle of the disk).
>>>>
>>>> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the
>>>> data on your SATA disks if you use WS16, due to the following bug:
>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>> support SCT write same may write wrong region.
>>>>
>>>> After updating here to P16 (which we would generally be running, but
>>>> the test box was new and hadn't been updated yet), the corruption issue
>>>> is no longer reproducible.
>>>>
>>>> So Ajit, please check your FW version; I'm hoping to hear you're on
>>>> something below P13, P12 possibly?
>>>>
>>>> If so, then this is your issue. To fix it, simply update to P16 and the
>>>> problem should be gone.
>>>>
>>>>
>>>>    Regards
>>>>    Steve
>>>>
>>>>
>>>> ================================================
>>>>
>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>>>> the person or entity to whom it is addressed. In the event of
>>>> misdirection,
>>>> the recipient is prohibited from using, copying, printing or otherwise
>>>> disseminating it or any information contained in it.
>>>> In the event of misdirection, illegible or incomplete transmission please
>>>> telephone +44 845 868 1337
>>>> or return the E.mail to postmaster@multiplay.co.uk.
>>>>
>>>>
>>>>
>>>
>>
>

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2C2F5CAAE72B4658BFA09E4694A21375>