Date:      Tue, 4 Jun 2013 21:23:25 +0530
From:      Ajit Jain <ajit.jain@cloudbyte.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: seeing data corruption with zfs trim functionality
Message-ID:  <CAA71u6Zs9B=S6qFnTYarJXXo4wAq-5WuiQ7aMkrO=wxyG1_sxw@mail.gmail.com>
In-Reply-To: <AC3508418584444B85EBC5616508259F@multiplay.co.uk>
References:  <CAA71u6Y5dKZ9O0rqxCpx-9t7DYgTnPZSoNy-iHOnmzrOUYp%2Bvw@mail.gmail.com> <CAA71u6YKGHDRVg6W_xnCNaA68bJvAZ2Lkp-UisiPqb1vKjJhfA@mail.gmail.com> <3E9CA9334E6F433A8F135ACD5C237340@multiplay.co.uk> <CAA71u6YZAKrmfTLU32f8UmYecmydwiqRT-OrR1ukZ9V6PGsU%2Bw@mail.gmail.com> <A05ACD84EB974E80B7142CE9982E479C@multiplay.co.uk> <93D0677B373A452BAF58C8EA6823783D@multiplay.co.uk> <CAA71u6bZ_4fb9FxYSwcrHBBApkZog30iQJGyTERi-xFMksud1g@mail.gmail.com> <35ABA7AAEB7F4D86A1ED54C4C47FEB49@multiplay.co.uk> <CAA71u6ahzRai=uUp5L6nDQxxEZC=d5jd4jBBfPNa2k29OwTZDg@mail.gmail.com> <2C2F5CAAE72B4658BFA09E4694A21375@multiplay.co.uk> <CAA71u6a3TJ_sO3Q%2BiJa8EHKE2iM0MKh31D37pGAoua7QU_6xYg@mail.gmail.com> <6E4EBFE196274519B847A47A062950EE@multiplay.co.uk> <CAA71u6bZqYcyW-3RAQj9zjYcWp%2BUXPa4KhH4__nY=S6EuVVR-w@mail.gmail.com> <F71FEDB8BA5142C5A3A0F72DF75A6421@multiplay.co.uk> <CAA71u6a8d5b5CdaAp50HLGmNvK7p1PBJM6yH8AisCsSf%2B8U3-A@mail.gmail.com> <CAA71u6ZmZNKOHECqX=cEuVLFNfZkTCD6yUaz%2BhnG2GKsUHVp7A@mail.gmail.com> <AC3508418584444B85EBC5616508259F@multiplay.co.uk>

Hi Steven,

I did not see the FS corruption issue after installing the kernel from your
tarball on my 9-stable machine. I ran the same test many times on the same
physical setup with the same disk.

So we can pretty much conclude that it was an issue with my src code
previously.

thanks
ajit


On Tue, Jun 4, 2013 at 2:05 PM, Steven Hartland <killing@multiplay.co.uk> wrote:

> Those are all just symlinks, so unless you're extracting to the correct
> location it's likely just moaning because the absolutely pathed file
> doesn't exist.
>
> If you're just wanting to update that machine, run:-
> rm -rf /usr/src /usr/obj
> tar -xzPf stable-9-r251096.tar.gz
>
>
>    Regards
>    Steve
> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
> To: "freebsd-fs" <freebsd-fs@freebsd.org>; "Steven Hartland" <
> killing@multiplay.co.uk>
> Sent: Tuesday, June 04, 2013 8:39 AM
> Subject: Fwd: seeing data corruption with zfs trim functionality
>
>
>> Hi Steven,
>>
>> I am not able to send the full output file to freebsd-fs.
>> I am just sending the error file in this mail and will
>> send you another mail which contains the full untar output.
>>
>>
>> regards,
>> ajit
>>
>> ---------- Forwarded message ----------
>> From: Ajit Jain <ajit.jain@cloudbyte.com>
>> Date: Mon, Jun 3, 2013 at 11:51 PM
>> Subject: Re: seeing data corruption with zfs trim functionality
>> To: Steven Hartland <killing@multiplay.co.uk>
>> Cc: freebsd-fs <freebsd-fs@freebsd.org>
>>
>>
>> Hi Steven,
>>
>>
>> untar of the tarball is throwing the error below:
>> tar: Error exit delayed from previous errors.
>>
>> I have downloaded the file from the link 3 times, and every time I see the
>> same issue.
>> Please find the tar output file and the errors (grep from the tar output
>> file) attached to this mail.
>>
>> checksum of tar ball (after unzip, on freebsd) is:
>> root@everest:/pool_9stable/obj_src/new # cksum stable-9-r251096.tar
>> 2972813925 3474278400 stable-9-r251096.tar
>>
>>
>> regards,
>> ajit
>>
>>
>>
>>
>> On Fri, May 31, 2013 at 4:12 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>
>>> Tar archive of /usr/src and /usr/obj with built world and GENERIC kernel
>>> for amd64 can be found here:-
>>> http://blog.multiplay.co.uk/dropzone/freebsd/stable-9-r251096.tar.gz
>>>
>>>
>>> This is based off r251096 with current proposed MFC of CAM BIO_DELETE &
>>> ZFS TRIM.
>>>
>>>
>>>    Regards
>>>    Steve
>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>
>>>
>>>> Hi Steven,
>>>>
>>>> That would be really great. I'll install the build provided by you and can
>>>> quickly update the result. I feel that I am asking too much of a favor
>>>> from you.
>>>>
>>>> thanks for the help and for bearing with me,
>>>> ajit
>>>>
>>>>
>>>> On Wed, May 29, 2013 at 6:39 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>
>>>>
>>>>> Unfortunately FS corruption is a serious matter, so even though I'm
>>>>> 99.99% convinced there isn't a problem I'd still prefer to confirm this
>>>>> was indeed an issue with your code base and not an issue with the
>>>>> current code prior to MFC'ing.
>>>>>
>>>>> Would a pre-patched stable/9 source / build help? If so I can look at
>>>>> making that available for you.
>>>>>
>>>>>
>>>>>    Regards
>>>>>    Steve
>>>>>
>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>>>
>>>>>
>>>>>> Hi Steven,
>>>>>>
>>>>>> Sorry for the long delay, and there might be a further delay.
>>>>>> I think the reason for the corruption was that my code
>>>>>> was not up to date, especially the cam directory.
>>>>>>
>>>>>> Please do not hold things up just because of the issue I reported.
>>>>>> I'll update my src tree and rerun the experiments I was running.
>>>>>> If I see some issue then we can probably fix the bug rather than
>>>>>> stopping the MFC.
>>>>>>
>>>>>> thanks,
>>>>>> ajit
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 29, 2013 at 5:19 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>>>
>>>>>>> Sorry to pester, but any update on this Ajit?
>>>>>>>
>>>>>>> I ask as it's currently blocking the MFC of TRIM to stable/8 & 9 and
>>>>>>> I've been unable to reproduce this issue even with your testing code
>>>>>>> on working FW versions.
>>>>>>>
>>>>>>>
>>>>>>>    Regards
>>>>>>>    Steve
>>>>>>>
>>>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>>>>>
>>>>>>>
>>>>>>>> Sure Steven,
>>>>>>>>
>>>>>>>> I'll apply the patches and update ASAP.
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> ajit
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> I've attached the two patch sets I'm looking to MFC to stable-9: one
>>>>>>>>> adds BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>>>>>>>>
>>>>>>>>> They should both apply cleanly to stable-9, if you could test with
>>>>>>>>> those on your machine and let me know.
>>>>>>>>>
>>>>>>>>>    Regards
>>>>>>>>>    Steve
>>>>>>>>>
>>>>>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hi Steven,
>>>>>>>>>>
>>>>>>>>>> FW version on the setup is P15.
>>>>>>>>>>
>>>>>>>>>> I will upgrade the FW to P16, but I think my
>>>>>>>>>> best bet will be to update code base to 9 stable as unlike you,
>>>>>>>>>> I was seeing corruption for all three delete methods.
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>> ajit
>>>>>>>>>>
>>>>>>>>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>>>>>>>>>>>
>>>>>>>>>>>> After initially seeing no issues, our overnight monitoring started
>>>>>>>>>>>> moaning big time on the test box. So we checked and there was zpool
>>>>>>>>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>>>>>>>>> believe we have reproduced your issue.
>>>>>>>>>>>>
>>>>>>>>>>>> After recovering the machine I created 3 pools on 3 different disks,
>>>>>>>>>>>> each running a different delete_method.
>>>>>>>>>>>>
>>>>>>>>>>>> We then re-ran the tests, which resulted in the pool running with
>>>>>>>>>>>> delete_method WS16 being so broken it had suspended IO. A reboot
>>>>>>>>>>>> resulted in it once again reporting no partition table via gpart.
>>>>>>>>>>>>
>>>>>>>>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>>>>>>>>
>>>>>>>>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>>>>>>>>> with the SBC-3 spec, which didn't identify any obvious issues.
>>>>>>>>>>>>
>>>>>>>>>>>> Given we're both using LSI 2008 based controllers it could be a FW
>>>>>>>>>>>> issue specific to WS16, but that's just speculation atm, so I'll
>>>>>>>>>>>> continue to investigate.
>>>>>>>>>>>>
>>>>>>>>>>>> If you could re-test your end without using WS16 to see if you can
>>>>>>>>>>>> reproduce the problem with either UNMAP or ATA_TRIM that would be a
>>>>>>>>>>>> very useful data point.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> After much playing I narrowed down a test case of one delete which
>>>>>>>>>>> was causing disc corruption for us (deleted the partition table
>>>>>>>>>>> instead of data in the middle of the disk).
>>>>>>>>>>>
>>>>>>>>>>> The conclusion is LSI 2008 HBA with FW below P13 will eat the data on
>>>>>>>>>>> your SATA disks if you use WS16 due to the following bug:-
>>>>>>>>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>>>>>>>>> support SCT write same may write wrong region.
>>>>>>>>>>>
>>>>>>>>>>> After updating here to P16, which we would generally be running (the
>>>>>>>>>>> test box was new and hadn't been updated yet), the corruption issue is
>>>>>>>>>>> no longer reproducible.
>>>>>>>>>>>
>>>>>>>>>>> So Ajit please check your FW version, I'm hoping to hear you're on
>>>>>>>>>>> something below P13, P12 possibly?
>>>>>>>>>>>
>>>>>>>>>>> If so then this is your issue; to fix, simply update to P16 and the
>>>>>>>>>>> problem should be gone.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Regards
>>>>>>>>>>>    Steve
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ================================================
>>>>>>>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd.
>>>>>>>>>>> and the person or entity to whom it is addressed. In the event of
>>>>>>>>>>> misdirection, the recipient is prohibited from using, copying,
>>>>>>>>>>> printing or otherwise disseminating it or any information contained
>>>>>>>>>>> in it.
>>>>>>>>>>> In the event of misdirection, illegible or incomplete transmission
>>>>>>>>>>> please telephone +44 845 868 1337
>>>>>>>>>>> or return the E.mail to postmaster@multiplay.co.uk.


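The firmware condition Steven describes in the quoted thread (LSI 2008 HBA firmware below P13 combined with the WS16 delete method) can be sketched as a small check. This is only an illustration: the function name fw_phase_is_affected is hypothetical, and the phase string ("P12", "P16", ...) is assumed to have been read by hand from whatever tool reports the controller firmware. It does not cover the case reported in this message, where the firmware was P15 and the cause turned out to be an out-of-date source tree.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given LSI firmware phase string
# (e.g. "P12") is below P13, 1 if it is P13 or later, and 2 if the string
# cannot be parsed. The P13 threshold is the one quoted in the thread.
fw_phase_is_affected() {
    phase=${1#[Pp]}                 # strip the leading "P" or "p"
    case $phase in
        ''|*[!0-9]*) return 2 ;;    # empty or non-numeric: cannot decide
    esac
    [ "$phase" -lt 13 ]
}

# Example run over a few phase strings mentioned in the thread.
for p in P12 P13 P15 P16; do
    if fw_phase_is_affected "$p"; then
        echo "$p: below P13, avoid WS16 on SATA disks until updated"
    else
        echo "$p: P13 or later"
    fi
done
```

A check like this only rules the firmware bug in or out; a matching result still needs to be confirmed by re-running the tests after the update, as was done in the thread.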
