From owner-freebsd-fs@FreeBSD.ORG Wed May 29 15:19:30 2013
Delivered-To: freebsd-fs@freebsd.org
MIME-Version: 1.0
In-Reply-To: <6E4EBFE196274519B847A47A062950EE@multiplay.co.uk>
References: <3AD1AB31003D49B2BF2EA7DD411B38A2@multiplay.co.uk>
 <9681E07546D348168052D4FC5365B4CD@multiplay.co.uk>
 <3E9CA9334E6F433A8F135ACD5C237340@multiplay.co.uk>
 <93D0677B373A452BAF58C8EA6823783D@multiplay.co.uk>
 <35ABA7AAEB7F4D86A1ED54C4C47FEB49@multiplay.co.uk>
 <2C2F5CAAE72B4658BFA09E4694A21375@multiplay.co.uk>
 <6E4EBFE196274519B847A47A062950EE@multiplay.co.uk>
From: Ajit Jain
Date: Wed, 29 May 2013 20:49:08 +0530
Subject: Re: seeing data corruption with zfs trim functionality
To: Steven Hartland
Cc: freebsd-fs
Content-Type: text/plain; charset=ISO-8859-1
List-Id: Filesystems

Hi Steven,

That would be really great. I'll install the build you provide and
can quickly report the result.
I feel that I am asking too much of a favour from you.

Thanks for the help and for bearing with me,
ajit


On Wed, May 29, 2013 at 6:39 PM, Steven Hartland wrote:

> Unfortunately FS corruption is a serious matter, so even though I'm 99.99%
> convinced there isn't a problem I'd still prefer to confirm this was indeed
> an issue with your code base and not an issue with the current code prior
> to MFC'ing.
>
> Would a pre-patched stable/9 source / build help? If so I can look at
> making that available for you.
>
> Regards
> Steve
>
> ----- Original Message ----- From: "Ajit Jain"
>
>> Hi Steven,
>>
>> Sorry for the long delay, and there might be a further delay.
>> I think the reason for the corruption was that my code
>> was not up to date, especially the cam directory.
>>
>> I request that you please do not stop just because of the issue I reported.
>> I'll update my src tree and rerun the experiments I was running;
>> if I see some issue then we can probably fix the bug rather than stopping
>> the MFC.
>>
>> thanks,
>> ajit
>>
>> On Wed, May 29, 2013 at 5:19 PM, Steven Hartland wrote:
>>
>>> Sorry to pester, but any update on this Ajit?
>>>
>>> I ask as it's currently blocking the MFC of TRIM to stable/8 & 9 and I've
>>> been unable to reproduce this issue even with your testing code on
>>> working FW versions.
>>>
>>> Regards
>>> Steve
>>>
>>> ----- Original Message ----- From: "Ajit Jain"
>>>
>>>> Sure Steven,
>>>>
>>>> I'll apply the patches and update ASAP.
>>>>
>>>> thanks
>>>> ajit
>>>>
>>>> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>
>>>>> I've attached the two patch sets I'm looking to MFC to stable-9, one
>>>>> adds BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>>>>
>>>>> They should both apply cleanly to stable-9, if you could test with
>>>>> those on your machine and let me know.
>>>>>
>>>>> Regards
>>>>> Steve
>>>>>
>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>>>
>>>>>> Hi Steven,
>>>>>>
>>>>>> FW version on the setup is P15.
>>>>>> I will upgrade the FW to P16, but I think my
>>>>>> best bet will be to update the code base to 9 stable as, unlike you,
>>>>>> I was seeing corruption for all three delete methods.
>>>>>>
>>>>>> thanks
>>>>>> ajit
>>>>>>
>>>>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>>>
>>>>>>> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>>>>>>>
>>>>>>>> After initially seeing no issues, our overnight monitoring started
>>>>>>>> moaning big time on the test box. So we checked and there was zpool
>>>>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>>>>> believe we have reproduced your issue.
>>>>>>>>
>>>>>>>> After recovering the machine I created 3 pools on 3 different disks
>>>>>>>> each running a different delete_method.
>>>>>>>>
>>>>>>>> We then re-ran the tests which resulted in the pool running with
>>>>>>>> delete_method WS16 being so broken it had suspended IO. A reboot
>>>>>>>> resulted in it once again reporting no partition table via gpart.
>>>>>>>>
>>>>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>>>>
>>>>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>>>>> with the SBC-3 spec, which didn't identify any obvious issues.
>>>>>>>>
>>>>>>>> Given we're both using LSI 2008 based controllers it could be a FW
>>>>>>>> issue specific to WS16 but that's just speculation atm, so I'll
>>>>>>>> continue to investigate.
>>>>>>>>
>>>>>>>> If you could re-test your end without using WS16 to see if you can
>>>>>>>> reproduce the problem with either UNMAP or ATA_TRIM that would be a
>>>>>>>> very useful data point.
>>>>>>>
>>>>>>> After much playing I narrowed down a test case of one delete which was
>>>>>>> causing disc corruption for us (it deleted the partition table instead
>>>>>>> of data in the middle of the disk).
>>>>>>>
>>>>>>> The conclusion is LSI 2008 HBA with FW below P13 will eat the data on
>>>>>>> your SATA disks if you use WS16 due to the following bug:-
>>>>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>>>>> support SCT write same may write wrong region.
>>>>>>>
>>>>>>> After updating here to P16, which we would generally be running, but the
>>>>>>> test box was new and hadn't been updated yet, the corruption issue is no
>>>>>>> longer reproducible.
>>>>>>>
>>>>>>> So Ajit please check your FW version, I'm hoping to hear you're on
>>>>>>> something below P13, P12 possibly?
>>>>>>>
>>>>>>> If so then this is your issue, to fix simply update to P16 and the
>>>>>>> problem should be gone.
>>>>>>>
>>>>>>> Regards
>>>>>>> Steve
>>>>>>>
>>>>>>> ================================================
>>>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>>>>>>> the person or entity to whom it is addressed.
>>>>>>> In the event of misdirection,
>>>>>>> the recipient is prohibited from using, copying, printing or otherwise
>>>>>>> disseminating it or any information contained in it.
>>>>>>> In the event of misdirection, illegible or incomplete transmission please
>>>>>>> telephone +44 845 868 1337
>>>>>>> or return the E.mail to postmaster@multiplay.co.uk.