From: Ajit Jain
Date: Thu, 23 May 2013 11:39:32 +0530
Subject: Re: seeing data corruption with zfs trim functionality
To: Steven Hartland
Cc: freebsd-fs

Hi Steven,

The FW version on the setup is P15. I will upgrade the FW to P16, but I think my best bet will be to update the code base to 9-STABLE since, unlike you, I was seeing corruption with all three delete methods.

thanks
ajit

On Sat, May 18, 2013 at 4:15 AM, Steven Hartland wrote:

> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>
>
>> After initially seeing no issues, our overnight monitoring started
>> moaning big time on the test box. So we checked and there was zpool
>> corruption as well as a missing boot loader and a corrupt GPT, so I
>> believe we have reproduced your issue.
>>
>> After recovering the machine I created 3 pools on 3 different disks,
>> each running a different delete_method.
>>
>> We then re-ran the tests, which resulted in the pool running with
>> delete_method WS16 being so broken it had suspended IO. A reboot
>> resulted in it once again reporting no partition table via gpart.
>>
>> A third test run again produced a corrupt pool for WS16.
>>
>> I've conducted a preliminary review of the CAM WS16 code path along
>> with the SBC-3 spec, which didn't identify any obvious issues.
>>
>> Given we're both using LSI 2008 based controllers, it could be a FW
>> issue specific to WS16, but that's just speculation atm, so I'll
>> continue to investigate.
>>
>> If you could re-test your end without using WS16 to see if you can
>> reproduce the problem with either UNMAP or ATA_TRIM, that would be a
>> very useful data point.
>>
>
> After much playing I narrowed down a test case of one delete which was
> causing disk corruption for us (it deleted the partition table instead
> of data in the middle of the disk).
>
> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the
> data on your SATA disks if you use WS16, due to the following bug:
> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
> support SCT write same may write wrong region.
>
> After updating here to P16 (which we would generally be running, but
> the test box was new and hadn't been updated yet), the corruption issue
> is no longer reproducible.
>
> So Ajit, please check your FW version. I'm hoping to hear you're on
> something below P13, P12 possibly?
>
> If so, then this is your issue; to fix it, simply update to P16 and the
> problem should be gone.
>
> Regards
> Steve