From owner-freebsd-fs@FreeBSD.ORG Thu May 16 22:35:25 2013
From: "Steven Hartland"
To: "Ajit Jain"
Cc: freebsd-fs
References: <60316751643743738AB83DABC6A5934B@multiplay.co.uk> <20130429105143.GA1492@icarus.home.lan> <3AD1AB31003D49B2BF2EA7DD411B38A2@multiplay.co.uk> <9681E07546D348168052D4FC5365B4CD@multiplay.co.uk> <3E9CA9334E6F433A8F135ACD5C237340@multiplay.co.uk>
Subject: Re: seeing data corruption with zfs trim functionality
Date: Thu, 16 May 2013 23:35:23 +0100

----- Original Message -----
From: "Ajit Jain"
To: "Steven Hartland"
Cc: "freebsd-fs"
Sent: Wednesday, May 15, 2013 4:18 PM
Subject: Re: seeing data corruption with zfs trim functionality

> Hi Steven,
>
> Thanks for the update.
> It is surprising that there is no/less disk activity as the command is
> correct. May be we need to enable
> sync=always on the zfs dataset.
>
> I will try again the once I update the cam code base.
>
> regards,
> ajit
>
>
> On Wed, May 15, 2013 at 7:37 PM, Steven Hartland wrote:
>
>> Unless you have the latest CAM patches, which is in current, you wont be
>> doing TRIM on SATA disk connected to an LSI controller.
>>
>> I've just tested using the following cmd under 8.3 with MFC'ed changes from
>> current, using ATA_TRIM, UNMAP & WS16 and have had no issues on a machine
>> with Intel SSD and LSI controller.
>> ./iotest -t 20 -s 536870912 -W 100 -T 500 /test/iotest/
>>
>> I did however notice that your test is hardly doing any disk access apart
>> from when its "Initializing test file....", instead it seems to be CPU
>> bound,
>> so not sure if there's a problem with the iotest code or with my command
>> line args?
>>
>> Given this my current thinking is either:
>> 1. There's a problem with your patches
>> 2. There's a bug in the FW of the Seagate disk
>> 3. There's a problem with the UNMAP code which is being trigged by your
>> disk only.
>>
>> I think #3 is quite unlikely.
>>
>> If you could install a recent version of current and test with that it
>> should rule out #1, leaving #2.

After initially seeing no issues, our overnight monitoring started
moaning big time on the test box. When we checked there was zpool
corruption as well as a missing boot loader and a corrupt GPT, so I
believe we have reproduced your issue.

After recovering the machine I created 3 pools on 3 different disks,
each running a different delete_method. We then re-ran the tests, which
left the pool running with delete_method WS16 so broken it had
suspended IO.
A reboot resulted in it once again reporting no partition table via
gpart. A third test run again produced a corrupt pool for WS16.

I've conducted a preliminary review of the CAM WS16 code path along
with the SBC-3 spec, which didn't identify any obvious issues.

Given we're both using LSI 2008 based controllers it could be a FW
issue specific to WS16, but that's just speculation atm, so I'll
continue to investigate.

If you could re-test at your end without using WS16, to see if you can
reproduce the problem with either UNMAP or ATA_TRIM, that would be a
very useful data point.

Regards
Steve
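
For anyone repeating the re-test without WS16, here is a minimal sketch of how the delete method might be selected per disk. It assumes the CAM code in use exposes a writable kern.cam.da.N.delete_method sysctl, which depends on having the CAM changes discussed in this thread. The helper name delete_method_cmd and the unit numbers 1 and 2 are hypothetical and not taken from the thread. The script only prints the commands, so nothing is changed until they are run by hand as root.

```shell
#!/bin/sh
# Hypothetical helper: print the sysctl command that would select a CAM
# delete method for one da(4) device. It does not run sysctl itself.

# Delete methods named in this thread; your system may offer others.
METHODS="ATA_TRIM UNMAP WS16"

delete_method_cmd() {
    unit="$1"
    method="$2"
    # Reject anything that is not one of the methods listed above.
    case " $METHODS " in
        *" $method "*) ;;
        *) echo "unknown delete method: $method" >&2; return 1 ;;
    esac
    echo "sysctl kern.cam.da.${unit}.delete_method=${method}"
}

# Dry run for a re-test that avoids WS16 (da1 and da2 are assumed units).
delete_method_cmd 1 UNMAP
delete_method_cmd 2 ATA_TRIM
```

Reading the value back with "sysctl kern.cam.da.N.delete_method" before starting the test confirms which method each pool's disk is actually using.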