From owner-freebsd-fs@FreeBSD.ORG Sun Jan 13 15:30:47 2013
From: Warren Block <wblock@wonkity.com>
To: kpneal@pobox.com
Cc: freebsd-fs@freebsd.org
Date: Sun, 13 Jan 2013 08:30:43 -0700 (MST)
Subject: Re: Using glabel
In-Reply-To: <20130113062702.GA63271@neutralgood.org>

On Sun, 13 Jan 2013, kpneal@pobox.com wrote:

>> You can use glabel to label your disks or partition the disks with
>> gpart (using the GPT scheme) and let gpt put a label on each (-l
>> flag).
>
> Don't use glabel pretty much ever. It stores the label inside the
> partition (or disk). If the end of the partition is ever touched then
> the label goes *poof*.
> Stick to gpt labels.

If you label a partition, the label device will be one block smaller in
size. The metadata is hidden and safe, as long as it is accessed
through the label device.

# diskinfo -v /dev/da0p1
/dev/da0p1
        512             # sectorsize
        512000          # mediasize in bytes (500k)
        1000            # mediasize in sectors

# glabel label teeny /dev/da0p1
# diskinfo -v /dev/label/teeny
/dev/label/teeny
        512             # sectorsize
        511488          # mediasize in bytes (499k)
        999             # mediasize in sectors

Note the size in sectors. The problem is that sometimes people don't
realize that the label device (/dev/label/teeny) is offering those
extra features and will continue to use the raw partition in newfs
commands and such.

Anyway, GPT labels are still preferable to glabel because they can be
created at the same time as partitions and don't use any extra
metadata.

ZFS has its own metadata, and newer versions are supposed to leave the
last megabyte or so unused to allow for actual versus nominal disk
sizes. I'm not clear whether there's a good reason to use additional
labels instead of just giving ZFS the whole disk. Unless you aren't
planning on using the whole disk for ZFS, of course.
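The arithmetic behind those two diskinfo listings can be sketched in a few lines (an editor's illustration, not part of the original mail): glabel stores its metadata in the last sector of the provider, so the exported label device is exactly one sector smaller than the raw partition.

```python
# Sketch (illustration, not from the original mail): glabel keeps its
# metadata in the provider's last sector, so the label device it
# exposes is one sector smaller than the raw partition.
sectorsize = 512
raw_sectors = 1000                   # /dev/da0p1, per diskinfo above
label_sectors = raw_sectors - 1      # /dev/label/teeny
label_bytes = label_sectors * sectorsize
print(label_sectors, label_bytes)    # 999 511488, matching diskinfo
```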
From owner-freebsd-fs@FreeBSD.ORG Sun Jan 13 18:22:00 2013
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
To: freebsd-fs@freebsd.org
Date: Sun, 13 Jan 2013 19:21:52 +0100
Subject: Re: Using glabel

On Sun, 13 Jan 2013 16:30:43 +0100, Warren Block <wblock@wonkity.com>
wrote:

> On Sun, 13 Jan 2013, kpneal@pobox.com wrote:
>
>>> You can use glabel to label your disks or
partition the disks with gpart
>>> (using the GPT scheme) and let gpt put a label on each (-l flag).
>>
>> Don't use glabel pretty much ever. It stores the label inside the
>> partition (or disk). If the end of the partition is ever touched
>> then the label goes *poof*. Stick to gpt labels.
>
> If you label a partition, the label device will be one block smaller
> in size. The metadata is hidden and safe, as long as it is accessed
> through the label device.
>
> [...]
>
> Anyway, GPT labels are still preferable to glabel because they can be
> created at the same time as partitions and don't use any extra
> metadata.
>
> ZFS has its own metadata, and newer versions are supposed to leave
> the last megabyte or so unused to allow for actual versus nominal
> disk sizes. I'm not clear whether there's a good reason to use
> additional labels instead of just giving ZFS the whole disk. Unless
> you aren't planning on using the whole disk for ZFS, of course.

GPT labels are also portable between different OSes. If you ever want
to import your pool with OpenSolaris or the more recent forks of
OpenSolaris, they will understand GPT. Glabel is FreeBSD-only.

Ronald.
From owner-freebsd-fs@FreeBSD.ORG Sun Jan 13 21:28:46 2013
From: Chris Ross <cross+freebsd@distal.com>
To: freebsd-fs@freebsd.org
Cc: "freebsd-sparc64@freebsd.org"
Date: Sun, 13 Jan 2013 16:28:43 -0500
Subject: ZFS loader crash on sparc64 (since Oct 2012)
Message-Id: <4031F492-C30C-4F5D-BF8E-B2D61FFD0EAD@distal.com>

Since this is a loader crash related to ZFS, someone wisely pointed out
that posting to freebsd-fs might be a good idea. There's a long thread
on the freebsd-sparc64 list about this issue, which you can find at:

http://list-archives.org/2012/12/23/freebsd-sparc64-freebsd-org/changes-to-kern-geom-debugflags/f/4758203564

But the relevant details are that I determined that stable/9 changed
for sparc64 on October 28, with revision 242230.
This was noted as a merge by avg for revision 241289, and appears to be
part of a bunch of changes he made on October 6 (revs 241282-241294,
plus some others nearby).

I'm interested in getting this fixed, so I can build a bootloader that
doesn't cause my sparc64 to hit a divide-by-zero trap.

Please feel free to contact me with any questions about what I found.
(Most of it is in the freebsd-sparc64 thread linked to above, but I'm
happy to describe anything unclear or recount from memory.)

Thanks!

    - Chris

From owner-freebsd-fs@FreeBSD.ORG Sun Jan 13 22:22:47 2013
From: Eitan Adler <lists@eitanadler.com>
To: freebsd-fs@freebsd.org
Date: Sun, 13 Jan 2013 17:22:10 -0500
Subject: Re: What are the limits for FFS file systems and assorted questions

Can anyone provide an up-to-date answer for the following:

If these are all already perfect and correct, can you please tell me
so?

On 18 December 2012 23:13, Eitan Adler wrote:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/book.html#ffs-limits
> Are the bugs listed still bugs?
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/book.html#mount-foreign-fs
> Is this completely true? Should it be updated?
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/book.html#alternate-directory-layout
> Does this still deserve to be listed as a FAQ?
--
Eitan Adler

From owner-freebsd-fs@FreeBSD.ORG Sun Jan 13 23:52:47 2013
From: Nicolas Rachinsky <nicolas@i.0x5.de>
To: Steven Hartland
Cc: freebsd-fs
Date: Mon, 14 Jan 2013 00:52:39 +0100
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <20130113235239.GA16318@mid.pc5.i.0x5.de>

* Steven Hartland [2013-01-11 13:58 -0000]:
> TBH looks like you're just saturating your disks with the number of
> IOPs you're doing.

But now a backup takes forever (16 hours and more) that took less than
30 minutes two weeks ago.

I duplicated the complete setup to another (slower) server. There the
backups are slower than they originally were on this machine, but they
are much faster than backups on this machine are now.
Nicolas

--
http://www.rachinsky.de/nicolas

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 00:05:36 2013
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Nicolas Rachinsky"
Cc: freebsd-fs
Date: Mon, 14 Jan 2013 00:05:54 -0000
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <778BCD159C6546A4ADC7D9FBBFC3DA8A@multiplay.co.uk>

----- Original Message
-----
From: "Nicolas Rachinsky"
To: "Steven Hartland"
Cc: "freebsd-fs"
Sent: Sunday, January 13, 2013 11:52 PM
Subject: Re: slowdown of zfs (tx->tx)

> * Steven Hartland [2013-01-11 13:58 -0000]:
>> TBH looks like you're just saturating your disks with the number of
>> IOPs you're doing.
>
> But now a backup takes forever (16 hours and more) that took less
> than 30 minutes two weeks ago.
>
> I duplicated the complete setup to another (slower) server. There the
> backups are slower than they were on this machine, but they are much
> faster than on this machine.

It's not something silly like having 4k disks which aren't 4k-aligned,
is it? IIRC you were using rsync; is there a reason why you aren't
using zfs send/recv?

Regards
Steve
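The 4k-alignment question above can be checked mechanically. A minimal sketch (an editor's illustration, not from the thread), assuming the partition start is reported in 512-byte sectors as gpart does:

```python
# Sketch (not from the thread): a partition on a 4k-sector ("Advanced
# Format") drive is aligned when its starting offset, counted in
# 512-byte sectors, is a multiple of 8 (8 * 512 = 4096 bytes).
def is_4k_aligned(start_sector_512):
    return start_sector_512 % 8 == 0

print(is_4k_aligned(63))    # False: the classic MBR offset is misaligned
print(is_4k_aligned(2048))  # True: a 1 MiB offset is aligned
```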
From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 02:38:01 2013
From: Bruce Evans <brde@optusnet.com.au>
To: Eitan Adler
Cc: freebsd-fs@FreeBSD.org
Date: Mon, 14 Jan 2013 13:37:45 +1100 (EST)
Subject: Re: What are the limits for FFS file systems and assorted questions
Message-ID: <20130114120607.T1405@besplex.bde.org>

On Sun, 13 Jan 2013, Eitan Adler
wrote:

> Can anyone provide an up to date answer for the following:
>
> If these are all already perfect and correct can you please tell me
> so?
>
> On 18 December 2012 23:13, Eitan Adler wrote:
>
>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/book.html#ffs-limits
>> Are the bugs listed still bugs?

This is almost useless, since it pre-dates ffs2. It seems to be derived
from something I wrote in a mailing list. The "Should Work" column in
the table is now implemented, but it is only for ffs1 and is buggy for
a block size of 8K (the limit should be 16TB, not 32TB). The wording of
the descriptions could be improved.

All related known bugs for ffs1, including the ones described there,
were fixed 5-15 years ago. But recent work on ext2fs showed a new one
-- a very minor one that only recently became reachable, and one that
has been fixed in Linux ext2fs: there is a block count (di_nblocks in
ffs[1-2]) that is only 32 bits in ffs1 and in ext2fs (actually it only
has 31 bits in ffs1 and in FreeBSD-ext2fs, since it is signed). Fs
block numbers in these fs's are also 32 (or 31) bits, but this block
counter doesn't suffice for counting them because it has units of
512-byte blocks while fs block numbers have larger units. When this
block counter overflows, the only (?) thing broken is st_nblocks in
stat(2).

One way of fixing this is to limit the file size to 1TB - 1. This would
also simplify describing the limit. This is only a serious restriction
for sparse files. With the default block size of 32K, ffs1 can only
handle file systems of size 64TB. It can only handle 1 non-sparse file
of size nearly 64TB, or 63 non-sparse files of size 1TB-1. It is now
barely reasonable to have non-sparse files of these sizes, but systems
with such files probably wouldn't be using ffs1.

Sparse files are more interesting. You can fit a large number of sparse
files of size 64TB-1 on a file system of size just a few GB, and also
write them in less than a day or two.
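The 1TB and 64TB figures above follow from simple arithmetic. A sketch of that arithmetic (an editor's illustration, not part of the mail), assuming the signed 32-bit di_nblocks counts 512-byte units and ffs1 block numbers likewise have 31 usable bits:

```python
# Sketch (illustration only): limits implied by signed 32-bit counters.
DEV_BSIZE = 512                  # di_nblocks counts 512-byte units
max_count = 2**31 - 1            # 31 usable bits, since the field is signed

# st_nblocks stops being trustworthy just below 1 TB of physical blocks:
overflow_bytes = max_count * DEV_BSIZE
assert overflow_bytes == 2**40 - 512      # one sector short of 1 TB

# With 31-bit fs block numbers and the default 32K block size, the file
# system itself tops out just under 64 TB -- the "64TB" figure above:
fs_limit_bytes = max_count * 32768
assert fs_limit_bytes == 2**46 - 32768    # one block short of 64 TB
```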
Also, the potentially-overflowing block counter is for physical blocks,
so it can't overflow for fairly sparse files. Thus restricting the file
size to 1TB-1 would break some cases unnecessarily.

ffs2 generally gives much larger limits for file system sizes but
halves the limits for file sizes (since block numbers are twice as
large, the block size must be twice as large to fit the same number of
block numbers in an indirect block).

>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/book.html#mount-foreign-fs
>> Is this completely true? Should it be updated?

This doesn't give much detail, so there is less to go wrong in it.

>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/book.html#alternate-directory-layout
>> Does this still deserve to be listed as a FAQ?

I think it never did, since it is about a technical problem that can't
really be solved outside of the file system, especially with today's
disk sizes allowing tens if not thousands of times as many files as
when it was written in 1998, or thousands if not millions of times as
many files as when ffs was written in ~1983. With millions of files,
you just can't make much difference with a few changes to the directory
layout. It was written by mckusick in 1998, so it is also out of date
with respect to the better layout policies that he implemented in ffs
in 2001.

BTW, cp(1) still has bogus sorting related to this. It sorts files so
that non-directory files are copied before directory files, because it
knows too much about ffs's internals and about ffs being the only file
system. Perhaps this is still good if the file system is ffs, but I
think it is better to preserve any existing order that you get from the
command line or from a directory traversal (use fts and specify pre- or
post-order). But the sorting function is of low quality and tends to
destroy any existing order:
- it uses qsort(), which gives an unstable sort for items that compare
  equal
- everything except directories vs non-directories compares equal.
The result is that if you have a perfectly sorted list on the command
line, say consisting of all regular files in alphabetical order, then
the order is very unstable. Except the instability is very stable -- it
is usually close to a perfect inversion of the order. Anyway, this
instability makes it impossible either to preserve existing orders in
file hierarchies or to specify optimal orders on the command line.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 09:40:13 2013
From: Nicolas Rachinsky <nicolas@i.0x5.de>
To: Artem Belevich
Cc: freebsd-fs
Date: Mon, 14 Jan 2013 10:40:10 +0100
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <20130114094010.GA75529@mid.pc5.i.0x5.de>
* Artem Belevich [2013-01-11 12:39 -0800]:
> On Thu, Jan 10, 2013 at 11:34 PM, Nicolas Rachinsky wrote:
> > * Nicolas Rachinsky [2013-01-10 20:39 +0100]:
> >> after replacing one of the controllers, all problems seem to have
> >> disappeared. Thank you very much for your advice!
> >
> > Now the problem is back.
> >
> > After changing the controller, there were no more timeouts logged.
> >
> > No UDMA_CRC_Error_Count changed.
>
> Is there anything special about ada8? It does seem to have noticeably
> higher service time compared to other disks.

Nothing I know of. The disks are Samsung HD103UJ and HD103SI, multiple
of each type.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 100 100 051 Pre-fail Always - 0
  3 Spin_Up_Time            0x0007 073 073 011 Pre-fail Always - 8890
  4 Start_Stop_Count        0x0032 100 100 000 Old_age  Always - 32
  5 Reallocated_Sector_Ct   0x0033 094 094 010 Pre-fail Always - 166
  7 Seek_Error_Rate         0x000f 100 100 051 Pre-fail Always - 0
  8 Seek_Time_Performance   0x0025 100 100 015 Pre-fail Offline - 10872
  9 Power_On_Hours          0x0032 099 099 000 Old_age  Always - 5688
 10 Spin_Retry_Count        0x0033 100 100 051 Pre-fail Always - 0
 11 Calibration_Retry_Count 0x0012 100 100 000 Old_age  Always - 0
 12 Power_Cycle_Count       0x0032 100 100 000 Old_age  Always - 31
 13 Read_Soft_Error_Rate    0x000e 100 100 000 Old_age  Always - 0
183 Runtime_Bad_Block       0x0032 100 100 000 Old_age  Always - 0
184 End-to-End_Error        0x0033 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect      0x0032 100 100 000 Old_age  Always - 0
188 Command_Timeout         0x0032 100 100 000 Old_age  Always - 0
190 Airflow_Temperature_Cel 0x0022 078 069 000 Old_age  Always - 22 (Min/Max 21/25)
194 Temperature_Celsius     0x0022 077 067 000 Old_age  Always - 23 (Min/Max 21/26)
195 Hardware_ECC_Recovered  0x001a 100 100 000
Old_age Always - 1259614646
196 Reallocated_Event_Count 0x0032 096 096 000 Old_age  Always - 166
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 0
198 Offline_Uncorrectable   0x0030 100 100 000 Old_age  Offline - 0
199 UDMA_CRC_Error_Count    0x003e 100 100 000 Old_age  Always - 0
200 Multi_Zone_Error_Rate   0x000a 100 099 000 Old_age  Always - 5
201 Soft_Read_Error_Rate    0x000a 100 100 000 Old_age  Always - 0

Reallocated_Sector_Ct did not increase during the last days.

> Could you do gstat with a 1-second interval? Some of the 5-second
> samples show that ada8 is the bottleneck -- it has its request queue
> full (L(q)=10) when all other drives were done with their jobs. And
> that's a 5-sec average. Its write service time also seems to be a lot
> higher than for other drives.

Attached. I have replaced ada8 with ada9, which is a Western Digital
Caviar Black. Now ada0 and ada4 seem to be the bottleneck. But I don't
understand the intervals without any disk activity.

> Does the drive have its write cache disabled by any chance? That
> could explain why it takes so much longer to service writes.

No, camcontrol identify says it's enabled.

> Can you remove ada8 and see if your performance goes back to normal?

The problem still persists.

Thank you for your help!
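The eyeball analysis described above (a queue pinned at L(q)=10, outsized write service times) can be sketched as a small filter. This is an editor's illustration, not a tool from the thread, using rows taken from one sample in the attached gstat output:

```python
# Toy sketch (illustration only): flag devices that look like the
# bottleneck in a gstat sample -- a pinned request queue (L(q) >= 10)
# or a write service time far above the sample median.
from statistics import median

def parse_row(line):
    # gstat row layout: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
    f = line.split()
    return {"lq": int(f[0]), "ms_w": float(f[7]), "name": f[9]}

sample = [  # rows from one busy sample in the attachment
    "10 429 0 0 0.0 429 997 2.5 18.1 ada0",
    "0 458 21 27 1.2 437 994 3.1 17.7 ada1",
    "0 406 0 0 0.0 406 988 2.6 14.6 ada2",
    "10 335 0 0 0.0 335 938 4.1 22.9 ada4",
]
rows = [parse_row(l) for l in sample]
med = median(r["ms_w"] for r in rows)
suspects = [r["name"] for r in rows if r["lq"] >= 10 or r["ms_w"] > 3 * med]
print(suspects)  # ['ada0', 'ada4'] -- the drives suspected above
```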
Nicolas

--
http://www.rachinsky.de/nicolas

[Attachment: gstat.txt]

dT: 1.001s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ad4
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b
0 0 0 0 0.0 0 0 0.0 0.0 ad6
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1
0 0 0 0 0.0 0 0 0.0 0.0 ada0
0 0 0 0 0.0 0 0 0.0 0.0 ada1
0 0 0 0 0.0 0 0 0.0 0.0 ada2
0 0 0 0 0.0 0 0 0.0 0.0 ada3
0 0 0 0 0.0 0 0 0.0 0.0 ada4
0 0 0 0 0.0 0 0 0.0 0.0 ada5
0 0 0 0 0.0 0 0 0.0 0.0 ada6
0 0 0 0 0.0 0 0 0.0 0.0 ada7
0 0 0 0 0.0 0 0 0.0 0.0 ada8
0 0 0 0 0.0 0 0 0.0 0.0 ada9
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal

dT: 1.001s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ad4
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b
0 0 0 0 0.0 0 0 0.0 0.0 ad6
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1
0 46 0 0 0.0 46 33 0.9 0.9 ada0
0 47 0 0 0.0 47 32 3.3 1.9 ada1
0 47 0 0 0.0 47 32 3.3 1.9 ada2
0 47 0 0 0.0 47 32 3.3 1.9 ada3
0 49 0 0 0.0 49 33 0.9 0.9 ada4
0 49 0 0 0.0 49 33 3.6 2.1 ada5
0 46 0 0 0.0 46 33 1.0 0.9 ada6
0 46 0 0 0.0 46 33 0.8 0.8 ada7
0 0 0 0 0.0 0 0 0.0 0.0 ada8
0 49 0 0 0.0 49 33 0.8 0.8 ada9
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal

dT: 1.001s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ad4
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b
0 0 0 0 0.0 0 0 0.0 0.0 ad6
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1
10 429 0 0 0.0 429 997 2.5 18.1 ada0
0 458 21 27 1.2 437 994 3.1 17.7 ada1
0 406 0 0 0.0 406
988 2.6 14.6 ada2
0 427 0 0 0.0 427 989 2.0 12.5 ada3
10 335 0 0 0.0 335 938 4.1 22.9 ada4
0 419 0 0 0.0 419 990 2.1 11.9 ada5
0 434 0 0 0.0 434 1005 2.1 13.1 ada6
0 486 25 133 5.2 461 1006 1.4 12.1 ada7
0 0 0 0 0.0 0 0 0.0 0.0 ada8
0 441 20 35 6.2 421 994 1.4 13.1 ada9
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal

dT: 1.002s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ad4
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b
0 0 0 0 0.0 0 0 0.0 0.0 ad6
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1
0 275 0 0 0.0 274 334 12.7 56.6 ada0
0 278 11 14 3.0 266 308 2.0 28.7 ada1
0 305 0 0 0.0 303 315 1.5 28.7 ada2
0 303 0 0 0.0 301 311 1.6 14.7 ada3
0 311 0 0 0.0 309 375 15.4 69.2 ada4
0 285 0 0 0.0 283 310 2.1 15.8 ada5
0 282 0 0 0.0 280 306 1.7 18.6 ada6
0 307 11 17 2.3 294 318 1.0 19.1 ada7
0 0 0 0 0.0 0 0 0.0 0.0 ada8
0 329 9 6 0.5 318 312 0.7 12.4 ada9
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal

dT: 1.000s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 0 0 0 0.0 0 0 0.0 0.0 ad4
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b
0 0 0 0 0.0 0 0 0.0 0.0 ad6
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1
0 0 0 0 0.0 0 0 0.0 0.0 ada0
0 0 0 0 0.0 0 0 0.0 0.0 ada1
0 0 0 0 0.0 0 0 0.0 0.0 ada2
0 0 0 0 0.0 0 0 0.0 0.0 ada3
0 0 0 0 0.0 0 0 0.0 0.0 ada4
0 0 0 0 0.0 0 0 0.0 0.0 ada5
0 0 0 0 0.0 0 0 0.0 0.0 ada6
0 0 0 0 0.0 0 0 0.0 0.0 ada7
0 0 0 0 0.0 0 0 0.0 0.0 ada8
0 0 0 0 0.0 0 0 0.0 0.0 ada9
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a
0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b
0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027
0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027
0 0 0 0 0.0 0 0 0.0 0.0
mirror/ROOT121027.journal dT: 1.000s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 23 7 7 0.2 16 21 1.3 0.6 ada0 0 29 15 39 7.8 14 20 1.6 3.6 ada1 0 17 2 2 0.2 15 20 1.4 0.6 ada2 0 16 2 2 0.2 14 19 1.4 0.6 ada3 0 19 5 5 0.2 14 19 1.2 0.5 ada4 0 19 5 5 0.2 14 19 1.2 0.5 ada5 0 23 7 7 0.2 16 22 1.2 0.6 ada6 0 29 13 9 0.2 16 20 1.1 0.6 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 28 14 42 11.8 14 21 1.0 11.5 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.001s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0.0 ada1 0 0 0 0 0.0 0 0 0.0 0.0 ada2 0 0 0 0 0.0 0 0 0.0 0.0 ada3 0 0 0 0 0.0 0 0 0.0 0.0 ada4 0 0 0 0 0.0 0 0 0.0 0.0 ada5 0 0 0 0 0.0 0 0 0.0 0.0 ada6 0 0 0 0 0.0 0 0 0.0 0.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 0 0 0 0.0 0 0 0.0 0.0 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.001s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0.0 ada1 0 0 0 0 0.0 0 0 0.0 0.0 ada2 0 0 0 0 0.0 0 0 0.0 0.0 ada3 0 0 0 0 0.0 0 0 0.0 0.0 ada4 0 0 0 0 0.0 0 0 0.0 0.0 ada5 0 0 0 0 0.0 0 0 0.0 0.0 ada6 0 0 0 0 0.0 0 0 
0.0 0.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 0 0 0 0.0 0 0 0.0 0.0 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.002s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 142 0 0 0.0 142 624 2.6 7.8 ada0 0 148 0 0 0.0 148 628 3.4 9.2 ada1 0 147 0 0 0.0 147 634 2.1 7.2 ada2 0 148 0 0 0.0 148 629 2.3 7.6 ada3 0 146 0 0 0.0 146 633 1.7 6.6 ada4 5 140 0 0 0.0 140 623 3.2 8.6 ada5 0 149 0 0 0.0 149 634 1.8 6.9 ada6 0 142 0 0 0.0 142 624 1.3 6.1 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 146 0 0 0.0 146 627 1.4 6.2 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.000s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 10 842 0 0 0.0 842 714 8.1 80.0 ada0 0 872 42 85 8.3 830 715 2.6 42.4 ada1 0 943 0 0 0.0 943 764 1.3 18.3 ada2 0 954 0 0 0.0 954 773 1.5 20.0 ada3 10 815 0 0 0.0 815 700 7.8 73.2 ada4 0 935 0 0 0.0 935 750 1.6 21.4 ada5 0 880 0 0 0.0 880 753 3.4 40.4 ada6 0 910 46 133 8.6 864 704 1.4 25.7 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 925 44 71 7.7 881 710 1.8 35.9 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.001s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 
0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 53 0 0 0.0 51 49 19.0 33.0 ada0 0 6 0 0 0.0 4 4 0.4 30.9 ada1 0 6 0 0 0.0 4 4 0.3 29.2 ada2 0 6 0 0 0.0 4 4 0.3 5.1 ada3 0 41 0 0 0.0 39 38 24.8 32.6 ada4 0 6 0 0 0.0 4 4 0.4 7.6 ada5 0 6 0 0 0.0 4 4 0.3 14.9 ada6 0 6 0 0 0.0 4 4 0.2 10.2 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 6 0 0 0.0 4 4 0.5 9.7 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.000s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0.0 ada1 0 0 0 0 0.0 0 0 0.0 0.0 ada2 0 0 0 0 0.0 0 0 0.0 0.0 ada3 0 0 0 0 0.0 0 0 0.0 0.0 ada4 0 0 0 0 0.0 0 0 0.0 0.0 ada5 0 0 0 0 0.0 0 0 0.0 0.0 ada6 0 0 0 0 0.0 0 0 0.0 0.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 0 0 0 0.0 0 0 0.0 0.0 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.000s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 19 6 6 3.4 13 19 1.5 2.6 ada0 0 28 14 41 12.1 14 21 1.7 10.8 ada1 0 15 1 1 8.7 14 21 1.4 1.4 ada2 0 15 1 1 0.2 14 21 1.6 0.6 ada3 0 19 5 5 4.4 14 21 1.5 2.8 ada4 0 19 5 5 3.7 14 21 1.7 2.4 ada5 0 20 6 6 2.7 14 21 1.5 2.2 ada6 0 26 13 9 7.9 13 18 1.4 4.4 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 27 14 9 17.8 13 20 1.4 10.3 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 
0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.002s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0.0 ada1 0 0 0 0 0.0 0 0 0.0 0.0 ada2 0 0 0 0 0.0 0 0 0.0 0.0 ada3 0 0 0 0 0.0 0 0 0.0 0.0 ada4 0 0 0 0 0.0 0 0 0.0 0.0 ada5 0 0 0 0 0.0 0 0 0.0 0.0 ada6 0 0 0 0 0.0 0 0 0.0 0.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 0 0 0 0.0 0 0 0.0 0.0 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.001s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0.0 ada1 0 0 0 0 0.0 0 0 0.0 0.0 ada2 0 0 0 0 0.0 0 0 0.0 0.0 ada3 0 0 0 0 0.0 0 0 0.0 0.0 ada4 0 0 0 0 0.0 0 0 0.0 0.0 ada5 0 0 0 0 0.0 0 0 0.0 0.0 ada6 0 0 0 0 0.0 0 0 0.0 0.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 0 0 0 0.0 0 0 0.0 0.0 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.001s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0.0 ada1 0 0 0 0 0.0 0 0 0.0 0.0 ada2 0 0 0 0 0.0 0 0 0.0 0.0 ada3 0 0 0 0 0.0 0 0 0.0 0.0 ada4 0 0 0 0 0.0 0 0 
0.0 0.0 ada5 0 0 0 0 0.0 0 0 0.0 0.0 ada6 0 0 0 0 0.0 0 0 0.0 0.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 0 0 0 0.0 0 0 0.0 0.0 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.001s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 0 0 0 0.0 0 0 0.0 0.0 ada0 0 0 0 0 0.0 0 0 0.0 0.0 ada1 0 0 0 0 0.0 0 0 0.0 0.0 ada2 0 0 0 0 0.0 0 0 0.0 0.0 ada3 0 0 0 0 0.0 0 0 0.0 0.0 ada4 0 0 0 0 0.0 0 0 0.0 0.0 ada5 0 0 0 0 0.0 0 0 0.0 0.0 ada6 0 0 0 0 0.0 0 0 0.0 0.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 0 0 0 0.0 0 0 0.0 0.0 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.002s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 87 0 0 0.0 87 481 3.0 4.2 ada0 0 88 0 0 0.0 88 477 3.8 4.9 ada1 0 92 0 0 0.0 92 483 2.8 4.1 ada2 0 89 0 0 0.0 89 477 2.3 3.5 ada3 0 100 0 0 0.0 100 480 1.8 3.4 ada4 0 100 0 0 0.0 100 475 3.6 5.4 ada5 0 87 0 0 0.0 87 483 1.8 3.2 ada6 0 89 0 0 0.0 89 480 1.6 3.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 100 0 0 0.0 100 474 1.5 3.1 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 6 0 0 0.0 6 68 0.2 0.0 mirror/ROOT121027.journal dT: 1.001s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 
0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 10 449 0 0 0.0 449 400 9.0 49.6 ada0 0 477 26 68 4.3 451 418 2.6 21.9 ada1 0 469 0 0 0.0 469 417 1.7 11.2 ada2 0 466 0 0 0.0 466 412 1.7 10.9 ada3 10 378 0 0 0.0 378 305 9.3 43.0 ada4 0 475 0 0 0.0 475 414 1.8 12.1 ada5 0 430 0 0 0.0 430 407 8.7 41.3 ada6 0 511 27 35 8.0 484 423 1.2 14.2 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 505 26 49 10.4 479 416 0.7 17.8 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal dT: 1.002s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0 ad4 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 0 0 0 0.0 0 0 0.0 0.0 ad6 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1 0 139 0 0 0.0 137 155 6.4 24.4 ada0 0 138 9 4 1.1 127 131 1.8 27.3 ada1 0 153 0 0 0.0 151 137 1.7 22.6 ada2 0 150 0 0 0.0 148 135 1.3 11.2 ada3 0 140 0 0 0.0 138 237 15.8 41.0 ada4 0 141 0 0 0.0 139 134 1.3 14.1 ada5 0 129 0 0 0.0 127 128 1.4 16.2 ada6 0 158 5 2 0.7 151 128 0.5 15.0 ada7 0 0 0 0 0.0 0 0 0.0 0.0 ada8 0 153 5 2 11.9 146 138 0.6 19.9 ada9 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad6s1b 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/SWAP121027 0 0 0 0 0.0 0 0 0.0 0.0 mirror/ROOT121027.journal --6TrnltStXW4iwmi0-- From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 09:43:46 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BA7B2F3A for ; Mon, 14 Jan 2013 09:43:46 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id 6FB07F94 for ; Mon, 14 Jan 2013 09:43:46 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 
3Yl8rd512Vz7ySF; Mon, 14 Jan 2013 10:43:45 +0100 (CET)
Date: Mon, 14 Jan 2013 10:43:45 +0100
From: Nicolas Rachinsky
To: Steven Hartland
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <20130114094345.GB75529@mid.pc5.i.0x5.de>
References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <20130109162613.GA34276@mid.pc5.i.0x5.de> <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111111147.GA34160@mid.pc5.i.0x5.de> <20130113235239.GA16318@mid.pc5.i.0x5.de> <778BCD159C6546A4ADC7D9FBBFC3DA8A@multiplay.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <778BCD159C6546A4ADC7D9FBBFC3DA8A@multiplay.co.uk>
X-Powered-by: FreeBSD
X-Homepage: http://www.rachinsky.de
X-PGP-Keyid: 887BAE72
X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72
X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 14 Jan 2013 09:43:46 -0000

* Steven Hartland [2013-01-14 00:05 -0000]:
> ----- Original Message ----- From: "Nicolas Rachinsky"
> To: "Steven Hartland"
> Cc: "freebsd-fs"
> Sent: Sunday, January 13, 2013 11:52 PM
> Subject: Re: slowdown of zfs (tx->tx)
>
> >* Steven Hartland [2013-01-11 13:58 -0000]:
> >>TBH it looks like you're just saturating your disks with the number
> >>of IOPs you're doing.
> >
> >But now a backup takes forever (16 hours and more) that took less than
> >30 minutes two weeks ago.
> >
> >I duplicated the complete setup to another (slower) server. There the
> >backups are slower than they were on this machine, but they are now
> >much faster than on this machine.
>
> It's not something silly like having 4k disks that aren't 4k-aligned,
> is it?

No, these are 512 bytes/sector disks. And it became this slow without any change (hardware or software).

> IIRC you were using rsync; is there a reason why you aren't using
> zfs send/recv?

Most of the machines that are backed up to this machine run Linux and don't have ZFS.

Nicolas
--
http://www.rachinsky.de/nicolas

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 10:58:41 2013
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id EC964B73 for ; Mon, 14 Jan 2013 10:58:41 +0000 (UTC) (envelope-from joh.hendriks@gmail.com)
Received: from mail-la0-f51.google.com (mail-la0-f51.google.com [209.85.215.51]) by mx1.freebsd.org (Postfix) with ESMTP id 76FA2307 for ; Mon, 14 Jan 2013 10:58:41 +0000 (UTC)
Received: by mail-la0-f51.google.com with SMTP id fj20so3698736lab.10 for ; Mon, 14 Jan 2013 02:58:40 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:cc:subject:references:in-reply-to:content-type; bh=3yQl+K7AqRR3ky63oHeeLRDoNxK7bqfmNqdZMlfWg7A=; b=aPwmF7lQ5mB91NH3WXS65qcErYVyy3+SLnc2ULfHf1rw737Cd1IuIpkAOgYSprsTue D8N65oOp3pwUok8Pqb3tB5Q0WKhiA/ficfk/I19mR1rpnFiIAPXqOvBXz39T/OscblCl I0HyXQHsZGKttymMkvmCbsyUZw4O6T4IDSzqulCYUqzZTg9XhWc96/zWaCBPVOvf86pN vBIc0SbkZz+pLNVKoFbt9+KFLFme3e702y5DW3LjNtzkQV9gvrw0B6Oy7Vcr3pXShRGW /HYDFw/tXxO3JOiK4Y0cJk2kOFnhaHskYaJCO4gpv7NpJsQNCnoOpMxRmbQLxelt7Hzt KhfQ==
X-Received: by 10.152.145.8 with SMTP id sq8mr80768163lab.21.1358161120058; Mon, 14 Jan 2013 02:58:40 -0800 (PST)
Received: from [192.168.50.105] (double-l.xs4all.nl.
[80.126.205.144]) by mx.google.com with ESMTPS id ox6sm5029619lab.16.2013.01.14.02.58.38 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 14 Jan 2013 02:58:39 -0800 (PST)
Message-ID: <50F3E4DC.8030704@gmail.com>
Date: Mon, 14 Jan 2013 11:58:36 +0100
From: Johan Hendriks
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: Nicolas Rachinsky
Subject: Re: slowdown of zfs (tx->tx)
References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <20130109162613.GA34276@mid.pc5.i.0x5.de> <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de>
In-Reply-To: <20130114094010.GA75529@mid.pc5.i.0x5.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 14 Jan 2013 10:58:42 -0000

Nicolas Rachinsky wrote:
> * Artem Belevich [2013-01-11 12:39 -0800]:
>> On Thu, Jan 10, 2013 at 11:34 PM, Nicolas Rachinsky wrote:
>>> * Nicolas Rachinsky [2013-01-10 20:39 +0100]:
>>>> after replacing one of the controllers, all problems seem to have
>>>> disappeared. Thank you very much for your advice!
>>> Now the problem is back.
>>>
>>> After changing the controller, there were no more timeouts logged.
>>>
>>> No UDMA_CRC_Error_Count changed.
>>>
>> Is there anything special about ada8? It does seem to have noticeably
>> higher service time compared to other disks.
> Nothing I know of. The disks are Samsung HD103UJ and HD103SI, multiple
> of each type.
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f 100 100 051 Pre-fail Always  - 0
>   3 Spin_Up_Time            0x0007 073 073 011 Pre-fail Always  - 8890
>   4 Start_Stop_Count        0x0032 100 100 000 Old_age  Always  - 32
>   5 Reallocated_Sector_Ct   0x0033 094 094 010 Pre-fail Always  - 166
>   7 Seek_Error_Rate         0x000f 100 100 051 Pre-fail Always  - 0
>   8 Seek_Time_Performance   0x0025 100 100 015 Pre-fail Offline - 10872
>   9 Power_On_Hours          0x0032 099 099 000 Old_age  Always  - 5688
>  10 Spin_Retry_Count        0x0033 100 100 051 Pre-fail Always  - 0
>  11 Calibration_Retry_Count 0x0012 100 100 000 Old_age  Always  - 0
>  12 Power_Cycle_Count       0x0032 100 100 000 Old_age  Always  - 31
>  13 Read_Soft_Error_Rate    0x000e 100 100 000 Old_age  Always  - 0
> 183 Runtime_Bad_Block       0x0032 100 100 000 Old_age  Always  - 0
> 184 End-to-End_Error        0x0033 100 100 000 Pre-fail Always  - 0
> 187 Reported_Uncorrect      0x0032 100 100 000 Old_age  Always  - 0
> 188 Command_Timeout         0x0032 100 100 000 Old_age  Always  - 0
> 190 Airflow_Temperature_Cel 0x0022 078 069 000 Old_age  Always  - 22 (Min/Max 21/25)
> 194 Temperature_Celsius     0x0022 077 067 000 Old_age  Always  - 23 (Min/Max 21/26)
> 195 Hardware_ECC_Recovered  0x001a 100 100 000 Old_age  Always  - 1259614646
> 196 Reallocated_Event_Count 0x0032 096 096 000 Old_age  Always  - 166
> 197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always  - 0
> 198 Offline_Uncorrectable   0x0030 100 100 000 Old_age  Offline - 0
> 199 UDMA_CRC_Error_Count    0x003e 100 100 000 Old_age  Always  - 0
> 200 Multi_Zone_Error_Rate   0x000a 100 099 000 Old_age  Always  - 5
> 201 Soft_Read_Error_Rate    0x000a 100 100 000 Old_age  Always  - 0
>
> Reallocated_Sector_Ct did not increase during the last days.
>
>> Could you do gstat with a 1-second interval.
>> Some of the 5-second samples show that ada8 is the bottleneck -- it has its
>> request queue full (L(q)=10) when all other drives were done with their
>> jobs. And that's a 5-sec average. Its write service time also seems to be
>> a lot higher than for other drives.
> Attached. I have replaced ada8 with ada9, which is a Western Digital
> Caviar Black.
>
> Now ada0 and ada4 seem to be the bottleneck.
>
> But I don't understand the intervals without any disk activity.
>
>> Does the drive have its write cache disabled by any chance? That could
>> explain why it takes so much longer to service writes.
> No, camcontrol identify says it's enabled.
>
>> Can you remove ada8 and see if your performance goes back to normal?
> The problem still persists.
>
> Thank you for your help!
>
> Nicolas
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

Could it be that something else is occupying the pool? I had to disable a security check run by periodic:

daily_status_security_neggrpperm_enable="NO"

After I disabled that check, my pool was performing normally again. If you do not have many snapshots it is no problem, but with a lot of snapshots this check stalls the pool.
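The bottleneck hunting Artem describes above (a drive with a persistently full request queue or an outsized %busy while its peers sit idle) can be mechanized with a small filter over gstat's plain-text rows. This is an illustrative sketch only, not something from the thread: the function name `find_bottlenecks` and the thresholds are invented for the example.

```python
# Hypothetical helper: scan gstat-style rows and flag devices whose
# queue depth (L(q)) or %busy suggests saturation, the way ada0/ada4
# stand out in the attached gstat.txt. Thresholds are arbitrary.

def find_bottlenecks(sample_lines, max_queue=8, max_busy=40.0):
    """Return device names whose L(q) or %busy exceeds the thresholds.

    Each data line is expected to look like a gstat row:
    L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
    """
    hot = []
    for line in sample_lines:
        fields = line.split()
        if len(fields) != 10:
            continue  # skip malformed rows
        try:
            lq = int(fields[0])       # queue length
            busy = float(fields[8])   # %busy
        except ValueError:
            continue  # header line ("L(q) ops/s ...") lands here
        if lq >= max_queue or busy >= max_busy:
            hot.append(fields[9])     # device name
    return hot

# Rows taken from one of the heavy samples in the attachment:
sample = [
    "L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name",
    "10 842 0 0 0.0 842 714 8.1 80.0 ada0",
    "0 872 42 85 8.3 830 715 2.6 42.4 ada1",
    "0 943 0 0 0.0 943 764 1.3 18.3 ada2",
    "10 815 0 0 0.0 815 700 7.8 73.2 ada4",
]
print(find_bottlenecks(sample))  # ['ada0', 'ada1', 'ada4']
```

ada0 and ada4 trip the queue-depth test (L(q)=10) and ada1 the %busy test, matching the drives Artem singles out by eye.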
gr
Johan

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 11:06:46 2013
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 56FFA48E for ; Mon, 14 Jan 2013 11:06:46 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 3ACD463A for ; Mon, 14 Jan 2013 11:06:46 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r0EB6k3w086371 for ; Mon, 14 Jan 2013 11:06:46 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r0EB6jBM086369 for freebsd-fs@FreeBSD.org; Mon, 14 Jan 2013 11:06:45 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 14 Jan 2013 11:06:45 GMT
Message-Id: <201301141106.r0EB6jBM086369@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 14 Jan 2013 11:06:46 -0000

Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker     Resp. Description
--------------------------------------------------------------------------------
o kern/175179 fs [zfs] ZFS may attach wrong device on move
o kern/175101 fs [zfs] [nfs] ZFS NFSv4 ACL's allows user without perm t
o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov
o kern/174950 fs [zfs] delete ZFS ACL have no effect
o kern/174949 fs [zfs] ZFS ACL: rwxp required to mkdir. p should not be
o kern/174948 fs [zfs] owner@ always have ZFS ACL full permissions. Sho
o kern/174372 fs [zfs] Pagefault appears to be related to ZFS
o kern/174315 fs [zfs] chflags uchg not supported
o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi
o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption
o kern/174060 fs [ext2fs] Ext2FS system crashes (buffer overflow?)
o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio
o kern/173718 fs [zfs] phantom directory in zraid2 pool
f kern/173657 fs [nfs] strange UID map with nfsuserd
o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo
o kern/173136 fs [unionfs] mounting above the NFS read-only share panic
o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly
o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus
o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz
o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental
o kern/170945 fs [gpt] disk layout not portable between direct connect
o bin/170778  fs [zfs] [panic] FreeBSD panics randomly
o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA
o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted
o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte
o kern/169480 fs [zfs] ZFS stalls on heavy I/O
o kern/169398 fs [zfs] Can't remove file with permanent error
o kern/169339 fs panic while " : > /etc/123"
o kern/169319 fs [zfs] zfs resilver can't complete
o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when
o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688 fs [fusefs] Incorrect signal handling with direct_io
o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612 fs [portalfs] The portal file system gets stuck inside po
o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067 fs [zfs] [panic] ZFS panics the server
o kern/167065 fs [zfs] boot fails when a spare is the boot disk
o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di
o kern/166477 fs [nfs] NFS data corruption.
o kern/165950 fs [ffs] SU+J and fsck problem
o kern/165923 fs [nfs] Writing to NFS-backed mmapped files fails if flu
o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392 fs Multiple mkdir/rmdir fails with errno 31
o kern/165087 fs [unionfs] lock violation in unionfs
o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency
o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256 fs [zfs] device entry for volume is not created after zfs
o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to
o kern/162944 fs [coda] Coda file system module looks broken in 9.0
o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751 fs [zfs] [panic] kernel panics during file operations
o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161864 fs [ufs] removing journaling from UFS partition fails on
o bin/161807  fs [patch] add option for explicitly specifying metadata
o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280 fs [zfs] Stack overflow in gptzfsboot
o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J
o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930 fs [ufs] [panic] kernel core
o kern/159402 fs [zfs][loader] symlinks cause I/O errors
o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs()
o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077 fs [zfs] Can't cd .. with latest zfs version
o kern/159048 fs [smbfs] smb mount corrupts large files
o kern/159045 fs [zfs] [hang] ZFS scrub freezes system
o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802 fs amd(8) ICMP storm and unkillable process.
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929 fs [nfs] NFS slow read
o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781 fs [zfs] zfs is losing the snapshot directory,
p kern/156545 fs [ufs] mv could break UFS on SMP systems
o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current
o kern/155587 fs [zfs] [panic] kernel panic with zfs
p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104  fs [zfs][patch] use /dev prefix by default when importing
o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828 fs [msdosfs] Unable to create directories on external USB
o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228 fs [md] md getting stuck in wdrain state
o kern/153996 fs [zfs] zfs root mount error while kernel is not located
o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716 fs [zfs] zpool scrub time remaining is incorrect
o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions
o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351 fs [zfs] locking directories/files in ZFS
o bin/153258  fs [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w
o bin/153142  fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support
o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022 fs [nfs] nfs service hangs with linux client [regression]
o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory
o kern/151905 fs [zfs] page fault under load in /sbin/zfs
o bin/151713  fs [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648 fs [zfs] disk wait bug
o kern/151629 fs [fs] [patch] Skip empty directory entries during name
o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a
o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251 fs [ufs] Can not create files on filesystem with heavy us
o kern/151226 fs [zfs] can't delete zfs snapshot
o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208 fs mksnap_ffs(8) hang/deadlock
o kern/149173 fs [patch] [zfs] make OpenSolaris installa
o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138 fs [zfs] zfs raidz pool commands freeze
o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786 fs [zfs] zpool import hangs with checksum errors
o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528 fs [zfs] Severe memory leak in ZFS on i386
o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309  fs bsdlabel: Editing disk label invalidates the whole dev
o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank
o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189 fs [nfs] nfsd performs abysmally under load
o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416 fs [panic] Kernel panic on online filesystem optimization
s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825 fs [nfs] [panic] Kernel panic on NFS client
o bin/143572  fs [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212 fs [nfs] NFSv4 client strange work ...
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR
o kern/142878 fs [zfs] [vfs] lock order reversal
o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real
o kern/142489 fs [zfs] [lor] allproc/zfs LOR
o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068 fs [ufs] BSD labels are got deleted spontaneously
o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri
o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640 fs [zfs] snapshot crash
o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs
p bin/139651  fs [nfs] mount(8): read-only remount of NFS volume does n
o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot
o kern/138662 fs [panic] ffs_blkfree: freeing free block
o kern/138421 fs [ufs] [patch] remove UFS label limitations
o kern/138202 fs mount_msdosfs(1) see only 2Gb
o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873 fs [ntfs] Missing directories/files on NTFS volume
o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS
o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot
o kern/134491 fs [zfs] Hot spares are rather cold...
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style 
changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 
fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 300 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 19:13:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 48135DA5 for ; Mon, 14 Jan 2013 19:13:42 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-vb0-f47.google.com (mail-vb0-f47.google.com [209.85.212.47]) by mx1.freebsd.org (Postfix) with ESMTP id F04751C9 for ; Mon, 14 Jan 2013 19:13:41 +0000 (UTC) Received: by mail-vb0-f47.google.com with SMTP id e21so3912475vbm.34 for ; Mon, 14 Jan 2013 11:13:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=pZo+nyGb8Coy3KWiu9wDGHcznmwlSyv8uWLJMzxP8mo=; b=mq5NNJ+pWNc2HMqdufIE0bqP3uMKcLUCoBhDsyVVTRtmGDAGGj2ru4eE7C1unorfRk KX4G1HuZPMqgtpPW79SfU0VEqxUpZgN9Dc1nTkLGh2cb8KcfhsKf90v5wA6Z7WvtOYSY VKJ1E1j8dKVHLuUIkEZyV0WHiS1LqqHlsXJ8jEBa9L1ivNVWIfvH4wMpLKeG08ZOeg8D Wi+qiRoqpKTuJk/DtXQSD0fCDiInQc4w2Vqy8GR51TmBltQVOPu3onf/1KZxRnSnVXUb A8lmkUUTSp2gzCi1pba/KKWHav6IQAzZ7LIWfVMZwK0zwOeQaC9VAiErATYYgmO+blNV BKxw== MIME-Version: 1.0 Received: by 10.52.180.200 with SMTP id dq8mr89384491vdc.71.1358190820894; Mon, 14 Jan 2013 11:13:40 -0800 (PST) Sender: artemb@gmail.com Received: by 10.220.122.196 with HTTP; Mon, 14 Jan 2013 11:13:40 -0800 (PST) In-Reply-To: <20130114094010.GA75529@mid.pc5.i.0x5.de> References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <20130109162613.GA34276@mid.pc5.i.0x5.de> <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> Date: Mon, 14 Jan 2013 11:13:40 -0800 X-Google-Sender-Auth: wj3keMDjo9kBGkdBzj1W7RwB6V0 Message-ID: Subject: Re: slowdown of zfs (tx->tx) From: Artem Belevich To: Nicolas Rachinsky Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jan 2013 19:13:42 -0000

On Mon, Jan 14, 2013 at 1:40 AM, Nicolas Rachinsky wrote:
> 5 Reallocated_Sector_Ct 0x0033 094 094 010 Pre-fail Always - 166
> 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 1259614646
> 196 Reallocated_Event_Count 0x0032 096 096 000 Old_age Always - 166

> Reallocated_Sector_Ct did not increase during the last days.

It does not matter IMHO. That hard drive has already accumulated quite a few
bad sectors that ECC could not deal with. There are apparently more
marginally bad sectors, but ECC deals with them for now. Once enough
bits rot, you'll get more bad sectors. I personally would replace the
drive.

>> Cound you do gstat with 1-second interval. Some of the 5-second
>> samples show that ada8 is the bottleneck -- it has its request queue
>> full (L(q)=10) when all other drives were done with their jobs. And
>> that's a 5-sec average. Its write service time also seems to be a lot
>> higher than for other drives.
>
> Attached. I have replaced ada8 by ada9, which is a Western Digital
> Caviar Black.
>
> Now ada0 and ada4 seem to be the bottleneck.
>
> But I don't understand the intervals without any disk activity.

It is puzzling. Is rsync still sleeping in tx->tx state? Try running
"procstat -kk <pid>" periodically. It will print an in-kernel stack
trace and may help give a clue where/why rsync is stuck.
--Artem From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 19:37:16 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C2DCD731; Mon, 14 Jan 2013 19:37:16 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 0A59E34F; Mon, 14 Jan 2013 19:37:16 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id AFD43B95B; Mon, 14 Jan 2013 14:37:14 -0500 (EST) From: John Baldwin To: fs@freebsd.org Subject: [PATCH] Properly handle signals on interruptible NFS mounts Date: Mon, 14 Jan 2013 14:37:04 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201301141437.05040.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 14 Jan 2013 14:37:14 -0500 (EST) Cc: Rick Macklem , Doug Rabson X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jan 2013 19:37:16 -0000 When the new RPC layer was brought in, the RPC_INTR return value (to indicate an RPC request was interrupted by a signal) was not handled in the NFS client. As a result, if an NFS request is interrupted by a signal (on a mount with the "intr" option), then the nfs_request() functions would fall through to the default case and return EACCES rather than EINTR. While here, I noticed that the new RPC layer also lost all of the RPC statistics the old client used to keep (but that are still reported in 'nfsstat -c'). 
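The status-to-errno translation described above can be sketched as a small stand-alone function. The enum and function below are illustrative stand-ins mirroring the patched logic, not the kernel krpc API:

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-in for the krpc status codes; not the kernel enum. */
enum rpc_stat {
	RPC_SUCCESS,
	RPC_TIMEDOUT,
	RPC_VERSMISMATCH,
	RPC_PROGVERSMISMATCH,
	RPC_INTR,
	RPC_SOMEOTHERFAILURE
};

/* Translate an RPC status into an errno, as the patched client does. */
static int
stat_to_errno(enum rpc_stat stat)
{
	switch (stat) {
	case RPC_SUCCESS:
		return (0);
	case RPC_TIMEDOUT:
		return (ETIMEDOUT);
	case RPC_VERSMISMATCH:
		return (EOPNOTSUPP);
	case RPC_PROGVERSMISMATCH:
		return (EPROTONOSUPPORT);
	case RPC_INTR:
		/* The case the old code lost: request interrupted by a signal. */
		return (EINTR);
	default:
		/* Everything else still collapses to EACCES. */
		return (EACCES);
	}
}
```

Without the RPC_INTR arm, an interrupted request falls into the default case, which is exactly the EACCES-instead-of-EINTR symptom described above.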
I've added back as many of the statistics as I could, but retries are not
easy to do as only the RPC layer knows about them and not the NFS client.

Index: fs/nfs/nfs_commonkrpc.c
===================================================================
--- fs/nfs/nfs_commonkrpc.c	(revision 245225)
+++ fs/nfs/nfs_commonkrpc.c	(working copy)
@@ -767,12 +767,18 @@
 	if (stat == RPC_SUCCESS) {
 		error = 0;
 	} else if (stat == RPC_TIMEDOUT) {
+		NFSINCRGLOBAL(newnfsstats.rpctimeouts);
 		error = ETIMEDOUT;
 	} else if (stat == RPC_VERSMISMATCH) {
+		NFSINCRGLOBAL(newnfsstats.rpcinvalid);
 		error = EOPNOTSUPP;
 	} else if (stat == RPC_PROGVERSMISMATCH) {
+		NFSINCRGLOBAL(newnfsstats.rpcinvalid);
 		error = EPROTONOSUPPORT;
+	} else if (stat == RPC_INTR) {
+		error = EINTR;
 	} else {
+		NFSINCRGLOBAL(newnfsstats.rpcinvalid);
 		error = EACCES;
 	}
 	if (error) {
Index: nfsclient/nfs_krpc.c
===================================================================
--- nfsclient/nfs_krpc.c	(revision 245225)
+++ nfsclient/nfs_krpc.c	(working copy)
@@ -549,14 +549,21 @@
 	 */
 	if (stat == RPC_SUCCESS)
 		error = 0;
-	else if (stat == RPC_TIMEDOUT)
+	else if (stat == RPC_TIMEDOUT) {
+		nfsstats.rpctimeouts++;
 		error = ETIMEDOUT;
-	else if (stat == RPC_VERSMISMATCH)
+	} else if (stat == RPC_VERSMISMATCH) {
+		nfsstats.rpcinvalid++;
 		error = EOPNOTSUPP;
-	else if (stat == RPC_PROGVERSMISMATCH)
+	} else if (stat == RPC_PROGVERSMISMATCH) {
+		nfsstats.rpcinvalid++;
 		error = EPROTONOSUPPORT;
-	else
+	} else if (stat == RPC_INTR) {
+		error = EINTR;
+	} else {
+		nfsstats.rpcinvalid++;
 		error = EACCES;
+	}
 	if (error)
 		goto nfsmout;
@@ -572,6 +579,7 @@
 	if (error == ENOMEM) {
 		m_freem(mrep);
 		AUTH_DESTROY(auth);
+		nfsstats.rpcinvalid++;
 		return (error);
 	}

--
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 19:45:30 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D0481C00; Mon, 14 Jan 2013 19:45:30 +0000 (UTC)
(envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id A4AB765E; Mon, 14 Jan 2013 19:45:30 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1E09DB95E; Mon, 14 Jan 2013 14:45:30 -0500 (EST) From: John Baldwin To: fs@freebsd.org Subject: [PATCH] Better handle NULL utimes() in the NFS client Date: Mon, 14 Jan 2013 14:45:29 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201301141445.29260.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 14 Jan 2013 14:45:30 -0500 (EST) Cc: Rick Macklem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jan 2013 19:45:30 -0000

The NFS client tries to infer when an application has passed NULL to utimes()
so that it can let the server set the timestamp rather than using a
client-supplied timestamp. It does this by checking to see if the desired
timestamp's second matches the current second. However, this breaks
applications that are intentionally trying to set a specific timestamp within
the current second. In addition, utimes() sets a flag to indicate if NULL was
passed to utimes().
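The difference between the two checks can be shown with a small user-space sketch. The struct and flag value below are simplified stand-ins for the kernel's struct vattr and VA_UTIMES_NULL, not the real definitions:

```c
#include <assert.h>
#include <time.h>

#define VA_UTIMES_NULL	0x01	/* illustrative value; mirrors the vnode flag */

struct vattr_sketch {
	time_t	va_mtime_sec;	/* requested mtime, seconds */
	int	va_vaflags;
};

/*
 * Old heuristic: assume utimes(path, NULL) was used whenever the
 * requested second happens to equal the current second.
 */
static int
use_server_time_old(const struct vattr_sketch *va, time_t now)
{
	return (va->va_mtime_sec == now);
}

/* Patched check: rely on the flag utimes() sets for a NULL times argument. */
static int
use_server_time_new(const struct vattr_sketch *va)
{
	return ((va->va_vaflags & VA_UTIMES_NULL) != 0);
}
```

An application that deliberately sets a timestamp falling within the current second is misclassified by the time-based heuristic but handled correctly by the flag check.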
The patch below changes the NFS client to check this flag and only use the
server-supplied time in that case:

Index: fs/nfsclient/nfs_clport.c
===================================================================
--- fs/nfsclient/nfs_clport.c	(revision 225511)
+++ fs/nfsclient/nfs_clport.c	(working copy)
@@ -762,7 +762,7 @@
 		*tl = newnfs_false;
 	}
 	if (vap->va_atime.tv_sec != VNOVAL) {
-		if (vap->va_atime.tv_sec != curtime.tv_sec) {
+		if (!(vap->va_vaflags & VA_UTIMES_NULL)) {
 			NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED);
 			*tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT);
 			txdr_nfsv3time(&vap->va_atime, tl);
@@ -775,7 +775,7 @@
 		*tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE);
 	}
 	if (vap->va_mtime.tv_sec != VNOVAL) {
-		if (vap->va_mtime.tv_sec != curtime.tv_sec) {
+		if (!(vap->va_vaflags & VA_UTIMES_NULL)) {
 			NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED);
 			*tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT);
 			txdr_nfsv3time(&vap->va_mtime, tl);
Index: nfsclient/nfs_subs.c
===================================================================
--- nfsclient/nfs_subs.c	(revision 225511)
+++ nfsclient/nfs_subs.c	(working copy)
@@ -1119,7 +1119,7 @@
 		*tl = nfs_false;
 	}
 	if (va->va_atime.tv_sec != VNOVAL) {
-		if (va->va_atime.tv_sec != time_second) {
+		if (!(vattr.va_vaflags & VA_UTIMES_NULL)) {
 			tl = nfsm_build_xx(3 * NFSX_UNSIGNED, mb, bpos);
 			*tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT);
 			txdr_nfsv3time(&va->va_atime, tl);
@@ -1132,7 +1132,7 @@
 		*tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE);
 	}
 	if (va->va_mtime.tv_sec != VNOVAL) {
-		if (va->va_mtime.tv_sec != time_second) {
+		if (!(vattr.va_vaflags & VA_UTIMES_NULL)) {
 			tl = nfsm_build_xx(3 * NFSX_UNSIGNED, mb, bpos);
 			*tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT);
 			txdr_nfsv3time(&va->va_mtime, tl);

--
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 19:51:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP
id B7CAE16B; Mon, 14 Jan 2013 19:51:50 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id E5A186FB; Mon, 14 Jan 2013 19:51:49 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 3YlQLD53Bzz7ySc; Mon, 14 Jan 2013 20:51:48 +0100 (CET) Date: Mon, 14 Jan 2013 20:51:48 +0100 From: Nicolas Rachinsky To: Artem Belevich Subject: Re: slowdown of zfs (tx->tx) Message-ID: <20130114195148.GA20540@mid.pc5.i.0x5.de> References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <20130109162613.GA34276@mid.pc5.i.0x5.de> <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Powered-by: FreeBSD X-Homepage: http://www.rachinsky.de X-PGP-Keyid: 887BAE72 X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72 X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jan 2013 19:51:50 -0000 * Artem Belevich [2013-01-14 11:13 -0800]: > On Mon, Jan 14, 2013 at 1:40 AM, Nicolas Rachinsky > wrote: > > 5 Reallocated_Sector_Ct 0x0033 094 094 010 Pre-fail Always - 166 > > 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 1259614646 > > 196 Reallocated_Event_Count 0x0032 096 096 000 Old_age Always - 166 > > > Reallocated_Sector_Ct did not increase during the last days. > > It does not matter IMHO. That hard drive already got quite a few bad > sectors that ECC could not deal with. There are apparently more > marginally bad sectors, but ECC deals with it for now. Once enough > bits rot, you'll get more bad sectors. 
I personally would replace the > drive. Yes, I'll do that. > >> Cound you do gstat with 1-second interval. Some of the 5-second > >> samples show that ada8 is the bottleneck -- it has its request queue > >> full (L(q)=10) when all other drives were done with their jobs. And > >> that's a 5-sec average. Its write service time also seems to be a lot > >> higher than for other drives. > > > > Attached. I have replace ada8 by ada9, which is a Western Digital > > Caviar Black. > > > > Now ada0 and ada4 seem to be the bottleneck. > > > > But I don't understand the intervalls without any disk activity. > > It is puzzling. Is rsync still sleeping in tx->tx state? Try running > "procstat -kk " periodically. It will print in-kernel stack > trace and may help giving a clue where/why rsync is stuck. # sh -c 'for i in `jot 100`; do procstat -kk 36639 ; sleep 1; done' | sort | uniq -c 100 PID TID COMM TDNAME KSTACK 1 36639 100574 rsync - 99 36639 100574 rsync - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 zfs_freebsd_write+0x3a6 VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc # sh -c 'for i in `jot 100`; do procstat -kk 36639 ; sleep 0.36; done' | sort | uniq -c 100 PID TID COMM TDNAME KSTACK 1 36639 100574 rsync - mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_delay+0x137 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x38a VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc 99 36639 100574 rsync - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 zfs_freebsd_write+0x3a6 VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc # sh -c 'for i in `jot 100`; do procstat -kk 36639 ; sleep 0.1; done' | sort | uniq -c 100 PID TID COMM TDNAME KSTACK 100 36639 100574 rsync - 
mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 zfs_freebsd_write+0x3a6 VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc Thanks in advance Nicolas -- http://www.rachinsky.de/nicolas From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 20:41:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 73244162 for ; Mon, 14 Jan 2013 20:41:03 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-vb0-f51.google.com (mail-vb0-f51.google.com [209.85.212.51]) by mx1.freebsd.org (Postfix) with ESMTP id 1772D92F for ; Mon, 14 Jan 2013 20:41:02 +0000 (UTC) Received: by mail-vb0-f51.google.com with SMTP id fq11so4047416vbb.10 for ; Mon, 14 Jan 2013 12:41:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=XQIM1nDAEmRY7MDfEhauYCWiozqZHx65EbRdswFhdJE=; b=wK7sZLleTAtTH9cg7amqFyKXAFSxIbuYUvjARznHHQ81hypaT8UfmmToSCsNi4BqXr r14C9VrhOJLPqjpc34CnRo0e1ujXP1lxgHF2R9oQvLP4wBDFOFgdYV1+hyRedEmp4DV9 5vPVIIesiLyhe3DySZnUP2uS5oMAF5M5YdfjM0hLiMxoBCufSATVDbSD2V3u+geBz4iT 8FvEaohBpAyaAvQGQec2PDi+hsovSb3uRLsdhqno9xMTx7jxVIDIwzI9GL1Dg0PWM0jI 2dWi8k/juPcZx9jBfhOSTzsH7U5sfznghfIHtrYzAqCSKXuMM40h0QfKiwWPf2fiMgYe r/ig== MIME-Version: 1.0 Received: by 10.52.156.40 with SMTP id wb8mr90499872vdb.39.1358196062075; Mon, 14 Jan 2013 12:41:02 -0800 (PST) Sender: artemb@gmail.com Received: by 10.220.122.196 with HTTP; Mon, 14 Jan 2013 12:41:01 -0800 (PST) In-Reply-To: <20130114195148.GA20540@mid.pc5.i.0x5.de> References: <20130108174225.GA17260@mid.pc5.i.0x5.de> <20130109162613.GA34276@mid.pc5.i.0x5.de> <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> 
<20130114195148.GA20540@mid.pc5.i.0x5.de> Date: Mon, 14 Jan 2013 12:41:01 -0800 X-Google-Sender-Auth: g2rKpwHkwJdoYdnres2GBFw2hCM Message-ID: Subject: Re: slowdown of zfs (tx->tx) From: Artem Belevich To: Nicolas Rachinsky Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jan 2013 20:41:03 -0000

txg_wait_open means that ZFS is waiting for ongoing transaction group
sync. There should've been some write activity in this case.

Check what zfs kernel threads are doing with procstat -kk on zfskern process.

--Artem

On Mon, Jan 14, 2013 at 11:51 AM, Nicolas Rachinsky wrote:
> * Artem Belevich [2013-01-14 11:13 -0800]:
>> On Mon, Jan 14, 2013 at 1:40 AM, Nicolas Rachinsky
>> wrote:
>> > 5 Reallocated_Sector_Ct 0x0033 094 094 010 Pre-fail Always - 166
>> > 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 1259614646
>> > 196 Reallocated_Event_Count 0x0032 096 096 000 Old_age Always - 166
>>
>> > Reallocated_Sector_Ct did not increase during the last days.
>>
>> It does not matter IMHO. That hard drive already got quite a few bad
>> sectors that ECC could not deal with. There are apparently more
>> marginally bad sectors, but ECC deals with it for now. Once enough
>> bits rot, you'll get more bad sectors. I personally would replace the
>> drive.
>
> Yes, I'll do that.
>
>> >> Cound you do gstat with 1-second interval. Some of the 5-second
>> >> samples show that ada8 is the bottleneck -- it has its request queue
>> >> full (L(q)=10) when all other drives were done with their jobs. And
>> >> that's a 5-sec average. Its write service time also seems to be a lot
>> >> higher than for other drives.
>> >
>> > Attached. I have replace ada8 by ada9, which is a Western Digital
>> > Caviar Black.
>> >
>> > Now ada0 and ada4 seem to be the bottleneck.
>> >
>> > But I don't understand the intervalls without any disk activity.
>>
>> It is puzzling. Is rsync still sleeping in tx->tx state? Try running
>> "procstat -kk <pid>" periodically. It will print in-kernel stack
>> trace and may help giving a clue where/why rsync is stuck.
>
> # sh -c 'for i in `jot 100`; do procstat -kk 36639 ; sleep 1; done' | sort | uniq -c
> 100 PID TID COMM TDNAME KSTACK
> 1 36639 100574 rsync -
> 99 36639 100574 rsync - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 zfs_freebsd_write+0x3a6 VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc
>
> # sh -c 'for i in `jot 100`; do procstat -kk 36639 ; sleep 0.36; done' | sort | uniq -c
> 100 PID TID COMM TDNAME KSTACK
> 1 36639 100574 rsync - mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_delay+0x137 dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x38a VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc
> 99 36639 100574 rsync - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 zfs_freebsd_write+0x3a6 VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc
>
> # sh -c 'for i in `jot 100`; do procstat -kk 36639 ; sleep 0.1; done' | sort | uniq -c
> 100 PID TID COMM TDNAME KSTACK
> 100 36639 100574 rsync - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 zfs_freebsd_write+0x3a6 VOP_WRITE_APV+0xb2 vn_write+0x373 dofilewrite+0x8b kern_writev+0x60 write+0x55 amd64_syscall+0x1f4 Xfast_syscall+0xfc
>
> Thanks in advance
>
> Nicolas
>
> --
> http://www.rachinsky.de/nicolas

From owner-freebsd-fs@FreeBSD.ORG Mon Jan 14 21:46:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from
mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9F9FEF8D; Mon, 14 Jan 2013 21:46:54 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id 9F719D50; Mon, 14 Jan 2013 21:46:53 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 3YlSv031mZz7ySG; Mon, 14 Jan 2013 22:46:52 +0100 (CET) Date: Mon, 14 Jan 2013 22:46:52 +0100 From: Nicolas Rachinsky To: Artem Belevich Subject: Re: slowdown of zfs (tx->tx) Message-ID: <20130114214652.GA76779@mid.pc5.i.0x5.de> References: <20130109162613.GA34276@mid.pc5.i.0x5.de> <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Powered-by: FreeBSD X-Homepage: http://www.rachinsky.de X-PGP-Keyid: 887BAE72 X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72 X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jan 2013 21:46:54 -0000 * Artem Belevich [2013-01-14 12:41 -0800]: > txg_wait_open means that ZFS is waiting for ongoing transaction group > sync. There should've been some write activity in this case. > > Check what zfs kernel threads are doing with procstat -kk on zfskern process. 
# sh -c 'for i in `jot 1000`; do procstat -kk 47 ; sleep 0.1; done' | sort | uniq -c 1000 47 100083 zfskern arc_reclaim_thre mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x29d fork_exit+0x11f fork_trampoline+0xe 1000 47 100084 zfskern l2arc_feed_threa mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1a8 fork_exit+0x11f fork_trampoline+0xe 1000 47 100224 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x11f fork_trampoline+0xe 165 47 100225 zfskern txg_thread_enter 1 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5 dbuf_findbp+0x107 dbuf_prefetch+0x8f dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x675 dnode_hold_impl+0xf2 dmu_buf_hold_array+0x38 dmu_write+0x53 space_map_sync+0x1ff metaslab_sync+0x13e vdev_sync+0x6e spa_sync+0x3ab txg_sync_thread+0x139 1 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5 dbuf_will_dirty+0x60 dmu_write+0x82 space_map_sync+0x1ff metaslab_sync+0x13e vdev_sync+0x6e spa_sync+0x3ab txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe 1 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0x189 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe 81 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0x2c3 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe 719 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe 4 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 spa_sync+0x286 txg_sync_thread+0x139 fork_exit+0x11f 
fork_trampoline+0xe 2 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 spa_sync+0x370 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe 21 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 vdev_config_sync+0xe3 spa_sync+0x49a txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe 5 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 vdev_uberblock_sync_list+0xd0 vdev_config_sync+0x10f spa_sync+0x49a txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe 1000 PID TID COMM TDNAME KSTACK Thanks Nicolas -- http://www.rachinsky.de/nicolas From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 00:51:00 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 233E5EAA; Tue, 15 Jan 2013 00:51:00 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9861E8BC; Tue, 15 Jan 2013 00:50:59 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEACSn9FCDaFvO/2dsb2JhbABEhjq3WHOCHgEBAQQBAQEgKyALGw4KAgINGQIpAQkmBggHBAEcBId4DKUikFqBI4tjgxWBEwOIYYp8gi6BHI8tgxOBUTU X-IronPort-AV: E=Sophos;i="4.84,469,1355115600"; d="scan'208";a="11900158" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 14 Jan 2013 19:50:52 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 9DBE8B3F16; Mon, 14 Jan 2013 19:50:52 -0500 (EST) Date: Mon, 14 Jan 2013 19:50:52 -0500 (EST) From: Rick Macklem To: John Baldwin Message-ID: <21875538.1984621.1358211052621.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201301141437.05040.jhb@freebsd.org> Subject: Re: [PATCH] 
Properly handle signals on interruptible NFS mounts MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Rick Macklem , Doug Rabson , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 00:51:00 -0000 John Baldwin wrote: > When the new RPC layer was brought in, the RPC_INTR return value (to > indicate > an RPC request was interrupted by a signal) was not handled in the NFS > client. > As a result, if an NFS request is interrupted by a signal (on a mount > with the > "intr" option), then the nfs_request() functions would fall through to > the > default case and return EACCES rather than EINTR. While here, I > noticed that > the new RPC layer also lost all of the RPC statistics the old client > used to > keep (but that are still reported in 'nfsstat -c'). I've added back as > many > of the statistics as I could, but retries are not easy to do as only > the RPC > layer knows about them and not the NFS client. 
> > Index: fs/nfs/nfs_commonkrpc.c > =================================================================== > --- fs/nfs/nfs_commonkrpc.c (revision 245225) > +++ fs/nfs/nfs_commonkrpc.c (working copy) > @@ -767,12 +767,18 @@ > if (stat == RPC_SUCCESS) { > error = 0; > } else if (stat == RPC_TIMEDOUT) { > + NFSINCRGLOBAL(newnfsstats.rpctimeouts); > error = ETIMEDOUT; > } else if (stat == RPC_VERSMISMATCH) { > + NFSINCRGLOBAL(newnfsstats.rpcinvalid); > error = EOPNOTSUPP; > } else if (stat == RPC_PROGVERSMISMATCH) { > + NFSINCRGLOBAL(newnfsstats.rpcinvalid); > error = EPROTONOSUPPORT; > + } else if (stat == RPC_INTR) { > + error = EINTR; > } else { > + NFSINCRGLOBAL(newnfsstats.rpcinvalid); > error = EACCES; > } > if (error) { > Index: nfsclient/nfs_krpc.c > =================================================================== > --- nfsclient/nfs_krpc.c (revision 245225) > +++ nfsclient/nfs_krpc.c (working copy) > @@ -549,14 +549,21 @@ > */ > if (stat == RPC_SUCCESS) > error = 0; > - else if (stat == RPC_TIMEDOUT) > + else if (stat == RPC_TIMEDOUT) { > + nfsstats.rpctimeouts++; > error = ETIMEDOUT; > - else if (stat == RPC_VERSMISMATCH) > + } else if (stat == RPC_VERSMISMATCH) { > + nfsstats.rpcinvalid++; > error = EOPNOTSUPP; > - else if (stat == RPC_PROGVERSMISMATCH) > + } else if (stat == RPC_PROGVERSMISMATCH) { > + nfsstats.rpcinvalid++; > error = EPROTONOSUPPORT; > - else > + } else if (stat == RPC_INTR) { > + error = EINTR; > + } else { > + nfsstats.rpcinvalid++; > error = EACCES; > + } > if (error) > goto nfsmout; > > @@ -572,6 +579,7 @@ > if (error == ENOMEM) { > m_freem(mrep); > AUTH_DESTROY(auth); > + nfsstats.rpcinvalid++; > return (error); > } > This patch looks fine to me, rick > > -- > John Baldwin > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 
01:22:34 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D0F9F5C3; Tue, 15 Jan 2013 01:22:34 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7672D9CB; Tue, 15 Jan 2013 01:22:33 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAOCu9FCDaFvO/2dsb2JhbABEhjqzZYN0c4IeAQEFIwRSGw4KAgINGQJZBogspS+QW4EjjniBEwOIYY0qkEmDE4IG X-IronPort-AV: E=Sophos;i="4.84,469,1355115600"; d="scan'208";a="9060144" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 14 Jan 2013 20:20:55 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id F246FB3F26; Mon, 14 Jan 2013 20:20:54 -0500 (EST) Date: Mon, 14 Jan 2013 20:20:54 -0500 (EST) From: Rick Macklem To: John Baldwin Message-ID: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201301141445.29260.jhb@freebsd.org> Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Rick Macklem , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 01:22:34 -0000 John Baldwin wrote: > The NFS client tries to infer when an application has passed NULL to > utimes() > so that it can let the server set the timestamp rather than using a > client- > supplied timestamp. 
It does this by checking to see if the desired > timestamp's second matches the current second. However, this breaks > applications that are intentionally trying to set a specific timestamp > within > the current second. In addition, utimes() sets a flag to indicate if > NULL was > passed to utimes(). The patch below changes the NFS client to check > this flag > and only use the server-supplied time in that case: > > Index: fs/nfsclient/nfs_clport.c > =================================================================== > --- fs/nfsclient/nfs_clport.c (revision 225511) > +++ fs/nfsclient/nfs_clport.c (working copy) > @@ -762,7 +762,7 @@ > *tl = newnfs_false; > } > if (vap->va_atime.tv_sec != VNOVAL) { > - if (vap->va_atime.tv_sec != curtime.tv_sec) { > + if (!(vap->va_vaflags & VA_UTIMES_NULL)) { > NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED); > *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); > txdr_nfsv3time(&vap->va_atime, tl); > @@ -775,7 +775,7 @@ > *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE); > } > if (vap->va_mtime.tv_sec != VNOVAL) { > - if (vap->va_mtime.tv_sec != curtime.tv_sec) { > + if (!(vap->va_vaflags & VA_UTIMES_NULL)) { > NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED); > *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); > txdr_nfsv3time(&vap->va_mtime, tl); > Index: nfsclient/nfs_subs.c > =================================================================== > --- nfsclient/nfs_subs.c (revision 225511) > +++ nfsclient/nfs_subs.c (working copy) > @@ -1119,7 +1119,7 @@ > *tl = nfs_false; > } > if (va->va_atime.tv_sec != VNOVAL) { > - if (va->va_atime.tv_sec != time_second) { > + if (!(vattr.va_vaflags & VA_UTIMES_NULL)) { > tl = nfsm_build_xx(3 * NFSX_UNSIGNED, mb, bpos); > *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); > txdr_nfsv3time(&va->va_atime, tl); > @@ -1132,7 +1132,7 @@ > *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE); > } > if (va->va_mtime.tv_sec != VNOVAL) { > - if (va->va_mtime.tv_sec != time_second) { > + if (!(vattr.va_vaflags & 
VA_UTIMES_NULL)) { > tl = nfsm_build_xx(3 * NFSX_UNSIGNED, mb, bpos); > *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); > txdr_nfsv3time(&va->va_mtime, tl); > > -- > John Baldwin I think this patch is ok, too. In the old days, a lot of NFS servers only stored times at a resolution of 1sec, which I think is why the code had the habit of comparing "seconds equal". If there is some app. out there that sets "current time" via utimes(2) with a curent time argument instead of a NULL argument would seem to be broken to me. (It is conceivable that some app. did this to avoid clock skew between the client and server, but I doubt it.) Have fun with it, rick ps: If you were concerned that the change might break something that depended on the old behaviour, you could apply the patch to the new client only. Then switching to an "oldnfs" mount would provide the old "same sec->set time to current time on the server" behaviour. From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 01:37:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E113CA3E for ; Tue, 15 Jan 2013 01:37:31 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-vc0-f172.google.com (mail-vc0-f172.google.com [209.85.220.172]) by mx1.freebsd.org (Postfix) with ESMTP id 902F7A54 for ; Tue, 15 Jan 2013 01:37:31 +0000 (UTC) Received: by mail-vc0-f172.google.com with SMTP id fw7so4210442vcb.31 for ; Mon, 14 Jan 2013 17:37:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=ERJqw0lCTNCaIJYj1D/i06oFPviy76eK8ykVyi7zL5g=; b=pCgjNYjD/Kek39O97JLEWbnDnHpxYik7sZXMyZYI7YqzZ0nZGOdFDvMZsTseoeYSKR mftYEdOeGZKTSTltO9nXsXhP5VlOVis02sFju7NttLm+Kmg3mdjWwhf233+GCiuMEjRk fgB5TGr9JBMcXj9Z++EJxpjKu+NjjuV/xdzOU9a7pxODIwqz/nIbfAi3SRPl1OfEQNuQ 
Dx6Nx6yUtmQCPxyCh8XnB3dRTnhWNiokp/MJoyIEAp5UD7OEbxFx6UPHvuauH41E+XZz BCG9e0941iRSaQ+7QTuu/kRn5+oAi0Qk6ovlUwvUnSRCdN5HIiOt9STnbeVf3M7ikrPG Otng== MIME-Version: 1.0 Received: by 10.52.180.200 with SMTP id dq8mr90147517vdc.71.1358213845191; Mon, 14 Jan 2013 17:37:25 -0800 (PST) Sender: artemb@gmail.com Received: by 10.220.122.196 with HTTP; Mon, 14 Jan 2013 17:37:25 -0800 (PST) In-Reply-To: <20130114214652.GA76779@mid.pc5.i.0x5.de> References: <20130109162613.GA34276@mid.pc5.i.0x5.de> <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> Date: Mon, 14 Jan 2013 17:37:25 -0800 X-Google-Sender-Auth: ooDSOCBbgBRv9mcQWqd_AE-gd9U Message-ID: Subject: Re: slowdown of zfs (tx->tx) From: Artem Belevich To: Nicolas Rachinsky Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 01:37:31 -0000 On Mon, Jan 14, 2013 at 1:46 PM, Nicolas Rachinsky wrote: > * Artem Belevich [2013-01-14 12:41 -0800]: >> txg_wait_open means that ZFS is waiting for ongoing transaction group >> sync. There should've been some write activity in this case. >> >> Check what zfs kernel threads are doing with procstat -kk on zfskern pro= cess. 
> > # sh -c 'for i in `jot 1000`; do procstat -kk 47 ; sleep 0.1; done' | sort | uniq -c
> 1000 47 100083 zfskern arc_reclaim_thre mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x29d fork_exit+0x11f fork_trampoline+0xe
> 1000 47 100084 zfskern l2arc_feed_threa mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1a8 fork_exit+0x11f fork_trampoline+0xe
> 1000 47 100224 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x11f fork_trampoline+0xe
> 165 47 100225 zfskern txg_thread_enter
> 1 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5 dbuf_findbp+0x107 dbuf_prefetch+0x8f dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x675 dnode_hold_impl+0xf2 dmu_buf_hold_array+0x38 dmu_write+0x53 space_map_sync+0x1ff metaslab_sync+0x13e vdev_sync+0x6e spa_sync+0x3ab txg_sync_thread+0x139
> 1 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5 dbuf_will_dirty+0x60 dmu_write+0x82 space_map_sync+0x1ff metaslab_sync+0x13e vdev_sync+0x6e spa_sync+0x3ab txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 1 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0x189 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 81 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0x2c3 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 719 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dsl_pool_sync+0xe0 spa_sync+0x336 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 4 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61
spa_sync+0x286 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 2 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 spa_sync+0x370 txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 21 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 vdev_config_sync+0xe3 spa_sync+0x49a txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 5 47 100225 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 vdev_uberblock_sync_list+0xd0 vdev_config_sync+0x10f spa_sync+0x49a txg_sync_thread+0x139 fork_exit+0x11f fork_trampoline+0xe
> 1000 PID TID COMM TDNAME KSTACK

OK, the threads responsible for transaction sync seem to be stuck in zio_wait, which is in turn waiting for some task thread to finish its work.

Now you need to figure out what those task threads are doing. 'procstat -kk 0' will dump a few hundred taskq threads, most of them zfs-related. On an idle box (8.3/amd64 in my case) most of them have the same stack trace, looking like this (modulo offsets):

mi_switch+0x196 sleepq_wait+0x42 _sleep+0x3c0 taskqueue_thread_loop+0xbe fork_exit+0x11f fork_trampoline+0xe

Look for stack traces that don't match that pattern.
--Artem From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 02:50:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0FEAF8F8 for ; Tue, 15 Jan 2013 02:50:10 +0000 (UTC) (envelope-from edward@gogrid.com) Received: from smtp1.servepath.com (smtp1.servepath.com [216.93.160.25]) by mx1.freebsd.org (Postfix) with ESMTP id F1894DBA for ; Tue, 15 Jan 2013 02:50:09 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=january; d=gogrid.com; h=Received:Message-ID:Date:From:User-Agent:MIME-Version:To:Subject:References:Content-Type:Content-Transfer-Encoding; b=g3CuU3DzA2u5FyFQ5EKi3d3IjijQUObTp6D3GOMTKXTJmi8/Jn+cUMUGYjdezKtgt2ZW04+Ec0T6po3mhq4U6aHPZLYzLKD6HVUjV5CQlTyXrn2f4NCwLXrFSDzY3g6p; Received: from [192.168.7.178] by smtp1.servepath.com with esmtp (Exim 4.68 (FreeBSD)) (envelope-from ) id 1Tuw3a-000GHb-F3 for freebsd-fs@freebsd.org; Mon, 14 Jan 2013 18:15:18 -0800 Message-ID: <50F4BBE7.7050207@gogrid.com> Date: Mon, 14 Jan 2013 18:16:07 -0800 From: Edward Xiao User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: HAST + ZFS self healing? Hot spares? 
References: 4DD5A1CF.70807@itassistans.se Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 02:50:10 -0000 From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 04:51:35 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B0E841BD; Tue, 15 Jan 2013 04:51:35 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id 374117BE; Tue, 15 Jan 2013 04:51:34 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0F4pNaE013436 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 15 Jan 2013 15:51:25 +1100 Date: Tue, 15 Jan 2013 15:51:23 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client In-Reply-To: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20130115141019.H1444@besplex.bde.org> References: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=P/xiHV8u c=1 sm=1 a=S8Qr1IbAvFsA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=U1Z5fgpPGSMA:10 a=9QiI2z3JOZ09_-QNc5AA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: Rick Macklem , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 04:51:35 -0000 On Mon, 14 Jan 2013, Rick Macklem wrote: > John Baldwin wrote: >> The NFS client tries to infer when an application has passed NULL to >> utimes() >> so that it can let the server set the timestamp rather than using a >> client- >> supplied timestamp. It does this by checking to see if the desired >> timestamp's second matches the current second. However, this breaks >> applications that are intentionally trying to set a specific timestamp >> within >> the current second. In addition, utimes() sets a flag to indicate if >> NULL was >> passed to utimes(). The patch below changes the NFS client to check >> this flag >> and only use the server-supplied time in that case: It is certainly an error to not check VA_UTIMES_NULL at all. I think the flag (or the NULL pointer) cannot be passed to the server, so the best we can do for the VA_UTIMES_NULL case is read the current time on the client and pass it to the server. Upper layers have already read the current time, but have passed us VA_UTIMES_NULL so that we can tell that the pointer was originally null so that we can do the different permissions checks for this case. >> Index: fs/nfsclient/nfs_clport.c >> =================================================================== >> --- fs/nfsclient/nfs_clport.c (revision 225511) >> +++ fs/nfsclient/nfs_clport.c (working copy) >> @@ -762,7 +762,7 @@ >> *tl = newnfs_false; >> } >> if (vap->va_atime.tv_sec != VNOVAL) { >> - if (vap->va_atime.tv_sec != curtime.tv_sec) { >> + if (!(vap->va_vaflags & VA_UTIMES_NULL)) { >> NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED); >> *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); >> txdr_nfsv3time(&vap->va_atime, tl); >> @@ -775,7 +775,7 @@ >> *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE); >> ... Something mangled the patch so that it is hard to see what it does. It just uses the flag instead of guessing. 
I can't see anything that does the different permissions check for the VA_UTIMES_NULL case, and testing shows that this case is just broken, at least for an old version of the old nfs client -- the same permissions are required for all cases, but write permission is supposed to be enough for the VA_UTIMES_NULL case (since write permission is sufficient for setting the mtime to the current time (plus epsilon) using write(2) and truncate(2). Setting the atime to the current time should require no more and no less than read permission, since it can be done using read(2), but utimes(NULL) requires write permission for that too). > In the old days, a lot of NFS servers only stored times at a > resolution of 1sec, which I think is why the code had the habit > of comparing "seconds equal". I think this is not the reason for the check here. > If there is some app. out there > that sets "current time" via utimes(2) with a curent time argument > instead of a NULL argument would seem to be broken to me. > (It is conceivable that some app. did this to avoid clock > skew between the client and server, but I doubt it.) Apps have no alternative to using the NULL arg if they have write permission to the file but don't own it. Oops, on looking at the code I now think it _is_ possible to pass the request to set the current time on the server, since in the NFSV3SATTRTIME_TOSERVER case we just pass this case value and not any time value to the server, so the server has no option but to use its current time. It is not surprising that the permissions checks for this don't work right. I thought that the client was responsible for most permissions checks, but can't find many or the relevant one here. The NFSV3SATTRTIME_TOSERVER code on the server sets VA_UTIMES_NULL, so I would have thought that the permissions check on the server does the right thing. 
There are some large timestamping bugs nearby:

- the old nfs server code for NFSV3SATTRTIME_TOSERVER uses getnanotime() to read the current time. This violates the system's policy set by the vfs.timestamp_precision sysctl in most cases, since using getnanotime() is the worst supported policy and is not the default. The old nfs client uses the correct function to read the current time, vfs_timestamp(), in nfs_create(), but this is the only use of vfs_timestamp() in old nfs code. I think most cases use the server time and thus use the correct function iff the leaf server file system uses the correct function.

- the new nfs server code for NFSV3SATTRTIME_TOSERVER macro-izes all reads of the current time except one as NFSGETTIME(). This uses getmicrotime(), so it violates the system's policy in all cases, since using getmicrotime() is not a supported policy (using microtime() is supported). The one exception is a hard-coded getmicrotime() in fs/nfsclient/nfs_clport.c whose use is visible in the above patch. This one really didn't matter, because only the seconds part of curtime was used. It was just a micro-pessimization and style bug. The (not quite) correct way to get the seconds part is to use time_second, as is done in the old nfs client. (This way is not quite correct because there are some races and non-monotonicities reading the times. In the above check, vap->va_atime.tv_sec might have been read by a more precise clock than curtime.tv_sec. Then the check might give a false positive or negative. But the check is only a heuristic, and is inherently racy, so this doesn't really matter.) With the above patch the check becomes a different pessimization and style bug: the curtime variable becomes unused except for its incorrect initialization.

New nfs code never uses the correct function vfs_timestamp(). Following the system policy for file timestamps causes some problems for utimes(NULL) too. Old versions hard-coded microtime(). Current versions use vfs_timestamp().
The latter is better, but tends to give different results than utimes(non_NULL), since few or no applications know anything about the system's policy. touch(1) probably should know, but doesn't. So the simple "touch foo" gives various results, depending:

- touch(1) starts with gettimeofday(). This gives microseconds resolution and usually microseconds accuracy if its result is used.
- touch then tries utimes(non_NULL) with the current time that it just read. This usually works, giving microseconds resolution, etc. This is OK, but often different from the system policy.
- touch then tries utimes(NULL). If this works, then it follows the system policy.

Another problem is that not all file systems support nanoseconds resolution, so not all system policies or utimes() requests can be honored. I would usually prefer the system's policy to be enforced as far as possible. Thus if the system's policy is microseconds resolution, then times with nanoseconds resolution should be rounded down to the nearest microsecond. This case is most useful since utimes() cannot preserve times with more than microseconds resolution. Utilities like cp(1) blindly round the times given in nanoseconds by stat(2) to ones that can be written by utimes(2), so this often happens in an uncontrollable way anyway (POSIX is finally getting around to specifying permissible errors for unrepresentable resolutions). But sometimes I want utimes() to preserve times as well as possible.
Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 12:52:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D94589A for ; Tue, 15 Jan 2013 12:52:59 +0000 (UTC) (envelope-from pluknet@gmail.com) Received: from mail-qa0-f48.google.com (mail-qa0-f48.google.com [209.85.216.48]) by mx1.freebsd.org (Postfix) with ESMTP id 8A304655 for ; Tue, 15 Jan 2013 12:52:59 +0000 (UTC) Received: by mail-qa0-f48.google.com with SMTP id l8so121903qaq.7 for ; Tue, 15 Jan 2013 04:52:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=MUVcmjAyXuMFqtoC/Sb1vGEv1i/1T1T7E8U0HKFX6FE=; b=c85W1MlFJTEVEHyuqaZegPM0sVc6hJ30z89gQBMa8SB6vHLdVY7MAL++yYhTbSdFvC cVEt/IGpfBYG6Ocelzlwau6F/AXRdfVuv+x22jxtF99qwM/1uqWZ4212zWUxGvU0hvN8 U5mCqVKSPuLm2gqu1F+1HsKe7TTO2JzVbGIBQkwS6h8DRH4FBuPS+us9qktaByh/AvCW ZVXZ/u2Mhih+zeuEBPvooztSw516UDZy9qj625xlyO05TTGYjtlLFPTv+MldYC776DIJ He0Hqzi6YoEky/iPy51c9WYEa8aPLHuMgnweVI752EIW2MhCs4OgPs/W9GGUiqN2rKn9 6XIw== MIME-Version: 1.0 Received: by 10.224.60.12 with SMTP id n12mr75306031qah.23.1358254378886; Tue, 15 Jan 2013 04:52:58 -0800 (PST) Received: by 10.229.78.96 with HTTP; Tue, 15 Jan 2013 04:52:58 -0800 (PST) Date: Tue, 15 Jan 2013 15:52:58 +0300 Message-ID: Subject: getcwd lies on/under nfs4-mounted zfs dataset From: Sergey Kandaurov To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 12:52:59 -0000 Hi. We stuck with the problem getting wrong current directory path when sitting on/under zfs dataset filesystem mounted over NFSv4. Both nfs server and client are 10.0-CURRENT from December or so. 
The component path "user3" unexpectedly appears to be "." (dot).

nfs-client:/home/user3 # pwd
/home/.
nfs-client:/home/user3/var/run # pwd
/home/./var/run
nfs-client:~ # procstat -f 3225
PID COMM FD T V FLAGS REF OFFSET PRO NAME
3225 a.out text v r r-------- - - - /home/./var/a.out
3225 a.out ctty v c rw------- - - - /dev/pts/2
3225 a.out cwd v d r-------- - - - /home/./var
3225 a.out root v d r-------- - - - /

The used setup follows.

1. NFS Server with local ZFS:
# cat /etc/exports
V4: / -sec=sys
# zfs list
pool1 10.4M 122G 580K /pool1
pool1/user3 on /pool1/user3 (zfs, NFS exported, local, nfsv4acls)
Exports list on localhost:
/pool1/user3 109.70.28.0
/pool1 109.70.28.0
# zfs get sharenfs pool1/user3
NAME PROPERTY VALUE SOURCE
pool1/user3 sharenfs -alldirs -maproot=root -network=109.70.28.0/24 local

2. pool1 is mounted on the NFSv4 client:
nfs-server:/pool1 on /home (nfs, noatime, nfsv4acls)

So on the NFS client the "pool1/user3" dataset comes up at /home/user3:
/ - ufs
/home - zpool-over-nfsv4
/home/user3 - zfs dataset "pool1/user3"

At the same time it works as expected when we're not on a zfs dataset, but directly on its parent zfs pool (also over NFSv4), e.g.
nfs-client:/home/non_dataset_dir # pwd /home/non_dataset_dir The ls command works as expected: nfs-client:/# ls -dl /home/user3/var/ drwxrwxrwt+ 6 root wheel 6 Jan 10 16:19 /home/user3/var/ -- wbr, pluknet From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 16:56:44 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 63E107E2; Tue, 15 Jan 2013 16:56:44 +0000 (UTC) (envelope-from trasz@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 240F2779; Tue, 15 Jan 2013 16:56:44 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r0FGuh3e048016; Tue, 15 Jan 2013 16:56:43 GMT (envelope-from trasz@freefall.freebsd.org) Received: (from trasz@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r0FGuh0C048012; Tue, 15 Jan 2013 16:56:43 GMT (envelope-from trasz) Date: Tue, 15 Jan 2013 16:56:43 GMT Message-Id: <201301151656.r0FGuh0C048012@freefall.freebsd.org> To: trasz@FreeBSD.org, freebsd-fs@FreeBSD.org, trasz@FreeBSD.org From: trasz@FreeBSD.org Subject: Re: kern/174948: [zfs] owner@ always have ZFS ACL full permissions. Should not be the case. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 16:56:44 -0000 Synopsis: [zfs] owner@ always have ZFS ACL full permissions. Should not be the case. Responsible-Changed-From-To: freebsd-fs->trasz Responsible-Changed-By: trasz Responsible-Changed-When: Tue Jan 15 16:56:43 UTC 2013 Responsible-Changed-Why: I'll take it. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=174948

From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 16:57:17 2013
Date: Tue, 15 Jan 2013 16:57:17 GMT
Message-Id: <201301151657.r0FGvHME048109@freefall.freebsd.org>
From: trasz@FreeBSD.org
To: trasz@FreeBSD.org, freebsd-fs@FreeBSD.org
Subject: Re: kern/174949: [zfs] ZFS ACL: rwxp required to mkdir. p should not be required.

Synopsis: [zfs] ZFS ACL: rwxp required to mkdir. p should not be required.

Responsible-Changed-From-To: freebsd-fs->trasz
Responsible-Changed-By: trasz
Responsible-Changed-When: Tue Jan 15 16:57:16 UTC 2013
Responsible-Changed-Why:
I'll take it.
http://www.freebsd.org/cgi/query-pr.cgi?pr=174949

From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 16:57:28 2013
Date: Tue, 15 Jan 2013 16:57:28 GMT
Message-Id: <201301151657.r0FGvSh8048203@freefall.freebsd.org>
From: trasz@FreeBSD.org
To: trasz@FreeBSD.org, freebsd-fs@FreeBSD.org
Subject: Re: kern/174950: [zfs] delete ZFS ACL have no effect

Synopsis: [zfs] delete ZFS ACL have no effect

Responsible-Changed-From-To: freebsd-fs->trasz
Responsible-Changed-By: trasz
Responsible-Changed-When: Tue Jan 15 16:57:28 UTC 2013
Responsible-Changed-Why:
I'll take it.
http://www.freebsd.org/cgi/query-pr.cgi?pr=174950

From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 17:01:05 2013
Date: Tue, 15 Jan 2013 17:01:05 GMT
Message-Id: <201301151701.r0FH15kx049842@freefall.freebsd.org>
From: trasz@FreeBSD.org
To: trasz@FreeBSD.org, freebsd-fs@FreeBSD.org
Subject: Re: kern/175101: [zfs] [nfs] ZFS NFSv4 ACL's allows user without perm to delete and update timestamp

Synopsis: [zfs] [nfs] ZFS NFSv4 ACL's allows user without perm to delete and update timestamp

Responsible-Changed-From-To: freebsd-fs->trasz
Responsible-Changed-By: trasz
Responsible-Changed-When: Tue Jan 15 17:01:04 UTC 2013
Responsible-Changed-Why:
I'll take it.
http://www.freebsd.org/cgi/query-pr.cgi?pr=175101

From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 19:55:25 2013
Date: Tue, 15 Jan 2013 11:55:22 -0800
Subject: Re: CAM hangs in 9-STABLE?
[Was: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE]
From: olivier
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Cc: ken@freebsd.org, Andriy Gapon

Dear All,

Still experiencing the same hangs I reported earlier with 9.1. I've been
running a kernel with WITNESS enabled to provide more information.

During an occurrence of the hang, running "show alllocks" gave:

Process 25777 (sysctl) thread 0xfffffe014c5b2920 (102567)
exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff811e34c0) locked @ /usr/src/sys/dev/usb/usb_transfer.c:3171
Process 25750 (sshd) thread 0xfffffe015a688000 (104313)
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0bb98) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 24922 (cnid_dbd) thread 0xfffffe0187ac4920 (103597)
shared lockmgr zfs (zfs) r = 0 (0xfffffe0973062488) locked @ /usr/src/sys/kern/vfs_syscalls.c:3591
Process 24117 (sshd) thread 0xfffffe07bd914490 (104195)
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0a8f0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1243 (java) thread 0xfffffe01ca85d000 (102704)
exclusive sleep mutex pmap (pmap) r = 0 (0xfffffe015aec1440) locked @ /usr/src/sys/amd64/amd64/pmap.c:4840
exclusive rw pmap pv global (pmap pv global) r = 0 (0xffffffff81409780) locked @ /usr/src/sys/amd64/amd64/pmap.c:4802
exclusive sleep mutex vm page (vm page) r = 0 (0xffffffff813f0a80) locked @ /usr/src/sys/vm/vm_object.c:1128
exclusive sleep mutex vm object (standard object) r = 0 (0xfffffe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076
shared sx vm map (user) (vm map (user)) r = 0 (0xfffffe015aec1388) locked @ /usr/src/sys/vm/vm_map.c:2045
Process 994 (nfsd) thread
0xfffffe015a0df000 (102426)
shared lockmgr zfs (zfs) r = 0 (0xfffffe0c3b505878) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
Process 994 (nfsd) thread 0xfffffe015a0f8490 (102422)
exclusive lockmgr zfs (zfs) r = 0 (0xfffffe02db3b3e60) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
Process 931 (syslogd) thread 0xfffffe015af18920 (102365)
shared lockmgr zfs (zfs) r = 0 (0xfffffe0141dd6680) locked @ /usr/src/sys/kern/vfs_syscalls.c:3591
Process 22 (syncer) thread 0xfffffe0125077000 (100279)
exclusive lockmgr syncer (syncer) r = 0 (0xfffffe015a2ff680) locked @ /usr/src/sys/kern/vfs_subr.c:1809

I don't have full "show lockedvnods" output because the output does not get
captured by ddb after using "capture on", it doesn't fit on a single screen,
and doesn't get piped into a "more" equivalent. What I did manage to get
(copied by hand, typos possible) is:

0xfffffe0c3b5057e0:
tag zfs, type VREG
usecount 1, writecount 0, refcount 1 mountedhere 0
flags (VI_ACTIVE)
v_object 0xfffffe089bc1b828 ref 0 pages 0
lock type zfs: SHARED (count 1)

0xfffffe02db3b3dc8:
tag zfs, type VREG
usecount 6, writecount 0, refcount 6 mountedhere 0
flags (VI_ACTIVE)
v_object 0xfffffe0b79583ae0 ref 0 pages 0
lock type zfs: EXCL by thread 0xfffffe015a0f8490 (pid 994)
with exclusive waiters pending

The output of "show witness" is at http://pastebin.com/eSRb3FEu
The output of "alltrace" is at http://pastebin.com/X1LruNrf
(a number of threads are stuck in zio_wait, none I can find in
zio_interrupt, and according to gstat and disks eventually going to sleep
all disk IO seems to be stuck for good; I think Andriy explained earlier
that these criteria might indicate this is a ZFS hang).

The output of "show geom" is at http://pastebin.com/6nwQbKr4
The output of "vmstat -i" is at http://pastebin.com/9LcZ7Mi0
Interrupts are occurring at a normal rate during the hang, as far as I can tell.

Any help would be greatly appreciated.
Thanks
Olivier

PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci
from 9.0 (in the hope it would fix the hangs I was experiencing in plain
9-STABLE; obviously the hangs are still occurring). The rest of my
configuration is the same as posted earlier.

On Mon, Dec 24, 2012 at 9:42 PM, olivier wrote:
> Dear All
> It turns out that reverting to an older version of the mps driver did not
> fix the ZFS hangs I've been struggling with in 9.1 and 9-STABLE after all
> (they just took a bit longer to occur again, possibly just by chance). I
> followed steps along lines suggested by Andriy to collect more information
> when the problem occurs. Hopefully this will help figure out what's going
> on.
>
> As far as I can tell, what happens is that at some point IO operations to
> a bunch of drives that belong to different pools get stuck. For these
> drives, gstat shows no activity but 1 pending operation, as such:
>
> L(q) ops/s   r/s  kBps  ms/r   w/s  kBps  ms/w   d/s  kBps  ms/d %busy Name
>    1     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da1
>
> I've been running gstat in a loop (every 100s) to monitor the machine.
> Just before the hang occurs, everything seems fine (see full gstat output
> below). Right after the hang occurs a number of drives seem stuck (see full
> gstat output below). Notably, some stuck drives are seen through the mps
> driver and others through the mpt driver. So the problem doesn't seem to be
> driver-specific.
I have had the problem occur (at a lower frequency) on
> similar machines that don't use the mpt driver (and only have 1 disk
> provided through mps), so the problem doesn't seem to be caused by the mpt
> driver (and is likely not caused by defective hardware). Since based on the
> information I provided earlier Andriy thinks the problem might not
> originate in ZFS, perhaps that means that the problem is in the CAM layer?
>
> camcontrol tags -v (as suggested by Andriy) in the hung state shows for
> example:
>
> (pass56:mpt1:0:8:20): dev_openings  254
> (pass56:mpt1:0:8:20): dev_active      1
> (pass56:mpt1:0:8:20): devq_openings 254
> (pass56:mpt1:0:8:20): devq_queued     0
> (pass56:mpt1:0:8:20): held            0
> (pass56:mpt1:0:8:20): mintags         2
> (pass56:mpt1:0:8:20): maxtags       255
>
> (I'm not providing full camcontrol tags output below because I couldn't
> get it to run during the specific hang I documented most thoroughly; the
> example above is from a different occurrence of the hang.)
>
> The buses don't seem completely frozen: if I manually remove drives while
> the machine is hanging, that's picked up by the mpt driver, which prints
> out corresponding messages to the console. But camcontrol reset all or
> rescan all don't seem to do anything.
>
> I've tried reducing vfs.zfs.vdev.min_pending and vfs.zfs.vdev.max_pending
> to 1, to no avail.
>
> Any suggestions to resolve this problem, work around it, or further
> investigate it would be greatly appreciated!
> Thanks a lot
> Olivier
>
> Detailed information:
>
> Output of procstat -a -kk when the machine is hanging is available at
> http://pastebin.com/7D2KtT35 (not putting it here because it's pretty
> long)
>
> dmesg is available at http://pastebin.com/9zJQwWJG . Note that I'm using
> LUN masking, so the "illegal requests" reported aren't really errors. Maybe
> one day if I get my problems sorted out I'll use geom multipathing instead.
>
> My kernel config is:
>
> include GENERIC
> ident MYKERNEL
>
> options IPSEC
> device crypto
>
> options OFED            # Infiniband protocol
>
> device mlx4ib           # ConnectX Infiniband support
> device mlxen            # ConnectX Ethernet support
> device mthca            # Infinihost cards
> device ipoib            # IP over IB devices
>
> options ATA_CAM         # Handle legacy controllers with CAM
> options ATA_STATIC_ID   # Static device numbering
>
> options KDB
> options DDB
>
> Full output of gstat just before the hang (at most 100s before the hang):
> L(q) ops/s   r/s  kBps  ms/r   w/s  kBps  ms/w   d/s  kBps  ms/d %busy Name
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da2
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da0
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  DEV/da2/da2
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  DEV/da0/da0
>    1    85    48    79   4.7    35    84   0.5     0     0   0.0  24.3  da1
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  DEV/da1/da1
>    1    83    47    77   4.3    34    79   0.5     0     0   0.0  22.1  da4
>    1  1324  1303 21433   0.6    19    42   0.7     0     0   0.0  79.8  da3
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da5
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da6
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da7
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da8
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da9
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da10
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da11
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da12
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da13
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da14
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da15
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da16
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da17
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da18
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da19
>    0    97    57    93   3.5    38    84   0.3     0     0   0.0  21.3  da20
>    0    85    47    69   3.3    36    86   0.4     0     0   0.0  16.8  da21
>    0  1666  1641 18992   0.3    23    43   0.4     0     0   0.0  57.9  da22
>    0    93    55    98   3.5    36    87   0.4     0     0   0.0  20.6  da23
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da24
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da25
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da26
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da27
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da28
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  da29
> 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da30 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da31 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da32 > 0 1200 0 0 0.0 1198 11751 0.6 0 0 > 0.0 67.3 da33 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da34 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da35 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da36 > 0 81 44 67 2.0 35 84 0.3 0 0 > 0.0 10.1 da37 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da38 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da39 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da40 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da41 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da42 > 1 1020 999 22028 0.8 19 42 0.7 0 0 > 0.0 84.8 da43 > 0 1050 1029 23479 0.8 19 47 0.7 0 0 > 0.0 83.3 da44 > 1 1006 984 22758 0.8 21 46 0.6 0 0 > 0.0 84.8 da45 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da46 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da47 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da48 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da49 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da50 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 cd0 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da4/da4 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da3/da3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da5/da5 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da6/da6 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da7/da7 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da8/da8 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da9/da9 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da10/da10 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da11/da11 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da12/da12 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da13/da13 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da14/da14 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da15/da15 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da16/da16 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da17/da17 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da18/da18 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da19/da19 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da20/da20 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da21/da21 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da22/da22 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da23/da23 > 0 0 0 
0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da24/da24 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da25/da25 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26/da26 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 PART/da26/da26 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da26p1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da26p2 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da26p3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da27/da27 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da28/da28 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da29/da29 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da30/da30 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da31/da31 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da32/da32 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da33/da33 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da34/da34 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da35/da35 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da36/da36 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da37/da37 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da38/da38 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da39/da39 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da40/da40 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da41/da41 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da42/da42 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da43/da43 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da44/da44 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da45/da45 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da46/da46 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da47/da47 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da48/da48 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da49/da49 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da50/da50 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/cd0/cd0 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26p1/da26p1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26p2/da26p2 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 LABEL/da26p1/da26p1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26p3/da26p3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 LABEL/da26p2/da26p2 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 
gptid/b4255780-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 > DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 > DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da16 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da17 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da20 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da21 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da37 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da23 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da4 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da43 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da44 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da22 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da33 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da45 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da3 > > > Full output of gstat just after the hang (at most 100s after the hang): > L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps > ms/d %busy Name > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da2 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da0 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da2/da2 > 0 0 0 0 0.0 0 0 0.0 0 0 
> 0.0 0.0 DEV/da0/da0 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da1/da1 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da4 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da5 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da6 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da7 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da8 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da9 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da10 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da11 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da12 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da13 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da14 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da15 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da16 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da17 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da18 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da19 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da20 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da21 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da22 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da23 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da24 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da25 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da26 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da27 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da28 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da29 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da30 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da31 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da32 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da33 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da34 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da35 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da36 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da37 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da38 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da39 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da40 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da41 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da42 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da43 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da44 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da45 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da46 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da47 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da48 > 0 0 0 0 
0.0 0 0 0.0 0 0 > 0.0 0.0 da49 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da50 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 cd0 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da4/da4 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da3/da3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da5/da5 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da6/da6 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da7/da7 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da8/da8 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da9/da9 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da10/da10 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da11/da11 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da12/da12 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da13/da13 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da14/da14 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da15/da15 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da16/da16 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da17/da17 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da18/da18 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da19/da19 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da20/da20 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da21/da21 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da22/da22 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da23/da23 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da24/da24 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da25/da25 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26/da26 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 PART/da26/da26 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da26p1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da26p2 > 1 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 da26p3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da27/da27 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da28/da28 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da29/da29 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da30/da30 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da31/da31 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da32/da32 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da33/da33 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da34/da34 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da35/da35 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da36/da36 > 0 0 0 0 0.0 0 0 0.0 
0 0 > 0.0 0.0 DEV/da37/da37 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da38/da38 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da39/da39 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da40/da40 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da41/da41 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da42/da42 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da43/da43 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da44/da44 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da45/da45 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da46/da46 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da47/da47 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da48/da48 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da49/da49 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da50/da50 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/cd0/cd0 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26p1/da26p1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26p2/da26p2 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 LABEL/da26p1/da26p1 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 DEV/da26p3/da26p3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 LABEL/da26p2/da26p2 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 > DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 > DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 > 0 0 0 0 0.0 0 0 0.0 0 0 > 0.0 0.0 
ZFS::VDEV/zfs::vdev/da16
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da17
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da20
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da21
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da37
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da23
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da1
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da4
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da43
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da44
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da22
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da33
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da45
>    0     0     0     0   0.0     0     0   0.0     0     0   0.0   0.0  ZFS::VDEV/zfs::vdev/da3
>
> On Thu, Dec 13, 2012 at 10:14 PM, olivier wrote:
>
>> For what it's worth, I think I might have solved my problem by reverting
>> to an older version of the mps driver. I checked out a recent version of
>> 9-STABLE and reversed the changes in
>> http://svnweb.freebsd.org/base?view=revision&revision=230592 (perhaps
>> there was a simpler way of reverting to the older mps driver). So far so
>> good, no hang even when hammering the file system.
>>
>> This does not conclusively prove that the new LSI mps driver is at fault,
>> but that seems to be a likely explanation.
>>
>> Thanks to everybody who pointed me in the right direction. Hope this
>> helps others who run into similar problems with 9.1
>> Olivier
>>
>> On Thu, Dec 13, 2012 at 10:14 AM, olivier wrote:
>>
>>> On Thu, Dec 13, 2012 at 9:54 AM, Andriy Gapon wrote:
>>>
>>>> Google for "zfs deadman". This is already committed upstream and I
>>>> think that it is imported into FreeBSD, but I am not sure... Maybe
>>>> it's imported just into the vendor area and is not merged yet.
>>>
>>> Yes, that's exactly what I had in mind. The logic for panicking makes
>>> sense.
>>> As far as I can tell you're correct that deadman is in the vendor area
>>> but not merged. Any idea when it might make it into 9-STABLE?
>>> Thanks
>>> Olivier
>>>
>>>> So, when enabled this logic would panic a system as a way of letting
>>>> know that something is wrong. You can read in the links why panic was
>>>> selected for this job.
>>>>
>>>> And speaking FreeBSD-centric - I think that our CAM layer would be a
>>>> perfect place to detect such issues in non-ZFS-specific way.
>>>>
>>>> --
>>>> Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 20:28:00 2013
Date: Tue, 15 Jan 2013 14:58:42 -0500
Message-Id: <201301151458.42874.jhb@freebsd.org>
In-Reply-To: <20130115141019.H1444@besplex.bde.org>
From: John Baldwin
To: Bruce Evans
Cc: Rick Macklem, fs@freebsd.org
Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client

On Monday, January 14, 2013 11:51:23 pm Bruce Evans wrote:
> On Mon, 14 Jan 2013, Rick Macklem wrote:
>
> > John Baldwin wrote:
> >> The NFS client tries to infer when an application has passed NULL to
> >> utimes() so that it can let the server set the timestamp rather than
> >> using a client-supplied timestamp. It does this by checking to see if
> >> the desired timestamp's second matches the current second. However,
> >> this breaks applications that are intentionally trying to set a
> >> specific timestamp within the current second. In addition, utimes()
> >> sets a flag to indicate if NULL was passed to utimes(). The patch
> >> below changes the NFS client to check this flag and only use the
> >> server-supplied time in that case:
>
> It is certainly an error to not check VA_UTIMES_NULL at all. I think
> the flag (or the NULL pointer) cannot be passed to the server, so the
> best we can do for the VA_UTIMES_NULL case is read the current time on
> the client and pass it to the server. Upper layers have already read
> the current time, but have passed us VA_UTIMES_NULL so that we can tell
> that the pointer was originally null so that we can do the different
> permissions checks for this case.
> > >> Index: fs/nfsclient/nfs_clport.c > >> =================================================================== > >> --- fs/nfsclient/nfs_clport.c (revision 225511) > >> +++ fs/nfsclient/nfs_clport.c (working copy) > >> @@ -762,7 +762,7 @@ > >> *tl = newnfs_false; > >> } > >> if (vap->va_atime.tv_sec != VNOVAL) { > >> - if (vap->va_atime.tv_sec != curtime.tv_sec) { > >> + if (!(vap->va_vaflags & VA_UTIMES_NULL)) { > >> NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED); > >> *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); > >> txdr_nfsv3time(&vap->va_atime, tl); > >> @@ -775,7 +775,7 @@ > >> *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE); > >> ... > > Something mangled the patch so that it is hard to see what it does. It > just uses the flag instead of guessing. > > I can't see anything that does the different permissions check for > the VA_UTIMES_NULL case, and testing shows that this case is just broken, > at least for an old version of the old nfs client -- the same permissions > are required for all cases, but write permission is supposed to be > enough for the VA_UTIMES_NULL case (since write permission is sufficient > for setting the mtime to the current time (plus epsilon) using write(2) > and truncate(2). Setting the atime to the current time should require > no more and no less than read permission, since it can be done using > read(2), but utimes(NULL) requires write permission for that too). Correct. All the other uses of VA_UTIMES_NULL in the tree are to provide the permissions check you describe and there is a large comment about it in ufs_setattr(). Other filesystems have comments that reference ufs_setattr(). I think these checks should be done in nfs_setattr() rather than in the routine to build an NFS attribute object however. Fixing NFS to properly use vfs_timestamp() seems to be a larger project. 
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 22:46:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4A131EC; Tue, 15 Jan 2013 22:46:05 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id 029C7EAB; Tue, 15 Jan 2013 22:46:04 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 3Ym68h4Ct7z7ySF; Tue, 15 Jan 2013 23:45:56 +0100 (CET) Date: Tue, 15 Jan 2013 23:45:56 +0100 From: Nicolas Rachinsky To: Artem Belevich Subject: Re: slowdown of zfs (tx->tx) Message-ID: <20130115224556.GA41774@mid.pc5.i.0x5.de> References: <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Powered-by: FreeBSD X-Homepage: http://www.rachinsky.de X-PGP-Keyid: 887BAE72 X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72 X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 22:46:05 -0000 * Artem Belevich [2013-01-14 17:37 -0800]: > OK. Threads responsible for transaction sync seem to be stuck in zio_wait. > zio_wait is in turn waiting for some task thread to be done with its work. > Now you need to figure out what those task threads are doing. > > 'procstat -kk 0' will dump a few hundred taskq threads. Most of them > would be zfs related.
On an idle box (8.3/amd64 in my case) most of > them would have the same stack trace looking like this (modulo > offsets): > > mi_switch+0x196 sleepq_wait+0x42 _sleep+0x3c0 > taskqueue_thread_loop+0xbe fork_exit+0x11f fork_trampoline+0xe > > Look for stack traces that don't match that pattern. There are some of these. root@bolte ~# sh -c 'for i in `jot 1000`; do procstat -kk 0 ; sleep 0.1 ; done' | sort | uniq -c | grep -v -F 'mi_switch+0x176 sleepq_wait+0x42 _sleep+0x317 taskqueue_thread_loop+0xbe fork_exit+0x11f fork_trampoline+0xe' | sort -n 1 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 1 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dmu_buf_hold_array_by_dnode+0x22b dmu_read+0x89 space_map_load+0x108 metaslab_activate+0xdc metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f 1 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e metaslab_alloc+0x77b zio_dva_allocate+0x1aa zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 1 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e metaslab_alloc+0x77b zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 1 0 100098 kernel zio_write_issue_ mi_switch+0x176 turnstile_wait+0x1cb _mtx_lock_sleep+0xb0 taskqueue_member+0xe8 zio_execute+0x10c taskqueue_run_locked+0x85 
taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 1 0 100099 kernel zio_write_issue_ mi_switch+0x176 critical_exit+0xa5 intr_event_handle+0xb3 intr_execute_handlers+0x5f lapic_handle_intr+0x37 Xapic_isr1+0xa5 space_map_remove+0x81 space_map_load+0x1a4 metaslab_activate+0xdc metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 1 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 1 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e metaslab_alloc+0x77b zio_dva_allocate+0x1aa zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 1 0 100106 kernel zio_write_intr_1 1 0 100108 kernel zio_write_intr_3 1 0 100109 kernel zio_write_intr_4 1 0 100109 kernel zio_write_intr_4 mi_switch+0x176 turnstile_wait+0x1cb _mtx_lock_sleep+0xb0 _sleep+0x251 taskqueue_thread_loop+0xbe fork_exit+0x11f fork_trampoline+0xe 1 0 100110 kernel zio_write_intr_5 mi_switch+0x176 turnstile_wait+0x1cb _mtx_lock_sleep+0xb0 _sleep+0x251 taskqueue_thread_loop+0xbe fork_exit+0x11f fork_trampoline+0xe 2 0 100040 kernel nfe0 taskq 2 0 100096 kernel zio_read_intr_0 2 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 2 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 
metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 2 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x5e5 dbuf_findbp+0x107 dbuf_prefetch+0x8f dmu_prefetch+0x1bb space_map_load+0x289 metaslab_activate+0xdc metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 2 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e metaslab_alloc+0x77b zio_dva_allocate+0x9a zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 3 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e 3 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e 3 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dmu_buf_hold_array_by_dnode+0x22b dmu_read+0x89 space_map_load+0x108 metaslab_activate+0xdc metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 4 0 100098 kernel zio_write_issue_ mi_switch+0x176 
sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 6 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 7 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 7 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 12 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 14 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dmu_buf_hold_array_by_dnode+0x22b dmu_read+0x89 space_map_load+0x108 metaslab_activate+0xdc metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 
taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 18 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 23 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 26 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 31 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 37 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 zio_ready+0x17d zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 84 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 taskqueue_run_locked+0x85 
taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 89 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x1aa zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 145 0 100099 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 147 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe 313 0 100099 kernel zio_write_issue_ 329 0 100098 kernel zio_write_issue_ 998 0 100040 kernel nfe0 taskq mi_switch+0x176 sleepq_wait+0x42 msleep_spin+0x1a2 taskqueue_thread_loop+0x71 fork_exit+0x11f fork_trampoline+0xe 1000 0 100000 kernel swapper mi_switch+0x176 sleepq_timedwait+0x42 _sleep+0x301 scheduler+0x357 mi_startup+0x77 btext+0x2c 1000 0 100017 kernel acpi_task_0 mi_switch+0x176 sleepq_wait+0x42 msleep_spin+0x1a2 taskqueue_thread_loop+0x71 fork_exit+0x11f fork_trampoline+0xe 1000 0 100018 kernel acpi_task_1 mi_switch+0x176 sleepq_wait+0x42 msleep_spin+0x1a2 taskqueue_thread_loop+0x71 fork_exit+0x11f fork_trampoline+0xe 1000 0 100019 kernel acpi_task_2 mi_switch+0x176 sleepq_wait+0x42 msleep_spin+0x1a2 taskqueue_thread_loop+0x71 fork_exit+0x11f fork_trampoline+0xe 1000 0 100041 kernel nfe1 taskq mi_switch+0x176 sleepq_wait+0x42 msleep_spin+0x1a2 taskqueue_thread_loop+0x71 fork_exit+0x11f fork_trampoline+0xe 1000 PID TID COMM TDNAME KSTACK Thank you very much Nicolas -- http://www.rachinsky.de/nicolas From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 22:49:04 2013 
Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0EE961F7; Tue, 15 Jan 2013 22:49:04 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 79BE6EF6; Tue, 15 Jan 2013 22:49:03 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAJXb9VCDaFvO/2dsb2JhbABFhjq3YnOCHgEBBAEjBFIFFg4KAgINGQJZBogmBqYEgkCOc4EjjwKBEwOIYY0rkEmDE4IG X-IronPort-AV: E=Sophos;i="4.84,475,1355115600"; d="scan'208";a="9236221" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 15 Jan 2013 17:49:00 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 2F355B3F0D; Tue, 15 Jan 2013 17:49:00 -0500 (EST) Date: Tue, 15 Jan 2013 17:49:00 -0500 (EST) From: Rick Macklem To: Bruce Evans Message-ID: <1149390778.2023367.1358290140175.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130115141019.H1444@besplex.bde.org> Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Rick Macklem , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 22:49:04 -0000 Bruce Evans wrote: > On Mon, 14 Jan 2013, Rick Macklem wrote: > > > John Baldwin wrote: > >> The NFS client tries to infer when an application has passed NULL > >> to > >> utimes() > >> so that it can let the server set the timestamp rather than using a > >> client- > >> supplied 
timestamp. It does this by checking to see if the desired > >> timestamp's second matches the current second. However, this breaks > >> applications that are intentionally trying to set a specific > >> timestamp > >> within > >> the current second. In addition, utimes() sets a flag to indicate > >> if > >> NULL was > >> passed to utimes(). The patch below changes the NFS client to check > >> this flag > >> and only use the server-supplied time in that case: > > It is certainly an error to not check VA_UTIMES_NULL at all. I think > the flag (or the NULL pointer) cannot be passed to the server, so the > best we can do for the VA_UTIMES_NULL case is read the current time on > the client and pass it to the server. Upper layers have already read > the current time, but have passed us VA_UTIMES_NULL so that we can > tell > that the pointer was originally null so that we can do the different > permissions checks for this case. > > >> Index: fs/nfsclient/nfs_clport.c > >> =================================================================== > >> --- fs/nfsclient/nfs_clport.c (revision 225511) > >> +++ fs/nfsclient/nfs_clport.c (working copy) > >> @@ -762,7 +762,7 @@ > >> *tl = newnfs_false; > >> } > >> if (vap->va_atime.tv_sec != VNOVAL) { > >> - if (vap->va_atime.tv_sec != curtime.tv_sec) { > >> + if (!(vap->va_vaflags & VA_UTIMES_NULL)) { > >> NFSM_BUILD(tl, u_int32_t *, 3 * NFSX_UNSIGNED); > >> *tl++ = txdr_unsigned(NFSV3SATTRTIME_TOCLIENT); > >> txdr_nfsv3time(&vap->va_atime, tl); > >> @@ -775,7 +775,7 @@ > >> *tl = txdr_unsigned(NFSV3SATTRTIME_DONTCHANGE); > >> ... > > Something mangled the patch so that it is hard to see what it does. It > just uses the flag instead of guessing. 
> > I can't see anything that does the different permissions check for > the VA_UTIMES_NULL case, and testing shows that this case is just > broken, > at least for an old version of the old nfs client -- the same > permissions > are required for all cases, but write permission is supposed to be > enough for the VA_UTIMES_NULL case (since write permission is > sufficient > for setting the mtime to the current time (plus epsilon) using > write(2) > and truncate(2). Setting the atime to the current time should require > no more and no less than read permission, since it can be done using > read(2), but utimes(NULL) requires write permission for that too). > I did a quick test on a -current client/server and it seems to work ok. The client uses SET_TIME_TO_SERVER and the server sets VA_UTIMES_NULL for this case. At least it works for a UFS exported volume. > > In the old days, a lot of NFS servers only stored times at a > > resolution of 1sec, which I think is why the code had the habit > > of comparing "seconds equal". > > I think this is not the reason for the check here. > > > If there is some app. out there > > that sets "current time" via utimes(2) with a curent time argument > > instead of a NULL argument would seem to be broken to me. > > (It is conceivable that some app. did this to avoid clock > > skew between the client and server, but I doubt it.) > > Apps have no alternative to using the NULL arg if they have write > permission > to the file but don't own it. > > Oops, on looking at the code I now think it _is_ possible to pass the > request to set the current time on the server, since in the > NFSV3SATTRTIME_TOSERVER case we just pass this case value and not > any time value to the server, so the server has no option but to use > its current time. It is not surprising that the permissions checks > for this don't work right. I thought that the client was responsible > for most permissions checks, but can't find many or the relevant one > here. 
The NFSV3SATTRTIME_TOSERVER code on the server sets > VA_UTIMES_NULL, so I would have thought that the permissions check on > the server does the right thing. > As noted above, it seems to work correctly for the new server in -current, at least for UFS exports. Normally a server will do permission checking for NFS RPCs. There is nothing stopping a client from doing a check and returning an error, but traditionally a server has not trusted a client to do so. (I'm not sure if adding a check in the client is what jhb@ was referring to in his reply to this?) > There are some large timestamping bugs nearby: > > - the old nfs server code for NFSV3SATTRTIME_TOSERVER uses > getnanotime() > to read the current time. This violates the system's policy set by > the vfs.timestamp_precision sysctl in most cases, since using getnanotime() > is the worst supported policy and is not the default. > > The old nfs client uses the correct function to read the current > time, vfs_timestamp(), in nfs_create(), but this is the only use of > vfs_timestamp() in old nfs code. I think most cases use the server > time and thus use the correct function iff the leaf server file > system uses the correct function. > > - the new nfs server code for NFSV3SATTRTIME_TOSERVER macro-izes all > reads of the current time except 1 as NFSGETTIME(). This uses > getmicrotime(), so it violates the system's policy in all cases, > since using getmicrotime() is not a supported policy (using > microtime() is supported). The 1 exception is a hard-coded > getmicrotime() in fs/nfsclient/nfs_clport.c whose use is visible > in the above patch. This one really didn't matter, because only the > seconds part of curtime was used. It was just a micro-pessimization > and style bug. The (not quite) correct way to get the seconds part > is to use time_second, as is done in the old nfs client. > (This way is not quite correct because there are some races and > non-monotonicities reading the times.
In the above check, > vap->va_atime.tv_sec might have been read by a more precise clock > than curtime.tv_sec. Then the check might give a false positive > or negative. But the check is only a heuristic, and is inherently > racy, so this doesn't really matter. > With the above patch the check becomes a different pessimization and > style bug. The curtime variable becomes unused except for its > incorrect initialization. > In this case, after the patch is applied, curtime and getmicrotime() can just be deleted (as you noted, above). > New nfs code never uses the correct function vfs_timestamp(). This needs to be fixed. Until now, I would have had no idea what is the correct interface. (When I did the port, I just used a call that seemed to return what I wanted.;-) Having said that, after reading what you wrote below, it is not obvious to me what the correct fix is? (It seems to be a choice between microtime() and vfs_timestamp()?) > > Following the system policy for file timestamps causes some problems > for utimes(NULL) too. Old versions hard-coded microtime(). Current > versions use vfs_timestamp(). The latter is better, but tends to > give different results than utimes(non_NULL), since few or no > applications know anything about the system's policy. touch(1) > probably should know, but doesn't. So the simple "touch foo" gives > various results, depending: > - touch(1) starts with gettimeofday(). This gives microseconds > resolution and usually microseconds accuracy if its result is used. > - touch then tries utimes(non_NULL) with the current time that it > just read. This usually works, giving microseconds resolution, > etc. This is OK, but often different from the system policy. > - touch then tries utimes(NULL). If this works, then it follows the > system policy. > > Another problem is that not all file systems support nanoseconds > resolutions, so not all system policies or utimes() requests can > be honored.
> > I would usually prefer the system's policy to be enforced as far as > possible. Thus if the system's policy is microseconds resolution, > then times with nanoseconds resolution should be rounded down to the > nearest microsecond. This case is most useful since utimes() cannot > preserve times with more than microseconds resolution. Utilities like > cp(1) blindly round the times given in nanoseconds by stat(2) to ones > that can be written by utimes(2), so this often happens in an > uncontrollable way anyway (POSIX is finally getting around to > specifying > permissible errors for unrepresentable resolutions). But sometimes I > want utimes() to preserve times as well as possible. > > Bruce From owner-freebsd-fs@FreeBSD.ORG Tue Jan 15 23:32:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E6F1B3F1 for ; Tue, 15 Jan 2013 23:32:07 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id AF2B527C for ; Tue, 15 Jan 2013 23:32:07 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAG/l9VCDaFvO/2dsb2JhbABFhjq3YnOCHgEBAQQBAQEgKyALGw4KAgINGQIjBgEJJgYIBwQBHASHZgMPDKV/gkCGZQ2HfoEjimWBCIMVgRMDiGGKfViBVoEcihuFEoMTgVE1 X-IronPort-AV: E=Sophos;i="4.84,475,1355115600"; d="scan'208";a="12075217" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 15 Jan 2013 18:32:05 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3C2F0B3F18; Tue, 15 Jan 2013 18:32:05 -0500 (EST) Date: Tue, 15 Jan 2013 18:32:05 -0500 (EST) From: Rick Macklem To: Sergey Kandaurov Message-ID: <2118820107.2024400.1358292725230.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: getcwd lies on/under 
nfs4-mounted zfs dataset MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 23:32:08 -0000 pluknet@gmail.com wrote: > Hi. > > We are stuck with a problem where we get the wrong current directory path > when sitting on/under a zfs dataset filesystem mounted over NFSv4. > Both nfs server and client are 10.0-CURRENT from December or so. > > The component path "user3" unexpectedly appears to be "." (dot). > nfs-client:/home/user3 # pwd > /home/. > nfs-client:/home/user3/var/run # pwd > /home/./var/run > > nfs-client:~ # procstat -f 3225 > PID COMM FD T V FLAGS REF OFFSET PRO NAME > 3225 a.out text v r r-------- - - - /home/./var/a.out > 3225 a.out ctty v c rw------- - - - /dev/pts/2 > 3225 a.out cwd v d r-------- - - - /home/./var > 3225 a.out root v d r-------- - - - / > > The setup used follows. > > 1. NFS Server with local ZFS: > # cat /etc/exports > V4: / -sec=sys > > # zfs list > pool1 10.4M 122G 580K /pool1 > pool1/user3 on /pool1/user3 (zfs, NFS exported, local, nfsv4acls) > > Exports list on localhost: > /pool1/user3 109.70.28.0 > /pool1 109.70.28.0 > > # zfs get sharenfs pool1/user3 > NAME PROPERTY VALUE SOURCE > pool1/user3 sharenfs -alldirs -maproot=root -network=109.70.28.0/24 > local > > 2. pool1 is mounted on NFSv4 client: > nfs-server:/pool1 on /home (nfs, noatime, nfsv4acls) > > So on the NFS client the "pool1/user3" dataset appears at /home/user3. > / - ufs > /home - zpool-over-nfsv4 > /home/user3 - zfs dataset "pool1/user3" > > At the same time it works as expected when we're not on a zfs dataset, > but directly on its parent zfs pool (also over NFSv4), e.g.
> nfs-client:/home/non_dataset_dir # pwd > /home/non_dataset_dir > > The ls command works as expected: > nfs-client:/# ls -dl /home/user3/var/ > drwxrwxrwt+ 6 root wheel 6 Jan 10 16:19 /home/user3/var/ > Well, if you are just looking for a work around, you could try mounting /home/user3 separately. Otherwise, here's roughly what needs to happen for it to work. (There may be some additional trick(s) I am not aware of.) On the server, ZFS must report: - different fsids for /home vs /home/user3 - fileno (A) must be the same value for "." and ".." for the zfs dataset root (and set VV_ROOT on the vnode) - fileno (B) for "user3" reported by readdir() on /home must be different than what "." and ".." report. Then the NFS server will report a different value (B) for Mounted_on_fileno than it does for Fileno (A), when the client gets attributes for the directory /home/user3. When the client sees Mounted_on_fileno != Fileno, it knows it is at a server mount point boundary and should report the correct stuff to stat() and readdir(). I haven't tested this for a while, so it might be broken for UFS as well. If that's the case, I can probably try and track down the problem here. If not, you can capture packets when you do the getcwd() and then look at them in wireshark, so you can see what the server is returning for Fileno and Mounted_on_fileno. They should be different for "/home/user3" and the latter one should be the value returned by Readdir of "/home" for the "user3" entry. I won't be in a position to look at a wireshark trace until April, so I can't help with that at this time. Since I've never used ZFS, I have no idea what it considers a "mount point"? (Generically, within a mount point there needs to be "same fsid" and a unique set of "fileno" values for all objects. When crossing the mount point, VV_ROOT needs to be set and the mounted_on_vp (or whatever it's called) must refer to the parent. "/home" for this case.) 
I don't know if this helps, rick ps: Solaris10 clients don't get this to work, so you always need to mount each server file system separately, which is the "work around" I suggested at the beginning of this post. > -- > wbr, > pluknet > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 00:07:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 99B89260 for ; Wed, 16 Jan 2013 00:07:44 +0000 (UTC) (envelope-from rcartwri@asu.edu) Received: from mail-oa0-f48.google.com (mail-oa0-f48.google.com [209.85.219.48]) by mx1.freebsd.org (Postfix) with ESMTP id 63E69713 for ; Wed, 16 Jan 2013 00:07:44 +0000 (UTC) Received: by mail-oa0-f48.google.com with SMTP id h2so816736oag.35 for ; Tue, 15 Jan 2013 16:07:38 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=IVTKzE3xkDriny8v4brXDu2cpIUqrxpTT97TyGQjVEc=; b=QCHCzYMMg+WAeYkZCRroLHFeKqjYr1HdJ6AmaVFVrj0w76Ih5nd0DI7VRujZEwEUl6 fThNh9HhDjfxbxsJIzeIrYjcSKJ3W0cctTQGkKzZ0dic5kJ7tGSHCpqis1zEF+DY5FTQ eHT0S116SMwsZ6Jta+5y7KK3sc+RNhS13rHsZRw4Q90Snzj+/+mL1m33rumdw1aSuYTG G91aQS+lHG5tI587zyo/uL0b51S4ZgqdNQeKn5aYdyLTqrpmbeM3MQYn475I5u8i4u2c 1MAeIOiUpQ+kU9nuyYWZeXDeRblaCX42HmdW5l2ExdKN9QUKTSai5zTsUHdooOHDSeC4 NXpg== MIME-Version: 1.0 Received: by 10.182.235.70 with SMTP id uk6mr23274848obc.54.1358294858555; Tue, 15 Jan 2013 16:07:38 -0800 (PST) Received: by 10.76.173.101 with HTTP; Tue, 15 Jan 2013 16:07:38 -0800 (PST) In-Reply-To: References: Date: Tue, 15 Jan 2013 17:07:38 -0700 Message-ID: Subject: Re: CAM hangs in 9-STABLE? 
[Was: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE] From: "Reed A. Cartwright" To: olivier Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQmAQOk86bShZ+ExMD9igtOjThBtIy1ywzwWkit1RMiFqSA8tl1wcgXm4Su7dQLt/gDJYFdW Cc: freebsd-fs@freebsd.org, ken@freebsd.org, "freebsd-stable@freebsd.org" , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 00:07:44 -0000 I don't know if this is relevant or not, but a deadlock was recently fixed in the VFS code: http://svnweb.freebsd.org/base?view=revision&revision=244795 On Tue, Jan 15, 2013 at 12:55 PM, olivier wrote: > Dear All, > Still experiencing the same hangs I reported earlier with 9.1. I've been > running a kernel with WITNESS enabled to provide more information. > > During an occurrence of the hang, running show alllocks gave > > Process 25777 (sysctl) thread 0xfffffe014c5b2920 (102567) > exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff811e34c0) locked @ > /usr/src/sys/dev/usb/usb_transfer.c:3171 > Process 25750 (sshd) thread 0xfffffe015a688000 (104313) > exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0bb98) locked @ > /usr/src/sys/kern/uipc_sockbuf.c:148 > Process 24922 (cnid_dbd) thread 0xfffffe0187ac4920 (103597) > shared lockmgr zfs (zfs) r = 0 (0xfffffe0973062488) locked @ > /usr/src/sys/kern/vfs_syscalls.c:3591 > Process 24117 (sshd) thread 0xfffffe07bd914490 (104195) > exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0a8f0) locked @ > /usr/src/sys/kern/uipc_sockbuf.c:148 > Process 1243 (java) thread 0xfffffe01ca85d000 (102704) > exclusive sleep mutex pmap (pmap) r = 0 (0xfffffe015aec1440) locked @ > /usr/src/sys/amd64/amd64/pmap.c:4840 > exclusive rw pmap pv global (pmap pv global) r = 0 (0xffffffff81409780) > locked @ /usr/src/sys/amd64/amd64/pmap.c:4802 > exclusive sleep mutex vm page (vm 
page) r = 0 (0xffffffff813f0a80) locked @ > /usr/src/sys/vm/vm_object.c:1128 > exclusive sleep mutex vm object (standard object) r = 0 > (0xfffffe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076 > shared sx vm map (user) (vm map (user)) r = 0 (0xfffffe015aec1388) locked @ > /usr/src/sys/vm/vm_map.c:2045 > Process 994 (nfsd) thread 0xfffffe015a0df000 (102426) > shared lockmgr zfs (zfs) r = 0 (0xfffffe0c3b505878) locked @ > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760 > Process 994 (nfsd) thread 0xfffffe015a0f8490 (102422) > exclusive lockmgr zfs (zfs) r = 0 (0xfffffe02db3b3e60) locked @ > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760 > Process 931 (syslogd) thread 0xfffffe015af18920 (102365) > shared lockmgr zfs (zfs) r = 0 (0xfffffe0141dd6680) locked @ > /usr/src/sys/kern/vfs_syscalls.c:3591 > Process 22 (syncer) thread 0xfffffe0125077000 (100279) > exclusive lockmgr syncer (syncer) r = 0 (0xfffffe015a2ff680) locked @ > /usr/src/sys/kern/vfs_subr.c:1809 > > I don't have full "show lockedvnods" output because the output does not get > captured by ddb after using "capture on", it doesn't fit on a single > screen, and doesn't get piped into a "more" equivalent. 
What I did manage > to get (copied by hand, typos possible) is: > > 0xfffffe0c3b5057e0: 0xfffffe0c3b5057e0: tag zfs, type VREG > tag zfs, type VREG > usecount 1, writecount 0, refcount 1 mountedhere 0 > usecount 1, writecount 0, refcount 1 mountedhere 0 > flags (VI_ACTIVE) > flags (VI_ACTIVE) > v_object 0xfffffe089bc1b828 ref 0 pages 0 > v_object 0xfffffe089bc1b828 ref 0 pages 0 > lock type zfs: SHARED (count 1) > lock type zfs: SHARED (count 1) > > 0xfffffe02db3b3dc8: 0xfffffe02db3b3dc8: tag zfs, type VREG > tag zfs, type VREG > usecount 6, writecount 0, refcount 6 mountedhere 0 > usecount 6, writecount 0, refcount 6 mountedhere 0 > flags (VI_ACTIVE) > flags (VI_ACTIVE) > v_object 0xfffffe0b79583ae0 ref 0 pages 0 > v_object 0xfffffe0b79583ae0 ref 0 pages 0 > lock type zfs: EXCL by thread 0xfffffe015a0f8490 (pid 994) > lock type zfs: EXCL by thread 0xfffffe015a0f8490 (pid 994) > with exclusive waiters pending > with exclusive waiters pending > > The output of show witness is at http://pastebin.com/eSRb3FEu > > The output of alltrace is at http://pastebin.com/X1LruNrf (a number of > threads are stuck in zio_wait, none I can find in zio_interrupt, and > according to gstat and disks eventually going to sleep all disk IO seems to > be stuck for good; I think Andriy explained earlier that these criteria > might indicate this is a ZFS hang). > > The output of show geom is at http://pastebin.com/6nwQbKr4 > > The output of vmstat -i is at http://pastebin.com/9LcZ7Mi0 Interrupts are > occurring at a normal rate during the hang, as far as I can tell. > > Any help would be greatly appreciated. > Thanks > Olivier > PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci > from 9.0 (in the hope it would fix the hangs I was experiencing in plain > 9-STABLE; obviously the hangs are still occurring). The rest of my > configuration is the same as posted earlier. 
> > On Mon, Dec 24, 2012 at 9:42 PM, olivier wrote: > >> Dear All >> It turns out that reverting to an older version of the mps driver did not >> fix the ZFS hangs I've been struggling with in 9.1 and 9-STABLE after all >> (they just took a bit longer to occur again, possibly just by chance). I >> followed steps along lines suggested by Andriy to collect more information >> when the problem occurs. Hopefully this will help figure out what's going >> on. >> >> As far as I can tell, what happens is that at some point IO operations to >> a bunch of drives that belong to different pools get stuck. For these >> drives, gstat shows no activity but 1 pending operation, as such: >> >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps >> ms/d %busy Name >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da1 >> >> I've been running gstat in a loop (every 100s) to monitor the machine. >> Just before the hang occurs, everything seems fine (see full gstat output >> below). Right after the hang occurs a number of drives seem stuck (see full >> gstat output below). Notably, some stuck drives are seen through the mps >> driver and others through the mpt driver. So the problem doesn't seem to be >> driver-specific. I have had the problem occur (at a lower frequency) on >> similar machines that don't use the mpt driver (and only have 1 disk >> provided through mps), so the problem doesn't seem to be caused by the mpt >> driver (and is likely not caused by defective hardware). Since based on the >> information I provided earlier Andriy thinks the problem might not >> originate in ZFS, perhaps that means that the problem is in the CAM layer? 
>> >> camcontrol tags -v (as suggested by Andriy) in the hung state shows for >> example >> >> (pass56:mpt1:0:8:20): dev_openings 254 >> (pass56:mpt1:0:8:20): dev_active 1 >> (pass56:mpt1:0:8:20): devq_openings 254 >> (pass56:mpt1:0:8:20): devq_queued 0 >> (pass56:mpt1:0:8:20): held 0 >> (pass56:mpt1:0:8:20): mintags 2 >> (pass56:mpt1:0:8:20): maxtags 255 >> (I'm not providing full camcontrol tags output below because I couldn't >> get it to run during the specific hang I documented most thoroughly; the >> example above is from a different occurrence of the hang). >> >> The buses don't seem completely frozen: if I manually remove drives while >> the machine is hanging, that's picked up by the mpt driver, which prints >> out corresponding messages to the console. But camcontrol reset all or >> rescan all don't seem to do anything. >> >> I've tried reducing vfs.zfs.vdev.min_pending and vfs.zfs.vdev.max_pending >> to 1, to no avail. >> >> Any suggestions to resolve this problem, work around it, or further >> investigate it would be greatly appreciated! >> Thanks a lot >> Olivier >> >> Detailed information: >> >> Output of procstat -a -kk when the machine is hanging is available at >> http://pastebin.com/7D2KtT35 (not putting it here because it's pretty >> long) >> >> dmesg is available at http://pastebin.com/9zJQwWJG . Note that I'm using >> LUN masking, so the "illegal requests" reported aren't really errors. Maybe >> one day if I get my problems sorted out I'll use geom multipathing instead. 
>> >> My kernel config is >> include GENERIC >> ident MYKERNEL >> >> options IPSEC >> device crypto >> >> options OFED # Infiniband protocol >> >> device mlx4ib # ConnectX Infiniband support >> device mlxen # ConnectX Ethernet support >> device mthca # Infinihost cards >> device ipoib # IP over IB devices >> >> options ATA_CAM # Handle legacy controllers with CAM >> options ATA_STATIC_ID # Static device numbering >> >> options KDB >> options DDB >> >> >> >> Full output of gstat just before the hang (at most 100s before the hang): >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps >> ms/d %busy Name >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da2/da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da0/da0 >> 1 85 48 79 4.7 35 84 0.5 0 0 >> 0.0 24.3 da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da1/da1 >> 1 83 47 77 4.3 34 79 0.5 0 0 >> 0.0 22.1 da4 >> 1 1324 1303 21433 0.6 19 42 0.7 0 0 >> 0.0 79.8 da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da5 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da15 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da19 >> 0 97 57 93 3.5 38 84 0.3 0 0 >> 0.0 21.3 da20 >> 0 85 47 69 3.3 36 86 0.4 0 0 >> 0.0 16.8 da21 >> 0 1666 1641 18992 0.3 23 43 0.4 0 0 >> 0.0 57.9 da22 >> 0 93 55 98 3.5 36 87 0.4 0 0 >> 0.0 20.6 da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 
0.0 da27 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da32 >> 0 1200 0 0 0.0 1198 11751 0.6 0 0 >> 0.0 67.3 da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da36 >> 0 81 44 67 2.0 35 84 0.3 0 0 >> 0.0 10.1 da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da39 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da42 >> 1 1020 999 22028 0.8 19 42 0.7 0 0 >> 0.0 84.8 da43 >> 0 1050 1029 23479 0.8 19 47 0.7 0 0 >> 0.0 83.3 da44 >> 1 1006 984 22758 0.8 21 46 0.6 0 0 >> 0.0 84.8 da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da4/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da3/da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da5/da5 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da6/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da7/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da8/da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da9/da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da10/da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da11/da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da12/da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da13/da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da14/da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da15/da15 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da16/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da17/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da18/da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da19/da19 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 
0.0 DEV/da20/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da21/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da22/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da23/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da24/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da25/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 PART/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da27/da27 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da28/da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da29/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da30/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da31/da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da32/da32 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da33/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da34/da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da35/da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da36/da36 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da37/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da38/da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da39/da39 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da40/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da41/da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da42/da42 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da43/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da44/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da45/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da46/da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da47/da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da48/da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da49/da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da50/da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/cd0/cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p1/da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p1/da26p1 >> 0 0 0 0 
0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p3/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da3 >> >> >> Full output of 
gstat just after the hang (at most 100s after the hang): >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps >> ms/d %busy Name >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da2/da2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da0/da0 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da1/da1 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da4 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da5 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da15 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da16 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da19 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da20 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da22 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da23 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da25 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da27 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da32 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da36 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da39 >> 0 0 0 0 
0.0 0 0 0.0 0 0 >> 0.0 0.0 da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da42 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da43 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da44 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da4/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da3/da3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da5/da5 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da6/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da7/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da8/da8 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da9/da9 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da10/da10 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da11/da11 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da12/da12 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da13/da13 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da14/da14 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da15/da15 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da16/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da17/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da18/da18 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da19/da19 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da20/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da21/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da22/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da23/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da24/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da25/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 PART/da26/da26 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p2 >> 1 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da27/da27 >> 0 0 0 0 0.0 0 0 
0.0 0 0 >> 0.0 0.0 DEV/da28/da28 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da29/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da30/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da31/da31 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da32/da32 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da33/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da34/da34 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da35/da35 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da36/da36 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da37/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da38/da38 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da39/da39 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da40/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da41/da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da42/da42 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da43/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da44/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da45/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da46/da46 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da47/da47 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da48/da48 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da49/da49 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da50/da50 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/cd0/cd0 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p1/da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p1/da26p1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 DEV/da26p3/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 LABEL/da26p2/da26p2 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 >> DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a 
>> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da16 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da17 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da20 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da21 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da37 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da23 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da1 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da4 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da43 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da44 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da22 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da33 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da45 >> 0 0 0 0 0.0 0 0 0.0 0 0 >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da3 >> >> >> On Thu, Dec 13, 2012 at 10:14 PM, olivier wrote: >> >>> For what it's worth, I think I might have solved my problem by reverting >>> to an older version of the mps driver. I checked out a recent version of >>> 9-STABLE and reversed the changes in >>> http://svnweb.freebsd.org/base?view=revision&revision=230592 (perhaps >>> there was a simpler way of reverting to the older mps driver). So far so >>> good, no hang even when hammering the file system. >>> >>> This does not conclusively prove that the new LSI mps driver is at fault, >>> but that seems to be a likely explanation. 
>>> >>> Thanks to everybody who pointed me in the right direction. Hope this >>> helps others who run into similar problems with 9.1 >>> Olivier >>> >>> >>> On Thu, Dec 13, 2012 at 10:14 AM, olivier wrote: >>> >>>> >>>> >>>> On Thu, Dec 13, 2012 at 9:54 AM, Andriy Gapon wrote: >>>> >>>>> Google for "zfs deadman". This is already committed upstream and I >>>>> think that it >>>>> is imported into FreeBSD, but I am not sure... Maybe it's imported >>>>> just into the >>>>> vendor area and is not merged yet. >>>>> >>>> >>>> Yes, that's exactly what I had in mind. The logic for panicking makes >>>> sense. >>>> As far as I can tell you're correct that deadman is in the vendor area >>>> but not merged. Any idea when it might make it into 9-STABLE? >>>> Thanks >>>> Olivier >>>> >>>> >>>> >>>> >>>>> So, when enabled this logic would panic a system as a way of letting >>>>> know that >>>>> something is wrong. You can read in the links why panic was selected >>>>> for this job. >>>>> >>>>> And speaking FreeBSD-centric - I think that our CAM layer would be a >>>>> perfect place >>>>> to detect such issues in non-ZFS-specific way. >>>>> >>>>> -- >>>>> Andriy Gapon >>>>> >>>> >>>> >>> >> > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" -- Reed A. 
Cartwright, PhD Assistant Professor of Genomics, Evolution, and Bioinformatics School of Life Sciences Center for Evolutionary Medicine and Informatics The Biodesign Institute Arizona State University From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 00:16:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CC3AE599 for ; Wed, 16 Jan 2013 00:16:47 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-vc0-f180.google.com (mail-vc0-f180.google.com [209.85.220.180]) by mx1.freebsd.org (Postfix) with ESMTP id 8CF557C4 for ; Wed, 16 Jan 2013 00:16:47 +0000 (UTC) Received: by mail-vc0-f180.google.com with SMTP id p16so762001vcq.39 for ; Tue, 15 Jan 2013 16:16:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=RZ4UGXlPyiFkjowbzCuy+4v/0dm3vdNPzEKHocW0Nhc=; b=lf1S93PQXn8v7iWzMVhoRF/N8O2yUWBseY9wFk9fN5Zgtmf72aFFP02lv/rfrNUYuY zzVIR55EAaGxe0XMFqpcLbfy4uQcfx5bMa4zxI2tkW+quHkLE6fZNqCh9d3wbQjC4ZY0 eRbe2wL4vGQFQwocjUjW1yE0OmRUeuDKVdJgWTyhKloHJ2C6P7scU2RHlo3/WD8A/buF V2uL12pS/nmVW/g+fjEr/F19be4OBihNm9KyEylErzdz5FoBWqVslhqqwoEpDb37S/zT 0g+RzyrVpDUtsPCDknCo86OPo9KhD2FY4VP/X2ka411KbQrBbgiuOYsIdf02PZQBgiOP dc9w== MIME-Version: 1.0 Received: by 10.220.153.201 with SMTP id l9mr106550322vcw.33.1358295401115; Tue, 15 Jan 2013 16:16:41 -0800 (PST) Sender: artemb@gmail.com Received: by 10.220.122.196 with HTTP; Tue, 15 Jan 2013 16:16:40 -0800 (PST) In-Reply-To: <20130115224556.GA41774@mid.pc5.i.0x5.de> References: <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> Date: 
Tue, 15 Jan 2013 16:16:40 -0800 X-Google-Sender-Auth: kDVbLxBGd15xXKwDJfHspV66yps Message-ID: Subject: Re: slowdown of zfs (tx->tx) From: Artem Belevich To: Nicolas Rachinsky Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 00:16:47 -0000 On Tue, Jan 15, 2013 at 2:45 PM, Nicolas Rachinsky wrote: > 147 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe It appears that lots of threads are stuck in the metaslab_activate->space_map_load_wait path. This sounds like CR# 6876962 in Solaris: "degraded write performance with threads held up by space_map_load_wait(). This bug is fixed in patch 147440-05, -06 or -07, which is current and contains the fix." Alas, I could not find specifics on how the issue got fixed and whether the same fix is present in illumos and FreeBSD. You may want to update your system to a very recent FreeBSD as quite a few fixes were recently imported from illumos. Hopefully it will deal with the issue. I'm out of ideas otherwise. Sorry. 
--Artem From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 00:56:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 49CFD7C9; Wed, 16 Jan 2013 00:56:52 +0000 (UTC) (envelope-from prvs=1728d5906c=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5EB15A16; Wed, 16 Jan 2013 00:56:51 +0000 (UTC) Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50001720881.msg; Wed, 16 Jan 2013 00:56:43 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 16 Jan 2013 00:56:43 +0000 (not processed: message from valid local sender) X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1728d5906c=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <00F86FD0E85D4EEEA1A01E115497F022@multiplay.co.uk> From: "Steven Hartland" To: "Artem Belevich" , "Nicolas Rachinsky" References: <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> Subject: Re: slowdown of zfs (tx->tx) Date: Wed, 16 Jan 2013 00:57:03 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 00:56:52 -0000 ----- Original Message ----- From: "Artem Belevich" To: "Nicolas Rachinsky" 
Cc: "freebsd-fs" Sent: Wednesday, January 16, 2013 12:16 AM Subject: Re: slowdown of zfs (tx->tx) > On Tue, Jan 15, 2013 at 2:45 PM, Nicolas Rachinsky > wrote: >> 147 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 >> metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 taskqueue_run_locked+0x85 >> taskqueue_thread_loop+0x4e fork_exit+0x11f fork_trampoline+0xe > > It appears that lots of threads are stuck in the > metaslab_activate->space_map_load_wait path. This sounds like CR# > 6876962 in Solaris: "degraded write performance with threads held up > by space_map_load_wait(). This bug is fixed in patch 147440-05, -06 or > -07, which is current and contains the fix." Alas, I could not find > specifics on how the issue got fixed and whether the same fix is > present in illumos and FreeBSD. > > You may want to update your system to very recent FreeBSD as quite a > few fixes were recently imported from illumos. Hopefully it will deal > with the issue. I'm out of ideas otherwise. Sorry. That would tend to indicate it's blocking on write. If that is the case, and the rsync is copying from this box with little else doing writes, atime updates could be what is causing the issue. A test for this would be to use the following to disable atime and see if that helps: zfs set atime=off [filesystem] Also, out of interest, does the pool have many snapshots? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
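[Editor's note: Steve's atime test above can be sketched as a short shell session. "tank/data" is a placeholder dataset name, not one taken from the thread.]

```shell
# Show whether atime updates are enabled on the dataset rsync reads from
# ("tank/data" is a placeholder for the actual dataset).
zfs get atime tank/data

# Turn atime off so plain reads stop generating metadata writes,
# then repeat the rsync and see whether the tx->tx stalls persist.
zfs set atime=off tank/data

# Count snapshots under the pool, per Steve's follow-up question.
zfs list -t snapshot -r tank | wc -l
```

These commands need a live ZFS pool and root privileges; the property can be flipped back with `zfs set atime=on tank/data` if it makes no difference.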
From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 02:23:32 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 41B80DED; Wed, 16 Jan 2013 02:23:32 +0000 (UTC) (envelope-from olivier777a7@gmail.com) Received: from mail-la0-f48.google.com (mail-la0-f48.google.com [209.85.215.48]) by mx1.freebsd.org (Postfix) with ESMTP id C17A0E9A; Wed, 16 Jan 2013 02:23:30 +0000 (UTC) Received: by mail-la0-f48.google.com with SMTP id ej20so871696lab.35 for ; Tue, 15 Jan 2013 18:23:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=WxoEaKQMpUbcYwd6CadgFniPrFeyKzP9yeqLzzSeJao=; b=otcxFn/U7LRdxP1bOUJRrt4l9CYWDemIBHjEqORBzAwp4ky1pnwGI57opXWN5xbIXO U9F87ArtiVxRFH/wIrFQYLjiKjjjR6u16WxmaiBxSmKiiMJyE86HL6EO8UdOfIhBjrk8 dnf57JBg8kyIsXNww3wq3H3s7OW+9SA+VaOcYxKiYGyD4mmUCL5cfPZOK3bj+oTtDt5L hkIEGUyOLGh/9v3/H3szSetsP5mMp2O5AEgY4BP9SIzVOLcFY1U7PyggJiZjXMDQeCe8 NgTlqTxWHb7PqyxqDuQtICH65yPTqjrND1fTGeI6WiZOplbSmPFjecA51f70dNgJJ94s 2tkg== MIME-Version: 1.0 Received: by 10.152.144.164 with SMTP id sn4mr87688027lab.57.1358303004305; Tue, 15 Jan 2013 18:23:24 -0800 (PST) Received: by 10.114.78.41 with HTTP; Tue, 15 Jan 2013 18:23:24 -0800 (PST) In-Reply-To: References: Date: Tue, 15 Jan 2013 18:23:24 -0800 Message-ID: Subject: Re: CAM hangs in 9-STABLE? [Was: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE] From: olivier To: "Reed A. 
Cartwright" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org, ken@freebsd.org, "freebsd-stable@freebsd.org" , Andriy Gapon X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 02:23:32 -0000 My understanding is that the locks (and pieces of kernel code) involved are different. Maybe someone more knowledgeable than I am can comment. Thanks for the suggestion... Olivier On Tue, Jan 15, 2013 at 4:07 PM, Reed A. Cartwright wrote: > I don't know if this is relevant or not, but a deadlock was recently > fixed in the VFS code: > > http://svnweb.freebsd.org/base?view=revision&revision=244795 > > On Tue, Jan 15, 2013 at 12:55 PM, olivier wrote: > > Dear All, > > Still experiencing the same hangs I reported earlier with 9.1. I've been > > running a kernel with WITNESS enabled to provide more information.
> > > > During an occurrence of the hang, running show alllocks gave > > > > Process 25777 (sysctl) thread 0xfffffe014c5b2920 (102567) > > exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff811e34c0) locked @ > > /usr/src/sys/dev/usb/usb_transfer.c:3171 > > Process 25750 (sshd) thread 0xfffffe015a688000 (104313) > > exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0bb98) locked @ > > /usr/src/sys/kern/uipc_sockbuf.c:148 > > Process 24922 (cnid_dbd) thread 0xfffffe0187ac4920 (103597) > > shared lockmgr zfs (zfs) r = 0 (0xfffffe0973062488) locked @ > > /usr/src/sys/kern/vfs_syscalls.c:3591 > > Process 24117 (sshd) thread 0xfffffe07bd914490 (104195) > > exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfffffe0204e0a8f0) locked @ > > /usr/src/sys/kern/uipc_sockbuf.c:148 > > Process 1243 (java) thread 0xfffffe01ca85d000 (102704) > > exclusive sleep mutex pmap (pmap) r = 0 (0xfffffe015aec1440) locked @ > > /usr/src/sys/amd64/amd64/pmap.c:4840 > > exclusive rw pmap pv global (pmap pv global) r = 0 (0xffffffff81409780) > > locked @ /usr/src/sys/amd64/amd64/pmap.c:4802 > > exclusive sleep mutex vm page (vm page) r = 0 (0xffffffff813f0a80) > locked @ > > /usr/src/sys/vm/vm_object.c:1128 > > exclusive sleep mutex vm object (standard object) r = 0 > > (0xfffffe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076 > > shared sx vm map (user) (vm map (user)) r = 0 (0xfffffe015aec1388) > locked @ > > /usr/src/sys/vm/vm_map.c:2045 > > Process 994 (nfsd) thread 0xfffffe015a0df000 (102426) > > shared lockmgr zfs (zfs) r = 0 (0xfffffe0c3b505878) locked @ > > > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760 > > Process 994 (nfsd) thread 0xfffffe015a0f8490 (102422) > > exclusive lockmgr zfs (zfs) r = 0 (0xfffffe02db3b3e60) locked @ > > > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760 > > Process 931 (syslogd) thread 0xfffffe015af18920 (102365) > > shared lockmgr zfs (zfs) r = 0 
(0xfffffe0141dd6680) locked @ > > /usr/src/sys/kern/vfs_syscalls.c:3591 > > Process 22 (syncer) thread 0xfffffe0125077000 (100279) > > exclusive lockmgr syncer (syncer) r = 0 (0xfffffe015a2ff680) locked @ > > /usr/src/sys/kern/vfs_subr.c:1809 > > > > I don't have full "show lockedvnods" output because the output does not > get > > captured by ddb after using "capture on", it doesn't fit on a single > > screen, and doesn't get piped into a "more" equivalent. What I did manage > > to get (copied by hand, typos possible) is: > > > > 0xfffffe0c3b5057e0: 0xfffffe0c3b5057e0: tag zfs, type VREG > > tag zfs, type VREG > > usecount 1, writecount 0, refcount 1 mountedhere 0 > > usecount 1, writecount 0, refcount 1 mountedhere 0 > > flags (VI_ACTIVE) > > flags (VI_ACTIVE) > > v_object 0xfffffe089bc1b828 ref 0 pages 0 > > v_object 0xfffffe089bc1b828 ref 0 pages 0 > > lock type zfs: SHARED (count 1) > > lock type zfs: SHARED (count 1) > > > > 0xfffffe02db3b3dc8: 0xfffffe02db3b3dc8: tag zfs, type VREG > > tag zfs, type VREG > > usecount 6, writecount 0, refcount 6 mountedhere 0 > > usecount 6, writecount 0, refcount 6 mountedhere 0 > > flags (VI_ACTIVE) > > flags (VI_ACTIVE) > > v_object 0xfffffe0b79583ae0 ref 0 pages 0 > > v_object 0xfffffe0b79583ae0 ref 0 pages 0 > > lock type zfs: EXCL by thread 0xfffffe015a0f8490 (pid 994) > > lock type zfs: EXCL by thread 0xfffffe015a0f8490 (pid 994) > > with exclusive waiters pending > > with exclusive waiters pending > > > > The output of show witness is at http://pastebin.com/eSRb3FEu > > > > The output of alltrace is at http://pastebin.com/X1LruNrf (a number of > > threads are stuck in zio_wait, none I can find in zio_interrupt, and > > according to gstat and disks eventually going to sleep all disk IO seems > to > > be stuck for good; I think Andriy explained earlier that these criteria > > might indicate this is a ZFS hang). 
> > > > The output of show geom is at http://pastebin.com/6nwQbKr4 > > > > The output of vmstat -i is at http://pastebin.com/9LcZ7Mi0 Interrupts > are > > occurring at a normal rate during the hang, as far as I can tell. > > > > Any help would be greatly appreciated. > > Thanks > > Olivier > > PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci > > from 9.0 (in the hope it would fix the hangs I was experiencing in plain > > 9-STABLE; obviously the hangs are still occurring). The rest of my > > configuration is the same as posted earlier. > > > > On Mon, Dec 24, 2012 at 9:42 PM, olivier wrote: > > > >> Dear All > >> It turns out that reverting to an older version of the mps driver did > not > >> fix the ZFS hangs I've been struggling with in 9.1 and 9-STABLE after > all > >> (they just took a bit longer to occur again, possibly just by chance). I > >> followed steps along lines suggested by Andriy to collect more > information > >> when the problem occurs. Hopefully this will help figure out what's > going > >> on. > >> > >> As far as I can tell, what happens is that at some point IO operations > to > >> a bunch of drives that belong to different pools get stuck. For these > >> drives, gstat shows no activity but 1 pending operation, as such: > >> > >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps > >> ms/d %busy Name > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da1 > >> > >> I've been running gstat in a loop (every 100s) to monitor the machine. > >> Just before the hang occurs, everything seems fine (see full gstat > output > >> below). Right after the hang occurs a number of drives seem stuck (see > full > >> gstat output below). Notably, some stuck drives are seen through the mps > >> driver and others through the mpt driver. So the problem doesn't seem > to be > >> driver-specific. 
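[Editor's note: the "gstat in a loop (every 100s)" monitoring described above could look something like the following; the interval, log path, and use of batch mode are illustrative choices, not details from the thread.]

```shell
#!/bin/sh
# Append a timestamped gstat snapshot to a log every 100 seconds, so the
# per-device queue state just before and just after a hang is preserved.
LOG=/var/log/gstat-hang.log
while true; do
    date >> "$LOG"
    gstat -b >> "$LOG"    # -b: print one batch-mode snapshot and exit
    sleep 100
done
```

A stuck device then shows up in the log as L(q) >= 1 with zero ops/s, as in the captures below.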
I have had the problem occur (at a lower frequency) on > >> similar machines that don't use the mpt driver (and only have 1 disk > >> provided through mps), so the problem doesn't seem to be caused by the > mpt > >> driver (and is likely not caused by defective hardware). Since based on > the > >> information I provided earlier Andriy thinks the problem might not > >> originate in ZFS, perhaps that means that the problem is in the CAM > layer? > >> > >> camcontrol tags -v (as suggested by Andriy) in the hung state shows for > >> example > >> > >> (pass56:mpt1:0:8:20): dev_openings 254 > >> (pass56:mpt1:0:8:20): dev_active 1 > >> (pass56:mpt1:0:8:20): devq_openings 254 > >> (pass56:mpt1:0:8:20): devq_queued 0 > >> (pass56:mpt1:0:8:20): held 0 > >> (pass56:mpt1:0:8:20): mintags 2 > >> (pass56:mpt1:0:8:20): maxtags 255 > >> (I'm not providing full camcontrol tags output below because I couldn't > >> get it to run during the specific hang I documented most thoroughly; the > >> example above is from a different occurrence of the hang). > >> > >> The buses don't seem completely frozen: if I manually remove drives > while > >> the machine is hanging, that's picked up by the mpt driver, which prints > >> out corresponding messages to the console. But camcontrol reset all or > >> rescan all don't seem to do anything. > >> > >> I've tried reducing vfs.zfs.vdev.min_pending and > vfs.zfs.vdev.max_pending > >> to 1, to no avail. > >> > >> Any suggestions to resolve this problem, work around it, or further > >> investigate it would be greatly appreciated! > >> Thanks a lot > >> Olivier > >> > >> Detailed information: > >> > >> Output of procstat -a -kk when the machine is hanging is available at > >> http://pastebin.com/7D2KtT35 (not putting it here because it's pretty > >> long) > >> > >> dmesg is available at http://pastebin.com/9zJQwWJG . Note that I'm > using > >> LUN masking, so the "illegal requests" reported aren't really errors. 
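[Editor's note: the vfs.zfs.vdev.min_pending/max_pending tuning tried above is done with sysctl(8). These names are the FreeBSD 8.x/9.x ones; later releases replaced them with the vfs.zfs.vdev.*_max_active knobs.]

```shell
# Inspect the current per-vdev queue depth limits.
sysctl vfs.zfs.vdev.min_pending vfs.zfs.vdev.max_pending

# Drop both to 1 to rule out deep vdev queues as a factor,
# as tried above (requires root; takes effect immediately).
sysctl vfs.zfs.vdev.min_pending=1
sysctl vfs.zfs.vdev.max_pending=1
```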
> Maybe > >> one day if I get my problems sorted out I'll use geom multipathing > instead. > >> > >> My kernel config is > >> include GENERIC > >> ident MYKERNEL > >> > >> options IPSEC > >> device crypto > >> > >> options OFED # Infiniband protocol > >> > >> device mlx4ib # ConnectX Infiniband support > >> device mlxen # ConnectX Ethernet support > >> device mthca # Infinihost cards > >> device ipoib # IP over IB devices > >> > >> options ATA_CAM # Handle legacy controllers with CAM > >> options ATA_STATIC_ID # Static device numbering > >> > >> options KDB > >> options DDB > >> > >> > >> > >> Full output of gstat just before the hang (at most 100s before the > hang): > >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps > >> ms/d %busy Name > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da0 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da2/da2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da0/da0 > >> 1 85 48 79 4.7 35 84 0.5 0 0 > >> 0.0 24.3 da1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da1/da1 > >> 1 83 47 77 4.3 34 79 0.5 0 0 > >> 0.0 22.1 da4 > >> 1 1324 1303 21433 0.6 19 42 0.7 0 0 > >> 0.0 79.8 da3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da5 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da6 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da7 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da8 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da9 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da10 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da11 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da12 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da13 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da14 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da15 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da16 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da17 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da18 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da19 > >> 0 97 57 93 3.5 38 84 0.3 0 0 > >> 0.0 21.3 da20 > >> 0 85 47 69 3.3 36 86 0.4 0 0 > >> 0.0 16.8 da21 > >> 0 1666 1641 18992 
0.3 23 43 0.4 0 0 > >> 0.0 57.9 da22 > >> 0 93 55 98 3.5 36 87 0.4 0 0 > >> 0.0 20.6 da23 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da24 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da25 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da27 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da28 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da29 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da30 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da31 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da32 > >> 0 1200 0 0 0.0 1198 11751 0.6 0 0 > >> 0.0 67.3 da33 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da34 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da35 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da36 > >> 0 81 44 67 2.0 35 84 0.3 0 0 > >> 0.0 10.1 da37 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da38 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da39 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da40 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da41 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da42 > >> 1 1020 999 22028 0.8 19 42 0.7 0 0 > >> 0.0 84.8 da43 > >> 0 1050 1029 23479 0.8 19 47 0.7 0 0 > >> 0.0 83.3 da44 > >> 1 1006 984 22758 0.8 21 46 0.6 0 0 > >> 0.0 84.8 da45 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da46 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da47 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da48 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da49 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da50 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 cd0 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da4/da4 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da3/da3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da5/da5 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da6/da6 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da7/da7 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da8/da8 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da9/da9 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da10/da10 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da11/da11 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 
DEV/da12/da12 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da13/da13 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da14/da14 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da15/da15 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da16/da16 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da17/da17 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da18/da18 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da19/da19 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da20/da20 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da21/da21 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da22/da22 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da23/da23 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da24/da24 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da25/da25 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26/da26 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 PART/da26/da26 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26p1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26p2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26p3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da27/da27 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da28/da28 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da29/da29 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da30/da30 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da31/da31 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da32/da32 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da33/da33 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da34/da34 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da35/da35 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da36/da36 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da37/da37 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da38/da38 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da39/da39 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da40/da40 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da41/da41 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da42/da42 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da43/da43 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 
DEV/da44/da44 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da45/da45 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da46/da46 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da47/da47 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da48/da48 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da49/da49 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da50/da50 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/cd0/cd0 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26p1/da26p1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26p2/da26p2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 LABEL/da26p1/da26p1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26p3/da26p3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 LABEL/da26p2/da26p2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 > >> > DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 > >> > DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da16 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da17 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da20 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 
0.0 ZFS::VDEV/zfs::vdev/da21 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da37 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da23 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da4 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da43 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da44 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da22 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da33 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da45 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da3 > >> > >> > >> Full output of gstat just after the hang (at most 100s after the hang): > >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps > >> ms/d %busy Name > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da0 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da2/da2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da0/da0 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da1/da1 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da4 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da5 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da6 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da7 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da8 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da9 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da10 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da11 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da12 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da13 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da14 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da15 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da16 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da17 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da18 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da19 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da20 > 
>> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da21 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da22 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da23 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da24 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da25 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da27 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da28 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da29 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da30 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da31 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da32 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da33 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da34 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da35 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da36 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da37 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da38 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da39 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da40 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da41 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da42 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da43 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da44 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da45 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da46 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da47 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da48 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da49 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da50 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 cd0 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da4/da4 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da3/da3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da5/da5 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da6/da6 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da7/da7 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da8/da8 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da9/da9 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da10/da10 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da11/da11 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da12/da12 
> >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da13/da13 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da14/da14 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da15/da15 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da16/da16 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da17/da17 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da18/da18 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da19/da19 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da20/da20 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da21/da21 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da22/da22 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da23/da23 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da24/da24 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da25/da25 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26/da26 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 PART/da26/da26 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26p1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26p2 > >> 1 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 da26p3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da27/da27 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da28/da28 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da29/da29 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da30/da30 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da31/da31 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da32/da32 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da33/da33 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da34/da34 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da35/da35 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da36/da36 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da37/da37 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da38/da38 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da39/da39 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da40/da40 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da41/da41 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da42/da42 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da43/da43 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da44/da44 > 
>> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da45/da45 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da46/da46 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da47/da47 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da48/da48 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da49/da49 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da50/da50 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/cd0/cd0 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26p1/da26p1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26p2/da26p2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 LABEL/da26p1/da26p1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 gptid/84d4487b-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 DEV/da26p3/da26p3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 LABEL/da26p2/da26p2 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 gptid/b4255780-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 > >> > DEV/gptid/84d4487b-34e3-11e2-b773-00259058949a/gptid/84d4487b-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da25 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 > >> > DEV/gptid/b4255780-34e3-11e2-b773-00259058949a/gptid/b4255780-34e3-11e2-b773-00259058949a > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da40 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da41 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da26p3 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da29 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da30 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da24 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da6 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da7 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da16 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da17 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da20 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 
ZFS::VDEV/zfs::vdev/da21 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da37 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da23 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da1 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da4 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da43 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da44 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da22 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da33 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da45 > >> 0 0 0 0 0.0 0 0 0.0 0 0 > >> 0.0 0.0 ZFS::VDEV/zfs::vdev/da3 > >> > >> > >> On Thu, Dec 13, 2012 at 10:14 PM, olivier > wrote: > >> > >>> For what it's worth, I think I might have solved my problem by > reverting > >>> to an older version of the mps driver. I checked out a recent version > of > >>> 9-STABLE and reversed the changes in > >>> http://svnweb.freebsd.org/base?view=revision&revision=230592 (perhaps > >>> there was a simpler way of reverting to the older mps driver). So far > so > >>> good, no hang even when hammering the file system. > >>> > >>> This does not conclusively prove that the new LSI mps driver is at > fault, > >>> but that seems to be a likely explanation. > >>> > >>> Thanks to everybody who pointed me in the right direction. Hope this > >>> helps others who run into similar problems with 9.1 > >>> Olivier > >>> > >>> > >>> On Thu, Dec 13, 2012 at 10:14 AM, olivier > wrote: > >>> > >>>> > >>>> > >>>> On Thu, Dec 13, 2012 at 9:54 AM, Andriy Gapon > wrote: > >>>> > >>>>> Google for "zfs deadman". This is already committed upstream and I > >>>>> think that it > >>>>> is imported into FreeBSD, but I am not sure... Maybe it's imported > >>>>> just into the > >>>>> vendor area and is not merged yet. > >>>>> > >>>> > >>>> Yes, that's exactly what I had in mind. The logic for panicking makes > >>>> sense. 
> >>>> As far as I can tell you're correct that deadman is in the vendor area > >>>> but not merged. Any idea when it might make it into 9-STABLE? > >>>> Thanks > >>>> Olivier > >>>> > >>>> > >>>> > >>>> > >>>>> So, when enabled this logic would panic a system as a way of letting > >>>>> know that > >>>>> something is wrong. You can read in the links why panic was selected > >>>>> for this job. > >>>>> > >>>>> And speaking FreeBSD-centric - I think that our CAM layer would be a > >>>>> perfect place > >>>>> to detect such issues in non-ZFS-specific way. > >>>>> > >>>>> -- > >>>>> Andriy Gapon > >>>>> > >>>> > >>>> > >>> > >> > > _______________________________________________ > > freebsd-stable@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org > " > > > > -- > Reed A. Cartwright, PhD > Assistant Professor of Genomics, Evolution, and Bioinformatics > School of Life Sciences > Center for Evolutionary Medicine and Informatics > The Biodesign Institute > Arizona State University > From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 02:50:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B06936A4 for ; Wed, 16 Jan 2013 02:50:57 +0000 (UTC) (envelope-from freebsd@deman.com) Received: from plato.corp.nas.com (plato.corp.nas.com [66.114.32.138]) by mx1.freebsd.org (Postfix) with ESMTP id 792BFFEA for ; Wed, 16 Jan 2013 02:50:57 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by plato.corp.nas.com (Postfix) with ESMTP id F0DA112D75DFE for ; Tue, 15 Jan 2013 18:43:49 -0800 (PST) X-Virus-Scanned: amavisd-new at corp.nas.com Received: from plato.corp.nas.com ([127.0.0.1]) by localhost (plato.corp.nas.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sfVJv17MLc76 for ; Tue, 15 Jan 2013 18:43:49 -0800 (PST) Received: from 
[192.168.0.120] (c-50-135-255-120.hsd1.wa.comcast.net [50.135.255.120]) by plato.corp.nas.com (Postfix) with ESMTPSA id 6F8ED12D75DF3 for ; Tue, 15 Jan 2013 18:43:49 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: HAST + ZFS self healing? Hot spares? From: Michael DeMan In-Reply-To: <50F4BBE7.7050207@gogrid.com> Date: Tue, 15 Jan 2013 18:43:49 -0800 Content-Transfer-Encoding: 7bit Message-Id: <6214EC5B-D846-4B0D-AF14-7AB9F91D2F82@deman.com> References: 4DD5A1CF.70807@itassistans.se <50F4BBE7.7050207@gogrid.com> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1499) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 02:50:57 -0000 Was there supposed to be any content in this envelope? On Jan 14, 2013, at 6:16 PM, Edward Xiao wrote: > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 03:10:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EB3EBB97 for ; Wed, 16 Jan 2013 03:10:41 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id A298C1E1 for ; Wed, 16 Jan 2013 03:10:41 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAPIZ9lCDaFvO/2dsb2JhbABFhjq3aXOCHgEBAQMBAQEBIAQnIAsFFg4KERkCBB8GAQkmBggHBAEcBIdmAwkGDKYQgkCGXA2HfowIgQiDFYETA4hhhieEVliBVoEcihuFEoMTgVE1 X-IronPort-AV: E=Sophos;i="4.84,476,1355115600"; d="scan'208";a="9262898" Received: from 
erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 15 Jan 2013 22:10:35 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5E9ABB3F2E; Tue, 15 Jan 2013 22:10:35 -0500 (EST) Date: Tue, 15 Jan 2013 22:10:35 -0500 (EST) From: Rick Macklem To: Sergey Kandaurov Message-ID: <980540815.2029630.1358305835362.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: getcwd lies on/under nfs4-mounted zfs dataset MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_2029629_336101775.1358305835359" X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 03:10:42 -0000 ------=_Part_2029629_336101775.1358305835359 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit pluknet@gmail.com wrote: > Hi. > > We stuck with the problem getting wrong current directory path > when sitting on/under zfs dataset filesystem mounted over NFSv4. > Both nfs server and client are 10.0-CURRENT from December or so. > > The component path "user3" unexpectedly appears to be "." (dot). > nfs-client:/home/user3 # pwd > /home/. > nfs-client:/home/user3/var/run # pwd > /home/./var/run > Yep, it was broken for UFS too. I think the attached patch for the client might fix it. (It fixes a trivial test case for UFS, but I haven't gone through the code to check if it might break something else.) I vaguely recall bumping into a non-FreeBSD server that only returned the Mounted_on_fileno attribute for mount points at a testing bakeathon and hacking around the problem that caused. 
(I think that hack made it into head, oops.;-) So I wouldn't try this
patch on a production-type system, but if you can test it, that would be
great.

Sorry about the breakage, rick

> nfs-client:~ # procstat -f 3225
> PID  COMM  FD   T V FLAGS     REF OFFSET PRO NAME
> 3225 a.out text v r r-------- -   -      -   /home/./var/a.out
> 3225 a.out ctty v c rw------- -   -      -   /dev/pts/2
> 3225 a.out cwd  v d r-------- -   -      -   /home/./var
> 3225 a.out root v d r-------- -   -      -   /
>
> The setup used follows.
>
> 1. NFS server with local ZFS:
> # cat /etc/exports
> V4: / -sec=sys
>
> # zfs list
> pool1 10.4M 122G 580K /pool1
> pool1/user3 on /pool1/user3 (zfs, NFS exported, local, nfsv4acls)
>
> Exports list on localhost:
> /pool1/user3 109.70.28.0
> /pool1 109.70.28.0
>
> # zfs get sharenfs pool1/user3
> NAME        PROPERTY VALUE                                          SOURCE
> pool1/user3 sharenfs -alldirs -maproot=root -network=109.70.28.0/24 local
>
> 2. pool1 is mounted on the NFSv4 client:
> nfs-server:/pool1 on /home (nfs, noatime, nfsv4acls)
>
> So that on the NFS client the "pool1/user3" dataset shows up at /home/user3.
> /           - ufs
> /home       - zpool-over-nfsv4
> /home/user3 - zfs dataset "pool1/user3"
>
> At the same time it works as expected when we're not on a zfs dataset,
> but directly on its parent zfs pool (also over NFSv4), e.g.
> nfs-client:/home/non_dataset_dir # pwd > /home/non_dataset_dir > > The ls command works as expected: > nfs-client:/# ls -dl /home/user3/var/ > drwxrwxrwt+ 6 root wheel 6 Jan 10 16:19 /home/user3/var/ > > -- > wbr, > pluknet > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" ------=_Part_2029629_336101775.1358305835359 Content-Type: text/x-patch; name=client-getcwd.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=client-getcwd.patch LS0tIGZzL25mcy9uZnNwcm90by5oLnNhdjIJMjAxMy0wMS0xNSAyMTozNDo0OS4wMDAwMDAwMDAg LTA1MDAKKysrIGZzL25mcy9uZnNwcm90by5oCTIwMTMtMDEtMTUgMjE6MzY6NTUuMDAwMDAwMDAw IC0wNTAwCkBAIC05ODQsNyArOTg0LDggQEAgc3RydWN0IG5mc3YzX3NhdHRyIHsKICAJTkZTQVRU UkJNX1NQQUNFVVNFRCB8CQkJCQkJXAogIAlORlNBVFRSQk1fVElNRUFDQ0VTUyB8CQkJCQkJXAog IAlORlNBVFRSQk1fVElNRU1FVEFEQVRBIHwJCQkJCVwKLSAJTkZTQVRUUkJNX1RJTUVNT0RJRlkp CisgCU5GU0FUVFJCTV9USU1FTU9ESUZZIHwJCQkJCQlcCisJTkZTQVRUUkJNX01PVU5URURPTkZJ TEVJRCkKIAogLyoKICAqIFN1YnNldCBvZiB0aGUgYWJvdmUgdGhhdCB0aGUgV3JpdGUgUlBDIGdl dHMuCi0tLSBmcy9uZnMvbmZzX2NvbW1vbnN1YnMuYy5zYXYyCTIwMTMtMDEtMTUgMjE6Mzg6NTMu MDAwMDAwMDAwIC0wNTAwCisrKyBmcy9uZnMvbmZzX2NvbW1vbnN1YnMuYwkyMDEzLTAxLTE1IDIx OjQwOjA0LjAwMDAwMDAwMCAtMDUwMApAQCAtMTcyNiw2ICsxNzI2LDcgQEAgbmZzdjRfbG9hZGF0 dHIoc3RydWN0IG5mc3J2X2Rlc2NyaXB0ICpuZAogCQkJICAgIGlmICgqdGwrKykKIAkJCQlwcmlu dGYoIk5GU3Y0IG1vdW50ZWQgb24gZmlsZWlkID4gMzJiaXRzXG4iKTsKIAkJCSAgICBuYXAtPm5h X21udG9uZmlsZW5vID0gdGh5cDsKKwkJCSAgICBuYXAtPm5hX2ZpbGVpZCA9IG5hcC0+bmFfbW50 b25maWxlbm87CiAJCQl9CiAJCQlhdHRyc3VtICs9IE5GU1hfSFlQRVI7CiAJCQlicmVhazsK ------=_Part_2029629_336101775.1358305835359-- From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 03:49:20 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D681726E; Wed, 16 
Jan 2013 03:49:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 5BC7B344; Wed, 16 Jan 2013 03:49:19 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0G3nBT5001550 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 16 Jan 2013 14:49:13 +1100 Date: Wed, 16 Jan 2013 14:49:11 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: John Baldwin Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client In-Reply-To: <201301151458.42874.jhb@freebsd.org> Message-ID: <20130116134627.S1060@besplex.bde.org> References: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca> <20130115141019.H1444@besplex.bde.org> <201301151458.42874.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=P/xiHV8u c=1 sm=1 a=S8Qr1IbAvFsA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=U1Z5fgpPGSMA:10 a=kNFpKb6NvubOC5D93twA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: Rick Macklem , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 03:49:20 -0000 On Tue, 15 Jan 2013, John Baldwin wrote: > On Monday, January 14, 2013 11:51:23 pm Bruce Evans wrote: >> >> I can't see anything that does the different permissions check for >> the VA_UTIMES_NULL case, and testing shows that this case is just broken, >> at least for an old version of the old nfs client -- the same permissions >> are required for all cases, but write permission is supposed to be >> enough for the VA_UTIMES_NULL case (since write 
permission is sufficient
>> for setting the mtime to the current time (plus epsilon) using write(2)
>> and truncate(2). Setting the atime to the current time should require
>> no more and no less than read permission, since it can be done using
>> read(2), but utimes(NULL) requires write permission for that too).
>
> Correct. All the other uses of VA_UTIMES_NULL in the tree are to
> provide the permissions check you describe and there is a large
> comment about it in ufs_setattr(). Other filesystems have comments
> that reference ufs_setattr(). I think these checks should be done
> in nfs_setattr() rather than in the routine to build an NFS attribute
> object however.

Perhaps it can be done in vfs. There are some technical problems with
this, but perhaps they are small. One is that file systems might not
even have any timestamps.

(I forgot to mention a related problem with the error handling. The
permissions checks for utimes() are usually too strict for file systems
that only have fake ownerships, like msdosfs. OTOH, msdosfs also doesn't
have atimes for most variants of the file system. Since utimes() is
supposed to set both the mtime and the atime (especially in the non-NULL
case), it strictly cannot work on msdosfs. But msdosfs is not strict
about this. It silently ignores the atimes when it can't set them.)

> Fixing NFS to properly use vfs_timestamp() seems to be a larger
> project.

I think it is smaller. For the new nfs code, it is not as simple as
changing the NFSGETTIME() macro, since nfs wants extra precision in most
cases (for things like comparing cache times), and very rarely wants the
semantics of vfs_timestamp(). I somehow missed seeing even more
confusion in this area:
- the new nfs code also has a macro NFSGETNANOTIME() which reduces to
getnanotime().
- monotonic times should be used if possible, but the new nfs code only
uses them for NFSD_MONOSEC (which is used a lot) and in 2 places in
nfs_commonkrpc.c where a hard-coded getmicrouptime() is used.
- NFSGETNANOTIME() is used 4 times. But there are 4 hard-coded uses of
getnanotime(). The latter are mostly in places where vfs_timestamp() is
correct, for things like n_atim for fifos.

nfs almost never needs to set file timestamps, since most file
timestamps are set by leaf (non-nfs) file systems on the server. The
only exceptions that I know about are the ones already noted
(utimes(NULL) on the server, something in create() (?), and n_atim for
fifos on the client (?)). n_atim for special files on the client was an
exception when special files were supported.

In the old nfs code:
- there are no NFSGET*TIME() macros, and there don't seem to be any
get*time() calls instead either. The get*time() calls used are almost
exactly the same ones as in the new nfs code, with the only exception
that I noticed being the ones in the server for utimes(NULL).
- monotonic times are only used in the same 2 places in
nfs_commonkrpc.c. The non-monotonic time_second is used a lot
(hard-coded), I think in much the same places where the new nfs server
code uses time_uptime via NFSD_MONOSEC.

I don't like obfuscating standard time calls using macros. Others that
I don't like:
- both the old and the new nfs client use NFS_TIMESPEC_COMPARE()
instead of the standard and better timespeccmp(). NFS_TIMESPEC_COMPARE()
is more verbose and only compares for equality. Its implementation is
home made and not based on timespeccmp().
- the new nfs client also has a macro NFS_CMPTIME(). This gives the
same result as NFS_TIMESPEC_COMPARE() (or timespeccmp(..., ==)). It
works accidentally for both timespecs and timevals, due to the POSIX bug
that the struct members for timespecs abuse the prefix for timevals in
their spelling.
Code derived from the old nfs client has not been translated to use the new macro (the "new" macro is probably actually older and may even be older or more likely just from a different code base than FreeBSD's timespeccmp()). - the new nfs client also has a macro NFS_SETTIME(). This doesn't actually set the time, but converts from the global timeval `time' to a timespec in the same way as the standard macro TIMEVAL_TO_TIMESPEC() would if it is passed a pointer to the global timeval. Accessing the global `time' like this would give races. Fortunately, the global `time' doesn't exist in FreeBSD, so of course this macro is never used. I like NFSD_MONOSEC, however. Global variables give a much more fragile and harder to translate API than function calls and function-like macros. Bruce From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 05:19:17 2013 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 921C69C9; Wed, 16 Jan 2013 05:19:17 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id D282B8B2; Wed, 16 Jan 2013 05:19:16 +0000 (UTC) Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au [211.29.133.168]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0G5JBQC009796; Wed, 16 Jan 2013 16:19:11 +1100 Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail27.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0G5J1mS028275 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 16 Jan 2013 16:19:02 +1100 Date: Wed, 16 Jan 2013 16:19:01 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client In-Reply-To: 
<1149390778.2023367.1358290140175.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20130116151051.O1060@besplex.bde.org> References: <1149390778.2023367.1358290140175.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Or8XUFDt c=1 sm=1 a=S8Qr1IbAvFsA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=U1Z5fgpPGSMA:10 a=0YgfEWQ0QP7ufM_Kn4MA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: Rick Macklem , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 05:19:17 -0000 On Tue, 15 Jan 2013, Rick Macklem wrote: > Bruce Evans wrote: >> I can't see anything that does the different permissions check for >> the VA_UTIMES_NULL case, and testing shows that this case is just >> broken, >> at least for an old version of the old nfs client -- the same >> permissions >> are required for all cases, but write permission is supposed to be >> enough for the VA_UTIMES_NULL case (since write permission is >> sufficient >> for setting the mtime to the current time (plus epsilon) using >> write(2) >> and truncate(2). Setting the atime to the current time should require >> no more and no less than read permission, since it can be done using >> read(2), but utimes(NULL) requires write permission for that too). >> > I did a quick test on a -current client/server and it seems to work ok. > The client uses SET_TIME_TO_SERVER and the server sets VA_UTIMES_NULL > for this case. At least it works for a UFS exported volume. It's not working for me with newnfs from 4 Mar 2012: $ mount | grep /c besplex:/c on /c (nfs, asynchronous) $ ls -l /c/tmp/z -rw-rw-rw- 1 root wheel 0 Jan 16 15:12 /c/tmp/z # Not even root owns it, since root on the client is mapped to 0xFFFFFFFFE. 
$ touch /c/tmp/z
touch: /c/tmp/z: Operation not permitted
$ touch -r . /c/tmp/z
touch: /c/tmp/z: Operation not permitted
touch: /c/tmp/z: Operation not permitted

The error messages from touch are confusing. For plain touch:
- it fails twice using utimes(), with errno EPERM and no error message
- it then succeeds using read(), write() and truncate()
- it then prints an error message
- it then exits with status 0.

This is with an old version of touch. It always prints an error message
if it reaches the read()/write()/truncate() step (rw() function):
- if rw() succeeded, then it prints an error message after the rw()
returns. rw() fails to preserve errno, so the errno for this step is
garbage, but it is usually the one from the second failing utimes().
- if rw() fails, it prints an error message internally. The errno for
this is now correct.

The current version of touch is even more broken. Someone removed the
rw() step from it, under the naive assumption that utimes() actually
works.

For touch -r:
- it fails twice using utimes(), with errno EPERM and no error message.
Now even trying the second time (with utimes(NULL)) is a bug. A comment
says that there is nothing else that we can do in this case, but the
code actually falls through and does something wrong (it tries to set to
the current time instead of to the specified time). This bug is fixed in
the current version.
- since it is not supposed to do anything more, it prints an error
message after the first utimes() failure. It also sets rval to 1 to give
an exit status of 1 later.
- then it continues the same as for the plain touch case:
- it then "succeeds" using read(), write() and truncate(), but this
success is in clobbering the timestamps to the current time
- it then prints an error message despite "succeeding"
- it then exits with status 1.

The nfs error is just for the second utimes() in the plain touch case.
This should succeed (it succeeds on a local ffs file system).
Also, when it fails, the correct errno is EACCES, not EPERM. This works correctly after changing the file mode to readonly and using the buggy touch -r to reach the second utimes() -- the error is now EACCES for both nfs and local ffs. So it seems that the server ffs is being reached correctly, but the non-error case for utimes(NULL) is being mishandled somewhere. This is not due to some maproot magic, since the same error occurs for the non-error case when the ownership is changed to a mere user (!= the test user). >> Oops, on looking at the code I now think it _is_ possible to pass the >> request to set the current time on the server, since in the >> NFSV3SATTRTIME_TOSERVER case we just pass this case value and not >> any time value to the server, so the server has no option but to use >> its current time. It is not surprising that the permissions checks >> for this don't work right. I thought that the client was responsible >> for most permissions checks, but can't find many or the relevant one >> here. The NFSV3SATTRTIME_TOSERVER code on the server sets >> VA_UTIMES_NULL, so I would have thought that the permissions check on >> the server does the right thing. >> > As noted above, it seems to work correctly for the new server in -current, > at least for UFS exports. > > Normally a server will do permission checking for NFS RPCs. There is nothing > stopping a client from doing a check and returning an error, but traditionally > a server has not trusted a client to do so. (I'm not sure if adding a check > in the client is what jhb@ was referring to in his reply to this?) Checking in the client doesn't seem right now. The bug seems to be a different one on the server. >> There are some large timestamping bugs nearby: >> >> - the old nfs server code for NFSV3SATTRTIME_TOSERVER uses >> getnanotime() >> to read the current time. 
This violates the system's policy set by
>> the vfs.timestamp_precision sysctl in most cases, since using
>> getnanotime() is the worst supported policy and is not the default.
>> ...
>
>> New nfs code never uses the correct function vfs_timestamp().
> This needs to be fixed. Until now, I would have had no idea what is the
> correct interface. (When I did the port, I just used a call that seemed
> to return what I wanted.;-)
>
> Having said that, after reading what you wrote below, it is not obvious
> to me what the correct fix is? (It seems to be a choice between
> microtime() and vfs_timestamp()?)

Just use vfs_timestamp() whenever generating a file timestamp but not
for other purposes. Like permissions checking, the client very rarely
generates file timestamps, and even on the server most timestamps are
not generated by nfs directly. So there are only a few places to check
and change. We know about fifos and the utimes(NULL) case in the server
(the latter is emulating upper layers in vfs) before calling
VOP_SETATTR().

I wonder how well the fifo code works. Its timestamps aren't very
important, but they should be synced to the server very occasionally.
Bruce

From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 07:38:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 48B2A93E; Wed, 16 Jan 2013 07:38:03 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id 60986F49; Wed, 16 Jan 2013 07:38:02 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 3YmKyb6vnkz7ySF; Wed, 16 Jan 2013 08:37:59 +0100 (CET) Date: Wed, 16 Jan 2013 08:37:59 +0100 From: Nicolas Rachinsky To: Artem Belevich Subject: Re: slowdown of zfs (tx->tx) Message-ID: <20130116073759.GA47781@mid.pc5.i.0x5.de> References: <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Powered-by: FreeBSD X-Homepage: http://www.rachinsky.de X-PGP-Keyid: 887BAE72 X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72 X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 07:38:03 -0000 * Artem Belevich [2013-01-15 16:16 -0800]: > On Tue, Jan 15, 2013 at 2:45 PM, Nicolas Rachinsky > wrote: > > 147 0 100098 kernel zio_write_issue_ mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 space_map_load_wait+0x20 metaslab_activate+0x73 metaslab_alloc+0x7b2 zio_dva_allocate+0x9a zio_execute+0xc3 taskqueue_run_locked+0x85 taskqueue_thread_loop+0x4e
fork_exit+0x11f fork_trampoline+0xe > > It appears that lots of threads are stuck in > metaslab_activate->space_map_load_wait path. This sounds like CR# > 6876962 in Solaris: "degraded write performance with threads held up > by space_map_load_wait(). This bug is fixed in patch 147440-05, -06 or > -07, which is current and contains the fix." Alas, I could not find > specifics on how the issue got fixed and whether the same fix is > present in illumos and FreeBSD. > > You may want to update your system to very recent FreeBSD as quite a > few fixes were recently imported from illumos. Hopefully it will deal > with the issue. I'm out of ideas otherwise. Sorry. Do you mean -CURRENT or -STABLE with very recent? Or just 9.1? Thank you for your efforts! Nicolas -- http://www.rachinsky.de/nicolas From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 07:42:46 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 92A1CC74; Wed, 16 Jan 2013 07:42:46 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id 179DDF98; Wed, 16 Jan 2013 07:42:46 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 3YmL452m1Yz7ySF; Wed, 16 Jan 2013 08:42:45 +0100 (CET) Date: Wed, 16 Jan 2013 08:42:45 +0100 From: Nicolas Rachinsky To: Steven Hartland Subject: Re: slowdown of zfs (tx->tx) Message-ID: <20130116074245.GB47781@mid.pc5.i.0x5.de> References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <00F86FD0E85D4EEEA1A01E115497F022@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <00F86FD0E85D4EEEA1A01E115497F022@multiplay.co.uk> X-Powered-by: FreeBSD X-Homepage: http://www.rachinsky.de X-PGP-Keyid: 887BAE72 
X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72 X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 07:42:46 -0000 * Steven Hartland [2013-01-16 00:57 -0000]: > > ----- Original Message ----- From: "Artem Belevich" > > To: "Nicolas Rachinsky" > Cc: "freebsd-fs" > Sent: Wednesday, January 16, 2013 12:16 AM > Subject: Re: slowdown of zfs (tx->tx) > > > >It appears that lots of threads are stuck in > >metaslab_activate->space_map_load_wait path. This sounds like CR# > >6876962 in Solaris: "degraded write performance with threads held up > >by space_map_load_wait(). This bug is fixed in patch 147440-05, -06 or > >-07, which is current and contains the fix." Alas, I could not find > >specifics on how the issue got fixed and whether the same fix is > >present in illumos and FreeBSD. > > That would tend to indicate its blocking on write. If this is the case > yet the rsync is copying from this box, with little else doing writes > it could be atime which is causing the issue. I was probably misformulating my mail. The rsync writes to the local zpool. > A test for this would be to use the following to disable atime and see > if that helps: > zfs set atime=off [filesystem] > > Also out of interest does the pool have many snapshots? There are 115 filesystems. 84 of these have between 10 and 20 snapshots. 
Nicolas -- http://www.rachinsky.de/nicolas From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 08:45:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A99CA2C6 for ; Wed, 16 Jan 2013 08:45:04 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-vb0-f53.google.com (mail-vb0-f53.google.com [209.85.212.53]) by mx1.freebsd.org (Postfix) with ESMTP id 6B99035F for ; Wed, 16 Jan 2013 08:45:04 +0000 (UTC) Received: by mail-vb0-f53.google.com with SMTP id b23so1039545vbz.26 for ; Wed, 16 Jan 2013 00:45:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=C3Jkc6hVnOvIT58v6Kf2LAxUDOYu5fzO+97ZfbORdoI=; b=qUxrD63WSN+dyaDeaDbT6Zv7eDvdLsf69b3WIO7tGmDLkRyNzqVSzi5GR1h4bxl4Cc t4AnapxPmS4NxOVCjC5Pqb3cjU683WpjQ8Ofl+dGXyY2uFsNOLR5O8AMOcpEI6TwfWCf IVhTVm6Zk27JKB21EYobjhmQgooKFVZxVW9Py5xT8cAeIfJlmylCXPzjZoTT1VfBiPzJ R5N954n6BSeVcyun1ktdg5zX20HU7qNjbYCWTl5sKkbOtE7g7Ve279Sv1zKY52f3jS5g I+C7Ch6brN9r3UvkTRVSwZiYFgHjBjIi6b1pDtEsJYHtK1z5w3d0DRGyKjgP//SQEftZ JFow== MIME-Version: 1.0 X-Received: by 10.52.156.40 with SMTP id wb8mr273529vdb.39.1358325901454; Wed, 16 Jan 2013 00:45:01 -0800 (PST) Sender: artemb@gmail.com Received: by 10.220.122.196 with HTTP; Wed, 16 Jan 2013 00:45:01 -0800 (PST) In-Reply-To: <20130116073759.GA47781@mid.pc5.i.0x5.de> References: <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <20130116073759.GA47781@mid.pc5.i.0x5.de> Date: Wed, 16 Jan 2013 00:45:01 -0800 X-Google-Sender-Auth: B8y6J4yhknJ3SFmTekEizE9Tq_w Message-ID: Subject: Re: slowdown of zfs (tx->tx) From: Artem Belevich To: Nicolas Rachinsky 
Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 08:45:04 -0000 On Tue, Jan 15, 2013 at 11:37 PM, Nicolas Rachinsky wrote: >> You may want to update your system to very recent FreeBSD as quite a >> few fixes were recently imported from illumos. Hopefully it will deal >> with the issue. I'm out of ideas otherwise. Sorry. > > Do you mean -CURRENT or -STABLE with very recent? Or just 9.1? -HEAD or -STABLE (-8 or -9). --Artem From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 09:39:42 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F05B6277; Wed, 16 Jan 2013 09:39:42 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0C3E8875; Wed, 16 Jan 2013 09:39:41 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA29126; Wed, 16 Jan 2013 11:39:31 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TvPT1-000Nsg-DI; Wed, 16 Jan 2013 11:39:31 +0200 Message-ID: <50F67551.5020704@FreeBSD.org> Date: Wed, 16 Jan 2013 11:39:29 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Nicolas Rachinsky Subject: Re: slowdown of zfs (tx->tx) References: <20130110193949.GA10023@mid.pc5.i.0x5.de> <20130111073417.GA95100@mid.pc5.i.0x5.de> <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> 
<20130115224556.GA41774@mid.pc5.i.0x5.de> In-Reply-To: X-Enigmail-Version: 1.4.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 09:39:43 -0000 on 16/01/2013 02:16 Artem Belevich said the following: > It appears that lots of threads are stuck in > metaslab_activate->space_map_load_wait path. Nicolas, another thing to check - is your pool nearly full. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 09:50:10 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DD1A9471; Wed, 16 Jan 2013 09:50:10 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id 97B7591E; Wed, 16 Jan 2013 09:50:10 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 3YmNv50LWCz7ySc; Wed, 16 Jan 2013 10:50:09 +0100 (CET) Date: Wed, 16 Jan 2013 10:50:09 +0100 From: Nicolas Rachinsky To: Andriy Gapon Subject: Re: slowdown of zfs (tx->tx) Message-ID: <20130116095009.GA36867@mid.pc5.i.0x5.de> References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <50F67551.5020704@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50F67551.5020704@FreeBSD.org> X-Powered-by: FreeBSD X-Homepage: http://www.rachinsky.de X-PGP-Keyid: 887BAE72 X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72 X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 09:50:10 -0000 * Andriy Gapon [2013-01-16 11:39 +0200]: > on 16/01/2013 02:16 Artem Belevich said the following: > > It appears that lots of threads are stuck in > > metaslab_activate->space_map_load_wait path. > > another thing to check - is your pool nearly full. Don't think so: NAME USED AVAIL REFER MOUNTPOINT pool1 5.52T 697G 11.9M /pool1 Nicolas -- http://www.rachinsky.de/nicolas From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 10:14:33 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DE53FB06; Wed, 16 Jan 2013 10:14:33 +0000 (UTC) (envelope-from prvs=1728d5906c=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5C08DA5C; Wed, 16 Jan 2013 10:14:32 +0000 (UTC) Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50001723873.msg; Wed, 16 Jan 2013 10:14:30 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 16 Jan 2013 10:14:30 +0000 (not processed: message from valid local sender) X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1728d5906c=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Nicolas Rachinsky" , "Andriy Gapon" References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <50F67551.5020704@FreeBSD.org> <20130116095009.GA36867@mid.pc5.i.0x5.de> Subject: Re: slowdown of zfs (tx->tx) Date: Wed, 16 Jan 2013 10:14:54 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original 
Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 10:14:33 -0000 ----- Original Message ----- From: "Nicolas Rachinsky" >* Andriy Gapon [2013-01-16 11:39 +0200]: >> on 16/01/2013 02:16 Artem Belevich said the following: >> > It appears that lots of threads are stuck in >> > metaslab_activate->space_map_load_wait path. >> >> another thing to check - is your pool nearly full. > > Don't think so: > NAME USED AVAIL REFER MOUNTPOINT > pool1 5.52T 697G 11.9M /pool1 You only have ~11% free so yer it is pretty full ;-) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
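For readers following along, the arithmetic behind Steve's "~11% free" figure can be checked against the `zfs list` output quoted above. A minimal sketch (the hard-coded sizes come from the thread; the parsing helper is purely illustrative and not part of any ZFS tool):

```python
# Rough free-space check from the thread's `zfs list` figures
# (USED 5.52T, AVAIL 697G). Illustration only.

def to_gib(value: str) -> float:
    """Convert a zfs(8)-style size string like '5.52T' or '697G' to GiB."""
    units = {"K": 1.0 / (1024 * 1024), "M": 1.0 / 1024, "G": 1.0, "T": 1024.0}
    return float(value[:-1]) * units[value[-1]]

used = to_gib("5.52T")    # bytes in use, in GiB
avail = to_gib("697G")    # bytes still available, in GiB
free_pct = 100.0 * avail / (used + avail)

print(f"free: {free_pct:.1f}%")   # prints "free: 11.0%" -- i.e. ~89% full
```

So the pool is well past the 70-80% utilization point mentioned later in the thread, even though 697G sounds like plenty of space in absolute terms.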
From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 10:20:25 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7C89BC2D; Wed, 16 Jan 2013 10:20:25 +0000 (UTC) (envelope-from prvs=1728d5906c=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 7EBD1AAD; Wed, 16 Jan 2013 10:20:24 +0000 (UTC) Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50001723941.msg; Wed, 16 Jan 2013 10:20:22 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 16 Jan 2013 10:20:22 +0000 (not processed: message from valid local sender) X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1728d5906c=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <98723B7F45F643F3A96FDB6B9285E935@multiplay.co.uk> From: "Steven Hartland" To: "Nicolas Rachinsky" References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <00F86FD0E85D4EEEA1A01E115497F022@multiplay.co.uk> <20130116074245.GB47781@mid.pc5.i.0x5.de> Subject: Re: slowdown of zfs (tx->tx) Date: Wed, 16 Jan 2013 10:20:46 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 10:20:25 -0000 ----- Original Message ----- From: "Nicolas Rachinsky" To: "Steven Hartland" Cc: "Artem 
Belevich" ; "freebsd-fs" Sent: Wednesday, January 16, 2013 7:42 AM Subject: Re: slowdown of zfs (tx->tx) >* Steven Hartland [2013-01-16 00:57 -0000]: >> >> ----- Original Message ----- From: "Artem Belevich" >> >> To: "Nicolas Rachinsky" >> Cc: "freebsd-fs" >> Sent: Wednesday, January 16, 2013 12:16 AM >> Subject: Re: slowdown of zfs (tx->tx) >> >> >> >It appears that lots of threads are stuck in >> >metaslab_activate->space_map_load_wait path. This sounds like CR# >> >6876962 in Solaris: "degraded write performance with threads held up >> >by space_map_load_wait(). This bug is fixed in patch 147440-05, -06 or >> >-07, which is current and contains the fix." Alas, I could not find >> >specifics on how the issue got fixed and whether the same fix is >> >present in illumos and FreeBSD. >> >> That would tend to indicate its blocking on write. If this is the case >> yet the rsync is copying from this box, with little else doing writes >> it could be atime which is causing the issue. > > I was probably misformulating my mail. The rsync writes to the local > zpool. > >> A test for this would be to use the following to disable atime and see >> if that helps: >> zfs set atime=off [filesystem] If you don't need atime, I would still recommend setting atime=off. >> Also out of interest does the pool have many snapshots? > > There are 115 filesystems. 84 of these have between 10 and 20 > snapshots. Hmm, so over 1000 snapshots; that's not going to help. If that is something that has built up over time, combined with increased disk usage, it could well explain the slowdown you're seeing, and would also explain why you're seeing threads taking time in metaslab_activate->space_map_load_wait. Are the snapshots something you can clear down and test to see if that improves things? Out of interest, what type of data are you working with, and is it compressible? If it is, it might be worth testing with compression enabled.
Regards Steve From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 12:05:35 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1BAD7574 for ; Wed, 16 Jan 2013 12:05:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 61E14140 for ; Wed, 16 Jan 2013 12:05:34 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA00475; Wed, 16 Jan 2013 14:05:30 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TvRkI-000O2E-51; Wed, 16 Jan 2013 14:05:30 +0200 Message-ID: <50F69788.2040506@FreeBSD.org> Date: Wed, 16 Jan 2013 14:05:28 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Nicolas Rachinsky Subject: Re: slowdown of zfs (tx->tx) References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <50F67551.5020704@FreeBSD.org> <20130116095009.GA36867@mid.pc5.i.0x5.de> In-Reply-To: X-Enigmail-Version: 1.4.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 12:05:35 -0000 on 16/01/2013 12:14 Steven Hartland said the following: > ----- Original Message ----- From: "Nicolas Rachinsky" > >> * Andriy Gapon [2013-01-16 11:39 +0200]: >>> on 16/01/2013 02:16 Artem Belevich said the following: >>> > It appears that lots of threads are stuck in >>> > metaslab_activate->space_map_load_wait path. >>> >>> another thing to check - is your pool nearly full. >> >> Don't think so: >> NAME USED AVAIL REFER MOUNTPOINT >> pool1 5.52T 697G 11.9M /pool1 > > You only have ~11% free so yer it is pretty full ;-) Nicolas, just in case, Steve is not kidding. Those free hundreds of gigabytes could be spread over the terabytes and could be quite fragmented if the pool has a history of adding and removing lots of files. ZFS could be spending quite a lot of time in that case when it looks for some free space and tries to minimize further fragmentation. Empirical/anecdotal safe limit on pool utilization is said to be about 70-80%. You can test if this guess is true by doing the following: kgdb -w (kgdb) set metaslab_min_alloc_size=4096 If performance noticeably improves after that, then this is your problem indeed. 
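The effect Andriy describes can be illustrated with a toy model (an illustration of the idea only, not ZFS's actual metaslab allocator; all names and numbers here are made up): when free space is plentiful but fragmented into small runs, an allocator that insists on a large minimum contiguous allocation has to load and scan many space maps before finding a qualifying run, while a 4 KiB minimum is satisfied almost immediately.

```python
# Toy fragmentation model: 200 metaslabs whose free space is entirely
# 8 KiB fragments, plus one metaslab at the end with a single large run.
metaslabs = [[8192] * 1000 for _ in range(200)]  # fragmented free space
metaslabs.append([64 * 1024 * 1024])             # one 64 MiB run at the end

def metaslabs_scanned(min_alloc: int) -> int:
    """Count metaslabs examined before one with a run >= min_alloc turns up."""
    for scanned, segments in enumerate(metaslabs, start=1):
        if max(segments) >= min_alloc:
            return scanned
    return len(metaslabs)

print(metaslabs_scanned(10 * 1024 * 1024))  # large threshold: scans all 201
print(metaslabs_scanned(4096))              # 4 KiB threshold: stops at 1
```

This is why lowering metaslab_min_alloc_size is a useful diagnostic: if write latency improves sharply afterwards, the pool was spending its time in exactly this kind of space-map search.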
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 13:01:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 27307CE8; Wed, 16 Jan 2013 13:01:23 +0000 (UTC) (envelope-from feld@feld.me) Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id EBF2F680; Wed, 16 Jan 2013 13:01:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:Cc:To:Content-Type; bh=DZp3JXw6VlNx5xcPoteJ0jCEm4wk1vL/UOH1VAdtMVw=; b=ohtn4sAFxTYUlXYroLgRYEJ0xeLYjUODKf7vaziOceUbylyGlS1UFef6QPFvVKNscdrMf0wcJrREcQJf2c6R/GXCi4ctq2ALfmuGK3jlRt4C4heHdGMoeFxjGu9QTOoH; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by feld.me with esmtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1TvScG-0007uH-7t; Wed, 16 Jan 2013 07:01:16 -0600 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpsa id 1358341270-24241-86284/5/1; Wed, 16 Jan 2013 13:01:10 +0000 Content-Type: text/plain; format=flowed; delsp=yes To: Andriy Gapon , Nicolas Rachinsky Subject: Re: slowdown of zfs (tx->tx) References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <50F67551.5020704@FreeBSD.org> <20130116095009.GA36867@mid.pc5.i.0x5.de> Date: Wed, 16 Jan 2013 07:01:10 -0600 Mime-Version: 1.0 From: Mark Felder Message-Id: In-Reply-To: <20130116095009.GA36867@mid.pc5.i.0x5.de> User-Agent: Opera Mail/12.12 (FreeBSD) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 13:01:23 -0000 On Wed, 16 Jan 2013 
03:50:09 -0600, Nicolas Rachinsky wrote: > * Andriy Gapon [2013-01-16 11:39 +0200]: >> on 16/01/2013 02:16 Artem Belevich said the following: >> > It appears that lots of threads are stuck in >> > metaslab_activate->space_map_load_wait path. >> >> another thing to check - is your pool nearly full. > > Don't think so: > NAME USED AVAIL REFER MOUNTPOINT > pool1 5.52T 697G 11.9M /pool1 > Never let your ZFS pool go above 80% or you'll have very, very poor performance. From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 14:26:47 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D684FF76; Wed, 16 Jan 2013 14:26:47 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 65382C41; Wed, 16 Jan 2013 14:26:47 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAAO49lCDaFvO/2dsb2JhbABFhjq3YnOCHgEBBAEjVgUWDgoCAg0ZAlkGiCYGpmmRKYEji1KDMIETA4hhjSuQSYMTggY X-IronPort-AV: E=Sophos;i="4.84,479,1355115600"; d="scan'208";a="9317566" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 16 Jan 2013 09:26:46 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 08BA7B3F15; Wed, 16 Jan 2013 09:26:46 -0500 (EST) Date: Wed, 16 Jan 2013 09:26:46 -0500 (EST) From: Rick Macklem To: Bruce Evans Message-ID: <1642392672.2036529.1358346406018.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20130116151051.O1060@besplex.bde.org> Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: 
Rick Macklem , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jan 2013 14:26:47 -0000 Bruce Evans wrote: > On Tue, 15 Jan 2013, Rick Macklem wrote: > > > Bruce Evans wrote: > > >> I can't see anything that does the different permissions check for > >> the VA_UTIMES_NULL case, and testing shows that this case is just > >> broken, > >> at least for an old version of the old nfs client -- the same > >> permissions > >> are required for all cases, but write permission is supposed to be > >> enough for the VA_UTIMES_NULL case (since write permission is > >> sufficient > >> for setting the mtime to the current time (plus epsilon) using > >> write(2) > >> and truncate(2). Setting the atime to the current time should > >> require > >> no more and no less than read permission, since it can be done > >> using > >> read(2), but utimes(NULL) requires write permission for that too). > >> > > I did a quick test on a -current client/server and it seems to work > > ok. > > The client uses SET_TIME_TO_SERVER and the server sets > > VA_UTIMES_NULL > > for this case. At least it works for a UFS exported volume. > > It's not working for me with newnfs from 4 Mar 2012: > > $ mount | grep /c > besplex:/c on /c (nfs, asynchronous) > $ ls -l /c/tmp/z > -rw-rw-rw- 1 root wheel 0 Jan 16 15:12 /c/tmp/z > # Not even root owns it, since root on the client is mapped to > 0xFFFFFFFFE. > $ touch /c/tmp/z > touch: /c/tmp/z: Operation not permitted > $ touch -r . /c/tmp/z > touch: /c/tmp/z: Operation not permitted > touch: /c/tmp/z: Operation not permitted > Well, I just ran essentially the same test, using the new client patched with jhb@'s patch and an up to date server and I got the same behaviour as when doing the touch locally on the file in the file system. 
- when not the file owner, but having write permissions touch - worked for both local and NFS mount touch -r - failed with Operation not permitted for both local and NFS mount The test I had done before used a trivial program that just did a utimes(NULL) and it worked as non-owner with write access, as well. The server appears to have been patched for this at r157325 (Apr. 2006). Maybe your server hasn't been patched for this? rick > The error message from touch are confusing. For plain touch: > - it fails twice using utimes(), with errno EPERM and no error message > - it then succeeds using read(), write() and truncate() > - it then prints an error message > - it then exits with status 0. > This is with an old version of touch. It always prints an error > message > if it reaches the read()/write()/truncate() step (rw() function): > - if rw() succeeded, then it prints an error message after the rw() > returns. rw() fails to preserve errno, so the errno for this step > is garbage, but it is usually the one from the second failing > utimes(). > - if rw() fails, it prints an error message internally. The errno for > this is now correct. > The current version of touch is even more broken. Someone removed the > rw() step from it, under the naive assumption that utimes() actually > works. > > For touch -r: > - it fails twice using utimes(), with errno EPERM and no error > message. > Now even trying the second time (with utimes(NULL) is a bug. A > comment says that there is nothing else that we can do in this case, > but the code actually falls through and does something wrong (it > tries to set to the current time instead of to the specified time). > This bug fixed in the current version. > - since it is not supposed to do anything more, it prints an error > message > after the first utimes() failure. It also sets rval to 1 to give an > exit status of 1 later. 
> - then it continues the same as for the plain touch case: > - it then "succeeds" using read(), write() and truncate(), but this > success is in clobbering the timestamps to the current time > - it then prints an error message despite "succeeding" > - it then exits with status 1. > > The nfs error is just for the second utimes() in the plain touch case. > This should succeed (it succeeds on a local ffs file system). Also, > when > it fails, the correct errno is EACCES, not EPERM. This works correctly > after changing the file mode to readonly and using the buggy touch -r > to reach the second utimes() -- the error is now EACCES for both nfs > and local ffs. So it seems that the server ffs is being reached > correctly, but the non-error case for utimes(NULL) is being mishandled > somewhere. This is not due to some maproot magic, since the same error > occurs for the non-error case when the ownership is changed to a mere > user (!= the test user). > > >> Oops, on looking at the code I now think it _is_ possible to pass > >> the > >> request to set the current time on the server, since in the > >> NFSV3SATTRTIME_TOSERVER case we just pass this case value and not > >> any time value to the server, so the server has no option but to > >> use > >> its current time. It is not surprising that the permissions checks > >> for this don't work right. I thought that the client was > >> responsible > >> for most permissions checks, but can't find many or the relevant > >> one > >> here. The NFSV3SATTRTIME_TOSERVER code on the server sets > >> VA_UTIMES_NULL, so I would have thought that the permissions check > >> on > >> the server does the right thing. > >> > > As noted above, it seems to work correctly for the new server in > > -current, > > at least for UFS exports. > > > > Normally a server will do permission checking for NFS RPCs. 
There is > > nothing > > stopping a client from doing a check and returning an error, but > > traditionally > > a server has not trusted a client to do so. (I'm not sure if adding > > a check > > in the client is what jhb@ was referring to in his reply to this?) > > Checking in the client doesn't seem right now. The bug seems to be a > different one on the server. > > >> There are some large timestamping bugs nearby: > >> > >> - the old nfs server code for NFSV3SATTRTIME_TOSERVER uses > >> getnanotime() > >> to read the current time. This violates the system's policy set by > >> the vfs.timestamp precision in most cases, since using > >> getnanotime() > >> is the worst supported policy and is not the defaul. > >> ... > > > >> New nfs code never uses the correct function vfs_timestamp(). > > This needs to be fixed. Until now, I would have had no idea what is > > the > > correct interface. (When I did the port, I just used a call that > > seemed > > to return what I wanted.;-) > > > > Having said that, after reading what you wrote below, it is not > > obvious > > to me what the correct fix is? (It seems to be a choice between > > microtime() > > and vfs_timestamp()?) > > Just use vfs_timestamp() whenever generating a file timestamp but not > for > other purposes. Like permissions checking, the client very rarely > generates > file timestamps, and even on the server most timestamps are not > generated > by nfs directly. So there are only a few places to check and change. > We > know about fifos and the utimes(NULL) case in the server (the latter > is > emulating upper layers in vfs) before calling VOP_SETATTR(). I wonder > how well the fifo code works. Its timestamps aren't very important, > but > they should be synced to the server very occasionally. 
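The two utimes(2) cases under discussion, explicit times versus a NULL times pointer meaning "set to now", can be exercised from Python, whose os.utime() wraps utimes()/utimensat(). This sketch runs as the file's owner, so both forms succeed; the permission difference that touch(1) trips over above (NULL times needing only write access, explicit times requiring ownership or root) only shows up when a different user with write permission makes the calls.

```python
import os
import tempfile
import time

# Scratch file we own; removed again at the end.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

hour_ago = time.time() - 3600

# Explicit times (the "touch -r" case): requires ownership (or root).
os.utime(path, (hour_ago, hour_ago))
mtime_explicit = os.stat(path).st_mtime

# NULL times (the VA_UTIMES_NULL case, "set to current time"):
# write permission alone is supposed to be sufficient for this form.
os.utime(path, None)
mtime_null = os.stat(path).st_mtime

os.unlink(path)
```

Running the same two calls as a non-owner with write access is where the behaviours diverge: the first call should fail with EPERM while the second should succeed, which is the distinction the server-side VA_UTIMES_NULL check is meant to implement.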
> > Bruce

From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 14:33:53 2013
Date: Wed, 16 Jan 2013 09:33:51 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Sergey Kandaurov
Cc: freebsd-fs@freebsd.org
Message-ID: <227703439.2036949.1358346831962.JavaMail.root@erie.cs.uoguelph.ca>
Subject: Re: getcwd lies on/under nfs4-mounted zfs dataset

pluknet@gmail.com wrote:
> Hi.
>
> We are stuck with a problem: we get a wrong current directory path
> when sitting on/under a zfs dataset filesystem mounted over NFSv4.
> Both the nfs server and client are 10.0-CURRENT from December or so.
>
> The path component "user3" unexpectedly appears as "." (dot):
> nfs-client:/home/user3 # pwd
> /home/.
> nfs-client:/home/user3/var/run # pwd
> /home/./var/run
>
Although you are welcome to try the patch I emailed you yesterday, I
think it will result in the tree traversal algorithm in libc complaining
about a cycle at some point, because there could be another node in the
file system on the server that has the same fileno as the
mounted-on-fileno.

I need to take a close look at getcwd() and see how it handles mount
point crossings. The trick is that the NFSv4 client must make getcwd()
happy, but also try to avoid duplicate fileno (i-node #) values within a
subtree of the mount that has a given fsid.

Since I can reproduce it here, I'll work on it and post if/when I have a
better patch.

rick

> nfs-client:~ # procstat -f 3225
>  PID COMM  FD   T V FLAGS     REF OFFSET PRO NAME
> 3225 a.out text v r r-------- -   -      -   /home/./var/a.out
> 3225 a.out ctty v c rw------- -   -      -   /dev/pts/2
> 3225 a.out cwd  v d r-------- -   -      -   /home/./var
> 3225 a.out root v d r-------- -   -      -   /
>
> The setup used follows.
>
> 1. NFS server with local ZFS:
> # cat /etc/exports
> V4: / -sec=sys
>
> # zfs list
> pool1 10.4M 122G 580K /pool1
> pool1/user3 on /pool1/user3 (zfs, NFS exported, local, nfsv4acls)
>
> Exports list on localhost:
> /pool1/user3 109.70.28.0
> /pool1 109.70.28.0
>
> # zfs get sharenfs pool1/user3
> NAME        PROPERTY VALUE                                          SOURCE
> pool1/user3 sharenfs -alldirs -maproot=root -network=109.70.28.0/24 local
>
> 2. pool1 is mounted on the NFSv4 client:
> nfs-server:/pool1 on /home (nfs, noatime, nfsv4acls)
>
> So on the NFS client the "pool1/user3" dataset appears at /home/user3:
> / - ufs
> /home - zpool-over-nfsv4
> /home/user3 - zfs dataset "pool1/user3"
>
> At the same time it works as expected when we're not on a zfs dataset
> but directly on its parent zfs pool (also over NFSv4), e.g.
> nfs-client:/home/non_dataset_dir # pwd
> /home/non_dataset_dir
>
> The ls command works as expected:
> nfs-client:/# ls -dl /home/user3/var/
> drwxrwxrwt+ 6 root wheel 6 Jan 10 16:19 /home/user3/var/
>
> --
> wbr,
> pluknet
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Wed Jan 16 23:47:43 2013
From: Olivier Cochard-Labbé <cochard@gmail.com>
Date: Thu, 17 Jan 2013 00:47:21 +0100
Message-ID:
Subject: Reproducible crash with tmpfs on 9.1-release
To: freebsd-fs@freebsd.org

Hi,

I managed to reproduce a crash on 9.1-RELEASE (amd64) by compiling
software on a tmpfs workdir (cf. PR kern/175353).

My first machine is an 8-core box with 56 GB of RAM but no swap, so I
didn't get a core dump on it. I then reproduced the crash on a smaller
machine with 4 cores and only 4 GB of RAM, but with swap.

I've put the files from my /var/crash online for anyone interested
(with the exception of the full vmcore):
http://gugus69.free.fr/freebsd/tmpfs/core0/

Happy debugging!

Olivier

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 00:13:38 2013
Date: Thu, 17 Jan 2013 00:13:38 GMT
Message-Id: <201301170013.r0H0DcRD013087@freefall.freebsd.org>
From: linimon@FreeBSD.org
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
Subject: Re: kern/175353: [tmpfs] [panic] panic during building a nanobsd image + ports

Old Synopsis: tmpfs panic during building a nanobsd image + ports
New Synopsis: [tmpfs] [panic] panic during building a nanobsd image + ports

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Jan 17 00:13:20 UTC 2013
Responsible-Changed-Why: Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=175353

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 00:42:58 2013
Date: Wed, 16 Jan 2013 19:42:57 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Sergey Kandaurov
Cc: freebsd-fs@freebsd.org
Message-ID: <1171241649.2066788.1358383377496.JavaMail.root@erie.cs.uoguelph.ca>
Subject: Re: getcwd lies on/under nfs4-mounted zfs dataset
Content-Type:
text/plain; charset=utf-8

pluknet@gmail.com wrote:
> Hi.
>
> We are stuck with a problem: we get a wrong current directory path
> when sitting on/under a zfs dataset filesystem mounted over NFSv4.
> Both the nfs server and client are 10.0-CURRENT from December or so.
>
> The path component "user3" unexpectedly appears as "." (dot):
> nfs-client:/home/user3 # pwd
> /home/.
> nfs-client:/home/user3/var/run # pwd
> /home/./var/run
>
Ok, I've figured out what is going on. The algorithm in libc works, but
vn_fullpath1() doesn't. The latter assumes that mount points are marked
with VV_ROOT etc. For the "pseudo mount points" (which are mount points
within the directory tree on the NFSv4 server), this isn't the case.

If you:
  sysctl debug.disablecwd=1
  sysctl debug.disablefullpath=1
it works. (At least for the UFS case I tested.)

I can't see how this can be made to work correctly for vn_fullpath1()
unless it was re-written to use the same algorithm that
lib/libc/gen/getcwd.c implements.

I was pretty sure this used to work. Maybe the syscalls used to be
disabled by default or weren't used by the libc functions?

Anyhow, sorry about the confusing posts while I figured out what was
going on,

rick
ps: Don't use the patch I posted. It isn't needed and will break stuff.

> [...]
> --
> wbr,
> pluknet

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 02:31:33 2013
Received: by mail10.syd.optusnet.com.au; Thu, 17 Jan 2013
13:31:30 +1100
Date: Thu, 17 Jan 2013 13:31:26 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Rick Macklem
Cc: fs@freebsd.org
Message-ID: <20130117132903.O1225@besplex.bde.org>
In-Reply-To: <1642392672.2036529.1358346406018.JavaMail.root@erie.cs.uoguelph.ca>
Subject: Re: [PATCH] Better handle NULL utimes() in the NFS client

On Wed, 16 Jan 2013, Rick Macklem wrote:

> Bruce Evans wrote:
>> On Tue, 15 Jan 2013, Rick Macklem wrote:
>>> Bruce Evans wrote:
>>>> I can't see anything that does the different permissions check for
>>>> the VA_UTIMES_NULL case, and testing shows that this case is just
>>>> broken, at least for an old version of the old nfs client -- the same
>>>> ...
>>> I did a quick test on a -current client/server and it seems to work
>>> ok. The client uses SET_TIME_TO_SERVER and the server sets
>>> VA_UTIMES_NULL for this case. At least it works for a UFS exported
>>> volume.
>>
>> It's not working for me with newnfs from 4 Mar 2012:
>> ...
> Well, I just ran essentially the same test, using the new client patched
> with jhb@'s patch and an up-to-date server, and I got the same behaviour
> as when doing the touch locally on the file in the file system:
> - when not the file owner, but having write permission:
>   touch - worked for both the local and NFS mounts
>   touch -r - failed with "Operation not permitted" for
>   both the local and NFS mounts
>
> The test I had done before used a trivial program that just did a
> utimes(NULL), and it worked as non-owner with write access, as well.
>
> The server appears to have been patched for this at r157325 (Apr. 2006).
>
> Maybe your server hasn't been patched for this?

Indeed it hasn't -- it is missing setting of VA_UTIMES_NULL.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 03:11:10 2013
In-Reply-To: <20130109023327.GA1888@FreeBSD.org>
References:
<20130109023327.GA1888@FreeBSD.org>
Date: Thu, 17 Jan 2013 11:11:08 +0800
Subject: Re: rc.d script for memory based zfs intent log
From: Marcelo Araujo
To: John

2013/1/9 John

> http://people.freebsd.org/~jwd/memzil.txt

Hello John,

In my point of view this script seems very useful, and I will probably
use it in my product.

As an example, I ran into a few problems on system reboot/shutdown. I
use a ramdisk as ZIL; on a normal reboot/shutdown the ramdisk will
disappear, because right now there is nothing to detach the ramdisk
safely. I believe that with this script I can attach a new ZIL on every
boot and detach it safely when performing a reboot/shutdown.

I use some tricks with my ZIL: it is mirrored using RAM, and the RAM has
its own battery to protect the data; my main problem is what I described
above. I don't think sync=disabled is a good option, as you can lose
data.

Nice script! Thanks!

--
Marcelo Araujo
araujo@FreeBSD.org

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 04:21:44 2013
Date: Wed, 16 Jan 2013 20:21:43 -0800
References: <20130109023327.GA1888@FreeBSD.org>
Subject:
Re: rc.d script for memory based zfs intent log
From: Matthew Ahrens
To: araujo@freebsd.org

Pardon my inexperience with FreeBSD, but is this ramdisk persistent
across reboots? POST doesn't overwrite it?

--matt

On Wed, Jan 16, 2013 at 7:11 PM, Marcelo Araujo wrote:
> [...]
>
> Nice script!
>
> Thanks!
> --
> Marcelo Araujo
> araujo@FreeBSD.org

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 04:33:00 2013
References: <20130109023327.GA1888@FreeBSD.org>
Date: Thu, 17 Jan 2013 12:32:59 +0800
Subject: Re: rc.d script for memory based zfs intent log
From: Marcelo Araujo
To: Matthew Ahrens
Cc: FreeBSD Filesystems, John
Reply-To: araujo@FreeBSD.org

2013/1/17 Matthew Ahrens

> Pardon my inexperience with FreeBSD, but is this ramdisk persistent
> across reboots? POST doesn't overwrite it?
>
> --matt

Hello Matthew,

No, it is not persistent; a new ramdisk is created at boot time. But as
in my case, where I have a persistent ZIL that is mirrored, with small
changes to this script you can have a persistent ZIL.

My setup is a bit special: I have the ZIL on a mirrored ramdisk across
two different RAM devices protected by battery. When I perform a
shutdown, it dumps all the information from the ramdisk to an SSD, and
at boot it does the opposite: it creates the ramdisk, restores it from
the SSD, and I just attach it to the pool once again.

Best Regards,

--
Marcelo Araujo
araujo@FreeBSD.org

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 04:42:54 2013
References: <20130109023327.GA1888@FreeBSD.org>
Date: Thu, 17 Jan 2013 12:42:53 +0800
Subject: Re: rc.d script for memory based zfs intent log
From: Marcelo Araujo
To: John
Cc: FreeBSD Filesystems

2013/1/17 Marcelo Araujo
> 2013/1/9 John
>> http://people.freebsd.org/~jwd/memzil.txt
>
> Hello John,
>
> In my point of view this script seems very useful, and I will probably
> use it in my product.

Dear John,

Just another thing... you must add the KEYWORD: shutdown at the
beginning of your script, or otherwise it won't be called when you
perform a shutdown.
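For readers unfamiliar with rc.d, the keyword Marcelo means lives in the rcorder(8) comment block at the top of the script. Below is a minimal illustrative sketch, not John's actual memzil script: the script name, md unit, and pool name are made up, and the start/stop bodies are only placeholders for the real attach/detach logic.

```sh
#!/bin/sh
#
# PROVIDE: memzil
# REQUIRE: zfs
# KEYWORD: shutdown
#
# Without "KEYWORD: shutdown" above, rc.shutdown (which collects scripts
# via "rcorder -k shutdown") never runs this script's stop method, so the
# ramdisk-backed log device would vanish without being detached.

. /etc/rc.subr

name="memzil"
rcvar="memzil_enable"
start_cmd="memzil_start"
stop_cmd="memzil_stop"

memzil_start()
{
	# Create a ramdisk and attach it to the pool as a log device
	# (hypothetical sizes/names for illustration).
	mdconfig -a -t malloc -s 1g -u 99
	zpool add tank log md99
}

memzil_stop()
{
	# Detach the log device cleanly before the ramdisk disappears.
	zpool remove tank md99
	mdconfig -d -u 99
}

load_rc_config $name
run_rc_command "$1"
```

With this header in place, `service memzil stop` is invoked automatically during an orderly shutdown.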
Best Regards,

--
Marcelo Araujo
araujo@FreeBSD.org

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 05:10:54 2013
Date: Thu, 17 Jan 2013 07:10:48 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Olivier Cochard-Labbé
Subject: Re: Reproducible crash with tmpfs on 9.1-release
Message-ID: <20130117051048.GK2522@kib.kiev.ua>
Cc: freebsd-fs@freebsd.org

On Thu, Jan 17, 2013 at 12:47:21AM +0100, Olivier Cochard-Labbé wrote:
> Hi,
> I managed to reproduce a crash on 9.1-release (amd64) by compiling
> software on a tmpfs workdir (cf. PR kern/175353).
> My first machine is an 8-core with 56GB RAM, but without swap, so I
> didn't get a core dump on it.
> Then I reproduced the crash on a smaller machine with 4 cores and only
> 4GB RAM, but with swap.
> I've put the files from my /var/crash online for anyone interested
> (with the exception of the full vmcore):
> http://gugus69.free.fr/freebsd/tmpfs/core0/

This looks like a unionfs problem, and not tmpfs. Unionfs is known to be
broken in varying ways.

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 09:33:06 2013
Date: Thu, 17 Jan 2013 10:32:59 +0100
From: Nicolas Rachinsky <nicolas@i.0x5.de>
To: Andriy Gapon
Subject: Re: slowdown of zfs (tx->tx)
Message-ID: <20130117093259.GA83951@mid.pc5.i.0x5.de>
In-Reply-To:
<50F69788.2040506@FreeBSD.org>
Cc: freebsd-fs

* Andriy Gapon [2013-01-16 14:05 +0200]:
> on 16/01/2013 12:14 Steven Hartland said the following:
> > You only have ~11% free so yer it is pretty full ;-)
>
> just in case, Steve is not kidding.
>
> Those free hundreds of gigabytes could be spread over the terabytes
> and could be quite fragmented if the pool has a history of adding and
> removing lots of files. ZFS could be spending quite a lot of time in
> that case when it looks for some free space and tries to minimize
> further fragmentation.
>
> The empirical/anecdotal safe limit on pool utilization is said to be
> about 70-80%.
>
> You can test if this guess is true by doing the following:
>   kgdb -w
>   (kgdb) set metaslab_min_alloc_size=4096
>
> If performance noticeably improves after that, then this is your
> problem indeed.

I tried this, but I didn't notice any difference in performance.

Next I'll try the update Artem suggested.

Thanks

Nicolas

--
http://www.rachinsky.de/nicolas

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 10:34:11 2013
In-Reply-To: <20130117051048.GK2522@kib.kiev.ua>
From: Olivier Cochard-Labbé <cochard@gmail.com>
Date: Thu, 17 Jan 2013 11:33:49 +0100
Subject: Re: Reproducible crash with tmpfs on 9.1-release
To: Konstantin Belousov
Cc: freebsd-fs@freebsd.org
List-Id:
Filesystems

On Thu, Jan 17, 2013 at 6:10 AM, Konstantin Belousov wrote:
>
> This looks like a unionfs problem, and not tmpfs. Unionfs is known to
> be broken in varying ways.

Yes, I'm using unionfs too, but I only hit this problem when I use
tmpfs as the workdir.

By the way, applying mjg's patch (in stable/9 as of r245351):
http://people.freebsd.org/~mjg/patches/lockmgr-noshare-interlock.diff
solves this problem on both of my machines! No more crashes :-)

Regards,

Olivier

From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 10:40:01 2013
Message-Id: <201301171040.r0HAe1S9029360@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: Olivier Cochard-Labbé
Subject: Re: kern/175353: [tmpfs] [panic] panic during building a nanobsd image + ports
X-List-Received-Date: Thu, 17 Jan 2013
10:40:01 -0000 The following reply was made to PR kern/175353; it has been noted by GNATS. From: =?ISO-8859-1?Q?Olivier_Cochard=2DLabb=E9?= To: bug-followup@freebsd.org Cc: Subject: Re: kern/175353: [tmpfs] [panic] panic during building a nanobsd image + ports Date: Thu, 17 Jan 2013 11:36:22 +0100 Applying mjg's patch from revision 245351 (kern_lock.c) solves this problem on my 2 machines. Regards, Olivier From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 19:19:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2DF32118 for ; Thu, 17 Jan 2013 19:19:23 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from cpsmtpb-ews05.kpnxchange.com (cpsmtpb-ews05.kpnxchange.com [213.75.39.8]) by mx1.freebsd.org (Postfix) with ESMTP id BAD9CB9E for ; Thu, 17 Jan 2013 19:19:22 +0000 (UTC) Received: from cpsps-ews29.kpnxchange.com ([10.94.84.195]) by cpsmtpb-ews05.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Thu, 17 Jan 2013 20:17:06 +0100 Received: from CPSMTPM-TLF104.kpnxchange.com ([195.121.3.7]) by cpsps-ews29.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Thu, 17 Jan 2013 20:17:06 +0100 Received: from sjakie.klop.ws ([212.182.167.131]) by CPSMTPM-TLF104.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Thu, 17 Jan 2013 20:18:14 +0100 Received: from 212-182-167-131.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id 7B3DD98AE for ; Thu, 17 Jan 2013 20:18:14 +0100 (CET) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: slowdown of zfs (tx->tx) References: <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <50F67551.5020704@FreeBSD.org> <20130116095009.GA36867@mid.pc5.i.0x5.de> <50F69788.2040506@FreeBSD.org> <20130117093259.GA83951@mid.pc5.i.0x5.de> Date: Thu, 17 Jan 2013
20:18:14 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Ronald Klop" Message-ID: In-Reply-To: <20130117093259.GA83951@mid.pc5.i.0x5.de> User-Agent: Opera Mail/12.12 (FreeBSD) X-OriginalArrivalTime: 17 Jan 2013 19:18:14.0831 (UTC) FILETIME=[65EDABF0:01CDF4E7] X-RcptDomain: freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jan 2013 19:19:23 -0000 On Thu, 17 Jan 2013 10:32:59 +0100, Nicolas Rachinsky wrote: > * Andriy Gapon [2013-01-16 14:05 +0200]: >> on 16/01/2013 12:14 Steven Hartland said the following: >> > You only have ~11% free so yer it is pretty full ;-) >> >> just in case, Steve is not kidding. >> >> Those free hundreds of gigabytes could be spread over the terabytes and >> could be >> quite fragmented if the pool has a history of adding and removing lots >> of files. >> ZFS could be spending quite a lot of time in that case when it looks >> for some >> free space and tries to minimize further fragmentation. >> >> Empirical/anecdotal safe limit on pool utilization is said to be about >> 70-80%. >> >> You can test if this guess is true by doing the following: >> kgdb -w >> (kgdb) set metaslab_min_alloc_size=4096 >> >> If performance noticeably improves after that, then this is your >> problem indeed. > > I tried this, but I didn't notice any difference in performance. > > Next I'll try the update Artem suggested. > > Thanks > > Nicolas Did you already try to free some space? Ronald. 
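The 70-80% utilization guideline discussed above can be checked against the CAP column of `zpool list`. As a rough illustration (the pool name `tank` and the echoed sample figure are invented for this example, not taken from the thread), a filter that flags pools past the threshold could look like:

```shell
# On a real system, feed 'zpool list -H -o name,capacity' into the awk
# script; the echo below just stands in for that output.
echo "tank 89%" | awk '{
    gsub(/%/, "", $2)                       # strip the percent sign
    if ($2 + 0 > 80)
        print $1 " is " $2 "% full - allocation slowdowns are likely"
    else
        print $1 " is " $2 "% full - ok"
}'
```

With the sample input this prints `tank is 89% full - allocation slowdowns are likely`.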
From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 19:57:45 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CC2A8FC7; Thu, 17 Jan 2013 19:57:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 7AAFBDF5; Thu, 17 Jan 2013 19:57:45 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id D57F7B924; Thu, 17 Jan 2013 14:57:44 -0500 (EST) From: John Baldwin To: freebsd-fs@freebsd.org Subject: [PATCH] Use vfs_timestamp() instead of getnanotime() in NFS Date: Thu, 17 Jan 2013 14:57:43 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <162405990.1985479.1358212854967.JavaMail.root@erie.cs.uoguelph.ca> <20130115141019.H1444@besplex.bde.org> <201301151458.42874.jhb@freebsd.org> In-Reply-To: <201301151458.42874.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201301171457.43800.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 17 Jan 2013 14:57:44 -0500 (EST) Cc: Rick Macklem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jan 2013 19:57:45 -0000 On Tuesday, January 15, 2013 2:58:42 pm John Baldwin wrote: > Fixing NFS to properly use vfs_timestamp() seems to be a larger > project. Actually, I have a patch that I think does this below. It builds, have not yet booted it (but will do so in a bit). 
Index: fs/nfsclient/nfs_clstate.c
===================================================================
--- fs/nfsclient/nfs_clstate.c	(revision 245225)
+++ fs/nfsclient/nfs_clstate.c	(working copy)
@@ -4611,7 +4611,7 @@
 	}
 	dp = nfscl_finddeleg(clp, np->n_fhp->nfh_fh, np->n_fhp->nfh_len);
 	if (dp != NULL && (dp->nfsdl_flags & NFSCLDL_WRITE)) {
-		NFSGETNANOTIME(&dp->nfsdl_modtime);
+		vfs_timestamp(&dp->nfsdl_modtime);
 		dp->nfsdl_flags |= NFSCLDL_MODTIMESET;
 	}
 	NFSUNLOCKCLSTATE();
Index: fs/nfsclient/nfs_clvnops.c
===================================================================
--- fs/nfsclient/nfs_clvnops.c	(revision 245225)
+++ fs/nfsclient/nfs_clvnops.c	(working copy)
@@ -3247,7 +3247,7 @@
 	 */
 	mtx_lock(&np->n_mtx);
 	np->n_flag |= NACC;
-	getnanotime(&np->n_atim);
+	vfs_timestamp(&np->n_atim);
 	mtx_unlock(&np->n_mtx);
 	error = fifo_specops.vop_read(ap);
 	return error;
@@ -3266,7 +3266,7 @@
 	 */
 	mtx_lock(&np->n_mtx);
 	np->n_flag |= NUPD;
-	getnanotime(&np->n_mtim);
+	vfs_timestamp(&np->n_mtim);
 	mtx_unlock(&np->n_mtx);
 	return(fifo_specops.vop_write(ap));
 }
@@ -3286,7 +3286,7 @@
 
 	mtx_lock(&np->n_mtx);
 	if (np->n_flag & (NACC | NUPD)) {
-		getnanotime(&ts);
+		vfs_timestamp(&ts);
 		if (np->n_flag & NACC)
 			np->n_atim = ts;
 		if (np->n_flag & NUPD)
Index: fs/nfsserver/nfs_nfsdport.c
===================================================================
--- fs/nfsserver/nfs_nfsdport.c	(revision 245225)
+++ fs/nfsserver/nfs_nfsdport.c	(working copy)
@@ -1476,7 +1476,7 @@
 	struct vattr va;
 
 	VATTR_NULL(&va);
-	getnanotime(&va.va_mtime);
+	vfs_timestamp(&va.va_mtime);
 	(void) VOP_SETATTR(vp, &va, cred);
 	(void) nfsvno_getattr(vp, nvap, cred, p, 1);
 }
@@ -2248,7 +2248,6 @@
 {
 	u_int32_t *tl;
 	struct nfsv2_sattr *sp;
-	struct timeval curtime;
 	int error = 0, toclient = 0;
 
 	switch (nd->nd_flag & (ND_NFSV2 | ND_NFSV3 | ND_NFSV4)) {
@@ -2307,9 +2306,7 @@
 			toclient = 1;
 			break;
 		case NFSV3SATTRTIME_TOSERVER:
-			NFSGETTIME(&curtime);
-			nvap->na_atime.tv_sec = curtime.tv_sec;
-			nvap->na_atime.tv_nsec = curtime.tv_usec * 1000;
+			vfs_timestamp(&nvap->na_atime);
 			nvap->na_vaflags |= VA_UTIMES_NULL;
 			break;
 		};
@@ -2321,9 +2318,7 @@
 			nvap->na_vaflags &= ~VA_UTIMES_NULL;
 			break;
 		case NFSV3SATTRTIME_TOSERVER:
-			NFSGETTIME(&curtime);
-			nvap->na_mtime.tv_sec = curtime.tv_sec;
-			nvap->na_mtime.tv_nsec = curtime.tv_usec * 1000;
+			vfs_timestamp(&nvap->na_mtime);
 			if (!toclient)
 				nvap->na_vaflags |= VA_UTIMES_NULL;
 			break;
@@ -2353,7 +2348,6 @@
 	u_char *cp, namestr[NFSV4_SMALLSTR + 1];
 	uid_t uid;
 	gid_t gid;
-	struct timeval curtime;
 
 	error = nfsrv_getattrbits(nd, attrbitp, NULL, &retnotsup);
 	if (error)
@@ -2488,9 +2482,7 @@
 				toclient = 1;
 				attrsum += NFSX_V4TIME;
 			} else {
-				NFSGETTIME(&curtime);
-				nvap->na_atime.tv_sec = curtime.tv_sec;
-				nvap->na_atime.tv_nsec = curtime.tv_usec * 1000;
+				vfs_timestamp(&nvap->na_atime);
 				nvap->na_vaflags |= VA_UTIMES_NULL;
 			}
 			break;
@@ -2515,9 +2507,7 @@
 				nvap->na_vaflags &= ~VA_UTIMES_NULL;
 				attrsum += NFSX_V4TIME;
 			} else {
-				NFSGETTIME(&curtime);
-				nvap->na_mtime.tv_sec = curtime.tv_sec;
-				nvap->na_mtime.tv_nsec = curtime.tv_usec * 1000;
+				vfs_timestamp(&nvap->na_mtime);
 				if (!toclient)
 					nvap->na_vaflags |= VA_UTIMES_NULL;
 			}
Index: nfsclient/nfs_vnops.c
===================================================================
--- nfsclient/nfs_vnops.c	(revision 245225)
+++ nfsclient/nfs_vnops.c	(working copy)
@@ -3458,7 +3458,7 @@
 	 */
 	mtx_lock(&np->n_mtx);
 	np->n_flag |= NACC;
-	getnanotime(&np->n_atim);
+	vfs_timestamp(&np->n_atim);
 	mtx_unlock(&np->n_mtx);
 	error = fifo_specops.vop_read(ap);
 	return error;
@@ -3477,7 +3477,7 @@
 	 */
 	mtx_lock(&np->n_mtx);
 	np->n_flag |= NUPD;
-	getnanotime(&np->n_mtim);
+	vfs_timestamp(&np->n_mtim);
 	mtx_unlock(&np->n_mtx);
 	return(fifo_specops.vop_write(ap));
 }
@@ -3497,7 +3497,7 @@
 
 	mtx_lock(&np->n_mtx);
 	if (np->n_flag & (NACC | NUPD)) {
-		getnanotime(&ts);
+		vfs_timestamp(&ts);
 		if (np->n_flag & NACC)
 			np->n_atim = ts;
 		if (np->n_flag & NUPD)
Index: nfsserver/nfs_srvsubs.c
===================================================================
--- nfsserver/nfs_srvsubs.c	(revision 245225)
+++ nfsserver/nfs_srvsubs.c	(working copy)
@@ -1393,7 +1393,7 @@
 			toclient = 1;
 			break;
 		case NFSV3SATTRTIME_TOSERVER:
-			getnanotime(&(a)->va_atime);
+			vfs_timestamp(&(a)->va_atime);
 			a->va_vaflags |= VA_UTIMES_NULL;
 			break;
 		}
@@ -1409,7 +1409,7 @@
 			a->va_vaflags &= ~VA_UTIMES_NULL;
 			break;
 		case NFSV3SATTRTIME_TOSERVER:
-			getnanotime(&(a)->va_mtime);
+			vfs_timestamp(&(a)->va_mtime);
 			if (toclient == 0)
 				a->va_vaflags |= VA_UTIMES_NULL;
 			break;
-- 
John Baldwin
From owner-freebsd-fs@FreeBSD.ORG Thu Jan 17 23:05:49 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 95326F6D; Thu, 17 Jan 2013 23:05:49 +0000 (UTC) (envelope-from mjg@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 6F02BB15; Thu, 17 Jan 2013 23:05:49 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r0HN5nBW066428; Thu, 17 Jan 2013 23:05:49 GMT (envelope-from mjg@freefall.freebsd.org) Received: (from mjg@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r0HN5nLN066424; Thu, 17 Jan 2013 23:05:49 GMT (envelope-from mjg) Date: Thu, 17 Jan 2013 23:05:49 GMT Message-Id: <201301172305.r0HN5nLN066424@freefall.freebsd.org> To: mjg@FreeBSD.org, freebsd-fs@FreeBSD.org, mjg@FreeBSD.org From: mjg@FreeBSD.org Subject: Re: kern/175353: [tmpfs] [panic] panic during building a nanobsd image + ports X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jan 2013 23:05:49 -0000 Synopsis: [tmpfs] [panic] panic during building a nanobsd image + ports Responsible-Changed-From-To: freebsd-fs->mjg Responsible-Changed-By: mjg
Responsible-Changed-When: Thu Jan 17 23:05:48 UTC 2013 Responsible-Changed-Why: Take http://www.freebsd.org/cgi/query-pr.cgi?pr=175353 From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 00:49:29 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5149F779; Fri, 18 Jan 2013 00:49:29 +0000 (UTC) (envelope-from cross+freebsd@distal.com) Received: from mail.distal.com (mail.distal.com [IPv6:2001:470:e24c:200::ae25]) by mx1.freebsd.org (Postfix) with ESMTP id E2D8A8C; Fri, 18 Jan 2013 00:49:28 +0000 (UTC) Received: from magrathea.distal.com (magrathea.distal.com [IPv6:2001:470:e24c:200:ea06:88ff:feca:960e]) (authenticated bits=0) by mail.distal.com (8.14.3/8.14.3) with ESMTP id r0I0nPuC014155 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Thu, 17 Jan 2013 19:49:25 -0500 (EST) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Changes to kern.geom.debugflags? 
From: Chris Ross In-Reply-To: <50F82846.6030104@FreeBSD.org> Date: Thu, 17 Jan 2013 19:49:24 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <315EDE17-4995-4819-BC82-E9B7D942E82A@distal.com> References: <7AA0B5D0-D49C-4D5A-8FA0-AA57C091C040@distal.com> <6A0C1005-F328-4C4C-BB83-CA463BD85127@distal.com> <20121225232507.GA47735@alchemy.franken.de> <8D01A854-97D9-4F1F-906A-7AB59BF8850B@distal.com> <6FC4189B-85FA-466F-AA00-C660E9C16367@distal.com> <20121230032403.GA29164@pix.net> <56B28B8A-2284-421D-A666-A21F995C7640@distal.com> <20130104234616.GA37999@alchemy.franken.de> <50F82846.6030104@FreeBSD.org> To: Andriy Gapon X-Mailer: Apple Mail (2.1499) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.2 (mail.distal.com [IPv6:2001:470:e24c:200::ae25]); Thu, 17 Jan 2013 19:49:26 -0500 (EST) Cc: "freebsd-fs@freebsd.org" , Kurt Lidl , "freebsd-sparc64@freebsd.org" , Marius Strobl X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 00:49:29 -0000 On Jan 17, 2013, at 11:35 , Andriy Gapon wrote: > on 08/01/2013 03:53 Chris Ross said the following: >> >> Out of curiosity, I did try 242229. It boots. So, the problem occurred with 242230, which >> came from 241289. FYI. > > Chris, > > thank you for triaging and analyzing this problem. And sorry for the long delay > (caused by the New Year craziness you mentioned earlier). > > The problem is that arch_zfs_probe methods are expected only to probe for ZFS > disks/partitions, but they are not allowed to execute any other ZFS operations. > I assumed this to be true and forgot to check sparc64_zfs_probe. Mea culpa. > > Could you please test the following patch? Thank you, Andriy. Much as you'd expect, that patch solves the problem.
I get some of the printf()s that I'd put into zfs_fmtdev(), and the system loads successfully. Please commit that patch, and if you could, change the comment just below the last portion of it that is now not quite accurate (since you moved the mentioned code). Thanks again! How long will this take to get to stable/9? Being new to FreeBSD, I'm not too familiar with the process of HEAD/stable/etc. (In NetBSD, it would be a commit followed by a pull request.) - Chris From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 02:23:43 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7A6D0669; Fri, 18 Jan 2013 02:23:43 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E31D6654; Fri, 18 Jan 2013 02:23:42 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAG6x+FCDaFvO/2dsb2JhbABFhkW0CYN/c4IeAQEEASMEUgUWDgoCAg0ZAlkGiCYGqVORdoEjjwOBEwOIYY0riU2GfIMTggY X-IronPort-AV: E=Sophos;i="4.84,488,1355115600"; d="scan'208";a="9745603" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 17 Jan 2013 21:23:35 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id DC8CCB3F44; Thu, 17 Jan 2013 21:23:35 -0500 (EST) Date: Thu, 17 Jan 2013 21:23:35 -0500 (EST) From: Rick Macklem To: John Baldwin Message-ID: <460209850.2108683.1358475815866.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201301171457.43800.jhb@freebsd.org> Subject: Re: [PATCH] Use vfs_timestamp() instead of getnanotime() in NFS MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Rick Macklem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 02:23:43 -0000 John Baldwin wrote: > On Tuesday, January 15, 2013 2:58:42 pm John Baldwin wrote: > > Fixing NFS to properly use vfs_timestamp() seems to be a larger > > project. > > Actually, I have a patch that I think does this below. It builds, have > not > yet booted it (but will do so in a bit). > > Index: fs/nfsclient/nfs_clstate.c > =================================================================== > --- fs/nfsclient/nfs_clstate.c (revision 245225) > +++ fs/nfsclient/nfs_clstate.c (working copy) > @@ -4611,7 +4611,7 @@ > } > dp = nfscl_finddeleg(clp, np->n_fhp->nfh_fh, np->n_fhp->nfh_len); > if (dp != NULL && (dp->nfsdl_flags & NFSCLDL_WRITE)) { > - NFSGETNANOTIME(&dp->nfsdl_modtime); > + vfs_timestamp(&dp->nfsdl_modtime); > dp->nfsdl_flags |= NFSCLDL_MODTIMESET; > } > NFSUNLOCKCLSTATE(); Not sure about this case. Although nfsdl_modtime is being set for local use, it replaces the mtime returned by the NFS server while the delegation is in use. Ideally it would be the same resolution as the NFS server, but that resolution isn't known to the client. (It is often better than 1sec, which is the default for vfs_timestamp().) I'd be tempted to leave it (although the function used by the macro might need to be changed, since Bruce mentions getnanotime() isn't supposed to be used?). 
> Index: fs/nfsclient/nfs_clvnops.c > =================================================================== > --- fs/nfsclient/nfs_clvnops.c (revision 245225) > +++ fs/nfsclient/nfs_clvnops.c (working copy) > @@ -3247,7 +3247,7 @@ > */ > mtx_lock(&np->n_mtx); > np->n_flag |= NACC; > - getnanotime(&np->n_atim); > + vfs_timestamp(&np->n_atim); > mtx_unlock(&np->n_mtx); > error = fifo_specops.vop_read(ap); > return error; > @@ -3266,7 +3266,7 @@ > */ > mtx_lock(&np->n_mtx); > np->n_flag |= NUPD; > - getnanotime(&np->n_mtim); > + vfs_timestamp(&np->n_mtim); > mtx_unlock(&np->n_mtx); > return(fifo_specops.vop_write(ap)); > } > @@ -3286,7 +3286,7 @@ > > mtx_lock(&np->n_mtx); > if (np->n_flag & (NACC | NUPD)) { > - getnanotime(&ts); > + vfs_timestamp(&ts); > if (np->n_flag & NACC) > np->n_atim = ts; > if (np->n_flag & NUPD) > Index: fs/nfsserver/nfs_nfsdport.c > =================================================================== > --- fs/nfsserver/nfs_nfsdport.c (revision 245225) > +++ fs/nfsserver/nfs_nfsdport.c (working copy) > @@ -1476,7 +1476,7 @@ > struct vattr va; > > VATTR_NULL(&va); > - getnanotime(&va.va_mtime); > + vfs_timestamp(&va.va_mtime); > (void) VOP_SETATTR(vp, &va, cred); > (void) nfsvno_getattr(vp, nvap, cred, p, 1); > } > @@ -2248,7 +2248,6 @@ > { > u_int32_t *tl; > struct nfsv2_sattr *sp; > - struct timeval curtime; > int error = 0, toclient = 0; > > switch (nd->nd_flag & (ND_NFSV2 | ND_NFSV3 | ND_NFSV4)) { > @@ -2307,9 +2306,7 @@ > toclient = 1; > break; > case NFSV3SATTRTIME_TOSERVER: > - NFSGETTIME(&curtime); > - nvap->na_atime.tv_sec = curtime.tv_sec; > - nvap->na_atime.tv_nsec = curtime.tv_usec * 1000; > + vfs_timestamp(&nvap->na_atime); > nvap->na_vaflags |= VA_UTIMES_NULL; > break; > }; > @@ -2321,9 +2318,7 @@ > nvap->na_vaflags &= ~VA_UTIMES_NULL; > break; > case NFSV3SATTRTIME_TOSERVER: > - NFSGETTIME(&curtime); > - nvap->na_mtime.tv_sec = curtime.tv_sec; > - nvap->na_mtime.tv_nsec = curtime.tv_usec * 1000; > + vfs_timestamp(&nvap->na_mtime); > 
if (!toclient) > nvap->na_vaflags |= VA_UTIMES_NULL; > break; > @@ -2353,7 +2348,6 @@ > u_char *cp, namestr[NFSV4_SMALLSTR + 1]; > uid_t uid; > gid_t gid; > - struct timeval curtime; > > error = nfsrv_getattrbits(nd, attrbitp, NULL, &retnotsup); > if (error) > @@ -2488,9 +2482,7 @@ > toclient = 1; > attrsum += NFSX_V4TIME; > } else { > - NFSGETTIME(&curtime); > - nvap->na_atime.tv_sec = curtime.tv_sec; > - nvap->na_atime.tv_nsec = curtime.tv_usec * 1000; > + vfs_timestamp(&nvap->na_atime); > nvap->na_vaflags |= VA_UTIMES_NULL; > } > break; > @@ -2515,9 +2507,7 @@ > nvap->na_vaflags &= ~VA_UTIMES_NULL; > attrsum += NFSX_V4TIME; > } else { > - NFSGETTIME(&curtime); > - nvap->na_mtime.tv_sec = curtime.tv_sec; > - nvap->na_mtime.tv_nsec = curtime.tv_usec * 1000; > + vfs_timestamp(&nvap->na_mtime); > if (!toclient) > nvap->na_vaflags |= VA_UTIMES_NULL; > } > Index: nfsclient/nfs_vnops.c > =================================================================== > --- nfsclient/nfs_vnops.c (revision 245225) > +++ nfsclient/nfs_vnops.c (working copy) > @@ -3458,7 +3458,7 @@ > */ > mtx_lock(&np->n_mtx); > np->n_flag |= NACC; > - getnanotime(&np->n_atim); > + vfs_timestamp(&np->n_atim); > mtx_unlock(&np->n_mtx); > error = fifo_specops.vop_read(ap); > return error; > @@ -3477,7 +3477,7 @@ > */ > mtx_lock(&np->n_mtx); > np->n_flag |= NUPD; > - getnanotime(&np->n_mtim); > + vfs_timestamp(&np->n_mtim); > mtx_unlock(&np->n_mtx); > return(fifo_specops.vop_write(ap)); > } > @@ -3497,7 +3497,7 @@ > > mtx_lock(&np->n_mtx); > if (np->n_flag & (NACC | NUPD)) { > - getnanotime(&ts); > + vfs_timestamp(&ts); > if (np->n_flag & NACC) > np->n_atim = ts; > if (np->n_flag & NUPD) > Index: nfsserver/nfs_srvsubs.c > =================================================================== > --- nfsserver/nfs_srvsubs.c (revision 245225) > +++ nfsserver/nfs_srvsubs.c (working copy) > @@ -1393,7 +1393,7 @@ > toclient = 1; > break; > case NFSV3SATTRTIME_TOSERVER: > - getnanotime(&(a)->va_atime); > + 
vfs_timestamp(&(a)->va_atime); > a->va_vaflags |= VA_UTIMES_NULL; > break; > } > @@ -1409,7 +1409,7 @@ > a->va_vaflags &= ~VA_UTIMES_NULL; > break; > case NFSV3SATTRTIME_TOSERVER: > - getnanotime(&(a)->va_mtime); > + vfs_timestamp(&(a)->va_mtime); > if (toclient == 0) > a->va_vaflags |= VA_UTIMES_NULL; > break; > > -- > John Baldwin Other than nfsdl_modtime, the rest look ok to me, since they are either the times for the special files in the client or timestamps for server file systems. rick From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 06:19:42 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 039A4B54; Fri, 18 Jan 2013 06:19:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id 8715DF66; Fri, 18 Jan 2013 06:19:40 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r0I6JTZv001575 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 18 Jan 2013 17:19:30 +1100 Date: Fri, 18 Jan 2013 17:19:29 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem Subject: Re: [PATCH] Use vfs_timestamp() instead of getnanotime() in NFS In-Reply-To: <460209850.2108683.1358475815866.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20130118165934.K1042@besplex.bde.org> References: <460209850.2108683.1358475815866.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Zty1sKHG c=1 sm=1 a=kdfE0iePi98A:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=9IEQsz3md4oA:10 a=g0RNfcms4CO3QVWdHTYA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: 
freebsd-fs@FreeBSD.org, Rick Macklem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 06:19:42 -0000 On Thu, 17 Jan 2013, Rick Macklem wrote: > John Baldwin wrote: >> On Tuesday, January 15, 2013 2:58:42 pm John Baldwin wrote: >>> Fixing NFS to properly use vfs_timestamp() seems to be a larger >>> project. >> >> Actually, I have a patch that I think does this below. It builds, have >> not >> yet booted it (but will do so in a bit). >> >> Index: fs/nfsclient/nfs_clstate.c >> =================================================================== >> --- fs/nfsclient/nfs_clstate.c (revision 245225) >> +++ fs/nfsclient/nfs_clstate.c (working copy) >> @@ -4611,7 +4611,7 @@ >> } >> dp = nfscl_finddeleg(clp, np->n_fhp->nfh_fh, np->n_fhp->nfh_len); >> if (dp != NULL && (dp->nfsdl_flags & NFSCLDL_WRITE)) { >> - NFSGETNANOTIME(&dp->nfsdl_modtime); >> + vfs_timestamp(&dp->nfsdl_modtime); >> dp->nfsdl_flags |= NFSCLDL_MODTIMESET; >> } >> NFSUNLOCKCLSTATE(); > Not sure about this case. Although nfsdl_modtime is being set for local > use, it replaces the mtime returned by the NFS server while the delegation > is in use. Ideally it would be the same resolution as the NFS server, but > that resolution isn't known to the client. (It is often better than 1sec, > which is the default for vfs_timestamp().) The patch seems about right except for this. > I'd be tempted to leave it (although the function used by the macro might > need to be changed, since Bruce mentions getnanotime() isn't supposed to > be used?). For maximal precision and accuracy, nanotime() should be used. I'm not sure if you need to be at least as precise and accurate as the server. Having them synced to nanosecond accuracy is impossible, but getnanotime() gives <= 1/HZ of accuracy and it is easy for them to be synced with more accuracy than that.
Then the extra accuracy can be seen in server timestamps if the server is FreeBSD and uses vfs_timestamp() with either microtime() or nanotime(). Further style fixes: - remove the NFSGETNANOTIME() macro. It is only used in the above, and in 3 other instances where its use is bogus because only the seconds part is used. The `time_second' global gives the seconds part with the same (in)accuracy as getnanotime(). If you want maximal accuracy for just the seconds part, then bintime() should be used (this is slightly faster than microtime() and nanotime()). (get*time()'s seconds part is the same as time_second. This is inaccurate since it lags bintime()'s seconds part by up to 1/HZ seconds (so it differs by a full second for an average of one in every HZ readings). The difference is visible if one reader, say make(1) reads the time using bintime() while another reader, say vfs_timestamp() reads the time using getbintime().) >> Index: nfsserver/nfs_srvsubs.c >> =================================================================== >> --- nfsserver/nfs_srvsubs.c (revision 245225) >> +++ nfsserver/nfs_srvsubs.c (working copy) >> @@ -1393,7 +1393,7 @@ >> toclient = 1; >> break; >> case NFSV3SATTRTIME_TOSERVER: >> - getnanotime(&(a)->va_atime); >> + vfs_timestamp(&(a)->va_atime); >> a->va_vaflags |= VA_UTIMES_NULL; >> break; >> } >> @@ -1409,7 +1409,7 @@ >> a->va_vaflags &= ~VA_UTIMES_NULL; >> break; >> case NFSV3SATTRTIME_TOSERVER: >> - getnanotime(&(a)->va_mtime); >> + vfs_timestamp(&(a)->va_mtime); >> if (toclient == 0) >> a->va_vaflags |= VA_UTIMES_NULL; >> break; - parenthesizing 'a' is bogus.
Bruce From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 11:26:37 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DC6FB473; Fri, 18 Jan 2013 11:26:37 +0000 (UTC) (envelope-from nicolas@i.0x5.de) Received: from n.0x5.de (n.0x5.de [217.197.85.144]) by mx1.freebsd.org (Postfix) with ESMTP id 4573AF7B; Fri, 18 Jan 2013 11:26:37 +0000 (UTC) Received: by pc5.i.0x5.de (Postfix, from userid 1003) id 3YnfxL12vTz7ySF; Fri, 18 Jan 2013 12:26:30 +0100 (CET) Date: Fri, 18 Jan 2013 12:26:30 +0100 From: Nicolas Rachinsky To: Artem Belevich Subject: Re: slowdown of zfs (tx->tx) Message-ID: <20130118112630.GA41074@mid.pc5.i.0x5.de> References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <20130116073759.GA47781@mid.pc5.i.0x5.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Powered-by: FreeBSD X-Homepage: http://www.rachinsky.de X-PGP-Keyid: 887BAE72 X-PGP-Fingerprint: 039E 9433 115F BC5F F88D 4524 5092 45C4 887B AE72 X-PGP-Keys: http://www.rachinsky.de/nicolas/gpg/nicolas_rachinsky.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 11:26:37 -0000 * Artem Belevich [2013-01-16 00:45 -0800]: > On Tue, Jan 15, 2013 at 11:37 PM, Nicolas Rachinsky > wrote: > >> You may want to update your system to very recent FreeBSD as quite a > >> few fixes were recently imported from illumos. Hopefully it will deal > >> with the issue. I'm out of ideas otherwise. Sorry. > > > > Do you mean -CURRENT or -STABLE with very recent? Or just 9.1? > > -HEAD or -STABLE (-8 or -9). 
I have now updated the machine to stable/8 r245541. I have not updated the zpool. But the problem still occurs. Should I update the pool? Or try other things first? Nicolas -- http://www.rachinsky.de/nicolas From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 13:07:25 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9EC36A21 for ; Fri, 18 Jan 2013 13:07:25 +0000 (UTC) (envelope-from dppascual@gmail.com) Received: from mail-ie0-f170.google.com (mail-ie0-f170.google.com [209.85.223.170]) by mx1.freebsd.org (Postfix) with ESMTP id 79C4577B for ; Fri, 18 Jan 2013 13:07:25 +0000 (UTC) Received: by mail-ie0-f170.google.com with SMTP id k10so6353951iea.29 for ; Fri, 18 Jan 2013 05:07:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to :content-type; bh=wRO4nsto0fLyZC7+GaptBc0TY74Iv8a2Mah8PyEb/6E=; b=wHtp/NDI9OnG1B7+JFGvF/3MzZwRD0UrUXo0jrriGvePRHcoXjVykqVL7mQkKROWGz ZRNfSk95zk7dCpQQk8g9On7cJCYSMEnRHgMyetKPPGH4n8Oi1XzVK4CJSVP+qPtSoVtV 1e56dBqMnDOkbWXUzUNbppWbRW2s5H38o8/eoUdY2R6dzpXEjjF8u/43Ey9oXeor+WaB yPqcDB0zgIA1B9vbpGo6ZS7y3bdgkNXTE8QVrDI6NWknhCl4qRSXJhwXWv6fOk64ES9W HQBZqriv221SHKWHjZdm0iHs1Go3tY8OehT0X0KbyAWbke0squdna0qhnmaDPFGqVfZD BHMw== MIME-Version: 1.0 X-Received: by 10.50.16.210 with SMTP id i18mr2000604igd.53.1358514439132; Fri, 18 Jan 2013 05:07:19 -0800 (PST) Received: by 10.50.153.168 with HTTP; Fri, 18 Jan 2013 05:07:19 -0800 (PST) Date: Fri, 18 Jan 2013 14:07:19 +0100 Message-ID: Subject: Enable UNMAP in ZFS From: Dani To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 
Jan 2013 13:07:25 -0000 Hi all, I have installed FreeBSD 9.1 and have created a ZFS pool with SAS disks. How can I enable UNMAP on SSDs devices used as cache and log devices? Thanks you. Regards. From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 14:23:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E493ED43 for ; Fri, 18 Jan 2013 14:23:10 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-vc0-f169.google.com (mail-vc0-f169.google.com [209.85.220.169]) by mx1.freebsd.org (Postfix) with ESMTP id A138DA9D for ; Fri, 18 Jan 2013 14:23:10 +0000 (UTC) Received: by mail-vc0-f169.google.com with SMTP id gb23so3755736vcb.0 for ; Fri, 18 Jan 2013 06:23:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=6bJ80RVz1XvB/mB+9S14OvcMRoTrYHLmWL0plk9ncx8=; b=QOGybnry5ZmdcYTxRQ4aFHv7WPQIX6pLVnU9qW1wfQbvCJnMHWhoRJRCvSsZkQNvvH a/LD2fOFLtyh0o8gQTvVNNLP4UETCzmNR32PUwONjo8/MV7vEjzSfiWkLKdoiv2SEryw sWaU4v/DqlSSYvFXjGMSBWLpXR/0Z82HjpNdT4J4g1wgQFecvNEJfH01j8kiuPvd+Smt B77KjM4dNOHJDfA3tAG5O5pe9Gfb1Np+4lqGjgQlT3UpK3z0cQQB4nPLzLjcjrOmtdB0 M9WWRnzTPF4ICebV7SkuKTEhNYUAfaVvaT0u2izoO442vPYzUQOftKwFIC5dTWJPE1jv BrPg== MIME-Version: 1.0 X-Received: by 10.52.175.106 with SMTP id bz10mr8397064vdc.125.1358518983967; Fri, 18 Jan 2013 06:23:03 -0800 (PST) Received: by 10.58.145.196 with HTTP; Fri, 18 Jan 2013 06:23:03 -0800 (PST) In-Reply-To: References: Date: Fri, 18 Jan 2013 14:23:03 +0000 Message-ID: Subject: Re: Enable UNMAP in ZFS From: Tom Evans To: Dani Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 
2013 14:23:11 -0000 On Fri, Jan 18, 2013 at 1:07 PM, Dani wrote: > Hi all, > > I have installed FreeBSD 9.1 and have created a ZFS pool with SAS disks. > How can I enable UNMAP on SSDs devices used as cache and log devices? > > Thanks you. Regards. UNMAP, I don't know. FreeBSD ZFS has support for TRIM in 10-CURRENT. Oracle support UNMAP in Solaris 11.1, but judging from this, it's not that useful: http://docs.oracle.com/cd/E26502_01/html/E28978/gmibl.html Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 14:57:43 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 195AFA6B for ; Fri, 18 Jan 2013 14:57:43 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id C9FB2DF0 for ; Fri, 18 Jan 2013 14:57:42 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1TwDNy-0004Lu-Tr for freebsd-fs@freebsd.org; Fri, 18 Jan 2013 15:57:40 +0100 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1TwDNz-00017M-1v for freebsd-fs@freebsd.org; Fri, 18 Jan 2013 15:57:39 +0100 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: Enable UNMAP in ZFS References: Date: Fri, 18 Jan 2013 15:57:37 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: User-Agent: Opera Mail/12.12 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: -0.0 X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_40 autolearn=disabled version=3.3.1 X-Scan-Signature: c74461a82029b6293650421ecb57b64a 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 14:57:43 -0000 On Fri, 18 Jan 2013 15:23:03 +0100, Tom Evans wrote: > On Fri, Jan 18, 2013 at 1:07 PM, Dani wrote: >> Hi all, >> >> I have installed FreeBSD 9.1 and have created a ZFS pool with SAS disks. >> How can I enable UNMAP on SSDs devices used as cache and log devices? >> >> Thanks you. Regards. > > UNMAP, I don't know. FreeBSD ZFS has support for TRIM in 10-CURRENT. > > Oracle support UNMAP in Solaris 11.1, but judging from this, it's not > that useful: > > http://docs.oracle.com/cd/E26502_01/html/E28978/gmibl.html > > Cheers > > Tom Isn't UNMAP the SCSI name for TRIM? Ronald. From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 15:02:01 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5707DCE6 for ; Fri, 18 Jan 2013 15:02:01 +0000 (UTC) (envelope-from gavin@FreeBSD.org) Received: from mail-gw14.york.ac.uk (mail-gw14.york.ac.uk [144.32.129.164]) by mx1.freebsd.org (Postfix) with ESMTP id 0EE57E3A for ; Fri, 18 Jan 2013 15:02:00 +0000 (UTC) Received: from ury.york.ac.uk ([144.32.108.81]:37640) by mail-gw14.york.ac.uk with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1TwDSB-0006Qv-KY for freebsd-fs@FreeBSD.org; Fri, 18 Jan 2013 15:01:59 +0000 Date: Fri, 18 Jan 2013 15:01:59 +0000 (GMT) From: Gavin Atkinson X-X-Sender: gavin@thunderhorn.york.ac.uk To: freebsd-fs@FreeBSD.org Subject: ZFS lock up 9-stable r244911 (Jan) Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 15:02:01 -0000 Hi all, I have a machine on which ZFS appears to have locked up, along with any processes that attempt to access the ZFS filesystem. This machine is running 9-stable amd64 r244911 (though from cvs, not SVN), and therefore I believe has all of avg's ZFS deadlock patches. This machine has both UFS and ZFS filesystems. All of the "system" filesystems are on UFS, and as a result the machine itself is responsive and I can investigate state using kgdb against the live kernel. I've included all thread backtraces, a couple of other bits relating to held locks, and ps/sysctl output at http://people.freebsd.org/~gavin/tay-zfs-hang.txt http://people.freebsd.org/~gavin/tay-sysctl-a.txt http://people.freebsd.org/~gavin/tay-ps-auxwwwH.txt This machine was in use as a pkgng package builder, using poudriere. Poudriere makes heavy use of zfs filesystems within jails, including "zfs get", "zfs set", "zfs snapshot", "zfs rollback", "zfs diff" and other commands, although there do not appear to be any instances of the zfs process running currently. At the time of the hang, 16 parallel builds were in progress. The underlying disk subsystem is a single hardware RAID-10 on a twa controller, and the zpool is on a single partition of this device. The RAID-10 itself is intact, and the controller reports no errors. There is no L2ARC or separate ZIL. The UFS filesystems (which still seem to be fully functional) are on separate partitions on the same underlying device, so I do not believe the underlying storage is having issues. I can "dd" from the underlying ZFS partition without problem. Nothing has been logged to /var/log/messages. I can keep this machine in this state for a couple of days, so I can get further details as required. I am happy to work with somebody to diagnose this hang further. Note, however, that the kernel does not have WITNESS etc. compiled in.
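For anyone wanting to collect similar state from a hung machine, a minimal sketch of one way to do it (not necessarily how the files above were generated; the output filenames are hypothetical, and procstat is FreeBSD-specific, so the script only prints a note elsewhere):

```shell
#!/bin/sh
# Gather the same kind of hang-debugging state as the files linked above.
if command -v procstat >/dev/null 2>&1; then
    procstat -kk -a > /tmp/thread-backtraces.txt  # kernel stacks of all threads
    ps auxwwwH      > /tmp/ps-auxwwwH.txt         # every thread, wide output
    sysctl -a       > /tmp/sysctl-a.txt
    echo "hang state written to /tmp"
else
    echo "procstat not found; this sketch targets FreeBSD"
fi
```

Unlike kgdb against the live kernel, this needs no debug symbols, which makes it an easy first step before deeper inspection.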
Thanks, Gavin From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 15:17:16 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 83C63411 for ; Fri, 18 Jan 2013 15:17:16 +0000 (UTC) (envelope-from feld@feld.me) Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id 62FB5EFC for ; Fri, 18 Jan 2013 15:17:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:To:Content-Type; bh=+CTXW3J+6nDt8B1ALMKX4+D7hvMTHe20zAhXlL4DW6A=; b=RULViNeCZY/QgdhfR55BdLctdYeXw9FQ0GVKBHdMCYF+GHrkpGrc/46abbVSV/J0TebTR9jVLOmmtUFyjXjMiwMhE2MYdKa+JEd8mJdRayQMADrE2J14nUf7jaG/TIGh; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by feld.me with esmtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1TwDgy-0008vp-5U; Fri, 18 Jan 2013 09:17:16 -0600 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpsa id 1358522210-12155-89420/5/1; Fri, 18 Jan 2013 15:16:50 +0000 Content-Type: text/plain; format=flowed; delsp=yes To: freebsd-fs@freebsd.org, Dani Subject: Re: Enable UNMAP in ZFS References: Date: Fri, 18 Jan 2013 09:16:50 -0600 Mime-Version: 1.0 From: Mark Felder Message-Id: In-Reply-To: User-Agent: Opera Mail/12.12 (FreeBSD) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 15:17:16 -0000 On Fri, 18 Jan 2013 07:07:19 -0600, Dani wrote: > > I have installed FreeBSD 9.1 and have created a ZFS pool with SAS disks. > How can I enable UNMAP on SSDs devices used as cache and log devices? By UNMAP do you mean TRIM? 
From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 16:20:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E570FF4E for ; Fri, 18 Jan 2013 16:20:14 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-vc0-f175.google.com (mail-vc0-f175.google.com [209.85.220.175]) by mx1.freebsd.org (Postfix) with ESMTP id 9F4C4311 for ; Fri, 18 Jan 2013 16:20:14 +0000 (UTC) Received: by mail-vc0-f175.google.com with SMTP id fw7so1100534vcb.6 for ; Fri, 18 Jan 2013 08:20:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=rIRTqlKhtT5UIxfIX8loXRz8bC96UlBEIAIjFFMi9ug=; b=LFFVWTakfpOAdW4YlFCqtfYfaBxN2Tba6SIoCGdbBkoGW/yIb6grhhUXM5MN0rPbDz QiovAzUUMqJvzb+u0XTL+MuUcqniLs1qazUSmxxUHnt1crd3OhGDlCRj7IUZKHVaPshA lLAgDkmUvKBgaCXY3u+MmgsvRx8R2G/CVVS/pFSil/uCH1XIzptW9D36+xpq3mKj5JR0 HU1Gh7sSdvy3ps00dyL+YJanb21djpMkcaQCgRjM9OO2Z0xi6dT7fYRKu+CiFn6XVAhR EvdLMFquQqTLRO+wrcUWkNUldSCf/xAzM/CuaVCABsDxHbpZJyl0GpvK3Au9PxnRsqAi Dq3Q== MIME-Version: 1.0 X-Received: by 10.52.74.38 with SMTP id q6mr9240240vdv.17.1358526008608; Fri, 18 Jan 2013 08:20:08 -0800 (PST) Sender: artemb@gmail.com Received: by 10.220.122.196 with HTTP; Fri, 18 Jan 2013 08:20:08 -0800 (PST) In-Reply-To: <20130118112630.GA41074@mid.pc5.i.0x5.de> References: <20130114094010.GA75529@mid.pc5.i.0x5.de> <20130114195148.GA20540@mid.pc5.i.0x5.de> <20130114214652.GA76779@mid.pc5.i.0x5.de> <20130115224556.GA41774@mid.pc5.i.0x5.de> <20130116073759.GA47781@mid.pc5.i.0x5.de> <20130118112630.GA41074@mid.pc5.i.0x5.de> Date: Fri, 18 Jan 2013 08:20:08 -0800 X-Google-Sender-Auth: 70rmbILCuVZFPnvxl2-xn-17Xew Message-ID: Subject: Re: slowdown of zfs (tx->tx) From: Artem Belevich To: Nicolas Rachinsky Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 16:20:15 -0000 On Fri, Jan 18, 2013 at 3:26 AM, Nicolas Rachinsky wrote: > * Artem Belevich [2013-01-16 00:45 -0800]: >> On Tue, Jan 15, 2013 at 11:37 PM, Nicolas Rachinsky >> wrote: >> >> You may want to update your system to very recent FreeBSD as quite a >> >> few fixes were recently imported from illumos. Hopefully it will deal >> >> with the issue. I'm out of ideas otherwise. Sorry. >> > >> > Do you mean -CURRENT or -STABLE with very recent? Or just 9.1? >> >> -HEAD or -STABLE (-8 or -9). > > I have now updated the machine to stable/8 r245541. I have not updated > the zpool. > > But the problem still occurs. Should I update the pool? Or try other > things first? Updating the pool is an irreversible operation. In general I'd suggest trying less drastic options first. Other people have suggested that the problem may be just a side effect of an almost-full filesystem. ZFS needs a fair amount of unfragmented free space in order to work efficiently. If that's what's causing your problem, then one thing to try would be to free up enough space. The gotcha there is that you need to free up enough contiguous space. Removing a bunch of recently written files may not help, as those writes would have happened on an already fragmented FS. Removing files written when the FS had a lot of free space has a better chance of freeing contiguous space. Old snapshots are good candidates for this. Other than that I'm out of ideas. Sorry.
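To illustrate the "destroy old snapshots first" idea, a minimal sketch. The pool and snapshot names below are made up, and the `zfs destroy` commands are only printed for review, never executed; on a real system the listing would come from `zfs list -H -t snapshot -o name -s creation`:

```shell
#!/bin/sh
# Select the N oldest snapshots from a listing that is already sorted
# oldest-first (which "zfs list -s creation" guarantees).
oldest_snapshots() {
    head -n "$1"
}

# Hypothetical sample listing standing in for real zfs(8) output:
printf 'tank@2012-06\ntank@2012-12\ntank@2013-01\n' |
oldest_snapshots 2 |
while read -r snap; do
    # Dry run: print the commands instead of running them.
    echo "zfs destroy $snap"
done
```

Reviewing the printed commands before running them matters here, since destroying a snapshot, like upgrading a pool, is irreversible.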
--Artem From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 17:13:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7A0205A2; Fri, 18 Jan 2013 17:13:18 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 500B388F; Fri, 18 Jan 2013 17:13:18 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id C3F63B993; Fri, 18 Jan 2013 12:13:17 -0500 (EST) From: John Baldwin To: Bruce Evans Subject: Re: [PATCH] Use vfs_timestamp() instead of getnanotime() in NFS Date: Fri, 18 Jan 2013 12:12:41 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; ) References: <460209850.2108683.1358475815866.JavaMail.root@erie.cs.uoguelph.ca> <20130118165934.K1042@besplex.bde.org> In-Reply-To: <20130118165934.K1042@besplex.bde.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201301181212.41321.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 18 Jan 2013 12:13:17 -0500 (EST) Cc: freebsd-fs@freebsd.org, Rick Macklem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 17:13:18 -0000 On Friday, January 18, 2013 1:19:29 am Bruce Evans wrote: > On Thu, 17 Jan 2013, Rick Macklem wrote: > > > John Baldwin wrote: > >> On Tuesday, January 15, 2013 2:58:42 pm John Baldwin wrote: > >>> Fixing NFS to properly use vfs_timestamp() seems to be a larger > >>> project. > >> > >> Actually, I have a patch that I think does this below. 
It builds, have > >> not > >> yet booted it (but will do so in a bit). > >> > >> Index: fs/nfsclient/nfs_clstate.c > >> =================================================================== > >> --- fs/nfsclient/nfs_clstate.c (revision 245225) > >> +++ fs/nfsclient/nfs_clstate.c (working copy) > >> @@ -4611,7 +4611,7 @@ > >> } > >> dp = nfscl_finddeleg(clp, np->n_fhp->nfh_fh, np->n_fhp->nfh_len); > >> if (dp != NULL && (dp->nfsdl_flags & NFSCLDL_WRITE)) { > >> - NFSGETNANOTIME(&dp->nfsdl_modtime); > >> + vfs_timestamp(&dp->nfsdl_modtime); > >> dp->nfsdl_flags |= NFSCLDL_MODTIMESET; > >> } > >> NFSUNLOCKCLSTATE(); > > Not sure about this case. Although nfsdl_modtime is being set for local > > use, it replaces the mtime returned by the NFS server while the delegation > > is in use. Ideally it would be the same resolution as the NFS server, but > > that resolution isn't known to the client. (It is often better than 1sec, > > which is the default for vfs_timestamp().) > > The patch seems about right except for this. > > > I'd be tempted to leave it (although the function used by the macro might > > need to be changed, since Bruce mentions getnanotime() isn't supposed to > > be used?). > > For maximal precision and accuracy, it nanotime() should be used. I'm > not sure if you need to be at least as precise and accurate as the server. > Having them synced to nanoseconds accuracy is impossible, but > getnanotime() gives <= 1/HZ of accuracy and it is easy for them to be > synced with more accuracy than that. Then the extra accuracy can be > seen in server timestamps if the server is FreeBSD and uses vfs_timestamp() > with a either microtime() or nanotime(). I've certainly seen NFS servers use much more finely-grained VFS timestamps (e.g. Isilon nodes run with vfs.timestamp_precision of 2 or 3 so they give more precise timestamps than just getnanotime()). OTOH, clock drift between the client and server could easily screw this up. 
I will leave this as-is for now and just commit the vfs_timestamp() changes first. > Further style fixes: > - remove the NFSGETNANOTIME() macro. It is only used in the above, and > in 3 other instances where its use is bogus because only the seconds > part is used. The `time_second' global gives seconds part with the > same (in)accuracy as getnanotime(). If you want maximal accuracy > for just the seconds part, then bintime() should be used (this is > slightly faster than microtime() and nanotime(). > (get*time()'s seconds part is the same as time_second. This > inaccurate since it lags bintime()'s seconds part by up to 1/HZ > seconds (so it differs by a full second for an everage of 1 one > in every HZ readings). The difference is visible if one reader, > say make(1) reads the time using bintime() while another reader, > say vfs_timestamp() reads the time using getbintime().) Yes, I wondered if I could replace those with time_second. The same is true of the remaining uses of NFSGETTIME() as they also only use the seconds portion. 
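For reference, the server-side precision mentioned above is a runtime tunable. A sketch of checking and (commented out) raising it; this is FreeBSD-specific, so the script degrades to a note elsewhere, and the 0-3 meanings are as I recall them from vfs_timestamp():

```shell
#!/bin/sh
# Inspect the vfs.timestamp_precision sysctl discussed above.
# 0 = seconds, 1 = seconds + 1/HZ, 2 = microseconds, 3 = nanoseconds.
if sysctl -n vfs.timestamp_precision >/dev/null 2>&1; then
    sysctl vfs.timestamp_precision
    # Raising it to nanosecond resolution would be:
    #   sysctl vfs.timestamp_precision=3
else
    echo "vfs.timestamp_precision not available (not a FreeBSD kernel?)"
fi
```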
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 17:34:56 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 52768F9E for ; Fri, 18 Jan 2013 17:34:56 +0000 (UTC) (envelope-from dppascual@gmail.com) Received: from mail-ia0-x235.google.com (mail-ia0-x235.google.com [IPv6:2607:f8b0:4001:c02::235]) by mx1.freebsd.org (Postfix) with ESMTP id 26E05B04 for ; Fri, 18 Jan 2013 17:34:56 +0000 (UTC) Received: by mail-ia0-f181.google.com with SMTP id k25so1612311iah.26 for ; Fri, 18 Jan 2013 09:34:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=yAs92Pf1RYhw4xjC1znAMcZusfhaW0X7rzO/JHNdxZo=; b=OLDFxdNJDF1eOODchXq3yX9K4uLtwKHQDDTKMh931fb4DDuow9071/O9g1iAWn6h5+ LOdkw0D2VVMzSXR9kkm8LRcpPgkoXCRvPNnZrc3Op/pWJV/yKqmbznW9+NF/WxSndp8F XLzbtLQiieryYQxj907+D93kNuIFROJ1OhZsAA2T5Bt31xKeZqeSY2GLveQKHsCeKQCr VUFJ/grA8SIbVYI6s+LdxjMLxpSUtVjG0c5Rut+sljSl+8b6I1TTST1PxdZjr0BRtAIG 7pzQor3ii91WKgtZSGwTlYVw9OXRlTitFx+3VXF5jAlRQ8gGvKXfowph5kvrAT5kwJJC 6QYQ== MIME-Version: 1.0 X-Received: by 10.42.98.80 with SMTP id r16mr6556503icn.45.1358530495865; Fri, 18 Jan 2013 09:34:55 -0800 (PST) Received: by 10.50.153.168 with HTTP; Fri, 18 Jan 2013 09:34:55 -0800 (PST) Received: by 10.50.153.168 with HTTP; Fri, 18 Jan 2013 09:34:55 -0800 (PST) In-Reply-To: References: Date: Fri, 18 Jan 2013 18:34:55 +0100 Message-ID: Subject: Re: Enable UNMAP in ZFS From: Dani To: Mark Felder Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Fri, 18 Jan 2013 17:34:56 -0000 Hi all, For SATA the command is called TRIM, for SAS the command is called UNMAP. Regards El 18/01/2013 16:17, "Mark Felder" escribió: > On Fri, 18 Jan 2013 07:07:19 -0600, Dani wrote: > > >> I have installed FreeBSD 9.1 and have created a ZFS pool with SAS disks. >> How can I enable UNMAP on SSDs devices used as cache and log devices? >> > > By UNMAP do you mean TRIM? > From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 18:13:02 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AB56C6E5 for ; Fri, 18 Jan 2013 18:13:02 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-lb0-f179.google.com (mail-lb0-f179.google.com [209.85.217.179]) by mx1.freebsd.org (Postfix) with ESMTP id 30AFDDB0 for ; Fri, 18 Jan 2013 18:13:01 +0000 (UTC) Received: by mail-lb0-f179.google.com with SMTP id gm13so2864472lbb.24 for ; Fri, 18 Jan 2013 10:13:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=NqvEgT0z3dXu1S4xhgTkhvadPz3+d0KGAPA65ZlgwGA=; b=PBJEVYg+Ty/u20Z5WOhV+6HR4stfAR98Bjf3z6SGpJDXYYGC2YHyMMigEPX3ngR/ZJ rVCIftCxKwU5SjHQC6HH33oFnCRSls3gkmpabQznX0e1zuEJvn6Nj5VLcQtSMJ9wG7jr MOh3XpWVI0eRR0/wGNwcqfs4BJNSPtXcTbw+icUWo5guq5mJGZoecc2yIfIYE0inu2QK 6iOalB2tVKhRk8KUeFZNl49/ol2dymvgCxXh29gwOLje732ubpjau6uD1gmTTXf4RAKU zeTdb5oby4F0gVQkfujRDVnQQddebIIihTYB/Xrk6Sqw+Pogv6c4kiH6BhOBdcUS9zcP 4/TA== X-Received: by 10.112.26.169 with SMTP id m9mr4187766lbg.116.1358532780675; Fri, 18 Jan 2013 10:13:00 -0800 (PST) Received: from [192.168.1.130] (mau.donbass.com.
[92.242.127.250]) by mx.google.com with ESMTPS id j9sm1878629lbd.13.2013.01.18.10.12.57 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 18 Jan 2013 10:12:58 -0800 (PST) Message-ID: <50F990A9.1030305@gmail.com> Date: Fri, 18 Jan 2013 20:12:57 +0200 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:18.0) Gecko/20100101 Firefox/18.0 SeaMonkey/2.15 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org Subject: lz4 support for ZFS Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 18:13:02 -0000 Hi all. I see LZ4 is now supported in head. Can I ask is there any plans MFC'ing it to stable? -- Sphinx of black quartz, judge my vow. From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 23:29:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id BCC5D87B for ; Fri, 18 Jan 2013 23:29:41 +0000 (UTC) (envelope-from feld@feld.me) Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id 94D89DE0 for ; Fri, 18 Jan 2013 23:29:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=Message-Id:Cc:To:Date:From:Subject:Content-Type:Mime-Version:References:In-Reply-To; bh=zLLjRUWU+o4bf88d1fwLrgLLUusiw4+FIittDc+uuv0=; b=JAB3p+ApMf3GLPWrQrN4/U9jamoI6c9OrG5N8TrvJwZOj0K+JdjSHgnFoXMq5Sbo1MKlUuLt6QtcavGKD0sGO8FiTJdOqeO3+Mx7W1BscrSEyLQZhzvOlSkRRtpopl01; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by feld.me with esmtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1TwLNU-0005rb-BP; Fri, 18 Jan 2013 17:29:40 -0600 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpsa 
id 1358551774-40632-89420/5/3; Fri, 18 Jan 2013 23:29:34 +0000 User-Agent: K-9 Mail for Android In-Reply-To: References: Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Subject: Re: Enable UNMAP in ZFS From: Mark Felder Date: Fri, 18 Jan 2013 11:59:37 -0600 To: Dani Message-Id: Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 23:29:41 -0000 I believe this requires 9-STABLE or 10 From owner-freebsd-fs@FreeBSD.ORG Fri Jan 18 23:36:27 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3A8F7956; Fri, 18 Jan 2013 23:36:27 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id C38ADE18; Fri, 18 Jan 2013 23:36:26 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAJDb+VCDaFvO/2dsb2JhbABEhkW0D4N8c4IeAQEFIwRSGw4KAgINGQJZBogsqlKRaYEjjwOBEwOIYY0riU2GfIMTggY X-IronPort-AV: E=Sophos;i="4.84,495,1355115600"; d="scan'208";a="12685634" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 18 Jan 2013 18:36:20 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 2D267B3F0D; Fri, 18 Jan 2013 18:36:20 -0500 (EST) Date: Fri, 18 Jan 2013 18:36:20 -0500 (EST) From: Rick Macklem To: John Baldwin Message-ID: <1309255502.2141506.1358552180122.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201301181212.41321.jhb@freebsd.org> Subject: Re: [PATCH] Use vfs_timestamp() instead of getnanotime() in NFS MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit 
X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Rick Macklem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jan 2013 23:36:27 -0000 John Baldwin wrote: > On Friday, January 18, 2013 1:19:29 am Bruce Evans wrote: > > On Thu, 17 Jan 2013, Rick Macklem wrote: > > > > > John Baldwin wrote: > > >> On Tuesday, January 15, 2013 2:58:42 pm John Baldwin wrote: > > >>> Fixing NFS to properly use vfs_timestamp() seems to be a larger > > >>> project. > > >> > > >> Actually, I have a patch that I think does this below. It builds, > > >> have > > >> not > > >> yet booted it (but will do so in a bit). > > >> > > >> Index: fs/nfsclient/nfs_clstate.c > > >> =================================================================== > > >> --- fs/nfsclient/nfs_clstate.c (revision 245225) > > >> +++ fs/nfsclient/nfs_clstate.c (working copy) > > >> @@ -4611,7 +4611,7 @@ > > >> } > > >> dp = nfscl_finddeleg(clp, np->n_fhp->nfh_fh, np->n_fhp->nfh_len); > > >> if (dp != NULL && (dp->nfsdl_flags & NFSCLDL_WRITE)) { > > >> - NFSGETNANOTIME(&dp->nfsdl_modtime); > > >> + vfs_timestamp(&dp->nfsdl_modtime); > > >> dp->nfsdl_flags |= NFSCLDL_MODTIMESET; > > >> } > > >> NFSUNLOCKCLSTATE(); > > > Not sure about this case. Although nfsdl_modtime is being set for > > > local > > > use, it replaces the mtime returned by the NFS server while the > > > delegation > > > is in use. Ideally it would be the same resolution as the NFS > > > server, but > > > that resolution isn't known to the client. (It is often better > > > than 1sec, > > > which is the default for vfs_timestamp().) > > > > The patch seems about right except for this. 
> > > > > I'd be tempted to leave it (although the function used by the > > > macro might > > > need to be changed, since Bruce mentions getnanotime() isn't > > > supposed to > > > be used?). > > > > For maximal precision and accuracy, it nanotime() should be used. > > I'm > > not sure if you need to be at least as precise and accurate as the > > server. > > Having them synced to nanoseconds accuracy is impossible, but > > getnanotime() gives <= 1/HZ of accuracy and it is easy for them to > > be > > synced with more accuracy than that. Then the extra accuracy can be > > seen in server timestamps if the server is FreeBSD and uses > > vfs_timestamp() > > with a either microtime() or nanotime(). > > I've certainly seen NFS servers use much more finely-grained VFS > timestamps > (e.g. Isilon nodes run with vfs.timestamp_precision of 2 or 3 so they > give > more precise timestamps than just getnanotime()). OTOH, clock drift > between > the client and server could easily screw this up. I will leave this > as-is > for now and just commit the vfs_timestamp() changes first. > > > Further style fixes: > > - remove the NFSGETNANOTIME() macro. It is only used in the above, > > and > > in 3 other instances where its use is bogus because only the > > seconds > > part is used. The `time_second' global gives seconds part with > > the > > same (in)accuracy as getnanotime(). If you want maximal accuracy > > for just the seconds part, then bintime() should be used (this is > > slightly faster than microtime() and nanotime(). > > (get*time()'s seconds part is the same as time_second. This > > inaccurate since it lags bintime()'s seconds part by up to 1/HZ > > seconds (so it differs by a full second for an everage of 1 one > > in every HZ readings). The difference is visible if one reader, > > say make(1) reads the time using bintime() while another > > reader, > > say vfs_timestamp() reads the time using getbintime().) > > Yes, I wondered if I could replace those with time_second. 
The same > is true of the remaining uses of NFSGETTIME() as they also only use > the > seconds portion. > Those macros are just cruft left over from when the code was written to be portable between various BSDen. And what they were mapped to was just something that seemed to work. Feel free to replace them with whatever makes sense, rick > -- > John Baldwin From owner-freebsd-fs@FreeBSD.ORG Sat Jan 19 13:34:17 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2B8E81E4 for ; Sat, 19 Jan 2013 13:34:17 +0000 (UTC) (envelope-from prvs=17316559e7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id C5CE0BE8 for ; Sat, 19 Jan 2013 13:34:16 +0000 (UTC) Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50001761944.msg for ; Sat, 19 Jan 2013 13:34:09 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Sat, 19 Jan 2013 13:34:09 +0000 (not processed: message from valid local sender) X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=17316559e7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Dani" , References: Subject: Re: Enable UNMAP in ZFS Date: Sat, 19 Jan 2013 13:34:38 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jan 2013 13:34:17 
-0000 As others have said ZFS TRIM support, which can be backed by SATA TRIM or SCSI UNMAP, is supported in head. All these require driver support for the underlying card, which currently means either CAM ata or scsi supported controllers. I do plan to MFC ZFS TRIM to stable but it could be a few weeks before I get to it. I also have a fairly extensive patch for cam which improves cam trim, unmap support if you want to test it. Regards Steve ----- Original Message ----- From: "Dani" To: Sent: Friday, January 18, 2013 1:07 PM Subject: Enable UNMAP in ZFS > Hi all, > > I have installed FreeBSD 9.1 and have created a ZFS pool with SAS disks. > How can I enable UNMAP on SSDs devices used as cache and log devices? > > Thanks you. Regards. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
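For anyone testing the head code Steven mentions, a sketch of probing whether a kernel exposes the ZFS TRIM machinery. Both sysctl OID names below are assumptions based on the 10-CURRENT code of the time and may differ by revision:

```shell
#!/bin/sh
# Probe for ZFS TRIM sysctls; the OID names here are assumptions
# and may not match every head revision.
for oid in vfs.zfs.trim.enabled kstat.zfs.misc.zio_trim.success; do
    if v=$(sysctl -n "$oid" 2>/dev/null); then
        echo "$oid = $v"
    else
        echo "$oid not present (kernel without ZFS TRIM?)"
    fi
done
```

On a non-FreeBSD system, or a kernel built without the TRIM code, both probes simply report the OID as absent.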
From owner-freebsd-fs@FreeBSD.ORG Sat Jan 19 17:01:07 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F26E8F3D; Sat, 19 Jan 2013 17:01:06 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 96B057A5; Sat, 19 Jan 2013 17:01:05 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA00114; Sat, 19 Jan 2013 19:00:57 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Twbmq-0004Mx-Pt; Sat, 19 Jan 2013 19:00:56 +0200 Message-ID: <50FAD145.10906@FreeBSD.org> Date: Sat, 19 Jan 2013 19:00:53 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Gavin Atkinson Subject: Re: ZFS lock up 9-stable r244911 (Jan) References: In-Reply-To: X-Enigmail-Version: 1.4.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jan 2013 17:01:07 -0000 on 18/01/2013 17:01 Gavin Atkinson said the following: > > Hi all, > > I have a machine on which ZFS appears to have locked up, and any processes > that attempt to access the ZFS filesystem. This machine is running > 9-stable amd64 r244911 (though from cvs, not SVN), and therefore I believe > has all of avg's ZFS deadlock patches. > > This machine has both UFS and ZFS filesystems. 
> All of the "system" filesystems are on UFS, and as a result the machine
> itself is responsive and I can investigate state using kgdb against the
> live kernel. I've included all thread backtraces, a couple of other bits
> relating to held locks, and ps/sysctl output at
> http://people.freebsd.org/~gavin/tay-zfs-hang.txt
> http://people.freebsd.org/~gavin/tay-sysctl-a.txt
> http://people.freebsd.org/~gavin/tay-ps-auxwwwH.txt
>
> This machine was in use as a pkgng package builder, using poudriere.
> Poudriere makes heavy use of zfs filesystems within jails, including "zfs
> get", "zfs set", "zfs snapshot", "zfs rollback", "zfs diff" and other
> commands, although there do not appear to be any instances of the zfs
> process running currently. At the time of the hang 16 parallel builds
> were in progress.
>
> The underlying disk subsystem is a single hardware RAID-10 on a twa
> controller, and the zpool is on a single partition of this device. The
> RAID-10 itself is intact, and the controller reports no errors. There is
> no L2ARC or separate ZIL. The UFS filesystems (which still seem to be
> fully functional) are on separate partitions on the same underlying
> device, so I do not believe the underlying storage is having issues. I
> can "dd" from the underlying ZFS partition without problem. Nothing has
> been logged to /var/log/messages.

Based on the above information, plus some additional debugging information that Gavin has kindly provided to me, I have developed the following "theory" to explain this deadlock.

I believe that there was very high (overwhelmingly high) disk load before the deadlock occurred. Further, I think that there was a substantial number of high-priority writes. Under those conditions the number of in-progress/pending zio-s was constantly at zfs_vdev_max_pending (by default 10).
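To make the throttling behaviour concrete, here is a toy model (not ZFS code; the class and method names are invented for illustration) of a vdev I/O queue where at most zfs_vdev_max_pending operations may be in flight, so under sustained overload everything else piles up in the queue:

```python
from collections import deque

class VdevQueueModel:
    """Toy model of a vdev I/O queue with a cap on in-flight operations,
    mirroring zfs_vdev_max_pending = 10. Purely illustrative."""

    def __init__(self, max_pending=10):
        self.max_pending = max_pending
        self.pending = 0       # zio-s issued to the device
        self.queued = deque()  # zio-s waiting behind the cap

    def issue(self, zio):
        # A new zio starts immediately only if we are under the cap;
        # otherwise it waits, like the entries seen in vq_write_tree.
        if self.pending < self.max_pending:
            self.pending += 1
        else:
            self.queued.append(zio)

    def complete_one(self):
        # When a zio finishes, the next queued one is issued, so under
        # constant overload 'pending' stays pinned at max_pending.
        if self.pending:
            self.pending -= 1
        if self.queued and self.pending < self.max_pending:
            self.queued.popleft()
            self.pending += 1

q = VdevQueueModel()
for i in range(126):  # roughly the 10 pending + 116 queued zio-s seen here
    q.issue(f"zio-{i}")
print(q.pending, len(q.queued))  # 10 116
```

This matches the numbers in the vdev_queue dump that follows: vq_pending_tree pinned at the cap and over a hundred zio-s queued behind it.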
The number of queued zio-s was above one hundred:

vdev_queue = {
    vq_deadline_tree = {avl_root = 0xfffffe0338dbb248, avl_compar = 0xffffffff816855b0, avl_offset = 584, avl_numnodes = 116, avl_size = 896},
    vq_read_tree = {avl_root = 0xfffffe019d0b65b0, avl_compar = 0xffffffff81685600, avl_offset = 560, avl_numnodes = 8, avl_size = 896},
    vq_write_tree = {avl_root = 0xfffffe03e3d19230, avl_compar = 0xffffffff81685600, avl_offset = 560, avl_numnodes = 108, avl_size = 896},
    vq_pending_tree = {avl_root = 0xfffffe025e32c230, avl_compar = 0xffffffff81685600, avl_offset = 560, avl_numnodes = 10, avl_size = 896},
    vq_lock = {lock_object = {lo_name = 0xffffffff8172afc9 "vq->vq_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}},
vdev_cache = {
    vc_offset_tree = {avl_root = 0x0, avl_compar = 0xffffffff81681740, avl_offset = 24, avl_numnodes = 0, avl_size = 88},
    vc_lastused_tree = {avl_root = 0x0, avl_compar = 0xffffffff81681760, avl_offset = 48, avl_numnodes = 0, avl_size = 88}

Apparently processing of zio-s was lagging so far behind that some completed zio-s triggered the "late arrival" condition.
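Walls of near-identical kernel backtraces like the ones that follow are easier to digest once collapsed. A small hypothetical helper (the column layout, i.e. how many leading identification fields to skip, is an assumption for illustration; real procstat output can differ):

```python
from collections import Counter

def collapse_backtraces(lines, skip_cols=4):
    """Count how many threads share a call chain in 'procstat -kk'-style
    output. skip_cols = number of leading identification columns
    (PID, TID, command, thread name) -- an assumed layout."""
    counts = Counter()
    for line in lines:
        chain = " ".join(line.split()[skip_cols:])
        counts[chain] += 1
    return counts

# Three of the identical zio_write_intr_high traces, abbreviated:
traces = [
    "0 100432 kernel zio_write_intr_h mi_switch+0x186 txg_rele_to_sync+0x36",
    "0 100433 kernel zio_write_intr_h mi_switch+0x186 txg_rele_to_sync+0x36",
    "0 100434 kernel zio_write_intr_h mi_switch+0x186 txg_rele_to_sync+0x36",
]
for chain, n in collapse_backtraces(traces).most_common():
    print(f"{n} threads: {chain}")
```

Run against the full dump, this kind of summary shows at a glance that all five zio_write_intr_high threads are stuck in the same place.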
My incomplete understanding shows here - I am not sure what exactly triggers the condition or what is so special about it, but from the following stack traces we can see that all five of the zio_write_intr_high taskqueue threads were executing dmu_sync_late_arrival_done():

0 100432 kernel zio_write_intr_h mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_rele_to_sync+0x36 dmu_tx_commit+0xf1 dmu_sync_late_arrival_done+0x52 zio_done+0x353 zio_execute+0xc3 zio_done+0x3d0 zio_execute+0xc3 taskqueue_run_locked+0x74 taskqueue_thread_loop+0x46 fork_exit+0x11f fork_trampoline+0xe
0 100433 kernel zio_write_intr_h mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_rele_to_sync+0x36 dmu_tx_commit+0xf1 dmu_sync_late_arrival_done+0x52 zio_done+0x353 zio_execute+0xc3 zio_done+0x3d0 zio_execute+0xc3 taskqueue_run_locked+0x74 taskqueue_thread_loop+0x46 fork_exit+0x11f fork_trampoline+0xe
0 100434 kernel zio_write_intr_h mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_rele_to_sync+0x36 dmu_tx_commit+0xf1 dmu_sync_late_arrival_done+0x52 zio_done+0x353 zio_execute+0xc3 zio_done+0x3d0 zio_execute+0xc3 taskqueue_run_locked+0x74 taskqueue_thread_loop+0x46 fork_exit+0x11f fork_trampoline+0xe
0 100435 kernel zio_write_intr_h mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_rele_to_sync+0x36 dmu_tx_commit+0xf1 dmu_sync_late_arrival_done+0x52 zio_done+0x353 zio_execute+0xc3 zio_done+0x3d0 zio_execute+0xc3 taskqueue_run_locked+0x74 taskqueue_thread_loop+0x46 fork_exit+0x11f fork_trampoline+0xe
0 100436 kernel zio_write_intr_h mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_rele_to_sync+0x36 dmu_tx_commit+0xf1 dmu_sync_late_arrival_done+0x52 zio_done+0x353 zio_execute+0xc3 zio_done+0x3d0 zio_execute+0xc3 taskqueue_run_locked+0x74 taskqueue_thread_loop+0x46 fork_exit+0x11f fork_trampoline+0xe

In addition to the above, the taskqueue associated with these threads has another
9 pending tasks.

As you can see, the "late arrival" code path involves txg_rele_to_sync(), where an instance of tc_lock is acquired. Further, it looks like tc_lock instances are held by the following threads:

64998 101921 pkg initial thread mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_delay+0x9d dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_create+0x310 VOP_CREATE_APV+0x31 vn_open_cred+0x4b7 kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
66152 102491 pkg initial thread mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_delay+0x9d dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 vn_write+0x37e vn_io_fault+0x90 dofilewrite+0x85 kern_writev+0x6c sys_write+0x64 amd64_syscall+0x540 Xfast_syscall+0xf7
75803 101638 find - mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_delay+0x9d dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_inactive+0x1b7 zfs_freebsd_inactive+0x1a vinactive+0x86 vputx+0x2d8 sys_fchdir+0x356 amd64_syscall+0x540 Xfast_syscall+0xf7
75809 102932 find - mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_delay+0x9d dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_inactive+0x1b7 zfs_freebsd_inactive+0x1a vinactive+0x86 vputx+0x2d8 sys_fchdir+0x356 amd64_syscall+0x540 Xfast_syscall+0xf7
75813 101515 find - mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_delay+0x9d dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_inactive+0x1b7 zfs_freebsd_inactive+0x1a vinactive+0x86 vputx+0x2d8 sys_fchdir+0x356 amd64_syscall+0x540 Xfast_syscall+0xf7
77468 101412 update-mime-databas initial thread mi_switch+0x186 sleepq_wait+0x42 _sx_xlock_hard+0x426 _sx_xlock+0x51 txg_delay+0x9d
dsl_pool_tempreserve_space+0xd5 dsl_dir_tempreserve_space+0x154 dmu_tx_assign+0x370 zfs_freebsd_write+0x45b VOP_WRITE_APV+0xb2 vn_write+0x37e vn_io_fault+0x90 dofilewrite+0x85 kern_writev+0x6c sys_write+0x64 amd64_syscall+0x540 Xfast_syscall+0xf7

These threads call txg_delay, also because of the high load. In the code we see that dmu_tx_assign first grabs an instance of tc_lock and then calls dsl_dir_tempreserve_space. We also see that txg_delay tries to acquire tx_sync_lock, and that is where all of these threads are blocked.

Then we see that txg_sync_thread holds tx_sync_lock, but it in turn is blocked waiting for its zio:

1552 100544 zfskern txg_thread_enter mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 dbuf_read+0x5e5 dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e feature_get_refcount+0x4b spa_feature_is_active+0x52 dsl_scan_active+0x63 txg_sync_thread+0x20d fork_exit+0x11f fork_trampoline+0xe

So, to summarize:

- For some completed zio-s, their zio_done routines are blocked in dmu_sync_late_arrival_done -> txg_rele_to_sync on tc_lock.
- The tc_locks are held by threads in dmu_tx_assign -> ... -> txg_delay, where txg_delay is blocked on tx_sync_lock.
- tx_sync_lock is held by txg_sync_thread, which waits for its zio to be processed.
- That zio is held on the queue and is not getting processed, because the vdev already has too many pending/in-progress zio-s.

This theory looks plausible to me, but I'd like to hear what the experts think. An even more important question is how this situation can be avoided.
-- 
Andriy Gapon
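The summary describes a circular wait, which can be made explicit by walking a hand-built wait-for graph. The node names below paraphrase the summary, and the graph is constructed by hand for illustration, not extracted from the kernel:

```python
def find_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from `start`; return the cycle as a
    list (first element repeated at the end), or None if the chain ends."""
    seen = []
    node = start
    while node not in seen:
        seen.append(node)
        node = waits_for.get(node)
        if node is None:
            return None  # chain terminates without a cycle
    return seen[seen.index(node):] + [node]

# The deadlock described above, as a wait-for graph:
waits_for = {
    "zio_done (late arrival)": "tc_lock",
    "tc_lock": "tx_sync_lock",            # holders are stuck in txg_delay
    "tx_sync_lock": "txg_sync_thread zio",
    "txg_sync_thread zio": "vdev queue slot",
    # Queue slots never free up: the done routines that would free them
    # are themselves the blocked "late arrival" zio_done calls.
    "vdev queue slot": "zio_done (late arrival)",
}
cycle = find_cycle(waits_for, "zio_done (late arrival)")
print(" -> ".join(cycle))
```

Since every resource in the chain waits, directly or transitively, on itself, no thread can make progress, which matches the observed lock-up.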