From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 07:57:28 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 1C44495C;
 Sun, 26 Oct 2014 07:57:28 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B3631955;
 Sun, 26 Oct 2014 07:57:27 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9Q7vKBQ044691
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 26 Oct 2014 09:57:20 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9Q7vKBQ044691
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s9Q7vKIi044690;
 Sun, 26 Oct 2014 09:57:20 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sun, 26 Oct 2014 09:57:20 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: panic in nfs on arm
Message-ID: <20141026075720.GO1877@kib.kiev.ua>
References: <op.xn95m7ajeclrs1@82-171-231-144.ip.telfort.nl>
 <1388627434.7506173.1414279273153.JavaMail.root@uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1388627434.7506173.1414279273153.JavaMail.root@uoguelph.ca>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: Ronald Klop <ronald@klop.ws>, freebsd-fs@freebsd.org,
 freebsd-arm@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 07:57:28 -0000

On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
> Ronald Klop wrote:
> > Hi,
> > 
> > I got a panic on my arm computer while building a port with
> > /usr/ports
> > mounted from my FreeBSD-10-STABLE/amd64 machine.
> > 
> > This is the machine which paniced:
> > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014
> > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG
> >  arm
> > 
> > 
> > Tracing pid 90295 tid 100119 td 0xc5f8c960
> > db_trace_self() at db_trace_self
> >           pc = 0xc0bb12c8  lr = 0xc0bb1354 (db_trace_thread+0x50)
> >           sp = 0xdf29e5d0  fp = 0xc3e07120
> > db_trace_thread() at db_trace_thread+0x50
> >           pc = 0xc0bb1354  lr = 0xc0936314 (db_command_init+0x5a4)
> >           sp = 0xdf29e630  fp = 0xc3e07120
> > db_command_init() at db_command_init+0x5a4
> >           pc = 0xc0936314  lr = 0xc0935ad0 (db_skip_to_eol+0x484)
> >           sp = 0xdf29e648  fp = 0xc3e07120
> >           r4 = 0xc0c8d350  r5 = 0x00000000
> > db_skip_to_eol() at db_skip_to_eol+0x484
> >           pc = 0xc0935ad0  lr = 0xc0935c38 (db_command_loop+0x5c)
> >           sp = 0xdf29e6e8  fp = 0xc3e07120
> >           r4 = 0xdf29e6fc  r5 = 0xc0c8d64c
> >           r6 = 0x3cd90e75  r7 = 0x00000000
> >           r8 = 0x00000001 r10 = 0x600000d3
> > db_command_loop() at db_command_loop+0x5c
> >           pc = 0xc0935c38  lr = 0xc0937f80 (X_db_sym_numargs+0xec)
> >           sp = 0xdf29e6f0  fp = 0xc3e07120
> > X_db_sym_numargs() at X_db_sym_numargs+0xec
> >           pc = 0xc0937f80  lr = 0xc0a6f0c0 (kdb_trap+0x94)
> >           sp = 0xdf29e808  fp = 0xc3e07120
> >           r4 = 0xdf29e8f8
> > kdb_trap() at kdb_trap+0x94
> >           pc = 0xc0a6f0c0  lr = 0xc0bc1d60 (badaddr_read+0x274)
> >           sp = 0xdf29e828  fp = 0xc3e07120
> >           r4 = 0xdf29e8f8  r5 = 0x00000001
> >           r6 = 0x3cd90e75  r7 = 0xc5f8c960
> >           r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0
> > badaddr_read() at badaddr_read+0x274
> >           pc = 0xc0bc1d60  lr = 0xc0bc1e98 (badaddr_read+0x3ac)
> >           sp = 0xdf29e840  fp = 0xc3e07120
> >           r4 = 0xc5f8c960  r5 = 0xdf29e8f8
> >           r6 = 0x3cd90e05
> > badaddr_read() at badaddr_read+0x3ac
> >           pc = 0xc0bc1e98  lr = 0xc0bc2278 (data_abort_handler+0x10c)
> >           sp = 0xdf29e858  fp = 0xc3e07120
> >           r4 = 0xc0cd8af8  r5 = 0xffff1004
> > data_abort_handler() at data_abort_handler+0x10c
> >           pc = 0xc0bc2278  lr = 0xc0bb2f40 (exception_exit)
> >           sp = 0xdf29e8f8  fp = 0xc3e07120
> >           r4 = 0xffffffff  r5 = 0xffff1004
> >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
> >           r8 = 0x0000000f  r9 = 0x00000101
> >          r10 = 0x0000001d
> > exception_exit() at exception_exit
> >           pc = 0xc0bb2f40  lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
> >           sp = 0xdf29e948  fp = 0xc3e07120
> >           r0 = 0xba9b9127  r1 = 0x8b3de5fb
> >           r2 = 0xc61c1fc8  r3 = 0xba9b9126
> >           r4 = 0x00000000  r5 = 0xc61c1fc8
> >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
> >           r8 = 0x0000000f  r9 = 0x00000101
> >          r10 = 0x0000001d r12 = 0x00000000
> > uma_reclaim() at uma_reclaim+0x24c
> This looks to me like a crash in uma_reclaim() and I find UMA
> way too obscure to understand.
> 
> I have no idea if it might be related, but alc@ put a fix for low
> memory situations in r272071 (or maybe it's r272221?).
> 
> Might be worth trying a slightly newer kernel to see if the
> problem still occurs.
> 
> And hopefully someone more conversant with UMA (or this stack
> trace) can help more.
> 
> rick
> 
> >           pc = 0xc0b8db4c  lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
> >           sp = 0xdf29e978  fp = 0xdf29ec10
> >           r4 = 0xc3e071d8  r5 = 0xc0e0ea00
> >           r6 = 0xc3e07120  r7 = 0x00000000
> >           r8 = 0x00000102  r9 = 0xdf29ecf8
> >          r10 = 0xc61c0760
> > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
uma_reclaim() is not called from uma_zalloc().
I think there is some issue with ddb on arm, which means that
the backtrace is not useful.  See below for one more.

> >           pc = 0xc0b8c800  lr = 0xc09e1df0 (nfscl_nget+0x308)
> >           sp = 0xdf29e990  fp = 0xdf29ec10
> >           r4 = 0x9bb9fa43  r5 = 0x00000000
> >           r6 = 0xc550dce8  r7 = 0xc3edaa00
> >           r8 = 0xc3ebbac0
> > nfscl_nget() at nfscl_nget+0x308
> >           pc = 0xc09e1df0  lr = 0xc09da69c (ncl_readlinkrpc+0xf60)
> >           sp = 0xdf29e9d8  fp = 0xdf29ea10
> >           r4 = 0xc550dce8  r5 = 0x00000000
> >           r6 = 0xc550dcf8  r7 = 0xdf29ecf8
> >           r8 = 0xdf29ec6c  r9 = 0x00000000
> >          r10 = 0xdf29ed28
> > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60
> >           pc = 0xc09da69c  lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
> >           sp = 0xdf29ec40  fp = 0xbffff620
> >           r4 = 0xc0c95c68  r5 = 0xdf29ec6c
> >           r6 = 0x00000001  r7 = 0x00020284
> >           r8 = 0xffffff9c  r9 = 0x00200800
> >          r10 = 0xc5f8c960
> > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
I do not see how VOP_MKDIR() could end up calling ncl_readlinkrpc(),
especially without an intervening frame.
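
One way to check a ddb backtrace like this against the source is to pull
out each frame together with the caller that its saved lr claims, and then
look up whether that call actually exists. Below is a minimal Python sketch,
assuming the frame layout shown in this thread. The name call_chain and the
two regexes are illustrative only and are not part of ddb.

```python
import re

# "func() at func+0xNN", optionally behind mail quote markers ("> > ").
FRAME_RE = re.compile(r"^[>\s]*(\w+)\(\) at (\w+)(?:\+0x[0-9a-f]+)?\s*$")
# "(caller+0xNN)" at the end of a line; this also covers mail clients
# that wrapped the symbol onto a line of its own.
SYM_RE = re.compile(r"\((\w+)(?:\+0x[0-9a-f]+)?\)\s*$")


def call_chain(trace_text):
    """Return [(function, claimed_caller), ...], innermost frame first."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append([m.group(1), None])
            continue
        m = SYM_RE.search(line)
        if m and frames and frames[-1][1] is None:
            frames[-1][1] = m.group(1)
    return [tuple(f) for f in frames]


SAMPLE = """\
uma_reclaim() at uma_reclaim+0x24c
          pc = 0xc0b8db4c  lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
          sp = 0xdf29e978  fp = 0xdf29ec10
uma_zalloc_arg() at uma_zalloc_arg+0x2f0
          pc = 0xc0b8c800  lr = 0xc09e1df0 (nfscl_nget+0x308)
nfscl_nget() at nfscl_nget+0x308
          pc = 0xc09e1df0  lr = 0xc09da69c (ncl_readlinkrpc+0xf60)
"""

for func, caller in call_chain(SAMPLE):
    print(f"{func} <- {caller}")
```

Each printed pair is a claim made by the unwinder, not a verified call.
Pairs that cannot be found in the source point at a bad unwind.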

> >           pc = 0xc0bdae44  lr = 0xc0aca614 (kern_mkdirat+0x18c)
> >           sp = 0xdf29ec50  fp = 0xbffff620
> >           r4 = 0xdf29ed28  r5 = 0xdf29ec90
> >           r6 = 0x00000000
> > kern_mkdirat() at kern_mkdirat+0x18c
> >           pc = 0xc0aca614  lr = 0xc0aca684 (kern_mkdir+0x24)
> >           sp = 0xdf29ede0  fp = 0xbffff620
> >           r4 = 0x00020290  r5 = 0xc5f8c960
> >           r6 = 0x00000000  r7 = 0xc5f7f000
> >           r8 = 0x00000000 r10 = 0x00013640
> > kern_mkdir() at kern_mkdir+0x24
> >           pc = 0xc0aca684  lr = 0xc0aca6a8 (sys_mkdir+0x1c)
> >           sp = 0xdf29edf0  fp = 0xbffff620
> > sys_mkdir() at sys_mkdir+0x1c
> >           pc = 0xc0aca6a8  lr = 0xc0bc2884 (swi_handler+0x254)
> >           sp = 0xdf29edf8  fp = 0xbffff620
> > swi_handler() at swi_handler+0x254
> >           pc = 0xc0bc2884  lr = 0xc0bb2ed0 (swi_exit)
> >           sp = 0xdf29ee60  fp = 0xbffff620
> >           r4 = 0x00020290  r5 = 0x2085e8e0
> >           r6 = 0x00020284  r7 = 0x00000088
> >           r8 = 0x00000001
> > swi_exit() at swi_exit
> >           pc = 0xc0bb2ed0  lr = 0xc0bb2ed0 (swi_exit)
> >           sp = 0xdf29ee60  fp = 0xbffff620
> > Unable to unwind further
> > 
> > 
> > Unfortunately dumping the kernel core also paniced.
> > db> dump
> > Physical memory: 507 MB
> > Dumping 74 MB: 71 67 63
> > vm_fault(0xc4147000, 0, 1, 0) -> 0
> > Fatal kernel mode data abort: 'Translation Fault (P)'
> > trapframe: 0xdf29e0b8
> > FSR=00000017, FAR=00000014, spsr=a00000d3
> > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004
> > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c
> > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a
> > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060
> > 
> > panic: Fatal abort
> > Uptime: 3d18h30m32s
> > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> > 

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 09:55:01 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 7ABFFB6C;
 Sun, 26 Oct 2014 09:55:01 +0000 (UTC)
Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3952B367;
 Sun, 26 Oct 2014 09:55:00 +0000 (UTC)
Received: from mail (mail [192.168.254.3])
 by mail.madpilot.net (Postfix) with ESMTP id 3jQZKP0v2qzb0g;
 Sun, 26 Oct 2014 10:54:49 +0100 (CET)
Received: from mail.madpilot.net ([192.168.254.3])
 by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024)
 with ESMTP id 8XLrlYF-z2dL; Sun, 26 Oct 2014 10:54:34 +0100 (CET)
Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206])
 by mail.madpilot.net (Postfix) with ESMTPSA;
 Sun, 26 Oct 2014 10:54:28 +0100 (CET)
Message-ID: <544CC4D4.7040203@FreeBSD.org>
Date: Sun, 26 Oct 2014 10:54:28 +0100
From: Guido Falsi <madpilot@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: FreeBSD FS <freebsd-fs@FreeBSD.org>
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net>
In-Reply-To: <544BC990.4030700@madpilot.net>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: Glen Barber <gjb@FreeBSD.org>, freebsd-stable@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 09:55:01 -0000

On 10/25/14 18:02, Guido Falsi wrote:
> On 10/25/14 17:02, Guido Falsi wrote:
>> On 10/24/14 15:26, Guido Falsi wrote:
>>> Hi,
>>>
>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware
>>> using NanoBSD.
>>>
>>> By mounting and umounting UFS filesystems I have seen umount constantly
>>> hanging hard in a deadlock. I have tested on two boards with two
>>> distinct compactflash disks with same results. This was not happening
>>> with 10.0-RELEASE.
>>>
>>> I have build a 10.1-RC3 kernel with full debugging and caused the
>>> problem to happen, I got this:
>>>
>>> root@qtest:~ [0]# umount /cfg
>>> panic: detach with active requests
>>> KDB: stack backtrace:
[...]
> I must admit I am out of ideas.
> 

I bisected the commits and found that this starts with r268815, which
is the MFC of r268205.

It is related to TRIM support; in fact, disabling TRIM on the filesystem
"fixes" it.
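
For readers unfamiliar with bisecting, the search above can be sketched in
Python as follows. This is only an illustration: the function name, the
revision range, and the is_bad predicate are hypothetical. In practice
is_bad means building a kernel at that revision, booting it, and checking
whether umount panics, and on a stable branch only some revision numbers
are commits to that branch.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the lowest revision in (good, bad] for which is_bad() is true.

    Assumes `good` is known good, `bad` is known bad, and that once the
    problem appears it is present in every later revision.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid       # problem already present: look earlier
        else:
            good = mid      # still fine: look later
    return bad


# Hypothetical endpoints and a stand-in predicate for "umount panics".
print(first_bad_revision(260000, 272000, lambda rev: rev >= 268815))
```

Each step halves the range, so even a span of several thousand revisions
needs only a dozen or so kernel builds.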

I filed bug #194606 on bugzilla [1] to further track this issue, if
anyone is interested.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606

-- 
Guido Falsi <madpilot@FreeBSD.org>

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 12:00:32 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id F3467942;
 Sun, 26 Oct 2014 12:00:31 +0000 (UTC)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7E1F4EBE;
 Sun, 26 Oct 2014 12:00:30 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Aq0EAF3hTFSDaFve/2dsb2JhbABcg2JYBIMCyWUKhnlUAoEaAX2EAgEBAQMBAQEBIAQnIAsbDgoCAg0ZAikBCSYGCAcEARwEiBcJDbNMlAYBAQEBAQEEAQEBAQEBARuBLI8LAQEbNAeCd4FUBZZPhA6EcZRBhBQhLweBCDmBAwEBAQ
X-IronPort-AV: E=Sophos;i="5.04,790,1406606400"; d="scan'208";a="163555615"
Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.222])
 by esa-annu.net.uoguelph.ca with ESMTP; 26 Oct 2014 08:00:29 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 6108EB413D;
 Sun, 26 Oct 2014 08:00:29 -0400 (EDT)
Date: Sun, 26 Oct 2014 08:00:29 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <1340373913.7617662.1414324829387.JavaMail.root@uoguelph.ca>
In-Reply-To: <20141026075720.GO1877@kib.kiev.ua>
Subject: Re: panic in nfs on arm
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.209]
X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926)
Cc: Ronald Klop <ronald@klop.ws>, freebsd-fs@freebsd.org,
 freebsd-arm@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 12:00:32 -0000

Kostik wrote:
> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
> > Ronald Klop wrote:
> > > Hi,
> > > 
> > > I got a panic on my arm computer while building a port with
> > > /usr/ports
> > > mounted from my FreeBSD-10-STABLE/amd64 machine.
> > > 
> > > This is the machine which paniced:
> > > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014
> > > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG
> > >  arm
> > > 
> > > 
> > > Tracing pid 90295 tid 100119 td 0xc5f8c960
> > > db_trace_self() at db_trace_self
> > >           pc = 0xc0bb12c8  lr = 0xc0bb1354 (db_trace_thread+0x50)
> > >           sp = 0xdf29e5d0  fp = 0xc3e07120
> > > db_trace_thread() at db_trace_thread+0x50
> > >           pc = 0xc0bb1354  lr = 0xc0936314
> > >           (db_command_init+0x5a4)
> > >           sp = 0xdf29e630  fp = 0xc3e07120
> > > db_command_init() at db_command_init+0x5a4
> > >           pc = 0xc0936314  lr = 0xc0935ad0 (db_skip_to_eol+0x484)
> > >           sp = 0xdf29e648  fp = 0xc3e07120
> > >           r4 = 0xc0c8d350  r5 = 0x00000000
> > > db_skip_to_eol() at db_skip_to_eol+0x484
> > >           pc = 0xc0935ad0  lr = 0xc0935c38 (db_command_loop+0x5c)
> > >           sp = 0xdf29e6e8  fp = 0xc3e07120
> > >           r4 = 0xdf29e6fc  r5 = 0xc0c8d64c
> > >           r6 = 0x3cd90e75  r7 = 0x00000000
> > >           r8 = 0x00000001 r10 = 0x600000d3
> > > db_command_loop() at db_command_loop+0x5c
> > >           pc = 0xc0935c38  lr = 0xc0937f80
> > >           (X_db_sym_numargs+0xec)
> > >           sp = 0xdf29e6f0  fp = 0xc3e07120
> > > X_db_sym_numargs() at X_db_sym_numargs+0xec
> > >           pc = 0xc0937f80  lr = 0xc0a6f0c0 (kdb_trap+0x94)
> > >           sp = 0xdf29e808  fp = 0xc3e07120
> > >           r4 = 0xdf29e8f8
> > > kdb_trap() at kdb_trap+0x94
> > >           pc = 0xc0a6f0c0  lr = 0xc0bc1d60 (badaddr_read+0x274)
> > >           sp = 0xdf29e828  fp = 0xc3e07120
> > >           r4 = 0xdf29e8f8  r5 = 0x00000001
> > >           r6 = 0x3cd90e75  r7 = 0xc5f8c960
> > >           r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0
> > > badaddr_read() at badaddr_read+0x274
> > >           pc = 0xc0bc1d60  lr = 0xc0bc1e98 (badaddr_read+0x3ac)
> > >           sp = 0xdf29e840  fp = 0xc3e07120
> > >           r4 = 0xc5f8c960  r5 = 0xdf29e8f8
> > >           r6 = 0x3cd90e05
> > > badaddr_read() at badaddr_read+0x3ac
> > >           pc = 0xc0bc1e98  lr = 0xc0bc2278
> > >           (data_abort_handler+0x10c)
> > >           sp = 0xdf29e858  fp = 0xc3e07120
> > >           r4 = 0xc0cd8af8  r5 = 0xffff1004
> > > data_abort_handler() at data_abort_handler+0x10c
> > >           pc = 0xc0bc2278  lr = 0xc0bb2f40 (exception_exit)
> > >           sp = 0xdf29e8f8  fp = 0xc3e07120
> > >           r4 = 0xffffffff  r5 = 0xffff1004
> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
> > >           r8 = 0x0000000f  r9 = 0x00000101
> > >          r10 = 0x0000001d
> > > exception_exit() at exception_exit
> > >           pc = 0xc0bb2f40  lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
> > >           sp = 0xdf29e948  fp = 0xc3e07120
> > >           r0 = 0xba9b9127  r1 = 0x8b3de5fb
> > >           r2 = 0xc61c1fc8  r3 = 0xba9b9126
> > >           r4 = 0x00000000  r5 = 0xc61c1fc8
> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
> > >           r8 = 0x0000000f  r9 = 0x00000101
> > >          r10 = 0x0000001d r12 = 0x00000000
> > > uma_reclaim() at uma_reclaim+0x24c
> > This looks to me like a crash in uma_reclaim() and I find UMA
> > way too obscure to understand.
> > 
> > I have no idea if it might be related, but alc@ put a fix for low
> > memory situations in r272071 (or maybe it's r272221?).
> > 
> > Might be worth trying a slightly newer kernel to see if the
> > problem still occurs.
> > 
> > And hopefully someone more conversant with UMA (or this stack
> > trace) can help more.
> > 
> > rick
> > 
> > >           pc = 0xc0b8db4c  lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
> > >           sp = 0xdf29e978  fp = 0xdf29ec10
> > >           r4 = 0xc3e071d8  r5 = 0xc0e0ea00
> > >           r6 = 0xc3e07120  r7 = 0x00000000
> > >           r8 = 0x00000102  r9 = 0xdf29ecf8
> > >          r10 = 0xc61c0760
> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
> uma_reclaim() is not called from uma_zalloc().
> I think there is some issue with ddb on arm, which means that
> the backtrace is not useful.  See below for one more.
> 
Yeah, I noticed that and the one below (i.e. I knew the stack dump
wasn't correct). I had hoped it was at least right about the crash
happening in uma_reclaim(), but that only seems to be called from
the pageout daemon, so it doesn't match up with this thread either.

Also, I couldn't see what the panic message actually was. Is it
this one at the bottom:
Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
or was that what happened when you tried to crash dump?

Btw, nfscl_nget() does call uma_zalloc(M_WAITOK), but it doesn't hold a mutex
when it does this.

rick

> > >           pc = 0xc0b8c800  lr = 0xc09e1df0 (nfscl_nget+0x308)
> > >           sp = 0xdf29e990  fp = 0xdf29ec10
> > >           r4 = 0x9bb9fa43  r5 = 0x00000000
> > >           r6 = 0xc550dce8  r7 = 0xc3edaa00
> > >           r8 = 0xc3ebbac0
> > > nfscl_nget() at nfscl_nget+0x308
> > >           pc = 0xc09e1df0  lr = 0xc09da69c
> > >           (ncl_readlinkrpc+0xf60)
> > >           sp = 0xdf29e9d8  fp = 0xdf29ea10
> > >           r4 = 0xc550dce8  r5 = 0x00000000
> > >           r6 = 0xc550dcf8  r7 = 0xdf29ecf8
> > >           r8 = 0xdf29ec6c  r9 = 0x00000000
> > >          r10 = 0xdf29ed28
> > > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60
> > >           pc = 0xc09da69c  lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
> > >           sp = 0xdf29ec40  fp = 0xbffff620
> > >           r4 = 0xc0c95c68  r5 = 0xdf29ec6c
> > >           r6 = 0x00000001  r7 = 0x00020284
> > >           r8 = 0xffffff9c  r9 = 0x00200800
> > >          r10 = 0xc5f8c960
> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
> I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(),
> esp. without intervening frame.
> 
> > >           pc = 0xc0bdae44  lr = 0xc0aca614 (kern_mkdirat+0x18c)
> > >           sp = 0xdf29ec50  fp = 0xbffff620
> > >           r4 = 0xdf29ed28  r5 = 0xdf29ec90
> > >           r6 = 0x00000000
> > > kern_mkdirat() at kern_mkdirat+0x18c
> > >           pc = 0xc0aca614  lr = 0xc0aca684 (kern_mkdir+0x24)
> > >           sp = 0xdf29ede0  fp = 0xbffff620
> > >           r4 = 0x00020290  r5 = 0xc5f8c960
> > >           r6 = 0x00000000  r7 = 0xc5f7f000
> > >           r8 = 0x00000000 r10 = 0x00013640
> > > kern_mkdir() at kern_mkdir+0x24
> > >           pc = 0xc0aca684  lr = 0xc0aca6a8 (sys_mkdir+0x1c)
> > >           sp = 0xdf29edf0  fp = 0xbffff620
> > > sys_mkdir() at sys_mkdir+0x1c
> > >           pc = 0xc0aca6a8  lr = 0xc0bc2884 (swi_handler+0x254)
> > >           sp = 0xdf29edf8  fp = 0xbffff620
> > > swi_handler() at swi_handler+0x254
> > >           pc = 0xc0bc2884  lr = 0xc0bb2ed0 (swi_exit)
> > >           sp = 0xdf29ee60  fp = 0xbffff620
> > >           r4 = 0x00020290  r5 = 0x2085e8e0
> > >           r6 = 0x00020284  r7 = 0x00000088
> > >           r8 = 0x00000001
> > > swi_exit() at swi_exit
> > >           pc = 0xc0bb2ed0  lr = 0xc0bb2ed0 (swi_exit)
> > >           sp = 0xdf29ee60  fp = 0xbffff620
> > > Unable to unwind further
> > > 
> > > 
> > > Unfortunately dumping the kernel core also paniced.
> > > db> dump
> > > Physical memory: 507 MB
> > > Dumping 74 MB: 71 67 63
> > > vm_fault(0xc4147000, 0, 1, 0) -> 0
> > > Fatal kernel mode data abort: 'Translation Fault (P)'
> > > trapframe: 0xdf29e0b8
> > > FSR=00000017, FAR=00000014, spsr=a00000d3
> > > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004
> > > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c
> > > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a
> > > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060
> > > 
> > > panic: Fatal abort
> > > Uptime: 3d18h30m32s
> > > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
> 

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 12:12:09 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 84663AB5;
 Sun, 26 Oct 2014 12:12:09 +0000 (UTC)
Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl
 [195.190.28.81]) (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4547880;
 Sun, 26 Oct 2014 12:12:08 +0000 (UTC)
Received: from smtp.greenhost.nl ([213.108.104.138])
 by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
 (Exim 4.72) (envelope-from <ronald@klop.ws>)
 id 1XiMfq-0004Js-AT; Sun, 26 Oct 2014 13:12:00 +0100
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: "Konstantin Belousov" <kostikbel@gmail.com>, "Rick Macklem"
 <rmacklem@uoguelph.ca>
Subject: Re: panic in nfs on arm
References: <1340373913.7617662.1414324829387.JavaMail.root@uoguelph.ca>
Date: Sun, 26 Oct 2014 13:11:53 +0100
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: "Ronald Klop" <ronald@klop.ws>
Message-ID: <op.xob6t3tweclrs1@82-171-231-144.ip.telfort.nl>
In-Reply-To: <1340373913.7617662.1414324829387.JavaMail.root@uoguelph.ca>
User-Agent: Opera Mail/12.16 (FreeBSD)
X-Authenticated-As-Hash: bdb49c4ff80bd276e321aade33e76e02752072e2
X-Virus-Scanned: by clamav at smarthost1.samage.net
X-Spam-Level: /
X-Spam-Score: -0.2
X-Spam-Status: No, score=-0.2 required=5.0 tests=ALL_TRUSTED,
 BAYES_50 autolearn=disabled version=3.3.1
X-Scan-Signature: 503f1a2b1db20d3cc8283cfb339c155f
Cc: freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 12:12:09 -0000

On Sun, 26 Oct 2014 13:00:29 +0100, Rick Macklem <rmacklem@uoguelph.ca>  
wrote:

> Kostik wrote:
>> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
>> > Ronald Klop wrote:
>> > > Hi,
>> > >
>> > > I got a panic on my arm computer while building a port with
>> > > /usr/ports
>> > > mounted from my FreeBSD-10-STABLE/amd64 machine.
>> > >
>> > > This is the machine which paniced:
>> > > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014
>> > > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG
>> > >  arm
>> > >
>> > >
>> > > Tracing pid 90295 tid 100119 td 0xc5f8c960
>> > > db_trace_self() at db_trace_self
>> > >           pc = 0xc0bb12c8  lr = 0xc0bb1354 (db_trace_thread+0x50)
>> > >           sp = 0xdf29e5d0  fp = 0xc3e07120
>> > > db_trace_thread() at db_trace_thread+0x50
>> > >           pc = 0xc0bb1354  lr = 0xc0936314
>> > >           (db_command_init+0x5a4)
>> > >           sp = 0xdf29e630  fp = 0xc3e07120
>> > > db_command_init() at db_command_init+0x5a4
>> > >           pc = 0xc0936314  lr = 0xc0935ad0 (db_skip_to_eol+0x484)
>> > >           sp = 0xdf29e648  fp = 0xc3e07120
>> > >           r4 = 0xc0c8d350  r5 = 0x00000000
>> > > db_skip_to_eol() at db_skip_to_eol+0x484
>> > >           pc = 0xc0935ad0  lr = 0xc0935c38 (db_command_loop+0x5c)
>> > >           sp = 0xdf29e6e8  fp = 0xc3e07120
>> > >           r4 = 0xdf29e6fc  r5 = 0xc0c8d64c
>> > >           r6 = 0x3cd90e75  r7 = 0x00000000
>> > >           r8 = 0x00000001 r10 = 0x600000d3
>> > > db_command_loop() at db_command_loop+0x5c
>> > >           pc = 0xc0935c38  lr = 0xc0937f80
>> > >           (X_db_sym_numargs+0xec)
>> > >           sp = 0xdf29e6f0  fp = 0xc3e07120
>> > > X_db_sym_numargs() at X_db_sym_numargs+0xec
>> > >           pc = 0xc0937f80  lr = 0xc0a6f0c0 (kdb_trap+0x94)
>> > >           sp = 0xdf29e808  fp = 0xc3e07120
>> > >           r4 = 0xdf29e8f8
>> > > kdb_trap() at kdb_trap+0x94
>> > >           pc = 0xc0a6f0c0  lr = 0xc0bc1d60 (badaddr_read+0x274)
>> > >           sp = 0xdf29e828  fp = 0xc3e07120
>> > >           r4 = 0xdf29e8f8  r5 = 0x00000001
>> > >           r6 = 0x3cd90e75  r7 = 0xc5f8c960
>> > >           r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0
>> > > badaddr_read() at badaddr_read+0x274
>> > >           pc = 0xc0bc1d60  lr = 0xc0bc1e98 (badaddr_read+0x3ac)
>> > >           sp = 0xdf29e840  fp = 0xc3e07120
>> > >           r4 = 0xc5f8c960  r5 = 0xdf29e8f8
>> > >           r6 = 0x3cd90e05
>> > > badaddr_read() at badaddr_read+0x3ac
>> > >           pc = 0xc0bc1e98  lr = 0xc0bc2278
>> > >           (data_abort_handler+0x10c)
>> > >           sp = 0xdf29e858  fp = 0xc3e07120
>> > >           r4 = 0xc0cd8af8  r5 = 0xffff1004
>> > > data_abort_handler() at data_abort_handler+0x10c
>> > >           pc = 0xc0bc2278  lr = 0xc0bb2f40 (exception_exit)
>> > >           sp = 0xdf29e8f8  fp = 0xc3e07120
>> > >           r4 = 0xffffffff  r5 = 0xffff1004
>> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
>> > >           r8 = 0x0000000f  r9 = 0x00000101
>> > >          r10 = 0x0000001d
>> > > exception_exit() at exception_exit
>> > >           pc = 0xc0bb2f40  lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
>> > >           sp = 0xdf29e948  fp = 0xc3e07120
>> > >           r0 = 0xba9b9127  r1 = 0x8b3de5fb
>> > >           r2 = 0xc61c1fc8  r3 = 0xba9b9126
>> > >           r4 = 0x00000000  r5 = 0xc61c1fc8
>> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
>> > >           r8 = 0x0000000f  r9 = 0x00000101
>> > >          r10 = 0x0000001d r12 = 0x00000000
>> > > uma_reclaim() at uma_reclaim+0x24c
>> > This looks to me like a crash in uma_reclaim() and I find UMA
>> > way too obscure to understand.
>> >
>> > I have no idea if it might be related, but alc@ put a fix for low
>> > memory situations in r272071 (or maybe it's r272221?).
>> >
>> > Might be worth trying a slightly newer kernel to see if the
>> > problem still occurs.
>> >
>> > And hopefully someone more conversant with UMA (or this stack
>> > trace) can help more.
>> >
>> > rick
>> >
>> > >           pc = 0xc0b8db4c  lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
>> > >           sp = 0xdf29e978  fp = 0xdf29ec10
>> > >           r4 = 0xc3e071d8  r5 = 0xc0e0ea00
>> > >           r6 = 0xc3e07120  r7 = 0x00000000
>> > >           r8 = 0x00000102  r9 = 0xdf29ecf8
>> > >          r10 = 0xc61c0760
>> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
>> uma_reclaim() is not called from uma_zalloc().
>> I think there is some issue with ddb on arm, which means that
>> the backtrace is not useful.  See below for one more.
>>
> Yea, I noticed that and the one below (ie. I knew the stack dump
> wasn't correct). I kinda hoped it was right w.r.t. the crash
> happening in uma_reclaim() { which only seems to be called from
> the pageout daemon? }, so that doesn't match up with the thread.
>
> Also, I couldn't see what the panic message actually was. Is it
> this one at the bottom:
> Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
> or was that what happened when you tried to crash dump?
>
> Btw, nfscl_nget() does call uma_zalloc(M_WAITOK), but it doesn't hold a
> mutex when it does this.
>
> rick


Hi,

The non-sleepable lock message is not the original panic. It appeared
when I dumped the memory to dumpdev from the debugger. I don't have the
original panic message; it was no longer on the serial output.
Is it possible to have the debugger print it again?

I rebooted the machine already. Let's see if it happens again someday.

Ronald.


>> > >           pc = 0xc0b8c800  lr = 0xc09e1df0 (nfscl_nget+0x308)
>> > >           sp = 0xdf29e990  fp = 0xdf29ec10
>> > >           r4 = 0x9bb9fa43  r5 = 0x00000000
>> > >           r6 = 0xc550dce8  r7 = 0xc3edaa00
>> > >           r8 = 0xc3ebbac0
>> > > nfscl_nget() at nfscl_nget+0x308
>> > >           pc = 0xc09e1df0  lr = 0xc09da69c
>> > >           (ncl_readlinkrpc+0xf60)
>> > >           sp = 0xdf29e9d8  fp = 0xdf29ea10
>> > >           r4 = 0xc550dce8  r5 = 0x00000000
>> > >           r6 = 0xc550dcf8  r7 = 0xdf29ecf8
>> > >           r8 = 0xdf29ec6c  r9 = 0x00000000
>> > >          r10 = 0xdf29ed28
>> > > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60
>> > >           pc = 0xc09da69c  lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
>> > >           sp = 0xdf29ec40  fp = 0xbffff620
>> > >           r4 = 0xc0c95c68  r5 = 0xdf29ec6c
>> > >           r6 = 0x00000001  r7 = 0x00020284
>> > >           r8 = 0xffffff9c  r9 = 0x00200800
>> > >          r10 = 0xc5f8c960
>> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
>> I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(),
>> esp. without intervening frame.
>>
>> > >           pc = 0xc0bdae44  lr = 0xc0aca614 (kern_mkdirat+0x18c)
>> > >           sp = 0xdf29ec50  fp = 0xbffff620
>> > >           r4 = 0xdf29ed28  r5 = 0xdf29ec90
>> > >           r6 = 0x00000000
>> > > kern_mkdirat() at kern_mkdirat+0x18c
>> > >           pc = 0xc0aca614  lr = 0xc0aca684 (kern_mkdir+0x24)
>> > >           sp = 0xdf29ede0  fp = 0xbffff620
>> > >           r4 = 0x00020290  r5 = 0xc5f8c960
>> > >           r6 = 0x00000000  r7 = 0xc5f7f000
>> > >           r8 = 0x00000000 r10 = 0x00013640
>> > > kern_mkdir() at kern_mkdir+0x24
>> > >           pc = 0xc0aca684  lr = 0xc0aca6a8 (sys_mkdir+0x1c)
>> > >           sp = 0xdf29edf0  fp = 0xbffff620
>> > > sys_mkdir() at sys_mkdir+0x1c
>> > >           pc = 0xc0aca6a8  lr = 0xc0bc2884 (swi_handler+0x254)
>> > >           sp = 0xdf29edf8  fp = 0xbffff620
>> > > swi_handler() at swi_handler+0x254
>> > >           pc = 0xc0bc2884  lr = 0xc0bb2ed0 (swi_exit)
>> > >           sp = 0xdf29ee60  fp = 0xbffff620
>> > >           r4 = 0x00020290  r5 = 0x2085e8e0
>> > >           r6 = 0x00020284  r7 = 0x00000088
>> > >           r8 = 0x00000001
>> > > swi_exit() at swi_exit
>> > >           pc = 0xc0bb2ed0  lr = 0xc0bb2ed0 (swi_exit)
>> > >           sp = 0xdf29ee60  fp = 0xbffff620
>> > > Unable to unwind further
>> > >
>> > >
>> > > Unfortunately dumping the kernel core also panicked.
>> > > db> dump
>> > > Physical memory: 507 MB
>> > > Dumping 74 MB: 71 67 63
>> > > vm_fault(0xc4147000, 0, 1, 0) -> 0
>> > > Fatal kernel mode data abort: 'Translation Fault (P)'
>> > > trapframe: 0xdf29e0b8
>> > > FSR=00000017, FAR=00000014, spsr=a00000d3
>> > > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004
>> > > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c
>> > > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a
>> > > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060
>> > >
>> > > panic: Fatal abort
>> > > Uptime: 3d18h30m32s
>> > > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
>> > > _______________________________________________
>> > > freebsd-fs@freebsd.org mailing list
>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > > To unsubscribe, send any mail to
>> > > "freebsd-fs-unsubscribe@freebsd.org"
>> > >

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 14:59:21 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 62A5CCA5;
 Sun, 26 Oct 2014 14:59:21 +0000 (UTC)
Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 22BB1FD0;
 Sun, 26 Oct 2014 14:59:20 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XiPHr-0009z0-6u; Sun, 26 Oct 2014 14:59:19 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s9QExHer074479;
 Sun, 26 Oct 2014 08:59:17 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX19WVQ941P4koAD0Fz9/d8NS
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: panic in nfs on arm
From: Ian Lepore <ian@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
In-Reply-To: <20141026075720.GO1877@kib.kiev.ua>
References: <op.xn95m7ajeclrs1@82-171-231-144.ip.telfort.nl>
 <1388627434.7506173.1414279273153.JavaMail.root@uoguelph.ca>
 <20141026075720.GO1877@kib.kiev.ua>
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 26 Oct 2014 08:59:17 -0600
Message-ID: <1414335557.12052.672.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: Ronald Klop <ronald@klop.ws>, freebsd-fs@freebsd.org,
 freebsd-arm@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 14:59:21 -0000

On Sun, 2014-10-26 at 09:57 +0200, Konstantin Belousov wrote:
> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
> > Ronald Klop wrote:
> > > Hi,
> > > 
> > > I got a panic on my arm computer while building a port with
> > > /usr/ports
> > > mounted from my FreeBSD-10-STABLE/amd64 machine.
> > > 
> > > This is the machine which panicked:
> > > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014
> > > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG
> > >  arm
> > > 
> > > 
> > > Tracing pid 90295 tid 100119 td 0xc5f8c960
> > > db_trace_self() at db_trace_self
> > >           pc = 0xc0bb12c8  lr = 0xc0bb1354 (db_trace_thread+0x50)
> > >           sp = 0xdf29e5d0  fp = 0xc3e07120
> > > db_trace_thread() at db_trace_thread+0x50
> > >           pc = 0xc0bb1354  lr = 0xc0936314 (db_command_init+0x5a4)
> > >           sp = 0xdf29e630  fp = 0xc3e07120
> > > db_command_init() at db_command_init+0x5a4
> > >           pc = 0xc0936314  lr = 0xc0935ad0 (db_skip_to_eol+0x484)
> > >           sp = 0xdf29e648  fp = 0xc3e07120
> > >           r4 = 0xc0c8d350  r5 = 0x00000000
> > > db_skip_to_eol() at db_skip_to_eol+0x484
> > >           pc = 0xc0935ad0  lr = 0xc0935c38 (db_command_loop+0x5c)
> > >           sp = 0xdf29e6e8  fp = 0xc3e07120
> > >           r4 = 0xdf29e6fc  r5 = 0xc0c8d64c
> > >           r6 = 0x3cd90e75  r7 = 0x00000000
> > >           r8 = 0x00000001 r10 = 0x600000d3
> > > db_command_loop() at db_command_loop+0x5c
> > >           pc = 0xc0935c38  lr = 0xc0937f80 (X_db_sym_numargs+0xec)
> > >           sp = 0xdf29e6f0  fp = 0xc3e07120
> > > X_db_sym_numargs() at X_db_sym_numargs+0xec
> > >           pc = 0xc0937f80  lr = 0xc0a6f0c0 (kdb_trap+0x94)
> > >           sp = 0xdf29e808  fp = 0xc3e07120
> > >           r4 = 0xdf29e8f8
> > > kdb_trap() at kdb_trap+0x94
> > >           pc = 0xc0a6f0c0  lr = 0xc0bc1d60 (badaddr_read+0x274)
> > >           sp = 0xdf29e828  fp = 0xc3e07120
> > >           r4 = 0xdf29e8f8  r5 = 0x00000001
> > >           r6 = 0x3cd90e75  r7 = 0xc5f8c960
> > >           r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0
> > > badaddr_read() at badaddr_read+0x274
> > >           pc = 0xc0bc1d60  lr = 0xc0bc1e98 (badaddr_read+0x3ac)
> > >           sp = 0xdf29e840  fp = 0xc3e07120
> > >           r4 = 0xc5f8c960  r5 = 0xdf29e8f8
> > >           r6 = 0x3cd90e05
> > > badaddr_read() at badaddr_read+0x3ac
> > >           pc = 0xc0bc1e98  lr = 0xc0bc2278 (data_abort_handler+0x10c)
> > >           sp = 0xdf29e858  fp = 0xc3e07120
> > >           r4 = 0xc0cd8af8  r5 = 0xffff1004
> > > data_abort_handler() at data_abort_handler+0x10c
> > >           pc = 0xc0bc2278  lr = 0xc0bb2f40 (exception_exit)
> > >           sp = 0xdf29e8f8  fp = 0xc3e07120
> > >           r4 = 0xffffffff  r5 = 0xffff1004
> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
> > >           r8 = 0x0000000f  r9 = 0x00000101
> > >          r10 = 0x0000001d
> > > exception_exit() at exception_exit
> > >           pc = 0xc0bb2f40  lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
> > >           sp = 0xdf29e948  fp = 0xc3e07120
> > >           r0 = 0xba9b9127  r1 = 0x8b3de5fb
> > >           r2 = 0xc61c1fc8  r3 = 0xba9b9126
> > >           r4 = 0x00000000  r5 = 0xc61c1fc8
> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
> > >           r8 = 0x0000000f  r9 = 0x00000101
> > >          r10 = 0x0000001d r12 = 0x00000000
> > > uma_reclaim() at uma_reclaim+0x24c
> > This looks to me like a crash in uma_reclaim() and I find UMA
> > way too obscure to understand.
> > 
> > I have no idea if it might be related, but alc@ put a fix for low
> > memory situations in r272071 (or maybe it's r272221?).
> > 
> > Might be worth trying a slightly newer kernel to see if the
> > problem still occurs.
> > 
> > And hopefully someone more conversant with UMA (or this stack
> > trace) can help more.
> > 
> > rick
> > 
> > >           pc = 0xc0b8db4c  lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
> > >           sp = 0xdf29e978  fp = 0xdf29ec10
> > >           r4 = 0xc3e071d8  r5 = 0xc0e0ea00
> > >           r6 = 0xc3e07120  r7 = 0x00000000
> > >           r8 = 0x00000102  r9 = 0xdf29ecf8
> > >          r10 = 0xc61c0760
> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
> uma_reclaim() is not called from uma_zalloc().
> I think there is some issue with ddb on arm, which means that
> the backtrace is not useful.  See below for one more.
> > >           pc = 0xc0b8c800  lr = 0xc09e1df0 (nfscl_nget+0x308)
> > >           sp = 0xdf29e990  fp = 0xdf29ec10
> > >           r4 = 0x9bb9fa43  r5 = 0x00000000
> > >           r6 = 0xc550dce8  r7 = 0xc3edaa00
> > >           r8 = 0xc3ebbac0
> > > nfscl_nget() at nfscl_nget+0x308
> > >           pc = 0xc09e1df0  lr = 0xc09da69c (ncl_readlinkrpc+0xf60)
> > >           sp = 0xdf29e9d8  fp = 0xdf29ea10
> > >           r4 = 0xc550dce8  r5 = 0x00000000
> > >           r6 = 0xc550dcf8  r7 = 0xdf29ecf8
> > >           r8 = 0xdf29ec6c  r9 = 0x00000000
> > >          r10 = 0xdf29ed28
> > > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60
> > >           pc = 0xc09da69c  lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
> > >           sp = 0xdf29ec40  fp = 0xbffff620
> > >           r4 = 0xc0c95c68  r5 = 0xdf29ec6c
> > >           r6 = 0x00000001  r7 = 0x00020284
> > >           r8 = 0xffffff9c  r9 = 0x00200800
> > >          r10 = 0xc5f8c960
> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
> I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(),
> esp. without intervening frame.
> 

 
Notice that the address is actually ncl_readlinkrpc+0xf60. 0xf60 is a
pretty big offset into a function, so the pc is probably in some static
function that follows ncl_readlinkrpc in the source file but whose symbol
info has been stripped.  Using addr2line on the pc and lr values will give
reliable source line numbers (but I can't do that without Ronald's
kernel config).
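
To illustrate the point about the large offset, here is a minimal sketch
of nearest-preceding-symbol lookup, which is roughly how a debugger turns
an address into "name+offset".  The start address of ncl_readlinkrpc is
derived from the trace (0xc09da69c - 0xf60); every other name and address
is hypothetical and only there to show the effect of a missing symbol.

```python
import bisect

# (start address, name), sorted by address.
# ncl_readlinkrpc's start comes from the trace: 0xc09da69c - 0xf60.
# All other entries are hypothetical, made up for illustration only.
FULL = [
    (0xC09D9000, "hypothetical_earlier_func"),
    (0xC09D973C, "ncl_readlinkrpc"),
    (0xC09DA600, "hypothetical_static_helper"),
    (0xC09DB000, "hypothetical_later_func"),
]

# The same table with the static function's symbol stripped.
STRIPPED = [s for s in FULL if s[1] != "hypothetical_static_helper"]


def symbolize(addr, symbols):
    """Return 'name+0xoff' for the nearest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        # Address lies below the first known symbol.
        return f"{addr:#x}"
    start, name = symbols[i]
    off = addr - start
    return name if off == 0 else f"{name}+{off:#x}"


print(symbolize(0xC09DA69C, STRIPPED))  # ncl_readlinkrpc+0xf60
print(symbolize(0xC09DA69C, FULL))      # hypothetical_static_helper+0x9c
```

With the symbol missing, the address is attributed to whatever named
function precedes it, which would explain the implausible frame.  Given a
kernel built with debug info, something like "addr2line -f -e <kernel
debug file> 0xc09da69c" should name the real function; the file name
depends on the build and is left as a placeholder here.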

-- Ian


> > >           pc = 0xc0bdae44  lr = 0xc0aca614 (kern_mkdirat+0x18c)
> > >           sp = 0xdf29ec50  fp = 0xbffff620
> > >           r4 = 0xdf29ed28  r5 = 0xdf29ec90
> > >           r6 = 0x00000000
> > > kern_mkdirat() at kern_mkdirat+0x18c
> > >           pc = 0xc0aca614  lr = 0xc0aca684 (kern_mkdir+0x24)
> > >           sp = 0xdf29ede0  fp = 0xbffff620
> > >           r4 = 0x00020290  r5 = 0xc5f8c960
> > >           r6 = 0x00000000  r7 = 0xc5f7f000
> > >           r8 = 0x00000000 r10 = 0x00013640
> > > kern_mkdir() at kern_mkdir+0x24
> > >           pc = 0xc0aca684  lr = 0xc0aca6a8 (sys_mkdir+0x1c)
> > >           sp = 0xdf29edf0  fp = 0xbffff620
> > > sys_mkdir() at sys_mkdir+0x1c
> > >           pc = 0xc0aca6a8  lr = 0xc0bc2884 (swi_handler+0x254)
> > >           sp = 0xdf29edf8  fp = 0xbffff620
> > > swi_handler() at swi_handler+0x254
> > >           pc = 0xc0bc2884  lr = 0xc0bb2ed0 (swi_exit)
> > >           sp = 0xdf29ee60  fp = 0xbffff620
> > >           r4 = 0x00020290  r5 = 0x2085e8e0
> > >           r6 = 0x00020284  r7 = 0x00000088
> > >           r8 = 0x00000001
> > > swi_exit() at swi_exit
> > >           pc = 0xc0bb2ed0  lr = 0xc0bb2ed0 (swi_exit)
> > >           sp = 0xdf29ee60  fp = 0xbffff620
> > > Unable to unwind further
> > > 
> > > 
> > > Unfortunately dumping the kernel core also panicked.
> > > db> dump
> > > Physical memory: 507 MB
> > > Dumping 74 MB: 71 67 63
> > > vm_fault(0xc4147000, 0, 1, 0) -> 0
> > > Fatal kernel mode data abort: 'Translation Fault (P)'
> > > trapframe: 0xdf29e0b8
> > > FSR=00000017, FAR=00000014, spsr=a00000d3
> > > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004
> > > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c
> > > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a
> > > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060
> > > 
> > > panic: Fatal abort
> > > Uptime: 3d18h30m32s
> > > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock


From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 15:27:57 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 1C06A299
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 15:27:57 +0000 (UTC)
Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35])
 by mx1.freebsd.org (Postfix) with ESMTP id D4A722E3
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 15:27:56 +0000 (UTC)
Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534)
 id 6C2AD20E7088C; Sun, 26 Oct 2014 15:27:54 +0000 (UTC)
Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk
 [82.69.141.170])
 by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 6072020E70886
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 15:27:54 +0000 (UTC)
Message-ID: <544D137A.7010006@multiplay.co.uk>
Date: Sun, 26 Oct 2014 15:30:02 +0000
From: Steven Hartland <killing@multiplay.co.uk>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
In-Reply-To: <544CC4D4.7040203@FreeBSD.org>
Content-Type: multipart/mixed; boundary="------------080209070109080503070307"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 15:27:57 -0000

This is a multi-part message in MIME format.
--------------080209070109080503070307
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit


On 26/10/2014 09:54, Guido Falsi wrote:
> On 10/25/14 18:02, Guido Falsi wrote:
>> On 10/25/14 17:02, Guido Falsi wrote:
>>> On 10/24/14 15:26, Guido Falsi wrote:
>>>> Hi,
>>>>
>>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware
>>>> using NanoBSD.
>>>>
>>>> By mounting and umounting UFS filesystems I have seen umount constantly
>>>> hanging hard in a deadlock. I have tested on two boards with two
>>>> distinct compactflash disks with same results. This was not happening
>>>> with 10.0-RELEASE.
>>>>
>>>> I have built a 10.1-RC3 kernel with full debugging and caused the
>>>> problem to happen; I got this:
>>>>
>>>> root@qtest:~ [0]# umount /cfg
>>>> panic: detach with active requests
>>>> KDB: stack backtrace:
> [...]
>> I must admit I am out of ideas.
>>
> I bisected commits and finally found out this happens starting with
> r268815, which MFCed r268205.
>
> It is related to TRIM support; in fact, disabling TRIM on the filesystem
> "fixes" it.
>
> I filed bug #194606 on bugzilla [1] to further track this issue, if
> anyone is interested.
>
> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
Nice work Guido, can you try the attached patch and see if that fixes it 
please?

     Regards
     Steve


--------------080209070109080503070307
Content-Type: text/plain; charset=windows-1252;
 name="cf_erase.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="cf_erase.patch"

Index: sys/cam/ata/ata_da.c
===================================================================
--- sys/cam/ata/ata_da.c	(revision 273157)
+++ sys/cam/ata/ata_da.c	(working copy)
@@ -1470,6 +1470,8 @@ ada_cfaerase(struct ada_softc *softc, struct bio *
 	uint64_t lba = bp->bio_pblkno;
 	uint16_t count = bp->bio_bcount / softc->params.secsize;
 
+	bioq_remove(&softc->trim_queue, bp);
+
 	cam_fill_ataio(ataio,
 	    ada_retry_count,
 	    adadone,

--------------080209070109080503070307--

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 15:59:16 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8DA317EB
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 15:59:16 +0000 (UTC)
Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 44569790
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 15:59:15 +0000 (UTC)
Received: from mail (mail [192.168.254.3])
 by mail.madpilot.net (Postfix) with ESMTP id 3jQkPY0nLzzb36;
 Sun, 26 Oct 2014 16:58:57 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h=
 content-transfer-encoding:content-type:content-type:in-reply-to
 :references:subject:subject:mime-version:user-agent:from:from
 :date:date:message-id:received:received; s=mail; t=1414339134;
 x=1416153535; bh=OWbnC8HDhWImIe3eyBf+by0RshfiKWfuCIusxIAmMxU=; b=
 jZ3pKhojbQmjAdD8LWDNj+GVNXX/eQNVjCYX0EdJIrFbc1hJEa0zF5kjKPcCrFZp
 nIDURjREddDEytZ9yP1sr5KSsvS9xsqUtMbXMlPzPWwGNiaowmMJ46Fs+6NcowOZ
 jQDs+mwq5UgHi8ztuMfoDRCvoJbdTmURe0b92ngAJ38=
Received: from mail.madpilot.net ([192.168.254.3])
 by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024)
 with ESMTP id boi_Cr3THAeX; Sun, 26 Oct 2014 16:58:54 +0100 (CET)
Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206])
 by mail.madpilot.net (Postfix) with ESMTPSA;
 Sun, 26 Oct 2014 16:58:54 +0100 (CET)
Message-ID: <544D1A3E.5000000@madpilot.net>
Date: Sun, 26 Oct 2014 16:58:54 +0100
From: Guido Falsi <mad@madpilot.net>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>, freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
 <544D137A.7010006@multiplay.co.uk>
In-Reply-To: <544D137A.7010006@multiplay.co.uk>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 15:59:16 -0000

On 10/26/14 16:30, Steven Hartland wrote:
> 
> On 26/10/2014 09:54, Guido Falsi wrote:
>> On 10/25/14 18:02, Guido Falsi wrote:
>>> On 10/25/14 17:02, Guido Falsi wrote:
>>>> On 10/24/14 15:26, Guido Falsi wrote:
>>>>> Hi,
>>>>>
>>>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware
>>>>> using NanoBSD.
>>>>>
>>>>> By mounting and umounting UFS filesystems I have seen umount
>>>>> constantly
>>>>> hanging hard in a deadlock. I have tested on two boards with two
>>>>> distinct compactflash disks with same results. This was not happening
>>>>> with 10.0-RELEASE.
>>>>>
>>>>> I have built a 10.1-RC3 kernel with full debugging and caused the
>>>>> problem to happen; I got this:
>>>>>
>>>>> root@qtest:~ [0]# umount /cfg
>>>>> panic: detach with active requests
>>>>> KDB: stack backtrace:
>> [...]
>>> I must admit I am out of ideas.
>>>
>> I bisected commits and finally found out this happens starting with
>> r268815, which MFCed r268205.
>>
>> It is related to TRIM support; in fact, disabling TRIM on the filesystem
>> "fixes" it.
>>
>> I filed bug #194606 on bugzilla [1] to further track this issue, if
>> anyone is interested.
>>
>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
> Nice work Guido, can you try the attached patch and see if that fixes it
> please?

Sure, I'll report back ASAP

-- 
Guido Falsi <mad@madpilot.net>

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 17:24:52 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 7C7E98A3
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:24:52 +0000 (UTC)
Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35])
 by mx1.freebsd.org (Postfix) with ESMTP id 4006AED2
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:24:51 +0000 (UTC)
Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534)
 id 1095720E7088C; Sun, 26 Oct 2014 17:24:50 +0000 (UTC)
Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk
 [82.69.141.170])
 by smtp1.multiplay.co.uk (Postfix) with ESMTPS id F313C20E70886;
 Sun, 26 Oct 2014 17:24:49 +0000 (UTC)
Message-ID: <544D2EE4.6010809@multiplay.co.uk>
Date: Sun, 26 Oct 2014 17:27:00 +0000
From: Steven Hartland <killing@multiplay.co.uk>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Guido Falsi <mad@madpilot.net>, freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
 <544D137A.7010006@multiplay.co.uk> <544D1A3E.5000000@madpilot.net>
In-Reply-To: <544D1A3E.5000000@madpilot.net>
Content-Type: multipart/mixed; boundary="------------020309030106010303040409"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 17:24:52 -0000

This is a multi-part message in MIME format.
--------------020309030106010303040409
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit


On 26/10/2014 15:58, Guido Falsi wrote:
>
>>> I bisected commits and finally found out this happens starting with
>>> r268815, which MFCed r268205.
>>>
>>> It is related to TRIM support; in fact, disabling TRIM on the filesystem
>>> "fixes" it.
>>>
>>> I filed bug #194606 on bugzilla [1] to further track this issue, if
>>> anyone is interested.
>>>
>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
>> Nice work Guido, can you try the attached patch and see if that fixes it
>> please?
> Sure, I'll report back ASAP

Actually, it looks like the fix requires more changes than I first thought;
updated patch attached.

     Regards
     Steve

--------------020309030106010303040409
Content-Type: text/plain; charset=windows-1252;
 name="cf_erase.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="cf_erase.patch"

Index: sys/cam/ata/ata_da.c
===================================================================
--- sys/cam/ata/ata_da.c	(revision 273157)
+++ sys/cam/ata/ata_da.c	(working copy)
@@ -1467,9 +1467,15 @@ ada_dsmtrim(struct ada_softc *softc, struct bio *b
 static void
 ada_cfaerase(struct ada_softc *softc, struct bio *bp, struct ccb_ataio *ataio)
 {
+	struct trim_request *req = &softc->trim_req;
 	uint64_t lba = bp->bio_pblkno;
 	uint16_t count = bp->bio_bcount / softc->params.secsize;
 
+	bzero(req, sizeof(*req));
+	TAILQ_INIT(&req->bps);
+	bioq_remove(&softc->trim_queue, bp);
+	TAILQ_INSERT_TAIL(&req->bps, bp, bio_queue);
+
 	cam_fill_ataio(ataio,
 	    ada_retry_count,
 	    adadone,

--------------020309030106010303040409--

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 17:33:13 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 43D82A1E
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:33:13 +0000 (UTC)
Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0159AFA1
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:33:12 +0000 (UTC)
Received: from mail (mail [192.168.254.3])
 by mail.madpilot.net (Postfix) with ESMTP id 3jQmV50FN6zb38;
 Sun, 26 Oct 2014 18:33:01 +0100 (CET)
Received: from mail.madpilot.net ([192.168.254.3])
 by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024)
 with ESMTP id 6JZKaVttFZbh; Sun, 26 Oct 2014 18:32:45 +0100 (CET)
Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206])
 by mail.madpilot.net (Postfix) with ESMTPSA;
 Sun, 26 Oct 2014 18:32:40 +0100 (CET)
Message-ID: <544D3038.7080901@FreeBSD.org>
Date: Sun, 26 Oct 2014 18:32:40 +0100
From: Guido Falsi <madpilot@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>, freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
 <544D137A.7010006@multiplay.co.uk>
In-Reply-To: <544D137A.7010006@multiplay.co.uk>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 17:33:13 -0000

On 10/26/14 16:30, Steven Hartland wrote:
> 
> On 26/10/2014 09:54, Guido Falsi wrote:
>> On 10/25/14 18:02, Guido Falsi wrote:
>>> On 10/25/14 17:02, Guido Falsi wrote:
>>>> On 10/24/14 15:26, Guido Falsi wrote:
>>>>> Hi,
>>>>>
>>>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware
>>>>> using NanoBSD.
>>>>>
>>>>> By mounting and umounting UFS filesystems I have seen umount
>>>>> constantly
>>>>> hanging hard in a deadlock. I have tested on two boards with two
>>>>> distinct compactflash disks with same results. This was not happening
>>>>> with 10.0-RELEASE.
>>>>>
>>>>> I have built a 10.1-RC3 kernel with full debugging and caused the
>>>>> problem to happen; I got this:
>>>>>
>>>>> root@qtest:~ [0]# umount /cfg
>>>>> panic: detach with active requests
>>>>> KDB: stack backtrace:
>> [...]
>>> I must admit I am out of ideas.
>>>
>> I bisected commits and finally found out this happens starting with
>> r268815, which MFCed r268205.
>>
>> It is related to TRIM support; in fact, disabling TRIM on the filesystem
>> "fixes" it.
>>
>> I filed bug #194606 on bugzilla [1] to further track this issue, if
>> anyone is interested.
>>
>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
> Nice work Guido, can you try the attached patch and see if that fixes it
> please?

It dies the same way with this patch applied. I tested the patch both
on stable/10 at r268815 and on a fresh releng/10.1.

-- 
Guido Falsi <madpilot@FreeBSD.org>

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 17:34:52 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 84AC4AB4;
 Sun, 26 Oct 2014 17:34:52 +0000 (UTC)
Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35])
 by mx1.freebsd.org (Postfix) with ESMTP id 4BA81FB5;
 Sun, 26 Oct 2014 17:34:51 +0000 (UTC)
Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534)
 id 0780B20E7088C; Sun, 26 Oct 2014 17:34:51 +0000 (UTC)
Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk
 [82.69.141.170])
 by smtp1.multiplay.co.uk (Postfix) with ESMTPS id EC21020E70886;
 Sun, 26 Oct 2014 17:34:50 +0000 (UTC)
Message-ID: <544D313E.8090908@multiplay.co.uk>
Date: Sun, 26 Oct 2014 17:37:02 +0000
From: Steven Hartland <killing@multiplay.co.uk>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Guido Falsi <madpilot@FreeBSD.org>, freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
 <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org>
In-Reply-To: <544D3038.7080901@FreeBSD.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 17:34:52 -0000


On 26/10/2014 17:32, Guido Falsi wrote:
> On 10/26/14 16:30, Steven Hartland wrote:
>>
>>> I bisected commits and finally found out this happens starting with
>>> r268815, which MFCed r268205.
>>>
>>> It is related to trim support; in fact, disabling trim on the filesystem
>>> "fixes" it.
>>>
>>> I filed bug #194606 on bugzilla [1] to further track this issue, if
>>> anyone is interested.
>>>
>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
>> Nice work Guido, can you try the attached patch and see if that fixes it
>> please?
> It dies the same way with this patch applied. I tested applying the
> patch both in stable/10 at r268815 and to a fresh releng/10.1.
>
Looks like our mails might have crossed over; was this with the original
patch or the updated one?

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 17:37:59 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 090DFB4C
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:37:59 +0000 (UTC)
Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id BA2DEFD0
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:37:58 +0000 (UTC)
Received: from mail (mail [192.168.254.3])
 by mail.madpilot.net (Postfix) with ESMTP id 3jQmbZ6NYVzb0g;
 Sun, 26 Oct 2014 18:37:46 +0100 (CET)
Received: from mail.madpilot.net ([192.168.254.3])
 by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024)
 with ESMTP id mbCySwbVk4xl; Sun, 26 Oct 2014 18:37:31 +0100 (CET)
Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206])
 by mail.madpilot.net (Postfix) with ESMTPSA;
 Sun, 26 Oct 2014 18:37:26 +0100 (CET)
Message-ID: <544D3156.5030402@FreeBSD.org>
Date: Sun, 26 Oct 2014 18:37:26 +0100
From: Guido Falsi <madpilot@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>, freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
 <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org>
 <544D313E.8090908@multiplay.co.uk>
In-Reply-To: <544D313E.8090908@multiplay.co.uk>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 17:37:59 -0000

On 10/26/14 18:37, Steven Hartland wrote:
> 
> On 26/10/2014 17:32, Guido Falsi wrote:
>> On 10/26/14 16:30, Steven Hartland wrote:
>>>
>>>> I bisected commits and finally found out this happens starting with
>>>> r268815, which MFCed r268205.
>>>>
>>>> It is related to trim support; in fact, disabling trim on the filesystem
>>>> "fixes" it.
>>>>
>>>> I filed bug #194606 on bugzilla [1] to further track this issue, if
>>>> anyone is interested.
>>>>
>>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
>>> Nice work Guido, can you try the attached patch and see if that fixes it
>>> please?
>> It dies the same way with this patch applied. I tested applying the
>> patch both in stable/10 at r268815 and to a fresh releng/10.1.
>>
> Looks like our mails might have crossed over; was this with the original
> patch or the updated one?

Original one. I just saw the new one; I'll follow up shortly.

-- 
Guido Falsi <madpilot@FreeBSD.org>

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 17:59:49 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 46DE983C
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:59:49 +0000 (UTC)
Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 03CE824A
 for <freebsd-fs@freebsd.org>; Sun, 26 Oct 2014 17:59:48 +0000 (UTC)
Received: from mail (mail [192.168.254.3])
 by mail.madpilot.net (Postfix) with ESMTP id 3jQn4n0YgYzb3G;
 Sun, 26 Oct 2014 18:59:37 +0100 (CET)
Received: from mail.madpilot.net ([192.168.254.3])
 by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024)
 with ESMTP id RYh5vvMCiyVB; Sun, 26 Oct 2014 18:59:21 +0100 (CET)
Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206])
 by mail.madpilot.net (Postfix) with ESMTPSA;
 Sun, 26 Oct 2014 18:59:16 +0100 (CET)
Message-ID: <544D3674.3030005@FreeBSD.org>
Date: Sun, 26 Oct 2014 18:59:16 +0100
From: Guido Falsi <madpilot@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>, freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
 <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org>
 <544D313E.8090908@multiplay.co.uk> <544D3156.5030402@FreeBSD.org>
In-Reply-To: <544D3156.5030402@FreeBSD.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 17:59:49 -0000

On 10/26/14 18:37, Guido Falsi wrote:
> On 10/26/14 18:37, Steven Hartland wrote:
>>
>> On 26/10/2014 17:32, Guido Falsi wrote:
>>> On 10/26/14 16:30, Steven Hartland wrote:
>>>>
>>>>> I bisected commits and finally found out this happens starting with
>>>>> r268815, which MFCed r268205.
>>>>>
>>>>> It is related to trim support; in fact, disabling trim on the filesystem
>>>>> "fixes" it.
>>>>>
>>>>> I filed bug #194606 on bugzilla [1] to further track this issue, if
>>>>> anyone is interested.
>>>>>
>>>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
>>>> Nice work Guido, can you try the attached patch and see if that fixes it
>>>> please?
>>> It dies the same way with this patch applied. I tested applying the
>>> patch both in stable/10 at r268815 and to a fresh releng/10.1.
>>>
>> Looks like our mails might have crossed over; was this with the original
>> patch or the updated one?
> 
> Original one. I just saw the new one; I'll follow up shortly.
> 

Tested again with the new patch, against releng/10.1. It fixes the
problem for me; I've been unable to make it crash as before.

Is there a chance to get this one in for 10.1-RELEASE?

Thanks for the patch and time!

-- 
Guido Falsi <madpilot@FreeBSD.org>

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 18:27:00 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A60506C7;
 Sun, 26 Oct 2014 18:27:00 +0000 (UTC)
Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35])
 by mx1.freebsd.org (Postfix) with ESMTP id 6C2767B3;
 Sun, 26 Oct 2014 18:27:00 +0000 (UTC)
Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534)
 id 43F0620E7088C; Sun, 26 Oct 2014 18:26:58 +0000 (UTC)
Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk
 [82.69.141.170])
 by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 348CA20E70886;
 Sun, 26 Oct 2014 18:26:58 +0000 (UTC)
Message-ID: <544D3D77.8000605@multiplay.co.uk>
Date: Sun, 26 Oct 2014 18:29:11 +0000
From: Steven Hartland <killing@multiplay.co.uk>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Guido Falsi <madpilot@FreeBSD.org>, freebsd-fs@freebsd.org
Subject: Re: panic: detach with active requests on 10.1-RC3
References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net>
 <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org>
 <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org>
 <544D313E.8090908@multiplay.co.uk> <544D3156.5030402@FreeBSD.org>
 <544D3674.3030005@FreeBSD.org>
In-Reply-To: <544D3674.3030005@FreeBSD.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 18:27:00 -0000


On 26/10/2014 17:59, Guido Falsi wrote:
> Tested again with new patch, against releng/10.1. Fixes it for me, 
> I've been unable to make it crash again as before. Is there a chance 
> to get this one in for 10.1-RELEASE? Thanks for the patch and time! 

Thanks for testing, Guido. I have already informed re@ that this is a
potential blocker, so yes, I'm looking to get this in for 10.1.

     Regards
     Steve

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 26 21:00:11 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9E02CE6
 for <freebsd-fs@FreeBSD.org>; Sun, 26 Oct 2014 21:00:11 +0000 (UTC)
Received: from kenobi.freebsd.org (kenobi.freebsd.org
 [IPv6:2001:1900:2254:206a::16:76])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 75B136C3
 for <freebsd-fs@FreeBSD.org>; Sun, 26 Oct 2014 21:00:11 +0000 (UTC)
Received: from bugs.freebsd.org ([127.0.1.118])
 by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id s9QL0B8E080271
 for <freebsd-fs@FreeBSD.org>; Sun, 26 Oct 2014 21:00:11 GMT
 (envelope-from bugzilla-noreply@FreeBSD.org)
Message-Id: <201410262100.s9QL0B8E080271@kenobi.freebsd.org>
From: bugzilla-noreply@FreeBSD.org
To: freebsd-fs@FreeBSD.org
Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
Date: Sun, 26 Oct 2014 21:00:11 +0000
Content-Type: text/plain; charset="UTF-8"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 21:00:11 -0000

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status          |    Bug Id | Description
----------------+-----------+-------------------------------------------------
Needs MFC       |    136470 | [nfs] Cannot mount / in read-only, over NFS     
Needs MFC       |    139651 | [nfs] mount(8): read-only remount of NFS volume 
Needs MFC       |    144447 | [zfs] sharenfs fsunshare() & fsshare_main() non 

3 problems total for which you should take action.

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 27 00:22:45 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 51CBD8BA;
 Mon, 27 Oct 2014 00:22:45 +0000 (UTC)
Received: from mail.jrv.org (adsl-70-243-84-11.dsl.austtx.swbell.net
 [70.243.84.11]) by mx1.freebsd.org (Postfix) with ESMTP id 08876B34;
 Mon, 27 Oct 2014 00:22:44 +0000 (UTC)
Received: from localhost (localhost.localdomain [127.0.0.1])
 by mail.jrv.org (Postfix) with ESMTP id D2B0F1B6C41;
 Sun, 26 Oct 2014 19:22:36 -0500 (CDT)
Received: from mail.jrv.org ([127.0.0.1])
 by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id Jy30G9Y-t3Az; Sun, 26 Oct 2014 19:22:26 -0500 (CDT)
Received: from localhost (localhost.localdomain [127.0.0.1])
 by mail.jrv.org (Postfix) with ESMTP id D79A11B6C3C;
 Sun, 26 Oct 2014 19:22:26 -0500 (CDT)
X-Virus-Scanned: amavisd-new at zimbra64.housenet.jrv
Received: from mail.jrv.org ([127.0.0.1])
 by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id Dh7RDba9VwES; Sun, 26 Oct 2014 19:22:26 -0500 (CDT)
Received: from [192.168.138.128] (BMX.housenet.jrv [192.168.3.140])
 by mail.jrv.org (Postfix) with ESMTPSA id B50751B6C39;
 Sun, 26 Oct 2014 19:22:26 -0500 (CDT)
Message-ID: <544D9056.10805@jrv.org>
Date: Sun, 26 Oct 2014 18:22:46 -0600
From: "James R. Van Artsdalen" <james-freebsd-fs2@jrv.org>
User-Agent: Mozilla/5.0 (Windows NT 5.0;
 rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: "James R. Van Artsdalen" <james-freebsd-current@jrv.org>
Subject: Re: zfs recv hangs in kmem arena
References: <54250AE9.6070609@jrv.org> <543FAB3C.4090503@jrv.org>
In-Reply-To: <543FAB3C.4090503@jrv.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, current@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 00:22:45 -0000

I was able to complete a ZFS replication by manually intervening each
time "zfs recv" blocked on "kmem arena": running the program at the end
was sufficient to unblock zfs each of the 17 times it stalled.

The program is intended to consume about 24GB RAM out of 32GB physical
RAM, thereby pressuring the ARC and kernel cache to shrink: when the
program exits it would leave plenty of free RAM for zfs or whatever
else.  What actually happened is that every time, zfs unblocked as the
program below was growing: it was never necessary to wait for the
program to exit and free memory before zfs unblocked.

On 10/16/2014 6:25 AM, James R. Van Artsdalen wrote:
> The zfs recv / kmem arena hang happens with -CURRENT as well as
> 10-STABLE, on two different systems, with 16GB or 32GB of RAM, from
> memstick or normal multi-user environments.
>
> Hangs usually seem to happen 1TB to 3TB in, but last night one run hung
> after only 4.35MB.
>
> On 9/26/2014 1:42 AM, James R. Van Artsdalen wrote:
>> FreeBSD BLACKIE.housenet.jrv 10.1-BETA2 FreeBSD 10.1-BETA2 #2 r272070M:
>> Wed Sep 24 17:36:56 CDT 2014    
>> james@BLACKIE.housenet.jrv:/usr/obj/usr/src/sys/GENERIC  amd64
>>
>> With current STABLE10 I am unable to replicate a ZFS pool using zfs
>> send/recv without zfs hanging in state "kmem arena", within the first
>> 4TB or so (of a 23TB Pool).
>>
>> The most recent attempt used this command line
>>
>> SUPERTEX:/root# zfs send -R BIGTEX/UNIX@syssnap | ssh BLACKIE zfs recv -duvF BIGTOX
>>
>> though local replications fail in kmem arena too.
>>
>> The two machines I've been attempting this on have 16GB and 32GB of RAM
>> each and are otherwise idle.
>>
>> Any suggestions on how to get around, or investigate, "kmem arena"?
>>
>> # top
>> last pid:  3272;  load averages:  0.22,  0.22,  0.23                  up 0+08:25:02  01:32:07
>> 34 processes:  1 running, 33 sleeping
>> CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
>> Mem: 21M Active, 82M Inact, 15G Wired, 28M Cache, 450M Free
>> ARC: 12G Total, 24M MFU, 12G MRU, 23M Anon, 216M Header, 47M Other
>> Swap: 16G Total, 16G Free
>>
>>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>>  1173 root          1  52    0 86476K  7780K select  0 124:33   0.00% sshd
>>  1176 root          1  46    0 87276K 47732K kmem a  3  48:36   0.00% zfs
>>   968 root         32  20    0 12344K  1888K rpcsvc  0   0:13   0.00% nfsd
>>  1009 root          1  20    0 25452K  2864K select  3   0:01   0.00% ntpd
>> ...

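#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Size of each allocation: just under 4GB (assumes a 64-bit size_t, as on amd64). */
static const size_t s = ((size_t) 1 << 32) - 65;

int
main(void)
{
  char *p;
  int i;

  /*
   * Allocate and touch six chunks (about 24GB in total).  The memory is
   * deliberately never freed; it is released when the program exits.
   */
  for (i = 0; i < 6; i++) {
    p = calloc (s, 1);
    if (p == NULL) {
      perror ("calloc");
      return 1;
    }
    memset (p, 1, s);
  }
  return 0;
}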


From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 27 08:00:10 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9749D197
 for <freebsd-fs@FreeBSD.org>; Mon, 27 Oct 2014 08:00:10 +0000 (UTC)
Received: from kenobi.freebsd.org (kenobi.freebsd.org
 [IPv6:2001:1900:2254:206a::16:76])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 850C2A69
 for <freebsd-fs@FreeBSD.org>; Mon, 27 Oct 2014 08:00:10 +0000 (UTC)
Received: from bugs.freebsd.org ([127.0.1.118])
 by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id s9R80AjV070626
 for <freebsd-fs@FreeBSD.org>; Mon, 27 Oct 2014 08:00:10 GMT
 (envelope-from bugzilla-noreply@freebsd.org)
Message-Id: <201410270800.s9R80AjV070626@kenobi.freebsd.org>
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [FreeBSD Bugzilla] Commit Needs MFC
MIME-Version: 1.0
X-Bugzilla-Type: whine
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
Date: Mon, 27 Oct 2014 08:00:10 +0000
Content-Type: text/plain
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 08:00:10 -0000

Hi,

You have a bug in the "Needs MFC" state which has not been touched in 7 or more days. This email serves as a reminder that you may want to MFC this bug or mark it as completed.

In the event you have a longer MFC timeout you may update this bug with a comment and I won't remind you again for 7 days.

This reminder is only sent on Mondays.  Please file a bug about concerns you may have.

  This search was scheduled by eadler@FreeBSD.org.


 (3 bugs)

Bug 136470:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=136470
    Severity: Affects Only Me
    Priority: Normal
    Hardware: Any
    Assignee: freebsd-fs@FreeBSD.org
      Status: Needs MFC
  Resolution: 
     Summary: [nfs] Cannot mount / in read-only, over NFS
Bug 139651:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=139651
    Severity: Affects Only Me
    Priority: Normal
    Hardware: Any
    Assignee: freebsd-fs@FreeBSD.org
      Status: Needs MFC
  Resolution: 
     Summary: [nfs] mount(8): read-only remount of NFS volume does not work
Bug 144447:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144447
    Severity: Affects Only Me
    Priority: Normal
    Hardware: Any
    Assignee: freebsd-fs@FreeBSD.org
      Status: Needs MFC
  Resolution: 
     Summary: [zfs] sharenfs fsunshare() & fsshare_main() non functional


From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 27 15:16:12 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2858EFE3
 for <freebsd-fs@freebsd.org>; Mon, 27 Oct 2014 15:16:12 +0000 (UTC)
Received: from mail1.sandvine.com (Mail1.sandvine.com [64.7.137.134])
 by mx1.freebsd.org (Postfix) with ESMTP id BEFD1382
 for <freebsd-fs@freebsd.org>; Mon, 27 Oct 2014 15:16:11 +0000 (UTC)
Received: from WTL-EXCHP-1.sandvine.com ([fe80::ac6b:cc1e:f2ff:93aa]) by
 wtl-exchp-2.sandvine.com ([fe80::68ac:f071:19ff:3455%19]) with mapi id
 14.03.0195.001; Mon, 27 Oct 2014 11:15:01 -0400
From: Adam Parco <aparco@sandvine.com>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: panic: devfs_fsync: vop_stdfsync failed.
Thread-Topic: panic: devfs_fsync: vop_stdfsync failed.
Thread-Index: Ac/x88uhWWQKkD58QMu09zdrpmOZwQ==
Date: Mon, 27 Oct 2014 15:15:00 +0000
Message-ID: <E24EA62D164B85409C90C1D04F35083918311488@wtl-exchp-1.sandvine.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [192.168.200.58]
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 15:16:12 -0000

Hello,

I am investigating making FreeBSD 8.2 more resilient to removing a USB
device during a write.  In my short testing I have gotten 3 different
failures.  I would like to discuss potential solutions for the first
failure.  I looked at head source and there don't appear to be any
fixes for this particular issue.

>From what I have gathered, it looks like:

- the filesystem synchronizer daemon wakes up
- tries to sync vnodes
- devfs_fsync realizes the device went away with dirty set, warns of
  data loss:
    o "Device da0s1 went missing before all of the data could be written to it; expect data loss."
- vop_stdfsync loops a couple of times trying to handle the dirty items
- eventually gives up
    o "fsync: giving up on dirty"
- dirty count will still be >0 and errno should be EAGAIN (not 0)
- panic
    o "devfs_fsync: vop_stdfsync failed."

When we give up on dirty, should we be clearing the dirty count?
Otherwise we will always panic after this.  Thoughts?  Other
suggestions?

Thanks,
Adam.

bt:
#0  doadump () at pcpu.h:224
#1  0xffffffff8045fbe9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:508
#2  0xffffffff8046011d in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:775
#3  0xffffffff803ec3db in devfs_fsync (ap=0xffffff862b3edb20) at /usr/src/sys/fs/devfs/devfs_vnops.c:569
#4  0xffffffff8069c8ca in VOP_FSYNC_APV (vop=0xffffffff80892fa0, a=0xffffff862b3edb20) at vnode_if.c:1267
#5  0xffffffff804e3e27 in sync_vnode (slp=0xffffff0006136af0, bo=0xffffff862b3edbc0, td=0xffffff0006005480) at vnode_if.h:549
#6  0xffffffff804e406d in sched_sync () at /usr/src/sys/kern/vfs_subr.c:1841
#7  0xffffffff80438ed2 in fork_exit (callout=0xffffffff804e3ec0 <sched_sync>, arg=0x0, frame=0xffffff862b3edc50) at /usr/src/sys/kern/kern_fork.c:847
#8  0xffffffff80633dbe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:599

console:
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <Kingston DataTraveler G3 PMAP> Removable Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 7441MB (15240576 512 byte sectors: 255H 63S/T 948C)
[-- MARK -- Fri Oct 24 13:39:00 2014]
ugen1.3: <Kingston> at usbus1 (disconnected)
umass0: at uhub3, port 1, addr 3 (disconnected)
(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=50139136, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50204672, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50270208, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50335744, length=65536)](da0:umass-sim0:0:0:0): lost device error = 5
g_vfs_done():da0s1[WRITE(offset=50401280, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50466816, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50532352, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50597888, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50663424, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50728960, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50794496, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50860032, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50925568, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50991104, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50073600, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=51056640, length=65536)]error = 6
g_vfs_done():da0s1[WRITE(offset=51122176, length=65536)]error = 6
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
(da0:umass-sim0:0:0:0): removing device entry
g_vfs_done():[unknown][WRITE(offset=51187712, length=65536)]error = 6
g_vfs_done():[unknown][WRITE(offset=51253248, length=65536)]error = 6
g_vfs_done():[unknown][WRITE(offset=51318784, length=65536)]error = 6
Device da0s1 went missing before all of the data could be written to it; expect data loss.
fsync: giving up on dirty
0xffffff08c7eec1d8: tag devfs, type VCHR
    usecount 1, writecount 0, refcount 4 mountedhere 0xffffff0109ddf400
    flags (VI_DOOMED)
    v_object 0xffffff08c7ec7438 ref 0 pages 2169
   lock type devfs: EXCL by thread 0xffffff0adf0b6900 (pid 64960)
                dev da0s1
panic: devfs_fsync: vop_stdfsync failed.
pci: 1752 correctable, 0 uncorrectable, 0 fatal
cpu: 2 correctable, 0 uncorrectable, 0 fatal
cpuid = 4
curthread = getty/getty (64960/100629)
cpu_ticks = 19048675905756
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff801e51ca = db_trace_self_wrapper+0x2a
panic() at 0xffffffff80460148 = panic+0x228
devfs_fsync() at 0xffffffff803ec3db = devfs_fsync+0x8b
VOP_FSYNC_APV() at 0xffffffff8069c8ca = VOP_FSYNC_APV+0x4a
bufsync() at 0xffffffff804cbcd8 = bufsync+0x38
bufobj_invalbuf() at 0xffffffff804e1997 = bufobj_invalbuf+0x87
vgonel() at 0xffffffff804e1c56 = vgonel+0xb6
vgone() at 0xffffffff804e1e89 = vgone+0x39
devfs_delete() at 0xffffffff803eab89 = devfs_delete+0x189
devfs_populate_loop() at 0xffffffff803eb37d = devfs_populate_loop+0x3ad
devfs_populate() at 0xffffffff803eb461 = devfs_populate+0x21
devfs_lookup() at 0xffffffff803eeb94 = devfs_lookup+0x2d4
VOP_LOOKUP_APV() at 0xffffffff8069e9bc = VOP_LOOKUP_APV+0x4c
lookup() at 0xffffffff804d7eea = lookup+0x37a
namei() at 0xffffffff804d8cff = namei+0x3bf
vn_open_cred() at 0xffffffff804ed583 = vn_open_cred+0x1e3
kern_openat() at 0xffffffff804eaab9 = kern_openat+0x149
syscallenter() at 0xffffffff8049bad4 = syscallenter+0x104
syscall() at 0xffffffff8064a15c = syscall+0x4c
Xfast_syscall() at 0xffffffff80633b52 = Xfast_syscall+0xe2
--- syscall (5, FreeBSD ELF64, open), rip = 0x300845cfc, rsp = 0x7fffffffecd8, rbp = 0x5086a0 ---
Uptime: 2h26m42s
Physical memory: 49040 MB
Dumping 1934 MB: 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071[-- MARK -- Fri Oct 24 13:40:00 2014]
1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
Dump complete

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 27 18:34:12 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 122D8287;
 Mon, 27 Oct 2014 18:34:12 +0000 (UTC)
Received: from mail-vc0-x235.google.com (mail-vc0-x235.google.com
 [IPv6:2607:f8b0:400c:c03::235])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id AE94CF62;
 Mon, 27 Oct 2014 18:34:11 +0000 (UTC)
Received: by mail-vc0-f181.google.com with SMTP id hy10so1404492vcb.26
 for <multiple recipients>; Mon, 27 Oct 2014 11:34:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=01sPY+OMHWg/Up+AutcbW5dtl/bmfoCU5izse5Na2L4=;
 b=C9We9w5vkS5uLO0ySoNiB2CG8XkoxUrgnZ5NaW7aA56efwvawZy7z62OyXrJqrnwk7
 f2+WaY50mgWrq79PQxyaQR6DtFQL7AER5BatYuaFNsRLLRM5n0m+65L+Nbbl3RPJP9Zb
 Ogz9s0gfwcY7EnEHZDqg5QBcsQYUjyFgYlsTV1VoR9IB4I/EDy6MzpD5q9iVfPdNKs3D
 1yH+kdFvZrnqW3kMvzXtCW3VIAt0MudZ9I1feB2LmEt4rIpge+Q7fIJYli4Ok/AqJTzL
 YUGVaA4dhULu9/ntYiSxzizNbb+/9bnu8gy0S6Em15BddWfe2fnq694ca2hlJmKS4Vc7
 u0ZA==
MIME-Version: 1.0
X-Received: by 10.220.213.197 with SMTP id gx5mr1326433vcb.51.1414434850001;
 Mon, 27 Oct 2014 11:34:10 -0700 (PDT)
Received: by 10.220.118.73 with HTTP; Mon, 27 Oct 2014 11:34:09 -0700 (PDT)
In-Reply-To: <CACpH0Md8f1dAqUvgAMnKN+iZbWmL2ANXuwj7xDqkiGcHaiS9jg@mail.gmail.com>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
 <544B12B8.8060302@freebsd.org>
 <CACpH0Md8f1dAqUvgAMnKN+iZbWmL2ANXuwj7xDqkiGcHaiS9jg@mail.gmail.com>
Date: Mon, 27 Oct 2014 14:34:09 -0400
Message-ID: <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
Subject: Re: ZFS errors on the array but not the disk.
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: Steven Hartland <smh@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: freebsd-fs <freebsd-fs@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 18:34:12 -0000

Ok... This is just frustrating.  I've trusted ZFS through many versions ...
and pretty much ... it's delivered.  There are five symptoms here:

1. After each reboot, the resilver starts again... even if I complete a
full scrub after the resilver.

2. Seemingly random objects (files, zvols or snapshot items) get marked as
having errors.  To be clear, by "random" I mean different items each time.

3. None of the drives are showing errors in zpool status, nor are they
chucking errors into dmesg.

4. Errors are being logged against the vdev (only one of the two vdevs) and
the array (half as many as the vdev).

5. The activity light for the recently replaced disk does not "flash"
"with" the others in its vdev during either resilver or scrub.  This last
bit might need some explanation.  I realize that raidz-1 stripes do not
always use all the disks, but "generally" the activity lights of the drives
in a vdev go "together"... In this case, the light of the recently replaced
drive is off much of the time ...

Is there anything I can/should do?  I pulled the new disk, moved its
partitions around (it's larger than the array disks because you can't buy
1.5T drives anymore) and then re-added it... so I've tried that.
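
For anyone who wants to check a pool for the pattern in symptoms 3 and 4
(checksum errors counted on a vdev while every disk under it reads 0), here
is a minimal sketch in Python.  It is only an illustration: the function
names are made up, it assumes raw (unquoted) `zpool status` text with plain
integer counters (not abbreviated ones such as "1.2K"), and the SAMPLE rows
are a shortened copy of the status paste from this thread.

```python
import re

# One row of the "config:" table: indent, name, state, READ, WRITE, CKSUM.
# Assumption: counters are plain integers; rows with abbreviated counters
# (for example "1.2K") are not matched by this sketch.
ROW = re.compile(
    r"^(\s*)(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def parse_config(text):
    """Return (indent, name, cksum) for every device row in the text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((len(m.group(1)), m.group(2), int(m.group(6))))
    return rows


def unattributed_cksum(rows):
    """Names of entries with CKSUM > 0 whose descendants all report 0."""
    flagged = []
    for i, (indent, name, cksum) in enumerate(rows):
        descendants = []
        for child_indent, _, child_cksum in rows[i + 1:]:
            if child_indent <= indent:
                break  # left this entry's subtree
            descendants.append(child_cksum)
        if cksum > 0 and descendants and all(c == 0 for c in descendants):
            flagged.append(name)
    return flagged


SAMPLE = """\
        NAME               STATE     READ WRITE CKSUM
        vr2                ONLINE       0     0    36
          raidz1-0         ONLINE       0     0    72
            label/vr2-d0   ONLINE       0     0     0
            gpt/vr2-d2c    ONLINE       0     0     0  (resilvering)
          raidz1-1         ONLINE       0     0     0
            gpt/vr2-e0     ONLINE       0     0     0
"""

print(unattributed_cksum(parse_config(SAMPLE)))  # prints ['raidz1-0']
```

The pool line (vr2) is not flagged because one of its descendants, raidz1-0,
carries a nonzero count; only the lowest level at which errors are recorded
is reported.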


On Fri, Oct 24, 2014 at 11:47 PM, Zaphod Beeblebrox <zbeeble@gmail.com>
wrote:

> Thanks for the heads up.  I'm following releng/10.1 and 271683 seems to be
> part of that, but a good catch/guess.
>
>
> On Fri, Oct 24, 2014 at 11:02 PM, Steven Hartland <smh@freebsd.org> wrote:
>
>> There was an issue which would cause resilver restarts, fixed by *265253*
>> <https://svnweb.freebsd.org/base?view=revision&revision=265253>, which was
>> MFC'ed to stable/10 by *271683*
>> <https://svnweb.freebsd.org/base?view=revision&revision=271683>, so you'll
>> want to make sure you're later than that.
>>
>>
>> On 24/10/2014 19:42, Zaphod Beeblebrox wrote:
>>
>>> I manually replaced a disk... and the array was scrubbed recently.
>>> Interestingly, I seem to be in the "endless loop" of resilvering problem.
>>> Not much I can find on it, but resilvering will complete and I can then
>>> run another scrub.  It will complete, too.  Then rebooting causes another
>>> resilvering.
>>>
>>> Another odd data point: it seems as if the things that show up as "errors"
>>> change from resilvering to resilvering.
>>>
>>> One bug, it would seem, is that once ZFS has detected an error... another
>>> scrub can reset it, but no attempt is made to read-through the error if you
>>> access the object directly.
>>>
>>> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers <asomers@freebsd.org>
>>> wrote:
>>>
>>>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox <zbeeble@gmail.com>
>>>> wrote:
>>>>
>>>>> What does it mean when checksum errors appear on the array (and the vdev)
>>>>> but not on any of the disks?  See the paste below.  One would think that
>>>>> there isn't some ephemeral data stored somewhere that is not one of the
>>>>> disks, yet "cksum" errors show only on the vdev and the array lines.
>>>>>
>>>>> Help?
>>>>>
>>>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
>>>>>    pool: vr2
>>>>>   state: ONLINE
>>>>> status: One or more devices is currently being resilvered.  The pool will
>>>>>          continue to function, possibly in a degraded state.
>>>>> action: Wait for the resilver to complete.
>>>>>    scan: resilver in progress since Thu Oct 23 23:11:29 2014
>>>>>          1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
>>>>>          119G resilvered, 6.79% done
>>>>> config:
>>>>>
>>>>>          NAME               STATE     READ WRITE CKSUM
>>>>>          vr2                ONLINE       0     0    36
>>>>>            raidz1-0         ONLINE       0     0    72
>>>>>              label/vr2-d0   ONLINE       0     0     0
>>>>>              label/vr2-d1   ONLINE       0     0     0
>>>>>              gpt/vr2-d2c    ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>>>>>              gpt/vr2-d3b    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              gpt/vr2-d4a    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              ada14          ONLINE       0     0     0
>>>>>              label/vr2-d6   ONLINE       0     0     0
>>>>>              label/vr2-d7c  ONLINE       0     0     0
>>>>>              label/vr2-d8   ONLINE       0     0     0
>>>>>            raidz1-1         ONLINE       0     0     0
>>>>>              gpt/vr2-e0     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              gpt/vr2-e1     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              gpt/vr2-e2     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              gpt/vr2-e3     ONLINE       0     0     0
>>>>>              gpt/vr2-e4     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              gpt/vr2-e5     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              gpt/vr2-e6     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>              gpt/vr2-e7     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>>>
>>>>> errors: 43 data errors, use '-v' for a list
>>>>>
>>>> The checksum errors will appear on the raidz vdev instead of a leaf if
>>>> vdev_raidz.c can't determine which leaf vdev was responsible.  This
>>>> could happen if two or more leaf vdevs return bad data for the same
>>>> block, which would also lead to unrecoverable data errors.  I see that
>>>> you have some unrecoverable data errors, so maybe that's what happened
>>>> to you.
>>>>
>>>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable
>>>> to determine which child was responsible for a checksum error.
>>>> However, I've only seen that happen when a raidz vdev has a mirror
>>>> child.  That can only happen if the child is a spare or replacing
>>>> vdev.  Did you activate any spares, or did you manually replace a
>>>> vdev?
>>>>
>>>> -Alan
>>>>
>>>>  _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 27 23:13:44 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EED6C9BD
 for <freebsd-fs@freebsd.org>; Mon, 27 Oct 2014 23:13:44 +0000 (UTC)
Received: from mail-wi0-x231.google.com (mail-wi0-x231.google.com
 [IPv6:2a00:1450:400c:c05::231])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 703C730A
 for <freebsd-fs@freebsd.org>; Mon, 27 Oct 2014 23:13:44 +0000 (UTC)
Received: by mail-wi0-f177.google.com with SMTP id ex7so6002wid.16
 for <freebsd-fs@freebsd.org>; Mon, 27 Oct 2014 16:13:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nofocus.org; s=google;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type;
 bh=OCemWUEe5obl8Ac/+eSYevorlVK9O8ERvOttN0sdQu4=;
 b=NwNaZf4Ori+5PJajm3XYnq8piLRbuQN2cPb99bhvvLX74wVp7xYZ4li9SdFY+z8yT7
 MMZyE+jLIBh09k4s0AgOJ1ggR3l61mvYpJZTtW8w4RsmCYajK1aFYUrPjLaLF37r9sVM
 7ksFzkQOKai3HjLksFCszi+U4wMN28v3qYoyI=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-type;
 bh=OCemWUEe5obl8Ac/+eSYevorlVK9O8ERvOttN0sdQu4=;
 b=Ac9jBM8HXcQdR4u66gtyxAgPG82tkKGEbe7u9a0owtrT8ivIKZWQ3WHr6/ubmWo0yO
 B+yEqlDgeorNWs5jopBvn18HlNJ4exgB+J5LkdYA99L4wdPSHgeijM7x9ooNdsAEPCQs
 z1/8UAu8Uq83z57oiahIqq4/zbc3uP9Wvd/4ZKMdTfIj/gc7P+sBIGhKrsl2loMnU0YC
 X5eS2os7oojElgtICLpgcrj2l+rcHXU8CHHp4LiGItqxNyQmwMxmGGqLGFoLlsYEdwgp
 Inb0V0JKR3gc+70vNJ+pBAwVoBET8wDIEp7Td3Hnb8K72jjMl5gdarvHY7/DWcvNECma
 lRjQ==
X-Gm-Message-State: ALoCoQkaOpCm/6JSedzhp75ZTg9n+djAYXjvFCzg5tcwhGoCHFA6GpVePYRt1+qBPEyYirP1IqdT
X-Received: by 10.194.58.205 with SMTP id t13mr24834934wjq.55.1414451622405;
 Mon, 27 Oct 2014 16:13:42 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.180.103.10 with HTTP; Mon, 27 Oct 2014 16:13:22 -0700 (PDT)
In-Reply-To: <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
 <544B12B8.8060302@freebsd.org>
 <CACpH0Md8f1dAqUvgAMnKN+iZbWmL2ANXuwj7xDqkiGcHaiS9jg@mail.gmail.com>
 <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
From: Robert Banz <rob@nofocus.org>
Date: Mon, 27 Oct 2014 16:13:22 -0700
Message-ID: <CA+-fWwBgh-mzKFRVhtddZVZz9j8T2fh-M-gpgR+4XmchbW8W1A@mail.gmail.com>
Subject: Re: ZFS errors on the array but not the disk.
To: Zaphod Beeblebrox <zbeeble@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: freebsd-fs <freebsd-fs@freebsd.org>, Steven Hartland <smh@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 23:13:45 -0000

Have you tried different hardware? This screams that something's up
somewhere in the stack -- DRAM, cabling, controller...

On Mon, Oct 27, 2014 at 11:34 AM, Zaphod Beeblebrox <zbeeble@gmail.com>
wrote:

> Ok... This is just frustrating.  I've trusted ZFS through many versions ...
> and pretty much ... it's delivered.  There are five symptoms here:
>
> 1. After each reboot, the resilver starts again... even if I complete a
> full scrub after the resilver.
>
> 2. Seemingly random objects (files, zvols or snapshot items) get marked as
> having errors.  To be clear, by "random" I mean different items each time.
>
> 3. None of the drives are showing errors in zpool status, nor are they
> chucking errors into dmesg.
>
> 4. Errors are being logged against the vdev (only one of the two vdevs) and
> the array (half as many as the vdev).
>
> 5. The activity light for the recently replaced disk does not "flash"
> "with" the others in its vdev during either resilver or scrub.  This last
> bit might need some explanation.  I realize that raidz-1 stripes do not
> always use all the disks, but "generally" the activity lights of the drives
> in a vdev go "together"... In this case, the light of the recently replaced
> drive is off much of the time ...
>
> Is there anything I can/should do?  I pulled the new disk, moved its
> partitions around (it's larger than the array disks because you can't buy
> 1.5T drives anymore) and then re-added it... so I've tried that.

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 28 01:47:45 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D9420EF8;
 Tue, 28 Oct 2014 01:47:45 +0000 (UTC)
Received: from mail-vc0-x230.google.com (mail-vc0-x230.google.com
 [IPv6:2607:f8b0:400c:c03::230])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7F6D179B;
 Tue, 28 Oct 2014 01:47:45 +0000 (UTC)
Received: by mail-vc0-f176.google.com with SMTP id hq11so3057019vcb.35
 for <multiple recipients>; Mon, 27 Oct 2014 18:47:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=ndICht25sscxy+qgJ3UmAL4sLZbYhvYImwubEfgkFPI=;
 b=kTviLe0dGC7jllfWef4NJ9zaa1SDdszuIJUT6SUKY+nKA9LUyIl2rEnt/rCfpeBhwF
 wNVCC46Y+hav8/lIqCkTZx86nKLCZuj1hnxJw4GhwlqFIuuE1wWLqwlenLhnwkpdgrtV
 CETBpBPjNs44leKczdwFVAK6NsW9mIFf0tj+HnVTDtXAZKajvMqArTDzmk7RLnCrDAAf
 yAd88Fm9c+UmZx/CafC8JxIYYxyUqDSKrqnxO20QJ+ZrkhdY5BAukyTQ+d1moFLiHi98
 3zp4pxTp0ljjJDseTtI0qfk58WjKmdPKb/2TZD1fJh3ipPKzIMBd23BPd0ALip0oRaem
 2jBg==
MIME-Version: 1.0
X-Received: by 10.220.128.4 with SMTP id i4mr104113vcs.32.1414460864414; Mon,
 27 Oct 2014 18:47:44 -0700 (PDT)
Received: by 10.220.118.73 with HTTP; Mon, 27 Oct 2014 18:47:44 -0700 (PDT)
In-Reply-To: <CA+-fWwBgh-mzKFRVhtddZVZz9j8T2fh-M-gpgR+4XmchbW8W1A@mail.gmail.com>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
 <544B12B8.8060302@freebsd.org>
 <CACpH0Md8f1dAqUvgAMnKN+iZbWmL2ANXuwj7xDqkiGcHaiS9jg@mail.gmail.com>
 <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
 <CA+-fWwBgh-mzKFRVhtddZVZz9j8T2fh-M-gpgR+4XmchbW8W1A@mail.gmail.com>
Date: Mon, 27 Oct 2014 21:47:44 -0400
Message-ID: <CACpH0McVeuUGoC45rsK-cwrG0TFd_s=Cj66G7_TX=8a8jNBWQQ@mail.gmail.com>
Subject: Re: ZFS errors on the array but not the disk.
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: Robert Banz <rob@nofocus.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: freebsd-fs <freebsd-fs@freebsd.org>, Steven Hartland <smh@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 01:47:45 -0000

Well... why wouldn't this trigger an error with (say) the checksums on the
devices themselves?  Without throwing an error, why is the vdev
re-resilvering?  I don't have spare hardware to throw at it.  It's otherwise
a sane system.  It can "make -j32 buildworld" without choking.  It can
download several hundred torrents at a time without corrupting them.
Hardly seems like suspect hardware.

On Mon, Oct 27, 2014 at 7:13 PM, Robert Banz <rob@nofocus.org> wrote:

> Have you tried different hardware? This screams that something's up
> somewhere in the stack -- DRAM, cabling, controller...

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 28 02:45:24 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 048EC9A6
 for <freebsd-fs@freebsd.org>; Tue, 28 Oct 2014 02:45:24 +0000 (UTC)
Received: from quine.pinyon.org (quine.pinyon.org [65.101.5.249])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id C8035C96
 for <freebsd-fs@freebsd.org>; Tue, 28 Oct 2014 02:45:23 +0000 (UTC)
Received: by quine.pinyon.org (Postfix, from userid 122)
 id CDCAE1602C2; Mon, 27 Oct 2014 19:45:16 -0700 (MST)
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on quine.pinyon.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
 autolearn=ham autolearn_force=no version=3.4.0
Received: from feyerabend.n1.pinyon.org (feyerabend.n1.pinyon.org [10.0.10.6])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128
 bits)) (No client certificate requested)
 by quine.pinyon.org (Postfix) with ESMTPSA id A4D7E160247
 for <freebsd-fs@freebsd.org>; Mon, 27 Oct 2014 19:45:14 -0700 (MST)
Message-ID: <544F033A.8070808@pinyon.org>
Date: Mon, 27 Oct 2014 19:45:14 -0700
From: "Russell L. Carter" <rcarter@pinyon.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: Re: ZFS errors on the array but not the disk.
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
 <544B12B8.8060302@freebsd.org>
 <CACpH0Md8f1dAqUvgAMnKN+iZbWmL2ANXuwj7xDqkiGcHaiS9jg@mail.gmail.com>
 <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
 <CA+-fWwBgh-mzKFRVhtddZVZz9j8T2fh-M-gpgR+4XmchbW8W1A@mail.gmail.com>
 <CACpH0McVeuUGoC45rsK-cwrG0TFd_s=Cj66G7_TX=8a8jNBWQQ@mail.gmail.com>
In-Reply-To: <CACpH0McVeuUGoC45rsK-cwrG0TFd_s=Cj66G7_TX=8a8jNBWQQ@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 02:45:24 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 10/27/14 18:47, Zaphod Beeblebrox wrote:
> Well... why wouldn't this trigger an error with (say) the checksums
> on the devices themselves?  Without throwing an error, why is the
> vdev re - resilvering?  I don't have spare hardware to throw at it.
> It's otherwise a sane system.  It can "make -j32 buildworld"
> without choking.  It can download several hundred torrents at a
> time without corrupting them. Hardly seems like suspect hardware.

I will just say as a non-zfs expert that I have had several disastrous
raid failures over the last 15 yrs, and a couple that cost me real
money, and it was always hw.

And the reason it was disastrous was I couldn't diagnose it even though
I was a pseudo-expert.  I spent a lot of time under deadlines assuming
the underlying hw was sane.  The software diagnostics were no help.
I trusted the hw then, but no more.  And your reports (thank you) are
a reminder to me to not give too much credence to zpool status.

BR,
Russell


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJUTwM6AAoJEFnLrGVSDFaEkwwP/A6CsMOF0uT/TA6NAOQBeFIW
Byh9ySfYbBg9gUCB7YZFLBqmhGDzV2HCNu58cniYfVwj2Hwrr+GGahUJfagQjT1w
ssNoflihTIBCWcmanXLoD9W0QMpGuyfi556FDzRX4NunAwP+URidqcJuR3tsdCcz
jPYIQLZL6qQO5EfdX+UR9kcBS1st/6oLQ+Y2IPUXlfvg+hUQ660dS+SIfHFc+qcg
lg2fLh3Vz8bJp2BlYJR6/AaxmOGrqA7Ze9hG684vaVSAz8U5EUn4tC76OPAPc1N2
MATat7T8lot0SRI1EqLBp6vsWpYTZK7itPDjyABO6f21iltbtgvPN22Hcr8+wEdQ
AdEK4WLBsTF+xtD9DER1rVsDGIIYbBhw5vfh/7d9/RLrtf0B8rOs6OQNXV+ubjoc
I8W852jbZT1HojLEOqIdC7bzkjEgln7a0miG/VFQPjYiZG9b5juozeOPOStENrrp
ehIvvlxkeJBfJm505oLhXhOgXITC1fABTHeMfCXcbr3zw4OaN/8nHN4L4u2+HI35
2ahiWqwN/i6tF4V74zZDi9djkwuU8e+/qNrndeLotaTmXudY1Ox3wNBYEyYFCmHJ
DIBSUPKqcH3zOICfiO0mmVmPuU4a95HkslRtNy1mPTvNO4+Cpv7iLx28CdZHXWfg
BYb9ymp0bL3HAgHZwamd
=SOhS
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 28 06:03:12 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8A6A7DD9;
 Tue, 28 Oct 2014 06:03:12 +0000 (UTC)
Received: from mail.slu.se (tmgext2-1.slu.se [77.235.224.51])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (Client CN "webmail.slu.se", Issuer "TERENA SSL CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id D861B402;
 Tue, 28 Oct 2014 06:03:10 +0000 (UTC)
Received: from Exchange2-2.slu.se (130.238.96.155) by Tmg2-1.slu.se
 (130.238.96.151) with Microsoft SMTP Server (TLS) id 14.3.210.2; Tue, 28 Oct
 2014 07:03:00 +0100
Received: from Exchange2-1.slu.se ([130.238.96.154]) by exchange2-2
 ([130.238.96.155]) with mapi id 14.03.0210.002; Tue, 28 Oct 2014 07:03:00
 +0100
From: =?utf-8?B?S2FybGkgU2rDtmJlcmc=?= <Karli.Sjoberg@slu.se>
To: "zbeeble@gmail.com" <zbeeble@gmail.com>
Subject: Re: ZFS errors on the array but not the disk.
Thread-Topic: ZFS errors on the array but not the disk.
Thread-Index: AQHP8nTTZaxgLAPs10KP2kE89fC4EQ==
Date: Tue, 28 Oct 2014 06:02:59 +0000
Message-ID: <5F9E965F5A80BC468BE5F40576769F099DF78CC7@exchange2-1>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
 <544B12B8.8060302@freebsd.org>
 <CACpH0Md8f1dAqUvgAMnKN+iZbWmL2ANXuwj7xDqkiGcHaiS9jg@mail.gmail.com>
 <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
In-Reply-To: <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
Accept-Language: sv-SE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [77.235.228.32]
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>,
 "smh@freebsd.org" <smh@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 06:03:12 -0000

On Mon, 2014-10-27 at 14:34 -0400, Zaphod Beeblebrox wrote:
> Ok... This is just frustrating.  I've trusted ZFS through many versions ...
> and pretty much ... it's delivered.  There are five symptoms here:
> 
> 1. after each reboot, resilver starts again... even if after the resilver I
> complete a full scrub.
> 
> 2. seemingly random objects (files, zvols or snapshot items) get marked as
> having errors.  when I say random, to be clear; different items each time.
> 
> 3. none of the drives are showing errors in zpool status, neither are they
> chucking errors into dmesg.
> 
> 4. errors are being logged against the vdev (only one of the two vdevs) and
> the array (half as many as the vdev).
> 
> 5. The activity light for the recently replaced disk does not "flash"
> "with" the others in it's vdev during either resilver or scrub.  This last
> bit might need some explanation. I realize that raidz-1 stripes do not
> always use all the disks, but "generally" the activity lights of the drives
> in a vdev go "together"... In this case, the light of the recently replaced
> drive is off much of the time ...
> 
> Is there anything I can/should do?  I pulled the new disk, moved it's
> partitions around (it's larger than the array disks because you can't buy
> 1.5T drives anymore) and then re-added it... so I've tried that.

Have you tried starting it up from a CD, USB stick, or similar, and importing
the pool from there?

/K

> 
> 
> On Fri, Oct 24, 2014 at 11:47 PM, Zaphod Beeblebrox <zbeeble@gmail.com>
> wrote:
> 
> > Thanks for the heads up.  I'm following releng/10.1 and 271683 seems to be
> > part of that, but a good catch/guess.
> >
> >
> > On Fri, Oct 24, 2014 at 11:02 PM, Steven Hartland <smh@freebsd.org> wrote:
> >
> >> There was an issue which would cause resilver restarts fixed by *265253* <
> >> https://svnweb.freebsd.org/base?view=revision&revision=265253> which was
> >> MFC'ed to stable/10 by *271683* <https://svnweb.freebsd.org/
> >> base?view=revision&revision=271683>so you'll want to make sure your
> >> latter than that.
> >>
> >>
> >> On 24/10/2014 19:42, Zaphod Beeblebrox wrote:
> >>
> >>> I manually replaced a disk... and the array was scrubbed recently.
> >>> Interestingly, I seem to be in the "endless loop"  of resilvering
> >>> problem.
> >>> Not much I can find on it.  but resilvering will complete and I can then
> >>> run another scrub.  It will complete, too.  Then rebooting causes another
> >>> resilvering.
> >>>
> >>> Another odd data point: it seems as if the things that show up as
> >>> "errors"
> >>> change from resilvering to resilvering.
> >>>
> >>> One bug, it would seem, is that once ZFS has detected an error... another
> >>> scrub can reset it, but no attempt is made to read-through the error if
> >>> you
> >>> access the object directly.
> >>>
> >>> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers <asomers@freebsd.org>
> >>> wrote:
> >>>
> >>>  On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox <zbeeble@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> What does it mean when checksum errors appear on the array (and the
> >>>>> vdev)
> >>>>> but not on any of the disks?  See the paste below.  One would think
> >>>>> that
> >>>>> there isn't some ephemeral data stored somewhere that is not one of the
> >>>>> disks, yet "cksum" errors show only on the vdev and the array lines.
> >>>>>
> >>>> Help?
> >>>>
> >>>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
> >>>>>    pool: vr2
> >>>>>   state: ONLINE
> >>>>> status: One or more devices is currently being resilvered.  The pool
> >>>>> will
> >>>>>          continue to function, possibly in a degraded state.
> >>>>> action: Wait for the resilver to complete.
> >>>>>    scan: resilver in progress since Thu Oct 23 23:11:29 2014
> >>>>>          1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
> >>>>>          119G resilvered, 6.79% done
> >>>>> config:
> >>>>>
> >>>>>          NAME               STATE     READ WRITE CKSUM
> >>>>>          vr2                ONLINE       0     0    36
> >>>>>            raidz1-0         ONLINE       0     0    72
> >>>>>              label/vr2-d0   ONLINE       0     0     0
> >>>>>              label/vr2-d1   ONLINE       0     0     0
> >>>>>              gpt/vr2-d2c    ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native  (resilvering)
> >>>>>              gpt/vr2-d3b    ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              gpt/vr2-d4a    ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              ada14          ONLINE       0     0     0
> >>>>>              label/vr2-d6   ONLINE       0     0     0
> >>>>>              label/vr2-d7c  ONLINE       0     0     0
> >>>>>              label/vr2-d8   ONLINE       0     0     0
> >>>>>            raidz1-1         ONLINE       0     0     0
> >>>>>              gpt/vr2-e0     ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              gpt/vr2-e1     ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              gpt/vr2-e2     ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              gpt/vr2-e3     ONLINE       0     0     0
> >>>>>              gpt/vr2-e4     ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              gpt/vr2-e5     ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              gpt/vr2-e6     ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>              gpt/vr2-e7     ONLINE       0     0     0  block size:
> >>>>> 512B
> >>>>> configured, 4096B native
> >>>>>
> >>>>> errors: 43 data errors, use '-v' for a list
> >>>>>
> >>>> The checksum errors will appear on the raidz vdev instead of a leaf if
> >>>> vdev_raidz.c can't determine which leaf vdev was responsible.  This
> >>>> could happen if two or more leaf vdevs return bad data for the same
> >>>> block, which would also lead to unrecoverable data errors.  I see that
> >>>> you have some unrecoverable data errors, so maybe that's what happened
> >>>> to you.
> >>>>
> >>>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable
> >>>> to determine which child was responsible for a checksum error.
> >>>> However, I've only seen that happen when a raidz vdev has a mirror
> >>>> child.  That can only happen if the child is a spare or replacing
> >>>> vdev.  Did you activate any spares, or did you manually replace a
> >>>> vdev?
> >>>>
> >>>> -Alan
> >>>>
> >>>>  _______________________________________________
> >>> freebsd-fs@freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >>>
> >>>
> >>>
> >> _______________________________________________
> >> freebsd-fs@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >>
> >
> >
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"



-- 

Kind regards

-------------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences Box 7079 (Visiting Address
Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone:  +46-(0)18-67 15 66
karli.sjoberg@slu.se

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 28 12:58:20 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B8D6313E
 for <freebsd-fs@freebsd.org>; Tue, 28 Oct 2014 12:58:20 +0000 (UTC)
Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com
 [66.111.4.25])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 873BA838
 for <freebsd-fs@freebsd.org>; Tue, 28 Oct 2014 12:58:20 +0000 (UTC)
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
 by mailout.nyi.internal (Postfix) with ESMTP id 9EBE420AAE
 for <freebsd-fs@freebsd.org>; Tue, 28 Oct 2014 08:58:18 -0400 (EDT)
Received: from web3 ([10.202.2.213])
 by compute1.internal (MEProxy); Tue, 28 Oct 2014 08:58:18 -0400
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=
 messagingengine.com; h=message-id:x-sasl-enc:from:to
 :mime-version:content-transfer-encoding:content-type:subject
 :date:in-reply-to:references; s=smtpout; bh=mVG/QQrU0PC9QMYViVyu
 RKJJu/c=; b=C+Ph150AO57FAdytTzTtWbqYH7vgmvQtDXgUYwSbHEU/xWpVM4if
 GsN4gYYvvpUqM9NGloIZjsyK45eCqPp7wdJ5BXJbuIDJy2fi+Wn+dnJjrvPEwjH5
 04jt/Qu5zT0fli4Ck1srTPu76/h4MCN/rzSdpvs4knkZykREwsmAQI8=
Received: by web3.nyi.internal (Postfix, from userid 99)
 id 6EB22114632; Tue, 28 Oct 2014 08:58:18 -0400 (EDT)
Message-Id: <1414501098.45274.184197353.0847A931@webmail.messagingengine.com>
X-Sasl-Enc: V7cYWP1L0Spleqtflk9h6RdeeT5PnfgbeW9Nhu6jw7wz 1414501098
From: Mark Felder <feld@FreeBSD.org>
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
X-Mailer: MessagingEngine.com Webmail Interface - ajax-c51dec4f
Subject: Re: ZFS errors on the array but not the disk.
Date: Tue, 28 Oct 2014 07:58:18 -0500
In-Reply-To: <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
 <544B12B8.8060302@freebsd.org>
 <CACpH0Md8f1dAqUvgAMnKN+iZbWmL2ANXuwj7xDqkiGcHaiS9jg@mail.gmail.com>
 <CACpH0MdQDi85pvks+E1A2OYRKYXi6CMiXcsL4U1Ud5r_Zw4d8g@mail.gmail.com>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 12:58:20 -0000


On Mon, Oct 27, 2014, at 13:34, Zaphod Beeblebrox wrote:
> 
> Is there anything I can/should do?  I pulled the new disk, moved it's
> partitions around (it's larger than the array disks because you can't buy
> 1.5T drives anymore) and then re-added it... so I've tried that.
> 

Test and/or replace your power supply. You'd be surprised what dropping
voltage (even slightly) can do.

Consider all parts of your system suspect until they've been thoroughly
vetted.

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 29 18:08:37 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 33776A99
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 18:08:37 +0000 (UTC)
Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
 (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id E2E3FEDC
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 18:08:36 +0000 (UTC)
Received: from list by plane.gmane.org with local (Exim 4.69)
 (envelope-from <freebsd-fs@m.gmane.org>) id 1XjXfY-0000zO-CL
 for freebsd-fs@freebsd.org; Wed, 29 Oct 2014 19:08:28 +0100
Received: from jtotz2.cs.ucl.ac.uk ([128.16.6.56])
 by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
 id 1AlnuQ-0007hv-00
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 19:08:28 +0100
Received: from johannes by jtotz2.cs.ucl.ac.uk with local (Gmexim 0.1 (Debian))
 id 1AlnuQ-0007hv-00
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 19:08:28 +0100
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Johannes Totz <johannes@jo-t.de>
Subject: Re: Snapshots and what not to snapshot
Date: Wed, 29 Oct 2014 18:08:17 +0000
Lines: 63
Message-ID: <m2raeh$9ek$2@ger.gmane.org>
References: <alpine.BSF.2.00.1410120128570.6601@woozle.rinet.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: jtotz2.cs.ucl.ac.uk
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
In-Reply-To: <alpine.BSF.2.00.1410120128570.6601@woozle.rinet.ru>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 18:08:37 -0000

On 11/10/2014 22:38, Dmitry Morozovsky wrote:
> Colleagues,
> 
> reading some recent threads, I'm starting to think again about a problem I 
> have thought about many times, but have invented nothing but crude hacks:
> 
> it would be great to have a mechanism to exclude some subtrees from recursive 
> snapshots; the model is like:
> 
> you have some tree of ZFS file systems, like
> 
> pool/path/r
> pool/path/jails
> pool/path/jails/j1
> pool/path/jails/j1/obj
> ..
> pool/path/persistent
> pool/path/obj
> 
> or something alike.
> 
> To have the ability to make consistent backup, one would use ``zfs snapshot 
> -r''
> 
> but -- before using zfs send or other replication mechanisms it would be 
> feasible to remove snapshots of not-so-important filesystems.

Not just remove but exclude from snapshotting in the first place.

> 
> For now, the kludge I could see is to set on these some artificial property 
> like org.freebsd:nodump or similar, then traverse zfs list with this attribute 
> and delete non-needed snapshots.
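
A rough sketch of how that kludge could be scripted, for illustration only.
It assumes the marker property is set to the value "on" on the datasets to
prune, and it takes as input the text printed by
`zfs get -H -o name,value -t filesystem,volume -r org.freebsd:nodump <pool>`
(tab-separated, "-" when the property is unset). The function name and the
marker value are made up here; nothing is executed, the function only builds
the `zfs destroy` command lines.

```python
def destroy_commands(zfs_get_output, snap_name, marker_value="on"):
    """Build `zfs destroy` commands for snapshots of marked datasets.

    zfs_get_output: tab-separated "name<TAB>value" lines, one per dataset.
    snap_name: snapshot name (the part after '@') made by `zfs snapshot -r`.
    marker_value: assumed value of the user property that marks a dataset.
    """
    commands = []
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        dataset, value = line.split("\t", 1)
        if value.strip() == marker_value:
            commands.append(["zfs", "destroy", f"{dataset}@{snap_name}"])
    return commands


if __name__ == "__main__":
    # Hypothetical property listing for the tree in the quoted message.
    sample = "pool/path/r\t-\npool/path/obj\ton\npool/path/jails/j1/obj\ton\n"
    for cmd in destroy_commands(sample, "backup"):
        print(" ".join(cmd))
```

Because user properties are inherited, marking a parent dataset also marks
its children in the recursive `zfs get` listing.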

snapshot -r could inspect a property on each child and skip snapshot
creation if some criteria are fulfilled.

For example:

zfs set org.freebsd:skip_recursive_snapshot=hou.* pool/backup
zfs snapshot -r pool@hourly
zfs snapshot -r pool@house
zfs snapshot -r pool@important

The skip property could be a regex that is matched against the
to-be-created snapshot name. If it matches, no snapshots are created for
that child or, recursively, for any of its children.
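
To make the intended semantics concrete, here is a small sketch of the
selection logic. It is only a model of the proposal, not existing zfs
behaviour: the property org.freebsd:skip_recursive_snapshot does not exist
today, the dict below merely stands in for it, and anchoring the regex to
the whole snapshot name (re.fullmatch) is my assumption.

```python
import re


def datasets_to_snapshot(datasets, skip_patterns, snap_name):
    """Return the datasets a skip-aware `zfs snapshot -r` would cover.

    datasets: dataset names in the tree being snapshotted.
    skip_patterns: {dataset: regex}, standing in for the proposed
        (hypothetical) org.freebsd:skip_recursive_snapshot property.
    snap_name: the to-be-created snapshot name (the part after '@').
    """
    # Datasets whose pattern matches the snapshot name are skipped ...
    skipped_roots = [
        name
        for name, pattern in skip_patterns.items()
        if re.fullmatch(pattern, snap_name)
    ]

    # ... together with everything below them in the ZFS tree.
    def is_skipped(dataset):
        return any(
            dataset == root or dataset.startswith(root + "/")
            for root in skipped_roots
        )

    return [ds for ds in datasets if not is_skipped(ds)]


if __name__ == "__main__":
    tree = ["pool", "pool/backup", "pool/backup/old", "pool/home"]
    skip = {"pool/backup": "hou.*"}
    for snap in ("hourly", "house", "important"):
        print(snap, datasets_to_snapshot(tree, skip, snap))
```

With the property from the example above, pool/backup and its children are
left out of @hourly and @house but included in @important.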

> 
> Maybe somewhere there are more elegant solutions?
> 
> Sincerely,
> D.Marck                                     [DM5020, MCK-RIPE, DM3-RIPN]
> [ FreeBSD committer:                                 marck@FreeBSD.org ]
> ------------------------------------------------------------------------
> *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
> ------------------------------------------------------------------------
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> 


From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 29 19:18:52 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 14CF8D84
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 19:18:52 +0000 (UTC)
Received: from mail.jrv.org (rrcs-24-73-246-106.sw.biz.rr.com [24.73.246.106])
 by mx1.freebsd.org (Postfix) with ESMTP id DBA30900
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 19:18:51 +0000 (UTC)
Received: from localhost (localhost.localdomain [127.0.0.1])
 by mail.jrv.org (Postfix) with ESMTP id 91E811BDBE8;
 Wed, 29 Oct 2014 14:13:05 -0500 (CDT)
Received: from mail.jrv.org ([127.0.0.1])
 by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id Dikx1qKcJVFY; Wed, 29 Oct 2014 14:12:55 -0500 (CDT)
Received: from localhost (localhost.localdomain [127.0.0.1])
 by mail.jrv.org (Postfix) with ESMTP id A90851BDBE1;
 Wed, 29 Oct 2014 14:12:55 -0500 (CDT)
X-Virus-Scanned: amavisd-new at zimbra64.housenet.jrv
Received: from mail.jrv.org ([127.0.0.1])
 by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id GwydYXvSCHxA; Wed, 29 Oct 2014 14:12:55 -0500 (CDT)
Received: from [192.168.138.128] (BMX.housenet.jrv [192.168.3.140])
 by mail.jrv.org (Postfix) with ESMTPSA id 80F371BDBDE;
 Wed, 29 Oct 2014 14:12:55 -0500 (CDT)
Message-ID: <54513C4D.4010203@jrv.org>
Date: Wed, 29 Oct 2014 13:13:17 -0600
From: "James R. Van Artsdalen" <james-freebsd-fs2@jrv.org>
User-Agent: Mozilla/5.0 (Windows NT 5.0;
 rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: Johannes Totz <johannes@jo-t.de>
Subject: Re: Snapshots and what not to snapshot
References: <alpine.BSF.2.00.1410120128570.6601@woozle.rinet.ru>
 <m2raeh$9ek$2@ger.gmane.org>
In-Reply-To: <m2raeh$9ek$2@ger.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 19:18:52 -0000

On 10/29/2014 12:08 PM, Johannes Totz wrote:
> On 11/10/2014 22:38, Dmitry Morozovsky wrote:
>> you have some tree of ZFS file systems, like
>>
>> pool/path/r
>> pool/path/jails
>> pool/path/jails/j1
>> pool/path/jails/j1/obj
>>

Snapshots and ZFS replication are done against the ZFS namespace, not the
unix namespace.  Organize your filesystems in the ZFS tree based on how
you want to replicate/snapshot them, then use the ZFS mountpoint
property to put them in the unix namespace where you want them to appear.

For example the basic approach I use for client systems is a ZFS
namespace like POOL/UNIX for FreeBSD, POOL/BUSINESS for shared company
data, POOL/BACKUP for client system backup blobs, POOL/REPLICANT for the
replication workspace to use in keeping hot-spare servers updated, etc.
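
A toy model of the distinction, in case it helps. The child datasets and
mountpoint values below are invented for illustration; only the top-level
names come from the layout above. The point is that a recursive snapshot
follows ZFS names, while where a dataset appears in the unix tree is
decided separately by its mountpoint property.

```python
# Hypothetical layout: ZFS dataset name -> value of its mountpoint property.
LAYOUT = {
    "POOL/UNIX": "none",
    "POOL/UNIX/root": "/",
    "POOL/UNIX/usr": "/usr",
    "POOL/BUSINESS": "/data/business",
    "POOL/BACKUP": "/usr/backup",
}


def recursive_scope(layout, root):
    """Datasets `zfs snapshot -r root@name` covers: root and its ZFS descendants."""
    return sorted(
        ds for ds in layout if ds == root or ds.startswith(root + "/")
    )


def mounted_under(layout, path):
    """Datasets whose mountpoint lies at or below `path` in the unix tree."""
    prefix = path.rstrip("/") + "/"
    return sorted(
        ds for ds, mp in layout.items() if mp == path or mp.startswith(prefix)
    )


if __name__ == "__main__":
    # POOL/BACKUP is mounted below /usr but is not part of the POOL/UNIX scope.
    print(recursive_scope(LAYOUT, "POOL/UNIX"))
    print(mounted_under(LAYOUT, "/usr"))
```

In this model a recursive snapshot of POOL/UNIX never touches POOL/BACKUP,
even though the latter is mounted under /usr.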

Note that the root of the ZFS tree is empty, and that the root of the
unix tree is elsewhere.  I often keep more than one bootable unix system
root in a pool (for maintenance).

PS. Don't forget the zpool bootfs property.

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 29 21:13:54 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 00330EC8
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 21:13:53 +0000 (UTC)
Received: from relay.exonetric.net (relay0.exonetric.net [178.250.72.161])
 by mx1.freebsd.org (Postfix) with ESMTP id C142E87A
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 21:13:52 +0000 (UTC)
Received: from [192.168.10.18] (186.211.187.81.in-addr.arpa [81.187.211.186])
 by relay.exonetric.net (Postfix) with ESMTPSA id B3CD92CC72
 for <freebsd-fs@freebsd.org>; Wed, 29 Oct 2014 21:03:59 +0000 (GMT)
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1990.1\))
Subject: Re: either gptzfsboot or zfsloader hangs during boot after kernel and
 pool upgrade
From: Mark Blackman <mark@exonetric.com>
In-Reply-To: <6F3D0C72-D774-4B1F-8A5F-25CD1C55EBE0@exonetric.com>
Date: Wed, 29 Oct 2014 21:03:55 +0000
Content-Transfer-Encoding: quoted-printable
Message-Id: <50B4C3C1-B4A1-4D8A-8E4A-2B5549E13A45@exonetric.com>
References: <6F3D0C72-D774-4B1F-8A5F-25CD1C55EBE0@exonetric.com>
To: freebsd-fs@freebsd.org
X-Mailer: Apple Mail (2.1990.1)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 21:13:54 -0000

Following a suggestion from Matt Reimer, I've updated the bootcode

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

but using a gptzfsboot from FreeBSD 9.2-release instead of FreeBSD
8.4-release and the system now boots correctly.

So my immediate problem is resolved, but it does mean there's a bug in
the gptzfsboot for FreeBSD 8.4 at least, and I'm pretty sure it's the
serial changes from 9.2 that need to be ported to 8.4.


> On 18 Sep 2014, at 21:48, Mark Blackman <mark@exonetric.com> wrote:
>
> Hi,
>
> I've filed https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193758
> on the subject topic, but thought I would publicise the issue here too.
>
> Short story: following a zpool upgrade on a FreeBSD 8.4 system, the
> system now freezes very early in boot process.
>
> Regards,
> Mark Blackman


From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 30 22:44:40 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 88CC4CA1;
 Thu, 30 Oct 2014 22:44:40 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu
 [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "khavrinen.csail.mit.edu", Issuer "Client CA" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 30D58D8B;
 Thu, 30 Oct 2014 22:44:40 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1])
 by khavrinen.csail.mit.edu (8.14.9/8.14.9) with ESMTP id s9UMicgI034127
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256
 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA);
 Thu, 30 Oct 2014 18:44:38 -0400 (EDT)
 (envelope-from wollman@khavrinen.csail.mit.edu)
Received: (from wollman@localhost)
 by khavrinen.csail.mit.edu (8.14.9/8.14.9/Submit) id s9UMic4t034124;
 Thu, 30 Oct 2014 18:44:38 -0400 (EDT) (envelope-from wollman)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <21586.48982.64913.250497@khavrinen.csail.mit.edu>
Date: Thu, 30 Oct 2014 18:44:38 -0400
From: Garrett Wollman <wollman@csail.mit.edu>
To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org
Subject: Definite NFS  bug
X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (khavrinen.csail.mit.edu [127.0.0.1]); Thu, 30 Oct 2014 18:44:38 -0400 (EDT)
Cc: rmacklem@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Oct 2014 22:44:40 -0000

Like many other users, I upgrade my FreeBSD servers by NFS-mounting
/usr/src and /usr/obj from a shared build server.[1]  Since I upgraded
the build server to 9.3, clients running 9.3 kernels have been
randomly erroring out during installkernel and installworld.  Today I
had some time to look more closely into this and found that the error
is definitely coming from the server: at some point, it just randomly
starts returning errors to client ACCESS and GETATTR operations.  The
errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is nothing
on the server to indicate any kind of error, and restarting the
operation on the client causes it to fail in a different place.  With
enough patience and restarts, it's possible to complete the
installation in just four or five passes.

Needless to say this is a bit worrying.  Strangely, 9.1 and 9.2
clients don't see this issue at all; it's only 9.3 clients that
break.

It's easy to reproduce: just 'cd /usr/src && find . -type f >/dev/null'.
It does not seem to depend on the client NFS version (3 or 4) or
implementation ("old" or "new").  I haven't tried the "old" server yet
-- I'll need to figure out how to do that first.
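
A minimal sketch of scripting that reproduction so the failing passes can be
counted and timestamped. The function name, the default of five passes and
the optional ERRLOG variable are assumptions for illustration; it relies only
on find exiting non-zero when it hits an error while walking the tree.

```sh
#!/bin/sh
# Walk a directory tree several times, as the find one-liner does, and
# report how many passes ended with an error.
repro_walk() {
    dir=$1
    passes=${2:-5}
    failures=0
    i=1
    while [ "$i" -le "$passes" ]; do
        # find's error messages go to $ERRLOG if set, otherwise are dropped.
        if ! (cd "$dir" && find . -type f >/dev/null 2>>"${ERRLOG:-/dev/null}"); then
            failures=$((failures + 1))
            echo "pass $i failed at $(date '+%H:%M:%S')" >&2
        fi
        i=$((i + 1))
    done
    echo "$failures of $passes passes failed"
}

# Example: ERRLOG=/tmp/find-errors.log repro_walk /usr/src 10
```

The timestamps on stderr can then be lined up against a packet trace taken on
the server.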

If anyone is willing to help debug this, I can share a packet trace,
but I don't think it's very informative.  Also, if anyone has a good
dtrace script that I could run on the server that would report what's
going on when that first NFS3ERR_IO is returned, that would be great.

-GAWollman

[1] I'd run my own freebsd-update server but unfortunately it is too
tied to building things that look like official FreeBSD security
updates, and isn't really designed for (e.g.) updating kernels when we
change a configuration option.  It also doesn't have any obvious knobs
for building with anything other than a default {make,src}.conf.
And with a pkg-able base just around the corner I don't really want to
put much effort into making freebsd-update do what I want.  NFS, on
the other hand, is a big deal and so I need to track down and fix
these bugs.

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 31 00:07:41 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 7AC37E43
 for <freebsd-fs@freebsd.org>; Fri, 31 Oct 2014 00:07:41 +0000 (UTC)
Received: from quine.pinyon.org (quine.pinyon.org [65.101.5.249])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4D88F7A4
 for <freebsd-fs@freebsd.org>; Fri, 31 Oct 2014 00:07:41 +0000 (UTC)
Received: by quine.pinyon.org (Postfix, from userid 122)
 id 6869616031A; Thu, 30 Oct 2014 17:07:34 -0700 (MST)
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on quine.pinyon.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
 autolearn=ham autolearn_force=no version=3.4.0
Received: from feyerabend.n1.pinyon.org (feyerabend.n1.pinyon.org [10.0.10.6])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128
 bits)) (No client certificate requested)
 by quine.pinyon.org (Postfix) with ESMTPSA id 9935E1602E3;
 Thu, 30 Oct 2014 17:07:31 -0700 (MST)
Message-ID: <5452D2C3.9040902@pinyon.org>
Date: Thu, 30 Oct 2014 17:07:31 -0700
From: "Russell L. Carter" <rcarter@pinyon.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Garrett Wollman <wollman@csail.mit.edu>, freebsd-fs@freebsd.org
Subject: Re: Definite NFS  bug
References: <21586.48982.64913.250497@khavrinen.csail.mit.edu>
In-Reply-To: <21586.48982.64913.250497@khavrinen.csail.mit.edu>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Oct 2014 00:07:41 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 10/30/14 15:44, Garrett Wollman wrote:
> Like many other users, I upgrade my FreeBSD servers by
> NFS-mounting /usr/src and /usr/obj from a shared build server.[1]
> Since I upgraded the build server to 9.3, clients running 9.3
> kernels have been randomly erroring out during installkernel and
> installworld.  Today I had some time to look more closely into this
> and found that the error is definitely coming from the server: at
> some point, it just randomly starts returning errors to client
> ACCESS and GETATTR operations.  The errors are a mix of NFS3ERR_IO
> and NFS3ERR_ACCES, but there is nothing on the server to indicate
> any kind of error, and restarting the operation on the client
> causes it to fail in a different place.  With enough patience and
> restarts, it's possible to complete the installation in just four
> or five passes.
> 
> Needless to say this is a bit worrying.  Strangely, 9.1 and 9.2 
> clients don't see this issue at all; it's only 9.3 clients that 
> break.
> 
> It's easy to reproduce: just 'cd /usr/src && find . -type f
> >/dev/null'. It does not seem to depend on the client NFS version
> (3 or 4) or implementation ("old" or "new").  I haven't tried the
> "old" server yet -- I'll need to figure out how to do that first.
> 
> If anyone is willing to help debug this, I can share a packet
> trace, but I don't think it's very informative.  Also, if anyone
> has a good dtrace script that I could run on the server that would
> report what's going on when that first NFS3ERR_IO is returned, that
> would be great.

This sounds sort of like what I have been complaining about.
I of course have no competency here but if I build the world
- -j1, I have a much better chance of successful remote installs.
The problems I'm seeing on -current for the last few months
seem to me to be out-of-date targets, so that the failure is a
desire by the remote client to try to rebuild the out-of-date target
on the RO file system.  My new plan is to dump all of the
st_atim and st_mtim for every .depend list on both systems
when I see the problem again, to see if something jumps out.
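
A rough sketch of such a dump, assuming the dependency files match the name
pattern .depend* and that epoch seconds are good enough for comparison. The
function name is made up. GNU stat takes -c and BSD stat takes -f, so the
sketch tries one form and falls back to the other.

```sh
#!/bin/sh
# Print "atime mtime path" (epoch seconds) for every .depend* file under a
# directory, sorted by path so two dumps can be compared with diff.
dump_depend_times() {
    find "$1" -type f -name '.depend*' | sort | while IFS= read -r f; do
        # GNU form first; on BSD stat -c is rejected and the -f form runs.
        stat -c '%X %Y %n' "$f" 2>/dev/null || stat -f '%a %m %N' "$f"
    done
}

# Example: dump_depend_times /usr/obj > /tmp/depend-times.$(hostname)
```

Running it on the build server and on a client and diffing the two files
would show whether the timestamps really differ between the two views.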

I just reinstalled everybody with -j1 builds of r273808M, no problems.
Last week however, a fast box failed.  Kind of concerning for
an install to fail say 2/3 through.  I have to admit when
soon after I had a crash on that 2/3 system (on NFS unmount),
I had to step out of the room for the reboot.  Exciting.

I am traveling on Sunday for a week, but I've got a few days to
run things on several big fast 8cpu boxes (my old laptop is
much less afflicted with this problem, though it occasionally
fails too).

Russell


> -GAWollman
> 
> [1] I'd run my own freebsd-update server but unfortunately it is
> too tied to building things that look like official FreeBSD
> security updates, and isn't really designed for (e.g.) updating
> kernels when we change a configuration option.  It also doesn't
> have any obvious knobs for building with anything other than a
> default {make,src}.conf. And with a pkg-able base just around the
> corner I don't really want to put much effort into making
> freebsd-update do what I want.  NFS, on the other hand, is a big
> deal and so I need to track down and fix these bugs. 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJUUtLDAAoJEFnLrGVSDFaEu0MQAJOlPWcsduuiS75LUe42uj+E
SRnxSvm5JgUdJojatx7cL5TQjEvXbYov8CE8OLZUqGxIi0D0IdpKlr6WJes8KOUC
wfix7doQZQe3IPqgYAJZz0y6j89q6+QABPTS2oy+cPpYmop9568TvuJJZCCixBOF
Zv3XYa4I7uIl1pYF2zl2nJHtOwLi2wjT+851heqXo8GvIo8SAhBouTN5biPh2JGl
Yabbb4e5xePvigMLEwxbPNslv3nhT1JOcsH9GoFLo5zph2+Txw6ZPy1Sccyv88AQ
w5ID129VMzZChX6zYT7+LtJYLmZME3bVrA2R6YeEdnr/Is8qm5eKtpkMrUz+5Qn4
ULf3fJSCjYdlfatfBIFfi2jFJWBkBY7qVu9S5nqfG9yn4DCLY2UYl4skP71Eo4hz
DPDKQwpuij/Tf8y459Vj60AsOt87Sh0eYBnW+nWJdgIPWptYLNmjv/VHvC8ZFbnn
HsrvUw9DovnTfd7rn+GR4F4+nlnjXqOKdPJtLroId3tSxZzy9L08n7Y6AvAWFFWM
oQ4q/B4LxpOmjXqIBTCrC5ux7GdtKGN2gkAYvY4zh3ngPJJ9ts0BRHbq2zRMo9OA
eUT8Cf+D/wQcFcd+27eI1RJu8IbyycStwGMXbA57UkvJkfSA5CVpcey+T5z9uyPa
7xlgxCpHOIHSJ6l2BeSQ
=4Q5V
-----END PGP SIGNATURE-----

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 31 01:31:36 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9C60ECDB;
 Fri, 31 Oct 2014 01:31:36 +0000 (UTC)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 41141F37;
 Fri, 31 Oct 2014 01:31:35 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Aq0EAHzlUlSDaFve/2dsb2JhbABcDoQwgwLRaAKBMgEBAQEBfYQCAQEBAwEjBFIFFhgCAg0ZAlkGiEsJtVWUaAEBAQEGAQEBAQEBHIEsjyEONAeCd4FUBZ8hjWaHLYM4XCGBN0CBAwEBAQ
X-IronPort-AV: E=Sophos;i="5.07,290,1413259200"; d="scan'208";a="163577126"
Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.222])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 30 Oct 2014 21:31:34 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 11382AE92D;
 Thu, 30 Oct 2014 21:31:34 -0400 (EDT)
Date: Thu, 30 Oct 2014 21:31:34 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman <wollman@csail.mit.edu>
Message-ID: <1902145956.2676513.1414719094052.JavaMail.root@uoguelph.ca>
In-Reply-To: <21586.48982.64913.250497@khavrinen.csail.mit.edu>
Subject: Re: Definite NFS  bug
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926)
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, freebsd-stable@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Oct 2014 01:31:36 -0000

Garrett Wollman wrote:
> Like many other users, I upgrade my FreeBSD servers by NFS-mounting
> /usr/src and /usr/obj from a shared build server.[1]  Since I
> upgraded
> the build server to 9.3, clients running 9.3 kernels have been
> randomly erroring out during installkernel and installworld.  Today I
> had some time to look more closely into this and found that the error
> is definitely coming from the server: at some point, it just randomly
> starts returning errors to client ACCESS and GETATTR operations.  The
> errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is
> nothing
> on the server to indicate any kind of error, and restarting the
> operation on the client causes it to fail in a different place.  With
> enough patience and restarts, it's possible to complete the
> installation in just four or five passes.
> 
> Needless to say this is a bit worrying.  Strangely, 9.1 and 9.2
> clients don't see this issue at all; it's only 9.3 clients that
> break.
> 
> It's easy to reproduce: just 'cd /usr/src && find . -type f
> >/dev/null'.
> It does not seem to depend on the client NFS version (3 or 4) or
> implementation ("old" or "new").  I haven't tried the "old" server
> yet
> -- I'll need to figure out how to do that first.
> 
Well, I took a quick look and, if I got it correct, there is one single
line change in the "old" client between 9.2 and 9.3, which defined
an otherwise unused mount flag called NFSMNT_NONCONTIGWR. (It is
only used by the new client when "nocontigwr" is specified.)

However, there were some fairly extensive changes made (mostly by mav@)
to the kernel rpc (sys/rpc), which is used by both clients and both
servers.
Most of these changes were committed to stable/9 as r261057, r261058.
If you could build a kernel from stable/9 just prior to r261057 and see
if that client runs into the problem, it could help determine if these
changes are causing the problem.
Alternatively, running the 9.3 system with a 9.2 sys/rpc (if it links/runs)
could also help show whether the kernel rpc is the culprit. (You can
load the kernel rpc as a module, but it's linked into most kernels.)

If it doesn't turn out to be in the kernel rpc, my next guess would
be changes to the net device driver (to check for this you could perhaps
use a different type of hardware device or the 9.2 driver on the 9.3 system).

The "new" client has some changes 9.2->9.3, but since nothing changed
for the "old" client and you see the problem with the "old" one, I
think the NFS client is not the culprit.

rick

> If anyone is willing to help debug this, I can share a packet trace,
> but I don't think it's very informative.  Also, if anyone has a good
> dtrace script that I could run on the server that would report what's
> going on when that first NFS3ERR_IO is returned, that would be great.
> 
> -GAWollman
> 
> [1] I'd run my own freebsd-update server but unfortunately it is too
> tied to building things that look like official FreeBSD security
> updates, and isn't really designed for (e.g.) updating kernels when
> we
> change a configuration option.  It also doesn't have any obvious
> knobs
> for building with anything other than a default {make,src}.conf.
> And with a pkg-able base just around the corner I don't really want
> to
> put much effort into making freebsd-update do what I want.  NFS, on
> the other hand, is a big deal and so I need to track down and fix
> these bugs.
> 

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 31 01:49:49 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 023EEED2;
 Fri, 31 Oct 2014 01:49:49 +0000 (UTC)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9ABCABC;
 Fri, 31 Oct 2014 01:49:48 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ar0EAB/qUlSDaFve/2dsb2JhbABcDoNUWASDAsoRCoZ5VAKBMgEBAQEBfYQCAQEBAwEBAQEgBCcgCwUWGAICDRkCKQEJJgYIBwQBHASIFwkNtUyUZgEBAQEGAQEBAQEBARuBLI8SAQENDjQHgneBVAWWWoQShDU8jSqHLYM4XCEvB4EBBxcigQMBAQE
X-IronPort-AV: E=Sophos;i="5.07,290,1413259200"; d="scan'208";a="163580651"
Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.222])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 30 Oct 2014 21:49:47 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3E53FB4082;
 Thu, 30 Oct 2014 21:49:47 -0400 (EDT)
Date: Thu, 30 Oct 2014 21:49:47 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman <wollman@csail.mit.edu>
Message-ID: <928219131.2682604.1414720187244.JavaMail.root@uoguelph.ca>
In-Reply-To: <1902145956.2676513.1414719094052.JavaMail.root@uoguelph.ca>
Subject: Re: Definite NFS  bug
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.209]
X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926)
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, freebsd-stable@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Oct 2014 01:49:49 -0000

I wrote:
> Garrett Wollman wrote:
> > Like many other users, I upgrade my FreeBSD servers by NFS-mounting
> > /usr/src and /usr/obj from a shared build server.[1]  Since I
> > upgraded
> > the build server to 9.3, clients running 9.3 kernels have been
> > randomly erroring out during installkernel and installworld.  Today
> > I
> > had some time to look more closely into this and found that the
> > error
> > is definitely coming from the server: at some point, it just
> > randomly
> > starts returning errors to client ACCESS and GETATTR operations.
> >  The
> > errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is
> > nothing
> > on the server to indicate any kind of error, and restarting the
> > operation on the client causes it to fail in a different place.
> >  With
> > enough patience and restarts, it's possible to complete the
> > installation in just four or five passes.
> > 
> > Needless to say this is a bit worrying.  Strangely, 9.1 and 9.2
> > clients don't see this issue at all; it's only 9.3 clients that
> > break.
> > 
> > It's easy to reproduce: just 'cd /usr/src && find . -type f
> > >/dev/null'.
> > It does not seem to depend on the client NFS version (3 or 4) or
> > implementation ("old" or "new").  I haven't tried the "old" server
> > yet
> > -- I'll need to figure out how to do that first.
> > 
Oh, and it wasn't clear to me if you are seeing this on a 9.3 server
only? (If you get the same outcome testing against an older server,
then it seems it is a client side issue.)

If that is the case, I'd suggest you try a pre-r261056 (one of the changes
was r261056, not r261057) stable/9 kernel.
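
A sketch of how such a test kernel might be built, printed as commands rather
than executed. The scratch source directory, the GENERIC kernel config, the
alternate install directory and the repository URL (the Subversion server in
use at the time) are all assumptions; building a kernel from a tree that does
not match the installed world may also need a toolchain step first.

```sh
#!/bin/sh
# Print the commands for building a stable/9 kernel as of the revision just
# before a suspect commit. Paths and the kernel config are hypothetical.
prior_kernel_cmds() {
    suspect=$1                     # first suspect revision, e.g. 261056
    srcdir=${2:-/usr/src-test}     # scratch checkout, not the live /usr/src
    prev=$((suspect - 1))          # last revision before the suspect change
    echo "svn checkout -r ${prev} svn://svn.freebsd.org/base/stable/9 ${srcdir}"
    echo "cd ${srcdir} && make buildkernel KERNCONF=GENERIC"
    # KODIR installs next to the running kernel instead of replacing it.
    echo "cd ${srcdir} && make installkernel KERNCONF=GENERIC KODIR=/boot/kernel.test"
}

prior_kernel_cmds 261056
```

If the client stops failing with that kernel, the revisions from r261056
onward are the place to look.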

At a closer look, most of the kernel rpc changes are for the server side.
(Most of the client side commits just change the copyright, but there are
 a couple of client side changes beyond that.)

> Well, I took a quick look and, if I got it correct, there is one
> single
> line change in the "old" client between 9.2 and 9.3, which defined
> an otherwise unused mount flag called NFSMNT_NONCONTIGWR. (It is
> only used by the new client when "nocontigwr" is specified.)
> 
> However, there were some fairly extensive changes made (mostly by
> mav@)
> to the kernel rpc (sys/rpc), which is used by both clients and both
> servers.
> Most of these changes were committed to stable/9 as r261057, r261058.
> If you could build a kernel from stable/9 just prior to r261057 and
> see
> if that client runs into the problem, it could help determine if
> these
> changes are causing the problem.
> Alternatively, running the 9.3 system with a 9.2 sys/rpc (if it
> links/runs)
> could also help show whether the kernel rpc is the culprit. (You can
> load the kernel rpc as a module, but it's linked into most kernels.)
> 
> If it doesn't turn out to be in the kernel rpc, my next guess would
> be changes to the net device driver (to check for this you could
> perhaps use a different type of hardware device or the 9.2 driver on
> the 9.3 system).
> 
> The "new" client has some changes 9.2->9.3, but since nothing changed
> for the "old" client and you see the problem with the "old" one, I
> think the NFS client is not the culprit.
> 
> rick
> 
> > If anyone is willing to help debug this, I can share a packet
> > trace,
> > but I don't think it's very informative.  Also, if anyone has a
> > good
> > dtrace script that I could run on the server that would report
> > what's
> > going on when that first NFS3ERR_IO is returned, that would be
> > great.
> > 
> > -GAWollman
> > 
> > [1] I'd run my own freebsd-update server but unfortunately it is
> > too
> > tied to building things that look like official FreeBSD security
> > updates, and isn't really designed for (e.g.) updating kernels when
> > we
> > change a configuration option.  It also doesn't have any obvious
> > knobs
> > for building with anything other than a default {make,src}.conf.
> > And with a pkg-able base just around the corner I don't really want
> > to
> > put much effort into making freebsd-update do what I want.  NFS, on
> > the other hand, is a big deal and so I need to track down and fix
> > these bugs.
> > 

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 31 15:59:01 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id F1A3DBFF
 for <freebsd-fs@freebsd.org>; Fri, 31 Oct 2014 15:59:01 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu
 [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "khavrinen.csail.mit.edu", Issuer "Client CA" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id B12F15FE
 for <freebsd-fs@freebsd.org>; Fri, 31 Oct 2014 15:59:01 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1])
 by khavrinen.csail.mit.edu (8.14.9/8.14.9) with ESMTP id s9VFx0j0040276
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256
 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA);
 Fri, 31 Oct 2014 11:59:00 -0400 (EDT)
 (envelope-from wollman@khavrinen.csail.mit.edu)
Received: (from wollman@localhost)
 by khavrinen.csail.mit.edu (8.14.9/8.14.9/Submit) id s9VFwx92040273;
 Fri, 31 Oct 2014 11:58:59 -0400 (EDT) (envelope-from wollman)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <21587.45507.688734.672857@khavrinen.csail.mit.edu>
Date: Fri, 31 Oct 2014 11:58:59 -0400
From: Garrett Wollman <wollman@csail.mit.edu>
To: "Russell L. Carter" <rcarter@pinyon.org>
Subject: Re: Definite NFS  bug
In-Reply-To: <5452D2C3.9040902@pinyon.org>
References: <21586.48982.64913.250497@khavrinen.csail.mit.edu>
 <5452D2C3.9040902@pinyon.org>
X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (khavrinen.csail.mit.edu [127.0.0.1]); Fri, 31 Oct 2014 11:59:00 -0400 (EDT)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Oct 2014 15:59:02 -0000

<<On Thu, 30 Oct 2014 17:07:31 -0700, "Russell L. Carter" <rcarter@pinyon.org> said:

> The problems I'm seeing on -current for the last few months
> seem to me to be out-of-date targets, so that the failure is a
> desire by the remote client to try to rebuild the out-of-date target
> on the RO file system.

Nope, nothing at all to do with that.  As I said in my original
message, the problem is that the server is returning NFS3ERR_IO or
NFS3ERR_ACCES for RPCs that should (and a few seconds later DO)
succeed.

-GAWollman


From owner-freebsd-fs@FreeBSD.ORG  Sat Nov  1 22:44:44 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 98F9FE1A;
 Sat,  1 Nov 2014 22:44:44 +0000 (UTC)
Received: from potato.growveg.org (potato.growveg.org [62.49.247.163])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 584BCF62;
 Sat,  1 Nov 2014 22:44:44 +0000 (UTC)
Received: from john by potato.growveg.org with local (Exim 4.84 (FreeBSD))
 (envelope-from <john@potato.growveg.org>)
 id 1XkhPG-000IBC-Qi; Sat, 01 Nov 2014 22:44:26 +0000
Date: Sat, 1 Nov 2014 22:44:26 +0000
From: John <freebsd-lists@potato.growveg.org>
To: freebsd-hardware@freebsd.org
Subject: gptboot: invalid backup GPT header
Message-ID: <20141101224426.GA69717@potato.growveg.org>
Mail-Followup-To: freebsd-hardware@freebsd.org, freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: John <john@potato.growveg.org>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: john@potato.growveg.org
X-SA-Exim-Scanned: No (on potato.growveg.org); SAEximRunCond expanded to false
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 01 Nov 2014 22:44:44 -0000

Hello lists,

Not sure if this is a hardware problem or a filesystem problem, so have 
cc'd to freebsd-fs@

The "problem" is on a newly installed FreeBSD 10.1-RC3. I say "problem" in
inverted commas because it's not stopping it from booting and the server
seems to run all right; I'm wondering if it's anything to worry about. As
well as seeing gptboot: invalid backup GPT header *before* beastie
starts, I see the following in dmesg:

GEOM: mfid1: corrupt or invalid GPT detected.
GEOM: mfid1: GPT rejected -- may not be recoverable.

There are 4 disks installed: mfid0, 1, 2 and 3. mfid0 is a regular
GPT-partitioned UFS disk. mfid1, 2 and 3 together form a ZFS raidz array.
A thread on
https://forums.freenas.org/index.php?threads/gpt-table-is-corrupt-or-invalid-error-on-bootup.12171/
describes a similar problem. The difference is that the "erroring" disk here
is not a GPT disk, whereas the one in that example was.

# gpart list mfid1
gpart: No such geom: mfid1.

# ls -la /dev/mfid*
crw-r-----  1 root  operator  0x62 Nov  1 13:37 /dev/mfid0
crw-r-----  1 root  operator  0x66 Nov  1 13:37 /dev/mfid0p1
crw-r-----  1 root  operator  0x67 Nov  1 13:37 /dev/mfid0p2
crw-r-----  1 root  operator  0x68 Nov  1 13:37 /dev/mfid0p3
crw-r-----  1 root  operator  0x63 Nov  1 13:37 /dev/mfid1
crw-r-----  1 root  operator  0x64 Nov  1 13:37 /dev/mfid2
crw-r-----  1 root  operator  0x65 Nov  1 13:37 /dev/mfid3

# zpool status
  pool: vms
  state: ONLINE
  scan: none requested

config:

        NAME        STATE     READ WRITE CKSUM
        vms         ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            mfid1   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0

errors: No known data errors

Should I worry?
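
As an illustration of the observation above, here is a small sketch that
reads device node names and prints the whole disks that have no pN partition
nodes beside them. The function name is made up, and it only looks at names;
it does not inspect any on-disk metadata.

```sh
#!/bin/sh
# Read device node names (one per line, with or without /dev/) and print the
# disks that have no partition nodes such as mfid0p1.
disks_without_partitions() {
    sed 's|^/dev/||' | awk '
        # A name ending in pN marks its base disk as partitioned.
        /p[0-9]+$/ { base = $0; sub(/p[0-9]+$/, "", base); has[base] = 1; next }
        { disks[$0] = 1 }
        END { for (d in disks) if (!(d in has)) print d }
    ' | sort
}

# Example: ls /dev/mfid* | disks_without_partitions
```

Fed the listing above, it prints mfid1, mfid2 and mfid3, the three raidz
members that were given to ZFS as whole disks.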

-- 
John