From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 07:57:28 2014
Date: Sun, 26 Oct 2014 09:57:20 +0200
From: Konstantin Belousov
To: Rick Macklem
Cc: Ronald Klop, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: panic in nfs on arm
Message-ID: <20141026075720.GO1877@kib.kiev.ua>
In-Reply-To: <1388627434.7506173.1414279273153.JavaMail.root@uoguelph.ca>

On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
> Ronald Klop wrote:
> > Hi,
> >
> > I got a panic on my arm computer while building a port with
> > /usr/ports
> > mounted from my FreeBSD-10-STABLE/amd64 machine.
> >
> > This is the machine which panicked:
> > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014
> > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG
> > arm
> >
> >
> > Tracing pid 90295 tid 100119 td 0xc5f8c960
> > db_trace_self() at db_trace_self
> >     pc = 0xc0bb12c8 lr = 0xc0bb1354 (db_trace_thread+0x50)
> >     sp = 0xdf29e5d0 fp = 0xc3e07120
> > db_trace_thread() at db_trace_thread+0x50
> >     pc = 0xc0bb1354 lr = 0xc0936314 (db_command_init+0x5a4)
> >     sp = 0xdf29e630 fp = 0xc3e07120
> > db_command_init() at db_command_init+0x5a4
> >     pc = 0xc0936314 lr = 0xc0935ad0 (db_skip_to_eol+0x484)
> >     sp = 0xdf29e648 fp = 0xc3e07120
> >     r4 = 0xc0c8d350 r5 = 0x00000000
> > db_skip_to_eol() at db_skip_to_eol+0x484
> >     pc = 0xc0935ad0 lr = 0xc0935c38 (db_command_loop+0x5c)
> >     sp = 0xdf29e6e8 fp = 0xc3e07120
> >     r4 = 0xdf29e6fc r5 = 0xc0c8d64c
> >     r6 = 0x3cd90e75 r7 = 0x00000000
> >     r8 = 0x00000001 r10 = 0x600000d3
> > db_command_loop() at db_command_loop+0x5c
> >     pc = 0xc0935c38 lr = 0xc0937f80 (X_db_sym_numargs+0xec)
> >     sp = 0xdf29e6f0 fp = 0xc3e07120
> > X_db_sym_numargs() at X_db_sym_numargs+0xec
> >     pc = 0xc0937f80 lr = 0xc0a6f0c0 (kdb_trap+0x94)
> >     sp = 0xdf29e808 fp = 0xc3e07120
> >     r4 = 0xdf29e8f8
> > kdb_trap() at kdb_trap+0x94
> >     pc = 0xc0a6f0c0 lr = 0xc0bc1d60 (badaddr_read+0x274)
> >     sp = 0xdf29e828 fp = 0xc3e07120
> >     r4 = 0xdf29e8f8 r5 = 0x00000001
> >     r6 = 0x3cd90e75 r7 = 0xc5f8c960
> >     r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0
> > badaddr_read() at badaddr_read+0x274
> >     pc = 0xc0bc1d60 lr = 0xc0bc1e98 (badaddr_read+0x3ac)
> >     sp = 0xdf29e840 fp = 0xc3e07120
> >     r4 = 0xc5f8c960 r5 = 0xdf29e8f8
> >     r6 = 0x3cd90e05
> > badaddr_read() at badaddr_read+0x3ac
> >     pc = 0xc0bc1e98 lr = 0xc0bc2278 (data_abort_handler+0x10c)
> >     sp = 0xdf29e858 fp = 0xc3e07120
> >     r4 = 0xc0cd8af8 r5 = 0xffff1004
> > data_abort_handler() at data_abort_handler+0x10c
> >     pc = 0xc0bc2278 lr = 0xc0bb2f40 (exception_exit)
> >     sp = 0xdf29e8f8 fp = 0xc3e07120
> >     r4 = 0xffffffff r5 = 0xffff1004
> >     r6 = 0x3cd90e05 r7 = 0xc0e0ea48
> >     r8 = 0x0000000f r9 = 0x00000101
> >     r10 = 0x0000001d
> > exception_exit() at exception_exit
> >     pc = 0xc0bb2f40 lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
> >     sp = 0xdf29e948 fp = 0xc3e07120
> >     r0 = 0xba9b9127 r1 = 0x8b3de5fb
> >     r2 = 0xc61c1fc8 r3 = 0xba9b9126
> >     r4 = 0x00000000 r5 = 0xc61c1fc8
> >     r6 = 0x3cd90e05 r7 = 0xc0e0ea48
> >     r8 = 0x0000000f r9 = 0x00000101
> >     r10 = 0x0000001d r12 = 0x00000000
> > uma_reclaim() at uma_reclaim+0x24c
> This looks to me like a crash in uma_reclaim() and I find UMA
> way too obscure to understand.
>
> I have no idea if it might be related, but alc@ put a fix for low
> memory situations in r272071 (or maybe it's r272221?).
>
> Might be worth trying a slightly newer kernel to see if the
> problem still occurs.
>
> And hopefully someone more conversant with UMA (or this stack
> trace) can help more.
>
> rick
>
> >     pc = 0xc0b8db4c lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
> >     sp = 0xdf29e978 fp = 0xdf29ec10
> >     r4 = 0xc3e071d8 r5 = 0xc0e0ea00
> >     r6 = 0xc3e07120 r7 = 0x00000000
> >     r8 = 0x00000102 r9 = 0xdf29ecf8
> >     r10 = 0xc61c0760
> > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
uma_reclaim() is not called from uma_zalloc().
I think there is some issue with ddb on arm, which means that
the backtrace is not useful. See below for one more.
> >     pc = 0xc0b8c800 lr = 0xc09e1df0 (nfscl_nget+0x308)
> >     sp = 0xdf29e990 fp = 0xdf29ec10
> >     r4 = 0x9bb9fa43 r5 = 0x00000000
> >     r6 = 0xc550dce8 r7 = 0xc3edaa00
> >     r8 = 0xc3ebbac0
> > nfscl_nget() at nfscl_nget+0x308
> >     pc = 0xc09e1df0 lr = 0xc09da69c (ncl_readlinkrpc+0xf60)
> >     sp = 0xdf29e9d8 fp = 0xdf29ea10
> >     r4 = 0xc550dce8 r5 = 0x00000000
> >     r6 = 0xc550dcf8 r7 = 0xdf29ecf8
> >     r8 = 0xdf29ec6c r9 = 0x00000000
> >     r10 = 0xdf29ed28
> > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60
> >     pc = 0xc09da69c lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
> >     sp = 0xdf29ec40 fp = 0xbffff620
> >     r4 = 0xc0c95c68 r5 = 0xdf29ec6c
> >     r6 = 0x00000001 r7 = 0x00020284
> >     r8 = 0xffffff9c r9 = 0x00200800
> >     r10 = 0xc5f8c960
> > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(),
esp. without intervening frame.

> >     pc = 0xc0bdae44 lr = 0xc0aca614 (kern_mkdirat+0x18c)
> >     sp = 0xdf29ec50 fp = 0xbffff620
> >     r4 = 0xdf29ed28 r5 = 0xdf29ec90
> >     r6 = 0x00000000
> > kern_mkdirat() at kern_mkdirat+0x18c
> >     pc = 0xc0aca614 lr = 0xc0aca684 (kern_mkdir+0x24)
> >     sp = 0xdf29ede0 fp = 0xbffff620
> >     r4 = 0x00020290 r5 = 0xc5f8c960
> >     r6 = 0x00000000 r7 = 0xc5f7f000
> >     r8 = 0x00000000 r10 = 0x00013640
> > kern_mkdir() at kern_mkdir+0x24
> >     pc = 0xc0aca684 lr = 0xc0aca6a8 (sys_mkdir+0x1c)
> >     sp = 0xdf29edf0 fp = 0xbffff620
> > sys_mkdir() at sys_mkdir+0x1c
> >     pc = 0xc0aca6a8 lr = 0xc0bc2884 (swi_handler+0x254)
> >     sp = 0xdf29edf8 fp = 0xbffff620
> > swi_handler() at swi_handler+0x254
> >     pc = 0xc0bc2884 lr = 0xc0bb2ed0 (swi_exit)
> >     sp = 0xdf29ee60 fp = 0xbffff620
> >     r4 = 0x00020290 r5 = 0x2085e8e0
> >     r6 = 0x00020284 r7 = 0x00000088
> >     r8 = 0x00000001
> > swi_exit() at swi_exit
> >     pc = 0xc0bb2ed0 lr = 0xc0bb2ed0 (swi_exit)
> >     sp = 0xdf29ee60 fp = 0xbffff620
> > Unable to unwind further
> >
> >
> > Unfortunately dumping the kernel core also panicked.
> > db> dump
> > Physical memory: 507 MB
> > Dumping 74 MB: 71 67 63
> > vm_fault(0xc4147000, 0, 1, 0) -> 0
> > Fatal kernel mode data abort: 'Translation Fault (P)'
> > trapframe: 0xdf29e0b8
> > FSR=00000017, FAR=00000014, spsr=a00000d3
> > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004
> > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c
> > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a
> > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060
> >
> > panic: Fatal abort
> > Uptime: 3d18h30m32s
> > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 09:55:01 2014
Date: Sun, 26 Oct 2014 10:54:28 +0100
From: Guido Falsi
To: FreeBSD FS
Cc: Glen Barber, freebsd-stable@FreeBSD.org
Subject: Re: panic: detach with active requests on 10.1-RC3
Message-ID: <544CC4D4.7040203@FreeBSD.org>
In-Reply-To: <544BC990.4030700@madpilot.net>

On 10/25/14 18:02, Guido Falsi wrote:
> On 10/25/14 17:02, Guido Falsi wrote:
>> On 10/24/14 15:26, Guido Falsi wrote:
>>> Hi,
>>>
>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware
>>> using NanoBSD.
>>>
>>> By mounting and umounting UFS filesystems I have seen umount constantly
>>> hanging hard in a deadlock. I have tested on two boards with two
>>> distinct compactflash disks with the same results. This was not happening
>>> with 10.0-RELEASE.
>>>
>>> I have built a 10.1-RC3 kernel with full debugging and caused the
>>> problem to happen, I got this:
>>>
>>> root@qtest:~ [0]# umount /cfg
>>> panic: detach with active requests
>>> KDB: stack backtrace:
[...]
> I must admit I am out of ideas.
>

I bisected the commits and found that this starts happening with
r268815, which MFCed r268205. It is related to TRIM support; in fact,
disabling TRIM on the filesystem "fixes" it.

I filed bug #194606 on bugzilla [1] to further track this issue, if
anyone is interested.
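The bisection mentioned above can be pictured as a binary search over an ordered list of revisions. The message does not say which tool or procedure was used, so this is only a minimal sketch under stated assumptions: `first_bad_revision` and the `is_bad` callback are hypothetical names, every revision number except r268815 is invented for the example, and the search assumes the list is sorted, the last entry is known bad, and the bug stays present once it appears.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[int],
                       is_bad: Callable[[int], bool]) -> int:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions is sorted ascending, the last one is bad, and
    every revision after the first bad one is bad as well.
    """
    lo, hi = 0, len(revisions) - 1   # revisions[hi] is known bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid          # first bad revision is at mid or earlier
        else:
            lo = mid + 1      # first bad revision is after mid
    return revisions[lo]


if __name__ == "__main__":
    # Hypothetical candidate revisions; only 268815 comes from the thread.
    candidates = [268500, 268700, 268800, 268815, 268900, 269100]
    # Stand-in for "build this revision, boot it, run umount /cfg".
    print(first_bad_revision(candidates, lambda rev: rev >= 268815))
```

In practice each `is_bad` call would mean building and booting a kernel, so the point of halving the range is to keep the number of test builds small.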
[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606

-- 
Guido Falsi

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 12:00:32 2014
Date: Sun, 26 Oct 2014 08:00:29 -0400 (EDT)
From: Rick Macklem
To: Konstantin Belousov
Cc: Ronald Klop, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: panic in nfs on arm
Message-ID: <1340373913.7617662.1414324829387.JavaMail.root@uoguelph.ca>
In-Reply-To: <20141026075720.GO1877@kib.kiev.ua>

Kostik
wrote:
> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
> > Ronald Klop wrote:
> > > [...]
> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
> uma_reclaim() is not called from uma_zalloc().
> I think there is some issue with ddb on arm, which means that
> the backtrace is not useful. See below for one more.
>
Yea, I noticed that and the one below (ie. I knew the stack dump
wasn't correct). I kinda hoped it was right w.r.t. the crash
happening in uma_reclaim() { which only seems to be called from
the pageout daemon? }, so that doesn't match up with the thread.

Also, I couldn't see what the panic message actually was. Is it
this one at the bottom:
Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
or was that what happened when you tried to crash dump?

Btw, nfscl_nget() does call uma_zalloc(M_WAITOK), but it doesn't hold a
mutex when it does this.

rick

> > > [...]
> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
> I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(),
> esp. without intervening frame.
> > > [...]

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 12:12:09 2014
Date: Sun, 26 Oct 2014 13:11:53 +0100
From: Ronald Klop
To: Konstantin Belousov, Rick Macklem
Cc: freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: panic in nfs on arm
In-Reply-To: <1340373913.7617662.1414324829387.JavaMail.root@uoguelph.ca>

On Sun, 26 Oct 2014 13:00:29 +0100, Rick Macklem wrote:
> Kostik wrote:
>> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
>> > Ronald Klop wrote:
>> > > Hi,
>> > >
>> > > I got a panic on my arm computer while building a port with
>> > > /usr/ports
>> > > mounted from my FreeBSD-10-STABLE/amd64 machine.
>> > > [...]
>> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
>> uma_reclaim() is not called from uma_zalloc().
>> I think there is some issue with ddb on arm, which means that
>> the backtrace is not useful. See below for one more.
>>
> Yea, I noticed that and the one below (ie. I knew the stack dump
> wasn't correct). I kinda hoped it was right w.r.t. the crash
> happening in uma_reclaim() { which only seems to be called from
> the pageout daemon? }, so that doesn't match up with the thread.
>
> Also, I couldn't see what the panic message actually was. Is it
> this one at the bottom:
> Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
> or was that what happened when you tried to crash dump?
>
> Btw, nfscl_nget() does call uma_zalloc(M_WAITOK), but it doesn't hold a
> mutex when it does this.
>
> rick

Hi,

The non-sleepable lock message is not the original panic. It appeared
when I dumped the memory to dumpdev from the debugger.

I don't have the original panic message; it was no longer on the serial
output. Is it possible to make the debugger print it again?

I have already rebooted the machine. Let's see if it happens again
someday.

Ronald.

>> > > [...]
>> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
>> I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(),
>> esp. without intervening frame.
>> > > [...]

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 14:59:21 2014
Date: Sun, 26 Oct 2014 08:59:17 -0600
From: Ian Lepore
To: Konstantin Belousov
Cc: Ronald Klop, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: panic in nfs on arm
Message-ID: <1414335557.12052.672.camel@revolution.hippie.lan>
In-Reply-To: <20141026075720.GO1877@kib.kiev.ua>

On Sun, 2014-10-26 at 09:57 +0200, Konstantin Belousov wrote:
> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
> > Ronald Klop wrote:
> > > Hi,
> > >
> > > I got a panic on my arm computer while building a port with
> > > /usr/ports
> > > mounted from my FreeBSD-10-STABLE/amd64 machine.
> > > > > > This is the machine which paniced: > > > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014 > > > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG > > > arm > > > > > > > > > Tracing pid 90295 tid 100119 td 0xc5f8c960 > > > db_trace_self() at db_trace_self > > > pc = 0xc0bb12c8 lr = 0xc0bb1354 (db_trace_thread+0x50) > > > sp = 0xdf29e5d0 fp = 0xc3e07120 > > > db_trace_thread() at db_trace_thread+0x50 > > > pc = 0xc0bb1354 lr = 0xc0936314 (db_command_init+0x5a4) > > > sp = 0xdf29e630 fp = 0xc3e07120 > > > db_command_init() at db_command_init+0x5a4 > > > pc = 0xc0936314 lr = 0xc0935ad0 (db_skip_to_eol+0x484) > > > sp = 0xdf29e648 fp = 0xc3e07120 > > > r4 = 0xc0c8d350 r5 = 0x00000000 > > > db_skip_to_eol() at db_skip_to_eol+0x484 > > > pc = 0xc0935ad0 lr = 0xc0935c38 (db_command_loop+0x5c) > > > sp = 0xdf29e6e8 fp = 0xc3e07120 > > > r4 = 0xdf29e6fc r5 = 0xc0c8d64c > > > r6 = 0x3cd90e75 r7 = 0x00000000 > > > r8 = 0x00000001 r10 = 0x600000d3 > > > db_command_loop() at db_command_loop+0x5c > > > pc = 0xc0935c38 lr = 0xc0937f80 (X_db_sym_numargs+0xec) > > > sp = 0xdf29e6f0 fp = 0xc3e07120 > > > X_db_sym_numargs() at X_db_sym_numargs+0xec > > > pc = 0xc0937f80 lr = 0xc0a6f0c0 (kdb_trap+0x94) > > > sp = 0xdf29e808 fp = 0xc3e07120 > > > r4 = 0xdf29e8f8 > > > kdb_trap() at kdb_trap+0x94 > > > pc = 0xc0a6f0c0 lr = 0xc0bc1d60 (badaddr_read+0x274) > > > sp = 0xdf29e828 fp = 0xc3e07120 > > > r4 = 0xdf29e8f8 r5 = 0x00000001 > > > r6 = 0x3cd90e75 r7 = 0xc5f8c960 > > > r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0 > > > badaddr_read() at badaddr_read+0x274 > > > pc = 0xc0bc1d60 lr = 0xc0bc1e98 (badaddr_read+0x3ac) > > > sp = 0xdf29e840 fp = 0xc3e07120 > > > r4 = 0xc5f8c960 r5 = 0xdf29e8f8 > > > r6 = 0x3cd90e05 > > > badaddr_read() at badaddr_read+0x3ac > > > pc = 0xc0bc1e98 lr = 0xc0bc2278 (data_abort_handler+0x10c) > > > sp = 0xdf29e858 fp = 0xc3e07120 > > > r4 = 0xc0cd8af8 r5 = 0xffff1004 > > > data_abort_handler() at data_abort_handler+0x10c > > > 
pc = 0xc0bc2278 lr = 0xc0bb2f40 (exception_exit) > > > sp = 0xdf29e8f8 fp = 0xc3e07120 > > > r4 = 0xffffffff r5 = 0xffff1004 > > > r6 = 0x3cd90e05 r7 = 0xc0e0ea48 > > > r8 = 0x0000000f r9 = 0x00000101 > > > r10 = 0x0000001d > > > exception_exit() at exception_exit > > > pc = 0xc0bb2f40 lr = 0xc0b8daf8 (uma_reclaim+0x1f8) > > > sp = 0xdf29e948 fp = 0xc3e07120 > > > r0 = 0xba9b9127 r1 = 0x8b3de5fb > > > r2 = 0xc61c1fc8 r3 = 0xba9b9126 > > > r4 = 0x00000000 r5 = 0xc61c1fc8 > > > r6 = 0x3cd90e05 r7 = 0xc0e0ea48 > > > r8 = 0x0000000f r9 = 0x00000101 > > > r10 = 0x0000001d r12 = 0x00000000 > > > uma_reclaim() at uma_reclaim+0x24c > > This looks to me like a crash in uma_reclaim() and I find UMA > > way too obscure to understand. > > > > I have no idea if it might be related, but alc@ put a fix for low > > memory situations in r272071 (or maybe it's r272221?). > > > > Might be worth trying a slightly newer kernel to see if the > > problem still occurs. > > > > And hopefully someone more conversant with UMA (or this stack > > trace) can help more. > > > > rick > > > > > pc = 0xc0b8db4c lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0) > > > sp = 0xdf29e978 fp = 0xdf29ec10 > > > r4 = 0xc3e071d8 r5 = 0xc0e0ea00 > > > r6 = 0xc3e07120 r7 = 0x00000000 > > > r8 = 0x00000102 r9 = 0xdf29ecf8 > > > r10 = 0xc61c0760 > > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0 > uma_reclaim() is not called from uma_zalloc(). > I think there is some issue with ddb on arm, which means that > the backtrace is not useful. See below for one more. 
> > > pc = 0xc0b8c800 lr = 0xc09e1df0 (nfscl_nget+0x308) > > > sp = 0xdf29e990 fp = 0xdf29ec10 > > > r4 = 0x9bb9fa43 r5 = 0x00000000 > > > r6 = 0xc550dce8 r7 = 0xc3edaa00 > > > r8 = 0xc3ebbac0 > > > nfscl_nget() at nfscl_nget+0x308 > > > pc = 0xc09e1df0 lr = 0xc09da69c (ncl_readlinkrpc+0xf60) > > > sp = 0xdf29e9d8 fp = 0xdf29ea10 > > > r4 = 0xc550dce8 r5 = 0x00000000 > > > r6 = 0xc550dcf8 r7 = 0xdf29ecf8 > > > r8 = 0xdf29ec6c r9 = 0x00000000 > > > r10 = 0xdf29ed28 > > > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60 > > > pc = 0xc09da69c lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94) > > > sp = 0xdf29ec40 fp = 0xbffff620 > > > r4 = 0xc0c95c68 r5 = 0xdf29ec6c > > > r6 = 0x00000001 r7 = 0x00020284 > > > r8 = 0xffffff9c r9 = 0x00200800 > > > r10 = 0xc5f8c960 > > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94 > I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(), > esp. without intervening frame. > Notice that the address is actually ncl_readlinkrpc+0xf60, 0xf60 is a pretty big offset into a function, it's probably in some static function that follows ncl_readlinkrpc in the source file but the symbol info has been stripped. Using addr2line on the pc and lr values will give reliable source line numbers (but I can't do that without Ronald's kernel config). 
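A minimal sketch of that addr2line step, assuming a kernel image with debug symbols is at hand. The file name kernel.debug and the helper names below are hypothetical, not taken from Ronald's setup. The script only collects the pc/lr values from a pasted ddb trace and builds the command line; -f and -e are the standard addr2line options for printing function names and naming the ELF file.

```python
import re

# Matches the "pc = 0x... lr = 0x..." pairs printed by ddb's ARM unwinder.
FRAME_RE = re.compile(r"pc = (0x[0-9a-f]+)\s+lr = (0x[0-9a-f]+)")


def frame_addresses(trace):
    """Return the unique pc/lr addresses in order of first appearance."""
    seen = []
    for pc, lr in FRAME_RE.findall(trace):
        for addr in (pc, lr):
            if addr not in seen:
                seen.append(addr)
    return seen


def addr2line_command(trace, kernel_debug="kernel.debug"):
    """Build an addr2line argv; kernel_debug is an assumed path."""
    # -f prints the enclosing function, -e selects the ELF file to query.
    return ["addr2line", "-f", "-e", kernel_debug] + frame_addresses(trace)


# Three frames copied from the trace quoted in this thread.
SAMPLE = """
> > > pc = 0xc0b8c800 lr = 0xc09e1df0 (nfscl_nget+0x308)
> > > pc = 0xc09e1df0 lr = 0xc09da69c (ncl_readlinkrpc+0xf60)
> > > pc = 0xc09da69c lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
"""

if __name__ == "__main__":
    print(" ".join(addr2line_command(SAMPLE)))
```

The printed command still has to be run against the exact kernel that panicked, otherwise the addresses will not line up with the source.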
-- Ian

> > > pc = 0xc0bdae44 lr = 0xc0aca614 (kern_mkdirat+0x18c)
> > > sp = 0xdf29ec50 fp = 0xbffff620
> > > r4 = 0xdf29ed28 r5 = 0xdf29ec90
> > > r6 = 0x00000000
> > > kern_mkdirat() at kern_mkdirat+0x18c
> > > pc = 0xc0aca614 lr = 0xc0aca684 (kern_mkdir+0x24)
> > > sp = 0xdf29ede0 fp = 0xbffff620
> > > r4 = 0x00020290 r5 = 0xc5f8c960
> > > r6 = 0x00000000 r7 = 0xc5f7f000
> > > r8 = 0x00000000 r10 = 0x00013640
> > > kern_mkdir() at kern_mkdir+0x24
> > > pc = 0xc0aca684 lr = 0xc0aca6a8 (sys_mkdir+0x1c)
> > > sp = 0xdf29edf0 fp = 0xbffff620
> > > sys_mkdir() at sys_mkdir+0x1c
> > > pc = 0xc0aca6a8 lr = 0xc0bc2884 (swi_handler+0x254)
> > > sp = 0xdf29edf8 fp = 0xbffff620
> > > swi_handler() at swi_handler+0x254
> > > pc = 0xc0bc2884 lr = 0xc0bb2ed0 (swi_exit)
> > > sp = 0xdf29ee60 fp = 0xbffff620
> > > r4 = 0x00020290 r5 = 0x2085e8e0
> > > r6 = 0x00020284 r7 = 0x00000088
> > > r8 = 0x00000001
> > > swi_exit() at swi_exit
> > > pc = 0xc0bb2ed0 lr = 0xc0bb2ed0 (swi_exit)
> > > sp = 0xdf29ee60 fp = 0xbffff620
> > > Unable to unwind further
> > >
> > >
> > > Unfortunately dumping the kernel core also panicked.
> > > db> dump > > > Physical memory: 507 MB > > > Dumping 74 MB: 71 67 63 > > > vm_fault(0xc4147000, 0, 1, 0) -> 0 > > > Fatal kernel mode data abort: 'Translation Fault (P)' > > > trapframe: 0xdf29e0b8 > > > FSR=00000017, FAR=00000014, spsr=a00000d3 > > > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004 > > > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c > > > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a > > > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060 > > > > > > panic: Fatal abort > > > Uptime: 3d18h30m32s > > > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 15:27:57 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1C06A299 for ; Sun, 26 Oct 2014 15:27:57 +0000 (UTC) Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35]) by mx1.freebsd.org (Postfix) with ESMTP id D4A722E3 for ; Sun, 26 Oct 2014 15:27:56 +0000 (UTC) Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534) id 6C2AD20E7088C; Sun, 26 Oct 2014 15:27:54 +0000 (UTC) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170]) by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 6072020E70886 for ; Sun, 26 Oct 2014 15:27:54 +0000 (UTC) Message-ID: <544D137A.7010006@multiplay.co.uk> Date: Sun, 26 Oct 2014 15:30:02 +0000 From: Steven Hartland User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> In-Reply-To: <544CC4D4.7040203@FreeBSD.org> Content-Type: multipart/mixed; 
boundary="------------080209070109080503070307" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 15:27:57 -0000 This is a multi-part message in MIME format. --------------080209070109080503070307 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit On 26/10/2014 09:54, Guido Falsi wrote: > On 10/25/14 18:02, Guido Falsi wrote: >> On 10/25/14 17:02, Guido Falsi wrote: >>> On 10/24/14 15:26, Guido Falsi wrote: >>>> Hi, >>>> >>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware >>>> using NanoBSD. >>>> >>>> By mounting and umounting UFS filesystems I have seen umount constantly >>>> hanging hard in a deadlock. I have tested on two boards with two >>>> distinct compactflash disks with same results. This was not happening >>>> with 10.0-RELEASE. >>>> >>>> I have build a 10.1-RC3 kernel with full debugging and caused the >>>> problem to happen, I got this: >>>> >>>> root@qtest:~ [0]# umount /cfg >>>> panic: detach with active requests >>>> KDB: stack backtrace: > [...] >> I must admit I am out of ideas. >> > I bisected commits and finally found out this happens starting with > r268815, which MFCed r268205. > > It is related to trim support, in fact disabling trim on the filesystm > "fixes" it. > > I filed bug #194606 on bugzilla [1] to further track this issue, if > anyone is interested. > > [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606 Nice work Guido, can you try the attached patch and see if that fixes it please? 
Regards Steve --------------080209070109080503070307 Content-Type: text/plain; charset=windows-1252; name="cf_erase.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="cf_erase.patch" Index: sys/cam/ata/ata_da.c =================================================================== --- sys/cam/ata/ata_da.c (revision 273157) +++ sys/cam/ata/ata_da.c (working copy) @@ -1470,6 +1470,8 @@ ada_cfaerase(struct ada_softc *softc, struct bio * uint64_t lba = bp->bio_pblkno; uint16_t count = bp->bio_bcount / softc->params.secsize; + bioq_remove(&softc->trim_queue, bp); + cam_fill_ataio(ataio, ada_retry_count, adadone, --------------080209070109080503070307-- From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 15:59:16 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8DA317EB for ; Sun, 26 Oct 2014 15:59:16 +0000 (UTC) Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 44569790 for ; Sun, 26 Oct 2014 15:59:15 +0000 (UTC) Received: from mail (mail [192.168.254.3]) by mail.madpilot.net (Postfix) with ESMTP id 3jQkPY0nLzzb36; Sun, 26 Oct 2014 16:58:57 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h= content-transfer-encoding:content-type:content-type:in-reply-to :references:subject:subject:mime-version:user-agent:from:from :date:date:message-id:received:received; s=mail; t=1414339134; x=1416153535; bh=OWbnC8HDhWImIe3eyBf+by0RshfiKWfuCIusxIAmMxU=; b= jZ3pKhojbQmjAdD8LWDNj+GVNXX/eQNVjCYX0EdJIrFbc1hJEa0zF5kjKPcCrFZp nIDURjREddDEytZ9yP1sr5KSsvS9xsqUtMbXMlPzPWwGNiaowmMJ46Fs+6NcowOZ jQDs+mwq5UgHi8ztuMfoDRCvoJbdTmURe0b92ngAJ38= Received: 
from mail.madpilot.net ([192.168.254.3]) by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024) with ESMTP id boi_Cr3THAeX; Sun, 26 Oct 2014 16:58:54 +0100 (CET) Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206]) by mail.madpilot.net (Postfix) with ESMTPSA; Sun, 26 Oct 2014 16:58:54 +0100 (CET) Message-ID: <544D1A3E.5000000@madpilot.net> Date: Sun, 26 Oct 2014 16:58:54 +0100 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Steven Hartland , freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> <544D137A.7010006@multiplay.co.uk> In-Reply-To: <544D137A.7010006@multiplay.co.uk> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 15:59:16 -0000 On 10/26/14 16:30, Steven Hartland wrote: > > On 26/10/2014 09:54, Guido Falsi wrote: >> On 10/25/14 18:02, Guido Falsi wrote: >>> On 10/25/14 17:02, Guido Falsi wrote: >>>> On 10/24/14 15:26, Guido Falsi wrote: >>>>> Hi, >>>>> >>>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware >>>>> using NanoBSD. >>>>> >>>>> By mounting and umounting UFS filesystems I have seen umount >>>>> constantly >>>>> hanging hard in a deadlock. I have tested on two boards with two >>>>> distinct compactflash disks with same results. This was not happening >>>>> with 10.0-RELEASE. >>>>> >>>>> I have build a 10.1-RC3 kernel with full debugging and caused the >>>>> problem to happen, I got this: >>>>> >>>>> root@qtest:~ [0]# umount /cfg >>>>> panic: detach with active requests >>>>> KDB: stack backtrace: >> [...] 
>>> I must admit I am out of ideas. >>> >> I bisected commits and finally found out this happens starting with >> r268815, which MFCed r268205. >> >> It is related to trim support, in fact disabling trim on the filesystm >> "fixes" it. >> >> I filed bug #194606 on bugzilla [1] to further track this issue, if >> anyone is interested. >> >> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606 > Nice work Guido, can you try the attached patch and see if that fixes it > please? Sure, I'll report back ASAP -- Guido Falsi From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 17:24:52 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7C7E98A3 for ; Sun, 26 Oct 2014 17:24:52 +0000 (UTC) Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35]) by mx1.freebsd.org (Postfix) with ESMTP id 4006AED2 for ; Sun, 26 Oct 2014 17:24:51 +0000 (UTC) Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534) id 1095720E7088C; Sun, 26 Oct 2014 17:24:50 +0000 (UTC) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170]) by smtp1.multiplay.co.uk (Postfix) with ESMTPS id F313C20E70886; Sun, 26 Oct 2014 17:24:49 +0000 (UTC) Message-ID: <544D2EE4.6010809@multiplay.co.uk> Date: Sun, 26 Oct 2014 17:27:00 +0000 From: Steven Hartland User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Guido Falsi , freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> <544D137A.7010006@multiplay.co.uk> <544D1A3E.5000000@madpilot.net> In-Reply-To: <544D1A3E.5000000@madpilot.net> Content-Type: multipart/mixed; 
boundary="------------020309030106010303040409" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 17:24:52 -0000 This is a multi-part message in MIME format. --------------020309030106010303040409 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit On 26/10/2014 15:58, Guido Falsi wrote: > >>> I bisected commits and finally found out this happens starting with >>> r268815, which MFCed r268205. >>> >>> It is related to trim support, in fact disabling trim on the filesystm >>> "fixes" it. >>> >>> I filed bug #194606 on bugzilla [1] to further track this issue, if >>> anyone is interested. >>> >>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606 >> Nice work Guido, can you try the attached patch and see if that fixes it >> please? > Sure, I'll report back ASAP Actually looks like the fix requires more changes than I first thought, updated patch attached. 
Regards Steve --------------020309030106010303040409 Content-Type: text/plain; charset=windows-1252; name="cf_erase.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="cf_erase.patch" SW5kZXg6IHN5cy9jYW0vYXRhL2F0YV9kYS5jCj09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy9jYW0v YXRhL2F0YV9kYS5jCShyZXZpc2lvbiAyNzMxNTcpCisrKyBzeXMvY2FtL2F0YS9hdGFfZGEu Ywkod29ya2luZyBjb3B5KQpAQCAtMTQ2Nyw5ICsxNDY3LDE1IEBAIGFkYV9kc210cmltKHN0 cnVjdCBhZGFfc29mdGMgKnNvZnRjLCBzdHJ1Y3QgYmlvICpiCiBzdGF0aWMgdm9pZAogYWRh X2NmYWVyYXNlKHN0cnVjdCBhZGFfc29mdGMgKnNvZnRjLCBzdHJ1Y3QgYmlvICpicCwgc3Ry dWN0IGNjYl9hdGFpbyAqYXRhaW8pCiB7CisJc3RydWN0IHRyaW1fcmVxdWVzdCAqcmVxID0g JnNvZnRjLT50cmltX3JlcTsKIAl1aW50NjRfdCBsYmEgPSBicC0+YmlvX3BibGtubzsKIAl1 aW50MTZfdCBjb3VudCA9IGJwLT5iaW9fYmNvdW50IC8gc29mdGMtPnBhcmFtcy5zZWNzaXpl OwogCisJYnplcm8ocmVxLCBzaXplb2YoKnJlcSkpOworCVRBSUxRX0lOSVQoJnJlcS0+YnBz KTsKKwliaW9xX3JlbW92ZSgmc29mdGMtPnRyaW1fcXVldWUsIGJwKTsKKwlUQUlMUV9JTlNF UlRfVEFJTCgmcmVxLT5icHMsIGJwLCBiaW9fcXVldWUpOworCiAJY2FtX2ZpbGxfYXRhaW8o YXRhaW8sCiAJICAgIGFkYV9yZXRyeV9jb3VudCwKIAkgICAgYWRhZG9uZSwK --------------020309030106010303040409-- From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 17:33:13 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 43D82A1E for ; Sun, 26 Oct 2014 17:33:13 +0000 (UTC) Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0159AFA1 for ; Sun, 26 Oct 2014 17:33:12 +0000 (UTC) Received: from mail (mail [192.168.254.3]) by mail.madpilot.net (Postfix) with ESMTP id 3jQmV50FN6zb38; Sun, 26 Oct 2014 18:33:01 +0100 (CET) Received: from 
mail.madpilot.net ([192.168.254.3]) by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024) with ESMTP id 6JZKaVttFZbh; Sun, 26 Oct 2014 18:32:45 +0100 (CET) Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206]) by mail.madpilot.net (Postfix) with ESMTPSA; Sun, 26 Oct 2014 18:32:40 +0100 (CET) Message-ID: <544D3038.7080901@FreeBSD.org> Date: Sun, 26 Oct 2014 18:32:40 +0100 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Steven Hartland , freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> <544D137A.7010006@multiplay.co.uk> In-Reply-To: <544D137A.7010006@multiplay.co.uk> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 17:33:13 -0000 On 10/26/14 16:30, Steven Hartland wrote: > > On 26/10/2014 09:54, Guido Falsi wrote: >> On 10/25/14 18:02, Guido Falsi wrote: >>> On 10/25/14 17:02, Guido Falsi wrote: >>>> On 10/24/14 15:26, Guido Falsi wrote: >>>>> Hi, >>>>> >>>>> I'm making some experiments with 10.1-RC3 on alix boards as hardware >>>>> using NanoBSD. >>>>> >>>>> By mounting and umounting UFS filesystems I have seen umount >>>>> constantly >>>>> hanging hard in a deadlock. I have tested on two boards with two >>>>> distinct compactflash disks with same results. This was not happening >>>>> with 10.0-RELEASE. >>>>> >>>>> I have build a 10.1-RC3 kernel with full debugging and caused the >>>>> problem to happen, I got this: >>>>> >>>>> root@qtest:~ [0]# umount /cfg >>>>> panic: detach with active requests >>>>> KDB: stack backtrace: >> [...] 
>>> I must admit I am out of ideas. >>> >> I bisected commits and finally found out this happens starting with >> r268815, which MFCed r268205. >> >> It is related to trim support, in fact disabling trim on the filesystm >> "fixes" it. >> >> I filed bug #194606 on bugzilla [1] to further track this issue, if >> anyone is interested. >> >> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606 > Nice work Guido, can you try the attached patch and see if that fixes it > please? It dies the same way with this patch applied. I tested applying the patch both in stable/10 at r268815 and to a fresh releng/10.1. -- Guido Falsi From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 17:34:52 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 84AC4AB4; Sun, 26 Oct 2014 17:34:52 +0000 (UTC) Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35]) by mx1.freebsd.org (Postfix) with ESMTP id 4BA81FB5; Sun, 26 Oct 2014 17:34:51 +0000 (UTC) Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534) id 0780B20E7088C; Sun, 26 Oct 2014 17:34:51 +0000 (UTC) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170]) by smtp1.multiplay.co.uk (Postfix) with ESMTPS id EC21020E70886; Sun, 26 Oct 2014 17:34:50 +0000 (UTC) Message-ID: <544D313E.8090908@multiplay.co.uk> Date: Sun, 26 Oct 2014 17:37:02 +0000 From: Steven Hartland User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Guido Falsi , freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org> In-Reply-To: 
<544D3038.7080901@FreeBSD.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 17:34:52 -0000

On 26/10/2014 17:32, Guido Falsi wrote:
> On 10/26/14 16:30, Steven Hartland wrote:
>>
>>> I bisected commits and finally found out this happens starting with
>>> r268815, which MFCed r268205.
>>>
>>> It is related to trim support, in fact disabling trim on the filesystem
>>> "fixes" it.
>>>
>>> I filed bug #194606 on bugzilla [1] to further track this issue, if
>>> anyone is interested.
>>>
>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606
>> Nice work Guido, can you try the attached patch and see if that fixes it
>> please?
> It dies the same way with this patch applied. I tested applying the
> patch both in stable/10 at r268815 and to a fresh releng/10.1.
>
Looks like our mails might have crossed over; was this with the original
patch or the updated one?
From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 17:37:59 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 090DFB4C for ; Sun, 26 Oct 2014 17:37:59 +0000 (UTC) Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BA2DEFD0 for ; Sun, 26 Oct 2014 17:37:58 +0000 (UTC) Received: from mail (mail [192.168.254.3]) by mail.madpilot.net (Postfix) with ESMTP id 3jQmbZ6NYVzb0g; Sun, 26 Oct 2014 18:37:46 +0100 (CET) Received: from mail.madpilot.net ([192.168.254.3]) by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024) with ESMTP id mbCySwbVk4xl; Sun, 26 Oct 2014 18:37:31 +0100 (CET) Received: from tommy.madpilot.net (micro.madpilot.net [88.149.173.206]) by mail.madpilot.net (Postfix) with ESMTPSA; Sun, 26 Oct 2014 18:37:26 +0100 (CET) Message-ID: <544D3156.5030402@FreeBSD.org> Date: Sun, 26 Oct 2014 18:37:26 +0100 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Steven Hartland , freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org> <544D313E.8090908@multiplay.co.uk> In-Reply-To: <544D313E.8090908@multiplay.co.uk> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Sun, 26 Oct 2014 17:37:59 -0000 On 10/26/14 18:37, Steven Hartland wrote: > > On 26/10/2014 17:32, Guido Falsi wrote: >> On 10/26/14 16:30, Steven Hartland wrote: >>> >>>> I bisected commits and finally found out this happens starting with >>>> r268815, which MFCed r268205. >>>> >>>> It is related to trim support, in fact disabling trim on the filesystm >>>> "fixes" it. >>>> >>>> I filed bug #194606 on bugzilla [1] to further track this issue, if >>>> anyone is interested. >>>> >>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606 >>> Nice work Guido, can you try the attached patch and see if that fixes it >>> please? >> It dies the same way with this patch applied. I tested applying the >> patch both in stable/10 at r268815 and to a fresh releng/10.1. >> > Looks like our mails might have cross over, was this with the original > patch or the updated one? Original one. I just saw the new one, I'll followup shortly. -- Guido Falsi From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 17:59:49 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 46DE983C for ; Sun, 26 Oct 2014 17:59:49 +0000 (UTC) Received: from mail.madpilot.net (grunt.madpilot.net [78.47.145.38]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 03CE824A for ; Sun, 26 Oct 2014 17:59:48 +0000 (UTC) Received: from mail (mail [192.168.254.3]) by mail.madpilot.net (Postfix) with ESMTP id 3jQn4n0YgYzb3G; Sun, 26 Oct 2014 18:59:37 +0100 (CET) Received: from mail.madpilot.net ([192.168.254.3]) by mail (mail.madpilot.net [192.168.254.3]) (amavisd-new, port 10024) with ESMTP id RYh5vvMCiyVB; Sun, 26 Oct 2014 18:59:21 +0100 (CET) Received: from tommy.madpilot.net (micro.madpilot.net 
[88.149.173.206]) by mail.madpilot.net (Postfix) with ESMTPSA; Sun, 26 Oct 2014 18:59:16 +0100 (CET) Message-ID: <544D3674.3030005@FreeBSD.org> Date: Sun, 26 Oct 2014 18:59:16 +0100 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Steven Hartland , freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org> <544D313E.8090908@multiplay.co.uk> <544D3156.5030402@FreeBSD.org> In-Reply-To: <544D3156.5030402@FreeBSD.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 17:59:49 -0000 On 10/26/14 18:37, Guido Falsi wrote: > On 10/26/14 18:37, Steven Hartland wrote: >> >> On 26/10/2014 17:32, Guido Falsi wrote: >>> On 10/26/14 16:30, Steven Hartland wrote: >>>> >>>>> I bisected commits and finally found out this happens starting with >>>>> r268815, which MFCed r268205. >>>>> >>>>> It is related to trim support, in fact disabling trim on the filesystm >>>>> "fixes" it. >>>>> >>>>> I filed bug #194606 on bugzilla [1] to further track this issue, if >>>>> anyone is interested. >>>>> >>>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606 >>>> Nice work Guido, can you try the attached patch and see if that fixes it >>>> please? >>> It dies the same way with this patch applied. I tested applying the >>> patch both in stable/10 at r268815 and to a fresh releng/10.1. >>> >> Looks like our mails might have cross over, was this with the original >> patch or the updated one? > > Original one. 
I just saw the new one, I'll followup shortly. > Tested again with new patch, against releng/10.1. Fixes it for me, I've been unable to make it crash again as before. Is there a chance to get this one in for 10.1-RELEASE? Thanks for the patch and time! -- Guido Falsi From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 18:27:00 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A60506C7; Sun, 26 Oct 2014 18:27:00 +0000 (UTC) Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35]) by mx1.freebsd.org (Postfix) with ESMTP id 6C2767B3; Sun, 26 Oct 2014 18:27:00 +0000 (UTC) Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534) id 43F0620E7088C; Sun, 26 Oct 2014 18:26:58 +0000 (UTC) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170]) by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 348CA20E70886; Sun, 26 Oct 2014 18:26:58 +0000 (UTC) Message-ID: <544D3D77.8000605@multiplay.co.uk> Date: Sun, 26 Oct 2014 18:29:11 +0000 From: Steven Hartland User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Guido Falsi , freebsd-fs@freebsd.org Subject: Re: panic: detach with active requests on 10.1-RC3 References: <544A538F.6060202@FreeBSD.org> <544BBB85.2020909@madpilot.net> <544BC990.4030700@madpilot.net> <544CC4D4.7040203@FreeBSD.org> <544D137A.7010006@multiplay.co.uk> <544D3038.7080901@FreeBSD.org> <544D313E.8090908@multiplay.co.uk> <544D3156.5030402@FreeBSD.org> <544D3674.3030005@FreeBSD.org> In-Reply-To: <544D3674.3030005@FreeBSD.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 18:27:00 -0000

On 26/10/2014 17:59, Guido Falsi wrote:
> Tested again with new patch, against releng/10.1. Fixes it for me,
> I've been unable to make it crash again as before. Is there a chance
> to get this one in for 10.1-RELEASE? Thanks for the patch and time!

Thanks for testing, Guido. I have already informed re@ about this being a
potential blocker, so yes, I'm looking to get this in for 10.1.

Regards
Steve

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 26 21:00:11 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9E02CE6 for ; Sun, 26 Oct 2014 21:00:11 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 75B136C3 for ; Sun, 26 Oct 2014 21:00:11 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id s9QL0B8E080271 for ; Sun, 26 Oct 2014 21:00:11 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <201410262100.s9QL0B8E080271@kenobi.freebsd.org> From: bugzilla-noreply@FreeBSD.org To: freebsd-fs@FreeBSD.org Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 Date: Sun, 26 Oct 2014 21:00:11 +0000 Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Oct 2014 21:00:11 -0000

To view an individual PR, use:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id). The following is a listing of current problems submitted by FreeBSD users, which need special attention. These represent problem reports covering all versions including experimental development code and obsolete releases. Status | Bug Id | Description ----------------+-----------+------------------------------------------------- Needs MFC | 136470 | [nfs] Cannot mount / in read-only, over NFS Needs MFC | 139651 | [nfs] mount(8): read-only remount of NFS volume Needs MFC | 144447 | [zfs] sharenfs fsunshare() & fsshare_main() non 3 problems total for which you should take action. From owner-freebsd-fs@FreeBSD.ORG Mon Oct 27 00:22:45 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 51CBD8BA; Mon, 27 Oct 2014 00:22:45 +0000 (UTC) Received: from mail.jrv.org (adsl-70-243-84-11.dsl.austtx.swbell.net [70.243.84.11]) by mx1.freebsd.org (Postfix) with ESMTP id 08876B34; Mon, 27 Oct 2014 00:22:44 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.jrv.org (Postfix) with ESMTP id D2B0F1B6C41; Sun, 26 Oct 2014 19:22:36 -0500 (CDT) Received: from mail.jrv.org ([127.0.0.1]) by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Jy30G9Y-t3Az; Sun, 26 Oct 2014 19:22:26 -0500 (CDT) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.jrv.org (Postfix) with ESMTP id D79A11B6C3C; Sun, 26 Oct 2014 19:22:26 -0500 (CDT) X-Virus-Scanned: amavisd-new at zimbra64.housenet.jrv Received: from mail.jrv.org ([127.0.0.1]) by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id Dh7RDba9VwES; Sun, 26 Oct 2014 19:22:26 -0500 (CDT) Received: from [192.168.138.128] (BMX.housenet.jrv [192.168.3.140]) by 
mail.jrv.org (Postfix) with ESMTPSA id B50751B6C39; Sun, 26 Oct 2014 19:22:26 -0500 (CDT) Message-ID: <544D9056.10805@jrv.org> Date: Sun, 26 Oct 2014 18:22:46 -0600 From: "James R. Van Artsdalen" User-Agent: Mozilla/5.0 (Windows NT 5.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: "James R. Van Artsdalen" Subject: Re: zfs recv hangs in kmem arena References: <54250AE9.6070609@jrv.org> <543FAB3C.4090503@jrv.org> In-Reply-To: <543FAB3C.4090503@jrv.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, current@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 00:22:45 -0000 I was able to complete a ZFS replication by manually intervening each time "zfs recv" blocked on "kmem arena": running the program at the end was sufficient to unblock zfs each of the 17 times it stalled. The program is intended to consume about 24GB RAM out of 32GB physical RAM, thereby pressuring the ARC and kernel cache to shrink: when the program exits it would leave plenty of free RAM for zfs or whatever else. What actually happened is that every time, zfs unblocked as the program below was growing: it was never necessary to wait for the program to exit and free memory before zfs unblocked. On 10/16/2014 6:25 AM, James R. Van Artsdalen wrote: > The zfs recv / kmem arena hang happens with -CURRENT as well as > 10-STABLE, on two different systems, with 16GB or 32GB of RAM, from > memstick or normal multi-user environments, > > Hangs usually seem to hapeen 1TB to 3TB in, but last night one run hung > after only 4.35MB. > > On 9/26/2014 1:42 AM, James R. 
Van Artsdalen wrote: >> FreeBSD BLACKIE.housenet.jrv 10.1-BETA2 FreeBSD 10.1-BETA2 #2 r272070M: >> Wed Sep 24 17:36:56 CDT 2014 >> james@BLACKIE.housenet.jrv:/usr/obj/usr/src/sys/GENERIC amd64 >> >> With current STABLE10 I am unable to replicate a ZFS pool using zfs >> send/recv without zfs hanging in state "kmem arena", within the first >> 4TB or so (of a 23TB Pool). >> >> The most recent attempt used this command line >> >> SUPERTEX:/root# zfs send -R BIGTEX/UNIX@syssnap | ssh BLACKIE zfs recv >> -duvF BIGTOX >> >> though local replications fail in kmem arena too. >> >> The two machines I've been attempting this on have 16BG and 32GB of RAM >> each and are otherwise idle. >> >> Any suggestions on how to get around, or investigate, "kmem arena"? >> >> # top >> last pid: 3272; load averages: 0.22, 0.22, 0.23 up >> 0+08:25:02 01:32:07 >> 34 processes: 1 running, 33 sleeping >> CPU: 0.0% user, 0.0% nice, 0.1% system, 0.0% interrupt, 99.9% idle >> Mem: 21M Active, 82M Inact, 15G Wired, 28M Cache, 450M Free >> ARC: 12G Total, 24M MFU, 12G MRU, 23M Anon, 216M Header, 47M Other >> Swap: 16G Total, 16G Free >> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU >> COMMAND >> 1173 root 1 52 0 86476K 7780K select 0 124:33 0.00% sshd >> 1176 root 1 46 0 87276K 47732K kmem a 3 48:36 0.00% zfs >> 968 root 32 20 0 12344K 1888K rpcsvc 0 0:13 0.00% nfsd >> 1009 root 1 20 0 25452K 2864K select 3 0:01 0.00% ntpd >> ... 
#include <stdlib.h>
#include <string.h>

/* Just under 4GB per allocation; six of them consume about 24GB. */
long long s = ((long long) 1 << 32) - 65;

int
main(void)
{
    char *p;

    /* Write to every byte so each allocation is actually backed by RAM. */
    p = calloc (s, 1); memset (p, 1, s);
    p = calloc (s, 1); memset (p, 1, s);
    p = calloc (s, 1); memset (p, 1, s);
    p = calloc (s, 1); memset (p, 1, s);
    p = calloc (s, 1); memset (p, 1, s);
    p = calloc (s, 1); memset (p, 1, s);
    return (0);
}

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 27 08:00:10 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9749D197 for ; Mon, 27 Oct 2014 08:00:10 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 850C2A69 for ; Mon, 27 Oct 2014 08:00:10 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id s9R80AjV070626 for ; Mon, 27 Oct 2014 08:00:10 GMT (envelope-from bugzilla-noreply@freebsd.org) Message-Id: <201410270800.s9R80AjV070626@kenobi.freebsd.org> From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [FreeBSD Bugzilla] Commit Needs MFC MIME-Version: 1.0 X-Bugzilla-Type: whine X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated Date: Mon, 27 Oct 2014 08:00:10 +0000 Content-Type: text/plain X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 08:00:10 -0000

Hi, You have a bug in the "Needs MFC" state which has not been touched in 7 or more days. This email serves as a reminder that you may want to MFC this bug or mark it as completed.
In the event you have a longer MFC timeout you may update this bug with a comment and I won't remind you again for 7 days. This reminder is only sent on Mondays. Please file a bug about concerns you may have. This search was scheduled by eadler@FreeBSD.org. (3 bugs) Bug 136470: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=136470 Severity: Affects Only Me Priority: Normal Hardware: Any Assignee: freebsd-fs@FreeBSD.org Status: Needs MFC Resolution: Summary: [nfs] Cannot mount / in read-only, over NFS Bug 139651: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=139651 Severity: Affects Only Me Priority: Normal Hardware: Any Assignee: freebsd-fs@FreeBSD.org Status: Needs MFC Resolution: Summary: [nfs] mount(8): read-only remount of NFS volume does not work Bug 144447: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144447 Severity: Affects Only Me Priority: Normal Hardware: Any Assignee: freebsd-fs@FreeBSD.org Status: Needs MFC Resolution: Summary: [zfs] sharenfs fsunshare() & fsshare_main() non functional From owner-freebsd-fs@FreeBSD.ORG Mon Oct 27 15:16:12 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2858EFE3 for ; Mon, 27 Oct 2014 15:16:12 +0000 (UTC) Received: from mail1.sandvine.com (Mail1.sandvine.com [64.7.137.134]) by mx1.freebsd.org (Postfix) with ESMTP id BEFD1382 for ; Mon, 27 Oct 2014 15:16:11 +0000 (UTC) Received: from WTL-EXCHP-1.sandvine.com ([fe80::ac6b:cc1e:f2ff:93aa]) by wtl-exchp-2.sandvine.com ([fe80::68ac:f071:19ff:3455%19]) with mapi id 14.03.0195.001; Mon, 27 Oct 2014 11:15:01 -0400 From: Adam Parco To: "freebsd-fs@freebsd.org" Subject: panic: devfs_fsync: vop_stdfsync failed. Thread-Topic: panic: devfs_fsync: vop_stdfsync failed. 
Thread-Index: Ac/x88uhWWQKkD58QMu09zdrpmOZwQ== Date: Mon, 27 Oct 2014 15:15:00 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [192.168.200.58] MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 15:16:12 -0000

Hello,

I am investigating making FreeBSD 8.2 more resilient to removing a USB device during a write. In my short testing I have gotten 3 different failures. I would like to discuss potential solutions for the first failure. I looked at head source and there don't appear to be any fixes for this particular issue.

From what I have gathered, it looks like:
- the filesystem synchronizer daemon wakes up
- tries to sync vnodes
- devfs_fsync realizes the device went away with dirty set, warns of data loss:
  o "Device da0s1 went missing before all of the data could be written to it; expect data loss."
- vop_stdfsync loops a couple times trying to handle the dirty items
- eventually gives up
  o "fsync: giving up on dirty"
- Dirty count will still be >0 and errno should be EAGAIN (not 0)
- Panic
  o "devfs_fsync: vop_stdfsync failed."

When we give up on dirty, should we be clearing the dirty count? Otherwise we will always panic after this.

Thoughts? Other suggestions?

Thanks,
Adam.

bt:
#0 doadump () at pcpu.h:224
#1 0xffffffff8045fbe9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:508
#2 0xffffffff8046011d in panic (fmt=0x1 ) at /usr/src/sys/kern/kern_shutdown.c:775
#3 0xffffffff803ec3db in devfs_fsync (ap=0xffffff862b3edb20) at /usr/src/sys/fs/devfs/devfs_vnops.c:569
#4 0xffffffff8069c8ca in VOP_FSYNC_APV (vop=0xffffffff80892fa0, a=0xffffff862b3edb20) at vnode_if.c:1267
#5 0xffffffff804e3e27 in sync_vnode (slp=0xffffff0006136af0, bo=0xffffff862b3edbc0, td=0xffffff0006005480) at vnode_if.h:549
#6 0xffffffff804e406d in sched_sync () at /usr/src/sys/kern/vfs_subr.c:1841
#7 0xffffffff80438ed2 in fork_exit (callout=0xffffffff804e3ec0 , arg=0x0, frame=0xffffff862b3edc50) at /usr/src/sys/kern/kern_fork.c:847
#8 0xffffffff80633dbe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:599

console:
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: Removable Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 7441MB (15240576 512 byte sectors: 255H 63S/T 948C)
[-- MARK -- Fri Oct 24 13:39:00 2014]
ugen1.3: at usbus1 (disconnected)
umass0: at uhub3, port 1, addr 3 (disconnected)
(da0:umass-sim0:0:0:0): AutoSense failed
g_vfs_done():da0s1[WRITE(offset=50139136, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50204672, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50270208, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50335744, length=65536)](da0:umass-sim0:0:0:0): lost device error = 5
g_vfs_done():da0s1[WRITE(offset=50401280, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50466816, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50532352, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50597888, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50663424, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50728960, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50794496, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50860032, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50925568, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50991104, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=50073600, length=65536)]error = 5
g_vfs_done():da0s1[WRITE(offset=51056640, length=65536)]error = 6
g_vfs_done():da0s1[WRITE(offset=51122176, length=65536)]error = 6
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
(da0:umass-sim0:0:0:0): removing device entry
g_vfs_done():[unknown][WRITE(offset=51187712, length=65536)]error = 6
g_vfs_done():[unknown][WRITE(offset=51253248, length=65536)]error = 6
g_vfs_done():[unknown][WRITE(offset=51318784, length=65536)]error = 6
Device da0s1 went missing before all of the data could be written to it; expect data loss.
fsync: giving up on dirty
0xffffff08c7eec1d8: tag devfs, type VCHR
    usecount 1, writecount 0, refcount 4 mountedhere 0xffffff0109ddf400
    flags (VI_DOOMED)
    v_object 0xffffff08c7ec7438 ref 0 pages 2169
    lock type devfs: EXCL by thread 0xffffff0adf0b6900 (pid 64960)
    dev da0s1
panic: devfs_fsync: vop_stdfsync failed.
pci: 1752 correctable, 0 uncorrectable, 0 fatal
cpu: 2 correctable, 0 uncorrectable, 0 fatal
cpuid = 4
curthread = getty/getty (64960/100629)
cpu_ticks = 19048675905756
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff801e51ca = db_trace_self_wrapper+0x2a
panic() at 0xffffffff80460148 = panic+0x228
devfs_fsync() at 0xffffffff803ec3db = devfs_fsync+0x8b
VOP_FSYNC_APV() at 0xffffffff8069c8ca = VOP_FSYNC_APV+0x4a
bufsync() at 0xffffffff804cbcd8 = bufsync+0x38
bufobj_invalbuf() at 0xffffffff804e1997 = bufobj_invalbuf+0x87
vgonel() at 0xffffffff804e1c56 = vgonel+0xb6
vgone() at 0xffffffff804e1e89 = vgone+0x39
devfs_delete() at 0xffffffff803eab89 = devfs_delete+0x189
devfs_populate_loop() at 0xffffffff803eb37d = devfs_populate_loop+0x3ad
devfs_populate() at 0xffffffff803eb461 = devfs_populate+0x21
devfs_lookup() at 0xffffffff803eeb94 = devfs_lookup+0x2d4
VOP_LOOKUP_APV() at 0xffffffff8069e9bc = VOP_LOOKUP_APV+0x4c
lookup() at 0xffffffff804d7eea = lookup+0x37a
namei() at 0xffffffff804d8cff = namei+0x3bf
vn_open_cred() at 0xffffffff804ed583 = vn_open_cred+0x1e3
kern_openat() at 0xffffffff804eaab9 = kern_openat+0x149
syscallenter() at 0xffffffff8049bad4 = syscallenter+0x104
syscall() at 0xffffffff8064a15c = syscall+0x4c
Xfast_syscall() at 0xffffffff80633b52 = Xfast_syscall+0xe2
--- syscall (5, FreeBSD ELF64, open), rip = 0x300845cfc, rsp = 0x7fffffffecd8, rbp = 0x5086a0 ---
Uptime: 2h26m42s
Physical memory: 49040 MB
Dumping 1934 MB: 1919 1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071[-- MARK -- Fri Oct 24 13:40:00 2014]
1055 1039 1023 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415
399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
Dump complete

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 27 18:34:12 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 122D8287; Mon, 27 Oct 2014 18:34:12 +0000 (UTC) Received: from mail-vc0-x235.google.com (mail-vc0-x235.google.com [IPv6:2607:f8b0:400c:c03::235]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AE94CF62; Mon, 27 Oct 2014 18:34:11 +0000 (UTC) Received: by mail-vc0-f181.google.com with SMTP id hy10so1404492vcb.26 for ; Mon, 27 Oct 2014 11:34:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=01sPY+OMHWg/Up+AutcbW5dtl/bmfoCU5izse5Na2L4=; b=C9We9w5vkS5uLO0ySoNiB2CG8XkoxUrgnZ5NaW7aA56efwvawZy7z62OyXrJqrnwk7 f2+WaY50mgWrq79PQxyaQR6DtFQL7AER5BatYuaFNsRLLRM5n0m+65L+Nbbl3RPJP9Zb Ogz9s0gfwcY7EnEHZDqg5QBcsQYUjyFgYlsTV1VoR9IB4I/EDy6MzpD5q9iVfPdNKs3D 1yH+kdFvZrnqW3kMvzXtCW3VIAt0MudZ9I1feB2LmEt4rIpge+Q7fIJYli4Ok/AqJTzL YUGVaA4dhULu9/ntYiSxzizNbb+/9bnu8gy0S6Em15BddWfe2fnq694ca2hlJmKS4Vc7 u0ZA== MIME-Version: 1.0 X-Received: by 10.220.213.197 with SMTP id gx5mr1326433vcb.51.1414434850001; Mon, 27 Oct 2014 11:34:10 -0700 (PDT) Received: by 10.220.118.73 with HTTP; Mon, 27 Oct 2014 11:34:09 -0700 (PDT) In-Reply-To: References: <544B12B8.8060302@freebsd.org> Date: Mon, 27 Oct 2014 14:34:09 -0400 Message-ID: Subject: Re: ZFS errors on the array but not the disk.
From: Zaphod Beeblebrox To: Steven Hartland Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 18:34:12 -0000

Ok... This is just frustrating. I've trusted ZFS through many versions ... and pretty much ... it's delivered. There are five symptoms here:

1. After each reboot, resilver starts again... even if after the resilver I complete a full scrub.

2. Seemingly random objects (files, zvols or snapshot items) get marked as having errors. When I say random, to be clear: different items each time.

3. None of the drives are showing errors in zpool status, neither are they chucking errors into dmesg.

4. Errors are being logged against the vdev (only one of the two vdevs) and the array (half as many as the vdev).

5. The activity light for the recently replaced disk does not "flash" "with" the others in its vdev during either resilver or scrub. This last bit might need some explanation. I realize that raidz-1 stripes do not always use all the disks, but "generally" the activity lights of the drives in a vdev go "together"... In this case, the light of the recently replaced drive is off much of the time ...

Is there anything I can/should do? I pulled the new disk, moved its partitions around (it's larger than the array disks because you can't buy 1.5T drives anymore) and then re-added it... so I've tried that.

On Fri, Oct 24, 2014 at 11:47 PM, Zaphod Beeblebrox wrote:
> Thanks for the heads up. I'm following releng/10.1 and 271683 seems to be
> part of that, but a good catch/guess.
> > > On Fri, Oct 24, 2014 at 11:02 PM, Steven Hartland wrote: > >> There was an issue which would cause resilver restarts fixed by *265253* < >> https://svnweb.freebsd.org/base?view=revision&revision=265253> which was >> MFC'ed to stable/10 by *271683* > base?view=revision&revision=271683>so you'll want to make sure your >> latter than that. >> >> >> On 24/10/2014 19:42, Zaphod Beeblebrox wrote: >> >>> I manually replaced a disk... and the array was scrubbed recently. >>> Interestingly, I seem to be in the "endless loop" of resilvering >>> problem. >>> Not much I can find on it. but resilvering will complete and I can then >>> run another scrub. It will complete, too. Then rebooting causes another >>> resilvering. >>> >>> Another odd data point: it seems as if the things that show up as >>> "errors" >>> change from resilvering to resilvering. >>> >>> One bug, it would seem, is that once ZFS has detected an error... another >>> scrub can reset it, but no attempt is made to read-through the error if >>> you >>> access the object directly. >>> >>> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers >>> wrote: >>> >>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox >>>> wrote: >>>> >>>>> What does it mean when checksum errors appear on the array (and the >>>>> vdev) >>>>> but not on any of the disks? See the paste below. One would think >>>>> that >>>>> there isn't some ephemeral data stored somewhere that is not one of the >>>>> disks, yet "cksum" errors show only on the vdev and the array lines. >>>>> >>>> Help? >>>> >>>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status >>>>> pool: vr2 >>>>> state: ONLINE >>>>> status: One or more devices is currently being resilvered. The pool >>>>> will >>>>> continue to function, possibly in a degraded state. >>>>> action: Wait for the resilver to complete. 
>>>>> scan: resilver in progress since Thu Oct 23 23:11:29 2014 >>>>> 1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go >>>>> 119G resilvered, 6.79% done >>>>> config: >>>>> >>>>> NAME STATE READ WRITE CKSUM >>>>> vr2 ONLINE 0 0 36 >>>>> raidz1-0 ONLINE 0 0 72 >>>>> label/vr2-d0 ONLINE 0 0 0 >>>>> label/vr2-d1 ONLINE 0 0 0 >>>>> gpt/vr2-d2c ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native (resilvering) >>>>> gpt/vr2-d3b ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> gpt/vr2-d4a ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> ada14 ONLINE 0 0 0 >>>>> label/vr2-d6 ONLINE 0 0 0 >>>>> label/vr2-d7c ONLINE 0 0 0 >>>>> label/vr2-d8 ONLINE 0 0 0 >>>>> raidz1-1 ONLINE 0 0 0 >>>>> gpt/vr2-e0 ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> gpt/vr2-e1 ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> gpt/vr2-e2 ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> gpt/vr2-e3 ONLINE 0 0 0 >>>>> gpt/vr2-e4 ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> gpt/vr2-e5 ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> gpt/vr2-e6 ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> gpt/vr2-e7 ONLINE 0 0 0 block size: >>>>> 512B >>>>> configured, 4096B native >>>>> >>>>> errors: 43 data errors, use '-v' for a list >>>>> >>>> The checksum errors will appear on the raidz vdev instead of a leaf if >>>> vdev_raidz.c can't determine which leaf vdev was responsible. This >>>> could happen if two or more leaf vdevs return bad data for the same >>>> block, which would also lead to unrecoverable data errors. I see that >>>> you have some unrecoverable data errors, so maybe that's what happened >>>> to you. >>>> >>>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable >>>> to determine which child was responsible for a checksum error. 
>>>> However, I've only seen that happen when a raidz vdev has a mirror >>>> child. That can only happen if the child is a spare or replacing >>>> vdev. Did you activate any spares, or did you manually replace a >>>> vdev? >>>> >>>> -Alan >>>> >>>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> >>> >>> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > > From owner-freebsd-fs@FreeBSD.ORG Mon Oct 27 23:13:44 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EED6C9BD for ; Mon, 27 Oct 2014 23:13:44 +0000 (UTC) Received: from mail-wi0-x231.google.com (mail-wi0-x231.google.com [IPv6:2a00:1450:400c:c05::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 703C730A for ; Mon, 27 Oct 2014 23:13:44 +0000 (UTC) Received: by mail-wi0-f177.google.com with SMTP id ex7so6002wid.16 for ; Mon, 27 Oct 2014 16:13:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nofocus.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=OCemWUEe5obl8Ac/+eSYevorlVK9O8ERvOttN0sdQu4=; b=NwNaZf4Ori+5PJajm3XYnq8piLRbuQN2cPb99bhvvLX74wVp7xYZ4li9SdFY+z8yT7 MMZyE+jLIBh09k4s0AgOJ1ggR3l61mvYpJZTtW8w4RsmCYajK1aFYUrPjLaLF37r9sVM 7ksFzkQOKai3HjLksFCszi+U4wMN28v3qYoyI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=OCemWUEe5obl8Ac/+eSYevorlVK9O8ERvOttN0sdQu4=; b=Ac9jBM8HXcQdR4u66gtyxAgPG82tkKGEbe7u9a0owtrT8ivIKZWQ3WHr6/ubmWo0yO B+yEqlDgeorNWs5jopBvn18HlNJ4exgB+J5LkdYA99L4wdPSHgeijM7x9ooNdsAEPCQs z1/8UAu8Uq83z57oiahIqq4/zbc3uP9Wvd/4ZKMdTfIj/gc7P+sBIGhKrsl2loMnU0YC X5eS2os7oojElgtICLpgcrj2l+rcHXU8CHHp4LiGItqxNyQmwMxmGGqLGFoLlsYEdwgp Inb0V0JKR3gc+70vNJ+pBAwVoBET8wDIEp7Td3Hnb8K72jjMl5gdarvHY7/DWcvNECma lRjQ== X-Gm-Message-State: ALoCoQkaOpCm/6JSedzhp75ZTg9n+djAYXjvFCzg5tcwhGoCHFA6GpVePYRt1+qBPEyYirP1IqdT X-Received: by 10.194.58.205 with SMTP id t13mr24834934wjq.55.1414451622405; Mon, 27 Oct 2014 16:13:42 -0700 (PDT) MIME-Version: 1.0 Received: by 10.180.103.10 with HTTP; Mon, 27 Oct 2014 16:13:22 -0700 (PDT) In-Reply-To: References: <544B12B8.8060302@freebsd.org> From: Robert Banz Date: Mon, 27 Oct 2014 16:13:22 -0700 Message-ID: Subject: Re: ZFS errors on the array but not the disk. To: Zaphod Beeblebrox Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: freebsd-fs , Steven Hartland X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 23:13:45 -0000 Have you tried different hardware? This screams something's up anywhere in the stack -- DRAM, cabling, controller... On Mon, Oct 27, 2014 at 11:34 AM, Zaphod Beeblebrox wrote: > Ok... This is just frustrating. I've trusted ZFS through many versions ... > and pretty much ... it's delivered. There are five symptoms here: > > 1. after each reboot, resilver starts again... even if after the resilver I > complete a full scrub. > > 2. seemingly random objects (files, zvols or snapshot items) get marked as > having errors. when I say random, to be clear; different items each time. > > 3. 
none of the drives are showing errors in zpool status, neither are they > chucking errors into dmesg. > > 4. errors are being logged against the vdev (only one of the two vdevs) and > the array (half as many as the vdev). > > 5. The activity light for the recently replaced disk does not "flash" > "with" the others in it's vdev during either resilver or scrub. This last > bit might need some explanation. I realize that raidz-1 stripes do not > always use all the disks, but "generally" the activity lights of the drives > in a vdev go "together"... In this case, the light of the recently replaced > drive is off much of the time ... > > Is there anything I can/should do? I pulled the new disk, moved it's > partitions around (it's larger than the array disks because you can't buy > 1.5T drives anymore) and then re-added it... so I've tried that. > > > On Fri, Oct 24, 2014 at 11:47 PM, Zaphod Beeblebrox > wrote: > > > Thanks for the heads up. I'm following releng/10.1 and 271683 seems to > be > > part of that, but a good catch/guess. > > > > > > On Fri, Oct 24, 2014 at 11:02 PM, Steven Hartland > wrote: > > > >> There was an issue which would cause resilver restarts fixed by > *265253* < > >> https://svnweb.freebsd.org/base?view=revision&revision=265253> which > was > >> MFC'ed to stable/10 by *271683* >> base?view=revision&revision=271683>so you'll want to make sure your > >> latter than that. > >> > >> > >> On 24/10/2014 19:42, Zaphod Beeblebrox wrote: > >> > >>> I manually replaced a disk... and the array was scrubbed recently. > >>> Interestingly, I seem to be in the "endless loop" of resilvering > >>> problem. > >>> Not much I can find on it. but resilvering will complete and I can > then > >>> run another scrub. It will complete, too. Then rebooting causes > another > >>> resilvering. > >>> > >>> Another odd data point: it seems as if the things that show up as > >>> "errors" > >>> change from resilvering to resilvering. 
> >>> > >>> One bug, it would seem, is that once ZFS has detected an error... > another > >>> scrub can reset it, but no attempt is made to read-through the error if > >>> you > >>> access the object directly. > >>> > >>> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers > >>> wrote: > >>> > >>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox < > zbeeble@gmail.com> > >>>> wrote: > >>>> > >>>>> What does it mean when checksum errors appear on the array (and the > >>>>> vdev) > >>>>> but not on any of the disks? See the paste below. One would think > >>>>> that > >>>>> there isn't some ephemeral data stored somewhere that is not one of > the > >>>>> disks, yet "cksum" errors show only on the vdev and the array lines. > >>>>> > >>>> Help? > >>>> > >>>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status > >>>>> pool: vr2 > >>>>> state: ONLINE > >>>>> status: One or more devices is currently being resilvered. The pool > >>>>> will > >>>>> continue to function, possibly in a degraded state. > >>>>> action: Wait for the resilver to complete. 
> >>>>> scan: resilver in progress since Thu Oct 23 23:11:29 2014 > >>>>> 1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go > >>>>> 119G resilvered, 6.79% done > >>>>> config: > >>>>> > >>>>> NAME STATE READ WRITE CKSUM > >>>>> vr2 ONLINE 0 0 36 > >>>>> raidz1-0 ONLINE 0 0 72 > >>>>> label/vr2-d0 ONLINE 0 0 0 > >>>>> label/vr2-d1 ONLINE 0 0 0 > >>>>> gpt/vr2-d2c ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native (resilvering) > >>>>> gpt/vr2-d3b ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> gpt/vr2-d4a ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> ada14 ONLINE 0 0 0 > >>>>> label/vr2-d6 ONLINE 0 0 0 > >>>>> label/vr2-d7c ONLINE 0 0 0 > >>>>> label/vr2-d8 ONLINE 0 0 0 > >>>>> raidz1-1 ONLINE 0 0 0 > >>>>> gpt/vr2-e0 ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> gpt/vr2-e1 ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> gpt/vr2-e2 ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> gpt/vr2-e3 ONLINE 0 0 0 > >>>>> gpt/vr2-e4 ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> gpt/vr2-e5 ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> gpt/vr2-e6 ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> gpt/vr2-e7 ONLINE 0 0 0 block size: > >>>>> 512B > >>>>> configured, 4096B native > >>>>> > >>>>> errors: 43 data errors, use '-v' for a list > >>>>> > >>>> The checksum errors will appear on the raidz vdev instead of a leaf if > >>>> vdev_raidz.c can't determine which leaf vdev was responsible. This > >>>> could happen if two or more leaf vdevs return bad data for the same > >>>> block, which would also lead to unrecoverable data errors. I see that > >>>> you have some unrecoverable data errors, so maybe that's what happened > >>>> to you. 
> >>>> > >>>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable > >>>> to determine which child was responsible for a checksum error. > >>>> However, I've only seen that happen when a raidz vdev has a mirror > >>>> child. That can only happen if the child is a spare or replacing > >>>> vdev. Did you activate any spares, or did you manually replace a > >>>> vdev? > >>>> > >>>> -Alan > >>>> > >>>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >>> > >>> > >>> > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >> > > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Tue Oct 28 01:47:45 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D9420EF8; Tue, 28 Oct 2014 01:47:45 +0000 (UTC) Received: from mail-vc0-x230.google.com (mail-vc0-x230.google.com [IPv6:2607:f8b0:400c:c03::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7F6D179B; Tue, 28 Oct 2014 01:47:45 +0000 (UTC) Received: by mail-vc0-f176.google.com with SMTP id hq11so3057019vcb.35 for ; Mon, 27 Oct 2014 18:47:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; 
s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ndICht25sscxy+qgJ3UmAL4sLZbYhvYImwubEfgkFPI=; b=kTviLe0dGC7jllfWef4NJ9zaa1SDdszuIJUT6SUKY+nKA9LUyIl2rEnt/rCfpeBhwF wNVCC46Y+hav8/lIqCkTZx86nKLCZuj1hnxJw4GhwlqFIuuE1wWLqwlenLhnwkpdgrtV CETBpBPjNs44leKczdwFVAK6NsW9mIFf0tj+HnVTDtXAZKajvMqArTDzmk7RLnCrDAAf yAd88Fm9c+UmZx/CafC8JxIYYxyUqDSKrqnxO20QJ+ZrkhdY5BAukyTQ+d1moFLiHi98 3zp4pxTp0ljjJDseTtI0qfk58WjKmdPKb/2TZD1fJh3ipPKzIMBd23BPd0ALip0oRaem 2jBg== MIME-Version: 1.0 X-Received: by 10.220.128.4 with SMTP id i4mr104113vcs.32.1414460864414; Mon, 27 Oct 2014 18:47:44 -0700 (PDT) Received: by 10.220.118.73 with HTTP; Mon, 27 Oct 2014 18:47:44 -0700 (PDT) In-Reply-To: References: <544B12B8.8060302@freebsd.org> Date: Mon, 27 Oct 2014 21:47:44 -0400 Message-ID: Subject: Re: ZFS errors on the array but not the disk. From: Zaphod Beeblebrox To: Robert Banz Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: freebsd-fs , Steven Hartland X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 01:47:45 -0000 Well... why wouldn't this trigger an error with (say) the checksums on the devices themselves? Without throwing an error, why is the vdev re - resilvering? I don't have spare hardware to throw at it. It's otherwise a sane system. It can "make -j32 buildworld" without choking. It can download several hundred torrents at a time without corrupting them. Hardly seems like suspect hardware. On Mon, Oct 27, 2014 at 7:13 PM, Robert Banz wrote: > Have you tried different hardware? This screams something's up anywhere in > the stack -- DRAM, cabling, controller... > > On Mon, Oct 27, 2014 at 11:34 AM, Zaphod Beeblebrox > wrote: > >> Ok... This is just frustrating. I've trusted ZFS through many versions >> ... >> and pretty much ... 
it's delivered.  There are five symptoms here:
>>
>> 1. After each reboot, the resilver starts again, even if I complete a full
>> scrub after the resilver.
>>
>> 2. Seemingly random objects (files, zvols or snapshot items) get marked as
>> having errors. By random I mean different items each time.
>>
>> 3. None of the drives show errors in zpool status, nor are they
>> chucking errors into dmesg.
>>
>> 4. Errors are being logged against the vdev (only one of the two vdevs)
>> and the array (half as many as the vdev).
>>
>> 5. The activity light for the recently replaced disk does not "flash"
>> "with" the others in its vdev during either resilver or scrub. This last
>> bit might need some explanation. I realize that raidz-1 stripes do not
>> always use all the disks, but "generally" the activity lights of the
>> drives in a vdev go "together"... In this case, the light of the recently
>> replaced drive is off much of the time ...
>>
>> Is there anything I can/should do?  I pulled the new disk, moved its
>> partitions around (it's larger than the array disks because you can't buy
>> 1.5T drives anymore) and then re-added it... so I've tried that.
>>
>>
>> On Fri, Oct 24, 2014 at 11:47 PM, Zaphod Beeblebrox
>> wrote:
>>
>> > Thanks for the heads up.  I'm following releng/10.1 and 271683 seems to
>> > be part of that, but a good catch/guess.
>> >
>> >
>> > On Fri, Oct 24, 2014 at 11:02 PM, Steven Hartland wrote:
>> >
>> >> There was an issue which would cause resilver restarts, fixed by
>> >> *265253* <https://svnweb.freebsd.org/base?view=revision&revision=265253>,
>> >> which was MFC'ed to stable/10 by *271683*
>> >> <https://svnweb.freebsd.org/base?view=revision&revision=271683>, so
>> >> you'll want to make sure you're later than that.
>> >>
>> >>
>> >> On 24/10/2014 19:42, Zaphod Beeblebrox wrote:
>> >>
>> >>> I manually replaced a disk... and the array was scrubbed recently.
>> >>> Interestingly, I seem to be in the "endless loop" of resilvering >> >>> problem. >> >>> Not much I can find on it. but resilvering will complete and I can >> then >> >>> run another scrub. It will complete, too. Then rebooting causes >> another >> >>> resilvering. >> >>> >> >>> Another odd data point: it seems as if the things that show up as >> >>> "errors" >> >>> change from resilvering to resilvering. >> >>> >> >>> One bug, it would seem, is that once ZFS has detected an error... >> another >> >>> scrub can reset it, but no attempt is made to read-through the error >> if >> >>> you >> >>> access the object directly. >> >>> >> >>> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers >> >>> wrote: >> >>> >> >>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox < >> zbeeble@gmail.com> >> >>>> wrote: >> >>>> >> >>>>> What does it mean when checksum errors appear on the array (and the >> >>>>> vdev) >> >>>>> but not on any of the disks? See the paste below. One would think >> >>>>> that >> >>>>> there isn't some ephemeral data stored somewhere that is not one of >> the >> >>>>> disks, yet "cksum" errors show only on the vdev and the array lines. >> >>>>> >> >>>> Help? >> >>>> >> >>>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status >> >>>>> pool: vr2 >> >>>>> state: ONLINE >> >>>>> status: One or more devices is currently being resilvered. The pool >> >>>>> will >> >>>>> continue to function, possibly in a degraded state. >> >>>>> action: Wait for the resilver to complete. 
>> >>>>> scan: resilver in progress since Thu Oct 23 23:11:29 2014 >> >>>>> 1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go >> >>>>> 119G resilvered, 6.79% done >> >>>>> config: >> >>>>> >> >>>>> NAME STATE READ WRITE CKSUM >> >>>>> vr2 ONLINE 0 0 36 >> >>>>> raidz1-0 ONLINE 0 0 72 >> >>>>> label/vr2-d0 ONLINE 0 0 0 >> >>>>> label/vr2-d1 ONLINE 0 0 0 >> >>>>> gpt/vr2-d2c ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native (resilvering) >> >>>>> gpt/vr2-d3b ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> gpt/vr2-d4a ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> ada14 ONLINE 0 0 0 >> >>>>> label/vr2-d6 ONLINE 0 0 0 >> >>>>> label/vr2-d7c ONLINE 0 0 0 >> >>>>> label/vr2-d8 ONLINE 0 0 0 >> >>>>> raidz1-1 ONLINE 0 0 0 >> >>>>> gpt/vr2-e0 ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> gpt/vr2-e1 ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> gpt/vr2-e2 ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> gpt/vr2-e3 ONLINE 0 0 0 >> >>>>> gpt/vr2-e4 ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> gpt/vr2-e5 ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> gpt/vr2-e6 ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> gpt/vr2-e7 ONLINE 0 0 0 block size: >> >>>>> 512B >> >>>>> configured, 4096B native >> >>>>> >> >>>>> errors: 43 data errors, use '-v' for a list >> >>>>> >> >>>> The checksum errors will appear on the raidz vdev instead of a leaf >> if >> >>>> vdev_raidz.c can't determine which leaf vdev was responsible. This >> >>>> could happen if two or more leaf vdevs return bad data for the same >> >>>> block, which would also lead to unrecoverable data errors. I see >> that >> >>>> you have some unrecoverable data errors, so maybe that's what >> happened >> >>>> to you. 
>> >>>> >> >>>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable >> >>>> to determine which child was responsible for a checksum error. >> >>>> However, I've only seen that happen when a raidz vdev has a mirror >> >>>> child. That can only happen if the child is a spare or replacing >> >>>> vdev. Did you activate any spares, or did you manually replace a >> >>>> vdev? >> >>>> >> >>>> -Alan >> >>>> >> >>>> _______________________________________________ >> >>> freebsd-fs@freebsd.org mailing list >> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> >>> >> >>> >> >>> >> >> _______________________________________________ >> >> freebsd-fs@freebsd.org mailing list >> >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> >> >> > >> > >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > > From owner-freebsd-fs@FreeBSD.ORG Tue Oct 28 02:45:24 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 048EC9A6 for ; Tue, 28 Oct 2014 02:45:24 +0000 (UTC) Received: from quine.pinyon.org (quine.pinyon.org [65.101.5.249]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C8035C96 for ; Tue, 28 Oct 2014 02:45:23 +0000 (UTC) Received: by quine.pinyon.org (Postfix, from userid 122) id CDCAE1602C2; Mon, 27 Oct 2014 19:45:16 -0700 (MST) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on quine.pinyon.org 
X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.0 Received: from feyerabend.n1.pinyon.org (feyerabend.n1.pinyon.org [10.0.10.6]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by quine.pinyon.org (Postfix) with ESMTPSA id A4D7E160247 for ; Mon, 27 Oct 2014 19:45:14 -0700 (MST) Message-ID: <544F033A.8070808@pinyon.org> Date: Mon, 27 Oct 2014 19:45:14 -0700 From: "Russell L. Carter" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS errors on the array but not the disk. References: <544B12B8.8060302@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 02:45:24 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/27/14 18:47, Zaphod Beeblebrox wrote: > Well... why wouldn't this trigger an error with (say) the checksums > on the devices themselves? Without throwing an error, why is the > vdev re - resilvering? I don't have spare hardware to throw at it. > It's otherwise a sane system. It can "make -j32 buildworld" > without choking. It can download several hundred torrents at a > time without corrupting them. Hardly seems like suspect hardware. I will just say as a non-zfs expert that I have had several disastrous raid failures over the last 15 yrs, and a couple that cost me real money, and it was always hw. And the reason it was disastrous was I couldn't diagnose it even though I was a pseudo-expert. I spent a lot of time under deadlines assuming the underlying hw was sane. The software diagnostics were no help. I trusted the hw then, but no more. 
And your reports (thank you) are a reminder to me to not give too much credence to zpool status. BR, Russell -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJUTwM6AAoJEFnLrGVSDFaEkwwP/A6CsMOF0uT/TA6NAOQBeFIW Byh9ySfYbBg9gUCB7YZFLBqmhGDzV2HCNu58cniYfVwj2Hwrr+GGahUJfagQjT1w ssNoflihTIBCWcmanXLoD9W0QMpGuyfi556FDzRX4NunAwP+URidqcJuR3tsdCcz jPYIQLZL6qQO5EfdX+UR9kcBS1st/6oLQ+Y2IPUXlfvg+hUQ660dS+SIfHFc+qcg lg2fLh3Vz8bJp2BlYJR6/AaxmOGrqA7Ze9hG684vaVSAz8U5EUn4tC76OPAPc1N2 MATat7T8lot0SRI1EqLBp6vsWpYTZK7itPDjyABO6f21iltbtgvPN22Hcr8+wEdQ AdEK4WLBsTF+xtD9DER1rVsDGIIYbBhw5vfh/7d9/RLrtf0B8rOs6OQNXV+ubjoc I8W852jbZT1HojLEOqIdC7bzkjEgln7a0miG/VFQPjYiZG9b5juozeOPOStENrrp ehIvvlxkeJBfJm505oLhXhOgXITC1fABTHeMfCXcbr3zw4OaN/8nHN4L4u2+HI35 2ahiWqwN/i6tF4V74zZDi9djkwuU8e+/qNrndeLotaTmXudY1Ox3wNBYEyYFCmHJ DIBSUPKqcH3zOICfiO0mmVmPuU4a95HkslRtNy1mPTvNO4+Cpv7iLx28CdZHXWfg BYb9ymp0bL3HAgHZwamd =SOhS -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Tue Oct 28 06:03:12 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8A6A7DD9; Tue, 28 Oct 2014 06:03:12 +0000 (UTC) Received: from mail.slu.se (tmgext2-1.slu.se [77.235.224.51]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client CN "webmail.slu.se", Issuer "TERENA SSL CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D861B402; Tue, 28 Oct 2014 06:03:10 +0000 (UTC) Received: from Exchange2-2.slu.se (130.238.96.155) by Tmg2-1.slu.se (130.238.96.151) with Microsoft SMTP Server (TLS) id 14.3.210.2; Tue, 28 Oct 2014 07:03:00 +0100 Received: from Exchange2-1.slu.se ([130.238.96.154]) by exchange2-2 ([130.238.96.155]) with mapi id 14.03.0210.002; Tue, 28 Oct 2014 07:03:00 +0100 From: =?utf-8?B?S2FybGkgU2rDtmJlcmc=?= To: "zbeeble@gmail.com" Subject: Re: ZFS errors on the array but not 
the disk.
Thread-Topic: ZFS errors on the array but not the disk.
Thread-Index: AQHP8nTTZaxgLAPs10KP2kE89fC4EQ==
Date: Tue, 28 Oct 2014 06:02:59 +0000
Message-ID: <5F9E965F5A80BC468BE5F40576769F099DF78CC7@exchange2-1>
References: <544B12B8.8060302@freebsd.org>
In-Reply-To:
Accept-Language: sv-SE, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [77.235.228.32]
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Cc: "freebsd-fs@freebsd.org" , "smh@freebsd.org"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 28 Oct 2014 06:03:12 -0000

On Mon, 2014-10-27 at 14:34 -0400, Zaphod Beeblebrox wrote:
> Ok... This is just frustrating.  I've trusted ZFS through many versions ...
> and pretty much ... it's delivered.  There are five symptoms here:
>
> 1. after each reboot, resilver starts again... even if after the resilver I
> complete a full scrub.
>
> 2. seemingly random objects (files, zvols or snapshot items) get marked as
> having errors.  when I say random, to be clear; different items each time.
>
> 3. none of the drives are showing errors in zpool status, neither are they
> chucking errors into dmesg.
>
> 4. errors are being logged against the vdev (only one of the two vdevs) and
> the array (half as many as the vdev).
>
> 5. The activity light for the recently replaced disk does not "flash"
> "with" the others in it's vdev during either resilver or scrub.  This last
> bit might need some explanation. I realize that raidz-1 stripes do not
> always use all the disks, but "generally" the activity lights of the drives
> in a vdev go "together"... In this case, the light of the recently replaced
> drive is off much of the time ...
>
> Is there anything I can/should do?  I pulled the new disk, moved it's
> partitions around (it's larger than the array disks because you can't buy
> 1.5T drives anymore) and then re-added it... so I've tried that.

Have you tried starting it up from a CD, USB, whatev, and try to import
the pool from there?

/K

-- 

Best regards

------------------------------------------------------------------------
Karli Sjöberg
Swedish University of Agricultural Sciences Box 7079 (Visiting Address
Kronåsvägen 8)
S-750 07 Uppsala, Sweden
Phone:  +46-(0)18-67 15 66
karli.sjoberg@slu.se

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 28 12:58:20 2014
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B8D6313E for ; Tue, 28 Oct 2014 12:58:20 +0000 (UTC)
Received: from out1-smtp.messagingengine.com
(out1-smtp.messagingengine.com [66.111.4.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 873BA838 for ; Tue, 28 Oct 2014 12:58:20 +0000 (UTC) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 9EBE420AAE for ; Tue, 28 Oct 2014 08:58:18 -0400 (EDT) Received: from web3 ([10.202.2.213]) by compute1.internal (MEProxy); Tue, 28 Oct 2014 08:58:18 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=message-id:x-sasl-enc:from:to :mime-version:content-transfer-encoding:content-type:subject :date:in-reply-to:references; s=smtpout; bh=mVG/QQrU0PC9QMYViVyu RKJJu/c=; b=C+Ph150AO57FAdytTzTtWbqYH7vgmvQtDXgUYwSbHEU/xWpVM4if GsN4gYYvvpUqM9NGloIZjsyK45eCqPp7wdJ5BXJbuIDJy2fi+Wn+dnJjrvPEwjH5 04jt/Qu5zT0fli4Ck1srTPu76/h4MCN/rzSdpvs4knkZykREwsmAQI8= Received: by web3.nyi.internal (Postfix, from userid 99) id 6EB22114632; Tue, 28 Oct 2014 08:58:18 -0400 (EDT) Message-Id: <1414501098.45274.184197353.0847A931@webmail.messagingengine.com> X-Sasl-Enc: V7cYWP1L0Spleqtflk9h6RdeeT5PnfgbeW9Nhu6jw7wz 1414501098 From: Mark Felder To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Mailer: MessagingEngine.com Webmail Interface - ajax-c51dec4f Subject: Re: ZFS errors on the array but not the disk. Date: Tue, 28 Oct 2014 07:58:18 -0500 In-Reply-To: References: <544B12B8.8060302@freebsd.org> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 12:58:20 -0000 On Mon, Oct 27, 2014, at 13:34, Zaphod Beeblebrox wrote: > > Is there anything I can/should do? 
I pulled the new disk, moved it's > partitions around (it's larger than the array disks because you can't buy > 1.5T drives anymore) and then re-added it... so I've tried that. > Test and/or replace your power supply. You'd be surprised what dropping voltage (even slightly) can do. Consider all parts of your system suspect until they've been thoroughly vetted. From owner-freebsd-fs@FreeBSD.ORG Wed Oct 29 18:08:37 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 33776A99 for ; Wed, 29 Oct 2014 18:08:37 +0000 (UTC) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E2E3FEDC for ; Wed, 29 Oct 2014 18:08:36 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1XjXfY-0000zO-CL for freebsd-fs@freebsd.org; Wed, 29 Oct 2014 19:08:28 +0100 Received: from jtotz2.cs.ucl.ac.uk ([128.16.6.56]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 29 Oct 2014 19:08:28 +0100 Received: from johannes by jtotz2.cs.ucl.ac.uk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 29 Oct 2014 19:08:28 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Johannes Totz Subject: Re: Snapshots and what not to snapshot Date: Wed, 29 Oct 2014 18:08:17 +0000 Lines: 63 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: jtotz2.cs.ucl.ac.uk User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 In-Reply-To: X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 29 Oct 2014 18:08:37 -0000

On 11/10/2014 22:38, Dmitry Morozovsky wrote:
> Colleagues,
>
> reading some recent threads, I'm starting to think again about a problem I
> have thought about many times, but have invented nothing but crude hacks:
>
> it would be great to have a mechanism to exclude some subtrees from recursive
> snapshots; the model is like:
>
> you have some tree of ZFS file systems, like
>
> pool/path/r
> pool/path/jails
> pool/path/jails/j1
> pool/path/jails/j1/obj
> ..
> pool/path/persistent
> pool/path/obj
>
> or something alike.
>
> To have the ability to make a consistent backup, one would use ``zfs snapshot
> -r''
>
> but -- before using zfs send or other replication mechanisms it would be
> feasible to remove snapshots of not-so-important filesystems.

Not just remove but exclude from snapshotting in the first place.

>
> For now, the kludge I could see is to set on these some artificial property
> like org.freebsd:nodump or similar, then traverse zfs list with this attribute
> and delete unneeded snapshots.

snapshot -r could inspect a property on children and skip snapshot
creation if some criteria are fulfilled.
For example:

zfs set org.freebsd:skip_recursive_snapshot=hou.* pool/backup
zfs snapshot -r pool@hourly
zfs snapshot -r pool@house
zfs snapshot -r pool@important

The skip property could be a regex that is matched against the
to-be-created snapshot name. If it matches, no snapshots are created
for that child or, recursively, for its children.

>
> Maybe somewhere there are more elegant solutions?
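
The regex-based skip proposed above is not something `zfs snapshot -r` does today, but it could be approximated in userland. What follows is only a minimal sketch under stated assumptions: `org.freebsd:skip_recursive_snapshot` is the property name suggested in the message, not an existing ZFS feature, and `plan_snapshots` is a hypothetical helper. It relies on user properties being inherited by descendant datasets, so a pattern set on pool/backup is also reported for everything below it.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): decide which datasets get a snapshot.
# stdin:  one "dataset<TAB>pattern" pair per line, e.g. the output of
#         zfs get -H -o name,value -r -t filesystem,volume \
#             org.freebsd:skip_recursive_snapshot pool
#         ("-" means the property is not set on that dataset).
# $1:     the snapshot name that is about to be created.
# stdout: one dataset@snapname per line for every dataset that is NOT skipped.
plan_snapshots() {
    snap=$1
    while IFS="$(printf '\t')" read -r dataset pattern; do
        # Skip when the pattern is set and matches the whole snapshot name.
        if [ "$pattern" != "-" ] &&
           printf '%s\n' "$snap" | grep -Eqx -- "$pattern"; then
            continue
        fi
        printf '%s@%s\n' "$dataset" "$snap"
    done
}

# Possible use (assumption: a pool named "pool", as in the example above):
#   zfs set org.freebsd:skip_recursive_snapshot='hou.*' pool/backup
#   zfs get -H -o name,value -r -t filesystem,volume \
#       org.freebsd:skip_recursive_snapshot pool | plan_snapshots hourly
```

With the `hou.*` pattern from the example, the hourly and house snapshots would leave out pool/backup and its children, while the important snapshot would include them. The resulting names still have to be handed to `zfs snapshot`; whether that gives the same consistency as a single `-r` snapshot depends on how they are passed and on the ZFS version, so treat it as a workaround rather than an equivalent.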
> > Sincerely, > D.Marck [DM5020, MCK-RIPE, DM3-RIPN] > [ FreeBSD committer: marck@FreeBSD.org ] > ------------------------------------------------------------------------ > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** > ------------------------------------------------------------------------ > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Wed Oct 29 19:18:52 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 14CF8D84 for ; Wed, 29 Oct 2014 19:18:52 +0000 (UTC) Received: from mail.jrv.org (rrcs-24-73-246-106.sw.biz.rr.com [24.73.246.106]) by mx1.freebsd.org (Postfix) with ESMTP id DBA30900 for ; Wed, 29 Oct 2014 19:18:51 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.jrv.org (Postfix) with ESMTP id 91E811BDBE8; Wed, 29 Oct 2014 14:13:05 -0500 (CDT) Received: from mail.jrv.org ([127.0.0.1]) by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Dikx1qKcJVFY; Wed, 29 Oct 2014 14:12:55 -0500 (CDT) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.jrv.org (Postfix) with ESMTP id A90851BDBE1; Wed, 29 Oct 2014 14:12:55 -0500 (CDT) X-Virus-Scanned: amavisd-new at zimbra64.housenet.jrv Received: from mail.jrv.org ([127.0.0.1]) by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id GwydYXvSCHxA; Wed, 29 Oct 2014 14:12:55 -0500 (CDT) Received: from [192.168.138.128] (BMX.housenet.jrv [192.168.3.140]) by mail.jrv.org (Postfix) with ESMTPSA id 80F371BDBDE; Wed, 29 Oct 2014 14:12:55 -0500 (CDT) Message-ID: 
<54513C4D.4010203@jrv.org> Date: Wed, 29 Oct 2014 13:13:17 -0600 From: "James R. Van Artsdalen" User-Agent: Mozilla/5.0 (Windows NT 5.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Johannes Totz Subject: Re: Snapshots and what not to snapshot References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 19:18:52 -0000 On 10/29/2014 12:08 PM, Johannes Totz wrote: > On 11/10/2014 22:38, Dmitry Morozovsky wrote: >> you have some tree of ZFS file systems, like >> >> pool/path/r >> pool/path/jails >> pool/path/jails/j1 >> pool/path/jails/j1/obj >> snapshots and ZFS replication is done against the ZFS namespace, not the unix namespace. Organize your filesystems in the ZFS tree based on how you want to replicate/snapshot them, then use the ZFS mountpoint property to put them in the unix namespace where you want them to appear. For example the basic approach I use for client systems is a ZFS namespace like POOL/UNIX for FreeBSD, POOL/BUSINESS for shared company data, POOL/BACKUP for client system backup blobs, POOL/REPLICANT for the replication workspace to use in keeping hot-spare servers updated, etc. Note that the root of the ZFS tree is empty, and that the root of the unix tree is elsewhere. I often keep more than one bootable unix system root in a pool (for maintenance). PS. Don't forget the zpool bootfs property. 
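The property-based skip that Johannes Totz proposes earlier in this thread can be illustrated outside of ZFS. The following is only a sketch in Python of the selection logic, not an existing zfs(8) feature: the property name is the one from his example, the function name datasets_to_snapshot and the dict-based input are hypothetical, and matching the regex against the whole snapshot name (re.fullmatch) is an assumption, since the thread does not say how the match would be anchored.

```python
import re

# Property name taken from the example in the thread; zfs itself does not
# interpret it, so everything below is illustrative only.
SKIP_PROP = "org.freebsd:skip_recursive_snapshot"


def datasets_to_snapshot(datasets, snapname):
    """Return the datasets a filtered recursive snapshot would cover.

    datasets: dict mapping dataset name -> dict of locally set user properties.
    snapname: the to-be-created snapshot name (the part after '@').
    """
    selected = []
    skipped_roots = []
    # Sorting places every parent before its children.
    for name in sorted(datasets):
        # Anything below an already skipped dataset is skipped as well.
        if any(name.startswith(root + "/") for root in skipped_roots):
            continue
        pattern = datasets[name].get(SKIP_PROP)
        # Assumption: the regex must match the whole snapshot name.
        if pattern is not None and re.fullmatch(pattern, snapname):
            skipped_roots.append(name)
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    tree = {
        "pool": {},
        "pool/backup": {SKIP_PROP: "hou.*"},
        "pool/backup/blobs": {},
        "pool/unix": {},
    }
    for snap in ("hourly", "house", "important"):
        print(snap, datasets_to_snapshot(tree, snap))
```

With the property set as in the example, "hourly" and "house" leave out pool/backup and everything below it, while "important" covers the whole tree. Snapshotting the selected datasets one by one would give up the single-operation behaviour of ``zfs snapshot -r'', which is one reason the reorganised namespace described above may be the simpler route.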
From owner-freebsd-fs@FreeBSD.ORG Wed Oct 29 21:13:54 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 00330EC8 for ; Wed, 29 Oct 2014 21:13:53 +0000 (UTC) Received: from relay.exonetric.net (relay0.exonetric.net [178.250.72.161]) by mx1.freebsd.org (Postfix) with ESMTP id C142E87A for ; Wed, 29 Oct 2014 21:13:52 +0000 (UTC) Received: from [192.168.10.18] (186.211.187.81.in-addr.arpa [81.187.211.186]) by relay.exonetric.net (Postfix) with ESMTPSA id B3CD92CC72 for ; Wed, 29 Oct 2014 21:03:59 +0000 (GMT) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.0 \(1990.1\)) Subject: Re: either gptzfsboot or zfsloader hangs during boot after kernel and pool upgrade From: Mark Blackman In-Reply-To: <6F3D0C72-D774-4B1F-8A5F-25CD1C55EBE0@exonetric.com> Date: Wed, 29 Oct 2014 21:03:55 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: <50B4C3C1-B4A1-4D8A-8E4A-2B5549E13A45@exonetric.com> References: <6F3D0C72-D774-4B1F-8A5F-25CD1C55EBE0@exonetric.com> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1990.1) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 21:13:54 -0000

Following a suggestion from Matt Reimer, I've updated the bootcode

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

but using a gptzfsboot from FreeBSD 9.2-release instead of FreeBSD
8.4-release and the system now boots correctly.

So my immediate problem is resolved, but it does mean there's a bug in
the gptzfsboot for FreeBSD 8.4 at least and I'm pretty sure it's the
serial changes from 9.2 that need to be ported to 8.4.

> On 18 Sep 2014, at 21:48, Mark Blackman wrote:
>
> Hi,
>
> I've filed https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193758
> on the subject topic, but thought I would publicise the issue here too.
>
> Short story: following a zpool upgrade on a FreeBSD 8.4 system, the
> system now freezes very early in boot process.
>
> Regards,
> Mark Blackman
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Thu Oct 30 22:44:40 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 88CC4CA1; Thu, 30 Oct 2014 22:44:40 +0000 (UTC) Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "khavrinen.csail.mit.edu", Issuer "Client CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 30D58D8B; Thu, 30 Oct 2014 22:44:40 +0000 (UTC) Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1]) by khavrinen.csail.mit.edu (8.14.9/8.14.9) with ESMTP id s9UMicgI034127 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA); Thu, 30 Oct 2014 18:44:38 -0400 (EDT) (envelope-from wollman@khavrinen.csail.mit.edu) Received: (from wollman@localhost) by khavrinen.csail.mit.edu (8.14.9/8.14.9/Submit) id s9UMic4t034124; Thu, 30 Oct 2014 18:44:38 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21586.48982.64913.250497@khavrinen.csail.mit.edu> Date: Thu, 30 Oct 2014 18:44:38 -0400 From: Garrett
Wollman To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org Subject: Definite NFS bug X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (khavrinen.csail.mit.edu [127.0.0.1]); Thu, 30 Oct 2014 18:44:38 -0400 (EDT) Cc: rmacklem@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Oct 2014 22:44:40 -0000

Like many other users, I upgrade my FreeBSD servers by NFS-mounting
/usr/src and /usr/obj from a shared build server.[1] Since I upgraded
the build server to 9.3, clients running 9.3 kernels have been
randomly erroring out during installkernel and installworld. Today I
had some time to look more closely into this and found that the error
is definitely coming from the server: at some point, it just randomly
starts returning errors to client ACCESS and GETATTR operations. The
errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is nothing
on the server to indicate any kind of error, and restarting the
operation on the client causes it to fail in a different place. With
enough patience and restarts, it's possible to complete the
installation in just four or five passes.

Needless to say this is a bit worrying. Strangely, 9.1 and 9.2
clients don't see this issue at all; it's only 9.3 clients that break.

It's easy to reproduce: just run 'cd /usr/src && find . -type f >/dev/null'.
It does not seem to depend on the client NFS version (3 or 4) or
implementation ("old" or "new"). I haven't tried the "old" server yet
-- I'll need to figure out how to do that first.

If anyone is willing to help debug this, I can share a packet trace,
but I don't think it's very informative.
Also, if anyone has a good dtrace script that I could run on the server that would report what's going on when that first NFS3ERR_IO is returned, that would be great. -GAWollman [1] I'd run my own freebsd-update server but unfortunately it is too tied to building things that look like official FreeBSD security updates, and isn't really designed for (e.g.) updating kernels when we change a configuration option. It also doesn't have any obvious knobs for building with anything other than a default {make,src}.conf. And with a pkg-able base just around the corner I don't really want to put much effort into making freebsd-update do what I want. NFS, on the other hand, is a big deal and so I need to track down and fix these bugs. From owner-freebsd-fs@FreeBSD.ORG Fri Oct 31 00:07:41 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7AC37E43 for ; Fri, 31 Oct 2014 00:07:41 +0000 (UTC) Received: from quine.pinyon.org (quine.pinyon.org [65.101.5.249]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4D88F7A4 for ; Fri, 31 Oct 2014 00:07:41 +0000 (UTC) Received: by quine.pinyon.org (Postfix, from userid 122) id 6869616031A; Thu, 30 Oct 2014 17:07:34 -0700 (MST) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on quine.pinyon.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.0 Received: from feyerabend.n1.pinyon.org (feyerabend.n1.pinyon.org [10.0.10.6]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by quine.pinyon.org (Postfix) with ESMTPSA id 9935E1602E3; Thu, 30 Oct 2014 17:07:31 -0700 (MST) Message-ID: 
<5452D2C3.9040902@pinyon.org> Date: Thu, 30 Oct 2014 17:07:31 -0700 From: "Russell L. Carter" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Garrett Wollman , freebsd-fs@freebsd.org Subject: Re: Definite NFS bug References: <21586.48982.64913.250497@khavrinen.csail.mit.edu> In-Reply-To: <21586.48982.64913.250497@khavrinen.csail.mit.edu> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Oct 2014 00:07:41 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/30/14 15:44, Garrett Wollman wrote: > Like many other users, I upgrade my FreeBSD servers by > NFS-mounting /usr/src and /usr/obj from a shared build server.[1] > Since I upgraded the build server to 9.3, clients running 9.3 > kernels have been randomly erroring out during installkernel and > installworld. Today I had some time to look more closely into this > and found that the error is definitely coming from the server: at > some point, it just randomly starts returning errors to client > ACCESS and GETATTR operations. The errors are a mix of NFS3ERR_IO > and NFS3ERR_ACCES, but there is nothing on the server to indicate > any kind of error, and restarting the operation on the client > causes it to fail in a different place. With enough patients and > restarts, it's possible to complete the installation in just four > or five passes. > > Needless to say this is a bit worrying. Strangely, 9.1 and 9.2 > clients don't see this issue at all; it's only 9.3 clients that > break. > > It's easy to reproduce, just 'cd /usr/sc && find . -type f > >/dev/null'. It does not seem to depend on the client NFS version > (3 or 4) or implementation ("old" or "new"). 
I haven't tried the > "old" server yet -- I'll need to figure out how to do that first. > > If anyone is willing to help debug this, I can share a packet > trace, but I don't think it's very informative. Also, if anyone > has a good dtrace script that I could run on the server that would > report what's going on when that first NFS3ERR_IO is returned, that > would be great. This sounds sort of like what I have been complaining about. I of course have no competency here but if I build the world - -j1, I have a much better chance of successful remote installs. The problems I'm seeing on -current for the last few months seem to me to be out-of-date targets, so that the failure is a desire by the remote client to try to rebuild the out-of-date target on the RO file system. My new plan is to dump all of the st_atim and st_mtim for every .depend list on both systems when I see the problem again, to see if something jumps out. I just reinstalled everybody with -j1 builds of r273808M, no problems. Last week however, a fast box failed. Kind of concerning for an install to fail say 2/3 through. I have to admit when soon after I had a crash on that 2/3 system (on NFS unmount), I had to step out of the room for the reboot. Exciting. I am traveling on Sunday for a week, but I've got a few days to run things on several big fast 8cpu boxes (my old laptop is much less afflicted with this problem, though it occasionally fails too). Russell > -GAWollman > > [1] I'd run my own freebsd-update server but unfortunately it is > too tied to building things that look like official FreeBSD > security updates, and isn't really designed for (e.g.) updating > kernels when we change a configuration option. It also doesn't > have any obvious knobs for building with anything other than a > default {make,src}.conf. And with a pkg-able base just around the > corner I don't really want to put much effort into making > freebsd-update do what I want. 
NFS, on the other hand, is a big > deal and so I need to track down and fix these bugs. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs To > unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJUUtLDAAoJEFnLrGVSDFaEu0MQAJOlPWcsduuiS75LUe42uj+E SRnxSvm5JgUdJojatx7cL5TQjEvXbYov8CE8OLZUqGxIi0D0IdpKlr6WJes8KOUC wfix7doQZQe3IPqgYAJZz0y6j89q6+QABPTS2oy+cPpYmop9568TvuJJZCCixBOF Zv3XYa4I7uIl1pYF2zl2nJHtOwLi2wjT+851heqXo8GvIo8SAhBouTN5biPh2JGl Yabbb4e5xePvigMLEwxbPNslv3nhT1JOcsH9GoFLo5zph2+Txw6ZPy1Sccyv88AQ w5ID129VMzZChX6zYT7+LtJYLmZME3bVrA2R6YeEdnr/Is8qm5eKtpkMrUz+5Qn4 ULf3fJSCjYdlfatfBIFfi2jFJWBkBY7qVu9S5nqfG9yn4DCLY2UYl4skP71Eo4hz DPDKQwpuij/Tf8y459Vj60AsOt87Sh0eYBnW+nWJdgIPWptYLNmjv/VHvC8ZFbnn HsrvUw9DovnTfd7rn+GR4F4+nlnjXqOKdPJtLroId3tSxZzy9L08n7Y6AvAWFFWM oQ4q/B4LxpOmjXqIBTCrC5ux7GdtKGN2gkAYvY4zh3ngPJJ9ts0BRHbq2zRMo9OA eUT8Cf+D/wQcFcd+27eI1RJu8IbyycStwGMXbA57UkvJkfSA5CVpcey+T5z9uyPa 7xlgxCpHOIHSJ6l2BeSQ =4Q5V -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Fri Oct 31 01:31:36 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9C60ECDB; Fri, 31 Oct 2014 01:31:36 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 41141F37; Fri, 31 Oct 2014 01:31:35 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aq0EAHzlUlSDaFve/2dsb2JhbABcDoQwgwLRaAKBMgEBAQEBfYQCAQEBAwEjBFIFFhgCAg0ZAlkGiEsJtVWUaAEBAQEGAQEBAQEBHIEsjyEONAeCd4FUBZ8hjWaHLYM4XCGBN0CBAwEBAQ X-IronPort-AV: E=Sophos;i="5.07,290,1413259200"; d="scan'208";a="163577126" Received: from muskoka.cs.uoguelph.ca (HELO 
zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 30 Oct 2014 21:31:34 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 11382AE92D; Thu, 30 Oct 2014 21:31:34 -0400 (EDT) Date: Thu, 30 Oct 2014 21:31:34 -0400 (EDT) From: Rick Macklem To: Garrett Wollman Message-ID: <1902145956.2676513.1414719094052.JavaMail.root@uoguelph.ca> In-Reply-To: <21586.48982.64913.250497@khavrinen.csail.mit.edu> Subject: Re: Definite NFS bug MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Oct 2014 01:31:36 -0000 Garrett Wollman wrote: > Like many other users, I upgrade my FreeBSD servers by NFS-mounting > /usr/src and /usr/obj from a shared build server.[1] Since I > upgraded > the build server to 9.3, clients running 9.3 kernels have been > randomly erroring out during installkernel and installworld. Today I > had some time to look more closely into this and found that the error > is definitely coming from the server: at some point, it just randomly > starts returning errors to client ACCESS and GETATTR operations. The > errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is > nothing > on the server to indicate any kind of error, and restarting the > operation on the client causes it to fail in a different place. With > enough patients and restarts, it's possible to complete the > installation in just four or five passes. > > Needless to say this is a bit worrying. 
Strangely, 9.1 and 9.2 > clients don't see this issue at all; it's only 9.3 clients that > break. > > It's easy to reproduce, just 'cd /usr/sc && find . -type f > >/dev/null'. > It does not seem to depend on the client NFS version (3 or 4) or > implementation ("old" or "new"). I haven't tried the "old" server > yet > -- I'll need to figure out how to do that first. > Well, I took a quick look and, if I got it correct, there is one single line change in the "old" client between 9.2 and 9.3, which defined an otherwise unused mount flag called NFSMNT_NONCONTIGWR. (It is only used by the new client when "nocontigwr" is specified.) However, there was some fairly extensive changes done (mostly by mav@) to the kernel rpc (sys/rpc), which is used by both clients and both servers. Most of these changes were committed to stable/9 as r261057, r261058. If you could build a kernel from stable/9 just prior to r261057 and see if that client runs into the problem, it could help determine if these changes are causing the problem. Alternately, running the 9.3 system with a 9.2 sys/rpc (if it links/runs), that could also help see if the kernel rpc is the culprit. (You can load the kernel rpc as a module, but it's linked into most kernels.) If it doesn't turn out to be in the kernel rpc, my next guess would be changes to the net device driver (to check for this you could use a different type of hardware device or the 9.2 driver on the 9.3 system. maybe?). The "new" client has some changes 9.2->9.3, but since nothing changed for the "old" client and you see the problem with the "old" one, I think the NFS client is not the culprit. rick > If anyone is willing to help debug this, I can share a packet trace, > but I don't think it's very informative. Also, if anyone has a good > dtrace script that I could run on the server that would report what's > going on when that first NFS3ERR_IO is returned, that would be great. 
> > -GAWollman > > [1] I'd run my own freebsd-update server but unfortunately it is too > tied to building things that look like official FreeBSD security > updates, and isn't really designed for (e.g.) updating kernels when > we > change a configuration option. It also doesn't have any obvious > knobs > for building with anything other than a default {make,src}.conf. > And with a pkg-able base just around the corner I don't really want > to > put much effort into making freebsd-update do what I want. NFS, on > the other hand, is a big deal and so I need to track down and fix > these bugs. > From owner-freebsd-fs@FreeBSD.ORG Fri Oct 31 01:49:49 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 023EEED2; Fri, 31 Oct 2014 01:49:49 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9ABCABC; Fri, 31 Oct 2014 01:49:48 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ar0EAB/qUlSDaFve/2dsb2JhbABcDoNUWASDAsoRCoZ5VAKBMgEBAQEBfYQCAQEBAwEBAQEgBCcgCwUWGAICDRkCKQEJJgYIBwQBHASIFwkNtUyUZgEBAQEGAQEBAQEBARuBLI8SAQENDjQHgneBVAWWWoQShDU8jSqHLYM4XCEvB4EBBxcigQMBAQE X-IronPort-AV: E=Sophos;i="5.07,290,1413259200"; d="scan'208";a="163580651" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 30 Oct 2014 21:49:47 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3E53FB4082; Thu, 30 Oct 2014 21:49:47 -0400 (EDT) Date: Thu, 30 Oct 2014 21:49:47 -0400 (EDT) From: Rick Macklem To: Garrett Wollman Message-ID: <928219131.2682604.1414720187244.JavaMail.root@uoguelph.ca> In-Reply-To: 
<1902145956.2676513.1414719094052.JavaMail.root@uoguelph.ca> Subject: Re: Definite NFS bug MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Oct 2014 01:49:49 -0000 I wrote: > Garrett Wollman wrote: > > Like many other users, I upgrade my FreeBSD servers by NFS-mounting > > /usr/src and /usr/obj from a shared build server.[1] Since I > > upgraded > > the build server to 9.3, clients running 9.3 kernels have been > > randomly erroring out during installkernel and installworld. Today > > I > > had some time to look more closely into this and found that the > > error > > is definitely coming from the server: at some point, it just > > randomly > > starts returning errors to client ACCESS and GETATTR operations. > > The > > errors are a mix of NFS3ERR_IO and NFS3ERR_ACCES, but there is > > nothing > > on the server to indicate any kind of error, and restarting the > > operation on the client causes it to fail in a different place. > > With > > enough patients and restarts, it's possible to complete the > > installation in just four or five passes. > > > > Needless to say this is a bit worrying. Strangely, 9.1 and 9.2 > > clients don't see this issue at all; it's only 9.3 clients that > > break. > > > > It's easy to reproduce, just 'cd /usr/sc && find . -type f > > >/dev/null'. > > It does not seem to depend on the client NFS version (3 or 4) or > > implementation ("old" or "new"). I haven't tried the "old" server > > yet > > -- I'll need to figure out how to do that first. 
> > Oh, and it wasn't clear to me if you are seeing this on a 9.3 server only? (If you get the same outcome testing against an older server, then it seems it is a client side issue.) If that is the case, I'd suggest you try a pre-r261056 (one of the changes was r261056, not r261057) stable/9 kernel. At a closer look, most of the kernel rpc changes are for the server side. (Most of the client side commits just change the copyright, but there are a couple of client side changes beyond that.) > Well, I took a quick look and, if I got it correct, there is one > single > line change in the "old" client between 9.2 and 9.3, which defined > an otherwise unused mount flag called NFSMNT_NONCONTIGWR. (It is > only used by the new client when "nocontigwr" is specified.) > > However, there was some fairly extensive changes done (mostly by > mav@) > to the kernel rpc (sys/rpc), which is used by both clients and both > servers. > Most of these changes were committed to stable/9 as r261057, r261058. > If you could build a kernel from stable/9 just prior to r261057 and > see > if that client runs into the problem, it could help determine if > these > changes are causing the problem. > Alternately, running the 9.3 system with a 9.2 sys/rpc (if it > links/runs), > that could also help see if the kernel rpc is the culprit. (You can > load the kernel rpc as a module, but it's linked into most kernels.) > > If it doesn't turn out to be in the kernel rpc, my next guess would > be changes to the net device driver (to check for this you could use > a different type of hardware device or the 9.2 driver on the 9.3 > system. maybe?). > > The "new" client has some changes 9.2->9.3, but since nothing changed > for the "old" client and you see the problem with the "old" one, I > think the NFS client is not the culprit. > > rick > > > If anyone is willing to help debug this, I can share a packet > > trace, > > but I don't think it's very informative. 
Also, if anyone has a > > good > > dtrace script that I could run on the server that would report > > what's > > going on when that first NFS3ERR_IO is returned, that would be > > great. > > > > -GAWollman > > > > [1] I'd run my own freebsd-update server but unfortunately it is > > too > > tied to building things that look like official FreeBSD security > > updates, and isn't really designed for (e.g.) updating kernels when > > we > > change a configuration option. It also doesn't have any obvious > > knobs > > for building with anything other than a default {make,src}.conf. > > And with a pkg-able base just around the corner I don't really want > > to > > put much effort into making freebsd-update do what I want. NFS, on > > the other hand, is a big deal and so I need to track down and fix > > these bugs. > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to > "freebsd-stable-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Fri Oct 31 15:59:01 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F1A3DBFF for ; Fri, 31 Oct 2014 15:59:01 +0000 (UTC) Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "khavrinen.csail.mit.edu", Issuer "Client CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B12F15FE for ; Fri, 31 Oct 2014 15:59:01 +0000 (UTC) Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1]) by khavrinen.csail.mit.edu (8.14.9/8.14.9) with ESMTP id s9VFx0j0040276 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL CN=khavrinen.csail.mit.edu 
issuer=Client+20CA); Fri, 31 Oct 2014 11:59:00 -0400 (EDT) (envelope-from wollman@khavrinen.csail.mit.edu) Received: (from wollman@localhost) by khavrinen.csail.mit.edu (8.14.9/8.14.9/Submit) id s9VFwx92040273; Fri, 31 Oct 2014 11:58:59 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21587.45507.688734.672857@khavrinen.csail.mit.edu> Date: Fri, 31 Oct 2014 11:58:59 -0400 From: Garrett Wollman To: "Russell L. Carter" Subject: Re: Definite NFS bug In-Reply-To: <5452D2C3.9040902@pinyon.org> References: <21586.48982.64913.250497@khavrinen.csail.mit.edu> <5452D2C3.9040902@pinyon.org> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (khavrinen.csail.mit.edu [127.0.0.1]); Fri, 31 Oct 2014 11:59:00 -0400 (EDT) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Oct 2014 15:59:02 -0000 < said: > The problems I'm seeing on -current for the last few months > seem to me to be out-of-date targets, so that the failure is a > desire by the remote client to try to rebuild the out-of-date target > on the RO file system. Nope, nothing at all to do with that. As I said in my original message, the problem is that the server is returning NFS3ERR_IO or NFS3ERR_ACCES for RPCs that should (and a few seconds later DO) succeed. 
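The symptom described here (operations that fail and then succeed a few seconds later) can be probed from the client side with something a little more detailed than the find(1) one-liner from the original report. The following is a minimal sketch in Python, not anything used in the thread: the function name probe_tree and the retry delay are hypothetical, and it assumes that NFS3ERR_IO and NFS3ERR_ACCES surface on the client as EIO and EACCES, which is the usual mapping.

```python
import errno
import os
import time


def probe_tree(root, retry_delay=2.0):
    """Walk root, stat every file, and retry each failure once after a delay.

    Returns a list of (path, errno_name, recovered) tuples, where recovered
    tells whether the same operation succeeded on the second attempt.
    """
    failed = []  # (path, errno value, function to call again on retry)

    def on_walk_error(exc):
        # os.walk reports directories it could not list through this hook.
        failed.append((exc.filename, exc.errno, os.listdir))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # on an NFS mount this needs file attributes
            except OSError as exc:
                failed.append((path, exc.errno, os.lstat))

    if failed:
        time.sleep(retry_delay)

    report = []
    for path, err, retry in failed:
        try:
            retry(path)
            recovered = True
        except OSError:
            recovered = False
        report.append((path, errno.errorcode.get(err, str(err)), recovered))
    return report


if __name__ == "__main__":
    for path, name, recovered in probe_tree("/usr/src"):
        print(name, "recovered" if recovered else "still failing", path)
```

Entries reported as EIO or EACCES with recovered set to True would match the transient behaviour described above; they say nothing about why the server returned the error in the first place.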
-GAWollman

From owner-freebsd-fs@FreeBSD.ORG Sat Nov 1 22:44:44 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 98F9FE1A; Sat, 1 Nov 2014 22:44:44 +0000 (UTC) Received: from potato.growveg.org (potato.growveg.org [62.49.247.163]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 584BCF62; Sat, 1 Nov 2014 22:44:44 +0000 (UTC) Received: from john by potato.growveg.org with local (Exim 4.84 (FreeBSD)) (envelope-from ) id 1XkhPG-000IBC-Qi; Sat, 01 Nov 2014 22:44:26 +0000 Date: Sat, 1 Nov 2014 22:44:26 +0000 From: John To: freebsd-hardware@freebsd.org Subject: gptboot: invalid backup GPT header Message-ID: <20141101224426.GA69717@potato.growveg.org> Mail-Followup-To: freebsd-hardware@freebsd.org, freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) Sender: John X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: john@potato.growveg.org X-SA-Exim-Scanned: No (on potato.growveg.org); SAEximRunCond expanded to false Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Nov 2014 22:44:44 -0000

Hello lists,

Not sure if this is a hardware problem or a filesystem problem, so I have
cc'd freebsd-fs@.

The "problem" is on newly-installed FreeBSD 10.1 RC3. I say "problem" in
inverted commas because it's not stopping it from booting and the server
seems to run all right; I'm wondering if it's anything to worry about.

As well as seeing

gptboot: invalid backup GPT header

*before* beastie starts, I see the following in dmesg:

GEOM: mfid1: corrupt or invalid GPT detected.
GEOM: mfid1: GPT rejected -- may not be recoverable.

There are 4 disks installed - mfid0,1,2 & 3. mfid0 is a regular ufs gpt
based disk. mfid1,2 and 3 together form a zfs raidz array.

A thread on
https://forums.freenas.org/index.php?threads/gpt-table-is-corrupt-or-invalid-error-on-bootup.12171/
describes a similar problem - the thing is though the "erroring" disk is
not a GPT disk, and the one in the example was.

# gpart list mfid1
gpart: No such geom: mfid1.
# ls -la /dev/mfid*
crw-r-----  1 root  operator  0x62 Nov  1 13:37 /dev/mfid0
crw-r-----  1 root  operator  0x66 Nov  1 13:37 /dev/mfid0p1
crw-r-----  1 root  operator  0x67 Nov  1 13:37 /dev/mfid0p2
crw-r-----  1 root  operator  0x68 Nov  1 13:37 /dev/mfid0p3
crw-r-----  1 root  operator  0x63 Nov  1 13:37 /dev/mfid1
crw-r-----  1 root  operator  0x64 Nov  1 13:37 /dev/mfid2
crw-r-----  1 root  operator  0x65 Nov  1 13:37 /dev/mfid3
# zpool status
  pool: vms
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vms         ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            mfid1   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0

errors: No known data errors

Should I worry?
-- 
John
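For background on the messages quoted above: a GPT disk carries a primary header in LBA 1 and a backup header that normally sits in the last LBA, and both begin with the 8-byte signature "EFI PART". The following is a minimal sketch in Python that reports whether those signatures are present. It is not taken from the thread and does not diagnose the warning; the function name gpt_signatures is hypothetical, 512-byte sectors are assumed, and it is written against a disk image file (reading a raw device node may impose alignment requirements this sketch ignores).

```python
import os

GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of a GPT header


def gpt_signatures(path, sector_size=512):
    """Report whether GPT header signatures sit in LBA 1 and in the last LBA.

    Assumes sector_size-byte sectors and an image at least two sectors long.
    """
    with open(path, "rb") as dev:
        size = dev.seek(0, os.SEEK_END)
        if size < 2 * sector_size:
            raise ValueError("image is smaller than two sectors")
        # Primary header: second sector (LBA 1).
        dev.seek(sector_size)
        primary = dev.read(sector_size)[:8] == GPT_SIGNATURE
        # Backup header: normally the very last sector.
        dev.seek(size - sector_size)
        backup = dev.read(sector_size)[:8] == GPT_SIGNATURE
    return {"primary": primary, "backup": backup}


if __name__ == "__main__":
    import sys

    print(gpt_signatures(sys.argv[1]))
```

A result where only one of the two signatures is present would at least show that something GPT-like is on the disk; whether that is what GEOM is complaining about on mfid1 is not something this check can establish.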