From: Willem Jan Withagen
Date: Sat, 20 Jun 2015 16:19:39 +0200
To: fs@freebsd.org
Subject: This diskfailure should not panic a system, but just disconnect disk from ZFS
Message-ID: <5585767B.4000206@digiware.nl>

Hi,

Found my system rebooted this morning:

Jun 20 05:28:33 zfs kernel: sonewconn: pcb 0xfffff8011b6da498: Listen queue overflow: 8 already in queue awaiting acceptance (48 occurrences)
Jun 20 05:28:33 zfs kernel: panic: I/O to pool 'zfsraid' appears to be hung on vdev guid 18180224580327100979 at '/dev/da0'.
Jun 20 05:28:33 zfs kernel: cpuid = 0
Jun 20 05:28:33 zfs kernel: Uptime: 8d9h7m9s
Jun 20 05:28:33 zfs kernel: Dumping 6445 out of 8174 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Which leads me to believe that /dev/da0 went out on vacation, leaving ZFS in trouble....

But the array is:
----
NAME               SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP
zfsraid           32.5T  13.3T  19.2T         -     7%    41%  1.00x  ONLINE  -
  raidz2          16.2T  6.67T  9.58T         -     8%    41%
    da0               -      -      -         -      -      -
    da1               -      -      -         -      -      -
    da2               -      -      -         -      -      -
    da3               -      -      -         -      -      -
    da4               -      -      -         -      -      -
    da5               -      -      -         -      -      -
  raidz2          16.2T  6.67T  9.58T         -     7%    41%
    da6               -      -      -         -      -      -
    da7               -      -      -         -      -      -
    ada4              -      -      -         -      -      -
    ada5              -      -      -         -      -      -
    ada6              -      -      -         -      -      -
    ada7              -      -      -         -      -      -
  mirror           504M  1.73M   502M         -    39%     0%
    gpt/log0          -      -      -         -      -      -
    gpt/log1          -      -      -         -      -      -
cache                 -      -      -         -      -      -
  gpt/raidcache0   109G  1.34G   107G         -     0%     1%
  gpt/raidcache1   109G   787M   108G         -     0%     0%
----

And thus I would have expected that ZFS would disconnect /dev/da0, switch to the DEGRADED state and continue, letting the operator fix the broken disk.
Instead it chooses to panic, which is not a nice thing to do. :)

Or are my hopes for ZFS too high?

The next question to answer is why this WD Red on:

arcmsr0@pci0:7:14:0: class=0x010400 card=0x112017d3 chip=0x112017d3 rev=0x00 hdr=0x00
    vendor     = 'Areca Technology Corp.'
    device     = 'ARC-1120 8-Port PCI-X to SATA RAID Controller'
    class      = mass storage
    subclass   = RAID

got hung, while nothing about this shows up in SMART....

Thanx,
--WjW

(If needed, a vmcore is available.)