From: Martin Ranne
To: Andriy Gapon
Cc: "freebsd-fs@freebsd.org"
Date: Fri, 20 Jan 2012 09:09:38 +0000
Subject: RE: zpool import reboots computer

On 2012-01-19 21:10, Andriy Gapon wrote:
>on 19/01/2012 21:58 Martin Ranne said the following:
>>On 2012-01-19 18:55, Andriy Gapon wrote:
>>on 19/01/2012 19:36 Martin Ranne said the following:
>>>On 2012-01-19 17:32, Andriy Gapon wrote:
>>>on 19/01/2012 17:36 Martin Ranne said the following:
>>>>>>I had a failure in one server where i try to determine if it is
>>>>>>memory or cpu. It shows up as memory failure in memtest86. The
>>>>>>result is that it managed to damage the zpool which is a raidz2
>>>>>>with 6 disks.
>>>>>>If I boot from a FreeBSD 9.0-RELEASE usb stick and import it with
>>>>>>zpool -f -R /mnt/zroot zroot it will reboot the computer. I have
>>>>>>also tried to import it in another computer which is running
>>>>>>9-STABLE with the same result. On the second computer I used
>>>>>>zpool -f -R /mnt/zroot "zpool-id" serv06zroot
>>>>>>Can I get some help on how to be able to debug this and in the end
>>>>>>be able to import it to repair it.
>>>>>>Data for the second computer can be found attached. The disks in
>>>>>>question are da0 to da5 in this.
>>>>>And the panic message is?
>>>>I am trying to get a crash dump but it hangs when dumping.
>>>Alternatives:
>>>- serial console
>>>- digital camera
>>>- eyes plus pen and paper
>>Finally here it is. Is there anything i can do in the debugger to make
>>it possible to find what is crashing in there?
>>Fatal trap 12: page fault while in kernel mode
>>Fatal trap 12: page fault while in kernel mode
>>cpuid = 0; cpuid = 2; apic id = 00
>>apic id = 02
>>fault virtual address = 0x88
>>fault virtual address = 0x38
>>fault code = supervisor read data, page not present
>>fault code = supervisor read data, page not present
>>instruction pointer = 0x20:0xffffffff814a7ef5
>>instruction pointer = 0x20:0xffffffff814872a1
>>stack pointer = 0x28:0xffffff8c10252ad0
>>stack pointer = 0x28:0xffffff8c0d564f00
>>frame pointer = 0x28:0xffffff8c10252b40
>>frame pointer = 0x28:0xffffff8c0d564f30
>>code segment = base 0x0, limit 0xfffff, type 0x1b
>>code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>processor eflags = processor eflags = interrupt enabled, interrupt enabled, resume, resume, IOPL = 0
>>IOPL = 0
>>current process = current process = 2659 (zpool)
>>0 [ thread pid 2659 tid 100592 ]
>Hmm, two traps running almost perfectly in parallel...
>stopped at zio_vdev_child_io+0x25: cmpq $0,0x88(%r10)
>db>
>At least the 'bt' command.
>It could be that the panic is caused by corrupted vdev label, but not sure...

I tried again to get into the debugger. It does not always work, as it
freezes before I get to the prompt most of the time, but here it is. Are
there any other commands to run in the debugger to get better information
to help solve this?

I used the command
zpool import -F -f -o readonly=on -R /mnt/serv06 zroot

Result is the following

Fatal trap 12: page fault while in kernel mode
Fatal trap 12: page fault while in kernel mode
cpuid = 0; cpuid = 5; apic id = 00
apic id = 05
fault virtual address = 0x38
fault virtual address = 0x88
fault code = supervisor read data, page not present
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff814872a1
instruction pointer = 0x20:0xffffffff814a7ef5
stack pointer = 0x28:0xffffff8c0d564f00
stack pointer = 0x28:0xffffff8c0ffd7ad0
frame pointer = 0x28:0xffffff8c0d564f30
frame pointer = 0x28:0xffffff8c0ffd7b40
code segment = base 0x0, limit 0xfffff, type 0x1b
code segment = base 0x0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = processor eflags = interrupt enabled, interrupt enabled, resume, resume, IOPL = 0
IOPL = 0
current process = current process = 0 (system_task1_3)
26[ thread pid 0 tid 100099 ]
Stopped at vdev_is_dead+0x1: cmpq $0x5,0x28(%rdi)
db> bt
Tracing pid 0 tid 100099 td 0xfffffe000e546460
vdev_is_dead() at vdev_is_dead+0x1
vdev_mirror_child_select() at vdev_mirror_child_select+0x67
vdev_mirror_io_start() at vdev_mirror_io_start+0x24c
zio_vdev_io_start() at zio_vdev_io_start+0x232
zio_execute() at zio_execute+0xc3
zio_gang_assemble() at zio_gang_assemble+0x1b
zio_execute() at zio_execute+0xc3
arc_read_nolock() at arc_read_nolock+0x6d1
arc_read() at arc_read+0x93
traverse_prefetcher() at traverse_prefetcher+0x103
traverse_visitbp() at traverse_visitbp+0x21c
traverse_dnode() at traverse_dnode+0x7c
traverse_visitbp() at traverse_visitbp+0x3ff
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_dnode() at traverse_dnode+0x7c
traverse_visitbp() at traverse_visitbp+0x48c
traverse_prefetch_thread() at traverse_prefetch_thread+0x78
taskq_run() at taskq_run+0x13
taskqueue_run_locked() at taskqueue_run_locked+0x85
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
db>

//Martin Ranne
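
A note on reading the two fault addresses: for an operand of the form disp(%reg), the faulting address is the register value plus the displacement, so the register value can be recovered by subtraction. The sketch below does that arithmetic for the two instructions shown at the debugger prompt. The helper names are made up for illustration. Pairing fault address 0x38 with the vdev_is_dead instruction is an assumption, since the two traps are interleaved on the console; for zio_vdev_child_io the displacement and the fault address are both 0x88.

```python
MASK64 = (1 << 64) - 1  # amd64 registers are 64 bits wide


def implied_base(fault_addr: int, displacement: int) -> int:
    """Return the base-register value implied by a faulting disp(%reg) operand."""
    return (fault_addr - displacement) & MASK64


def looks_like_null_deref(base: int, page_size: int = 4096) -> bool:
    """A base inside the first page usually means a NULL (or near-NULL) pointer."""
    return base < page_size


# (instruction as printed by the debugger, fault virtual address, displacement)
cases = [
    ("zio_vdev_child_io+0x25: cmpq $0,0x88(%r10)", 0x88, 0x88),
    # Assumption: fault address 0x38 belongs to this instruction.
    ("vdev_is_dead+0x1: cmpq $0x5,0x28(%rdi)", 0x38, 0x28),
]

for insn, fault, disp in cases:
    base = implied_base(fault, disp)
    print(f"{insn}: base = {base:#x}, near-NULL = {looks_like_null_deref(base)}")
```

Under those assumptions both base registers come out inside the first page, which would fit a NULL or near-NULL vdev pointer being dereferenced rather than a wild address. It does not say why the pointer is bad.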