From: Martin Ranne
To: Andriy Gapon
Cc: "freebsd-fs@freebsd.org"
Date: Mon, 23 Jan 2012 14:38:28 +0000
Subject: RE: zpool import reboots computer

>On 2012-01-21 15:20, Andriy Gapon wrote:
>>on 20/01/2012 11:09 Martin Ranne said the following:
>>I tried again to get into the debugger.
>>It will not always work, as it freezes before I get to the prompt most of
>>the time, but here it is. Any other commands to run in the debugger to get
>>better information to help solve this?
>>I used the command zpool import -F -f -o readonly=on -R /mnt/serv06 zroot
>>Result is the following
>>Fatal trap 12: page fault while in kernel mode
>>Fatal trap 12: page fault while in kernel mode
>>cpuid = 0; cpuid = 5; apic id = 00
>>apic id = 05
>>fault virtual address = 0x38
>>fault virtual address = 0x88
>>fault code = supervisor read data, page not present
>>fault code = supervisor read data, page not present
>>instruction pointer = 0x20:0xffffffff814872a1
>>instruction pointer = 0x20:0xffffffff814a7ef5
>>stack pointer = 0x28:0xffffff8c0d564f00
>>stack pointer = 0x28:0xffffff8c0ffd7ad0
>>frame pointer = 0x28:0xffffff8c0d564f30
>>frame pointer = 0x28:0xffffff8c0ffd7b40
>>code segment = base 0x0, limit 0xfffff, type 0x1b
>>code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>processor eflags = processor eflags = interrupt enabled,
>>interrupt enabled, resume, resume, IOPL = 0
>>IOPL = 0
>>current process = current process = 0 (system_task1_3)
>>26[ thread pid 0 tid 100099 ]
>>Stopped at vdev_is_dead+0x1: cmpq $0x5,0x28(%rdi)
>>db> bt
>>Tracing pid 0 tid 100099 td 0xfffffe000e546460
>>vdev_is_dead() at vdev_is_dead+0x1
>>vdev_mirror_child_select() at vdev_mirror_child_select+0x67
>>vdev_mirror_io_start() at vdev_mirror_io_start+0x24c
>>zio_vdev_io_start() at zio_vdev_io_start+0x232
>>zio_execute() at zio_execute+0xc3
>>zio_gang_assemble() at zio_gang_assemble+0x1b
>>zio_execute() at zio_execute+0xc3
>>arc_read_nolock() at arc_read_nolock+0x6d1
>>arc_read() at arc_read+0x93
>>traverse_prefetcher() at traverse_prefetcher+0x103
>>traverse_visitbp() at traverse_visitbp+0x21c
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x3ff
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x48c
>>traverse_prefetch_thread() at traverse_prefetch_thread+0x78
>>taskq_run() at taskq_run+0x13
>>taskqueue_run_locked() at taskqueue_run_locked+0x85
>>taskqueue_thread_loop() at taskqueue_thread_loop+0x46
>>fork_exit() at fork_exit+0x11f
>>fork_trampoline() at fork_trampoline+0xe
>>--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
>>db>
>>
>
>To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>NULL although the code doesn't expect it. You can add some code to the function
>to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>Such a hack is probably not needed in general, but given that your pool could be
>corrupted, this could be your chance to get access to it.
>
>BTW, restoring from backups is what is usually recommended first in a situation
>like this.
>

I know that restoring from backup is the recommended first step, but the
backups had failed.

I am back after the weekend. I have added the hack to the
vdev_mirror_child_select function as shown below.

        if (mc->mc_tried || mc->mc_skipped)
                continue;
        /* hack start */
        if (mc->mc_vd == NULL)
                break;
        /* hack end */
        if (!vdev_readable(mc->mc_vd)) {

I no longer get faults at virtual addresses 0x38 and 0x88; instead I get
two faults, both at 0x88. The function it now stops in is zio_vdev_child_io.
Is there another hack I could do there? Crash and bt below.

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x88
cpuid = 5; fault code = supervisor read data, page not present
apic id = 05
instruction pointer = 0x20:0xffffffff814a7ee5
fault virtual address = 0x88
stack pointer = 0x28:0xffffff8c0d564f00
fault code = supervisor read data, page not present
frame pointer = 0x28:0xffffff8c0d564f70
instruction pointer = 0x20:0xffffffff814a7ee5
code segment = base 0x0, limit 0xfffff, type 0x1b
stack pointer = 0x28:0xffffff8c1009aad0
 = DPL 0, pres 1, long 1, def32 0, gran 1
frame pointer = 0x28:0xffffff8c1009ab40
processor eflags = code segment = base 0x0, limit 0xfffff, type 0x1b
interrupt enabled, = DPL 0, pres 1, long 1, def32 0, gran 1
resume, processor eflags = IOPL = 0
interrupt enabled, current process = resume, 0 (system_taskq_3)
I[ thread pid 0 tid 100099 ]
Stopped at zio_vdev_child_io+0x25: cmpq $0, 0x88(%r10)
db> bt
Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
zio_vdev_child_io() at zio_vdev_child_io+0x25
vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
zio_vdev_io_start() at zio_vdev_io_start+0x232
zio_execute() at zio_execute+0xc3
zio_gang_assemble() at zio_gang_assemble+0x1b
zio_execute() at zio_execute+0xc3
arc_read_nolock() at arc_read_nolock+0x6d1
arc_read() at arc_read+0x93
traverse_prefetcher() at traverse_prefetcher+0x103
traverse_visitbp() at traverse_visitbp+0x21c
traverse_dnode() at traverse_dnode+0x7c
traverse_visitbp() at traverse_visitbp+0x3ff
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_dnode() at traverse_dnode+0x7c
traverse_visitbp() at traverse_visitbp+0x48c
traverse_prefetch_thread() at traverse_prefetch_thread+0x78
taskq_run() at taskq_run+0x13
taskqueue_run_locked() at taskqueue_run_locked+0x85
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
db>

//Martin Ranne
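
One possible explanation for why the fault only moved from vdev_is_dead to
zio_vdev_child_io, sketched here under assumptions rather than taken from the
ZFS source: if the selection function has a later fallback pass that returns
any child not yet marked as tried, then leaving the first loop with "break"
leaves the NULL child unmarked, and the fallback can still hand it to the
caller. Marking the child as tried and skipped and then using "continue" would
avoid that. The types, the "readable" field, the use_break switch and the
fallback pass below are all hypothetical stand-ins for illustration; only the
mc_vd, mc_tried and mc_skipped names come from the snippet in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vdev; only what this sketch needs. */
struct vdev {
	int readable;
};

/* Hypothetical stand-in for a mirror child; field names follow the message. */
struct mirror_child {
	struct vdev *mc_vd;
	int mc_tried;
	int mc_skipped;
};

/*
 * Pick a child index, or -1 if none is usable.
 * use_break != 0 models the hack as posted (leave the loop on a NULL mc_vd);
 * use_break == 0 models the alternative (mark the child, then continue).
 */
static int
child_select(struct mirror_child *mc, int children, int use_break)
{
	int c;

	assert(mc != NULL);

	for (c = 0; c < children; c++) {
		if (mc[c].mc_tried || mc[c].mc_skipped)
			continue;
		if (mc[c].mc_vd == NULL) {
			if (use_break)
				break;		/* child stays unmarked */
			mc[c].mc_tried = 1;	/* never offer this child again */
			mc[c].mc_skipped = 1;
			continue;
		}
		if (mc[c].mc_vd->readable)
			return (c);
		mc[c].mc_tried = 1;
		mc[c].mc_skipped = 1;
	}

	/* ASSUMED fallback pass: return any child that was never tried. */
	for (c = 0; c < children; c++)
		if (!mc[c].mc_tried)
			return (c);

	return (-1);
}
```

In this model, a mirror whose first child has a NULL mc_vd gets index 0 back
when use_break is set, so the caller would go on to dereference the NULL
pointer, whereas the mark-and-continue variant returns the next readable child
or -1. Whether the real function behaves this way has to be checked against
the actual vdev_mirror.c on the affected system.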