From owner-freebsd-bugs@FreeBSD.ORG Tue Sep 16 11:10:03 2008
Message-Id: <200809161054.m8GAsnp9079295@erg.verweg.com>
Date: Tue, 16 Sep 2008 12:54:49 +0200 (CEST)
From: Ruben van Staveren
To: FreeBSD-gnats-submit@FreeBSD.org
Reply-To: Ruben van Staveren
Subject: kern/127420: panic: Journal overflow on gmirrored gjournal
X-Send-Pr-Version: 3.113

>Number:         127420
>Category:       kern
>Synopsis:       panic: Journal overflow on gmirrored gjournal
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Sep 16 11:10:02 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Ruben van Staveren
>Release:        FreeBSD 7.1-PRERELEASE amd64
>Organization:
>Environment:
System: FreeBSD chassis 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #2: Tue Sep 16 11:29:52 CEST 2008 root@chassis:/opt/obj/usr/cvsup/7-stable/src/sys/CHASSIS-DEBUG amd64

>Description:

Crash 1

panic: Journal overflow (joffset=180955342336 active=180735900160 inactive=180952868864)
cpuid = 1
Uptime: 40m34s
Physical memory: 4085 MB
Dumping 625 MB:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x200
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x8:0x200
stack pointer           = 0x10:0xffffffffae1ece40
frame pointer           = 0x10:0xffffffffae1ece70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 47 (g_journal mirror/gm)
trap number             = 12

Crash 2 (with debug kernel)

panic: Journal overflow (joffset=180542946816 active=181305220608 inactive=180542008320)
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17d
g_journal_flush() at g_journal_flush+0x8cb
g_journal_worker() at g_journal_worker+0x14ce
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffae1edd30, rbp = 0 ---
panic: BUF_UNLOCK 0xffffffff9a26e220 while B_REMFREE is still set.
cpuid = 1
panic: BUF_UNLOCK 0xffffffff9a04b420 while B_REMFREE is still set.
cpuid = 1
Uptime: 20m24s
Physical memory: 4084 MB
Dumping 625 MB:

Unfortunately, dumping no longer succeeds at this stage.

Kernel config: the -DEBUG version just includes that file, with these extra options:

options BREAK_TO_DEBUGGER
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_KDB
options DIAGNOSTIC

(I had to disable some KASSERTs in sys/geom/geom_io.c, as gjournal seems to
alter some data there; see also
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-08/msg00648.html )

http://ruben.is.verweg.com/stuff/gjournal-panic/CHASSIS
http://ruben.is.verweg.com/stuff/gjournal-panic/dmesg.boot

The machine is a Sun X2100M2 with 2 x 250 GB SATA drives.

Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NOFAILSYNC
GenID: 0
SyncID: 1
ID: 4042519102
Providers:
1. Name: mirror/gm0
   Mediasize: 250055999488 (233G)
   Sectorsize: 512
   Mode: r6w6e8
Consumers:
1. Name: ad4
   Mediasize: 250056000000 (233G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 2820405034
2. Name: ad6
   Mediasize: 250056000000 (233G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 933275518

Geom name: gjournal 243051746
ID: 243051746
Providers:
1. Name: mirror/gm0s1a.journal
   Mediasize: 3221224960 (3.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1a
   Mediasize: 4294967296 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 4294966784
   Jstart: 3221224960
   Role: Data,Journal

Geom name: gjournal 3027218344
ID: 3027218344
Providers:
1. Name: mirror/gm0s1d.journal
   Mediasize: 33285996032 (31G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1d
   Mediasize: 34359738368 (32G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 34359737856
   Jstart: 33285996032
   Role: Data,Journal

Geom name: gjournal 1964026446
ID: 1964026446
Providers:
1. Name: mirror/gm0s1e.journal
   Mediasize: 3221224960 (3.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1e
   Mediasize: 4294967296 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 4294966784
   Jstart: 3221224960
   Role: Data,Journal

Geom name: gjournal 3220754734
ID: 3220754734
Providers:
1. Name: mirror/gm0s1f.journal
   Mediasize: 7516192256 (7.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1f
   Mediasize: 8589934592 (8.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 8589934080
   Jstart: 7516192256
   Role: Data,Journal

Geom name: gjournal 1120739874
ID: 1120739874
Providers:
1. Name: mirror/gm0s1g.journal
   Mediasize: 180255252480 (168G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1g
   Mediasize: 181328994816 (169G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 181328994304
   Jstart: 180255252480
   Role: Data,Journal

      Name  Status  Components
label/swap     N/A  mirror/gm0s1b
  ufs/root     N/A  mirror/gm0s1a.journal
   ufs/var     N/A  mirror/gm0s1d.journal
   ufs/tmp     N/A  mirror/gm0s1e.journal
   ufs/usr     N/A  mirror/gm0s1f.journal
   ufs/opt     N/A  mirror/gm0s1g.journal

******* Working on device /dev/ad4 *******
parameters extracted from in-core disklabel are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 488375937 (238464 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 703/ head 254/ sector 63
The data for partition 2 is:
The data for partition 3 is:
The data for partition 4 is:

******* Working on device /dev/ad6 *******
parameters extracted from in-core disklabel are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 488375937 (238464 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 703/ head 254/ sector 63
The data for partition 2 is:
The data for partition 3 is:
The data for partition 4 is:

# /dev/mirror/gm0s1:
8 partitions:
#        size     offset    fstype   [fsize bsize bps/cpg]
  a:    8388608         16    4.2BSD     2048 16384 28528
  b:   33554432    8388624      swap
  c:  488375937          0    unused        0     0         # "raw" part, don't edit
  d:   67108864   41943056    4.2BSD     2048 16384 28528
  e:    8388608  109051920    4.2BSD     2048 16384 28528
  f:   16777216  117440528    4.2BSD     2048 16384 28528
  g:  354158193  134217744    4.2BSD     2048 16384 28528

/dev/ufs/root on / (ufs, asynchronous, local, gjournal)
devfs on /dev (devfs, local)
/dev/ufs/opt on /opt (ufs, asynchronous, local, gjournal)
/dev/ufs/tmp on /tmp (ufs, asynchronous, local, gjournal)
/dev/ufs/usr on /usr (ufs, asynchronous, local, gjournal)
/dev/ufs/var on /var (ufs, asynchronous, local, gjournal)

>How-To-Repeat:

On /opt/bonnie, run in parallel:

bonnie++ -c 4 -s 4096 -r 4096 -u nobody -d $PWD

Both bonnie processes will stall the system in suspfs/wdrain states until it
panics.

Building a 1 GB nanobsd image will also lock up during the disk install phase
in suspfs/wdrain, but that is not always reproducible: it succeeds about 50%
of the time.

It looks like it takes longer to trigger when using the debugging options.

>Fix:

Maybe don't run a mirrored gjournal on FreeBSD/amd64?

>Release-Note:
>Audit-Trail:
>Unformatted:
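
A note on the numbers in the two panic messages: all of the reported offsets fall inside the journal area of mirror/gm0s1g (Jstart/Jend in the gjournal list output), which is the provider behind /opt, where bonnie++ was running. The Python sketch below only redoes that arithmetic. The constants are copied from this report; the function names are made up for illustration, and `head_reached_inactive` encodes an assumed ring-buffer rule (the write offset must not run into the start of the inactive journal). It is not taken from the gjournal source and may differ from the kernel's actual check.

```python
# Values copied from the report: "gjournal list" for mirror/gm0s1g and the
# two "Journal overflow" panic messages.
SECTOR = 512
JSTART = 180255252480
JEND = 181328994304
CONSUMER_MEDIASIZE = 181328994816

CRASHES = {
    "crash 1": dict(joffset=180955342336, active=180735900160,
                    inactive=180952868864),
    "crash 2": dict(joffset=180542946816, active=181305220608,
                    inactive=180542008320),
}


def journal_size(jstart, jend):
    """Size in bytes of the journal area between Jstart and Jend."""
    return jend - jstart


def in_journal(offset, jstart=JSTART, jend=JEND):
    """True if the offset lies inside the journal area."""
    return jstart <= offset < jend


def head_reached_inactive(joffset, active, inactive):
    """ASSUMED ring-buffer rule, not the kernel's code.

    The write offset starts at the active journal and advances, wrapping
    from Jend back to Jstart. It has overrun the journal if it has reached
    the start of the inactive (not yet fully flushed) journal.
    """
    if active < inactive:
        # No wrap needed: the head only has to advance up to 'inactive'.
        return joffset >= inactive
    if active > inactive:
        # The head wrapped around and now sits between 'inactive' and 'active'.
        return inactive <= joffset < active
    return False


def summarize():
    """Per crash: (name, all offsets in journal, rule matches, bytes past inactive)."""
    rows = []
    for name, c in CRASHES.items():
        rows.append((
            name,
            all(in_journal(v) for v in c.values()),
            head_reached_inactive(**c),
            c["joffset"] - c["inactive"],
        ))
    return rows


if __name__ == "__main__":
    print("journal size:", journal_size(JSTART, JEND), "bytes")
    print("bytes after Jend on the consumer:", CONSUMER_MEDIASIZE - JEND)
    for row in summarize():
        print(row)
```

Under these assumptions the journal area is exactly 1 GiB, and in both crashes joffset sits a few megabytes or less past the inactive journal's start offset. That would fit the journal filling faster than it can be flushed under the bonnie++ load, but this is only a reading of the numbers, not a confirmed diagnosis.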