From owner-freebsd-current@FreeBSD.ORG  Sun Jul 12 10:30:10 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3E871106566C
	for <freebsd-current@freebsd.org>; Sun, 12 Jul 2009 10:30:10 +0000 (UTC)
	(envelope-from lstewart@freebsd.org)
Received: from lauren.room52.net (lauren.room52.net [210.50.193.198])
	by mx1.freebsd.org (Postfix) with ESMTP id C04FE8FC25
	for <freebsd-current@freebsd.org>; Sun, 12 Jul 2009 10:30:09 +0000 (UTC)
	(envelope-from lstewart@freebsd.org)
Received: from lstewart-laptop.caia.swin.edu.au
	(host86-150-124-14.range86-150.btcentralplus.com [86.150.124.14])
	(authenticated bits=0)
	by lauren.room52.net (8.14.3/8.14.3) with ESMTP id n6CATrgj062031
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 12 Jul 2009 20:30:01 +1000 (EST)
	(envelope-from lstewart@freebsd.org)
Message-ID: <4A59BB11.70706@freebsd.org>
Date: Sun, 12 Jul 2009 11:29:37 +0100
From: Lawrence Stewart <lstewart@freebsd.org>
User-Agent: Thunderbird 2.0.0.22 (X11/20090626)
MIME-Version: 1.0
To: Stefan Bethke <stb@lassitu.de>
References: <128E7C52-CCBD-4BAF-A4AE-1D914A3968CB@lassitu.de>
	<4A58DD8D.3090308@freebsd.org>
	<6D58BB3C-85F4-44A6-A43B-F6E18F056FA4@lassitu.de>
	<4A598DDF.4010306@freebsd.org>
	<6C047344-397E-4F14-97F1-C61FD80AAC3F@lassitu.de>
In-Reply-To: <6C047344-397E-4F14-97F1-C61FD80AAC3F@lassitu.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-0.5 required=5.0 tests=AWL,BAYES_00,RCVD_IN_PBL,
	RDNS_DYNAMIC,SPF_SOFTFAIL autolearn=disabled version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on lauren.room52.net
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: ppp triggers GPF panic
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 12 Jul 2009 10:30:10 -0000

Stefan Bethke wrote:
> Am 12.07.2009 um 09:16 schrieb Lawrence Stewart:
> 
>> Stefan Bethke wrote:
>>> Am 11.07.2009 um 20:44 schrieb Lawrence Stewart:
>>>> Stefan Bethke wrote:
>>>>> Yesterday's -current, amd64, C2D, 4 GB RAM. Full dmesg below.
>>>>> Fatal trap 9: general protection fault while in kernel mode
>>>>> cpuid = 0; apic id = 00
>>>>> instruction pointer    = 0x20:0xffffffff802fc2ce
>>>>> stack pointer            = 0x28:0xffffff8000037b10
>>>>> frame pointer            = 0x28:0xffffff8000037b30
>>>>> code segment        = base 0x0, limit 0xfffff, type 0x1b
>>>>>           = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>> processor eflags    = interrupt enabled, resume, IOPL = 0
>>>>> current process        = 12 (swi1: netisr 0)
>>>>> [thread pid 12 tid 100007 ]
>>>>> Stopped at      _mtx_lock_sleep+0x4e:   movl    0x288(%rcx),%esi
>>>>> Didn't capture anything else there.  This happened when my ADSL 
>>>>> link was forced down (24h connection reset).
>>>>> After fixing the file system (UFS2 + softupdates on /), I got 
>>>>> another "panic: spin lock held too long" on rebooting.
>>>>> Then, the GPF panic happened again as ppp was trying to establish 
>>>>> the connection:
>>>>
>>>> 1. Do you have a crash dump?
>>> Unfortunately not.
>>>> 2. Can you try to find a sequence of events to deterministically 
>>>> reproduce this?
>>> Not if I can help it, this is my main gateway at home.  Sorry.  But 
>>> I'll try to collect as much info as possible if and when it happens again.
>>
>> You can set debug.debugger_on_panic=0 in /etc/sysctl.conf which will 
>> make the system automatically dump core and reset instead of sitting 
>> at the ddb prompt. Alternatively, run "call doadump" from the ddb 
>> prompt followed by "reset" and that should also get you a usable core 
>> file. I'd suggest the first option for you though given you don't like 
>> the machine being down. Let us know if/when it happens again, but 
>> without a core file there's not much we can help with.
> 
> Happened again when ppp tried to reestablish the connection. 
> Unfortunately, the dump wasn't good enough for savecore:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 1; apic id = 01
> instruction pointer    = 0x20:0xffffffff802fc2ce
> stack pointer            = 0x28:0xffffff807512c540
> frame pointer            = 0x28:0xffffff807512c560
> code segment        = base 0x0, limit 0xfffff, type 0x1b
>             = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = interrupt enabled, resume, IOPL = 0
> current process        = 9451 (ifconfig)
> [thread pid 9451 tid 100126 ]
> Stopped at      _mtx_lock_sleep+0x4e:   movl    0x288(%rcx),%esi
> db> bt
> Tracing pid 9451 tid 100126 td 0xffffff0002771390
> _mtx_lock_sleep() at _mtx_lock_sleep+0x4e
> _mtx_lock_flags() at _mtx_lock_flags+0x43
> netisr_queue_internal() at netisr_queue_internal+0x4f
> netisr_queue_src() at netisr_queue_src+0x3c
> rt_newaddrmsg() at rt_newaddrmsg+0x1d1
> rtinit() at rtinit+0x3c0
> in_ifinit() at in_ifinit+0x2f0
> in_control() at in_control+0xf12
> ifioctl() at ifioctl+0xfc1
> kern_ioctl() at kern_ioctl+0xf6
> ioctl() at ioctl+0xfd
> syscall() at syscall+0x19e
> Xfast_syscall() at Xfast_syscall+0xe1
> --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800b7df5c, rsp = 
> 0x7fffffffe058, rbp = 0x7fffffffec2d ---
> db> call doadump
> Physical memory: 3983 MB
> Dumping 2351 MB: 2336 2320 2304 2288 2272 2256 2240 2224 2208 2192 2176 
> 2160 2144 2128 2112 2096 2080 2064 2048 2032 2016 2000 1984 1968 1952 
> 1936 1920 1904 1888 1872 1856 1840 1824 1808 1792 1776 1760 1744 1728 
> 1712 1696 1680 1664 1648 1632 1616 1600 1584 1568 1552 1536 1520 1504 
> 1488 1472 1456 1440 1424 1408 1392 1376 1360 1344 1328 1312 1296 1280 
> 1264 1248 1232 1216 1200 1184 1168 1152 1136 1120 1104 1088 1072 1056 
> 1040 1024 1008 992 976 960 944 928 912 896 880 864 848 832 816 800 784 
> 768 752 736 720 704 688 672 656 640 624 608 592 576 560 544 528 512 496 
> 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 
> 192 176 160 144 128 112 96 80 64 48 32 16
> Dump complete
> = 0
> db> reset
> /boot.config: -DhS38400
> Consoles: internal video/keyboard  serial port
> BIOS drive A: is disk0
> ...
> 
> savecore: first and last dump headers disagree on /dev/mirror/diesel_swap
> savecore: unsaved dumps found but not saved
> savecore: first and last dump headers disagree on /dev/mirror/diesel_swap
> savecore: unsaved dumps found but not saved
> No crash dumps in /var/crash.
> 
> 
> I'll reconfigure swap to use a raw disk instead of a mirror.

Yeah, dump not working with mirrored disks is a huge PITA. Please make 
the change so we can get a usable crash dump.
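
A minimal sketch of what that could look like in /etc/rc.conf, assuming 
the raw swap partition is /dev/ad4s1b (a placeholder device name, not 
taken from your setup; substitute your own):

```shell
# /etc/rc.conf fragment (sh-style variable assignments).
# ASSUMPTION: /dev/ad4s1b is a hypothetical raw swap partition;
# replace it with the real device on the gateway.
dumpdev="/dev/ad4s1b"   # kernel writes the crash dump here on panic
dumpdir="/var/crash"    # savecore extracts the dump here on next boot
```

With debug.debugger_on_panic=0 still set in /etc/sysctl.conf, the box 
should then dump to the raw partition and reset on its own.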

Kamigishi has suggested to me that the panic isn't occurring (as much?) 
with an r195617 world/kernel. Could you perhaps try updating to r195617 
and let us know if you continue to observe the panic?

Cheers,
Lawrence