From: Wayne Sierke
To: Jeremy Chadwick
Cc: freebsd-stable@freebsd.org
Date: Wed, 06 Aug 2008 18:15:45 +0930
Subject: Fatal trap 12/TIMEOUT - READ_DMA (was Re: Stuck in geli)

On Tue, 2008-08-05 at 20:30 -0700, Jeremy Chadwick wrote:
> This looks like the issue I've been tracking for months now. I'm sorry
> the document isn't complete; it's an issue of time...
>
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
> My experiences with disk timeouts on FreeBSD is that the OS does not
> handle it well at all, regardless of geli(4) being used or not. The
> entire system can deadlock, and in some cases panic (which for me is
> the more common result).
>

Recently I returned to my desktop system to find that it had rebooted itself. Opening the crash dump in kgdb, I found the following:

# kgdb /boot/kernel/kernel /var/crash/vmcore.3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
There is no member named pathname.
Error while mapping shared library sections:
rtc.ko: No such file or directory.
Reading symbols from /boot/kernel/vesa.ko...Reading symbols from /boot/kernel/vesa.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/vesa.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /boot/kernel/linux.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linux.ko
Reading symbols from /boot/kernel/snd_ich.ko...Reading symbols from /boot/kernel/snd_ich.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/snd_ich.ko
Reading symbols from /boot/kernel/sound.ko...Reading symbols from /boot/kernel/sound.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/sound.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/green_saver.ko...Reading symbols from /boot/kernel/green_saver.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/green_saver.ko
Error while reading shared library symbols:
rtc.ko: No such file or directory.

Unread portion of the kernel message buffer:
ad1: TIMEOUT - READ_DMA retrying (1 retry left) LBA=67332091

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x188
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc075ce24
stack pointer = 0x28:0xe52f1c04
frame pointer = 0x28:0xe52f1c1c
code segment = base 0x0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 18 (swi6: task queue)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 1d11h41m37s
Physical memory: 1519 MB
Dumping 214 MB: 199 183 167 151 135 119 103 87 71 55 39 23 7

#0 doadump () at pcpu.h:195
195 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc076a137 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0xc076a3f9 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:572
#3 0xc0a71aec in trap_fatal (frame=0xe52f1bc4, eva=392) at /usr/src/sys/i386/i386/trap.c:899
#4 0xc0a71d70 in trap_pfault (frame=0xe52f1bc4, usermode=0, eva=392) at /usr/src/sys/i386/i386/trap.c:812
#5 0xc0a7271c in trap (frame=0xe52f1bc4) at /usr/src/sys/i386/i386/trap.c:490
#6 0xc0a584ab in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc075ce24 in _mtx_lock_sleep (m=0xc5abedcc, tid=3302165152, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:335
#8 0xc07693b6 in _sema_post (sema=0xc5abedcc, file=0x0, line=0) at /usr/src/sys/kern/kern_sema.c:79
#9 0xc050bd30 in ata_completed (context=0xc5abed80, dummy=1) at /usr/src/sys/dev/ata/ata-queue.c:481
#10 0xc079ce85 in taskqueue_run (queue=0xc4d21680) at /usr/src/sys/kern/subr_taskqueue.c:255
#11 0xc079d193 in taskqueue_swi_run (dummy=0x0) at /usr/src/sys/kern/subr_taskqueue.c:297
#12 0xc074acfb in ithread_loop (arg=0xc4d2a940) at /usr/src/sys/kern/kern_intr.c:1036
#13 0xc0747ad9 in fork_exit (callout=0xc074ab50 , arg=0xc4d2a940, frame=0xe52f1d38) at /usr/src/sys/kern/kern_fork.c:783
#14 0xc0a58520 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb)

This is the first time I've examined one of these DMA TIMEOUT events. I've probably seen a handful of them over the last few years, perhaps 2 to 4 per year. Unfortunately, this system also occasionally faults because of X-related hangs (or what I assume to be X-related hangs). In any case, I haven't often noticed cores being left behind.

# atacontrol info ata0
Master: ad0 ATA/ATAPI revision 6
Slave: ad1 ATA/ATAPI revision 6

On another system I have here (a VIA EPIA running 6.3-PRERELEASE), I can find only one TIMEOUT instance in the last 3 years, and it had no discernible consequences. In fact, that system has been rock-solid. It's a mail server running courier-imap, apache-1.3, samba, mysql and assorted other software, but it is not heavily loaded.
Jan 2 03:30:20 lillith-iv kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3196031

# atacontrol info ata0
Master: ad0 ATA/ATAPI revision 6
Slave: ad1 ATA/ATAPI revision 6

Anyway, I don't know whether there's any significant or useful information in that vmcore. Perhaps someone could let me know?

Wayne
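The timeout counts mentioned above (a handful per few years on the desktop, one in three years on the mail server) come from reading logs by hand. A minimal Python sketch can tally such events instead, assuming the message format quoted in this post ("adN: TIMEOUT - READ_DMA retrying (1 retry left) LBA=..."), with or without a leading syslog header. The function names and the script name in the usage comment are hypothetical; other ata(4) messages are matched only if they follow the same pattern.

```python
import re
import sys
from collections import Counter

# Assumed format, taken from the two messages quoted in this post:
#   "Jan 2 03:30:20 lillith-iv kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=3196031"
#   "ad1: TIMEOUT - READ_DMA retrying (1 retry left) LBA=67332091"
TIMEOUT_RE = re.compile(
    r"\b(?P<dev>ad\d+): TIMEOUT - (?P<cmd>\w+)\b.*?LBA=(?P<lba>\d+)"
)


def find_timeouts(lines):
    """Yield (device, command, lba) for every line matching TIMEOUT_RE."""
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            yield m.group("dev"), m.group("cmd"), int(m.group("lba"))


def count_by_device(lines):
    """Return a Counter of timeout events per device (ad0, ad1, ...)."""
    return Counter(dev for dev, _cmd, _lba in find_timeouts(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 ata_timeouts.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        events = list(find_timeouts(log))
    for dev, cmd, lba in events:
        print(f"{dev} {cmd} LBA={lba}")
    print(Counter(dev for dev, _cmd, _lba in events))
```

Because syslog lines carry no year, counting per year would additionally depend on how the logs are rotated and archived.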
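One detail of the vmcore that can be checked mechanically: the trap message ("fault virtual address = 0x188") and frames #3/#4 ("eva=392") report the same faulting address in two bases, and the instruction pointer 0xc075ce24 matches frame #7, _mtx_lock_sleep(). The Python sketch below confirms both and applies a common rule of thumb: a kernel-mode fault inside the first page (assumed here to be 4 KiB on i386 and unmapped) usually indicates a NULL pointer plus a field offset. The helper null_deref_offset is hypothetical, and this is a reading aid under those assumptions, not a diagnosis of the ata(4) code path.

```python
# Assumption: i386 base page size of 4 KiB, with page 0 left unmapped.
PAGE_SIZE = 4096


def null_deref_offset(fault_va, page_size=PAGE_SIZE):
    """Hypothetical helper: return fault_va if it falls in the first page.

    A kernel-mode fault at such a small address commonly means a NULL
    structure pointer was dereferenced, with fault_va being the byte
    offset of the field that was read. Returns None otherwise.
    """
    return fault_va if 0 <= fault_va < page_size else None


fault_va = 0x188     # "fault virtual address" in the trap message
eva = 392            # eva= argument in frames #3 and #4
eip = 0xc075ce24     # offset part of "instruction pointer = 0x20:0xc075ce24"
frame7 = 0xc075ce24  # address shown for frame #7, _mtx_lock_sleep()

print(fault_va == eva)              # True: same address, hex vs decimal
print(eip == frame7)                # True: the fault happened in _mtx_lock_sleep()
print(null_deref_offset(fault_va))  # 392
```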