From owner-freebsd-bugs@FreeBSD.ORG Thu Apr 17 14:20:03 2008
Date: Thu, 17 Apr 2008 14:20:03 GMT
Message-Id: <200804171420.m3HEK3GA020412@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Bob Frazier
Subject: Re: kern/122615: occasional crash/boot while running Xorg
List-Id: Bug reports

The following reply was made to PR kern/122615; it has been noted by GNATS.

From: Bob Frazier
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/122615: occasional crash/boot while running Xorg
Date: Thu, 17 Apr 2008 08:18:20 -0700

It appears that this problem may be related specifically to the SATA controller. I had several crashes happen to me this morning, most of them without Xorg running. Prior to this, Xorg had been running for several days without incident.

I should point out that I have 2 jails running from directories on the SATA drive, which is the 2nd drive in my system.
So I can expect file activity on this drive from time to time due to cron, etc. running in the jails. The SATA drive has a single NFS partition and is 160 GB.

Crash 1: Copying a ~180 MB file from an NFS share on a Linux machine to a location on the SATA drive. System froze up, then rebooted. No core dump.

Crash 2: From the console (no X running), after copying the same file again (while background checks were being done), I copied this same file to a USB ramdisk and started another process (in a different vconsole) to compare a number of existing files against what should be identical files on the same NFS share as before. When I issued the 'umount' command, the system rebooted. No core dump.

Crash 3: Started the file comparison again, after manually fsck'ing the partitions on the IDE drive (/, /tmp, /var, /usr) in single-user mode and pressing CTRL+D to resume startup. System rebooted with a crash dump (#4 in /var/crash).

Crash 4: Started the system, booted to single-user mode, fsck'd the 4 mountpoints on the IDE drive again, pressed CTRL+D to go multi-user, and then started typing in a command. System froze up and rebooted with a crash dump (#5 in /var/crash).

In each case the crash symptoms are similar to the one I reported here. I'm lacking time at the moment and will follow up with more backtraces for the 2 crashdump files on request. At the moment I'm running an fsck on the SATA drive with the drive unmounted in multi-user mode (jails not running). Hopefully this won't crash and I can validate and offload files from this drive.

I am starting to suspect that the SATA controller or the drive itself is at the root of the problem. The typical symptoms include a message in which the 'ad4' (SATA) drive has some kind of error, followed by a message that suggests it is being removed or not responding or something similar, followed by several reported errors reading/writing LBA locations that seem unusually large for a drive that size, followed by the crash/boot.
Unfortunately this information gets lost every time, if I'm even lucky enough to see it on the terminal before the system reboots. The only relevant piece of information that seems to end up in the info.# file is "vinvalbuf: dirtybufs" as the cause of the 'panic'.
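
As a rough sanity check on the "unusually large" LBA values mentioned above, the highest valid block address for a drive of this size can be estimated. This is only a sketch under assumptions that are not in the report: 512-byte sectors, and "160 GB" taken as 160 * 10**9 bytes. The function names and the sample LBA are made up for illustration; the real error messages were not captured.

```python
# Assumptions (not from the report): 512-byte sectors, and a "160 GB"
# drive holding 160 * 10**9 bytes, as drive vendors usually count it.
SECTOR_SIZE = 512


def sector_count(capacity_bytes, sector_size=SECTOR_SIZE):
    """Number of addressable sectors on a drive of the given capacity."""
    return capacity_bytes // sector_size


def lba_is_plausible(lba, capacity_bytes, sector_size=SECTOR_SIZE):
    """True if the LBA falls inside the drive (valid LBAs are 0..count-1)."""
    return 0 <= lba < sector_count(capacity_bytes, sector_size)


if __name__ == "__main__":
    capacity = 160 * 10**9
    print("sectors on drive:", sector_count(capacity))
    # Hypothetical LBA, for illustration only; not taken from any log.
    sample_lba = 4_000_000_000
    print("sample LBA plausible:", lba_is_plausible(sample_lba, capacity))
```

Under these assumptions the drive has roughly 312.5 million sectors, so an error that names an LBA in the billions cannot refer to a real location on the disk. That would point toward a garbled request or a controller/cabling problem rather than a bad sector at that address.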