From owner-freebsd-questions@FreeBSD.ORG Tue Jun 12 17:24:46 2007
Return-Path:
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 7A0C916A400
    for ; Tue, 12 Jun 2007 17:24:46 +0000 (UTC)
    (envelope-from wbishop@twosensemedia.com)
Received: from twosensemedia.com (twosensemedia.com [69.15.36.137])
    by mx1.freebsd.org (Postfix) with ESMTP id 13EB713C45A
    for ; Tue, 12 Jun 2007 17:24:45 +0000 (UTC)
    (envelope-from wbishop@twosensemedia.com)
Received: from [10.0.1.8] (account wbishop HELO S0030153310)
    by twosensemedia.com (CommuniGate Pro SMTP 4.2.10)
    with ESMTP id 3203549 for freebsd-questions@freebsd.org;
    Tue, 12 Jun 2007 11:37:52 -0400
Message-ID: <004b01c7ad07$b3d474d0$0801000a@S0030153310>
From: "Worth Bishop"
To:
Date: Tue, 12 Jun 2007 11:38:19 -0400
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3028
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
Subject: Fw: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 12 Jun 2007 17:24:46 -0000

Addendum: For what it's worth, the 250 GB Samsung drive was added when the
system was upgraded - it's only 3-4 months old.

----- Original Message -----
From: "Worth Bishop"
To:
Sent: Tuesday, June 12, 2007 11:33 AM
Subject: FreeBSD 6.2 Repeating Crash - Sleeping thread; Fatal trap 12: page fault; warning: 'T2' might be used uninitialized

> Please help if you can...
>
> BACKGROUND
>
> This crash is occurring on a dual-AMD 1.6 GHz CPU white-box system with
> 1 GB RAM and 250 GB storage, running the GENERIC kernel. The system has
> been in production use as a web server for nearly five years.
>
> About 3 - 4 months ago, the system was upgraded from an earlier FreeBSD
> version to 6.1. At the same time, all supporting applications (Apache
> webserver, Perl, PostgreSQL, PHP, countless other applications &
> libraries) were upgraded to the current releases. The system was stable
> up until a couple of weeks ago.
>
> FIRST ERROR EVENT
>
> The system crashed during normal usage. The following message was
> displayed on the console, which was not responsive to keyboard input:
>
> Sleeping thread (tid 100122, pid 11099) owns a non-sleepable lock
>
> panic: sleeping thread
> cpuid=1
>
> The system was restarted, an fsck routine was completed (answering "yes"
> to all the "Do you want to salvage" type questions) and the server ran
> fine. For about a week. It then crashed again several times, at
> intervals varying from a few minutes of uptime to a few days.
>
> SECOND ERROR EVENT
>
> After some crashes, a message similar to that above was displayed.
> However, at other times a message similar to this was displayed:
>
> kernel trap 12 with interrupts disabled
>
> Fatal trap 12: page fault while in kernel mode
> cpuid=0; apic id=01
> fault virtual address   = 0x100
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc066c731
> stack pointer           = 0x28:0xe432ebf0
> frame pointer           = 0x28:0xe432ebfc
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 36 (syncer)
> trap number             = 12
> panic: page fault
> cpuid=0
> uptime: 3d10h11m44s
> Dumping 1535 Mb (2 chunks) [NOTE: the system had 1.5 GB memory at that
> time.
> Memory was removed, reseated, swapped, etc., now 1 GB]
> chunk 0:1Mb (159 pages)
>
> CORRECTIONS ATTEMPTED
>
> Somewhere during this ordeal, a Google search revealed a number of other
> people experiencing the "Sleeping thread" problem. One of these was
> apparently experienced in a FreeBSD 6.x development version stress test.
> No definitive solution was identified in anything we saw, except a
> single reference to the problem being a kernel bug fixed in FreeBSD 6.2.
>
> Accordingly, we upgraded from 6.1 to 6.2 but have still experienced the
> problem.
>
> We reviewed the 'messages' file and found references to several things
> which led us to check the FreeBSD 6.2 ERRATA
> (http://www.freebsd.org/releases/6.2R/errata.html). This suggested
> adding 'kern.ipc.nmbclusters="0"' to the /boot/loader.conf file, which
> might avoid a known issue. We tried this, but saw no relief.
>
> We also found a reference in the manual that suggested the issue might
> be a problem with the APIC in 6.x. This recommended adding
> 'hint.apic.0.disabled="1"' to loader.conf. Tried this; no help.
>
> In order to try to get more information about the system dumps we added
> dumpdev="AUTO" and dumpdir="/usr/crash" [to get more storage space than
> available in /var/] and have generated several vmcore.# files of ~1 GB
> each (all identical size).
>
> We attempted to use DDB to analyze the dumps (struggling now, unfamiliar
> with the kernel debugging process) with no success. Research suggested
> we needed to create a debug version of the kernel (i.e., KERNEL.DEBUG)
> with debugging options enabled.
>
> We duly copied GENERIC and edited it, noting that "options ddb" was
> already enabled. We added
> 'makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols'
> as suggested and tried to "make buildkernel", which errored out stating
> that KDB must be enabled to use DDB. We edited KERNEL.DEBUG to add
> 'options KDB # Enable kernel debugger'
> and attempted to "make buildkernel" again.
> This time, the process stopped again with the message:
>
> THIRD ERROR EVENT
>
> [snip]
> inline-unit-growth=100 --param
> rge-function-growth=1000 -mno-align-long-strings -mpreferred-stack-boundary=2
> -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Werror
> /usr/src/sys/crypto/sha2/sha2.c
> /usr/src/sys/crypto/sha2/sha2.c: In function `SHA512_Transform':
> /usr/src/sys/crypto/sha2/sha2.c:753: warning: 'T2' might be used uninitialized in this function
> *** Error code 1
>
> Stop in /usr/obj/usr/src/sys/KERNEL.DEBUG.
> *** Error code 1
>
> Stop in /usr/src.
> *** Error code 1
>
> Stop in /usr/src.
> www:/usr/src#
>
> With this, we are stumped.
>
> HELP PLEASE!
>
> Can anyone:
>
> - lead us to a solution based on these error messages?
> - help us understand why the GENERIC kernel with only the debugging
>   options added failed to make?
> - help us understand what '/usr/src/sys/crypto/sha2/sha2.c' has to do
>   with anything?
> - help us understand what we need to do to extract useful information
>   from the vmcore.# files?
> - offer any other suggestions?
>
> Thanks in advance!
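
For reference, the settings described in the message can be collected into
the fragments below. This is only a summary sketch of what was tried, not a
suggested fix. Every value is copied from the message; the /etc/rc.conf
location for the dump settings and the /usr/src/sys/i386/conf/ path for the
KERNEL.DEBUG config file are assumptions, since the message does not state
them.

```
# /boot/loader.conf
# Both settings were tried; neither stopped the crashes.
# Workaround suggested by the FreeBSD 6.2 errata:
kern.ipc.nmbclusters="0"
# Disable the APIC:
hint.apic.0.disabled="1"

# /etc/rc.conf (assumed location)
dumpdev="AUTO"
# Chosen because /usr has more free space than /var:
dumpdir="/usr/crash"

# /usr/src/sys/i386/conf/KERNEL.DEBUG (assumed path)
# A copy of GENERIC with the following lines present:
makeoptions     DEBUG=-g        # Build kernel with gdb(1) debug symbols
options         KDB             # Enable kernel debugger
options         DDB             # Reported as already enabled in the copy
```

With these lines in place, "make buildkernel" stopped on the sha2.c warning
quoted above, because the kernel is compiled with -Werror.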