From: George Kontostanos
To: Christian Baer
Cc: freebsd-stable@freebsd.org
Date: Tue, 5 Jul 2011 18:48:33 +0300
Subject: Re: Crashes with Promise controller
References: <52F39CE0-EEC7-4180-8186-BF8696AF279D@lassitu.de>
 <20110618175215.GA18645@icarus.home.lan>

On Tue, Jul 5, 2011 at 6:41 PM, Christian Baer wrote:
> On 18.06.2011 19:52, Jeremy Chadwick wrote:
>
>> It may be that the kernel is
>> panic'ing and auto-rebooting before he can
>> see the message in question.  I would advocate he put the following
>> directives in his kernel configuration and rebuild/reinstall kernel and
>> wait for it to happen again.
>
> I have now changed the power setup slightly and the problems have
> *reduced* and changed slightly in nature. Reproducing a panic is a
> lot harder, which I consider a good thing at the moment.
>
> Since I changed the power configuration, the system has been running for
> about 4 days and had only two crashes (traps) since then, despite quite
> heavy traffic on the drives. Because the system rebooted very quickly
> before I set up the serial console, I only ever got to see one panic
> (not a trap) in the past. But it was gone too quickly for me to write
> anything down about it.
>
> On a side-note:
> I did find out during my testing (before changing the power) that two
> drives were actually causing the problems and I could even make the
> system crash while only reading from one of those drives. Crashes while
> reading felt less frequent (no statistics collected though) but happened
> just the same.
>
> Because I formatted the two drives in question with rather strange
> values (rather large block sizes), I have decided to copy everything off
> them, re-partition them with GPT and create both the encryption system
> and the file system on them anew.
>
> During this copying, I managed to crash the system twice.
> The first time was yesterday, where I got this:
>
> --- snip ---
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x1f8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc3d2120c
> stack pointer           = 0x28:0xc3697bf4
> frame pointer           = 0x28:0xc3697c4c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 2 (g_event)
> [thread pid 2 tid 100007 ]
> Stopped at      g_eli_access+0x7c:      testl   $0x10008,0x1f8(%ebx)
> --- snap ---
>
> About 25 minutes ago, the system crashed again. This time, I had the
> "known" errors prior to the actual trap:
>
> --- snip ---
> ata6: SIGNATURE: ffffffff
> ata6: timeout waiting to issue command
> ata6: error issuing SETFEATURES SET TRANSFER MODE command
> ata6: timeout waiting to issue command
> ata6: error issuing SETFEATURES ENABLE RCACHE command
> ata6: timeout waiting to issue command
> ata6: error issuing SETFEATURES ENABLE WCACHE command
> ata6: timeout waiting to issue command
> ata6: error issuing SET_MULTI command
> ad12: FAILURE - device detached
> GEOM_ELI: g_eli_read_done() failed ad12d.eli[READ(offset=403810975744,
> length=32768)]
> g_vfs_done():ad12d.eli[READ(offset=403810975744, length=32768)]error = 6
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x1f8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc3d2420c
> stack pointer           = 0x28:0xc3697bf4
> frame pointer           = 0x28:0xc3697c4c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 2 (g_event)
> [thread pid 2 tid 100007 ]
> Stopped at      g_eli_access+0x7c:      testl   $0x10008,0x1f8(%ebx)
> --- snap ---
>
> The strange thing is that I wasn't actually accessing ad12 at the time.
> I was running a "-t long" on it, but no more. That test had been running
> for over two hours at the time of the crash.
>
> Does this still somehow point to a power problem (since ad12 seems to
> get detached)? Or could it be something a bit more fundamental?
>
> Best regards,
> Chris
>

I am not sure if it is the same controller:

http://www.freebsd.org/cgi/query-pr.cgi?pr=158268

-- 
George Kontostanos
aisecure.net