Date:      Tue, 5 Jul 2011 18:48:33 +0300
From:      George Kontostanos <gkontos.mail@gmail.com>
To:        Christian Baer <christian.baer@uni-dortmund.de>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Crashes with Promise controller
Message-ID:  <CA%2BdUSyorF%2BfXUJ8UJQm8TNM0_orb7_0JJj4TpXEbysbMxyq=TQ@mail.gmail.com>
In-Reply-To: <iuvbao$l84$1@dough.gmane.org>
References:  <it56el$tqa$1@dough.gmane.org> <52F39CE0-EEC7-4180-8186-BF8696AF279D@lassitu.de> <20110618175215.GA18645@icarus.home.lan> <iuvbao$l84$1@dough.gmane.org>

On Tue, Jul 5, 2011 at 6:41 PM, Christian Baer
<christian.baer@uni-dortmund.de> wrote:
> On 18.06.2011 19:52, Jeremy Chadwick wrote:
>
>> It may be that the kernel is panic'ing and auto-rebooting before he can
> see the message in question.  I would advocate he put the following
>> directives in his kernel configuration and rebuild/reinstall kernel and
>> wait for it to happen again.
>
> I have now changed the power setup slightly and the problems have
> *reduced* and slightly changed in themselves. Reproducing a panic is a
> lot harder, which I consider a good thing at the moment.
>
> Since I changed the power configuration, the system has been running for
> about 4 days and had only two crashes (traps) since then, despite quite
> heavy traffic on the drives. Because the system rebooted very quickly
> before I set up the serial console, I only ever got to see one panic
> (not a trap) in the past. But it was gone too quickly for me to write
> anything down about it.
>
> On a side-note:
> I did find out during my testing (before changing the power) that two
> drives were actually causing the problems and I could even make the
> system crash while only reading from one of those drives. Crashes while
> reading felt less frequent (no statistics collected though) but happened
> just the same.
>
> Because I formatted the two drives in question with rather strange
> values (rather large block sizes), I have decided to copy everything off
> them, re-partition them with gpt and create both the encryption-system
> on them as well as the file system anew.
>
> During this copying, I managed to crash the system twice. The first time
> was yesterday, where I got this:
>
> --- snip ---
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x1f8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc3d2120c
> stack pointer           = 0x28:0xc3697bf4
> frame pointer           = 0x28:0xc3697c4c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 2 (g_event)
> [thread pid 2 tid 100007 ]
> Stopped at      g_eli_access+0x7c:      testl   $0x10008,0x1f8(%ebx)
> --- snap ---
>
> About 25 minutes ago, the system crashed again. This time, I had the
> "known" errors prior to the actual trap:
>
> --- snip ---
> ata6: SIGNATURE: ffffffff
> ata6: timeout waiting to issue command
> ata6: error issuing SETFEATURES SET TRANSFER MODE command
> ata6: timeout waiting to issue command
> ata6: error issuing SETFEATURES ENABLE RCACHE command
> ata6: timeout waiting to issue command
> ata6: error issuing SETFEATURES ENABLE WCACHE command
> ata6: timeout waiting to issue command
> ata6: error issuing SET_MULTI command
> ad12: FAILURE - device detached
> GEOM_ELI: g_eli_read_done() failed ad12d.eli[READ(offset=403810975744,
> length=32768)]
> g_vfs_done():ad12d.eli[READ(offset=403810975744, length=32768)]error = 6
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x1f8
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc3d2420c
> stack pointer           = 0x28:0xc3697bf4
> frame pointer           = 0x28:0xc3697c4c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 2 (g_event)
> [thread pid 2 tid 100007 ]
> Stopped at      g_eli_access+0x7c:      testl   $0x10008,0x1f8(%ebx)
> --- snap ---
>
> The strange thing is that I wasn't actually accessing ad12 at the time.
> I was running a "-t long" on it, but no more. That test had been running
> for over two hours at the time of the crash.
>
> Does this still somehow point to a power problem (since ad12 seems to
> get detached)? Or could it be something a bit more fundamental?
>
> Best regards,
> Chris

I am not sure if it is the same controller:

http://www.freebsd.org/cgi/query-pr.cgi?pr=158268

-- 
George Kontostanos
aisecure.net
