Date:      Wed, 03 Feb 2010 20:23:15 +0900
From:      Stephane LAPIE <stephane.lapie@darkbsd.org>
To:        Andriy Gapon <avg@icyb.net.ua>
Cc:        freebsd-fs@freebsd.org, Julian Elischer <julian@elischer.org>, freebsd-hardware@freebsd.org
Subject:   Re: [zfs][hardware] Reproducible kernel panic in 8.0-STABLE
Message-ID:  <4B695CA3.50008@darkbsd.org>
In-Reply-To: <4B68641D.9000201@icyb.net.ua>
References:  <4B682972.6030604@darkbsd.org> <4B682F29.90505@icyb.net.ua> <4B686324.2090308@elischer.org> <4B68641D.9000201@icyb.net.ua>


Andriy Gapon wrote:
> on 02/02/2010 19:38 Julian Elischer said the following:
>> Andriy Gapon wrote:
>>> on 02/02/2010 15:32 Stephane LAPIE said the following:
>>>> I have a case of kernel panic that can be consistently reproduced, and
>>>> which I guess is related to the hardware I'm using (Marvell controllers,
>>>> check my pciconf -lv output below).
>>>>
>>>> The kernel panic message is always, consistently, the following :
>>>>
>>>> Sleeping thread (tid 100021, pid 0) owns a non-sleepable lock
>>> I probably won't be able to help you, but to kickstart debugging could you
>>> please run 'procstat -t 0' and determine what kernel thread has tid 100021
>>> on your system?
>> or in the kernel debugger after the panic, do: bt
>
> I think that in this case it may not help.  I mean the stack trace.
> Because, I think that this panic happens after the taskqueue thread is done with
> its tasks and is parked waiting.
>
>> you DO have options kdb and ddb right?  (I never leave home without them)
>>
>
>

I just rebuilt a kernel with debugger options, and obtained the
following output upon pulling out one disk:

Sleeping thread (tid 100024, pid 0) owns a non-sleepable lock
sched_switch() at sched_switch+0xf8
mi_switch() at mi_switch+0x16f
sleepq_timedwait() at sleepq_timedwait+0x42
_cv_timedwait() at _cv_timedwait+0x129
_sema_timedwait() at _sema_timedwait+0x55
ata_queue_request() at ata_queue_request+0x526
ata_controlcmd() at ata_controlcmd+0xa1
ata_setmode() at ata_setmode+0xdc
ad_init() at ad_init+0x27
ad_reinit() at ad_reinit+0x48
ata_reinit() at ata_reinit+0x268
ata_conn_event() at ata_conn_event+0x49
taskqueue_run() at taskqueue_run+0x93
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff80000aad30, rbp = 0 ---
panic: sleeping thread
cpuid = 2
KDB: enter: panic
[thread pid 12 tid 100008 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x4943d0(%rip)

I think the output below is not really relevant though.

db> bt
Tracing pid 12 tid 100008 td 0xffffff000187e000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
turnstile_adjust() at turnstile_adjust
turnstile_wait() at turnstile_wait+0x1aa
_mtx_lock_sleep() at _mtx_lock_sleep+0xb0
softclock() at softclock+0x2a9
intr_event_execute_handlers() at intr_event_execute_handlers+0xfd
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800005ad30, rbp = 0 ---

If there is anything else I can run to obtain further information, all
hints are welcome, though this clearly seems to point to a problem with
my controller event handling as I initially thought.
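
To make the call path in the first trace easier to read, here is a minimal
Python sketch that reorders the frames from outermost to innermost and picks
out the ATA driver ones. The helper names (parse_frames, frames_with_prefix)
are made up for illustration, and the "ata_"/"ad_" prefix filter is only an
assumption about how the driver functions are named. It does not tell which
lock is held, since the trace itself does not show that.

```python
import re

# Frames copied from the first trace in this message (innermost first).
TRACE = """\
sched_switch() at sched_switch+0xf8
mi_switch() at mi_switch+0x16f
sleepq_timedwait() at sleepq_timedwait+0x42
_cv_timedwait() at _cv_timedwait+0x129
_sema_timedwait() at _sema_timedwait+0x55
ata_queue_request() at ata_queue_request+0x526
ata_controlcmd() at ata_controlcmd+0xa1
ata_setmode() at ata_setmode+0xdc
ad_init() at ad_init+0x27
ad_reinit() at ad_reinit+0x48
ata_reinit() at ata_reinit+0x268
ata_conn_event() at ata_conn_event+0x49
taskqueue_run() at taskqueue_run+0x93
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
"""

# Matches "name() at name+0xoff"; the offset is optional because some
# frames are printed without one.
FRAME_RE = re.compile(r"^(\w+)\(\) at \1(?:\+0x[0-9a-f]+)?$")


def parse_frames(text):
    """Return the function names found in a trace, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return frames


def frames_with_prefix(frames, prefixes):
    """Keep only the frames whose name starts with one of the prefixes."""
    return [name for name in frames if name.startswith(prefixes)]


frames = parse_frames(TRACE)
call_chain = list(reversed(frames))  # outermost caller first
driver_frames = frames_with_prefix(call_chain, ("ata_", "ad_"))

print(" -> ".join(call_chain))
print("ATA driver frames:", ", ".join(driver_frames))
```

Read this way, the taskqueue thread enters ata_conn_event(), goes through
ata_reinit() and ad_reinit(), and ends up sleeping in _sema_timedwait() from
ata_queue_request().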

I am also very suspicious of that controller because it tends to drop
two disks at exactly the same time, which alas belong to the same raidz1
block (the BIOS cannot properly reset the port or redetect the disks after
this, so I have to go through a cold boot; the disks themselves could be
damaged, but I don't see any weird readings via SMART, Reallocated
Sectors or otherwise). I am seriously thinking of moving some of these
disks to the AHCI controller on my motherboard, and will resort to using
my spares at the very least in the meantime.

Thanks for your time,
-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo




