Date:      Sun, 6 Dec 2009 20:04:08 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        Andriy Gapon <avg@icyb.net.ua>
Cc:        freebsd-current@freebsd.org
Subject:   Re: process stuck in stat/../cache_lookup: ktorrent, zfs
Message-ID:  <3bbf2fe10912061104j53ef5be2yb1019699308b0473@mail.gmail.com>
In-Reply-To: <4B1BBEC4.7040906@icyb.net.ua>
References:  <4B1B9600.4080709@icyb.net.ua> <4B1BBEC4.7040906@icyb.net.ua>

2009/12/6 Andriy Gapon <avg@icyb.net.ua>:
> on 06/12/2009 13:31 Andriy Gapon said the following:
>> System is recent 9-current, amd64.
>> I see that sometimes ktorrent gets stuck during heavy download (multiple files
>> in parallel, high speed).  It is completely unresponsive and not killable even
>> with SIGKILL.
> [snip]
>> #0  sched_switch (td=0xffffff012a6c5700, newtd=0xffffff0001533380,
>> flags=Variable "flags" is not available.
>> ) at /usr/src/sys/kern/sched_ule.c:1865
>> #1  0xffffffff80374baf in mi_switch (flags=260, newtd=0x0) at
>> /usr/src/sys/kern/kern_synch.c:449
>> #2  0xffffffff803a795b in sleepq_switch (wchan=Variable "wchan" is not available.
>> ) at /usr/src/sys/kern/subr_sleepqueue.c:509
>> #3  0xffffffff803a8645 in sleepq_wait (wchan=0xffffff0105b457f8, pri=80) at
>> /usr/src/sys/kern/subr_sleepqueue.c:588
>> #4  0xffffffff80351184 in __lockmgr_args (lk=0xffffff0105b457f8, flags=2097408,
>> ilk=0xffffff0105b45820, wmesg=Variable "wmesg" is not available.
>> ) at /usr/src/sys/kern/kern_lock.c:216
>
> So some more data:
> (kgdb) fr 4
>
> #4  0xffffffff80351184 in __lockmgr_args (lk=0xffffff0105b457f8, flags=2097408,
> ilk=0xffffff0105b45820, wmesg=Variable "wmesg" is not available.
> ) at /usr/src/sys/kern/kern_lock.c:216
> 216                     sleepq_wait(&lk->lock_object, pri);
> (kgdb) p *lk
> $8 = {lock_object = {lo_name = 0xffffffff80ad55b6 "zfs", lo_flags = 91947008,
> lo_data = 0, lo_witness = 0x0}, lk_lock = 3, lk_timo = 51, lk_pri = 80}
> (kgdb) p/x flags
> $9 = 0x200100
> (kgdb) p/x lk->lock_object.lo_flags
> $12 = 0x57b0000
>
> Apparently sleeplk is inlined into __lockmgr_args.
>
> So it looks like this is a LK_SHARED|LK_INTERLOCK lockmgr call which has not
> taken any easy path and ended up in sleepq_wait, but wakeup never comes for it,
> perhaps missed?

I think that a 'missed wakeup' is too hasty (and wrong) a conclusion.
Here the problem is that the lock is held in shared mode (lk->lk_lock
= 3), so you would need to know what happened to the owners once they
got the lock.
With shared acquisitions, though, the only way you can do that is with
WITNESS, so you should try to reproduce it with WITNESS on.
Once you have that data we can dig further.

Attilio

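For reference, the flags value from the kgdb session above (0x200100, read there as LK_SHARED|LK_INTERLOCK) can be double-checked with a small sketch like the one below. The two bit values are an assumption recalled from sys/lockmgr.h of that era, not something shown in this thread, so verify them against the header in your own tree. decode_flags is a hypothetical helper, not a kernel or kgdb API.

```python
# Minimal sketch for decoding a lockmgr "flags" argument printed by kgdb.
# ASSUMPTION: these bit values are recalled from sys/lockmgr.h and are not
# taken from this thread; check them against your own source tree.
LK_FLAGS = {
    0x200000: "LK_SHARED",
    0x000100: "LK_INTERLOCK",
}


def decode_flags(value, table=LK_FLAGS):
    """Return a 'A|B' style string for the bits set in value.

    Bits that are not in the table are reported as one trailing hex number,
    so nothing is silently dropped.
    """
    names = []
    known = 0
    # Highest bit first, to match the order used in the message above.
    for bit, name in sorted(table.items(), reverse=True):
        known |= bit
        if value & bit:
            names.append(name)
    leftover = value & ~known
    if leftover:
        names.append(hex(leftover))
    return "|".join(names) if names else "0"


if __name__ == "__main__":
    # 2097408 is the decimal flags value shown in frame #4 of the backtrace.
    print(hex(2097408), "=", decode_flags(2097408))
```

The same table-driven approach would work for other flag words in the dump, provided the bit definitions are taken from the matching header revision.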

-- 
Peace can only be achieved by understanding - A. Einstein


