Date:      Sun, 10 Oct 2010 15:15:32 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Locked up processes after upgrade to ZFS v15
Message-ID:  <20101010121532.GG2392@deviant.kiev.zoral.com.ua>
In-Reply-To: <4CB18BC6.70305@freebsd.org>
References:  <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> <201010061732.o96HW2Vi005945@higson.cam.lispworks.com> <E5332812-379B-4EC1-A134-12176C718B2E@free.de> <4CAF45A8.3020401@icyb.net.ua> <4CB18BC6.70305@freebsd.org>

On Sun, Oct 10, 2010 at 12:47:50PM +0300, Andriy Gapon wrote:
> on 08/10/2010 19:24 Andriy Gapon said the following:
> > on 06/10/2010 21:51 Kai Gallasch said the following:
> >>
> >> On 06.10.2010 at 19:32, Martin Simmons wrote:
> >>
> >>>>>>>> On Wed, 6 Oct 2010 14:28:31 +0200, Kai Gallasch said:
> >>>>
> >>>> How can I debug this and get further information?
> >>>
> >>> procstat -k -k $pid will generate a backtrace (or replace $pid by -a for all
> >>> processes).
> >>
> >> procstat for process 12111 (state: zfs)
> >> sonnenkraft:~ # procstat -k -k 12111
> >>   PID    TID COMM             TDNAME           KSTACK
> >> 12111 102385 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d __lockmgr_args+0x7ae vop_stdlock+0x39 VOP_LOCK1_APV+0x9b _vn_lock+0x57 vget+0x7b cache_lookup+0x4e0 vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0xb7 lookup+0x3d3 namei+0x457 vn_open_cred+0x1e3 kern_openat+0x181 syscall+0x102 Xfast_syscall+0xe2
> >>
> >> procstat for process 24731 (state: zfsmrb)
> >> # procstat -k -k 24731
> >>   PID    TID COMM             TDNAME           KSTACK
> >> 24731 102273 httpd            -                mi_switch+0x21b sleepq_switch+0x123 sleepq_wait+0x4d _sleep+0x369 zfs_freebsd_read+0x2a6 VOP_READ_APV+0xaf vnode_pager_generic_getpages+0x3ea VOP_GETPAGES_APV+0xb5 vnode_pager_getpages+0x8c vm_fault+0x685 trap_pfault+0x128 trap+0x52c calltrap+0x8
>
> Hm, I think that we actually shouldn't see a stack like that.
> vm_fault sets VPO_BUSY on a page before calling vnode_pager_generic_getpages, so
> the thread gets stuck forever in zfs mappedread.
> It seems like the page that was seen as invalid in vm_fault becomes valid while
> call flow reaches mappedread.
The vnode is share-locked, and the vm object lock is dropped and reacquired
several times before control reaches zfs_mappedread. This indeed leaves
a window during which the page might be read in by another thread.
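
To make the window concrete, here is a toy single-threaded model of the
sequence described above. It is only a sketch, not kernel code: struct
model_page, model_fault() and other_thread_validates() are hypothetical
names invented for illustration, and the "other thread" is modelled as a
callback that runs at the point where the object lock would be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM page; NOT the kernel's struct vm_page. */
struct model_page {
	bool valid;	/* contents have been read in */
	bool busy;	/* set by the faulting thread, like VPO_BUSY */
};

enum model_result { MODEL_NOT_NEEDED, MODEL_FILLED_PAGE, MODEL_STUCK };

/* Runs where the (modelled) object lock is dropped and reacquired. */
typedef void (*window_fn)(struct model_page *);

/* Assumption: another thread manages to validate the page in the window. */
void
other_thread_validates(struct model_page *p)
{
	p->valid = true;
}

enum model_result
model_fault(struct model_page *p, window_fn in_window)
{
	assert(p != NULL);

	/* Step 1: the fault path sees an invalid page and busies it. */
	if (p->valid)
		return (MODEL_NOT_NEEDED);
	p->busy = true;

	/* Step 2: lock dropped on the way to the read routine. */
	if (in_window != NULL)
		in_window(p);

	/*
	 * Step 3: the read routine finds a valid but busy page and would
	 * sleep until it is un-busied. The busy flag is our own, so nobody
	 * will ever clear it.
	 */
	if (p->valid && p->busy)
		return (MODEL_STUCK);

	/* Normal case: we fill the page ourselves and un-busy it. */
	p->valid = true;
	p->busy = false;
	return (MODEL_FILLED_PAGE);
}
```

Calling model_fault(&page, NULL) on an invalid page takes the normal
path, while passing other_thread_validates as the callback ends in
MODEL_STUCK with the page still busy, which is the shape of the zfsmrb
hang in the second backtrace.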

There are two possible routes to solve the issue:
1. Provide zfs-specific VOP_GETPAGES().
2. Use my vm6 patch. Sigh.

>
> >> In my original post I wrote that only apache httpd processes would lock up..
> >> This is wrong. Several other non-httpd processes also got stuck in state zfs or zfsmrb.
> >
> > Interesting.
> > It's possible that TID 102385 might be waiting on a vnode lock held by TID 102273.
> > But TID 102273 seems to be waiting on a vnode's page lock.
> > It would be very interesting to learn what process has that page busy, for how
> > long and why.
> > Perhaps there is a code path that busies a page, but never un-busies it...
> >
>
>
> --
> Andriy Gapon



