Date:      Fri, 17 May 2013 11:37:23 +0200
From:      dennis berger <db@bsdsystems.de>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        Jeremy Chadwick <jdc@koitsu.org>, FreeBSD stable <freebsd-stable@freebsd.org>
Subject:   Re: still mbuf leak in 9.0 / 9.1?
Message-ID:  <1186B7CE-EC84-42F6-8904-EDD0C4A5FFBD@bsdsystems.de>
In-Reply-To: <4F319A22-E611-4EE6-A970-98315B15C12F@nipsi.de>
References:  <FDFFFCCB-BDF8-4E27-AF9D-D14D7E0D426D@nipsi.de> <CAFOYbcmF5WybuyJ9DuotcJf_u1FxwBKOLtHvpnT-05cVG6ES=A@mail.gmail.com> <004BC6EA-D8E6-473E-851C-9CDA7578510A@nipsi.de> <20130515211436.GA42790@icarus.home.lan> <696B5622-A95D-4187-A027-07ECC9B5AD1F@nipsi.de> <F3B040438E014E958372DCD64566CED4@multiplay.co.uk> <4F319A22-E611-4EE6-A970-98315B15C12F@nipsi.de>

Hi List,
I can confirm that it is the bug you mentioned, Steven.
Here is how I found it.

I recorded zfskern and nfsd stats hourly, like this:

echo "PROCSTAT" >> $reportname
pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname
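
As a rough sketch, the two commands above could be wrapped in a small function for an hourly cron job. The function name record_procstat and the timestamp in the header are assumptions added for illustration; the pgrep and procstat invocations are the ones shown above.

```sh
#!/bin/sh
# Sketch only: record_procstat is a hypothetical helper name.
# Appends one sample of kernel stacks for zfskern and nfsd to a report file.
record_procstat() {
    reportname=$1
    # -S also matches system processes (kernel threads) such as zfskern.
    pids=$(pgrep -S "(zfskern|nfsd)") || pids=""
    {
        # Timestamped header so that hourly samples can be told apart.
        echo "PROCSTAT $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        if [ -n "$pids" ]; then
            # -kk prints kernel stacks with function offsets.
            # $pids is left unquoted on purpose: one argument per PID.
            procstat -kk $pids
        fi
    } >> "$reportname"
}
```

It would be called with the report path as its only argument, for example record_procstat "$reportname".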

Luckily it crashed last night and logged this:

 1910 101508 nfsd             nfsd: service    mi_switch+0x186
sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1
uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5
arc_read_nolock+0x1ec arc_read+0x93 dbuf_prefetch+0x12c
dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x4a7
dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67
dmu_read_uio+0x3f zfs_freebsd_read+0x3e3
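
A recorded report can be scanned for threads sleeping in arc_lowmem, as in the trace above. The following is a minimal sketch, assuming the report keeps each thread on a single line as procstat -kk prints it (the trace here is wrapped only by the mail client); find_lowmem_waiters is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch only: find_lowmem_waiters is a hypothetical helper name.
# Reads `procstat -kk` output on stdin and prints PID, TID and command
# for every thread whose kernel stack contains an arc_lowmem frame.
find_lowmem_waiters() {
    # index() does a plain substring match, so "+" needs no escaping.
    awk 'index($0, "arc_lowmem+") > 0 { print $1, $2, $3 }'
}
```

For example, find_lowmem_waiters < "$reportname" lists the affected threads in a saved report.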

Maybe it would be good to merge this fix into RELENG_9_1 and distribute
it via freebsd-update. What do you think?
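
To see whether a given source tree already carries the change, one option is to look for the guard that the patch quoted below adds to arc.c. This is only a sketch: has_lowmem_fix is a hypothetical helper name, it assumes the merged change keeps the condition exactly as quoted, and it inspects a source file, not the running kernel.

```sh
#!/bin/sh
# Sketch only: has_lowmem_fix is a hypothetical helper name.
# Succeeds if the given arc.c contains the pageproc guard from the patch.
has_lowmem_fix() {
    arc_c=$1
    grep -q 'curproc == pageproc' "$arc_c"
}
```

It takes the path to arc.c as its argument, for example the sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c file inside a checked-out source tree.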

best,
-dennis


On 16.05.2013 at 11:42, dennis berger wrote:

> This is indeed a ZFS+NFS system, and I can see that istgt and nfs are
> stuck in some ZIO state. Maybe it's this.
> Thanks for pointing it out.
>
> Is it this ZFS+NFS deadlock?
>
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused)
> 	mutex_enter(&arc_reclaim_thr_lock);
> 	needfree = 1;
> 	cv_signal(&arc_reclaim_thr_cv);
> -	while (needfree)
> -	 msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> +
> +	/*
> +	 * It is unsafe to block here in arbitrary threads, because we can come
> +	 * here from ARC itself and may hold ARC locks and thus risk a deadlock
> +	 * with ARC reclaim thread.
> +	 */
> +	if (curproc == pageproc) {
> +	 while (needfree)
> +	 msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> +	}
> 	mutex_exit(&arc_reclaim_thr_lock);
> 	mutex_exit(&arc_lowmem_lock);
> }
>
> I'll try to crash our test system. I assume that putting heavy load on
> NFS backed by ZFS might trigger this bug?
>
> -dennis
>
>
> On 16.05.2013 at 00:03, Steven Hartland wrote:
>
>> ----- Original Message ----- From: "dennis berger" <db@nipsi.de>
>>> FreeBSD  9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012
>>>
>>>> 3. Regarding this:
>>>>>> A clean shutdown isn't possible though. It hangs after vnode
>>>>>> cleaning, normally you would see detaching of usb devices here, or
>>>>>> other devices maybe?
>>>> Please don't conflate this with your above issue.  This is almost
>>>> certainly unrelated.  Please start a new thread about that if desired.
>>>
>>> Maybe this is a misunderstanding. Normally this system shuts down cleanly, of course.
>>> This hang only appears after the network problem above.
>>
>> If this is a ZFS system, it's a known issue which is fixed in current,
>> stable-9, stable-8 and the upcoming 8.4 release.
>>
>> If not and you have USB devices see if the following sysctl helps:
>> hw.usb.no_shutdown_wait=1
>>
>>  Regards
>>  Steve
>>
>> ================================================
>> This e.mail is private and confidential between Multiplay (UK) Ltd.
>> and the person or entity to whom it is addressed. In the event of
>> misdirection, the recipient is prohibited from using, copying, printing
>> or otherwise disseminating it or any information contained in it.
>> In the event of misdirection, illegible or incomplete transmission
>> please telephone +44 845 868 1337
>> or return the E.mail to postmaster@multiplay.co.uk.
>>
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"