Date:      Sat, 18 May 2013 08:35:38 -0700
From:      Jeremy Chadwick <jdc@koitsu.org>
To:        Ronald Klop <ronald-freebsd8@klop.yi.org>
Cc:        FreeBSD stable <freebsd-stable@freebsd.org>, dennis berger <db@bsdsystems.de>, Steven Hartland <killing@multiplay.co.uk>
Subject:   Re: still mbuf leak in 9.0 / 9.1?
Message-ID:  <20130518153538.GA9228@icarus.home.lan>
In-Reply-To: <op.ww9yqee88527sy@ronaldradial>
References:  <FDFFFCCB-BDF8-4E27-AF9D-D14D7E0D426D@nipsi.de> <CAFOYbcmF5WybuyJ9DuotcJf_u1FxwBKOLtHvpnT-05cVG6ES=A@mail.gmail.com> <004BC6EA-D8E6-473E-851C-9CDA7578510A@nipsi.de> <20130515211436.GA42790@icarus.home.lan> <696B5622-A95D-4187-A027-07ECC9B5AD1F@nipsi.de> <F3B040438E014E958372DCD64566CED4@multiplay.co.uk> <4F319A22-E611-4EE6-A970-98315B15C12F@nipsi.de> <1186B7CE-EC84-42F6-8904-EDD0C4A5FFBD@bsdsystems.de> <20130517173101.GB87223@icarus.home.lan> <op.ww9yqee88527sy@ronaldradial>

On Sat, May 18, 2013 at 12:14:28PM +0200, Ronald Klop wrote:
> On Fri, 17 May 2013 19:31:01 +0200, Jeremy Chadwick <jdc@koitsu.org> wrote:
> 
> >On Fri, May 17, 2013 at 11:37:23AM +0200, dennis berger wrote:
> >>Hi List,
> >>I can confirm that it is the bug you mentioned, Steven.
> >>Here is how I found it.
> >>
> >>I recorded hourly zfskern and nfsd stats, like this:
> >>
> >>echo "PROCSTAT" >> $reportname
> >>pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname
> >>
> >>Luckily it crashed last night and logged this:
> >>
> >> 1910 101508 nfsd             nfsd: service    mi_switch+0x186
> >>sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1
> >>uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5
> >>arc_read_nolock+0x1ec arc_read+0x93 dbuf_prefetch+0x12c
> >>dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x4a7
> >>dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67
> >>dmu_read_uio+0x3f zfs_freebsd_read+0x3e3
> >>
> >>Maybe it would be good to merge this fix into RELENG_9_1 and
> >>distribute a fix via freebsd-update. What do you think?
> >>
> >>best,
> >>-dennis
> >>
> >>
> >>On 16.05.2013 at 11:42, dennis berger wrote:
> >>
> >>> This is indeed a ZFS+NFS system and I can see that istgt and
> >>> nfs are stuck in some ZIO state. Maybe it's this.
> >>> Thanks for pointing it out.
> >>>
> >>> Is it this ZFS+NFS deadlock?
> >>>
> >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> >>> @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused)
> >>> 	mutex_enter(&arc_reclaim_thr_lock);
> >>> 	needfree = 1;
> >>> 	cv_signal(&arc_reclaim_thr_cv);
> >>> -	while (needfree)
> >>> -		msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> >>> +
> >>> +	/*
> >>> +	 * It is unsafe to block here in arbitrary threads, because we can come
> >>> +	 * here from ARC itself and may hold ARC locks and thus risk a deadlock
> >>> +	 * with ARC reclaim thread.
> >>> +	 */
> >>> +	if (curproc == pageproc) {
> >>> +		while (needfree)
> >>> +			msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> >>> +	}
> >>> 	mutex_exit(&arc_reclaim_thr_lock);
> >>> 	mutex_exit(&arc_lowmem_lock);
> >>> }
> >>>
> >>> I'll try to crash our test system. I assume that heavily stressing
> >>> NFS backed by ZFS might trigger this bug?
> >>>
> >>> -dennis
> >>>
> >>>
> >>> On 16.05.2013 at 00:03, Steven Hartland wrote:
> >>>
> >>>> ----- Original Message ----- From: "dennis berger" <db@nipsi.de>
> >>>>> FreeBSD  9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012
> >>>>>
> >>>>>> 3. Regarding this:
> >>>>>>>> A clean shutdown isn't possible though. It hangs after vnode
> >>>>>>>> cleaning, normally you would see detaching of usb devices here, or
> >>>>>>>> other devices maybe?
> >>>>>> Please don't conflate this with your above issue.  This is almost
> >>>>>> certainly unrelated.  Please start a new thread about that if desired.
> >>>>>
> >>>>> Maybe this is a misunderstanding. Normally this system will
> >>>>> shut down cleanly, of course.
> >>>>> This hang only appears after the network problem above.
> >>>>
> >>>> If this is a ZFS system, it's a known issue which is fixed in current,
> >>>> stable-9, stable-8 and the upcoming 8.4 release.
> >>>>
> >>>> If not and you have USB devices see if the following sysctl helps:
> >>>> hw.usb.no_shutdown_wait=1
> >
> >I'm sorry to say it won't happen.  The only updates that the -RELEASE
> >branches get are for security.  If you want fixes for other things, you
> >need to follow/run stable branches (e.g. stable/9), otherwise you will
> >need to wait until 9.2-RELEASE comes out.
> >
> 
> And errata notices? Are they for security?

Example case:

http://www.freebsd.org/releases/9.1R/errata.html

Only the items in section "Security Advisories" would get actual updates
pushed out to the 9.1-RELEASE branch (e.g. RELENG_9_1); the items in
sections "Open Issues" and "Late-breaking News" are purely FYIs.  There
are always hundreds of bugs that never show up in either of those
sections but are mentioned in the next official version's Release Notes.
I can speculate all day and night as to why this is, but it's easier for
me to just say "that's just the way it is".

For example, compare the "Open Issues" in the 9.0-RELEASE errata to all
the bugfixes in the 9.1-RELEASE Release Notes (you'll have to go through
each item by hand and read it):

http://www.freebsd.org/releases/9.0R/errata.html
http://www.freebsd.org/releases/9.1R/relnotes-detailed.html

...and you'll see what I mean.

So to recap: when you run a -RELEASE branch, you should only expect
fixes related to security.  For any other problems, you are expected to
run stable/X (e.g. stable/9) or backport the fix yourself.

And because I am certain someone will bring it up: no, the fixes done in
stable/X cannot necessarily be turned into a patch file for a -RELEASE
branch.  The reason is that there are often other commits to stable/X
branches which are for things other than bugfixes (e.g.
re-engineering/refactoring of code, semantics changes, or entire
portions nuked altogether).  Sometimes "backported" patches can be made,
but it isn't always the case -- it is not always as simple as "the patch
applied cleanly".  ZFS and NFS are two (of many) things which have been
undergoing constant change.
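
To illustrate the "applied cleanly" point, here is a minimal sketch of
checking whether a diff would apply to a tree without touching it.  The
file names (example.c, fix.diff) and their contents are made up for
illustration and have nothing to do with the real arc.c change; the only
thing assumed about the tool is that patch(1) accepts --dry-run, which
both the FreeBSD and GNU versions document.

```shell
#!/bin/sh
# Hypothetical sketch: dry-run a patch against a scratch "tree".
# example.c and fix.diff are invented stand-ins, not real FreeBSD files.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for a file as it exists in the -RELEASE tree.
printf 'line one\nline two\nline three\n' > example.c

# Stand-in for a fix taken from stable/X.
cat > fix.diff <<'EOF'
--- example.c
+++ example.c
@@ -1,3 +1,3 @@
 line one
-line two
+line 2
 line three
EOF

# --dry-run reports whether the hunks would apply; example.c is not modified.
if patch --dry-run -p0 < fix.diff >/dev/null 2>&1; then
    result="applies cleanly"
else
    result="does not apply"
fi
echo "$result"
```

Keep in mind that a clean dry run only says the hunks matched the text.
It says nothing about whether the surrounding code in the -RELEASE tree
still behaves the way the fix assumes.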

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


