Date:      Tue, 8 Feb 2011 22:36:53 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Bartosz Stec <admin@kkip.pl>
Cc:        freebsd-fs@freebsd.org, pjd@freebsd.org
Subject:   Re: Memory leak in ZFS?
Message-ID:  <20110208203653.GC78089@deviant.kiev.zoral.com.ua>
In-Reply-To: <4D519F97.2000805@kkip.pl>
References:  <AANLkTi=8fFwiaQ4%2Bm_cWFkXwpa4_W0_DDV2aW8vyNU4E@mail.gmail.com> <op.vqjyb21daevz08@ghost-pc.home.lan> <4D510BBB.1060708@kkip.pl> <20110208102727.GA8555@icarus.home.lan> <4D511F65.2050503@kkip.pl> <4D519F97.2000805@kkip.pl>


--UIrAl4r1g2eOkvhC
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 08, 2011 at 08:55:03PM +0100, Bartosz Stec wrote:
> On 2011-02-08 11:48, Bartosz Stec wrote:
> >On 2011-02-08 11:27, Jeremy Chadwick wrote:
> >>On Tue, Feb 08, 2011 at 10:24:11AM +0100, Bartosz Stec wrote:
> >>>On 2011-02-07 22:37, Emil Muratov wrote:
> >>>>>For the past few weeks, I noticed that the amount of memory
> >>>>>reported in top (sum of active, inact, wired, cache, buf and free)
> >>>>>keeps decreasing as the uptime increases. I can't pinpoint when I
> >>>>>first noticed this, as I have updated the system a few times just
> >>>>>in case this has been fixed.
> >>>>Yes, I have the same issue on my home file storage. My system is
> >>>>8.1 amd64, 2G ram, zfs on root raidz with 4x1,5T drives.
> >>>>After updating to stable a couple of days ago I noticed that the
> >>>>system leaks memory very fast. Checking here and there I found
> >>>>that the issue concerns sendfile (yep, again!).
> >>>>
> >>>>How to reproduce:
> >>>>Configure samba with aio and sendfile (mine is version 3.5.6)
> >>>>
> >>>>smb.conf
> >>>>[global]
> >>>>use sendfile=true
> >>>>aio read size = 16384
> >>>>
> >>>>Download a couple of large samba shared files (8-10 gigs).
> >>>>
> >>>>
> >>>>While downloading files I can see that memory decreases to nothing
> >>>>very, very fast, several MBs per second! First it drains free mem,
> >>>>then active and inactive, then comes wired, until the whole system
> >>>>commits suicide, suffocating itself to death.
> >>>>The only way to free memory is to reboot the system. I can't
> >>>>unload zfs module like PJD suggested to do, 'cause my root is on
> >>>>zfs :(
> >>>>I'll try to make a bootable flash drive and move root to it, so I
> >>>>can try to unload the module and see what happens.
> >>>>
> >>>>Everything was OK in stable before the new year; sendfile used to
> >>>>pump free and wired memory to inactive, then slowly reclaim it.
> >>>>But it seems something was changed after the New Year holidays?
> >>>I'm glad someone else finally picked up on that problem, so there's
> >>>apparently no memory-eating ghost in my machine ;)
> >>>Here's my thread on the stable list about this issue:
> >>>http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/061247.html
> >>>
> >>>
> >>>And in fact, the PC reported in the thread above is also a SAMBA server
> >>>with aio/sendfile enabled and ZFS.
> >>>
> >>>I would be happy to test some patches if necessary, because for now I
> >>>have to monitor memory and reboot this server before it dies.
> >>The source and build date of your kernel will matter greatly here.
> >>
> >>I can't speak about the memory utilisation aspect, but I tend to disable
> >>sendfile everywhere possible when ZFS is in use on a system.  The reason
> >>is based on something I and another user experienced back in October
> >>2010 pertaining to sendfile() on ZFS locking up processes (making them
> >>unkillable).  See here[1] for details; this problem has since been
> >>fixed[2] (look for commits around October).  You'll also find some
> >>commits that went through in November pertaining to ZFS and sendfile.
> >>This is why I said the date of your kernel/sources matters.  :-)
> >
> >I have rebuilt since the original thread, hoping that the problem was
> >already fixed, so the kernel is now very fresh: 8.2-PRERELEASE #18:
> >Sun Feb  6 03:04:47 CET 2011.
> >The problem is still here:
> >
> >Mem: 37M Active, 78M Inact, 1154M Wired, 64M Cache, 199M Buf, 40M Free
> >About 1373MB instead of 2GB, and it's not even 2 days of uptime.
> >
> >>The issue I referenced in [1] is not related to memory utilisation, but
> >>does indicate use of sendfile with ZFS may be a bad idea (by this I
> >>mean, there may be aspects of its implementation when mixed with ZFS
> >>that have been overlooked).
> >>
> >>Simple test: if you disable use of sendfile (but not AIO) in Samba, does
> >>the problem go away?
> >I've just disabled sendfile in smb.conf and I'll report back in about 2
> >days, after a reboot which I will perform tonight.
> >I hope it won't hit samba performance too much ;)
> >
> We didn't need to wait 2 days :)
> Now I can confirm that sendfile under SAMBA + ZFS is responsible for the
> issue. Here's sample output from my monitoring script[1] (updated every 2
> seconds):
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.01 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 552.30 MB
>    SUM: 1957.82 MB
>    ------------------------
>    MISSING: 69.58 MB
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.07 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 551.80 MB
>    SUM: 1957.38 MB
>    ------------------------
>    MISSING: 70.02 MB
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.13 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 551.30 MB
>    SUM: 1956.94 MB
>    ------------------------
>    MISSING: 70.46 MB
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.19 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 550.80 MB
>    SUM: 1956.51 MB
>    ------------------------
>    MISSING: 70.89 MB
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.24 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 550.42 MB
>    SUM: 1956.18 MB
>    ------------------------
>    MISSING: 71.22 MB
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.30 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 549.92 MB
>    SUM: 1955.74 MB
>    ------------------------
>    MISSING: 71.66 MB
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.38 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 549.30 MB
>    SUM: 1955.19 MB
>    ------------------------
>    MISSING: 72.21 MB
>
>    PHYSMEM: 2027.41 MB
>    ACTIVE: 61.14 MB
>    INACTIVE: 40.44 MB
>    WIRED: 1303.86 MB
>    CACHED: .50 MB
>    FREE: 548.80 MB
>    SUM: 1954.76 MB
>    ------------------------
>    MISSING: 72.64 MB
>
> This behaviour was seen while copying a 600MB file from a SAMBA share
> with sendfile enabled.
> It doesn't happen when writing to the samba share, and it doesn't happen
> with sendfile disabled, in either direction.
> To me it looks like the leaked memory should be added to the wired pool
> and belongs to ARC, but apparently this doesn't work well, and WIRED stays
> at 1303.86 MB all the time.
>
> [1] http://pastebin.com/sQUyQbmm
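
The SUM and MISSING lines in the output above follow from simple arithmetic on the counters. A minimal sketch in C, assuming MISSING is PHYSMEM minus the sum of the five listed pools; the struct and function names are hypothetical and are not taken from the pastebin script:

```c
#include <assert.h>

/* One sample of the counters printed by the monitoring script, in MB. */
struct mem_sample {
	double physmem;
	double active;
	double inactive;
	double wired;
	double cached;
	double free_mb;
};

/* Memory that shows up in none of the reported pools (assumed formula). */
double
missing_mb(const struct mem_sample *s)
{
	double sum;

	sum = s->active + s->inactive + s->wired + s->cached + s->free_mb;
	return (s->physmem - sum);
}

/* Average growth of the unaccounted memory between two samples, in MB/s. */
double
leak_rate_mb_per_s(const struct mem_sample *first,
    const struct mem_sample *last, double seconds)
{

	assert(seconds > 0.0);
	return ((missing_mb(last) - missing_mb(first)) / seconds);
}
```

Fed with the first and last samples quoted above, which are seven 2-second intervals apart, this gives roughly 69.6 MB and 72.7 MB missing, or a growth of about 0.2 MB per second during the copy. The small differences from the printed values are presumably rounding in the script.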

Try this. I think a similar fix is needed for tmpfs, but there are some
more issues there and a pending rewrite, so I decided not to touch it.

commit 8e5885bce1afecd419e40240a2d7ab90deb0392a
Author: Konstantin Belousov <kostik@pooma.home>
Date:   Tue Feb 8 22:35:29 2011 +0200

    Do not forget to activate the page

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
index e8191b3..7343c72 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
@@ -353,6 +353,9 @@ page_unlock(vm_page_t pp)
 {
 
 	vm_page_wakeup(pp);
+	vm_page_lock(pp);
+	vm_page_activate(pp);
+	vm_page_unlock(pp);
 }
 
 static caddr_t
@@ -480,7 +483,7 @@ again:
 			if (error == 0)
 				uiomove_fromphys(&m, off, bytes, uio);
 			VM_OBJECT_LOCK(obj);
-			vm_page_wakeup(m);
+			page_unlock(m);
 		} else if (uio->uio_segflg == UIO_NOCOPY) {
 			/*
 			 * The code below is here to make sendfile(2) work
@@ -527,9 +530,15 @@ again:
 				zfs_unmap_page(sf);
 			}
 			VM_OBJECT_LOCK(obj);
-			if (error == 0)
-				m->valid = VM_PAGE_BITS_ALL;
 			vm_page_io_finish(m);
+			vm_page_lock(m);
+			if (error == 0) {
+				m->valid = VM_PAGE_BITS_ALL;
+				vm_page_activate(m);
+			} else
+				vm_page_free(m);
+			vm_page_unlock(m);
+
 			if (error == 0) {
 				uio->uio_resid -= bytes;
 				uio->uio_offset += bytes;
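
One possible reading of the commit message and the diff is that pages which finish I/O without being put on any page queue stop appearing in the active/inactive/free counters, which would match the growing MISSING figure reported earlier in the thread. The following is only a toy userland model of that accounting, not kernel code; every name in it (toy_vm, toy_grab_page, toy_io_finish, toy_missing_after) is invented for illustration:

```c
#include <assert.h>

/* Toy model: only two visible pools, counted in pages. */
struct toy_vm {
	int total;		/* pages of physical memory */
	int active;		/* pages on the active queue */
	int free_pages;		/* pages on the free list */
};

/* Pages that a top(1)-style summary would be able to see. */
int
toy_accounted(const struct toy_vm *vm)
{

	return (vm->active + vm->free_pages);
}

/* Take one page off the free list for I/O; returns 0 on success. */
int
toy_grab_page(struct toy_vm *vm)
{

	if (vm->free_pages == 0)
		return (-1);
	vm->free_pages--;
	return (0);
}

/*
 * Finish I/O on a grabbed page.  With activate != 0 the page is queued
 * as active (the patched behaviour in this model); with activate == 0
 * it is left on no queue at all and drops out of the counters.
 */
void
toy_io_finish(struct toy_vm *vm, int activate)
{

	if (activate)
		vm->active++;
}

/* Simulate n page reads and return how many pages became unaccounted. */
int
toy_missing_after(int total, int n, int activate)
{
	struct toy_vm vm = { total, 0, total };
	int i;

	assert(total >= 0 && n >= 0);
	for (i = 0; i < n; i++) {
		if (toy_grab_page(&vm) == 0)
			toy_io_finish(&vm, activate);
	}
	return (vm.total - toy_accounted(&vm));
}
```

In this model, 100 reads without activation leave 100 pages unaccounted, while the same reads with activation leave none. The real kernel has more queues and states than this sketch, so it should be taken as an aid to reading the patch rather than a description of the VM subsystem.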

--UIrAl4r1g2eOkvhC
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk1RqWUACgkQC3+MBN1Mb4izHwCfUbU4AceFZ3BnReYfVOety+gw
1gUAoLYpyMdSOEsmxeI3g1E05wy9ntdQ
=wjUH
-----END PGP SIGNATURE-----

--UIrAl4r1g2eOkvhC--
