Date: Mon, 13 Aug 2018 19:39:00 +0200
From: Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To: stable@freebsd.org
Cc: Mark Johnston <markj@freebsd.org>
Subject: Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.2-R amd64
Message-ID: <3a02f99bf5a5250df3f5d6ee6cb168c9@ijs.si>
In-Reply-To: <a1e730def80ae4d188f5ba782b70b46d@ijs.si>
References: <1a039af7758679ba1085934b4fb81b57@ijs.si>
 <3e56e4de076111c04c2595068ba71eec@ijs.si>
 <20180731220948.GA97237@raichu>
 <2ec91ebeaba54fda5e9437f868d4d590@ijs.si>
 <b3aa2bbe947914f8933b24cf0d0b15f0@ijs.si>
 <20180804170154.GA12146@raichu>
 <87f6a55cc2ee3d754ddb89475bbfbab8@ijs.si>
 <20180804194757.GD12146@raichu>
 <a1e730def80ae4d188f5ba782b70b46d@ijs.si>
> 2018-08-04 21:47, Mark Johnston wrote:
>> Sorry, I missed that message. Given that information, it would be
>> useful to see the output of the following script instead:
>>
>> # dtrace -c "zpool list -Hp" -x temporal=off -n '
>>   dtmalloc::solaris:malloc
>>   /pid == $target/{@allocs[stack(), args[3]] = count()}
>>   dtmalloc::solaris:free
>>   /pid == $target/{@frees[stack(), args[3]] = count();}'
>>
>> This will record all allocations and frees from a single instance of
>> "zpool list".

2018-08-07 14:58, Mark Martinec wrote:
> Collected, here it is:
>   https://www.ijs.si/usr/mark/tmp/dtrace-cmd.out.bz2
>
>> Was there a mention of a defunct pool?
>
> Indeed.
> I haven't tried yet to destroy it, so it is only my hypothesis
> that a defunct pool plays a role in this leak.
[...]
> I have jumped from 10.3 directly to 11.1-RELEASE-p11, so I'm not sure
> exactly with which version / patch level the problem was introduced.
>
> I tried to reproduce the problem on another host running 11.2-R,
> using a memory disk (md): I created a GPT partition on it and a ZFS
> pool on top, then destroyed the disk, so the pool was left
> UNAVAILABLE. Unfortunately this did not reproduce the problem:
> "zpool list" on that host does not cause ZFS to leak memory. It must
> be something specific to that failed disk or pool which is causing
> the leak.
>
>   Mark

More news: in my last posting I said I can't reproduce the issue on
another 11.2 host. Well, it turned out this was only half the truth.

This is what I did the last time:

  # create a test pool on a memory disk
  mdconfig -a -t swap -s 1Gb
  gpart create -s gpt /dev/md0
  gpart add -t freebsd-zfs -a 4k /dev/md0
  zpool create test /dev/md0p1

  # destroy the disk underneath the pool, making it "unavailable"
  mdconfig -d -u 0 -o force

and I reported that the "zpool list" command does not leak memory,
unlike on the other host where the problem was first detected.
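For scripting this reproduction, the md unit number need not be
hardcoded; a minimal sketch, assuming mdconfig prints the name of the
newly attached unit (which it does when no -u is given). FreeBSD-only,
needs root; the pool name "test" matches the steps above:

```shell
#!/bin/sh -e
# Sketch: recreate an UNAVAILABLE pool on a memory disk.
# Capture the unit name (e.g. "md0") instead of assuming md0 is free.
md=$(mdconfig -a -t swap -s 1g)
gpart create -s gpt "/dev/${md}"
gpart add -t freebsd-zfs -a 4k "/dev/${md}"
zpool create test "/dev/${md}p1"

# Yank the disk out from under the pool, leaving it UNAVAILABLE:
mdconfig -d -u "${md#md}" -o force
zpool list test
```

(Note: the steps above spell the size as "1Gb"; mdconfig documents
single-letter suffixes, so "1g" is used here.)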
But in the days that followed, the second machine too started to run
out of memory and ground to a standstill after a couple of days - this
happened three times before I realized the same thing was going on
here as on the original host. (The "zpool list" runs periodically as
a plugin of the "telegraf" monitoring agent.)

Sure enough, "zpool list" was leaking "solaris" zone memory here too,
and in even larger chunks (previously by about 570 per run, now by
about 2k):

  # (while true; do zpool list >/dev/null; vmstat -m | \
       fgrep solaris; sleep 0.5; done) | awk '{print $2-a; a=$2}'
  12224540
  2509
  3121
  5022
  2507
  1834
  2508
  2505

And it's not just the "zpool list" command. The same leak occurs with
"zpool status" and with "zpool iostat", either when the defunct pool
is explicitly given as an argument or when no pool is specified
(implying all pools) - but not when a healthy pool is explicitly given
to such a command.

And to confirm the hypothesis: while "zpool list" was running in the
loop above, I destroyed the defunct pool from another terminal, and
the leak immediately vanished ("vmstat -m | fgrep solaris" no longer
grew).

So the only missing link is: why did the leak not start immediately
after revoking the disk and making the pool unavailable, but only some
time later (hours? a few days? after a reboot? after running some
other command?).

  Mark
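The two small pieces that make the leak-watch loop above work can be
exercised in isolation, with no ZFS involved; a minimal sketch on
synthetic input (the sample line is made up but shaped like a
"vmstat -m" row, where field 2 is the in-use allocation count):

```shell
# 1. Pick field 2 (in-use count) out of a vmstat -m style line.
#    The line below is a synthetic example, not real output.
echo '     solaris 3024259 792413K    -  143298454' | awk '{print $2}'
# -> 3024259

# 2. The delta idiom: print current value minus the previous one.
#    awk's uninitialized variable "a" starts at 0, so the first line
#    prints the raw value (hence the large 12224540 above), and every
#    later line prints the per-iteration growth.
printf '100\n105\n112\n112\n120\n' | awk '{print $1 - a; a = $1}'
# -> 100 5 7 0 8 (one per line)
```

The loop in the text uses $2 because its input lines begin with the
malloc-type name; the second demo uses $1 on bare numbers.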