Date:      Tue, 31 Mar 2015 01:07:24 +0100
From:      Steven Hartland <killing@multiplay.co.uk>
To:        freebsd-fs@freebsd.org
Subject:   Re: All available memory used when deleting files from ZFS
Message-ID:  <5519E53C.4060203@multiplay.co.uk>
In-Reply-To: <FD30147A-C7F7-4138-9F96-10024A6FE061@ebureau.com>
References:  <FD30147A-C7F7-4138-9F96-10024A6FE061@ebureau.com>

Later versions have vfs.zfs.free_max_blocks, which is likely to be the
fix you're looking for.

It was added to head by r271532 and to stable/10 by:
https://svnweb.freebsd.org/base?view=revision&revision=272665

The commit description reads:

Add a new tunable/sysctl, vfs.zfs.free_max_blocks, which can be used to
limit how many blocks can be free'ed before a new transaction group is
created.  The default is no limit (infinite), but we should probably have
a lower default, e.g. 100,000.

With this limit, we can guard against the case where ZFS could run out of
memory when destroying large numbers of blocks in a single transaction
group, as the entire DDT needs to be brought into memory.

Illumos issue:
     5138 add tunable for maximum number of blocks freed in one txg
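
To get a feel for the numbers involved, here is a rough back-of-the-envelope
sketch in Python. The 25TB figure and the 100,000 limit come from this thread;
the 128 KiB block size is an assumption (the usual default recordsize, the
actual block sizes in the pool are not stated), indirect blocks are ignored,
and the function names are made up for illustration.

```python
TB = 10**12   # decimal terabyte; the thread does not say TB vs TiB
KIB = 1024


def estimate_blocks(total_bytes, block_size):
    """Rough count of data blocks needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // block_size)


def txgs_needed(blocks, free_max_blocks):
    """Lower bound on transaction groups if each frees at most free_max_blocks."""
    return -(-blocks // free_max_blocks)


deleted_bytes = 25 * TB    # "just over 1000 files, totaling 25TB"
block_size = 128 * KIB     # ASSUMPTION: default recordsize, not from the thread
limit = 100_000            # value suggested in the commit description

blocks = estimate_blocks(deleted_bytes, block_size)
txgs = txgs_needed(blocks, limit)

print(f"roughly {blocks:,} blocks to free")
print(f"at least {txgs:,} txgs with vfs.zfs.free_max_blocks={limit}")
```

Under those assumptions the delete covers well over a hundred million blocks,
so a limit of 100,000 would spread the frees over a couple of thousand
transaction groups instead of one. Note that the tunables list in the report
quoted below already shows vfs.zfs.free_max_blocks at -1, which appears to be
the unlimited default.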



On 30/03/2015 22:14, Dustin Wenz wrote:
> I had several systems panic or hang over the weekend while deleting some data off of their local zfs filesystem. It looks like they ran out of physical memory (32GB), and hung when paging to swap-on-zfs (which is not surprising, given that ZFS was likely using the memory). They were running 10.1-STABLE r277139M, which I built in the middle of January. The pools were about 35TB in size, and are a concatenation of 3TB mirrors. They were maybe 95% full. I deleted just over 1000 files, totaling 25TB on each system.
>
> It took roughly 10 minutes to remove that 25TB of data per host using a remote rsync, and immediately after that everything seemed fine. However, after several more minutes, every machine that had data removed became unresponsive. Some had numerous "swap_pager: indefinite wait buffer" errors followed by a panic, and some just died with no console messages. The same thing would happen after a reboot, when FreeBSD attempted to mount the local filesystem again.
>
> I was able to boot these systems after exporting the affected pool, but the problem would recur several minutes after initiating a "zpool import". Watching zfs statistics didn't seem to reveal where the memory was going; ARC would only climb to about 4GB, but free memory would decline rapidly. Eventually, after enough export/reboot/import cycles, the pool would import successfully and everything would be fine from then on. Note that there is no L2ARC or compression being used.
>
> Has anyone else run into this when deleting files on ZFS? It seems to be a consistent problem under the versions of 10.1 I'm running.
>
> For reference, I've appended a zstat dump below that was taken 5 minutes after starting a zpool import, and was about three minutes before the machine became unresponsive. You can see that the ARC is only 4GB, but free memory was down to 471MB (and continued to drop).
>
> 	- .Dustin
>
>
> ------------------------------------------------------------------------
> ZFS Subsystem Report				Mon Mar 30 12:35:27 2015
> ------------------------------------------------------------------------
>
> System Information:
>
> 	Kernel Version:				1001506 (osreldate)
> 	Hardware Platform:			amd64
> 	Processor Architecture:			amd64
>
> 	ZFS Storage pool Version:		5000
> 	ZFS Filesystem Version:			5
>
> FreeBSD 10.1-STABLE #11 r277139M: Tue Jan 13 14:59:55 CST 2015 root
> 12:35PM  up 8 mins, 3 users, load averages: 7.23, 8.96, 4.87
>
> ------------------------------------------------------------------------
>
> System Memory:
>
> 	0.17%	55.40	MiB Active,	0.14%	46.11	MiB Inact
> 	98.34%	30.56	GiB Wired,	0.00%	0 Cache
> 	1.34%	425.46	MiB Free,	0.00%	4.00	KiB Gap
>
> 	Real Installed:				32.00	GiB
> 	Real Available:			99.82%	31.94	GiB
> 	Real Managed:			97.29%	31.08	GiB
>
> 	Logical Total:				32.00	GiB
> 	Logical Used:			98.56%	31.54	GiB
> 	Logical Free:			1.44%	471.57	MiB
>
> Kernel Memory:					3.17	GiB
> 	Data:				99.18%	3.14	GiB
> 	Text:				0.82%	26.68	MiB
>
> Kernel Memory Map:				31.08	GiB
> 	Size:				14.18%	4.41	GiB
> 	Free:				85.82%	26.67	GiB
>
> ------------------------------------------------------------------------
>
> ARC Summary: (HEALTHY)
> 	Memory Throttle Count:			0
>
> ARC Misc:
> 	Deleted:				145
> 	Recycle Misses:				0
> 	Mutex Misses:				0
> 	Evict Skips:				0
>
> ARC Size:				14.17%	4.26	GiB
> 	Target Size: (Adaptive)		100.00%	30.08	GiB
> 	Min Size (Hard Limit):		12.50%	3.76	GiB
> 	Max Size (High Water):		8:1	30.08	GiB
>
> ARC Size Breakdown:
> 	Recently Used Cache Size:	50.00%	15.04	GiB
> 	Frequently Used Cache Size:	50.00%	15.04	GiB
>
> ARC Hash Breakdown:
> 	Elements Max:				270.56k
> 	Elements Current:		100.00%	270.56k
> 	Collisions:				23.66k
> 	Chain Max:				3
> 	Chains:					8.28k
>
> ------------------------------------------------------------------------
>
> ARC Efficiency:					2.93m
> 	Cache Hit Ratio:		70.44%	2.06m
> 	Cache Miss Ratio:		29.56%	866.05k
> 	Actual Hit Ratio:		70.40%	2.06m
>
> 	Data Demand Efficiency:		97.47%	24.58k
> 	Data Prefetch Efficiency:	1.88%	479
>
> 	CACHE HITS BY CACHE LIST:
> 	  Anonymously Used:		0.05%	1.07k
> 	  Most Recently Used:		71.82%	1.48m
> 	  Most Frequently Used:		28.13%	580.49k
> 	  Most Recently Used Ghost:	0.00%	0
> 	  Most Frequently Used Ghost:	0.00%	0
>
> 	CACHE HITS BY DATA TYPE:
> 	  Demand Data:			1.16%	23.96k
> 	  Prefetch Data:		0.00%	9
> 	  Demand Metadata:		98.79%	2.04m
> 	  Prefetch Metadata:		0.05%	1.08k
>
> 	CACHE MISSES BY DATA TYPE:
> 	  Demand Data:			0.07%	621
> 	  Prefetch Data:		0.05%	470
> 	  Demand Metadata:		99.69%	863.35k
> 	  Prefetch Metadata:		0.19%	1.61k
>
> ------------------------------------------------------------------------
>
> L2ARC is disabled
>
> ------------------------------------------------------------------------
>
> File-Level Prefetch: (HEALTHY)
>
> DMU Efficiency:					72.95k
> 	Hit Ratio:			70.83%	51.66k
> 	Miss Ratio:			29.17%	21.28k
>
> 	Colinear:				21.28k
> 	  Hit Ratio:			0.01%	2
> 	  Miss Ratio:			99.99%	21.28k
>
> 	Stride:					50.45k
> 	  Hit Ratio:			99.98%	50.44k
> 	  Miss Ratio:			0.02%	9
>
> DMU Misc:
> 	Reclaim:				21.28k
> 	  Successes:			1.73%	368
> 	  Failures:			98.27%	20.91k
>
> 	Streams:				1.23k
> 	  +Resets:			0.16%	2
> 	  -Resets:			99.84%	1.23k
> 	  Bogus:				0
>
> ------------------------------------------------------------------------
>
> VDEV cache is disabled
>
> ------------------------------------------------------------------------
>
> ZFS Tunables (sysctl):
> 	kern.maxusers                           2380
> 	vm.kmem_size                            33367830528
> 	vm.kmem_size_scale                      1
> 	vm.kmem_size_min                        0
> 	vm.kmem_size_max                        1319413950874
> 	vfs.zfs.arc_max                         32294088704
> 	vfs.zfs.arc_min                         4036761088
> 	vfs.zfs.arc_average_blocksize           8192
> 	vfs.zfs.arc_shrink_shift                5
> 	vfs.zfs.arc_free_target                 56518
> 	vfs.zfs.arc_meta_used                   4534349216
> 	vfs.zfs.arc_meta_limit                  8073522176
> 	vfs.zfs.l2arc_write_max                 8388608
> 	vfs.zfs.l2arc_write_boost               8388608
> 	vfs.zfs.l2arc_headroom                  2
> 	vfs.zfs.l2arc_feed_secs                 1
> 	vfs.zfs.l2arc_feed_min_ms               200
> 	vfs.zfs.l2arc_noprefetch                1
> 	vfs.zfs.l2arc_feed_again                1
> 	vfs.zfs.l2arc_norw                      1
> 	vfs.zfs.anon_size                       1786368
> 	vfs.zfs.anon_metadata_lsize             0
> 	vfs.zfs.anon_data_lsize                 0
> 	vfs.zfs.mru_size                        504812032
> 	vfs.zfs.mru_metadata_lsize              415273472
> 	vfs.zfs.mru_data_lsize                  35227648
> 	vfs.zfs.mru_ghost_size                  0
> 	vfs.zfs.mru_ghost_metadata_lsize        0
> 	vfs.zfs.mru_ghost_data_lsize            0
> 	vfs.zfs.mfu_size                        3925990912
> 	vfs.zfs.mfu_metadata_lsize              3901947392
> 	vfs.zfs.mfu_data_lsize                  7000064
> 	vfs.zfs.mfu_ghost_size                  0
> 	vfs.zfs.mfu_ghost_metadata_lsize        0
> 	vfs.zfs.mfu_ghost_data_lsize            0
> 	vfs.zfs.l2c_only_size                   0
> 	vfs.zfs.dedup.prefetch                  1
> 	vfs.zfs.nopwrite_enabled                1
> 	vfs.zfs.mdcomp_disable                  0
> 	vfs.zfs.max_recordsize                  1048576
> 	vfs.zfs.dirty_data_max                  3429735628
> 	vfs.zfs.dirty_data_max_max              4294967296
> 	vfs.zfs.dirty_data_max_percent          10
> 	vfs.zfs.dirty_data_sync                 67108864
> 	vfs.zfs.delay_min_dirty_percent         60
> 	vfs.zfs.delay_scale                     500000
> 	vfs.zfs.prefetch_disable                0
> 	vfs.zfs.zfetch.max_streams              8
> 	vfs.zfs.zfetch.min_sec_reap             2
> 	vfs.zfs.zfetch.block_cap                256
> 	vfs.zfs.zfetch.array_rd_sz              1048576
> 	vfs.zfs.top_maxinflight                 32
> 	vfs.zfs.resilver_delay                  2
> 	vfs.zfs.scrub_delay                     4
> 	vfs.zfs.scan_idle                       50
> 	vfs.zfs.scan_min_time_ms                1000
> 	vfs.zfs.free_min_time_ms                1000
> 	vfs.zfs.resilver_min_time_ms            3000
> 	vfs.zfs.no_scrub_io                     0
> 	vfs.zfs.no_scrub_prefetch               0
> 	vfs.zfs.free_max_blocks                 -1
> 	vfs.zfs.metaslab.gang_bang              16777217
> 	vfs.zfs.metaslab.fragmentation_threshold 70
> 	vfs.zfs.metaslab.debug_load             0
> 	vfs.zfs.metaslab.debug_unload           0
> 	vfs.zfs.metaslab.df_alloc_threshold     131072
> 	vfs.zfs.metaslab.df_free_pct            4
> 	vfs.zfs.metaslab.min_alloc_size         33554432
> 	vfs.zfs.metaslab.load_pct               50
> 	vfs.zfs.metaslab.unload_delay           8
> 	vfs.zfs.metaslab.preload_limit          3
> 	vfs.zfs.metaslab.preload_enabled        1
> 	vfs.zfs.metaslab.fragmentation_factor_enabled 1
> 	vfs.zfs.metaslab.lba_weighting_enabled  1
> 	vfs.zfs.metaslab.bias_enabled           1
> 	vfs.zfs.condense_pct                    200
> 	vfs.zfs.mg_noalloc_threshold            0
> 	vfs.zfs.mg_fragmentation_threshold      85
> 	vfs.zfs.check_hostid                    1
> 	vfs.zfs.spa_load_verify_maxinflight     10000
> 	vfs.zfs.spa_load_verify_metadata        1
> 	vfs.zfs.spa_load_verify_data            1
> 	vfs.zfs.recover                         0
> 	vfs.zfs.deadman_synctime_ms             1000000
> 	vfs.zfs.deadman_checktime_ms            5000
> 	vfs.zfs.deadman_enabled                 1
> 	vfs.zfs.spa_asize_inflation             24
> 	vfs.zfs.spa_slop_shift                  5
> 	vfs.zfs.space_map_blksz                 4096
> 	vfs.zfs.txg.timeout                     5
> 	vfs.zfs.vdev.metaslabs_per_vdev         200
> 	vfs.zfs.vdev.cache.max                  16384
> 	vfs.zfs.vdev.cache.size                 0
> 	vfs.zfs.vdev.cache.bshift               16
> 	vfs.zfs.vdev.trim_on_init               1
> 	vfs.zfs.vdev.mirror.rotating_inc        0
> 	vfs.zfs.vdev.mirror.rotating_seek_inc   5
> 	vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
> 	vfs.zfs.vdev.mirror.non_rotating_inc    0
> 	vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
> 	vfs.zfs.vdev.async_write_active_min_dirty_percent 30
> 	vfs.zfs.vdev.async_write_active_max_dirty_percent 60
> 	vfs.zfs.vdev.max_active                 1000
> 	vfs.zfs.vdev.sync_read_min_active       10
> 	vfs.zfs.vdev.sync_read_max_active       10
> 	vfs.zfs.vdev.sync_write_min_active      10
> 	vfs.zfs.vdev.sync_write_max_active      10
> 	vfs.zfs.vdev.async_read_min_active      1
> 	vfs.zfs.vdev.async_read_max_active      3
> 	vfs.zfs.vdev.async_write_min_active     1
> 	vfs.zfs.vdev.async_write_max_active     10
> 	vfs.zfs.vdev.scrub_min_active           1
> 	vfs.zfs.vdev.scrub_max_active           2
> 	vfs.zfs.vdev.trim_min_active            1
> 	vfs.zfs.vdev.trim_max_active            64
> 	vfs.zfs.vdev.aggregation_limit          131072
> 	vfs.zfs.vdev.read_gap_limit             32768
> 	vfs.zfs.vdev.write_gap_limit            4096
> 	vfs.zfs.vdev.bio_flush_disable          0
> 	vfs.zfs.vdev.bio_delete_disable         0
> 	vfs.zfs.vdev.trim_max_bytes             2147483648
> 	vfs.zfs.vdev.trim_max_pending           64
> 	vfs.zfs.max_auto_ashift                 13
> 	vfs.zfs.min_auto_ashift                 9
> 	vfs.zfs.zil_replay_disable              0
> 	vfs.zfs.cache_flush_disable             0
> 	vfs.zfs.zio.use_uma                     1
> 	vfs.zfs.zio.exclude_metadata            0
> 	vfs.zfs.sync_pass_deferred_free         2
> 	vfs.zfs.sync_pass_dont_compress         5
> 	vfs.zfs.sync_pass_rewrite               2
> 	vfs.zfs.snapshot_list_prefetch          0
> 	vfs.zfs.super_owner                     0
> 	vfs.zfs.debug                           0
> 	vfs.zfs.version.ioctl                   4
> 	vfs.zfs.version.acl                     1
> 	vfs.zfs.version.spa                     5000
> 	vfs.zfs.version.zpl                     5
> 	vfs.zfs.vol.mode                        1
> 	vfs.zfs.vol.unmap_enabled               1
> 	vfs.zfs.trim.enabled                    1
> 	vfs.zfs.trim.txg_delay                  32
> 	vfs.zfs.trim.timeout                    30
> 	vfs.zfs.trim.max_interval               1
>
> ------------------------------------------------------------------------
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
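
For anyone skimming the report above, the gap between wired memory and the ARC
can be worked out directly from the quoted figures. This is only a sketch: it
assumes the ARC is accounted for inside the "Wired" number, and the variable
names are illustrative.

```python
GIB = 1024**3
MIB = 1024**2

# Figures copied from the quoted ZFS Subsystem Report
wired = 30.56 * GIB   # "30.56 GiB Wired"
arc = 4.26 * GIB      # "ARC Size: 14.17% 4.26 GiB"
free = 425.46 * MIB   # "425.46 MiB Free"

# ASSUMPTION: the ARC is counted as part of wired memory
wired_outside_arc = wired - arc
share = wired_outside_arc / wired

print(f"wired memory not explained by ARC: {wired_outside_arc / GIB:.2f} GiB")
print(f"share of wired memory outside ARC: {share:.1%}")
print(f"free memory: {free / MIB:.2f} MiB")
```

If that assumption holds, the bulk of the wired memory in the report sits
outside the ARC, which matches the observation that the ARC stayed near 4GB
while free memory kept dropping.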
