Date: Tue, 25 Jun 2019 14:19:05 -0400
From: mike tancsa <mike@sentex.net>
To: freebsd-questions@freebsd.org
Subject: ZFS Optimizing for large directories and MANY files
Message-ID: <e3168936-9c39-2e8c-2d37-b1bf8da03b08@sentex.net>
I have been trying to understand various ZFS sysctl settings once again, and how they might relate to optimizing a file server that has very few big files but MANY small ones (RELENG_12). Some directories get upwards of 30,000+ files, and the odd time, when some outside user process breaks, 100,000+ files. Obviously, throwing a LOT of RAM at the problem helps, but are there any more tunings I can do? So far, I have set vfs.zfs.arc_meta_strategy=1 and raised vfs.zfs.arc_meta_limit to 65% of ARC memory over the default 25%, and on the zfs dataset in question I have set primarycache=metadata.

Anything else I can do to bias towards a file system with MANY files? Unfortunately, I can't easily stop the end users from dumping many files into a single directory. I think the hit happens when they log in, do a dir, see what files they need to download, download them, and log out. As long as that is cached, it's not so bad.

Doing some simple tests on an imported version of the data set (on slower spinning-rust drives), something simple such as

# time find . -type f -mtime -2d

takes 40 min after a cold boot. Watching zfs disk IO, it's super slow in terms of bandwidth, but gstat shows the disks close to being pegged. I guess the heads are thrashing about inefficiently?
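For reference, the tunings described above can be applied from the shell; a minimal sketch, assuming the 65% figure used here and a placeholder dataset name ("tank/users"), since the real dataset isn't named in this post:

```shell
#!/bin/sh
# Compute 65% of the current ARC maximum and use it as the metadata limit
# (the stock RELENG_12 default works out to roughly 25% of arc_max).
arc_max=$(sysctl -n vfs.zfs.arc_max)
meta_limit=$((arc_max * 65 / 100))

sysctl vfs.zfs.arc_meta_strategy=1              # metadata eviction strategy
sysctl vfs.zfs.arc_meta_limit="${meta_limit}"   # ~65% of ARC for metadata

# Cache only metadata (not file data) in ARC for the busy dataset;
# "tank/users" is a placeholder for the real dataset.
zfs set primarycache=metadata tank/users
```

To survive a reboot, the sysctl values would go in /etc/sysctl.conf (or /boot/loader.conf for loader-time tunables) rather than being set by hand.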
1{ryzenbsd12}# zpool iostat tmpdisk 1
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tmpdisk      301G  1.46T    335      1   899K  33.0K
tmpdisk      301G  1.46T    402      0  1.02M      0
tmpdisk      301G  1.46T    265      0   559K      0
tmpdisk      301G  1.46T    331      0   715K      0
tmpdisk      301G  1.46T    276      0   650K      0
tmpdisk      301G  1.46T    293      0   718K      0
tmpdisk      301G  1.46T    432      0  1.11M      0
tmpdisk      301G  1.46T    435      0  1.03M      0
tmpdisk      301G  1.46T    412      0  1.01M      0
tmpdisk      301G  1.46T    315      0   717K      0
tmpdisk      301G  1.46T    417      0  1.04M      0
tmpdisk      301G  1.46T    457      0  1.13M      0
tmpdisk      301G  1.46T    448      0  1.05M      0

top shows the ARC steadily growing:

ARC: 5119M Total, 2128M MFU, 2361M MRU, 1608K Anon, 73M Header, 560M Other
     606M Compressed, 3902M Uncompressed, 6.43:1 Ratio

and the stats show:

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                238
        Recycle Misses:                         0
        Mutex Misses:                           0
        Evict Skips:                            1.04k

ARC Size:                               17.28%  5.20  GiB
        Target Size: (Adaptive)         100.00% 30.07 GiB
        Min Size (Hard Limit):          12.50%  3.76  GiB
        Max Size (High Water):          8:1     30.07 GiB

ARC Size Breakdown:
        Recently Used Cache Size:       50.00%  15.03 GiB
        Frequently Used Cache Size:     50.00%  15.03 GiB

ARC Hash Breakdown:
        Elements Max:                           247.31k
        Elements Current:               100.00% 247.31k
        Collisions:                             7.20k
        Chain Max:                              3
        Chains:                                 6.99k

------------------------------------------------------------------------

ARC Efficiency:                                 2.53m
        Cache Hit Ratio:                88.31%  2.23m
        Cache Miss Ratio:               11.69%  295.48k
        Actual Hit Ratio:               88.24%  2.23m

        Data Demand Efficiency:         87.76%  20.01k

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.08%   1.69k
          Most Recently Used:           21.64%  483.05k
          Most Frequently Used:         78.28%  1.75m
          Most Recently Used Ghost:     0.00%   0
          Most Frequently Used Ghost:   0.00%   0

        CACHE HITS BY DATA TYPE:
          Demand Data:                  0.79%   17.56k
          Prefetch Data:                0.00%   0
          Demand Metadata:              99.14%  2.21m
          Prefetch Metadata:            0.08%   1.69k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  0.83%   2.45k
          Prefetch Data:                0.00%   0
          Demand Metadata:              18.79%  55.52k
          Prefetch Metadata:            80.38%  237.51k

Once a single trip through the file system via find is done, top shows:

ARC: 10G Total, 7161M MFU, 467M MRU, 1600K Anon, 191M Header, 2842M Other
     1647M Compressed, 11G Uncompressed, 7.12:1 Ratio

and find, on the second iteration, only takes:

0{ryzenbsd12}# time find . -type f -mtime -2d
./list.txt
./l
1.992u 69.557s 1:11.54 100.0%   35+177k 169144+0io 0pf+0w
0{ryzenbsd12}#

and the stats look appropriately better too:

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                238
        Recycle Misses:                         0
        Mutex Misses:                           0
        Evict Skips:                            1.04k

ARC Size:                               34.11%  10.26 GiB
        Target Size: (Adaptive)         100.00% 30.07 GiB
        Min Size (Hard Limit):          12.50%  3.76  GiB
        Max Size (High Water):          8:1     30.07 GiB

ARC Size Breakdown:
        Recently Used Cache Size:       50.00%  15.03 GiB
        Frequently Used Cache Size:     50.00%  15.03 GiB

ARC Hash Breakdown:
        Elements Max:                           688.43k
        Elements Current:               100.00% 688.43k
        Collisions:                             53.65k
        Chain Max:                              4
        Chains:                                 50.50k

------------------------------------------------------------------------

ARC Efficiency:                                 56.03m
        Cache Hit Ratio:                98.07%  54.94m
        Cache Miss Ratio:               1.93%   1.08m
        Actual Hit Ratio:               97.64%  54.71m

        Data Demand Efficiency:         86.21%  21.97k

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.43%   237.54k
          Most Recently Used:           12.19%  6.70m
          Most Frequently Used:         87.37%  48.01m
          Most Recently Used Ghost:     0.00%   0
          Most Frequently Used Ghost:   0.00%   0

        CACHE HITS BY DATA TYPE:
          Demand Data:                  0.03%   18.94k
          Prefetch Data:                0.00%   0
          Demand Metadata:              95.72%  52.59m
          Prefetch Metadata:            4.24%   2.33m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  0.28%   3.03k
          Prefetch Data:                0.00%   0
          Demand Metadata:              50.84%  550.75k
          Prefetch Metadata:            48.88%  529.54k

------------------------------------------------------------------------

Anything else to adjust? I was going to use RAID1+0 for the dataset on SSDs. Should I bother with an NVMe drive for L2ARC caching? On my test box, I can sort of approximate how much RAM I need for metadata (11G, it seems); is there a better programmatic way to find that value out?
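On the "programmatic way" question, one approximation I know of is to read the ARC's own metadata counters after a full tree walk; a sketch, assuming the kstat OIDs below are present on this FreeBSD 12 box (the exact names can differ between ZFS versions):

```shell
#!/bin/sh
# Bytes of ARC currently holding metadata, and the current ceiling:
meta_used=$(sysctl -n kstat.zfs.misc.arcstats.arc_meta_used)
meta_limit=$(sysctl -n kstat.zfs.misc.arcstats.arc_meta_limit)

# Rough GiB figures for comparison with the top/zfs-stats output above:
echo "arc_meta_used:  $((meta_used / 1073741824)) GiB"
echo "arc_meta_limit: $((meta_limit / 1073741824)) GiB"
```

Run right after the cold-boot find completes, arc_meta_used should land near the ~11G estimate from top, and watching it across a full walk of the tree gives a working-set figure rather than a guess.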
        ---Mike