Date: Thu, 14 Jan 2010 17:15:47 +0100
From: Ivan Voras <ivoras@freebsd.org>
To: Doug Poland <doug@polands.org>
Cc: freebsd-questions@freebsd.org
Subject: Re: 8.0-R-p2 ZFS: unixbench causing kmem exhaustion panic
Message-ID: <9bbcef731001140815h5ee1d672je58c8ec91382e8d4@mail.gmail.com>
In-Reply-To: <e9d87b468534fb7eccbc1a4fc2fced19.squirrel@email.polands.org>
References: <8418112cdfada93d83ca0cb5307c1d21.squirrel@email.polands.org>
 <9bbcef731001131035x604cdea1t81b14589cb10ad25@mail.gmail.com>
 <b41ca31fbeacf104143509e8cba2fe66.squirrel@email.polands.org>
 <9bbcef731001131157h256c4d14mbb241bc4326405f8@mail.gmail.com>
 <3aa09fd8723749d1fa65f1b9a6faac60.squirrel@email.polands.org>
 <cb290c7a06dd633dfc1cd5bd8b4fd99a.squirrel@email.polands.org>
 <himnfv$acn$1@ger.gmane.org>
 <27117211dd662bcf93055f4351243396.squirrel@email.polands.org>
 <9bbcef731001140650h5d887843ubc6d555da993e8b6@mail.gmail.com>
 <e9d87b468534fb7eccbc1a4fc2fced19.squirrel@email.polands.org>

2010/1/14 Doug Poland <doug@polands.org>:
>
> On Thu, January 14, 2010 08:50, Ivan Voras wrote:
>> 2010/1/14 Doug Poland <doug@polands.org>:
>>>>>
>>>>> kstat.zfs.misc.arcstats.size
>>>>>
>>>>> seemed to fluctuate between about 164,000,00 and 180,000,000 bytes
>>>>> during this last run
>>>>
>>>> Is that with or without panicking?
>>>>
>>> with a panic
>>>
>>>
>>>> If the system did panic then it looks like the problem is a memory
>>>> leak somewhere else in the kernel, which you could confirm by
>>>> monitoring vmstat -z.
>>>>
>>> I'll give that a try.  Am I looking for specific items in vmstat
>>> -z?  arc*, zil*, zfs*, zio*?  Please advise.
>>
>> You should look for whatever is allocating all your memory between 180
>> MB (which is your ARC size) and 1.2 GB (which is your kmem size).
>>
>
> OK, another run, this time back to vfs.zfs.arc_max=512M in
> /boot/loader.conf, and a panic:
>
> panic: kmem malloc(131072): kmem map too small: 1294258176 total
> allocated
>
> I admit I do not fully understand what metrics are important to proper
> analysis of this issue.  In this case, I was watching the following
> within 1 second of the panic:
>
> sysctl kstat.zfs.misc.arcstats.size: 41739944
> sysctl vfs.numvnodes: 678
> sysctl vfs.zfs.arc_max: 536870912
> sysctl vfs.zfs.arc_meta_limit: 134217728
> sysctl vfs.zfs.arc_meta_used: 7228584
> sysctl vfs.zfs.arc_min: 67108864
> sysctl vfs.zfs.cache_flush_disable: 0
> sysctl vfs.zfs.debug: 0
> sysctl vfs.zfs.mdcomp_disable: 0
> sysctl vfs.zfs.prefetch_disable: 1
> sysctl vfs.zfs.recover: 0
> sysctl vfs.zfs.scrub_limit: 10
> sysctl vfs.zfs.super_owner: 0
> sysctl vfs.zfs.txg.synctime: 5
> sysctl vfs.zfs.txg.timeout: 30
> sysctl vfs.zfs.vdev.aggregation_limit: 131072
> sysctl vfs.zfs.vdev.cache.bshift: 16
> sysctl vfs.zfs.vdev.cache.max: 16384
> sysctl vfs.zfs.vdev.cache.size: 10485760
> sysctl vfs.zfs.vdev.max_pending: 35
> sysctl vfs.zfs.vdev.min_pending: 4
> sysctl vfs.zfs.vdev.ramp_rate: 2
> sysctl vfs.zfs.vdev.time_shift: 6
> sysctl vfs.zfs.version.acl: 1
> sysctl vfs.zfs.version.dmu_backup_header: 2
> sysctl vfs.zfs.version.dmu_backup_stream: 1
> sysctl vfs.zfs.version.spa: 13
> sysctl vfs.zfs.version.vdev_boot: 1
> sysctl vfs.zfs.version.zpl: 3
> sysctl vfs.zfs.zfetch.array_rd_sz: 1048576
> sysctl vfs.zfs.zfetch.block_cap: 256
> sysctl vfs.zfs.zfetch.max_streams: 8
> sysctl vfs.zfs.zfetch.min_sec_reap: 2
> sysctl vfs.zfs.zil_disable: 0
> sysctl vm.kmem_size: 1327202304
> sysctl vm.kmem_size_max: 329853485875
> sysctl vm.kmem_size_min: 0
> sysctl vm.kmem_size_scale: 3
>
>
> vmstat -z | egrep -i 'zfs|zil|arc|zio|files'
> ITEM                     SIZE     LIMIT      USED      FREE  REQUESTS
> Files:                     80,        0,      116,      199,   850713
> zio_cache:                720,        0,    53562,       98, 86386955
> arc_buf_hdr_t:            208,        0,     1193,       31,    11990
> arc_buf_t:                 72,        0,     1180,      120,    11990
> zil_lwb_cache:            200,        0,    11580,     2594,    62407
> zfs_znode_cache:          376,        0,      605,       55,      654
>
> vmstat -m |grep solaris|sed 's/K//'|awk '{print "vm.solaris:", $3*1024}'
>
>
>  solaris: 1285068800
>
>
> The value I see as the culprit is vmstat -m | grep solaris.  This
> value fluctuates wildly during the run and is always near kmem_size at
> the time of the panic.
>
> Again, I'm not sure what to look for here, and you are patiently
> helping me along in this process.  If you have any tips or can point
> me to docs on how to easily monitor these values, I will endeavor to
> do so.

The only really important ones should be kstat.zfs.misc.arcstats.size
(which you very rarely print) and vm.kmem_size. The "solaris" entry
above should be near kstat.zfs.misc.arcstats.size in all cases.

But I don't have any more ideas here. Try taking this post (also
include kstat.zfs.misc.arcstats.size) to the freebsd-fs@ mailing list.
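
The comparison suggested in the reply (the "solaris" malloc type against kstat.zfs.misc.arcstats.size and vm.kmem_size) could be sketched roughly as follows. This is only an illustration: the function names solaris_bytes and kmem_report are made up here, and the parsing assumes the third column of `vmstat -m` is MemUse with a trailing "K", as the pipeline quoted above implies.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of FreeBSD.

# Read `vmstat -m` output on stdin and print the "solaris" MemUse in bytes.
# Assumes column 3 is MemUse with a trailing "K" (an assumption taken from
# the sed/awk pipeline quoted in the message).
solaris_bytes() {
    awk '$1 == "solaris" { sub(/K$/, "", $3); printf "%.0f\n", $3 * 1024 }'
}

# Print how much of the "solaris" allocation is not explained by the ARC,
# and how much of kmem is left above "solaris".
# Arguments: arc_size solaris_bytes kmem_size (all in bytes).
kmem_report() {
    arc=$1 solaris=$2 kmem=$3
    echo "solaris-arc=$((solaris - arc)) kmem-solaris=$((kmem - solaris))"
}

# Possible loop on the affected machine (left commented out here):
# while :; do
#     kmem_report "$(sysctl -n kstat.zfs.misc.arcstats.size)" \
#                 "$(vmstat -m | solaris_bytes)" \
#                 "$(sysctl -n vm.kmem_size)"
#     sleep 1
# done
```

With the figures reported just before the panic (ARC 41739944, solaris 1285068800, kmem 1327202304), kmem_report prints solaris-arc=1243328856 kmem-solaris=42133504, i.e. roughly 1.2 GB of "solaris" memory that the ARC size does not account for.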