Date:      Thu, 14 Jan 2010 17:15:47 +0100
From:      Ivan Voras <ivoras@freebsd.org>
To:        Doug Poland <doug@polands.org>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: 8.0-R-p2 ZFS: unixbench causing kmem exhaustion panic
Message-ID:  <9bbcef731001140815h5ee1d672je58c8ec91382e8d4@mail.gmail.com>
In-Reply-To: <e9d87b468534fb7eccbc1a4fc2fced19.squirrel@email.polands.org>
References:  <8418112cdfada93d83ca0cb5307c1d21.squirrel@email.polands.org>  <9bbcef731001131035x604cdea1t81b14589cb10ad25@mail.gmail.com>  <b41ca31fbeacf104143509e8cba2fe66.squirrel@email.polands.org>  <9bbcef731001131157h256c4d14mbb241bc4326405f8@mail.gmail.com>  <3aa09fd8723749d1fa65f1b9a6faac60.squirrel@email.polands.org>  <cb290c7a06dd633dfc1cd5bd8b4fd99a.squirrel@email.polands.org>  <himnfv$acn$1@ger.gmane.org> <27117211dd662bcf93055f4351243396.squirrel@email.polands.org>  <9bbcef731001140650h5d887843ubc6d555da993e8b6@mail.gmail.com>  <e9d87b468534fb7eccbc1a4fc2fced19.squirrel@email.polands.org>

2010/1/14 Doug Poland <doug@polands.org>:
>
> On Thu, January 14, 2010 08:50, Ivan Voras wrote:
>> 2010/1/14 Doug Poland <doug@polands.org>:
>>>>>
>>>>> kstat.zfs.misc.arcstats.size
>>>>>
>>>>> seemed to fluctuate between about 164,000,000 and 180,000,000 bytes
>>>>> during this last run
>>>>
>>>> Is that with or without panicking?
>>>>
>>> with a panic
>>>
>>>
>>>> If the system did panic then it looks like the problem is a memory
>>>> leak somewhere else in the kernel, which you could confirm by
>>>> monitoring vmstat -z.
>>>>
>>> I'll give that a try.  Am I looking for specific items in
>>> vmstat -z?  arc*, zil*, zfs*, zio*?  Please advise.
>>
>> You should look for whatever is allocating all your memory between 180
>> MB (which is your ARC size) and 1.2 GB (which is your kmem size).
>>
>
> OK, another run, this time back to vfs.zfs.arc_max=512M in
> /boot/loader.conf, and a panic:
>
> panic: kmem malloc(131072): kmem map too small: 1294258176 total
> allocated
>
> I admit I do not fully understand what metrics are important to proper
> analysis of this issue.  In this case, I was watching the following
> within 1 second of the panic:
>
> sysctl kstat.zfs.misc.arcstats.size: 41739944
> sysctl vfs.numvnodes: 678
> sysctl vfs.zfs.arc_max: 536870912
> sysctl vfs.zfs.arc_meta_limit: 134217728
> sysctl vfs.zfs.arc_meta_used: 7228584
> sysctl vfs.zfs.arc_min: 67108864
> sysctl vfs.zfs.cache_flush_disable: 0
> sysctl vfs.zfs.debug: 0
> sysctl vfs.zfs.mdcomp_disable: 0
> sysctl vfs.zfs.prefetch_disable: 1
> sysctl vfs.zfs.recover: 0
> sysctl vfs.zfs.scrub_limit: 10
> sysctl vfs.zfs.super_owner: 0
> sysctl vfs.zfs.txg.synctime: 5
> sysctl vfs.zfs.txg.timeout: 30
> sysctl vfs.zfs.vdev.aggregation_limit: 131072
> sysctl vfs.zfs.vdev.cache.bshift: 16
> sysctl vfs.zfs.vdev.cache.max: 16384
> sysctl vfs.zfs.vdev.cache.size: 10485760
> sysctl vfs.zfs.vdev.max_pending: 35
> sysctl vfs.zfs.vdev.min_pending: 4
> sysctl vfs.zfs.vdev.ramp_rate: 2
> sysctl vfs.zfs.vdev.time_shift: 6
> sysctl vfs.zfs.version.acl: 1
> sysctl vfs.zfs.version.dmu_backup_header: 2
> sysctl vfs.zfs.version.dmu_backup_stream: 1
> sysctl vfs.zfs.version.spa: 13
> sysctl vfs.zfs.version.vdev_boot: 1
> sysctl vfs.zfs.version.zpl: 3
> sysctl vfs.zfs.zfetch.array_rd_sz: 1048576
> sysctl vfs.zfs.zfetch.block_cap: 256
> sysctl vfs.zfs.zfetch.max_streams: 8
> sysctl vfs.zfs.zfetch.min_sec_reap: 2
> sysctl vfs.zfs.zil_disable: 0
> sysctl vm.kmem_size: 1327202304
> sysctl vm.kmem_size_max: 329853485875
> sysctl vm.kmem_size_min: 0
> sysctl vm.kmem_size_scale: 3
>
>
> vmstat -z | egrep -i 'zfs|zil|arc|zio|files'
> ITEM                 SIZE     LIMIT      USED      FREE   REQUESTS
> Files:                 80,        0,      116,      199,    850713
> zio_cache:            720,        0,    53562,       98,  86386955
> arc_buf_hdr_t:        208,        0,     1193,       31,     11990
> arc_buf_t:             72,        0,     1180,      120,     11990
> zil_lwb_cache:        200,        0,    11580,     2594,     62407
> zfs_znode_cache:      376,        0,      605,       55,       654
>
> vmstat -m |grep solaris|sed 's/K//'|awk '{print "vm.solaris:", $3*1024}'
>
>
> solaris: 1285068800
>
>
> The value I see as the culprit is vmstat -m | grep solaris.  This
> value fluctuates wildly during the run and is always near kmem_size at
> the time of the panic.
>
> Again, I'm not sure what to look for here, and you are patiently
> helping me along in this process.  If you have any tips or can point
> me to docs on how to easily monitor these values, I will endeavor to
> do so.
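
Doug asks which `vmstat -z` items matter. Rather than grepping for a few ZFS names, one option is to rank every zone by approximate bytes in use (SIZE x USED) and see what accounts for the space between the ARC size and kmem_size. A minimal sketch, assuming the FreeBSD 8.x `vmstat -z` layout quoted above; `rank_zones` is a hypothetical helper, and the estimate ignores slab overhead and cached FREE items:

```shell
#!/bin/sh
# Sketch (not from the thread): rank UMA zones in `vmstat -z` output by
# approximate bytes in use, SIZE * USED. Assumes the FreeBSD 8.x layout
#   ITEM:  SIZE, LIMIT, USED, FREE, REQUESTS[, FAILURES]
# Slab overhead and cached FREE items are ignored, so this is a lower bound.

# rank_zones [N]: read `vmstat -z` on stdin, print the N largest zones
# (default 10) as "<bytes> <zone name>".
rank_zones() {
    awk -F: '
        NF < 2 { next }                      # header and blank lines have no colon
        {
            n = split($2, f, ",")            # f[1]=SIZE, f[3]=USED, f[5]=REQUESTS
            if (n < 5) next
            printf "%.0f %s\n", f[1] * f[3], $1
        }' | sort -rn | head -n "${1:-10}"
}

# Only sample the live kernel on FreeBSD; elsewhere just define the function.
if [ "$(uname -s)" = FreeBSD ]; then
    vmstat -z | rank_zones 10
fi
```

Run it repeatedly during the benchmark, e.g. `vmstat -z | rank_zones 15`. Of the zones quoted above, the largest is zio_cache at about 38.6 MB (720 x 53562 bytes), far below the ~1.2 GB kmem limit, so the rest is likely in zones the egrep filtered out or in the malloc types reported by `vmstat -m`.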

The only really important ones should be kstat.zfs.misc.arcstats.size
(which you very rarely print) and vm.kmem_size. The "solaris" entry
above should be near kstat.zfs.misc.arcstats.size in all cases.
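
The check described above (the "solaris" entry should track kstat.zfs.misc.arcstats.size, and both should stay well below vm.kmem_size) can be logged once per second with a small loop. This is a sketch under stated assumptions: FreeBSD 8.x with ZFS loaded, the `vmstat -m` MemUse column carrying a trailing `K` as in Doug's one-liner, and a hypothetical log path `/var/tmp/zfs-mem.log`; `solaris_bytes` and `gap_report` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch (not from the thread): once per second, log the ARC size, the
# "solaris" malloc type and vm.kmem_size side by side. Assumes FreeBSD 8.x
# with ZFS loaded, and that the MemUse column of `vmstat -m` carries a
# trailing "K" as in the one-liner quoted above.

# solaris_bytes: read `vmstat -m` on stdin, print MemUse of "solaris" in bytes.
solaris_bytes() {
    awk '$1 == "solaris" { sub(/K$/, "", $3); printf "%.0f\n", $3 * 1024; exit }'
}

# gap_report ARC SOLARIS KMEM: one line showing how far the solaris
# allocations sit above the ARC size and how full kmem is.
gap_report() {
    awk -v arc="$1" -v sol="$2" -v kmem="$3" 'BEGIN {
        printf "arc=%.0f solaris=%.0f gap=%.0f kmem=%.0f used=%.1f%%\n",
            arc, sol, sol - arc, kmem, 100 * sol / kmem
    }'
}

LOG=/var/tmp/zfs-mem.log        # hypothetical log path
SAMPLES=${SAMPLES:-3600}        # hypothetical default: one hour at 1 s intervals

if [ "$(uname -s)" = FreeBSD ]; then
    kmem=$(sysctl -n vm.kmem_size)
    i=0
    while [ "$i" -lt "$SAMPLES" ]; do
        arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
        sol=$(vmstat -m | solaris_bytes)
        printf '%s ' "$(date +%T)"
        gap_report "$arc" "$sol" "$kmem"
        i=$((i + 1))
        sleep 1
    done >> "$LOG"
fi
```

Fed the figures Doug captured just before the panic (arcstats.size 41739944, solaris 1285068800, kmem_size 1327202304), `gap_report` prints gap=1243328856 and used=96.8%, the kind of mismatch this rule of thumb is meant to flag. Lines written just before a panic may not reach the disk.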

But I don't have any more ideas here. Try taking this post (also
include kstat.zfs.misc.arcstats.size) to the freebsd-fs@ mailing list.


