From: Alan Cox <alc@rice.edu>
Date: Mon, 20 Aug 2012 10:22:28 -0500
To: "Gezeala M. Bacuño II"
Cc: alc@freebsd.org, freebsd-performance@freebsd.org, Andrey Zonov, kib@freebsd.org
Message-ID: <50325634.7090904@rice.edu>
Subject: Re: vm.kmem_size_max and vm.kmem_size capped at 329853485875 (~307GB)
List-Id: Performance/tuning

On 08/18/2012 19:57, Gezeala M. Bacuño II wrote:
> On Sat, Aug 18, 2012 at 12:14 PM, Alan Cox wrote:
>> On 08/17/2012 17:08, Gezeala M. Bacuño II wrote:
>>> On Fri, Aug 17, 2012 at 1:58 PM, Alan Cox wrote:
>>>> vm.kmem_size controls the maximum size of the kernel's heap, i.e., the
>>>> region where the kernel's slab and malloc()-like memory allocators
>>>> obtain their memory.  While this heap may occupy the largest portion of
>>>> the kernel's virtual address space, it cannot occupy the entirety of
>>>> the address space.  There are other things that must be given space
>>>> within the kernel's address space, for example, the file system buffer
>>>> map.
>>>>
>>>> ZFS does not, however, use the regular file system buffer cache.  The
>>>> ARC takes its place, and the ARC abuses the kernel's heap like nothing
>>>> else.  So, if you are running a machine that only makes trivial use of
>>>> a non-ZFS file system, e.g., you boot from UFS but store all of your
>>>> data in ZFS, then you can dramatically reduce the size of the buffer
>>>> map via boot loader tuneables and proportionately increase
>>>> vm.kmem_size.
>>>>
>>>> Any further increases in the kernel virtual address space size will,
>>>> however, require code changes.  Small changes, but changes nonetheless.
>>>>
>>>> Alan
>>>>
>>> <>
>>>
>>>>> Additional Info:
>>>>> 1] Installed using PCBSD-9 Release amd64.
>>>>>
>>>>> 2] uname -a
>>>>> FreeBSD fmt-iscsi-stg1.musicreports.com 9.0-RELEASE FreeBSD
>>>>> 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
>>>>> root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
>>>>> amd64
>>>>>
>>>>> 3] first few lines from /var/run/dmesg.boot:
>>>>> FreeBSD 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
>>>>> root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
>>>>> amd64
>>>>> CPU: Intel(R) Xeon(R) CPU E7- 8837 @ 2.67GHz (2666.82-MHz K8-class CPU)
>>>>>   Origin = "GenuineIntel"  Id = 0x206f2  Family = 6  Model = 2f  Stepping = 2
>>>>>   Features=0xbfebfbff
>>>>>   Features2=0x29ee3ff
>>>>>   AMD Features=0x2c100800
>>>>>   AMD Features2=0x1
>>>>>   TSC: P-state invariant, performance statistics
>>>>> real memory  = 549755813888 (524288 MB)
>>>>> avail memory = 530339893248 (505771 MB)
>>>>> Event timer "LAPIC" quality 600
>>>>> ACPI APIC Table:
>>>>> FreeBSD/SMP: Multiprocessor System Detected: 64 CPUs
>>>>> FreeBSD/SMP: 8 package(s) x 8 core(s)
>>>>>
>>>>> 4] relevant sysctl's with manual tuning:
>>>>> kern.maxusers: 384
>>>>> kern.maxvnodes: 8222162
>>>>> vfs.numvnodes: 675740
>>>>> vfs.freevnodes: 417524
>>>>> kern.ipc.somaxconn: 128
>>>>> kern.openfiles: 5238
>>>>> vfs.zfs.arc_max: 428422987776
>>>>> vfs.zfs.arc_min: 53552873472
>>>>> vfs.zfs.arc_meta_used: 3167391088
>>>>> vfs.zfs.arc_meta_limit: 107105746944
>>>>> vm.kmem_size_max: 429496729600 ==>> manually tuned
>>>>> vm.kmem_size: 429496729600 ==>> manually tuned
>>>>> vm.kmem_map_free: 107374727168
>>>>> vm.kmem_map_size: 144625156096
>>>>> vfs.wantfreevnodes: 2055540
>>>>> kern.minvnodes: 2055540
>>>>> kern.maxfiles: 197248 ==>> manually tuned
>>>>> vm.vmtotal:
>>>>> System wide totals computed every five seconds: (values in kilobytes)
>>>>> ===============================================
>>>>> Processes:  (RUNQ: 1 Disk Wait: 1 Page Wait: 0 Sleep: 150)
>>>>> Virtual Memory:  (Total: 1086325716K Active: 12377876K)
>>>>> Real Memory:  (Total: 144143408K Active: 803432K)
>>>>> Shared Virtual Memory:  (Total: 81384K Active: 37560K)
>>>>> Shared Real Memory:  (Total: 32224K Active: 27548K)
>>>>> Free Memory Pages: 365565564K
>>>>>
>>>>> hw.availpages: 134170294
>>>>> hw.physmem: 549561524224
>>>>> hw.usermem: 391395241984
>>>>> hw.realmem: 551836188672
>>>>> vm.kmem_size_scale: 1
>>>>> kern.ipc.nmbclusters: 2560000 ==>> manually tuned
>>>>> kern.ipc.maxsockbuf: 2097152
>>>>> net.inet.tcp.sendbuf_max: 2097152
>>>>> net.inet.tcp.recvbuf_max: 2097152
>>>>> kern.maxfilesperproc: 18000
>>>>> net.inet.ip.intr_queue_maxlen: 256
>>>>> kern.maxswzone: 33554432
>>>>> kern.ipc.shmmax: 10737418240 ==>> manually tuned
>>>>> kern.ipc.shmall: 2621440 ==>> manually tuned
>>>>> vfs.zfs.write_limit_override: 0
>>>>> vfs.zfs.prefetch_disable: 0
>>>>> hw.pagesize: 4096
>>>>> hw.availpages: 134170294
>>>>> kern.ipc.maxpipekva: 8586895360
>>>>> kern.ipc.shm_use_phys: 1 ==>> manually tuned
>>>>> vfs.vmiodirenable: 1
>>>>> debug.numcache: 632148
>>>>> vfs.ncsizefactor: 2
>>>>> vm.kvm_size: 549755809792
>>>>> vm.kvm_free: 54456741888
>>>>> kern.ipc.semmni: 256
>>>>> kern.ipc.semmns: 512
>>>>> kern.ipc.semmnu: 256
>>>>>
>>> Thanks.  It will be mainly used for PostgreSQL and Java.  We have a
>>> huge db (3TB and growing), and we need to have as much of it as we can
>>> in ZFS's ARC.  All data resides on zpools while root is on UFS.  On
>>> 8.2 and 9 machines vm.kmem_size is always auto-tuned to almost the
>>> same size as our installed RAM.  What I've tuned on those machines is
>>> to lower vfs.zfs.arc_max to 50% or 75% of vm.kmem_size; that has
>>> worked well for us, and the machines do not swap out.  Now on this
>>> machine, I do think that I need to adjust my formula for tuning
>>> vfs.zfs.arc_max; 25% for other stuff is probably overkill.
>>>
>>> We were able to successfully bump vm.kmem_size_max and vm.kmem_size to
>>> 400GB:
>>> vm.kmem_size_max: 429496729600 ==>> manually tuned
>>> vm.kmem_size: 429496729600 ==>> manually tuned
>>> vfs.zfs.arc_max: 428422987776 ==>> auto-tuned (vm.kmem_size - 1G)
>>> vfs.zfs.arc_min: 53552873472 ==>> auto-tuned
>>>
>>> Which other tuneables do I need to set in /boot/loader.conf so we can
>>> boot the machine with vm.kmem_size > 400G?  As I don't know which part
>>> of the boot-up process is failing with vm.kmem_size/_max set to 450G
>>> or 500G, I have no idea which to tune next.
>>
>> Your objective should be to reduce the value of "sysctl vfs.maxbufspace".
>> You can do this by setting the loader.conf tuneable "kern.maxbcache" to
>> the desired value.
>>
>> What does your machine currently report for "sysctl vfs.maxbufspace"?
>>
> Here you go:
> vfs.maxbufspace: 54967025664
> kern.maxbcache: 0

Try setting kern.maxbcache to two billion and adding 50 billion to the
setting of vm.kmem_size{,_max}.

> Other (probably) relevant values:
> vfs.hirunningspace: 16777216
> vfs.lorunningspace: 11206656
> vfs.bufdefragcnt: 0
> vfs.buffreekvacnt: 2
> vfs.bufreusecnt: 320149
> vfs.hibufspace: 54966370304
> vfs.lobufspace: 54966304768
> vfs.maxmallocbufspace: 2748318515
> vfs.bufmallocspace: 0
> vfs.bufspace: 10490478592
> vfs.runningbufspace: 0
>
> Let me know if you need other tuneables or sysctl values.  Thanks a lot
> for looking into this.
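For concreteness, the suggestion above would look something like the
following /boot/loader.conf fragment.  The numbers are illustrative, not a
tested recommendation: roughly 2 GB for kern.maxbcache (which caps
vfs.maxbufspace, and so shrinks the buffer map's claim on kernel virtual
address space), and the current vm.kmem_size of 429496729600 plus 50
billion bytes for the kmem tuneables:

```
# Sketch only; adjust to your machine and verify after reboot with
# "sysctl vfs.maxbufspace vm.kmem_size".
kern.maxbcache="2000000000"        # cap the buffer map at ~2 GB
vm.kmem_size="479496729600"        # 429496729600 + 50000000000
vm.kmem_size_max="479496729600"
```

The freed buffer-map space is what allows vm.kmem_size to grow past the
point where boot previously failed.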