Date: Thu, 14 Feb 2019 18:56:42 +1100 (EST)
From: Bruce Evans
To: Justin Hibbits
cc: Gleb Smirnoff, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r343030 - in head/sys: cam conf dev/md dev/nvme fs/fuse fs/nfsclient fs/smbfs kern sys ufs/ffs vm
Message-ID: <20190214153345.C1404@besplex.bde.org>
In-Reply-To: <20190213192450.32343d6a@ralga.knownspace>
References: <201901150102.x0F12Hlt025856@repo.freebsd.org> <20190213192450.32343d6a@ralga.knownspace>
List-Id: SVN commit messages for the src tree for head/-current

On Wed, 13 Feb 2019, Justin Hibbits wrote:

> On Tue, 15 Jan 2019 01:02:17 +0000 (UTC)
> Gleb Smirnoff wrote:
>
>> Author: glebius
>> Date: Tue Jan 15 01:02:16 2019
>> New Revision: 343030
>> URL: https://svnweb.freebsd.org/changeset/base/343030
>>
>> Log:
>>   Allocate pager bufs from UMA instead of 80-ish mutex protected
>>   linked list.
> ...
>
> This seems to break 32-bit platforms, or at least 32-bit book-e
> powerpc, which has a limited KVA space (~500MB).  It preallocates I've
> seen over 2500 pbufs, at 128kB each, eating up over 300MB KVA,
> leaving very little left for the rest of runtime.

Hrmph. I complained about other things in this commit when it was committed, but not about this largest bug, since preallocation was broken then. So I thought that it wasn't done, and that the problems would be smaller unless the excessive limits were actually reached. Now i386 does it:

XX ITEM           SIZE   LIMIT    USED    FREE     REQ    FAIL   SLEEP
XX
XX swrbuf:         336,    128,      0,      0,      0,      0,      0
XX swwbuf:         336,     64,      0,      0,      0,      0,      0
XX nfspbuf:        336,    128,      0,      0,      0,      0,      0
XX mdpbuf:         336,     25,      0,      0,      0,      0,      0
XX clpbuf:         336,    128,      0,      5,      4,      0,      0
XX vnpbuf:         336,   2048,      0,      0,      0,      0,      0
XX pbuf:           336,     16,      0,   2535,      0,      0,      0

but i386 now has 4GB of KVA, with almost 3GB to waste, so the bug is not noticed there.

The preallocation wasn't there in my last mail to the author about nearby bugs, on 24 Jan 2019:

YY vnpbuf:         568,   2048,      0,      0,      0,      0,      0
YY clpbuf:         568,    128,      0,    128,   8750,      0,      1
YY pbuf:           568,     16,      0,      4,      0,      0,      0

This output is from amd64, where SIZE is larger; everything else was the same as on i386. Now amd64 shows the large preallocation too.

There seems to be another bug: the especially small LIMIT of 16 (the pbuf row) turns into a preallocation of 2535 without causing an immediate reduction to the limit.

I happen to have kernels from 24 and 25 Jan handy.
The first one is amd64 r343346M, built on Jan 23, and it doesn't do the large preallocation. The second one is i386 r343388:343418M, built on Jan 25, and it does the large preallocation. Both call uma_prealloc() to ask for nswbuf_max = 0x9e9 buffers, but the old version only allocates 4 buffers while the later version allocates all 0x9e9.

The only relevant commit between the good and bad versions seems to be r343453. This fixes uma_prealloc() to actually work. But it is a feature for it to not work when its caller asks for too much.

0x9e9 (2537) is the sum of the LIMITs of all pbuf pools. The main bug in r343030 is that it expands nswbuf, which is supposed to give the combined limit, from its normal value of 256 to 0x9e9. (r343030 actually used nswbuf before it was properly initialized, so it used its maximum value of 256 even on small systems with nswbuf = 16. Only this has been fixed.)

On i386, nbuf is excessively limited: it is capped to give a maxbufspace of about 100MB so as to fit in 1GB of kva, even with infinite RAM and even though -current actually has 4GB of kva. nbuf is correctly limited to give a much smaller maxbufspace when RAM is small (kva scaling for this is not done so well). nswbuf is restricted if nbuf is restricted, but not enough (except in my version). It is normally 256, so the pbuf allocation used to be 32MB (256 pbufs at 128kB each), and this is already a bit large compared with 100MB for maxbufspace. Expanding pbufs by a factor of 0x9e9/0x100 (about 9.9) gives the silly combination of 100MB for maxbufspace and 317MB for pbufs.

If kva is only 512MB instead of 1GB, then maxbufspace should be only 50MB and nswbuf should be smaller too. The same applies to PAE on i386 back when it was configured with 1GB of kva by default: only about 512MB are left after allocating space for page table metadata.

I have fixes that scale most of this better. Large subsystems, starting with kmem, get a hard-coded fraction of the usable kva. E.g., kmem gets about 60% of usable kva instead of about 40% of nominal kva.
Most other large subsystems, including the buffer cache, get about 1/8 of the remaining 40% of usable kva. Scaling for other subsystems is mostly worse than for kmem. pbufs are part of the buffer cache allocation, and the expansion factor of 0x9e9/0x100 breaks this.

I don't understand how pbuf_preallocate() allocates for the other pbuf pools. When I debugged this for clpbufs, the preallocation was not used. pbuf types other than clpbufs seem to be unused in my configurations. I thought that pbufs were used during initialization, since they end up with a nonzero FREE count, but their only use seems to be to preallocate them.

Bruce