Date:      Fri, 17 Sep 2010 11:14:35 +0300
From:      Andriy Gapon <avg@freebsd.org>
To:        freebsd-hackers@freebsd.org
Cc:        Jeff Roberson <jeff@freebsd.org>
Subject:   zfs + uma
Message-ID:  <4C93236B.4050906@freebsd.org>

This is a multi-part message in MIME format.
--------------030602010507080304070903
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit


I've been investigating the interaction between zfs and uma for a while.
You might remember that there is noticeable fragmentation in zfs uma zones
when uma use is not enabled for actual data/metadata buffers.

I also noticed that when uma use is enabled for data/metadata buffers
(zio.use_uma=1) the amount of memory reserved in free items of zfs uma zones
becomes really huge.  And this is despite the fact that the vast majority of
the data/metadata zones have items with sizes that are multiples of page size.
So this can't really be because of fragmentation.

Further checks show that the free items accumulate in per-cpu cache
buckets.  uz_count for those buckets starts at 1, but over time, during bursts
of activity, it grows up to a maximum of 128.
The problem with those buckets is that they are not drained in low memory
conditions and uz_count never goes down.

So, after a while, I observe about 300 free items (on a mere two-core system)
cached in 4 per-cpu buckets for a single zone with 128KB item size.
That's well over 30MB right there.
For all data and metadata zones the number goes as high as 500MB on my machine
with 4GB of physical RAM.
This seems like a bit too much to me.

Although keeping free items around improves performance, it does consume memory
too.  And the fact that this memory is not freed in a lowmem condition makes
the situation worse.

So, I decided to take a look at how they handle this situation in (Open)Solaris.
There is this good book:
http://books.google.com/books?id=r_cecYD4AKkC&printsec=frontcover
Please see section 6.2.4.5 on page 225 and table 6-11 on page 226.
And also this code:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/kmem.c#971

It makes sense to me to limit the size of per-cpu buckets depending on item
size.  I even wrote a slightly hackish patch [attached].
But I didn't go as far as they did in Solaris, so the minimum bucket size limit
is 4.
But perhaps it would make sense to not use the cache at all starting from a
certain item size.

Another attached hack removes zio zones that have items larger than the page
size, but not a multiple of the page size.  Internally they would still consume
a multiple of the page size per item, so we can potentially have two zones that
use the same number of pages per item, but with different item sizes. With the
patch they are collapsed into a single zone.

-- 
Andriy Gapon

--------------030602010507080304070903
Content-Type: text/plain;
 name="uma-uz_count_max.diff"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="uma-uz_count_max.diff"

ZGlmZiAtLWdpdCBhL3N5cy92bS91bWFfY29yZS5jIGIvc3lzL3ZtL3VtYV9jb3JlLmMKaW5k
ZXggM2ZjNWI4YS4uM2I4Mzg0YiAxMDA2NDQKLS0tIGEvc3lzL3ZtL3VtYV9jb3JlLmMKKysr
IGIvc3lzL3ZtL3VtYV9jb3JlLmMKQEAgLTE3OSw5ICsxNzksMTIgQEAgc3RydWN0IHVtYV9i
dWNrZXRfem9uZSB7CiAJaW50CQl1YnpfZW50cmllczsKIH07CiAKLSNkZWZpbmUJQlVDS0VU
X01BWAkxMjgKKyNkZWZpbmUJQlVDS0VUX1NJWkVfVEhSRVNIT0xECTEzMTA3MgorI2RlZmlu
ZQlCVUNLRVRfTUFYCQkxMjgKIAogc3RydWN0IHVtYV9idWNrZXRfem9uZSBidWNrZXRfem9u
ZXNbXSA9IHsKKwl7IE5VTEwsICI0IEJ1Y2tldCIsIDQgfSwKKwl7IE5VTEwsICI4IEJ1Y2tl
dCIsIDggfSwKIAl7IE5VTEwsICIxNiBCdWNrZXQiLCAxNiB9LAogCXsgTlVMTCwgIjMyIEJ1
Y2tldCIsIDMyIH0sCiAJeyBOVUxMLCAiNjQgQnVja2V0IiwgNjQgfSwKQEAgLTE4OSw3ICsx
OTIsNyBAQCBzdHJ1Y3QgdW1hX2J1Y2tldF96b25lIGJ1Y2tldF96b25lc1tdID0gewogCXsg
TlVMTCwgTlVMTCwgMH0KIH07CiAKLSNkZWZpbmUJQlVDS0VUX1NISUZUCTQKKyNkZWZpbmUJ
QlVDS0VUX1NISUZUCTIKICNkZWZpbmUJQlVDS0VUX1pPTkVTCSgoQlVDS0VUX01BWCA+PiBC
VUNLRVRfU0hJRlQpICsgMSkKIAogLyoKQEAgLTE0NjMsNiArMTQ2NiwxMyBAQCB6b25lX2N0
b3Iodm9pZCAqbWVtLCBpbnQgc2l6ZSwgdm9pZCAqdWRhdGEsIGludCBmbGFncykKIAkJem9u
ZS0+dXpfY291bnQgPSBrZWctPnVrX2lwZXJzOwogCWVsc2UKIAkJem9uZS0+dXpfY291bnQg
PSBCVUNLRVRfTUFYOworCisJem9uZS0+dXpfY291bnRfbWF4ID0gQlVDS0VUX1NJWkVfVEhS
RVNIT0xEIC8gem9uZS0+dXpfc2l6ZTsKKwlpZiAoem9uZS0+dXpfY291bnRfbWF4ID4gQlVD
S0VUX01BWCkKKwkJem9uZS0+dXpfY291bnRfbWF4ID0gQlVDS0VUX01BWDsKKwllbHNlIGlm
ICh6b25lLT51el9jb3VudF9tYXggPCAoMSA8PCBCVUNLRVRfU0hJRlQpKQorCQl6b25lLT51
el9jb3VudF9tYXggPSAxIDw8IEJVQ0tFVF9TSElGVDsKKwogCXJldHVybiAoMCk7CiB9CiAK
QEAgLTIwNzYsNyArMjA4Niw3IEBAIHphbGxvY19zdGFydDoKIAljcml0aWNhbF9leGl0KCk7
CiAKIAkvKiBCdW1wIHVwIG91ciB1el9jb3VudCBzbyB3ZSBnZXQgaGVyZSBsZXNzICovCi0J
aWYgKHpvbmUtPnV6X2NvdW50IDwgQlVDS0VUX01BWCkKKwlpZiAoem9uZS0+dXpfY291bnQg
PCB6b25lLT51el9jb3VudF9tYXgpCiAJCXpvbmUtPnV6X2NvdW50Kys7CiAKIAkvKgpkaWZm
IC0tZ2l0IGEvc3lzL3ZtL3VtYV9pbnQuaCBiL3N5cy92bS91bWFfaW50LmgKaW5kZXggNzcx
MzU5My4uNmQ4MWUzZCAxMDA2NDQKLS0tIGEvc3lzL3ZtL3VtYV9pbnQuaAorKysgYi9zeXMv
dm0vdW1hX2ludC5oCkBAIC0zMzAsNiArMzMwLDcgQEAgc3RydWN0IHVtYV96b25lIHsKIAl1
X2ludDY0X3QJdXpfc2xlZXBzOwkvKiBUb3RhbCBudW1iZXIgb2YgYWxsb2Mgc2xlZXBzICov
CiAJdWludDE2X3QJdXpfZmlsbHM7CS8qIE91dHN0YW5kaW5nIGJ1Y2tldCBmaWxscyAqLwog
CXVpbnQxNl90CXV6X2NvdW50OwkvKiBIaWdoZXN0IHZhbHVlIHViX3B0ciBjYW4gaGF2ZSAq
LworCXVpbnQxNl90CXV6X2NvdW50X21heDsJLyogSGlnaGVzdCB2YWx1ZSB1el9jb3VudCBj
YW4gaGF2ZSAqLwogCiAJLyoKIAkgKiBUaGlzIEhBUyB0byBiZSB0aGUgbGFzdCBpdGVtIGJl
Y2F1c2Ugd2UgYWRqdXN0IHRoZSB6b25lIHNpemUK
--------------030602010507080304070903
Content-Type: text/plain;
 name="zfs-zio-zones.diff"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="zfs-zio-zones.diff"

ZGlmZiAtLWdpdCBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1vbi9m
cy96ZnMvemlvLmMgYi9zeXMvY2RkbC9jb250cmliL29wZW5zb2xhcmlzL3V0cy9jb21tb24v
ZnMvemZzL3ppby5jCmluZGV4IDhkZGY3Y2QuLjM0MGY2NzYgMTAwNjQ0Ci0tLSBhL3N5cy9j
ZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1vbi9mcy96ZnMvemlvLmMKKysrIGIv
c3lzL2NkZGwvY29udHJpYi9vcGVuc29sYXJpcy91dHMvY29tbW9uL2ZzL3pmcy96aW8uYwpA
QCAtMTIxLDEwICsxMjEsMTEgQEAgemlvX2luaXQodm9pZCkKIAkJCWFsaWduID0gU1BBX01J
TkJMT0NLU0laRTsKIAkJfSBlbHNlIGlmIChQMlBIQVNFKHNpemUsIFBBR0VTSVpFKSA9PSAw
KSB7CiAJCQlhbGlnbiA9IFBBR0VTSVpFOworI2lmIDAKIAkJfSBlbHNlIGlmIChQMlBIQVNF
KHNpemUsIHAyID4+IDIpID09IDApIHsKIAkJCWFsaWduID0gcDIgPj4gMjsKKyNlbmRpZgog
CQl9Ci0KIAkJaWYgKGFsaWduICE9IDApIHsKIAkJCWNoYXIgbmFtZVszNl07CiAJCQkodm9p
ZCkgc3ByaW50ZihuYW1lLCAiemlvX2J1Zl8lbHUiLCAodWxvbmdfdClzaXplKTsK
--------------030602010507080304070903--


