Date:      Wed, 24 Sep 2008 00:52:59 -0400
From:      Jeff Wheelhouse <freebsd-hackers@wheelhouse.org>
To:        freebsd-hackers@freebsd.org
Subject:   Major SMP problems with lstat/namei
Message-ID:  <8185F68B-C443-4891-BEC2-5E3D453DDC93@wheelhouse.org>


We have encountered some serious SMP performance/scalability problems
that we've tracked back to lstat/namei calls.  I've written a quick
benchmark with a pair of tests to simplify and measure the problem.
Both tests use a tree of directories: the top-level directory contains
five subdirectories a, b, c, d, and e; each of those contains five
subdirectories a, b, c, d, and e, and so on.  That gives 1 directory
at level one, 5 at level two, 25 at level three, 125 at level four,
625 at level five, and 3125 at level six.

In the "realpath" test, a random path is constructed at the bottom of  
the tree (e.g. /tmp/lstat/a/b/c/d/e) and realpath() is called on that,  
provoking lstat() calls on the whole tree.  This is to simulate a mix  
of high-contention and low-contention lstat() calls.
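
The heart of that test is roughly the following (again a simplified
reconstruction, not the benchmark code itself):

/*
 * One iteration of the "realpath" test: build a random leaf path
 * under /tmp/lstat and resolve it.  realpath() walks the path one
 * component at a time, which is what generates the lstat() traffic.
 */
#include <err.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
        char path[PATH_MAX], resolved[PATH_MAX];
        int i, n;

        n = snprintf(path, sizeof(path), "/tmp/lstat");
        for (i = 0; i < 5; i++)
                n += snprintf(path + n, sizeof(path) - n, "/%c",
                    (int)('a' + arc4random() % 5));
        if (realpath(path, resolved) == NULL)
                err(1, "realpath %s", path);
        printf("%s\n", resolved);
        return (0);
}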

In the "lstat" test, lstat is called directly on a path at the bottom  
of the tree.  Since there are 3125 files, this simulates relatively  
low-contention lstat() calls.

In both cases, the test repeats as many times as possible for 60  
seconds.  Each test is run simultaneously by multiple processes, with  
progressively doubling concurrency from 1 to 512.
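
Put together, the lstat variant of a worker looks roughly like the
sketch below; the realpath variant is the same loop with the random
path/realpath() body above swapped in.  For brevity the sketch hits a
single fixed leaf, whereas the real test spreads calls across random
leaves:

/*
 * Simplified driver: fork N workers, each calling lstat() on a leaf
 * path as fast as it can for 60 seconds, then print each worker's
 * iteration count.
 */
#include <sys/stat.h>
#include <sys/wait.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        int i, nworkers;

        nworkers = argc > 1 ? atoi(argv[1]) : 1;
        for (i = 0; i < nworkers; i++) {
                if (fork() == 0) {
                        struct stat sb;
                        time_t stop = time(NULL) + 60;
                        unsigned long count = 0;

                        while (time(NULL) < stop) {
                                if (lstat("/tmp/lstat/a/b/c/d/e", &sb) == -1)
                                        err(1, "lstat");
                                count++;
                        }
                        printf("%lu\n", count);
                        _exit(0);
                }
        }
        while (wait(NULL) > 0)
                ;
        return (0);
}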

What I found was that at concurrency 2 total throughput holds steady
rather than scaling up, which probably means the benchmark is pegged
on some other resource limit.  At concurrency 4, realpath drops to
31.8% of the concurrency-1 total.  At concurrency 8, it is down to
18.3%.  Meanwhile, CPU load goes to 80-90% system CPU.  I've confirmed
via ktrace and rusage that the CPU usage is all system time, and that
lstat() is the *only* system call in the test (realpath() is called
with an absolute path).
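
(The rusage check boils down to a helper like this, called right
before each worker exits; in the bad runs ru_stime dwarfs ru_utime:)

/*
 * Print the user vs. system CPU split for the current process.
 */
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>

static void
print_cpu_split(void)
{
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) == -1)
                return;
        fprintf(stderr, "user %ld.%06lds  sys %ld.%06lds\n",
            (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
            (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
}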

I then reran the 32-process test on 1-7 cores and found that
performance peaks at 2 cores and drops sharply from there.  Eight
cores run *fifteen* times slower than two cores.

The full test results are at the bottom of this message.

This is on 6.3-RELEASE-p4 with vfs.lookup_shared=1.
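
For completeness, the setting can be confirmed with "sysctl
vfs.lookup_shared" from the shell, or programmatically:

/*
 * Confirm the lookup-sharing knob on the system under test.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
        int shared;
        size_t len = sizeof(shared);

        if (sysctlbyname("vfs.lookup_shared", &shared, &len, NULL, 0) == -1) {
                perror("sysctlbyname");
                return (1);
        }
        printf("vfs.lookup_shared=%d\n", shared);
        return (0);
}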

I believe this is the same issue that was previously discussed as "2 x  
quad-core system is slower that 2 x dual core on FreeBSD" archived here:

http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038441.html

In that post, Kris Kennaway wrote:
 > It is hard to say for certain without a direct profile comparison of the
 > workload, but it is probably due to lockmgr contention.  lockmgr is used
 > for various locking operations to do with VFS data structures.  It is
 > known to have poor performance and scale very badly.

At this point, what I've got is one of those synthetic benchmarks, but
it matches our production problem exactly, with one unfortunate
difference: the production processes need a whole lot more RAM, so
when the problem manifests they backlog and the server death-spirals
through swap.

I've chased my way up the kernel source to kern_lstat(), where a
shared lock is obtained, and from there into namei(), where
vfs.lookup_shared comes into play.  Unfortunately, I don't understand
lockmgr, I don't know how the macros and flags I see there relate to
it, I can't figure out what happened to the changes Attilio Rao was
working on, and the thread above didn't seem to offer much other hope
at the time.

This is becoming a huge problem for us.  Is there anything at all that
can be done, or any news?  In the case linked above, the improvement
came from changing a PHP setting that isn't applicable in our case.

Thanks,
Jeff

Concurrency 1

	realpath
		Total = 1409069 (100%)
		Total/Sec = 23484
		Total/Sec/Worker = 23484

	lstat
		Total = 6828763 (100%)
		Total/Sec = 113812
		Total/Sec/Worker = 113812

Concurrency 2

	realpath
		Total = 1450489 (100%)
		Total/Sec = 24174
		Total/Sec/Worker = 12087

	lstat
		Total = 6891417 (100.9%)
		Total/Sec = 114856
		Total/Sec/Worker = 57428


Concurrency 4

	realpath
		Total = 448693 (31.8%)
		Total/Sec = 7478
		Total/Sec/Worker = 1869

	lstat
		Total = 3047933 (44.6%)
		Total/Sec = 50798
		Total/Sec/Worker = 12699

Concurrency 8

	realpath
		Total = 258281 (18.3%)
		Total/Sec = 4304
		Total/Sec/Worker = 538

	lstat
		Total = 1688728 (24.7%)
		Total/Sec = 28145
		Total/Sec/Worker = 3518

Concurrency 16

	realpath
		Total = 179150 (12.7%)
		Total/Sec = 2985
		Total/Sec/Worker = 186

	lstat
		Total = 966558 (14.1%)
		Total/Sec = 16109
		Total/Sec/Worker = 1006

Concurrency 32

	realpath
		Total = 116982 (8.3%)
		Total/Sec = 1949
		Total/Sec/Worker = 60

	lstat
		Total = 644703 (9.4%)
		Total/Sec = 10745
		Total/Sec/Worker = 335

Concurrency 64

	realpath
		Total = 112050 (7.9%)
		Total/Sec = 1867
		Total/Sec/Worker = 29

	lstat
		Total = 572798 (8.3%)
		Total/Sec = 9546
		Total/Sec/Worker = 149


Concurrency 128

	realpath
		Total = 111544 (7.9%)
		Total/Sec = 1859
		Total/Sec/Worker = 14

	lstat
		Total = 570800 (8.3%)
		Total/Sec = 9513
		Total/Sec/Worker = 74


Concurrency 256

	realpath
		Total = 96461 (6.8%)
		Total/Sec = 1607
		Total/Sec/Worker = 6

	lstat
		Total = 580679 (8.5%)
		Total/Sec = 9677
		Total/Sec/Worker = 37


Concurrency 512

	realpath
		Total = 91224 (6.4%)
		Total/Sec = 1520
		Total/Sec/Worker = 2

	lstat
		Total = 498342 (7.2%)
		Total/Sec = 8305
		Total/Sec/Worker = 16

realpath Concurrency 32 - 1 Core

Total = 1289527
Total/Sec = 21492
Total/Sec/Worker = 671

realpath Concurrency 32 - 2 Core

Total = 1753625
Total/Sec = 29227
Total/Sec/Worker = 913

realpath Concurrency 32 - 3 Core

Total = 1197896
Total/Sec = 19964
Total/Sec/Worker = 623

realpath Concurrency 32 - 4 Core

Total = 631293
Total/Sec = 10521
Total/Sec/Worker = 328

realpath Concurrency 32 - 5 Core

Total = 227814
Total/Sec = 3796
Total/Sec/Worker = 118

realpath Concurrency 32 - 6 Core

Total = 153550
Total/Sec = 2559
Total/Sec/Worker = 79

realpath Concurrency 32 - 7 Core

Total = 136013
Total/Sec = 2266
Total/Sec/Worker = 70
