Date:      Sun, 14 May 2006 22:21:45 +0300
From:      Sven Petai <hadara@bsd.ee>
To:        freebsd-current@freebsd.org
Cc:        Robert Watson <rwatson@freebsd.org>, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: Fine-grained locking for POSIX local sockets (UNIX domain sockets)
Message-ID:  <200605142221.46093.hadara@bsd.ee>
In-Reply-To: <20060508065207.GA20386@xor.obsecurity.org>
References:  <20060506150622.C17611@fledge.watson.org> <20060507230430.GA6872@xor.obsecurity.org> <20060508065207.GA20386@xor.obsecurity.org>

On Monday 08 May 2006 09:52, Kris Kennaway wrote:

> The other big gain is to address sleep
> mtxpool contention, which roughly doubled:
>
> /*
>  * Change the total socket buffer size a user has used.
>  */
> int
> chgsbsize(uip, hiwat, to, max)
>         struct  uidinfo *uip;
>         u_int  *hiwat;
>         u_int   to;
>         rlim_t  max;
> {
>         rlim_t new;
>
>         UIDINFO_LOCK(uip);
>
> So the next question is how can that be optimized?
>
> Kris

hi

On the 8 core machine this lock was the top contended one with rwatson's patch,
with over 8 million failed acquire attempts.
Originally the unp lock had only ~3 million of those, so I suppose this explains
the sharp drop with larger numbers of threads.
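
For anyone who wants to reproduce the contention numbers: a kernel with mutex
profiling compiled in plus the debug.mutex.prof sysctls should be enough,
roughly as below. The option and sysctl names are from memory, so please
double-check them against your tree before relying on this.

# sketch -- option and sysctl names are from memory, verify before use
# 1. build a kernel with mutex profiling:
#        options MUTEX_PROFILING
# 2. at runtime, collect counters around the benchmark and dump them:
sysctl debug.mutex.prof.enable=1
# ...run the benchmark...
sysctl debug.mutex.prof.enable=0
sysctl debug.mutex.prof.stats        # per-lock counters, incl. contested acquires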

I feel like I'm missing some very obvious reason, but wouldn't the simplest
workaround be to just return 1 right away if the limit is set to infinity, which
is almost always the case since it's the default, and to document in the
login.conf manpage that you might take a performance hit with this type of
workload when you set sbsize limits?

--- /usr/src/sys_clean/kern/kern_resource.c     Sat Mar 11 12:48:19 2006
+++ /usr/src/sys/kern/kern_resource.c   Sun May 14 05:34:02 2006
@@ -1169,6 +1169,11 @@
 {
        rlim_t new;

+       /* Fast path: no sbsize limit configured, so skip the uidinfo lock. */
+       if (max == RLIM_INFINITY) {
+               *hiwat = to;
+               return (1);
+       }
        UIDINFO_LOCK(uip);
        new = uip->ui_sbsize + to - *hiwat;
        /* Don't allow them to exceed max, but allow subtraction. */
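
For context on why this lock is hit so often: every socket buffer reservation
funnels into chgsbsize() with the caller's RLIMIT_SBSIZE, which is RLIM_INFINITY
by default, so the fast path above avoids UIDINFO_LOCK in the common case. Below
is a simplified sketch of the caller, reconstructed from memory of
sys/kern/uipc_sockbuf.c, so treat it as an illustration rather than the exact
-CURRENT code:

/*
 * Sketch of the socket buffer reservation path (simplified, from memory --
 * not the exact code).  Every resize of a socket buffer ends up in
 * chgsbsize(), and the limit passed in is RLIM_INFINITY unless the login
 * class configures an sbsize limit.
 */
int
sbreserve_locked(struct sockbuf *sb, u_long cc, struct socket *so,
    struct thread *td)
{
        rlim_t sbsize_limit;

        SOCKBUF_LOCK_ASSERT(sb);

        if (cc > sb_max_adj)
                return (0);

        /* td may be NULL when we are called from interrupt context. */
        if (td != NULL) {
                PROC_LOCK(td->td_proc);
                sbsize_limit = lim_cur(td->td_proc, RLIMIT_SBSIZE);
                PROC_UNLOCK(td->td_proc);
        } else
                sbsize_limit = RLIM_INFINITY;

        /*
         * With the patch above, the RLIM_INFINITY case returns before
         * UIDINFO_LOCK is ever taken.
         */
        if (!chgsbsize(so->so_cred->cr_uidinfo, &sb->sb_hiwat, cc,
            sbsize_limit))
                return (0);
        sb->sb_mbmax = min(cc * sb_efficiency, sb_max);
        if (sb->sb_lowat > sb->sb_hiwat)
                sb->sb_lowat = sb->sb_hiwat;
        return (1);
}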

The 8 core machine that I originally used for benchmarking was
shipped out to a client, so I couldn't test how it would have
performed with the uidinfo contention out of the way, but the results
from a single dual-core machine look good:
http://bsd.ee/~hadara/debug/mysql4/dualcore/stats.html

Several interesting things can be noticed in this data:
 * on the dual-core machine rwatson's patch gives a consistent performance
   boost with all the thread settings tested, with no sharp drop after 20
   threads like I saw on the 8 core
 * with thread counts in the range [3;10], an even number of threads
   usually performs ~4-5% better than an odd number
 * with the uidinfo + rwatson patches there were some significant outliers,
   where one result was more than 30% better than the others with the same
   settings; these were removed before calculating the mean values for the
   graphs

After I had finished benchmarking I discovered that the new malloc library had
debugging turned on. After turning it off I see large (20-25%) performance
boosts across the range, so I have started a new round of testing with
NO_MALLOC_EXTRAS defined; I'll update the results ASAP.
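
For reference, turning the debugging off is just a one-liner; I assume the
usual place for the knob is /etc/make.conf (or -DNO_MALLOC_EXTRAS on the make
command line) before rebuilding libc, so take the location below as an
assumption on my part:

# /etc/make.conf -- disable the extra debugging in the new malloc library
NO_MALLOC_EXTRAS=       yes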


I wonder if I should set up an automatic, periodic performance testing
system that would run all the tests, say, once a week against the
latest current and stable, so that it would be easier for developers
to see how changes affect different workloads.

If you guys think it would be worthwhile, what benchmarks would you
like to see in addition to mysql+supersmack?


