Date:      Thu, 29 Oct 2015 00:00:57 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-bugs@freebsd.org
Subject:   Re: [Bug 204049] vmtotal() loading is high when memory utilization is high
Message-ID:  <20151028225118.L1832@besplex.bde.org>
In-Reply-To: <bug-204049-8-MlbJYA75Pk@https.bugs.freebsd.org/bugzilla/>
References:  <bug-204049-8@https.bugs.freebsd.org/bugzilla/> <bug-204049-8-MlbJYA75Pk@https.bugs.freebsd.org/bugzilla/>

On Tue, 27 Oct 2015 bugzilla-noreply@freebsd.org wrote:

Bugzilla doesn't want replies, so I shouldn't reply.  I didn't reply to it.

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204049
>
> Dmitry Sivachenko <demon@FreeBSD.org> changed:
>
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                 CC|                            |demon@FreeBSD.org
>
> --- Comment #1 from Dmitry Sivachenko <demon@FreeBSD.org> ---
> I noticed that if a program calls clock() frequently (clock() in turn calls
> getrusage()), the system itself responds slow.  For example, we run word2vec
> program
> (http://word2vec.googlecode.com/svn/trunk/word2vec.c) in 32 threads (on 32-core
> machine) and during that all other programs (even single-threaded) run an order
> of magnitude slower compared with the time they use without word2vec.
>
> I wonder if the reason is the same.

It might be just lock contention for both.  Only a stupid program would
call these functions often, but vmtotal() is much heavier weight and the
lock contention for it is more obvious.  It holds lots of global locks
throughout loops.  getrusage() only holds locks for the process.

Only a stupid program would call clock() a lot, but clock() is badly
implemented.  Its units were suitable in 1980, but became wrong
when microtime() started working in 1990-1995.  Its units are not even
stathz ticks, but are hard-coded 1/128 second ticks for compatibility
with the 1980 interface (except that probably had 1/60 second ticks).
The timing part of getrusage() takes the (very accurate) process runtime
that was recorded using microtime() in 1990-1995 and is now recorded
less accurately using cpu_ticks(), and splits it up into user+sys+intr;
this only reduces its accuracy slightly.  Then clock() reduces its accuracy
significantly by discarding the intr part and rounding user+sys to a multiple
of 1/128 seconds.  clock() also wastes time by getting full rusage and
discarding everything except the times.
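
To illustrate the precision loss from the 1/128 second units described
above, here is a minimal sketch of the conversion arithmetic.  The helper
to_clock_ticks() and the constant TICKS_PER_SEC are hypothetical names, the
sketch truncates rather than rounds, and it is not the actual libc code.

```c
#include <stdint.h>

/* Assumed tick rate: the hard-coded 1/128 second units mentioned above. */
#define TICKS_PER_SEC 128

/*
 * Hypothetical helper: reduce a CPU time given as seconds plus
 * microseconds to whole 1/128 second ticks, discarding the remainder.
 */
static int64_t
to_clock_ticks(int64_t sec, int64_t usec)
{
	return (sec * TICKS_PER_SEC + usec * TICKS_PER_SEC / 1000000);
}
```

With this arithmetic 3 ms of CPU time becomes 0 ticks and 10 ms becomes
1 tick, so anything finer than about 7.8 ms is lost.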

clock() can be implemented better using clock_gettime() on a suitable
clock id.  This method has only been available for 10-15 years.  The
following clock ids are suitable:
- CLOCK_PROF.  This gives the same result as clock() would (user+sys),
   not dumbed down except to convert it to timespec units, and without
   wasting time for full getrusage().  But the decomposition part is the
   slowest.
- CLOCK_PROCESS_CPUTIME_ID.  This returns the runtime of the current
   process, not dumbed down except to convert it to timespec units.
   This unfortunately requires considerable proc locking to add up times
   for all threads in the process.
- certain magic clock ids generalize the previous id to an arbitrary
   process.
The following clock ids are related:
- CLOCK_VIRTUAL.  This returns the 'user' part of the user+sys+intr
   decomposition of the runtime.  It has the same slownesses as
   CLOCK_PROF.
- CLOCK_THREAD_CPUTIME_ID.  This is like CLOCK_PROCESS_CPUTIME_ID except
   it only returns the runtime of the current thread.  This doesn't use
   any locking except a critical section.
- certain magic clock ids generalize the previous id to an arbitrary
   thread.

There are many bugs in the implementation of the clock_gettime() family.  Some
of the related ones are:
- none of the above unportable clock ids is documented
- CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID are bogusly named.
   The '_ID' in their name is redundant, and is not used for the name of
   any other clock id.  It would be useful for them to operate on a general
   pid or tid, but they don't.
- the undocumented magic clock ids do act on a general pid or tid.  The
   thread case is an implementation detail for pthread_getcpuclockid(3)
   which is documented.

Bruce


