Date:      Thu, 27 Apr 1995 15:53:51 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        bde@zeta.org.au, terry@cs.weber.edu
Cc:        geli.com!rcarter@implode.root.com, hackers@FreeBSD.org, jkh@violet.berkeley.edu, toor@jsdinc.root.com
Subject:   Re: benchmark hell..
Message-ID:  <199504270553.PAA13004@godzilla.zeta.org.au>

>What are the same numbers for an fstat?  One would expect it to drop out
>only the lookup itself.

32uS instead of 110:
__qdivrem	9
_ufs_getattr	5
_copyout	5
_syscall	5
_copyin		3
_vn_stat	2
_Xsyscall	1
___udivdi3	1
_doreti		1
_fstat		1

>I think the malloc and free are suspicious, and should probably be
>stack allocation instead.  That's 14uS (or 12%) right there.

The buffers have size MAXPATHLEN and are allocated in namei().  I'm not
sure how many can be active at once.  More than a couple would not fit
on the stack.  BTW, namei() is not even internally consistent in its
use of the malloc macros.  It always uses MALLOC(), but it uses both
Free() and free().

>The divides are *extremely* curious.  They could be alignment in malloc,
>though I would expect an AND to be used instead of a div.  That's 12%
>right there.

I think they are for `sb->st_blocks = vap->va_bytes / S_BLKSIZE;' in
vn_stat().  If so, they are poorly implemented.  vap->va_bytes is a
quad_t, but it is usually smaller than 2G, not to mention smaller
than 2G * S_BLKSIZE, and S_BLKSIZE is small, not to mention a power
of 2 (512), so the natural i386 (quad_t, long) -> long division
operator usually applies, so the division should be little slower
than an ordinary division.  gcc doesn't optimize the division into
shifts even if the dividend is uquad_t.
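
For illustration only, here is a minimal userland sketch of the
equivalence discussed above: for a non-negative byte count, dividing by
S_BLKSIZE (512) gives the same result as shifting right by 9.  The
names S_BLKSHIFT, blocks_by_division() and blocks_by_shift() are
hypothetical and not taken from the kernel source; the sketch also
assumes va_bytes is never negative, since the shift form would differ
from C's truncating division for negative values.

```c
#include <assert.h>
#include <stdint.h>

#define S_BLKSIZE  512  /* value stated in the message */
#define S_BLKSHIFT 9    /* hypothetical name: log2(S_BLKSIZE) */

/* Plain 64-bit division, as in the quoted vn_stat() line. */
static int64_t blocks_by_division(int64_t va_bytes)
{
    return va_bytes / S_BLKSIZE;
}

/* Shift form; only equivalent for non-negative byte counts. */
static int64_t blocks_by_shift(int64_t va_bytes)
{
    assert(va_bytes >= 0);
    return (int64_t)((uint64_t)va_bytes >> S_BLKSHIFT);
}
```

Both functions agree on any non-negative input, for example
blocks_by_shift(1024) and blocks_by_division(1024) both give 2.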

>I understand the copyout (but it's a bit large), but I don't
>understand the copyin separate from the copyinstr.

It is for copying syscall args off the user stack.

>I think the ufs_getattr comes from the buffer fudging that is used
>for NFS export but serves no real useful purpose here; I remember
>complaining about the semantic change at the time it was made for
>just this reason.

It is necessary to convert ufs attributes to stat attributes.

>I don't understand the double lock, unless it was for the directory
>lookup then the stat of the object itself.  If this is the case (I'll

Half are in lookup() and half are in vget().  I don't understand vget().

>> There's lots of bloat to trim.  I would start with ufs_lock() and
>> ufs_unlock() because they are significant in tty i/o, then look at
>> the quad division functions.

>Yeah... although I wouldn't expect a big impact on ttyio except from
>the lookup unless you are talking specfs.

Parts of ttyio are highly optimized.  This makes the unoptimized parts
more obvious.  My benchmark for it uses select() and MIN=255 (the max),
so most reads return only a little more than 255 bytes, which is too
small for efficiency, and select() gets exercised too much too.
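
As a rough sketch of the kind of setup described above (not the actual
benchmark, which is not shown in this message), the following uses
standard termios and select() calls.  The helper names set_raw_min()
and select_then_read() are hypothetical.  MIN is held in a cc_t, which
is why 255 is the largest value it can take on typical systems.

```c
#include <assert.h>
#include <sys/select.h>
#include <termios.h>
#include <unistd.h>

/* Switch a termios structure to non-canonical mode with the given MIN. */
static void set_raw_min(struct termios *t, cc_t min)
{
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    t->c_cc[VMIN] = min;   /* e.g. 255, as in the message */
    t->c_cc[VTIME] = 0;    /* no inter-byte timer */
}

/* Block in select() until fd is readable, then issue a single read(). */
static ssize_t select_then_read(int fd, char *buf, size_t len)
{
    fd_set rfds;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;
    return read(fd, buf, len);
}
```

A caller would fetch the current settings with tcgetattr(), pass them
through set_raw_min(&t, 255), apply them with tcsetattr(), and then
call select_then_read() in a loop, so each iteration costs one select()
plus one relatively small read().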

Bruce


