Date:      Sat, 18 Feb 2006 08:50:17 +0530
From:      "Joseph Koshy" <joseph.koshy@gmail.com>
To:        "Andrew Gallatin" <gallatin@cs.duke.edu>
Cc:        freebsd-amd64@freebsd.org
Subject:   Re: non-temporal copyin/copyout?
Message-ID:  <84dead720602171920y153bd9d5p1c0aa11cbc177020@mail.gmail.com>
In-Reply-To: <17397.63064.242130.484086@grasshopper.cs.duke.edu>
References:  <17397.58669.457047.277510@grasshopper.cs.duke.edu> <84dead720602170750j119080c9g32ec9f1ac0e3944d@mail.gmail.com> <17397.63064.242130.484086@grasshopper.cs.duke.edu>

On 2/17/06, Andrew Gallatin <gallatin@cs.duke.edu> wrote:

ag>  "k8-dc-miss" : data cache misses
ag>  91.5    6466.00  6466.00        0  100.00%           copyout [1]

ag>  "k8-bu-fill-request-l2-miss,mask=dc-fill" : L2 fills
ag> for the data cache
ag>  88.2    3866.00  3866.00        0  100.00%           copyout [1]

Certainly copyout() appears to be thrashing the cache.

ag>  "k8-dc-misaligned-data-reference": in case there are any

ag>  99.5   66763.00 66763.00        0  100.00%           copyout [1]

The code in question "/usr/src/sys/amd64/amd64/support.S" has:
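    ENTRY(copyout)
    ...
            shrq    $3,%rcx         /* %rcx = length / 8 (quadword count) */
            cld                     /* copy forward */
            rep
            movsq                   /* copy 8 bytes per iteration */
            movb    %dl,%cl         /* reload low byte of the length from %rdx */
            andb    $7,%cl          /* %rcx = length % 8 (leftover bytes) */
            rep
            movsb                   /* copy the remaining bytes */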

i.e., it doesn't handle the case where the `from_kernel'
or `to_user' addresses are misaligned to their natural
boundaries.  IIRC `rep movsq' works best if both the source
and destination addresses are 8-byte aligned.
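To make the alignment point concrete, here is a rough user-space sketch in C of the usual head/body/tail split. It is not the kernel's copyout(): the name copy_dst_aligned is made up for illustration, and a real kernel version would also need the fault handling that copyout() performs. Note that when the source and destination are misaligned by different amounts, only one of them can be brought to an 8-byte boundary; this sketch picks the destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from support.S): copy len bytes, bringing
 * the destination to an 8-byte boundary before the bulk loop.
 */
void
copy_dst_aligned(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Head: single bytes until the destination is 8-byte aligned. */
	while (len > 0 && ((uintptr_t)d & 7) != 0) {
		*d++ = *s++;
		len--;
	}

	/*
	 * Body: one 8-byte word per iteration.  memcpy() is used for the
	 * load and the store so that a still-misaligned source is not
	 * undefined behaviour in C.
	 */
	while (len >= 8) {
		uint64_t w;

		memcpy(&w, s, sizeof(w));
		memcpy(d, &w, sizeof(w));
		d += 8;
		s += 8;
		len -= 8;
	}

	/* Tail: the remaining 0..7 bytes. */
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
}
```

In assembly the same shape would be a short `rep movsb' for the head, then the existing `rep movsq' / `rep movsb' pair.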

If we are going to use `movntq' then we may as well take
care of alignment issues too.
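A sketch of what a combined version might look like, again in user-space C and under some assumptions: it uses the SSE2 intrinsic _mm_stream_si64(), which compiles to `movnti' from a general-purpose register rather than the MMX `movntq' mentioned above, and it falls back to ordinary stores when not built for x86-64. The name copy_nontemporal is hypothetical, and whether this actually helps the benchmark would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>
#define HAVE_STREAMING_STORES 1
#else
#define HAVE_STREAMING_STORES 0
#endif

/*
 * Hypothetical helper: copy len bytes, writing the bulk of the data
 * with non-temporal stores so the destination does not displace
 * useful lines from the data cache.
 */
void
copy_nontemporal(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Head: single bytes until the destination is 8-byte aligned. */
	while (len > 0 && ((uintptr_t)d & 7) != 0) {
		*d++ = *s++;
		len--;
	}

	/* Body: normal 8-byte load, cache-bypassing 8-byte store. */
	while (len >= 8) {
		uint64_t w;

		memcpy(&w, s, sizeof(w));
#if HAVE_STREAMING_STORES
		_mm_stream_si64((long long *)(void *)d, (long long)w);
#else
		memcpy(d, &w, sizeof(w));	/* fallback: ordinary store */
#endif
		d += 8;
		s += 8;
		len -= 8;
	}

#if HAVE_STREAMING_STORES
	/* Order the streaming stores before any later stores. */
	_mm_sfence();
#endif

	/* Tail: the remaining 0..7 bytes, with ordinary stores. */
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
}
```

The loads are left as ordinary cached loads on purpose; only the stores bypass the cache in this sketch.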

jk>  "k8-fr-interrupts-masked-while-pending-cycles": for
jk>      finding spots in the code where spin-locks are being
jk>      held for long.

ag> I had to tweak the sample rate to 512 for this one.
ag>  52.5     330.00   330.00        0  100.00%           acpi_cpu_idle [1]
ag>  10.4     395.00    65.00        0  100.00%           spinlock_exit [2]
ag>   9.1     452.00    57.00        0  100.00%           acpi_cpu_c1 [3]

This is interesting too, but I'm not sure how much of
an effect it has on this particular benchmark.

--
FreeBSD Volunteer,     http://people.freebsd.org/~jkoshy
