Date:      Thu, 19 Aug 2010 10:51:30 -0500
From:      Alan Cox <alc@cs.rice.edu>
To:        Dimitry Andric <dimitry@andric.com>
Cc:        alc@freebsd.org, current@freebsd.org
Subject:   Re: Official request: Please make GNU grep the default
Message-ID:  <4C6D5302.4030602@cs.rice.edu>
In-Reply-To: <4C6D3BBB.7030104@andric.com>
References:  <4C6505A4.9060203@FreeBSD.org>	<20100813085235.GA16268@freebsd.org>	<4C66C010.3040308@FreeBSD.org>	<4C673F02.8000805@FreeBSD.org>	<20100815013438.GA8958@troutmask.apl.washington.edu>	<4C67492C.5020206@FreeBSD.org>	<B7A05068-9578-4341-851B-86BD9BC7A2DA@gmail.com>	<8639ufd78w.fsf@ds4.des.no>	<4C6844D8.5070602@andric.com>	<86sk2faqdl.fsf@ds4.des.no>	<4C6AAA88.5080606@andric.com>	<AANLkTik-ee6iKiOoA=KMmmToS2giUOmW5JB-d1vBx9r3@mail.gmail.com>	<4C6AF13A.1080606@andric.com> <AANLkTikCyVVmx3-f4g2x1a%2Bq6PYOCLA-KrF53NFTx7uQ@mail.gmail.com> <4C6D3BBB.7030104@andric.com>

Dimitry Andric wrote:
> On 2010-08-17 23:24, Alan Cox wrote:
>   
>>> So normal mmap is ~3% slower, and prefault mmap does not seem to make
>>> any measurable difference.  I guess the added complexity is not really
>>> worth it, for now.
>>>       
>> Do you know what fraction of this time is being spent in the kernel?
>>     
>
> I ran 100 trials again, but now using "time -a -o logfile", so I could
> run ministat over the accumulated results.  This gives:
>
> x gnugrep
> + bsdgrep-r210927 (the initial version that started this thread)
> * bsdgrep-r211490 (current version)
> % bsdgrep-r211490-mmap-plain
> # bsdgrep-r211490-mmap-prefault
>
> Real time:
>     N           Min           Max        Median           Avg        Stddev
> x 100          1.15          1.98          1.18        1.2122    0.11159613
> + 100          8.57         14.26          8.79        9.1823     1.0496126
> * 100          2.81          6.57          2.91        3.0189     0.4304259
> % 100          2.34          4.03          2.99        3.0022    0.12635992
> # 100          2.85          3.49          2.88        2.8981   0.075232904
>
> User time:
>     N           Min           Max        Median           Avg        Stddev
> x 100             0          0.07          0.03        0.0239   0.015627934
> + 100           1.6          3.33           1.9         1.976    0.30264824
> * 100          0.29             1          0.39        0.4004    0.08696824
> % 100           1.8          3.56          2.73        2.7274    0.13260117
> # 100          2.78          3.04          2.81        2.8238    0.04039652
>
> System time:
>     N           Min           Max        Median           Avg        Stddev
> x 100          1.08          1.91          1.15        1.1809    0.10953617
> + 100          6.55          10.9          6.94        7.1905    0.77911809
> * 100          2.38           5.5          2.53        2.6061    0.35068445
> % 100          0.18          0.53          0.25        0.2645   0.053586049
> # 100          0.03          0.54          0.06        0.0668   0.052259647
>
> E.g. it looks like bsdgrep with 'plain' mmap performs almost the same
> as the regular bsdgrep (both around 3.0s average), but with mmap much
> more of the time is spent in user mode.
>
>   

That makes sense to me.  With traditional I/O, such as read(2), the 
copyout to user space fills the processor's data cache with the data to 
be processed.  Grep's core algorithm in user space shouldn't be 
experiencing cache misses to obtain the data.  By and large, the cache 
misses will have occurred in the kernel.  However, once you switch to 
mmap(2), the kernel never touches the data, and all cache misses occur 
in user space.  You ought to be able to confirm this with pmcstat's 
sampling mode set to sample L2 cache misses.
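
To make the two I/O paths concrete, here is a minimal sketch in C.  It is
not taken from bsdgrep or gnugrep; the function names, the 64 KB buffer
size, and the newline-counting loop (standing in for grep's matching code)
are all assumptions chosen for illustration.  In the read(2) version the
kernel copies the file data into buf before user code sees it; in the
mmap(2) version user code is the first to touch the data.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count newlines using read(2).
 * The kernel's copyout into buf is what first pulls the data into the
 * data cache, so the scan loop below mostly finds it there already.
 */
static long count_lines_read(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    static char buf[64 * 1024];   /* buffer size is an arbitrary choice */
    long lines = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    }
    close(fd);
    return n < 0 ? -1 : lines;
}

/*
 * Hypothetical helper: count newlines using mmap(2).
 * No copy is made; the scan loop reads the mapped pages directly, so
 * any cache misses on the file data are taken here, in user mode.
 */
static long count_lines_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {        /* mmap of length 0 is an error */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            lines++;

    munmap(p, len);
    return lines;
}
```

Both functions return the same count for the same file, or -1 on error.
Timing each one with time(1) on a memory resident file should show the
same kind of shift between system and user time as in the tables above,
though the exact numbers will depend on the machine.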

Here is what actually puzzles me about these results.  With traditional 
I/O, even after the optimizations to bsdgrep, the system time for 
gnugrep is still less than half that of the optimized bsdgrep.  I 
haven't looked at the changes, but I would have thought the system time 
for gnugrep and bsdgrep would be almost the same.

> And it seems prefaulting does help now!  I guess it also makes sense to
> add madvise(..., MADV_SEQUENTIAL)?
>
>   

This won't matter as long as you are working with memory-resident 
files.  With a memory-resident file, the madvise() call would only be a 
waste of cycles.
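
For reference, this is roughly where such a hint would go.  The sketch
below is an assumption about how one might wire it in, not code from
bsdgrep; the function name and the newline-counting loop are made up for
illustration.  The advice value is passed in so that MADV_SEQUENTIAL can
be compared against MADV_NORMAL.  The return value of madvise(2) is
ignored on purpose, since the call is only a hint.

```c
/* Needed on glibc when compiling with strict -std=c99/-std=c11 so that
 * madvise() is declared; harmless on other systems. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file, apply an madvise(2) hint to the
 * whole mapping, then scan it.  Returns the newline count, or -1 on
 * error.
 */
static long count_lines_mmap_advised(const char *path, int advice)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {        /* nothing to map or advise */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: a failure here does not affect correctness. */
    (void)madvise(p, len, advice);

    long lines = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            lines++;

    munmap(p, len);
    return lines;
}
```

Calling it once with MADV_NORMAL and once with MADV_SEQUENTIAL on the
same file gives a direct comparison.  The result is identical either
way; only the timing could differ, and for a file that is already in
memory I would not expect it to improve.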

>   
>> Does
>> the value of "sysctl vm.pmap.pde.mappings" increase as a result of your
>> test?  If not, there is still room for improvement in the results with
>> mmap().
>>     
>
> It always stays at 0, I have never seen any other value.
>   

Addressing this issue would mostly affect the system time, which is 
already tiny for mmap-prefault, so I wouldn't be concerned about this (yet).

Did you ever describe your test machine?  If so, I'm sorry, but I missed 
that.  Is it running an amd64 or i386 kernel?  Can you briefly describe 
what kind of processor and memory it has?

Regards,
Alan



