Date:      Wed, 4 Jan 2012 01:19:41 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Marc Olzheim <marcolz@stack.nl>
Cc:        Garrett Cooper <yanegomi@gmail.com>, freebsd-performance@freebsd.org, Dieter BSD <dieterbsd@engineer.com>
Subject:   Re: cmp(1) has a bottleneck, but where?
Message-ID:  <20120104000111.K6684@besplex.bde.org>
In-Reply-To: <20120103083454.GA22673@zlo.nu>
References:  <20120103073736.218240@gmx.com> <CAGH67wQXuMasyc9BE8M9fHsQv6d2zdRxDQ2ekX4whjHJFyqZyg@mail.gmail.com> <20120103083454.GA22673@zlo.nu>

On Tue, 3 Jan 2012, Marc Olzheim wrote:

> On Tue, Jan 03, 2012 at 12:21:10AM -0800, Garrett Cooper wrote:
>>     The file is 3.0GB in size. Look at all those page faults though!
>> Thanks!
>> -Garrett
>
> From usr.bin/cmp/c_regular.c:
>
> #define MMAP_CHUNK (8*1024*1024)
> ...
> for (..) {
> 	mmap() chunk of size MMAP_CHUNK.
> 	compare
> 	munmap()
> }
>
> That 8 MB chunk size sounds like a bad plan to me. I can imagine
> something needed to be done to compare files larger than X GB on a 32bit
> system, but 8MB is pretty small...

8MB is more than large enough.  It works at disk speed in my tests.  cp
still uses this value.  Old versions of cmp used the bogus value of
SIZE_T_MAX and aborted on large regular files when mmap() failed.
SIZE_T_MAX is bogus because it is larger than can possibly be mmapped
on 32-bit machines (except certain unsupported segmented ones), yet it
is not large enough for all files on 32-bit machines.  On 64-bit machines,
it is still more than can be mmapped (except...), but it is effectively
infinite since it is larger than all files.
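
A minimal sketch of the chunked approach, assuming POSIX mmap()/munmap() and the 8MB MMAP_CHUNK value quoted from c_regular.c.  The helpers cmp_mmap_chunked() and cmp_paths() are hypothetical names, and this is not the actual cmp source.  Mapping a bounded window means the address space needed stays at 2 * MMAP_CHUNK whatever the file size, which is why a small chunk works on 32-bit machines where a SIZE_T_MAX-sized mapping cannot succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MMAP_CHUNK (8 * 1024 * 1024)	/* the 8MB value from c_regular.c */

/*
 * Hypothetical helper (not the real c_regular()): compare two open
 * regular files by mmap()ing at most MMAP_CHUNK bytes of each at a time.
 * Returns 0 if identical, 1 if they differ, and -1 if mmap() fails, in
 * which case a caller would fall back to a non-mmap() loop.
 */
static int
cmp_mmap_chunked(int fd1, off_t len1, int fd2, off_t len2)
{
	off_t len = len1 < len2 ? len1 : len2;
	off_t off = 0;

	/* mmap() offsets must be page-aligned, so the chunk size must be too. */
	assert(MMAP_CHUNK % sysconf(_SC_PAGESIZE) == 0);

	while (off < len) {
		size_t n = len - off < MMAP_CHUNK ?
		    (size_t)(len - off) : (size_t)MMAP_CHUNK;
		void *p1 = mmap(NULL, n, PROT_READ, MAP_SHARED, fd1, off);
		if (p1 == MAP_FAILED)
			return (-1);
		void *p2 = mmap(NULL, n, PROT_READ, MAP_SHARED, fd2, off);
		if (p2 == MAP_FAILED) {
			munmap(p1, n);
			return (-1);
		}
		int differ = memcmp(p1, p2, n) != 0;
		/* Unmap before the next window: at most 2 * MMAP_CHUNK is mapped. */
		munmap(p1, n);
		munmap(p2, n);
		if (differ)
			return (1);
		off += n;
	}
	/* A common prefix is not enough: a shorter file still differs. */
	return (len1 == len2 ? 0 : 1);
}

/* Hypothetical wrapper: open both paths, fstat() them, and compare. */
static int
cmp_paths(const char *a, const char *b)
{
	struct stat sa, sb;
	int fa = open(a, O_RDONLY), fb = open(b, O_RDONLY);
	int r = -1;

	if (fa >= 0 && fb >= 0 && fstat(fa, &sa) == 0 && fstat(fb, &sb) == 0)
		r = cmp_mmap_chunked(fa, sa.st_size, fb, sb.st_size);
	if (fa >= 0)
		close(fa);
	if (fb >= 0)
		close(fb);
	return (r);
}
```

For a 3.0GB file this loop maps and unmaps 384 windows per file, so the chunk count alone is small; the per-page faults inside each window are the same as with one huge mapping.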

cmp was changed to be more like cp.  Both are still remarkably defective.
cp is also remarkably ugly, especially in its fallback for when mmap()
fails.  The fallback for cmp is missing the ugliness, but it uses
getc() so it is very slow.  This might be the problem here.  The
fallback is to use c_special(), and c_special() is also used
unconditionally for "special" files, and special files are detected
badly:
- there is no way to force a file to be special (or not special).  This
   would be useful for testing the mmap() method and the non-mmap() method
   on the same file.
- if one of the files is named "-", then this is an alias for stdin and
   the file is considered special.  I see no good reason to force
   specialness here.  It can be used to avoid the mmap() method.
- otherwise, one of the files is special if it is not regular according
   to fstat() on it.  For some reason, the fstat()s are not done if
   specialness was forced by one of the file names being "-".

In my tests, using "-" for one of the files mainly takes lots more user
time.  It only reduces the real time by 25%.  This is on a core2.  On
a system with a slow CPU, it is easy for getc() to be much slower than
the disk.
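
A sketch of a getc()-style fallback loop, to show where the extra user time comes from: every byte costs two getc() calls and a compare.  This is in the style of c_special(), not its source; cmp_getc() is a hypothetical name, and it does not distinguish read errors from EOF.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 0 if the streams are identical.  Otherwise returns 1 and
 * stores in *where the 1-based offset of the first differing byte, or
 * of the first byte past EOF of the shorter stream.
 */
static int
cmp_getc(FILE *f1, FILE *f2, long long *where)
{
	long long pos = 0;
	int c1, c2;

	assert(f1 != NULL && f2 != NULL);
	for (;;) {
		c1 = getc(f1);
		c2 = getc(f2);
		pos++;
		if (c1 != c2) {		/* includes one stream hitting EOF first */
			*where = pos;
			return (1);
		}
		if (c1 == EOF)		/* both streams ended together */
			return (0);
	}
}
```

A block-oriented loop that read()s or fread()s large buffers and calls memcmp() on them would avoid most of this per-byte overhead on a slow CPU.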

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120104000111.K6684>