From: Bruce Evans
To: Marc Olzheim
Cc: Garrett Cooper, freebsd-performance@freebsd.org, Dieter BSD
Date: Wed, 4 Jan 2012 01:19:41 +1100 (EST)
Subject: Re: cmp(1) has a bottleneck, but where?

On Tue, 3 Jan 2012, Marc Olzheim wrote:

> On Tue, Jan 03, 2012 at 12:21:10AM -0800, Garrett Cooper wrote:
>> The file is 3.0GB in size. Look at all those page faults though!
>> Thanks!
>> -Garrett
>
> From usr.bin/cmp/c_regular.c:
>
> #define MMAP_CHUNK (8*1024*1024)
> ...
> for (..) {
>         mmap() chunk of size MMAP_CHUNK.
>         compare
>         munmap()
> }
>
> That 8 MB chunk size sounds like a bad plan to me.
> I can imagine something needed to be done to compare files larger than
> X GB on a 32bit system, but 8MB is pretty small...

8MB is more than large enough.  It works at disk speed in my tests.
cp still uses this value.  Old versions of cmp used the bogus value of
SIZE_T_MAX and aborted on large regular files when mmap() failed.
SIZE_T_MAX is bogus because it is larger than can possibly be mmapped
on 32-bit machines (except certain unsupported segmented ones), yet it
is not large enough for all files on 32-bit machines.  On 64-bit
machines, it is still more than can be mmapped (except...), but
effectively infinity since it is larger than all files.  cmp was
changed to be more like cp.

Both are still remarkably defective.  cp is also remarkably ugly,
especially in its fallback for when mmap() fails.  The fallback for cmp
is missing the ugliness, but it uses getc() so it is very slow.  This
might be the problem here.  The fallback is to use c_special(), and
c_special() is also used unconditionally for "special" files, and
special files are detected badly:
- there is no way to force a file to be special (or not special).  This
  would be useful for testing the mmap() method and the non-mmap()
  method on the same file
- if one of the files is named "-", then this is an alias for stdin and
  the file is considered special.  I see no good reason to force
  specialness here.  It can be used to avoid the mmap() method.
- otherwise, one of the files is special if it is not regular according
  to fstat() on it.  For some reason, the fstat()s are not done if
  specialness was forced by one of the file names being "-".

In my tests, using "-" for one of the files mainly takes lots more user
time.  It only reduces the real time by 25%.  This is on a core2.  On a
system with a slow CPU, it is easy for getc() to be much slower than the
disk.

Bruce
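
For reference, the two comparison strategies discussed in this message can
be sketched roughly as below.  This is a minimal illustration under stated
assumptions, not the code in usr.bin/cmp: the names cmp_chunked() and
cmp_getc() are hypothetical, the return conventions are invented for the
example, and only equality is reported (no byte or line offsets).

```c
#include <sys/types.h>
#include <sys/mman.h>

#include <assert.h>
#include <fcntl.h>	/* open() flags, for callers of this sketch */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Same 8MB value as the MMAP_CHUNK quoted from c_regular.c. */
#define MMAP_CHUNK	((off_t)8 * 1024 * 1024)

/*
 * Hypothetical helper: compare the first len bytes of two regular files
 * by mapping at most MMAP_CHUNK bytes of each at a time.
 * Returns 0 if equal, 1 if different, -1 if mmap() fails (the caller
 * would then fall back to the stdio method).
 */
int
cmp_chunked(int fd1, int fd2, off_t len)
{
	off_t off;

	assert(len >= 0);
	for (off = 0; off < len; off += MMAP_CHUNK) {
		off_t left = len - off;
		size_t n = (size_t)(left < MMAP_CHUNK ? left : MMAP_CHUNK);
		void *p1, *p2;
		int differ;

		/* off is a multiple of 8MB, so it is page-aligned. */
		p1 = mmap(NULL, n, PROT_READ, MAP_SHARED, fd1, off);
		if (p1 == MAP_FAILED)
			return (-1);
		p2 = mmap(NULL, n, PROT_READ, MAP_SHARED, fd2, off);
		if (p2 == MAP_FAILED) {
			munmap(p1, n);
			return (-1);
		}
		differ = memcmp(p1, p2, n) != 0;
		munmap(p1, n);
		munmap(p2, n);
		if (differ)
			return (1);
	}
	return (0);
}

/*
 * Hypothetical stand-in for the slow path: one getc() call per byte on
 * each stream.  Returns 0 if the streams are identical up to EOF on
 * both, 1 otherwise.
 */
int
cmp_getc(FILE *f1, FILE *f2)
{
	int c1, c2;

	do {
		c1 = getc(f1);
		c2 = getc(f2);
		if (c1 != c2)
			return (1);
	} while (c1 != EOF);
	return (0);
}
```

A caller would presumably fstat() both descriptors, use cmp_chunked() when
both are regular files, and use cmp_getc() otherwise or when cmp_chunked()
returns -1.  The per-byte getc() calls in the second function are where the
extra user time mentioned above would be expected to go.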