From: Mikhail Teterin
Organization: Murex N.A.
To: Matthew Dillon
cc: questions@freebsd.org, Julian Elischer, Mikhail Teterin, current@freebsd.org
Date: Mon, 21 Jun 2004 18:10:03 -0400
Subject: Re: read vs. mmap (or io vs.
 page faults)

= Both read and mmap have a read-ahead heuristic. The heuristic
= works. In fact, the mmap heuristic is so smart it can read-behind
= as well as read-ahead if it detects a backwards scan.

Evidently, read's heuristics are better, at least for this task. I'm
actually surprised they are _different_ at all.

The mmap interface is supposed to be more efficient, at least in
theory, because it requires one less buffer copy, and because it
(together with a possible madvise()) gives the kernel more
information, enabling it to make better (or at least no worse)
decisions.

That these theoretical advantages, small or not, are eaten up by what
seem to be practical implementation deficiencies, to the point that
using mmap is not only no faster but frequently slower in wall-clock
terms, is in itself a serious shortcoming that stands between an OS
and perfection. That other OSes have similar shortcomings merely gives
us some breathing room from an advocacy point of view. I hope my
rhetoric will give someone capable of addressing it technically an
itch to scratch :-)

= The heuristic does not try to read megabytes and megabytes ahead,
= however...

Neither does the read handling.

= that might speed up this particular application a little, but it
= would destroy performance for many other types of applications,
= especially in a loaded environment.

I'm not asking mmap (page-fault handling) to cache any more
aggressively than read handling does.

= Well now hold a second... the best you can do here is compare relative
= differences between mmap and read.

This is all I am doing, actually.
:-)

= If you really want to compare operating systems, you have to run the
= OS's and the tests on the same hardware.

I am comparing relative differences between read and mmap on
different OSes.

=: 4.8-stable on Pentium2-400MHz
=: mmap: 21.507u 11.472s 1:27.53 37.6% 62+276k 99+0io 44736pf+0w
=: read: 10.619u 23.814s 1:17.67 44.3% 62+274k 11255+0io 0pf+0w
=
= mmap 12% slower than read. 12% isn't much.

Well, now we are venturing into the domain of subjective human
perception... I'd say 12% is plenty, actually. That is what some
people achieve by rewriting code in assembler, and they are proud
when it works :-)

=: recent -current on dual P2 Xeon-450MHz (mmap WINS -- SMP?)
=: mmap: 12.482u 12.872s 2:28.70 17.0% 74+298k 23+0io 46522pf+0w
=: read: 7.255u 16.366s 3:27.07 11.4% 70+283k 44437+0io 7pf+0w
=
= mmap 39% faster. That's a significant difference.
=
= It kinda smells funny, actually... are you sure that you compiled
= your FreeBSD-5 system with Witness turned off?

There are no WITNESS options in the kernel config file (unlike in
NOTES). So, unless some explicit "NOWITNESS" option is required, I am
sure.

=: recent -current on a Centrino-laptop P4-1GHz (NO win at all)
=: mmap: 4.197u 3.920s 2:07.57 6.3% 65+284k 63+0io 45568pf+0w
=: read: 3.965u 4.265s 1:50.26 7.4% 67+291k 13131+0io 17pf+0w
=
= mmap 15% slower.

=: Linux 2.4.20-30.9bigmem dual P4-3GHz (with a different file)
=: mmap: 2.280u 4.800s 1:13.39 9.6% 0+0k 0+0io 512434pf+0w
=: read: 1.630u 2.820s 0:08.89 50.0% 0+0k 0+0io 396pf+0w
=
= mmap 821% slower on Linux? With a different file? So these numbers
= can't be compared to anything else (over and above the fact that this
= machine is three times faster than any of the others).

Indeed, the file is different (as is the processor), so only the
relative performance difference matters. I was quite surprised
myself. My fmd5 program does not show such a dramatic difference, but
`fgrep --mmap' is vastly slower on Linux than the regular `fgrep'.
Here are the results of two new fgrep runs:

mmap1: 1.450u 3.000s 0:46.00 9.6% 0+0k 0+0io 512439pf+0w
read1: 1.830u 2.620s 0:09.51 46.7% 0+0k 0+0io 393pf+0w
mmap2: 1.700u 4.040s 1:02.31 9.2% 0+0k 0+0io 512427pf+0w
read2: 1.330u 3.150s 0:09.38 47.7% 0+0k 0+0io 396pf+0w

= I'm not sure why you are complaining about FreeBSD.

Because I have much higher expectations for it :-) I thought I would
be able to use the powerful technique of pointing out Linux's
superiority in some area to spark rapid improvements in the same area
in FreeBSD. Now I'm back to fighting the "a 12% gain is not worth the
effort" mentality.

=: Once mmap handling is improved, all sorts of whole-file operations
=: (bzip2, gzip, md5, sha1) can be made faster...
= Well, your numbers don't really say that. It looks like you might
= eke out a 10-15% improvement, and while this is faster it really
= isn't all that much faster. It certainly isn't something to write
= home about, and certainly not significant enough to warrant major
= codework.

Put it into perspective: 10-15% is usually the difference between the
latest processor and the previous one, and people are willing to pay
a premium of hundreds of dollars for it... Besides, the differences
can be larger. Here are the results of md5-ing a 2097272832-byte file
over NFS (Gigabit network, no jumbo frames) on a machine running
FreeBSD-current on a single 2GHz P4:

mmap1: 17.115u 16.106s 2:20.84 23.5% 5+166k 0+0io 253421pf+0w
read1: 19.468u 12.179s 1:27.80 36.0% 4+163k 0+0io 0pf+0w
mmap2: 17.214u 13.265s 2:13.75 22.7% 5+165k 1+0io 204842pf+0w
read2: 19.142u 11.576s 1:20.22 38.2% 4+162k 0+0io 4pf+0w

mmap takes about 63% longer (or, put the other way, read needs about
39% less time)! According to `systat -if', mmap was reading at about
13Mb/s, while read was consistently above 20Mb/s.
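All timings in this thread appear to use the default csh/tcsh `time` output format, which (assuming the stock format string) reads: user CPU seconds, system CPU seconds, elapsed wall-clock time, CPU percentage, memory, block input+output operations, and page faults+swaps. The minimal Python sketch below, with hypothetical helper names, turns such lines into the relative figures being argued about, using the two NFS md5 runs above as sample data:

```python
import re

# Assumed layout of the default csh/tcsh `time` output:
#   %Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww
# user CPU, system CPU, elapsed wall clock, CPU share, memory (KB),
# block input+output operations, page faults + swaps.
TIME_LINE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+"
    r"(?P<cpu>[\d.]+)%\s+\S+k\s+(?P<io_in>\d+)\+(?P<io_out>\d+)io\s+"
    r"(?P<pf>\d+)pf\+(?P<swaps>\d+)w"
)


def elapsed_seconds(field: str) -> float:
    """Convert an elapsed field such as "1:27.53" or "1:02:03.5" to seconds."""
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line: str) -> dict:
    """Extract the fields used below from one `time` result line."""
    m = TIME_LINE.search(line)
    if m is None:
        raise ValueError(f"not a csh time line: {line!r}")
    return {
        "user": float(m["user"]),
        "sys": float(m["sys"]),
        "elapsed": elapsed_seconds(m["elapsed"]),
        "cpu_pct": float(m["cpu"]),
        "page_faults": int(m["pf"]),
    }


def mean_elapsed(lines) -> float:
    """Average wall-clock time over several runs of the same test."""
    runs = [parse_time_line(line)["elapsed"] for line in lines]
    return sum(runs) / len(runs)


def percent_longer(slow: float, fast: float) -> float:
    """How much longer (in %) the slower variant took."""
    return (slow / fast - 1.0) * 100.0


def percent_less_time(fast: float, slow: float) -> float:
    """The same gap expressed as time saved by the faster variant (in %)."""
    return (1.0 - fast / slow) * 100.0


def cpu_utilization(line: str) -> float:
    """(user + system) / elapsed; the shell may truncate rather than round."""
    r = parse_time_line(line)
    return (r["user"] + r["sys"]) / r["elapsed"] * 100.0


# The NFS md5 runs quoted in this message.
mmap_nfs = [
    "17.115u 16.106s 2:20.84 23.5% 5+166k 0+0io 253421pf+0w",
    "17.214u 13.265s 2:13.75 22.7% 5+165k 1+0io 204842pf+0w",
]
read_nfs = [
    "19.468u 12.179s 1:27.80 36.0% 4+163k 0+0io 0pf+0w",
    "19.142u 11.576s 1:20.22 38.2% 4+162k 0+0io 4pf+0w",
]

if __name__ == "__main__":
    mmap_avg, read_avg = mean_elapsed(mmap_nfs), mean_elapsed(read_nfs)
    print(f"mmap takes {percent_longer(mmap_avg, read_avg):.0f}% longer; "
          f"read needs {percent_less_time(read_avg, mmap_avg):.0f}% less time")
```

Averaged over the two NFS runs (137.3 s vs. 84.0 s), mmap takes about 63% longer than read, which is the same gap as read needing about 39% less time. The CPU percentage printed by the shell may differ from cpu_utilization() in the last digit because of rounding.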
If this mmap-associated penalty were removed, applications could save
some memory by not using BUFSIZ (or bigger) buffers, and the system
could save the time and effort of copying data from kernel buffers
into user space (and flushing the instruction and data caches).

The difference can be big: on a CPU-bound machine, the sum of user and
system time is much smaller with mmap. For example, here is md5-ing a
16061698048-byte file on a 900MHz SPARC box running Solaris (FreeBSD
behaves similarly on the 400MHz P2 reported earlier):

mmap: 215.290u 48.990s 7:18.81 60.2% 0+0k 0+0io 0pf+0w
read: 184.240u 142.350s 5:46.31 94.3% 0+0k 0+0io 0pf+0w

That is 264.28 vs. 326.59 CPU seconds, yet read manages to keep the
CPU busier (94% vs. 60%) and consistently wins the wall-clock race...

Yours,
-mi
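For readers who want to try the comparison themselves, here is a minimal sketch of the two access patterns discussed in this thread: hashing a whole file through an unbuffered read() loop versus through a read-only mmap() with an madvise(MADV_SEQUENTIAL) hint. It is written in Python for brevity and is not the fmd5 program mentioned above; BUFSIZE, md5_read and md5_mmap are hypothetical names. Interpreter overhead would dominate any timings, so treat it as an illustration of where the extra copy happens rather than as a benchmark.

```python
import hashlib
import mmap
import os
import sys

BUFSIZE = 64 * 1024  # hypothetical chunk size for the read() variant


def md5_read(path: str, bufsize: int = BUFSIZE) -> str:
    """Hash a file with explicit reads.

    Every chunk is copied by the kernel from its buffers into a
    user-space buffer before md5 sees it."""
    h = hashlib.md5()
    # Unbuffered, so each f.read() is a read(2) call on the descriptor.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:  # end of file
                break
            h.update(chunk)
    return h.hexdigest()


def md5_mmap(path: str) -> str:
    """Hash a file through a read-only mapping.

    No user-space buffer is filled by read(); pages are brought in by
    page faults as the hash walks through the mapping."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return h.hexdigest()  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Hint a sequential scan where the platform supports it
            # (mmap.madvise exists on Python 3.8+ on most Unix systems).
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            h.update(m)
    return h.hexdigest()


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(md5_read(path), md5_mmap(path), path)
```

Given file names on the command line, it prints both digests for each file; they should always match, since only the way the data reaches the hash differs.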