From: Rick Macklem
To: John Baldwin
Cc: Tijl Coosemans, freebsd-current@freebsd.org
Date: Tue, 31 Jan 2012 20:34:00 -0500 (EST)
Subject: Re: posix_fadvise noreuse disables file caching
Message-ID: <1517648658.522197.1328060040257.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201201311321.47714.jhb@freebsd.org>

John Baldwin wrote:
> On Tuesday, January 31, 2012 12:21:07 pm Ulrich Spörlein wrote:
> > On
Mon, 2012-01-30 at 09:36:45 -0500, John Baldwin wrote:
> > > On Sunday, January 29, 2012 10:08:10 am Tijl Coosemans wrote:
> > > > On Wednesday 25 January 2012 17:29:22 John Baldwin wrote:
> > > > > On Friday, January 20, 2012 2:12:13 pm John Baldwin wrote:
> > > > >> On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:
> > > > >>> I recently noticed that multimedia/vlc generates a lot of disk IO
> > > > >>> when playing media files. For instance, when playing a 320kbps mp3
> > > > >>> gstat reports about 1250kBps (=10000kbps). That's quite a lot of
> > > > >>> overhead.
> > > > >>>
> > > > >>> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file and
> > > > >>> reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as if
> > > > >>> O_DIRECT was specified during open(2), i.e. it disables all caching.
> > > > >>> That means every 1028 byte read turns into a 32KiB read (new default
> > > > >>> block size in 9.0) which explains the above numbers.
> > > > >>>
> > > > >>> I've copied the relevant vlc code below
> > > > >>> (modules/access/file.c:Open()). It's interesting to see that on OSX
> > > > >>> it sets F_NOCACHE which disables caching too, but combined with
> > > > >>> F_RDAHEAD there's still read-ahead caching.
> > > > >>>
> > > > >>> I don't think POSIX intended for NOREUSE to mean O_DIRECT. It should
> > > > >>> still cache data (and even do read-ahead if F_RDAHEAD is specified),
> > > > >>> and once data is fetched from the cache, it can be marked WONTNEED.
> > > > >>
> > > > >> POSIX doesn't specify O_DIRECT, so it's not clear what it asks for.
> > > > >>
> > > > >>> Is it possible to implement it this way, or if not to just ignore
> > > > >>> the NOREUSE hint for now?
> > > > >>
> > > > >> I think it would be good to improve NOREUSE, though I had sort of
> > > > >> assumed that applications using NOREUSE would do their own buffering
> > > > >> and read full blocks. We could perhaps reimplement NOREUSE by doing
> > > > >> the equivalent of POSIX_FADV_DONTNEED after each read to free buffers
> > > > >> and pages after the data is copied out to userland. I also have an
> > > > >> XXX about whether or not NOREUSE should still allow read-ahead as it
> > > > >> isn't very clear what the right thing to do there is. HP-UX (IIRC)
> > > > >> has an fadvise() that lets you specify multiple policies, so you
> > > > >> could specify both NOREUSE and SEQUENTIAL for a single region to
> > > > >> get read-ahead but still release memory once the data is read once.
> > > > >
> > > > > So I've come up with this untested patch. It uses
> > > > > VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE region, and
> > > > > leaves read-ahead caching enabled for NOREUSE. FADV_DONTNEED doesn't
> > > > > do any good really for writes (it only flushes clean buffers), so I've
> > > > > left write(2) operations as using IO_DIRECT still. Does this sound
> > > > > reasonable? I've not yet tested this at all:
> > > >
> > > > The patch drastically improves vlc, but there's still a tiny overhead.
> > > > Without NOREUSE the disk is read in chunks of 128KiB (F_RDAHEAD buffer
> > > > size). With NOREUSE there's an extra transfer of 32KiB (block size).
> > >
> > > This is probably because vlc is not reading on block boundaries, so the
> > > noreuse is throwing away partial blocks at the end of a read that then
> > > have to be re-read.
> > > We could maybe fix this by making FADV_DONTNEED only throw
> > > away completely-contained blocks rather than completely-contained pages.
> > > However, this will probably result in NOREUSE not actually throwing away
> > > anything at all if an app always reads sub-blocksize chunks.
> > >
> > > We could maybe make the case of vlc work ok in this case though by
> > > allowing an extension where you can do 'posix_fadvise(SEQUENTIAL |
> > > NOREUSE)', and in this case we could make the VOP_ADVISE(DONTNEED) in
> > > read() use an offset of 0 rather than the start of the read request.
> > >
> > > However, posix_fadvise() really is going to work best if the userland
> > > application reads aligned FS blocks.
> >
> > I find it questionable in general that an application can tell the
> > system what to do wrt. caching. Perhaps I'm running 100s of VLC players
> > all on the same file and actually *do* want reads to be cached?
> >
> > What happens if I seek back in the file? It has to do a potentially
> > high-latency read again. The system has a better overview of blocks that
> > are frequently being requested than any individual application.
> >
> > I fully understand the intention, and in 99.99% of the cases, this data
> > *is* just being read once so there's no need to cache any reads for
> > actually requested data. But as the example shows, requested data is not
> > necessarily the data that lower layers have to fetch from the disk.
> >
> > Perhaps talking to VLC people on why they think this is useful and where
> > it actually, measurably helped them would be interesting.
> >
> > Sorry if this is all perfectly obvious
>
> There are certainly cases where the user can choose to run specific apps in
> such a way where this makes sense, so the OS needs this functionality.
> As to whether or not specific apps should use these APIs or if they should
> make use of these APIs configurable, that is a question for each app (e.g.
> vlc). However, the OS should provide the tools.
>

I'd agree. However, there might be an argument for a sysctl that tells the
OS to ignore the hints, so a sysadmin can work around a case where an app
runs poorly in their environment, due to the hint?

rick
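
For reference, the numbers quoted in the thread can be reproduced with a
small C sketch. This is only an illustration, not code from vlc or from the
patch under discussion. The function names (uncached_disk_rate,
read_block_aligned) are made up here, and the model assumes that every
uncached read costs exactly one block fetch, ignoring reads that straddle a
block boundary. The second function shows the access pattern the thread
suggests works best with posix_fadvise(): advise NOREUSE once, then read in
whole, aligned blocks.

```c
#define _XOPEN_SOURCE 600	/* for posix_fadvise() and pread() on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Estimated disk traffic in bytes/s when no read is served from cache.
 * Assumption (simplified model): each read of chunk_bytes triggers exactly
 * one fetch of block_bytes from disk.
 */
double
uncached_disk_rate(double stream_bits_per_sec, double chunk_bytes,
    double block_bytes)
{
	double reads_per_sec = (stream_bits_per_sec / 8.0) / chunk_bytes;

	return (reads_per_sec * block_bytes);
}

/*
 * Hypothetical helper: advise NOREUSE for the whole file, then read it in
 * block-sized requests starting at offset 0. On a regular file the offset
 * stays a multiple of block_bytes until the final short read at EOF.
 * Returns the number of bytes read, or -1 on error.
 */
off_t
read_block_aligned(int fd, size_t block_bytes)
{
	char *buf;
	off_t off = 0;
	ssize_t n;

	buf = malloc(block_bytes);
	if (buf == NULL)
		return (-1);

	/* A length of 0 means "through end of file". The call is only a
	 * hint, so its return value is deliberately ignored. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

	while ((n = pread(fd, buf, block_bytes, off)) > 0) {
		/* A real reader would consume buf[0..n) here. */
		off += n;
	}
	free(buf);
	return (n < 0 ? -1 : off);
}
```

With the thread's figures, uncached_disk_rate(320000, 1028, 32768) comes to
roughly 1.27 million bytes per second (about 1245 KiB/s), which is in line
with the ~1250kBps that gstat reported. The block size passed to
read_block_aligned() is the caller's choice; 32768 would match the 9.0
default mentioned above.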