From owner-freebsd-stable@FreeBSD.ORG Wed Sep 3 17:56:19 2008
Date: Wed, 3 Sep 2008 11:56:09 -0600 (MDT)
From: Scott Long
To: Igor Sysoev
Cc: Kostik Belousov, freebsd-stable@freebsd.org, Tor Egge
Subject: Re: vfs.ffs.rawreadahead
Message-ID: <20080903114853.Q39726@pooker.samsco.org>
In-Reply-To: <20080903174452.GB73831@rambler-co.ru>
References: <20080903095352.GA62541@rambler-co.ru>
 <20080903123955.GE2038@deviant.kiev.zoral.com.ua>
 <20080903124733.GH62541@rambler-co.ru>
 <20080903103846.T39726@pooker.samsco.org>
 <20080903174452.GB73831@rambler-co.ru>
List-Id: Production branch of FreeBSD source code

On Wed, 3 Sep 2008, Igor Sysoev wrote:
> On Wed, Sep 03, 2008 at 10:44:46AM -0600, Scott Long wrote:
>
>> On Wed, 3 Sep 2008, Igor Sysoev wrote:
>>> On Wed, Sep 03, 2008 at 03:39:55PM +0300, Kostik Belousov wrote:
>>>
>>>> On Wed, Sep 03, 2008 at 01:53:52PM +0400, Igor
Sysoev wrote:
>>>>> Hi,
>>>>>
>>>>> could anyone tell me what vfs.ffs.rawreadahead enables?
>>>>> As I understand it, it's used in the DIRECTIO code that allows reading
>>>>> data directly into a userland buffer, bypassing the buffer cache.
>>>>> What I cannot understand is where the read-ahead data can be placed.
>>>>
>>>> The operation of the ffs_rawread is more accurately described as
>>>> bypassing the page cache. It creates the physical buffer that maps
>>>> the user pages.
>>>>
>>>> The readahead is performed only when the supplied user memory region
>>>> is bigger than blocksize. In this case, two reads are performed
>>>> simultaneously, with both buffers mapping consecutive blocks from
>>>> user-supplied buffers. The read operation looks like footsteps.
>>>
>>> Nice!
>>>
>>> As I understand it, the size limit of one read operation is MAXPHYS,
>>> which is equal to 128K due to the LBA28 ATA limit. On SCSI, SATA, and
>>> LBA48 ATA this limit can be increased. Is it safe?
>>
>> The value of MAXPHYS is unrelated to capabilities or limitations of ATA.
>> It was chosen based on the need to prevent an excessive amount of
>> parallel I/O from exhausting the kernel address space and system memory.
>> In fact, the concern was with SCSI, not with ATA.
>>
>> MAXPHYS can be raised, especially on 64bit platforms, but doing so also
>> bloats the sizes of a few key data structures. I've been looking at a
>> solution for this, and I'd rather that people keep their MAXPHYS changes
>> confined to their local trees rather than changing FreeBSD unless they
>> also solve the associated side effects.
>
> As I understand it, MAXPHYS affects at least the pager_map size: on modern
> machines it's usually 256 * MAXPHYS = 32M, therefore increasing MAXPHYS
> will increase the map too.

This is intended and desirable.
>
> The 128K is probably a good value and I do not suggest increasing it by
> default; I just want to increase MAXPHYS to improve disk throughput
> on some hosts where nginx serves large files (1G+) using DIRECTIO.

I've tested increases up to 1M, and they are all very beneficial, not only
for silly sequential-style benchmarks but also for clustered i/o. 256-512k
is the sweet spot, but Windows has set the standard at 1M and I'd like to
have FreeBSD follow suit eventually.

>
> BTW, is it possible to change MAXPHYS to a loader tunable?
>

No. Struct buf is sized based on MAXPHYS, and there's no convenient way
yet to dynamically size that at runtime.

Scott