From owner-freebsd-hackers@FreeBSD.ORG  Fri Aug 15 20:16:51 2003
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A567037B401; Fri, 15 Aug 2003 20:16:51 -0700 (PDT)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id F2DD243FBF; Fri, 15 Aug 2003 20:16:50 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.12.9/8.12.6) with ESMTP id h7G3GoVI084340;
	Fri, 15 Aug 2003 20:16:50 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.12.9/8.12.6/Submit) id h7G3Go4b084339;
	Fri, 15 Aug 2003 20:16:50 -0700 (PDT)
Date: Fri, 15 Aug 2003 20:16:50 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200308160316.h7G3Go4b084339@apollo.backplane.com>
To: Peter Jeremy <PeterJeremy@optushome.com.au>
References: <200308151204.h7FC42rq050760@repoman.freebsd.org>
	<20030816024753.GA74853@cirb503493.alcatel.com.au>
cc: hackers@freebsd.org
Subject: Re: cvs commit: src/sys/nfsclient bootp_subr.c nfs_diskless.c
	nfs_vfsops.c nfsdiskless.h
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 16 Aug 2003 03:16:52 -0000

    Swapfiles should always be preallocated, never sparse.  There are two
    major reasons for this:  First, if the target filesystem fills up, all
    hell will break loose.  The kernel uses swap space too, remember; it
    won't be just user programs that start crashing (think UAREA).
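
    To make the sparse/preallocated distinction concrete, here is a minimal
    C sketch (not from the original thread; the function names are
    hypothetical).  preallocate_file() writes real zeros so the filesystem
    must allocate every block up front, make_sparse_file() mimics the quoted
    "dd ... oseek=100000" approach by writing only the last byte, and
    file_is_sparse() compares st_blocks (counted in 512-byte units on FreeBSD
    and Linux) against st_size.  It assumes a filesystem without transparent
    compression, which could store a file of zeros in fewer blocks than its
    size.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill 'path' with 'nbytes' of real zeros so the filesystem must
 * allocate every block now.  Returns 0 on success, -1 on error.
 */
static int
preallocate_file(const char *path, off_t nbytes)
{
	static const char zeros[65536];	/* static storage is zero-filled */
	int fd;

	assert(nbytes > 0);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	while (nbytes > 0) {
		size_t chunk = nbytes < (off_t)sizeof(zeros) ?
		    (size_t)nbytes : sizeof(zeros);
		ssize_t n = write(fd, zeros, chunk);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0) {
			close(fd);
			return (-1);
		}
		nbytes -= n;
	}
	/* Flush the zeroed blocks to disk before the file is used. */
	if (fsync(fd) < 0) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}

/*
 * Give 'path' a logical size of 'nbytes' but write only its last byte,
 * much like "dd if=/dev/zero of=... bs=1k count=1 oseek=N": everything
 * before that byte is a hole.
 */
static int
make_sparse_file(const char *path, off_t nbytes)
{
	int fd;

	assert(nbytes > 0);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	if (pwrite(fd, "", 1, nbytes - 1) != 1) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}

/*
 * Returns 1 if fewer bytes are allocated on disk than the file's size
 * (i.e. it has holes), 0 if not, -1 on error.
 */
static int
file_is_sparse(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return (-1);
	return ((off_t)st.st_blocks * 512 < st.st_size);
}
```

    For example, preallocate_file("/swapfile", (off_t)100 * 1024 * 1024)
    followed by "swapon /swapfile" gives a swapfile whose blocks already
    exist, so on an overwrite-in-place filesystem such as UFS paging to it
    should not require any new filesystem blocks.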

    Second, once a swap block is allocated in the file the backing store
    for the file will never be deallocated, even if the swap is freed.
    One will wind up with a creeping allocation problem that will eventually
    fill up the filesystem (since a person usually uses sparse files in
    these cases precisely because their filesystem is too small to
    accommodate the configuration they want).  This can lead to all sorts of
    weird system failures.
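
    The creeping effect can be modeled in a few lines of C.  This is an
    illustrative sketch, not kernel code: the structure and function names
    are hypothetical, and it assumes that every page of swap written to the
    file forces the filesystem to back that page, while freeing swap only
    touches the in-memory bitmap.

```c
#include <assert.h>
#include <string.h>

#define NPAGES	64	/* hypothetical swap area size, in pages */

struct swapmodel {
	unsigned char in_use[NPAGES];	/* swap bitmap in kernel memory */
	unsigned char backed[NPAGES];	/* pages the filesystem has allocated */
	int nbacked;			/* count of backed pages */
};

static void
swapmodel_init(struct swapmodel *m)
{
	memset(m, 0, sizeof(*m));
}

/*
 * Page out 'len' pages at 'start': the pages become in use and, because
 * the file is written there, the filesystem must back them.
 */
static void
model_swap_write(struct swapmodel *m, int start, int len)
{
	assert(start >= 0 && len > 0 && start + len <= NPAGES);
	for (int i = start; i < start + len; i++) {
		m->in_use[i] = 1;
		if (!m->backed[i]) {
			m->backed[i] = 1;
			m->nbacked++;
		}
	}
}

/*
 * Free swap: only the bitmap changes.  Nothing tells the filesystem, so
 * m->backed and m->nbacked are left alone.
 */
static void
model_swap_free(struct swapmodel *m, int start, int len)
{
	assert(start >= 0 && len > 0 && start + len <= NPAGES);
	for (int i = start; i < start + len; i++)
		m->in_use[i] = 0;
}

/* Number of swap pages currently allocated according to the bitmap. */
static int
model_in_use(const struct swapmodel *m)
{
	int n = 0;

	for (int i = 0; i < NPAGES; i++)
		n += m->in_use[i];
	return (n);
}
```

    Writing four 8-page clusters one after another, freeing each before the
    next, never has more than 8 pages in use, yet leaves nbacked at 32: the
    file grows to cover every page ever written, not the most that was in
    use at any one time.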

    Regarding linear physical block allocation versus the logical swap
    block: it is likely that the swap system will allocate 'new' swap
    linearly, but it is *NOT* guaranteed.  Swap is allocated in clusters
    of varying sizes which are in turn based on the size of the VM object.
    Swap is also always allocated contiguously.  So if one swap page is
    allocated, and then 8 swap pages are allocated, there will be a gap of
    7 pages in the swap area.  In addition, swap operations are not
    necessarily initiated in order.  If a system is swapping heavily it could
    very well issue the WRITE for a later swap cluster prior to issuing the
    WRITE for an earlier swap cluster.  This can lead to severe fragmentation
    of the file and severe degradation of swap performance, but it
    probably will not be as bad as the type of degradation you get with
    mmap() (fragmentation from dirty mmap()'d pages is *SEVERE* because
    even clustered writes are issued completely out of order with NO
    locality of reference whatsoever).  At least with swap there is likely
    to be some locality of reference.
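
    The effect of out-of-order WRITEs on a sparse file can be sketched the
    same way.  The model below is a deliberate simplification with
    hypothetical names: it assumes the filesystem hands out physical blocks
    strictly in the order the writes arrive, whereas a real filesystem such
    as UFS tries to place a file's blocks near its previous ones, so real
    results should be less extreme.

```c
#include <assert.h>

#define NBLOCKS	32	/* hypothetical swapfile size, in blocks */

/*
 * A sparse file whose filesystem (by assumption) hands out physical
 * blocks strictly in the order the writes arrive.
 */
struct sparsefile {
	int phys[NBLOCKS];	/* logical block -> physical block, -1 = hole */
	int next_phys;		/* next physical block to hand out */
};

static void
sf_init(struct sparsefile *f)
{
	for (int i = 0; i < NBLOCKS; i++)
		f->phys[i] = -1;
	f->next_phys = 0;
}

/* One WRITE of a swap cluster covering logical blocks [start, start + len). */
static void
sf_write(struct sparsefile *f, int start, int len)
{
	assert(start >= 0 && len > 0 && start + len <= NBLOCKS);
	for (int i = start; i < start + len; i++)
		if (f->phys[i] < 0)	/* fill holes; rewrites reuse the block */
			f->phys[i] = f->next_phys++;
}

/*
 * Count physically contiguous runs when the allocated blocks are read in
 * logical order.  1 means the file is laid out linearly.
 */
static int
sf_extents(const struct sparsefile *f)
{
	int runs = 0, prev = -2;

	for (int i = 0; i < NBLOCKS; i++) {
		if (f->phys[i] < 0) {		/* a hole ends the current run */
			prev = -2;
			continue;
		}
		if (f->phys[i] != prev + 1)
			runs++;
		prev = f->phys[i];
	}
	return (runs);
}
```

    Writing four 8-block clusters at 0, 8, 16 and 24 in order yields a single
    extent; issuing the same WRITEs in reverse order (24, 16, 8, 0) yields
    four extents, with each cluster stored physically after the one that
    logically follows it.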

    -

    Regarding swap block reuse... swap space is freed by marking its
    representative bit free in the radix tree, which is stored in kernel
    memory.  The swap system has no clue as to whether the actual backing
    store has or has not allocated a block for what it considers to be
    'free' swap space.  Swap space has a tendency to be allocated from the
    bottom up, but only loosely, so it is highly likely that a freed swap
    block will be reused.
    But reuse depends on how well the system is able to cluster a pageout
    operation.  The swap system always allocates a 'contiguous' block of
    swap, so if one page of swap is freed but then 8 pages are requested,
    that one free page is not likely to be reused until there is a request
    for one page of swap.
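
    The reuse behavior follows from contiguous allocation.  The sketch below
    is a simplified model, not the actual radix-tree (blist) code: it
    assumes requests are powers of two and that a run of N pages may start
    only on a multiple of N, which is one way to reproduce the 7-page gap
    described earlier.  The names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SWAP_PAGES	64	/* hypothetical swap area size, in pages */

struct swapmap {
	unsigned char used[SWAP_PAGES];	/* 1 = page allocated */
};

static void
swapmap_init(struct swapmap *m)
{
	memset(m->used, 0, sizeof(m->used));
}

/*
 * Allocate 'count' contiguous pages, first fit from the bottom up.
 * By assumption 'count' is a power of two and a run may only start on a
 * multiple of 'count'.  Returns the first page, or -1 if nothing fits.
 */
static int
swap_alloc(struct swapmap *m, int count)
{
	assert(count > 0 && (count & (count - 1)) == 0);
	for (int base = 0; base + count <= SWAP_PAGES; base += count) {
		int i;

		for (i = 0; i < count; i++)
			if (m->used[base + i])
				break;
		if (i == count) {
			memset(&m->used[base], 1, (size_t)count);
			return (base);
		}
	}
	return (-1);
}

/* Mark 'count' pages starting at 'base' free again. */
static void
swap_free(struct swapmap *m, int base, int count)
{
	assert(base >= 0 && count > 0 && base + count <= SWAP_PAGES);
	memset(&m->used[base], 0, (size_t)count);
}
```

    Starting from an empty map, swap_alloc(m, 1) returns 0 and a following
    swap_alloc(m, 8) returns 8, leaving pages 1..7 as the 7-page gap.
    Starting instead with two 1-page allocations (pages 0 and 1) and an
    8-page run (page 8), then freeing page 0: another 8-page request skips
    page 0 and lands at 16, and only a later 1-page request gets page 0 back.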

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


:[Redirected to -hackers because this isn't directly relevant to the
: actual code committed]
:
:On Fri, Aug 15, 2003 at 05:04:02AM -0700, Poul-Henning Kamp wrote:
:>  Suggested replacement command sequence on the client:
:>  
:>          dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000
:>          swapon /swapfile
:
:This results in a sparse swapfile.  Whilst this minimises disk space
:occupancy on the server (which is in keeping with the swap overcommit
:principle used in the VM subsystem), there are other side-effects
:which may not be so advantageous.
:
:Firstly, the client VM system can receive ENOSPC - which can't occur
:on a swap device.  How does the pager handle this?  Does it panic,
:kill the task that owns the page in question or what?
:
:Secondly, this means that the physical disk blocks are
:effectively being allocated by the client.  I recall reading a comment
:that recommended against using ftruncate() and mmap() to extend files
:because this resulted in sub-optimal block allocation compared to
:write().  Will the same thing happen in this case?
:
:Also, how are dirtied swap blocks reused?  Once a physical block has
:been allocated, it is beneficial to reuse that block in preference to
:allocating another block.  This only matters in the situation where
:you are paging into a sparse file - which is probably not a common
:case and therefore unlikely to have been taken into account when the
:block reuse algorithm was developed.
:
:Peter