From owner-freebsd-hackers@FreeBSD.ORG Fri Aug 15 20:16:51 2003
Date: Fri, 15 Aug 2003 20:16:50 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200308160316.h7G3Go4b084339@apollo.backplane.com>
To: Peter Jeremy
References: <200308151204.h7FC42rq050760@repoman.freebsd.org> <20030816024753.GA74853@cirb503493.alcatel.com.au>
cc: hackers@freebsd.org
Subject: Re: cvs commit: src/sys/nfsclient bootp_subr.c nfs_diskless.c nfs_vfsops.c nfsdiskless.h
List-Id: Technical Discussions relating to FreeBSD

    Swapfiles should always be preallocated, never sparse.  There are
    two major reasons for this:

    First, if the target filesystem fills up all hell will break loose.
    The kernel uses swap space too, remember, so it won't be just user
    programs that start crashing (think UAREA).

    Second, once a swap block is allocated in the file the backing store
    for the file will never be deallocated, even if the swap is freed.
    One will wind up with a creeping allocation problem that will
    eventually fill up the filesystem (since a person usually uses sparse
    files in these cases precisely because their filesystem is too small
    to accommodate the configuration they want).  This can lead to all
    sorts of weird system failures.

    With regard to linear physical block allocation versus the logical
    swap block, it is likely that the swap system will allocate 'new'
    swap linearly.  But it is *NOT* guaranteed.  Swap is allocated in
    clusters of varying sizes, which are in turn based on the size of the
    VM object.  Swap is also always allocated contiguously.  So if one
    swap page is allocated, and then 8 swap pages are allocated, there
    will be a gap of 7 pages in the swap area.  In addition, swap
    operations are not necessarily initiated in order.  If a system is
    swapping heavily it could very well issue the WRITE for a later swap
    cluster prior to issuing the WRITE for an earlier swap cluster.

    This can lead to severe fragmentation of the file and severe
    degradation of swap performance, but it probably will not be as bad
    as the type of degradation you get with mmap() (fragmentation from
    dirty mmap()'d pages is *SEVERE* because even clustered writes are
    issued completely out of order with NO locality of reference
    whatsoever).  At least with the swap there is likely to be some
    locality of reference.

    With regard to swap block reuse... swap space is freed by freeing
    its representative bit in the radix tree, which is stored in kernel
    memory.  The swap system has no clue as to whether the actual backing
    store has or has not allocated a block for what it considers to be
    'free' swap space.  Swap space has a tendency to be allocated from
    the bottom up, but only loosely, so it is highly likely that a freed
    swap block will be reused.  But reuse depends on how well the system
    is able to cluster a pageout operation.
    The swap system always allocates a 'contiguous' block of swap, so if
    one page of swap is freed but then 8 pages are requested, that one
    free page is not likely to be reused until there is a request for
    one page of swap.

					-Matt
					Matthew Dillon

:[Redirected to -hackers because this isn't directly relevant to the
: actual code committed]
:
:On Fri, Aug 15, 2003 at 05:04:02AM -0700, Poul-Henning Kamp wrote:
:>  Suggested replacement command sequence on the client:
:>
:>     dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000
:>     swapon /swapfile
:
:This results in a sparse swapfile.  Whilst this minimises diskspace
:occupancy on the server (which is in keeping with the swap overcommit
:principle used in the VM subsystem), there are other side-effects
:which may not be so advantageous.
:
:Firstly, the client VM system can receive ENOSPC - which can't occur
:on a swap device.  How does the pager handle this?  Does it panic,
:kill the task that owns the page in question or what?
:
:Secondly, this means that the physical disk blocks are
:effectively being allocated by the client.  I recall reading a comment
:that recommended against using ftruncate() and mmap() to extend files
:because this resulted in sub-optimal block allocation compared to
:write().  Will the same thing happen in this case?
:
:Also, how are dirtied swap blocks reused?  Once a physical block has
:been allocated, it is beneficial to reuse that block in preference to
:allocating another block.  This only matters in the situation where
:you are paging into a sparse file - which is probably not a common
:case and therefore unlikely to have been taken into account when the
:block reuse algorithm was developed.
:
:Peter
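
    As a footnote to the advice that swapfiles be preallocated rather
    than sparse, the difference can be seen with two small dd
    invocations.  This is only a minimal sketch and not part of the
    original thread: the temporary directory, the file names and the
    1000-block size are illustrative assumptions, and it uses dd's
    seek= operand in place of the oseek= spelling in the quoted command.

```shell
#!/bin/sh
# Sketch: compare a sparse file with a preallocated one of the same size.
# Paths and sizes are illustrative assumptions, not taken from the thread.
set -eu

dir=$(mktemp -d)

# Sparse: skip 999 output blocks and write only the final 1 KB block.
dd if=/dev/zero of="$dir/sparse" bs=1k count=1 seek=999 2>/dev/null

# Preallocated: write all 1000 blocks so every block is written up front.
dd if=/dev/zero of="$dir/full" bs=1k count=1000 2>/dev/null

# Apparent size in bytes (the same for both files).
sparse_bytes=$(wc -c < "$dir/sparse" | tr -d ' ')
full_bytes=$(wc -c < "$dir/full" | tr -d ' ')

# Space actually allocated, in KB, as reported by du.
sparse_kb=$(du -k "$dir/sparse" | cut -f1)
full_kb=$(du -k "$dir/full" | cut -f1)

echo "sparse: $sparse_bytes bytes apparent, $sparse_kb KB allocated"
echo "full:   $full_bytes bytes apparent, $full_kb KB allocated"

rm -rf "$dir"
```

    On most filesystems the sparse file should report far less allocated
    space than its apparent size, which is the gap that swap writes
    would later have to fill.  For a real swapfile, the preallocating dd
    form with a suitably large count, followed by swapon as in the
    quoted message, avoids that; the size itself is a local choice.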