From owner-freebsd-hackers@FreeBSD.ORG Wed Jun 24 01:07:53 2009
Date: Tue, 23 Jun 2009 18:07:52 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200906240107.n5O17q9j015112@apollo.backplane.com>
To: Nirmal Thacker
References: <87429ffe0906231252j7c84489dt6ebd60333654f411@mail.gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Dump Utility cache efficiency analysis
List-Id: Technical Discussions relating to FreeBSD

:Hello
:
:This is regarding the dump utility cache efficiency analysis post made in
:February '07 by Peter Jeremy
:[http://lists.freebsd.org/pipermail/freebsd-hackers/2007-February/019666.html]
:and whether this project is still open. I would be interested to begin
:exploring FreeBSD (and contributing) by starting this project.
:
:I do have some basic understanding of the problem at hand - to determine
:if a unified cache would appeal as a more efficient/elegant solution
:compared to the per-process cache in the dump utility implementation.
:I admit I am new to this list and FreeBSD so I wouldn't be able to
:determine what the current implementation is, until I get started.
:...

    I think the cache in the dump utility is still the one I worked up a
    long time ago.  It was a quick and dirty job at the time, and it was
    never really designed for parallel operation, which is probably why it
    doesn't work so well in that regard.

    In my opinion, a unified cache would be an excellent improvement.
    Ultimately dump is an I/O-bound process, so I don't think we would
    really need to worry about the minor increases in CPU overhead from
    the additional locking needed.

    There are a few issues you will have to consider:

    * Dump uses a fork model for its children rather than pthreads.  You
      would either have to use the F_*LK fcntl() operations or a simpler
      flock() scheme to lock across the children.  Alternatively you
      could change dump over to a pthreads model and use pthreads
      mutexes, but that would entail a lot more work.  Dump was never
      designed to be threaded.

    * The general issue with any caching scheme for dump is how much to
      actually cache per I/O vs. the size of the cache.  Caching larger
      amounts of data hits diminishing returns as it increases seek times
      and waste (cached data that is never used).  Caching smaller
      amounts of data hits diminishing returns as it causes the disk to
      seek more.

      Disk drives generally do have a track cache, but they also
      typically have only 8-16M of cache RAM (32M in newer drives,
      particularly the higher-capacity ones).  A track is typically about
      1-2M (maybe higher now), so it doesn't take much seeking for the
      drive to blow out its internal track cache.  Caching that much data
      in a single read would probably be detrimental anyway.  This also
      means you do not necessarily want to cache too much linearly-read
      data, as the disk drive is already doing it for you.
    Because of all of this it is going to be tough to find cache
    parameters that work well generally, and the parameters are going to
    change drastically based on the amount of cache you specify on the
    command line and the size of the partition being dumped.

						-Matt