FreeBSD Mail Archives

Date:      Fri, 26 Jul 2002 07:31:04 +1000
From:      Peter Jeremy <peter.jeremy@alcatel.com.au>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Andreas Koch <koch@eis.cs.tu-bs.de>, freebsd-stable@FreeBSD.ORG
Subject:   Re: 4.6-RC: Glacial speed of dump backups
Message-ID:  <20020726073104.R38313@gsmx07.alcatel.com.au>
In-Reply-To: <200207251715.g6PHFGDD034256@apollo.backplane.com>; from dillon@apollo.backplane.com on Thu, Jul 25, 2002 at 10:15:16AM -0700
References:  <20020606204948.GA4540@ultra4.eis.cs.tu-bs.de> <20020722081614.E367@gsmx07.alcatel.com.au> <20020722100408.GP26095@ultra4.eis.cs.tu-bs.de> <200207221943.g6MJhIBX054785@apollo.backplane.com> <20020725164416.A52778@gsmx07.alcatel.com.au> <200207251715.g6PHFGDD034256@apollo.backplane.com>

I wrote:
64MB cache hangs.

On 2002-Jul-25 10:15:16 -0700, Matthew Dillon <dillon@apollo.backplane.com> wrote:
>Interesting.  What was the cache block size reported by dump?

  DUMP: Cache 67108864 MB, blocksize = 32768

>If you have the time, it may be worth playing with the cache block size.

I don't have time right now, but I agree that this should be worth
experimenting with.

>    The NetBSD caching code appears to try to avoid caching whole blocks,
>    operating under the assumption that if a read for a whole block occurs
>    dump is not likely to re-request the block.  Changing the conditional
>    above and setting the BLKFACTOR to 1 in my code will mimic this
>    behavior.

Actually, from memory of the statistics I gathered previously, apart
from inodes, dump only ever reads a single "block" (offset/size pair)
once.  The trick is to identify when dump will read both (offset,size1)
and (offset+size1,size2) and merge them into read(offset,size1+size2)
(even though the original reads occur at different times and read into
non-adjacent buffers).  A traditional cache relies on locality of
reference - and I'm not sure that UFS layout provides this when there
are lots of small files.
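
As a rough illustration of the merging idea above (not code from dump), the
sketch below sorts a list of pending (offset,size) requests and joins any that
touch or overlap. The struct rdreq type and the coalesce_reads() name are
hypothetical, and the sketch assumes the requests can be collected before any
of them is issued, which is the hard part in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request record; not a structure taken from dump. */
struct rdreq {
    long long offset;   /* byte offset on the device */
    size_t    size;     /* length in bytes */
};

static int rdreq_cmp(const void *a, const void *b)
{
    const struct rdreq *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Sort requests by offset, then merge every request that starts at or
 * before the end of the previous one.  The array is rewritten in place
 * and the new number of requests is returned.
 */
size_t coalesce_reads(struct rdreq *r, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    assert(r != NULL);
    qsort(r, n, sizeof(*r), rdreq_cmp);

    for (size_t i = 1; i < n; i++) {
        long long end = r[out].offset + (long long)r[out].size;

        if (r[i].offset <= end) {
            /* Adjacent or overlapping: extend the current request. */
            long long iend = r[i].offset + (long long)r[i].size;
            if (iend > end)
                r[out].size = (size_t)(iend - r[out].offset);
        } else {
            /* Gap: start a new merged request. */
            r[++out] = r[i];
        }
    }
    return out + 1;
}
```

The caller would still have to scatter each merged read back into the
original, non-adjacent buffers, which this sketch does not attempt.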

>    I'm not sure why dump failed w/ a 64MB cache.  I will investigate.

Having had a bit of a closer look, the problem is related to swap
starvation - one of the children dies and the parent doesn't notice.
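
A minimal sketch of how a parent could notice such a death, assuming it knows
the child's pid. The child_status() helper and its enum are hypothetical names,
not anything in dump; only waitpid(2) and the standard status macros are used.
Passing WNOHANG as options makes it a poll; passing 0 makes it block.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <assert.h>

enum child_state { CHILD_RUNNING, CHILD_OK, CHILD_DIED };

/*
 * Report what happened to one child.  If signo is not NULL it receives
 * the terminating signal number, or 0 if the child was not signalled.
 * A waitpid() failure is also reported as CHILD_DIED so that the caller
 * cannot silently lose track of a child.  EINTR is not retried here.
 */
enum child_state child_status(pid_t pid, int options, int *signo)
{
    int status;
    pid_t r = waitpid(pid, &status, options);

    if (signo != NULL)
        *signo = 0;
    if (r == 0)
        return CHILD_RUNNING;           /* only possible with WNOHANG */
    if (r == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return CHILD_OK;
    if (r == pid && WIFSIGNALED(status) && signo != NULL)
        *signo = WTERMSIG(status);      /* e.g. SIGKILL */
    return CHILD_DIED;
}
```

A parent could call this with WNOHANG from its main loop, or from a SIGCHLD
handler's follow-up code, and abort the whole run as soon as CHILD_DIED is
returned.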

Also, whilst I knew dump forked multiple times, I thought that the
parents were just sleeping.  It looks like at least the first few
children are active, which means the system thrashes fairly badly unless
there's enough RAM to keep 5 or 6 copies of the cache resident.

I've tried repeating the 64M cache on another Proliant with 256MB RAM
and it ran to completion (though slowly).

This suggests that unless you want to limit dump to using very small
caches, you need to share the cache between all the children (which
implies a lot more synchronisation code).
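
One possible way to get a single copy of the cache, sketched under the
assumption that it is allocated before dump forks: an anonymous shared mapping
is inherited by every child and backed by the same pages.  The
shared_cache_alloc() and shared_cache_free() names are hypothetical, and the
synchronisation mentioned above is deliberately left out.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANON on glibc; harmless on BSD */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Allocate a zero-filled region that stays shared across fork().
 * Returns NULL on failure.  No locking is provided: concurrent writers
 * would still need their own coordination.
 */
void *shared_cache_alloc(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_SHARED, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}

/* Release a region obtained from shared_cache_alloc(). */
void shared_cache_free(void *p, size_t bytes)
{
    if (p != NULL)
        munmap(p, bytes);
}
```

With a mapping like this, a block filled in by one child is visible to the
parent and to the other children, so the memory cost is one cache rather than
one per process.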

Peter

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020726073104.R38313>
