Date:      Mon, 1 Sep 2008 14:38:56 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Kevin Oberman <oberman@es.net>
Cc:        Derek Kuliński <takeda@takeda.tk>, Michael <freebsdports@bindone.de>, freebsd-stable@freebsd.org
Subject:   Re: bin/121684: dump(8) frequently hangs
Message-ID:  <20080901213856.GA17155@icarus.home.lan>
In-Reply-To: <20080901160013.0005F4500F@ptavv.es.net>
References:  <200809011336.m81Da5BT046532@lava.sentex.ca> <20080901160013.0005F4500F@ptavv.es.net>

On Mon, Sep 01, 2008 at 09:00:12AM -0700, Kevin Oberman wrote:
> > Date: Mon, 01 Sep 2008 09:36:11 -0400
> > From: Mike Tancsa <mike@sentex.net>
> > Sender: owner-freebsd-stable@freebsd.org
> > 
> > At 05:07 AM 9/1/2008, Derek Kuliński wrote:
> > 
> > >Now I'm honestly a bit scared about it (even if it will be fixed
> > >before 7.1, I'm not sure I'll hurry with the update).
> > 
> > There have been a number of commits to releng_7 that fixed dump
> > issues for me.  A box that used to regularly exhibit hung dump
> > processes has been working fine since April.  e.g. a kernel from
> > 7.0-STABLE FreeBSD 7.0-STABLE #4: Wed Apr 30
> > 
> > does weekly level 0 dumps and daily differential 
> > dumps on the file systems below without issue
> > % df -i
> > Filesystem     1K-blocks       Used      Avail Capacity    iused    ifree %iused  Mounted on
> > /dev/twed0s1a    2026030     284346    1579602    15%       2937   279685    1%   /
> > devfs                  1          1          0   100%          0        0  100%   /dev
> > /dev/twed0s1d    5077038     575828    4095048    12%       1197   658257    0%   /tmp
> > /dev/twed0s1e   20308398   11072840    7610888    59%    1065406  1572416   40%   /usr
> > /dev/twed0s1f   20308398   13275050    5408678    71%      13750  2624072    1%   /var
> > /dev/twed0s1g  246875258  186393906   40731332    82%    9118036 22794922   29%   /zoo
> > 
> > However, you should test and make sure it works for you.
> 
> I have a 7-Stable system which has not been able to successfully dump(8)
> for about 2 months. Since it contains almost no important data that is
> subject to change, it's not too big a deal, but I worry that other
> systems might start showing the same problems.
> 
> I have no idea why it's failing, though, and I have spent little effort
> on troubleshooting it.  I'm running a 3-week-old -STABLE and I'll be
> updating to today's RELENG_7 later today.

Can someone explain what "dump frequently hangs" actually means?

Does it lock up the entire machine indefinitely (and if so, how long did
you wait for it to (hopefully) recover)?

Or does it more or less "deadlock" the machine, making it generally
unusable, until the dump is completely finished?
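
(To help pin down which of the two it is: one rough way to check is to
look at the dump processes themselves while the box feels stuck, e.g.

  % ps -axl | grep dump

and see whether they sit in disk wait ("D" state, with a wait channel
shown in the MWCHAN column) and never come out of it, or whether
everything is merely crawling along until the dump completes.)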

If the latter, I can confirm this problem -- which is why we moved all
of our production systems away from using dump on UFS2 to simply using
rsnapshot[1].  I'll try to find the thread (it was a year or so ago)
where a developer told me more or less what was going on.  The problem
is that UFS2 snapshot generation becomes slower and slower over time
(snapshot generation is what dump does on UFS2 systems, with or without
the -L flag), and it is a known design issue.
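
(For reference, a typical snapshot-based invocation looks something like
the following -- the dump level, flags, and target file here are just an
illustration, not taken from this thread:

  # level 0 dump of /usr, written to a file
  % dump -0Lauf /backup/usr.0.dump /usr

The -L flag asks dump(8) to work from a filesystem snapshot so that the
live filesystem can stay mounted read-write while the dump runs.)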

If anything, this issue makes ZFS incredibly important with regard to
-STABLE, where snapshot generation for backups does not behave this
way; it's fast and very easily manageable.
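
(A rough sketch, with a made-up pool/dataset name -- on ZFS the whole
operation is:

  % zfs snapshot tank/usr@weekly.2008-09-01
  % zfs list -t snapshot

Creation is effectively instant regardless of how much data the
filesystem holds, and the snapshot can then be backed up at leisure.)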

[1]: rsync is great for backups, and very fast, but there's the issue of
modifying atimes.  I committed a patch to ports/net/rsync which adds an
--atimes flag, except its behaviour is not what you'd expect: the copied
file at the destination ends up with the correct atime (that of the
source), but the source itself gets its atime modified by the copy, so
you're essentially destroying the atime data on the source.
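
(For illustration only, with made-up paths and hostname -- --atimes here
is the flag added by that ports patch, not a stock rsync option at the
time:

  % rsync -a --atimes /home/ backuphost:/backups/home/

The copies on backuphost come out with the source's original atimes, but
reading the source files still updates the atimes on the source, as
described above.)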

This is a problem for programs which use atime to infer state, such as
mail readers checking classic UNIX mailboxes (mbox) for new mail.  "Um,
why does mutt say I don't have any new mail when I do??"  In our case,
the only person using classic UNIX mboxes with a mail client local to
the machine was me, so I ended up migrating my procmail rules and data
to Maildir using mutt, solving the problem entirely.
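
(For anyone making the same switch, the procmail side is small -- the
paths here are only an example; the trailing slash on the destination is
what tells procmail to deliver in Maildir format instead of mbox:

  MAILDIR=$HOME/Maildir/
  DEFAULT=$MAILDIR

  :0
  $DEFAULT

mutt is then pointed at the Maildir instead of the old mbox file.)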

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



