From owner-freebsd-hackers  Thu Feb  7  9:51: 1 2002
Delivered-To: freebsd-hackers@freebsd.org
Received: from moebius2.Space.Net (moebius2.Space.Net [195.30.1.100])
	by hub.freebsd.org (Postfix) with SMTP id 5546937B416
	for <freebsd-hackers@freebsd.org>; Thu,  7 Feb 2002 09:50:54 -0800 (PST)
Received: (qmail 88832 invoked by uid 1013); 7 Feb 2002 17:50:52 -0000
Date: Thu, 7 Feb 2002 18:50:52 +0100
From: Markus Stumpf <maex-freebsd-hackers@Space.Net>
To: freebsd-hackers@freebsd.org
Subject: dump(8) race conditions?
Message-ID: <20020207185052.A87994@Space.Net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
Organization: SpaceNet AG, Muenchen, Germany
X-PGP-Fingerprint: 66 F3 75 79 01 D0 B8 5F  1A C7 77 88 4A B6 70 DF
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

We use amanda and dump for backups. Some hosts have rather busy disks,
even during the off-peak hours when the backups run.

From time to time amanda reports dump(8) errors like the following:

sendbackup: info end
|   DUMP: Date of this level 5 dump: Wed Feb  6 01:53:12 2002
|   DUMP: Date of last level 4 dump: Mon Feb  4 02:31:40 2002
|   DUMP: Dumping /dev/rda4s1e (/share/turing/disk07) to standard output
|   DUMP: mapping (Pass I) [regular files]
|   DUMP: mapping (Pass II) [directories]
|   DUMP: estimated 2423080 tape blocks.
|   DUMP: dumping (Pass III) [directories]
|   DUMP: dumping (Pass IV) [regular files]
|   DUMP: 14.72% done, finished in 0:28
|   DUMP: 33.78% done, finished in 0:19
|   DUMP: 52.84% done, finished in 0:13
|   DUMP: 71.65% done, finished in 0:07
?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921522]: count=3072
?   DUMP:   DUMP: read error from /dev/rda4s1e: Invalid argument: [sector -410921522]: count=512
?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921532]: count=5120
?   DUMP: read error from /dev/rda4s1e: Invalid argument: [block -1001057530]: count=1024
[ ... ]

The first time we saw this we took the machine down to single user,
unmounted the disk and fsck'd it. No errors were found, and the next
backups (even level 0) completed without errors.

As we were still suspicious about the cause of these really sporadic
error messages from different machines and different disks, I looked
through the source of dump.

If I interpret the code correctly, dump caches directory inode lists.
If files are removed or shrunk during a dump, after the inode information
has been cached, dump is left with a "dirty" cache and tries to access
blocks that are not (or no longer) allocated, which results in the errors
above.
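
To make the suspected sequence concrete, here is a toy model of it. This is
only a sketch of my reading, not dump's actual code: the dict used as a
"disk", BLOCK_COUNT, read_block() and simulate() are all made up for
illustration, and the two garbage values are simply copied from the log
above.

```python
import struct

BLOCK_COUNT = 1000  # size of the toy "disk" in blocks (hypothetical)


def read_block(disk, blkno):
    """Read one block; reject out-of-range block numbers with EINVAL."""
    if not 0 <= blkno < BLOCK_COUNT:
        raise OSError(22, "Invalid argument")
    return disk.get(blkno, b"\0" * 8)


def simulate():
    disk = {}
    # Block 10 holds block pointers for a file: data blocks 11 and 12.
    disk[10] = struct.pack("<2i", 11, 12)

    # Mapping passes: the dumper remembers where the pointers live.
    cached_pointer_block = 10

    # Meanwhile the file is removed and block 10 is reused for ordinary
    # file data, which is meaningless when read as block pointers.
    disk[10] = struct.pack("<2i", -410921522, -1001057530)

    # Dumping pass: the stale cache is trusted and the block is
    # interpreted as pointers again.
    errors = []
    raw = read_block(disk, cached_pointer_block)
    for blkno in struct.unpack("<2i", raw):
        try:
            read_block(disk, blkno)
        except OSError as exc:
            errors.append((blkno, exc.strerror))
    return errors
```

In this model simulate() collects one "Invalid argument" entry per garbage
block number, which is at least the same shape as the errors amanda reports.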

Is my interpretation right, or are these really hardware errors?

Thanks,

	\Maex

-- 
SpaceNet AG            | Joseph-Dollinger-Bogen 14 | Fon: +49 (89) 32356-0
Research & Development |       D-80807 Muenchen    | Fax: +49 (89) 32356-299
"The security, stability and reliability of a computer system is reciprocally
 proportional to the amount of vacuity between the ears of the admin"
