Date: Thu, 7 Feb 2002 18:50:52 +0100
From: Markus Stumpf
To: freebsd-hackers@freebsd.org
Subject: dump(8) race conditions?
Message-ID: <20020207185052.A87994@Space.Net>
Organization: SpaceNet AG, Muenchen, Germany

We use amanda and dump for backups. Some hosts have rather busy disks,
even during off-peak hours when the backups run.

From time to time amanda reports dump(8) errors like the following:

sendbackup: info end
| DUMP: Date of this level 5 dump: Wed Feb 6 01:53:12 2002
| DUMP: Date of last level 4 dump: Mon Feb 4 02:31:40 2002
| DUMP: Dumping /dev/rda4s1e (/share/turing/disk07) to standard output
| DUMP: mapping (Pass I) [regular files]
| DUMP: mapping (Pass II) [directories]
| DUMP: estimated 2423080 tape blocks.
| DUMP: dumping (Pass III) [directories]
| DUMP: dumping (Pass IV) [regular files]
| DUMP: 14.72% done, finished in 0:28
| DUMP: 33.78% done, finished in 0:19
| DUMP: 52.84% done, finished in 0:13
| DUMP: 71.65% done, finished in 0:07
? DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921522]: count=3072
? DUMP: DUMP: read error from /dev/rda4s1e: Invalid argument: [sector -410921522]: count=512
? DUMP: read error from /dev/rda4s1e: Invalid argument: [block -410921532]: count=5120
? DUMP: read error from /dev/rda4s1e: Invalid argument: [block -1001057530]: count=1024
[ ... ]

The first time we saw this, we took the machine down to single-user
mode, unmounted the disk, and fsck'd it. No errors were found, and the
next backups (even level 0) completed without errors.

Because these sporadic error messages show up on different machines and
different disks, we were still suspicious about the cause, so I looked
through the source of dump.

If I interpret the code correctly, dump caches directory inode lists.
If files are removed or shrunk during a dump, after the inode
information has been cached, dump is left with a stale ("dirty") cache
and tries to access blocks that are no longer allocated, which results
in the errors above.

Is my interpretation right, or are these really hardware errors?

Thanks,

	\Maex

-- 
SpaceNet AG            | Joseph-Dollinger-Bogen 14 | Fon: +49 (89) 32356-0
Research & Development | D-80807 Muenchen          | Fax: +49 (89) 32356-299
"The security, stability and reliability of a computer system is reciprocally
 proportional to the amount of vacuity between the ears of the admin"
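
The symptom in the log above (negative block numbers together with "Invalid argument") can be reproduced in miniature. The sketch below is not dump's code, and it neither proves nor disproves the stale-cache theory. It only shows that if a block pointer is read as garbage with its high bit set, and if it is held in a signed 32-bit field (an assumption about the on-disk format), the resulting negative offset is rejected with EINVAL before the disk is ever touched. The helper names as_signed32, read_block, and demo are made up for illustration, and an ordinary temporary file stands in for the raw device.

```python
import errno
import os
import struct
import tempfile


def as_signed32(raw):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", raw & 0xFFFFFFFF))[0]


def read_block(fd, block, count, block_size=512):
    """Seek to block * block_size and read count bytes.

    Returns (data, 0) on success or (b"", errno) on failure.
    """
    try:
        os.lseek(fd, block * block_size, os.SEEK_SET)
        return os.read(fd, count), 0
    except OSError as exc:
        return b"", exc.errno


def demo():
    # A small scratch file plays the role of the disk device.
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * 4096)
        f.flush()
        fd = f.fileno()

        # A valid block pointer reads normally.
        good = read_block(fd, 1, 512)

        # Hypothetical garbage pointer: chosen so that its signed view
        # equals the first block number quoted in the log.
        stale = as_signed32(3884045774)
        bad = read_block(fd, stale, 512)
        return good, stale, bad


if __name__ == "__main__":
    good, stale, bad = demo()
    print("good read:", len(good[0]), "bytes, errno", good[1])
    print("stale pointer as signed 32-bit:", stale)
    print("bad read errno:", errno.errorcode.get(bad[1], bad[1]))
```

An error that originates in the seek rather than in the device is consistent with fsck finding nothing wrong, although that alone does not rule out other causes.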