Date: Thu, 30 Aug 2007 19:13:25 +0100 (BST)
From: "Mark Powell" <M.S.Powell@salford.ac.uk>
To: Mark Powell <M.S.Powell@salford.ac.uk>
Cc: freebsd-current@freebsd.org
Subject: Re: Another ZFS kernel panic on same block on every drive in raidz
Message-ID: <20070830190328.B60345@rust.salford.ac.uk>
In-Reply-To: <20070830183305.X60345@rust.salford.ac.uk>
References: <20070830183305.X60345@rust.salford.ac.uk>

On Thu, 30 Aug 2007, Mark Powell wrote:

> I am being told that a dma error is occurring on the same block on all 3
> drives at the same time:
>
> Just performing a scrub now to see what happens.

The scrub performed fine. The panic occurs under heavyish use, with 3
simultaneous rsyncs from an XP box over Samba.

I just recalled that it panicked earlier, but I was in X and couldn't see
the message. Surprisingly, it did log something:

Aug 30 17:27:48 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435298
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435298
Aug 30 17:28:29 echo kernel: ad16: FAILURE - WRITE_DMA timed out LBA=268435297
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435426
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad18: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435426
Aug 30 17:28:29 echo kernel: ad16: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435425
Aug 30 17:28:29 echo kernel: ad18: FAILURE - WRITE_DMA timed out LBA=268435425
Aug 30 17:28:29 echo kernel: ad14: FAILURE - WRITE_DMA timed out LBA=268435426

Here the blocks are different, and 4 blocks overall are reported as having
problems. In hex they all start with FFFFF (FFFFFxx). They are (including
the one from the previous report):

268435297  fffff61
268435298  fffff62
268435340  fffff8c
268435425  fffffe1
268435426  fffffe2

Coincidence?

This is on amd64 with all drives connected to the ICH9 ports on a
Gigabyte Intel P35 based MB. Current is from 25/8/2007.

Cheers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information Services Division, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 4837  Fax: +44 161 295 5888  www.pgp.com for PGP key
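
The decimal-to-hex conversions listed in the message can be reproduced with a short Python sketch. The names `LBA28_LIMIT` and `describe` are made up for illustration. The comparison against 2^28 is an added observation rather than something stated in the message: 0xfffffff is the largest value that fits in 28 bits, and every reported LBA sits within a couple of hundred sectors of it. Whether that proximity has anything to do with the timeouts is not established here.

```python
# LBAs reported in the kernel log, plus the one from the previous report.
lbas = [268435297, 268435298, 268435340, 268435425, 268435426]

# Assumption for illustration: compare against the 28-bit boundary.
# 1 << 28 == 268435456, so 28-bit values run from 0 to 0xfffffff.
LBA28_LIMIT = 1 << 28


def describe(lba):
    """Return (lowercase hex string, number of sectors below 2^28)."""
    return format(lba, "x"), LBA28_LIMIT - lba


for lba in lbas:
    hex_str, gap = describe(lba)
    print(f"{lba}  {hex_str}  {gap} sectors below 2^28")
```

Running it prints one line per LBA with the same hex values as the table in the message, followed by the distance to 2^28.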