Date:      Sun, 6 Jul 2008 21:45:12 +0800
From:      Astrodog <astrodog@gmail.com>
To:        "gnn@freebsd.org" <gnn@freebsd.org>
Cc:        current@freebsd.org
Subject:   Re: Has anyone else seen any form of in memory or on disk corruption?
Message-ID:  <2fd864e0807060645j65b67f97s5dc1e81145660c9d@mail.gmail.com>
In-Reply-To: <m2r6a9poww.wl%gnn@neville-neil.com>
References:  <m2r6a9poww.wl%gnn@neville-neil.com>

On 7/5/08, gnn@freebsd.org <gnn@freebsd.org> wrote:
> Hi,
>
>  I've been working on the following brain teasing (breaking?) problem
>  for about a week now.  What I'm seeing is that on large memory
>  machines, those with more than 4G of RAM, the ungzipping/untarring of
>  files fails due to gzip thinking the file is corrupt.  The way to
>  reproduce this is:
>
>  1) Create a bunch of gzip/tar balls in the 1-20MB range.
>  2) Reboot the machine (FreeBSD 7.0-RELEASE).
>  3) Run gzip -t over all the files.
>
>  I have hundreds of these files to run this over, and a full check
>  takes about 3 hours, but I usually see some form of corruption within
>  the first 20 minutes.
>
>  Other important factors:
>
>  1) This is on very modern, 2P/4Core (8 cores total) hardware
>  2) The disks are 1TB SATA set up in JBOD.
>  3) The machines have 16G of RAM.
>  4) Corruption is seen only after a reboot; if the machines continue to
>  run, corruption is never seen again until another reboot.
>  5) The systems are all Xeon running amd64
>  6) The disk controller is an AMCC 9650, but we do see this very rarely
>  with the onboard controller.
>  7) All boards are
>
>  http://www.supermicro.com/products/motherboard/Xeon1333/5400/X7DWU.cfm
>
>  8) All machines have 3 1TB drives.
>  9) The corruption is in 4K chunks.  That is N x 4K.
>  10) Files are not normally corrupted on disk, but this can happen.
>
>  I have already tried a few of the obvious things, such as making sure
>  that we sync pages before we shut down the twa driver.
>
>  Given what I have seen I believe this is something that happens from
>  startup, and not at shutdown.
>
>  Thoughts?
>
>  Best,
>  George
>

As a data point for you, I use a number of 9650s, each with 15 drives
(750GB or 1TB), on a Supermicro motherboard with Opteron processors and
4GB of memory. With this configuration I have not experienced any data
corruption.

--- Harrison
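
For anyone wanting to script the check described in the quoted message, here is a minimal sketch in Python. It is not what George actually ran. It assumes the archives end in ".gz", and the names gzip_ok, bad_blocks and scan are made up for illustration. gzip_ok does roughly what "gzip -t" does by decompressing each file and discarding the output. bad_blocks compares a suspect file against a known-good copy in 4K blocks, which is one way to check the "N x 4K" observation.

```python
import gzip
import zlib
from pathlib import Path

BLOCK = 4096  # 4K, the corruption granularity reported in the quoted message


def gzip_ok(path):
    """Return True if the file decompresses cleanly (roughly what gzip -t checks)."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read in 1 MB pieces and discard the data; only errors matter.
            while fh.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC, truncated stream, or a damaged deflate stream.
        return False


def bad_blocks(path_a, path_b, block=BLOCK):
    """Return the indices of block-sized chunks that differ between two files."""
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a, b = fa.read(block), fb.read(block)
            if not a and not b:
                break
            if a != b:
                diffs.append(index)
            index += 1
    return diffs


def scan(directory):
    """Return the sorted paths of *.gz files under directory that fail the check."""
    return sorted(p for p in Path(directory).rglob("*.gz") if not gzip_ok(p))
```

Running scan() on the directory of tarballs right after a reboot, and then bad_blocks() on any failing file against a copy taken before the reboot, would show whether the damaged regions line up on 4K boundaries. Since the report says files are usually intact on disk, a second pass over the same file may read back clean.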
