From: Heiko Schaefer
Date: Tue, 6 May 2003 16:41:30 +0200 (CEST)
To: current@freebsd.org
Message-ID: <20030506162410.M66653@daneel.foundation.hs>
Subject: data corruption with current (maybe sis chipset related?)

Hi List,

I already brought up my data corruption issue when I suspected that gbde might be the cause. It turns out gbde was not guilty.
Then, just now, I thought that M. Warner Losh's mail (subject 'Precaution') could explain what is going wrong, but I still have the problem with a new world and kernel built as of today.

I can reproduce the data corruption as follows. There are two disks in the box, one 30GB and the other 60GB. The 30GB disk is already filled with data, which I copy (in parallel) into two directories on the 60GB disk. The result should be two identical copies of the 30GB disk's data on the 60GB disk. I then compute checksums of the duplicated files, and more often than not a few files are corrupted in the copies.

More numbers: I typically see 1-2 blocks of corrupted data on the 60GB disk (32KB is the size of the corrupted region I usually see). The corruptions are usually aligned within the file, at least to a multiple of 512 bytes. Often the corrupted data consists of long runs of 0-bytes, but other parts of the corrupted segments contain data that looks random.

It seems that not only file contents get corrupted: sometimes I also see errors when I fsck that partition. For example, I once saw a file whose size in "ls -l" clearly didn't match its actual content as seen by "wc".

By now I have ruled out a number of possible causes:
- I am only using local disks (no networking, as I did initially).
- I first used a 512MB DDR memory module and now use a 256MB SDR one, which should (I believe) have properties different enough to rule out the original memory as the cause of the problem.

As I see it, the issue can be hardware-related (the mainboard and CPU seem to be the only remaining possibilities) or software-related (maybe the driver for the chipset, in particular the hard drive controller, is suboptimal?), or maybe some other FreeBSD code that moves data around makes occasional mistakes.
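The checksum comparison in the reproduction steps above, together with the observations about 512-byte alignment and zero-filled regions, could be automated with a small script. What follows is only a sketch, not the tool used to get the numbers in this mail: it walks the source tree, compares each file with its counterpart in one copy directory by MD5, and for mismatching files lists every differing 512-byte sector and whether that sector is all zeros in the copy. MD5, the script name cmptree.py and the mount points /mnt/disk30 and /mnt/disk60/copy1 are hypothetical choices.

```python
import hashlib
import os
import sys

SECTOR = 512  # compare at the 512-byte granularity the corruption appears aligned to


def file_digest(path, chunk=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def bad_sectors(orig, copy):
    """Return (offset, all_zero) for every 512-byte sector that differs.

    all_zero is True when the copy's sector is non-empty and holds only 0x00
    bytes. If the files differ in length, the extra tail sectors count as
    differing too.
    """
    result = []
    with open(orig, "rb") as a, open(copy, "rb") as b:
        offset = 0
        while True:
            sa, sb = a.read(SECTOR), b.read(SECTOR)
            if not sa and not sb:
                break
            if sa != sb:
                result.append((offset, bool(sb) and sb == bytes(len(sb))))
            offset += SECTOR
    return result


def compare_trees(src_root, copy_root):
    """Compare each regular file under src_root with its twin under copy_root.

    Returns {relative_path: None} for files missing from the copy and
    {relative_path: [(offset, all_zero), ...]} for files whose MD5 differs.
    Symlinks are skipped.
    """
    report = {}
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(copy_root, rel)
            if not os.path.isfile(dst):
                report[rel] = None
            elif file_digest(src) != file_digest(dst):
                report[rel] = bad_sectors(src, dst)
    return report


def print_report(src_root, copy_root):
    """Print one summary line per missing or corrupted file."""
    for rel, sectors in sorted(compare_trees(src_root, copy_root).items()):
        if sectors is None:
            print(f"{rel}: missing in copy")
            continue
        zeros = sum(1 for _, all_zero in sectors if all_zero)
        first = sectors[0][0] if sectors else "n/a"
        print(f"{rel}: {len(sectors)} bad sectors ({len(sectors) * SECTOR} bytes, "
              f"{zeros} all-zero), first at offset {first}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python cmptree.py /mnt/disk30 /mnt/disk60/copy1
    print_report(sys.argv[1], sys.argv[2])
```

Running it once per copy directory against the 30GB source would show whether the damaged regions consistently come as 32KB runs (64 sectors of 512 bytes), where in each file they start, and how many of the sectors are zero-filled rather than random.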
The board is an Elitegroup K7-S5A LAN (with the SiS 735 chipset) and the CPU an AMD Athlon XP 1800+. I specifically bought that hardware very recently to run a gbde-based NFS server on it.

Does anyone know of any (FreeBSD-current) issues that might be causing this, or have any idea how I can further rule out causes of this kind?

My best idea at this point is to go out and buy a P4 board and CPU, and I don't really like that. It seems almost futile to go and get another board/CPU of the same type before I have a good idea what is actually wrong :(

Regards, and thanks for any thoughts,
Heiko

-- 
Free Software. Why put up with inferior code and antisocial corporations?
http://www.gnu.org/philosophy/why-free.html