From: Andrew Gallatin
Date: Fri, 10 Sep 1999 17:49:48 -0400 (EDT)
To: "Justin T. Gibbs"
Cc: scsi@freebsd.org, gibbs@freebsd.org, anderson@cs.duke.edu
Subject: Re: data corruption when using aic7890

Justin T. Gibbs writes:
 > >This does seem to have an effect, so you might have the right knob to
 > >twiddle!
 > >
 > >Unfortunately, the change seems to make things even worse.  The
 > >errors are occurring much more frequently now.  Also, the errors are
 > >occurring later in the page.  Whereas before, the errors would almost
 > >always occur in the first 500 bytes of the page, now they're occurring
 > >near the end of the page (some around 2500 bytes, most near 3900).
 >
 > What are the dynamics of your test program?  Are you sure that this
 > is a problem with reads and not with writes?  If it is a problem
 > with reads, WR_DFTHRSH is what you should be tweaking, since the
 > directions are relative to the bus master (i.e.
 > the aic7xxx part).
 >
 > --
 > Justin

We're using a home-grown program called 'hunt' (as in hunt for
errors).  I've left you the source & an i386 binary on freefall in
~gallatin/hunt.  Run it with the arguments

	-touch= -fileio= -size=

So on a 512MB x86, you'd say

	./hunt.i386 -touch=4096 -fileio=zot -size=131072

It sequentially writes out the data & reads it back multiple times.
The data set is large enough (>physical memory) that it's not being
cached, so at least one successful read of the entire file indicates
that it was written properly.  When an error is encountered, that page
of the file is rewritten with the correct data.

It prints '<' when it starts initializing, '>' after the file is
completely written, and '.' for each successful read of the file.  On
an error, you'll see some information about what was read & what was
expected.

So, using an NCR875 controller I see:

<>!..............................................

On the 7890, I'm seeing:

<>!..........##error 1 page 16167 expected [0x03f27ff8] saw [0x056deff8]

My tweak didn't seem to help either.  The default setting seems to be
the most reliable.  Should I continue to tweak this variable, or do
you have other ideas?

I really appreciate your help.  We've got 20 of these machines & I'd
really like to avoid having to purchase 20 SCSI controllers to replace
the on-board ones.

Thanks!

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590
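
For readers without access to the hunt source, the write-then-verify loop described above can be sketched roughly as follows. This is not the hunt source. The function names, the `passes` parameter, and the fixed 4096-byte page are all hypothetical. The fill pattern is an inference from the sample error line, which is consistent with each 32-bit word holding its own byte offset in the file (16167 * 4096 + 0xff8 = 0x03f27ff8); the real program may do something different. The '!' in the real output is not explained in the message, so the sketch does not print it.

```python
import os
import struct
import sys

PAGE_SIZE = 4096  # assumed page size, matching -touch=4096 in the example


def make_page(page_no, page_size=PAGE_SIZE):
    """Build one page whose 32-bit words each hold their own file offset.

    The pattern is inferred from the sample error line, not from hunt itself.
    """
    base = page_no * page_size
    words = [off & 0xFFFFFFFF for off in range(base, base + page_size, 4)]
    return struct.pack("<%dI" % len(words), *words)


def first_mismatch(expected, actual):
    """Return (byte_offset, expected_word, seen_word), or None if equal."""
    actual = actual.ljust(len(expected), b"\0")  # tolerate a short read
    for off in range(0, len(expected), 4):
        (e,) = struct.unpack_from("<I", expected, off)
        (a,) = struct.unpack_from("<I", actual, off)
        if e != a:
            return off, e, a
    return None


def hunt(path, pages, passes=3, out=sys.stdout):
    """Write `pages` pages, then read the file back `passes` times.

    Prints '<' before writing, '>' after writing, and '.' for each
    error-free pass. Returns the number of bad pages seen.
    """
    errors = 0
    out.write("<")
    with open(path, "wb") as f:
        for p in range(pages):
            f.write(make_page(p))
        f.flush()
        os.fsync(f.fileno())  # push the data out to the device
    out.write(">")
    with open(path, "r+b") as f:
        for _ in range(passes):
            clean = True
            for p in range(pages):
                expected = make_page(p)
                f.seek(p * PAGE_SIZE)
                bad = first_mismatch(expected, f.read(PAGE_SIZE))
                if bad is not None:
                    clean = False
                    errors += 1
                    out.write("\n##error %d page %d expected [0x%08x] saw [0x%08x]\n"
                              % (errors, p, bad[1], bad[2]))
                    f.seek(p * PAGE_SIZE)
                    f.write(expected)  # rewrite the page with correct data
            if clean:
                out.write(".")
    out.write("\n")
    return errors
```

As in the message, a test like this only exercises the disk path if the file is larger than physical memory; otherwise the reads are served from the buffer cache. Under the assumed pattern, the "saw" value 0x056deff8 would be the word at the same in-page offset (0xff8) of a different page.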