From owner-freebsd-arch  Thu Nov 11 21:32:37 1999
Message-Id: <199911120530.WAA32504@panzer.kdm.org>
Subject: Re: I/O Evaluation Questions (Long but interesting!)
In-Reply-To: <382BA304.EE2F0D66@simon-shapiro.org> from Simon Shapiro at "Nov 12, 1999 00:17:56 am"
To: shimon@simon-shapiro.org (Simon Shapiro)
Date: Thu, 11 Nov 1999 22:30:22 -0700 (MST)
Cc: rjesup@wgate.com (Randell Jesup), freebsd-arch@freebsd.org
From: "Kenneth D. Merry" <ken@kdm.org>

Simon Shapiro wrote...
> "Kenneth D. Merry" wrote:
> > It could be that the combination of the DPT controller's 256MB cache and
> > fancy queueing, and your 1GB of RAM is causing the amazingly fast disk speeds.
> 
> These DPTs seem to be optimal for RAID-5, very good at RAID-0
> and nothing exciting for single disks.  I have some FC-AL
> gear on order.
> 
> What worries me is not the performance, but the corruption
> of the stack that I see.
> 
> For example, I can run the same 400 processes against the
> raw device all day and all night without a hitch.
> Run them against a block device and something bizarre
> happens: a filesystem gets corrupted, the Adaptec driver
> times out, tsleep segfaults, something.  At times I can
> get the error in the driver, but then it makes no sense 
> either.  There are tons of self-checks and state
> verifications in the code.  None trip, or when they do
> they are as illogical as the null pointer inside tsleep.

Well, since you've done a lot of work to try to isolate the problem in your
code, but haven't tracked it down, I'd suggest taking your code out of the
picture as a variable.

Create a CCD or Vinum array, using the same disks on Adaptec controllers.
Run the same tests, against the raw and block devices, and see if you get
the same sort of weird behavior.

If you do, you have solid proof that it's not your code, since your code
wasn't in the kernel.  If you don't, unfortunately, you don't have solid
proof either way: it could be some set of circumstances that your driver
tickles and that CCD or Vinum don't.
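
For the comparison itself, the important part is that the identical
write-then-verify pass runs against both device nodes.  A minimal sketch of
such a pass follows, assuming a Python interpreter is available on the test
box.  The function names, the 8K block size, and the hashed fill pattern are
all illustrative assumptions, not anything taken from the tests described
above, and the path you pass in is whatever raw or block node (or, for a dry
run, an ordinary file) you want to check.  It destroys whatever is on the
target.

```python
import hashlib
import os

BLOCK_SIZE = 8192  # assumed transfer size, chosen only for illustration


def pattern_block(index, size=BLOCK_SIZE):
    """Return a deterministic block of `size` bytes that depends on `index`."""
    seed = hashlib.sha256(index.to_bytes(8, "little")).digest()
    reps = size // len(seed) + 1
    return (seed * reps)[:size]


def write_pattern(path, nblocks):
    """Overwrite the first `nblocks` blocks of `path` with the pattern."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for i in range(nblocks):
            written = os.pwrite(fd, pattern_block(i), i * BLOCK_SIZE)
            if written != BLOCK_SIZE:
                raise IOError("short write at block %d" % i)
    finally:
        os.close(fd)


def verify_pattern(path, nblocks):
    """Re-read the blocks and return the indices that do not match."""
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for i in range(nblocks):
            data = os.pread(fd, BLOCK_SIZE, i * BLOCK_SIZE)
            if data != pattern_block(i):
                bad.append(i)
    finally:
        os.close(fd)
    return bad
```

Run write_pattern() and then verify_pattern() with the same block count, once
against the raw node and once against the block node of the same array.  An
empty list on one and mismatched block numbers on the other is the kind of
difference worth chasing.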

One other thing to make sure of is that you're running a -stable with
Justin's Adaptec driver bug fix from September 20th.  It fixed some cases
where corruption could happen with Ultra 2 Adaptec controllers.

Ken
-- 
Kenneth Merry
ken@kdm.org
