From owner-freebsd-current  Mon Apr 15 16:49:22 2002
Delivered-To: freebsd-current@freebsd.org
Received: from snipe.prod.itd.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62])
	by hub.freebsd.org (Postfix) with ESMTP id 1E7CF37B404
	for <freebsd-current@freebsd.org>; Mon, 15 Apr 2002 16:49:19 -0700 (PDT)
Received: from pool0600.cvx21-bradley.dialup.earthlink.net ([209.179.194.90] helo=mindspring.com)
	by snipe.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16xGDd-00008M-00; Mon, 15 Apr 2002 16:49:09 -0700
Message-ID: <3CBB66DA.F9C94ED0@mindspring.com>
Date: Mon, 15 Apr 2002 16:48:42 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: msch@snafu.de
Cc: freebsd-current@freebsd.org
Subject: Re: ATA errors on recent -current
References: <E16xCAf-0005kI-00@smart.eusc.inter.net>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

Matthias Schuendehuette wrote:
> I still have an old FreeBSD Test-Installation (45GB are big enough :-)
> with a 4.4-STABLE as of Okt 23, 2001...
>
> It boots off the DTLA, uses tagged-queuing and connects using UDMA100...
> ... and doesn't have any problems!!
>
> So, to bring some of you down to earth again, the DTLA may be a
> horrible disk and I'm one of the last to praise ATA at all (My machine
> has two SCSI host adaptors, five SCSI-Disks and several other SCSI
> Devices), but it once worked!

I think we all already agree, though, that the tagged command
queuing problem comes from a code change.  That doesn't identify
it very closely (or you would have included a patch ;^)).


It may be that the OS was slower in older revisions (one would
hope that was the case), and that now that the code is faster,
it's too fast for the hardware.

It may also be that the switches between write caching on/off by
default in various versions have removed stall points in the write
code path which would otherwise have protected the drive from
being overwhelmed by the host OS.

There are a lot of ways timing problems could have been
introduced that don't require Soren's code to be wrong, and
that don't make it impossible to blame the problem on the hardware.


On the theory that it is an off-by-one error, introduced either
by increased concurrency in an error path, or a direct off-by-one,
I've suggested dropping the effective number of tagged commands
supported by the drive.

That way, if you exceed this number because of some coding
error, you still won't exceed the capacity of the drive.

Since you have one of these beasts, could you maybe try changing
the number of tagged command queue entries you permit to be used
at one time?


> I really, really don't want to blame Søren, he's doing a great job and
> everybody, who makes something makes occasionally some errors, but (at
> least for me) it doesn't seem to be a fundamental technical problem,
> because *it once worked* - sorry, but it's true.
>
> And maybe it isn't related to tagged queuing and the DTLA at all - if I
> correctly understand Giorgos' mail...

As I said: it could be drive settings, independent of whether the
code itself is correct.  I've given three suggestions to verify
this, one way or the other:

1)	Throttle the drive's DMA speed down

2)	Pretend the maximum tagged command queue depth is
	smaller than it is

3)	Toggle the write caching on the drive

Until you try all three of these and report back, you can't say
that the problem is Soren's.

-- Terry
