Date:      Wed, 19 Feb 2003 10:20:12 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Ruslan Ermilov <ru@FreeBSD.ORG>
Cc:        Alfred Perlstein <alfred@FreeBSD.ORG>, Thomas Moestl <tmm@FreeBSD.ORG>, Soren Schmidt <sos@FreeBSD.ORG>, <current@FreeBSD.ORG>
Subject:   Re: cvs commit: src/sys/kern kern_intr.c src/sys/dev/ata ata-all.c
Message-ID:  <20030219095525.R11144-100000@gamplex.bde.org>
In-Reply-To: <20030218102408.GA48010@sunbay.com>

On Tue, 18 Feb 2003, Ruslan Ermilov wrote:

> On Fri, Feb 14, 2003 at 05:10:40AM -0800, Alfred Perlstein wrote:
> > alfred      2003/02/14 05:10:40 PST
> >
> >   Modified files:
> >     sys/kern             kern_intr.c
> >     sys/dev/ata          ata-all.c
> >   Log:
> >   Fix crash dumps on ata and scsi.
> >
> [...]
> >   To fix ata, use what appears to be a polling method if we're dumping,
> >   I stole this from tmm but added code to ensure that this change is
> >   only in effect while dumping.
> >
> >   Tested by: des
> >
> FWIW, if I propagate this change to the !dumping case, it also
> fixes the ``resume stucks in "ata1: resetting devices .."'' bug
> I was having with my ThinkPad 600X:
>
> %%%
> Index: ata-all.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/ata/ata-all.c,v
> retrieving revision 1.165
> diff -u -p -r1.165 ata-all.c
> --- ata-all.c	14 Feb 2003 13:10:40 -0000	1.165
> +++ ata-all.c	18 Feb 2003 10:08:22 -0000
> @@ -486,8 +486,7 @@ ata_getparam(struct ata_device *atadev,
>
>      /* apparently some devices needs this repeated */
>      do {
> -	if (ata_command(atadev, command, 0, 0, 0,
> -		dumping ? ATA_WAIT_READY : ATA_WAIT_INTR)) {
> +	if (ata_command(atadev, command, 0, 0, 0, ATA_WAIT_READY)) {
>  	    ata_prtdev(atadev, "%s identify failed\n",
>  		       command == ATA_C_ATAPI_IDENTIFY ? "ATAPI" : "ATA");
>  	    free(ata_parm, M_ATA);
> %%%

There is, or was, something near here that made the whole system
unresponsive (as seen by nfs clients) for several seconds.  I guess
the main problem was just the use of polled mode in all cases here.
In RELENG_4, polling is done at splbio(), so normally only disk devices
are blocked, but under -current almost everything is blocked by Giant.

> The resume session (with apm(4)) now looks like this:
>
> : cbb0: PCI Memory allocated: 50103000
> : cbb1: PCI Memory allocated: 50102000
> : pcm0: detached
> : csa: card is Thinkpad 600X/A20/T20
> : pcm0: <CS461x PCM Audio> on csa0
> : pcm0: <Cirrus Logic CS4297A ac97 codec>
> : wakeup from sleeping state (slept 00:00:10)
> : ata0: resetting devices ..
> : done
> : ata1: resetting devices ..
> : ata1-slave: timeout waiting for cmd=ec s=01 e=24
> : ata1-slave: ATA identify failed
> : done

Apparently either the timeout is too short or the interrupt got lost.
Too short a timeout seems more likely.  It is 10 seconds, but IIRC the
spec says 30 seconds for reset of the master and a bit more for the
slave.  Since things work with polling, we know that the device state
changed properly.  We could test for this state change instead of
always aborting after the timeout, and use more, finer-grained sleeps
to determine the precise timeout required.
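
A rough userland sketch of that idea, not the ata(4) code: poll a
status value in small steps up to a generous limit and report how long
the device really took.  All names here (sim_dev, sim_read_status,
wait_ready) are made up for illustration, the clock is simulated so
that the result is deterministic, and the two status bits only mirror
the usual ATA BSY/DRDY positions.

```c
#include <stdint.h>

#define SIM_S_BUSY	0x80	/* same bit position as ATA BSY */
#define SIM_S_READY	0x40	/* same bit position as ATA DRDY */

/* Hypothetical device that stays busy until ready_after_ms has passed. */
struct sim_dev {
	uint32_t ready_after_ms;
};

/* Stand-in for reading the status register at simulated time now_ms. */
static uint8_t
sim_read_status(const struct sim_dev *dev, uint32_t now_ms)
{
	return (now_ms < dev->ready_after_ms ? SIM_S_BUSY : SIM_S_READY);
}

/*
 * Check the status every step_ms until it shows ready and not busy, or
 * until max_ms has passed.  Returns 0 and stores the time of the
 * successful check in *elapsed_ms, or returns -1 on timeout.  Real code
 * would sleep for step_ms between checks instead of advancing a
 * simulated clock.
 */
static int
wait_ready(const struct sim_dev *dev, uint32_t step_ms, uint32_t max_ms,
    uint32_t *elapsed_ms)
{
	uint32_t t;
	uint8_t status;

	if (step_ms == 0)
		return (-1);
	for (t = 0; t <= max_ms; t += step_ms) {
		status = sim_read_status(dev, t);
		if ((status & SIM_S_BUSY) == 0 && (status & SIM_S_READY) != 0) {
			*elapsed_ms = t;
			return (0);
		}
	}
	return (-1);
}
```

With a simulated device that needs 12 seconds, a 10 second limit fails
but a 31 second limit succeeds and reports 12000 ms, which is the sort
of number that would tell us what the timeout ought to be.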

Bruce
