Date:      Sun, 17 Oct 1999 22:00:23 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        sos@freebsd.org
Cc:        alpha@freebsd.org, "Erik H. Bakke" <erik@habatech.no>
Subject:   workaround for ata driver woes on alpha 
Message-ID:  <14346.31193.248797.237477@grits.cs.duke.edu>


Søren,

There's a problem with the ata-driver on alphas.  Under heavy disk
load, the machine will complain "ad_timeout: lost disk contact -
resetting" and then promptly panic & leave something like the
following stack trace:

panic: trap
#0  0xfffffc0000386c2c in boot (howto=260) at ../../kern/kern_shutdown.c:278
278                     savectx(&dumppcb);
(kgdb) bt
#0  0xfffffc0000386c2c in boot (howto=260) at ../../kern/kern_shutdown.c:278
#1  0xfffffc0000344530 in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0x0)
    at ../../ddb/db_command.c:532
#2  0xfffffc00003441a4 in db_command (last_cmdp=0xfffffc00005b1a60,
    cmd_table=0x0, aux_cmd_tablep=0xfffffc00005d6990)
    at ../../ddb/db_command.c:333
#3  0xfffffc0000344320 in db_command_loop () at ../../ddb/db_command.c:455
#4  0xfffffc0000347ff8 in db_trap (type=0, code=0) at ../../ddb/db_trap.c:71
#5  0xfffffc00005051c8 in kdb_trap (a0=1, a1=1, a2=9600, entry=3,
    regs=0xfffffe0011955500) at ../../alpha/alpha/db_interface.c:194
#6  0xfffffc0000512d58 in trap (a0=1, a1=15, a2=9600, entry=3,
    framep=0xfffffe0011955500) at ../../alpha/alpha/trap.c:285
#7  0xfffffc0000505ad0 in XentIF () at ../../alpha/alpha/exception.s:63
#8  0xfffffc000050538c in Debugger (msg=0x0)
    at ../../alpha/alpha/db_interface.c:256
#9  0xfffffc0000387354 in panic (fmt=0xfffffc00005a76fc "trap")
    at ../../kern/kern_shutdown.c:528
#10 0xfffffc00005131ec in trap (a0=40, a1=1, a2=0, entry=2,
    framep=0xfffffe0011955740) at ../../alpha/alpha/trap.c:530
#11 0xfffffc0000505b2c in XentMM () at ../../alpha/alpha/exception.s:94
#12 0xfffffc0000523b04 in ad_transfer (request=0xfffffe00087e3c00)
    at ../../dev/ata/ata-disk.c:431
#13 0xfffffc0000521d38 in ata_start (scp=0xfffffe0008713400)
    at ../../dev/ata/ata-all.c:583
#14 0xfffffc0000522338 in ata_reinit (scp=0xfffffe0008713400)
    at ../../dev/ata/ata-all.c:716
#15 0xfffffc000052448c in ad_timeout (request=0xfffffe00087e3c00)
    at ../../dev/ata/ata-disk.c:648
#16 0xfffffc000039025c in softclock () at ../../kern/kern_timeout.c:131
#17 0xfffffc0000376d70 in hardclock (frame=0xfffffe00119559e0)
    at ../../kern/kern_clock.c:253
#18 0xfffffc000051564c in handleclock (arg=0xfffffe00119559e0)
    at ../../alpha/alpha/clock.c:266
#19 0xfffffc0000513e34 in interrupt (a0=0, a1=1536, a2=18446739675668704635,
    framep=0xfffffe00119559e0) at ../../alpha/alpha/interrupt.c:101
#20 0xfffffc0000505afc in XentInt () at ../../alpha/alpha/exception.s:78

I admit to not understanding callouts, so you might want to take this
theory with a grain of salt:

I believe what is happening is that ad_timeout() gets called (quite
prematurely) at spl0.  While ad_timeout() is executing, the interrupt
for the request in question comes in.  The interrupt handler frees the
request that the ad_timeout() call chain is currently operating on (or
otherwise messes with it).  The request is then corrupted, and chaos (a
machine check, or a trap for an invalid access) ensues.  I'm tempted
to wrap ad_timeout() in splbio(), but there would still be a window,
while ad_callout() is being called, during which we are at spl0 (is
that right?  Is it called at spl0?  This is the part I don't know.)
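To make the suspected race concrete, here is a minimal single-threaded model in C, assuming that raising the level with splbio() defers the disk interrupt until splx() drops back to spl0.  All names (cur_spl, sim_splbio(), sim_splx(), sim_ad_timeout(), struct request, and so on) are hypothetical; this is not the ata driver's code or the kernel's spl implementation.  It only shows why a timeout handler running at spl0 can see the request change underneath it (as in the ad_timeout -> ata_reinit -> ata_start -> ad_transfer chain in the trace), and how wrapping the handler in splbio() defers the interrupt instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of spl-style interrupt masking.
 * cur_spl == 0 plays spl0; cur_spl == 1 plays splbio(). */
static int cur_spl;
static bool intr_pending;        /* disk interrupt held off by splbio */

struct request {                 /* hypothetical, not the driver's struct */
    bool done;                   /* set by the simulated interrupt handler */
    int  retries;                /* bumped by the simulated timeout handler */
};

static struct request *active_req;

/* Simulated disk interrupt handler: completes the active request.
 * (The real handler may free it; this model only marks it done.) */
static void sim_ad_interrupt(void)
{
    if (active_req != NULL)
        active_req->done = true;
}

/* Deliver the interrupt now, or defer it while splbio is held. */
static void sim_raise_interrupt(void)
{
    if (cur_spl >= 1)
        intr_pending = true;
    else
        sim_ad_interrupt();
}

/* Raise to "splbio" and return the previous level. */
static int sim_splbio(void)
{
    int s = cur_spl;

    cur_spl = 1;
    return s;
}

/* Restore the old level; a deferred interrupt runs once we drop to spl0. */
static void sim_splx(int s)
{
    cur_spl = s;
    if (cur_spl == 0 && intr_pending) {
        intr_pending = false;
        sim_ad_interrupt();
    }
}

/* Simulated timeout handler.  The disk interrupt arrives between
 * reading the request state and acting on it.  Returns true if the
 * state was unchanged at that point (a consistent view). */
static bool sim_ad_timeout(struct request *req, bool use_splbio)
{
    int s = 0;
    bool saw_done, consistent;

    assert(req != NULL);
    if (use_splbio)
        s = sim_splbio();
    saw_done = req->done;
    sim_raise_interrupt();              /* interrupt shows up mid-handler */
    consistent = (req->done == saw_done);
    if (!saw_done)
        req->retries++;                 /* act on what we saw, e.g. reset and retry */
    if (use_splbio)
        sim_splx(s);
    return consistent;
}
```

Called with use_splbio false, sim_ad_timeout() returns false because the interrupt completes the request in the middle of the handler; with use_splbio true it returns true, and the interrupt only runs inside sim_splx().  As the message itself points out, this does nothing about a window before the handler raises the level, and the model marks the request done rather than freeing it.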

Anyway, we see this on the alpha because the timeout is hardcoded to
fire after 300 ticks.  This is roughly 3 seconds on an x86 (typically
hz<=128), but less than 1/3 of a second on an alpha (typically
hz>=1024).  The following patch levels the playing field by expressing
the timeout as 3*hz, and seems to "fix" the problem on alpha (at least
I'm now able to untar ports and then rm -rf the tree).
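The timing difference is plain arithmetic.  A small sketch, with hypothetical helper names (ticks_to_ms(), secs_to_ticks()) that are not kernel API:

```c
#include <assert.h>

/* Hypothetical helper: convert a tick count to milliseconds for a
 * given clock interrupt rate hz.  Integer division truncates. */
static long ticks_to_ms(long ticks, long hz)
{
    assert(hz > 0);
    return ticks * 1000 / hz;
}

/* Hypothetical helper: tick count for a timeout of `seconds`.  This is
 * the 3*hz form from the patch, which means 3 seconds for any hz. */
static long secs_to_ticks(long seconds, long hz)
{
    assert(hz > 0);
    return seconds * hz;
}
```

For example, ticks_to_ms(300, 100) is 3000 and ticks_to_ms(300, 1024) is 292, while ticks_to_ms(secs_to_ticks(3, hz), hz) comes back to 3000 for either rate, which is why the patch replaces the literal 300 with 3*hz.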

Index: sys/dev/ata/ata-disk.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/ata/ata-disk.c,v
retrieving revision 1.31
diff -u -r1.31 ata-disk.c
--- ata-disk.c  1999/10/10 18:08:36     1.31
+++ ata-disk.c  1999/10/18 01:13:48
@@ -417,7 +417,7 @@
     if (request->donecount == 0) {
 
        /* start timeout for this transfer */
-       request->timeout_handle = timeout((timeout_t*)ad_timeout, request, 300);
+       request->timeout_handle = timeout((timeout_t*)ad_timeout, request, 3*hz);
 
        /* setup transfer parameters */
        count =3D howmany(request->bytecount, DEV_BSIZE);


Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University                         Email: gallatin@cs.duke.edu
Department of Computer Science          Phone: (919) 660-6590

