Date:      Mon, 18 Oct 1999 09:37:04 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        mjacob@feral.com
Cc:        alpha@freebsd.org
Subject:   Re: workaround for ata driver woes on alpha 
Message-ID:  <14347.6330.820928.627692@grits.cs.duke.edu>
In-Reply-To: <Pine.BSF.4.10.9910171902490.77636-100000@beppo.feral.com>
References:  <14346.31193.248797.237477@grits.cs.duke.edu> <Pine.BSF.4.10.9910171902490.77636-100000@beppo.feral.com>


Matthew Jacob writes:
 > 
 > actually, this happens right away for me w/o even heavy disk load.
 > 
 > callouts should not be happening at spl0, should they? shouldn't they
 > happen at the lowest level that is *not* non-interrupt context? At any
 > rate, callouts have to always protect the resources they might share with
 > interrupts (whether via spls or locks) like any other shared resource
 > problem, no?
 > 
 > -matt

<I hope you don't mind me CC'ing this to -alpha, but I want to hear
what other people have to say>

Sorry, spl0 was a typo.  I'd meant to say softclock, i.e.,
ALPHA_PSL_IPL_SOFT.  hardclock() calls (void)splsoftclock(); to set
the ipl to ALPHA_PSL_IPL_SOFT just before calling softclock().
softclock() itself goes to splhigh() while it processes the
callouts, but just before it calls the timeout function it lowers the
ipl back to ALPHA_PSL_IPL_SOFT:

    splx(s);
    c_func(c_arg);
    s = splhigh();
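
To make the ordering concrete, here is a small user-space model of that
splx/c_func/splhigh sequence.  It is only a sketch: cur_ipl,
splhigh_sim(), splx_sim(), record_ipl() and softclock_sim() are names
made up for illustration, not kernel interfaces, and the model assumes
softclock is entered at ALPHA_PSL_IPL_SOFT as described above.

```c
#define IPL_SOFT 1   /* mirrors ALPHA_PSL_IPL_SOFT */
#define IPL_HIGH 6   /* mirrors ALPHA_PSL_IPL_HIGH */

/* Hypothetical stand-in for the CPU's current interrupt priority level. */
int cur_ipl = IPL_SOFT;

/* IPL observed by the most recent callout, -1 if none has run yet. */
int ipl_seen_by_callout = -1;

/* Raise to IPL_HIGH and return the previous level (models splhigh()). */
int splhigh_sim(void)
{
    int old = cur_ipl;
    cur_ipl = IPL_HIGH;
    return old;
}

/* Restore a previously saved level (models splx()). */
void splx_sim(int s)
{
    cur_ipl = s;
}

/* Example callout: just records the IPL it was called at. */
void record_ipl(void *arg)
{
    (void)arg;
    ipl_seen_by_callout = cur_ipl;
}

/* Models one pass of the callout loop for a single expired entry. */
int softclock_sim(void (*c_func)(void *), void *c_arg)
{
    int s = splhigh_sim();   /* callout list is walked at IPL_HIGH */

    splx_sim(s);             /* drop back to the entry level ...    */
    c_func(c_arg);           /* ... so the callout runs at IPL_SOFT */
    s = splhigh_sim();

    splx_sim(s);             /* leave at the level we entered with  */
    return ipl_seen_by_callout;
}
```

In this model, softclock_sim(record_ipl, NULL) reports IPL_SOFT, i.e. the
timeout function never runs with the protection of the splhigh() that
surrounds it.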


According to alpha/include/alpha_cpu.h:

#define ALPHA_PSL_IPL_0         0x0000          /* all interrupts enabled */
#define ALPHA_PSL_IPL_SOFT      0x0001          /* software ints disabled */
#define ALPHA_PSL_IPL_IO        0x0004          /* I/O dev ints disabled */
#define ALPHA_PSL_IPL_CLOCK     0x0005          /* clock ints disabled */
#define ALPHA_PSL_IPL_HIGH      0x0006          /* all but mchecks disabled */

So we should be open to device interrupts when the ad_timeout()
routine is called.
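
As a sanity check on that reading of the table, here is a tiny sketch.
The constants are copied from the excerpt above; the deliverable()
helper and the rule it encodes (an interrupt is taken only when its
level is above the current ipl) are my assumptions about how the levels
compare, not code from the tree.

```c
/* Values copied from the alpha/include/alpha_cpu.h excerpt. */
#define ALPHA_PSL_IPL_0         0x0000
#define ALPHA_PSL_IPL_SOFT      0x0001
#define ALPHA_PSL_IPL_IO        0x0004
#define ALPHA_PSL_IPL_CLOCK     0x0005
#define ALPHA_PSL_IPL_HIGH      0x0006

/*
 * Hypothetical helper: returns 1 if an interrupt at intr_level would be
 * delivered while the CPU sits at cur_ipl, 0 if it would be held off.
 * Assumes an interrupt is blocked when cur_ipl >= intr_level.
 */
int deliverable(int cur_ipl, int intr_level)
{
    return intr_level > cur_ipl;
}
```

Under that assumption, deliverable(ALPHA_PSL_IPL_SOFT, ALPHA_PSL_IPL_IO)
is 1 while deliverable(ALPHA_PSL_IPL_IO, ALPHA_PSL_IPL_IO) is 0, which
is the difference between running at splsoftclock and running at
splbio().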

I'm pretty sure this is what is happening for the following reason:

I bzero() the request structure in ad_interrupt() just prior to
freeing it (around line 593 of ata-disk.c).  Then at the top of
ad_timeout() I print out some values from the request. 

When ad_timeout() is called, I see non-zero values for the request
fields that I have printed, but my panic changes from a machine check
to a memory access fault when attempting to deref a pointer in the
request struct that was bzeroed.  If, after the crash, I examine the
fields that I'd printed out, they are now zero.  However, not all the
fields are zero -- request->retries == 1, for example. I got the
crashdump via 'call boot(RB_NOSYNC|RB_DUMP)' in the debugger, so I do
not think that sync'ing disks changed anything.

I think there is a serious problem with the ad_timeout() function in
the case where the request has actually completed & the timeout was
too short.  ad_timeout() has no way to know if the request it has been
passed is still valid, or has been deallocated.  Wrapping the function
in splbio() will only narrow the race, not close it, because we're
still going to be at splsoftclock when the function is called.  I
think setting the timeout to a reasonable value is a good workaround,
but I'm still concerned about very slow hardware.
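
To illustrate what "no way to know if the request is still valid" could
look like if it were fixed, here is one possible scheme, sketched in
user space.  This is NOT what the ata driver does and not a proposed
patch: the slot pool, the generation counter and every name below
(req_alloc, req_arm_timeout, req_complete, req_timeout) are
hypothetical.  The idea is that the timeout is handed a (slot,
generation) pair instead of a raw pointer, so a late timeout can detect
that the request it was armed for has already completed.  In a kernel,
the check in req_timeout() and the release in req_complete() would both
still have to run at splbio() for this to be safe.

```c
#include <stddef.h>

#define MAX_REQ 8

struct request {
    int      tag;      /* index of this slot in the pool           */
    unsigned gen;      /* bumped every time the slot is released   */
    int      in_use;   /* 1 while a transfer is outstanding        */
    int      retries;
};

/* Static pool: slots are recycled, never returned to a free list. */
static struct request slots[MAX_REQ];

/* What the timeout is armed with instead of a struct request pointer. */
struct timeout_handle {
    int      tag;
    unsigned gen;
};

/* Grab the first free slot, or NULL if all are busy. */
struct request *req_alloc(void)
{
    for (int i = 0; i < MAX_REQ; i++) {
        if (!slots[i].in_use) {
            slots[i].tag = i;
            slots[i].in_use = 1;
            slots[i].retries = 0;
            return &slots[i];
        }
    }
    return NULL;
}

/* Snapshot the identity of the request at the time the timeout is set. */
struct timeout_handle req_arm_timeout(const struct request *r)
{
    struct timeout_handle h = { r->tag, r->gen };
    return h;
}

/* Completion path (the interrupt handler's role): release the slot. */
void req_complete(struct request *r)
{
    r->in_use = 0;
    r->gen++;           /* invalidates any handle armed earlier */
}

/*
 * Timeout path: returns 1 if it acted on a live request, 0 if the
 * handle was stale (request completed, or slot since reused).
 */
int req_timeout(struct timeout_handle h)
{
    struct request *r = &slots[h.tag];

    if (!r->in_use || r->gen != h.gen)
        return 0;
    r->retries++;
    return 1;
}
```

With this, a timeout that fires after req_complete() returns 0 and
touches nothing, even if the slot has already been handed out again for
a new transfer.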

Drew
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590






