Date:      Wed, 2 May 2007 13:14:14 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Nate Lawson <nate@root.org>
Cc:        cvs-src@freebsd.org, Darren Reed <darrenr@hub.freebsd.org>, src-committers@freebsd.org, cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/kern kern_intr.c src/sys/sys interrupt.h
Message-ID:  <200705021314.15733.jhb@freebsd.org>
In-Reply-To: <4638BE29.1020505@root.org>
References:  <200705020615.l426FDo7015874@repoman.freebsd.org> <4638BAC9.7000603@root.org> <4638BE29.1020505@root.org>

On Wednesday 02 May 2007 12:36:57 pm Nate Lawson wrote:
> Nate Lawson wrote:
> > John Baldwin wrote:
> >> On Wednesday 02 May 2007 03:07:07 am Darren Reed wrote:
> >>> On Wed, May 02, 2007 at 06:15:13AM +0000, Nate Lawson wrote:
> >>>> njl         2007-05-02 06:15:13 UTC
> >>>>
> >>>>   FreeBSD src repository
> >>>>
> >>>>   Modified files:        (Branch: RELENG_6)
> >>>>     sys/kern             kern_intr.c 
> >>>>     sys/sys              interrupt.h 
> >>>>   Log:
> >>>>   MFC: rate-check the interrupt storm message and bump the counter 500 -> 1000
> >>> Is this number, "500" or "1000" somehow "magical" for modern hardware?
> >>>
> >>> If I had 500MHz, 1GHz, 1.5GHz, 2GHz, and 2.5GHz machines, each with the
> >>> appropriate architecture, what would the correct value for this be?
> >>> Is it always 1000 or should it be calculated?
> >> It's a SWAG and tunable for machines where it doesn't work.  In practice the
> >> old setting seemed to be a bit too trigger-happy as I know my printer always
> >> triggered it, for example.
> >>
> > 
> > There's more to it than just your GHz number.  It's a counter of the
> > number of times an interrupt has triggered while the previous one was
> > being serviced.  The faster your kernel, the lower the number could be.
> > 
> > I have a slow early SMP Celeron system with a dc(4) adapter with 4 ports
> > sharing an irq with my ata.  At 3 am, the nightly script kicks off
> > enough IO that it triggers a bug in my dc(4) card that causes it to mask
> > the interrupt too long.  Then the irq storm suppression logic kicks
> > in, causing ata to time out the request.  The drive is on a mirror so I'd
> > lose half the mirror, then rebuild in the morning.  With this value
> > bumped, I don't have that problem any more but the real issue is why
> > dc(4) is being so quirky under heavy shared irq load.
> > 
> 
> This is on 6.x btw.  Is there any reason why our retry count is so low?
> 
> sys/dev/ata/ata-disk.c:    request->retries = 2;

At work we up the timeout from 5 to 30, but we leave retries at 2.

> Note that I still got a timeout but it succeeded without error.  I think
> this is a combination of the dc(4) and highpoint hpt366 driver
> interaction.  dc(4) is probably holding Giant or something too long and
> ata is being too sensitive to the slow hw.

Neither dc(4) nor ata(4) hold Giant, FWIW.

-- 
John Baldwin


