Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 13 Nov 2004 13:15:46 +0100
From:      =?ISO-8859-1?Q?S=F8ren_Schmidt?= <sos@DeepCore.dk>
To:        Zoltan Frombach <tssajo@hotmail.com>
Cc:        Robert Watson <rwatson@freebsd.org>
Subject:   Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout
Message-ID:  <4195FAF2.4050409@DeepCore.dk>
In-Reply-To: <BAY2-DAV4EoPXCFbN4T00016204@hotmail.com>
References:  <26249.1100342074@critter.freebsd.dk> <4195E5DB.2070302@DeepCore.dk> <BAY2-DAV4EoPXCFbN4T00016204@hotmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Zoltan Frombach wrote:
> I will apply this patch first thing tomorrow. But I don't see how will =
I=20
> see any difference? Does it put something into a log file? Shouldn't it=
?

The change is that if you see the "WARNING interrupt seen" we know that=20
it was the upper layers that used up the 5 secs of timeout, not that
some of it was used by a disk being slow to respond.

-S=F8ren

> Zoltan
>=20
> Poul-Henning Kamp wrote:
>=20
>> In message <4195E1FF.5090906@DeepCore.dk>,=20
>> =3D?ISO-8859-1?Q?S=3DF8ren_Schmidt?=3D wri
>> tes:
>>
>>
>>>>> Timeout is 5 secs, which is a pretty long time in this context IMHO=
=2E.
>>>>
>>>>
>>>> Five seconds counted from when ?
>>>
>>>
>>> Now thats the nasty part :)
>>> ATA starts the timeout when the request is issued to the device, so=20
>>> theoretically the disk could take 4.9999 secs to complete the request=
=20
>>> and then the timeout fires before the taskqueue gets its chance at=20
>>> it, but IMHO thats pretty unlikely...
>>
>>
>> I find that far more likely than kernel threads being stalled for that=

>> long.  ATA disks doing bad-block stuff takes several seconds on some
>> of the disks I've had my hands on.
>>
>>> Anyhow, I can just remove the warning from ATA if that makes anyone=20
>>> happy, since its just a warning and ATA doesn't do anything with it=20
>>> at all.
>>> However, IMNHO this points at a problem somewhere that we should=20
>>> better understand and fix instead.
>>
>>
>> I would prefer you reset the timer to five seconds in your interrupt
>> routine so we can see exactly on which side of that the time is spent.=

>=20
>=20
> It would be even better to time how long both ops take and be able to
> get that via a sysctl or something (I have that on my TODO list but its=

> loooong :) ).
>=20
> Anyhow resetting it is easy (patch against 5.3R):
>=20
> Index: ata-queue.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /home/ncvs/src/sys/dev/ata/ata-queue.c,v
> retrieving revision 1.32.2.5
> diff -u -r1.32.2.5 ata-queue.c
> --- ata-queue.c 24 Oct 2004 09:27:37 -0000      1.32.2.5
> +++ ata-queue.c 13 Nov 2004 10:44:40 -0000
> @@ -216,6 +216,9 @@
>         ata_completed(request, 0);
>      }
>      else {
> +       if (!dumping)
> +           callout_reset(&request->callout, request->timeout * hz,
> +                         (timeout_t*)ata_timeout, request);
>         if (request->bio && !(request->flags & ATA_R_TIMEOUT)) {
>             ATA_DEBUG_RQ(request, "finish bio_taskqueue");
>             bio_taskqueue(request->bio, (bio_task_t *)ata_completed,
> request);
>=20



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4195FAF2.4050409>