Date:      Thu, 30 May 2024 06:42:07 +0000
From:      "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: <CAG6t_XAcUDK%2BpPHiUZ9Bwu2fE5wg6vwK_zcuEYe94sb15HnUPg@mail.gmail.com>
References:  <CAG6t_XAcUDK%2BpPHiUZ9Bwu2fE5wg6vwK_zcuEYe94sb15HnUPg@mail.gmail.com>

Kumara Babu writes:

> We perhaps could gracefully handle such lengthy buffer IO operations by
> adding a timeout in bwait() - like say 10 minutes. If the buffer IO is not
> completed in a few mins, it probably would not complete forever and/or
> would be slowing down the entire system. So it is better to stop such
> faulty IO operations.

I agree that the symptoms are bad, but I disagree with putting a workaround
in bread(), because you get system corruption if the I/O operation
completes anyway after the timeout.
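A userland sketch of that hazard, assuming a hypothetical buffer type `struct sim_buf` whose memory is the DMA destination, and a hypothetical `naive_timed_wait()` that stands in for a timeout added to bwait(). The waiter gives up, the caller reuses the buffer, and the late "DMA" completion silently overwrites it. A poll count is used in place of a real timed sleep; none of these names are FreeBSD APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a buffer whose memory is the DMA destination. */
struct sim_buf {
    char data[16];
    bool done;
};

/* Simulated device completion: the "DMA" writes into the destination,
 * regardless of whether anybody is still waiting for it. */
static void sim_dma_complete(struct sim_buf *bp, const char *payload)
{
    assert(bp != NULL);
    strncpy(bp->data, payload, sizeof(bp->data) - 1);
    bp->data[sizeof(bp->data) - 1] = '\0';
    bp->done = true;
}

/* Naive timed wait: gives up after 'ticks' polls, but does nothing to
 * stop the I/O. On -1 the device still owns bp->data. */
static int naive_timed_wait(struct sim_buf *bp, int ticks)
{
    for (int i = 0; i < ticks; i++)
        if (bp->done)
            return 0;
    return -1;
}
```

Usage: after `naive_timed_wait()` returns -1, a caller that writes, say, "metadata" into `b.data` and then sees `sim_dma_complete(&b, "late DMA")` run ends up with "late DMA" where its own data used to be; that is the corruption described above.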

The fundamental issue with timing out I/O is stopping the operation
in progress.

If you say "I'm not waiting for this any more", you have to sequester
the destination of the I/O operation until you have 100% confirmation
that the operation has either completed or been successfully neutered.
(As a policy choice, you may also want to write-protect the source.)
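The sequestering rule can be sketched as a small state machine. Everything here is hypothetical (`struct qbuf`, the `BUF_*` states, the `qbuf_*` functions): a waiter that gives up moves the buffer to a sequestered state instead of releasing it, optionally write-protecting it if it was the source of a write, and only a confirmation from the device side returns it to the free pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer states. */
enum buf_state { BUF_FREE, BUF_IN_FLIGHT, BUF_SEQUESTERED };

struct qbuf {
    enum buf_state state;
    bool write_protected;   /* optional policy: freeze the source of a write */
};

/* Start I/O: from now on the device owns the buffer memory. */
static void qbuf_start(struct qbuf *bp)
{
    assert(bp->state == BUF_FREE);
    bp->state = BUF_IN_FLIGHT;
}

/* The waiter gives up. It may return an error to its caller,
 * but the buffer is NOT released for reuse. */
static void qbuf_abandon(struct qbuf *bp, bool is_write)
{
    assert(bp->state == BUF_IN_FLIGHT);
    bp->state = BUF_SEQUESTERED;
    if (is_write)
        bp->write_protected = true;
}

/* Called only once the device side confirms the operation has
 * completed or was successfully aborted. */
static void qbuf_confirm_dead(struct qbuf *bp)
{
    assert(bp->state == BUF_IN_FLIGHT || bp->state == BUF_SEQUESTERED);
    bp->state = BUF_FREE;
    bp->write_protected = false;
}

static bool qbuf_reusable(const struct qbuf *bp)
{
    return bp->state == BUF_FREE;
}
```

The point of the design is that "timed out" and "reusable" are separate transitions: the first is under the waiter's control, the second is not.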

This is why hi-rel systems never allow direct(-mapped) I/O: By
insisting that data go through dedicated I/O buffers, failing buffers
can be sequestered as long as necessary, without complicating the
application logic.
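A minimal sketch of the dedicated-buffer idea, assuming a hypothetical `struct iobuf` that is the only memory the device may touch. Application memory is filled by a copy only after the I/O has completed; on a timeout the dedicated buffer is marked sequestered and the application's memory is never touched, so a late completion can only land in the buffer that was set aside.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define IOBUF_SIZE 32

/* Hypothetical dedicated I/O buffer: the only memory the device may touch. */
struct iobuf {
    char dma_area[IOBUF_SIZE];
    bool completed;
    bool sequestered;       /* kept out of the free pool until confirmed dead */
};

/* Copy into application memory only after confirmed completion.
 * Returns 0 on success, -1 if the I/O has not completed. */
static int iobuf_deliver(struct iobuf *io, char *app_dst, size_t len)
{
    if (!io->completed) {
        io->sequestered = true;  /* set the failing buffer aside */
        return -1;               /* application memory untouched */
    }
    assert(len <= IOBUF_SIZE);
    memcpy(app_dst, io->dma_area, len);
    return 0;
}

/* Simulated device completion: it can only ever write into the
 * dedicated buffer, never into application memory. */
static void iobuf_device_write(struct iobuf *io, const char *payload, size_t len)
{
    assert(len <= IOBUF_SIZE);
    memcpy(io->dma_area, payload, len);
    io->completed = true;
}
```

The cost is an extra copy per I/O, which is exactly what direct-mapped I/O is designed to avoid; the benefit is that the application never has to know about sequestered buffers.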

Before Virtual Memory, the UNIX buffer-cache worked that way, and
MERT did that.  (MERT = Early five-nines UNIX for telephone switches.)

Between "intelligent I/O controllers" with DMA access, virtual
memory and direct-mapped I/O, we /have/ to make sure the underlying
I/O operation is /guaranteed/ dead, before we wake up the thread.

The only place that can and should happen is in the device driver,
possibly assisted by infrastructure such as CAM.

You need to find out which device driver is ultimately responsible
for the hanging bread(), because that's where the timeout should
happen.
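A sketch of where such a driver-level timeout might sit, using only hypothetical names (`struct drv_cmd`, `hw_request_abort()`, `drv_watchdog()`); it does not model any real FreeBSD driver or CAM interface. A periodic watchdog notices an overdue command and asks the controller to abort it, but reports the command as finished (with `ETIMEDOUT`) only after the hardware has confirmed it no longer touches the buffer. Only then would the waiter be woken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver-side command record. */
struct drv_cmd {
    int  deadline;      /* tick at which the command is overdue */
    bool abort_sent;
    bool hw_idle;       /* controller confirms it no longer touches the buffer */
    bool completed;
    int  error;         /* reported to the waiter, 0 on success */
};

/* Hypothetical controller hook: request an abort; the ack arrives later. */
static void hw_request_abort(struct drv_cmd *c)
{
    c->abort_sent = true;
}

/* Watchdog tick. The timeout lives in the driver, but the command is
 * declared finished only once the hardware confirms it is dead.
 * Returns true when the waiter may be woken. */
static bool drv_watchdog(struct drv_cmd *c, int now)
{
    if (c->completed)
        return true;
    if (now >= c->deadline && !c->abort_sent)
        hw_request_abort(c);
    if (c->abort_sent && c->hw_idle) {
        c->completed = true;
        c->error = ETIMEDOUT;
        return true;
    }
    return false;
}
```

Usage: with `deadline = 10`, a tick at 5 does nothing, a tick at 10 sends the abort but still returns false, and only after the simulated acknowledgement (`hw_idle = true`) does a later tick return true with `error == ETIMEDOUT`.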

Poul-Henning
 
-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


