Date:      Sat, 15 Apr 2017 07:36:35 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Edward Tomasz Napierala <trasz@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r316941 - head/sys/kern
Message-ID:  <20170415064658.L4428@besplex.bde.org>
In-Reply-To: <201704142015.v3EKFYWA017623@repo.freebsd.org>
References:  <201704142015.v3EKFYWA017623@repo.freebsd.org>

On Fri, 14 Apr 2017, Edward Tomasz Napierala wrote:

> Log:
>  Don't try to write out bufs that have already failed with ENXIO.
>  This fixes some panics after disconnecting mounted disks.
>
>  Submitted by:	imp (slightly different version, which I've then lost)
>  Reviewed by:	kib, imp, mckusick
>  MFC after:	2 weeks
>  Differential Revision:	https://reviews.freebsd.org/D9674
>
> Modified:
>  head/sys/kern/vfs_bio.c
>
> Modified: head/sys/kern/vfs_bio.c
> ==============================================================================
> --- head/sys/kern/vfs_bio.c	Fri Apr 14 20:15:17 2017	(r316940)
> +++ head/sys/kern/vfs_bio.c	Fri Apr 14 20:15:34 2017	(r316941)
> @@ -2290,18 +2290,28 @@ brelse(struct buf *bp)
> 		bdirty(bp);
> 	}
> 	if (bp->b_iocmd == BIO_WRITE && (bp->b_ioflags & BIO_ERROR) &&
> +	    (bp->b_error != ENXIO || !LIST_EMPTY(&bp->b_dep)) &&
> 	    !(bp->b_flags & B_INVAL)) {
> 		/*
> -		 * Failed write, redirty.  Must clear BIO_ERROR to prevent
> -		 * pages from being scrapped.
> +		 * Failed write, redirty.  All errors except ENXIO (which
> +		 * means the device is gone) are expected to be potentially
> +		 * transient - underlying media might work if tried again
> +		 * after EIO, and memory might be available after an ENOMEM.
> +		 *
> +		 * Do this also for buffers that failed with ENXIO, but have
> +		 * non-empty dependencies - the soft updates code might need
> +		 * to access the buffer to untangle them.
> +		 *
> +		 * Must clear BIO_ERROR to prevent pages from being scrapped.
> 		 */

This is hard to fix, but I have used a version that only retries after
EIO for 15-20 years.  I didn't think of ENOMEM.
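To make the difference between the two policies concrete, here is a minimal sketch, not the kernel code. The struct and function names are hypothetical stand-ins for the fields that brelse() tests, and the EIO-only variant is only an assumption about what such a local version might look like (in particular, it ignores soft updates dependencies).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state brelse() inspects. */
struct failed_write {
	int	error;		/* models bp->b_error */
	bool	has_deps;	/* models !LIST_EMPTY(&bp->b_dep) */
	bool	invalid;	/* models bp->b_flags & B_INVAL */
};

/* Policy in r316941: redirty unless the error is ENXIO with no deps. */
static bool
redirty_r316941(const struct failed_write *w)
{
	return ((w->error != ENXIO || w->has_deps) && !w->invalid);
}

/* Assumed shape of an EIO-only policy: retry after EIO and nothing else. */
static bool
redirty_eio_only(const struct failed_write *w)
{
	return (w->error == EIO && !w->invalid);
}

/* A few cases showing where the two policies agree and differ. */
static void
policy_examples(void)
{
	struct failed_write gone = { ENXIO, false, false };
	struct failed_write nomem = { ENOMEM, false, false };
	struct failed_write io = { EIO, false, false };

	assert(!redirty_r316941(&gone) && !redirty_eio_only(&gone));
	assert(redirty_r316941(&nomem) && !redirty_eio_only(&nomem));
	assert(redirty_r316941(&io) && redirty_eio_only(&io));
}
```

The two sketches differ only for errors other than EIO and ENXIO (ENOMEM here), and for ENXIO buffers that still have dependencies.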

The media is unlikely to come back after EIO either.  For removable media,
you might be able to get the write done to new media, but a panic reading
from the new media is just as likely.  GEOM "tasting" might prevent the
new media from being used.

ENXIO is actually the one error that can often be recovered from.  I
wrote a form of "tasting" in a toy OS 30-35 years ago.  It handled
removal of "mounted" disks with pending writes too well, in a way that
made recovery from non-transient I/O errors almost impossible without
turning off the system.  ENXIO was treated as a transient I/O error.
It was recovered from perfectly if the user could find the original
media and unremove it.  The "tasting" usually worked to detect different
media and disallow writing cached data to a different disk.  Media
errors were common, and when one occurred for writing, the method of
replacing the disk by a garbage one didn't work since it was a different
disk.  The most common one was writing to a write protected disk, and
that was recoverable by removing the write protection.  But often you
really didn't want to write to that disk, but wanted to write somewhere.
The only way to continue was to reboot to discard the write.
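The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reconstruction of the toy OS: the media identity fields, the struct layout, and the function name are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical media identity recorded by "tasting" at mount time. */
struct media_id {
	uint32_t	serial;
	uint32_t	label_sum;
};

/* Hypothetical view of what is currently in the drive. */
struct drive {
	bool		present;		/* false models ENXIO */
	bool		write_protected;
	struct media_id	id;
};

enum flush_action {
	FLUSH_WRITE,	/* safe to write the cached data now */
	FLUSH_HOLD	/* keep the data cached and retry later */
};

/* Decide what to do with cached writes destined for media `want`. */
static enum flush_action
flush_decide(const struct media_id *want, const struct drive *d)
{
	if (!d->present)
		return (FLUSH_HOLD);	/* removal is treated as transient */
	if (d->id.serial != want->serial ||
	    d->id.label_sum != want->label_sum)
		return (FLUSH_HOLD);	/* different disk: never write to it */
	if (d->write_protected)
		return (FLUSH_HOLD);	/* recoverable by unprotecting */
	return (FLUSH_WRITE);
}
```

There is deliberately no "discard" result in this sketch: every failure holds the data for a later retry, which is why a write that can never succeed leaves rebooting as the only way out.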

Bruce


