Date:      Fri, 05 Nov 2004 22:08:18 +0100
From:      Uwe Doering <gemini@geminix.org>
To:        Igor Sysoev <is@rambler-co.ru>
Cc:        stable@freebsd.org
Subject:   Re: vnode_pager_putpages errors and DOS?
Message-ID:  <418BEBC2.3020304@geminix.org>
In-Reply-To: <20041104124616.S92154@is.park.rambler.ru>
References:  <Pine.NEB.3.96L.1041009150440.93055O-100000@fledge.watson.org> <4168578F.7060706@geminix.org> <20041103191641.K63546@is.park.rambler.ru> <4189666A.9020500@geminix.org> <20041104124616.S92154@is.park.rambler.ru>

This is a multi-part message in MIME format.
--------------020708070405060102030405
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Igor Sysoev wrote:
> [...]
> I've tried your patch from the second email (it requires including
> <sys/conf.h> for devsw and D_DISK): the system also became unresponsive.
> 
> The main problem is that I could not kill the offending process - it
> was stuck in the biowr state.

In the meantime I've investigated this further.  The two patches I 
provided so far certainly have their merits, since they deal with some 
unwanted side effects.  However, I found that the root cause of the 
eventual system lock-up lies elsewhere.

In an earlier email I already pointed out that the function 
vnode_pager_generic_putpages() ignores whether the write operation 
failed or not.  It always returns VM_PAGER_OK.

Now, if the write operation succeeds, the file system code makes sure 
that the formerly dirty pages associated with the i/o buffer get 
marked clean.  If the write attempt fails, on the other hand, for 
instance in an out-of-disk-space situation, the pages are left dirty. 
At this point the syncer enters an infinite loop, trying to flush the 
same dirty pages to disk over and over again.

The fix is actually quite simple.  In case of a write error we have to 
make sure ourselves that the associated pages get marked clean.  We do 
this by returning VM_PAGER_BAD instead of VM_PAGER_OK.  These two result 
codes are functionally identical, with the exception that VM_PAGER_BAD 
additionally marks the respective page clean.  For the details, please 
have a look at the calling function vm_pageout_flush() in 'vm_pageout.c'.
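
As a rough illustration of the mechanism described above, here is a 
user-space toy model.  It is not kernel code: all toy_* names, the 
'patched' flag and the 'max_passes' limit are made up for this sketch, 
and the real logic lives in vnode_pager_generic_putpages() and 
vm_pageout_flush().  It only shows why pages that stay dirty keep the 
syncer busy, and how a status that causes the caller to mark them clean 
ends the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the two pager result codes (values are arbitrary). */
enum toy_status { TOY_PAGER_OK, TOY_PAGER_BAD };

struct toy_page {
	bool dirty;
};

/* Count the pages that still need to be written. */
static size_t
toy_count_dirty(const struct toy_page *pages, size_t n)
{
	size_t i, dirty = 0;

	for (i = 0; i < n; i++)
		if (pages[i].dirty)
			dirty++;
	return dirty;
}

/*
 * Toy write path.  On success the "file system" cleans the pages itself;
 * on failure it leaves them dirty.  Unpatched, the result is always
 * TOY_PAGER_OK; patched, a failed write yields TOY_PAGER_BAD.
 */
static enum toy_status
toy_putpages(struct toy_page *pages, size_t n, int error, bool patched)
{
	size_t i;

	assert(pages != NULL);
	if (error == 0)
		for (i = 0; i < n; i++)
			pages[i].dirty = false;
	return (error != 0 && patched) ? TOY_PAGER_BAD : TOY_PAGER_OK;
}

/*
 * Toy syncer.  Keeps flushing while dirty pages remain, giving up after
 * max_passes so that the demonstration terminates.  Returns the number
 * of passes made.
 */
static int
toy_sync(struct toy_page *pages, size_t n, int error, bool patched,
    int max_passes)
{
	size_t i;
	int passes = 0;

	while (toy_count_dirty(pages, n) > 0 && passes < max_passes) {
		passes++;
		/* The caller marks the pages clean only for the BAD status. */
		if (toy_putpages(pages, n, error, patched) == TOY_PAGER_BAD)
			for (i = 0; i < n; i++)
				pages[i].dirty = false;
	}
	return passes;
}
```

With a failing write (error != 0) and patched == false, toy_sync() only 
stops because max_passes runs out, and the pages are still dirty.  With 
patched == true it stops after a single pass.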

What this modification means is that in case of a write error the 
affected pages remain intact in memory until they get recycled, but we 
lose their contents as far as the copy on disk is concerned.  I believe 
this is acceptable (and possibly even originally intended) because 
giving up on syncing is about the best thing we can do in this 
situation, anyway.  And it is certainly a much better choice than 
halting the whole system due to an infinite loop.

I've attached an updated version of the patch for 'vnode_pager.c'.  On 
my test system it resolved the issue.  Please let us know whether it 
works for you as well.
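
The patch also limits each of the two printf() messages to at most one 
per second by remembering the time_second value of the last message, 
which should keep a persistent error from flooding the console.  The 
idea, reduced to a stand-alone sketch (toy_should_log() is a 
hypothetical helper for illustration only, the patch does the 
comparison inline):

```c
#include <stdbool.h>
#include <time.h>

/*
 * Returns true if a message may be logged at time 'now', that is, if no
 * message was logged during the same second.  '*last' holds the second
 * of the most recent message and is updated on success.
 */
static bool
toy_should_log(time_t *last, time_t now)
{
	if (*last == now)
		return false;
	*last = now;
	return true;
}
```

A caller would keep one 'last' variable per message type, as the patch 
does with last_elog and last_rlog.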

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net

--------------020708070405060102030405
Content-Type: text/plain;
 name="vnode_pager.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="vnode_pager.c.diff"

--- src/sys/vm/vnode_pager.c.orig	Tue Dec 31 10:34:51 2002
+++ src/sys/vm/vnode_pager.c	Fri Nov  5 20:41:15 2004
@@ -954,7 +954,9 @@
 	struct uio auio;
 	struct iovec aiov;
 	int error;
+	int status;
 	int ioflags;
+	static int last_elog, last_rlog;
 
 	object = vp->v_object;
 	count = bytecount / PAGE_SIZE;
@@ -1035,15 +1037,18 @@
 	cnt.v_vnodeout++;
 	cnt.v_vnodepgsout += ncount;
 
-	if (error) {
+	if (error && last_elog != time_second) {
+		last_elog = time_second;
 		printf("vnode_pager_putpages: I/O error %d\n", error);
 	}
-	if (auio.uio_resid) {
+	if (auio.uio_resid && last_rlog != time_second) {
+		last_rlog = time_second;
 		printf("vnode_pager_putpages: residual I/O %d at %lu\n",
 		    auio.uio_resid, (u_long)m[0]->pindex);
 	}
+	status = error ? VM_PAGER_BAD : VM_PAGER_OK;
 	for (i = 0; i < ncount; i++) {
-		rtvals[i] = VM_PAGER_OK;
+		rtvals[i] = status;
 	}
 	return rtvals[0];
 }

--------------020708070405060102030405--


