Date:      Mon, 25 Apr 2005 10:38:50 +0100
From:      Ian Dowse <iedowse@maths.tcd.ie>
To:        Daniel Eriksson <daniel_k_eriksson@telia.com>
Cc:        'FreeBSD Current' <freebsd-current@freebsd.org>
Subject:   Re: Serious I/O problems (bad performance and live-lock) 
Message-ID:  <200504251038.aa23541@salmon.maths.tcd.ie>
In-Reply-To: Your message of "Sat, 23 Apr 2005 10:09:41 +0200." <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA0VcX9IoJqUaXPS8MjT1PdsKAAAAQAAAA5xh4prxQBkmZLv9A9nCvPwEAAAAA@telia.com> 

In message <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA0VcX9IoJqUaXPS8MjT1P
dsKAAAAQAAAA5xh4prxQBkmZLv9A9nCvPwEAAAAA@telia.com>, Daniel Eriksson writes:
>
>Here are some further observations and speculations.
>
>On a newly booted system, this is what happens:
>
>1. Start a "dd if=/dev/zero of=/usr/test bs=128k".
>2. While looking at 'top', "Inact" grows and "Free" shrinks.
>3. Once "Free" has bottomed out, "Inact" stops growing (naturally).
>4. 'dd' continues to put a load on the VM system, eventually forcing most
>processes to be swapped out (illustrated by the "RES" column showing a very
>low number for all but a few processes). This takes 30-60 seconds after
>"Free" has bottomed out on my machine.
>5. At this point the machine is mostly useless because it can take several
>minutes to run a simple 'ls'.

This may not be directly related, but the disk scheduling algorithm
in bioq_disksort() has always behaved poorly for large sequential
writes. It keeps choosing the next request in the sequential pattern,
because the single-direction elevator sort always prefers offsets
after the current position over smaller offsets. The intention is
that this produces frequent sweeps across the whole disk, but with
large sequential writes it can get stuck at one part of the disk for
long periods of time.
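
To make the effect concrete, here is a rough C sketch of a single-direction elevator. It is not the real bioq_disksort() code: the names (`struct elevq`, `elev_insert()`, `elev_dispatch()`, `writes_before_read()`) and the fixed-size arrays are hypothetical simplifications. The assumed workload is a writer that keeps one sequential write outstanding, 128 blocks apart, while a single read waits at offset 0 behind the head. Every new write lands at or after the last dispatched offset, so it joins the current sweep, and the read only runs once the writer stops.

```c
#include <string.h>

/*
 * Hypothetical, simplified one-way elevator queue. Offsets are abstract
 * block numbers. This is not the kernel's struct bio_queue_head.
 */
#define ELEV_MAXQ 64

struct elevq {
	long cur[ELEV_MAXQ];	/* at or after the head, ascending */
	int ncur;
	long next[ELEV_MAXQ];	/* behind the head, for the next sweep */
	int nnext;
	long last_offset;	/* offset of the last dispatched request */
};

/* Insert off into the ascending array a of length *n; 0 if full. */
static int
sorted_insert(long *a, int *n, long off)
{
	int i;

	if (*n >= ELEV_MAXQ)
		return (0);
	for (i = *n; i > 0 && a[i - 1] > off; i--)
		a[i] = a[i - 1];
	a[i] = off;
	(*n)++;
	return (1);
}

/* Requests behind the head wait for the next sweep. */
static int
elev_insert(struct elevq *q, long off)
{
	if (off < q->last_offset)
		return (sorted_insert(q->next, &q->nnext, off));
	return (sorted_insert(q->cur, &q->ncur, off));
}

/* Dispatch the lowest offset of the current sweep; -1 if empty. */
static long
elev_dispatch(struct elevq *q)
{
	long off;

	if (q->ncur == 0) {
		/* Sweep finished: the deferred requests form the next one. */
		memcpy(q->cur, q->next, sizeof(long) * q->nnext);
		q->ncur = q->nnext;
		q->nnext = 0;
	}
	if (q->ncur == 0)
		return (-1);
	off = q->cur[0];
	memmove(q->cur, q->cur + 1, sizeof(long) * (q->ncur - 1));
	q->ncur--;
	q->last_offset = off;
	return (off);
}

/*
 * A read waits at offset 0, behind the head at 1000, while a writer
 * keeps one sequential write outstanding, 128 blocks apart. Returns
 * the number of writes serviced before the read, or -1 on error.
 */
long
writes_before_read(long nwrites)
{
	struct elevq q = { .last_offset = 1000 };
	long next_write = 1000, issued = 0, served = 0, off;

	elev_insert(&q, 0);			/* the waiting read */
	if (nwrites > 0) {
		elev_insert(&q, next_write);	/* first write */
		issued = 1;
	}
	while ((off = elev_dispatch(&q)) >= 0) {
		if (off == 0)
			return (served);	/* the read finally runs */
		served++;
		if (issued < nwrites) {
			next_write += 128;
			elev_insert(&q, next_write);
			issued++;
		}
	}
	return (-1);
}
```

In this model `writes_before_read(n)` returns `n` for any non-negative `n`: the waiting read is delayed by the entire write stream, which is the kind of stall described above.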

Below is a patch I was experimenting with some time ago that gives a
bit more control over this behaviour. The new kern.bioq_maxbeforeswitch
sysctl limits how many requests can be placed before the switch point
in the queue. I seem to remember finding that a value smaller than the
patch's default of 20, such as 5, might be a better default. The
existing bioq_disksort() behaviour corresponds to a very large value
of this sysctl.

Ian

Index: subr_disk.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/kern/subr_disk.c,v
retrieving revision 1.83
diff -u -r1.83 subr_disk.c
--- subr_disk.c	6 Jan 2005 23:35:39 -0000	1.83
+++ subr_disk.c	17 Feb 2005 20:53:18 -0000
@@ -14,11 +14,18 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/kernel.h>
+#include <sys/sysctl.h>
 #include <sys/bio.h>
 #include <sys/conf.h>
 #include <sys/disk.h>
 #include <geom/geom_disk.h>
 
+int bioq_maxbeforeswitch = 20;
+SYSCTL_INT(_kern, OID_AUTO, bioq_maxbeforeswitch, CTLFLAG_RW,
+    &bioq_maxbeforeswitch, 0,
+    "Maximum number of operations to place before the switch point");
+
 /*-
  * Disk error is the preface to plaintive error messages
  * about failing disk transfers.  It prints messages of the form
@@ -71,6 +78,7 @@
 	head->last_offset = 0;
 	head->insert_point = NULL;
 	head->switch_point = NULL;
+	head->beforeswitchcnt = 0;
 }
 
 void
@@ -85,8 +93,10 @@
 	} else if (bp == TAILQ_FIRST(&head->queue))
 		head->last_offset = bp->bio_offset;
 	TAILQ_REMOVE(&head->queue, bp, bio_queue);
-	if (TAILQ_FIRST(&head->queue) == head->switch_point)
+	if (TAILQ_FIRST(&head->queue) == head->switch_point) {
+		head->beforeswitchcnt = 0;
 		head->switch_point = NULL;
+	}
 }
 
 void
@@ -179,7 +189,8 @@
 		 * "locked" portion of the list, then we must add ourselves
 		 * to the second request list.
 		 */
-		if (bp->bio_offset < bioq->last_offset) {
+		if (bp->bio_offset < bioq->last_offset ||
+		    bioq->beforeswitchcnt > bioq_maxbeforeswitch) {
 
 			bq = bioq->switch_point;
 			/*
@@ -202,6 +213,7 @@
 				return;
 			}
 		} else {
+			bioq->beforeswitchcnt++;
 			if (bioq->switch_point != NULL)
 				be = TAILQ_PREV(bioq->switch_point,
 						bio_queue, bio_queue);
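
A rough C sketch of the idea behind the patch, using hypothetical names (`struct capq`, `capq_insert()`, `capq_dispatch()`, `capped_writes_before_read()`) rather than the kernel's structures. A counter plays the role of `beforeswitchcnt`: a request is deferred to the next sweep if it is behind the head or if the counter already exceeds the cap (mirroring the `>` test in the patch), the counter only grows for requests placed before the switch point, and it is reset when the current sweep drains, roughly where bioq_remove() clears it once the head reaches the switch point. The assumed workload is one read waiting at offset 0 behind the head while a writer keeps one sequential write outstanding. (The diff above only touches subr_disk.c, so it presumably relies on a matching `beforeswitchcnt` member in `struct bio_queue_head` that is not shown here.)

```c
#include <string.h>

/*
 * Hypothetical one-way elevator with a cap on how many requests may be
 * placed ahead of the switch point. Not kernel code.
 */
#define CAPQ_MAX 64

struct capq {
	long cur[CAPQ_MAX];	/* current sweep, ascending offsets */
	int ncur;
	long next[CAPQ_MAX];	/* next sweep, ascending offsets */
	int nnext;
	long last_offset;	/* offset of the last dispatched request */
	int before_switch_cnt;	/* plays the role of beforeswitchcnt */
	int max_before;		/* plays the role of bioq_maxbeforeswitch */
};

/* Insert off into the ascending array a of length *n; 0 if full. */
static int
capq_sorted_insert(long *a, int *n, long off)
{
	int i;

	if (*n >= CAPQ_MAX)
		return (0);
	for (i = *n; i > 0 && a[i - 1] > off; i--)
		a[i] = a[i - 1];
	a[i] = off;
	(*n)++;
	return (1);
}

static int
capq_insert(struct capq *q, long off)
{
	/* Behind the head, or cap exceeded: defer to the next sweep. */
	if (off < q->last_offset || q->before_switch_cnt > q->max_before)
		return (capq_sorted_insert(q->next, &q->nnext, off));
	q->before_switch_cnt++;
	return (capq_sorted_insert(q->cur, &q->ncur, off));
}

/* Dispatch the lowest offset of the current sweep; -1 if empty. */
static long
capq_dispatch(struct capq *q)
{
	long off;

	if (q->ncur == 0) {
		/* Switch point reached: start the next sweep, reset count. */
		memcpy(q->cur, q->next, sizeof(long) * q->nnext);
		q->ncur = q->nnext;
		q->nnext = 0;
		q->before_switch_cnt = 0;
	}
	if (q->ncur == 0)
		return (-1);
	off = q->cur[0];
	memmove(q->cur, q->cur + 1, sizeof(long) * (q->ncur - 1));
	q->ncur--;
	q->last_offset = off;
	return (off);
}

/*
 * A read waits at offset 0, behind the head at 1000, while a writer
 * keeps one sequential write outstanding, 128 blocks apart. Returns
 * the number of writes serviced before the read, or -1 on error.
 */
long
capped_writes_before_read(long nwrites, int max_before)
{
	struct capq q = { .last_offset = 1000, .max_before = max_before };
	long next_write = 1000, issued = 0, served = 0, off;

	capq_insert(&q, 0);			/* the waiting read */
	if (nwrites > 0) {
		capq_insert(&q, next_write);	/* first write */
		issued = 1;
	}
	while ((off = capq_dispatch(&q)) >= 0) {
		if (off == 0)
			return (served);	/* the read runs */
		served++;
		if (issued < nwrites) {
			next_write += 128;
			capq_insert(&q, next_write);
			issued++;
		}
	}
	return (-1);
}
```

In this model `capped_writes_before_read(n, cap)` works out to the smaller of `n` and `cap + 1`: with a cap of 5 the read waits behind 6 writes, with 20 behind 21, and with a very large cap behind all of them, consistent with the remark that the existing behaviour corresponds to a very large value of the sysctl. The likely trade-off is that a small cap interrupts the sequential stream more often, costing extra seeks.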