Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 9 Mar 2003 05:50:37 -0800
From:      Sean Chittenden <sean@chittenden.org>
To:        "Alan L. Cox" <alc@imimic.com>
Cc:        arch@freebsd.org
Subject:   Re: Should sendfile() to return EAGAIN?  [patch]
Message-ID:  <20030309135037.GK79234@perrin.int.nxad.com>
In-Reply-To: <3E64FEA0.CCA21C7@imimic.com>
References:  <3E64FEA0.CCA21C7@imimic.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--D85EWoRcDXv1uhH1
Content-Type: multipart/mixed; boundary="Sh9dYexoRflRb0jn"
Content-Disposition: inline


--Sh9dYexoRflRb0jn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

> Given that the main purpose of the sf_buf is simply to provide an
> in-kernel virtual address for the page, one sf_buf per page should
> suffice.  Sf_bufs are already reference counted.  So, the principle
> change would be to add a directory data structure that could answer
> the question "Does this page already have an allocated sf_buf?"

Is this kind of a structure already in use someplace in the tree?  I
poked around for a bit and couldn't find anything that'd suggest that
it's already been done.  Do you know if/where I could poach code
that'd provide this functionality or would this be fresh bits that'd
that'd have to hit the tree?

Was there any form of consensus regarding whether or not
sf_buf_alloc() should be called in the non-blocking case?  I think
that there should be another call, sf_buf_alloc_nb() that is used that
doesn't block the call when there aren't any sf_buf's available and is
used in the non-blocking case.  I can't imagine Greenman meant to use
msleep(..., 0) for the non-blocking case.  The attached patch corrects
this.

The patch updates the case of sendfile() when there aren't any
sf_buf's available.  Instead of calling msleep() and blocking the
caller on a socket that has been marked non-blocking, return instantly
with EAGAIN.  This doesn't provide a mechanism for identifying that
there aren't any sf_buf's available.  At some point a read only sysctl
should be added that lets an administrator know how many sf_buf's are
free (max number already exists in -CURRENT so it should be trivial
for an admin to figure out how many are in use), but that will come at
a later date (ENOSLEEP).  Returning control to the program should
dramatically improve the responsiveness of a non-blocking application
that uses sendfile(), hopefully sf_buf's will be freed up for use
making it possible to deal with the load.  Currently when this limit
is hit, it kills the concurrency of the server that non-blocking IO
affords and the throughput of a system drops from __Mbps down to near
0Mbps.  With this patch, at the very least the server should be able
to continue to send data at __Mbps.

There is a race in this code, however.  There is a race with this code
in that if a non-blocking socket that has had sendfile() called on it
where there wasn't an sf_buf available, and it is set back to being a
blocking socket, sf_buf_alloc_want will never reach zero and as a
result, wakeup_one(&sf_freelist) will be called every time in
sf_buf_free().  I'm not sure how best to fix this if it should be, or
even what the behavior of wakeup_one(&sf_freelist) will be if it is
called when there isn't anything msleep()'ing on sf_freelist.

Another something that came to mind was to alert the server admin that
he/she's out of sf_buf's the same way that the kernel does when it
runs out of nmbclusters.

Lastly, I couldn't find any bits to suggest how data coherency is
maintained.  What's the mechanism in place that guarantee's this?

-sc


Patch can also be found at:

http://people.freebsd.org/~seanc/patches/#sendfile_no_block

--=20
Sean Chittenden

--Sh9dYexoRflRb0jn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=patch
Content-Transfer-Encoding: quoted-printable

Index: uipc_syscalls.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.142
diff -u -r1.142 uipc_syscalls.c
--- uipc_syscalls.c	6 Mar 2003 04:48:19 -0000	1.142
+++ uipc_syscalls.c	9 Mar 2003 13:14:43 -0000
@@ -1661,6 +1661,27 @@
 	return (sf);
 }
=20
+/*
+ * Get an sf_buf from the freelist. Will return NULL if none are
+ * available.
+ */
+struct sf_buf *
+sf_buf_alloc_nb()
+{
+	struct sf_buf *sf;
+
+	mtx_lock(&sf_freelist.sf_lock);
+	sf_buf_alloc_want++;
+	sf =3D SLIST_FIRST(&sf_freelist.sf_head);
+	if (sf !=3D NULL) {
+		SLIST_REMOVE_HEAD(&sf_freelist.sf_head, free_list);
+		sf_buf_alloc_want--;
+	}
+
+	mtx_unlock(&sf_freelist.sf_lock);
+	return (sf);
+}
+
 #define dtosf(x)	(&sf_bufs[((uintptr_t)(x) - (uintptr_t)sf_base) >> PAGE_S=
HIFT])
=20
 /*
@@ -1929,17 +1950,22 @@
 		vm_page_unlock_queues();
=20
 		/*
-		 * Get a sendfile buf. We usually wait as long as necessary,
-		 * but this wait can be interrupted.
+		 * Get a sendfile buf.
 		 */
-		if ((sf =3D sf_buf_alloc()) =3D=3D NULL) {
+		if (so->so_state & SS_NBIO) {
+			if ((sf =3D sf_buf_alloc_nb()) =3D=3D NULL)
+				error =3D EAGAIN;
+		} else {
+			if ((sf =3D sf_buf_alloc()) =3D=3D NULL)
+				error =3D EINTR;
+		}
+		if (sf =3D=3D NULL) {
 			vm_page_lock_queues();
 			vm_page_unwire(pg, 0);
 			if (pg->wire_count =3D=3D 0 && pg->object =3D=3D NULL)
 				vm_page_free(pg);
 			vm_page_unlock_queues();
 			sbunlock(&so->so_snd);
-			error =3D EINTR;
 			goto done;
 		}
=20

--Sh9dYexoRflRb0jn--

--D85EWoRcDXv1uhH1
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Comment: Sean Chittenden <sean@chittenden.org>

iD8DBQE+a0at3ZnjH7yEs0ERAmRSAJ9wXd+g+rPqHQ/dL9K1Ds0ku2aMqACfSgfv
xaRjZ5R5gJNFkPNdnrK1tl0=
=ngMe
-----END PGP SIGNATURE-----

--D85EWoRcDXv1uhH1--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030309135037.GK79234>