Date:      Wed, 28 Apr 1999 20:54:49 -0400 (EDT)
From:      Christopher Sedore <cmsedore@mailbox.syr.edu>
To:        hackers@freebsd.org
Subject:   async io and sockets update
Message-ID:  <Pine.SOL.3.95.990428202157.17938A-100000@rodan.syr.edu>


I've mostly finished what I set out to do with the kernel aio routines.
Below is a summary:

1. I've added a new system call, aio_waitcomplete(struct aiocb **cb,
struct timespec *tv).  This system call causes a process to sleep
until the next async io op completes or the timeout expires.  When an
operation completes, a pointer to its userland aiocb is placed in cb.
This makes fire-and-forget async io programming both possible and
easy (a short usage sketch follows).
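
For example, a minimal fire-and-forget sketch (my illustration, not
part of the patch).  It assumes the interface above, and that
aio_waitcomplete returns the completed operation's status the way
aio_return would:

#include <sys/types.h>
#include <sys/socket.h>
#include <aio.h>
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
	int sv[2];
	static char buf[512];
	struct aiocb iocb, *done;
	struct timespec ts = { 5, 0 };	/* wait at most 5 seconds */
	ssize_t nbytes;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		err(1, "socketpair");

	memset(&iocb, 0, sizeof(iocb));
	iocb.aio_fildes = sv[0];
	iocb.aio_buf = buf;
	iocb.aio_nbytes = sizeof(buf);
	if (aio_read(&iocb) == -1)
		err(1, "aio_read");

	write(sv[1], "hello", 5);	/* make the read completable */

	/* Sleep until some outstanding op finishes -- no polling. */
	nbytes = aio_waitcomplete(&done, &ts);
	if (nbytes == -1)
		err(1, "aio_waitcomplete");
	printf("aiocb %p completed, %ld bytes\n", (void *)done,
	    (long)nbytes);
	return 0;
}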

2. I've changed the way that async operations on sockets get handled
(a simplified sketch of the upcall flow follows):
	a. Sockets are checked to see if the operation will complete
	   immediately.  If not, it is placed on a separate queue and
	   processed when the sowakeup routine issues an upcall.
	b. When upcalled as writable, all pending writes are moved to
	   the regular io queue to be processed.
	c. When upcalled as readable, reads are executed in the upcall
	   routine as long as the socket stays readable.
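
To illustrate 2b and 2c, here is a much-simplified sketch of the
upcall flow.  The names are hypothetical -- so_aio_reads,
so_aio_writes and aio_jobs stand in for the real queues, and
aiocblist for the real job structure in kern/vfs_aio.c:

static TAILQ_HEAD(aiojoblist, aiocblist) so_aio_reads, so_aio_writes,
    aio_jobs;

static void
aio_upcall_sketch(struct socket *so, struct sockbuf *sb)
{
	struct aiocblist *job;

	if (sb == &so->so_snd) {
		/* 2b: writable -- move pending writes back to the
		 * regular io queue; the aiods will complete them. */
		while ((job = TAILQ_FIRST(&so_aio_writes)) != NULL) {
			TAILQ_REMOVE(&so_aio_writes, job, list);
			TAILQ_INSERT_TAIL(&aio_jobs, job, list);
		}
		wakeup(&aio_jobs);
	} else {
		/* 2c: readable -- execute reads right in the upcall,
		 * as long as the socket stays readable. */
		while (soreadable(so) &&
		    (job = TAILQ_FIRST(&so_aio_reads)) != NULL) {
			TAILQ_REMOVE(&so_aio_reads, job, list);
			aio_process(job);	/* completes without sbwait */
		}
	}
}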

3. I believe I fixed a bug in aio_process that would allow it to try to
execute operations on descriptors that have been closed, causing a panic.

Notes:

Ideally, operations on sockets that would complete immediately should be
executed during the aio_read system call, and the results made ready to be
picked up later.

Benefits:

The old aio code would pass socket operations to the aio daemons
immediately, causing them to block (in sbwait).  Once the maximum
number of aiods were blocked, no more operations could progress until
one of the aiods completed an operation.

This approach can be significantly faster than using select() to poll
sockets.  A simple test program showed that before optimization 2c
above, the async io routines would only be faster than select() once
about 37 descriptors were being monitored.  With optimization 2c,
async io was faster in all the testing I did (I did not test with
fewer than 10 descriptors).

The performance difference (again with a simple test program) between aio
and select() for reading looks something like this:

           select()        aio_read()/aio_waitcomplete()
num fds    kb/s    secs    kb/s    secs
10         26315   19      35714   14
20         20833   24      35714   14
30         17241   29      33333   15
40         14285   35      33333   15
50         12195   41      33333   15
60         10416   48      33333   15
70         9259    54      31250   16
80         8196    61      33333   15
90         7575    66      31250   16
100        6944    72      33333   15


select() continues to trail off up to 250 descriptors, while aio shows
no significant degradation.  Note that using aio_suspend instead of
aio_waitcomplete would probably be non-trivially slower, though still
faster than select() on large numbers of descriptors (how much faster
would depend on the order in which operations complete versus the
order of the pointers passed into aio_suspend); see the sketch below.
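
For comparison, reaping a completion via aio_suspend looks roughly
like this (a sketch; "list" and "nent" are assumed to hold the
outstanding requests).  Note the per-completion scan that
aio_waitcomplete avoids:

#include <aio.h>
#include <errno.h>
#include <stddef.h>

static void
reap_with_suspend(const struct aiocb *list[], int nent)
{
	int i;

	aio_suspend(list, nent, NULL);	/* block until any op is done */
	for (i = 0; i < nent; i++) {	/* scan to find which one(s) */
		if (list[i] != NULL &&
		    aio_error(list[i]) != EINPROGRESS) {
			(void)aio_return((struct aiocb *)list[i]);
			/* process, then reissue aio_read on this slot */
		}
	}
}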

The test program simply creates the requested number of descriptors
using socketpair(), and either places an outstanding aio_read on each
or puts each in an fd_set for select().  Descriptors are then chosen
at random (via random()) out of this set and written to.  In the aio
case, aio_waitcomplete retrieves the completed aio_read's aiocb and
the aio_read is reissued; in the select() case, select() identifies
the fd to read and the fd_set is reset.  The tests above were done
with 1000000 writes of 512 bytes each and a corresponding 1000000
reads of 512-byte buffers.
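
Here is a rough reconstruction of the aio side of that loop (my
sketch, not the actual test program; the constants mirror the 100-fd
run above):

#include <sys/types.h>
#include <sys/socket.h>
#include <aio.h>
#include <err.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NFDS	100
#define NWRITES	1000000
#define BUFSZ	512

int
main(void)
{
	static int sv[NFDS][2];
	static char wbuf[BUFSZ], rbufs[NFDS][BUFSZ];
	static struct aiocb iocbs[NFDS];
	struct aiocb *done;
	long n;
	int i;

	/* One socketpair per slot, each with an outstanding aio_read. */
	for (i = 0; i < NFDS; i++) {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv[i]) == -1)
			err(1, "socketpair");
		memset(&iocbs[i], 0, sizeof(iocbs[i]));
		iocbs[i].aio_fildes = sv[i][0];
		iocbs[i].aio_buf = rbufs[i];
		iocbs[i].aio_nbytes = BUFSZ;
		if (aio_read(&iocbs[i]) == -1)
			err(1, "aio_read");
	}

	for (n = 0; n < NWRITES; n++) {
		/* Write to a random descriptor... */
		write(sv[random() % NFDS][1], wbuf, BUFSZ);
		/* ...reap whichever read completed, and reissue it. */
		if (aio_waitcomplete(&done, NULL) == -1)
			err(1, "aio_waitcomplete");
		if (aio_read(done) == -1)
			err(1, "aio_read");
	}
	return 0;
}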

One remaining problem with the aio code is that aio operations won't
"cross over" to other kernel threads, because they are tied to the
proc that issued them rather than to the file descriptor itself.  I
may investigate creating a variation of NT's io completion ports to
enable async io with kernel threads.
	   
I don't think that the modifications are too invasive.  There are
numerous mods to kern/vfs_aio.c, some mods to uipc_socket.c and
uipc_socket2.c, and small changes to sys/aio.h and sys/socketvar.h
(plus the syscall addition).  I hope to do some more tweaking and see
if I can get someone to look it over with an eye to committing some or
all of it.

-Chris



