From owner-freebsd-hackers  Mon Apr 26  5:49:45 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from mailer.syr.edu (mailer.syr.edu [128.230.18.29])
	by hub.freebsd.org (Postfix) with ESMTP id 61F6E14E30
	for <hackers@freebsd.org>; Mon, 26 Apr 1999 05:49:34 -0700 (PDT)
	(envelope-from cmsedore@mailbox.syr.edu)
Received: from rodan.syr.edu by mailer.syr.edu (LSMTP for Windows NT v1.1a) with SMTP id <0.ADC26190@mailer.syr.edu>; Mon, 26 Apr 1999 8:49:22 -0400
Received: from localhost (cmsedore@localhost)
	by rodan.syr.edu (8.8.7/8.8.7) with SMTP id IAA00927
	for <hackers@freebsd.org>; Mon, 26 Apr 1999 08:49:15 -0400 (EDT)
X-Authentication-Warning: rodan.syr.edu: cmsedore owned process doing -bs
Date: Mon, 26 Apr 1999 08:49:15 -0400 (EDT)
From: Christopher Sedore <cmsedore@mailbox.syr.edu>
X-Sender: cmsedore@rodan.syr.edu
Reply-To: Christopher Sedore <cmsedore@mailbox.syr.edu>
To: hackers@freebsd.org
Subject: aio and sockets (long)
Message-ID: <Pine.SOL.3.95.990426080745.21215A-100000@rodan.syr.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


I've been working on modifying the kernel aio routines so that they are
more useful for sockets.  Currently, an async read or write on a socket
ties up one aiod, which blocks waiting for the operation to complete.
That is undesirable.

What I've currently implemented (and am not happy with) is an alternate
queue for socket operations.  Basically, if the descriptor is
DTYPE_SOCKET, we check to see if it is readable or writeable
(soreadable()/sowriteable()), and if it is, we queue as before.  If it is
not, then the aiocb is put on a socket queue, and the socket is modified
to call a wakeup routine with a pointer to the aiocb.
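
A rough userland sketch of that decision, not the actual kernel patch.
Everything here is a stand-in invented for illustration: the struct
layouts, the `readable`/`writeable` flags (modelling what soreadable()
and sowriteable() would report), and the names `aio_dispatch` and
`aio_sock_wakeup`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum aio_op   { AIO_OP_READ, AIO_OP_WRITE };
enum aio_dest { AIO_TO_JOB_QUEUE, AIO_TO_SOCKET_QUEUE };

/* Hypothetical model of the few socket fields that matter here. */
struct sock_model {
    bool readable;    /* what soreadable() would report */
    bool writeable;   /* what sowriteable() would report */
    void (*upcall)(struct sock_model *, void *);
    void *upcallarg;  /* pointer to the parked request */
};

/* Hypothetical model of an aiocb. */
struct aio_req {
    enum aio_op op;
    struct sock_model *so;
    bool woken;       /* set by the placeholder wakeup routine */
};

/* Placeholder wakeup: the real one would move requests to the job queue. */
void aio_sock_wakeup(struct sock_model *so, void *arg)
{
    (void)so;
    ((struct aio_req *)arg)->woken = true;
}

/* Decide where a socket request goes when it is submitted. */
enum aio_dest aio_dispatch(struct aio_req *r)
{
    assert(r != NULL && r->so != NULL);
    struct sock_model *so = r->so;
    bool ready = (r->op == AIO_OP_READ) ? so->readable : so->writeable;

    if (ready)
        return AIO_TO_JOB_QUEUE;   /* an aiod can run it without blocking */

    /* Not ready: park the request and ask the socket to call us back. */
    so->upcall = aio_sock_wakeup;
    so->upcallarg = r;
    return AIO_TO_SOCKET_QUEUE;
}
```

The point is only that the ready case never touches the socket, while
the not-ready case costs no aiod at all until the upcall fires.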

I modified the aiocb to contain another pointer to create a singly linked
list of aiocbs pending on a socket.  When the wakeup routine is called,
all the aiocbs that are waiting on the socket (and are of the same read or
write type) are moved to the aio job queue.
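
Roughly, the wakeup step looks like the following userland sketch.  The
types and the names `sock_enqueue`, `sock_wakeup` and `job_queue` are
made up for illustration; `pending` plays the role of the list hanging
off so_upcallarg, and `next` is the extra pointer added to the aiocb.

```c
#include <assert.h>
#include <stddef.h>

enum aio_op { AIO_OP_READ, AIO_OP_WRITE };

/* Hypothetical model of an aiocb with the added link pointer. */
struct aio_req {
    enum aio_op op;
    struct aio_req *next;
};

/* Hypothetical model of the socket: just the head of the pending list. */
struct sock_model {
    struct aio_req *pending;
};

/* Hypothetical model of the aio job queue (singly linked, tail pointer). */
struct job_queue {
    struct aio_req *head;
    struct aio_req **tailp;
};

void job_queue_init(struct job_queue *q)
{
    q->head = NULL;
    q->tailp = &q->head;
}

/* Append a request to the socket's pending list. */
void sock_enqueue(struct sock_model *so, struct aio_req *r)
{
    assert(so != NULL && r != NULL);
    struct aio_req **pp = &so->pending;
    while (*pp != NULL)
        pp = &(*pp)->next;
    r->next = NULL;
    *pp = r;
}

/* Move every pending request of type `op` to the job queue, in order. */
int sock_wakeup(struct sock_model *so, enum aio_op op, struct job_queue *q)
{
    int moved = 0;
    struct aio_req **pp = &so->pending;

    while (*pp != NULL) {
        struct aio_req *r = *pp;
        if (r->op == op) {
            *pp = r->next;        /* unlink from the socket list */
            r->next = NULL;
            *q->tailp = r;        /* append to the job queue */
            q->tailp = &r->next;
            moved++;
        } else {
            pp = &r->next;        /* leave the other type parked */
        }
    }
    return moved;
}
```

Requests of the other type stay on the socket list in their original
order, so a read wakeup never disturbs parked writes.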

This worked really well until I hit control-c and panicked the system :)
I had missed aio_proc_rundown, which cleans up the outstanding aio
requests before the process exits.  I fixed this by a fair bit of frobbing
around in aio_proc_rundown (find the socket, work through the queued
aiocbs, and remove the ones that are for the proc that is going away).
I'm now getting system hangs instead of panics, but I'm betting that's a
problem with my code since I think the concepts are sound.
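
The list walk in the rundown amounts to something like this userland
sketch.  It is not the kernel code: `sock_rundown` is a made-up name,
and `owner` is an opaque pointer standing in for the owning struct proc.
Locating the socket and completing or freeing the removed requests are
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a parked aiocb. */
struct aio_req {
    const void *owner;     /* stands in for the owning struct proc * */
    struct aio_req *next;  /* link in the per-socket pending list */
};

/*
 * Unlink every request owned by `owner` from the list at *headp and
 * return how many were removed.  Requests from other processes keep
 * their relative order.
 */
int sock_rundown(struct aio_req **headp, const void *owner)
{
    assert(headp != NULL);
    int removed = 0;
    struct aio_req **pp = headp;

    while (*pp != NULL) {
        struct aio_req *r = *pp;
        if (r->owner == owner) {
            *pp = r->next;   /* splice out; pp stays put */
            r->next = NULL;
            removed++;
        } else {
            pp = &r->next;
        }
    }
    return removed;
}
```

Because the list can mix requests from several processes, the walk has
to visit every entry rather than stop at the first match.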

Here's what I don't like:  

1. It seems silly to requeue socket read operations back to the main job
queue on an upcall--why not simply do the read in the upcall and be done
with it?
1a. Likewise for writes, but I'd much prefer the whole write to be
completed in one call, and we'd have to do a bit more messing around to
ensure this (like checking available buffer space, etc).
2. The linked list stuff for the socket queued aiocbs is really ugly.
The head of the list is the so_upcallarg element in the socket struct.
This linked list can include operations from multiple processes, and one
can't use the linked list macros since there's no place to put a proper
list head.

Here are a few things I don't understand, but live with:

1. I don't see what protects the async code from having the file
descriptor closed underneath it.  It seems that it is checked when the
operation is queued, but not afterward.  
2. We always call splx(s) _after_ tsleep(), which seems weird to someone
who is used to userland multithreaded programming. (I'm no kernel
expert.)

(I also realized that my brain was AWOL when I commented previously on
what I thought might be a memory leak in the aio routines.) 

Here's what I think I'd like to do: 

1. Add a couple of tailqs to the socket structure, one to hold async read
requests, one for async write requests.  Arguably, a single one should be
sufficient, though it requires stepping through the list to find one with
a relevant operation.  

2. Fix the socket close routines to dispose of the aiocbs properly.  Fix
aio_proc_rundown to handle this scenario.

3. Fix the wakeup routine to execute reads in the wakeup, rather than
requeueing them.  Only the number of reads necessary to empty the buffer
should be executed (or all in the case of an error). 

4. Leave writes alone for now, by just requeueing them as I currently do.
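
To make items 1 and 3 concrete, here is a userland sketch using the
TAILQ macros from <sys/queue.h>.  It is only a model under assumptions:
the field names `aio_reads`/`aio_writes`, the fixed-size `rcvbuf`, and
the function names are invented, and the error case from item 3 is not
modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical model of an aiocb queued on a socket. */
struct aio_req {
    char *buf;                 /* destination buffer */
    size_t nbytes;             /* bytes requested */
    size_t done;               /* bytes actually copied */
    int complete;
    TAILQ_ENTRY(aio_req) link;
};
TAILQ_HEAD(aio_req_list, aio_req);

/* Hypothetical model of a socket with one tailq per direction. */
struct sock_model {
    char rcvbuf[64];           /* stands in for the receive buffer */
    size_t rcv_len;            /* bytes currently buffered */
    struct aio_req_list aio_reads;
    struct aio_req_list aio_writes;
};

void sock_model_init(struct sock_model *so)
{
    assert(so != NULL);
    so->rcv_len = 0;
    TAILQ_INIT(&so->aio_reads);
    TAILQ_INIT(&so->aio_writes);
}

/*
 * Read-side wakeup: complete queued reads directly, in order, until
 * the receive buffer is empty.  Reads left over stay queued.
 */
int sock_read_wakeup(struct sock_model *so)
{
    int completed = 0;
    struct aio_req *r;

    while (so->rcv_len > 0 &&
        (r = TAILQ_FIRST(&so->aio_reads)) != NULL) {
        size_t n = r->nbytes < so->rcv_len ? r->nbytes : so->rcv_len;

        memcpy(r->buf, so->rcvbuf, n);
        memmove(so->rcvbuf, so->rcvbuf + n, so->rcv_len - n);
        so->rcv_len -= n;

        r->done = n;
        r->complete = 1;
        TAILQ_REMOVE(&so->aio_reads, r, link);
        completed++;
    }
    return completed;
}
```

With separate queues the read wakeup never has to step over writes,
and the write queue can keep using the requeue path from item 4.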

This could also present a solution for the pid vs struct proc * problem in
the "flock + kernel threads bug" series.  Select operations could be
queued as async requests on the socket--they would then get killed by
aio_proc_rundown (with proper glue).  Same actually goes for all the
wakeup functions. Just a thought. 

Any comments or enlightenment would be appreciated.

-Chris

