Date:      Thu, 21 Dec 2006 10:38:33 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        David Xu <davidxu@freebsd.org>
Cc:        Daniel Eischen <deischen@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: close() of active socket does not work on FreeBSD 6
Message-ID:  <20061221102909.O83974@fledge.watson.org>
In-Reply-To: <200612210820.09955.davidxu@freebsd.org>
References:  <32874.1165905843@critter.freebsd.dk> <20061220153126.G85384@fledge.watson.org> <Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net> <200612210820.09955.davidxu@freebsd.org>

On Thu, 21 Dec 2006, David Xu wrote:

> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> On Wed, 20 Dec 2006, Robert Watson wrote:
>>> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>>>> Anyway, this was just a thought/idea.  I don't mean to argue against any
>>>> of the other reasons why this isn't a good idea.
>>>
>>> Whatever may be implemented to solve this issue will require a fairly 
>>> serious re-working of how we implement file descriptor reference counting 
>>> in the kernel.  Do you propose similar "cancellation" of other system 
>>> calls blocked on the file descriptor, including select(), etc?  Typically 
>>> these system calls interact with the underlying object associated with the 
>>> file descriptor, not the file descriptor itself, and often, they act 
>>> directly on the object and release the file descriptor before performing 
>>> their operation. I think before we can put any reasonable implementation 
>>> proposal on the table, we need a clear set of requirements:
>>
>> [ ... ]
>>
>>> While providing Solaris-like semantics here makes some amount of sense, 
>>> this is a very tricky area, and one where we're still refining performance 
>>> behavior, reference counting behavior, etc.  I don't think there will be 
>>> any easy answers, and we need to think through the semantic and 
>>> performance implications of any change very carefully before starting to 
>>> implement.
>>
>> I don't think the behavior here has to be any different than what we 
>> currently (or desire to) do with regard to (unblocked) signals interrupting 
>> threads waiting on IO.  You can spend a lot of time thinking about how 
>> close() should affect IO operations on the same file descriptor, but a very 
>> simple approach is to treat them the same as if the operations were 
>> interrupted by a signal.  I'm not suggesting it is implemented the same 
>> way, just that it seems to make a lot of sense to me that the behavior is 
>> consistent between the two.
>
> I think the main concern is that we would have to record every thread using
> an fd.  That means when you call read() on an fd, you record your thread
> pointer in the fd's thread list; when someone wants to close the fd, it has
> to notify all the threads in the list and set a flag on each one indicating
> that the thread was interrupted because the fd was closed.  When the thread
> returns from the deep code path to the read() syscall, it checks the flag
> and returns EBADF to the user if it is set.  Either way, a reserved signal
> or TDF_INTERRUPT could interrupt the thread.  But since there are many file
> operations, I don't know if we are willing to pay such overhead on every
> file syscall; extra locking is not welcome.

Yes, as well as adding quite a bit of complexity and opening the door for some 
rather odd/unfortunate races.  You can inspect the bulk of the Solaris 
implementation by looking at three spots:

http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=closeandsetf 
http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=post_syscall 
http://fxr.watson.org/fxr/search?v=OPENSOLARIS&string=MUSTRETURN

In closeandsetf(), you can see that an additional layer of indirection 
associated with the file descriptor is maintained in order to count consumers 
of a particular fd, not just the open file record, and the set of active fds 
for each thread is maintained.  When a close() is performed and there are 
still other open consumers, the process is suspended and all threads are 
inspected to see if the fd is active for the thread, in which case a thread 
flag is set to indicate that the fd is stale.  I believe that the interrupt here is 
an implicit part of the process suspend/restart, and in post_syscall() the 
EINTR returns are remapped to EBADF.
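
To make the Solaris-style mechanism concrete, here is a minimal user-space sketch. It is a toy model under stated assumptions, not the OpenSolaris code: the class FdTable and all of its fields are hypothetical, a condition-variable wakeup stands in for the process suspend/restart, and a blocked read() is modeled as waiting for queued items. Each fd counts its in-progress consumers, each thread records the fds it is using, close() flags every thread that has the fd active, and the woken read() reports EBADF instead of EINTR, mirroring the remap the message attributes to post_syscall().

```python
import errno
import threading
import time


class FdTable:
    """Toy descriptor table (hypothetical) that tracks who is using each fd."""

    def __init__(self):
        self.lock = threading.Lock()
        self.cv = threading.Condition(self.lock)
        self.open_fds = set()
        self.consumers = {}   # fd -> number of syscalls currently using it
        self.active = {}      # thread ident -> set of fds that thread is using
        self.stale = set()    # idents of threads flagged by close()
        self.queued = {}      # fd -> pending items a read() can return

    def open(self, fd):
        with self.lock:
            self.open_fds.add(fd)
            self.consumers[fd] = 0
            self.queued[fd] = []

    def write(self, fd, item):
        with self.lock:
            if fd not in self.open_fds:
                raise OSError(errno.EBADF, "bad file descriptor")
            self.queued[fd].append(item)
            self.cv.notify_all()

    def read(self, fd):
        me = threading.get_ident()
        with self.lock:
            if fd not in self.open_fds:
                raise OSError(errno.EBADF, "bad file descriptor")
            # Extra bookkeeping on every call: count the consumer, record the fd.
            self.consumers[fd] += 1
            self.active.setdefault(me, set()).add(fd)
            try:
                while not self.queued[fd]:
                    self.cv.wait()  # "blocked in the kernel"; releases the lock
                    if me in self.stale:
                        # Like the post_syscall() remap: report EBADF, not EINTR.
                        self.stale.discard(me)
                        raise OSError(errno.EBADF, "fd closed while blocked")
                return self.queued[fd].pop(0)
            finally:
                self.consumers[fd] -= 1
                self.active[me].discard(fd)

    def close(self, fd):
        with self.lock:
            if fd not in self.open_fds:
                raise OSError(errno.EBADF, "bad file descriptor")
            self.open_fds.discard(fd)
            if self.consumers[fd] > 0:
                # Inspect every thread and flag those with this fd active.
                for ident, fds in self.active.items():
                    if fd in fds:
                        self.stale.add(ident)
                self.cv.notify_all()  # stands in for suspend/restart


def demo_close_while_blocked():
    """Block a reader on fd 3, close fd 3 from this thread, report the outcome."""
    table = FdTable()
    table.open(3)
    outcome = {}

    def reader():
        try:
            outcome["data"] = table.read(3)
        except OSError as exc:
            outcome["errno"] = exc.errno

    worker = threading.Thread(target=reader)
    worker.start()
    # Once we can take the lock and see consumers == 1, the reader is in wait().
    while True:
        with table.lock:
            if table.consumers[3] == 1:
                break
        time.sleep(0.001)
    table.close(3)
    worker.join()
    return outcome
```

In this sketch demo_close_while_blocked() returns {"errno": errno.EBADF}. The stale flag is kept per thread, which suffices only because each toy thread has at most one read() in flight. Note also that the consumer count and per-thread fd set are updated on every read(), even when no close() ever happens; that unconditional bookkeeping on a hot path is the kind of cost at issue here.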

That extra level of indirection and use tracking will be both complex and a 
performance hit in a critical kernel path.  I'm not opposed to investigating 
implementing something along these lines, but I think we should defer this for 
some time while we sort out more pressing issues in our kernel file 
descriptor/socket/etc. code and revisit this in a few months.  We will need to 
carefully evaluate the performance costs, and if they are significant, figure 
out how to avoid this causing a significant hit.  It's worth observing that 
removing one level of reference counting from the socket send/receive paths 
(using the file descriptor reference instead of the socket reference) made a 
5%+ difference in high speed send performance.
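
The reference-counting point can be illustrated with a toy model. Everything here is hypothetical (RefObj, send_two_levels(), send_one_level(), count_refops()), and a plain integer stands in for the atomic operations a kernel refcount needs. The assumption, following the message, is that the open file record holds a reference on its socket, so a send path that holds the file reference for the whole call need not also take a socket reference; in this model that halves the refcount updates per call. It counts operations only and says nothing about timing; the 5%+ figure comes from the measurement reported in the message, not from this model.

```python
class RefObj:
    """Hypothetical refcounted object; every hold/drop is tallied in stats."""

    def __init__(self, stats, child=None):
        self.stats = stats    # shared dict counting refcount updates
        self.child = child    # object this one holds a reference on
        self.refs = 1
        self.freed = False

    def hold(self):
        self.stats["refops"] += 1
        self.refs += 1

    def drop(self):
        self.stats["refops"] += 1
        self.refs -= 1
        if self.refs == 0:
            self.freed = True
            if self.child is not None:
                self.child.drop()  # freeing the file releases its socket


def send_two_levels(fp):
    """Hold the file, then separately hold the socket behind it."""
    fp.hold()
    so = fp.child
    so.hold()
    assert not so.freed  # ... transmit data on so here ...
    so.drop()
    fp.drop()


def send_one_level(fp):
    """Hold only the file; the file's own reference keeps the socket alive."""
    fp.hold()
    so = fp.child
    assert not so.freed  # ... transmit data on so here ...
    fp.drop()


def count_refops(send):
    stats = {"refops": 0}
    so = RefObj(stats)            # socket, referenced once by the file
    fp = RefObj(stats, child=so)  # open file record
    send(fp)
    return stats["refops"]
```

Here count_refops(send_two_levels) is 4 and count_refops(send_one_level) is 2. The design relies on the containment invariant: as long as something holds the file, the file's reference pins the socket, so the inner reference is redundant for the duration of the call.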

Robert N M Watson
Computer Laboratory
University of Cambridge


