Date: Thu, 25 Apr 2013 21:56:01 +1000 (EST)
From: Bruce Evans
To: Jilles Tjoelker
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject: Re: svn commit: r249859 - head/lib/libc/sys
In-Reply-To: <201304242124.r3OLOZW5034818@svn.freebsd.org>
Message-ID: <20130425204458.F1034@besplex.bde.org>

On Wed, 24 Apr 2013, Jilles Tjoelker wrote:

> Log:
>   getdtablesize(2): Describe what this function actually does.
>
>   getdtablesize() returns the limit on new file descriptors; this says nothing
>   about existing descriptors.

It's still quite broken.

> Modified: head/lib/libc/sys/getdtablesize.2
> ==============================================================================
> --- head/lib/libc/sys/getdtablesize.2	Wed Apr 24 21:21:03 2013	(r249858)
> +++ head/lib/libc/sys/getdtablesize.2	Wed Apr 24 21:24:35 2013	(r249859)
> @@ -28,12 +28,12 @@
>  .\"	@(#)getdtablesize.2	8.1 (Berkeley) 6/4/93
>  .\" $FreeBSD$
>  .\"
> -.Dd June 4, 1993
> +.Dd April 24, 2013
>  .Dt GETDTABLESIZE 2
>  .Os
>  .Sh NAME
>  .Nm getdtablesize
> -.Nd get descriptor table size
> +.Nd get file descriptor limit

Now its name doesn't match its description, and the reason for this is
not documented.

This function is almost obsolete.  In POSIX, it is spelled {OPEN_MAX} or
sysconf(_SC_OPEN_MAX).  It is sometimes misspelled OPEN_MAX.

I prepared to remove the broken definition of OPEN_MAX, but never
committed the final step.  /usr/src has very few misuses of OPEN_MAX
now, so removing the definition wouldn't be too hard.  Most uses are in
compatibility cruft.  E.g., the following from
crypto/openssh/openbsd-compat/bsd-closefrom.c, which is confused about
related things:

@ 	/*
@ 	 * Fall back on sysconf() or getdtablesize().  We avoid checking
@ 	 * resource limits since it is possible to open a file descriptor
@ 	 * and then drop the rlimit such that it is below the open fd.
@ 	 */

This is a fallback for when some other compatibility cruft doesn't work.
The part about resource limits is mostly wrong:...

@ #ifdef HAVE_SYSCONF
@ 	maxfd = sysconf(_SC_OPEN_MAX);
@ #else
@ 	maxfd = getdtablesize();
@ #endif /* HAVE_SYSCONF */

...
in 4.4BSD and FreeBSD, both sysconf(_SC_OPEN_MAX) and getdtablesize()
are just wrappers for the resource limit (sysconf() is a libc wrapper
and getdtablesize() is a syscall wrapper).

Actually, in FreeBSD, getdtablesize() is not even the rlimit -- it is
the min() of the rlimit and the global sysctl integer maxfilesperproc.
Here the bug is in the rlimit.  For the rlimit, maxfilesperproc is only
used when the rlimit is set and when it is used in the kernel.  But when
the rlimit is returned to userland, via getrlimit(), maxfilesperproc is
not used, so the rlimit may be wrong if maxfilesperproc was lowered
after setting the rlimit.

@ 	if (maxfd < 0)
@ 		maxfd = OPEN_MAX;

This should be ifdefed.  All POSIX systems have sysconf(), and that is
ifdefed, but most don't have a constant OPEN_MAX.

@
@ 	for (fd = lowfd; fd < maxfd; fd++)
@ 		(void) close((int) fd);
@ }

Old code that ends up using {OPEN_MAX} under any correct spelling in
loops like this works poorly.  On freefall now, {OPEN_MAX} for users is
707112.  Syscalls are slow, so a loop like this will take a significant
fraction of a second on freefall.  So getdtablesize() should never be
used for its original purpose of setting an upper limit for loops like
this.

Better hope that this compatibility cruft is not used.  (It is for
closefrom(int lowfd), which FreeBSD has in libc.  closefrom() is
described weirdly as "deleting" file descriptors.)

>  .Sh LIBRARY
>  .Lb libc
>  .Sh SYNOPSIS
> @@ -41,18 +41,20 @@
>  .Ft int
>  .Fn getdtablesize void
>  .Sh DESCRIPTION
> -Each process has a fixed size descriptor table,

Actually, each process has a variable size descriptor table, and
getdtablesize() doesn't give the size of this table.

> -which is guaranteed to have at least 20 slots.

Actually, {OPEN_MAX} is guaranteed by POSIX to be at least
{_POSIX_OPEN_MAX}, and {_POSIX_OPEN_MAX} is precisely 20.  But these
guarantees and similar ones for stdio's FOPEN_MAX have always been
broken in FreeBSD, since anyone can reduce the rlimit below 20.
Privileged users can break the guarantee even more easily by setting
maxfilesperproc below 20.  When POSIX standardized rlimits, it didn't
properly specify the behaviour for the interaction of the rlimit with
{OPEN_MAX}, at least initially.  The 2001 version breaks its own
guarantee by just saying that if the rlimit is reduced to less than
{_POSIX_OPEN_MAX}, then "unexpected behaviour may occur".  Reductions
from 707112 to less than 20 won't occur often in practice.  Ones from
707112 to less than the largest currently open fd (+1) are more common
in practice and cause similarly unexpected behaviours, but the 2001
version of POSIX is even more underspecified for them.

> -The entries in
> -the descriptor table are numbered with small integers starting at 0.

Still correct, though not very interesting.

>  The
>  .Fn getdtablesize
> -system call returns the size of this table.
> +system call returns the maximum number of file descriptors
> +that the current process may open.

Actually, the process may open more than this number, after raising its
(soft) rlimit, if this is possible.

> +The maximum file descriptor number that the system may assign
> +is the return value minus one.
> +Existing file descriptor numbers may be higher
> +if the limit was lowered after they were opened.
>  .Sh SEE ALSO
>  .Xr close 2 ,
> +.Xr closefrom 2 ,
>  .Xr dup 2 ,
> -.Xr open 2 ,
> -.Xr select 2
> +.Xr getrlimit 2 ,
> +.Xr sysconf 2
>  .Sh HISTORY
>  The
>  .Fn getdtablesize
>

open(2) is probably still relevant.  It seems to be the natural place
to document {OPEN_MAX}, but it says nothing about any spelling of
OPEN_MAX.  (The closest that it gets is saying that [EMFILE] means that
the process has reached its limit for open file descriptors.
apropos(1) gives nothing appropriate for OPEN_MAX.  In fact, even
OPEN_MAX is not mentioned in any man page.  Only _SC_OPEN_MAX is
mentioned (in sysconf(3)), and it is misdescribed as being the maximum
number of open files per user id.)

Some limits are better described than {OPEN_MAX}, in intro(2).  I was a
little surprised to not find much about the file descriptor limits
there.  In fact, there is a very incomplete description of them for
[EMFILE].  This says that the limit is the release one of 64 (that was
for the 4.4BSD-Lite* release) and that getdtablesize(2) will obtain the
current limit.  Similar historical limits have been changed to POSIX
ones mainly for pathnames.

Bruce