Date:      Wed, 19 Sep 2007 05:28:05 -0400
From:      Ighighi <ighighi@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   add closefrom() call revisited
Message-ID:  <46F0EBA5.7020802@gmail.com>

This is a multi-part message in MIME format.
--------------070205070500040607020800
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Given that NetBSD, OpenBSD and DragonFly (as well as Solaris and maybe
others) already provide closefrom(), it'd be nice and worthwhile to
implement it on FreeBSD too.

The attached shar archive contains 4 possible implementations of it.
One is a system call (the approach used by the other BSDs), available
here as a loadable kernel module for quick testing.  The remaining 3
are library versions.  One of them doesn't currently work, since
FreeBSD lacks a /proc/<pid>/fd/ that I tried to emulate with /dev/fd/,
both via devfs(5) and fdescfs(5): they seem to lack some types of
file descriptors...  Another just does what a lot of programs do: try
close() on every possible file descriptor.  The last one uses sysctl().

The implementation was inspired by the DragonFly code, but the semantics
match Open/NetBSD's: a negative argument fails with EBADF rather than
DragonFly's EINVAL.  Their code is available at:
http://www.dragonflybsd.org/cvsweb/~checkout~/src/sys/kern/kern_descrip.c
http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/src/sys/kern/kern_descrip.c
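
To make the EBADF vs EINVAL difference concrete, here is a minimal
sketch of a check that any candidate implementation could be run
through.  The two stub functions are hypothetical stand-ins for the two
conventions, written only for illustration; they are not taken from
either kernel.

```c
#include <errno.h>

/* Hypothetical stand-in: rejects a negative lowfd with EBADF. */
int
stub_ebadf(int lowfd)
{
	if (lowfd < 0) {
		errno = EBADF;
		return (-1);
	}
	return (0);
}

/* Hypothetical stand-in: rejects a negative lowfd with EINVAL. */
int
stub_einval(int lowfd)
{
	if (lowfd < 0) {
		errno = EINVAL;
		return (-1);
	}
	return (0);
}

/* Returns 1 when fn(-1) fails with EBADF, 0 otherwise. */
int
rejects_negative_with_ebadf(int (*fn)(int))
{
	errno = 0;
	return (fn(-1) == -1 && errno == EBADF);
}
```

The function-pointer argument mirrors the way the test program in the
archive passes each closefrom_xxx() variant around, so the same check
could take closefrom_sysctl or closefrom_brute directly.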

Also included in the archive is a timing test along with a regression
test borrowed from OpenSSH.

It was successfully built and tested on FreeBSD 6.2-STABLE.
It also includes code that should make it work on -CURRENT.

A sample run on a Pentium 4 1.7GHz:

$ make test
Trying closefrom_syscall(3) with 58976 open file descriptors
user    0.000000        sys     0.030874        total   0.030874
Trying closefrom_syscall(3) with 58976 closed file descriptors
user    0.000000        sys     0.000008        total   0.000008

Trying closefrom_sysctl(3) with 58976 open file descriptors
user    0.050941        sys     0.045333        total   0.096274
Trying closefrom_sysctl(3) with 58976 closed file descriptors
user    0.000877        sys     0.000939        total   0.001816

Trying closefrom_brute(3) with 58976 open file descriptors
user    0.037777        sys     0.043793        total   0.081570
Trying closefrom_brute(3) with 58976 closed file descriptors
user    0.026666        sys     0.046383        total   0.073049
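
To compare the runs, the totals above can be normalized to an average
cost per descriptor.  The helper below is only an illustrative sketch;
its name is made up and it is not part of the archive.

```c
/*
 * Average cost per descriptor, in nanoseconds, given a total time in
 * seconds and the number of descriptors handled.  Returns 0 when the
 * count is not positive.  (Illustrative helper, not from the archive.)
 */
double
ns_per_fd(double total_seconds, long nfds)
{
	if (nfds <= 0)
		return (0.0);
	return (total_seconds * 1e9 / (double)nfds);
}
```

For example, ns_per_fd(0.030874, 58976) gives the syscall figure for
the open case (about half a microsecond per descriptor), and
ns_per_fd(0.081570, 58976) gives the brute-force one (about 1.4
microseconds).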

closefrom_sysctl() has a worst-case scenario when a lot of files
are open that may make it slower than closefrom_brute().
Implementations using /proc/<pid>/fd/ are also vulnerable to this.
With no library version guaranteed to be faster, and because of the
various reasons discussed in
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/thread.html
I believe it'd be best to implement it as a system call (which can be
done through fcntl() anyway).
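
As a rough sketch of the fcntl() route: NetBSD exposes the operation
as the F_CLOSEM command (see the README), so a libc wrapper could look
like the code below.  The name closefrom_compat() is hypothetical, and
whether FreeBSD would use the same command name is an assumption.
Where F_CLOSEM is not defined, the sketch falls back to the brute-force
loop, taking the upper bound from getrlimit(RLIMIT_NOFILE) instead of
the getdtablesize() call used in the archive.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>
#include <sys/resource.h>

/*
 * closefrom_compat() is a hypothetical name used for illustration.
 * A negative lowfd fails with EBADF, as in the attached code.
 */
int
closefrom_compat(int lowfd)
{
	if (lowfd < 0) {
		errno = EBADF;
		return (-1);
	}
#ifdef F_CLOSEM
	/* NetBSD-style: the kernel closes every descriptor >= lowfd. */
	return (fcntl(lowfd, F_CLOSEM, 0));
#else
	{
		/* Fallback: try close() on every possible descriptor. */
		struct rlimit rl;
		int fd, maxfd;

		if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
			return (-1);
		maxfd = rl.rlim_cur > (rlim_t)INT_MAX ?
		    INT_MAX : (int)rl.rlim_cur;
		/* Only EINTR is treated as an error; EBADF is expected. */
		for (fd = maxfd - 1; fd >= lowfd; fd--)
			if (close(fd) == -1 && errno == EINTR)
				return (-1);
		return (0);
	}
#endif
}
```

The fallback has the cost discussed above (one close() per possible
descriptor) and misses descriptors above a limit that was lowered after
they were opened, which is part of why doing the work in the kernel
looks preferable.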

More info is included in the README.

Any ideas, suggestions?
Salutes,
Igh

--------------070205070500040607020800
Content-Type: text/plain;
 name="closefrom.shar"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="closefrom.shar"

#!/bin/sh
# This is a shell archive
echo x closefrom
mkdir -p closefrom > /dev/null 2>&1
echo x closefrom/Makefile
sed 's/^X//' > closefrom/Makefile << 'SHAR_END'
XSUBDIR	= module test
X
X.include <bsd.subdir.mk>
SHAR_END
echo x closefrom/README
sed 's/^X//' > closefrom/README << 'SHAR_END'
XOVERVIEW
X
XThis archive contains 4 possible implementations of closefrom().
XThe first, a system call, is located in ./module/syscall.c and is
Xavailable as a kernel module for quick testing.
X
XBoth NetBSD >= 3.0 and DragonFly >= 1.4 implement it as a system call.
XIn NetBSD, it uses the F_CLOSEM fcntl(), available since version 2.0.
X
XThe second, implemented with the kern.file sysctl(), is available
Xon both FreeBSD >= 5.0 and DragonFly >= 1.2.  Dynamic memory must be
Xallocated for an array of "struct xfile" structures that describe each
Xopen file descriptor _for every running process_ in the system...!
X(Note: the sysctl(3) manpage should be patched to reflect the current
Xbehaviour since FreeBSD 5.0: it should mention struct xfile).
XOn my system, the size of this structure is 52 bytes, so the allocation
Xcould fail on systems that set up a larger kern.maxfiles.  This function
Xwould be cleaner to implement in NetBSD, which has an (undocumented)
Xkern.file2 that lets you work with a specific pid instead by passing
XKERN_FILE_BYPID.
X
XThe third is the usual brute force approach that uses getdtablesize(),
Xincluded as a reference for the approach most applications take.
X
XThe fourth tries to do what some implementations (including Solaris') do
Xby browsing /proc/<pid>/fd/, but using /dev/fd/ instead.  Unfortunately, it
Xdoesn't work because neither devfs(5) nor fdescfs(5) seems to include
Xduplicated file descriptors, sockets and maybe others.
X
X-o-
X
XIt was successfully built and tested on FreeBSD 6.2-STABLE (as of
XSept 18, 2007), though code that should work on -CURRENT is present
X(namely, the new FILEDESC_S[UN]LOCK macros).
X
XTo try the implementations, run the following commands:
X
Xcd module
Xmake
Xsudo make load
Xcd ..
Xcd test
Xmake
Xmake check
Xmake test
X
XFor repeated testing of any of the implementations you may run:
X./closefrom syscall
X./closefrom sysctl
X./closefrom brute
X
SHAR_END
echo x closefrom/module
mkdir -p closefrom/module > /dev/null 2>&1
echo x closefrom/test
mkdir -p closefrom/test > /dev/null 2>&1
echo x closefrom/test/closefrom.c
sed 's/^X//' > closefrom/test/closefrom.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X *    notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X *    notice, this list of conditions and the following disclaimer in the
X *    documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <dirent.h>
X#include <err.h>
X#include <errno.h>
X#include <fcntl.h>
X#include <limits.h>
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include <unistd.h>
X#include <sys/types.h>
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/resource.h>
X#include <sys/time.h>
X#include <sys/sysctl.h>
X
X#include <sys/syscall.h>
X#include <sys/module.h>
X
X#define DEBUG
X
Xstatic void
Xusage(const char *argv0)
X{
X	fprintf(stderr, "Usage: %s syscall|sysctl|brute|devfd\n"
X		"Usage: %s check\n", argv0, argv0);
X	exit(1);
X}
X
Xstatic int (*closefrom)(int);	/* pointer to closefrom_xxx() */
X
X/*
X * LKM version of closefrom()
X */
X
Xstatic int syscall_num;
X
Xstatic void
Xfind_module(void)
X{
X	struct module_stat stat;
X	int modid;
X
X    	modid = modfind("closefrom");
X	if (modid == -1)
X		err(1, "modfind(closefrom)");
X
X	stat.version = sizeof(stat);
X	if (modstat(modid, &stat) == -1)
X		err(1, "modstat()");
X
X	syscall_num = stat.data.intval;
X}
X
Xstatic int
Xclosefrom_syscall(int lowfd)
X{
X	return (syscall(syscall_num, lowfd));
X}
X
X/*
X * This version uses the kern.file sysctl()
X */
Xstatic int
Xclosefrom_sysctl(int lowfd)
X{
X	int mib[2] = { CTL_KERN, KERN_FILE };
X	struct xfile *files = NULL;
X	pid_t pid = getpid();
X	size_t fsize = 0;	/* filled in by the sizing sysctl() call */
X	int i, nfiles;
X
X	if (lowfd < 0) {
X		errno = EBADF;
X		return (-1);
X	}
X
X	for (;;) {
X		if (sysctl(mib, 2, files, &fsize, NULL, 0) == -1) {
X			if (errno != ENOMEM)
X				goto bad;
X			else if (files != NULL) {
X				free(files);
X				files = NULL;
X			}
X		} else if (files == NULL) {
X			files = (struct xfile *) malloc(fsize);
X			if (files == NULL)
X	    			return (-1);
X		} else
X			break;
X	}
X
X        /* XXX This structure may change */
X	if (files->xf_size != sizeof(struct xfile) ||
X		fsize % sizeof(struct xfile))
X	{
X		errno = ENOSYS;
X		goto bad;
X	}
X
X	nfiles = fsize / sizeof(struct xfile);
X
X	for (i = 0; i < nfiles; i++)
X		if (files[i].xf_pid == pid && files[i].xf_fd >= lowfd)
X			if (close(files[i].xf_fd) < 0 && errno == EINTR)
X				goto bad;
X
X	free(files);
X	return (0);
X
Xbad:
X	if (files != NULL) {
X		int save_errno = errno;
X		free(files);
X		errno = save_errno;
X	}
X	return (-1);
X}
X
X/*
X * This version iterates over all possible file descriptors >= lowfd
X */
Xstatic int
Xclosefrom_brute(int lowfd)
X{
X	int fd;
X
X	if (lowfd < 0) {
X		errno = EBADF;
X		return (-1);
X	}
X
X	for (fd = getdtablesize() - 1; fd >= lowfd; fd--)
X		if (close(fd) < 0 && errno == EINTR)
X			return (-1);
X
X	return (0);
X}
X
X/*
X * An example implementation using /dev/fd (other systems use /proc/<pid>/fd)
X * Unfortunately, on FreeBSD, fdescfs(5) doesn't include duplicated file
X * descriptors and sockets.
X */
Xstatic int
Xclosefrom_devfd(int lowfd)
X{
X	struct dirent *d;
X	DIR *dir;
X	int fd;
X
X	if (lowfd < 0) {
X		errno = EBADF;
X		return (-1);
X	}
X
X	/*
X	 * Close lowfd so we have a spare fd to use with /dev/fd
X	 */
X	close(lowfd++);
X
X	if ((dir = opendir("/dev/fd")) == NULL)
X		return (-1);
X
X	while ((d = readdir(dir)) != NULL) {
X#ifdef DEBUG
X		printf("%s\n", d->d_name);
X#endif
X		if (d->d_name[0] == '.')
X			continue;
X		fd = atoi(d->d_name);
X		if (fd >= lowfd && fd != dirfd(dir))
X			if (close(fd) < 0 && errno == EINTR)
X				goto bad;
X	}
X
X	(void)closedir(dir);
X	return (0);
X
Xbad:
X	{
X		int save_errno = errno;
X		(void)closedir(dir);
X		errno = save_errno;
X	        return (-1);
X	}
X}
X
Xstatic void
Xtime_closefrom(int lowfd)
X{
X	struct rusage ru, rux;
X	struct timeval tv;
X	double usecs, ssecs;
X
X	if (getrusage(RUSAGE_SELF, &ru) < 0)
X		err(1, "getrusage()");
X	if (closefrom(lowfd) < 0)
X		err(1, "closefrom()");
X	if (getrusage(RUSAGE_SELF, &rux) < 0)
X		err(1, "getrusage()");
X
X	timersub(&rux.ru_utime, &ru.ru_utime, &tv);
X	usecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X	printf("user\t%f\t", usecs);
X	timersub(&rux.ru_stime, &ru.ru_stime, &tv);
X	ssecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X	printf("sys\t%f\t", ssecs);
X	usecs += ssecs;
X	printf("total\t%f\n", usecs);
X}
X
Xstatic void
Xtry(int (*xclosefrom)(int), const char *str)
X{
X	int fd, lowfd, maxfd;
X
X	lowfd = dup(STDIN_FILENO);
X	maxfd = getdtablesize();
X	for (fd = 1; fd < maxfd; fd++)
X		if (dup(STDIN_FILENO) < 0)
X			break;
X
X	closefrom = xclosefrom;
X	printf("Trying %s(%d) with %d open file descriptors\n", str, lowfd, fd);
X	time_closefrom(lowfd);
X
X	printf("Trying %s(%d) with %d closed file descriptors\n", str, lowfd, fd);
X	time_closefrom(lowfd);
X	printf("\n");
X}
X
Xint test(int (*)(int));
X
Xint
Xmain(int argc, char *argv[])
X{
X	if (argv[1] == NULL)
X		usage(argv[0]);
X
X	if (!strcmp(argv[1], "check")) {
X		find_module();
X		printf("testing closefrom_syscall():\t%s\n",
X			test(&closefrom_syscall) ? "failed" : "ok");
X		printf("testing closefrom_sysctl():\t%s\n",
X			test(&closefrom_sysctl) ? "failed" : "ok");
X		printf("testing closefrom_brute():\t%s\n",
X			test(&closefrom_brute) ? "failed" : "ok");
X	}
X	else if (!strcmp(argv[1], "syscall")) {
X		find_module();
X		try(&closefrom_syscall, "closefrom_syscall");
X	}
X	else if (!strcmp(argv[1], "sysctl"))
X		try(&closefrom_sysctl, "closefrom_sysctl");
X	else if (!strcmp(argv[1], "devfd"))
X		try(&closefrom_devfd, "closefrom_devfd");
X	else if (!strcmp(argv[1], "brute"))
X		try(&closefrom_brute, "closefrom_brute");
X	else
X		usage(argv[0]);
X
X	return (0);
X}
X
X/*
X * NOTE:
X *   The following code was adapted from OpenSSH's
X *   openbsd-compat/regress/closefromtest.c
X */
X
X/*
X * Copyright (c) 2006 Darren Tucker
X *
X * Permission to use, copy, modify, and distribute this software for any
X * purpose with or without fee is hereby granted, provided that the above
X * copyright notice and this permission notice appear in all copies.
X *
X * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
X * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
X * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
X * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
X * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
X * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
X * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
X */
X
X#define NUM_OPENS 10
X
X#define fail(str)	\
X    do { printf("%s\n", (str));	\
X	return -1; } while(0)
X
Xint
Xtest(int (*xclosefrom)(int))
X{
X	int i, max, fds[NUM_OPENS];
X	char buf[512];
X
X	for (i = 0; i < NUM_OPENS; i++)
X		if ((fds[i] = open("/dev/null", O_RDONLY)) == -1)
X			exit(0);	/* can't test */
X	max = i - 1;
X
X	/* should close last fd only */
X	xclosefrom(fds[max]);
X	if (close(fds[max]) != -1)
X		fail("failed to close highest fd");
X
X	/* make sure we can still use remaining descriptors */
X	for (i = 0; i < max; i++)
X		if (read(fds[i], buf, sizeof(buf)) == -1)
X			fail("closed descriptors it should not have");
X
X	/* should close all fds */
X	xclosefrom(fds[0]);
X	for (i = 0; i < NUM_OPENS; i++)
X		if (close(fds[i]) != -1)
X			fail("failed to close from lowest fd");
X
X	return 0;
X}
SHAR_END
echo x closefrom/test/Makefile
sed 's/^X//' > closefrom/test/Makefile << 'SHAR_END'
XPROG	= closefrom
XNO_MAN	=
X
XCFLAGS	= -Wall -O2
X
Xcheck:	${PROG}
X	@./${PROG} check
X
Xtest:	${PROG}
X	@./${PROG} syscall
X	@./${PROG} sysctl
X	@./${PROG} brute
X
X.include <bsd.prog.mk>
SHAR_END
echo x closefrom/module/Makefile
mkdir -p closefrom/module > /dev/null 2>&1
sed 's/^X//' > closefrom/module/Makefile << 'SHAR_END'
XKMOD	= syscall
XSRCS	= syscall.c vnode_if.h
X
XCFLAGS += -Wall
X
Xreload:
X	@${MAKE} unload
X	@${MAKE} load
X
X.include <bsd.kmod.mk>
SHAR_END
echo x closefrom/module/syscall.c
sed 's/^X//' > closefrom/module/syscall.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X *    notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X *    notice, this list of conditions and the following disclaimer in the
X *    documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/filedesc.h>
X#include <sys/kernel.h>
X#include <sys/proc.h>
X#include <sys/syscallsubr.h>
X#include <sys/sysent.h>
X#include <sys/systm.h>
X#include <sys/vnode.h>
X#include <sys/module.h>
X
X/*
X * Newer code in FreeBSD > 6.2 uses shared/exclusive locks
X */
X#ifndef FILEDESC_SLOCK
X#define FILEDESC_SLOCK		FILEDESC_LOCK_FAST
X#define FILEDESC_SUNLOCK	FILEDESC_UNLOCK_FAST
X#endif
X
X/*
X * kern_closefrom()
X */
Xstatic int
Xkern_closefrom(struct thread *td, int lowfd)
X{
X	struct filedesc *fdp;
X	int fd;
X
X	/*
X	 * Note: NetBSD uses EBADF and DragonFly uses (undocumented) EINVAL
X	 */
X	if (lowfd < 0)
X		return (EBADF);
X
X	fdp = td->td_proc->p_fd;
X
X	FILEDESC_SLOCK(fdp);
X	while ((fd = fdp->fd_lastfile) >= lowfd) {
X		FILEDESC_SUNLOCK(fdp);
X		if (kern_close(td, fd) == EINTR)
X			return (EINTR);
X		FILEDESC_SLOCK(fdp);
X	}
X	FILEDESC_SUNLOCK(fdp);
X
X	return (0);
X}
X
X/* closefrom() arguments */
Xstruct closefrom_args {
X	int fd;
X};
X
Xstatic int
Xclosefrom(struct thread *td, void *args)
X{
X	struct closefrom_args *uap = (struct closefrom_args *)args;
X
X	return (kern_closefrom(td, uap->fd));
X}
X
X/* closefrom() sysent[] */
Xstatic struct sysent closefrom_sysent = {
X	1,		/* number of arguments */
X	closefrom	/* implementing function */
X};
X
X/*
X * LKM stuff
X */
X
X/* offset in sysent[] where the syscall will be allocated */
Xstatic int offset = NO_SYSCALL;
X
Xstatic int
Xload(struct module *module, int cmd, void *arg)
X{
X	int error = 0;
X
X	switch (cmd) {
X	case MOD_LOAD:
X		uprintf("closefrom loaded at offset %d\n", offset);
X		break;
X
X	case MOD_UNLOAD:
X		uprintf("closefrom unloaded from offset %d\n", offset);
X		break;
X
X	default:
X		error = EOPNOTSUPP;
X		break;
X	}
X
X	return (error);
X}
X
XSYSCALL_MODULE(closefrom, &offset, &closefrom_sysent, load, NULL);
SHAR_END
exit

--------------070205070500040607020800--


