From owner-cvs-src@FreeBSD.ORG  Wed Aug 13 10:16:29 2008
Return-Path: <owner-cvs-src@FreeBSD.ORG>
Delivered-To: cvs-src@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 76840106566B;
	Wed, 13 Aug 2008 10:16:29 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au
	[211.29.132.186])
	by mx1.freebsd.org (Postfix) with ESMTP id D4D338FC13;
	Wed, 13 Aug 2008 10:16:28 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from c220-239-252-11.carlnfd3.nsw.optusnet.com.au
	(c220-239-252-11.carlnfd3.nsw.optusnet.com.au [220.239.252.11])
	by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	m7DAGMSZ002758
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 13 Aug 2008 20:16:26 +1000
Date: Wed, 13 Aug 2008 20:16:22 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
In-Reply-To: <20080813014505.H1310@besplex.bde.org>
Message-ID: <20080813195547.W91887@delplex.bde.org>
References: <200808081343.m78DhwYE068477@repoman.freebsd.org>
	<20080812014937.E21092@besplex.bde.org>
	<20080812231130.D760@besplex.bde.org>
	<200808121115.01483.jhb@freebsd.org>
	<20080813014505.H1310@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org,
	Ed Schouten <ed@FreeBSD.org>, cvs-all@FreeBSD.org,
	John Baldwin <jhb@FreeBSD.org>
Subject: Re: cvs commit: src/sys/dev/io iodev.c
X-BeenThere: cvs-src@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the src tree <cvs-src.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-src>
List-Post: <mailto:cvs-src@freebsd.org>
List-Help: <mailto:cvs-src-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 13 Aug 2008 10:16:29 -0000

On Wed, 13 Aug 2008, Bruce Evans wrote:

> On Tue, 12 Aug 2008, John Baldwin wrote:

>> Of course bpf is
>> broken with revoke, but nobody uses revoke with bpf.  What people do do in
>> the normal course of using bpf is lots of concurrent bpf accesses, and w/o
>> D_TRACKCLOSE, bpf devices don't get closed.
>
> Why not fix the actual bug?
>
> Your commit doesn't give enough details on the actual bug, so I will
> try to guess it:  libpcap has to probe for for a free bpf unit, so it
> does lots of failing opens of bpfN (especially when N == 0) when bpfN
> is already in use.  Failing opens break last closes with which they
> are concurrent, because the relevant reference count (si_usecount) is
> increased during the failing open (I think it is the vref() in _fgetvp()
> that does it).  Then when the opens fail, si_usecount is decremented
> to 1, but devfs_close() is not called again because only 1 real last
> close is possible (I think -- at least without races with revoke()),
> so d_close() is never called twice for 1 real least close.  Failing
> opens shouldn't take long, so it is surprising that the race is often
> lost.  Apparently there is some synchronization.
>
> This bug probably also affects the probe for a free pty.

In fact, it is well known that it does.  There is a PR or 2 about this,
and committed fixes that were backed out because they gave panics instead
of just leaked ptys.  The following seems to my most recent mail about
this.  The test program in it still leaks ptys under -current.

% From bde@zeta.org.au Thu Nov  9 16:39:06 2006 +1100
% Date: Thu, 9 Nov 2006 16:39:03 +1100 (EST)
% From: Bruce Evans <bde@zeta.org.au>
% X-X-Sender: bde@delplex.bde.org
% To: Rong-en Fan <grafan@gmail.com>
% cc: Kostik Belousov <kostikbel@gmail.com>, bde@FreeBSD.org, tegge@FreeBSD.org, 
%     mb@imp.ch, Vlad Galu <dudu@dudu.ro>
% Subject: Re: Fwd: panic when portupgrade in jail (devfs related?)
% In-Reply-To: <6eb82e0611081559s2dd6d088k529b161a63b9eec2@mail.gmail.com>
% Message-ID: <20061109155921.O59908@delplex.bde.org>
% References: <6eb82e0611050847i54d16638x89c428c9dffcc106@mail.gmail.com> 
%  <6eb82e0611060946p726b3418of94036f583e95dcc@mail.gmail.com> 
%  <20061106193405.GU12108@deviant.kiev.zoral.com.ua> 
%  <6eb82e0611061146gd6b8402xc254a992d149dd11@mail.gmail.com> 
%  <20061107041730.GV12108@deviant.kiev.zoral.com.ua> 
%  <6eb82e0611062223o4b7193bbu4fb19c79f49e9700@mail.gmail.com> 
%  <6eb82e0611070026m682bc3f7vfdbcb7c0527fdd06@mail.gmail.com> 
%  <6eb82e0611080221u47473713m17b19e60307663a0@mail.gmail.com>
%  <6eb82e0611081559s2dd6d088k529b161a63b9eec2@mail.gmail.com>
% MIME-Version: 1.0
% Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
% Status: O
% X-Status: 
% X-Keywords: 
% X-UID: 623
% 
% On Thu, 9 Nov 2006, Rong-en Fan wrote:
% 
% > OK, after running 4 instances of script in jail for 10+ hours. It seems
% > that ttys are leaked (I haven't got any panic). I get lots of
% >
% > script: openpty: No such file or directory
% 
% This is another known bug.  No fix is in sight, but I think one of the
% other known bugs can be exploited to write a program to unleak the
% ptys.  (Arrange for a nonzero vfs refcount, possibly by attempting to
% open the leaked pty, and race with revoke().  The open() will always fail
% int ptcopen(), but it may be possible to trick revoke() into calling
% ptcclose() for the unopen device; then the ptcclose() will unleak the
% device.)
% 
% I now have a better SMP machine for testing the races.  The original
% version of my program to demonstrate the leak doesn't show the leak
% very fast under -current on this machine, and it takes 2 instances
% (parent doing open()s and child doing stat()s) to leak at all, though
% the leak occurs fast on a machine with similar speed (but configured
% with INVARIANTS etc) running 6.1.  Apparently, locking fixes in
% -current have fixed or reduced the race with stat().
% 
% The following version (with TIOCEXCL and both the parent and child
% doing open()'s) leaks fast on all machines that I tested on.
% 
% %%%
% #include <sys/ioctl.h>
% #include <sys/stat.h>
% 
% #include <err.h>
% #include <fcntl.h>
% #include <unistd.h>
% 
% int
% main(int argc, char **argv)
% {
%  	struct stat sb;
%  	const char *pty, *who;
%  	pid_t child, parent;
%  	int fd, i, lastok;
% 
%  	pty = (argc == 1 ? "/dev/ptyp0" : argv[1]);
%  	i = 0;
%  	lastok = -1;
%  	parent = getpid();
%  	child = fork();
%  	who = (child == 0 ? "child" : "parent");
%  	for (;;) {
%  		fd = open(pty, O_WRONLY | O_NONBLOCK);
%  		if (fd < 0) {
%  			if (i - lastok >= 1000) {
%  				if (child != 0)
%  					usleep(50000);
%  				err(1, "%s open %d", who, i);
%  			}
%  			i++;
%  			continue;
%  		}
%  		(void)ioctl(fd, TIOCEXCL);
%  		if (close(fd) != 0)
%  			err(1, "%s close %d", who, i);
%  		lastok = i;
%  		i++;
%  	}
% }
% %%%
% 
% Usage is something like "./foo /dev/ptyp<victim>" as non-root.  It
% quits after 1000 failed opens and a leaked pty is shown by failing
% open #'s 0 through 999 and printing i == 999 when quitting.  Some
% open failures are now normal for ptyp* since all opens of ptyp*
% are exclusive and the program now races itself.  The TIOCEXCL is to
% give similar exclusivity for testing other devices.  It only works
% as non-root.  Remove it to simplify testing of ptyp* only.  After
% an open() failure, the usleep() is to give up control so as not
% to see 1000 sequential open() failures on UP systems just because
% 1000 open()s can be done before being rescheduled.  Remove it to
% simplify testing on SMP systems.
% 
% Bruce
%

Any device tha enforces exclusive access for all opens (instead of
selectively/correctly according to O_EXCL and/or TIOCEXCL) has the
same leak.  Such devices must have some state which indicates that
the device is open.  When a last-close is missed, the device-is-open
state remains set, so the device cannot be opened again.  Then it
cannot be closed again.  For devices that don't do this, the missed
last-close isn't much of a problem.  New opens just work, and it
is possible to unleak the device by repeating open()/close() until
the last-close isn't missed.

Bruce