From owner-freebsd-arch@FreeBSD.ORG Thu Jul 17 23:55:43 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A5D3EB79 for ; Thu, 17 Jul 2014 23:55:43 +0000 (UTC) Received: from mail-wi0-x232.google.com (mail-wi0-x232.google.com [IPv6:2a00:1450:400c:c05::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3FAE02F77 for ; Thu, 17 Jul 2014 23:55:43 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id hi2so3635043wib.17 for ; Thu, 17 Jul 2014 16:55:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=r7XJHj1X6DUu/L8H1rPX9pue/L1X8WJ6cSwXous5zNc=; b=fFB1L/jfG0zSiD4nbmCW9Kljenw8ys+QWYgQ0PcAaU1p8aadYm0K7KKfcQrsp21cxC GAUrI4/wP8FKGL+6cSBr4Npn8C1DoqlM5+l+W0hSuO25dA/AR87+Vt11EoG0d5g1JlMD TJncV5LfkCJYyzQHwh+DN6KvxXCoh5u2J8eoZwP189l1YZHfhdj0rdXv/7y5YQJ+UFqV 7UTTaqSk5G6n7l+QMt2H61rkNbhCuuPxY5AVuLne9vlwGLjvMeTRVIjbkj33NMS4IN2n K8uyYLmaopxe5BmCSZRWZ7E5/yxTKB71ah5iDDlNPNwcAFjqCZ1Jf/FrsKeqwivDoYxC 9CRQ== X-Received: by 10.180.210.239 with SMTP id mx15mr2050986wic.65.1405641341458; Thu, 17 Jul 2014 16:55:41 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id x3sm18212wia.11.2014.07.17.16.55.40 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Thu, 17 Jul 2014 16:55:40 -0700 (PDT) Date: Fri, 18 Jul 2014 01:55:38 +0200 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: current fd allocation idiom Message-ID: <20140717235538.GA15714@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jul 2014 23:55:43 -0000 The kernel has to deal with concurrent attempts to open, close, dup2 and use the same file descriptor. I start with stating a rather esoteric bug affecting this area, then I follow with a short overview of what is happening in general and a proposal how to change to get rid of the bug and get some additional enchancements. Interestingly enough turns out Linux is doing pretty much the same thing. ============================ THE BUG: /* * Initialize the file pointer with the specified properties. * * The ops are set with release semantics to be certain that the flags, type, * and data are visible when ops is. This is to prevent ops methods from being * called with bad data. */ void finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops) { fp->f_data = data; fp->f_flag = flag; fp->f_type = type; atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops); } This itself is fine, but it assumes all code obtaining fp from fdtable places a read memory barrier after reading f_ops and before reading anything else. As you could guess no code does that and I don't believe placing rmb's in several new places is the way to go. ============================ GENERAL OVERVIEW OF CURRENT STATE: fps are obtained and installed as follows: struct file *fp; int fd; if (error = falloc(&fp, &fd)) return (error); if (error = something(....)) { fdclose(fp, fd); fdrop(fp); return (error); } finit(fp, ....); fdrop(fp); return (0); After falloc fp is installed in fdtable, it has refcount 2 and ops set to 'badfileops'. if something() failed: fdclose() checks if it has anything to do. if so, it cleares fd and fdrops fp fdrop() clears the second reference, everything is cleared up if something() succeeded: finit() finishes initialization of fp fdrop() cleares the second reference. fp now has expected refcount of 1. Now a little complication: parallel close() execution: fd is recognizes as used. it is cleared and fdrop(fp) is called. if something() succeeded after close: fdrop() kills fp if something() failed after close: fdclose() concludes nothing to do fdrop() kill fp Same story with dup2. What readers need to do: - rmb() after reading fp_ops - check fp_ops for badfileops ============================ PROPOSAL: struct file *fp; int fd; if (error = falloc(&fp, &fd)) return (error); if (error = something(....)) { fdclose(fp, fd); return (error); } finit(fp, ....); factivate(fd, fp); return (0); After falloc fd is only marked as used, fp is NOT installed. fp is returned with refcount of 1 and is invisible to anyone but curthread. if something() failed: fdclose() marks fd as unused and kills fp if something() succeeded: finit() finishes initialization of fp factivate() sets fp to non-null with a barrier Now a little complication: parallel close() execution: since fp is null, fd is recognized as unused. EBADF is returned. The only problem is with dup2 and I believe it is actually a step forward. Let's assume fd was marked as used by falloc, but fp was not installed yet. dup2(n, fd) will see that fd is used. With current code there is no problem since there is fp to fdrop and it can proceed. With the proposal however, there is nothing to fdrop. Linux returns EBADF in this case which deals with the problem and does not seem to provide any drawbacks for behaving processes. So, differences to current approach: 1. fewer barriers and atomic operations 2. no need to check for f_ops type 3. new case when dup2 can return an error Note that 3 should not be a problem since Linux is doing this already. Also note current approach is not implemented correctly at the moment as it misses rmbs, although I'm unsure how much this matters in practice. Thoughts? -- Mateusz Guzik