From: Mateusz Guzik
To: freebsd-arch@freebsd.org
Date: Mon, 24 Nov 2014 00:14:35 +0100
Subject: rarely changing process-wide data vs threads
Message-ID: <20141123231435.GA32084@dft-labs.eu>
List-Id: Discussion related to FreeBSD architecture

Currently some frequently accessed data requires locking even though it
very rarely changes. This includes:
- cwd, root, jdir vnodes
- resource limits

File lookup typically requires us to vref and unref the cwd and root dir
vnodes and to take the filedesc lock shared, which competes with fd
open/close in other threads.

Any resource limit check requires taking PROC_LOCK, which is an exclusive
lock.

It turns out we already have a nice solution which only needs some minor
refining. It is the one used to manage credentials:

Each thread has a reference on the active credentials and has its own
pointer. When credentials are updated, a new structure is allocated and
threads check that they have the right pointer on the syscall boundary. If
they have the wrong one, they lock PROC_LOCK and update.

We can make this more general to suit other needs by introducing a
'generation' counter and optionally an rwlock instead of using PROC_LOCK.

If 'generation' is unequal to what is set in the process, at least one of
creds/dirs/rlimits/$something needs updating and we can take the lock and
iterate over structs.

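To make the generation check concrete, here is a minimal userland sketch
of the idea, not kernel code. Every name in it (struct plimits, struct
proc_model, struct thread_model, thread_syscall_enter and so on) is
hypothetical and made up for illustration; a pthread mutex stands in for
PROC_LOCK and a single "nofile" value stands in for the resource limits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical immutable, reference-counted snapshot (stands in for rlimits). */
struct plimits {
	atomic_uint refcnt;
	long nofile;
};

/* Hypothetical per-process state; 'lock' plays the role of PROC_LOCK. */
struct proc_model {
	pthread_mutex_t lock;
	atomic_uint gen;		/* bumped on every update */
	struct plimits *limits;		/* the process holds one reference */
};

/* Hypothetical per-thread state: a private pointer plus the generation seen. */
struct thread_model {
	unsigned gen;
	struct plimits *limits;		/* the thread holds its own reference */
};

static struct plimits *
limits_alloc(long nofile)
{
	struct plimits *l = malloc(sizeof(*l));

	if (l == NULL)
		abort();
	atomic_init(&l->refcnt, 1);
	l->nofile = nofile;
	return (l);
}

static void
limits_drop(struct plimits *l)
{
	unsigned old = atomic_fetch_sub(&l->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		free(l);
}

void
proc_init(struct proc_model *p, long nofile)
{
	pthread_mutex_init(&p->lock, NULL);
	atomic_init(&p->gen, 0);
	p->limits = limits_alloc(nofile);
}

/* A new thread takes its own reference and records the current generation. */
void
thread_attach(struct proc_model *p, struct thread_model *td)
{
	pthread_mutex_lock(&p->lock);
	td->gen = atomic_load(&p->gen);
	td->limits = p->limits;
	atomic_fetch_add(&td->limits->refcnt, 1);
	pthread_mutex_unlock(&p->lock);
}

/* Writer: allocate a new structure, publish it, bump the generation. */
void
proc_set_nofile(struct proc_model *p, long nofile)
{
	struct plimits *fresh = limits_alloc(nofile), *old;

	pthread_mutex_lock(&p->lock);
	old = p->limits;
	p->limits = fresh;
	atomic_fetch_add(&p->gen, 1);
	pthread_mutex_unlock(&p->lock);
	limits_drop(old);
}

/*
 * Syscall-boundary check: lockless when nothing changed, otherwise take
 * the lock and refresh. Returns 1 if the thread had to update, else 0.
 */
int
thread_syscall_enter(struct proc_model *p, struct thread_model *td)
{
	struct plimits *old;

	if (atomic_load(&p->gen) == td->gen)
		return (0);
	pthread_mutex_lock(&p->lock);
	old = td->limits;
	td->limits = p->limits;
	atomic_fetch_add(&td->limits->refcnt, 1);
	td->gen = atomic_load(&p->gen);
	pthread_mutex_unlock(&p->lock);
	limits_drop(old);
	return (1);
}
```

Between two calls to thread_syscall_enter() the thread reads td->limits
with no lock at all. With several kinds of data (creds, dirs, rlimits) the
slow path would compare and refresh each pointer in turn under the one
lock.
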
This may raise some concern, since it may seem to introduce a window in
which a given thread uses stale data while a concurrently executing thread
uses the new data.

This window is already present for all users that I can see.

During file lookups the filedesc lock is only held temporarily (and the
current code even has a possible use-after-free, since it does not start
by taking a reference on the root vnode while fdp is locked, so the vnode
can be freed/recycled).

Resource limits are inherently racy anyway. The proc lock is held only for
a short time to read them, that's it.

As such, I don't believe this approach introduces any new windows
(although it extends already existing ones).

When it comes to implementing this concept for dir vnodes, one would need
to split the current struct filedesc. chdir in threaded processes would be
more expensive, since a new struct would have to be allocated and the
vnodes vrefed, but chdirs are far less frequent than lookups, so it should
be worth it anyway.

There is also the matter of filedescs shared between processes. In such
cases we would abandon this optimisation (the dir struct can have a flag
to note that copy-on-write is not suitable and that lookups need to vref
like they do now).

Comments?

-- 
Mateusz Guzik