From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 02:08:55 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 795E8106564A for ; Sun, 13 Apr 2008 02:08:55 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 4F2A68FC16 for ; Sun, 13 Apr 2008 02:08:55 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 1E2001A4D7E; Sat, 12 Apr 2008 19:08:55 -0700 (PDT) Date: Sat, 12 Apr 2008 19:08:55 -0700 From: Alfred Perlstein To: Jeff Roberson Message-ID: <20080413020855.GA95731@elvis.mu.org> References: <200804121703.m3CH3StJ081660@chez.mckusick.com> <41ED3941-E5E6-45F0-B880-C1B2861FDE32@rabson.org> <20080412131017.S43186@desktop> <20080412234547.GZ95731@elvis.mu.org> <20080412135135.V43186@desktop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080412135135.V43186@desktop> User-Agent: Mutt/1.4.2.3i Cc: Kirk McKusick , arch@freebsd.org Subject: Re: VOP_LEASE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 02:08:55 -0000 * Jeff Roberson [080412 16:51] wrote: > > On Sat, 12 Apr 2008, Alfred Perlstein wrote: > > >* Jeff Roberson [080412 16:13] wrote: > >>On Sat, 12 Apr 2008, Doug Rabson wrote: > >> > >>> > >>>On 12 Apr 2008, at 18:03, Kirk McKusick wrote: > >>> > >>>>>Date: Sat, 12 Apr 2008 02:13:15 -1000 (HST) > >>>>>From: Jeff Roberson > >>>>>To: arch@freebsd.org > >>>>>Subject: VOP_LEASE > >>>>> > >>>>>As far as I can tell this has never been used. Unless someone can show > >>>>>me > >>>>>otherwise I'm going to go ahead and remove it. 
> >>>>> > >>>>>Thanks, > >>>>>Jeff > >>>> > >>>>VOP_LEASE is used by NQNFS and NFSv4. It notifies them when a file > >>>>is modified locally so that they know to update any outstanding > >>>>leases (e.g., evict any write lease for the file and do callbacks > >>>>for any read leases for the file). Deleting VOP_LEASE would break > >>>>NFS big time. > >>> > >>>I think our NQNFS support might have been removed some time ago - I can't > >>>see any calls to VOP_LEASE in the code right now. Something like > >>>VOP_LEASE > >>>would certainly be useful for a hypothetical future NFSv4 server. I > >>>believe that samba could use it too for its oplocks feature which appears > >>>to be similar to NQNFS's leases and NFSv4's delegations. > >> > >>So the idea with delegations is that close() doesn't actually release the > >>file entirely to make future access cheaper? > >> > >>My issue with VOP_LEASE is only that there are no in kernel > >>implementations of the VOP. I doubt it is applied regularly in syscalls. > >>It also seems odd that it is called without a lock. > >> > >>Is the intent that the server will trap all accesses to a local vnode in > >>order to invalidate the client leases? > > > >VOP_LEASE is supposed to implemented by a filesystem client. > > > >For insance, NFS client with NQNFS would implement the VOP_LEASE > >and trap those accesses to manage the lease with the remote server. > > > >The remote server would get "lease RPCs" from the client and manage > >the cache appropriately. > > So why isn't this done within the actual VOP? If the lease expires > between calling VOP_LEASE and vn_lock(), VOP_READ() you have to do that > work all over again anyway. > > I don't yet see why this is in filesystem independent code. I'm not > asserting that it doesn't need to be. I'd just like to understand it > better. The reason to have it is to reduce code duplication and not to be holding the vnode locks while doing the callbacks into the server code. 
Let me explain. The reason is twofold: one is to reduce code duplication
and the other is to avoid holding locks for extended periods.

Consider a local client contending against a remote client for a
filesystem that supported leases.

Basically, each and every filesystem would have to explicitly do a
VOP_LEASE at the start of every routine that required notifying the
server making use of the underlying filesystem.

What you really wind up doing is having a vop_stdlocallease that
calls into a generic lease manager that does callbacks into any
server exporting that file.

So, if you move the lease call INTO the VOP_READ/READDIR/WRITE/etc.,
you wind up holding vnode locks while doing client communication
when contending with remote servers.

-- 
- Alfred Perlstein

From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 02:19:33 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 232231065672 for ; Sun, 13 Apr 2008 02:19:33 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.231]) by mx1.freebsd.org (Postfix) with ESMTP id E5C188FC1F for ; Sun, 13 Apr 2008 02:19:32 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by rv-out-0506.google.com with SMTP id b25so362377rvf.43 for ; Sat, 12 Apr 2008 19:19:32 -0700 (PDT) Received: by 10.141.37.8 with SMTP id p8mr2525376rvj.53.1208053172655; Sat, 12 Apr 2008 19:19:32 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id g22sm7193414rvb.5.2008.04.12.19.19.30 (version=SSLv3 cipher=OTHER); Sat, 12 Apr 2008 19:19:32 -0700 (PDT) Date: Sat, 12 Apr 2008 16:20:50 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Alfred Perlstein In-Reply-To: <20080413020855.GA95731@elvis.mu.org> Message-ID: <20080412161417.Q43186@desktop> References: <200804121703.m3CH3StJ081660@chez.mckusick.com> <41ED3941-E5E6-45F0-B880-C1B2861FDE32@rabson.org> <20080412131017.S43186@desktop> <20080412234547.GZ95731@elvis.mu.org> <20080412135135.V43186@desktop> <20080413020855.GA95731@elvis.mu.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kirk McKusick , arch@freebsd.org Subject: Re: VOP_LEASE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 02:19:33 -0000 On Sat, 12 Apr 2008, Alfred Perlstein wrote: > * Jeff Roberson [080412 16:51] wrote: >> >> On Sat, 12 Apr 2008, Alfred Perlstein wrote: >> >>> * Jeff Roberson [080412 16:13] wrote: >>>> On Sat, 12 Apr 2008, Doug Rabson wrote: >>>> >>>>> >>>>> On 12 Apr 2008, at 18:03, Kirk McKusick wrote: >>>>> >>>>>>> Date: Sat, 12 Apr 2008 02:13:15 -1000 (HST) >>>>>>> From: Jeff Roberson >>>>>>> To: arch@freebsd.org >>>>>>> Subject: VOP_LEASE >>>>>>> >>>>>>> As far as I can tell this has never been used. Unless someone can show >>>>>>> me >>>>>>> otherwise I'm going to go ahead and remove it. >>>>>>> >>>>>>> Thanks, >>>>>>> Jeff >>>>>> >>>>>> VOP_LEASE is used by NQNFS and NFSv4. It notifies them when a file >>>>>> is modified locally so that they know to update any outstanding >>>>>> leases (e.g., evict any write lease for the file and do callbacks >>>>>> for any read leases for the file). Deleting VOP_LEASE would break >>>>>> NFS big time. 
>>>>> >>>>> I think our NQNFS support might have been removed some time ago - I can't >>>>> see any calls to VOP_LEASE in the code right now. Something like >>>>> VOP_LEASE >>>>> would certainly be useful for a hypothetical future NFSv4 server. I >>>>> believe that samba could use it too for its oplocks feature which appears >>>>> to be similar to NQNFS's leases and NFSv4's delegations. >>>> >>>> So the idea with delegations is that close() doesn't actually release the >>>> file entirely to make future access cheaper? >>>> >>>> My issue with VOP_LEASE is only that there are no in kernel >>>> implementations of the VOP. I doubt it is applied regularly in syscalls. >>>> It also seems odd that it is called without a lock. >>>> >>>> Is the intent that the server will trap all accesses to a local vnode in >>>> order to invalidate the client leases? >>> >>> VOP_LEASE is supposed to implemented by a filesystem client. >>> >>> For insance, NFS client with NQNFS would implement the VOP_LEASE >>> and trap those accesses to manage the lease with the remote server. >>> >>> The remote server would get "lease RPCs" from the client and manage >>> the cache appropriately. So just to be clear, this is required for nfsv4 client but not presently used by nfsv4 client? The vnodes we're calling VOP_LEASE on are actually remote files? >> >> So why isn't this done within the actual VOP? If the lease expires >> between calling VOP_LEASE and vn_lock(), VOP_READ() you have to do that >> work all over again anyway. >> >> I don't yet see why this is in filesystem independent code. I'm not >> asserting that it doesn't need to be. I'd just like to understand it >> better. > > The reason to have it is to reduce code duplication and not to be > holding the vnode locks while doing the callbacks into the server > code. > > Let me explain, the reason is 2-fold, one for reducing code duplication > and the other for avoiding holding locks for extended periods. 
> > Consider a local client contending against a remote client for a > filesystem that supported leases. > > Basically, each and every filesytem would have to explicitly do a > VOP_LEASE at the start of every routine that required notifying the > server making use of the underlying filesystem. So this is for the _server_ side and not the client side. That's what I originally asked. So you want to notify the nfsv4 server code that has mounted a local filesystem that you're going to modify or read a file locally so it can invalidate the client cache. Correct? > > What you really wind up doing is having a vop_stdlocallease that > calls into a generic lease manager that does callbacks into any > server exporting that file. > > So, if you move the lease call INTO the VOP_READ/READDIR/WRITE/etc > you wind up holding vnode locks while doing client communication > when contending with remote servers. Ok but doesn't this open a race? What about: VOP_LEASE() -> invalidate current remote leases <- new lease established vn_lock() VOP_WRITE() vn_unlock() Jeff > > -- > - Alfred Perlstein > From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 07:05:54 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 692F2106566C for ; Sun, 13 Apr 2008 07:05:54 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 284528FC27 for ; Sun, 13 Apr 2008 07:05:53 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 7634A17104; Sun, 13 Apr 2008 07:05:52 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m3D75psB018355; Sun, 13 Apr 2008 07:05:51 GMT (envelope-from phk@critter.freebsd.dk) To: Jeff Roberson From: "Poul-Henning 
Kamp" In-Reply-To: Your message of "Sat, 12 Apr 2008 13:51:15 -1000." <20080412132457.W43186@desktop> Date: Sun, 13 Apr 2008 07:05:51 +0000 Message-ID: <18354.1208070351@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 07:05:54 -0000

In message <20080412132457.W43186@desktop>, Jeff Roberson writes:

>It's worth discussing what posix actually guarantees for f_offset as well
>as what other operating systems do.

I think DWIM is quite easily defined here: concurrent access only
makes sense with pwrite[v](2)/pread[v](2).

The non p-prefix versions should always be serialized, because there
is no way of knowing where they read/write if you don't.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
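
To make the pread/pwrite point concrete, here is a minimal sketch, assuming a POSIX system and Python's os wrappers around pread(2) and pwrite(2). It shows positional I/O carrying its own offset and leaving the descriptor's shared offset (the f_offset under discussion) untouched, which is why concurrent callers need no agreement about "where" they are. The function name and the scratch file are illustrative assumptions, not anything from the thread.

```python
import os
import tempfile


def positional_io_leaves_offset_alone():
    """Return (offset_before, offset_after, bytes_read) around pread/pwrite."""
    fd, path = tempfile.mkstemp()  # hypothetical scratch file for the demo
    try:
        os.write(fd, b"AAAABBBBCCCC")            # write(2) advances the shared offset to 12
        before = os.lseek(fd, 0, os.SEEK_CUR)    # query the current shared offset
        chunk = os.pread(fd, 4, 4)               # read 4 bytes at explicit offset 4
        os.pwrite(fd, b"zzzz", 8)                # write 4 bytes at explicit offset 8
        after = os.lseek(fd, 0, os.SEEK_CUR)     # shared offset should be unchanged
        return before, after, chunk
    finally:
        os.close(fd)
        os.unlink(path)


before, after, chunk = positional_io_leaves_offset_alone()
print(before, after, chunk)
```

Because each call names its own offset, the order in which two such calls run does not change which bytes either one touches. That property is what the plain read/write calls lack.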
From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 08:19:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63B3A106564A for ; Sun, 13 Apr 2008 08:19:07 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from itchy.rabson.org (mail.rabson.org [IPv6:2002:50b1:e8f2:1::143]) by mx1.freebsd.org (Postfix) with ESMTP id 138288FC1F for ; Sun, 13 Apr 2008 08:19:07 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [IPv6:2002:50b1:e8f2:1:21b:63ff:feb8:5abc] (unknown [IPv6:2002:50b1:e8f2:1:21b:63ff:feb8:5abc]) by itchy.rabson.org (Postfix) with ESMTP id 4B68F3F91; Sun, 13 Apr 2008 09:19:06 +0100 (BST) Message-Id: From: Doug Rabson To: Jeff Roberson In-Reply-To: <20080412131017.S43186@desktop> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Sun, 13 Apr 2008 09:19:06 +0100 References: <200804121703.m3CH3StJ081660@chez.mckusick.com> <41ED3941-E5E6-45F0-B880-C1B2861FDE32@rabson.org> <20080412131017.S43186@desktop> X-Mailer: Apple Mail (2.919.2) Cc: Kirk McKusick , arch@freebsd.org Subject: Re: VOP_LEASE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 08:19:07 -0000 On 13 Apr 2008, at 00:15, Jeff Roberson wrote: > On Sat, 12 Apr 2008, Doug Rabson wrote: > >> >> On 12 Apr 2008, at 18:03, Kirk McKusick wrote: >> >>>> Date: Sat, 12 Apr 2008 02:13:15 -1000 (HST) >>>> From: Jeff Roberson >>>> To: arch@freebsd.org >>>> Subject: VOP_LEASE >>>> As far as I can tell this has never been used. Unless someone >>>> can show me >>>> otherwise I'm going to go ahead and remove it. >>>> Thanks, >>>> Jeff >>> VOP_LEASE is used by NQNFS and NFSv4. 
It notifies them when a file
>>> is modified locally so that they know to update any outstanding
>>> leases (e.g., evict any write lease for the file and do callbacks
>>> for any read leases for the file). Deleting VOP_LEASE would break
>>> NFS big time.
>>
>> I think our NQNFS support might have been removed some time ago - I
>> can't see any calls to VOP_LEASE in the code right now. Something
>> like VOP_LEASE would certainly be useful for a hypothetical future
>> NFSv4 server. I believe that samba could use it too for its oplocks
>> feature which appears to be similar to NQNFS's leases and NFSv4's
>> delegations.
>
> So the idea with delegations is that close() doesn't actually
> release the file entirely to make future access cheaper?
>
> My issue with VOP_LEASE is only that there are no in kernel
> implementations of the VOP. I doubt it is applied regularly in
> syscalls. It also seems odd that it is called without a lock.
>
> Is the intent that the server will trap all accesses to a local
> vnode in order to invalidate the client leases?

I'm working from memory here (too lazy to check out an old tree). I
seem to remember that the way this worked is that when an NQNFS server
granted a lease to a remote client, it arranged things so that any
local filesystem access to the leased file would first evict the remote
leaseholder. While the remote client has a valid lease, it is free to
aggressively cache locally as long as it flushes writes to the server on
eviction. The implementation was quite intrusive on the server. I can't
quite remember where VOP_LEASE came in, and the documentation is useless.

From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 08:24:06 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2121106566B for ; Sun, 13 Apr 2008 08:24:06 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.176]) by mx1.freebsd.org (Postfix) with ESMTP id AC7478FC0C for ; Sun, 13 Apr 2008 08:24:06 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wa-out-1112.google.com with SMTP id k17so1331231waf.3 for ; Sun, 13 Apr 2008 01:24:06 -0700 (PDT) Received: by 10.114.150.1 with SMTP id x1mr5487401wad.144.1208075046190; Sun, 13 Apr 2008 01:24:06 -0700 (PDT) Received: from ?10.0.1.199? ( [24.94.72.120]) by mx.google.com with ESMTPS id m25sm8639046waf.46.2008.04.13.01.24.04 (version=SSLv3 cipher=OTHER); Sun, 13 Apr 2008 01:24:05 -0700 (PDT) Date: Sat, 12 Apr 2008 22:24:22 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Poul-Henning Kamp In-Reply-To: <18354.1208070351@critter.freebsd.dk> Message-ID: <20080412221654.S959@desktop> References: <18354.1208070351@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 08:24:06 -0000 On Sun, 13 Apr 2008, Poul-Henning Kamp wrote: > In message <20080412132457.W43186@desktop>, Jeff Roberson writes: > >> It's worth discussing what posix actually guarantees for f_offset as well >> as what other operating systems do. > > I think DWIM is quite easily defined here: concurrent access only > makes sense with pwrite[v](2)/pread[v](2). It may only make sense with p* but it happens without. 
If "DWIM" means "do what it means" posix is purposefully permissive in the requirements for f_offset because existing implementations were non-serializing. > > The non p-prefix versions should always be serialized, because there > is know way of knowing where they read/write if you don't. Well that's at odds with what the standard says and what others implement. I think there is a clear case for serializing writes. I don't see what advantage we get from serializing reads. The heavy cost of synchronization should be justified by actual need. Thanks, Jeff > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. > From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 08:41:28 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 889F6106564A for ; Sun, 13 Apr 2008 08:41:28 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.178]) by mx1.freebsd.org (Postfix) with ESMTP id 610068FC1F for ; Sun, 13 Apr 2008 08:41:28 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wa-out-1112.google.com with SMTP id k17so1336249waf.3 for ; Sun, 13 Apr 2008 01:41:28 -0700 (PDT) Received: by 10.114.124.12 with SMTP id w12mr2233594wac.210.1208076087815; Sun, 13 Apr 2008 01:41:27 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id k37sm8707412waf.55.2008.04.13.01.41.26 (version=SSLv3 cipher=OTHER); Sun, 13 Apr 2008 01:41:27 -0700 (PDT) Date: Sat, 12 Apr 2008 22:41:44 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Doug Rabson In-Reply-To: Message-ID: <20080412222458.B959@desktop> References: <200804121703.m3CH3StJ081660@chez.mckusick.com> <41ED3941-E5E6-45F0-B880-C1B2861FDE32@rabson.org> <20080412131017.S43186@desktop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kirk McKusick , arch@freebsd.org Subject: Re: VOP_LEASE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 08:41:28 -0000 On Sun, 13 Apr 2008, Doug Rabson wrote: > > On 13 Apr 2008, at 00:15, Jeff Roberson wrote: > >> On Sat, 12 Apr 2008, Doug Rabson wrote: >> >>> >>> On 12 Apr 2008, at 18:03, Kirk McKusick wrote: >>> >>>>> Date: Sat, 12 Apr 2008 02:13:15 -1000 (HST) >>>>> From: Jeff Roberson >>>>> To: arch@freebsd.org >>>>> Subject: VOP_LEASE >>>>> As far as I can tell this has never been used. Unless someone can show >>>>> me >>>>> otherwise I'm going to go ahead and remove it. >>>>> Thanks, >>>>> Jeff >>>> VOP_LEASE is used by NQNFS and NFSv4. It notifies them when a file >>>> is modified locally so that they know to update any outstanding >>>> leases (e.g., evict any write lease for the file and do callbacks >>>> for any read leases for the file). Deleting VOP_LEASE would break >>>> NFS big time. >>> >>> I think our NQNFS support might have been removed some time ago - I can't >>> see any calls to VOP_LEASE in the code right now. Something like VOP_LEASE >>> would certainly be useful for a hypothetical future NFSv4 server. 
I
>>> believe that samba could use it too for its oplocks feature which appears
>>> to be similar to NQNFS's leases and NFSv4's delegations.
>>
>> So the idea with delegations is that close() doesn't actually release the
>> file entirely to make future access cheaper?
>>
>> My issue with VOP_LEASE is only that there are no in kernel implementations
>> of the VOP. I doubt it is applied regularly in syscalls. It also seems odd
>> that it is called without a lock.
>>
>> Is the intent that the server will trap all accesses to a local vnode in
>> order to invalidate the client leases?
>
> I'm working from memory here (too lazy to check out an old tree). I seem to
> remember that the way this worked is that when an NQNFS server granted a
> lease to a remote client, it arranged things so that any local filesystem
> access to the leased file would first evict the remote leaseholder. While the
> remote client has a valid lease, it is free to aggressively cache locally as
> long as it flushes writes to the server on eviction. The implementation was
> quite intrusive on the server. I can't quite remember where VOP_LEASE came in
> and the documentation is useless.

I discussed it more with Alfred. I don't intend to remove VOP_LEASE
since there may be some valid use for it. We just haven't had any code
in at least a decade that made use of it, so I thought it was prime for
axing.

I believe that calling the VOP without a lock makes it prone to races,
which makes it minimally useful. However, I'm willing to reserve
judgement until some consumer actually shows up.

Sun doesn't seem to have a VOP_LEASE or similar in Solaris. They
actually seem to install a kind of filter on vfs and vnode operations
and monitor there. Their filters do more than VOP_LEASE does and
operate a bit like the vop_*_pre and post hooks I added for debugging,
which are now turned on all the time. It might be cleaner if we
implemented the lease notification in these hooks instead.

Cheers, Jeff From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 08:59:13 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B0DB2106566C for ; Sun, 13 Apr 2008 08:59:13 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from itchy.rabson.org (mail.rabson.org [IPv6:2002:50b1:e8f2:1::143]) by mx1.freebsd.org (Postfix) with ESMTP id 249CF8FC22 for ; Sun, 13 Apr 2008 08:59:13 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from [IPv6:2002:50b1:e8f2:1:21b:63ff:feb8:5abc] (unknown [IPv6:2002:50b1:e8f2:1:21b:63ff:feb8:5abc]) by itchy.rabson.org (Postfix) with ESMTP id 3773B3FB4; Sun, 13 Apr 2008 09:59:12 +0100 (BST) Message-Id: <579895DC-DE8D-4DED-8E0B-CFCC73032A6A@rabson.org> From: Doug Rabson To: Jeff Roberson In-Reply-To: <20080412222458.B959@desktop> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Sun, 13 Apr 2008 09:59:11 +0100 References: <200804121703.m3CH3StJ081660@chez.mckusick.com> <41ED3941-E5E6-45F0-B880-C1B2861FDE32@rabson.org> <20080412131017.S43186@desktop> <20080412222458.B959@desktop> X-Mailer: Apple Mail (2.919.2) Cc: Kirk McKusick , arch@freebsd.org Subject: Re: VOP_LEASE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 08:59:13 -0000 On 13 Apr 2008, at 09:41, Jeff Roberson wrote: > On Sun, 13 Apr 2008, Doug Rabson wrote: > >> >> On 13 Apr 2008, at 00:15, Jeff Roberson wrote: >> >>> On Sat, 12 Apr 2008, Doug Rabson wrote: >>>> On 12 Apr 2008, at 18:03, Kirk McKusick wrote: >>>>>> Date: Sat, 12 Apr 2008 02:13:15 -1000 (HST) >>>>>> From: Jeff Roberson >>>>>> To: arch@freebsd.org >>>>>> Subject: VOP_LEASE >>>>>> As far as I can tell this 
has never been used. Unless someone >>>>>> can show me >>>>>> otherwise I'm going to go ahead and remove it. >>>>>> Thanks, >>>>>> Jeff >>>>> VOP_LEASE is used by NQNFS and NFSv4. It notifies them when a file >>>>> is modified locally so that they know to update any outstanding >>>>> leases (e.g., evict any write lease for the file and do callbacks >>>>> for any read leases for the file). Deleting VOP_LEASE would break >>>>> NFS big time. >>>> I think our NQNFS support might have been removed some time ago - >>>> I can't see any calls to VOP_LEASE in the code right now. >>>> Something like VOP_LEASE would certainly be useful for a >>>> hypothetical future NFSv4 server. I believe that samba could use >>>> it too for its oplocks feature which appears to be similar to >>>> NQNFS's leases and NFSv4's delegations. >>> So the idea with delegations is that close() doesn't actually >>> release the file entirely to make future access cheaper? >>> My issue with VOP_LEASE is only that there are no in kernel >>> implementations of the VOP. I doubt it is applied regularly in >>> syscalls. It also seems odd that it is called without a lock. >>> Is the intent that the server will trap all accesses to a local >>> vnode in order to invalidate the client leases? >> >> I'm working from memory here (too lazy to checkout an old tree). I >> seem to remember that the way this worked is that when an NQNFS >> server granted a lease to a remote client, it arranged things so >> that any local filesystem access to the leased file would first >> evict the remote leaseholder. While the remote client has a valid >> lease, it is free to agressively cache locally as long as it >> flushes write to the server on eviction. The implementation was >> quite intrusive on the server. I can't quite remember where >> VOP_LEASE came in and the documentation is useless. > > I discussed it more with alfred. I don't intend to remove VOP_LEASE > since there may be some valid use for it. 
We just haven't had any > code in at least a decade that made use of it so I thought it was > prime for axing. > > I believe that calling the VOP without a lock makes it prone to > races which make it minimally useful. However I'm willing to > reserve judgement until some consumer actually shows up. > > Sun doesn't seem to have a VOP_LEASE or similar in Solaris. They > actually seem to install a kind of filter on vfs and vnode > operations and monitor there. Their filters do more than VOP_LEASE > does and operate a bit like the vop_*_pre and post hooks I added for > debugging which now have been turned on all the time. It might be > cleaner if we implemented the lease notification in these hooks > instead. That sounds reasonable. Actually one good reason for removing VOP_LEASE as it currently stands would be that there is no specification and no implementation to derive a specification from. From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 15:23:02 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13B351065677 for ; Sun, 13 Apr 2008 15:23:02 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id CB7DC8FC55 for ; Sun, 13 Apr 2008 15:23:01 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 0135D17104; Sun, 13 Apr 2008 15:22:59 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m3DFMw3v001310; Sun, 13 Apr 2008 15:22:59 GMT (envelope-from phk@critter.freebsd.dk) To: Jeff Roberson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Sat, 12 Apr 2008 22:24:22 -1000." 
<20080412221654.S959@desktop> Date: Sun, 13 Apr 2008 15:22:58 +0000 Message-ID: <1309.1208100178@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 15:23:02 -0000

In message <20080412221654.S959@desktop>, Jeff Roberson writes:

>> The non p-prefix versions should always be serialized, because there
>> is no way of knowing where they read/write if you don't.
>
>Well that's at odds with what the standard says and what others implement.
>I think there is a clear case for serializing writes. I don't see what
>advantage we get from serializing reads. The heavy cost of
>synchronization should be justified by actual need.

If you don't serialize read(2) and readv(2), how do you know where
they read from?

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
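
The question above turns on the fact that read(2) takes its position from an offset shared by every descriptor that refers to the same open file. The following is a minimal single-threaded sketch of that sharing, assuming a POSIX system and Python's os wrappers. The function name, the record contents, and the scratch file are illustrative assumptions only, and the sketch says nothing about how any particular kernel orders truly concurrent reads.

```python
import os
import tempfile


def reads_share_one_offset():
    """Return (first, second, final_offset) for two reads via dup'ed descriptors."""
    fd, path = tempfile.mkstemp()  # hypothetical scratch file for the demo
    try:
        os.write(fd, b"rec0rec1rec2")   # three fixed-size 4-byte records
        os.lseek(fd, 0, os.SEEK_SET)    # rewind the shared offset
        dup_fd = os.dup(fd)             # second descriptor, same open file and offset
        try:
            first = os.read(fd, 4)      # consumes bytes 0..3, offset becomes 4
            second = os.read(dup_fd, 4) # starts where the first read stopped
            final_offset = os.lseek(fd, 0, os.SEEK_CUR)
        finally:
            os.close(dup_fd)
        return first, second, final_offset
    finally:
        os.close(fd)
        os.unlink(path)


first, second, final_offset = reads_share_one_offset()
print(first, second, final_offset)
```

Run sequentially, each read picks up where the other left off. Which record a given caller receives depends entirely on call order, which is the ambiguity being argued about once the calls are allowed to overlap.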
From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 16:04:35 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13766106566B for ; Sun, 13 Apr 2008 16:04:35 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 015738FC16 for ; Sun, 13 Apr 2008 16:04:34 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id BE76D1A4D80; Sun, 13 Apr 2008 09:04:34 -0700 (PDT) Date: Sun, 13 Apr 2008 09:04:34 -0700 From: Alfred Perlstein To: Poul-Henning Kamp Message-ID: <20080413160434.GD95731@elvis.mu.org> References: <20080412221654.S959@desktop> <1309.1208100178@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1309.1208100178@critter.freebsd.dk> User-Agent: Mutt/1.4.2.3i Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 16:04:35 -0000 * Poul-Henning Kamp [080413 08:23] wrote: > In message <20080412221654.S959@desktop>, Jeff Roberson writes: > > >> The non p-prefix versions should always be serialized, because there > >> is know way of knowing where they read/write if you don't. > > > >Well that's at odds with what the standard says and what others implement. > >I think there is a clear case for serializing writes. I don't see what > >advantage we get from serializing reads. The heavy cost of > >synchronization should be justified by actual need. > > If you don't serialize read(2) and readv(2), how do you know where > they read from ? 
You don't always care, if the file is a fixed record file or datagram socket then it does not matter. -- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 16:05:20 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3A11106566C for ; Sun, 13 Apr 2008 16:05:20 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id 967A78FC24 for ; Sun, 13 Apr 2008 16:05:20 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.2/8.14.2) with ESMTP id m3DG8Ta9043067; Sun, 13 Apr 2008 12:08:29 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.2/8.14.2/Submit) id m3DG8TAT043066; Sun, 13 Apr 2008 12:08:29 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Date: Sun, 13 Apr 2008 12:08:29 -0400 From: David Schultz To: Jeff Roberson Message-ID: <20080413160829.GA42972@zim.MIT.EDU> Mail-Followup-To: Jeff Roberson , arch@FreeBSD.ORG References: <20080412132457.W43186@desktop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080412132457.W43186@desktop> Cc: arch@FreeBSD.ORG Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 16:05:20 -0000 On Sat, Apr 12, 2008, Jeff Roberson wrote: > It's worth discussing what posix actually guarantees for f_offset as well > as what other operating systems do. POSIX actually does not guarantee any > behavior with simultaneous access. Multiple readers may read the same > position in the file concurrently and update the position to different > offsets. 
Multiple writers may write to the same file location, although > the io should be serialized by some other means. Posix allows for and > Solaris, Linux, and historic implementations of f_offset work in the > following way: This is not entirely true. In particular, files opened with O_APPEND have stronger guarantees, and this behavior can be useful. For example, I imagine that a database that opens its log file with O_APPEND can depend on being able to write log entries concurrently without losing any data. (There are also stronger requirements for pipes, FIFOs, etc.) As I recall, empirical evidence shows that SunOS 5.10 and FreeBSD both make stronger guarantees than Linux in the presence of multiple concurrent writers. I haven't tested readers or looked at the fdesc code for any of these. From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 16:39:30 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 09F5B106566B for ; Sun, 13 Apr 2008 16:39:30 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id EC98D8FC21 for ; Sun, 13 Apr 2008 16:39:29 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id E104D1A4D80; Sun, 13 Apr 2008 09:39:29 -0700 (PDT) Date: Sun, 13 Apr 2008 09:39:29 -0700 From: Alfred Perlstein To: Jeff Roberson , arch@FreeBSD.ORG Message-ID: <20080413163929.GE95731@elvis.mu.org> References: <20080412132457.W43186@desktop> <20080413160829.GA42972@zim.MIT.EDU> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080413160829.GA42972@zim.MIT.EDU> User-Agent: Mutt/1.4.2.3i Cc: Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 16:39:30 -0000 * David Schultz [080413 09:05] wrote: > On Sat, Apr 12, 2008, Jeff Roberson wrote: > > It's worth discussing what posix actually guarantees for f_offset as well > > as what other operating systems do. POSIX actually does not guarantee any > > behavior with simultaneous access. Multiple readers may read the same > > position in the file concurrently and update the position to different > > offsets. Multiple writers may write to the same file location, although > > the io should be serialized by some other means. Posix allows for and > > Solaris, Linux, and historic implementations of f_offset work in the > > following way: > > This is not entirely true. In particular, files opened with > O_APPEND have stronger guarantees, and this behavior can be > useful. For example, I imagine that a database that opens its log > file with O_APPEND can depend on being able to write log entries > concurrently without losing any data. (There are also stronger > requirements for pipes, FIFOs, etc.) > > As I recall, empiricial evidence shows that SunOS 5.10 and FreeBSD > both make stronger guarantees than Linux in the presence of > multiple concurrent writers. I haven't tested readers or looked > at the fdesc code for any of these. O_APPEND is kept inside of f_flags and passed down into the VOP layer so that the filesystem can "do the right thing", basically always append and get rid of the f_offset problem. Sort of. 
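As a userspace illustration of the O_APPEND behaviour discussed above (a sketch, not code from the thread or from the kernel): two descriptors opened separately on the same file each have their own offset. Without O_APPEND the second writer starts at its own offset 0 and overwrites the first record; with O_APPEND every write lands at the current end of file, so the per-descriptor offset no longer decides where the data goes. The function name two_writers and the scratch path are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write through two independent descriptors for the same file and return
 * the resulting file size, or -1 on error.  extra_flags is 0 or O_APPEND.
 */
long two_writers(int extra_flags)
{
    char path[] = "/tmp/append_demo.XXXXXX";
    int fd1 = mkstemp(path);          /* creates the (empty) file */
    int fd2;
    off_t size;

    if (fd1 < 0)
        return -1;
    close(fd1);

    /* Two separate open(2) calls: each descriptor has its own offset. */
    fd1 = open(path, O_WRONLY | extra_flags);
    fd2 = open(path, O_WRONLY | extra_flags);
    unlink(path);
    if (fd1 < 0 || fd2 < 0)
        return -1;

    if (write(fd1, "one\n", 4) != 4 ||
        write(fd2, "two\n", 4) != 4 ||    /* fd2's own offset is still 0 */
        write(fd1, "three\n", 6) != 6)
        return -1;

    size = lseek(fd1, 0, SEEK_END);
    close(fd1);
    close(fd2);
    return (long)size;
}
```

With O_APPEND all 14 bytes survive; without it the file ends up 10 bytes long because "two\n" replaced "one\n".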
-- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 20:18:03 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38813106564A for ; Sun, 13 Apr 2008 20:18:03 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id 185FC8FC26 for ; Sun, 13 Apr 2008 20:18:02 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (localhost.mckusick.com [127.0.0.1]) by chez.mckusick.com (8.13.8/8.13.8) with ESMTP id m3DKI4MG007310; Sun, 13 Apr 2008 13:18:04 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <200804132018.m3DKI4MG007310@chez.mckusick.com> To: Jeff Roberson Date: Sun, 13 Apr 2008 13:18:04 -0700 From: Kirk McKusick Cc: arch@freebsd.org Subject: Re: VOP_LEASE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 20:18:03 -0000 The explanation is correct that VOP_LEASE is present so that an NFS server is made aware when a file is used locally so that the server can take appropriate action on the leases that it has extended to its clients. Because VOP_LEASE is not under a lock, the race that Jeff points out is also valid. To be race-free, they would need to be protected by the vnode lock. Most instances of them are still protected by the vnode lock, though some key ones (like READ and WRITE) fell out of being protected when the vnode lock was pushed down so that it no longer protected f_offset updates. Using the pre- and post- hooks in the vnode operations would be an equally valid place to do the leasing call-back to the NFS (or other remote filesystem) server. 
As the VOP_LEASE calls were, at least at the time they were in use, correctly placed and identified read or write access, it might be useful to figure out the appropriate set of pre- and post- hooks as part of their removal. Kirk McKusick From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 23:16:40 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 553D7106564A for ; Sun, 13 Apr 2008 23:16:40 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.177]) by mx1.freebsd.org (Postfix) with ESMTP id 35C298FC22 for ; Sun, 13 Apr 2008 23:16:40 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wa-out-1112.google.com with SMTP id k17so1697119waf.3 for ; Sun, 13 Apr 2008 16:16:39 -0700 (PDT) Received: by 10.114.89.1 with SMTP id m1mr6151442wab.77.1208128599819; Sun, 13 Apr 2008 16:16:39 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id k9sm12600336wah.3.2008.04.13.16.16.37 (version=SSLv3 cipher=OTHER); Sun, 13 Apr 2008 16:16:38 -0700 (PDT) Date: Sun, 13 Apr 2008 13:16:59 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: David Schultz In-Reply-To: <20080413160829.GA42972@zim.MIT.EDU> Message-ID: <20080413131422.V959@desktop> References: <20080412132457.W43186@desktop> <20080413160829.GA42972@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.ORG Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 23:16:40 -0000 On Sun, 13 Apr 2008, David Schultz wrote: > On Sat, Apr 12, 2008, Jeff Roberson wrote: >> It's worth discussing what posix actually guarantees for f_offset as well >> as what other operating systems do. POSIX actually does not guarantee any >> behavior with simultaneous access. Multiple readers may read the same >> position in the file concurrently and update the position to different >> offsets. Multiple writers may write to the same file location, although >> the io should be serialized by some other means. Posix allows for and >> Solaris, Linux, and historic implementations of f_offset work in the >> following way: > > This is not entirely true. In particular, files opened with > O_APPEND have stronger guarantees, and this behavior can be > useful. For example, I imagine that a database that opens its log > file with O_APPEND can depend on being able to write log entries > concurrently without losing any data. (There are also stronger > requirements for pipes, FIFOs, etc.) As alfred mentioned append is handled in a different way. I'm not suggesting we break posix semantics for append. 
Also, pipes and fifos don't have an f_offset and you can't call seek on them. > > As I recall, empiricial evidence shows that SunOS 5.10 and FreeBSD > both make stronger guarantees than Linux in the presence of > multiple concurrent writers. I haven't tested readers or looked > at the fdesc code for any of these. > Yes I slightly misspoke about solaris. They use the exclusive vnode lock to protect f_offset for writers. However, f_offset is fetched and set with a shared vnode lock for readers. These are the same semantics that I'm proposing. Thanks, Jeff From owner-freebsd-arch@FreeBSD.ORG Sun Apr 13 23:19:13 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA62A10656C6 for ; Sun, 13 Apr 2008 23:19:13 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.175]) by mx1.freebsd.org (Postfix) with ESMTP id 8C1ED8FC18 for ; Sun, 13 Apr 2008 23:19:13 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wf-out-1314.google.com with SMTP id 25so1360197wfa.7 for ; Sun, 13 Apr 2008 16:19:13 -0700 (PDT) Received: by 10.142.199.10 with SMTP id w10mr696081wff.272.1208128752390; Sun, 13 Apr 2008 16:19:12 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id 30sm10395144wff.11.2008.04.13.16.19.10 (version=SSLv3 cipher=OTHER); Sun, 13 Apr 2008 16:19:11 -0700 (PDT) Date: Sun, 13 Apr 2008 13:19:35 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Poul-Henning Kamp In-Reply-To: <1309.1208100178@critter.freebsd.dk> Message-ID: <20080413131724.X959@desktop> References: <1309.1208100178@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Apr 2008 23:19:13 -0000 On Sun, 13 Apr 2008, Poul-Henning Kamp wrote: > In message <20080412221654.S959@desktop>, Jeff Roberson writes: > >>> The non p-prefix versions should always be serialized, because there >>> is know way of knowing where they read/write if you don't. >> >> Well that's at odds with what the standard says and what others implement. >> I think there is a clear case for serializing writes. I don't see what >> advantage we get from serializing reads. The heavy cost of >> synchronization should be justified by actual need. > > If you don't serialize read(2) and readv(2), how do you know where > they read from ? Concurrent calls to read() are inherently racy. They will still use the current value of f_offset and store it while they are done. I'm just suggesting we don't use an exclusive lock that is held for the duration of the io to protect the update to the f_offset field. The field will still be updated in such a way that it is atomic. Thanks, Jeff > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. 
> From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 07:00:30 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9129C106564A for ; Mon, 14 Apr 2008 07:00:30 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 35A878FC1E for ; Mon, 14 Apr 2008 07:00:30 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id EAF2F17107; Mon, 14 Apr 2008 07:00:28 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m3E70Sp0004994; Mon, 14 Apr 2008 07:00:28 GMT (envelope-from phk@critter.freebsd.dk) To: Jeff Roberson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Sun, 13 Apr 2008 13:16:59 -1000." <20080413131422.V959@desktop> Date: Mon, 14 Apr 2008 07:00:28 +0000 Message-ID: <4993.1208156428@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@FreeBSD.ORG, David Schultz Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 07:00:30 -0000 In message <20080413131422.V959@desktop>, Jeff Roberson writes: >However, f_offset is fetched and set with a shared vnode lock for readers. "fetched AND set" with a shared lock ? How can that possibly work out sanely ? -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
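One way to picture the scheme being questioned, going by Jeff Roberson's earlier description in this thread (a reader uses the current f_offset and stores the new value when it is done, with the update itself atomic), is the sketch below. It is a userspace model using C11 atomics, not kernel code: struct ofile, reader_begin, reader_end and replay_race are hypothetical names, and the two racing readers are replayed step by step in one thread so the outcome is reproducible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-open-file state; not struct file. */
struct ofile {
    _Atomic int64_t f_offset;
};

/* A reader's first step: fetch the starting offset in one atomic load. */
static int64_t reader_begin(struct ofile *fp)
{
    return atomic_load(&fp->f_offset);
}

/* A reader's last step: publish start + bytes read in one atomic store. */
static void reader_end(struct ofile *fp, int64_t start, int64_t nread)
{
    atomic_store(&fp->f_offset, start + nread);
}

/*
 * Replay one possible interleaving of two readers that hold only a shared
 * lock: both fetch before either stores.  Returns the final offset.
 */
int64_t replay_race(int64_t len_a, int64_t len_b)
{
    struct ofile f;
    int64_t a, b;

    atomic_init(&f.f_offset, 0);
    a = reader_begin(&f);          /* A sees 0 */
    b = reader_begin(&f);          /* B sees 0 as well */
    reader_end(&f, a, len_a);      /* A stores len_a */
    reader_end(&f, b, len_b);      /* B stores len_b; the last store wins */
    return atomic_load(&f.f_offset);
}
```

In this interleaving both readers return the same bytes and the final offset is whichever value was stored last. The only property the sketch demonstrates is that the stored value comes whole from one reader; it does not make concurrent read(2) calls on one descriptor any less of a userspace race.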
From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 07:50:26 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DD72106566B for ; Mon, 14 Apr 2008 07:50:26 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 436CC8FC24 for ; Mon, 14 Apr 2008 07:50:26 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 6CB561CD30; Mon, 14 Apr 2008 09:47:10 +0200 (CEST) Date: Mon, 14 Apr 2008 09:47:10 +0200 From: Ed Schouten To: Jeff Roberson Message-ID: <20080414074710.GI5934@hoeg.nl> References: <1309.1208100178@critter.freebsd.dk> <20080413131724.X959@desktop> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="w+rhPQc/K9ract27" Content-Disposition: inline In-Reply-To: <20080413131724.X959@desktop> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 07:50:26 -0000 --w+rhPQc/K9ract27 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello Jeff, * Jeff Roberson wrote: > Concurrent calls to read() are inherently racy. They will still use the > current value of f_offset and store it while they are done. I'm experiencing similar problems with implementing read() and write() inside my mpsafetty branch for TTY's. Just like the current TTY implementation, my implementation will do strange things when two threads call read() or write() at the same time. Data could end up mixed together. 
The main cause is that mutexes cannot be held when copying data back to userspace, which is obvious. I could store flags to indicate a read() or write() call is in progress, but because there is no requirement for this, I think I won't pay attention to this. With regular files you could probably increment the offset before copying any data back to userspace, but of course those calls may fail (EFAULT, EIO), which means the offset shouldn't advance. -- Ed Schouten WWW: http://g-rave.nl/ --w+rhPQc/K9ract27 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkgDC/4ACgkQ52SDGA2eCwWwBACbBo/DheVrtSZtogASRWxCw9XS ic8An3qVDDDwk/lOzXNsaCCyfBie/w+8 =9SZf -----END PGP SIGNATURE----- --w+rhPQc/K9ract27-- From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 07:55:11 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B06DC106566B for ; Mon, 14 Apr 2008 07:55:11 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 771598FC18 for ; Mon, 14 Apr 2008 07:55:06 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id EF62A17104; Mon, 14 Apr 2008 07:55:04 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m3E7t4ie005288; Mon, 14 Apr 2008 07:55:04 GMT (envelope-from phk@critter.freebsd.dk) To: Ed Schouten From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 14 Apr 2008 09:47:10 +0200." 
<20080414074710.GI5934@hoeg.nl> Date: Mon, 14 Apr 2008 07:55:04 +0000 Message-ID: <5287.1208159704@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 07:55:11 -0000 In message <20080414074710.GI5934@hoeg.nl>, Ed Schouten writes: >I'm experiencing similar problems with implementing read() and write() >inside my mpsafetty branch for TTY's. Just like the current TTY >implementation, my implementation will do strange things when two >threads call read() or write() at the same time. Data could end up mixed >together. The write side of this will break quite a lot of stuff, starting with syslogd(8), write(1), wall(1) and similar, all which expect to be able to spam terminals coherently. The read side will probably mostly cause trouble for programs that try to take input from /dev/tty, usually passwords. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 08:19:38 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5BCE9106564A; Mon, 14 Apr 2008 08:19:38 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (chello087206046210.chello.pl [87.206.46.210]) by mx1.freebsd.org (Postfix) with ESMTP id 9EF308FC13; Mon, 14 Apr 2008 08:19:37 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id DDEDB45CA0; Mon, 14 Apr 2008 10:19:35 +0200 (CEST) Received: from localhost (unknown [10.0.1.111]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 2421345C99; Mon, 14 Apr 2008 10:19:31 +0200 (CEST) Date: Mon, 14 Apr 2008 10:19:18 +0200 From: Pawel Jakub Dawidek To: Kostik Belousov Message-ID: <20080414081918.GA10478@garage.freebsd.pl> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> <20080412131604.GX21209@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="82I3+IH0IqGh5yIs" Content-Disposition: inline In-Reply-To: <20080412131604.GX21209@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 8.0-CURRENT i386 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: Roman Divacky , rwatson@FreeBSD.garage.freebsd.pl, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD 
architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 08:19:38 -0000 --82I3+IH0IqGh5yIs Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Apr 12, 2008 at 04:16:04PM +0300, Kostik Belousov wrote: > On Sat, Apr 12, 2008 at 01:20:19PM +0200, Pawel Jakub Dawidek wrote: > > On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin wrote: > > > On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote: > > > > Dear arch@ > > > > > > > > Over this summer I was working (among other things) on *at family of syscalls > > > > kindly sponsored by Google (in their Summer of Code). The resulting patch is > > > > almost finished but I need to decide one design question. If you are not interested > > > > in *at/namei feel free to skip this mail. > > > > > > > > The *at syscalls are a threads-oriented extension to basic file syscalls (think > > > > of open(), fstat(), etc.) adding the possibility to specify from where the search > > > > for relative path should start. > > > > > > > > Imagine that we have /tmp/foo/bar > > > > > > > > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd. With ordinary > > > > open() syscall you have to either > > > > > > > > chdir("/tmp/foo");open("./bar"); > > > > > > > > or > > > > > > > > open("/tmp/foo/bar"); > > > > > > > > The first approach is problematic because it changes CWD for all threads in the process, > > > > the second is prone to race-conditions as some of the components of the path can > > > > change in parallel with the "open". 
> > > > > > > > So POSIX introduced a new API, called "Extended API set part 2, ISBN: 1-931624-67-4" (at > > > > least this was the latest when I looked last time), which solves that by introducing "*at" > > > > syscalls that supply an fd of previously opened directory which is used instead of CWD > > > > for searching relative path, i.e. the previous example becomes > > > > > > > > dirfd = open("/tmp/foo"); openat(dirfd, "bar"); > > > > > > > > I implemented the whole API as native FreeBSD syscalls + in linuxulator emulation layer. > > > > Here's the problem: > > > > > > > > There are two approaches to the name translation from "filedescriptor" to the "vnode". > > > > > > > > 1) we can do it in the kern_fooat() syscall and pass namei() the resulting vnode > > > > 2) we can pass namei() the filedescriptor and do the translation there > > > > > > > > PROs of #1: > > > > > > > > o namei() does not need to know about the curthread, you can use this *at > > > > ability for different purposes, it's cleaner (imho) > > > > > > > > PROs of #2 > > > > > > > > o raceless implementation > > > > o no code duplication > > > > > > > > CONs of #1 > > > > > > > > o some very small code duplication (the translation is done in every > > > > kern_fooat() function) > > > > o there is a race between the name translation and the actual use of the result > > > > of the translation that needs to be handled, the "path_to_file" string is copied > > > > to the kernel space twice hence a race > > > > > > > > CONs of #2 > > > > > > > > o namei is made thread dependent > > > > > > > > Please tell me what approach you like more. I personally favour #1 because I don't like namei() > > > > being thread dependent, Kostik Belousov prefers #2. 
> > > > > > Considering Robert's paper on security race problems in things like systrace > > > stemming from when you copy parameters out of userland and into the kernel > > > multiple times, I think #2 is definitely the better choice. Also, namei() is > > > already thread aware AFAICT since 'struct componentname' already contains a > > > 'cnp_thread' member (was 'cnp_proc' in 4.x). > > > > It looks like I'm a bit too late, but anyway... > > > > From what you write John, #1 is a better choice than #2. If you want to > > avoid races, you can pass already locked vnode. In case of file > > descriptors, if p_fd is not locked another thread can close and open > > different directory under the same descriptor number. > This is the application imposed race, not the externally imposed one. > Moreover, I would argue that this is application error. Right, this will be an application bug and the race can occur even before entering the kernel. What I'm saying is that this is more racy than the vnode approach (at least it is not less racy). > > I also need such functionality for recent ZFS and #2 makes it impossible > > to use it. NDINIT_AT() is kernel (VFS) API so it should operate on > > vnodes, not file descriptor numbers, IMHO. > Following the same arguments, NDINIT() shall not operate on the paths > too. We both know that we have to convert path to vnode somehow, but you can't disagree that VFS API doesn't operate on file descriptors in general, but on vnodes. > > For completeness can you Kostik and Robert provide your arguments against > > #1? > > The #2 was already committed. > The #1 caused a code duplication that was quite error-prone. > > What are your problems with the #2 ? Take a look at perforce change @139873. You can find there how I was using #1. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! 
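For readers who have not used the *at calls debated above, here is a minimal userspace sketch of the idea: a file is opened relative to a directory descriptor instead of the process-wide CWD. It is an illustration under assumptions, not code from the patch under discussion; the function name openat_demo and the scratch directory are made up. In POSIX the directory descriptor is the first argument: openat(fd, path, oflag, ...).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create <tmpdir>/bar, then open it relative to a directory descriptor
 * rather than the process-wide CWD.  Returns 0 on success, -1 on error.
 */
int openat_demo(void)
{
    char dir[] = "/tmp/openat_demo.XXXXXX";
    char buf[5] = {0};
    int dirfd, fd, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;

    /* dirfd plays the role of the already-open "foo" in the example. */
    dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        goto out_rmdir;

    /* Create and fill "bar" inside it; the path is resolved from dirfd. */
    fd = openat(dirfd, "bar", O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        goto out_close;
    if (write(fd, "data", 4) != 4) {
        close(fd);
        goto out_unlink;
    }
    close(fd);

    /* Reopen it the same way: no chdir(2), no absolute path. */
    fd = openat(dirfd, "bar", O_RDONLY);
    if (fd >= 0) {
        if (read(fd, buf, 4) == 4 && strcmp(buf, "data") == 0)
            rc = 0;
        close(fd);
    }
out_unlink:
    unlinkat(dirfd, "bar", 0);
out_close:
    close(dirfd);
out_rmdir:
    rmdir(dir);
    return rc;
}
```

Because the lookup starts from dirfd, other threads keep their CWD, and renaming a component above the directory after it was opened does not redirect the lookup. How the kernel turns dirfd into a vnode (approach #1 or #2 above) is invisible at this level.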
--82I3+IH0IqGh5yIs Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFIAxOFForvXbEpPzQRAg2zAKCYG0/6D5q5C5SFuJiYfw5w+fUkfwCglTaY 0aYlD8psV6shTkfW1BabXCE= =jVbV -----END PGP SIGNATURE----- --82I3+IH0IqGh5yIs-- From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 08:27:22 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5A7A51065674 for ; Mon, 14 Apr 2008 08:27:22 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.183]) by mx1.freebsd.org (Postfix) with ESMTP id 1E9908FC1A for ; Mon, 14 Apr 2008 08:27:22 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by py-out-1112.google.com with SMTP id u52so1974818pyb.10 for ; Mon, 14 Apr 2008 01:27:21 -0700 (PDT) Received: by 10.141.152.8 with SMTP id e8mr3198708rvo.19.1208161641074; Mon, 14 Apr 2008 01:27:21 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id g22sm9408758rvb.5.2008.04.14.01.27.19 (version=SSLv3 cipher=OTHER); Mon, 14 Apr 2008 01:27:20 -0700 (PDT) Date: Sun, 13 Apr 2008 22:27:46 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Arthur Hartwig In-Reply-To: <480313A2.4050306@nokia.com> Message-ID: <20080413222626.X959@desktop> References: <20080412132457.W43186@desktop> <480313A2.4050306@nokia.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 08:27:22 -0000 On Mon, 14 Apr 2008, Arthur Hartwig wrote: > ext Jeff Roberson wrote: >> So I'm in the midst of working on other filesystem concurrency issues and >> that has brought me back around to f_offset again. I'm working on a method >> to allow non-overlapping writes and reads to proceed concurrently to the >> same file. This means the exclusive vnode lock can not be used to protect >> f_offset even in the write case. >> >> To maintain the existing semantics I'm simply going to add an exclusive >> sx_xlock() around access to f_offset. This is done inconsistently today >> which is fine from the perspective of the updates in most cases being >> user-space races. However, f_offset is 64bit and can not be written >> atomically on 32bit systems and so requires some extra synchronization >> there. > I'm not sure of the processor family constraints of the i386 builds, but the > Intel IA32 architecture manual says reads and writes of a quadword (64 bits) > aligned on a quadword boundary are atomic (Pentium and newer CPUs). Guess > that leaves out i386, i486 (any others?) Thanks. I hadn't seen that. Do you know which manual and section states this? 
I was intending to simply use cmpxchg8b but it sounds like that may not be necessary. We still have to handle other 32bit archs like powerpc and mips but I'm not sure if any of those are SMP. Jeff From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 08:30:21 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E37F3106566B for ; Mon, 14 Apr 2008 08:30:21 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.171]) by mx1.freebsd.org (Postfix) with ESMTP id C4D858FC15 for ; Mon, 14 Apr 2008 08:30:21 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wf-out-1314.google.com with SMTP id 25so1510794wfa.7 for ; Mon, 14 Apr 2008 01:30:21 -0700 (PDT) Received: by 10.142.131.18 with SMTP id e18mr1727023wfd.39.1208161821472; Mon, 14 Apr 2008 01:30:21 -0700 (PDT) Received: from ?10.0.1.199? ( [24.94.72.120]) by mx.google.com with ESMTPS id 30sm10742859wfa.2.2008.04.14.01.30.18 (version=SSLv3 cipher=OTHER); Mon, 14 Apr 2008 01:30:20 -0700 (PDT) Date: Sun, 13 Apr 2008 22:30:46 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Poul-Henning Kamp In-Reply-To: <4993.1208156428@critter.freebsd.dk> Message-ID: <20080413222803.D959@desktop> References: <4993.1208156428@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@FreeBSD.ORG, David Schultz Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 08:30:22 -0000 On Mon, 14 Apr 2008, Poul-Henning Kamp wrote: > In message <20080413131422.V959@desktop>, Jeff Roberson writes: > >> However, f_offset is fetched and set with a shared vnode lock for readers. 
> > "fetched AND set" with a shared lock ? > > How can that possibly work out sanely ? It doesn't matter which reader wins the race to update offset as long as the resulting offset is entirely valid from one. This is a userspace race otherwise. There is a complication for 64bit offsets on 32bit machines that needs some MD support. Otherwise the write happens in two instructions and can have mixed results. Jeff > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. > From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 08:31:38 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C8211065672 for ; Mon, 14 Apr 2008 08:31:38 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.176]) by mx1.freebsd.org (Postfix) with ESMTP id 027848FC25 for ; Mon, 14 Apr 2008 08:31:37 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wa-out-1112.google.com with SMTP id k17so1956593waf.3 for ; Mon, 14 Apr 2008 01:31:37 -0700 (PDT) Received: by 10.114.75.1 with SMTP id x1mr3309880waa.150.1208161897351; Mon, 14 Apr 2008 01:31:37 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id m24sm13654166waf.57.2008.04.14.01.31.34 (version=SSLv3 cipher=OTHER); Mon, 14 Apr 2008 01:31:36 -0700 (PDT) Date: Sun, 13 Apr 2008 22:32:01 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Ed Schouten In-Reply-To: <20080414074710.GI5934@hoeg.nl> Message-ID: <20080413223053.U959@desktop> References: <1309.1208100178@critter.freebsd.dk> <20080413131724.X959@desktop> <20080414074710.GI5934@hoeg.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 08:31:38 -0000 On Mon, 14 Apr 2008, Ed Schouten wrote: > Hello Jeff, > > * Jeff Roberson wrote: >> Concurrent calls to read() are inherently racy. They will still use the >> current value of f_offset and store it while they are done. > > I'm experiencing similar problems with implementing read() and write() > inside my mpsafetty branch for TTY's. Just like the current TTY > implementation, my implementation will do strange things when two > threads call read() or write() at the same time. Data could end up mixed > together. The main cause is that mutexes cannot be held when copying > data back to userspace, which is obvious. You should use an sx lock which can be held across such operations. Non seekable devices, terminals included, have to serialize all IO. They are treated separately by posix. > > I could store flags to indicate a read() or write() call is in progress, > but because there is no requirement for this, I think I won't pay > attention to this. 
> > With regular files you could probably increment the offset before > copying any data back to userspace, but of course those calls may fail > (EFAULT, EIO), which means the offset shouldn't advance. Right this offset can't be visible to other threads until the operation completes successfully. Jeff > > -- > Ed Schouten > WWW: http://g-rave.nl/ > From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 08:50:40 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 144BE1065674 for ; Mon, 14 Apr 2008 08:50:40 +0000 (UTC) (envelope-from Arthur.Hartwig@nokia.com) Received: from mgw-mx09.nokia.com (smtp.nokia.com [192.100.105.134]) by mx1.freebsd.org (Postfix) with ESMTP id CE3C38FC32 for ; Mon, 14 Apr 2008 08:50:39 +0000 (UTC) (envelope-from Arthur.Hartwig@nokia.com) Received: from esebh105.NOE.Nokia.com (esebh105.ntc.nokia.com [172.21.138.211]) by mgw-mx09.nokia.com (Switch-3.2.6/Switch-3.2.6) with ESMTP id m3E8mhta027607; Mon, 14 Apr 2008 03:52:59 -0500 Received: from esebh102.NOE.Nokia.com ([172.21.138.183]) by esebh105.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 14 Apr 2008 11:49:38 +0300 Received: from syebe101.NOE.Nokia.com ([172.30.128.65]) by esebh102.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 14 Apr 2008 11:49:37 +0300 Received: from [172.30.67.155] ([172.30.67.155]) by syebe101.NOE.Nokia.com with Microsoft SMTPSVC(6.0.3790.3959); Mon, 14 Apr 2008 18:49:33 +1000 Message-ID: <48031A9D.3050806@nokia.com> Date: Mon, 14 Apr 2008 18:49:33 +1000 From: Arthur Hartwig User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) MIME-Version: 1.0 To: ext Jeff Roberson References: <20080412132457.W43186@desktop> <480313A2.4050306@nokia.com> <20080413222626.X959@desktop> In-Reply-To: <20080413222626.X959@desktop> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 14 Apr 2008 
08:49:33.0483 (UTC) FILETIME=[75E57FB0:01C89E0C] X-Nokia-AV: Clean Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 08:50:40 -0000 ext Jeff Roberson wrote: > > On Mon, 14 Apr 2008, Arthur Hartwig wrote: > >> ext Jeff Roberson wrote: >>> So I'm in the midst of working on other filesystem concurrency >>> issues and that has brought me back around to f_offset again. I'm >>> working on a method to allow non-overlapping writes and reads to >>> proceed concurrently to the same file. This means the exclusive >>> vnode lock can not be used to protect f_offset even in the write case. >>> >>> To maintain the existing semantics I'm simply going to add an >>> exclusive sx_xlock() around access to f_offset. This is done >>> inconsistently today which is fine from the perspective of the >>> updates in most cases being user-space races. However, f_offset is >>> 64bit and can not be written atomically on 32bit systems and so >>> requires some extra synchronization there. >> I'm not sure of the processor family constraints of the i386 builds, >> but the Intel IA32 architecture manual says reads and writes of a >> quadword (64 bits) aligned on a quadword boundary are atomic (Pentium >> and newer CPUs). Guess that leaves out i386, i486 (any others?) > > Thanks. I hadn't seen that. Do you know which manual and section > states this? Intel 64 and IA-32 Architectures Software Developer's Manual Vol 3A: System Programming Guide, Part 1, section 7.1.1 Guaranteed Atomic Operations. You can download this (and other volumes of the Intel Architecture manuals) from http://www.intel.com/products/processor/manuals/index.htm > I was intending to simply use cmpxchg8b but it sounds like that may > not be necessary. 
We still have to handle other 32bit archs like > powerpc and mips but I'm not sure if any of those are SMP. Do you also have to handle i386 and i486? > > Jeff From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 09:26:16 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 08CA21065676 for ; Mon, 14 Apr 2008 09:26:16 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 924F88FC2F for ; Mon, 14 Apr 2008 09:26:15 +0000 (UTC) (envelope-from peterjeremy@optushome.com.au) Received: from server.vk2pj.dyndns.org (c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m3E9QAi3005049 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 14 Apr 2008 19:26:12 +1000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.2/8.14.1) with ESMTP id m3E9QATL036102; Mon, 14 Apr 2008 19:26:10 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m3E9Q9ur036101; Mon, 14 Apr 2008 19:26:09 +1000 (EST) (envelope-from peter) Date: Mon, 14 Apr 2008 19:26:09 +1000 From: Peter Jeremy To: Arthur Hartwig Message-ID: <20080414092609.GA73016@server.vk2pj.dyndns.org> References: <20080412132457.W43186@desktop> <480313A2.4050306@nokia.com> <20080413222626.X959@desktop> <48031A9D.3050806@nokia.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="45Z9DzgjV8m4Oswq" Content-Disposition: inline In-Reply-To: <48031A9D.3050806@nokia.com> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org Subject: Re: 
f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 09:26:16 -0000
--45Z9DzgjV8m4Oswq Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Mon, Apr 14, 2008 at 06:49:33PM +1000, Arthur Hartwig wrote:
>Do you also have to handle i386 and i486?

i386 hasn't been supported for some time. i486 remains supported but only for UP systems. This means that 64-bit f_offset ops could be managed by using DI/EI (which is probably going to be cheaper than an explicit lock). SMP is only supported on Pentium and later.

Of course, there are still the ARM, MIPS and PPC architectures to support.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
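The point about indivisible 64-bit accesses can be sketched in portable C11. This is only an illustration under stated assumptions, not the kernel code under discussion: `struct demo_file`, `df_offset`, `demo_offset_load()`, `demo_offset_store()` and `demo_io()` are hypothetical names, and C11 `<stdatomic.h>` stands in for whatever machine-dependent primitive the kernel would use. On a 32-bit target the compiler may lower these accesses to cmpxchg8b or to a library-provided lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the f_offset field of struct file. */
struct demo_file {
	_Atomic int64_t df_offset;
};

/* Fetch the whole 64-bit offset in one indivisible step. */
static int64_t
demo_offset_load(struct demo_file *fp)
{
	return (atomic_load(&fp->df_offset));
}

/* Publish a new offset; a reader sees the old or the new value, never a mix. */
static void
demo_offset_store(struct demo_file *fp, int64_t off)
{
	atomic_store(&fp->df_offset, off);
}

/*
 * Model of a read() or write(): the offset is advanced only when the
 * transfer succeeded (error == 0), so a failed call leaves it untouched.
 * Two concurrent callers can still both start from the same offset; that
 * is the user-space race mentioned in the thread, and it is not
 * prevented here.
 */
static int
demo_io(struct demo_file *fp, int64_t nbytes, int error)
{
	int64_t off;

	assert(nbytes >= 0);
	off = demo_offset_load(fp);
	if (error != 0)
		return (error);
	demo_offset_store(fp, off + nbytes);
	return (0);
}
```

A caller would initialise the field with `atomic_init(&f.df_offset, 0)` and then go through `demo_io()`; the load and the store are each indivisible, but the pair is not, which matches the "either result is valid" semantics described earlier in the thread.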
--45Z9DzgjV8m4Oswq Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.8 (FreeBSD) iEYEARECAAYFAkgDIzEACgkQ/opHv/APuIc5igCglQDPahkPkl5H12G3JQgUGsRI OS0AoJMtqax+i/2IPqvKeF1r8WKe3oWs =4fTR -----END PGP SIGNATURE----- --45Z9DzgjV8m4Oswq--
From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 11:06:46 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADD54106564A for ; Mon, 14 Apr 2008 11:06:46 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 844CC8FC13 for ; Mon, 14 Apr 2008 11:06:46 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m3EB6kBp072166 for ; Mon, 14 Apr 2008 11:06:46 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m3EB6kJ9072162 for freebsd-arch@FreeBSD.org; Mon, 14 Apr 2008 11:06:46 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 14 Apr 2008 11:06:46 GMT Message-Id: <200804141106.m3EB6kJ9072162@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-arch@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 11:06:46 -0000

Current FreeBSD problem reports
Critical problems
Serious problems
Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.

From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 11:52:39 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3FCC1106566C; Mon, 14 Apr 2008 11:52:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay01.kiev.sovam.com (relay01.kiev.sovam.com [62.64.120.200]) by mx1.freebsd.org (Postfix) with ESMTP id 89C6C8FC24; Mon, 14 Apr 2008 11:52:38 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua) by relay01.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1JlNEi-000Fbt-V6; Mon, 14 Apr 2008 14:52:37 +0300 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3EBqfkK085223 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 14 Apr 2008 14:52:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3EBqX0n038489; Mon, 14 Apr 2008 14:52:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m3EBqXL7038488; Mon, 14 Apr 2008 14:52:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 14 Apr 2008 14:52:33 +0300 From: Kostik Belousov To: Pawel Jakub Dawidek Message-ID: <20080414115233.GF18958@deviant.kiev.zoral.com.ua> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl>
<20080412131604.GX21209@deviant.kiev.zoral.com.ua> <20080414081918.GA10478@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="JcvBIhDvR6w3jUPA" Content-Disposition: inline In-Reply-To: <20080414081918.GA10478@garage.freebsd.pl> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua X-Scanner-Signature: 187c1e8f85aa2873c46c23e7719cc4c5 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Header: Not Detected X-SpamTest-Info: Profiles 2625 [Apr 14 2008] X-SpamTest-Info: helo_type=3 X-SpamTest-Method: none X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0278], KAS30/Release Cc: Roman Divacky , rwatson@freebsd.org, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 11:52:39 -0000 --JcvBIhDvR6w3jUPA Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [I corrected Robert email, it seems that he still does not have an account on the FreeBSD.garage.freebsd.pl] On Mon, Apr 14, 2008 at 10:19:18AM +0200, Pawel Jakub Dawidek wrote: > On Sat, Apr 12, 2008 at 04:16:04PM +0300, Kostik Belousov wrote: > > On Sat, Apr 12, 2008 at 01:20:19PM +0200, Pawel Jakub Dawidek wrote: > > > On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin wrote: > > > > On Tuesday 18 
December 2007 04:22:22 am Roman Divacky wrote:
> > > > > Dear arch@
> > > > >
> > > > > Over this summer I was working (among other things) on *at family of syscalls
> > > > > kindly sponsored by Google (in their Summer of Code). The resulting patch is
> > > > > almost finished but I need to decide one design question. If you are not interested
> > > > > in *at/namei feel free to skip this mail.
> > > > >
> > > > > The *at syscalls are a threads-oriented extension to basic file syscalls (think
> > > > > of open(), fstat(), etc.) adding the possibility to specify from where the search
> > > > > for relative path should start.
> > > > >
> > > > > imagine that we have /tmp/foo/bar
> > > > >
> > > > > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd. with ordinary
> > > > > open() syscall you have to either
> > > > >
> > > > > chdir("/tmp/foo");open("./bar");
> > > > >
> > > > > or
> > > > >
> > > > > open("/tmp/foo/bar");
> > > > >
> > > > > The first approach is problematic because it changes CWD for all threads in the process,
> > > > > the second is prone to race-conditions as some of the components of the path can
> > > > > change in parallel with the "open".
> > > > >
> > > > > So POSIX introduced a new API, called "Extended API set part 2, ISBN: 1-931624-67-4" (at
> > > > > least this was the latest when I looked last time), which solves that by introducing "*at"
> > > > > syscalls that supply an fd of previously opened directory which is used instead of CWD
> > > > > for searching relative path, ie. the previous example becomes
> > > > >
> > > > > dirfd = open("/tmp/foo"); openat(dirfd, "bar");
> > > > >
> > > > > I implemented the whole API as native FreeBSD syscalls + in linuxulator emulation layer.
> > > > > Here's the problem:
> > > > >
> > > > > There are two approaches to the name translation from "filedescriptor" to the "vnode".
> > > > >
> > > > > 1) we can do it in the kern_fooat() syscall and pass namei() the resulting vnode
> > > > > 2) we can pass namei() the filedescriptor and do the translation there
> > > > >
> > > > > PROs of #1:
> > > > >
> > > > > o namei() does not need to know about the curthread, you can use this *at
> > > > >   ability for different purposes, it's cleaner (imho)
> > > > >
> > > > > PROs of #2
> > > > >
> > > > > o raceless implementation
> > > > > o no code duplication
> > > > >
> > > > > CONs of #1
> > > > >
> > > > > o some very small code duplication (the translation is done in every
> > > > >   kern_fooat() function)
> > > > > o there is a race between the name translation and the actual use of the result
> > > > >   of the translation that needs to be handled, the "path_to_file" string is copied
> > > > >   to the kernel space twice hence a race
> > > > >
> > > > > CONs of #2
> > > > >
> > > > > o namei is made thread dependant
> > > > >
> > > > > Please tell me what approach you like more. I personally favour #1 because I don't like namei()
> > > > > being thread dependant, Kostik Belousov prefers #2.
> > > >
> > > > Considering Robert's paper on security race problems in things like systrace
> > > > stemming from when you copy parameters out of userland and into the kernel
> > > > multiple times, I think #2 is definitely the better choice. Also, namei() is
> > > > already thread aware AFAICT since 'struct componentname' already contains a
> > > > 'cnp_thread' member (was 'cnp_proc' in 4.x).
> > >
> > > It looks like I'm a bit too late, but anyway...
> > >
> > > From what you write John, #1 is a better choice than #2. If you want to
> > > avoid races, you can pass already locked vnode. In case of file
> > > descriptors, if p_fd is not locked another thread can close and open
> > > different directory under the same descriptor number.
> > This is the application imposed race, not the externally imposed one.
> > Moreover, I would argue that this is application error.
>
> Right, this will be an application bug and the race can occur even
> before entering the kernel. What I'm saying is that this is more racy
> than vnode approach (at least is not less racy).
>
> > > I also need such functionality for recent ZFS and #2 makes it impossible
> > > to use it. NDINIT_AT() is kernel (VFS) API so it should operate on
> > > vnodes, not file descriptor numbers, IMHO.
> > Following the same arguments, NDINIT() shall not operate on the paths
> > too.
>
> We both know that we have to convert path to vnode somehow, but you
> can't disagree that VFS API doesn't operate on file descriptors in
> general, but on vnodes.
>
> > > For completeness can you Kostik and Robert provide your arguments against
> > > #1?
> >
> > The #2 was already committed.
> > The #1 caused a code duplication that was quite error-prone.
> >
> > What are your problems with the #2 ?
>
> Take a look at perforce change @139873. You can find there how I was
> using #1.

Let's distinguish between the desired KPI and the *at implementation.
Would the patch below do what you want?

NDINIT_ATVP takes the vnode that is assumed to be referenced, and
unconditionally consumes the reference. The vnode is taken for the start
of the path translation iff the path is relative. Untested.
diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c
index 0185158..c8a9860 100644
--- a/sys/kern/vfs_lookup.c
+++ b/sys/kern/vfs_lookup.c
@@ -193,14 +193,21 @@ namei(struct nameidata *ndp)
 	ndp->ni_rootdir = fdp->fd_rdir;
 	ndp->ni_topdir = fdp->fd_jdir;
 
-	if (cnp->cn_pnbuf[0] != '/' && ndp->ni_dirfd != AT_FDCWD) {
-		error = fgetvp(td, ndp->ni_dirfd, &dp);
-		FILEDESC_SUNLOCK(fdp);
-		if (error == 0 && dp->v_type != VDIR) {
-			vfslocked = VFS_LOCK_GIANT(dp->v_mount);
-			vrele(dp);
-			VFS_UNLOCK_GIANT(vfslocked);
-			error = ENOTDIR;
+	dp = NULL;
+	if (cnp->cn_pnbuf[0] != '/') {
+		if (ndp->ni_startdir != NULL) {
+			dp = ndp->ni_startdir;
+			error = 0;
+		} else if (ndp->ni_dirfd != AT_FDCWD)
+			error = fgetvp(td, ndp->ni_dirfd, &dp);
+		if (error != 0 || dp != NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			if (error == 0 && dp->v_type != VDIR) {
+				vfslocked = VFS_LOCK_GIANT(dp->v_mount);
+				vrele(dp);
+				VFS_UNLOCK_GIANT(vfslocked);
+				error = ENOTDIR;
+			}
 		}
 		if (error) {
 			uma_zfree(namei_zone, cnp->cn_pnbuf);
@@ -210,10 +217,16 @@ namei(struct nameidata *ndp)
 #endif
 			return (error);
 		}
-	} else {
+	}
+	if (dp == NULL) {
 		dp = fdp->fd_cdir;
 		VREF(dp);
 		FILEDESC_SUNLOCK(fdp);
+		if (ndp->ni_startdir != NULL) {
+			vfslocked = VFS_LOCK_GIANT(ndp->ni_startdir->v_mount);
+			vrele(ndp->ni_startdir);
+			VFS_UNLOCK_GIANT(vfslocked);
+		}
 	}
 	vfslocked = VFS_LOCK_GIANT(dp->v_mount);
 	for (;;) {
diff --git a/sys/sys/namei.h b/sys/sys/namei.h
index 121f014..5d1c1ea 100644
--- a/sys/sys/namei.h
+++ b/sys/sys/namei.h
@@ -150,14 +150,19 @@ struct nameidata {
  * Initialization of a nameidata structure.
  */
 #define NDINIT(ndp, op, flags, segflg, namep, td) \
-	NDINIT_AT(ndp, op, flags, segflg, namep, AT_FDCWD, td)
+	NDINIT_ALL(ndp, op, flags, segflg, namep, AT_FDCWD, NULL, td)
+#define NDINIT_AT(ndp, op, flags, segflg, namep, dirfd, td) \
+	NDINIT_ALL(ndp, op, flags, segflg, namep, dirfd, NULL, td)
+#define NDINIT_ATVP(ndp, op, flags, segflg, namep, vp, td) \
+	NDINIT_ALL(ndp, op, flags, segflg, namep, AT_FDCWD, vp, td)
 
 static __inline void
-NDINIT_AT(struct nameidata *ndp,
+NDINIT_ALL(struct nameidata *ndp,
 	u_long op, u_long flags,
 	enum uio_seg segflg,
 	const char *namep,
 	int dirfd,
+	struct vnode *startdir,
 	struct thread *td)
 {
 	ndp->ni_cnd.cn_nameiop = op;
@@ -165,6 +170,7 @@ NDINIT_AT(struct nameidata *ndp,
 	ndp->ni_segflg = segflg;
 	ndp->ni_dirp = namep;
 	ndp->ni_dirfd = dirfd;
+	ndp->ni_startdir = startdir;
 	ndp->ni_cnd.cn_thread = td;
 }
 
--JcvBIhDvR6w3jUPA Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkgDRYAACgkQC3+MBN1Mb4jZRwCfaYKlaYfsGcuPlD+HevMXnVH6 X2QAnj5Gc/qCdzApt38FethOb3zOpn4R =aWrS -----END PGP SIGNATURE----- --JcvBIhDvR6w3jUPA--
From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 14:55:55 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 826DA10656AE for ; Mon, 14 Apr 2008 14:55:55 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 462AC8FC1B for ; Mon, 14 Apr 2008 14:55:55 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 1F72C1CF2E; Mon, 14 Apr 2008 16:52:31 +0200 (CEST) Date: Mon, 14 Apr 2008 16:52:31 +0200 From: Ed Schouten To: Jeff Roberson Message-ID: <20080414145231.GJ5934@hoeg.nl> References: <1309.1208100178@critter.freebsd.dk> <20080413131724.X959@desktop>
<20080414074710.GI5934@hoeg.nl> <20080413223053.U959@desktop> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="8+odlFQADydc3R4z" Content-Disposition: inline In-Reply-To: <20080413223053.U959@desktop> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 14:55:55 -0000
--8+odlFQADydc3R4z Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

* Jeff Roberson wrote:
> You should use an sx lock which can be held across such operations. Non
> seekable devices, terminals included, have to serialize all IO. They are
> treated separately by posix.

It's all so confusing that the standards seem to change then. When I
take a look at the POSIX onlinepubs, the articles seem to mention the
opposite:

http://www.opengroup.org/onlinepubs/009695399/functions/read.html

	"The behavior of multiple concurrent reads on the same pipe,
	FIFO, or terminal device is unspecified."

http://www.opengroup.org/onlinepubs/009695399/functions/write.html

	"This volume of IEEE Std 1003.1-2001 does not specify behavior
	of concurrent writes to a file from multiple processes.
	Applications should use some form of concurrency control."
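For regular files, one way an application can follow the "some form of concurrency control" advice without touching the shared file offset at all is to pass the offset explicitly with pwrite(). The helper below is a minimal userspace sketch under that assumption; `write_region()` is a hypothetical name, and it does not describe anything the kernel or the proposed f_offset locking does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf at absolute offset off, retrying on short
 * writes and on EINTR.  pwrite() takes the offset as an argument and
 * neither reads nor updates the descriptor's shared file offset, so
 * callers that work on disjoint regions do not race on it.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
write_region(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	assert(off >= 0);
	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		if (n == 0) {
			/* No progress is possible; report it as an I/O error. */
			errno = EIO;
			return (-1);
		}
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return (0);
}
```

Each thread would call `write_region()` with its own non-overlapping range. This does nothing for pipes, FIFOs or terminals, which are not seekable and where pwrite() fails with ESPIPE.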
-- 
Ed Schouten
WWW: http://g-rave.nl/
--8+odlFQADydc3R4z Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkgDb68ACgkQ52SDGA2eCwUzYwCfVb77MvmedRLqPwP2Jo6zrTUF PrYAn28KWSfn7Lcke0ZXmL51kh4Zz2VR =lYto -----END PGP SIGNATURE----- --8+odlFQADydc3R4z--
From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 14:59:50 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F502106564A for ; Mon, 14 Apr 2008 14:59:50 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 23E7D8FC13 for ; Mon, 14 Apr 2008 14:59:49 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id B49CA17107; Mon, 14 Apr 2008 14:59:48 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m3EExmd4093247; Mon, 14 Apr 2008 14:59:48 GMT (envelope-from phk@critter.freebsd.dk) To: Ed Schouten From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 14 Apr 2008 16:52:31 +0200." <20080414145231.GJ5934@hoeg.nl> Date: Mon, 14 Apr 2008 14:59:48 +0000 Message-ID: <93246.1208185188@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 14:59:50 -0000

In message <20080414145231.GJ5934@hoeg.nl>, Ed Schouten writes:
>It's all so confusing that the standards seem to change then.
When I >take a look at the POSIX onlinepubs, the articles seem to mention the >opposite: > >http://www.opengroup.org/onlinepubs/009695399/functions/read.html > > "The behavior of multiple concurrent reads on the same pipe, > FIFO, or terminal device is unspecified." > >http://www.opengroup.org/onlinepubs/009695399/functions/write.html > > "This volume of IEEE Std 1003.1-2001 does not specify behavior > of concurrent writes to a file from multiple processes. > Applications should use some form of concurrency control." Remember that POSIX was written so both MVS and Windows could comply, UNIX may have and need higher standards. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 15:18:25 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 564EF1065670 for ; Mon, 14 Apr 2008 15:18:25 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms046pub.verizon.net (vms046pub.verizon.net [206.46.252.46]) by mx1.freebsd.org (Postfix) with ESMTP id 37DC88FC15 for ; Mon, 14 Apr 2008 15:18:25 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms063.mailsrvcs.net ([172.18.12.132]) by vms046.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0JZB00A25MHX7S40@vms046.mailsrvcs.net> for arch@freebsd.org; Mon, 14 Apr 2008 10:17:57 -0500 (CDT) Received: from 65.242.108.162 ([65.242.108.162]) by vms063.mailsrvcs.net (Verizon Webmail) with HTTP; Mon, 14 Apr 2008 10:17:57 -0500 (CDT) Date: Mon, 14 Apr 2008 10:17:57 -0500 (CDT) From: Sergey Babkin X-Originating-IP: [65.242.108.162] To: Ed Schouten , Poul-Henning Kamp Message-id: <25637504.3015221208186277561.JavaMail.root@vms063.mailsrvcs.net> MIME-version: 1.0 
Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: 7bit Cc: arch@freebsd.org Subject: Re: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 15:18:25 -0000 >From: Poul-Henning Kamp >In message <20080414145231.GJ5934@hoeg.nl>, Ed Schouten writes: > >>It's all so confusing that the standards seem to change then. When I >>take a look at the POSIX onlinepubs, the articles seem to mention the >>opposite: >> >>http://www.opengroup.org/onlinepubs/009695399/functions/read.html >> >> "The behavior of multiple concurrent reads on the same pipe, >> FIFO, or terminal device is unspecified." >> >>http://www.opengroup.org/onlinepubs/009695399/functions/write.html >> >> "This volume of IEEE Std 1003.1-2001 does not specify behavior >> of concurrent writes to a file from multiple processes. >> Applications should use some form of concurrency control." > >Remember that POSIX was written so both MVS and Windows could comply, And nowadays Linux too :-) >UNIX may have and need higher standards. Yep. Linux in this respect is not Unix. This behavior of Linux is pretty annoying (the log files get mixed up pretty badly) but I agree that it makes the writes on the highly contested files faster. 
-SB From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 15:56:40 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F2EE1106564A for ; Mon, 14 Apr 2008 15:56:40 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from smtpoutm.mac.com (smtpoutm.mac.com [17.148.16.80]) by mx1.freebsd.org (Postfix) with ESMTP id DC5458FC1E for ; Mon, 14 Apr 2008 15:56:40 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from mac.com (asmtp005-s [10.150.69.68]) by smtpoutm.mac.com (Xserve/smtpout017/MantshX 4.0) with ESMTP id m3EFueIW019502; Mon, 14 Apr 2008 08:56:40 -0700 (PDT) Received: from macbook-pro.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mac.com (Xserve/asmtp005/MantshX 4.0) with ESMTP id m3EFuS1k022548 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Mon, 14 Apr 2008 08:56:34 -0700 (PDT) Message-Id: <300DE361-167E-4491-8E8C-7A227225B506@mac.com> From: Marcel Moolenaar To: Jeff Roberson In-Reply-To: <20080413222626.X959@desktop> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Mon, 14 Apr 2008 08:56:24 -0700 References: <20080412132457.W43186@desktop> <480313A2.4050306@nokia.com> <20080413222626.X959@desktop> X-Mailer: Apple Mail (2.919.2) Cc: arch@freebsd.org, Arthur Hartwig Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 15:56:41 -0000 On Apr 14, 2008, at 1:27 AM, Jeff Roberson wrote: > > On Mon, 14 Apr 2008, Arthur Hartwig wrote: > >> ext Jeff Roberson wrote: >>> So I'm in the midst of working on other filesystem concurrency >>> issues and that has brought me back around to f_offset again. 
I'm >>> working on a method to allow non-overlapping writes and reads to >>> proceed concurrently to the same file. This means the exclusive >>> vnode lock can not be used to protect f_offset even in the write >>> case. >>> To maintain the existing semantics I'm simply going to add an >>> exclusive sx_xlock() around access to f_offset. This is done >>> inconsistently today which is fine from the perspective of the >>> updates in most cases being user-space races. However, f_offset >>> is 64bit and can not be written atomically on 32bit systems and so >>> requires some extra synchronization there. >> I'm not sure of the processor family constraints of the i386 >> builds, but the Intel IA32 architecture manual says reads and >> writes of a quadword (64 bits) aligned on a quadword boundary are >> atomic (Pentium and newer CPUs). Guess that leaves out i386, i486 >> (any others?) > > Thanks. I hadn't seen that. Do you know which manual and section > states this? I was intending to simply use cmpxchg8b but it sounds > like that may not be necessary. We still have to handle other 32bit > archs like powerpc and mips but I'm not sure if any of those are SMP. I'm working on SMP for PowerPC.. 
-- Marcel Moolenaar xcllnt@mac.com From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 17:32:34 2008 Return-Path: Delivered-To: arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B3EB1065671 for ; Mon, 14 Apr 2008 17:32:34 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id 3EF668FC0C for ; Mon, 14 Apr 2008 17:32:34 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.2/8.14.2) with ESMTP id m3EHZtsE049482; Mon, 14 Apr 2008 13:35:55 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.2/8.14.2/Submit) id m3EHZtp0049481; Mon, 14 Apr 2008 13:35:55 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Date: Mon, 14 Apr 2008 13:35:55 -0400 From: David Schultz To: Jeff Roberson Message-ID: <20080414173555.GA49271@zim.MIT.EDU> Mail-Followup-To: Jeff Roberson , Arthur Hartwig , arch@FreeBSD.ORG References: <20080412132457.W43186@desktop> <480313A2.4050306@nokia.com> <20080413222626.X959@desktop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080413222626.X959@desktop> Cc: arch@FreeBSD.ORG, Arthur Hartwig Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 17:32:34 -0000 On Sun, Apr 13, 2008, Jeff Roberson wrote: > Thanks. I hadn't seen that. Do you know which manual and section states > this? I was intending to simply use cmpxchg8b but it sounds like that may > not be necessary. We still have to handle other 32bit archs like powerpc > and mips but I'm not sure if any of those are SMP. 
Usually single-instruction writes to within a single cache line are
atomic, and cache lines are often 128+ bits. The trouble is that 32-bit
architectures may not have 64-bit store instructions at all. I know that
32-bit powerpc processors don't have 64-bit compare-and-swap atomic ops.
They do have a stmw instruction, which is sort of equivalent to i386's
'rep stosd', and can store 64 bits in one instruction, but I doubt that
that's atomic.

Maybe the best strategy would be to say something like:

	#ifdef atomic_set_64
		atomic_set_64(&fp->f_offset, uio_resid);
	#else
		mtx_lock(fp->f_mtx);
		fp->f_offset = uio_resid;
		mtx_unlock(fp->f_mtx);
	#endif

(Worse) alternatives:

- Generic (very slow) atomic_set_64 implemented in terms of hashed
  mutexes.

- Add a level of indirection to implement the 64-bit update in terms of
  atomic pointer ops, i.e.:

	newoffset = uio_resid;
	membar();
	atomic_set_ptr(&fp->f_offsetp, &newoffset);

  The idea here is that pointer-size writes are often atomic even if
  uint64_t-size writes aren't.
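As an illustration of the strategy David Schultz describes in this thread (use a 64-bit atomic store where the platform has one, fall back to a mutex where it does not), here is a minimal userland sketch in C11. It is not FreeBSD kernel code and makes several assumptions: struct demo_file, demo_file_init(), demo_set_offset() and demo_get_offset() are hypothetical names, pthread mutexes stand in for mtx(9), and the ATOMIC_LLONG_LOCK_FREE macro from <stdatomic.h> stands in for the "#ifdef atomic_set_64" test.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* 2 means "always lock-free" for 64-bit (long long sized) atomics. */
#if ATOMIC_LLONG_LOCK_FREE == 2
#define DEMO_HAVE_ATOMIC64 1
#else
#define DEMO_HAVE_ATOMIC64 0
#endif

/* Hypothetical stand-in for the kernel's struct file. */
struct demo_file {
#if DEMO_HAVE_ATOMIC64
	_Atomic int64_t f_offset;	/* written with a single atomic store */
#else
	int64_t f_offset;		/* protected by f_mtx */
#endif
	pthread_mutex_t f_mtx;		/* only taken on the fallback path */
};

void
demo_file_init(struct demo_file *fp)
{
#if DEMO_HAVE_ATOMIC64
	atomic_init(&fp->f_offset, 0);
#else
	fp->f_offset = 0;
#endif
	pthread_mutex_init(&fp->f_mtx, NULL);
}

/* Publish a new offset so that readers never observe a torn value. */
void
demo_set_offset(struct demo_file *fp, int64_t off)
{
	assert(off >= 0);
#if DEMO_HAVE_ATOMIC64
	atomic_store_explicit(&fp->f_offset, off, memory_order_release);
#else
	pthread_mutex_lock(&fp->f_mtx);
	fp->f_offset = off;
	pthread_mutex_unlock(&fp->f_mtx);	/* always pair lock with unlock */
#endif
}

/* Read the offset with the same protection the writer used. */
int64_t
demo_get_offset(struct demo_file *fp)
{
#if DEMO_HAVE_ATOMIC64
	return atomic_load_explicit(&fp->f_offset, memory_order_acquire);
#else
	int64_t off;

	pthread_mutex_lock(&fp->f_mtx);
	off = fp->f_offset;
	pthread_mutex_unlock(&fp->f_mtx);
	return off;
#endif
}
```

Either path only prevents torn reads and writes of the 64-bit value. It does not make a read-modify-write such as "offset += nbytes" atomic, which is the part of the discussion that the sx lock proposal addresses.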
From owner-freebsd-arch@FreeBSD.ORG Mon Apr 14 19:18:23 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3F141065672 for ; Mon, 14 Apr 2008 19:18:23 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:610:652::211]) by mx1.freebsd.org (Postfix) with ESMTP id 5A18C8FC37 for ; Mon, 14 Apr 2008 19:18:23 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 7C5601CD30; Mon, 14 Apr 2008 21:14:53 +0200 (CEST) Date: Mon, 14 Apr 2008 21:14:53 +0200 From: Ed Schouten To: Poul-Henning Kamp Message-ID: <20080414191453.GK5934@hoeg.nl> References: <20080414074710.GI5934@hoeg.nl> <5287.1208159704@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Ht+/lPZLak6eP81R" Content-Disposition: inline In-Reply-To: <5287.1208159704@critter.freebsd.dk> User-Agent: Mutt/1.5.17 (2007-11-01) Cc: arch@freebsd.org Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2008 19:18:23 -0000 --Ht+/lPZLak6eP81R Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello Poul-Henning, Sorry, I forgot to reply to your message this afternoon. * Poul-Henning Kamp wrote: > In message <20080414074710.GI5934@hoeg.nl>, Ed Schouten writes: >=20 > >I'm experiencing similar problems with implementing read() and write() > >inside my mpsafetty branch for TTY's. Just like the current TTY > >implementation, my implementation will do strange things when two > >threads call read() or write() at the same time. Data could end up mixed > >together. 
>
> The write side of this will break quite a lot of stuff, starting
> with syslogd(8), write(1), wall(1) and similar, all which expect
> to be able to spam terminals coherently.
>
> The read side will probably mostly cause trouble for programs that
> try to take input from /dev/tty, usually passwords.

I'll explain what I've done with the TTY layer in somewhat more depth. I
was a little brief in my last message.

It is true that the read() side of the TTY layer doesn't really offer
many guarantees when multiple applications try to perform a read() on
either the TTY device or the PTY controller device. I've tried to
prevent as much copying as possible, so unlike the current code there
isn't a buffer in between. A compromise I had to make was that read()'s
on TTY's aren't serialized. I don't think this will cause a problem:

- PTY controller devices aren't really intended to be read() by multiple
  threads at the same time.

- It's not likely multiple read()'s on the TTY device will happen a lot.
  When a background process group tries to perform a read(), it will
  most likely receive a SIGTTIN (except when it ignores the signal,
  etc).

- I am sure this will not cause any problems in canonical mode, because
  there is already a guarantee that the VEOL/VEOF character will always
  be processed by the thread that first detected it. It is not possible
  that the character is interpreted by multiple readers.

Now about the write() case. I said write()'s were completely
unprotected, but I was a little brief about this. Because it's too
complex to implement an unbuffered mechanism that copies data from
userspace directly into the buffer queue (it makes locking hard, input
could be processed to expand to a different number of bytes, etc),
write() calls will be buffered. There are only two cases where a write()
may end up fragmented when multiple write() calls happen at the same
time:

- The write() exceeds the write buffer size of 256 bytes (100 bytes with
  the current TTY layer).
Maybe I should adjust this value, because 256 bytes may be a lot when allocated on the stack. - The write() call causes the calling thread to be blocked, because the TTY has reached its high watermark. All in all I think the way I've implemented the TTY layer should be quite safe. Its guarantees don't differ too much when compared to the existing implementation, in my opinion. --=20 Ed Schouten WWW: http://g-rave.nl/ --Ht+/lPZLak6eP81R Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkgDrS0ACgkQ52SDGA2eCwX8kACdHbYeBzOOtqEHDiJwRZ+NsLKz OssAniUX/VbObXglktgx1EHEVBeacJsM =+YxA -----END PGP SIGNATURE----- --Ht+/lPZLak6eP81R-- From owner-freebsd-arch@FreeBSD.ORG Tue Apr 15 14:29:55 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 75EA6106564A for ; Tue, 15 Apr 2008 14:29:55 +0000 (UTC) (envelope-from ups@freebsd.org) Received: from smtpout09.prod.mesa1.secureserver.net (smtpout09-04.prod.mesa1.secureserver.net [64.202.165.17]) by mx1.freebsd.org (Postfix) with SMTP id 50CFA8FC13 for ; Tue, 15 Apr 2008 14:29:55 +0000 (UTC) (envelope-from ups@freebsd.org) Received: (qmail 17595 invoked from network); 15 Apr 2008 14:03:15 -0000 Received: from unknown (70.193.30.0) by smtpout09-04.prod.mesa1.secureserver.net (64.202.165.17) with ESMTP; 15 Apr 2008 14:03:14 -0000 Message-ID: <4804B58C.1010403@freebsd.org> Date: Tue, 15 Apr 2008 10:02:52 -0400 From: Stephan Uphoff User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Jeff Roberson References: <200804121703.m3CH3StJ081660@chez.mckusick.com> <41ED3941-E5E6-45F0-B880-C1B2861FDE32@rabson.org> <20080412131017.S43186@desktop> <20080412222458.B959@desktop> In-Reply-To: <20080412222458.B959@desktop> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Kirk 
McKusick , arch@freebsd.org Subject: Re: [SPAM] Re: VOP_LEASE X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Apr 2008 14:29:55 -0000 Jeff Roberson wrote: > On Sun, 13 Apr 2008, Doug Rabson wrote: > >> >> On 13 Apr 2008, at 00:15, Jeff Roberson wrote: >> >>> On Sat, 12 Apr 2008, Doug Rabson wrote: >>> >>>> >>>> On 12 Apr 2008, at 18:03, Kirk McKusick wrote: >>>> >>>>>> Date: Sat, 12 Apr 2008 02:13:15 -1000 (HST) >>>>>> From: Jeff Roberson >>>>>> To: arch@freebsd.org >>>>>> Subject: VOP_LEASE >>>>>> As far as I can tell this has never been used. Unless someone >>>>>> can show me >>>>>> otherwise I'm going to go ahead and remove it. >>>>>> Thanks, >>>>>> Jeff >>>>> VOP_LEASE is used by NQNFS and NFSv4. It notifies them when a file >>>>> is modified locally so that they know to update any outstanding >>>>> leases (e.g., evict any write lease for the file and do callbacks >>>>> for any read leases for the file). Deleting VOP_LEASE would break >>>>> NFS big time. >>>> >>>> I think our NQNFS support might have been removed some time ago - I >>>> can't see any calls to VOP_LEASE in the code right now. Something >>>> like VOP_LEASE would certainly be useful for a hypothetical future >>>> NFSv4 server. I believe that samba could use it too for its oplocks >>>> feature which appears to be similar to NQNFS's leases and NFSv4's >>>> delegations. >>> >>> So the idea with delegations is that close() doesn't actually >>> release the file entirely to make future access cheaper? >>> >>> My issue with VOP_LEASE is only that there are no in kernel >>> implementations of the VOP. I doubt it is applied regularly in >>> syscalls. It also seems odd that it is called without a lock. >>> >>> Is the intent that the server will trap all accesses to a local >>> vnode in order to invalidate the client leases? 
>> >> I'm working from memory here (too lazy to checkout an old tree). I >> seem to remember that the way this worked is that when an NQNFS >> server granted a lease to a remote client, it arranged things so that >> any local filesystem access to the leased file would first evict the >> remote leaseholder. While the remote client has a valid lease, it is >> free to agressively cache locally as long as it flushes write to the >> server on eviction. The implementation was quite intrusive on the >> server. I can't quite remember where VOP_LEASE came in and the >> documentation is useless. > > I discussed it more with alfred. I don't intend to remove VOP_LEASE > since there may be some valid use for it. We just haven't had any > code in at least a decade that made use of it so I thought it was > prime for axing. > > I believe that calling the VOP without a lock makes it prone to races > which make it minimally useful. However I'm willing to reserve > judgement until some consumer actually shows up. I remember looking at VOP_LEASE quite a bit some time ago and came to the same conclusion as Jeff. It is just racy and not maintained. > > Sun doesn't seem to have a VOP_LEASE or similar in Solaris. They > actually seem to install a kind of filter on vfs and vnode operations > and monitor there. Their filters do more than VOP_LEASE does and > operate a bit like the vop_*_pre and post hooks I added for debugging > which now have been turned on all the time. It might be cleaner if we > implemented the lease notification in these hooks instead. > Installing the filters in Solaris seems a bit racy as I did not notice waiting for/draining existing operations. I believe our locking approach ( locking is exported to VFS - something that I personally don't like) may fix this as we can hold a lock while installing a filter. Most VFS exported functions are called with a lock held and the others may be safe. 
This being said we should talk more about what we don't like about the current VFS before making it even harder to replace. Stephan From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 14:47:11 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 233EA1065670 for ; Wed, 16 Apr 2008 14:47:11 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 1A18D8FC1A for ; Wed, 16 Apr 2008 14:47:11 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from zion.baldwin.cx (unknown [208.65.88.170]) by elvis.mu.org (Postfix) with ESMTP id B46031A4D84; Wed, 16 Apr 2008 07:47:10 -0700 (PDT) From: John Baldwin To: freebsd-arch@freebsd.org Date: Wed, 16 Apr 2008 09:30:27 -0400 User-Agent: KMail/1.9.7 References: <20080412132457.W43186@desktop> In-Reply-To: <20080412132457.W43186@desktop> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200804160930.27981.jhb@freebsd.org> Cc: Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 14:47:11 -0000 On Saturday 12 April 2008 07:51:15 pm Jeff Roberson wrote: > To maintain the existing semantics I'm simply going to add an exclusive > sx_xlock() around access to f_offset. This is done inconsistently today > which is fine from the perspective of the updates in most cases being > user-space races. However, f_offset is 64bit and can not be written > atomically on 32bit systems and so requires some extra synchronization > there. > > The sx lock will nearly double the size of struct file. 
Although it's > lost some weight in 8.0 that is quite unfortunate. However, the method of > using LOCKED & WAITING flags, msleep and a mutex has ruined performance in > too many cases to continue using it. You could use a pool of sx locks and hash the file pointer to get an offset (ala the mtx pools) to avoid bloating struct file if desired. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 14:47:14 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 380A01065671; Wed, 16 Apr 2008 14:47:14 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 1DC578FC1D; Wed, 16 Apr 2008 14:47:14 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from zion.baldwin.cx (unknown [208.65.88.170]) by elvis.mu.org (Postfix) with ESMTP id 7FBFA1A4D8C; Wed, 16 Apr 2008 07:47:13 -0700 (PDT) From: John Baldwin To: Pawel Jakub Dawidek Date: Wed, 16 Apr 2008 10:14:40 -0400 User-Agent: KMail/1.9.7 References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> In-Reply-To: <20080412112019.GI45299@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200804161014.41025.jhb@freebsd.org> Cc: Roman Divacky , kib@freebsd.org, rwatson@freebsd.garage.freebsd.pl, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 14:47:14 -0000 On Saturday 12 April 2008 07:20:19 am Pawel Jakub Dawidek wrote: > On Thu, Dec 20, 2007 at 11:38:55AM -0500, John Baldwin 
wrote: > > On Tuesday 18 December 2007 04:22:22 am Roman Divacky wrote: > > > Dear arch@ > > > > > > Over this summer I was working (among other things) on *at family of > > > syscalls kindly sponsored by Google (in their Summer of Code). The > > > resulting patch is almost finished but I need to decide one design > > > question. If you are not interested in *at/namei feel free to skip this > > > mail. > > > > > > The *at syscalls are a threads-oriented extension to basic file > > > syscalls (think of open(), fstat(), etc.) adding the possibility to > > > specify from where the search for relative path should start. > > > > > > image that we have /tmp/foo/bar > > > > > > and CWD is set to "/tmp/", and the process has opened "foo" as dirfd. > > > with ordinary open() syscall you have to either > > > > > > chdir("/tmp/foo");open("./bar"); > > > > > > or > > > > > > open("/tmp/foo/bar"); > > > > > > The first approach is problematic because it changes CWD for all > > > threads in the process, the second is prone to race-conditions as some > > > of the components of the path can change in parallel with the "open". > > > > > > So POSIX introduced a new API, called "Extended API set part 2, ISBN: > > > 1-931624-67-4" (at least this was the latest when I looked last time), > > > which solves that by introducing "*at" syscalls that supply an fd of > > > previously opened directory which is used instead of CWD for searching > > > relative path, ie. the previous example becomes > > > > > > dirfd = open("/tmp/foo"); openat("foo", dirfd); > > > > > > I implemented the whole API as native FreeBSD syscalls + in linuxulator > > > emulation layer. Here's the problem: > > > > > > There are two approaches to the name translation from "filedescriptor" > > > to the "vnode". 
> > > > > > 1) we can do it in the kern_fooat() syscall and pass namei() the > > > resulting vnode 2) we can pass namei() the filedescriptor and do the > > > translation there > > > > > > PROs of #1: > > > > > > o namei() does not need to know about the curthread, you can use this > > > *at ability for different purposes, it's cleaner (imho) > > > > > > PROs of #2 > > > > > > o raceless implementation > > > o no code duplication > > > > > > CONs of #1 > > > > > > o some very small code duplication (the translation is done in every > > > kern_fooat() function) > > > o there is a race between the name translation and the actual use of > > > the result of the translation that needs to be handled, the > > > "path_to_file" string is copied to the kernel space twice hence a race > > > > > > CONs of #2 > > > > > > o namei is made thread dependant > > > > > > Please tell me what approach you like more. I personally favour #1 > > > because I don't like namei() being thread dependant, Kostik Belousov > > > prefers #2. > > > > Considering Robert's paper on security race problems in things like > > systrace stemming from when you copy parameters out of userland and into > > the kernel multiple times, I think #2 is definitely the better choice. > > Also, namei() is already thread aware AFAICT since 'struct componentname' > > already contains a 'cnp_thread' member (was 'cnp_proc' in 4.x). > > It looks like I'm a bit too late, but anyway... > > From what you write John, #1 is a better choice than #2. If you want to > avoid races, you can pass already locked vnode. In case of file > descriptors, if p_fd is not locked another thread can close and open > different directory under the same descriptor number. Did you read Robert's paper? Do you not realize that the kernel copying data in from userland multiple times and having it change in between is very bug prone? 
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 15:38:14 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 439D61065672 for ; Wed, 16 Apr 2008 15:38:14 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85]) by mx1.freebsd.org (Postfix) with ESMTP id 04DED8FC1A for ; Wed, 16 Apr 2008 15:38:13 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from localhost (localhost [127.0.0.1]) by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id m3GFbSNu021741; Wed, 16 Apr 2008 09:37:28 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Wed, 16 Apr 2008 09:38:27 -0600 (MDT) Message-Id: <20080416.093827.-262812665.imp@bsdimp.com> To: xcllnt@mac.com From: "M. Warner Losh" In-Reply-To: <300DE361-167E-4491-8E8C-7A227225B506@mac.com> References: <480313A2.4050306@nokia.com> <20080413222626.X959@desktop> <300DE361-167E-4491-8E8C-7A227225B506@mac.com> X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org, Arthur.Hartwig@nokia.com Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 15:38:14 -0000 In message: <300DE361-167E-4491-8E8C-7A227225B506@mac.com> Marcel Moolenaar writes: : : On Apr 14, 2008, at 1:27 AM, Jeff Roberson wrote: : > : > On Mon, 14 Apr 2008, Arthur Hartwig wrote: : > : >> ext Jeff Roberson wrote: : >>> So I'm in the midst of working on other filesystem concurrency : >>> issues and that has brought me back around to f_offset again. I'm : >>> working on a method to allow non-overlapping writes and reads to : >>> proceed concurrently to the same file. 
This means the exclusive : >>> vnode lock can not be used to protect f_offset even in the write : >>> case. : >>> To maintain the existing semantics I'm simply going to add an : >>> exclusive sx_xlock() around access to f_offset. This is done : >>> inconsistently today which is fine from the perspective of the : >>> updates in most cases being user-space races. However, f_offset : >>> is 64bit and can not be written atomically on 32bit systems and so : >>> requires some extra synchronization there. : >> I'm not sure of the processor family constraints of the i386 : >> builds, but the Intel IA32 architecture manual says reads and : >> writes of a quadword (64 bits) aligned on a quadword boundary are : >> atomic (Pentium and newer CPUs). Guess that leaves out i386, i486 : >> (any others?) : > : > Thanks. I hadn't seen that. Do you know which manual and section : > states this? I was intending to simply use cmpxchg8b but it sounds : > like that may not be necessary. We still have to handle other 32bit : > archs like powerpc and mips but I'm not sure if any of those are SMP. : : I'm working on SMP for PowerPC.. Support for MIPS SMP is in the initial commit. It might not be working, but one of the big reasons that people want MIPS and FreeBSD is the excellent scaling work that's been done, as well as the prevalence of multicore MIPS designs for certain application domains.
Warner From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 16:56:37 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DBB041065672; Wed, 16 Apr 2008 16:56:37 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (chello087206046210.chello.pl [87.206.46.210]) by mx1.freebsd.org (Postfix) with ESMTP id 18E8B8FC1A; Wed, 16 Apr 2008 16:56:36 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id A5A8845B36; Wed, 16 Apr 2008 18:56:34 +0200 (CEST) Received: from localhost (chello087206046210.chello.pl [87.206.46.210]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id F2C1F45684; Wed, 16 Apr 2008 18:56:27 +0200 (CEST) Date: Wed, 16 Apr 2008 18:56:12 +0200 From: Pawel Jakub Dawidek To: John Baldwin Message-ID: <20080416165612.GA31094@garage.freebsd.pl> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> <200804161014.41025.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="XsQoSWH+UP9D9v3l" Content-Disposition: inline In-Reply-To: <200804161014.41025.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 8.0-CURRENT i386 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-2.6 required=3.0 tests=BAYES_00 autolearn=ham version=3.0.4 Cc: kib@freebsd.org, Roman Divacky , rwatson@freebsd.org, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 16:56:38 -0000 --XsQoSWH+UP9D9v3l Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Apr 16, 2008 at 10:14:40AM -0400, John Baldwin wrote: > On Saturday 12 April 2008 07:20:19 am Pawel Jakub Dawidek wrote: > > From what you write John, #1 is a better choice than #2. If you want to > > avoid races, you can pass already locked vnode. In case of file > > descriptors, if p_fd is not locked another thread can close and open > > different directory under the same descriptor number. >=20 > Did you read Robert's paper? Do you not realize that the kernel copying = data=20 > in from userland multiple times and having it change in between is very b= ug=20 > prone? Believe me I'm fully aware of the problems Robert described in his paper. With vnode approach where do you have more data copying between kernel and userland? File descriptor proposal works like this: userland openat(fd, path) kernel NDINIT_AT(&vp, path, fd); /* operate on vp */ Vnode proposal works this way: userland openat(fd, path) kernel dvp =3D file_descriptor_to_vnode(fd); NDINIT_AT(&vp, path, dvp); /* operate on vp */ --=20 Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! 
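For readers following the *at discussion from the userland side, the sketch below shows what the interface under debate looks like to a caller. It only uses the POSIX open(), openat() and close() calls; the helper name open_in_dir() and its parameters are illustrative assumptions, and nothing here reflects how either kernel proposal (passing the descriptor or passing the vnode to namei) is implemented. Note that in POSIX the directory descriptor is the first argument of openat(), followed by the relative path.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `name` relative to the directory `dirpath` without changing the
 * process-wide current working directory. Returns a file descriptor, or
 * -1 with errno set. This helper does not support O_CREAT, because it
 * passes no mode argument.
 */
int
open_in_dir(const char *dirpath, const char *name, int flags)
{
	int dirfd, fd, saved_errno;

	assert(dirpath != NULL && name != NULL);

	/* Pin the starting directory once; later renames do not affect it. */
	dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
	if (dirfd == -1)
		return -1;

	/* Relative lookup starts at dirfd instead of the CWD. */
	fd = openat(dirfd, name, flags);

	/* Preserve the errno from openat() across close(). */
	saved_errno = errno;
	close(dirfd);
	errno = saved_errno;

	return fd;
}
```

A multithreaded program would typically keep the directory descriptor open and reuse it for many lookups, for example calling openat(dirfd, "bar", O_RDONLY) from several threads, rather than reopening the directory each time as this compact helper does.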
--XsQoSWH+UP9D9v3l Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFIBi+rForvXbEpPzQRAuQzAKCiBCGMz+eYtUMUlxOHAZ/h24jFsgCfXTso hh0WOhUPunmiEGxhJ/Do/Vs= =iqfy -----END PGP SIGNATURE----- --XsQoSWH+UP9D9v3l-- From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 17:03:42 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 37342106566C; Wed, 16 Apr 2008 17:03:42 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 2149E8FC33; Wed, 16 Apr 2008 17:03:42 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 0DC251A4D8B; Wed, 16 Apr 2008 10:03:42 -0700 (PDT) Date: Wed, 16 Apr 2008 10:03:42 -0700 From: Alfred Perlstein To: Pawel Jakub Dawidek Message-ID: <20080416170341.GN95731@elvis.mu.org> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> <200804161014.41025.jhb@freebsd.org> <20080416165612.GA31094@garage.freebsd.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080416165612.GA31094@garage.freebsd.pl> User-Agent: Mutt/1.4.2.3i Cc: rwatson@freebsd.org, Roman Divacky , kib@freebsd.org, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 17:03:42 -0000 * Pawel Jakub Dawidek [080416 09:56] wrote: > On Wed, Apr 16, 2008 at 10:14:40AM -0400, John Baldwin wrote: > > On Saturday 12 April 2008 07:20:19 am Pawel Jakub Dawidek wrote: > > > 
From what you write John, #1 is a better choice than #2. If you want to > > > avoid races, you can pass already locked vnode. In case of file > > > descriptors, if p_fd is not locked another thread can close and open > > > different directory under the same descriptor number. > > > > Did you read Robert's paper? Do you not realize that the kernel copying data > > in from userland multiple times and having it change in between is very bug > > prone? > > Believe me I'm fully aware of the problems Robert described in his > paper. With vnode approach where do you have more data copying between > kernel and userland? > > File descriptor proposal works like this: > > userland > openat(fd, path) > kernel > NDINIT_AT(&vp, path, fd); > /* operate on vp */ > > Vnode proposal works this way: > > userland > openat(fd, path) > kernel > dvp = file_descriptor_to_vnode(fd); > NDINIT_AT(&vp, path, dvp); > /* operate on vp */ My first impression is that passing fp to vp code is a layering violation and bad news. I need to think about it more. 
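[Archive note] Both flows quoted above start from the same userland call. As a point of reference, here is a minimal sketch of that call using the POSIX openat(2) interface. open_relative() is a hypothetical helper name chosen for this note, and nothing here says anything about how the kernel side was eventually implemented.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open "path" relative to the directory "dir".
 * Returns a descriptor for the target, or -1 on failure.
 */
static int
open_relative(const char *dir, const char *path)
{
    int dfd, fd;

    dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return (-1);

    /* A relative path is resolved against dfd instead of the cwd. */
    fd = openat(dfd, path, O_RDONLY);

    /* The returned descriptor does not depend on dfd staying open. */
    close(dfd);
    return (fd);
}
```

The only difference between the two kernel-side proposals is where the directory descriptor is turned into a vnode: inside the lookup initializer, or in the caller before the initializer runs.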
-- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 17:52:14 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 87E5A106564A; Wed, 16 Apr 2008 17:52:14 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 220D38FC1E; Wed, 16 Apr 2008 17:52:14 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 95F9F46B33; Wed, 16 Apr 2008 13:52:12 -0400 (EDT) Date: Wed, 16 Apr 2008 18:52:12 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Alfred Perlstein In-Reply-To: <20080416170341.GN95731@elvis.mu.org> Message-ID: <20080416184522.F1046@fledge.watson.org> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> <200804161014.41025.jhb@freebsd.org> <20080416165612.GA31094@garage.freebsd.pl> <20080416170341.GN95731@elvis.mu.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: kib@freebsd.org, Roman Divacky , Pawel Jakub Dawidek , freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 17:52:14 -0000 On Wed, 16 Apr 2008, Alfred Perlstein wrote: >> File descriptor proposal works like this: >> >> userland >> openat(fd, path) >> kernel >> NDINIT_AT(&vp, path, fd); >> /* operate on vp */ >> >> Vnode proposal works this way: >> >> userland >> openat(fd, path) >> kernel >> dvp = file_descriptor_to_vnode(fd); >> NDINIT_AT(&vp, path, dvp); >> /* operate on vp */ 
> > My first impression is that passing fp to vp code is a layering > violation and bad news. I need to think about it more. NDINIT() is already aware of the file descriptor array because it uses that to get the current working and root directories. And what the *at() system calls are effectively doing is substituting another directory for the current working directory. The exact expression of all this doesn't matter all that much to me, but I think evaluating the file descriptor array for directory stuff all in one place, rather than spread over the caller and NDINIT(), is cleaner and avoids a lot of code everywhere. Nothing says you can't have: void NDINIT(struct nameidata *ndp, u_long op, u_long flags, enum uio_seg segflg, const char *namep, struct thread *td); void NDINIT_AT(struct nameidata *ndp, u_long op, u_long flags, enum uio_seg segflg, const char *namep, int fd, struct thread *td); NDINIT_DVP(struct nameidata *ndp, u_long op, u_long flags, enum uio_seg segflg, const char *namep, struct vnode *vp, struct thread *td); However, I think I wouldn't want NDINIT_AT() to be a wrapper for NDINIT_DVP(), because I'd like all that fdp following to occur together. 
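[Archive note] One way to picture the arrangement described above, with all descriptor-table handling kept in one place, is the toy userland model below. It is an illustration only: every toy_* name, the TOY_FDCWD marker and the flat descriptor table are assumptions made for this sketch, and none of it is the kernel's struct nameidata, NDINIT() or namei().

```c
#include <stddef.h>

#define TOY_FDCWD   (-100)  /* hypothetical "no directory fd given" marker */

struct toy_vnode {
    const char *name;
};

struct toy_nameidata {
    const char *path;           /* path to look up */
    int dirfd;                  /* directory fd, or TOY_FDCWD */
    struct toy_vnode *startdir; /* explicit start vnode, or NULL */
};

/* Plain flavour: lookups start at the current working directory. */
static void
toy_ndinit(struct toy_nameidata *ndp, const char *path)
{
    ndp->path = path;
    ndp->dirfd = TOY_FDCWD;
    ndp->startdir = NULL;
}

/* *at() flavour: only records the fd; it is not resolved here. */
static void
toy_ndinit_at(struct toy_nameidata *ndp, const char *path, int fd)
{
    toy_ndinit(ndp, path);
    ndp->dirfd = fd;
}

/* Vnode flavour: the caller already holds the start directory. */
static void
toy_ndinit_dvp(struct toy_nameidata *ndp, const char *path,
    struct toy_vnode *dvp)
{
    toy_ndinit(ndp, path);
    ndp->startdir = dvp;
}

/*
 * The single place that consults the descriptor table, standing in
 * for the lookup routine. Returns NULL for an invalid descriptor.
 */
static struct toy_vnode *
toy_start_dir(const struct toy_nameidata *ndp, struct toy_vnode **fdtable,
    size_t nfds, struct toy_vnode *cwd)
{
    if (ndp->startdir != NULL)
        return (ndp->startdir);
    if (ndp->dirfd == TOY_FDCWD)
        return (cwd);
    if (ndp->dirfd < 0 || (size_t)ndp->dirfd >= nfds)
        return (NULL);
    return (fdtable[ndp->dirfd]);
}
```

Note that toy_ndinit_at() is deliberately not a wrapper around toy_ndinit_dvp(): the descriptor is carried into the lookup and resolved next to the cwd handling, which mirrors the preference stated in the message.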
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 17:58:38 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2E7BA106564A; Wed, 16 Apr 2008 17:58:38 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from relay03.kiev.sovam.com (relay03.kiev.sovam.com [62.64.120.201]) by mx1.freebsd.org (Postfix) with ESMTP id BB0AE8FC1E; Wed, 16 Apr 2008 17:58:37 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua) by relay03.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1JmBtz-000Aw6-Rl; Wed, 16 Apr 2008 20:58:36 +0300 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3GHwdaH003781 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 16 Apr 2008 20:58:40 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3GHwWmC056261; Wed, 16 Apr 2008 20:58:32 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m3GHwWiv056260; Wed, 16 Apr 2008 20:58:32 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 16 Apr 2008 20:58:32 +0300 From: Kostik Belousov To: Robert Watson Message-ID: <20080416175832.GX18958@deviant.kiev.zoral.com.ua> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> <200804161014.41025.jhb@freebsd.org> <20080416165612.GA31094@garage.freebsd.pl> <20080416170341.GN95731@elvis.mu.org> 
<20080416184522.F1046@fledge.watson.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="aJDJANv8BPX70wwH" Content-Disposition: inline In-Reply-To: <20080416184522.F1046@fledge.watson.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua X-Scanner-Signature: ca614d1e091d53a46adb2ef7a730c6a6 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Header: Not Detected X-SpamTest-Info: Profiles 2646 [Apr 16 2008] X-SpamTest-Info: helo_type=3 X-SpamTest-Method: none X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0278], KAS30/Release Cc: Roman Divacky , Alfred Perlstein , Pawel Jakub Dawidek , freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 17:58:38 -0000 --aJDJANv8BPX70wwH Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Apr 16, 2008 at 06:52:12PM +0100, Robert Watson wrote: > > On Wed, 16 Apr 2008, Alfred Perlstein wrote: > > >>File descriptor proposal works like this: > >> > >>userland > >> openat(fd, path) > >>kernel > >> NDINIT_AT(&vp, path, fd); > >> /* operate on vp */ > >> > >>Vnode proposal works this way: > >> > >>userland > >> openat(fd, path) > >>kernel > >> dvp = file_descriptor_to_vnode(fd); > >> NDINIT_AT(&vp, path, dvp); > >> /* 
operate on vp */ > > > >My first impression is that passing fp to vp code is a layering > >violation and bad news. I need to think about it more. > > NDINIT() is already aware of the file descriptor array because it uses that > to get the current working and root directories. And what the *at() system > calls are effectively doing is substituting another directory for the > current working directory. The exact expression of all this doesn't matter > all that much to me, but I think evaluating the file descriptor array for > directory stuff all in one place, rather than spread over the caller and > NDINIT(), is cleaner and avoids a lot of code everywhere. Nothing says you > can't have: > > void > NDINIT(struct nameidata *ndp, u_long op, u_long flags, > enum uio_seg segflg, const char *namep, struct thread *td); > > void > NDINIT_AT(struct nameidata *ndp, u_long op, u_long flags, > enum uio_seg segflg, const char *namep, int fd, struct thread *td); > > NDINIT_DVP(struct nameidata *ndp, u_long op, u_long flags, > enum uio_seg segflg, const char *namep, struct vnode *vp, > struct thread *td); > > However, I think I wouldn't want NDINIT_AT() to be a wrapper for > NDINIT_DVP(), because I'd like all that fdp following to occur together. I already mailed the patch implementing all the above, modulo s/_DVP/_ATVP/. I want to get the response from Pawel and others. If positive, the patch is to be tested and committed. I see no reason for heating the debate. 
--aJDJANv8BPX70wwH Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (FreeBSD) iEYEARECAAYFAkgGPkcACgkQC3+MBN1Mb4jCngCgl4uSoRlhEnCTsC5FagUNKVlN 78EAnibPH/Vh0KEr8RcOlhkikMkQqZ6k =WFdv -----END PGP SIGNATURE----- --aJDJANv8BPX70wwH-- From owner-freebsd-arch@FreeBSD.ORG Wed Apr 16 18:05:51 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DDC11065671; Wed, 16 Apr 2008 18:05:51 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (chello087206046210.chello.pl [87.206.46.210]) by mx1.freebsd.org (Postfix) with ESMTP id 893CB8FC1C; Wed, 16 Apr 2008 18:05:50 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 01FBB45C89; Wed, 16 Apr 2008 20:05:48 +0200 (CEST) Received: from localhost (chello087206046210.chello.pl [87.206.46.210]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id AD1194569A; Wed, 16 Apr 2008 20:05:41 +0200 (CEST) Date: Wed, 16 Apr 2008 20:05:26 +0200 From: Pawel Jakub Dawidek To: Kostik Belousov Message-ID: <20080416180526.GA32235@garage.freebsd.pl> References: <20071218092222.GA9695@freebsd.org> <200712201138.56423.jhb@freebsd.org> <20080412112019.GI45299@garage.freebsd.pl> <200804161014.41025.jhb@freebsd.org> <20080416165612.GA31094@garage.freebsd.pl> <20080416170341.GN95731@elvis.mu.org> <20080416184522.F1046@fledge.watson.org> <20080416175832.GX18958@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="XsQoSWH+UP9D9v3l" Content-Disposition: inline In-Reply-To: <20080416175832.GX18958@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: 
http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 8.0-CURRENT i386 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-2.6 required=3.0 tests=BAYES_00 autolearn=ham version=3.0.4 Cc: Roman Divacky , Alfred Perlstein , Robert Watson , freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Apr 2008 18:05:51 -0000 --XsQoSWH+UP9D9v3l Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Apr 16, 2008 at 08:58:32PM +0300, Kostik Belousov wrote: > On Wed, Apr 16, 2008 at 06:52:12PM +0100, Robert Watson wrote: > > NDINIT() is already aware of the file descriptor array because it uses that > > to get the current working and root directories. And what the *at() system > > calls are effectively doing is substituting another directory for the > > current working directory. The exact expression of all this doesn't matter > > all that much to me, but I think evaluating the file descriptor array for > > directory stuff all in one place, rather than spread over the caller and > > NDINIT(), is cleaner and avoids a lot of code everywhere. 
Nothing says you > > can't have: > > > > void > > NDINIT(struct nameidata *ndp, u_long op, u_long flags, > > enum uio_seg segflg, const char *namep, struct thread *td); > > > > void > > NDINIT_AT(struct nameidata *ndp, u_long op, u_long flags, > > enum uio_seg segflg, const char *namep, int fd, struct thread *td); > > > > NDINIT_DVP(struct nameidata *ndp, u_long op, u_long flags, > > enum uio_seg segflg, const char *namep, struct vnode *vp, > > struct thread *td); > > > > However, I think I wouldn't want NDINIT_AT() to be a wrapper for > > NDINIT_DVP(), because I'd like all that fdp following to occur together. > > I already mailed the patch implementing all the above, modulo > s/_DVP/_ATVP/. I want to get the response from Pawel and others. If > positive, the patch is to be tested and committed. Back when we discussed NDINIT_AT(), I was a bit opposed, because I was afraid that we would grow more NDINIT_() functions. I preferred to just initialize additional arguments directly, e.g. NDINIT(&nd, foo, bar); nd.ni_dirfd = fd; nd.ni_startvp = dvp; namei(&nd); At this point I don't really care, I can use NDINIT_DVP/NDINIT_ATVP. > I see no reason for heating the debate. Agreed. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! 
--XsQoSWH+UP9D9v3l Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFIBj/jForvXbEpPzQRAgO6AKDWUm3ngdma89OX/Ce7THAgKcJL7gCggABl fAyvTnNC1/DocpuLfJc6qhU= =DzdD -----END PGP SIGNATURE----- --XsQoSWH+UP9D9v3l-- From owner-freebsd-arch@FreeBSD.ORG Thu Apr 17 03:02:02 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1441B1065683 for ; Thu, 17 Apr 2008 03:02:02 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.227]) by mx1.freebsd.org (Postfix) with ESMTP id C18F78FC19 for ; Thu, 17 Apr 2008 03:02:01 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wr-out-0506.google.com with SMTP id 50so16354wra.13 for ; Wed, 16 Apr 2008 20:02:00 -0700 (PDT) Received: by 10.142.48.14 with SMTP id v14mr248420wfv.133.1208401319958; Wed, 16 Apr 2008 20:01:59 -0700 (PDT) Received: from ?10.0.1.199? ( [24.94.72.120]) by mx.google.com with ESMTPS id 27sm18263071wfa.0.2008.04.16.20.01.57 (version=SSLv3 cipher=OTHER); Wed, 16 Apr 2008 20:01:58 -0700 (PDT) Date: Wed, 16 Apr 2008 17:02:37 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: "M. 
Warner Losh" In-Reply-To: <20080416.093827.-262812665.imp@bsdimp.com> Message-ID: <20080416170104.G959@desktop> References: <480313A2.4050306@nokia.com> <20080413222626.X959@desktop> <300DE361-167E-4491-8E8C-7A227225B506@mac.com> <20080416.093827.-262812665.imp@bsdimp.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org, xcllnt@mac.com, Arthur.Hartwig@nokia.com Subject: Re: f_offset X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2008 03:02:02 -0000 On Wed, 16 Apr 2008, M. Warner Losh wrote: > In message: <300DE361-167E-4491-8E8C-7A227225B506@mac.com> > Marcel Moolenaar writes: > : > : On Apr 14, 2008, at 1:27 AM, Jeff Roberson wrote: > : > > : > On Mon, 14 Apr 2008, Arthur Hartwig wrote: > : > > : >> ext Jeff Roberson wrote: > : >>> So I'm in the midst of working on other filesystem concurrency > : >>> issues and that has brought me back around to f_offset again. I'm > : >>> working on a method to allow non-overlapping writes and reads to > : >>> proceed concurrently to the same file. This means the exclusive > : >>> vnode lock can not be used to protect f_offset even in the write > : >>> case. > : >>> To maintain the existing semantics I'm simply going to add an > : >>> exclusive sx_xlock() around access to f_offset. This is done > : >>> inconsistently today which is fine from the perspective of the > : >>> updates in most cases being user-space races. However, f_offset > : >>> is 64bit and can not be written atomically on 32bit systems and so > : >>> requires some extra synchronization there. 
> : >> I'm not sure of the processor family constraints of the i386 > : >> builds, but the Intel IA32 architecture manual says reads and > : >> writes of a quadword (64 bits) aligned on a quadword boundary are > : >> atomic (Pentium and newer CPUs). Guess that leaves out i386, i486 > : >> (any others?) > : > > : > Thanks. I hadn't seen that. Do you know which manual and section > : > states this? I was intending to simply use cmpxchg8b but it sounds > : > like that may not be necessary. We still have to handle other 32bit > : > archs like powerpc and mips but I'm not sure if any of those are SMP. > : > : I'm working on SMP for PowerPC.. > > Support for MIPS SMP is in the initial commit. It might not be > working, but one of the big reasons that people want MIPS and FreeBSD > is due to the excellent scaling work that's been done as well as the > prevalence of multicore MIPS designs for certain application domains. My intent is to support 32bit platforms with a generic shim that grabs a mtx pool lock to provide atomic-like primitives for 64bit. I think that will sufficiently solve the issue. For 32bit non-SMP we could simply disable interrupts during the operation. > > Warner > From owner-freebsd-arch@FreeBSD.ORG Thu Apr 17 03:27:45 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 020711065673 for ; Thu, 17 Apr 2008 03:27:45 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.174]) by mx1.freebsd.org (Postfix) with ESMTP id D37F18FC1D for ; Thu, 17 Apr 2008 03:27:44 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wf-out-1314.google.com with SMTP id 25so2688438wfa.7 for ; Wed, 16 Apr 2008 20:27:44 -0700 (PDT) Received: by 10.142.231.7 with SMTP id d7mr244631wfh.194.1208401215017; Wed, 16 Apr 2008 20:00:15 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id 9sm19241939wfc.16.2008.04.16.20.00.12 (version=SSLv3 cipher=OTHER); Wed, 16 Apr 2008 20:00:13 -0700 (PDT) Date: Wed, 16 Apr 2008 17:00:52 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: John Baldwin , attilio@freebsd.org In-Reply-To: <200804160930.27981.jhb@freebsd.org> Message-ID: <20080416165553.U959@desktop> References: <20080412132457.W43186@desktop> <200804160930.27981.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-arch@freebsd.org Subject: synchronization primitive size (was f_offset) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2008 03:27:45 -0000 On Wed, 16 Apr 2008, John Baldwin wrote: > On Saturday 12 April 2008 07:51:15 pm Jeff Roberson wrote: >> The sx lock will nearly double the size of struct file. Although it's >> lost some weight in 8.0 that is quite unfortunate. However, the method of >> using LOCKED & WAITING flags, msleep and a mutex has ruined performance in >> too many cases to continue using it. > > You could use a pool of sx locks and hash the file pointer to get an offset > (ala the mtx pools) to avoid bloating struct file if desired. This would not be a good idea since the sx is held over actual I/O. Any collisions would mean blocking unrelated files/vnodes. I think we should use the right synchronization primitive for the job. If people are upset at how much space overhead that adds we need to rethink how big our synchronization primitives are. If we move the *_recurse fields into lock object they can buddy up with lo_flags potentially saving 8 bytes of padding waste on 64bit architectures. For the !WITNESS case we need only the name pointer, the flags, recurse count, and actual lock uintptr_t. 
This reduces the overhead to 3 pointers size on 64bit. We could either conditionally compile WITNESS support, which has unpleasant side effects for binary modules, or use secondary storage with a hash similar to what I did with LOCK_PROFILING. I think it's time to seriously consider this. Thanks, Jeff > > -- > John Baldwin > From owner-freebsd-arch@FreeBSD.ORG Thu Apr 17 12:25:15 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC5FD106566B for ; Thu, 17 Apr 2008 12:25:15 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from fg-out-1718.google.com (fg-out-1718.google.com [72.14.220.156]) by mx1.freebsd.org (Postfix) with ESMTP id 3F3278FC12 for ; Thu, 17 Apr 2008 12:25:15 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by fg-out-1718.google.com with SMTP id 16so39989fgg.35 for ; Thu, 17 Apr 2008 05:25:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=jIHqyLmYvthTKT1Y2/EBlL1L/ZbpV2N134t0rRFaQ0Y=; b=uvwgeHYc3oaJzdkUfgRsx+E95taTOgTxnNL6ybutw6+eY8Pqs3eWluoUTLHZiC3dQI9hqel9pCn2tpMDLPmG0wn3rz5Y0l1hvvVF4mNltZ5fk8qvEMW0W2jMELYaiSG5Lejq/vqhFCFt4y4VbkfOzFHyVzxjvcqux2f3DaBKVAg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=pAE4A3EP+5AZ+ZLMAYdTNK1E5d260Z4PGk9VbL9bJc1oMjxx0+WW48rPtkBg4yGwcP7VxVm5GZsS53WyUgs1RvFFT0QqtBOIGdjm3lEeSk/v+4d1sBMyLaV3b1whBcJe6SQdPlfW/0+csKYbQuyVUAnO6Cr/Xkys/ILUwIxGtyg= Received: by 10.86.31.18 with SMTP id e18mr2557063fge.35.1208433544377; Thu, 17 Apr 2008 04:59:04 -0700 (PDT) Received: by 
10.86.36.15 with HTTP; Thu, 17 Apr 2008 04:59:04 -0700 (PDT) Message-ID: <3bbf2fe10804170459j4933ed09ubfd22035ff27d5d6@mail.gmail.com> Date: Thu, 17 Apr 2008 13:59:04 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "Jeff Roberson" In-Reply-To: <20080416165553.U959@desktop> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080412132457.W43186@desktop> <200804160930.27981.jhb@freebsd.org> <20080416165553.U959@desktop> X-Google-Sender-Auth: 5a1de6146ee7e31f Cc: freebsd-arch@freebsd.org Subject: Re: synchronization primitive size (was f_offset) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2008 12:25:15 -0000 2008/4/17, Jeff Roberson : > > On Wed, 16 Apr 2008, John Baldwin wrote: > > > > On Saturday 12 April 2008 07:51:15 pm Jeff Roberson wrote: > > > > > The sx lock will nearly double the size of struct file. Although it's > > > lost some weight in 8.0 that is quite unfortunate. However, the method > of > > > using LOCKED & WAITING flags, msleep and a mutex has ruined performance > in > > > too many cases to continue using it. > > > > > > > You could use a pool of sx locks and hash the file pointer to get an > offset > > (ala the mtx pools) to avoid bloating struct file if desired. > > > > This would not be a good idea since the sx is held over actual I/O. Any > collisions would mean blocking unrelated files/vnodes. > > I think we should use the right synchronization primitive for the job. If > people are upset at how much space overhead that adds we need to rethink how > big our synchronization primitives are. > > If we move the *_recurse fields into lock object they can buddy up with > lo_flags potentially saving 8 bytes of padding waste on 64bit architectures. It is not possible. 
The lower 16 bits of lo_flags are currently used for special (external) flags by sx and lockmgr, and they are intended to be 'reserved'. Btw, I worked on this issue over the weekend. I split the WITNESS-only part out of lock_object (moving the lo_type and lod_* fields there), embedded the recursion counter in the lockmgr structure, and statically sized lo_flags and lo_recurse to 32 bits. For the !WITNESS case the size of lock_object was 2/3 and for locking primitives 1/2. It requires, however, ABI breakage for WITNESS, which is something we really don't want. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-arch@FreeBSD.ORG Thu Apr 17 13:26:28 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9E9EF1065674 for ; Thu, 17 Apr 2008 13:26:28 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.224]) by mx1.freebsd.org (Postfix) with ESMTP id 679F28FC1B for ; Thu, 17 Apr 2008 13:26:28 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wr-out-0506.google.com with SMTP id 50so42631wra.13 for ; Thu, 17 Apr 2008 06:26:27 -0700 (PDT) Received: by 10.114.166.1 with SMTP id o1mr1413658wae.5.1208438786586; Thu, 17 Apr 2008 06:26:26 -0700 (PDT) Received: from ?10.0.1.199? 
( [24.94.72.120]) by mx.google.com with ESMTPS id n37sm22034917wag.24.2008.04.17.06.26.25 (version=SSLv3 cipher=OTHER); Thu, 17 Apr 2008 06:26:25 -0700 (PDT) Date: Thu, 17 Apr 2008 03:26:42 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Attilio Rao In-Reply-To: <3bbf2fe10804170459j4933ed09ubfd22035ff27d5d6@mail.gmail.com> Message-ID: <20080417032638.T983@desktop> References: <20080412132457.W43186@desktop> <200804160930.27981.jhb@freebsd.org> <20080416165553.U959@desktop> <3bbf2fe10804170459j4933ed09ubfd22035ff27d5d6@mail.gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-arch@freebsd.org Subject: Re: synchronization primitive size (was f_offset) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2008 13:26:28 -0000 On Thu, 17 Apr 2008, Attilio Rao wrote: > 2008/4/17, Jeff Roberson : >> >> On Wed, 16 Apr 2008, John Baldwin wrote: >> >> >>> On Saturday 12 April 2008 07:51:15 pm Jeff Roberson wrote: >>> >>>> The sx lock will nearly double the size of struct file. Although it's >>>> lost some weight in 8.0 that is quite unfortunate. However, the method >> of >>>> using LOCKED & WAITING flags, msleep and a mutex has ruined performance >> in >>>> too many cases to continue using it. >>>> >>> >>> You could use a pool of sx locks and hash the file pointer to get an >> offset >>> (ala the mtx pools) to avoid bloating struct file if desired. >>> >> >> This would not be a good idea since the sx is held over actual io. Any >> collisions would mean blocking unrelated files/vnodes. >> >> I think we should use the right synchronization primitive for the job. If >> people are upset at how much space overhead that adds we need to rethink how >> big our synchronization primitives are. 
>> >> If we move the *_recurse fields into lock object they can buddy up with >> lo_flags potentially saving 8 bytes of padding waste on 64bit architectures. > > It is not possible. > The lower 16 bits of lo_flags are currently used for special (external) > flags by sx and lockmgr, and they are intended to be > 'reserved'. I meant that they should be placed adjacent to each other so that neither one of them requires padding on 64bit machines, not that the flags and recurse should be within the same member. > > Btw, I worked on this issue over the weekend. > I split the WITNESS-only part out of lock_object (moving the lo_type and > lod_* fields there), embedded the recursion counter in the lockmgr > structure, and statically sized lo_flags and lo_recurse to 32 bits. For > the !WITNESS case the size of lock_object was 2/3 and for locking > primitives 1/2. It requires, however, ABI breakage for WITNESS, which > is something we really don't want. What do you think about WITNESS operating as LOCK_PROFILING does now? Each lock would be looked up in a table to get the required information rather than embedding it in the data structure. This would solve the ABI issue. > > Thanks, > Attilio > > > -- > Peace can only be achieved by understanding - A. 
Einstein > From owner-freebsd-arch@FreeBSD.ORG Thu Apr 17 13:44:56 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D0AE1106567D; Thu, 17 Apr 2008 13:44:56 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id C2CB58FC27; Thu, 17 Apr 2008 13:44:56 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from zion.baldwin.cx (unknown [208.65.89.154]) by elvis.mu.org (Postfix) with ESMTP id 2E1AC1A4D8B; Thu, 17 Apr 2008 06:44:56 -0700 (PDT) From: John Baldwin To: Pawel Jakub Dawidek Date: Thu, 17 Apr 2008 09:33:45 -0400 User-Agent: KMail/1.9.7 References: <20071218092222.GA9695@freebsd.org> <200804161014.41025.jhb@freebsd.org> <20080416165612.GA31094@garage.freebsd.pl> In-Reply-To: <20080416165612.GA31094@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200804170933.45477.jhb@freebsd.org> Cc: kib@freebsd.org, Roman Divacky , rwatson@freebsd.org, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2008 13:44:56 -0000 On Wednesday 16 April 2008 12:56:12 pm Pawel Jakub Dawidek wrote: > On Wed, Apr 16, 2008 at 10:14:40AM -0400, John Baldwin wrote: > > On Saturday 12 April 2008 07:20:19 am Pawel Jakub Dawidek wrote: > > > From what you write John, #1 is a better choice than #2. If you want to > > > avoid races, you can pass already locked vnode. In case of file > > > descriptors, if p_fd is not locked another thread can close and open > > > different directory under the same descriptor number. 
> > > > Did you read Robert's paper? Do you not realize that the kernel copying > > data in from userland multiple times and having it change in between is > > very bug prone? > > Believe me I'm fully aware of the problems Robert described in his > paper. With vnode approach where do you have more data copying between > kernel and userland? Only because it was explicitly mentioned in the original e-mail: > CONs of #1 > > o some very small code duplication (the translation is done in every > kern_fooat() function) > o there is a race between the name translation and the actual use of the result > of the translation that needs to be handled, the "path_to_file" string is copied > to the kernel space twice hence a race -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Thu Apr 17 14:24:44 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B6B67106564A; Thu, 17 Apr 2008 14:24:44 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (chello087206046210.chello.pl [87.206.46.210]) by mx1.freebsd.org (Postfix) with ESMTP id F41C38FC17; Thu, 17 Apr 2008 14:24:43 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 6F4FA45CA6; Thu, 17 Apr 2008 16:24:41 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id B17494569A; Thu, 17 Apr 2008 16:24:36 +0200 (CEST) Date: Thu, 17 Apr 2008 16:24:18 +0200 From: Pawel Jakub Dawidek To: John Baldwin Message-ID: <20080417142418.GA35655@garage.freebsd.pl> References: <20071218092222.GA9695@freebsd.org> <200804161014.41025.jhb@freebsd.org> <20080416165612.GA31094@garage.freebsd.pl> <200804170933.45477.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; 
micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="C7zPtVaVf+AK4Oqc" Content-Disposition: inline In-Reply-To: <200804170933.45477.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 8.0-CURRENT i386 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: kib@freebsd.org, Roman Divacky , rwatson@freebsd.org, freebsd-arch@freebsd.org Subject: Re: final decision about *at syscalls X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Apr 2008 14:24:44 -0000 --C7zPtVaVf+AK4Oqc Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Apr 17, 2008 at 09:33:45AM -0400, John Baldwin wrote: > On Wednesday 16 April 2008 12:56:12 pm Pawel Jakub Dawidek wrote: > > On Wed, Apr 16, 2008 at 10:14:40AM -0400, John Baldwin wrote: > > > On Saturday 12 April 2008 07:20:19 am Pawel Jakub Dawidek wrote: > > > > From what you write John, #1 is a better choice than #2. If you want to > > > > avoid races, you can pass already locked vnode. In case of file > > > > descriptors, if p_fd is not locked another thread can close and open > > > > different directory under the same descriptor number. > > > > > > Did you read Robert's paper? Do you not realize that the kernel copying > > > data in from userland multiple times and having it change in between is > > > very bug prone? > > > > Believe me I'm fully aware of the problems Robert described in his > > paper. With vnode approach where do you have more data copying between > > kernel and userland? 
> > Only because it was explicitly mentioned in the original e-mail: > > > CONs of #1 > > > > o some very small code duplication (the translation is done in every > > kern_fooat() function) > > o there is a race between the name translation and the actual use of the result > > of the translation that needs to be handled, the "path_to_file" string is copied > > to the kernel space twice hence a race Ah, ok. I don't think this is the method's flaw, but rather an implementation problem. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! --C7zPtVaVf+AK4Oqc Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.4 (FreeBSD) iD8DBQFIB12RForvXbEpPzQRAqinAJ46IU96CpA9sdGmUJ/qSn0VrLx1xwCfQsHm adoLOLOzTE6q09mnHyxQRIU= =v9UV -----END PGP SIGNATURE----- --C7zPtVaVf+AK4Oqc-- From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 13:48:20 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 82E05106566B for ; Fri, 18 Apr 2008 13:48:20 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from postfix1-g20.free.fr (postfix1-g20.free.fr [212.27.60.42]) by mx1.freebsd.org (Postfix) with ESMTP id 19A298FC14 for ; Fri, 18 Apr 2008 13:48:20 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35]) by postfix1-g20.free.fr (Postfix) with ESMTP id 923A72516F33 for ; Fri, 18 Apr 2008 15:30:03 +0200 (CEST) Received: from smtp5-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp5-g19.free.fr (Postfix) with ESMTP id 06FDB3F6401 for ; Fri, 18 Apr 2008 15:30:02 +0200 (CEST) Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98]) by smtp5-g19.free.fr (Postfix) with ESMTP id E4AB83F62F4 for ; Fri, 18 Apr 2008 15:30:01 +0200 (CEST) Received: from 
obiwan.tataz.chchile.org (unknown [192.168.1.25]) by tatooine.tataz.chchile.org (Postfix) with ESMTP id B2BEE9BF12 for ; Fri, 18 Apr 2008 13:27:49 +0000 (UTC) Received: by obiwan.tataz.chchile.org (Postfix, from userid 1000) id A4F97405B; Fri, 18 Apr 2008 15:27:49 +0200 (CEST) Date: Fri, 18 Apr 2008 15:27:49 +0200 From: Jeremie Le Hen To: freebsd-arch@FreeBSD.org Message-ID: <20080418132749.GB4840@obiwan.tataz.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.15 (2007-04-06) Cc: Subject: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 13:48:20 -0000 Hi, As you may already know I've integrated GCC's ProPolice into FreeBSD. The build infrastructure overlord, namely ru@, (I'm quoting kan@) has reviewed the patch and technically it is ready to hit the CVS tree. A few things should be discussed beforehand though. First, should we build world and/or kernel with SSP by default? I ran a quick, trivial benchmark back in 2006: timing buildworld with and without SSP. You can find the results on my webpage: http://tataz.chchile.org/~tataz/FreeSBD/SSP/#section1 Also, the original ProPolice author carried out a thorough performance comparison with and without SSP, and the overhead is really small: http://www.trl.ibm.com/projects/security/ssp/node5.html I would like to reach a consensus on whether SSP should be opt-in or opt-out on FreeBSD. Another concern that Robert Watson raised back in 2006 [1] when I brought forward my patch was the compatibility between pre-SSP and post-SSP binaries/libraries. I'll try to make it simple and short. SSP requires two additional symbols that are kindly provided by libc. Any binary or library compiled with SSP will require them. 
As long as your libc contains the symbols, you can smoothly run pre-SSP applications with post-SSP libs as well as the other way around. Also Kris explained [2] that once applied, it is painful to try to revert the change (removing SSP symbols from libc). This is true but once the patch gets committed, it should hopefully never happen. [1] http://lists.freebsd.org/pipermail/freebsd-security/2006-May/003751.html [2] http://lists.freebsd.org/pipermail/freebsd-security/2006-May/003752.html Thank you. Best regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 15:03:58 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C3FD1065671 for ; Fri, 18 Apr 2008 15:03:58 +0000 (UTC) (envelope-from antoine.brodin.freebsd@gmail.com) Received: from py-out-1112.google.com (py-out-1112.google.com [64.233.166.181]) by mx1.freebsd.org (Postfix) with ESMTP id 3C6118FC14 for ; Fri, 18 Apr 2008 15:03:57 +0000 (UTC) (envelope-from antoine.brodin.freebsd@gmail.com) Received: by py-out-1112.google.com with SMTP id u52so803450pyb.10 for ; Fri, 18 Apr 2008 08:03:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; bh=pbB5ZtIP17sSj7rMo0fPsG4rs9FtPrutgLZ6qbCNNt8=; b=sbxyT/3LsRxXwoiqnxW51Ekwz/h2ia2XRl5bh1fu6ch9Rz4GZzdoVUs7Zr7ZQbc6aJQZNfBFhqfU5ifyTXc4i1prV/4tXEY6nni+2Pn0Mwxyqz5bruVq1dHqCHHQpxuXRjxhRYKSXXpMqpRXFDOmuCCWg+t/FGwejB/bicO02h0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; 
b=SZ1IiMuA2eBNrco5lX0H2MDk2D5HYjAAixE+rMJ3+WPUSeBuV3CwZ3/8jlUTugV+uUkmHvKdf10OEkXnnXR9c9crwrqKz13HE7Alj33ZxKSt5EPY2/YODr7GCx9QDEE3f8sziNqYYPFrH50iErBW6oxKUU74gZPZooTc6ihS7wY= Received: by 10.35.71.17 with SMTP id y17mr4917310pyk.44.1208529426460; Fri, 18 Apr 2008 07:37:06 -0700 (PDT) Received: by 10.35.38.6 with HTTP; Fri, 18 Apr 2008 07:37:06 -0700 (PDT) Message-ID: Date: Fri, 18 Apr 2008 16:37:06 +0200 From: "Antoine Brodin" Sender: antoine.brodin.freebsd@gmail.com To: "Jeremie Le Hen" In-Reply-To: <20080418132749.GB4840@obiwan.tataz.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20080418132749.GB4840@obiwan.tataz.chchile.org> X-Google-Sender-Auth: ee68c5a427e3fbef Cc: freebsd-arch@freebsd.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 15:03:58 -0000 On Fri, Apr 18, 2008 at 3:27 PM, Jeremie Le Hen wrote: > Hi, > > As you may already know I've integrated GCC's ProPolice into FreeBSD. > The build infrastructure overlord, namely ru@, (I'm quoting kan@) has > reviewed the patch and technically it is ready to hit the CVS tree. > > A few things should be discussed beforehand though. > > First, should we build world and/or kernel with SSP by default? I've > scamped a trivial benchmark back in 2006: timing buildworld with and > without SSP. You can found the result on my webpage: > http://tataz.chchile.org/~tataz/FreeSBD/SSP/#section1 > Also, the original ProPolice author achieved a thorough performance > comparison with and without SSP, and the overhead is really small: > http://www.trl.ibm.com/projects/security/ssp/node5.html > I would like to reach a consensus on whether SSP should be opt-in or > opt-out on FreeBSD. 
> > > Another concern that Robert Watson showed back in 2006 [1] when I brought > forward my patch was the compatibility between pre-SSP and post-SSP > binaries/libraries. > > I'll try to make it simple and short. SSP requires two additional > symbols that are kindly provided by libc. Any binary or library > compiled with SSP will require them. As long as your libc contains the > symbols, you can smoothly run pre-SSP applications with post-SSP libs as > well as the other way around. > > Also Kris explained [2] that once applied, it is painful to try to > revert the change (removing SSP symbols from libc). This is true but > once the patch gets committed, it should hopefully never happen. > > [1] http://lists.freebsd.org/pipermail/freebsd-security/2006-May/003751.html > [2] http://lists.freebsd.org/pipermail/freebsd-security/2006-May/003752.html Last time I looked at your patch, there was a problem when using -fstack-protector-all instead of -fstack-protector: when you compile lib/csu/*, gnu/lib/csu/*, or src/lib/libc/sys/stack_protector.c with this flag, there is a kind of chicken/egg problem and you end up with an unusable world. That said, it would be great to be able to compile world with SSP when an option is set in src.conf. 
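One plausible reading of the chicken/egg problem, offered here as an assumption rather than something the patch or this thread confirms: with -fstack-protector-all the routine that initialises the guard is itself instrumented, so its prologue saves the old guard value and its epilogue compares that copy against the freshly installed one. The C sketch below only simulates that sequence; sim_guard and the two sim_guard_setup_* functions are hypothetical names, not code from libc or the csu objects.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the process-wide guard word; zero until setup has run. */
static uintptr_t sim_guard = 0;

/*
 * Hypothetical guard setup, modelled as if the compiler had instrumented it.
 * Returns 1 if its own epilogue check would pass, 0 if it would abort.
 */
static int
sim_guard_setup_instrumented(uintptr_t fresh_value)
{
	uintptr_t saved;

	/* A zero value would not change the guard and hide the effect. */
	assert(fresh_value != 0);

	saved = sim_guard;		/* prologue: copy the current guard into the frame */
	sim_guard = fresh_value;	/* body: install the real guard value */
	return (saved == sim_guard);	/* epilogue: frame copy versus current guard */
}

/* The same setup modelled without instrumentation: nothing to compare. */
static int
sim_guard_setup_plain(uintptr_t fresh_value)
{
	assert(fresh_value != 0);
	sim_guard = fresh_value;
	return (1);
}
```

In this model the instrumented variant always fails its own check, because the guard changes while the frame is live, whereas the plain variant cannot fail. If that is indeed the cause, building the guard setup code and the startup objects without the protector flag would sidestep it.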
Cheers, Antoine From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 15:52:48 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3292D106566B for ; Fri, 18 Apr 2008 15:52:48 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from smtpoutm.mac.com (smtpoutm.mac.com [17.148.16.72]) by mx1.freebsd.org (Postfix) with ESMTP id 1946B8FC14 for ; Fri, 18 Apr 2008 15:52:47 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from mac.com (asmtp006-s [10.150.69.69]) by smtpoutm.mac.com (Xserve/smtpout009/MantshX 4.0) with ESMTP id m3IFqlEN016616; Fri, 18 Apr 2008 08:52:47 -0700 (PDT) Received: from macbook-pro.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mac.com (Xserve/asmtp006/MantshX 4.0) with ESMTP id m3IFqiCO001875 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Fri, 18 Apr 2008 08:52:45 -0700 (PDT) Message-Id: From: Marcel Moolenaar To: Jeremie Le Hen In-Reply-To: <20080418132749.GB4840@obiwan.tataz.chchile.org> Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Fri, 18 Apr 2008 08:52:42 -0700 References: <20080418132749.GB4840@obiwan.tataz.chchile.org> X-Mailer: Apple Mail (2.919.2) Cc: freebsd-arch@FreeBSD.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 15:52:48 -0000 On Apr 18, 2008, at 6:27 AM, Jeremie Le Hen wrote: > Hi, > > As you may already know I've integrated GCC's ProPolice into FreeBSD. > The build infrastructure overlord, namely ru@, (I'm quoting kan@) has > reviewed the patch and technically it is ready to hit the CVS tree. 
> > A few things should be discussed beforehand though. > > First, should we build world and/or kernel with SSP by default? Really, first is: what platforms does this apply to and/or have you tested this on? > I would like to reach a consensus on whether SSP should be opt-in or > opt-out on FreeBSD. That depends: what's the benefit of ProPolice on ia64? Also: please provide references to ProPolice. -- Marcel Moolenaar xcllnt@mac.com From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 16:47:32 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A037B106564A for ; Fri, 18 Apr 2008 16:47:32 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35]) by mx1.freebsd.org (Postfix) with ESMTP id 5E7F58FC13 for ; Fri, 18 Apr 2008 16:47:31 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from smtp5-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp5-g19.free.fr (Postfix) with ESMTP id DAADD3F8A51; Fri, 18 Apr 2008 18:47:30 +0200 (CEST) Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98]) by smtp5-g19.free.fr (Postfix) with ESMTP id 1BDCE3F768D; Fri, 18 Apr 2008 18:34:35 +0200 (CEST) Received: from obiwan.tataz.chchile.org (unknown [192.168.1.25]) by tatooine.tataz.chchile.org (Postfix) with ESMTP id 5BE899BF12; Fri, 18 Apr 2008 16:32:22 +0000 (UTC) Received: by obiwan.tataz.chchile.org (Postfix, from userid 1000) id 4D997405B; Fri, 18 Apr 2008 18:32:22 +0200 (CEST) Date: Fri, 18 Apr 2008 18:32:22 +0200 From: Jeremie Le Hen To: Antoine Brodin Message-ID: <20080418163222.GC4840@obiwan.tataz.chchile.org> References: <20080418132749.GB4840@obiwan.tataz.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.15 (2007-04-06) Cc: freebsd-arch@freebsd.org Subject: Re: Integration of 
ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 16:47:32 -0000 Hi Antoine, On Fri, Apr 18, 2008 at 04:37:06PM +0200, Antoine Brodin wrote: > Last time I looked at your patch, there was a problem when using > -fstack-protector-all instead of -fstack-protector: > when you compile lib/csu/*, gnu/lib/csu/*, or > src/lib/libc/sys/stack_protector.c with this flag, there is a kind of > chicken/egg problem and you end up with an unusable world. > That said, it would be great to be able to compile world with SSP when > an option is set in src.conf. I'm kind of surprised by this statement. I will give it a shot this evening. I'll let you know. Thanks. Regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 17:28:57 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 379981065677 for ; Fri, 18 Apr 2008 17:28:57 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35]) by mx1.freebsd.org (Postfix) with ESMTP id C5D4D8FC25 for ; Fri, 18 Apr 2008 17:28:56 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from smtp5-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp5-g19.free.fr (Postfix) with ESMTP id C0C003F7C32; Fri, 18 Apr 2008 19:28:53 +0200 (CEST) Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98]) by smtp5-g19.free.fr (Postfix) with ESMTP id 579E03F952E; Fri, 18 Apr 2008 19:01:12 +0200 (CEST) Received: from obiwan.tataz.chchile.org (unknown [192.168.1.25]) by tatooine.tataz.chchile.org (Postfix) with ESMTP id 136D19BF12; Fri, 18 Apr 2008 16:59:00 +0000 (UTC) 
Received: by obiwan.tataz.chchile.org (Postfix, from userid 1000) id F276C405B; Fri, 18 Apr 2008 18:58:59 +0200 (CEST) Date: Fri, 18 Apr 2008 18:58:59 +0200 From: Jeremie Le Hen To: Marcel Moolenaar Message-ID: <20080418165859.GD4840@obiwan.tataz.chchile.org> References: <20080418132749.GB4840@obiwan.tataz.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.15 (2007-04-06) Cc: freebsd-arch@FreeBSD.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 17:28:57 -0000 Hi Marcel, On Fri, Apr 18, 2008 at 08:52:42AM -0700, Marcel Moolenaar wrote: > > The build infrastructure overlord, namely ru@, (I'm quoting kan@) has > > reviewed the patch and technically it is ready to hit the CVS tree. > > > > A few things should be discussed beforehand though. > > > > First, should we build world and/or kernel with SSP by default? > > Really, first is: what platforms does this apply to and/or have > you tested this on? The patch enables SSP for all archs. Unfortunately I've not been able to test it myself on any arch other than i386, but two years ago I got positive feedback from Pascal Hofstee on amd64. ISTR there was a sparc64 user too, but I'm not sure. This should theoretically work on all archs because, from what I've read, ProPolice operates at the intermediate representation level of the compiler. This should therefore be architecture agnostic. > > I would like to reach a consensus on whether SSP should be opt-in or > > opt-out on FreeBSD. > > That depends: what's the benefit of ProPolice on ia64? > > Also: please provide references to ProPolice. I think the original author's website will explain things better than me :-). 
http://www.trl.ibm.com/projects/security/ssp/ Basically, a "canary" is randomly chosen when the program starts (this part lives in libc). GCC inserts code in the prologue and epilogue of every function that contains a buffer of 8 or more bytes. In the prologue, the canary is pushed on the stack right after the return address has been pushed, and this value is then checked in the function epilogue. If the value on the stack has changed, there has been a buffer overflow. ProPolice was originally a patch against gcc2 and gcc3, but it has been integrated into GCC 4.1 IIRC. I hope this answers your concerns. Best regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 17:48:31 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 397AB106566C for ; Fri, 18 Apr 2008 17:48:31 +0000 (UTC) (envelope-from max@love2party.net) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.183]) by mx1.freebsd.org (Postfix) with ESMTP id C6FC48FC2B for ; Fri, 18 Apr 2008 17:48:30 +0000 (UTC) (envelope-from max@love2party.net) Received: from max41 (port-92-194-21-71.dynamic.qsc.de [92.194.21.71]) by mrelayeu.kundenserver.de (node=mrelayeu8) with ESMTP (Nemesis) id 0ML31I-1Jmuh9124D-0003Gf; Fri, 18 Apr 2008 19:48:28 +0200 From: Max Laier To: freebsd-arch@freebsd.org Date: Fri, 18 Apr 2008 19:45:58 +0200 User-Agent: KMail/1.9.7 References: <20080418132749.GB4840@obiwan.tataz.chchile.org> In-Reply-To: <20080418132749.GB4840@obiwan.tataz.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200804181945.59189.max@love2party.net> X-Provags-ID: V01U2FsdGVkX1+mujMtEAPaGjNwJ9kQ4MjDbV6+mqJQG1a5V88 Ngd4vMWMgI3obsNeovTskKDAGRdjmIOre8IVAYLw9FQZqgT4zR a+o9kL6A4xNGHmUDmEtNQ== Cc: Jeremie Le Hen 
Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 17:48:31 -0000 On Friday 18 April 2008 15:27:49 Jeremie Le Hen wrote: > Hi, > > As you may already know I've integrated GCC's ProPolice into FreeBSD. > The build infrastructure overlord, namely ru@, (I'm quoting kan@) has > reviewed the patch and technically it is ready to hit the CVS tree. > > A few things should be discussed beforehand though. > > First, should we build world and/or kernel with SSP by default? I've > scamped a trivial benchmark back in 2006: timing buildworld with and > without SSP. You can found the result on my webpage: > http://tataz.chchile.org/~tataz/FreeSBD/SSP/#section1 404 :-\ > Also, the original ProPolice author achieved a thorough performance > comparison with and without SSP, and the overhead is really small: > http://www.trl.ibm.com/projects/security/ssp/node5.html > I would like to reach a consensus on whether SSP should be opt-in or > opt-out on FreeBSD. > > > Another concern that Robert Watson showed back in 2006 [1] when I brought > forward my patch was the compatibility between pre-SSP and post-SSP > binaries/libraries. > > I'll try to make it simple and short. SSP requires two additional > symbols that are kindly provided by libc. Any binary or library > compiled with SSP will require them. As long as your libc contains the > symbols, you can smoothly run pre-SSP applications with post-SSP libs as > well as the other way around. > > Also Kris explained [2] that once applied, it is painful to try to > revert the change (removing SSP symbols from libc). This is true but > once the patch gets committed, it should hopefully never happen. 
So I'd suggest something along the lines of: 1) Add the needed support symbols to libc (they don't hurt anyone, right?) 2) Add support to build kernel/world with SSP enabled - default OFF. 3) Solicit testing! 4) After some time has passed (and people have had to reinstall libc anyways) and enough feedback has been received, flip the switch to default ON. In light of the recent "let's save stack space in the kernel" discussion, I'd like to point out that SSP adds one word to every call. Not much, but still. Finally, what happens if SSP triggers in the kernel? Do we get a useable panic message? Can we get a kdb_traceback() (if compiled in)? Where is the patch, btw? -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 18:46:36 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A9AD1065670 for ; Fri, 18 Apr 2008 18:46:36 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from smtpoutm.mac.com (smtpoutm.mac.com [17.148.16.70]) by mx1.freebsd.org (Postfix) with ESMTP id 708B08FC1E for ; Fri, 18 Apr 2008 18:46:31 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from mac.com (asmtp005-s [10.150.69.68]) by smtpoutm.mac.com (Xserve/smtpout007/MantshX 4.0) with ESMTP id m3IIkVQe001694; Fri, 18 Apr 2008 11:46:31 -0700 (PDT) Received: from macbook-pro.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mac.com (Xserve/asmtp005/MantshX 4.0) with ESMTP id m3IIkJaT000828 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Fri, 18 Apr 2008 11:46:22 -0700 (PDT) Message-Id: From: Marcel Moolenaar To: Jeremie Le Hen In-Reply-To: <20080418165859.GD4840@obiwan.tataz.chchile.org> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes 
Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Fri, 18 Apr 2008 11:46:18 -0700 References: <20080418132749.GB4840@obiwan.tataz.chchile.org> <20080418165859.GD4840@obiwan.tataz.chchile.org> X-Mailer: Apple Mail (2.919.2) Cc: freebsd-arch@FreeBSD.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 18:46:36 -0000 On Apr 18, 2008, at 9:58 AM, Jeremie Le Hen wrote: > This should theorically work for all arch as, from what I've read, > ProPolice takes place at the intermediate representation level of the > compiler. This should therefore be architecture agnostic. The question is whether it will actually make a difference on ia64? The stack does not contain any of the "objects" that ProPolice tries to protect from "stack-smashing" attacks, so what good is the added overhead? > Basically, a "canary" is randomly chosen when the program starts (this > part lives in libc). GCC inserts code in prologue and epilogue of all > functions that contains a buffer of 8 or more bytes. In the prologue, > the canary is pushed on the stack right after the return valued has > been > pushed, and this value is then checked in function epilogue. If the > value in the stack has changed, there has been a buffer overflow The ia64 architecture has been designed to eliminate use of the stack as much as possible for performance reasons. ProPolice does add significant overhead for no good reason AFAICT. So, let's assume at this time that ia64 is out and that an opt-out is reasonable (given that ia64 is expected to be the only one that doesn't need it). 
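To make the canary mechanism quoted above concrete, here is a minimal C sketch that simulates the check on a hand-built frame. It is not GCC's generated code: struct sim_frame and the sim_* functions are hypothetical names, and a real frame layout is chosen by the compiler. In GCC's SSP the guard word and the failure handler are the libc symbols __stack_chk_guard and __stack_chk_fail; here a plain return value stands in for the abort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of one stack frame, lowest address first.
 * It only mimics the ordering "buffer, canary, return address".
 */
struct sim_frame {
	char		buf[8];		/* local buffer that may be overrun */
	uintptr_t	canary;		/* copy of the guard, placed above the buffer */
	uintptr_t	ret_addr;	/* what an attacker ultimately wants to change */
};

/* Prologue: record the return address and a copy of the guard. */
static void
sim_prologue(struct sim_frame *f, uintptr_t guard, uintptr_t ret_addr)
{
	memset(f->buf, 0, sizeof(f->buf));
	f->canary = guard;
	f->ret_addr = ret_addr;
}

/*
 * Unchecked copy starting at buf, as a careless strcpy() caller would do.
 * The length is capped at the frame size so the simulation itself stays
 * within defined behaviour.
 */
static void
sim_unchecked_copy(struct sim_frame *f, const void *src, size_t len)
{
	assert(len <= sizeof(*f));
	memcpy(f, src, len);
}

/* Epilogue: 1 if the canary still matches the guard, 0 if it was clobbered. */
static int
sim_epilogue(const struct sim_frame *f, uintptr_t guard)
{
	return (f->canary == guard);
}
```

A copy that fits in buf leaves sim_epilogue() returning 1. A 12-byte copy spills past buf into the canary, so the check returns 0 as long as the guard differs from the bytes written, which is the point where real SSP code would call the failure handler instead of returning.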
-- Marcel Moolenaar xcllnt@mac.com From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 19:21:02 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F5591065671 for ; Fri, 18 Apr 2008 19:21:02 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from smtpoutm.mac.com (smtpoutm.mac.com [17.148.16.77]) by mx1.freebsd.org (Postfix) with ESMTP id 75EDB8FC18 for ; Fri, 18 Apr 2008 19:21:02 +0000 (UTC) (envelope-from xcllnt@mac.com) Received: from mac.com (asmtp007-s [10.150.69.70]) by smtpoutm.mac.com (Xserve/smtpout014/MantshX 4.0) with ESMTP id m3IJL2CC027102; Fri, 18 Apr 2008 12:21:02 -0700 (PDT) Received: from macbook-pro.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mac.com (Xserve/asmtp007/MantshX 4.0) with ESMTP id m3IJKkoZ016377 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Fri, 18 Apr 2008 12:20:59 -0700 (PDT) Message-Id: <4D7941ED-03BA-4F3B-8590-65EA8142EC00@mac.com> From: Marcel Moolenaar To: Max Laier In-Reply-To: <200804181945.59189.max@love2party.net> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v919.2) Date: Fri, 18 Apr 2008 12:20:45 -0700 References: <20080418132749.GB4840@obiwan.tataz.chchile.org> <200804181945.59189.max@love2party.net> X-Mailer: Apple Mail (2.919.2) Cc: Jeremie Le Hen , freebsd-arch@freebsd.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Apr 2008 19:21:02 -0000 On Apr 18, 2008, at 10:45 AM, Max Laier wrote: > On Friday 18 April 2008 15:27:49 Jeremie Le Hen wrote: >> Hi, >> >> As you may already know I've integrated GCC's ProPolice into FreeBSD. 
>> The build infrastructure overlord, namely ru@, (I'm quoting kan@) has >> reviewed the patch and technically it is ready to hit the CVS tree. >> >> A few things should be discussed beforehand though. >> >> First, should we build world and/or kernel with SSP by default? I've >> scamped a trivial benchmark back in 2006: timing buildworld with and >> without SSP. You can found the result on my webpage: >> http://tataz.chchile.org/~tataz/FreeSBD/SSP/#section1 > > 404 :-\ > >> Also, the original ProPolice author achieved a thorough performance >> comparison with and without SSP, and the overhead is really small: >> http://www.trl.ibm.com/projects/security/ssp/node5.html >> I would like to reach a consensus on whether SSP should be opt-in or >> opt-out on FreeBSD. >> >> >> Another concern that Robert Watson showed back in 2006 [1] when I >> brought >> forward my patch was the compatibility between pre-SSP and post-SSP >> binaries/libraries. >> >> I'll try to make it simple and short. SSP requires two additional >> symbols that are kindly provided by libc. Any binary or library >> compiled with SSP will require them. As long as your libc contains >> the >> symbols, you can smoothly run pre-SSP applications with post-SSP >> libs as >> well as the other way around. >> >> Also Kris explained [2] that once applied, it is painful to try to >> revert the change (removing SSP symbols from libc). This is true but >> once the patch gets committed, it should hopefully never happen. > > So I'd suggest something along the lines of: > > 1) Add the needed support symbols to libc (they don't hurt anyone, > right?) autoconf? With tools like autoconf, I'm much less inclined to say that some unused symbol, library, header or whatever is harmless. 
I've turned into an "if we don't use it, don't add/keep it" person :-)

--
Marcel Moolenaar
xcllnt@mac.com

From owner-freebsd-arch@FreeBSD.ORG Fri Apr 18 23:36:07 2008
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id A5227106564A
    for ; Fri, 18 Apr 2008 23:36:07 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35])
    by mx1.freebsd.org (Postfix) with ESMTP id 40F118FC16
    for ; Fri, 18 Apr 2008 23:36:07 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from smtp5-g19.free.fr (localhost.localdomain [127.0.0.1])
    by smtp5-g19.free.fr (Postfix) with ESMTP id 6B33C381A8B;
    Sat, 19 Apr 2008 00:13:41 +0200 (CEST)
Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98])
    by smtp5-g19.free.fr (Postfix) with ESMTP id 5947839A1A1;
    Fri, 18 Apr 2008 22:49:51 +0200 (CEST)
Received: from obiwan.tataz.chchile.org (unknown [192.168.1.25])
    by tatooine.tataz.chchile.org (Postfix) with ESMTP id 4AA599BF12;
    Fri, 18 Apr 2008 20:47:38 +0000 (UTC)
Received: by obiwan.tataz.chchile.org (Postfix, from userid 1000)
    id 3C67E405B; Fri, 18 Apr 2008 22:47:38 +0200 (CEST)
Date: Fri, 18 Apr 2008 22:47:38 +0200
From: Jeremie Le Hen
To: Max Laier
Message-ID: <20080418204738.GE4840@obiwan.tataz.chchile.org>
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <200804181945.59189.max@love2party.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200804181945.59189.max@love2party.net>
User-Agent: Mutt/1.5.15 (2007-04-06)
Cc: freebsd-arch@freebsd.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr
    2008 23:36:07 -0000

On Fri, Apr 18, 2008 at 07:45:58PM +0200, Max Laier wrote:
> > First, should we build world and/or kernel with SSP by default? I've
> > scamped a trivial benchmark back in 2006: timing buildworld with and
> > without SSP. You can found the result on my webpage:
> > http://tataz.chchile.org/~tataz/FreeSBD/SSP/#section1
>
> 404 :-\

Oops, sorry I made a typo.
http://tataz.chchile.org/~tataz/FreeBSD/SSP/#section1

> So I'd suggest something along the lines of:
>
> 1) Add the needed support symbols to libc (they don't hurt anyone, right?)

Actually, they are already in libc :-).
See src/sys/lib/libc/sys/stack_protector.c .

> 2) Add support to build kernel/world with SSP enabled - default OFF.
> 3) Solicit testing!
> 4) After some time has passed (and people have had to reinstall libc anyways)
> and enough feedback has been received flip the switch to default ON.

I will change my patch to make SSP opt-out. This will address Marcel's
concern too.

> In light of the the recent "let's save stack space in the kernel", I'd like to
> point out that SSP adds one word to every call. Not much, but still.

Certainly. I would like to hear opinion from other committers if SSP
should be enabled by default.

> Finally, what happens if SSP triggers in the kernel? Do we get a useable
> panic message? Can we get a kdb_traceback() (if compiled in)? Where is the
> patch, btw?

Yes, the panic message is explicit. But since a stack overflow occurred,
the backtrace may be corrupted. BTW the panic message warns about this.
See src/sys/kern/stack_protector.c in the patch.
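
For anyone who has not looked at how the check works, here is a rough
sketch in plain C of what happens before that panic fires. It is NOT the
code from the patch: the struct, `guard`, `frame_enter()`,
`copy_into_frame()` and `frame_ok()` are hypothetical names made up for
illustration, and the guard is a fixed constant here instead of a random
value. In a real build GCC emits the store and the comparison itself in
each protected function's prologue and epilogue, using the
`__stack_chk_guard` and `__stack_chk_fail` symbols.

```c
#include <assert.h>   /* only needed for the usage example below */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-process guard value (fixed here). */
static const uintptr_t guard = 0x5a5aa5a5u;

/* Models one protected frame: a local buffer with the canary after it. */
struct frame {
    char      buf[8];
    uintptr_t canary;
};

/* "Prologue": clear the buffer and place the guard next to it. */
static void frame_enter(struct frame *f)
{
    memset(f->buf, 0, sizeof(f->buf));
    f->canary = guard;
}

/*
 * A careless copy that does not respect the size of buf.  The write is
 * clamped to the whole struct so the sketch itself stays well defined,
 * but anything longer than 8 bytes runs over into the canary.
 */
static void copy_into_frame(struct frame *f, const char *src, size_t len)
{
    if (len > sizeof(*f))
        len = sizeof(*f);
    memcpy((char *)f, src, len);
}

/* "Epilogue": nonzero if the canary is intact, zero if it was smashed. */
static int frame_ok(const struct frame *f)
{
    return f->canary == guard;
}
```

A 4-byte copy leaves `frame_ok()` true, while a 12-byte copy overwrites
part of the canary and makes it false. That second case is the point at
which the real code calls the failure handler (abort in userland, panic
in the kernel).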

Regards,
--
Jeremie Le Hen
< jeremie at le-hen dot org >< ttz at chchile dot org >

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 00:17:19 2008
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id B6820106566C
    for ; Sat, 19 Apr 2008 00:17:19 +0000 (UTC)
    (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.208.78.105])
    by mx1.freebsd.org (Postfix) with ESMTP id 7E8A58FC0A
    for ; Sat, 19 Apr 2008 00:17:19 +0000 (UTC)
    (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1])
    by troutmask.apl.washington.edu (8.14.2/8.14.2) with ESMTP id m3J0FtR6050153;
    Fri, 18 Apr 2008 17:15:55 -0700 (PDT)
    (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
    by troutmask.apl.washington.edu (8.14.2/8.14.2/Submit) id m3J0Ftuj050152;
    Fri, 18 Apr 2008 17:15:55 -0700 (PDT)
    (envelope-from sgk)
Date: Fri, 18 Apr 2008 17:15:55 -0700
From: Steve Kargl
To: Jeremie Le Hen
Message-ID: <20080419001555.GA50009@troutmask.apl.washington.edu>
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <200804181945.59189.max@love2party.net>
    <20080418204738.GE4840@obiwan.tataz.chchile.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080418204738.GE4840@obiwan.tataz.chchile.org>
User-Agent: Mutt/1.4.2.3i
Cc: Max Laier, freebsd-arch@freebsd.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 00:17:19 -0000

On Fri, Apr 18, 2008 at 10:47:38PM +0200, Jeremie Le Hen wrote:
>
> Certainly. I would like to hear opinion from other committers if SSP
> should be enabled by default.
>

I'm not a committer, but I'll ask a question anyway.

Can you quantify the performance impact, in particular for
numerically intensive codes with heavy use of libm?

--
Steve

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 00:37:29 2008
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 70AB5106564A
    for ; Sat, 19 Apr 2008 00:37:29 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from smtp7.server.rpi.edu (smtp7.server.rpi.edu [128.113.2.227])
    by mx1.freebsd.org (Postfix) with ESMTP id 34F048FC19
    for ; Sat, 19 Apr 2008 00:37:28 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47])
    by smtp7.server.rpi.edu (8.13.1/8.13.1) with ESMTP id m3J0bP00028038;
    Fri, 18 Apr 2008 20:37:27 -0400
Mime-Version: 1.0
In-Reply-To: <20080418204738.GE4840@obiwan.tataz.chchile.org>
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <200804181945.59189.max@love2party.net>
    <20080418204738.GE4840@obiwan.tataz.chchile.org>
Date: Fri, 18 Apr 2008 20:37:24 -0400
To: Jeremie Le Hen, Max Laier
From: Garance A Drosehn
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
X-RPI-SA-Score: undef - spam scanning disabled
X-CanItPRO-Stream: default
X-Canit-Stats-ID: Bayes signature not available
X-Scanned-By: CanIt (www . roaringpenguin .
    com) on 128.113.2.227
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 00:37:29 -0000

At 10:47 PM +0200 4/18/08, Jeremie Le Hen wrote:
>On Fri, Apr 18, 2008, Max Laier wrote:
>
>> 2) Add support to build kernel/world with SSP enabled - default OFF.
>> 3) Solicit testing!
> > 4) After some time has passed (and people have had to reinstall
> > libc anyways), and enough feedback has been received flip the
> > switch to default ON.
>
>I will change my patch to make SSP opt-out. This will address
>Marcel's concern too.

This is a big-enough change that we should ease into it, as Max
suggests. It can be very painful to back out of this, so we don't
want to rush into the change and then find out that we really
really regret it.

> > In light of the the recent "let's save stack space in the kernel",
> > I'd like to point out that SSP adds one word to every call. Not
> > much, but still.
>
>Certainly. I would like to hear opinion from other committers if
>SSP should be enabled by default.

You've probably described this somewhere, but let me ask for a
little more info.

There is "enabled" in the sense that the symbols exist in libc, so
programs *can* be compiled with -fstack-protector-all or
-fstack-protector options. But nothing much really happens until
we start building programs with those options turned on.

Once a program is built with one of those options, then that program
has code in it which will check for stack-smashing in that one program.

So, in my mind there's the step of "enabling SSP", and then there's
the step of "compiling programs with stack-protection on". I think
we could also split the second step into stages:

  a) add stack-protection to all setuid programs in the base system.
  b) add stack-protection to all "/usr/sbin" programs in the base.
  c) add stack-protection to all programs in the base.
  d) compile ports with stack-protection on.

Is that a reasonable breakdown? We could (perhaps) have four switches,
and people could turn on whichever ones they wanted. But as far as the
*default* values, we might want "class A" to default on for 8.0-release,
but the other classes to default off. Then (maybe) add another class
each time we make another release.

--
Garance Alistair Drosehn     =               drosehn@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 00:45:16 2008
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 217731065672
    for ; Sat, 19 Apr 2008 00:45:16 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from smtp7.server.rpi.edu (smtp7.server.rpi.edu [128.113.2.227])
    by mx1.freebsd.org (Postfix) with ESMTP id DCB528FC0A
    for ; Sat, 19 Apr 2008 00:45:15 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47])
    by smtp7.server.rpi.edu (8.13.1/8.13.1) with ESMTP id m3J0jEbw029722;
    Fri, 18 Apr 2008 20:45:14 -0400
Mime-Version: 1.0
In-Reply-To: <20080418165859.GD4840@obiwan.tataz.chchile.org>
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <20080418165859.GD4840@obiwan.tataz.chchile.org>
Date: Fri, 18 Apr 2008 20:45:13 -0400
To: Jeremie Le Hen, Marcel Moolenaar
From: Garance A Drosehn
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
X-RPI-SA-Score: undef - spam scanning disabled
X-CanItPRO-Stream: default
X-Canit-Stats-ID: Bayes signature not available
X-Scanned-By: CanIt (www . roaringpenguin .
    com) on 128.113.2.227
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 00:45:16 -0000

At 6:58 PM +0200 4/18/08, Jeremie Le Hen wrote:
>Hi Marcel,
>
>On Fri, Apr 18, 2008 at 08:52:42AM -0700, Marcel Moolenaar wrote:
>> > The build infrastructure overlord, namely ru@, (I'm quoting kan@) has
>> > reviewed the patch and technically it is ready to hit the CVS tree.
>> >
>> > A few things should be discussed beforehand though.
>> >
>> > First, should we build world and/or kernel with SSP by default?
>>
>> Really, first is: what platforms does this apply to and/or have
>> you tested this on?
>
>The patch enables SSP for all archs. Unfortunately I've not been able
>to test it myself on other arch than i386, but two years ago I've got a
>successful feedback from Pascal Hofstee on amd64. ISTR there was a
>sparc64 user too, but I'm not sure.

I have run it on FreeBD/PowerPC for a short time. Seemed to work fine,
but I had this installed on a set of partitions which have sense been
erased. They were not erased due to problems with Propolice, but just
because I wanted them for some other purpose, and I wasn't really to
commit to always having propolice turned on.

I *think* I also had a build of FreeBSD/sparc64 with this turned on,
but again it was just a test system which has since been overwritten.
(actually I've upgraded to a newer, slightly faster sparc64 machine,
and lost my propolice build while making that upgrade).

The only reason I did the above is because I had a friend who wanted
to move from OpenBSD to FreeBSD, and really wanted Propolice as a
working option on his machines. After I showed that it did work, he
built freebsd with propolice on both i386 and amd64 platforms.
But in his case he screwed up the first time he cvsup'ed those systems,
just because the cvsup clobbered some of the local changes he had
made for propolice. So he ended up deciding it was safer for him to
skip propolice for now, until it's all part of the base FreeBSD system.

--
Garance Alistair Drosehn     =               drosehn@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 01:00:17 2008
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 17A2B106566B
    for ; Sat, 19 Apr 2008 01:00:17 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from smtp8.server.rpi.edu (smtp8.server.rpi.edu [128.113.2.228])
    by mx1.freebsd.org (Postfix) with ESMTP id B51628FC1E
    for ; Sat, 19 Apr 2008 01:00:16 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47])
    by smtp8.server.rpi.edu (8.13.1/8.13.1) with ESMTP id m3J10FiY007821
    for ; Fri, 18 Apr 2008 21:00:15 -0400
Mime-Version: 1.0
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <20080418165859.GD4840@obiwan.tataz.chchile.org>
Date: Fri, 18 Apr 2008 21:00:14 -0400
To: freebsd-arch@FreeBSD.org
From: Garance A Drosehn
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
X-RPI-SA-Score: undef - spam scanning disabled
X-CanItPRO-Stream: default
X-Canit-Stats-ID: Bayes signature not available
X-Scanned-By: CanIt (www . roaringpenguin .
    com) on 128.113.2.228
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 01:00:17 -0000

You might not know it from my emails, but English really is my
first and main language!

At 8:45 PM -0400 4/18/08, Garance A Drosehn wrote:
>
>I have run it on FreeBD/PowerPC for a short time. Seemed to work fine,
>but I had this installed on a set of partitions which have sense been

    "which have *since* been erased..."

>erased. They were not erased due to problems with Propolice, but just
>because I wanted them for some other purpose, and I wasn't really to

    "and I wasn't *ready* to commit..."

>commit to always having propolice turned on.

--
Garance Alistair Drosehn     =               drosehn@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 01:23:31 2008
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 988D0106564A
    for ; Sat, 19 Apr 2008 01:23:31 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from smtp8.server.rpi.edu (smtp8.server.rpi.edu [128.113.2.228])
    by mx1.freebsd.org (Postfix) with ESMTP id 5B7F68FC15
    for ; Sat, 19 Apr 2008 01:23:31 +0000 (UTC)
    (envelope-from gad@FreeBSD.org)
Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47])
    by smtp8.server.rpi.edu (8.13.1/8.13.1) with ESMTP id m3J0MFF1031362;
    Fri, 18 Apr 2008 20:22:15 -0400
Mime-Version: 1.0
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <20080418165859.GD4840@obiwan.tataz.chchile.org>
Date: Fri, 18 Apr 2008 20:22:13 -0400
To: Marcel Moolenaar, Jeremie Le Hen
From: Garance A Drosehn
Content-Type: text/plain;
    charset="us-ascii" ; format="flowed"
X-RPI-SA-Score: undef - spam scanning disabled
X-CanItPRO-Stream: default
X-Canit-Stats-ID: Bayes signature not available
X-Scanned-By: CanIt (www . roaringpenguin . com) on 128.113.2.228
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 01:23:31 -0000

At 11:46 AM -0700 4/18/08, Marcel Moolenaar wrote:
>On Apr 18, 2008, at 9:58 AM, Jeremie Le Hen wrote:
>>This should theorically work for all arch as, from what I've read,
>>ProPolice takes place at the intermediate representation level of the
>>compiler. This should therefore be architecture agnostic.
>
>The question is whether it will actually make a difference on ia64?
>
>The stack does not contain any of the "objects" that ProPolice
>tries to protect from "stack-smashing" attacks, so what good is the
>added overhead?

On ia64 we have a large set of userland programs running C code. We
run the same C code there which we run on all other architectures.

ProPolice will take a certain class of *actual* bugs in that C code,
and turn those into fatal bugs on the platforms where ProPolice does
work. By making those bugs much more visible on our high-volume
platforms, it will also greatly increase the chance that someone will
take the time to find and fix the *actual* bug. The bug in C. The bug
in C code which we are running on ia64.

Even if Propolice could never be made to work on ia64, the presence
of it on other hardware platforms will benefit users on ia64.

>>Basically, a "canary" is randomly chosen when the program starts (this
>>part lives in libc). GCC inserts code in prologue and epilogue of all
>>functions that contains a buffer of 8 or more bytes. In the prologue,
>>the canary is pushed on the stack right after the return valued has been
>>pushed, and this value is then checked in function epilogue. If the
>>value in the stack has changed, there has been a buffer overflow
>
>The ia64 architecture has been designed to eliminate use of the
>stack as much as possible for performance reasons. ProPolice does
>add significant overhead for no good reason AFAICT.

We can certainly have a different default for propolice/SSP support
on FreeBSD/ia64 than we default to for other architectures. That is
a very reasonable idea.

I, for one, am very interested in Propolice support in FreeBSD, at
least as an easy-to-set option. By that I mean: I don't mind what
the default is, just as long as there is an easy and safe way to
specify that you want propolice support at buildworld time. Right
now we're in a situation where someone can specify it by making a
few updates, but then that person is *really* screwed if they lose
the updates by mistake.

--
Garance Alistair Drosehn     =               drosehn@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 07:14:13 2008
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 156B4106564A
    for ; Sat, 19 Apr 2008 07:14:13 +0000 (UTC)
    (envelope-from peterjeremy@optushome.com.au)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189])
    by mx1.freebsd.org (Postfix) with ESMTP id 9DE4D8FC12
    for ; Sat, 19 Apr 2008 07:14:12 +0000 (UTC)
    (envelope-from peterjeremy@optushome.com.au)
Received: from server.vk2pj.dyndns.org (c220-239-20-82.belrs4.nsw.optusnet.com.au [220.239.20.82])
    by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id m3J7E5w0008144
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
    Sat, 19 Apr 2008 17:14:06 +1000
Received: from
    server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1])
    by server.vk2pj.dyndns.org (8.14.2/8.14.1) with ESMTP id m3J7E5qo020839;
    Sat, 19 Apr 2008 17:14:05 +1000 (EST)
    (envelope-from peter@server.vk2pj.dyndns.org)
Received: (from peter@localhost)
    by server.vk2pj.dyndns.org (8.14.2/8.14.2/Submit) id m3J7E0X5020838;
    Sat, 19 Apr 2008 17:14:00 +1000 (EST)
    (envelope-from peter)
Date: Sat, 19 Apr 2008 17:14:00 +1000
From: Peter Jeremy
To: Jeremie Le Hen
Message-ID: <20080419071400.GP73016@server.vk2pj.dyndns.org>
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <200804181945.59189.max@love2party.net>
    <20080418204738.GE4840@obiwan.tataz.chchile.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
    protocol="application/pgp-signature"; boundary="q8dntDJTu318bll0"
Content-Disposition: inline
In-Reply-To: <20080418204738.GE4840@obiwan.tataz.chchile.org>
X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc
User-Agent: Mutt/1.5.17 (2007-11-01)
Cc: freebsd-arch@freebsd.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 07:14:13 -0000

--q8dntDJTu318bll0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Apr 18, 2008 at 10:47:38PM +0200, Jeremie Le Hen wrote:
>> 1) Add the needed support symbols to libc (they don't hurt anyone, right?)
>
>Actually, they are already in libc :-).
>See src/sys/lib/libc/sys/stack_protector.c .

/usr/src/lib/libc/sys/stack_protector.c

Similar code needs to be added to libkern before SSP can be enabled
within the kernel.

>Certainly. I would like to hear opinion from other committers if SSP
>should be enabled by default.

I would agree that a phased approach to enabling SSP is warranted but
I believe it should wind up enabled by default in -current fairly
rapidly. Once the Project has gained more familiarity with SSP and
its impacts, a decision can be made as to whether it should default to
on or off in -stable and releases.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.

--q8dntDJTu318bll0
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.8 (FreeBSD)

iEYEARECAAYFAkgJm7gACgkQ/opHv/APuId7CACfUNSfkH0Cox0TH17XYRFiYsgD
lmUAn3jgUZEvZtDY8IxIFUU90xIbBSqv
=G8Rw
-----END PGP SIGNATURE-----

--q8dntDJTu318bll0--

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 10:56:20 2008
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 84EC4106566C
    for ; Sat, 19 Apr 2008 10:56:20 +0000 (UTC)
    (envelope-from jroberson@jroberson.net)
Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.172])
    by mx1.freebsd.org (Postfix) with ESMTP id 589E98FC0C
    for ; Sat, 19 Apr 2008 10:56:20 +0000 (UTC)
    (envelope-from jroberson@jroberson.net)
Received: by wf-out-1314.google.com with SMTP id 25so780382wfa.7
    for ; Sat, 19 Apr 2008 03:56:20 -0700 (PDT)
Received: by 10.142.128.6 with SMTP id a6mr977334wfd.206.1208602579876;
    Sat, 19 Apr 2008 03:56:19 -0700 (PDT)
Received: from ?10.0.1.199?
    ( [24.94.72.120])
    by mx.google.com with ESMTPS id 9sm2125776wfc.16.2008.04.19.03.56.18
    (version=SSLv3 cipher=OTHER);
    Sat, 19 Apr 2008 03:56:19 -0700 (PDT)
Date: Sat, 19 Apr 2008 00:56:44 -1000 (HST)
From: Jeff Roberson
X-X-Sender: jroberson@desktop
To: arch@freebsd.org
Message-ID: <20080419004911.R942@desktop>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Subject: monitor/mwait support for idle
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 10:56:20 -0000

http://people.freebsd.org/~jeff/mwait.diff

This patch implements support for the x86/amd64 monitor and mwait
instructions in the idle loop. This also implements idle loop selection
via a sysctl string. The following loops are supported, in decreasing
order of performance and power consumption:

spin      - Simply returns
mwait     - Always use mwait to sleep. CPU enters C0 or C1 depending on
            how busy it is.
mwait_hlt - Use mwait when busy but fall back to hlt/acpi when not.
hlt       - pure hlt loop
acpi      - uses acpi_cpu_idle if available and hlt if not. This is the
            default.

This also introduces a new MD function 'cpu_wake_idle' which allows MD to
use a faster mechanism than IPI to wake idle. In the spin case this is a
nop. For hlt and acpi we resort to an IPI. If the processor is sleeping
in mwait we can simply write to a per-cpu buffer to wake it up. This
saves considerable cpu cycles on the initiator and target.

The prototype for cpu_idle() changed to accept an integer indication of
how busy we are from the scheduler. If we have been busy MD code may
choose to enter a higher power state on idle. ULE now spins for a short
while if we have been very busy regardless of MD settings.

There seems to be a problem entering C0 on the Xeons I have access to. It
returns from mwait too quickly.
Hopefully Intel will respond to my email about that.

Feedback welcome.

Thanks,
Jeff

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 13:05:10 2008
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id AF6CE1065674
    for ; Sat, 19 Apr 2008 13:05:10 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35])
    by mx1.freebsd.org (Postfix) with ESMTP id 495B28FC1B
    for ; Sat, 19 Apr 2008 13:05:10 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from smtp5-g19.free.fr (localhost.localdomain [127.0.0.1])
    by smtp5-g19.free.fr (Postfix) with ESMTP id EF038404927;
    Sat, 19 Apr 2008 14:22:07 +0200 (CEST)
Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98])
    by smtp5-g19.free.fr (Postfix) with ESMTP id B81743F6EC7;
    Sat, 19 Apr 2008 09:37:02 +0200 (CEST)
Received: from obiwan.tataz.chchile.org (unknown [192.168.1.25])
    by tatooine.tataz.chchile.org (Postfix) with ESMTP id 926A69BF12;
    Sat, 19 Apr 2008 07:34:48 +0000 (UTC)
Received: by obiwan.tataz.chchile.org (Postfix, from userid 1000)
    id 7EEB5405B; Sat, 19 Apr 2008 09:34:48 +0200 (CEST)
Date: Sat, 19 Apr 2008 09:34:48 +0200
From: Jeremie Le Hen
To: Garance A Drosehn
Message-ID: <20080419073448.GG4840@obiwan.tataz.chchile.org>
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <20080418165859.GD4840@obiwan.tataz.chchile.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.15 (2007-04-06)
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 13:05:10 -0000

Hi Garance,

On Fri, Apr 18, 2008 at 08:45:13PM -0400, Garance A Drosehn wrote:
> At 6:58 PM +0200 4/18/08, Jeremie Le Hen wrote:
> > The patch enables SSP for all archs. Unfortunately I've not been able
> > to test it myself on other arch than i386, but two years ago I've got a
> > successful feedback from Pascal Hofstee on amd64. ISTR there was a
> > sparc64 user too, but I'm not sure.
>
> I have run it on FreeBD/PowerPC for a short time. Seemed to work fine,
> but I had this installed on a set of partitions which have sense been
> erased. They were not erased due to problems with Propolice, but just
> because I wanted them for some other purpose, and I wasn't really to
> commit to always having propolice turned on.
>
> I *think* I also had a build of FreeBSD/sparc64 with this turned on,
> but again it was just a test system which has since been overwritten.
> (actually I've upgraded to a newer, slightly faster sparc64 machine,
> and lost my propolice build while making that upgrade).

Thanks for this feedback, I'm pleased to hear this has worked on other
arch too. I had done the src/sys/boot/ part blindly for those, so I'm
very happy to hear that now. You should have told me before! :)

> The only reason I did the above is because I had a friend who wanted
> to move from OpenBSD to FreeBSD, and really wanted Propolice as a
> working option on his machines. After I showed that it did work, he
> built freebsd with propolice on both i386 and amd64 platforms. But
> in his case he screwed up the first time he cvsup'ed those systems,
> just because the cvsup clobbered some of the local changes he had
> made for propolice. So he ended up deciding it was safer for him to
> skip propolice for now, until it's all part of the base FreeBSD system.

Yes, upgrading an SSP-patched system is really a pain because this is
only a patch. The advised way is to patch -R before doing cvs update
and then reapply the patch.

Regards,
--
Jeremie Le Hen
< jeremie at le-hen dot org >< ttz at chchile dot org >

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 13:15:54 2008
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 766051065673
    for ; Sat, 19 Apr 2008 13:15:54 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from postfix1-g20.free.fr (postfix1-g20.free.fr [212.27.60.42])
    by mx1.freebsd.org (Postfix) with ESMTP id 2F2018FC1F
    for ; Sat, 19 Apr 2008 13:15:54 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35])
    by postfix1-g20.free.fr (Postfix) with ESMTP id 4FCEF251F896;
    Sat, 19 Apr 2008 15:15:53 +0200 (CEST)
Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98])
    by smtp5-g19.free.fr (Postfix) with ESMTP id DC1693F8E5B;
    Sat, 19 Apr 2008 09:48:52 +0200 (CEST)
Received: from obiwan.tataz.chchile.org (unknown [192.168.1.25])
    by tatooine.tataz.chchile.org (Postfix) with ESMTP id DA06F9BF12;
    Sat, 19 Apr 2008 07:46:38 +0000 (UTC)
Received: by obiwan.tataz.chchile.org (Postfix, from userid 1000)
    id C6C70405B; Sat, 19 Apr 2008 09:46:38 +0200 (CEST)
Date: Sat, 19 Apr 2008 09:46:38 +0200
From: Jeremie Le Hen
To: Steve Kargl
Message-ID: <20080419074638.GH4840@obiwan.tataz.chchile.org>
References: <20080418132749.GB4840@obiwan.tataz.chchile.org>
    <200804181945.59189.max@love2party.net>
    <20080418204738.GE4840@obiwan.tataz.chchile.org>
    <20080419001555.GA50009@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080419001555.GA50009@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.15 (2007-04-06)
Cc: freebsd-arch@freebsd.org
Subject: Re: Integration of ProPolice in FreeBSD
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Sat, 19 Apr 2008 13:15:54 -0000

Hi Steve,

On Fri, Apr 18, 2008 at 05:15:55PM -0700, Steve Kargl wrote:
> On Fri, Apr 18, 2008 at 10:47:38PM +0200, Jeremie Le Hen wrote:
> >
> > Certainly. I would like to hear opinion from other committers if SSP
> > should be enabled by default.
>
> I'm not a committer, but I'll ask a question anyway.
>
> Can you quantify the performance impact, in particular for
> numerically intensive codes with heavy use of libm?

I don't run such applications, so I can't answer. Sorry. If you are
willing to give it a try, I would be pleased to help you to run your
tests, or even run them on my side.

BTW for the sake of my curiosity, is there a technical reason for
ProPolice to be heavier for libm?

Regards,
--
Jeremie Le Hen
< jeremie at le-hen dot org >< ttz at chchile dot org >

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 13:57:06 2008
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 224321065672
    for ; Sat, 19 Apr 2008 13:57:06 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35])
    by mx1.freebsd.org (Postfix) with ESMTP id CD75E8FC24
    for ; Sat, 19 Apr 2008 13:57:05 +0000 (UTC)
    (envelope-from tataz@tataz.chchile.org)
Received: from smtp5-g19.free.fr (localhost.localdomain [127.0.0.1])
    by smtp5-g19.free.fr (Postfix) with ESMTP id 1A315397A56;
    Sat, 19 Apr 2008 13:27:49 +0200 (CEST)
Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98])
    by smtp5-g19.free.fr (Postfix) with ESMTP id 9040438514;
    Sat, 19 Apr 2008 09:51:35 +0200 (CEST)
Received: from obiwan.tataz.chchile.org (unknown [192.168.1.25])
    by tatooine.tataz.chchile.org (Postfix) with ESMTP id 727FA9BF12;
    Sat, 19 Apr 2008 07:49:21 +0000 (UTC)
Received: by obiwan.tataz.chchile.org (Postfix, from
userid 1000) id 6710B405B; Sat, 19 Apr 2008 09:49:21 +0200 (CEST) Date: Sat, 19 Apr 2008 09:49:21 +0200 From: Jeremie Le Hen To: Peter Jeremy Message-ID: <20080419074921.GI4840@obiwan.tataz.chchile.org> References: <20080418132749.GB4840@obiwan.tataz.chchile.org> <200804181945.59189.max@love2party.net> <20080418204738.GE4840@obiwan.tataz.chchile.org> <20080419071400.GP73016@server.vk2pj.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080419071400.GP73016@server.vk2pj.dyndns.org> User-Agent: Mutt/1.5.15 (2007-04-06) Cc: Jeremie Le Hen , freebsd-arch@freebsd.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2008 13:57:06 -0000

On Sat, Apr 19, 2008 at 05:14:00PM +1000, Peter Jeremy wrote:
> On Fri, Apr 18, 2008 at 10:47:38PM +0200, Jeremie Le Hen wrote:
> >Actually, they are already in libc :-).
> >See src/sys/lib/libc/sys/stack_protector.c .
>
> /usr/src/lib/libc/sys/stack_protector.c
>
> Similar code needs to be added to libkern before SSP can be enabled
> within the kernel.

Yes, sorry, I made another typo. As I told Max in my previous e-mail,
the kernel bits are in src/sys/kern/stack_protector.c . If you want to
look at the patch, it is the last file in it.

http://tataz.chchile.org/~tataz/FreeBSD/SSP/fbsd8-ssp.diff

Thank you.
Best regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 15:56:49 2008 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EAAB9106566B for ; Sat, 19 Apr 2008 15:56:49 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.208.78.105]) by mx1.freebsd.org (Postfix) with ESMTP id C00F48FC16 for ; Sat, 19 Apr 2008 15:56:49 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.2/8.14.2) with ESMTP id m3JFtKZM055693; Sat, 19 Apr 2008 08:55:20 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.2/8.14.2/Submit) id m3JFtJOp055692; Sat, 19 Apr 2008 08:55:19 -0700 (PDT) (envelope-from sgk) Date: Sat, 19 Apr 2008 08:55:19 -0700 From: Steve Kargl To: Jeremie Le Hen Message-ID: <20080419155519.GA55562@troutmask.apl.washington.edu> References: <20080418132749.GB4840@obiwan.tataz.chchile.org> <200804181945.59189.max@love2party.net> <20080418204738.GE4840@obiwan.tataz.chchile.org> <20080419001555.GA50009@troutmask.apl.washington.edu> <20080419074638.GH4840@obiwan.tataz.chchile.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080419074638.GH4840@obiwan.tataz.chchile.org> User-Agent: Mutt/1.4.2.3i Cc: freebsd-arch@freebsd.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2008 15:56:50 -0000 On Sat, Apr 
19, 2008 at 09:46:38AM +0200, Jeremie Le Hen wrote:
> Hi Steve,
>
> On Fri, Apr 18, 2008 at 05:15:55PM -0700, Steve Kargl wrote:
> > On Fri, Apr 18, 2008 at 10:47:38PM +0200, Jeremie Le Hen wrote:
> > >
> > > Certainly. I would like to hear opinion from other committers if SSP
> > > should be enabled by default.
> >
> > I'm not a committer, but I'll ask a question anyway.
> >
> > Can you quantify the performance impact, in particular for
> > numerically intensive codes with heavy use of libm?
>
> I don't run such application, so I can't answer. Sorry. If you are
> willing to give a try, I would be pleased to help you to run your tests,
> or even run them on my side.
>
> BTW for the sake of my curiosity, is there a technical reason for
> ProPolice to be heavier for libm?
>

Most numerical applications that I'm familiar with tend to contain
nested loops that make calls to functions in libm. A simple example
from one of my codes is a 3-deep loop that computes what is known as
the thermal dose.

    for (k = 0; k < kmax; k++)
        for (j = 0; j < jmax; j++)
            for (i = 0; i < imax; i++)
                td += exp(a * b[k][j][i]);

Now, put the above loops inside a time loop with n time steps. exp()
will be called kmax*jmax*imax*n times, where this product can be quite
large (order of 5e11). Any overhead caused by PP will increase the
simulation time. A 1% increase in time is probably tolerable, but a 10%
increase would be detrimental to simulations that take days to complete
(yes, I have a few that run that long).

I'll see if I can get you some numbers this weekend.
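To put the call count above in perspective, here is a minimal
back-of-the-envelope sketch in C. The helper names are hypothetical, the
grid dimensions in the usage note are picked only so that the product
lands on the 5e11 order of magnitude mentioned above, and the per-call
overhead is an assumed input, not a measurement of ProPolice.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of exp() calls made by the triple loop
 * when it runs inside a time loop of n steps.
 */
unsigned long long
exp_call_count(unsigned long long kmax, unsigned long long jmax,
    unsigned long long imax, unsigned long long n)
{
	return (kmax * jmax * imax * n);
}

/*
 * Hypothetical helper: wall-clock seconds added to a run if every call
 * costs ns_per_call extra nanoseconds.  The overhead figure is an
 * assumption supplied by the caller.
 */
double
added_seconds(unsigned long long calls, double ns_per_call)
{
	assert(ns_per_call >= 0.0);
	/* Divide first so the intermediate value stays small and exact. */
	return ((double)calls / 1e9 * ns_per_call);
}
```

For example, exp_call_count(100, 100, 100, 500000) gives 5e11 calls. At
an assumed 1 ns of extra cost per call, added_seconds() works out to 500
seconds; at an assumed 10 ns it is 5000 seconds. Whether that matters
depends on the total run time, which is why measured numbers are needed.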
-- Steve From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 16:01:17 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3416D106566B; Sat, 19 Apr 2008 16:01:17 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from smtp5-g19.free.fr (smtp5-g19.free.fr [212.27.42.35]) by mx1.freebsd.org (Postfix) with ESMTP id BB5518FC0C; Sat, 19 Apr 2008 16:01:16 +0000 (UTC) (envelope-from tataz@tataz.chchile.org) Received: from smtp5-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp5-g19.free.fr (Postfix) with ESMTP id 656833F7D76; Sat, 19 Apr 2008 12:41:24 +0200 (CEST) Received: from tatooine.tataz.chchile.org (tataz.chchile.org [82.233.239.98]) by smtp5-g19.free.fr (Postfix) with ESMTP id 6F6583FA895; Sat, 19 Apr 2008 11:39:15 +0200 (CEST) Received: from obiwan.tataz.chchile.org (unknown [192.168.1.25]) by tatooine.tataz.chchile.org (Postfix) with ESMTP id 3CD749BF12; Sat, 19 Apr 2008 09:37:01 +0000 (UTC) Received: by obiwan.tataz.chchile.org (Postfix, from userid 1000) id 31331405B; Sat, 19 Apr 2008 11:37:01 +0200 (CEST) Date: Sat, 19 Apr 2008 11:37:01 +0200 From: Jeremie Le Hen To: Garance A Drosehn Message-ID: <20080419093701.GJ4840@obiwan.tataz.chchile.org> References: <20080418132749.GB4840@obiwan.tataz.chchile.org> <200804181945.59189.max@love2party.net> <20080418204738.GE4840@obiwan.tataz.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.15 (2007-04-06) Cc: Max Laier , freebsd-arch@FreeBSD.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2008 16:01:17 -0000 Hi, On Fri, Apr 18, 2008 at 08:37:24PM -0400, Garance A 
Drosehn wrote: > At 10:47 PM +0200 4/18/08, Jeremie Le Hen wrote: > > I will change my patch to make SSP opt-out. This will address > > Marcel's concern too. > > This is a big-enough change that we should ease into it, as Max > suggests. It can be very painful to back out of this, so we don't > want to rush into the change and then find out that we really > really regret it. Honestly, the really painful part when reverting this patch was the disappearance of SSP symbols from libc. But I think this will never happen because they have already been in libc for about a year. Enabling/disabling SSP should not hurt. > You've probably described this somewhere, but let me ask for a little > more info. > > There is "enabled" in the sense that the symbols exist in libc, so > programs *can* be compiled with -fstack-protector-all or > -fstack-protector options. But nothing much really happens until > we start building programs with those options turned on. Once a > program is built with one of those options, then that program has > code in it which will check for stack-smashing in that one program. > > So, in my mind there's the step of "enabling SSP", and then there's > the step of "compiling programs with stack-protection on". I think > we could also split that the second step in stages: > > a) add stack-protection to all setuid programs in the base system. > b) add stack-protection to all "/usr/sbin" programs in the base. > c) add stack-protection to all programs in the base. > d) compile ports with stack-protection on. > > Is that a reasonable breakdown? We could (perhaps) have four > switches, and people could turn on whatever wants they wanted. But > as far as the *default* values, we might want "class A" to default > on for 8.0-release, but the other classes to default off. Then > (maybe) add another class each time we make another release. 
On Sat, Apr 19, 2008 at 05:14:00PM +1000, Peter Jeremy wrote: > I would agree that a phased approach to enabling SSP is warranted but > I believe it should wind up enabled by default in -current fairly > rapidly. Once the Project has gained more familiarity with SSP and > its > impacts, a decision can be made as to whether it should default to on > or off in -stable and releases. Provided the very little performance overhead [1], my leaning goes toward Peter here. Moreover given that some ports simply don't compile with SSP (qemu, gcc4, etherboot), my personal opinion is it should be enabled by default for ports on -CURRENT in order to spot those out. Note that the port bits have not been provided yet, I will contact portmgr@ for this. [1] http://www.trl.ibm.com/projects/security/ssp/node5.html Thanks. Regards, -- Jeremie Le Hen < jeremie at le-hen dot org >< ttz at chchile dot org > From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 18:47:28 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7928F106564A for ; Sat, 19 Apr 2008 18:47:28 +0000 (UTC) (envelope-from gad@FreeBSD.org) Received: from smtp7.server.rpi.edu (smtp7.server.rpi.edu [128.113.2.227]) by mx1.freebsd.org (Postfix) with ESMTP id 3730A8FC15 for ; Sat, 19 Apr 2008 18:47:28 +0000 (UTC) (envelope-from gad@FreeBSD.org) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp7.server.rpi.edu (8.13.1/8.13.1) with ESMTP id m3JIlO7d021526; Sat, 19 Apr 2008 14:47:26 -0400 Mime-Version: 1.0 Message-Id: In-Reply-To: <20080419093701.GJ4840@obiwan.tataz.chchile.org> References: <20080418132749.GB4840@obiwan.tataz.chchile.org> <200804181945.59189.max@love2party.net> <20080418204738.GE4840@obiwan.tataz.chchile.org> <20080419093701.GJ4840@obiwan.tataz.chchile.org> Date: Sat, 19 Apr 2008 14:47:24 -0400 To: Jeremie Le Hen From: Garance A Drosehn Content-Type: text/plain; 
charset="us-ascii" ; format="flowed" X-RPI-SA-Score: undef - spam scanning disabled X-CanItPRO-Stream: default X-Canit-Stats-ID: Bayes signature not available X-Scanned-By: CanIt (www . roaringpenguin . com) on 128.113.2.227 Cc: Max Laier , freebsd-arch@FreeBSD.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2008 18:47:28 -0000 At 11:37 AM +0200 4/19/08, Jeremie Le Hen wrote: > >On Fri, Apr 18, 2008 at 08:37:24PM -0400, Garance A Drosehn wrote: > > >> So, in my mind there's the step of "enabling SSP", and then there's >> the step of "compiling programs with stack-protection on". I think >> we could also split that the second step in stages: >> >> a) add stack-protection to all setuid programs in the base system. >> b) add stack-protection to all "/usr/sbin" programs in the base. >> c) add stack-protection to all programs in the base. >> d) compile ports with stack-protection on. >> >> Is that a reasonable breakdown? We could (perhaps) have four >> switches, and people could turn on whatever wants they wanted. But >> as far as the *default* values, we might want "class A" to default >> on for 8.0-release, but the other classes to default off. Then >> (maybe) add another class each time we make another release. > >On Sat, Apr 19, 2008 at 05:14:00PM +1000, Peter Jeremy wrote: > > I would agree that a phased approach to enabling SSP is warranted > > but I believe it should wind up enabled by default in -current > > fairly rapidly. Once the Project has gained more familiarity with > > SSP and its impacts, a decision can be made as to whether it should > > default to on or off in -stable and releases. > >Provided the very little performance overhead [1], my leaning goes >toward Peter here. 
My comment was talking about how we would roll out SSP for *releases*. I do agree we'd move faster for having developers test it in -current. >Moreover given that some ports simply don't compile with SSP (qemu, >gcc4, etherboot), my personal opinion is it should be enabled by >default for ports on -CURRENT in order to spot those out. This part I'm not so sure of. The fact that I'm willing to run freebsd-current to test *freebsd* changes does not mean that I want to be constantly tripping over port-specific problems. I realize that SSP is certainly useful for ports, but that's > 15,000 programs which we probably have little control over. It's going to take quite awhile before we get that sorted out. I guess I don't have any specific recommendation for how to handle SSP in the ports collection. Maybe per-port settings, although that would also get messy. The main ports-developers should decide how SSP should be rolled out through the ports collection. -- Garance Alistair Drosehn = drosehn@rpi.edu Senior Systems Programmer or gad@FreeBSD.org Rensselaer Polytechnic Institute; Troy, NY; USA From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 19:21:53 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A6A6106567D for ; Sat, 19 Apr 2008 19:21:53 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 17F188FC14 for ; Sat, 19 Apr 2008 19:21:52 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id ED8B71A4D82; Sat, 19 Apr 2008 12:21:52 -0700 (PDT) Date: Sat, 19 Apr 2008 12:21:52 -0700 From: Alfred Perlstein To: Jeff Roberson Message-ID: <20080419192152.GX95731@elvis.mu.org> References: <20080419004911.R942@desktop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<20080419004911.R942@desktop> User-Agent: Mutt/1.4.2.3i Cc: arch@freebsd.org Subject: Re: monitor/mwait support for idle X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2008 19:21:53 -0000 Jeff, this is very interesting! I have a question about your earlier email. You mentioned that the IPIs and communication to enter the idle state (hlt, mwait) is expensive. Perhaps something to track the number of times entering and exiting the state would be a good idea. Additionally a tuneable for the number of spins before entering the state might be a good idea. Perhaps spinning for "1" or "2" (where 1/2 is one unit of hz) before entering idle might be a good compromise to avoid hlt enter/exit thrashing. Let me know your thoughts on this. It may already be implemented so just saying "we do that" would be fine too. :) -Alfred * Jeff Roberson [080419 03:56] wrote: > http://people.freebsd.org/~jeff/mwait.diff > > This patch implements support for the x86/amd64 monitor and mwait > instructions in the idle loop. This also implements idle loop selection > via a sysctl string. The following loops are supported, in > decreasing order of performance and power consumption: > > spin - Simply returns > mwait - Always use mwait to sleep. CPU enters C0 or C1 depending on > how busy it is. > mwait_hlt - Use mwait when busy but fall back to hlt/acpi when not. > hlt - pure hlt loop > acpi - uses acpi_cpu_idle if available and hlt if not. This is the > default. > > This also introduces a new MD function 'cpu_wake_idle' which allows MD to > use a faster mechanism than IPI to wake idle. In the spin case this is a > nop. For hlt and acpi we resort to an IPI. If the processor is sleeping > in mwait we can simply write to a per-cpu buffer to wake it up. 
This > saves considerable cpu cycles on the initiator and target. > > The prototype for cpu_idle() changed to accept an integer indication of > how busy we are from the scheduler. If we have been busy MD code may > choose to enter a higher power state on idle. ULE now spins for a short > while if we have been very busy regardless of MD settings. > > There seems to be a problem entering C0 on the Xeons I have access to. It > returns from mwait too quickly. Hopefully intel will respond to my email > about that. > > Feedback welcome. > > Thanks, > Jeff > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -- - Alfred Perlstein From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 21:24:59 2008 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 51BE8106566B; Sat, 19 Apr 2008 21:24:59 +0000 (UTC) (envelope-from linimon@lonesome.com) Received: from mail.soaustin.net (lefty.soaustin.net [66.135.55.46]) by mx1.freebsd.org (Postfix) with ESMTP id 315AC8FC16; Sat, 19 Apr 2008 21:24:59 +0000 (UTC) (envelope-from linimon@lonesome.com) Received: by mail.soaustin.net (Postfix, from userid 502) id 90DBB8C0B8; Sat, 19 Apr 2008 15:58:10 -0500 (CDT) Date: Sat, 19 Apr 2008 15:58:10 -0500 To: Jeremie Le Hen Message-ID: <20080419205810.GA16584@soaustin.net> References: <20080418132749.GB4840@obiwan.tataz.chchile.org> <200804181945.59189.max@love2party.net> <20080418204738.GE4840@obiwan.tataz.chchile.org> <20080419093701.GJ4840@obiwan.tataz.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080419093701.GJ4840@obiwan.tataz.chchile.org> User-Agent: Mutt/1.5.13 (2006-08-11) From: linimon@lonesome.com (Mark Linimon) Cc: Max Laier , Garance A Drosehn , 
freebsd-arch@FreeBSD.org Subject: Re: Integration of ProPolice in FreeBSD X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2008 21:24:59 -0000

On Sat, Apr 19, 2008 at 11:37:01AM +0200, Jeremie Le Hen wrote:
> given that some ports simply don't compile with SSP (qemu, gcc4,
> etherboot), my personal opinion is it should be enabled by default
> for ports on -CURRENT in order to spot those out.

What we generally do for sweeping changes like this is to first run
them through an "experimental" build. This does two things: 1) the
packages from this run don't get uploaded; 2) in almost all package
runs, we only build the packages that will have changed since the last
run. Thus, you don't get complete coverage.

mcl

From owner-freebsd-arch@FreeBSD.ORG Sat Apr 19 22:47:22 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE01F1065674 for ; Sat, 19 Apr 2008 22:47:22 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.171]) by mx1.freebsd.org (Postfix) with ESMTP id 9E7E08FC22 for ; Sat, 19 Apr 2008 22:47:22 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: by wf-out-1314.google.com with SMTP id 25so970951wfa.7 for ; Sat, 19 Apr 2008 15:47:20 -0700 (PDT) Received: by 10.142.101.17 with SMTP id y17mr1159325wfb.20.1208645240299; Sat, 19 Apr 2008 15:47:20 -0700 (PDT) Received: from ?10.0.1.199?
( [24.94.72.120]) by mx.google.com with ESMTPS id 22sm3078969wfd.4.2008.04.19.15.47.18 (version=SSLv3 cipher=OTHER); Sat, 19 Apr 2008 15:47:19 -0700 (PDT) Date: Sat, 19 Apr 2008 12:47:47 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Alfred Perlstein In-Reply-To: <20080419192152.GX95731@elvis.mu.org> Message-ID: <20080419124622.Q942@desktop> References: <20080419004911.R942@desktop> <20080419192152.GX95731@elvis.mu.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org Subject: Re: monitor/mwait support for idle X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Apr 2008 22:47:22 -0000 On Sat, 19 Apr 2008, Alfred Perlstein wrote: > Jeff, this is very interesting! > > I have a question about your earlier email. You mentioned that the > IPIs and communication to enter the idle state (hlt, mwait) is expensive. > > Perhaps something to track the number of times entering and exiting > the state would be a good idea. options COUNT_IPIS will tell you the number of physical IPIs delivered. options SCHED_STATS has statistics on the number of times we were preempted by a remote cpu. > > Additionally a tuneable for the number of spins before entering the > state might be a good idea. Perhaps spinning for "1" or "2" (where > 1/2 is one unit of hz) before entering idle might be a good compromise > to avoid hlt enter/exit thrashing. > > Let me know your thoughts on this. It may already be implemented > so just saying "we do that" would be fine too. :) Presently it's kern.sched.idlespinthresh and idlespins. I'm not sure I intend to leave those sysctls indefinitely. 
Jeff > > -Alfred > > * Jeff Roberson [080419 03:56] wrote: >> http://people.freebsd.org/~jeff/mwait.diff >> >> This patch implements support for the x86/amd64 monitor and mwait >> instructions in the idle loop. This also implements idle loop selection >> via a sysctl string. The following loops are supported, in >> decreasing order of performance and power consumption: >> >> spin - Simply returns >> mwait - Always use mwait to sleep. CPU enters C0 or C1 depending on >> how busy it is. >> mwait_hlt - Use mwait when busy but fall back to hlt/acpi when not. >> hlt - pure hlt loop >> acpi - uses acpi_cpu_idle if available and hlt if not. This is the >> default. >> >> This also introduces a new MD function 'cpu_wake_idle' which allows MD to >> use a faster mechanism than IPI to wake idle. In the spin case this is a >> nop. For hlt and acpi we resort to an IPI. If the processor is sleeping >> in mwait we can simply write to a per-cpu buffer to wake it up. This >> saves considerable cpu cycles on the initiator and target. >> >> The prototype for cpu_idle() changed to accept an integer indication of >> how busy we are from the scheduler. If we have been busy MD code may >> choose to enter a higher power state on idle. ULE now spins for a short >> while if we have been very busy regardless of MD settings. >> >> There seems to be a problem entering C0 on the Xeons I have access to. It >> returns from mwait too quickly. Hopefully intel will respond to my email >> about that. >> >> Feedback welcome. >> >> Thanks, >> Jeff >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > > -- > - Alfred Perlstein >
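The string-based idle loop selection described in the quoted patch
announcement could be organised as a small name-to-function table. What
follows is a minimal sketch in C and is not the code from mwait.diff:
the struct, the function names (idle_select, idle_current, idle_run) and
the stub bodies are hypothetical, and the real loops would execute
privileged instructions that a userland example cannot run. Only the
five method names and the acpi default are taken from the announcement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry mapping a selectable name to an idle routine. */
struct idle_method {
	const char *name;
	void (*fn)(int busy);
};

/* "spin - Simply returns", per the announcement. */
static void idle_spin(int busy) { (void)busy; }

/* Placeholder for mwait/hlt/acpi; real code would sleep the CPU here. */
static void idle_stub(int busy) { (void)busy; }

/* Names listed in the announcement's order. */
static const struct idle_method idle_tbl[] = {
	{ "spin",      idle_spin },
	{ "mwait",     idle_stub },
	{ "mwait_hlt", idle_stub },
	{ "hlt",       idle_stub },
	{ "acpi",      idle_stub },	/* stated default */
};

#define	IDLE_NMETHODS	(sizeof(idle_tbl) / sizeof(idle_tbl[0]))

static const struct idle_method *idle_cur = &idle_tbl[IDLE_NMETHODS - 1];

/* Select a method by name; returns 0 on success, -1 if unknown. */
int
idle_select(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < IDLE_NMETHODS; i++) {
		if (strcmp(name, idle_tbl[i].name) == 0) {
			idle_cur = &idle_tbl[i];
			return (0);
		}
	}
	return (-1);	/* selection is left unchanged */
}

/* Name of the currently selected method. */
const char *
idle_current(void)
{
	return (idle_cur->name);
}

/* Run the selected method with the scheduler's busy hint. */
void
idle_run(int busy)
{
	idle_cur->fn(busy);
}
```

Calling idle_select("mwait") switches the active entry and returns 0; an
unknown name returns -1 and leaves the selection unchanged, which is
roughly the behaviour one would want from a sysctl handler that rejects
bad input.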