From owner-freebsd-arch@FreeBSD.ORG Thu Jul 17 23:54:13 2014
Message-ID: <53C8621E.5040101@FreeBSD.org>
Date: Thu, 17 Jul 2014 18:54:06 -0500
From: Bryan Drewery
Organization: FreeBSD
To: freebsd-arch@freebsd.org
Subject: Re: [RFC] ASLR Whitepaper and Candidate Final Patch
References: <20140711232914.GH41807@pwnie.vrt.sourcefire.com>
In-Reply-To: <20140711232914.GH41807@pwnie.vrt.sourcefire.com>
Cc: PaX Team, alc@rice.edu, Oliver Pinter, des@freebsd.org, Shawn Webb

On 7/11/14, 6:29 PM, Shawn Webb wrote:
> Hey All,
>
> Oliver Pinter and I have been working hard on our ASLR implementation.
> We're now in the final stages of development and would like to get
> feedback from the community. I've attached to this email a small
> whitepaper that details our implementation and the accompanying patch.
>
> There is one part of the patch that I wrote that is quite an ugly hack
> and would like to get some feedback on. I added a little hack to
> sys_mmap() to apply ASLR to calls to mmap(2) when MAP_32BIT is
> specified. I'd like to replace that ugly hack with something a bit more
> beautiful, so if anyone has any suggestions, I'm all ears.
>
> Other than that ugly hack, the code adheres to FreeBSD's style(9)
> standards. I believe we have an awesome implementation, one I've
> personally been using without issue for months.
>
> I'm looking forward to your comments and questions. I've CC'd the PaX
> team. Please keep them CC'd in your replies.
>
> Thank you very much,
>
> Shawn Webb
>
> CC: PaX Team
> CC: Oliver Pinter
> CC: des@freebsd.org
> CC: alc@rice.edu
> CC: bdrewery@freebsd.org
>
> PS - Sorry for the duplicate emails. I hit the wrong key and didn't CC
> everyone.

I plan to review and test this and then commit it, likely next weekend
(7/27). I would do it sooner but will be busy next week.

One big shortcoming I reported to Shawn was the lack of committable
documentation. He is working on that now.

There was a lot of outrage over the NO_PIE commit, which seemed to be
directed much more at ASLR and its support scope across the system than
at the simple -fPIE change that was committed. If anyone has any
concerns, please do speak up now with constructive input.

I am leaning towards leaving PIE/ASLR off by default on head until more
widespread testing can be done.
Eventually we will want it enabled by default though.

--
Regards,
Bryan Drewery

From owner-freebsd-arch@FreeBSD.ORG Thu Jul 17 23:55:43 2014
Date: Fri, 18 Jul 2014 01:55:38 +0200
From: Mateusz Guzik
To: freebsd-arch@freebsd.org
Subject: current fd allocation idiom
Message-ID: <20140717235538.GA15714@dft-labs.eu>

The kernel has to deal with concurrent attempts to open, close, dup2 and
use the same file descriptor.

I start by stating a rather esoteric bug affecting this area, then I
follow with a short overview of what is happening in general and a
proposal for how to change it to get rid of the bug and get some
additional enhancements. Interestingly enough, it turns out Linux is
doing pretty much the same thing.

============================
THE BUG:
/*
 * Initialize the file pointer with the specified properties.
 *
 * The ops are set with release semantics to be certain that the flags, type,
 * and data are visible when ops is.  This is to prevent ops methods from being
 * called with bad data.
 */
void
finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
{
        fp->f_data = data;
        fp->f_flag = flag;
        fp->f_type = type;
        atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
}

This by itself is fine, but it assumes that all code obtaining fp from the
fdtable places a read memory barrier after reading f_ops and before reading
anything else. As you could guess, no code does that, and I don't believe
placing rmb's in several new places is the way to go.
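
To illustrate the pairing discussed above: a release store in the writer
only gives the intended guarantee when the reader uses a matching acquire
load (or a read barrier) on the same field. The following is a minimal
userland sketch in C11 atomics, not kernel code. The names model_file,
model_fileops, model_badfileops, model_falloc, model_finit and
model_read_type are hypothetical stand-ins invented for this example; only
the store ordering mirrors finit() as quoted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userland stand-ins for struct fileops and struct file. */
struct model_fileops {
        int id;
};

struct model_file {
        void *f_data;
        unsigned int f_flag;
        short f_type;
        _Atomic(struct model_fileops *) f_ops;
};

/* Placeholder ops, playing the role of badfileops. */
static struct model_fileops model_badfileops = { 0 };

/* Fresh file: fields zeroed, ops pointing at the placeholder. */
static void
model_falloc(struct model_file *fp)
{
        fp->f_data = NULL;
        fp->f_flag = 0;
        fp->f_type = 0;
        atomic_init(&fp->f_ops, &model_badfileops);
}

/* Writer: plain stores first, then a release store to f_ops. */
static void
model_finit(struct model_file *fp, unsigned int flag, short type, void *data,
    struct model_fileops *ops)
{
        assert(ops != &model_badfileops);
        fp->f_data = data;
        fp->f_flag = flag;
        fp->f_type = type;
        atomic_store_explicit(&fp->f_ops, ops, memory_order_release);
}

/*
 * Reader: acquire load of f_ops pairs with the release store above.
 * Returns 0 while the placeholder ops are still visible; otherwise
 * stores f_type through typep and returns 1.
 */
static int
model_read_type(struct model_file *fp, short *typep)
{
        struct model_fileops *ops;

        ops = atomic_load_explicit(&fp->f_ops, memory_order_acquire);
        if (ops == &model_badfileops)
                return (0);
        *typep = fp->f_type;
        return (1);
}
```

With a relaxed load in model_read_type() instead of the acquire load, the
C11 memory model would no longer guarantee that the f_type read observes
the value written before the release store, which is the reader-side gap
described above.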

============================
GENERAL OVERVIEW OF CURRENT STATE:

fps are obtained and installed as follows:

struct file *fp;
int fd;

if (error = falloc(&fp, &fd))
        return (error);
if (error = something(....)) {
        fdclose(fp, fd);
        fdrop(fp);
        return (error);
}

finit(fp, ....);
fdrop(fp);
return (0);

After falloc, fp is installed in the fdtable; it has a refcount of 2 and ops
set to 'badfileops'.

if something() failed:
fdclose() checks if it has anything to do. If so, it clears fd and fdrops fp
fdrop() drops the second reference, everything is cleaned up

if something() succeeded:
finit() finishes initialization of fp
fdrop() drops the second reference. fp now has the expected refcount of 1.

Now a little complication:
parallel close() execution:
fd is recognized as used. It is cleared and fdrop(fp) is called.

if something() succeeded after close:
fdrop() kills fp

if something() failed after close:
fdclose() concludes there is nothing to do
fdrop() kills fp

Same story with dup2.

What readers need to do:
- rmb() after reading f_ops
- check f_ops for badfileops

============================
PROPOSAL:

struct file *fp;
int fd;

if (error = falloc(&fp, &fd))
        return (error);
if (error = something(....)) {
        fdclose(fp, fd);
        return (error);
}

finit(fp, ....);
factivate(fd, fp);
return (0);

After falloc, fd is only marked as used; fp is NOT installed.
fp is returned with a refcount of 1 and is invisible to anyone but
curthread.

if something() failed:
fdclose() marks fd as unused and kills fp

if something() succeeded:
finit() finishes initialization of fp
factivate() sets fp to non-null with a barrier

Now a little complication:
parallel close() execution:
since fp is null, fd is recognized as unused. EBADF is returned.

The only problem is with dup2, and I believe it is actually a step
forward.

Let's assume fd was marked as used by falloc, but fp was not installed yet.
dup2(n, fd) will see that fd is used. With the current code there is no
problem since there is an fp to fdrop and it can proceed.
With the proposal, however, there is nothing to fdrop. Linux returns EBADF
in this case, which deals with the problem and does not seem to have any
drawbacks for well-behaved processes.

So, differences from the current approach:
1. fewer barriers and atomic operations
2. no need to check for f_ops type
3. new case when dup2 can return an error

Note that 3 should not be a problem since Linux is doing this already.

Also note that the current approach is not implemented correctly at the
moment, as it is missing rmbs, although I'm unsure how much this matters
in practice.

Thoughts?

--
Mateusz Guzik

From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 00:56:14 2014
Date: Fri, 18 Jul 2014 02:56:08 +0200
From: Mateusz Guzik
To: freebsd-arch@freebsd.org
Subject: Re: current fd allocation idiom
Message-ID: <20140718005608.GB15714@dft-labs.eu>
References: <20140717235538.GA15714@dft-labs.eu>
In-Reply-To: <20140717235538.GA15714@dft-labs.eu>

On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> The kernel has to deal with concurrent attempts to open, close, dup2 and
> use the same file descriptor.
>
> I start by stating a rather esoteric bug affecting this area, then I
> follow with a short overview of what is happening in general and a
> proposal for how to change it to get rid of the bug and get some
> additional enhancements. Interestingly enough, it turns out Linux is
> doing pretty much the same thing.
>
> ============================
> THE BUG:
> /*
>  * Initialize the file pointer with the specified properties.
>  *
>  * The ops are set with release semantics to be certain that the flags, type,
>  * and data are visible when ops is.  This is to prevent ops methods from being
>  * called with bad data.
>  */
> void
> finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
> {
>         fp->f_data = data;
>         fp->f_flag = flag;
>         fp->f_type = type;
>         atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
> }
>
> This by itself is fine, but it assumes that all code obtaining fp from the
> fdtable places a read memory barrier after reading f_ops and before reading
> anything else. As you could guess, no code does that, and I don't believe
> placing rmb's in several new places is the way to go.
>
> ============================
> GENERAL OVERVIEW OF CURRENT STATE:
>
> fps are obtained and installed as follows:
>
> struct file *fp;
> int fd;
>
> if (error = falloc(&fp, &fd))
>         return (error);
> if (error = something(....)) {
>         fdclose(fp, fd);
>         fdrop(fp);
>         return (error);
> }
>
> finit(fp, ....);
> fdrop(fp);
> return (0);
>
> After falloc, fp is installed in the fdtable; it has a refcount of 2 and ops
> set to 'badfileops'.
>
> if something() failed:
> fdclose() checks if it has anything to do. If so, it clears fd and fdrops fp
> fdrop() drops the second reference, everything is cleaned up
>
> if something() succeeded:
> finit() finishes initialization of fp
> fdrop() drops the second reference. fp now has the expected refcount of 1.
>
> Now a little complication:
> parallel close() execution:
> fd is recognized as used. It is cleared and fdrop(fp) is called.
>
> if something() succeeded after close:
> fdrop() kills fp
>
> if something() failed after close:
> fdclose() concludes there is nothing to do
> fdrop() kills fp
>
> Same story with dup2.
>
> What readers need to do:
> - rmb() after reading f_ops
> - check f_ops for badfileops
>
> ============================
> PROPOSAL:
>
> struct file *fp;
> int fd;
>
> if (error = falloc(&fp, &fd))
>         return (error);
> if (error = something(....)) {
>         fdclose(fp, fd);
>         return (error);
> }
>
> finit(fp, ....);
> factivate(fd, fp);
> return (0);
>
> After falloc, fd is only marked as used; fp is NOT installed.
> fp is returned with a refcount of 1 and is invisible to anyone but
> curthread.
>
> if something() failed:
> fdclose() marks fd as unused and kills fp
>
> if something() succeeded:
> finit() finishes initialization of fp
> factivate() sets fp to non-null with a barrier
>
> Now a little complication:
> parallel close() execution:
> since fp is null, fd is recognized as unused. EBADF is returned.
>
> The only problem is with dup2, and I believe it is actually a step
> forward.
>
> Let's assume fd was marked as used by falloc, but fp was not installed yet.
> dup2(n, fd) will see that fd is used. With the current code there is no
> problem since there is an fp to fdrop and it can proceed. With the proposal,
> however, there is nothing to fdrop. Linux returns EBADF in this case,
> which deals with the problem and does not seem to have any drawbacks
> for well-behaved processes.
>
> So, differences from the current approach:
> 1. fewer barriers and atomic operations
> 2. no need to check for f_ops type
> 3. new case when dup2 can return an error
>

One has to note that the fdtable can be reallocated at any time, so
factivate would have to make sure it updates the current table.

The simplest thing would be to take the filedesc lock, which diminishes the
advantages to some extent. Maybe switching the sx lock to another kind of
lock would remedy this a little.

> Note that 3 should not be a problem since Linux is doing this already.
>
> Also note that the current approach is not implemented correctly at the
> moment, as it is missing rmbs, although I'm unsure how much this matters
> in practice.
>
> Thoughts?
> --
> Mateusz Guzik

--
Mateusz Guzik

From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 13:06:37 2014
Date: Fri, 18 Jul 2014 16:06:29 +0300
From: Konstantin Belousov
To: Mateusz Guzik
Subject: Re: current fd allocation idiom
Message-ID: <20140718130629.GJ93733@kib.kiev.ua>
References: <20140717235538.GA15714@dft-labs.eu>
In-Reply-To: <20140717235538.GA15714@dft-labs.eu>
Cc: freebsd-arch@freebsd.org

On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> The kernel has to deal with concurrent attempts to open, close, dup2 and
> use the same file descriptor.
>
> I start by stating a rather esoteric bug affecting this area, then I
> follow with a short overview of what is happening in general and a
> proposal for how to change it to get rid of the bug and get some
> additional enhancements. Interestingly enough, it turns out Linux is
> doing pretty much the same thing.
>
> ============================
> THE BUG:
> /*
>  * Initialize the file pointer with the specified properties.
>  *
>  * The ops are set with release semantics to be certain that the flags, type,
>  * and data are visible when ops is.  This is to prevent ops methods from being
>  * called with bad data.
>  */
> void
> finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
> {
>         fp->f_data = data;
>         fp->f_flag = flag;
>         fp->f_type = type;
>         atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
> }
>
> This by itself is fine, but it assumes that all code obtaining fp from the
> fdtable places a read memory barrier after reading f_ops and before reading
> anything else. As you could guess, no code does that, and I don't believe
> placing rmb's in several new places is the way to go.
I think your analysis is correct for all cases except kern_openat().

For kern_openat(), we install a file into the fdtable only after the struct
file is fully initialized, see kern_openat(). The file is allocated
with falloc_noinstall(), then the file is initialized, then finstall() does
FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file
accessors do fget_unlocked(), which does an acquire (of fp->f_count, but
this does not matter).

The only critical constraint is that other accessors must not see
f_ops != badfileops while the struct file is not fully visible. IMO,
the full barrier in finstall() and the acquire in fget*() guarantee
that we see badfileops until writes to other members are visible.

For falloc(), indeed the write to f_ops could become visible too early
(but not on x86).

>
> ============================
> GENERAL OVERVIEW OF CURRENT STATE:
>
> fps are obtained and installed as follows:
>
> struct file *fp;
> int fd;
>
> if (error = falloc(&fp, &fd))
>         return (error);
> if (error = something(....)) {
>         fdclose(fp, fd);
>         fdrop(fp);
>         return (error);
> }
>
> finit(fp, ....);
> fdrop(fp);
> return (0);
>
> After falloc, fp is installed in the fdtable; it has a refcount of 2 and ops
> set to 'badfileops'.
>
> if something() failed:
> fdclose() checks if it has anything to do. If so, it clears fd and fdrops fp
> fdrop() drops the second reference, everything is cleaned up
>
> if something() succeeded:
> finit() finishes initialization of fp
> fdrop() drops the second reference. fp now has the expected refcount of 1.
>
> Now a little complication:
> parallel close() execution:
> fd is recognized as used. It is cleared and fdrop(fp) is called.
>
> if something() succeeded after close:
> fdrop() kills fp
>
> if something() failed after close:
> fdclose() concludes there is nothing to do
> fdrop() kills fp
>
> Same story with dup2.
>
> What readers need to do:
> - rmb() after reading f_ops
> - check f_ops for badfileops
How can readers see badfileops?

>
> ============================
> PROPOSAL:
>
> struct file *fp;
> int fd;
>
> if (error = falloc(&fp, &fd))
Use falloc_noinstall() there.

>         return (error);
> if (error = something(....)) {
>         fdclose(fp, fd);
>         return (error);
> }
>
> finit(fp, ....);
> factivate(fd, fp);
This function is spelled finstall().

It seems that all that is needed is the conversion of places using
falloc() to falloc_noinstall()/finstall().

> return (0);
>
> After falloc, fd is only marked as used; fp is NOT installed.
> fp is returned with a refcount of 1 and is invisible to anyone but
> curthread.
>
> if something() failed:
> fdclose() marks fd as unused and kills fp
>
> if something() succeeded:
> finit() finishes initialization of fp
> factivate() sets fp to non-null with a barrier
>
> Now a little complication:
> parallel close() execution:
> since fp is null, fd is recognized as unused. EBADF is returned.
>
> The only problem is with dup2, and I believe it is actually a step
> forward.
>
> Let's assume fd was marked as used by falloc, but fp was not installed yet.
> dup2(n, fd) will see that fd is used. With the current code there is no
> problem since there is an fp to fdrop and it can proceed. With the proposal,
> however, there is nothing to fdrop. Linux returns EBADF in this case,
> which deals with the problem and does not seem to have any drawbacks
> for well-behaved processes.
>
> So, differences from the current approach:
> 1. fewer barriers and atomic operations
> 2. no need to check for f_ops type
> 3. new case when dup2 can return an error
>
> Note that 3 should not be a problem since Linux is doing this already.
>
> Also note that the current approach is not implemented correctly at the
> moment, as it is missing rmbs, although I'm unsure how much this matters
> in practice.
>
> Thoughts?
> --
> Mateusz Guzik

From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 14:40:23 2014
Date: Fri, 18 Jul 2014 16:40:12 +0200
From: Mateusz Guzik
To: Konstantin Belousov
Subject: Re: current fd allocation idiom
Message-ID: <20140718144012.GA7179@dft-labs.eu>
References: <20140717235538.GA15714@dft-labs.eu> <20140718130629.GJ93733@kib.kiev.ua>
In-Reply-To: <20140718130629.GJ93733@kib.kiev.ua>
Cc: freebsd-arch@freebsd.org

On Fri, Jul 18, 2014 at 04:06:29PM +0300, Konstantin Belousov wrote:
> On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> > The kernel has to deal with concurrent attempts to open, close, dup2 and
> > use the same file descriptor.
> >
> > I start with stating a rather esoteric bug affecting this area, then I
> > follow with a short overview of what is happening in general and a
> > proposal how to change to get rid of the bug and get some additional
> > enhancements. Interestingly enough turns out Linux is doing pretty much
> > the same thing.
> >
> > ============================
> > THE BUG:
> > /*
> >  * Initialize the file pointer with the specified properties.
> >  *
> >  * The ops are set with release semantics to be certain that the flags, type,
> >  * and data are visible when ops is. This is to prevent ops methods from being
> >  * called with bad data.
> >  */
> > void
> > finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
> > {
> > 	fp->f_data = data;
> > 	fp->f_flag = flag;
> > 	fp->f_type = type;
> > 	atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
> > }
> >
> > This itself is fine, but it assumes all code obtaining fp from fdtable places
> > a read memory barrier after reading f_ops and before reading anything else.
> > As you could guess no code does that and I don't believe placing rmb's
> > in several new places is the way to go.
> I think your analysis is correct for all cases except kern_openat().
>
> For kern_openat(), we install a file into fdtable only after the struct
> file is fully initialized, see kern_openat(). The file is allocated
> with falloc_noinstall(), then file is initialized, then finstall() does
> FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file
> accessors do fget_unlocked(), which does acquire (of fp->f_count, but
> this does not matter).
>
> The only critical constraint is that other accessors must not see
> f_ops != badfileops while struct file is not fully visible. IMO,
> the full barrier in finstall() and acquire in fget*() guarantee
> that we see badfileops until writes to other members are visible.
>

I agree, but I doubt everything can be converted to this model (see below).

> For falloc(), indeed the write to f_ops could become visible too early
> (but not on x86).
>
> >
> > ============================
> > GENERAL OVERVIEW OF CURRENT STATE:
> >
> > What readers need to do:
> > - rmb() after reading f_ops
> > - check f_ops for badfileops
> How can readers see badfileops ?
>

Not sure what you mean. fp is installed with badfileops, anything
accessing fdtable before finit finishes will see this.

> >
> > ============================
> > PROPOSAL:
> >
> > struct file *fp;
> > int fd;
> >
> > if (error = falloc(&fp, &fd))
> Use falloc_noinstall() there.
>
> > 	return (error);
> > if (error = something(....)) {
> > 	fdclose(fp, fd);
> > 	return (error);
> > }
> >
> > finit(fp, ....);
> > factivate(fd, fp);
> This function is spelled finstall().
>
> It seems that all what is needed is conversion of places using
> falloc() to falloc_noinstall()/finstall().
>

This postpones fd allocation to after interested function did all work
it wanted to do, which means we would need reliable ways of reverting
all the work in case allocation failed. I'm not so confident we can do
that for all current consumers and both current and my proposed approach
don't impose such requirement.

Of course postponing fd allocation where possible is definitely worth
doing.

For cases where it is not possible/feasible, the only problem we have is
making sure we update fd entry in proper table in factivate.

The easiest solution is to FILEDESC_XLOCK, but this may have measurable
impact. We can get away with FILEDESC_SLOCK just fine, but this still
writes and may ping-pong with other cpus.

This is another place where we could just plop sequence counters.

fdgrowtable would +/-:
	seq_write_begin(&fdp->fd_tbl_seq);
	memcpy(....);
	assign the new pointer
	seq_write_end(&fdp->fd_tbl_seq);

Then factivate would be +/-:

	do {
		fdtable = __READ_ONCE(..., fdp->fd_tbl);
		rmb();
		seq = seq_read(fdp->fd_tbl_seq);
		fdtable[fd] = fp;
	} while (!seq_consistent(fdp->fd_tbl_seq, seq));

This in worst case updates old fdtable (which we never free so it is
harmless) and retries. seq_read never returns seq in 'modify' state,
and we check that seq is the same before and after the operation. If the
condition is met there was no concurrent fdtable copy.

This also means that table readers can 'suddenly' get fps, but this
should be fine. NULL fp is used to denote unused fd. If fp is non NULL,
it is safe for use. If it is NULL, fd is ignored which should not matter.

Functions like fdcopy just have to make sure they read fp once.

-- 
Mateusz Guzik

From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 16:00:05 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2BD21E2F for ; Fri, 18 Jul 2014 16:00:05 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A64BD22F5 for ; Fri, 18 Jul 2014 16:00:04 +0000 (UTC) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s6IFxxW6090738 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 18 Jul 2014 18:59:59 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s6IFxxW6090738 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s6IFxxi3090737; Fri, 18 Jul 2014 18:59:59 +0300 (EEST) (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 18 Jul 2014 18:59:59 +0300 From: Konstantin Belousov To: Mateusz Guzik Subject: Re: current fd allocation idiom Message-ID: <20140718155959.GN93733@kib.kiev.ua> References: <20140717235538.GA15714@dft-labs.eu> <20140718130629.GJ93733@kib.kiev.ua> <20140718144012.GA7179@dft-labs.eu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="whB3aSVTmcGVRwAq" Content-Disposition: inline In-Reply-To: <20140718144012.GA7179@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 16:00:05 -0000 --whB3aSVTmcGVRwAq Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Fri, Jul 18, 2014 at 04:40:12PM +0200, Mateusz Guzik wrote:
> On Fri, Jul 18, 2014 at 04:06:29PM +0300, Konstantin Belousov wrote:
> > On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote:
> > > The kernel has to deal with concurrent attempts to open, close, dup2 and
> > > use the same file descriptor.
> > >
> > > I start with stating a rather esoteric bug affecting this area, then I
> > > follow with a short overview of what is happening in general and a
> > > proposal how to change to get rid of the bug and get some additional
> > > enhancements. Interestingly enough turns out Linux is doing pretty much
> > > the same thing.
> > >
> > > ============================
> > > THE BUG:
> > > /*
> > >  * Initialize the file pointer with the specified properties.
> > >  *
> > >  * The ops are set with release semantics to be certain that the flags, type,
> > >  * and data are visible when ops is. This is to prevent ops methods from being
> > >  * called with bad data.
> > >  */
> > > void
> > > finit(struct file *fp, u_int flag, short type, void *data, struct fileops *ops)
> > > {
> > > 	fp->f_data = data;
> > > 	fp->f_flag = flag;
> > > 	fp->f_type = type;
> > > 	atomic_store_rel_ptr((volatile uintptr_t *)&fp->f_ops, (uintptr_t)ops);
> > > }
> > >
> > > This itself is fine, but it assumes all code obtaining fp from fdtable places
> > > a read memory barrier after reading f_ops and before reading anything else.
> > > As you could guess no code does that and I don't believe placing rmb's
> > > in several new places is the way to go.
> > I think your analysis is correct for all cases except kern_openat().
> >
> > For kern_openat(), we install a file into fdtable only after the struct
> > file is fully initialized, see kern_openat(). The file is allocated
> > with falloc_noinstall(), then file is initialized, then finstall() does
> > FILEDESC_LOCK/UNLOCK, which ensures the full barrier. Other file
> > accessors do fget_unlocked(), which does acquire (of fp->f_count, but
> > this does not matter).
> >
> > The only critical constraint is that other accessors must not see
> > f_ops != badfileops while struct file is not fully visible. IMO,
> > the full barrier in finstall() and acquire in fget*() guarantee
> > that we see badfileops until writes to other members are visible.
> >
>
> I agree, but I doubt everything can be converted to this model (see below).
>
> > For falloc(), indeed the write to f_ops could become visible too early
> > (but not on x86).
> >
> > >
> > > ============================
> > > GENERAL OVERVIEW OF CURRENT STATE:
> > >
> > > What readers need to do:
> > > - rmb() after reading f_ops
> > > - check f_ops for badfileops
> > How can readers see badfileops ?
> >
>
> Not sure what you mean. fp is installed with badfileops, anything
> accessing fdtable before finit finishes will see this.
I referenced falloc_noinstall().

>
> > >
> > > ============================
> > > PROPOSAL:
> > >
> > > struct file *fp;
> > > int fd;
> > >
> > > if (error = falloc(&fp, &fd))
> > Use falloc_noinstall() there.
> >
> > > 	return (error);
> > > if (error = something(....)) {
> > > 	fdclose(fp, fd);
> > > 	return (error);
> > > }
> > >
> > > finit(fp, ....);
> > > factivate(fd, fp);
> > This function is spelled finstall().
> >
> > It seems that all what is needed is conversion of places using
> > falloc() to falloc_noinstall()/finstall().
> >
>
> This postpones fd allocation to after interested function did all work
> it wanted to do, which means we would need reliable ways of reverting
> all the work in case allocation failed. I'm not so confident we can do
> that for all current consumers and both current and my proposed approach
> don't impose such requirement.
Cleanup should be identical to the actions done on close(2).

>
> Of course postponing fd allocation where possible is definitely worth
> doing.
Yes, and after that the rest of the cases should be evaluated.
But my gut feeling is that everything would be converted.

>
> For cases where it is not possible/feasible, the only problem we have is
> making sure we update fd entry in proper table in factivate.
>
> The easiest solution is to FILEDESC_XLOCK, but this may have measurable
> impact. We can get away with FILEDESC_SLOCK just fine, but this still
> writes and may ping-pong with other cpus.
>
> This is another place where we could just plop sequence counters.
>
> fdgrowtable would +/-:
> 	seq_write_begin(&fdp->fd_tbl_seq);
> 	memcpy(....);
> 	assign the new pointer
> 	seq_write_end(&fdp->fd_tbl_seq);
>
> Then factivate would be +/-:
>
> 	do {
> 		fdtable = __READ_ONCE(..., fdp->fd_tbl);
> 		rmb();
> 		seq = seq_read(fdp->fd_tbl_seq);
> 		fdtable[fd] = fp;
> 	} while (!seq_consistent(fdp->fd_tbl_seq, seq));
>
> This in worst case updates old fdtable (which we never free so it is
> harmless) and retries. seq_read never returns seq in 'modify' state,
> and we check that seq is the same before and after the operation. If the
> condition is met there was no concurrent fdtable copy.
>
> This also means that table readers can 'suddenly' get fps, but this
> should be fine. NULL fp is used to denote unused fd. If fp is non NULL,
> it is safe for use. If it is NULL, fd is ignored which should not matter.
>
> Functions like fdcopy just have to make sure they read fp once.
>=20 > --=20 > Mateusz Guzik --whB3aSVTmcGVRwAq Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJTyUR+AAoJEJDCuSvBvK1BQcYP/iGyymAyxnX4nZ55SIS0sz1n KS9bJzNmOsnQrqQNcTP1AArkmCPqa2MGTWEXxzvS54dhXCBRxA14WmSQykfUV7jg 2B2AsnnQ5lNOxwOjjHwvvkhrWtqoxvuVhg7a6ADFIu4AS5TCtVxi1bohO5X+T3ED qoPxtDai6rNNiRJKu3aNfKYT0idNVt1HROHRu3pRelBkfFqwdxO8D4GsI8e2rQGY 74ZIi3kkVq+N3GA1KbSnMzrEhzE5jrUNz5v9YCszoYmbsyKY1myToTQoqX/E70l2 KnhIotBbdQFEYBb2EffCHfrzn/Tl8y6uE//7Aqag6DVj3iJ5Khv5FLe1jiJXx573 sxfOj5N+eaiBY9Y8T7pNkYkrHAz2910UR+yW04atAV7+/yKHH1uyVewZej+JAFwN IFYKcfwhBOlbCBc+mfV1WZN0axszJX3eX+i7Fi5Mo2GdaLeEFyqZSWbBlNHTPpw2 OwrzDFZAcm9EsLQGf6RP4Mawq3TeuKNvdqjuNcegeZHNHGyFU3XCrzBLKVz2OOWY Dh+sO5b79K7MZctNvZNhjOMawy5BALaFapsjnJc5QvR9Gknxml256Sc18eSfDRgt qQ/fQA/SoQRndzhRjRsHEGOCIzwqeNIgaMKv86NiAND/zB3Xi5jFKzpeSoig3TvU 0HhnD6SBejgndg23BgKI =UyEP -----END PGP SIGNATURE----- --whB3aSVTmcGVRwAq-- From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 16:07:14 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EEBEFF7A; Fri, 18 Jul 2014 16:07:14 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 753AF23AD; Fri, 18 Jul 2014 16:07:14 +0000 (UTC) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s6IG786b093036 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 18 Jul 2014 19:07:08 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s6IG786b093036 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s6IG78YB093035; Fri, 18 Jul 2014 19:07:08 +0300 (EEST) (envelope-from 
kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 18 Jul 2014 19:07:08 +0300 From: Konstantin Belousov To: arch@freebsd.org Subject: KDB entry on NMI Message-ID: <20140718160708.GO93733@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="IOwL3FhNvW0Xz3At" Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: amd64@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 16:07:15 -0000 --IOwL3FhNvW0Xz3At Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

It was mentioned somewhere recently that a typical BIOS today configures
NMI delivery on hardware events as broadcast. When I developed the
dmar(4) busdma backend, I did hit this problem, and wrote a prototype
which avoids starting ddb on all cores. Instead, the patch implements
a custom spinlock which allows only one core to win; the other cores
ignore the NMI by spinning on the lock.

The issue which I see on at least two different machines with different
Intel chipsets is that the NMI is somehow sticky, i.e. it is re-delivered
after the handler executes iret. I am not sure what the problem is,
whether it is due to the hardware needing some ACK or a bug in the code.

Anyway, even on a two-core machine, having both cores simultaneously
enter the NMI handler makes the use of ddb impossible, so I believe the
patch is an improvement. I took measures to ensure that reboot from the
ddb prompt works.

Thoughts?

diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
index 9b12449..76b992a 100644
--- a/sys/amd64/amd64/mp_machdep.c
+++ b/sys/amd64/amd64/mp_machdep.c
@@ -32,14 +32,18 @@ __FBSDID("$FreeBSD$");
 #include "opt_kstack_pages.h"
 #include "opt_sched.h"
 #include "opt_smp.h"
+#include "opt_isa.h"
+#include "opt_kdb.h"
 
 #include
 #include
 #include
+#include
 #include
 #ifdef GPROF
 #include
 #endif
+#include
 #include
 #include
 #include
@@ -60,6 +64,7 @@ __FBSDID("$FreeBSD$");
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -164,6 +169,7 @@ static int cpu_logical; /* logical cpus per core */
 static int cpu_cores; /* cores per package */
 
 static void assign_cpu_ids(void);
+static void cpustop_handler_post(u_int cpu);
 static void set_interrupt_apic_ids(void);
 static int start_ap(int apic_id);
 static void release_aps(void *dummy);
@@ -1415,26 +1421,44 @@ ipi_nmi_handler()
 	cpustop_handler();
 	return (0);
 }
-
-/*
- * Handle an IPI_STOP by saving our current context and spinning until we
- * are resumed.
- */
-void
-cpustop_handler(void)
-{
-	u_int cpu;
 
-	cpu = PCPU_GET(cpuid);
+#ifdef DEV_ISA
+static int nmi_kdb_lock;
+#endif
 
-	savectx(&stoppcbs[cpu]);
+#ifdef DEV_ISA
+bool
+nmi_call_kdb_smp(u_int type, int code, struct trapframe *frame, bool do_panic)
+{
+	int cpu;
+	bool call_post, ret;
 
-	/* Indicate that we are stopped */
-	CPU_SET_ATOMIC(cpu, &stopped_cpus);
+	call_post = false;
+	cpu = PCPU_GET(cpuid);
+	if (atomic_cmpset_acq_int(&nmi_kdb_lock, 0, 1)) {
+		ret = nmi_call_kdb(cpu, type, code, frame, do_panic);
+	} else {
+		ret = true;
+		savectx(&stoppcbs[cpu]);
+		while (!atomic_cmpset_acq_int(&nmi_kdb_lock, 0, 1)) {
+			if (CPU_ISSET(cpu, &ipi_nmi_pending)) {
+				CPU_CLR_ATOMIC(cpu, &ipi_nmi_pending);
+				call_post = true;
+			}
+			cpustop_handler_post(cpu);
+			cpu_spinwait();
+		}
+	}
+	atomic_store_rel_int(&nmi_kdb_lock, 0);
+	if (call_post)
+		cpustop_handler_post(cpu);
+	return (ret);
+}
+#endif
 
-	/* Wait for restart */
-	while (!CPU_ISSET(cpu, &started_cpus))
-		ia32_pause();
+static void
+cpustop_handler_post(u_int cpu)
+{
 
 	CPU_CLR_ATOMIC(cpu, &started_cpus);
 	CPU_CLR_ATOMIC(cpu, &stopped_cpus);
@@ -1450,6 +1474,25 @@ cpustop_handler(void)
 }
 
 /*
+ * Handle an IPI_STOP by saving our current context and spinning until we
+ * are resumed.
+ */
+void
+cpustop_handler(void)
+{
+	u_int cpu;
+
+	cpu = PCPU_GET(cpuid);
+	savectx(&stoppcbs[cpu]);
+	/* Indicate that we are stopped */
+	CPU_SET_ATOMIC(cpu, &stopped_cpus);
+	/* Wait for restart */
+	while (!CPU_ISSET(cpu, &started_cpus))
+		ia32_pause();
+	cpustop_handler_post(cpu);
+}
+
+/*
  * Handle an IPI_SUSPEND by saving our current context and spinning until we
  * are resumed.
  */
diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index d9203bc..6fa576e 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -74,6 +74,7 @@ PMC_SOFT_DEFINE( , , page_fault, all);
 PMC_SOFT_DEFINE( , , page_fault, read);
 PMC_SOFT_DEFINE( , , page_fault, write);
 #endif
+#include
 
 #include
 #include
@@ -158,6 +159,44 @@ SYSCTL_INT(_machdep, OID_AUTO, uprintf_signal, CTLFLAG_RWTUN,
     &uprintf_signal, 0,
     "Print debugging information on trap signal to ctty");
 
+#ifdef DEV_ISA
+bool
+nmi_call_kdb(u_int cpu, u_int type, int code, struct trapframe *frame,
+    bool do_panic)
+{
+
+	/* machine/parity/power fail/"kitchen sink" faults */
+	if (isa_nmi(code) == 0) {
+#ifdef KDB
+		/*
+		 * NMI can be hooked up to a pushbutton for debugging.
+		 */
+		if (kdb_on_nmi) {
+			printf ("NMI/cpu%d ... going to debugger\n", cpu);
+			kdb_trap(type, 0, frame);
+			return (true);
+		}
+	} else
+#endif /* KDB */
+	if (do_panic)
+		panic("NMI indicates hardware failure");
+	return (false);
+}
+#endif
+
+static int
+handle_nmi_intr(u_int type, int code, struct trapframe *frame, bool panic)
+{
+
+#ifdef DEV_ISA
+#ifdef SMP
+	return (nmi_call_kdb_smp(type, code, frame, panic));
+#else
+	return (nmi_call_kdb(0, type, code, frame, panic));
+#endif
+#endif
+}
+
 /*
  * Exception, fault, and trap interface to the FreeBSD kernel.
  * This common code is called from assembly language IDT gate entry
@@ -357,25 +396,9 @@ trap(struct trapframe *frame)
 			i = SIGFPE;
 			break;
 
-#ifdef DEV_ISA
 		case T_NMI:
-			/* machine/parity/power fail/"kitchen sink" faults */
-			if (isa_nmi(code) == 0) {
-#ifdef KDB
-				/*
-				 * NMI can be hooked up to a pushbutton
-				 * for debugging.
-				 */
-				if (kdb_on_nmi) {
-					printf ("NMI ... going to debugger\n");
-					kdb_trap(type, 0, frame);
-				}
-#endif /* KDB */
-				goto userout;
-			} else if (panic_on_nmi)
-				panic("NMI indicates hardware failure");
+			handle_nmi_intr(type, code, frame, true);
 			break;
-#endif /* DEV_ISA */
 
 		case T_OFLOW:		/* integer overflow fault */
 			ucode = FPE_INTOVF;
@@ -543,25 +566,11 @@ trap(struct trapframe *frame)
 #endif
 			break;
 
-#ifdef DEV_ISA
 		case T_NMI:
-			/* machine/parity/power fail/"kitchen sink" faults */
-			if (isa_nmi(code) == 0) {
-#ifdef KDB
-				/*
-				 * NMI can be hooked up to a pushbutton
-				 * for debugging.
-				 */
-				if (kdb_on_nmi) {
-					printf ("NMI ... going to debugger\n");
-					kdb_trap(type, 0, frame);
-				}
-#endif /* KDB */
-				goto out;
-			} else if (panic_on_nmi == 0)
+			if (handle_nmi_intr(type, code, frame, false) ||
+			    !panic_on_nmi)
 				goto out;
 			/* FALLTHROUGH */
-#endif /* DEV_ISA */
 		}
 
 		trap_fatal(frame, 0);
diff --git a/sys/amd64/include/md_var.h b/sys/amd64/include/md_var.h
index 5ddfbbd..da170f2 100644
--- a/sys/amd64/include/md_var.h
+++ b/sys/amd64/include/md_var.h
@@ -120,5 +120,9 @@ struct savefpu *get_pcb_user_save_td(struct thread *td);
 struct savefpu *get_pcb_user_save_pcb(struct pcb *pcb);
 struct pcb *get_pcb_td(struct thread *td);
 void	amd64_db_resume_dbreg(void);
+bool	nmi_call_kdb(u_int cpu, u_int type, int code, struct trapframe *frame,
+	    bool panic);
+bool	nmi_call_kdb_smp(u_int type, int code, struct trapframe *frame,
+	    bool panic);
 
 #endif /* !_MACHINE_MD_VAR_H_ */

--IOwL3FhNvW0Xz3At Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJTyUYsAAoJEJDCuSvBvK1Bl5YP/2IEO296ZUqjVYUqvcGoZm/x Ov+1xzBFsWd+M81IHKHtyQ91B5dzS4vDnTBrlQR7/PjEKpMuuscLPkXDMJ2Q3KG7 3RC/DHZO2gtNKuppUW72Wtr5bZwRc7WChFE/q48/N0f3Pan8o25ZqHtdt+j3pMlM xj3GjtH5xlnlOzHUgBUeGgZEnzA9bl1kvBYDz53j+JSqbjZCsoJpSfhq5zP0fuYM O6L+gyQhqPN7NjkVcJaG4wfsA+spVfHt72+67HWj8AbXvj6NJsNifUCtyNlTnRwf czp+j6m9NfZyC3aSgvnC6uI+txLm7ambdRuuxtzvLEsQb4f2Lst1HKPgq3yvexcI
yNAwmlkCFWl7GBpUFknk9I6+MTW+bgicrnU0F49JUj247V1jTpLImQPVORD7wWC3 vhcudmCDCiB4Qj/meRMfKjaIBtGcM6OYFvbkLUKse2zHwc8vl41B/tGdECwgt3Bd ZWhj6jHm0ck6mKvNCxlqDrEFL4cXgeeaV823AEBWJi9M4zczLe+W4NIk1sihePtJ OSHBUtsK3ExytjUoV1Q7cf45G0+gJuHZkrk7V53Kty1qAd1JrHi5dC57hDOwUqyE EvXRUi4z9NYvrSxyyGOzbcNaaQMqwlTo39hsuTNUUGX5dmQjYIasby07M+9OE51b oA1RaVbv/lGOcnqaCEY0 =CQqR -----END PGP SIGNATURE----- --IOwL3FhNvW0Xz3At-- From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 17:55:50 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B8EF5B4D; Fri, 18 Jul 2014 17:55:50 +0000 (UTC) Received: from mail.ignoranthack.me (ignoranthack.me [199.102.79.106]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 60CAB2D34; Fri, 18 Jul 2014 17:55:50 +0000 (UTC) Received: from [192.168.200.205] (c-50-131-5-126.hsd1.ca.comcast.net [50.131.5.126]) (using SSLv3 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: sbruno@ignoranthack.me) by mail.ignoranthack.me (Postfix) with ESMTPSA id D5F17193DD9; Fri, 18 Jul 2014 17:55:48 +0000 (UTC) Subject: Re: Total confusion over toolchain/xdev behavior From: Sean Bruno Reply-To: sbruno@freebsd.org To: freebsd-arch@freebsd.org In-Reply-To: <1404688077.1059.115.camel@bruno> References: <1404688077.1059.115.camel@bruno> Content-Type: text/plain; charset="us-ascii" Date: Fri, 18 Jul 2014 10:55:47 -0700 Message-ID: <1405706147.19254.17.camel@bruno> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 17:55:50 -0000

On Sun, 2014-07-06 at 16:07 -0700, Sean Bruno wrote:
> Objective: install an xcompile toolchain into a jail for use by
> poudriere during arm/mips/sparc/power ports pkgs builds. The build
> should be possible from a non-root user.
>
> As far as I can tell, the xdev target is completely busted for non-clang
> arch's right now as it tries to build clang no matter what I do. It's
> missing some pretty key documentation to making it work correctly, so a
> lot of my attempts have been "guess and check" with verbose make.
>
>

Quite a bit of success with one blocking failure. Thanks to Warner,
Baptiste and Dimitry for plugging along through some of my ranting (on
bcc here).

The xdev target can be used to produce a compiler toolchain that can be
used to build ports. However, final linking seems to fail, which makes
it impossible to use for building many, many ports.

Regardless of the cross compile bits, simply using the xdev target for
building ports natively on amd64 for amd64 manifests this issue, which
leads me to believe that xdev is *not* building the tool chain
correctly. There is no chroot/jail/emulation involved in this test
case.

My test case:

1. build amd64 xdev tool chain:
make -s -j4 xdev XDEV=amd64 XDEV_ARCH=amd64

2. modify make.conf to use this toolchain:
CC=/usr/amd64-freebsd/usr/bin/cc
CPP=/usr/amd64-freebsd/usr/bin/cpp
CXX=/usr/amd64-freebsd/usr/bin/cc++
AS=/usr/amd64-freebsd/usr/bin/as
NM=/usr/amd64-freebsd/usr/bin/nm
RANLIB=/usr/amd64-freebsd/usr/bin/ranlib
LD=/usr/amd64-freebsd/usr/bin/ld
OBJCOPY=/usr/amd64-freebsd/usr/bin/objcopy
SIZE=/usr/amd64-freebsd/usr/bin/llvm-size
STRIPBIN=/usr/amd64-freebsd/usr/bin/strip

3. attempt to build ports-mgmt/pkg

*NOTE* if you add -lbsdxml to CFLAGS this will not happen. Other
packages manifest similar issues, with different libs.

--- pkg-static ---
/usr/amd64-freebsd/usr/bin/cc -static -O2 -pipe -fno-strict-aliasing -DPORTSDIR=\"/usr/local/poudriere/ports/default\" -I../libpkg -I/usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg/../external/uthash -I/usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg/../external/expat/lib -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -Wmissing-variable-declarations -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Qunused-arguments -static -o pkg-static add.o annotate.o audit.o autoremove.o backup.o check.o clean.o config.o convert.o create.o delete.o event.o info.o install.o lock.o main.o plugins.o progressmeter.o query.o register.o repo.o rquery.o update.o upgrade.o search.o set.o shlib.o updating.o utils.o version.o which.o fetch.o shell.o stats.o ssh.o -L/usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg/../libpkg -lpkg -ledit -larchive -lutil -lpthread -lsbuf -lfetch -lssl -lcrypto -lmd -lz -lbz2 -llzma -ljail -lelf -larchive -lsbuf -lfetch -lpthread -lssl -lcrypto -lmd -lz -lbz2 -llzma -ledit -lncursesw
--- pkg ---
/usr/amd64-freebsd/usr/bin/cc -O2 -pipe -fno-strict-aliasing -DPORTSDIR=\"/usr/local/poudriere/ports/default\" -I../libpkg -I/usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg/../external/uthash -I/usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg/../external/expat/lib -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -Wmissing-variable-declarations -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable -Qunused-arguments -Wl,-rpath=/usr/lib:/usr/local/lib -o pkg add.o annotate.o audit.o autoremove.o backup.o check.o clean.o config.o convert.o create.o delete.o event.o info.o install.o lock.o main.o plugins.o progressmeter.o query.o register.o repo.o rquery.o update.o upgrade.o search.o set.o shlib.o updating.o utils.o version.o which.o fetch.o shell.o stats.o ssh.o -L/usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg/../libpkg -lpkg -ledit -larchive -lutil -lpthread -lsbuf -lfetch -lssl -lcrypto -lmd -lz -lbz2 -llzma -ljail
/usr/amd64-freebsd/usr/bin/ld: /usr/amd64-freebsd/lib/libbsdxml.so.4: invalid DSO for symbol `XML_SetUserData' definition
/usr/amd64-freebsd/lib/libbsdxml.so.4: could not read symbols: Bad value
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [pkg] Error code 1

make[3]: stopped in /usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg
1 error

make[3]: stopped in /usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7/pkg
*** [all] Error code 2

make[2]: stopped in /usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7
1 error

make[2]: stopped in /usr/local/poudriere/ports/default/ports-mgmt/pkg/work/pkg-1.2.7
===> Compilation failed unexpectedly.
Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
the maintainer.
*** Error code 1

Stop.
make: stopped in /usr/local/poudriere/ports/default/ports-mgmt/pkg From owner-freebsd-arch@FreeBSD.ORG Fri Jul 18 19:19:34 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DF51E537 for ; Fri, 18 Jul 2014 19:19:33 +0000 (UTC) Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 76CC4257D for ; Fri, 18 Jul 2014 19:19:33 +0000 (UTC) Received: by mail-wi0-f180.google.com with SMTP id n3so1347503wiv.13 for ; Fri, 18 Jul 2014 12:19:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=uH0f4VZiyxKN0kUqcp9EA1Xtxv67Dvpkoi7lbOTmkl4=; b=Cj5XlIHhIw3I/kaj4yAvwk1Lxa/r38tyrTSnAzdr/rbPYPQCERaGFIjfyMH2m0PZqw j+46sPrNULGq5LtNqRto2SGZb8g8eVWec0X1YZgsViWuxLCCCG99o65rZxKUtGH89vJ9 rY85EEbMKecnfau/nEahDOCTcHQ8dBaGXlQn2KTPh0klqPntZ+9YUqB8hNF7QmvwfZFa Ilbm2mS/FGqKnYqlWmlrmqoV2dGHxG+qAk85ha43YcG1bhtji6N7R4Qs0dm4F7APMGnd cLIPDNYFDaSvwypzJxFceZKLdnrNGibgJSo4jrAs5YxzMWylbL5RkHAKvbdYRdHXHn3C bE8Q== X-Received: by 10.194.48.8 with SMTP id h8mr9747673wjn.106.1405711171520; Fri, 18 Jul 2014 12:19:31 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id hi4sm16294007wjc.27.2014.07.18.12.19.30 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 18 Jul 2014 12:19:30 -0700 (PDT) Date: Fri, 18 Jul 2014 21:19:28 +0200 From: Mateusz Guzik To: Konstantin Belousov Subject: Re: current fd allocation idiom Message-ID: <20140718191928.GB7179@dft-labs.eu> References: <20140717235538.GA15714@dft-labs.eu> <20140718130629.GJ93733@kib.kiev.ua> <20140718144012.GA7179@dft-labs.eu> <20140718155959.GN93733@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20140718155959.GN93733@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 19:19:34 -0000 On Fri, Jul 18, 2014 at 06:59:59PM +0300, Konstantin Belousov wrote: > On Fri, Jul 18, 2014 at 04:40:12PM +0200, Mateusz Guzik wrote: > > On Fri, Jul 18, 2014 at 04:06:29PM +0300, Konstantin Belousov wrote: > > > On Fri, Jul 18, 2014 at 01:55:38AM +0200, Mateusz Guzik wrote: > > > > ============================ > > > > GENERAL OVERVIEW OF CURRENT STATE: > > > > > > > > What readers need to do: > > > > - rmb() after reading fp_ops > > > > - check fp_ops for badfileops > > > How can readers see badfileops ? > > > > > > > Not sure what you mean. fp is installed with badfileops, anything > > accessing fdtable before finit finishes will see this. > I referenced falloc_noinstall(). > There must be some miscommunication. If finstall is not executed fp with badfileops (like in kern_openat), readers obviously cannot find this fp. However, if fdalloc is executed (falloc_noinstall + fdalloc which installs fp), readers can find such fp. The latter is the common pattern in the kernel. 
> > > It seems that all what is needed is conversion of places using
> > > falloc() to falloc_noinstall()/finstall().
> > >
> >
> > This postpones fd allocation to after interested function did all work
> > it wanted to do, which means we would need reliable ways of reverting
> > all the work in case allocation failed. I'm not so confident we can do
> > that for all current consumers and both current and my proposed approach
> > don't impose such requirement.
> Cleanup should be identical to the actions done on close(2).
>
> >
> > Of course postponing fd allocation where possible is definitely worth
> > doing.
> Yes, and after that the rest of the cases should be evaluated.
> But my gut feeling is that everything would be converted.
>

So let's say you accept() a connection. With the current code, if you got
as far as accepting the connection, you have it. With your proposal you may
find that you can't allocate any fd and have to close fp. The other end will
see this as accept + close, while the caller never saw the connection.

My guess is people would complain once they encounter such an issue.
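
The accept() concern above can be illustrated with a toy userland model. This is only a sketch under stated assumptions: it is not kernel code, and every name in it (`fd_table`, `listener`, `fd_reserve`, `accept_reserve_first`, `accept_install_last`) is hypothetical and merely stands in for the two orderings being discussed, reserving the descriptor before doing the work versus installing it afterwards.

```c
#include <stdbool.h>

/*
 * Toy userland model, NOT FreeBSD kernel code. All names are hypothetical
 * and only mirror the two orderings discussed in the thread.
 */
#define FD_TABLE_SIZE 2

struct fd_table {
	bool used[FD_TABLE_SIZE];
};

struct listener {
	int pending;	/* connections still waiting in the queue */
	int delivered;	/* connections handed to the caller with an fd */
	int dropped;	/* connections dequeued, then closed undelivered */
};

/* Mark the lowest free slot as used and return it, or -1 if none is free. */
static int
fd_reserve(struct fd_table *t)
{
	for (int i = 0; i < FD_TABLE_SIZE; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return (i);
		}
	}
	return (-1);
}

/* Reserve the descriptor first: a failure happens before any work is done. */
static int
accept_reserve_first(struct fd_table *t, struct listener *l)
{
	int fd;

	if (l->pending == 0)
		return (-1);
	fd = fd_reserve(t);
	if (fd < 0)
		return (-1);	/* connection stays queued; peer sees nothing */
	l->pending--;
	l->delivered++;
	return (fd);
}

/* Do the work first, allocate last: a failure has to undo the accept. */
static int
accept_install_last(struct fd_table *t, struct listener *l)
{
	int fd;

	if (l->pending == 0)
		return (-1);
	l->pending--;		/* dequeued: the peer now considers it accepted */
	fd = fd_reserve(t);
	if (fd < 0) {
		l->dropped++;	/* must be closed: peer sees accept + close */
		return (-1);
	}
	l->delivered++;
	return (fd);
}
```

With a full table, the reserve-first variant leaves the connection queued, while the install-last variant has already dequeued it and can only drop it, which is the behaviour the peer would observe as accept followed by close.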
-- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Sat Jul 19 17:58:35 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 09942120; Sat, 19 Jul 2014 17:58:35 +0000 (UTC) Received: from mail.xcllnt.net (mail.xcllnt.net [50.0.150.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9A7392FA3; Sat, 19 Jul 2014 17:58:33 +0000 (UTC) Received: from [172.29.5.30] ([66.129.239.11]) (authenticated bits=0) by mail.xcllnt.net (8.14.9/8.14.9) with ESMTP id s6JHwOUU023109 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 19 Jul 2014 10:58:26 -0700 (PDT) (envelope-from marcel@xcllnt.net) Content-Type: multipart/signed; boundary="Apple-Mail=_332A8ACD-E11F-490C-B854-38BF11F64932"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: KDB entry on NMI From: Marcel Moolenaar In-Reply-To: <20140718160708.GO93733@kib.kiev.ua> Date: Sat, 19 Jul 2014 10:58:18 -0700 Message-Id: References: <20140718160708.GO93733@kib.kiev.ua> To: Konstantin Belousov X-Mailer: Apple Mail (2.1878.6) Cc: amd64@freebsd.org, arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jul 2014 17:58:35 -0000 --Apple-Mail=_332A8ACD-E11F-490C-B854-38BF11F64932 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii On Jul 18, 2014, at 9:07 AM, Konstantin Belousov wrote: > It was mentioned somewhere recently, that typical BIOS today configures > NMI delivery on the hardware events as broadcast. 
When I developed
> the dmar(4) busdma backend, I indeed met the problem, and wrote a
> prototype which avoided startup of ddb on all cores. Instead, the patch
> implements custom spinlock, which allows only one core to win, other
> cores ignore the NMI, by spinning on lock.
>
> The issue which I see on at least two different machines with different
> Intel chipsets, is that NMI is somehow sticky, i.e. it is re-delivered
> after the handler executes iret. I am not sure what the problem is,
> whether it is due to hardware needing some ACK, or a bug in code.
>
> Anyway, even on two-cores machine, having both cores simultaneously
> enter NMI makes the use of ddb impossible, so I believe the patch is
> improvement. I make measures to ensure that reboot from ddb prompt
> works.
>
> Thoughts?

One may call kdb_enter on different CPUs at the same time and it's
also possible to call panic on multiple CPUs at the same time (but
we serialize panic() right now). What if we let kdb_enter et al. deal
with concurrency, instead of doing it specifically for NMIs?

Also: we may want to do something other than going to the debugger
when we see an NMI. More complexity in the NMI handler that is specific
to entering the debugger seems to move us away from doing other
things more easily.

Aside: I've always wanted to have the ability to have the kernel
debugger switch to a different CPU so that you can create DDB
commands that dump hardware resources like TLBs, etc. To support
this, you want the KDB layer to have good CPU handling, which
possibly makes it also a good place to handle concurrent entry
into the debugger from different CPUs.
FYI, -- Marcel Moolenaar marcel@xcllnt.net --Apple-Mail=_332A8ACD-E11F-490C-B854-38BF11F64932 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - http://gpgtools.org iEYEARECAAYFAlPKsboACgkQpgWlLWHuifbE1ACeLhNWhD1eu/5acOCnK+zTedWY uq4AniYoVSg/fF9DIDEWJiDjMIkTAhbS =SCRD -----END PGP SIGNATURE----- --Apple-Mail=_332A8ACD-E11F-490C-B854-38BF11F64932-- From owner-freebsd-arch@FreeBSD.ORG Sat Jul 19 18:29:20 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 063B0677; Sat, 19 Jul 2014 18:29:20 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8158D2202; Sat, 19 Jul 2014 18:29:19 +0000 (UTC) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s6JIT930088464 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 19 Jul 2014 21:29:09 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s6JIT930088464 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s6JIT9Ix088463; Sat, 19 Jul 2014 21:29:09 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 19 Jul 2014 21:29:09 +0300 From: Konstantin Belousov To: Marcel Moolenaar Subject: Re: KDB entry on NMI Message-ID: <20140719182909.GU93733@kib.kiev.ua> References: <20140718160708.GO93733@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; 
micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Tx24CLeOShJjHgAY"
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: amd64@freebsd.org, arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 19 Jul 2014 18:29:20 -0000

--Tx24CLeOShJjHgAY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Jul 19, 2014 at 10:58:18AM -0700, Marcel Moolenaar wrote:
>
> On Jul 18, 2014, at 9:07 AM, Konstantin Belousov wrote:
>
> > It was mentioned somewhere recently, that typical BIOS today configures
> > NMI delivery on the hardware events as broadcast. When I developed
> > the dmar(4) busdma backend, I indeed met the problem, and wrote a
> > prototype which avoided startup of ddb on all cores. Instead, the patch
> > implements custom spinlock, which allows only one core to win, other
> > cores ignore the NMI, by spinning on lock.
> >
> > The issue which I see on at least two different machines with different
> > Intel chipsets, is that NMI is somehow sticky, i.e. it is re-delivered
> > after the handler executes iret. I am not sure what the problem is,
> > whether it is due to hardware needing some ACK, or a bug in code.
> >
> > Anyway, even on two-cores machine, having both cores simultaneously
> > enter NMI makes the use of ddb impossible, so I believe the patch is
> > improvement. I make measures to ensure that reboot from ddb prompt
> > works.
> >
> > Thoughts?
>
> One may call kdb_enter on different CPUs at the same time and it's
> also possible to call panic on multiple CPUs at the same time (but
> we serialize panic() right now). What if we let kdb_enter et al. deal
> with concurrency, instead of doing it specifically for NMIs?

Then, on an 80-thread machine I get 80 ddb sessions on NMI broadcast,
like now. With your proposal, it will be somewhat better, since
sessions are serialized, so I can do the reboot from the first one.

Still, I hope to understand what I am missing to stop the NMI from being
delivered in a loop. Then, having only one ddb entry would mean that I
should return only once.

>
> Also: we may want to do something else than going to the debugger
> when we see an NMI. More complexity in the NMI handler and specific
> to entering the debugger seems to move us away from doing other
> things more easily.

I agree there.

>
> Aside: I've always wanted to have the ability to have the kernel
> debugger switch to a different CPU so that you can create DDB
> commands that dump hardware resources like TLBs, etc. To support
> this, you want the KDB layer to have good CPU handling, which
> possibly makes it also a good place to handle concurrent entry
> into the debugger from different CPUs.

Me too. I have another half-finished patch which does this: it allows
migrating ddb from one CPU to another. It worked by signalling a
destination CPU that it should activate, while the source CPU starts
spinning. I do not remember the exact problems which were unresolved.

I needed this because some state is CPU-local, cannot be accessed
from other cores, and is not saved in the pcb. I definitely looked at
the EFER and MISC_FEATURES MSRs, and the local APIC state.
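
For readers following the thread, the "only one core wins" arbitration described in the quoted patch summary can be sketched in portable C11 roughly as follows. This is not the patch under discussion, which is not shown in this thread; it is a minimal userland illustration, and `nmi_owner`, `nmi_try_enter`, `nmi_leave`, and `nmi_handler` are hypothetical names. It assumes the losing CPUs simply spin until the winner has left the handler and then return without entering the debugger.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical: id of the CPU currently allowed in the debugger, or NO_OWNER. */
static atomic_int nmi_owner = NO_OWNER;

/* Try to become the single winner. Returns true only for the first caller. */
static bool
nmi_try_enter(int cpu)
{
	int expected = NO_OWNER;

	return (atomic_compare_exchange_strong(&nmi_owner, &expected, cpu));
}

/* Release ownership. A CPU that is not the current owner changes nothing. */
static void
nmi_leave(int cpu)
{
	int expected = cpu;

	(void)atomic_compare_exchange_strong(&nmi_owner, &expected, NO_OWNER);
}

/*
 * Sketch of a broadcast NMI handler body: the winner runs the debugger
 * callback, every other CPU waits for the winner to finish and then
 * returns without entering the debugger itself.
 */
static void
nmi_handler(int cpu, void (*enter_debugger)(int))
{
	if (nmi_try_enter(cpu)) {
		enter_debugger(cpu);
		nmi_leave(cpu);
	} else {
		while (atomic_load(&nmi_owner) != NO_OWNER)
			;	/* spin while the winner is in the debugger */
	}
}
```

A compare-and-swap on a single owner word keeps the decision to one atomic step, so it does not depend on any lock that the interrupted code might already hold.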
--Tx24CLeOShJjHgAY Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJTyrj1AAoJEJDCuSvBvK1BodgQAIk2aC6e9so+6O0rLJAizzzz 1SZ3bYLSpNN/MbdPxQM/JudjI4qRuqMXl4vSOdOpN8oUlrKKpZBKBj7YfDyjwA3L aEeyG1iAhBjEMLhF2Rty2wDVXfOxBM/bZiTmPdl69VGbmvsqIchhJFNbUh3tTY1i a4JKQDg32LyUNvkbxvw++wco+Ts2ptIASHVkayI1zM428uaxA2r2DnFajCFFAbot dh/jvWJsXHEiQhHFOCoP1UfrQaLWbl+mTsvBjXW3lSldL4SkAQPSuW8NtUE1+kfY DBVEjMHXFvR2cT9TVJgkOhzosFKQ+Z/hrjd2tyIknaptk2kQPZJUX/qO9KH1Yt1+ UoeTVBKXUySYcfDZX/CzThLmsZwURnfq/7ZjV64P0x44FfWIBe+os5sUA0dKBmS2 62kkOFFkOwi4oSFM6ymm0JJhYLxHfGULIgqTeVa+XEO4RHb8qmU199559R/YSSDK f6nlV8hLaHkbhBSZ2EZAhRmXyuBB+P6l6rOoIEbujTAQGIy3xHvCEUFIHDl3BtN6 6aiFEKXPQJFoEvItYwDW3VSjbvCO9jkyA0+suCaL36isEM8w4Z4OfAQszgBRouqk /SXHoI6KIe/14z0JZohYYavPSC3q51/WeHYO1k7q50IIEiwhKUNwkyRAv0DJPIf2 WwSwITcNPpypeeamVfaB =qos5 -----END PGP SIGNATURE----- --Tx24CLeOShJjHgAY-- From owner-freebsd-arch@FreeBSD.ORG Sat Jul 19 19:57:27 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4EDA5BCB; Sat, 19 Jul 2014 19:57:27 +0000 (UTC) Received: from mail.xcllnt.net (mail.xcllnt.net [50.0.150.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0D57F293B; Sat, 19 Jul 2014 19:57:26 +0000 (UTC) Received: from [192.168.2.31] (atc.xcllnt.net [50.0.150.213]) (authenticated bits=0) by mail.xcllnt.net (8.14.9/8.14.9) with ESMTP id s6JJvOxX024820 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 19 Jul 2014 12:57:25 -0700 (PDT) (envelope-from marcel@xcllnt.net) Content-Type: multipart/signed; boundary="Apple-Mail=_9EC7C332-AEF2-47D7-BE1B-8C3AA7C81F99"; protocol="application/pgp-signature"; micalg=pgp-sha1 Mime-Version: 1.0 (Mac OS X Mail 7.3 
\(1878.6\))
Subject: Re: KDB entry on NMI
From: Marcel Moolenaar
In-Reply-To: <20140719182909.GU93733@kib.kiev.ua>
Date: Sat, 19 Jul 2014 12:57:24 -0700
Message-Id: <18C85F15-FC9E-480C-BFB9-4CD0894FD93A@xcllnt.net>
References: <20140718160708.GO93733@kib.kiev.ua> <20140719182909.GU93733@kib.kiev.ua>
To: Konstantin Belousov
X-Mailer: Apple Mail (2.1878.6)
Cc: amd64@freebsd.org, arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 19 Jul 2014 19:57:27 -0000

--Apple-Mail=_9EC7C332-AEF2-47D7-BE1B-8C3AA7C81F99
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=us-ascii

On Jul 19, 2014, at 11:29 AM, Konstantin Belousov wrote:

>>
>> One may call kdb_enter on different CPUs at the same time and it's
>> also possible to call panic on multiple CPUs at the same time (but
>> we serialize panic() right now). What if we let kdb_enter et al. deal
>> with concurrency, instead of doing it specifically for NMIs?
> Then, on an 80-thread machine I get 80 ddb sessions on NMI broadcast,
> like now. With your proposal, it will be somewhat better, since
> sessions are serialized, so I can do the reboot from the first one.

There's value in sending the NMI to all CPUs: you'll be pretty sure
that if there's a CPU that can handle it, it will get the NMI.
Sending it to a single CPU has the downside that if that CPU is
unable to handle the NMI (corrupted page tables, locked on some
chipset access, held in reset, powering down, whatever one can
think of) you're out of luck.

Are we acking the NMI on all CPUs right now?

-- 
Marcel Moolenaar
marcel@xcllnt.net

--Apple-Mail=_9EC7C332-AEF2-47D7-BE1B-8C3AA7C81F99
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=signature.asc
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iEYEARECAAYFAlPKzaQACgkQpgWlLWHuifbMMwCeMnNysi89BXze/Aatu3VkRZk8
G9cAnj7IsKTxaQh5eZ3xtNcwSryWBfHl
=m2cx
-----END PGP SIGNATURE-----

--Apple-Mail=_9EC7C332-AEF2-47D7-BE1B-8C3AA7C81F99--

From owner-freebsd-arch@FreeBSD.ORG Sat Jul 19 20:09:55 2014
Return-Path:
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 158B61B5; Sat, 19 Jul 2014 20:09:55 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AA82C2A40; Sat, 19 Jul 2014 20:09:54 +0000 (UTC)
Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s6JK9nVH011576 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 19 Jul 2014 23:09:49 +0300 (EEST) (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s6JK9nVH011576
Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s6JK9mMn011575; Sat, 19 Jul 2014 23:09:48 +0300 (EEST) (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f
Date: Sat, 19 Jul 2014 23:09:48 +0300
From: Konstantin Belousov
To: Marcel Moolenaar
Subject: Re: KDB entry on NMI
Message-ID: <20140719200948.GW93733@kib.kiev.ua>
References: <20140718160708.GO93733@kib.kiev.ua> <20140719182909.GU93733@kib.kiev.ua>
<18C85F15-FC9E-480C-BFB9-4CD0894FD93A@xcllnt.net>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="/kNZZTualZuJSMMX"
Content-Disposition: inline
In-Reply-To: <18C85F15-FC9E-480C-BFB9-4CD0894FD93A@xcllnt.net>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: amd64@freebsd.org, arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 19 Jul 2014 20:09:55 -0000

--/kNZZTualZuJSMMX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Jul 19, 2014 at 12:57:24PM -0700, Marcel Moolenaar wrote:
>
> On Jul 19, 2014, at 11:29 AM, Konstantin Belousov wrote:
> >>
> >> One may call kdb_enter on different CPUs at the same time and it's
> >> also possible to call panic on multiple CPUs at the same time (but
> >> we serialize panic() right now). What if we let kdb_enter et al. deal
> >> with concurrency, instead of doing it specifically for NMIs?
> > Then, on an 80-thread machine I get 80 ddb sessions on NMI broadcast,
> > like now. With your proposal, it will be somewhat better, since
> > sessions are serialized, so I can do the reboot from the first one.
>
> There's value in sending the NMI to all CPUs: you'll be pretty sure
> that if there's a CPU that can handle it, it will get the NMI.
> Sending it to a single CPU has the downside that if that CPU is
> unable to handle the NMI (corrupted page tables, locked on some
> chipset access, held in reset, powering down, whatever one can
> think of) you're out of luck.

Right, and this is what my patch aims to make usable. The first CPU
which gets the NMI wins the permission to enter kdb; other CPUs ignore
the interrupt if they are interrupted while the winner is still in the
NMI handler.

>
> Are we acking the NMI on all CPUs right now?

I am not sure what you are asking about. There is no code which ACKs
the NMI, and possibly there should be, but I do not know what the ACK
could consist of. It might be that the external signal which initiates
the NMI is programmed as level-triggered, and we need to ack it in the
chipset? But I do not know this for sure, and the action would
definitely be chipset-dependent.

The NMI is blocked on the CPU by an internal NMI disable bit upon NMI
delivery, until iret is executed. The bit is not available in the CPU
architectural state.

--/kNZZTualZuJSMMX
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJTytCMAAoJEJDCuSvBvK1BPBAQAJhX5OO90h5g62cs0VxX3l4c
lnyG6vK/oZQBSIJHWPf8evPIGuFhOQH8pULC2FA4z5fwRJmgIoI+s/QVKxotD1nw
AjGuBM2MVfwtOzafvfkXq+x/jlL7iJgHBHEGde4HHWxgbCzvkENVQUADDDXftex/
oDLkFzE0vfnVlm8/UAC6si08AmawSeEI5gQ+oss1onrrC/RCKconMqJZ4nlQ/EWJ
VMm6QLJb7+HWuWNKWTFznD9n/+FNM6S5n8UTthvbKz7EIZHtfuESpXdIhx9ctt81
USbz7r+GM27MYwoupwkd4c33qetobsTBVZW9L4vuhj2QCmKX+wWU+lozxtG18gF3
T2ZH+AGvr+qLn6ROQq5J51PT7Et/JFkMn4o5fyOvJaEm+9FeGdEa70pp0hRDYWBQ
pmVVqANKtqvo4PnspN4bhPY3W3XBCx1vQkI32g4bdgifaZsjW1YLqQVkLqDz50Mh
uwXmoB1wFPZ+cwHo+tBsmMynjJtXmatKdnhRXbN56FDz9PEjyOVLhON0OummAftC
BsS2txJNy2vyOVYRypU2PqLWTY+wED3Ks092Yjw/Qlu61Xo7/N+U6IQ4jC2Xzm5l
S34U+0twBm1vAcEZhO1hV1up3b0DqfKPbvdFMQOA1ojRY+qX7Lte5dAXwhM59KOG
qbqOkKDPlTM1Ri9EOTGd
=qT//
-----END PGP SIGNATURE-----

--/kNZZTualZuJSMMX--

From owner-freebsd-arch@FreeBSD.ORG Sat Jul 19 23:35:35 2014
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id
47B845D7 for ; Sat, 19 Jul 2014 23:35:35 +0000 (UTC) Received: from nm8-vm0.bullet.mail.bf1.yahoo.com (nm8-vm0.bullet.mail.bf1.yahoo.com [98.139.213.95]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E0A5A296D for ; Sat, 19 Jul 2014 23:35:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1405812927; bh=FjnTH9M9gGlUWONt9UVXXRpqcMtfigSvv1o7B/qzjr8=; h=Received:Received:Received:X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:From:Content-Type:Content-Transfer-Encoding:Subject:Date:Message-Id:Cc:To:Mime-Version:X-Mailer; b=Cz4zRXcj0WY8oFYcltg/yLUWaacIF82DJ9N5L8B3hf9Owrx1y4Jo2vq25pu+aAbahlxf5sJ56Eu/EPow7ByUO+nPUC6pXcDqGp9WyJOOUto3d40NiALbgAu/MGuPcQ5rCKZt6E/VmO7iG36svCwHx8l9B14dWkJB9dE4rG7nVzYPBXhMD8Ckk2i8hEHSh3OOMsPwPNdueMf6sHEnyJ7kP3QCssTdoEMPJBCyMUGGILRgGyNmEGWxvGga1UWcB4C9CVK7mAONwBTkQHeF+vt00N7AZbe5McyYpPUFMa912gmd1Ql/AHmmYjwAwx4kUGN9lJH0UiNld1MBcuTI+NEyhA== DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s2048; d=yahoo.com; b=Kokn1JmZKd1b8DBdwt2IhOpcNvdzLZKFjmO2ilwQ+t6coJKx2/XbrWX9KYVKaI3xJZAWnGaNId3r9ywkQFJD1+FWII5NTD/MkvssxZoVVYl8J+stQB+8bDiH2FnItT3CIyZEzzVJQzCXf6LnANrfiAiF413u/zZnC9P7Njnlfx14qPhXg9EdbqUEKFMawA/c/MvCXxUeHvhAq4r20hv5y2MJiXSZIAdrHX5cn3XEuPcLn8p4LTpE5uTOzty25Tm9Ay72zZGgJ0g2n1TubOIoVwmmgCvWaeqgLBNycCeke3n6BQmavkUJz9EERt6fFUZZ+Xl3uz2VQV8lp44jkR9w2g==; Received: from [66.196.81.174] by nm8.bullet.mail.bf1.yahoo.com with NNFMP; 19 Jul 2014 23:35:27 -0000 Received: from [98.139.213.12] by tm20.bullet.mail.bf1.yahoo.com with NNFMP; 19 Jul 2014 23:35:27 -0000 Received: from [127.0.0.1] by smtp112.mail.bf1.yahoo.com with NNFMP; 19 Jul 2014 23:35:27 -0000 X-Yahoo-Newman-Id: 606632.4074.bm@smtp112.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: 7p6gl5IVM1neDBPUKUa.w0cHupSzjZseRw1SpiEj3axnivI THvQkdpELB8L_je4i1evG2zuVYwHNkdzB1a4EiLaCRb3mnSBaVhBfChvOT4m 
zUjkbBrZlkJa5GyQUvaA_dpnoLb_KHxKCySYihUdRW9SaAsCO7qGnTFn6rC2 UUj_TxY4mWa6GI8L.FHZeA9QnKb9s.Plza.RvWAAs6fpIuDI3BK7es5_56zf U7icZJAkcdUp2Nu56nYaEpomLAW3w7BqtkLAgZW2nBJZwFiyNRqGbJ.dEDGf SWLnUd5kRDFLpZnIaVaFlUmxYKc7OcnFv7Ba4Frpb68vL8WSS2lBucnAEe8x kalmR9Vo58xK4uqlz8c5nkSFc992eG0zmRB1UsOkU990PiO5DxFUXbJQyVRw nlApz3RajRgzZHXKGpHvjlk2EqvW7bId._u911YWqki11QOi7HJGVd4RLv1p 04QuLUBvZmWbd..ZKzfDFXnIrGYbYZ_KPVQ6.Q2qny.HBmI7gzR5KczSlLI_ Anv07b2wn1Ve4codmKozhugpxk6Tfi.M-
X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf
From: Pedro Giffuni
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Subject: Re: [RFC] ASLR Whitepaper and Candidate Final Patch
Date: Sat, 19 Jul 2014 18:35:24 -0500
Message-Id: <96C72773-3239-427E-A90B-D05FF0F5B782@freebsd.org>
To: Bryan Drewery
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
X-Mailer: Apple Mail (2.1878.6)
Cc: PaX Team , freebsd-arch@freebsd.org, Shawn Webb , Oliver Pinter
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 19 Jul 2014 23:35:35 -0000

(Assuming @FreeBSD addresses are subscribed to arch, or check the archives)

FWIW,

The issues I pointed out are still standing:

- It is yet undetermined what the performance effect will be, and it is
  not clear (but seems likely from past measurements) if there will be a
  performance hit even when ASLR is off.
- Apparently there are applications that will segfault (?).

I wouldn't object to see it in the tree though: it has obviously been
the result of a lot of work and it is configurable and well integrated.
It will certainly have to be some time in the tree and undergo extensive
testing before turning it on by default though so it sounds reasonable
to bring it in but leave it initially inactive.

Pedro.