From owner-freebsd-arch@freebsd.org Mon Feb 8 21:42:40 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 89082AA239C for ; Mon, 8 Feb 2016 21:42:40 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 70314939 for ; Mon, 8 Feb 2016 21:42:40 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id 6DD1EAA239B; Mon, 8 Feb 2016 21:42:40 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6D6ECAA239A for ; Mon, 8 Feb 2016 21:42:40 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 50F67937 for ; Mon, 8 Feb 2016 21:42:40 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3B598B990; Mon, 8 Feb 2016 16:42:39 -0500 (EST) From: John Baldwin To: Jilles Tjoelker Cc: arch@freebsd.org Subject: Re: Refactoring asynchronous I/O Date: Mon, 08 Feb 2016 11:22:25 -0800 Message-ID: <2544692.uejYAA75dx@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: <20160205213212.GA97435@stack.nl> References: <2793494.0Z1kBV82mT@ralph.baldwin.cx> <9227739.EqUaAQ57pU@ralph.baldwin.cx> <20160205213212.GA97435@stack.nl> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 08 Feb 2016 16:42:39 -0500 (EST) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Feb 2016 21:42:40 -0000 On Friday, February 05, 2016 10:32:12 PM Jilles Tjoelker wrote: > On Fri, Feb 05, 2016 at 12:21:52PM -0800, John Baldwin wrote: > > On Monday, February 01, 2016 12:02:14 AM Jilles Tjoelker wrote: > > > On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote: > > > > Note that binding the AIO support to a new fileop does mean that > > > > the AIO code now becomes mandatory (rather than optional). We > > > > could perhaps make the system calls continue to be optional if > > > > people really need that, but the guts of the code will now need to > > > > always be in the kernel. > > > > Enabling this by default is OK with me as long as the easy ways to get a > > > stuck process are at least disabled by default. Currently, a process > > > gets stuck forever if it has an AIO request from or to a pipe that will > > > never complete. An AIO daemon should not be allowed to perform an > > > unbounded sleep such as for a pipe (NFS server should be OK). > > > One thing I could do is split vfs_aio.c into two files: kern_aio.c that > > holds the "library" such as aio_aqueue() / aio_complete(), etc. and a > > sys_aio.c that is just the system calls. kern_aio.c would be standard, > > but sys_aio.c could still be optional and aio.ko would contain it. > > This would still make AIO optional, though aio.ko would be fairly small, > > so not having it probably wouldn't save much in terms of size. > > > Does this seem reasonable or is a trivial aio.ko not worth it? > > It is one possible option. Another option is to refuse AIO for kinds of > file that have not been vetted to avoid blocking an AIO daemon for too > long. "Kinds of file" is about the code that will be executed to do I/O, > so that, for example, /dev/klog and /dev/ttyv0 are different kinds. > Depending on whether this restriction breaks existing code, it may need > to be a sysctl. This doesn't seem to be very easy to implement, especially if it should vary by character device. It sounds like what you would do today is to allow mlock() and AIO on sockets and maybe the fast physio path and disable everything else by default for now? This would be slightly more feasible than something more fine-grained. A sysctl would allow "full" AIO use that would disable these checks? > All of this would not be a problem if PCATCH worked in AIO daemons > (treat aio_cancel like a pending signal in a user process) but the last > time I looked this appeared quite hard to fix. The cancel routines approach is what I'm doing to try to resolve this, but it means handling the case explicitly in the various backends. -- John Baldwin From owner-freebsd-arch@freebsd.org Tue Feb 9 11:28:56 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BC1B9AA2620 for ; Tue, 9 Feb 2016 11:28:56 +0000 (UTC) (envelope-from export@vascofilters.eu) Received: from karieraserlg.nazwa.pl (aju23.rev.netart.pl [77.55.254.23]) by mx1.freebsd.org (Postfix) with ESMTP id C208CF50 for ; Tue, 9 Feb 2016 11:28:55 +0000 (UTC) (envelope-from export@vascofilters.eu) Received: from pc (unknown [194.33.77.155]) by karieraserlg.nazwa.pl (Postfix) with ESMTPA id E80C83CE18C6 for ; Tue, 9 Feb 2016 12:28:23 +0100 (CET) From: "=?iso-8859-2?Q?Marcin_Tomzi=F1ski?=" To: "freebsd-arch" Subject: cooperation Message-ID: Date: Tue, 9 Feb 2016 12:28:32 +0100 Organization: AnoMail - oprogramowanie do emailingow X-Priority: 3 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2; format=flowed Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Feb 2016 11:28:56 -0000 =20 Are you looking for a new automotive parts brand to be different from your = competitors?=20 Do you want to be a direct importer?=20 Are you looking for high quality products at fair and individual quotation?= =20 Fed up with late deliveries and low availability from the stock?=20 Do you want to join young and effective organisation?=20 Do you need fast orders handling and flexibility in market approach? We have a great proposal for your business! =20 Our products range consists of more than 1500 references under 4 brands - VASCO FILTERS (oil, fuel air and cabin filters) - FLUXAR FILTERS (red line) - PEXA WIPERS (universal and dedicated sets of wipers) - BREYKO (brake pads)=20 We are expanding our business in your country and looking for direct import= ers.=20 Join our Team! More info on web page: www.vascofilters.com or facebook = =20 If you have any questions do not hesitate to contact us, we speak English, = French and Spanish. Marketing and Operations Manager Olaf Tomzi=F1ski If you do not wish to receive further information. From owner-freebsd-arch@freebsd.org Wed Feb 10 22:58:48 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 547E5AA52F3 for ; Wed, 10 Feb 2016 22:58:48 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 3A5671F99 for ; Wed, 10 Feb 2016 22:58:48 +0000 (UTC) (envelope-from jilles@stack.nl) Received: by mailman.ysv.freebsd.org (Postfix) id 3877EAA52F0; Wed, 10 Feb 2016 22:58:48 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1E286AA52EF for ; Wed, 10 Feb 2016 22:58:48 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id AACDA1F98; Wed, 10 Feb 2016 22:58:47 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from toad2.stack.nl (toad2.stack.nl [IPv6:2001:610:1108:5010::161]) by mx1.stack.nl (Postfix) with ESMTP id 8F3EEB80A0; Wed, 10 Feb 2016 23:58:44 +0100 (CET) Received: by toad2.stack.nl (Postfix, from userid 1677) id 949F9892F8; Wed, 10 Feb 2016 23:58:44 +0100 (CET) Date: Wed, 10 Feb 2016 23:58:44 +0100 From: Jilles Tjoelker To: John Baldwin Cc: arch@freebsd.org Subject: Re: Refactoring asynchronous I/O Message-ID: <20160210225844.GA89743@stack.nl> References: <2793494.0Z1kBV82mT@ralph.baldwin.cx> <9227739.EqUaAQ57pU@ralph.baldwin.cx> <20160205213212.GA97435@stack.nl> <2544692.uejYAA75dx@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2544692.uejYAA75dx@ralph.baldwin.cx> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Feb 2016 22:58:48 -0000 On Mon, Feb 08, 2016 at 11:22:25AM -0800, John Baldwin wrote: > On Friday, February 05, 2016 10:32:12 PM Jilles Tjoelker wrote: > > On Fri, Feb 05, 2016 at 12:21:52PM -0800, John Baldwin wrote: > > > On Monday, February 01, 2016 12:02:14 AM Jilles Tjoelker wrote: > > > > On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote: > > > > > Note that binding the AIO support to a new fileop does mean that > > > > > the AIO code now becomes mandatory (rather than optional). We > > > > > could perhaps make the system calls continue to be optional if > > > > > people really need that, but the guts of the code will now need to > > > > > always be in the kernel. > > > > Enabling this by default is OK with me as long as the easy ways to get a > > > > stuck process are at least disabled by default. Currently, a process > > > > gets stuck forever if it has an AIO request from or to a pipe that will > > > > never complete. An AIO daemon should not be allowed to perform an > > > > unbounded sleep such as for a pipe (NFS server should be OK). > > > One thing I could do is split vfs_aio.c into two files: kern_aio.c that > > > holds the "library" such as aio_aqueue() / aio_complete(), etc. and a > > > sys_aio.c that is just the system calls. kern_aio.c would be standard, > > > but sys_aio.c could still be optional and aio.ko would contain it. > > > This would still make AIO optional, though aio.ko would be fairly small, > > > so not having it probably wouldn't save much in terms of size. > > > Does this seem reasonable or is a trivial aio.ko not worth it? > > It is one possible option. Another option is to refuse AIO for kinds of > > file that have not been vetted to avoid blocking an AIO daemon for too > > long. "Kinds of file" is about the code that will be executed to do I/O, > > so that, for example, /dev/klog and /dev/ttyv0 are different kinds. > > Depending on whether this restriction breaks existing code, it may need > > to be a sysctl. > This doesn't seem to be very easy to implement, especially if it should > vary by character device. It sounds like what you would do today is > to allow mlock() and AIO on sockets and maybe the fast physio path > and disable everything else by default for now? This would be slightly > more feasible than something more fine-grained. A sysctl would allow > "full" AIO use that would disable these checks? It does not seem that complicated to me. A check can be inserted before placing a job into aio_jobs (in aio_queue_file() in the patched code). If the code can detect some safe cases, they could be allowed; otherwise, the AIO daemons will not be used. I'm assuming that the code in the new fo_aio_queue file ops behaves properly and will not cause AIO daemons to get stuck. Note that this means there is no easy way to do kernel AIO on a kind of file without specific support in the code for that kind of file. I think the risk of a stuck process (that may even cause shutdown to hang indefinitely) is bad enough to forbid it by default. > > All of this would not be a problem if PCATCH worked in AIO daemons > > (treat aio_cancel like a pending signal in a user process) but the last > > time I looked this appeared quite hard to fix. > The cancel routines approach is what I'm doing to try to resolve this, > but it means handling the case explicitly in the various backends. OK. -- Jilles Tjoelker From owner-freebsd-arch@freebsd.org Wed Feb 10 23:19:21 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0AFF7AA5A03 for ; Wed, 10 Feb 2016 23:19:21 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id EE47C9AD for ; Wed, 10 Feb 2016 23:19:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id ECA23AA5A02; Wed, 10 Feb 2016 23:19:20 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D2656AA5A01 for ; Wed, 10 Feb 2016 23:19:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ADC6E9AC for ; Wed, 10 Feb 2016 23:19:20 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3C0F5B972; Wed, 10 Feb 2016 18:19:19 -0500 (EST) From: John Baldwin To: Jilles Tjoelker Cc: arch@freebsd.org Subject: Re: Refactoring asynchronous I/O Date: Wed, 10 Feb 2016 15:19:12 -0800 Message-ID: <1897781.JF7dmmKAT1@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: <20160210225844.GA89743@stack.nl> References: <2793494.0Z1kBV82mT@ralph.baldwin.cx> <2544692.uejYAA75dx@ralph.baldwin.cx> <20160210225844.GA89743@stack.nl> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 10 Feb 2016 18:19:19 -0500 (EST) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Feb 2016 23:19:21 -0000 On Wednesday, February 10, 2016 11:58:44 PM Jilles Tjoelker wrote: > On Mon, Feb 08, 2016 at 11:22:25AM -0800, John Baldwin wrote: > > On Friday, February 05, 2016 10:32:12 PM Jilles Tjoelker wrote: > > > On Fri, Feb 05, 2016 at 12:21:52PM -0800, John Baldwin wrote: > > > > On Monday, February 01, 2016 12:02:14 AM Jilles Tjoelker wrote: > > > > > On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote: > > > > > > Note that binding the AIO support to a new fileop does mean that > > > > > > the AIO code now becomes mandatory (rather than optional). We > > > > > > could perhaps make the system calls continue to be optional if > > > > > > people really need that, but the guts of the code will now need to > > > > > > always be in the kernel. > > > > > > Enabling this by default is OK with me as long as the easy ways to get a > > > > > stuck process are at least disabled by default. Currently, a process > > > > > gets stuck forever if it has an AIO request from or to a pipe that will > > > > > never complete. An AIO daemon should not be allowed to perform an > > > > > unbounded sleep such as for a pipe (NFS server should be OK). > > > > > One thing I could do is split vfs_aio.c into two files: kern_aio.c that > > > > holds the "library" such as aio_aqueue() / aio_complete(), etc. and a > > > > sys_aio.c that is just the system calls. kern_aio.c would be standard, > > > > but sys_aio.c could still be optional and aio.ko would contain it. > > > > This would still make AIO optional, though aio.ko would be fairly small, > > > > so not having it probably wouldn't save much in terms of size. > > > > > Does this seem reasonable or is a trivial aio.ko not worth it? > > > > It is one possible option. Another option is to refuse AIO for kinds of > > > file that have not been vetted to avoid blocking an AIO daemon for too > > > long. "Kinds of file" is about the code that will be executed to do I/O, > > > so that, for example, /dev/klog and /dev/ttyv0 are different kinds. > > > Depending on whether this restriction breaks existing code, it may need > > > to be a sysctl. > > > This doesn't seem to be very easy to implement, especially if it should > > vary by character device. It sounds like what you would do today is > > to allow mlock() and AIO on sockets and maybe the fast physio path > > and disable everything else by default for now? This would be slightly > > more feasible than something more fine-grained. A sysctl would allow > > "full" AIO use that would disable these checks? > > It does not seem that complicated to me. A check can be inserted before > placing a job into aio_jobs (in aio_queue_file() in the patched code). > If the code can detect some safe cases, they could be allowed; > otherwise, the AIO daemons will not be used. I'm assuming that the code > in the new fo_aio_queue file ops behaves properly and will not cause AIO > daemons to get stuck. > > Note that this means there is no easy way to do kernel AIO on a kind of > file without specific support in the code for that kind of file. I think > the risk of a stuck process (that may even cause shutdown to hang > indefinitely) is bad enough to forbid it by default. Ok. One issue is that some software may assume that aio will "work" if modfind("aio") works and might be surprised by it not working. I'm not sure how real that is. I don't plan on merging this to 10 so we will have some testing time in 11 to figure that out. If it does prove problematic we can revert to splitting the syscalls out into aio.ko. For now I think the two cases to enable by default would be sockets (which use a fo_aio_queue method) and physio. We could let the VFS_AIO option change the sysctl's default value to allow all AIO for compat, though that seems a bit clumsy as the name doesn't really make sense for that. (I also still don't like the vfs_aio.c filename as aio is not VFS-specific.) -- John Baldwin From owner-freebsd-arch@freebsd.org Thu Feb 11 00:21:36 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DC969AA3A06 for ; Thu, 11 Feb 2016 00:21:36 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9D609CC5 for ; Thu, 11 Feb 2016 00:21:36 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 31688B91F; Wed, 10 Feb 2016 19:21:35 -0500 (EST) From: John Baldwin To: freebsd-arch@freebsd.org Cc: Jilles Tjoelker Subject: Re: Refactoring asynchronous I/O Date: Wed, 10 Feb 2016 16:21:30 -0800 Message-ID: <2604669.yblYTI1T32@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: <1897781.JF7dmmKAT1@ralph.baldwin.cx> References: <2793494.0Z1kBV82mT@ralph.baldwin.cx> <20160210225844.GA89743@stack.nl> <1897781.JF7dmmKAT1@ralph.baldwin.cx> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 10 Feb 2016 19:21:35 -0500 (EST) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 00:21:37 -0000 On Wednesday, February 10, 2016 03:19:12 PM John Baldwin wrote: > On Wednesday, February 10, 2016 11:58:44 PM Jilles Tjoelker wrote: > > On Mon, Feb 08, 2016 at 11:22:25AM -0800, John Baldwin wrote: > > > On Friday, February 05, 2016 10:32:12 PM Jilles Tjoelker wrote: > > > > On Fri, Feb 05, 2016 at 12:21:52PM -0800, John Baldwin wrote: > > > > > On Monday, February 01, 2016 12:02:14 AM Jilles Tjoelker wrote: > > > > > > On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote: > > > > > > > Note that binding the AIO support to a new fileop does mean that > > > > > > > the AIO code now becomes mandatory (rather than optional). We > > > > > > > could perhaps make the system calls continue to be optional if > > > > > > > people really need that, but the guts of the code will now need to > > > > > > > always be in the kernel. > > > > > > > > Enabling this by default is OK with me as long as the easy ways to get a > > > > > > stuck process are at least disabled by default. Currently, a process > > > > > > gets stuck forever if it has an AIO request from or to a pipe that will > > > > > > never complete. An AIO daemon should not be allowed to perform an > > > > > > unbounded sleep such as for a pipe (NFS server should be OK). > > > > > > > One thing I could do is split vfs_aio.c into two files: kern_aio.c that > > > > > holds the "library" such as aio_aqueue() / aio_complete(), etc. and a > > > > > sys_aio.c that is just the system calls. kern_aio.c would be standard, > > > > > but sys_aio.c could still be optional and aio.ko would contain it. > > > > > This would still make AIO optional, though aio.ko would be fairly small, > > > > > so not having it probably wouldn't save much in terms of size. > > > > > > > Does this seem reasonable or is a trivial aio.ko not worth it? > > > > > > It is one possible option. Another option is to refuse AIO for kinds of > > > > file that have not been vetted to avoid blocking an AIO daemon for too > > > > long. "Kinds of file" is about the code that will be executed to do I/O, > > > > so that, for example, /dev/klog and /dev/ttyv0 are different kinds. > > > > Depending on whether this restriction breaks existing code, it may need > > > > to be a sysctl. > > > > > This doesn't seem to be very easy to implement, especially if it should > > > vary by character device. It sounds like what you would do today is > > > to allow mlock() and AIO on sockets and maybe the fast physio path > > > and disable everything else by default for now? This would be slightly > > > more feasible than something more fine-grained. A sysctl would allow > > > "full" AIO use that would disable these checks? > > > > It does not seem that complicated to me. A check can be inserted before > > placing a job into aio_jobs (in aio_queue_file() in the patched code). > > If the code can detect some safe cases, they could be allowed; > > otherwise, the AIO daemons will not be used. I'm assuming that the code > > in the new fo_aio_queue file ops behaves properly and will not cause AIO > > daemons to get stuck. > > > > Note that this means there is no easy way to do kernel AIO on a kind of > > file without specific support in the code for that kind of file. I think > > the risk of a stuck process (that may even cause shutdown to hang > > indefinitely) is bad enough to forbid it by default. > > Ok. One issue is that some software may assume that aio will "work" if > modfind("aio") works and might be surprised by it not working. I'm not sure > how real that is. I don't plan on merging this to 10 so we will have some > testing time in 11 to figure that out. If it does prove problematic we can > revert to splitting the syscalls out into aio.ko. For now I think the two > cases to enable by default would be sockets (which use a fo_aio_queue method) > and physio. We could let the VFS_AIO option change the sysctl's default > value to allow all AIO for compat, though that seems a bit clumsy as the > name doesn't really make sense for that. > > (I also still don't like the vfs_aio.c filename as aio is not VFS-specific.) I've pushed a new version with a vfs.aio.enable_unsafe sysctl (defaults to off). If you think this is ok then I'll work on cleaning up some of the module bits (removing the unload support mostly). I figure marking the syscalls as STD and removing all the helper stuff can be done as a followup commit to reduce extra noise in this diff (which is large enough on its own). https://github.com/freebsd/freebsd/compare/master...bsdjhb:aio_rework (Also, for the inital commit I will leave out the socket protocol hook. That can be added later.) -- John Baldwin From owner-freebsd-arch@freebsd.org Thu Feb 11 10:55:00 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5D0F5AA524D for ; Thu, 11 Feb 2016 10:55:00 +0000 (UTC) (envelope-from export@vascofilters.eu) Received: from karieraserlg.nazwa.pl (aju23.rev.netart.pl [77.55.254.23]) by mx1.freebsd.org (Postfix) with ESMTP id 63AED1B81 for ; Thu, 11 Feb 2016 10:54:58 +0000 (UTC) (envelope-from export@vascofilters.eu) Received: from pc (unknown [194.33.77.155]) by karieraserlg.nazwa.pl (Postfix) with ESMTPA id 4D0173CE1A97 for ; Thu, 11 Feb 2016 11:54:27 +0100 (CET) From: "=?iso-8859-2?Q?Marcin_Tomzi=F1ski?=" To: "freebsd-arch" Subject: cooperation Message-ID: <8FF013BEF4FA503AADD765C225FB61FE@pc> Date: Thu, 11 Feb 2016 11:54:30 +0100 Organization: AnoMail - oprogramowanie do emailingow X-Priority: 3 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2; format=flowed Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 10:55:00 -0000 =20 Are you looking for a new automotive parts brand to be different from your = competitors?=20 Do you want to be a direct importer?=20 Are you looking for high quality products at fair and individual quotation?= =20 Fed up with late deliveries and low availability from the stock?=20 Do you want to join young and effective organisation?=20 Do you need fast orders handling and flexibility in market approach? We have a great proposal for your business! =20 Our products range consists of more than 1500 references under 4 brands - VASCO FILTERS (oil, fuel air and cabin filters) - FLUXAR FILTERS (red line) - PEXA WIPERS (universal and dedicated sets of wipers) - BREYKO (brake pads)=20 We are expanding our business in your country and looking for direct import= ers.=20 Join our Team! More info on web page: www.vascofilters.com or facebook = =20 If you have any questions do not hesitate to contact us, we speak English, = French and Spanish. Marketing and Operations Manager Olaf Tomzi=F1ski If you do not wish to receive further information. From owner-freebsd-arch@freebsd.org Thu Feb 11 11:20:45 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 42EE6AA5E44 for ; Thu, 11 Feb 2016 11:20:45 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 27CFE94B for ; Thu, 11 Feb 2016 11:20:45 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 1B9E5AA5E43; Thu, 11 Feb 2016 11:20:45 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1B371AA5E42 for ; Thu, 11 Feb 2016 11:20:45 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8E8C8948; Thu, 11 Feb 2016 11:20:44 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u1BBKdmV066291 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 11 Feb 2016 13:20:39 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u1BBKdmV066291 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u1BBKcCq066289; Thu, 11 Feb 2016 13:20:38 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 11 Feb 2016 13:20:38 +0200 From: Konstantin Belousov To: arch@freebsd.org Cc: bde@freebsd.org Subject: Goodbye for lint(1) Message-ID: <20160211112038.GQ91220@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="SLfjTIIQuAzj8yil" Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 11:20:45 -0000 --SLfjTIIQuAzj8yil Content-Type: text/plain; charset=us-ascii Content-Disposition: inline I believe the time has come to remove lint and its libraries from the system. I did some manipulations with the mcontext_t/ucontext_t to make us more POSIX-compatible, and found several things about lint(1) which cause serious questions about tool usefulness. My main point is that the lint processing starts with "cc -E -undef" producing the preprocessed source of the linted file or library. The -undef switch removes (almost) all predefined symbols, most importantly, the ____ and __LP64__ and its variants are dropped. Due to this, for the whole 10.x lifetime, since the merge of the i386 and amd64 MD includes, lint cannot ever correctly work on amd64. The same should be true for powerpc, and there headers are more unified and the effect is less enchanting. Even on i386, since headers other than _type.h tend to use #ifdef __i386__/#endif and #ifdef __amd64__/#endif, lint cannot see a lot of system. Nobody complained for 3 (?) years about the tool which clearly misfunctioned. I propose to kill it as unused. Modern compilers do much better job at diagnosing inconsistencies supposedly detected (but really not) by lint. --SLfjTIIQuAzj8yil Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJWvG6GAAoJEJDCuSvBvK1BImwP/0PHl1UWOgPC2R9V+iN+Wzwt EUii6moYouXmUljpX2mhCSc9OftDKrxwHYbFwncnt/zaIy+mlyFO7aVgYuixwOG8 knW05NOSX+voSPgn3iZezlpse9WI9h/sqRbF2JxCbYX7ClpGj/zBiqAIwEjVQgH2 nl4WfiVYQSxLJhNWT9e//oAhrTE34sWBvQ1Osr6tZbgzhYkMFHGJiTPF68HZp+zh 7mLNoI0NjAZfD5ltCTpnN/EG5ENTmrzXaY8JudUeW3M00SjXQYH+l2RGd0A9/awR GcrJJnL6gWjCk7FM1UPOdDP4+e6Y7KsGcic7qI75zVrqh9HyNAf4QAtCjafKhbG/ Khkbm+9D2Zaib6tSHPVjMEBjWoIOyN8vkXSU7kEV9EuQgWOQSAaAtV745rQjSFRK MhpAIFFQ2UIBY6gVqI+4p1jtWNRVdeuggYtEfrtOsq1OOle6qHA6r+OPCct1YfAH 4h3NATMmj+uVOVq6HOGbAwaGFRFihWa2bmO4tve6At4upNGmBzO4llL1PxsASrGD ye2hVpLaPN9tLPQzZzmNOZSxb4hFGySdFDGt7tJF4W0qUiWI9Xj7V/A3Cza++5UT DTIeaBe4f/L1PrvOdCN7svz/UOWDbQTkbiArsLb4DepZhBRB6GUGog9FpX+4AyD4 dRE6iZHnPI/PdZIy8B7K =yBbo -----END PGP SIGNATURE----- --SLfjTIIQuAzj8yil-- From owner-freebsd-arch@freebsd.org Thu Feb 11 16:27:28 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CEA7CAA5E61 for ; Thu, 11 Feb 2016 16:27:28 +0000 (UTC) (envelope-from rpaulo@me.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id BC4B2A8D for ; Thu, 11 Feb 2016 16:27:28 +0000 (UTC) (envelope-from rpaulo@me.com) Received: by mailman.ysv.freebsd.org (Postfix) id BAA79AA5E60; Thu, 11 Feb 2016 16:27:28 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BA488AA5E5F for ; Thu, 11 Feb 2016 16:27:28 +0000 (UTC) (envelope-from rpaulo@me.com) Received: from mr11p00im-asmtp002.me.com (mr11p00im-asmtp002.me.com [17.110.69.253]) (using TLSv1.2 with cipher DHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AAD68A8C; Thu, 11 Feb 2016 16:27:28 +0000 (UTC) (envelope-from rpaulo@me.com) Received: from akita.hsd1.ca.comcast.net (c-73-162-13-215.hsd1.ca.comcast.net [73.162.13.215]) by mr11p00im-asmtp002.me.com (Oracle Communications Messaging Server 7.0.5.36.0 64bit (built Sep 8 2015)) with ESMTPSA id <0O2E00C8Q5PLXV30@mr11p00im-asmtp002.me.com>; Thu, 11 Feb 2016 16:27:22 +0000 (GMT) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-02-11_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 clxscore=1011 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1510270003 definitions=main-1602110281 Message-id: <1455208039.2282.1.camel@me.com> Subject: Re: Goodbye for lint(1) From: Rui Paulo To: Konstantin Belousov , arch@freebsd.org Cc: bde@freebsd.org Date: Thu, 11 Feb 2016 08:27:19 -0800 In-reply-to: <20160211112038.GQ91220@kib.kiev.ua> References: <20160211112038.GQ91220@kib.kiev.ua> Content-type: text/plain; charset=UTF-8 X-Mailer: Evolution 3.18.3-1 MIME-version: 1.0 Content-transfer-encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 16:27:28 -0000 On Thu, 2016-02-11 at 13:20 +0200, Konstantin Belousov wrote: > I believe the time has come to remove lint and its libraries from the > system. I did some manipulations with the mcontext_t/ucontext_t to > make > us more POSIX-compatible, and found several things about lint(1) > which > cause serious questions about tool usefulness. > > My main point is that the lint processing starts with "cc -E -undef" > producing the preprocessed source of the linted file or library. The > -undef switch removes (almost) all predefined symbols, most > importantly, > the ____ and __LP64__ and its variants are dropped. > > Due to this, for the whole 10.x lifetime, since the merge of the i386 > and amd64 MD includes, lint cannot ever correctly work on amd64. The > same should be true for powerpc, and there headers are more unified > and > the effect is less enchanting. Even on i386, since headers other than > _type.h tend to use #ifdef __i386__/#endif and #ifdef > __amd64__/#endif, > lint cannot see a lot of system. > > Nobody complained for 3 (?) years about the tool which clearly > misfunctioned. I propose to kill it as unused. Modern compilers do > much > better job at diagnosing inconsistencies supposedly detected (but > really > not) by lint. Yes, it should've been removed years ago... -- Rui Paulo From owner-freebsd-arch@freebsd.org Thu Feb 11 21:21:44 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9EDF1AA5278 for ; Thu, 11 Feb 2016 21:21:44 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 812AF798 for ; Thu, 11 Feb 2016 21:21:44 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id 7F137AA5277; Thu, 11 Feb 2016 21:21:44 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E98DAA5276 for ; Thu, 11 Feb 2016 21:21:44 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: from mail-wm0-x22c.google.com (mail-wm0-x22c.google.com [IPv6:2a00:1450:400c:c09::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0CE98796 for ; Thu, 11 Feb 2016 21:21:44 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: by mail-wm0-x22c.google.com with SMTP id p63so92034782wmp.1 for ; Thu, 11 Feb 2016 13:21:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hardenedbsd-org.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=ErBCO5h2rhk+NQp6e3orRZHHComipOs5achMki+w09k=; b=gVlwwyqwYb8fMzjDlS9fHHXtsUbzV4v25MQFfZ7GVQpzg02pZ33+nxgvjR0Vf1bc7D iiwfgtI7BGfP6pKvM48vXpoulrOqsOx5WU1cPm8fAII/DCKU8YwYwFPv90qBkxonlHyF MScEoJ4VsZ5rbNtKM2JRMEJmo+D66gBUFNRf+65HidwqEqcBE6YWvAn06f9DrsazrqqX 083XTxvYM7LQuftUSJrwhCH6B4+/7bGt21tAVWUWqb7sRwvpT3A1AgFhr9c8mejFxC+w IenvpVKEDrMd1du7sE55RIFpY1Prh6dnmxwyBzKOG6LUrqNCSmPT3NiXG9shkc8SCr2M 6tcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=ErBCO5h2rhk+NQp6e3orRZHHComipOs5achMki+w09k=; b=bcuUpNnzxGIKx8rA7q7kw5S8T5zUPtV3u2mqDb6/WBRRK5UOUmIOsg8z1vT3GDVAFa FxEttjqR3sDur2rGIwlz8vaUOq5JULXvfZF0F0vFxhYUS5ZOjOhLg915k2N9F2T3pdp7 HpQ2WUrn/unnMfEDEiTN/7Y1Ij7Q+Et21zcluKs3HOOtPWBaz6fIxAjpzPzyIb/BdrDd 7LduWO88UIwEz7ObNVgpReMKkuaNPf62JQqjeaHweVzlU9ovJKr22WPl+tMsqoiGod53 4oU0QuA6TRBw5nhx62VBrO6V7UXvz34O8RL3hPziV/Dnpc9tkiQHWJkfX/aiWlORt9Nn WRmQ== X-Gm-Message-State: AG10YOSxM4+Fm0Rollq383BAxTZ1FLBXV5rZ5Oj4NkzdZek7t/T4qK+sj1G554b/GXJ8TrJJfUQLkJVceIOG8OH6 MIME-Version: 1.0 X-Received: by 10.194.103.131 with SMTP id fw3mr55681585wjb.55.1455225702506; Thu, 11 Feb 2016 13:21:42 -0800 (PST) Received: by 10.27.157.18 with HTTP; Thu, 11 Feb 2016 13:21:42 -0800 (PST) In-Reply-To: References: Date: Thu, 11 Feb 2016 21:21:42 +0000 Message-ID: Subject: Re: OpenBSD mallocarray From: C Turt To: arch@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 21:21:44 -0000 I just wanted to post some real world examples of bugs which could be mitigated with `mallocarray` to attract more interest. My most meritable example of a real world attack from this behaviour would be the sys_dynlib_prepare_dlclose kernel exploit for PS4 (PS4 OS is based on FreeBSD 9.0). You may read my write-up of this exploit here: http://cturt.github.io/dlclose-overflow.html The significance of this example is that if a `mallocarray` wrapper was available, and used here, the bug would not have been exploitable, because it would have intentionally panicked instead of continuing with an under allocated buffer. You may think that this is clearly Sony's fault for not checking the count at all, and that FreeBSD kernel code would never have a bug like this, which is why I would like to mention that similar overflows can be possible even if there initially appear to be sufficient bound checks performed. There are several pieces of vulnerable code present in FreeBSD kernel (albeit most of them are triggerable only as root so are not critical), however I will use the file `/sys/kern/kern_alq.c` to demonstrate. There are some checks on counts and sizes, but they are insufficient: [CODE]int alq_open(struct alq **alqp, const char *file, struct ucred *cred, int cmode, int size, int count) { int ret; KASSERT((count >= 0), ("%s: count < 0", __func__)); if (count > 0) { if ((ret = alq_open_flags(alqp, file, cred, cmode, size*count, 0)) == 0) { (*alqp)->aq_flags |= AQ_LEGACY; (*alqp)->aq_entmax = count; (*alqp)->aq_entlen = size; } ... int alq_open_flags(struct alq **alqp, const char *file, struct ucred *cred, int cmode, int size, int flags) { ... KASSERT((size > 0), ("%s: size <= 0", __func__)); ... alq->aq_buflen = size; ... alq->aq_entbuf = malloc(alq->aq_buflen, M_ALD, M_WAITOK|M_ZERO);[/CODE] The check on `count` being greater than or equal to 0 in `alq_open`, and the check for `size` being greater than 0 in `alq_open_flags` are cute, but they don't check for an overflow of `size*count`. This code path is reachable in several places, such as `sysctl_debug_ktr_alq_enable`: [CODE]static int sysctl_debug_ktr_alq_enable(SYSCTL_HANDLER_ARGS) { ... error = alq_open(&ktr_alq, (const char *)ktr_alq_file, req->td->td_ucred, ALQ_DEFAULT_CMODE, sizeof(struct ktr_entry), ktr_alq_depth); [/CODE] With `ktr_alq_depth` being controllable from userland (but only as root): [CODE]SYSCTL_INT(_debug_ktr, OID_AUTO, alq_depth, CTLFLAG_RW, &ktr_alq_depth, 0, "Number of items in the write buffer");[/CODE] `sizeof(struct ktr_entry)` is 88 bytes. So theoretically if `ktr_alq_depth` is set to `48806447`, we'll get an allocation of `0x100000028`, which truncates to 0x28 = 40 bytes. A heap overflow would then possible when this buffer is iterated over with `aq_entmax` and `aq_entlen`. On Mon, Feb 1, 2016 at 7:57 PM, C Turt wrote: > I've recently started browsing the OpenBSD kernel source code, and have > found the mallocarray function positively wonderful. I would like to > discuss the possibility of getting this into FreeBSD kernel. > > For example, many parts of kernel code in FreeBSD use something like > malloc(xxx * sizeof(struct xxx)). If xxx is 64bit and controllable by user, > this allocation can easily overflow, resulting in a heap overflow later on. > > The mallocarray is a wrapper for malloc which can be used in this > situations to detect an integer overflow before allocating: > > /* > * Copyright (c) 2008 Otto Moerbeek > * > * Permission to use, copy, modify, and distribute this software for any > * purpose with or without fee is hereby granted, provided that the above > * copyright notice and this permission notice appear in all copies. > * > * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES > * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF > * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR > * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES > * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN > * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF > * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. > */ > > /* > * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX > * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW > */ > #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4)) > > void * > mallocarray(size_t nmemb, size_t size, int type, int flags) > { > if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) && > nmemb > 0 && SIZE_MAX / nmemb < size) { > if (flags & M_CANFAIL) > return (NULL); > panic("mallocarray: overflow %zu * %zu", nmemb, size); > } > return (malloc(size * nmemb, type, flags)); > } > > From owner-freebsd-arch@freebsd.org Thu Feb 11 21:36:55 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9B204AA59F2 for ; Thu, 11 Feb 2016 21:36:55 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 78A9D107D for ; Thu, 11 Feb 2016 21:36:55 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: by mailman.ysv.freebsd.org (Postfix) id 74A69AA59F1; Thu, 11 Feb 2016 21:36:55 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5A3E7AA59F0 for ; Thu, 11 Feb 2016 21:36:55 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mail-ob0-x230.google.com (mail-ob0-x230.google.com [IPv6:2607:f8b0:4003:c01::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 21C3E107C for ; Thu, 11 Feb 2016 21:36:55 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: by mail-ob0-x230.google.com with SMTP id is5so94097745obc.0 for ; Thu, 11 Feb 2016 13:36:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :message-id:references:to; bh=+oXv6z+TFfHANYRNuUGEs6zqt5WNIIP6bdUzCyY1UQU=; b=RjJwymPZLnpBACSctdEi+T7rtHI4H0mDARWX8eBdnfKgjB3yLUF/NRmlCkrZ6WuY2W p6AYHG9sZpXqBhMcPkYmDaDGQ1KlJc3V2BsB3cGYlMqM3cLfSHt53n4hpAJSdB7hwusY KLORY5n2wwaH+ArFWqYNt+UUL4I22dIwgXlZcFQHl+XaMkTZBbRTSSPpcq2npkua504e FnBT3B5CMX7R/28ltdfsi2WrfzbmWGIKeoIdXF4IXiBp9b8IyHV+FLgwjgVgVFx9Owco imFps+A6CIFcpXeDwN0qQmoOitt84v496sd3dZ+/LXB/MoSGSxhiyaFrHJF5YNuNqUoC RyAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:subject:mime-version:content-type:from :in-reply-to:date:cc:message-id:references:to; bh=+oXv6z+TFfHANYRNuUGEs6zqt5WNIIP6bdUzCyY1UQU=; b=chMhxTSyVunb2CZZAhapvx2borvbBJMbk1ZhkGUC3d3jSZOcHP47VFyhbe72HG3guu H15EaULQqAAVWvRdEoYLy4oaMK5TSc+fafa4/8sBSkM0OFizQSXRFRdWRibgkyKwCM/x FishRTUizNJA2+0W9NjhG5R9BoiWfaPlqCgTX49Wsc9tQORPo0zHJM4sl50pNhXxvP0q ZAAvVTkaiQa746OKYjaW4pkMHHi3aX1bjYkM8bTg2GSngDAiKl3WF/BcdToozfcpGoPO zGZcNdrgHzpWKDntQBCC7qCv6CL64oGXvT/SlJeks/rIXItvGuWq4u3sFUf58T/bMaN2 SGWA== X-Gm-Message-State: AG10YOTA3EHqBta/8ykFa3lit1L5/zBW/iarYcnnr+VbPu82hAW0k02wxzaHgvg5n/US6w== X-Received: by 10.182.65.138 with SMTP id x10mr47537105obs.39.1455226614121; Thu, 11 Feb 2016 13:36:54 -0800 (PST) Received: from netflix-mac-wired.bsdimp.com ([50.253.99.174]) by smtp.gmail.com with ESMTPSA id pp10sm6710291obc.16.2016.02.11.13.36.53 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 11 Feb 2016 13:36:53 -0800 (PST) Sender: Warner Losh Subject: Re: OpenBSD mallocarray Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Content-Type: multipart/signed; boundary="Apple-Mail=_4D542F93-2169-43C9-A5D6-E0B42B162824"; protocol="application/pgp-signature"; micalg=pgp-sha512 X-Pgp-Agent: GPGMail 2.5.2 From: Warner Losh In-Reply-To: Date: Thu, 11 Feb 2016 14:36:51 -0700 Cc: arch@freebsd.org Message-Id: References: To: C Turt X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 21:36:55 -0000 --Apple-Mail=_4D542F93-2169-43C9-A5D6-E0B42B162824 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Feb 11, 2016, at 2:21 PM, C Turt wrote: >=20 > I just wanted to post some real world examples of bugs which could be > mitigated with `mallocarray` to attract more interest. Let=E2=80=99s play devil=E2=80=99s advocate: since you have to make code = changes, how would mallocarray() be superior to the various MALLOC_OVERFLOW macro suggestions from earlier in the thread? Given that kernel code is somewhat different in what it needs to support? > My most meritable example of a real world attack from this behaviour = would > be the sys_dynlib_prepare_dlclose kernel exploit for PS4 (PS4 OS is = based > on FreeBSD 9.0). You may read my write-up of this exploit here: > http://cturt.github.io/dlclose-overflow.html >=20 > The significance of this example is that if a `mallocarray` wrapper = was > available, and used here, the bug would not have been exploitable, = because > it would have intentionally panicked instead of continuing with an = under > allocated buffer. Except that you=E2=80=99d need to change all the code that was imported = into the kernel to use mallocarray. The code Sony imported didn=E2=80=99t = have it to start with. > You may think that this is clearly Sony's fault for not checking the = count > at all, and that FreeBSD kernel code would never have a bug like this, > which is why I would like to mention that similar overflows can be = possible > even if there initially appear to be sufficient bound checks = performed. >=20 > There are several pieces of vulnerable code present in FreeBSD kernel > (albeit most of them are triggerable only as root so are not = critical), > however I will use the file `/sys/kern/kern_alq.c` to demonstrate. = There > are some checks on counts and sizes, but they are insufficient: >=20 > [CODE]int > alq_open(struct alq **alqp, const char *file, struct ucred *cred, int = cmode, > int size, int count) > { > int ret; >=20 > KASSERT((count >=3D 0), ("%s: count < 0", __func__)); >=20 > if (count > 0) { > if ((ret =3D alq_open_flags(alqp, file, cred, cmode, > size*count, 0)) =3D=3D 0) { > (*alqp)->aq_flags |=3D AQ_LEGACY; > (*alqp)->aq_entmax =3D count; > (*alqp)->aq_entlen =3D size; > } >=20 > ... >=20 > int > alq_open_flags(struct alq **alqp, const char *file, struct ucred = *cred, int > cmode, > int size, int flags) > { > ... > KASSERT((size > 0), ("%s: size <=3D 0", __func__)); > ... > alq->aq_buflen =3D size; > ... > alq->aq_entbuf =3D malloc(alq->aq_buflen, M_ALD, = M_WAITOK|M_ZERO);[/CODE] >=20 > The check on `count` being greater than or equal to 0 in `alq_open`, = and > the check for `size` being greater than 0 in `alq_open_flags` are = cute, but > they don't check for an overflow of `size*count`. >=20 > This code path is reachable in several places, such as > `sysctl_debug_ktr_alq_enable`: >=20 > [CODE]static int > sysctl_debug_ktr_alq_enable(SYSCTL_HANDLER_ARGS) > { > ... > error =3D alq_open(&ktr_alq, (const char *)ktr_alq_file, > req->td->td_ucred, ALQ_DEFAULT_CMODE, > sizeof(struct ktr_entry), ktr_alq_depth); > [/CODE] >=20 > With `ktr_alq_depth` being controllable from userland (but only as = root): >=20 > [CODE]SYSCTL_INT(_debug_ktr, OID_AUTO, alq_depth, CTLFLAG_RW, > &ktr_alq_depth, 0, "Number of items in the write buffer");[/CODE] >=20 > `sizeof(struct ktr_entry)` is 88 bytes. So theoretically if = `ktr_alq_depth` > is set to `48806447`, we'll get an allocation of `0x100000028`, which > truncates to 0x28 =3D 40 bytes. A heap overflow would then possible = when this > buffer is iterated over with `aq_entmax` and `aq_entlen`. These are all good finds. And they are all mitigable with the = MALLOC_OVERFLOW macro that was suggested earlier in this thread. Given the unique = demands of the kernel, why should that not be the preferred method of dealing with this = stuff? Warner > On Mon, Feb 1, 2016 at 7:57 PM, C Turt wrote: >=20 >> I've recently started browsing the OpenBSD kernel source code, and = have >> found the mallocarray function positively wonderful. I would like to >> discuss the possibility of getting this into FreeBSD kernel. >>=20 >> For example, many parts of kernel code in FreeBSD use something like >> malloc(xxx * sizeof(struct xxx)). If xxx is 64bit and controllable by = user, >> this allocation can easily overflow, resulting in a heap overflow = later on. >>=20 >> The mallocarray is a wrapper for malloc which can be used in this >> situations to detect an integer overflow before allocating: >>=20 >> /* >> * Copyright (c) 2008 Otto Moerbeek >> * >> * Permission to use, copy, modify, and distribute this software for = any >> * purpose with or without fee is hereby granted, provided that the = above >> * copyright notice and this permission notice appear in all copies. >> * >> * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL = WARRANTIES >> * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF >> * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE = FOR >> * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY = DAMAGES >> * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN = AN >> * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING = OUT OF >> * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. >> */ >>=20 >> /* >> * This is sqrt(SIZE_MAX+1), as s1*s2 <=3D SIZE_MAX >> * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW >> */ >> #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4)) >>=20 >> void * >> mallocarray(size_t nmemb, size_t size, int type, int flags) >> { >> if ((nmemb >=3D MUL_NO_OVERFLOW || size >=3D MUL_NO_OVERFLOW) && >> nmemb > 0 && SIZE_MAX / nmemb < size) { >> if (flags & M_CANFAIL) >> return (NULL); >> panic("mallocarray: overflow %zu * %zu", nmemb, size); >> } >> return (malloc(size * nmemb, type, flags)); >> } >>=20 >>=20 > _______________________________________________ > freebsd-arch@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to = "freebsd-arch-unsubscribe@freebsd.org" --Apple-Mail=_4D542F93-2169-43C9-A5D6-E0B42B162824 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJWvP70AAoJEGwc0Sh9sBEAvf0QAJLfmMmRyPhvo/oW+wfPRM8o PIZnzALIokBt3ol3m7JiYklSm36lqosqET+eZhJLvrtNAnRC2/uaJjJF1Fo56Bgo bY5eJO5NA3Of1pZspaQ9UmVtmn1i+u8eIdmBmq6FzZAXYGvY4qxEWtTPloEd560H ug9ONA/41zH+gq/oxrIl4G2X5JQtkjLM3iyrXu4zntORvHvU2iLvr25UMSCCVgdo Yd/wj+wJdTRLiemlewYAjBcZy4Y0fBh7IIYw+m1FS2/91T3gPWeOcH4vWcDhwKqM h5kOUk48nocEQFM2tcYVUbsg2K8srAmgyL0GlW05NHuyJ0PvTe4hgLu2HAl/Rz3I +P/VQVZ2UnZKpF87bhyKw46Pu/bGqVuq+V6VTutIv4HlvR4tYvK2aYErBj8qnMW0 YdPZU/2eAGH1GMGHArjX070f6HKilyr0AA9SvcO/ApzMRxRsxquPR+2fCrllhTML c3x0MpYN1JpKh12EetUqKsyxYWO3Q/6+1xTv8b2pja/PbQZlrzJbfkyuTDmkPWbm 8SvWc/ZBk4+wa/KkH3SZIomp47ifshzVOmR1PBKZUGFo6DIdcvHmwXzVHSfjaJK5 btZ2+224uVFeGsVHftenm9v5uo0tgTMO0+0xDnMkFAKrgjVKpONQvrUJWmSVsPUx nKOwv6AzIX6l4MOXz1E8 =lf02 -----END PGP SIGNATURE----- --Apple-Mail=_4D542F93-2169-43C9-A5D6-E0B42B162824-- From owner-freebsd-arch@freebsd.org Thu Feb 11 22:32:41 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AED0BAA40C8 for ; Thu, 11 Feb 2016 22:32:41 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7DB67EC2; Thu, 11 Feb 2016 22:32:41 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from toad2.stack.nl (toad2.stack.nl [IPv6:2001:610:1108:5010::161]) by mx1.stack.nl (Postfix) with ESMTP id 67D083592E1; Thu, 11 Feb 2016 23:32:38 +0100 (CET) Received: by toad2.stack.nl (Postfix, from userid 1677) id 30106892B3; Thu, 11 Feb 2016 23:32:38 +0100 (CET) Date: Thu, 11 Feb 2016 23:32:38 +0100 From: Jilles Tjoelker To: John Baldwin Cc: freebsd-arch@freebsd.org Subject: Re: Refactoring asynchronous I/O Message-ID: <20160211223238.GA73182@stack.nl> References: <2793494.0Z1kBV82mT@ralph.baldwin.cx> <20160210225844.GA89743@stack.nl> <1897781.JF7dmmKAT1@ralph.baldwin.cx> <2604669.yblYTI1T32@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2604669.yblYTI1T32@ralph.baldwin.cx> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Feb 2016 22:32:41 -0000 On Wed, Feb 10, 2016 at 04:21:30PM -0800, John Baldwin wrote: > I've pushed a new version with a vfs.aio.enable_unsafe sysctl > (defaults to off). If you think this is ok then I'll work on cleaning > up some of the module bits (removing the unload support mostly). I > figure marking the syscalls as STD and removing all the helper stuff > can be done as a followup commit to reduce extra noise in this diff > (which is large enough on its own). > https://github.com/freebsd/freebsd/compare/master...bsdjhb:aio_rework > (Also, for the inital commit I will leave out the socket protocol > hook. That can be added later.) Looks good to me, except that the error should be [ENOTSUP] instead of the overloaded [EINVAL]. -- Jilles Tjoelker From owner-freebsd-arch@freebsd.org Fri Feb 12 23:24:44 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EE28AA72E9 for ; Fri, 12 Feb 2016 23:24:44 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 415C199 for ; Fri, 12 Feb 2016 23:24:44 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: by mailman.ysv.freebsd.org (Postfix) id 3D6FAAA72E7; Fri, 12 Feb 2016 23:24:44 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3CEBFAA72E6; Fri, 12 Feb 2016 23:24:44 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from smtp.vangyzen.net (hotblack.vangyzen.net [IPv6:2607:fc50:1000:7400:216:3eff:fe72:314f]) by mx1.freebsd.org (Postfix) with ESMTP id 2989D97; Fri, 12 Feb 2016 23:24:44 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from sweettea.beer.town (unknown [76.164.8.130]) by smtp.vangyzen.net (Postfix) with ESMTPSA id A01C456497; Fri, 12 Feb 2016 17:24:43 -0600 (CST) Subject: Re: libthr shared locks References: <20151223172528.GT3625@kib.kiev.ua> From: Eric van Gyzen X-Enigmail-Draft-Status: N1110 To: Konstantin Belousov , threads@freebsd.org, arch@freebsd.org Message-ID: <56BE69B8.9020808@FreeBSD.org> Date: Fri, 12 Feb 2016 17:24:40 -0600 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <20151223172528.GT3625@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Feb 2016 23:24:44 -0000 On 12/23/2015 11:25, Konstantin Belousov wrote: > The implementation in the patch > https://www.kib.kiev.ua/kib/pshared/pshared.1.patch > gives shared mutexes, condvars, rwlocks and barriers. I reviewed everything except kern_umtx.c, which I plan to review on Monday. Here are my comments so far. * thr_mutex.c Thank you for converting some macros to functions. I find functions much cleaner and easier to read and debug. * thr_mutex.c line 116 The parentheses around (m) can be removed now. * thr_mutex.c lines 331-333 m->m_qe.tqe_prev = TAILQ_NEXT(last_priv, m_qe)-> m_qe.tqe_prev; This seems to read the m_qe.tqe_prev field from a shared mutex. Is that safe? It seems like a race. The following would seem more direct, avoiding the shared mutex: m->m_qe.tqe_prev = &TAILQ_NEXT(last_prev, m_qe); * thr_mutex.c line 354 *(q)->tqh_last = last_priv; This seems to modify the tqe_next field in a shared mutex. Is that safe? Furthermore, that mutex was/is the last on the list, but we seem to set its tqe_next pointer to an earlier element, creating a cycle in the list. * thr_mutex.c line 451 __pthread_mutex_trylock() calls __thr_pshared_offpage() twice [for pshared mutexes]. You could eliminate one call by moving mutex_trylock_common() inline. * thr_pshared.c line 165 res = NULL seems unnecessary. * thr_pshared.c In __thr_pshared_offpage(), can pshared_lookup() fail in the !doalloc case? pshared_hash seems to be an authority, not just an optimization. I ask so that I can understand the code and more effectively review it. In particular, if pshared_lookup() fails and UMTX_SHM_LOOKUP succeeds, is it possible to get multiple mappings for the same shm object? * thr_barrier.c line 110 if (bar == NULL) return (EFAULT); POSIX does not mention EFAULT. Should we return ENOMEM, or can we "extend" the standard? (Ditto for all other _init functions.) * thr_cond.c line 106 You could use cattr instead of the ?: expression. * thr_rwlock.c rwlock_init() assumes that __thr_pshared_offpage() does not fail. From owner-freebsd-arch@freebsd.org Sat Feb 13 09:18:29 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 53132AA6496 for ; Sat, 13 Feb 2016 09:18:29 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 30DD21AC1 for ; Sat, 13 Feb 2016 09:18:29 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id 3149EAA6495; Sat, 13 Feb 2016 09:18:29 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 16EA7AA6494 for ; Sat, 13 Feb 2016 09:18:29 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: from mail-wm0-x22b.google.com (mail-wm0-x22b.google.com [IPv6:2a00:1450:400c:c09::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 927781ABF for ; Sat, 13 Feb 2016 09:18:28 +0000 (UTC) (envelope-from cturt@hardenedbsd.org) Received: by mail-wm0-x22b.google.com with SMTP id g62so48314457wme.0 for ; Sat, 13 Feb 2016 01:18:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hardenedbsd-org.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=dGoeYjFzZ9fLbvh1RB3IVlQ5mtLZgVDqHbRO3z+eg58=; b=NMYlRSTjqBeGuJGR7n20QzDmXnsBlM48dm/5bwPHYKv8hB8x7jqe+sc6k+ngf4imNo Z7Hh/VnRpj89ClCXoaWihJNZVV2JuUXnG2+RpKxN+VmQhWZq39RLbghwtv0yQvLuP2ay h39AMOsXRWsr/7R8AEkq+Xnt8fByJo44KdB8F7UMOkvOihLnHP7EzjEcbn537+2FmpPd syQZWR1+tHb7SR5L//lGidIZ1fyqXFPkGeg2ZFNuqj51JuiBZ0CCXDUo8EwMoEUupWz6 KdLVcx/IICPkhVAQMzLoi3FX8810ayzXuv5l3adgfMoavrEO+DAQFbRvHy5qddvFuYfb ilSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=dGoeYjFzZ9fLbvh1RB3IVlQ5mtLZgVDqHbRO3z+eg58=; b=caOXMT7HnhbKV5y+6OIAFuYPsvk/9qeGzZ374rb4YtACc/dVTc3eHY5PRA8GyVPUfd RK0E4+BMN0fBfjyYAg1LLB7KDHzYsEABcWQsCn6vYYIgW0L3XSPauPSGkoBC2zyhFCCQ 0jLOn+oI3z0Vyu0P+ok4A1FIkN2d8MtZFq47g8J1Drln0mtaR8X69zuSfwoIWIoA7WPb Hu0EjtPM/1bEE6v8JQr99HbGgJ2Ue4KnySsVTrRB7wWhjozpqxK0ZIa8cYXWMtzoUjz5 3lTqikxAMkS+EINHePV0ZRi+OrD8JNcQt81hnwRRRIUwaxlKG1/siDEvFDHWLmZvqia7 gMFw== X-Gm-Message-State: AG10YOQSof9nDiz5LZmAq6/jQRx3/CwsgJ+bwm1rbkN66mVm1pQ4xG2DWOmKiGpbB6xgsFL9TXQi4tHQw/v2nA0G MIME-Version: 1.0 X-Received: by 10.194.176.170 with SMTP id cj10mr6013751wjc.165.1455355106154; Sat, 13 Feb 2016 01:18:26 -0800 (PST) Received: by 10.27.157.18 with HTTP; Sat, 13 Feb 2016 01:18:26 -0800 (PST) In-Reply-To: References: Date: Sat, 13 Feb 2016 09:18:26 +0000 Message-ID: Subject: Re: OpenBSD mallocarray From: C Turt To: Warner Losh Cc: arch@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Feb 2016 09:18:29 -0000 "Except that you=E2=80=99d need to change all the code that was imported in= to the kernel to use mallocarray. The code Sony imported didn=E2=80=99t have i= t to start with." Although most of the PS4 dynamic linker is taken from userland FreeBSD rtld-elf, the particular system call which contains the overflow, sys_dynlib_prepare_dlclose, is a Sony extension which was not imported and so would have been written from scratch with the intention of being kernel code. Hence, I continue to believe that if it was common practice to use mallocarray in the kernel, it would likely have been used here. "These are all good finds. And they are all mitigable with the MALLOC_OVERFLOW macro that was suggested earlier in this thread. Given the unique demands of the kernel, why should that not be the preferred method of dealing with this stuff?" You are absolutely right that macros like MALLOC_OVERFLOW should be the preferred way of catching integer overflows, and dealing with them appropriately according to the unique context where the overflow occurs. My intention isn't to remove checks and rely on mallocarray to deal with them, it is to have both, such that only if the initial checks are faulty they will be caught in mallocarray, where the system should then intentionally chose to panic rather than continue, leading to a probable heap overflow. The problem I have is that certain parts of FreeBSD kernel flat out don't live up to FreeBSD's reputation of having clean code written with security in mind. Look at `huft_build` from `sys/kern/inflate.c`, this has to be the ugliest function in the whole of FreeBSD kernel, and there is an allocation with the described pattern: if ((q =3D (struct huft *) malloc((z + 1) * sizeof(struct huft), M_GZIP, M_WAITOK)) =3D=3D (struct huft *) NULL) { What's more, this code has a history of containing vulnerabilities ( https://www.freebsd.org/security/advisories/FreeBSD-SA-06:21.gzip.asc and CVE-2009-2624). `z + 1` should be, and probably is, guaranteed by this code to be within appropriate bounds. However, my proposal is that pieces of code like this should be replaced with `mallocarray` so that there is absolutely no way that this can ever overflow from the multiplication at least. Although there are many other ways that an allocation like this could potentially be vulnerable: overflowing from the `+ 1`, or the count being raced attacked if it wasn't a stack variable, for example, I believe that using `mallocarray` would be an excellent start in pro-actively increasing the security and code quality of the FreeBSD kernel. On Thu, Feb 11, 2016 at 9:36 PM, Warner Losh wrote: > > > On Feb 11, 2016, at 2:21 PM, C Turt wrote: > > > > I just wanted to post some real world examples of bugs which could be > > mitigated with `mallocarray` to attract more interest. > > Let=E2=80=99s play devil=E2=80=99s advocate: since you have to make code = changes, how > would mallocarray() be superior to the various MALLOC_OVERFLOW > macro suggestions from earlier in the thread? Given that kernel code is > somewhat different in what it needs to support? > > > My most meritable example of a real world attack from this behaviour > would > > be the sys_dynlib_prepare_dlclose kernel exploit for PS4 (PS4 OS is bas= ed > > on FreeBSD 9.0). You may read my write-up of this exploit here: > > http://cturt.github.io/dlclose-overflow.html > > > > The significance of this example is that if a `mallocarray` wrapper was > > available, and used here, the bug would not have been exploitable, > because > > it would have intentionally panicked instead of continuing with an unde= r > > allocated buffer. > > Except that you=E2=80=99d need to change all the code that was imported i= nto > the kernel to use mallocarray. The code Sony imported didn=E2=80=99t have= it > to start with. > > > You may think that this is clearly Sony's fault for not checking the > count > > at all, and that FreeBSD kernel code would never have a bug like this, > > which is why I would like to mention that similar overflows can be > possible > > even if there initially appear to be sufficient bound checks performed. > > > > There are several pieces of vulnerable code present in FreeBSD kernel > > (albeit most of them are triggerable only as root so are not critical), > > however I will use the file `/sys/kern/kern_alq.c` to demonstrate. Ther= e > > are some checks on counts and sizes, but they are insufficient: > > > > [CODE]int > > alq_open(struct alq **alqp, const char *file, struct ucred *cred, int > cmode, > > int size, int count) > > { > > int ret; > > > > KASSERT((count >=3D 0), ("%s: count < 0", __func__)); > > > > if (count > 0) { > > if ((ret =3D alq_open_flags(alqp, file, cred, cmode, > > size*count, 0)) =3D=3D 0) { > > (*alqp)->aq_flags |=3D AQ_LEGACY; > > (*alqp)->aq_entmax =3D count; > > (*alqp)->aq_entlen =3D size; > > } > > > > ... > > > > int > > alq_open_flags(struct alq **alqp, const char *file, struct ucred *cred, > int > > cmode, > > int size, int flags) > > { > > ... > > KASSERT((size > 0), ("%s: size <=3D 0", __func__)); > > ... > > alq->aq_buflen =3D size; > > ... > > alq->aq_entbuf =3D malloc(alq->aq_buflen, M_ALD, M_WAITOK|M_ZERO);[/C= ODE] > > > > The check on `count` being greater than or equal to 0 in `alq_open`, an= d > > the check for `size` being greater than 0 in `alq_open_flags` are cute, > but > > they don't check for an overflow of `size*count`. > > > > This code path is reachable in several places, such as > > `sysctl_debug_ktr_alq_enable`: > > > > [CODE]static int > > sysctl_debug_ktr_alq_enable(SYSCTL_HANDLER_ARGS) > > { > > ... > > error =3D alq_open(&ktr_alq, (const char *)ktr_alq_file, > > req->td->td_ucred, ALQ_DEFAULT_CMODE, > > sizeof(struct ktr_entry), ktr_alq_depth); > > [/CODE] > > > > With `ktr_alq_depth` being controllable from userland (but only as root= ): > > > > [CODE]SYSCTL_INT(_debug_ktr, OID_AUTO, alq_depth, CTLFLAG_RW, > > &ktr_alq_depth, 0, "Number of items in the write buffer");[/CODE] > > > > `sizeof(struct ktr_entry)` is 88 bytes. So theoretically if > `ktr_alq_depth` > > is set to `48806447`, we'll get an allocation of `0x100000028`, which > > truncates to 0x28 =3D 40 bytes. A heap overflow would then possible whe= n > this > > buffer is iterated over with `aq_entmax` and `aq_entlen`. > > These are all good finds. And they are all mitigable with the > MALLOC_OVERFLOW > macro that was suggested earlier in this thread. Given the unique demands > of the > kernel, why should that not be the preferred method of dealing with this > stuff? > > Warner > > > > On Mon, Feb 1, 2016 at 7:57 PM, C Turt wrote: > > > >> I've recently started browsing the OpenBSD kernel source code, and hav= e > >> found the mallocarray function positively wonderful. I would like to > >> discuss the possibility of getting this into FreeBSD kernel. > >> > >> For example, many parts of kernel code in FreeBSD use something like > >> malloc(xxx * sizeof(struct xxx)). If xxx is 64bit and controllable by > user, > >> this allocation can easily overflow, resulting in a heap overflow late= r > on. > >> > >> The mallocarray is a wrapper for malloc which can be used in this > >> situations to detect an integer overflow before allocating: > >> > >> /* > >> * Copyright (c) 2008 Otto Moerbeek > >> * > >> * Permission to use, copy, modify, and distribute this software for an= y > >> * purpose with or without fee is hereby granted, provided that the abo= ve > >> * copyright notice and this permission notice appear in all copies. > >> * > >> * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL > WARRANTIES > >> * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF > >> * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE > FOR > >> * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAG= ES > >> * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN A= N > >> * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT > OF > >> * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. > >> */ > >> > >> /* > >> * This is sqrt(SIZE_MAX+1), as s1*s2 <=3D SIZE_MAX > >> * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW > >> */ > >> #define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4)) > >> > >> void * > >> mallocarray(size_t nmemb, size_t size, int type, int flags) > >> { > >> if ((nmemb >=3D MUL_NO_OVERFLOW || size >=3D MUL_NO_OVERFLOW) && > >> nmemb > 0 && SIZE_MAX / nmemb < size) { > >> if (flags & M_CANFAIL) > >> return (NULL); > >> panic("mallocarray: overflow %zu * %zu", nmemb, size); > >> } > >> return (malloc(size * nmemb, type, flags)); > >> } > >> > >> > > _______________________________________________ > > freebsd-arch@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-arch > > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > > From owner-freebsd-arch@freebsd.org Sat Feb 13 13:21:41 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 64C8AAA7805 for ; Sat, 13 Feb 2016 13:21:41 +0000 (UTC) (envelope-from ed@nuxi.nl) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 49C2E1E13 for ; Sat, 13 Feb 2016 13:21:41 +0000 (UTC) (envelope-from ed@nuxi.nl) Received: by mailman.ysv.freebsd.org (Postfix) id 4773AAA7804; Sat, 13 Feb 2016 13:21:41 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2E241AA7802 for ; Sat, 13 Feb 2016 13:21:41 +0000 (UTC) (envelope-from ed@nuxi.nl) Received: from mail-yk0-x231.google.com (mail-yk0-x231.google.com [IPv6:2607:f8b0:4002:c07::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EFDFC1E12 for ; Sat, 13 Feb 2016 13:21:40 +0000 (UTC) (envelope-from ed@nuxi.nl) Received: by mail-yk0-x231.google.com with SMTP id z13so44660296ykd.0 for ; Sat, 13 Feb 2016 05:21:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nuxi-nl.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=DM6qrHkMLMSY/bHjaFfY8CJmwa5TUBhW462BcWjed8M=; b=l8ZwwpBHoTtt3aR+n7xIzzuWVzD2faEp0f7vnzE/RzH7NeuU2Z7KBfLB/wjNpH9wHd mFWr0AICqZAtqbO2Q8mP+fpu1wvA8Nl1YVCO3Q0Nn32jz++XxmhbLy/UkqtvoXe84zQg f0fsziAhiYi4XfuYz8PZbFRpLEHqMfccxinRyIbS0QTMbn8F4NuePP2l/o3+04SmoQz2 os0mr+VqLSPiYY3LUp0Sev9bwoJSVWQlQeNp+o9Pk94jjdwmvEhhDeuYvKftWydFiujO 3EbJuPgv5VWMRjoFxSNKH7jo+KqvNGfOrAujVw9ynFngu1Si+qqardYr0mOahT8rOvLR 8MBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=DM6qrHkMLMSY/bHjaFfY8CJmwa5TUBhW462BcWjed8M=; b=JDCG++jmqHPgACg/gvihJ33YhvmJDYengzgiWL8cerMuI2mOCpzwJeTpHVS16h/RjK 33J7BDdFfKLeqVCTJF0LqD433U7U2qdvqFBRUjKuRKgsbKRWQy0bs1xZoBcAG0pXEbg0 khgO+/AWAuktSM0i+NGflDgqWyeJKalmhBCNYNm9iHNjOLPIEqyfXRCLdSWI90wZM4eF di2Nl+lpjLdjDGkAxiT7D7xg2/Iv1pTnoqATsAPvcWWukAkkgf395vKxBfIyQHVi3Hn7 EfgawuAbFAcc+xVTr3Xjb5wcfCHjJBsW0Yxau1knWGIAAQ5jb6f6rE7LbjBFZ7Ofajzt qpeg== X-Gm-Message-State: AG10YOTUzAx/RYKOdEzViuSrgfI9RJ47weCxr3Er1RYDArk77vp6U+JRRDNMS5tkSAW0XnDbFG9E5GKSUbNQGA== MIME-Version: 1.0 X-Received: by 10.37.81.137 with SMTP id f131mr3962636ybb.54.1455369699470; Sat, 13 Feb 2016 05:21:39 -0800 (PST) Received: by 10.129.148.6 with HTTP; Sat, 13 Feb 2016 05:21:39 -0800 (PST) In-Reply-To: References: Date: Sat, 13 Feb 2016 14:21:39 +0100 Message-ID: Subject: Re: OpenBSD mallocarray From: Ed Schouten To: C Turt Cc: arch@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Feb 2016 13:21:41 -0000 Hi there, 2016-02-01 20:57 GMT+01:00 C Turt : > if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) && > nmemb > 0 && SIZE_MAX / nmemb < size) { In my opinion functions like these are good additions, as long as we make sure that we stop importing garbage expressions like the one above. It's already bad enough that we copy-pasted this gobbledygook into fread(), fwrite(), calloc(), reallocarray(), etc. Both the latest versions of Clang and GCC support the following builtins: bool __builtin_add_overflow(type1 a, type2 b, type3 *res); bool __builtin_sub_overflow(type1 a, type2 b, type3 *res); bool __builtin_mul_overflow(type1 a, type2 b, type3 *res); These functions perform addition, subtraction and multiplication, returning whether overflow has occurred in the process. This is a lot more efficient (and readable) than the expression above, as it simply uses the CPU's mul instruction, followed by a jump-on-overflow. GCC 4.2.1 doesn't support these builtins, but they can easily be emulated by using static inline functions that use the code above. If we want them to be type generic, we can use 's __generic(), which either expands to C11's _Generic() or falls back to GCC's __builtin_types_compatible_p()/__builtin_choose_expr(). I'd say it would make a lot of sense to add a new header file, e.g. , that adds compiler-independent wrappers for these builtins: #if recent version of GCC/Clang #define add_overflow(a, b, res) __builtin_add_overflow(a, b, res) #else #define add_overflow(a, b, res) (__generic(*(res), unsigned int, ..., ...)(a, b, res)) #endif -- Ed Schouten Nuxi, 's-Hertogenbosch, the Netherlands KvK-nr.: 62051717 From owner-freebsd-arch@freebsd.org Sat Feb 13 14:38:27 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 358C7AA7677 for ; Sat, 13 Feb 2016 14:38:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 225AC1F91 for ; Sat, 13 Feb 2016 14:38:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 1DFEBAA7675; Sat, 13 Feb 2016 14:38:27 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1D654AA7674; Sat, 13 Feb 2016 14:38:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BC6F11F90; Sat, 13 Feb 2016 14:38:26 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u1DEcFKg073947 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 13 Feb 2016 16:38:16 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u1DEcFKg073947 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u1DEcFBY073946; Sat, 13 Feb 2016 16:38:15 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 13 Feb 2016 16:38:15 +0200 From: Konstantin Belousov To: Eric van Gyzen Cc: threads@freebsd.org, arch@freebsd.org Subject: Re: libthr shared locks Message-ID: <20160213143815.GB91220@kib.kiev.ua> References: <20151223172528.GT3625@kib.kiev.ua> <56BE69B8.9020808@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56BE69B8.9020808@FreeBSD.org> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Feb 2016 14:38:27 -0000 Thank you for the review. On Fri, Feb 12, 2016 at 05:24:40PM -0600, Eric van Gyzen wrote: > * thr_mutex.c line 116 > > The parentheses around (m) can be removed now. Done. > > * thr_mutex.c lines 331-333 > > m->m_qe.tqe_prev = > TAILQ_NEXT(last_priv, m_qe)-> > m_qe.tqe_prev; > > This seems to read the m_qe.tqe_prev field from a shared mutex. Is that > safe? It seems like a race. The following would seem more direct, > avoiding the shared mutex: > > m->m_qe.tqe_prev = &TAILQ_NEXT(last_prev, m_qe); This is indeed racy, relying on the parent process not unlocking the shared mutexes. But after your note, I think that the whole list iteration is unsafe, because of the same unlocking in parent. So in fact I have to return to what I did in the previous version of the patch, where I kept two queues for each type of the mutexes, one total, and one private. The private queue keeps the order of the total, so that reinitialization of the total queue on the fork is correct for ceiling ordering. > * thr_mutex.c line 354 > > *(q)->tqh_last = last_priv; > > This seems to modify the tqe_next field in a shared mutex. Is that > safe? Furthermore, that mutex was/is the last on the list, but we seem > to set its tqe_next pointer to an earlier element, creating a cycle in > the list. This code is gone due to the previous note. > > * thr_mutex.c line 451 > > __pthread_mutex_trylock() calls __thr_pshared_offpage() twice [for > pshared mutexes]. You could eliminate one call by moving > mutex_trylock_common() inline. I see. In fact, I really wanted to eliminate the CHECK_AND_INIT_MUTEX. See my attempt in the updated patch. > > * thr_pshared.c line 165 > > res = NULL seems unnecessary. Done. > > * thr_pshared.c > > In __thr_pshared_offpage(), can pshared_lookup() fail in the !doalloc > case? pshared_hash seems to be an authority, not just an optimization. > I ask so that I can understand the code and more effectively review it. > In particular, if pshared_lookup() fails and UMTX_SHM_LOOKUP succeeds, > is it possible to get multiple mappings for the same shm object? Do you mean, is it possible (and if yes, is it harmful) to have several virtual addresses in one process for the same key ? I think there is a bug in pshared_insert() where I preferred new val over the hashed val (mapping address). In the situation where there are two threads trying to lock the same object, it may cause first thread to operate on unmapped address. The scenario is: - both threads do not find the key in hash; - first thread performs pshared_insert() and returns the address; - second thread performs pshared_insert() and replaces the address in hash, also invalidating the address still used by first thread. I changed the pshared_insert() to keep the existing hash value, and unmap the new val. That said, I think it is probably not very harmful to have different callers to operate on different mappings for the same key (of course, the backing page must be shared). I can only think of the problems due to locked mutex list manipulation functions failing if the address of element changed. I said that this should not be very harmful since I suspect that only list invariants checks would fail, and not actual removal, but I did not checked it. For now, I think that the invariant I have to ensure is that calls to lock and unlock from the same thread for the same key get the same offpage virtual address. > > * thr_barrier.c line 110 > > if (bar == NULL) > return (EFAULT); > > POSIX does not mention EFAULT. Should we return ENOMEM, or can we > "extend" the standard? (Ditto for all other _init functions.) I think EFAULT is permitted by the 'undefined behaviour' clause, since the primary cause for the offpage allocation failure is wrong address passed to the __thr_pshared_offpage(). In other words, if an implementation did not used offpage, but directly write something to the *m memory, it would get SIGSEGV. As noted above, malloc() failure would also lead to EFAULT (and this is what probably caused your question), but I would consider this improbable, while invalid address passed to the init function a more likely cause. I do not want to complicate code to distinguish the cases. > > * thr_cond.c line 106 > > You could use cattr instead of the ?: expression. Done. > > * thr_rwlock.c > > rwlock_init() assumes that __thr_pshared_offpage() does not fail. Done, I return EFAULT in case offpage allocation fails. If you object still against other EFAULTs, I would change all of them to be consistent. Updated patch is at https://www.kib.kiev.ua/kib/pshared/pshared.2.patch From owner-freebsd-arch@freebsd.org Sat Feb 13 16:56:31 2016 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5C7E6AA676E for ; Sat, 13 Feb 2016 16:56:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 4254D109F for ; Sat, 13 Feb 2016 16:56:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 40163AA676D; Sat, 13 Feb 2016 16:56:31 +0000 (UTC) Delivered-To: arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24ECCAA676C for ; Sat, 13 Feb 2016 16:56:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id A7282109D for ; Sat, 13 Feb 2016 16:56:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c110-21-41-193.carlnfd1.nsw.optusnet.com.au (c110-21-41-193.carlnfd1.nsw.optusnet.com.au [110.21.41.193]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 8A348426EB7; Sun, 14 Feb 2016 03:56:21 +1100 (AEDT) Date: Sun, 14 Feb 2016 03:56:20 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Ed Schouten cc: C Turt , arch@freebsd.org Subject: Re: OpenBSD mallocarray In-Reply-To: Message-ID: <20160214025715.A918@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=cK4dyQqN c=1 sm=1 tr=0 a=73JWPhLeruqQCjN69UNZtQ==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=ypVJL4-jAAAA:8 a=8YkvNc5hkbnPY-hunSYA:9 a=U5Y3M-fodkWOT-C-:21 a=05EgSaJtlaSRRacC:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Feb 2016 16:56:31 -0000 On Sat, 13 Feb 2016, Ed Schouten wrote: > 2016-02-01 20:57 GMT+01:00 C Turt : >> if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) && >> nmemb > 0 && SIZE_MAX / nmemb < size) { > > In my opinion functions like these are good additions, as long as we > make sure that we stop importing garbage expressions like the one > above. It's already bad enough that we copy-pasted this gobbledygook > into fread(), fwrite(), calloc(), reallocarray(), etc. This is a normal well written C expression. It uses a possibly- excessive micro-optimization to avoid a possibly-slow division. It is less general then my macro: #define MALLOC_ARRAY_CHECK(num, size, limit) \ ((size) == 0 || limit / (size) >= (num)) My macro shouldn't have MALLOC in its name, since its only relation to malloc() is that it may be used for bounds checking related to using malloc(). > Both the latest versions of Clang and GCC support the following builtins: > > bool __builtin_add_overflow(type1 a, type2 b, type3 *res); > bool __builtin_sub_overflow(type1 a, type2 b, type3 *res); > bool __builtin_mul_overflow(type1 a, type2 b, type3 *res); > > These functions perform addition, subtraction and multiplication, > returning whether overflow has occurred in the process. This is a lot > more efficient (and readable) than the expression above, as it simply > uses the CPU's mul instruction, followed by a jump-on-overflow. Using these functions is gives undefined behaviour. They are in the implementation namespace and are not documented in any installed documentation for clang (maybe in info for gcc?). This is a slightly less efficient and slightly less readable than the first expression above if the implementation is as you describe. The micro-optimization in the first expression does work and results in no division in most cases. It usually takes 2 comparisons and 2 branches instead of 1 multiplication, 1 comparison and 1 branch. The extra comparison is probably faster than the multiplication unless multiplication takes only 1 cycle. The compiler might actually actually convert each version to the other if the version with the multiplication is faster. Or better, to the in-between version with 2 comparisons and the division reduced to multiplications. My macro is more general than this. It allows an arbitrary limit, but the builtins only check for the overflow threshold. Checks for malloc() in the kernel always need to use a limit much smaller than the overflow threshold. The version in fread.c actually has a further micro-optimizations which you don't like and style bugs which I don't like: X /* X * Check for integer overflow. As an optimization, first check that X * at least one of {count, size} is at least 2^16, since if both X * values are less than that, their product can't possible overflow X * (size_t is always at least 32 bits on FreeBSD). X */ X if (((count | size) > 0xFFFF) && X (count > SIZE_MAX / size)) { (size != 0 has already been checked for.) This manually reduces the 2 comparisons and 2 branches to 1 OR, 1 comparison and 1 branch. Well, I don't like this either. The compiler can do this strength reduction much more easily than the mul/div ones if it (either the optimization or the compiler) is any good. It micro-optimization gives one of the style bugs (a verbose comment to explain it, and duplicating this comment. The other style bugs are excessive parentheses and unnecessary line splitting. Bruce