From owner-freebsd-hackers@FreeBSD.ORG Sun Nov 17 23:09:12 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ADB3CC9B; Sun, 17 Nov 2013 23:09:12 +0000 (UTC) Received: from mail-ea0-x234.google.com (mail-ea0-x234.google.com [IPv6:2a00:1450:4013:c01::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EDE892B56; Sun, 17 Nov 2013 23:09:11 +0000 (UTC) Received: by mail-ea0-f180.google.com with SMTP id f15so268946eak.25 for ; Sun, 17 Nov 2013 15:09:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=9JhQdvAs5+jD0Qs/gm3vGXoKD2I5ED1Rr63BFPP9G4Y=; b=jA+duaKysm+lYhos+xqy6q7TTtjxwWUNK27Tvv0L9z1x2oVrOqabcb7r9dvfFrtE/T 9pNAQsggWNMMzYyHtQ9wnnQtPjQRVSN4QNDNoYOD8SaUo2W0Ee9EADPNRwfgEyUmRbtg +I/v3XDUOiHvFdHmYNq58tdiwkIN9vlAJYLUTtdH2IyN+QE7lPv7buRn1sRibDmei8VA lGkkOnQ2liEFW9hvXI2RGChoKHWLCASs8t8s0BKD7Lbg41r+BoQ+mIUknIwWD1XEs9Q9 PoUjt7b+6yQ4QQnnxwfSSgA91s02wTaxRxm68QnfEUMVzGmtC5f/gqL9qe/KJOUuX3aW WQ4g== X-Received: by 10.15.65.11 with SMTP id p11mr345117eex.49.1384729749539; Sun, 17 Nov 2013 15:09:09 -0800 (PST) Received: from mavbook.mavhome.dp.ua ([178.137.150.35]) by mx.google.com with ESMTPSA id o47sm31544449eem.21.2013.11.17.15.09.07 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 17 Nov 2013 15:09:08 -0800 (PST) Sender: Alexander Motin Message-ID: <52894C92.60905@FreeBSD.org> Date: Mon, 18 Nov 2013 01:09:06 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" Subject: UMA cache 
back pressure Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Nov 2013 23:09:12 -0000

Hi.

I've created a patch, based on earlier work by avg@, to add back pressure to UMA allocation caches. The problem of physical memory or KVA exhaustion has existed there for many years, and fixing it is now quite critical for improving system performance while keeping stability. Changes made to memory allocation in recent years improved the situation, but haven't fixed it completely. My patch solves the remaining problems from two sides: a) reducing bucket sizes every time the system detects a low-memory condition; and b) as a last-resort mechanism for a very-low-memory condition, cycling over all CPUs to purge their per-CPU UMA caches. The benefit of this approach is the absence of any additional hard-coded limits on cache sizes -- they are self-tuned, based on load and memory pressure.

With this change I believe it should be safe enough to enable UMA allocation caches in ZFS via the vfs.zfs.zio.use_uma tunable (at least for amd64). I did many tests on a machine with 24 logical cores (and as a result strong allocation cache effects), and can say that with 40GB RAM, using UMA caches, as allowed by this change, doubles the results of the SPEC NFS benchmark on a ZFS pool of several SSDs. To test system stability I ran the same test with physical memory limited to just 2GB; the system survived that, and even showed results 1.5 times better than with just the last-resort measures of b). In both cases tools/umastat no longer shows unbounded UMA cache growth, which makes me believe in the viability of this approach for longer runs.

I would like to hear some comments about that:
http://people.freebsd.org/~mav/uma_pressure.patch

Thank you.
-- Alexander Motin From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 08:41:50 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E87C359F; Mon, 18 Nov 2013 08:41:49 +0000 (UTC) Received: from mail-qe0-x229.google.com (mail-qe0-x229.google.com [IPv6:2607:f8b0:400d:c02::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 8C13E2719; Mon, 18 Nov 2013 08:41:49 +0000 (UTC) Received: by mail-qe0-f41.google.com with SMTP id x7so3878272qeu.14 for ; Mon, 18 Nov 2013 00:41:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=G+T0OAuaFI/yyb4AkjROiIwYoZzdv9PevOuywYhjg+I=; b=f/wS6uq+Gr/7F3GSlqBFqzGBiHPw6JrwNTDOX2Hm2n2lVwEsTEFZPVON3B9JERZHJY vtI8sdfLATQreDpQTAQmq2G6g44oNezeACdLN89WV4QNJLSGByp121OXnUeVcN4THkuA tfdYnDsuk61qKwctA4Bn0Zy64OZZnhV7gZfLrM56nqRqF4XUClnsqIhIPV0L1H+AC3Wi /A9EzZuiEXQOB2l4j9yf476DXwTLEvtR8nFmo1hp3zoyibiPT1w3K6c/XnngZcEKd1Is qU83sCcJJcqvmaUZwh6elbJ1U4M/e4bxeSPHvnwpgmS6ZQbKFAD09k4qwETE4XKJ8yaG pTog== MIME-Version: 1.0 X-Received: by 10.224.64.200 with SMTP id f8mr32262534qai.55.1384764108825; Mon, 18 Nov 2013 00:41:48 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.207.66 with HTTP; Mon, 18 Nov 2013 00:41:48 -0800 (PST) In-Reply-To: <52894C92.60905@FreeBSD.org> References: <52894C92.60905@FreeBSD.org> Date: Mon, 18 Nov 2013 00:41:48 -0800 X-Google-Sender-Auth: NbolgVcs7EvAmjwQ51Qzypcoosk Message-ID: Subject: Re: UMA cache back pressure From: Adrian Chadd To: Alexander Motin Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 08:41:50 -0000 Hi! Your patch does three things: * adds a couple new buckets; * reduces some lock contention * does the aggressive backpressure. So, do you get any benefits from just the first one, or first two? -adrian On 17 November 2013 15:09, Alexander Motin wrote: > Hi. > > I've created patch, based on earlier work of avg@, to add back pressure to > UMA allocation caches. The problem of physical memory or KVA exhaustion > existed there for many years and it is quite critical now for improving > systems performance while keeping stability. Changes done in memory > allocation last years improved situation. but haven't fixed completely. My > patch solves remaining problems from two sides: a) reducing bucket sizes > every time system detects low memory condition; and b) as last-resort > mechanism for very low memory condition, it cycling over all CPUs to purge > their per-CPU UMA caches. Benefit of this approach is in absence of any > additional hard-coded limits on cache sizes -- they are self-tuned, based on > load and memory pressure. > > With this change I believe it should be safe enough to enable UMA allocation > caches in ZFS via vfs.zfs.zio.use_uma tunable (at least for amd64). I did > many tests on machine with 24 logical cores (and as result strong allocation > cache effects), and can say that with 40GB RAM using UMA caches, allowed by > this change, by two times increases results of SPEC NFS benchmark on ZFS > pool of several SSDs. To test system stability I've run the same test with > physical memory limited to just 2GB and system successfully survived that, > and even showed results 1.5 times better then with just last resort measures > of b). 
In both cases tools/umastat no longer shows unbound UMA cache growth, > that makes me believe in viability of this approach for longer runs. > > I would like to hear some comments about that: > http://people.freebsd.org/~mav/uma_pressure.patch > > Thank you. > > -- > Alexander Motin > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 09:21:02 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C1A43373; Mon, 18 Nov 2013 09:21:02 +0000 (UTC) Received: from mail-ee0-x230.google.com (mail-ee0-x230.google.com [IPv6:2a00:1450:4013:c00::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0DA4029A0; Mon, 18 Nov 2013 09:21:01 +0000 (UTC) Received: by mail-ee0-f48.google.com with SMTP id e49so2313646eek.21 for ; Mon, 18 Nov 2013 01:21:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=WdHtD+khd2d/VfhXzRyplRr7wBb2XCgdBHz2xxlmbds=; b=ZGRKncfEbYrBaKzzzAZJBrXQ0K4IvvEJOSTHl0p/jFpte5nRQj9n2wd+67+nrfaEmG dIYa8F0Uu+jGthVNW8IUyza8LQ53isJkJRGd0ZViUACzV1Pmex4NQWkZmtUxODoKpDPB 3wP+WVd+Tnwe48dJolZrL40Ufa5oe81wINmYqlC2uIqoIvd6/GK0CIxYmYxyo7hEvDD1 qCMDIAo0rm5NCWMBdVm6RfXgsGMDAy1EmHF2sqNxw4bBbzOJ5+E9rFmrxHSenf9/tskw x0Esueh6EnGKWqWhj0j29mjn82NEbg2j4CPuxK3Dj26k/gSaKno98MPsXqRyYrFKes31 oMDA== X-Received: by 10.14.108.9 with SMTP id p9mr20316683eeg.8.1384766460341; Mon, 18 Nov 2013 
01:21:00 -0800 (PST) Received: from mavbook.mavhome.dp.ua ([178.137.150.35]) by mx.google.com with ESMTPSA id s3sm35801312eeo.3.2013.11.18.01.20.58 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 18 Nov 2013 01:20:59 -0800 (PST) Sender: Alexander Motin Message-ID: <5289DBF9.80004@FreeBSD.org> Date: Mon, 18 Nov 2013 11:20:57 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Adrian Chadd Subject: Re: UMA cache back pressure References: <52894C92.60905@FreeBSD.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 09:21:02 -0000

On 18.11.2013 10:41, Adrian Chadd wrote:
> Your patch does three things:
>
> * adds a couple new buckets;

These new buckets make bucket size self-tuning softer and more precise. Without them there are buckets for 1, 5, 13, 29, ... items. While at bigger sizes a difference of about 2x is fine, at the smallest ones it is 5x and 2.6x respectively. The new buckets make that sequence look like 1, 3, 5, 9, 13, 29, reducing the jumps between steps and making the algorithm work more smoothly, allocating and freeing memory in better-fitting chunks. Otherwise there is quite a big gap between allocating 128K and 5x128K of RAM at once.

> * reduces some lock contention

More precisely, the patch adds a check for congestion on free to grow bucket sizes, the same as on allocation. As a consequence that indeed should reduce lock congestion, but I don't have specific numbers. All I see is that VM and UMA mutexes no longer appear at the top of the profiles after all these changes.
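As a rough illustration of the bucket-size steps discussed above, the following sketch computes the growth factor between consecutive bucket sizes for the old sequence (1, 5, 13, 29) and the extended one (1, 3, 5, 9, 13, 29), and how much more item memory one step up can hold in a 128K zone. The function and variable names are made up for this example and are not taken from the patch; only the item counts and the 128K item size come from the message.

```python
# Item counts per bucket, as quoted in the message above.
OLD_BUCKET_ITEMS = [1, 5, 13, 29]
NEW_BUCKET_ITEMS = [1, 3, 5, 9, 13, 29]

ITEM_KB = 128  # item size of the 128K zone used as the example in the thread


def step_ratios(sizes):
    """Return the growth factor between each pair of consecutive bucket sizes."""
    return [round(b / a, 2) for a, b in zip(sizes, sizes[1:])]


def step_growth_kb(sizes, item_kb=ITEM_KB):
    """Return how many extra KB of items a bucket can hold after each step up."""
    return [(b - a) * item_kb for a, b in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    print("old ratios:", step_ratios(OLD_BUCKET_ITEMS))
    print("new ratios:", step_ratios(NEW_BUCKET_ITEMS))
    print("old step growth (KB):", step_growth_kb(OLD_BUCKET_ITEMS))
    print("new step growth (KB):", step_growth_kb(NEW_BUCKET_ITEMS))
```

With the extra buckets, the largest ratio at the small end drops from 5x to 3x, and the first step up adds room for 256K of items instead of 512K.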
* does soft back pressure

In this list you have missed a small but major point of the patch -- we should prevent problems, not just solve them. As I wrote in the original email, this specific change showed me a 1.5x performance improvement in low-memory conditions. As I understand it, that happened because the VM no longer has to repeatedly allocate and free hugely oversized buckets of 10-15 * 128K.

> * does the aggressive backpressure.

After all of the above, that is mostly just a safety belt. With 40GB RAM that code was triggered only a couple of times during a full hour of testing with debug logging inserted there. On a machine with 2GB RAM it is triggered quite regularly, and that is probably unavoidable, since even with the lowest bucket size of one item, 24 CPUs mean 48 cache buckets, i.e. up to 6MB of otherwise unreleasable memory for a single 128K zone.

> So, do you get any benefits from just the first one, or first two?

I don't see much reason to handle that in pieces. As I have described above, each part has its own goal, but they work much better together.

> On 17 November 2013 15:09, Alexander Motin wrote:
>> Hi.
>>
>> I've created patch, based on earlier work of avg@, to add back pressure to
>> UMA allocation caches. The problem of physical memory or KVA exhaustion
>> existed there for many years and it is quite critical now for improving
>> systems performance while keeping stability. Changes done in memory
>> allocation last years improved situation. but haven't fixed completely. My
>> patch solves remaining problems from two sides: a) reducing bucket sizes
>> every time system detects low memory condition; and b) as last-resort
>> mechanism for very low memory condition, it cycling over all CPUs to purge
>> their per-CPU UMA caches. Benefit of this approach is in absence of any
>> additional hard-coded limits on cache sizes -- they are self-tuned, based on
>> load and memory pressure.
>> >> With this change I believe it should be safe enough to enable UMA allocation >> caches in ZFS via vfs.zfs.zio.use_uma tunable (at least for amd64). I did >> many tests on machine with 24 logical cores (and as result strong allocation >> cache effects), and can say that with 40GB RAM using UMA caches, allowed by >> this change, by two times increases results of SPEC NFS benchmark on ZFS >> pool of several SSDs. To test system stability I've run the same test with >> physical memory limited to just 2GB and system successfully survived that, >> and even showed results 1.5 times better then with just last resort measures >> of b). In both cases tools/umastat no longer shows unbound UMA cache growth, >> that makes me believe in viability of this approach for longer runs. >> >> I would like to hear some comments about that: >> http://people.freebsd.org/~mav/uma_pressure.patch -- Alexander Motin From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 09:45:29 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6F35F2B8; Mon, 18 Nov 2013 09:45:29 +0000 (UTC) Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com [IPv6:2a00:1450:4010:c03::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 6CB2C2B7D; Mon, 18 Nov 2013 09:45:28 +0000 (UTC) Received: by mail-la0-f42.google.com with SMTP id ec20so4743519lab.1 for ; Mon, 18 Nov 2013 01:45:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=eCr1wXnw9yFHqGiIQpqKOrsD8/+8sHIx1oscnisEvTI=; b=R3P2Hi5Xl8bVdMgLuPr8Xf+unH796luoQ+dzRZKlha/brDuzDT77i7Cc6BVHr9ulog 
QWVm1qZMFsMvxGEPSIytagRCJbAUUiMV8NlH5muOckXzcqBgdFnRTdX9kEUZXT6cg6V+ jUxSP5Ep8McBkk7EEyGtvRgS/WmSlFdYEPQiafgCHV60JQl1OjY5c/Xa1yper8C+lT8S mMwX7WWGF3pNHEtfBznSTMdNsKn0itMGKSc4Epn/msH+aL6KTnbCWZzPjsdu6AdV9wVi RnbwMyVTKn5nRHEhcsF4fPxd0EaRG28acaRZ7i4t4SkC5gvRrnnGjNQT8NRTYfNTpsh0 LVjg== MIME-Version: 1.0 X-Received: by 10.112.219.99 with SMTP id pn3mr1025787lbc.24.1384767926523; Mon, 18 Nov 2013 01:45:26 -0800 (PST) Sender: rizzo.unipi@gmail.com Received: by 10.114.77.228 with HTTP; Mon, 18 Nov 2013 01:45:26 -0800 (PST) In-Reply-To: <5289DBF9.80004@FreeBSD.org> References: <52894C92.60905@FreeBSD.org> <5289DBF9.80004@FreeBSD.org> Date: Mon, 18 Nov 2013 10:45:26 +0100 X-Google-Sender-Auth: zAs_G8XSv3CVF7664Dac1teoEC8 Message-ID: Subject: Re: UMA cache back pressure From: Luigi Rizzo To: Alexander Motin Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: "freebsd-hackers@freebsd.org" , Adrian Chadd , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 09:45:29 -0000 On Mon, Nov 18, 2013 at 10:20 AM, Alexander Motin wrote: > On 18.11.2013 10:41, Adrian Chadd wrote: > >> Your patch does three things: >> >> * adds a couple new buckets; >> > > These new buckets make bucket size self-tuning more soft and precise. > Without them there are buckets for 1, 5, 13, 29, ... items. While at bigger > sizes difference about 2x is fine, at smallest ones it is 5x and 2.6x > respectively. New buckets make that line look like 1, 3, 5, 9, 13, 29, > reducing jumps between steps, making algorithm work softer, allocating and > freeing memory in better fitting chunks. Otherwise there is quite a big gap > between allocating 128K and 5x128K of RAM at once. > > just curious (and i do not understand whether the "1, 5 ..." 
are object sizes in bytes or what), would it make sense to add some instrumentation code (a small array of counters i presume) to track the actual number of requests for exact object sizes, and perhaps at runtime create buckets trying to reduce waste ? Following your reasoning there seems to be still a big gap between some of the numbers you quote in the sequence. cheers luigi From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 09:59:41 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C5DF97ED; Mon, 18 Nov 2013 09:59:41 +0000 (UTC) Received: from mail-ee0-x235.google.com (mail-ee0-x235.google.com [IPv6:2a00:1450:4013:c00::235]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 109852C41; Mon, 18 Nov 2013 09:59:40 +0000 (UTC) Received: by mail-ee0-f53.google.com with SMTP id b57so2361474eek.12 for ; Mon, 18 Nov 2013 01:59:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=z8SjCiJNVsJYQIQevLms4rBH10/eVvjRNnAsaPnx910=; b=gbBZO8svj8R5kM0wCmtYbbSwCgyU+Fad2BcKXXRq6WR9n8VeydpMOwcl8SlpK9RZYZ SPLelGk9dWsI3tJaLIg8/ImVC1YnNXcJ1vF8swI2RPAj0ZaIaLedPx7P/RTTQ8MpAf26 ePk7JtG8RGGCWpTtPg7L1FsIAN0+py9sHSa+dudQWyF/xnMwdBjoRHTtus94ZWZQV0iK Zqqow3pXsFaDe2LSTPZWkEFzgeAN1o7g1XbhXD3KprV9Y7x/Bk7ON2ilJPjxqUCWdqwQ T5QawgbeKYcLyU2pwuAlbuUMHLkipl0omokO9JxDb2frdazmVos9JuED7CX5QYJjtVzj bcUQ== X-Received: by 10.14.109.1 with SMTP id r1mr11909280eeg.32.1384768779114; Mon, 18 Nov 2013 01:59:39 -0800 (PST) Received: from mavbook.mavhome.dp.ua ([178.137.150.35]) by mx.google.com with ESMTPSA id o47sm36065475eem.21.2013.11.18.01.59.36 for 
(version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 18 Nov 2013 01:59:38 -0800 (PST) Sender: Alexander Motin Message-ID: <5289E506.2070207@FreeBSD.org> Date: Mon, 18 Nov 2013 11:59:34 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Luigi Rizzo Subject: Re: UMA cache back pressure References: <52894C92.60905@FreeBSD.org> <5289DBF9.80004@FreeBSD.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-hackers@freebsd.org" , Adrian Chadd , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 09:59:41 -0000

On 18.11.2013 11:45, Luigi Rizzo wrote:
>
>
>
> On Mon, Nov 18, 2013 at 10:20 AM, Alexander Motin
> wrote:
>
> On 18.11.2013 10:41, Adrian Chadd wrote:
>
> Your patch does three things:
>
> * adds a couple new buckets;
>
>
> These new buckets make bucket size self-tuning more soft and
> precise. Without them there are buckets for 1, 5, 13, 29, ... items.
> While at bigger sizes difference about 2x is fine, at smallest ones
> it is 5x and 2.6x respectively. New buckets make that line look like
> 1, 3, 5, 9, 13, 29, reducing jumps between steps, making algorithm
> work softer, allocating and freeing memory in better fitting chunks.
> Otherwise there is quite a big gap between allocating 128K and
> 5x128K of RAM at once.
>
>
> just curious (and i do not understand whether the "1, 5 ..." are object
> sizes in bytes or what),

Buckets include a header (~3 pointers) plus a number of item pointers. So on amd64, 1, 5, 13 items mean 32, 64, 128 bytes per bucket. It is not really about saving memory on the buckets themselves, since they are very small compared to the stored items.
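The byte sizes quoted above follow from simple arithmetic, sketched here under the assumption that the header is exactly three pointers (the message says "~3") and that a pointer is 8 bytes on amd64. The helper names are hypothetical and exist only for this example; the second helper shows how small a bucket is next to the items it can reference in a 128K zone.

```python
POINTER_BYTES = 8    # pointer size on amd64
HEADER_POINTERS = 3  # assumed: the message only says "~3 pointers" of header


def bucket_bytes(items):
    """Approximate size in bytes of a bucket holding `items` item pointers."""
    return (HEADER_POINTERS + items) * POINTER_BYTES


def cached_item_bytes(items, item_bytes):
    """Memory held by the items that a full bucket points to."""
    return items * item_bytes


if __name__ == "__main__":
    for n in (1, 5, 13):
        print(n, "items:", bucket_bytes(n), "bytes of bucket,",
              cached_item_bytes(n, 128 * 1024), "bytes of cached items")
```

Under these assumptions a 13-item bucket takes 128 bytes, while the 128K items it can hold add up to well over a megabyte.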
We could use a bigger (say, 16-item) bucket zone for allocating all the smaller ones, overriding just their item limit. But more zones potentially also mean lower zone lock congestion there, so why not?

> would it make sense to add some instrumentation
> code (a small array of counters i presume) to track the actual number
> of requests for exact object sizes, and perhaps at runtime create buckets
> trying to reduce waste ?

Since 10.0, buckets are also allocated from UMA cache zones, so all stats, garbage collection, etc. work by the same rules, and you can see them in `vmstat -z`.

> Following your reasoning there seems to be still a big gap between
> some of the numbers you quote in the sequence.

Big (2x) gaps between big numbers are less important: once we get there, it means we do not have much memory pressure and should not be hurt by many extra frees. At lower numbers it may be more important.

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 10:21:29 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 39EC6E7B for ; Mon, 18 Nov 2013 10:21:29 +0000 (UTC) Received: from dub0-omc2-s22.dub0.hotmail.com (dub0-omc2-s22.dub0.hotmail.com [157.55.1.161]) by mx1.freebsd.org (Postfix) with ESMTP id D42A82D78 for ; Mon, 18 Nov 2013 10:21:28 +0000 (UTC) Received: from DUB114-W124 ([157.55.1.137]) by dub0-omc2-s22.dub0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Mon, 18 Nov 2013 02:20:21 -0800 X-TMN: [+Cw046ZaBtdk29Dtf8z9sjINZlg/cGxY] X-Originating-Email: [robert.sevat@live.nl] Message-ID: From: Robert Sevat To: "freebsd-hackers@freebsd.org" Subject: FreeBSD hangs during boot when assigned a controller via vt-d Date: Mon, 18 Nov 2013 11:20:20 +0100 Importance: Normal MIME-Version: 1.0 X-OriginalArrivalTime: 18 Nov 2013 
10:20:21.0209 (UTC) FILETIME=[C9578490:01CEE447] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.16 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 10:21:29 -0000

Greetings,

I have a problem with forwarding an LSI 2308 via vt-d in KVM to a FreeBSD virtual machine. FreeBSD (9.2 and 10.0 beta 3) will hang during the boot.

Hardware Setup:
Supermicro X10SL7-F with LSI 2308 flashed to IT mode
8x4 GB ecc ram
Haswell Xeon E3-1230V3

Software Setup:
Ubuntu 12.04.3 LTS 64 bit + latest KVM version.

uname -a
Linux Secretum 3.8.0-33-generic #48~precise1-Ubuntu SMP Thu Oct 24 16:28:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

dpkg --list | grep kvm
ii kvm 1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu14.12 dummy transitional package from kvm to qemu-kvm
ii kvm-ipxe 1.0.0+git-3.55f6c88-0ubuntu1 PXE ROM's for KVM
ii qemu-kvm 1.0+noroms-0ubuntu14.12 Full virtualization on i386 and amd64 hardware

Under KVM I have the following 3 virtual machines installed, I have tried forwarding the LSI 2308 to all three virtual machines. It works perfectly under Ubuntu, but both FreeBSD vms will hang during the boot.

FreeBSD 9.2 FreeBSD 10.0 beta 3 FreeBSD 10.0 live cd Ubuntu 12.04 LTS

If I run FreeBSD 10.0 beta 3 directly on the hardware, it does recognize the raid controller and it'll use the mps0 driver. Everything works fine then.

So the problem is that for some reason FreeBSD hangs during boot if you forward the LSI 2308 via vt-d, and I have no idea why. It will hang and give the following error:
http://i.imgur.com/hAMxwR7.png
http://i.imgur.com/rKALeXZ.png

While doing so the FreeBSD virtual machine uses 300% cpu, so it maxes out 3 cores. And it will stay like that.
After googling a bit some people suggested turning off msi / msix in the loader.conf

hw.pci.enable_msi="0"
hw.pci.enable_msix="0"

I have tried this on both freebsd virtual machines, it makes no difference. It still hangs.

Could somebody point me in the right direction of what I could still try? Should I submit this as a bug? Should I ask this on another mailing list?

Kind Regards
Robert Sevat

From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 12:10:22 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 37258F17; Mon, 18 Nov 2013 12:10:22 +0000 (UTC) Received: from mail-qa0-x236.google.com (mail-qa0-x236.google.com [IPv6:2607:f8b0:400d:c00::236]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id CD2962567; Mon, 18 Nov 2013 12:10:21 +0000 (UTC) Received: by mail-qa0-f54.google.com with SMTP id f11so1145119qae.6 for ; Mon, 18 Nov 2013 04:10:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=v5eD+yxyx4RXPz2/rlhOv350IlwQtIUrqZeXMtBJX+M=; b=hoim1jtWE5vY0rmrIH7vvklPIGGwUy5FP0dkfGP0a9H2TrM7rzzzzc1M7fTbsoGyWv IEdTmxnMLXV54aDdIoqkU3I0zPFviKXznZchnNpqQYq9TOQnqSafccoZFhxAc3FtumL9 tol6BelQOa/F6kfdqCC4nNsS720BbLG5PB7Q9ufJTfQej9K4moaS01IJvFcM/9DRFZPy W2Z3AEryFvSkEHYP0jpMzJalB7a3n+7XQVgkVDaJn2jmcTcNUR0lZjf9K0fmCdtFZfoD al1JCYap8/ucOrRJVZcOl3+vif/suFvhikdWvqPuKplnitKW0wb1UUBUoKWvy+oO6rBz Ddrw== MIME-Version: 1.0 X-Received: by 10.224.98.200 with SMTP id r8mr33352927qan.26.1384776619952; Mon, 18 Nov 2013 04:10:19 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.207.66 with HTTP; Mon, 18 Nov 2013 04:10:19 -0800 
(PST) In-Reply-To: <5289DBF9.80004@FreeBSD.org> References: <52894C92.60905@FreeBSD.org> <5289DBF9.80004@FreeBSD.org> Date: Mon, 18 Nov 2013 04:10:19 -0800 X-Google-Sender-Auth: FWiYWYHm8mpA44UH0srn0ozYp3g Message-ID: Subject: Re: UMA cache back pressure From: Adrian Chadd To: Alexander Motin Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 12:10:22 -0000 On 18 November 2013 01:20, Alexander Motin wrote: > On 18.11.2013 10:41, Adrian Chadd wrote: >> >> Your patch does three things: >> >> * adds a couple new buckets; > > > These new buckets make bucket size self-tuning more soft and precise. > Without them there are buckets for 1, 5, 13, 29, ... items. While at bigger > sizes difference about 2x is fine, at smallest ones it is 5x and 2.6x > respectively. New buckets make that line look like 1, 3, 5, 9, 13, 29, > reducing jumps between steps, making algorithm work softer, allocating and > freeing memory in better fitting chunks. Otherwise there is quite a big gap > between allocating 128K and 5x128K of RAM at once. Right. That makes sense, but your initial email didn't say "oh, I'm adding more buckets." :-) > >> * reduces some lock contention > > > More precisely patch adds check for congestion on free to grow bucket sizes > same as on allocation. As consequence that indeed should reduce lock > congestion, but I don't have specific numbers. All I see is that VM and UMA > mutexes no longer appear in profiling top after all these changes. Sure. But again, you don't say that in your commit message. :) > * does soft back pressure > > In this list you have missed mentioning small but major point of the patch > -- we should prevent problems, not just solve them. 
As I have written in > original email, this specific change shown me 1.5x performance improvement > in low-memory condition. As I understand, that happened because VM no longer > have to repeatedly allocate and free hugely oversized buckets of 10-15 * > 128K. yup, sorry I missed this. It's a sneaky two lines. :) > >> * does the aggressive backpressure. > > > After all above that is mostly just a safety belt. With 40GB RAM that code > was triggered only couple times during full hour of testing with debug > logging inserted there. On machine with 2GB RAM it is triggered quite > regularly and probably that is unavoidable since even with lowest bucket > size of one item 24 CPUs mean 48 cache buckets, i.e. up to 6MB of otherwise > unreleasable memory for single 128K zone. > > >> So, do you get any benefits from just the first one, or first two? > > > I don't see much reason to handle that in pieces. As I have described above, > each part has own goal, but they much better work together. Well, with changes like this, having them broken up and committed in small pieces make it easier for people to do regression testing with. If you introduce some regression in a particular workload then the user or developer is only going to find that it's this patch and won't necessarily know how to break it down into pieces to see which piece actually introduced the regression in their specific workload. I totally agree that this should be done! It just does seem to be something that could be committed in smaller pieces quite easily so to make potential debugging later on down the road much easier. Each commit builds on the previous commit. So, something like (in order): * add two new buckets, here's why * fix locking, here's why * soft back pressure * aggressive backpressure Did you get profiling traces from the VM free paths? Is it because it's churning the physical pages through the VM physical allocator? or? 
-adrian From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 12:57:10 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F2791286; Mon, 18 Nov 2013 12:57:09 +0000 (UTC) Received: from mail-ee0-x229.google.com (mail-ee0-x229.google.com [IPv6:2a00:1450:4013:c00::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3ED0B284D; Mon, 18 Nov 2013 12:57:09 +0000 (UTC) Received: by mail-ee0-f41.google.com with SMTP id t10so1216942eei.14 for ; Mon, 18 Nov 2013 04:57:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=6yV6N5N0Giemz4fI7O2fKbBixJGKUz5MaPhHRdIz0Js=; b=Jym+V3dyXUgWsnLHNYMAQYfyDAHnsvxNBudPJV4AJLLL50idcHR9W3e3chPrlEBuKQ FYng6MJSNcipFMnyyu9k6hT8JxtD7aPUViAINPqk4u/4mTnmNaD72m4oFfkwbgoVjRsr ErZPLs4TVvcpo4XzwZJ688GzjUYyGNRF0m43VOq6ZOQFLMlHS0KH1dv6u7+qdNdhcVyp PndiFPs31T5ZtC/P9NlC0J0KNhC+6GZqsLsLvXscsDT/2OHNSaCobLIJEZnBkE685Vnl 4dCB+s3KjNap0HF/Xe2vVg2RGviq8kMJEm6pCiamcWedgEZzCNcZGzdTGRxPF58wNAz+ Jmww== X-Received: by 10.14.113.137 with SMTP id a9mr12600546eeh.3.1384779427676; Mon, 18 Nov 2013 04:57:07 -0800 (PST) Received: from mavbook.mavhome.dp.ua ([178.137.150.35]) by mx.google.com with ESMTPSA id 44sm37646908eek.5.2013.11.18.04.57.05 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 18 Nov 2013 04:57:06 -0800 (PST) Sender: Alexander Motin Message-ID: <528A0EA0.3040408@FreeBSD.org> Date: Mon, 18 Nov 2013 14:57:04 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Adrian Chadd Subject: Re: 
UMA cache back pressure References: <52894C92.60905@FreeBSD.org> <5289DBF9.80004@FreeBSD.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 12:57:10 -0000 On 18.11.2013 14:10, Adrian Chadd wrote: > On 18 November 2013 01:20, Alexander Motin wrote: >> On 18.11.2013 10:41, Adrian Chadd wrote: >>> So, do you get any benefits from just the first one, or first two? >> >> I don't see much reason to handle that in pieces. As I have described above, >> each part has own goal, but they much better work together. > > Well, with changes like this, having them broken up and committed in > small pieces make it easier for people to do regression testing with. > > If you introduce some regression in a particular workload then the > user or developer is only going to find that it's this patch and won't > necessarily know how to break it down into pieces to see which piece > actually introduced the regression in their specific workload. I can't argue here, but too many small pieces turning later merging into a headache. This patch is not that big to not be reviewable at one piece. What's about better commit message -- your hint accepted. :) > I totally agree that this should be done! It just does seem to be > something that could be committed in smaller pieces quite easily so to > make potential debugging later on down the road much easier. Each > commit builds on the previous commit. > > So, something like (in order): > > * add two new buckets, here's why > * fix locking, here's why > * soft back pressure > * aggressive backpressure I can do that it you insist, I would just take different order (3,1,4,2). 
2 without 3 will make buckets grow faster, which may be bad without back
pressure.

> Did you get profiling traces from the VM free paths? Is it because
> it's churning the physical pages through the VM physical allocator?
> or?

Yes. Without use_uma enabled I've seen up to 50% of CPU time burned on
locks held around expensive VM magic such as TLB shootdown, etc. With
use_uma enabled the situation improved a lot, but I've seen periodic
bursts, which I guess happened when the system was getting low on memory
and started to aggressively purge gigabytes of oversized caches. With
this patch I haven't noticed such behavior so far at all, though that
may be subjective since the test runs for quite some time and the load
is not very stationary.

--
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 19:01:45 2013
Return-Path:
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DCC65C0; Mon, 18 Nov 2013 19:01:44 +0000 (UTC)
Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90]) by mx1.freebsd.org (Postfix) with ESMTP id A27182F81; Mon, 18 Nov 2013 19:01:44 +0000 (UTC)
X-Ambrisko-Me: Yes
Received: from server2.ambrisko.com (HELO internal.ambrisko.com) ([192.168.1.2]) by ironport.ambrisko.com with ESMTP; 18 Nov 2013 11:05:35 -0800
Received: from ambrisko.com (localhost [127.0.0.1]) by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id rAIJ1hRO037251; Mon, 18 Nov 2013 11:01:43 -0800 (PST) (envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost) by ambrisko.com (8.14.4/8.14.4/Submit) id rAIJ1gOT037249; Mon, 18 Nov 2013 11:01:42 -0800 (PST) (envelope-from ambrisko)
Date: Mon, 18 Nov 2013 11:01:42 -0800
From: Doug Ambrisko
To: Konstantin Belousov
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131118190142.GA28210@ambrisko.com>
References:
<51B3B59B.8050903@erdgeist.org> <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131116183129.GD59496@kib.kiev.ua> User-Agent: Mutt/1.4.2.3i Cc: freebsd-hackers@freebsd.org, Dirk Engling , Jase Thew , mdf@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 19:01:45 -0000 On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote: | On Thu, Nov 14, 2013 at 05:08:54PM -0800, Doug Ambrisko wrote: | > On Thu, Nov 14, 2013 at 09:32:17PM +0000, Jase Thew wrote: | > | On 10/06/2013 16:52, John Baldwin wrote: | > | > On Saturday, June 08, 2013 9:36:27 pm mdf@freebsd.org wrote: | > | >> On Sat, Jun 8, 2013 at 3:52 PM, Dirk Engling wrote: | > | >> | > | >>> The arbitrary value | > | >>> | > | >>> #define MNAMELEN 88 /* size of on/from name bufs */ | > | >>> | > | >>> struct statfs { | > | >>> [...] 
| > | >>> char f_mntfromname[MNAMELEN];/* mounted filesystem */ | > | >>> char f_mntonname[MNAMELEN]; /* directory on which mounted */ | > | >>> }; | > | >>> | > | >>> currently bites us when trying to use poudriere with errors like | > | >>> | > | >>> 'mount: tmpfs: File name too long' | > | >>> | > | >>> | > | >>> /poudriere/data/build/91_RELEASE_amd64-REALLY-REALLY-LONG- | > | > JAILNAME/ref/wrkdirs | > | >>> | > | >>> The topic has been discussed several times since 2004 and has been | > | >>> postponed each time, the last time when it hit zfs users: | > | >>> | > | >>> http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007974.html | > | >>> | > | >>> So I'd like to point to the calendar, it's 2013 already and there's | > | >>> still a static arbitrary (and way too low) limit in one of the core | > | >>> areas of the vfs code. | > | >>> | > | >>> So I'd like to bump the issue and propose either making f_mntfromname a | > | >>> dynamic allocation or just increase MNAMELEN, using 10.0 as water shed. | > | >>> | > | >> | > | >> Gleb Kurtsou did this along with the ino64 GSoC project. Unfortunately, | > | >> both he and I hit ENOTIME due to the job that pays the bills and it's | > | >> never made it back to the main repository. | > | >> | > | >> IIRC, though, the only reason for doing it with 64-bit ino_t is that he'd | > | >> already finished changing the stat/dirent ABI so what was one more. I | > | >> think he went with 1024 bytes, which also necessitated not allocating | > | >> statfs on the stack for the kernel. | > | > | > | > He also fixed a few other things since changing this ABI is so invasive | > | > IIRC. This really is the right fix for this. Is it in an svn branch | > | > that can be updated and a new patch generated? | > | > | > | | > | Hi folks, | > | | > | Has there been any progress on addressing this issue? With the advent of | > | pkgng / poudriere, this limitation is really becoming a frustrating problem. 
| > | > I looked at NetBSD and what they did with statvfs. The mount paths | > lengths are bigger in NetBSD defines so that helps. However, when | > testing it out via a script that keep on doing a nullfs mount in | > every increasing directory depth I found that NetBSD would allow the | > mount to exceed the value in statvfs. When NetBSD populates the path | > in statvfs they truncate it to what fits in statvfs. So I looked at | > what that might be like in FreeBSD. So I came up with this simple patch: | > | > --- /sys/kern/vfs_mount.c 2013-10-01 14:27:35.000000000 -0700 | > +++ vfs_mount.c 2013-10-21 14:20:19.000000000 -0700 | > @@ -656,7 +656,7 @@ vfs_donmount(struct thread *td, uint64_t | > * variables will fit in our mp buffers, including the | > * terminating NUL. | > */ | > - if (fstypelen >= MFSNAMELEN - 1 || fspathlen >= MNAMELEN - 1) { | > + if (fstypelen >= MFSNAMELEN - 1 || fspathlen >= MAXPATHLEN - 1) { | > error = ENAMETOOLONG; | > goto bail; | > } | > @@ -748,8 +748,8 @@ sys_mount(td, uap) | > return (EOPNOTSUPP); | > } | > | > - ma = mount_argsu(ma, "fstype", uap->type, MNAMELEN); | > - ma = mount_argsu(ma, "fspath", uap->path, MNAMELEN); | > + ma = mount_argsu(ma, "fstype", uap->type, MFSNAMELEN); | > + ma = mount_argsu(ma, "fspath", uap->path, MAXPATHLEN); | > ma = mount_argb(ma, flags & MNT_RDONLY, "noro"); | > ma = mount_argb(ma, !(flags & MNT_NOSUID), "nosuid"); | > ma = mount_argb(ma, !(flags & MNT_NOEXEC), "noexec"); | > @@ -1039,7 +1039,7 @@ vfs_domount( | > * variables will fit in our mp buffers, including the | > * terminating NUL. 
| > */ | > - if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN) | > + if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MAXPATHLEN) | > return (ENAMETOOLONG); | > | > if (jailed(td->td_ucred) || usermount == 0) { | > @@ -1095,9 +1095,9 @@ vfs_domount( | > NDFREE(&nd, NDF_ONLY_PNBUF); | > vp = nd.ni_vp; | > if ((fsflags & MNT_UPDATE) == 0) { | > - pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK); | > + pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK); | > strcpy(pathbuf, fspath); | > - error = vn_path_to_global_path(td, vp, pathbuf, MNAMELEN); | > + error = vn_path_to_global_path(td, vp, pathbuf, MAXPATHLEN); | > /* debug.disablefullpath == 1 results in ENODEV */ | > if (error == 0 || error == ENODEV) { | > error = vfs_domount_first(td, vfsp, pathbuf, vp, | > @@ -1147,8 +1147,8 @@ sys_unmount(td, uap) | > return (error); | > } | > | > - pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK); | > - error = copyinstr(uap->path, pathbuf, MNAMELEN, NULL); | > + pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK); | > + error = copyinstr(uap->path, pathbuf, MAXPATHLEN, NULL); | > if (error) { | > free(pathbuf, M_TEMP); | > return (error); | > @@ -1181,7 +1181,7 @@ sys_unmount(td, uap) | > vfslocked = NDHASGIANT(&nd); | > NDFREE(&nd, NDF_ONLY_PNBUF); | > error = vn_path_to_global_path(td, nd.ni_vp, pathbuf, | > - MNAMELEN); | > + MAXPATHLEN); | > if (error == 0 || error == ENODEV) | > vput(nd.ni_vp); | > VFS_UNLOCK_GIANT(vfslocked); | > | > I seemed to have found a typo bug in an instance in which MFSNAMELEN | > wasn't used in the fstype when I did this change. | > | > With this patch things in general seem to work. You can do a | > mount and umount of a long path. The umount of the long path works | > by failing on the exact match but then passing when via the FSID. | > df/mount looks a little strange since it shows a truncated path | > but has valid contents (FS type, space etc.). umount via the truncated | > path works if there is only one truncated path that matches. 
| > If there are multiple then it fails.
| >
| > This doesn't change any kernel ABIs, so it is safe to apply to the
| > kernel without rebuilding user-land.
| >
| > Future work could be to implement statvfs to return a longer path but
| > only do it for df/umount etc. The rest of the system could continue
| > with the existing statfs. mount works because it passes a string into
| > the kernel so it can be long.
| >
| > I'd propose this as a current solution to this problem. It appears to
| > work well and shouldn't drastically break things. Doing df via the
| > full path, stat etc. works since the associated path accesses the vnode.
| > So things that do a mount, df of the mount point etc. should continue
| > to work. Scripts that try to figure out the mount points via df and mount
| > displaying all mount points would fail. That is probably good enough for
| > now.
| >
| > Comments welcomed.
|
| Generally, I agree with the approach, but what is done seems to be too
| simple to be usable.

I like the simplicity and I'd like to see examples of it not being usable.

| One obvious and important thing which is broken with the patch is
| the unmounts from jails. In other words, now it is possible to mount
| something from jail with appropriate privileges set up, and after that
| corresponding mount cannot be unmounted, since vfs_mount_alloc() copies
| trimmed path into f_mntonname, and sys_unmount() matches full path with
| pathbuf. Hmm, this should be broken in the same way for non-jailed
| mounts with paths which do not fit into f_mntonname.

They can be unmounted since it will fall back to fsid, as in the non-jail
case.
I just tried it sorry for the bad line wrap:

+ mount 192.168.38.1:/data/home/ambrisko/netboot /data/jail/test
+ jail -i -c name=test path=/data/jail/test host.hostname=test.ambrisko.com persist enforce_statfs=0 allow.mount=1 allow.mount.devfs=1 allow.mount.nullfs=1 allow.mount.tmpfs=1 allow.mount.procfs=1
14
+ jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
+ jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
+ jexec test df
+ egrep '^devfs|^procfs'
devfs 2 2 0 100% /dev
procfs 8 8 0 100% /proc
+ jexec test mount -t procfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
+ jexec test mount -t devfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
+ jexec test df
+ egrep '^devfs|^procfs'
devfs 2 2 0 100% /dev
procfs 8 8 0 100% /proc
procfs 8 8 0 100% /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
devfs 2 2 0 100% /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
+ jexec test mount -v
+ egrep '^devfs|^procfs'
devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000)
procfs on /proc (procfs, local, fsid 02ff000202000000)
procfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (procfs, local, fsid 26ff000202000000)
devfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (devfs, local, multilabel, fsid 27ff007171000000)
+ jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
+ jexec test df
+ egrep '^devfs|^procfs'
devfs 2 2 0 100% /dev
procfs 8 8 0 100% /proc
devfs 2 2 0 100% /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
+ jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
+ jexec test df
+ egrep '^devfs|^procfs'
devfs 2 2 0 100% /dev
procfs 8 8 0 100% /proc

| I think that struct mount should have a const char * field where the
| non-trimmed path is stored and used for match at unmount. f_mntonname
| truncation would be only unfortunate user interface glitch.

Note that we are not storing the path in mount structure so no structures
have changed which is nice since then we haven't introduced any real ABI
breakage. So we could MFC this. The match isn't critical since umount will
fall back to fsid and work. One thing that might be good to do is change
umount to try to umount via fsid first and then do the match if the fsid
failed versus the other way round that it does now.

The problem I see is if someone tries to do things based on the parsed
output of mount/df then that will fail since the output is truncated.

Thanks for looking at this,

Doug A.
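
The truncation behavior described in the message above can be modeled in
userland. What follows is a minimal sketch, not kernel code: struct
demo_mount, demo_set_path(), demo_count_by_name() and demo_find_by_fsid()
are hypothetical names invented for illustration, and the fsid is
simplified to an integer (the real fsid_t is a structure). Only
MNAMELEN = 88 is taken from the thread. The sketch shows why two long
mount points can end up with the same truncated name while the fsid still
tells them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MNAMELEN 88	/* size of on/from name bufs, as quoted in the thread */

/* Hypothetical stand-in for a mount entry; not the kernel's struct mount. */
struct demo_mount {
	unsigned long fsid;		/* simplified; the real fsid_t is a struct */
	char mntonname[MNAMELEN];	/* possibly truncated mount path */
};

/* Store a path in the fixed-size field, truncating it if it does not fit. */
static void
demo_set_path(struct demo_mount *mp, const char *fullpath)
{
	size_t len = strlen(fullpath);

	if (len > MNAMELEN - 1)
		len = MNAMELEN - 1;
	memcpy(mp->mntonname, fullpath, len);
	mp->mntonname[len] = '\0';
}

/* Count the entries whose stored (possibly truncated) name equals `name`. */
static size_t
demo_count_by_name(const struct demo_mount *tab, size_t n, const char *name)
{
	size_t i, hits = 0;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(tab[i].mntonname, name) == 0)
			hits++;
	return (hits);
}

/* Find an entry by fsid; returns NULL when no entry carries that fsid. */
static const struct demo_mount *
demo_find_by_fsid(const struct demo_mount *tab, size_t n, unsigned long fsid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tab[i].fsid == fsid)
			return (&tab[i]);
	return (NULL);
}
```

In this model, two paths that differ only after the 87th character are
stored under the same name, so a lookup by the truncated name is ambiguous,
while a lookup by fsid remains unique.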
From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 19:55:19 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 99D13167; Mon, 18 Nov 2013 19:55:19 +0000 (UTC) Received: from mail-ea0-x233.google.com (mail-ea0-x233.google.com [IPv6:2a00:1450:4013:c01::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id F24A82364; Mon, 18 Nov 2013 19:55:18 +0000 (UTC) Received: by mail-ea0-f179.google.com with SMTP id r15so2667889ead.10 for ; Mon, 18 Nov 2013 11:55:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=k6Xz+6WHeny21iheK79VQT8uRr9JgaXrrJ13RPuulnA=; b=rS/J3wBsZDiqJsRAE8+wIIsD1dwHp+Fw+3fQG49zL+U+voHnP4vuQSk6aDkrxbC5Br IU7jK6+XoliJeJQg83hP1EmyngaoIx/7bVWQhDiCwN/uuKIaDKww9+uMggocALWK/zyo 5TDAJx3Ov8puayeTNvySX2rdQuZLhkGy3sY28v+UW/3U+fxLY/ZFVzMgBPNJk2CUiGE4 AI0hFBArQ32lX/l/FS0n3Z+FrRVoyTcdCih813FFhJ4lfyq3sF03IfUWcrOEFNwH8LjL qYOKLnQLCpWdcpM0BX146ZKpXZIDax8bucmT766qkkkha8EQ920wHVTccXb5mzA4lnhd k0kQ== X-Received: by 10.14.0.72 with SMTP id 48mr5414158eea.50.1384804517422; Mon, 18 Nov 2013 11:55:17 -0800 (PST) Received: from mavbook.mavhome.dp.ua ([178.137.150.35]) by mx.google.com with ESMTPSA id w6sm41027683eeo.12.2013.11.18.11.55.15 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 18 Nov 2013 11:55:16 -0800 (PST) Sender: Alexander Motin Message-ID: <528A70A2.4010308@FreeBSD.org> Date: Mon, 18 Nov 2013 21:55:14 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Jeff Roberson Subject: Re: UMA cache back pressure 
References: <52894C92.60905@FreeBSD.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 19:55:19 -0000 On 18.11.2013 21:11, Jeff Roberson wrote: > On Mon, 18 Nov 2013, Alexander Motin wrote: >> I've created patch, based on earlier work of avg@, to add back >> pressure to UMA allocation caches. The problem of physical memory or >> KVA exhaustion existed there for many years and it is quite critical >> now for improving systems performance while keeping stability. Changes >> done in memory allocation last years improved situation. but haven't >> fixed completely. My patch solves remaining problems from two sides: >> a) reducing bucket sizes every time system detects low memory >> condition; and b) as last-resort mechanism for very low memory >> condition, it cycling over all CPUs to purge their per-CPU UMA caches. >> Benefit of this approach is in absence of any additional hard-coded >> limits on cache sizes -- they are self-tuned, based on load and memory >> pressure. >> >> With this change I believe it should be safe enough to enable UMA >> allocation caches in ZFS via vfs.zfs.zio.use_uma tunable (at least for >> amd64). I did many tests on machine with 24 logical cores (and as >> result strong allocation cache effects), and can say that with 40GB >> RAM using UMA caches, allowed by this change, by two times increases >> results of SPEC NFS benchmark on ZFS pool of several SSDs. To test >> system stability I've run the same test with physical memory limited >> to just 2GB and system successfully survived that, and even showed >> results 1.5 times better then with just last resort measures of b). 
>> In both cases tools/umastat no longer shows unbound UMA cache growth,
>> that makes me believe in viability of this approach for longer runs.
>>
>> I would like to hear some comments about that:
>> http://people.freebsd.org/~mav/uma_pressure.patch
>
> Hey Mav,
>
> This is a great start and great results. I think it could probably even
> go in as-is, but I have a few suggestions.

Hey! Thanks for your review. I appreciate it.

> First, let's test this with something that is really super allocator
> heavy and doesn't benefit much from bucket sizing. For example, a
> network forwarding test. Or maybe you could get someone like Netflix
> that is using it to push a lot of bits with less filesystem cost than
> zfs and spec.

I am not sure what simple forwarding may show in this case. Even on my
workload, with ZFS creating strong memory pressure, I still have the mbuf*
zones' buckets almost (some totally) maxed out. Without other major (or
even any) pressure in the system they just can't become bigger than the
maximum. But if you can propose some interesting test case with pressure
that I can reproduce -- I am all ears.

> Second, the cpu binding is a very costly and very high-latency
> operation. It would make sense to do CPU_FOREACH and then ZONE_FOREACH.
> You're also biasing the first zones in the list. The low memory
> condition will more often clear after you check these first zones. So
> you might just check it once and equally penalize all zones. I'm
> concerned that doing CPU_FOREACH in every zone will slow the pagedaemon
> more.

I completely agree with all you said here. I took this part of the code
as-is from earlier work. It definitely can be improved, and I'll take a
look at that. But as I mentioned in one of my earlier responses, that code
is used in _very_ rare cases, unless the system is heavily overloaded on
memory, like doing ZFS on a box with 24 cores and 2GB of RAM. During
reasonable operation it is enough to have soft back pressure to keep the
caches in shape and never call that.
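
The loop-ordering point raised above can be illustrated with a small
userland model. This is only a sketch under stated assumptions:
struct demo_state, demo_bind() and the two drain functions are hypothetical
and do not use the real UMA or scheduler interfaces, and "binding to a CPU"
is modeled as nothing more than a counter. It compares visiting every CPU
inside every zone with binding to each CPU once and sweeping all zones
while there.

```c
#include <assert.h>

#define DEMO_NCPU	4
#define DEMO_NZONE	3

/* Hypothetical model: cache[z][c] is the item count zone z holds on CPU c. */
struct demo_state {
	int cache[DEMO_NZONE][DEMO_NCPU];
	int binds;	/* how many times we "migrated" to a CPU */
};

/* Stand-in for the costly step of binding the current thread to a CPU. */
static void
demo_bind(struct demo_state *s, int cpu)
{
	(void)cpu;
	s->binds++;
}

/* Empty one zone's cache on one CPU and return how many items were freed. */
static int
demo_drain_one(struct demo_state *s, int zone, int cpu)
{
	int n;

	assert(zone >= 0 && zone < DEMO_NZONE && cpu >= 0 && cpu < DEMO_NCPU);
	n = s->cache[zone][cpu];
	s->cache[zone][cpu] = 0;
	return (n);
}

/* Zone-major order: every zone walks all CPUs, binding each time. */
static int
demo_drain_zone_major(struct demo_state *s)
{
	int freed = 0;

	for (int z = 0; z < DEMO_NZONE; z++)
		for (int c = 0; c < DEMO_NCPU; c++) {
			demo_bind(s, c);
			freed += demo_drain_one(s, z, c);
		}
	return (freed);
}

/* CPU-major order: bind once per CPU, then sweep every zone on that CPU. */
static int
demo_drain_cpu_major(struct demo_state *s)
{
	int freed = 0;

	for (int c = 0; c < DEMO_NCPU; c++) {
		demo_bind(s, c);
		for (int z = 0; z < DEMO_NZONE; z++)
			freed += demo_drain_one(s, z, c);
	}
	return (freed);
}
```

With 3 zones and 4 CPUs, the zone-major order binds 12 times and the
CPU-major order binds 4 times, while both free the same number of items.
The model says nothing about the actual cost of a bind in the kernel; it
only makes the difference in the number of binds visible.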
> We also have been working towards per-domain pagedaemons so
> perhaps we should have a uma-reclaim taskqueue that we wake up to do the
> work?

VM is not my area so far, so please propose "the right way". I took on this
task only because I have to, given the huge performance bottleneck this
problem causes and the years it has remained unsolved.

> Third, using vm_page_count_min() will only trigger when the pageout
> daemon can't keep up with the free target. Typically this should only
> happen with a lot of dirty mmap'd pages or incredibly high system load
> coupled with frequent allocations. So there may be many cases where
> reclaiming the extra UMA memory is helpful but the pagedaemon can still
> keep up while pushing out file pages that we'd prefer to keep.

As I have said, that is indeed a last resort. It does not need to be done
often. Per-CPU caches just should not grow without real need to the point
where they have to be cleaned.

> I think the perfect heuristic would have some idea of how likely the UMA
> pages are to be re-used immediately so we can more effectively tradeoff
> between file pages and kernel memory cache. As it is now we limit the
> uma_reclaim() calls to every 10 seconds when there is memory pressure.
> Perhaps we could keep a timestamp for when the last slab was allocated
> to a zone and do the more expensive reclaim on zones who have timestamps
> that exceed some threshold? Then have a lower threshold for reclaiming
> at all? Again, it doesn't need to be perfect, but I believe we can catch
> a wider set of cases by carefully scheduling this.

I was thinking about that too. But I think timestamps should be set not on
the slab, but on the bucket. The fact that a zone is not allocating new
slabs does not mean it does not use its already allocated buckets. If we
put the time of the last refill into each bucket, then we should be able to
purge all buckets unused for a specified period of time.
Additionally we could put timestamp on zone and update it every time zone runs out of its cache. If zone does not run out of cache for some time -- probably it has unused buckets. So when we need some RAM we should take a first look on zones that had stale timestamp. -- Alexander Motin From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 20:12:19 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9997E9E1; Mon, 18 Nov 2013 20:12:19 +0000 (UTC) Received: from mail-qc0-x234.google.com (mail-qc0-x234.google.com [IPv6:2607:f8b0:400d:c01::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3A4D5246A; Mon, 18 Nov 2013 20:12:19 +0000 (UTC) Received: by mail-qc0-f180.google.com with SMTP id e16so2209772qcx.25 for ; Mon, 18 Nov 2013 12:12:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=HqgiATGrJ/ODsDXBEhSOw2BEkJggcTDFe3Y+u3r8XaI=; b=h9blKoVL7Rtp3CsyIz17NF/++7F9nW8bKhDu1mlvf3SGX3ImW1PpVg/QQ2F3H24TSw ndOpDd7ddtDE4ep50g7o5Xu6C5FH1DqJvrkjq2Q9D1lR9AYX7MVWOiqwOmLbaSp8NHRw OVDJEOYnXmvMEqKlFwGsTpKsgHLFf+dqE6WfPrwoTTOL+/bvc5B/HgICi9Vaj1TWYzzA oRgLUABk2ApiQP8AwnftpPMlLvyYr0AJf9iBcqtOrpebHNLqYm3wUSRR+HReFOpPr4Iq 5FMTB28tAc+egTVHaQ3lWNBFjGyCWV1dHxYrbDDH3EuQmjqepw32We4cZMN2s3bKfOPW P9Jw== MIME-Version: 1.0 X-Received: by 10.49.71.207 with SMTP id x15mr37164431qeu.49.1384805538000; Mon, 18 Nov 2013 12:12:18 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.207.66 with HTTP; Mon, 18 Nov 2013 12:12:17 -0800 (PST) In-Reply-To: <528A70A2.4010308@FreeBSD.org> References: <52894C92.60905@FreeBSD.org> <528A70A2.4010308@FreeBSD.org> Date: Mon, 18 Nov 
2013 12:12:17 -0800 X-Google-Sender-Auth: -bRyyF1NPA-IgriJgTBwA18SvkM Message-ID: Subject: Re: UMA cache back pressure From: Adrian Chadd To: Alexander Motin Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" , Jeff Roberson X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 20:12:19 -0000 Remember that for Netflix, we have a mostly non-cachable workload (with some very specific exceptions!) and thus we churn through VM pages at a presitidigious rate. 20gbit sec, or ~ 2.4 gigabytes a second, or ~ 680,000 4 kilobyte pages a second. It's quite frightening and it's only likely to increase. There's a lot of pressure from all over the place so IIRC pools tend to not stay very large for very long. That's why I'm interested in your specific situations. Doing an all CPU TLB shootdown with 24 cores is costly. But after we killed some incorrect KVA mapping flags for sendfile, we (netflix) totally stopped seeing the TLB shootdown and IPIs in any of the performance traces. Now, doing 24 cores worth of ZFS when you let the pools grow to the size you do is understandable, but I'd like to just make sure that you aren't breaking performance for people doing different workloads on less cores. I'm a bit busy at work with other things so I can't spin up your patch on a cache for another week or two. But I'll certainly get around to it as I'd like to see this stuff catch on. What I _can_ do in a reasonably immediate timeframe is update vm0.freebsd.org to the latest -HEAD and stress test your patch out. I'm using vm0.freebsd.org to stress test -HEAD with ZFS doing concurrent poudriere builds so it gets very crowded on that box. 
The box currently survives a couple days before I hit some races to do with vnode exhaustion and a lack of handling there, and ZFS deadlocks. I'll just run this up to see if anything unexpected happens that causes it to blow up in a different way. Thanks, -adrian On 18 November 2013 11:55, Alexander Motin wrote: > On 18.11.2013 21:11, Jeff Roberson wrote: >> >> On Mon, 18 Nov 2013, Alexander Motin wrote: >>> >>> I've created patch, based on earlier work of avg@, to add back >>> pressure to UMA allocation caches. The problem of physical memory or >>> KVA exhaustion existed there for many years and it is quite critical >>> now for improving systems performance while keeping stability. Changes >>> done in memory allocation last years improved situation. but haven't >>> fixed completely. My patch solves remaining problems from two sides: >>> a) reducing bucket sizes every time system detects low memory >>> condition; and b) as last-resort mechanism for very low memory >>> condition, it cycling over all CPUs to purge their per-CPU UMA caches. >>> Benefit of this approach is in absence of any additional hard-coded >>> limits on cache sizes -- they are self-tuned, based on load and memory >>> pressure. >>> >>> With this change I believe it should be safe enough to enable UMA >>> allocation caches in ZFS via vfs.zfs.zio.use_uma tunable (at least for >>> amd64). I did many tests on machine with 24 logical cores (and as >>> result strong allocation cache effects), and can say that with 40GB >>> RAM using UMA caches, allowed by this change, by two times increases >>> results of SPEC NFS benchmark on ZFS pool of several SSDs. To test >>> system stability I've run the same test with physical memory limited >>> to just 2GB and system successfully survived that, and even showed >>> results 1.5 times better then with just last resort measures of b). 
In >>> both cases tools/umastat no longer shows unbound UMA cache growth, >>> that makes me believe in viability of this approach for longer runs. >>> >>> I would like to hear some comments about that: >>> http://people.freebsd.org/~mav/uma_pressure.patch >> >> >> Hey Mav, >> >> This is a great start and great results. I think it could probably even >> go in as-is, but I have a few suggestions. > > > Hey! Thanks for your review. I appreciate. > > >> First, let's test this with something that is really super allocator >> heavy and doesn't benefit much from bucket sizing. For example, a >> network forwarding test. Or maybe you could get someone like Netflix >> that is using it to push a lot of bits with less filesystem cost than >> zfs and spec. > > > I am not sure what simple forwarding may show in this case. Even on my > workload with ZFS creating strong memory pressure I still have mbuf* zones > buckets almost (some totally) maxed out. Without other major (or even any) > pressure in system they just can't become bigger then maximum. But if you > can propose some interesting test case with pressure that I can reproduce -- > I am all ears. > > >> Second, the cpu binding is a very costly and very high-latency >> operation. It would make sense to do CPU_FOREACH and then ZONE_FOREACH. >> You're also biasing the first zones in the list. The low memory >> condition will more often clear after you check these first zones. So >> you might just check it once and equally penalize all zones. I'm >> concerned that doing CPU_FOREACH in every zone will slow the pagedaemon >> more. > > > I completely agree with all you said here. This part of code I just took > as-is from earlier work. It definitely can be improved. I'll take a look on > that. But as I have mentioned in one of earlier responses that code used in > _very_ rare cases, unless system is heavily overloaded on memory, like doing > ZFS on box with 24 cores and 2GB RAM. 
> During reasonable operation it is enough to have soft back pressure to
> keep on caches in shape and never call that.
>
>
>> We also have been working towards per-domain pagedaemons so
>> perhaps we should have a uma-reclaim taskqueue that we wake up to do the
>> work?
>
>
> VM is not my area so far, so please propose "the right way". I took this
> task now only because I have to due to huge performance bottleneck this
> problem causes and years it remains unsolved.
>
>
>> Third, using vm_page_count_min() will only trigger when the pageout
>> daemon can't keep up with the free target. Typically this should only
>> happen with a lot of dirty mmap'd pages or incredibly high system load
>> coupled with frequent allocations. So there may be many cases where
>> reclaiming the extra UMA memory is helpful but the pagedaemon can still
>> keep up while pushing out file pages that we'd prefer to keep.
>
>
> As I have told that is indeed last resort. It does not need to be done
> often. Per-CPU caches just should not grow without real need to the point
> when they have to be cleaned.
>
>
>> I think the perfect heuristic would have some idea of how likely the UMA
>> pages are to be re-used immediately so we can more effectively tradeoff
>> between file pages and kernel memory cache. As it is now we limit the
>> uma_reclaim() calls to every 10 seconds when there is memory pressure.
>> Perhaps we could keep a timestamp for when the last slab was allocated
>> to a zone and do the more expensive reclaim on zones who have timestamps
>> that exceed some threshold? Then have a lower threshold for reclaiming
>> at all? Again, it doesn't need to be perfect, but I believe we can catch
>> a wider set of cases by carefully scheduling this.
>
>
> I was thinking about that too. But I think timestamps should be set not on
> slab, but on bucket. The fact that zone is not allocating new slabs does not
> mean it does not use its already allocated buckets.
If we put time of the > last refill into each bucket, then we should be able to purge all buckets, > unused for specified period of time. Additionally we could put timestamp on > zone and update it every time zone runs out of its cache. If zone does not > run out of cache for some time -- probably it has unused buckets. So when we > need some RAM we should take a first look on zones that had stale timestamp. > > > -- > Alexander Motin > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Mon Nov 18 19:23:08 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8DA41C41 for ; Mon, 18 Nov 2013 19:23:08 +0000 (UTC) Received: from mail-pb0-f45.google.com (mail-pb0-f45.google.com [209.85.160.45]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 663E9210B for ; Mon, 18 Nov 2013 19:23:08 +0000 (UTC) Received: by mail-pb0-f45.google.com with SMTP id rp16so736609pbb.18 for ; Mon, 18 Nov 2013 11:23:02 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version:content-type; bh=Bu5VE7OEDP7NC5GovIKIn+lYk8jOjHUwIFfitNaW8Iw=; b=L+/gZiGy3Sfi2nw0CkclUwn7gcUKx0ahXqwIw+fOGO7kzItJ1qhYHq+1ihDU4ofUXj yCX3VFHzE7K+3GupH5XWS3u1YXnta1TsIz/SNM9jYesaKuZgEb6rqlU44Cbfl3X4IaPX sF11ECzig7lAJa/d/H+n23oPmj+4in3WPvHyRS6TiBLccwAGgFpE0HGDAn8mHL43ieEf JcKbe4HMrHT1HIjzzKEl6GC+05HdTbCRIqODi/5Y/3srcRh0QKmFQ6TAvZGis5vFGb6V 
2M9jKpnLeCV02mC2jeKVU1MbqAtiY6PKVI/mHCxyKY+iMJTgPXhWzE+W3N/ZEkjwUQEA c1OA== X-Gm-Message-State: ALoCoQmARYzGT0Yu/o7OyAe+14YJf2i2JVFz3SbkeAc5BsC/fXJeITRxNbhKEjLw+1BAC+AIKOuq X-Received: by 10.68.180.162 with SMTP id dp2mr22499630pbc.5.1384802106553; Mon, 18 Nov 2013 11:15:06 -0800 (PST) Received: from rrcs-66-91-135-210.west.biz.rr.com (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210]) by mx.google.com with ESMTPSA id gg10sm25139867pbc.46.2013.11.18.11.15.04 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 18 Nov 2013 11:15:05 -0800 (PST) Date: Mon, 18 Nov 2013 09:11:10 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Alexander Motin Subject: Re: UMA cache back pressure In-Reply-To: <52894C92.60905@FreeBSD.org> Message-ID: References: <52894C92.60905@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Mon, 18 Nov 2013 20:16:59 +0000 Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 19:23:08 -0000 On Mon, 18 Nov 2013, Alexander Motin wrote: > Hi. > > I've created patch, based on earlier work of avg@, to add back pressure to > UMA allocation caches. The problem of physical memory or KVA exhaustion > existed there for many years and it is quite critical now for improving > systems performance while keeping stability. Changes done in memory > allocation last years improved situation. but haven't fixed completely. My > patch solves remaining problems from two sides: a) reducing bucket sizes > every time system detects low memory condition; and b) as last-resort > mechanism for very low memory condition, it cycling over all CPUs to purge > their per-CPU UMA caches. 
> Benefit of this approach is in absence of any
> additional hard-coded limits on cache sizes -- they are self-tuned, based on
> load and memory pressure.
>
> With this change I believe it should be safe enough to enable UMA allocation
> caches in ZFS via vfs.zfs.zio.use_uma tunable (at least for amd64). I did
> many tests on machine with 24 logical cores (and as result strong allocation
> cache effects), and can say that with 40GB RAM using UMA caches, allowed by
> this change, by two times increases results of SPEC NFS benchmark on ZFS pool
> of several SSDs. To test system stability I've run the same test with
> physical memory limited to just 2GB and system successfully survived that,
> and even showed results 1.5 times better then with just last resort measures
> of b). In both cases tools/umastat no longer shows unbound UMA cache growth,
> that makes me believe in viability of this approach for longer runs.
>
> I would like to hear some comments about that:
> http://people.freebsd.org/~mav/uma_pressure.patch

Hey Mav,

This is a great start and great results. I think it could probably even
go in as-is, but I have a few suggestions.

First, let's test this with something that is really super allocator
heavy and doesn't benefit much from bucket sizing. For example, a
network forwarding test. Or maybe you could get someone like Netflix
that is using it to push a lot of bits with less filesystem cost than
ZFS and SPEC.

Second, the CPU binding is a very costly and very high-latency
operation. It would make sense to do CPU_FOREACH and then ZONE_FOREACH.
You're also biasing the first zones in the list. The low memory
condition will more often clear after you check these first zones. So
you might just check it once and equally penalize all zones. I'm
concerned that doing CPU_FOREACH in every zone will slow the pagedaemon
more. We also have been working towards per-domain pagedaemons so
perhaps we should have a uma-reclaim taskqueue that we wake up to do the
work?

Third, using vm_page_count_min() will only trigger when the pageout
daemon can't keep up with the free target. Typically this should only
happen with a lot of dirty mmap'd pages or incredibly high system load
coupled with frequent allocations. So there may be many cases where
reclaiming the extra UMA memory is helpful but the pagedaemon can still
keep up while pushing out file pages that we'd prefer to keep.

I think the perfect heuristic would have some idea of how likely the UMA
pages are to be re-used immediately so we can more effectively trade off
between file pages and kernel memory cache. As it is now we limit the
uma_reclaim() calls to every 10 seconds when there is memory pressure.
Perhaps we could keep a timestamp for when the last slab was allocated
to a zone and do the more expensive reclaim on zones whose timestamps
exceed some threshold? Then have a lower threshold for reclaiming
at all? Again, it doesn't need to be perfect, but I believe we can catch
a wider set of cases by carefully scheduling this.

Thanks,
Jeff

>
> Thank you.
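
The two-threshold timestamp idea from the last paragraph above can be
sketched in user-space C. This is only a rough model under stated
assumptions, not the patch and not UMA code: struct zone_model,
reclaim_decide(), reclaim_pass(), the abstract "tick" time unit, and the
policy of halving the cache on the cheap path are all hypothetical and
exist purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a zone; NOT the kernel's struct uma_zone. */
struct zone_model {
	const char *name;
	long last_slab_alloc;	/* tick of the most recent slab allocation */
	size_t cached_items;	/* items currently sitting in the zone's caches */
};

enum reclaim_level { RECLAIM_NONE, RECLAIM_TRIM, RECLAIM_DRAIN };

/*
 * Pick an action from how long the zone has gone without allocating a slab.
 * trim_after is the lower threshold ("reclaim at all"), drain_after is the
 * higher one ("do the more expensive reclaim").
 */
static enum reclaim_level
reclaim_decide(const struct zone_model *z, long now, long trim_after,
    long drain_after)
{
	long idle = now - z->last_slab_alloc;

	assert(trim_after <= drain_after);
	if (z->cached_items == 0)
		return (RECLAIM_NONE);
	if (idle >= drain_after)
		return (RECLAIM_DRAIN);
	if (idle >= trim_after)
		return (RECLAIM_TRIM);
	return (RECLAIM_NONE);
}

/* Walk all zones once and return how many cached items were released. */
static size_t
reclaim_pass(struct zone_model *zones, size_t nzones, long now,
    long trim_after, long drain_after)
{
	size_t freed = 0, half, i;

	for (i = 0; i < nzones; i++) {
		struct zone_model *z = &zones[i];

		switch (reclaim_decide(z, now, trim_after, drain_after)) {
		case RECLAIM_DRAIN:
			/* Expensive path: give everything back. */
			freed += z->cached_items;
			z->cached_items = 0;
			break;
		case RECLAIM_TRIM:
			/* Cheap path (assumed policy): release half. */
			half = z->cached_items / 2;
			freed += half;
			z->cached_items -= half;
			break;
		case RECLAIM_NONE:
			break;
		}
	}
	return (freed);
}
```

With now = 100, a lower threshold of 10 ticks and an upper threshold of
60 ticks, a zone that last allocated a slab at tick 95 is left alone, one
at tick 80 is trimmed, and one at tick 10 is drained completely. Whether
the timestamp belongs on the zone or on each bucket is left open here.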
> > -- > Alexander Motin > From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 19 03:57:04 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BD4D9D70 for ; Tue, 19 Nov 2013 03:57:04 +0000 (UTC) Received: from mail-pd0-f180.google.com (mail-pd0-f180.google.com [209.85.192.180]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 92A25206C for ; Tue, 19 Nov 2013 03:57:04 +0000 (UTC) Received: by mail-pd0-f180.google.com with SMTP id q10so2319679pdj.11 for ; Mon, 18 Nov 2013 19:56:57 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version:content-type; bh=6VF9TAgBSajXXWBrk0LFqIW0eS5I0BGL9U8kzpTXeFk=; b=cTQdOMMsnrxmU+tt8HhLXeJMAFKnNMs5kIO7Dpq7rtMSb0G3fF23RuwkHM3UQrfFEw yfQvv+tQcuW3Z4sytZBrQYqGwmTqDnpISO/nqo4vvqj3Losp1OpqZnBfGTlQpqT2KA91 4e3FZJ1lQ2i3+oJ7BpfTM0opZBnDrgzb1BrDOs+xxH6dFj4G5umYFcBEKJBl2zs5Mrng CeKRHFpSsXdRIsyimfiu3NqXTyqb9xPMRaeEpX+kZt/JxKghApwXexmmd8JBcXw0hoya YObO3udtCGR0nd6jw1p/nSsjrXOSizz9U/tzGob9dBaWieKAydDAIGBBBpBCB/KSFTLi N7+A== X-Gm-Message-State: ALoCoQnKzwJ1i4qKWuMjUxPDRIDmGmbQYJ8YLv2A8jadXD5Yi4hOTxcXOL87vW51YLgcZSq9EfnE X-Received: by 10.69.11.130 with SMTP id ei2mr6017490pbd.144.1384833417886; Mon, 18 Nov 2013 19:56:57 -0800 (PST) Received: from rrcs-66-91-135-210.west.biz.rr.com (rrcs-66-91-135-210.west.biz.rr.com. 
[66.91.135.210]) by mx.google.com with ESMTPSA id g8sm13723486pbe.37.2013.11.18.19.56.55 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 18 Nov 2013 19:56:57 -0800 (PST)
Date: Mon, 18 Nov 2013 17:52:59 -1000 (HST)
From: Jeff Roberson
X-X-Sender: jroberson@desktop
To: Adrian Chadd
Subject: Re: UMA cache back pressure
In-Reply-To:
Message-ID:
References: <52894C92.60905@FreeBSD.org> <528A70A2.4010308@FreeBSD.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Tue, 19 Nov 2013 04:02:28 +0000
Cc: "freebsd-hackers@freebsd.org" , Alexander Motin , "freebsd-current@freebsd.org"
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 19 Nov 2013 03:57:04 -0000

On Mon, 18 Nov 2013, Adrian Chadd wrote:

> Remember that for Netflix, we have a mostly non-cachable workload
> (with some very specific exceptions!) and thus we churn through VM
> pages at a presitidigious rate. 20gbit sec, or ~ 2.4 gigabytes a
> second, or ~ 680,000 4 kilobyte pages a second. It's quite frightening
> and it's only likely to increase.
>
> There's a lot of pressure from all over the place so IIRC pools tend
> to not stay very large for very long.

I think the combination of a lot of cache pressure, a lot of allocator
use, and no ZFS makes you an interesting candidate.

>
> That's why I'm interested in your specific situations. Doing an all
> CPU TLB shootdown with 24 cores is costly. But after we killed some
> incorrect KVA mapping flags for sendfile, we (netflix) totally stopped

Do you have any information on what this change was?

> seeing the TLB shootdown and IPIs in any of the performance traces.
> Now, doing 24 cores worth of ZFS when you let the pools grow to the
> size you do is understandable, but I'd like to just make sure that you
> aren't breaking performance for people doing different workloads on
> less cores.

We also have opportunities now with vmem to cache KVA backed pages and
release them together in bulk when necessary. However, remember most UMA
memory won't need an IPI since it comes from the direct map. Only the
few zones which use very large allocations will.

Jeff

>
> I'm a bit busy at work with other things so I can't spin up your patch
> on a cache for another week or two. But I'll certainly get around to
> it as I'd like to see this stuff catch on.
>
> What I _can_ do in a reasonably immediate timeframe is update
> vm0.freebsd.org to the latest -HEAD and stress test your patch out.
> I'm using vm0.freebsd.org to stress test -HEAD with ZFS doing
> concurrent poudriere builds so it gets very crowded on that box. The
> box currently survives a couple days before I hit some races to do
> with vnode exhaustion and a lack of handling there, and ZFS deadlocks.
> I'll just run this up to see if anything unexpected happens that
> causes it to blow up in a different way.
>
> Thanks,
>
>
>
> -adrian
>
>
> On 18 November 2013 11:55, Alexander Motin wrote:
>> On 18.11.2013 21:11, Jeff Roberson wrote:
>>>
>>> On Mon, 18 Nov 2013, Alexander Motin wrote:
>>>>
>>>> I've created patch, based on earlier work of avg@, to add back
>>>> pressure to UMA allocation caches. The problem of physical memory or
>>>> KVA exhaustion existed there for many years and it is quite critical
>>>> now for improving systems performance while keeping stability. Changes
>>>> done in memory allocation last years improved situation. but haven't
>>>> fixed completely.
My patch solves remaining problems from two sides: >>>> a) reducing bucket sizes every time system detects low memory >>>> condition; and b) as last-resort mechanism for very low memory >>>> condition, it cycling over all CPUs to purge their per-CPU UMA caches. >>>> Benefit of this approach is in absence of any additional hard-coded >>>> limits on cache sizes -- they are self-tuned, based on load and memory >>>> pressure. >>>> >>>> With this change I believe it should be safe enough to enable UMA >>>> allocation caches in ZFS via vfs.zfs.zio.use_uma tunable (at least for >>>> amd64). I did many tests on machine with 24 logical cores (and as >>>> result strong allocation cache effects), and can say that with 40GB >>>> RAM using UMA caches, allowed by this change, by two times increases >>>> results of SPEC NFS benchmark on ZFS pool of several SSDs. To test >>>> system stability I've run the same test with physical memory limited >>>> to just 2GB and system successfully survived that, and even showed >>>> results 1.5 times better then with just last resort measures of b). In >>>> both cases tools/umastat no longer shows unbound UMA cache growth, >>>> that makes me believe in viability of this approach for longer runs. >>>> >>>> I would like to hear some comments about that: >>>> http://people.freebsd.org/~mav/uma_pressure.patch >>> >>> >>> Hey Mav, >>> >>> This is a great start and great results. I think it could probably even >>> go in as-is, but I have a few suggestions. >> >> >> Hey! Thanks for your review. I appreciate. >> >> >>> First, let's test this with something that is really super allocator >>> heavy and doesn't benefit much from bucket sizing. For example, a >>> network forwarding test. Or maybe you could get someone like Netflix >>> that is using it to push a lot of bits with less filesystem cost than >>> zfs and spec. >> >> >> I am not sure what simple forwarding may show in this case. 
Even on my >> workload with ZFS creating strong memory pressure I still have mbuf* zones >> buckets almost (some totally) maxed out. Without other major (or even any) >> pressure in system they just can't become bigger then maximum. But if you >> can propose some interesting test case with pressure that I can reproduce -- >> I am all ears. >> >> >>> Second, the cpu binding is a very costly and very high-latency >>> operation. It would make sense to do CPU_FOREACH and then ZONE_FOREACH. >>> You're also biasing the first zones in the list. The low memory >>> condition will more often clear after you check these first zones. So >>> you might just check it once and equally penalize all zones. I'm >>> concerned that doing CPU_FOREACH in every zone will slow the pagedaemon >>> more. >> >> >> I completely agree with all you said here. This part of code I just took >> as-is from earlier work. It definitely can be improved. I'll take a look on >> that. But as I have mentioned in one of earlier responses that code used in >> _very_ rare cases, unless system is heavily overloaded on memory, like doing >> ZFS on box with 24 cores and 2GB RAM. During reasonable operation it is >> enough to have soft back pressure to keep on caches in shape and never call >> that. >> >> >>> We also have been working towards per-domain pagedaemons so >>> perhaps we should have a uma-reclaim taskqueue that we wake up to do the >>> work? >> >> >> VM is not my area so far, so please propose "the right way". I took this >> task now only because I have to due to huge performance bottleneck this >> problem causes and years it remains unsolved. >> >> >>> Third, using vm_page_count_min() will only trigger when the pageout >>> daemon can't keep up with the free target. Typically this should only >>> happen with a lot of dirty mmap'd pages or incredibly high system load >>> coupled with frequent allocations. 
So there may be many cases where >>> reclaiming the extra UMA memory is helpful but the pagedaemon can still >>> keep up while pushing out file pages that we'd prefer to keep. >> >> >> As I have told that is indeed last resort. It does not need to be done >> often. Per-CPU caches just should not grow without real need to the point >> when they have to be cleaned. >> >> >>> I think the perfect heuristic would have some idea of how likely the UMA >>> pages are to be re-used immediately so we can more effectively tradeoff >>> between file pages and kernel memory cache. As it is now we limit the >>> uma_reclaim() calls to every 10 seconds when there is memory pressure. >>> Perhaps we could keep a timestamp for when the last slab was allocated >>> to a zone and do the more expensive reclaim on zones who have timestamps >>> that exceed some threshold? Then have a lower threshold for reclaiming >>> at all? Again, it doesn't need to be perfect, but I believe we can catch >>> a wider set of cases by carefully scheduling this. >> >> >> I was thinking about that too. But I think timestamps should be set not on >> slab, but on bucket. The fact that zone is not allocating new slabs does not >> mean it does not use its already allocated buckets. If we put time of the >> last refill into each bucket, then we should be able to purge all buckets, >> unused for specified period of time. Additionally we could put timestamp on >> zone and update it every time zone runs out of its cache. If zone does not >> run out of cache for some time -- probably it has unused buckets. So when we >> need some RAM we should take a first look on zones that had stale timestamp. 
>> >> >> -- >> Alexander Motin >> _______________________________________________ >> freebsd-current@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-current >> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 19 04:02:59 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3300CEBE for ; Tue, 19 Nov 2013 04:02:59 +0000 (UTC) Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0B8DA20D1 for ; Tue, 19 Nov 2013 04:02:58 +0000 (UTC) Received: by mail-pa0-f52.google.com with SMTP id ld10so3134800pab.25 for ; Mon, 18 Nov 2013 20:02:58 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version:content-type; bh=HMXfPC5MgeeZiW4Xl2FDvu4fohfBX8Z/xq8+r8X8A1g=; b=lFy8rjEZVEkfDcqPds/O5rZmo9LAwh/tSXb4pKJGMBDC1cBSpGY9MfUx/yb0reBfjt 0zVmBqHe2ZCe9yfG1UvG749/EZSYWA0m43aMOK5zEN7hytcjpyW1b93bUMCsVN5iKGDO 8rk2QrBqow/Unvxs8wF4OiyH9k4k/rmj98Y1ZW4cMmM6526WCQlDFIZNIaqBu6i3owJl Z/Rospl+ZR8EeAx7dJaba1b2usx7/LnuEmgu6t55hGJMzMwoMLZn7WyKoSkBaxYt2jzR XQDuuQvpy/xkjapCYg5sbC14zToUIZLHr5s77RFu2hwtwCSXYJ1/8DjC6lIQN8FnriXx 6+Sw== X-Gm-Message-State: ALoCoQllvG6lUCJKhp0sNGWJ4SAWnSgG+WvvpBAAoqzY1HIlcexTqvCsHVkQ0v0pYi7tU5cUXAGT X-Received: by 10.68.218.3 with SMTP id pc3mr16807146pbc.71.1384833293474; Mon, 18 Nov 2013 19:54:53 -0800 (PST) Received: from rrcs-66-91-135-210.west.biz.rr.com (rrcs-66-91-135-210.west.biz.rr.com. 
[66.91.135.210]) by mx.google.com with ESMTPSA id gg10sm26972304pbc.46.2013.11.18.19.54.51 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 18 Nov 2013 19:54:52 -0800 (PST) Date: Mon, 18 Nov 2013 17:50:54 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Alexander Motin Subject: Re: UMA cache back pressure In-Reply-To: <528A70A2.4010308@FreeBSD.org> Message-ID: References: <52894C92.60905@FreeBSD.org> <528A70A2.4010308@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Tue, 19 Nov 2013 04:10:54 +0000 Cc: "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Nov 2013 04:02:59 -0000 On Mon, 18 Nov 2013, Alexander Motin wrote: > On 18.11.2013 21:11, Jeff Roberson wrote: >> On Mon, 18 Nov 2013, Alexander Motin wrote: >>> I've created patch, based on earlier work of avg@, to add back >>> pressure to UMA allocation caches. The problem of physical memory or >>> KVA exhaustion existed there for many years and it is quite critical >>> now for improving systems performance while keeping stability. Changes >>> done in memory allocation last years improved situation. but haven't >>> fixed completely. My patch solves remaining problems from two sides: >>> a) reducing bucket sizes every time system detects low memory >>> condition; and b) as last-resort mechanism for very low memory >>> condition, it cycling over all CPUs to purge their per-CPU UMA caches. >>> Benefit of this approach is in absence of any additional hard-coded >>> limits on cache sizes -- they are self-tuned, based on load and memory >>> pressure. 
>>> >>> With this change I believe it should be safe enough to enable UMA >>> allocation caches in ZFS via vfs.zfs.zio.use_uma tunable (at least for >>> amd64). I did many tests on machine with 24 logical cores (and as >>> result strong allocation cache effects), and can say that with 40GB >>> RAM using UMA caches, allowed by this change, by two times increases >>> results of SPEC NFS benchmark on ZFS pool of several SSDs. To test >>> system stability I've run the same test with physical memory limited >>> to just 2GB and system successfully survived that, and even showed >>> results 1.5 times better then with just last resort measures of b). In >>> both cases tools/umastat no longer shows unbound UMA cache growth, >>> that makes me believe in viability of this approach for longer runs. >>> >>> I would like to hear some comments about that: >>> http://people.freebsd.org/~mav/uma_pressure.patch >> >> Hey Mav, >> >> This is a great start and great results. I think it could probably even >> go in as-is, but I have a few suggestions. > > Hey! Thanks for your review. I appreciate. And I appreciate more people being interested in working on the allocator. > >> First, let's test this with something that is really super allocator >> heavy and doesn't benefit much from bucket sizing. For example, a >> network forwarding test. Or maybe you could get someone like Netflix >> that is using it to push a lot of bits with less filesystem cost than >> zfs and spec. > > I am not sure what simple forwarding may show in this case. Even on my > workload with ZFS creating strong memory pressure I still have mbuf* zones > buckets almost (some totally) maxed out. Without other major (or even any) > pressure in system they just can't become bigger then maximum. But if you can > propose some interesting test case with pressure that I can reproduce -- I am > all ears. I think part of that is also because you're using min free pages right now as your threshold. 
It should probably be triggering slightly more often.

>
>> Second, the cpu binding is a very costly and very high-latency
>> operation. It would make sense to do CPU_FOREACH and then ZONE_FOREACH.
>> You're also biasing the first zones in the list. The low memory
>> condition will more often clear after you check these first zones. So
>> you might just check it once and equally penalize all zones. I'm
>> concerned that doing CPU_FOREACH in every zone will slow the pagedaemon
>> more.
>
> I completely agree with all you said here. This part of code I just took
> as-is from earlier work. It definitely can be improved. I'll take a look on
> that. But as I have mentioned in one of earlier responses that code used in
> _very_ rare cases, unless system is heavily overloaded on memory, like doing
> ZFS on box with 24 cores and 2GB RAM. During reasonable operation it is
> enough to have soft back pressure to keep on caches in shape and never call
> that.
>
>> We also have been working towards per-domain pagedaemons so
>> perhaps we should have a uma-reclaim taskqueue that we wake up to do the
>> work?
>
> VM is not my area so far, so please propose "the right way". I took this task
> now only because I have to due to huge performance bottleneck this problem
> causes and years it remains unsolved.

Well, it's probably fine to keep abusing the first domain's pageout
daemon for now, but we won't want to in the future, especially if we
want to keep each domain's page daemon on the socket that it's managing.

>
>> Third, using vm_page_count_min() will only trigger when the pageout
>> daemon can't keep up with the free target. Typically this should only
>> happen with a lot of dirty mmap'd pages or incredibly high system load
>> coupled with frequent allocations. So there may be many cases where
>> reclaiming the extra UMA memory is helpful but the pagedaemon can still
>> keep up while pushing out file pages that we'd prefer to keep.
>
> As I have told that is indeed last resort. It does not need to be done often.
> Per-CPU caches just should not grow without real need to the point when they
> have to be cleaned.

Let me explain it differently. Right now you're handling cases of
overloaded CPU; if we run this code under different conditions we could
handle overloaded memory better as well. Imagine a system which has
oversized buckets and lots of wasted memory but a pageout daemon which
is still meeting targets by evicting page cache pages. Perhaps there was
a temporary use of some very large zones which is no longer necessary.
Since we meet the paging target quickly enough we will never discover
this other memory that we can evict.

Look at the vm page targets. The target is very far from the min. So
typically the thread just wakes up and evicts clean pages very quickly
to accommodate this. ZFS is particularly affected because its pages
can't be evicted by the page daemon, so you're more likely to run out,
but other systems would benefit from this too: they do have pages which
could be evicted, and you'd like to preserve those pages by trimming the
UMA cache instead.

Does that make sense?

>
>> I think the perfect heuristic would have some idea of how likely the UMA
>> pages are to be re-used immediately so we can more effectively tradeoff
>> between file pages and kernel memory cache. As it is now we limit the
>> uma_reclaim() calls to every 10 seconds when there is memory pressure.
>> Perhaps we could keep a timestamp for when the last slab was allocated
>> to a zone and do the more expensive reclaim on zones who have timestamps
>> that exceed some threshold? Then have a lower threshold for reclaiming
>> at all? Again, it doesn't need to be perfect, but I believe we can catch
>> a wider set of cases by carefully scheduling this.
>
> I was thinking about that too. But I think timestamps should be set not on
> slab, but on bucket. The fact that zone is not allocating new slabs does not
> mean it does not use its already allocated buckets. If we put time of the
> last refill into each bucket, then we should be able to purge all buckets,
> unused for specified period of time. Additionally we could put timestamp on
> zone and update it every time zone runs out of its cache. If zone does not
> run out of cache for some time -- probably it has unused buckets. So when we
> need some RAM we should take a first look on zones that had stale timestamp.

Many healthy flow control algorithms maintain a relatively steady state
by periodically testing the edges. I would prefer to maintain the
timestamp on a per-zone basis and not per-bucket anyway, as it saves
some space and we'd have to resize all the buckets if we take up another
pointer's worth of space.

Anyway, I'm not too dogmatic about it. There are probably several
convenient ways to write it and no perfect one. May I suggest that you
make the change to only do CPU_FOREACH once and then commit with your
current heuristic? Then we can try to take it one step further.

Thanks,
Jeff

>
> --
> Alexander Motin
>
From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 19 07:49:33 2013
Return-Path:
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C0DC5B1F; Tue, 19 Nov 2013 07:49:33 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 60FDB2A46; Tue, 19 Nov 2013 07:49:33 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAJ7nNck031698; Tue, 19 Nov 2013 09:49:23 +0200 (EET) (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAJ7nNck031698
Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAJ7nMiU031696; Tue, 19 Nov 2013 09:49:22
+0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 19 Nov 2013 09:49:22 +0200 From: Konstantin Belousov To: Doug Ambrisko Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: <20131119074922.GY59496@kib.kiev.ua> References: <51B3B59B.8050903@erdgeist.org> <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua> <20131118190142.GA28210@ambrisko.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rmCyz0CRE2AtwE2l" Content-Disposition: inline In-Reply-To: <20131118190142.GA28210@ambrisko.com> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-hackers@freebsd.org, Dirk Engling , Jase Thew , mdf@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Nov 2013 07:49:33 -0000 --rmCyz0CRE2AtwE2l Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote: > On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote: > | Generally, I agree with the approach, but what is done seems to be too > | simple to be usable. >=20 > I like the simplicity and I'd like to see examples of not being usable. I did exactly this in the text following the introductionary sentence, isn't it ? > =20 > | One obvious and important thing which is broken with the patch is > | the unmounts from jails. 
In other words, now it is possible to mount > | something from jail with appropriate privileges set up, and after that > | corresponding mount cannot be unmounted, since vfs_mount_alloc() copies > | trimmed path into f_mntonname, and sys_unmount() matches full path with > | pathbuf. Hmm, this should be broken in the same way for non-jailed > | mounts with pathes which do not fit into f_mntonname. >=20 > They can be umounted since it will fall back to fsid as in the non-jail > case. I just tried it sorry for the bad line wrap: >=20 > + mount 192.168.38.1:/data/home/ambrisko/netboot /data/jail/test > + jail -i -c name=3Dtest path=3D/data/jail/test host.hostname=3Dtest.ambr= isko.com persist enforce_statfs=3D0 allow.mount=3D1 allow.mount.devfs=3D1 a= llow.mount.nullfs=3D1 allow.mount.tmpfs=3D1 allow.mount.procfs=3D1 > 14 > + jexec test mkdir -p /12345678901234567890123456789012345678901234567890= 12345678901234567890123456789012345678901234567890/proc > + jexec test mkdir -p /12345678901234567890123456789012345678901234567890= 12345678901234567890123456789012345678901234567890/dev > + jexec test df > + egrep '^devfs|^procfs' > devfs 2 2 0 = 100% /dev > procfs 8 8 0 = 100% /proc > + jexec test mount -t procfs null //1234567890123456789012345678901234567= 890123456789012345678901234567890123456789012345678901234567890/proc > + jexec test mount -t devfs null //12345678901234567890123456789012345678= 90123456789012345678901234567890123456789012345678901234567890/dev > + jexec test df > + egrep '^devfs|^procfs' > devfs 2 2 0 = 100% /dev > procfs 8 8 0 = 100% /proc > procfs 8 8 0 = 100% /data/jail/test/1234567890123456789012345678901234567890123456789= 0123456789012345678901 > devfs 2 2 0 = 100% /data/jail/test/1234567890123456789012345678901234567890123456789= 0123456789012345678901 > + jexec test mount -v > + egrep '^devfs|^procfs' > devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000) > procfs on /proc (procfs, local, fsid 02ff000202000000) > procfs on 
/data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (procfs, local, fsid 26ff000202000000) > devfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (devfs, local, multilabel, fsid 27ff007171000000) > + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc > + jexec test df > + egrep '^devfs|^procfs' > devfs 2 2 0 100% /dev > procfs 8 8 0 100% /proc > devfs 2 2 0 100% /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 > + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev > + jexec test df > + egrep '^devfs|^procfs' > devfs 2 2 0 100% /dev > procfs 8 8 0 100% /proc I.e. unmount gets EINVAL, right? I do not like it; if going this route, why do we need to store the path in the kernel at all? At least, an attempt to unmount by path should then always return EINVAL consistently, instead of failing randomly due to an implementation detail in cases where the caller can reasonably expect the syscall to succeed. > > | I think that struct mount should have a const char * field where the > | non-trimmed path is stored and used for match at unmount. f_mntonname > | truncation would be only unfortunate user interface glitch. > > Note that we are not storing the path in mount structure so no structures > have changed which is nice since then we haven't introduced any real > ABI breakage. So we could MFC this. The match isn't critical since > umount will fall back to fsid and work. One thing that might be good to > do is change umount to try to umount via fsid first and then do the > match if the fsid failed versus the other way round that it does now. I do not like sometimes not storing the full path of the mount point. I do understand that the path can easily be made invalid, but I still want it there. 
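[Editorial illustration, not part of the original message.] The mismatch under discussion can be modelled in userland. The following is only a minimal sketch under stated assumptions: MNAMELEN is taken to be 88, which is what the 87-character truncated paths in the transcript above suggest, and struct toy_mount, its mnt_fullpath member, and all toy_* helpers are hypothetical names invented for this example. None of it is the kernel's actual struct mount or the sys_unmount() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 88 matches the 87-character truncated paths shown above. */
#define MNAMELEN 88

/* Hypothetical, simplified stand-in; NOT the kernel's struct mount. */
struct toy_mount {
	char  f_mntonname[MNAMELEN];	/* user-visible name, may be truncated */
	char *mnt_fullpath;		/* hypothetical untruncated copy */
};

/* Record a mount point; returns 0 on success, -1 if allocation fails. */
static int
toy_mount_init(struct toy_mount *mp, const char *path)
{
	size_t len;

	assert(mp != NULL && path != NULL);
	len = strlen(path);

	/* Truncating copy, analogous to filling f_mntonname. */
	strncpy(mp->f_mntonname, path, MNAMELEN - 1);
	mp->f_mntonname[MNAMELEN - 1] = '\0';

	/* Full copy that a path match can rely on. */
	mp->mnt_fullpath = malloc(len + 1);
	if (mp->mnt_fullpath == NULL)
		return (-1);
	memcpy(mp->mnt_fullpath, path, len + 1);
	return (0);
}

/* Release the untruncated copy. */
static void
toy_mount_fini(struct toy_mount *mp)
{
	free(mp->mnt_fullpath);
	mp->mnt_fullpath = NULL;
}

/* Match against the truncated name: fails for paths longer than 87 bytes. */
static bool
toy_match_truncated(const struct toy_mount *mp, const char *path)
{
	return (strcmp(mp->f_mntonname, path) == 0);
}

/* Match against the untruncated copy: works for any path length. */
static bool
toy_match_full(const struct toy_mount *mp, const char *path)
{
	return (strcmp(mp->mnt_fullpath, path) == 0);
}
```

With a path longer than 87 bytes, toy_match_truncated() returns false for the very path that was mounted, while toy_match_full() still succeeds; for short paths both agree. That is the behavioural difference a separate untruncated field would buy in this toy model.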
MFC is not the problem for struct mount, which is never directly allocated by non-VFS. The new member must be added to the end of the structure, which preserves KBI. I did such surgery more than once. >=20 > The problem I see is if someone tries to do things based on the parsed > output of mount/df then that will fail since the output is truncated. Yes, this is understandable and IMO acceptable. --rmCyz0CRE2AtwE2l Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSixgBAAoJEJDCuSvBvK1B1S4P/ilstNB84CaFfE1r+III3xkU 1X7eekSDOOz2A98mvW2pr4cqusf6Xx2J/feSIoKTqWs9yBcfGZjZt9l+i5d0C5t3 8JayjS/1sEYnbns1w4C34LoiVOHBRN1VAwS9XKQDcAvEcIpHFrYHxChHlkpsxRe8 +cz5hl2U9gRS6RjKHQJpC5OyskhMwXqjbbvJsvo37YEk0mYPAS9HvjBGilNgTph4 e9/a/ophP0AOF72KSMgaat5WT+37+x/ja6wBz+I3GWXjz0QgueuK/TIj/f3NFQpI pJwYwwtkvZ4pcxv1ELv4ZwShHGRpI5HiUbRw9M1dTJgy3vTQ3pOWTxKqkH8XGkSq N+vyww2toBBSCjL++UOaIaq7YPamR7jbeu2cNyQAbm8xh8fxwzZRmjOo1EevHR3r NfRkjLhlQYszbR7LPmEyfZBYkJUAUivkXlYqhszQ0H5usUk+lBa9PzblMvrO3XU+ o2oA+aqGioRZmm9JlwKsqIIYgA8aQyZzAxRDOrgDurDxtD4fUyTNks78mIocwse6 n9hvLTXCED9Oc7OPW8rBnyetGLX0YpCBsoN/E+1TCOOkuZHvmG78LY1Ofp8yB8zK EE66xfKpw6K7sQNSH6fsnbC/U9i1t9QVvmPG0UHKllR8yYAE/yJu86NUdytYWJRS JvTzu0zVvmwXjvlggau5 =OSsC -----END PGP SIGNATURE----- --rmCyz0CRE2AtwE2l-- From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 19 14:33:19 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CE331D71; Tue, 19 Nov 2013 14:33:19 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 90B8A240A; Tue, 19 Nov 2013 14:33:18 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA07017; Tue, 
19 Nov 2013 16:33:10 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1VimMX-000Kce-Ts; Tue, 19 Nov 2013 16:33:09 +0200 Message-ID: <528B7681.6090806@FreeBSD.org> Date: Tue, 19 Nov 2013 16:32:33 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: freebsd-hackers@FreeBSD.org Subject: Fwd: taskqueue_block References: <5287BDB9.10201@FreeBSD.org> In-Reply-To: <5287BDB9.10201@FreeBSD.org> X-Enigmail-Version: 1.6 X-Forwarded-Message-Id: <5287BDB9.10201@FreeBSD.org> Content-Type: text/plain; charset=x-viet-vps Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Nov 2013 14:33:19 -0000 Forwarding this to the larger audience for a discussion. -------- Original Message -------- Message-ID: <5287BDB9.10201@FreeBSD.org> Date: Sat, 16 Nov 2013 20:47:21 +0200 From: Andriy Gapon Subject: taskqueue_block It seems that either I do not understand something about the taskqueue_block code or it is quite a dangerous and abused API. The fact that it is not properly documented does not help either. The commit message said: > Implement taskqueue_block() and taskqueue_unblock(). These functions allow the > owner of a queue to block and unblock execution of the tasks in the queue while > allowing tasks to continue to be added queue. Combining this with > taskqueue_drain() allows a queue to be safely disabled. The unblock function may [...] I indeed see this (anti?) pattern being used in the code. But what about the following case? One thread calls taskqueue_block() and sets TQ_FLAGS_BLOCKED. Another thread calls taskqueue_enqueue(), which adds a task to the queue and sets ta_pending of the task to 1. 
tq_enqueue is not called, so the actual queue runner is neither called nor woken up. Then the first thread calls taskqueue_drain() on the task. As far as I can see, the thread would then just wait forever because the task is pending and is not going to be executed. Additionally, it is impossible to reason about the taskqueue's state after a taskqueue_block() call, because the call just sets the flag and does not do any synchronization. And as described above, it is not safe to call APIs that could allow the taskqueue or the task state to become known. I think that taskqueue_block() should wait on the currently active tasks to complete. I don't think that this behavior could be optional. I do not see any reasonable and safe use for "non-blocking" taskqueue_block(). taskqueue_drain() calls after taskqueue_block() must be removed. The code should either use taskqueue_drain() or "blocking" taskqueue_block() depending on concrete circumstances. What do you think? Thank you. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 19 17:42:24 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0CB1B15A; Tue, 19 Nov 2013 17:42:24 +0000 (UTC) Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90]) by mx1.freebsd.org (Postfix) with ESMTP id C55C32F71; Tue, 19 Nov 2013 17:42:23 +0000 (UTC) X-Ambrisko-Me: Yes Received: from server2.ambrisko.com (HELO internal.ambrisko.com) ([192.168.1.2]) by ironport.ambrisko.com with ESMTP; 19 Nov 2013 09:46:09 -0800 Received: from ambrisko.com (localhost [127.0.0.1]) by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id rAJHgGs7006486; Tue, 19 Nov 2013 09:42:16 -0800 (PST) (envelope-from ambrisko@ambrisko.com) Received: (from ambrisko@localhost) by ambrisko.com (8.14.4/8.14.4/Submit) id rAJHgGmT006464; Tue, 19 Nov 2013 09:42:16 -0800 
(PST) (envelope-from ambrisko) Date: Tue, 19 Nov 2013 09:42:16 -0800 From: Doug Ambrisko To: Konstantin Belousov Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: <20131119174216.GA80753@ambrisko.com> References: <51B3B59B.8050903@erdgeist.org> <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua> <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131119074922.GY59496@kib.kiev.ua> User-Agent: Mutt/1.4.2.3i Cc: freebsd-hackers@freebsd.org, Dirk Engling , Jase Thew , mdf@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Nov 2013 17:42:24 -0000 On Tue, Nov 19, 2013 at 09:49:22AM +0200, Konstantin Belousov wrote: | On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote: | > On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote: | > | Generally, I agree with the approach, but what is done seems to be too | > | simple to be usable. | > | > I like the simplicity and I'd like to see examples of not being usable. | I did exactly this in the text following the introductionary sentence, | isn't it ? I thought you were implying more than the one example that you gave. | > | One obvious and important thing which is broken with the patch is | > | the unmounts from jails. In other words, now it is possible to mount | > | something from jail with appropriate privileges set up, and after that | > | corresponding mount cannot be unmounted, since vfs_mount_alloc() copies | > | trimmed path into f_mntonname, and sys_unmount() matches full path with | > | pathbuf. 
Hmm, this should be broken in the same way for non-jailed | > | mounts with pathes which do not fit into f_mntonname. | > | > They can be umounted since it will fall back to fsid as in the non-jail | > case. I just tried it sorry for the bad line wrap: | > | > + mount 192.168.38.1:/data/home/ambrisko/netboot /data/jail/test | > + jail -i -c name=test path=/data/jail/test host.hostname=test.ambrisko.com persist enforce_statfs=0 allow.mount=1 allow.mount.devfs=1 allow.mount.nullfs=1 allow.mount.tmpfs=1 allow.mount.procfs=1 | > 14 | > + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc | > + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev | > + jexec test df | > + egrep '^devfs|^procfs' | > devfs 2 2 0 100% /dev | > procfs 8 8 0 100% /proc | > + jexec test mount -t procfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc | > + jexec test mount -t devfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev | > + jexec test df | > + egrep '^devfs|^procfs' | > devfs 2 2 0 100% /dev | > procfs 8 8 0 100% /proc | > procfs 8 8 0 100% /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 | > devfs 2 2 0 100% /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 | > + jexec test mount -v | > + egrep '^devfs|^procfs' | > devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000) | > procfs on /proc (procfs, local, fsid 02ff000202000000) | > procfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (procfs, local, fsid 26ff000202000000) | > devfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (devfs, local, multilabel, fsid 27ff007171000000) | > + jexec 
test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc | > + jexec test df | > + egrep '^devfs|^procfs' | > devfs 2 2 0 100% /dev | > procfs 8 8 0 100% /proc | > devfs 2 2 0 100% /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 | > + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev | > + jexec test df | > + egrep '^devfs|^procfs' | > devfs 2 2 0 100% /dev | > procfs 8 8 0 100% /proc | | I.e. unmount gets EINVAL, right ? I do not like it, if going this route, | why do we need to store the path in the kernel at all ? For compatibility with old stuff that hasn't switched to fsid. I'll describe it below, since it looks like umount(8) doesn't use the path any more unless fsid fails. | At least, the | attempt to unmount by path should consistently return EINVAL always, | instead of failing randomly due to an implementation detail, where the | caller can reasonably expect the syscall to succeed. Yes, a failed match by path is EINVAL; a failed match by fsid is ENOENT. First, I'm talking about the umount binary, and I made a mistake describing its behaviour. I thought it was trying the path first, when it actually tries the fsid returned from the stat structure first. If that fails, it then tries the path, for older kernels: /* First try to unmount using the file system ID. */ snprintf(fsidbuf, sizeof(fsidbuf), "FSID:%d:%d", sfs->f_fsid.val[0], sfs->f_fsid.val[1]); if (unmount(fsidbuf, fflag | MNT_BYFSID) != 0) { /* XXX, non-root users get a zero fsid, so don't warn. */ if (errno != ENOENT || sfs->f_fsid.val[0] != 0 || sfs->f_fsid.val[1] != 0) warn("unmount of %s failed", sfs->f_mntonname); if (errno != ENOENT) { free(orignfsdirname); return (1); } /* Compatibility for old kernels. 
*/ if (sfs->f_fsid.val[0] != 0 || sfs->f_fsid.val[1] != 0) warnx("retrying using path instead of file system ID"); if (unmount(sfs->f_mntonname, fflag) != 0) { warn("unmount of %s failed", sfs->f_mntonname); free(orignfsdirname); return (1); } } This was introduced at 1.38 in umount.c before 5.2 got released: When mount(8) is invoked with the `-v' flag, display the filesystem ID for each file system in addition to the normal information. In umount(8), accept filesystem IDs as well as the usual device and path names. This makes it possible to unambiguously specify which file system is to be unmounted even when two or more file systems share the same device and mountpoint names (e.g. NFS mounts from the same export into different chroots). and refined in 1.39. This doesn't address your concern about the system call unmount. | > | I think that struct mount should have a const char * field where the | > | non-trimmed path is stored and used for match at unmount. f_mntonname | > | truncation would be only unfortunate user interface glitch. | > | > Note that we are not storing the path in mount structure so no structures | > have changed which is nice since then we haven't introduced any real | > ABI breakage. So we could MFC this. The match isn't critical since | > umount will fall back to fsid and work. One thing that might be good to | > do is change umount to try to umount via fsid first and then do the | > match if the fsid failed versus the other way round that it does now. | I do not like somtimes not storing the full path of the mount point. | I do understand that the path can easily made invalid, but I still want | it there. | | MFC is not the problem for struct mount, which is never directly allocated | by non-VFS. The new member must be added to the end of the structure, which | preserves KBI. I did such surgery more than once. I was talking about the more general case since the system tries to keep the path in the stat structure. 
My prior approach, which had more issues, was to modify the stat structure, for which I was pointed to NetBSD and their change to statvfs, which doesn't really solve the problem. They don't check whether the mount path is longer than VFS_MNAMELEN (in their case) and just truncate things. If we are just talking about adding it to the mount structure, that would be okay since it isn't exposed to user land. I can add that. | > The problem I see is if someone tries to do things based on the parsed | > output of mount/df then that will fail since the output is truncated. | Yes, this is understandable and IMO acceptable. I think we might be able to fix that in the future by populating our bogus statvfs with more valid values for paths. With your suggestion we could populate the statvfs on the fly with a value from the mount structure, and then convert df and mount to use statvfs. Thanks, Doug A. From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 19 18:50:27 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3164ECE6; Tue, 19 Nov 2013 18:50:27 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1C30023F4; Tue, 19 Nov 2013 18:50:25 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA10663; Tue, 19 Nov 2013 20:50:23 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ViqNT-000KrX-Fy; Tue, 19 Nov 2013 20:50:23 +0200 Message-ID: <528BB2B7.8060908@FreeBSD.org> Date: Tue, 19 Nov 2013 20:49:27 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 
Thunderbird/24.1.0 MIME-Version: 1.0 To: freebsd-hackers@FreeBSD.org, FreeBSD Current Subject: provide fast versions of ffsl and flsl for i386; ffsll and flsll for amd64 X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Nov 2013 18:50:27 -0000 These are just trivial wrappers based on the fact that int and long on i386 have the same "bit layout" and likewise for long and long long on amd64. For your reviewing pleasure :-) Thanks! commit fdc1228b113f8b4c9dbda2b0323cb087c6b6df9d Author: Andriy Gapon Date: Thu Nov 7 19:13:00 2013 +0200 provide fast versions of ffsl and flsl for i386; ffsll and flsll for amd64 diff --git a/sys/amd64/include/cpufunc.h b/sys/amd64/include/cpufunc.h index 5f8197b..7464739 100644 --- a/sys/amd64/include/cpufunc.h +++ b/sys/amd64/include/cpufunc.h @@ -154,6 +154,14 @@ ffsl(long mask) return (mask == 0 ? mask : (int)bsfq((u_long)mask) + 1); } +#define HAVE_INLINE_FFSLL + +static __inline int +ffsll(long long mask) +{ + return (ffsl((long)mask)); +} + #define HAVE_INLINE_FLS static __inline int @@ -170,6 +178,14 @@ flsl(long mask) return (mask == 0 ? 
mask : (int)bsrq((u_long)mask) + 1); } +#define HAVE_INLINE_FLSLL + +static __inline int +flsll(long long mask) +{ + return (flsl((long)mask)); +} + #endif /* _KERNEL */ static __inline void diff --git a/sys/conf/files b/sys/conf/files index d41b9d2..8077bfc 100644 --- a/sys/conf/files +++ b/sys/conf/files @@ -3029,7 +3029,6 @@ libkern/arc4random.c standard libkern/bcd.c standard libkern/bsearch.c standard libkern/crc32.c standard -libkern/flsll.c standard libkern/fnmatch.c standard libkern/iconv.c optional libiconv libkern/iconv_converter_if.m optional libiconv diff --git a/sys/conf/files.arm b/sys/conf/files.arm index 603fb2d..d15f014 100644 --- a/sys/conf/files.arm +++ b/sys/conf/files.arm @@ -87,6 +87,7 @@ libkern/divdi3.c standard libkern/ffsl.c standard libkern/fls.c standard libkern/flsl.c standard +libkern/flsll.c standard libkern/lshrdi3.c standard libkern/moddi3.c standard libkern/qdivrem.c standard diff --git a/sys/conf/files.i386 b/sys/conf/files.i386 index 23e03a3..030dbe1 100644 --- a/sys/conf/files.i386 +++ b/sys/conf/files.i386 @@ -524,8 +524,7 @@ kern/kern_clocksource.c standard kern/imgact_aout.c optional compat_aout kern/imgact_gzip.c optional gzip libkern/divdi3.c standard -libkern/ffsl.c standard -libkern/flsl.c standard +libkern/flsll.c standard libkern/memmove.c standard libkern/memset.c standard libkern/moddi3.c standard diff --git a/sys/conf/files.ia64 b/sys/conf/files.ia64 index 6719c98..e85c35d 100644 --- a/sys/conf/files.ia64 +++ b/sys/conf/files.ia64 @@ -120,6 +120,7 @@ libkern/bcmp.c standard libkern/ffsl.c standard libkern/fls.c standard libkern/flsl.c standard +libkern/flsll.c standard libkern/ia64/__divdi3.S standard libkern/ia64/__divsi3.S standard libkern/ia64/__moddi3.S standard diff --git a/sys/conf/files.mips b/sys/conf/files.mips index 82d9a69..6522bb2 100644 --- a/sys/conf/files.mips +++ b/sys/conf/files.mips @@ -56,6 +56,7 @@ kern/subr_dummy_vdso_tc.c standard libkern/ffsl.c standard libkern/fls.c standard libkern/flsl.c 
standard +libkern/flsll.c standard libkern/memmove.c standard libkern/cmpdi2.c optional mips | mipsel libkern/ucmpdi2.c optional mips | mipsel diff --git a/sys/conf/files.pc98 b/sys/conf/files.pc98 index fd3ad4a..c95d956 100644 --- a/sys/conf/files.pc98 +++ b/sys/conf/files.pc98 @@ -210,6 +210,7 @@ kern/imgact_gzip.c optional gzip libkern/divdi3.c standard libkern/ffsl.c standard libkern/flsl.c standard +libkern/flsll.c standard libkern/memmove.c standard libkern/memset.c standard libkern/moddi3.c standard diff --git a/sys/conf/files.powerpc b/sys/conf/files.powerpc index 6d90fc7..98b3da0 100644 --- a/sys/conf/files.powerpc +++ b/sys/conf/files.powerpc @@ -79,6 +79,7 @@ libkern/ffs.c standard libkern/ffsl.c standard libkern/fls.c standard libkern/flsl.c standard +libkern/flsll.c standard libkern/lshrdi3.c optional powerpc libkern/memmove.c standard libkern/memset.c standard diff --git a/sys/conf/files.sparc64 b/sys/conf/files.sparc64 index 5c00350..ccee247 100644 --- a/sys/conf/files.sparc64 +++ b/sys/conf/files.sparc64 @@ -68,6 +68,7 @@ libkern/ffs.c standard libkern/ffsl.c standard libkern/fls.c standard libkern/flsl.c standard +libkern/flsll.c standard libkern/memmove.c standard sparc64/central/central.c optional central sparc64/ebus/ebus.c optional ebus diff --git a/sys/i386/include/cpufunc.h b/sys/i386/include/cpufunc.h index 7cd3663..98f82f2 100644 --- a/sys/i386/include/cpufunc.h +++ b/sys/i386/include/cpufunc.h @@ -184,6 +184,14 @@ ffs(int mask) return (mask == 0 ? mask : (int)bsfl((u_int)mask) + 1); } +#define HAVE_INLINE_FFSL + +static __inline int +ffsl(long mask) +{ + return (ffs((int)mask)); +} + #define HAVE_INLINE_FLS static __inline int @@ -192,6 +200,14 @@ fls(int mask) return (mask == 0 ? 
mask : (int)bsrl((u_int)mask) + 1); } +#define HAVE_INLINE_FLSL + +static __inline int +flsl(long mask) +{ + return (fls((int)mask)); +} + #endif /* _KERNEL */ static __inline void -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Nov 19 21:53:38 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9675B420; Tue, 19 Nov 2013 21:53:38 +0000 (UTC) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 590F12FAD; Tue, 19 Nov 2013 21:53:38 +0000 (UTC) Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132]) by mx1.stack.nl (Postfix) with ESMTP id 08160359316; Tue, 19 Nov 2013 22:53:35 +0100 (CET) Received: by turtle.stack.nl (Postfix, from userid 1677) id F1085CB4E; Tue, 19 Nov 2013 22:53:34 +0100 (CET) Date: Tue, 19 Nov 2013 22:53:34 +0100 From: Jilles Tjoelker To: Doug Ambrisko Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: <20131119215334.GA30794@stack.nl> References: <51B3B59B.8050903@erdgeist.org> <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua> <20131118190142.GA28210@ambrisko.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131118190142.GA28210@ambrisko.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Konstantin Belousov , freebsd-hackers@freebsd.org, Dirk Engling , Jase Thew , mdf@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Tue, 19 Nov 2013 21:53:38 -0000 On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote: > On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote: > | I think that struct mount should have a const char * field where the > | non-trimmed path is stored and used for match at unmount. f_mntonname > | truncation would be only unfortunate user interface glitch. > Note that we are not storing the path in mount structure so no structures > have changed which is nice since then we haven't introduced any real > ABI breakage. So we could MFC this. The match isn't critical since > umount will fall back to fsid and work. One thing that might be good to > do is change umount to try to umount via fsid first and then do the > match if the fsid failed versus the other way round that it does now. > The problem I see is if someone tries to do things based on the parsed > output of mount/df then that will fail since the output is truncated. As noted in comments in sbin/umount/umount.c, the statfs() call is deliberately after the mount list checks because it may block forever for unresponsive NFS servers. It would be unfortunate if hung NFS filesystems would have to be forcibly unmounted by copy/pasting the fsid from 'mount -v'. I like the idea of allowing longer mount paths in a simple way, though. 
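[Editorial illustration, not part of the original message.] For readers wondering how a hexadecimal fsid as printed by 'mount -v' relates to the "FSID:%d:%d" string quoted from umount.c earlier in this thread, here is a minimal sketch. It assumes that the 16 hex digits are simply the raw bytes of the fsid in memory order, which is what the transcript output suggests but is not confirmed anywhere in the thread. struct toy_fsid and the toy_* functions are hypothetical names for this example, not the real fsid_t or any umount(8) code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for fsid_t: two 32-bit values. */
struct toy_fsid {
	int32_t val[2];
};

/* Decode one hex digit; returns -1 for anything else. */
static int
toy_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return (c - '0');
	if (c >= 'a' && c <= 'f')
		return (c - 'a' + 10);
	if (c >= 'A' && c <= 'F')
		return (c - 'A' + 10);
	return (-1);
}

/*
 * Parse 16 hex digits into the raw bytes of a toy_fsid.
 * Assumption: the string is a byte-by-byte dump in memory order.
 * Returns 0 on success, -1 on malformed input.
 */
static int
toy_parse_hex_fsid(const char *hex, struct toy_fsid *fsid)
{
	unsigned char raw[sizeof(fsid->val)];
	size_t i;

	assert(hex != NULL && fsid != NULL);
	if (strlen(hex) != 2 * sizeof(raw))
		return (-1);
	for (i = 0; i < sizeof(raw); i++) {
		int hi = toy_hexval(hex[2 * i]);
		int lo = toy_hexval(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return (-1);
		raw[i] = (unsigned char)(hi * 16 + lo);
	}
	memcpy(fsid->val, raw, sizeof(raw));
	return (0);
}

/* Build the "FSID:%d:%d" form; returns 0 on success, -1 if buf is too small. */
static int
toy_format_fsid(const struct toy_fsid *fsid, char *buf, size_t len)
{
	int n;

	n = snprintf(buf, len, "FSID:%d:%d", (int)fsid->val[0],
	    (int)fsid->val[1]);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

The two integers that end up in the formatted string depend on the host's byte order, so no expected output is given here; the point is only that the hex form and the FSID string carry the same eight bytes under the stated assumption.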
-- Jilles Tjoelker From owner-freebsd-hackers@FreeBSD.ORG Wed Nov 20 03:29:19 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4639F2B6; Wed, 20 Nov 2013 03:29:19 +0000 (UTC) Received: from mail-qc0-x22f.google.com (mail-qc0-x22f.google.com [IPv6:2607:f8b0:400d:c01::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EF859279A; Wed, 20 Nov 2013 03:29:18 +0000 (UTC) Received: by mail-qc0-f175.google.com with SMTP id v14so581658qcr.20 for ; Tue, 19 Nov 2013 19:29:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=wjFe0RBd38gPvx7Bw5sjzObiqXdAH4uBkHG1bXKx1ww=; b=U1S8S19+K92GOasO33KHE9kRWYbcQxS10ZLYip7H76LGRd/8FfUQT9CCgBP0v2sZg4 7NByFN0nxTnRnpmH3huyUOS+gAgkemwK1LhLphs5nBkbqmdjy+5p6vgvabl0RBu2g9o1 UHyhTSI7G67lrFzfY97fnpTWyVVVJi5wIp7l8iQ6O18P0kB/oBPYXZ/DRmQ7d4a1K6mf KgzoNwqovGljXP//EU2NeEtUSHMYE85AAJuKPtOWEhHIygIEQmPqXYPxKcua4qKE8KMp +gizYug8RB+AWRvsLFvDNXQU9lqR4riJgQRrXDgLpWEHwGcoVdlNgexlwubsiyeyX0Cw YtqA== MIME-Version: 1.0 X-Received: by 10.49.59.70 with SMTP id x6mr48743897qeq.17.1384918158152; Tue, 19 Nov 2013 19:29:18 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.207.66 with HTTP; Tue, 19 Nov 2013 19:29:18 -0800 (PST) In-Reply-To: <528B7681.6090806@FreeBSD.org> References: <5287BDB9.10201@FreeBSD.org> <528B7681.6090806@FreeBSD.org> Date: Tue, 19 Nov 2013 19:29:18 -0800 X-Google-Sender-Auth: I80uT54wLwfHYondPdSMYGqSQ0I Message-ID: Subject: Re: taskqueue_block From: Adrian Chadd To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org" X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Nov 2013 03:29:19 -0000 Yes, and let's fix this. :) -a On 19 November 2013 06:32, Andriy Gapon wrote: > > Forwarding this to the larger audience for a discussion. > > -------- Original Message -------- > Message-ID: <5287BDB9.10201@FreeBSD.org> > Date: Sat, 16 Nov 2013 20:47:21 +0200 > From: Andriy Gapon > Subject: taskqueue_block > > > > It seems that either I do not understand something about taskqueue_block code or > it is a quite dangerous and abused API. The fact that it is not properly > documented does not help either. > > The commit message said: >> Implement taskqueue_block() and taskqueue_unblock(). These functions allow the >> owner of a queue to block and unblock execution of the tasks in the queue while >> allowing tasks to continue to be added queue. Combining this with >> taskqueue_drain() allows a queue to be safely disabled. The unblock function may > [...] > > I indeed see this (anti?) pattern being used in the code. > But what about the following case. One thread calls taskqueue_block() and sets > TQ_FLAGS_BLOCKED. Another thread calls taskqueue_enqueue, this adds a task to > the queue and sets ta_pending of the task to 1. tq_enqueue is not called, so an > actual queue runner is not called or waken up. Then the first thread calls > taskqueue_drain() on the task. As far as I can see, the thread would then just > wait forever because the task is pending and is not going to be executed. > > Additionally, it is impossible to reason about the taskqueue's state after > taskqueue_block call, because the call just sets the flag and does not do any > synchronization. And as described above, it is not safe to call APIs that could > allow the taskqueue or the task state to become known. 
> > I think that taskqueue_block() should wait on the currently active tasks to > complete. I don't think that this behavior could be optional. I do see any > reasonable and safe use for "non-blocking" taskqueue_block(). > taskqueue_drain() calls after taskqueue_block() must be removed. The code > should either use taskqueue_drain() or "blocking" taskqueue_block() depending on > concrete circumstances. > > What do you think? > Thank you. > -- > Andriy Gapon > > > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Wed Nov 20 07:55:52 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3C0D4596; Wed, 20 Nov 2013 07:55:52 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1A33C2423; Wed, 20 Nov 2013 07:55:50 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAK7tXxw094013; Wed, 20 Nov 2013 09:55:33 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAK7tXxw094013 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAK7tVDp093989; Wed, 20 Nov 2013 09:55:31 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 20 Nov 2013 09:55:31 +0200 From: Konstantin Belousov To: Doug Ambrisko Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: 
<20131120075531.GE59496@kib.kiev.ua> References: <51B3B59B.8050903@erdgeist.org> <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua> <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua> <20131119174216.GA80753@ambrisko.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="mC3NEAINQo/WgN2+" Content-Disposition: inline In-Reply-To: <20131119174216.GA80753@ambrisko.com> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-hackers@freebsd.org, Dirk Engling , Jase Thew , mdf@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Nov 2013 07:55:52 -0000 --mC3NEAINQo/WgN2+ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote: > I was talking about the more general case since the system tries to keep > the path in the stat structure. My prior approach which had more issues > was to modify the stat structure of which I was pointed to NetBSD and the= ir > change to statvfs which doesn't really solve the problem. They don't > have the check to see if the mount is longer then VFS_MNAMELEN (in their = case) > and just truncate things. >=20 > If we are just talking about adding it to the mount structure that > would be okay since it isn't exposed to user land. I can add that. Yes, this is exactly what I mean. Add a struct mount field, and use it for kernel only. 
In fact, it only matters for sys_unmount() and kern_jail.c, other locations in kernel use the path for warnings, and this could be postponed if you prefer to minimize the patch. --mC3NEAINQo/WgN2+ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSjGrzAAoJEJDCuSvBvK1BNYIQAIk75V37Pla/9LCW62TXNEuI idymxdkG8Rnc0PKH3BfgtpJ+97qTZuI0GPFryyAuZjdT0DUFHni5LQ3lwsmlJJ5m lLjaMEkZbGumMocAI311l+5n9BYiSNivwHdeJFl3uBA9yZSbK98n2QJJDdqK6CMk LTaYCT0caoPacvJ8SbtfL0g9qqaGuE3t8ny+cBry+wSeS94PyDx+SzZ2vYLCyael yLCzELHUzklQGpTuSU4e+sudr9km1y5pu60VpKiI46EB6kZLAe679PzP9VIBwgA+ fHlR2Q7NkgiETH1acAe6a8Qja6V2x+ETUHsVMTljyFuVKtYQJrT0l7M8swJKjBtG PU16oCNPAfw6Rzz9+mFGqBAlFanoPVkb2l2C4fXzcPuyavlwJZQ2HE0b6i10uNFh y53zmJYLHz0VZtUTcSOBdRrBbS5eInEckZyLzUBL3c/GUcSeZbdy+kTc+3DEFiMh oSSQRsiwzebUB2woocbqFtxutySsUC9mNoA3o2JvPiWe+whj9PNPvlRK9+JJ4Wl/ i0oA1tBgC0AKuzp7M+jm6aIe8TnElxjirw/bfRU7+g1wsb3DPN5mEb85RHf2F3HB 49TIRCiJ/TzsSaeY1Vw6287QRU//xcZqus1NZV0d9grk9WxZ02gltnJ2SpqhTTUG EcKVtz963Ek1zy4+nq6l =WZx2 -----END PGP SIGNATURE----- --mC3NEAINQo/WgN2+-- From owner-freebsd-hackers@FreeBSD.ORG Wed Nov 20 17:01:47 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 31610105 for ; Wed, 20 Nov 2013 17:01:47 +0000 (UTC) Received: from mail.dusatko.org (static-84-242-66-51.net.upcbroadband.cz [84.242.66.51]) by mx1.freebsd.org (Postfix) with ESMTP id D91BE2988 for ; Wed, 20 Nov 2013 17:01:46 +0000 (UTC) Received: from mail.dusatko.org (localhost [127.0.0.1]) by mail.dusatko.org (Postfix) with ESMTP id C1A782A1F for ; Wed, 20 Nov 2013 17:48:07 +0100 (CET) Received: from Relict (Relict.praha.dusatko [192.168.253.33]) by mail.dusatko.org (Postfix) with ESMTPA id 28DEB2A1D for ; Wed, 20 Nov 2013 17:48:06 +0100 (CET) From: 
=?iso-8859-2?B?SmFuIER1ueF0a28=?= To: Subject: ZFS pool cheating Date: Wed, 20 Nov 2013 17:47:16 +0100 Message-ID: <029f01cee610$4567f870$d037e950$@org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-2" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: Ac7mD/qSZDI57tCkTDuqTUiP8kAVEA== Content-Language: cs X-Mailman-Approved-At: Wed, 20 Nov 2013 17:43:33 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list Reply-To: jan@dusatko.org List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Nov 2013 17:01:47 -0000 Dear, Do you someone know method, how can be pool converted from concatenating to regular mirror? By mistake I replaced failed disk in pool using add not replace, which caused me to change pool configuration. I looking method allow me to have pool online during whole replacement procedure. Regards Jan From owner-freebsd-hackers@FreeBSD.ORG Wed Nov 20 18:55:44 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AECCB6D4 for ; Wed, 20 Nov 2013 18:55:44 +0000 (UTC) Received: from smtp.fagskolen.gjovik.no (smtp.fagskolen.gjovik.no [IPv6:2001:700:1100:1:200:ff:fe00:b]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 21F472167 for ; Wed, 20 Nov 2013 18:55:43 +0000 (UTC) Received: from mail.fig.ol.no (localhost [127.0.0.1]) by mail.fig.ol.no (8.14.7/8.14.7) with ESMTP id rAKItZSV068537 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 20 Nov 2013 19:55:35 +0100 (CET) (envelope-from trond@fagskolen.gjovik.no) Received: from localhost (trond@localhost) by 
mail.fig.ol.no (8.14.7/8.14.7/Submit) with ESMTP id rAKItZ6k068534; Wed, 20 Nov 2013 19:55:35 +0100 (CET) (envelope-from trond@fagskolen.gjovik.no) X-Authentication-Warning: mail.fig.ol.no: trond owned process doing -bs Date: Wed, 20 Nov 2013 19:55:34 +0100 (CET) From: =?ISO-8859-1?Q?Trond_Endrest=F8l?= Sender: Trond.Endrestol@fagskolen.gjovik.no To: =?UTF-8?Q?Jan_Du=C5=A1=C3=A1tko?= Subject: Re: ZFS pool cheating In-Reply-To: <029f01cee610$4567f870$d037e950$@org> Message-ID: References: <029f01cee610$4567f870$d037e950$@org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) Organization: Fagskolen Innlandet OpenPGP: url=http://fig.ol.no/~trond/trond.key MIME-Version: 1.0 X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.fig.ol.no Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: freebsd-hackers@FreeBSD.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Nov 2013 18:55:44 -0000 On Wed, 20 Nov 2013 17:47+0100, Jan Du?átko wrote: > Dear, > Do you someone know method, how can be pool converted from concatenating to > regular mirror? By mistake I replaced failed disk in pool using add not > replace, which caused me to change pool configuration. > I looking method allow me to have pool online during whole replacement > procedure. > > Regards > > Jan I'm afraid your only option is something along these lines: 1. Make a recursive snapshot of the entire pool. 2. Send a recursive ZFS stream of the recursive snapshots to another pool, or disk. Beware of the danger of data loss by having only a single set of snapshots available as you proceed. 3. Destroy the old pool. 4. 
Recreate the original pool to a mirrored configuration. 5. Transfer the recursive ZFS stream back to the new pool using the zfs receive command. 6. Remove the recursive snapshots, if warranted. If your able to setup a fresh pair of disks in your server, you might be able to transfer the snapshots to a mirrored pool assigned a temporary name. Then export both the current pool and the temporary pool. Import the temporary pool and rename the pool to the correct name as you import the temp pool. I guess/hope someone more knowledgeable on ZFS will chime in and correct me. -- +-------------------------------+------------------------------------+ | Vennlig hilsen, | Best regards, | | Trond Endrestøl, | Trond Endrestøl, | | IT-ansvarlig, | System administrator, | | Fagskolen Innlandet, | Gjøvik Technical College, Norway, | | tlf. mob. 952 62 567, | Cellular...: +47 952 62 567, | | sentralbord 61 14 54 00. | Switchboard: +47 61 14 54 00. | +-------------------------------+------------------------------------+ From owner-freebsd-hackers@FreeBSD.ORG Wed Nov 20 19:48:24 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 177ED6A3 for ; Wed, 20 Nov 2013 19:48:24 +0000 (UTC) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B7A1B248E for ; Wed, 20 Nov 2013 19:48:23 +0000 (UTC) Received: from [194.32.164.24] (80-46-130-69.static.dsl.as9105.com [80.46.130.69]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id rAKJZ5ib005129; Wed, 20 Nov 2013 19:35:06 GMT (envelope-from rb@gid.co.uk) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1822\)) Subject: Re: ZFS pool cheating From: Bob 
Bishop In-Reply-To: <029f01cee610$4567f870$d037e950$@org> Date: Wed, 20 Nov 2013 19:33:21 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: <31C2CB4A-F792-4088-96D9-77F1C991D6F7@gid.co.uk> References: <029f01cee610$4567f870$d037e950$@org> To: jan@dusatko.org X-Mailer: Apple Mail (2.1822) Cc: FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Nov 2013 19:48:24 -0000

Hi,

On 20 Nov 2013, at 16:47, Jan Dušátko wrote:

> Dear,
> Do you someone know method, how can be pool converted from concatenating to
> regular mirror? By mistake I replaced failed disk in pool using add not
> replace, which caused me to change pool configuration.
> I looking method allow me to have pool online during whole replacement
> procedure.

If you can connect two extra disks, you can make a concatenated mirror. If you can do that without a reboot you can do the whole procedure online.

With existing disks d1,d2 and new disks d3 at least as big as d1, d4 at least as big as d2:

	zpool attach <pool> d1 d3
	zpool attach <pool> d2 d4

Otherwise I think you are in for some downtime.

> Regards >=20 > Jan >=20 >=20 > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to = "freebsd-hackers-unsubscribe@freebsd.org" >=20 -- Bob Bishop rb@gid.co.uk From owner-freebsd-hackers@FreeBSD.ORG Wed Nov 20 20:14:35 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C5A7F31B for ; Wed, 20 Nov 2013 20:14:35 +0000 (UTC) Received: from mail.dusatko.org (static-84-242-66-51.net.upcbroadband.cz [84.242.66.51]) by mx1.freebsd.org (Postfix) with ESMTP id 867B62656 for ; Wed, 20 Nov 2013 20:14:34 +0000 (UTC) Received: from mail.dusatko.org (localhost [127.0.0.1]) by mail.dusatko.org (Postfix) with ESMTP id B3823209F; Wed, 20 Nov 2013 21:14:34 +0100 (CET) Received: from Relict (Relict.praha.dusatko [192.168.253.33]) by mail.dusatko.org (Postfix) with ESMTPA id 2133E209E; Wed, 20 Nov 2013 21:14:34 +0100 (CET) From: =?UTF-8?B?SmFuIER1xaHDoXRrbw==?= To: References: <029f01cee610$4567f870$d037e950$@org> In-Reply-To: Subject: RE: ZFS pool cheating Date: Wed, 20 Nov 2013 21:14:26 +0100 Message-ID: <007201cee62d$1c75c4c0$55614e40$@org> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: Ac7mIhxlNoBg+0huQ2KmspoXVejeqgACr2KA Content-Language: cs X-Mailman-Approved-At: Wed, 20 Nov 2013 23:26:22 +0000 Cc: freebsd-hackers@FreeBSD.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list Reply-To: jan@dusatko.org List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Nov 2013 20:14:35 -0000 Did you 
try to send the snapshot over the network?

I prepared a backup using tar and currently plan to take a ZFS snapshot. If it is possible to send it over SSH, I can minimize downtime and at the same time check / verify the functionality of the new pool with the same name.

Regards
Jan

-----Original Message-----
From: Trond.Endrestol@fagskolen.gjovik.no [mailto:Trond.Endrestol@fagskolen.gjovik.no]
Sent: 20 November 2013 19:56
To: Jan Dušátko
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: ZFS pool cheating

On Wed, 20 Nov 2013 17:47+0100, Jan Dušátko wrote:

> Dear,
> Do you someone know method, how can be pool converted from
> concatenating to regular mirror? By mistake I replaced failed disk in
> pool using add not replace, which caused me to change pool configuration.
> I looking method allow me to have pool online during whole replacement
> procedure.
>
> Regards
>
> Jan

I'm afraid your only option is something along these lines:

1. Make a recursive snapshot of the entire pool.

2. Send a recursive ZFS stream of the recursive snapshots to another pool, or disk.

Beware of the danger of data loss by having only a single set of snapshots available as you proceed.

3. Destroy the old pool.

4. Recreate the original pool to a mirrored configuration.

5. Transfer the recursive ZFS stream back to the new pool using the zfs receive command.

6. Remove the recursive snapshots, if warranted.

If you're able to set up a fresh pair of disks in your server, you might be able to transfer the snapshots to a mirrored pool assigned a temporary name. Then export both the current pool and the temporary pool. Import the temporary pool and rename the pool to the correct name as you import the temp pool.

I guess/hope someone more knowledgeable on ZFS will chime in and correct me.

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob. 952 62 567,         | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+

From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 07:18:16 2013 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 22951E57 for ; Thu, 21 Nov 2013 07:18:16 +0000 (UTC) Received: from smtp.fagskolen.gjovik.no (smtp.fagskolen.gjovik.no [IPv6:2001:700:1100:1:200:ff:fe00:b]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id BAC8C28C7 for ; Thu, 21 Nov 2013 07:18:15 +0000 (UTC) Received: from mail.fig.ol.no (localhost [127.0.0.1]) by mail.fig.ol.no (8.14.7/8.14.7) with ESMTP id rAL7I9ec074830 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 21 Nov 2013 08:18:09 +0100 (CET) (envelope-from trond@fagskolen.gjovik.no) Received: from localhost (trond@localhost) by mail.fig.ol.no (8.14.7/8.14.7/Submit) with ESMTP id rAL7I98I074827; Thu, 21 Nov 2013 08:18:09 +0100 (CET) (envelope-from trond@fagskolen.gjovik.no) X-Authentication-Warning: mail.fig.ol.no: trond owned process doing -bs Date: Thu, 21 Nov 2013 08:18:09 +0100 (CET) From: =?ISO-8859-1?Q?Trond_Endrest=F8l?= Sender: Trond.Endrestol@fagskolen.gjovik.no To: =?UTF-8?Q?Jan_Du=C5=A1=C3=A1tko?= Subject: RE: ZFS pool cheating In-Reply-To: <007201cee62d$1c75c4c0$55614e40$@org> Message-ID: References: <029f01cee610$4567f870$d037e950$@org>
<007201cee62d$1c75c4c0$55614e40$@org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) Organization: Fagskolen Innlandet OpenPGP: url=http://fig.ol.no/~trond/trond.key MIME-Version: 1.0 Content-ID: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=unavailable version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.fig.ol.no Content-Type: TEXT/PLAIN; CHARSET=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: freebsd-hackers@FreeBSD.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 07:18:16 -0000 > -----Original Message----- > From: Trond.Endrestol@fagskolen.gjovik.no [mailto:Trond.Endrestol@fagskolen.gjovik.no] > Sent: 20. listopadu 2013 19:56 > To: Jan Du?átko > Cc: freebsd-hackers@FreeBSD.org > Subject: Re: ZFS pool cheating > > On Wed, 20 Nov 2013 17:47+0100, Jan Du?átko wrote: > > > Dear, > > Do you someone know method, how can be pool converted from > > concatenating to regular mirror? By mistake I replaced failed disk in > > pool using add not replace, which caused me to change pool configuration. > > I looking method allow me to have pool online during whole replacement > > procedure. > > > > Regards > > > > Jan > > I'm afraid your only option is something along these lines: > > 1. Make a recursive snapshot of the entire pool. > > 2. Send a recursive ZFS stream of the recursive snapshots to another pool, or disk. > > Beware of the danger of data loss by having only a single set of snapshots available as you proceed. > > 3. Destroy the old pool. > > 4. Recreate the original pool to a mirrored configuration. > > 5. Transfer the recursive ZFS stream back to the new pool using the zfs receive command. > > 6. Remove the recursive snapshots, if warranted. 
> > If your able to setup a fresh pair of disks in your server, you > might be able to transfer the snapshots to a mirrored pool assigned > a temporary name. Then export both the current pool and the > temporary pool. Import the temporary pool and rename the pool to the > correct name as you import the temp pool. > > I guess/hope someone more knowledgeable on ZFS will chime in and > correct me. On Wed, 20 Nov 2013 21:14+0100, Jan Du?átko wrote: > Did you try to send snapshot over network ? No, I haven't tried anything, this is just off the top of my head. > I prepared backup using tar, currently plan to do ZFS snapshot and > if there will be possibility to send it over SSH, I can minimize > downtime and in the same time check / verify functionality of new > pool with the same name BTW, please don't top-post. -- +-------------------------------+------------------------------------+ | Vennlig hilsen, | Best regards, | | Trond Endrestøl, | Trond Endrestøl, | | IT-ansvarlig, | System administrator, | | Fagskolen Innlandet, | Gjøvik Technical College, Norway, | | tlf. mob. 952 62 567, | Cellular...: +47 952 62 567, | | sentralbord 61 14 54 00. | Switchboard: +47 61 14 54 00. 
| +-------------------------------+------------------------------------+ From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 12:39:09 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8C6A4D6C for ; Thu, 21 Nov 2013 12:39:09 +0000 (UTC) Received: from mail-la0-x229.google.com (mail-la0-x229.google.com [IPv6:2a00:1450:4010:c03::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1C664248B for ; Thu, 21 Nov 2013 12:39:08 +0000 (UTC) Received: by mail-la0-f41.google.com with SMTP id eo20so2371763lab.14 for ; Thu, 21 Nov 2013 04:39:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=K4FL/i/vxeWzjvXlndKh+r4XivLxwROw8fuJeC1VQek=; b=ZMNZYpNWrPS8vttBedCX/xxSGqkmqhQf4+TdHO2RfGmofnpsB5sTekDIVkm2nIjLen fx/6h/YJVgffFG/L2olwSSzsFAo/t0+e3mAaBkIr6DteHHtoOmZajlNc5bb7MdyN3TZo 5wAFu3ubTepyDt0j8STvk23TifwZy0lTQw/6IzYNH8faGcFuTRBdKOkelMWxSxRnf7Bs W6eedLotQdz2mIlbhvrKNDpXd6e4GIpDbtIptqebuLRi1XJ45U3xc2SBPO4/YV6U4Oit CXO/cb+EYUEOCigHNY7BX2sYBxSIdcs3/EKp9nGGzqSB6y0KlDr47NSc9nzxjYwh2O+q ee+A== X-Received: by 10.152.170.199 with SMTP id ao7mr1130074lac.40.1385037546278; Thu, 21 Nov 2013 04:39:06 -0800 (PST) Received: from [172.16.0.2] (tx97.net. 
[85.198.160.156]) by mx.google.com with ESMTPSA id 8sm32490258laq.5.2013.11.21.04.39.04 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 21 Nov 2013 04:39:05 -0800 (PST) Message-ID: <528DFEE6.6020504@gmail.com> Date: Thu, 21 Nov 2013 14:39:02 +0200 From: Vitaly Magerya User-Agent: Thunderbird MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Problem with signal 0 being delivered to SIGUSR1 handler Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 12:39:09 -0000

Hi, folks.

I'm investigating a test case failure that devel/boehm-gc has on recent FreeBSD releases. The problem is that a signal handler registered for SIGUSR1 is sometimes called with signum=0, which should not be possible under any conditions.

Here's a simple test case that demonstrates this behavior:

    /* Compile with 'c99 -o example example.c -pthread' */
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    void signal_handler(int signum, siginfo_t *si, void *context) {
        if (signum != SIGUSR1) {
            printf("bad signal, signum=%d\n", signum);
            exit(1);
        }
    }

    void *thread_func(void *arg) {
        return arg;
    }

    int main(void) {
        struct sigaction sa = { 0 };
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = signal_handler;
        if (sigfillset(&sa.sa_mask) != 0) abort();
        if (sigaction(SIGUSR1, &sa, NULL) != 0) abort();
        for (int i = 0; i < 10000; i++) {
            pthread_t t;
            pthread_create(&t, NULL, thread_func, NULL);
            pthread_kill(t, SIGUSR1);
        }
        return 0;
    }

Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get "signum=0" from this program, but you may need to run it a few times or increase the number of iterations to see the same.

Interestingly enough, I don't see this behavior under 9.0-RELEASE.

So, any ideas what the problem here is?

From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 13:20:59 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 24C93A9E for ; Thu, 21 Nov 2013 13:20:59 +0000 (UTC) Received: from mail-pd0-x22d.google.com (mail-pd0-x22d.google.com [IPv6:2607:f8b0:400e:c02::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 040FF2732 for ; Thu, 21 Nov 2013 13:20:58 +0000 (UTC) Received: by mail-pd0-f173.google.com with SMTP id p10so3479531pdj.4 for ; Thu, 21 Nov 2013 05:20:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=pz2rtkKRO8mN68MDPxdGRx02eJlbiE8MKW4/uRr1IgA=; b=EBR/3IVyC7741eAvTJfcejU7UGbxpyjp9tqw1Q2dEVJExRHfXChf4V4LZyrE0n/VR5 h+1baJ3lOsxhV7ZF1dNXU9UoK4RI470IiI4Zly0A4A66Jr1qu2bTMhkL7dNl9k6jNMdg Zv5ahILqWJFPgXje24nAo+HMdjajIsIwlXHLH97MJV90Zp5z9s7/6NcDOl1SrEssS/QQ /aRfjN7F18ei9gRNXxZblU655g37vV/mrTC+7lp4kWKeMWbqGquh788R3Vu065D076Hr Odl37LJUG5VTB9K3iyJ/Z3eSVT8sOe4t2jRHD7efHt05fwnGnein9UN+/zWlT1ncwnuE FqFA== MIME-Version: 1.0 X-Received: by 10.68.218.3 with SMTP id pc3mr6245928pbc.71.1385040056398; Thu, 21 Nov 2013 05:20:56 -0800 (PST) Received: by 10.70.41.133 with HTTP; Thu, 21 Nov 2013 05:20:56 -0800 (PST) Date: Thu, 21 Nov 2013 07:20:56 -0600 Message-ID: Subject: 9.1 callout behavior From: Bret Ketchum To: freebsd-hackers@freebsd.org X-Mailman-Approved-At: Thu, 21 Nov 2013 13:27:31 +0000 Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.16 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 13:20:59 -0000

I've a callout which runs every 100ms and does a bit of accounting using the global ticks variable. This one-shot callout was called fairly consistently in 8.1, every 100ms give or take a few thousand clocks. I've recently upgraded to 9.1 and for the most part the period is consistent. However, periodically the callout function is executed anywhere between 5ms to 20ms after the callout was reset and the function returned while global ticks has increased 8x. The hardware has not changed (using the same timecounter configuration):

CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.05-MHz K8-class CPU)

kern.timecounter.hardware: TSC-low
kern.timecounter.tick: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.smp_tsc: 1

And default eventtimer configuration:

kern.eventtimer.singlemul: 2
kern.eventtimer.idletick: 0
kern.eventtimer.activetick: 1
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 0

If tickless mode is disabled the inconsistency goes away. Is the premature expiration of the callout expected? Is the jump in global ticks typical (say from 100 ticks to 800 ticks in 1.5ms)?

Bret From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 17:40:36 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B697FC73; Thu, 21 Nov 2013 17:40:36 +0000 (UTC) Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90]) by mx1.freebsd.org (Postfix) with ESMTP id 6FE6226B3; Thu, 21 Nov 2013 17:40:36 +0000 (UTC) X-Ambrisko-Me: Yes Received: from server2.ambrisko.com (HELO internal.ambrisko.com) ([192.168.1.2]) by ironport.ambrisko.com with ESMTP; 21 Nov 2013 09:44:23 -0800 Received: from ambrisko.com (localhost [127.0.0.1]) by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id rALHeUcq087761; Thu, 21 Nov 2013 09:40:30 -0800 (PST) (envelope-from ambrisko@ambrisko.com) Received: (from ambrisko@localhost) by ambrisko.com (8.14.4/8.14.4/Submit) id rALHeSQ0087758; Thu, 21 Nov 2013 09:40:28 -0800 (PST) (envelope-from ambrisko) Date: Thu, 21 Nov 2013 09:40:28 -0800 From: Doug Ambrisko To: Konstantin Belousov Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: <20131121174028.GA80520@ambrisko.com> References: <51B3B59B.8050903@erdgeist.org> <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua> <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua> <20131119174216.GA80753@ambrisko.com> <20131120075531.GE59496@kib.kiev.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131120075531.GE59496@kib.kiev.ua> User-Agent: Mutt/1.4.2.3i Cc: freebsd-hackers@freebsd.org, Dirk Engling , Jase Thew , mdf@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 17:40:36 -0000

On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote:
| On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
| > I was talking about the more general case since the system tries to keep
| > the path in the stat structure. My prior approach which had more issues
| > was to modify the stat structure of which I was pointed to NetBSD and their
| > change to statvfs which doesn't really solve the problem. They don't
| > have the check to see if the mount is longer then VFS_MNAMELEN (in their case)
| > and just truncate things.
| >
| > If we are just talking about adding it to the mount structure that
| > would be okay since it isn't exposed to user land. I can add that.
|
| Yes, this is exactly what I mean. Add a struct mount field, and use
| it for kernel only. In fact, it only matters for sys_unmount() and
| kern_jail.c, other locations in kernel use the path for warnings, and
| this could be postponed if you prefer to minimize the patch.

Okay, I went through all of the occurrences and compile tested (except for #DEBUG). I united a few things but should do more once I get consensus on the approach. I found a few spots that should be updated as well and made the length check more consistent. Some were doing >= and others >. So this should be better, however, a lot larger. On the plus side when we figure out how to return the longer path length to user land that can be more flexible since the kernel is tracking the longer length.

Probably things to note are changes in:
	ZFS to mount snapshot
	cd9660 for symlinks
	fuse to return full path
	jail to check statfs and mount
	mount/umount to save and check full path
	mountroot to save new field for full path

Just in case it doesn't make it in email the full patch is at:
	http://people.freebsd.org/~ambrisko/mount_bigger.patch

Thanks,

Doug A.

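The difference between the `>=` and `>` length checks mentioned above can be illustrated with a small userland sketch. The helper names `fits_with_nul()` and `copy_name()` and the 8-byte `EXAMPLE_NAMELEN` are assumptions made for this example, not code from the patch. A string fits in a fixed buffer of `bufsize` bytes, terminating NUL included, only when `strlen(s) < bufsize`, so the rejecting test is `>=` when the limit is the buffer size and `>` only when the limit already excludes the NUL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_NAMELEN 8	/* hypothetical buffer size, NUL included */

/* True if s plus its terminating NUL fits in a buffer of bufsize bytes. */
bool
fits_with_nul(const char *s, size_t bufsize)
{
	return (strlen(s) < bufsize);
}

/*
 * Copy src into a fixed EXAMPLE_NAMELEN-byte buffer.
 * Returns 0 on success, -1 if the name would not fit.
 */
int
copy_name(char dst[EXAMPLE_NAMELEN], const char *src)
{
	if (strlen(src) >= EXAMPLE_NAMELEN)	/* ">" would admit an 8-char name */
		return (-1);
	memcpy(dst, src, strlen(src) + 1);	/* length + 1 copies the NUL too */
	return (0);
}
```

With an 8-byte buffer, a 7-character name is accepted and an 8-character name is rejected; a `>` comparison against the same constant would accept the 8-character name and leave no room for the terminator.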
Index: cddl/compat/opensolaris/kern/opensolaris_vfs.c =================================================================== --- cddl/compat/opensolaris/kern/opensolaris_vfs.c (revision 257489) +++ cddl/compat/opensolaris/kern/opensolaris_vfs.c (working copy) @@ -126,7 +126,7 @@ * variables will fit in our mp buffers, including the * terminating NUL. */ - if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN) + if (strlen(fstype) > MFSNAMELEN || strlen(fspath) > MAXPATHLEN) return (ENAMETOOLONG); vfsp = vfs_byname_kld(fstype, td, &error); Index: cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c =================================================================== --- cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c (revision 257489) +++ cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c (working copy) @@ -1069,12 +1069,12 @@ dmu_objset_rele(snap, FTAG); domount: - mountpoint_len = strlen(dvp->v_vfsp->mnt_stat.f_mntonname) + + mountpoint_len = strlen(dvp->v_vfsp->mnt_path) + strlen("/" ZFS_CTLDIR_NAME "/snapshot/") + strlen(nm) + 1; mountpoint = kmem_alloc(mountpoint_len, KM_SLEEP); (void) snprintf(mountpoint, mountpoint_len, "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", - dvp->v_vfsp->mnt_stat.f_mntonname, nm); + dvp->v_vfsp->mnt_path, nm); err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); kmem_free(mountpoint, mountpoint_len); if (err == 0) { Index: fs/cd9660/cd9660_rrip.c =================================================================== --- fs/cd9660/cd9660_rrip.c (revision 257489) +++ fs/cd9660/cd9660_rrip.c (working copy) @@ -167,7 +167,7 @@ /* same as above */ outbuf -= len; len = 0; - inbuf = ana->imp->im_mountp->mnt_stat.f_mntonname; + inbuf = (char *)ana->imp->im_mountp->mnt_path; wlen = strlen(inbuf); break; Index: fs/ext2fs/ext2_lookup.c =================================================================== --- fs/ext2fs/ext2_lookup.c (revision 257489) +++ fs/ext2fs/ext2_lookup.c (working copy) @@ -802,10 +802,10 @@ mp = 
ITOV(ip)->v_mount; if ((mp->mnt_flag & MNT_RDONLY) == 0) panic("ext2_dirbad: %s: bad dir ino %lu at offset %ld: %s\n", - mp->mnt_stat.f_mntonname, (u_long)ip->i_number,(long)offset, how); + mp->mnt_path, (u_long)ip->i_number,(long)offset, how); else (void)printf("%s: bad dir ino %lu at offset %ld: %s\n", - mp->mnt_stat.f_mntonname, (u_long)ip->i_number, (long)offset, how); + mp->mnt_path, (u_long)ip->i_number, (long)offset, how); } Index: fs/fuse/fuse_vnops.c =================================================================== --- fs/fuse/fuse_vnops.c (revision 257489) +++ fs/fuse/fuse_vnops.c (working copy) @@ -1265,7 +1265,7 @@ } if (((char *)fdi.answ)[0] == '/' && fuse_get_mpdata(vnode_mount(vp))->dataflags & FSESS_PUSH_SYMLINKS_IN) { - char *mpth = vnode_mount(vp)->mnt_stat.f_mntonname; + char *mpth = (char *)vnode_mount(vp)->mnt_path; err = uiomove(mpth, strlen(mpth), uio); } Index: fs/nandfs/nandfs_segment.c =================================================================== --- fs/nandfs/nandfs_segment.c (revision 257489) +++ fs/nandfs/nandfs_segment.c (working copy) @@ -1275,7 +1275,7 @@ mp = (struct mount *)addr; db_printf("%p %s on %s (%s)\n", mp, mp->mnt_stat.f_mntfromname, - mp->mnt_stat.f_mntonname, mp->mnt_stat.f_fstypename); + mp->mnt_path, mp->mnt_stat.f_fstypename); nmp = (struct nandfsmount *)(mp->mnt_data); Index: fs/nullfs/null_vfsops.c =================================================================== --- fs/nullfs/null_vfsops.c (revision 257489) +++ fs/nullfs/null_vfsops.c (working copy) @@ -211,7 +211,7 @@ vfs_mountedfrom(mp, target); NULLFSDEBUG("nullfs_mount: lower %s, alias at %s\n", - mp->mnt_stat.f_mntfromname, mp->mnt_stat.f_mntonname); + mp->mnt_stat.f_mntfromname, mp->mnt_path); return (0); } Index: fs/unionfs/union_vfsops.c =================================================================== --- fs/unionfs/union_vfsops.c (revision 257489) +++ fs/unionfs/union_vfsops.c (working copy) @@ -310,7 +310,7 @@ copystr(target, tmp, len, NULL); 
UNIONFSDEBUG("unionfs_mount: from %s, on %s\n", - mp->mnt_stat.f_mntfromname, mp->mnt_stat.f_mntonname); + mp->mnt_stat.f_mntfromname, mp->mnt_path); return (0); } Index: geom/journal/g_journal.c =================================================================== --- geom/journal/g_journal.c (revision 257489) +++ geom/journal/g_journal.c (working copy) @@ -2922,7 +2922,7 @@ goto next; } - mountpoint = mp->mnt_stat.f_mntonname; + mountpoint = (char *)mp->mnt_path; error = vn_start_write(NULL, &mp, V_WAIT); if (error != 0) { Index: gnu/fs/reiserfs/reiserfs_vfsops.c =================================================================== --- gnu/fs/reiserfs/reiserfs_vfsops.c (revision 257489) +++ gnu/fs/reiserfs/reiserfs_vfsops.c (working copy) @@ -309,7 +309,7 @@ reiserfs_log(LOG_DEBUG, "...done\n"); if (sbp != &mp->mnt_stat) { - reiserfs_log(LOG_DEBUG, "copying monut point info\n"); + reiserfs_log(LOG_DEBUG, "copying mount point info\n"); sbp->f_type = mp->mnt_vfc->vfc_typenum; bcopy((caddr_t)mp->mnt_stat.f_mntonname, (caddr_t)&sbp->f_mntonname[0], MNAMELEN); @@ -318,7 +318,7 @@ reiserfs_log(LOG_DEBUG, " mount from: %s\n", sbp->f_mntfromname); reiserfs_log(LOG_DEBUG, " mount on: %s\n", - sbp->f_mntonname); + mp->mnt_path); reiserfs_log(LOG_DEBUG, "...done\n"); } Index: kern/kern_jail.c =================================================================== --- kern/kern_jail.c (revision 257489) +++ kern/kern_jail.c (working copy) @@ -3555,7 +3555,6 @@ prison_canseemount(struct ucred *cred, struct mount *mp) { struct prison *pr; - struct statfs *sp; size_t len; pr = cred->cr_prison; @@ -3574,14 +3573,13 @@ if (strcmp(pr->pr_path, "/") == 0) return (0); len = strlen(pr->pr_path); - sp = &mp->mnt_stat; - if (strncmp(pr->pr_path, sp->f_mntonname, len) != 0) + if (strncmp(pr->pr_path, mp->mnt_path, len) != 0) return (ENOENT); /* * Be sure that we don't have situation where jail's root directory * is "/some/path" and mount point is "/some/pathpath". 
*/ - if (sp->f_mntonname[len] != '\0' && sp->f_mntonname[len] != '/') + if (mp->mnt_path[len] != '\0' && mp->mnt_path[len] != '/') return (ENOENT); return (0); } Index: kern/vfs_mount.c =================================================================== --- kern/vfs_mount.c (revision 257489) +++ kern/vfs_mount.c (working copy) @@ -473,6 +473,7 @@ mp->mnt_cred = crdup(cred); mp->mnt_stat.f_owner = cred->cr_uid; strlcpy(mp->mnt_stat.f_mntonname, fspath, MNAMELEN); + strlcpy((char *)mp->mnt_path, fspath, MAXPATHLEN); mp->mnt_iosize_max = DFLTPHYS; #ifdef MAC mac_mount_init(mp); @@ -656,7 +657,7 @@ * variables will fit in our mp buffers, including the * terminating NUL. */ - if (fstypelen > MFSNAMELEN || fspathlen > MNAMELEN) { + if (fstypelen > MFSNAMELEN || fspathlen > MAXPATHLEN) { error = ENAMETOOLONG; goto bail; } @@ -748,8 +749,8 @@ return (EOPNOTSUPP); } - ma = mount_argsu(ma, "fstype", uap->type, MNAMELEN); - ma = mount_argsu(ma, "fspath", uap->path, MNAMELEN); + ma = mount_argsu(ma, "fstype", uap->type, MFSNAMELEN); + ma = mount_argsu(ma, "fspath", uap->path, MAXPATHLEN); ma = mount_argb(ma, flags & MNT_RDONLY, "noro"); ma = mount_argb(ma, !(flags & MNT_NOSUID), "nosuid"); ma = mount_argb(ma, !(flags & MNT_NOEXEC), "noexec"); @@ -1040,7 +1041,7 @@ * variables will fit in our mp buffers, including the * terminating NUL. 
*/ - if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN) + if (strlen(fstype) > MFSNAMELEN || strlen(fspath) > MAXPATHLEN) return (ENAMETOOLONG); if (jailed(td->td_ucred) || usermount == 0) { @@ -1095,9 +1096,9 @@ NDFREE(&nd, NDF_ONLY_PNBUF); vp = nd.ni_vp; if ((fsflags & MNT_UPDATE) == 0) { - pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK); + pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK); strcpy(pathbuf, fspath); - error = vn_path_to_global_path(td, vp, pathbuf, MNAMELEN); + error = vn_path_to_global_path(td, vp, pathbuf, MAXPATHLEN); /* debug.disablefullpath == 1 results in ENODEV */ if (error == 0 || error == ENODEV) { error = vfs_domount_first(td, vfsp, pathbuf, vp, @@ -1147,8 +1148,8 @@ return (error); } - pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK); - error = copyinstr(uap->path, pathbuf, MNAMELEN, NULL); + pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK); + error = copyinstr(uap->path, pathbuf, MAXPATHLEN, NULL); if (error) { free(pathbuf, M_TEMP); return (error); @@ -1179,13 +1180,13 @@ if (namei(&nd) == 0) { NDFREE(&nd, NDF_ONLY_PNBUF); error = vn_path_to_global_path(td, nd.ni_vp, pathbuf, - MNAMELEN); + MAXPATHLEN); if (error == 0 || error == ENODEV) vput(nd.ni_vp); } mtx_lock(&mountlist_mtx); TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) { - if (strcmp(mp->mnt_stat.f_mntonname, pathbuf) == 0) + if (strcmp(mp->mnt_path, pathbuf) == 0) break; } mtx_unlock(&mountlist_mtx); Index: kern/vfs_mountroot.c =================================================================== --- kern/vfs_mountroot.c (revision 257489) +++ kern/vfs_mountroot.c (working copy) @@ -307,6 +307,8 @@ vp->v_mountedhere = mporoot; strlcpy(mporoot->mnt_stat.f_mntonname, fspath, MNAMELEN); + strlcpy((char *)mporoot->mnt_path, + fspath, MAXPATHLEN); VOP_UNLOCK(vp, 0); } else vput(vp); Index: kern/vfs_subr.c =================================================================== --- kern/vfs_subr.c (revision 257489) +++ kern/vfs_subr.c (working copy) @@ -2962,7 +2962,7 @@ 
TAILQ_FOREACH(mp, &mountlist, mnt_list) { db_printf("%p %s on %s (%s)\n", mp, mp->mnt_stat.f_mntfromname, - mp->mnt_stat.f_mntonname, + mp->mnt_path, mp->mnt_stat.f_fstypename); if (db_pager_quit) break; @@ -2973,7 +2973,7 @@ mp = (struct mount *)addr; db_printf("%p %s on %s (%s)\n", mp, mp->mnt_stat.f_mntfromname, - mp->mnt_stat.f_mntonname, mp->mnt_stat.f_fstypename); + mp->mnt_path, mp->mnt_stat.f_fstypename); buf[0] = '\0'; mflags = mp->mnt_flag; @@ -3406,7 +3406,7 @@ */ if (strcmp(mp->mnt_vfc->vfc_name, "devfs") != 0) { printf("unmount of %s failed (", - mp->mnt_stat.f_mntonname); + mp->mnt_path); if (error == EBUSY) printf("BUSY)\n"); else Index: security/mac_lomac/mac_lomac.c =================================================================== --- security/mac_lomac/mac_lomac.c (revision 257489) +++ security/mac_lomac/mac_lomac.c (working copy) @@ -569,7 +569,7 @@ "mountpount=%s)\n", subjlabeltext, p->p_pid, pgid, curthread->td_ucred->cr_uid, p->p_comm, subjtext, actionname, objlabeltext, objname, - va.va_fileid, vp->v_mount->mnt_stat.f_mntonname); + va.va_fileid, vp->v_mount->mnt_path); } else { log(LOG_INFO, "LOMAC: level-%s subject p%dg%du%d:%s demoted to" " level %s after %s a level-%s %s\n", Index: sys/mount.h =================================================================== --- sys/mount.h (revision 257489) +++ sys/mount.h (working copy) @@ -190,6 +190,7 @@ struct lock mnt_explock; /* vfs_export walkers lock */ TAILQ_ENTRY(mount) mnt_upper_link; /* (m) we in the all uppers */ TAILQ_HEAD(, mount) mnt_uppers; /* (m) upper mounts over us*/ + const char mnt_path[MAXPATHLEN]; /* actual mount path */ }; /* Index: ufs/ffs/ffs_alloc.c =================================================================== --- ufs/ffs/ffs_alloc.c (revision 257489) +++ ufs/ffs/ffs_alloc.c (working copy) @@ -2748,7 +2748,7 @@ case FFS_SET_FLAGS: #ifdef DEBUG if (fsckcmds) - printf("%s: %s flags\n", mp->mnt_stat.f_mntonname, + printf("%s: %s flags\n", mp->mnt_path, cmd.size > 0 ? 
"set" : "clear"); #endif /* DEBUG */ if (cmd.size > 0) @@ -2761,7 +2761,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: adjust inode %jd link count by %jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value, + mp->mnt_path, (intmax_t)cmd.value, (intmax_t)cmd.size); } #endif /* DEBUG */ @@ -2782,7 +2782,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: adjust inode %jd block count by %jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value, + mp->mnt_path, (intmax_t)cmd.value, (intmax_t)cmd.size); } #endif /* DEBUG */ @@ -2804,12 +2804,12 @@ if (fsckcmds) { if (cmd.size == 1) printf("%s: free %s inode %ju\n", - mp->mnt_stat.f_mntonname, + mp->mnt_path, filetype == IFDIR ? "directory" : "file", (uintmax_t)cmd.value); else printf("%s: free %s inodes %ju-%ju\n", - mp->mnt_stat.f_mntonname, + mp->mnt_path, filetype == IFDIR ? "directory" : "file", (uintmax_t)cmd.value, (uintmax_t)(cmd.value + cmd.size - 1)); @@ -2829,11 +2829,11 @@ if (fsckcmds) { if (cmd.size == 1) printf("%s: free block %jd\n", - mp->mnt_stat.f_mntonname, + mp->mnt_path, (intmax_t)cmd.value); else printf("%s: free blocks %jd-%jd\n", - mp->mnt_stat.f_mntonname, + mp->mnt_path, (intmax_t)cmd.value, (intmax_t)cmd.value + cmd.size - 1); } @@ -2860,7 +2860,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: adjust number of directories by %jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value); + mp->mnt_path, (intmax_t)cmd.value); } #endif /* DEBUG */ fs->fs_cstotal.cs_ndir += cmd.value; @@ -2870,7 +2870,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: adjust number of free blocks by %+jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value); + mp->mnt_path, (intmax_t)cmd.value); } #endif /* DEBUG */ fs->fs_cstotal.cs_nbfree += cmd.value; @@ -2880,7 +2880,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: adjust number of free inodes by %+jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value); + mp->mnt_path, (intmax_t)cmd.value); } #endif /* DEBUG */ fs->fs_cstotal.cs_nifree += cmd.value; @@ -2890,7 +2890,7 @@ #ifdef DEBUG 
if (fsckcmds) { printf("%s: adjust number of free frags by %+jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value); + mp->mnt_path, (intmax_t)cmd.value); } #endif /* DEBUG */ fs->fs_cstotal.cs_nffree += cmd.value; @@ -2900,7 +2900,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: adjust number of free clusters by %+jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value); + mp->mnt_path, (intmax_t)cmd.value); } #endif /* DEBUG */ fs->fs_cstotal.cs_numclusters += cmd.value; @@ -2910,7 +2910,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: set current directory to inode %jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value); + mp->mnt_path, (intmax_t)cmd.value); } #endif /* DEBUG */ if ((error = ffs_vget(mp, (ino_t)cmd.value, LK_SHARED, &vp))) @@ -2933,7 +2933,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: change .. in cwd from %jd to %jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value, + mp->mnt_path, (intmax_t)cmd.value, (intmax_t)cmd.size); } #endif /* DEBUG */ @@ -2972,7 +2972,7 @@ if (copyinstr((char *)(intptr_t)cmd.value, buf,32,NULL)) strncpy(buf, "Name_too_long", 32); printf("%s: unlink %s (inode %jd)\n", - mp->mnt_stat.f_mntonname, buf, (intmax_t)cmd.size); + mp->mnt_path, buf, (intmax_t)cmd.size); } #endif /* DEBUG */ /* @@ -2994,7 +2994,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: update inode %jd\n", - mp->mnt_stat.f_mntonname, (intmax_t)cmd.value); + mp->mnt_path, (intmax_t)cmd.value); } #endif /* DEBUG */ if ((error = ffs_vget(mp, (ino_t)cmd.value, LK_EXCLUSIVE, &vp))) @@ -3028,7 +3028,7 @@ #ifdef DEBUG if (fsckcmds) { printf("%s: %s buffered output for descriptor %jd\n", - mp->mnt_stat.f_mntonname, + mp->mnt_path, cmd.size == 1 ? 
"enable" : "disable", (intmax_t)cmd.value); } Index: ufs/ffs/ffs_snapshot.c =================================================================== --- ufs/ffs/ffs_snapshot.c (revision 257489) +++ ufs/ffs/ffs_snapshot.c (working copy) @@ -693,7 +693,7 @@ nanotime(&endtime); timespecsub(&endtime, &starttime); printf("%s: suspended %ld.%03ld sec, redo %ld of %d\n", - vp->v_mount->mnt_stat.f_mntonname, (long)endtime.tv_sec, + vp->v_mount->mnt_path, (long)endtime.tv_sec, endtime.tv_nsec / 1000000, redo, fs->fs_ncg); } if (copy_fs == NULL) Index: ufs/ffs/ffs_softdep.c =================================================================== --- ufs/ffs/ffs_softdep.c (revision 257489) +++ ufs/ffs/ffs_softdep.c (working copy) @@ -733,7 +733,7 @@ * Internal function prototypes. */ static void check_clear_deps(struct mount *); -static void softdep_error(char *, int); +static void softdep_error(const char *, int); static int softdep_process_worklist(struct mount *, int); static int softdep_waitidle(struct mount *); static void drain_output(struct vnode *); @@ -13771,7 +13771,7 @@ if ((bp->b_ioflags & BIO_ERROR) == 0) panic("softdep_deallocate_dependencies: dangling deps"); if (bp->b_vp != NULL && bp->b_vp->v_mount != NULL) - softdep_error(bp->b_vp->v_mount->mnt_stat.f_mntonname, bp->b_error); + softdep_error(bp->b_vp->v_mount->mnt_path, bp->b_error); else printf("softdep_deallocate_dependencies: " "got error %d while accessing filesystem\n", bp->b_error); @@ -13784,7 +13784,7 @@ */ static void softdep_error(func, error) - char *func; + const char *func; int error; { @@ -13916,7 +13916,7 @@ db_print_ffs(struct ufsmount *ump) { db_printf("mp %p %s devvp %p fs %p su_wl %d su_deps %d su_req %d\n", - ump->um_mountp, ump->um_mountp->mnt_stat.f_mntonname, + ump->um_mountp, ump->um_mountp->mnt_path, ump->um_devvp, ump->um_fs, ump->softdep_on_worklist, ump->softdep_deps, ump->softdep_req); } Index: ufs/ffs/ffs_vfsops.c =================================================================== --- 
ufs/ffs/ffs_vfsops.c (revision 257489) +++ ufs/ffs/ffs_vfsops.c (working copy) @@ -533,7 +533,7 @@ * We need the name for the mount point (also used for * "last mounted on") copied in. If an error occurs, * the mount point is discarded by the upper level code. - * Note that vfs_mount() populates f_mntonname for us. + * Note that vfs_mount() populates mnt_path for us. */ if ((error = ffs_mountfs(devvp, mp, td)) != 0) { vrele(devvp); @@ -885,13 +885,13 @@ } else { printf("WARNING: %s: GJOURNAL flag on fs " "but no gjournal provider below\n", - mp->mnt_stat.f_mntonname); + mp->mnt_path); free(mp->mnt_gjprovider, M_UFSMNT); mp->mnt_gjprovider = NULL; } #else printf("WARNING: %s: GJOURNAL flag on fs but no " - "UFS_GJOURNAL support\n", mp->mnt_stat.f_mntonname); + "UFS_GJOURNAL support\n", mp->mnt_path); #endif } else { mp->mnt_gjprovider = NULL; @@ -976,7 +976,7 @@ MNT_IUNLOCK(mp); #else printf("WARNING: %s: multilabel flag on fs but " - "no MAC support\n", mp->mnt_stat.f_mntonname); + "no MAC support\n", mp->mnt_path); #endif } if ((fs->fs_flags & FS_ACLS) != 0) { @@ -986,7 +986,7 @@ if (mp->mnt_flag & MNT_NFS4ACLS) printf("WARNING: %s: ACLs flag on fs conflicts with " "\"nfsv4acls\" mount option; option ignored\n", - mp->mnt_stat.f_mntonname); + mp->mnt_path); mp->mnt_flag &= ~MNT_NFS4ACLS; mp->mnt_flag |= MNT_ACLS; @@ -993,7 +993,7 @@ MNT_IUNLOCK(mp); #else printf("WARNING: %s: ACLs flag on fs but no ACLs support\n", - mp->mnt_stat.f_mntonname); + mp->mnt_path); #endif } if ((fs->fs_flags & FS_NFS4ACLS) != 0) { @@ -1003,7 +1003,7 @@ if (mp->mnt_flag & MNT_ACLS) printf("WARNING: %s: NFSv4 ACLs flag on fs conflicts " "with \"acls\" mount option; option ignored\n", - mp->mnt_stat.f_mntonname); + mp->mnt_path); mp->mnt_flag &= ~MNT_ACLS; mp->mnt_flag |= MNT_NFS4ACLS; @@ -1010,7 +1010,7 @@ MNT_IUNLOCK(mp); #else printf("WARNING: %s: NFSv4 ACLs flag on fs but no " - "ACLs support\n", mp->mnt_stat.f_mntonname); + "ACLs support\n", mp->mnt_path); #endif } if ((fs->fs_flags 
& FS_TRIM) != 0) { @@ -1020,11 +1020,11 @@ if (!ump->um_candelete) printf("WARNING: %s: TRIM flag on fs but disk " "does not support TRIM\n", - mp->mnt_stat.f_mntonname); + mp->mnt_path); } else { printf("WARNING: %s: TRIM flag on fs but disk does " "not confirm that it supports TRIM\n", - mp->mnt_stat.f_mntonname); + mp->mnt_path); ump->um_candelete = 0; } } @@ -1044,7 +1044,7 @@ * Set FS local "last mounted on" information (NULL pad) */ bzero(fs->fs_fsmnt, MAXMNTLEN); - strlcpy(fs->fs_fsmnt, mp->mnt_stat.f_mntonname, MAXMNTLEN); + strlcpy(fs->fs_fsmnt, mp->mnt_path, MAXMNTLEN); mp->mnt_stat.f_iosize = fs->fs_bsize; if (mp->mnt_flag & MNT_ROOTFS) { @@ -1241,7 +1241,7 @@ if ((error = ufs_extattr_stop(mp, td))) { if (error != EOPNOTSUPP) printf("WARNING: unmount %s: ufs_extattr_stop " - "returned errno %d\n", mp->mnt_stat.f_mntonname, + "returned errno %d\n", mp->mnt_path, error); e_restart = 0; } else { Index: ufs/ufs/ufs_extattr.c =================================================================== --- ufs/ufs/ufs_extattr.c (revision 257489) +++ ufs/ufs/ufs_extattr.c (working copy) @@ -923,7 +923,7 @@ * up by the next write or extattrctl clean. */ printf("ufs_extattr_get (%s): inode number inconsistency (%d, %ju)\n", - mp->mnt_stat.f_mntonname, ueh.ueh_i_gen, (uintmax_t)ip->i_gen); + mp->mnt_path, ueh.ueh_i_gen, (uintmax_t)ip->i_gen); error = ENOATTR; goto vopunlock_exit; } @@ -1228,7 +1228,7 @@ * the next write or extattrctl clean. 
*/ printf("ufs_extattr_rm (%s): inode number inconsistency (%d, %jd)\n", - mp->mnt_stat.f_mntonname, ueh.ueh_i_gen, (intmax_t)ip->i_gen); + mp->mnt_path, ueh.ueh_i_gen, (intmax_t)ip->i_gen); error = ENOATTR; goto vopunlock_exit; } Index: ufs/ufs/ufs_lookup.c =================================================================== --- ufs/ufs/ufs_lookup.c (revision 257489) +++ ufs/ufs/ufs_lookup.c (working copy) @@ -771,11 +771,11 @@ mp = ITOV(ip)->v_mount; if ((mp->mnt_flag & MNT_RDONLY) == 0) panic("ufs_dirbad: %s: bad dir ino %ju at offset %ld: %s", - mp->mnt_stat.f_mntonname, (uintmax_t)ip->i_number, + mp->mnt_path, (uintmax_t)ip->i_number, (long)offset, how); else (void)printf("%s: bad dir ino %ju at offset %ld: %s\n", - mp->mnt_stat.f_mntonname, (uintmax_t)ip->i_number, + mp->mnt_path, (uintmax_t)ip->i_number, (long)offset, how); } Index: ufs/ufs/ufs_quota.c =================================================================== --- ufs/ufs/ufs_quota.c (revision 257489) +++ ufs/ufs/ufs_quota.c (working copy) @@ -238,7 +238,7 @@ DQI_UNLOCK(dq); if (warn) uprintf("\n%s: warning, %s disk quota exceeded\n", - ITOV(ip)->v_mount->mnt_stat.f_mntonname, + ITOV(ip)->v_mount->mnt_path, quotatypes[i]); } return (0); @@ -264,7 +264,7 @@ dq->dq_flags |= DQ_BLKS; DQI_UNLOCK(dq); uprintf("\n%s: write failed, %s disk limit reached\n", - ITOV(ip)->v_mount->mnt_stat.f_mntonname, + ITOV(ip)->v_mount->mnt_path, quotatypes[type]); return (EDQUOT); } @@ -289,7 +289,7 @@ DQI_UNLOCK(dq); uprintf("\n%s: write failed, %s " "disk quota exceeded for too long\n", - ITOV(ip)->v_mount->mnt_stat.f_mntonname, + ITOV(ip)->v_mount->mnt_path, quotatypes[type]); return (EDQUOT); } @@ -382,7 +382,7 @@ DQI_UNLOCK(dq); if (warn) uprintf("\n%s: warning, %s inode quota exceeded\n", - ITOV(ip)->v_mount->mnt_stat.f_mntonname, + ITOV(ip)->v_mount->mnt_path, quotatypes[i]); } return (0); @@ -407,7 +407,7 @@ dq->dq_flags |= DQ_INODS; DQI_UNLOCK(dq); uprintf("\n%s: write failed, %s inode limit reached\n", - 
ITOV(ip)->v_mount->mnt_stat.f_mntonname, + ITOV(ip)->v_mount->mnt_path, quotatypes[type]); return (EDQUOT); } @@ -432,7 +432,7 @@ DQI_UNLOCK(dq); uprintf("\n%s: write failed, %s " "inode quota exceeded for too long\n", - ITOV(ip)->v_mount->mnt_stat.f_mntonname, + ITOV(ip)->v_mount->mnt_path, quotatypes[type]); return (EDQUOT); } From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 19:43:03 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5DD8442A; Thu, 21 Nov 2013 19:43:03 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 379542FED; Thu, 21 Nov 2013 19:43:03 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 586F0B98A; Thu, 21 Nov 2013 14:43:02 -0500 (EST) From: John Baldwin To: freebsd-hackers@freebsd.org Subject: Re: taskqueue_block Date: Thu, 21 Nov 2013 14:14:06 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20130906; KDE/4.5.5; amd64; ; ) References: <5287BDB9.10201@FreeBSD.org> <528B7681.6090806@FreeBSD.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201311211414.06849.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 21 Nov 2013 14:43:02 -0500 (EST) Cc: Adrian Chadd , Andriy Gapon X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 19:43:03 -0000 On Tuesday, November 19, 
2013 10:29:18 pm Adrian Chadd wrote: > Yes, and lets fix this. :) Hmm, is taskqueue_block() always used in context where waiting is safe? > On 19 November 2013 06:32, Andriy Gapon wrote: > > > > Forwarding this to the larger audience for a discussion. > > > > -------- Original Message -------- > > Message-ID: <5287BDB9.10201@FreeBSD.org> > > Date: Sat, 16 Nov 2013 20:47:21 +0200 > > From: Andriy Gapon > > Subject: taskqueue_block > > > > > > > > It seems that either I do not understand something about taskqueue_block code or > > it is a quite dangerous and abused API. The fact that it is not properly > > documented does not help either. > > > > The commit message said: > >> Implement taskqueue_block() and taskqueue_unblock(). These functions allow the > >> owner of a queue to block and unblock execution of the tasks in the queue while > >> allowing tasks to continue to be added queue. Combining this with > >> taskqueue_drain() allows a queue to be safely disabled. The unblock function may > > [...] > > > > I indeed see this (anti?) pattern being used in the code. > > But what about the following case. One thread calls taskqueue_block() and sets > > TQ_FLAGS_BLOCKED. Another thread calls taskqueue_enqueue, this adds a task to > > the queue and sets ta_pending of the task to 1. tq_enqueue is not called, so an > > actual queue runner is not called or waken up. Then the first thread calls > > taskqueue_drain() on the task. As far as I can see, the thread would then just > > wait forever because the task is pending and is not going to be executed. > > > > Additionally, it is impossible to reason about the taskqueue's state after > > taskqueue_block call, because the call just sets the flag and does not do any > > synchronization. And as described above, it is not safe to call APIs that could > > allow the taskqueue or the task state to become known. > > > > I think that taskqueue_block() should wait on the currently active tasks to > > complete. 
I don't think that this behavior could be optional. I do not see any > > reasonable and safe use for "non-blocking" taskqueue_block(). > > taskqueue_drain() calls after taskqueue_block() must be removed. The code > > should either use taskqueue_drain() or "blocking" taskqueue_block() depending on > > concrete circumstances. > > > > What do you think? > > Thank you. > > -- > > Andriy Gapon > > > > > > > > _______________________________________________ > > freebsd-hackers@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 20:18:09 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 79DA77C8; Thu, 21 Nov 2013 20:18:09 +0000 (UTC) Received: from mail-qa0-x236.google.com (mail-qa0-x236.google.com [IPv6:2607:f8b0:400d:c00::236]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1C4242259; Thu, 21 Nov 2013 20:18:09 +0000 (UTC) Received: by mail-qa0-f54.google.com with SMTP id f11so4401168qae.13 for ; Thu, 21 Nov 2013 12:18:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=ktYvXsMjnzS6RttthW0sWR+ZnBMCGo/gnxADDTbXmD8=; b=ufBv5B7fbLvrtXIJYMWtEAQpIFkB3iRU3CwH3CU6RqlecnLGqN8SX1r04j4RhHyoLE 
8hlaVzBsQGOtbORfVR60NK1d7ydGKdtyiRYHrcPBk7GiTICvx14Y93+pHZpiJA5s4k2K BdcaE/fPV4VVNOl7lhP5Zkns7EEWiTtW2V5cz/Zp0Siws46EOQKFOQ7d6iacQXPpCewQ 7Wx3yI8VCrpujWJawc671Zq4NeeutbAHDRU+uhHuF/TV/2QMwUUhC5v5tqCOqp/oksf1 4l+tSjdU8lEsjTtOa5S0tuO5EUxfM5xhgi3qcwEHc97oH7Ua763BeWPQv+Io79lvXU+K A3dg== MIME-Version: 1.0 X-Received: by 10.229.13.69 with SMTP id b5mr14764956qca.13.1385065088135; Thu, 21 Nov 2013 12:18:08 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.207.66 with HTTP; Thu, 21 Nov 2013 12:18:08 -0800 (PST) In-Reply-To: <201311211414.06849.jhb@freebsd.org> References: <5287BDB9.10201@FreeBSD.org> <528B7681.6090806@FreeBSD.org> <201311211414.06849.jhb@freebsd.org> Date: Thu, 21 Nov 2013 12:18:08 -0800 X-Google-Sender-Auth: HsVRcgu5OF3X7r92FDpzIFeeMoY Message-ID: Subject: Re: taskqueue_block From: Adrian Chadd To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org" , Andriy Gapon X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 20:18:09 -0000 On 21 November 2013 11:14, John Baldwin wrote: > On Tuesday, November 19, 2013 10:29:18 pm Adrian Chadd wrote: >> Yes, and lets fix this. :) > > Hmm, is taskqueue_block() always used in context where waiting is safe? I seem to recall that a taskqueue function may wish to block further jobs from running. The trouble is that since it was called from a task queued to that particular taskqueue, it'd hang. Sigh. So yes, some slightly saner semantics would be nice. 
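For reference, the hang described in the original report quoted in this
thread (taskqueue_block(), then taskqueue_enqueue(), then
taskqueue_drain()) can be modelled with a toy, single-threaded sketch
in C. This is not the kernel implementation: the demo_* names are
hypothetical, and the fields only mirror the roles the report gives to
TQ_FLAGS_BLOCKED and ta_pending. The model also assumes that unblocking
the queue kicks the queue runner.

```c
#include <stdbool.h>

/* Toy model only; these are not the kernel's structures. */
struct demo_task {
	int pending;		/* plays the role of ta_pending */
};

struct demo_queue {
	bool blocked;		/* plays the role of TQ_FLAGS_BLOCKED */
	bool runner_woken;	/* true once the queue runner has been kicked */
};

/* Only sets the flag; does not wait for anything. */
static void
demo_block(struct demo_queue *q)
{
	q->blocked = true;
}

/* Clears the flag and (model assumption) kicks the runner. */
static void
demo_unblock(struct demo_queue *q)
{
	q->blocked = false;
	q->runner_woken = true;
}

/* Marks the task pending; the runner is kicked only if not blocked. */
static void
demo_enqueue(struct demo_queue *q, struct demo_task *t)
{
	t->pending = 1;
	if (!q->blocked)
		q->runner_woken = true;
}

/* One pass of the runner; returns true if the task was executed. */
static bool
demo_run_once(struct demo_queue *q, struct demo_task *t)
{
	if (!q->runner_woken || t->pending == 0)
		return (false);
	t->pending = 0;
	q->runner_woken = false;
	return (true);
}

/* A drain can return only once the task is no longer pending. */
static bool
demo_drain_would_return(const struct demo_task *t)
{
	return (t->pending == 0);
}
```

In this model, block followed by enqueue leaves the task pending with
no runner woken, so demo_drain_would_return() stays false until
demo_unblock() is called and the runner gets a pass.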
-adrian From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 20:19:24 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C33B58E9; Thu, 21 Nov 2013 20:19:24 +0000 (UTC) Received: from mail-qe0-x22a.google.com (mail-qe0-x22a.google.com [IPv6:2607:f8b0:400d:c02::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 77F40226B; Thu, 21 Nov 2013 20:19:24 +0000 (UTC) Received: by mail-qe0-f42.google.com with SMTP id t9so234688qeq.1 for ; Thu, 21 Nov 2013 12:19:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=kpGDszWBrPsRKcq3ThWD4VaQexDymEzZEyTqFrjvITA=; b=Gu1JBQArnz+VfCZ2R4U+4lT/4Ss/35NRuLa52jOY0uUdoaoo9Zfm0dnUOJ0FdQBzHw YgKNuXphSf3xTl8PIz/WiGDTd0ltdC7VqQF64CN63ZIate0fIlTDjIvWT+w+Vvzqq+MZ GKqNvUHck8YM2sQ97zapx8kMyYJKeE5eH2kc1XyZxEeS/jkRVQF1aJ/KA80oLWtreUiY OO9nGC+FjvWgeMjMx80ohUp4GXGAfJKNg5JuFIRhA4r7X5ybmDA9LhLouCDYzQKmkxQv pT0+q9eosqojie96n/wA3nsaDb6dkylHmVRNFr0vyk6JmukMoSeq4mupRcOGObUY/1o6 R8LA== MIME-Version: 1.0 X-Received: by 10.49.59.70 with SMTP id x6mr14644774qeq.17.1385065163751; Thu, 21 Nov 2013 12:19:23 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.207.66 with HTTP; Thu, 21 Nov 2013 12:19:23 -0800 (PST) In-Reply-To: References: Date: Thu, 21 Nov 2013 12:19:23 -0800 X-Google-Sender-Auth: KpSA66aqysuZ2E9gne5DjdMEl1E Message-ID: Subject: Re: 9.1 callout behavior From: Adrian Chadd To: Bret Ketchum , Alexander Motin Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to 
FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 20:19:24 -0000 Hi, It sounds like you may have found an interesting test case. Mav, any ideas? -adrian On 21 November 2013 05:20, Bret Ketchum wrote: > I've a callout which runs every 100ms and does a bit of accounting > using the global ticks variable. This one-shot callout was called fairly > consistently in 8.1, every 100ms give or take a few thousand clocks. I've > recently upgraded to 9.1 and for the most part the period is consistent. > However, periodically the callout function is executed anywhere between 5ms > to 20ms after the callout was reset and the function returned while global > ticks has increased 8x. The hardware has not changed (using the same > timecounter configuration): > > CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.05-MHz K8-class CPU) > > kern.timecounter.hardware: TSC-low > kern.timecounter.tick: 1 > kern.timecounter.invariant_tsc: 1 > kern.timecounter.smp_tsc: 1 > > And default eventtimer configuration: > > kern.eventtimer.singlemul: 2 > kern.eventtimer.idletick: 0 > kern.eventtimer.activetick: 1 > kern.eventtimer.timer: LAPIC > kern.eventtimer.periodic: 0 > > If tickless mode is disabled the inconsistency goes away. Is the > premature expiration of the callout expected? Is the jump in global ticks > typical (say from 100 ticks to 800 ticks in 1.5ms)? 
> > Bret > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 20:24:01 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B9712C72 for ; Thu, 21 Nov 2013 20:24:01 +0000 (UTC) Received: from mail-wi0-x229.google.com (mail-wi0-x229.google.com [IPv6:2a00:1450:400c:c05::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 5AC4D22EC for ; Thu, 21 Nov 2013 20:24:01 +0000 (UTC) Received: by mail-wi0-f169.google.com with SMTP id hm6so661495wib.0 for ; Thu, 21 Nov 2013 12:23:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=rQhJlX6XP5x+JGWs7QDsqWTzGfG5AALlYgtKUCnAaqQ=; b=FXON+sJNaG0QuYkgUtksHsEzx91JAcv1iiGnMv09DdbSm7tK3+leHlsS62vTSEG6cJ PHHXGlF9RWUWZZed7j29c+KjHV6cRgI/Q8kIo+upq5Epgr1dpwQBb7MBq9CKt36pQsFE ZnXufydxQPfKar3kG0P+mfMu6mjh7B+CNrBvr0Rz4YlUJpc9AkWoAhLiBFNlSE1mhQsu mOO/faK15vgYRSNomALHIam2KB6+0kb/YtGyGDCFxK0GfchEyh/pVpt9O+Xa7nwHyLtK AfXRQwuyYPP1OhGftcGWcLmsAUnoa3unpp/p834dhPV5z5h0BY44fF9Nb50R0ZIVzDxn GN1w== MIME-Version: 1.0 X-Received: by 10.180.74.174 with SMTP id u14mr7302128wiv.53.1385065439737; Thu, 21 Nov 2013 12:23:59 -0800 (PST) Received: by 10.216.65.130 with HTTP; Thu, 21 Nov 2013 12:23:59 -0800 (PST) Date: Thu, 21 Nov 2013 22:23:59 +0200 Message-ID: Subject: CRC32 feature in FreeBSD's bootloader From: Boris Astardzhiev To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 
2.1.16 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 20:24:01 -0000 Hello, A few months ago I posted a new feature in the FreeBSD bootloader. So far I haven't received any comments so I'll try to revive this topic. http://www.freebsd.org/cgi/query-pr.cgi?pr=172301&cat= http://lists.freebsd.org/pipermail/freebsd-fs/2012-October/015288.html It may be of use to somebody. So any comments and suggestions? Greetings, Boris Astardzhiev From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 20:44:45 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EAA5DA35 for ; Thu, 21 Nov 2013 20:44:45 +0000 (UTC) Received: from co1outboundpool.messaging.microsoft.com (co1ehsobe003.messaging.microsoft.com [216.32.180.186]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A99ED2497 for ; Thu, 21 Nov 2013 20:44:45 +0000 (UTC) Received: from mail192-co1-R.bigfish.com (10.243.78.231) by CO1EHSOBE002.bigfish.com (10.243.66.65) with Microsoft SMTP Server id 14.1.225.22; Thu, 21 Nov 2013 20:44:34 +0000 Received: from mail192-co1 (localhost [127.0.0.1]) by mail192-co1-R.bigfish.com (Postfix) with ESMTP id 5C7379805EF; Thu, 21 Nov 2013 20:44:34 +0000 (UTC) X-Forefront-Antispam-Report: CIP:157.56.240.101; KIP:(null); UIP:(null); IPV:NLI; H:BL2PRD0510HT001.namprd05.prod.outlook.com; RD:none; EFVD:NLI X-SpamScore: 0 X-BigFish: 
VPS0(zz9371I542Izz1f42h2148h208ch1ee6h1de0h1fdah2073h2146h1202h1e76h1d1ah1d2ah1fc6hzz8275ch1de098h17326ah8275dh1de097h186068hz2fh109h2a8h839h947hd24hf0ah1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah224fh1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1fe8h1ff5h2216h22d0h9a9j1155h) Received-SPF: pass (mail192-co1: domain of juniper.net designates 157.56.240.101 as permitted sender) client-ip=157.56.240.101; envelope-from=aduane@juniper.net; helo=BL2PRD0510HT001.namprd05.prod.outlook.com ; .outlook.com ; X-Forefront-Antispam-Report-Untrusted: SFV:NSPM; SFS:(377454003)(13464003)(199002)(189002)(74316001)(33646001)(79102001)(80976001)(80022001)(50986001)(77982001)(76796001)(83072001)(74502001)(56816003)(4396001)(81816001)(54316002)(15202345003)(63696002)(59766001)(74662001)(19580395003)(65816001)(19580405001)(49866001)(31966008)(56776001)(15975445006)(47446002)(47736001)(81686001)(83322001)(74366001)(2656002)(81342001)(74876001)(81542001)(46102001)(53806001)(76482001)(76576001)(76786001)(47976001)(54356001)(66066001)(74706001)(85306002)(87936001)(69226001)(51856001)(87266001)(24736002); DIR:OUT; SFP:; SCL:1; SRVR:BY2PR05MB582; H:BY2PR05MB582.namprd05.prod.outlook.com; CLIP:66.129.241.19; FPR:; RD:InfoNoRecords; A:1; MX:1; LANG:en; Received: from mail192-co1 (localhost.localdomain [127.0.0.1]) by mail192-co1 (MessageSwitch) id 1385066672670737_26420; Thu, 21 Nov 2013 20:44:32 +0000 (UTC) Received: from CO1EHSMHS010.bigfish.com (unknown [10.243.78.243]) by mail192-co1.bigfish.com (Postfix) with ESMTP id 9659014004C; Thu, 21 Nov 2013 20:44:32 +0000 (UTC) Received: from BL2PRD0510HT001.namprd05.prod.outlook.com (157.56.240.101) by CO1EHSMHS010.bigfish.com (10.243.66.20) with Microsoft SMTP Server (TLS) id 14.16.227.3; Thu, 21 Nov 2013 20:44:32 +0000 Received: from BY2PR05MB582.namprd05.prod.outlook.com (10.141.219.146) by BL2PRD0510HT001.namprd05.prod.outlook.com (10.255.100.36) with Microsoft SMTP Server (TLS) id 14.16.383.1; Thu, 21 
Nov 2013 20:44:32 +0000 Received: from BY2PR05MB582.namprd05.prod.outlook.com (10.141.219.146) by BY2PR05MB582.namprd05.prod.outlook.com (10.141.219.146) with Microsoft SMTP Server (TLS) id 15.0.820.5; Thu, 21 Nov 2013 20:44:30 +0000 Received: from BY2PR05MB582.namprd05.prod.outlook.com ([10.141.219.146]) by BY2PR05MB582.namprd05.prod.outlook.com ([10.141.219.146]) with mapi id 15.00.0820.005; Thu, 21 Nov 2013 20:44:29 +0000 From: Andrew Duane To: Boris Astardzhiev , "freebsd-hackers@freebsd.org" Subject: RE: CRC32 feature in FreeBSD's bootloader Thread-Topic: CRC32 feature in FreeBSD's bootloader Thread-Index: AQHO5ve1O2DgqpptYkKTB8hsnXVzzJowJrTg Date: Thu, 21 Nov 2013 20:44:29 +0000 Message-ID: <597127d7d71a496995d9407842121a47@BY2PR05MB582.namprd05.prod.outlook.com> References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [66.129.241.19] x-forefront-prvs: 0037FD6480 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: juniper.net X-FOPE-CONNECTOR: Id%0$Dn%*$RO%0$TLS%0$FQDN%$TlsDn% X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 20:44:46 -0000 I'm all for it, depending on what you do with it. The bootloader I implemen= ted for my platform tags every image it writes into flash with a checksum (= we use MD5, not CRC32, but still), and can keep multiple copies as backup. .................................... Andrew L. 
Duane Resident Architect - AT&T Technical Lead JNCIA - JUNOS m=A0=A0=A0+1 603.770.7088 o +1 408.933.6944 (2-6944) skype: andrewlduane aduane@juniper.net LET'S=A0GET=A0STARTED=A0=20 -----Original Message----- From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freeb= sd.org] On Behalf Of Boris Astardzhiev Sent: Thursday, November 21, 2013 3:24 PM To: freebsd-hackers@freebsd.org Subject: CRC32 feature in FreeBSD's bootloader Hello, A few months ago I posted a new feature in the FreeBSD bootloader. So far I haven't received any comments so I'll try to revive this topic. http://www.freebsd.org/cgi/query-pr.cgi?pr=3D172301&cat=3D http://lists.freebsd.org/pipermail/freebsd-fs/2012-October/015288.html It may be of use to somebody. So any comments and suggestions? Greetings, Boris Astardzhiev _______________________________________________ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 20:50:23 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 36EEBF84 for ; Thu, 21 Nov 2013 20:50:23 +0000 (UTC) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 040B6251E for ; Thu, 21 Nov 2013 20:50:22 +0000 (UTC) Received: from smtp.fisglobal.com ([10.132.206.16]) by ltcfislmsgpa03.fnfis.com (8.14.5/8.14.5) with ESMTP id rALKoLbb028731 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Thu, 21 Nov 2013 14:50:21 -0600 Received: from LTCFISWMSGMB21.FNFIS.com ([169.254.1.7]) by LTCFISWMSGHT05.FNFIS.com 
([10.132.206.16]) with mapi id 14.03.0158.001; Thu, 21 Nov 2013 14:50:21 -0600 From: "Teske, Devin" To: Boris Astardzhiev Subject: Re: CRC32 feature in FreeBSD's bootloader Thread-Topic: CRC32 feature in FreeBSD's bootloader Thread-Index: AQHO5vtKBtkf5jJNEkqH2M7ihlEMZA== Date: Thu, 21 Nov 2013 20:50:19 +0000 Message-ID: <4B3925A5-9DBF-42B4-A12D-C9C7D5E6078C@fisglobal.com> References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.132.253.120] Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794, 1.0.14, 0.0.0000 definitions=2013-11-21_06:2013-11-21,2013-11-21,1970-01-01 signatures=0 Cc: FreeBSD Hackers , "Teske, Devin" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list Reply-To: Devin Teske List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 20:50:23 -0000 On Nov 21, 2013, at 12:23 PM, Boris Astardzhiev wrote: > Hello, >=20 > A few months ago I posted a new feature in the FreeBSD bootloader. > So far I haven't received any comments so I'll try to revive this topic. >=20 > http://www.freebsd.org/cgi/query-pr.cgi?pr=3D172301&cat=3D > http://lists.freebsd.org/pipermail/freebsd-fs/2012-October/015288.html >=20 > It may be of use to somebody. So any comments and suggestions? >=20 I think it's a great idea. But... Can you extend it to be available to the Forth layer. That is, add a command to ficl.c that calls your code. I would very much like to be able to compute the CRC32 of a file from within Forth and get the results back on the stack. --=20 Devin _____________ The information contained in this message is proprietary and/or confidentia= l. 
If you are not the intended recipient, please: (i) delete the message an= d all copies; (ii) do not disclose, distribute or use the message in any ma= nner; and (iii) notify the sender immediately. In addition, please be aware= that any message addressed to our domain is subject to archiving and revie= w by persons other than the intended recipient. Thank you. From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 21:15:55 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6C5BC94B; Thu, 21 Nov 2013 21:15:55 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 095A726D1; Thu, 21 Nov 2013 21:15:54 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rALLFkpI074541; Thu, 21 Nov 2013 23:15:46 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rALLFkpI074541 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rALLFkvl074540; Thu, 21 Nov 2013 23:15:46 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 21 Nov 2013 23:15:46 +0200 From: Konstantin Belousov To: Vitaly Magerya Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler Message-ID: <20131121211546.GQ59496@kib.kiev.ua> References: <528DFEE6.6020504@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="JeIqLcbgB5JjL5AU" Content-Disposition: inline In-Reply-To: <528DFEE6.6020504@gmail.com> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: 
No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 21:15:55 -0000 --JeIqLcbgB5JjL5AU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote: > Hi, folks. I'm investigating a test case failure that devel/boehm-gc > has on recent FreeBSD releases. The problem is that a signal > handler registered for SIGUSR1 is sometimes called with signum=0, > which should not be possible under any conditions. > > Here's a simple test case that demonstrates this behavior: > > /* Compile with 'c99 -o example example.c -pthread' > */ > #include <pthread.h> > #include <signal.h> > #include <stdio.h> > #include <stdlib.h> > > void signal_handler(int signum, siginfo_t *si, void *context) { > if (signum != SIGUSR1) { > printf("bad signal, signum=%d\n", signum); > exit(1); > } > } > > void *thread_func(void *arg) { > return arg; > } > > int main(void) { > struct sigaction sa = { 0 }; > sa.sa_flags = SA_SIGINFO; > sa.sa_sigaction = signal_handler; > if (sigfillset(&sa.sa_mask) != 0) abort(); > if (sigaction(SIGUSR1, &sa, NULL) != 0) abort(); > for (int i = 0; i < 10000; i++) { > pthread_t t; > pthread_create(&t, NULL, thread_func, NULL); > pthread_kill(t, SIGUSR1); Side note: the behaviour of the pthread_kill(3) call is undefined if the pthread_create(3) on the line before failed.
> } > return 0; > } > > Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get > "signum=0" from this program, but you may need to run it a few > times or increase the number of iterations to see the same. > > Interestingly enough, I don't see this behavior under 9.0-RELEASE. > > So, any ideas what the problem here is? It happens when the libthr deferred signal handling path is taken for signal delivery and, for some reason, the code inside the deferred path calls into rtld for symbol binding. Then the rtld lock is locked, some code in rtld is executed, and the rtld lock is unlocked. The unlock causes _thr_ast() to run, which results in a nested check_deferred_signal() execution. The check_deferred_signal() clears si_signo, so on return the same signal is delivered one more time, but is advertised as signo zero. The _thr_rtld_init() approach of doing dummy calls does not really work, since it is not practically possible to enumerate the symbols needed during signal delivery. My first attempt to fix this was to increment curthread->critical_count around the calls to the check_* functions in _thr_ast(), but that causes the reverse problem of losing _thr_ast() runs on unlock. I ended up with a flag to indicate that deferred delivery is running, so that a nested check_deferred_signal() does nothing. A delicate point is that a user signal handler is allowed to modify the passed machine context so that the return from the signal handler causes an arbitrary jump, or it can just call longjmp(). For this case, I also clear the flag in thr_sighandler(), since kernel signal delivery means that nested delivery code should not run right now. Please try this. diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h index 83a02b5..c6651cd 100644 --- a/lib/libthr/thread/thr_private.h +++ b/lib/libthr/thread/thr_private.h @@ -433,6 +433,9 @@ struct pthread { /* the sigaction should be used for deferred signal.
*/ struct sigaction deferred_sigact; + /* deferred signal delivery is performed, do not reenter. */ + int deferred_run; + /* Force new thread to exit. */ int force_exit; diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c index 415ddb0..57c9406 100644 --- a/lib/libthr/thread/thr_sig.c +++ b/lib/libthr/thread/thr_sig.c @@ -162,6 +162,7 @@ thr_sighandler(int sig, siginfo_t *info, void *_ucp) act = _thr_sigact[sig-1].sigact; _thr_rwl_unlock(&_thr_sigact[sig-1].lock); errno = err; + curthread->deferred_run = 0; /* * if a thread is in critical region, for example it holds low level locks, @@ -320,14 +321,18 @@ check_deferred_signal(struct pthread *curthread) siginfo_t info; int uc_len; - if (__predict_true(curthread->deferred_siginfo.si_signo == 0)) + if (__predict_true(curthread->deferred_siginfo.si_signo == 0 || + curthread->deferred_run)) return; + curthread->deferred_run = 1; uc_len = __getcontextx_size(); uc = alloca(uc_len); getcontext(uc); - if (curthread->deferred_siginfo.si_signo == 0) + if (curthread->deferred_siginfo.si_signo == 0) { + curthread->deferred_run = 0; return; + } __fillcontextx2((char *)uc); act = curthread->deferred_sigact; uc->uc_sigmask = curthread->deferred_sigmask; --JeIqLcbgB5JjL5AU Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSjngBAAoJEJDCuSvBvK1BKx0QAJjAmJSh9i2IQC6e8pF1QXJG P6lTmX3WpLVdnPAA5ord/KoiBCaNJQ4w2YaEWOzuP3o4GHX70dYLY9HWHuwgMhei NzS+xOCdzZcPDI68ZghJ/N/67oJSlC9i/N4RLdgDqaBpElYrOKk1pmXqpQ/216op XinMrpR5oR4TvXJ80dNCsGzc5xQ0J9LW5TjYf3rzHSJSaYWO6jSIUwDrb6kLxtVA 7enT9j8rMO+HbXgWNNcXMBTAfo+2PabK/33twemiX7dbzGTQapbVK6RU9MYBYO0N 2Sa6YI0Zd5SFJyXLLggPi/Qop/mGIrsCgd2ICOsGnBYtc5qGpeFZkbKB8OnRdw02 u4HWokfnaE6eH+ktipA9+nbpAGL3MCsHgSZBLoIKDX0YWmqvEMM6wHdrJWWwIfEB /YJp8iHGwbrjtXx4ddUqa/30BRU1HzDImPafbAOvVdjLKFQozpHPJFwRhX+2NEA/ TA7PlXXLDVXc4wE7eP0Lo/8Vpnhk/Wv5Xz2a97F6IzdeOZpbuQwLaFf5eOJD77z9
8J1hhwE//c7nlk+9ovvRvqOdXyGeQSZaW22BRNu4VjYW/Cs5uaSGBCfPKJe99DGx 4tl3vaP28nnhQRH3reqyE/fJtfaJkMrGccO2EYVbkibaLWMEMBmLQ57no4TXrdWS BU5IUgDGkfqq4DKVpL87 =jCRC -----END PGP SIGNATURE----- --JeIqLcbgB5JjL5AU-- From owner-freebsd-hackers@FreeBSD.ORG Thu Nov 21 23:13:00 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 14E676BF; Thu, 21 Nov 2013 23:12:58 +0000 (UTC) Received: from hoyletech.com (hoyletech.com [174.136.108.42]) by mx1.freebsd.org (Postfix) with ESMTP id 239F42F15; Thu, 21 Nov 2013 23:12:58 +0000 (UTC) Received: from unknown (pool-108-51-142-17.washdc.fios.verizon.net [108.51.142.17]) by hoyletech.com (Postfix) with ESMTPSA id 3F5DD60EC2; Thu, 21 Nov 2013 15:12:51 -0800 (PST) Date: Thu, 21 Nov 2013 18:12:32 -0500 From: Nathanael Hoyle To: Doug Ambrisko Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs Message-ID: <20131121181232.000071b8@unknown> In-Reply-To: <20131121174028.GA80520@ambrisko.com> References: <51B3B59B.8050903@erdgeist.org> <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua> <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua> <20131119174216.GA80753@ambrisko.com> <20131120075531.GE59496@kib.kiev.ua> <20131121174028.GA80520@ambrisko.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Konstantin Belousov , freebsd-hackers@freebsd.org, Dirk Engling , Jase Thew , mdf@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Nov 2013 23:13:00 -0000 On Thu, 21 Nov 2013 09:40:28 -0800 
Doug Ambrisko wrote: > On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote: > | On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote: > | > I was talking about the more general case since the system tries > to keep | > the path in the stat structure. My prior approach which > had more issues | > was to modify the stat structure of which I was > pointed to NetBSD and their | > change to statvfs which doesn't > really solve the problem. They don't | > have the check to see if > the mount is longer then VFS_MNAMELEN (in their case) | > and just > truncate things. | > > | > If we are just talking about adding it to the mount structure that > | > would be okay since it isn't exposed to user land. I can add > that. | > | Yes, this is exactly what I mean. Add a struct mount field, and use > | it for kernel only. In fact, it only matters for sys_unmount() and > | kern_jail.c, other locations in kernel use the path for warnings, > and | this could be postponed if you prefer to minimize the patch. > > Okay, I went through all of the occurances and compile tested (except > for #DEBUG). I united a few things but should do more once I get > consensus on the approach. I found a few spots that should be > updated as well and made the length check more consistant. Some were > doing >= and others > >. So this should be better, however, a lot larger. On the plus side > when we figure out how to return the longer path length to user land > that can be more flexible since the kernel is tracking the longer > length. Probably things to note are changes in: > ZFS to mount snapshot > cd9660 for symlinks > fuse to return full path > jail to check statfs and mount > mount/umount to save and check full path > mountroot to save new field for full path > > Just in case it doesn't make it in email the full patch is at: > http://people.freebsd.org/~ambrisko/mount_bigger.patch > > Thanks, > > Doug A. 
> Hey, long-time lurker, don't normally post, but I think this introduces a boundary error. It certainly appears to make the code not match the comments. > Index: cddl/compat/opensolaris/kern/opensolaris_vfs.c > =================================================================== > --- cddl/compat/opensolaris/kern/opensolaris_vfs.c (revision > 257489) +++ cddl/compat/opensolaris/kern/opensolaris_vfs.c > (working copy) @@ -126,7 +126,7 @@ > * variables will fit in our mp buffers, including the > * terminating NUL. > */ > - if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= > MNAMELEN) > + if (strlen(fstype) > MFSNAMELEN || strlen(fspath) > > MAXPATHLEN) return (ENAMETOOLONG); > > vfsp = vfs_byname_kld(fstype, td, &error); The change from >= to > in this comparison means that where strlen(fspath)==MAXPATHLEN, this guard is passed and no error is thrown. > =================================================================== > --- kern/vfs_mount.c (revision 257489) > +++ kern/vfs_mount.c (working copy) > @@ -473,6 +473,7 @@ > mp->mnt_cred = crdup(cred); > mp->mnt_stat.f_owner = cred->cr_uid; > strlcpy(mp->mnt_stat.f_mntonname, fspath, MNAMELEN); > + strlcpy((char *)mp->mnt_path, fspath, MAXPATHLEN); > mp->mnt_iosize_max = DFLTPHYS; > #ifdef MAC > mac_mount_init(mp); > @@ -656,7 +657,7 @@ > * variables will fit in our mp buffers, including the > * terminating NUL. > */ > - if (fstypelen > MFSNAMELEN || fspathlen > MNAMELEN) { > + if (fstypelen > MFSNAMELEN || fspathlen > MAXPATHLEN) { > error = ENAMETOOLONG; > goto bail; Same logic is used here, so it doesn't fail here. 
> } > @@ -748,8 +749,8 @@ > return (EOPNOTSUPP); > } > > - ma = mount_argsu(ma, "fstype", uap->type, MNAMELEN); > - ma = mount_argsu(ma, "fspath", uap->path, MNAMELEN); > + ma = mount_argsu(ma, "fstype", uap->type, MFSNAMELEN); > + ma = mount_argsu(ma, "fspath", uap->path, MAXPATHLEN); > ma = mount_argb(ma, flags & MNT_RDONLY, "noro"); > ma = mount_argb(ma, !(flags & MNT_NOSUID), "nosuid"); > ma = mount_argb(ma, !(flags & MNT_NOEXEC), "noexec"); > @@ -1040,7 +1041,7 @@ > * variables will fit in our mp buffers, including the > * terminating NUL. > */ > - if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= > MNAMELEN) > + if (strlen(fstype) > MFSNAMELEN || strlen(fspath) > > MAXPATHLEN) return (ENAMETOOLONG); Adding the rest of the comment from the sources here: /* * Be ultra-paranoid about making sure the type and fspath * variables will fit in our mp buffers, including the * terminating NUL. */ Ok, so intent is to ensure that the provided mount path can fully fit, *including* terminating NULL character. For this to be true, strlen(fspath)+1 must be <= MAXPATHLEN. This is not true when strlen(fspath) == MAXPATHLEN, but is not detected in this check. At the very least, here the code and the comments differ now. > Index: kern/vfs_mountroot.c > =================================================================== > --- kern/vfs_mountroot.c (revision 257489) > +++ kern/vfs_mountroot.c (working copy) > @@ -307,6 +307,8 @@ > vp->v_mountedhere = mporoot; > strlcpy(mporoot->mnt_stat.f_mntonname, > fspath, MNAMELEN); > + strlcpy((char *)mporoot->mnt_path, > + fspath, MAXPATHLEN); > VOP_UNLOCK(vp, 0); > } else > vput(vp); Here strlcpy is used safely. However, in cases where strlen(fspath)==MAXPATHLEN, the result is to truncate the last character in fspath and replace it with a NULL byte to ensure proper termination. 
Now mnt_path will contain not the "actual path", but the actual path truncated by 1 character, and back to not matching what was in the original call (same problem in earlier thread discussion). Also, two different paths which differed in only their last character would get truncated to the same path name (hilarity ensues). Hopefully my understanding is correct and I'm not completely off track and being unhelpful. I'm certain I have less experience with this code than others on the thread, but wanted to note what I believe is an issue. Regards, -Nathanael Hoyle From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 03:55:55 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 90A6B92A; Fri, 22 Nov 2013 03:55:55 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7085324B5; Fri, 22 Nov 2013 03:55:55 +0000 (UTC) Received: from xyf.my.dom (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rAM3tskd068692; Fri, 22 Nov 2013 03:55:54 GMT (envelope-from davidxu@freebsd.org) Message-ID: <528ED5D3.1030906@freebsd.org> Date: Fri, 22 Nov 2013 11:56:03 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:17.0) Gecko/20130416 Thunderbird/17.0.5 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua> In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org, Vitaly Magerya , 
threads@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Nov 2013 03:55:55 -0000 On 2013/11/22 05:15, Konstantin Belousov wrote: > On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote: >> Hi, folks. I'm investigating a test case failure that devel/boehm-gc >> has on recent FreeBSD releases. The problem is that a signal >> handler registered for SIGUSR1 is sometimes called with signum=0, >> which should not be possible under any conditions. >> >> Here's a simple test case that demonstrates this behavior: >> >> /* Compile with 'c99 -o example example.c -pthread' >> */ >> #include <pthread.h> >> #include <signal.h> >> #include <stdio.h> >> #include <stdlib.h> >> >> void signal_handler(int signum, siginfo_t *si, void *context) { >> if (signum != SIGUSR1) { >> printf("bad signal, signum=%d\n", signum); >> exit(1); >> } >> } >> >> void *thread_func(void *arg) { >> return arg; >> } >> >> int main(void) { >> struct sigaction sa = { 0 }; >> sa.sa_flags = SA_SIGINFO; >> sa.sa_sigaction = signal_handler; >> if (sigfillset(&sa.sa_mask) != 0) abort(); >> if (sigaction(SIGUSR1, &sa, NULL) != 0) abort(); >> for (int i = 0; i < 10000; i++) { >> pthread_t t; >> pthread_create(&t, NULL, thread_func, NULL); >> pthread_kill(t, SIGUSR1); > Side note: the behaviour of the pthread_kill(3) call is undefined if the pthread_create(3) > on the line before failed. > >> } >> return 0; >> } >> >> Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get >> "signum=0" from this program, but you may need to run it a few >> times or increase the number of iterations to see the same. >> >> Interestingly enough, I don't see this behavior under 9.0-RELEASE. >> >> So, any ideas what the problem here is?
> > It happens when the libthr deferred signal handling path is taken for signal > delivery and, for some reason, the code inside the deferred path calls > into rtld for symbol binding. Then the rtld lock is locked, some code in > rtld is executed, and the rtld lock is unlocked. The unlock causes _thr_ast() > to run, which results in a nested check_deferred_signal() execution. > The check_deferred_signal() clears si_signo, so on return the same > signal is delivered one more time, but is advertised as signo zero. > > The _thr_rtld_init() approach of doing dummy calls does not really work, > since it is not practically possible to enumerate the symbols needed > during signal delivery. > > My first attempt to fix this was to increment curthread->critical_count > around the calls to the check_* functions in _thr_ast(), but that causes the > reverse problem of losing _thr_ast() runs on unlock. > > I ended up with a flag to indicate that deferred delivery is running, > so that a nested check_deferred_signal() does nothing. A delicate > point is that a user signal handler is allowed to modify the passed > machine context so that the return from the signal handler causes an > arbitrary jump, or it can just call longjmp(). For this case, I also clear the > flag in thr_sighandler(), since kernel signal delivery means that nested > delivery code should not run right now. > > Please try this. > > diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h > index 83a02b5..c6651cd 100644 > --- a/lib/libthr/thread/thr_private.h > +++ b/lib/libthr/thread/thr_private.h > @@ -433,6 +433,9 @@ struct pthread { > /* the sigaction should be used for deferred signal. */ > struct sigaction deferred_sigact; > > + /* deferred signal delivery is performed, do not reenter. */ > + int deferred_run; > + > /* Force new thread to exit.
*/ > int force_exit; > > diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c > index 415ddb0..57c9406 100644 > --- a/lib/libthr/thread/thr_sig.c > +++ b/lib/libthr/thread/thr_sig.c > @@ -162,6 +162,7 @@ thr_sighandler(int sig, siginfo_t *info, void *_ucp) > act = _thr_sigact[sig-1].sigact; > _thr_rwl_unlock(&_thr_sigact[sig-1].lock); > errno = err; > + curthread->deferred_run = 0; > > /* > * if a thread is in critical region, for example it holds low level locks, > @@ -320,14 +321,18 @@ check_deferred_signal(struct pthread *curthread) > siginfo_t info; > int uc_len; > > - if (__predict_true(curthread->deferred_siginfo.si_signo == 0)) > + if (__predict_true(curthread->deferred_siginfo.si_signo == 0 || > + curthread->deferred_run)) > return; > > + curthread->deferred_run = 1; > uc_len = __getcontextx_size(); > uc = alloca(uc_len); > getcontext(uc); > - if (curthread->deferred_siginfo.si_signo == 0) > + if (curthread->deferred_siginfo.si_signo == 0) { > + curthread->deferred_run = 0; > return; > + } > __fillcontextx2((char *)uc); > act = curthread->deferred_sigact; > uc->uc_sigmask = curthread->deferred_sigmask; > The patch looks fine to me. 
From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 07:42:43 2013
Date: Fri, 22 Nov 2013 09:42:28 +0200
From: Konstantin Belousov
To: Doug Ambrisko
Cc: freebsd-hackers@freebsd.org, Dirk Engling, Jase Thew, mdf@freebsd.org
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131122074228.GT59496@kib.kiev.ua>
References: <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua>
 <20131119174216.GA80753@ambrisko.com> <20131120075531.GE59496@kib.kiev.ua>
 <20131121174028.GA80520@ambrisko.com>
In-Reply-To: <20131121174028.GA80520@ambrisko.com>

On Thu, Nov 21, 2013 at 09:40:28AM -0800, Doug Ambrisko wrote:
> On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote:
> | On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
> | > I was talking about the more general case since the system tries to keep
> | > the path in the stat structure. My prior approach which had more issues
> | > was to modify the stat structure of which I was pointed to NetBSD and their
> | > change to statvfs which doesn't really solve the problem. They don't
> | > have the check to see if the mount is longer than VFS_MNAMELEN (in their case)
> | > and just truncate things.
> | >
> | > If we are just talking about adding it to the mount structure that
> | > would be okay since it isn't exposed to user land. I can add that.
> |
> | Yes, this is exactly what I mean. Add a struct mount field, and use
> | it for kernel only. In fact, it only matters for sys_unmount() and
> | kern_jail.c, other locations in kernel use the path for warnings, and
> | this could be postponed if you prefer to minimize the patch.
>
> Okay, I went through all of the occurrences and compile tested (except
> for #DEBUG). I united a few things but should do more once I get
> consensus on the approach. I found a few spots that should be updated as
> well and made the length check more consistent. Some were doing >= and others
> >. So this should be better, however, a lot larger. On the plus side
> when we figure out how to return the longer path length to user land
> that can be more flexible since the kernel is tracking the longer length.
> Probably things to note are changes in:
>       ZFS to mount snapshot
>       cd9660 for symlinks
>       fuse to return full path
>       jail to check statfs and mount
>       mount/umount to save and check full path
>       mountroot to save new field for full path
>
> Just in case it doesn't make it in email the full patch is at:
>       http://people.freebsd.org/~ambrisko/mount_bigger.patch
>

Yes, this is closer to the patch I can agree with :).

Two notes. The first, about the off-by-one, was already made.

The second: I suggest making the mnt_path member a char *. Usually the
mount point path is quite short, so 1024 bytes for the buffer is
excessive. You can allocate a buffer of exactly the needed size, which
would save around 10KB of kernel memory even for a relatively modest
10 mounts.

For additional cleverness, mnt_path could point to f_mntonname when
the length is less than MNAMELEN. Since mp deallocation is centralized,
the code for the trick should not be too hard.

From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 10:22:44 2013
Date: Fri, 22 Nov 2013 12:22:26 +0200
From: Vitaly Magerya
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <528F3062.8040105@gmail.com>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua>

On 2013-11-21 23:15, Konstantin Belousov wrote:
> Please try this.
>
> diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
> [...]
> diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
> [...]

Yeah, applied to 9.2-RELEASE, this fixes the issues I had; thank you.
Will you commit it and will it make its way into 10-RELEASE?
From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 11:56:28 2013
Date: Fri, 22 Nov 2013 13:56:18 +0200
From: Konstantin Belousov
To: Vitaly Magerya
Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <20131122115618.GZ59496@kib.kiev.ua>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
 <528F3062.8040105@gmail.com>
In-Reply-To: <528F3062.8040105@gmail.com>

On Fri, Nov 22, 2013 at 12:22:26PM +0200, Vitaly Magerya wrote:
> On 2013-11-21 23:15, Konstantin Belousov wrote:
> > Please try this.
> >
> > diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
> > [...]
> > diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
> > [...]
>
> Yeah, applied to 9.2-RELEASE, this fixes the issues I had; thank you.
> Will you commit it and will it make it's way into 10-RELEASE?

Sure, I will commit it after testing. It is premature to talk about an
MFC before a reasonable testing period in HEAD after the commit.

From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 13:36:09 2013
Date: Fri, 22 Nov 2013 14:35:53 +0100
From: Jilles Tjoelker
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org, threads@freebsd.org, Vitaly Magerya,
 davidxu@freebsd.org
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <20131122133553.GA28457@stack.nl>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua>

On Thu, Nov 21, 2013 at 11:15:46PM +0200, Konstantin Belousov wrote:
> On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote:
> > Hi, folks. I'm investigating a test case failure that devel/boehm-gc
> > has on recent FreeBSD releases. The problem is that a signal
> > handler registered for SIGUSR1 is sometimes called with signum=0,
> > which should not be possible under any conditions.

> > Here's a simple test case that demonstrates this behavior:

> > /* Compile with 'c99 -o example example.c -pthread'
> > */
> > #include <pthread.h>
> > #include <signal.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > void signal_handler(int signum, siginfo_t *si, void *context) {
> >     if (signum != SIGUSR1) {
> >         printf("bad signal, signum=%d\n", signum);
> >         exit(1);
> >     }
> > }
> >
> > void *thread_func(void *arg) {
> >     return arg;
> > }
> >
> > int main(void) {
> >     struct sigaction sa = { 0 };
> >     sa.sa_flags = SA_SIGINFO;
> >     sa.sa_sigaction = signal_handler;
> >     if (sigfillset(&sa.sa_mask) != 0) abort();
> >     if (sigaction(SIGUSR1, &sa, NULL) != 0) abort();
> >     for (int i = 0; i < 10000; i++) {
> >         pthread_t t;
> >         pthread_create(&t, NULL, thread_func, NULL);
> >         pthread_kill(t, SIGUSR1);

> Side note. pthread_kill(3) call behaviour is undefined if pthread_create(3)
> in the line before failed.

> >     }
> >     return 0;
> > }

> > Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get
> > "signum=0" from this program, but you may need to run it a few
> > times or increase the number of iterations to see the same.

> > Interestingly enough, I don't see this behavior under 9.0-RELEASE.

This is because the bug was introduced with AVX support. (It also occurs
on systems without AVX.)

> > So, any ideas what the problem here is?

> It happens when libthr deferred signal handling path is taken for signal
> delivery and for some reason the code inside the deferred path called
> into rtld for symbol binding. Then, rtld lock is locked, some code in
> rtld is executed, and rtld lock is unlocked. Unlock causes _thr_ast()
> run, which results in the nested check_deferred_signal() execution.
> The check_deferred_signal() clears si_signo, so on return the same
> signal is delivered one more time, but is advertised as signo zero.

> The _thr_rtld_init() approach of doing dummy calls does not really work,
> since it is not practically possible to enumerate the symbols needed
> during signal delivery.

> My first attempt to fix this was to increment curthread->critical_count
> around the calls to check_* functions in the _thr_ast(), but it causes
> reverse problem of losing _thr_ast() runs on unlock.

> I ended up with the flag to indicate that deferred delivery is running,
> so check_deferred_signal() should avoid doing anything. A delicate
> moment is that user signal handler is allowed to modify the passed
> machine context to result the return from the signal handler to cause
> arbitrary jump, or just do longjmp(). For this case, I also clear the
> flag in thr_sighandler(), since kernel signal delivery means that nested
> delivery code should not run right now.

This analysis suggests an easier approach: just move the check for
deferred_siginfo.si_signo == 0 downward. If __fillcontextx2 or sysarch
need to be looked up by rtld, the resulting _thr_ast() will invoke the
signal handler and the original call to check_deferred_signal() will do
nothing.

This patch fixes the problem for me on stable/9 and head.
Index: lib/libthr/thread/thr_sig.c
===================================================================
--- lib/libthr/thread/thr_sig.c	(revision 258178)
+++ lib/libthr/thread/thr_sig.c	(working copy)
@@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread)
 	uc_len = __getcontextx_size();
 	uc = alloca(uc_len);
 	getcontext(uc);
-	if (curthread->deferred_siginfo.si_signo == 0)
-		return;
 	__fillcontextx2((char *)uc);
 	act = curthread->deferred_sigact;
 	uc->uc_sigmask = curthread->deferred_sigmask;
 	memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t));
+	if (curthread->deferred_siginfo.si_signo == 0)
+		return;
 	/* remove signal */
 	curthread->deferred_siginfo.si_signo = 0;
 	handle_signal(&act, info.si_signo, &info, uc);

-- 
Jilles Tjoelker

From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 14:36:01 2013
Date: Fri, 22 Nov 2013 15:36:00 +0100
From: Cedric Blancher
To: "freebsd-hackers@freebsd.org"
Subject: O_XATTR support in FreeBSD?

Are there plans to support O_XATTR in FreeBSD anytime soon? Our
applications depend heavily on it (both through NFSv4 and ZFS) and we
may need an alternative to Solaris soon.

Ced
-- 
Cedric Blancher
Institute Pasteur

From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 15:21:06 2013
Date: Fri, 22 Nov 2013 17:20:58 +0200
From: Vitaly Magerya
To: Jilles Tjoelker, Konstantin Belousov
Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <528F765A.8040306@gmail.com>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
 <20131122133553.GA28457@stack.nl>
In-Reply-To: <20131122133553.GA28457@stack.nl>

On 11/22/2013 15:35, Jilles Tjoelker wrote:
> This patch fixes the problem for me on stable/9 and head.
>
> Index: lib/libthr/thread/thr_sig.c
> ===================================================================
> --- lib/libthr/thread/thr_sig.c	(revision 258178)
> +++ lib/libthr/thread/thr_sig.c	(working copy)
> @@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread)
>  	uc_len = __getcontextx_size();
>  	uc = alloca(uc_len);
>  	getcontext(uc);
> -	if (curthread->deferred_siginfo.si_signo == 0)
> -		return;
>  	__fillcontextx2((char *)uc);
>  	act = curthread->deferred_sigact;
>  	uc->uc_sigmask = curthread->deferred_sigmask;
>  	memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t));
> +	if (curthread->deferred_siginfo.si_signo == 0)
> +		return;
>  	/* remove signal */
>  	curthread->deferred_siginfo.si_signo = 0;
>  	handle_signal(&act, info.si_signo, &info, uc);
>

I can confirm that this also solves the problems I'm seeing.

From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 16:57:27 2013
Date: Fri, 22 Nov 2013 11:57:34 -0500
From: Richard Yao
To: Cedric Blancher
Cc: "freebsd-hackers@freebsd.org"
Subject: Re: O_XATTR support in FreeBSD?
Message-ID: <528F8CFE.9030709@gentoo.org>

On 11/22/2013 09:36 AM, Cedric Blancher wrote:
> Are there plans to support O_XATTR in FreeBSD anytime soon? Our
> applications depend heavily on it (both through NFSv4 and ZFS) and we
> may need an alternative to Solaris soon.
>
> Ced
>

There is always OmniOS:

http://omnios.omniti.com/

That being said, do you mean that FreeBSD's ZFS implementation lacks
xattr support? ZFSOnLinux supports that, so I suppose that is another
option should you mean what I think you mean.

From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 17:04:20 2013
Date: Fri, 22 Nov 2013 09:04:19 -0800
From: Doug Ambrisko
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org, Dirk Engling, Jase Thew, mdf@freebsd.org
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131122170419.GA60910@ambrisko.com>
References: <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua>
 <20131119174216.GA80753@ambrisko.com> <20131120075531.GE59496@kib.kiev.ua>
 <20131121174028.GA80520@ambrisko.com> <20131122074228.GT59496@kib.kiev.ua>
In-Reply-To: <20131122074228.GT59496@kib.kiev.ua>

On Fri, Nov 22, 2013 at 09:42:28AM +0200, Konstantin Belousov wrote:
| On Thu, Nov 21, 2013 at 09:40:28AM -0800, Doug Ambrisko wrote:
| > On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote:
| > | On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
| > | > I was talking about the more general case since the system tries to keep
| > | > the path in the stat structure. My prior approach which had more issues
| > | > was to modify the stat structure of which I was pointed to NetBSD and their
| > | > change to statvfs which doesn't really solve the problem. They don't
| > | > have the check to see if the mount is longer than VFS_MNAMELEN (in their case)
| > | > and just truncate things.
| > | >
| > | > If we are just talking about adding it to the mount structure that
| > | > would be okay since it isn't exposed to user land. I can add that.
| > |
| > | Yes, this is exactly what I mean. Add a struct mount field, and use
| > | it for kernel only. In fact, it only matters for sys_unmount() and
| > | kern_jail.c, other locations in kernel use the path for warnings, and
| > | this could be postponed if you prefer to minimize the patch.
| >
| > Okay, I went through all of the occurrences and compile tested (except
| > for #DEBUG). I united a few things but should do more once I get
| > consensus on the approach. I found a few spots that should be updated as
| > well and made the length check more consistent. Some were doing >= and others
| > >. So this should be better, however, a lot larger. On the plus side
| > when we figure out how to return the longer path length to user land
| > that can be more flexible since the kernel is tracking the longer length.
| > Probably things to note are changes in:
| >       ZFS to mount snapshot
| >       cd9660 for symlinks
| >       fuse to return full path
| >       jail to check statfs and mount
| >       mount/umount to save and check full path
| >       mountroot to save new field for full path
| >
| > Just in case it doesn't make it in email the full patch is at:
| >       http://people.freebsd.org/~ambrisko/mount_bigger.patch
| >
| Yes, this is closer to the patch I can agree with :).
|
| Two notes, one was already made, about off by one.

I want to revisit the off-by-one so that it is consistent. We have
places in which there were checks

	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)

and

	if (strlen(fstype) >= MFSNAMELEN - 1 || strlen(fspath) >= MNAMELEN - 1)

both with the same comment of "Be ultra-paranoid". Unless something is
special they should have been the same, and whatever is right should be
carried forward. If there is a special case then it should be clearly
commented. Since this check has moved into other code, we need to get it
hashed out once and for all IMHO. I mainly made this current change to
make sure attention is drawn to it until it is resolved.

| Second is, I suggest to make the mnt_path member a char *. Usually, the
| mount point path length is quite short, so 1024 bytes for the buffer is
| excessive. You can allocate exact needed buffer, which would save in
| around 10KB of kernel memory even for relatively modest amount of 10
| mounts.

Okay, I thought you wanted it to be a const char to potentially guard
against misuse of the field, since this should be a read-only value.
I had actually planned to do the malloc, since I was concerned that if
this structure got allocated on the stack it could blow the kernel's
stack. It seems most of the consumers access the mount structure through
a pointer, so I wasn't as concerned.

| For additional cleverness, mnt_path could point to f_mntonname when
| the length is less than MNAMELEN. Since mp deallocation is centralized,
| code for the trick should be not too hard.

I'll look to see if I can change the other places that update mnt_path
to use a vfs_mount_alloc type function, since then we could get more
sophisticated about the mnt_path allocator/reference as you mention.
In nfs_mounroot.c it probably doesn't matter much since it should be a
short path, but it could be more of an issue with zfs snapshots.

It looks like we are converging. I'll make some more changes to make
sure we are getting on a good path and post another patch. Once that
looks okay in concept, I'll start looking into testing the various
file systems, since unfortunately it touches a lot of code even though
it is mostly mechanical. I don't have a lot of time to work on this, so
I want to optimize various things at once.
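
As an aside for archive readers, the two length checks quoted in this message and the mnt_path aliasing idea can be tried out in user space. The following is a minimal sketch under stated assumptions, not the kernel patch: struct toy_mount, TOY_MNAMELEN (88 is only an illustrative value) and all toy_* functions are hypothetical names made up for this example, and malloc()/free() stand in for whatever kernel allocator the real change would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative size only; stands in for MNAMELEN. */
#define TOY_MNAMELEN 88u

/* Hypothetical miniature of the mount/statfs pair, not the kernel types. */
struct toy_mount {
	char	f_mntonname[TOY_MNAMELEN];	/* fixed buffer seen by userland */
	char	*mnt_path;			/* full path, kernel-only in the proposal */
};

/* A path fits the fixed buffer when there is still room for its NUL. */
static int
toy_fits_inline(const char *path)
{
	return (strlen(path) < TOY_MNAMELEN);
}

/* The "- 1" variant of the check rejects one length that would still fit. */
static int
toy_fits_inline_strict(const char *path)
{
	return (strlen(path) < TOY_MNAMELEN - 1);
}

/* Store the path; returns 0 on success, -1 if allocation fails. */
static int
toy_mount_set_path(struct toy_mount *mp, const char *path)
{
	size_t len, n;

	assert(mp != NULL && path != NULL);
	len = strlen(path);

	/* The fixed buffer always gets a (possibly truncated) copy. */
	n = len < TOY_MNAMELEN ? len : TOY_MNAMELEN - 1;
	memcpy(mp->f_mntonname, path, n);
	mp->f_mntonname[n] = '\0';

	if (toy_fits_inline(path)) {
		mp->mnt_path = mp->f_mntonname;	/* short path: alias, no allocation */
		return (0);
	}
	mp->mnt_path = malloc(len + 1);		/* long path: exact-size buffer */
	if (mp->mnt_path == NULL)
		return (-1);
	memcpy(mp->mnt_path, path, len + 1);
	return (0);
}

/* Single release point: free only when the path was separately allocated. */
static void
toy_mount_free_path(struct toy_mount *mp)
{
	if (mp->mnt_path != mp->f_mntonname)
		free(mp->mnt_path);
	mp->mnt_path = NULL;
}
```

In this sketch a path of exactly TOY_MNAMELEN - 1 characters passes toy_fits_inline() but not toy_fits_inline_strict(), which is the one-character difference between the two checks. Keeping the pointer comparison in a single free routine is what makes the aliasing trick cheap, which matches the remark that mp deallocation is centralized.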
If someone can help unit test corner cases that would be great with the various file systems. Atleast I have VirtualBox netbooting so I can test things quicker. However, that required some debugging and changes to pxeboot to send the Client ID so isc-dhcpd didn't get upset with it. I need to check that doesn't break the non-ipxe boot stuff that doesn't require the Client ID field to be set. I've only run into this issue with ipxe in VirtualBox and qemu. I also have some pxe boot robustness and caching fixes that I should get in as well. Thanks, Doug A. From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 18:11:24 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C806BB25 for ; Fri, 22 Nov 2013 18:11:24 +0000 (UTC) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 814372508 for ; Fri, 22 Nov 2013 18:11:24 +0000 (UTC) Received: from compute1.internal (compute1.nyi.mail.srv.osa [10.202.2.41]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 4285B21362; Fri, 22 Nov 2013 13:11:21 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Fri, 22 Nov 2013 13:11:21 -0500 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= daniel.shahaf.name; h=date:from:to:subject:message-id :mime-version:content-type; s=mesmtp; bh=M56sLF09ustEm9n5OiQUYQ4 jFAA=; b=uogmQmjyoxFNCltHWwy4QXbWx3k65BDisfxEf0ijYnGZ0WqYfeHzbCC 9XxeC33H6JP/CNwi9QNzxQ+rNeCUsM6kyHgg9WTudBT872+A3zE+X65D4LcCn0vS UJ52hMRkJlXpKFlStTtVsXb2GdLJJGhF3w+jY7VlyvdVBZDjnP2c= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=date:from:to:subject:message-id 
:mime-version:content-type; s=smtpout; bh=M56sLF09ustEm9n5OiQUYQ 4jFAA=; b=fBSU18CeSvt9OJWjJxjpj06Kp6wRcdexrGFXj2MWrW26o5oShawyQu X4VSp4igFSJHi0R0AoXlI0KgK1kaufHxrpEws1+D7+5/GQuYOXjHyGxD9ccIRg8j nELa8pDPH+dU5jdR4PABYWFfGUYbHPFnBjOho2SpUw+bMiMj9AgEs= X-Sasl-enc: dq1Pj/X6+MljBSKBTaqnXWESxretVpGdzg2YJ3eP8NVP 1385143880 Received: from tarsus.local2 (unknown [46.19.33.46]) by mail.messagingengine.com (Postfix) with ESMTPA id 6C2C5C00E83; Fri, 22 Nov 2013 13:11:20 -0500 (EST) Date: Fri, 22 Nov 2013 20:11:10 +0200 From: Daniel Shahaf To: freebsd-hackers@freebsd.org Subject: 'freebsd-update cron' repeatedly announcing 9.1-RELEASE-p8 Message-ID: <20131122181110.GA29056@tarsus.local2> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Nov 2013 18:11:24 -0000 This cron job: 0 3 * * * /usr/sbin/freebsd-update cron emails me nightly with a request to update to 9.1-RELEASE-p8. But I don't need the -p8 fixes in my environment, so that nightly mail is just clutter in my inbox, and would make it harder for me to notice -p9 when that is announced. So I added a freebsd-update.conf(5) knob to allow suppressing the email if it's for a given release. See attachment. The intended use is to set the knob to "9.1-RELEASE-p8" and then, when I start getting mails about -p9, either install -p9 or update the knob's value to -p9. Daniel Index: etc/freebsd-update.conf =================================================================== --- etc/freebsd-update.conf (revision 258471) +++ etc/freebsd-update.conf (working copy) @@ -74,3 +74,7 @@ MergeChanges /etc/ /boot/device.hints # When backing up a kernel also back up debug symbol files? 
# BackupKernelSymbolFiles no + +# If the new release is the specified value, don't emit an email announcing +# it. (Default: unspecified) +# IgnoreReleases 9.1-RELEASE-p8 Index: share/man/man5/freebsd-update.conf.5 =================================================================== --- share/man/man5/freebsd-update.conf.5 (revision 258471) +++ share/man/man5/freebsd-update.conf.5 (working copy) @@ -218,6 +218,13 @@ backup kernel, the .Cm freebsd-update rollback command will recreate the symbol files along with the old kernel. +.It Cm IgnoreReleases +The parameters following this keyword are regular expressions; +if the new release matches one of them, it will be ignored by +.Cm cron . +.Pp +This option can be specified multiple times, and the parameters +accumulate. .El .Sh FILES .Bl -tag -width "/etc/freebsd-update.conf" Index: usr.sbin/freebsd-update/freebsd-update.sh =================================================================== --- usr.sbin/freebsd-update/freebsd-update.sh (revision 258471) +++ usr.sbin/freebsd-update/freebsd-update.sh (working copy) @@ -88,6 +88,7 @@ EOF CONFIGOPTIONS="KEYPRINT WORKDIR SERVERNAME MAILTO ALLOWADD ALLOWDELETE KEEPMODIFIEDMETADATA COMPONENTS IGNOREPATHS UPDATEIFUNMODIFIED BASEDIR VERBOSELEVEL TARGETRELEASE STRICTCOMPONENTS MERGECHANGES + IGNORERELEASE IDSIGNOREPATHS BACKUPKERNEL BACKUPKERNELDIR BACKUPKERNELSYMBOLFILES" # Set all the configuration options to "". @@ -217,6 +218,13 @@ config_Components () { done } +# Add to the list of releases updates to will be ignored. +config_IgnoreReleases () { + for C in $@; do + IGNORERELEASE="${IGNORERELEASE} ${C}" + done +} + # Add to the list of paths under which updates will be ignored. config_IgnorePaths () { for C in $@; do @@ -2086,6 +2094,21 @@ fetch_run () { fetch_warn_eol || return 1 } +# If the available release is in IgnoreReleases, return true. +# Else, return false. 
+cron_suppress_mail() { + TMPFILE=$1 + if grep -q "No updates needed" ${TMPFILE}; then + return 0 + fi + for X in ${IGNORERELEASE}; do + if echo "${RELNUM}-p${RELPATCHNUM}" | grep -q "${X}"; then + return 0 + fi + done + return 1 +} + # If StrictComponents is not "yes", generate a new components list # with only the components which appear to be installed. upgrade_guess_components () { @@ -3199,7 +3222,7 @@ cmd_cron () { TMPFILE=`mktemp /tmp/freebsd-update.XXXXXX` || exit 1 if ! fetch_run >> ${TMPFILE} || - ! grep -q "No updates needed" ${TMPFILE} || + ! cron_suppress_mail ${TMPFILE} || [ ${VERBOSELEVEL} = "debug" ]; then mail -s "`hostname` security updates" ${MAILTO} < ${TMPFILE} fi From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 18:39:53 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 117E0145; Fri, 22 Nov 2013 18:39:53 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 842552652; Fri, 22 Nov 2013 18:39:52 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAMIdgQ8044050; Fri, 22 Nov 2013 20:39:42 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAMIdgQ8044050 Received: (from kostik@localhost) by tom.home (8.14.7/8.14.7/Submit) id rAMIdgnB044049; Fri, 22 Nov 2013 20:39:42 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 22 Nov 2013 20:39:42 +0200 From: Konstantin Belousov To: Jilles Tjoelker Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler Message-ID: 
<20131122183942.GB59496@kib.kiev.ua> References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua> <20131122133553.GA28457@stack.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rmx1G5GNWS01lHd9" Content-Disposition: inline In-Reply-To: <20131122133553.GA28457@stack.nl> User-Agent: Mutt/1.5.22 (2013-10-16) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-hackers@freebsd.org, threads@freebsd.org, Vitaly Magerya , davidxu@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Nov 2013 18:39:53 -0000
--rmx1G5GNWS01lHd9 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Fri, Nov 22, 2013 at 02:35:53PM +0100, Jilles Tjoelker wrote:
> This analysis suggests an easier approach: just move the check for
> deferred_siginfo.si_signo == 0 downward. If __fillcontextx2 or sysarch
> need to be looked up by rtld, the resulting _thr_ast() will invoke the
> signal handler and the original call to check_deferred_signal() will do
> nothing.
>
> This patch fixes the problem for me on stable/9 and head.
>
> Index: lib/libthr/thread/thr_sig.c
> ===================================================================
> --- lib/libthr/thread/thr_sig.c (revision 258178)
> +++ lib/libthr/thread/thr_sig.c (working copy)
> @@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread)
>  	uc_len = __getcontextx_size();
>  	uc = alloca(uc_len);
>  	getcontext(uc);
> -	if (curthread->deferred_siginfo.si_signo == 0)
> -		return;
>  	__fillcontextx2((char *)uc);
>  	act = curthread->deferred_sigact;
>  	uc->uc_sigmask = curthread->deferred_sigmask;
>  	memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t));
> +	if (curthread->deferred_siginfo.si_signo == 0)
> +		return;
>  	/* remove signal */
>  	curthread->deferred_siginfo.si_signo = 0;
>  	handle_signal(&act, info.si_signo, &info, uc);
>

I do not like this. It is similar to what I did initially when I
debugged the problem, but the duplicated calls to getcontext(2) and
sysarch(2) stayed out as a sore in ktrace. I also do not like the fact
that, with the change, signal is delivered from an rtld context.

If taking such road, the fix would be to add __fillcontext2() to
_rtld_init(), but I described the reason for other fix in the initial
response.
--rmx1G5GNWS01lHd9 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJSj6TtAAoJEJDCuSvBvK1BOSkQAIgX0yy3jpTylGEV1X5BfvRt SkbpN+JlzSgUTMKGrnA0qt03SQE2JZp9rHS+b8qPEgDuXG/P76pz10rqcMF+3wv3 4Xs9yiv0r4kRv9Blw7d5tvsXi1HH9sF8hPmj2TbL2rJ1qOv4hacg5LLvocyZZ4oz yyL5WRB6XwQTW3Ax8BXSMuxLvHA4P2PAQ6CxG2283O1WQrOHELroLGTeS1nCvjaI irefCxx5lXWS3HYi6NxkV6MWIBYI7e57tLZNAKJnF5FDT8bWw/0hqR1/8Jpp/80Y vEs/56f1yNzJibzTS84NmZ5iW5KsKC4NR/Oq3AyRgZQ65C6Du2oOyHgjDW7o6a+i JznvcXVGA4TlF0m2e0zoXAhG0uHtxKZaHeDm8MBrR2ghZY2w1o2IHxIW944yzzY4 wkHT3i2WsMVkpPqyIMr2Zb4Z/tKf9bnthk3K3+JnTbSJDnvpzU2xIU3B1iosmXM2 GRKBCwzD36MzJ0MBZWbSWtpdJZDcS+qZVyJviq3TKsqd0Tfbr+08LtkXJ8w+3gDV de4RMbNc9cqN9hq+mvvTxdZKUd4nFYuwZXx0qyUxZequ16tYpUfAXlnaVco6vYAS 5fFc1ztq+lVhjkLnGeW+SE1q4Alju6cgAnf25XUo+7W3ZEtC+DxXnrtuptpzcJ3X XZLeWMJ+5fuwTle9w9SP =pWVY -----END PGP SIGNATURE----- --rmx1G5GNWS01lHd9-- From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 19:57:23 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A48E92FB for ; Fri, 22 Nov 2013 19:57:23 +0000 (UTC) Received: from nm14-vm0.bullet.mail.bf1.yahoo.com (nm14-vm0.bullet.mail.bf1.yahoo.com [98.139.213.164]) by mx1.freebsd.org (Postfix) with SMTP id 4AB072AA1 for ; Fri, 22 Nov 2013 19:57:22 +0000 (UTC) Received: from [98.139.215.141] by nm14.bullet.mail.bf1.yahoo.com with NNFMP; 22 Nov 2013 19:55:40 -0000 Received: from [98.139.211.198] by tm12.bullet.mail.bf1.yahoo.com with NNFMP; 22 Nov 2013 19:55:40 -0000 Received: from [127.0.0.1] by smtp207.mail.bf1.yahoo.com with NNFMP; 22 Nov 2013 19:55:40 -0000 X-Yahoo-Newman-Id: 790450.3025.bm@smtp207.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: r58nt9YVM1ni.HUjy_PIn9MN7RJhHYNfSWZmMN6SyQ6ydl4 
wFlPqwcSkK74eTllL4gC.FosZNa6Ne8WNJiaFMaVdMOC1VrPdWUT1GcoJzAe TWnsmJAKfmU6UVAZVl36dXze50jJPe9c3JUfWkQk86BviiFteZY3x9AQcHpP 3hg7XFqTzmnrklF6v_ODw2xloYmS4.zgGGsBp6vNTlgVHgw117oktdm0nN.1 2UAdrCh1MOofWiX4FHKwXUMBzj2vkA88AV4u_6nnHGh9RfQma45i8izUWFHi R714ofTMkDDgdkXnrg2vJ5DcjTTIx_MvrFOmUR4Vv9eC2zdD0eEEzHpTotHa JvNm11NxxI847qJV.6hTFrgTnslD4UZoAJ5Fg095ZSIqaNoT.puLDdFN4yqG .iBMVdV5gq9vb5dNgMhXlRlNqDgeqSj71kXyck6TitCSCdmZIxEgvE6Jbcym IaAc10yXdvmKEdNFpuu0N0SzdnzH21M4bpXeUD3ijdYYKCFiHmGnCLFOWXI6 eTw8Llumn4A2UfnCP0lswmYR.E1E57ShPMfiXiVFrWjJX8VGDbgThrPis8Gx 7wZRdDQdT2RrqAqdOT.Gq2WYoxKC7ZD2LfHyPCVDTqAgfeQDsLygOQ9Sw6cV C X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with ) by smtp207.mail.bf1.yahoo.com with SMTP; 22 Nov 2013 11:55:40 -0800 PST Message-ID: <528FB6BA.7040606@FreeBSD.org> Date: Fri, 22 Nov 2013 14:55:38 -0500 From: Pedro Giffuni User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Cedric Blancher Subject: Re: O_XATTR support in FreeBSD? Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Freebsd hackers list , Richard Yao X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Nov 2013 19:57:23 -0000 Well ... According to: https://wiki.freebsd.org/ZFS We do support Extended Attributes on ZFS but they differ from the ones in Solaris (and Linux). Pedro. 
From owner-freebsd-hackers@FreeBSD.ORG Sat Nov 23 07:13:15 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CFC8F2AC; Sat, 23 Nov 2013 07:13:15 +0000 (UTC) Received: from mail-ie0-x22b.google.com (mail-ie0-x22b.google.com [IPv6:2607:f8b0:4001:c03::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 9CD7E2950; Sat, 23 Nov 2013 07:13:15 +0000 (UTC) Received: by mail-ie0-f171.google.com with SMTP id ar20so3723042iec.16 for ; Fri, 22 Nov 2013 23:13:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=lGE/e4u6r+17h/nBlfDBcfeLo8hbMup9bNGaNmmBUro=; b=PRNFBz+DureTz68lxZLcOcZl9PIIEa+akVmTBlsVS4oc4c2AzNdQOiYdhNp2BOsVt0 qHWpajdaxmKRVn3zvBlSqxD6DKWEXfMzWSjt9BiRQXwlHcwRpkKIf5g61coJ01xiHxKu 2MIjU5X+eJB/s0x9AJd84cdNb5SVWzt4+dcPOA5rEQpgf3wVZ6Zlo1B3GRuuwHNpCB+q CATEFBpWCKzOQG4o123lNxEDKpHqEQkv5wxohlZHS/V3iHdHoGlSex/7JvFXkrAr/Z5E tyT/2B6wXCMW+n2lRXrE3L2xb2C+LPqDVE4dKIkoHjLD/sDwb+kwDdNkKRsbspbprtLn L4BQ== MIME-Version: 1.0 X-Received: by 10.50.50.169 with SMTP id d9mr5517335igo.28.1385190794938; Fri, 22 Nov 2013 23:13:14 -0800 (PST) Received: by 10.50.225.70 with HTTP; Fri, 22 Nov 2013 23:13:14 -0800 (PST) In-Reply-To: <528FB6BA.7040606@FreeBSD.org> References: <528FB6BA.7040606@FreeBSD.org> Date: Sat, 23 Nov 2013 08:13:14 +0100 Message-ID: Subject: Re: O_XATTR support in FreeBSD? 
From: Cedric Blancher To: Pedro Giffuni Content-Type: text/plain; charset=ISO-8859-1 Cc: Freebsd hackers list , Richard Yao X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Nov 2013 07:13:15 -0000 On 22 November 2013 20:55, Pedro Giffuni wrote: > Well ... > > According to: > > https://wiki.freebsd.org/ZFS > > We do support Extended Attributes on ZFS but they differ from the ones in > Solaris (and Linux). Well, we need the one specified in the NFSv4 standard. The Linux extended attributes are pretty much useless because they are size restricted (typical attribute size here is in the GB range, and for example NIH and CERN have even much bigger sizes), can't be accessed like normal files and are incompatible to Window's Alternate Streams. Ced -- Cedric Blancher Institute Pasteur From owner-freebsd-hackers@FreeBSD.ORG Sat Nov 23 14:08:09 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A552A39B for ; Sat, 23 Nov 2013 14:08:09 +0000 (UTC) Received: from nm27-vm1.bullet.mail.bf1.yahoo.com (nm27-vm1.bullet.mail.bf1.yahoo.com [98.139.213.148]) by mx1.freebsd.org (Postfix) with SMTP id 4B29129C9 for ; Sat, 23 Nov 2013 14:08:08 +0000 (UTC) Received: from [66.196.81.173] by nm27.bullet.mail.bf1.yahoo.com with NNFMP; 23 Nov 2013 14:05:03 -0000 Received: from [98.139.213.8] by tm19.bullet.mail.bf1.yahoo.com with NNFMP; 23 Nov 2013 14:05:03 -0000 Received: from [127.0.0.1] by smtp108.mail.bf1.yahoo.com with NNFMP; 23 Nov 2013 14:05:03 -0000 X-Yahoo-Newman-Id: 196499.76039.bm@smtp108.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: 
APb89qgVM1nxo8iK80PQAcD3aS3OiynBKVpX3_FN0UNo4w3 C5iOedsrQ4ybfkiQgdZat2ZZ6J.XxuS_rX0f0RHq99NeVYarcUQvlP0KWcMm bwNRtr22MZCmz0ne5ujw5ImZBA6wS_Y1YAng8BOVl9NSH72_mkRwzld41652 .sU35rD2ystsnT7.JRWTEKaz7XI5ILJkcjG65LkHSHUFuBvpyzYWgAaSUbrn hiavz_gVvbGrZVtSfSTfi1qk2b84Iv.MTmRiE_Vxlny0r4T4.4l1oqF_2329 74r.by70BfW241icV_0VMH4NQCQwWDLDmlJiWNXeVeIu8RKf77lZwHrd6tzN 8aCLKylKR..RjfRRiWQlBdqCkLp09dmSl9CLkzGN9PK.znlwWe7JD83mwOXE bm4bJ8T4JCJGYDfq5XPokj4YLIxqHu5jqu6af9M6CtC16vXdV.KGcQ_93faQ h_c7njKc52Zqgsg0I4CpWR73PUqhbpfD.ca0Ffz.0Br9jxFm_F9qee5lmTKV eBfznv.lKxJHa_hlNlYwTxG7gX8aYpxw51vWzMyU0w7Nn2Z9oIX3Nvw-- X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with ) by smtp108.mail.bf1.yahoo.com with SMTP; 23 Nov 2013 06:05:03 -0800 PST Message-ID: <5290B60D.2050006@FreeBSD.org> Date: Sat, 23 Nov 2013 09:05:01 -0500 From: Pedro Giffuni User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Cedric Blancher Subject: Re: O_XATTR support in FreeBSD? References: <528FB6BA.7040606@FreeBSD.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Freebsd hackers list , Richard Yao X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Nov 2013 14:08:09 -0000 On 23.11.2013 02:13, Cedric Blancher wrote: > On 22 November 2013 20:55, Pedro Giffuni wrote: >> Well ... >> >> According to: >> >> https://wiki.freebsd.org/ZFS >> >> We do support Extended Attributes on ZFS but they differ from the ones in >> Solaris (and Linux). > Well, we need the one specified in the NFSv4 standard. 
The Linux > extended attributes are pretty much useless because they are size > restricted (typical attribute size here is in the GB range, and for > example NIH and CERN have even much bigger sizes), can't be accessed > like normal files and are incompatible to Window's Alternate Streams. > > Ced I was unaware of a standard for EA beyond the old posix draft. The reason for Extended Attributes is supporting ACL and we support both the draft posix and the NFS/win style ACLs. Not sure about the status of NFSv4. The guys in the posix-1e list should know better. regards, Pedro. From owner-freebsd-hackers@FreeBSD.ORG Sat Nov 23 22:53:47 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EE8ED5F1; Sat, 23 Nov 2013 22:53:47 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id A3EED2007; Sat, 23 Nov 2013 22:53:46 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqQEAMcxkVKDaFve/2dsb2JhbABZgz9Tgnm4Vk6BMnSCJQEBAQMBAQEBICsgCwUWDgoCAg0ZAikBCSYGCAcEARwBA4daBg2uCZBCF4EpjQYHAQEbNAeCa4FIA4lCjAODf4kbh0eDRh4xewkXIg X-IronPort-AV: E=Sophos;i="4.93,759,1378872000"; d="scan'208";a="71626984" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 23 Nov 2013 17:53:38 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 04940B40EB; Sat, 23 Nov 2013 17:53:38 -0500 (EST) Date: Sat, 23 Nov 2013 17:53:38 -0500 (EST) From: Rick Macklem To: Pedro Giffuni Message-ID: <820263347.19772534.1385247218007.JavaMail.root@uoguelph.ca> In-Reply-To: <5290B60D.2050006@FreeBSD.org> Subject: Re: O_XATTR support in FreeBSD? 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Freebsd hackers list , Richard Yao , Cedric Blancher X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Nov 2013 22:53:48 -0000 Pedro Giffuni wrote: > On 23.11.2013 02:13, Cedric Blancher wrote: > > On 22 November 2013 20:55, Pedro Giffuni wrote: > >> Well ... > >> > >> According to: > >> > >> https://wiki.freebsd.org/ZFS > >> > >> We do support Extended Attributes on ZFS but they differ from the > >> ones in > >> Solaris (and Linux). > > Well, we need the one specified in the NFSv4 standard. The Linux > > extended attributes are pretty much useless because they are size > > restricted (typical attribute size here is in the GB range, and for > > example NIH and CERN have even much bigger sizes), can't be > > accessed > > like normal files and are incompatible to Window's Alternate > > Streams. > > > > Ced > > I was unaware of a standard for EA beyond the old posix draft. > The reason for Extended Attributes is supporting ACL and we support > both > the draft posix and the NFS/win style ACLs. > Interestingly, FreeBSD has a VOP_OPENEXTATTR() but no syscall that uses it nor support for it in ZFS. (I'm just guessing it was intended for an openat(2) syscall at some time?) Btw Cedric, if you had mentioned "subfiles" or "fork files" in your subject line, you might have gotten a better answer. I, for one, didn't know what O_XATTR is. I also always get confused w.r.t. what to call these beasts. (NFSv4 calls the named attributes.) Btw, apps can use extended attributes (the limited sized atomically stored/read kind). They aren't just for storing ACLs. > Not sure about the status of NFSv4. 
The guys in the posix-1e list > should > know better. > The NFSv4 implementation in FreeBSD does not support it, although adding it wouldn't be hard if someone figures out how to do the syscall and adds support for the VOP()s in ZFS. (I'm not volunteering to do the latter. I have plenty of other stuff on my to-do list;-) rick > regards, > > Pedro. > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@FreeBSD.ORG Sat Nov 23 23:48:39 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 11970C75; Sat, 23 Nov 2013 23:48:39 +0000 (UTC) Received: from mail.crittercasa.com (mail.turbofuzz.com [208.87.221.144]) by mx1.freebsd.org (Postfix) with ESMTP id E1001223F; Sat, 23 Nov 2013 23:48:38 +0000 (UTC) Received: from [10.20.30.117] (248.sub-70-197-7.myvzw.com [70.197.7.248]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.crittercasa.com (Postfix) with ESMTPS id B75F8164874; Sat, 23 Nov 2013 15:41:38 -0800 (PST) Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1812\)) Subject: Re: O_XATTR support in FreeBSD? 
From: Jordan Hubbard In-Reply-To: <820263347.19772534.1385247218007.JavaMail.root@uoguelph.ca> Date: Sat, 23 Nov 2013 15:41:37 -0800 Message-Id: References: <820263347.19772534.1385247218007.JavaMail.root@uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.1812) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.16 Cc: Freebsd hackers list , Richard Yao , Pedro Giffuni , Cedric Blancher X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Nov 2013 23:48:39 -0000

On Nov 23, 2013, at 2:53 PM, Rick Macklem wrote:

> Interestingly, FreeBSD has a VOP_OPENEXTATTR() but no syscall
> that uses it nor support for it in ZFS. (I'm just guessing it
> was intended for an openat(2) syscall at some time?)
> Btw Cedric, if you had mentioned "subfiles" or "fork files" in your
> subject line, you might have gotten a better answer. I, for one,
> didn't know what O_XATTR is. I also always get confused w.r.t. what
> to call these beasts. (NFSv4 calls the named attributes.)
>
> Btw, apps can use extended attributes (the limited sized
> atomically stored/read kind). They aren't just for
> storing ACLs.

Sigh. Extended Attributes. :-/

I guess I'll raise my head in this discussion. They've certainly been the bane of my existence for long enough!

First, supporting EAs properly really involves multiple levels of the Unix command and library stack.

The filesystem can support them natively, sure, but that's actually somewhat optional since you can always (cough cough) stick them in a side-store if the rest of the stack cooperates.
That's where the awesome AppleDouble files came from ("._weirdfile" corresponding to "weirdfile") which remain useful even after filesystems like HFS/ZFS/UFS became EA-aware natively because there's always those foreign data stores to talk to (some early AFP/CIFS/NFS mount, for example) and the fact that you still need to *serialize* the dang things into tar / cpio / zip / ??? files as well as across network replication with tools like rsync. What good is an EA, much less an ACL that's been stored in an EA, if it gets stripped off the first time you tar up a directory and extract it somewhere else?

So I wouldn't start with NFSv4 or ZFS if I was asking the question. I would start with libc and ask if it had anything similar to copyfile(3) so that the tools above it could start actually supporting those attributes on a *practical* basis! :)

- Jordan
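
As a small illustration of the side-store naming mentioned above
("._weirdfile" for "weirdfile"), the following sketch derives the
AppleDouble companion name for a given path. The function name
appledouble_name() and its return convention are hypothetical, chosen
for the example; it only builds the name and does not read or write
any attribute data.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the AppleDouble side-store name for `path` into `out`:
 * the same directory, with "._" prepended to the last component.
 * Returns 0 on success, -1 if the result does not fit in `outsz` bytes.
 */
static int
appledouble_name(const char *path, char *out, size_t outsz)
{
	const char *slash = strrchr(path, '/');
	const char *base = (slash != NULL) ? slash + 1 : path;
	int dirlen = (int)(base - path);	/* includes the trailing '/' */
	int n;

	n = snprintf(out, outsz, "%.*s._%s", dirlen, path, base);
	return ((n < 0 || (size_t)n >= outsz) ? -1 : 0);
}
```

A tool that archives or copies "dir/weirdfile" without native EA
support would, under this convention, also need to carry
"dir/._weirdfile" along; that pairing is what a copyfile(3)-style
library call would hide from the tools above it.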