From owner-freebsd-hackers@FreeBSD.ORG  Sun Nov 17 23:09:12 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id ADB3CC9B;
 Sun, 17 Nov 2013 23:09:12 +0000 (UTC)
Received: from mail-ea0-x234.google.com (mail-ea0-x234.google.com
 [IPv6:2a00:1450:4013:c01::234])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id EDE892B56;
 Sun, 17 Nov 2013 23:09:11 +0000 (UTC)
Received: by mail-ea0-f180.google.com with SMTP id f15so268946eak.25
 for <multiple recipients>; Sun, 17 Nov 2013 15:09:09 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:subject
 :content-type:content-transfer-encoding;
 bh=9JhQdvAs5+jD0Qs/gm3vGXoKD2I5ED1Rr63BFPP9G4Y=;
 b=jA+duaKysm+lYhos+xqy6q7TTtjxwWUNK27Tvv0L9z1x2oVrOqabcb7r9dvfFrtE/T
 9pNAQsggWNMMzYyHtQ9wnnQtPjQRVSN4QNDNoYOD8SaUo2W0Ee9EADPNRwfgEyUmRbtg
 +I/v3XDUOiHvFdHmYNq58tdiwkIN9vlAJYLUTtdH2IyN+QE7lPv7buRn1sRibDmei8VA
 lGkkOnQ2liEFW9hvXI2RGChoKHWLCASs8t8s0BKD7Lbg41r+BoQ+mIUknIwWD1XEs9Q9
 PoUjt7b+6yQ4QQnnxwfSSgA91s02wTaxRxm68QnfEUMVzGmtC5f/gqL9qe/KJOUuX3aW
 WQ4g==
X-Received: by 10.15.65.11 with SMTP id p11mr345117eex.49.1384729749539;
 Sun, 17 Nov 2013 15:09:09 -0800 (PST)
Received: from mavbook.mavhome.dp.ua ([178.137.150.35])
 by mx.google.com with ESMTPSA id o47sm31544449eem.21.2013.11.17.15.09.07
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Sun, 17 Nov 2013 15:09:08 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <52894C92.60905@FreeBSD.org>
Date: Mon, 18 Nov 2013 01:09:06 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, 
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject: UMA cache back pressure
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 17 Nov 2013 23:09:12 -0000

Hi.

I've created a patch, based on earlier work by avg@, that adds back
pressure to UMA allocation caches. The problem of physical memory or KVA
exhaustion has existed there for many years, and it is now quite critical
for improving system performance while keeping the system stable.
Changes made to memory allocation in recent years improved the situation,
but haven't fixed it completely. My patch solves the remaining problems
from two sides: a) reducing bucket sizes every time the system detects a
low-memory condition; and b) as a last-resort mechanism for very low
memory conditions, cycling over all CPUs to purge their per-CPU UMA
caches. The benefit of this approach is that it needs no additional
hard-coded limits on cache sizes: they are self-tuned, based on load and
memory pressure.
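
A minimal user-space model of measure b) may help picture the last-resort drain. Everything here is hypothetical: `struct drain_cache`, `struct drain_zone`, `DRAIN_NCPU` and `drain_all_cpus()` are illustrative names, not UMA's. The plain loop stands in for running on each CPU in turn, which the kernel needs because per-CPU caches are normally touched only by their own CPU, without locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: names and sizes are illustrative, not UMA's. */
#define DRAIN_NCPU 24

struct drain_cache {
	size_t alloc_items;	/* items cached in this CPU's alloc bucket */
	size_t free_items;	/* items cached in this CPU's free bucket */
};

struct drain_zone {
	struct drain_cache cpu[DRAIN_NCPU];
	size_t zone_items;	/* items back in the shared zone layer */
};

/*
 * Last-resort drain: visit every CPU's cache and return its cached items
 * to the shared zone, from where they could be released to the VM.
 * The kernel has to do this on each CPU in turn; a plain loop stands in
 * for that here.  Returns the number of items reclaimed.
 */
static size_t
drain_all_cpus(struct drain_zone *z)
{
	size_t reclaimed = 0;

	assert(z != NULL);
	for (int i = 0; i < DRAIN_NCPU; i++) {
		struct drain_cache *c = &z->cpu[i];

		reclaimed += c->alloc_items + c->free_items;
		c->alloc_items = 0;
		c->free_items = 0;
	}
	z->zone_items += reclaimed;
	return (reclaimed);
}
```

For example, with 24 CPUs each caching one item in the allocation bucket and two in the free bucket, `drain_all_cpus()` returns 72 and leaves every per-CPU cache empty.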

With this change I believe it should be safe enough to enable UMA
allocation caches in ZFS via the vfs.zfs.zio.use_uma tunable (at least on
amd64). I did many tests on a machine with 24 logical cores (and, as a
result, strong allocation cache effects), and can say that with 40GB of
RAM, using the UMA caches that this change makes safe doubles the results
of the SPEC NFS benchmark on a ZFS pool of several SSDs. To test system
stability I ran the same test with physical memory limited to just 2GB;
the system survived, and even showed results 1.5 times better than with
only the last-resort measure b). In both cases tools/umastat no longer
shows unbounded UMA cache growth, which makes me believe this approach is
viable for longer runs.

I would like to hear some comments about that:
http://people.freebsd.org/~mav/uma_pressure.patch

Thank you.

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 08:41:50 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id E87C359F;
 Mon, 18 Nov 2013 08:41:49 +0000 (UTC)
Received: from mail-qe0-x229.google.com (mail-qe0-x229.google.com
 [IPv6:2607:f8b0:400d:c02::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 8C13E2719;
 Mon, 18 Nov 2013 08:41:49 +0000 (UTC)
Received: by mail-qe0-f41.google.com with SMTP id x7so3878272qeu.14
 for <multiple recipients>; Mon, 18 Nov 2013 00:41:48 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=G+T0OAuaFI/yyb4AkjROiIwYoZzdv9PevOuywYhjg+I=;
 b=f/wS6uq+Gr/7F3GSlqBFqzGBiHPw6JrwNTDOX2Hm2n2lVwEsTEFZPVON3B9JERZHJY
 vtI8sdfLATQreDpQTAQmq2G6g44oNezeACdLN89WV4QNJLSGByp121OXnUeVcN4THkuA
 tfdYnDsuk61qKwctA4Bn0Zy64OZZnhV7gZfLrM56nqRqF4XUClnsqIhIPV0L1H+AC3Wi
 /A9EzZuiEXQOB2l4j9yf476DXwTLEvtR8nFmo1hp3zoyibiPT1w3K6c/XnngZcEKd1Is
 qU83sCcJJcqvmaUZwh6elbJ1U4M/e4bxeSPHvnwpgmS6ZQbKFAD09k4qwETE4XKJ8yaG
 pTog==
MIME-Version: 1.0
X-Received: by 10.224.64.200 with SMTP id f8mr32262534qai.55.1384764108825;
 Mon, 18 Nov 2013 00:41:48 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.224.207.66 with HTTP; Mon, 18 Nov 2013 00:41:48 -0800 (PST)
In-Reply-To: <52894C92.60905@FreeBSD.org>
References: <52894C92.60905@FreeBSD.org>
Date: Mon, 18 Nov 2013 00:41:48 -0800
X-Google-Sender-Auth: NbolgVcs7EvAmjwQ51Qzypcoosk
Message-ID: <CAJ-VmokYgfJ1tr-99qCXosBsyTZ698oLZ2oPpkdGODjo8+K3LQ@mail.gmail.com>
Subject: Re: UMA cache back pressure
From: Adrian Chadd <adrian@freebsd.org>
To: Alexander Motin <mav@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 08:41:50 -0000

Hi!

Your patch does three things:

* adds a couple new buckets;
* reduces some lock contention
* does the aggressive backpressure.

So, do you get any benefits from just the first one, or first two?

-adrian


From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 09:21:02 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C1A43373;
 Mon, 18 Nov 2013 09:21:02 +0000 (UTC)
Received: from mail-ee0-x230.google.com (mail-ee0-x230.google.com
 [IPv6:2a00:1450:4013:c00::230])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0DA4029A0;
 Mon, 18 Nov 2013 09:21:01 +0000 (UTC)
Received: by mail-ee0-f48.google.com with SMTP id e49so2313646eek.21
 for <multiple recipients>; Mon, 18 Nov 2013 01:21:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=WdHtD+khd2d/VfhXzRyplRr7wBb2XCgdBHz2xxlmbds=;
 b=ZGRKncfEbYrBaKzzzAZJBrXQ0K4IvvEJOSTHl0p/jFpte5nRQj9n2wd+67+nrfaEmG
 dIYa8F0Uu+jGthVNW8IUyza8LQ53isJkJRGd0ZViUACzV1Pmex4NQWkZmtUxODoKpDPB
 3wP+WVd+Tnwe48dJolZrL40Ufa5oe81wINmYqlC2uIqoIvd6/GK0CIxYmYxyo7hEvDD1
 qCMDIAo0rm5NCWMBdVm6RfXgsGMDAy1EmHF2sqNxw4bBbzOJ5+E9rFmrxHSenf9/tskw
 x0Esueh6EnGKWqWhj0j29mjn82NEbg2j4CPuxK3Dj26k/gSaKno98MPsXqRyYrFKes31
 oMDA==
X-Received: by 10.14.108.9 with SMTP id p9mr20316683eeg.8.1384766460341;
 Mon, 18 Nov 2013 01:21:00 -0800 (PST)
Received: from mavbook.mavhome.dp.ua ([178.137.150.35])
 by mx.google.com with ESMTPSA id s3sm35801312eeo.3.2013.11.18.01.20.58
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 18 Nov 2013 01:20:59 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <5289DBF9.80004@FreeBSD.org>
Date: Mon, 18 Nov 2013 11:20:57 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: Adrian Chadd <adrian@freebsd.org>
Subject: Re: UMA cache back pressure
References: <52894C92.60905@FreeBSD.org>
 <CAJ-VmokYgfJ1tr-99qCXosBsyTZ698oLZ2oPpkdGODjo8+K3LQ@mail.gmail.com>
In-Reply-To: <CAJ-VmokYgfJ1tr-99qCXosBsyTZ698oLZ2oPpkdGODjo8+K3LQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 09:21:02 -0000

On 18.11.2013 10:41, Adrian Chadd wrote:
> Your patch does three things:
>
> * adds a couple new buckets;

These new buckets make bucket size self-tuning smoother and more precise.
Without them there are buckets for 1, 5, 13, 29, ... items. While at
bigger sizes a difference of about 2x is fine, at the smallest ones it is
5x and 2.6x respectively. The new buckets make the sequence look like 1,
3, 5, 9, 13, 29, reducing the jumps between steps, making the algorithm
work more smoothly, and allocating and freeing memory in better-fitting
chunks. Otherwise there is quite a big gap between allocating 128K and
5x128K of RAM at once.
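
The effect of the extra sizes can be checked with a little arithmetic. In this sketch the two size tables are the ones quoted above; the helpers `max_step_ratio()` and `bucket_payload_bytes()` are hypothetical. The first computes the largest ratio between neighboring bucket sizes. The second computes how many bytes move at once for 128K items.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket capacities (in items) quoted in this thread, before and after. */
static const int old_sizes[] = { 1, 5, 13, 29 };
static const int new_sizes[] = { 1, 3, 5, 9, 13, 29 };

/* Largest ratio between neighboring sizes: the biggest jump one
 * self-tuning step can make. */
static double
max_step_ratio(const int *sizes, size_t n)
{
	double worst = 1.0;

	assert(sizes != NULL && n >= 2);
	for (size_t i = 1; i < n; i++) {
		double r = (double)sizes[i] / sizes[i - 1];

		if (r > worst)
			worst = r;
	}
	return (worst);
}

/* Bytes moved at once when a bucket of `items` objects is filled or
 * flushed. */
static size_t
bucket_payload_bytes(size_t items, size_t item_size)
{
	return (items * item_size);
}
```

With the old table the worst step is 5x (1 to 5 items). With the new one it is 3x (1 to 3 items). `bucket_payload_bytes(5, 128 * 1024)` gives 655360 bytes, the 5x128K chunk mentioned above.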

> * reduces some lock contention

More precisely, the patch adds a check for lock contention on free, so
that bucket sizes grow the same way as on allocation. As a consequence
that should indeed reduce lock contention, but I don't have specific
numbers. All I see is that VM and UMA mutexes no longer appear at the top
of the profile after all these changes.
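
A sketch of the growth rule described above follows. It assumes a simplified model in which the bucket size is an index into the size table from this thread. The names `struct tune_zone`, `tune_on_zone_lock()` and the `grow_on_free` switch are hypothetical. The `contended` flag stands for a failed try-lock on the zone lock. The real UMA code may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Size table from this thread; everything else is a hypothetical model. */
static const int tune_sizes[] = { 1, 3, 5, 9, 13, 29 };
#define TUNE_NSIZES ((int)(sizeof(tune_sizes) / sizeof(tune_sizes[0])))

enum tune_path { TUNE_ALLOC, TUNE_FREE };

struct tune_zone {
	int size_idx;		/* current position in tune_sizes[] */
	bool grow_on_free;	/* models the patch: also grow on free */
};

/*
 * Called once the zone lock is held.  `contended` is true when a try-lock
 * failed and the thread had to wait.  Bigger buckets mean fewer trips to
 * the zone lock, so contention pushes the size up one step.
 */
static void
tune_on_zone_lock(struct tune_zone *z, enum tune_path path, bool contended)
{
	if (!contended)
		return;
	if (path == TUNE_FREE && !z->grow_on_free)
		return;		/* pre-patch behavior: only alloc grows */
	if (z->size_idx < TUNE_NSIZES - 1)
		z->size_idx++;
}

static int
tune_bucket_items(const struct tune_zone *z)
{
	assert(z->size_idx >= 0 && z->size_idx < TUNE_NSIZES);
	return (tune_sizes[z->size_idx]);
}
```

Starting at one item, a contended allocation grows the bucket to 3 items. With `grow_on_free` set, a contended free grows it further to 5. An uncontended lock leaves it alone.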

* does soft back pressure

This list misses a small but important point of the patch: we should
prevent problems, not just solve them. As I wrote in the original email,
this specific change gave me a 1.5x performance improvement in the
low-memory condition. As I understand it, that happened because the VM no
longer has to repeatedly allocate and free hugely oversized buckets of
10-15 * 128K.
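
The soft back pressure can be modeled in the same simplified way. In this hypothetical sketch, `struct pressure_zone` and `pressure_lowmem()` are illustrative names, not the patch's. Every low-memory notification steps the bucket size down one entry in the size table, never going below a floor.

```c
#include <assert.h>

/* Size table from this thread; the zone model is hypothetical. */
static const int pressure_sizes[] = { 1, 3, 5, 9, 13, 29 };
#define PRESSURE_NSIZES ((int)(sizeof(pressure_sizes) / sizeof(pressure_sizes[0])))

struct pressure_zone {
	int size_idx;	/* current position in pressure_sizes[] */
	int min_idx;	/* floor that back pressure never goes below */
};

/*
 * Soft back pressure: each low-memory notification steps the bucket size
 * down once, so caches shrink gradually instead of waiting for the
 * last-resort per-CPU drain.
 */
static void
pressure_lowmem(struct pressure_zone *z)
{
	if (z->size_idx > z->min_idx)
		z->size_idx--;
}

static int
pressure_bucket_items(const struct pressure_zone *z)
{
	assert(z->size_idx >= 0 && z->size_idx < PRESSURE_NSIZES);
	return (pressure_sizes[z->size_idx]);
}
```

A zone that had grown to 29-item buckets drops to 13 items after one notification and bottoms out at 1 item after a few more. It no longer keeps moving many 128K items at a time.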

> * does the aggressive backpressure.

After all of the above, that is mostly just a safety belt. With 40GB of
RAM that code was triggered only a couple of times during a full hour of
testing with debug logging inserted there. On the machine with 2GB of RAM
it is triggered quite regularly, and that is probably unavoidable: even
with the lowest bucket size of one item, 24 CPUs mean 48 cache buckets
(two per CPU), i.e. up to 6MB of otherwise unreleasable memory for a
single 128K zone.
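
The 6MB figure follows from a simple bound, shown here as a hypothetical helper. The bound is the number of CPUs, times two buckets per CPU (alloc and free), times items per bucket, times item size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on memory that one zone can park in per-CPU caches: every
 * CPU holds two buckets (alloc and free) of up to `items` items each.
 */
static uint64_t
percpu_cache_bound(unsigned ncpu, unsigned items, uint64_t item_size)
{
	assert(ncpu > 0 && items > 0);
	return ((uint64_t)ncpu * 2 * items * item_size);
}
```

`percpu_cache_bound(24, 1, 128 * 1024)` is 6291456 bytes (6MB). With 13-item buckets the same bound is already 78MB.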

> So, do you get any benefits from just the first one, or first two?

I don't see much reason to handle this in pieces. As I described above,
each part has its own goal, but they work much better together.

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 09:45:29 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 6F35F2B8;
 Mon, 18 Nov 2013 09:45:29 +0000 (UTC)
Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com
 [IPv6:2a00:1450:4010:c03::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6CB2C2B7D;
 Mon, 18 Nov 2013 09:45:28 +0000 (UTC)
Received: by mail-la0-f42.google.com with SMTP id ec20so4743519lab.1
 for <multiple recipients>; Mon, 18 Nov 2013 01:45:26 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=eCr1wXnw9yFHqGiIQpqKOrsD8/+8sHIx1oscnisEvTI=;
 b=R3P2Hi5Xl8bVdMgLuPr8Xf+unH796luoQ+dzRZKlha/brDuzDT77i7Cc6BVHr9ulog
 QWVm1qZMFsMvxGEPSIytagRCJbAUUiMV8NlH5muOckXzcqBgdFnRTdX9kEUZXT6cg6V+
 jUxSP5Ep8McBkk7EEyGtvRgS/WmSlFdYEPQiafgCHV60JQl1OjY5c/Xa1yper8C+lT8S
 mMwX7WWGF3pNHEtfBznSTMdNsKn0itMGKSc4Epn/msH+aL6KTnbCWZzPjsdu6AdV9wVi
 RnbwMyVTKn5nRHEhcsF4fPxd0EaRG28acaRZ7i4t4SkC5gvRrnnGjNQT8NRTYfNTpsh0
 LVjg==
MIME-Version: 1.0
X-Received: by 10.112.219.99 with SMTP id pn3mr1025787lbc.24.1384767926523;
 Mon, 18 Nov 2013 01:45:26 -0800 (PST)
Sender: rizzo.unipi@gmail.com
Received: by 10.114.77.228 with HTTP; Mon, 18 Nov 2013 01:45:26 -0800 (PST)
In-Reply-To: <5289DBF9.80004@FreeBSD.org>
References: <52894C92.60905@FreeBSD.org>
 <CAJ-VmokYgfJ1tr-99qCXosBsyTZ698oLZ2oPpkdGODjo8+K3LQ@mail.gmail.com>
 <5289DBF9.80004@FreeBSD.org>
Date: Mon, 18 Nov 2013 10:45:26 +0100
X-Google-Sender-Auth: zAs_G8XSv3CVF7664Dac1teoEC8
Message-ID: <CA+hQ2+joZRJYmPdqi_0G3iRgAd_8rGVGayFT7FfHZ6MS_zziBQ@mail.gmail.com>
Subject: Re: UMA cache back pressure
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Alexander Motin <mav@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.16
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 09:45:29 -0000

On Mon, Nov 18, 2013 at 10:20 AM, Alexander Motin <mav@freebsd.org> wrote:

> On 18.11.2013 10:41, Adrian Chadd wrote:
>
>> Your patch does three things:
>>
>> * adds a couple new buckets;
>>
>
> These new buckets make bucket size self-tuning more soft and precise.
> Without them there are buckets for 1, 5, 13, 29, ... items. While at bigger
> sizes difference about 2x is fine, at smallest ones it is 5x and 2.6x
> respectively. New buckets make that line look like 1, 3, 5, 9, 13, 29,
> reducing jumps between steps, making algorithm work softer, allocating and
> freeing memory in better fitting chunks. Otherwise there is quite a big gap
> between allocating 128K and 5x128K of RAM at once.
>
>
Just curious (and I do not understand whether the "1, 5 ..." are object
sizes in bytes or something else): would it make sense to add some
instrumentation code (a small array of counters, I presume) to track the
actual number of requests for exact object sizes, and perhaps create
buckets at runtime to reduce waste?
Following your reasoning, there still seems to be a big gap between
some of the numbers you quote in the sequence.

cheers
luigi

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 09:59:41 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C5DF97ED;
 Mon, 18 Nov 2013 09:59:41 +0000 (UTC)
Received: from mail-ee0-x235.google.com (mail-ee0-x235.google.com
 [IPv6:2a00:1450:4013:c00::235])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 109852C41;
 Mon, 18 Nov 2013 09:59:40 +0000 (UTC)
Received: by mail-ee0-f53.google.com with SMTP id b57so2361474eek.12
 for <multiple recipients>; Mon, 18 Nov 2013 01:59:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=z8SjCiJNVsJYQIQevLms4rBH10/eVvjRNnAsaPnx910=;
 b=gbBZO8svj8R5kM0wCmtYbbSwCgyU+Fad2BcKXXRq6WR9n8VeydpMOwcl8SlpK9RZYZ
 SPLelGk9dWsI3tJaLIg8/ImVC1YnNXcJ1vF8swI2RPAj0ZaIaLedPx7P/RTTQ8MpAf26
 ePk7JtG8RGGCWpTtPg7L1FsIAN0+py9sHSa+dudQWyF/xnMwdBjoRHTtus94ZWZQV0iK
 Zqqow3pXsFaDe2LSTPZWkEFzgeAN1o7g1XbhXD3KprV9Y7x/Bk7ON2ilJPjxqUCWdqwQ
 T5QawgbeKYcLyU2pwuAlbuUMHLkipl0omokO9JxDb2frdazmVos9JuED7CX5QYJjtVzj
 bcUQ==
X-Received: by 10.14.109.1 with SMTP id r1mr11909280eeg.32.1384768779114;
 Mon, 18 Nov 2013 01:59:39 -0800 (PST)
Received: from mavbook.mavhome.dp.ua ([178.137.150.35])
 by mx.google.com with ESMTPSA id o47sm36065475eem.21.2013.11.18.01.59.36
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 18 Nov 2013 01:59:38 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <5289E506.2070207@FreeBSD.org>
Date: Mon, 18 Nov 2013 11:59:34 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: UMA cache back pressure
References: <52894C92.60905@FreeBSD.org>	<CAJ-VmokYgfJ1tr-99qCXosBsyTZ698oLZ2oPpkdGODjo8+K3LQ@mail.gmail.com>	<5289DBF9.80004@FreeBSD.org>
 <CA+hQ2+joZRJYmPdqi_0G3iRgAd_8rGVGayFT7FfHZ6MS_zziBQ@mail.gmail.com>
In-Reply-To: <CA+hQ2+joZRJYmPdqi_0G3iRgAd_8rGVGayFT7FfHZ6MS_zziBQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 09:59:41 -0000

On 18.11.2013 11:45, Luigi Rizzo wrote:
> On Mon, Nov 18, 2013 at 10:20 AM, Alexander Motin <mav@freebsd.org> wrote:
>
>     On 18.11.2013 10:41, Adrian Chadd wrote:
>
>         Your patch does three things:
>
>         * adds a couple new buckets;
>
>
>     These new buckets make bucket size self-tuning more soft and
>     precise. Without them there are buckets for 1, 5, 13, 29, ... items.
>     While at bigger sizes difference about 2x is fine, at smallest ones
>     it is 5x and 2.6x respectively. New buckets make that line look like
>     1, 3, 5, 9, 13, 29, reducing jumps between steps, making algorithm
>     work softer, allocating and freeing memory in better fitting chunks.
>     Otherwise there is quite a big gap between allocating 128K and
>     5x128K of RAM at once.
>
>
> just curious (and i do not understand whether the "1, 5 ..." are object
> sizes in bytes or what),

Buckets consist of a header (~3 pointers) plus a number of item pointers.
So on amd64, 1, 5 and 13 items mean 32, 64 and 128 bytes per bucket. It is
not really about saving memory on the buckets themselves, since they are
very small compared to the items they store. We could use a bigger bucket
zone (say, 16 items) for allocating all the smaller ones, just overriding
their item limit. But more zones also potentially mean lower zone lock
contention there, so why not?
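
The byte sizes follow from that layout. The minimal check below assumes, as stated above, a header of about three pointers, and 8-byte pointers on amd64. `bucket_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* From the mail: a bucket header is roughly three pointers in size. */
#define BUCKET_HDR_PTRS 3

/* Bytes per bucket: header plus one pointer per item slot. */
static size_t
bucket_bytes(size_t items, size_t ptr_size)
{
	assert(items > 0 && ptr_size > 0);
	return ((BUCKET_HDR_PTRS + items) * ptr_size);
}
```

With 8-byte pointers it gives 32, 64 and 128 bytes for 1, 5 and 13 items. For the 3-, 9- and 29-item buckets it gives 48, 96 and 256 bytes.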

> would it make sense to add some instrumentation
> code (a small array of counters i presume) to track the actual number
> of requests for exact object sizes, and perhaps at runtime create buckets
> trying to reduce waste ?

Since 10.0, buckets are themselves allocated from UMA cache zones, so all
stats, garbage collection, etc. work by the same rules, and you can see
them in `vmstat -z`.

> Following your reasoning there seems to be still a big gap between
> some of the numbers you quote in the sequence.

Big (2x) gaps between big numbers are less important: once we get there,
it means there is not much memory pressure, and we should not be hurt by
many extra frees. At lower numbers it may be more important.

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 10:21:29 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 39EC6E7B
 for <freebsd-hackers@freebsd.org>; Mon, 18 Nov 2013 10:21:29 +0000 (UTC)
Received: from dub0-omc2-s22.dub0.hotmail.com (dub0-omc2-s22.dub0.hotmail.com
 [157.55.1.161]) by mx1.freebsd.org (Postfix) with ESMTP id D42A82D78
 for <freebsd-hackers@freebsd.org>; Mon, 18 Nov 2013 10:21:28 +0000 (UTC)
Received: from DUB114-W124 ([157.55.1.137]) by dub0-omc2-s22.dub0.hotmail.com
 with Microsoft SMTPSVC(6.0.3790.4675); 
 Mon, 18 Nov 2013 02:20:21 -0800
X-TMN: [+Cw046ZaBtdk29Dtf8z9sjINZlg/cGxY]
X-Originating-Email: [robert.sevat@live.nl]
Message-ID: <DUB114-W1243D63F910FE2AE51DAAB087E40@phx.gbl>
From: Robert Sevat <robert.sevat@live.nl>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: FreeBSD hangs during boot when assigned a controller via vt-d
Date: Mon, 18 Nov 2013 11:20:20 +0100
Importance: Normal
MIME-Version: 1.0
X-OriginalArrivalTime: 18 Nov 2013 10:20:21.0209 (UTC)
 FILETIME=[C9578490:01CEE447]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.16
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 10:21:29 -0000

Greetings,

I have a problem forwarding an LSI 2308 via VT-d in KVM to a FreeBSD
virtual machine. FreeBSD (9.2 and 10.0 beta 3) hangs during boot.

Hardware Setup:

Supermicro X10SL7-F with LSI 2308 flashed to IT mode
8x4 GB ECC RAM
Haswell Xeon E3-1230V3

Software Setup:
Ubuntu 12.04.3 LTS 64 bit + latest KVM version.

uname -a
Linux Secretum 3.8.0-33-generic #48~precise1-Ubuntu SMP Thu Oct 24 16:28:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

dpkg --list | grep kvm
ii  kvm          1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu14.12 dummy transitional package from kvm to qemu-kvm
ii  kvm-ipxe     1.0.0+git-3.55f6c88-0ubuntu1                PXE ROM's for KVM
ii  qemu-kvm     1.0+noroms-0ubuntu14.12                     Full virtualization on i386 and amd64 hardware



Under KVM I have the following 3 virtual machines installed, and I have
tried forwarding the LSI 2308 to all three of them.

It works perfectly under Ubuntu, but both FreeBSD VMs hang during boot.

FreeBSD 9.2
FreeBSD 10.0 beta 3
FreeBSD 10.0 live cd
Ubuntu 12.04 LTS

If I run FreeBSD 10.0 beta 3 directly on the hardware, it recognizes the
RAID controller and uses the mps0 driver.

Everything works fine then.

So the problem is that, for some reason, FreeBSD hangs during boot if the
LSI 2308 is forwarded via VT-d, and I have no idea why.

It hangs and gives the following error:

http://i.imgur.com/hAMxwR7.png
http://i.imgur.com/rKALeXZ.png

While doing so, the FreeBSD virtual machine uses 300% CPU, so it maxes
out 3 cores, and it stays like that.

After googling a bit, I found that some people suggested turning off
MSI/MSI-X in loader.conf:

hw.pci.enable_msi="0"
hw.pci.enable_msix="0"

I have tried this on both FreeBSD virtual machines; it makes no
difference. It still hangs.

Could somebody point me in the right direction as to what I could still
try? Should I submit this as a bug? Should I ask this on another mailing
list?

Kind Regards

Robert Sevat

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 12:10:22 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 37258F17;
 Mon, 18 Nov 2013 12:10:22 +0000 (UTC)
Received: from mail-qa0-x236.google.com (mail-qa0-x236.google.com
 [IPv6:2607:f8b0:400d:c00::236])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id CD2962567;
 Mon, 18 Nov 2013 12:10:21 +0000 (UTC)
Received: by mail-qa0-f54.google.com with SMTP id f11so1145119qae.6
 for <multiple recipients>; Mon, 18 Nov 2013 04:10:20 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=v5eD+yxyx4RXPz2/rlhOv350IlwQtIUrqZeXMtBJX+M=;
 b=hoim1jtWE5vY0rmrIH7vvklPIGGwUy5FP0dkfGP0a9H2TrM7rzzzzc1M7fTbsoGyWv
 IEdTmxnMLXV54aDdIoqkU3I0zPFviKXznZchnNpqQYq9TOQnqSafccoZFhxAc3FtumL9
 tol6BelQOa/F6kfdqCC4nNsS720BbLG5PB7Q9ufJTfQej9K4moaS01IJvFcM/9DRFZPy
 W2Z3AEryFvSkEHYP0jpMzJalB7a3n+7XQVgkVDaJn2jmcTcNUR0lZjf9K0fmCdtFZfoD
 al1JCYap8/ucOrRJVZcOl3+vif/suFvhikdWvqPuKplnitKW0wb1UUBUoKWvy+oO6rBz
 Ddrw==
MIME-Version: 1.0
X-Received: by 10.224.98.200 with SMTP id r8mr33352927qan.26.1384776619952;
 Mon, 18 Nov 2013 04:10:19 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.224.207.66 with HTTP; Mon, 18 Nov 2013 04:10:19 -0800 (PST)
In-Reply-To: <5289DBF9.80004@FreeBSD.org>
References: <52894C92.60905@FreeBSD.org>
 <CAJ-VmokYgfJ1tr-99qCXosBsyTZ698oLZ2oPpkdGODjo8+K3LQ@mail.gmail.com>
 <5289DBF9.80004@FreeBSD.org>
Date: Mon, 18 Nov 2013 04:10:19 -0800
X-Google-Sender-Auth: FWiYWYHm8mpA44UH0srn0ozYp3g
Message-ID: <CAJ-VmomiFBQaNUweOO56rkOYtQOvUdsa1O=2WuYpeKxyTka+WA@mail.gmail.com>
Subject: Re: UMA cache back pressure
From: Adrian Chadd <adrian@freebsd.org>
To: Alexander Motin <mav@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 12:10:22 -0000

On 18 November 2013 01:20, Alexander Motin <mav@freebsd.org> wrote:
> On 18.11.2013 10:41, Adrian Chadd wrote:
>>
>> Your patch does three things:
>>
>> * adds a couple new buckets;
>
>
> These new buckets make bucket size self-tuning softer and more precise.
> Without them there are buckets for 1, 5, 13, 29, ... items. While at bigger
> sizes a difference of about 2x is fine, at the smallest ones it is 5x and
> 2.6x respectively. New buckets make that line look like 1, 3, 5, 9, 13, 29,
> reducing jumps between steps, making the algorithm work more smoothly, and
> allocating and freeing memory in better-fitting chunks. Otherwise there is
> quite a big gap between allocating 128K and 5x128K of RAM at once.

Right. That makes sense, but your initial email didn't say "oh, I'm
adding more buckets." :-)
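
As a worked check of the numbers above, the 1, 5, 13, 29 series can be reproduced with a small C model. It assumes, purely for illustration, that each bucket occupies a fixed number of pointer-sized slots with 3 of them taken by the bucket header (a plausible 64-bit layout, not something stated in this thread); under that assumption, two extra sizes of 6 and 12 slots give the 3- and 9-item buckets. All names here are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical model: a bucket occupies `slots` pointer-sized words and
 * HEADER_SLOTS of them hold the bucket header.  This assumption reproduces
 * the 1, 5, 13, 29 series for 4-, 8-, 16- and 32-slot buckets.
 */
#define HEADER_SLOTS 3

static const int old_sizes[] = {4, 8, 16, 32};        /* 1, 5, 13, 29 items */
static const int new_sizes[] = {4, 6, 8, 12, 16, 32}; /* 1, 3, 5, 9, 13, 29 */

/* Number of items a bucket with the given slot count can cache. */
static int
bucket_items(int slots)
{
	return (slots - HEADER_SLOTS);
}

/* Largest ratio between the item counts of adjacent bucket sizes. */
static double
max_step_ratio(const int *slots, int count)
{
	double worst = 1.0;

	for (int i = 1; i < count; i++) {
		double ratio = (double)bucket_items(slots[i]) /
		    bucket_items(slots[i - 1]);
		if (ratio > worst)
			worst = ratio;
	}
	return (worst);
}

/* Print the item counts of a size list and its largest step. */
static void
print_series(const char *label, const int *slots, int count)
{
	printf("%s:", label);
	for (int i = 0; i < count; i++)
		printf(" %d", bucket_items(slots[i]));
	printf("  (largest step %.1fx)\n", max_step_ratio(slots, count));
}
```

With these inputs, print_series("old", old_sizes, 4) prints "old: 1 5 13 29  (largest step 5.0x)" and print_series("new", new_sizes, 6) prints "new: 1 3 5 9 13 29  (largest step 3.0x)", so the worst jump between adjacent sizes drops from 5x to 3x.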

>
>> * reduces some lock contention
>
>
> More precisely, the patch adds a check for congestion on free to grow bucket
> sizes, the same as on allocation. As a consequence that should indeed reduce
> lock congestion, but I don't have specific numbers. All I see is that VM and
> UMA mutexes no longer appear at the top of the profile after all these changes.

Sure. But again, you don't say that in your commit message. :)
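
The growth heuristic described here (grow a zone's bucket size whenever its lock turns out to be contended) can be modeled in user space with POSIX threads. This is a simplified sketch, not the UMA code: struct zone, bucket_size and BUCKET_MAX are hypothetical, and pthread_mutex_trylock() stands in for the kernel's try-lock. Per the description above, the patch runs the same check on the free path as on the allocation path.

```c
#include <pthread.h>

#define BUCKET_MAX 128 /* hypothetical upper bound on items per bucket */

/* Hypothetical, heavily simplified zone: only what the heuristic needs. */
struct zone {
	pthread_mutex_t lock;
	int bucket_size; /* items per per-CPU bucket */
};

/*
 * Take the zone lock.  If the try-lock fails, another thread holds it, so
 * treat that as contention and grow the bucket size: bigger per-CPU
 * buckets mean the shared lock is taken less often.
 */
static void
zone_lock_sizing(struct zone *z)
{
	if (pthread_mutex_trylock(&z->lock) != 0) {
		/* Lock was busy: wait for it, then record the contention. */
		pthread_mutex_lock(&z->lock);
		if (z->bucket_size < BUCKET_MAX)
			z->bucket_size++;
	}
}

static void
zone_unlock(struct zone *z)
{
	pthread_mutex_unlock(&z->lock);
}
```

A failed try-lock costs nothing extra in the uncontended case, which makes it a cheap signal for self-tuning.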

> * does soft back pressure
>
> In this list you missed a small but major point of the patch
> -- we should prevent problems, not just solve them. As I wrote in the
> original email, this specific change showed me a 1.5x performance improvement
> in low-memory conditions. As I understand it, that happened because the VM no
> longer has to repeatedly allocate and free hugely oversized buckets of
> 10-15 * 128K.

yup, sorry I missed this. It's a sneaky two lines. :)
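
Soft back pressure, as described, means shrinking bucket sizes each time the system reports low memory, so that per-CPU caches stop requesting oversized chunks. A minimal sketch follows; the field names and the step-down-by-one policy are assumptions, since the thread does not spell out how far each event shrinks the size. The helper at the end checks the "10-15 * 128K" arithmetic: such buckets hold about 1.25 to 1.875 MB each.

```c
/* Hypothetical per-zone sizing state; names are illustrative only. */
struct zone_sizing {
	int bucket_size;     /* current items per per-CPU bucket */
	int bucket_size_min; /* floor the size never drops below */
};

/*
 * Called on each low-memory notification: step the bucket size down by
 * one, never below the floor, so later bucket allocations are smaller.
 * Contention-driven growth can raise it again when the load needs it.
 */
static void
zone_low_memory(struct zone_sizing *zs)
{
	if (zs->bucket_size > zs->bucket_size_min)
		zs->bucket_size--;
}

/* Bytes held by one full bucket of `items` objects of `item_size` bytes. */
static unsigned long
bucket_bytes(int items, unsigned long item_size)
{
	return ((unsigned long)items * item_size);
}
```

For example, bucket_bytes(10, 128 * 1024) is 1310720 bytes (1.25 MB) and bucket_bytes(15, 128 * 1024) is 1966080 bytes (1.875 MB).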

>
>> * does the aggressive backpressure.
>
>
> After all of the above, that is mostly just a safety belt. With 40GB of RAM
> that code was triggered only a couple of times during a full hour of testing
> with debug logging inserted there. On a machine with 2GB of RAM it is
> triggered quite regularly, and that is probably unavoidable, since even with
> the lowest bucket size of one item, 24 CPUs mean 48 cache buckets, i.e. up to
> 6MB of otherwise unreleasable memory for a single 128K zone.
>
>
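
The 6MB figure is straightforward arithmetic: 24 CPUs with two cache buckets each (48 buckets), one 128K item per bucket, gives 48 * 128K = 6MB. A tiny helper, with hypothetical names, makes the scaling explicit.

```c
/*
 * Worst-case memory pinned in the per-CPU caches of one zone, assuming
 * every CPU holds full buckets that cannot be released without a drain.
 */
static unsigned long long
percpu_cache_bytes(unsigned ncpus, unsigned buckets_per_cpu,
    unsigned items_per_bucket, unsigned long long item_size)
{
	return ((unsigned long long)ncpus * buckets_per_cpu *
	    items_per_bucket * item_size);
}
```

percpu_cache_bytes(24, 2, 1, 128 * 1024) is 6291456 bytes (6MB); with 13-item buckets the same zone could pin 81788928 bytes (78MB), which illustrates why a last-resort drain is still useful on small-memory machines.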
>> So, do you get any benefits from just the first one, or first two?
>
>
> I don't see much reason to handle that in pieces. As I have described above,
> each part has its own goal, but they work much better together.

Well, with changes like this, having them broken up and committed in
small pieces makes it easier for people to do regression testing.

If you introduce some regression in a particular workload then the
user or developer is only going to find that it's this patch and won't
necessarily know how to break it down into pieces to see which piece
actually introduced the regression in their specific workload.

I totally agree that this should be done! It just does seem to be
something that could be committed in smaller pieces quite easily, so as
to make potential debugging later on down the road much easier. Each
commit builds on the previous commit.

So, something like (in order):

* add two new buckets, here's why
* fix locking, here's why
* soft back pressure
* aggressive backpressure

Did you get profiling traces from the VM free paths? Is it because
it's churning the physical pages through the VM physical allocator?
or?



-adrian

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 12:57:10 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id F2791286;
 Mon, 18 Nov 2013 12:57:09 +0000 (UTC)
Received: from mail-ee0-x229.google.com (mail-ee0-x229.google.com
 [IPv6:2a00:1450:4013:c00::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3ED0B284D;
 Mon, 18 Nov 2013 12:57:09 +0000 (UTC)
Received: by mail-ee0-f41.google.com with SMTP id t10so1216942eei.14
 for <multiple recipients>; Mon, 18 Nov 2013 04:57:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=6yV6N5N0Giemz4fI7O2fKbBixJGKUz5MaPhHRdIz0Js=;
 b=Jym+V3dyXUgWsnLHNYMAQYfyDAHnsvxNBudPJV4AJLLL50idcHR9W3e3chPrlEBuKQ
 FYng6MJSNcipFMnyyu9k6hT8JxtD7aPUViAINPqk4u/4mTnmNaD72m4oFfkwbgoVjRsr
 ErZPLs4TVvcpo4XzwZJ688GzjUYyGNRF0m43VOq6ZOQFLMlHS0KH1dv6u7+qdNdhcVyp
 PndiFPs31T5ZtC/P9NlC0J0KNhC+6GZqsLsLvXscsDT/2OHNSaCobLIJEZnBkE685Vnl
 4dCB+s3KjNap0HF/Xe2vVg2RGviq8kMJEm6pCiamcWedgEZzCNcZGzdTGRxPF58wNAz+
 Jmww==
X-Received: by 10.14.113.137 with SMTP id a9mr12600546eeh.3.1384779427676;
 Mon, 18 Nov 2013 04:57:07 -0800 (PST)
Received: from mavbook.mavhome.dp.ua ([178.137.150.35])
 by mx.google.com with ESMTPSA id 44sm37646908eek.5.2013.11.18.04.57.05
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 18 Nov 2013 04:57:06 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <528A0EA0.3040408@FreeBSD.org>
Date: Mon, 18 Nov 2013 14:57:04 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: Adrian Chadd <adrian@freebsd.org>
Subject: Re: UMA cache back pressure
References: <52894C92.60905@FreeBSD.org>	<CAJ-VmokYgfJ1tr-99qCXosBsyTZ698oLZ2oPpkdGODjo8+K3LQ@mail.gmail.com>	<5289DBF9.80004@FreeBSD.org>
 <CAJ-VmomiFBQaNUweOO56rkOYtQOvUdsa1O=2WuYpeKxyTka+WA@mail.gmail.com>
In-Reply-To: <CAJ-VmomiFBQaNUweOO56rkOYtQOvUdsa1O=2WuYpeKxyTka+WA@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 12:57:10 -0000

On 18.11.2013 14:10, Adrian Chadd wrote:
> On 18 November 2013 01:20, Alexander Motin <mav@freebsd.org> wrote:
>> On 18.11.2013 10:41, Adrian Chadd wrote:
>>> So, do you get any benefits from just the first one, or first two?
>>
>> I don't see much reason to handle that in pieces. As I have described above,
>> each part has its own goal, but they work much better together.
>
> Well, with changes like this, having them broken up and committed in
> small pieces makes it easier for people to do regression testing.
>
> If you introduce some regression in a particular workload then the
> user or developer is only going to find that it's this patch and won't
> necessarily know how to break it down into pieces to see which piece
> actually introduced the regression in their specific workload.

I can't argue with that, but too many small pieces turn later merging into
a headache. This patch is not so big that it can't be reviewed in one
piece. As for a better commit message -- hint accepted. :)

> I totally agree that this should be done! It just does seem to be
> something that could be committed in smaller pieces quite easily, so as
> to make potential debugging later on down the road much easier. Each
> commit builds on the previous commit.
>
> So, something like (in order):
>
> * add two new buckets, here's why
> * fix locking, here's why
> * soft back pressure
> * aggressive backpressure

I can do that if you insist; I would just take a different order
(3,1,4,2). 2 without 3 will make buckets grow faster, which may be bad
without back pressure.

> Did you get profiling traces from the VM free paths? Is it because
> it's churning the physical pages through the VM physical allocator?
> or?

Yes. Without use_uma enabled I've seen up to 50% of CPU time burned on
locks held around expensive VM magic such as TLB shootdowns, etc. With
use_uma enabled the situation improved a lot, but I've seen periodic
bursts, which I guess happened when the system was getting low on memory
and started aggressively purging gigabytes of oversized caches. With this
patch I haven't noticed such behavior at all so far, though that may be
subjective since the test runs for quite some time and the load is not
very stationary.

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 19:01:45 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id DCC65C0;
 Mon, 18 Nov 2013 19:01:44 +0000 (UTC)
Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90])
 by mx1.freebsd.org (Postfix) with ESMTP id A27182F81;
 Mon, 18 Nov 2013 19:01:44 +0000 (UTC)
X-Ambrisko-Me: Yes
Received: from server2.ambrisko.com (HELO internal.ambrisko.com)
 ([192.168.1.2])
 by ironport.ambrisko.com with ESMTP; 18 Nov 2013 11:05:35 -0800
Received: from ambrisko.com (localhost [127.0.0.1])
 by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id rAIJ1hRO037251;
 Mon, 18 Nov 2013 11:01:43 -0800 (PST)
 (envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
 by ambrisko.com (8.14.4/8.14.4/Submit) id rAIJ1gOT037249;
 Mon, 18 Nov 2013 11:01:42 -0800 (PST) (envelope-from ambrisko)
Date: Mon, 18 Nov 2013 11:01:42 -0800
From: Doug Ambrisko <ambrisko@ambrisko.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131118190142.GA28210@ambrisko.com>
References: <51B3B59B.8050903@erdgeist.org>
 <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131116183129.GD59496@kib.kiev.ua>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-hackers@freebsd.org, Dirk Engling <erdgeist@erdgeist.org>,
 Jase Thew <jase@freebsd.org>, mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 19:01:45 -0000

On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote:
| On Thu, Nov 14, 2013 at 05:08:54PM -0800, Doug Ambrisko wrote:
| > On Thu, Nov 14, 2013 at 09:32:17PM +0000, Jase Thew wrote:
| > | On 10/06/2013 16:52, John Baldwin wrote:
| > | > On Saturday, June 08, 2013 9:36:27 pm mdf@freebsd.org wrote:
| > | >> On Sat, Jun 8, 2013 at 3:52 PM, Dirk Engling <erdgeist@erdgeist.org> wrote:
| > | >>
| > | >>> The arbitrary value
| > | >>>
| > | >>> #define MNAMELEN        88              /* size of on/from name bufs */
| > | >>>
| > | >>> struct statfs {
| > | >>> [...]
| > | >>>         char    f_mntfromname[MNAMELEN];/* mounted filesystem */
| > | >>>         char    f_mntonname[MNAMELEN];  /* directory on which mounted */
| > | >>> };
| > | >>>
| > | >>> currently bites us when trying to use poudriere with errors like
| > | >>>
| > | >>> 'mount: tmpfs: File name too long'
| > | >>>
| > | >>>
| > | >>> /poudriere/data/build/91_RELEASE_amd64-REALLY-REALLY-LONG-JAILNAME/ref/wrkdirs
| > | >>>
| > | >>> The topic has been discussed several times since 2004 and has been
| > | >>> postponed each time, the last time when it hit zfs users:
| > | >>>
| > | >>> http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007974.html
| > | >>>
| > | >>> So I'd like to point to the calendar, it's 2013 already and there's
| > | >>> still a static arbitrary (and way too low) limit in one of the core
| > | >>> areas of the vfs code.
| > | >>>
| > | >>> So I'd like to bump the issue and propose either making f_mntfromname a
| > | >>> dynamic allocation or just increase MNAMELEN, using 10.0 as water shed.
| > | >>>
| > | >>
| > | >> Gleb Kurtsou did this along with the ino64 GSoC project.  Unfortunately,
| > | >>  both he and I hit ENOTIME due to the job that pays the bills and it's
| > | >> never made it back to the main repository.
| > | >>
| > | >> IIRC, though, the only reason for doing it with 64-bit ino_t is that he'd
| > | >> already finished changing the stat/dirent ABI so what was one more.  I
| > | >> think he went with 1024 bytes, which also necessitated not allocating
| > | >> statfs on the stack for the kernel.
| > | > 
| > | > He also fixed a few other things since changing this ABI is so invasive
| > | > IIRC.  This really is the right fix for this.  Is it in an svn branch 
| > | > that can be updated and a new patch generated?
| > | > 
| > | 
| > | Hi folks,
| > | 
| > | Has there been any progress on addressing this issue? With the advent of
| > | pkgng / poudriere, this limitation is really becoming a frustrating problem.
| > 
| > I looked at NetBSD and what they did with statvfs.  The mount path
| > lengths in the NetBSD defines are bigger, so that helps.  However, when
| > testing it with a script that keeps doing a nullfs mount at
| > ever-increasing directory depth, I found that NetBSD would allow the
| > mount path to exceed the size of the field in statvfs.  When NetBSD
| > populates the path in statvfs, it truncates it to what fits.  So I looked
| > at what that might be like in FreeBSD and came up with this simple patch:
| > 
| > --- /sys/kern/vfs_mount.c	2013-10-01 14:27:35.000000000 -0700
| > +++ vfs_mount.c	2013-10-21 14:20:19.000000000 -0700
| > @@ -656,7 +656,7 @@ vfs_donmount(struct thread *td, uint64_t
| >  	 * variables will fit in our mp buffers, including the
| >  	 * terminating NUL.
| >  	 */
| > -	if (fstypelen >= MFSNAMELEN - 1 || fspathlen >= MNAMELEN - 1) {
| > +	if (fstypelen >= MFSNAMELEN - 1 || fspathlen >= MAXPATHLEN - 1) {
| >  		error = ENAMETOOLONG;
| >  		goto bail;
| >  	}
| > @@ -748,8 +748,8 @@ sys_mount(td, uap)
| >  		return (EOPNOTSUPP);
| >  	}
| >  
| > -	ma = mount_argsu(ma, "fstype", uap->type, MNAMELEN);
| > -	ma = mount_argsu(ma, "fspath", uap->path, MNAMELEN);
| > +	ma = mount_argsu(ma, "fstype", uap->type, MFSNAMELEN);
| > +	ma = mount_argsu(ma, "fspath", uap->path, MAXPATHLEN);
| >  	ma = mount_argb(ma, flags & MNT_RDONLY, "noro");
| >  	ma = mount_argb(ma, !(flags & MNT_NOSUID), "nosuid");
| >  	ma = mount_argb(ma, !(flags & MNT_NOEXEC), "noexec");
| > @@ -1039,7 +1039,7 @@ vfs_domount(
| >  	 * variables will fit in our mp buffers, including the
| >  	 * terminating NUL.
| >  	 */
| > -	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)
| > +	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MAXPATHLEN)
| >  		return (ENAMETOOLONG);
| >  
| >  	if (jailed(td->td_ucred) || usermount == 0) {
| > @@ -1095,9 +1095,9 @@ vfs_domount(
| >  	NDFREE(&nd, NDF_ONLY_PNBUF);
| >  	vp = nd.ni_vp;
| >  	if ((fsflags & MNT_UPDATE) == 0) {
| > -		pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK);
| > +		pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
| >  		strcpy(pathbuf, fspath);
| > -		error = vn_path_to_global_path(td, vp, pathbuf, MNAMELEN);
| > +		error = vn_path_to_global_path(td, vp, pathbuf, MAXPATHLEN);
| >  		/* debug.disablefullpath == 1 results in ENODEV */
| >  		if (error == 0 || error == ENODEV) {
| >  			error = vfs_domount_first(td, vfsp, pathbuf, vp,
| > @@ -1147,8 +1147,8 @@ sys_unmount(td, uap)
| >  			return (error);
| >  	}
| >  
| > -	pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK);
| > -	error = copyinstr(uap->path, pathbuf, MNAMELEN, NULL);
| > +	pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
| > +	error = copyinstr(uap->path, pathbuf, MAXPATHLEN, NULL);
| >  	if (error) {
| >  		free(pathbuf, M_TEMP);
| >  		return (error);
| > @@ -1181,7 +1181,7 @@ sys_unmount(td, uap)
| >  			vfslocked = NDHASGIANT(&nd);
| >  			NDFREE(&nd, NDF_ONLY_PNBUF);
| >  			error = vn_path_to_global_path(td, nd.ni_vp, pathbuf,
| > -			    MNAMELEN);
| > +			    MAXPATHLEN);
| >  			if (error == 0 || error == ENODEV)
| >  				vput(nd.ni_vp);
| >  			VFS_UNLOCK_GIANT(vfslocked);
| > 
| > I seem to have found a typo bug while making this change: one instance
| > used MNAMELEN instead of MFSNAMELEN for the fstype.
| > 
| > With this patch things in general seem to work.  You can do a
| > mount and umount of a long path.  The umount of the long path works:
| > the exact path match fails, but it then succeeds via the FSID.
| > df/mount output looks a little strange since it shows a truncated path,
| > but the contents are valid (FS type, space, etc.).  umount via the
| > truncated path works if only one truncated path matches.  If there
| > are multiple, it fails.
| > 
| > This doesn't change any kernel ABIs, so it is safe to apply to the
| > kernel without rebuilding userland.
| > 
| > Future work could be to implement statvfs to return a longer path, but
| > only use it for df/umount etc.  The rest of the system could continue
| > with the existing statfs.  mount works because it passes a string into
| > the kernel, so that string can be long.
| > 
| > I'd propose this as a current solution to this problem.  It appears to
| > work well and shouldn't drastically break things.  Doing df via the
| > full path, stat, etc. works since the path lookup accesses the vnode.
| > So things that do a mount, df of the mount point, etc. should continue
| > to work.  Scripts that try to figure out the mount points via df and mount
| > listing all mount points would fail.  That is probably good enough for
| > now.
| > 
| > Comments welcomed.
| 
| Generally, I agree with the approach, but what is done seems to be too
| simple to be usable.

I like the simplicity and I'd like to see examples of not being usable.
 
| One obvious and important thing which is broken with the patch is
| unmounting from jails. In other words, it is now possible to mount
| something from a jail with appropriate privileges set up, and after that
| the corresponding mount cannot be unmounted, since vfs_mount_alloc() copies
| the trimmed path into f_mntonname, and sys_unmount() matches it against the
| full path in pathbuf.  Hmm, this should be broken in the same way for
| non-jailed mounts with paths which do not fit into f_mntonname.

They can be unmounted since it will fall back to the fsid, as in the non-jail
case.  I just tried it (sorry for the bad line wrap):

+ mount 192.168.38.1:/data/home/ambrisko/netboot /data/jail/test
+ jail -i -c name=test path=/data/jail/test host.hostname=test.ambrisko.com persist enforce_statfs=0 allow.mount=1 allow.mount.devfs=1 allow.mount.nullfs=1 allow.mount.tmpfs=1 allow.mount.procfs=1
14
+ jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
+ jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
+ jexec test df
+ egrep '^devfs|^procfs'
devfs                                             2         2          0   100%    /dev
procfs                                            8         8          0   100%    /proc
+ jexec test mount -t procfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
+ jexec test mount -t devfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
+ jexec test df
+ egrep '^devfs|^procfs'
devfs                                             2         2          0   100%    /dev
procfs                                            8         8          0   100%    /proc
procfs                                            8         8          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
+ jexec test mount -v
+ egrep '^devfs|^procfs'
devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000)
procfs on /proc (procfs, local, fsid 02ff000202000000)
procfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (procfs, local, fsid 26ff000202000000)
devfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (devfs, local, multilabel, fsid 27ff007171000000)
+ jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
+ jexec test df
+ egrep '^devfs|^procfs'
devfs                                             2         2          0   100%    /dev
procfs                                            8         8          0   100%    /proc
devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
+ jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
+ jexec test df
+ egrep '^devfs|^procfs'
devfs                                             2         2          0   100%    /dev
procfs                                            8         8          0   100%    /proc
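
The truncated names in the df and mount output above are exactly 87 characters long: MNAMELEN (88, as quoted earlier in the thread) minus the terminating NUL. The user-space sketch below reproduces that with an 88-byte buffer. snprintf() stands in here for whatever bounded copy the kernel performs into f_mntonname, and copy_mntonname() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define MNAMELEN 88 /* value quoted earlier in this thread */

/*
 * Copy a mount-point path into a statfs-style fixed-size field.
 * snprintf() always NUL-terminates, so at most MNAMELEN - 1 characters
 * survive; its return value is the untruncated length, which lets the
 * caller detect truncation.  Returns 1 if the path was truncated.
 */
static int
copy_mntonname(char dst[MNAMELEN], const char *path)
{
	int full = snprintf(dst, MNAMELEN, "%s", path);

	return (full >= MNAMELEN);
}
```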

| I think that struct mount should have a const char * field where the
| non-trimmed path is stored and used for matching at unmount. f_mntonname
| truncation would then be only an unfortunate user interface glitch.

Note that we are not storing the path in the mount structure, so no
structures have changed, which is nice since then we haven't introduced
any real ABI breakage.  So we could MFC this.  The match isn't critical
since umount will fall back to the fsid and work.  One thing that might
be good to do is to change umount to try unmounting via the fsid first and
then do the path match if that fails, rather than the other way round as
it does now.
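
The umount behavior described in this thread (a truncated name works only when exactly one mount matches it, and a unique match can then be unmounted by fsid) can be modeled as a lookup over a table of truncated names. The struct, function and return codes below are hypothetical illustrations, not umount(8)'s actual code.

```c
#include <string.h>

#define MNAMELEN 88

/* Hypothetical mount-table entry: fsid plus the (possibly truncated) name. */
struct mount_entry {
	unsigned long long fsid;
	char mntonname[MNAMELEN];
};

enum { LOOKUP_NOTFOUND = -1, LOOKUP_AMBIGUOUS = -2 };

/*
 * Resolve a user-supplied mount point to a table index by comparing it,
 * truncated the same way as the stored names, with each entry.  A unique
 * match identifies the fsid to unmount; several matches mean the
 * truncated name cannot identify the mount.
 */
static int
lookup_mount(const struct mount_entry *tab, int n, const char *path)
{
	size_t len = strlen(path);
	int found = LOOKUP_NOTFOUND;

	if (len > MNAMELEN - 1)
		len = MNAMELEN - 1;
	for (int i = 0; i < n; i++) {
		if (strlen(tab[i].mntonname) == len &&
		    strncmp(tab[i].mntonname, path, len) == 0) {
			if (found != LOOKUP_NOTFOUND)
				return (LOOKUP_AMBIGUOUS);
			found = i;
		}
	}
	return (found);
}
```

With the procfs and devfs mounts from the transcript above both truncated to the same 87-character name, looking up either long path is ambiguous until one of them is gone, matching the behavior described earlier.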

The problem I see is if someone tries to do things based on the parsed
output of mount/df then that will fail since the output is truncated.

Thanks for looking at this,

Doug A.

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 19:55:19 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 99D13167;
 Mon, 18 Nov 2013 19:55:19 +0000 (UTC)
Received: from mail-ea0-x233.google.com (mail-ea0-x233.google.com
 [IPv6:2a00:1450:4013:c01::233])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id F24A82364;
 Mon, 18 Nov 2013 19:55:18 +0000 (UTC)
Received: by mail-ea0-f179.google.com with SMTP id r15so2667889ead.10
 for <multiple recipients>; Mon, 18 Nov 2013 11:55:17 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=k6Xz+6WHeny21iheK79VQT8uRr9JgaXrrJ13RPuulnA=;
 b=rS/J3wBsZDiqJsRAE8+wIIsD1dwHp+Fw+3fQG49zL+U+voHnP4vuQSk6aDkrxbC5Br
 IU7jK6+XoliJeJQg83hP1EmyngaoIx/7bVWQhDiCwN/uuKIaDKww9+uMggocALWK/zyo
 5TDAJx3Ov8puayeTNvySX2rdQuZLhkGy3sY28v+UW/3U+fxLY/ZFVzMgBPNJk2CUiGE4
 AI0hFBArQ32lX/l/FS0n3Z+FrRVoyTcdCih813FFhJ4lfyq3sF03IfUWcrOEFNwH8LjL
 qYOKLnQLCpWdcpM0BX146ZKpXZIDax8bucmT766qkkkha8EQ920wHVTccXb5mzA4lnhd
 k0kQ==
X-Received: by 10.14.0.72 with SMTP id 48mr5414158eea.50.1384804517422;
 Mon, 18 Nov 2013 11:55:17 -0800 (PST)
Received: from mavbook.mavhome.dp.ua ([178.137.150.35])
 by mx.google.com with ESMTPSA id w6sm41027683eeo.12.2013.11.18.11.55.15
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 18 Nov 2013 11:55:16 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <528A70A2.4010308@FreeBSD.org>
Date: Mon, 18 Nov 2013 21:55:14 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: Jeff Roberson <jroberson@jroberson.net>
Subject: Re: UMA cache back pressure
References: <52894C92.60905@FreeBSD.org>
 <alpine.BSF.2.00.1311180857540.2109@desktop>
In-Reply-To: <alpine.BSF.2.00.1311180857540.2109@desktop>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 19:55:19 -0000

On 18.11.2013 21:11, Jeff Roberson wrote:
> On Mon, 18 Nov 2013, Alexander Motin wrote:
>> I've created a patch, based on earlier work of avg@, to add back
>> pressure to UMA allocation caches. The problem of physical memory or
>> KVA exhaustion has existed there for many years, and it is quite critical
>> now for improving system performance while keeping stability. Changes
>> made to memory allocation in recent years improved the situation, but
>> haven't fixed it completely. My patch attacks the remaining problems from
>> two sides: a) reducing bucket sizes every time the system detects a
>> low-memory condition; and b) as a last-resort mechanism for very low
>> memory conditions, cycling over all CPUs to purge their per-CPU UMA caches.
>> The benefit of this approach is the absence of any additional hard-coded
>> limits on cache sizes -- they are self-tuned, based on load and memory
>> pressure.
>>
>> With this change I believe it should be safe enough to enable UMA
>> allocation caches in ZFS via the vfs.zfs.zio.use_uma tunable (at least for
>> amd64). I did many tests on a machine with 24 logical cores (and, as a
>> result, strong allocation cache effects), and can say that with 40GB of
>> RAM, using UMA caches (allowed by this change) doubles the results of the
>> SPEC NFS benchmark on a ZFS pool of several SSDs. To test system
>> stability I've run the same test with physical memory limited to just
>> 2GB, and the system successfully survived that, and even showed results
>> 1.5 times better than with just the last-resort measures of b). In both
>> cases tools/umastat no longer shows unbounded UMA cache growth, which
>> makes me believe in the viability of this approach for longer runs.
>>
>> I would like to hear some comments about that:
>> http://people.freebsd.org/~mav/uma_pressure.patch
>
> Hey Mav,
>
> This is a great start and great results.  I think it could probably even
> go in as-is, but I have a few suggestions.

Hey! Thanks for your review. I appreciate it.

> First, let's test this with something that is really super allocator
> heavy and doesn't benefit much from bucket sizing.  For example, a
> network forwarding test.  Or maybe you could get someone like Netflix
> that is using it to push a lot of bits with less filesystem cost than
> zfs and spec.

I am not sure what simple forwarding may show in this case. Even on my
workload, with ZFS creating strong memory pressure, I still have the mbuf*
zones' buckets almost (some totally) maxed out. Without other major (or
even any) pressure in the system they just can't become bigger than the
maximum. But if you can propose some interesting test case with pressure
that I can reproduce -- I am all ears.

> Second, the cpu binding is a very costly and very high-latency
> operation. It would make sense to do CPU_FOREACH and then ZONE_FOREACH.
> You're also biasing the first zones in the list.  The low memory
> condition will more often clear after you check these first zones.  So
> you might just check it once and equally penalize all zones.  I'm
> concerned that doing CPU_FOREACH in every zone will slow the pagedaemon
> more.

I completely agree with everything you said here. I took this part of the
code as-is from earlier work. It definitely can be improved; I'll take a
look at that. But as I mentioned in one of my earlier responses, that code
is used in _very_ rare cases, unless the system is heavily overloaded on
memory, like running ZFS on a box with 24 cores and 2GB of RAM. During
reasonable operation the soft back pressure is enough to keep caches in
shape, and that code is never called.
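
Jeff's suggested loop order is easiest to see by counting CPU bindings, the expensive step. In the sketch below, bind_to_cpu() and drain_cache() are hypothetical stand-ins (for something like sched_bind() and a per-CPU cache drain) that only count calls. Zone-major order rebinds for every zone (zones * CPUs binds); CPU-major order binds once per CPU and drains all zones while there, which also treats every zone equally.

```c
/* Call counters standing in for the real work, for illustration only. */
static unsigned long binds, drains;

static void bind_to_cpu(int cpu) { (void)cpu; binds++; } /* the costly step */
static void drain_cache(int zone, int cpu) { (void)zone; (void)cpu; drains++; }

/* Zone-major order: rebinds to every CPU for every zone. */
static void
reclaim_zone_major(int nzones, int ncpus)
{
	for (int z = 0; z < nzones; z++)
		for (int c = 0; c < ncpus; c++) {
			bind_to_cpu(c);
			drain_cache(z, c);
		}
}

/* CPU-major order: bind once per CPU, then drain every zone there. */
static void
reclaim_cpu_major(int nzones, int ncpus)
{
	for (int c = 0; c < ncpus; c++) {
		bind_to_cpu(c);
		for (int z = 0; z < nzones; z++)
			drain_cache(z, c);
	}
}
```

With 100 zones and 24 CPUs, the zone-major loop performs 2400 binds and the CPU-major loop 24, with the same 2400 drains in both.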

> We also have been working towards per-domain pagedaemons so
> perhaps we should have a uma-reclaim taskqueue that we wake up to do the
> work?

VM is not my area so far, so please propose "the right way". I took on
this task now only because I have to, due to the huge performance
bottleneck this problem causes and the years it has remained unsolved.

> Third, using vm_page_count_min() will only trigger when the pageout
> daemon can't keep up with the free target.  Typically this should only
> happen with a lot of dirty mmap'd pages or incredibly high system load
> coupled with frequent allocations.  So there may be many cases where
> reclaiming the extra UMA memory is helpful but the pagedaemon can still
> keep up while pushing out file pages that we'd prefer to keep.

As I said, that is indeed a last resort. It does not need to run often. 
Per-CPU caches simply should not grow, without real need, to the point 
where they have to be cleaned.

> I think the perfect heuristic would have some idea of how likely the UMA
> pages are to be reused immediately, so we can more effectively trade off
> between file pages and kernel memory cache.  As it is now, we limit the
> uma_reclaim() calls to every 10 seconds when there is memory pressure.
> Perhaps we could keep a timestamp for when the last slab was allocated
> to a zone, and do the more expensive reclaim on zones whose timestamps
> are older than some threshold?  Then have a lower threshold for
> reclaiming at all?  Again, it doesn't need to be perfect, but I believe
> we can catch a wider set of cases by carefully scheduling this.

I was thinking about that too, but I think timestamps should be set on 
buckets, not on slabs. The fact that a zone is not allocating new slabs 
does not mean it is not using its already allocated buckets. If we record 
the time of the last refill in each bucket, we should be able to purge 
all buckets that have been unused for a specified period of time. 
Additionally, we could put a timestamp on the zone and update it every 
time the zone runs out of its cache. If a zone has not run out of cache 
for some time, it probably has unused buckets. So when we need some RAM, 
we should look first at zones with stale timestamps.
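The bucket-timestamp idea above can be sketched as a small model. It is only an illustration under stated assumptions: `struct model_bucket`, `struct model_zone` and all three functions are hypothetical stand-ins, not UMA's real structures or API, and "freeing" a bucket is reduced to clearing a flag. It shows the two parts of the proposal: purging buckets whose last refill is older than a limit, and picking first the zone whose cache has gone longest without running dry.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for UMA structures; these are not
 * the real uma_int.h definitions.  Times are abstract ticks.
 */
struct model_bucket {
	uint64_t refill_time;	/* when this bucket was last refilled */
	int	 holds_items;	/* nonzero while the bucket caches items */
};

struct model_zone {
	uint64_t last_empty_time;	/* last time the zone's cache ran dry */
	struct model_bucket *buckets;
	size_t	 nbuckets;
};

/* Called when an allocation finds the zone's cache empty. */
static void
zone_note_cache_empty(struct model_zone *z, uint64_t now)
{
	z->last_empty_time = now;
}

/*
 * Release every bucket that has not been refilled for more than
 * max_idle ticks.  Assumes now >= refill_time.  Returns the count freed.
 */
static size_t
zone_purge_idle_buckets(struct model_zone *z, uint64_t now, uint64_t max_idle)
{
	size_t freed = 0;

	for (size_t i = 0; i < z->nbuckets; i++) {
		struct model_bucket *b = &z->buckets[i];

		if (b->holds_items && now - b->refill_time > max_idle) {
			b->holds_items = 0;	/* stand-in for freeing the items */
			freed++;
		}
	}
	return (freed);
}

/*
 * Under memory pressure, look first at the zone whose cache has gone
 * longest without running dry: it most likely holds unused buckets.
 */
static struct model_zone *
zone_pick_stalest(struct model_zone *zones, size_t nzones)
{
	struct model_zone *best = NULL;

	for (size_t i = 0; i < nzones; i++)
		if (best == NULL ||
		    zones[i].last_empty_time < best->last_empty_time)
			best = &zones[i];
	return (best);
}
```

A zone that keeps running out of cache refreshes its timestamp and drops to the back of the queue, so busy zones are left alone while idle ones are trimmed first.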

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 20:12:19 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9997E9E1;
 Mon, 18 Nov 2013 20:12:19 +0000 (UTC)
Received: from mail-qc0-x234.google.com (mail-qc0-x234.google.com
 [IPv6:2607:f8b0:400d:c01::234])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3A4D5246A;
 Mon, 18 Nov 2013 20:12:19 +0000 (UTC)
Received: by mail-qc0-f180.google.com with SMTP id e16so2209772qcx.25
 for <multiple recipients>; Mon, 18 Nov 2013 12:12:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=HqgiATGrJ/ODsDXBEhSOw2BEkJggcTDFe3Y+u3r8XaI=;
 b=h9blKoVL7Rtp3CsyIz17NF/++7F9nW8bKhDu1mlvf3SGX3ImW1PpVg/QQ2F3H24TSw
 ndOpDd7ddtDE4ep50g7o5Xu6C5FH1DqJvrkjq2Q9D1lR9AYX7MVWOiqwOmLbaSp8NHRw
 OVDJEOYnXmvMEqKlFwGsTpKsgHLFf+dqE6WfPrwoTTOL+/bvc5B/HgICi9Vaj1TWYzzA
 oRgLUABk2ApiQP8AwnftpPMlLvyYr0AJf9iBcqtOrpebHNLqYm3wUSRR+HReFOpPr4Iq
 5FMTB28tAc+egTVHaQ3lWNBFjGyCWV1dHxYrbDDH3EuQmjqepw32We4cZMN2s3bKfOPW
 P9Jw==
MIME-Version: 1.0
X-Received: by 10.49.71.207 with SMTP id x15mr37164431qeu.49.1384805538000;
 Mon, 18 Nov 2013 12:12:18 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.224.207.66 with HTTP; Mon, 18 Nov 2013 12:12:17 -0800 (PST)
In-Reply-To: <528A70A2.4010308@FreeBSD.org>
References: <52894C92.60905@FreeBSD.org>
 <alpine.BSF.2.00.1311180857540.2109@desktop>
 <528A70A2.4010308@FreeBSD.org>
Date: Mon, 18 Nov 2013 12:12:17 -0800
X-Google-Sender-Auth: -bRyyF1NPA-IgriJgTBwA18SvkM
Message-ID: <CAJ-VmonWKP05OLzx3ExXd8ufxi_9EMso6_G_UZiMOWNWq22nzQ@mail.gmail.com>
Subject: Re: UMA cache back pressure
From: Adrian Chadd <adrian@freebsd.org>
To: Alexander Motin <mav@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>,
 Jeff Roberson <jroberson@jroberson.net>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 20:12:19 -0000

Remember that for Netflix, we have a mostly non-cacheable workload
(with some very specific exceptions!) and thus we churn through VM
pages at a prodigious rate: 20 Gbit/s, or ~2.5 gigabytes a second,
or ~610,000 4-kilobyte pages a second. It's quite frightening, and
it's only likely to increase.
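The page-churn figure can be checked with a line of arithmetic. This sketch assumes the link rate is in decimal units (20 x 10^9 bits/s) and that every byte sent passes through a fresh, non-reused 4 KiB page, which is the "mostly non-cacheable" case; `pages_per_second()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Pages per second that must be recycled when a stream of the given
 * bit rate passes through fresh (non-reused) pages of page_size bytes.
 * Uses decimal units for the link rate, as network rates usually are.
 */
static uint64_t
pages_per_second(uint64_t bits_per_second, uint64_t page_size)
{
	uint64_t bytes_per_second = bits_per_second / 8;

	return (bytes_per_second / page_size);
}
```

For 20 Gbit/s this gives 2,500,000,000 bytes/s, and dividing by 4096 gives about 610,000 pages per second.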

There's a lot of pressure from all over the place so IIRC pools tend
to not stay very large for very long.

That's why I'm interested in your specific situations. Doing an all-CPU
TLB shootdown with 24 cores is costly. But after we killed some
incorrect KVA mapping flags for sendfile, we (Netflix) completely stopped
seeing TLB shootdowns and IPIs in any of the performance traces.
Now, doing 24 cores' worth of ZFS when you let the pools grow to the
size you do is understandable, but I'd just like to make sure that you
aren't breaking performance for people running different workloads on
fewer cores.

I'm a bit busy at work with other things so I can't spin up your patch
on a cache for another week or two. But I'll certainly get around to
it as I'd like to see this stuff catch on.

What I _can_ do in a reasonably immediate timeframe is update
vm0.freebsd.org to the latest -HEAD and stress-test your patch.
I'm using vm0.freebsd.org to stress-test -HEAD with ZFS doing
concurrent poudriere builds, so it gets very crowded on that box. The
box currently survives a couple of days before I hit some races to do
with vnode exhaustion (and a lack of handling there) and ZFS deadlocks.
I'll just run this to see if anything unexpected happens that
causes it to blow up in a different way.

Thanks,

-adrian



From owner-freebsd-hackers@FreeBSD.ORG  Mon Nov 18 19:23:08 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8DA41C41
 for <freebsd-hackers@freebsd.org>; Mon, 18 Nov 2013 19:23:08 +0000 (UTC)
Received: from mail-pb0-f45.google.com (mail-pb0-f45.google.com
 [209.85.160.45])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 663E9210B
 for <freebsd-hackers@freebsd.org>; Mon, 18 Nov 2013 19:23:08 +0000 (UTC)
Received: by mail-pb0-f45.google.com with SMTP id rp16so736609pbb.18
 for <freebsd-hackers@freebsd.org>; Mon, 18 Nov 2013 11:23:02 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id
 :references:user-agent:mime-version:content-type;
 bh=Bu5VE7OEDP7NC5GovIKIn+lYk8jOjHUwIFfitNaW8Iw=;
 b=L+/gZiGy3Sfi2nw0CkclUwn7gcUKx0ahXqwIw+fOGO7kzItJ1qhYHq+1ihDU4ofUXj
 yCX3VFHzE7K+3GupH5XWS3u1YXnta1TsIz/SNM9jYesaKuZgEb6rqlU44Cbfl3X4IaPX
 sF11ECzig7lAJa/d/H+n23oPmj+4in3WPvHyRS6TiBLccwAGgFpE0HGDAn8mHL43ieEf
 JcKbe4HMrHT1HIjzzKEl6GC+05HdTbCRIqODi/5Y/3srcRh0QKmFQ6TAvZGis5vFGb6V
 2M9jKpnLeCV02mC2jeKVU1MbqAtiY6PKVI/mHCxyKY+iMJTgPXhWzE+W3N/ZEkjwUQEA
 c1OA==
X-Gm-Message-State: ALoCoQmARYzGT0Yu/o7OyAe+14YJf2i2JVFz3SbkeAc5BsC/fXJeITRxNbhKEjLw+1BAC+AIKOuq
X-Received: by 10.68.180.162 with SMTP id dp2mr22499630pbc.5.1384802106553;
 Mon, 18 Nov 2013 11:15:06 -0800 (PST)
Received: from rrcs-66-91-135-210.west.biz.rr.com
 (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210])
 by mx.google.com with ESMTPSA id gg10sm25139867pbc.46.2013.11.18.11.15.04
 for <multiple recipients>
 (version=TLSv1 cipher=RC4-SHA bits=128/128);
 Mon, 18 Nov 2013 11:15:05 -0800 (PST)
Date: Mon, 18 Nov 2013 09:11:10 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
X-X-Sender: jroberson@desktop
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: UMA cache back pressure
In-Reply-To: <52894C92.60905@FreeBSD.org>
Message-ID: <alpine.BSF.2.00.1311180857540.2109@desktop>
References: <52894C92.60905@FreeBSD.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Mon, 18 Nov 2013 20:16:59 +0000
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Nov 2013 19:23:08 -0000

On Mon, 18 Nov 2013, Alexander Motin wrote:

> Hi.
>
> I've created a patch, based on earlier work by avg@, to add back pressure 
> to UMA allocation caches. The problem of physical memory or KVA exhaustion 
> has existed there for many years, and it is now quite critical for 
> improving system performance while keeping stability. Changes made to 
> memory allocation in recent years improved the situation, but haven't 
> fixed it completely. My patch attacks the remaining problems from two 
> sides: a) reducing bucket sizes every time the system detects a low 
> memory condition; and b) as a last-resort mechanism for a very low memory 
> condition, cycling over all CPUs to purge their per-CPU UMA caches. The 
> benefit of this approach is the absence of any additional hard-coded 
> limits on cache sizes: they are self-tuned, based on load and memory 
> pressure.
>
> With this change I believe it should be safe enough to enable UMA 
> allocation caches in ZFS via the vfs.zfs.zio.use_uma tunable (at least for 
> amd64). I did many tests on a machine with 24 logical cores (and, as a 
> result, strong allocation cache effects), and can say that with 40GB of 
> RAM, using the UMA caches allowed by this change doubles the results of 
> the SPEC NFS benchmark on a ZFS pool of several SSDs. To test system 
> stability I ran the same test with physical memory limited to just 2GB; 
> the system survived it successfully, and even showed results 1.5 times 
> better than with only the last-resort measures of b). In both cases 
> tools/umastat no longer shows unbounded UMA cache growth, which makes me 
> believe in the viability of this approach for longer runs.
>
> I would like to hear some comments about that:
> http://people.freebsd.org/~mav/uma_pressure.patch
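Point a) describes bucket sizes that tune themselves: they grow when a zone's per-CPU caches are too small and shrink on each low memory event. The following is a minimal model of that feedback loop, not the patch itself. It assumes growth is driven by contended cache misses and that each low-memory event shrinks the size by one step; `struct model_zone_sizing` and both functions are hypothetical, and the real patch may use different triggers and step sizes.

```c
#include <stdbool.h>

/*
 * Hypothetical per-zone bucket sizing state.  Field names are
 * illustrative; the patch's actual variables may differ.
 */
struct model_zone_sizing {
	int count;	/* current items per per-CPU bucket */
	int count_min;	/* never shrink below this */
	int count_max;	/* never grow above this */
};

/* Growth: a contended cache miss suggests the buckets are too small. */
static void
zone_on_cache_miss(struct model_zone_sizing *z, bool contended)
{
	if (contended && z->count < z->count_max)
		z->count++;
}

/* Back pressure: each low-memory event shrinks the buckets one step. */
static void
zone_on_low_memory(struct model_zone_sizing *z)
{
	if (z->count > z->count_min)
		z->count--;
}
```

Because the size is bounded on both sides and driven by events rather than by a fixed limit, a zone under sustained pressure settles near its minimum, and an unpressured busy zone settles near its maximum.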

Hey Mav,

This is a great start and great results.  I think it could probably even 
go in as-is, but I have a few suggestions.

First, let's test this with something that is really allocator-heavy 
and doesn't benefit much from bucket sizing.  For example, a network 
forwarding test.  Or maybe you could get someone like Netflix, who is 
using it to push a lot of bits with less filesystem cost than ZFS and 
SPEC, to try it.

Second, CPU binding is a very costly, very high-latency operation. 
It would make sense to do CPU_FOREACH and then ZONE_FOREACH.  You're also 
biasing the first zones in the list: the low memory condition will more 
often clear after you check these first zones.  So you might check it 
just once and penalize all zones equally.  I'm concerned that doing 
CPU_FOREACH in every zone will slow the pagedaemon further.  We have also 
been working towards per-domain pagedaemons, so perhaps we should have a 
uma-reclaim taskqueue that we wake up to do the work?
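The cost difference between the two loop orders can be shown by counting CPU migrations in a toy model. `bind_to_cpu()` here stands in for the kernel's `sched_bind()`, the expensive step, and draining a per-CPU cache is simply counted; `NCPU`, `NZONE` and `struct reclaim_cost` are arbitrary example values and names, not kernel definitions.

```c
/*
 * Toy model of the two loop orders.  bind_to_cpu() stands in for the
 * kernel's sched_bind(); draining a per-CPU cache is just counted.
 * NCPU and NZONE are arbitrary example sizes.
 */
#define NCPU	4
#define NZONE	3

struct reclaim_cost {
	int binds;	/* CPU migrations performed */
	int drains;	/* per-CPU caches drained */
};

static void
bind_to_cpu(struct reclaim_cost *rc, int *cur, int cpu)
{
	if (*cur != cpu) {	/* already there: no migration needed */
		*cur = cpu;
		rc->binds++;
	}
}

/* Zone-major order: for each zone, visit every CPU. */
static struct reclaim_cost
reclaim_zone_major(void)
{
	struct reclaim_cost rc = {0, 0};
	int cur = -1;

	for (int z = 0; z < NZONE; z++)
		for (int c = 0; c < NCPU; c++) {
			bind_to_cpu(&rc, &cur, c);
			rc.drains++;
		}
	return (rc);
}

/* CPU-major order: bind to each CPU once, then drain every zone there. */
static struct reclaim_cost
reclaim_cpu_major(void)
{
	struct reclaim_cost rc = {0, 0};
	int cur = -1;

	for (int c = 0; c < NCPU; c++) {
		bind_to_cpu(&rc, &cur, c);
		for (int z = 0; z < NZONE; z++)
			rc.drains++;
	}
	return (rc);
}
```

Both orders drain the same NCPU x NZONE caches, but the zone-major order migrates NCPU x NZONE times while the CPU-major order migrates only NCPU times.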

Third, using vm_page_count_min() will only trigger when the pageout daemon 
can't keep up with the free target.  Typically this should only happen 
with a lot of dirty mmap'd pages or incredibly high system load coupled 
with frequent allocations.  So there may be many cases where reclaiming 
the extra UMA memory is helpful but the pagedaemon can still keep up while 
pushing out file pages that we'd prefer to keep.
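The gap described here lies between two thresholds. As far as I understand the kernels of that time, vm_page_count_min() and vm_page_count_target() compared the free (plus cache) page count against v_free_min and v_free_target. The sketch below models only that comparison with a hypothetical `struct model_vmcounts`; it is not the real vmmeter code. It shows the window in which the pagedaemon is already busy reclaiming, possibly evicting file pages, while a trigger based only on the minimum never fires.

```c
#include <stdbool.h>

/*
 * Simplified page counters.  Field names loosely follow the kernel's
 * vmmeter counters, but this struct is a model, not the real one.
 */
struct model_vmcounts {
	unsigned free_count;	/* free (plus cache) pages */
	unsigned free_min;	/* pagedaemon is falling behind below this */
	unsigned free_target;	/* pagedaemon works to stay above this */
};

/* Analogue of vm_page_count_min(). */
static bool
count_min(const struct model_vmcounts *v)
{
	return (v->free_count < v->free_min);
}

/* Analogue of vm_page_count_target(). */
static bool
count_target(const struct model_vmcounts *v)
{
	return (v->free_count < v->free_target);
}

/* Pagedaemon is working, but a count_min()-only trigger stays quiet. */
static bool
in_missed_window(const struct model_vmcounts *v)
{
	return (count_target(v) && !count_min(v));
}
```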

I think the perfect heuristic would have some idea of how likely the UMA 
pages are to be reused immediately, so we can more effectively trade off 
between file pages and kernel memory cache.  As it is now, we limit the 
uma_reclaim() calls to every 10 seconds when there is memory pressure. 
Perhaps we could keep a timestamp for when the last slab was allocated to 
a zone, and do the more expensive reclaim on zones whose timestamps are 
older than some threshold?  Then have a lower threshold for reclaiming at 
all?  Again, it doesn't need to be perfect, but I believe we can catch a 
wider set of cases by carefully scheduling this.
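The two-threshold idea can be sketched as a small decision function. The names `reclaim_policy`, `zone_reclaim_action` and the action enum are hypothetical, and the threshold values in any use would be tuning assumptions. A zone that has not allocated a slab for a short while gets a cheap reclaim; one idle for longer gets the expensive one.

```c
#include <stdint.h>

enum reclaim_action { RECLAIM_NONE, RECLAIM_CHEAP, RECLAIM_FULL };

/* Hypothetical thresholds, in seconds since the zone's last slab. */
struct reclaim_policy {
	uint64_t cheap_after;	/* lower threshold: reclaim at all */
	uint64_t full_after;	/* higher threshold: expensive reclaim */
};

/*
 * Decide what to do with a zone given when it last allocated a slab.
 * Assumes now >= last_slab_alloc and cheap_after <= full_after.
 */
static enum reclaim_action
zone_reclaim_action(uint64_t now, uint64_t last_slab_alloc,
    const struct reclaim_policy *p)
{
	uint64_t idle = now - last_slab_alloc;

	if (idle >= p->full_after)
		return (RECLAIM_FULL);
	if (idle >= p->cheap_after)
		return (RECLAIM_CHEAP);
	return (RECLAIM_NONE);
}
```

With thresholds of 5 and 30 seconds, for example, a zone idle for 10 seconds would get a cheap reclaim and one idle for 40 seconds a full one.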

Thanks,
Jeff

>
> Thank you.
>
> -- 
> Alexander Motin
>

From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov 19 03:57:04 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id BD4D9D70
 for <freebsd-hackers@freebsd.org>; Tue, 19 Nov 2013 03:57:04 +0000 (UTC)
Received: from mail-pd0-f180.google.com (mail-pd0-f180.google.com
 [209.85.192.180])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 92A25206C
 for <freebsd-hackers@freebsd.org>; Tue, 19 Nov 2013 03:57:04 +0000 (UTC)
Received: by mail-pd0-f180.google.com with SMTP id q10so2319679pdj.11
 for <freebsd-hackers@freebsd.org>; Mon, 18 Nov 2013 19:56:57 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id
 :references:user-agent:mime-version:content-type;
 bh=6VF9TAgBSajXXWBrk0LFqIW0eS5I0BGL9U8kzpTXeFk=;
 b=cTQdOMMsnrxmU+tt8HhLXeJMAFKnNMs5kIO7Dpq7rtMSb0G3fF23RuwkHM3UQrfFEw
 yfQvv+tQcuW3Z4sytZBrQYqGwmTqDnpISO/nqo4vvqj3Losp1OpqZnBfGTlQpqT2KA91
 4e3FZJ1lQ2i3+oJ7BpfTM0opZBnDrgzb1BrDOs+xxH6dFj4G5umYFcBEKJBl2zs5Mrng
 CeKRHFpSsXdRIsyimfiu3NqXTyqb9xPMRaeEpX+kZt/JxKghApwXexmmd8JBcXw0hoya
 YObO3udtCGR0nd6jw1p/nSsjrXOSizz9U/tzGob9dBaWieKAydDAIGBBBpBCB/KSFTLi
 N7+A==
X-Gm-Message-State: ALoCoQnKzwJ1i4qKWuMjUxPDRIDmGmbQYJ8YLv2A8jadXD5Yi4hOTxcXOL87vW51YLgcZSq9EfnE
X-Received: by 10.69.11.130 with SMTP id ei2mr6017490pbd.144.1384833417886;
 Mon, 18 Nov 2013 19:56:57 -0800 (PST)
Received: from rrcs-66-91-135-210.west.biz.rr.com
 (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210])
 by mx.google.com with ESMTPSA id g8sm13723486pbe.37.2013.11.18.19.56.55
 for <multiple recipients>
 (version=TLSv1 cipher=RC4-SHA bits=128/128);
 Mon, 18 Nov 2013 19:56:57 -0800 (PST)
Date: Mon, 18 Nov 2013 17:52:59 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
X-X-Sender: jroberson@desktop
To: Adrian Chadd <adrian@freebsd.org>
Subject: Re: UMA cache back pressure
In-Reply-To: <CAJ-VmonWKP05OLzx3ExXd8ufxi_9EMso6_G_UZiMOWNWq22nzQ@mail.gmail.com>
Message-ID: <alpine.BSF.2.00.1311181751010.2109@desktop>
References: <52894C92.60905@FreeBSD.org>
 <alpine.BSF.2.00.1311180857540.2109@desktop> <528A70A2.4010308@FreeBSD.org>
 <CAJ-VmonWKP05OLzx3ExXd8ufxi_9EMso6_G_UZiMOWNWq22nzQ@mail.gmail.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Tue, 19 Nov 2013 04:02:28 +0000
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Alexander Motin <mav@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Nov 2013 03:57:04 -0000

On Mon, 18 Nov 2013, Adrian Chadd wrote:

> Remember that for Netflix, we have a mostly non-cacheable workload
> (with some very specific exceptions!) and thus we churn through VM
> pages at a prodigious rate: 20 Gbit/s, or ~2.5 gigabytes a second,
> or ~610,000 4-kilobyte pages a second. It's quite frightening, and
> it's only likely to increase.
>
> There's a lot of pressure from all over the place so IIRC pools tend
> to not stay very large for very long.

I think the combination of a lot of cache pressure, a lot of allocator 
use, and no ZFS makes you an interesting candidate.

>
> That's why I'm interested in your specific situations. Doing an all
> CPU TLB shootdown with 24 cores is costly. But after we killed some
> incorrect KVA mapping flags for sendfile, we (netflix) totally stopped

Do you have any information on what this change was?

> seeing the TLB shootdown and IPIs in any of the performance traces.
> Now, doing 24 cores worth of ZFS when you let the pools grow to the
> size you do is understandable, but I'd like to just make sure that you
> aren't breaking performance for people doing different workloads on
> less cores.

We also have opportunities now with vmem to cache KVA-backed pages and 
release them together in bulk when necessary.  However, remember that most 
UMA memory won't need an IPI, since it comes from the direct map.  Only 
the few zones that use very large allocations will.

Jeff
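The two points above (direct-map slabs avoid IPIs, and KVA-backed pages could be released in bulk) can be put in numbers with a small model. This is a sketch under stated assumptions: single-page slabs come from the direct map and need no TLB invalidation when freed, larger slabs are KVA-mapped and each unmap batch costs one shootdown, and the page size is 4 KiB. The batching here is a generic "queue and flush" idea; it does not model vmem's actual interface. `slab_free_needs_shootdown()` and `shootdowns_needed()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE	4096u	/* assumed page size */

/*
 * Assumption: single-page slabs come from the direct map and need no
 * TLB invalidation when freed; larger slabs are mapped into KVA, and
 * unmapping them requires a TLB shootdown (an IPI to other CPUs).
 */
static bool
slab_free_needs_shootdown(size_t slab_size)
{
	return (slab_size > MODEL_PAGE_SIZE);
}

/*
 * Shootdowns needed to release the given slabs when KVA-backed frees
 * are queued and flushed in groups of batch (batch >= 1).  batch == 1
 * models freeing each large slab immediately.
 */
static size_t
shootdowns_needed(const size_t *slab_sizes, size_t n, size_t batch)
{
	size_t kva_frees = 0;

	for (size_t i = 0; i < n; i++)
		if (slab_free_needs_shootdown(slab_sizes[i]))
			kva_frees++;
	return ((kva_frees + batch - 1) / batch);
}
```

Single-page slabs never contribute, so a workload dominated by small zones pays no shootdown cost. Batching only reduces the cost that remains from the few large-allocation zones.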


From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov 19 04:02:59 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3300CEBE
 for <freebsd-hackers@freebsd.org>; Tue, 19 Nov 2013 04:02:59 +0000 (UTC)
Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com
 [209.85.220.52])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0B8DA20D1
 for <freebsd-hackers@freebsd.org>; Tue, 19 Nov 2013 04:02:58 +0000 (UTC)
Received: by mail-pa0-f52.google.com with SMTP id ld10so3134800pab.25
 for <freebsd-hackers@freebsd.org>; Mon, 18 Nov 2013 20:02:58 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id
 :references:user-agent:mime-version:content-type;
 bh=HMXfPC5MgeeZiW4Xl2FDvu4fohfBX8Z/xq8+r8X8A1g=;
 b=lFy8rjEZVEkfDcqPds/O5rZmo9LAwh/tSXb4pKJGMBDC1cBSpGY9MfUx/yb0reBfjt
 0zVmBqHe2ZCe9yfG1UvG749/EZSYWA0m43aMOK5zEN7hytcjpyW1b93bUMCsVN5iKGDO
 8rk2QrBqow/Unvxs8wF4OiyH9k4k/rmj98Y1ZW4cMmM6526WCQlDFIZNIaqBu6i3owJl
 Z/Rospl+ZR8EeAx7dJaba1b2usx7/LnuEmgu6t55hGJMzMwoMLZn7WyKoSkBaxYt2jzR
 XQDuuQvpy/xkjapCYg5sbC14zToUIZLHr5s77RFu2hwtwCSXYJ1/8DjC6lIQN8FnriXx
 6+Sw==
X-Gm-Message-State: ALoCoQllvG6lUCJKhp0sNGWJ4SAWnSgG+WvvpBAAoqzY1HIlcexTqvCsHVkQ0v0pYi7tU5cUXAGT
X-Received: by 10.68.218.3 with SMTP id pc3mr16807146pbc.71.1384833293474;
 Mon, 18 Nov 2013 19:54:53 -0800 (PST)
Received: from rrcs-66-91-135-210.west.biz.rr.com
 (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210])
 by mx.google.com with ESMTPSA id gg10sm26972304pbc.46.2013.11.18.19.54.51
 for <multiple recipients>
 (version=TLSv1 cipher=RC4-SHA bits=128/128);
 Mon, 18 Nov 2013 19:54:52 -0800 (PST)
Date: Mon, 18 Nov 2013 17:50:54 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
X-X-Sender: jroberson@desktop
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: UMA cache back pressure
In-Reply-To: <528A70A2.4010308@FreeBSD.org>
Message-ID: <alpine.BSF.2.00.1311181740060.2109@desktop>
References: <52894C92.60905@FreeBSD.org>
 <alpine.BSF.2.00.1311180857540.2109@desktop> <528A70A2.4010308@FreeBSD.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Tue, 19 Nov 2013 04:10:54 +0000
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Nov 2013 04:02:59 -0000


On Mon, 18 Nov 2013, Alexander Motin wrote:

> On 18.11.2013 21:11, Jeff Roberson wrote:
>> On Mon, 18 Nov 2013, Alexander Motin wrote:
>>> I've created a patch, based on earlier work by avg@, to add back
>>> pressure to UMA allocation caches. The problem of physical memory or
>>> KVA exhaustion has existed there for many years, and it is now quite
>>> critical for improving system performance while keeping stability.
>>> Changes made to memory allocation in recent years improved the situation,
>>> but haven't fixed it completely. My patch solves the remaining problems
>>> from two sides: a) reducing bucket sizes every time the system detects a
>>> low memory condition; and b) as a last-resort mechanism for very low
>>> memory conditions, cycling over all CPUs to purge their per-CPU UMA caches.
>>> The benefit of this approach is the absence of any additional hard-coded
>>> limits on cache sizes: they are self-tuned, based on load and memory
>>> pressure.
>>> 
>>> With this change I believe it should be safe enough to enable UMA
>>> allocation caches in ZFS via the vfs.zfs.zio.use_uma tunable (at least for
>>> amd64). I did many tests on a machine with 24 logical cores (and, as a
>>> result, strong allocation cache effects), and can say that with 40GB of
>>> RAM, using UMA caches, as allowed by this change, doubles the results of
>>> the SPEC NFS benchmark on a ZFS pool of several SSDs. To test system
>>> stability I've run the same test with physical memory limited to just
>>> 2GB, and the system successfully survived it, and even showed results
>>> 1.5 times better than with just the last-resort measure b). In both
>>> cases tools/umastat no longer shows unbounded UMA cache growth, which
>>> makes me believe in the viability of this approach for longer runs.
>>> 
>>> I would like to hear some comments about that:
>>> http://people.freebsd.org/~mav/uma_pressure.patch
>> 
>> Hey Mav,
>> 
>> This is a great start and great results.  I think it could probably even
>> go in as-is, but I have a few suggestions.
>
> Hey! Thanks for your review. I appreciate it.

And I appreciate more people being interested in working on the allocator.

>
>> First, let's test this with something that is really super allocator
>> heavy and doesn't benefit much from bucket sizing.  For example, a
>> network forwarding test.  Or maybe you could get someone like Netflix
>> that is using it to push a lot of bits with less filesystem cost than
>> zfs and spec.
>
> I am not sure what simple forwarding may show in this case. Even on my
> workload, with ZFS creating strong memory pressure, I still have the mbuf*
> zones' buckets almost (some totally) maxed out. Without other major (or even
> any) pressure in the system they just can't become bigger than the maximum.
> But if you can propose some interesting test case with pressure that I can
> reproduce, I am all ears.

I think part of that is also because you're using min free pages right now 
as your threshold.  It should probably be triggering slightly more often.

>
>> Second, CPU binding is a very costly and very high-latency
>> operation. It would make sense to do CPU_FOREACH and then ZONE_FOREACH.
>> You're also biasing the first zones in the list.  The low memory
>> condition will more often clear after you check these first zones.  So
>> you might just check it once and equally penalize all zones.  I'm
>> concerned that doing CPU_FOREACH in every zone will slow the pagedaemon
>> more.
>
> I completely agree with all you said here. I just took this part of the code
> as-is from earlier work. It definitely can be improved. I'll take a look at
> that. But as I mentioned in one of my earlier responses, that code is used
> only in _very_ rare cases, unless the system is heavily overloaded on memory,
> like doing ZFS on a box with 24 cores and 2GB of RAM. During reasonable
> operation the soft back pressure is enough to keep the caches in shape, and
> that code is never called.
>
>> We also have been working towards per-domain pagedaemons so
>> perhaps we should have a uma-reclaim taskqueue that we wake up to do the
>> work?
>
> VM is not my area so far, so please propose "the right way". I took on this
> task now only because I had to, due to the huge performance bottleneck this
> problem causes and the years it has remained unsolved.

Well it's probably fine to keep abusing the first domain's pageout daemon 
for now but we won't want to in the future, especially if we want to keep 
each domain's page daemon on the socket that it's managing.

>
>> Third, using vm_page_count_min() will only trigger when the pageout
>> daemon can't keep up with the free target.  Typically this should only
>> happen with a lot of dirty mmap'd pages or incredibly high system load
>> coupled with frequent allocations.  So there may be many cases where
>> reclaiming the extra UMA memory is helpful but the pagedaemon can still
>> keep up while pushing out file pages that we'd prefer to keep.
>
> As I have said, that is indeed a last resort. It does not need to be done often.
> Per-CPU caches just should not grow, without real need, to the point where they
> have to be cleaned.

Let me explain it differently.  Right now you're handling cases of 
overloaded CPU; if we run this code under different conditions we could 
handle overloaded memory better as well.  Imagine a system which has 
oversized buckets and lots of wasted memory but a pageout daemon which is 
still meeting targets by evicting page cache pages.  Perhaps there was a 
temporary use of some very large zones which is no longer necessary. 
Since we meet the paging target quickly enough we will never discover this 
other memory that we can evict.

Look at the vm page targets.  The target is very far from the min.  So 
typically the thread just wakes up and evicts clean pages very quickly to 
accommodate this.  ZFS is particularly affected because its pages can't be 
evicted by the page daemon, so you're more likely to run out, but other 
systems would benefit from this too: they do have pages which could be 
evicted, and you'd rather preserve them by trimming the UMA cache.

Does that make sense?
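A minimal sketch of the two knobs discussed above: the self-tuning bucket size from a) and the choice of trigger threshold. All names (struct zone_model, struct page_counts, zone_cache_miss(), zone_pressure()) are hypothetical, and the grow-by-one / halve-under-pressure policy is an illustrative assumption, not necessarily what uma_pressure.patch does. It shows that a free-page count sitting between the minimum and the target trips a target-based check but never a min-based one:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one UMA zone's bucket sizing. */
struct zone_model {
	int bucket_size;	/* current items per per-CPU bucket */
	int bucket_min;		/* never shrink below this */
	int bucket_max;		/* never grow above this */
};

/* Hypothetical page counters, loosely mirroring the pagedaemon's min/target. */
struct page_counts {
	long free;		/* currently free pages */
	long free_min;		/* pagedaemon is falling behind below this */
	long free_target;	/* pagedaemon tries to keep free above this */
};

/* Trigger in the style of the patch as posted: only below the minimum. */
static bool
pressure_at_min(const struct page_counts *pc)
{
	return (pc->free < pc->free_min);
}

/* Suggested trigger: whenever the pagedaemon is short of its target. */
static bool
pressure_at_target(const struct page_counts *pc)
{
	return (pc->free < pc->free_target);
}

/* The zone ran out of cached items: let its buckets grow by one slot. */
static void
zone_cache_miss(struct zone_model *z)
{
	if (z->bucket_size < z->bucket_max)
		z->bucket_size++;
}

/* Low-memory signal: halve the bucket size, but keep it >= bucket_min. */
static void
zone_pressure(struct zone_model *z)
{
	assert(z->bucket_min >= 1);
	z->bucket_size /= 2;
	if (z->bucket_size < z->bucket_min)
		z->bucket_size = z->bucket_min;
}
```

With free = 5000, free_min = 1000 and free_target = 10000, only pressure_at_target() fires, so a min-based trigger would never shrink the buckets in that state.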

>
>> I think the perfect heuristic would have some idea of how likely the UMA
>> pages are to be re-used immediately so we can more effectively tradeoff
>> between file pages and kernel memory cache.  As it is now we limit the
>> uma_reclaim() calls to every 10 seconds when there is memory pressure.
>> Perhaps we could keep a timestamp for when the last slab was allocated
>> to a zone and do the more expensive reclaim on zones who have timestamps
>> that exceed some threshold?  Then have a lower threshold for reclaiming
>> at all? Again, it doesn't need to be perfect, but I believe we can catch
>> a wider set of cases by carefully scheduling this.
>
> I was thinking about that too. But I think timestamps should be set not on
> the slab, but on the bucket. The fact that a zone is not allocating new slabs
> does not mean it is not using its already allocated buckets. If we put the time
> of the last refill into each bucket, then we should be able to purge all
> buckets unused for a specified period of time. Additionally, we could put a
> timestamp on the zone and update it every time the zone runs out of its cache.
> If a zone does not run out of cache for some time, it probably has unused
> buckets. So when we need some RAM we should first look at zones with stale
> timestamps.

Many healthy flow control algorithms maintain a relatively steady state by 
periodically testing the edges.  I would prefer to maintain the timestamp 
on a per-zone basis and not per-bucket anyway, as it saves some space and 
we'd have to resize all the buckets if the timestamp took up another 
pointer's worth of space in each one.

Anyway, I'm not too dogmatic about it.  There are probably several 
convenient ways to write it and no perfect one.
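One way to sketch the per-zone timestamp idea, assuming the timestamp records the last time a zone ran out of cached items (as suggested earlier in the thread) and that time is measured in abstract ticks. struct zone_ts, zone_ran_empty() and reclaim_stale() are hypothetical names, and a single staleness threshold stands in for the two-threshold scheme mentioned above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-zone bookkeeping; not the real struct uma_zone. */
struct zone_ts {
	const char *name;
	long last_empty;	/* tick when the zone last ran out of cache */
	int  cached;		/* items currently sitting in its buckets */
};

/* Called whenever the zone has to refill because its cache was empty. */
static void
zone_ran_empty(struct zone_ts *z, long now)
{
	z->last_empty = now;
}

/* A zone that has not run dry for `threshold` ticks likely has spare buckets. */
static int
zone_is_stale(const struct zone_ts *z, long now, long threshold)
{
	return (now - z->last_empty >= threshold);
}

/* Drain only stale zones; return how many cached items were released. */
static int
reclaim_stale(struct zone_ts *zones, size_t n, long now, long threshold)
{
	int freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (zone_is_stale(&zones[i], now, threshold)) {
			freed += zones[i].cached;
			zones[i].cached = 0;
		}
	}
	return (freed);
}
```

A zone that ran dry recently keeps its cache, while one that has been coasting on its buckets gets drained first, which is the "look at stale zones first" ordering described above.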

May I suggest that you make the change to do CPU_FOREACH only once and then 
commit with your current heuristic?  Then we can try to take it one step 
further.
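The difference between the two loop orders can be illustrated by counting CPU migrations. This is a user-space model only: bind_to() is a hypothetical stand-in for binding the thread to a CPU (sched_bind() in the kernel), and draining a per-CPU cache is reduced to a counter. With 24 CPUs and 100 zones both orders perform 2400 drains, but walking every CPU inside every zone migrates 2400 times, while binding once per CPU migrates only 24 times:

```c
#include <assert.h>

/* Hypothetical binding state: which CPU we are on and what it cost. */
struct binder {
	int  cur_cpu;		/* -1 before the first bind */
	long migrations;	/* actual CPU switches performed */
	long drains;		/* per-CPU cache drains performed */
};

static void
bind_to(struct binder *b, int cpu)
{
	if (b->cur_cpu != cpu) {
		b->cur_cpu = cpu;
		b->migrations++;
	}
}

static void
drain_cpu_cache(struct binder *b, int zone, int cpu)
{
	(void)zone;
	(void)cpu;
	assert(b->cur_cpu == cpu);	/* per-CPU caches are only touched locally */
	b->drains++;
}

/* Original ordering: every zone walks every CPU. */
static void
reclaim_zone_major(struct binder *b, int nzones, int ncpu)
{
	for (int z = 0; z < nzones; z++)
		for (int c = 0; c < ncpu; c++) {
			bind_to(b, c);
			drain_cpu_cache(b, z, c);
		}
}

/* Suggested ordering: bind once per CPU, then drain every zone there. */
static void
reclaim_cpu_major(struct binder *b, int nzones, int ncpu)
{
	for (int c = 0; c < ncpu; c++) {
		bind_to(b, c);
		for (int z = 0; z < nzones; z++)
			drain_cpu_cache(b, z, c);
	}
}
```

A real version would also test the low-memory condition once per pass rather than per zone, which avoids favoring the zones at the head of the list.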

Thanks,
Jeff

>
> -- 
> Alexander Motin
>

From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov 19 07:49:33 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C0DC5B1F;
 Tue, 19 Nov 2013 07:49:33 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 60FDB2A46;
 Tue, 19 Nov 2013 07:49:33 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAJ7nNck031698;
 Tue, 19 Nov 2013 09:49:23 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAJ7nNck031698
Received: (from kostik@localhost)
 by tom.home (8.14.7/8.14.7/Submit) id rAJ7nMiU031696;
 Tue, 19 Nov 2013 09:49:22 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Tue, 19 Nov 2013 09:49:22 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131119074922.GY59496@kib.kiev.ua>
References: <51B3B59B.8050903@erdgeist.org>
 <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com>
 <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="rmCyz0CRE2AtwE2l"
Content-Disposition: inline
In-Reply-To: <20131118190142.GA28210@ambrisko.com>
User-Agent: Mutt/1.5.22 (2013-10-16)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: freebsd-hackers@freebsd.org, Dirk Engling <erdgeist@erdgeist.org>,
 Jase Thew <jase@freebsd.org>, mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Nov 2013 07:49:33 -0000


--rmCyz0CRE2AtwE2l
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote:
> On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote:
> | Generally, I agree with the approach, but what is done seems to be too
> | simple to be usable.
>
> I like the simplicity and I'd like to see examples of not being usable.
I did exactly this in the text following the introductory sentence,
didn't I?

>
> | One obvious and important thing which is broken with the patch is
> | the unmounts from jails. In other words, now it is possible to mount
> | something from jail with appropriate privileges set up, and after that
> | corresponding mount cannot be unmounted, since vfs_mount_alloc() copies
> | trimmed path into f_mntonname, and sys_unmount() matches full path with
> | pathbuf.  Hmm, this should be broken in the same way for non-jailed
> | mounts with paths which do not fit into f_mntonname.
>
> They can be umounted since it will fall back to fsid as in the non-jail
> case.  I just tried it (sorry for the bad line wrap):
>
> + mount 192.168.38.1:/data/home/ambrisko/netboot /data/jail/test
> + jail -i -c name=test path=/data/jail/test host.hostname=test.ambrisko.com persist enforce_statfs=0 allow.mount=1 allow.mount.devfs=1 allow.mount.nullfs=1 allow.mount.tmpfs=1 allow.mount.procfs=1
> 14
> + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
> + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
> + jexec test df
> + egrep '^devfs|^procfs'
> devfs                                             2         2          0   100%    /dev
> procfs                                            8         8          0   100%    /proc
> + jexec test mount -t procfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
> + jexec test mount -t devfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
> + jexec test df
> + egrep '^devfs|^procfs'
> devfs                                             2         2          0   100%    /dev
> procfs                                            8         8          0   100%    /proc
> procfs                                            8         8          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
> devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
> + jexec test mount -v
> + egrep '^devfs|^procfs'
> devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000)
> procfs on /proc (procfs, local, fsid 02ff000202000000)
> procfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (procfs, local, fsid 26ff000202000000)
> devfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (devfs, local, multilabel, fsid 27ff007171000000)
> + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
> + jexec test df
> + egrep '^devfs|^procfs'
> devfs                                             2         2          0   100%    /dev
> procfs                                            8         8          0   100%    /proc
> devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
> + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
> + jexec test df
> + egrep '^devfs|^procfs'
> devfs                                             2         2          0   100%    /dev
> procfs                                            8         8          0   100%    /proc

I.e. unmount by path gets EINVAL, right? I do not like it. If we go this
route, why do we need to store the path in the kernel at all? At the very
least, an attempt to unmount by path should consistently return EINVAL,
instead of failing seemingly at random due to an implementation detail,
in cases where the caller can reasonably expect the syscall to succeed.
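The failure mode can be modeled in a few lines. This assumes MNAMELEN is 88 (which matches the 87-character mount points in the df output above) and that the path lookup is a plain string compare against the stored, truncated name; struct mount_model and the two lookup helpers are hypothetical, not the real sys_unmount() code. EINVAL for a failed path match and ENOENT for a failed fsid match follow the behavior described in this thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MNAMELEN 88	/* assumed value of MNAMELEN */

/* Hypothetical mount record holding a silently truncated path copy. */
struct mount_model {
	char f_mntonname[MODEL_MNAMELEN];
	int  fsid;
};

static void
mount_model_init(struct mount_model *mp, const char *path, int fsid)
{
	/* Keeps at most MODEL_MNAMELEN - 1 characters plus the NUL. */
	snprintf(mp->f_mntonname, sizeof(mp->f_mntonname), "%s", path);
	mp->fsid = fsid;
}

/* Lookup by path compares against the truncated copy, so long paths miss. */
static int
unmount_by_path(const struct mount_model *mp, const char *path)
{
	return (strcmp(mp->f_mntonname, path) == 0 ? 0 : EINVAL);
}

/* Lookup by file system ID is unaffected by truncation. */
static int
unmount_by_fsid(const struct mount_model *mp, int fsid)
{
	return (mp->fsid == fsid ? 0 : ENOENT);
}
```

A short mount point matches by path; a 199-character one is stored as 87 characters, so the path lookup returns EINVAL while the fsid lookup still succeeds. That is exactly the "works for some paths, not others" inconsistency objected to above.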

>
> | I think that struct mount should have a const char * field where the
> | non-trimmed path is stored and used for match at unmount. f_mntonname
> | truncation would be only unfortunate user interface glitch.
>
> Note that we are not storing the path in mount structure so no structures
> have changed which is nice since then we haven't introduced any real
> ABI breakage.  So we could MFC this.  The match isn't critical since
> umount will fall back to fsid and work.  One thing that might be good to
> do is change umount to try to umount via fsid first and then do the
> match if the fsid failed versus the other way round that it does now.
I do not like sometimes not storing the full path of the mount point.
I do understand that the path can easily be made invalid, but I still want
it there.

MFC is not a problem for struct mount, which is never directly allocated
by non-VFS code.  The new member must be added to the end of the structure,
which preserves the KBI. I have done such surgery more than once.
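A reduced sketch of why appending a member keeps the KBI: code compiled against the old layout still finds every existing member at the same offset. The layouts below are hypothetical and heavily trimmed (in the real struct mount, f_mntonname lives inside the embedded statfs), and mnt_fullpath is an illustrative name for the proposed const char * field:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "old" layout: stand-ins for existing struct mount members. */
struct mount_v1 {
	int   mnt_flag;
	void *mnt_data;
	char  f_mntonname[88];		/* truncated, user-visible name */
};

/* Hypothetical "new" layout: same members, plus one appended at the end. */
struct mount_v2 {
	int   mnt_flag;
	void *mnt_data;
	char  f_mntonname[88];		/* still truncated; UI only */
	const char *mnt_fullpath;	/* untruncated path, used for matching */
};

/* Unmount-by-path match against the full path instead of f_mntonname. */
static int
mount_v2_matches(const struct mount_v2 *mp, const char *path)
{
	return (mp->mnt_fullpath != NULL && strcmp(mp->mnt_fullpath, path) == 0);
}
```

This only works because, as noted above, nothing outside VFS allocates struct mount itself; a module that did would allocate too little memory for the new member.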

>
> The problem I see is if someone tries to do things based on the parsed
> output of mount/df then that will fail since the output is truncated.
Yes, this is understandable and IMO acceptable.


--rmCyz0CRE2AtwE2l
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJSixgBAAoJEJDCuSvBvK1B1S4P/ilstNB84CaFfE1r+III3xkU
1X7eekSDOOz2A98mvW2pr4cqusf6Xx2J/feSIoKTqWs9yBcfGZjZt9l+i5d0C5t3
8JayjS/1sEYnbns1w4C34LoiVOHBRN1VAwS9XKQDcAvEcIpHFrYHxChHlkpsxRe8
+cz5hl2U9gRS6RjKHQJpC5OyskhMwXqjbbvJsvo37YEk0mYPAS9HvjBGilNgTph4
e9/a/ophP0AOF72KSMgaat5WT+37+x/ja6wBz+I3GWXjz0QgueuK/TIj/f3NFQpI
pJwYwwtkvZ4pcxv1ELv4ZwShHGRpI5HiUbRw9M1dTJgy3vTQ3pOWTxKqkH8XGkSq
N+vyww2toBBSCjL++UOaIaq7YPamR7jbeu2cNyQAbm8xh8fxwzZRmjOo1EevHR3r
NfRkjLhlQYszbR7LPmEyfZBYkJUAUivkXlYqhszQ0H5usUk+lBa9PzblMvrO3XU+
o2oA+aqGioRZmm9JlwKsqIIYgA8aQyZzAxRDOrgDurDxtD4fUyTNks78mIocwse6
n9hvLTXCED9Oc7OPW8rBnyetGLX0YpCBsoN/E+1TCOOkuZHvmG78LY1Ofp8yB8zK
EE66xfKpw6K7sQNSH6fsnbC/U9i1t9QVvmPG0UHKllR8yYAE/yJu86NUdytYWJRS
JvTzu0zVvmwXjvlggau5
=OSsC
-----END PGP SIGNATURE-----

--rmCyz0CRE2AtwE2l--

From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov 19 14:33:19 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CE331D71;
 Tue, 19 Nov 2013 14:33:19 +0000 (UTC)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 90B8A240A;
 Tue, 19 Nov 2013 14:33:18 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA07017;
 Tue, 19 Nov 2013 16:33:10 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1VimMX-000Kce-Ts; Tue, 19 Nov 2013 16:33:09 +0200
Message-ID: <528B7681.6090806@FreeBSD.org>
Date: Tue, 19 Nov 2013 16:32:33 +0200
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: freebsd-hackers@FreeBSD.org
Subject: Fwd: taskqueue_block
References: <5287BDB9.10201@FreeBSD.org>
In-Reply-To: <5287BDB9.10201@FreeBSD.org>
X-Enigmail-Version: 1.6
X-Forwarded-Message-Id: <5287BDB9.10201@FreeBSD.org>
Content-Type: text/plain; charset=x-viet-vps
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Nov 2013 14:33:19 -0000


Forwarding this to the larger audience for a discussion.

-------- Original Message --------
Message-ID: <5287BDB9.10201@FreeBSD.org>
Date: Sat, 16 Nov 2013 20:47:21 +0200
From: Andriy Gapon <avg@FreeBSD.org>
Subject: taskqueue_block



It seems that either I do not understand something about the taskqueue_block
code or it is quite a dangerous and abused API.  The fact that it is not properly
documented does not help either.

The commit message said:
> Implement taskqueue_block() and taskqueue_unblock(). These functions allow the
> owner of a queue to block and unblock execution of the tasks in the queue while
> allowing tasks to continue to be added queue. Combining this with
> taskqueue_drain() allows a queue to be safely disabled. The unblock function may
[...]

I indeed see this (anti?) pattern being used in the code.
But what about the following case?  One thread calls taskqueue_block() and sets
TQ_FLAGS_BLOCKED.  Another thread calls taskqueue_enqueue(), which adds a task to
the queue and sets ta_pending of the task to 1.  tq_enqueue is not called, so the
actual queue runner is not called or woken up.  Then the first thread calls
taskqueue_drain() on the task.  As far as I can see, that thread would then just
wait forever, because the task is pending and is not going to be executed.
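The scenario can be reduced to a single-threaded model, assuming (as described above) that the runner only runs when tq_enqueue wakes it. struct tq_model and its helpers are hypothetical, and tq_drain() gives up after a bounded number of steps where the real taskqueue_drain() would sleep forever:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-task model; not the real struct taskqueue. */
struct tq_model {
	bool blocked;	/* TQ_FLAGS_BLOCKED */
	int  pending;	/* ta_pending of the one task */
	int  wakeups;	/* outstanding tq_enqueue() wakeups of the runner */
};

static void
tq_block(struct tq_model *tq)
{
	tq->blocked = true;		/* just sets the flag, no waiting */
}

static void
tq_unblock(struct tq_model *tq)
{
	tq->blocked = false;
	if (tq->pending > 0)
		tq->wakeups++;		/* kick the runner for queued work */
}

static void
tq_enqueue(struct tq_model *tq)
{
	tq->pending++;			/* queued even while blocked */
	if (!tq->blocked)
		tq->wakeups++;		/* the wakeup is skipped when blocked */
}

/* The runner sleeps until woken; when woken it runs the pending task. */
static void
tq_runner_step(struct tq_model *tq)
{
	if (tq->wakeups > 0) {
		tq->wakeups--;
		tq->pending = 0;
	}
}

/* Model of taskqueue_drain(): wait while pending, bounded by max_steps. */
static bool
tq_drain(struct tq_model *tq, int max_steps)
{
	for (int i = 0; i < max_steps && tq->pending > 0; i++)
		tq_runner_step(tq);
	return (tq->pending == 0);
}
```

Block, enqueue, drain: the task stays pending and no wakeup is ever issued, so the drain cannot finish until someone unblocks the queue, which the draining thread itself was not going to do.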

Additionally, it is impossible to reason about the taskqueue's state after a
taskqueue_block() call, because the call just sets the flag and does not do any
synchronization.  And, as described above, it is not safe to call APIs that
could allow the taskqueue or the task state to become known.

I think that taskqueue_block() should wait for the currently active tasks to
complete.  I don't think that this behavior should be optional.  I do not see
any reasonable and safe use for a "non-blocking" taskqueue_block().
taskqueue_drain() calls after taskqueue_block() must be removed.  The code
should use either taskqueue_drain() or a "blocking" taskqueue_block(), depending
on the concrete circumstances.
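A sketch of the proposed "blocking" semantics, using POSIX threads rather than kernel primitives: the block call sets the flag and then sleeps until every task that is already running has finished, so on return the queue state is known. struct btq, btq_block() and example_task() are hypothetical illustrations, not a proposed taskqueue(9) interface:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical queue model; not the real struct taskqueue. */
struct btq {
	pthread_mutex_t mtx;
	pthread_cond_t  idle;	/* signalled when active drops to zero */
	bool blocked;
	int  active;		/* tasks currently executing */
	bool done;		/* set by example_task(), for illustration */
};

static void
btq_init(struct btq *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->idle, NULL);
	q->blocked = false;
	q->active = 0;
	q->done = false;
}

/* Runner side: start a task unless the queue is blocked. */
static bool
btq_task_begin(struct btq *q)
{
	bool ok;

	pthread_mutex_lock(&q->mtx);
	ok = !q->blocked;
	if (ok)
		q->active++;
	pthread_mutex_unlock(&q->mtx);
	return (ok);
}

static void
btq_task_end(struct btq *q)
{
	pthread_mutex_lock(&q->mtx);
	assert(q->active > 0);
	if (--q->active == 0)
		pthread_cond_broadcast(&q->idle);
	pthread_mutex_unlock(&q->mtx);
}

/* "Blocking" block: set the flag, then wait for running tasks to finish. */
static void
btq_block(struct btq *q)
{
	pthread_mutex_lock(&q->mtx);
	q->blocked = true;
	while (q->active > 0)
		pthread_cond_wait(&q->idle, &q->mtx);
	pthread_mutex_unlock(&q->mtx);
}

/* Example task body that was already running: finishes after 50 ms. */
static void *
example_task(void *arg)
{
	struct btq *q = arg;
	struct timespec ts = { 0, 50 * 1000 * 1000 };

	nanosleep(&ts, NULL);
	pthread_mutex_lock(&q->mtx);
	q->done = true;
	pthread_mutex_unlock(&q->mtx);
	btq_task_end(q);
	return (NULL);
}
```

Once btq_block() returns, no task is in flight and none can start, so there is nothing left for a subsequent drain to wait on.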

What do you think?
Thank you.
-- 
Andriy Gapon



From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov 19 17:42:24 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0CB1B15A;
 Tue, 19 Nov 2013 17:42:24 +0000 (UTC)
Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90])
 by mx1.freebsd.org (Postfix) with ESMTP id C55C32F71;
 Tue, 19 Nov 2013 17:42:23 +0000 (UTC)
X-Ambrisko-Me: Yes
Received: from server2.ambrisko.com (HELO internal.ambrisko.com)
 ([192.168.1.2])
 by ironport.ambrisko.com with ESMTP; 19 Nov 2013 09:46:09 -0800
Received: from ambrisko.com (localhost [127.0.0.1])
 by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id rAJHgGs7006486;
 Tue, 19 Nov 2013 09:42:16 -0800 (PST)
 (envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
 by ambrisko.com (8.14.4/8.14.4/Submit) id rAJHgGmT006464;
 Tue, 19 Nov 2013 09:42:16 -0800 (PST) (envelope-from ambrisko)
Date: Tue, 19 Nov 2013 09:42:16 -0800
From: Doug Ambrisko <ambrisko@ambrisko.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131119174216.GA80753@ambrisko.com>
References: <51B3B59B.8050903@erdgeist.org>
 <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131119074922.GY59496@kib.kiev.ua>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-hackers@freebsd.org, Dirk Engling <erdgeist@erdgeist.org>,
 Jase Thew <jase@freebsd.org>, mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Nov 2013 17:42:24 -0000

On Tue, Nov 19, 2013 at 09:49:22AM +0200, Konstantin Belousov wrote:
| On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote:
| > On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote:
| > | Generally, I agree with the approach, but what is done seems to be too
| > | simple to be usable.
| > 
| > I like the simplicity and I'd like to see examples of not being usable.
| I did exactly this in the text following the introductory sentence,
| didn't I?

I thought you were implying more than the one example that you gave.
   
| > | One obvious and important thing which is broken with the patch is
| > | the unmounts from jails. In other words, now it is possible to mount
| > | something from jail with appropriate privileges set up, and after that
| > | corresponding mount cannot be unmounted, since vfs_mount_alloc() copies
| > | trimmed path into f_mntonname, and sys_unmount() matches full path with
| > | pathbuf.  Hmm, this should be broken in the same way for non-jailed
| > | mounts with paths which do not fit into f_mntonname.
| > 
| > They can be umounted since it will fall back to fsid as in the non-jail
| > case.  I just tried it (sorry for the bad line wrap):
| > 
| > + mount 192.168.38.1:/data/home/ambrisko/netboot /data/jail/test
| > + jail -i -c name=test path=/data/jail/test host.hostname=test.ambrisko.com persist enforce_statfs=0 allow.mount=1 allow.mount.devfs=1 allow.mount.nullfs=1 allow.mount.tmpfs=1 allow.mount.procfs=1
| > 14
| > + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
| > + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| > + jexec test mount -t procfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
| > + jexec test mount -t devfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| > procfs                                            8         8          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
| > devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
| > + jexec test mount -v
| > + egrep '^devfs|^procfs'
| > devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000)
| > procfs on /proc (procfs, local, fsid 02ff000202000000)
| > procfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (procfs, local, fsid 26ff000202000000)
| > devfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (devfs, local, multilabel, fsid 27ff007171000000)
| > + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| > devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
| > + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| 
| I.e. unmount gets EINVAL, right ? I do not like it, if going this route,
| why do we need to store the path in the kernel at all ?

For compatibility with old stuff that hasn't switched to fsid.  I'll
describe it below, since it looks like umount(8) doesn't use it any more
unless the fsid attempt fails.

| At least, the
| attempt to unmount by path should consistently return EINVAL always,
| instead of failing randomly due to an implementation detail, where the
| caller can reasonably expect the syscall to succeed.

Yes, a failed match by path is EINVAL; a failed match by fsid is ENOENT.

First, I'm talking about the umount binary, and I made a mistake describing
its behaviour.  I thought it was trying the path first, when it actually
tries the fsid returned in the statfs structure first.  If that fails,
it falls back to the path for older kernels:
        /* First try to unmount using the file system ID. */
        snprintf(fsidbuf, sizeof(fsidbuf), "FSID:%d:%d", sfs->f_fsid.val[0],
            sfs->f_fsid.val[1]);
        if (unmount(fsidbuf, fflag | MNT_BYFSID) != 0) {
                /* XXX, non-root users get a zero fsid, so don't warn. */
                if (errno != ENOENT || sfs->f_fsid.val[0] != 0 ||
                    sfs->f_fsid.val[1] != 0)
                        warn("unmount of %s failed", sfs->f_mntonname);
                if (errno != ENOENT) {
                        free(orignfsdirname);
                        return (1);
                }
                /* Compatibility for old kernels. */
                if (sfs->f_fsid.val[0] != 0 || sfs->f_fsid.val[1] != 0)
                        warnx("retrying using path instead of file system ID");
                if (unmount(sfs->f_mntonname, fflag) != 0) {
                        warn("unmount of %s failed", sfs->f_mntonname);
                        free(orignfsdirname);
                        return (1);
                }
        }
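
The fallback order in the excerpt can be isolated into a few pure functions.
This is a minimal sketch, not code from umount.c: the function names are
hypothetical, unmount(2) is never called, and only the errno checks and the
"FSID:%d:%d" string format are taken from the excerpt.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Build the "FSID:<val0>:<val1>" string that is passed with MNT_BYFSID. */
static int
fsid_string(char *buf, size_t len, int32_t val0, int32_t val1)
{
	return (snprintf(buf, len, "FSID:%d:%d", (int)val0, (int)val1));
}

/*
 * Warn about the failed fsid unmount unless it is the non-root case,
 * where the fsid is zero and ENOENT is expected.
 */
static bool
fsid_failure_warns(int err, int32_t val0, int32_t val1)
{
	return (err != ENOENT || val0 != 0 || val1 != 0);
}

/* Only ENOENT (no mount with that fsid) leads to the retry by path. */
static bool
fsid_failure_retries_by_path(int err)
{
	return (err == ENOENT);
}
```

Any other errno, for example EBUSY, makes umount(8) give up without trying
the path at all.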

This was introduced in revision 1.38 of umount.c, before 5.2 was released:
  When mount(8) is invoked with the `-v' flag, display the filesystem
  ID for each file system in addition to the normal information.

  In umount(8), accept filesystem IDs as well as the usual device and
  path names. This makes it possible to unambiguously specify which
  file system is to be unmounted even when two or more file systems
  share the same device and mountpoint names (e.g. NFS mounts from
  the same export into different chroots).
and refined in 1.39.

This doesn't address your concern about the system call unmount.

| > | I think that struct mount should have a const char * field where the
| > | non-trimmed path is stored and used for match at unmount. f_mntonname
| > | truncation would be only unfortunate user interface glitch.
| > 
| > Note that we are not storing the path in mount structure so no structures
| > have changed which is nice since then we haven't introduced any real
| > ABI breakage.  So we could MFC this.  The match isn't critical since
| > umount will fall back to fsid and work.  One thing that might be good to
| > do is change umount to try to umount via fsid first and then do the
| > match if the fsid failed versus the other way round that it does now.
| I do not like sometimes not storing the full path of the mount point.
| I do understand that the path can easily be made invalid, but I still want
| it there.
| 
| MFC is not the problem for struct mount, which is never directly allocated
| by non-VFS.  The new member must be added to the end of the structure, which
| preserves KBI. I did such surgery more than once.

I was talking about the more general case, since the system tries to keep
the path in the stat structure.  My prior approach, which had more issues,
was to modify the stat structure, for which I was pointed to NetBSD and their
change to statvfs, which doesn't really solve the problem.  They don't
check whether the mount path is longer than VFS_MNAMELEN (in their case)
and just truncate things.

If we are just talking about adding it to the mount structure that
would be okay since it isn't exposed to user land.  I can add that.
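
The point above that appending a member preserves the KBI can be checked
mechanically: members of the old layout keep their offsets, and only the
size grows, which is harmless because nothing outside VFS allocates struct
mount itself.  A minimal sketch with hypothetical, heavily reduced
structures (the real struct mount has many more members, and mnt_fullpath
is only an illustrative name):

```c
#include <stddef.h>

/* Hypothetical reduced layout before the change. */
struct mount_before {
	int	 mnt_flag;
	void	*mnt_data;
	char	 mnt_onname[88];
};

/* Same layout with one new member appended at the end. */
struct mount_after {
	int	 mnt_flag;
	void	*mnt_data;
	char	 mnt_onname[88];
	const char *mnt_fullpath;	/* new, kernel-only full path */
};

/* Existing members keep their offsets, so old consumers still find them. */
_Static_assert(offsetof(struct mount_before, mnt_flag) ==
    offsetof(struct mount_after, mnt_flag), "mnt_flag moved");
_Static_assert(offsetof(struct mount_before, mnt_data) ==
    offsetof(struct mount_after, mnt_data), "mnt_data moved");
_Static_assert(offsetof(struct mount_before, mnt_onname) ==
    offsetof(struct mount_after, mnt_onname), "mnt_onname moved");
```

Inserting the member anywhere but the end would shift the following
offsets and break modules compiled against the old layout.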
 
| > The problem I see is if someone tries to do things based on the parsed
| > output of mount/df then that will fail since the output is truncated.
| Yes, this is understandable and IMO acceptable.

I think we might be able to fix that in the future by populating our
bogus statvfs with more valid values for paths.  With your suggestion
we could populate the statvfs on the fly with a value from the mount
structure.  Then convert df and mount to use statvfs.
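
Populating a fixed-size path field on the fly from a full path kept in the
mount structure could look roughly like this.  It is a sketch under
assumptions: SKETCH_MNAMELEN stands in for MNAMELEN and is set to 88, which
matches the 87-character truncated jail paths in the df output above, and
fill_mntonname() is a hypothetical helper.  Unlike silent truncation, it
reports whether the path fit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: 88 bytes, i.e. MNAMELEN at the time of this thread. */
#define	SKETCH_MNAMELEN	88

/*
 * Fill a statfs/statvfs-style name field from a full path kept elsewhere
 * (e.g. a kernel-only struct mount member).  snprintf() always
 * NUL-terminates; a return value >= the buffer size means truncation.
 */
static bool
fill_mntonname(char dst[SKETCH_MNAMELEN], const char *fullpath)
{
	int n = snprintf(dst, SKETCH_MNAMELEN, "%s", fullpath);

	return (n >= 0 && n < SKETCH_MNAMELEN);
}
```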

Thanks,

Doug A.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov 19 18:50:27 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3164ECE6;
 Tue, 19 Nov 2013 18:50:27 +0000 (UTC)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 1C30023F4;
 Tue, 19 Nov 2013 18:50:25 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA10663;
 Tue, 19 Nov 2013 20:50:23 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1ViqNT-000KrX-Fy; Tue, 19 Nov 2013 20:50:23 +0200
Message-ID: <528BB2B7.8060908@FreeBSD.org>
Date: Tue, 19 Nov 2013 20:49:27 +0200
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: freebsd-hackers@FreeBSD.org, FreeBSD Current <freebsd-current@FreeBSD.org>
Subject: provide fast versions of ffsl and flsl for i386; ffsll and flsll
 for amd64
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Nov 2013 18:50:27 -0000


These are just trivial wrappers based on the fact that int and long on i386 have
the same "bit layout" and likewise for long and long long on amd64.
For your reviewing pleasure :-)
Thanks!

commit fdc1228b113f8b4c9dbda2b0323cb087c6b6df9d
Author: Andriy Gapon <avg@icyb.net.ua>
Date:   Thu Nov 7 19:13:00 2013 +0200

    provide fast versions of ffsl and flsl for i386; ffsll and flsll for amd64
diff --git a/sys/amd64/include/cpufunc.h b/sys/amd64/include/cpufunc.h
index 5f8197b..7464739 100644
--- a/sys/amd64/include/cpufunc.h
+++ b/sys/amd64/include/cpufunc.h
@@ -154,6 +154,14 @@ ffsl(long mask)
 	return (mask == 0 ? mask : (int)bsfq((u_long)mask) + 1);
 }

+#define	HAVE_INLINE_FFSLL
+
+static __inline int
+ffsll(long long mask)
+{
+	return (ffsl((long)mask));
+}
+
 #define	HAVE_INLINE_FLS

 static __inline int
@@ -170,6 +178,14 @@ flsl(long mask)
 	return (mask == 0 ? mask : (int)bsrq((u_long)mask) + 1);
 }

+#define	HAVE_INLINE_FLSLL
+
+static __inline int
+flsll(long long mask)
+{
+	return (flsl((long)mask));
+}
+
 #endif /* _KERNEL */

 static __inline void
diff --git a/sys/conf/files b/sys/conf/files
index d41b9d2..8077bfc 100644
--- a/sys/conf/files
+++ b/sys/conf/files
@@ -3029,7 +3029,6 @@ libkern/arc4random.c		standard
 libkern/bcd.c			standard
 libkern/bsearch.c		standard
 libkern/crc32.c			standard
-libkern/flsll.c                 standard
 libkern/fnmatch.c		standard
 libkern/iconv.c			optional libiconv
 libkern/iconv_converter_if.m	optional libiconv
diff --git a/sys/conf/files.arm b/sys/conf/files.arm
index 603fb2d..d15f014 100644
--- a/sys/conf/files.arm
+++ b/sys/conf/files.arm
@@ -87,6 +87,7 @@ libkern/divdi3.c		standard
 libkern/ffsl.c			standard
 libkern/fls.c			standard
 libkern/flsl.c			standard
+libkern/flsll.c			standard
 libkern/lshrdi3.c		standard
 libkern/moddi3.c		standard
 libkern/qdivrem.c		standard
diff --git a/sys/conf/files.i386 b/sys/conf/files.i386
index 23e03a3..030dbe1 100644
--- a/sys/conf/files.i386
+++ b/sys/conf/files.i386
@@ -524,8 +524,7 @@ kern/kern_clocksource.c		standard
 kern/imgact_aout.c		optional compat_aout
 kern/imgact_gzip.c		optional gzip
 libkern/divdi3.c		standard
-libkern/ffsl.c			standard
-libkern/flsl.c			standard
+libkern/flsll.c			standard
 libkern/memmove.c		standard
 libkern/memset.c		standard
 libkern/moddi3.c		standard
diff --git a/sys/conf/files.ia64 b/sys/conf/files.ia64
index 6719c98..e85c35d 100644
--- a/sys/conf/files.ia64
+++ b/sys/conf/files.ia64
@@ -120,6 +120,7 @@ libkern/bcmp.c			standard
 libkern/ffsl.c			standard
 libkern/fls.c			standard
 libkern/flsl.c			standard
+libkern/flsll.c			standard
 libkern/ia64/__divdi3.S		standard
 libkern/ia64/__divsi3.S		standard
 libkern/ia64/__moddi3.S		standard
diff --git a/sys/conf/files.mips b/sys/conf/files.mips
index 82d9a69..6522bb2 100644
--- a/sys/conf/files.mips
+++ b/sys/conf/files.mips
@@ -56,6 +56,7 @@ kern/subr_dummy_vdso_tc.c		standard
 libkern/ffsl.c				standard
 libkern/fls.c				standard
 libkern/flsl.c				standard
+libkern/flsll.c				standard
 libkern/memmove.c			standard
 libkern/cmpdi2.c			optional	mips | mipsel
 libkern/ucmpdi2.c			optional	mips | mipsel
diff --git a/sys/conf/files.pc98 b/sys/conf/files.pc98
index fd3ad4a..c95d956 100644
--- a/sys/conf/files.pc98
+++ b/sys/conf/files.pc98
@@ -210,6 +210,7 @@ kern/imgact_gzip.c		optional gzip
 libkern/divdi3.c		standard
 libkern/ffsl.c			standard
 libkern/flsl.c			standard
+libkern/flsll.c			standard
 libkern/memmove.c		standard
 libkern/memset.c		standard
 libkern/moddi3.c		standard
diff --git a/sys/conf/files.powerpc b/sys/conf/files.powerpc
index 6d90fc7..98b3da0 100644
--- a/sys/conf/files.powerpc
+++ b/sys/conf/files.powerpc
@@ -79,6 +79,7 @@ libkern/ffs.c			standard
 libkern/ffsl.c			standard
 libkern/fls.c			standard
 libkern/flsl.c			standard
+libkern/flsll.c			standard
 libkern/lshrdi3.c		optional	powerpc
 libkern/memmove.c		standard
 libkern/memset.c		standard
diff --git a/sys/conf/files.sparc64 b/sys/conf/files.sparc64
index 5c00350..ccee247 100644
--- a/sys/conf/files.sparc64
+++ b/sys/conf/files.sparc64
@@ -68,6 +68,7 @@ libkern/ffs.c			standard
 libkern/ffsl.c			standard
 libkern/fls.c			standard
 libkern/flsl.c			standard
+libkern/flsll.c			standard
 libkern/memmove.c		standard
 sparc64/central/central.c	optional	central
 sparc64/ebus/ebus.c		optional	ebus
diff --git a/sys/i386/include/cpufunc.h b/sys/i386/include/cpufunc.h
index 7cd3663..98f82f2 100644
--- a/sys/i386/include/cpufunc.h
+++ b/sys/i386/include/cpufunc.h
@@ -184,6 +184,14 @@ ffs(int mask)
 	 return (mask == 0 ? mask : (int)bsfl((u_int)mask) + 1);
 }

+#define	HAVE_INLINE_FFSL
+
+static __inline int
+ffsl(long mask)
+{
+	return (ffs((int)mask));
+}
+
 #define	HAVE_INLINE_FLS

 static __inline int
@@ -192,6 +200,14 @@ fls(int mask)
 	return (mask == 0 ? mask : (int)bsrl((u_int)mask) + 1);
 }

+#define	HAVE_INLINE_FLSL
+
+static __inline int
+flsl(long mask)
+{
+	return (fls((int)mask));
+}
+
 #endif /* _KERNEL */

 static __inline void
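
A portable way to sanity-check the wrappers is to compare them against
simple loop-based versions.  This sketch assumes an LP64 target such as
amd64 (the static assertion deliberately fails elsewhere); the ref_ prefix
is hypothetical and keeps the names from clashing with libc or libkern.  It
does not reproduce the patch's bsfq/bsrq inline versions.

```c
/*
 * Loop-based reference versions.  Bits are numbered from 1;
 * a result of 0 means no bit is set.
 */
static int
ref_ffsl(long mask)
{
	unsigned long m = (unsigned long)mask;
	int bit;

	if (m == 0)
		return (0);
	for (bit = 1; (m & 1UL) == 0; bit++)
		m >>= 1;
	return (bit);
}

static int
ref_flsl(long mask)
{
	unsigned long m = (unsigned long)mask;
	int bit = 0;

	while (m != 0) {
		bit++;
		m >>= 1;
	}
	return (bit);
}

/*
 * The forwarding trick: on LP64, long and long long have the same width,
 * so the long long variants can call the long ones.  On ILP32 (i386) the
 * same reasoning applies to int and long instead.
 */
_Static_assert(sizeof(long) == sizeof(long long),
    "forwarding wrappers assume LP64");

static int
ref_ffsll(long long mask)
{
	return (ref_ffsl((long)mask));
}

static int
ref_flsll(long long mask)
{
	return (ref_flsl((long)mask));
}
```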

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Tue Nov 19 21:53:38 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9675B420;
 Tue, 19 Nov 2013 21:53:38 +0000 (UTC)
Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104])
 (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 590F12FAD;
 Tue, 19 Nov 2013 21:53:38 +0000 (UTC)
Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132])
 by mx1.stack.nl (Postfix) with ESMTP id 08160359316;
 Tue, 19 Nov 2013 22:53:35 +0100 (CET)
Received: by turtle.stack.nl (Postfix, from userid 1677)
 id F1085CB4E; Tue, 19 Nov 2013 22:53:34 +0100 (CET)
Date: Tue, 19 Nov 2013 22:53:34 +0100
From: Jilles Tjoelker <jilles@stack.nl>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131119215334.GA30794@stack.nl>
References: <51B3B59B.8050903@erdgeist.org>
 <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com>
 <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131118190142.GA28210@ambrisko.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers@freebsd.org,
 Dirk Engling <erdgeist@erdgeist.org>, Jase Thew <jase@freebsd.org>,
 mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Nov 2013 21:53:38 -0000

On Mon, Nov 18, 2013 at 11:01:42AM -0800, Doug Ambrisko wrote:
> On Sat, Nov 16, 2013 at 08:31:29PM +0200, Konstantin Belousov wrote:
> | I think that struct mount should have a const char * field where the
> | non-trimmed path is stored and used for match at unmount. f_mntonname
> | truncation would be only unfortunate user interface glitch.

> Note that we are not storing the path in mount structure so no structures
> have changed which is nice since then we haven't introduced any real
> ABI breakage.  So we could MFC this.  The match isn't critical since
> umount will fall back to fsid and work.  One thing that might be good to
> do is change umount to try to umount via fsid first and then do the
> match if the fsid failed versus the other way round that it does now.

> The problem I see is if someone tries to do things based on the parsed
> output of mount/df then that will fail since the output is truncated.

As noted in comments in sbin/umount/umount.c, the statfs() call is
deliberately after the mount list checks because it may block forever
for unresponsive NFS servers. It would be unfortunate if hung NFS
filesystems had to be forcibly unmounted by copy/pasting the fsid
from 'mount -v'.

I like the idea of allowing longer mount paths in a simple way, though.

-- 
Jilles Tjoelker

From owner-freebsd-hackers@FreeBSD.ORG  Wed Nov 20 03:29:19 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4639F2B6;
 Wed, 20 Nov 2013 03:29:19 +0000 (UTC)
Received: from mail-qc0-x22f.google.com (mail-qc0-x22f.google.com
 [IPv6:2607:f8b0:400d:c01::22f])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id EF859279A;
 Wed, 20 Nov 2013 03:29:18 +0000 (UTC)
Received: by mail-qc0-f175.google.com with SMTP id v14so581658qcr.20
 for <multiple recipients>; Tue, 19 Nov 2013 19:29:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=wjFe0RBd38gPvx7Bw5sjzObiqXdAH4uBkHG1bXKx1ww=;
 b=U1S8S19+K92GOasO33KHE9kRWYbcQxS10ZLYip7H76LGRd/8FfUQT9CCgBP0v2sZg4
 7NByFN0nxTnRnpmH3huyUOS+gAgkemwK1LhLphs5nBkbqmdjy+5p6vgvabl0RBu2g9o1
 UHyhTSI7G67lrFzfY97fnpTWyVVVJi5wIp7l8iQ6O18P0kB/oBPYXZ/DRmQ7d4a1K6mf
 KgzoNwqovGljXP//EU2NeEtUSHMYE85AAJuKPtOWEhHIygIEQmPqXYPxKcua4qKE8KMp
 +gizYug8RB+AWRvsLFvDNXQU9lqR4riJgQRrXDgLpWEHwGcoVdlNgexlwubsiyeyX0Cw
 YtqA==
MIME-Version: 1.0
X-Received: by 10.49.59.70 with SMTP id x6mr48743897qeq.17.1384918158152; Tue,
 19 Nov 2013 19:29:18 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.224.207.66 with HTTP; Tue, 19 Nov 2013 19:29:18 -0800 (PST)
In-Reply-To: <528B7681.6090806@FreeBSD.org>
References: <5287BDB9.10201@FreeBSD.org>
	<528B7681.6090806@FreeBSD.org>
Date: Tue, 19 Nov 2013 19:29:18 -0800
X-Google-Sender-Auth: I80uT54wLwfHYondPdSMYGqSQ0I
Message-ID: <CAJ-Vmon5AuBDO8q3uddSnvqBTq71r9vW66DAk9oVpLKUUbX0mA@mail.gmail.com>
Subject: Re: taskqueue_block
From: Adrian Chadd <adrian@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Nov 2013 03:29:19 -0000

Yes, and let's fix this. :)


-a

On 19 November 2013 06:32, Andriy Gapon <avg@freebsd.org> wrote:
>
> Forwarding this to the larger audience for a discussion.
>
> -------- Original Message --------
> Message-ID: <5287BDB9.10201@FreeBSD.org>
> Date: Sat, 16 Nov 2013 20:47:21 +0200
> From: Andriy Gapon <avg@FreeBSD.org>
> Subject: taskqueue_block
>
>
>
> It seems that either I do not understand something about taskqueue_block code or
> it is a quite dangerous and abused API.  The fact that it is not properly
> documented does not help either.
>
> The commit message said:
>> Implement taskqueue_block() and taskqueue_unblock(). These functions allow the
>> owner of a queue to block and unblock execution of the tasks in the queue while
> >> allowing tasks to continue to be added to the queue. Combining this with
>> taskqueue_drain() allows a queue to be safely disabled. The unblock function may
> [...]
>
> I indeed see this (anti?) pattern being used in the code.
> But what about the following case?  One thread calls taskqueue_block() and sets
> TQ_FLAGS_BLOCKED.  Another thread calls taskqueue_enqueue(), which adds a task to
> the queue and sets ta_pending of the task to 1.  tq_enqueue is not called, so the
> actual queue runner is not called or woken up.  Then the first thread calls
> taskqueue_drain() on the task.  As far as I can see, that thread would then just
> wait forever, because the task is pending and is not going to be executed.
>
> Additionally, it is impossible to reason about the taskqueue's state after a
> taskqueue_block() call, because the call just sets the flag and does not do any
> synchronization.  And as described above, it is not safe to call APIs that could
> allow the taskqueue or the task state to become known.
>
> I think that taskqueue_block() should wait for the currently active tasks to
> complete.  I don't think that this behavior could be optional.  I don't see any
> reasonable and safe use for a "non-blocking" taskqueue_block().
> taskqueue_drain() calls after taskqueue_block() must be removed.  The code
> should either use taskqueue_drain() or a "blocking" taskqueue_block(), depending
> on the concrete circumstances.
>
> What do you think?
> Thank you.
> --
> Andriy Gapon
>
>
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
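
The scenario above can be reproduced in a toy userland model.  Everything
here is hypothetical: the names, the pthread mutex and condition variable
standing in for the taskqueue lock and sleep/wakeup, and the step bound that
replaces an unbounded sleep so that the hang shows up as a false return.  It
is not the kern/subr_taskqueue.c code.  toy_block_flag_only() models the
current flag-only behaviour; toy_block_and_wait() models the proposed
blocking variant that waits for active tasks.

```c
#include <pthread.h>
#include <stdbool.h>

struct toy_tq {
	pthread_mutex_t	lock;
	pthread_cond_t	idle_cv;	/* signalled when running drops to 0 */
	bool		blocked;	/* models TQ_FLAGS_BLOCKED */
	bool		runner_woken;	/* models tq_enqueue having been called */
	int		pending;	/* models ta_pending of one task */
	int		running;	/* tasks currently executing */
};

/* Current behaviour: only sets the flag, no synchronization. */
static void
toy_block_flag_only(struct toy_tq *tq)
{
	pthread_mutex_lock(&tq->lock);
	tq->blocked = true;
	pthread_mutex_unlock(&tq->lock);
}

/* Proposed behaviour: set the flag, then wait for active tasks to finish. */
static void
toy_block_and_wait(struct toy_tq *tq)
{
	pthread_mutex_lock(&tq->lock);
	tq->blocked = true;
	while (tq->running > 0)
		pthread_cond_wait(&tq->idle_cv, &tq->lock);
	pthread_mutex_unlock(&tq->lock);
}

/* Enqueue: the runner is only woken if the queue is not blocked. */
static void
toy_enqueue(struct toy_tq *tq)
{
	pthread_mutex_lock(&tq->lock);
	tq->pending++;
	if (!tq->blocked)
		tq->runner_woken = true;
	pthread_mutex_unlock(&tq->lock);
}

/* One chance for the queue runner to execute the pending task. */
static void
toy_runner_step(struct toy_tq *tq)
{
	pthread_mutex_lock(&tq->lock);
	if (tq->runner_woken && !tq->blocked && tq->pending > 0) {
		tq->pending = 0;		/* task dequeued */
		tq->runner_woken = false;
		tq->running++;
		pthread_mutex_unlock(&tq->lock);
		/* The task function would run here, without the lock. */
		pthread_mutex_lock(&tq->lock);
		if (--tq->running == 0)
			pthread_cond_broadcast(&tq->idle_cv);
	}
	pthread_mutex_unlock(&tq->lock);
}

/* Drain, bounded to max_steps runner steps instead of sleeping forever. */
static bool
toy_drain_bounded(struct toy_tq *tq, int max_steps)
{
	bool done = false;

	for (int i = 0; i <= max_steps && !done; i++) {
		pthread_mutex_lock(&tq->lock);
		done = (tq->pending == 0 && tq->running == 0);
		pthread_mutex_unlock(&tq->lock);
		if (!done)
			toy_runner_step(tq);
	}
	return (done);
}
```

With the flag-only block, an enqueue followed by a drain never completes,
which is the hang described above; the blocking variant returns only once
no task is running, so the caller knows the queue's state.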

From owner-freebsd-hackers@FreeBSD.ORG  Wed Nov 20 07:55:52 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3C0D4596;
 Wed, 20 Nov 2013 07:55:52 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1A33C2423;
 Wed, 20 Nov 2013 07:55:50 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAK7tXxw094013;
 Wed, 20 Nov 2013 09:55:33 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAK7tXxw094013
Received: (from kostik@localhost)
 by tom.home (8.14.7/8.14.7/Submit) id rAK7tVDp093989;
 Wed, 20 Nov 2013 09:55:31 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 20 Nov 2013 09:55:31 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131120075531.GE59496@kib.kiev.ua>
References: <51B3B59B.8050903@erdgeist.org>
 <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com>
 <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com>
 <20131119074922.GY59496@kib.kiev.ua>
 <20131119174216.GA80753@ambrisko.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="mC3NEAINQo/WgN2+"
Content-Disposition: inline
In-Reply-To: <20131119174216.GA80753@ambrisko.com>
User-Agent: Mutt/1.5.22 (2013-10-16)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: freebsd-hackers@freebsd.org, Dirk Engling <erdgeist@erdgeist.org>,
 Jase Thew <jase@freebsd.org>, mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Nov 2013 07:55:52 -0000


--mC3NEAINQo/WgN2+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
> I was talking about the more general case since the system tries to keep
> the path in the stat structure.  My prior approach which had more issues
> was to modify the stat structure of which I was pointed to NetBSD and their
> change to statvfs which doesn't really solve the problem.  They don't
> have the check to see if the mount is longer then VFS_MNAMELEN (in their case)
> and just truncate things.
>
> If we are just talking about adding it to the mount structure that
> would be okay since it isn't exposed to user land.  I can add that.
Yes, this is exactly what I mean.  Add a struct mount field, and use
it in the kernel only.  In fact, it only matters for sys_unmount() and
kern_jail.c; other locations in the kernel use the path only for warnings,
and those could be postponed if you prefer to minimize the patch.
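
A minimal sketch of why the kernel-only field helps at unmount time,
assuming hypothetical names (struct sketch_mount, mnt_fullpath) and an
88-byte name buffer.  The jail test earlier in the thread shows procfs and
devfs mounted on different long paths whose f_mntonname copies are
identical; a lookup on the truncated name cannot tell them apart, while a
lookup on the full path can.  Searching newest-first is an assumption about
the desired semantics.

```c
#include <stddef.h>
#include <string.h>

#define	SKETCH_MNAMELEN	88	/* assumption: stands in for MNAMELEN */

/* Hypothetical reduced mount record; not the real struct mount. */
struct sketch_mount {
	char		 f_mntonname[SKETCH_MNAMELEN];	/* possibly truncated */
	const char	*mnt_fullpath;	/* kernel-only, never truncated */
};

/* Newest-first match on the truncated name: ambiguous for long paths. */
static const struct sketch_mount *
find_by_statfs_name(const struct sketch_mount *mp, size_t n, const char *path)
{
	for (size_t i = n; i-- > 0;)
		if (strcmp(mp[i].f_mntonname, path) == 0)
			return (&mp[i]);
	return (NULL);
}

/* Newest-first match on the full path: unambiguous. */
static const struct sketch_mount *
find_by_full_path(const struct sketch_mount *mp, size_t n, const char *path)
{
	for (size_t i = n; i-- > 0;)
		if (strcmp(mp[i].mnt_fullpath, path) == 0)
			return (&mp[i]);
	return (NULL);
}
```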

--mC3NEAINQo/WgN2+
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJSjGrzAAoJEJDCuSvBvK1BNYIQAIk75V37Pla/9LCW62TXNEuI
idymxdkG8Rnc0PKH3BfgtpJ+97qTZuI0GPFryyAuZjdT0DUFHni5LQ3lwsmlJJ5m
lLjaMEkZbGumMocAI311l+5n9BYiSNivwHdeJFl3uBA9yZSbK98n2QJJDdqK6CMk
LTaYCT0caoPacvJ8SbtfL0g9qqaGuE3t8ny+cBry+wSeS94PyDx+SzZ2vYLCyael
yLCzELHUzklQGpTuSU4e+sudr9km1y5pu60VpKiI46EB6kZLAe679PzP9VIBwgA+
fHlR2Q7NkgiETH1acAe6a8Qja6V2x+ETUHsVMTljyFuVKtYQJrT0l7M8swJKjBtG
PU16oCNPAfw6Rzz9+mFGqBAlFanoPVkb2l2C4fXzcPuyavlwJZQ2HE0b6i10uNFh
y53zmJYLHz0VZtUTcSOBdRrBbS5eInEckZyLzUBL3c/GUcSeZbdy+kTc+3DEFiMh
oSSQRsiwzebUB2woocbqFtxutySsUC9mNoA3o2JvPiWe+whj9PNPvlRK9+JJ4Wl/
i0oA1tBgC0AKuzp7M+jm6aIe8TnElxjirw/bfRU7+g1wsb3DPN5mEb85RHf2F3HB
49TIRCiJ/TzsSaeY1Vw6287QRU//xcZqus1NZV0d9grk9WxZ02gltnJ2SpqhTTUG
EcKVtz963Ek1zy4+nq6l
=WZx2
-----END PGP SIGNATURE-----

--mC3NEAINQo/WgN2+--

From owner-freebsd-hackers@FreeBSD.ORG  Wed Nov 20 17:01:47 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 31610105
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 17:01:47 +0000 (UTC)
Received: from mail.dusatko.org (static-84-242-66-51.net.upcbroadband.cz
 [84.242.66.51]) by mx1.freebsd.org (Postfix) with ESMTP id D91BE2988
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 17:01:46 +0000 (UTC)
Received: from mail.dusatko.org (localhost [127.0.0.1])
 by mail.dusatko.org (Postfix) with ESMTP id C1A782A1F
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 17:48:07 +0100 (CET)
Received: from Relict (Relict.praha.dusatko [192.168.253.33])
 by mail.dusatko.org (Postfix) with ESMTPA id 28DEB2A1D
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 17:48:06 +0100 (CET)
From: =?iso-8859-2?B?SmFuIER1ueF0a28=?= <jan@dusatko.org>
To: <freebsd-hackers@FreeBSD.org>
Subject: ZFS pool cheating
Date: Wed, 20 Nov 2013 17:47:16 +0100
Message-ID: <029f01cee610$4567f870$d037e950$@org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Ac7mD/qSZDI57tCkTDuqTUiP8kAVEA==
Content-Language: cs
X-Mailman-Approved-At: Wed, 20 Nov 2013 17:43:33 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
Reply-To: jan@dusatko.org
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Nov 2013 17:01:47 -0000

Dear all,
Does anyone know a way to convert a pool from a concatenation to a
regular mirror? By mistake I replaced a failed disk in the pool using add
instead of replace, which changed the pool configuration.
I am looking for a method that keeps the pool online during the whole
replacement procedure.

Regards

Jan



From owner-freebsd-hackers@FreeBSD.ORG  Wed Nov 20 18:55:44 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id AECCB6D4
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 18:55:44 +0000 (UTC)
Received: from smtp.fagskolen.gjovik.no (smtp.fagskolen.gjovik.no
 [IPv6:2001:700:1100:1:200:ff:fe00:b])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 21F472167
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 18:55:43 +0000 (UTC)
Received: from mail.fig.ol.no (localhost [127.0.0.1])
 by mail.fig.ol.no (8.14.7/8.14.7) with ESMTP id rAKItZSV068537
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Wed, 20 Nov 2013 19:55:35 +0100 (CET)
 (envelope-from trond@fagskolen.gjovik.no)
Received: from localhost (trond@localhost)
 by mail.fig.ol.no (8.14.7/8.14.7/Submit) with ESMTP id rAKItZ6k068534;
 Wed, 20 Nov 2013 19:55:35 +0100 (CET)
 (envelope-from trond@fagskolen.gjovik.no)
X-Authentication-Warning: mail.fig.ol.no: trond owned process doing -bs
Date: Wed, 20 Nov 2013 19:55:34 +0100 (CET)
From: =?ISO-8859-1?Q?Trond_Endrest=F8l?= <Trond.Endrestol@fagskolen.gjovik.no>
Sender: Trond.Endrestol@fagskolen.gjovik.no
To: =?UTF-8?Q?Jan_Du=C5=A1=C3=A1tko?= <jan@dusatko.org>
Subject: Re: ZFS pool cheating
In-Reply-To: <029f01cee610$4567f870$d037e950$@org>
Message-ID: <alpine.BSF.2.00.1311201944490.26875@mail.fig.ol.no>
References: <029f01cee610$4567f870$d037e950$@org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
Organization: Fagskolen Innlandet
OpenPGP: url=http://fig.ol.no/~trond/trond.key
MIME-Version: 1.0
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED
 autolearn=unavailable version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.fig.ol.no
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.16
Cc: freebsd-hackers@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Nov 2013 18:55:44 -0000

On Wed, 20 Nov 2013 17:47+0100, Jan Dušátko wrote:

> Dear all,
> Does anyone know a way to convert a pool from a concatenation to a
> regular mirror? By mistake I replaced a failed disk in the pool using add
> instead of replace, which changed the pool configuration.
> I am looking for a method that keeps the pool online during the whole
> replacement procedure.
> 
> Regards
> 
> Jan

I'm afraid your only option is something along these lines:

1. Make a recursive snapshot of the entire pool.

2. Send a recursive ZFS stream of the recursive snapshots to another 
pool, or disk.

Beware of the danger of data loss by having only a single set of 
snapshots available as you proceed.

3. Destroy the old pool.

4. Recreate the original pool in a mirrored configuration.

5. Transfer the recursive ZFS stream back to the new pool using the 
zfs receive command.

6. Remove the recursive snapshots, if warranted.

If you're able to set up a fresh pair of disks in your server, you might
be able to transfer the snapshots to a mirrored pool with a temporary
name. Then export both the current pool and the temporary pool, and
import the temporary pool, renaming it to the correct name as you
import it.

I guess/hope someone more knowledgeable on ZFS will chime in and 
correct me.

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob.   952 62 567,       | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+

From owner-freebsd-hackers@FreeBSD.ORG  Wed Nov 20 19:48:24 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 177ED6A3
 for <freebsd-hackers@freebsd.org>; Wed, 20 Nov 2013 19:48:24 +0000 (UTC)
Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id B7A1B248E
 for <freebsd-hackers@freebsd.org>; Wed, 20 Nov 2013 19:48:23 +0000 (UTC)
Received: from [194.32.164.24] (80-46-130-69.static.dsl.as9105.com
 [80.46.130.69])
 by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id rAKJZ5ib005129;
 Wed, 20 Nov 2013 19:35:06 GMT (envelope-from rb@gid.co.uk)
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1822\))
Subject: Re: ZFS pool cheating
From: Bob Bishop <rb@gid.co.uk>
In-Reply-To: <029f01cee610$4567f870$d037e950$@org>
Date: Wed, 20 Nov 2013 19:33:21 +0000
Content-Transfer-Encoding: quoted-printable
Message-Id: <31C2CB4A-F792-4088-96D9-77F1C991D6F7@gid.co.uk>
References: <029f01cee610$4567f870$d037e950$@org>
To: jan@dusatko.org
X-Mailer: Apple Mail (2.1822)
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Nov 2013 19:48:24 -0000

Hi,

On 20 Nov 2013, at 16:47, Jan Dušátko <jan@dusatko.org> wrote:

> Dear all,
> Does anyone know a way to convert a pool from a concatenation to a
> regular mirror? By mistake I replaced a failed disk in the pool using add
> instead of replace, which changed the pool configuration.
> I am looking for a method that lets me keep the pool online during the
> whole replacement procedure.

If you can connect two extra disks, you can make a concatenated mirror
(each existing disk becomes a two-way mirror). If you can connect them
without a reboot, you can do the whole procedure online.

With existing disks d1, d2 and new disks d3 (at least as big as d1) and
d4 (at least as big as d2):

zpool attach <pool> d1 d3
zpool attach <pool> d2 d4

Otherwise I think you are in for some downtime.

> Regards
>
> Jan

--
Bob Bishop
rb@gid.co.uk





From owner-freebsd-hackers@FreeBSD.ORG  Wed Nov 20 20:14:35 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C5A7F31B
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 20:14:35 +0000 (UTC)
Received: from mail.dusatko.org (static-84-242-66-51.net.upcbroadband.cz
 [84.242.66.51]) by mx1.freebsd.org (Postfix) with ESMTP id 867B62656
 for <freebsd-hackers@FreeBSD.org>; Wed, 20 Nov 2013 20:14:34 +0000 (UTC)
Received: from mail.dusatko.org (localhost [127.0.0.1])
 by mail.dusatko.org (Postfix) with ESMTP id B3823209F;
 Wed, 20 Nov 2013 21:14:34 +0100 (CET)
Received: from Relict (Relict.praha.dusatko [192.168.253.33])
 by mail.dusatko.org (Postfix) with ESMTPA id 2133E209E;
 Wed, 20 Nov 2013 21:14:34 +0100 (CET)
From: =?UTF-8?B?SmFuIER1xaHDoXRrbw==?= <jan@dusatko.org>
To: <Trond.Endrestol@fagskolen.gjovik.no>
References: <029f01cee610$4567f870$d037e950$@org>
 <alpine.BSF.2.00.1311201944490.26875@mail.fig.ol.no>
In-Reply-To: <alpine.BSF.2.00.1311201944490.26875@mail.fig.ol.no>
Subject: RE: ZFS pool cheating
Date: Wed, 20 Nov 2013 21:14:26 +0100
Message-ID: <007201cee62d$1c75c4c0$55614e40$@org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Ac7mIhxlNoBg+0huQ2KmspoXVejeqgACr2KA
Content-Language: cs
X-Mailman-Approved-At: Wed, 20 Nov 2013 23:26:22 +0000
Cc: freebsd-hackers@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
Reply-To: jan@dusatko.org
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Nov 2013 20:14:35 -0000

Have you tried sending a snapshot over the network?
I have prepared a backup using tar, and I currently plan to take a ZFS
snapshot. If it is possible to send it over SSH, I can minimize downtime
and at the same time check/verify that the new pool with the same name
works.

Regards

Jan

-----Original Message-----
From: Trond.Endrestol@fagskolen.gjovik.no [mailto:Trond.Endrestol@fagskolen.gjovik.no]
Sent: 20 November 2013 19:56
To: Jan Dušátko
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: ZFS pool cheating

On Wed, 20 Nov 2013 17:47+0100, Jan Dušátko wrote:

> Dear all,
> Does anyone know a way to convert a pool from a concatenation to a
> regular mirror? By mistake I replaced a failed disk in the pool using add
> instead of replace, which changed the pool configuration.
> I am looking for a method that lets me keep the pool online during the
> whole replacement procedure.
>
> Regards
>
> Jan

I'm afraid your only option is something along these lines:

1. Make a recursive snapshot of the entire pool.

2. Send a recursive ZFS stream of the recursive snapshots to another
pool, or disk.

Beware of the risk of data loss: you will have only a single copy of the
snapshots available as you proceed.

3. Destroy the old pool.

4. Recreate the original pool in a mirrored configuration.

5. Transfer the recursive ZFS stream back to the new pool using the zfs
receive command.

6. Remove the recursive snapshots, if warranted.

If you're able to set up a fresh pair of disks in your server, you might
be able to transfer the snapshots to a mirrored pool with a temporary
name. Then export both the current pool and the temporary pool, and
import the temporary pool, renaming it to the correct name as you
import it.

I guess/hope someone more knowledgeable on ZFS will chime in and correct
me.

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob.   952 62 567,       | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+


From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 07:18:16 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 22951E57
 for <freebsd-hackers@FreeBSD.org>; Thu, 21 Nov 2013 07:18:16 +0000 (UTC)
Received: from smtp.fagskolen.gjovik.no (smtp.fagskolen.gjovik.no
 [IPv6:2001:700:1100:1:200:ff:fe00:b])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id BAC8C28C7
 for <freebsd-hackers@FreeBSD.org>; Thu, 21 Nov 2013 07:18:15 +0000 (UTC)
Received: from mail.fig.ol.no (localhost [127.0.0.1])
 by mail.fig.ol.no (8.14.7/8.14.7) with ESMTP id rAL7I9ec074830
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Thu, 21 Nov 2013 08:18:09 +0100 (CET)
 (envelope-from trond@fagskolen.gjovik.no)
Received: from localhost (trond@localhost)
 by mail.fig.ol.no (8.14.7/8.14.7/Submit) with ESMTP id rAL7I98I074827;
 Thu, 21 Nov 2013 08:18:09 +0100 (CET)
 (envelope-from trond@fagskolen.gjovik.no)
X-Authentication-Warning: mail.fig.ol.no: trond owned process doing -bs
Date: Thu, 21 Nov 2013 08:18:09 +0100 (CET)
From: =?ISO-8859-1?Q?Trond_Endrest=F8l?= <Trond.Endrestol@fagskolen.gjovik.no>
Sender: Trond.Endrestol@fagskolen.gjovik.no
To: =?UTF-8?Q?Jan_Du=C5=A1=C3=A1tko?= <jan@dusatko.org>
Subject: RE: ZFS pool cheating
In-Reply-To: <007201cee62d$1c75c4c0$55614e40$@org>
Message-ID: <alpine.BSF.2.00.1311210814170.26875@mail.fig.ol.no>
References: <029f01cee610$4567f870$d037e950$@org>
 <alpine.BSF.2.00.1311201944490.26875@mail.fig.ol.no>
 <007201cee62d$1c75c4c0$55614e40$@org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
Organization: Fagskolen Innlandet
OpenPGP: url=http://fig.ol.no/~trond/trond.key
MIME-Version: 1.0
Content-ID: <alpine.BSF.2.00.1311210817350.26875@mail.fig.ol.no>
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED
 autolearn=unavailable version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.fig.ol.no
Content-Type: TEXT/PLAIN; CHARSET=ISO-8859-1
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.16
Cc: freebsd-hackers@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 07:18:16 -0000

> -----Original Message-----
> From: Trond.Endrestol@fagskolen.gjovik.no [mailto:Trond.Endrestol@fagskolen.gjovik.no] 
> Sent: 20 November 2013 19:56
> To: Jan Dušátko
> Cc: freebsd-hackers@FreeBSD.org
> Subject: Re: ZFS pool cheating
> 
> On Wed, 20 Nov 2013 17:47+0100, Jan Dušátko wrote:
> 
> > Dear all,
> > Does anyone know a way to convert a pool from a concatenation to a
> > regular mirror? By mistake I replaced a failed disk in the pool using
> > add instead of replace, which changed the pool configuration.
> > I am looking for a method that lets me keep the pool online during the
> > whole replacement procedure.
> > 
> > Regards
> > 
> > Jan
> 
> I'm afraid your only option is something along these lines:
> 
> 1. Make a recursive snapshot of the entire pool.
> 
> 2. Send a recursive ZFS stream of the recursive snapshots to another pool, or disk.
> 
> Beware of the risk of data loss: you will have only a single copy of the snapshots available as you proceed.
> 
> 3. Destroy the old pool.
> 
> 4. Recreate the original pool in a mirrored configuration.
> 
> 5. Transfer the recursive ZFS stream back to the new pool using the zfs receive command.
> 
> 6. Remove the recursive snapshots, if warranted.
> 
> If you're able to set up a fresh pair of disks in your server, you
> might be able to transfer the snapshots to a mirrored pool with a
> temporary name. Then export both the current pool and the temporary
> pool, and import the temporary pool, renaming it to the correct name
> as you import it.
> 
> I guess/hope someone more knowledgeable on ZFS will chime in and 
> correct me.

On Wed, 20 Nov 2013 21:14+0100, Jan Dušátko wrote:

> Have you tried sending a snapshot over the network?

No, I haven't tried anything; this is just off the top of my head.

> I have prepared a backup using tar, and I currently plan to take a ZFS
> snapshot. If it is possible to send it over SSH, I can minimize
> downtime and at the same time check/verify that the new pool with the
> same name works.

BTW, please don't top-post.

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob.   952 62 567,       | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 12:39:09 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8C6A4D6C
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 12:39:09 +0000 (UTC)
Received: from mail-la0-x229.google.com (mail-la0-x229.google.com
 [IPv6:2a00:1450:4010:c03::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1C664248B
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 12:39:08 +0000 (UTC)
Received: by mail-la0-f41.google.com with SMTP id eo20so2371763lab.14
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 04:39:06 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:date:from:user-agent:mime-version:to:subject
 :content-type:content-transfer-encoding;
 bh=K4FL/i/vxeWzjvXlndKh+r4XivLxwROw8fuJeC1VQek=;
 b=ZMNZYpNWrPS8vttBedCX/xxSGqkmqhQf4+TdHO2RfGmofnpsB5sTekDIVkm2nIjLen
 fx/6h/YJVgffFG/L2olwSSzsFAo/t0+e3mAaBkIr6DteHHtoOmZajlNc5bb7MdyN3TZo
 5wAFu3ubTepyDt0j8STvk23TifwZy0lTQw/6IzYNH8faGcFuTRBdKOkelMWxSxRnf7Bs
 W6eedLotQdz2mIlbhvrKNDpXd6e4GIpDbtIptqebuLRi1XJ45U3xc2SBPO4/YV6U4Oit
 CXO/cb+EYUEOCigHNY7BX2sYBxSIdcs3/EKp9nGGzqSB6y0KlDr47NSc9nzxjYwh2O+q
 ee+A==
X-Received: by 10.152.170.199 with SMTP id ao7mr1130074lac.40.1385037546278;
 Thu, 21 Nov 2013 04:39:06 -0800 (PST)
Received: from [172.16.0.2] (tx97.net. [85.198.160.156])
 by mx.google.com with ESMTPSA id 8sm32490258laq.5.2013.11.21.04.39.04
 for <freebsd-hackers@freebsd.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Thu, 21 Nov 2013 04:39:05 -0800 (PST)
Message-ID: <528DFEE6.6020504@gmail.com>
Date: Thu, 21 Nov 2013 14:39:02 +0200
From: Vitaly Magerya <vmagerya@gmail.com>
User-Agent: Thunderbird
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Problem with signal 0 being delivered to SIGUSR1 handler
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 12:39:09 -0000

Hi, folks. I'm investigating a test case failure that devel/boehm-gc
has on recent FreeBSD releases. The problem is that a signal
handler registered for SIGUSR1 is sometimes called with signum=0,
which should not be possible under any conditions.

Here's a simple test case that demonstrates this behavior:

/* Compile with 'c99 -o example example.c -pthread'
 */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void signal_handler(int signum, siginfo_t *si, void *context) {
    if (signum != SIGUSR1) {
        printf("bad signal, signum=%d\n", signum);
        exit(1);
    }
}

void *thread_func(void *arg) {
    return arg;
}

int main(void) {
    struct sigaction sa = { 0 };
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = signal_handler;
    if (sigfillset(&sa.sa_mask) != 0) abort();
    if (sigaction(SIGUSR1, &sa, NULL) != 0) abort();
    for (int i = 0; i < 10000; i++) {
        pthread_t t;
        /* t is only valid if pthread_create() succeeded. */
        if (pthread_create(&t, NULL, thread_func, NULL) != 0) abort();
        pthread_kill(t, SIGUSR1);
    }
    return 0;
}

Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get
"signum=0" from this program, but you may need to run it a few
times or increase the number of iterations to see the same.

Interestingly enough, I don't see this behavior under 9.0-RELEASE.

So, any ideas what the problem here is?

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 13:20:59 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 24C93A9E
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 13:20:59 +0000 (UTC)
Received: from mail-pd0-x22d.google.com (mail-pd0-x22d.google.com
 [IPv6:2607:f8b0:400e:c02::22d])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 040FF2732
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 13:20:58 +0000 (UTC)
Received: by mail-pd0-f173.google.com with SMTP id p10so3479531pdj.4
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 05:20:58 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:date:message-id:subject:from:to:content-type;
 bh=pz2rtkKRO8mN68MDPxdGRx02eJlbiE8MKW4/uRr1IgA=;
 b=EBR/3IVyC7741eAvTJfcejU7UGbxpyjp9tqw1Q2dEVJExRHfXChf4V4LZyrE0n/VR5
 h+1baJ3lOsxhV7ZF1dNXU9UoK4RI470IiI4Zly0A4A66Jr1qu2bTMhkL7dNl9k6jNMdg
 Zv5ahILqWJFPgXje24nAo+HMdjajIsIwlXHLH97MJV90Zp5z9s7/6NcDOl1SrEssS/QQ
 /aRfjN7F18ei9gRNXxZblU655g37vV/mrTC+7lp4kWKeMWbqGquh788R3Vu065D076Hr
 Odl37LJUG5VTB9K3iyJ/Z3eSVT8sOe4t2jRHD7efHt05fwnGnein9UN+/zWlT1ncwnuE
 FqFA==
MIME-Version: 1.0
X-Received: by 10.68.218.3 with SMTP id pc3mr6245928pbc.71.1385040056398; Thu,
 21 Nov 2013 05:20:56 -0800 (PST)
Received: by 10.70.41.133 with HTTP; Thu, 21 Nov 2013 05:20:56 -0800 (PST)
Date: Thu, 21 Nov 2013 07:20:56 -0600
Message-ID: <CAGm6yaTEFECTYVb94A13TaXMPSLtKLpTbw4iNdgd8SuNF1QDaA@mail.gmail.com>
Subject: 9.1 callout behavior
From: Bret Ketchum <bcketchum@gmail.com>
To: freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Thu, 21 Nov 2013 13:27:31 +0000
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.16
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 13:20:59 -0000

     I have a callout that runs every 100ms and does a bit of accounting
using the global ticks variable. In 8.1 this one-shot callout was called
fairly consistently, every 100ms give or take a few thousand clocks. I
recently upgraded to 9.1, and for the most part the period is consistent.
However, periodically the callout function is executed anywhere from 5ms
to 20ms after the callout was reset and the function returned, while global
ticks has increased 8x. The hardware has not changed (it uses the same
timecounter configuration):

CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.05-MHz K8-class CPU)

kern.timecounter.hardware: TSC-low
kern.timecounter.tick: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.smp_tsc: 1

     And default eventtimer configuration:

kern.eventtimer.singlemul: 2
kern.eventtimer.idletick: 0
kern.eventtimer.activetick: 1
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 0

    If tickless mode is disabled the inconsistency goes away. Is the
premature expiration of the callout expected? Is the jump in global ticks
typical (say from 100 ticks to 800 ticks in 1.5ms)?

    Bret

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 17:40:36 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B697FC73;
 Thu, 21 Nov 2013 17:40:36 +0000 (UTC)
Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90])
 by mx1.freebsd.org (Postfix) with ESMTP id 6FE6226B3;
 Thu, 21 Nov 2013 17:40:36 +0000 (UTC)
X-Ambrisko-Me: Yes
Received: from server2.ambrisko.com (HELO internal.ambrisko.com)
 ([192.168.1.2])
 by ironport.ambrisko.com with ESMTP; 21 Nov 2013 09:44:23 -0800
Received: from ambrisko.com (localhost [127.0.0.1])
 by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id rALHeUcq087761;
 Thu, 21 Nov 2013 09:40:30 -0800 (PST)
 (envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
 by ambrisko.com (8.14.4/8.14.4/Submit) id rALHeSQ0087758;
 Thu, 21 Nov 2013 09:40:28 -0800 (PST) (envelope-from ambrisko)
Date: Thu, 21 Nov 2013 09:40:28 -0800
From: Doug Ambrisko <ambrisko@ambrisko.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131121174028.GA80520@ambrisko.com>
References: <51B3B59B.8050903@erdgeist.org>
 <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com> <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com> <20131119074922.GY59496@kib.kiev.ua>
 <20131119174216.GA80753@ambrisko.com> <20131120075531.GE59496@kib.kiev.ua>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131120075531.GE59496@kib.kiev.ua>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-hackers@freebsd.org, Dirk Engling <erdgeist@erdgeist.org>,
 Jase Thew <jase@freebsd.org>, mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 17:40:36 -0000

On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote:
| On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
| > I was talking about the more general case since the system tries to keep
| > the path in the stat structure.  My prior approach, which had more issues,
| > was to modify the stat structure; I was pointed to NetBSD and their
| > change to statvfs, which doesn't really solve the problem.  They don't
| > have the check to see if the mount path is longer than VFS_MNAMELEN (in
| > their case) and just truncate things.
| > 
| > If we are just talking about adding it to the mount structure that
| > would be okay since it isn't exposed to user land.  I can add that.
|
| Yes, this is exactly what I mean.  Add a struct mount field, and use
| it in the kernel only.  In fact, it only matters for sys_unmount() and
| kern_jail.c; other locations in the kernel use the path for warnings, and
| this could be postponed if you prefer to minimize the patch.

Okay, I went through all of the occurrences and compile-tested them (except
for #DEBUG).  I unified a few things but should do more once I get
consensus on the approach.  I found a few spots that should be updated as
well and made the length checks more consistent: some were doing >= and
others >.  So this should be better, though the patch is a lot larger.  On
the plus side, once we figure out how to return the longer path to userland,
that will be easier since the kernel is already tracking the full path.
The main things to note are the changes in:
	ZFS to mount snapshot
	cd9660 for symlinks
	fuse to return full path
	jail to check statfs and mount
	mount/umount to save and check full path
	mountroot to save new field for full path
	
Just in case it doesn't make it through email, the full patch is at:
	http://people.freebsd.org/~ambrisko/mount_bigger.patch

Thanks,

Doug A.

Index: cddl/compat/opensolaris/kern/opensolaris_vfs.c
===================================================================
--- cddl/compat/opensolaris/kern/opensolaris_vfs.c	(revision 257489)
+++ cddl/compat/opensolaris/kern/opensolaris_vfs.c	(working copy)
@@ -126,7 +126,7 @@
 	 * variables will fit in our mp buffers, including the
 	 * terminating NUL.
 	 */
-	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)
+	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MAXPATHLEN)
 		return (ENAMETOOLONG);
 
 	vfsp = vfs_byname_kld(fstype, td, &error);
Index: cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
===================================================================
--- cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c	(revision 257489)
+++ cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c	(working copy)
@@ -1069,12 +1069,12 @@
 
 	dmu_objset_rele(snap, FTAG);
 domount:
-	mountpoint_len = strlen(dvp->v_vfsp->mnt_stat.f_mntonname) +
+	mountpoint_len = strlen(dvp->v_vfsp->mnt_path) +
 	    strlen("/" ZFS_CTLDIR_NAME "/snapshot/") + strlen(nm) + 1;
 	mountpoint = kmem_alloc(mountpoint_len, KM_SLEEP);
 	(void) snprintf(mountpoint, mountpoint_len,
 	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
-	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
+	    dvp->v_vfsp->mnt_path, nm);
 	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
 	kmem_free(mountpoint, mountpoint_len);
 	if (err == 0) {
Index: fs/cd9660/cd9660_rrip.c
===================================================================
--- fs/cd9660/cd9660_rrip.c	(revision 257489)
+++ fs/cd9660/cd9660_rrip.c	(working copy)
@@ -167,7 +167,7 @@
 			/* same as above */
 			outbuf -= len;
 			len = 0;
-			inbuf = ana->imp->im_mountp->mnt_stat.f_mntonname;
+			inbuf = (char *)ana->imp->im_mountp->mnt_path;
 			wlen = strlen(inbuf);
 			break;
 
Index: fs/ext2fs/ext2_lookup.c
===================================================================
--- fs/ext2fs/ext2_lookup.c	(revision 257489)
+++ fs/ext2fs/ext2_lookup.c	(working copy)
@@ -802,10 +802,10 @@
 	mp = ITOV(ip)->v_mount;
 	if ((mp->mnt_flag & MNT_RDONLY) == 0)
 		panic("ext2_dirbad: %s: bad dir ino %lu at offset %ld: %s\n",
-			mp->mnt_stat.f_mntonname, (u_long)ip->i_number,(long)offset, how);
+			mp->mnt_path, (u_long)ip->i_number,(long)offset, how);
 	else
 	(void)printf("%s: bad dir ino %lu at offset %ld: %s\n",
-            mp->mnt_stat.f_mntonname, (u_long)ip->i_number, (long)offset, how);
+            mp->mnt_path, (u_long)ip->i_number, (long)offset, how);
 
 }
 
Index: fs/fuse/fuse_vnops.c
===================================================================
--- fs/fuse/fuse_vnops.c	(revision 257489)
+++ fs/fuse/fuse_vnops.c	(working copy)
@@ -1265,7 +1265,7 @@
 	}
 	if (((char *)fdi.answ)[0] == '/' &&
 	    fuse_get_mpdata(vnode_mount(vp))->dataflags & FSESS_PUSH_SYMLINKS_IN) {
-		char *mpth = vnode_mount(vp)->mnt_stat.f_mntonname;
+		char *mpth = (char *)vnode_mount(vp)->mnt_path;
 
 		err = uiomove(mpth, strlen(mpth), uio);
 	}
Index: fs/nandfs/nandfs_segment.c
===================================================================
--- fs/nandfs/nandfs_segment.c	(revision 257489)
+++ fs/nandfs/nandfs_segment.c	(working copy)
@@ -1275,7 +1275,7 @@
 
 	mp = (struct mount *)addr;
 	db_printf("%p %s on %s (%s)\n", mp, mp->mnt_stat.f_mntfromname,
-	    mp->mnt_stat.f_mntonname, mp->mnt_stat.f_fstypename);
+	    mp->mnt_path, mp->mnt_stat.f_fstypename);
 
 
 	nmp = (struct nandfsmount *)(mp->mnt_data);
Index: fs/nullfs/null_vfsops.c
===================================================================
--- fs/nullfs/null_vfsops.c	(revision 257489)
+++ fs/nullfs/null_vfsops.c	(working copy)
@@ -211,7 +211,7 @@
 	vfs_mountedfrom(mp, target);
 
 	NULLFSDEBUG("nullfs_mount: lower %s, alias at %s\n",
-		mp->mnt_stat.f_mntfromname, mp->mnt_stat.f_mntonname);
+		mp->mnt_stat.f_mntfromname, mp->mnt_path);
 	return (0);
 }
 
Index: fs/unionfs/union_vfsops.c
===================================================================
--- fs/unionfs/union_vfsops.c	(revision 257489)
+++ fs/unionfs/union_vfsops.c	(working copy)
@@ -310,7 +310,7 @@
 	copystr(target, tmp, len, NULL);
 
 	UNIONFSDEBUG("unionfs_mount: from %s, on %s\n",
-	    mp->mnt_stat.f_mntfromname, mp->mnt_stat.f_mntonname);
+	    mp->mnt_stat.f_mntfromname, mp->mnt_path);
 
 	return (0);
 }
Index: geom/journal/g_journal.c
===================================================================
--- geom/journal/g_journal.c	(revision 257489)
+++ geom/journal/g_journal.c	(working copy)
@@ -2922,7 +2922,7 @@
 			goto next;
 		}
 
-		mountpoint = mp->mnt_stat.f_mntonname;
+		mountpoint = (char *)mp->mnt_path;
 
 		error = vn_start_write(NULL, &mp, V_WAIT);
 		if (error != 0) {
Index: gnu/fs/reiserfs/reiserfs_vfsops.c
===================================================================
--- gnu/fs/reiserfs/reiserfs_vfsops.c	(revision 257489)
+++ gnu/fs/reiserfs/reiserfs_vfsops.c	(working copy)
@@ -309,7 +309,7 @@
 	reiserfs_log(LOG_DEBUG, "...done\n");
 
 	if (sbp != &mp->mnt_stat) {
-		reiserfs_log(LOG_DEBUG, "copying monut point info\n");
+		reiserfs_log(LOG_DEBUG, "copying mount point info\n");
 		sbp->f_type = mp->mnt_vfc->vfc_typenum;
 		bcopy((caddr_t)mp->mnt_stat.f_mntonname,
 		    (caddr_t)&sbp->f_mntonname[0], MNAMELEN);
@@ -318,7 +318,7 @@
 		reiserfs_log(LOG_DEBUG, "  mount from: %s\n",
 		    sbp->f_mntfromname);
 		reiserfs_log(LOG_DEBUG, "  mount on:   %s\n",
-		    sbp->f_mntonname);
+		    mp->mnt_path);
 		reiserfs_log(LOG_DEBUG, "...done\n");
 	}
 
Index: kern/kern_jail.c
===================================================================
--- kern/kern_jail.c	(revision 257489)
+++ kern/kern_jail.c	(working copy)
@@ -3555,7 +3555,6 @@
 prison_canseemount(struct ucred *cred, struct mount *mp)
 {
 	struct prison *pr;
-	struct statfs *sp;
 	size_t len;
 
 	pr = cred->cr_prison;
@@ -3574,14 +3573,13 @@
 	if (strcmp(pr->pr_path, "/") == 0)
 		return (0);
 	len = strlen(pr->pr_path);
-	sp = &mp->mnt_stat;
-	if (strncmp(pr->pr_path, sp->f_mntonname, len) != 0)
+	if (strncmp(pr->pr_path, mp->mnt_path, len) != 0)
 		return (ENOENT);
 	/*
 	 * Be sure that we don't have situation where jail's root directory
 	 * is "/some/path" and mount point is "/some/pathpath".
 	 */
-	if (sp->f_mntonname[len] != '\0' && sp->f_mntonname[len] != '/')
+	if (mp->mnt_path[len] != '\0' && mp->mnt_path[len] != '/')
 		return (ENOENT);
 	return (0);
 }
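The visibility rule in prison_canseemount() can be tried out in userspace. What follows is a minimal sketch, not the kernel code. path_is_under() is a hypothetical name. It returns 1 or 0, whereas the kernel function returns 0 or ENOENT.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace copy of the prefix test in prison_canseemount():
 * a mount point is visible to a jail rooted at `root` only if `root` is a
 * prefix of `mntpath` that ends on a path-component boundary.
 */
static int
path_is_under(const char *root, const char *mntpath)
{
	size_t len;

	assert(root != NULL && mntpath != NULL);
	if (strcmp(root, "/") == 0)
		return (1);		/* a jail rooted at "/" sees every mount */
	len = strlen(root);
	if (strncmp(root, mntpath, len) != 0)
		return (0);		/* not even a textual prefix */
	/* Reject "/some/pathpath" when the jail root is "/some/path". */
	return (mntpath[len] == '\0' || mntpath[len] == '/');
}
```

For example, path_is_under("/some/path", "/some/path/dev") is 1, while path_is_under("/some/path", "/some/pathpath") is 0.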
Index: kern/vfs_mount.c
===================================================================
--- kern/vfs_mount.c	(revision 257489)
+++ kern/vfs_mount.c	(working copy)
@@ -473,6 +473,7 @@
 	mp->mnt_cred = crdup(cred);
 	mp->mnt_stat.f_owner = cred->cr_uid;
 	strlcpy(mp->mnt_stat.f_mntonname, fspath, MNAMELEN);
+	strlcpy((char *)mp->mnt_path, fspath, MAXPATHLEN);
 	mp->mnt_iosize_max = DFLTPHYS;
 #ifdef MAC
 	mac_mount_init(mp);
@@ -656,7 +657,7 @@
 	 * variables will fit in our mp buffers, including the
 	 * terminating NUL.
 	 */
-	if (fstypelen > MFSNAMELEN || fspathlen > MNAMELEN) {
+	if (fstypelen > MFSNAMELEN || fspathlen > MAXPATHLEN) {
 		error = ENAMETOOLONG;
 		goto bail;
 	}
@@ -748,8 +749,8 @@
 		return (EOPNOTSUPP);
 	}
 
-	ma = mount_argsu(ma, "fstype", uap->type, MNAMELEN);
-	ma = mount_argsu(ma, "fspath", uap->path, MNAMELEN);
+	ma = mount_argsu(ma, "fstype", uap->type, MFSNAMELEN);
+	ma = mount_argsu(ma, "fspath", uap->path, MAXPATHLEN);
 	ma = mount_argb(ma, flags & MNT_RDONLY, "noro");
 	ma = mount_argb(ma, !(flags & MNT_NOSUID), "nosuid");
 	ma = mount_argb(ma, !(flags & MNT_NOEXEC), "noexec");
@@ -1040,7 +1041,7 @@
 	 * variables will fit in our mp buffers, including the
 	 * terminating NUL.
 	 */
-	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)
+	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MAXPATHLEN)
 		return (ENAMETOOLONG);
 
 	if (jailed(td->td_ucred) || usermount == 0) {
@@ -1095,9 +1096,9 @@
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	vp = nd.ni_vp;
 	if ((fsflags & MNT_UPDATE) == 0) {
-		pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK);
+		pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
 		strcpy(pathbuf, fspath);
-		error = vn_path_to_global_path(td, vp, pathbuf, MNAMELEN);
+		error = vn_path_to_global_path(td, vp, pathbuf, MAXPATHLEN);
 		/* debug.disablefullpath == 1 results in ENODEV */
 		if (error == 0 || error == ENODEV) {
 			error = vfs_domount_first(td, vfsp, pathbuf, vp,
@@ -1147,8 +1148,8 @@
 			return (error);
 	}
 
-	pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK);
-	error = copyinstr(uap->path, pathbuf, MNAMELEN, NULL);
+	pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
+	error = copyinstr(uap->path, pathbuf, MAXPATHLEN, NULL);
 	if (error) {
 		free(pathbuf, M_TEMP);
 		return (error);
@@ -1179,13 +1180,13 @@
 		if (namei(&nd) == 0) {
 			NDFREE(&nd, NDF_ONLY_PNBUF);
 			error = vn_path_to_global_path(td, nd.ni_vp, pathbuf,
-			    MNAMELEN);
+			    MAXPATHLEN);
 			if (error == 0 || error == ENODEV)
 				vput(nd.ni_vp);
 		}
 		mtx_lock(&mountlist_mtx);
 		TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) {
-			if (strcmp(mp->mnt_stat.f_mntonname, pathbuf) == 0)
+			if (strcmp(mp->mnt_path, pathbuf) == 0)
 				break;
 		}
 		mtx_unlock(&mountlist_mtx);
Index: kern/vfs_mountroot.c
===================================================================
--- kern/vfs_mountroot.c	(revision 257489)
+++ kern/vfs_mountroot.c	(working copy)
@@ -307,6 +307,8 @@
 				vp->v_mountedhere = mporoot;
 				strlcpy(mporoot->mnt_stat.f_mntonname,
 				    fspath, MNAMELEN);
+				strlcpy((char *)mporoot->mnt_path,
+				    fspath, MAXPATHLEN);
 				VOP_UNLOCK(vp, 0);
 			} else
 				vput(vp);
Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c	(revision 257489)
+++ kern/vfs_subr.c	(working copy)
@@ -2962,7 +2962,7 @@
 		TAILQ_FOREACH(mp, &mountlist, mnt_list) {
 			db_printf("%p %s on %s (%s)\n", mp,
 			    mp->mnt_stat.f_mntfromname,
-			    mp->mnt_stat.f_mntonname,
+			    mp->mnt_path,
 			    mp->mnt_stat.f_fstypename);
 			if (db_pager_quit)
 				break;
@@ -2973,7 +2973,7 @@
 
 	mp = (struct mount *)addr;
 	db_printf("%p %s on %s (%s)\n", mp, mp->mnt_stat.f_mntfromname,
-	    mp->mnt_stat.f_mntonname, mp->mnt_stat.f_fstypename);
+	    mp->mnt_path, mp->mnt_stat.f_fstypename);
 
 	buf[0] = '\0';
 	mflags = mp->mnt_flag;
@@ -3406,7 +3406,7 @@
 			 */
 			if (strcmp(mp->mnt_vfc->vfc_name, "devfs") != 0) {
 				printf("unmount of %s failed (",
-				    mp->mnt_stat.f_mntonname);
+				    mp->mnt_path);
 				if (error == EBUSY)
 					printf("BUSY)\n");
 				else
Index: security/mac_lomac/mac_lomac.c
===================================================================
--- security/mac_lomac/mac_lomac.c	(revision 257489)
+++ security/mac_lomac/mac_lomac.c	(working copy)
@@ -569,7 +569,7 @@
 		    "mountpount=%s)\n",
 		    subjlabeltext, p->p_pid, pgid, curthread->td_ucred->cr_uid,
 		    p->p_comm, subjtext, actionname, objlabeltext, objname,
-		    va.va_fileid, vp->v_mount->mnt_stat.f_mntonname);
+		    va.va_fileid, vp->v_mount->mnt_path);
 	} else {
 		log(LOG_INFO, "LOMAC: level-%s subject p%dg%du%d:%s demoted to"
 		    " level %s after %s a level-%s %s\n",
Index: sys/mount.h
===================================================================
--- sys/mount.h	(revision 257489)
+++ sys/mount.h	(working copy)
@@ -190,6 +190,7 @@
 	struct lock	mnt_explock;		/* vfs_export walkers lock */
 	TAILQ_ENTRY(mount) mnt_upper_link;	/* (m) we in the all uppers */
 	TAILQ_HEAD(, mount) mnt_uppers;		/* (m) upper mounts over us*/
+	const char	mnt_path[MAXPATHLEN];	/* actual mount path */
 };
 
 /*
Index: ufs/ffs/ffs_alloc.c
===================================================================
--- ufs/ffs/ffs_alloc.c	(revision 257489)
+++ ufs/ffs/ffs_alloc.c	(working copy)
@@ -2748,7 +2748,7 @@
 	case FFS_SET_FLAGS:
 #ifdef DEBUG
 		if (fsckcmds)
-			printf("%s: %s flags\n", mp->mnt_stat.f_mntonname,
+			printf("%s: %s flags\n", mp->mnt_path,
 			    cmd.size > 0 ? "set" : "clear");
 #endif /* DEBUG */
 		if (cmd.size > 0)
@@ -2761,7 +2761,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: adjust inode %jd link count by %jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value,
+			    mp->mnt_path, (intmax_t)cmd.value,
 			    (intmax_t)cmd.size);
 		}
 #endif /* DEBUG */
@@ -2782,7 +2782,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: adjust inode %jd block count by %jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value,
+			    mp->mnt_path, (intmax_t)cmd.value,
 			    (intmax_t)cmd.size);
 		}
 #endif /* DEBUG */
@@ -2804,12 +2804,12 @@
 		if (fsckcmds) {
 			if (cmd.size == 1)
 				printf("%s: free %s inode %ju\n",
-				    mp->mnt_stat.f_mntonname,
+				    mp->mnt_path,
 				    filetype == IFDIR ? "directory" : "file",
 				    (uintmax_t)cmd.value);
 			else
 				printf("%s: free %s inodes %ju-%ju\n",
-				    mp->mnt_stat.f_mntonname,
+				    mp->mnt_path,
 				    filetype == IFDIR ? "directory" : "file",
 				    (uintmax_t)cmd.value,
 				    (uintmax_t)(cmd.value + cmd.size - 1));
@@ -2829,11 +2829,11 @@
 		if (fsckcmds) {
 			if (cmd.size == 1)
 				printf("%s: free block %jd\n",
-				    mp->mnt_stat.f_mntonname,
+				    mp->mnt_path,
 				    (intmax_t)cmd.value);
 			else
 				printf("%s: free blocks %jd-%jd\n",
-				    mp->mnt_stat.f_mntonname, 
+				    mp->mnt_path, 
 				    (intmax_t)cmd.value,
 				    (intmax_t)cmd.value + cmd.size - 1);
 		}
@@ -2860,7 +2860,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: adjust number of directories by %jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value);
+			    mp->mnt_path, (intmax_t)cmd.value);
 		}
 #endif /* DEBUG */
 		fs->fs_cstotal.cs_ndir += cmd.value;
@@ -2870,7 +2870,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: adjust number of free blocks by %+jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value);
+			    mp->mnt_path, (intmax_t)cmd.value);
 		}
 #endif /* DEBUG */
 		fs->fs_cstotal.cs_nbfree += cmd.value;
@@ -2880,7 +2880,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: adjust number of free inodes by %+jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value);
+			    mp->mnt_path, (intmax_t)cmd.value);
 		}
 #endif /* DEBUG */
 		fs->fs_cstotal.cs_nifree += cmd.value;
@@ -2890,7 +2890,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: adjust number of free frags by %+jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value);
+			    mp->mnt_path, (intmax_t)cmd.value);
 		}
 #endif /* DEBUG */
 		fs->fs_cstotal.cs_nffree += cmd.value;
@@ -2900,7 +2900,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: adjust number of free clusters by %+jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value);
+			    mp->mnt_path, (intmax_t)cmd.value);
 		}
 #endif /* DEBUG */
 		fs->fs_cstotal.cs_numclusters += cmd.value;
@@ -2910,7 +2910,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: set current directory to inode %jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value);
+			    mp->mnt_path, (intmax_t)cmd.value);
 		}
 #endif /* DEBUG */
 		if ((error = ffs_vget(mp, (ino_t)cmd.value, LK_SHARED, &vp)))
@@ -2933,7 +2933,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: change .. in cwd from %jd to %jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value,
+			    mp->mnt_path, (intmax_t)cmd.value,
 			    (intmax_t)cmd.size);
 		}
 #endif /* DEBUG */
@@ -2972,7 +2972,7 @@
 			if (copyinstr((char *)(intptr_t)cmd.value, buf,32,NULL))
 				strncpy(buf, "Name_too_long", 32);
 			printf("%s: unlink %s (inode %jd)\n",
-			    mp->mnt_stat.f_mntonname, buf, (intmax_t)cmd.size);
+			    mp->mnt_path, buf, (intmax_t)cmd.size);
 		}
 #endif /* DEBUG */
 		/*
@@ -2994,7 +2994,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: update inode %jd\n",
-			    mp->mnt_stat.f_mntonname, (intmax_t)cmd.value);
+			    mp->mnt_path, (intmax_t)cmd.value);
 		}
 #endif /* DEBUG */
 		if ((error = ffs_vget(mp, (ino_t)cmd.value, LK_EXCLUSIVE, &vp)))
@@ -3028,7 +3028,7 @@
 #ifdef DEBUG
 		if (fsckcmds) {
 			printf("%s: %s buffered output for descriptor %jd\n",
-			    mp->mnt_stat.f_mntonname,
+			    mp->mnt_path,
 			    cmd.size == 1 ? "enable" : "disable",
 			    (intmax_t)cmd.value);
 		}
Index: ufs/ffs/ffs_snapshot.c
===================================================================
--- ufs/ffs/ffs_snapshot.c	(revision 257489)
+++ ufs/ffs/ffs_snapshot.c	(working copy)
@@ -693,7 +693,7 @@
 		nanotime(&endtime);
 		timespecsub(&endtime, &starttime);
 		printf("%s: suspended %ld.%03ld sec, redo %ld of %d\n",
-		    vp->v_mount->mnt_stat.f_mntonname, (long)endtime.tv_sec,
+		    vp->v_mount->mnt_path, (long)endtime.tv_sec,
 		    endtime.tv_nsec / 1000000, redo, fs->fs_ncg);
 	}
 	if (copy_fs == NULL)
Index: ufs/ffs/ffs_softdep.c
===================================================================
--- ufs/ffs/ffs_softdep.c	(revision 257489)
+++ ufs/ffs/ffs_softdep.c	(working copy)
@@ -733,7 +733,7 @@
  * Internal function prototypes.
  */
 static	void check_clear_deps(struct mount *);
-static	void softdep_error(char *, int);
+static	void softdep_error(const char *, int);
 static	int softdep_process_worklist(struct mount *, int);
 static	int softdep_waitidle(struct mount *);
 static	void drain_output(struct vnode *);
@@ -13771,7 +13771,7 @@
 	if ((bp->b_ioflags & BIO_ERROR) == 0)
 		panic("softdep_deallocate_dependencies: dangling deps");
 	if (bp->b_vp != NULL && bp->b_vp->v_mount != NULL)
-		softdep_error(bp->b_vp->v_mount->mnt_stat.f_mntonname, bp->b_error);
+		softdep_error(bp->b_vp->v_mount->mnt_path, bp->b_error);
 	else
 		printf("softdep_deallocate_dependencies: "
 		    "got error %d while accessing filesystem\n", bp->b_error);
@@ -13784,7 +13784,7 @@
  */
 static void
 softdep_error(func, error)
-	char *func;
+	const char *func;
 	int error;
 {
 
@@ -13916,7 +13916,7 @@
 db_print_ffs(struct ufsmount *ump)
 {
 	db_printf("mp %p %s devvp %p fs %p su_wl %d su_deps %d su_req %d\n",
-	    ump->um_mountp, ump->um_mountp->mnt_stat.f_mntonname,
+	    ump->um_mountp, ump->um_mountp->mnt_path,
 	    ump->um_devvp, ump->um_fs, ump->softdep_on_worklist,
 	    ump->softdep_deps, ump->softdep_req);
 }
Index: ufs/ffs/ffs_vfsops.c
===================================================================
--- ufs/ffs/ffs_vfsops.c	(revision 257489)
+++ ufs/ffs/ffs_vfsops.c	(working copy)
@@ -533,7 +533,7 @@
 		 * We need the name for the mount point (also used for
 		 * "last mounted on") copied in. If an error occurs,
 		 * the mount point is discarded by the upper level code.
-		 * Note that vfs_mount() populates f_mntonname for us.
+		 * Note that vfs_mount() populates mnt_path for us.
 		 */
 		if ((error = ffs_mountfs(devvp, mp, td)) != 0) {
 			vrele(devvp);
@@ -885,13 +885,13 @@
 		} else {
 			printf("WARNING: %s: GJOURNAL flag on fs "
 			    "but no gjournal provider below\n",
-			    mp->mnt_stat.f_mntonname);
+			    mp->mnt_path);
 			free(mp->mnt_gjprovider, M_UFSMNT);
 			mp->mnt_gjprovider = NULL;
 		}
 #else
 		printf("WARNING: %s: GJOURNAL flag on fs but no "
-		    "UFS_GJOURNAL support\n", mp->mnt_stat.f_mntonname);
+		    "UFS_GJOURNAL support\n", mp->mnt_path);
 #endif
 	} else {
 		mp->mnt_gjprovider = NULL;
@@ -976,7 +976,7 @@
 		MNT_IUNLOCK(mp);
 #else
 		printf("WARNING: %s: multilabel flag on fs but "
-		    "no MAC support\n", mp->mnt_stat.f_mntonname);
+		    "no MAC support\n", mp->mnt_path);
 #endif
 	}
 	if ((fs->fs_flags & FS_ACLS) != 0) {
@@ -986,7 +986,7 @@
 		if (mp->mnt_flag & MNT_NFS4ACLS)
 			printf("WARNING: %s: ACLs flag on fs conflicts with "
 			    "\"nfsv4acls\" mount option; option ignored\n",
-			    mp->mnt_stat.f_mntonname);
+			    mp->mnt_path);
 		mp->mnt_flag &= ~MNT_NFS4ACLS;
 		mp->mnt_flag |= MNT_ACLS;
 
@@ -993,7 +993,7 @@
 		MNT_IUNLOCK(mp);
 #else
 		printf("WARNING: %s: ACLs flag on fs but no ACLs support\n",
-		    mp->mnt_stat.f_mntonname);
+		    mp->mnt_path);
 #endif
 	}
 	if ((fs->fs_flags & FS_NFS4ACLS) != 0) {
@@ -1003,7 +1003,7 @@
 		if (mp->mnt_flag & MNT_ACLS)
 			printf("WARNING: %s: NFSv4 ACLs flag on fs conflicts "
 			    "with \"acls\" mount option; option ignored\n",
-			    mp->mnt_stat.f_mntonname);
+			    mp->mnt_path);
 		mp->mnt_flag &= ~MNT_ACLS;
 		mp->mnt_flag |= MNT_NFS4ACLS;
 
@@ -1010,7 +1010,7 @@
 		MNT_IUNLOCK(mp);
 #else
 		printf("WARNING: %s: NFSv4 ACLs flag on fs but no "
-		    "ACLs support\n", mp->mnt_stat.f_mntonname);
+		    "ACLs support\n", mp->mnt_path);
 #endif
 	}
 	if ((fs->fs_flags & FS_TRIM) != 0) {
@@ -1020,11 +1020,11 @@
 			if (!ump->um_candelete)
 				printf("WARNING: %s: TRIM flag on fs but disk "
 				    "does not support TRIM\n",
-				    mp->mnt_stat.f_mntonname);
+				    mp->mnt_path);
 		} else {
 			printf("WARNING: %s: TRIM flag on fs but disk does "
 			    "not confirm that it supports TRIM\n",
-			    mp->mnt_stat.f_mntonname);
+			    mp->mnt_path);
 			ump->um_candelete = 0;
 		}
 	}
@@ -1044,7 +1044,7 @@
 	 * Set FS local "last mounted on" information (NULL pad)
 	 */
 	bzero(fs->fs_fsmnt, MAXMNTLEN);
-	strlcpy(fs->fs_fsmnt, mp->mnt_stat.f_mntonname, MAXMNTLEN);
+	strlcpy(fs->fs_fsmnt, mp->mnt_path, MAXMNTLEN);
 	mp->mnt_stat.f_iosize = fs->fs_bsize;
 
 	if (mp->mnt_flag & MNT_ROOTFS) {
@@ -1241,7 +1241,7 @@
 	if ((error = ufs_extattr_stop(mp, td))) {
 		if (error != EOPNOTSUPP)
 			printf("WARNING: unmount %s: ufs_extattr_stop "
-			    "returned errno %d\n", mp->mnt_stat.f_mntonname,
+			    "returned errno %d\n", mp->mnt_path,
 			    error);
 		e_restart = 0;
 	} else {
Index: ufs/ufs/ufs_extattr.c
===================================================================
--- ufs/ufs/ufs_extattr.c	(revision 257489)
+++ ufs/ufs/ufs_extattr.c	(working copy)
@@ -923,7 +923,7 @@
 		 * up by the next write or extattrctl clean.
 		 */
 		printf("ufs_extattr_get (%s): inode number inconsistency (%d, %ju)\n",
-		    mp->mnt_stat.f_mntonname, ueh.ueh_i_gen, (uintmax_t)ip->i_gen);
+		    mp->mnt_path, ueh.ueh_i_gen, (uintmax_t)ip->i_gen);
 		error = ENOATTR;
 		goto vopunlock_exit;
 	}
@@ -1228,7 +1228,7 @@
 		 * the next write or extattrctl clean.
 		 */
 		printf("ufs_extattr_rm (%s): inode number inconsistency (%d, %jd)\n",
-		    mp->mnt_stat.f_mntonname, ueh.ueh_i_gen, (intmax_t)ip->i_gen);
+		    mp->mnt_path, ueh.ueh_i_gen, (intmax_t)ip->i_gen);
 		error = ENOATTR;
 		goto vopunlock_exit;
 	}
Index: ufs/ufs/ufs_lookup.c
===================================================================
--- ufs/ufs/ufs_lookup.c	(revision 257489)
+++ ufs/ufs/ufs_lookup.c	(working copy)
@@ -771,11 +771,11 @@
 	mp = ITOV(ip)->v_mount;
 	if ((mp->mnt_flag & MNT_RDONLY) == 0)
 		panic("ufs_dirbad: %s: bad dir ino %ju at offset %ld: %s",
-		    mp->mnt_stat.f_mntonname, (uintmax_t)ip->i_number,
+		    mp->mnt_path, (uintmax_t)ip->i_number,
 		    (long)offset, how);
 	else
 		(void)printf("%s: bad dir ino %ju at offset %ld: %s\n",
-		    mp->mnt_stat.f_mntonname, (uintmax_t)ip->i_number,
+		    mp->mnt_path, (uintmax_t)ip->i_number,
 		    (long)offset, how);
 }
 
Index: ufs/ufs/ufs_quota.c
===================================================================
--- ufs/ufs/ufs_quota.c	(revision 257489)
+++ ufs/ufs/ufs_quota.c	(working copy)
@@ -238,7 +238,7 @@
 		DQI_UNLOCK(dq);
 		if (warn)
 			uprintf("\n%s: warning, %s disk quota exceeded\n",
-			    ITOV(ip)->v_mount->mnt_stat.f_mntonname,
+			    ITOV(ip)->v_mount->mnt_path,
 			    quotatypes[i]);
 	}
 	return (0);
@@ -264,7 +264,7 @@
 			dq->dq_flags |= DQ_BLKS;
 			DQI_UNLOCK(dq);
 			uprintf("\n%s: write failed, %s disk limit reached\n",
-			    ITOV(ip)->v_mount->mnt_stat.f_mntonname,
+			    ITOV(ip)->v_mount->mnt_path,
 			    quotatypes[type]);
 			return (EDQUOT);
 		}
@@ -289,7 +289,7 @@
 				DQI_UNLOCK(dq);
 				uprintf("\n%s: write failed, %s "
 				    "disk quota exceeded for too long\n",
-				    ITOV(ip)->v_mount->mnt_stat.f_mntonname,
+				    ITOV(ip)->v_mount->mnt_path,
 				    quotatypes[type]);
 				return (EDQUOT);
 			}
@@ -382,7 +382,7 @@
 		DQI_UNLOCK(dq);
 		if (warn)
 			uprintf("\n%s: warning, %s inode quota exceeded\n",
-			    ITOV(ip)->v_mount->mnt_stat.f_mntonname,
+			    ITOV(ip)->v_mount->mnt_path,
 			    quotatypes[i]);
 	}
 	return (0);
@@ -407,7 +407,7 @@
 			dq->dq_flags |= DQ_INODS;
 			DQI_UNLOCK(dq);
 			uprintf("\n%s: write failed, %s inode limit reached\n",
-			    ITOV(ip)->v_mount->mnt_stat.f_mntonname,
+			    ITOV(ip)->v_mount->mnt_path,
 			    quotatypes[type]);
 			return (EDQUOT);
 		}
@@ -432,7 +432,7 @@
 				DQI_UNLOCK(dq);
 				uprintf("\n%s: write failed, %s "
 				    "inode quota exceeded for too long\n",
-				    ITOV(ip)->v_mount->mnt_stat.f_mntonname,
+				    ITOV(ip)->v_mount->mnt_path,
 				    quotatypes[type]);
 				return (EDQUOT);
 			}
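The patch moves the mount-on path out of the MNAMELEN-sized f_mntonname and into a MAXPATHLEN-sized mnt_path. The small userspace sketch below shows why the buffer size matters. It assumes MNAMELEN is 88 and MAXPATHLEN is 1024, as on FreeBSD at the time.

The names SKETCH_MNAMELEN, SKETCH_MAXPATHLEN, path_fits() and bounded_copy() are hypothetical. bounded_copy() emulates strlcpy() with snprintf(), because some C libraries lack strlcpy().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MNAMELEN		88	/* assumed size of f_mntonname */
#define SKETCH_MAXPATHLEN	1024	/* assumed MAXPATHLEN */

/*
 * A path of strlen() n needs n + 1 bytes, so it fits only when
 * n < bufsize (equivalently, reject it when n >= bufsize).
 */
static int
path_fits(const char *path, size_t bufsize)
{
	assert(path != NULL && bufsize > 0);
	return (strlen(path) < bufsize);
}

/*
 * strlcpy()-like copy built on snprintf(): always NUL-terminates and
 * returns strlen(src), so a result >= bufsize means truncation.
 */
static size_t
bounded_copy(char *dst, const char *src, size_t bufsize)
{
	int n;

	assert(dst != NULL && src != NULL && bufsize > 0);
	n = snprintf(dst, bufsize, "%s", src);
	return (n < 0 ? 0 : (size_t)n);
}
```

With these assumed sizes, a 100-character mount path fits in a MAXPATHLEN buffer. In an MNAMELEN buffer it is cut to 87 characters plus the NUL. Avoiding that truncation appears to be what the separate mnt_path is for.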

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 19:43:03 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5DD8442A;
 Thu, 21 Nov 2013 19:43:03 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 379542FED;
 Thu, 21 Nov 2013 19:43:03 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 586F0B98A;
 Thu, 21 Nov 2013 14:43:02 -0500 (EST)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Subject: Re: taskqueue_block
Date: Thu, 21 Nov 2013 14:14:06 -0500
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20130906; KDE/4.5.5; amd64; ; )
References: <5287BDB9.10201@FreeBSD.org> <528B7681.6090806@FreeBSD.org>
 <CAJ-Vmon5AuBDO8q3uddSnvqBTq71r9vW66DAk9oVpLKUUbX0mA@mail.gmail.com>
In-Reply-To: <CAJ-Vmon5AuBDO8q3uddSnvqBTq71r9vW66DAk9oVpLKUUbX0mA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201311211414.06849.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 21 Nov 2013 14:43:02 -0500 (EST)
Cc: Adrian Chadd <adrian@freebsd.org>, Andriy Gapon <avg@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 19:43:03 -0000

On Tuesday, November 19, 2013 10:29:18 pm Adrian Chadd wrote:
> Yes, and let's fix this. :)

Hmm, is taskqueue_block() always used in a context where waiting is safe?
 
> On 19 November 2013 06:32, Andriy Gapon <avg@freebsd.org> wrote:
> >
> > Forwarding this to the larger audience for a discussion.
> >
> > -------- Original Message --------
> > Message-ID: <5287BDB9.10201@FreeBSD.org>
> > Date: Sat, 16 Nov 2013 20:47:21 +0200
> > From: Andriy Gapon <avg@FreeBSD.org>
> > Subject: taskqueue_block
> >
> >
> >
> > It seems that either I do not understand something about taskqueue_block code or
> > it is quite a dangerous and abused API.  The fact that it is not properly
> > documented does not help either.
> >
> > The commit message said:
> >> Implement taskqueue_block() and taskqueue_unblock(). These functions allow the
> >> owner of a queue to block and unblock execution of the tasks in the queue while
> >> allowing tasks to continue to be added to the queue. Combining this with
> >> taskqueue_drain() allows a queue to be safely disabled. The unblock function may
> > [...]
> >
> > I indeed see this (anti?) pattern being used in the code.
> > But what about the following case.   One thread calls taskqueue_block() and sets
> > TQ_FLAGS_BLOCKED.  Another thread calls taskqueue_enqueue, this adds a task to
> > the queue and sets ta_pending of the task to 1.  tq_enqueue is not called, so an
> > actual queue runner is not called or woken up.   Then the first thread calls
> > taskqueue_drain() on the task.  As far as I can see, the thread would then just
> > wait forever because the task is pending and is not going to be executed.
> >
> > Additionally, it is impossible to reason about the taskqueue's state after
> > taskqueue_block call, because the call just sets the flag and does not do any
> > synchronization.  And as described above, it is not safe to call APIs that could
> > allow the taskqueue or the task state to become known.
> >
> > I think that taskqueue_block() should wait on the currently active tasks to
> > complete.  I don't think that this behavior could be optional.  I don't see any
> > reasonable and safe use for "non-blocking" taskqueue_block().
> > taskqueue_drain() calls after taskqueue_block() must be removed.  The code
> > should either use taskqueue_drain() or "blocking" taskqueue_block() depending on
> > concrete circumstances.
> >
> > What do you think?
> > Thank you.
> > --
> > Andriy Gapon
> >
> >
> >
> > _______________________________________________
> > freebsd-hackers@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
> 

-- 
John Baldwin
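The hang Andriy describes can be reproduced with a small pthreads model. This is a hypothetical userspace sketch, not the taskqueue(9) implementation. struct model_tq and the model_*() functions are invented names. The blocked flag stands in for TQ_FLAGS_BLOCKED and pending stands in for ta_pending. model_drain() gives up after a timeout, where taskqueue_drain() would sleep forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical model of one taskqueue holding a single task. */
struct model_tq {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	pthread_t	worker;
	int		pending;	/* stands in for ta_pending */
	int		running;	/* the task body is executing */
	int		blocked;	/* stands in for TQ_FLAGS_BLOCKED */
	int		quit;		/* asks the worker to exit */
};

static void *
model_worker(void *arg)
{
	struct model_tq *tq = arg;

	pthread_mutex_lock(&tq->mtx);
	for (;;) {
		/* A blocked queue is never serviced, even with work pending. */
		while (!tq->quit && (tq->blocked || tq->pending == 0))
			pthread_cond_wait(&tq->cv, &tq->mtx);
		if (tq->quit)
			break;
		tq->pending = 0;
		tq->running = 1;
		pthread_mutex_unlock(&tq->mtx);
		/* The task body would run here, without the lock held. */
		pthread_mutex_lock(&tq->mtx);
		tq->running = 0;
		pthread_cond_broadcast(&tq->cv);
	}
	pthread_mutex_unlock(&tq->mtx);
	return (NULL);
}

/* Returns the pthread_create() error, 0 on success. */
static int
model_start(struct model_tq *tq)
{
	pthread_mutex_init(&tq->mtx, NULL);
	pthread_cond_init(&tq->cv, NULL);
	tq->pending = tq->running = tq->blocked = tq->quit = 0;
	return (pthread_create(&tq->worker, NULL, model_worker, tq));
}

static void
model_stop(struct model_tq *tq)
{
	pthread_mutex_lock(&tq->mtx);
	tq->quit = 1;
	pthread_cond_broadcast(&tq->cv);
	pthread_mutex_unlock(&tq->mtx);
	pthread_join(tq->worker, NULL);
}

/* Only flips the flag; like the original API, it waits for nothing. */
static void
model_set_blocked(struct model_tq *tq, int blocked)
{
	pthread_mutex_lock(&tq->mtx);
	tq->blocked = blocked;
	pthread_cond_broadcast(&tq->cv);
	pthread_mutex_unlock(&tq->mtx);
}

static void
model_enqueue(struct model_tq *tq)
{
	pthread_mutex_lock(&tq->mtx);
	tq->pending++;
	pthread_cond_broadcast(&tq->cv);
	pthread_mutex_unlock(&tq->mtx);
}

/* Like taskqueue_drain(); the timeout stands in for "sleeps forever". */
static int
model_drain(struct model_tq *tq, int timeout_sec)
{
	struct timespec ts;
	int busy;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += timeout_sec;
	pthread_mutex_lock(&tq->mtx);
	while (tq->pending || tq->running)
		if (pthread_cond_timedwait(&tq->cv, &tq->mtx, &ts) == ETIMEDOUT)
			break;
	busy = tq->pending || tq->running;
	pthread_mutex_unlock(&tq->mtx);
	return (busy ? ETIMEDOUT : 0);
}
```

While the queue is blocked, an enqueued task stays pending and a drain can only time out. Once the queue is unblocked, the same drain returns promptly.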

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 20:18:09 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 79DA77C8;
 Thu, 21 Nov 2013 20:18:09 +0000 (UTC)
Received: from mail-qa0-x236.google.com (mail-qa0-x236.google.com
 [IPv6:2607:f8b0:400d:c00::236])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1C4242259;
 Thu, 21 Nov 2013 20:18:09 +0000 (UTC)
Received: by mail-qa0-f54.google.com with SMTP id f11so4401168qae.13
 for <multiple recipients>; Thu, 21 Nov 2013 12:18:08 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=ktYvXsMjnzS6RttthW0sWR+ZnBMCGo/gnxADDTbXmD8=;
 b=ufBv5B7fbLvrtXIJYMWtEAQpIFkB3iRU3CwH3CU6RqlecnLGqN8SX1r04j4RhHyoLE
 8hlaVzBsQGOtbORfVR60NK1d7ydGKdtyiRYHrcPBk7GiTICvx14Y93+pHZpiJA5s4k2K
 BdcaE/fPV4VVNOl7lhP5Zkns7EEWiTtW2V5cz/Zp0Siws46EOQKFOQ7d6iacQXPpCewQ
 7Wx3yI8VCrpujWJawc671Zq4NeeutbAHDRU+uhHuF/TV/2QMwUUhC5v5tqCOqp/oksf1
 4l+tSjdU8lEsjTtOa5S0tuO5EUxfM5xhgi3qcwEHc97oH7Ua763BeWPQv+Io79lvXU+K
 A3dg==
MIME-Version: 1.0
X-Received: by 10.229.13.69 with SMTP id b5mr14764956qca.13.1385065088135;
 Thu, 21 Nov 2013 12:18:08 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.224.207.66 with HTTP; Thu, 21 Nov 2013 12:18:08 -0800 (PST)
In-Reply-To: <201311211414.06849.jhb@freebsd.org>
References: <5287BDB9.10201@FreeBSD.org> <528B7681.6090806@FreeBSD.org>
 <CAJ-Vmon5AuBDO8q3uddSnvqBTq71r9vW66DAk9oVpLKUUbX0mA@mail.gmail.com>
 <201311211414.06849.jhb@freebsd.org>
Date: Thu, 21 Nov 2013 12:18:08 -0800
X-Google-Sender-Auth: HsVRcgu5OF3X7r92FDpzIFeeMoY
Message-ID: <CAJ-VmomHZpJ+XQz6rrZp2W9_4=2zJ53=6x3B79qO-gL4Hjc4dA@mail.gmail.com>
Subject: Re: taskqueue_block
From: Adrian Chadd <adrian@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 Andriy Gapon <avg@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 20:18:09 -0000

On 21 November 2013 11:14, John Baldwin <jhb@freebsd.org> wrote:
> On Tuesday, November 19, 2013 10:29:18 pm Adrian Chadd wrote:
>> Yes, and let's fix this. :)
>
> Hmm, is taskqueue_block() always used in a context where waiting is safe?

I seem to recall that a task function may wish to block further
jobs from running. The trouble is that, since it is called from a task
queued to that particular taskqueue, a waiting taskqueue_block() would
hang. Sigh.

So yes, some slightly saner semantics would be nice.


-adrian
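A waiting taskqueue_block() of the kind Andriy suggests would have to cope with the case above, where the caller is itself a task on the queue being blocked. Below is one possible shape, sketched with pthreads. It is a design sketch, not existing FreeBSD code, and struct bq, bq_init(), bq_block_wait() and bq_worker() are hypothetical names.

The call always sets the flag. From another thread it then waits for the running task to finish. From the worker itself it returns EDEADLK instead of waiting on its own task.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical minimal queue state, not the kernel's struct taskqueue. */
struct bq {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	pthread_t	worker;		/* valid once have_worker is set */
	int		have_worker;
	int		running;	/* a task body is executing */
	int		blocked;
	int		task_error;	/* what the task's own call returned */
};

static void
bq_init(struct bq *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->cv, NULL);
	q->have_worker = q->running = q->blocked = q->task_error = 0;
}

/*
 * "Blocking" block: set the flag, then wait for the running task to
 * finish.  Called from the worker, waiting would mean waiting on its own
 * task, so the flag is set but EDEADLK is returned without sleeping.
 */
static int
bq_block_wait(struct bq *q)
{
	int error = 0;

	pthread_mutex_lock(&q->mtx);
	q->blocked = 1;
	if (q->have_worker && pthread_equal(pthread_self(), q->worker))
		error = EDEADLK;
	else
		while (q->running)
			pthread_cond_wait(&q->cv, &q->mtx);
	pthread_mutex_unlock(&q->mtx);
	return (error);
}

/* Worker thread running one task that tries to block its own queue. */
static void *
bq_worker(void *arg)
{
	struct bq *q = arg;
	int error;

	pthread_mutex_lock(&q->mtx);
	q->worker = pthread_self();
	q->have_worker = 1;
	q->running = 1;
	pthread_mutex_unlock(&q->mtx);

	error = bq_block_wait(q);	/* the task body */

	pthread_mutex_lock(&q->mtx);
	q->task_error = error;
	q->running = 0;
	pthread_cond_broadcast(&q->cv);
	pthread_mutex_unlock(&q->mtx);
	return (NULL);
}
```

Returning an error rather than silently skipping the wait lets a caller that really needs the "no task is running" guarantee notice that it did not get one.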

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 20:19:24 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C33B58E9;
 Thu, 21 Nov 2013 20:19:24 +0000 (UTC)
Received: from mail-qe0-x22a.google.com (mail-qe0-x22a.google.com
 [IPv6:2607:f8b0:400d:c02::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 77F40226B;
 Thu, 21 Nov 2013 20:19:24 +0000 (UTC)
Received: by mail-qe0-f42.google.com with SMTP id t9so234688qeq.1
 for <multiple recipients>; Thu, 21 Nov 2013 12:19:23 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=kpGDszWBrPsRKcq3ThWD4VaQexDymEzZEyTqFrjvITA=;
 b=Gu1JBQArnz+VfCZ2R4U+4lT/4Ss/35NRuLa52jOY0uUdoaoo9Zfm0dnUOJ0FdQBzHw
 YgKNuXphSf3xTl8PIz/WiGDTd0ltdC7VqQF64CN63ZIate0fIlTDjIvWT+w+Vvzqq+MZ
 GKqNvUHck8YM2sQ97zapx8kMyYJKeE5eH2kc1XyZxEeS/jkRVQF1aJ/KA80oLWtreUiY
 OO9nGC+FjvWgeMjMx80ohUp4GXGAfJKNg5JuFIRhA4r7X5ybmDA9LhLouCDYzQKmkxQv
 pT0+q9eosqojie96n/wA3nsaDb6dkylHmVRNFr0vyk6JmukMoSeq4mupRcOGObUY/1o6
 R8LA==
MIME-Version: 1.0
X-Received: by 10.49.59.70 with SMTP id x6mr14644774qeq.17.1385065163751; Thu,
 21 Nov 2013 12:19:23 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.224.207.66 with HTTP; Thu, 21 Nov 2013 12:19:23 -0800 (PST)
In-Reply-To: <CAGm6yaTEFECTYVb94A13TaXMPSLtKLpTbw4iNdgd8SuNF1QDaA@mail.gmail.com>
References: <CAGm6yaTEFECTYVb94A13TaXMPSLtKLpTbw4iNdgd8SuNF1QDaA@mail.gmail.com>
Date: Thu, 21 Nov 2013 12:19:23 -0800
X-Google-Sender-Auth: KpSA66aqysuZ2E9gne5DjdMEl1E
Message-ID: <CAJ-Vmokrchy4pXLvZ21sCV09fQUdYKeUYCEH1U1NdfDBxhyJQg@mail.gmail.com>
Subject: Re: 9.1 callout behavior
From: Adrian Chadd <adrian@freebsd.org>
To: Bret Ketchum <bcketchum@gmail.com>, Alexander Motin <mav@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 20:19:24 -0000

Hi,

It sounds like you may have found an interesting test case.

Mav, any ideas?



-adrian


On 21 November 2013 05:20, Bret Ketchum <bcketchum@gmail.com> wrote:
>      I've a callout which runs every 100ms and does a bit of accounting
> using the global ticks variable. This one-shot callout was called fairly
> consistently in 8.1, every 100ms give or take a few thousand clocks. I've
> recently upgraded to 9.1 and for the most part the period is consistent.
> However, periodically the callout function is executed anywhere from 5ms
> to 20ms after the callout was reset and the function returned, while global
> ticks has increased 8x. The hardware has not changed (using the same
> timecounter configuration):
>
> CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.05-MHz K8-class CPU)
>
> kern.timecounter.hardware: TSC-low
> kern.timecounter.tick: 1
> kern.timecounter.invariant_tsc: 1
> kern.timecounter.smp_tsc: 1
>
>      And default eventtimer configuration:
>
> kern.eventtimer.singlemul: 2
> kern.eventtimer.idletick: 0
> kern.eventtimer.activetick: 1
> kern.eventtimer.timer: LAPIC
> kern.eventtimer.periodic: 0
>
>     If tickless mode is disabled the inconsistency goes away. Is the
> premature expiration of the callout expected? Is the jump in global ticks
> typical (say from 100 ticks to 800 ticks in 1.5ms)?
>
>     Bret
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
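A quick way to sanity-check a tick jump like the one described above is to convert the tick delta into time and compare it with an interval measured independently (for example from the TSC). This is a minimal sketch under assumptions: kern.hz is taken to be 1000 (the usual x86 default), so one tick is 1 ms and a 100 ms period corresponds to hz / 10 = 100 ticks; the helper names are hypothetical and not part of any FreeBSD API.

```c
#include <stdint.h>

/*
 * Hypothetical helpers; hz is the kernel tick rate (kern.hz), which
 * this sketch assumes to be 1000, so one tick is 1 ms.
 */

/* Convert a tick count to microseconds at the given tick rate. */
static int64_t ticks_to_us(int64_t ticks, int hz)
{
	return ticks * 1000000 / hz;
}

/*
 * Return 1 if a tick delta implies more elapsed time than was measured
 * independently (for example from the TSC), allowing slack_us of jitter.
 */
static int tick_jump_suspicious(int64_t dticks, int64_t measured_us, int hz,
    int64_t slack_us)
{
	return ticks_to_us(dticks, hz) > measured_us + slack_us;
}
```

With hz = 1000, a jump from 100 to 800 ticks (700 ticks, i.e. 700 ms of tick time) observed over about 1.5 ms gives tick_jump_suspicious(700, 1500, 1000, 1000) == 1, whereas a normal 100-tick period spanning 100 ms is not flagged.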

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 20:24:01 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B9712C72
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 20:24:01 +0000 (UTC)
Received: from mail-wi0-x229.google.com (mail-wi0-x229.google.com
 [IPv6:2a00:1450:400c:c05::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5AC4D22EC
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 20:24:01 +0000 (UTC)
Received: by mail-wi0-f169.google.com with SMTP id hm6so661495wib.0
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 12:23:59 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:date:message-id:subject:from:to:content-type;
 bh=rQhJlX6XP5x+JGWs7QDsqWTzGfG5AALlYgtKUCnAaqQ=;
 b=FXON+sJNaG0QuYkgUtksHsEzx91JAcv1iiGnMv09DdbSm7tK3+leHlsS62vTSEG6cJ
 PHHXGlF9RWUWZZed7j29c+KjHV6cRgI/Q8kIo+upq5Epgr1dpwQBb7MBq9CKt36pQsFE
 ZnXufydxQPfKar3kG0P+mfMu6mjh7B+CNrBvr0Rz4YlUJpc9AkWoAhLiBFNlSE1mhQsu
 mOO/faK15vgYRSNomALHIam2KB6+0kb/YtGyGDCFxK0GfchEyh/pVpt9O+Xa7nwHyLtK
 AfXRQwuyYPP1OhGftcGWcLmsAUnoa3unpp/p834dhPV5z5h0BY44fF9Nb50R0ZIVzDxn
 GN1w==
MIME-Version: 1.0
X-Received: by 10.180.74.174 with SMTP id u14mr7302128wiv.53.1385065439737;
 Thu, 21 Nov 2013 12:23:59 -0800 (PST)
Received: by 10.216.65.130 with HTTP; Thu, 21 Nov 2013 12:23:59 -0800 (PST)
Date: Thu, 21 Nov 2013 22:23:59 +0200
Message-ID: <CAP=KkTxQCzE+zOMQwnN0HYCxspj_-CEvyQ5nX5RyMmrbD5WExg@mail.gmail.com>
Subject: CRC32 feature in FreeBSD's bootloader
From: Boris Astardzhiev <boris.astardzhiev@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.16
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 20:24:01 -0000

Hello,

A few months ago I posted a new feature for the FreeBSD bootloader.
So far I haven't received any comments, so I'll try to revive this topic.

http://www.freebsd.org/cgi/query-pr.cgi?pr=172301&cat=
http://lists.freebsd.org/pipermail/freebsd-fs/2012-October/015288.html

It may be of use to somebody. Any comments or suggestions?

Greetings,
Boris Astardzhiev
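For readers unfamiliar with the checksum itself, a minimal sketch of the common CRC-32 (the IEEE 802.3 variant also used by zlib and gzip) is shown below. It is not taken from the patch in the PR above, which may use a different variant or a table-driven implementation; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Start with crc = 0; pass the previous result to continue over
 * more data, so a file can be checksummed chunk by chunk.
 */
static uint32_t crc32_update(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	crc = ~crc;
	while (len-- > 0) {
		crc ^= *p++;
		/* Shift out one bit at a time, XOR-ing in the polynomial
		 * whenever the bit shifted out was set. */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}
```

For example, crc32_update(0, "123456789", 9) yields 0xCBF43926, the standard check value for this variant, and feeding the same bytes in two chunks gives the same result.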

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 20:44:45 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EAA5DA35
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 20:44:45 +0000 (UTC)
Received: from co1outboundpool.messaging.microsoft.com
 (co1ehsobe003.messaging.microsoft.com [216.32.180.186])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id A99ED2497
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 20:44:45 +0000 (UTC)
Received: from mail192-co1-R.bigfish.com (10.243.78.231) by
 CO1EHSOBE002.bigfish.com (10.243.66.65) with Microsoft SMTP Server id
 14.1.225.22; Thu, 21 Nov 2013 20:44:34 +0000
Received: from mail192-co1 (localhost [127.0.0.1])	by
 mail192-co1-R.bigfish.com (Postfix) with ESMTP id 5C7379805EF;	Thu, 21 Nov
 2013 20:44:34 +0000 (UTC)
X-Forefront-Antispam-Report: CIP:157.56.240.101; KIP:(null); UIP:(null);
 IPV:NLI; H:BL2PRD0510HT001.namprd05.prod.outlook.com; RD:none; EFVD:NLI
X-SpamScore: 0
X-BigFish: VPS0(zz9371I542Izz1f42h2148h208ch1ee6h1de0h1fdah2073h2146h1202h1e76h1d1ah1d2ah1fc6hzz8275ch1de098h17326ah8275dh1de097h186068hz2fh109h2a8h839h947hd24hf0ah1288h12a5h12a9h12bdh137ah13b6h1441h1504h1537h153bh162dh1631h1758h18e1h1946h19b5h19ceh1ad9h1b0ah224fh1d07h1d0ch1d2eh1d3fh1de9h1dfeh1dffh1fe8h1ff5h2216h22d0h9a9j1155h)
Received-SPF: pass (mail192-co1: domain of juniper.net designates
 157.56.240.101 as permitted sender) client-ip=157.56.240.101;
 envelope-from=aduane@juniper.net;
 helo=BL2PRD0510HT001.namprd05.prod.outlook.com ; .outlook.com ; 
X-Forefront-Antispam-Report-Untrusted: SFV:NSPM;
 SFS:(377454003)(13464003)(199002)(189002)(74316001)(33646001)(79102001)(80976001)(80022001)(50986001)(77982001)(76796001)(83072001)(74502001)(56816003)(4396001)(81816001)(54316002)(15202345003)(63696002)(59766001)(74662001)(19580395003)(65816001)(19580405001)(49866001)(31966008)(56776001)(15975445006)(47446002)(47736001)(81686001)(83322001)(74366001)(2656002)(81342001)(74876001)(81542001)(46102001)(53806001)(76482001)(76576001)(76786001)(47976001)(54356001)(66066001)(74706001)(85306002)(87936001)(69226001)(51856001)(87266001)(24736002);
 DIR:OUT; SFP:; SCL:1; SRVR:BY2PR05MB582;
 H:BY2PR05MB582.namprd05.prod.outlook.com; CLIP:66.129.241.19; FPR:;
 RD:InfoNoRecords; A:1; MX:1; LANG:en; 
Received: from mail192-co1 (localhost.localdomain [127.0.0.1]) by mail192-co1
 (MessageSwitch) id 1385066672670737_26420;
 Thu, 21 Nov 2013 20:44:32 +0000 (UTC)
Received: from CO1EHSMHS010.bigfish.com (unknown [10.243.78.243])	by
 mail192-co1.bigfish.com (Postfix) with ESMTP id 9659014004C; Thu, 21 Nov 2013
 20:44:32 +0000 (UTC)
Received: from BL2PRD0510HT001.namprd05.prod.outlook.com (157.56.240.101) by
 CO1EHSMHS010.bigfish.com (10.243.66.20) with Microsoft SMTP Server (TLS) id
 14.16.227.3; Thu, 21 Nov 2013 20:44:32 +0000
Received: from BY2PR05MB582.namprd05.prod.outlook.com (10.141.219.146) by
 BL2PRD0510HT001.namprd05.prod.outlook.com (10.255.100.36) with Microsoft SMTP
 Server (TLS) id 14.16.383.1; Thu, 21 Nov 2013 20:44:32 +0000
Received: from BY2PR05MB582.namprd05.prod.outlook.com (10.141.219.146) by
 BY2PR05MB582.namprd05.prod.outlook.com (10.141.219.146) with Microsoft SMTP
 Server (TLS) id 15.0.820.5; Thu, 21 Nov 2013 20:44:30 +0000
Received: from BY2PR05MB582.namprd05.prod.outlook.com ([10.141.219.146]) by
 BY2PR05MB582.namprd05.prod.outlook.com ([10.141.219.146]) with mapi id
 15.00.0820.005; Thu, 21 Nov 2013 20:44:29 +0000
From: Andrew Duane <aduane@juniper.net>
To: Boris Astardzhiev <boris.astardzhiev@gmail.com>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: RE: CRC32 feature in FreeBSD's bootloader
Thread-Topic: CRC32 feature in FreeBSD's bootloader
Thread-Index: AQHO5ve1O2DgqpptYkKTB8hsnXVzzJowJrTg
Date: Thu, 21 Nov 2013 20:44:29 +0000
Message-ID: <597127d7d71a496995d9407842121a47@BY2PR05MB582.namprd05.prod.outlook.com>
References: <CAP=KkTxQCzE+zOMQwnN0HYCxspj_-CEvyQ5nX5RyMmrbD5WExg@mail.gmail.com>
In-Reply-To: <CAP=KkTxQCzE+zOMQwnN0HYCxspj_-CEvyQ5nX5RyMmrbD5WExg@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [66.129.241.19]
x-forefront-prvs: 0037FD6480
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: juniper.net
X-FOPE-CONNECTOR: Id%0$Dn%*$RO%0$TLS%0$FQDN%$TlsDn%
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 20:44:46 -0000

I'm all for it, depending on what you do with it. The bootloader I implemented
for my platform tags every image it writes into flash with a checksum (we use
MD5, not CRC32, but still), and can keep multiple copies as backup.
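The scheme described above, where each image copy is tagged with a checksum and a backup copy is used when verification fails, might look roughly like the following. This is a sketch under assumptions: the struct and function names are hypothetical, and CRC-32 stands in for the MD5 mentioned above only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one stored copy of an image. */
struct image_copy {
	const unsigned char *data;
	size_t len;
	uint32_t stored_sum;	/* checksum recorded when the copy was written */
};

/* Bitwise CRC-32 (IEEE 802.3 polynomial), standing in for MD5 here. */
static uint32_t sum32(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len-- > 0) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}

/* Return the index of the first copy whose checksum verifies, or -1. */
static int pick_valid_copy(const struct image_copy *copies, size_t ncopies)
{
	for (size_t i = 0; i < ncopies; i++) {
		if (sum32(copies[i].data, copies[i].len) == copies[i].stored_sum)
			return (int)i;
	}
	return -1;
}
```

If the primary copy is corrupted, the loader falls through to the next one; if no copy verifies, the caller gets -1 and can refuse to boot that image.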

....................................
Andrew L. Duane
Resident Architect - AT&T Technical Lead
JNCIA - JUNOS
m    +1 603.770.7088
o    +1 408.933.6944 (2-6944)
skype: andrewlduane
aduane@juniper.net


LET'S GET STARTED



From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 20:50:23 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 36EEBF84
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 20:50:23 +0000 (UTC)
Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 040B6251E
 for <freebsd-hackers@freebsd.org>; Thu, 21 Nov 2013 20:50:22 +0000 (UTC)
Received: from smtp.fisglobal.com ([10.132.206.16])
 by ltcfislmsgpa03.fnfis.com (8.14.5/8.14.5) with ESMTP id rALKoLbb028731
 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT);
 Thu, 21 Nov 2013 14:50:21 -0600
Received: from LTCFISWMSGMB21.FNFIS.com ([169.254.1.7]) by
 LTCFISWMSGHT05.FNFIS.com ([10.132.206.16]) with mapi id 14.03.0158.001; Thu,
 21 Nov 2013 14:50:21 -0600
From: "Teske, Devin" <Devin.Teske@fisglobal.com>
To: Boris Astardzhiev <boris.astardzhiev@gmail.com>
Subject: Re: CRC32 feature in FreeBSD's bootloader
Thread-Topic: CRC32 feature in FreeBSD's bootloader
Thread-Index: AQHO5vtKBtkf5jJNEkqH2M7ihlEMZA==
Date: Thu, 21 Nov 2013 20:50:19 +0000
Message-ID: <4B3925A5-9DBF-42B4-A12D-C9C7D5E6078C@fisglobal.com>
References: <CAP=KkTxQCzE+zOMQwnN0HYCxspj_-CEvyQ5nX5RyMmrbD5WExg@mail.gmail.com>
In-Reply-To: <CAP=KkTxQCzE+zOMQwnN0HYCxspj_-CEvyQ5nX5RyMmrbD5WExg@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.132.253.120]
Content-Type: text/plain; charset="us-ascii"
Content-ID: <C852FE2EA78CBA4AA8892198A417C78F@fisglobal.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.10.8794, 1.0.14,
 0.0.0000
 definitions=2013-11-21_06:2013-11-21,2013-11-21,1970-01-01 signatures=0
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>, "Teske,
 Devin" <Devin.Teske@fisglobal.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
Reply-To: Devin Teske <dteske@freebsd.org>
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 20:50:23 -0000


On Nov 21, 2013, at 12:23 PM, Boris Astardzhiev wrote:

> Hello,
>
> A few months ago I posted a new feature in the FreeBSD bootloader.
> So far I haven't received any comments so I'll try to revive this topic.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=172301&cat=
> http://lists.freebsd.org/pipermail/freebsd-fs/2012-October/015288.html
>
> It may be of use to somebody. So any comments and suggestions?
>

I think it's a great idea. But...

Can you extend it to be available to the Forth layer?
That is, add a command to ficl.c that calls your code.

I would very much like to be able to compute the CRC32 of a file from
within Forth and get the results back on the stack.
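One way to expose this to Forth is a word with the stack effect ( c-addr u -- crc ): it takes a buffer address and length and leaves the CRC on the stack. The sketch below models that idea with a toy stack rather than ficl's real data structures, so every name in it is hypothetical; a real word added to ficl.c would use ficl's own stack accessors and, for a file, feed successive read chunks into the CRC.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy data stack standing in for the Forth interpreter's stack.
 * This is NOT the ficl API; every name here is hypothetical, and
 * overflow/underflow checks are omitted for brevity.
 */
struct toy_stack {
	uintptr_t cells[16];
	int depth;
};

static void push(struct toy_stack *s, uintptr_t v) { s->cells[s->depth++] = v; }
static uintptr_t pop(struct toy_stack *s) { return s->cells[--s->depth]; }

/* Bitwise CRC-32, IEEE 802.3 polynomial (reflected 0xEDB88320). */
static uint32_t crc32_buf(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len-- > 0) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}

/* Forth stack effect: ( c-addr u -- crc ) */
static void word_crc32(struct toy_stack *s)
{
	size_t len = (size_t)pop(s);		/* length is on top */
	const unsigned char *addr = (const unsigned char *)pop(s);

	push(s, (uintptr_t)crc32_buf(addr, len));
}
```

Pushing the address of "123456789" and then 9, and running word_crc32(), leaves a single cell holding 0xCBF43926.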
-- 
Devin

_____________
The information contained in this message is proprietary and/or confidential.
If you are not the intended recipient, please: (i) delete the message and
all copies; (ii) do not disclose, distribute or use the message in any
manner; and (iii) notify the sender immediately. In addition, please be aware
that any message addressed to our domain is subject to archiving and review
by persons other than the intended recipient. Thank you.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 21:15:55 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 6C5BC94B;
 Thu, 21 Nov 2013 21:15:55 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 095A726D1;
 Thu, 21 Nov 2013 21:15:54 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rALLFkpI074541;
 Thu, 21 Nov 2013 23:15:46 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rALLFkpI074541
Received: (from kostik@localhost)
 by tom.home (8.14.7/8.14.7/Submit) id rALLFkvl074540;
 Thu, 21 Nov 2013 23:15:46 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Thu, 21 Nov 2013 23:15:46 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Vitaly Magerya <vmagerya@gmail.com>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <20131121211546.GQ59496@kib.kiev.ua>
References: <528DFEE6.6020504@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="JeIqLcbgB5JjL5AU"
Content-Disposition: inline
In-Reply-To: <528DFEE6.6020504@gmail.com>
User-Agent: Mutt/1.5.22 (2013-10-16)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 21:15:55 -0000


--JeIqLcbgB5JjL5AU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote:
> Hi, folks. I'm investigating a test case failure that devel/boehm-gc
> has on recent FreeBSD releases. The problem is that a signal
> handler registered for SIGUSR1 is sometimes called with signum=0,
> which should not be possible under any conditions.
>
> Here's a simple test case that demonstrates this behavior:
>
> /* Compile with 'c99 -o example example.c -pthread'
>  */
> #include <pthread.h>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> void signal_handler(int signum, siginfo_t *si, void *context) {
>     if (signum != SIGUSR1) {
>         printf("bad signal, signum=%d\n", signum);
>         exit(1);
>     }
> }
>
> void *thread_func(void *arg) {
>     return arg;
> }
>
> int main(void) {
>     struct sigaction sa = { 0 };
>     sa.sa_flags = SA_SIGINFO;
>     sa.sa_sigaction = signal_handler;
>     if (sigfillset(&sa.sa_mask) != 0) abort();
>     if (sigaction(SIGUSR1, &sa, NULL) != 0) abort();
>     for (int i = 0; i < 10000; i++) {
>         pthread_t t;
>         pthread_create(&t, NULL, thread_func, NULL);
>         pthread_kill(t, SIGUSR1);
Side note.  pthread_kill(3) call behaviour is undefined if pthread_create(3)
in the line before failed.

>     }
>     return 0;
> }
>
> Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get
> "signum=0" from this program, but you may need to run it a few
> times or increase the number of iterations to see the same.
>
> Interestingly enough, I don't see this behavior under 9.0-RELEASE.
>
> So, any ideas what the problem here is?
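Following the side note about pthread_create(3) in the test case above, a more defensive version of the reproducer might check every return value, signal each thread before joining it (so its ID is still valid), and count unexpected signal numbers in a sig_atomic_t instead of calling printf() and exit() from the handler, neither of which is async-signal-safe. This is a sketch, not the original test: run_reproducer() and bad_signals are hypothetical names, and on an unaffected system the counter should stay at zero.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for sigaction() and siginfo_t */
#endif
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Incremented by the handler; printf()/exit() are not async-signal-safe. */
static volatile sig_atomic_t bad_signals;

static void signal_handler(int signum, siginfo_t *si, void *context)
{
	(void)si;
	(void)context;
	if (signum != SIGUSR1)
		bad_signals++;
}

static void *thread_func(void *arg)
{
	return arg;
}

/*
 * Create, signal and join `iterations` threads. Returns the number of
 * rounds completed, or -1 if the handler could not be installed.
 */
static int run_reproducer(int iterations)
{
	struct sigaction sa;
	int done = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = signal_handler;
	if (sigfillset(&sa.sa_mask) != 0 || sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	for (int i = 0; i < iterations; i++) {
		pthread_t t;
		int err;

		/* Only signal a thread that was actually created. */
		if (pthread_create(&t, NULL, thread_func, NULL) != 0)
			continue;
		/* The ID stays valid until joined; ESRCH just means it exited. */
		err = pthread_kill(t, SIGUSR1);
		pthread_join(t, NULL);
		if (err == 0 || err == ESRCH)
			done++;
	}
	return done;
}
```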

The signum=0 delivery happens when the libthr deferred signal handling path
is taken for signal delivery and, for some reason, the code inside the
deferred path calls into rtld for symbol binding. Then the rtld lock is
locked, some code in rtld is executed, and the rtld lock is unlocked. The
unlock causes _thr_ast() to run, which results in a nested
check_deferred_signal() execution. The nested check_deferred_signal() clears
si_signo, so on return the same signal is delivered one more time, but is
advertised as signo zero.

The _thr_rtld_init() approach of doing dummy calls does not really work,
since it is not practically possible to enumerate the symbols needed
during signal delivery.

My first attempt to fix this was to increment curthread->critical_count
around the calls to the check_* functions in _thr_ast(), but it caused the
reverse problem of losing _thr_ast() runs on unlock.

I ended up adding a flag indicating that deferred delivery is running,
so that check_deferred_signal() avoids doing anything. A delicate point
is that a user signal handler is allowed to modify the passed machine
context so that returning from the signal handler causes an arbitrary
jump, or it may simply call longjmp(). For this case, I also clear the
flag in thr_sighandler(), since kernel signal delivery means that the
nested delivery code is not running at that moment.

Please try this.

diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
index 83a02b5..c6651cd 100644
--- a/lib/libthr/thread/thr_private.h
+++ b/lib/libthr/thread/thr_private.h
@@ -433,6 +433,9 @@ struct pthread {
 	/* the sigaction should be used for deferred signal. */
 	struct sigaction	deferred_sigact;
 
+	/* deferred signal delivery is performed, do not reenter. */
+	int			deferred_run;
+
 	/* Force new thread to exit. */
 	int			force_exit;
 
diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
index 415ddb0..57c9406 100644
--- a/lib/libthr/thread/thr_sig.c
+++ b/lib/libthr/thread/thr_sig.c
@@ -162,6 +162,7 @@ thr_sighandler(int sig, siginfo_t *info, void *_ucp)
 	act = _thr_sigact[sig-1].sigact;
 	_thr_rwl_unlock(&_thr_sigact[sig-1].lock);
 	errno = err;
+	curthread->deferred_run = 0;
 
 	/*
 	 * if a thread is in critical region, for example it holds low level locks,
@@ -320,14 +321,18 @@ check_deferred_signal(struct pthread *curthread)
 	siginfo_t info;
 	int uc_len;
 
-	if (__predict_true(curthread->deferred_siginfo.si_signo == 0))
+	if (__predict_true(curthread->deferred_siginfo.si_signo == 0 ||
+	    curthread->deferred_run))
 		return;
 
+	curthread->deferred_run = 1;
 	uc_len =3D __getcontextx_size();
 	uc =3D alloca(uc_len);
 	getcontext(uc);
-	if (curthread->deferred_siginfo.si_signo == 0)
+	if (curthread->deferred_siginfo.si_signo == 0) {
+		curthread->deferred_run = 0;
 		return;
+	}
 	__fillcontextx2((char *)uc);
 	act =3D curthread->deferred_sigact;
 	uc->uc_sigmask =3D curthread->deferred_sigmask;

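The failure mode and the deferred_run guard can be illustrated with a deliberately simplified, single-threaded model: the first call to lazy_bind() stands in for the rtld lock/unlock that triggers _thr_ast(), and the shared deferred_signo field stands in for deferred_siginfo.si_signo. All names are hypothetical and this is not libthr code; the real path goes through getcontext() and the signal context rather than plain function returns, and where the model simply clears the flag before returning, the patch clears it on the early-return path and in thr_sighandler().

```c
#include <signal.h>

/* Single-threaded model of the reentrancy; not libthr code. */
struct model_thread {
	int deferred_signo;	/* plays the role of deferred_siginfo.si_signo */
	int deferred_run;	/* the guard flag added by the patch */
	int use_guard;		/* 0 = behaviour before the patch */
	int symbol_bound;	/* lazy binding happens only on first use */
	int delivered[4];	/* signal numbers passed to the "handler" */
	int ndelivered;
};

static void check_deferred(struct model_thread *t);

/* First call "binds a symbol": the rtld unlock runs the AST hook,
 * which re-enters check_deferred(). Later calls do nothing. */
static void lazy_bind(struct model_thread *t)
{
	if (t->symbol_bound)
		return;
	t->symbol_bound = 1;
	check_deferred(t);
}

static void check_deferred(struct model_thread *t)
{
	if (t->deferred_signo == 0 || (t->use_guard && t->deferred_run))
		return;
	t->deferred_run = 1;
	lazy_bind(t);			/* may re-enter check_deferred() */
	/* The handler sees whatever is in the shared slot now. */
	if (t->ndelivered < 4)
		t->delivered[t->ndelivered++] = t->deferred_signo;
	t->deferred_signo = 0;		/* consume the deferred signal */
	t->deferred_run = 0;
}

/* Run one deferred SIGUSR1 through the model; returns deliveries made. */
static int simulate(int use_guard, int out[4])
{
	struct model_thread t = {0};

	t.deferred_signo = SIGUSR1;
	t.use_guard = use_guard;
	check_deferred(&t);
	for (int i = 0; i < t.ndelivered; i++)
		out[i] = t.delivered[i];
	return t.ndelivered;
}
```

Without the guard the model delivers SIGUSR1 and then 0, mirroring the bogus signum=0 seen in the test case; with the guard, exactly one SIGUSR1 is delivered.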
--JeIqLcbgB5JjL5AU
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJSjngBAAoJEJDCuSvBvK1BKx0QAJjAmJSh9i2IQC6e8pF1QXJG
P6lTmX3WpLVdnPAA5ord/KoiBCaNJQ4w2YaEWOzuP3o4GHX70dYLY9HWHuwgMhei
NzS+xOCdzZcPDI68ZghJ/N/67oJSlC9i/N4RLdgDqaBpElYrOKk1pmXqpQ/216op
XinMrpR5oR4TvXJ80dNCsGzc5xQ0J9LW5TjYf3rzHSJSaYWO6jSIUwDrb6kLxtVA
7enT9j8rMO+HbXgWNNcXMBTAfo+2PabK/33twemiX7dbzGTQapbVK6RU9MYBYO0N
2Sa6YI0Zd5SFJyXLLggPi/Qop/mGIrsCgd2ICOsGnBYtc5qGpeFZkbKB8OnRdw02
u4HWokfnaE6eH+ktipA9+nbpAGL3MCsHgSZBLoIKDX0YWmqvEMM6wHdrJWWwIfEB
/YJp8iHGwbrjtXx4ddUqa/30BRU1HzDImPafbAOvVdjLKFQozpHPJFwRhX+2NEA/
TA7PlXXLDVXc4wE7eP0Lo/8Vpnhk/Wv5Xz2a97F6IzdeOZpbuQwLaFf5eOJD77z9
8J1hhwE//c7nlk+9ovvRvqOdXyGeQSZaW22BRNu4VjYW/Cs5uaSGBCfPKJe99DGx
4tl3vaP28nnhQRH3reqyE/fJtfaJkMrGccO2EYVbkibaLWMEMBmLQ57no4TXrdWS
BU5IUgDGkfqq4DKVpL87
=jCRC
-----END PGP SIGNATURE-----

--JeIqLcbgB5JjL5AU--

From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 21 23:13:00 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 14E676BF;
 Thu, 21 Nov 2013 23:12:58 +0000 (UTC)
Received: from hoyletech.com (hoyletech.com [174.136.108.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 239F42F15;
 Thu, 21 Nov 2013 23:12:58 +0000 (UTC)
Received: from unknown (pool-108-51-142-17.washdc.fios.verizon.net
 [108.51.142.17])
 by hoyletech.com (Postfix) with ESMTPSA id 3F5DD60EC2;
 Thu, 21 Nov 2013 15:12:51 -0800 (PST)
Date: Thu, 21 Nov 2013 18:12:32 -0500
From: Nathanael Hoyle <nhoyle@hoyletech.com>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131121181232.000071b8@unknown>
In-Reply-To: <20131121174028.GA80520@ambrisko.com>
References: <51B3B59B.8050903@erdgeist.org>
 <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com>
 <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com>
 <20131119074922.GY59496@kib.kiev.ua>
 <20131119174216.GA80753@ambrisko.com>
 <20131120075531.GE59496@kib.kiev.ua>
 <20131121174028.GA80520@ambrisko.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers@freebsd.org,
 Dirk Engling <erdgeist@erdgeist.org>, Jase Thew <jase@freebsd.org>,
 mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Nov 2013 23:13:00 -0000

On Thu, 21 Nov 2013 09:40:28 -0800
Doug Ambrisko <ambrisko@ambrisko.com> wrote:

> On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote:
> | On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
> | > I was talking about the more general case since the system tries to keep
> | > the path in the stat structure.  My prior approach which had more issues
> | > was to modify the stat structure of which I was pointed to NetBSD and their
> | > change to statvfs which doesn't really solve the problem.  They don't
> | > have the check to see if the mount is longer than VFS_MNAMELEN (in their case)
> | > and just truncate things.
> | >
> | > If we are just talking about adding it to the mount structure that
> | > would be okay since it isn't exposed to user land.  I can add that.
> |
> | Yes, this is exactly what I mean.  Add a struct mount field, and use
> | it for kernel only.  In fact, it only matters for sys_unmount() and
> | kern_jail.c, other locations in kernel use the path for warnings, and
> | this could be postponed if you prefer to minimize the patch.
> 
> Okay, I went through all of the occurrences and compile tested (except
> for #DEBUG).  I united a few things but should do more once I get
> consensus on the approach.  I found a few spots that should be
> updated as well and made the length check more consistent.  Some were
> doing >= and others >.  So this should be better, however, a lot
> larger.  On the plus side when we figure out how to return the longer
> path length to user land that can be more flexible since the kernel is
> tracking the longer length. Probably things to note are changes in:
> 	ZFS to mount snapshot
> 	cd9660 for symlinks
> 	fuse to return full path
> 	jail to check statfs and mount
> 	mount/umount to save and check full path
> 	mountroot to save new field for full path
> 	
> Just in case it doesn't make it in email the full patch is at:
> 	http://people.freebsd.org/~ambrisko/mount_bigger.patch
> 
> Thanks,
> 
> Doug A.
> 

Hey, long-time lurker, don't normally post, but I think this introduces
a boundary error. It certainly appears to make the code not match the
comments.

> Index: cddl/compat/opensolaris/kern/opensolaris_vfs.c
> ===================================================================
> --- cddl/compat/opensolaris/kern/opensolaris_vfs.c	(revision 257489)
> +++ cddl/compat/opensolaris/kern/opensolaris_vfs.c	(working copy)
> @@ -126,7 +126,7 @@
>  	 * variables will fit in our mp buffers, including the
>  	 * terminating NUL.
>  	 */
> -	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)
> +	if (strlen(fstype) > MFSNAMELEN || strlen(fspath) > MAXPATHLEN)
>  		return (ENAMETOOLONG);
>  
>  	vfsp = vfs_byname_kld(fstype, td, &error);

The change from >= to > in this comparison means that when
strlen(fspath) == MAXPATHLEN, this guard is passed and no error is returned.

> ===================================================================
> --- kern/vfs_mount.c	(revision 257489)
> +++ kern/vfs_mount.c	(working copy)
> @@ -473,6 +473,7 @@
>  	mp->mnt_cred = crdup(cred);
>  	mp->mnt_stat.f_owner = cred->cr_uid;
>  	strlcpy(mp->mnt_stat.f_mntonname, fspath, MNAMELEN);
> +	strlcpy((char *)mp->mnt_path, fspath, MAXPATHLEN);
>  	mp->mnt_iosize_max = DFLTPHYS;
>  #ifdef MAC
>  	mac_mount_init(mp);
> @@ -656,7 +657,7 @@
>  	 * variables will fit in our mp buffers, including the
>  	 * terminating NUL.
>  	 */
> -	if (fstypelen > MFSNAMELEN || fspathlen > MNAMELEN) {
> +	if (fstypelen > MFSNAMELEN || fspathlen > MAXPATHLEN) {
>  		error = ENAMETOOLONG;
>  		goto bail;

The same logic is used here, so it doesn't fail here either.

>  	}
> @@ -748,8 +749,8 @@
>  		return (EOPNOTSUPP);
>  	}
>  
> -	ma = mount_argsu(ma, "fstype", uap->type, MNAMELEN);
> -	ma = mount_argsu(ma, "fspath", uap->path, MNAMELEN);
> +	ma = mount_argsu(ma, "fstype", uap->type, MFSNAMELEN);
> +	ma = mount_argsu(ma, "fspath", uap->path, MAXPATHLEN);
>  	ma = mount_argb(ma, flags & MNT_RDONLY, "noro");
>  	ma = mount_argb(ma, !(flags & MNT_NOSUID), "nosuid");
>  	ma = mount_argb(ma, !(flags & MNT_NOEXEC), "noexec");
> @@ -1040,7 +1041,7 @@
>  	 * variables will fit in our mp buffers, including the
>  	 * terminating NUL.
>  	 */
> -	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)
> +	if (strlen(fstype) > MFSNAMELEN || strlen(fspath) > MAXPATHLEN)
>  		return (ENAMETOOLONG);


Adding the rest of the comment from the sources here:
        /*
         * Be ultra-paranoid about making sure the type and fspath
         * variables will fit in our mp buffers, including the
         * terminating NUL.
         */

OK, so the intent is to ensure that the provided mount path fits fully,
*including* the terminating NUL character. For this to be true,
strlen(fspath)+1 must be <= MAXPATHLEN. This does not hold when
strlen(fspath) == MAXPATHLEN, but that case is not caught by this check.
At the very least, the code and the comment now disagree.
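The boundary condition is easy to see with a small buffer. In the sketch below PATH_BUF_SIZE is a hypothetical stand-in for MAXPATHLEN and both function names are invented for illustration: a string of exactly PATH_BUF_SIZE characters passes the `>` check even though it needs PATH_BUF_SIZE + 1 bytes including its terminating NUL.

```c
#include <string.h>

/* Illustrative stand-in for MAXPATHLEN: a tiny buffer size. */
#define PATH_BUF_SIZE 8

/* Check as written in the patch: rejects only strlen(s) > PATH_BUF_SIZE. */
static int fits_patched(const char *s)
{
	return !(strlen(s) > PATH_BUF_SIZE);
}

/* Check matching the comment: the string plus its NUL must fit. */
static int fits_with_nul(const char *s)
{
	return strlen(s) + 1 <= PATH_BUF_SIZE;	/* i.e. strlen(s) < PATH_BUF_SIZE */
}
```

For the 8-character string "12345678", fits_patched() accepts it while fits_with_nul() rejects it; "1234567" is accepted by both.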


> Index: kern/vfs_mountroot.c
> ===================================================================
> --- kern/vfs_mountroot.c	(revision 257489)
> +++ kern/vfs_mountroot.c	(working copy)
> @@ -307,6 +307,8 @@
>  				vp->v_mountedhere = mporoot;
>  				strlcpy(mporoot->mnt_stat.f_mntonname,
>  				    fspath, MNAMELEN);
> +				strlcpy((char *)mporoot->mnt_path,
> +				    fspath, MAXPATHLEN);
>  				VOP_UNLOCK(vp, 0);
>  			} else
>  				vput(vp);

Here strlcpy is used safely. However, when strlen(fspath) == MAXPATHLEN,
the result is that the last character of fspath is truncated and replaced
with a NUL byte to ensure proper termination. mnt_path will then contain
not the "actual path" but the actual path truncated by one character, which
again does not match what was passed in the original call (the same problem
discussed earlier in the thread).

Also, two different paths which differed in only their last character
would get truncated to the same path name (hilarity ensues).
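Both effects can be reproduced in user space. strlcpy(3) returns the length of the source string, so a caller can detect truncation by comparing that result with the buffer size. The sketch below mimics strlcpy's truncate-and-terminate behaviour with a hypothetical copy_checked() helper (so it also builds where strlcpy() is unavailable); the 4-byte buffers are illustrative only.

```c
#include <string.h>

/*
 * Copy src into dst the way strlcpy(3) does (truncate and always
 * NUL-terminate; size must be > 0) and return 1 only if the whole
 * string fit, 0 if it was truncated.
 */
static int copy_checked(char *dst, size_t size, const char *src)
{
	size_t len = strlen(src);
	size_t n = len < size ? len : size - 1;

	memcpy(dst, src, n);
	dst[n] = '\0';
	return len < size;
}
```

Copying "mntX" and "mntY" into 4-byte buffers yields "mnt" for both, and copy_checked() returns 0 each time; that return value is the point at which a caller would report ENAMETOOLONG instead of storing the truncated path.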


Hopefully my understanding is correct and I'm not completely off track
and being unhelpful. I'm certain I have less experience with this code
than others on the thread, but wanted to note what I believe is an
issue.

Regards,
-Nathanael Hoyle

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 03:55:55 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 90A6B92A;
 Fri, 22 Nov 2013 03:55:55 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7085324B5;
 Fri, 22 Nov 2013 03:55:55 +0000 (UTC)
Received: from xyf.my.dom (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rAM3tskd068692;
 Fri, 22 Nov 2013 03:55:54 GMT (envelope-from davidxu@freebsd.org)
Message-ID: <528ED5D3.1030906@freebsd.org>
Date: Fri, 22 Nov 2013 11:56:03 +0800
From: David Xu <davidxu@freebsd.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
 rv:17.0) Gecko/20130416 Thunderbird/17.0.5
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, Vitaly Magerya <vmagerya@gmail.com>,
 threads@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 03:55:55 -0000

On 2013/11/22 05:15, Konstantin Belousov wrote:
> On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote:
>> Hi, folks. I'm investigating a test case failure that devel/boehm-gc
>> has on recent FreeBSD releases. The problem is that a signal
>> handler registered for SIGUSR1 is sometimes called with signum=0,
>> which should not be possible under any conditions.
>>
>> Here's a simple test case that demonstrates this behavior:
>>
>> /* Compile with 'c99 -o example example.c -pthread'
>>   */
>> #include <pthread.h>
>> #include <signal.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> void signal_handler(int signum, siginfo_t *si, void *context) {
>>      if (signum != SIGUSR1) {
>>          printf("bad signal, signum=%d\n", signum);
>>          exit(1);
>>      }
>> }
>>
>> void *thread_func(void *arg) {
>>      return arg;
>> }
>>
>> int main(void) {
>>      struct sigaction sa = { 0 };
>>      sa.sa_flags = SA_SIGINFO;
>>      sa.sa_sigaction = signal_handler;
>>      if (sigfillset(&sa.sa_mask) != 0) abort();
>>      if (sigaction(SIGUSR1, &sa, NULL) != 0) abort();
>>      for (int i = 0; i < 10000; i++) {
>>          pthread_t t;
>>          pthread_create(&t, NULL, thread_func, NULL);
>>          pthread_kill(t, SIGUSR1);
> Side note.  pthread_kill(3) call behaviour is undefined if pthread_create(3)
> in the line before failed.
>
>>      }
>>      return 0;
>> }
>>
>> Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get
>> "signum=0" from this program, but you may need to run it a few
>> times or increase the number of iterations to see the same.
>>
>> Interestingly enough, I don't see this behavior under 9.0-RELEASE.
>>
>> So, any ideas what the problem here is?
>
> It happens when libthr deferred signal handling path is taken for signal
> delivery and for some reason the code inside the deferred path called
> into rtld for symbol binding. Then, rtld lock is locked, some code in
> rtld is executed, and rtld lock is unlocked. Unlock causes a _thr_ast()
> run, which results in the nested check_deferred_signal() execution.
> The check_deferred_signal() clears si_signo, so on return the same
> signal is delivered one more time, but is advertised as signo zero.
>
> The _thr_rtld_init() approach of doing dummy calls does not really work,
> since it is not practically possible to enumerate the symbols needed
> during signal delivery.
>
> My first attempt to fix this was to increment curthread->critical_count
> around the calls to check_* functions in the _thr_ast(), but it causes
> the reverse problem of losing _thr_ast() runs on unlock.
>
> I ended up with a flag to indicate that deferred delivery is running,
> so that check_deferred_signal() avoids doing anything. A delicate
> point is that the user signal handler is allowed to modify the passed
> machine context so that returning from the handler causes an
> arbitrary jump, or it may simply call longjmp(). For this case, I also
> clear the flag in thr_sighandler(), since kernel signal delivery means
> that the nested delivery code should not run right now.
>
> Please try this.
>
> diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
> index 83a02b5..c6651cd 100644
> --- a/lib/libthr/thread/thr_private.h
> +++ b/lib/libthr/thread/thr_private.h
> @@ -433,6 +433,9 @@ struct pthread {
>   	/* the sigaction should be used for deferred signal. */
>   	struct sigaction	deferred_sigact;
>
> +	/* deferred signal delivery is performed, do not reenter. */
> +	int			deferred_run;
> +
>   	/* Force new thread to exit. */
>   	int			force_exit;
>
> diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
> index 415ddb0..57c9406 100644
> --- a/lib/libthr/thread/thr_sig.c
> +++ b/lib/libthr/thread/thr_sig.c
> @@ -162,6 +162,7 @@ thr_sighandler(int sig, siginfo_t *info, void *_ucp)
>   	act = _thr_sigact[sig-1].sigact;
>   	_thr_rwl_unlock(&_thr_sigact[sig-1].lock);
>   	errno = err;
> +	curthread->deferred_run = 0;
>
>   	/*
>   	 * if a thread is in critical region, for example it holds low level locks,
> @@ -320,14 +321,18 @@ check_deferred_signal(struct pthread *curthread)
>   	siginfo_t info;
>   	int uc_len;
>
> -	if (__predict_true(curthread->deferred_siginfo.si_signo == 0))
> +	if (__predict_true(curthread->deferred_siginfo.si_signo == 0 ||
> +	    curthread->deferred_run))
>   		return;
>
> +	curthread->deferred_run = 1;
>   	uc_len = __getcontextx_size();
>   	uc = alloca(uc_len);
>   	getcontext(uc);
> -	if (curthread->deferred_siginfo.si_signo == 0)
> +	if (curthread->deferred_siginfo.si_signo == 0) {
> +		curthread->deferred_run = 0;
>   		return;
> +	}
>   	__fillcontextx2((char *)uc);
>   	act = curthread->deferred_sigact;
>   	uc->uc_sigmask = curthread->deferred_sigmask;
>

The patch looks fine to me.


From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 07:42:43 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D0F3A6D9;
 Fri, 22 Nov 2013 07:42:43 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 50ACC2F41;
 Fri, 22 Nov 2013 07:42:43 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAM7gTMh005995;
 Fri, 22 Nov 2013 09:42:29 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAM7gTMh005995
Received: (from kostik@localhost)
 by tom.home (8.14.7/8.14.7/Submit) id rAM7gSvn005994;
 Fri, 22 Nov 2013 09:42:28 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Fri, 22 Nov 2013 09:42:28 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131122074228.GT59496@kib.kiev.ua>
References: <CAMBSHm8GMWffuuEcSpuNu26Mv4N2yAa2iEdw5koiXx0w30zPRQ@mail.gmail.com>
 <201306101152.17966.jhb@freebsd.org> <52854161.6080104@FreeBSD.org>
 <20131115010854.GA76106@ambrisko.com>
 <20131116183129.GD59496@kib.kiev.ua>
 <20131118190142.GA28210@ambrisko.com>
 <20131119074922.GY59496@kib.kiev.ua>
 <20131119174216.GA80753@ambrisko.com>
 <20131120075531.GE59496@kib.kiev.ua>
 <20131121174028.GA80520@ambrisko.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="T+nnW5vHQKf/VlFb"
Content-Disposition: inline
In-Reply-To: <20131121174028.GA80520@ambrisko.com>
User-Agent: Mutt/1.5.22 (2013-10-16)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: freebsd-hackers@freebsd.org, Dirk Engling <erdgeist@erdgeist.org>,
 Jase Thew <jase@freebsd.org>, mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 07:42:43 -0000


--T+nnW5vHQKf/VlFb
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Nov 21, 2013 at 09:40:28AM -0800, Doug Ambrisko wrote:
> On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote:
> | On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
> | > I was talking about the more general case since the system tries to keep
> | > the path in the stat structure.  My prior approach which had more issues
> | > was to modify the stat structure of which I was pointed to NetBSD and their
> | > change to statvfs which doesn't really solve the problem.  They don't
> | > have the check to see if the mount is longer than VFS_MNAMELEN (in their case)
> | > and just truncate things.
> | >
> | > If we are just talking about adding it to the mount structure that
> | > would be okay since it isn't exposed to user land.  I can add that.
> |
> | Yes, this is exactly what I mean.  Add a struct mount field, and use
> | it for kernel only.  In fact, it only matters for sys_unmount() and
> | kern_jail.c, other locations in kernel use the path for warnings, and
> | this could be postponed if you prefer to minimize the patch.
>
> Okay, I went through all of the occurrences and compile tested (except
> for #DEBUG).  I united a few things but should do more once I get
> consensus on the approach.  I found a few spots that should be updated as
> well and made the length check more consistent.  Some were doing >= and
> others >.  So this should be better, however, a lot larger.  On the plus side
> when we figure out how to return the longer path length to user land
> that can be more flexible since the kernel is tracking the longer length.
> Probably things to note are changes in:
> 	ZFS to mount snapshot
> 	cd9660 for symlinks
> 	fuse to return full path
> 	jail to check statfs and mount
> 	mount/umount to save and check full path
> 	mountroot to save new field for full path
>
> Just in case it doesn't make it in email the full patch is at:
> 	http://people.freebsd.org/~ambrisko/mount_bigger.patch
>
Yes, this is closer to the patch I can agree with :).

Two notes. The first, already made in the thread, is about the off-by-one.

Second, I suggest making the mnt_path member a char *. The mount point
path is usually quite short, so a 1024-byte buffer is excessive. You
can allocate a buffer of exactly the needed size, which would save
around 10KB of kernel memory even for a relatively modest 10 mounts.

For additional cleverness, mnt_path could point to f_mntonname when
the length is less than MNAMELEN.  Since mp deallocation is centralized,
the code for the trick should not be too hard.

--T+nnW5vHQKf/VlFb
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJSjwrkAAoJEJDCuSvBvK1BTQYP+gI9yGxHBxVh7pn4NBeZjg7e
MEe6URaiyFgSSsQ1HCB5Z6pvyXJGBo2ATZKR5IfD/61e9ZAylsLX8YPR7XBDQxv1
c2JGoBNilVbTZdneqx5eP+AWeTvKVPXt1q6xuBozLZy6xV+E9/P4vk+lBP9/bmVF
/xKtvYX6wsoM3AXCGlajppvRmBTuknkFgeOlCRrExeX4M0VHDWinphxnQt1f9v51
BHUAlmhJv77i0zi9UzU2/QlsKQ+n/dXKWkiobdzsu7anthdbPOLBSSEMkmAqf44F
yRgiv/MXQVdJpS4QXawnarSwdmH7xFw1YXnZk5lb9ysnS3e7EEiT5jlFK+MjQ7EV
JiEnVLm9v26lpBoRatfCHahZ+XGUa8WT1OzRWtCOyc/yCpK10kuRAS3hj5+e4vj8
ZNQtZxpMcJ8T20f5kwthE+cqr/dicja/oPgdoqmfEVODcfGmNWZyb43jSnrQabWp
p3CKJN3wJ52DQ2oF5khc9XTuJS72PM0BlsXDnlB0+tC0oRjNf74HXJzOrbf/OCh7
6NDbwxFK1OjaqPwaiT9b1J30mnqL8IqRbInkNYFHKBie/Q30cxEe9IpoNsNyRd2F
ou8Gwz8hx6R0E/j8m6cRFBeNKF4bej3EEZOgjqtZzLDpyTieZNZFs8cG3Hh15/+5
Jp7ZG8lCHLI2dJEvSNtT
=Hye4
-----END PGP SIGNATURE-----

--T+nnW5vHQKf/VlFb--

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 10:22:44 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id AD393B0E;
 Fri, 22 Nov 2013 10:22:44 +0000 (UTC)
Received: from mail-lb0-x22c.google.com (mail-lb0-x22c.google.com
 [IPv6:2a00:1450:4010:c04::22c])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id CDE7E27FE;
 Fri, 22 Nov 2013 10:22:43 +0000 (UTC)
Received: by mail-lb0-f172.google.com with SMTP id z5so765996lbh.17
 for <multiple recipients>; Fri, 22 Nov 2013 02:22:41 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=F56WM0o7sF1cHguprHuxlEqMssyHnKayRWPaALttQuw=;
 b=myeDvG4mrYhmXeDBWPXGSLOsbjE1E0CktNRcRXVqssRB5oq1jYcgixI8Zh3gv8EXhz
 siSY84JBnabHjqI99IHP1pQJkEAKi+SahSnn1J3CA/Vg2KUVJzYcMrx4NK+Plqw9furG
 8EU8Vw8tfl5+ujF2WALUhopKkll2zcdW0zLHqJOf/oSKBPnbfXP4+XVfUA0GaC5yUC1Q
 Y3rHue95NiM5EowYOsJzB9AkI/B/ApuVh1D8XNUEMNBbXVkju/UK5MBqDML79FzBgVaB
 tpvEoJYmh4nTkTLeZBhUTjuJxrpTDKNsk82E9HVFSuHA7LG06dkSD1GNPa9c6CJ+LCV5
 cr0A==
X-Received: by 10.112.143.3 with SMTP id sa3mr8630181lbb.12.1385115761816;
 Fri, 22 Nov 2013 02:22:41 -0800 (PST)
Received: from [172.29.2.131] (195-248-173-117.static.vega-ua.net.
 [195.248.173.117])
 by mx.google.com with ESMTPSA id vz9sm26433975lbb.17.2013.11.22.02.22.38
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 22 Nov 2013 02:22:40 -0800 (PST)
Message-ID: <528F3062.8040105@gmail.com>
Date: Fri, 22 Nov 2013 12:22:26 +0200
From: Vitaly Magerya <vmagerya@gmail.com>
User-Agent: Thunderbird
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 10:22:44 -0000

On 2013-11-21 23:15, Konstantin Belousov wrote:
> Please try this.
> 
> diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
> [...]
> diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
> [...]

Yeah, applied to 9.2-RELEASE, this fixes the issues I had; thank you.
Will you commit it and will it make its way into 10-RELEASE?

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 11:56:28 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 793A7F32;
 Fri, 22 Nov 2013 11:56:28 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id DFA792F64;
 Fri, 22 Nov 2013 11:56:27 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAMBuI53059924;
 Fri, 22 Nov 2013 13:56:18 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAMBuI53059924
Received: (from kostik@localhost)
 by tom.home (8.14.7/8.14.7/Submit) id rAMBuIBl059923;
 Fri, 22 Nov 2013 13:56:18 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Fri, 22 Nov 2013 13:56:18 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Vitaly Magerya <vmagerya@gmail.com>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <20131122115618.GZ59496@kib.kiev.ua>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
 <528F3062.8040105@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="hwm514xwU++9Zw4g"
Content-Disposition: inline
In-Reply-To: <528F3062.8040105@gmail.com>
User-Agent: Mutt/1.5.22 (2013-10-16)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 11:56:28 -0000


--hwm514xwU++9Zw4g
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Nov 22, 2013 at 12:22:26PM +0200, Vitaly Magerya wrote:
> On 2013-11-21 23:15, Konstantin Belousov wrote:
> > Please try this.
> >
> > diff --git a/lib/libthr/thread/thr_private.h b/lib/libthr/thread/thr_private.h
> > [...]
> > diff --git a/lib/libthr/thread/thr_sig.c b/lib/libthr/thread/thr_sig.c
> > [...]
>
> Yeah, applied to 9.2-RELEASE, this fixes the issues I had; thank you.
> Will you commit it and will it make its way into 10-RELEASE?

Sure, I will commit it after testing.  It is premature to talk about
an MFC before a reasonable testing period in HEAD after the commit.

--hwm514xwU++9Zw4g
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJSj0ZhAAoJEJDCuSvBvK1BPD4P/AwPqZZC7h+Y8qI7u1GFqPku
4E4ePSll/quaDXonal8Kh8xhmzIKmruf1D8UI8rh0xlKt8fwBeaXRqUjJQyMHrmP
hPISOsevAlbN6GhftgqeEGa3v5mcwuz88RRPNxrfV/nRKdd8NRElQtbBVgVkatIr
F5MmOst7CChRrjmt+g5StdzEUXUcfm2togS5gvuxhZukEuWMqz56KZ/20SP7PvAG
A0lP/gbhOAZIlEhQ9/r8hBif/Sld42V7rRVr9PQr3ncAXVuICAcAoduVuhP/r/zH
ZqlSUbuTCBIsCH5dUT/Wcj77VIxV8amYzeAf/kRS8fFGlVOq2/tiTovaiFPpzmMH
CMamm+npBq26sZN3EhUckCmkbvXRWvevhyCTuBTon1rLK4gzI0YaYPx/PucEmsZq
UHus/X2Ude9NWG/yubPgq1M9ZcaWSTxrMAnBreZL+VIJlMgwEZuJPJ+L6hH3we9p
+Zp8Pf8cDFw9UeekKfepYDROKOpQJ3LJhfSyygzER2aDLTgQJ+DHzdUC1AxDa9Z8
TrXSxdQvH7WkPZRlQfPjmXCw2iD7AsfHsiRpIPGlo/rF5eUxBesELGck6OQ+gMx9
j5CXraBBz+uahhRJkP4ERgnDHGiEHOsT3eH9QXI799imd4IodvlC/krDGlBObPcV
GfctHyU6tLHkKgkbiZQ5
=pBj7
-----END PGP SIGNATURE-----

--hwm514xwU++9Zw4g--

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 13:36:09 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 057ED2AC;
 Fri, 22 Nov 2013 13:36:09 +0000 (UTC)
Received: from mx1.stack.nl (unknown [IPv6:2001:610:1108:5012::107])
 (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 89B0324E8;
 Fri, 22 Nov 2013 13:36:08 +0000 (UTC)
Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132])
 by mx1.stack.nl (Postfix) with ESMTP id 060391203C8;
 Fri, 22 Nov 2013 14:35:54 +0100 (CET)
Received: by turtle.stack.nl (Postfix, from userid 1677)
 id D2507CB4E; Fri, 22 Nov 2013 14:35:53 +0100 (CET)
Date: Fri, 22 Nov 2013 14:35:53 +0100
From: Jilles Tjoelker <jilles@stack.nl>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <20131122133553.GA28457@stack.nl>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-hackers@freebsd.org, threads@freebsd.org,
 Vitaly Magerya <vmagerya@gmail.com>, davidxu@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 13:36:09 -0000

On Thu, Nov 21, 2013 at 11:15:46PM +0200, Konstantin Belousov wrote:
> On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote:
> > Hi, folks. I'm investigating a test case failure that devel/boehm-gc
> > has on recent FreeBSD releases. The problem is that a signal
> > handler registered for SIGUSR1 is sometimes called with signum=0,
> > which should not be possible under any conditions.

> > Here's a simple test case that demonstrates this behavior:

> > /* Compile with 'c99 -o example example.c -pthread'
> >  */
> > #include <pthread.h>
> > #include <signal.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > 
> > void signal_handler(int signum, siginfo_t *si, void *context) {
> >     if (signum != SIGUSR1) {
> >         printf("bad signal, signum=%d\n", signum);
> >         exit(1);
> >     }
> > }
> > 
> > void *thread_func(void *arg) {
> >     return arg;
> > }
> > 
> > int main(void) {
> >     struct sigaction sa = { 0 };
> >     sa.sa_flags = SA_SIGINFO;
> >     sa.sa_sigaction = signal_handler;
> >     if (sigfillset(&sa.sa_mask) != 0) abort();
> >     if (sigaction(SIGUSR1, &sa, NULL) != 0) abort();
> >     for (int i = 0; i < 10000; i++) {
> >         pthread_t t;
> >         pthread_create(&t, NULL, thread_func, NULL);
> >         pthread_kill(t, SIGUSR1);
> Side note.  pthread_kill(3) call behaviour is undefined if pthread_create(3)
> in the line before failed.
> 
> >     }
> >     return 0;
> > }

> > Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get
> > "signum=0" from this program, but you may need to run it a few
> > times or increase the number of iterations to see the same.

> > Interestingly enough, I don't see this behavior under 9.0-RELEASE.

This is because the bug was introduced with AVX support. (It also occurs
on systems without AVX.)

> > So, any ideas what the problem here is?

> It happens when libthr deferred signal handling path is taken for signal
> delivery and for some reason the code inside the deferred path called
> into rtld for symbol binding. Then, rtld lock is locked, some code in
> rtld is executed, and rtld lock is unlocked. Unlock causes a _thr_ast()
> run, which results in the nested check_deferred_signal() execution.
> The check_deferred_signal() clears si_signo, so on return the same
> signal is delivered one more time, but is advertised as signo zero.

> The _thr_rtld_init() approach of doing dummy calls does not really work,
> since it is not practically possible to enumerate the symbols needed
> during signal delivery.

> My first attempt to fix this was to increment curthread->critical_count
> around the calls to check_* functions in the _thr_ast(), but it causes
> the reverse problem of losing _thr_ast() runs on unlock.

> I ended up with a flag to indicate that deferred delivery is running,
> so that check_deferred_signal() avoids doing anything. A delicate
> point is that the user signal handler is allowed to modify the passed
> machine context so that returning from the handler causes an
> arbitrary jump, or it may simply call longjmp(). For this case, I also
> clear the flag in thr_sighandler(), since kernel signal delivery means
> that the nested delivery code should not run right now.

This analysis suggests an easier approach: just move the check for
deferred_siginfo.si_signo == 0 downward. If __fillcontextx2 or sysarch
need to be looked up by rtld, the resulting _thr_ast() will invoke the
signal handler and the original call to check_deferred_signal() will do
nothing.

This patch fixes the problem for me on stable/9 and head.

Index: lib/libthr/thread/thr_sig.c
===================================================================
--- lib/libthr/thread/thr_sig.c	(revision 258178)
+++ lib/libthr/thread/thr_sig.c	(working copy)
@@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread)
 	uc_len = __getcontextx_size();
 	uc = alloca(uc_len);
 	getcontext(uc);
-	if (curthread->deferred_siginfo.si_signo == 0)
-		return;
 	__fillcontextx2((char *)uc);
 	act = curthread->deferred_sigact;
 	uc->uc_sigmask = curthread->deferred_sigmask;
 	memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t));
+	if (curthread->deferred_siginfo.si_signo == 0)
+		return;
 	/* remove signal */
 	curthread->deferred_siginfo.si_signo = 0;
 	handle_signal(&act, info.si_signo, &info, uc);

-- 
Jilles Tjoelker

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 14:36:01 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5CF082C4
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 14:36:01 +0000 (UTC)
Received: from mail-ie0-x22c.google.com (mail-ie0-x22c.google.com
 [IPv6:2607:f8b0:4001:c03::22c])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3567C2847
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 14:36:01 +0000 (UTC)
Received: by mail-ie0-f172.google.com with SMTP id qd12so2210414ieb.17
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 06:36:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:date:message-id:subject:from:to:content-type;
 bh=ExcJIENT3SBeWV6IZi8gESC993xfSEbJ/tWEKewj8W8=;
 b=xZwTeXyXPZuuPEcYXP3AyHwBws3WpHk9AmyxvCjJyF5os39FtRxHEzSZ4Pqz0jsNJf
 AGl+ZffxhRghuY80UrZWBM9CeCxpVf+tWVZ1ugutGNLS+t6gJ5RbdCYXhvRzxcDtwKFA
 4zf8ha/UIkdwiReiwHTIu2TAWGhjb/p1+275E8ngHEt90N1WKzxTdKpneVqabKv+vHEW
 eQ5GstIs1jJycyrr+mkligfpmPTJ7QTAm65BYx1wlhoZoCoNQUCUDaEdVf4EUSwS2svk
 +Hu1KM0Cgdnut2fXCOOmTtUuf2P9Kob0MfBq2i/GiIodlmrQ2MT4MYOTr0S5VsOuN5ig
 AnBw==
MIME-Version: 1.0
X-Received: by 10.50.238.196 with SMTP id vm4mr2617364igc.43.1385130960628;
 Fri, 22 Nov 2013 06:36:00 -0800 (PST)
Received: by 10.50.225.70 with HTTP; Fri, 22 Nov 2013 06:36:00 -0800 (PST)
Date: Fri, 22 Nov 2013 15:36:00 +0100
Message-ID: <CALXu0UfCTC-3j4CR=XUiGiSNvZ5Eg5FoQJMyGmu=YuX9ZSvy3A@mail.gmail.com>
Subject: O_XATTR support in FreeBSD?
From: Cedric Blancher <cedric.blancher@gmail.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 14:36:01 -0000

Are there plans to support O_XATTR in FreeBSD anytime soon? Our
applications depend heavily on it (both through NFSv4 and ZFS) and we
may need an alternative to Solaris soon.

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 15:21:06 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A9E8EFD1;
 Fri, 22 Nov 2013 15:21:06 +0000 (UTC)
Received: from mail-la0-x232.google.com (mail-la0-x232.google.com
 [IPv6:2a00:1450:4010:c03::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id D39602AEE;
 Fri, 22 Nov 2013 15:21:05 +0000 (UTC)
Received: by mail-la0-f50.google.com with SMTP id el20so1078928lab.9
 for <multiple recipients>; Fri, 22 Nov 2013 07:21:04 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=xy+5XmsAdvZzZrW1Wqwxo9XR1IpG1jW73LOi83n/Ges=;
 b=cMTB7LLUIrYQIu0u69NufN+d4Spfy5iRxyMyMmBRWfVJdBb0S5c7s/QDRteiO5YlUF
 P992K7n6ghyRU2Zr0BhSq56MmgpvWRbM95FLzsSKAsDByx8K5q8zAna77mgbYei1C7Ao
 qdzTR8oYIV4C05px9refgM/o7RzzLuIUE7YWetdfrD2+Pm/elqM6toVCsdkZNJ0H2HHd
 rtSR7FGHZUSvTBPiqw1DMW2TO9nIfCvYShzzfLnpslM77ikjEDB8/x4bMeEVd9Ki5UG5
 CvOiwqC51xuKz6WMmcYdI5pl+Ke5egDF1+6D3z+lUZFtAtVHr7l0QivaQyOq28fjJa9v
 R4Cg==
X-Received: by 10.152.115.230 with SMTP id jr6mr1318172lab.45.1385133663854;
 Fri, 22 Nov 2013 07:21:03 -0800 (PST)
Received: from [172.16.0.2] (tx97.net. [85.198.160.156])
 by mx.google.com with ESMTPSA id k3sm27320892lbs.0.2013.11.22.07.21.01
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 22 Nov 2013 07:21:02 -0800 (PST)
Message-ID: <528F765A.8040306@gmail.com>
Date: Fri, 22 Nov 2013 17:20:58 +0200
From: Vitaly Magerya <vmagerya@gmail.com>
User-Agent: Thunderbird
MIME-Version: 1.0
To: Jilles Tjoelker <jilles@stack.nl>, 
 Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
 <20131122133553.GA28457@stack.nl>
In-Reply-To: <20131122133553.GA28457@stack.nl>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, davidxu@freebsd.org, threads@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 15:21:06 -0000

On 11/22/2013 15:35, Jilles Tjoelker wrote:
> This patch fixes the problem for me on stable/9 and head.
> 
> Index: lib/libthr/thread/thr_sig.c
> ===================================================================
> --- lib/libthr/thread/thr_sig.c	(revision 258178)
> +++ lib/libthr/thread/thr_sig.c	(working copy)
> @@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread)
>  	uc_len = __getcontextx_size();
>  	uc = alloca(uc_len);
>  	getcontext(uc);
> -	if (curthread->deferred_siginfo.si_signo == 0)
> -		return;
>  	__fillcontextx2((char *)uc);
>  	act = curthread->deferred_sigact;
>  	uc->uc_sigmask = curthread->deferred_sigmask;
>  	memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t));
> +	if (curthread->deferred_siginfo.si_signo == 0)
> +		return;
>  	/* remove signal */
>  	curthread->deferred_siginfo.si_signo = 0;
>  	handle_signal(&act, info.si_signo, &info, uc);
> 

I can confirm that this also solves the problems I'm seeing.

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 16:57:27 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3C2E5578
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 16:57:27 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1FF8C20C3
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 16:57:26 +0000 (UTC)
Received: from [192.168.1.2] (pool-173-52-87-124.nycmny.fios.verizon.net
 [173.52.87.124])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested) (Authenticated sender: ryao)
 by smtp.gentoo.org (Postfix) with ESMTPSA id 4C19333DA86;
 Fri, 22 Nov 2013 16:57:19 +0000 (UTC)
Message-ID: <528F8CFE.9030709@gentoo.org>
Date: Fri, 22 Nov 2013 11:57:34 -0500
From: Richard Yao <ryao@gentoo.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:17.0) Gecko/20130925 Thunderbird/17.0.9
MIME-Version: 1.0
To: Cedric Blancher <cedric.blancher@gmail.com>
Subject: Re: O_XATTR support in FreeBSD?
References: <CALXu0UfCTC-3j4CR=XUiGiSNvZ5Eg5FoQJMyGmu=YuX9ZSvy3A@mail.gmail.com>
In-Reply-To: <CALXu0UfCTC-3j4CR=XUiGiSNvZ5Eg5FoQJMyGmu=YuX9ZSvy3A@mail.gmail.com>
X-Enigmail-Version: 1.5.2
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="VS5RI4F2qQipReN3gmqCg985gMP6rel7c"
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 16:57:27 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--VS5RI4F2qQipReN3gmqCg985gMP6rel7c
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 11/22/2013 09:36 AM, Cedric Blancher wrote:
> Are there plans to support O_XATTR in FreeBSD anytime soon? Our
> applications depend heavily on it (both through NFSv4 and ZFS) and we
> may need an alternative to Solaris soon.
>
> Ced
>

There is always OmniOS:

http://omnios.omniti.com/

That being said, do you mean that FreeBSD's ZFS implementation lacks
xattr support? ZFSOnLinux supports that, so I suppose that is another
option should you mean what I think you mean.


--VS5RI4F2qQipReN3gmqCg985gMP6rel7c
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSj40DAAoJECDuEZm+6ExknsgP/0vv7ZA+MbgRa7nI32GQQqtm
RvswloXlRSYMHWO/uQWMumvqAL+cwvXzLStx/AxFqrYEoRoZ73FL6n0ts9RoVlsX
6AagYioLA4EBaCJuszrqhc0ZvWkLdS0EKbcHBw9DUn0b2uzkkIOcPU7LNvvqAr6B
r+q0VirmIRfbRMWc0acMeRS9FqV40QT+TZOpvCF2U4eWkCo7CHji+belk1NdXVDa
bwk8b6aPfstgmAFC0ZVNdwp2AbKUSNDdVQV1+ZgIaSV4D+ctFIuPoIvknV83qmmt
l/etlF71bdYz6lMYkI1KJi1jD1W/MpIzUP6eXEVsRd6crsq81BkBl6tRckiaLE0N
6sesKVJKTYrDyb2LvhCa36Xuug4U//LzsBSkUz9ssLEfpY9r4fnt5e7yMolxqDbe
8H9IvGv3XbJAQfL10kIIsYeiNjixh7ZVfCixS0vpCYzND1ODRnZPK9oddrmlq2dk
AyACKJ9kuaHjJnHvjoj2ZVBFQcsMWpvk5ilhdKxBtfqkTrkbIRcKRpmPzBQoZdvR
tNPYF1wFjRa7//rifDkJklv5N+t0qlRwdzCw0QNGUBq0mbepRRVREzIEfHlkvLxt
3cLUFHpkjTg3Fr+K5tvEd5bM1cUgRturyU78eOb+tieFmmSPKj+hyBstqISfTPht
tLoVqBLhITkzT/RHncHj
=hsll
-----END PGP SIGNATURE-----

--VS5RI4F2qQipReN3gmqCg985gMP6rel7c--

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 17:04:20 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A223A6F8;
 Fri, 22 Nov 2013 17:04:20 +0000 (UTC)
Received: from mail.ambrisko.com (mail.ambrisko.com [70.91.206.90])
 by mx1.freebsd.org (Postfix) with ESMTP id 7A5442124;
 Fri, 22 Nov 2013 17:04:20 +0000 (UTC)
X-Ambrisko-Me: Yes
Received: from server2.ambrisko.com (HELO internal.ambrisko.com)
 ([192.168.1.2])
 by ironport.ambrisko.com with ESMTP; 22 Nov 2013 09:08:12 -0800
Received: from ambrisko.com (localhost [127.0.0.1])
 by internal.ambrisko.com (8.14.4/8.14.4) with ESMTP id rAMH4Ji4070672;
 Fri, 22 Nov 2013 09:04:19 -0800 (PST)
 (envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
 by ambrisko.com (8.14.4/8.14.4/Submit) id rAMH4JRa070670;
 Fri, 22 Nov 2013 09:04:19 -0800 (PST) (envelope-from ambrisko)
Date: Fri, 22 Nov 2013 09:04:19 -0800
From: Doug Ambrisko <ambrisko@ambrisko.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Re: Fix MNAMELEN or reimplement struct statfs
Message-ID: <20131122170419.GA60910@ambrisko.com>
References: <201306101152.17966.jhb@freebsd.org>
 <52854161.6080104@FreeBSD.org> <20131115010854.GA76106@ambrisko.com>
 <20131116183129.GD59496@kib.kiev.ua> <20131118190142.GA28210@ambrisko.com>
 <20131119074922.GY59496@kib.kiev.ua> <20131119174216.GA80753@ambrisko.com>
 <20131120075531.GE59496@kib.kiev.ua> <20131121174028.GA80520@ambrisko.com>
 <20131122074228.GT59496@kib.kiev.ua>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131122074228.GT59496@kib.kiev.ua>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-hackers@freebsd.org, Dirk Engling <erdgeist@erdgeist.org>,
 Jase Thew <jase@freebsd.org>, mdf@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 17:04:20 -0000

On Fri, Nov 22, 2013 at 09:42:28AM +0200, Konstantin Belousov wrote:
| On Thu, Nov 21, 2013 at 09:40:28AM -0800, Doug Ambrisko wrote:
| > On Wed, Nov 20, 2013 at 09:55:31AM +0200, Konstantin Belousov wrote:
| > | On Tue, Nov 19, 2013 at 09:42:16AM -0800, Doug Ambrisko wrote:
| > | > I was talking about the more general case, since the system tries to keep
| > | > the path in the stat structure.  My prior approach, which had more issues,
| > | > was to modify the stat structure; for that I was pointed to NetBSD and their
| > | > change to statvfs, which doesn't really solve the problem.  They don't
| > | > check whether the mount path is longer than VFS_MNAMELEN (in their case)
| > | > and just truncate things.
| > | > 
| > | > If we are just talking about adding it to the mount structure that
| > | > would be okay since it isn't exposed to user land.  I can add that.
| > |
| > | Yes, this is exactly what I mean.  Add a struct mount field, and use
| > | it for kernel only.  In fact, it only matters for sys_unmount() and
| > | kern_jail.c, other locations in kernel use the path for warnings, and
| > | this could be postponed if you prefer to minimize the patch.
| > 
| > Okay, I went through all of the occurrences and compile tested (except
| > for #DEBUG).  I unified a few things but should do more once I get
| > consensus on the approach.  I found a few spots that should be updated as
| > well and made the length check more consistent: some were doing >= and
| > others >.  So this should be better, however, a lot larger.  On the plus
| > side, when we figure out how to return the longer path length to userland,
| > that can be more flexible since the kernel is tracking the longer length.
| > Things to note are probably the changes in:
| > 	ZFS to mount snapshot
| > 	cd9660 for symlinks
| > 	fuse to return full path
| > 	jail to check statfs and mount
| > 	mount/umount to save and check full path
| > 	mountroot to save new field for full path
| > 	
| > Just in case it doesn't make it in email the full patch is at:
| > 	http://people.freebsd.org/~ambrisko/mount_bigger.patch
| > 
| Yes, this is closer to the patch I can agree with :).
| 
| Two notes.  The first, already made, is about the off-by-one.

I want to revisit the off-by-one so that it is consistent.  There are
places with the check
	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)
and
	if (strlen(fstype) >= MFSNAMELEN - 1 || strlen(fspath) >= MNAMELEN - 1)
both with the same comment of "Be ultra-paranoid".  Unless something is
special they should have been the same and whatever is right should be
carried forward.  If there is a special case then it should be clearly
commented.  Since this check has moved into other code we need to get
it hashed out once and for all IMHO.  I mainly did this current change
to make sure attention is drawn to this for now until it is resolved.
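A rough sketch of a single shared check follows.  It assumes the rule is
that a name plus its terminating NUL must fit the fixed buffer.
fits_in_buffer(), mount_names_fit() and the *_SKETCH constants are
hypothetical.  The values 16 and 88 are assumed to match MFSNAMELEN and
MNAMELEN in <sys/mount.h> of this era.  Under that rule, rejecting
strlen() >= size is the exact bound.  The "- 1" variant additionally
rejects the one length that would still fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed to mirror MFSNAMELEN and MNAMELEN from <sys/mount.h>. */
#define MFSNAMELEN_SKETCH	16
#define MNAMELEN_SKETCH		88

/*
 * A string of length len occupies len + 1 bytes including its NUL, so it
 * fits a buffer of bufsize bytes exactly when len < bufsize.
 */
static bool
fits_in_buffer(const char *s, size_t bufsize)
{
	assert(s != NULL);
	return (strlen(s) < bufsize);
}

/* Hypothetical helper that every fstype/fspath length check could share. */
static bool
mount_names_fit(const char *fstype, const char *fspath)
{
	return (fits_in_buffer(fstype, MFSNAMELEN_SKETCH) &&
	    fits_in_buffer(fspath, MNAMELEN_SKETCH));
}
```

Having one helper would also give the "Be ultra-paranoid" comment a
single place to live.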
 
| Second, I suggest making the mnt_path member a char *. Usually the
| mount point path is quite short, so a 1024-byte buffer is
| excessive. You can allocate exactly the needed buffer, which would save
| around 10KB of kernel memory even for a relatively modest 10
| mounts.

Okay, I thought you wanted it to be a const char to potentially guard
against misuse of the field, since this should be a read-only value.
I had actually planned to do the malloc, since I was concerned that
if this structure got allocated on the stack it could blow out the
kernel's stack.  It seems most of the consumers access the mount
structure through a pointer, so I wasn't as concerned.
 
| For additional cleverness, mnt_path could point to f_mntonname when
| the length is less than MNAMELEN.  Since mp deallocation is centralized,
| code for the trick should not be too hard.

I'll see if I can change the other places that update mnt_path
to use a vfs_mount_alloc-type function, since then we could get more
sophisticated about the mnt_path allocator/reference as you mention.
In nfs_mounroot.c it probably doesn't matter much since it should be a
short path but it could be more of an issue with zfs snapshots.
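Below is a rough userland sketch of the suggested trick.  It assumes
f_mntonname keeps its fixed size and that statfs consumers get a truncated
copy when the full path does not fit.  struct mount_sketch,
mount_set_path() and mount_free_path() are hypothetical names, and
malloc(3) stands in for the kernel's malloc(9).  For scale: ten mounts
with a fixed 1024-byte buffer each cost 10 * 1024 = 10240 bytes.  Here,
short paths cost nothing extra.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MNAMELEN_SKETCH	88	/* assumed size of f_mntonname[] */

/* Hypothetical stand-in for the two struct mount fields involved. */
struct mount_sketch {
	char	 f_mntonname[MNAMELEN_SKETCH];	/* fixed buffer seen by statfs */
	char	*mnt_path;			/* full path, kernel-only */
};

/*
 * Record the mount point (call once per mount).  A path that fits reuses
 * f_mntonname, so nothing is allocated; a longer one gets an exactly sized
 * buffer and f_mntonname keeps a truncated, NUL-terminated copy.
 * Returns 0 on success, -1 if allocation fails.
 */
static int
mount_set_path(struct mount_sketch *mp, const char *path)
{
	size_t len;

	assert(mp != NULL && path != NULL);
	len = strlen(path);
	if (len < sizeof(mp->f_mntonname)) {
		memcpy(mp->f_mntonname, path, len + 1);
		mp->mnt_path = mp->f_mntonname;
		return (0);
	}
	mp->mnt_path = malloc(len + 1);
	if (mp->mnt_path == NULL)
		return (-1);
	memcpy(mp->mnt_path, path, len + 1);
	memcpy(mp->f_mntonname, path, sizeof(mp->f_mntonname) - 1);
	mp->f_mntonname[sizeof(mp->f_mntonname) - 1] = '\0';
	return (0);
}

/* Centralized teardown: free only what was actually allocated. */
static void
mount_free_path(struct mount_sketch *mp)
{
	if (mp->mnt_path != mp->f_mntonname)
		free(mp->mnt_path);
	mp->mnt_path = NULL;
}
```

Because teardown is in one place, the pointer comparison in
mount_free_path() is the only code that needs to know about the trick.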

It looks like we are converging.  I'll make some more changes to make
sure we are on a good path and post another patch.  Once that looks
okay in concept, I'll start looking into testing the various file
systems since, unfortunately, it touches a lot of code even though it is
mostly mechanical.  I don't have a lot of time to work on this, so I
want to optimize various things at once.  If someone can help unit test
corner cases with the various file systems, that would be great.  At least
I have VirtualBox netbooting, so I can test things quicker.  However,
that required some debugging and changes to pxeboot to send the Client ID
so isc-dhcpd didn't get upset with it.  I need to check that this doesn't
break the non-iPXE boot path, which doesn't require the Client ID field to
be set.  I've only run into this issue with iPXE in VirtualBox and QEMU.
I also have some PXE boot robustness and caching fixes that I should
get in as well.

Thanks,

Doug A.

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 18:11:24 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C806BB25
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 18:11:24 +0000 (UTC)
Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com
 [66.111.4.25])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 814372508
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 18:11:24 +0000 (UTC)
Received: from compute1.internal (compute1.nyi.mail.srv.osa [10.202.2.41])
 by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id 4285B21362;
 Fri, 22 Nov 2013 13:11:21 -0500 (EST)
Received: from frontend1 ([10.202.2.160])
 by compute1.internal (MEProxy); Fri, 22 Nov 2013 13:11:21 -0500
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=
 daniel.shahaf.name; h=date:from:to:subject:message-id
 :mime-version:content-type; s=mesmtp; bh=M56sLF09ustEm9n5OiQUYQ4
 jFAA=; b=uogmQmjyoxFNCltHWwy4QXbWx3k65BDisfxEf0ijYnGZ0WqYfeHzbCC
 9XxeC33H6JP/CNwi9QNzxQ+rNeCUsM6kyHgg9WTudBT872+A3zE+X65D4LcCn0vS
 UJ52hMRkJlXpKFlStTtVsXb2GdLJJGhF3w+jY7VlyvdVBZDjnP2c=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=
 messagingengine.com; h=date:from:to:subject:message-id
 :mime-version:content-type; s=smtpout; bh=M56sLF09ustEm9n5OiQUYQ
 4jFAA=; b=fBSU18CeSvt9OJWjJxjpj06Kp6wRcdexrGFXj2MWrW26o5oShawyQu
 X4VSp4igFSJHi0R0AoXlI0KgK1kaufHxrpEws1+D7+5/GQuYOXjHyGxD9ccIRg8j
 nELa8pDPH+dU5jdR4PABYWFfGUYbHPFnBjOho2SpUw+bMiMj9AgEs=
X-Sasl-enc: dq1Pj/X6+MljBSKBTaqnXWESxretVpGdzg2YJ3eP8NVP 1385143880
Received: from tarsus.local2 (unknown [46.19.33.46])
 by mail.messagingengine.com (Postfix) with ESMTPA id 6C2C5C00E83;
 Fri, 22 Nov 2013 13:11:20 -0500 (EST)
Date: Fri, 22 Nov 2013 20:11:10 +0200
From: Daniel Shahaf <d.s@daniel.shahaf.name>
To: freebsd-hackers@freebsd.org
Subject: 'freebsd-update cron' repeatedly announcing 9.1-RELEASE-p8
Message-ID: <20131122181110.GA29056@tarsus.local2>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 18:11:24 -0000

This cron job:
0 3 * * * /usr/sbin/freebsd-update cron
emails me nightly with a request to update to 9.1-RELEASE-p8.

But I don't need the -p8 fixes in my environment, so that nightly mail
is just clutter in my inbox, and would make it harder for me to notice
-p9 when that is announced.

So I added a freebsd-update.conf(5) knob to allow suppressing the email
if it's for a given release.  See attachment.  The intended use is to
set the knob to "9.1-RELEASE-p8" and then, when I start getting mails
about -p9, either install -p9 or update the knob's value to -p9.

Daniel

Index: etc/freebsd-update.conf
===================================================================
--- etc/freebsd-update.conf	(revision 258471)
+++ etc/freebsd-update.conf	(working copy)
@@ -74,3 +74,7 @@ MergeChanges /etc/ /boot/device.hints
 
 # When backing up a kernel also back up debug symbol files?
 # BackupKernelSymbolFiles no
+
+# If the new release matches one of the given regular expressions, don't
+# emit an email announcing it.  (Default: unspecified)
+# IgnoreReleases 9.1-RELEASE-p8
Index: share/man/man5/freebsd-update.conf.5
===================================================================
--- share/man/man5/freebsd-update.conf.5	(revision 258471)
+++ share/man/man5/freebsd-update.conf.5	(working copy)
@@ -218,6 +218,13 @@ backup kernel, the
 .Cm freebsd-update
 rollback command will recreate the symbol files along with the old
 kernel.
+.It Cm IgnoreReleases
+The parameters following this keyword are regular expressions;
+if the new release matches one of them, it will be ignored by
+.Cm cron .
+.Pp
+This option can be specified multiple times, and the parameters
+accumulate.
 .El
 .Sh FILES
 .Bl -tag -width "/etc/freebsd-update.conf"
Index: usr.sbin/freebsd-update/freebsd-update.sh
===================================================================
--- usr.sbin/freebsd-update/freebsd-update.sh	(revision 258471)
+++ usr.sbin/freebsd-update/freebsd-update.sh	(working copy)
@@ -88,6 +88,7 @@ EOF
 CONFIGOPTIONS="KEYPRINT WORKDIR SERVERNAME MAILTO ALLOWADD ALLOWDELETE
     KEEPMODIFIEDMETADATA COMPONENTS IGNOREPATHS UPDATEIFUNMODIFIED
     BASEDIR VERBOSELEVEL TARGETRELEASE STRICTCOMPONENTS MERGECHANGES
+    IGNORERELEASE
     IDSIGNOREPATHS BACKUPKERNEL BACKUPKERNELDIR BACKUPKERNELSYMBOLFILES"
 
 # Set all the configuration options to "".
@@ -217,6 +218,13 @@ config_Components () {
 	done
 }
 
+# Add to the list of releases whose updates will be ignored.
+config_IgnoreReleases () {
+	for C in $@; do
+		IGNORERELEASE="${IGNORERELEASE} ${C}"
+	done
+}
+
 # Add to the list of paths under which updates will be ignored.
 config_IgnorePaths () {
 	for C in $@; do
@@ -2086,6 +2094,21 @@ fetch_run () {
 	fetch_warn_eol || return 1
 }
 
+# If no updates are needed, or the available release matches an
+# IgnoreReleases pattern, return true.  Else, return false.
+cron_suppress_mail() {
+	TMPFILE=$1
+	if grep -q "No updates needed" ${TMPFILE}; then
+		return 0
+	fi
+	for X in ${IGNORERELEASE}; do
+		if echo "${RELNUM}-p${RELPATCHNUM}" | grep -q "${X}"; then
+			return 0
+		fi
+	done
+	return 1
+}
+
 # If StrictComponents is not "yes", generate a new components list
 # with only the components which appear to be installed.
 upgrade_guess_components () {
@@ -3199,7 +3222,7 @@ cmd_cron () {
 
 	TMPFILE=`mktemp /tmp/freebsd-update.XXXXXX` || exit 1
 	if ! fetch_run >> ${TMPFILE} ||
-	    ! grep -q "No updates needed" ${TMPFILE} ||
+	    ! cron_suppress_mail ${TMPFILE} ||
 	    [ ${VERBOSELEVEL} = "debug" ]; then
 		mail -s "`hostname` security updates" ${MAILTO} < ${TMPFILE}
 	fi
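The suppression decision in the patch can be sketched in C.  The sketch
assumes the same semantics as the shell loop: POSIX basic regular
expressions and an unanchored search, as with plain `grep -q`.
release_is_ignored() and suppress_mail() are hypothetical names.  One
consequence of unanchored patterns: IgnoreReleases 9.1-RELEASE-p8 also
matches a later 9.1-RELEASE-p80, and "." matches any character.  An
anchored, escaped pattern such as ^9\.1-RELEASE-p8$ avoids both problems.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if release (for example "9.1-RELEASE-p8") matches any pattern.
 * Like `grep -q PATTERN`, this uses POSIX basic regular expressions and
 * an unanchored search.  A pattern that fails to compile counts as a
 * non-match, which is how the shell `if` treats grep's error status.
 */
static bool
release_is_ignored(const char *release, const char *const *patterns,
    size_t npatterns)
{
	for (size_t i = 0; i < npatterns; i++) {
		regex_t re;
		int rc;

		if (regcomp(&re, patterns[i], REG_NOSUB) != 0)
			continue;
		rc = regexec(&re, release, 0, NULL, 0);
		regfree(&re);
		if (rc == 0)
			return (true);
	}
	return (false);
}

/* Same decision as cron_suppress_mail(): stay quiet if nothing is new
 * or the offered release is on the ignore list. */
static bool
suppress_mail(bool no_updates_needed, const char *release,
    const char *const *patterns, size_t npatterns)
{
	return (no_updates_needed ||
	    release_is_ignored(release, patterns, npatterns));
}
```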

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 18:39:53 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 117E0145;
 Fri, 22 Nov 2013 18:39:53 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 842552652;
 Fri, 22 Nov 2013 18:39:52 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id rAMIdgQ8044050;
 Fri, 22 Nov 2013 20:39:42 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua rAMIdgQ8044050
Received: (from kostik@localhost)
 by tom.home (8.14.7/8.14.7/Submit) id rAMIdgnB044049;
 Fri, 22 Nov 2013 20:39:42 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Fri, 22 Nov 2013 20:39:42 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Jilles Tjoelker <jilles@stack.nl>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <20131122183942.GB59496@kib.kiev.ua>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
 <20131122133553.GA28457@stack.nl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="rmx1G5GNWS01lHd9"
Content-Disposition: inline
In-Reply-To: <20131122133553.GA28457@stack.nl>
User-Agent: Mutt/1.5.22 (2013-10-16)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: freebsd-hackers@freebsd.org, threads@freebsd.org,
 Vitaly Magerya <vmagerya@gmail.com>, davidxu@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 18:39:53 -0000


--rmx1G5GNWS01lHd9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Nov 22, 2013 at 02:35:53PM +0100, Jilles Tjoelker wrote:
> This analysis suggests an easier approach: just move the check for
> deferred_siginfo.si_signo == 0 downward. If __fillcontextx2 or sysarch
> need to be looked up by rtld, the resulting _thr_ast() will invoke the
> signal handler and the original call to check_deferred_signal() will do
> nothing.
>
> This patch fixes the problem for me on stable/9 and head.
>
> Index: lib/libthr/thread/thr_sig.c
> ===================================================================
> --- lib/libthr/thread/thr_sig.c	(revision 258178)
> +++ lib/libthr/thread/thr_sig.c	(working copy)
> @@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread)
>  	uc_len = __getcontextx_size();
>  	uc = alloca(uc_len);
>  	getcontext(uc);
> -	if (curthread->deferred_siginfo.si_signo == 0)
> -		return;
>  	__fillcontextx2((char *)uc);
>  	act = curthread->deferred_sigact;
>  	uc->uc_sigmask = curthread->deferred_sigmask;
>  	memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t));
> +	if (curthread->deferred_siginfo.si_signo == 0)
> +		return;
>  	/* remove signal */
>  	curthread->deferred_siginfo.si_signo = 0;
>  	handle_signal(&act, info.si_signo, &info, uc);
>

I do not like this. It is similar to what I did initially when I
debugged the problem, but the duplicated calls to getcontext(2) and
sysarch(2) stood out as a sore spot in ktrace. I also do not like the
fact that, with the change, the signal is delivered from an rtld context.

If we take that road, the fix would be to add __fillcontextx2() to
_rtld_init(), but I described the reason for the other fix in the initial
response.

--rmx1G5GNWS01lHd9
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJSj6TtAAoJEJDCuSvBvK1BOSkQAIgX0yy3jpTylGEV1X5BfvRt
SkbpN+JlzSgUTMKGrnA0qt03SQE2JZp9rHS+b8qPEgDuXG/P76pz10rqcMF+3wv3
4Xs9yiv0r4kRv9Blw7d5tvsXi1HH9sF8hPmj2TbL2rJ1qOv4hacg5LLvocyZZ4oz
yyL5WRB6XwQTW3Ax8BXSMuxLvHA4P2PAQ6CxG2283O1WQrOHELroLGTeS1nCvjaI
irefCxx5lXWS3HYi6NxkV6MWIBYI7e57tLZNAKJnF5FDT8bWw/0hqR1/8Jpp/80Y
vEs/56f1yNzJibzTS84NmZ5iW5KsKC4NR/Oq3AyRgZQ65C6Du2oOyHgjDW7o6a+i
JznvcXVGA4TlF0m2e0zoXAhG0uHtxKZaHeDm8MBrR2ghZY2w1o2IHxIW944yzzY4
wkHT3i2WsMVkpPqyIMr2Zb4Z/tKf9bnthk3K3+JnTbSJDnvpzU2xIU3B1iosmXM2
GRKBCwzD36MzJ0MBZWbSWtpdJZDcS+qZVyJviq3TKsqd0Tfbr+08LtkXJ8w+3gDV
de4RMbNc9cqN9hq+mvvTxdZKUd4nFYuwZXx0qyUxZequ16tYpUfAXlnaVco6vYAS
5fFc1ztq+lVhjkLnGeW+SE1q4Alju6cgAnf25XUo+7W3ZEtC+DxXnrtuptpzcJ3X
XZLeWMJ+5fuwTle9w9SP
=pWVY
-----END PGP SIGNATURE-----

--rmx1G5GNWS01lHd9--

From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 19:57:23 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A48E92FB
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 19:57:23 +0000 (UTC)
Received: from nm14-vm0.bullet.mail.bf1.yahoo.com
 (nm14-vm0.bullet.mail.bf1.yahoo.com [98.139.213.164])
 by mx1.freebsd.org (Postfix) with SMTP id 4AB072AA1
 for <freebsd-hackers@freebsd.org>; Fri, 22 Nov 2013 19:57:22 +0000 (UTC)
Received: from [98.139.215.141] by nm14.bullet.mail.bf1.yahoo.com with NNFMP;
 22 Nov 2013 19:55:40 -0000
Received: from [98.139.211.198] by tm12.bullet.mail.bf1.yahoo.com with NNFMP;
 22 Nov 2013 19:55:40 -0000
Received: from [127.0.0.1] by smtp207.mail.bf1.yahoo.com with NNFMP;
 22 Nov 2013 19:55:40 -0000
X-Yahoo-Newman-Id: 790450.3025.bm@smtp207.mail.bf1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: r58nt9YVM1ni.HUjy_PIn9MN7RJhHYNfSWZmMN6SyQ6ydl4
 wFlPqwcSkK74eTllL4gC.FosZNa6Ne8WNJiaFMaVdMOC1VrPdWUT1GcoJzAe
 TWnsmJAKfmU6UVAZVl36dXze50jJPe9c3JUfWkQk86BviiFteZY3x9AQcHpP
 3hg7XFqTzmnrklF6v_ODw2xloYmS4.zgGGsBp6vNTlgVHgw117oktdm0nN.1
 2UAdrCh1MOofWiX4FHKwXUMBzj2vkA88AV4u_6nnHGh9RfQma45i8izUWFHi
 R714ofTMkDDgdkXnrg2vJ5DcjTTIx_MvrFOmUR4Vv9eC2zdD0eEEzHpTotHa
 JvNm11NxxI847qJV.6hTFrgTnslD4UZoAJ5Fg095ZSIqaNoT.puLDdFN4yqG
 .iBMVdV5gq9vb5dNgMhXlRlNqDgeqSj71kXyck6TitCSCdmZIxEgvE6Jbcym
 IaAc10yXdvmKEdNFpuu0N0SzdnzH21M4bpXeUD3ijdYYKCFiHmGnCLFOWXI6
 eTw8Llumn4A2UfnCP0lswmYR.E1E57ShPMfiXiVFrWjJX8VGDbgThrPis8Gx
 7wZRdDQdT2RrqAqdOT.Gq2WYoxKC7ZD2LfHyPCVDTqAgfeQDsLygOQ9Sw6cV C
X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf
X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with )
 by smtp207.mail.bf1.yahoo.com with SMTP; 22 Nov 2013 11:55:40 -0800 PST
Message-ID: <528FB6BA.7040606@FreeBSD.org>
Date: Fri, 22 Nov 2013 14:55:38 -0500
From: Pedro Giffuni <pfg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: Cedric Blancher <cedric.blancher@gmail.com>
Subject: Re: O_XATTR support in FreeBSD?
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Freebsd hackers list <freebsd-hackers@freebsd.org>,
 Richard Yao <ryao@gentoo.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 19:57:23 -0000

Well ...

According to:

https://wiki.freebsd.org/ZFS

We do support Extended Attributes on ZFS but they differ from the ones 
in Solaris (and Linux).

Pedro.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Nov 23 07:13:15 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CFC8F2AC;
 Sat, 23 Nov 2013 07:13:15 +0000 (UTC)
Received: from mail-ie0-x22b.google.com (mail-ie0-x22b.google.com
 [IPv6:2607:f8b0:4001:c03::22b])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9CD7E2950;
 Sat, 23 Nov 2013 07:13:15 +0000 (UTC)
Received: by mail-ie0-f171.google.com with SMTP id ar20so3723042iec.16
 for <multiple recipients>; Fri, 22 Nov 2013 23:13:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=lGE/e4u6r+17h/nBlfDBcfeLo8hbMup9bNGaNmmBUro=;
 b=PRNFBz+DureTz68lxZLcOcZl9PIIEa+akVmTBlsVS4oc4c2AzNdQOiYdhNp2BOsVt0
 qHWpajdaxmKRVn3zvBlSqxD6DKWEXfMzWSjt9BiRQXwlHcwRpkKIf5g61coJ01xiHxKu
 2MIjU5X+eJB/s0x9AJd84cdNb5SVWzt4+dcPOA5rEQpgf3wVZ6Zlo1B3GRuuwHNpCB+q
 CATEFBpWCKzOQG4o123lNxEDKpHqEQkv5wxohlZHS/V3iHdHoGlSex/7JvFXkrAr/Z5E
 tyT/2B6wXCMW+n2lRXrE3L2xb2C+LPqDVE4dKIkoHjLD/sDwb+kwDdNkKRsbspbprtLn
 L4BQ==
MIME-Version: 1.0
X-Received: by 10.50.50.169 with SMTP id d9mr5517335igo.28.1385190794938; Fri,
 22 Nov 2013 23:13:14 -0800 (PST)
Received: by 10.50.225.70 with HTTP; Fri, 22 Nov 2013 23:13:14 -0800 (PST)
In-Reply-To: <528FB6BA.7040606@FreeBSD.org>
References: <528FB6BA.7040606@FreeBSD.org>
Date: Sat, 23 Nov 2013 08:13:14 +0100
Message-ID: <CALXu0UeDprsYibGoUQ-k_R-3_X6WoVkC_E9_9BkBwBo6aB-Zbg@mail.gmail.com>
Subject: Re: O_XATTR support in FreeBSD?
From: Cedric Blancher <cedric.blancher@gmail.com>
To: Pedro Giffuni <pfg@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Freebsd hackers list <freebsd-hackers@freebsd.org>,
 Richard Yao <ryao@gentoo.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Nov 2013 07:13:15 -0000

On 22 November 2013 20:55, Pedro Giffuni <pfg@freebsd.org> wrote:
> Well ...
>
> According to:
>
> https://wiki.freebsd.org/ZFS
>
> We do support Extended Attributes on ZFS but they differ from the ones in
> Solaris (and Linux).

Well, we need the kind specified in the NFSv4 standard. Linux
extended attributes are pretty much useless to us because they are
size-restricted (a typical attribute here is in the GB range, and NIH
and CERN, for example, have much bigger ones), can't be accessed
like normal files, and are incompatible with Windows' Alternate Data Streams.

Ced
-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

From owner-freebsd-hackers@FreeBSD.ORG  Sat Nov 23 14:08:09 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A552A39B
 for <freebsd-hackers@freebsd.org>; Sat, 23 Nov 2013 14:08:09 +0000 (UTC)
Received: from nm27-vm1.bullet.mail.bf1.yahoo.com
 (nm27-vm1.bullet.mail.bf1.yahoo.com [98.139.213.148])
 by mx1.freebsd.org (Postfix) with SMTP id 4B29129C9
 for <freebsd-hackers@freebsd.org>; Sat, 23 Nov 2013 14:08:08 +0000 (UTC)
Received: from [66.196.81.173] by nm27.bullet.mail.bf1.yahoo.com with NNFMP;
 23 Nov 2013 14:05:03 -0000
Received: from [98.139.213.8] by tm19.bullet.mail.bf1.yahoo.com with NNFMP;
 23 Nov 2013 14:05:03 -0000
Received: from [127.0.0.1] by smtp108.mail.bf1.yahoo.com with NNFMP;
 23 Nov 2013 14:05:03 -0000
X-Yahoo-Newman-Id: 196499.76039.bm@smtp108.mail.bf1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: APb89qgVM1nxo8iK80PQAcD3aS3OiynBKVpX3_FN0UNo4w3
 C5iOedsrQ4ybfkiQgdZat2ZZ6J.XxuS_rX0f0RHq99NeVYarcUQvlP0KWcMm
 bwNRtr22MZCmz0ne5ujw5ImZBA6wS_Y1YAng8BOVl9NSH72_mkRwzld41652
 .sU35rD2ystsnT7.JRWTEKaz7XI5ILJkcjG65LkHSHUFuBvpyzYWgAaSUbrn
 hiavz_gVvbGrZVtSfSTfi1qk2b84Iv.MTmRiE_Vxlny0r4T4.4l1oqF_2329
 74r.by70BfW241icV_0VMH4NQCQwWDLDmlJiWNXeVeIu8RKf77lZwHrd6tzN
 8aCLKylKR..RjfRRiWQlBdqCkLp09dmSl9CLkzGN9PK.znlwWe7JD83mwOXE
 bm4bJ8T4JCJGYDfq5XPokj4YLIxqHu5jqu6af9M6CtC16vXdV.KGcQ_93faQ
 h_c7njKc52Zqgsg0I4CpWR73PUqhbpfD.ca0Ffz.0Br9jxFm_F9qee5lmTKV
 eBfznv.lKxJHa_hlNlYwTxG7gX8aYpxw51vWzMyU0w7Nn2Z9oIX3Nvw--
X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf
X-Rocket-Received: from [192.168.0.102] (pfg@190.157.126.109 with )
 by smtp108.mail.bf1.yahoo.com with SMTP; 23 Nov 2013 06:05:03 -0800 PST
Message-ID: <5290B60D.2050006@FreeBSD.org>
Date: Sat, 23 Nov 2013 09:05:01 -0500
From: Pedro Giffuni <pfg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.1.0
MIME-Version: 1.0
To: Cedric Blancher <cedric.blancher@gmail.com>
Subject: Re: O_XATTR support in FreeBSD?
References: <528FB6BA.7040606@FreeBSD.org>
 <CALXu0UeDprsYibGoUQ-k_R-3_X6WoVkC_E9_9BkBwBo6aB-Zbg@mail.gmail.com>
In-Reply-To: <CALXu0UeDprsYibGoUQ-k_R-3_X6WoVkC_E9_9BkBwBo6aB-Zbg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Freebsd hackers list <freebsd-hackers@freebsd.org>,
 Richard Yao <ryao@gentoo.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Nov 2013 14:08:09 -0000

On 23.11.2013 02:13, Cedric Blancher wrote:
> On 22 November 2013 20:55, Pedro Giffuni <pfg@freebsd.org> wrote:
>> Well ...
>>
>> According to:
>>
>> https://wiki.freebsd.org/ZFS
>>
>> We do support Extended Attributes on ZFS but they differ from the ones in
>> Solaris (and Linux).
> Well, we need the kind specified in the NFSv4 standard. Linux
> extended attributes are pretty much useless to us because they are
> size-restricted (a typical attribute here is in the GB range, and NIH
> and CERN, for example, have much bigger ones), can't be accessed
> like normal files, and are incompatible with Windows' Alternate Data Streams.
>
> Ced

I was unaware of a standard for EAs beyond the old POSIX draft.
The reason for Extended Attributes is supporting ACLs, and we support both
the draft POSIX and the NFSv4/Windows-style ACLs.

Not sure about the status of NFSv4. The guys on the posix-1e list should
know better.

regards,

Pedro.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Nov 23 22:53:47 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EE8ED5F1;
 Sat, 23 Nov 2013 22:53:47 +0000 (UTC)
Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36])
 by mx1.freebsd.org (Postfix) with ESMTP id A3EED2007;
 Sat, 23 Nov 2013 22:53:46 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqQEAMcxkVKDaFve/2dsb2JhbABZgz9Tgnm4Vk6BMnSCJQEBAQMBAQEBICsgCwUWDgoCAg0ZAikBCSYGCAcEARwBA4daBg2uCZBCF4EpjQYHAQEbNAeCa4FIA4lCjAODf4kbh0eDRh4xewkXIg
X-IronPort-AV: E=Sophos;i="4.93,759,1378872000"; d="scan'208";a="71626984"
Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.222])
 by esa-annu.net.uoguelph.ca with ESMTP; 23 Nov 2013 17:53:38 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 04940B40EB;
 Sat, 23 Nov 2013 17:53:38 -0500 (EST)
Date: Sat, 23 Nov 2013 17:53:38 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Pedro Giffuni <pfg@FreeBSD.org>
Message-ID: <820263347.19772534.1385247218007.JavaMail.root@uoguelph.ca>
In-Reply-To: <5290B60D.2050006@FreeBSD.org>
Subject: Re: O_XATTR support in FreeBSD?
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790)
Cc: Freebsd hackers list <freebsd-hackers@freebsd.org>,
 Richard Yao <ryao@gentoo.org>, Cedric Blancher <cedric.blancher@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Nov 2013 22:53:48 -0000

Pedro Giffuni wrote:
> On 23.11.2013 02:13, Cedric Blancher wrote:
> > On 22 November 2013 20:55, Pedro Giffuni <pfg@freebsd.org> wrote:
> >> Well ...
> >>
> >> According to:
> >>
> >> https://wiki.freebsd.org/ZFS
> >>
> >> We do support Extended Attributes on ZFS but they differ from the
> >> ones in Solaris (and Linux).
> > Well, we need the kind specified in the NFSv4 standard. Linux
> > extended attributes are pretty much useless to us because they are
> > size-restricted (a typical attribute here is in the GB range, and NIH
> > and CERN, for example, have much bigger ones), can't be accessed
> > like normal files, and are incompatible with Windows' Alternate Data
> > Streams.
> >
> > Ced
> 
> I was unaware of a standard for EAs beyond the old POSIX draft.
> The reason for Extended Attributes is supporting ACLs, and we support
> both the draft POSIX and the NFSv4/Windows-style ACLs.
> 
Interestingly, FreeBSD has a VOP_OPENEXTATTR() but no syscall
that uses it, nor any support for it in ZFS. (I'm just guessing it
was intended for an openat(2) syscall at some point?)
Btw, Cedric, if you had mentioned "subfiles" or "fork files" in your
subject line, you might have gotten a better answer. I, for one,
didn't know what O_XATTR is. I also always get confused w.r.t. what
to call these beasts. (NFSv4 calls them named attributes.)
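
For readers who, like Rick, haven't met O_XATTR: on Solaris, passing
O_XATTR to openat(2) resolves the name inside the hidden attribute
directory of the file that the descriptor refers to, so the named
attribute comes back as an ordinary file descriptor that can be read,
written and seeked like any file (Solaris also wraps these two steps
as attropen(3C)). A minimal C sketch, assuming Solaris semantics;
open_named_attr() is a hypothetical helper name, and on systems that
do not define O_XATTR, such as FreeBSD or Linux, it simply fails with
ENOTSUP:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the named attribute (Solaris "extended
 * attribute file") `attr` of the file `path`.  Returns an ordinary file
 * descriptor, or -1 with errno set. */
int open_named_attr(const char *path, const char *attr, int flags, mode_t mode)
{
#ifdef O_XATTR
    /* Solaris: open the base file, then resolve `attr` relative to its
     * attribute directory by adding O_XATTR to the openat(2) flags. */
    int basefd = open(path, O_RDONLY);
    if (basefd == -1)
        return -1;
    int fd = openat(basefd, attr, flags | O_XATTR, mode);
    int saved = errno;            /* close() must not clobber openat's errno */
    close(basefd);
    errno = saved;
    return fd;
#else
    /* No O_XATTR on this system (e.g. FreeBSD, Linux). */
    (void)path; (void)attr; (void)flags; (void)mode;
    errno = ENOTSUP;
    return -1;
#endif
}
```

The returned descriptor is what makes GB-sized "attributes" practical:
they can be streamed with read(2)/write(2) instead of being fetched in
one call.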

Btw, apps can use extended attributes (the limited-size,
atomically stored/read kind), too. They aren't just for
storing ACLs.
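
A minimal sketch of that limited-size, atomically stored/read kind,
using FreeBSD's extattr_set_file(2)/extattr_get_file(2) in the user
namespace and, as an assumed fallback for comparison, Linux's
setxattr(2)/getxattr(2) with their "user." name prefix.
put_user_attr() and get_user_attr() are hypothetical names. Each value
is written or read in a single call, which is exactly why this kind
does not suit the GB-sized data Cedric describes:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#if defined(__FreeBSD__)
#include <sys/extattr.h>
#elif defined(__linux__)
#include <sys/xattr.h>
#endif

#if defined(__linux__)
/* Linux encodes the namespace in the name itself: "user.<name>". */
static int linux_user_key(char *key, size_t keylen, const char *name)
{
    int n = snprintf(key, keylen, "user.%s", name);
    if (n < 0 || (size_t)n >= keylen) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}
#endif

/* Store `len` bytes as user attribute `name` of `path` in one call.
 * Returns 0 on success, -1 with errno set on failure. */
int put_user_attr(const char *path, const char *name, const void *val, size_t len)
{
#if defined(__FreeBSD__)
    return extattr_set_file(path, EXTATTR_NAMESPACE_USER, name, val, len) < 0 ? -1 : 0;
#elif defined(__linux__)
    char key[256];
    if (linux_user_key(key, sizeof key, name) < 0)
        return -1;
    return setxattr(path, key, val, len, 0);
#else
    (void)path; (void)name; (void)val; (void)len;
    errno = ENOTSUP;
    return -1;
#endif
}

/* Read user attribute `name` of `path` into `buf` in one call.
 * Returns the number of bytes read, or -1 with errno set. */
ssize_t get_user_attr(const char *path, const char *name, void *buf, size_t buflen)
{
#if defined(__FreeBSD__)
    return extattr_get_file(path, EXTATTR_NAMESPACE_USER, name, buf, buflen);
#elif defined(__linux__)
    char key[256];
    if (linux_user_key(key, sizeof key, name) < 0)
        return -1;
    return getxattr(path, key, buf, buflen);
#else
    (void)path; (void)name; (void)buf; (void)buflen;
    errno = ENOTSUP;
    return -1;
#endif
}
```

Note that the filesystem has to support user attributes at all; on one
that doesn't, both calls fail with ENOTSUP/EOPNOTSUPP.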

> Not sure about the status of NFSv4. The guys in the posix-1e list
> should know better.
> 
The NFSv4 implementation in FreeBSD does not support it, although
adding it wouldn't be hard if someone figures out how to do the
syscall and adds support for the VOP()s in ZFS. (I'm not volunteering
to do the latter. I have plenty of other stuff on my to-do list;-)

rick

> regards,
> 
> Pedro.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Nov 23 23:48:39 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 11970C75;
 Sat, 23 Nov 2013 23:48:39 +0000 (UTC)
Received: from mail.crittercasa.com (mail.turbofuzz.com [208.87.221.144])
 by mx1.freebsd.org (Postfix) with ESMTP id E1001223F;
 Sat, 23 Nov 2013 23:48:38 +0000 (UTC)
Received: from [10.20.30.117] (248.sub-70-197-7.myvzw.com [70.197.7.248])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (No client certificate requested)
 by mail.crittercasa.com (Postfix) with ESMTPS id B75F8164874;
 Sat, 23 Nov 2013 15:41:38 -0800 (PST)
Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1812\))
Subject: Re: O_XATTR support in FreeBSD?
From: Jordan Hubbard <jkh@mail.turbofuzz.com>
In-Reply-To: <820263347.19772534.1385247218007.JavaMail.root@uoguelph.ca>
Date: Sat, 23 Nov 2013 15:41:37 -0800
Message-Id: <BC41DB59-5868-432D-9452-00F420934E12@mail.turbofuzz.com>
References: <820263347.19772534.1385247218007.JavaMail.root@uoguelph.ca>
To: Rick Macklem <rmacklem@uoguelph.ca>
X-Mailer: Apple Mail (2.1812)
Content-Type: text/plain;
	charset=windows-1252
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.16
Cc: Freebsd hackers list <freebsd-hackers@freebsd.org>,
 Richard Yao <ryao@gentoo.org>, Pedro Giffuni <pfg@FreeBSD.org>,
 Cedric Blancher <cedric.blancher@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Nov 2013 23:48:39 -0000


On Nov 23, 2013, at 2:53 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Interestingly, FreeBSD has a VOP_OPENEXTATTR() but no syscall
> that uses it, nor any support for it in ZFS. (I'm just guessing it
> was intended for an openat(2) syscall at some point?)
> Btw, Cedric, if you had mentioned "subfiles" or "fork files" in your
> subject line, you might have gotten a better answer. I, for one,
> didn't know what O_XATTR is. I also always get confused w.r.t. what
> to call these beasts. (NFSv4 calls them named attributes.)
>
> Btw, apps can use extended attributes (the limited-size,
> atomically stored/read kind), too. They aren't just for
> storing ACLs.

Sigh.  Extended Attributes. :-/

I guess I'll raise my head in this discussion. They've certainly
been the bane of my existence for long enough!

First, supporting EAs properly really involves multiple levels of the
Unix command and library stack.

The filesystem can support them natively, sure, but that's actually
somewhat optional, since you can always (cough cough) stick them in a
side-store if the rest of the stack cooperates. That's where the
awesome AppleDouble files came from ("._weirdfile" corresponding to
"weirdfile"), which remain useful even after filesystems like
HFS/ZFS/UFS became EA-aware natively, because there are always those
foreign data stores to talk to (some early AFP/CIFS/NFS mount, for
example) and because you still need to *serialize* the dang things
into tar / cpio / zip / ??? files, as well as across network replication
with tools like rsync. What good is an EA, much less an ACL that's
been stored in an EA, if it gets stripped off the first time you tar up
a directory and extract it somewhere else?
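
On the serialization point, pax archives offer one way to carry EAs
through tar: each attribute can travel as an extended-header record
of the form "LEN KEY=VALUE\n", where LEN is the decimal length of the
whole record, including LEN's own digits and the trailing newline. A
minimal sketch, assuming the SCHILY.xattr. keyword prefix that, to my
knowledge, star and GNU tar use (libarchive has its own variant), and
treating values as NUL-terminated text, which real binary EAs would
not allow without some encoding. pax_xattr_record() is a hypothetical
helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of decimal digits needed to print n. */
static size_t dec_digits(size_t n)
{
    size_t d = 1;
    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/* Format one pax extended-header record for attribute `attr` into `out`:
 *   "<LEN> SCHILY.xattr.<attr>=<value>\n"
 * LEN counts every byte of the record, including its own digits.
 * Returns LEN, or 0 if `out` (plus a terminating NUL) is too small. */
size_t pax_xattr_record(const char *attr, const char *value, char *out, size_t outlen)
{
    assert(attr != NULL && value != NULL && out != NULL);

    /* Bytes after the length field: ' ' + keyword + '=' + value + '\n'. */
    size_t rest = 1 + strlen("SCHILY.xattr.") + strlen(attr) + 1 + strlen(value) + 1;

    /* LEN depends on its own digit count; iterate until it is consistent. */
    size_t len = rest + dec_digits(rest);
    while (len != rest + dec_digits(len))
        len = rest + dec_digits(len);

    if (len + 1 > outlen)
        return 0;
    snprintf(out, outlen, "%zu SCHILY.xattr.%s=%s\n", len, attr, value);
    return len;
}
```

The loop matters because adding LEN's digits can push the total across
a power of ten: a 98-byte remainder yields a 101-byte record, not 100.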

So I wouldn't start with NFSv4 or ZFS if I were asking the question. I
would start with libc and ask whether it has anything similar to copyfile(3),
so that the tools above it could actually start supporting those
attributes on a *practical* basis! :)
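
A rough sketch of just the EA half of such a copyfile(3)-like helper,
assuming FreeBSD's extattr_list_file(2), whose output buffer holds the
attribute names back to back, each preceded by a one-byte length and
not NUL-terminated. copy_user_extattrs() and next_extattr_name() are
hypothetical names; only the user namespace is copied, and the
query-size-then-read pattern can race with a concurrent writer:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#if defined(__FreeBSD__)
#include <sys/extattr.h>
#endif

/* Copy the name at offset `off` of an extattr_list_file(2)-style buffer
 * (one length byte, then that many name bytes, no NUL) into `out` as a
 * C string.  Returns the offset of the next entry, or -1 at the end of
 * the buffer, on a truncated entry, or if `out` is too small. */
ssize_t next_extattr_name(const unsigned char *list, size_t listlen,
                          size_t off, char *out, size_t outlen)
{
    if (off >= listlen)
        return -1;
    size_t n = list[off];
    if (n > listlen - off - 1 || n + 1 > outlen)
        return -1;
    memcpy(out, list + off + 1, n);
    out[n] = '\0';
    return (ssize_t)(off + 1 + n);
}

#if defined(__FreeBSD__)
/* Copy every user-namespace EA of `src` onto `dst`.
 * Returns 0 on success, -1 with errno set on failure. */
int copy_user_extattrs(const char *src, const char *dst)
{
    const int ns = EXTATTR_NAMESPACE_USER;
    ssize_t listlen = extattr_list_file(src, ns, NULL, 0);   /* size query */
    if (listlen <= 0)
        return listlen == 0 ? 0 : -1;       /* no EAs, or an error */

    unsigned char *list = malloc((size_t)listlen);
    if (list == NULL)
        return -1;
    listlen = extattr_list_file(src, ns, list, (size_t)listlen);
    int rc = listlen < 0 ? -1 : 0;

    char name[256];                         /* a one-byte length caps names at 255 */
    size_t off = 0;
    while (rc == 0 && off < (size_t)listlen) {
        ssize_t next = next_extattr_name(list, (size_t)listlen, off, name, sizeof name);
        if (next < 0) {
            errno = EINVAL;                 /* malformed list buffer */
            rc = -1;
            break;
        }
        off = (size_t)next;

        ssize_t vlen = extattr_get_file(src, ns, name, NULL, 0);  /* value size */
        void *val = vlen >= 0 ? malloc(vlen > 0 ? (size_t)vlen : 1) : NULL;
        if (val == NULL) {
            rc = -1;
            break;
        }
        vlen = extattr_get_file(src, ns, name, val, (size_t)vlen);
        if (vlen < 0 || extattr_set_file(dst, ns, name, val, (size_t)vlen) < 0)
            rc = -1;
        free(val);
    }
    free(list);
    return rc;
}
#endif
```

A real copyfile(3) equivalent would also need the system namespace,
ACLs and flags, and a side-store fallback for destinations without EA
support.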

- Jordan