From owner-freebsd-arch@FreeBSD.ORG  Sun Oct 26 00:01:04 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2CBC4E09
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 00:01:04 +0000 (UTC)
Received: from na01-bn1-obe.outbound.protection.outlook.com
 (mail-bn1bon0114.outbound.protection.outlook.com [157.56.111.114])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "mail.protection.outlook.com",
 Issuer "MSIT Machine Auth CA 2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C6E44C1B
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 00:01:02 +0000 (UTC)
Received: from CO2PR05CA011.namprd05.prod.outlook.com (10.141.241.139) by
 BL2PR05MB114.namprd05.prod.outlook.com (10.255.232.24) with Microsoft SMTP
 Server (TLS) id 15.0.1054.13; Sat, 25 Oct 2014 23:45:50 +0000
Received: from BY2FFO11FD058.protection.gbl (2a01:111:f400:7c0c::134) by
 CO2PR05CA011.outlook.office365.com (2a01:111:e400:1429::11) with Microsoft
 SMTP Server (TLS) id 15.1.6.9 via Frontend Transport; Sat, 25 Oct 2014
 23:45:49 +0000
Received: from P-EMF01-SAC.jnpr.net (66.129.239.15) by
 BY2FFO11FD058.mail.protection.outlook.com (10.1.15.178) with Microsoft SMTP
 Server (TLS) id 15.0.1049.20 via Frontend Transport; Sat, 25 Oct 2014
 23:45:49 +0000
Received: from magenta.juniper.net (172.17.27.123) by P-EMF01-SAC.jnpr.net
 (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Sat, 25 Oct
 2014 16:45:48 -0700
Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28])	by
 magenta.juniper.net (8.11.3/8.11.3) with ESMTP id s9PNjlR96232;	Sat, 25 Oct
 2014 16:45:47 -0700 (PDT)	(envelope-from sjg@juniper.net)
Received: from chaos (localhost [127.0.0.1])	by chaos.jnpr.net (Postfix) with
 ESMTP id D871D580A3;	Sat, 25 Oct 2014 16:45:46 -0700 (PDT)
To: =?us-ascii?Q?=3D=3Futf-8=3FQ=3FDag-Erling=5FSm=3DC3=3DB8rgrav=3F=3D?=
 <des@des.no>
Subject: Re: Retiring WITH_INSTALL_AS_USER
In-Reply-To: <86k33o7ziu.fsf@nine.des.no>
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com>
 <21044.1414038558@chaos> <E40CAE9C-0C6B-4D7C-879E-53926D0A775E@bsdimp.com>
 <9250.1414076335@chaos> <86wq7p4zcx.fsf@nine.des.no> <10072.1414165996@chaos>
 <86k33o7ziu.fsf@nine.des.no>
Comments: In-reply-to: =?us-ascii?Q?=3D=3Futf-8=3FQ=3FDag-Erling=5FSm=3DC3?=
 =?us-ascii?Q?=3DB8rgrav=3F=3D?= <des@des.no>
 message dated "Sat, 25 Oct 2014 20:52:25 +0200."
From: "Simon J. Gerraty" <sjg@juniper.net>
X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Date: Sat, 25 Oct 2014 16:45:46 -0700
Message-ID: <15245.1414280746@chaos>
X-EOPAttributedMessage: 0
X-Forefront-Antispam-Report: CIP:66.129.239.15; CTRY:US; IPV:NLI; EFV:NLI;
 SFV:NSPM;
 SFS:(10019020)(6009001)(189002)(24454002)(199003)(76506005)(93886004)(62966002)(76482002)(87936001)(23756003)(19580395003)(46102003)(105596002)(47776003)(95666004)(117636001)(20776003)(106466001)(57986006)(80022003)(64706001)(77156001)(99396003)(81156004)(31966008)(87286001)(93916002)(84676001)(107046002)(21056001)(120916001)(104166001)(110136001)(44976005)(92726001)(92566001)(6806004)(89996001)(86362001)(68736004)(102836001)(33716001)(85852003)(97736003)(69596002)(4396001)(50466002)(50226001)(88136002)(19580405001)(85306004)(76176999)(50986999)(42262002)(62816006);
 DIR:OUT; SFP:1102; SCL:1; SRVR:BL2PR05MB114; H:P-EMF01-SAC.jnpr.net; FPR:;
 MLV:sfv; PTR:InfoDomainNonexistent; MX:1; A:1; LANG:en; 
X-Microsoft-Antispam: UriScan:;
X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:BL2PR05MB114;
X-Exchange-Antispam-Report-Test: UriScan:;
X-Forefront-PRVS: 0375972289
Received-SPF: SoftFail (protection.outlook.com: domain of transitioning
 juniper.net discourages use of 66.129.239.15 as permitted sender)
Authentication-Results: spf=softfail (sender IP is 66.129.239.15)
 smtp.mailfrom=sjg@juniper.net; 
X-OriginatorOrg: juniper.net
Cc: FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 00:01:04 -0000

Dag-Erling Smørgrav <des@des.no> wrote:
> NO_ROOT solves it in a much better fashion, by modifying install(1)'s
> behavior so that instead of performing the chown / chgrp / chmod, it
> records it in a file which can then be used to generate a package
> manifest or something like that.

Right, so my only concern is running mtree during the build to create
the staging tree.

Most makefiles rely on the dir they are going to install into existing,
and install cannot tell from its arguments whether the destination
should be a file or a directory.

So I'm curious as to why the filtering that made it safe to use mtree was
removed and what, if anything, replaced it.

From owner-freebsd-arch@FreeBSD.ORG  Sun Oct 26 03:27:45 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 78CF3987
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 03:27:45 +0000 (UTC)
Received: from mail-yh0-f49.google.com (mail-yh0-f49.google.com
 [209.85.213.49])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 33C04FD4
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 03:27:44 +0000 (UTC)
Received: by mail-yh0-f49.google.com with SMTP id a41so3019842yho.36
 for <freebsd-arch@freebsd.org>; Sat, 25 Oct 2014 20:27:43 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:sender:content-type:mime-version:subject:from
 :in-reply-to:date:cc:message-id:references:to;
 bh=EVegrt8ZNv3KdcMGDg3JDXO24Gke5bz8R72UGzwENeY=;
 b=ZF9eJ57AbbqkinNz3oHQB3Lr3QHPAV9jb9Q0DKeWS+SinIuPatKVGhl6sDwFJQxE4s
 ijj+O50A4Clc4zufupBeZ5cdBiGYUfLLSwNzPinbEMCi9HlHdEyjGT+gOWsWmxslu+MD
 OfuZxPUnDbJ6mIBB9A2eYT1PW6WUERTU2gXtwhsiQOzUjnOjAyeXByJ7VzSKIm0PcGAE
 4/LTxXFNLpiBdk11RWteUngXgcHalcMQMVPoo1yP3bJMRx4eYfNPbtjpHEi+H5cOTG+s
 7Mg0o6PUGVoByG4tdwZwVebgNUVjWFdCgSYavUzKXXMZFg6hy2mnwSmGFrIpLHgO8xIE
 Sx6A==
X-Gm-Message-State: ALoCoQkn6Fa7SyXsH2nHNm/DRLX0RDSEwbT9+JmMS38oJw1twsXedSUEQ919iCBw39UljDQCibFe
X-Received: by 10.236.231.98 with SMTP id k92mr496897yhq.161.1414294063797;
 Sat, 25 Oct 2014 20:27:43 -0700 (PDT)
Received: from [192.168.0.14] (173-18-133-79.client.mchsi.com. [173.18.133.79])
 by mx.google.com with ESMTPSA id h2sm4116976yhh.25.2014.10.25.20.27.43
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Sat, 25 Oct 2014 20:27:43 -0700 (PDT)
Sender: Warner Losh <wlosh@bsdimp.com>
Content-Type: multipart/signed;
 boundary="Apple-Mail=_F5883AC1-F0A2-4C99-B5C2-705037D12AFD";
 protocol="application/pgp-signature"; micalg=pgp-sha512
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Subject: Re: Retiring WITH_INSTALL_AS_USER
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <9250.1414076335@chaos>
Date: Sat, 25 Oct 2014 22:27:41 -0500
Message-Id: <C6F766A6-00DC-4F61-A87F-917BA99EBDA2@bsdimp.com>
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com>
 <21044.1414038558@chaos> <E40CAE9C-0C6B-4D7C-879E-53926D0A775E@bsdimp.com>
 <9250.1414076335@chaos>
To: "Simon J. Gerraty" <sjg@juniper.net>
X-Mailer: Apple Mail (2.1878.6)
Cc: FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 03:27:45 -0000


--Apple-Mail=_F5883AC1-F0A2-4C99-B5C2-705037D12AFD
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252


On Oct 23, 2014, at 9:58 AM, Simon J. Gerraty <sjg@juniper.net> wrote:

> Warner Losh <imp@bsdimp.com> wrote:
>> If it is in the tree, it needs to work.
>
> No argument there.
>
>> It is broken in about a dozen places
>> now. Perhaps not the ones that you use.
>
> Hmm I have it permanently set in a projects/bmake tree that builds
> buildworld etc fine (while producing meta files) - though its been a
> month or two since last sync.

Buildworld is fine. installworld is where it breaks, in a lot of places.

> Internally we have it set in head trees too.
> I don't doubt there's something lacking - just haven't noticed, sorry.
>
>> Makefile.inc1 is the only place it is documented right now. NO_ROOT
>> creates a METADATA file for the attributes of the file and does simple
>> copies instead. This lets you build entirely as an unpriv’d user, but
>> still use makefs to get a filesystem with the proper attributes. In
>> many ways it is what you want, and you could get what you want by
>> specifying /dev/null for that METADATA if it were more tightly
>> coupled.
>
> Sounds ok.
>
> Hmm etc/Makefile looks like it lost the ability to run mtree safely
> in a cross-build env?  The MTREE_FILTER stuff ensures that mtree doesn't
> choke on unknown users and such.
> How is that handled now?

That’s a good question. With NO_ROOT you postpone the unknown users
until makefs time. There are both pros and cons to that...

Warner

--Apple-Mail=_F5883AC1-F0A2-4C99-B5C2-705037D12AFD
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJUTGotAAoJEGwc0Sh9sBEA+TwQAL3XBpXVNwzMd0O4Mq9NcFSx
uWIpq64H8CVv4phMrX81bYt8pGW1URwpF9dfCirHckfH+DsQaw1IcEj40n4nIRP8
yovJesjn2l9+AjlYcmYW28HxLl6vJUP4F2Hp0JQWT47/JAUQ/ymSL1kLR0ZK/4ZA
APVUkri51GJFR9qOc/EiAvOyRnN5zLd9+opzZfdHgIxop0gbxZPk9+BwgdF1isZm
m2vA6s+BCQyvgJAub9r5phqIoQUD1sB4r1RvDae7QhWlTOfRivHQfrBWnHQcTNUl
AexamP/vFh4MOWEkFXKxOEPQSOudYIDwkeyWRKpPwO8czVqq6gAVTnS9wa4EUJTg
lMM4hhMfRYeXIaEQq84NSLzm8gGtqv+JY4rhYwavzEbfFbUn33BkJZTJbIlU0kFI
XBDOMvg/91v5MuyGXOcyHNTqzZVfnAjIOANha6HnPa2+Wrdl4V3JCNpFKR1JKWSB
1egK5sGmsbSpxlAs3z8i87apULoG2dcD5YPUz+tRufZRoPyBk+JL3/Y/ej3NYAfh
BknSZW1YjKzH/qIXQbRlz39p2W1XLXQ/+3pwe4HpcnxPuQY36631jc3oXbxUvht+
9B+3avfZWbiuOS07aed3lkCn5THrwJxGK/HQSH7Dlq+1PGLda0hSNJ4l354eOv1k
VSB8Ue23pcfaFQP3+KHP
=BqsS
-----END PGP SIGNATURE-----

--Apple-Mail=_F5883AC1-F0A2-4C99-B5C2-705037D12AFD--

From owner-freebsd-arch@FreeBSD.ORG  Sun Oct 26 03:40:09 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 16032C08
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 03:40:09 +0000 (UTC)
Received: from mail-yh0-f45.google.com (mail-yh0-f45.google.com
 [209.85.213.45])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C518B133
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 03:40:08 +0000 (UTC)
Received: by mail-yh0-f45.google.com with SMTP id f73so3017946yha.18
 for <freebsd-arch@freebsd.org>; Sat, 25 Oct 2014 20:40:07 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:sender:content-type:mime-version:subject:from
 :in-reply-to:date:cc:message-id:references:to;
 bh=QNytj1TQpR+WWjf5izUZsW3+lBzhKuK3MZFttAm3o+o=;
 b=Cp7NgEg6vc1D3agaRiSKLhYaG1CF/9p10kD5Gs4gRD7hdAqWes71ImGSbOjMww3EyN
 UjRfgpaGk7J40j3RuJWEC60OMhC4VMC+FQmJYVsrGtffL8TMH9oPSe7ghbS0rxyacpy9
 gGsTcCOFP658pvyPkdmYSdjrIqVFP5W8uVEdHiwHRlIn/cKv5nY77oMlMzpuK4mmF2lS
 7GMYy/zx/jY0E+68bDnco7hdhaTaWoFMy0bj8LTIqWjaZb0EsU9RMnNN9kfx0XNccVhl
 8pX9V5CbiAu40Ot2WIQgkgJbKMPkLfygvmkjME8tGmWNscWu582jocyo/r4qnRqmjUml
 g2Tw==
X-Gm-Message-State: ALoCoQmXuU5os5C3FkIxaPoYjgodsoD1fj++6SNEMyFDtZHJTRbLKCkrhYqcd0+LihCRF6rIAvaq
X-Received: by 10.236.19.69 with SMTP id m45mr14059367yhm.111.1414294807324;
 Sat, 25 Oct 2014 20:40:07 -0700 (PDT)
Received: from [192.168.0.14] (173-18-133-79.client.mchsi.com. [173.18.133.79])
 by mx.google.com with ESMTPSA id v31sm4134256yha.16.2014.10.25.20.40.06
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Sat, 25 Oct 2014 20:40:06 -0700 (PDT)
Sender: Warner Losh <wlosh@bsdimp.com>
Content-Type: multipart/signed;
 boundary="Apple-Mail=_E190E4BF-9E4A-4887-A132-23BC81335740";
 protocol="application/pgp-signature"; micalg=pgp-sha512
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Subject: Re: Retiring WITH_INSTALL_AS_USER
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <9250.1414076335@chaos>
Date: Sat, 25 Oct 2014 22:40:05 -0500
Message-Id: <E27F1AEF-90BD-48C3-A145-8AE79E8B8C54@bsdimp.com>
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com>
 <21044.1414038558@chaos> <E40CAE9C-0C6B-4D7C-879E-53926D0A775E@bsdimp.com>
 <9250.1414076335@chaos>
To: "Simon J. Gerraty" <sjg@juniper.net>
X-Mailer: Apple Mail (2.1878.6)
Cc: FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 03:40:09 -0000


--Apple-Mail=_E190E4BF-9E4A-4887-A132-23BC81335740
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252


On Oct 23, 2014, at 9:58 AM, Simon J. Gerraty <sjg@juniper.net> wrote:
> Hmm etc/Makefile looks like it lost the ability to run mtree safely
> in a cross-build env?  The MTREE_FILTER stuff ensures that mtree doesn't
> choke on unknown users and such.
> How is that handled now?

I’m not sure I follow. MTREE_FILTER doesn’t seem to exist in any
version I’ve checked in the last three years.

Warner


--Apple-Mail=_E190E4BF-9E4A-4887-A132-23BC81335740
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJUTG0VAAoJEGwc0Sh9sBEAthsP/12M/zIxlZV1vG1zMAfsrfX+
VAq+ICCExGtcPD18SFjcYpiHJcb1MQ9gKYzoq1c8W8A000AK48H9YNi21I3NpqrH
ciY414KQycQXjwLT1ViOg70LWipBvYODF5mSrjQBuuVFzf0psfDhtahwFTPtO/zy
XzRCWkJ0kzmU3UMQGhASnyjGJCrrH4WAh075OAjwHDaMkY3ANEcHQqY1O1D0Iaqp
xgU7PA+06Y3/7+ga3h0ksKF+V1fYI3PwHMD9qTSyGaSdaUt16JsTZ6n9pJgjQkb3
QzPKG3VPu2wufKKwEp2IoQwPhLzJLJhmDGdV7MxlHcSNGlmAKnqx3GJ3/SV+ONKA
6QKDAipR7wUBQ6y5jvmGyF8wW6NVLM3SvKYIzflcmImy/+ajPWKSd+HawREeeE8G
FUN36OJWho5b7vQ7LFHRahGaJMOYOOfDNyWG76MhJUPUicO9LCmRoXCoKq1qISlA
cLDzjuO36HFiWg9urL1xLQ2RVi0wp0dEvJwz8FUNsLZX/HVmwQINzLT0b2ebKmAS
Q94CzuJJBbUWH+KszoNBvfGIPEn492r0/JxNvsX26PhCUMcZ1Lg9DPx3x8HXi2sD
wmrSHfeguPyqIVq7p+87HArssZcrePQJG3gNu3a55ib1gSbLawJa8un69cqrKRgI
GcxenIss9uNaW2HM/Eeq
=v3Kf
-----END PGP SIGNATURE-----

--Apple-Mail=_E190E4BF-9E4A-4887-A132-23BC81335740--

From owner-freebsd-arch@FreeBSD.ORG  Sun Oct 26 06:31:37 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 54811D65
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 06:31:37 +0000 (UTC)
Received: from na01-bn1-obe.outbound.protection.outlook.com
 (mail-bn1bbn0109.outbound.protection.outlook.com [157.56.111.109])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "mail.protection.outlook.com",
 Issuer "MSIT Machine Auth CA 2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E8CF6F5
 for <freebsd-arch@freebsd.org>; Sun, 26 Oct 2014 06:31:36 +0000 (UTC)
Received: from BLUPR05CA0063.namprd05.prod.outlook.com (10.141.20.33) by
 SN2PR0501MB1037.namprd05.prod.outlook.com (25.160.58.154) with Microsoft SMTP
 Server (TLS) id 15.1.6.9; Sun, 26 Oct 2014 03:59:25 +0000
Received: from BY2FFO11FD051.protection.gbl (2a01:111:f400:7c0c::143) by
 BLUPR05CA0063.outlook.office365.com (2a01:111:e400:855::33) with Microsoft
 SMTP Server (TLS) id 15.1.6.9 via Frontend Transport; Sun, 26 Oct 2014
 03:59:25 +0000
Received: from P-EMF01-SAC.jnpr.net (66.129.239.15) by
 BY2FFO11FD051.mail.protection.outlook.com (10.1.15.188) with Microsoft SMTP
 Server (TLS) id 15.0.1049.20 via Frontend Transport; Sun, 26 Oct 2014
 03:59:25 +0000
Received: from magenta.juniper.net (172.17.27.123) by P-EMF01-SAC.jnpr.net
 (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Sat, 25 Oct
 2014 20:59:24 -0700
Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28])	by
 magenta.juniper.net (8.11.3/8.11.3) with ESMTP id s9Q3xNR43856;	Sat, 25 Oct
 2014 20:59:23 -0700 (PDT)	(envelope-from sjg@juniper.net)
Received: from chaos (localhost [127.0.0.1])	by chaos.jnpr.net (Postfix) with
 ESMTP id 3BAA0580A3;	Sat, 25 Oct 2014 20:59:23 -0700 (PDT)
To: Warner Losh <imp@bsdimp.com>
Subject: Re: Retiring WITH_INSTALL_AS_USER
In-Reply-To: <E27F1AEF-90BD-48C3-A145-8AE79E8B8C54@bsdimp.com>
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com>
 <21044.1414038558@chaos> <E40CAE9C-0C6B-4D7C-879E-53926D0A775E@bsdimp.com>
 <9250.1414076335@chaos> <E27F1AEF-90BD-48C3-A145-8AE79E8B8C54@bsdimp.com>
Comments: In-reply-to: Warner Losh <imp@bsdimp.com>
 message dated "Sat, 25 Oct 2014 22:40:05 -0500."
From: "Simon J. Gerraty" <sjg@juniper.net>
X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Date: Sat, 25 Oct 2014 20:59:23 -0700
Message-ID: <24381.1414295963@chaos>
X-EOPAttributedMessage: 0
X-Forefront-Antispam-Report: CIP:66.129.239.15; CTRY:US; IPV:NLI; EFV:NLI;
 SFV:NSPM;
 SFS:(10019020)(6009001)(199003)(189002)(24454002)(46102003)(76482002)(69596002)(76506005)(19580395003)(68736004)(99396003)(80022003)(50466002)(89996001)(57986006)(6806004)(85852003)(104166001)(92726001)(93886004)(4396001)(20776003)(558084003)(47776003)(92566001)(19580405001)(44976005)(102836001)(77156001)(84676001)(85306004)(88136002)(93916002)(86362001)(50226001)(107046002)(81156004)(106466001)(23676002)(87936001)(50986999)(76176999)(21056001)(110136001)(87286001)(97736003)(33716001)(117636001)(105596002)(120916001)(64706001)(62966002)(95666004)(31966008)(62816006)(42262002);
 DIR:OUT; SFP:1102; SCL:1; SRVR:SN2PR0501MB1037; H:P-EMF01-SAC.jnpr.net; FPR:;
 MLV:sfv; PTR:InfoDomainNonexistent; A:1; MX:1; LANG:en; 
X-Microsoft-Antispam: UriScan:;
X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:SN2PR0501MB1037;
X-Forefront-PRVS: 0376ECF4DD
Received-SPF: SoftFail (protection.outlook.com: domain of transitioning
 juniper.net discourages use of 66.129.239.15 as permitted sender)
Authentication-Results: spf=softfail (sender IP is 66.129.239.15)
 smtp.mailfrom=sjg@juniper.net; 
X-OriginatorOrg: juniper.net
Cc: FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 06:31:37 -0000

Warner Losh <imp@bsdimp.com> wrote:
> I’m not sure I follow. MTREE_FILTER doesn’t seem to exist in any
> version I’ve checked in the last three years.

Ah - my faulty memory ;-)
It's in projects/bmake - and our internal tree.

Ignore me.

From owner-freebsd-arch@FreeBSD.ORG  Sun Oct 26 06:43:01 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 44D96B3;
 Sun, 26 Oct 2014 06:43:01 +0000 (UTC)
Received: from mail-wi0-x236.google.com (mail-wi0-x236.google.com
 [IPv6:2a00:1450:400c:c05::236])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id AEBA11D6;
 Sun, 26 Oct 2014 06:43:00 +0000 (UTC)
Received: by mail-wi0-f182.google.com with SMTP id d1so456220wiv.15
 for <multiple recipients>; Sat, 25 Oct 2014 23:42:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=N6aa71F3326OK2SG7tqZl29vPvewLtk8qOAXufQd4OM=;
 b=PYqThDoK/0PvMcHdjvPnxOB5nvaPyViFVaPWe6HgEB6M+vKNLUoC9cx7E0ChryiXF/
 8VKSxEwOyMqzjrXXdzgpYluMnyVyQyVwptcuaFYPmICUOfjdGIxzW6/PTK2gF/+VfzFw
 mlVXtUnUTAqDiCZDBCzf8bOjLkoazGKmz6neFpsD/E8xkeKPrQ2I++puwPDYbik9UTtP
 8vJ4+8Xn4hV2cyfChrkmxWergtt+0qUk+vT5cG+4LazL9joPJmBSC4ST0fewwddK3mXz
 l1PhQp/z0HAnGGcPRJVfaKxhqkhRw1Fwm2KdVPLx7sd1XdBzONtmLIhoBzfbSzVZudFi
 TB0w==
MIME-Version: 1.0
X-Received: by 10.194.192.161 with SMTP id hh1mr15577974wjc.72.1414305778801; 
 Sat, 25 Oct 2014 23:42:58 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.216.106.136 with HTTP; Sat, 25 Oct 2014 23:42:58 -0700 (PDT)
In-Reply-To: <1414265035.12052.646.camel@revolution.hippie.lan>
References: <20141025184448.GA19066@dft-labs.eu>
 <20141025190407.GU82214@funkthat.com>
 <1414265035.12052.646.camel@revolution.hippie.lan>
Date: Sat, 25 Oct 2014 23:42:58 -0700
X-Google-Sender-Auth: q4f-Gfzx0pjuCYMyQ25RXz1o9RA
Message-ID: <CAJ-VmokmY8SRvHyxqkgdw9eaQCDuRz-9vsZ9YGuYS5bD40rdQQ@mail.gmail.com>
Subject: Re: refcount_release_take_##lock
From: Adrian Chadd <adrian@freebsd.org>
To: Ian Lepore <ian@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: John-Mark Gurney <jmg@funkthat.com>, Mateusz Guzik <mjguzik@gmail.com>,
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Oct 2014 06:43:01 -0000

This is exactly why refcount==0 should be the only prelude to freeing
the object. There should be no way to actually take a reference on an
object that has a refcount of 0, because (surprise) at this stage
no one is referencing it anymore.

I.e., once the refcount hits 0, this means that nothing references it at
all - including any data structures that may be storing it. For
example, if an rtentry is in a radix tree, its refcount should be 1 or
more, not 0.

It's the only way this can work.
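
A minimal userland sketch of that rule, assuming C11 atomics rather than
the kernel's refcount(9) API. The names struct obj, obj_ref_acquire and
obj_ref_release are hypothetical and only illustrate the invariant that
a count of 0 can never be raised again:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with an embedded reference count. */
struct obj {
	atomic_uint ref;
};

/* Take a reference only while the count is still nonzero. */
static bool
obj_ref_acquire(struct obj *o)
{
	unsigned int old = atomic_load(&o->ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->ref, &old, old + 1))
			return (true);
	}
	return (false);	/* count reached 0: the object is going away */
}

/* Drop a reference; true means the caller dropped the last one. */
static bool
obj_ref_release(struct obj *o)
{
	return (atomic_fetch_sub(&o->ref, 1) == 1);
}
```

With this shape, a container such as a radix tree holds its own
reference, so a lookup can only ever see objects whose count is 1 or
more.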

(The net80211 stack suffers from this and I'm about to set it on fire
until I fix it. It's been a source of crashes for almost 6 years now.)


-adrian


On 25 October 2014 12:23, Ian Lepore <ian@freebsd.org> wrote:
> On Sat, 2014-10-25 at 12:04 -0700, John-Mark Gurney wrote:
>> Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200:
>> > The following idiom is used here and there:
>> >
>> > int old;
>> > old = obj->ref;
>> > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1))
>> >     return;
>> > lock(&something);
>> > if (refcount_release(&obj->ref) == 0) {
>> >     unlock(&something);
>> >     return;
>> > }
>> > free up
>> > unlock(&something);
>> >
>> > ==========
>>
>> Couldn't this be better written as:
>> if (__predict_false(refcount_release(&obj->ref) == 0)) {
>
> Could you not get preempted at this point, whereupon another thread
> acquires then releases obj, deletes it because it keeps running through
> this point, then eventually your original thread wakes up, gets the
> lock, and dereferences the now-defunct obj pointer?
>
> (Also, I think that should be != 0, above?)
>
> -- Ian
>
>>       lock(&something);
>>       if (__predict_true(!obj->ref)) {
>>               free up
>>       }
>>       unlock(&something);
>> }
>>
>> The reason I'm asking is that I changed how IPsec SA ref counting was
>> handled, and used something similar...
>>
>> My code gets rid of a branch, and is better in that it uses refcount
>> API properly, instead of using atomic_cmpset_int...
>>
>> > I decided to implement it as a common function.
>> >
>> > We have only refcount.h and I didn't want to bloat all including code
>> > with additional definitions and as such I came up with a macro that has
>> > to be used in .c file and that will define appropriate inline func.
>> >
>> > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_
>> > macro, assuming it has to stay.
>>
>> You could shorten it to REFCNT_REL_TAKE_
>>
>> > Comments?
>>
>> Will you update the refcount(9) man page w/ documentation before
>> committing?
>>
>
>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
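
For reference, the release idiom Mateusz describes in the quoted text
can be sketched in userland roughly as follows. This is only an
illustration under stated assumptions: C11 atomics and a pthread mutex
stand in for the kernel primitives, and struct obj, something and
obj_release are hypothetical names, not the proposed macro:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with an embedded reference count. */
struct obj {
	atomic_uint ref;
};

/* Stand-in for the lock that protects the object's container. */
static pthread_mutex_t something = PTHREAD_MUTEX_INITIALIZER;

/* Drop a reference; true means the last reference was dropped. */
static bool
obj_release(struct obj *o)
{
	unsigned int old = atomic_load(&o->ref);

	/* Fast path: clearly not the last reference, so skip the lock. */
	if (old > 1 && atomic_compare_exchange_strong(&o->ref, &old, old - 1))
		return (false);

	/* Slow path: this may be the last reference, so take the lock. */
	pthread_mutex_lock(&something);
	if (atomic_fetch_sub(&o->ref, 1) != 1) {
		pthread_mutex_unlock(&something);
		return (false);
	}
	/* Last reference: teardown that needs the lock would go here. */
	pthread_mutex_unlock(&something);
	return (true);
}
```

The point of taking the lock before the final decrement is that nobody
can find the object through the locked container while its count drops
to 0.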

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 06:59:35 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id F3ED721B;
 Mon, 27 Oct 2014 06:59:34 +0000 (UTC)
Received: from pp2.rice.edu (proofpoint2.mail.rice.edu [128.42.201.101])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9BFF829A;
 Mon, 27 Oct 2014 06:59:33 +0000 (UTC)
Received: from pps.filterd (pp2.rice.edu [127.0.0.1])
 by pp2.rice.edu (8.14.5/8.14.5) with SMTP id s9R6xQFn024741;
 Mon, 27 Oct 2014 01:59:26 -0500
Received: from mh11.mail.rice.edu (mh11.mail.rice.edu [128.42.199.30])
 by pp2.rice.edu with ESMTP id 1q7yw40j3w-1;
 Mon, 27 Oct 2014 01:59:25 -0500
X-Virus-Scanned: by amavis-2.7.0 at mh11.mail.rice.edu, auth channel
Received: from 108-254-203-201.lightspeed.hstntx.sbcglobal.net
 (108-254-203-201.lightspeed.hstntx.sbcglobal.net [108.254.203.201])
 (using TLSv1 with cipher RC4-MD5 (128/128 bits))
 (No client certificate requested) (Authenticated sender: alc)
 by mh11.mail.rice.edu (Postfix) with ESMTPSA id 4752B4C00A5;
 Mon, 27 Oct 2014 01:59:25 -0500 (CDT)
Message-ID: <544DED4C.3010501@rice.edu>
Date: Mon, 27 Oct 2014 01:59:24 -0500
From: Alan Cox <alc@rice.edu>
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Svatopluk Kraus <onwahe@gmail.com>
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
References: <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>	<CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>	<CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>	<5428AF3B.1030906@rice.edu>	<CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>	<54497DC1.5070506@rice.edu>
 <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>
In-Reply-To: <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>
X-Enigmail-Version: 1.6
Content-Type: multipart/mixed; boundary="------------090909070100060609070401"
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0
 kscore.is_bulkscore=0
 kscore.compositescore=0.999328515101207 circleOfTrustscore=0
 compositescore=0.601496849000349 urlsuspect_oldscore=0.00149684900034924
 suspectscore=11 recipient_domain_to_sender_totalscore=0 phishscore=0
 bulkscore=0 kscore.is_spamscore=0 recipient_to_sender_totalscore=0
 recipient_domain_to_sender_domain_totalscore=0 rbsscore=0.601496849000349
 spamscore=0 recipient_to_sender_domain_totalscore=0 urlsuspectscore=0.9
 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=7.0.1-1402240000 definitions=main-1410270079
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 06:59:35 -0000

This is a multi-part message in MIME format.
--------------090909070100060609070401
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 10/24/2014 06:33, Svatopluk Kraus wrote:
>
> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:
>
>     On 10/08/2014 10:38, Svatopluk Kraus wrote:
>     > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>     >
>     >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>     >>
>     >>
>     >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>     >>
>     >>>
>     >>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus
>     <onwahe@gmail.com <mailto:onwahe@gmail.com>>
>     >>> wrote:
>     >>>
>     >>>> Hi,
>     >>>>
>     >>>> I and Michal are finishing new ARM pmap-v6 code. There is one
>     problem
>     >>>> we've
>     >>>> dealt with somehow, but now we would like to do it better.
>     It's about
>     >>>> physical pages which are allocated before vm subsystem is
>     initialized.
>     >>>> While later on these pages could be found in vm_page_array when
>     >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for
>     >>>> VM_PHYSSEG_SPARSE
>     >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model.
>     >>>>
>     >>>> It really would be nice to utilize vm_page_array for such
>     preallocated
>     >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is
>     used. Things
>     >>>> could be much easier then. In our case, it's about pages
>     which are used
>     >>>> for
>     >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two
>     sets of such
>     >>>> pages. First ones are preallocated and second ones are
>     allocated after vm
>     >>>> subsystem was inited. We must deal with each set differently.
>     So code is
>     >>>> more complex and so is debugging.
>     >>>>
>     >>>> Thus we need some method how to say that some part of
>     physical memory
>     >>>> should be included in vm_page_array, but the pages from that
>     region
>     >>>> should
>     >>>> not be put to free list during initialization. We think that such
>     >>>> possibility could be utilized in general. There could be a
>     need for some
>     >>>> physical space which:
>     >>>>
>     >>>> (1) is needed only during boot and later on it can be freed
>     and put to vm
>     >>>> subsystem,
>     >>>>
>     >>>> (2) is needed for something else and vm_page_array code could
>     be used
>     >>>> without some kind of its duplication.
>     >>>>
>     >>>> There is already some code which deals with blacklisted pages in
>     >>>> vm_page.c
>     >>>> file. So the easiest way how to deal with presented situation
>     is to add
>     >>>> some callback to this part of code which will be able to
>     either exclude
>     >>>> whole phys_avail[i], phys_avail[i+1] region or single pages.
>     As the
>     >>>> biggest
>     >>>> phys_avail region is used for vm subsystem allocations, there
>     should be
>     >>>> some more coding. (However, blacklisted pages are not dealt
>     with on that
>     >>>> part of region.)
>     >>>>
>     >>>> We would like to know if there is any objection:
>     >>>>
>     >>>> (1) to deal with presented problem,
>     >>>> (2) to deal with the problem presented way.
>     >>>> Some help is very appreciated. Thanks
>     >>>>
>     >>>>
>     >>> As an experiment, try modifying vm_phys.c to use dump_avail
>     instead of
>     >>> phys_avail when sizing vm_page_array.  On amd64, where the
>     same problem
>     >>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now,
>     this is
>     >>> probably my preferred solution.  The catch being that not all
>     architectures
>     >>> implement dump_avail, but my recollection is that arm does.
>     >>>
>     >> Frankly, I would prefer this too, but there is one big open
>     question:
>     >>
>     >> What is dump_avail for?
>     >>
>     >>
>     >>
>     >> dump_avail[] is solving a similar problem in the minidump code,
>     hence, the
>     >> prefix "dump_" in its name.  In other words, the minidump code
>     couldn't use
>     >> phys_avail[] either because it didn't describe the full range
>     of physical
>     >> addresses that might be included in a minidump, so dump_avail[]
>     was created.
>     >>
>     >> There is already precedent for what I'm suggesting. 
>     dump_avail[] is
>     >> already (ab)used outside of the minidump code on x86 to solve
>     this same
>     >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>     >>
>     >>
>     >>  Using it for vm_page_array initialization and segmentation
>     means that
>     >> phys_avail must be a subset of it. And this must be stated and
>     be visible
>     >> enough. Maybe it should be even checked in code. I like the idea of
>     >> thinking about dump_avail as something what desribes all memory
>     in a
>     >> system, but it's not how dump_avail is defined in archs now.
>     >>
>     >>
>     >>
>     >> When you say "it's not how dump_avail is defined in archs now",
>     I'm not
>     >> sure whether you're talking about the code or the comments.  In
>     terms of
>     >> code, dump_avail[] is a superset of phys_avail[], and I'm not
>     aware of any
>     >> code that would have to change.  In terms of comments, I did a
>     grep looking
>     >> for comments defining what dump_avail[] is, because I couldn't
>     remember
>     >> any.  I found one ... on arm.  So, I don't think it's a onerous
>     task
>     >> changing the definition of dump_avail[].  :-)
>     >>
>     >> Already, as things stand today with dump_avail[] being used
>     outside of the
>     >> minidump code, one could reasonably argue that it should be
>     renamed to
>     >> something like phys_exists[].
>     >>
>     >>
>     >>
>     >> I will experiment with it on monday then. However, it's not
>     only about how
>     >> memory segments are created in vm_phys.c, but it's about how
>     vm_page_array
>     >> size is computed in vm_page.c too.
>     >>
>     >>
>     >>
>     >> Yes, and there is also a place in vm_reserv.c that needs to
>     change.   I've
>     >> attached the patch that I developed and tested a long time
>     ago.  It still
>     >> applies cleanly and runs ok on amd64.
>     >>
>     >>
>     >>
>     >
>     >
>     > Well, I've created and tested minimalistic patch which - I hope - is
>     > commitable. It runs ok on pandaboard (arm-v6) and solves
>     presented problem.
>     > I would really appreciate if this will be commited. Thanks.
>
>
>     Sorry for the slow reply.  I've just been swamped with work lately.  I
>     finally had some time to look at this in the last day or so.
>
>     The first thing that I propose to do is commit the attached
>     patch.  This
>     patch changes pmap_init() on amd64, armv6, and i386 so that it no
>     longer
>     consults phys_avail[] to determine the end of memory.  Instead, it
>     calls
>     a new function provided by vm_phys.c to obtain the same
>     information from
>     vm_phys_segs[].
>
>     With this change, the new variable phys_managed in your patch wouldn't
>     need to be a global.  It could be a local variable in
>     vm_page_startup()
>     that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>
>     More generally, the long-term vision that I have is that we would stop
>     using phys_avail[] after vm_page_startup() had completed.  It
>     would only
>     be used during initialization.  After that we would use vm_phys_segs[]
>     and functions provided by vm_phys.c.
>
>  
> I understand. The patch and the long-term vision are fine with me. I
> just was not bold enough to pass phys_managed as a parameter to
> vm_phys_init() and vm_reserv_init(). However, I certainly was thinking
> about it. Reading the comment above vm_phys_get_end(): do we care
> if the last usable address is 0xFFFFFFFF?


To date, this hasn't been a problem.  However, handling 0xFFFFFFFF is
easy.  So, the final version of the patch that I committed this weekend
does so.
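
To illustrate why 0xFFFFFFFF is a special case (a minimal sketch under
stated assumptions, not the committed code): with a 32-bit physical
address type, the exclusive end of a segment whose last byte is
0xFFFFFFFF wraps around to 0, while the inclusive last byte is still
representable.  The names paddr32_t, struct seg32, seg_end_exclusive()
and seg_last_byte() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t paddr32_t;	/* hypothetical 32-bit stand-in for vm_paddr_t */

/* One segment, described by its first byte and its size in bytes. */
struct seg32 {
	paddr32_t start;
	paddr32_t size;		/* must be non-zero */
};

/* Exclusive end: wraps to 0 when the segment's last byte is 0xFFFFFFFF. */
paddr32_t
seg_end_exclusive(const struct seg32 *s)
{
	return (s->start + s->size);
}

/* Inclusive last byte: does not wrap for a valid, non-empty segment. */
paddr32_t
seg_last_byte(const struct seg32 *s)
{
	assert(s->size != 0);
	return (s->start + (s->size - 1));
}
```

For a segment starting at 0xFFFFF000 with size 0x1000, the first
function yields 0 and the second yields 0xFFFFFFFF, so code that
compares against an exclusive end needs extra care at the top of the
address space.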

Can you please try the attached patch?  It replaces phys_avail[] with
vm_phys_segs[] in arm's busdma.
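
For readers without the attachment at hand, the overlap test it performs
can be sketched outside the kernel roughly as follows.  This is only an
illustration: paddr_t, struct phys_seg, MAXADDR and
exclusion_touches_segs() are stand-ins for vm_paddr_t,
struct vm_phys_seg, BUS_SPACE_MAXADDR and exclusion_bounce_check(), and
the segment table is passed in explicitly so that the fragment compiles
on its own.

```c
#include <stdint.h>

typedef uint64_t paddr_t;		/* stand-in for vm_paddr_t */

struct phys_seg {			/* stand-in for struct vm_phys_seg */
	paddr_t start;			/* first byte of the segment */
	paddr_t end;			/* one past the last byte */
};

#define	MAXADDR	UINT64_MAX		/* stand-in for BUS_SPACE_MAXADDR */

/*
 * Return 1 if the exclusion window bounded by lowaddr and highaddr
 * touches any segment, i.e. if bouncing may be needed; 0 otherwise.
 */
int
exclusion_touches_segs(const struct phys_seg *segs, int nsegs,
    paddr_t lowaddr, paddr_t highaddr)
{
	int i;

	/* A window that starts at the top of the address space excludes nothing. */
	if (lowaddr >= MAXADDR)
		return (0);

	for (i = 0; i < nsegs; i++) {
		/* lowaddr falls inside the segment, or the window spans its start. */
		if ((lowaddr >= segs[i].start && lowaddr < segs[i].end) ||
		    (lowaddr < segs[i].start && highaddr >= segs[i].start))
			return (1);
	}
	return (0);
}
```

The only change relative to the phys_avail[] version is the iteration:
an index-bounded walk over segments instead of a zero-terminated walk
over (start, end) pairs.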


> Do you think that the rest of my patch considering changes due to your
> patch is ok?
>  


Basically, yes.  I do, however, think that

+#if defined(__arm__)
+       phys_managed = dump_avail;
+#else
+       phys_managed = phys_avail;
+#endif

should also be conditioned on VM_PHYSSEG_SPARSE.
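
Spelled out, the suggestion might look like the sketch below.  It is a
standalone illustration rather than the code from either patch:
paddr_t, the toy table contents and select_phys_managed() are
assumptions made so that the fragment compiles by itself, whereas in the
kernel the selection would sit inline in vm_page_startup() and
VM_PHYSSEG_SPARSE would come from the machine's vmparam.h.

```c
#include <stdint.h>

typedef uint64_t paddr_t;	/* stand-in for vm_paddr_t */

/* Toy tables; the real ones are filled in by machine-dependent code. */
paddr_t phys_avail[] = { 0x00200000, 0x10000000, 0, 0 };
paddr_t dump_avail[] = { 0x00000000, 0x10000000, 0, 0 };

/* Pick the table that describes the memory covered by vm_page_array. */
paddr_t *
select_phys_managed(void)
{
#if defined(VM_PHYSSEG_SPARSE) && defined(__arm__)
	/* Sparse model on arm: also cover pages allocated before VM init. */
	return (dump_avail);
#else
	/* Everything else keeps today's behaviour. */
	return (phys_avail);
#endif
}
```

With the extra condition, an arm kernel built without
VM_PHYSSEG_SPARSE keeps using phys_avail[].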


>  
>
>     >
>     > BTW, while I was inspecting all archs, I think that maybe it's
>     time to do
>     > what was done for busdma not long ago. There are many similar
>     codes across
>     > archs which deal with physical memory and could be generalized
>     and put to
>     > kern/subr_physmem.c for utilization. All work with physical
>     memory could be
>     > simplify to two arrays of regions.
>     >
>     > phys_present[] ... describes all present physical memory regions
>     > phys_exclude[] ... describes various exclusions from phys_present[]
>     >
>     > Each excluded region will be labeled by flags to say what kind
>     of exclusion
>     > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE,
>     NOMEMRW  could
>     > be combined. This idea is taken from sys/arm/arm/physmem.c.
>     >
>     > All other arrays like phys_managed[], phys_avail[], dump_avail[]
>     will be
>     > created from these phys_present[] and phys_exclude[].
>     > This way bootstrap codes in archs could be simplified and
>     unified. For
>     > example, dealing with either hw.physmem or page with PA
>     0x00000000 could be
>     > transparent.
>     >
>     > I'm prepared to volunteer if the thing is ripe. However, some
>     tutor will be
>     > looked for.
>
>
>     I've never really looked at arm/arm/physmem.c before.  Let me do that
>     before I comment on this.
>
> No problem. This could be long-term aim. However, I hope the
> VM_PHYSSEG_SPARSE problem could be dealt with in MI code in present
> time. In every case, thanks for your help.
>  
>  


--------------090909070100060609070401
Content-Type: text/plain; charset=ISO-8859-15;
 name="busdma_arm1.patch"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="busdma_arm1.patch"

SW5kZXg6IGFybS9hcm0vYnVzZG1hX21hY2hkZXAtdjYuYwo9PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBh
cm0vYXJtL2J1c2RtYV9tYWNoZGVwLXY2LmMJKHJldmlzaW9uIDI3MzY5OSkKKysrIGFybS9h
cm0vYnVzZG1hX21hY2hkZXAtdjYuYwkod29ya2luZyBjb3B5KQpAQCAtNTQsOCArNTQsOSBA
QCBfX0ZCU0RJRCgiJEZyZWVCU0QkIik7CiAjaW5jbHVkZSA8c3lzL3Vpby5oPgogCiAjaW5j
bHVkZSA8dm0vdm0uaD4KKyNpbmNsdWRlIDx2bS92bV9wYXJhbS5oPgogI2luY2x1ZGUgPHZt
L3ZtX3BhZ2UuaD4KLSNpbmNsdWRlIDx2bS92bV9tYXAuaD4KKyNpbmNsdWRlIDx2bS92bV9w
aHlzLmg+CiAjaW5jbHVkZSA8dm0vdm1fZXh0ZXJuLmg+CiAjaW5jbHVkZSA8dm0vdm1fa2Vy
bi5oPgogCkBAIC0yNzcsMTYgKzI3OCwxOCBAQCBTWVNJTklUKGJ1c2RtYSwgU0lfU1VCX0tN
RU0rMSwgU0lfT1JERVJfRklSU1QsIGJ1cwogICogZXhwcmVzcywgc28gd2UgdGFrZSBhIGZh
c3Qgb3V0LgogICovCiBzdGF0aWMgaW50Ci1leGNsdXNpb25fYm91bmNlX2NoZWNrKHZtX29m
ZnNldF90IGxvd2FkZHIsIHZtX29mZnNldF90IGhpZ2hhZGRyKQorZXhjbHVzaW9uX2JvdW5j
ZV9jaGVjayh2bV9wYWRkcl90IGxvd2FkZHIsIHZtX3BhZGRyX3QgaGlnaGFkZHIpCiB7CisJ
c3RydWN0IHZtX3BoeXNfc2VnICpzZWc7CiAJaW50IGk7CiAKIAlpZiAobG93YWRkciA+PSBC
VVNfU1BBQ0VfTUFYQUREUikKIAkJcmV0dXJuICgwKTsKIAotCWZvciAoaSA9IDA7IHBoeXNf
YXZhaWxbaV0gJiYgcGh5c19hdmFpbFtpICsgMV07IGkgKz0gMikgewotCQlpZiAoKGxvd2Fk
ZHIgPj0gcGh5c19hdmFpbFtpXSAmJiBsb3dhZGRyIDwgcGh5c19hdmFpbFtpICsgMV0pIHx8
Ci0JCSAgICAobG93YWRkciA8IHBoeXNfYXZhaWxbaV0gJiYgaGlnaGFkZHIgPj0gcGh5c19h
dmFpbFtpXSkpCisJZm9yIChpID0gMDsgaSA8IHZtX3BoeXNfbnNlZ3M7IGkrKykgeworCQlz
ZWcgPSAmdm1fcGh5c19zZWdzW2ldOworCQlpZiAoKGxvd2FkZHIgPj0gc2VnLT5zdGFydCAm
JiBsb3dhZGRyIDwgc2VnLT5lbmQpIHx8CisJCSAgICAobG93YWRkciA8IHNlZy0+c3RhcnQg
JiYgaGlnaGFkZHIgPj0gc2VnLT5zdGFydCkpCiAJCQlyZXR1cm4gKDEpOwogCX0KIAlyZXR1
cm4gKDApOwpJbmRleDogYXJtL2FybS9idXNkbWFfbWFjaGRlcC5jCj09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0K
LS0tIGFybS9hcm0vYnVzZG1hX21hY2hkZXAuYwkocmV2aXNpb24gMjczNjk5KQorKysgYXJt
L2FybS9idXNkbWFfbWFjaGRlcC5jCSh3b3JraW5nIGNvcHkpCkBAIC03MCwxMCArNzAsMTEg
QEAgX19GQlNESUQoIiRGcmVlQlNEJCIpOwogCiAjaW5jbHVkZSA8dm0vdW1hLmg+CiAjaW5j
bHVkZSA8dm0vdm0uaD4KKyNpbmNsdWRlIDx2bS92bV9wYXJhbS5oPgogI2luY2x1ZGUgPHZt
L3ZtX2V4dGVybi5oPgogI2luY2x1ZGUgPHZtL3ZtX2tlcm4uaD4KICNpbmNsdWRlIDx2bS92
bV9wYWdlLmg+Ci0jaW5jbHVkZSA8dm0vdm1fbWFwLmg+CisjaW5jbHVkZSA8dm0vdm1fcGh5
cy5oPgogCiAjaW5jbHVkZSA8bWFjaGluZS9hdG9taWMuaD4KICNpbmNsdWRlIDxtYWNoaW5l
L2J1cy5oPgpAQCAtMzI2LDE3ICszMjcsMTkgQEAgcnVuX2ZpbHRlcihidXNfZG1hX3RhZ190
IGRtYXQsIGJ1c19hZGRyX3QgcGFkZHIpCiAgKiBleHByZXNzLCBzbyB3ZSB0YWtlIGEgZmFz
dCBvdXQuCiAgKi8KIHN0YXRpYyBfX2lubGluZSBpbnQKLV9idXNfZG1hX2Nhbl9ib3VuY2Uo
dm1fb2Zmc2V0X3QgbG93YWRkciwgdm1fb2Zmc2V0X3QgaGlnaGFkZHIpCitfYnVzX2RtYV9j
YW5fYm91bmNlKHZtX3BhZGRyX3QgbG93YWRkciwgdm1fcGFkZHJfdCBoaWdoYWRkcikKIHsK
KwlzdHJ1Y3Qgdm1fcGh5c19zZWcgKnNlZzsKIAlpbnQgaTsKIAogCWlmIChsb3dhZGRyID49
IEJVU19TUEFDRV9NQVhBRERSKQogCQlyZXR1cm4gKDApOwogCi0JZm9yIChpID0gMDsgcGh5
c19hdmFpbFtpXSAmJiBwaHlzX2F2YWlsW2kgKyAxXTsgaSArPSAyKSB7Ci0JCWlmICgobG93
YWRkciA+PSBwaHlzX2F2YWlsW2ldICYmIGxvd2FkZHIgPD0gcGh5c19hdmFpbFtpICsgMV0p
Ci0JCSAgICB8fCAobG93YWRkciA8IHBoeXNfYXZhaWxbaV0gJiYKLQkJICAgIGhpZ2hhZGRy
ID4gcGh5c19hdmFpbFtpXSkpCisJZm9yIChpID0gMDsgaSA8IHZtX3BoeXNfbnNlZ3M7IGkr
KykgeworCQlzZWcgPSAmdm1fcGh5c19zZWdzW2ldOworCQlpZiAoKGxvd2FkZHIgPj0gc2Vn
LT5zdGFydCAmJiBsb3dhZGRyIDw9IHNlZy0+ZW5kKQorCQkgICAgfHwgKGxvd2FkZHIgPCBz
ZWctPnN0YXJ0ICYmCisJCSAgICBoaWdoYWRkciA+IHNlZy0+c3RhcnQpKQogCQkJcmV0dXJu
ICgxKTsKIAl9CiAJcmV0dXJuICgwKTsK
--------------090909070100060609070401--

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 13:22:54 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 77C9793;
 Mon, 27 Oct 2014 13:22:54 +0000 (UTC)
Received: from mail-qa0-x232.google.com (mail-qa0-x232.google.com
 [IPv6:2607:f8b0:400d:c00::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 1CE05794;
 Mon, 27 Oct 2014 13:22:54 +0000 (UTC)
Received: by mail-qa0-f50.google.com with SMTP id cs9so3700333qab.23
 for <multiple recipients>; Mon, 27 Oct 2014 06:22:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=VDnRCsrq4Y8dqS0xYiI32m8B/cg4150RMsVgyAnfORY=;
 b=Yz5NQHZjUx88Baghv3u0NbsBYhmkk4Aj8BW3o+mqC6kSmRJ62m1ju5Aa2XSXkuqJrV
 UchSZa0h00oz7sxTpX/EtBh3dzqBg5Y7ogmGyu8Ln0mOPzD4IEwI7BIt2S1xreYhdhS8
 RRpv1P8UebIkYHZ+w8ildF1X/xzuNTir3t9UJKp9qaQQHWxGxLnPXYlwQyt+Rjjw03Yh
 cchO840P6sf/G9ihRk6zM2HheNkARhWKOA+YoVsojpG+xCj43bFcBSGs49k7uH1napda
 FTBHtMfvL6NB0rng58Ey44q1ZO71zTpxK8qEmKjiRmBA9OOy89ARri/NO1vtlrJqhglj
 1o6A==
MIME-Version: 1.0
X-Received: by 10.229.176.70 with SMTP id bd6mr20043683qcb.12.1414416173014;
 Mon, 27 Oct 2014 06:22:53 -0700 (PDT)
Received: by 10.140.23.242 with HTTP; Mon, 27 Oct 2014 06:22:52 -0700 (PDT)
In-Reply-To: <544DED4C.3010501@rice.edu>
References: <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>
 <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>
 <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>
 <5428AF3B.1030906@rice.edu>
 <CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>
 <54497DC1.5070506@rice.edu>
 <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>
 <544DED4C.3010501@rice.edu>
Date: Mon, 27 Oct 2014 14:22:52 +0100
Message-ID: <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
From: Svatopluk Kraus <onwahe@gmail.com>
To: Alan Cox <alc@rice.edu>
Content-Type: multipart/mixed; boundary=001a11c2d8ba8f4c290506676df8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 13:22:54 -0000

--001a11c2d8ba8f4c290506676df8
Content-Type: text/plain; charset=UTF-8

On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc@rice.edu> wrote:

>   On 10/24/2014 06:33, Svatopluk Kraus wrote:
>
>
> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:
>
>>  On 10/08/2014 10:38, Svatopluk Kraus wrote:
>> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>> >
>> >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>> >>
>> >>
>> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com>
>> wrote:
>> >>
>> >>>
>> >>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem
>> >>>> we've
>> >>>> dealt with somehow, but now we would like to do it better. It's about
>> >>>> physical pages which are allocated before vm subsystem is
>> initialized.
>> >>>> While later on these pages could be found in vm_page_array when
>> >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for
>> >>>> VM_PHYSSEG_SPARSE
>> >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model.
>> >>>>
>> >>>> It really would be nice to utilize vm_page_array for such
>> preallocated
>> >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used.
>> Things
>> >>>> could be much easier then. In our case, it's about pages which are
>> used
>> >>>> for
>> >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two sets of
>> such
>> >>>> pages. First ones are preallocated and second ones are allocated
>> after vm
>> >>>> subsystem was inited. We must deal with each set differently. So
>> code is
>> >>>> more complex and so is debugging.
>> >>>>
>> >>>> Thus we need some method how to say that some part of physical memory
>> >>>> should be included in vm_page_array, but the pages from that region
>> >>>> should
>> >>>> not be put to free list during initialization. We think that such
>> >>>> possibility could be utilized in general. There could be a need for
>> some
>> >>>> physical space which:
>> >>>>
>> >>>> (1) is needed only during boot and later on it can be freed and put
>> to vm
>> >>>> subsystem,
>> >>>>
>> >>>> (2) is needed for something else and vm_page_array code could be used
>> >>>> without some kind of its duplication.
>> >>>>
>> >>>> There is already some code which deals with blacklisted pages in
>> >>>> vm_page.c
>> >>>> file. So the easiest way how to deal with presented situation is to
>> add
>> >>>> some callback to this part of code which will be able to either
>> exclude
>> >>>> whole phys_avail[i], phys_avail[i+1] region or single pages. As the
>> >>>> biggest
>> >>>> phys_avail region is used for vm subsystem allocations, there should
>> be
>> >>>> some more coding. (However, blacklisted pages are not dealt with on
>> that
>> >>>> part of region.)
>> >>>>
>> >>>> We would like to know if there is any objection:
>> >>>>
>> >>>> (1) to deal with presented problem,
>> >>>> (2) to deal with the problem presented way.
>> >>>> Some help is very appreciated. Thanks
>> >>>>
>> >>>>
>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of
>> >>> phys_avail when sizing vm_page_array.  On amd64, where the same
>> problem
>> >>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now, this is
>> >>> probably my preferred solution.  The catch being that not all
>> architectures
>> >>> implement dump_avail, but my recollection is that arm does.
>> >>>
>> >> Frankly, I would prefer this too, but there is one big open question:
>> >>
>> >> What is dump_avail for?
>> >>
>> >>
>> >>
>> >> dump_avail[] is solving a similar problem in the minidump code, hence,
>> the
>> >> prefix "dump_" in its name.  In other words, the minidump code
>> couldn't use
>> >> phys_avail[] either because it didn't describe the full range of
>> physical
>> >> addresses that might be included in a minidump, so dump_avail[] was
>> created.
>> >>
>> >> There is already precedent for what I'm suggesting.  dump_avail[] is
>> >> already (ab)used outside of the minidump code on x86 to solve this same
>> >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>> >>
>> >>
>> >>  Using it for vm_page_array initialization and segmentation means that
>> >> phys_avail must be a subset of it. And this must be stated and be
>> visible
>> >> enough. Maybe it should be even checked in code. I like the idea of
>> >> thinking about dump_avail as something what desribes all memory in a
>> >> system, but it's not how dump_avail is defined in archs now.
>> >>
>> >>
>> >>
>> >> When you say "it's not how dump_avail is defined in archs now", I'm not
>> >> sure whether you're talking about the code or the comments.  In terms
>> of
>> >> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of
>> any
>> >> code that would have to change.  In terms of comments, I did a grep
>> looking
>> >> for comments defining what dump_avail[] is, because I couldn't remember
>> >> any.  I found one ... on arm.  So, I don't think it's a onerous task
>> >> changing the definition of dump_avail[].  :-)
>> >>
>> >> Already, as things stand today with dump_avail[] being used outside of
>> the
>> >> minidump code, one could reasonably argue that it should be renamed to
>> >> something like phys_exists[].
>> >>
>> >>
>> >>
>> >> I will experiment with it on monday then. However, it's not only about
>> how
>> >> memory segments are created in vm_phys.c, but it's about how
>> vm_page_array
>> >> size is computed in vm_page.c too.
>> >>
>> >>
>> >>
>> >> Yes, and there is also a place in vm_reserv.c that needs to change.
>>  I've
>> >> attached the patch that I developed and tested a long time ago.  It
>> still
>> >> applies cleanly and runs ok on amd64.
>> >>
>> >>
>> >>
>> >
>> >
>> > Well, I've created and tested minimalistic patch which - I hope - is
>> > commitable. It runs ok on pandaboard (arm-v6) and solves presented
>> problem.
>> > I would really appreciate if this will be commited. Thanks.
>>
>>
>> Sorry for the slow reply.  I've just been swamped with work lately.  I
>> finally had some time to look at this in the last day or so.
>>
>> The first thing that I propose to do is commit the attached patch.  This
>> patch changes pmap_init() on amd64, armv6, and i386 so that it no longer
>> consults phys_avail[] to determine the end of memory.  Instead, it calls
>> a new function provided by vm_phys.c to obtain the same information from
>> vm_phys_segs[].
>>
>> With this change, the new variable phys_managed in your patch wouldn't
>> need to be a global.  It could be a local variable in vm_page_startup()
>> that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>>
>> More generally, the long-term vision that I have is that we would stop
>> using phys_avail[] after vm_page_startup() had completed.  It would only
>> be used during initialization.  After that we would use vm_phys_segs[]
>> and functions provided by vm_phys.c.
>>
>
> I understand. The patch and the long-term vision are fine with me. I just
> was not bold enough to pass phys_managed as a parameter to vm_phys_init() and
> vm_reserv_init(). However, I certainly was thinking about it. Reading the
> comment above vm_phys_get_end(): do we care if the last usable address is
> 0xFFFFFFFF?
>
>
>
> To date, this hasn't been a problem.  However, handling 0xFFFFFFFF is
> easy.  So, the final version of the patch that I committed this weekend
> does so.
>
> Can you please try the attached patch?  It replaces phys_avail[] with
> vm_phys_segs[] in arm's busdma.
>


It works fine on an arm-v6 pandaboard. I have no objection to committing it.
However, it's only a 1:1 replacement. In fact, I still keep the following
pattern in my head:

present memory in system <=> all RAM and whatsoever
nobounce memory <=> addressable by DMA
managed memory by vm subsystem  <=> i.e. kept in vm_page_array
available memory for vm subsystem <=> can be allocated

So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it could
be too limiting in some scenarios. I would like to see something
different in exclusion_bounce_check() in the future: something that
reflects the NOBOUNCE property and not the NOALLOC one as it does now.
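
To make the distinction concrete, here is a minimal sketch of how the
two properties could be asked about separately. Everything in it is
hypothetical: the EX_* flag values, struct region, covered() and
pa_usable_for() are invented for illustration, and the exact meaning of
each flag is an assumption, not something taken from existing code.

```c
#include <stdint.h>

typedef uint64_t paddr_t;	/* stand-in for vm_paddr_t */

/* Hypothetical flags, named after the ones mentioned in this thread. */
#define	EX_NODUMP	0x01u
#define	EX_NOALLOC	0x02u
#define	EX_NOMANAGE	0x04u
#define	EX_NOBOUNCE	0x08u

struct region {
	paddr_t start;		/* first byte */
	paddr_t end;		/* one past the last byte */
	uint32_t flags;		/* EX_* bits; unused for present regions */
};

/*
 * Return 1 if pa lies inside any of the n regions whose flags match
 * mask.  A mask of 0 matches every region.
 */
static int
covered(const struct region *r, int n, paddr_t pa, uint32_t mask)
{
	int i;

	for (i = 0; i < n; i++)
		if (pa >= r[i].start && pa < r[i].end &&
		    (mask == 0 || (r[i].flags & mask) != 0))
			return (1);
	return (0);
}

/*
 * Return 1 if pa is present and not excluded for the given purpose.
 * Passing EX_NOALLOC asks "can it be allocated?"; passing EX_NOBOUNCE
 * asks the bounce question independently of that.
 */
int
pa_usable_for(const struct region *present, int npresent,
    const struct region *exclude, int nexclude, paddr_t pa, uint32_t exflag)
{
	return (covered(present, npresent, pa, 0) &&
	    !covered(exclude, nexclude, pa, exflag));
}
```

With such a split, a page that is preallocated (NOALLOC) but still
reachable by DMA would fail the allocation question and pass the bounce
question, which a check based on phys_avail[] alone cannot express.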


>
>
>
>  Do you think that the rest of my patch considering changes due to your
> patch is ok?
>
>
>
>
> Basically, yes.  I do, however, think that
>
> +#if defined(__arm__)
> +       phys_managed = dump_avail;
> +#else
> +       phys_managed = phys_avail;
> +#endif
>
> should also be conditioned on VM_PHYSSEG_SPARSE.
>


So I've prepared a new patch. phys_managed[] is passed to vm_phys_init() and
vm_reserv_init() as a parameter, and a small optimization is made in
vm_page_startup(). I added the VM_PHYSSEG_SPARSE condition at the place you
mentioned. Anyhow, I still think that this is only a temporary hack. In
general, phys_managed[] should always be distinguished from phys_avail[].
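
The effect of passing the region list as a parameter can be sketched in
a few lines. This is a simplified illustration and not the patch
itself: paddr_t, PAGE_SHIFT_SK (4 KB pages assumed) and count_pages()
are names made up here; the loop shape mirrors the zero-terminated
(start, end) pair walk used for phys_avail[].

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;	/* stand-in for vm_paddr_t */
#define	PAGE_SHIFT_SK	12	/* assumption: 4 KB pages */

/*
 * Count the pages covered by a zero-terminated list of (start, end)
 * pairs.  The caller decides which list to hand in.
 */
size_t
count_pages(const paddr_t *regions)
{
	size_t npages = 0;
	int i;

	for (i = 0; regions[i + 1] != 0; i += 2)
		npages += (size_t)((regions[i + 1] - regions[i]) >>
		    PAGE_SHIFT_SK);
	return (npages);
}
```

Handing in a list that also covers preallocated memory gives a larger
count than the phys_avail[]-style list for the same machine, and that
difference is exactly the set of pages that should be in vm_page_array
without being on the free lists.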


>
>
>> >
>> > BTW, while I was inspecting all archs, I think that maybe it's time to
>> do
>> > what was done for busdma not long ago. There are many similar codes
>> across
>> > archs which deal with physical memory and could be generalized and put
>> to
>> > kern/subr_physmem.c for utilization. All work with physical memory
>> could be
>> > simplify to two arrays of regions.
>> >
>> > phys_present[] ... describes all present physical memory regions
>> > phys_exclude[] ... describes various exclusions from phys_present[]
>> >
>> > Each excluded region will be labeled by flags to say what kind of
>> exclusion
>> > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, NOMEMRW
>> could
>> > be combined. This idea is taken from sys/arm/arm/physmem.c.
>> >
>> > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be
>> > created from these phys_present[] and phys_exclude[].
>> > This way bootstrap codes in archs could be simplified and unified. For
>> > example, dealing with either hw.physmem or page with PA 0x00000000
>> could be
>> > transparent.
>> >
>> > I'm prepared to volunteer if the thing is ripe. However, some tutor
>> will be
>> > looked for.
>>
>>
>> I've never really looked at arm/arm/physmem.c before.  Let me do that
>> before I comment on this.
>>
> No problem. This could be long-term aim. However, I hope the
> VM_PHYSSEG_SPARSE problem could be dealt with in MI code in present time.
> In every case, thanks for your help.
>
>
>
>
>

--001a11c2d8ba8f4c290506676df8
Content-Type: application/octet-stream; name="phys_managed2.patch"
Content-Disposition: attachment; filename="phys_managed2.patch"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_i1rurru11

SW5kZXg6IHN5cy92bS92bV9wYWdlLmMKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gc3lzL3ZtL3ZtX3BhZ2UuYwko
cmV2aXNpb24gMjczNzM0KQorKysgc3lzL3ZtL3ZtX3BhZ2UuYwkod29ya2luZyBjb3B5KQpAQCAt
MjkwLDYgKzI5MCw3IEBACiAJdm1fcGFkZHJfdCBwYTsKIAl2bV9wYWRkcl90IGxhc3RfcGE7CiAJ
Y2hhciAqbGlzdDsKKwl2bV9wYWRkcl90ICpwaHlzX21hbmFnZWQ7CgogCS8qIHRoZSBiaWdnZXN0
IG1lbW9yeSBhcnJheSBpcyB0aGUgc2Vjb25kIGdyb3VwIG9mIHBhZ2VzICovCiAJdm1fcGFkZHJf
dCBlbmQ7CkBAIC0zMDEsMzEgKzMwMiwzOSBAQAogCWJpZ2dlc3RvbmUgPSAwOwogCXZhZGRyID0g
cm91bmRfcGFnZSh2YWRkcik7CgotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSArIDFdOyBpICs9
IDIpIHsKLQkJcGh5c19hdmFpbFtpXSA9IHJvdW5kX3BhZ2UocGh5c19hdmFpbFtpXSk7Ci0JCXBo
eXNfYXZhaWxbaSArIDFdID0gdHJ1bmNfcGFnZShwaHlzX2F2YWlsW2kgKyAxXSk7CisjaWYgZGVm
aW5lZChWTV9QSFlTU0VHX1NQQVJTRSkgJiYgZGVmaW5lZChfX2FybV9fKQorCXBoeXNfbWFuYWdl
ZCA9IGR1bXBfYXZhaWw7CisjZWxzZQorCXBoeXNfbWFuYWdlZCA9IHBoeXNfYXZhaWw7CisjZW5k
aWYKKworCWxvd193YXRlciA9IHJvdW5kX3BhZ2UocGh5c19tYW5hZ2VkWzBdKTsKKwloaWdoX3dh
dGVyID0gcm91bmRfcGFnZShwaHlzX21hbmFnZWRbMV0pOworCWZvciAoaSA9IDI7IHBoeXNfbWFu
YWdlZFtpICsgMV07IGkgKz0gMikgeworCQlwaHlzX21hbmFnZWRbaV0gPSByb3VuZF9wYWdlKHBo
eXNfbWFuYWdlZFtpXSk7CisJCXBoeXNfbWFuYWdlZFtpICsgMV0gPSB0cnVuY19wYWdlKHBoeXNf
bWFuYWdlZFtpICsgMV0pOworCQlpZiAocGh5c19tYW5hZ2VkW2ldIDwgbG93X3dhdGVyKQorCQkJ
bG93X3dhdGVyID0gcGh5c19tYW5hZ2VkW2ldOworCQlpZiAocGh5c19tYW5hZ2VkW2kgKyAxXSA+
IGhpZ2hfd2F0ZXIpCisJCQloaWdoX3dhdGVyID0gcGh5c19tYW5hZ2VkW2kgKyAxXTsKIAl9Cgot
CWxvd193YXRlciA9IHBoeXNfYXZhaWxbMF07Ci0JaGlnaF93YXRlciA9IHBoeXNfYXZhaWxbMV07
CisjaWZkZWYgWEVOCisJbG93X3dhdGVyID0gMDsKKyNlbmRpZgoKIAlmb3IgKGkgPSAwOyBwaHlz
X2F2YWlsW2kgKyAxXTsgaSArPSAyKSB7Ci0JCXZtX3BhZGRyX3Qgc2l6ZSA9IHBoeXNfYXZhaWxb
aSArIDFdIC0gcGh5c19hdmFpbFtpXTsKKwkJdm1fcGFkZHJfdCBzaXplOwoKKwkJcGh5c19hdmFp
bFtpXSA9IHJvdW5kX3BhZ2UocGh5c19hdmFpbFtpXSk7CisJCXBoeXNfYXZhaWxbaSArIDFdID0g
dHJ1bmNfcGFnZShwaHlzX2F2YWlsW2kgKyAxXSk7CisJCXNpemUgPSBwaHlzX2F2YWlsW2kgKyAx
XSAtIHBoeXNfYXZhaWxbaV07CiAJCWlmIChzaXplID4gYmlnZ2VzdHNpemUpIHsKIAkJCWJpZ2dl
c3RvbmUgPSBpOwogCQkJYmlnZ2VzdHNpemUgPSBzaXplOwogCQl9Ci0JCWlmIChwaHlzX2F2YWls
W2ldIDwgbG93X3dhdGVyKQotCQkJbG93X3dhdGVyID0gcGh5c19hdmFpbFtpXTsKLQkJaWYgKHBo
eXNfYXZhaWxbaSArIDFdID4gaGlnaF93YXRlcikKLQkJCWhpZ2hfd2F0ZXIgPSBwaHlzX2F2YWls
W2kgKyAxXTsKIAl9CgotI2lmZGVmIFhFTgotCWxvd193YXRlciA9IDA7Ci0jZW5kaWYKLQogCWVu
ZCA9IHBoeXNfYXZhaWxbYmlnZ2VzdG9uZSsxXTsKCiAJLyoKQEAgLTM5Myw4ICs0MDIsOCBAQAog
CWZpcnN0X3BhZ2UgPSBsb3dfd2F0ZXIgLyBQQUdFX1NJWkU7CiAjaWZkZWYgVk1fUEhZU1NFR19T
UEFSU0UKIAlwYWdlX3JhbmdlID0gMDsKLQlmb3IgKGkgPSAwOyBwaHlzX2F2YWlsW2kgKyAxXSAh
PSAwOyBpICs9IDIpCi0JCXBhZ2VfcmFuZ2UgKz0gYXRvcChwaHlzX2F2YWlsW2kgKyAxXSAtIHBo
eXNfYXZhaWxbaV0pOworCWZvciAoaSA9IDA7IHBoeXNfbWFuYWdlZFtpICsgMV0gIT0gMDsgaSAr
PSAyKQorCQlwYWdlX3JhbmdlICs9IGF0b3AocGh5c19tYW5hZ2VkW2kgKyAxXSAtIHBoeXNfbWFu
YWdlZFtpXSk7CiAjZWxpZiBkZWZpbmVkKFZNX1BIWVNTRUdfREVOU0UpCiAJcGFnZV9yYW5nZSA9
IGhpZ2hfd2F0ZXIgLyBQQUdFX1NJWkUgLSBmaXJzdF9wYWdlOwogI2Vsc2UKQEAgLTQ0NSw3ICs0
NTQsNyBAQAogCS8qCiAJICogSW5pdGlhbGl6ZSB0aGUgcGh5c2ljYWwgbWVtb3J5IGFsbG9jYXRv
ci4KIAkgKi8KLQl2bV9waHlzX2luaXQoKTsKKwl2bV9waHlzX2luaXQocGh5c19tYW5hZ2VkKTsK
CiAJLyoKIAkgKiBBZGQgZXZlcnkgYXZhaWxhYmxlIHBoeXNpY2FsIHBhZ2UgdGhhdCBpcyBub3Qg
YmxhY2tsaXN0ZWQgdG8KQEAgLTQ3Miw3ICs0ODEsNyBAQAogCS8qCiAJICogSW5pdGlhbGl6ZSB0
aGUgcmVzZXJ2YXRpb24gbWFuYWdlbWVudCBzeXN0ZW0uCiAJICovCi0Jdm1fcmVzZXJ2X2luaXQo
KTsKKwl2bV9yZXNlcnZfaW5pdChwaHlzX21hbmFnZWQpOwogI2VuZGlmCiAJcmV0dXJuICh2YWRk
cik7CiB9CkluZGV4OiBzeXMvdm0vdm1fcGh5cy5jCj09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy92bS92bV9w
aHlzLmMJKHJldmlzaW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9waHlzLmMJKHdvcmtpbmcgY29w
eSkKQEAgLTM2MCwyMiArMzYwLDIyIEBACiAgKiBJbml0aWFsaXplIHRoZSBwaHlzaWNhbCBtZW1v
cnkgYWxsb2NhdG9yLgogICovCiB2b2lkCi12bV9waHlzX2luaXQodm9pZCkKK3ZtX3BoeXNfaW5p
dCh2bV9wYWRkcl90ICpyZWdpb25zKQogewogCXN0cnVjdCB2bV9mcmVlbGlzdCAqZmw7CiAJaW50
IGRvbSwgZmxpbmQsIGksIG9pbmQsIHBpbmQ7CgotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSAr
IDFdICE9IDA7IGkgKz0gMikgeworCWZvciAoaSA9IDA7IHJlZ2lvbnNbaSArIDFdICE9IDA7IGkg
Kz0gMikgewogI2lmZGVmCVZNX0ZSRUVMSVNUX0lTQURNQQotCQlpZiAocGh5c19hdmFpbFtpXSA8
IDE2Nzc3MjE2KSB7Ci0JCQlpZiAocGh5c19hdmFpbFtpICsgMV0gPiAxNjc3NzIxNikgewotCQkJ
CXZtX3BoeXNfY3JlYXRlX3NlZyhwaHlzX2F2YWlsW2ldLCAxNjc3NzIxNiwKKwkJaWYgKHJlZ2lv
bnNbaV0gPCAxNjc3NzIxNikgeworCQkJaWYgKHJlZ2lvbnNbaSArIDFdID4gMTY3NzcyMTYpIHsK
KwkJCQl2bV9waHlzX2NyZWF0ZV9zZWcocmVnaW9uc1tpXSwgMTY3NzcyMTYsCiAJCQkJICAgIFZN
X0ZSRUVMSVNUX0lTQURNQSk7Ci0JCQkJdm1fcGh5c19jcmVhdGVfc2VnKDE2Nzc3MjE2LCBwaHlz
X2F2YWlsW2kgKyAxXSwKKwkJCQl2bV9waHlzX2NyZWF0ZV9zZWcoMTY3NzcyMTYsIHJlZ2lvbnNb
aSArIDFdLAogCQkJCSAgICBWTV9GUkVFTElTVF9ERUZBVUxUKTsKIAkJCX0gZWxzZSB7Ci0JCQkJ
dm1fcGh5c19jcmVhdGVfc2VnKHBoeXNfYXZhaWxbaV0sCi0JCQkJICAgIHBoeXNfYXZhaWxbaSAr
IDFdLCBWTV9GUkVFTElTVF9JU0FETUEpOworCQkJCXZtX3BoeXNfY3JlYXRlX3NlZyhyZWdpb25z
W2ldLCByZWdpb25zW2kgKyAxXSwKKwkJCQkgICAgVk1fRlJFRUxJU1RfSVNBRE1BKTsKIAkJCX0K
IAkJCWlmIChWTV9GUkVFTElTVF9JU0FETUEgPj0gdm1fbmZyZWVsaXN0cykKIAkJCQl2bV9uZnJl
ZWxpc3RzID0gVk1fRlJFRUxJU1RfSVNBRE1BICsgMTsKQEAgLTM4MiwyMSArMzgyLDIxIEBACiAJ
CX0gZWxzZQogI2VuZGlmCiAjaWZkZWYJVk1fRlJFRUxJU1RfSElHSE1FTQotCQlpZiAocGh5c19h
dmFpbFtpICsgMV0gPiBWTV9ISUdITUVNX0FERFJFU1MpIHsKLQkJCWlmIChwaHlzX2F2YWlsW2ld
IDwgVk1fSElHSE1FTV9BRERSRVNTKSB7Ci0JCQkJdm1fcGh5c19jcmVhdGVfc2VnKHBoeXNfYXZh
aWxbaV0sCisJCWlmIChyZWdpb25zW2kgKyAxXSA+IFZNX0hJR0hNRU1fQUREUkVTUykgeworCQkJ
aWYgKHJlZ2lvbnNbaV0gPCBWTV9ISUdITUVNX0FERFJFU1MpIHsKKwkJCQl2bV9waHlzX2NyZWF0
ZV9zZWcocmVnaW9uc1tpXSwKIAkJCQkgICAgVk1fSElHSE1FTV9BRERSRVNTLCBWTV9GUkVFTElT
VF9ERUZBVUxUKTsKIAkJCQl2bV9waHlzX2NyZWF0ZV9zZWcoVk1fSElHSE1FTV9BRERSRVNTLAot
CQkJCSAgICBwaHlzX2F2YWlsW2kgKyAxXSwgVk1fRlJFRUxJU1RfSElHSE1FTSk7CisJCQkJICAg
IHJlZ2lvbnNbaSArIDFdLCBWTV9GUkVFTElTVF9ISUdITUVNKTsKIAkJCX0gZWxzZSB7Ci0JCQkJ
dm1fcGh5c19jcmVhdGVfc2VnKHBoeXNfYXZhaWxbaV0sCi0JCQkJICAgIHBoeXNfYXZhaWxbaSAr
IDFdLCBWTV9GUkVFTElTVF9ISUdITUVNKTsKKwkJCQl2bV9waHlzX2NyZWF0ZV9zZWcocmVnaW9u
c1tpXSwgcmVnaW9uc1tpICsgMV0sCisJCQkJICAgIFZNX0ZSRUVMSVNUX0hJR0hNRU0pOwogCQkJ
fQogCQkJaWYgKFZNX0ZSRUVMSVNUX0hJR0hNRU0gPj0gdm1fbmZyZWVsaXN0cykKIAkJCQl2bV9u
ZnJlZWxpc3RzID0gVk1fRlJFRUxJU1RfSElHSE1FTSArIDE7CiAJCX0gZWxzZQogI2VuZGlmCi0J
CXZtX3BoeXNfY3JlYXRlX3NlZyhwaHlzX2F2YWlsW2ldLCBwaHlzX2F2YWlsW2kgKyAxXSwKKwkJ
dm1fcGh5c19jcmVhdGVfc2VnKHJlZ2lvbnNbaV0sIHJlZ2lvbnNbaSArIDFdLAogCQkgICAgVk1f
RlJFRUxJU1RfREVGQVVMVCk7CiAJfQogCWZvciAoZG9tID0gMDsgZG9tIDwgdm1fbmRvbWFpbnM7
IGRvbSsrKSB7CkluZGV4OiBzeXMvdm0vdm1fcGh5cy5oCj09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy92bS92
bV9waHlzLmgJKHJldmlzaW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9waHlzLmgJKHdvcmtpbmcg
Y29weSkKQEAgLTgwLDcgKzgwLDcgQEAKIHZtX3BhZ2VfdCB2bV9waHlzX2ZpY3RpdGlvdXNfdG9f
dm1fcGFnZSh2bV9wYWRkcl90IHBhKTsKIHZvaWQgdm1fcGh5c19mcmVlX2NvbnRpZyh2bV9wYWdl
X3QgbSwgdV9sb25nIG5wYWdlcyk7CiB2b2lkIHZtX3BoeXNfZnJlZV9wYWdlcyh2bV9wYWdlX3Qg
bSwgaW50IG9yZGVyKTsKLXZvaWQgdm1fcGh5c19pbml0KHZvaWQpOwordm9pZCB2bV9waHlzX2lu
aXQodm1fcGFkZHJfdCAqcmVnaW9ucyk7CiB2bV9wYWdlX3Qgdm1fcGh5c19wYWRkcl90b192bV9w
YWdlKHZtX3BhZGRyX3QgcGEpOwogdm9pZCB2bV9waHlzX3NldF9wb29sKGludCBwb29sLCB2bV9w
YWdlX3QgbSwgaW50IG9yZGVyKTsKIGJvb2xlYW5fdCB2bV9waHlzX3VuZnJlZV9wYWdlKHZtX3Bh
Z2VfdCBtKTsKSW5kZXg6IHN5cy92bS92bV9yZXNlcnYuYwo9PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzeXMvdm0v
dm1fcmVzZXJ2LmMJKHJldmlzaW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9yZXNlcnYuYwkod29y
a2luZyBjb3B5KQpAQCAtODE1LDcgKzgxNSw3IEBACiAgKiBSZXF1aXJlcyB0aGF0IHZtX3BhZ2Vf
YXJyYXkgYW5kIGZpcnN0X3BhZ2UgYXJlIGluaXRpYWxpemVkIQogICovCiB2b2lkCi12bV9yZXNl
cnZfaW5pdCh2b2lkKQordm1fcmVzZXJ2X2luaXQodm1fcGFkZHJfdCAqcmVnaW9ucykKIHsKIAl2
bV9wYWRkcl90IHBhZGRyOwogCWludCBpOwpAQCAtODI0LDkgKzgyNCw5IEBACiAJICogSW5pdGlh
bGl6ZSB0aGUgcmVzZXJ2YXRpb24gYXJyYXkuICBTcGVjaWZpY2FsbHksIGluaXRpYWxpemUgdGhl
CiAJICogInBhZ2VzIiBmaWVsZCBmb3IgZXZlcnkgZWxlbWVudCB0aGF0IGhhcyBhbiB1bmRlcmx5
aW5nIHN1cGVycGFnZS4KIAkgKi8KLQlmb3IgKGkgPSAwOyBwaHlzX2F2YWlsW2kgKyAxXSAhPSAw
OyBpICs9IDIpIHsKLQkJcGFkZHIgPSByb3VuZHVwMihwaHlzX2F2YWlsW2ldLCBWTV9MRVZFTF8w
X1NJWkUpOwotCQl3aGlsZSAocGFkZHIgKyBWTV9MRVZFTF8wX1NJWkUgPD0gcGh5c19hdmFpbFtp
ICsgMV0pIHsKKwlmb3IgKGkgPSAwOyByZWdpb25zW2kgKyAxXSAhPSAwOyBpICs9IDIpIHsKKwkJ
cGFkZHIgPSByb3VuZHVwMihyZWdpb25zW2ldLCBWTV9MRVZFTF8wX1NJWkUpOworCQl3aGlsZSAo
cGFkZHIgKyBWTV9MRVZFTF8wX1NJWkUgPD0gcmVnaW9uc1tpICsgMV0pIHsKIAkJCXZtX3Jlc2Vy
dl9hcnJheVtwYWRkciA+PiBWTV9MRVZFTF8wX1NISUZUXS5wYWdlcyA9CiAJCQkgICAgUEhZU19U
T19WTV9QQUdFKHBhZGRyKTsKIAkJCXBhZGRyICs9IFZNX0xFVkVMXzBfU0laRTsKSW5kZXg6IHN5
cy92bS92bV9yZXNlcnYuaAo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzeXMvdm0vdm1fcmVzZXJ2LmgJKHJldmlz
aW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9yZXNlcnYuaAkod29ya2luZyBjb3B5KQpAQCAtNTIs
NyArNTIsNyBAQAogCQkgICAgdm1fcGFnZV90IG1wcmVkKTsKIHZvaWQJCXZtX3Jlc2Vydl9icmVh
a19hbGwodm1fb2JqZWN0X3Qgb2JqZWN0KTsKIGJvb2xlYW5fdAl2bV9yZXNlcnZfZnJlZV9wYWdl
KHZtX3BhZ2VfdCBtKTsKLXZvaWQJCXZtX3Jlc2Vydl9pbml0KHZvaWQpOwordm9pZAkJdm1fcmVz
ZXJ2X2luaXQodm1fcGFkZHJfdCAqcmVnaW9ucyk7CiBpbnQJCXZtX3Jlc2Vydl9sZXZlbF9pZmZ1
bGxwb3Aodm1fcGFnZV90IG0pOwogYm9vbGVhbl90CXZtX3Jlc2Vydl9yZWFjdGl2YXRlX3BhZ2Uo
dm1fcGFnZV90IG0pOwogYm9vbGVhbl90CXZtX3Jlc2Vydl9yZWNsYWltX2NvbnRpZyh1X2xvbmcg
bnBhZ2VzLCB2bV9wYWRkcl90IGxvdywK
--001a11c2d8ba8f4c290506676df8--

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 16:29:24 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8CBCEEBF
 for <freebsd-arch@freebsd.org>; Mon, 27 Oct 2014 16:29:24 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 65A52E85
 for <freebsd-arch@freebsd.org>; Mon, 27 Oct 2014 16:29:24 +0000 (UTC)
Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net
 [173.70.85.31])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 66B24B96E;
 Mon, 27 Oct 2014 12:29:23 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Subject: Re: RfC: fueword(9) and casueword(9)
Date: Mon, 27 Oct 2014 11:17:51 -0400
Message-ID: <2048849.GkvWliFbyg@ralph.baldwin.cx>
User-Agent: KMail/4.14.2 (FreeBSD/10.1-PRERELEASE; KDE/4.14.2; amd64; ; )
In-Reply-To: <20141021162306.GE1877@kib.kiev.ua>
References: <20141021094539.GA1877@kib.kiev.ua>
 <20141022002825.H2080@besplex.bde.org> <20141021162306.GE1877@kib.kiev.ua>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 27 Oct 2014 12:29:23 -0400 (EDT)
Cc: Konstantin Belousov <kostikbel@gmail.com>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 16:29:24 -0000

On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
> On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
> > A new API should try to fix these __DEVOLATILE() abominations.  I think it
> > is safe, and even correct, to declare the pointers as volatile const void
> > *, since the functions really can handle volatile data, unlike copyin().
> > 
> > Atomic op functions are declared as taking pointers to volatile for
> > similar reasons.  Often they are applied to non-volatile data, but
> > adding a qualifier is type-safe and doesn't cost efficiency since the
> > pointer access is not known to the compiler.  (The last point is not
> > so clear -- the compiler can see things in the functions since they are
> > inline asm.  fueword() isn't inline so its (in)efficiency is not changed.)
> > 
> > The atomic read functions are not declared as taking pointers to const.
> > The __DECONST() abomination might be used to work around this bug.
> 
> I prefer not to complicate the fetch(9) KPI because of mistakes in the
> umtx structure definitions.  I think that it is a bug to mark the lock
> words with volatile.  I want the fueword(9) interface to be as similar
> to fuword(9) as possible; in particular, volatile seems not to be needed.

I agree with Bruce here.  casuword() already accepts volatile.  I also
think umtx is correct in marking the field as volatile.  Such fields are
subject to change without the compiler's knowledge, albeit by other threads
rather than signal handlers.  Having them marked volatile doesn't really
matter for the kernel, but the header is also used in userland and is
relevant in sem_new.c, etc.
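
A minimal sketch of the qualifier point, assuming a hypothetical
fetch_word() that only stands in for a fetch routine (it is not the real
fueword(9) and does no user-space access checking).  It shows that a
"volatile const void *" parameter accepts plain and volatile objects
alike without a cast, which is what makes __DEVOLATILE() unnecessary at
the call site:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a fetch routine; NOT the real fueword(9).
 * Adding qualifiers to the pointee is always a permitted implicit
 * conversion, so no caller needs a cast.
 */
static int
fetch_word(volatile const void *base, long *val)
{
	assert(base != NULL && val != NULL);
	*val = *(volatile const long *)base;
	return (0);
}

static long
demo_fetch(void)
{
	long plain = 5;
	volatile long lockword = 7;	/* e.g. a umtx-style lock word */
	long a, b;

	fetch_word(&plain, &a);		/* long * converts implicitly */
	fetch_word(&lockword, &b);	/* volatile long * needs no cast */
	return (a + b);
}
```

The reverse direction (passing a volatile object to a function taking a
plain "const void *") is the case that needs a cast.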

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 16:29:23 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id DA659DDF
 for <freebsd-arch@freebsd.org>; Mon, 27 Oct 2014 16:29:23 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B3044E84
 for <freebsd-arch@freebsd.org>; Mon, 27 Oct 2014 16:29:23 +0000 (UTC)
Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net
 [173.70.85.31])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id A575FB941;
 Mon, 27 Oct 2014 12:29:22 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Subject: Re: refcount_release_take_##lock
Date: Mon, 27 Oct 2014 11:27:45 -0400
Message-ID: <2629048.tOq3sNXcCP@ralph.baldwin.cx>
User-Agent: KMail/4.14.2 (FreeBSD/10.1-PRERELEASE; KDE/4.14.2; amd64; ; )
In-Reply-To: <20141025190407.GU82214@funkthat.com>
References: <20141025184448.GA19066@dft-labs.eu>
 <20141025190407.GU82214@funkthat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 27 Oct 2014 12:29:22 -0400 (EDT)
Cc: John-Mark Gurney <jmg@funkthat.com>, Mateusz Guzik <mjguzik@gmail.com>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 16:29:23 -0000

On Saturday, October 25, 2014 12:04:07 PM John-Mark Gurney wrote:
> Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200:
> > The following idiom is used here and there:
> > 
> > int old;
> > old = obj->ref;
> > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old - 1))
> > 	return;
> > 
> > lock(&something);
> > if (refcount_release(&obj->ref) == 0) {
> > 	unlock(&something);
> > 	return;
> > }
> > free up
> > unlock(&something);
> > 
> > ==========
> 
> Couldn't this be better written as:
> if (__predict_false(refcount_release(&obj->ref) == 0)) {
> 	lock(&something);
> 	if (__predict_true(!obj->ref)) {
> 		free up
> 	}
> 	unlock(&something);
> }
> 
> The reason I'm asking is that I changed how IPsec SA ref counting was
> handled, and used something similar...

No, this has a race as others have noted.  Please go fix the IPsec code. :)
 
> My code gets rid of a branch, and is better in that it uses refcount
> API properly, instead of using atomic_cmpset_int...

He is extending the refcount() API (which uses atomic_* internally).
The API implementation _should_ use atomic_* directly.

Mateusz,

Please keep the refcount_*() prefix so it matches the rest of the API.  I 
would just declare the functions directly in refcount.h rather than requiring 
a macro to be invoked in each C file.  We can also just implement the needed 
lock types for now instead of all of them.

You could maybe replace 'take' with 'lock', but either name is fine.
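
For illustration only, here is a userland sketch of what such a helper
might look like.  The name refcount_release_lock_sketch() is
hypothetical, and C11 atomics and a pthread mutex stand in for the
kernel's atomic_*() and lock primitives.  Unlike the quoted idiom, the
fast path retries the compare-and-swap in a loop; that is a choice made
for this sketch, not something taken from the proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference.  Returns true with *lock held if this was the
 * last reference (the caller frees the object, then unlocks).
 * Returns false, with the lock not held, otherwise.
 */
static bool
refcount_release_lock_sketch(atomic_uint *count, pthread_mutex_t *lock)
{
	unsigned int old;

	/* Fast path: not the last reference, so no lock is needed. */
	old = atomic_load(count);
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (false);
		/* The failed exchange reloaded 'old'; retry. */
	}

	/* Slow path: the count may reach zero, so decide under the lock. */
	pthread_mutex_lock(lock);
	old = atomic_fetch_sub(count, 1);
	assert(old > 0);
	if (old == 1)
		return (true);	/* last reference; lock stays held */
	pthread_mutex_unlock(lock);
	return (false);
}
```

The point of the slow path is that the final decrement happens with the
lock held, so a lookup that holds the same lock should never find an
object whose count has already dropped to zero.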

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 16:31:54 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CA992315;
 Mon, 27 Oct 2014 16:31:54 +0000 (UTC)
Received: from pp2.rice.edu (proofpoint2.mail.rice.edu [128.42.201.101])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 74CF7EBD;
 Mon, 27 Oct 2014 16:31:53 +0000 (UTC)
Received: from pps.filterd (pp2.rice.edu [127.0.0.1])
 by pp2.rice.edu (8.14.5/8.14.5) with SMTP id s9RGRF9k011035;
 Mon, 27 Oct 2014 11:31:51 -0500
Received: from mh3.mail.rice.edu (mh3.mail.rice.edu [128.42.199.10])
 by pp2.rice.edu with ESMTP id 1q7yw40t9v-1;
 Mon, 27 Oct 2014 11:31:51 -0500
X-Virus-Scanned: by amavis-2.7.0 at mh3.mail.rice.edu, auth channel
Received: from 108-254-203-201.lightspeed.hstntx.sbcglobal.net
 (108-254-203-201.lightspeed.hstntx.sbcglobal.net [108.254.203.201])
 (using TLSv1 with cipher RC4-MD5 (128/128 bits))
 (No client certificate requested) (Authenticated sender: alc)
 by mh3.mail.rice.edu (Postfix) with ESMTPSA id CE2CA403FC;
 Mon, 27 Oct 2014 11:31:50 -0500 (CDT)
Message-ID: <544E7376.6040002@rice.edu>
Date: Mon, 27 Oct 2014 11:31:50 -0500
From: Alan Cox <alc@rice.edu>
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Svatopluk Kraus <onwahe@gmail.com>
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
References: <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>	<CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>	<CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>	<5428AF3B.1030906@rice.edu>	<CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>	<54497DC1.5070506@rice.edu>	<CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>	<544DED4C.3010501@rice.edu>
 <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
In-Reply-To: <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
X-Enigmail-Version: 1.6
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0
 suspectscore=11
 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx
 scancount=1 engine=7.0.1-1402240000 definitions=main-1410270157
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 16:31:54 -0000

On 10/27/2014 08:22, Svatopluk Kraus wrote:
>
> On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc@rice.edu> wrote:
>
>     On 10/24/2014 06:33, Svatopluk Kraus wrote:
>>
>>     On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:
>>
>>         On 10/08/2014 10:38, Svatopluk Kraus wrote:
>>         > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>>         >
>>         >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>>         >>
>>         >>
>>         >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>>         >>
>>         >>>
>>         >>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com> wrote:
>>         >>>
>>         >>>> Hi,
>>         >>>>
>>         >>>> I and Michal are finishing new ARM pmap-v6 code. There
>>         is one problem
>>         >>>> we've
>>         >>>> dealt with somehow, but now we would like to do it
>>         better. It's about
>>         >>>> physical pages which are allocated before vm subsystem
>>         is initialized.
>>         >>>> While later on these pages could be found in
>>         vm_page_array when
>>         >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for
>>         >>>> VM_PHYSSEG_SPARSE
>>         >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model.
>>         >>>>
>>         >>>> It really would be nice to utilize vm_page_array for
>>         such preallocated
>>         >>>> physical pages even when VM_PHYSSEG_SPARSE memory model
>>         is used. Things
>>         >>>> could be much easier then. In our case, it's about pages
>>         which are used
>>         >>>> for
>>         >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have
>>         two sets of such
>>         >>>> pages. First ones are preallocated and second ones are
>>         allocated after vm
>>         >>>> subsystem was inited. We must deal with each set
>>         differently. So code is
>>         >>>> more complex and so is debugging.
>>         >>>>
>>         >>>> Thus we need some method how to say that some part of
>>         physical memory
>>         >>>> should be included in vm_page_array, but the pages from
>>         that region
>>         >>>> should
>>         >>>> not be put to free list during initialization. We think
>>         that such
>>         >>>> possibility could be utilized in general. There could be
>>         a need for some
>>         >>>> physical space which:
>>         >>>>
>>         >>>> (1) is needed only during boot and later on it can be
>>         freed and put to vm
>>         >>>> subsystem,
>>         >>>>
>>         >>>> (2) is needed for something else and vm_page_array code
>>         could be used
>>         >>>> without some kind of its duplication.
>>         >>>>
>>         >>>> There is already some code which deals with blacklisted
>>         pages in
>>         >>>> vm_page.c
>>         >>>> file. So the easiest way how to deal with presented
>>         situation is to add
>>         >>>> some callback to this part of code which will be able to
>>         either exclude
>>         >>>> whole phys_avail[i], phys_avail[i+1] region or single
>>         pages. As the
>>         >>>> biggest
>>         >>>> phys_avail region is used for vm subsystem allocations,
>>         there should be
>>         >>>> some more coding. (However, blacklisted pages are not
>>         dealt with on that
>>         >>>> part of region.)
>>         >>>>
>>         >>>> We would like to know if there is any objection:
>>         >>>>
>>         >>>> (1) to deal with presented problem,
>>         >>>> (2) to deal with the problem presented way.
>>         >>>> Some help is very appreciated. Thanks
>>         >>>>
>>         >>>>
>>         >>> As an experiment, try modifying vm_phys.c to use
>>         dump_avail instead of
>>         >>> phys_avail when sizing vm_page_array.  On amd64, where
>>         the same problem
>>         >>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right
>>         now, this is
>>         >>> probably my preferred solution.  The catch being that not
>>         all architectures
>>         >>> implement dump_avail, but my recollection is that arm does.
>>         >>>
>>         >> Frankly, I would prefer this too, but there is one big
>>         open question:
>>         >>
>>         >> What is dump_avail for?
>>         >>
>>         >>
>>         >>
>>         >> dump_avail[] is solving a similar problem in the minidump
>>         code, hence, the
>>         >> prefix "dump_" in its name.  In other words, the minidump
>>         code couldn't use
>>         >> phys_avail[] either because it didn't describe the full
>>         range of physical
>>         >> addresses that might be included in a minidump, so
>>         dump_avail[] was created.
>>         >>
>>         >> There is already precedent for what I'm suggesting. 
>>         dump_avail[] is
>>         >> already (ab)used outside of the minidump code on x86 to
>>         solve this same
>>         >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>>         >>
>>         >>
>>         >>  Using it for vm_page_array initialization and
>>         segmentation means that
>>         >> phys_avail must be a subset of it. And this must be stated
>>         and be visible
>>         >> enough. Maybe it should be even checked in code. I like
>>         the idea of
>>         >> thinking about dump_avail as something that describes all
>>         memory in a
>>         >> system, but it's not how dump_avail is defined in archs now.
>>         >>
>>         >>
>>         >>
>>         >> When you say "it's not how dump_avail is defined in archs
>>         now", I'm not
>>         >> sure whether you're talking about the code or the
>>         comments.  In terms of
>>         >> code, dump_avail[] is a superset of phys_avail[], and I'm
>>         not aware of any
>>         >> code that would have to change.  In terms of comments, I
>>         did a grep looking
>>         >> for comments defining what dump_avail[] is, because I
>>         couldn't remember
>>         >> any.  I found one ... on arm.  So, I don't think it's an
>>         onerous task
>>         >> changing the definition of dump_avail[].  :-)
>>         >>
>>         >> Already, as things stand today with dump_avail[] being
>>         used outside of the
>>         >> minidump code, one could reasonably argue that it should
>>         be renamed to
>>         >> something like phys_exists[].
>>         >>
>>         >>
>>         >>
>>         >> I will experiment with it on monday then. However, it's
>>         not only about how
>>         >> memory segments are created in vm_phys.c, but it's about
>>         how vm_page_array
>>         >> size is computed in vm_page.c too.
>>         >>
>>         >>
>>         >>
>>         >> Yes, and there is also a place in vm_reserv.c that needs
>>         to change.   I've
>>         >> attached the patch that I developed and tested a long time
>>         ago.  It still
>>         >> applies cleanly and runs ok on amd64.
>>         >>
>>         >>
>>         >>
>>         >
>>         >
>>         > Well, I've created and tested a minimalistic patch which - I hope -
>>         > is committable. It runs ok on pandaboard (arm-v6) and solves the
>>         > presented problem. I would really appreciate it if this were
>>         > committed. Thanks.
>>
>>
>>         Sorry for the slow reply.  I've just been swamped with work
>>         lately.  I
>>         finally had some time to look at this in the last day or so.
>>
>>         The first thing that I propose to do is commit the attached
>>         patch.  This
>>         patch changes pmap_init() on amd64, armv6, and i386 so that
>>         it no longer
>>         consults phys_avail[] to determine the end of memory. 
>>         Instead, it calls
>>         a new function provided by vm_phys.c to obtain the same
>>         information from
>>         vm_phys_segs[].
>>
>>         With this change, the new variable phys_managed in your patch
>>         wouldn't
>>         need to be a global.  It could be a local variable in
>>         vm_page_startup()
>>         that we pass as a parameter to vm_phys_init() and
>>         vm_reserv_init().
>>
>>         More generally, the long-term vision that I have is that we
>>         would stop
>>         using phys_avail[] after vm_page_startup() had completed.  It
>>         would only
>>         be used during initialization.  After that we would use
>>         vm_phys_segs[]
>>         and functions provided by vm_phys.c.
>>
>>      
>>     I understand. The patch and the long-term vision are fine with me.
>>     I just was not bold enough to pass phys_managed as a parameter to
>>     vm_phys_init() and vm_reserv_init(). However, I certainly was
>>     thinking about it. Reading the comment above vm_phys_get_end(), I
>>     wonder: do we care if the last usable address is 0xFFFFFFFF?
>
>
>     To date, this hasn't been a problem.  However, handling 0xFFFFFFFF
>     is easy.  So, the final version of the patch that I committed this
>     weekend does so.
>
>     Can you please try the attached patch?  It replaces phys_avail[]
>     with vm_phys_segs[] in arm's busdma.
>
>  
>  
> It works fine on arm-v6 pandaboard. I have no objection to committing it.
> However, it's only a 1:1 replacement.


Right now, yes.  However, once your patch is committed, it won't be 1:1
anymore, because vm_phys_segs[] will be populated based on dump_avail[]
rather than phys_avail[].

My interpretation of the affected code is that using the ranges defined
by dump_avail[] is actually closer to what this code intended.
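
Populating vm_phys_segs[] from dump_avail[] relies on phys_avail[] being
a subset of dump_avail[], as noted earlier in the thread.  A rough
sketch of how that could be checked follows.  The helper names and the
paddr_t typedef are hypothetical; the only thing taken from the patch is
the array layout, i.e. start/end pairs terminated by an entry whose end
is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;	/* stand-in for the kernel's vm_paddr_t */

/* Is [start, end) inside a single range of the zero-terminated list? */
static bool
range_covered(const paddr_t *regions, paddr_t start, paddr_t end)
{
	int i;

	assert(start < end);
	for (i = 0; regions[i + 1] != 0; i += 2)
		if (start >= regions[i] && end <= regions[i + 1])
			return (true);
	return (false);
}

/* Does every range in 'inner' fall within some range in 'outer'? */
static bool
regions_subset(const paddr_t *inner, const paddr_t *outer)
{
	int i;

	for (i = 0; inner[i + 1] != 0; i += 2)
		if (!range_covered(outer, inner[i], inner[i + 1]))
			return (false);
	return (true);
}
```

This sketch treats an inner range that straddles two adjacent outer
ranges as not covered; whether that case should count is a design
choice left open here.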


> In fact, I still keep the following pattern in my head:
>  
> present memory in system <=> all RAM and whatsoever
> nobounce memory <=> addressable by DMA


In general, I don't see how this can be an attribute of the memory,
because it's going to depend on the device.  In other words, a given
physical address may require bouncing for some device but not all devices.
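
To make that concrete, here is a small sketch in which the bounce
decision is a function of the device's constraint as well as of the
address.  struct dma_limits and needs_bounce() are hypothetical and much
simpler than the real busdma tag machinery:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device DMA constraint; not the real busdma tag. */
struct dma_limits {
	uint64_t max_addr;	/* highest address the device can reach */
};

/* Must the buffer [paddr, paddr + len) be bounced for this device? */
static bool
needs_bounce(const struct dma_limits *dev, uint64_t paddr, uint64_t len)
{
	assert(len > 0);
	return (paddr + len - 1 > dev->max_addr);
}
```

With a 24-bit limit for one device and a 32-bit limit for another, the
same page at 0x2000000 needs bouncing for the first device but not for
the second.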


> managed memory by vm subsystem  <=> i.e. kept in vm_page_array
> available memory for vm subsystem <=> can be allocated
>  
> So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it
> could be too limiting in some scenarios. I would like to see
> something different in exclusion_bounce_check() in the future:
> something that reflects the NOBOUNCE property rather than the NOALLOC
> one, as it does now.
>  
>  
>
>
>
>
>>     Do you think that the rest of my patch considering changes due to
>>     your patch is ok?
>>      
>
>
>     Basically, yes.  I do, however, think that
>
>     +#if defined(__arm__)
>     +       phys_managed = dump_avail;
>     +#else
>     +       phys_managed = phys_avail;
>     +#endif
>
>     should also be conditioned on VM_PHYSSEG_SPARSE.
>
>  
>  
>  
> So I've prepared a new patch. phys_managed[] is passed to vm_phys_init()
> and vm_reserv_init() as a parameter and a small optimization is made
> in vm_page_startup(). I added the VM_PHYSSEG_SPARSE condition to the
> place you mentioned. Anyhow, I still think that this is only a temporary
> hack. In general, phys_managed[] should always be distinguished from
> phys_avail[].
>  
>  
>
>>      
>>
>>         >
>>         > BTW, while I was inspecting all archs, I think that maybe
>>         it's time to do
>>         > what was done for busdma not long ago. There are many
>>         similar codes across
>>         > archs which deal with physical memory and could be
>>         generalized and put to
>>         > kern/subr_physmem.c for utilization. All work with physical
>>         memory could be
>>         > simplified to two arrays of regions.
>>         >
>>         > phys_present[] ... describes all present physical memory
>>         regions
>>         > phys_exclude[] ... describes various exclusions from
>>         phys_present[]
>>         >
>>         > Each excluded region will be labeled by flags to say what
>>         kind of exclusion
>>         > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE,
>>         NOMEMRW  could
>>         > be combined. This idea is taken from sys/arm/arm/physmem.c.
>>         >
>>         > All other arrays like phys_managed[], phys_avail[],
>>         dump_avail[] will be
>>         > created from these phys_present[] and phys_exclude[].
>>         > This way bootstrap codes in archs could be simplified and
>>         unified. For
>>         > example, dealing with either hw.physmem or page with PA
>>         0x00000000 could be
>>         > transparent.
>>         >
>>         > I'm prepared to volunteer if the idea is ripe. However, I
>>         > would need a mentor.
>>
>>
>>         I've never really looked at arm/arm/physmem.c before.  Let me
>>         do that
>>         before I comment on this.
>>
>>     No problem. This could be a long-term aim. However, I hope the
>>     VM_PHYSSEG_SPARSE problem can be dealt with in MI code at the
>>     present time. In any case, thanks for your help.
>>      
>>      
>
>


From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 16:56:16 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 69D2DFE0;
 Mon, 27 Oct 2014 16:56:16 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A8DA526D;
 Mon, 27 Oct 2014 16:56:15 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9RGtvOc032843
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Mon, 27 Oct 2014 18:55:57 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9RGtvOc032843
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s9RGtv8G032839;
 Mon, 27 Oct 2014 18:55:57 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Mon, 27 Oct 2014 18:55:57 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: RfC: fueword(9) and casueword(9)
Message-ID: <20141027165557.GC1877@kib.kiev.ua>
References: <20141021094539.GA1877@kib.kiev.ua>
 <20141022002825.H2080@besplex.bde.org>
 <20141021162306.GE1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2048849.GkvWliFbyg@ralph.baldwin.cx>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED, FREEMAIL_FROM, NML_ADSP_CUSTOM_MED,
 T_FILL_THIS_FORM_SHORT
 autolearn=no autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 16:56:16 -0000

On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote:
> On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
> > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
> > > A new API should try to fix these __DEVOLATILE() abominations.  I think it
> > > is safe, and even correct, to declare the pointers as volatile const void
> > > *, since the functions really can handle volatile data, unlike copyin().
> > > 
> > > Atomic op functions are declared as taking pointers to volatile for
> > > similar reasons.  Often they are applied to non-volatile data, but
> > > adding a qualifier is type-safe and doesn't cost efficiency since the
> > > pointer access is not known to the compiler.  (The last point is not
> > > so clear -- the compiler can see things in the functions since they are
> > > inline asm.  fueword() isn't inline so its (in)efficiency is not changed.)
> > > 
> > > The atomic read functions are not declared as taking pointers to const.
> > > The __DECONST() abomination might be used to work around this bug.
> > 
> > I prefer not to complicate the fetch(9) KPI due to the mistakes in the
> > umtx structure definitions.  I think that it is a bug to mark the lock
> > words with volatile.  I want the fueword(9) interface to be as similar
> > to fuword(9) as possible; in particular, volatile seems unnecessary.
> 
> I agree with Bruce here.  casuword() already accepts volatile.  I also
> think umtx is correct in marking the field as volatile.  They are subject
> to change without the compiler's knowledge albeit by other threads
> rather than signal handlers.  Having them marked volatile doesn't really
> matter for the kernel, but the header is also used in userland and is
> relevant in sem_new.c, etc.

Do you agree with making fueword() accept volatile const void * as the
address?  Or do you agree with the existence of the volatile type
qualifier on the lock field of the umtx structures?

I definitely do not want to make fueword() different from fuword() in
this respect.  If both fueword() and fuword() are changed to take a
volatile const * address, that should be a separate patch, if only
because the existing changes to kern_umtx.c are already complicated:
they touch very delicate logic, and I do not want to add unrelated and
separable modifications to something which I expect to require
more debugging in the wild.

Below is the current version, which passed Peter's stress2 load on x86.
I also did smoke-testing on powerpc64.  Once make tinderbox finishes
successfully for the patch, I will consider the change ready.
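
For readers who want the motivation in miniature, here is a userland
sketch (not part of the patch) of why the value-returning interface is
ambiguous and the error-returning one is not.  Everything in it is a
hypothetical stand-in: fake_user_mem plays the role of user memory,
addr_ok() plays the role of the fault handler, and sim_fuword32() and
sim_fueword32() only mimic the calling conventions of fuword32() and
fueword32(); none of this is the kernel code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for user memory; slot 1 legitimately holds -1. */
static int32_t fake_user_mem[4] = { 7, -1, 42, 0 };

/* Hypothetical "would this access fault?" check; assumption, not kernel logic. */
static int
addr_ok(const void *base)
{
	uintptr_t p = (uintptr_t)base;
	uintptr_t lo = (uintptr_t)fake_user_mem;
	uintptr_t hi = lo + sizeof(fake_user_mem);

	return (p >= lo && p <= hi - sizeof(int32_t));
}

/* fuword32()-style: the value and the error share the return value. */
static int32_t
sim_fuword32(const void *base)
{
	int32_t v;

	if (!addr_ok(base))
		return (-1);		/* failure looks like a stored -1 */
	memcpy(&v, base, sizeof(v));
	return (v);
}

/* fueword32()-style: 0 or -1 status, the value goes through *val. */
static int
sim_fueword32(const void *base, int32_t *val)
{
	if (!addr_ok(base))
		return (-1);
	memcpy(val, base, sizeof(*val));
	return (0);
}
```

Calling sim_fuword32() on &fake_user_mem[1] and on an address outside
the array both yield -1, so the caller cannot tell them apart, while
sim_fueword32() returns 0 with *val == -1 in the first case and -1 in
the second.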

diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
index bc21dc6..fb63e78 100644
--- a/share/man/man9/Makefile
+++ b/share/man/man9/Makefile
@@ -581,6 +581,9 @@ MLINKS+=condvar.9 cv_broadcast.9 \
 MLINKS+=config_intrhook.9 config_intrhook_disestablish.9 \
 	config_intrhook.9 config_intrhook_establish.9
 MLINKS+=contigmalloc.9 contigfree.9
+MLINKS+=casuword.9 casueword.9 \
+	casuword.9 casueword32.9 \
+	casuword.9 casuword32.9
 MLINKS+=copy.9 copyin.9 \
 	copy.9 copyin_nofault.9 \
 	copy.9 copyinstr.9 \
@@ -688,7 +691,10 @@ MLINKS+=fetch.9 fubyte.9 \
 	fetch.9 fuword.9 \
 	fetch.9 fuword16.9 \
 	fetch.9 fuword32.9 \
-	fetch.9 fuword64.9
+	fetch.9 fuword64.9 \
+	fetch.9 fueword.9 \
+	fetch.9 fueword32.9 \
+	fetch.9 fueword64.9
 MLINKS+=firmware.9 firmware_get.9 \
 	firmware.9 firmware_put.9 \
 	firmware.9 firmware_register.9 \
diff --git a/share/man/man9/casuword.9 b/share/man/man9/casuword.9
new file mode 100644
index 0000000..34a0f1d
--- /dev/null
+++ b/share/man/man9/casuword.9
@@ -0,0 +1,95 @@
+.\" Copyright (c) 2014 The FreeBSD Foundation
+.\" All rights reserved.
+.\"
+.\" Part of this documentation was written by
+.\" Konstantin Belousov <kib@FreeBSD.org> under sponsorship
+.\" from the FreeBSD Foundation.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd October 21, 2014
+.Dt CASU 9
+.Os
+.Sh NAME
+.Nm casueword ,
+.Nm casueword32 ,
+.Nm casuword ,
+.Nm casuword32
+.Nd fetch, compare and store data from user-space
+.Sh SYNOPSIS
+.In sys/types.h
+.In sys/systm.h
+.Ft int
+.Fn casueword "volatile u_long *base" "u_long oldval" "u_long *oldvalp" "u_long newval"
+.Ft int
+.Fn casueword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t *oldvalp" "uint32_t newval"
+.Ft u_long
+.Fn casuword "volatile u_long *base" "u_long oldval" "u_long newval"
+.Ft uint32_t
+.Fn casuword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t newval"
+.Sh DESCRIPTION
+The
+.Nm
+functions are designed to perform an atomic compare-and-swap operation on
+the value in the usermode memory of the current process.
+.Pp
+The
+.Nm
+routines read the value from user memory at address
+.Pa base ,
+and compare the value read with
+.Pa oldval .
+If the values are equal,
+.Pa newval
+is written to
+.Pa *base .
+In the case of
+.Fn casueword32
+and
+.Fn casueword ,
+the old value is stored into the (kernel-mode) variable pointed to by
+.Pa oldvalp .
+The userspace value must be naturally aligned.
+.Pp
+The callers of
+.Fn casuword
+and
+.Fn casuword32
+functions cannot distinguish between -1 read from
+userspace and function failure.
+.Sh RETURN VALUES
+The
+.Fn casuword
+and
+.Fn casuword32
+functions return the data fetched or -1 on failure.
+The
+.Fn casueword
+and
+.Fn casueword32
+functions return 0 on success and -1 on failure.
+.Sh SEE ALSO
+.Xr atomic 9 ,
+.Xr fetch 9 ,
+.Xr store 9
diff --git a/share/man/man9/fetch.9 b/share/man/man9/fetch.9
index ccf6866..7e13cbc 100644
--- a/share/man/man9/fetch.9
+++ b/share/man/man9/fetch.9
@@ -34,7 +34,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd October 5, 2009
+.Dd October 21, 2014
 .Dt FETCH 9
 .Os
 .Sh NAME
@@ -44,11 +44,13 @@
 .Nm fuword ,
 .Nm fuword16 ,
 .Nm fuword32 ,
-.Nm fuword64
+.Nm fuword64 ,
+.Nm fueword ,
+.Nm fueword32 ,
+.Nm fueword64
 .Nd fetch data from user-space
 .Sh SYNOPSIS
 .In sys/types.h
-.In sys/time.h
 .In sys/systm.h
 .Ft int
 .Fn fubyte "const void *base"
@@ -60,27 +62,38 @@
 .Fn fuword32 "const void *base"
 .Ft int64_t
 .Fn fuword64 "const void *base"
+.Ft int
+.Fn fueword "const void *base" "long *val"
+.Ft int
+.Fn fueword32 "const void *base" "int32_t *val"
+.Ft int
+.Fn fueword64 "const void *base" "int64_t *val"
 .In sys/resourcevar.h
 .Ft int
 .Fn fuswintr "void *base"
 .Sh DESCRIPTION
 The
 .Nm
-functions are designed to copy small amounts of data from user-space.
+functions are designed to copy small amounts of data from user-space
+of the current process.
+If the read is successful, it is performed atomically.
+The data read must be naturally aligned.
 .Pp
 The
 .Nm
 routines provide the following functionality:
-.Bl -tag -width "fuswintr()"
+.Bl -tag -width "fueword32()"
 .It Fn fubyte
 Fetches a byte of data from the user-space address
 .Pa base .
+The byte read is zero-extended into the results variable.
 .It Fn fuword
-Fetches a word of data from the user-space address
+Fetches a word of data (long) from the user-space address
 .Pa base .
 .It Fn fuword16
 Fetches 16 bits of data from the user-space address
 .Pa base .
+The half-word read is zero-extended into the results variable.
 .It Fn fuword32
 Fetches 32 bits of data from the user-space address
 .Pa base .
@@ -91,11 +104,46 @@ Fetches 64 bits of data from the user-space address
 Fetches a short word of data from the user-space address
 .Pa base .
 This function is safe to call during an interrupt context.
+.It Fn fueword
+Fetches a word of data (long) from the user-space address
+.Pa base
+and stores the result in the variable pointed to by
+.Pa val .
+.It Fn fueword32
+Fetches 32 bits of data from the user-space address
+.Pa base
+and stores the result in the variable pointed to by
+.Pa val .
+.It Fn fueword64
+Fetches 64 bits of data from the user-space address
+.Pa base
+and stores the result in the variable pointed to by
+.Pa val .
 .El
+.Pp
+The callers of
+.Fn fuword ,
+.Fn fuword32
+and
+.Fn fuword64
+functions cannot distinguish between -1 read from
+userspace and function failure.
 .Sh RETURN VALUES
 The
-.Nm
+.Fn fubyte ,
+.Fn fuword ,
+.Fn fuword16 ,
+.Fn fuword32 ,
+.Fn fuword64 ,
+and
+.Fn fuswintr
 functions return the data fetched or -1 on failure.
+The
+.Fn fueword ,
+.Fn fueword32
+and
+.Fn fueword64
+functions return 0 on success and -1 on failure.
 .Sh SEE ALSO
 .Xr copy 9 ,
 .Xr store 9
diff --git a/sys/amd64/amd64/support.S b/sys/amd64/amd64/support.S
index 4897367..50e653d 100644
--- a/sys/amd64/amd64/support.S
+++ b/sys/amd64/amd64/support.S
@@ -312,12 +312,13 @@ copyin_fault:
 END(copyin)
 
 /*
- * casuword32.  Compare and set user integer.  Returns -1 or the current value.
- *        dst = %rdi, old = %rsi, new = %rdx
+ * casueword32.  Compare and set user integer.  Returns -1 on fault,
+ *        0 if access was successful.  Old value is written to *oldp.
+ *        dst = %rdi, old = %esi, oldp = %rdx, new = %ecx
  */
-ENTRY(casuword32)
-	movq	PCPU(CURPCB),%rcx
-	movq	$fusufault,PCB_ONFAULT(%rcx)
+ENTRY(casueword32)
+	movq	PCPU(CURPCB),%r8
+	movq	$fusufault,PCB_ONFAULT(%r8)
 
 	movq	$VM_MAXUSER_ADDRESS-4,%rax
 	cmpq	%rax,%rdi			/* verify address is valid */
@@ -327,26 +328,34 @@ ENTRY(casuword32)
 #ifdef SMP
 	lock
 #endif
-	cmpxchgl %edx,(%rdi)			/* new = %edx */
+	cmpxchgl %ecx,(%rdi)			/* new = %ecx */
 
 	/*
 	 * The old value is in %eax.  If the store succeeded it will be the
 	 * value we expected (old) from before the store, otherwise it will
-	 * be the current value.
+	 * be the current value.  Save %eax into %esi to prepare the return
+	 * value.
 	 */
+	movl	%eax,%esi
+	xorl	%eax,%eax
+	movq	%rax,PCB_ONFAULT(%r8)
 
-	movq	PCPU(CURPCB),%rcx
-	movq	$0,PCB_ONFAULT(%rcx)
+	/*
+	 * Access the oldp after the pcb_onfault is cleared, to correctly
+	 * catch corrupted pointer.
+	 */
+	movl	%esi,(%rdx)			/* oldp = %rdx */
 	ret
-END(casuword32)
+END(casueword32)
 
 /*
- * casuword.  Compare and set user word.  Returns -1 or the current value.
- *        dst = %rdi, old = %rsi, new = %rdx
+ * casueword.  Compare and set user long.  Returns -1 on fault,
+ *        0 if access was successful.  Old value is written to *oldp.
+ *        dst = %rdi, old = %rsi, oldp = %rdx, new = %rcx
  */
-ENTRY(casuword)
-	movq	PCPU(CURPCB),%rcx
-	movq	$fusufault,PCB_ONFAULT(%rcx)
+ENTRY(casueword)
+	movq	PCPU(CURPCB),%r8
+	movq	$fusufault,PCB_ONFAULT(%r8)
 
 	movq	$VM_MAXUSER_ADDRESS-4,%rax
 	cmpq	%rax,%rdi			/* verify address is valid */
@@ -356,28 +365,28 @@ ENTRY(casuword)
 #ifdef SMP
 	lock
 #endif
-	cmpxchgq %rdx,(%rdi)			/* new = %rdx */
+	cmpxchgq %rcx,(%rdi)			/* new = %rcx */
 
 	/*
-	 * The old value is in %eax.  If the store succeeded it will be the
+	 * The old value is in %rax.  If the store succeeded it will be the
 	 * value we expected (old) from before the store, otherwise it will
 	 * be the current value.
 	 */
-
-	movq	PCPU(CURPCB),%rcx
-	movq	$fusufault,PCB_ONFAULT(%rcx)
-	movq	$0,PCB_ONFAULT(%rcx)
+	movq	%rax,%rsi
+	xorl	%eax,%eax
+	movq	%rax,PCB_ONFAULT(%r8)
+	movq	%rsi,(%rdx)
 	ret
-END(casuword)
+END(casueword)
 
 /*
  * Fetch (load) a 64-bit word, a 32-bit word, a 16-bit word, or an 8-bit
- * byte from user memory.  All these functions are MPSAFE.
- * addr = %rdi
+ * byte from user memory.
+ * addr = %rdi, valp = %rsi
  */
 
-ALTENTRY(fuword64)
-ENTRY(fuword)
+ALTENTRY(fueword64)
+ENTRY(fueword)
 	movq	PCPU(CURPCB),%rcx
 	movq	$fusufault,PCB_ONFAULT(%rcx)
 
@@ -385,13 +394,15 @@ ENTRY(fuword)
 	cmpq	%rax,%rdi			/* verify address is valid */
 	ja	fusufault
 
-	movq	(%rdi),%rax
-	movq	$0,PCB_ONFAULT(%rcx)
+	xorl	%eax,%eax
+	movq	(%rdi),%r11
+	movq	%rax,PCB_ONFAULT(%rcx)
+	movq	%r11,(%rsi)
 	ret
-END(fuword64)
-END(fuword)
+END(fueword64)
+END(fueword)
 
-ENTRY(fuword32)
+ENTRY(fueword32)
 	movq	PCPU(CURPCB),%rcx
 	movq	$fusufault,PCB_ONFAULT(%rcx)
 
@@ -399,10 +410,12 @@ ENTRY(fuword32)
 	cmpq	%rax,%rdi			/* verify address is valid */
 	ja	fusufault
 
-	movl	(%rdi),%eax
-	movq	$0,PCB_ONFAULT(%rcx)
+	xorl	%eax,%eax
+	movl	(%rdi),%r11d
+	movq	%rax,PCB_ONFAULT(%rcx)
+	movl	%r11d,(%rsi)
 	ret
-END(fuword32)
+END(fueword32)
 
 /*
  * fuswintr() and suswintr() are specialized variants of fuword16() and
diff --git a/sys/amd64/ia32/ia32_syscall.c b/sys/amd64/ia32/ia32_syscall.c
index 0cdec6f..92249f9 100644
--- a/sys/amd64/ia32/ia32_syscall.c
+++ b/sys/amd64/ia32/ia32_syscall.c
@@ -110,7 +110,7 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa)
 	struct proc *p;
 	struct trapframe *frame;
 	caddr_t params;
-	u_int32_t args[8];
+	u_int32_t args[8], tmp;
 	int error, i;
 
 	p = td->td_proc;
@@ -126,7 +126,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa)
 		/*
 		 * Code is first argument, followed by actual args.
 		 */
-		sa->code = fuword32(params);
+		error = fueword32(params, &tmp);
+		if (error == -1)
+			return (EFAULT);
+		sa->code = tmp;
 		params += sizeof(int);
 	} else if (sa->code == SYS___syscall) {
 		/*
@@ -135,7 +138,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa)
 		 * We use a 32-bit fetch in case params is not
 		 * aligned.
 		 */
-		sa->code = fuword32(params);
+		error = fueword32(params, &tmp);
+		if (error == -1)
+			return (EFAULT);
+		sa->code = tmp;
 		params += sizeof(quad_t);
 	}
  	if (p->p_sysent->sv_mask)
diff --git a/sys/arm/include/param.h b/sys/arm/include/param.h
index 4a64607..6267154 100644
--- a/sys/arm/include/param.h
+++ b/sys/arm/include/param.h
@@ -149,4 +149,8 @@
 
 #define	pgtok(x)		((x) * (PAGE_SIZE / 1024))
 
+#ifdef _KERNEL
+#define	NO_FUEWORD	1
+#endif
+
 #endif /* !_ARM_INCLUDE_PARAM_H_ */
diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c
index 8ec949f..5ea062e 100644
--- a/sys/compat/freebsd32/freebsd32_misc.c
+++ b/sys/compat/freebsd32/freebsd32_misc.c
@@ -1832,16 +1832,21 @@ freebsd32_sysctl(struct thread *td, struct freebsd32_sysctl_args *uap)
 {
 	int error, name[CTL_MAXNAME];
 	size_t j, oldlen;
+	uint32_t tmp;
 
 	if (uap->namelen > CTL_MAXNAME || uap->namelen < 2)
 		return (EINVAL);
  	error = copyin(uap->name, name, uap->namelen * sizeof(int));
  	if (error)
 		return (error);
-	if (uap->oldlenp)
-		oldlen = fuword32(uap->oldlenp);
-	else
+	if (uap->oldlenp) {
+		error = fueword32(uap->oldlenp, &tmp);
+		oldlen = tmp;
+	} else {
 		oldlen = 0;
+	}
+	if (error != 0)
+		return (EFAULT);
 	error = userland_sysctl(td, name, uap->namelen,
 		uap->old, &oldlen, 1,
 		uap->new, uap->newlen, &j, SCTL_MASK32);
diff --git a/sys/i386/i386/support.s b/sys/i386/i386/support.s
index c126f78..0a08012 100644
--- a/sys/i386/i386/support.s
+++ b/sys/i386/i386/support.s
@@ -389,16 +389,16 @@ copyin_fault:
 	ret
 
 /*
- * casuword.  Compare and set user word.  Returns -1 or the current value.
+ * casueword.  Compare and set user word.  Returns -1 on fault,
+ * 0 on non-faulting access.  The current value is in *oldp.
  */
-
-ALTENTRY(casuword32)
-ENTRY(casuword)
+ALTENTRY(casueword32)
+ENTRY(casueword)
 	movl	PCPU(CURPCB),%ecx
 	movl	$fusufault,PCB_ONFAULT(%ecx)
 	movl	4(%esp),%edx			/* dst */
 	movl	8(%esp),%eax			/* old */
-	movl	12(%esp),%ecx			/* new */
+	movl	16(%esp),%ecx			/* new */
 
 	cmpl	$VM_MAXUSER_ADDRESS-4,%edx	/* verify address is valid */
 	ja	fusufault
@@ -416,17 +416,20 @@ ENTRY(casuword)
 
 	movl	PCPU(CURPCB),%ecx
 	movl	$0,PCB_ONFAULT(%ecx)
+	movl	12(%esp),%edx			/* oldp */
+	movl	%eax,(%edx)
+	xorl	%eax,%eax
 	ret
-END(casuword32)
-END(casuword)
+END(casueword32)
+END(casueword)
 
 /*
  * Fetch (load) a 32-bit word, a 16-bit word, or an 8-bit byte from user
- * memory.  All these functions are MPSAFE.
+ * memory.
  */
 
-ALTENTRY(fuword32)
-ENTRY(fuword)
+ALTENTRY(fueword32)
+ENTRY(fueword)
 	movl	PCPU(CURPCB),%ecx
 	movl	$fusufault,PCB_ONFAULT(%ecx)
 	movl	4(%esp),%edx			/* from */
@@ -436,9 +439,12 @@ ENTRY(fuword)
 
 	movl	(%edx),%eax
 	movl	$0,PCB_ONFAULT(%ecx)
+	movl	8(%esp),%edx
+	movl	%eax,(%edx)
+	xorl	%eax,%eax
 	ret
-END(fuword32)
-END(fuword)
+END(fueword32)
+END(fueword)
 
 /*
  * fuswintr() and suswintr() are specialized variants of fuword16() and
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 1d0d104..84d6ec3 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -1059,6 +1059,7 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa)
 	struct proc *p;
 	struct trapframe *frame;
 	caddr_t params;
+	long tmp;
 	int error;
 
 	p = td->td_proc;
@@ -1074,14 +1075,20 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa)
 		/*
 		 * Code is first argument, followed by actual args.
 		 */
-		sa->code = fuword(params);
+		error = fueword(params, &tmp);
+		if (error == -1)
+			return (EFAULT);
+		sa->code = tmp;
 		params += sizeof(int);
 	} else if (sa->code == SYS___syscall) {
 		/*
 		 * Like syscall, but code is a quad, so as to maintain
 		 * quad alignment for the rest of the arguments.
 		 */
-		sa->code = fuword(params);
+		error = fueword(params, &tmp);
+		if (error == -1)
+			return (EFAULT);
+		sa->code = tmp;
 		params += sizeof(quad_t);
 	}
 
diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c
index 09212c8..45d4c6f 100644
--- a/sys/kern/kern_exec.c
+++ b/sys/kern/kern_exec.c
@@ -1091,7 +1091,7 @@ int
 exec_copyin_args(struct image_args *args, char *fname,
     enum uio_seg segflg, char **argv, char **envv)
 {
-	char *argp, *envp;
+	u_long argp, envp;
 	int error;
 	size_t length;
 
@@ -1127,13 +1127,17 @@ exec_copyin_args(struct image_args *args, char *fname,
 	/*
 	 * extract arguments first
 	 */
-	while ((argp = (caddr_t) (intptr_t) fuword(argv++))) {
-		if (argp == (caddr_t) -1) {
+	for (;;) {
+		error = fueword(argv++, &argp);
+		if (error == -1) {
 			error = EFAULT;
 			goto err_exit;
 		}
-		if ((error = copyinstr(argp, args->endp,
-		    args->stringspace, &length))) {
+		if (argp == 0)
+			break;
+		error = copyinstr((void *)(uintptr_t)argp, args->endp,
+		    args->stringspace, &length);
+		if (error != 0) {
 			if (error == ENAMETOOLONG) 
 				error = E2BIG;
 			goto err_exit;
@@ -1149,13 +1153,17 @@ exec_copyin_args(struct image_args *args, char *fname,
 	 * extract environment strings
 	 */
 	if (envv) {
-		while ((envp = (caddr_t)(intptr_t)fuword(envv++))) {
-			if (envp == (caddr_t)-1) {
+		for (;;) {
+			error = fueword(envv++, &envp);
+			if (error == -1) {
 				error = EFAULT;
 				goto err_exit;
 			}
-			if ((error = copyinstr(envp, args->endp,
-			    args->stringspace, &length))) {
+			if (envp == 0)
+				break;
+			error = copyinstr((void *)(uintptr_t)envp,
+			    args->endp, args->stringspace, &length);
+			if (error != 0) {
 				if (error == ENAMETOOLONG)
 					error = E2BIG;
 				goto err_exit;
diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c
index c815e36..58e76bc 100644
--- a/sys/kern/kern_umtx.c
+++ b/sys/kern/kern_umtx.c
@@ -510,6 +510,15 @@ umtxq_unbusy(struct umtx_key *key)
 		wakeup_one(uc);
 }
 
+static inline void
+umtxq_unbusy_unlocked(struct umtx_key *key)
+{
+
+	umtxq_lock(key);
+	umtxq_unbusy(key);
+	umtxq_unlock(key);
+}
+
 static struct umtxq_queue *
 umtxq_queue_lookup(struct umtx_key *key, int q)
 {
@@ -847,6 +856,7 @@ do_wait(struct thread *td, void *addr, u_long id,
 	struct abs_timeout timo;
 	struct umtx_q *uq;
 	u_long tmp;
+	uint32_t tmp32;
 	int error = 0;
 
 	uq = td->td_umtxq;
@@ -860,18 +870,29 @@ do_wait(struct thread *td, void *addr, u_long id,
 	umtxq_lock(&uq->uq_key);
 	umtxq_insert(uq);
 	umtxq_unlock(&uq->uq_key);
-	if (compat32 == 0)
-		tmp = fuword(addr);
-        else
-		tmp = (unsigned int)fuword32(addr);
+	if (compat32 == 0) {
+		error = fueword(addr, &tmp);
+		if (error != 0)
+			error = EFAULT;
+	} else {
+		error = fueword32(addr, &tmp32);
+		if (error == 0)
+			tmp = tmp32;
+		else
+			error = EFAULT;
+	}
 	umtxq_lock(&uq->uq_key);
-	if (tmp == id)
-		error = umtxq_sleep(uq, "uwait", timeout == NULL ?
-		    NULL : &timo);
-	if ((uq->uq_flags & UQF_UMTXQ) == 0)
-		error = 0;
-	else
+	if (error == 0) {
+		if (tmp == id)
+			error = umtxq_sleep(uq, "uwait", timeout == NULL ?
+			    NULL : &timo);
+		if ((uq->uq_flags & UQF_UMTXQ) == 0)
+			error = 0;
+		else
+			umtxq_remove(uq);
+	} else if ((uq->uq_flags & UQF_UMTXQ) != 0) {
 		umtxq_remove(uq);
+	}
 	umtxq_unlock(&uq->uq_key);
 	umtx_key_release(&uq->uq_key);
 	if (error == ERESTART)
@@ -908,11 +929,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
 	struct abs_timeout timo;
 	struct umtx_q *uq;
 	uint32_t owner, old, id;
-	int error = 0;
+	int error, rv;
 
 	id = td->td_tid;
 	uq = td->td_umtxq;
-
+	error = 0;
 	if (timeout != NULL)
 		abs_timeout_init2(&timo, timeout);
 
@@ -921,7 +942,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
 	 * can fault on any access.
 	 */
 	for (;;) {
-		owner = fuword32(__DEVOLATILE(void *, &m->m_owner));
+		rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner);
+		if (rv == -1)
+			return (EFAULT);
 		if (mode == _UMUTEX_WAIT) {
 			if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED)
 				return (0);
@@ -929,31 +952,31 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
 			/*
 			 * Try the uncontested case.  This should be done in userland.
 			 */
-			owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id);
+			rv = casueword32(&m->m_owner, UMUTEX_UNOWNED,
+			    &owner, id);
+			/* The address was invalid. */
+			if (rv == -1)
+				return (EFAULT);
 
 			/* The acquire succeeded. */
 			if (owner == UMUTEX_UNOWNED)
 				return (0);
 
-			/* The address was invalid. */
-			if (owner == -1)
-				return (EFAULT);
-
 			/* If no one owns it but it is contested try to acquire it. */
 			if (owner == UMUTEX_CONTESTED) {
-				owner = casuword32(&m->m_owner,
-				    UMUTEX_CONTESTED, id | UMUTEX_CONTESTED);
+				rv = casueword32(&m->m_owner,
+				    UMUTEX_CONTESTED, &owner,
+				    id | UMUTEX_CONTESTED);
+				/* The address was invalid. */
+				if (rv == -1)
+					return (EFAULT);
 
 				if (owner == UMUTEX_CONTESTED)
 					return (0);
 
-				/* The address was invalid. */
-				if (owner == -1)
-					return (EFAULT);
-
-				error = umtxq_check_susp(td);
-				if (error != 0)
-					return (error);
+				rv = umtxq_check_susp(td);
+				if (rv != 0)
+					return (rv);
 
 				/* If this failed the lock has changed, restart. */
 				continue;
@@ -985,10 +1008,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
 		 * either some one else has acquired the lock or it has been
 		 * released.
 		 */
-		old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED);
+		rv = casueword32(&m->m_owner, owner, &old,
+		    owner | UMUTEX_CONTESTED);
 
 		/* The address was invalid. */
-		if (old == -1) {
+		if (rv == -1) {
 			umtxq_lock(&uq->uq_key);
 			umtxq_remove(uq);
 			umtxq_unbusy(&uq->uq_key);
@@ -1033,16 +1057,16 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags)
 	/*
 	 * Make sure we own this mtx.
 	 */
-	owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner));
-	if (owner == -1)
+	error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner);
+	if (error == -1)
 		return (EFAULT);
 
 	if ((owner & ~UMUTEX_CONTESTED) != id)
 		return (EPERM);
 
 	if ((owner & UMUTEX_CONTESTED) == 0) {
-		old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED);
-		if (old == -1)
+		error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED);
+		if (error == -1)
 			return (EFAULT);
 		if (old == owner)
 			return (0);
@@ -1064,14 +1088,14 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags)
 	 * there is zero or one thread only waiting for it.
 	 * Otherwise, it must be marked as contested.
 	 */
-	old = casuword32(&m->m_owner, owner,
-		count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED);
+	error = casueword32(&m->m_owner, owner, &old,
+	    count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED);
 	umtxq_lock(&key);
 	umtxq_signal(&key,1);
 	umtxq_unbusy(&key);
 	umtxq_unlock(&key);
 	umtx_key_release(&key);
-	if (old == -1)
+	if (error == -1)
 		return (EFAULT);
 	if (old != owner)
 		return (EINVAL);
@@ -1091,14 +1115,16 @@ do_wake_umutex(struct thread *td, struct umutex *m)
 	int error;
 	int count;
 
-	owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner));
-	if (owner == -1)
+	error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner);
+	if (error == -1)
 		return (EFAULT);
 
 	if ((owner & ~UMUTEX_CONTESTED) != 0)
 		return (0);
 
-	flags = fuword32(&m->m_flags);
+	error = fueword32(&m->m_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 
 	/* We should only ever be in here for contested locks */
 	if ((error = umtx_key_get(m, TYPE_NORMAL_UMUTEX, GET_SHARE(flags),
@@ -1110,16 +1136,20 @@ do_wake_umutex(struct thread *td, struct umutex *m)
 	count = umtxq_count(&key);
 	umtxq_unlock(&key);
 
-	if (count <= 1)
-		owner = casuword32(&m->m_owner, UMUTEX_CONTESTED, UMUTEX_UNOWNED);
+	if (count <= 1) {
+		error = casueword32(&m->m_owner, UMUTEX_CONTESTED, &owner,
+		    UMUTEX_UNOWNED);
+		if (error == -1)
+			error = EFAULT;
+	}
 
 	umtxq_lock(&key);
-	if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0)
+	if (error == 0 && count != 0 && (owner & ~UMUTEX_CONTESTED) == 0)
 		umtxq_signal(&key, 1);
 	umtxq_unbusy(&key);
 	umtxq_unlock(&key);
 	umtx_key_release(&key);
-	return (0);
+	return (error);
 }
 
 /*
@@ -1162,41 +1192,49 @@ do_wake2_umutex(struct thread *td, struct umutex *m, uint32_t flags)
 	 * any memory.
 	 */
 	if (count > 1) {
-		owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner));
-		while ((owner & UMUTEX_CONTESTED) ==0) {
-			old = casuword32(&m->m_owner, owner,
-			    owner|UMUTEX_CONTESTED);
+		error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner),
+		    &owner);
+		if (error == -1)
+			error = EFAULT;
+		while (error == 0 && (owner & UMUTEX_CONTESTED) == 0) {
+			error = casueword32(&m->m_owner, owner, &old,
+			    owner | UMUTEX_CONTESTED);
+			if (error == -1) {
+				error = EFAULT;
+				break;
+			}
 			if (old == owner)
 				break;
 			owner = old;
-			if (old == -1)
-				break;
 			error = umtxq_check_susp(td);
 			if (error != 0)
 				break;
 		}
 	} else if (count == 1) {
-		owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner));
-		while ((owner & ~UMUTEX_CONTESTED) != 0 &&
+		error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner),
+		    &owner);
+		if (error == -1)
+			error = EFAULT;
+		while (error == 0 && (owner & ~UMUTEX_CONTESTED) != 0 &&
 		       (owner & UMUTEX_CONTESTED) == 0) {
-			old = casuword32(&m->m_owner, owner,
-			    owner|UMUTEX_CONTESTED);
+			error = casueword32(&m->m_owner, owner, &old,
+			    owner | UMUTEX_CONTESTED);
+			if (error == -1) {
+				error = EFAULT;
+				break;
+			}
 			if (old == owner)
 				break;
 			owner = old;
-			if (old == -1)
-				break;
 			error = umtxq_check_susp(td);
 			if (error != 0)
 				break;
 		}
 	}
 	umtxq_lock(&key);
-	if (owner == -1) {
-		error = EFAULT;
+	if (error == EFAULT) {
 		umtxq_signal(&key, INT_MAX);
-	}
-	else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0)
+	} else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0)
 		umtxq_signal(&key, 1);
 	umtxq_unbusy(&key);
 	umtxq_unlock(&key);
@@ -1576,7 +1614,7 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags,
 	struct umtx_q *uq;
 	struct umtx_pi *pi, *new_pi;
 	uint32_t id, owner, old;
-	int error;
+	int error, rv;
 
 	id = td->td_tid;
 	uq = td->td_umtxq;
@@ -1619,7 +1657,12 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags,
 		/*
 		 * Try the uncontested case.  This should be done in userland.
 		 */
-		owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id);
+		rv = casueword32(&m->m_owner, UMUTEX_UNOWNED, &owner, id);
+		/* The address was invalid. */
+		if (rv == -1) {
+			error = EFAULT;
+			break;
+		}
 
 		/* The acquire succeeded. */
 		if (owner == UMUTEX_UNOWNED) {
@@ -1627,16 +1670,15 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags,
 			break;
 		}
 
-		/* The address was invalid. */
-		if (owner == -1) {
-			error = EFAULT;
-			break;
-		}
-
 		/* If no one owns it but it is contested try to acquire it. */
 		if (owner == UMUTEX_CONTESTED) {
-			owner = casuword32(&m->m_owner,
-			    UMUTEX_CONTESTED, id | UMUTEX_CONTESTED);
+			rv = casueword32(&m->m_owner,
+			    UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED);
+			/* The address was invalid. */
+			if (rv == -1) {
+				error = EFAULT;
+				break;
+			}
 
 			if (owner == UMUTEX_CONTESTED) {
 				umtxq_lock(&uq->uq_key);
@@ -1647,12 +1689,6 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags,
 				break;
 			}
 
-			/* The address was invalid. */
-			if (owner == -1) {
-				error = EFAULT;
-				break;
-			}
-
 			error = umtxq_check_susp(td);
 			if (error != 0)
 				break;
@@ -1683,13 +1719,12 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags,
 		 * either some one else has acquired the lock or it has been
 		 * released.
 		 */
-		old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED);
+		rv = casueword32(&m->m_owner, owner, &old,
+		    owner | UMUTEX_CONTESTED);
 
 		/* The address was invalid. */
-		if (old == -1) {
-			umtxq_lock(&uq->uq_key);
-			umtxq_unbusy(&uq->uq_key);
-			umtxq_unlock(&uq->uq_key);
+		if (rv == -1) {
+			umtxq_unbusy_unlocked(&uq->uq_key);
 			error = EFAULT;
 			break;
 		}
@@ -1741,8 +1776,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags)
 	/*
 	 * Make sure we own this mtx.
 	 */
-	owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner));
-	if (owner == -1)
+	error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner);
+	if (error == -1)
 		return (EFAULT);
 
 	if ((owner & ~UMUTEX_CONTESTED) != id)
@@ -1750,8 +1785,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags)
 
 	/* This should be done in userland */
 	if ((owner & UMUTEX_CONTESTED) == 0) {
-		old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED);
-		if (old == -1)
+		error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED);
+		if (error == -1)
 			return (EFAULT);
 		if (old == owner)
 			return (0);
@@ -1809,14 +1844,12 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags)
 	 * there is zero or one thread only waiting for it.
 	 * Otherwise, it must be marked as contested.
 	 */
-	old = casuword32(&m->m_owner, owner,
-		count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED);
+	error = casueword32(&m->m_owner, owner, &old,
+	    count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED);
 
-	umtxq_lock(&key);
-	umtxq_unbusy(&key);
-	umtxq_unlock(&key);
+	umtxq_unbusy_unlocked(&key);
 	umtx_key_release(&key);
-	if (old == -1)
+	if (error == -1)
 		return (EFAULT);
 	if (old != owner)
 		return (EINVAL);
@@ -1835,7 +1868,7 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags,
 	struct umtx_pi *pi;
 	uint32_t ceiling;
 	uint32_t owner, id;
-	int error, pri, old_inherited_pri, su;
+	int error, pri, old_inherited_pri, su, rv;
 
 	id = td->td_tid;
 	uq = td->td_umtxq;
@@ -1853,7 +1886,12 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags,
 		umtxq_busy(&uq->uq_key);
 		umtxq_unlock(&uq->uq_key);
 
-		ceiling = RTP_PRIO_MAX - fuword32(&m->m_ceilings[0]);
+		rv = fueword32(&m->m_ceilings[0], &ceiling);
+		if (rv == -1) {
+			error = EFAULT;
+			goto out;
+		}
+		ceiling = RTP_PRIO_MAX - ceiling;
 		if (ceiling > RTP_PRIO_MAX) {
 			error = EINVAL;
 			goto out;
@@ -1874,17 +1912,16 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags,
 		}
 		mtx_unlock_spin(&umtx_lock);
 
-		owner = casuword32(&m->m_owner,
-		    UMUTEX_CONTESTED, id | UMUTEX_CONTESTED);
-
-		if (owner == UMUTEX_CONTESTED) {
-			error = 0;
+		rv = casueword32(&m->m_owner,
+		    UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED);
+		/* The address was invalid. */
+		if (rv == -1) {
+			error = EFAULT;
 			break;
 		}
 
-		/* The address was invalid. */
-		if (owner == -1) {
-			error = EFAULT;
+		if (owner == UMUTEX_CONTESTED) {
+			error = 0;
 			break;
 		}
 
@@ -1946,9 +1983,7 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags,
 	}
 
 out:
-	umtxq_lock(&uq->uq_key);
-	umtxq_unbusy(&uq->uq_key);
-	umtxq_unlock(&uq->uq_key);
+	umtxq_unbusy_unlocked(&uq->uq_key);
 	umtx_key_release(&uq->uq_key);
 	return (error);
 }
@@ -1973,8 +2008,8 @@ do_unlock_pp(struct thread *td, struct umutex *m, uint32_t flags)
 	/*
 	 * Make sure we own this mtx.
 	 */
-	owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner));
-	if (owner == -1)
+	error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner);
+	if (error == -1)
 		return (EFAULT);
 
 	if ((owner & ~UMUTEX_CONTESTED) != id)
@@ -2047,9 +2082,11 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling,
 	uint32_t save_ceiling;
 	uint32_t owner, id;
 	uint32_t flags;
-	int error;
+	int error, rv;
 
-	flags = fuword32(&m->m_flags);
+	error = fueword32(&m->m_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	if ((flags & UMUTEX_PRIO_PROTECT) == 0)
 		return (EINVAL);
 	if (ceiling > RTP_PRIO_MAX)
@@ -2064,10 +2101,18 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling,
 		umtxq_busy(&uq->uq_key);
 		umtxq_unlock(&uq->uq_key);
 
-		save_ceiling = fuword32(&m->m_ceilings[0]);
+		rv = fueword32(&m->m_ceilings[0], &save_ceiling);
+		if (rv == -1) {
+			error = EFAULT;
+			break;
+		}
 
-		owner = casuword32(&m->m_owner,
-		    UMUTEX_CONTESTED, id | UMUTEX_CONTESTED);
+		rv = casueword32(&m->m_owner,
+		    UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED);
+		if (rv == -1) {
+			error = EFAULT;
+			break;
+		}
 
 		if (owner == UMUTEX_CONTESTED) {
 			suword32(&m->m_ceilings[0], ceiling);
@@ -2077,12 +2122,6 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling,
 			break;
 		}
 
-		/* The address was invalid. */
-		if (owner == -1) {
-			error = EFAULT;
-			break;
-		}
-
 		if ((owner & ~UMUTEX_CONTESTED) == id) {
 			suword32(&m->m_ceilings[0], ceiling);
 			error = 0;
@@ -2129,8 +2168,8 @@ do_lock_umutex(struct thread *td, struct umutex *m,
 	uint32_t flags;
 	int error;
 
-	flags = fuword32(&m->m_flags);
-	if (flags == -1)
+	error = fueword32(&m->m_flags, &flags);
+	if (error == -1)
 		return (EFAULT);
 
 	switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) {
@@ -2164,9 +2203,10 @@ static int
 do_unlock_umutex(struct thread *td, struct umutex *m)
 {
 	uint32_t flags;
+	int error;
 
-	flags = fuword32(&m->m_flags);
-	if (flags == -1)
+	error = fueword32(&m->m_flags, &flags);
+	if (error == -1)
 		return (EFAULT);
 
 	switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) {
@@ -2187,21 +2227,27 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m,
 {
 	struct abs_timeout timo;
 	struct umtx_q *uq;
-	uint32_t flags;
-	uint32_t clockid;
+	uint32_t flags, clockid, hasw;
 	int error;
 
 	uq = td->td_umtxq;
-	flags = fuword32(&cv->c_flags);
+	error = fueword32(&cv->c_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &uq->uq_key);
 	if (error != 0)
 		return (error);
 
 	if ((wflags & CVWAIT_CLOCKID) != 0) {
-		clockid = fuword32(&cv->c_clockid);
+		error = fueword32(&cv->c_clockid, &clockid);
+		if (error == -1) {
+			umtx_key_release(&uq->uq_key);
+			return (EFAULT);
+		}
 		if (clockid < CLOCK_REALTIME ||
 		    clockid >= CLOCK_THREAD_CPUTIME_ID) {
 			/* hmm, only HW clock id will work. */
+			umtx_key_release(&uq->uq_key);
 			return (EINVAL);
 		}
 	} else {
@@ -2217,12 +2263,12 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m,
 	 * Set c_has_waiters to 1 before releasing user mutex, also
 	 * don't modify cache line when unnecessary.
 	 */
-	if (fuword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters)) == 0)
+	error = fueword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters),
+	    &hasw);
+	if (error == 0 && hasw == 0)
 		suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 1);
 
-	umtxq_lock(&uq->uq_key);
-	umtxq_unbusy(&uq->uq_key);
-	umtxq_unlock(&uq->uq_key);
+	umtxq_unbusy_unlocked(&uq->uq_key);
 
 	error = do_unlock_umutex(td, m);
 
@@ -2276,7 +2322,9 @@ do_cv_signal(struct thread *td, struct ucond *cv)
 	int error, cnt, nwake;
 	uint32_t flags;
 
-	flags = fuword32(&cv->c_flags);
+	error = fueword32(&cv->c_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0)
 		return (error);	
 	umtxq_lock(&key);
@@ -2287,6 +2335,8 @@ do_cv_signal(struct thread *td, struct ucond *cv)
 		umtxq_unlock(&key);
 		error = suword32(
 		    __DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0);
+		if (error == -1)
+			error = EFAULT;
 		umtxq_lock(&key);
 	}
 	umtxq_unbusy(&key);
@@ -2302,7 +2352,9 @@ do_cv_broadcast(struct thread *td, struct ucond *cv)
 	int error;
 	uint32_t flags;
 
-	flags = fuword32(&cv->c_flags);
+	error = fueword32(&cv->c_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0)
 		return (error);	
 
@@ -2312,10 +2364,10 @@ do_cv_broadcast(struct thread *td, struct ucond *cv)
 	umtxq_unlock(&key);
 
 	error = suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0);
+	if (error == -1)
+		error = EFAULT;
 
-	umtxq_lock(&key);
-	umtxq_unbusy(&key);
-	umtxq_unlock(&key);
+	umtxq_unbusy_unlocked(&key);
 
 	umtx_key_release(&key);
 	return (error);
@@ -2329,10 +2381,12 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx
 	uint32_t flags, wrflags;
 	int32_t state, oldstate;
 	int32_t blocked_readers;
-	int error;
+	int error, rv;
 
 	uq = td->td_umtxq;
-	flags = fuword32(&rwlock->rw_flags);
+	error = fueword32(&rwlock->rw_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key);
 	if (error != 0)
 		return (error);
@@ -2345,15 +2399,22 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx
 		wrflags |= URWLOCK_WRITE_WAITERS;
 
 	for (;;) {
-		state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+		rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state),
+		    &state);
+		if (rv == -1) {
+			umtx_key_release(&uq->uq_key);
+			return (EFAULT);
+		}
+
 		/* try to lock it */
 		while (!(state & wrflags)) {
 			if (__predict_false(URWLOCK_READER_COUNT(state) == URWLOCK_MAX_READERS)) {
 				umtx_key_release(&uq->uq_key);
 				return (EAGAIN);
 			}
-			oldstate = casuword32(&rwlock->rw_state, state, state + 1);
-			if (oldstate == -1) {
+			rv = casueword32(&rwlock->rw_state, state,
+			    &oldstate, state + 1);
+			if (rv == -1) {
 				umtx_key_release(&uq->uq_key);
 				return (EFAULT);
 			}
@@ -2379,12 +2440,17 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx
 		 * re-read the state, in case it changed between the try-lock above
 		 * and the check below
 		 */
-		state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+		rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state),
+		    &state);
+		if (rv == -1)
+			error = EFAULT;
 
 		/* set read contention bit */
-		while ((state & wrflags) && !(state & URWLOCK_READ_WAITERS)) {
-			oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_READ_WAITERS);
-			if (oldstate == -1) {
+		while (error == 0 && (state & wrflags) &&
+		    !(state & URWLOCK_READ_WAITERS)) {
+			rv = casueword32(&rwlock->rw_state, state,
+			    &oldstate, state | URWLOCK_READ_WAITERS);
+			if (rv == -1) {
 				error = EFAULT;
 				break;
 			}
@@ -2396,17 +2462,13 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx
 				break;
 		}
 		if (error != 0) {
-			umtxq_lock(&uq->uq_key);
-			umtxq_unbusy(&uq->uq_key);
-			umtxq_unlock(&uq->uq_key);
+			umtxq_unbusy_unlocked(&uq->uq_key);
 			break;
 		}
 
 		/* state is changed while setting flags, restart */
 		if (!(state & wrflags)) {
-			umtxq_lock(&uq->uq_key);
-			umtxq_unbusy(&uq->uq_key);
-			umtxq_unlock(&uq->uq_key);
+			umtxq_unbusy_unlocked(&uq->uq_key);
 			error = umtxq_check_susp(td);
 			if (error != 0)
 				break;
@@ -2415,7 +2477,13 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx
 
 sleep:
 		/* contention bit is set, before sleeping, increase read waiter count */
-		blocked_readers = fuword32(&rwlock->rw_blocked_readers);
+		rv = fueword32(&rwlock->rw_blocked_readers,
+		    &blocked_readers);
+		if (rv == -1) {
+			umtxq_unbusy_unlocked(&uq->uq_key);
+			error = EFAULT;
+			break;
+		}
 		suword32(&rwlock->rw_blocked_readers, blocked_readers+1);
 
 		while (state & wrflags) {
@@ -2431,18 +2499,32 @@ sleep:
 			umtxq_unlock(&uq->uq_key);
 			if (error)
 				break;
-			state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+			rv = fueword32(__DEVOLATILE(int32_t *,
+			    &rwlock->rw_state), &state);
+			if (rv == -1) {
+				error = EFAULT;
+				break;
+			}
 		}
 
 		/* decrease read waiter count, and may clear read contention bit */
-		blocked_readers = fuword32(&rwlock->rw_blocked_readers);
+		rv = fueword32(&rwlock->rw_blocked_readers,
+		    &blocked_readers);
+		if (rv == -1) {
+			umtxq_unbusy_unlocked(&uq->uq_key);
+			error = EFAULT;
+			break;
+		}
 		suword32(&rwlock->rw_blocked_readers, blocked_readers-1);
 		if (blocked_readers == 1) {
-			state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
-			for (;;) {
-				oldstate = casuword32(&rwlock->rw_state, state,
-					 state & ~URWLOCK_READ_WAITERS);
-				if (oldstate == -1) {
+			rv = fueword32(__DEVOLATILE(int32_t *,
+			    &rwlock->rw_state), &state);
+			if (rv == -1)
+				error = EFAULT;
+			while (error == 0) {
+				rv = casueword32(&rwlock->rw_state, state,
+				    &oldstate, state & ~URWLOCK_READ_WAITERS);
+				if (rv == -1) {
 					error = EFAULT;
 					break;
 				}
@@ -2450,14 +2532,10 @@ sleep:
 					break;
 				state = oldstate;
 				error = umtxq_check_susp(td);
-				if (error != 0)
-					break;
 			}
 		}
 
-		umtxq_lock(&uq->uq_key);
-		umtxq_unbusy(&uq->uq_key);
-		umtxq_unlock(&uq->uq_key);
+		umtxq_unbusy_unlocked(&uq->uq_key);
 		if (error != 0)
 			break;
 	}
@@ -2476,10 +2554,12 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo
 	int32_t state, oldstate;
 	int32_t blocked_writers;
 	int32_t blocked_readers;
-	int error;
+	int error, rv;
 
 	uq = td->td_umtxq;
-	flags = fuword32(&rwlock->rw_flags);
+	error = fueword32(&rwlock->rw_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key);
 	if (error != 0)
 		return (error);
@@ -2489,10 +2569,16 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo
 
 	blocked_readers = 0;
 	for (;;) {
-		state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+		rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state),
+		    &state);
+		if (rv == -1) {
+			umtx_key_release(&uq->uq_key);
+			return (EFAULT);
+		}
 		while (!(state & URWLOCK_WRITE_OWNER) && URWLOCK_READER_COUNT(state) == 0) {
-			oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_OWNER);
-			if (oldstate == -1) {
+			rv = casueword32(&rwlock->rw_state, state,
+			    &oldstate, state | URWLOCK_WRITE_OWNER);
+			if (rv == -1) {
 				umtx_key_release(&uq->uq_key);
 				return (EFAULT);
 			}
@@ -2528,12 +2614,17 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo
 		 * re-read the state, in case it changed between the try-lock above
 		 * and the check below
 		 */
-		state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+		rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state),
+		    &state);
+		if (rv == -1)
+			error = EFAULT;
 
-		while (((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) &&
-		       (state & URWLOCK_WRITE_WAITERS) == 0) {
-			oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_WAITERS);
-			if (oldstate == -1) {
+		while (error == 0 && ((state & URWLOCK_WRITE_OWNER) ||
+		    URWLOCK_READER_COUNT(state) != 0) &&
+		    (state & URWLOCK_WRITE_WAITERS) == 0) {
+			rv = casueword32(&rwlock->rw_state, state,
+			    &oldstate, state | URWLOCK_WRITE_WAITERS);
+			if (rv == -1) {
 				error = EFAULT;
 				break;
 			}
@@ -2545,23 +2636,25 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo
 				break;
 		}
 		if (error != 0) {
-			umtxq_lock(&uq->uq_key);
-			umtxq_unbusy(&uq->uq_key);
-			umtxq_unlock(&uq->uq_key);
+			umtxq_unbusy_unlocked(&uq->uq_key);
 			break;
 		}
 
 		if (!(state & URWLOCK_WRITE_OWNER) && URWLOCK_READER_COUNT(state) == 0) {
-			umtxq_lock(&uq->uq_key);
-			umtxq_unbusy(&uq->uq_key);
-			umtxq_unlock(&uq->uq_key);
+			umtxq_unbusy_unlocked(&uq->uq_key);
 			error = umtxq_check_susp(td);
 			if (error != 0)
 				break;
 			continue;
 		}
 sleep:
-		blocked_writers = fuword32(&rwlock->rw_blocked_writers);
+		rv = fueword32(&rwlock->rw_blocked_writers,
+		    &blocked_writers);
+		if (rv == -1) {
+			umtxq_unbusy_unlocked(&uq->uq_key);
+			error = EFAULT;
+			break;
+		}
 		suword32(&rwlock->rw_blocked_writers, blocked_writers+1);
 
 		while ((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) {
@@ -2577,17 +2670,34 @@ sleep:
 			umtxq_unlock(&uq->uq_key);
 			if (error)
 				break;
-			state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+			rv = fueword32(__DEVOLATILE(int32_t *,
+			    &rwlock->rw_state), &state);
+			if (rv == -1) {
+				error = EFAULT;
+				break;
+			}
 		}
 
-		blocked_writers = fuword32(&rwlock->rw_blocked_writers);
+		rv = fueword32(&rwlock->rw_blocked_writers,
+		    &blocked_writers);
+		if (rv == -1) {
+			umtxq_unbusy_unlocked(&uq->uq_key);
+			error = EFAULT;
+			break;
+		}
 		suword32(&rwlock->rw_blocked_writers, blocked_writers-1);
 		if (blocked_writers == 1) {
-			state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+			rv = fueword32(__DEVOLATILE(int32_t *,
+			    &rwlock->rw_state), &state);
+			if (rv == -1) {
+				umtxq_unbusy_unlocked(&uq->uq_key);
+				error = EFAULT;
+				break;
+			}
 			for (;;) {
-				oldstate = casuword32(&rwlock->rw_state, state,
-					 state & ~URWLOCK_WRITE_WAITERS);
-				if (oldstate == -1) {
+				rv = casueword32(&rwlock->rw_state, state,
+				    &oldstate, state & ~URWLOCK_WRITE_WAITERS);
+				if (rv == -1) {
 					error = EFAULT;
 					break;
 				}
@@ -2603,13 +2713,17 @@ sleep:
 				if (error != 0)
 					break;
 			}
-			blocked_readers = fuword32(&rwlock->rw_blocked_readers);
+			rv = fueword32(&rwlock->rw_blocked_readers,
+			    &blocked_readers);
+			if (rv == -1) {
+				umtxq_unbusy_unlocked(&uq->uq_key);
+				error = EFAULT;
+				break;
+			}
 		} else
 			blocked_readers = 0;
 
-		umtxq_lock(&uq->uq_key);
-		umtxq_unbusy(&uq->uq_key);
-		umtxq_unlock(&uq->uq_key);
+		umtxq_unbusy_unlocked(&uq->uq_key);
 	}
 
 	umtx_key_release(&uq->uq_key);
@@ -2624,20 +2738,26 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock)
 	struct umtx_q *uq;
 	uint32_t flags;
 	int32_t state, oldstate;
-	int error, q, count;
+	int error, rv, q, count;
 
 	uq = td->td_umtxq;
-	flags = fuword32(&rwlock->rw_flags);
+	error = fueword32(&rwlock->rw_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key);
 	if (error != 0)
 		return (error);
 
-	state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state));
+	error = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), &state);
+	if (error == -1) {
+		error = EFAULT;
+		goto out;
+	}
 	if (state & URWLOCK_WRITE_OWNER) {
 		for (;;) {
-			oldstate = casuword32(&rwlock->rw_state, state, 
-				state & ~URWLOCK_WRITE_OWNER);
-			if (oldstate == -1) {
+			rv = casueword32(&rwlock->rw_state, state,
+			    &oldstate, state & ~URWLOCK_WRITE_OWNER);
+			if (rv == -1) {
 				error = EFAULT;
 				goto out;
 			}
@@ -2655,9 +2775,9 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock)
 		}
 	} else if (URWLOCK_READER_COUNT(state) != 0) {
 		for (;;) {
-			oldstate = casuword32(&rwlock->rw_state, state,
-				state - 1);
-			if (oldstate == -1) {
+			rv = casueword32(&rwlock->rw_state, state,
+			    &oldstate, state - 1);
+			if (rv == -1) {
 				error = EFAULT;
 				goto out;
 			}
@@ -2716,11 +2836,13 @@ do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout)
 {
 	struct abs_timeout timo;
 	struct umtx_q *uq;
-	uint32_t flags, count;
-	int error;
+	uint32_t flags, count, count1;
+	int error, rv;
 
 	uq = td->td_umtxq;
-	flags = fuword32(&sem->_flags);
+	error = fueword32(&sem->_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &uq->uq_key);
 	if (error != 0)
 		return (error);
@@ -2732,15 +2854,16 @@ do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout)
 	umtxq_busy(&uq->uq_key);
 	umtxq_insert(uq);
 	umtxq_unlock(&uq->uq_key);
-	casuword32(&sem->_has_waiters, 0, 1);
-	count = fuword32(__DEVOLATILE(uint32_t *, &sem->_count));
-	if (count != 0) {
+	rv = casueword32(&sem->_has_waiters, 0, &count1, 1);
+	if (rv == 0)
+		rv = fueword32(__DEVOLATILE(uint32_t *, &sem->_count), &count);
+	if (rv == -1 || count != 0) {
 		umtxq_lock(&uq->uq_key);
 		umtxq_unbusy(&uq->uq_key);
 		umtxq_remove(uq);
 		umtxq_unlock(&uq->uq_key);
 		umtx_key_release(&uq->uq_key);
-		return (0);
+		return (rv == -1 ? EFAULT : 0);
 	}
 	umtxq_lock(&uq->uq_key);
 	umtxq_unbusy(&uq->uq_key);
@@ -2771,7 +2894,9 @@ do_sem_wake(struct thread *td, struct _usem *sem)
 	int error, cnt;
 	uint32_t flags;
 
-	flags = fuword32(&sem->_flags);
+	error = fueword32(&sem->_flags, &flags);
+	if (error == -1)
+		return (EFAULT);
 	if ((error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &key)) != 0)
 		return (error);	
 	umtxq_lock(&key);
@@ -2789,6 +2914,8 @@ do_sem_wake(struct thread *td, struct _usem *sem)
 			error = suword32(
 			    __DEVOLATILE(uint32_t *, &sem->_has_waiters), 0);
 			umtxq_lock(&key);
+			if (error == -1)
+				error = EFAULT;
 		}
 	}
 	umtxq_unbusy(&key);
@@ -2804,7 +2931,7 @@ do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout)
 	struct abs_timeout timo;
 	struct umtx_q *uq;
 	uint32_t count, flags;
-	int error;
+	int error, rv;
 
 	uq = td->td_umtxq;
 	flags = fuword32(&sem->_flags);
@@ -2819,8 +2946,8 @@ do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout)
 	umtxq_busy(&uq->uq_key);
 	umtxq_insert(uq);
 	umtxq_unlock(&uq->uq_key);
-	count = fuword32(__DEVOLATILE(uint32_t *, &sem->_count));
-	if (count == -1) {
+	rv = fueword32(__DEVOLATILE(uint32_t *, &sem->_count), &count);
+	if (rv == -1) {
 		umtxq_lock(&uq->uq_key);
 		umtxq_unbusy(&uq->uq_key);
 		umtxq_remove(uq);
@@ -2839,8 +2966,8 @@ do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout)
 		}
 		if (count == USEM_HAS_WAITERS)
 			break;
-		count = casuword32(&sem->_count, 0, USEM_HAS_WAITERS);
-		if (count == -1) {
+		rv = casueword32(&sem->_count, 0, &count, USEM_HAS_WAITERS);
+		if (rv == -1) {
 			umtxq_lock(&uq->uq_key);
 			umtxq_unbusy(&uq->uq_key);
 			umtxq_remove(uq);
@@ -2877,10 +3004,12 @@ static int
 do_sem2_wake(struct thread *td, struct _usem2 *sem)
 {
 	struct umtx_key key;
-	int error, cnt;
+	int error, cnt, rv;
 	uint32_t count, flags;
 
-	flags = fuword32(&sem->_flags);
+	rv = fueword32(&sem->_flags, &flags);
+	if (rv == -1)
+		return (EFAULT);
 	if ((error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &key)) != 0)
 		return (error);	
 	umtxq_lock(&key);
@@ -2895,12 +3024,12 @@ do_sem2_wake(struct thread *td, struct _usem2 *sem)
 		 */
 		if (cnt == 1) {
 			umtxq_unlock(&key);
-			count = fuword32(__DEVOLATILE(uint32_t *,
-			    &sem->_count));
-			while (count != -1 && count & USEM_HAS_WAITERS)
-				count = casuword32(&sem->_count, count,
+			rv = fueword32(__DEVOLATILE(uint32_t *, &sem->_count),
+			    &count);
+			while (rv != -1 && count & USEM_HAS_WAITERS)
+				rv = casueword32(&sem->_count, count, &count,
 				    count & ~USEM_HAS_WAITERS);
-			if (count == -1)
+			if (rv == -1)
 				error = EFAULT;
 			umtxq_lock(&key);
 		}
diff --git a/sys/kern/subr_uio.c b/sys/kern/subr_uio.c
index f2e6e32..f2bbb0c 100644
--- a/sys/kern/subr_uio.c
+++ b/sys/kern/subr_uio.c
@@ -7,6 +7,11 @@
  * Co. or Unix System Laboratories, Inc. and are reproduced herein with
  * the permission of UNIX System Laboratories, Inc.
  *
+ * Copyright (c) 2014 The FreeBSD Foundation
+ *
+ * Portions of this software were developed by Konstantin Belousov
+ * under sponsorship from the FreeBSD Foundation.
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
@@ -438,3 +443,128 @@ copyout_unmap(struct thread *td, vm_offset_t addr, size_t sz)
 
 	return (0);
 }
+
+#ifdef NO_FUEWORD
+/*
+ * XXXKIB Temporary implementation of the fue*() functions, which do not
+ * handle a usermode -1 properly, mixing it with the fault code.  Keep
+ * this until MD code is written.  Currently sparc64, mips and arm do
+ * not have a proper implementation.
+ */
+
+int
+fueword(const void *base, long *val)
+{
+	long res;
+
+	res = fuword(base);
+	if (res == -1)
+		return (-1);
+	*val = res;
+	return (0);
+}
+
+int
+fueword32(const void *base, int32_t *val)
+{
+	int32_t res;
+
+	res = fuword32(base);
+	if (res == -1)
+		return (-1);
+	*val = res;
+	return (0);
+}
+
+#ifdef _LP64
+int
+fueword64(const void *base, int64_t *val)
+{
+	int64_t res;
+
+	res = fuword64(base);
+	if (res == -1)
+		return (-1);
+	*val = res;
+	return (0);
+}
+#endif
+
+int
+casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp,
+    uint32_t newval)
+{
+	int32_t ov;
+
+	ov = casuword32(base, oldval, newval);
+	if (ov == -1)
+		return (-1);
+	*oldvalp = ov;
+	return (0);
+}
+
+int
+casueword(volatile u_long *p, u_long oldval, u_long *oldvalp, u_long newval)
+{
+	u_long ov;
+
+	ov = casuword(p, oldval, newval);
+	if (ov == -1)
+		return (-1);
+	*oldvalp = ov;
+	return (0);
+}
+#else /* NO_FUEWORD */
+int32_t
+fuword32(const void *addr)
+{
+	int rv;
+	int32_t val;
+
+	rv = fueword32(addr, &val);
+	return (rv == -1 ? -1 : val);
+}
+
+#ifdef _LP64
+int64_t
+fuword64(const void *addr)
+{
+	int rv;
+	int64_t val;
+
+	rv = fueword64(addr, &val);
+	return (rv == -1 ? -1 : val);
+}
+#endif /* _LP64 */
+
+long
+fuword(const void *addr)
+{
+	long val;
+	int rv;
+
+	rv = fueword(addr, &val);
+	return (rv == -1 ? -1 : val);
+}
+
+uint32_t
+casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new)
+{
+	int rv;
+	uint32_t val;
+
+	rv = casueword32(addr, old, &val, new);
+	return (rv == -1 ? -1 : val);
+}
+
+u_long
+casuword(volatile u_long *addr, u_long old, u_long new)
+{
+	int rv;
+	u_long val;
+
+	rv = casueword(addr, old, &val, new);
+	return (rv == -1 ? -1 : val);
+}
+
+#endif /* NO_FUEWORD */
diff --git a/sys/kern/vfs_acl.c b/sys/kern/vfs_acl.c
index 93626fb..e9361e5 100644
--- a/sys/kern/vfs_acl.c
+++ b/sys/kern/vfs_acl.c
@@ -148,6 +148,7 @@ acl_copyin(void *user_acl, struct acl *kernel_acl, acl_type_t type)
 static int
 acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type)
 {
+	uint32_t am;
 	int error;
 	struct oldacl old;
 
@@ -162,8 +163,11 @@ acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type)
 		break;
 
 	default:
-		if (fuword32((char *)user_acl +
-		    offsetof(struct acl, acl_maxcnt)) != ACL_MAX_ENTRIES)
+		error = fueword32((char *)user_acl +
+		    offsetof(struct acl, acl_maxcnt), &am);
+		if (error == -1)
+			return (EFAULT);
+		if (am != ACL_MAX_ENTRIES)
 			return (EINVAL);
 
 		error = copyout(kernel_acl, user_acl, sizeof(*kernel_acl));
diff --git a/sys/mips/include/param.h b/sys/mips/include/param.h
index 2d1d7f1..90f3e6f 100644
--- a/sys/mips/include/param.h
+++ b/sys/mips/include/param.h
@@ -178,4 +178,8 @@
 
 #define	pgtok(x)		((x) * (PAGE_SIZE / 1024))
 
+#ifdef _KERNEL
+#define	NO_FUEWORD	1
+#endif
+
 #endif /* !_MIPS_INCLUDE_PARAM_H_ */
diff --git a/sys/net/if_spppsubr.c b/sys/net/if_spppsubr.c
index 9dc55c5..c0f8e39 100644
--- a/sys/net/if_spppsubr.c
+++ b/sys/net/if_spppsubr.c
@@ -5060,7 +5060,8 @@ sppp_params(struct sppp *sp, u_long cmd, void *data)
 	 * Check the cmd word first before attempting to fetch all the
 	 * data.
 	 */
-	if ((subcmd = fuword(ifr->ifr_data)) == -1) {
+	rv = fueword(ifr->ifr_data, &subcmd);
+	if (rv == -1) {
 		rv = EFAULT;
 		goto quit;
 	}
diff --git a/sys/powerpc/powerpc/copyinout.c b/sys/powerpc/powerpc/copyinout.c
index dcfab80..a337c8b 100644
--- a/sys/powerpc/powerpc/copyinout.c
+++ b/sys/powerpc/powerpc/copyinout.c
@@ -405,14 +405,13 @@ fubyte(const void *addr)
 	return (val);
 }
 
-#ifdef __powerpc64__
-int32_t
-fuword32(const void *addr)
+int
+fuword16(const void *addr)
 {
 	struct		thread *td;
 	pmap_t		pm;
 	faultbuf	env;
-	int32_t		*p, val;
+	uint16_t	*p, val;
 
 	td = curthread;
 	pm = &td->td_proc->p_vmspace->vm_pmap;
@@ -432,15 +431,14 @@ fuword32(const void *addr)
 	td->td_pcb->pcb_onfault = NULL;
 	return (val);
 }
-#endif
 
-long
-fuword(const void *addr)
+int
+fueword32(const void *addr, int32_t *val)
 {
 	struct		thread *td;
 	pmap_t		pm;
 	faultbuf	env;
-	long		*p, val;
+	int32_t		*p;
 
 	td = curthread;
 	pm = &td->td_proc->p_vmspace->vm_pmap;
@@ -455,22 +453,71 @@ fuword(const void *addr)
 		return (-1);
 	}
 
-	val = *p;
+	*val = *p;
 
 	td->td_pcb->pcb_onfault = NULL;
-	return (val);
+	return (0);
 }
 
-#ifndef __powerpc64__
-int32_t
-fuword32(const void *addr)
+#ifdef __powerpc64__
+int
+fueword64(const void *addr, int64_t *val)
 {
-	return ((int32_t)fuword(addr));
+	struct		thread *td;
+	pmap_t		pm;
+	faultbuf	env;
+	int64_t		*p;
+
+	td = curthread;
+	pm = &td->td_proc->p_vmspace->vm_pmap;
+
+	if (setfault(env)) {
+		td->td_pcb->pcb_onfault = NULL;
+		return (-1);
+	}
+
+	if (map_user_ptr(pm, addr, (void **)&p, sizeof(*p), NULL)) {
+		td->td_pcb->pcb_onfault = NULL;
+		return (-1);
+	}
+
+	*val = *p;
+
+	td->td_pcb->pcb_onfault = NULL;
+	return (0);
 }
 #endif
 
-uint32_t
-casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new)
+int
+fueword(const void *addr, long *val)
+{
+	struct		thread *td;
+	pmap_t		pm;
+	faultbuf	env;
+	long		*p;
+
+	td = curthread;
+	pm = &td->td_proc->p_vmspace->vm_pmap;
+
+	if (setfault(env)) {
+		td->td_pcb->pcb_onfault = NULL;
+		return (-1);
+	}
+
+	if (map_user_ptr(pm, addr, (void **)&p, sizeof(*p), NULL)) {
+		td->td_pcb->pcb_onfault = NULL;
+		return (-1);
+	}
+
+	*val = *p;
+
+	td->td_pcb->pcb_onfault = NULL;
+	return (0);
+}
+
+int
+casueword32(volatile uint32_t *addr, uint32_t old, uint32_t *oldvalp,
+    uint32_t new)
 {
 	struct thread *td;
 	pmap_t pm;
@@ -507,18 +554,21 @@ casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new)
 
 	td->td_pcb->pcb_onfault = NULL;
 
-	return (val);
+	*oldvalp = val;
+	return (0);
 }
 
 #ifndef __powerpc64__
-u_long
-casuword(volatile u_long *addr, u_long old, u_long new)
+int
+casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new)
 {
-	return (casuword32((volatile uint32_t *)addr, old, new));
+
+	return (casueword32((volatile uint32_t *)addr, old,
+	    (uint32_t *)oldvalp, new));
 }
 #else
-u_long
-casuword(volatile u_long *addr, u_long old, u_long new)
+int
+casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new)
 {
 	struct thread *td;
 	pmap_t pm;
@@ -555,7 +605,7 @@ casuword(volatile u_long *addr, u_long old, u_long new)
 
 	td->td_pcb->pcb_onfault = NULL;
 
-	return (val);
+	*oldvalp = val;
+	return (0);
 }
 #endif
-
diff --git a/sys/sparc64/include/param.h b/sys/sparc64/include/param.h
index e59f2c4..46bacae 100644
--- a/sys/sparc64/include/param.h
+++ b/sys/sparc64/include/param.h
@@ -146,4 +146,8 @@
 
 #define	pgtok(x)		((unsigned long)(x) * (PAGE_SIZE / 1024))
 
+#ifdef _KERNEL
+#define	NO_FUEWORD	1
+#endif
+
 #endif /* !_SPARC64_INCLUDE_PARAM_H_ */
diff --git a/sys/sys/systm.h b/sys/sys/systm.h
index f4eae57..6e5ee61 100644
--- a/sys/sys/systm.h
+++ b/sys/sys/systm.h
@@ -254,16 +254,23 @@ int	copyout_nofault(const void * __restrict kaddr, void * __restrict udaddr,
 
 int	fubyte(const void *base);
 long	fuword(const void *base);
-int	fuword16(void *base);
+int	fuword16(const void *base);
 int32_t	fuword32(const void *base);
 int64_t	fuword64(const void *base);
+int	fueword(const void *base, long *val);
+int	fueword32(const void *base, int32_t *val);
+int	fueword64(const void *base, int64_t *val);
 int	subyte(void *base, int byte);
 int	suword(void *base, long word);
 int	suword16(void *base, int word);
 int	suword32(void *base, int32_t word);
 int	suword64(void *base, int64_t word);
 uint32_t casuword32(volatile uint32_t *base, uint32_t oldval, uint32_t newval);
-u_long	 casuword(volatile u_long *p, u_long oldval, u_long newval);
+u_long	casuword(volatile u_long *p, u_long oldval, u_long newval);
+int	casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp,
+	    uint32_t newval);
+int	casueword(volatile u_long *p, u_long oldval, u_long *oldvalp,
+	    u_long newval);
 
 void	realitexpire(void *);
 
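
The point of the fue*() variants in the patch above is that the fault
status and the fetched value travel separately, so a user word that
really contains -1 is no longer mistaken for a fault. A minimal userland
sketch of that difference follows. The mock_* names are hypothetical,
and a NULL address merely stands in for a faulting user address; this is
not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mock of the old-style interface: the return value carries
 * both the data and the fault indication, so -1 is ambiguous.
 */
static int32_t
mock_fuword32(const void *addr)
{

	if (addr == NULL)
		return (-1);			/* "fault" */
	return (*(const int32_t *)addr);	/* may legitimately be -1 */
}

/*
 * Hypothetical mock of the new-style interface: 0 or -1 is returned as
 * the status, and the data is stored through val only on success.
 */
static int
mock_fueword32(const void *addr, int32_t *val)
{

	if (addr == NULL)
		return (-1);			/* "fault", *val untouched */
	*val = *(const int32_t *)addr;
	return (0);
}
```

With the first function a caller cannot tell a stored -1 from a fault;
with the second it checks the return value and only then looks at *val,
which is the pattern the converted callers in the diff follow.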

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 19:27:27 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 212D9123;
 Mon, 27 Oct 2014 19:27:27 +0000 (UTC)
Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com
 [IPv6:2a00:1450:400c:c05::22d])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 868497DC;
 Mon, 27 Oct 2014 19:27:26 +0000 (UTC)
Received: by mail-wi0-f173.google.com with SMTP id ex7so7372473wid.0
 for <multiple recipients>; Mon, 27 Oct 2014 12:27:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=yXICYh3kL1CHBrhvQ6Z/yrJkE3HEaaFW7CBNvky6pM0=;
 b=bi237DWtzoSXkJLCwC0IiVgtjP8lYWewR+jpO9MVDPuTh1f22ssKmuDbDQ9LGEws6R
 BATpyGhS3K9Z5cmJhuFuSWKsF4rzdjhQoaxKGzigA22BUZZ+zC5EPC4/zXCCGoogqUZu
 NFiAPoYkQOedDDrrRjrz7DVsFKAYPrkI53lyWUUudTT1RFKm0Ral+Y3TwpK7vHv+Mg1P
 OA6Tkiw8C/dmoXLrYlV87utdSPezViE0YxjORsTQvm1D3C8pfMd7dx7US6oCCAikjh4X
 wine9HmHFccCL7ifh1v8MEHcAxHK/XTjtKhn73Ir/2ZKkupGiihP4MrDpdMKtzSLiyKG
 AEfQ==
X-Received: by 10.180.20.162 with SMTP id o2mr12790602wie.57.1414438044546;
 Mon, 27 Oct 2014 12:27:24 -0700 (PDT)
Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net.
 [2001:470:1f08:1f7::2])
 by mx.google.com with ESMTPSA id c5sm16603504wje.30.2014.10.27.12.27.23
 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Mon, 27 Oct 2014 12:27:23 -0700 (PDT)
Date: Mon, 27 Oct 2014 20:27:21 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: refcount_release_take_##lock
Message-ID: <20141027192721.GA28049@dft-labs.eu>
References: <20141025184448.GA19066@dft-labs.eu>
 <20141025190407.GU82214@funkthat.com>
 <2629048.tOq3sNXcCP@ralph.baldwin.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <2629048.tOq3sNXcCP@ralph.baldwin.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: John-Mark Gurney <jmg@funkthat.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Oct 2014 19:27:27 -0000

On Mon, Oct 27, 2014 at 11:27:45AM -0400, John Baldwin wrote:
> Please keep the refcount_*() prefix so it matches the rest of the API.  I 
> would just declare the functions directly in refcount.h rather than requiring 
> a macro to be invoked in each C file.  We can also just implement the needed 
> lock types for now instead of all of them.
> 
> You could maybe replace 'take' with 'lock', but either name is fine.
> 


We need sx and rwlocks (and temporarily mutexes, but that is going away
in a few days).

I ran into the following issue: the opensolaris code has its own rwlock.h,
and its refcount.h eventually includes our refcount.h (and it has to,
since e.g. our file.h requires it).

I don't know of any good solution.

We could add locking funcs to a separate header (refcount_lock.h?) or use the
following hack:

diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h
index 4611664..ce35131 100644
--- a/sys/sys/refcount.h
+++ b/sys/sys/refcount.h
@@ -29,15 +29,19 @@
 #ifndef __SYS_REFCOUNT_H__
 #define __SYS_REFCOUNT_H__
 
-#include <sys/limits.h>
-#include <machine/atomic.h>
-
 #ifdef _KERNEL
+#include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/lock.h>
+#include <sys/rwlock.h>
+#include <sys/sx.h>
 #else
 #define	KASSERT(exp, msg)	/* */
 #endif
 
+#include <sys/limits.h>
+#include <machine/atomic.h>
+
 static __inline void
 refcount_init(volatile u_int *count, u_int value)
 {
@@ -64,4 +68,36 @@ refcount_release(volatile u_int *count)
 	return (old == 1);
 }
 
+#ifdef _KERNEL
+
+#define	REFCOUNT_RELEASE_LOCK_DEFINE(NAME, TYPE, LOCK, UNLOCK)		\
+static __inline int							\
+refcount_release_lock_##NAME(volatile u_int *count, TYPE *v)		\
+{									\
+	u_int old;							\
+									\
+	old = *count;							\
+	if (old > 1 && atomic_cmpset_int(count, old, old - 1))		\
+		return (0);						\
+	LOCK(v);							\
+	if (refcount_release(count))					\
+		return (1);						\
+	UNLOCK(v);							\
+	return (0);							\
+}
+
+REFCOUNT_RELEASE_LOCK_DEFINE(sx, struct sx, sx_xlock, sx_xunlock);
+
+#ifdef _SYS_RWLOCK_H_
+REFCOUNT_RELEASE_LOCK_DEFINE(rwlock, struct rwlock, rw_wlock, rw_wunlock);
+#else
+/*
+ * A hack to resolve header conflict with opensolaris which provides its own
+ * rwlock.h
+ */
+#define	refcount_release_lock_rwlock CTASSERT(0, "not implemented")
+#endif /* ! _SYS_RWLOCK_H_ */
+
+#endif /* ! _KERNEL */
+
 #endif	/* ! __SYS_REFCOUNT_H__ */
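
For readers who want to try the release-and-lock pattern outside the
kernel, here is a minimal userland sketch. It is not the patch above: it
assumes C11 atomics and a pthread mutex in place of atomic_cmpset_int()
and the sx/rw locks, and the name refcount_release_lock_sketch() is
made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userland stand-in for refcount_release_lock_<type>().
 * Returns true, with the lock held, only when the last reference was
 * dropped; otherwise returns false with the lock not held.
 */
static bool
refcount_release_lock_sketch(atomic_uint *count, pthread_mutex_t *lock)
{
	unsigned int old;

	/* Fast path: not the last reference, so drop it without the lock. */
	old = atomic_load(count);
	if (old > 1 && atomic_compare_exchange_strong(count, &old, old - 1))
		return (false);

	/* Slow path: this may be the last reference, so lock first. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return (true);		/* last reference; lock stays held */
	pthread_mutex_unlock(lock);
	return (false);
}
```

A caller would free the object and then unlock only when the function
returns true. As in the patch, the fast path makes a single
compare-and-swap attempt and falls back to the locked path if it fails.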

-- 
Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 22:42:44 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A1BEB160;
 Mon, 27 Oct 2014 22:42:44 +0000 (UTC)
Received: from mail-qg0-x232.google.com (mail-qg0-x232.google.com
 [IPv6:2607:f8b0:400d:c04::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4A72CF60;
 Mon, 27 Oct 2014 22:42:44 +0000 (UTC)
Received: by mail-qg0-f50.google.com with SMTP id a108so2211113qge.9
 for <multiple recipients>; Mon, 27 Oct 2014 15:42:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=81sHLuCIEiotPGVwX5CPfWztVOFK7F60wev327HAAJI=;
 b=LZMdtoqrDn4uhsyw49gOPU8ageb9V8mkCTGsTZISnIjD507arw0iwJX4quUvDD4hn1
 A0pvJgIzZArLKhOc2yrgxPFWRcXxzQZkuOHPYdw48F9dh7sOCpWziX2ffNTOjB4ycluK
 b0zLyjVtbfO4QTGiEBI93hN83HQdWhY92asEKS8f8tWgzb22CXN7zYwfgpHrB0gbEnhq
 B5vjP8TxY0f/Q4aeIbt/0QEY5mUE+GtZFCiC+cryjLIoLZtEtdwFpQlJ58auDApI9nz/
 yMAocPS++k+V9bANXIQSX/kndehR93k9HQN6na6nQ7cP3Ro/C6Xgn6wCcoOgQmK2m/aN
 0VSQ==
MIME-Version: 1.0
X-Received: by 10.140.44.8 with SMTP id f8mr36255477qga.105.1414449763302;
 Mon, 27 Oct 2014 15:42:43 -0700 (PDT)
Received: by 10.140.23.242 with HTTP; Mon, 27 Oct 2014 15:42:43 -0700 (PDT)
In-Reply-To: <544E7376.6040002@rice.edu>
References: <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>
 <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>
 <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>
 <5428AF3B.1030906@rice.edu>
 <CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>
 <54497DC1.5070506@rice.edu>
 <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com>
 <544DED4C.3010501@rice.edu>
 <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com>
 <544E7376.6040002@rice.edu>
Date: Mon, 27 Oct 2014 23:42:43 +0100
Message-ID: <CAFHCsPX_ukk+_8Lrnj7svnhb4Mz+GViOwtaKk_r6S_Fo7gfHGg@mail.gmail.com>
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
From: Svatopluk Kraus <onwahe@gmail.com>
To: Alan Cox <alc@rice.edu>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
Cc: alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
X-List-Received-Date: Mon, 27 Oct 2014 22:42:44 -0000

On Mon, Oct 27, 2014 at 5:31 PM, Alan Cox <alc@rice.edu> wrote:
>
> On 10/27/2014 08:22, Svatopluk Kraus wrote:
>
>
> On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc@rice.edu> wrote:
>>
>> On 10/24/2014 06:33, Svatopluk Kraus wrote:
>>
>>
>> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:
>>>
>>> On 10/08/2014 10:38, Svatopluk Kraus wrote:
>>> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>>> >
>>> >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>>> >>
>>> >>
>>> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>>> >>
>>> >>>
>>> >>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem
>>> >>>> we've dealt with somehow, but now we would like to do it better. It's
>>> >>>> about physical pages which are allocated before vm subsystem is
>>> >>>> initialized. While later on these pages could be found in
>>> >>>> vm_page_array when VM_PHYSSEG_DENSE memory model is used, it's not
>>> >>>> true for VM_PHYSSEG_SPARSE memory model. And ARM world uses
>>> >>>> VM_PHYSSEG_SPARSE model.
>>> >>>>
>>> >>>> It really would be nice to utilize vm_page_array for such preallocated
>>> >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used.
>>> >>>> Things could be much easier then. In our case, it's about pages which
>>> >>>> are used for level 2 page tables. In VM_PHYSSEG_SPARSE model, we have
>>> >>>> two sets of such pages. First ones are preallocated and second ones
>>> >>>> are allocated after vm subsystem was inited. We must deal with each
>>> >>>> set differently. So code is more complex and so is debugging.
>>> >>>>
>>> >>>> Thus we need some method how to say that some part of physical memory
>>> >>>> should be included in vm_page_array, but the pages from that region
>>> >>>> should not be put to free list during initialization. We think that
>>> >>>> such possibility could be utilized in general. There could be a need
>>> >>>> for some physical space which:
>>> >>>>
>>> >>>> (1) is needed only during boot and later on it can be freed and put
>>> >>>> to vm subsystem,
>>> >>>>
>>> >>>> (2) is needed for something else and vm_page_array code could be used
>>> >>>> without some kind of its duplication.
>>> >>>>
>>> >>>> There is already some code which deals with blacklisted pages in
>>> >>>> vm_page.c file. So the easiest way how to deal with presented
>>> >>>> situation is to add some callback to this part of code which will be
>>> >>>> able to either exclude whole phys_avail[i], phys_avail[i+1] region or
>>> >>>> single pages. As the biggest phys_avail region is used for vm
>>> >>>> subsystem allocations, there should be some more coding. (However,
>>> >>>> blacklisted pages are not dealt with on that part of region.)
>>> >>>>
>>> >>>> We would like to know if there is any objection:
>>> >>>>
>>> >>>> (1) to deal with presented problem,
>>> >>>> (2) to deal with the problem presented way.
>>> >>>> Some help is very appreciated. Thanks
>>> >>>>
>>> >>>>
>>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of
>>> >>> phys_avail when sizing vm_page_array.  On amd64, where the same problem
>>> >>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now, this is
>>> >>> probably my preferred solution.  The catch being that not all
>>> >>> architectures implement dump_avail, but my recollection is that arm
>>> >>> does.
>>> >>>
>>> >> Frankly, I would prefer this too, but there is one big open question:
>>> >>
>>> >> What is dump_avail for?
>>> >>
>>> >>
>>> >>
>>> >> dump_avail[] is solving a similar problem in the minidump code, hence,
>>> >> the prefix "dump_" in its name.  In other words, the minidump code
>>> >> couldn't use phys_avail[] either because it didn't describe the full
>>> >> range of physical addresses that might be included in a minidump, so
>>> >> dump_avail[] was created.
>>> >>
>>> >> There is already precedent for what I'm suggesting.  dump_avail[] is
>>> >> already (ab)used outside of the minidump code on x86 to solve this same
>>> >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>>> >>
>>> >>
>>> >>  Using it for vm_page_array initialization and segmentation means that
>>> >> phys_avail must be a subset of it. And this must be stated and be
>>> >> visible enough. Maybe it should be even checked in code. I like the idea
>>> >> of thinking about dump_avail as something that describes all memory in
>>> >> a system, but it's not how dump_avail is defined in archs now.
>>> >>
>>> >>
>>> >>
>>> >> When you say "it's not how dump_avail is defined in archs now", I'm not
>>> >> sure whether you're talking about the code or the comments.  In terms
>>> >> of code, dump_avail[] is a superset of phys_avail[], and I'm not aware
>>> >> of any code that would have to change.  In terms of comments, I did a
>>> >> grep looking for comments defining what dump_avail[] is, because I
>>> >> couldn't remember any.  I found one ... on arm.  So, I don't think it's
>>> >> an onerous task changing the definition of dump_avail[].  :-)
>>> >>
>>> >> Already, as things stand today with dump_avail[] being used outside of
>>> >> the minidump code, one could reasonably argue that it should be renamed
>>> >> to something like phys_exists[].
>>> >>
>>> >>
>>> >>
>>> >> I will experiment with it on monday then. However, it's not only about
>>> >> how memory segments are created in vm_phys.c, but it's about how
>>> >> vm_page_array size is computed in vm_page.c too.
>>> >>
>>> >>
>>> >>
>>> >> Yes, and there is also a place in vm_reserv.c that needs to change.
>>> >> I've attached the patch that I developed and tested a long time ago.
>>> >> It still applies cleanly and runs ok on amd64.
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> > Well, I've created and tested minimalistic patch which - I hope - is
>>> > committable. It runs ok on pandaboard (arm-v6) and solves presented
>>> > problem. I would really appreciate if this will be committed. Thanks.
>>>
>>>
>>> Sorry for the slow reply.  I've just been swamped with work lately.  I
>>> finally had some time to look at this in the last day or so.
>>>
>>> The first thing that I propose to do is commit the attached patch.  This
>>> patch changes pmap_init() on amd64, armv6, and i386 so that it no longer
>>> consults phys_avail[] to determine the end of memory.  Instead, it calls
>>> a new function provided by vm_phys.c to obtain the same information from
>>> vm_phys_segs[].
>>>
>>> With this change, the new variable phys_managed in your patch wouldn't
>>> need to be a global.  It could be a local variable in vm_page_startup()
>>> that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>>>
>>> More generally, the long-term vision that I have is that we would stop
>>> using phys_avail[] after vm_page_startup() had completed.  It would only
>>> be used during initialization.  After that we would use vm_phys_segs[]
>>> and functions provided by vm_phys.c.
>>
>>
>> I understand. The patch and the long-term vision are fine for me. I just
>> was not bold enough to pass phys_managed as a parameter to vm_phys_init()
>> and vm_reserv_init(). However, I certainly was thinking about it. While
>> reading the comment above vm_phys_get_end(), do we care if the last usable
>> address is 0xFFFFFFFF?
>>
>>
>>
>> To date, this hasn't been a problem.  However, handling 0xFFFFFFFF is
>> easy.  So, the final version of the patch that I committed this weekend
>> does so.
>>
>> Can you please try the attached patch?  It replaces phys_avail[] with
>> vm_phys_segs[] in arm's busdma.
>
>
>
> It works fine on arm-v6 pandaboard. I have no objection to commit it.
> However, it's only 1:1 replacement.
>
>
>
> Right now, yes.  However, once your patch is committed, it won't be 1:1
> anymore, because vm_phys_segs[] will be populated based on dump_avail[]
> rather than phys_avail[].
>
> My interpretation of the affected code is that using the ranges defined
> by dump_avail[] is actually closer to what this code intended.
>


True in both cases. As you said, it's closer.


>
> In fact, I still keep the following pattern in my head:
>
> present memory in system <=> all RAM and whatsoever
> nobounce memory <=> addressable by DMA
>
>
>
> In general, I don't see how this can be an attribute of the memory,
> because it's going to depend on the device.  In other words, a given
> physical address may require bouncing for some device but not all devices.
>


True again. I was thinking of it as a property common to all DMA
devices on a platform. If it's not that, but a test for present RAM, then
dump_avail[] is closer. However, again, does dump_avail[] represent all
present RAM?
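
The superset relationship discussed earlier in the thread (phys_avail[]
being contained in dump_avail[]) could be checked along the following
lines. This is only a sketch under stated assumptions: paddr_t stands in
for the kernel's vm_paddr_t, ranges_subset() is a made-up name, both
arrays are taken to hold start/end pairs terminated by a pair whose end
is 0, and a range that spans two adjacent superset ranges is treated as
not contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;	/* stand-in for the kernel's vm_paddr_t */

/*
 * Return true if every [start, end) pair in 'sub' lies inside a single
 * pair of 'super'.  Both arrays hold start/end pairs and stop at a pair
 * whose end is 0.
 */
static bool
ranges_subset(const paddr_t *sub, const paddr_t *super)
{
	int i, j;
	bool found;

	for (i = 0; sub[i + 1] != 0; i += 2) {
		found = false;
		for (j = 0; super[j + 1] != 0; j += 2) {
			if (sub[i] >= super[j] && sub[i + 1] <= super[j + 1]) {
				found = true;
				break;
			}
		}
		if (!found)
			return (false);
	}
	return (true);
}
```

In the kernel such a check would more likely be a KASSERT during
startup. It only verifies containment, so it cannot answer whether
dump_avail[] covers all present RAM.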


>
>
> managed memory by vm subsystem  <=> i.e. kept in vm_page_array
> available memory for vm subsystem <=> can be allocated
>
> So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it
> could be too limiting in some scenarios. I would like to see something
> different in exclusion_bounce_check() in the future. Something that
> reflects the NOBOUNCE property and not the NOALLOC one like now.
>
>
>>
>>
>>
>>
>> Do you think that the rest of my patch considering changes due to your
>> patch is ok?
>>
>>
>>
>>
>> Basically, yes.  I do, however, think that
>>
>> +#if defined(__arm__)
>> +       phys_managed = dump_avail;
>> +#else
>> +       phys_managed = phys_avail;
>> +#endif
>>
>> should also be conditioned on VM_PHYSSEG_SPARSE.
>
>
>
>
> So I've prepared new patch. phys_managed[] is passed to vm_phys_init()
> and vm_reserv_init() as a parameter and small optimization is made in
> vm_page_startup(). I add VM_PHYSSEG_SPARSE condition to place you
> mentioned. Anyhow, I still think that this is only temporary hack. In
> general, phys_managed[] should always be distinguished from phys_avail[].
>
>
>>
>>
>>>
>>> >
>>> > BTW, while I was inspecting all archs, I think that maybe it's time to
>>> > do what was done for busdma not long ago. There are many similar codes
>>> > across archs which deal with physical memory and could be generalized
>>> > and put to kern/subr_physmem.c for utilization. All work with physical
>>> > memory could be simplified to two arrays of regions.
>>> >
>>> > phys_present[] ... describes all present physical memory regions
>>> > phys_exclude[] ... describes various exclusions from phys_present[]
>>> >
>>> > Each excluded region will be labeled by flags to say what kind of
>>> > exclusion it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE,
>>> > NOMEMRW could be combined. This idea is taken from
>>> > sys/arm/arm/physmem.c.
>>> >
>>> > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be
>>> > created from these phys_present[] and phys_exclude[].
>>> > This way bootstrap codes in archs could be simplified and unified. For
>>> > example, dealing with either hw.physmem or page with PA 0x00000000 could
>>> > be transparent.
>>> >
>>> > I'm prepared to volunteer if the thing is ripe. However, some tutor will
>>> > be looked for.
>>>
>>>
>>> I've never really looked at arm/arm/physmem.c before.  Let me do that
>>> before I comment on this.
>>>
>> No problem. This could be a long-term aim. However, I hope the
>> VM_PHYSSEG_SPARSE problem could be dealt with in MI code in the present
>> time. In any case, thanks for your help.
>>
>>
>>
>>
>
>

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 22:49:07 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2C8FC45C;
 Mon, 27 Oct 2014 22:49:07 +0000 (UTC)
Received: from mail-wi0-x233.google.com (mail-wi0-x233.google.com
 [IPv6:2a00:1450:400c:c05::233])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 9586DFA2;
 Mon, 27 Oct 2014 22:49:06 +0000 (UTC)
Received: by mail-wi0-f179.google.com with SMTP id h11so5822112wiw.12
 for <multiple recipients>; Mon, 27 Oct 2014 15:49:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:mime-version:content-type
 :content-disposition:user-agent;
 bh=c8u/AtI7w58RsWcr9Oj+CGdSjTpVcz3VkhOWG2wJlt0=;
 b=jKDs6FvPZ9kRQDcBXDlNu2pjR6jMX2uyvPzJsUy/DRI1w0OdgxrzG56wKwU11GLriz
 LkfbkqkzND6BjAZIh96eVgDzUGNlQqnzVnqNm3PdHAIxljIaLnwQ/63W/8pEjKvXJNti
 DNwPNGng/qOP41zHHXlgVIIUnXgCpWpkOolIyRdwd8YuGKCDEXbv5Kl+08J74kw5FCYl
 uuIt1OUeA8oWguFAmz62uJTLB8IBvqoeRzcQyxUQg1rH/hZ9URU09kRpUCN8+BGHzMH6
 Jq9w7oa13fxp4cZoTz6FgKLLjpR2UipN2zW/XaMRRqDw+9p2vda3SovnhHJtO2F56jqt
 Y8Lg==
X-Received: by 10.180.212.48 with SMTP id nh16mr354104wic.50.1414450144706;
 Mon, 27 Oct 2014 15:49:04 -0700 (PDT)
Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net.
 [2001:470:1f08:1f7::2])
 by mx.google.com with ESMTPSA id u4sm13361327wiy.9.2014.10.27.15.49.03
 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Mon, 27 Oct 2014 15:49:04 -0700 (PDT)
Date: Mon, 27 Oct 2014 23:49:01 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: freebsd-arch@freebsd.org
Subject: amd64 modules still use atomics as callable functions
Message-ID: <20141027224901.GC28049@dft-labs.eu>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Konstantin Belousov <kib@FreeBSD.org>, Alan Cox <alc@rice.edu>
X-List-Received-Date: Mon, 27 Oct 2014 22:49:07 -0000

Turns out several years ago the kernel was modified to provide actual
functions for atomic operations and modules are always using them.

I propose plugging it on amd64 in head.

For stable/10 we can always provide them, but inline in modules by default
(testing a KLD_WANT_ATOMIC_FUNC knob?).

diff --git a/sys/amd64/amd64/atomic.c b/sys/amd64/amd64/atomic.c
deleted file mode 100644
index 1b4ff7e..0000000
--- a/sys/amd64/amd64/atomic.c
+++ /dev/null
@@ -1,49 +0,0 @@
-/*-
- * Copyright (c) 1999 Peter Jeremy
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-
-#include <sys/cdefs.h>
-__FBSDID("$FreeBSD$");
-
-/* This file creates publically callable functions to perform various
- * simple arithmetic on memory which is atomic in the presence of
- * interrupts and multiple processors.
- */
-#include <sys/types.h>
-
-/* Firstly make atomic.h generate prototypes as it will for kernel modules */
-#define KLD_MODULE
-#include <machine/atomic.h>
-#undef _MACHINE_ATOMIC_H_	/* forget we included it */
-#undef KLD_MODULE
-#undef ATOMIC_ASM
-
-/* Make atomic.h generate public functions */
-#define WANT_FUNCTIONS
-#define static
-#undef __inline
-#define __inline
-
-#include <machine/atomic.h>
diff --git a/sys/amd64/include/atomic.h b/sys/amd64/include/atomic.h
index 9110dc5..e7e1735 100644
--- a/sys/amd64/include/atomic.h
+++ b/sys/amd64/include/atomic.h
@@ -69,28 +69,7 @@
  * The above functions are expanded inline in the statically-linked
  * kernel.  Lock prefixes are generated if an SMP kernel is being
  * built.
- *
- * Kernel modules call real functions which are built into the kernel.
- * This allows kernel modules to be portable between UP and SMP systems.
  */
-#if defined(KLD_MODULE) || !defined(__GNUCLIKE_ASM)
-#define	ATOMIC_ASM(NAME, TYPE, OP, CONS, V)			\
-void atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v);	\
-void atomic_##NAME##_barr_##TYPE(volatile u_##TYPE *p, u_##TYPE v)
-
-int	atomic_cmpset_int(volatile u_int *dst, u_int expect, u_int src);
-int	atomic_cmpset_long(volatile u_long *dst, u_long expect, u_long src);
-u_int	atomic_fetchadd_int(volatile u_int *p, u_int v);
-u_long	atomic_fetchadd_long(volatile u_long *p, u_long v);
-int	atomic_testandset_int(volatile u_int *p, u_int v);
-int	atomic_testandset_long(volatile u_long *p, u_int v);
-
-#define	ATOMIC_LOAD(TYPE, LOP)					\
-u_##TYPE	atomic_load_acq_##TYPE(volatile u_##TYPE *p)
-#define	ATOMIC_STORE(TYPE)					\
-void		atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)
-
-#else /* !KLD_MODULE && __GNUCLIKE_ASM */
 
 /*
  * For userland, always use lock prefixes so that the binaries will run
@@ -293,8 +272,6 @@ struct __hack
 
 #endif /* _KERNEL && !SMP */
 
-#endif /* KLD_MODULE || !__GNUCLIKE_ASM */
-
 ATOMIC_ASM(set,	     char,  "orb %b1,%0",  "iq",  v);
 ATOMIC_ASM(clear,    char,  "andb %b1,%0", "iq", ~v);
 ATOMIC_ASM(add,	     char,  "addb %b1,%0", "iq",  v);
diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64
index 9e5a2ed..0749b05 100644
--- a/sys/conf/files.amd64
+++ b/sys/conf/files.amd64
@@ -91,7 +91,6 @@ acpi_wakedata.h			optional	acpi			\
 #
 amd64/amd64/amd64_mem.c		optional	mem
 #amd64/amd64/apic_vector.S	standard
-amd64/amd64/atomic.c		standard
 amd64/amd64/autoconf.c		standard
 amd64/amd64/bios.c		standard
 amd64/amd64/bpf_jit_machdep.c	optional	bpf_jitter

-- 
Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 23:14:02 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id ADD6FA2C
 for <arch@FreeBSD.org>; Mon, 27 Oct 2014 23:14:02 +0000 (UTC)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "funkthat.com", Issuer "funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7FCD0311
 for <arch@FreeBSD.org>; Mon, 27 Oct 2014 23:14:02 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9RNE18v076592
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
 for <arch@FreeBSD.org>; Mon, 27 Oct 2014 16:14:01 -0700 (PDT)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9RNE1ZV076591
 for arch@FreeBSD.org; Mon, 27 Oct 2014 16:14:01 -0700 (PDT)
 (envelope-from jmg)
Date: Mon, 27 Oct 2014 16:14:01 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: arch@FreeBSD.org
Subject: boot man pages installed four times..
Message-ID: <20141027231401.GQ82214@funkthat.com>
Mail-Followup-To: arch@FreeBSD.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Mon, 27 Oct 2014 16:14:01 -0700 (PDT)
X-List-Received-Date: Mon, 27 Oct 2014 23:14:02 -0000

So, our loader man pages are currently installed four different times
during installworld...  Once each during sys/boot/userboot/userboot,
sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader.

This is because sys/boot/common/Makefile.inc defines the man pages, and
each of these locations includes that Makefile...

It seems like the logical thing to do is to create a sys/boot/man that
only installs man pages...  This will partly move us to always
installing all man pages on all archs...

Comments?

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Mon Oct 27 23:34:34 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B61E1EB6;
 Mon, 27 Oct 2014 23:34:34 +0000 (UTC)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "funkthat.com", Issuer "funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 84065758;
 Mon, 27 Oct 2014 23:34:34 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9RNYXVK076794
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Mon, 27 Oct 2014 16:34:33 -0700 (PDT)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9RNYXv0076793;
 Mon, 27 Oct 2014 16:34:33 -0700 (PDT) (envelope-from jmg)
Date: Mon, 27 Oct 2014 16:34:33 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: amd64 modules still use atomics as callable functions
Message-ID: <20141027233432.GR82214@funkthat.com>
Mail-Followup-To: Mateusz Guzik <mjguzik@gmail.com>,
 freebsd-arch@freebsd.org, Konstantin Belousov <kib@freebsd.org>,
 Alan Cox <alc@rice.edu>
References: <20141027224901.GC28049@dft-labs.eu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141027224901.GC28049@dft-labs.eu>
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Mon, 27 Oct 2014 16:34:33 -0700 (PDT)
Cc: Alan Cox <alc@rice.edu>, Konstantin Belousov <kib@freebsd.org>,
 freebsd-arch@freebsd.org
X-List-Received-Date: Mon, 27 Oct 2014 23:34:34 -0000

Mateusz Guzik wrote this message on Mon, Oct 27, 2014 at 23:49 +0100:
> Turns out several years ago the kernel was modified to provide actual
> functions for atomic operations and modules are always using them.
> 
> I propose plugging it on amd64 in head.

It'd be interesting to measure the difference between making the call,
and the cost of the lock prefix...

On modern processors, according to instruction_tables.pdf, the lock
prefix costs between 5 and 45 cycles...  It could be more on older
processors...  Though another reference says that function call
overhead is in the 7-9 cycle range, so w/o measuring, I'm not so sure
this is a good idea...

Originally I was in favor of this, as amd64 systems that aren't SMP
are getting rarer by the day...  But, considering that many locking
ops (if contended) will take a lot longer, I'm not so sure that
inlining will save you that much...

It'd be useful to see a comparison between:
LOCK'd inlined
LOCK'd via function call
non-LOCK'd inlined
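
A minimal sketch of such a comparison, written against C11 <stdatomic.h>
rather than the kernel's machine/atomic.h, might look like the following.
Every name in it is hypothetical, the noinline attribute (a GCC/Clang
extension) is only an assumption about how to force the out-of-line call,
and it reports wall-clock nanoseconds rather than cycles, so any numbers
it produces are rough at best.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical counters; these stand in for a contended kernel variable. */
static atomic_uint locked_counter;
static volatile unsigned plain_counter;

/*
 * Out-of-line variant: stands in for the real function that modules call.
 * On x86 the atomic add is expected to compile to a LOCK-prefixed instruction.
 */
__attribute__((noinline)) void
locked_add_call(atomic_uint *p, unsigned v)
{
	atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
}

/* Wall-clock time in nanoseconds, using the C11 timespec_get(). */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
}

/* Case 1: LOCK'd, inlined at the call site. */
uint64_t
bench_locked_inline(unsigned n)
{
	uint64_t start = now_ns();

	for (unsigned i = 0; i < n; i++)
		atomic_fetch_add_explicit(&locked_counter, 1,
		    memory_order_seq_cst);
	return (now_ns() - start);
}

/* Case 2: LOCK'd, reached through a real function call. */
uint64_t
bench_locked_call(unsigned n)
{
	uint64_t start = now_ns();

	for (unsigned i = 0; i < n; i++)
		locked_add_call(&locked_counter, 1);
	return (now_ns() - start);
}

/* Case 3: plain (non-LOCK'd) increment; volatile keeps the loop alive. */
uint64_t
bench_plain_inline(unsigned n)
{
	uint64_t start = now_ns();

	for (unsigned i = 0; i < n; i++)
		plain_counter++;
	return (now_ns() - start);
}
```

Calling each bench_* function with the same large n on an otherwise idle
machine and comparing the returned times would give the three data points
listed above.  This is uncontended, single-threaded timing only.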

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 00:25:21 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 1E25AA2B
 for <arch@freebsd.org>; Tue, 28 Oct 2014 00:25:21 +0000 (UTC)
Received: from mail-ie0-x231.google.com (mail-ie0-x231.google.com
 [IPv6:2607:f8b0:4001:c03::231])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E4D6AC76
 for <arch@freebsd.org>; Tue, 28 Oct 2014 00:25:20 +0000 (UTC)
Received: by mail-ie0-f177.google.com with SMTP id tp5so5506143ieb.36
 for <arch@freebsd.org>; Mon, 27 Oct 2014 17:25:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :content-type; bh=gJ7dZyQEz5+B2Rh5iXPRmvr1nx5Yaq7o8rULuVScrxE=;
 b=mBXg2SXjDnNj18gGW1yQf1ZOm/uD9b+5dUxQ+dr76FHHXzQdfGLg2Xg5jwN9/N8WsE
 UBiInq5xdm2I3Ho2W1VOXS2lfis83VODW6Fo7LeP3Pe8nCnNsWgPNjraAfGIrPnFhfHl
 W25H00pfka0BP+BOUPGScpJpXhid6A6fRUhqe8UGF0Bm6TTz/A4LgbMrPr0tx25eiG6H
 7UnbvIDf8MRgpzWRfFaF78x5AtpDzsf6LPfv4tSiUrIWo4M4yJ3X75ts5gC9VJw9L0Og
 NH3EJpR7KwRaqFW6WAgOFpi6v9CBxI1QHw2hTVjxx6BQMuDD0tq/41BnJwcCZrDcfO6B
 gzGQ==
MIME-Version: 1.0
X-Received: by 10.107.29.209 with SMTP id d200mr6759792iod.57.1414455920206;
 Mon, 27 Oct 2014 17:25:20 -0700 (PDT)
Received: by 10.50.193.135 with HTTP; Mon, 27 Oct 2014 17:25:20 -0700 (PDT)
In-Reply-To: <20141027231401.GQ82214@funkthat.com>
References: <20141027231401.GQ82214@funkthat.com>
Date: Mon, 27 Oct 2014 17:25:20 -0700
Message-ID: <CAGHfRMApPwy4wB0Wb29kjoXD8W=sJTjRcHHDtuVK-dqk18HpbA@mail.gmail.com>
Subject: Re: boot man pages installed four times..
From: NGie Cooper <yaneurabeya@gmail.com>
To: "freebsd-arch@FreeBSD.org Arch" <arch@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 00:25:21 -0000

On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
> So, our loader man pages are currently installed four different times
> during installworld...  Once each during sys/boot/userboot/userboot,
> sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader
>
> This is because sys/boot/common/Makefile.inc defines the man pages, and
> each of these locations include that Makefile...
>
> It seems like the logical thing to do is to create a sys/boot/man that
> only installed man pages...  This will partly move us to always
> installing all man pages on all archs...

Should these manpages just be installed as part of
share/man/man<section> instead?
Cheers!

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 01:23:33 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C202D9E5
 for <arch@freebsd.org>; Tue, 28 Oct 2014 01:23:33 +0000 (UTC)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "funkthat.com", Issuer "funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7E568376
 for <arch@freebsd.org>; Tue, 28 Oct 2014 01:23:32 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9S1NV3v077964
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Mon, 27 Oct 2014 18:23:31 -0700 (PDT)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9S1NVqo077963;
 Mon, 27 Oct 2014 18:23:31 -0700 (PDT) (envelope-from jmg)
Date: Mon, 27 Oct 2014 18:23:31 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: NGie Cooper <yaneurabeya@gmail.com>
Subject: Re: boot man pages installed four times..
Message-ID: <20141028012331.GT82214@funkthat.com>
Mail-Followup-To: NGie Cooper <yaneurabeya@gmail.com>,
 "freebsd-arch@FreeBSD.org Arch" <arch@freebsd.org>
References: <20141027231401.GQ82214@funkthat.com>
 <CAGHfRMApPwy4wB0Wb29kjoXD8W=sJTjRcHHDtuVK-dqk18HpbA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAGHfRMApPwy4wB0Wb29kjoXD8W=sJTjRcHHDtuVK-dqk18HpbA@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Mon, 27 Oct 2014 18:23:31 -0700 (PDT)
Cc: "freebsd-arch@FreeBSD.org Arch" <arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 01:23:33 -0000

NGie Cooper wrote this message on Mon, Oct 27, 2014 at 17:25 -0700:
> On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
> > So, our loader man pages are currently installed four different times
> > during installworld...  Once each during sys/boot/userboot/userboot,
> > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader
> >
> > This is because sys/boot/common/Makefile.inc defines the man pages, and
> > each of these locations include that Makefile...
> >
> > It seems like the logical thing to do is to create a sys/boot/man that
> > only installed man pages...  This will partly move us to always
> > installing all man pages on all archs...
> 
> Should these manpages just be installed as part of
> share/man/man<section> instead?

That would involve moving the man pages from sys/boot into share/man
which IMO doesn't make much sense...  Yes, they could be installed
from wherever we want, but they are usually installed from where
they reside..

Looks like only atf is installing from share/man when their pages are
located elsewhere...  We shouldn't introduce more, and atf should be
fixed...  and it's only doing it for two man pages...

Hmm... atf-test-case.4 seems to be in the wrong section too...  section
4 is for devices and device drivers, but atf-test-case doesn't have
any relation to the kernel...  It should probably be moved into section
7...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 01:38:44 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9AAB7D8B
 for <arch@freebsd.org>; Tue, 28 Oct 2014 01:38:44 +0000 (UTC)
Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com
 [IPv6:2607:f8b0:4001:c05::22b])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 623C7674
 for <arch@freebsd.org>; Tue, 28 Oct 2014 01:38:44 +0000 (UTC)
Received: by mail-ig0-f171.google.com with SMTP id l13so7520176iga.16
 for <arch@freebsd.org>; Mon, 27 Oct 2014 18:38:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :content-type; bh=9zJD3siXNgT+6dlgmvHbJF8qc/tQvRviFPp+YFrcBgQ=;
 b=RvicKfs2+EX/+9o00gAQ13SsQKdSGmIy5UtPqu4ZRlJPdn6s2+va63yDm9vA5I+aw1
 3H5cC0eNGpQCPT9OOQEjzeqnXPAc+5YuDcUCZ2Y3nI5vaIhGu27uSPaUiTz6a0pXAo+K
 8peMj20p2q+ZpQPEUUt6cxijfNeA4B4EmBw+ZAtOz7ZO8mH2qNqp8NIgwhE+Yp0mu2Fa
 UQz2XATCJhUPKfwG8Aes7tUK5o3bgV+IIhkp1aVJoD5SKkdTbvhV+OdN1KAQgvN75lq0
 mhIHpgI4hg3fg6ndlI34c2lshcYXW+mxvqKhMHdXxKR24pkvSAw6RqXytPz927py9ivF
 JPBw==
MIME-Version: 1.0
X-Received: by 10.107.18.1 with SMTP id a1mr115739ioj.83.1414460323805; Mon,
 27 Oct 2014 18:38:43 -0700 (PDT)
Received: by 10.50.193.135 with HTTP; Mon, 27 Oct 2014 18:38:43 -0700 (PDT)
In-Reply-To: <20141028012331.GT82214@funkthat.com>
References: <20141027231401.GQ82214@funkthat.com>
 <CAGHfRMApPwy4wB0Wb29kjoXD8W=sJTjRcHHDtuVK-dqk18HpbA@mail.gmail.com>
 <20141028012331.GT82214@funkthat.com>
Date: Mon, 27 Oct 2014 18:38:43 -0700
Message-ID: <CAGHfRMBDdnNZyy=zM+mvNiTcEAjP9SXfDnJtwwa2-08Os5RMvw@mail.gmail.com>
Subject: Re: boot man pages installed four times..
From: NGie Cooper <yaneurabeya@gmail.com>
To: NGie Cooper <yaneurabeya@gmail.com>, 
 "freebsd-arch@FreeBSD.org Arch" <arch@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 01:38:44 -0000

On Mon, Oct 27, 2014 at 6:23 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
> NGie Cooper wrote this message on Mon, Oct 27, 2014 at 17:25 -0700:
>> On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
>> > So, our loader man pages are currently installed four different times
>> > during installworld...  Once each during sys/boot/userboot/userboot,
>> > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader
>> >
>> > This is because sys/boot/common/Makefile.inc defines the man pages, and
>> > each of these locations include that Makefile...
>> >
>> > It seems like the logical thing to do is to create a sys/boot/man that
>> > only installed man pages...  This will partly move us to always
>> > installing all man pages on all archs...
>>
>> Should these manpages just be installed as part of
>> share/man/man<section> instead?
>
> That would involve moving the man pages from sys/boot into share/man
> which IMO doesn't make much sense...  Yes, they could be installed
> from wherever we want, but they are usually installed from where
> they reside..
>
> Looks like only atf is installing from share/man when their pages are
> located elsewhere...  We shouldn't introduce more, and atf should be
> fixed...  and it's only doing it for two man pages...
>
> Hmm... atf-test-case.4 seems to be in the wrong section too...  section
> 4 is for devices and device drivers, but atf-test-case doesn't have
> any relation to the kernel...  It should probably be moved into section
> 7...

Yes, I thought so too. Please file a bug and CC both jmmv and myself.
Thanks!

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 02:21:57 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3CB3D7B3
 for <arch@freebsd.org>; Tue, 28 Oct 2014 02:21:57 +0000 (UTC)
Received: from mail-yh0-f51.google.com (mail-yh0-f51.google.com
 [209.85.213.51])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id ED558AE5
 for <arch@freebsd.org>; Tue, 28 Oct 2014 02:21:56 +0000 (UTC)
Received: by mail-yh0-f51.google.com with SMTP id c41so2066826yho.10
 for <arch@freebsd.org>; Mon, 27 Oct 2014 19:21:55 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:sender:content-type:mime-version:subject:from
 :in-reply-to:date:cc:message-id:references:to;
 bh=EApchF/rk92Xl/FWh/6NIcjeVyIlBA9HVcxBjOp6Ytw=;
 b=XdF5nyzBfzN5qqNbXyXMJA9TRQ9XxoSsQPBq3CFjueUauHF1d9MCEti5NnoeNL5C3/
 Mrp9JTH3uNjsQHZ4zPy3ladOYiQ+AVNnRD/hiNcvg7AovzeWLC2liFizr4YwT+v+2FWJ
 TV94d8EuJMnb8plLhCIXnhh+rn7SH5K5MLyfqDEyNc0k10bmH3Gkm6l6vSI6wesOFtZe
 okg7mJYgYFwempfm9HiuG3tasJiKCFVbN53lYYR12HcG/NzM6ALHi0m5QbuvxzHSU07L
 SyNZ4nWPmgETkAR2HgZmhPl3fHjChD3GWlaUc5WKuTBU5gne0UczAyHyEwUKpGnfJOGQ
 dUGw==
X-Gm-Message-State: ALoCoQmNQmG1assvDlfXCUVFn3x1TdVKoRk82OaPL4OPrThlMk9hrP7ObEc9o9iDFgezm+gmVqb4
X-Received: by 10.236.47.196 with SMTP id t44mr156111yhb.59.1414462915751;
 Mon, 27 Oct 2014 19:21:55 -0700 (PDT)
Received: from [192.168.0.14] (173-18-133-79.client.mchsi.com. [173.18.133.79])
 by mx.google.com with ESMTPSA id c76sm84202yho.12.2014.10.27.19.21.55
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 27 Oct 2014 19:21:55 -0700 (PDT)
Sender: Warner Losh <wlosh@bsdimp.com>
Content-Type: multipart/signed;
 boundary="Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628";
 protocol="application/pgp-signature"; micalg=pgp-sha512
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Subject: Re: boot man pages installed four times..
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <20141027231401.GQ82214@funkthat.com>
Date: Mon, 27 Oct 2014 21:16:56 -0500
Message-Id: <1EC3043C-72FD-4790-B833-8E89C39B3FB9@bsdimp.com>
References: <20141027231401.GQ82214@funkthat.com>
To: John-Mark Gurney <jmg@funkthat.com>
X-Mailer: Apple Mail (2.1878.6)
Cc: arch@FreeBSD.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 02:21:57 -0000


--Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252


On Oct 27, 2014, at 6:14 PM, John-Mark Gurney <jmg@funkthat.com> wrote:

> So, our loader man pages are currently installed four different times
> during installworld...  Once each during sys/boot/userboot/userboot,
> sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader
>
> This is because sys/boot/common/Makefile.inc defines the man pages, and
> each of these locations include that Makefile...
>
> It seems like the logical thing to do is to create a sys/boot/man that
> only installed man pages...  This will partly move us to always
> installing all man pages on all archs...
>
> Comments?

We should have a common set installed from a new directory, and if
there's a need for variations we should install them from the current
locations (and make sure there's a cross ref from the common ones).

Warner


--Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJUTvyZAAoJEGwc0Sh9sBEAVgkQAJv+JRN2IRli1vP49SehN1SA
aIuRReJwZ7La6KghN00m67APJQh7J67s0e0PRKz+gA2G71zneQF7yPU4MHdIeCxk
oZGFtsGK2YPvI75xhDJY6iQKALy9eshNS5EXoHAznnf/VOX6eRPBkt/EfhO6eykb
zaX2zY6gvVMUddEL7s1CwCvKPkdlSxBrQRh6kfpyVO5SVKgP/f4dzRxuHIoiFhsq
chFliavKvUrRF/oFx+XORvkozQntqckn/NdPYsbj3a8DkF/Tl5vq4iW2bpjWRsz3
GiRsFZV3wa1slsjUgImkv4VSoFQGqVq8WRxYYdpUYFTPRlm9c9dHUWISmKSM4IdF
d/W3JoK6Jg95SglpzqIxTPXZ3JfYC6QD+zm/QUAs1XbFabe7qAY8TGqtvfSISWxL
IrxrDYQ89yTNMrG/P0zeGztmQfzLPXcW1aJlLUGBBcF6jTx5t7FgmH71KrC3u8FU
C8cm6mCE1YLDgColUapnBaD/QoQ4vpJuMTAxBnYGELdsDUVAE8PWdf6th6YifE74
bxM+z8dLUU5S1ie3icGjPrep9jNXysNGgmv4aq9OfH4QUcMT2R109fD5yt4M1v3J
zfrE2SOZsnk/izwUtLDlrtHHZXVN8IHevHneUE3vriU7+Sasm80I7KHI2SC7y/5K
7bK3DzrW8BmWPSqsVbRi
=6jce
-----END PGP SIGNATURE-----

--Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628--

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 02:52:28 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A52EEBB1;
 Tue, 28 Oct 2014 02:52:28 +0000 (UTC)
Received: from mail-wg0-x22a.google.com (mail-wg0-x22a.google.com
 [IPv6:2a00:1450:400c:c00::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C39D4D54;
 Tue, 28 Oct 2014 02:52:27 +0000 (UTC)
Received: by mail-wg0-f42.google.com with SMTP id k14so5028055wgh.13
 for <multiple recipients>; Mon, 27 Oct 2014 19:52:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:mime-version:content-type
 :content-disposition:user-agent;
 bh=xjJ6BspOw1GD30+/rPxX3KdG4RukQlnlA2cnx0tgkoI=;
 b=GcltBRlOf1fjMv4YV/+OEDg9P8zVdTizH8ZDcqIGFgAs1QMqo79OFJaGPZdSZTnhjc
 MytewOutgwS5DGlPcYoAJAC7YkyKWPXLRttXE7JO51vVEjTmTAX/3N7FOHRCzPkvtbxZ
 gEdxFgaKvMmnzsyTRWgWgkl+2Jb3B+hjAEHuUbRyChGsLJ0hH5ZQA9nGhy3NSxiqEcu4
 643hAeQCd9YNXsLXO/T2JPyCBpHtVVAWIvTHIX9ZYGPGMOcyOYRH0pNjB0iFYB0DcHs9
 N2lplH6PiVAqj+6osCZUOKAZ0VdTA9vPrMg+K+SGAsqp/w/ruqxswtzSsdCLz83tWtfr
 MhSg==
X-Received: by 10.180.90.65 with SMTP id bu1mr1089117wib.71.1414464746012;
 Mon, 27 Oct 2014 19:52:26 -0700 (PDT)
Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net.
 [2001:470:1f08:1f7::2])
 by mx.google.com with ESMTPSA id dc8sm593915wib.7.2014.10.27.19.52.24
 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Mon, 27 Oct 2014 19:52:25 -0700 (PDT)
Date: Tue, 28 Oct 2014 03:52:22 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: freebsd-arch@freebsd.org
Subject: atomic ops
Message-ID: <20141028025222.GA19223@dft-labs.eu>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Attilio Rao <attilio@FreeBSD.org>, adrian@freebsd.org,
 Konstantin Belousov <kib@FreeBSD.org>, Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 02:52:28 -0000

As was mentioned some time ago, our situation related to atomic ops is
not ideal.

atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
full memory barriers, which is stronger than needed.

Moreover, load is implemented as lock cmpxchg on var address, so it is
additionally slower, especially when cpus compete.

On amd64 it is sufficient to place a compiler barrier in such cases.

Next, we lack some atomic ops in the first place.

Let's define some useful terms:
smp_wmb - no writes can be reordered past this point
smp_rmb - no reads can be reordered past this point

With this in mind, we lack ops which would guarantee only the following:

1. var = tmp; smp_wmb();
2. tmp = var; smp_rmb();
3. smp_rmb(); tmp = var;

This matters since what we can already use to emulate this is way
heavier than needed on the aforementioned amd64 and most likely other archs.
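
To make the three cases concrete, here is a rough sketch using C11
<stdatomic.h> instead of the kernel headers.  The function names are
hypothetical, and mapping smp_wmb()/smp_rmb() onto release/acquire
fences is an assumption of this sketch: those fences are at least as
strong as a write-only or read-only barrier, not an exact equivalent.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-ins.  A release fence orders earlier loads and
 * stores before later stores; an acquire fence orders earlier loads
 * before later loads and stores.  Both are a superset of what a pure
 * write or read barrier has to guarantee.
 */
#define	smp_wmb()	atomic_thread_fence(memory_order_release)
#define	smp_rmb()	atomic_thread_fence(memory_order_acquire)

/* Case 1: var = tmp; smp_wmb(); */
static inline void
store_then_wmb(atomic_int *var, int tmp)
{
	atomic_store_explicit(var, tmp, memory_order_relaxed);
	smp_wmb();
}

/* Case 2: tmp = var; smp_rmb(); */
static inline int
load_then_rmb(atomic_int *var)
{
	int tmp = atomic_load_explicit(var, memory_order_relaxed);

	smp_rmb();
	return (tmp);
}

/* Case 3: smp_rmb(); tmp = var; */
static inline int
rmb_then_load(atomic_int *var)
{
	smp_rmb();
	return (atomic_load_explicit(var, memory_order_relaxed));
}
```

On amd64 neither fence should need a fence instruction, only a
restriction on compiler reordering, which is the point made above about
a compiler barrier being sufficient there.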

It is unclear to me whether it makes sense to alter what
atomic_load_acq_* are currently doing.

The simplest thing would be to just introduce the aforementioned macros.

Unfortunately I don't have any ideas for new function names.

I was considering stealing consumer/producer wording instead of acq/rel,
but that does not help with case 1.

Also there is no common header for atomic ops.

I propose adding sys/atomic.h which includes machine/atomic.h. It would
then provide the atomic ops missing from the MD header, implemented using
what is already there.

For an example where it could be useful see
https://svnweb.freebsd.org/base/head/sys/sys/seq.h?view=markup

Comments?

And yes, I know that:
- atomic_load_acq_rmb_int is a terrible name and I'm trying to get rid
  of it
- seq_consistent misses a read memory barrier, but in the worst case this
  will result in a spurious ENOTCAPABLE being returned. The security problem
  of circumventing capabilities is plugged since seq is properly re-checked
  before we return
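
For readers who have not looked at seq.h, the general sequence-counter
pattern and the place where the reader-side barrier belongs can be
sketched as below.  This is not the seq.h code: the demo_* names are
made up, it uses C11 <stdatomic.h>, and it assumes writers are already
serialized by some other lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_seq {
	atomic_uint	seq;	/* odd while a write is in progress */
	atomic_int	data;	/* the value being protected */
};

/* Writer: make the counter odd before touching the data. */
static inline void
demo_write_begin(struct demo_seq *s)
{
	unsigned cur = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, cur + 1, memory_order_relaxed);
	/* Write barrier: the odd counter is ordered before the data stores. */
	atomic_thread_fence(memory_order_release);
}

/* Writer: make the counter even again once the data is in place. */
static inline void
demo_write_end(struct demo_seq *s)
{
	unsigned cur = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, cur + 1, memory_order_release);
}

/* Reader: wait for an even counter and remember it. */
static inline unsigned
demo_read_begin(struct demo_seq *s)
{
	unsigned v;

	while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1)
		;	/* a write is in progress, try again */
	return (v);
}

/* Reader: true if no write happened since demo_read_begin(). */
static inline bool
demo_read_consistent(struct demo_seq *s, unsigned start)
{
	/* Read barrier: the data loads are ordered before the re-check. */
	atomic_thread_fence(memory_order_acquire);
	return (atomic_load_explicit(&s->seq, memory_order_relaxed) == start);
}
```

A reader calls demo_read_begin(), loads the data, and retries if
demo_read_consistent() returns false.  The acquire fence in
demo_read_consistent() plays the role of the read memory barrier
mentioned in the last bullet.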

-- 
Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 13:18:44 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9FDC94C1;
 Tue, 28 Oct 2014 13:18:44 +0000 (UTC)
Received: from mail-wi0-x22f.google.com (mail-wi0-x22f.google.com
 [IPv6:2a00:1450:400c:c05::22f])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E626BA2D;
 Tue, 28 Oct 2014 13:18:43 +0000 (UTC)
Received: by mail-wi0-f175.google.com with SMTP id h11so7294853wiw.14
 for <multiple recipients>; Tue, 28 Oct 2014 06:18:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:sender:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type;
 bh=FeHGoc80ObjiUvi06oKq+pUHv4vHaXaGrEZ7P6f5mec=;
 b=Xb9ZRDhWusFnS6KOK6De8hUD2/qpu2nu5kWjtchYMLsY0ye/G9QGLbLgZ4deZHRo7T
 hJ4ku3h9mmTfn35gZ0w+jN/cdBn5AyN6rDVDRhbQxYK2tojzTXP6Jd6JPpqT55WafK3L
 5sZD3s3CTwSFXG2lKJE3cV3N+ADZ3kaj2wqvI7Zg/BsCtKVBJ9DEX6LJp8MI0AUUPLWV
 IsbEWvAn9RYWZXdY5Uy7hHoB60/THTGHKxz2wu7VjA661ICWOjYpGyYKCS3zI+jbvOwz
 fGVwPCfTNFUYITy3EP+pRso3Q5Ptu/M+26TjaSQCU7GW+bhuE1wRqq3iD/PCqwQ0I48+
 KLKg==
MIME-Version: 1.0
X-Received: by 10.180.10.231 with SMTP id l7mr28262950wib.1.1414502321855;
 Tue, 28 Oct 2014 06:18:41 -0700 (PDT)
Reply-To: attilio@FreeBSD.org
Sender: asmrookie@gmail.com
Received: by 10.217.69.73 with HTTP; Tue, 28 Oct 2014 06:18:41 -0700 (PDT)
In-Reply-To: <20141028025222.GA19223@dft-labs.eu>
References: <20141028025222.GA19223@dft-labs.eu>
Date: Tue, 28 Oct 2014 14:18:41 +0100
X-Google-Sender-Auth: 1ORo-3u8UGc8pxN1KyKytrYDHKI
Message-ID: <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
Subject: Re: atomic ops
From: Attilio Rao <attilio@freebsd.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: Adrian Chadd <adrian@freebsd.org>, Alan Cox <alc@rice.edu>,
 Konstantin Belousov <kib@freebsd.org>,
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 13:18:44 -0000

On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> As was mentioned some time ago, our situation related to atomic ops is
> not ideal.
>
> atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> full memory barriers, which is stronger than needed.
>
> Moreover, load is implemented as lock cmpxchg on var address, so it is
> additionally slower, especially when cpus compete.

I already explained this once privately: full memory barriers are not
stronger than needed.
FreeBSD has different semantics than Linux. We historically enforce a
full barrier on _acq() and _rel() rather than just a read and write
barrier, hence we need a different implementation than Linux.
There is code that relies on this property, like the locking
primitives (releasing a mutex, for instance).

In short: optimizing the implementation for performance is fine and
due. Changing the semantics is not fine, unless you have reviewed and
fixed all the uses of _rel() and _acq().
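
As an illustration of the kind of use I mean, here is a toy lock that
depends on the ordering of its acquire and release operations.  It is a
sketch in C11 <stdatomic.h> with invented demo_* names, not the kernel
mutex code, and C11 acquire/release are one-way barriers, so they may be
weaker than what our _acq()/_rel() have historically provided.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_mtx {
	atomic_bool	locked;
	int		protected_value;	/* only touched with the lock held */
};

/*
 * Acquire: accesses inside the critical section must not be moved
 * before the point where the lock is taken.
 */
static inline bool
demo_mtx_trylock(struct demo_mtx *m)
{
	bool expected = false;

	return (atomic_compare_exchange_strong_explicit(&m->locked,
	    &expected, true, memory_order_acquire, memory_order_relaxed));
}

/*
 * Release: accesses inside the critical section must not be moved
 * after the store that drops the lock.
 */
static inline void
demo_mtx_unlock(struct demo_mtx *m)
{
	atomic_store_explicit(&m->locked, false, memory_order_release);
}
```

If the store in demo_mtx_unlock() were weakened to a write-only barrier,
a load of protected_value could in principle drift past the unlock,
which is why every such caller has to be reviewed before the semantics
change.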

> On amd64 it is sufficient to place a compiler barrier in such cases.
>
> Next, we lack some atomic ops in the first place.
>
> Let's define some useful terms:
> smp_wmb - no writes can be reordered past this point
> smp_rmb - no reads can be reordered past this point
>
> With this in mind, we lack ops which would guarantee only the following:
>
> 1. var = tmp; smp_wmb();
> 2. tmp = var; smp_rmb();
> 3. smp_rmb(); tmp = var;
>
> This matters since what we can use already to emulate this is way
> heavier than needed on aforementioned amd64 and most likely other archs.

I can see the value of such barriers in case you just want to
synchronize operations with regard to reads or writes.
I also believe that on the newest Intel processors (for which we should
optimize) rmb() and wmb() got significantly faster than mb(). However
the most interesting case would be for arm and mips, I assume. That's
where you would see a bigger perf difference if you optimize the
membar paths.

Last time I looked into it, in the FreeBSD kernel the Linux-ish
rmb()/wmb()/etc. were used primarily in 3 places: Linux-derived code,
handling of 16-bit operands and implementation of "faster" bus
barriers.
Initially I had thought about just confining the smp_*() in a Linux
compat layer and fixing the other 2 in this way: for 16-bit operands
just pad to 32 bits, as the C11 standard also does. For the bus
barriers, just grow more versions to actually include the rmb()/wmb()
scheme within.

At this point, I understand we may want to instead support the
concept of a write-only or read-only barrier. This means that if we want
to keep the concept tied to the current _acq()/_rel() scheme we will
end up with a KPI explosion.

I'm not the one making the call here, but for a faster and more
granular approach, possibly we can end up using smp_rmb() and
smp_wmb() directly. As I said I'm not the one making the call.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 13:43:04 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9194EDEF;
 Tue, 28 Oct 2014 13:43:04 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 34110D26;
 Tue, 28 Oct 2014 13:43:04 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9SDgto5027853
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 28 Oct 2014 15:42:55 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9SDgto5027853
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s9SDgtSQ027852;
 Tue, 28 Oct 2014 15:42:55 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Tue, 28 Oct 2014 15:42:54 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: atomic ops
Message-ID: <20141028134254.GD1877@kib.kiev.ua>
References: <20141028025222.GA19223@dft-labs.eu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141028025222.GA19223@dft-labs.eu>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: Attilio Rao <attilio@FreeBSD.org>, adrian@freebsd.org,
 Alan Cox <alc@rice.edu>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 13:43:04 -0000

On Tue, Oct 28, 2014 at 03:52:22AM +0100, Mateusz Guzik wrote:
> As was mentioned sometime ago, our situation related to atomic ops is
> not ideal.
> 
> atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> full memory barriers, which is stronger than needed.
On x86, atomic_store_rel() does not issue any CPU barrier, because the
architecture's memory model already provides the needed guarantees.

> 
> Moreover, load is implemented as lock cmpxchg on var address, so it is
> additionally slower, especially when cpus compete.
> 
> On amd64 it is sufficient to place a compiler barrier in such cases.
> 
> Next, we lack some atomic ops in the first place.
> 
> Let's define some useful terms:
> smp_wmb - no writes can be reordered past this point
> smp_rmb - no reads can be reordered past this point
> 
> With this in mind, we lack ops which would guarantee only the following:
> 
> 1. var = tmp; smp_wmb();
> 2. tmp = var; smp_rmb();
> 3. smp_rmb(); tmp = var;
> 
> This matters since what we can use already to emulate this is way
> heavier than needed on aforementioned amd64 and most likely other archs.
> 
> It is unclear to me whether it makes sense to alter what
> atomic_load_acq_* are currently doing.
I still think that our loads/stores, compared with the classic definition
of the operations, are ordered, i.e. what the C standard calls
sequentially consistent.  I have no idea whether we want this property,
or whether it is really used.  kern_intr.c (ab)uses the load in this way.

> 
> The simplest thing would be to just introduce aforementioned macros.
> 
> Unfortunately I don't have any ideas for new function names.
> 
> I was considering stealing consumer/producer wording instead of acq/rel,
> but that does not help with case 1.
> 
> Also there is no common header for atomic ops.
> 
> I propose adding sys/atomic.h which includes machine/atomic.h. Then it
> would provide atomic ops missing from md header implemented using what
> is already there.
> 
> For an example where it could be useful see
> https://svnweb.freebsd.org/base/head/sys/sys/seq.h?view=markup
> 
> Comments?
> 
> And yes, I know that:
> - atomic_load_acq_rmb_int is a terrible name and I'm trying to get rid
>   of it
> - seq_consistent misses a read memory barrier, but in worst case this
>   will result in spurious ENOTCAPABLE returned. security problem of
>   circumventing capabilities is plugged since seq is properly re-checked
>   before we return
> 
> -- 
> Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 14:25:25 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C00D9FA9;
 Tue, 28 Oct 2014 14:25:25 +0000 (UTC)
Received: from nibbler.fubar.geek.nz (nibbler.fubar.geek.nz [199.48.134.198])
 by mx1.freebsd.org (Postfix) with ESMTP id A2236252;
 Tue, 28 Oct 2014 14:25:25 +0000 (UTC)
Received: from bender.lan (97e078e7.skybroadband.com [151.224.120.231])
 by nibbler.fubar.geek.nz (Postfix) with ESMTPSA id 0E9225C692;
 Tue, 28 Oct 2014 14:25:16 +0000 (UTC)
Date: Tue, 28 Oct 2014 14:25:10 +0000
From: Andrew Turner <andrew@fubar.geek.nz>
To: Attilio Rao <attilio@freebsd.org>
Subject: Re: atomic ops
Message-ID: <20141028142510.10a9d3cb@bender.lan>
In-Reply-To: <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
References: <20141028025222.GA19223@dft-labs.eu>
 <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 14:25:25 -0000

On Tue, 28 Oct 2014 14:18:41 +0100
Attilio Rao <attilio@freebsd.org> wrote:

> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
> wrote:
> > As was mentioned sometime ago, our situation related to atomic ops
> > is not ideal.
> >
> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> > full memory barriers, which is stronger than needed.
> >
> > Moreover, load is implemented as lock cmpxchg on var address, so it
> > is additionally slower, especially when cpus compete.
> 
> I already explained this once privately: full memory barriers are not
> stronger than needed.
> FreeBSD has different semantics than Linux. We historically enforce a
> full barrier on _acq() and _rel() rather than just a read and write
> barrier, hence we need a different implementation than Linux.
> There is code that relies on this property, like the locking
> primitives (release a mutex, for instance).

On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
there are only full barriers. On both 32 and 64-bit ARMv8 ARM has added
support for load-acquire and store-release atomic instructions. For the
use in atomic instructions we can assume these only operate on the
address passed to them.

It is unlikely we will use them in the 32-bit port; however, I would like
to know the expected semantics of these atomic functions to make sure
we get them correct in the arm64 port. I have been advised by one of
the ARM Linux kernel maintainers on the problems they have found using
these instructions but have yet to determine what our atomic functions
guarantee.

Andrew

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 14:33:09 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2DD312B8;
 Tue, 28 Oct 2014 14:33:09 +0000 (UTC)
Received: from mail-wi0-x231.google.com (mail-wi0-x231.google.com
 [IPv6:2a00:1450:400c:c05::231])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6F5C136A;
 Tue, 28 Oct 2014 14:33:08 +0000 (UTC)
Received: by mail-wi0-f177.google.com with SMTP id ex7so1786198wid.10
 for <multiple recipients>; Tue, 28 Oct 2014 07:33:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:sender:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type;
 bh=jzJCJ/x9TPlske+Ae9sJ+KKAiDx64WR+YwmL3PKINic=;
 b=T6fDJEjC5VO+EA+Bi+saaE6lDDC+1OQsIjywQtKA+wok5ZnyvQyCMCj5SJ+tCUWyle
 NbfieXvHbl9F5pT6w1LmXsuCWpXeLv0vVtpv16jCNyZhtcWJ1ybr5513H+6QqROxi0YH
 LD5UyCjouWbMqTWdqkF8vzjW74g/pkzECk2wLO3PZbUkgyrDiw9V2eDLcJ54PPJwhUs0
 mHoBNE1ishFSVkz7nF/BOeqb/iUNEWl3oHD9Idayn9sk+5IY4HTe6K6TbZ4jZD6UNo1C
 Q7Vcac9bKQXc7+jgGBkamr+JsIR6iR8mBZZ/Mn0gK99VeCGsY47ozaI+i2myxsv2xYLi
 TWsA==
MIME-Version: 1.0
X-Received: by 10.180.83.37 with SMTP id n5mr28839571wiy.7.1414506786594; Tue,
 28 Oct 2014 07:33:06 -0700 (PDT)
Reply-To: attilio@FreeBSD.org
Sender: asmrookie@gmail.com
Received: by 10.217.69.73 with HTTP; Tue, 28 Oct 2014 07:33:06 -0700 (PDT)
In-Reply-To: <20141028142510.10a9d3cb@bender.lan>
References: <20141028025222.GA19223@dft-labs.eu>
 <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
 <20141028142510.10a9d3cb@bender.lan>
Date: Tue, 28 Oct 2014 15:33:06 +0100
X-Google-Sender-Auth: ElSPvKB72y9f1cRQFz2uCY0dy7U
Message-ID: <CAJ-FndD=9MgK608ra8+eMy=cAdq+A0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
Subject: Re: atomic ops
From: Attilio Rao <attilio@freebsd.org>
To: Andrew Turner <andrew@fubar.geek.nz>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 14:33:09 -0000

On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz> wrote:
> On Tue, 28 Oct 2014 14:18:41 +0100
> Attilio Rao <attilio@freebsd.org> wrote:
>
>> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
>> wrote:
>> > As was mentioned sometime ago, our situation related to atomic ops
>> > is not ideal.
>> >
>> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
>> > full memory barriers, which is stronger than needed.
>> >
>> > Moreover, load is implemented as lock cmpxchg on var address, so it
>> > is additionally slower, especially when cpus compete.
>>
>> I already explained this once privately: full memory barriers are not
>> stronger than needed.
>> FreeBSD has different semantics than Linux. We historically enforce a
>> full barrier on _acq() and _rel() rather than just a read and write
>> barrier, hence we need a different implementation than Linux.
>> There is code that relies on this property, like the locking
>> primitives (release a mutex, for instance).
>
> On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> there are only full barriers. On both 32 and 64-bit ARMv8 ARM has added
> support for load-acquire and store-release atomic instructions. For the
> use in atomic instructions we can assume these only operate on the
> address passed to them.
>
> It is unlikely we will use them in the 32-bit port; however, I would like
> to know the expected semantics of these atomic functions to make sure
> we get them correct in the arm64 port. I have been advised by one of
> the ARM Linux kernel maintainers on the problems they have found using
> these instructions but have yet to determine what our atomic functions
> guarantee.

For FreeBSD the "reference doc" is atomic(9).
It clearly states:

The second variant of each operation includes a read memory barrier.
This barrier ensures that the effects of this operation are completed
before the effects of any later data accesses.  As a result, the
operation is said to have acquire semantics as it acquires a pseudo-lock
requiring further operations to wait until it has completed.  To denote
this, the suffix ``_acq'' is inserted into the function name immediately
prior to the ``_<type>'' suffix.  For example, to subtract two integers
ensuring that any later writes will happen after the subtraction is
performed, use atomic_subtract_acq_int().

The third variant of each operation includes a write memory barrier.
This ensures that all effects of all previous data accesses are completed
before this operation takes place. As a result, the operation is said to
have release semantics as it releases any pending data accesses to be
completed before its operation is performed.  To denote this, the suffix
``_rel'' is inserted into the function name immediately prior to the
``_<type>'' suffix.  For example, to add two long integers ensuring that
all previous writes will happen first, use atomic_add_rel_long().

The bottom line of all this is that a read memory barrier ensures that
the effects of the operation you are performing (the load in the case of
atomic_load_acq_int(), for example) are completed before any later
data accesses. "Data accesses" covers *all* operations, including
reads, writes, etc. This is very different from what Linux assumes for
its rmb() barrier, for example, which only orders loads. So for FreeBSD
there is no _acq -> rmb() analogy and there is no _rel -> wmb() analogy.

This must be kept well in mind when trying to optimize the atomic_*()
operations.
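
As an illustration only, here is a minimal userland sketch of the
acquire/release pairing described above.  It assumes C11 <stdatomic.h>
as a stand-in for atomic(9); the names payload, ready, publish() and
try_consume() are made up for the example, and whether the FreeBSD
primitives guarantee more than this is exactly what this thread is
trying to pin down.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;		/* plain data published by the producer */
static atomic_bool ready;	/* flag whose accesses carry the ordering */

/*
 * Producer side: the release store orders *all* earlier accesses
 * (here the plain write to payload) before the store to the flag.
 */
static void
publish(int value)
{

	payload = value;
	atomic_store_explicit(&ready, true, memory_order_release);
}

/*
 * Consumer side: the acquire load orders *all* later accesses (here the
 * plain read of payload) after the load of the flag.
 */
static bool
try_consume(int *out)
{

	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return (false);
	*out = payload;
	return (true);
}
```

If one thread calls publish(42) and another sees try_consume() return
true, the second thread reads 42.  A barrier that only ordered loads
against loads, or stores against stores, would not be expressed this
way.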

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 16:21:06 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 12C0632B
 for <freebsd-arch@freebsd.org>; Tue, 28 Oct 2014 16:21:06 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id DFBC220E
 for <freebsd-arch@freebsd.org>; Tue, 28 Oct 2014 16:21:05 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 32B78B980;
 Tue, 28 Oct 2014 12:21:04 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: RfC: fueword(9) and casueword(9)
Date: Tue, 28 Oct 2014 11:46:49 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141021094539.GA1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua>
In-Reply-To: <20141027165557.GC1877@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410281146.49370.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Tue, 28 Oct 2014 12:21:04 -0400 (EDT)
Cc: freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 16:21:06 -0000

On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote:
> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote:
> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
> > > > A new API should try to fix these __DEVOLATILE() abominations.  I think it
> > > > is safe, and even correct, to declare the pointers as volatile const void
> > > > *, since the functions really can handle volatile data, unlike copyin().
> > > > 
> > > > Atomic op functions are declared as taking pointers to volatile for
> > > > similar reasons.  Often they are applied to non-volatile data, but
> > > > adding a qualifier is type-safe and doesn't cost efficiency since the
> > > > pointer access is not known to the compiler.  (The last point is not
> > > > so clear -- the compiler can see things in the functions since they are
> > > > inline asm.  fueword() isn't inline so its (in)efficiency is not changed.)
> > > > 
> > > > The atomic read functions are not declared as taking pointers to const.
> > > > The __DECONST() abomination might be used to work around this bug.
> > > 
> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the
> > > umtx structures definitions.  I think that it is a bug to mark the lock
> > > words with volatile.  I want the fueword(9) interface to be as
> > > similar to fuword(9) as possible; in particular, volatile seems to be
> > > not needed.
> > 
> > I agree with Bruce here.  casuword() already accepts volatile.  I also
> > think umtx is correct in marking the field as volatile.  They are subject
> > to change without the compiler's knowledge albeit by other threads
> > rather than signal handlers.  Having them marked volatile doesn't really
> > matter for the kernel, but the header is also used in userland and is
> > relevant in sem_new.c, etc.
> 
> You agree with making fueword() accept volatile const void * as the
> address ?  Or do you agree with the existence of the volatile type
> qualifier for the lock field of umtx structures ?

I agree with both (I thought Bruce only asserted the first).

> I definitely do not want to make fueword() different from fuword() in
> this aspect.  If both fueword() and fuword() are changed to take a
> volatile const * address, this should be a different patch.

I also agree that fuword() and fueword() should take identical arguments,
so if this change is made it should be a separate patch (and should include
suword()).
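
To illustrate why adding the qualifiers is type-safe, here is a small
sketch in plain userland C.  fetch_word_sketch() is a hypothetical
stand-in, not the kernel's fueword(), which also has to cope with
faults on the user address.  The only point is that a parameter
declared volatile const void * accepts both volatile and non-volatile
objects without a cast, so callers would not need __DEVOLATILE().

```c
/*
 * Hypothetical stand-in for a fetch routine.  Reads one long from
 * base, stores it in *val and returns 0.  No fault handling is done.
 */
static int
fetch_word_sketch(volatile const void *base, long *val)
{

	*val = *(volatile const long *)base;
	return (0);
}
```

Both fetch_word_sketch(&volatile_word, &v) and
fetch_word_sketch(&plain_word, &v) compile without a cast; with a plain
const void * parameter the first call would discard the volatile
qualifier.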

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 16:21:10 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2A3E73C8
 for <freebsd-arch@freebsd.org>; Tue, 28 Oct 2014 16:21:10 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0402F210
 for <freebsd-arch@freebsd.org>; Tue, 28 Oct 2014 16:21:10 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id DA997B995;
 Tue, 28 Oct 2014 12:21:08 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: refcount_release_take_##lock
Date: Tue, 28 Oct 2014 11:54:54 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141025184448.GA19066@dft-labs.eu>
 <2629048.tOq3sNXcCP@ralph.baldwin.cx> <20141027192721.GA28049@dft-labs.eu>
In-Reply-To: <20141027192721.GA28049@dft-labs.eu>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <201410281154.54581.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Tue, 28 Oct 2014 12:21:08 -0400 (EDT)
Cc: John-Mark Gurney <jmg@funkthat.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 16:21:10 -0000

On Monday, October 27, 2014 3:27:21 pm Mateusz Guzik wrote:
> On Mon, Oct 27, 2014 at 11:27:45AM -0400, John Baldwin wrote:
> > Please keep the refcount_*() prefix so it matches the rest of the API.  I 
> > would just declare the functions directly in refcount.h rather than requiring 
> > a macro to be invoked in each C file.  We can also just implement the needed 
> > lock types for now instead of all of them.
> > 
> > You could maybe replace 'take' with 'lock', but either name is fine.
> > 
> 
> 
> We need sx and rwlocks (and temporarily mutexes, but that is going away
> in a few days).

Ok.

> I ran into the following issue: opensolaris code has its own rwlock.h,
> and their refcount.h eventually includes our refcount.h (and it has to
> since e.g. our file.h requires it).
> 
> I don't know any good solution.

Ugh.

> We could add locking funcs to a separate header (refcount_lock.h?) or use the
> following hack:
> 
> +#ifdef _SYS_RWLOCK_H_
> +REFCOUNT_RELEASE_LOCK_DEFINE(rwlock, struct rwlock, rw_wlock, rw_wunlock);
> +#else

The problem here is that typically refcount.h would be included before rwlock.h
(style(9) sorts headers alphabetically).

Given that you want to inline this anyway, you could perhaps implement it as
a macro instead of an inline function?  That would result in it only being
parsed when used which would side-step this.  It's not really ideal but might
be less ugly than the other options.  Something like:

#define _refcount_release_lock(count, lock, LOCK_OP, UNLOCK_OP) \
...

#define	refcount_release_lock_mtx(count, lock)					\
	_refcount_release_lock((count), (lock), mtx_lock, mtx_unlock)

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 17:44:36 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 66CB3262;
 Tue, 28 Oct 2014 17:44:36 +0000 (UTC)
Received: from mail-wi0-x233.google.com (mail-wi0-x233.google.com
 [IPv6:2a00:1450:400c:c05::233])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id CCAF4D63;
 Tue, 28 Oct 2014 17:44:35 +0000 (UTC)
Received: by mail-wi0-f179.google.com with SMTP id h11so2376202wiw.12
 for <multiple recipients>; Tue, 28 Oct 2014 10:44:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=BCmsYGm/PuTTSnfkwXGltszK2GwSzK4ABJQUzyTr5gY=;
 b=ifEYN6Dqz7mg5xt4A1Vys9pqJYcJ5pM+fM3iZ0R7VtR+IO+Yh/WyMtKDetnsK38GXp
 bTaz0MfD+INVZ2FnCiP4F89pZo0aFL0h63SIbLTcJfNvfj6GGMpLtzLvLja/QO3FSLcj
 V3J8snEUbKSilkJ13fGOQH3enDW9Bubh2WFuxWKZlm+VaYuRDDn9BBAzVf0AEg5OX6DF
 vSZ5hMZZejZI80w1N8EQo05T2causCTzX30yblyRLs/2WheYGe0NtugBM8IjxBPosnf/
 XMVgvPK0QZ0NpNrQLy22qarSeKIAv6HZ4pUYkNVksr23ALKjWu2SScX/vrZeMVAlIKEW
 +BDg==
X-Received: by 10.180.221.129 with SMTP id qe1mr6701088wic.21.1414518271988;
 Tue, 28 Oct 2014 10:44:31 -0700 (PDT)
Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net.
 [2001:470:1f08:1f7::2])
 by mx.google.com with ESMTPSA id eu8sm11226564wic.1.2014.10.28.10.44.30
 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Tue, 28 Oct 2014 10:44:31 -0700 (PDT)
Date: Tue, 28 Oct 2014 18:44:28 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: refcount_release_take_##lock
Message-ID: <20141028174428.GA12014@dft-labs.eu>
References: <20141025184448.GA19066@dft-labs.eu>
 <2629048.tOq3sNXcCP@ralph.baldwin.cx>
 <20141027192721.GA28049@dft-labs.eu>
 <201410281154.54581.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <201410281154.54581.jhb@freebsd.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: John-Mark Gurney <jmg@funkthat.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 17:44:36 -0000

On Tue, Oct 28, 2014 at 11:54:54AM -0400, John Baldwin wrote:
> On Monday, October 27, 2014 3:27:21 pm Mateusz Guzik wrote:
> > On Mon, Oct 27, 2014 at 11:27:45AM -0400, John Baldwin wrote:
> > > Please keep the refcount_*() prefix so it matches the rest of the API.  I 
> > > would just declare the functions directly in refcount.h rather than requiring 
> > > a macro to be invoked in each C file.  We can also just implement the needed 
> > > lock types for now instead of all of them.
> > > 
> > > You could maybe replace 'take' with 'lock', but either name is fine.
> > > 
> > 
> > 
> > We need sx and rwlocks (and temporarily mutexes, but that is going away
> > in a few days).
> 
> Ok.
> 
> > I ran into the following issue: opensolaris code has its own rwlock.h,
> > and their refcount.h eventually includes our refcount.h (and it has to
> > since e.g. our file.h requires it).
> > 
> > I don't know any good solution.
> 
> Ugh.
> 
> > We could add locking funcs to a separate header (refcount_lock.h?) or use the
> > following hack:
> > 
> > +#ifdef _SYS_RWLOCK_H_
> > +REFCOUNT_RELEASE_LOCK_DEFINE(rwlock, struct rwlock, rw_wlock, rw_wunlock);
> > +#else
> 
> The problem here is that typically refcount.h would be included before rwlock.h
> (style(9) sorts headers alphabetically).
> 
> Given that you want to inline this anyway, you could perhaps implement it as
> a macro instead of an inline function?  That would result in it only being
> parsed when used which would side-step this.  It's not really ideal but might
> be less ugly than the other options.  Something like:
> 
> #define _refcount_release_lock(count, lock, LOCK_OP, UNLOCK_OP) \
> ...
> 
> #define	refcount_release_lock_mtx(count, lock)					\
> 	_refcount_release_lock((count), (lock), mtx_lock, mtx_unlock)
> 

diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
index f8ae0e6..e94ccde 100644
--- a/sys/kern/kern_jail.c
+++ b/sys/kern/kern_jail.c
@@ -4466,15 +4466,12 @@ prison_racct_free_locked(struct prison_racct *prr)
 void
 prison_racct_free(struct prison_racct *prr)
 {
-	int old;
 
 	sx_assert(&allprison_lock, SA_UNLOCKED);
 
-	old = prr->prr_refcount;
-	if (old > 1 && atomic_cmpset_int(&prr->prr_refcount, old, old - 1))
+	if (!refcount_release_lock_sx(&prr->prr_refcount, &allprison_lock))
 		return;
 
-	sx_xlock(&allprison_lock);
 	prison_racct_free_locked(prr);
 	sx_xunlock(&allprison_lock);
 }
diff --git a/sys/kern/kern_loginclass.c b/sys/kern/kern_loginclass.c
index c0946ef..0771b38 100644
--- a/sys/kern/kern_loginclass.c
+++ b/sys/kern/kern_loginclass.c
@@ -81,18 +81,10 @@ loginclass_hold(struct loginclass *lc)
 void
 loginclass_free(struct loginclass *lc)
 {
-	int old;
 
-	old = lc->lc_refcount;
-	if (old > 1 && atomic_cmpset_int(&lc->lc_refcount, old, old - 1))
+	if (!refcount_release_lock_rwlock(&lc->lc_refcount, &loginclasses_lock))
 		return;
 
-	rw_wlock(&loginclasses_lock);
-	if (!refcount_release(&lc->lc_refcount)) {
-		rw_wunlock(&loginclasses_lock);
-		return;
-	}
-
 	racct_destroy(&lc->lc_racct);
 	LIST_REMOVE(lc, lc_next);
 	rw_wunlock(&loginclasses_lock);
diff --git a/sys/kern/kern_resource.c b/sys/kern/kern_resource.c
index 037a257..e1d5237 100644
--- a/sys/kern/kern_resource.c
+++ b/sys/kern/kern_resource.c
@@ -1303,20 +1303,10 @@ uihold(struct uidinfo *uip)
 void
 uifree(struct uidinfo *uip)
 {
-	int old;
 
-	/* Prepare for optimal case. */
-	old = uip->ui_ref;
-	if (old > 1 && atomic_cmpset_int(&uip->ui_ref, old, old - 1))
+	if (!refcount_release_lock_rwlock(&uip->ui_ref, &uihashtbl_lock))
 		return;
 
-	/* Prepare for suboptimal case. */
-	rw_wlock(&uihashtbl_lock);
-	if (refcount_release(&uip->ui_ref) == 0) {
-		rw_wunlock(&uihashtbl_lock);
-		return;
-	}
-
 	racct_destroy(&uip->ui_racct);
 	LIST_REMOVE(uip, ui_hash);
 	rw_wunlock(&uihashtbl_lock);
diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h
index 4611664..343da6d 100644
--- a/sys/sys/refcount.h
+++ b/sys/sys/refcount.h
@@ -64,4 +64,34 @@ refcount_release(volatile u_int *count)
 	return (old == 1);
 }
 
+#define	_refcount_release_lock(count, lock, TYPE, LOCK_OP, UNLOCK_OP)		\
+({										\
+	TYPE *__lock;								\
+	volatile u_int *__cp;							\
+	u_int __old;								\
+	bool __ret;								\
+										\
+	__lock = (lock);							\
+	__cp = (count);								\
+	__old = *__cp;								\
+	__ret = 0;								\
+	if (!(__old > 1 && atomic_cmpset_int(__cp, __old, __old - 1))) {	\
+		LOCK_OP(__lock);						\
+		if (refcount_release(__cp) == 0)				\
+			UNLOCK_OP(__lock);					\
+		else 								\
+			__ret = 1;						\
+	}									\
+	__ret;									\
+})
+
+#define	refcount_release_lock_mtx(count, lock)		\
+	    _refcount_release_lock(count, lock, struct mtx, mtx_lock, mtx_unlock)
+#define	refcount_release_lock_rmlock(count, lock)	\
+	    _refcount_release_lock(count, lock, struct rmlock, rm_wlock, rm_wunlock)
+#define	refcount_release_lock_rwlock(count, lock)	\
+	    _refcount_release_lock(count, lock, struct rwlock, rw_wlock, rw_wunlock)
+#define	refcount_release_lock_sx(count, lock)		\
+	    _refcount_release_lock(count, lock, struct sx, sx_xlock, sx_xunlock)
+
 #endif	/* ! __SYS_REFCOUNT_H__ */
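
For anyone who wants to play with the control flow outside the kernel,
the idea behind _refcount_release_lock() can be sketched in userland.
This is an approximation under stated assumptions, not the patch: C11
<stdatomic.h> and a pthread mutex stand in for atomic(9) and the kernel
lock types, and the name refcount_release_lock_sketch() is made up.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference.  Returns true with *lock held if the caller
 * dropped the last reference, false with *lock not held otherwise.
 */
static bool
refcount_release_lock_sketch(atomic_uint *count, pthread_mutex_t *lock)
{
	unsigned int old;

	old = atomic_load_explicit(count, memory_order_relaxed);
	/* Fast path: not the last reference, try one lockless decrement. */
	if (old > 1 && atomic_compare_exchange_strong(count, &old, old - 1))
		return (false);

	/* Slow path: take the lock before the count can reach zero. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) != 1) {
		/* Somebody else still holds a reference. */
		pthread_mutex_unlock(lock);
		return (false);
	}
	return (true);
}
```

A caller would tear the object down and unlock only when the function
returns true, which is the shape loginclass_free() and uifree() take
after the change.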

-- 
Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 17:53:29 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5A02A638;
 Tue, 28 Oct 2014 17:53:29 +0000 (UTC)
Received: from nibbler.fubar.geek.nz (nibbler.fubar.geek.nz [199.48.134.198])
 by mx1.freebsd.org (Postfix) with ESMTP id 2B9A8E55;
 Tue, 28 Oct 2014 17:53:28 +0000 (UTC)
Received: from bender.lan (97e078e7.skybroadband.com [151.224.120.231])
 by nibbler.fubar.geek.nz (Postfix) with ESMTPSA id 6EA175CC08;
 Tue, 28 Oct 2014 17:53:26 +0000 (UTC)
Date: Tue, 28 Oct 2014 17:53:18 +0000
From: Andrew Turner <andrew@fubar.geek.nz>
To: Attilio Rao <attilio@freebsd.org>
Subject: Re: atomic ops
Message-ID: <20141028175318.709d2ef6@bender.lan>
In-Reply-To: <CAJ-FndD=9MgK608ra8+eMy=cAdq+A0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
References: <20141028025222.GA19223@dft-labs.eu>
 <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
 <20141028142510.10a9d3cb@bender.lan>
 <CAJ-FndD=9MgK608ra8+eMy=cAdq+A0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 17:53:29 -0000

On Tue, 28 Oct 2014 15:33:06 +0100
Attilio Rao <attilio@freebsd.org> wrote:
> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz>
> wrote:
> > On Tue, 28 Oct 2014 14:18:41 +0100
> > Attilio Rao <attilio@freebsd.org> wrote:
> >
> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
> >> wrote:
> >> > As was mentioned sometime ago, our situation related to atomic
> >> > ops is not ideal.
> >> >
> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
> >> > provide full memory barriers, which is stronger than needed.
> >> >
> >> > Moreover, load is implemented as lock cmpxchg on var address, so
> >> > it is additionally slower especially when cpus compete.
> >>
> >> I already explained this once privately: full memory barriers are
> >> not stronger than needed.
> >> FreeBSD has a different semantic than Linux. We historically
> >> enforce a full barrier on _acq() and _rel() rather than just a
> >> read and write barrier, hence we need a different implementation
> >> than Linux. There is code that relies on this property, like the
> >> locking primitives (release a mutex, for instance).
> >
> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has
> > added support for load-acquire and store-release atomic
> > instructions. For the use in atomic instructions we can assume
> > these only operate on the address passed to them.
> >
> > It is unlikely we will use them in the 32-bit port however I would
> > like to know the expected semantics of these atomic functions to
> > make sure we get them correct in the arm64 port. I have been
> > advised by one of the ARM Linux kernel maintainers on the problems
> > they have found using these instructions but have yet to determine
> > what our atomic functions guarantee.
> 
> For FreeBSD the "reference doc" is atomic(9).
> It clearly states:

There may also be a difference between what it states, how they are
implemented, and what developers assume they do. I'm trying to make
sure I get them correct.

> The second variant of each operation includes a read memory barrier.
> This barrier ensures that the effects of this operation are completed
> before the effects of any later data accesses.  As a result, the
> operation is said to have acquire semantics as it acquires a
> pseudo-lock requiring further operations to wait until it has
> completed.  To denote this, the suffix ``_acq'' is inserted into the
> function name immediately prior to the ``_<type>'' suffix.  For
> example, to subtract two integers ensuring that any later writes will
> happen after the subtraction is performed, use
> atomic_subtract_acq_int().

It depends on where we guarantee the acquire barrier takes effect. On
ARMv8 the function will be a load/modify/write sequence. Suppose we use
a load-acquire operation for atomic_subtract_acq_int, for example, with
a pointer P and a value to subtract X:

loop:
 load-acquire *P to N
 perform N = N - X
 store-exclusive N to *P
 if the store failed goto loop

where N and X are both registers.

This means no access after this loop will happen before the loop, but
one may happen within it, e.g. if there was a later access A the
following may be possible:

Load P
Access A
Store P

We know the store will eventually happen: if it fails, e.g. because
another processor accessed *P, the code iterates over the loop again.
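
As a rough illustration only, the loop described above can be sketched
in portable C11. This is not FreeBSD's atomic_subtract_acq_int(); the
name sketch_subtract_acq() is hypothetical, and the mapping of a weak
compare-exchange onto a load-acquire/store-exclusive pair is an
assumption about how a compiler targets ARMv8.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a FreeBSD primitive.
 * Subtracts x from *p with acquire ordering on the load half only;
 * the store half carries no ordering of its own.
 */
static void
sketch_subtract_acq(atomic_uint *p, unsigned int x)
{
	unsigned int n;

	assert(p != NULL);
	/* Initial guess; a failed compare-exchange refreshes n from *p. */
	n = atomic_load_explicit(p, memory_order_relaxed);
	/*
	 * "if the store failed goto loop": retry until the exchange of
	 * n for n - x succeeds.
	 */
	while (!atomic_compare_exchange_weak_explicit(p, &n, n - x,
	    memory_order_acquire, memory_order_relaxed))
		;
}
```

With only acquire ordering on the load, nothing in this sketch stops a
later access from being ordered between the load and the store, which
is the case in question.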

The other point is we can guarantee any store-release, and therefore
any prior access, has happened before a later load-acquire even if it's
on another processor.
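
In C11 terms, the store-release/load-acquire pairing looks roughly like
the sketch below. The names payload, ready, sketch_publish() and
sketch_consume() are made up for the illustration, and it assumes C11
release/acquire semantics rather than whatever atomic(9) ends up
guaranteeing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for the illustration. */
static int payload;
static atomic_int ready;

/* Writer: the plain store to payload is the "prior access". */
static void
sketch_publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Reader: returns 1 and fills *out once the flag has been observed,
 * 0 if the flag is not set yet.
 */
static int
sketch_consume(int *out)
{
	assert(out != NULL);
	if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		return (0);
	*out = payload;
	return (1);
}
```

A reader that sees ready == 1 through the load-acquire is also meant to
see the payload written before the store-release.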

...

> The bottom-side of all this is that read memory barriers ensures that
> the effect of the operations you are making (load in case of
> atomic_load_acq_int(), for example) are completed before any later
> data accesses. "Data accesses" qualifies for *all* the operations
> including read, writes, etc. This is very different by what Linux
> assumes for its rmb() barrier, for example which just orders loads. So
> for FreeBSD there is no _acq -> rmb() analogy and there is no _rel ->
> wmb() analogy.

On ARMv8, using the above pseudo-code, later operations will not be
moved before the load-acquire, but they may happen before its store.
Having discussed this with John Baldwin, I don't think this is a
problem, because the store operation is allowed to fail if another
processor has written to its memory.

> 
> This must be kept well in mind when trying to optimize the atomic_*()
> operations.

At this point I'm more interested in getting them correct as they will
be important when I start on SMP support.

Andrew

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 18:26:50 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 7A4962D8
 for <freebsd-arch@freebsd.org>; Tue, 28 Oct 2014 18:26:50 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 52DCA23E
 for <freebsd-arch@freebsd.org>; Tue, 28 Oct 2014 18:26:50 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 46DCFB980;
 Tue, 28 Oct 2014 14:26:49 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: refcount_release_take_##lock
Date: Tue, 28 Oct 2014 14:13:58 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141025184448.GA19066@dft-labs.eu>
 <201410281154.54581.jhb@freebsd.org> <20141028174428.GA12014@dft-labs.eu>
In-Reply-To: <20141028174428.GA12014@dft-labs.eu>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <201410281413.58414.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Tue, 28 Oct 2014 14:26:49 -0400 (EDT)
Cc: John-Mark Gurney <jmg@funkthat.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 18:26:50 -0000

On Tuesday, October 28, 2014 1:44:28 pm Mateusz Guzik wrote:
> diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
> index f8ae0e6..e94ccde 100644
> --- a/sys/kern/kern_jail.c
> +++ b/sys/kern/kern_jail.c

The diff looks good to me.  Just need to update refcount.9 as well.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 18:26:51 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id ED6162DB;
 Tue, 28 Oct 2014 18:26:51 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B8BA0240;
 Tue, 28 Oct 2014 18:26:51 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 8ADE7B9B4;
 Tue, 28 Oct 2014 14:26:50 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Subject: Re: amd64 modules still use atomics as callable functions
Date: Tue, 28 Oct 2014 14:18:04 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141027224901.GC28049@dft-labs.eu>
In-Reply-To: <20141027224901.GC28049@dft-labs.eu>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410281418.04704.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Tue, 28 Oct 2014 14:26:50 -0400 (EDT)
Cc: Mateusz Guzik <mjguzik@gmail.com>, Konstantin Belousov <kib@freebsd.org>,
 Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 18:26:52 -0000

On Monday, October 27, 2014 6:49:01 pm Mateusz Guzik wrote:
> Turns out several years ago the kernel was modified to provide actual
> functions for atomic operations and modules are always using them.
> 
> I propose plugging it on amd64 in head.
> 
> For stable/10 we can always provide them, but inline in modules by default
> (testing a KLD_WANT_ATOMIC_FUNC knob?).

I think some of the comments might need tweaking still:

> diff --git a/sys/amd64/include/atomic.h b/sys/amd64/include/atomic.h
> index 9110dc5..e7e1735 100644
> --- a/sys/amd64/include/atomic.h
> +++ b/sys/amd64/include/atomic.h
> @@ -69,28 +69,7 @@
>   * The above functions are expanded inline in the statically-linked
>   * kernel.  Lock prefixes are generated if an SMP kernel is being
>   * built.
> - *
> - * Kernel modules call real functions which are built into the kernel.
> - * This allows kernel modules to be portable between UP and SMP systems.
>   */
> -#if defined(KLD_MODULE) || !defined(__GNUCLIKE_ASM)
> -#define	ATOMIC_ASM(NAME, TYPE, OP, CONS, V)			\
> -void atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v);	\
> -void atomic_##NAME##_barr_##TYPE(volatile u_##TYPE *p, u_##TYPE v)
> -
> -int	atomic_cmpset_int(volatile u_int *dst, u_int expect, u_int src);
> -int	atomic_cmpset_long(volatile u_long *dst, u_long expect, u_long src);
> -u_int	atomic_fetchadd_int(volatile u_int *p, u_int v);
> -u_long	atomic_fetchadd_long(volatile u_long *p, u_long v);
> -int	atomic_testandset_int(volatile u_int *p, u_int v);
> -int	atomic_testandset_long(volatile u_long *p, u_int v);
> -
> -#define	ATOMIC_LOAD(TYPE, LOP)					\
> -u_##TYPE	atomic_load_acq_##TYPE(volatile u_##TYPE *p)
> -#define	ATOMIC_STORE(TYPE)					\
> -void		atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)
> -
> -#else /* !KLD_MODULE && __GNUCLIKE_ASM */
>  
>  /*
>   * For userland, always use lock prefixes so that the binaries will run

Like here: maybe "For userland and kernel modules, always use lock prefixes..."

Also, this does break the !__GNUCLIKE_ASM case, but I'm not sure if that case
actually works anyway.
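
For readers without the header at hand, an inline atomic that always
uses a lock prefix might look like the sketch below. It is a simplified
stand-in, not the ATOMIC_ASM macro from the patch:
sketch_atomic_add_int() is a hypothetical name, and the non-x86 branch
uses a compiler builtin purely so the example builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical inline atomic add. On x86 the lock prefix is emitted
 * unconditionally, so the same object code is safe on UP and SMP.
 */
static inline void
sketch_atomic_add_int(volatile unsigned int *p, unsigned int v)
{
	assert(p != NULL);
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("lock; addl %1,%0"
	    : "+m" (*p)
	    : "ir" (v)
	    : "cc");
#else
	/* Fallback for other architectures: GCC/Clang builtin. */
	(void)__atomic_fetch_add(p, v, __ATOMIC_RELAXED);
#endif
}
```

Expanding this inline in a module avoids the function call at the cost
of an unconditional lock prefix.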

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 18:26:53 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 16D45349;
 Tue, 28 Oct 2014 18:26:53 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id D5B63243;
 Tue, 28 Oct 2014 18:26:52 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id D4051B96E;
 Tue, 28 Oct 2014 14:26:51 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Subject: Re: boot man pages installed four times..
Date: Tue, 28 Oct 2014 14:19:58 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141027231401.GQ82214@funkthat.com>
 <CAGHfRMApPwy4wB0Wb29kjoXD8W=sJTjRcHHDtuVK-dqk18HpbA@mail.gmail.com>
In-Reply-To: <CAGHfRMApPwy4wB0Wb29kjoXD8W=sJTjRcHHDtuVK-dqk18HpbA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410281419.58068.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Tue, 28 Oct 2014 14:26:51 -0400 (EDT)
Cc: "freebsd-arch@FreeBSD.org Arch" <arch@freebsd.org>,
 NGie Cooper <yaneurabeya@gmail.com>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 18:26:53 -0000

On Monday, October 27, 2014 8:25:20 pm NGie Cooper wrote:
> On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
> > So, our loader man pages are currently installed four different times
> > during installworld...  Once each during sys/boot/userboot/userboot,
> > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader
> >
> > This is because sys/boot/common/Makefile.inc defines the man pages, and
> > each of these locations include that Makefile...
> >
> > It seems like the logical thing to do is to create a sys/boot/man that
> > only installed man pages...  This will partly move us to always
> > installing all man pages on all archs...
> 
> Should these manpages just be installed as part of
> share/man/man<section> instead?

Ugh, no.  We should keep manpages out of there when possible.  E.g. all the 
pthread manpages should move next to libthr (now that we only have one thread 
library).  I would also like to eventually move kernel manpages into sys 
(perhaps sys/man, though it would be really nice to put driver manpages into 
sys/dev/foo if possible).

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 19:34:10 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 6F0A7E64;
 Tue, 28 Oct 2014 19:34:10 +0000 (UTC)
Received: from mail-wg0-x229.google.com (mail-wg0-x229.google.com
 [IPv6:2a00:1450:400c:c00::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id CFFA2BFB;
 Tue, 28 Oct 2014 19:34:09 +0000 (UTC)
Received: by mail-wg0-f41.google.com with SMTP id k14so279534wgh.14
 for <multiple recipients>; Tue, 28 Oct 2014 12:34:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=/aDpkQjplcyzjm2A6G41Q0oy4tZyAMn6R7C9/6NS6lw=;
 b=ysubuULWHwMjbbZMIvxHCC+s23v3LQcyNfDhfZgTJYXKf6M+T4af0McRqZneNk8tLB
 Q18OSRN3DXme5c526mpU4rqgP4tg9WWSD9/IjdOISrqiaj17zBkKVlxLq8thESsAc6Ka
 wZrPmRPLIlK7J4l8M15tJzE3dtKz3jPCwy1r+fyCCypcPHfED9FV+OF146qb1zMn6Oeb
 SJvkfMPUCvrKQHuGOPVytoESguKqq//099hxGQQ+X1DS1U4EDWdNjnRdHYYQoNF0/oqZ
 FLwL+rrznf9tZjOBzVYQvdGJo2zx9crgikrxvLI/FeCz0uHkGUtzpNpIxEx+rBymSHaQ
 8FXg==
X-Received: by 10.194.158.4 with SMTP id wq4mr7192449wjb.58.1414524847957;
 Tue, 28 Oct 2014 12:34:07 -0700 (PDT)
Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net.
 [2001:470:1f08:1f7::2])
 by mx.google.com with ESMTPSA id dg3sm3224593wib.14.2014.10.28.12.34.06
 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Tue, 28 Oct 2014 12:34:07 -0700 (PDT)
Date: Tue, 28 Oct 2014 20:34:04 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: refcount_release_take_##lock
Message-ID: <20141028193404.GB12014@dft-labs.eu>
References: <20141025184448.GA19066@dft-labs.eu>
 <201410281154.54581.jhb@freebsd.org>
 <20141028174428.GA12014@dft-labs.eu>
 <201410281413.58414.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <201410281413.58414.jhb@freebsd.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: John-Mark Gurney <jmg@funkthat.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 19:34:10 -0000

On Tue, Oct 28, 2014 at 02:13:58PM -0400, John Baldwin wrote:
> On Tuesday, October 28, 2014 1:44:28 pm Mateusz Guzik wrote:
> > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
> > index f8ae0e6..e94ccde 100644
> > --- a/sys/kern/kern_jail.c
> > +++ b/sys/kern/kern_jail.c
> 
> The diff looks good to me.  Just need to update refcount.9 as well.
> 

diff --git a/share/man/man9/refcount.9 b/share/man/man9/refcount.9
index e7702a2..61b9b51 100644
--- a/share/man/man9/refcount.9
+++ b/share/man/man9/refcount.9
@@ -26,7 +26,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd January 20, 2009
+.Dd October 28, 2014
 .Dt REFCOUNT 9
 .Os
 .Sh NAME
@@ -44,6 +44,15 @@
 .Fn refcount_acquire "volatile u_int *count"
 .Ft int
 .Fn refcount_release "volatile u_int *count"
+.In sys/mutex.h
+.Fn refcount_release_lock_mtx "volatile u_int *count, struct mtx *lock"
+.In sys/rmlock.h
+.Fn refcount_release_lock_rmlock "volatile u_int *count, struct rmlock *lock"
+.In sys/rwlock.h
+.Fn refcount_release_lock_rwlock "volatile u_int *count, struct rwlock *lock"
+.In sys/lock.h
+.In sys/sx.h
+.Fn refcount_release_lock_sx "volatile u_int *count, struct sx *lock"
 .Sh DESCRIPTION
 The
 .Nm
@@ -77,6 +86,13 @@ The function returns a non-zero value if the reference being released was
 the last reference;
 otherwise, it returns zero.
 .Pp
+.Fn refcount_release_lock_*
+functions release an existing reference holding the lock if it is the last
+reference.
+These functions return with the lock held and a non-zero value if the reference
+being released was the last reference;
+otherwise, they return zero and the lock is not held.
+.Pp
 Note that these routines do not provide any inter-CPU synchronization,
 data protection,
 or memory ordering guarantees except for managing the counter.
@@ -91,6 +107,18 @@ The
 .Nm refcount_release
 function returns non-zero when releasing the last reference and zero when
 releasing any other reference.
+.Pp
+.Nm refcount_release_lock_*
+functions return with the lock held and non-zero value when releasing the last
+reference, zero without the lock held when releasing any other reference.
 .Sh HISTORY
-These functions were introduced in
+.Fn refcount_init ,
+.Fn refcount_acquire
+and
+.Fn refcount_release
+functions were introduced in
 .Fx 6.0 .
+.Pp
+.Fn refcount_release_lock_*
+functions were introduced in
+.Fx 10.2 .
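
To illustrate the contract described in the man page text, here is a
userland sketch. It is not the kernel implementation: a pthread mutex
stands in for struct mtx, sketch_release_lock() is a hypothetical name,
and the fast path is only one possible way to avoid taking the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/*
 * Userland illustration only.
 * Returns 1 with the lock held if the last reference was dropped,
 * 0 with the lock not held otherwise.
 */
static int
sketch_release_lock(atomic_uint *count, pthread_mutex_t *lock)
{
	unsigned int old;

	/* Fast path: drop a reference without the lock while others remain. */
	old = atomic_load(count);
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (0);
	}

	/* Possibly the last reference: decide under the lock. */
	pthread_mutex_lock(lock);
	old = atomic_fetch_sub(count, 1);
	assert(old > 0);
	if (old == 1)
		return (1);		/* caller is responsible for unlocking */
	pthread_mutex_unlock(lock);
	return (0);
}
```

A caller would typically tear the object down and unlock when the
function returns non-zero, and do nothing further when it returns zero.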

-- 
Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 20:08:29 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C9732F1E;
 Tue, 28 Oct 2014 20:08:29 +0000 (UTC)
Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com
 [IPv6:2a00:1450:400c:c05::22e])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id F1F08F71;
 Tue, 28 Oct 2014 20:08:28 +0000 (UTC)
Received: by mail-wi0-f174.google.com with SMTP id q5so10457204wiv.7
 for <multiple recipients>; Tue, 28 Oct 2014 13:08:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:sender:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type;
 bh=SWEVK100JCP3hH6nIA5W048mTL2DlE6AZHkVV+IePd8=;
 b=nav94NyciEgZmh9wHr3O3aCWtbbOu70kmAyPv5/MYTISIVig9/1kBzfJUxJhL8Gan/
 eLQ5/k1C/2mcRq49GSZaC2PZZdVv7n6yxYYrvpGjyhgSj/VLK2kGq4s0sx0i5Sw1JUT4
 U6KqpJqjNnYnspJ7jZtFJcnkNFeUoTn4XZZo8Ks4RlHer5wWR2R2GhgPoNJ9MHM6FBlM
 0b/fzUcpcg1p5L12qBaynuPAcnEwVWqACFXaGL6seaY1/es5nfPEqAE4RhyrDpdK7fbP
 FWba5d35DdjWX/ImQvus/34RnhmtOqftFU1lweemp8zZa4o6ubMGa6p2zdRAL9dnng/l
 5J6g==
MIME-Version: 1.0
X-Received: by 10.180.83.37 with SMTP id n5mr31131906wiy.7.1414526907071; Tue,
 28 Oct 2014 13:08:27 -0700 (PDT)
Reply-To: attilio@FreeBSD.org
Sender: asmrookie@gmail.com
Received: by 10.217.69.73 with HTTP; Tue, 28 Oct 2014 13:08:27 -0700 (PDT)
In-Reply-To: <20141028175318.709d2ef6@bender.lan>
References: <20141028025222.GA19223@dft-labs.eu>
 <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
 <20141028142510.10a9d3cb@bender.lan>
 <CAJ-FndD=9MgK608ra8+eMy=cAdq+A0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
 <20141028175318.709d2ef6@bender.lan>
Date: Tue, 28 Oct 2014 21:08:27 +0100
X-Google-Sender-Auth: c8HZhVde7fGTdTrH6xz9ynAAlLQ
Message-ID: <CAJ-FndCsvLV_B3Q0boyK78980chM79hFf_dRyEqRtxzMJkpD5g@mail.gmail.com>
Subject: Re: atomic ops
From: Attilio Rao <attilio@freebsd.org>
To: Andrew Turner <andrew@fubar.geek.nz>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 20:08:29 -0000

On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner <andrew@fubar.geek.nz> wrote:
> On Tue, 28 Oct 2014 15:33:06 +0100
> Attilio Rao <attilio@freebsd.org> wrote:
>> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz>
>> wrote:
>> > On Tue, 28 Oct 2014 14:18:41 +0100
>> > Attilio Rao <attilio@freebsd.org> wrote:
>> >
>> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
>> >> wrote:
>> >> > As was mentioned sometime ago, our situation related to atomic
>> >> > ops is not ideal.
>> >> >
>> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
>> >> > provide full memory barriers, which is stronger than needed.
>> >> >
>> >> > Moreover, load is implemented as lock cmpxchg on var address, so
>> >> > it is additionally slower especially when cpus compete.
>> >>
>> >> I already explained this once privately: full memory barriers are
>> >> not stronger than needed.
>> >> FreeBSD has a different semantic than Linux. We historically
>> >> enforce a full barrier on _acq() and _rel() rather than just a
>> >> read and write barrier, hence we need a different implementation
>> >> than Linux. There is code that relies on this property, like the
>> >> locking primitives (release a mutex, for instance).
>> >
>> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
>> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has
>> > added support for load-acquire and store-release atomic
>> > instructions. For the use in atomic instructions we can assume
>> > these only operate on the address passed to them.
>> >
>> > It is unlikely we will use them in the 32-bit port however I would
>> > like to know the expected semantics of these atomic functions to
>> > make sure we get them correct in the arm64 port. I have been
>> > advised by one of the ARM Linux kernel maintainers on the problems
>> > they have found using these instructions but have yet to determine
>> > what our atomic functions guarantee.
>>
>> For FreeBSD the "reference doc" is atomic(9).
>> It clearly states:
>
> There may also be a difference between what it states, how they are
> implemented, and what developers assume they do. I'm trying to make
> sure I get them correct.

atomic(9) is our reference, so there should be no difference between
what it states and what all architectures implement.
I can say that x86 follows atomic(9) well. I'm not competent enough to
judge if all the !x86 arches follow it completely.
I can understand that developers may get confused. The FreeBSD scheme
is pretty unique. It comes from the fact that historically the membar
support was initially made to support x86. The super-widespread Linux
design, instead, tried to cover all architectures in its description.
It became very well known and I think it also "pushed" companies
like Intel to invest in improving performance of things like explicit
read/write barriers, etc.

>> The second variant of each operation includes a read memory barrier.
>> This barrier ensures that the effects of this operation are completed
>> before the effects of any later data accesses.  As a result, the
>> operation is said to have acquire semantics as it acquires a
>> pseudo-lock requiring further operations to wait until it has
>> completed.  To denote this, the suffix ``_acq'' is inserted into the
>> function name immediately prior to the ``_<type>'' suffix.  For
>> example, to subtract two integers ensuring that any later writes will
>> happen after the subtraction is performed, use
>> atomic_subtract_acq_int().
>
> It depends on where we guarantee the acquire barrier takes effect. On
> ARMv8 the function will be a load/modify/write sequence. Suppose we use
> a load-acquire operation for atomic_subtract_acq_int, for example, with
> a pointer P and a value to subtract X:
>
> loop:
>  load-acquire *P to N
>  perform N = N - X
>  store-exclusive N to *P
>  if the store failed goto loop
>
> where N and X are both registers.
>
> This means no access after this loop will happen before the loop, but
> one may happen within it, e.g. if there was a later access A the
> following may be possible:
>
> Load P
> Access A
> Store P

No, this will be broken in FreeBSD if "Access A" is later.
If "Access A" is prior to the membar it doesn't really matter if it gets
interleaved with any of the operations in the atomic instruction.
Ideally, it could even move past the Store P itself.
But if "Access A" is later (and you want to implement an _acq()
barrier) then it absolutely cannot get in the middle of the atomic_*
operation.
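
One conservative way to get the stricter ordering described here is to
follow the read-modify-write with a full fence, sketched below in C11.
This is only an illustration under that assumption, not the arm64
implementation; sketch_subtract_acq_full() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper: relaxed read-modify-write followed by a full
 * fence, intended to keep every later access behind the whole
 * operation, including its store.
 */
static void
sketch_subtract_acq_full(atomic_uint *p, unsigned int x)
{
	assert(p != NULL);
	(void)atomic_fetch_sub_explicit(p, x, memory_order_relaxed);
	/* Full barrier after the store half of the operation. */
	atomic_thread_fence(memory_order_seq_cst);
}
```

The cost is a full barrier on every call, which is the trade-off
between this form and a bare load-acquire loop.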

> We know the store will eventually happen: if it fails, e.g. because
> another processor accessed *P, the code iterates over the loop again.
>
> The other point is we can guarantee any store-release, and therefore
> any prior access, has happened before a later load-acquire even if it's
> on another processor.

No, we can never guarantee the visibility of the operations by other CPUs.
We only make guarantees on how the operations are posted on the system
bus (or how they are locally visible).
Keeping in mind that the FreeBSD model comes from x86, you can sense that
some things are tailored to the x86 model, which doesn't have any rule or
ordering on global visibility of the operations.

> ...
>
>> The bottom-side of all this is that read memory barriers ensures that
>> the effect of the operations you are making (load in case of
>> atomic_load_acq_int(), for example) are completed before any later
>> data accesses. "Data accesses" qualifies for *all* the operations
>> including read, writes, etc. This is very different by what Linux
>> assumes for its rmb() barrier, for example which just orders loads. So
>> for FreeBSD there is no _acq -> rmb() analogy and there is no _rel ->
>> wmb() analogy.
>
> On ARMv8, using the above pseudo-code, later operations will not be
> moved before the load-acquire, but they may happen before its store.
> Having discussed this with John Baldwin, I don't think this is a
> problem, because the store operation is allowed to fail if another
> processor has written to its memory.
>
>>
>> This must be kept well in mind when trying to optimize the atomic_*()
>> operations.
>
> At this point I'm more interested in getting them correct as they will
> be important when I start on SMP support.

Sure. The thread started as an "optimization of x86" but it refers
to all atomic_* on every architecture FreeBSD supports.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 06:09:39 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 94326AB5;
 Wed, 29 Oct 2014 06:09:39 +0000 (UTC)
Received: from vps.rulingia.com (vps.rulingia.com [103.243.244.15])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "vps.rulingia.com", Issuer "CAcert Class 3 Root" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0DB08390;
 Wed, 29 Oct 2014 06:09:37 +0000 (UTC)
Received: from server.rulingia.com (c220-239-242-83.belrs5.nsw.optusnet.com.au
 [220.239.242.83])
 by vps.rulingia.com (8.14.9/8.14.9) with ESMTP id s9T69MFA077927
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Wed, 29 Oct 2014 17:09:28 +1100 (AEDT)
 (envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
 by server.rulingia.com (8.14.9/8.14.9) with ESMTP id s9T69GkJ061291
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 29 Oct 2014 17:09:16 +1100 (EST)
 (envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
 by server.rulingia.com (8.14.9/8.14.9/Submit) id s9T69FKs061290;
 Wed, 29 Oct 2014 17:09:15 +1100 (EST) (envelope-from peter)
Date: Wed, 29 Oct 2014 17:09:15 +1100
From: Peter Jeremy <peter@rulingia.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: amd64 modules still use atomics as callable functions
Message-ID: <20141029060915.GA56181@server.rulingia.com>
References: <20141027224901.GC28049@dft-labs.eu>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature"; boundary="Dxnq1zWXvFF0Q93v"
Content-Disposition: inline
In-Reply-To: <20141027224901.GC28049@dft-labs.eu>
X-PGP-Key: http://www.rulingia.com/keys/peter.pgp
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: Alan Cox <alc@rice.edu>, Konstantin Belousov <kib@FreeBSD.org>,
 freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 06:09:39 -0000


--Dxnq1zWXvFF0Q93v
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2014-Oct-27 23:49:01 +0100, Mateusz Guzik <mjguzik@gmail.com> wrote:
>Turns out several years ago the kernel was modified to provide actual
>functions for atomic operations and modules are always using them.
>
>I propose plugging it on amd64 in head.
>
>For stable/10 we can always provide them, but inline in modules by default
>(testing a KLD_WANT_ATOMIC_FUNC knob?).

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173322

--=20
Peter Jeremy

--Dxnq1zWXvFF0Q93v
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQJ8BAEBCgBmBQJUUISLXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRFRUIyOTg2QzMwNjcxRTc0RTY1QzIyN0Ux
NkE1OTdBMEU0QTIwQjM0AAoJEBall6Dkogs0I98P/ji+wVot8LztKeBy3A3J76ny
6InZ1+HTkApGBG4aiEhTBijDZywZENXvs7oU3e8bnvGZdO1/wdzGsbD05XZBkYjA
fI5haPKAKv8sp91pTbrE2C/TpRthPQpuRTsTFUlCyfAS7Owxd7+HryDkvz6socGy
JWeqqgT3OsVwSwtuoeNFtvSqpDlXLKVGgsIEIcQqVjnRYWkf0VxuKvPclvRKiuxQ
DzLi/dQhiIAwGaGZMJ7FTNZjNKhZ/qliENPueIMbAHgUlcHbd7i7cCc3dr4EVNfq
GEVYWxwYLCmtQCSTnpvRmOjsceUpfsR6tKVrGWvjUdThgKWWH3XVL1D9XSimF3Xg
pxX3hklS5aNDEzlm+McidlIH8nNWCSsHPZm0A5in+QROJUg4T7hjWvgIXQmpC/f1
Dd713JV0g/C+NdUwkKgYm09t0WY36BdrntTuN3dDPUY7WVA/uEcjFeI07OCMogOU
XwLaFSNtwpH4BzOOc5FxzjAZ2GNEHisek7QGFk/g3wfdVMtC57hXZ7eYI6jsTQ8v
F1q/3pI9+j9y9gObAm7+08s1HHULJbgez/Od8z2aGLkMIK1fbcSS76H6moG62MnQ
pIE9C6IqqiSUtKo9zYjmp6C+XAGHEnKhOo+XgZ3vDqdvZt7XrKRG9LH/U1x9Eten
L4ihGzecF+4LyLNNp4Ff
=rOZX
-----END PGP SIGNATURE-----

--Dxnq1zWXvFF0Q93v--

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 16:02:18 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3FCBDEB0;
 Wed, 29 Oct 2014 16:02:18 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id F0FCBDEF;
 Wed, 29 Oct 2014 16:02:17 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 03BD8B915;
 Wed, 29 Oct 2014 12:02:16 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org,
 attilio@freebsd.org
Subject: Re: atomic ops
Date: Wed, 29 Oct 2014 10:59:16 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141028025222.GA19223@dft-labs.eu>
 <20141028175318.709d2ef6@bender.lan>
 <CAJ-FndCsvLV_B3Q0boyK78980chM79hFf_dRyEqRtxzMJkpD5g@mail.gmail.com>
In-Reply-To: <CAJ-FndCsvLV_B3Q0boyK78980chM79hFf_dRyEqRtxzMJkpD5g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410291059.16829.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Wed, 29 Oct 2014 12:02:16 -0400 (EDT)
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Konstantin Belousov <kib@freebsd.org>, Andrew Turner <andrew@fubar.geek.nz>,
 Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 16:02:18 -0000

On Tuesday, October 28, 2014 4:08:27 pm Attilio Rao wrote:
> On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner <andrew@fubar.geek.nz> wrote:
> > On Tue, 28 Oct 2014 15:33:06 +0100
> > Attilio Rao <attilio@freebsd.org> wrote:
> >> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz>
> >> wrote:
> >> > On Tue, 28 Oct 2014 14:18:41 +0100
> >> > Attilio Rao <attilio@freebsd.org> wrote:
> >> >
> >> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
> >> >> wrote:
> >> >> > As was mentioned sometime ago, our situation related to atomic
> >> >> > ops is not ideal.
> >> >> >
> >> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
> >> >> > provide full memory barriers, which is stronger than needed.
> >> >> >
> >> >> > Moreover, load is implemented as lock cmpchg on var address, so
> >> >> > it is addditionally slower especially when cpus compete.
> >> >>
> >> >> I already explained this once privately: fully memory barriers is
> >> >> not stronger than needed.
> >> >> FreeBSD has a different semantic than Linux. We historically
> >> >> enforce a full barrier on _acq() and _rel() rather then just a
> >> >> read and write barrier, hence we need a different implementation
> >> >> than Linux. There is code that relies on this property, like the
> >> >> locking primitives (release a mutex, for instance).
> >> >
> >> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> >> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has
> >> > added support for load-acquire and store-release atomic
> >> > instructions. For the use in atomic instructions we can assume
> >> > these only operate of the address passed to them.
> >> >
> >> > It is unlikely we will use them in the 32-bit port however I would
> >> > like to know the expected semantics of these atomic functions to
> >> > make sure we get them correct in the arm64 port. I have been
> >> > advised by one of the ARM Linux kernel maintainers on the problems
> >> > they have found using these instructions but have yet to determine
> >> > what our atomic functions guarantee.
> >>
> >> For FreeBSD the "reference doc" is atomic(9).
> >> It clearly states:
> >
> > There may also be a difference between what it states, how they are
> > implemented, and what developers assume they do. I'm trying to make
> > sure I get them correct.
> 
> atomic(9) is our reference so there might be no difference between
> what it states and what all architectures implement.
> I can say that x86 follows atomic(9) well. I'm not competent enough to
> judge if all the !x86 arches follow it completely.
> I can understand that developers may get confused. The FreeBSD scheme
> is pretty unique. It comes from the fact that historically the membar
> support was made to initially support x86. The super-widespread Linux
> design, instead, tried to catch all architectures in its description.
> It become very well known and I think it also "pushed" for companies
> like Intel to invest in improving performance of things like explicit
> read/write barriers, etc.

Actually, the API was designed to support ia64 (and specifically the .acq
and .rel modifiers on the ld, st, and cmpxchg instructions).  Some of the
language is wrong (and is my fault) in that they are not "read" and
"write" barriers.  They truly are "acquire" and "release".  That said,
x86 has stronger barriers than that, partly because on i386 there weren't
a whole lot of options (though atomic_store_rel on even i386 should just
be a simple store).

> >> The second variant of each operation includes a read memory barrier.
> >> This barrier ensures that the effects of this operation are completed
> >> before the effects of any later data accesses.  As a result, the
> >> operation is said to have acquire semantics as it acquires a
> >> pseudo-lock requiring further operations to wait until it has
> >> completed.  To denote this, the suffix ``_acq'' is inserted into the
> >> function name immediately prior to the ``_<type>'' suffix.  For
> >> example, to subtract two integers ensuring that any later writes will
> >> happen after the subtraction is performed, use
> >> atomic_subtract_acq_int().
> >
> > It depends on the point we guarantee the acquire barrier to be. On ARMv8
> > the function will be a load/modify/write sequence. If we use a
> > load-acquire operation for atomic_subtract_acq_int, for example, for a
> > pointer P and value to subtract X:
> >
> > loop:
> >  load-acquire *P to N
> >  perform N = N - X
> >  store-exclusive N to *P
> >  if the store failed goto loop
> >
> > where N and X are both registers.
> >
> > This will mean no access after this loop will happen before it, but
> > they may happen within it, e.g. if there was a later access A the
> > following may be possible:
> >
> > Load P
> > Access A
> > Store P
> 
> No, this will be broken in FreeBSD if "Access A" is later.
> If "Access A" is prior the membar it doesn't really matter if it gets
> interleaved with any of the operations in the atomic instruction.
> Ideally, it could even surpass the Store P itself.
> But if "Access A" is later (and you want to implement an _acq()
> barrier) then it cannot absolutely gets in the middle of the atomic_*
> operation.

Eh, that isn't broken.  It is subtle, however.  The reason it isn't broken
is that if any access to P occurs after the 'load P', then the store will
fail and the load-acquire will be retried.  If A was accessed during the
atomic op, the load-acquire during the retry will discard that and force A
to be re-accessed.  If P is not accessed during the atomic op, then it is
safe to access A during the atomic op itself.
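
To make the retry argument concrete, here is a minimal sketch of the loop
using C11 <stdatomic.h> as a stand-in for the machine-dependent atomic(9)
code.  The name sketch_subtract_acq_int is hypothetical, and I am assuming
the compiler lowers the acquire compare-exchange to a load-acquire /
store-exclusive pair on ARMv8, which it may but need not do:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for atomic_subtract_acq_int(), written with C11
 * atomics.  This is NOT the FreeBSD <machine/atomic.h> implementation.
 */
static inline void
sketch_subtract_acq_int(atomic_uint *p, unsigned int x)
{
    unsigned int old;

    assert(p != NULL);
    old = atomic_load_explicit(p, memory_order_relaxed);
    /*
     * Acquire ordering applies when the exchange succeeds.  On failure
     * 'old' is refreshed with the current value of *p and the whole
     * load/modify/store sequence is retried.
     */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old - x,
        memory_order_acquire, memory_order_relaxed))
        ;
}
```

A failed compare-exchange plays the role of the "if the store failed goto
loop" step in Andrew's pseudo-code.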

> > We know the store will happen as if it fails, e.g. another processor
> > access *P, the store will have failed and will iterate over the loop.
> >
> > The other point is we can guarantee any store-release, and therefore
> > any prior access, has happened before a later load-acquire even if it's
> > on another processor.
> 
> No, we can never guarantee on the visibility of the operations by other CPUs.
> We just make guarantee on how the operations are posted on the system
> bus (or how they are locally visible).
> Keeping in mind that FreeBSD model cames from x86, you can sense that
> some things are sized on the x86 model, which doesn't have any rule or
> ordering on global visibility of the operations.

1) Again, it's actually based on ia64.

2) x86 _does_ have rules on ordering of global visibility in that most
   stores (aside from some SSE special cases) will become visible in
   program order.  Now, you can't force the _timing_ of when the stores
   become visible (and this is true in general: in MI code you can't
   assume that a barrier is equivalent to a cache flush).

3) In this case I think Andrew is using "we" to mean "armv8", and you can
   depend on architecture-specific semantics to determine the implementation
   of atomic(9).
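
On the ordering-versus-timing distinction in point 2, a minimal sketch may
help.  It uses C11 <stdatomic.h> rather than atomic(9), and the names
payload, ready, publish and try_consume are hypothetical.  The release /
acquire pair only promises that a consumer which sees the flag also sees
the data written before it; it promises nothing about how soon the flag
becomes visible:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;         /* ordinary data (hypothetical name) */
static atomic_int ready;    /* publication flag, starts out as 0 */

/* Producer: write the data, then publish it with release semantics. */
static void
publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Consumer: returns true and fills *out once the flag has been observed.
 * A false return only means "not visible yet"; no barrier here forces
 * the producer's store to show up any sooner.
 */
static bool
try_consume(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return (false);
    *out = payload;     /* ordered after the acquire load above */
    return (true);
}
```

In real use publish() and try_consume() would run on different CPUs, with
the consumer polling or sleeping until the flag is set.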

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 16:33:38 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 88151130;
 Wed, 29 Oct 2014 16:33:38 +0000 (UTC)
Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com
 [IPv6:2a00:1450:400c:c05::22e])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A27DD1F1;
 Wed, 29 Oct 2014 16:33:37 +0000 (UTC)
Received: by mail-wi0-f174.google.com with SMTP id d1so2217340wiv.7
 for <multiple recipients>; Wed, 29 Oct 2014 09:33:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:sender:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type;
 bh=TF/uYCoSxUDq2sBIVwBDOiYd5dX90SoSJuXcg7pkGKY=;
 b=I4y3JFhgPbTUASnA/hcFMavI66+djGrw96bVaLQYIhtTce3HtZ/3cYfmOZK6K7P27D
 fjk6WGUQJ+zL4PX+0YveMH44HyAyFIzDWVuYy4O4XsqcxRVex3uBzuQO83IVjczbiI16
 W10TWdN1t1Net12z5D3TgLD1bZJ09KooxBmdo3Hoj0VEWrv63TmoAtANj28FCVJW0myR
 aTqJBGcIUeo7xpjl3BRL2nPdlcxZqXfNmcDe2qqvXwOe/nRDd5k792H/CXTUuxBY0ytQ
 Z3gXx+zKgG2ho4HkipvUlrZWk6MzEMyJUHBpGxMyNt00q4hqD1CJXIMokDXdyFXvqdY1
 KZNg==
MIME-Version: 1.0
X-Received: by 10.180.19.234 with SMTP id i10mr7995696wie.28.1414600415661;
 Wed, 29 Oct 2014 09:33:35 -0700 (PDT)
Reply-To: attilio@FreeBSD.org
Sender: asmrookie@gmail.com
Received: by 10.217.69.73 with HTTP; Wed, 29 Oct 2014 09:33:35 -0700 (PDT)
In-Reply-To: <201410291059.16829.jhb@freebsd.org>
References: <20141028025222.GA19223@dft-labs.eu>
 <20141028175318.709d2ef6@bender.lan>
 <CAJ-FndCsvLV_B3Q0boyK78980chM79hFf_dRyEqRtxzMJkpD5g@mail.gmail.com>
 <201410291059.16829.jhb@freebsd.org>
Date: Wed, 29 Oct 2014 17:33:35 +0100
X-Google-Sender-Auth: wInE1xvvT49TWCYSJ5g93hdZTYc
Message-ID: <CAJ-FndAxOuA4faFfUUbXkO7aLxNh_EKm6sZ65NE9EnU903GEOQ@mail.gmail.com>
Subject: Re: atomic ops
From: Attilio Rao <attilio@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Alan Cox <alc@rice.edu>, Andrew Turner <andrew@fubar.geek.nz>,
 Konstantin Belousov <kib@freebsd.org>,
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 16:33:38 -0000

On Wed, Oct 29, 2014 at 3:59 PM, John Baldwin <jhb@freebsd.org> wrote:
> On Tuesday, October 28, 2014 4:08:27 pm Attilio Rao wrote:
>> On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner <andrew@fubar.geek.nz> wrote:
>> > On Tue, 28 Oct 2014 15:33:06 +0100
>> > Attilio Rao <attilio@freebsd.org> wrote:
>> >> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz>
>> >> wrote:
>> >> > On Tue, 28 Oct 2014 14:18:41 +0100
>> >> > Attilio Rao <attilio@freebsd.org> wrote:
>> >> >
>> >> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
>> >> >> wrote:
>> >> >> > As was mentioned sometime ago, our situation related to atomic
>> >> >> > ops is not ideal.
>> >> >> >
>> >> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
>> >> >> > provide full memory barriers, which is stronger than needed.
>> >> >> >
>> >> >> > Moreover, load is implemented as lock cmpchg on var address, so
>> >> >> > it is addditionally slower especially when cpus compete.
>> >> >>
>> >> >> I already explained this once privately: fully memory barriers is
>> >> >> not stronger than needed.
>> >> >> FreeBSD has a different semantic than Linux. We historically
>> >> >> enforce a full barrier on _acq() and _rel() rather then just a
>> >> >> read and write barrier, hence we need a different implementation
>> >> >> than Linux. There is code that relies on this property, like the
>> >> >> locking primitives (release a mutex, for instance).
>> >> >
>> >> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
>> >> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has
>> >> > added support for load-acquire and store-release atomic
>> >> > instructions. For the use in atomic instructions we can assume
>> >> > these only operate of the address passed to them.
>> >> >
>> >> > It is unlikely we will use them in the 32-bit port however I would
>> >> > like to know the expected semantics of these atomic functions to
>> >> > make sure we get them correct in the arm64 port. I have been
>> >> > advised by one of the ARM Linux kernel maintainers on the problems
>> >> > they have found using these instructions but have yet to determine
>> >> > what our atomic functions guarantee.
>> >>
>> >> For FreeBSD the "reference doc" is atomic(9).
>> >> It clearly states:
>> >
>> > There may also be a difference between what it states, how they are
>> > implemented, and what developers assume they do. I'm trying to make
>> > sure I get them correct.
>>
>> atomic(9) is our reference so there might be no difference between
>> what it states and what all architectures implement.
>> I can say that x86 follows atomic(9) well. I'm not competent enough to
>> judge if all the !x86 arches follow it completely.
>> I can understand that developers may get confused. The FreeBSD scheme
>> is pretty unique. It comes from the fact that historically the membar
>> support was made to initially support x86. The super-widespread Linux
>> design, instead, tried to catch all architectures in its description.
>> It become very well known and I think it also "pushed" for companies
>> like Intel to invest in improving performance of things like explicit
>> read/write barriers, etc.
>
> Actually, it was designed to support ia64 (and specifically the .acq and
> .rel modifiers on the ld, st, and cmpxchg instructions).  Some of the
> langage is wrong (and is my fault) in that they are not "read" and
> "write" barriers.  They truly are "acquire" and "release".  That said,
> x86 has stronger barriers than that, partly because on i386 there wasn't
> a whole lot of options (though atomic_store_rel on even i386 should just
> be a simple store).
>
>> >> The second variant of each operation includes a read memory barrier.
>> >> This barrier ensures that the effects of this operation are completed
>> >> before the effects of any later data accesses.  As a result, the
>> >> operation is said to have acquire semantics as it acquires a
>> >> pseudo-lock requiring further operations to wait until it has
>> >> completed.  To denote this, the suffix ``_acq'' is inserted into the
>> >> function name immediately prior to the ``_<type>'' suffix.  For
>> >> example, to subtract two integers ensuring that any later writes will
>> >> happen after the subtraction is performed, use
>> >> atomic_subtract_acq_int().
>> >
>> > It depends on the point we guarantee the acquire barrier to be. On ARMv8
>> > the function will be a load/modify/write sequence. If we use a
>> > load-acquire operation for atomic_subtract_acq_int, for example, for a
>> > pointer P and value to subtract X:
>> >
>> > loop:
>> >  load-acquire *P to N
>> >  perform N = N - X
>> >  store-exclusive N to *P
>> >  if the store failed goto loop
>> >
>> > where N and X are both registers.
>> >
>> > This will mean no access after this loop will happen before it, but
>> > they may happen within it, e.g. if there was a later access A the
>> > following may be possible:
>> >
>> > Load P
>> > Access A
>> > Store P
>>
>> No, this will be broken in FreeBSD if "Access A" is later.
>> If "Access A" is prior the membar it doesn't really matter if it gets
>> interleaved with any of the operations in the atomic instruction.
>> Ideally, it could even surpass the Store P itself.
>> But if "Access A" is later (and you want to implement an _acq()
>> barrier) then it cannot absolutely gets in the middle of the atomic_*
>> operation.
>
> Eh, that isn't broken.  It is subtle however.  The reason it isn't broken
> is that if any access to P occurs afer the 'load P', then the store will
> fail and the load-acquire will be retried, if A was accessed during the
> atomi op, the load-acquire during the try will discard that and force A
> to be re-accessed.  If P is not accessed during the atomic op, then it is
> safe to access A during the atomic op itself.

This is specific to armv8, which I know nothing about. Good to know. From
a general point of view, the description didn't seem OK.

>> > We know the store will happen as if it fails, e.g. another processor
>> > access *P, the store will have failed and will iterate over the loop.
>> >
>> > The other point is we can guarantee any store-release, and therefore
>> > any prior access, has happened before a later load-acquire even if it's
>> > on another processor.
>>
>> No, we can never guarantee on the visibility of the operations by other CPUs.
>> We just make guarantee on how the operations are posted on the system
>> bus (or how they are locally visible).
>> Keeping in mind that FreeBSD model cames from x86, you can sense that
>> some things are sized on the x86 model, which doesn't have any rule or
>> ordering on global visibility of the operations.
>
> 1) Again, it's actually based on ia64.
>
> 2) x86 _does_ have rules on ordering of global visiblity in that most
>    stores (aside from some SSE special cases) will become visible in
>    program order.  Now, you can't force the _timing_ of when the stores
>    become visible (and this is true in general, in MI code you can't
>    assume that a barrier is equivalent to a cache flush).

Yes, this is what I mean. You can't have a guarantee on the global
timing of the memory accesses.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 16:58:20 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CDE9EFBF;
 Wed, 29 Oct 2014 16:58:20 +0000 (UTC)
Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 8B9166AE;
 Wed, 29 Oct 2014 16:58:20 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XjWZe-0005En-AH; Wed, 29 Oct 2014 16:58:18 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s9TGwGP8081108;
 Wed, 29 Oct 2014 10:58:16 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX18JTPxJ34qVhZExi5qdQV6H
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: atomic ops
From: Ian Lepore <ian@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <201410291059.16829.jhb@freebsd.org>
References: <20141028025222.GA19223@dft-labs.eu>
 <20141028175318.709d2ef6@bender.lan>
 <CAJ-FndCsvLV_B3Q0boyK78980chM79hFf_dRyEqRtxzMJkpD5g@mail.gmail.com>
 <201410291059.16829.jhb@freebsd.org>
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 29 Oct 2014 10:58:15 -0600
Message-ID: <1414601895.17308.89.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Alan Cox <alc@rice.edu>, Andrew Turner <andrew@fubar.geek.nz>,
 attilio@freebsd.org, Konstantin Belousov <kib@freebsd.org>,
 freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 16:58:20 -0000

On Wed, 2014-10-29 at 10:59 -0400, John Baldwin wrote:
> On Tuesday, October 28, 2014 4:08:27 pm Attilio Rao wrote:
> > On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner <andrew@fubar.geek.nz> wrote:
> > > On Tue, 28 Oct 2014 15:33:06 +0100
> > > Attilio Rao <attilio@freebsd.org> wrote:
> > >> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz>
> > >> wrote:
> > >> > On Tue, 28 Oct 2014 14:18:41 +0100
> > >> > Attilio Rao <attilio@freebsd.org> wrote:
> > >> >
> > >> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
> > >> >> wrote:
> > >> >> > As was mentioned sometime ago, our situation related to atomic
> > >> >> > ops is not ideal.
> > >> >> >
> > >> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
> > >> >> > provide full memory barriers, which is stronger than needed.
> > >> >> >
> > >> >> > Moreover, load is implemented as lock cmpchg on var address, so
> > >> >> > it is addditionally slower especially when cpus compete.
> > >> >>
> > >> >> I already explained this once privately: fully memory barriers is
> > >> >> not stronger than needed.
> > >> >> FreeBSD has a different semantic than Linux. We historically
> > >> >> enforce a full barrier on _acq() and _rel() rather then just a
> > >> >> read and write barrier, hence we need a different implementation
> > >> >> than Linux. There is code that relies on this property, like the
> > >> >> locking primitives (release a mutex, for instance).
> > >> >
> > >> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> > >> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has
> > >> > added support for load-acquire and store-release atomic
> > >> > instructions. For the use in atomic instructions we can assume
> > >> > these only operate of the address passed to them.
> > >> >
> > >> > It is unlikely we will use them in the 32-bit port however I would
> > >> > like to know the expected semantics of these atomic functions to
> > >> > make sure we get them correct in the arm64 port. I have been
> > >> > advised by one of the ARM Linux kernel maintainers on the problems
> > >> > they have found using these instructions but have yet to determine
> > >> > what our atomic functions guarantee.
> > >>
> > >> For FreeBSD the "reference doc" is atomic(9).
> > >> It clearly states:
> > >
> > > There may also be a difference between what it states, how they are
> > > implemented, and what developers assume they do. I'm trying to make
> > > sure I get them correct.
> > 
> > atomic(9) is our reference so there might be no difference between
> > what it states and what all architectures implement.
> > I can say that x86 follows atomic(9) well. I'm not competent enough to
> > judge if all the !x86 arches follow it completely.
> > I can understand that developers may get confused. The FreeBSD scheme
> > is pretty unique. It comes from the fact that historically the membar
> > support was made to initially support x86. The super-widespread Linux
> > design, instead, tried to catch all architectures in its description.
> > It become very well known and I think it also "pushed" for companies
> > like Intel to invest in improving performance of things like explicit
> > read/write barriers, etc.
> 
> Actually, it was designed to support ia64 (and specifically the .acq and
> .rel modifiers on the ld, st, and cmpxchg instructions).  Some of the
> langage is wrong (and is my fault) in that they are not "read" and
> "write" barriers.  They truly are "acquire" and "release".  That said,
> x86 has stronger barriers than that, partly because on i386 there wasn't
> a whole lot of options (though atomic_store_rel on even i386 should just
> be a simple store).
> 
> > >> The second variant of each operation includes a read memory barrier.
> > >> This barrier ensures that the effects of this operation are completed
> > >> before the effects of any later data accesses.  As a result, the
> > >> operation is said to have acquire semantics as it acquires a
> > >> pseudo-lock requiring further operations to wait until it has
> > >> completed.  To denote this, the suffix ``_acq'' is inserted into the
> > >> function name immediately prior to the ``_<type>'' suffix.  For
> > >> example, to subtract two integers ensuring that any later writes will
> > >> happen after the subtraction is performed, use
> > >> atomic_subtract_acq_int().
> > >
> > > It depends on the point we guarantee the acquire barrier to be. On ARMv8
> > > the function will be a load/modify/write sequence. If we use a
> > > load-acquire operation for atomic_subtract_acq_int, for example, for a
> > > pointer P and value to subtract X:
> > >
> > > loop:
> > >  load-acquire *P to N
> > >  perform N = N - X
> > >  store-exclusive N to *P
> > >  if the store failed goto loop
> > >
> > > where N and X are both registers.
> > >
> > > This will mean no access after this loop will happen before it, but
> > > they may happen within it, e.g. if there was a later access A the
> > > following may be possible:
> > >
> > > Load P
> > > Access A
> > > Store P
> > 
> > No, this will be broken in FreeBSD if "Access A" is later.
> > If "Access A" is prior the membar it doesn't really matter if it gets
> > interleaved with any of the operations in the atomic instruction.
> > Ideally, it could even surpass the Store P itself.
> > But if "Access A" is later (and you want to implement an _acq()
> > barrier) then it cannot absolutely gets in the middle of the atomic_*
> > operation.
> 
> Eh, that isn't broken.  It is subtle however.  The reason it isn't broken
> is that if any access to P occurs afer the 'load P', then the store will
> fail and the load-acquire will be retried, if A was accessed during the
> atomi op, the load-acquire during the try will discard that and force A
> to be re-accessed.  If P is not accessed during the atomic op, then it is
> safe to access A during the atomic op itself.
> 

I'm not sure I completely agree with all of this. 

First, for 

        if any access to P occurs after the 'load P', then the store will
        fail and the load-acquire will be retried

The term 'access' needs to be changed to 'store'.  Other read accesses
to P will not cause the store-exclusive to fail.

Next, when we consider 'Access A' I'm not sure it's true that the access
will replay if the store-exclusive fails and the operation loops.  The
access to A may have been a prefetch, even a prefetch for data on a
predicted upcoming execution branch which may or may not end up being
taken.

I think the only thing that makes an ldrex/strex sequence safe for use
in implementing synchronization primitives is to insert a 'dmb' after
the acquire loop (after the strex succeeds), and 'dsb' before the
release loop (dsb is required for SMP, dmb might be good enough on UP).
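
For illustration, here is a minimal sketch of that barrier placement,
written with C11 <stdatomic.h> rather than ARM assembly.  The names
sketch_lock, sketch_lock_acquire() and sketch_lock_release() are
hypothetical, the relaxed compare-and-swap loop only stands in for an
ldrex/strex loop, and whether a compiler lowers the fences to dmb or dsb
is outside what this sketch shows:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
struct sketch_lock {
	atomic_uint locked;
};

static void
sketch_lock_acquire(struct sketch_lock *l)
{
	unsigned int expected;

	/* The relaxed CAS loop stands in for the ldrex/strex retry loop. */
	do {
		expected = 0;
	} while (!atomic_compare_exchange_weak_explicit(&l->locked,
	    &expected, 1, memory_order_relaxed, memory_order_relaxed));

	/* Barrier after the loop: later accesses may not move above it. */
	atomic_thread_fence(memory_order_acquire);
}

static void
sketch_lock_release(struct sketch_lock *l)
{
	/* Releasing a lock that is not held is a caller bug. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);

	/* Barrier before the store: earlier accesses are ordered first. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}
```

The point of the split is that the loop itself carries no ordering; all
of the ordering comes from the explicit fence placed after the acquire
loop and before the release store.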

Looking into this has made me realize our current armv6/7 atomics are
incorrect in this regard.  Guess I'll see about fixing them up Real Soon
Now.  :)

-- Ian

> > > We know the store will happen because, if it fails, e.g. another processor
> > > accesses *P, the loop iterates and the store is retried.
> > >
> > > The other point is we can guarantee any store-release, and therefore
> > > any prior access, has happened before a later load-acquire even if it's
> > > on another processor.
> > 
> > No, we can never guarantee the visibility of the operations by other CPUs.
> > We only guarantee how the operations are posted on the system
> > bus (or how they are locally visible).
> > Keep in mind that the FreeBSD model comes from x86, so you can sense that
> > some things are modelled on x86, which doesn't have any rules about
> > ordering of global visibility of the operations.
> 
> 1) Again, it's actually based on ia64.
> 
> 2) x86 _does_ have rules on ordering of global visibility in that most
>    stores (aside from some SSE special cases) will become visible in
>    program order.  Now, you can't force the _timing_ of when the stores
>    become visible (and this is true in general, in MI code you can't
>    assume that a barrier is equivalent to a cache flush).
> 
> 3) In this case I think Andrew is using "armv8" for "we" and you can
>    depend on architecture-specific semantics to determine the implementation
>    of atomic(9).
> 


From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 17:36:38 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A683DB13;
 Wed, 29 Oct 2014 17:36:38 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7CADFB18;
 Wed, 29 Oct 2014 17:36:38 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6DC03B97F;
 Wed, 29 Oct 2014 13:36:37 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Ian Lepore <ian@freebsd.org>
Subject: Re: atomic ops
Date: Wed, 29 Oct 2014 13:35:57 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141028025222.GA19223@dft-labs.eu>
 <201410291059.16829.jhb@freebsd.org>
 <1414601895.17308.89.camel@revolution.hippie.lan>
In-Reply-To: <1414601895.17308.89.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410291335.57919.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Wed, 29 Oct 2014 13:36:37 -0400 (EDT)
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Alan Cox <alc@rice.edu>, Andrew Turner <andrew@fubar.geek.nz>,
 attilio@freebsd.org, Konstantin Belousov <kib@freebsd.org>,
 freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 17:36:38 -0000

On Wednesday, October 29, 2014 12:58:15 pm Ian Lepore wrote:
> On Wed, 2014-10-29 at 10:59 -0400, John Baldwin wrote:
> > Eh, that isn't broken.  It is subtle however.  The reason it isn't broken
> > is that if any access to P occurs after the 'load P', then the store will
> > fail and the load-acquire will be retried.  If A was accessed during the
> > atomic op, the load-acquire during the retry will discard that and force A
> > to be re-accessed.  If P is not accessed during the atomic op, then it is
> > safe to access A during the atomic op itself.
> > 
> 
> I'm not sure I completely agree with all of this. 
> 
> First, for 
> 
>         if any access to P occurs after the 'load P', then the store will
>         fail and the load-acquire will be retried
> 
> The term 'access' needs to be changed to 'store'.  Other read accesses
> to P will not cause the store-exclusive to fail.

Correct, though for the places where acquire is used I believe that is ok.
Certainly for lock cookies it is ok.  It's writes to the lock cookie that
would invalidate 'A'.

> Next, when we consider 'Access A' I'm not sure it's true that the access
> will replay if the store-exclusive fails and the operation loops.  The
> access to A may have been a prefetch, even a prefetch for data on a
> predicted upcoming execution branch which may or may not end up being
> taken.
> 
> I think the only thing that makes an ldrex/strex sequence safe for use
> in implementing synchronization primitives is to insert a 'dmb' after
> the acquire loop (after the strex succeeds), and 'dsb' before the
> release loop (dsb is required for SMP, dmb might be good enough on UP).
> 
> Looking into this has made me realize our current armv6/7 atomics are
> incorrect in this regard.  Guess I'll see about fixing them up Real Soon
> Now.  :)

I'm not actually sure either, but it would be surprising to me otherwise.
Presumably there is nothing magic about a branch.  Either the load-acquire
is an acquire barrier or it isn't.  Namely, suppose you had this sequence:

	load-acquire P
	access A (prefetch)
	load-acquire Q
	load A

Would you expect the prefetch to satisfy the load or should the load-acquire
on Q discard that?  Having a branch after a failing conditional store back
to the load acquire should work similarly.  It has to discard anything that
was prefetched or it isn't an actual load-acquire.

That is, consider:

1:
	load-acquire P
	access A (prefetch)
	conditional-store P
	branch-if-fail 1b
	load A

In the case that the conditional store fails and the branch is taken, the
sequence of operations is:

	load-acquire P
	access A (prefetch)
	conditional-store P
	branch
	load-acquire P

That should be equivalent to the first sequence above unless the branch
instruction has the magical property of disabling memory barriers on the
instruction after a branch (which would be insane).
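
A minimal C11 sketch of that loop shape, assuming a compare-and-swap
stands in for the conditional store.  sketch_subtract_acq_int() is a
hypothetical name, not the atomic(9) implementation, and the C11 memory
model is only an approximation of what the hardware does here:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for atomic_subtract_acq_int(); not kernel code. */
static void
sketch_subtract_acq_int(atomic_uint *p, unsigned int x)
{
	unsigned int n;

	assert(p != NULL);
	do {
		/* Load-acquire *P to N; re-executed on every retry. */
		n = atomic_load_explicit(p, memory_order_acquire);
		/*
		 * Conditional store of N - X.  It fails if *P no longer
		 * equals N (or spuriously), which sends us back to the
		 * load-acquire above.
		 */
	} while (!atomic_compare_exchange_weak_explicit(p, &n, n - x,
	    memory_order_acquire, memory_order_relaxed));
}
```

Each failed conditional store sends control back to the acquire load, so
the value used for the subtraction always comes from the most recent
load-acquire.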

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 17:50:29 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id E62AD21B
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 17:50:29 +0000 (UTC)
Received: from mail-yk0-f177.google.com (mail-yk0-f177.google.com
 [209.85.160.177])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A9E6CC98
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 17:50:29 +0000 (UTC)
Received: by mail-yk0-f177.google.com with SMTP id 79so1505210ykr.8
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 10:50:23 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=7UNUoQuwZvnxRXz4FABMxYW4GBXDMSDjLiV2waYoVZI=;
 b=jaouMgRlQgwrCPw4bjja0rjSKJjCjRGBwZlHAcqOrzCnEsHds7LDeXSseJVkMRlZtr
 LN1XidwKBWJKYf3PBuAxSvnbXeyu/Ka+K//tbk2nAGDrw1p4KMsrojZiQV7zyuqrFrsN
 g9dffXLpGg90UmytBoqU62dmIKFi8HpngWvhw7V1du5vt2shsH3A66dCh71rkYpo4kan
 6ojTUKKtUSzlc/AXaDHJZDKewuCOV5/e41mBnvUSroKYyTYKopGLuCBQ0vTwcjdvDj5V
 qcpjvQ4Uwy+vzmKxNbxaejTfmfWXeD1LYnNfYQY/HXIEJzAeA1LU/K/g8IAcUwCOvI3Q
 OsIg==
X-Gm-Message-State: ALoCoQntSTM/4ZSgYseEc5bo7QWGwZb+sgB2/Xt4t/2EaZpzmMqdeeHk/RuC4o6Wd+7dSCGtMACd
MIME-Version: 1.0
X-Received: by 10.170.233.6 with SMTP id z6mr3453134ykf.101.1414605023070;
 Wed, 29 Oct 2014 10:50:23 -0700 (PDT)
Received: by 10.170.46.203 with HTTP; Wed, 29 Oct 2014 10:50:22 -0700 (PDT)
X-Originating-IP: [62.165.198.134]
In-Reply-To: <201410281146.49370.jhb@freebsd.org>
References: <20141021094539.GA1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx>
 <20141027165557.GC1877@kib.kiev.ua>
 <201410281146.49370.jhb@freebsd.org>
Date: Wed, 29 Oct 2014 18:50:22 +0100
Message-ID: <CAPQ4ffsXa4BOHWJt_YhPOSDu5KQpUf0oVcMoiAFCxyR9YVKCdQ@mail.gmail.com>
Subject: Re: RfC: fueword(9) and casueword(9)
From: Oliver Pinter <oliver.pinter@hardenedbsd.org>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 17:50:30 -0000

On Tue, Oct 28, 2014 at 4:46 PM, John Baldwin <jhb@freebsd.org> wrote:
> On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote:
>> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote:
>> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
>> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
>> > > > A new API should try to fix these __DEVOLATILE() abominations.  I think it
>> > > > is safe, and even correct, to declare the pointers as volatile const void
>> > > > *, since the functions really can handle volatile data, unlike copyin().
>> > > >
>> > > > Atomic op functions are declared as taking pointers to volatile for
>> > > > similar reasons.  Often they are applied to non-volatile data, but
>> > > > adding a qualifier is type-safe and doesn't cost efficiency since the
>> > > > pointer access is not known to the compiler.  (The last point is not
>> > > > so clear -- the compiler can see things in the functions since they are
>> > > > inline asm.  fueword() isn't inline so its (in)efficiency is not changed.)
>> > > >
>> > > > The atomic read functions are not declared as taking pointers to const.
>> > > > The __DECONST() abomination might be used to work around this bug.
>> > >
>> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the
>> > > umtx structure definitions.  I think that it is a bug to mark the lock
>> > > words with volatile.  I want the fueword(9) interface to be as similar
>> > > as possible to fuword(9); in particular, volatile seems not to be needed.
>> >
>> > I agree with Bruce here.  casuword() already accepts volatile.  I also
>> > think umtx is correct in marking the field as volatile.  They are subject
>> > to change without the compiler's knowledge albeit by other threads
>> > rather than signal handlers.  Having them marked volatile doesn't really
>> > matter for the kernel, but the header is also used in userland and is
>> > relevant in sem_new.c, etc.
>>
>> You agree with making fueword() accept volatile const void * as the
>> address ?  Or do you agree with the existence of the volatile type
>> qualifier for the lock field of umtx structures ?
>
> I agree with both (I thought Bruce only asserted the first).
>
>> I definitely do not want to make fueword() different from fuword() in
>> this aspect.  If changing both fueword() and fuword() to take volatile
>> const * address, this should be a different patch.
>
> I also agree that fuword() and fueword() should take identical arguments,
> so if this change is made it should be a separate patch (and should include
> suword()).
>
> --
> John Baldwin
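
As a side note on the qualifier question quoted above, here is a minimal
userland sketch of why widening the parameter to volatile const void * is
type-safe.  sketch_fetch_word() is a hypothetical stand-in, its
int-return-plus-out-parameter shape is an assumption made for this
sketch, and it does not model the user-space access or faulting that the
real fetch(9) routines deal with:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userland stand-in for a fetch routine.  The parameter is
 * declared volatile const void *, so callers may pass pointers to plain,
 * const, or volatile objects without any cast.
 */
static int
sketch_fetch_word(volatile const void *base, long *val)
{
	assert(base != NULL && val != NULL);
	/* Read through a volatile-qualified pointer; one access is made. */
	*val = *(volatile const long *)base;
	return (0);
}
```

Both a plain long and a volatile long lock word can be passed as-is;
the reverse direction (a volatile pointer into a function declared with
const void *) is what forces a __DEVOLATILE() cast.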

Hi Konstantin!

I got this error with clang_complete + vim:

"/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c"
286L, 8326Csem_wait: Operation not supported

                                            sem_wait: Operation not
supported


      Fatal Python error: PyEval_SaveThread: NULL tstate
Vim: Caught deadly signal ABRT
Vim: Finished.
Abort (core dumped)

It's on recent HEAD + HardenedBSD patches, so I need to check whether
this is caused by HardenedBSD's changes or yours.

I don't see this problem on the HardenedBSD build that was built on Oct. 23:
[1] FreeBSD 11.0-CURRENT #0 0c61f55(hardened/current/master): Thu Oct
23 09:04:50 CEST 2014
[1]     op@hardenedbsd:/usr/obj/usr/src/sys/HARDENEDBSD amd64

(I am currently building a new kernel based on sources from before the
fueword changes)

If you need help, please ping me.


> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 17:54:08 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 003D9478
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 17:54:07 +0000 (UTC)
Received: from mail-yh0-f43.google.com (mail-yh0-f43.google.com
 [209.85.213.43])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id B60CAD69
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 17:54:07 +0000 (UTC)
Received: by mail-yh0-f43.google.com with SMTP id z6so824985yhz.2
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 10:54:01 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=Y+BJt6sDSKi/4TJ+XBl0MzDsv6bacnziejLt6tWUs28=;
 b=B3jHuFuR5hoYtfsHxUUsqaUR9447KcC5fpD1DCZEsn/T6RgtvAf8P1KhyFlgQaJJTf
 iZiqPIEwvzR6SnXin5Xg3ij2sROnCSJUGmb79qEneNjFbCezgYhkBjFudy2+b2CTeRt5
 6/xQW/q8pPXYNKPiAZcp9lpagzkwD7RPqY1VTpxrem9bxDNcrfiV3+9bPkczGk7NNORl
 wiETaWvWK7B0I77b6BZPNaVQGj5HWMpp1u1RzHv8wzRK6DHkgugtbTQ4dM7uvTEXFJ0F
 wXSud5N00e226ag6rMi7kne3IvYwrUAiLojEuVyKL3GrWSmJ+YOO2KOgtCRRwiQ8oCAn
 5gbw==
X-Gm-Message-State: ALoCoQkhvu9QY0apo8WE7ets7gts2IYKdj2HYsMboIaumGT1c258++QyyF/NcEPuym0rk2W2sBkR
MIME-Version: 1.0
X-Received: by 10.236.14.229 with SMTP id d65mr3172507yhd.45.1414605241344;
 Wed, 29 Oct 2014 10:54:01 -0700 (PDT)
Received: by 10.170.46.203 with HTTP; Wed, 29 Oct 2014 10:54:00 -0700 (PDT)
X-Originating-IP: [62.165.198.134]
In-Reply-To: <CAPQ4ffsXa4BOHWJt_YhPOSDu5KQpUf0oVcMoiAFCxyR9YVKCdQ@mail.gmail.com>
References: <20141021094539.GA1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx>
 <20141027165557.GC1877@kib.kiev.ua>
 <201410281146.49370.jhb@freebsd.org>
 <CAPQ4ffsXa4BOHWJt_YhPOSDu5KQpUf0oVcMoiAFCxyR9YVKCdQ@mail.gmail.com>
Date: Wed, 29 Oct 2014 18:54:00 +0100
Message-ID: <CAPQ4ffsSPtRyQD==WROCR6Shmm6d=N_6oS8zoJEcio9fCi1Amw@mail.gmail.com>
Subject: Re: RfC: fueword(9) and casueword(9)
From: Oliver Pinter <oliver.pinter@hardenedbsd.org>
To: John Baldwin <jhb@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 17:54:08 -0000

On Wed, Oct 29, 2014 at 6:50 PM, Oliver Pinter
<oliver.pinter@hardenedbsd.org> wrote:
> On Tue, Oct 28, 2014 at 4:46 PM, John Baldwin <jhb@freebsd.org> wrote:
>> On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote:
>>> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote:
>>> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
>>> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
>>> > > > A new API should try to fix these __DEVOLATILE() abominations.  I think it
>>> > > > is safe, and even correct, to declare the pointers as volatile const void
>>> > > > *, since the functions really can handle volatile data, unlike copyin().
>>> > > >
>>> > > > Atomic op functions are declared as taking pointers to volatile for
>>> > > > similar reasons.  Often they are applied to non-volatile data, but
>>> > > > adding a qualifier is type-safe and doesn't cost efficiency since the
>>> > > > pointer access is not known to the compiler.  (The last point is not
>>> > > > so clear -- the compiler can see things in the functions since they are
>>> > > > inline asm.  fueword() isn't inline so its (in)efficiency is not changed.)
>>> > > >
>>> > > > The atomic read functions are not declared as taking pointers to const.
>>> > > > The __DECONST() abomination might be used to work around this bug.
>>> > >
>>> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the
>>> > > umtx structure definitions.  I think that it is a bug to mark the lock
>>> > > words with volatile.  I want the fueword(9) interface to be as similar
>>> > > as possible to fuword(9); in particular, volatile seems not to be needed.
>>> >
>>> > I agree with Bruce here.  casuword() already accepts volatile.  I also
>>> > think umtx is correct in marking the field as volatile.  They are subject
>>> > to change without the compiler's knowledge albeit by other threads
>>> > rather than signal handlers.  Having them marked volatile doesn't really
>>> > matter for the kernel, but the header is also used in userland and is
>>> > relevant in sem_new.c, etc.
>>>
>>> You agree with making fueword() accept volatile const void * as the
>>> address ?  Or do you agree with the existence of the volatile type
>>> qualifier for the lock field of umtx structures ?
>>
>> I agree with both (I thought Bruce only asserted the first).
>>
>>> I definitely do not want to make fueword() different from fuword() in
>>> this aspect.  If changing both fueword() and fuword() to take volatile
>>> const * address, this should be a different patch.
>>
>> I also agree that fuword() and fueword() should take identical arguments,
>> so if this change is made it should be a separate patch (and should include
>> suword()).
>>
>> --
>> John Baldwin
>
> Hi Konstantin!
>
> I got this error with clang_complete + vim:
>
> "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c"
> 286L, 8326Csem_wait: Operation not supported
>
>                                             sem_wait: Operation not
> supported
>
>
>       Fatal Python error: PyEval_SaveThread: NULL tstate
> Vim: Caught deadly signal ABRT
> Vim: Finished.
> Abort (core dumped)
>
> It's on recent HEAD + HardenedBSD patches, so I need to check whether
> this is caused by HardenedBSD's changes or yours.
>
> I don't see this problem on the HardenedBSD build that was built on Oct. 23:
> [1] FreeBSD 11.0-CURRENT #0 0c61f55(hardened/current/master): Thu Oct
> 23 09:04:50 CEST 2014
> [1]     op@hardenedbsd:/usr/obj/usr/src/sys/HARDENEDBSD amd64
>
> (I am currently building a new kernel based on sources from before the
> fueword changes)
>
> If you need help, please ping me.

gdb vim

r ...

"/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c"
286L, 8326C(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...(no debugging symbols
found)...(no debugging symbols found)...sem_wait: Operation not
supported


sem_wait: Operation not supported
   Fatal Python error: PyEval_SaveThread: NULL tstate

Program received signal SIGABRT, Aborted.
0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
(gdb) bt
#0  0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
#1  0x00000009f5c76849 in abort () from /lib/libc.so.7
#2  0x00000009f566c031 in Py_FatalError () from /usr/local/lib/libpython2.7.so.1
#3  0x00000009f56448f1 in PyEval_SaveThread () from
/usr/local/lib/libpython2.7.so.1
#4  0x00000009f79ceef5 in _PyTime_FloatTime () from
/usr/local/lib/python2.7/lib-dynload/time.so
#5  0x00000009f564a31b in PyEval_EvalFrameEx () from
/usr/local/lib/libpython2.7.so.1
#6  0x00000009f564cb42 in _PyEval_SliceIndex () from
/usr/local/lib/libpython2.7.so.1
#7  0x00000009f564862b in PyEval_EvalFrameEx () from
/usr/local/lib/libpython2.7.so.1
#8  0x00000009f564cb42 in _PyEval_SliceIndex () from
/usr/local/lib/libpython2.7.so.1
#9  0x00000009f564862b in PyEval_EvalFrameEx () from
/usr/local/lib/libpython2.7.so.1
#10 0x00000009f56452d4 in PyEval_EvalCodeEx () from
/usr/local/lib/libpython2.7.so.1
#11 0x00000009f55d63bc in PyFunction_SetClosure () from
/usr/local/lib/libpython2.7.so.1
#12 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
#13 0x00000009f55becc3 in PyMethod_New () from /usr/local/lib/libpython2.7.so.1
#14 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
#15 0x00000009f564c28d in PyEval_CallObjectWithKeywords () from
/usr/local/lib/libpython2.7.so.1
#16 0x00000009f5681916 in initthread () from /usr/local/lib/libpython2.7.so.1
#17 0x00000009f59274f5 in pthread_create () from /lib/libthr.so.3
#18 0x0000000000000000 in ?? ()



From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 18:03:54 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A28CA86F;
 Wed, 29 Oct 2014 18:03:54 +0000 (UTC)
Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 60E91E9E;
 Wed, 29 Oct 2014 18:03:54 +0000 (UTC)
Received: from [73.34.117.227] (helo=ilsoft.org)
 by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
 (Exim 4.72) (envelope-from <ian@FreeBSD.org>)
 id 1XjXb6-000ADo-SF; Wed, 29 Oct 2014 18:03:53 +0000
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by ilsoft.org (8.14.9/8.14.9) with ESMTP id s9TI3osa081247;
 Wed, 29 Oct 2014 12:03:50 -0600 (MDT) (envelope-from ian@FreeBSD.org)
X-Mail-Handler: Dyn Standard SMTP by Dyn
X-Originating-IP: 73.34.117.227
X-Report-Abuse-To: abuse@dyndns.com (see
 http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse
 reporting information)
X-MHO-User: U2FsdGVkX1+2a/JXp6EOyczZ1i5aJJC0
X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan
 [172.22.42.240] claimed to be [172.22.42.240]
Subject: Re: atomic ops
From: Ian Lepore <ian@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <201410291335.57919.jhb@freebsd.org>
References: <20141028025222.GA19223@dft-labs.eu>
 <201410291059.16829.jhb@freebsd.org>
 <1414601895.17308.89.camel@revolution.hippie.lan>
 <201410291335.57919.jhb@freebsd.org>
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 29 Oct 2014 12:03:50 -0600
Message-ID: <1414605830.17308.100.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Alan Cox <alc@rice.edu>, Andrew Turner <andrew@fubar.geek.nz>,
 attilio@freebsd.org, Konstantin Belousov <kib@freebsd.org>,
 freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 18:03:54 -0000

On Wed, 2014-10-29 at 13:35 -0400, John Baldwin wrote:
> On Wednesday, October 29, 2014 12:58:15 pm Ian Lepore wrote:
> > On Wed, 2014-10-29 at 10:59 -0400, John Baldwin wrote:
> > > Eh, that isn't broken.  It is subtle however.  The reason it isn't broken
> > > is that if any access to P occurs after the 'load P', then the store will
> > > fail and the load-acquire will be retried.  If A was accessed during the
> > > atomic op, the load-acquire during the retry will discard that and force A
> > > to be re-accessed.  If P is not accessed during the atomic op, then it is
> > > safe to access A during the atomic op itself.
> > > 
> > 
> > I'm not sure I completely agree with all of this. 
> > 
> > First, for 
> > 
> >         if any access to P occurs after the 'load P', then the store will
> >         fail and the load-acquire will be retried
> > 
> > The term 'access' needs to be changed to 'store'.  Other read accesses
> > to P will not cause the store-exclusive to fail.
> 
> Correct, though for the places where acquire is used I believe that is ok.
> Certainly for lock cookies it is ok.  It's writes to the lock cookie that
> would invalidate 'A'.
> 
> > Next, when we consider 'Access A' I'm not sure it's true that the access
> > will replay if the store-exclusive fails and the operation loops.  The
> > access to A may have been a prefetch, even a prefetch for data on a
> > predicted upcoming execution branch which may or may not end up being
> > taken.
> > 
> > I think the only thing that makes an ldrex/strex sequence safe for use
> > in implementing synchronization primitives is to insert a 'dmb' after
> > the acquire loop (after the strex succeeds), and 'dsb' before the
> > release loop (dsb is required for SMP, dmb might be good enough on UP).
> > 
> > Looking into this has made me realize our current armv6/7 atomics are
> > incorrect in this regard.  Guess I'll see about fixing them up Real Soon
> > Now.  :)
> 
> I'm not actually sure either, but it would be surprising to me otherwise.
> Presumably there is nothing magic about a branch.  Either the load-acquire
> is an acquire barrier or it isn't.  Namely, suppose you had this sequence:
> 
> 	load-acquire P
> 	access A (prefetch)
> 	load-acquire Q
> 	load A
> 
> Would you expect the prefetch to satisfy the load or should the load-acquire
> on Q discard that?  Having a branch after a failing conditional store back
> to the load acquire should work similarly.  It has to discard anything that
> was prefetched or it isn't an actual load-acquire.
> 
> That is, consider:
> 
> 1:
> 	load-acquire P
> 	access A (prefetch)
> 	conditional-store P
> 	branch-if-fail 1b
> 	load A
> 
> In the case that the conditional store fails and the branch is taken, the
> sequence of operations is:
> 
> 	load-acquire P
> 	access A (prefetch)
> 	conditional-store P
> 	branch
> 	load-acquire P
> 
> That should be equivalent to the first sequence above unless the branch
> instruction has the magical property of disabling memory barriers on the
> instruction after a branch (which would be insane).
> 

I hadn't realized it when I wrote that, but Andy was speaking in the
context of armv8, which has a true load-acquire instruction.  In our
current code (armv6 and 7) we need the explicit dmb/dsb barriers to get
the same effect.  (It turns out we do have barriers, I misspoke earlier,
but some of our dmb need to be dsb.)

-- Ian


From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 18:06:42 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8593B9DA;
 Wed, 29 Oct 2014 18:06:42 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0DF87EC6;
 Wed, 29 Oct 2014 18:06:41 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9TI6Z0c023223
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Wed, 29 Oct 2014 20:06:35 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9TI6Z0c023223
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s9TI6ZVU023222;
 Wed, 29 Oct 2014 20:06:35 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 29 Oct 2014 20:06:35 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Oliver Pinter <oliver.pinter@hardenedbsd.org>
Subject: Re: RfC: fueword(9) and casueword(9)
Message-ID: <20141029180635.GJ53947@kib.kiev.ua>
References: <20141021094539.GA1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx>
 <20141027165557.GC1877@kib.kiev.ua>
 <201410281146.49370.jhb@freebsd.org>
 <CAPQ4ffsXa4BOHWJt_YhPOSDu5KQpUf0oVcMoiAFCxyR9YVKCdQ@mail.gmail.com>
 <CAPQ4ffsSPtRyQD==WROCR6Shmm6d=N_6oS8zoJEcio9fCi1Amw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAPQ4ffsSPtRyQD==WROCR6Shmm6d=N_6oS8zoJEcio9fCi1Amw@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 18:06:42 -0000

On Wed, Oct 29, 2014 at 06:54:00PM +0100, Oliver Pinter wrote:
> On Wed, Oct 29, 2014 at 6:50 PM, Oliver Pinter
> <oliver.pinter@hardenedbsd.org> wrote:
> > On Tue, Oct 28, 2014 at 4:46 PM, John Baldwin <jhb@freebsd.org> wrote:
> >> On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote:
> >>> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote:
> >>> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
> >>> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
> >>> > > > A new API should try to fix these __DEVOLATILE() abominations.  I think it
> >>> > > > is safe, and even correct, to declare the pointers as volatile const void
> >>> > > > *, since the functions really can handle volatile data, unlike copyin().
> >>> > > >
> >>> > > > Atomic op functions are declared as taking pointers to volatile for
> >>> > > > similar reasons.  Often they are applied to non-volatile data, but
> >>> > > > adding a qualifier is type-safe and doesn't cost efficiency since the
> >>> > > > pointer access is not known to the compiler.  (The last point is not
> >>> > > > so clear -- the compiler can see things in the functions since they are
> >>> > > > inline asm.  fueword() isn't inline so its (in)efficiency is not changed.)
> >>> > > >
> >>> > > > The atomic read functions are not declared as taking pointers to const.
> >>> > > > The __DECONST() abomination might be used to work around this bug.
> >>> > >
> >>> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the
> >>> > > umtx structures definitions.  I think that it is bug to mark the lock
> >>> > > words with volatile.  I want the fueword(9) interface to be as much
> >>> > > similar to fuword(9), in particular, volatile seems to be not needed.
> >>> >
> >>> > I agree with Bruce here.  casuword() already accepts volatile.  I also
> >>> > think umtx is correct in marking the field as volatile.  They are subject
> >>> > to change without the compiler's knowledge albeit by other threads
> >>> > rather than signal handlers.  Having them marked volatile doesn't really
> >>> > matter for the kernel, but the header is also used in userland and is
> >>> > relevant in sem_new.c, etc.
> >>>
> >>> You agree with making fueword() accept volatile const void * as the
> >>> address ?  Or do you agree with the existence of the volatile type
> >>> qualifier for the lock field of umtx structures ?
> >>
> >> I agree with both (I thought Bruce only asserted the first).
> >>
> >>> I definitely do not want to make fueword() different from fuword() in
> >>> this aspect.  If changing both fueword() and fuword() to take volatile
> >>> const * address, this should be different patch.
> >>
> >> I also agree that fuword() and fueword() should take identical arguments,
> >> so if this change is made it should be a separate patch (and should include
> >> suword()).
> >>
> >> --
> >> John Baldwin
> >
> > Hi Konstantin!
> >
> > I got this error with clang_complete + vim:
> >
> > "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c"
> > 286L, 8326Csem_wait: Operation not supported
> >
> >                                             sem_wait: Operation not
> > supported
> >
> >
> >       Fatal Python error: PyEval_SaveThread: NULL tstate
> > Vim: Caught deadly signal ABRT
> > Vim: Finished.
> > Abort (core dumped)
> >
> > It's on recent HEAD + HardenedBSD patches, so I must check whether
> > this is caused by hbsd's changes or yours.
> >
> > I don't see this problem on a HardenedBSD build from Oct. 23:
> > [1] FreeBSD 11.0-CURRENT #0 0c61f55(hardened/current/master): Thu Oct
> > 23 09:04:50 CEST 2014
> > [1]     op@hardenedbsd:/usr/obj/usr/src/sys/HARDENEDBSD amd64
> >
> > (I am currently building a new kernel from sources that predate the
> > fueword changes)
> >
> > If you need help, please ping me.
> 
> gdb vim
> 
> r ...
> 
> "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c"
> 286L, 8326C(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...(no debugging symbols
> found)...(no debugging symbols found)...sem_wait: Operation not
> supported
> 
> 
> sem_wait: Operation not supported
>    Fatal Python error: PyEval_SaveThread: NULL tstate
> 
> Program received signal SIGABRT, Aborted.
> 0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
> (gdb) bt
> #0  0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
> #1  0x00000009f5c76849 in abort () from /lib/libc.so.7
> #2  0x00000009f566c031 in Py_FatalError () from /usr/local/lib/libpython2.7.so.1
> #3  0x00000009f56448f1 in PyEval_SaveThread () from
> /usr/local/lib/libpython2.7.so.1
> #4  0x00000009f79ceef5 in _PyTime_FloatTime () from
> /usr/local/lib/python2.7/lib-dynload/time.so
> #5  0x00000009f564a31b in PyEval_EvalFrameEx () from
> /usr/local/lib/libpython2.7.so.1
> #6  0x00000009f564cb42 in _PyEval_SliceIndex () from
> /usr/local/lib/libpython2.7.so.1
> #7  0x00000009f564862b in PyEval_EvalFrameEx () from
> /usr/local/lib/libpython2.7.so.1
> #8  0x00000009f564cb42 in _PyEval_SliceIndex () from
> /usr/local/lib/libpython2.7.so.1
> #9  0x00000009f564862b in PyEval_EvalFrameEx () from
> /usr/local/lib/libpython2.7.so.1
> #10 0x00000009f56452d4 in PyEval_EvalCodeEx () from
> /usr/local/lib/libpython2.7.so.1
> #11 0x00000009f55d63bc in PyFunction_SetClosure () from
> /usr/local/lib/libpython2.7.so.1
> #12 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
> #13 0x00000009f55becc3 in PyMethod_New () from /usr/local/lib/libpython2.7.so.1
> #14 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
> #15 0x00000009f564c28d in PyEval_CallObjectWithKeywords () from
> /usr/local/lib/libpython2.7.so.1
> #16 0x00000009f5681916 in initthread () from /usr/local/lib/libpython2.7.so.1
> #17 0x00000009f59274f5 in pthread_create () from /lib/libthr.so.3
> #18 0x0000000000000000 in ?? ()
> 

How could I get a single bit of useful information from this text?

My guess is that you have an old libc and a new kernel compiled without
COMPAT_FREEBSD9 and COMPAT_FREEBSD10.  If this is the cause, it has
nothing to do with my changes.

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 18:10:51 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 1A144BBE
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 18:10:51 +0000 (UTC)
Received: from mail-yh0-f41.google.com (mail-yh0-f41.google.com
 [209.85.213.41])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id CF207FA0
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 18:10:50 +0000 (UTC)
Received: by mail-yh0-f41.google.com with SMTP id b6so844656yha.14
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 11:10:44 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=nIKdmnOOMwfqUrS20ZGqYgtwQh6nWRLbukuilCiwNTU=;
 b=Inwj6hls8aRb0z5VtFP0vBnuOz+qO+dUZp5UlVEiiLUxaY1wuqDaJzym1VBRWIhib3
 6Ny3IzGFFtkUNAJ1+GVgiKjLzbHutqHFiEk41pjINDTI5p9jGYZ1sJ2UdHFP6ntOue57
 1mqGARiZwN6AnhI19zLOcnHGUy983Pt7AnoHT0vgrXFBBwXS79xtWuUS3UpmwrCZItA0
 gFRR85blE79vOSmnhObjKDBi24gal733ChhaEaXuFSuvdWI0auJocmxjQ9Sjjw/ahHmO
 0dN3sEDNdv6T6/MF3jgpyVEfOvkvFgTHuhVfQ21GBl99lzUlWs20o7V9KSAMEcGe0bdK
 tjfA==
X-Gm-Message-State: ALoCoQlIreRa3mffVT0Uz+qArCzNpPmGLzQ0R1bxmvnZCdtrOxrU/IWiDLGkSIeAE0fHAuUxr7xg
MIME-Version: 1.0
X-Received: by 10.170.233.6 with SMTP id z6mr3559601ykf.101.1414606244246;
 Wed, 29 Oct 2014 11:10:44 -0700 (PDT)
Received: by 10.170.46.203 with HTTP; Wed, 29 Oct 2014 11:10:44 -0700 (PDT)
X-Originating-IP: [62.165.198.134]
In-Reply-To: <20141029180635.GJ53947@kib.kiev.ua>
References: <20141021094539.GA1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx>
 <20141027165557.GC1877@kib.kiev.ua>
 <201410281146.49370.jhb@freebsd.org>
 <CAPQ4ffsXa4BOHWJt_YhPOSDu5KQpUf0oVcMoiAFCxyR9YVKCdQ@mail.gmail.com>
 <CAPQ4ffsSPtRyQD==WROCR6Shmm6d=N_6oS8zoJEcio9fCi1Amw@mail.gmail.com>
 <20141029180635.GJ53947@kib.kiev.ua>
Date: Wed, 29 Oct 2014 19:10:44 +0100
Message-ID: <CAPQ4ffsNG8Gaes1P_YOv9QBid140qK3YJqbch27DP3YEhYqrnQ@mail.gmail.com>
Subject: Re: RfC: fueword(9) and casueword(9)
From: Oliver Pinter <oliver.pinter@hardenedbsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-arch@freebsd.org
X-List-Received-Date: Wed, 29 Oct 2014 18:10:51 -0000

On Wed, Oct 29, 2014 at 7:06 PM, Konstantin Belousov
<kostikbel@gmail.com> wrote:
> How could I get a single bit of useful information from this text ?
>
> My guess is that you have old libc and new kernel compiled without
> COMPAT_FREEBSD9 and 10.  If this is the cause, it has nothing to
> do with my changes.

Sure. The userland is from Oct. 20 too, and COMPAT_FREEBSD{9,10} were
not added to the kernel config.

Thanks!

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 18:14:15 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 30553CE8;
 Wed, 29 Oct 2014 18:14:15 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 07832FCC;
 Wed, 29 Oct 2014 18:14:15 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1A93AB923;
 Wed, 29 Oct 2014 14:14:14 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Ian Lepore <ian@freebsd.org>
Subject: Re: atomic ops
Date: Wed, 29 Oct 2014 14:13:18 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141028025222.GA19223@dft-labs.eu>
 <201410291335.57919.jhb@freebsd.org>
 <1414605830.17308.100.camel@revolution.hippie.lan>
In-Reply-To: <1414605830.17308.100.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410291413.18858.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Wed, 29 Oct 2014 14:14:14 -0400 (EDT)
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Alan Cox <alc@rice.edu>, Andrew Turner <andrew@fubar.geek.nz>,
 attilio@freebsd.org, Konstantin Belousov <kib@freebsd.org>,
 freebsd-arch@freebsd.org
X-List-Received-Date: Wed, 29 Oct 2014 18:14:15 -0000

On Wednesday, October 29, 2014 2:03:50 pm Ian Lepore wrote:
> I hadn't realized it when I wrote that, but Andy was speaking in the
> context of armv8, which has a true load-acquire instruction.  In our
> current code (armv6 and 7) we need the explicit dmb/dsb barriers to get
> the same effect.  (It turns out we do have barriers, I misspoke earlier,
> but some of our dmb need to be dsb.)

Ah, ok.  Fair enough. :)

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 18:23:47 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A2A2F114;
 Wed, 29 Oct 2014 18:23:47 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2D399168;
 Wed, 29 Oct 2014 18:23:47 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9TINdq9026613
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Wed, 29 Oct 2014 20:23:39 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9TINdq9026613
Received: (from kostik@localhost)
 by tom.home (8.14.9/8.14.9/Submit) id s9TINdql026612;
 Wed, 29 Oct 2014 20:23:39 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 29 Oct 2014 20:23:39 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Oliver Pinter <oliver.pinter@hardenedbsd.org>
Subject: Re: RfC: fueword(9) and casueword(9)
Message-ID: <20141029182339.GK53947@kib.kiev.ua>
References: <20141021094539.GA1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx>
 <20141027165557.GC1877@kib.kiev.ua>
 <201410281146.49370.jhb@freebsd.org>
 <CAPQ4ffsXa4BOHWJt_YhPOSDu5KQpUf0oVcMoiAFCxyR9YVKCdQ@mail.gmail.com>
 <CAPQ4ffsSPtRyQD==WROCR6Shmm6d=N_6oS8zoJEcio9fCi1Amw@mail.gmail.com>
 <20141029180635.GJ53947@kib.kiev.ua>
 <CAPQ4ffsNG8Gaes1P_YOv9QBid140qK3YJqbch27DP3YEhYqrnQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAPQ4ffsNG8Gaes1P_YOv9QBid140qK3YJqbch27DP3YEhYqrnQ@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: freebsd-arch@freebsd.org
X-List-Received-Date: Wed, 29 Oct 2014 18:23:47 -0000

On Wed, Oct 29, 2014 at 07:10:44PM +0100, Oliver Pinter wrote:
> On Wed, Oct 29, 2014 at 7:06 PM, Konstantin Belousov
> <kostikbel@gmail.com> wrote:
> > How could I get a single bit of useful information from this text ?
> >
> > My guess is that you have old libc and new kernel compiled without
> > COMPAT_FREEBSD9 and 10.  If this is the cause, it has nothing to
> > do with my changes.
> 
> Sure. The userland is from Oct. 20 too, and COMPAT_FREEBSD{9,10} was
> not added to kernel config.

So, again: did adding COMPAT_FREEBSD9 solve the issue?

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 19:05:07 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0C21695A;
 Wed, 29 Oct 2014 19:05:07 +0000 (UTC)
Received: from mail-wi0-x22f.google.com (mail-wi0-x22f.google.com
 [IPv6:2a00:1450:400c:c05::22f])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 27AA97C9;
 Wed, 29 Oct 2014 19:05:06 +0000 (UTC)
Received: by mail-wi0-f175.google.com with SMTP id ex7so2594192wid.2
 for <multiple recipients>; Wed, 29 Oct 2014 12:05:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=EfEBLIfl4bpXq56QWoRctyt1C/8hgDLUw9AuS5DDoq0=;
 b=A/l1/XTnIvNRPsC59hxvUuJWo/P3d9KmX5fSMRlPo3yh1SDyjl7CPPozx691jtpebh
 jcZaEYfgZTK3Akq6GRZMX5YQ+5rbzkGAkhaofvqGLCsS8aOROssvxllUz1tOZiCKE+Pb
 Vhy6Lwh9jSzo96UWoQraNxmaYwQFc8klfgRdp1m/njnKG/GCvJLx4XPtaVVsVMRMzZxh
 TixRmqJQaas4kRJwdk0B8sSa0QgkgoATbggEW2QTEK/HQraXOcdXtPiO59QtzRb7FNwJ
 3cll3iL3rIVPYmPPb2UTqvIzfQtRF0uY7/kZN800SqwExghtd+yYDkD4/bmcl6kxFU3v
 dGBQ==
X-Received: by 10.180.21.140 with SMTP id v12mr38671171wie.44.1414609503373;
 Wed, 29 Oct 2014 12:05:03 -0700 (PDT)
Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net.
 [2001:470:1f08:1f7::2])
 by mx.google.com with ESMTPSA id rx8sm1582962wjb.30.2014.10.29.12.05.01
 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Wed, 29 Oct 2014 12:05:02 -0700 (PDT)
Date: Wed, 29 Oct 2014 20:04:59 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: Attilio Rao <attilio@freebsd.org>
Subject: Re: atomic ops
Message-ID: <20141029190459.GA25368@dft-labs.eu>
References: <20141028025222.GA19223@dft-labs.eu>
 <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Adrian Chadd <adrian@freebsd.org>, Alan Cox <alc@rice.edu>,
 Konstantin Belousov <kib@freebsd.org>,
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
X-List-Received-Date: Wed, 29 Oct 2014 19:05:07 -0000

On Tue, Oct 28, 2014 at 02:18:41PM +0100, Attilio Rao wrote:
> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > As was mentioned some time ago, our situation related to atomic ops is
> > not ideal.
> >
> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> > full memory barriers, which is stronger than needed.
> >
> > Moreover, load is implemented as lock cmpxchg on var address, so it is
> > additionally slower, especially when CPUs compete.
> 
> I already explained this once privately: full memory barriers are not
> stronger than needed.
> FreeBSD has different semantics than Linux. We historically enforce a
> full barrier on _acq() and _rel() rather than just a read and write
> barrier, hence we need a different implementation than Linux.
> There is code that relies on this property, like the locking
> primitives (release a mutex, for instance).
> 

I mean stronger than needed in some cases; a popular one is fget_unlocked,
and we provide no "lightest sufficient" barrier (which would also be
cheaper).

Another case which benefits greatly is sys/sys/seq.h. As noted in another
thread, using load_acq as it currently stands destroys performance.
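
To make the seq.h case concrete, here is a minimal sketch of a sequence
counter reader and writer written with C11 atomics.  It is an
illustration only, not the code in sys/sys/seq.h: struct guarded,
guarded_write() and guarded_read() are hypothetical names, and a single
writer is assumed.  The point is that the reader needs only acquire
(read) ordering, which on amd64 requires no fence instruction, only a
compiler barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical data protected by a sequence counter (single writer). */
struct guarded {
	atomic_uint seq;	/* odd while a write is in progress */
	atomic_int a;
	atomic_int b;
};

static void
guarded_write(struct guarded *g, int a, int b)
{
	unsigned s = atomic_load_explicit(&g->seq, memory_order_relaxed);

	/* Mark the write as in progress (odd count). */
	atomic_store_explicit(&g->seq, s + 1, memory_order_relaxed);
	/* Order the counter store before the data stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&g->a, a, memory_order_relaxed);
	atomic_store_explicit(&g->b, b, memory_order_relaxed);
	/* Publish: the data stores are ordered before the even count. */
	atomic_store_explicit(&g->seq, s + 2, memory_order_release);
}

static void
guarded_read(struct guarded *g, int *a, int *b)
{
	unsigned s0, s1;

	do {
		s0 = atomic_load_explicit(&g->seq, memory_order_acquire);
		*a = atomic_load_explicit(&g->a, memory_order_relaxed);
		*b = atomic_load_explicit(&g->b, memory_order_relaxed);
		/* Order the data loads before re-reading the counter. */
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&g->seq, memory_order_relaxed);
	} while ((s0 & 1) != 0 || s0 != s1);	/* retry on a torn read */
}
```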

I don't dispute the need for full barriers, although it is unclear which
current consumers of load_acq actually need a full barrier.

> In short: optimizing the implementation for performance is fine and
> due. Changing the semantic is not fine, unless you have reviewed and
> fixed all the uses of _rel() and _acq().
> 
> > On amd64 it is sufficient to place a compiler barrier in such cases.
> >
> > Next, we lack some atomic ops in the first place.
> >
> > Let's define some useful terms:
> > smp_wmb - no writes can be reordered past this point
> > smp_rmb - no reads can be reordered past this point
> >
> > With this in mind, we lack ops which would guarantee only the following:
> >
> > 1. var = tmp; smp_wmb();
> > 2. tmp = var; smp_rmb();
> > 3. smp_rmb(); tmp = var;
> >
> > This matters since what we can use already to emulate this is way
> > heavier than needed on aforementioned amd64 and most likely other archs.
> 
> I can see the value of such barriers in case you want to just
> synchronize operations with regard to reads or writes.
> I also believe that on the newest Intel processors (for which we should
> optimize) rmb() and wmb() got significantly faster than mb(). However
> the most interesting case would be for arm and mips, I assume. That's
> where you would see a bigger perf difference if you optimize the
> membar paths.
> 
> Last time I looked into it, in the FreeBSD kernel the Linux-ish
> rmb()/wmb()/etc. were used primarily in 3 places: Linux-derived code,
> handling of 16-bit operands and implementation of "faster" bus
> barriers.
> Initially I had thought about just confining the smp_*() in a Linux
> compat layer and fixing the other 2 in this way: for 16-bit operands
> just pad to 32 bits, as the C11 standard also does. For the bus
> barriers, just grow more versions to actually include the rmb()/wmb()
> scheme within.
> 
> At this point, I understand we may want to instead support the
> concept of write-only or read-only barrier. This means that if we want
> to keep the concept tied to the current _acq()/_rel() scheme we will
> end up with a KPI explosion.
> 
> I'm not the one making the call here, but for a faster and more
> granular approach, possibly we can end up using smp_rmb() and
> smp_wmb() directly. As I said I'm not the one making the call.
> 

Well, I don't know the original motivation for expressing stuff with
_load_acq and _store_rel.

Anyway, maybe we could do something along these lines (expressing intent,
not actual code):

mb_producer_start(p, v) { *p = v; smp_wmb(); }
mb_producer(p, v) { smp_wmb(); *p = v; }
mb_producer_end(p, v) { mb_producer(p, v); }

type mb_consumer(p) { var = *p; smp_rmb(); return (var); }
type mb_consumer_start(p) { return (mb_consumer(p)); } 
type mb_consumer_end(p) { smp_rmb(); return (*p); }
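
One possible way to spell a subset of these helpers out in compilable
form is sketched below with C11 atomics.  Everything concrete here is an
assumption made for illustration: the helpers are fixed to atomic_int,
and smp_wmb()/smp_rmb() are mapped to C11 release/acquire fences, which
are at least as strong as the write-only and read-only barriers defined
earlier (a release fence also orders earlier loads, an acquire fence
also orders later stores).  A real kernel implementation would use
per-architecture primitives instead.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed mapping onto C11 fences; not the kernel's definitions. */
static inline void
smp_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

static inline void
smp_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
}

/* Store first, then keep later writes from moving before it. */
static inline void
mb_producer_start(atomic_int *p, int v)
{
	atomic_store_explicit(p, v, memory_order_relaxed);
	smp_wmb();
}

/* Keep earlier writes from moving after the store. */
static inline void
mb_producer(atomic_int *p, int v)
{
	smp_wmb();
	atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Load first, then keep later reads from moving before it. */
static inline int
mb_consumer(atomic_int *p)
{
	int var = atomic_load_explicit(p, memory_order_relaxed);

	smp_rmb();
	return (var);
}

/* Keep earlier reads from moving after the load. */
static inline int
mb_consumer_end(atomic_int *p)
{
	smp_rmb();
	return (atomic_load_explicit(p, memory_order_relaxed));
}
```

On amd64 both fences above should compile to no instruction at all and
act only as compiler barriers, which matches the earlier remark that a
compiler barrier is sufficient there.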

-- 
Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Wed Oct 29 19:13:36 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 81E25B19
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 19:13:36 +0000 (UTC)
Received: from mail-yk0-f175.google.com (mail-yk0-f175.google.com
 [209.85.160.175])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 424318B1
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 19:13:35 +0000 (UTC)
Received: by mail-yk0-f175.google.com with SMTP id q9so1592043ykb.6
 for <freebsd-arch@freebsd.org>; Wed, 29 Oct 2014 12:13:34 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=8PM+euvGxgSRsVB/43XwSTzWjL/DGLVwkdU5UehBNVM=;
 b=ea9AQnnjjpPlXsoBPJ+LuG7/rlbKXDKWPu5JZ1UrevmdiZKA3RdwylMwV4SA7Jntl8
 XICkpri11LHl1qu8fV8J5UJQwnm9qne1tb6amK2AICo3vQSF2nGjnwQZYr7PW0Dpi6Uk
 yS/I8qCntTJw8z0epJLOD+/YTnL9bis+qINdh7g7heZ42TQhR6jH30EEOKfjC5rS6xsI
 Ca6SrzkeyGzA4f/pV6IpH9ykIXiihAL9u0mjDbQapuo7FWAUXQnq8CrcSx+rXAr6YjVr
 ZgPaVau4PBxwsAVkhhpr4tJBDTF/n3Y6hxMBvX5jGxhk9SzWIghXLFVGUY4GVrlOI+7/
 vlEg==
X-Gm-Message-State: ALoCoQlvIeQEbIxTVIhNHlGGusntIsSLcLTsmFQoNJBqo5g1//ZeMH4iuuos9UHP/MIwrogYxK5z
MIME-Version: 1.0
X-Received: by 10.170.223.84 with SMTP id p81mr3682025ykf.110.1414609710459;
 Wed, 29 Oct 2014 12:08:30 -0700 (PDT)
Received: by 10.170.46.203 with HTTP; Wed, 29 Oct 2014 12:08:30 -0700 (PDT)
X-Originating-IP: [62.165.198.134]
In-Reply-To: <20141029182339.GK53947@kib.kiev.ua>
References: <20141021094539.GA1877@kib.kiev.ua>
 <2048849.GkvWliFbyg@ralph.baldwin.cx>
 <20141027165557.GC1877@kib.kiev.ua>
 <201410281146.49370.jhb@freebsd.org>
 <CAPQ4ffsXa4BOHWJt_YhPOSDu5KQpUf0oVcMoiAFCxyR9YVKCdQ@mail.gmail.com>
 <CAPQ4ffsSPtRyQD==WROCR6Shmm6d=N_6oS8zoJEcio9fCi1Amw@mail.gmail.com>
 <20141029180635.GJ53947@kib.kiev.ua>
 <CAPQ4ffsNG8Gaes1P_YOv9QBid140qK3YJqbch27DP3YEhYqrnQ@mail.gmail.com>
 <20141029182339.GK53947@kib.kiev.ua>
Date: Wed, 29 Oct 2014 20:08:30 +0100
Message-ID: <CAPQ4ffvyPxohVLiQqdfZkQcv5t=gfPN-4tUNxPWPT+eivpDjJQ@mail.gmail.com>
Subject: Re: RfC: fueword(9) and casueword(9)
From: Oliver Pinter <oliver.pinter@hardenedbsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Oct 2014 19:13:36 -0000

On Wed, Oct 29, 2014 at 7:23 PM, Konstantin Belousov
<kostikbel@gmail.com> wrote:
> On Wed, Oct 29, 2014 at 07:10:44PM +0100, Oliver Pinter wrote:
>> On Wed, Oct 29, 2014 at 7:06 PM, Konstantin Belousov
>> <kostikbel@gmail.com> wrote:
>> > How could I get a single bit of useful information from this text ?
>> >
>> > My guess is that you have old libc and new kernel compiled without
>> > COMPAT_FREEBSD9 and 10.  If this is the cause, it has nothing to
>> > do with my changes.
>>
>> Sure. The userland is from Oct. 20 too, and COMPAT_FREEBSD{9,10} was
>> not added to kernel config.
>
> So again.  Did adding COMPAT_FREEBSD9 solve the issue?

I added both COMPAT_FREEBSD9 and COMPAT_FREEBSD10, and the problem is
fixed. Thanks!

From owner-freebsd-arch@FreeBSD.ORG  Thu Oct 30 18:10:55 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B67321ED;
 Thu, 30 Oct 2014 18:10:55 +0000 (UTC)
Received: from nibbler.fubar.geek.nz (nibbler.fubar.geek.nz [199.48.134.198])
 by mx1.freebsd.org (Postfix) with ESMTP id 9684BE14;
 Thu, 30 Oct 2014 18:10:55 +0000 (UTC)
Received: from bender.lan (97e078e7.skybroadband.com [151.224.120.231])
 by nibbler.fubar.geek.nz (Postfix) with ESMTPSA id 95F815CC08;
 Thu, 30 Oct 2014 18:10:53 +0000 (UTC)
Date: Thu, 30 Oct 2014 18:10:48 +0000
From: Andrew Turner <andrew@fubar.geek.nz>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: atomic ops
Message-ID: <20141030181048.4cbeeec6@bender.lan>
In-Reply-To: <201410291335.57919.jhb@freebsd.org>
References: <20141028025222.GA19223@dft-labs.eu>
 <201410291059.16829.jhb@freebsd.org>
 <1414601895.17308.89.camel@revolution.hippie.lan>
 <201410291335.57919.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Ian Lepore <ian@freebsd.org>, Alan Cox <alc@rice.edu>, attilio@freebsd.org,
 Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Oct 2014 18:10:55 -0000

On Wed, 29 Oct 2014 13:35:57 -0400
John Baldwin <jhb@freebsd.org> wrote:
> On Wednesday, October 29, 2014 12:58:15 pm Ian Lepore wrote:
> > Next, when we consider 'Access A' I'm not sure it's true that the
> > access will replay if the store-exclusive fails and the operation
> > loops.  The access to A may have been a prefetch, even a prefetch
> > for data on a predicted upcoming execution branch which may or may
> > not end up being taken.
> > 
> > I think the only thing that makes an ldrex/strex sequence safe for
> > use in implementing synchronization primitives is to insert a 'dmb'
> > after the acquire loop (after the strex succeeds), and 'dsb' before
> > the release loop (dsb is required for SMP, dmb might be good enough
> > on UP).
> > 
> > Looking into this has made me realize our current armv6/7 atomics
> > are incorrect in this regard.  Guess I'll see about fixing them up
> > Real Soon Now.  :)
> 
> I'm not actually sure either, but it would be surprising to me
> otherwise. Presumably there is nothing magic about a branch.  Either
> the load-acquire is an acquire barrier or it isn't.  Namely, suppose
> you had this sequence:
> 
> 	load-acquire P
> 	access A (prefetch)
> 	load-acquire Q
> 	load A
> 
> Would you expect the prefetch to satisfy the load or should the
> load-acquire on Q discard that?  Having a branch after a failing
> conditional store back to the load acquire should work similarly.  It
> has to discard anything that was prefetched or it isn't an actual
> load-acquire.

I have checked with someone at ARM. The prefetch should not be
considered an access with regard to the barrier and it could be moved
before it as it will only load data into the cache. The barrier only
deals with loading data into the core, i.e. if the data was part of the
prefetch it will be loaded from the cache no earlier than the
load-acquire. The cache coherency protocol ensures the data will be up
to date while the barrier will ensure the ordering of the load of A.

In the above example the prefetch of A will not be thrown away but the
data in the cache may change between the prefetch and load A if another
core has written to A. If this is the case the load will be of the new
data.
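
A rough C11 sketch of that sequence, as an illustration only (not
FreeBSD code): A, P and read_a_after_p() are made-up names, and
__builtin_prefetch() is the GCC/Clang builtin, which is purely a hint.

```c
#include <stdatomic.h>

static atomic_int A;	/* data another core may be writing */
static atomic_int P;	/* location used for the load-acquire */

static int
read_a_after_p(void)
{
	/* Hint only: may pull A's cache line in early, not an access. */
	__builtin_prefetch((const void *)&A);

	/* Load-acquire: later loads cannot be satisfied before this one. */
	(void)atomic_load_explicit(&P, memory_order_acquire);

	/* Real load of A; coherency keeps the cached line up to date. */
	return (atomic_load_explicit(&A, memory_order_relaxed));
}
```

Whether or not the prefetch happened, the value returned is the one the
coherent cache holds at the time of the real load, after the acquire.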

Andrew

From owner-freebsd-arch@FreeBSD.ORG  Thu Oct 30 19:05:47 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2B4F8E28;
 Thu, 30 Oct 2014 19:05:47 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 018BD619;
 Thu, 30 Oct 2014 19:05:47 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 472A0B923;
 Thu, 30 Oct 2014 15:05:45 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Andrew Turner <andrew@fubar.geek.nz>
Subject: Re: atomic ops
Date: Thu, 30 Oct 2014 15:03:13 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <20141028025222.GA19223@dft-labs.eu>
 <201410291335.57919.jhb@freebsd.org> <20141030181048.4cbeeec6@bender.lan>
In-Reply-To: <20141030181048.4cbeeec6@bender.lan>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201410301503.14225.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 30 Oct 2014 15:05:45 -0400 (EDT)
Cc: Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Ian Lepore <ian@freebsd.org>, Alan Cox <alc@rice.edu>, attilio@freebsd.org,
 Konstantin Belousov <kib@freebsd.org>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Oct 2014 19:05:47 -0000

On Thursday, October 30, 2014 2:10:48 pm Andrew Turner wrote:
> On Wed, 29 Oct 2014 13:35:57 -0400
> John Baldwin <jhb@freebsd.org> wrote:
> > On Wednesday, October 29, 2014 12:58:15 pm Ian Lepore wrote:
> > > Next, when we consider 'Access A' I'm not sure it's true that the
> > > access will replay if the store-exclusive fails and the operation
> > > loops.  The access to A may have been a prefetch, even a prefetch
> > > for data on a predicted upcoming execution branch which may or may
> > > not end up being taken.
> > > 
> > > I think the only thing that makes an ldrex/strex sequence safe for
> > > use in implementing synchronization primitives is to insert a 'dmb'
> > > after the acquire loop (after the strex succeeds), and 'dsb' before
> > > the release loop (dsb is required for SMP, dmb might be good enough
> > > on UP).
> > > 
> > > Looking into this has made me realize our current armv6/7 atomics
> > > are incorrect in this regard.  Guess I'll see about fixing them up
> > > Real Soon Now.  :)
> > 
> > I'm not actually sure either, but it would be surprising to me
> > otherwise. Presumably there is nothing magic about a branch.  Either
> > the load-acquire is an acquire barrier or it isn't.  Namely, suppose
> > you had this sequence:
> > 
> > 	load-acquire P
> > 	access A (prefetch)
> > 	load-acquire Q
> > 	load A
> > 
> > Would you expect the prefetch to satisfy the load or should the
> > load-acquire on Q discard that?  Having a branch after a failing
> > conditional store back to the load acquire should work similarly.  It
> > has to discard anything that was prefetched or it isn't an actual
> > load-acquire.
> 
> I have checked with someone at ARM. The prefetch should not be
> considered an access with regard to the barrier and it could be moved
> before it as it will only load data into the cache. The barrier only
> deals with loading data into the core, i.e. if the data was part of the
> prefetch it will be loaded from the cache no earlier than the
> load-acquire. The cache coherency protocol ensures the data will be up
> to date while the barrier will ensure the ordering of the load of A.
> 
> In the above example the prefetch of A will not be thrown away but the
> data in the cache may change between the prefetch and load A if another
> core has written to A. If this is the case the load will be of the new
> data.

That is sufficient for what atomic(9)'s _acq wants, yes.

-- 
John Baldwin

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct 31 19:12:19 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A49EBA86;
 Fri, 31 Oct 2014 19:12:19 +0000 (UTC)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "funkthat.com", Issuer "funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7D19F6A;
 Fri, 31 Oct 2014 19:12:19 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9VJCCR0042606
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 31 Oct 2014 12:12:13 -0700 (PDT)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9VJCCN5042605;
 Fri, 31 Oct 2014 12:12:12 -0700 (PDT) (envelope-from jmg)
Date: Fri, 31 Oct 2014 12:12:12 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: any reason not to enable IPDIVERT for ipfw module?
Message-ID: <20141031191212.GO8852@funkthat.com>
Mail-Followup-To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Fri, 31 Oct 2014 12:12:13 -0700 (PDT)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Oct 2014 19:12:19 -0000

Can any one think of a good reason not to enable IPDIVERT sockets in
the ipfw module?

And possibly enable default-to-accept as well? That way you don't have
to go to the console when you load the ipfw module because you forgot
to add the accept-all rule automatically. :)

something like:
==== //depot/projects/opencrypto/sys/modules/ipfw/Makefile#3 - /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile ====
--- /tmp/tmp.15774.16   2014-10-31 12:11:56.000000000 -0700
+++ /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile   2014-10-31 12:11:54.000000000 -0700
@@ -16,7 +16,10 @@
 #CFLAGS+= -DIPFIREWALL_VERBOSE_LIMIT=100
 #
 #If you want it to pass all packets by default
-#CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
+CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
+#
+#If you want divert sockets
+CFLAGS+= -DIPDIVERT
 #
 
 .include <bsd.kmod.mk>

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct 31 19:14:30 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C4F25D9C;
 Fri, 31 Oct 2014 19:14:30 +0000 (UTC)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "funkthat.com", Issuer "funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 8B077C4;
 Fri, 31 Oct 2014 19:14:30 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9VJETXQ042646
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 31 Oct 2014 12:14:29 -0700 (PDT)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9VJETUc042645;
 Fri, 31 Oct 2014 12:14:29 -0700 (PDT) (envelope-from jmg)
Date: Fri, 31 Oct 2014 12:14:28 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: any reason not to enable IPDIVERT for ipfw module?
Message-ID: <20141031191428.GP8852@funkthat.com>
Mail-Followup-To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org
References: <20141031191212.GO8852@funkthat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141031191212.GO8852@funkthat.com>
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Fri, 31 Oct 2014 12:14:29 -0700 (PDT)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Oct 2014 19:14:30 -0000

John-Mark Gurney wrote this message on Fri, Oct 31, 2014 at 12:12 -0700:
> Can any one think of a good reason not to enable IPDIVERT sockets in
> the ipfw module?

Sorry, ignore this... I didn't realize divert sockets were loadable as a
separate module, ipdivert.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct 31 23:35:08 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 96593820;
 Fri, 31 Oct 2014 23:35:08 +0000 (UTC)
Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com
 [IPv6:2a00:1450:400c:c05::22b])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0866B5F1;
 Fri, 31 Oct 2014 23:35:07 +0000 (UTC)
Received: by mail-wi0-f171.google.com with SMTP id q5so2536856wiv.4
 for <multiple recipients>; Fri, 31 Oct 2014 16:35:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=3QiSI2ci/GVxQj4qve93FYPyQGi8c4kwaw04rrnf0tw=;
 b=CSdefo+85GlExOt+/G1I0Fp6ZyIGaSmNxektAv6VZvVzwWudXVHmM+DawlI5stwOaV
 HX5UjgSIB1zQzRQNmyRRXd+iRTfsd8PPi7upAyuDguVNLDDY6rnetBuf9P5lVCzot1JZ
 JPqvGehdI+MI+f9/JnJNeU5YnBpiO85/tbAxY0Vtu9M3OWCVcPRgYJYwjwHASadwRB/Q
 Jx5ivRl4PE+SZMAdizrXabOspbRC45XrGT9y49MNI2+d8RlCXpnV9XUXKjeGxt6anncg
 xsl092peSIIq/BnhcriZRqVbf/ep4qqR6Ie1jJ2rG9ZUHw8D9COwbjcyxoydn8XvJzVO
 vVzw==
X-Received: by 10.194.62.226 with SMTP id b2mr26950561wjs.46.1414798505978;
 Fri, 31 Oct 2014 16:35:05 -0700 (PDT)
Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net.
 [2001:470:1f08:1f7::2])
 by mx.google.com with ESMTPSA id da3sm13654686wjb.12.2014.10.31.16.35.04
 for <multiple recipients>
 (version=TLSv1.2 cipher=RC4-SHA bits=128/128);
 Fri, 31 Oct 2014 16:35:05 -0700 (PDT)
Date: Sat, 1 Nov 2014 00:35:02 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: refcount_release_take_##lock
Message-ID: <20141031233502.GB20591@dft-labs.eu>
References: <20141025184448.GA19066@dft-labs.eu>
 <201410281154.54581.jhb@freebsd.org>
 <20141028174428.GA12014@dft-labs.eu>
 <201410281413.58414.jhb@freebsd.org>
 <20141028193404.GB12014@dft-labs.eu>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20141028193404.GB12014@dft-labs.eu>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: John-Mark Gurney <jmg@funkthat.com>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 31 Oct 2014 23:35:08 -0000

On Tue, Oct 28, 2014 at 08:34:04PM +0100, Mateusz Guzik wrote:
> On Tue, Oct 28, 2014 at 02:13:58PM -0400, John Baldwin wrote:
> > On Tuesday, October 28, 2014 1:44:28 pm Mateusz Guzik wrote:
> > > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
> > > index f8ae0e6..e94ccde 100644
> > > --- a/sys/kern/kern_jail.c
> > > +++ b/sys/kern/kern_jail.c
> > 
> > The diff looks good to me.  Just need to update refcount.9 as well.
> > 
> 

Ping? Is this diff ok?

> diff --git a/share/man/man9/refcount.9 b/share/man/man9/refcount.9
> index e7702a2..61b9b51 100644
> --- a/share/man/man9/refcount.9
> +++ b/share/man/man9/refcount.9
> @@ -26,7 +26,7 @@
>  .\"
>  .\" $FreeBSD$
>  .\"
> -.Dd January 20, 2009
> +.Dd October 28, 2014
>  .Dt REFCOUNT 9
>  .Os
>  .Sh NAME
> @@ -44,6 +44,15 @@
>  .Fn refcount_acquire "volatile u_int *count"
>  .Ft int
>  .Fn refcount_release "volatile u_int *count"
> +.In sys/mutex.h
> +.Fn refcount_release_lock_mtx "volatile u_int *count, struct mtx *lock"
> +.In sys/rmlock.h
> +.Fn refcount_release_lock_rmlock "volatile u_int *count, struct rmlock *lock"
> +.In sys/rwlock.h
> +.Fn refcount_release_lock_rwlock "volatile u_int *count, struct rwlock *lock"
> +.In sys/lock.h
> +.In sys/sx.h
> +.Fn refcount_release_lock_sx "volatile u_int *count, struct sx *lock"
>  .Sh DESCRIPTION
>  The
>  .Nm
> @@ -77,6 +86,13 @@ The function returns a non-zero value if the reference being released was
>  the last reference;
>  otherwise, it returns zero.
>  .Pp
> +.Fn refcount_release_lock_*
> +functions release an existing reference holding the lock if it is the last
> +reference.
> +These functions return with the lock held and a non-zero value if the reference
> +being released was the last reference;
> +otherwise, they return zero and the lock is not held.
> +.Pp
>  Note that these routines do not provide any inter-CPU synchronization,
>  data protection,
>  or memory ordering guarantees except for managing the counter.
> @@ -91,6 +107,18 @@ The
>  .Nm refcount_release
>  function returns non-zero when releasing the last reference and zero when
>  releasing any other reference.
> +.Pp
> +.Nm refcount_release_lock_*
> +functions return with the lock held and non-zero value when releasing the last
> +reference, zero without the lock held when releasing any other reference.
>  .Sh HISTORY
> -These functions were introduced in
> +.Fn refcount_init ,
> +.Fn refcount_acquire
> +and
> +.Fn refcount_release
> +functions were introduced in
>  .Fx 6.0 .
> +.Pp
> +.Fn refcount_release_lock_*
> +functions were introduced in
> +.Fx 10.2 .
> 
> -- 
> Mateusz Guzik <mjguzik gmail.com>
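
For readers without the kernel diff at hand, the documented contract
(non-zero and lock held on the last release, zero and lock not held
otherwise) can be sketched in userland C. This is only an assumption
about one way to meet that contract, not the code from the patch:
sketch_release_lock() is a hypothetical name, and a pthread mutex stands
in for struct mtx.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userland analogue of refcount_release_lock_mtx(). */
static bool
sketch_release_lock(atomic_uint *count, pthread_mutex_t *lock)
{
	unsigned int old = atomic_load(count);

	/* Fast path: clearly not the last reference, drop it locklessly. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (false);
		/* On failure old was reloaded; re-check the loop condition. */
	}

	/* Slow path: possibly the last reference, so decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return (true);		/* last reference, lock stays held */
	pthread_mutex_unlock(lock);
	return (false);
}
```

A caller would typically tear the object down and then unlock when the
function returns true, and do nothing further when it returns false.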

-- 
Mateusz Guzik <mjguzik gmail.com>

From owner-freebsd-arch@FreeBSD.ORG  Sat Nov  1 01:28:29 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 36C69C7;
 Sat,  1 Nov 2014 01:28:29 +0000 (UTC)
Received: from mail-oi0-x232.google.com (mail-oi0-x232.google.com
 [IPv6:2607:f8b0:4003:c06::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id EAD00275;
 Sat,  1 Nov 2014 01:28:28 +0000 (UTC)
Received: by mail-oi0-f50.google.com with SMTP id v63so3124019oia.9
 for <multiple recipients>; Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :content-type; bh=d47zLIZrH06oXHgyEodvKkAFAAFr7Q0KN9R8xb08vAg=;
 b=ynU8wbfanR04lpBx8Og1HZVAnVegF+1rspxtPWotqZO9F4+wjzWE+8P0mItAZdslfX
 uTfsr5bEssBv4WmpH6fi2XdaHtnX+jBZuZFUaMgOCPi2y+mMiLdOFchP+lQoApCoT27s
 8T6QGC2mOmOR5mQv3rmXeZNWfMP3ab4hHtkT2Vq/cGv3Lx0Mbai4hh5n3RHCauGkRZ3U
 SMqug+ugEDUdUBtriCwUrXOmTXrT6MA0N94VYmjt4f6jYeX/NNjXu4sbG4V0XQvGo85E
 cIMKe4E/S4R2HKisXSqAo01WEeVALr/QvawPkPHlf0FFOfR+jiqQN77qKTijksjKpwjl
 9J1A==
MIME-Version: 1.0
X-Received: by 10.182.18.104 with SMTP id v8mr22769616obd.3.1414805308246;
 Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
Received: by 10.202.104.39 with HTTP; Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
Received: by 10.202.104.39 with HTTP; Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
In-Reply-To: <20141031191212.GO8852@funkthat.com>
References: <20141031191212.GO8852@funkthat.com>
Date: Fri, 31 Oct 2014 18:28:28 -0700
Message-ID: <CAOjFWZ7EZUHi+7VgQ53os4MYuZT6SSf89B1dQSPX-SZLrhFzzw@mail.gmail.com>
Subject: Re: any reason not to enable IPDIVERT for ipfw module?
From: Freddie Cash <fjwcash@gmail.com>
To: FreeBSD Arch <freebsd-arch@freebsd.org>,
 freebsd-net <freebsd-net@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 01 Nov 2014 01:28:29 -0000

On Oct 31, 2014 12:12 PM, "John-Mark Gurney" <jmg@funkthat.com> wrote:
>
> Can any one think of a good reason not to enable IPDIVERT sockets in
> the ipfw module?
>
> And possibly enabling default to accept?   That way you don't have to
> go to the console when you load the ipfw module because you forgot to
> auto add the accept all rule? :)

You can change the default rule to accept via loader.conf and it will be
set when the module is loaded.

net.inet.ip.fw.default_to_accept or something like that.


> something like:
> ==== //depot/projects/opencrypto/sys/modules/ipfw/Makefile#3 - /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile ====
> --- /tmp/tmp.15774.16   2014-10-31 12:11:56.000000000 -0700
> +++ /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile   2014-10-31 12:11:54.000000000 -0700
> @@ -16,7 +16,10 @@
>  #CFLAGS+= -DIPFIREWALL_VERBOSE_LIMIT=100
>  #
>  #If you want it to pass all packets by default
> -#CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
> +CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
> +#
> +#If you want divert sockets
> +CFLAGS+= -DIPDIVERT
>  #
>
>  .include <bsd.kmod.mk>
>
> --
>   John-Mark Gurney                              Voice: +1 415 225 5579
>
>      "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Sat Nov  1 05:16:11 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EDC0B1CB;
 Sat,  1 Nov 2014 05:16:11 +0000 (UTC)
Received: from sola.nimnet.asn.au (paqi.nimnet.asn.au [115.70.110.159])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 44F411D6;
 Sat,  1 Nov 2014 05:16:10 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by sola.nimnet.asn.au (8.14.2/8.14.2) with ESMTP id sA15G85D081579;
 Sat, 1 Nov 2014 16:16:08 +1100 (EST)
 (envelope-from smithi@nimnet.asn.au)
Date: Sat, 1 Nov 2014 16:16:07 +1100 (EST)
From: Ian Smith <smithi@nimnet.asn.au>
To: Freddie Cash <fjwcash@gmail.com>
Subject: Re: any reason not to enable IPDIVERT for ipfw module?
In-Reply-To: <CAOjFWZ7EZUHi+7VgQ53os4MYuZT6SSf89B1dQSPX-SZLrhFzzw@mail.gmail.com>
Message-ID: <20141101144834.N52402@sola.nimnet.asn.au>
References: <20141031191212.GO8852@funkthat.com>
 <CAOjFWZ7EZUHi+7VgQ53os4MYuZT6SSf89B1dQSPX-SZLrhFzzw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: freebsd-net <freebsd-net@freebsd.org>, freebsd-ipfw@freebsd.org,
 FreeBSD Arch <freebsd-arch@freebsd.org>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 01 Nov 2014 05:16:12 -0000

On Fri, 31 Oct 2014 18:28:28 -0700, Freddie Cash wrote:

 > On Oct 31, 2014 12:12 PM, "John-Mark Gurney" <jmg@funkthat.com> wrote:
 > >
 > > Can any one think of a good reason not to enable IPDIVERT sockets in
 > > the ipfw module?

Yes, two.  Nowadays people are just as or perhaps more likely to use 
in-kernel NAT, loading ipfw_nat.ko instead of ipdivert.ko, and there's 
no good reason to add extra code to ipfw.ko unless it's going to be 
used.  See the MODULAR ARCHITECTURE section of libalias(3).

Similarly there'd be no reason to include dummynet code unless using it.

 > > And possibly enabling default to accept?   That way you don't have to
 > > go to the console when you load the ipfw module because you forgot to
 > > auto add the accept all rule? :)

That'd reverse some 15+ years of security policy, of having the firewall 
closed until you've loaded your ruleset, to cater to forgetfulness? :)

 > You can change the default rule to accept via loader.conf and it will be
 > set when the module is loaded.
 > 
 > net.inet.ip.fw.default_to_accept or something like that.

Yes, net.inet.ip.fw.default_to_accept=1 is a loader tunable, and can be 
set before ipfw is loaded, unlike the net.inet.ip.fw sysctls which don't 
exist until ipfw is loaded.  Or it can be set to 0 to reverse policy if 
the kernel has been built with 'options IPFIREWALL_DEFAULT_TO_ACCEPT'.
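
As a minimal sketch, assuming the usual /boot/loader.conf, setting the 
tunable would look something like this:

```
# /boot/loader.conf
# Make ipfw's default rule 'allow' rather than 'deny' when it loads.
net.inet.ip.fw.default_to_accept="1"
```

Once ipfw is loaded, 'sysctl net.inet.ip.fw.default_to_accept' should show 
which policy is actually in effect.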

Normally /etc/rc.d/ipfw takes care of loading ipfw_nat or ipdivert (or 
both if you wanted to use both natd(8) and ipfw_nat for some reason?) 
and/or dummynet, according to the rc.conf variables.
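
As a rough illustration, assuming a stock /etc/rc.d/ipfw, the rc.conf 
variables in question are along these lines (pick only the ones you need):

```
# /etc/rc.conf
firewall_enable="YES"       # run /etc/rc.d/ipfw, loading ipfw.ko if needed
firewall_nat_enable="YES"   # in-kernel NAT: loads ipfw_nat.ko
natd_enable="YES"           # natd(8) via divert sockets: loads ipdivert.ko
dummynet_enable="YES"       # traffic shaping: loads dummynet.ko
```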

I've added freebsd-ipfw@ to ccs, just because it seems relevant ..

cheers, Ian

 > > something like:
 > > ==== //depot/projects/opencrypto/sys/modules/ipfw/Makefile#3 - /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile ====
 > > --- /tmp/tmp.15774.16   2014-10-31 12:11:56.000000000 -0700
 > > +++ /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile 2014-10-31 12:11:54.000000000 -0700
 > > @@ -16,7 +16,10 @@
 > >  #CFLAGS+= -DIPFIREWALL_VERBOSE_LIMIT=100
 > >  #
 > >  #If you want it to pass all packets by default
 > > -#CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
 > > +CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
 > > +#
 > > +#If you want divert sockets
 > > +CFLAGS+= -DIPDIVERT
 > >  #
 > >
 > >  .include <bsd.kmod.mk>
 > >
 > > --
 > >   John-Mark Gurney                              Voice: +1 415 225 5579
 > >
 > >      "All that I will do, has been done, All that I have, has not."