From owner-freebsd-arch@FreeBSD.ORG Sun Oct 26 00:01:04 2014
To: Dag-Erling Smørgrav
Subject: Re: Retiring WITH_INSTALL_AS_USER
In-Reply-To: <86k33o7ziu.fsf@nine.des.no>
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> <9250.1414076335@chaos> <86wq7p4zcx.fsf@nine.des.no> <10072.1414165996@chaos> <86k33o7ziu.fsf@nine.des.no>
From: "Simon J. Gerraty"
Date: Sat, 25 Oct 2014 16:45:46 -0700
Message-ID: <15245.1414280746@chaos>
Cc: FreeBSD Arch

Dag-Erling Smørgrav wrote:
> NO_ROOT solves it in a much better fashion, by modifying install(1)'s
> behavior so that instead of performing the chown / chgrp / chmod, it
> records it in a file which can then be used to generate a package
> manifest or something like that.

Right, so my only concern is running mtree
during the build to create the staging tree.
Most makefiles rely on the dir they are going to install into existing,
and install cannot tell from its arguments whether the destination
should be a file or a directory.

So I'm curious as to why the filtering that made it safe to use mtree was
removed and what, if anything, replaced it.

From owner-freebsd-arch@FreeBSD.ORG Sun Oct 26 03:27:45 2014
Subject: Re: Retiring WITH_INSTALL_AS_USER
From: Warner Losh
In-Reply-To: <9250.1414076335@chaos>
Date: Sat, 25 Oct 2014 22:27:41 -0500
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> <9250.1414076335@chaos>
To: "Simon J. Gerraty"
Cc: FreeBSD Arch

On Oct 23, 2014, at 9:58 AM, Simon J. Gerraty wrote:

> Warner Losh wrote:
>> If it is in the tree, it needs to work.
>
> No argument there.
>
>> It is broken in about a dozen places
>> now. Perhaps not the ones that you use.
>
> Hmm I have it permanently set in a projects/bmake tree that builds
> buildworld etc fine (while producing meta files) - though it's been a
> month or two since last sync.

Buildworld is fine. installworld is where it breaks. In a lot of places.

> Internally we have it set in head trees too.
> I don't doubt there's something lacking - just haven't noticed, sorry.
>
>> Makefile.inc1 is the only place it is documented right now. NO_ROOT
>> creates a METADATA file for the attributes of the file and does simple
>> copies instead. This lets you build entirely as an unpriv'd user, but
>> still use makefs to get a filesystem with the proper attributes. In
>> many ways it is what you want, and you could get what you want by
>> specifying /dev/null for that METADATA if it were more tightly
>> coupled.
>
> Sounds ok.
>
> Hmm etc/Makefile looks like it lost the ability to run mtree safely
> in a cross-build env? The MTREE_FILTER stuff ensures that mtree doesn't
> choke on unknown users and such.
> How is that handled now?

That's a good question. With NO_ROOT you postpone the unknown users until
makefs time. There are both pros and cons to that...

Warner

From owner-freebsd-arch@FreeBSD.ORG Sun Oct 26 03:40:09 2014
Subject: Re: Retiring WITH_INSTALL_AS_USER
From: Warner Losh
In-Reply-To: <9250.1414076335@chaos>
Date: Sat, 25 Oct 2014 22:40:05 -0500
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> <9250.1414076335@chaos>
To: "Simon J. Gerraty"
Cc: FreeBSD Arch

On Oct 23, 2014, at 9:58 AM, Simon J. Gerraty wrote:

> Hmm etc/Makefile looks like it lost the ability to run mtree safely
> in a cross-build env? The MTREE_FILTER stuff ensures that mtree doesn't
> choke on unknown users and such.
> How is that handled now?

I'm not sure I follow. MTREE_FILTER doesn't seem to exist in any version
I've checked in the last three years.

Warner

From owner-freebsd-arch@FreeBSD.ORG Sun Oct 26 06:31:37 2014
To: Warner Losh
Subject: Re: Retiring WITH_INSTALL_AS_USER
References: <96C0B2BE-0621-4162-BBB7-7D34AEAB5FD0@gmail.com> <21044.1414038558@chaos> <9250.1414076335@chaos>
From: "Simon J. Gerraty"
Date: Sat, 25 Oct 2014 20:59:23 -0700
Message-ID: <24381.1414295963@chaos>
Cc: FreeBSD Arch

Warner Losh wrote:
> I'm not sure I follow. MTREE_FILTER doesn't seem to exist in any
> version I've checked in the last three years.

Ah - my faulty memory ;-)
It's in projects/bmake - and our internal tree.
Ignore me.

From owner-freebsd-arch@FreeBSD.ORG Sun Oct 26 06:43:01 2014
In-Reply-To: <1414265035.12052.646.camel@revolution.hippie.lan>
References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> <1414265035.12052.646.camel@revolution.hippie.lan>
Date: Sat, 25 Oct 2014 23:42:58 -0700
Subject: Re: refcount_release_take_##lock
From: Adrian Chadd
To: Ian Lepore
Cc: John-Mark Gurney, Mateusz Guzik, "freebsd-arch@freebsd.org"

This is exactly why refcount==0 should be the only prelude to freeing
the object. There should be no way to actually take a reference on an
object that has a refcount of 0, because (surprise) at this stage no one
is referencing it anymore.

I.e., once the refcount hits 0, this means that nothing references it at
all - including any data structures that may be storing it. For example,
if an rtentry is in a radix tree, its refcount should be 1 or more, not 0.

It's the only way this can work.

(The net80211 stack suffers from this and I'm about to set it on fire
until I fix it. It's been a source of crashes for almost 6 years now.)

-adrian

On 25 October 2014 12:23, Ian Lepore wrote:
> On Sat, 2014-10-25 at 12:04 -0700, John-Mark Gurney wrote:
>> Mateusz Guzik wrote this message on Sat, Oct 25, 2014 at 20:44 +0200:
>> > The following idiom is used here and there:
>> >
>> > int old;
>> > old = obj->ref;
>> > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old -1))
>> >         return;
>> > lock(&something);
>> > if (refcount_release(&obj->ref) == 0) {
>> >         unlock(&something);
>> >         return;
>> > }
>> > free up
>> > unlock(&something);
>> >
>> > ==========
>>
>> Couldn't this be better written as:
>> if (__predict_false(refcount_release(&obj->ref) == 0)) {
>
> Could you not get preempted at this point, whereupon another thread
> acquires then releases obj, deletes it because it keeps running through
> this point, then eventually your original thread wakes up, gets the
> lock, and dereferences the now-defunct obj pointer?
>
> (Also, I think that should be != 0, above?)
>
> -- Ian
>
>>         lock(&something);
>>         if (__predict_true(!obj->ref)) {
>>                 free up
>>         }
>>         unlock(&something);
>> }
>>
>> The reason I'm asking is that I changed how IPsec SA ref counting was
>> handled, and used something similar...
>>
>> My code gets rid of a branch, and is better in that it uses refcount
>> API properly, instead of using atomic_cmpset_int...
>>
>> > I decided to implement it as a common function.
>> >
>> > We have only refcount.h and I didn't want to bloat all including code
>> > with additional definitions and as such I came up with a macro that has
>> > to be used in .c file and that will define appropriate inline func.
>> >
>> > I'm definitely looking for better names for REFCOUNT_RELEASE_TAKE_USE_
>> > macro, assuming it has to stay.
>>
>> You could shorten it to REFCNT_REL_TAKE_
>>
>> > Comments?
>>
>> Will you update the refcount(9) man page w/ documentation before
>> committing?
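[Editorial sketch, not part of the original thread.] The lock-before-final-release idiom quoted above, together with the rule that a count of 0 can never be revived, can be modelled in userspace C roughly as follows. This assumes C11 atomics and a pthread mutex standing in for the kernel lock. struct obj, obj_lock, obj_acquire() and obj_release_take_lock() are hypothetical names chosen for illustration; they are not the refcount(9) API, and unlike the quoted idiom the fast path here retries its compare-and-swap instead of falling through to the lock after one failed attempt.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object type; only the counter matters for this sketch. */
struct obj {
	atomic_uint ref;	/* live references; 0 means "being destroyed" */
};

/* Stand-in for the lock protecting whatever container holds the object. */
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a new reference; the caller must already hold a valid one. */
static void
obj_acquire(struct obj *o)
{
	unsigned int old = atomic_fetch_add(&o->ref, 1);

	/* Reviving an object whose count already hit 0 is always a bug. */
	assert(old > 0);
	(void)old;
}

/*
 * Drop one reference.  Returns true, with *lock held, only when the
 * caller dropped the last reference and must tear the object down.
 */
static bool
obj_release_take_lock(struct obj *o, pthread_mutex_t *lock)
{
	unsigned int old = atomic_load(&o->ref);

	/* Fast path: clearly not the last reference, so no lock is needed. */
	while (old > 1) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->ref, &old, old - 1))
			return (false);
	}

	/* Slow path: take the lock first, then drop what may be the last ref. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(&o->ref, 1) != 1) {
		/* Someone else gained a reference in the meantime. */
		pthread_mutex_unlock(lock);
		return (false);
	}
	return (true);
}
```

Because the lock is taken before the final decrement, the window Ian describes (release first, then get preempted before locking) does not arise in this variant, provided every lookup that hands out new references also runs under the same lock.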

From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 06:59:35 2014
Message-ID: <544DED4C.3010501@rice.edu>
Date: Mon, 27 Oct 2014 01:59:24 -0500
From: Alan Cox
MIME-Version: 1.0
To: Svatopluk Kraus
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
References: <5428AF3B.1030906@rice.edu> <54497DC1.5070506@rice.edu>
Content-Type: multipart/mixed; boundary="------------090909070100060609070401"
Cc: alc@freebsd.org, FreeBSD Arch

This is a multi-part message in MIME format.
--------------090909070100060609070401
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 10/24/2014 06:33, Svatopluk Kraus wrote:
>
> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox wrote:
>
> On 10/08/2014 10:38, Svatopluk Kraus wrote:
> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote:
> >
> >> On 09/27/2014 03:51, Svatopluk Kraus wrote:
> >>
> >>
> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote:
> >>
> >>>
> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I and Michal are finishing new ARM pmap-v6 code. There is one
> >>>> problem we've dealt with somehow, but now we would like to do it
> >>>> better. It's about physical pages which are allocated before vm
> >>>> subsystem is initialized.
> >>>> While later on these pages could be found in vm_page_array when
> >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for
> >>>> VM_PHYSSEG_SPARSE memory model. And ARM world uses
> >>>> VM_PHYSSEG_SPARSE model.
> >>>>
> >>>> It really would be nice to utilize vm_page_array for such
> >>>> preallocated physical pages even when VM_PHYSSEG_SPARSE memory
> >>>> model is used. Things could be much easier then. In our case, it's
> >>>> about pages which are used for level 2 page tables. In
> >>>> VM_PHYSSEG_SPARSE model, we have two sets of such pages. First ones
> >>>> are preallocated and second ones are allocated after vm subsystem
> >>>> was inited. We must deal with each set differently. So code is more
> >>>> complex and so is debugging.
> >>>>
> >>>> Thus we need some method how to say that some part of physical
> >>>> memory should be included in vm_page_array, but the pages from that
> >>>> region should not be put to free list during initialization. We
> >>>> think that such possibility could be utilized in general. There
> >>>> could be a need for some physical space which:
> >>>>
> >>>> (1) is needed only during boot and later on it can be freed and put
> >>>> to vm subsystem,
> >>>>
> >>>> (2) is needed for something else and vm_page_array code could be
> >>>> used without some kind of its duplication.
> >>>>
> >>>> There is already some code which deals with blacklisted pages in
> >>>> vm_page.c file. So the easiest way how to deal with presented
> >>>> situation is to add some callback to this part of code which will
> >>>> be able to either exclude whole phys_avail[i], phys_avail[i+1]
> >>>> region or single pages. As the biggest phys_avail region is used
> >>>> for vm subsystem allocations, there should be some more coding.
> >>>> (However, blacklisted pages are not dealt with on that part of
> >>>> region.)
> >>>>
> >>>> We would like to know if there is any objection:
> >>>>
> >>>> (1) to deal with presented problem,
> >>>> (2) to deal with the problem presented way.
> >>>> Some help is very appreciated. Thanks
> >>>>
> >>>>
> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead
> >>> of phys_avail when sizing vm_page_array. On amd64, where the same
> >>> problem exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now,
> >>> this is probably my preferred solution. The catch being that not all
> >>> architectures implement dump_avail, but my recollection is that arm
> >>> does.
> >>>
> >> Frankly, I would prefer this too, but there is one big open question:
> >>
> >> What is dump_avail for?
> >>
> >>
> >>
> >> dump_avail[] is solving a similar problem in the minidump code,
> >> hence, the prefix "dump_" in its name. In other words, the minidump
> >> code couldn't use phys_avail[] either because it didn't describe the
> >> full range of physical addresses that might be included in a
> >> minidump, so dump_avail[] was created.
> >>
> >> There is already precedent for what I'm suggesting. dump_avail[] is
> >> already (ab)used outside of the minidump code on x86 to solve this
> >> same problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
> >>
> >>
> >> Using it for vm_page_array initialization and segmentation means that
> >> phys_avail must be a subset of it. And this must be stated and be
> >> visible enough. Maybe it should be even checked in code. I like the
> >> idea of thinking about dump_avail as something what describes all
> >> memory in a system, but it's not how dump_avail is defined in archs
> >> now.
> >>
> >>
> >>
> >> When you say "it's not how dump_avail is defined in archs now", I'm
> >> not sure whether you're talking about the code or the comments. In
> >> terms of code, dump_avail[] is a superset of phys_avail[], and I'm
> >> not aware of any code that would have to change.
> >> In terms of comments, I did a grep looking for comments defining what
> >> dump_avail[] is, because I couldn't remember any. I found one ... on
> >> arm. So, I don't think it's an onerous task changing the definition
> >> of dump_avail[]. :-)
> >>
> >> Already, as things stand today with dump_avail[] being used outside
> >> of the minidump code, one could reasonably argue that it should be
> >> renamed to something like phys_exists[].
> >>
> >>
> >>
> >> I will experiment with it on Monday then. However, it's not only
> >> about how memory segments are created in vm_phys.c, but it's about
> >> how vm_page_array size is computed in vm_page.c too.
> >>
> >>
> >>
> >> Yes, and there is also a place in vm_reserv.c that needs to change.
> >> I've attached the patch that I developed and tested a long time ago.
> >> It still applies cleanly and runs ok on amd64.
> >>
> >>
> >>
> >
> >
> > Well, I've created and tested minimalistic patch which - I hope - is
> > committable. It runs ok on pandaboard (arm-v6) and solves presented
> > problem. I would really appreciate if this will be committed. Thanks.
>
>
> Sorry for the slow reply. I've just been swamped with work lately. I
> finally had some time to look at this in the last day or so.
>
> The first thing that I propose to do is commit the attached patch. This
> patch changes pmap_init() on amd64, armv6, and i386 so that it no longer
> consults phys_avail[] to determine the end of memory. Instead, it calls
> a new function provided by vm_phys.c to obtain the same information from
> vm_phys_segs[].
>
> With this change, the new variable phys_managed in your patch wouldn't
> need to be a global. It could be a local variable in vm_page_startup()
> that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>
> More generally, the long-term vision that I have is that we would stop
> using phys_avail[] after vm_page_startup() had completed. It would only
> be used during initialization.
After that we would use vm_phys_segs[] > and functions provided by vm_phys.c. > > > I understand. The patch and the long-term vision are fine for me. I > just was not to bold to pass phys_managed as a parameter to > vm_phys_init() and vm_reserv_init(). However, I certainly was thinking > about it. While reading comment above vm_phys_get_end(), do we care of > if last usable address is 0xFFFFFFFF? To date, this hasn't been a problem. However, handling 0xFFFFFFFF is easy. So, the final version of the patch that I committed this weekend does so. Can you please try the attached patch? It replaces phys_avail[] with vm_phys_segs[] in arm's busdma. > Do you think that the rest of my patch considering changes due to your > patch is ok? > Basically, yes. I do, however, think that +#if defined(__arm__) + phys_managed = dump_avail; +#else + phys_managed = phys_avail; +#endif should also be conditioned on VM_PHYSSEG_SPARSE. > > > > > > BTW, while I was inspecting all archs, I think that maybe it's > time to do > > what was done for busdma not long ago. There are many similar > codes across > > archs which deal with physical memory and could be generalized > and put to > > kern/subr_physmem.c for utilization. All work with physical > memory could be > > simplify to two arrays of regions. > > > > phys_present[] ... describes all present physical memory regions > > phys_exclude[] ... describes various exclusions from phys_present[] > > > > Each excluded region will be labeled by flags to say what kind > of exclusion > > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, > NOMEMRW could > > be combined. This idea is taken from sys/arm/arm/physmem.c. > > > > All other arrays like phys_managed[], phys_avail[], dump_avail[] > will be > > created from these phys_present[] and phys_exclude[]. > > This way bootstrap codes in archs could be simplified and > unified. For > > example, dealing with either hw.physmem or page with PA > 0x00000000 could be > > transparent. 
> > > > I'm prepared to volunteer if the thing is ripe. However, some > tutor will be > > looked for. > > > I've never really looked at arm/arm/physmem.c before. Let me do that > before I comment on this. > > No problem. This could be long-term aim. However, I hope the > VM_PHYSSEG_SPARSE problem could be dealt with in MI code in present > time. In every case, thanks for your help. > > --------------090909070100060609070401 Content-Type: text/plain; charset=ISO-8859-15; name="busdma_arm1.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="busdma_arm1.patch" SW5kZXg6IGFybS9hcm0vYnVzZG1hX21hY2hkZXAtdjYuYwo9PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBh cm0vYXJtL2J1c2RtYV9tYWNoZGVwLXY2LmMJKHJldmlzaW9uIDI3MzY5OSkKKysrIGFybS9h cm0vYnVzZG1hX21hY2hkZXAtdjYuYwkod29ya2luZyBjb3B5KQpAQCAtNTQsOCArNTQsOSBA QCBfX0ZCU0RJRCgiJEZyZWVCU0QkIik7CiAjaW5jbHVkZSA8c3lzL3Vpby5oPgogCiAjaW5j bHVkZSA8dm0vdm0uaD4KKyNpbmNsdWRlIDx2bS92bV9wYXJhbS5oPgogI2luY2x1ZGUgPHZt L3ZtX3BhZ2UuaD4KLSNpbmNsdWRlIDx2bS92bV9tYXAuaD4KKyNpbmNsdWRlIDx2bS92bV9w aHlzLmg+CiAjaW5jbHVkZSA8dm0vdm1fZXh0ZXJuLmg+CiAjaW5jbHVkZSA8dm0vdm1fa2Vy bi5oPgogCkBAIC0yNzcsMTYgKzI3OCwxOCBAQCBTWVNJTklUKGJ1c2RtYSwgU0lfU1VCX0tN RU0rMSwgU0lfT1JERVJfRklSU1QsIGJ1cwogICogZXhwcmVzcywgc28gd2UgdGFrZSBhIGZh c3Qgb3V0LgogICovCiBzdGF0aWMgaW50Ci1leGNsdXNpb25fYm91bmNlX2NoZWNrKHZtX29m ZnNldF90IGxvd2FkZHIsIHZtX29mZnNldF90IGhpZ2hhZGRyKQorZXhjbHVzaW9uX2JvdW5j ZV9jaGVjayh2bV9wYWRkcl90IGxvd2FkZHIsIHZtX3BhZGRyX3QgaGlnaGFkZHIpCiB7CisJ c3RydWN0IHZtX3BoeXNfc2VnICpzZWc7CiAJaW50IGk7CiAKIAlpZiAobG93YWRkciA+PSBC VVNfU1BBQ0VfTUFYQUREUikKIAkJcmV0dXJuICgwKTsKIAotCWZvciAoaSA9IDA7IHBoeXNf YXZhaWxbaV0gJiYgcGh5c19hdmFpbFtpICsgMV07IGkgKz0gMikgewotCQlpZiAoKGxvd2Fk ZHIgPj0gcGh5c19hdmFpbFtpXSAmJiBsb3dhZGRyIDwgcGh5c19hdmFpbFtpICsgMV0pIHx8 Ci0JCSAgICAobG93YWRkciA8IHBoeXNfYXZhaWxbaV0gJiYgaGlnaGFkZHIgPj0gcGh5c19h dmFpbFtpXSkpCisJZm9yIChpID0gMDsgaSA8IHZtX3BoeXNfbnNlZ3M7IGkrKykgeworCQlz 
ZWcgPSAmdm1fcGh5c19zZWdzW2ldOworCQlpZiAoKGxvd2FkZHIgPj0gc2VnLT5zdGFydCAm JiBsb3dhZGRyIDwgc2VnLT5lbmQpIHx8CisJCSAgICAobG93YWRkciA8IHNlZy0+c3RhcnQg JiYgaGlnaGFkZHIgPj0gc2VnLT5zdGFydCkpCiAJCQlyZXR1cm4gKDEpOwogCX0KIAlyZXR1 cm4gKDApOwpJbmRleDogYXJtL2FybS9idXNkbWFfbWFjaGRlcC5jCj09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0K LS0tIGFybS9hcm0vYnVzZG1hX21hY2hkZXAuYwkocmV2aXNpb24gMjczNjk5KQorKysgYXJt L2FybS9idXNkbWFfbWFjaGRlcC5jCSh3b3JraW5nIGNvcHkpCkBAIC03MCwxMCArNzAsMTEg QEAgX19GQlNESUQoIiRGcmVlQlNEJCIpOwogCiAjaW5jbHVkZSA8dm0vdW1hLmg+CiAjaW5j bHVkZSA8dm0vdm0uaD4KKyNpbmNsdWRlIDx2bS92bV9wYXJhbS5oPgogI2luY2x1ZGUgPHZt L3ZtX2V4dGVybi5oPgogI2luY2x1ZGUgPHZtL3ZtX2tlcm4uaD4KICNpbmNsdWRlIDx2bS92 bV9wYWdlLmg+Ci0jaW5jbHVkZSA8dm0vdm1fbWFwLmg+CisjaW5jbHVkZSA8dm0vdm1fcGh5 cy5oPgogCiAjaW5jbHVkZSA8bWFjaGluZS9hdG9taWMuaD4KICNpbmNsdWRlIDxtYWNoaW5l L2J1cy5oPgpAQCAtMzI2LDE3ICszMjcsMTkgQEAgcnVuX2ZpbHRlcihidXNfZG1hX3RhZ190 IGRtYXQsIGJ1c19hZGRyX3QgcGFkZHIpCiAgKiBleHByZXNzLCBzbyB3ZSB0YWtlIGEgZmFz dCBvdXQuCiAgKi8KIHN0YXRpYyBfX2lubGluZSBpbnQKLV9idXNfZG1hX2Nhbl9ib3VuY2Uo dm1fb2Zmc2V0X3QgbG93YWRkciwgdm1fb2Zmc2V0X3QgaGlnaGFkZHIpCitfYnVzX2RtYV9j YW5fYm91bmNlKHZtX3BhZGRyX3QgbG93YWRkciwgdm1fcGFkZHJfdCBoaWdoYWRkcikKIHsK KwlzdHJ1Y3Qgdm1fcGh5c19zZWcgKnNlZzsKIAlpbnQgaTsKIAogCWlmIChsb3dhZGRyID49 IEJVU19TUEFDRV9NQVhBRERSKQogCQlyZXR1cm4gKDApOwogCi0JZm9yIChpID0gMDsgcGh5 c19hdmFpbFtpXSAmJiBwaHlzX2F2YWlsW2kgKyAxXTsgaSArPSAyKSB7Ci0JCWlmICgobG93 YWRkciA+PSBwaHlzX2F2YWlsW2ldICYmIGxvd2FkZHIgPD0gcGh5c19hdmFpbFtpICsgMV0p Ci0JCSAgICB8fCAobG93YWRkciA8IHBoeXNfYXZhaWxbaV0gJiYKLQkJICAgIGhpZ2hhZGRy ID4gcGh5c19hdmFpbFtpXSkpCisJZm9yIChpID0gMDsgaSA8IHZtX3BoeXNfbnNlZ3M7IGkr KykgeworCQlzZWcgPSAmdm1fcGh5c19zZWdzW2ldOworCQlpZiAoKGxvd2FkZHIgPj0gc2Vn LT5zdGFydCAmJiBsb3dhZGRyIDw9IHNlZy0+ZW5kKQorCQkgICAgfHwgKGxvd2FkZHIgPCBz ZWctPnN0YXJ0ICYmCisJCSAgICBoaWdoYWRkciA+IHNlZy0+c3RhcnQpKQogCQkJcmV0dXJu ICgxKTsKIAl9CiAJcmV0dXJuICgwKTsK --------------090909070100060609070401-- From 
owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 13:22:54 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 77C9793; Mon, 27 Oct 2014 13:22:54 +0000 (UTC) Received: from mail-qa0-x232.google.com (mail-qa0-x232.google.com [IPv6:2607:f8b0:400d:c00::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1CE05794; Mon, 27 Oct 2014 13:22:54 +0000 (UTC) Received: by mail-qa0-f50.google.com with SMTP id cs9so3700333qab.23 for ; Mon, 27 Oct 2014 06:22:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=VDnRCsrq4Y8dqS0xYiI32m8B/cg4150RMsVgyAnfORY=; b=Yz5NQHZjUx88Baghv3u0NbsBYhmkk4Aj8BW3o+mqC6kSmRJ62m1ju5Aa2XSXkuqJrV UchSZa0h00oz7sxTpX/EtBh3dzqBg5Y7ogmGyu8Ln0mOPzD4IEwI7BIt2S1xreYhdhS8 RRpv1P8UebIkYHZ+w8ildF1X/xzuNTir3t9UJKp9qaQQHWxGxLnPXYlwQyt+Rjjw03Yh cchO840P6sf/G9ihRk6zM2HheNkARhWKOA+YoVsojpG+xCj43bFcBSGs49k7uH1napda FTBHtMfvL6NB0rng58Ey44q1ZO71zTpxK8qEmKjiRmBA9OOy89ARri/NO1vtlrJqhglj 1o6A== MIME-Version: 1.0 X-Received: by 10.229.176.70 with SMTP id bd6mr20043683qcb.12.1414416173014; Mon, 27 Oct 2014 06:22:53 -0700 (PDT) Received: by 10.140.23.242 with HTTP; Mon, 27 Oct 2014 06:22:52 -0700 (PDT) In-Reply-To: <544DED4C.3010501@rice.edu> References: <5428AF3B.1030906@rice.edu> <54497DC1.5070506@rice.edu> <544DED4C.3010501@rice.edu> Date: Mon, 27 Oct 2014 14:22:52 +0100 Message-ID: Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE From: Svatopluk Kraus To: Alan Cox Content-Type: multipart/mixed; boundary=001a11c2d8ba8f4c290506676df8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: 
alc@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 13:22:54 -0000 --001a11c2d8ba8f4c290506676df8 Content-Type: text/plain; charset=UTF-8 On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox wrote: > On 10/24/2014 06:33, Svatopluk Kraus wrote: > > > On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox wrote: > >> On 10/08/2014 10:38, Svatopluk Kraus wrote: >> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote: >> > >> >> On 09/27/2014 03:51, Svatopluk Kraus wrote: >> >> >> >> >> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox >> wrote: >> >> >> >>> >> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus >> >>> wrote: >> >>> >> >>>> Hi, >> >>>> >> >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem >> >>>> we've >> >>>> dealt with somehow, but now we would like to do it better. It's about >> >>>> physical pages which are allocated before vm subsystem is >> initialized. >> >>>> While later on these pages could be found in vm_page_array when >> >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for >> >>>> VM_PHYSSEG_SPARSE >> >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model. >> >>>> >> >>>> It really would be nice to utilize vm_page_array for such >> preallocated >> >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used. >> Things >> >>>> could be much easier then. In our case, it's about pages which are >> used >> >>>> for >> >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two sets of >> such >> >>>> pages. First ones are preallocated and second ones are allocated >> after vm >> >>>> subsystem was inited. We must deal with each set differently. So >> code is >> >>>> more complex and so is debugging. 
>> >>>> >> >>>> Thus we need some method how to say that some part of physical memory >> >>>> should be included in vm_page_array, but the pages from that region >> >>>> should >> >>>> not be put to free list during initialization. We think that such >> >>>> possibility could be utilized in general. There could be a need for >> some >> >>>> physical space which: >> >>>> >> >>>> (1) is needed only during boot and later on it can be freed and put >> to vm >> >>>> subsystem, >> >>>> >> >>>> (2) is needed for something else and vm_page_array code could be used >> >>>> without some kind of its duplication. >> >>>> >> >>>> There is already some code which deals with blacklisted pages in >> >>>> vm_page.c >> >>>> file. So the easiest way how to deal with presented situation is to >> add >> >>>> some callback to this part of code which will be able to either >> exclude >> >>>> whole phys_avail[i], phys_avail[i+1] region or single pages. As the >> >>>> biggest >> >>>> phys_avail region is used for vm subsystem allocations, there should >> be >> >>>> some more coding. (However, blacklisted pages are not dealt with on >> that >> >>>> part of region.) >> >>>> >> >>>> We would like to know if there is any objection: >> >>>> >> >>>> (1) to deal with presented problem, >> >>>> (2) to deal with the problem presented way. >> >>>> Some help is very appreciated. Thanks >> >>>> >> >>>> >> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of >> >>> phys_avail when sizing vm_page_array. On amd64, where the same >> problem >> >>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is >> >>> probably my preferred solution. The catch being that not all >> architectures >> >>> implement dump_avail, but my recollection is that arm does. >> >>> >> >> Frankly, I would prefer this too, but there is one big open question: >> >> >> >> What is dump_avail for? 
>> >> >> >> >> >> >> >> dump_avail[] is solving a similar problem in the minidump code, hence, >> the >> >> prefix "dump_" in its name. In other words, the minidump code >> couldn't use >> >> phys_avail[] either because it didn't describe the full range of >> physical >> >> addresses that might be included in a minidump, so dump_avail[] was >> created. >> >> >> >> There is already precedent for what I'm suggesting. dump_avail[] is >> >> already (ab)used outside of the minidump code on x86 to solve this same >> >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c. >> >> >> >> >> >> Using it for vm_page_array initialization and segmentation means that >> >> phys_avail must be a subset of it. And this must be stated and be >> visible >> >> enough. Maybe it should be even checked in code. I like the idea of >> >> thinking about dump_avail as something what desribes all memory in a >> >> system, but it's not how dump_avail is defined in archs now. >> >> >> >> >> >> >> >> When you say "it's not how dump_avail is defined in archs now", I'm not >> >> sure whether you're talking about the code or the comments. In terms >> of >> >> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of >> any >> >> code that would have to change. In terms of comments, I did a grep >> looking >> >> for comments defining what dump_avail[] is, because I couldn't remember >> >> any. I found one ... on arm. So, I don't think it's a onerous task >> >> changing the definition of dump_avail[]. :-) >> >> >> >> Already, as things stand today with dump_avail[] being used outside of >> the >> >> minidump code, one could reasonably argue that it should be renamed to >> >> something like phys_exists[]. >> >> >> >> >> >> >> >> I will experiment with it on monday then. However, it's not only about >> how >> >> memory segments are created in vm_phys.c, but it's about how >> vm_page_array >> >> size is computed in vm_page.c too. 
>> >> >> >> >> >> >> >> Yes, and there is also a place in vm_reserv.c that needs to change. >> I've >> >> attached the patch that I developed and tested a long time ago. It >> still >> >> applies cleanly and runs ok on amd64. >> >> >> >> >> >> >> > >> > >> > Well, I've created and tested minimalistic patch which - I hope - is >> > commitable. It runs ok on pandaboard (arm-v6) and solves presented >> problem. >> > I would really appreciate if this will be commited. Thanks. >> >> >> Sorry for the slow reply. I've just been swamped with work lately. I >> finally had some time to look at this in the last day or so. >> >> The first thing that I propose to do is commit the attached patch. This >> patch changes pmap_init() on amd64, armv6, and i386 so that it no longer >> consults phys_avail[] to determine the end of memory. Instead, it calls >> a new function provided by vm_phys.c to obtain the same information from >> vm_phys_segs[]. >> >> With this change, the new variable phys_managed in your patch wouldn't >> need to be a global. It could be a local variable in vm_page_startup() >> that we pass as a parameter to vm_phys_init() and vm_reserv_init(). >> >> More generally, the long-term vision that I have is that we would stop >> using phys_avail[] after vm_page_startup() had completed. It would only >> be used during initialization. After that we would use vm_phys_segs[] >> and functions provided by vm_phys.c. >> > > I understand. The patch and the long-term vision are fine for me. I just > was not to bold to pass phys_managed as a parameter to vm_phys_init() and > vm_reserv_init(). However, I certainly was thinking about it. While reading > comment above vm_phys_get_end(), do we care of if last usable address is > 0xFFFFFFFF? > > > > To date, this hasn't been a problem. However, handling 0xFFFFFFFF is > easy. So, the final version of the patch that I committed this weekend > does so. > > Can you please try the attached patch? 
It replaces phys_avail[] with vm_phys_segs[] in arm's busdma.
>

It works fine on the arm-v6 pandaboard. I have no objection to committing it.
However, it's only a 1:1 replacement. In fact, I still keep the following
pattern in my head:

present memory in system           <=> all RAM and whatsoever
nobounce memory                    <=> addressable by DMA
managed memory by vm subsystem     <=> i.e. kept in vm_page_array
available memory for vm subsystem  <=> can be allocated

So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it could be
too limiting in some scenarios. I would like to see something different in
exclusion_bounce_check() in the future: something that reflects the NOBOUNCE
property and not the NOALLOC one, as it does now.

> > Do you think that the rest of my patch considering changes due to your
> > patch is ok?
>
> Basically, yes. I do, however, think that
>
> +#if defined(__arm__)
> +	phys_managed = dump_avail;
> +#else
> +	phys_managed = phys_avail;
> +#endif
>
> should also be conditioned on VM_PHYSSEG_SPARSE.
>

So I've prepared a new patch. phys_managed[] is passed to vm_phys_init() and
vm_reserv_init() as a parameter, and a small optimization is made in
vm_page_startup(). I added the VM_PHYSSEG_SPARSE condition at the place you
mentioned. Anyhow, I still think that this is only a temporary hack. In
general, phys_managed[] should always be distinguished from phys_avail[].

>> >
>> > BTW, while I was inspecting all archs, I think that maybe it's time to do
>> > what was done for busdma not long ago. There are many similar codes across
>> > archs which deal with physical memory and could be generalized and put to
>> > kern/subr_physmem.c for utilization. All work with physical memory could be
>> > simplify to two arrays of regions.
>> >
>> > phys_present[] ... describes all present physical memory regions
>> > phys_exclude[] ... describes various exclusions from phys_present[]
>> >
>> > Each excluded region will be labeled by flags to say what kind of exclusion
>> > it is.
The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, NOMEMRW >> could >> > be combined. This idea is taken from sys/arm/arm/physmem.c. >> > >> > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be >> > created from these phys_present[] and phys_exclude[]. >> > This way bootstrap codes in archs could be simplified and unified. For >> > example, dealing with either hw.physmem or page with PA 0x00000000 >> could be >> > transparent. >> > >> > I'm prepared to volunteer if the thing is ripe. However, some tutor >> will be >> > looked for. >> >> >> I've never really looked at arm/arm/physmem.c before. Let me do that >> before I comment on this. >> >> No problem. This could be long-term aim. However, I hope the > VM_PHYSSEG_SPARSE problem could be dealt with in MI code in present time. > In every case, thanks for your help. > > > > > --001a11c2d8ba8f4c290506676df8 Content-Type: application/octet-stream; name="phys_managed2.patch" Content-Disposition: attachment; filename="phys_managed2.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i1rurru11 SW5kZXg6IHN5cy92bS92bV9wYWdlLmMKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gc3lzL3ZtL3ZtX3BhZ2UuYwko cmV2aXNpb24gMjczNzM0KQorKysgc3lzL3ZtL3ZtX3BhZ2UuYwkod29ya2luZyBjb3B5KQpAQCAt MjkwLDYgKzI5MCw3IEBACiAJdm1fcGFkZHJfdCBwYTsKIAl2bV9wYWRkcl90IGxhc3RfcGE7CiAJ Y2hhciAqbGlzdDsKKwl2bV9wYWRkcl90ICpwaHlzX21hbmFnZWQ7CgogCS8qIHRoZSBiaWdnZXN0 IG1lbW9yeSBhcnJheSBpcyB0aGUgc2Vjb25kIGdyb3VwIG9mIHBhZ2VzICovCiAJdm1fcGFkZHJf dCBlbmQ7CkBAIC0zMDEsMzEgKzMwMiwzOSBAQAogCWJpZ2dlc3RvbmUgPSAwOwogCXZhZGRyID0g cm91bmRfcGFnZSh2YWRkcik7CgotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSArIDFdOyBpICs9 IDIpIHsKLQkJcGh5c19hdmFpbFtpXSA9IHJvdW5kX3BhZ2UocGh5c19hdmFpbFtpXSk7Ci0JCXBo eXNfYXZhaWxbaSArIDFdID0gdHJ1bmNfcGFnZShwaHlzX2F2YWlsW2kgKyAxXSk7CisjaWYgZGVm aW5lZChWTV9QSFlTU0VHX1NQQVJTRSkgJiYgZGVmaW5lZChfX2FybV9fKQorCXBoeXNfbWFuYWdl 
ZCA9IGR1bXBfYXZhaWw7CisjZWxzZQorCXBoeXNfbWFuYWdlZCA9IHBoeXNfYXZhaWw7CisjZW5k aWYKKworCWxvd193YXRlciA9IHJvdW5kX3BhZ2UocGh5c19tYW5hZ2VkWzBdKTsKKwloaWdoX3dh dGVyID0gcm91bmRfcGFnZShwaHlzX21hbmFnZWRbMV0pOworCWZvciAoaSA9IDI7IHBoeXNfbWFu YWdlZFtpICsgMV07IGkgKz0gMikgeworCQlwaHlzX21hbmFnZWRbaV0gPSByb3VuZF9wYWdlKHBo eXNfbWFuYWdlZFtpXSk7CisJCXBoeXNfbWFuYWdlZFtpICsgMV0gPSB0cnVuY19wYWdlKHBoeXNf bWFuYWdlZFtpICsgMV0pOworCQlpZiAocGh5c19tYW5hZ2VkW2ldIDwgbG93X3dhdGVyKQorCQkJ bG93X3dhdGVyID0gcGh5c19tYW5hZ2VkW2ldOworCQlpZiAocGh5c19tYW5hZ2VkW2kgKyAxXSA+ IGhpZ2hfd2F0ZXIpCisJCQloaWdoX3dhdGVyID0gcGh5c19tYW5hZ2VkW2kgKyAxXTsKIAl9Cgot CWxvd193YXRlciA9IHBoeXNfYXZhaWxbMF07Ci0JaGlnaF93YXRlciA9IHBoeXNfYXZhaWxbMV07 CisjaWZkZWYgWEVOCisJbG93X3dhdGVyID0gMDsKKyNlbmRpZgoKIAlmb3IgKGkgPSAwOyBwaHlz X2F2YWlsW2kgKyAxXTsgaSArPSAyKSB7Ci0JCXZtX3BhZGRyX3Qgc2l6ZSA9IHBoeXNfYXZhaWxb aSArIDFdIC0gcGh5c19hdmFpbFtpXTsKKwkJdm1fcGFkZHJfdCBzaXplOwoKKwkJcGh5c19hdmFp bFtpXSA9IHJvdW5kX3BhZ2UocGh5c19hdmFpbFtpXSk7CisJCXBoeXNfYXZhaWxbaSArIDFdID0g dHJ1bmNfcGFnZShwaHlzX2F2YWlsW2kgKyAxXSk7CisJCXNpemUgPSBwaHlzX2F2YWlsW2kgKyAx XSAtIHBoeXNfYXZhaWxbaV07CiAJCWlmIChzaXplID4gYmlnZ2VzdHNpemUpIHsKIAkJCWJpZ2dl c3RvbmUgPSBpOwogCQkJYmlnZ2VzdHNpemUgPSBzaXplOwogCQl9Ci0JCWlmIChwaHlzX2F2YWls W2ldIDwgbG93X3dhdGVyKQotCQkJbG93X3dhdGVyID0gcGh5c19hdmFpbFtpXTsKLQkJaWYgKHBo eXNfYXZhaWxbaSArIDFdID4gaGlnaF93YXRlcikKLQkJCWhpZ2hfd2F0ZXIgPSBwaHlzX2F2YWls W2kgKyAxXTsKIAl9CgotI2lmZGVmIFhFTgotCWxvd193YXRlciA9IDA7Ci0jZW5kaWYKLQogCWVu ZCA9IHBoeXNfYXZhaWxbYmlnZ2VzdG9uZSsxXTsKCiAJLyoKQEAgLTM5Myw4ICs0MDIsOCBAQAog CWZpcnN0X3BhZ2UgPSBsb3dfd2F0ZXIgLyBQQUdFX1NJWkU7CiAjaWZkZWYgVk1fUEhZU1NFR19T UEFSU0UKIAlwYWdlX3JhbmdlID0gMDsKLQlmb3IgKGkgPSAwOyBwaHlzX2F2YWlsW2kgKyAxXSAh PSAwOyBpICs9IDIpCi0JCXBhZ2VfcmFuZ2UgKz0gYXRvcChwaHlzX2F2YWlsW2kgKyAxXSAtIHBo eXNfYXZhaWxbaV0pOworCWZvciAoaSA9IDA7IHBoeXNfbWFuYWdlZFtpICsgMV0gIT0gMDsgaSAr PSAyKQorCQlwYWdlX3JhbmdlICs9IGF0b3AocGh5c19tYW5hZ2VkW2kgKyAxXSAtIHBoeXNfbWFu 
YWdlZFtpXSk7CiAjZWxpZiBkZWZpbmVkKFZNX1BIWVNTRUdfREVOU0UpCiAJcGFnZV9yYW5nZSA9 IGhpZ2hfd2F0ZXIgLyBQQUdFX1NJWkUgLSBmaXJzdF9wYWdlOwogI2Vsc2UKQEAgLTQ0NSw3ICs0 NTQsNyBAQAogCS8qCiAJICogSW5pdGlhbGl6ZSB0aGUgcGh5c2ljYWwgbWVtb3J5IGFsbG9jYXRv ci4KIAkgKi8KLQl2bV9waHlzX2luaXQoKTsKKwl2bV9waHlzX2luaXQocGh5c19tYW5hZ2VkKTsK CiAJLyoKIAkgKiBBZGQgZXZlcnkgYXZhaWxhYmxlIHBoeXNpY2FsIHBhZ2UgdGhhdCBpcyBub3Qg YmxhY2tsaXN0ZWQgdG8KQEAgLTQ3Miw3ICs0ODEsNyBAQAogCS8qCiAJICogSW5pdGlhbGl6ZSB0 aGUgcmVzZXJ2YXRpb24gbWFuYWdlbWVudCBzeXN0ZW0uCiAJICovCi0Jdm1fcmVzZXJ2X2luaXQo KTsKKwl2bV9yZXNlcnZfaW5pdChwaHlzX21hbmFnZWQpOwogI2VuZGlmCiAJcmV0dXJuICh2YWRk cik7CiB9CkluZGV4OiBzeXMvdm0vdm1fcGh5cy5jCj09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy92bS92bV9w aHlzLmMJKHJldmlzaW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9waHlzLmMJKHdvcmtpbmcgY29w eSkKQEAgLTM2MCwyMiArMzYwLDIyIEBACiAgKiBJbml0aWFsaXplIHRoZSBwaHlzaWNhbCBtZW1v cnkgYWxsb2NhdG9yLgogICovCiB2b2lkCi12bV9waHlzX2luaXQodm9pZCkKK3ZtX3BoeXNfaW5p dCh2bV9wYWRkcl90ICpyZWdpb25zKQogewogCXN0cnVjdCB2bV9mcmVlbGlzdCAqZmw7CiAJaW50 IGRvbSwgZmxpbmQsIGksIG9pbmQsIHBpbmQ7CgotCWZvciAoaSA9IDA7IHBoeXNfYXZhaWxbaSAr IDFdICE9IDA7IGkgKz0gMikgeworCWZvciAoaSA9IDA7IHJlZ2lvbnNbaSArIDFdICE9IDA7IGkg Kz0gMikgewogI2lmZGVmCVZNX0ZSRUVMSVNUX0lTQURNQQotCQlpZiAocGh5c19hdmFpbFtpXSA8 IDE2Nzc3MjE2KSB7Ci0JCQlpZiAocGh5c19hdmFpbFtpICsgMV0gPiAxNjc3NzIxNikgewotCQkJ CXZtX3BoeXNfY3JlYXRlX3NlZyhwaHlzX2F2YWlsW2ldLCAxNjc3NzIxNiwKKwkJaWYgKHJlZ2lv bnNbaV0gPCAxNjc3NzIxNikgeworCQkJaWYgKHJlZ2lvbnNbaSArIDFdID4gMTY3NzcyMTYpIHsK KwkJCQl2bV9waHlzX2NyZWF0ZV9zZWcocmVnaW9uc1tpXSwgMTY3NzcyMTYsCiAJCQkJICAgIFZN X0ZSRUVMSVNUX0lTQURNQSk7Ci0JCQkJdm1fcGh5c19jcmVhdGVfc2VnKDE2Nzc3MjE2LCBwaHlz X2F2YWlsW2kgKyAxXSwKKwkJCQl2bV9waHlzX2NyZWF0ZV9zZWcoMTY3NzcyMTYsIHJlZ2lvbnNb aSArIDFdLAogCQkJCSAgICBWTV9GUkVFTElTVF9ERUZBVUxUKTsKIAkJCX0gZWxzZSB7Ci0JCQkJ dm1fcGh5c19jcmVhdGVfc2VnKHBoeXNfYXZhaWxbaV0sCi0JCQkJICAgIHBoeXNfYXZhaWxbaSAr 
IDFdLCBWTV9GUkVFTElTVF9JU0FETUEpOworCQkJCXZtX3BoeXNfY3JlYXRlX3NlZyhyZWdpb25z W2ldLCByZWdpb25zW2kgKyAxXSwKKwkJCQkgICAgVk1fRlJFRUxJU1RfSVNBRE1BKTsKIAkJCX0K IAkJCWlmIChWTV9GUkVFTElTVF9JU0FETUEgPj0gdm1fbmZyZWVsaXN0cykKIAkJCQl2bV9uZnJl ZWxpc3RzID0gVk1fRlJFRUxJU1RfSVNBRE1BICsgMTsKQEAgLTM4MiwyMSArMzgyLDIxIEBACiAJ CX0gZWxzZQogI2VuZGlmCiAjaWZkZWYJVk1fRlJFRUxJU1RfSElHSE1FTQotCQlpZiAocGh5c19h dmFpbFtpICsgMV0gPiBWTV9ISUdITUVNX0FERFJFU1MpIHsKLQkJCWlmIChwaHlzX2F2YWlsW2ld IDwgVk1fSElHSE1FTV9BRERSRVNTKSB7Ci0JCQkJdm1fcGh5c19jcmVhdGVfc2VnKHBoeXNfYXZh aWxbaV0sCisJCWlmIChyZWdpb25zW2kgKyAxXSA+IFZNX0hJR0hNRU1fQUREUkVTUykgeworCQkJ aWYgKHJlZ2lvbnNbaV0gPCBWTV9ISUdITUVNX0FERFJFU1MpIHsKKwkJCQl2bV9waHlzX2NyZWF0 ZV9zZWcocmVnaW9uc1tpXSwKIAkJCQkgICAgVk1fSElHSE1FTV9BRERSRVNTLCBWTV9GUkVFTElT VF9ERUZBVUxUKTsKIAkJCQl2bV9waHlzX2NyZWF0ZV9zZWcoVk1fSElHSE1FTV9BRERSRVNTLAot CQkJCSAgICBwaHlzX2F2YWlsW2kgKyAxXSwgVk1fRlJFRUxJU1RfSElHSE1FTSk7CisJCQkJICAg IHJlZ2lvbnNbaSArIDFdLCBWTV9GUkVFTElTVF9ISUdITUVNKTsKIAkJCX0gZWxzZSB7Ci0JCQkJ dm1fcGh5c19jcmVhdGVfc2VnKHBoeXNfYXZhaWxbaV0sCi0JCQkJICAgIHBoeXNfYXZhaWxbaSAr IDFdLCBWTV9GUkVFTElTVF9ISUdITUVNKTsKKwkJCQl2bV9waHlzX2NyZWF0ZV9zZWcocmVnaW9u c1tpXSwgcmVnaW9uc1tpICsgMV0sCisJCQkJICAgIFZNX0ZSRUVMSVNUX0hJR0hNRU0pOwogCQkJ fQogCQkJaWYgKFZNX0ZSRUVMSVNUX0hJR0hNRU0gPj0gdm1fbmZyZWVsaXN0cykKIAkJCQl2bV9u ZnJlZWxpc3RzID0gVk1fRlJFRUxJU1RfSElHSE1FTSArIDE7CiAJCX0gZWxzZQogI2VuZGlmCi0J CXZtX3BoeXNfY3JlYXRlX3NlZyhwaHlzX2F2YWlsW2ldLCBwaHlzX2F2YWlsW2kgKyAxXSwKKwkJ dm1fcGh5c19jcmVhdGVfc2VnKHJlZ2lvbnNbaV0sIHJlZ2lvbnNbaSArIDFdLAogCQkgICAgVk1f RlJFRUxJU1RfREVGQVVMVCk7CiAJfQogCWZvciAoZG9tID0gMDsgZG9tIDwgdm1fbmRvbWFpbnM7 IGRvbSsrKSB7CkluZGV4OiBzeXMvdm0vdm1fcGh5cy5oCj09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy92bS92 bV9waHlzLmgJKHJldmlzaW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9waHlzLmgJKHdvcmtpbmcg Y29weSkKQEAgLTgwLDcgKzgwLDcgQEAKIHZtX3BhZ2VfdCB2bV9waHlzX2ZpY3RpdGlvdXNfdG9f 
dm1fcGFnZSh2bV9wYWRkcl90IHBhKTsKIHZvaWQgdm1fcGh5c19mcmVlX2NvbnRpZyh2bV9wYWdl X3QgbSwgdV9sb25nIG5wYWdlcyk7CiB2b2lkIHZtX3BoeXNfZnJlZV9wYWdlcyh2bV9wYWdlX3Qg bSwgaW50IG9yZGVyKTsKLXZvaWQgdm1fcGh5c19pbml0KHZvaWQpOwordm9pZCB2bV9waHlzX2lu aXQodm1fcGFkZHJfdCAqcmVnaW9ucyk7CiB2bV9wYWdlX3Qgdm1fcGh5c19wYWRkcl90b192bV9w YWdlKHZtX3BhZGRyX3QgcGEpOwogdm9pZCB2bV9waHlzX3NldF9wb29sKGludCBwb29sLCB2bV9w YWdlX3QgbSwgaW50IG9yZGVyKTsKIGJvb2xlYW5fdCB2bV9waHlzX3VuZnJlZV9wYWdlKHZtX3Bh Z2VfdCBtKTsKSW5kZXg6IHN5cy92bS92bV9yZXNlcnYuYwo9PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzeXMvdm0v dm1fcmVzZXJ2LmMJKHJldmlzaW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9yZXNlcnYuYwkod29y a2luZyBjb3B5KQpAQCAtODE1LDcgKzgxNSw3IEBACiAgKiBSZXF1aXJlcyB0aGF0IHZtX3BhZ2Vf YXJyYXkgYW5kIGZpcnN0X3BhZ2UgYXJlIGluaXRpYWxpemVkIQogICovCiB2b2lkCi12bV9yZXNl cnZfaW5pdCh2b2lkKQordm1fcmVzZXJ2X2luaXQodm1fcGFkZHJfdCAqcmVnaW9ucykKIHsKIAl2 bV9wYWRkcl90IHBhZGRyOwogCWludCBpOwpAQCAtODI0LDkgKzgyNCw5IEBACiAJICogSW5pdGlh bGl6ZSB0aGUgcmVzZXJ2YXRpb24gYXJyYXkuICBTcGVjaWZpY2FsbHksIGluaXRpYWxpemUgdGhl CiAJICogInBhZ2VzIiBmaWVsZCBmb3IgZXZlcnkgZWxlbWVudCB0aGF0IGhhcyBhbiB1bmRlcmx5 aW5nIHN1cGVycGFnZS4KIAkgKi8KLQlmb3IgKGkgPSAwOyBwaHlzX2F2YWlsW2kgKyAxXSAhPSAw OyBpICs9IDIpIHsKLQkJcGFkZHIgPSByb3VuZHVwMihwaHlzX2F2YWlsW2ldLCBWTV9MRVZFTF8w X1NJWkUpOwotCQl3aGlsZSAocGFkZHIgKyBWTV9MRVZFTF8wX1NJWkUgPD0gcGh5c19hdmFpbFtp ICsgMV0pIHsKKwlmb3IgKGkgPSAwOyByZWdpb25zW2kgKyAxXSAhPSAwOyBpICs9IDIpIHsKKwkJ cGFkZHIgPSByb3VuZHVwMihyZWdpb25zW2ldLCBWTV9MRVZFTF8wX1NJWkUpOworCQl3aGlsZSAo cGFkZHIgKyBWTV9MRVZFTF8wX1NJWkUgPD0gcmVnaW9uc1tpICsgMV0pIHsKIAkJCXZtX3Jlc2Vy dl9hcnJheVtwYWRkciA+PiBWTV9MRVZFTF8wX1NISUZUXS5wYWdlcyA9CiAJCQkgICAgUEhZU19U T19WTV9QQUdFKHBhZGRyKTsKIAkJCXBhZGRyICs9IFZNX0xFVkVMXzBfU0laRTsKSW5kZXg6IHN5 cy92bS92bV9yZXNlcnYuaAo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzeXMvdm0vdm1fcmVzZXJ2LmgJKHJldmlz 
aW9uIDI3MzczNCkKKysrIHN5cy92bS92bV9yZXNlcnYuaAkod29ya2luZyBjb3B5KQpAQCAtNTIs NyArNTIsNyBAQAogCQkgICAgdm1fcGFnZV90IG1wcmVkKTsKIHZvaWQJCXZtX3Jlc2Vydl9icmVh a19hbGwodm1fb2JqZWN0X3Qgb2JqZWN0KTsKIGJvb2xlYW5fdAl2bV9yZXNlcnZfZnJlZV9wYWdl KHZtX3BhZ2VfdCBtKTsKLXZvaWQJCXZtX3Jlc2Vydl9pbml0KHZvaWQpOwordm9pZAkJdm1fcmVz ZXJ2X2luaXQodm1fcGFkZHJfdCAqcmVnaW9ucyk7CiBpbnQJCXZtX3Jlc2Vydl9sZXZlbF9pZmZ1 bGxwb3Aodm1fcGFnZV90IG0pOwogYm9vbGVhbl90CXZtX3Jlc2Vydl9yZWFjdGl2YXRlX3BhZ2Uo dm1fcGFnZV90IG0pOwogYm9vbGVhbl90CXZtX3Jlc2Vydl9yZWNsYWltX2NvbnRpZyh1X2xvbmcg bnBhZ2VzLCB2bV9wYWRkcl90IGxvdywK --001a11c2d8ba8f4c290506676df8-- From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 16:29:24 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8CBCEEBF for ; Mon, 27 Oct 2014 16:29:24 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 65A52E85 for ; Mon, 27 Oct 2014 16:29:24 +0000 (UTC) Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net [173.70.85.31]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 66B24B96E; Mon, 27 Oct 2014 12:29:23 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: RfC: fueword(9) and casueword(9) Date: Mon, 27 Oct 2014 11:17:51 -0400 Message-ID: <2048849.GkvWliFbyg@ralph.baldwin.cx> User-Agent: KMail/4.14.2 (FreeBSD/10.1-PRERELEASE; KDE/4.14.2; amd64; ; ) In-Reply-To: <20141021162306.GE1877@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <20141022002825.H2080@besplex.bde.org> <20141021162306.GE1877@kib.kiev.ua> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: 
Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 27 Oct 2014 12:29:23 -0400 (EDT)
Cc: Konstantin Belousov
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 27 Oct 2014 16:29:24 -0000

On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
> On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
> > A new API should try to fix these __DEVOLATILE() abominations. I think it
> > is safe, and even correct, to declare the pointers as volatile const void
> > *, since the functions really can handle volatile data, unlike copyin().
> >
> > Atomic op functions are declared as taking pointers to volatile for
> > similar reasons. Often they are applied to non-volatile data, but
> > adding a qualifier is type-safe and doesn't cost efficiency since the
> > pointer access is not known to the compiler. (The last point is not
> > so clear -- the compiler can see things in the functions since they are
> > inline asm. fueword() isn't inline so its (in)efficiency is not changed.)
> >
> > The atomic read functions are not declared as taking pointers to const.
> > The __DECONST() abomination might be used to work around this bug.
>
> I prefer to not complicate the fetch(9) KPI due to the mistakes in the
> umtx structures definitions. I think that it is bug to mark the lock
> words with volatile. I want the fueword(9) interface to be as much
> similar to fuword(9), in particular, volatile seems to be not needed.

I agree with Bruce here. casuword() already accepts volatile. I also
think umtx is correct in marking the field as volatile. They are subject
to change without the compiler's knowledge albeit by other threads
rather than signal handlers.
Having them marked volatile doesn't really matter for the kernel, but the header is also used in userland and is relevant in sem_new.c, etc. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 16:29:23 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DA659DDF for ; Mon, 27 Oct 2014 16:29:23 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B3044E84 for ; Mon, 27 Oct 2014 16:29:23 +0000 (UTC) Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net [173.70.85.31]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id A575FB941; Mon, 27 Oct 2014 12:29:22 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: refcount_release_take_##lock Date: Mon, 27 Oct 2014 11:27:45 -0400 Message-ID: <2629048.tOq3sNXcCP@ralph.baldwin.cx> User-Agent: KMail/4.14.2 (FreeBSD/10.1-PRERELEASE; KDE/4.14.2; amd64; ; ) In-Reply-To: <20141025190407.GU82214@funkthat.com> References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 27 Oct 2014 12:29:22 -0400 (EDT) Cc: John-Mark Gurney , Mateusz Guzik X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 16:29:23 -0000 On Saturday, October 25, 2014 12:04:07 PM John-Mark Gurney wrote: > Mateusz Guzik 
wrote this message on Sat, Oct 25, 2014 at 20:44 +0200:
> > The following idiom is used here and there:
> >
> > int old;
> > old = obj->ref;
> > if (old > 1 && atomic_cmpset_int(&obj->ref, old, old - 1))
> >         return;
> >
> > lock(&something);
> > if (refcount_release(&obj->ref) == 0) {
> >         unlock(&something);
> >         return;
> > }
> > free up
> > unlock(&something);
> >
> > ==========
>
> Couldn't this be better written as:
> if (__predict_false(refcount_release(&obj->ref) == 0)) {
>         lock(&something);
>         if (__predict_true(!obj->ref)) {
>                 free up
>         }
>         unlock(&something);
> }
>
> The reason I'm asking is that I changed how IPsec SA ref counting was
> handled, and used something similar...

No, this has a race as others have noted. Please go fix the IPsec code. :)

> My code gets rid of a branch, and is better in that it uses refcount
> API properly, instead of using atomic_cmpset_int...

He is extending the refcount() API (which uses atomic_* internally). The API
implementation _should_ use atomic_* directly.

Mateusz,

Please keep the refcount_*() prefix so it matches the rest of the API. I
would just declare the functions directly in refcount.h rather than requiring
a macro to be invoked in each C file. We can also just implement the needed
lock types for now instead of all of them. You could maybe replace 'take'
with 'lock', but either name is fine.
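
The idiom under discussion can be sketched in userland C. This is only an illustration under stated assumptions: C11 atomics and a pthread mutex stand in for the kernel's atomic_cmpset_int() and lock primitives, and the helper name refcount_release_lock_sketch() is hypothetical, not the API that was eventually committed. The point it tries to show is that the count may only go from 1 to 0 while the lock is held, so a thread that finds the object under the lock never sees a zero count on an object that is about to be freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userland stand-in for the helper proposed in the thread.
 * Drops one reference.  Returns true with *lock held if this call released
 * the last reference; returns false with *lock not held otherwise.
 */
static bool
refcount_release_lock_sketch(atomic_uint *count, pthread_mutex_t *lock)
{
    unsigned int old;

    /* Fast path: not the last reference, so drop it without the lock. */
    old = atomic_load(count);
    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return (false);
        /* The failed exchange reloaded 'old'; re-check and retry. */
    }

    /* Slow path: a possible 1 -> 0 transition happens only under the lock. */
    pthread_mutex_lock(lock);
    old = atomic_fetch_sub(count, 1);
    assert(old > 0);            /* releasing more references than were held */
    if (old == 1)
        return (true);          /* caller frees the object, then unlocks */
    pthread_mutex_unlock(lock);
    return (false);
}
```

A caller would write `if (refcount_release_lock_sketch(&obj->ref, &lock)) { /* free up */ pthread_mutex_unlock(&lock); }`, which replaces the open-coded sequence quoted above.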
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 16:31:54 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CA992315; Mon, 27 Oct 2014 16:31:54 +0000 (UTC) Received: from pp2.rice.edu (proofpoint2.mail.rice.edu [128.42.201.101]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 74CF7EBD; Mon, 27 Oct 2014 16:31:53 +0000 (UTC) Received: from pps.filterd (pp2.rice.edu [127.0.0.1]) by pp2.rice.edu (8.14.5/8.14.5) with SMTP id s9RGRF9k011035; Mon, 27 Oct 2014 11:31:51 -0500 Received: from mh3.mail.rice.edu (mh3.mail.rice.edu [128.42.199.10]) by pp2.rice.edu with ESMTP id 1q7yw40t9v-1; Mon, 27 Oct 2014 11:31:51 -0500 X-Virus-Scanned: by amavis-2.7.0 at mh3.mail.rice.edu, auth channel Received: from 108-254-203-201.lightspeed.hstntx.sbcglobal.net (108-254-203-201.lightspeed.hstntx.sbcglobal.net [108.254.203.201]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated sender: alc) by mh3.mail.rice.edu (Postfix) with ESMTPSA id CE2CA403FC; Mon, 27 Oct 2014 11:31:50 -0500 (CDT) Message-ID: <544E7376.6040002@rice.edu> Date: Mon, 27 Oct 2014 11:31:50 -0500 From: Alan Cox User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Svatopluk Kraus Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE References: <5428AF3B.1030906@rice.edu> <54497DC1.5070506@rice.edu> <544DED4C.3010501@rice.edu> In-Reply-To: X-Enigmail-Version: 1.6 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=11 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1402240000 definitions=main-1410270157 Content-Type: text/plain; 
charset=UTF-8 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: alc@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 16:31:54 -0000 On 10/27/2014 08:22, Svatopluk Kraus wrote: > > On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox > wrote: > > On 10/24/2014 06:33, Svatopluk Kraus wrote: >> >> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox > > wrote: >> >> On 10/08/2014 10:38, Svatopluk Kraus wrote: >> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox > > wrote: >> > >> >> On 09/27/2014 03:51, Svatopluk Kraus wrote: >> >> >> >> >> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox >> > wrote: >> >> >> >>> >> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus >> > >> >>> wrote: >> >>> >> >>>> Hi, >> >>>> >> >>>> I and Michal are finishing new ARM pmap-v6 code. There >> is one problem >> >>>> we've >> >>>> dealt with somehow, but now we would like to do it >> better. It's about >> >>>> physical pages which are allocated before vm subsystem >> is initialized. >> >>>> While later on these pages could be found in >> vm_page_array when >> >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for >> >>>> VM_PHYSSEG_SPARSE >> >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model. >> >>>> >> >>>> It really would be nice to utilize vm_page_array for >> such preallocated >> >>>> physical pages even when VM_PHYSSEG_SPARSE memory model >> is used. Things >> >>>> could be much easier then. In our case, it's about pages >> which are used >> >>>> for >> >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have >> two sets of such >> >>>> pages. First ones are preallocated and second ones are >> allocated after vm >> >>>> subsystem was inited. We must deal with each set >> differently. 
So code is >> >>>> more complex and so is debugging. >> >>>> >> >>>> Thus we need some method how to say that some part of >> physical memory >> >>>> should be included in vm_page_array, but the pages from >> that region >> >>>> should >> >>>> not be put to free list during initialization. We think >> that such >> >>>> possibility could be utilized in general. There could be >> a need for some >> >>>> physical space which: >> >>>> >> >>>> (1) is needed only during boot and later on it can be >> freed and put to vm >> >>>> subsystem, >> >>>> >> >>>> (2) is needed for something else and vm_page_array code >> could be used >> >>>> without some kind of its duplication. >> >>>> >> >>>> There is already some code which deals with blacklisted >> pages in >> >>>> vm_page.c >> >>>> file. So the easiest way how to deal with presented >> situation is to add >> >>>> some callback to this part of code which will be able to >> either exclude >> >>>> whole phys_avail[i], phys_avail[i+1] region or single >> pages. As the >> >>>> biggest >> >>>> phys_avail region is used for vm subsystem allocations, >> there should be >> >>>> some more coding. (However, blacklisted pages are not >> dealt with on that >> >>>> part of region.) >> >>>> >> >>>> We would like to know if there is any objection: >> >>>> >> >>>> (1) to deal with presented problem, >> >>>> (2) to deal with the problem presented way. >> >>>> Some help is very appreciated. Thanks >> >>>> >> >>>> >> >>> As an experiment, try modifying vm_phys.c to use >> dump_avail instead of >> >>> phys_avail when sizing vm_page_array. On amd64, where >> the same problem >> >>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right >> now, this is >> >>> probably my preferred solution. The catch being that not >> all architectures >> >>> implement dump_avail, but my recollection is that arm does. >> >>> >> >> Frankly, I would prefer this too, but there is one big >> open question: >> >> >> >> What is dump_avail for? 
>> >> >> >> >> >> >> >> dump_avail[] is solving a similar problem in the minidump >> code, hence, the >> >> prefix "dump_" in its name. In other words, the minidump >> code couldn't use >> >> phys_avail[] either because it didn't describe the full >> range of physical >> >> addresses that might be included in a minidump, so >> dump_avail[] was created. >> >> >> >> There is already precedent for what I'm suggesting. >> dump_avail[] is >> >> already (ab)used outside of the minidump code on x86 to >> solve this same >> >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c. >> >> >> >> >> >> Using it for vm_page_array initialization and >> segmentation means that >> >> phys_avail must be a subset of it. And this must be stated >> and be visible >> >> enough. Maybe it should be even checked in code. I like >> the idea of >> >> thinking about dump_avail as something what desribes all >> memory in a >> >> system, but it's not how dump_avail is defined in archs now. >> >> >> >> >> >> >> >> When you say "it's not how dump_avail is defined in archs >> now", I'm not >> >> sure whether you're talking about the code or the >> comments. In terms of >> >> code, dump_avail[] is a superset of phys_avail[], and I'm >> not aware of any >> >> code that would have to change. In terms of comments, I >> did a grep looking >> >> for comments defining what dump_avail[] is, because I >> couldn't remember >> >> any. I found one ... on arm. So, I don't think it's a >> onerous task >> >> changing the definition of dump_avail[]. :-) >> >> >> >> Already, as things stand today with dump_avail[] being >> used outside of the >> >> minidump code, one could reasonably argue that it should >> be renamed to >> >> something like phys_exists[]. >> >> >> >> >> >> >> >> I will experiment with it on monday then. However, it's >> not only about how >> >> memory segments are created in vm_phys.c, but it's about >> how vm_page_array >> >> size is computed in vm_page.c too. 
>> >> >> >> >> >> >> >> Yes, and there is also a place in vm_reserv.c that needs >> to change. I've >> >> attached the patch that I developed and tested a long time >> ago. It still >> >> applies cleanly and runs ok on amd64. >> >> >> >> >> >> >> > >> > >> > Well, I've created and tested minimalistic patch which - I >> hope - is >> > commitable. It runs ok on pandaboard (arm-v6) and solves >> presented problem. >> > I would really appreciate if this will be commited. Thanks. >> >> >> Sorry for the slow reply. I've just been swamped with work >> lately. I >> finally had some time to look at this in the last day or so. >> >> The first thing that I propose to do is commit the attached >> patch. This >> patch changes pmap_init() on amd64, armv6, and i386 so that >> it no longer >> consults phys_avail[] to determine the end of memory. >> Instead, it calls >> a new function provided by vm_phys.c to obtain the same >> information from >> vm_phys_segs[]. >> >> With this change, the new variable phys_managed in your patch >> wouldn't >> need to be a global. It could be a local variable in >> vm_page_startup() >> that we pass as a parameter to vm_phys_init() and >> vm_reserv_init(). >> >> More generally, the long-term vision that I have is that we >> would stop >> using phys_avail[] after vm_page_startup() had completed. It >> would only >> be used during initialization. After that we would use >> vm_phys_segs[] >> and functions provided by vm_phys.c. >> >> >> I understand. The patch and the long-term vision are fine for me. >> I just was not to bold to pass phys_managed as a parameter to >> vm_phys_init() and vm_reserv_init(). However, I certainly was >> thinking about it. While reading comment above vm_phys_get_end(), >> do we care of if last usable address is 0xFFFFFFFF? > > > To date, this hasn't been a problem. However, handling 0xFFFFFFFF > is easy. So, the final version of the patch that I committed this > weekend does so. > > Can you please try the attached patch? 
It replaces phys_avail[] > with vm_phys_segs[] in arm's busdma. > > > > It works fine on arm-v6 pandaboard. I have no objection to commit it. > However, it's only 1:1 replacement. Right now, yes. However, once your patch is committed, it won't be 1:1 anymore, because vm_phys_segs[] will be populated based on dump_avail[] rather than phys_avail[]. My interpretation of the affected code is that using the ranges defined by dump_avail[] is actually closer to what this code intended. > In fact, I still keep the following pattern in my head: > > present memory in system <=> all RAM and whatsoever > nobounce memory <=> addressable by DMA In general, I don't see how this can be an attribute of the memory, because it's going to depend on the device. In other words, a given physical address may require bouncing for some device but not all devices. > managed memory by vm subsystem <=> i.e. kept in vm_page_array > available memory for vm subsystem <=> can be allocated > > So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it > could be too much limiting in some scenarios. I would like to see > something different in exclusion_bounce_check() in the future. > Something what reflects NOBOUNCE property and not NOALLOC one like now. > > > > > > >> Do you think that the rest of my patch considering changes due to >> your patch is ok? >> > > > Basically, yes. I do, however, think that > > +#if defined(__arm__) > + phys_managed = dump_avail; > +#else > + phys_managed = phys_avail; > +#endif > > should also be conditioned on VM_PHYSSEG_SPARSE. > > > > > So I've prepared new patch. phys_managed[] is passed to vm_phys_init() > and vm_reserv_init() as a parameter and small optimalization is made > in vm_page_startup(). I add VM_PHYSSEG_SPARSE condition to place you > mentioned. Anyhow, I still think that this is only temporary hack. In > general, phys_managed[] should always be distinguished from phys_avail[]. 
> > > >> >> >> > >> > BTW, while I was inspecting all archs, I think that maybe >> it's time to do >> > what was done for busdma not long ago. There are many >> similar codes across >> > archs which deal with physical memory and could be >> generalized and put to >> > kern/subr_physmem.c for utilization. All work with physical >> memory could be >> > simplify to two arrays of regions. >> > >> > phys_present[] ... describes all present physical memory >> regions >> > phys_exclude[] ... describes various exclusions from >> phys_present[] >> > >> > Each excluded region will be labeled by flags to say what >> kind of exclusion >> > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, >> NOMEMRW could >> > be combined. This idea is taken from sys/arm/arm/physmem.c. >> > >> > All other arrays like phys_managed[], phys_avail[], >> dump_avail[] will be >> > created from these phys_present[] and phys_exclude[]. >> > This way bootstrap codes in archs could be simplified and >> unified. For >> > example, dealing with either hw.physmem or page with PA >> 0x00000000 could be >> > transparent. >> > >> > I'm prepared to volunteer if the thing is ripe. However, >> some tutor will be >> > looked for. >> >> >> I've never really looked at arm/arm/physmem.c before. Let me >> do that >> before I comment on this. >> >> No problem. This could be long-term aim. However, I hope the >> VM_PHYSSEG_SPARSE problem could be dealt with in MI code in >> present time. In every case, thanks for your help. 
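
The phys_present[] / phys_exclude[] proposal quoted above can be sketched as follows. This is a minimal illustration, not the code in sys/arm/arm/physmem.c: the struct layout, the flag values and the helper name derive_regions() are assumptions made for the example; only the flag names NOALLOC and NODUMP come from the message. The sketch derives one start/end pair list (in the style of phys_avail[] or dump_avail[]) by subtracting every exclusion that carries a given flag from the present regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values are assumptions chosen for this example. */
#define EXF_NOALLOC 0x1u    /* must not be handed to the page allocator */
#define EXF_NODUMP  0x2u    /* must not be included in a dump */

/* Half-open physical range [start, end); flags are used by exclusions. */
struct region {
    uint64_t start;
    uint64_t end;
    uint32_t flags;
};

/*
 * Write the parts of present[] that are not covered by any exclusion
 * carrying 'flag' to out[] as (start, end) pairs.  Assumes exclude[] is
 * sorted by start address and non-overlapping.  Returns the number of
 * pairs written; pairs beyond maxpairs are silently dropped.
 */
static size_t
derive_regions(const struct region *present, size_t npresent,
    const struct region *exclude, size_t nexclude, uint32_t flag,
    uint64_t *out, size_t maxpairs)
{
    size_t n = 0;

    for (size_t i = 0; i < npresent; i++) {
        uint64_t cur = present[i].start;
        uint64_t end = present[i].end;

        assert(cur <= end);
        for (size_t j = 0; j < nexclude && cur < end; j++) {
            const struct region *ex = &exclude[j];

            /* Skip exclusions of another kind or outside [cur, end). */
            if ((ex->flags & flag) == 0 ||
                ex->end <= cur || ex->start >= end)
                continue;
            /* Emit the gap in front of the exclusion, if any. */
            if (ex->start > cur && n < maxpairs) {
                out[2 * n] = cur;
                out[2 * n + 1] = ex->start;
                n++;
            }
            cur = ex->end;
        }
        /* Emit whatever remains after the last exclusion. */
        if (cur < end && n < maxpairs) {
            out[2 * n] = cur;
            out[2 * n + 1] = end;
            n++;
        }
    }
    return (n);
}
```

Calling the helper once with EXF_NOALLOC and once with EXF_NODUMP over the same two input arrays would yield the allocator view and the dump view respectively, which is the unification the message argues for.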
>> >> > > From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 16:56:16 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 69D2DFE0; Mon, 27 Oct 2014 16:56:16 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A8DA526D; Mon, 27 Oct 2014 16:56:15 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9RGtvOc032843 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 27 Oct 2014 18:55:57 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9RGtvOc032843 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9RGtv8G032839; Mon, 27 Oct 2014 18:55:57 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 27 Oct 2014 18:55:57 +0200 From: Konstantin Belousov To: John Baldwin Subject: Re: RfC: fueword(9) and casueword(9) Message-ID: <20141027165557.GC1877@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <20141022002825.H2080@besplex.bde.org> <20141021162306.GE1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2048849.GkvWliFbyg@ralph.baldwin.cx> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED, FREEMAIL_FROM, NML_ADSP_CUSTOM_MED, T_FILL_THIS_FORM_SHORT autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: 
freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 27 Oct 2014 16:56:16 -0000

On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote:
> On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote:
> > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote:
> > > A new API should try to fix these __DEVOLATILE() abominations. I think it
> > > is safe, and even correct, to declare the pointers as volatile const void
> > > *, since the functions really can handle volatile data, unlike copyin().
> > >
> > > Atomic op functions are declared as taking pointers to volatile for
> > > similar reasons. Often they are applied to non-volatile data, but
> > > adding a qualifier is type-safe and doesn't cost efficiency since the
> > > pointer access is not known to the compiler. (The last point is not
> > > so clear -- the compiler can see things in the functions since they are
> > > inline asm. fueword() isn't inline so its (in)efficiency is not changed.)
> > >
> > > The atomic read functions are not declared as taking pointers to const.
> > > The __DECONST() abomination might be used to work around this bug.
> >
> > I prefer to not complicate the fetch(9) KPI due to the mistakes in the
> > umtx structures definitions. I think that it is a bug to mark the lock
> > words with volatile. I want the fueword(9) interface to be as much
> > similar to fuword(9), in particular, volatile seems to be not needed.
>
> I agree with Bruce here. casuword() already accepts volatile. I also
> think umtx is correct in marking the field as volatile. They are subject
> to change without the compiler's knowledge albeit by other threads
> rather than signal handlers.
Having them marked volatile doesn't really > matter for the kernel, but the header is also used in userland and is > relevant in sem_new.c, etc. You agree with making fueword() accept volatile const void * as the address ? Or do you agree with the existence of the volatile type qualifier for the lock field of umtx structures ? I definitely do not want to make fueword() different from fuword() in this aspect. If changing both fueword() and fuword() to take volatile const * address, this should be different patch. At least because that existing changes to kern_umtx.c are really complicated due to changing very delicate logic, and I do not want to add unrelated and splittable modifications to something which I expect to require more debugging in the wild. Below is the current version, which passed Peter' stress2 load on x86. I also did smoke-testing on powerpc64. After make tinderbox finishes successfully for the patch, I consider the change ready. diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile index bc21dc6..fb63e78 100644 --- a/share/man/man9/Makefile +++ b/share/man/man9/Makefile @@ -581,6 +581,9 @@ MLINKS+=condvar.9 cv_broadcast.9 \ MLINKS+=config_intrhook.9 config_intrhook_disestablish.9 \ config_intrhook.9 config_intrhook_establish.9 MLINKS+=contigmalloc.9 contigfree.9 +MLINKS+=casuword.9 casueword.9 \ + casuword.9 casueword32.9 \ + casuword.9 casuword32.9 MLINKS+=copy.9 copyin.9 \ copy.9 copyin_nofault.9 \ copy.9 copyinstr.9 \ @@ -688,7 +691,10 @@ MLINKS+=fetch.9 fubyte.9 \ fetch.9 fuword.9 \ fetch.9 fuword16.9 \ fetch.9 fuword32.9 \ - fetch.9 fuword64.9 + fetch.9 fuword64.9 \ + fetch.9 fueword.9 \ + fetch.9 fueword32.9 \ + fetch.9 fueword64.9 MLINKS+=firmware.9 firmware_get.9 \ firmware.9 firmware_put.9 \ firmware.9 firmware_register.9 \ diff --git a/share/man/man9/casuword.9 b/share/man/man9/casuword.9 new file mode 100644 index 0000000..34a0f1d --- /dev/null +++ b/share/man/man9/casuword.9 @@ -0,0 +1,95 @@ +.\" Copyright (c) 2014 The FreeBSD 
Foundation +.\" All rights reserved. +.\" +.\" Part of this documentation was written by +.\" Konstantin Belousov under sponsorship +.\" from the FreeBSD Foundation. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. 
+.\" +.\" $FreeBSD$ +.\" +.Dd October 21, 2014 +.Dt CASU 9 +.Os +.Sh NAME +.Nm casueword , +.Nm casueword32 , +.Nm casuword , +.Nm casuword32 +.Nd fetch, compare and store data from user-space +.Sh SYNOPSIS +.In sys/types.h +.In sys/systm.h +.Ft int +.Fn casueword "volatile u_long *base" "u_long oldval" "u_long *oldvalp" "u_long newval" +.Ft int +.Fn casueword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t *oldvalp" "uint32_t newval" +.Ft u_long +.Fn casuword "volatile u_long *base" "u_long oldval" "u_long newval" +.Ft uint32_t +.Fn casuword32 "volatile uint32_t *base" "uint32_t oldval" "uint32_t newval" +.Sh DESCRIPTION +The +.Nm +functions are designed to perform atomic compare-and-swap operation on +the value in the usermode memory of the current process. +.Pp +The +.Nm +routines reads the value from user memory with address +.Pa base , +and compare the value read with +.Pa oldval . +If the values are equal, +.Pa newval +is written to the +.Pa *base . +In case of +.Fn casueword32 +and +.Fn casueword , +old value is stored into the (kernel-mode) variable pointed by +.Pa *oldvalp . +The userspace value must be naturally aligned. +.Pp +The callers of +.Fn casuword +and +.Fn casuword32 +functions cannot distinguish between -1 read from +userspace and function failure. +.Sh RETURN VALUES +The +.Fn casuword +and +.Fn casuword32 +functions return the data fetched or -1 on failure. +The +.Fn casueword +and +.Fn casueword32 +functions return 0 on success and -1 on failure. 
+.Sh SEE ALSO +.Xr atomic 9 , +.Xr fetch 9 , +.Xr store 9 diff --git a/share/man/man9/fetch.9 b/share/man/man9/fetch.9 index ccf6866..7e13cbc 100644 --- a/share/man/man9/fetch.9 +++ b/share/man/man9/fetch.9 @@ -34,7 +34,7 @@ .\" .\" $FreeBSD$ .\" -.Dd October 5, 2009 +.Dd October 21, 2014 .Dt FETCH 9 .Os .Sh NAME @@ -44,11 +44,13 @@ .Nm fuword , .Nm fuword16 , .Nm fuword32 , -.Nm fuword64 +.Nm fuword64 , +.Nm fueword , +.Nm fueword32 , +.Nm fueword64 .Nd fetch data from user-space .Sh SYNOPSIS .In sys/types.h -.In sys/time.h .In sys/systm.h .Ft int .Fn fubyte "const void *base" @@ -60,27 +62,38 @@ .Fn fuword32 "const void *base" .Ft int64_t .Fn fuword64 "const void *base" +.Ft long +.Fn fueword "const void *base" "long *val" +.Ft int32_t +.Fn fueword32 "const void *base" "int32_t *val" +.Ft int64_t +.Fn fueword64 "const void *base" "int64_t *val" .In sys/resourcevar.h .Ft int .Fn fuswintr "void *base" .Sh DESCRIPTION The .Nm -functions are designed to copy small amounts of data from user-space. +functions are designed to copy small amounts of data from user-space +of the current process. +If read is successful, it is performed atomically. +The data read must be naturally aligned. .Pp The .Nm routines provide the following functionality: -.Bl -tag -width "fuswintr()" +.Bl -tag -width "fueword32()" .It Fn fubyte Fetches a byte of data from the user-space address .Pa base . +The byte read is zero-extended into the results variable. .It Fn fuword -Fetches a word of data from the user-space address +Fetches a word of data (long) from the user-space address .Pa base . .It Fn fuword16 Fetches 16 bits of data from the user-space address .Pa base . +The half-word read is zero-extended into the results variable. .It Fn fuword32 Fetches 32 bits of data from the user-space address .Pa base . @@ -91,11 +104,46 @@ Fetches 64 bits of data from the user-space address Fetches a short word of data from the user-space address .Pa base . 
This function is safe to call during an interrupt context. +.It Fn fueword +Fetches a word of data (long) from the user-space address +.Pa base +and stores the result in the variable pointed by +.Pa val . +.It Fn fueword32 +Fetches 32 bits of data from the user-space address +.Pa base +and stores the result in the variable pointed by +.Pa val . +.It Fn fueword64 +Fetches 64 bits of data from the user-space address +.Pa base +and stores the result in the variable pointed by +.Pa val . .El +.Pp +The callers of +.Fn fuword , +.Fn fuword32 +and +.Fn fuword64 +functions cannot distinguish between -1 read from +userspace and function failure. .Sh RETURN VALUES The -.Nm +.Fn fubyte , +.Fn fuword , +.Fn fuword16 , +.Fn fuword32 , +.Fn fuword64 , +and +.Fn fuswintr functions return the data fetched or -1 on failure. +The +.Fn fueword , +.Fn fueword32 +and +.Fn fueword64 +functions return 0 on success and -1 on failure. .Sh SEE ALSO .Xr copy 9 , .Xr store 9 diff --git a/sys/amd64/amd64/support.S b/sys/amd64/amd64/support.S index 4897367..50e653d 100644 --- a/sys/amd64/amd64/support.S +++ b/sys/amd64/amd64/support.S @@ -312,12 +312,13 @@ copyin_fault: END(copyin) /* - * casuword32. Compare and set user integer. Returns -1 or the current value. - * dst = %rdi, old = %rsi, new = %rdx + * casueword32. Compare and set user integer. Returns -1 on fault, + * 0 if access was successful. Old value is written to *oldp. + * dst = %rdi, old = %esi, oldp = %rdx, new = %ecx */ -ENTRY(casuword32) - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) +ENTRY(casueword32) + movq PCPU(CURPCB),%r8 + movq $fusufault,PCB_ONFAULT(%r8) movq $VM_MAXUSER_ADDRESS-4,%rax cmpq %rax,%rdi /* verify address is valid */ @@ -327,26 +328,34 @@ ENTRY(casuword32) #ifdef SMP lock #endif - cmpxchgl %edx,(%rdi) /* new = %edx */ + cmpxchgl %ecx,(%rdi) /* new = %ecx */ /* * The old value is in %eax. 
If the store succeeded it will be the * value we expected (old) from before the store, otherwise it will - * be the current value. + * be the current value. Save %eax into %esi to prepare the return + * value. */ + movl %eax,%esi + xorl %eax,%eax + movq %rax,PCB_ONFAULT(%r8) - movq PCPU(CURPCB),%rcx - movq $0,PCB_ONFAULT(%rcx) + /* + * Access the oldp after the pcb_onfault is cleared, to correctly + * catch corrupted pointer. + */ + movl %esi,(%rdx) /* oldp = %rdx */ ret -END(casuword32) +END(casueword32) /* - * casuword. Compare and set user word. Returns -1 or the current value. - * dst = %rdi, old = %rsi, new = %rdx + * casueword. Compare and set user long. Returns -1 on fault, + * 0 if access was successful. Old value is written to *oldp. + * dst = %rdi, old = %rsi, oldp = %rdx, new = %rcx */ -ENTRY(casuword) - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) +ENTRY(casueword) + movq PCPU(CURPCB),%r8 + movq $fusufault,PCB_ONFAULT(%r8) movq $VM_MAXUSER_ADDRESS-4,%rax cmpq %rax,%rdi /* verify address is valid */ @@ -356,28 +365,28 @@ ENTRY(casuword) #ifdef SMP lock #endif - cmpxchgq %rdx,(%rdi) /* new = %rdx */ + cmpxchgq %rcx,(%rdi) /* new = %rcx */ /* - * The old value is in %eax. If the store succeeded it will be the + * The old value is in %rax. If the store succeeded it will be the * value we expected (old) from before the store, otherwise it will * be the current value. */ - - movq PCPU(CURPCB),%rcx - movq $fusufault,PCB_ONFAULT(%rcx) - movq $0,PCB_ONFAULT(%rcx) + movq %rax,%rsi + xorl %eax,%eax + movq %rax,PCB_ONFAULT(%r8) + movq %rsi,(%rdx) ret -END(casuword) +END(casueword) /* * Fetch (load) a 64-bit word, a 32-bit word, a 16-bit word, or an 8-bit - * byte from user memory. All these functions are MPSAFE. - * addr = %rdi + * byte from user memory. 
+ * addr = %rdi, valp = %rsi */ -ALTENTRY(fuword64) -ENTRY(fuword) +ALTENTRY(fueword64) +ENTRY(fueword) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -385,13 +394,15 @@ ENTRY(fuword) cmpq %rax,%rdi /* verify address is valid */ ja fusufault - movq (%rdi),%rax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movq (%rdi),%r11 + movq %rax,PCB_ONFAULT(%rcx) + movq %r11,(%rsi) ret END(fuword64) END(fuword) -ENTRY(fuword32) +ENTRY(fueword32) movq PCPU(CURPCB),%rcx movq $fusufault,PCB_ONFAULT(%rcx) @@ -399,10 +410,12 @@ ENTRY(fuword32) cmpq %rax,%rdi /* verify address is valid */ ja fusufault - movl (%rdi),%eax - movq $0,PCB_ONFAULT(%rcx) + xorl %eax,%eax + movl (%rdi),%r11d + movq %rax,PCB_ONFAULT(%rcx) + movl %r11d,(%rsi) ret -END(fuword32) +END(fueword32) /* * fuswintr() and suswintr() are specialized variants of fuword16() and diff --git a/sys/amd64/ia32/ia32_syscall.c b/sys/amd64/ia32/ia32_syscall.c index 0cdec6f..92249f9 100644 --- a/sys/amd64/ia32/ia32_syscall.c +++ b/sys/amd64/ia32/ia32_syscall.c @@ -110,7 +110,7 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) struct proc *p; struct trapframe *frame; caddr_t params; - u_int32_t args[8]; + u_int32_t args[8], tmp; int error, i; p = td->td_proc; @@ -126,7 +126,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) /* * Code is first argument, followed by actual args. */ - sa->code = fuword32(params); + error = fueword32(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(int); } else if (sa->code == SYS___syscall) { /* @@ -135,7 +138,10 @@ ia32_fetch_syscall_args(struct thread *td, struct syscall_args *sa) * We use a 32-bit fetch in case params is not * aligned. 
*/ - sa->code = fuword32(params); + error = fueword32(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(quad_t); } if (p->p_sysent->sv_mask) diff --git a/sys/arm/include/param.h b/sys/arm/include/param.h index 4a64607..6267154 100644 --- a/sys/arm/include/param.h +++ b/sys/arm/include/param.h @@ -149,4 +149,8 @@ #define pgtok(x) ((x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_ARM_INCLUDE_PARAM_H_ */ diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c index 8ec949f..5ea062e 100644 --- a/sys/compat/freebsd32/freebsd32_misc.c +++ b/sys/compat/freebsd32/freebsd32_misc.c @@ -1832,16 +1832,21 @@ freebsd32_sysctl(struct thread *td, struct freebsd32_sysctl_args *uap) { int error, name[CTL_MAXNAME]; size_t j, oldlen; + uint32_t tmp; if (uap->namelen > CTL_MAXNAME || uap->namelen < 2) return (EINVAL); error = copyin(uap->name, name, uap->namelen * sizeof(int)); if (error) return (error); - if (uap->oldlenp) - oldlen = fuword32(uap->oldlenp); - else + if (uap->oldlenp) { + error = fueword32(uap->oldlenp, &tmp); + oldlen = tmp; + } else { oldlen = 0; + } + if (error != 0) + return (EFAULT); error = userland_sysctl(td, name, uap->namelen, uap->old, &oldlen, 1, uap->new, uap->newlen, &j, SCTL_MASK32); diff --git a/sys/i386/i386/support.s b/sys/i386/i386/support.s index c126f78..0a08012 100644 --- a/sys/i386/i386/support.s +++ b/sys/i386/i386/support.s @@ -389,16 +389,16 @@ copyin_fault: ret /* - * casuword. Compare and set user word. Returns -1 or the current value. + * casueword. Compare and set user word. Returns -1 on fault, + * 0 on non-faulting access. The current value is in *oldp. 
*/ - -ALTENTRY(casuword32) -ENTRY(casuword) +ALTENTRY(casueword32) +ENTRY(casueword) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx /* dst */ movl 8(%esp),%eax /* old */ - movl 12(%esp),%ecx /* new */ + movl 16(%esp),%ecx /* new */ cmpl $VM_MAXUSER_ADDRESS-4,%edx /* verify address is valid */ ja fusufault @@ -416,17 +416,20 @@ ENTRY(casuword) movl PCPU(CURPCB),%ecx movl $0,PCB_ONFAULT(%ecx) + movl 12(%esp),%edx /* oldp */ + movl %eax,(%edx) + xorl %eax,%eax ret -END(casuword32) -END(casuword) +END(casueword32) +END(casueword) /* * Fetch (load) a 32-bit word, a 16-bit word, or an 8-bit byte from user - * memory. All these functions are MPSAFE. + * memory. */ -ALTENTRY(fuword32) -ENTRY(fuword) +ALTENTRY(fueword32) +ENTRY(fueword) movl PCPU(CURPCB),%ecx movl $fusufault,PCB_ONFAULT(%ecx) movl 4(%esp),%edx /* from */ @@ -436,9 +439,12 @@ ENTRY(fuword) movl (%edx),%eax movl $0,PCB_ONFAULT(%ecx) + movl 8(%esp),%edx + movl %eax,(%edx) + xorl %eax,%eax ret -END(fuword32) -END(fuword) +END(fueword32) +END(fueword) /* * fuswintr() and suswintr() are specialized variants of fuword16() and diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c index 1d0d104..84d6ec3 100644 --- a/sys/i386/i386/trap.c +++ b/sys/i386/i386/trap.c @@ -1059,6 +1059,7 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa) struct proc *p; struct trapframe *frame; caddr_t params; + long tmp; int error; p = td->td_proc; @@ -1074,14 +1075,20 @@ cpu_fetch_syscall_args(struct thread *td, struct syscall_args *sa) /* * Code is first argument, followed by actual args. */ - sa->code = fuword(params); + error = fueword(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(int); } else if (sa->code == SYS___syscall) { /* * Like syscall, but code is a quad, so as to maintain * quad alignment for the rest of the arguments. 
*/ - sa->code = fuword(params); + error = fueword(params, &tmp); + if (error == -1) + return (EFAULT); + sa->code = tmp; params += sizeof(quad_t); } diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c index 09212c8..45d4c6f 100644 --- a/sys/kern/kern_exec.c +++ b/sys/kern/kern_exec.c @@ -1091,7 +1091,7 @@ int exec_copyin_args(struct image_args *args, char *fname, enum uio_seg segflg, char **argv, char **envv) { - char *argp, *envp; + u_long argp, envp; int error; size_t length; @@ -1127,13 +1127,17 @@ exec_copyin_args(struct image_args *args, char *fname, /* * extract arguments first */ - while ((argp = (caddr_t) (intptr_t) fuword(argv++))) { - if (argp == (caddr_t) -1) { + for (;;) { + error = fueword(argv++, &argp); + if (error == -1) { error = EFAULT; goto err_exit; } - if ((error = copyinstr(argp, args->endp, - args->stringspace, &length))) { + if (argp == 0) + break; + error = copyinstr((void *)(uintptr_t)argp, args->endp, + args->stringspace, &length); + if (error != 0) { if (error == ENAMETOOLONG) error = E2BIG; goto err_exit; @@ -1149,13 +1153,17 @@ exec_copyin_args(struct image_args *args, char *fname, * extract environment strings */ if (envv) { - while ((envp = (caddr_t)(intptr_t)fuword(envv++))) { - if (envp == (caddr_t)-1) { + for (;;) { + error = fueword(envv++, &envp); + if (error == -1) { error = EFAULT; goto err_exit; } - if ((error = copyinstr(envp, args->endp, - args->stringspace, &length))) { + if (envp == 0) + break; + error = copyinstr((void *)(uintptr_t)envp, + args->endp, args->stringspace, &length); + if (error != 0) { if (error == ENAMETOOLONG) error = E2BIG; goto err_exit; diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c index c815e36..58e76bc 100644 --- a/sys/kern/kern_umtx.c +++ b/sys/kern/kern_umtx.c @@ -510,6 +510,15 @@ umtxq_unbusy(struct umtx_key *key) wakeup_one(uc); } +static inline void +umtxq_unbusy_unlocked(struct umtx_key *key) +{ + + umtxq_lock(key); + umtxq_unbusy(key); + umtxq_unlock(key); +} + static struct 
umtxq_queue * umtxq_queue_lookup(struct umtx_key *key, int q) { @@ -847,6 +856,7 @@ do_wait(struct thread *td, void *addr, u_long id, struct abs_timeout timo; struct umtx_q *uq; u_long tmp; + uint32_t tmp32; int error = 0; uq = td->td_umtxq; @@ -860,18 +870,29 @@ do_wait(struct thread *td, void *addr, u_long id, umtxq_lock(&uq->uq_key); umtxq_insert(uq); umtxq_unlock(&uq->uq_key); - if (compat32 == 0) - tmp = fuword(addr); - else - tmp = (unsigned int)fuword32(addr); + if (compat32 == 0) { + error = fueword(addr, &tmp); + if (error != 0) + error = EFAULT; + } else { + error = fueword32(addr, &tmp32); + if (error == 0) + tmp = tmp32; + else + error = EFAULT; + } umtxq_lock(&uq->uq_key); - if (tmp == id) - error = umtxq_sleep(uq, "uwait", timeout == NULL ? - NULL : &timo); - if ((uq->uq_flags & UQF_UMTXQ) == 0) - error = 0; - else + if (error == 0) { + if (tmp == id) + error = umtxq_sleep(uq, "uwait", timeout == NULL ? + NULL : &timo); + if ((uq->uq_flags & UQF_UMTXQ) == 0) + error = 0; + else + umtxq_remove(uq); + } else if ((uq->uq_flags & UQF_UMTXQ) != 0) { umtxq_remove(uq); + } umtxq_unlock(&uq->uq_key); umtx_key_release(&uq->uq_key); if (error == ERESTART) @@ -908,11 +929,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, struct abs_timeout timo; struct umtx_q *uq; uint32_t owner, old, id; - int error = 0; + int error, rv; id = td->td_tid; uq = td->td_umtxq; - + error = 0; if (timeout != NULL) abs_timeout_init2(&timo, timeout); @@ -921,7 +942,9 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, * can fault on any access. */ for (;;) { - owner = fuword32(__DEVOLATILE(void *, &m->m_owner)); + rv = fueword32(__DEVOLATILE(void *, &m->m_owner), &owner); + if (rv == -1) + return (EFAULT); if (mode == _UMUTEX_WAIT) { if (owner == UMUTEX_UNOWNED || owner == UMUTEX_CONTESTED) return (0); @@ -929,31 +952,31 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, /* * Try the uncontested case. 
This should be done in userland. */ - owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id); + rv = casueword32(&m->m_owner, UMUTEX_UNOWNED, + &owner, id); + /* The address was invalid. */ + if (rv == -1) + return (EFAULT); /* The acquire succeeded. */ if (owner == UMUTEX_UNOWNED) return (0); - /* The address was invalid. */ - if (owner == -1) - return (EFAULT); - /* If no one owns it but it is contested try to acquire it. */ if (owner == UMUTEX_CONTESTED) { - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, + id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) + return (EFAULT); if (owner == UMUTEX_CONTESTED) return (0); - /* The address was invalid. */ - if (owner == -1) - return (EFAULT); - - error = umtxq_check_susp(td); - if (error != 0) - return (error); + rv = umtxq_check_susp(td); + if (rv != 0) + return (rv); /* If this failed the lock has changed, restart. */ continue; @@ -985,10 +1008,11 @@ do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags, * either some one else has acquired the lock or it has been * released. */ - old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); /* The address was invalid. */ - if (old == -1) { + if (rv == -1) { umtxq_lock(&uq->uq_key); umtxq_remove(uq); umtxq_unbusy(&uq->uq_key); @@ -1033,16 +1057,16 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. 
*/ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) return (EPERM); if ((owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED); - if (old == -1) + error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED); + if (error == -1) return (EFAULT); if (old == owner) return (0); @@ -1064,14 +1088,14 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags) * there is zero or one thread only waiting for it. * Otherwise, it must be marked as contested. */ - old = casuword32(&m->m_owner, owner, - count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); + error = casueword32(&m->m_owner, owner, &old, + count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); umtxq_lock(&key); umtxq_signal(&key,1); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - if (old == -1) + if (error == -1) return (EFAULT); if (old != owner) return (EINVAL); @@ -1091,14 +1115,16 @@ do_wake_umutex(struct thread *td, struct umutex *m) int error; int count; - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != 0) return (0); - flags = fuword32(&m->m_flags); + error = fueword32(&m->m_flags, &flags); + if (error == -1) + return (EFAULT); /* We should only ever be in here for contested locks */ if ((error = umtx_key_get(m, TYPE_NORMAL_UMUTEX, GET_SHARE(flags), @@ -1110,16 +1136,20 @@ do_wake_umutex(struct thread *td, struct umutex *m) count = umtxq_count(&key); umtxq_unlock(&key); - if (count <= 1) - owner = casuword32(&m->m_owner, UMUTEX_CONTESTED, UMUTEX_UNOWNED); + if (count <= 1) { + error = casueword32(&m->m_owner, UMUTEX_CONTESTED, &owner, + UMUTEX_UNOWNED); + if (error == -1) + error = EFAULT; + } umtxq_lock(&key); - if 
(count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) + if (error == 0 && count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) umtxq_signal(&key, 1); umtxq_unbusy(&key); umtxq_unlock(&key); umtx_key_release(&key); - return (0); + return (error); } /* @@ -1162,41 +1192,49 @@ do_wake2_umutex(struct thread *td, struct umutex *m, uint32_t flags) * any memory. */ if (count > 1) { - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - while ((owner & UMUTEX_CONTESTED) ==0) { - old = casuword32(&m->m_owner, owner, - owner|UMUTEX_CONTESTED); + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), + &owner); + if (error == -1) + error = EFAULT; + while (error == 0 && (owner & UMUTEX_CONTESTED) == 0) { + error = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); + if (error == -1) { + error = EFAULT; + break; + } if (old == owner) break; owner = old; - if (old == -1) - break; error = umtxq_check_susp(td); if (error != 0) break; } } else if (count == 1) { - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - while ((owner & ~UMUTEX_CONTESTED) != 0 && + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), + &owner); + if (error == -1) + error = EFAULT; + while (error == 0 && (owner & ~UMUTEX_CONTESTED) != 0 && (owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, - owner|UMUTEX_CONTESTED); + error = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); + if (error == -1) { + error = EFAULT; + break; + } if (old == owner) break; owner = old; - if (old == -1) - break; error = umtxq_check_susp(td); if (error != 0) break; } } umtxq_lock(&key); - if (owner == -1) { - error = EFAULT; + if (error == EFAULT) { umtxq_signal(&key, INT_MAX); - } - else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) + } else if (count != 0 && (owner & ~UMUTEX_CONTESTED) == 0) umtxq_signal(&key, 1); umtxq_unbusy(&key); umtxq_unlock(&key); @@ -1576,7 +1614,7 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, struct umtx_q *uq; 
struct umtx_pi *pi, *new_pi; uint32_t id, owner, old; - int error; + int error, rv; id = td->td_tid; uq = td->td_umtxq; @@ -1619,7 +1657,12 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, /* * Try the uncontested case. This should be done in userland. */ - owner = casuword32(&m->m_owner, UMUTEX_UNOWNED, id); + rv = casueword32(&m->m_owner, UMUTEX_UNOWNED, &owner, id); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; + break; + } /* The acquire succeeded. */ if (owner == UMUTEX_UNOWNED) { @@ -1627,16 +1670,15 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; - break; - } - /* If no one owns it but it is contested try to acquire it. */ if (owner == UMUTEX_CONTESTED) { - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; + break; + } if (owner == UMUTEX_CONTESTED) { umtxq_lock(&uq->uq_key); @@ -1647,12 +1689,6 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; - break; - } - error = umtxq_check_susp(td); if (error != 0) break; @@ -1683,13 +1719,12 @@ do_lock_pi(struct thread *td, struct umutex *m, uint32_t flags, * either some one else has acquired the lock or it has been * released. */ - old = casuword32(&m->m_owner, owner, owner | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, owner, &old, + owner | UMUTEX_CONTESTED); /* The address was invalid. */ - if (old == -1) { - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + if (rv == -1) { + umtxq_unbusy_unlocked(&uq->uq_key); error = EFAULT; break; } @@ -1741,8 +1776,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. 
*/ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) @@ -1750,8 +1785,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) /* This should be done in userland */ if ((owner & UMUTEX_CONTESTED) == 0) { - old = casuword32(&m->m_owner, owner, UMUTEX_UNOWNED); - if (old == -1) + error = casueword32(&m->m_owner, owner, &old, UMUTEX_UNOWNED); + if (error == -1) return (EFAULT); if (old == owner) return (0); @@ -1809,14 +1844,12 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags) * there is zero or one thread only waiting for it. * Otherwise, it must be marked as contested. */ - old = casuword32(&m->m_owner, owner, - count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); + error = casueword32(&m->m_owner, owner, &old, + count <= 1 ? UMUTEX_UNOWNED : UMUTEX_CONTESTED); - umtxq_lock(&key); - umtxq_unbusy(&key); - umtxq_unlock(&key); + umtxq_unbusy_unlocked(&key); umtx_key_release(&key); - if (old == -1) + if (error == -1) return (EFAULT); if (old != owner) return (EINVAL); @@ -1835,7 +1868,7 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, struct umtx_pi *pi; uint32_t ceiling; uint32_t owner, id; - int error, pri, old_inherited_pri, su; + int error, pri, old_inherited_pri, su, rv; id = td->td_tid; uq = td->td_umtxq; @@ -1853,7 +1886,12 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, umtxq_busy(&uq->uq_key); umtxq_unlock(&uq->uq_key); - ceiling = RTP_PRIO_MAX - fuword32(&m->m_ceilings[0]); + rv = fueword32(&m->m_ceilings[0], &ceiling); + if (rv == -1) { + error = EFAULT; + goto out; + } + ceiling = RTP_PRIO_MAX - ceiling; if (ceiling > RTP_PRIO_MAX) { error = EINVAL; goto out; @@ -1874,17 +1912,16 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, } mtx_unlock_spin(&umtx_lock); - owner = casuword32(&m->m_owner, - 
UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); - - if (owner == UMUTEX_CONTESTED) { - error = 0; + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + /* The address was invalid. */ + if (rv == -1) { + error = EFAULT; break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; + if (owner == UMUTEX_CONTESTED) { + error = 0; break; } @@ -1946,9 +1983,7 @@ do_lock_pp(struct thread *td, struct umutex *m, uint32_t flags, } out: - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); umtx_key_release(&uq->uq_key); return (error); } @@ -1973,8 +2008,8 @@ do_unlock_pp(struct thread *td, struct umutex *m, uint32_t flags) /* * Make sure we own this mtx. */ - owner = fuword32(__DEVOLATILE(uint32_t *, &m->m_owner)); - if (owner == -1) + error = fueword32(__DEVOLATILE(uint32_t *, &m->m_owner), &owner); + if (error == -1) return (EFAULT); if ((owner & ~UMUTEX_CONTESTED) != id) @@ -2047,9 +2082,11 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, uint32_t save_ceiling; uint32_t owner, id; uint32_t flags; - int error; + int error, rv; - flags = fuword32(&m->m_flags); + error = fueword32(&m->m_flags, &flags); + if (error == -1) + return (EFAULT); if ((flags & UMUTEX_PRIO_PROTECT) == 0) return (EINVAL); if (ceiling > RTP_PRIO_MAX) @@ -2064,10 +2101,18 @@ do_set_ceiling(struct thread *td, struct umutex *m, uint32_t ceiling, umtxq_busy(&uq->uq_key); umtxq_unlock(&uq->uq_key); - save_ceiling = fuword32(&m->m_ceilings[0]); + rv = fueword32(&m->m_ceilings[0], &save_ceiling); + if (rv == -1) { + error = EFAULT; + break; + } - owner = casuword32(&m->m_owner, - UMUTEX_CONTESTED, id | UMUTEX_CONTESTED); + rv = casueword32(&m->m_owner, + UMUTEX_CONTESTED, &owner, id | UMUTEX_CONTESTED); + if (rv == -1) { + error = EFAULT; + break; + } if (owner == UMUTEX_CONTESTED) { suword32(&m->m_ceilings[0], ceiling); @@ -2077,12 +2122,6 @@ do_set_ceiling(struct thread 
*td, struct umutex *m, uint32_t ceiling, break; } - /* The address was invalid. */ - if (owner == -1) { - error = EFAULT; - break; - } - if ((owner & ~UMUTEX_CONTESTED) == id) { suword32(&m->m_ceilings[0], ceiling); error = 0; @@ -2129,8 +2168,8 @@ do_lock_umutex(struct thread *td, struct umutex *m, uint32_t flags; int error; - flags = fuword32(&m->m_flags); - if (flags == -1) + error = fueword32(&m->m_flags, &flags); + if (error == -1) return (EFAULT); switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) { @@ -2164,9 +2203,10 @@ static int do_unlock_umutex(struct thread *td, struct umutex *m) { uint32_t flags; + int error; - flags = fuword32(&m->m_flags); - if (flags == -1) + error = fueword32(&m->m_flags, &flags); + if (error == -1) return (EFAULT); switch(flags & (UMUTEX_PRIO_INHERIT | UMUTEX_PRIO_PROTECT)) { @@ -2187,21 +2227,27 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m, { struct abs_timeout timo; struct umtx_q *uq; - uint32_t flags; - uint32_t clockid; + uint32_t flags, clockid, hasw; int error; uq = td->td_umtxq; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); if ((wflags & CVWAIT_CLOCKID) != 0) { - clockid = fuword32(&cv->c_clockid); + error = fueword32(&cv->c_clockid, &clockid); + if (error == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } if (clockid < CLOCK_REALTIME || clockid >= CLOCK_THREAD_CPUTIME_ID) { /* hmm, only HW clock id will work. */ + umtx_key_release(&uq->uq_key); return (EINVAL); } } else { @@ -2217,12 +2263,12 @@ do_cv_wait(struct thread *td, struct ucond *cv, struct umutex *m, * Set c_has_waiters to 1 before releasing user mutex, also * don't modify cache line when unnecessary. 
*/ - if (fuword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters)) == 0) + error = fueword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), + &hasw); + if (error == 0 && hasw == 0) suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 1); - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); error = do_unlock_umutex(td, m); @@ -2276,7 +2322,9 @@ do_cv_signal(struct thread *td, struct ucond *cv) int error, cnt, nwake; uint32_t flags; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0) return (error); umtxq_lock(&key); @@ -2287,6 +2335,8 @@ do_cv_signal(struct thread *td, struct ucond *cv) umtxq_unlock(&key); error = suword32( __DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0); + if (error == -1) + error = EFAULT; umtxq_lock(&key); } umtxq_unbusy(&key); @@ -2302,7 +2352,9 @@ do_cv_broadcast(struct thread *td, struct ucond *cv) int error; uint32_t flags; - flags = fuword32(&cv->c_flags); + error = fueword32(&cv->c_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(cv, TYPE_CV, GET_SHARE(flags), &key)) != 0) return (error); @@ -2312,10 +2364,10 @@ do_cv_broadcast(struct thread *td, struct ucond *cv) umtxq_unlock(&key); error = suword32(__DEVOLATILE(uint32_t *, &cv->c_has_waiters), 0); + if (error == -1) + error = EFAULT; - umtxq_lock(&key); - umtxq_unbusy(&key); - umtxq_unlock(&key); + umtxq_unbusy_unlocked(&key); umtx_key_release(&key); return (error); @@ -2329,10 +2381,12 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx uint32_t flags, wrflags; int32_t state, oldstate; int32_t blocked_readers; - int error; + int error, rv; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, 
GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2345,15 +2399,22 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx wrflags |= URWLOCK_WRITE_WAITERS; for (;;) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } + /* try to lock it */ while (!(state & wrflags)) { if (__predict_false(URWLOCK_READER_COUNT(state) == URWLOCK_MAX_READERS)) { umtx_key_release(&uq->uq_key); return (EAGAIN); } - oldstate = casuword32(&rwlock->rw_state, state, state + 1); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state + 1); + if (rv == -1) { umtx_key_release(&uq->uq_key); return (EFAULT); } @@ -2379,12 +2440,17 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx * re-read the state, in case it changed between the try-lock above * and the check below */ - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) + error = EFAULT; /* set read contention bit */ - while ((state & wrflags) && !(state & URWLOCK_READ_WAITERS)) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_READ_WAITERS); - if (oldstate == -1) { + while (error == 0 && (state & wrflags) && + !(state & URWLOCK_READ_WAITERS)) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_READ_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2396,17 +2462,13 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx break; } if (error != 0) { - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); break; } /* state is changed while setting flags, restart */ if (!(state & wrflags)) { - umtxq_lock(&uq->uq_key); - 
umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); error = umtxq_check_susp(td); if (error != 0) break; @@ -2415,7 +2477,13 @@ do_rw_rdlock(struct thread *td, struct urwlock *rwlock, long fflag, struct _umtx sleep: /* contention bit is set, before sleeping, increase read waiter count */ - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + rv = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (rv == -1) { + umtxq_unbusy_unlocked(&uq->uq_key); + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_readers, blocked_readers+1); while (state & wrflags) { @@ -2431,18 +2499,32 @@ sleep: umtxq_unlock(&uq->uq_key); if (error) break; - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } } /* decrease read waiter count, and may clear read contention bit */ - blocked_readers = fuword32(&rwlock->rw_blocked_readers); + rv = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (rv == -1) { + umtxq_unbusy_unlocked(&uq->uq_key); + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_readers, blocked_readers-1); if (blocked_readers == 1) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); - for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_READ_WAITERS); - if (oldstate == -1) { + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) + error = EFAULT; + while (error == 0) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_READ_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2450,14 +2532,10 @@ sleep: break; state = oldstate; error = umtxq_check_susp(td); - if (error != 0) - break; } } - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); if (error != 0) break; } @@ -2476,10 +2554,12 @@ 
do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo int32_t state, oldstate; int32_t blocked_writers; int32_t blocked_readers; - int error; + int error, rv; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2489,10 +2569,16 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo blocked_readers = 0; for (;;) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) { + umtx_key_release(&uq->uq_key); + return (EFAULT); + } while (!(state & URWLOCK_WRITE_OWNER) && URWLOCK_READER_COUNT(state) == 0) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_OWNER); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_WRITE_OWNER); + if (rv == -1) { umtx_key_release(&uq->uq_key); return (EFAULT); } @@ -2528,12 +2614,17 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo * re-read the state, in case it changed between the try-lock above * and the check below */ - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), + &state); + if (rv == -1) + error = EFAULT; - while (((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) && - (state & URWLOCK_WRITE_WAITERS) == 0) { - oldstate = casuword32(&rwlock->rw_state, state, state | URWLOCK_WRITE_WAITERS); - if (oldstate == -1) { + while (error == 0 && ((state & URWLOCK_WRITE_OWNER) || + URWLOCK_READER_COUNT(state) != 0) && + (state & URWLOCK_WRITE_WAITERS) == 0) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state | URWLOCK_WRITE_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2545,23 
+2636,25 @@ do_rw_wrlock(struct thread *td, struct urwlock *rwlock, struct _umtx_time *timeo break; } if (error != 0) { - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); break; } if (!(state & URWLOCK_WRITE_OWNER) && URWLOCK_READER_COUNT(state) == 0) { - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); error = umtxq_check_susp(td); if (error != 0) break; continue; } sleep: - blocked_writers = fuword32(&rwlock->rw_blocked_writers); + rv = fueword32(&rwlock->rw_blocked_writers, + &blocked_writers); + if (rv == -1) { + umtxq_unbusy_unlocked(&uq->uq_key); + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_writers, blocked_writers+1); while ((state & URWLOCK_WRITE_OWNER) || URWLOCK_READER_COUNT(state) != 0) { @@ -2577,17 +2670,34 @@ sleep: umtxq_unlock(&uq->uq_key); if (error) break; - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + error = EFAULT; + break; + } } - blocked_writers = fuword32(&rwlock->rw_blocked_writers); + rv = fueword32(&rwlock->rw_blocked_writers, + &blocked_writers); + if (rv == -1) { + umtxq_unbusy_unlocked(&uq->uq_key); + error = EFAULT; + break; + } suword32(&rwlock->rw_blocked_writers, blocked_writers-1); if (blocked_writers == 1) { - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + rv = fueword32(__DEVOLATILE(int32_t *, + &rwlock->rw_state), &state); + if (rv == -1) { + umtxq_unbusy_unlocked(&uq->uq_key); + error = EFAULT; + break; + } for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_WRITE_WAITERS); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_WRITE_WAITERS); + if (rv == -1) { error = EFAULT; break; } @@ -2603,13 +2713,17 @@ sleep: if (error != 0) break; } - blocked_readers = 
fuword32(&rwlock->rw_blocked_readers); + rv = fueword32(&rwlock->rw_blocked_readers, + &blocked_readers); + if (rv == -1) { + umtxq_unbusy_unlocked(&uq->uq_key); + error = EFAULT; + break; + } } else blocked_readers = 0; - umtxq_lock(&uq->uq_key); - umtxq_unbusy(&uq->uq_key); - umtxq_unlock(&uq->uq_key); + umtxq_unbusy_unlocked(&uq->uq_key); } umtx_key_release(&uq->uq_key); @@ -2624,20 +2738,26 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock) struct umtx_q *uq; uint32_t flags; int32_t state, oldstate; - int error, q, count; + int error, rv, q, count; uq = td->td_umtxq; - flags = fuword32(&rwlock->rw_flags); + error = fueword32(&rwlock->rw_flags, &flags); + if (error == -1) + return (EFAULT); error = umtx_key_get(rwlock, TYPE_RWLOCK, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); - state = fuword32(__DEVOLATILE(int32_t *, &rwlock->rw_state)); + error = fueword32(__DEVOLATILE(int32_t *, &rwlock->rw_state), &state); + if (error == -1) { + error = EFAULT; + goto out; + } if (state & URWLOCK_WRITE_OWNER) { for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state & ~URWLOCK_WRITE_OWNER); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state & ~URWLOCK_WRITE_OWNER); + if (rv == -1) { error = EFAULT; goto out; } @@ -2655,9 +2775,9 @@ do_rw_unlock(struct thread *td, struct urwlock *rwlock) } } else if (URWLOCK_READER_COUNT(state) != 0) { for (;;) { - oldstate = casuword32(&rwlock->rw_state, state, - state - 1); - if (oldstate == -1) { + rv = casueword32(&rwlock->rw_state, state, + &oldstate, state - 1); + if (rv == -1) { error = EFAULT; goto out; } @@ -2716,11 +2836,13 @@ do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout) { struct abs_timeout timo; struct umtx_q *uq; - uint32_t flags, count; - int error; + uint32_t flags, count, count1; + int error, rv; uq = td->td_umtxq; - flags = fuword32(&sem->_flags); + error = fueword32(&sem->_flags, &flags); + if (error == -1) + return 
(EFAULT); error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &uq->uq_key); if (error != 0) return (error); @@ -2732,15 +2854,16 @@ do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout) umtxq_busy(&uq->uq_key); umtxq_insert(uq); umtxq_unlock(&uq->uq_key); - casuword32(&sem->_has_waiters, 0, 1); - count = fuword32(__DEVOLATILE(uint32_t *, &sem->_count)); - if (count != 0) { + rv = casueword32(&sem->_has_waiters, 0, &count1, 1); + if (rv == 0) + rv = fueword32(__DEVOLATILE(uint32_t *, &sem->_count), &count); + if (rv == -1 || count != 0) { umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); umtxq_remove(uq); umtxq_unlock(&uq->uq_key); umtx_key_release(&uq->uq_key); - return (0); + return (rv == -1 ? EFAULT : 0); } umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); @@ -2771,7 +2894,9 @@ do_sem_wake(struct thread *td, struct _usem *sem) int error, cnt; uint32_t flags; - flags = fuword32(&sem->_flags); + error = fueword32(&sem->_flags, &flags); + if (error == -1) + return (EFAULT); if ((error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &key)) != 0) return (error); umtxq_lock(&key); @@ -2789,6 +2914,8 @@ do_sem_wake(struct thread *td, struct _usem *sem) error = suword32( __DEVOLATILE(uint32_t *, &sem->_has_waiters), 0); umtxq_lock(&key); + if (error == -1) + error = EFAULT; } } umtxq_unbusy(&key); @@ -2804,7 +2931,7 @@ do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout) struct abs_timeout timo; struct umtx_q *uq; uint32_t count, flags; - int error; + int error, rv; uq = td->td_umtxq; flags = fuword32(&sem->_flags); @@ -2819,8 +2946,8 @@ do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout) umtxq_busy(&uq->uq_key); umtxq_insert(uq); umtxq_unlock(&uq->uq_key); - count = fuword32(__DEVOLATILE(uint32_t *, &sem->_count)); - if (count == -1) { + rv = fueword32(__DEVOLATILE(uint32_t *, &sem->_count), &count); + if (rv == -1) { umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); 
umtxq_remove(uq); @@ -2839,8 +2966,8 @@ do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout) } if (count == USEM_HAS_WAITERS) break; - count = casuword32(&sem->_count, 0, USEM_HAS_WAITERS); - if (count == -1) { + rv = casueword32(&sem->_count, 0, &count, USEM_HAS_WAITERS); + if (rv == -1) { umtxq_lock(&uq->uq_key); umtxq_unbusy(&uq->uq_key); umtxq_remove(uq); @@ -2877,10 +3004,12 @@ static int do_sem2_wake(struct thread *td, struct _usem2 *sem) { struct umtx_key key; - int error, cnt; + int error, cnt, rv; uint32_t count, flags; - flags = fuword32(&sem->_flags); + rv = fueword32(&sem->_flags, &flags); + if (rv == -1) + return (EFAULT); if ((error = umtx_key_get(sem, TYPE_SEM, GET_SHARE(flags), &key)) != 0) return (error); umtxq_lock(&key); @@ -2895,12 +3024,12 @@ do_sem2_wake(struct thread *td, struct _usem2 *sem) */ if (cnt == 1) { umtxq_unlock(&key); - count = fuword32(__DEVOLATILE(uint32_t *, - &sem->_count)); - while (count != -1 && count & USEM_HAS_WAITERS) - count = casuword32(&sem->_count, count, + rv = fueword32(__DEVOLATILE(uint32_t *, &sem->_count), + &count); + while (rv != -1 && count & USEM_HAS_WAITERS) + rv = casueword32(&sem->_count, count, &count, count & ~USEM_HAS_WAITERS); - if (count == -1) + if (rv == -1) error = EFAULT; umtxq_lock(&key); } diff --git a/sys/kern/subr_uio.c b/sys/kern/subr_uio.c index f2e6e32..f2bbb0c 100644 --- a/sys/kern/subr_uio.c +++ b/sys/kern/subr_uio.c @@ -7,6 +7,11 @@ * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * + * Copyright (c) 2014 The FreeBSD Foundation + * + * Portions of this software were developed by Konstantin Belousov + * under sponsorship from the FreeBSD Foundation. 
+ * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: @@ -438,3 +443,128 @@ copyout_unmap(struct thread *td, vm_offset_t addr, size_t sz) return (0); } + +#ifdef NO_FUEWORD +/* + * XXXKIB The temporary implementation of fue*() functions which do not + * handle usermode -1 properly, mixing it with the fault code. Keep + * this until MD code is written. Currently sparc64, mips and arm do + * not have a proper implementation. + */ + +int +fueword(const void *base, long *val) +{ + long res; + + res = fuword(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +int +fueword32(const void *base, int32_t *val) +{ + int32_t res; + + res = fuword32(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} + +#ifdef _LP64 +int +fueword64(const void *base, int64_t *val) +{ + int64_t res; + + res = fuword64(base); + if (res == -1) + return (-1); + *val = res; + return (0); +} +#endif + +int +casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp, + uint32_t newval) +{ + int32_t ov; + + ov = casuword32(base, oldval, newval); + if (ov == -1) + return (-1); + *oldvalp = ov; + return (0); +} + +int +casueword(volatile u_long *p, u_long oldval, u_long *oldvalp, u_long newval) +{ + u_long ov; + + ov = casuword(p, oldval, newval); + if (ov == -1) + return (-1); + *oldvalp = ov; + return (0); +} +#else /* NO_FUEWORD */ +int32_t +fuword32(const void *addr) +{ + int rv; + int32_t val; + + rv = fueword32(addr, &val); + return (rv == -1 ? -1 : val); +} + +#ifdef _LP64 +int64_t +fuword64(const void *addr) +{ + int rv; + int64_t val; + + rv = fueword64(addr, &val); + return (rv == -1 ? -1 : val); +} +#endif /* _LP64 */ + +long +fuword(const void *addr) +{ + long val; + int rv; + + rv = fueword(addr, &val); + return (rv == -1 ? 
-1 : val); +} + +uint32_t +casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) +{ + int rv; + uint32_t val; + + rv = casueword32(addr, old, &val, new); + return (rv == -1 ? -1 : val); +} + +u_long +casuword(volatile u_long *addr, u_long old, u_long new) +{ + int rv; + u_long val; + + rv = casueword(addr, old, &val, new); + return (rv == -1 ? -1 : val); +} + +#endif /* NO_FUEWORD */ diff --git a/sys/kern/vfs_acl.c b/sys/kern/vfs_acl.c index 93626fb..e9361e5 100644 --- a/sys/kern/vfs_acl.c +++ b/sys/kern/vfs_acl.c @@ -148,6 +148,7 @@ acl_copyin(void *user_acl, struct acl *kernel_acl, acl_type_t type) static int acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type) { + uint32_t am; int error; struct oldacl old; @@ -162,8 +163,11 @@ acl_copyout(struct acl *kernel_acl, void *user_acl, acl_type_t type) break; default: - if (fuword32((char *)user_acl + - offsetof(struct acl, acl_maxcnt)) != ACL_MAX_ENTRIES) + error = fueword32((char *)user_acl + + offsetof(struct acl, acl_maxcnt), &am); + if (error == -1) + return (EFAULT); + if (am != ACL_MAX_ENTRIES) return (EINVAL); error = copyout(kernel_acl, user_acl, sizeof(*kernel_acl)); diff --git a/sys/mips/include/param.h b/sys/mips/include/param.h index 2d1d7f1..90f3e6f 100644 --- a/sys/mips/include/param.h +++ b/sys/mips/include/param.h @@ -178,4 +178,8 @@ #define pgtok(x) ((x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_MIPS_INCLUDE_PARAM_H_ */ diff --git a/sys/net/if_spppsubr.c b/sys/net/if_spppsubr.c index 9dc55c5..c0f8e39 100644 --- a/sys/net/if_spppsubr.c +++ b/sys/net/if_spppsubr.c @@ -5060,7 +5060,8 @@ sppp_params(struct sppp *sp, u_long cmd, void *data) * Check the cmd word first before attempting to fetch all the * data. 
*/ - if ((subcmd = fuword(ifr->ifr_data)) == -1) { + rv = fueword(ifr->ifr_data, &subcmd); + if (rv == -1) { rv = EFAULT; goto quit; } diff --git a/sys/powerpc/powerpc/copyinout.c b/sys/powerpc/powerpc/copyinout.c index dcfab80..a337c8b 100644 --- a/sys/powerpc/powerpc/copyinout.c +++ b/sys/powerpc/powerpc/copyinout.c @@ -405,14 +405,13 @@ fubyte(const void *addr) return (val); } -#ifdef __powerpc64__ -int32_t -fuword32(const void *addr) +int +fuword16(const void *addr) { struct thread *td; pmap_t pm; faultbuf env; - int32_t *p, val; + uint16_t *p, val; td = curthread; pm = &td->td_proc->p_vmspace->vm_pmap; @@ -432,15 +431,14 @@ fuword32(const void *addr) td->td_pcb->pcb_onfault = NULL; return (val); } -#endif -long -fuword(const void *addr) +int +fueword32(const void *addr, int32_t *val) { struct thread *td; pmap_t pm; faultbuf env; - long *p, val; + int32_t *p; td = curthread; pm = &td->td_proc->p_vmspace->vm_pmap; @@ -455,22 +453,71 @@ fuword(const void *addr) return (-1); } - val = *p; + *val = *p; td->td_pcb->pcb_onfault = NULL; - return (val); + return (0); } -#ifndef __powerpc64__ -int32_t -fuword32(const void *addr) +#ifdef __powerpc64__ +int +fueword64(const void *addr, int64_t *val) { - return ((int32_t)fuword(addr)); + struct thread *td; + pmap_t pm; + faultbuf env; + int64_t *p; + + td = curthread; + pm = &td->td_proc->p_vmspace->vm_pmap; + + if (setfault(env)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + if (map_user_ptr(pm, addr, (void **)&p, sizeof(*p), NULL)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + *val = *p; + + td->td_pcb->pcb_onfault = NULL; + return (0); } #endif -uint32_t -casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) +int +fueword(const void *addr, long *val) +{ + struct thread *td; + pmap_t pm; + faultbuf env; + long *p; + + td = curthread; + pm = &td->td_proc->p_vmspace->vm_pmap; + + if (setfault(env)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + if (map_user_ptr(pm, addr, 
(void **)&p, sizeof(*p), NULL)) { + td->td_pcb->pcb_onfault = NULL; + return (-1); + } + + *val = *p; + + td->td_pcb->pcb_onfault = NULL; + return (0); +} + +int +casueword32(volatile uint32_t *addr, uint32_t old, uint32_t *oldvalp, + uint32_t new) { struct thread *td; pmap_t pm; @@ -507,18 +554,21 @@ casuword32(volatile uint32_t *addr, uint32_t old, uint32_t new) td->td_pcb->pcb_onfault = NULL; - return (val); + *oldvalp = val; + return (0); } #ifndef __powerpc64__ -u_long -casuword(volatile u_long *addr, u_long old, u_long new) +int +casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new) { - return (casuword32((volatile uint32_t *)addr, old, new)); + + return (casueword32((volatile uint32_t *)addr, old, + (uint32_t *)oldvalp, new)); } #else -u_long -casuword(volatile u_long *addr, u_long old, u_long new) +int +casueword(volatile u_long *addr, u_long old, u_long *oldvalp, u_long new) { struct thread *td; pmap_t pm; @@ -555,7 +605,7 @@ casuword(volatile u_long *addr, u_long old, u_long new) td->td_pcb->pcb_onfault = NULL; - return (val); + *oldvalp = val; + return (0); } #endif - diff --git a/sys/sparc64/include/param.h b/sys/sparc64/include/param.h index e59f2c4..46bacae 100644 --- a/sys/sparc64/include/param.h +++ b/sys/sparc64/include/param.h @@ -146,4 +146,8 @@ #define pgtok(x) ((unsigned long)(x) * (PAGE_SIZE / 1024)) +#ifdef _KERNEL +#define NO_FUEWORD 1 +#endif + #endif /* !_SPARC64_INCLUDE_PARAM_H_ */ diff --git a/sys/sys/systm.h b/sys/sys/systm.h index f4eae57..6e5ee61 100644 --- a/sys/sys/systm.h +++ b/sys/sys/systm.h @@ -254,16 +254,23 @@ int copyout_nofault(const void * __restrict kaddr, void * __restrict udaddr, int fubyte(const void *base); long fuword(const void *base); -int fuword16(void *base); +int fuword16(const void *base); int32_t fuword32(const void *base); int64_t fuword64(const void *base); +int fueword(const void *base, long *val); +int fueword32(const void *base, int32_t *val); +int fueword64(const void *base, int64_t 
*val); int subyte(void *base, int byte); int suword(void *base, long word); int suword16(void *base, int word); int suword32(void *base, int32_t word); int suword64(void *base, int64_t word); uint32_t casuword32(volatile uint32_t *base, uint32_t oldval, uint32_t newval); -u_long casuword(volatile u_long *p, u_long oldval, u_long newval); +u_long casuword(volatile u_long *p, u_long oldval, u_long newval); +int casueword32(volatile uint32_t *base, uint32_t oldval, uint32_t *oldvalp, + uint32_t newval); +int casueword(volatile u_long *p, u_long oldval, u_long *oldvalp, + u_long newval); void realitexpire(void *); From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 19:27:27 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 212D9123; Mon, 27 Oct 2014 19:27:27 +0000 (UTC) Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com [IPv6:2a00:1450:400c:c05::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 868497DC; Mon, 27 Oct 2014 19:27:26 +0000 (UTC) Received: by mail-wi0-f173.google.com with SMTP id ex7so7372473wid.0 for ; Mon, 27 Oct 2014 12:27:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=yXICYh3kL1CHBrhvQ6Z/yrJkE3HEaaFW7CBNvky6pM0=; b=bi237DWtzoSXkJLCwC0IiVgtjP8lYWewR+jpO9MVDPuTh1f22ssKmuDbDQ9LGEws6R BATpyGhS3K9Z5cmJhuFuSWKsF4rzdjhQoaxKGzigA22BUZZ+zC5EPC4/zXCCGoogqUZu NFiAPoYkQOedDDrrRjrz7DVsFKAYPrkI53lyWUUudTT1RFKm0Ral+Y3TwpK7vHv+Mg1P OA6Tkiw8C/dmoXLrYlV87utdSPezViE0YxjORsTQvm1D3C8pfMd7dx7US6oCCAikjh4X 
wine9HmHFccCL7ifh1v8MEHcAxHK/XTjtKhn73Ir/2ZKkupGiihP4MrDpdMKtzSLiyKG AEfQ== X-Received: by 10.180.20.162 with SMTP id o2mr12790602wie.57.1414438044546; Mon, 27 Oct 2014 12:27:24 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id c5sm16603504wje.30.2014.10.27.12.27.23 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Mon, 27 Oct 2014 12:27:23 -0700 (PDT) Date: Mon, 27 Oct 2014 20:27:21 +0100 From: Mateusz Guzik To: John Baldwin Subject: Re: refcount_release_take_##lock Message-ID: <20141027192721.GA28049@dft-labs.eu> References: <20141025184448.GA19066@dft-labs.eu> <20141025190407.GU82214@funkthat.com> <2629048.tOq3sNXcCP@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <2629048.tOq3sNXcCP@ralph.baldwin.cx> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: John-Mark Gurney , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 19:27:27 -0000 On Mon, Oct 27, 2014 at 11:27:45AM -0400, John Baldwin wrote: > Please keep the refcount_*() prefix so it matches the rest of the API. I > would just declare the functions directly in refcount.h rather than requiring > a macro to be invoked in each C file. We can also just implement the needed > lock types for now instead of all of them. > > You could maybe replace 'take' with 'lock', but either name is fine. > We need sx and rwlocks (and temporarily mutexes, but that is going away in few days). I ran into the following issue: opensolaris code has its own rwlock.h, and their refcount.h eventually includes ours refcount.h (and it has to since e.g. our file.h requires it). I don't know any good solution. 
We could add locking funcs to a separate header (refcount_lock.h?) or use the following hack: diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h index 4611664..ce35131 100644 --- a/sys/sys/refcount.h +++ b/sys/sys/refcount.h @@ -29,15 +29,19 @@ #ifndef __SYS_REFCOUNT_H__ #define __SYS_REFCOUNT_H__ -#include -#include - #ifdef _KERNEL +#include #include +#include +#include +#include #else #define KASSERT(exp, msg) /* */ #endif +#include +#include + static __inline void refcount_init(volatile u_int *count, u_int value) { @@ -64,4 +68,36 @@ refcount_release(volatile u_int *count) return (old == 1); } +#ifdef _KERNEL + +#define REFCOUNT_RELEASE_LOCK_DEFINE(NAME, TYPE, LOCK, UNLOCK) \ +static __inline int \ +refcount_release_lock_##NAME(volatile u_int *count, TYPE *v) \ +{ \ + u_int old; \ + \ + old = *count; \ + if (old > 1 && atomic_cmpset_int(count, old, old - 1)) \ + return (0); \ + LOCK(v); \ + if (refcount_release(count)) \ + return (1); \ + UNLOCK(v); \ + return (0); \ +} + +REFCOUNT_RELEASE_LOCK_DEFINE(sx, struct sx, sx_xlock, sx_xunlock); + +#ifdef _SYS_RWLOCK_H_ +REFCOUNT_RELEASE_LOCK_DEFINE(rwlock, struct rwlock, rw_wlock, rw_wunlock); +#else +/* + * A hack to resolve header conflict with opensolaris which provides its own + * rwlock.h + */ +#define refcount_release_lock_rwlock CTASSERT(0, "not implemented") +#endif /* ! _SYS_RWLOCK_H_ */ + +#endif /* ! _KERNEL */ + #endif /* ! 
__SYS_REFCOUNT_H__ */ -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 22:42:44 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A1BEB160; Mon, 27 Oct 2014 22:42:44 +0000 (UTC) Received: from mail-qg0-x232.google.com (mail-qg0-x232.google.com [IPv6:2607:f8b0:400d:c04::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4A72CF60; Mon, 27 Oct 2014 22:42:44 +0000 (UTC) Received: by mail-qg0-f50.google.com with SMTP id a108so2211113qge.9 for ; Mon, 27 Oct 2014 15:42:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=81sHLuCIEiotPGVwX5CPfWztVOFK7F60wev327HAAJI=; b=LZMdtoqrDn4uhsyw49gOPU8ageb9V8mkCTGsTZISnIjD507arw0iwJX4quUvDD4hn1 A0pvJgIzZArLKhOc2yrgxPFWRcXxzQZkuOHPYdw48F9dh7sOCpWziX2ffNTOjB4ycluK b0zLyjVtbfO4QTGiEBI93hN83HQdWhY92asEKS8f8tWgzb22CXN7zYwfgpHrB0gbEnhq B5vjP8TxY0f/Q4aeIbt/0QEY5mUE+GtZFCiC+cryjLIoLZtEtdwFpQlJ58auDApI9nz/ yMAocPS++k+V9bANXIQSX/kndehR93k9HQN6na6nQ7cP3Ro/C6Xgn6wCcoOgQmK2m/aN 0VSQ== MIME-Version: 1.0 X-Received: by 10.140.44.8 with SMTP id f8mr36255477qga.105.1414449763302; Mon, 27 Oct 2014 15:42:43 -0700 (PDT) Received: by 10.140.23.242 with HTTP; Mon, 27 Oct 2014 15:42:43 -0700 (PDT) In-Reply-To: <544E7376.6040002@rice.edu> References: <5428AF3B.1030906@rice.edu> <54497DC1.5070506@rice.edu> <544DED4C.3010501@rice.edu> <544E7376.6040002@rice.edu> Date: Mon, 27 Oct 2014 23:42:43 +0100 Message-ID: Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE From: Svatopluk Kraus To: Alan Cox Content-Type: text/plain; charset=UTF-8 
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: alc@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 22:42:44 -0000 On Mon, Oct 27, 2014 at 5:31 PM, Alan Cox wrote: > > On 10/27/2014 08:22, Svatopluk Kraus wrote: > > > On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox wrote: >> >> On 10/24/2014 06:33, Svatopluk Kraus wrote: >> >> >> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox wrote: >>> >>> On 10/08/2014 10:38, Svatopluk Kraus wrote: >>> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote: >>> > >>> >> On 09/27/2014 03:51, Svatopluk Kraus wrote: >>> >> >>> >> >>> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote: >>> >> >>> >>> >>> >>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus >>> >>> wrote: >>> >>> >>> >>>> Hi, >>> >>>> >>> >>>> I and Michal are finishing new ARM pmap-v6 code. There is one problem >>> >>>> we've >>> >>>> dealt with somehow, but now we would like to do it better. It's about >>> >>>> physical pages which are allocated before vm subsystem is initialized. >>> >>>> While later on these pages could be found in vm_page_array when >>> >>>> VM_PHYSSEG_DENSE memory model is used, it's not true for >>> >>>> VM_PHYSSEG_SPARSE >>> >>>> memory model. And ARM world uses VM_PHYSSEG_SPARSE model. >>> >>>> >>> >>>> It really would be nice to utilize vm_page_array for such preallocated >>> >>>> physical pages even when VM_PHYSSEG_SPARSE memory model is used. Things >>> >>>> could be much easier then. In our case, it's about pages which are used >>> >>>> for >>> >>>> level 2 page tables. In VM_PHYSSEG_SPARSE model, we have two sets of such >>> >>>> pages. First ones are preallocated and second ones are allocated after vm >>> >>>> subsystem was inited. We must deal with each set differently. So code is >>> >>>> more complex and so is debugging. 
>>> >>>> >>> >>>> Thus we need some method how to say that some part of physical memory >>> >>>> should be included in vm_page_array, but the pages from that region >>> >>>> should >>> >>>> not be put to free list during initialization. We think that such >>> >>>> possibility could be utilized in general. There could be a need for some >>> >>>> physical space which: >>> >>>> >>> >>>> (1) is needed only during boot and later on it can be freed and put to vm >>> >>>> subsystem, >>> >>>> >>> >>>> (2) is needed for something else and vm_page_array code could be used >>> >>>> without some kind of its duplication. >>> >>>> >>> >>>> There is already some code which deals with blacklisted pages in >>> >>>> vm_page.c >>> >>>> file. So the easiest way how to deal with presented situation is to add >>> >>>> some callback to this part of code which will be able to either exclude >>> >>>> whole phys_avail[i], phys_avail[i+1] region or single pages. As the >>> >>>> biggest >>> >>>> phys_avail region is used for vm subsystem allocations, there should be >>> >>>> some more coding. (However, blacklisted pages are not dealt with on that >>> >>>> part of region.) >>> >>>> >>> >>>> We would like to know if there is any objection: >>> >>>> >>> >>>> (1) to deal with presented problem, >>> >>>> (2) to deal with the problem presented way. >>> >>>> Some help is very appreciated. Thanks >>> >>>> >>> >>>> >>> >>> As an experiment, try modifying vm_phys.c to use dump_avail instead of >>> >>> phys_avail when sizing vm_page_array. On amd64, where the same problem >>> >>> exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now, this is >>> >>> probably my preferred solution. The catch being that not all architectures >>> >>> implement dump_avail, but my recollection is that arm does. >>> >>> >>> >> Frankly, I would prefer this too, but there is one big open question: >>> >> >>> >> What is dump_avail for? 
>>> >> >>> >> >>> >> >>> >> dump_avail[] is solving a similar problem in the minidump code, hence, the >>> >> prefix "dump_" in its name. In other words, the minidump code couldn't use >>> >> phys_avail[] either because it didn't describe the full range of physical >>> >> addresses that might be included in a minidump, so dump_avail[] was created. >>> >> >>> >> There is already precedent for what I'm suggesting. dump_avail[] is >>> >> already (ab)used outside of the minidump code on x86 to solve this same >>> >> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c. >>> >> >>> >> >>> >> Using it for vm_page_array initialization and segmentation means that >>> >> phys_avail must be a subset of it. And this must be stated and be visible >>> >> enough. Maybe it should be even checked in code. I like the idea of >>> >> thinking about dump_avail as something what desribes all memory in a >>> >> system, but it's not how dump_avail is defined in archs now. >>> >> >>> >> >>> >> >>> >> When you say "it's not how dump_avail is defined in archs now", I'm not >>> >> sure whether you're talking about the code or the comments. In terms of >>> >> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any >>> >> code that would have to change. In terms of comments, I did a grep looking >>> >> for comments defining what dump_avail[] is, because I couldn't remember >>> >> any. I found one ... on arm. So, I don't think it's a onerous task >>> >> changing the definition of dump_avail[]. :-) >>> >> >>> >> Already, as things stand today with dump_avail[] being used outside of the >>> >> minidump code, one could reasonably argue that it should be renamed to >>> >> something like phys_exists[]. >>> >> >>> >> >>> >> >>> >> I will experiment with it on monday then. However, it's not only about how >>> >> memory segments are created in vm_phys.c, but it's about how vm_page_array >>> >> size is computed in vm_page.c too. 
>>> >> >>> >> >>> >> >>> >> Yes, and there is also a place in vm_reserv.c that needs to change. I've >>> >> attached the patch that I developed and tested a long time ago. It still >>> >> applies cleanly and runs ok on amd64. >>> >> >>> >> >>> >> >>> > >>> > >>> > Well, I've created and tested minimalistic patch which - I hope - is >>> > commitable. It runs ok on pandaboard (arm-v6) and solves presented problem. >>> > I would really appreciate if this will be commited. Thanks. >>> >>> >>> Sorry for the slow reply. I've just been swamped with work lately. I >>> finally had some time to look at this in the last day or so. >>> >>> The first thing that I propose to do is commit the attached patch. This >>> patch changes pmap_init() on amd64, armv6, and i386 so that it no longer >>> consults phys_avail[] to determine the end of memory. Instead, it calls >>> a new function provided by vm_phys.c to obtain the same information from >>> vm_phys_segs[]. >>> >>> With this change, the new variable phys_managed in your patch wouldn't >>> need to be a global. It could be a local variable in vm_page_startup() >>> that we pass as a parameter to vm_phys_init() and vm_reserv_init(). >>> >>> More generally, the long-term vision that I have is that we would stop >>> using phys_avail[] after vm_page_startup() had completed. It would only >>> be used during initialization. After that we would use vm_phys_segs[] >>> and functions provided by vm_phys.c. >> >> >> I understand. The patch and the long-term vision are fine for me. I just was not to bold to pass phys_managed as a parameter to vm_phys_init() and vm_reserv_init(). However, I certainly was thinking about it. While reading comment above vm_phys_get_end(), do we care of if last usable address is 0xFFFFFFFF? >> >> >> >> To date, this hasn't been a problem. However, handling 0xFFFFFFFF is easy. So, the final version of the patch that I committed this weekend does so. >> >> Can you please try the attached patch? 
It replaces phys_avail[] with vm_phys_segs[] in arm's busdma. > > > > It works fine on arm-v6 pandaboard. I have no objection to committing it. However, it's only a 1:1 replacement. > > > > Right now, yes. However, once your patch is committed, it won't be 1:1 anymore, because vm_phys_segs[] will be populated based on dump_avail[] rather than phys_avail[]. > > My interpretation of the affected code is that using the ranges defined by dump_avail[] is actually closer to what this code intended. > True in both cases. As you said, it's closer. > > In fact, I still keep the following pattern in my head: > > present memory in system <=> all RAM and whatsoever > nobounce memory <=> addressable by DMA > > > > In general, I don't see how this can be an attribute of the memory, because it's going to depend on the device. In other words, a given physical address may require bouncing for some device but not all devices. > True again. I was thinking about it as some common property shared by all DMA devices on a platform. If it's not that, but a test for present RAM, then dump_avail[] is closer. However, again, does dump_avail[] represent all present RAM? > > > managed memory by vm subsystem <=> i.e. kept in vm_page_array > available memory for vm subsystem <=> can be allocated > > So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it could be too limiting in some scenarios. I would like to see something different in exclusion_bounce_check() in the future. Something that reflects the NOBOUNCE property rather than the NOALLOC one, as it does now. > > >> >> >> >> >> Do you think that the rest of my patch considering changes due to your patch is ok? >> >> >> >> >> Basically, yes. I do, however, think that >> >> +#if defined(__arm__) >> + phys_managed = dump_avail; >> +#else >> + phys_managed = phys_avail; >> +#endif >> >> should also be conditioned on VM_PHYSSEG_SPARSE. > > > > > So I've prepared a new patch. 
phys_managed[] is passed to vm_phys_init() and vm_reserv_init() as a parameter, and a small optimization is made in vm_page_startup(). I added the VM_PHYSSEG_SPARSE condition at the place you mentioned. Anyhow, I still think that this is only a temporary hack. In general, phys_managed[] should always be distinguished from phys_avail[]. > > >> >> >>> >>> > >>> > BTW, while I was inspecting all archs, I think that maybe it's time to do >>> > what was done for busdma not long ago. There is a lot of similar code across >>> > archs which deals with physical memory and could be generalized and put to >>> > kern/subr_physmem.c for utilization. All work with physical memory could be >>> > simplified to two arrays of regions. >>> > >>> > phys_present[] ... describes all present physical memory regions >>> > phys_exclude[] ... describes various exclusions from phys_present[] >>> > >>> > Each excluded region will be labeled by flags to say what kind of exclusion >>> > it is. The flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, NOMEMRW could >>> > be combined. This idea is taken from sys/arm/arm/physmem.c. >>> > >>> > All other arrays like phys_managed[], phys_avail[], dump_avail[] will be >>> > created from these phys_present[] and phys_exclude[]. >>> > This way bootstrap code in archs could be simplified and unified. For >>> > example, dealing with either hw.physmem or page with PA 0x00000000 could be >>> > transparent. >>> > >>> > I'm prepared to volunteer if the thing is ripe. However, some tutor will be >>> > looked for. >>> >>> >>> I've never really looked at arm/arm/physmem.c before. Let me do that >>> before I comment on this. >>> >> No problem. This could be a long-term aim. However, I hope the VM_PHYSSEG_SPARSE problem could be dealt with in MI code at the present time. In any case, thanks for your help. 
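
[Illustration, not part of the original mail.] The phys_present[] /
phys_exclude[] proposal quoted in this message could be sketched roughly
as follows. This is only a minimal sketch under stated assumptions: the
structure, the PHYS_EX_* flag values and the function name
phys_build_list() are hypothetical, are not taken from
sys/arm/arm/physmem.c or from any posted patch, and the exclusion array
is assumed to be sorted by start address and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; only the names echo the ones in the mail. */
#define PHYS_EX_NODUMP   0x01u
#define PHYS_EX_NOALLOC  0x02u
#define PHYS_EX_NOMANAGE 0x04u

/* Half-open physical range [start, end); flags used for exclusions only. */
struct phys_region {
	uint64_t start;
	uint64_t end;
	uint32_t flags;
};

/* Append one start/end pair, keeping two slots free for the terminator. */
static int
phys_emit(uint64_t *list, size_t *n, size_t max, uint64_t s, uint64_t e)
{
	if (*n + 4 > max)
		return (0);
	list[(*n)++] = s;
	list[(*n)++] = e;
	return (1);
}

/*
 * Build a phys_avail[]-style list (start/end pairs, terminated by a
 * 0/0 pair) from the present regions minus every exclusion whose flags
 * intersect 'mask'.  Assumes max >= 2 and that the exclusions are
 * sorted by start and do not overlap.  Returns the number of slots
 * used, not counting the terminator.
 */
static size_t
phys_build_list(const struct phys_region *present, size_t npresent,
    const struct phys_region *exclude, size_t nexclude, uint32_t mask,
    uint64_t *list, size_t max)
{
	size_t i, j, n = 0;
	uint64_t cur, end;

	for (i = 0; i < npresent; i++) {
		cur = present[i].start;
		end = present[i].end;
		for (j = 0; j < nexclude && cur < end; j++) {
			/* Skip exclusions of another kind or out of range. */
			if ((exclude[j].flags & mask) == 0 ||
			    exclude[j].end <= cur || exclude[j].start >= end)
				continue;
			/* Keep the part in front of the excluded range. */
			if (exclude[j].start > cur &&
			    !phys_emit(list, &n, max, cur, exclude[j].start))
				goto done;
			cur = exclude[j].end;
		}
		/* Keep whatever remains after the last exclusion. */
		if (cur < end && !phys_emit(list, &n, max, cur, end))
			goto done;
	}
done:
	list[n] = list[n + 1] = 0;
	return (n);
}
```

With such a helper, a phys_avail[]-like list would be built by passing
PHYS_EX_NOALLOC as the mask and a dump_avail[]-like list by passing
PHYS_EX_NODUMP, both from the same two input arrays. Whether the real
code should work this way is of course up to the discussion in this
thread.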
>> >> >> >> > > From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 22:49:07 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2C8FC45C; Mon, 27 Oct 2014 22:49:07 +0000 (UTC) Received: from mail-wi0-x233.google.com (mail-wi0-x233.google.com [IPv6:2a00:1450:400c:c05::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9586DFA2; Mon, 27 Oct 2014 22:49:06 +0000 (UTC) Received: by mail-wi0-f179.google.com with SMTP id h11so5822112wiw.12 for ; Mon, 27 Oct 2014 15:49:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=c8u/AtI7w58RsWcr9Oj+CGdSjTpVcz3VkhOWG2wJlt0=; b=jKDs6FvPZ9kRQDcBXDlNu2pjR6jMX2uyvPzJsUy/DRI1w0OdgxrzG56wKwU11GLriz LkfbkqkzND6BjAZIh96eVgDzUGNlQqnzVnqNm3PdHAIxljIaLnwQ/63W/8pEjKvXJNti DNwPNGng/qOP41zHHXlgVIIUnXgCpWpkOolIyRdwd8YuGKCDEXbv5Kl+08J74kw5FCYl uuIt1OUeA8oWguFAmz62uJTLB8IBvqoeRzcQyxUQg1rH/hZ9URU09kRpUCN8+BGHzMH6 Jq9w7oa13fxp4cZoTz6FgKLLjpR2UipN2zW/XaMRRqDw+9p2vda3SovnhHJtO2F56jqt Y8Lg== X-Received: by 10.180.212.48 with SMTP id nh16mr354104wic.50.1414450144706; Mon, 27 Oct 2014 15:49:04 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id u4sm13361327wiy.9.2014.10.27.15.49.03 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Mon, 27 Oct 2014 15:49:04 -0700 (PDT) Date: Mon, 27 Oct 2014 23:49:01 +0100 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: amd64 modules still use atomics as callable functions Message-ID: <20141027224901.GC28049@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Konstantin Belousov , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 22:49:07 -0000 Turns out several years ago the kernel was modified to provide actual functions for atomic operations and modules are always using them. I propose plugging it on amd64 in head. For stable/10 we can always provide them, but inline in modules by default (testing a KLD_WANT_ATOMIC_FUNC knob?). diff --git a/sys/amd64/amd64/atomic.c b/sys/amd64/amd64/atomic.c deleted file mode 100644 index 1b4ff7e..0000000 --- a/sys/amd64/amd64/atomic.c +++ /dev/null @@ -1,49 +0,0 @@ -/*- - * Copyright (c) 1999 Peter Jeremy - * All rights reserved. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. 
- * - * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF - * SUCH DAMAGE. - */ - -#include -__FBSDID("$FreeBSD$"); - -/* This file creates publically callable functions to perform various - * simple arithmetic on memory which is atomic in the presence of - * interrupts and multiple processors. - */ -#include - -/* Firstly make atomic.h generate prototypes as it will for kernel modules */ -#define KLD_MODULE -#include -#undef _MACHINE_ATOMIC_H_ /* forget we included it */ -#undef KLD_MODULE -#undef ATOMIC_ASM - -/* Make atomic.h generate public functions */ -#define WANT_FUNCTIONS -#define static -#undef __inline -#define __inline - -#include diff --git a/sys/amd64/include/atomic.h b/sys/amd64/include/atomic.h index 9110dc5..e7e1735 100644 --- a/sys/amd64/include/atomic.h +++ b/sys/amd64/include/atomic.h @@ -69,28 +69,7 @@ * The above functions are expanded inline in the statically-linked * kernel. Lock prefixes are generated if an SMP kernel is being * built. - * - * Kernel modules call real functions which are built into the kernel. - * This allows kernel modules to be portable between UP and SMP systems. 
*/ -#if defined(KLD_MODULE) || !defined(__GNUCLIKE_ASM) -#define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) \ -void atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v); \ -void atomic_##NAME##_barr_##TYPE(volatile u_##TYPE *p, u_##TYPE v) - -int atomic_cmpset_int(volatile u_int *dst, u_int expect, u_int src); -int atomic_cmpset_long(volatile u_long *dst, u_long expect, u_long src); -u_int atomic_fetchadd_int(volatile u_int *p, u_int v); -u_long atomic_fetchadd_long(volatile u_long *p, u_long v); -int atomic_testandset_int(volatile u_int *p, u_int v); -int atomic_testandset_long(volatile u_long *p, u_int v); - -#define ATOMIC_LOAD(TYPE, LOP) \ -u_##TYPE atomic_load_acq_##TYPE(volatile u_##TYPE *p) -#define ATOMIC_STORE(TYPE) \ -void atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v) - -#else /* !KLD_MODULE && __GNUCLIKE_ASM */ /* * For userland, always use lock prefixes so that the binaries will run @@ -293,8 +272,6 @@ struct __hack #endif /* _KERNEL && !SMP */ -#endif /* KLD_MODULE || !__GNUCLIKE_ASM */ - ATOMIC_ASM(set, char, "orb %b1,%0", "iq", v); ATOMIC_ASM(clear, char, "andb %b1,%0", "iq", ~v); ATOMIC_ASM(add, char, "addb %b1,%0", "iq", v); diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64 index 9e5a2ed..0749b05 100644 --- a/sys/conf/files.amd64 +++ b/sys/conf/files.amd64 @@ -91,7 +91,6 @@ acpi_wakedata.h optional acpi \ # amd64/amd64/amd64_mem.c optional mem #amd64/amd64/apic_vector.S standard -amd64/amd64/atomic.c standard amd64/amd64/autoconf.c standard amd64/amd64/bios.c standard amd64/amd64/bpf_jit_machdep.c optional bpf_jitter -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 23:14:02 2014 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ADD6FA2C for ; Mon, 27 Oct 2014 23:14:02 +0000 (UTC) Received: from 
h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7FCD0311 for ; Mon, 27 Oct 2014 23:14:02 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9RNE18v076592 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 27 Oct 2014 16:14:01 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9RNE1ZV076591 for arch@FreeBSD.org; Mon, 27 Oct 2014 16:14:01 -0700 (PDT) (envelope-from jmg) Date: Mon, 27 Oct 2014 16:14:01 -0700 From: John-Mark Gurney To: arch@FreeBSD.org Subject: boot man pages installed four times.. Message-ID: <20141027231401.GQ82214@funkthat.com> Mail-Followup-To: arch@FreeBSD.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Mon, 27 Oct 2014 16:14:01 -0700 (PDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 23:14:02 -0000 So, our loader man pages are currently installed four different times during installworld... 
Once each durning sys/boot/userboot/userboot, sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader This is because sys/boot/common/Makefile.inc defines the man pages, and each of these locations include that Makefile... It seems like the logical thing to do is to create a sys/boot/man that only installed man pages... This will partly move us to always installing all man pages on all archs... Comments? -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Mon Oct 27 23:34:34 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B61E1EB6; Mon, 27 Oct 2014 23:34:34 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 84065758; Mon, 27 Oct 2014 23:34:34 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9RNYXVK076794 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 27 Oct 2014 16:34:33 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9RNYXv0076793; Mon, 27 Oct 2014 16:34:33 -0700 (PDT) (envelope-from jmg) Date: Mon, 27 Oct 2014 16:34:33 -0700 From: John-Mark Gurney To: Mateusz Guzik Subject: Re: amd64 modules still use atomics as callable functions Message-ID: <20141027233432.GR82214@funkthat.com> Mail-Followup-To: Mateusz Guzik , freebsd-arch@freebsd.org, Konstantin Belousov , Alan Cox References: <20141027224901.GC28049@dft-labs.eu> Mime-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline In-Reply-To: <20141027224901.GC28049@dft-labs.eu> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Mon, 27 Oct 2014 16:34:33 -0700 (PDT) Cc: Alan Cox , Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Oct 2014 23:34:34 -0000 Mateusz Guzik wrote this message on Mon, Oct 27, 2014 at 23:49 +0100: > Turns out several years ago the kernel was modified to provide actual > functions for atomic operations and modules are always using them. > > I propose plugging it on amd64 in head. It'd be interesting to measure the difference between making the call and the cost of the lock prefix... On modern processors, according to instruction_tables.pdf, the lock prefix costs between 5 and 45 cycles.. It could be more on older processors... Though another reference says that function call overhead is in the 7-9 cycle range, so w/o measuring, I'm not so sure this is a good idea... Originally I was in favor of this, as amd64 systems that aren't SMP aware are getting rarer by the day... But, considering that many locking ops (if contended) will take a lot longer, I'm not so sure that the inline call will save you that much..
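A rough userland sketch of how such a measurement might be taken is given here. It is not the kernel code under discussion: C11 `atomic_fetch_add` stands in for the kernel's atomic ops, the `noinline` attribute (a GCC/Clang extension) stands in for the call a module would make into the kernel, and the names `bench`, `add_via_call` and the three counters are made up for illustration. Absolute numbers from something like this would only be indicative, since loop overhead and an uncontended cache line dominate.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical counters, one per variant being compared. */
static _Atomic uint64_t counter_inline;	/* atomic add, expanded inline */
static _Atomic uint64_t counter_call;	/* atomic add, behind a real call */
static volatile uint64_t counter_plain;	/* plain add, no atomicity at all */

/* noinline forces an actual function call around the atomic add. */
__attribute__((noinline)) static void
add_via_call(_Atomic uint64_t *p, uint64_t v)
{
	atomic_fetch_add(p, v);
}

/* Monotonic clock reading in nanoseconds. */
static double
now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return ((double)ts.tv_sec * 1e9 + (double)ts.tv_nsec);
}

/*
 * Run each variant n times.  On return:
 *   ns[0] = atomic add inlined, ns[1] = atomic add via function call,
 *   ns[2] = non-atomic add inlined (total nanoseconds for n iterations).
 */
static void
bench(uint64_t n, double ns[3])
{
	double t0, t1, t2, t3;
	uint64_t i;

	t0 = now_ns();
	for (i = 0; i < n; i++)
		atomic_fetch_add(&counter_inline, 1);
	t1 = now_ns();
	for (i = 0; i < n; i++)
		add_via_call(&counter_call, 1);
	t2 = now_ns();
	for (i = 0; i < n; i++)
		counter_plain = counter_plain + 1;	/* volatile keeps the loop */
	t3 = now_ns();

	ns[0] = t1 - t0;
	ns[1] = t2 - t1;
	ns[2] = t3 - t2;
}
```

Calling `bench(100000000, ns)` and dividing each entry of `ns` by the iteration count would give a per-operation cost; a contended variant would need several threads hitting the same counter.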
It'd be useful to see a comparison between: LOCK'd inlined, LOCK'd via function call, and non-LOCK'd inlined -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 00:25:21 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1E25AA2B for ; Tue, 28 Oct 2014 00:25:21 +0000 (UTC) Received: from mail-ie0-x231.google.com (mail-ie0-x231.google.com [IPv6:2607:f8b0:4001:c03::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E4D6AC76 for ; Tue, 28 Oct 2014 00:25:20 +0000 (UTC) Received: by mail-ie0-f177.google.com with SMTP id tp5so5506143ieb.36 for ; Mon, 27 Oct 2014 17:25:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=gJ7dZyQEz5+B2Rh5iXPRmvr1nx5Yaq7o8rULuVScrxE=; b=mBXg2SXjDnNj18gGW1yQf1ZOm/uD9b+5dUxQ+dr76FHHXzQdfGLg2Xg5jwN9/N8WsE UBiInq5xdm2I3Ho2W1VOXS2lfis83VODW6Fo7LeP3Pe8nCnNsWgPNjraAfGIrPnFhfHl W25H00pfka0BP+BOUPGScpJpXhid6A6fRUhqe8UGF0Bm6TTz/A4LgbMrPr0tx25eiG6H 7UnbvIDf8MRgpzWRfFaF78x5AtpDzsf6LPfv4tSiUrIWo4M4yJ3X75ts5gC9VJw9L0Og NH3EJpR7KwRaqFW6WAgOFpi6v9CBxI1QHw2hTVjxx6BQMuDD0tq/41BnJwcCZrDcfO6B gzGQ== MIME-Version: 1.0 X-Received: by 10.107.29.209 with SMTP id d200mr6759792iod.57.1414455920206; Mon, 27 Oct 2014 17:25:20 -0700 (PDT) Received: by 10.50.193.135 with HTTP; Mon, 27 Oct 2014 17:25:20 -0700 (PDT) In-Reply-To: <20141027231401.GQ82214@funkthat.com> References: <20141027231401.GQ82214@funkthat.com> Date: Mon, 27 Oct 2014 17:25:20 -0700 Message-ID: Subject: Re: boot man pages installed four times..
From: NGie Cooper To: "freebsd-arch@FreeBSD.org Arch" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 00:25:21 -0000 On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney wrote: > So, our loader man pages are currently installed four different times > during installworld... Once each durning sys/boot/userboot/userboot, > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader > > This is because sys/boot/common/Makefile.inc defines the man pages, and > each of these locations include that Makefile... > > It seems like the logical thing to do is to create a sys/boot/man that > only installed man pages... This will partly move us to always > installing all man pages on all archs... Should this manpages just be installed as part of share/man/man
instead? Cheers! From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 01:23:33 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C202D9E5 for ; Tue, 28 Oct 2014 01:23:33 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7E568376 for ; Tue, 28 Oct 2014 01:23:32 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9S1NV3v077964 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 27 Oct 2014 18:23:31 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9S1NVqo077963; Mon, 27 Oct 2014 18:23:31 -0700 (PDT) (envelope-from jmg) Date: Mon, 27 Oct 2014 18:23:31 -0700 From: John-Mark Gurney To: NGie Cooper Subject: Re: boot man pages installed four times.. Message-ID: <20141028012331.GT82214@funkthat.com> Mail-Followup-To: NGie Cooper , "freebsd-arch@FreeBSD.org Arch" References: <20141027231401.GQ82214@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Mon, 27 Oct 2014 18:23:31 -0700 (PDT) Cc: "freebsd-arch@FreeBSD.org Arch" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 01:23:33 -0000 NGie Cooper wrote this message on Mon, Oct 27, 2014 at 17:25 -0700: > On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney wrote: > > So, our loader man pages are currently installed four different times > > during installworld... Once each durning sys/boot/userboot/userboot, > > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader > > > > This is because sys/boot/common/Makefile.inc defines the man pages, and > > each of these locations include that Makefile... > > > > It seems like the logical thing to do is to create a sys/boot/man that > > only installed man pages... This will partly move us to always > > installing all man pages on all archs... > > Should this manpages just be installed as part of > share/man/man
instead? That would involve moving the man pages from sys/boot into share/man which IMO doesn't make much sense... Yes, they could be installed from where ever we want, but they are usually installed from where they reside.. Looks like only atf is installing from share/man when their pages are located else where... We shouldn't introduce more, and atf should be fixed... and it's only doing it for two man pages... Hmm... atf-test-case.4 seems to be in the wrong section too... section for is for devices and device drivers, but atf-test-case doesn't have any relation to the kernel... It should probably be moved into section 7... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 01:38:44 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9AAB7D8B for ; Tue, 28 Oct 2014 01:38:44 +0000 (UTC) Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com [IPv6:2607:f8b0:4001:c05::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 623C7674 for ; Tue, 28 Oct 2014 01:38:44 +0000 (UTC) Received: by mail-ig0-f171.google.com with SMTP id l13so7520176iga.16 for ; Mon, 27 Oct 2014 18:38:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=9zJD3siXNgT+6dlgmvHbJF8qc/tQvRviFPp+YFrcBgQ=; b=RvicKfs2+EX/+9o00gAQ13SsQKdSGmIy5UtPqu4ZRlJPdn6s2+va63yDm9vA5I+aw1 3H5cC0eNGpQCPT9OOQEjzeqnXPAc+5YuDcUCZ2Y3nI5vaIhGu27uSPaUiTz6a0pXAo+K 8peMj20p2q+ZpQPEUUt6cxijfNeA4B4EmBw+ZAtOz7ZO8mH2qNqp8NIgwhE+Yp0mu2Fa 
UQz2XATCJhUPKfwG8Aes7tUK5o3bgV+IIhkp1aVJoD5SKkdTbvhV+OdN1KAQgvN75lq0 mhIHpgI4hg3fg6ndlI34c2lshcYXW+mxvqKhMHdXxKR24pkvSAw6RqXytPz927py9ivF JPBw== MIME-Version: 1.0 X-Received: by 10.107.18.1 with SMTP id a1mr115739ioj.83.1414460323805; Mon, 27 Oct 2014 18:38:43 -0700 (PDT) Received: by 10.50.193.135 with HTTP; Mon, 27 Oct 2014 18:38:43 -0700 (PDT) In-Reply-To: <20141028012331.GT82214@funkthat.com> References: <20141027231401.GQ82214@funkthat.com> <20141028012331.GT82214@funkthat.com> Date: Mon, 27 Oct 2014 18:38:43 -0700 Message-ID: Subject: Re: boot man pages installed four times.. From: NGie Cooper To: NGie Cooper , "freebsd-arch@FreeBSD.org Arch" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 01:38:44 -0000 On Mon, Oct 27, 2014 at 6:23 PM, John-Mark Gurney wrote: > NGie Cooper wrote this message on Mon, Oct 27, 2014 at 17:25 -0700: >> On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney wrote: >> > So, our loader man pages are currently installed four different times >> > during installworld... Once each durning sys/boot/userboot/userboot, >> > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader >> > >> > This is because sys/boot/common/Makefile.inc defines the man pages, and >> > each of these locations include that Makefile... >> > >> > It seems like the logical thing to do is to create a sys/boot/man that >> > only installed man pages... This will partly move us to always >> > installing all man pages on all archs... >> >> Should this manpages just be installed as part of >> share/man/man
instead? > > That would involve moving the man pages from sys/boot into share/man > which IMO doesn't make much sense... Yes, they could be installed > from where ever we want, but they are usually installed from where > they reside.. > > Looks like only atf is installing from share/man when their pages are > located else where... We shouldn't introduce more, and atf should be > fixed... and it's only doing it for two man pages... > > Hmm... atf-test-case.4 seems to be in the wrong section too... section > for is for devices and device drivers, but atf-test-case doesn't have > any relation to the kernel... It should probably be moved into section > 7... Yes, I thought so too. Please file a bug and CC both jmmv and myself. Thanks! From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 02:21:57 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3CB3D7B3 for ; Tue, 28 Oct 2014 02:21:57 +0000 (UTC) Received: from mail-yh0-f51.google.com (mail-yh0-f51.google.com [209.85.213.51]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id ED558AE5 for ; Tue, 28 Oct 2014 02:21:56 +0000 (UTC) Received: by mail-yh0-f51.google.com with SMTP id c41so2066826yho.10 for ; Mon, 27 Oct 2014 19:21:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:content-type:mime-version:subject:from :in-reply-to:date:cc:message-id:references:to; bh=EApchF/rk92Xl/FWh/6NIcjeVyIlBA9HVcxBjOp6Ytw=; b=XdF5nyzBfzN5qqNbXyXMJA9TRQ9XxoSsQPBq3CFjueUauHF1d9MCEti5NnoeNL5C3/ Mrp9JTH3uNjsQHZ4zPy3ladOYiQ+AVNnRD/hiNcvg7AovzeWLC2liFizr4YwT+v+2FWJ 
TV94d8EuJMnb8plLhCIXnhh+rn7SH5K5MLyfqDEyNc0k10bmH3Gkm6l6vSI6wesOFtZe okg7mJYgYFwempfm9HiuG3tasJiKCFVbN53lYYR12HcG/NzM6ALHi0m5QbuvxzHSU07L SyNZ4nWPmgETkAR2HgZmhPl3fHjChD3GWlaUc5WKuTBU5gne0UczAyHyEwUKpGnfJOGQ dUGw== X-Gm-Message-State: ALoCoQmNQmG1assvDlfXCUVFn3x1TdVKoRk82OaPL4OPrThlMk9hrP7ObEc9o9iDFgezm+gmVqb4 X-Received: by 10.236.47.196 with SMTP id t44mr156111yhb.59.1414462915751; Mon, 27 Oct 2014 19:21:55 -0700 (PDT) Received: from [192.168.0.14] (173-18-133-79.client.mchsi.com. [173.18.133.79]) by mx.google.com with ESMTPSA id c76sm84202yho.12.2014.10.27.19.21.55 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 27 Oct 2014 19:21:55 -0700 (PDT) Sender: Warner Losh Content-Type: multipart/signed; boundary="Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628"; protocol="application/pgp-signature"; micalg=pgp-sha512 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: boot man pages installed four times.. From: Warner Losh In-Reply-To: <20141027231401.GQ82214@funkthat.com> Date: Mon, 27 Oct 2014 21:16:56 -0500 Message-Id: <1EC3043C-72FD-4790-B833-8E89C39B3FB9@bsdimp.com> References: <20141027231401.GQ82214@funkthat.com> To: John-Mark Gurney X-Mailer: Apple Mail (2.1878.6) Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 02:21:57 -0000 --Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 On Oct 27, 2014, at 6:14 PM, John-Mark Gurney wrote: > So, our loader man pages are currently installed four different times > during installworld... 
Once each during sys/boot/userboot/userboot, > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader > > This is because sys/boot/common/Makefile.inc defines the man pages, and > each of these locations include that Makefile... > > It seems like the logical thing to do is to create a sys/boot/man that > only installed man pages... This will partly move us to always > installing all man pages on all archs... > > Comments? We should have a common set installed from a new directory, and if there's a need for variations we should install them from the current locations (and make sure there's a cross ref from the common ones). Warner --Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJUTvyZAAoJEGwc0Sh9sBEAVgkQAJv+JRN2IRli1vP49SehN1SA aIuRReJwZ7La6KghN00m67APJQh7J67s0e0PRKz+gA2G71zneQF7yPU4MHdIeCxk oZGFtsGK2YPvI75xhDJY6iQKALy9eshNS5EXoHAznnf/VOX6eRPBkt/EfhO6eykb zaX2zY6gvVMUddEL7s1CwCvKPkdlSxBrQRh6kfpyVO5SVKgP/f4dzRxuHIoiFhsq chFliavKvUrRF/oFx+XORvkozQntqckn/NdPYsbj3a8DkF/Tl5vq4iW2bpjWRsz3 GiRsFZV3wa1slsjUgImkv4VSoFQGqVq8WRxYYdpUYFTPRlm9c9dHUWISmKSM4IdF d/W3JoK6Jg95SglpzqIxTPXZ3JfYC6QD+zm/QUAs1XbFabe7qAY8TGqtvfSISWxL IrxrDYQ89yTNMrG/P0zeGztmQfzLPXcW1aJlLUGBBcF6jTx5t7FgmH71KrC3u8FU C8cm6mCE1YLDgColUapnBaD/QoQ4vpJuMTAxBnYGELdsDUVAE8PWdf6th6YifE74 bxM+z8dLUU5S1ie3icGjPrep9jNXysNGgmv4aq9OfH4QUcMT2R109fD5yt4M1v3J zfrE2SOZsnk/izwUtLDlrtHHZXVN8IHevHneUE3vriU7+Sasm80I7KHI2SC7y/5K 7bK3DzrW8BmWPSqsVbRi =6jce -----END PGP SIGNATURE----- --Apple-Mail=_D9BD063E-8869-471C-BDF4-29A8B1348628-- From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 02:52:28 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org
[IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A52EEBB1; Tue, 28 Oct 2014 02:52:28 +0000 (UTC) Received: from mail-wg0-x22a.google.com (mail-wg0-x22a.google.com [IPv6:2a00:1450:400c:c00::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C39D4D54; Tue, 28 Oct 2014 02:52:27 +0000 (UTC) Received: by mail-wg0-f42.google.com with SMTP id k14so5028055wgh.13 for ; Mon, 27 Oct 2014 19:52:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=xjJ6BspOw1GD30+/rPxX3KdG4RukQlnlA2cnx0tgkoI=; b=GcltBRlOf1fjMv4YV/+OEDg9P8zVdTizH8ZDcqIGFgAs1QMqo79OFJaGPZdSZTnhjc MytewOutgwS5DGlPcYoAJAC7YkyKWPXLRttXE7JO51vVEjTmTAX/3N7FOHRCzPkvtbxZ gEdxFgaKvMmnzsyTRWgWgkl+2Jb3B+hjAEHuUbRyChGsLJ0hH5ZQA9nGhy3NSxiqEcu4 643hAeQCd9YNXsLXO/T2JPyCBpHtVVAWIvTHIX9ZYGPGMOcyOYRH0pNjB0iFYB0DcHs9 N2lplH6PiVAqj+6osCZUOKAZ0VdTA9vPrMg+K+SGAsqp/w/ruqxswtzSsdCLz83tWtfr MhSg== X-Received: by 10.180.90.65 with SMTP id bu1mr1089117wib.71.1414464746012; Mon, 27 Oct 2014 19:52:26 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id dc8sm593915wib.7.2014.10.27.19.52.24 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Mon, 27 Oct 2014 19:52:25 -0700 (PDT) Date: Tue, 28 Oct 2014 03:52:22 +0100 From: Mateusz Guzik To: freebsd-arch@freebsd.org Subject: atomic ops Message-ID: <20141028025222.GA19223@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Attilio Rao , adrian@freebsd.org, Konstantin Belousov , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 02:52:28 -0000 As was mentioned some time ago, our situation related to atomic ops is not ideal. atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide full memory barriers, which is stronger than needed. Moreover, load is implemented as lock cmpxchg on var address, so it is additionally slower especially when cpus compete. On amd64 it is sufficient to place a compiler barrier in such cases. Next, we lack some atomic ops in the first place. Let's define some useful terms: smp_wmb - no writes can be reordered past this point smp_rmb - no reads can be reordered past this point With this in mind, we lack ops which would guarantee only the following: 1. var = tmp; smp_wmb(); 2. tmp = var; smp_rmb(); 3. smp_rmb(); tmp = var; This matters since what we can use already to emulate this is way heavier than needed on aforementioned amd64 and most likely other archs. It is unclear to me whether it makes sense to alter what atomic_load_acq_* are currently doing. The simplest thing would be to just introduce aforementioned macros. Unfortunately I don't have any ideas for new function names. I was considering stealing consumer/producer wording instead of acq/rel, but that does not help with case 1.
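To make the three cases concrete, here is a minimal sketch written against C11 <stdatomic.h> instead of the kernel's machine/atomic.h. The function names are hypothetical placeholders, not proposed KPI names, and the C11 fences are only an approximation: a release fence also orders earlier loads against later stores, and an acquire fence also orders earlier loads against later stores, so each is somewhat stronger than a pure smp_wmb() or smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared variable standing in for "var" in the cases above. */
static _Atomic int var;

/* Case 1: var = tmp; smp_wmb();  Writes issued later stay after this store. */
static inline void
store_then_wmb(int tmp)
{
	atomic_store_explicit(&var, tmp, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* approximates smp_wmb() */
}

/* Case 2: tmp = var; smp_rmb();  Reads issued later stay after this load. */
static inline int
load_then_rmb(void)
{
	int tmp = atomic_load_explicit(&var, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);	/* approximates smp_rmb() */
	return (tmp);
}

/* Case 3: smp_rmb(); tmp = var;  Reads issued earlier stay before this load. */
static inline int
rmb_then_load(void)
{
	atomic_thread_fence(memory_order_acquire);	/* approximates smp_rmb() */
	return (atomic_load_explicit(&var, memory_order_relaxed));
}
```

With common compilers targeting amd64, both fences used here typically emit no instruction and only constrain compiler reordering, which matches the remark above that a compiler barrier suffices there.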
Also there is no common header for atomic ops. I propose adding sys/atomic.h which includes machine/atomic.h. Then it would provide atomic ops missing from md header implemented using what is already there. For an example where it could be useful see https://svnweb.freebsd.org/base/head/sys/sys/seq.h?view=markup Comments? And yes, I know that: - atomic_load_acq_rmb_int is a terrible name and I'm trying to get rid of it - seq_consistent misses a read memory barrier, but in worst case this will result in spurious ENOTCAPABLE returned. security problem of circumventing capabilities is plugged since seq is properly re-checked before we return -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 13:18:44 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9FDC94C1; Tue, 28 Oct 2014 13:18:44 +0000 (UTC) Received: from mail-wi0-x22f.google.com (mail-wi0-x22f.google.com [IPv6:2a00:1450:400c:c05::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E626BA2D; Tue, 28 Oct 2014 13:18:43 +0000 (UTC) Received: by mail-wi0-f175.google.com with SMTP id h11so7294853wiw.14 for ; Tue, 28 Oct 2014 06:18:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=FeHGoc80ObjiUvi06oKq+pUHv4vHaXaGrEZ7P6f5mec=; b=Xb9ZRDhWusFnS6KOK6De8hUD2/qpu2nu5kWjtchYMLsY0ye/G9QGLbLgZ4deZHRo7T hJ4ku3h9mmTfn35gZ0w+jN/cdBn5AyN6rDVDRhbQxYK2tojzTXP6Jd6JPpqT55WafK3L 5sZD3s3CTwSFXG2lKJE3cV3N+ADZ3kaj2wqvI7Zg/BsCtKVBJ9DEX6LJp8MI0AUUPLWV IsbEWvAn9RYWZXdY5Uy7hHoB60/THTGHKxz2wu7VjA661ICWOjYpGyYKCS3zI+jbvOwz 
fGVwPCfTNFUYITy3EP+pRso3Q5Ptu/M+26TjaSQCU7GW+bhuE1wRqq3iD/PCqwQ0I48+ KLKg== MIME-Version: 1.0 X-Received: by 10.180.10.231 with SMTP id l7mr28262950wib.1.1414502321855; Tue, 28 Oct 2014 06:18:41 -0700 (PDT) Reply-To: attilio@FreeBSD.org Sender: asmrookie@gmail.com Received: by 10.217.69.73 with HTTP; Tue, 28 Oct 2014 06:18:41 -0700 (PDT) In-Reply-To: <20141028025222.GA19223@dft-labs.eu> References: <20141028025222.GA19223@dft-labs.eu> Date: Tue, 28 Oct 2014 14:18:41 +0100 X-Google-Sender-Auth: 1ORo-3u8UGc8pxN1KyKytrYDHKI Message-ID: Subject: Re: atomic ops From: Attilio Rao To: Mateusz Guzik Content-Type: text/plain; charset=UTF-8 Cc: Adrian Chadd , Alan Cox , Konstantin Belousov , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 13:18:44 -0000 On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik wrote: > As was mentioned some time ago, our situation related to atomic ops is > not ideal. > > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide > full memory barriers, which is stronger than needed. > > Moreover, load is implemented as lock cmpxchg on var address, so it is > additionally slower especially when cpus compete. I already explained this once privately: full memory barriers are not stronger than needed. FreeBSD has different semantics than Linux. We have historically enforced a full barrier on _acq() and _rel() rather than just a read and write barrier, hence we need a different implementation than Linux. There is code that relies on this property, like the locking primitives (release a mutex, for instance). In short: optimizing the implementation for performance is fine and due. Changing the semantics is not fine, unless you have reviewed and fixed all the uses of _rel() and _acq().
> On amd64 it is sufficient to place a compiler barrier in such cases. > > Next, we lack some atomic ops in the first place. > > Let's define some useful terms: > smp_wmb - no writes can be reordered past this point > smp_rmb - no reads can be reordered past this point > > With this in mind, we lack ops which would guarantee only the following: > > 1. var = tmp; smp_wmb(); > 2. tmp = var; smp_rmb(); > 3. smp_rmb(); tmp = var; > > This matters since what we can use already to emulate this is way > heavier than needed on aforementioned amd64 and most likely other archs. I can see the value of such barriers in case you want to just synchronize operations with regard to reads or writes. I also believe that on the newest Intel processors (for which we should optimize) rmb() and wmb() got significantly faster than mb(). However the most interesting case would be for arm and mips, I assume. That's where you would see a bigger perf difference if you optimize the membar paths. Last time I looked into it, in the FreeBSD kernel the Linux-ish rmb()/wmb()/etc. were used primarily in 3 places: Linux-derived code, handling of 16-bit operands and implementation of "faster" bus barriers. Initially I had thought about just confining the smp_*() in a Linux compat layer and fixing the other 2 in this way: for 16-bit operands just pad to 32 bits, as the C11 standard also does. For the bus barriers, just grow more versions to actually include the rmb()/wmb() scheme within. At this point, I understand we may want to instead support the concept of write-only or read-only barriers. This means that if we want to keep the concept tied to the current _acq()/_rel() scheme we will end up with a KPI explosion. I'm not the one making the call here, but for a faster and more granular approach, possibly we can end up using smp_rmb() and smp_wmb() directly. As I said I'm not the one making the call. Attilio -- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 13:43:04 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9194EDEF; Tue, 28 Oct 2014 13:43:04 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 34110D26; Tue, 28 Oct 2014 13:43:04 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9SDgto5027853 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 28 Oct 2014 15:42:55 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9SDgto5027853 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9SDgtSQ027852; Tue, 28 Oct 2014 15:42:55 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 28 Oct 2014 15:42:54 +0200 From: Konstantin Belousov To: Mateusz Guzik Subject: Re: atomic ops Message-ID: <20141028134254.GD1877@kib.kiev.ua> References: <20141028025222.GA19223@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141028025222.GA19223@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: Attilio Rao , adrian@freebsd.org, Alan Cox , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list 
List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 13:43:04 -0000

On Tue, Oct 28, 2014 at 03:52:22AM +0100, Mateusz Guzik wrote:
> As was mentioned sometime ago, our situation related to atomic ops is
> not ideal.
>
> atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> full memory barriers, which is stronger than needed.
x86 atomic_store_rel() does not establish any cpu barrier, due to the
guarantees the architecture already provides.

>
> Moreover, load is implemented as lock cmpxchg on var address, so it is
> additionally slower especially when cpus compete.
>
> On amd64 it is sufficient to place a compiler barrier in such cases.
>
> Next, we lack some atomic ops in the first place.
>
> Let's define some useful terms:
> smp_wmb - no writes can be reordered past this point
> smp_rmb - no reads can be reordered past this point
>
> With this in mind, we lack ops which would guarantee only the following:
>
> 1. var = tmp; smp_wmb();
> 2. tmp = var; smp_rmb();
> 3. smp_rmb(); tmp = var;
>
> This matters since what we can use already to emulate this is way
> heavier than needed on aforementioned amd64 and most likely other archs.
>
> It is unclear to me whether it makes sense to alter what
> atomic_load_acq_* are currently doing.
I still think that our loads/stores, compared with the classic
definition of the operations, are ordered, i.e. what is called
sequentially consistent in the C standard. I have no idea whether we
want this property, or whether it is really used. kern_intr.c (ab)uses
load in this way.

>
> The simplest thing would be to just introduce aforementioned macros.
>
> Unfortunately I don't have any ideas for new function names.
>
> I was considering stealing consumer/producer wording instead of acq/rel,
> but that does not help with case 1.
>
> Also there is no common header for atomic ops.
> > I propose adding sys/atomic.h which includes machine/atomic.h. Then it > would provide atomic ops missing from md header implemented using what > is already there. > > For an example where it could be useful see > https://svnweb.freebsd.org/base/head/sys/sys/seq.h?view=markup > > Comments? > > And yes, I know that: > - atomic_load_acq_rmb_int is a terrible name and I'm trying to get rid > of it > - seq_consistent misses a read memory barrier, but in worst case this > will result in spurious ENOTCAPABLE returned. security problem of > circumventing capabilities is plugged since seq is properly re-checked > before we return > > -- > Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 14:25:25 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C00D9FA9; Tue, 28 Oct 2014 14:25:25 +0000 (UTC) Received: from nibbler.fubar.geek.nz (nibbler.fubar.geek.nz [199.48.134.198]) by mx1.freebsd.org (Postfix) with ESMTP id A2236252; Tue, 28 Oct 2014 14:25:25 +0000 (UTC) Received: from bender.lan (97e078e7.skybroadband.com [151.224.120.231]) by nibbler.fubar.geek.nz (Postfix) with ESMTPSA id 0E9225C692; Tue, 28 Oct 2014 14:25:16 +0000 (UTC) Date: Tue, 28 Oct 2014 14:25:10 +0000 From: Andrew Turner To: Attilio Rao Subject: Re: atomic ops Message-ID: <20141028142510.10a9d3cb@bender.lan> In-Reply-To: References: <20141028025222.GA19223@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: "freebsd-arch@freebsd.org" , Adrian Chadd , Mateusz Guzik , Konstantin Belousov , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 
Oct 2014 14:25:25 -0000

On Tue, 28 Oct 2014 14:18:41 +0100
Attilio Rao wrote:

> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik
> wrote:
> > As was mentioned sometime ago, our situation related to atomic ops
> > is not ideal.
> >
> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> > full memory barriers, which is stronger than needed.
> >
> > Moreover, load is implemented as lock cmpxchg on var address, so it
> > is additionally slower especially when cpus compete.
>
> I already explained this once privately: fully memory barriers is not
> stronger than needed.
> FreeBSD has a different semantic than Linux. We historically enforce a
> full barrier on _acq() and _rel() rather then just a read and write
> barrier, hence we need a different implementation than Linux.
> There is code that relies on this property, like the locking
> primitives (release a mutex, for instance).

On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
there are only full barriers. On both 32-bit and 64-bit ARMv8, ARM has
added support for load-acquire and store-release atomic instructions.
For the use in atomic instructions we can assume these only operate on
the address passed to them.

It is unlikely we will use them in the 32-bit port; however, I would
like to know the expected semantics of these atomic functions to make
sure we get them correct in the arm64 port. I have been advised by one
of the ARM Linux kernel maintainers on the problems they have found
using these instructions but have yet to determine what our atomic
functions guarantee.
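As a rough illustration of the question being asked, a read-modify-write with acquire ordering can be sketched in portable C11 as a retry loop. `toy_subtract_acq` is a hypothetical name and this is not the atomic(9) implementation; on ARMv8 a compiler may lower such a loop to a load-acquire-exclusive / store-exclusive pair, but that mapping is an assumption about code generation, not something stated in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical helper: subtract x from *p with acquire ordering, written
 * as a load/modify/store retry loop. Returns the new value.
 */
static unsigned int
toy_subtract_acq(atomic_uint *p, unsigned int x)
{
	unsigned int old;

	assert(p != NULL);
	old = atomic_load_explicit(p, memory_order_relaxed);
	/*
	 * If another thread changed *p in the meantime the exchange fails,
	 * 'old' is refreshed with the current value, and we try again.
	 */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old - x,
	    memory_order_acquire, memory_order_relaxed))
		;
	return (old - x);
}
```

The acquire ordering here is attached to the successful exchange only; what the FreeBSD function is expected to guarantee beyond that is exactly the open question.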
Andrew From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 14:33:09 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2DD312B8; Tue, 28 Oct 2014 14:33:09 +0000 (UTC) Received: from mail-wi0-x231.google.com (mail-wi0-x231.google.com [IPv6:2a00:1450:400c:c05::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6F5C136A; Tue, 28 Oct 2014 14:33:08 +0000 (UTC) Received: by mail-wi0-f177.google.com with SMTP id ex7so1786198wid.10 for ; Tue, 28 Oct 2014 07:33:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=jzJCJ/x9TPlske+Ae9sJ+KKAiDx64WR+YwmL3PKINic=; b=T6fDJEjC5VO+EA+Bi+saaE6lDDC+1OQsIjywQtKA+wok5ZnyvQyCMCj5SJ+tCUWyle NbfieXvHbl9F5pT6w1LmXsuCWpXeLv0vVtpv16jCNyZhtcWJ1ybr5513H+6QqROxi0YH LD5UyCjouWbMqTWdqkF8vzjW74g/pkzECk2wLO3PZbUkgyrDiw9V2eDLcJ54PPJwhUs0 mHoBNE1ishFSVkz7nF/BOeqb/iUNEWl3oHD9Idayn9sk+5IY4HTe6K6TbZ4jZD6UNo1C Q7Vcac9bKQXc7+jgGBkamr+JsIR6iR8mBZZ/Mn0gK99VeCGsY47ozaI+i2myxsv2xYLi TWsA== MIME-Version: 1.0 X-Received: by 10.180.83.37 with SMTP id n5mr28839571wiy.7.1414506786594; Tue, 28 Oct 2014 07:33:06 -0700 (PDT) Reply-To: attilio@FreeBSD.org Sender: asmrookie@gmail.com Received: by 10.217.69.73 with HTTP; Tue, 28 Oct 2014 07:33:06 -0700 (PDT) In-Reply-To: <20141028142510.10a9d3cb@bender.lan> References: <20141028025222.GA19223@dft-labs.eu> <20141028142510.10a9d3cb@bender.lan> Date: Tue, 28 Oct 2014 15:33:06 +0100 X-Google-Sender-Auth: ElSPvKB72y9f1cRQFz2uCY0dy7U Message-ID: Subject: Re: atomic ops From: Attilio Rao To: Andrew Turner Content-Type: 
text/plain; charset=UTF-8 Cc: "freebsd-arch@freebsd.org" , Adrian Chadd , Mateusz Guzik , Konstantin Belousov , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 14:33:09 -0000 On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner wrote: > On Tue, 28 Oct 2014 14:18:41 +0100 > Attilio Rao wrote: > >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik >> wrote: >> > As was mentioned sometime ago, our situation related to atomic ops >> > is not ideal. >> > >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide >> > full memory barriers, which is stronger than needed. >> > >> > Moreover, load is implemented as lock cmpchg on var address, so it >> > is addditionally slower especially when cpus compete. >> >> I already explained this once privately: fully memory barriers is not >> stronger than needed. >> FreeBSD has a different semantic than Linux. We historically enforce a >> full barrier on _acq() and _rel() rather then just a read and write >> barrier, hence we need a different implementation than Linux. >> There is code that relies on this property, like the locking >> primitives (release a mutex, for instance). > > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support) > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has added > support for load-acquire and store-release atomic instructions. For the > use in atomic instructions we can assume these only operate of the > address passed to them. > > It is unlikely we will use them in the 32-bit port however I would like > to know the expected semantics of these atomic functions to make sure > we get them correct in the arm64 port. 
I have been advised by one of
> the ARM Linux kernel maintainers on the problems they have found using
> these instructions but have yet to determine what our atomic functions
> guarantee.

For FreeBSD the "reference doc" is atomic(9).
It clearly states:

     The second variant of each operation includes a read memory barrier.
     This barrier ensures that the effects of this operation are completed
     before the effects of any later data accesses.  As a result, the
     operation is said to have acquire semantics as it acquires a
     pseudo-lock requiring further operations to wait until it has
     completed.  To denote this, the suffix ``_acq'' is inserted into the
     function name immediately prior to the ``_'' suffix.  For example, to
     subtract two integers ensuring that any later writes will happen after
     the subtraction is performed, use atomic_subtract_acq_int().

     The third variant of each operation includes a write memory barrier.
     This ensures that all effects of all previous data accesses are
     completed before this operation takes place.  As a result, the
     operation is said to have release semantics as it releases any pending
     data accesses to be completed before its operation is performed.  To
     denote this, the suffix ``_rel'' is inserted into the function name
     immediately prior to the ``_'' suffix.  For example, to add two long
     integers ensuring that all previous writes will happen first, use
     atomic_add_rel_long().

The bottom line of all this is that a read memory barrier ensures that
the effects of the operation you are performing (the load in the case
of atomic_load_acq_int(), for example) are completed before any later
data accesses. "Data accesses" covers *all* operations, including
reads, writes, etc. This is very different from what Linux assumes for
its rmb() barrier, for example, which just orders loads. So for
FreeBSD there is no _acq -> rmb() analogy and there is no _rel ->
wmb() analogy.

This must be kept well in mind when trying to optimize the atomic_*()
operations.
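A small sketch may help show what "later data accesses" means in practice. It uses C11 atomics as a userland analogy; the `toy_*` names, `payload` and `ready` are hypothetical, and this is not the kernel's atomic(9) code. The acquire load keeps the later read of `payload` from moving before the flag check, and the release store keeps the earlier write of `payload` from moving after the flag update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;		/* plain data being published */
static atomic_bool ready;	/* flag guarding the payload */

/* Producer side: write the data, then set the flag with release order. */
static void
toy_publish(int value)
{
	payload = value;
	/* Release: the payload write above cannot move below this store. */
	atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: returns true and fills *out once data was published. */
static bool
toy_consume(int *out)
{
	assert(out != NULL);
	/* Acquire: the payload read below cannot move above this load. */
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return (false);
	*out = payload;
	return (true);
}
```

Note that the ordering applies to both the read and the write of `payload`, which is the point made above: the barrier is not limited to one kind of access.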
Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 16:21:06 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 12C0632B for ; Tue, 28 Oct 2014 16:21:06 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DFBC220E for ; Tue, 28 Oct 2014 16:21:05 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 32B78B980; Tue, 28 Oct 2014 12:21:04 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Subject: Re: RfC: fueword(9) and casueword(9) Date: Tue, 28 Oct 2014 11:46:49 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141021094539.GA1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua> In-Reply-To: <20141027165557.GC1877@kib.kiev.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201410281146.49370.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 28 Oct 2014 12:21:04 -0400 (EDT) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 16:21:06 -0000 On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote: > On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote: > > On Tuesday, October 21, 2014 07:23:06 PM 
Konstantin Belousov wrote: > > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote: > > > > A new API should try to fix these __DEVOLATILE() abominations. I think it > > > > is safe, and even correct, to declare the pointers as volatile const void > > > > *, since the functions really can handle volatile data, unlike copyin(). > > > > > > > > Atomic op functions are declared as taking pointers to volatile for > > > > similar reasons. Often they are applied to non-volatile data, but > > > > adding a qualifier is type-safe and doesn't cost efficiency since the > > > > pointer access is is not known to the compiler. (The last point is not > > > > so clear -- the compiler can see things in the functions since they are > > > > inline asm. fueword() isn't inline so its (in)efficiency is not changed.) > > > > > > > > The atomic read functions are not declared as taking pointers to const. > > > > The __DECONST() abomination might be used to work around this bug. > > > > > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the > > > umtx structures definitions. I think that it is bug to mark the lock > > > words with volatile. I want the fueword(9) interface to be as much > > > similar to fuword(9), in particular, volatile seems to be not needed. > > > > I agree with Bruce here. casuword() already accepts volatile. I also > > think umtx is correct in marking the field as volatile. They are subject > > to change without the compiler's knowledge albeit by other threads > > rather than signal handlers. Having them marked volatile doesn't really > > matter for the kernel, but the header is also used in userland and is > > relevant in sem_new.c, etc. > > You agree with making fueword() accept volatile const void * as the > address ? Or do you agree with the existence of the volatile type > qualifier for the lock field of umtx structures ? I agree with both (I thought Bruce only asserted the first). 
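The type-safety argument for a `volatile const void *` parameter can be checked with a small standalone sketch. `toy_fetch_long` is a hypothetical stand-in that only mirrors the shape of the proposed prototype; it reads ordinary memory and does not do the user-space access that fueword(9) performs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a fetch routine: returns 0 and stores the
 * word in *val, or -1 for a NULL base.
 */
static int
toy_fetch_long(volatile const void *base, long *val)
{
	assert(val != NULL);
	if (base == NULL)
		return (-1);
	*val = *(volatile const long *)base;
	return (0);
}

static long plain_word = 1;
static volatile long volatile_word = 2;
static const long const_word = 3;

/* Fetch from all three objects and return the sum, or -1 on failure. */
static long
toy_sum_all(void)
{
	long a, b, c;

	/* Each call compiles without any cast to strip qualifiers. */
	if (toy_fetch_long(&plain_word, &a) != 0 ||
	    toy_fetch_long(&volatile_word, &b) != 0 ||
	    toy_fetch_long(&const_word, &c) != 0)
		return (-1);
	return (a + b + c);
}
```

Adding qualifiers in the prototype accepts plain, volatile, and const objects alike, so callers would not need __DEVOLATILE() or __DECONST() style workarounds.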
> I definitely do not want to make fueword() different from fuword() in > this aspect. If changing both fueword() and fuword() to take volatile > const * address, this should be different patch. I also agree that fuword() and fueword() should take identical arguments, so if this change is made it should be a separate patch (and should include suword()). -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 16:21:10 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2A3E73C8 for ; Tue, 28 Oct 2014 16:21:10 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0402F210 for ; Tue, 28 Oct 2014 16:21:10 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id DA997B995; Tue, 28 Oct 2014 12:21:08 -0400 (EDT) From: John Baldwin To: Mateusz Guzik Subject: Re: refcount_release_take_##lock Date: Tue, 28 Oct 2014 11:54:54 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141025184448.GA19066@dft-labs.eu> <2629048.tOq3sNXcCP@ralph.baldwin.cx> <20141027192721.GA28049@dft-labs.eu> In-Reply-To: <20141027192721.GA28049@dft-labs.eu> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201410281154.54581.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 28 Oct 2014 12:21:08 -0400 (EDT) Cc: John-Mark Gurney , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD 
architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 16:21:10 -0000 On Monday, October 27, 2014 3:27:21 pm Mateusz Guzik wrote: > On Mon, Oct 27, 2014 at 11:27:45AM -0400, John Baldwin wrote: > > Please keep the refcount_*() prefix so it matches the rest of the API. I > > would just declare the functions directly in refcount.h rather than requiring > > a macro to be invoked in each C file. We can also just implement the needed > > lock types for now instead of all of them. > > > > You could maybe replace 'take' with 'lock', but either name is fine. > > > > > We need sx and rwlocks (and temporarily mutexes, but that is going away > in few days). Ok. > I ran into the following issue: opensolaris code has its own rwlock.h, > and their refcount.h eventually includes ours refcount.h (and it has to > since e.g. our file.h requires it). > > I don't know any good solution. Ugh. > We could add locking funcs to a separate header (refcount_lock.h?) or use the > following hack: > > +#ifdef _SYS_RWLOCK_H_ > +REFCOUNT_RELEASE_LOCK_DEFINE(rwlock, struct rwlock, rw_wlock, rw_wunlock); > +#else The problem here is that typically refcount.h would be included before rwlock.h (style(9) sorts headers alphabetically). Given that you want to inline this anyway, you could perhaps implement it as a macro instead of an inline function? That would result in it only being parsed when used which would side-step this. It's not really ideal but might be less ugly than the other options. Something like: #define _refcount_release_lock(count, lock, LOCK_OP, UNLOCK_OP) \ ... 
#define refcount_release_lock_mtx(count, lock) \ _refcount_release_lock((count), (lock), mtx_lock, mtx_unlock) -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 17:44:36 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 66CB3262; Tue, 28 Oct 2014 17:44:36 +0000 (UTC) Received: from mail-wi0-x233.google.com (mail-wi0-x233.google.com [IPv6:2a00:1450:400c:c05::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CCAF4D63; Tue, 28 Oct 2014 17:44:35 +0000 (UTC) Received: by mail-wi0-f179.google.com with SMTP id h11so2376202wiw.12 for ; Tue, 28 Oct 2014 10:44:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=BCmsYGm/PuTTSnfkwXGltszK2GwSzK4ABJQUzyTr5gY=; b=ifEYN6Dqz7mg5xt4A1Vys9pqJYcJ5pM+fM3iZ0R7VtR+IO+Yh/WyMtKDetnsK38GXp bTaz0MfD+INVZ2FnCiP4F89pZo0aFL0h63SIbLTcJfNvfj6GGMpLtzLvLja/QO3FSLcj V3J8snEUbKSilkJ13fGOQH3enDW9Bubh2WFuxWKZlm+VaYuRDDn9BBAzVf0AEg5OX6DF vSZ5hMZZejZI80w1N8EQo05T2causCTzX30yblyRLs/2WheYGe0NtugBM8IjxBPosnf/ XMVgvPK0QZ0NpNrQLy22qarSeKIAv6HZ4pUYkNVksr23ALKjWu2SScX/vrZeMVAlIKEW +BDg== X-Received: by 10.180.221.129 with SMTP id qe1mr6701088wic.21.1414518271988; Tue, 28 Oct 2014 10:44:31 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id eu8sm11226564wic.1.2014.10.28.10.44.30 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 28 Oct 2014 10:44:31 -0700 (PDT) Date: Tue, 28 Oct 2014 18:44:28 +0100 From: Mateusz Guzik To: John Baldwin Subject: Re: refcount_release_take_##lock Message-ID: <20141028174428.GA12014@dft-labs.eu> References: <20141025184448.GA19066@dft-labs.eu> <2629048.tOq3sNXcCP@ralph.baldwin.cx> <20141027192721.GA28049@dft-labs.eu> <201410281154.54581.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <201410281154.54581.jhb@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: John-Mark Gurney , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 17:44:36 -0000 On Tue, Oct 28, 2014 at 11:54:54AM -0400, John Baldwin wrote: > On Monday, October 27, 2014 3:27:21 pm Mateusz Guzik wrote: > > On Mon, Oct 27, 2014 at 11:27:45AM -0400, John Baldwin wrote: > > > Please keep the refcount_*() prefix so it matches the rest of the API. I > > > would just declare the functions directly in refcount.h rather than requiring > > > a macro to be invoked in each C file. We can also just implement the needed > > > lock types for now instead of all of them. > > > > > > You could maybe replace 'take' with 'lock', but either name is fine. > > > > > > > > > We need sx and rwlocks (and temporarily mutexes, but that is going away > > in few days). > > Ok. > > > I ran into the following issue: opensolaris code has its own rwlock.h, > > and their refcount.h eventually includes ours refcount.h (and it has to > > since e.g. our file.h requires it). > > > > I don't know any good solution. > > Ugh. > > > We could add locking funcs to a separate header (refcount_lock.h?) 
or use the > > following hack: > > > > +#ifdef _SYS_RWLOCK_H_ > > +REFCOUNT_RELEASE_LOCK_DEFINE(rwlock, struct rwlock, rw_wlock, rw_wunlock); > > +#else > > The problem here is that typically refcount.h would be included before rwlock.h > (style(9) sorts headers alphabetically). > > Given that you want to inline this anyway, you could perhaps implement it as > a macro instead of an inline function? That would result in it only being > parsed when used which would side-step this. It's not really ideal but might > be less ugly than the other options. Something like: > > #define _refcount_release_lock(count, lock, LOCK_OP, UNLOCK_OP) \ > ... > > #define refcount_release_lock_mtx(count, lock) \ > _refcount_release_lock((count), (lock), mtx_lock, mtx_unlock) > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c index f8ae0e6..e94ccde 100644 --- a/sys/kern/kern_jail.c +++ b/sys/kern/kern_jail.c @@ -4466,15 +4466,12 @@ prison_racct_free_locked(struct prison_racct *prr) void prison_racct_free(struct prison_racct *prr) { - int old; sx_assert(&allprison_lock, SA_UNLOCKED); - old = prr->prr_refcount; - if (old > 1 && atomic_cmpset_int(&prr->prr_refcount, old, old - 1)) + if (!refcount_release_lock_sx(&prr->prr_refcount, &allprison_lock)) return; - sx_xlock(&allprison_lock); prison_racct_free_locked(prr); sx_xunlock(&allprison_lock); } diff --git a/sys/kern/kern_loginclass.c b/sys/kern/kern_loginclass.c index c0946ef..0771b38 100644 --- a/sys/kern/kern_loginclass.c +++ b/sys/kern/kern_loginclass.c @@ -81,18 +81,10 @@ loginclass_hold(struct loginclass *lc) void loginclass_free(struct loginclass *lc) { - int old; - old = lc->lc_refcount; - if (old > 1 && atomic_cmpset_int(&lc->lc_refcount, old, old - 1)) + if (!refcount_release_lock_rwlock(&lc->lc_refcount, &loginclasses_lock)) return; - rw_wlock(&loginclasses_lock); - if (!refcount_release(&lc->lc_refcount)) { - rw_wunlock(&loginclasses_lock); - return; - } - racct_destroy(&lc->lc_racct); LIST_REMOVE(lc, lc_next); 
rw_wunlock(&loginclasses_lock); diff --git a/sys/kern/kern_resource.c b/sys/kern/kern_resource.c index 037a257..e1d5237 100644 --- a/sys/kern/kern_resource.c +++ b/sys/kern/kern_resource.c @@ -1303,20 +1303,10 @@ uihold(struct uidinfo *uip) void uifree(struct uidinfo *uip) { - int old; - /* Prepare for optimal case. */ - old = uip->ui_ref; - if (old > 1 && atomic_cmpset_int(&uip->ui_ref, old, old - 1)) + if (!refcount_release_lock_rwlock(&uip->ui_ref, &uihashtbl_lock)) return; - /* Prepare for suboptimal case. */ - rw_wlock(&uihashtbl_lock); - if (refcount_release(&uip->ui_ref) == 0) { - rw_wunlock(&uihashtbl_lock); - return; - } - racct_destroy(&uip->ui_racct); LIST_REMOVE(uip, ui_hash); rw_wunlock(&uihashtbl_lock); diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h index 4611664..343da6d 100644 --- a/sys/sys/refcount.h +++ b/sys/sys/refcount.h @@ -64,4 +64,34 @@ refcount_release(volatile u_int *count) return (old == 1); } +#define _refcount_release_lock(count, lock, TYPE, LOCK_OP, UNLOCK_OP) \ +({ \ + TYPE *__lock; \ + volatile u_int *__cp; \ + u_int __old; \ + bool __ret; \ + \ + __lock = (lock); \ + __cp = (count); \ + __old = *__cp; \ + __ret = 0; \ + if (!(__old > 1 && atomic_cmpset_int(__cp, __old, __old - 1))) { \ + LOCK_OP(__lock); \ + if (refcount_release(__cp) == 0) \ + UNLOCK_OP(__lock); \ + else \ + __ret = 1; \ + } \ + __ret; \ +}) + +#define refcount_release_lock_mtx(count, lock) \ + _refcount_release_lock(count, lock, struct mtx, mtx_lock, mtx_unlock) +#define refcount_release_lock_rmlock(count, lock) \ + _refcount_release_lock(count, lock, struct rmlock, rm_wlock, rm_wunlock) +#define refcount_release_lock_rwlock(count, lock) \ + _refcount_release_lock(count, lock, struct rwlock, rw_wlock, rw_wunlock) +#define refcount_release_lock_sx(count, lock) \ + _refcount_release_lock(count, lock, struct sx, sx_xlock, sx_xunlock) + #endif /* ! 
__SYS_REFCOUNT_H__ */ -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 17:53:29 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5A02A638; Tue, 28 Oct 2014 17:53:29 +0000 (UTC) Received: from nibbler.fubar.geek.nz (nibbler.fubar.geek.nz [199.48.134.198]) by mx1.freebsd.org (Postfix) with ESMTP id 2B9A8E55; Tue, 28 Oct 2014 17:53:28 +0000 (UTC) Received: from bender.lan (97e078e7.skybroadband.com [151.224.120.231]) by nibbler.fubar.geek.nz (Postfix) with ESMTPSA id 6EA175CC08; Tue, 28 Oct 2014 17:53:26 +0000 (UTC) Date: Tue, 28 Oct 2014 17:53:18 +0000 From: Andrew Turner To: Attilio Rao Subject: Re: atomic ops Message-ID: <20141028175318.709d2ef6@bender.lan> In-Reply-To: References: <20141028025222.GA19223@dft-labs.eu> <20141028142510.10a9d3cb@bender.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: "freebsd-arch@freebsd.org" , Adrian Chadd , Mateusz Guzik , Konstantin Belousov , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 17:53:29 -0000 On Tue, 28 Oct 2014 15:33:06 +0100 Attilio Rao wrote: > On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner > wrote: > > On Tue, 28 Oct 2014 14:18:41 +0100 > > Attilio Rao wrote: > > > >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik > >> wrote: > >> > As was mentioned sometime ago, our situation related to atomic > >> > ops is not ideal. > >> > > >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) > >> > provide full memory barriers, which is stronger than needed. 
> >> > > >> > Moreover, load is implemented as lock cmpchg on var address, so > >> > it is addditionally slower especially when cpus compete. > >> > >> I already explained this once privately: fully memory barriers is > >> not stronger than needed. > >> FreeBSD has a different semantic than Linux. We historically > >> enforce a full barrier on _acq() and _rel() rather then just a > >> read and write barrier, hence we need a different implementation > >> than Linux. There is code that relies on this property, like the > >> locking primitives (release a mutex, for instance). > > > > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support) > > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has > > added support for load-acquire and store-release atomic > > instructions. For the use in atomic instructions we can assume > > these only operate of the address passed to them. > > > > It is unlikely we will use them in the 32-bit port however I would > > like to know the expected semantics of these atomic functions to > > make sure we get them correct in the arm64 port. I have been > > advised by one of the ARM Linux kernel maintainers on the problems > > they have found using these instructions but have yet to determine > > what our atomic functions guarantee. > > For FreeBSD the "reference doc" is atomic(9). > It clearly states: There may also be a difference between what it states, how they are implemented, and what developers assume they do. I'm trying to make sure I get them correct. > The second variant of each operation includes a read memory barrier. > This barrier ensures that the effects of this operation are completed > before the effects of any later data accesses. As a result, the > opera- tion is said to have acquire semantics as it acquires a > pseudo-lock requiring further operations to wait until it has > completed. To denote this, the suffix ``_acq'' is inserted into the > function name immediately prior to the ``_'' suffix. 
For
> example, to subtract two integers ensuring that any later writes will
> happen after the subtraction is performed, use
> atomic_subtract_acq_int().

It depends on the point at which we guarantee the acquire barrier takes
effect. On ARMv8 the function will be a load/modify/write sequence. If we
use a load-acquire operation for atomic_subtract_acq_int, for example,
for a pointer P and value to subtract X:

loop:
  load-acquire *P to N
  perform N = N - X
  store-exclusive N to *P
  if the store failed goto loop

where N and X are both registers.

This means no access after this loop will happen before it, but later
accesses may happen within it, e.g. if there was a later access A the
following may be possible:

Load P
Access A
Store P

We know the store will eventually happen: if it fails, e.g. because
another processor accessed *P, we iterate over the loop again.

The other point is we can guarantee any store-release, and therefore
any prior access, has happened before a later load-acquire even if it's
on another processor.

...

> The bottom-side of all this is that read memory barriers ensures that
> the effect of the operations you are making (load in case of
> atomic_load_acq_int(), for example) are completed before any later
> data accesses. "Data accesses" qualifies for *all* the operations
> including read, writes, etc. This is very different by what Linux
> assumes for its rmb() barrier, for example which just orders loads. So
> for FreeBSD there is no _acq -> rmb() analogy and there is no _rel ->
> wmb() analogy.

On ARMv8, using the above pseudo-code, later operations will not be
moved before the load-acquire, but they may happen before its store.
Having discussed this with John Baldwin I don't think this is a problem,
because the store operation is allowed to fail if another processor has
written to its memory.

>
> This must be kept well in mind when trying to optimize the atomic_*()
> operations.
At this point I'm more interested in getting them correct as they will be important when I start on SMP support. Andrew From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 18:26:50 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7A4962D8 for ; Tue, 28 Oct 2014 18:26:50 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 52DCA23E for ; Tue, 28 Oct 2014 18:26:50 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 46DCFB980; Tue, 28 Oct 2014 14:26:49 -0400 (EDT) From: John Baldwin To: Mateusz Guzik Subject: Re: refcount_release_take_##lock Date: Tue, 28 Oct 2014 14:13:58 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141025184448.GA19066@dft-labs.eu> <201410281154.54581.jhb@freebsd.org> <20141028174428.GA12014@dft-labs.eu> In-Reply-To: <20141028174428.GA12014@dft-labs.eu> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201410281413.58414.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 28 Oct 2014 14:26:49 -0400 (EDT) Cc: John-Mark Gurney , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 18:26:50 -0000 On Tuesday, October 28, 2014 1:44:28 pm Mateusz Guzik wrote: > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c > 
index f8ae0e6..e94ccde 100644 > --- a/sys/kern/kern_jail.c > +++ b/sys/kern/kern_jail.c The diff looks good to me. Just need to update refcount.9 as well. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 18:26:51 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ED6162DB; Tue, 28 Oct 2014 18:26:51 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B8BA0240; Tue, 28 Oct 2014 18:26:51 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 8ADE7B9B4; Tue, 28 Oct 2014 14:26:50 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: amd64 modules still use atomics as callable functions Date: Tue, 28 Oct 2014 14:18:04 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141027224901.GC28049@dft-labs.eu> In-Reply-To: <20141027224901.GC28049@dft-labs.eu> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201410281418.04704.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 28 Oct 2014 14:26:50 -0400 (EDT) Cc: Mateusz Guzik , Konstantin Belousov , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 18:26:52 -0000 On Monday, October 27, 2014 6:49:01 pm Mateusz Guzik wrote: > Turns out several years ago the 
kernel was modified to provide actual > functions for atomic operations and modules are always using them. > > I propose plugging it on amd64 in head. > > For stable/10 we can always provide them, but inline in modules by default > (testing a KLD_WANT_ATOMIC_FUNC knob?). I think some of the comments might need tweaking still: > diff --git a/sys/amd64/include/atomic.h b/sys/amd64/include/atomic.h > index 9110dc5..e7e1735 100644 > --- a/sys/amd64/include/atomic.h > +++ b/sys/amd64/include/atomic.h > @@ -69,28 +69,7 @@ > * The above functions are expanded inline in the statically-linked > * kernel. Lock prefixes are generated if an SMP kernel is being > * built. > - * > - * Kernel modules call real functions which are built into the kernel. > - * This allows kernel modules to be portable between UP and SMP systems. > */ > -#if defined(KLD_MODULE) || !defined(__GNUCLIKE_ASM) > -#define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) \ > -void atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v); \ > -void atomic_##NAME##_barr_##TYPE(volatile u_##TYPE *p, u_##TYPE v) > - > -int atomic_cmpset_int(volatile u_int *dst, u_int expect, u_int src); > -int atomic_cmpset_long(volatile u_long *dst, u_long expect, u_long src); > -u_int atomic_fetchadd_int(volatile u_int *p, u_int v); > -u_long atomic_fetchadd_long(volatile u_long *p, u_long v); > -int atomic_testandset_int(volatile u_int *p, u_int v); > -int atomic_testandset_long(volatile u_long *p, u_int v); > - > -#define ATOMIC_LOAD(TYPE, LOP) \ > -u_##TYPE atomic_load_acq_##TYPE(volatile u_##TYPE *p) > -#define ATOMIC_STORE(TYPE) \ > -void atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v) > - > -#else /* !KLD_MODULE && __GNUCLIKE_ASM */ > > /* > * For userland, always use lock prefixes so that the binaries will run Like here: maybe "For userland and kernel modules, always use lock prefixes..." Also, this does break the !__GNUCLIKE_ASM case, but I'm not sure if that case actually works anyway. 
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 18:26:53 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 16D45349; Tue, 28 Oct 2014 18:26:53 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D5B63243; Tue, 28 Oct 2014 18:26:52 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id D4051B96E; Tue, 28 Oct 2014 14:26:51 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: boot man pages installed four times.. Date: Tue, 28 Oct 2014 14:19:58 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141027231401.GQ82214@funkthat.com> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201410281419.58068.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 28 Oct 2014 14:26:51 -0400 (EDT) Cc: "freebsd-arch@FreeBSD.org Arch" , NGie Cooper X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 18:26:53 -0000 On Monday, October 27, 2014 8:25:20 pm NGie Cooper wrote: > On Mon, Oct 27, 2014 at 4:14 PM, John-Mark Gurney wrote: > > So, our loader man pages are currently installed four different times > > during installworld... 
Once each durning sys/boot/userboot/userboot, > > sys/boot/amd64/efi, sys/boot/i386/loader and sys/boot/i386/zfsloader > > > > This is because sys/boot/common/Makefile.inc defines the man pages, and > > each of these locations include that Makefile... > > > > It seems like the logical thing to do is to create a sys/boot/man that > > only installed man pages... This will partly move us to always > > installing all man pages on all archs... > > Should this manpages just be installed as part of > share/man/man
instead? Ugh, no. We should keep manpages out of there when possible. E.g. all the pthread manpages should move next to libthr (now that we only have one thread library). I would also like to eventually move kernel manpages into sys (perhaps sys/man, though it would be really nice to put driver manpages into sys/dev/foo if possible). -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 19:34:10 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6F0A7E64; Tue, 28 Oct 2014 19:34:10 +0000 (UTC) Received: from mail-wg0-x229.google.com (mail-wg0-x229.google.com [IPv6:2a00:1450:400c:c00::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CFFA2BFB; Tue, 28 Oct 2014 19:34:09 +0000 (UTC) Received: by mail-wg0-f41.google.com with SMTP id k14so279534wgh.14 for ; Tue, 28 Oct 2014 12:34:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=/aDpkQjplcyzjm2A6G41Q0oy4tZyAMn6R7C9/6NS6lw=; b=ysubuULWHwMjbbZMIvxHCC+s23v3LQcyNfDhfZgTJYXKf6M+T4af0McRqZneNk8tLB Q18OSRN3DXme5c526mpU4rqgP4tg9WWSD9/IjdOISrqiaj17zBkKVlxLq8thESsAc6Ka wZrPmRPLIlK7J4l8M15tJzE3dtKz3jPCwy1r+fyCCypcPHfED9FV+OF146qb1zMn6Oeb SJvkfMPUCvrKQHuGOPVytoESguKqq//099hxGQQ+X1DS1U4EDWdNjnRdHYYQoNF0/oqZ FLwL+rrznf9tZjOBzVYQvdGJo2zx9crgikrxvLI/FeCz0uHkGUtzpNpIxEx+rBymSHaQ 8FXg== X-Received: by 10.194.158.4 with SMTP id wq4mr7192449wjb.58.1414524847957; Tue, 28 Oct 2014 12:34:07 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id dg3sm3224593wib.14.2014.10.28.12.34.06 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 28 Oct 2014 12:34:07 -0700 (PDT) Date: Tue, 28 Oct 2014 20:34:04 +0100 From: Mateusz Guzik To: John Baldwin Subject: Re: refcount_release_take_##lock Message-ID: <20141028193404.GB12014@dft-labs.eu> References: <20141025184448.GA19066@dft-labs.eu> <201410281154.54581.jhb@freebsd.org> <20141028174428.GA12014@dft-labs.eu> <201410281413.58414.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <201410281413.58414.jhb@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: John-Mark Gurney , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 19:34:10 -0000 On Tue, Oct 28, 2014 at 02:13:58PM -0400, John Baldwin wrote: > On Tuesday, October 28, 2014 1:44:28 pm Mateusz Guzik wrote: > > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c > > index f8ae0e6..e94ccde 100644 > > --- a/sys/kern/kern_jail.c > > +++ b/sys/kern/kern_jail.c > > The diff looks good to me. Just need to update refcount.9 as well. 
> diff --git a/share/man/man9/refcount.9 b/share/man/man9/refcount.9 index e7702a2..61b9b51 100644 --- a/share/man/man9/refcount.9 +++ b/share/man/man9/refcount.9 @@ -26,7 +26,7 @@ .\" .\" $FreeBSD$ .\" -.Dd January 20, 2009 +.Dd October 28, 2014 .Dt REFCOUNT 9 .Os .Sh NAME @@ -44,6 +44,15 @@ .Fn refcount_acquire "volatile u_int *count" .Ft int .Fn refcount_release "volatile u_int *count" +.In sys/mutex.h +.Fn refcount_release_lock_mtx "volatile u_int *count, struct mtx *lock" +.In sys/rmlock.h +.Fn refcount_release_lock_rmlock "volatile u_int *count, struct rmlock *lock" +.In sys/rwlock.h +.Fn refcount_release_lock_rwlock "volatile u_int *count, struct rwlock *lock" +.In sys/lock.h +.In sys/sx.h +.Fn refcount_release_lock_sx "volatile u_int *count, struct sx *lock" .Sh DESCRIPTION The .Nm @@ -77,6 +86,13 @@ The function returns a non-zero value if the reference being released was the last reference; otherwise, it returns zero. .Pp +.Fn refcount_release_lock_* +functions release an existing reference holding the lock if it is the last +reference. +These functions return with the lock held and a non-zero value if the reference +being released was the last reference; +otherwise, they returns zero and the lock is not held. +.Pp Note that these routines do not provide any inter-CPU synchronization, data protection, or memory ordering guarantees except for managing the counter. @@ -91,6 +107,18 @@ The .Nm refcount_release function returns non-zero when releasing the last reference and zero when releasing any other reference. +.Pp +.Nm refcount_release_lock_* +functions return with the lock held and non-zero value when releasing the last +reference, zero without the lock held when releasing any other reference. .Sh HISTORY -These functions were introduced in +.Fn refcount_init , +.Fn refcount_acquire +and +.Fn refcount_release +functions were introduced in .Fx 6.0 . +.Pp +.Fn refcount_release_lock_* +functions were introduced in +.Fx 10.2 . 
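The behaviour documented in the man page text above can be sketched in userland C. This is only an illustrative analogue, assuming C11 atomics and a pthread mutex in place of the kernel primitives; the name ref_release_lock is hypothetical and this is not the code from the kern_jail.c diff.

```c
#include <pthread.h>
#include <stdatomic.h>

/*
 * Hypothetical userland analogue of refcount_release_lock_mtx().
 * Returns non-zero with the lock held if the last reference was
 * dropped; returns zero with the lock not held otherwise.
 */
static int
ref_release_lock(atomic_uint *count, pthread_mutex_t *lock)
{
	unsigned int old = atomic_load(count);

	/* Fast path: drop a reference that cannot be the last one. */
	while (old > 1) {
		/* On failure the current value is reloaded into old. */
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (0);
	}

	/* Possibly the last reference: decide with the lock held. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return (1);	/* last reference, lock stays held */
	pthread_mutex_unlock(lock);
	return (0);
}
```

The point of taking the lock before the final decrement is that the counter never reaches zero while another thread holding the lock could still find the object and take a new reference.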
-- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Tue Oct 28 20:08:29 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C9732F1E; Tue, 28 Oct 2014 20:08:29 +0000 (UTC) Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com [IPv6:2a00:1450:400c:c05::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id F1F08F71; Tue, 28 Oct 2014 20:08:28 +0000 (UTC) Received: by mail-wi0-f174.google.com with SMTP id q5so10457204wiv.7 for ; Tue, 28 Oct 2014 13:08:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=SWEVK100JCP3hH6nIA5W048mTL2DlE6AZHkVV+IePd8=; b=nav94NyciEgZmh9wHr3O3aCWtbbOu70kmAyPv5/MYTISIVig9/1kBzfJUxJhL8Gan/ eLQ5/k1C/2mcRq49GSZaC2PZZdVv7n6yxYYrvpGjyhgSj/VLK2kGq4s0sx0i5Sw1JUT4 U6KqpJqjNnYnspJ7jZtFJcnkNFeUoTn4XZZo8Ks4RlHer5wWR2R2GhgPoNJ9MHM6FBlM 0b/fzUcpcg1p5L12qBaynuPAcnEwVWqACFXaGL6seaY1/es5nfPEqAE4RhyrDpdK7fbP FWba5d35DdjWX/ImQvus/34RnhmtOqftFU1lweemp8zZa4o6ubMGa6p2zdRAL9dnng/l 5J6g== MIME-Version: 1.0 X-Received: by 10.180.83.37 with SMTP id n5mr31131906wiy.7.1414526907071; Tue, 28 Oct 2014 13:08:27 -0700 (PDT) Reply-To: attilio@FreeBSD.org Sender: asmrookie@gmail.com Received: by 10.217.69.73 with HTTP; Tue, 28 Oct 2014 13:08:27 -0700 (PDT) In-Reply-To: <20141028175318.709d2ef6@bender.lan> References: <20141028025222.GA19223@dft-labs.eu> <20141028142510.10a9d3cb@bender.lan> <20141028175318.709d2ef6@bender.lan> Date: Tue, 28 Oct 2014 21:08:27 +0100 X-Google-Sender-Auth: c8HZhVde7fGTdTrH6xz9ynAAlLQ Message-ID: Subject: Re: atomic ops From: Attilio Rao To: Andrew 
Turner Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-arch@freebsd.org" , Adrian Chadd , Mateusz Guzik , Konstantin Belousov , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Oct 2014 20:08:29 -0000 On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner wrote: > On Tue, 28 Oct 2014 15:33:06 +0100 > Attilio Rao wrote: >> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner >> wrote: >> > On Tue, 28 Oct 2014 14:18:41 +0100 >> > Attilio Rao wrote: >> > >> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik >> >> wrote: >> >> > As was mentioned sometime ago, our situation related to atomic >> >> > ops is not ideal. >> >> > >> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) >> >> > provide full memory barriers, which is stronger than needed. >> >> > >> >> > Moreover, load is implemented as lock cmpchg on var address, so >> >> > it is addditionally slower especially when cpus compete. >> >> >> >> I already explained this once privately: fully memory barriers is >> >> not stronger than needed. >> >> FreeBSD has a different semantic than Linux. We historically >> >> enforce a full barrier on _acq() and _rel() rather then just a >> >> read and write barrier, hence we need a different implementation >> >> than Linux. There is code that relies on this property, like the >> >> locking primitives (release a mutex, for instance). >> > >> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support) >> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has >> > added support for load-acquire and store-release atomic >> > instructions. For the use in atomic instructions we can assume >> > these only operate of the address passed to them. 
>> > >> > It is unlikely we will use them in the 32-bit port however I would >> > like to know the expected semantics of these atomic functions to >> > make sure we get them correct in the arm64 port. I have been >> > advised by one of the ARM Linux kernel maintainers on the problems >> > they have found using these instructions but have yet to determine >> > what our atomic functions guarantee. >> >> For FreeBSD the "reference doc" is atomic(9). >> It clearly states: > > There may also be a difference between what it states, how they are > implemented, and what developers assume they do. I'm trying to make > sure I get them correct. atomic(9) is our reference so there might be no difference between what it states and what all architectures implement. I can say that x86 follows atomic(9) well. I'm not competent enough to judge if all the !x86 arches follow it completely. I can understand that developers may get confused. The FreeBSD scheme is pretty unique. It comes from the fact that historically the membar support was made to initially support x86. The super-widespread Linux design, instead, tried to catch all architectures in its description. It become very well known and I think it also "pushed" for companies like Intel to invest in improving performance of things like explicit read/write barriers, etc. >> The second variant of each operation includes a read memory barrier. >> This barrier ensures that the effects of this operation are completed >> before the effects of any later data accesses. As a result, the >> opera- tion is said to have acquire semantics as it acquires a >> pseudo-lock requiring further operations to wait until it has >> completed. To denote this, the suffix ``_acq'' is inserted into the >> function name immediately prior to the ``_'' suffix. For >> example, to subtract two integers ensuring that any later writes will >> happen after the subtraction is per- formed, use >> atomic_subtract_acq_int(). 
>
> It depends on the point we guarantee the acquire barrier to be. On ARMv8
> the function will be a load/modify/write sequence. If we use a
> load-acquire operation for atomic_subtract_acq_int, for example, for a
> pointer P and value to subtract X:
>
> loop:
>   load-acquire *P to N
>   perform N = N - X
>   store-exclusive N to *P
>   if the store failed goto loop
>
> where N and X are both registers.
>
> This will mean no access after this loop will happen before it, but
> they may happen within it, e.g. if there was a later access A the
> following may be possible:
>
> Load P
> Access A
> Store P

No, this will be broken in FreeBSD if "Access A" is later.
If "Access A" is prior to the membar it doesn't really matter if it gets
interleaved with any of the operations in the atomic instruction.
Ideally, it could even be reordered past the Store P itself.
But if "Access A" is later (and you want to implement an _acq()
barrier) then it absolutely cannot get in the middle of the atomic_*
operation.

> We know the store will happen as if it fails, e.g. another processor
> access *P, the store will have failed and will iterate over the loop.
>
> The other point is we can guarantee any store-release, and therefore
> any prior access, has happened before a later load-acquire even if it's
> on another processor.

No, we can never guarantee anything about the visibility of the
operations to other CPUs.
We only make guarantees about how the operations are posted on the system
bus (or how they are locally visible).
Keeping in mind that the FreeBSD model comes from x86, you can sense that
some things are tailored to the x86 model, which doesn't have any rule or
ordering on global visibility of the operations.

> ...
>
>> The bottom-side of all this is that read memory barriers ensures that
>> the effect of the operations you are making (load in case of
>> atomic_load_acq_int(), for example) are completed before any later
>> data accesses. "Data accesses" qualifies for *all* the operations
>> including read, writes, etc.
This is very different by what Linux >> assumes for its rmb() barrier, for example which just orders loads. So >> for FreeBSD there is no _acq -> rmb() analogy and there is no _rel -> >> wmb() analogy. > > On ARMv8 using the above pseudo-code the operation later operations > will not be moved before the load-acquire, but they may happen before > it's store. Having discussed this with John Baldwin I don't think this > is a problem due to the nature of the store operation being allowed to > fail if another processor has written its memory. > >> >> This must be kept well in mind when trying to optimize the atomic_*() >> operations. > > At this point I'm more interested in getting them correct as they will > be important when I start on SMP support. Sure. The thread as started as an "optimization of x86" but it refers to all atomic_* on every architecture FreeBSD supports. Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 06:09:39 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 94326AB5; Wed, 29 Oct 2014 06:09:39 +0000 (UTC) Received: from vps.rulingia.com (vps.rulingia.com [103.243.244.15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps.rulingia.com", Issuer "CAcert Class 3 Root" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 0DB08390; Wed, 29 Oct 2014 06:09:37 +0000 (UTC) Received: from server.rulingia.com (c220-239-242-83.belrs5.nsw.optusnet.com.au [220.239.242.83]) by vps.rulingia.com (8.14.9/8.14.9) with ESMTP id s9T69MFA077927 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Oct 2014 17:09:28 +1100 (AEDT) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: 
from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.9/8.14.9) with ESMTP id s9T69GkJ061291 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 29 Oct 2014 17:09:16 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.9/8.14.9/Submit) id s9T69FKs061290; Wed, 29 Oct 2014 17:09:15 +1100 (EST) (envelope-from peter) Date: Wed, 29 Oct 2014 17:09:15 +1100 From: Peter Jeremy To: Mateusz Guzik Subject: Re: amd64 modules still use atomics as callable functions Message-ID: <20141029060915.GA56181@server.rulingia.com> References: <20141027224901.GC28049@dft-labs.eu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="Dxnq1zWXvFF0Q93v" Content-Disposition: inline In-Reply-To: <20141027224901.GC28049@dft-labs.eu> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.23 (2014-03-12) Cc: Alan Cox , Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 06:09:39 -0000 --Dxnq1zWXvFF0Q93v Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2014-Oct-27 23:49:01 +0100, Mateusz Guzik wrote: >Turns out several years ago the kernel was modified to provide actual >functions for atomic operations and modules are always using them. > >I propose plugging it on amd64 in head. > >For stable/10 we can always provide them, but inline in modules by default >(testing a KLD_WANT_ATOMIC_FUNC knob?). 
See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173322 -- Peter Jeremy --Dxnq1zWXvFF0Q93v Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQJ8BAEBCgBmBQJUUISLXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRFRUIyOTg2QzMwNjcxRTc0RTY1QzIyN0Ux NkE1OTdBMEU0QTIwQjM0AAoJEBall6Dkogs0I98P/ji+wVot8LztKeBy3A3J76ny 6InZ1+HTkApGBG4aiEhTBijDZywZENXvs7oU3e8bnvGZdO1/wdzGsbD05XZBkYjA fI5haPKAKv8sp91pTbrE2C/TpRthPQpuRTsTFUlCyfAS7Owxd7+HryDkvz6socGy JWeqqgT3OsVwSwtuoeNFtvSqpDlXLKVGgsIEIcQqVjnRYWkf0VxuKvPclvRKiuxQ DzLi/dQhiIAwGaGZMJ7FTNZjNKhZ/qliENPueIMbAHgUlcHbd7i7cCc3dr4EVNfq GEVYWxwYLCmtQCSTnpvRmOjsceUpfsR6tKVrGWvjUdThgKWWH3XVL1D9XSimF3Xg pxX3hklS5aNDEzlm+McidlIH8nNWCSsHPZm0A5in+QROJUg4T7hjWvgIXQmpC/f1 Dd713JV0g/C+NdUwkKgYm09t0WY36BdrntTuN3dDPUY7WVA/uEcjFeI07OCMogOU XwLaFSNtwpH4BzOOc5FxzjAZ2GNEHisek7QGFk/g3wfdVMtC57hXZ7eYI6jsTQ8v F1q/3pI9+j9y9gObAm7+08s1HHULJbgez/Od8z2aGLkMIK1fbcSS76H6moG62MnQ pIE9C6IqqiSUtKo9zYjmp6C+XAGHEnKhOo+XgZ3vDqdvZt7XrKRG9LH/U1x9Eten L4ihGzecF+4LyLNNp4Ff =rOZX -----END PGP SIGNATURE----- --Dxnq1zWXvFF0Q93v-- From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 16:02:18 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3FCBDEB0; Wed, 29 Oct 2014 16:02:18 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F0FCBDEF; Wed, 29 Oct 2014 16:02:17 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 03BD8B915; Wed, 29 Oct 2014 12:02:16 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org, attilio@freebsd.org Subject:
Re: atomic ops Date: Wed, 29 Oct 2014 10:59:16 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141028025222.GA19223@dft-labs.eu> <20141028175318.709d2ef6@bender.lan> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201410291059.16829.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 29 Oct 2014 12:02:16 -0400 (EDT) Cc: Adrian Chadd , Mateusz Guzik , Konstantin Belousov , Andrew Turner , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 16:02:18 -0000 On Tuesday, October 28, 2014 4:08:27 pm Attilio Rao wrote: > On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner wrote: > > On Tue, 28 Oct 2014 15:33:06 +0100 > > Attilio Rao wrote: > >> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner > >> wrote: > >> > On Tue, 28 Oct 2014 14:18:41 +0100 > >> > Attilio Rao wrote: > >> > > >> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik > >> >> wrote: > >> >> > As was mentioned sometime ago, our situation related to atomic > >> >> > ops is not ideal. > >> >> > > >> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) > >> >> > provide full memory barriers, which is stronger than needed. > >> >> > > >> >> > Moreover, load is implemented as lock cmpchg on var address, so > >> >> > it is addditionally slower especially when cpus compete. > >> >> > >> >> I already explained this once privately: fully memory barriers is > >> >> not stronger than needed. > >> >> FreeBSD has a different semantic than Linux. We historically > >> >> enforce a full barrier on _acq() and _rel() rather then just a > >> >> read and write barrier, hence we need a different implementation > >> >> than Linux. 
There is code that relies on this property, like the > >> >> locking primitives (release a mutex, for instance). > >> > > >> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support) > >> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has > >> > added support for load-acquire and store-release atomic > >> > instructions. For the use in atomic instructions we can assume > >> > these only operate of the address passed to them. > >> > > >> > It is unlikely we will use them in the 32-bit port however I would > >> > like to know the expected semantics of these atomic functions to > >> > make sure we get them correct in the arm64 port. I have been > >> > advised by one of the ARM Linux kernel maintainers on the problems > >> > they have found using these instructions but have yet to determine > >> > what our atomic functions guarantee. > >> > >> For FreeBSD the "reference doc" is atomic(9). > >> It clearly states: > > > > There may also be a difference between what it states, how they are > > implemented, and what developers assume they do. I'm trying to make > > sure I get them correct. > > atomic(9) is our reference so there might be no difference between > what it states and what all architectures implement. > I can say that x86 follows atomic(9) well. I'm not competent enough to > judge if all the !x86 arches follow it completely. > I can understand that developers may get confused. The FreeBSD scheme > is pretty unique. It comes from the fact that historically the membar > support was made to initially support x86. The super-widespread Linux > design, instead, tried to catch all architectures in its description. > It become very well known and I think it also "pushed" for companies > like Intel to invest in improving performance of things like explicit > read/write barriers, etc. Actually, it was designed to support ia64 (and specifically the .acq and .rel modifiers on the ld, st, and cmpxchg instructions). 
Some of the language is wrong (and is my fault) in that they are not "read" and "write" barriers. They truly are "acquire" and "release". That said, x86 has stronger barriers than that, partly because on i386 there weren't a whole lot of options (though atomic_store_rel on even i386 should just be a simple store).

> >> The second variant of each operation includes a read memory barrier.
> >> This barrier ensures that the effects of this operation are completed
> >> before the effects of any later data accesses. As a result, the
> >> operation is said to have acquire semantics as it acquires a
> >> pseudo-lock requiring further operations to wait until it has
> >> completed. To denote this, the suffix ``_acq'' is inserted into the
> >> function name immediately prior to the ``_'' suffix. For
> >> example, to subtract two integers ensuring that any later writes will
> >> happen after the subtraction is performed, use
> >> atomic_subtract_acq_int().
> >
> > It depends on the point we guarantee the acquire barrier to be. On ARMv8
> > the function will be a load/modify/write sequence. If we use a
> > load-acquire operation for atomic_subtract_acq_int, for example, for a
> > pointer P and value to subtract X:
> >
> > loop:
> >   load-acquire *P to N
> >   perform N = N - X
> >   store-exclusive N to *P
> >   if the store failed goto loop
> >
> > where N and X are both registers.
> >
> > This will mean no access after this loop will happen before it, but
> > they may happen within it, e.g. if there was a later access A the
> > following may be possible:
> >
> > Load P
> > Access A
> > Store P
>
> No, this will be broken in FreeBSD if "Access A" is later.
> If "Access A" is prior the membar it doesn't really matter if it gets
> interleaved with any of the operations in the atomic instruction.
> Ideally, it could even surpass the Store P itself.
> But if "Access A" is later (and you want to implement an _acq()
> barrier) then it cannot absolutely gets in the middle of the atomic_*
> operation.

Eh, that isn't broken. It is subtle however. The reason it isn't broken is that if any access to P occurs after the 'load P', then the store will fail and the load-acquire will be retried. If A was accessed during the atomic op, the load-acquire during the retry will discard that and force A to be re-accessed. If P is not accessed during the atomic op, then it is safe to access A during the atomic op itself.

> > We know the store will happen as if it fails, e.g. another processor
> > access *P, the store will have failed and will iterate over the loop.
> >
> > The other point is we can guarantee any store-release, and therefore
> > any prior access, has happened before a later load-acquire even if it's
> > on another processor.
>
> No, we can never guarantee on the visibility of the operations by other CPUs.
> We just make guarantee on how the operations are posted on the system
> bus (or how they are locally visible).
> Keeping in mind that FreeBSD model comes from x86, you can sense that
> some things are sized on the x86 model, which doesn't have any rule or
> ordering on global visibility of the operations.

1) Again, it's actually based on ia64.

2) x86 _does_ have rules on ordering of global visibility in that most stores (aside from some SSE special cases) will become visible in program order. Now, you can't force the _timing_ of when the stores become visible (and this is true in general, in MI code you can't assume that a barrier is equivalent to a cache flush).

3) In this case I think Andrew is using "we" to mean "armv8", and you can depend on architecture-specific semantics to determine the implementation of atomic(9).
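
A minimal userland sketch of the ordering-versus-timing point in 2) above, written with C11 <stdatomic.h> instead of atomic(9) so that it compiles outside the kernel (that substitution is an assumption; atomic_store_rel_int() and atomic_load_acq_int() would be the rough kernel analogues). The struct and function names are hypothetical. The release store orders the payload write before the flag write, and the acquire load orders the flag read before the payload read. Nothing here forces the flag to become visible at any particular time, so a consumer may keep seeing "not ready" for a while.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox used only for this illustration. */
struct mailbox {
    int payload;        /* plain data, published through 'ready' */
    atomic_int ready;   /* 0 = empty, 1 = payload is valid */
};

/* Producer: write the data first, then store-release the flag. */
void
mailbox_publish(struct mailbox *mb, int value)
{
    mb->payload = value;
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/*
 * Consumer: load-acquire the flag.  If the flag is observed as set,
 * the payload written before the release store is visible too.  If
 * it is not set yet, nothing is read and the caller may try again.
 */
bool
mailbox_try_consume(struct mailbox *mb, int *out)
{
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        return (false);
    *out = mb->payload;
    return (true);
}
```

The consumer returns false rather than spinning, which keeps the "you cannot force the timing" part explicit: ordering is only guaranteed once the flag has actually been observed.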
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 16:33:38 2014 Reply-To: attilio@FreeBSD.org In-Reply-To: <201410291059.16829.jhb@freebsd.org> References: <20141028025222.GA19223@dft-labs.eu> <20141028175318.709d2ef6@bender.lan> <201410291059.16829.jhb@freebsd.org> Date: Wed, 29 Oct 2014 17:33:35 +0100 Subject: Re: atomic ops From: Attilio
Rao To: John Baldwin Content-Type: text/plain; charset=UTF-8 Cc: Adrian Chadd , Mateusz Guzik , Alan Cox , Andrew Turner , Konstantin Belousov , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 16:33:38 -0000 On Wed, Oct 29, 2014 at 3:59 PM, John Baldwin wrote: > On Tuesday, October 28, 2014 4:08:27 pm Attilio Rao wrote: >> On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner wrote: >> > On Tue, 28 Oct 2014 15:33:06 +0100 >> > Attilio Rao wrote: >> >> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner >> >> wrote: >> >> > On Tue, 28 Oct 2014 14:18:41 +0100 >> >> > Attilio Rao wrote: >> >> > >> >> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik >> >> >> wrote: >> >> >> > As was mentioned sometime ago, our situation related to atomic >> >> >> > ops is not ideal. >> >> >> > >> >> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) >> >> >> > provide full memory barriers, which is stronger than needed. >> >> >> > >> >> >> > Moreover, load is implemented as lock cmpchg on var address, so >> >> >> > it is addditionally slower especially when cpus compete. >> >> >> >> >> >> I already explained this once privately: fully memory barriers is >> >> >> not stronger than needed. >> >> >> FreeBSD has a different semantic than Linux. We historically >> >> >> enforce a full barrier on _acq() and _rel() rather then just a >> >> >> read and write barrier, hence we need a different implementation >> >> >> than Linux. There is code that relies on this property, like the >> >> >> locking primitives (release a mutex, for instance). >> >> > >> >> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support) >> >> > there are only full barriers. 
On both 32 and 64-bit ARMv8 ARM has >> >> > added support for load-acquire and store-release atomic >> >> > instructions. For the use in atomic instructions we can assume >> >> > these only operate of the address passed to them. >> >> > >> >> > It is unlikely we will use them in the 32-bit port however I would >> >> > like to know the expected semantics of these atomic functions to >> >> > make sure we get them correct in the arm64 port. I have been >> >> > advised by one of the ARM Linux kernel maintainers on the problems >> >> > they have found using these instructions but have yet to determine >> >> > what our atomic functions guarantee. >> >> >> >> For FreeBSD the "reference doc" is atomic(9). >> >> It clearly states: >> > >> > There may also be a difference between what it states, how they are >> > implemented, and what developers assume they do. I'm trying to make >> > sure I get them correct. >> >> atomic(9) is our reference so there might be no difference between >> what it states and what all architectures implement. >> I can say that x86 follows atomic(9) well. I'm not competent enough to >> judge if all the !x86 arches follow it completely. >> I can understand that developers may get confused. The FreeBSD scheme >> is pretty unique. It comes from the fact that historically the membar >> support was made to initially support x86. The super-widespread Linux >> design, instead, tried to catch all architectures in its description. >> It become very well known and I think it also "pushed" for companies >> like Intel to invest in improving performance of things like explicit >> read/write barriers, etc. > > Actually, it was designed to support ia64 (and specifically the .acq and > .rel modifiers on the ld, st, and cmpxchg instructions). Some of the > langage is wrong (and is my fault) in that they are not "read" and > "write" barriers. They truly are "acquire" and "release". 
That said, > x86 has stronger barriers than that, partly because on i386 there wasn't > a whole lot of options (though atomic_store_rel on even i386 should just > be a simple store). > >> >> The second variant of each operation includes a read memory barrier. >> >> This barrier ensures that the effects of this operation are completed >> >> before the effects of any later data accesses. As a result, the >> >> opera- tion is said to have acquire semantics as it acquires a >> >> pseudo-lock requiring further operations to wait until it has >> >> completed. To denote this, the suffix ``_acq'' is inserted into the >> >> function name immediately prior to the ``_'' suffix. For >> >> example, to subtract two integers ensuring that any later writes will >> >> happen after the subtraction is per- formed, use >> >> atomic_subtract_acq_int(). >> > >> > It depends on the point we guarantee the acquire barrier to be. On ARMv8 >> > the function will be a load/modify/write sequence. If we use a >> > load-acquire operation for atomic_subtract_acq_int, for example, for a >> > pointer P and value to subtract X: >> > >> > loop: >> > load-acquire *P to N >> > perform N = N - X >> > store-exclusive N to *P >> > if the store failed goto loop >> > >> > where N and X are both registers. >> > >> > This will mean no access after this loop will happen before it, but >> > they may happen within it, e.g. if there was a later access A the >> > following may be possible: >> > >> > Load P >> > Access A >> > Store P >> >> No, this will be broken in FreeBSD if "Access A" is later. >> If "Access A" is prior the membar it doesn't really matter if it gets >> interleaved with any of the operations in the atomic instruction. >> Ideally, it could even surpass the Store P itself. >> But if "Access A" is later (and you want to implement an _acq() >> barrier) then it cannot absolutely gets in the middle of the atomic_* >> operation. > > Eh, that isn't broken. It is subtle however. 
The reason it isn't broken
> is that if any access to P occurs after the 'load P', then the store will
> fail and the load-acquire will be retried. If A was accessed during the
> atomic op, the load-acquire during the retry will discard that and force A
> to be re-accessed. If P is not accessed during the atomic op, then it is
> safe to access A during the atomic op itself.

This is specific to armv8, which I know 0 about. Good to know.
From a general point of view the description didn't seem ok.

>> > We know the store will happen as if it fails, e.g. another processor
>> > access *P, the store will have failed and will iterate over the loop.
>> >
>> > The other point is we can guarantee any store-release, and therefore
>> > any prior access, has happened before a later load-acquire even if it's
>> > on another processor.
>>
>> No, we can never guarantee on the visibility of the operations by other CPUs.
>> We just make guarantee on how the operations are posted on the system
>> bus (or how they are locally visible).
>> Keeping in mind that FreeBSD model comes from x86, you can sense that
>> some things are sized on the x86 model, which doesn't have any rule or
>> ordering on global visibility of the operations.
>
> 1) Again, it's actually based on ia64.
>
> 2) x86 _does_ have rules on ordering of global visibility in that most
> stores (aside from some SSE special cases) will become visible in
> program order. Now, you can't force the _timing_ of when the stores
> become visible (and this is true in general, in MI code you can't
> assume that a barrier is equivalent to a cache flush).

Yes, this is what I mean. You can't have a guarantee on the global timing of the memory accesses.

Attilio

-- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 16:58:20 2014 Subject: Re: atomic ops From: Ian Lepore To: John Baldwin In-Reply-To: <201410291059.16829.jhb@freebsd.org> References: <20141028025222.GA19223@dft-labs.eu> <20141028175318.709d2ef6@bender.lan> <201410291059.16829.jhb@freebsd.org> Date: Wed, 29 Oct 2014 10:58:15 -0600 Message-ID: <1414601895.17308.89.camel@revolution.hippie.lan> Cc: Adrian Chadd , Mateusz Guzik , Alan Cox , Andrew Turner , attilio@freebsd.org, Konstantin Belousov ,
freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 16:58:20 -0000 On Wed, 2014-10-29 at 10:59 -0400, John Baldwin wrote: > On Tuesday, October 28, 2014 4:08:27 pm Attilio Rao wrote: > > On Tue, Oct 28, 2014 at 6:53 PM, Andrew Turner wrote: > > > On Tue, 28 Oct 2014 15:33:06 +0100 > > > Attilio Rao wrote: > > >> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner > > >> wrote: > > >> > On Tue, 28 Oct 2014 14:18:41 +0100 > > >> > Attilio Rao wrote: > > >> > > > >> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik > > >> >> wrote: > > >> >> > As was mentioned sometime ago, our situation related to atomic > > >> >> > ops is not ideal. > > >> >> > > > >> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) > > >> >> > provide full memory barriers, which is stronger than needed. > > >> >> > > > >> >> > Moreover, load is implemented as lock cmpchg on var address, so > > >> >> > it is addditionally slower especially when cpus compete. > > >> >> > > >> >> I already explained this once privately: fully memory barriers is > > >> >> not stronger than needed. > > >> >> FreeBSD has a different semantic than Linux. We historically > > >> >> enforce a full barrier on _acq() and _rel() rather then just a > > >> >> read and write barrier, hence we need a different implementation > > >> >> than Linux. There is code that relies on this property, like the > > >> >> locking primitives (release a mutex, for instance). > > >> > > > >> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support) > > >> > there are only full barriers. On both 32 and 64-bit ARMv8 ARM has > > >> > added support for load-acquire and store-release atomic > > >> > instructions. 
For the use in atomic instructions we can assume > > >> > these only operate of the address passed to them. > > >> > > > >> > It is unlikely we will use them in the 32-bit port however I would > > >> > like to know the expected semantics of these atomic functions to > > >> > make sure we get them correct in the arm64 port. I have been > > >> > advised by one of the ARM Linux kernel maintainers on the problems > > >> > they have found using these instructions but have yet to determine > > >> > what our atomic functions guarantee. > > >> > > >> For FreeBSD the "reference doc" is atomic(9). > > >> It clearly states: > > > > > > There may also be a difference between what it states, how they are > > > implemented, and what developers assume they do. I'm trying to make > > > sure I get them correct. > > > > atomic(9) is our reference so there might be no difference between > > what it states and what all architectures implement. > > I can say that x86 follows atomic(9) well. I'm not competent enough to > > judge if all the !x86 arches follow it completely. > > I can understand that developers may get confused. The FreeBSD scheme > > is pretty unique. It comes from the fact that historically the membar > > support was made to initially support x86. The super-widespread Linux > > design, instead, tried to catch all architectures in its description. > > It become very well known and I think it also "pushed" for companies > > like Intel to invest in improving performance of things like explicit > > read/write barriers, etc. > > Actually, it was designed to support ia64 (and specifically the .acq and > .rel modifiers on the ld, st, and cmpxchg instructions). Some of the > langage is wrong (and is my fault) in that they are not "read" and > "write" barriers. They truly are "acquire" and "release". 
That said, > x86 has stronger barriers than that, partly because on i386 there wasn't > a whole lot of options (though atomic_store_rel on even i386 should just > be a simple store). > > > >> The second variant of each operation includes a read memory barrier. > > >> This barrier ensures that the effects of this operation are completed > > >> before the effects of any later data accesses. As a result, the > > >> opera- tion is said to have acquire semantics as it acquires a > > >> pseudo-lock requiring further operations to wait until it has > > >> completed. To denote this, the suffix ``_acq'' is inserted into the > > >> function name immediately prior to the ``_'' suffix. For > > >> example, to subtract two integers ensuring that any later writes will > > >> happen after the subtraction is per- formed, use > > >> atomic_subtract_acq_int(). > > > > > > It depends on the point we guarantee the acquire barrier to be. On ARMv8 > > > the function will be a load/modify/write sequence. If we use a > > > load-acquire operation for atomic_subtract_acq_int, for example, for a > > > pointer P and value to subtract X: > > > > > > loop: > > > load-acquire *P to N > > > perform N = N - X > > > store-exclusive N to *P > > > if the store failed goto loop > > > > > > where N and X are both registers. > > > > > > This will mean no access after this loop will happen before it, but > > > they may happen within it, e.g. if there was a later access A the > > > following may be possible: > > > > > > Load P > > > Access A > > > Store P > > > > No, this will be broken in FreeBSD if "Access A" is later. > > If "Access A" is prior the membar it doesn't really matter if it gets > > interleaved with any of the operations in the atomic instruction. > > Ideally, it could even surpass the Store P itself. > > But if "Access A" is later (and you want to implement an _acq() > > barrier) then it cannot absolutely gets in the middle of the atomic_* > > operation. > > Eh, that isn't broken. 
It is subtle however. The reason it isn't broken
> is that if any access to P occurs after the 'load P', then the store will
> fail and the load-acquire will be retried. If A was accessed during the
> atomic op, the load-acquire during the retry will discard that and force A
> to be re-accessed. If P is not accessed during the atomic op, then it is
> safe to access A during the atomic op itself.
>

I'm not sure I completely agree with all of this.

First, for

    if any access to P occurs after the 'load P', then the store will
    fail and the load-acquire will be retried

The term 'access' needs to be changed to 'store'. Other read accesses to P will not cause the store-exclusive to fail.

Next, when we consider 'Access A' I'm not sure it's true that the access will replay if the store-exclusive fails and the operation loops. The access to A may have been a prefetch, even a prefetch for data on a predicted upcoming execution branch which may or may not end up being taken.

I think the only thing that makes an ldrex/strex sequence safe for use in implementing synchronization primitives is to insert a 'dmb' after the acquire loop (after the strex succeeds), and 'dsb' before the release loop (dsb is required for SMP, dmb might be good enough on UP).

Looking into this has made me realize our current armv6/7 atomics are incorrect in this regard. Guess I'll see about fixing them up Real Soon Now. :)

-- Ian

> > > We know the store will happen as if it fails, e.g. another processor
> > > access *P, the store will have failed and will iterate over the loop.
> > >
> > > The other point is we can guarantee any store-release, and therefore
> > > any prior access, has happened before a later load-acquire even if it's
> > > on another processor.
> >
> > No, we can never guarantee on the visibility of the operations by other CPUs.
> > We just make guarantee on how the operations are posted on the system
> > bus (or how they are locally visible).
> > Keeping in mind that FreeBSD model cames from x86, you can sense that > > some things are sized on the x86 model, which doesn't have any rule or > > ordering on global visibility of the operations. > > 1) Again, it's actually based on ia64. > > 2) x86 _does_ have rules on ordering of global visiblity in that most > stores (aside from some SSE special cases) will become visible in > program order. Now, you can't force the _timing_ of when the stores > become visible (and this is true in general, in MI code you can't > assume that a barrier is equivalent to a cache flush). > > 3) In this case I think Andrew is using "armv8" for "we" and you can > depend on architecture-specific semantics to determine the implementation > of atomic(9). > From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 17:36:38 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A683DB13; Wed, 29 Oct 2014 17:36:38 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7CADFB18; Wed, 29 Oct 2014 17:36:38 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6DC03B97F; Wed, 29 Oct 2014 13:36:37 -0400 (EDT) From: John Baldwin To: Ian Lepore Subject: Re: atomic ops Date: Wed, 29 Oct 2014 13:35:57 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141028025222.GA19223@dft-labs.eu> <201410291059.16829.jhb@freebsd.org> <1414601895.17308.89.camel@revolution.hippie.lan> In-Reply-To: <1414601895.17308.89.camel@revolution.hippie.lan> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" 
Content-Transfer-Encoding: 7bit Message-Id: <201410291335.57919.jhb@freebsd.org> Cc: Adrian Chadd , Mateusz Guzik , Alan Cox , Andrew Turner , attilio@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org

On Wednesday, October 29, 2014 12:58:15 pm Ian Lepore wrote:
> On Wed, 2014-10-29 at 10:59 -0400, John Baldwin wrote:
> > Eh, that isn't broken. It is subtle however. The reason it isn't broken
> > is that if any access to P occurs after the 'load P', then the store will
> > fail and the load-acquire will be retried. If A was accessed during the
> > atomic op, the load-acquire during the retry will discard that and force A
> > to be re-accessed. If P is not accessed during the atomic op, then it is
> > safe to access A during the atomic op itself.
> >
>
> I'm not sure I completely agree with all of this.
>
> First, for
>
>     if any access to P occurs after the 'load P', then the store will
>     fail and the load-acquire will be retried
>
> The term 'access' needs to be changed to 'store'. Other read accesses
> to P will not cause the store-exclusive to fail.

Correct, though for the places where acquire is used I believe that is ok. Certainly for lock cookies it is ok. It's writes to the lock cookie that would invalidate 'A'.

> Next, when we consider 'Access A' I'm not sure it's true that the access
> will replay if the store-exclusive fails and the operation loops. The
> access to A may have been a prefetch, even a prefetch for data on a
> predicted upcoming execution branch which may or may not end up being
> taken.
>
> I think the only thing that makes an ldrex/strex sequence safe for use
> in implementing synchronization primitives is to insert a 'dmb' after
> the acquire loop (after the strex succeeds), and 'dsb' before the
> release loop (dsb is required for SMP, dmb might be good enough on UP).
>
> Looking into this has made me realize our current armv6/7 atomics are
> incorrect in this regard. Guess I'll see about fixing them up Real Soon
> Now. :)

I'm not actually sure either, but it would be surprising to me otherwise. Presumably there is nothing magic about a branch. Either the load-acquire is an acquire barrier or it isn't. Namely, suppose you had this sequence:

    load-acquire P
    access A (prefetch)
    load-acquire Q
    load A

Would you expect the prefetch to satisfy the load or should the load-acquire on Q discard that? Having a branch after a failing conditional store back to the load-acquire should work similarly. It has to discard anything that was prefetched or it isn't an actual load-acquire. That is, consider:

1:  load-acquire P
    access A (prefetch)
    conditional-store P
    branch-if-fail 1b
    load A

In the case that the conditional store fails and the branch is taken, the sequence of operations is:

    load-acquire P
    access A (prefetch)
    conditional-store P
    branch
    load-acquire P

That should be equivalent to the first sequence above unless the branch instruction has the magical property of disabling memory barriers on the instruction after a branch (which would be insane).
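
The retry structure discussed above can be approximated in portable C11, as a sketch only. The function name is hypothetical and this is not how any FreeBSD port implements atomic_subtract_acq_int(); it also returns the previous value purely so the result can be checked. atomic_compare_exchange_weak_explicit() is allowed to fail spuriously, much like a store-exclusive, and on load-exclusive/store-exclusive machines a compiler may lower it to such a pair. One difference from the pseudo-code in this thread: here acquire ordering is requested only for the successful exchange, while a failed attempt just reloads the current value with relaxed ordering before retrying.

```c
#include <stdatomic.h>

/*
 * Hypothetical userland analogue of an acquire read-modify-write:
 * subtract x from *p and return the value that was replaced.
 */
unsigned int
subtract_acq_sketch(atomic_uint *p, unsigned int x)
{
    unsigned int old = atomic_load_explicit(p, memory_order_relaxed);

    /*
     * On failure (another store to *p, or a spurious failure) 'old'
     * is refreshed with the current value and 'old - x' is
     * recomputed on the next iteration.
     */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old - x,
        memory_order_acquire, memory_order_relaxed))
        ;
    return (old);
}
```

Accesses that follow the call in program order are ordered after the successful exchange; the sketch makes no claim about what a particular CPU does with speculative prefetches inside the loop, which is the open question in the messages above.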
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 17:50:29 2014 In-Reply-To: <201410281146.49370.jhb@freebsd.org> References: <20141021094539.GA1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua> <201410281146.49370.jhb@freebsd.org> Date: Wed, 29 Oct
2014 18:50:22 +0100 Message-ID: Subject: Re: RfC: fueword(9) and casueword(9) From: Oliver Pinter To: John Baldwin Content-Type: text/plain; charset=UTF-8 Cc: Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 17:50:30 -0000 On Tue, Oct 28, 2014 at 4:46 PM, John Baldwin wrote: > On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote: >> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote: >> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote: >> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote: >> > > > A new API should try to fix these __DEVOLATILE() abominations. I think it >> > > > is safe, and even correct, to declare the pointers as volatile const void >> > > > *, since the functions really can handle volatile data, unlike copyin(). >> > > > >> > > > Atomic op functions are declared as taking pointers to volatile for >> > > > similar reasons. Often they are applied to non-volatile data, but >> > > > adding a qualifier is type-safe and doesn't cost efficiency since the >> > > > pointer access is is not known to the compiler. (The last point is not >> > > > so clear -- the compiler can see things in the functions since they are >> > > > inline asm. fueword() isn't inline so its (in)efficiency is not changed.) >> > > > >> > > > The atomic read functions are not declared as taking pointers to const. >> > > > The __DECONST() abomination might be used to work around this bug. >> > > >> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the >> > > umtx structures definitions. I think that it is bug to mark the lock >> > > words with volatile. 
I want the fueword(9) interface to be as much >> > > similar to fuword(9), in particular, volatile seems to be not needed. >> > >> > I agree with Bruce here. casuword() already accepts volatile. I also >> > think umtx is correct in marking the field as volatile. They are subject >> > to change without the compiler's knowledge albeit by other threads >> > rather than signal handlers. Having them marked volatile doesn't really >> > matter for the kernel, but the header is also used in userland and is >> > relevant in sem_new.c, etc. >> >> You agree with making fueword() accept volatile const void * as the >> address ? Or do you agree with the existence of the volatile type >> qualifier for the lock field of umtx structures ? > > I agree with both (I thought Bruce only asserted the first). > >> I definitely do not want to make fueword() different from fuword() in >> this aspect. If changing both fueword() and fuword() to take volatile >> const * address, this should be different patch. > > I also agree that fuword() and fueword() should take identical arguments, > so if this change is made it should be a separate patch (and should include > suword()). > > -- > John Baldwin Hi Konstantin! I got this error with clang_complete + vim: "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c" 286L, 8326Csem_wait: Operation not supported sem_wait: Operation not supported Fatal Python error: PyEval_SaveThread: NULL tstate Vim: Caught deadly signal ABRT Vim: Finished. Abort (core dumped) It's on recent HEAD + HardenedBSD patches, so I must to inspect that this is caused by hbsd's changes or your. I don't see this problem on HardenedBSD build, which built on Oct. 23: [1] FreeBSD 11.0-CURRENT #0 0c61f55(hardened/current/master): Thu Oct 23 09:04:50 CEST 2014 [1] op@hardenedbsd:/usr/obj/usr/src/sys/HARDENEDBSD amd64 (currently I build a new kernel, which was based before the fueword changes) If you need help, please ping me. 
> _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 17:54:08 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 003D9478 for ; Wed, 29 Oct 2014 17:54:07 +0000 (UTC) Received: from mail-yh0-f43.google.com (mail-yh0-f43.google.com [209.85.213.43]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B60CAD69 for ; Wed, 29 Oct 2014 17:54:07 +0000 (UTC) Received: by mail-yh0-f43.google.com with SMTP id z6so824985yhz.2 for ; Wed, 29 Oct 2014 10:54:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=Y+BJt6sDSKi/4TJ+XBl0MzDsv6bacnziejLt6tWUs28=; b=B3jHuFuR5hoYtfsHxUUsqaUR9447KcC5fpD1DCZEsn/T6RgtvAf8P1KhyFlgQaJJTf iZiqPIEwvzR6SnXin5Xg3ij2sROnCSJUGmb79qEneNjFbCezgYhkBjFudy2+b2CTeRt5 6/xQW/q8pPXYNKPiAZcp9lpagzkwD7RPqY1VTpxrem9bxDNcrfiV3+9bPkczGk7NNORl wiETaWvWK7B0I77b6BZPNaVQGj5HWMpp1u1RzHv8wzRK6DHkgugtbTQ4dM7uvTEXFJ0F wXSud5N00e226ag6rMi7kne3IvYwrUAiLojEuVyKL3GrWSmJ+YOO2KOgtCRRwiQ8oCAn 5gbw== X-Gm-Message-State: ALoCoQkhvu9QY0apo8WE7ets7gts2IYKdj2HYsMboIaumGT1c258++QyyF/NcEPuym0rk2W2sBkR MIME-Version: 1.0 X-Received: by 10.236.14.229 with SMTP id d65mr3172507yhd.45.1414605241344; Wed, 29 Oct 2014 10:54:01 -0700 (PDT) Received: by 10.170.46.203 with HTTP; Wed, 29 Oct 2014 10:54:00 -0700 (PDT) X-Originating-IP: [62.165.198.134] In-Reply-To: References: 
<20141021094539.GA1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua> <201410281146.49370.jhb@freebsd.org> Date: Wed, 29 Oct 2014 18:54:00 +0100 Message-ID: Subject: Re: RfC: fueword(9) and casueword(9) From: Oliver Pinter To: John Baldwin Content-Type: text/plain; charset=UTF-8 Cc: Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 17:54:08 -0000 On Wed, Oct 29, 2014 at 6:50 PM, Oliver Pinter wrote: > On Tue, Oct 28, 2014 at 4:46 PM, John Baldwin wrote: >> On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote: >>> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote: >>> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote: >>> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote: >>> > > > A new API should try to fix these __DEVOLATILE() abominations. I think it >>> > > > is safe, and even correct, to declare the pointers as volatile const void >>> > > > *, since the functions really can handle volatile data, unlike copyin(). >>> > > > >>> > > > Atomic op functions are declared as taking pointers to volatile for >>> > > > similar reasons. Often they are applied to non-volatile data, but >>> > > > adding a qualifier is type-safe and doesn't cost efficiency since the >>> > > > pointer access is is not known to the compiler. (The last point is not >>> > > > so clear -- the compiler can see things in the functions since they are >>> > > > inline asm. fueword() isn't inline so its (in)efficiency is not changed.) >>> > > > >>> > > > The atomic read functions are not declared as taking pointers to const. >>> > > > The __DECONST() abomination might be used to work around this bug. 
>>> > > >>> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the >>> > > umtx structures definitions. I think that it is bug to mark the lock >>> > > words with volatile. I want the fueword(9) interface to be as much >>> > > similar to fuword(9), in particular, volatile seems to be not needed. >>> > >>> > I agree with Bruce here. casuword() already accepts volatile. I also >>> > think umtx is correct in marking the field as volatile. They are subject >>> > to change without the compiler's knowledge albeit by other threads >>> > rather than signal handlers. Having them marked volatile doesn't really >>> > matter for the kernel, but the header is also used in userland and is >>> > relevant in sem_new.c, etc. >>> >>> You agree with making fueword() accept volatile const void * as the >>> address ? Or do you agree with the existence of the volatile type >>> qualifier for the lock field of umtx structures ? >> >> I agree with both (I thought Bruce only asserted the first). >> >>> I definitely do not want to make fueword() different from fuword() in >>> this aspect. If changing both fueword() and fuword() to take volatile >>> const * address, this should be different patch. >> >> I also agree that fuword() and fueword() should take identical arguments, >> so if this change is made it should be a separate patch (and should include >> suword()). >> >> -- >> John Baldwin > > Hi Konstantin! > > I got this error with clang_complete + vim: > > "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c" > 286L, 8326Csem_wait: Operation not supported > > sem_wait: Operation not > supported > > > Fatal Python error: PyEval_SaveThread: NULL tstate > Vim: Caught deadly signal ABRT > Vim: Finished. > Abort (core dumped) > > It's on recent HEAD + HardenedBSD patches, so I must to inspect that > this is caused by hbsd's changes or your. > > I don't see this problem on HardenedBSD build, which built on Oct. 
23: > [1] FreeBSD 11.0-CURRENT #0 0c61f55(hardened/current/master): Thu Oct > 23 09:04:50 CEST 2014 > [1] op@hardenedbsd:/usr/obj/usr/src/sys/HARDENEDBSD amd64 > > (currently I build a new kernel, which was based before the fueword changes) > > If you need help, please ping me. gdb vim r ... "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c" 286L, 8326C(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...sem_wait: Operation not supported sem_wait: Operation not supported Fatal Python error: PyEval_SaveThread: NULL tstate Program received signal SIGABRT, Aborted. 
0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
(gdb) bt
#0  0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
#1  0x00000009f5c76849 in abort () from /lib/libc.so.7
#2  0x00000009f566c031 in Py_FatalError () from /usr/local/lib/libpython2.7.so.1
#3  0x00000009f56448f1 in PyEval_SaveThread () from /usr/local/lib/libpython2.7.so.1
#4  0x00000009f79ceef5 in _PyTime_FloatTime () from /usr/local/lib/python2.7/lib-dynload/time.so
#5  0x00000009f564a31b in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.7.so.1
#6  0x00000009f564cb42 in _PyEval_SliceIndex () from /usr/local/lib/libpython2.7.so.1
#7  0x00000009f564862b in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.7.so.1
#8  0x00000009f564cb42 in _PyEval_SliceIndex () from /usr/local/lib/libpython2.7.so.1
#9  0x00000009f564862b in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.7.so.1
#10 0x00000009f56452d4 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.7.so.1
#11 0x00000009f55d63bc in PyFunction_SetClosure () from /usr/local/lib/libpython2.7.so.1
#12 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
#13 0x00000009f55becc3 in PyMethod_New () from /usr/local/lib/libpython2.7.so.1
#14 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
#15 0x00000009f564c28d in PyEval_CallObjectWithKeywords () from /usr/local/lib/libpython2.7.so.1
#16 0x00000009f5681916 in initthread () from /usr/local/lib/libpython2.7.so.1
#17 0x00000009f59274f5 in pthread_create () from /lib/libthr.so.3
#18 0x0000000000000000 in ??
() > > >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 18:03:54 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A28CA86F; Wed, 29 Oct 2014 18:03:54 +0000 (UTC) Received: from mho-02-ewr.mailhop.org (mho-02-ewr.mailhop.org [204.13.248.72]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 60E91E9E; Wed, 29 Oct 2014 18:03:54 +0000 (UTC) Received: from [73.34.117.227] (helo=ilsoft.org) by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.72) (envelope-from ) id 1XjXb6-000ADo-SF; Wed, 29 Oct 2014 18:03:53 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by ilsoft.org (8.14.9/8.14.9) with ESMTP id s9TI3osa081247; Wed, 29 Oct 2014 12:03:50 -0600 (MDT) (envelope-from ian@FreeBSD.org) X-Mail-Handler: Dyn Standard SMTP by Dyn X-Originating-IP: 73.34.117.227 X-Report-Abuse-To: abuse@dyndns.com (see http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse reporting information) X-MHO-User: U2FsdGVkX1+2a/JXp6EOyczZ1i5aJJC0 X-Authentication-Warning: paranoia.hippie.lan: Host revolution.hippie.lan [172.22.42.240] claimed to be [172.22.42.240] Subject: Re: atomic ops From: Ian Lepore To: John Baldwin In-Reply-To: <201410291335.57919.jhb@freebsd.org> References: <20141028025222.GA19223@dft-labs.eu> <201410291059.16829.jhb@freebsd.org> <1414601895.17308.89.camel@revolution.hippie.lan> <201410291335.57919.jhb@freebsd.org> Content-Type: text/plain; charset="us-ascii" Date: Wed, 29 Oct 2014 
12:03:50 -0600 Message-ID: <1414605830.17308.100.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: Adrian Chadd , Mateusz Guzik , Alan Cox , Andrew Turner , attilio@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 18:03:54 -0000

On Wed, 2014-10-29 at 13:35 -0400, John Baldwin wrote:
> On Wednesday, October 29, 2014 12:58:15 pm Ian Lepore wrote:
> > On Wed, 2014-10-29 at 10:59 -0400, John Baldwin wrote:
> > > Eh, that isn't broken. It is subtle however. The reason it isn't broken
> > > is that if any access to P occurs after the 'load P', then the store will
> > > fail and the load-acquire will be retried. If A was accessed during the
> > > atomic op, the load-acquire during the retry will discard that and force A
> > > to be re-accessed. If P is not accessed during the atomic op, then it is
> > > safe to access A during the atomic op itself.
> > >
> >
> > I'm not sure I completely agree with all of this.
> >
> > First, for
> >
> >     if any access to P occurs after the 'load P', then the store will
> >     fail and the load-acquire will be retried
> >
> > The term 'access' needs to be changed to 'store'. Other read accesses
> > to P will not cause the store-exclusive to fail.
>
> Correct, though for the places where acquire is used I believe that is ok.
> Certainly for lock cookies it is ok. It's writes to the lock cookie that
> would invalidate 'A'.
>
> > Next, when we consider 'Access A' I'm not sure it's true that the access
> > will replay if the store-exclusive fails and the operation loops. The
> > access to A may have been a prefetch, even a prefetch for data on a
> > predicted upcoming execution branch which may or may not end up being
> > taken.
> >
> > I think the only thing that makes an ldrex/strex sequence safe for use
> > in implementing synchronization primitives is to insert a 'dmb' after
> > the acquire loop (after the strex succeeds), and 'dsb' before the
> > release loop (dsb is required for SMP, dmb might be good enough on UP).
> >
> > Looking into this has made me realize our current armv6/7 atomics are
> > incorrect in this regard. Guess I'll see about fixing them up Real Soon
> > Now. :)
>
> I'm not actually sure either, but it would be surprising to me otherwise.
> Presumably there is nothing magic about a branch. Either the load-acquire
> is an acquire barrier or it isn't. Namely, suppose you had this sequence:
>
>     load-acquire P
>     access A (prefetch)
>     load-acquire Q
>     load A
>
> Would you expect the prefetch to satisfy the load or should the load-acquire
> on Q discard that? Having a branch after a failing conditional store back
> to the load acquire should work similarly. It has to discard anything that
> was prefetched or it isn't an actual load-acquire.
>
> That is, consider:
>
> 1:
>     load-acquire P
>     access A (prefetch)
>     conditional-store P
>     branch-if-fail 1b
>     load A
>
> In the case that the conditional store fails and the branch is taken, the
> sequence of operations is:
>
>     load-acquire P
>     access A (prefetch)
>     conditional-store P
>     branch
>     load-acquire P
>
> That should be equivalent to the first sequence above unless the branch
> instruction has the magical property of disabling memory barriers on the
> instruction after a branch (which would be insane).
>

I hadn't realized it when I wrote that, but Andy was speaking in the
context of armv8, which has a true load-acquire instruction. In our
current code (armv6 and 7) we need the explicit dmb/dsb barriers to
get the same effect.
(It turns out we do have barriers, I misspoke earlier, but some of our dmb need to be dsb.) -- Ian From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 18:06:42 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8593B9DA; Wed, 29 Oct 2014 18:06:42 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0DF87EC6; Wed, 29 Oct 2014 18:06:41 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9TI6Z0c023223 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Oct 2014 20:06:35 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9TI6Z0c023223 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9TI6ZVU023222; Wed, 29 Oct 2014 20:06:35 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 29 Oct 2014 20:06:35 +0200 From: Konstantin Belousov To: Oliver Pinter Subject: Re: RfC: fueword(9) and casueword(9) Message-ID: <20141029180635.GJ53947@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua> <201410281146.49370.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
tom.home Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 18:06:42 -0000 On Wed, Oct 29, 2014 at 06:54:00PM +0100, Oliver Pinter wrote: > On Wed, Oct 29, 2014 at 6:50 PM, Oliver Pinter > wrote: > > On Tue, Oct 28, 2014 at 4:46 PM, John Baldwin wrote: > >> On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote: > >>> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote: > >>> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote: > >>> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote: > >>> > > > A new API should try to fix these __DEVOLATILE() abominations. I think it > >>> > > > is safe, and even correct, to declare the pointers as volatile const void > >>> > > > *, since the functions really can handle volatile data, unlike copyin(). > >>> > > > > >>> > > > Atomic op functions are declared as taking pointers to volatile for > >>> > > > similar reasons. Often they are applied to non-volatile data, but > >>> > > > adding a qualifier is type-safe and doesn't cost efficiency since the > >>> > > > pointer access is is not known to the compiler. (The last point is not > >>> > > > so clear -- the compiler can see things in the functions since they are > >>> > > > inline asm. fueword() isn't inline so its (in)efficiency is not changed.) > >>> > > > > >>> > > > The atomic read functions are not declared as taking pointers to const. > >>> > > > The __DECONST() abomination might be used to work around this bug. > >>> > > > >>> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the > >>> > > umtx structures definitions. I think that it is bug to mark the lock > >>> > > words with volatile. 
I want the fueword(9) interface to be as much > >>> > > similar to fuword(9), in particular, volatile seems to be not needed. > >>> > > >>> > I agree with Bruce here. casuword() already accepts volatile. I also > >>> > think umtx is correct in marking the field as volatile. They are subject > >>> > to change without the compiler's knowledge albeit by other threads > >>> > rather than signal handlers. Having them marked volatile doesn't really > >>> > matter for the kernel, but the header is also used in userland and is > >>> > relevant in sem_new.c, etc. > >>> > >>> You agree with making fueword() accept volatile const void * as the > >>> address ? Or do you agree with the existence of the volatile type > >>> qualifier for the lock field of umtx structures ? > >> > >> I agree with both (I thought Bruce only asserted the first). > >> > >>> I definitely do not want to make fueword() different from fuword() in > >>> this aspect. If changing both fueword() and fuword() to take volatile > >>> const * address, this should be different patch. > >> > >> I also agree that fuword() and fueword() should take identical arguments, > >> so if this change is made it should be a separate patch (and should include > >> suword()). > >> > >> -- > >> John Baldwin > > > > Hi Konstantin! > > > > I got this error with clang_complete + vim: > > > > "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c" > > 286L, 8326Csem_wait: Operation not supported > > > > sem_wait: Operation not > > supported > > > > > > Fatal Python error: PyEval_SaveThread: NULL tstate > > Vim: Caught deadly signal ABRT > > Vim: Finished. > > Abort (core dumped) > > > > It's on recent HEAD + HardenedBSD patches, so I must to inspect that > > this is caused by hbsd's changes or your. > > > > I don't see this problem on HardenedBSD build, which built on Oct. 
23: > > [1] FreeBSD 11.0-CURRENT #0 0c61f55(hardened/current/master): Thu Oct > > 23 09:04:50 CEST 2014 > > [1] op@hardenedbsd:/usr/obj/usr/src/sys/HARDENEDBSD amd64 > > > > (currently I build a new kernel, which was based before the fueword changes) > > > > If you need help, please ping me. > > gdb vim > > r ... > > "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c" > 286L, 8326C(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...(no debugging symbols > found)...(no debugging symbols found)...sem_wait: Operation not > supported > > > sem_wait: Operation not supported > Fatal Python error: PyEval_SaveThread: NULL tstate > > Program received signal SIGABRT, Aborted. 
> 0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
> (gdb) bt
> #0  0x00000009f5bb387a in thr_kill () from /lib/libc.so.7
> #1  0x00000009f5c76849 in abort () from /lib/libc.so.7
> #2  0x00000009f566c031 in Py_FatalError () from /usr/local/lib/libpython2.7.so.1
> #3  0x00000009f56448f1 in PyEval_SaveThread () from /usr/local/lib/libpython2.7.so.1
> #4  0x00000009f79ceef5 in _PyTime_FloatTime () from /usr/local/lib/python2.7/lib-dynload/time.so
> #5  0x00000009f564a31b in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.7.so.1
> #6  0x00000009f564cb42 in _PyEval_SliceIndex () from /usr/local/lib/libpython2.7.so.1
> #7  0x00000009f564862b in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.7.so.1
> #8  0x00000009f564cb42 in _PyEval_SliceIndex () from /usr/local/lib/libpython2.7.so.1
> #9  0x00000009f564862b in PyEval_EvalFrameEx () from /usr/local/lib/libpython2.7.so.1
> #10 0x00000009f56452d4 in PyEval_EvalCodeEx () from /usr/local/lib/libpython2.7.so.1
> #11 0x00000009f55d63bc in PyFunction_SetClosure () from /usr/local/lib/libpython2.7.so.1
> #12 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
> #13 0x00000009f55becc3 in PyMethod_New () from /usr/local/lib/libpython2.7.so.1
> #14 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1
> #15 0x00000009f564c28d in PyEval_CallObjectWithKeywords () from /usr/local/lib/libpython2.7.so.1
> #16 0x00000009f5681916 in initthread () from /usr/local/lib/libpython2.7.so.1
> #17 0x00000009f59274f5 in pthread_create () from /lib/libthr.so.3
> #18 0x0000000000000000 in ?? ()
>

How could I get a single bit of useful information from this text?

My guess is that you have an old libc and a new kernel compiled without
COMPAT_FREEBSD9 and COMPAT_FREEBSD10. If this is the cause, it has nothing
to do with my changes.
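
The qualifier argument quoted earlier in this message (that declaring the address parameter as volatile const void * is type-safe because callers only ever add qualifiers) can be illustrated with a small C sketch. This is an illustration only: fetch_word() and fetch_both() are hypothetical names invented here, they are not the fueword(9) KPI, and the body does an ordinary load rather than a fault-tolerant fetch from user space.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a fetch routine; NOT the real fueword(9).
 * The address parameter carries both qualifiers, so a caller holding
 * either a plain or a volatile object can pass its address without
 * casting a qualifier away (no __DEVOLATILE()/__DECONST() needed).
 */
static int
fetch_word(const volatile void *base, long *val)
{
	if (base == NULL)
		return (-1);
	/* Ordinary load through the qualified pointer. */
	*val = *(const volatile long *)base;
	return (0);
}

/* Fetch from a plain object and from a volatile one, with no casts. */
static long
fetch_both(void)
{
	long plain = 40;
	volatile long lockword = 2;
	long a = 0, b = 0;

	/* long * converts implicitly to const volatile void *. */
	if (fetch_word(&plain, &a) != 0)
		return (-1);
	/* volatile long * converts implicitly as well. */
	if (fetch_word(&lockword, &b) != 0)
		return (-1);
	return (a + b);
}
```

Had fetch_word() been declared with a plain const void * parameter, the second call would need a cast to drop volatile, which is the kind of workaround the thread is complaining about.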
From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 18:10:51 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1A144BBE for ; Wed, 29 Oct 2014 18:10:51 +0000 (UTC) Received: from mail-yh0-f41.google.com (mail-yh0-f41.google.com [209.85.213.41]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CF207FA0 for ; Wed, 29 Oct 2014 18:10:50 +0000 (UTC) Received: by mail-yh0-f41.google.com with SMTP id b6so844656yha.14 for ; Wed, 29 Oct 2014 11:10:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=nIKdmnOOMwfqUrS20ZGqYgtwQh6nWRLbukuilCiwNTU=; b=Inwj6hls8aRb0z5VtFP0vBnuOz+qO+dUZp5UlVEiiLUxaY1wuqDaJzym1VBRWIhib3 6Ny3IzGFFtkUNAJ1+GVgiKjLzbHutqHFiEk41pjINDTI5p9jGYZ1sJ2UdHFP6ntOue57 1mqGARiZwN6AnhI19zLOcnHGUy983Pt7AnoHT0vgrXFBBwXS79xtWuUS3UpmwrCZItA0 gFRR85blE79vOSmnhObjKDBi24gal733ChhaEaXuFSuvdWI0auJocmxjQ9Sjjw/ahHmO 0dN3sEDNdv6T6/MF3jgpyVEfOvkvFgTHuhVfQ21GBl99lzUlWs20o7V9KSAMEcGe0bdK tjfA== X-Gm-Message-State: ALoCoQlIreRa3mffVT0Uz+qArCzNpPmGLzQ0R1bxmvnZCdtrOxrU/IWiDLGkSIeAE0fHAuUxr7xg MIME-Version: 1.0 X-Received: by 10.170.233.6 with SMTP id z6mr3559601ykf.101.1414606244246; Wed, 29 Oct 2014 11:10:44 -0700 (PDT) Received: by 10.170.46.203 with HTTP; Wed, 29 Oct 2014 11:10:44 -0700 (PDT) X-Originating-IP: [62.165.198.134] In-Reply-To: <20141029180635.GJ53947@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua> <201410281146.49370.jhb@freebsd.org> <20141029180635.GJ53947@kib.kiev.ua> Date: Wed, 29 Oct 
2014 19:10:44 +0100 Message-ID: Subject: Re: RfC: fueword(9) and casueword(9) From: Oliver Pinter To: Konstantin Belousov Content-Type: text/plain; charset=UTF-8 Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 18:10:51 -0000 On Wed, Oct 29, 2014 at 7:06 PM, Konstantin Belousov wrote: > On Wed, Oct 29, 2014 at 06:54:00PM +0100, Oliver Pinter wrote: >> On Wed, Oct 29, 2014 at 6:50 PM, Oliver Pinter >> wrote: >> > On Tue, Oct 28, 2014 at 4:46 PM, John Baldwin wrote: >> >> On Monday, October 27, 2014 12:55:57 pm Konstantin Belousov wrote: >> >>> On Mon, Oct 27, 2014 at 11:17:51AM -0400, John Baldwin wrote: >> >>> > On Tuesday, October 21, 2014 07:23:06 PM Konstantin Belousov wrote: >> >>> > > On Wed, Oct 22, 2014 at 01:41:12AM +1100, Bruce Evans wrote: >> >>> > > > A new API should try to fix these __DEVOLATILE() abominations. I think it >> >>> > > > is safe, and even correct, to declare the pointers as volatile const void >> >>> > > > *, since the functions really can handle volatile data, unlike copyin(). >> >>> > > > >> >>> > > > Atomic op functions are declared as taking pointers to volatile for >> >>> > > > similar reasons. Often they are applied to non-volatile data, but >> >>> > > > adding a qualifier is type-safe and doesn't cost efficiency since the >> >>> > > > pointer access is is not known to the compiler. (The last point is not >> >>> > > > so clear -- the compiler can see things in the functions since they are >> >>> > > > inline asm. fueword() isn't inline so its (in)efficiency is not changed.) >> >>> > > > >> >>> > > > The atomic read functions are not declared as taking pointers to const. >> >>> > > > The __DECONST() abomination might be used to work around this bug. 
>> >>> > > >> >>> > > I prefer to not complicate the fetch(9) KPI due to the mistakes in the >> >>> > > umtx structures definitions. I think that it is bug to mark the lock >> >>> > > words with volatile. I want the fueword(9) interface to be as much >> >>> > > similar to fuword(9), in particular, volatile seems to be not needed. >> >>> > >> >>> > I agree with Bruce here. casuword() already accepts volatile. I also >> >>> > think umtx is correct in marking the field as volatile. They are subject >> >>> > to change without the compiler's knowledge albeit by other threads >> >>> > rather than signal handlers. Having them marked volatile doesn't really >> >>> > matter for the kernel, but the header is also used in userland and is >> >>> > relevant in sem_new.c, etc. >> >>> >> >>> You agree with making fueword() accept volatile const void * as the >> >>> address ? Or do you agree with the existence of the volatile type >> >>> qualifier for the lock field of umtx structures ? >> >> >> >> I agree with both (I thought Bruce only asserted the first). >> >> >> >>> I definitely do not want to make fueword() different from fuword() in >> >>> this aspect. If changing both fueword() and fuword() to take volatile >> >>> const * address, this should be different patch. >> >> >> >> I also agree that fuword() and fueword() should take identical arguments, >> >> so if this change is made it should be a separate patch (and should include >> >> suword()). >> >> >> >> -- >> >> John Baldwin >> > >> > Hi Konstantin! >> > >> > I got this error with clang_complete + vim: >> > >> > "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c" >> > 286L, 8326Csem_wait: Operation not supported >> > >> > sem_wait: Operation not >> > supported >> > >> > >> > Fatal Python error: PyEval_SaveThread: NULL tstate >> > Vim: Caught deadly signal ABRT >> > Vim: Finished. 
>> > Abort (core dumped) >> > >> > It's on recent HEAD + HardenedBSD patches, so I must to inspect that >> > this is caused by hbsd's changes or your. >> > >> > I don't see this problem on HardenedBSD build, which built on Oct. 23: >> > [1] FreeBSD 11.0-CURRENT #0 0c61f55(hardened/current/master): Thu Oct >> > 23 09:04:50 CEST 2014 >> > [1] op@hardenedbsd:/usr/obj/usr/src/sys/HARDENEDBSD amd64 >> > >> > (currently I build a new kernel, which was based before the fueword changes) >> > >> > If you need help, please ping me. >> >> gdb vim >> >> r ... >> >> "/usr/data/source/git/opBSD/hardenedBSD.git.opntr/sys/kern/kern_pax.c" >> 286L, 8326C(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...(no debugging symbols >> found)...(no debugging symbols found)...sem_wait: Operation not >> supported >> >> >> sem_wait: Operation not supported >> Fatal Python error: PyEval_SaveThread: NULL tstate >> >> Program received signal SIGABRT, Aborted. 
>> 0x00000009f5bb387a in thr_kill () from /lib/libc.so.7 >> (gdb) bt >> #0 0x00000009f5bb387a in thr_kill () from /lib/libc.so.7 >> #1 0x00000009f5c76849 in abort () from /lib/libc.so.7 >> #2 0x00000009f566c031 in Py_FatalError () from /usr/local/lib/libpython2.7.so.1 >> #3 0x00000009f56448f1 in PyEval_SaveThread () from >> /usr/local/lib/libpython2.7.so.1 >> #4 0x00000009f79ceef5 in _PyTime_FloatTime () from >> /usr/local/lib/python2.7/lib-dynload/time.so >> #5 0x00000009f564a31b in PyEval_EvalFrameEx () from >> /usr/local/lib/libpython2.7.so.1 >> #6 0x00000009f564cb42 in _PyEval_SliceIndex () from >> /usr/local/lib/libpython2.7.so.1 >> #7 0x00000009f564862b in PyEval_EvalFrameEx () from >> /usr/local/lib/libpython2.7.so.1 >> #8 0x00000009f564cb42 in _PyEval_SliceIndex () from >> /usr/local/lib/libpython2.7.so.1 >> #9 0x00000009f564862b in PyEval_EvalFrameEx () from >> /usr/local/lib/libpython2.7.so.1 >> #10 0x00000009f56452d4 in PyEval_EvalCodeEx () from >> /usr/local/lib/libpython2.7.so.1 >> #11 0x00000009f55d63bc in PyFunction_SetClosure () from >> /usr/local/lib/libpython2.7.so.1 >> #12 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1 >> #13 0x00000009f55becc3 in PyMethod_New () from /usr/local/lib/libpython2.7.so.1 >> #14 0x00000009f55b2d24 in PyObject_Call () from /usr/local/lib/libpython2.7.so.1 >> #15 0x00000009f564c28d in PyEval_CallObjectWithKeywords () from >> /usr/local/lib/libpython2.7.so.1 >> #16 0x00000009f5681916 in initthread () from /usr/local/lib/libpython2.7.so.1 >> #17 0x00000009f59274f5 in pthread_create () from /lib/libthr.so.3 >> #18 0x0000000000000000 in ?? () >> > > How could I get a single bit of useful information from this text ? > > My guess is that you have old libc and new kernel compiled without > COMPAT_FREEBSD9 and 10. If this is the cause, it has nothing to > do with my changes. Sure. The userland is from Oct. 20 too, and COMPAT_FREEBSD{9,10} was not added to kernel config. Thanks! 
From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 18:14:15 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 30553CE8; Wed, 29 Oct 2014 18:14:15 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 07832FCC; Wed, 29 Oct 2014 18:14:15 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1A93AB923; Wed, 29 Oct 2014 14:14:14 -0400 (EDT) From: John Baldwin To: Ian Lepore Subject: Re: atomic ops Date: Wed, 29 Oct 2014 14:13:18 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141028025222.GA19223@dft-labs.eu> <201410291335.57919.jhb@freebsd.org> <1414605830.17308.100.camel@revolution.hippie.lan> In-Reply-To: <1414605830.17308.100.camel@revolution.hippie.lan> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201410291413.18858.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 29 Oct 2014 14:14:14 -0400 (EDT) Cc: Adrian Chadd , Mateusz Guzik , Alan Cox , Andrew Turner , attilio@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 18:14:15 -0000 On Wednesday, October 29, 2014 2:03:50 pm Ian Lepore wrote: > I hadn't realized it when I wrote that, but Andy was speaking in the > context 
of armv8, which has a true load-acquire instruction. In our > current code (armv6 and 7) we need the explicit dmb/dsb barriers to get > the same effect. (It turns out we do have barriers, I misspoke earlier, > but some of our dmb need to be dsb.) Ah, ok. Fair enough. :) -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 18:23:47 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A2A2F114; Wed, 29 Oct 2014 18:23:47 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2D399168; Wed, 29 Oct 2014 18:23:47 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s9TINdq9026613 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Oct 2014 20:23:39 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s9TINdq9026613 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s9TINdql026612; Wed, 29 Oct 2014 20:23:39 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 29 Oct 2014 20:23:39 +0200 From: Konstantin Belousov To: Oliver Pinter Subject: Re: RfC: fueword(9) and casueword(9) Message-ID: <20141029182339.GK53947@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua> <201410281146.49370.jhb@freebsd.org> <20141029180635.GJ53947@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) 
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 18:23:47 -0000 On Wed, Oct 29, 2014 at 07:10:44PM +0100, Oliver Pinter wrote: > On Wed, Oct 29, 2014 at 7:06 PM, Konstantin Belousov > wrote: > > How could I get a single bit of useful information from this text ? > > > > My guess is that you have old libc and new kernel compiled without > > COMPAT_FREEBSD9 and 10. If this is the cause, it has nothing to > > do with my changes. > > Sure. The userland is from Oct. 20 too, and COMPAT_FREEBSD{9,10} was > not added to kernel config. So again. Did adding COMPAT_FREEBSD9 solve the issue?
From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 19:05:07 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0C21695A; Wed, 29 Oct 2014 19:05:07 +0000 (UTC) Received: from mail-wi0-x22f.google.com (mail-wi0-x22f.google.com [IPv6:2a00:1450:400c:c05::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 27AA97C9; Wed, 29 Oct 2014 19:05:06 +0000 (UTC) Received: by mail-wi0-f175.google.com with SMTP id ex7so2594192wid.2 for ; Wed, 29 Oct 2014 12:05:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=EfEBLIfl4bpXq56QWoRctyt1C/8hgDLUw9AuS5DDoq0=; b=A/l1/XTnIvNRPsC59hxvUuJWo/P3d9KmX5fSMRlPo3yh1SDyjl7CPPozx691jtpebh jcZaEYfgZTK3Akq6GRZMX5YQ+5rbzkGAkhaofvqGLCsS8aOROssvxllUz1tOZiCKE+Pb Vhy6Lwh9jSzo96UWoQraNxmaYwQFc8klfgRdp1m/njnKG/GCvJLx4XPtaVVsVMRMzZxh TixRmqJQaas4kRJwdk0B8sSa0QgkgoATbggEW2QTEK/HQraXOcdXtPiO59QtzRb7FNwJ 3cll3iL3rIVPYmPPb2UTqvIzfQtRF0uY7/kZN800SqwExghtd+yYDkD4/bmcl6kxFU3v dGBQ== X-Received: by 10.180.21.140 with SMTP id v12mr38671171wie.44.1414609503373; Wed, 29 Oct 2014 12:05:03 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id rx8sm1582962wjb.30.2014.10.29.12.05.01 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 29 Oct 2014 12:05:02 -0700 (PDT) Date: Wed, 29 Oct 2014 20:04:59 +0100 From: Mateusz Guzik To: Attilio Rao Subject: Re: atomic ops Message-ID: <20141029190459.GA25368@dft-labs.eu> References: <20141028025222.GA19223@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Adrian Chadd , Alan Cox , Konstantin Belousov , "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 19:05:07 -0000 On Tue, Oct 28, 2014 at 02:18:41PM +0100, Attilio Rao wrote: > On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik wrote: > > As was mentioned some time ago, our situation related to atomic ops is > > not ideal. > > > > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide > > full memory barriers, which is stronger than needed. > > > > Moreover, load is implemented as lock cmpxchg on var address, so it is > > additionally slower especially when cpus compete. > > I already explained this once privately: full memory barriers are not > stronger than needed. > FreeBSD has a different semantic than Linux. We historically enforce a > full barrier on _acq() and _rel() rather than just a read and write > barrier, hence we need a different implementation than Linux. > There is code that relies on this property, like the locking > primitives (release a mutex, for instance). > I mean stronger than needed in some cases; a popular one is fget_unlocked, and we provide no "lightest sufficient" barrier (which would also be cheaper). Another case which benefits greatly is sys/sys/seq.h.
As noted in some other thread, using load_acq as it is destroys performance. I don't dispute the need for full barriers, although it is unclear which current consumers of load_acq actually need a full barrier. > In short: optimizing the implementation for performance is fine and > due. Changing the semantic is not fine, unless you have reviewed and > fixed all the uses of _rel() and _acq(). > > > On amd64 it is sufficient to place a compiler barrier in such cases. > > > > Next, we lack some atomic ops in the first place. > > > > Let's define some useful terms: > > smp_wmb - no writes can be reordered past this point > > smp_rmb - no reads can be reordered past this point > > > > With this in mind, we lack ops which would guarantee only the following: > > > > 1. var = tmp; smp_wmb(); > > 2. tmp = var; smp_rmb(); > > 3. smp_rmb(); tmp = var; > > > > This matters since what we can use already to emulate this is way > > heavier than needed on aforementioned amd64 and most likely other archs. > > I can see the value of such barriers in case you want to just > synchronize operations with regard to reads or writes. > I also believe that on newest intel processors (for which we should > optimize) rmb() and wmb() got significantly faster than mb(). However > the most interesting case would be for arm and mips, I assume. That's > where you would see a bigger perf difference if you optimize the > membar paths. > > Last time I looked into it, in FreeBSD kernel the Linux-ish > rmb()/wmb()/etc. were used primarily in 3 places: Linux-derived code, > handling of 16-bit operands and implementation of "faster" bus > barriers. > Initially I had thought about just confining the smp_*() in a Linux > compat layer and fix the other 2 in this way: for 16-bit operands > just pad to 32 bits, as the C11 standard also does. For the bus > barriers, just grow more versions to actually include the rmb()/wmb() > scheme within.
> > At this point, I understand we may want to instead support the > concept of write-only or read-only barrier. This means that if we want > to keep the concept tied to the current _acq()/_rel() scheme we will > end up with a KPI explosion. > > I'm not the one making the call here, but for a faster and more > granluar approach, possibly we can end up using smp_rmb() and > smp_wmb() directly. As I said I'm not the one making the call. > Well, I don't know original motivation for expressing stuff with _load_acq and _store_rel. Anyway, maybe we could do something along (expressing intent, not actual code): mb_producer_start(p, v) { *p = v; smp_wmb(); } mb_producer(p, v) { smp_wmb(); *p = v; } mb_producer_end(p, v) { mb_producer(p, v); } type mb_consumer(p) { var = *p; smp_rmb(); return (var); } type mb_consumer_start(p) { return (mb_consumer(p)); } type mb_consumer_end(p) { smp_rmb(); return (*p); } -- Mateusz Guzik From owner-freebsd-arch@FreeBSD.ORG Wed Oct 29 19:13:36 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 81E25B19 for ; Wed, 29 Oct 2014 19:13:36 +0000 (UTC) Received: from mail-yk0-f175.google.com (mail-yk0-f175.google.com [209.85.160.175]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 424318B1 for ; Wed, 29 Oct 2014 19:13:35 +0000 (UTC) Received: by mail-yk0-f175.google.com with SMTP id q9so1592043ykb.6 for ; Wed, 29 Oct 2014 12:13:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=8PM+euvGxgSRsVB/43XwSTzWjL/DGLVwkdU5UehBNVM=; 
b=ea9AQnnjjpPlXsoBPJ+LuG7/rlbKXDKWPu5JZ1UrevmdiZKA3RdwylMwV4SA7Jntl8 XICkpri11LHl1qu8fV8J5UJQwnm9qne1tb6amK2AICo3vQSF2nGjnwQZYr7PW0Dpi6Uk yS/I8qCntTJw8z0epJLOD+/YTnL9bis+qINdh7g7heZ42TQhR6jH30EEOKfjC5rS6xsI Ca6SrzkeyGzA4f/pV6IpH9ykIXiihAL9u0mjDbQapuo7FWAUXQnq8CrcSx+rXAr6YjVr ZgPaVau4PBxwsAVkhhpr4tJBDTF/n3Y6hxMBvX5jGxhk9SzWIghXLFVGUY4GVrlOI+7/ vlEg== X-Gm-Message-State: ALoCoQlvIeQEbIxTVIhNHlGGusntIsSLcLTsmFQoNJBqo5g1//ZeMH4iuuos9UHP/MIwrogYxK5z MIME-Version: 1.0 X-Received: by 10.170.223.84 with SMTP id p81mr3682025ykf.110.1414609710459; Wed, 29 Oct 2014 12:08:30 -0700 (PDT) Received: by 10.170.46.203 with HTTP; Wed, 29 Oct 2014 12:08:30 -0700 (PDT) X-Originating-IP: [62.165.198.134] In-Reply-To: <20141029182339.GK53947@kib.kiev.ua> References: <20141021094539.GA1877@kib.kiev.ua> <2048849.GkvWliFbyg@ralph.baldwin.cx> <20141027165557.GC1877@kib.kiev.ua> <201410281146.49370.jhb@freebsd.org> <20141029180635.GJ53947@kib.kiev.ua> <20141029182339.GK53947@kib.kiev.ua> Date: Wed, 29 Oct 2014 20:08:30 +0100 Message-ID: Subject: Re: RfC: fueword(9) and casueword(9) From: Oliver Pinter To: Konstantin Belousov Content-Type: text/plain; charset=UTF-8 Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Oct 2014 19:13:36 -0000 On Wed, Oct 29, 2014 at 7:23 PM, Konstantin Belousov wrote: > On Wed, Oct 29, 2014 at 07:10:44PM +0100, Oliver Pinter wrote: >> On Wed, Oct 29, 2014 at 7:06 PM, Konstantin Belousov >> wrote: >> > How could I get a single bit of useful information from this text ? >> > >> > My guess is that you have old libc and new kernel compiled without >> > COMPAT_FREEBSD9 and 10. If this is the cause, it has nothing to >> > do with my changes. >> >> Sure. The userland is from Oct. 20 too, and COMPAT_FREEBSD{9,10} was >> not added to kernel config. 
> > So again. Did adding COMPAT_FREEBSD9 solve the issue? I added both COMPAT_FREEBSD9 and COMPAT_FREEBSD10, and the problem is fixed. Thanks! From owner-freebsd-arch@FreeBSD.ORG Thu Oct 30 18:10:55 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B67321ED; Thu, 30 Oct 2014 18:10:55 +0000 (UTC) Received: from nibbler.fubar.geek.nz (nibbler.fubar.geek.nz [199.48.134.198]) by mx1.freebsd.org (Postfix) with ESMTP id 9684BE14; Thu, 30 Oct 2014 18:10:55 +0000 (UTC) Received: from bender.lan (97e078e7.skybroadband.com [151.224.120.231]) by nibbler.fubar.geek.nz (Postfix) with ESMTPSA id 95F815CC08; Thu, 30 Oct 2014 18:10:53 +0000 (UTC) Date: Thu, 30 Oct 2014 18:10:48 +0000 From: Andrew Turner To: John Baldwin Subject: Re: atomic ops Message-ID: <20141030181048.4cbeeec6@bender.lan> In-Reply-To: <201410291335.57919.jhb@freebsd.org> References: <20141028025222.GA19223@dft-labs.eu> <201410291059.16829.jhb@freebsd.org> <1414601895.17308.89.camel@revolution.hippie.lan> <201410291335.57919.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Adrian Chadd , Mateusz Guzik , Ian Lepore , Alan Cox , attilio@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Oct 2014 18:10:55 -0000 On Wed, 29 Oct 2014 13:35:57 -0400 John Baldwin wrote: > On Wednesday, October 29, 2014 12:58:15 pm Ian Lepore wrote: > > Next, when we consider 'Access A' I'm not sure it's true that the > > access will replay if the store-exclusive fails and the operation > > loops.
The access to A may have been a prefetch, even a prefetch > > for data on a predicted upcoming execution branch which may or may > > not end up being taken. > > > > I think the only thing that makes an ldrex/strex sequence safe for > > use in implementing synchronization primitives is to insert a 'dmb' > > after the acquire loop (after the strex succeeds), and 'dsb' before > > the release loop (dsb is required for SMP, dmb might be good enough > > on UP). > > > > Looking into this has made me realize our current armv6/7 atomics > > are incorrect in this regard. Guess I'll see about fixing them up > > Real Soon Now. :) > > I'm not actually sure either, but it would be surprising to me > otherwise. Presumably there is nothing magic about a branch. Either > the load-acquire is an acquire barrier or it isn't. Namely, suppose > you had this sequence: > > load-acquire P > access A (prefetch) > load-acquire Q > load A > > Would you expect the prefetch to satisfy the load or should the > load-acquire on Q discard that? Having a branch after a failing > conditional store back to the load acquire should work similarly. It > has to discard anything that was prefetched or it isn't an actual > load-acquire. I have checked with someone at ARM. The prefetch should not be considered an access with regard to the barrier, and it could be moved before the barrier, as it will only load data into the cache. The barrier only deals with loading data into the core, i.e. if the data was part of the prefetch it will be loaded from the cache no earlier than the load-acquire. The cache coherency protocol ensures the data will be up to date while the barrier will ensure the ordering of the load of A. In the above example the prefetch of A will not be thrown away but the data in the cache may change between the prefetch and load A if another core has written to A. If this is the case the load will be of the new data.
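The ordering described above can be sketched in portable C. This is only an illustration under stated assumptions: it uses C11 <stdatomic.h> rather than FreeBSD's atomic(9), and struct mailbox, publish() and try_consume() are hypothetical names, not anything from the tree. The plain field payload plays the role of A: a prefetch may pull it into the cache early, but the actual load is ordered after the load-acquire of the flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox; all names here are illustrative only. */
struct mailbox {
	int payload;		/* plain data, the "A" of the example */
	atomic_bool ready;	/* flag accessed with acquire/release */
};

/* Producer: write the data, then publish it with a store-release. */
static void
publish(struct mailbox *mb, int value)
{
	mb->payload = value;
	atomic_store_explicit(&mb->ready, true, memory_order_release);
}

/*
 * Consumer: load-acquire the flag first.  The load of payload cannot be
 * satisfied before the acquire load, so if the flag is seen as set, the
 * value written before the matching store-release is seen as well.
 */
static bool
try_consume(struct mailbox *mb, int *out)
{
	if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
		return (false);
	*out = mb->payload;
	return (true);
}
```

In a real consumer the two functions would run on different CPUs; the acquire/release pair is what makes the hand-off of payload safe without a full barrier.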
Andrew From owner-freebsd-arch@FreeBSD.ORG Thu Oct 30 19:05:47 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2B4F8E28; Thu, 30 Oct 2014 19:05:47 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 018BD619; Thu, 30 Oct 2014 19:05:47 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 472A0B923; Thu, 30 Oct 2014 15:05:45 -0400 (EDT) From: John Baldwin To: Andrew Turner Subject: Re: atomic ops Date: Thu, 30 Oct 2014 15:03:13 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <20141028025222.GA19223@dft-labs.eu> <201410291335.57919.jhb@freebsd.org> <20141030181048.4cbeeec6@bender.lan> In-Reply-To: <20141030181048.4cbeeec6@bender.lan> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201410301503.14225.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 30 Oct 2014 15:05:45 -0400 (EDT) Cc: Adrian Chadd , Mateusz Guzik , Ian Lepore , Alan Cox , attilio@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Oct 2014 19:05:47 -0000 On Thursday, October 30, 2014 2:10:48 pm Andrew Turner wrote: > On Wed, 29 Oct 2014 13:35:57 -0400 > John Baldwin wrote: > > On Wednesday, October 29, 2014 
12:58:15 pm Ian Lepore wrote: > > > Next, when we consider 'Access A' I'm not sure it's true that the > > > access will replay if the store-exclusive fails and the operation > > > loops. The access to A may have been a prefetch, even a prefetch > > > for data on a predicted upcoming execution branch which may or may > > > not end up being taken. > > > > > > I think the only thing that makes an ldrex/strex sequence safe for > > > use in implementing synchronization primitives is to insert a 'dmb' > > > after the acquire loop (after the strex succeeds), and 'dsb' before > > > the release loop (dsb is required for SMP, dmb might be good enough > > > on UP). > > > > > > Looking into this has made me realize our current armv6/7 atomics > > > are incorrect in this regard. Guess I'll see about fixing them up > > > Real Soon Now. :) > > > > I'm not actually sure either, but it would be surprising to me > > otherwise. Presumably there is nothing magic about a branch. Either > > the load-acquire is an acquire barrier or it isn't. Namely, suppose > > you had this sequence: > > > > load-acquire P > > access A (prefetch) > > load-acquire Q > > load A > > > > Would you expect the prefetch to satisfy the load or should the > > load-acquire on Q discard that? Having a branch after a failing > > conditional store back to the load acquire should work similarly. It > > has to discard anything that was prefetched or it isn't an actual > > load-acquire. > > I have checked with someone at ARM. The prefetch should not be > considered an access with regard to the barrier, and it could be moved > before the barrier, as it will only load data into the cache. The barrier only > deals with loading data into the core, i.e. if the data was part of the > prefetch it will be loaded from the cache no earlier than the > load-acquire. The cache coherency protocol ensures the data will be up > to date while the barrier will ensure the ordering of the load of A.
> > In the above example the prefetch of A will not be thrown away but the > data in the cache may change between the prefetch and load A if another > core has written to A. If this is the case the load will be of the new > data. That is sufficient for what atomic(9)'s _acq wants, yes. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Fri Oct 31 19:12:19 2014 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A49EBA86; Fri, 31 Oct 2014 19:12:19 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7D19F6A; Fri, 31 Oct 2014 19:12:19 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9VJCCR0042606 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 31 Oct 2014 12:12:13 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9VJCCN5042605; Fri, 31 Oct 2014 12:12:12 -0700 (PDT) (envelope-from jmg) Date: Fri, 31 Oct 2014 12:12:12 -0700 From: John-Mark Gurney To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: any reason not to enable IPDIVERT for ipfw module? 
Message-ID: <20141031191212.GO8852@funkthat.com> Mail-Followup-To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Fri, 31 Oct 2014 12:12:13 -0700 (PDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Oct 2014 19:12:19 -0000 Can any one think of a good reason not to enable IPDIVERT sockets in the ipfw module? And possibly enabling default to accept? That way you don't have to go to the console when you load the ipfw module because you forgot to auto add the accept all rule? :) something like: ==== //depot/projects/opencrypto/sys/modules/ipfw/Makefile#3 - /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile ==== --- /tmp/tmp.15774.16 2014-10-31 12:11:56.000000000 -0700 +++ /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile 2014-10-31 12:11:54.000000000 -0700 @@ -16,7 +16,10 @@ #CFLAGS+= -DIPFIREWALL_VERBOSE_LIMIT=100 # #If you want it to pass all packets by default -#CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT +CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT +# +#If you want divert sockets +CFLAGS+= -DIPDIVERT # .include -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." 
From owner-freebsd-arch@FreeBSD.ORG Fri Oct 31 19:14:30 2014 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C4F25D9C; Fri, 31 Oct 2014 19:14:30 +0000 (UTC) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "funkthat.com", Issuer "funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 8B077C4; Fri, 31 Oct 2014 19:14:30 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s9VJETXQ042646 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 31 Oct 2014 12:14:29 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id s9VJETUc042645; Fri, 31 Oct 2014 12:14:29 -0700 (PDT) (envelope-from jmg) Date: Fri, 31 Oct 2014 12:14:28 -0700 From: John-Mark Gurney To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: any reason not to enable IPDIVERT for ipfw module? Message-ID: <20141031191428.GP8852@funkthat.com> Mail-Followup-To: freebsd-net@FreeBSD.org, freebsd-arch@FreeBSD.org References: <20141031191212.GO8852@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141031191212.GO8852@funkthat.com> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? 
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Fri, 31 Oct 2014 12:14:29 -0700 (PDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Oct 2014 19:14:30 -0000 John-Mark Gurney wrote this message on Fri, Oct 31, 2014 at 12:12 -0700: > Can any one think of a good reason not to enable IPDIVERT sockets in > the ipfw module? sorry, ignore this... didn't realize ipdivert was loadable as a separate module, ipdivert... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@FreeBSD.ORG Fri Oct 31 23:35:08 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 96593820; Fri, 31 Oct 2014 23:35:08 +0000 (UTC) Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com [IPv6:2a00:1450:400c:c05::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0866B5F1; Fri, 31 Oct 2014 23:35:07 +0000 (UTC) Received: by mail-wi0-f171.google.com with SMTP id q5so2536856wiv.4 for ; Fri, 31 Oct 2014 16:35:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=3QiSI2ci/GVxQj4qve93FYPyQGi8c4kwaw04rrnf0tw=; b=CSdefo+85GlExOt+/G1I0Fp6ZyIGaSmNxektAv6VZvVzwWudXVHmM+DawlI5stwOaV HX5UjgSIB1zQzRQNmyRRXd+iRTfsd8PPi7upAyuDguVNLDDY6rnetBuf9P5lVCzot1JZ 
JPqvGehdI+MI+f9/JnJNeU5YnBpiO85/tbAxY0Vtu9M3OWCVcPRgYJYwjwHASadwRB/Q Jx5ivRl4PE+SZMAdizrXabOspbRC45XrGT9y49MNI2+d8RlCXpnV9XUXKjeGxt6anncg xsl092peSIIq/BnhcriZRqVbf/ep4qqR6Ie1jJ2rG9ZUHw8D9COwbjcyxoydn8XvJzVO vVzw== X-Received: by 10.194.62.226 with SMTP id b2mr26950561wjs.46.1414798505978; Fri, 31 Oct 2014 16:35:05 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id da3sm13654686wjb.12.2014.10.31.16.35.04 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 31 Oct 2014 16:35:05 -0700 (PDT) Date: Sat, 1 Nov 2014 00:35:02 +0100 From: Mateusz Guzik To: John Baldwin Subject: Re: refcount_release_take_##lock Message-ID: <20141031233502.GB20591@dft-labs.eu> References: <20141025184448.GA19066@dft-labs.eu> <201410281154.54581.jhb@freebsd.org> <20141028174428.GA12014@dft-labs.eu> <201410281413.58414.jhb@freebsd.org> <20141028193404.GB12014@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20141028193404.GB12014@dft-labs.eu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: John-Mark Gurney , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Oct 2014 23:35:08 -0000 On Tue, Oct 28, 2014 at 08:34:04PM +0100, Mateusz Guzik wrote: > On Tue, Oct 28, 2014 at 02:13:58PM -0400, John Baldwin wrote: > > On Tuesday, October 28, 2014 1:44:28 pm Mateusz Guzik wrote: > > > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c > > > index f8ae0e6..e94ccde 100644 > > > --- a/sys/kern/kern_jail.c > > > +++ b/sys/kern/kern_jail.c > > > > The diff looks good to me. Just need to update refcount.9 as well. > > > Ping? Is this diff ok? 
> diff --git a/share/man/man9/refcount.9 b/share/man/man9/refcount.9
> index e7702a2..61b9b51 100644
> --- a/share/man/man9/refcount.9
> +++ b/share/man/man9/refcount.9
> @@ -26,7 +26,7 @@
>  .\"
>  .\" $FreeBSD$
>  .\"
> -.Dd January 20, 2009
> +.Dd October 28, 2014
>  .Dt REFCOUNT 9
>  .Os
>  .Sh NAME
> @@ -44,6 +44,15 @@
>  .Fn refcount_acquire "volatile u_int *count"
>  .Ft int
>  .Fn refcount_release "volatile u_int *count"
> +.In sys/mutex.h
> +.Fn refcount_release_lock_mtx "volatile u_int *count, struct mtx *lock"
> +.In sys/rmlock.h
> +.Fn refcount_release_lock_rmlock "volatile u_int *count, struct rmlock *lock"
> +.In sys/rwlock.h
> +.Fn refcount_release_lock_rwlock "volatile u_int *count, struct rwlock *lock"
> +.In sys/lock.h
> +.In sys/sx.h
> +.Fn refcount_release_lock_sx "volatile u_int *count, struct sx *lock"
>  .Sh DESCRIPTION
>  The
>  .Nm
> @@ -77,6 +86,13 @@ The function returns a non-zero value if the reference being released was
>  the last reference;
>  otherwise, it returns zero.
>  .Pp
> +.Fn refcount_release_lock_*
> +functions release an existing reference holding the lock if it is the last
> +reference.
> +These functions return with the lock held and a non-zero value if the reference
> +being released was the last reference;
> +otherwise, they return zero and the lock is not held.
> +.Pp
>  Note that these routines do not provide any inter-CPU synchronization,
>  data protection,
>  or memory ordering guarantees except for managing the counter.
> @@ -91,6 +107,18 @@ The
>  .Nm refcount_release
>  function returns non-zero when releasing the last reference and zero when
>  releasing any other reference.
> +.Pp
> +.Nm refcount_release_lock_*
> +functions return with the lock held and a non-zero value when releasing the last
> +reference, zero without the lock held when releasing any other reference.
>  .Sh HISTORY
> -These functions were introduced in
> +.Fn refcount_init ,
> +.Fn refcount_acquire
> +and
> +.Fn refcount_release
> +functions were introduced in
>  .Fx 6.0 .
> +.Pp
> +.Fn refcount_release_lock_*
> +functions were introduced in
> +.Fx 10.2 .
>
> --
> Mateusz Guzik

--
Mateusz Guzik

From owner-freebsd-arch@FreeBSD.ORG Sat Nov 1 01:28:29 2014
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 36C69C7;
 Sat, 1 Nov 2014 01:28:29 +0000 (UTC)
Received: from mail-oi0-x232.google.com (mail-oi0-x232.google.com [IPv6:2607:f8b0:4003:c06::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id EAD00275;
 Sat, 1 Nov 2014 01:28:28 +0000 (UTC)
Received: by mail-oi0-f50.google.com with SMTP id v63so3124019oia.9 for ;
 Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :content-type;
 bh=d47zLIZrH06oXHgyEodvKkAFAAFr7Q0KN9R8xb08vAg=;
 b=ynU8wbfanR04lpBx8Og1HZVAnVegF+1rspxtPWotqZO9F4+wjzWE+8P0mItAZdslfX
 uTfsr5bEssBv4WmpH6fi2XdaHtnX+jBZuZFUaMgOCPi2y+mMiLdOFchP+lQoApCoT27s
 8T6QGC2mOmOR5mQv3rmXeZNWfMP3ab4hHtkT2Vq/cGv3Lx0Mbai4hh5n3RHCauGkRZ3U
 SMqug+ugEDUdUBtriCwUrXOmTXrT6MA0N94VYmjt4f6jYeX/NNjXu4sbG4V0XQvGo85E
 cIMKe4E/S4R2HKisXSqAo01WEeVALr/QvawPkPHlf0FFOfR+jiqQN77qKTijksjKpwjl
 9J1A==
MIME-Version: 1.0
X-Received: by 10.182.18.104 with SMTP id v8mr22769616obd.3.1414805308246;
 Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
Received: by 10.202.104.39 with HTTP; Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
Received: by 10.202.104.39 with HTTP; Fri, 31 Oct 2014 18:28:28 -0700 (PDT)
In-Reply-To: <20141031191212.GO8852@funkthat.com>
References: <20141031191212.GO8852@funkthat.com>
Date: Fri, 31 Oct 2014 18:28:28 -0700
Message-ID:
Subject: Re: any reason not to enable IPDIVERT for ipfw module?
From: Freddie Cash
To: FreeBSD Arch , freebsd-net
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 01 Nov 2014 01:28:29 -0000

On Oct 31, 2014 12:12 PM, "John-Mark Gurney" wrote:
>
> Can any one think of a good reason not to enable IPDIVERT sockets in
> the ipfw module?
>
> And possibly enabling default to accept? That way you don't have to
> go to the console when you load the ipfw module because you forgot to
> auto add the accept all rule? :)

You can change the default rule to accept via loader.conf and it will be
set when the module is loaded.

net.inet.ip.fw.default_to_accept or something like that.

> something like:
> ==== //depot/projects/opencrypto/sys/modules/ipfw/Makefile#3 - /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile ====
> --- /tmp/tmp.15774.16 2014-10-31 12:11:56.000000000 -0700
> +++ /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile 2014-10-31 12:11:54.000000000 -0700
> @@ -16,7 +16,10 @@
>  #CFLAGS+= -DIPFIREWALL_VERBOSE_LIMIT=100
>  #
>  #If you want it to pass all packets by default
> -#CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
> +CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT
> +#
> +#If you want divert sockets
> +CFLAGS+= -DIPDIVERT
>  #
>
>  .include
>
> --
> John-Mark Gurney Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
> _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Sat Nov 1 05:16:11 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EDC0B1CB; Sat, 1 Nov 2014 05:16:11 +0000 (UTC) Received: from sola.nimnet.asn.au (paqi.nimnet.asn.au [115.70.110.159]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 44F411D6; Sat, 1 Nov 2014 05:16:10 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by sola.nimnet.asn.au (8.14.2/8.14.2) with ESMTP id sA15G85D081579; Sat, 1 Nov 2014 16:16:08 +1100 (EST) (envelope-from smithi@nimnet.asn.au) Date: Sat, 1 Nov 2014 16:16:07 +1100 (EST) From: Ian Smith To: Freddie Cash Subject: Re: any reason not to enable IPDIVERT for ipfw module? In-Reply-To: Message-ID: <20141101144834.N52402@sola.nimnet.asn.au> References: <20141031191212.GO8852@funkthat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-net , freebsd-ipfw@freebsd.org, FreeBSD Arch X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Nov 2014 05:16:12 -0000 On Fri, 31 Oct 2014 18:28:28 -0700, Freddie Cash wrote: > On Oct 31, 2014 12:12 PM, "John-Mark Gurney" wrote: > > > > Can any one think of a good reason not to enable IPDIVERT sockets in > > the ipfw module? Yes, two. 
Nowadays people are just as likely, or perhaps more likely, to use
in-kernel NAT, loading ipfw_nat.ko instead of ipdivert.ko, and there's no
good reason to add extra code to ipfw.ko unless it's going to be used.
See libalias(3) /MODULAR ARCHITECTURE

Similarly there'd be no reason to include dummynet code unless using it.

> > And possibly enabling default to accept? That way you don't have to
> > go to the console when you load the ipfw module because you forgot to
> > auto add the accept all rule? :)

That'd reverse some 15+ years of security policy, of having the firewall
closed until you've loaded your ruleset, to cater to forgetfulness? :)

> You can change the default rule to accept via loader.conf and it will be
> set when the module is loaded.
>
> net.inet.ip.fw.default_to_accept or something like that.

Yes, net.inet.ip.fw.default_to_accept=1 is a loader tunable, and can be
set before ipfw is loaded, unlike the net.inet.ip.fw sysctls which don't
exist until ipfw is loaded. Or it can be set to 0 to reverse policy if
the kernel has been built with 'options IPFIREWALL_DEFAULT_TO_ACCEPT'.

Normally /etc/rc.d/ipfw takes care of loading ipfw_nat or ipdivert (or
both if you wanted to use both natd(8) and ipfw_nat for some reason?)
and/or dummynet, according to the rc.conf variables.

I've added freebsd-ipfw@ to ccs, just because it seems relevant ..
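
A rough sketch of what that looks like in practice, for anyone reading
along. The tunable is the one named above and the rc.conf knobs are the
ones /etc/rc.d/ipfw checks; the particular combination (in-kernel NAT with
the "open" ruleset type) is only an assumed example, not a recommendation:

```
# /boot/loader.conf
# Optional: load ipfw at boot; /etc/rc.d/ipfw loads it too if needed.
ipfw_load="YES"
# Loader tunable: default rule becomes "allow" instead of "deny".
# It has to be set before ipfw.ko is loaded.
net.inet.ip.fw.default_to_accept="1"

# /etc/rc.conf
firewall_enable="YES"
firewall_type="open"         # assumed example ruleset type
firewall_nat_enable="YES"    # rc.d/ipfw loads ipfw_nat.ko
#natd_enable="YES"           # rc.d/ipfw loads ipdivert.ko for natd(8)
#dummynet_enable="YES"       # rc.d/ipfw loads dummynet.ko
```

Leaving the tunable unset (or setting it to 0) keeps the usual closed
default, so only the modules you actually enable in rc.conf get loaded.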
cheers, Ian > > something like: > > ==== //depot/projects/opencrypto/sys/modules/ipfw/Makefile#3 - > /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile ==== > > --- /tmp/tmp.15774.16 2014-10-31 12:11:56.000000000 -0700 > > +++ /home/jmg/freebsd.p4/opencrypto/sys/modules/ipfw/Makefile > 2014-10-31 12:11:54.000000000 -0700 > > @@ -16,7 +16,10 @@ > > #CFLAGS+= -DIPFIREWALL_VERBOSE_LIMIT=100 > > # > > #If you want it to pass all packets by default > > -#CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT > > +CFLAGS+= -DIPFIREWALL_DEFAULT_TO_ACCEPT > > +# > > +#If you want divert sockets > > +CFLAGS+= -DIPDIVERT > > # > > > > .include > > > > -- > > John-Mark Gurney Voice: +1 415 225 5579 > > > > "All that I will do, has been done, All that I have, has not."