From owner-freebsd-current@FreeBSD.ORG Tue Jul 30 17:41:07 2013
From: Alan Cox <alc@rice.edu>
Date: Tue, 30 Jul 2013 10:40:45 -0700
To: Jeremie Le Hen
Cc: Alan Cox, Konstantin Belousov, freebsd-current@FreeBSD.org, trasz@FreeBSD.org
Subject: Re: Fix for sys_munlock(2) with racct
In-Reply-To: <20130721215038.GE13628@caravan.chchile.org>
References: <20130720112218.GD13628@caravan.chchile.org>
 <115EDC04-4B95-4FB5-9092-515188D8ADA7@rice.edu>
 <20130721215038.GE13628@caravan.chchile.org>
List-Id: Discussions about the use of FreeBSD-current

On Jul 21, 2013, at 2:50 PM, Jeremie Le Hen wrote:

> On Sat, Jul 20, 2013 at 08:33:35PM -0700, Alan Cox wrote:
>>
>> On Jul 20, 2013, at 4:22 AM, Jeremie Le Hen wrote:
>>
>>> Hi Edward, Alan,
>>>
>>> I plan to commit the following patch:
>>> http://people.freebsd.org/~jlh/racct_munlock.diff
>>>
>>> This solves the following panic:
>>>
>>> panic: racct_sub: freeing 301989888 of resource 5, which is more
>>> than allocated 73728 for pwsafe (pid 4177)
>>>
>>> The problem is that the racct code in sys_munlock() trusts the
>>> user's input too much.  vm_map_unwire_count() now returns how much
>>> memory has really been unwired.
>>>
>>> Any objection?
>>
>>
>> Can you elaborate on what you mean by "sys_munlock() trusts the
>> user's input too much"?  munlock(2) is supposed to return ENOMEM if
>> any addresses within the specified range are not backed by valid
>> mappings.  (Valid mappings with PROT_NONE access are something of a
>> gray area here.)  However, it is not an error for an application to
>> call munlock() on a range that isn't locked, or to call it a second,
>> third, etc. time on the same range.  Is that what is provoking your
>> panic?
>
> I'm using sysutils/pwsafe and it seems to mlock(2) 18 pages (end -
> start = 73728 bytes) but then munlock(2) 73728 pages :-).
> vm_map_unwire() seems to do the right thing with those buggy
> boundaries, but then the racct code assumes that those boundaries
> were correct as long as vm_map_unwire() returned successfully.  This
> is what causes the assertion failure.

It sounds like the application is calling munlock(2) on valid but
unwired memory.  That's not considered an error under the various Unix
standards that specify the behavior of munlock(2).  It's sloppy
programming but not buggy programming.
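
To make that concrete, here is a minimal standalone sketch (not taken
from the thread or from pwsafe) showing that munlock(2) on a valid
mapping that was never locked simply succeeds, so a successful
munlock() by itself says nothing about how much memory was actually
wired:

#include <sys/mman.h>

#include <err.h>
#include <unistd.h>

int
main(void)
{
        size_t len;
        void *p;

        /* Roughly pwsafe's 18 pages (73728 bytes with 4 KB pages). */
        len = 18 * (size_t)getpagesize();
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
                err(1, "mmap");
        /* The range was never mlock(2)ed; it is valid but unwired. */
        if (munlock(p, len) == -1)
                err(1, "munlock");      /* does not happen */
        return (0);
}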
>
>> By the way, sys_mlock() uses a simpler approach to deal with a
>> similar situation:
>>
>>         error = vm_map_wire(map, start, end,
>>             VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
>> #ifdef RACCT
>>         if (error != KERN_SUCCESS) {
>>                 PROC_LOCK(proc);
>>                 racct_set(proc, RACCT_MEMLOCK,
>>                     ptoa(pmap_wired_count(map->pmap)));
>>                 PROC_UNLOCK(proc);
>>         }
>> #endif
>>
>> However, the code in sys_mlock() for maintaining RACCT_MEMLOCK,
>> including the above snippet, is race-y.  Two simultaneous callers to
>> sys_mlock() will produce incorrect results.  (I haven't looked at
>> sys_mlockall() or vm_map_growstack().)
>
> Ok, I will commit that as the first bandaid then.
>
>
>> Also, a wired mapping can be destroyed by calling munmap(2) without
>> first calling munlock(2), in which case, RACCT_MEMLOCK will be
>> incorrect.
>
> So I think the right way to tackle this is to handle racct in the vm
> layer rather than at the syscall layer.

The VM system already maintains counters equivalent to RACCT_VMEM and
RACCT_MEMLOCK.  They are "map->size" and "pmap_wired_count(map->pmap)".
Instead of maintaining duplicate counters, could the resource
accounting framework be extended to support callbacks to obtain a
value when it's actually needed?

> vm_fault_{wire,unwire}() look like natural places for this, but racct
> functions require the struct proc to be locked.  Any idea how to
> handle this?
>
> --
> Jeremie Le Hen
>
> Scientists say the world is made up of Protons, Neutrons and Electrons.
> They forgot to mention Morons.
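
For concreteness, the "first bandaid" mentioned above might look
roughly like this in sys_munlock(), mirroring the quoted sys_mlock()
snippet: rather than subtracting the caller-supplied length from
RACCT_MEMLOCK, resynchronize it from the pmap's actual wired-page
count after the unwire.  The variable names (map, start, end, proc)
are carried over from the quoted code; this is only a sketch of the
idea, not the patch that was actually committed:

        error = vm_map_unwire(map, start, end,
            VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
#ifdef RACCT
        if (error == KERN_SUCCESS) {
                /*
                 * Account for what is actually wired now, rather than
                 * trusting the length the caller passed to munlock(2).
                 */
                PROC_LOCK(proc);
                racct_set(proc, RACCT_MEMLOCK,
                    ptoa(pmap_wired_count(map->pmap)));
                PROC_UNLOCK(proc);
        }
#endif
        return (error == KERN_SUCCESS ? 0 : ENOMEM);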