From owner-freebsd-current@FreeBSD.ORG Tue Jul 30 17:41:07 2013
From: Alan Cox <alc@rice.edu>
Date: Tue, 30 Jul 2013 10:40:45 -0700
To: Jeremie Le Hen
Cc: Alan Cox, Konstantin Belousov, freebsd-current@FreeBSD.org, trasz@FreeBSD.org
Subject: Re: Fix for sys_munlock(2) with racct
In-Reply-To: <20130721215038.GE13628@caravan.chchile.org>
References: <20130720112218.GD13628@caravan.chchile.org>
 <115EDC04-4B95-4FB5-9092-515188D8ADA7@rice.edu>
 <20130721215038.GE13628@caravan.chchile.org>
List-Id: Discussions about the use of FreeBSD-current

On Jul 21, 2013, at 2:50 PM, Jeremie Le Hen wrote:

> On Sat, Jul 20, 2013 at 08:33:35PM -0700, Alan Cox wrote:
>>
>> On Jul 20, 2013, at 4:22 AM, Jeremie Le Hen wrote:
>>
>>> Hi Edward, Alan,
>>>
>>> I plan to commit the following patch:
>>> http://people.freebsd.org/~jlh/racct_munlock.diff
>>>
>>> This solves the following panic:
>>>
>>> panic: racct_sub: freeing 301989888 of resource 5, which is more
>>> than allocated 73728 for pwsafe (pid 4177)
>>>
>>> The problem is that the racct code in sys_munlock() trusts the
>>> user's input too much.  vm_map_unwire_count() now returns how much
>>> memory has really been unwired.
>>>
>>> Any objection?
>>
>>
>> Can you elaborate on what you mean by "sys_munlock() trusts the
>> user's input too much"?  munlock(2) is supposed to return ENOMEM if
>> any addresses within the specified range are not backed by valid
>> mappings.  (Valid mappings with PROT_NONE access are something of a
>> gray area here.)  However, it is not an error for an application to
>> call munlock() on a range that isn't locked, or to call it a second,
>> third, etc. time on the same range.  Is that what is provoking your
>> panic?
>
> I'm using sysutils/pwsafe and it seems to mlock(2) 18 pages (end -
> start = 73728 bytes) but then munlock(2) 73728 pages :-).
> vm_map_unwire() seems to do the right thing with those buggy
> boundaries, but then the racct code assumes that those boundaries
> were correct as long as vm_map_unwire() returned successfully.  This
> is what causes the assertion failure.

It sounds like the application is calling munlock(2) on valid but
unwired memory.  That's not considered an error under the various Unix
standards that specify the behavior of munlock(2).  It's sloppy
programming but not buggy programming.
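
To make that concrete, here is a minimal standalone sketch (not taken
from the thread or from pwsafe) showing that munlock(2) on a valid
mapping that was never locked simply succeeds, so a successful
munlock() by itself says nothing about how much memory was actually
wired:

#include <sys/mman.h>

#include <err.h>
#include <unistd.h>

int
main(void)
{
        size_t len;
        void *p;

        /* Roughly pwsafe's 18 pages (73728 bytes with 4 KB pages). */
        len = 18 * (size_t)getpagesize();
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
                err(1, "mmap");
        /* The range was never mlock(2)ed; it is valid but unwired. */
        if (munlock(p, len) == -1)
                err(1, "munlock");      /* does not happen */
        return (0);
}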
>
>> By the way, sys_mlock() uses a simpler approach to deal with a
>> similar situation:
>>
>>         error = vm_map_wire(map, start, end,
>>             VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
>> #ifdef RACCT
>>         if (error != KERN_SUCCESS) {
>>                 PROC_LOCK(proc);
>>                 racct_set(proc, RACCT_MEMLOCK,
>>                     ptoa(pmap_wired_count(map->pmap)));
>>                 PROC_UNLOCK(proc);
>>         }
>> #endif
>>
>> However, the code in sys_mlock() for maintaining RACCT_MEMLOCK,
>> including the above snippet, is race-y.  Two simultaneous callers to
>> sys_mlock() will produce incorrect results.  (I haven't looked at
>> sys_mlockall() or vm_map_growstack().)
>
> Ok, I will commit that as the first bandaid then.
>
>
>> Also, a wired mapping can be destroyed by calling munmap(2) without
>> first calling munlock(2), in which case, RACCT_MEMLOCK will be
>> incorrect.
>
> So I think the right way to tackle this is to handle racct in the vm
> layer rather than at the syscall layer.

The VM system already maintains counters equivalent to RACCT_VMEM and
RACCT_MEMLOCK.  They are "map->size" and "pmap_wired_count(map->pmap)".
Instead of maintaining duplicate counters, could the resource
accounting framework be extended to support callbacks to obtain a
value when it's actually needed?

> vm_fault_{wire,unwire}() look like natural places for this, but racct
> functions require the struct proc to be locked.  Any idea how to
> handle this?
>
> --
> Jeremie Le Hen
>
> Scientists say the world is made up of Protons, Neutrons and Electrons.
> They forgot to mention Morons.
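
For concreteness, the "first bandaid" mentioned above might look
roughly like this in sys_munlock(), mirroring the quoted sys_mlock()
snippet: rather than subtracting the caller-supplied length from
RACCT_MEMLOCK, resynchronize it from the pmap's actual wired-page
count after the unwire.  The variable names (map, start, end, proc)
are carried over from the quoted code; this is only a sketch of the
idea, not the patch that was actually committed:

        error = vm_map_unwire(map, start, end,
            VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
#ifdef RACCT
        if (error == KERN_SUCCESS) {
                /*
                 * Account for what is actually wired now, rather than
                 * trusting the length the caller passed to munlock(2).
                 */
                PROC_LOCK(proc);
                racct_set(proc, RACCT_MEMLOCK,
                    ptoa(pmap_wired_count(map->pmap)));
                PROC_UNLOCK(proc);
        }
#endif
        return (error == KERN_SUCCESS ? 0 : ENOMEM);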