From owner-freebsd-mips@FreeBSD.ORG Mon Aug  6 11:07:15 2012
From: FreeBSD bugmaster
Date: Mon, 6 Aug 2012 11:07:14 GMT
To: freebsd-mips@FreeBSD.org
Subject: Current problem reports assigned to freebsd-mips@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/165951  mips       [ar913x] [ath] DDR flush isn't being done for the WMAC
p kern/163670  mips       [mips][arge] arge can't allocate ring buffer on multip

2 problems total.
From owner-freebsd-mips@FreeBSD.ORG Mon Aug  6 14:40:04 2012
From: John Baldwin
Date: Mon, 6 Aug 2012 10:26:06 -0400
To: Peter Jeremy
Cc: arm@freebsd.org, mips@freebsd.org
Subject: On-stack allocation of DMA S/G lists

On Thursday, July 12, 2012 8:26:05 am John Baldwin wrote:
> On Sunday, July 08, 2012 7:05:16 am Peter Jeremy wrote:
> > BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've
> > noticed they appear to potentially allocate substantial kernel stack
> > under some conditions as several bus_dma(9) functions include:
> > 	bus_dma_segment_t dm_segments[dmat->nsegments];
> > What prevents this overflowing the kernel stack?
>
> That does seem dubious.  x86 stores the array in the tag instead.

I have an untested patch to change bus-dma on arm and mips to allocate a
dynamic S/G list in each DMA tag on first use instead of using on-stack
allocation (which I think is rather bogus).  Can folks review and test this
patch please?  Thanks.
http://www.FreeBSD.org/~jhb/patches/arm_mips_dynamic_dma_segs.patch

--
John Baldwin

From owner-freebsd-mips@FreeBSD.ORG Mon Aug  6 19:42:06 2012
From: Stanislav Sedov
Date: Mon, 6 Aug 2012 12:41:50 -0700
To: John Baldwin
Cc: arm@freebsd.org, Peter Jeremy, mips@freebsd.org
Subject: Re: On-stack allocation of DMA S/G lists

On Mon, 6 Aug 2012 10:26:06 -0400
John Baldwin mentioned:

> On Thursday, July 12, 2012 8:26:05 am John Baldwin wrote:
> > On Sunday, July 08, 2012 7:05:16 am Peter Jeremy wrote:
> > > BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've
> > > noticed they appear to potentially allocate substantial kernel stack
> > > under some conditions as several bus_dma(9) functions include:
> > > 	bus_dma_segment_t dm_segments[dmat->nsegments];
> > > What prevents this overflowing the kernel stack?
> >
> > That does seem dubious.  x86 stores the array in the tag instead.
>
> I have an untested patch to change bus-dma on arm and mips to allocate a
> dynamic S/G list in each DMA tag on first use instead of using on-stack
> allocation (which I think is rather bogus).  Can folks review and test
> this patch please?  Thanks.
>
> http://www.FreeBSD.org/~jhb/patches/arm_mips_dynamic_dma_segs.patch

Seems to work fine for me on ARM.
I had to initialize mflags to 0 in one place to get it compiling though.
--
Stanislav Sedov
ST4096-RIPE

()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments

From owner-freebsd-mips@FreeBSD.ORG Mon Aug  6 20:13:27 2012
From: John Baldwin
Date: Mon, 6 Aug 2012 15:53:40 -0400
To: Stanislav Sedov
Cc: arm@freebsd.org, Peter Jeremy, mips@freebsd.org
Subject: Re: On-stack allocation of DMA S/G lists

On Monday, August 06, 2012 3:41:50 pm Stanislav Sedov wrote:
> On Mon, 6 Aug 2012 10:26:06 -0400
> John Baldwin mentioned:
> [... quote of the announcement and patch URL trimmed ...]
>
> Seems to work fine for me on ARM.
> I had to initialize mflags to 0 in one place to get it compiling though.

Ah, yes.  That's why x86 did that. :)  I've updated it to fix that, thanks
for testing!
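For readers following the thread without the patch handy, the approach under
discussion — growing a per-tag segment array lazily on first load, with the
malloc flags always initialized, per Stanislav's fix — looks roughly like the
sketch below.  This is a reconstruction from the thread, not the contents of
arm_mips_dynamic_dma_segs.patch; the tag field and helper names are
assumptions, and it presumes kernel context (malloc(9), M_DEVBUF):

/*
 * Sketch only: assumes a tag layout like x86's, where the S/G list
 * lives in the tag rather than on the kernel stack.
 */
static bus_dma_segment_t *
_bus_dmamap_segments(bus_dma_tag_t dmat, int flags)
{
	int mflags;

	/* The fix Stanislav needed: mflags must always be initialized. */
	if (flags & BUS_DMA_NOWAIT)
		mflags = M_NOWAIT;
	else
		mflags = M_WAITOK;

	/* Allocate the S/G list on first use and cache it in the tag. */
	if (dmat->segments == NULL)
		dmat->segments = malloc(sizeof(bus_dma_segment_t) *
		    dmat->nsegments, M_DEVBUF, mflags);
	return (dmat->segments);
}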
-- John Baldwin From owner-freebsd-mips@FreeBSD.ORG Tue Aug 7 07:13:18 2012 Return-Path: Delivered-To: freebsd-mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2DD4610657A7 for ; Tue, 7 Aug 2012 07:13:18 +0000 (UTC) (envelope-from ambrosehua@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id B5B318FC0C for ; Tue, 7 Aug 2012 07:13:17 +0000 (UTC) Received: by wgbds11 with SMTP id ds11so3304906wgb.31 for ; Tue, 07 Aug 2012 00:13:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=yULG6Nu3VXcMx8eHXmhAmejVA0KNt8yValoQ6W1rUXQ=; b=PksHu1ErEE4VomsiPSWoAPGbiHj8ytjRDLbui9TCbV0UciLcJYPYC3hVm5togJBmNz dpF6332Dy5q2wyGGyq/A4vbULqomu/UiJKtvYeTk092gj3Qr9WslKPnqUeYHqgCKq3WE 1Mi7/3cjoHciImI086104w95CThUxi5Olhfj3Abvkqx4Q99RoFfkbn7T2+myKOGmAzOR QARhTz0+aCPCfDudaeFmz/k0AZ3bSOpODbqJSnz+E7oV0bD5LQlIiO5tAf6aP/UCjYgt 7H93ozVjJ81RW8WuIQarzEO/GmbOvKoddUdafVULnRY310Aroz8tFRPD44fJetueyjrF /Ozg== MIME-Version: 1.0 Received: by 10.216.54.146 with SMTP id i18mr6782746wec.187.1344323596234; Tue, 07 Aug 2012 00:13:16 -0700 (PDT) Received: by 10.223.83.9 with HTTP; Tue, 7 Aug 2012 00:13:16 -0700 (PDT) Date: Tue, 7 Aug 2012 15:13:16 +0800 Message-ID: From: Paul Ambrose To: freebsd-mips@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: SYSCTL_INT emulate_fp error X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Aug 2012 07:13:18 -0000 When I read sys/mips/mips/trap.c, I found /* * FP emulation is assumed to work on O32, but the code is outdated and crufty * enough that it's a more sensible default to have it disabled when using * other ABIs. At the very least, it needs a lot of help in using * type-semantic ABI-oblivious macros for everything it does. 
 */
#if defined(__mips_o32)
static int emulate_fp = 1;
#else
static int emulate_fp = 0;
#endif
SYSCTL_INT(_machdep, OID_AUTO, emulate_fp, CTLFLAG_RW,
    &allow_unaligned_acc, 0, "Emulate unimplemented FPU instructions");

here &allow_unaligned_acc should be &emulate_fp

From owner-freebsd-mips@FreeBSD.ORG Tue Aug  7 08:34:56 2012
From: Warner Losh
Date: Tue, 7 Aug 2012 02:34:51 -0600
To: Paul Ambrose
Cc: freebsd-mips@freebsd.org
Subject: Re: SYSCTL_INT emulate_fp error

On Aug 7, 2012, at 1:13 AM, Paul Ambrose wrote:

> When I read sys/mips/mips/trap.c, I found
>
> /*
>  * FP emulation is assumed to work on O32, but the code is outdated and
>  * crufty enough that it's a more sensible default to have it disabled
>  * when using other ABIs.  At the very least, it needs a lot of help in
>  * using type-semantic ABI-oblivious macros for everything it does.
>  */
> #if defined(__mips_o32)
> static int emulate_fp = 1;
> #else
> static int emulate_fp = 0;
> #endif
> SYSCTL_INT(_machdep, OID_AUTO, emulate_fp, CTLFLAG_RW,
>     &allow_unaligned_acc, 0, "Emulate unimplemented FPU instructions");
>
> here &allow_unaligned_acc should be &emulate_fp

You're right.  Fixed.
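For reference, the corrected declaration simply points the sysctl at the
variable it describes.  This is reconstructed from the snippet Paul quoted
above, not copied from the actual commit:

#if defined(__mips_o32)
static int emulate_fp = 1;
#else
static int emulate_fp = 0;
#endif
/* The sysctl now exports emulate_fp itself, not allow_unaligned_acc. */
SYSCTL_INT(_machdep, OID_AUTO, emulate_fp, CTLFLAG_RW,
    &emulate_fp, 0, "Emulate unimplemented FPU instructions");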
Warner

From owner-freebsd-mips@FreeBSD.ORG Tue Aug  7 16:10:52 2012
From: Ian Lepore
Date: Tue, 07 Aug 2012 10:09:42 -0600
To: John Baldwin
Cc: arm@freebsd.org, mips@freebsd.org, Peter Jeremy
Subject: Re: On-stack allocation of DMA S/G lists

On Mon, 2012-08-06 at 10:26 -0400, John Baldwin wrote:
> On Thursday, July 12, 2012 8:26:05 am John Baldwin wrote:
> > On Sunday, July 08, 2012 7:05:16 am Peter Jeremy wrote:
> > > BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've
> > > noticed they appear to potentially allocate substantial kernel stack
> > > under some conditions as several bus_dma(9) functions include:
> > > 	bus_dma_segment_t dm_segments[dmat->nsegments];
> > > What prevents this overflowing the kernel stack?
> >
> > That does seem dubious.  x86 stores the array in the tag instead.
>
> I have an untested patch to change bus-dma on arm and mips to allocate a
> dynamic S/G list in each DMA tag on first use instead of using on-stack
> allocation (which I think is rather bogus).  Can folks review and test
> this patch please?  Thanks.
>
> http://www.FreeBSD.org/~jhb/patches/arm_mips_dynamic_dma_segs.patch

I'm worried about changing a per-mapping-call resource to a per-dma-tag
resource here.  What prevents the situation where you have two
bus_dmamap_load() calls in progress at the same time using different
buffers but the same tag?

I can't find anything in the docs that indicates you have to provide
external locking of the tag for map load/unload calls, or that even
implies the tag can be modified by a mapping operation.  The lockfunc
stuff related to creating the tag is documented as being used only
during a deferred callback.

The existing code seems to go out of its way to avoid modifying the tag
during a mapping operation.
For example, it decides at tag creation
time whether any bounce pages might ever be needed for the tag, and if
so it pre-sets a bounce zone in the tag, then at mapping time the bounce
zone is protected with its own lock when it gets modified.  To me this
feels like a way to specifically avoid the need to lock or modify the
tag during a mapping operation.

Assuming that all of the foregoing is moot for some reason I've
overlooked, then on a purely implementation level, could all the
duplicated code to allocate the array when necessary be moved into
bus_dmamap_load_buffer(), triggered by a NULL 'segs' pointer?

And just for the record, looking at the problem from an even more
distant vantage... is there really a problem with stack-allocating the
segments?  On a 64-bit arch the struct is like 16 bytes.  Typical usage
is to allocate a tag allowing 1 or just a few segments.  Is anyone
really going to create a tag specifying hundreds of segments that would
overflow the stack?  If they try, wouldn't failing the tag create be
good enough?

--
Ian

From owner-freebsd-mips@FreeBSD.ORG Tue Aug  7 18:09:41 2012
From: John Baldwin
Date: Tue, 7 Aug 2012 14:06:44 -0400
To: Ian Lepore
Cc: arm@freebsd.org, mips@freebsd.org, Peter Jeremy
Subject: Re: On-stack allocation of DMA S/G lists

On Tuesday, August 07, 2012 12:09:42 pm Ian Lepore wrote:
> On Mon, 2012-08-06 at 10:26 -0400, John Baldwin wrote:
> > I have an untested patch to change bus-dma on arm and mips to allocate a
> > dynamic S/G list in each DMA tag on first use instead of using on-stack
> > allocation (which I think is rather bogus).
> > Can folks review and test this patch please?  Thanks.
> >
> > http://www.FreeBSD.org/~jhb/patches/arm_mips_dynamic_dma_segs.patch
>
> I'm worried about changing a per-mapping-call resource to a per-dma-tag
> resource here.  What prevents the situation where you have two
> bus_dmamap_load() calls in progress at the same time using different
> buffers but the same tag?
>
> I can't find anything in the docs that indicates you have to provide
> external locking of the tag for map load/unload calls, or that even
> implies the tag can be modified by a mapping operation.  The lockfunc
> stuff related to creating the tag is documented as being used only
> during a deferred callback.

Actually, I do think it is implicit that you won't do concurrent loads on
a DMA tag, though that may not be obvious.  Keep in mind that this is what
x86's bus_dma has always done.  For storage drivers you certainly can't do
this or risk completing I/O requests out-of-order which can break an
upper-layer assumption in a filesystem.  Note that all other platforms do
this as well, only arm and mips allocate on the stack.

> The existing code seems to go out of its way to avoid modifying the tag
> during a mapping operation.  For example, it decides at tag creation
> time whether any bounce pages might ever be needed for the tag, and if
> so it pre-sets a bounce zone in the tag, then at mapping time the bounce
> zone is protected with its own lock when it gets modified.  To me this
> feels like a way to specifically avoid the need to lock or modify the
> tag during a mapping operation.
>
> Assuming that all of the foregoing is moot for some reason I've
> overlooked, then on a purely implementation level, could all the
> duplicated code to allocate the array when necessary be moved into
> bus_dmamap_load_buffer(), triggered by a NULL 'segs' pointer?

Nope, bus_dmamap_load() doesn't know which of M_NOWAIT / M_WAITOK is
appropriate to use.

> And just for the record, looking at the problem from an even more
> distant vantage... is there really a problem with stack-allocating the
> segments?  On a 64-bit arch the struct is like 16 bytes.  Typical usage
> is to allocate a tag allowing 1 or just a few segments.  Is anyone
> really going to create a tag specifying hundreds of segments that would
> overflow the stack?  If they try, wouldn't failing the tag create be
> good enough?

I/O devices can allocate tags with several S/G elements.  An mfi(4) tag on
i386 would use a 256 byte segments array (512 on amd64).  That's not
entirely trivial.  It would be worse if you couldn't depend on
dmat->nsegments and had to always allocate the full size.  Presumably
though we require C99 at that point (and it requires that?).
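To put concrete numbers on the stack-usage question, here is a small,
self-contained illustration.  The 16-byte struct matches Ian's estimate for
a 64-bit arch; the segment counts are hypothetical examples chosen for the
discussion, not measurements of any particular driver:

#include <stdio.h>

/* Illustrative stand-in for bus_dma_segment_t on a 64-bit arch:
 * a bus address plus a length, roughly 16 bytes. */
struct seg { unsigned long ds_addr; unsigned long ds_len; };

int main(void)
{
	/* A VLA 'bus_dma_segment_t segs[dmat->nsegments]' costs
	 * nsegments * sizeof(segment) bytes of kernel stack per call. */
	unsigned counts[] = { 1, 32, 8192 };	/* hypothetical nsegments */
	for (unsigned i = 0; i < 3; i++)
		printf("nsegments=%5u -> %7zu bytes on the stack\n",
		    counts[i], counts[i] * sizeof(struct seg));
	/* 8192 segments -> 128 KB, far beyond a typical FreeBSD kernel
	 * stack of only a few pages (8-16 KB). */
	return (0);
}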
--
John Baldwin

From owner-freebsd-mips@FreeBSD.ORG Tue Aug  7 21:25:57 2012
From: Peter Jeremy
Date: Wed, 8 Aug 2012 07:25:37 +1000
To: Ian Lepore
Cc: arm@freebsd.org, mips@freebsd.org
Subject: Re: On-stack allocation of DMA S/G lists

On 2012-Aug-07 10:09:42 -0600, Ian Lepore wrote:
> And just for the record, looking at the problem from an even more
> distant vantage... is there really a problem with stack-allocating the
> segments?  On a 64-bit arch the struct is like 16 bytes.  Typical usage
> is to allocate a tag allowing 1 or just a few segments.  Is anyone
> really going to create a tag specifying hundreds of segments that would
> overflow the stack?

The example that led me to study the code was drm(4).  Video cards
typically require fairly large allocations (32MB in my case) but don't
require the RAM to be contiguous - ie it created a tag with 8192
segments in my case.  This may not be relevant to most arm or mips
hosts but drm(4) is MI code that can (theoretically) be built on these
architectures and is a real example where a tag can have many
segments.

> If they try, wouldn't failing the tag create be good enough?

No.
We don't fail a tag create if it specifies that RAM above 4GB can be used when we don't have any. Why should be fail a tag create that allows the use of up to 8192 tags when we only support 1? --=20 Peter Jeremy --/NkBOFFp2J2Af1nK Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlAhh9EACgkQ/opHv/APuIfXtQCgkPCHfBAMkK0mE0tBmKqiwVva qO8AnA6dmeOhECocgwzP3A21OG/gEI/i =OnWm -----END PGP SIGNATURE----- --/NkBOFFp2J2Af1nK-- From owner-freebsd-mips@FreeBSD.ORG Tue Aug 7 21:46:03 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 397C41065670 for ; Tue, 7 Aug 2012 21:46:03 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from qmta08.emeryville.ca.mail.comcast.net (qmta08.emeryville.ca.mail.comcast.net [76.96.30.80]) by mx1.freebsd.org (Postfix) with ESMTP id 179ED8FC0A for ; Tue, 7 Aug 2012 21:46:03 +0000 (UTC) Received: from omta19.emeryville.ca.mail.comcast.net ([76.96.30.76]) by qmta08.emeryville.ca.mail.comcast.net with comcast id jx6e1j0011eYJf8A8xkxGr; Tue, 07 Aug 2012 21:44:57 +0000 Received: from damnhippie.dyndns.org ([24.8.232.202]) by omta19.emeryville.ca.mail.comcast.net with comcast id jxkv1j00F4NgCEG01xkwPi; Tue, 07 Aug 2012 21:44:56 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id q77LirUP008514; Tue, 7 Aug 2012 15:44:53 -0600 (MDT) (envelope-from freebsd@damnhippie.dyndns.org) From: Ian Lepore To: Peter Jeremy In-Reply-To: <20120807212537.GB10572@server.rulingia.com> References: <20120703111753.GB72292@server.rulingia.com> <20120708110516.GA38312@server.rulingia.com> <201207120826.05577.jhb@freebsd.org> <201208061026.06328.jhb@freebsd.org> <1344355782.1128.186.camel@revolution.hippie.lan> <20120807212537.GB10572@server.rulingia.com> Content-Type: text/plain; charset="us-ascii" Date: Tue, 07 Aug 2012 15:44:53 -0600 Message-ID: <1344375893.1128.230.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: arm@freebsd.org, mips@freebsd.org Subject: Re: On-stack allocation of DMA S/G lists X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Aug 2012 21:46:03 -0000 On Wed, 2012-08-08 at 07:25 +1000, Peter Jeremy wrote: > On 2012-Aug-07 10:09:42 -0600, Ian Lepore wrote: > >And just for the record, looking at the problem from an even more > >distant vantage... is there really a problem with stack-allocating the > >segments? On a 64-bit arch the struct is like 16 bytes. Typical usage > >is to allocate a tag allowing 1 or just a few segments. Is anyone > >really going to create a tag specifying hundreds of segments that would > >overflow the stack? > > The example that led me to study the code was drm(4). Video cards > typically require fairly large allocations (32MB in my case) but don't > require the RAM to be contiguous - ie it created a tag with 8192 > segments in my case. This may not be relevant to most arm or mips > hosts but drm(4) is MI code that can (theoretically) be built on these > architectures and is a real example where a tag can have many > segments. > > > If they try, wouldn't failing the tag create be good enough? > > No. 
> The caller specifies the hardware limits for the device.  They
> should not need to take into account implementation details that
> mean the full hardware capabilities are not needed.  We don't fail
> a tag create if it specifies that RAM above 4GB can be used when
> we don't have any.  Why should we fail a tag create that allows the
> use of up to 8192 segments when we only support 1?

Oh, good example.  I was wondering if there was any realistic need for
lots of segments, and a big video buffer is exactly that.

--
Ian

From owner-freebsd-mips@FreeBSD.ORG Wed Aug  8 16:10:07 2012
From: Alan Cox
Date: Wed, 08 Aug 2012 11:10:04 -0500
To: mips@freebsd.org
Cc: Alan Cox
Subject: mips pmap patch

Can someone please test this patch?  It applies some changes to the mips
pmap that were made a long time ago to the amd64 and i386 pmaps.  In
particular, it reduces the size of a pv entry.

Briefly, the big picture is that in order to move forward with further
locking refinements to the VM system's machine-independent layer, I need
to eliminate all uses of the page queues lock from every pmap.  In order
to remove the page queues lock from the mips pmap, I need to port the new
pv entry allocator from the amd64 and i386 pmaps.  This patch is
preparation for that.
Alan

[Attachment: mips_pmap9.patch]

Index: mips/mips/pmap.c
===================================================================
--- mips/mips/pmap.c	(revision 239097)
+++ mips/mips/pmap.c	(working copy)
@@ -164,7 +164,8 @@ static pv_entry_t pmap_pvh_remove(struct md_page *
 static __inline void pmap_changebit(vm_page_t m, int bit, boolean_t setem);
 static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va,
     vm_page_t m, vm_prot_t prot, vm_page_t mpte);
-static int pmap_remove_pte(struct pmap *pmap, pt_entry_t *ptq, vm_offset_t va);
+static int pmap_remove_pte(struct pmap *pmap, pt_entry_t *ptq, vm_offset_t va,
+    pd_entry_t pde);
 static void pmap_remove_page(struct pmap *pmap, vm_offset_t va);
 static void pmap_remove_entry(struct pmap *pmap, vm_page_t m, vm_offset_t va);
 static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_page_t mpte,
@@ -176,7 +177,7 @@ static int _pmap_unwire_pte_hold(pmap_t pmap, vm_o
 static vm_page_t pmap_allocpte(pmap_t pmap, vm_offset_t va, int flags);
 static vm_page_t _pmap_allocpte(pmap_t pmap, unsigned ptepindex, int flags);
-static int pmap_unuse_pt(pmap_t, vm_offset_t, vm_page_t);
+static int pmap_unuse_pt(pmap_t, vm_offset_t, pd_entry_t);
 static pt_entry_t init_pte_prot(vm_offset_t va, vm_page_t m, vm_prot_t prot);
 
 #ifdef SMP
@@ -973,8 +974,6 @@ _pmap_unwire_pte_hold(pmap_t pmap, vm_offset_t va,
 		pmap_unwire_pte_hold(pmap, va, pdpg);
 	}
 #endif
-	if (pmap->pm_ptphint == m)
-		pmap->pm_ptphint = NULL;
 
 	/*
 	 * If the page is finally unwired, simply free it.
@@ -989,25 +988,14 @@ _pmap_unwire_pte_hold(pmap_t pmap, vm_offset_t va,
  * conditionally free the page, and manage the hold/wire counts.
  */
 static int
-pmap_unuse_pt(pmap_t pmap, vm_offset_t va, vm_page_t mpte)
+pmap_unuse_pt(pmap_t pmap, vm_offset_t va, pd_entry_t pde)
 {
-	unsigned ptepindex;
-	pd_entry_t pteva;
+	vm_page_t mpte;
 
 	if (va >= VM_MAXUSER_ADDRESS)
 		return (0);
-
-	if (mpte == NULL) {
-		ptepindex = pmap_pde_pindex(va);
-		if (pmap->pm_ptphint &&
-		    (pmap->pm_ptphint->pindex == ptepindex)) {
-			mpte = pmap->pm_ptphint;
-		} else {
-			pteva = *pmap_pde(pmap, va);
-			mpte = PHYS_TO_VM_PAGE(MIPS_DIRECT_TO_PHYS(pteva));
-			pmap->pm_ptphint = mpte;
-		}
-	}
+	KASSERT(pde != 0, ("pmap_unuse_pt: pde != 0"));
+	mpte = PHYS_TO_VM_PAGE(MIPS_DIRECT_TO_PHYS(pde));
 	return (pmap_unwire_pte_hold(pmap, va, mpte));
 }
@@ -1019,7 +1007,6 @@ pmap_pinit0(pmap_t pmap)
 	PMAP_LOCK_INIT(pmap);
 	pmap->pm_segtab = kernel_segmap;
 	CPU_ZERO(&pmap->pm_active);
-	pmap->pm_ptphint = NULL;
 	for (i = 0; i < MAXCPU; i++) {
 		pmap->pm_asid[i].asid = PMAP_ASID_RESERVED;
 		pmap->pm_asid[i].gen = 0;
@@ -1079,7 +1066,6 @@ pmap_pinit(pmap_t pmap)
 	ptdva = MIPS_PHYS_TO_DIRECT(VM_PAGE_TO_PHYS(ptdpg));
 	pmap->pm_segtab = (pd_entry_t *)ptdva;
 	CPU_ZERO(&pmap->pm_active);
-	pmap->pm_ptphint = NULL;
 	for (i = 0; i < MAXCPU; i++) {
 		pmap->pm_asid[i].asid = PMAP_ASID_RESERVED;
 		pmap->pm_asid[i].gen = 0;
@@ -1156,17 +1142,11 @@ _pmap_allocpte(pmap_t pmap, unsigned ptepindex, in
 		/* Next level entry */
 		pde = (pd_entry_t *)*pdep;
 		pde[pdeindex] = (pd_entry_t)pageva;
-		pmap->pm_ptphint = m;
 	}
 #else
 	pmap->pm_segtab[ptepindex] = (pd_entry_t)pageva;
 #endif
 	pmap->pm_stats.resident_count++;
-
-	/*
-	 * Set the page table hint
-	 */
-	pmap->pm_ptphint = m;
 	return (m);
 }
@@ -1196,16 +1176,7 @@ retry:
 	 * count, and activate it.
 	 */
 	if (pde != NULL && *pde != NULL) {
-		/*
-		 * In order to get the page table page, try the hint first.
-		 */
-		if (pmap->pm_ptphint &&
-		    (pmap->pm_ptphint->pindex == ptepindex)) {
-			m = pmap->pm_ptphint;
-		} else {
-			m = PHYS_TO_VM_PAGE(MIPS_DIRECT_TO_PHYS(*pde));
-			pmap->pm_ptphint = m;
-		}
+		m = PHYS_TO_VM_PAGE(MIPS_DIRECT_TO_PHYS(*pde));
 		m->wire_count++;
 	} else {
 		/*
@@ -1351,6 +1322,7 @@ get_pv_entry(pmap_t locked_pmap)
 	static const struct timeval printinterval = { 60, 0 };
 	static struct timeval lastprint;
 	struct vpgqueues *vpq;
+	pd_entry_t *pde;
 	pt_entry_t *pte, oldpte;
 	pmap_t pmap;
 	pv_entry_t allocated_pv, next_pv, pv;
@@ -1389,8 +1361,10 @@ retry:
 			else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap))
 				continue;
 			pmap->pm_stats.resident_count--;
-			pte = pmap_pte(pmap, va);
-			KASSERT(pte != NULL, ("pte"));
+			pde = pmap_pde(pmap, va);
+			KASSERT(pde != NULL && *pde != 0,
+			    ("get_pv_entry: pde"));
+			pte = pmap_pde_to_pte(pde, va);
 			oldpte = *pte;
 			if (is_kernel_pmap(pmap))
 				*pte = PTE_G;
@@ -1406,7 +1380,7 @@ retry:
 			TAILQ_REMOVE(&pmap->pm_pvlist, pv, pv_plist);
 			m->md.pv_list_count--;
 			TAILQ_REMOVE(&m->md.pv_list, pv, pv_list);
-			pmap_unuse_pt(pmap, va, pv->pv_ptem);
+			pmap_unuse_pt(pmap, va, *pde);
 			if (pmap != locked_pmap)
 				PMAP_UNLOCK(pmap);
 			if (allocated_pv == NULL)
@@ -1513,7 +1487,6 @@ pmap_try_insert_pv_entry(pmap_t pmap, vm_page_t mp
 		pv_entry_count++;
 		pv->pv_va = va;
 		pv->pv_pmap = pmap;
-		pv->pv_ptem = mpte;
 		TAILQ_INSERT_TAIL(&pmap->pm_pvlist, pv, pv_plist);
 		TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_list);
 		m->md.pv_list_count++;
@@ -1526,7 +1499,8 @@ pmap_try_insert_pv_entry(pmap_t pmap, vm_page_t mp
  * pmap_remove_pte: do the things to unmap a page in a process
  */
 static int
-pmap_remove_pte(struct pmap *pmap, pt_entry_t *ptq, vm_offset_t va)
+pmap_remove_pte(struct pmap *pmap, pt_entry_t *ptq, vm_offset_t va,
+    pd_entry_t pde)
 {
 	pt_entry_t oldpte;
 	vm_page_t m;
@@ -1561,7 +1535,7 @@ static int
 		pmap_remove_entry(pmap, m, va);
 	}
-	return (pmap_unuse_pt(pmap, va, NULL));
+	return (pmap_unuse_pt(pmap, va, pde));
 }
 
 /*
@@ -1570,16 +1544,20 @@ static int
 static void
 pmap_remove_page(struct pmap *pmap, vm_offset_t va)
 {
+	pd_entry_t *pde;
 	pt_entry_t *ptq;
 
 	mtx_assert(&vm_page_queue_mtx, MA_OWNED);
 	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
-	ptq = pmap_pte(pmap, va);
+	pde = pmap_pde(pmap, va);
+	if (pde == NULL || *pde == 0)
+		return;
+	ptq = pmap_pde_to_pte(pde, va);
 
 	/*
 	 * if there is no pte for this address, just skip it!!!
 	 */
-	if (!ptq || !pte_test(ptq, PTE_V)) {
+	if (!pte_test(ptq, PTE_V)) {
 		return;
 	}
@@ -1591,7 +1569,7 @@ pmap_remove_page(struct pmap *pmap, vm_offset_t va
 	/*
 	 * get a local va for mappings for this pmap.
 	 */
-	(void)pmap_remove_pte(pmap, ptq, va);
+	(void)pmap_remove_pte(pmap, ptq, va, *pde);
 	pmap_invalidate_page(pmap, va);
 	return;
@@ -1673,6 +1651,7 @@ void
 pmap_remove_all(vm_page_t m)
 {
 	pv_entry_t pv;
+	pd_entry_t *pde;
 	pt_entry_t *pte, tpte;
 
 	KASSERT((m->oflags & VPO_UNMANAGED) == 0,
@@ -1694,7 +1673,9 @@ pmap_remove_all(vm_page_t m)
 
 		pv->pv_pmap->pm_stats.resident_count--;
 
-		pte = pmap_pte(pv->pv_pmap, pv->pv_va);
+		pde = pmap_pde(pv->pv_pmap, pv->pv_va);
+		KASSERT(pde != NULL && *pde != 0, ("pmap_remove_all: pde"));
+		pte = pmap_pde_to_pte(pde, pv->pv_va);
 		tpte = *pte;
 
 		if (is_kernel_pmap(pv->pv_pmap))
@@ -1719,7 +1700,7 @@ pmap_remove_all(vm_page_t m)
 		TAILQ_REMOVE(&pv->pv_pmap->pm_pvlist, pv, pv_plist);
 		TAILQ_REMOVE(&m->md.pv_list, pv, pv_list);
 		m->md.pv_list_count--;
-		pmap_unuse_pt(pv->pv_pmap, pv->pv_va, pv->pv_ptem);
+		pmap_unuse_pt(pv->pv_pmap, pv->pv_va, *pde);
 		PMAP_UNLOCK(pv->pv_pmap);
 		free_pv_entry(pv);
 	}
@@ -1925,7 +1906,6 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_prot_t
 		pv = get_pv_entry(pmap);
 		pv->pv_va = va;
 		pv->pv_pmap = pmap;
-		pv->pv_ptem = mpte;
 		TAILQ_INSERT_TAIL(&pmap->pm_pvlist, pv, pv_plist);
 		TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_list);
 		m->md.pv_list_count++;
@@ -2063,14 +2043,8 @@ pmap_enter_quick_locked(pmap_t pmap, vm_offset_t v
 		 * increment the hold count, and activate it.
 		 */
 		if (pde && *pde != 0) {
-			if (pmap->pm_ptphint &&
-			    (pmap->pm_ptphint->pindex == ptepindex)) {
-				mpte = pmap->pm_ptphint;
-			} else {
-				mpte = PHYS_TO_VM_PAGE(
-				    MIPS_DIRECT_TO_PHYS(*pde));
-				pmap->pm_ptphint = mpte;
-			}
+			mpte = PHYS_TO_VM_PAGE(
+			    MIPS_DIRECT_TO_PHYS(*pde));
 			mpte->wire_count++;
 		} else {
 			mpte = _pmap_allocpte(pmap, ptepindex,
@@ -2458,6 +2432,7 @@ pmap_page_exists_quick(pmap_t pmap, vm_page_t m)
 void
 pmap_remove_pages(pmap_t pmap)
 {
+	pd_entry_t *pde;
 	pt_entry_t *pte, tpte;
 	pv_entry_t pv, npv;
 	vm_page_t m;
@@ -2470,7 +2445,9 @@ pmap_remove_pages(pmap_t pmap)
 	PMAP_LOCK(pmap);
 	for (pv = TAILQ_FIRST(&pmap->pm_pvlist); pv != NULL; pv = npv) {
 
-		pte = pmap_pte(pv->pv_pmap, pv->pv_va);
+		pde = pmap_pde(pmap, pv->pv_va);
+		KASSERT(pde != NULL && *pde != 0, ("pmap_remove_pages: pde"));
+		pte = pmap_pde_to_pte(pde, pv->pv_va);
 		if (!pte_test(pte, PTE_V))
 			panic("pmap_remove_pages: page on pm_pvlist has no pte");
 		tpte = *pte;
@@ -2488,7 +2465,7 @@ pmap_remove_pages(pmap_t pmap)
 		KASSERT(m != NULL, ("pmap_remove_pages: bad tpte %#jx",
 		    (uintmax_t)tpte));
 
-		pv->pv_pmap->pm_stats.resident_count--;
+		pmap->pm_stats.resident_count--;
 
 		/*
 		 * Update the vm_page_t clean and reference bits.
@@ -2497,14 +2474,14 @@ pmap_remove_pages(pmap_t pmap)
 			vm_page_dirty(m);
 		}
 		npv = TAILQ_NEXT(pv, pv_plist);
-		TAILQ_REMOVE(&pv->pv_pmap->pm_pvlist, pv, pv_plist);
+		TAILQ_REMOVE(&pmap->pm_pvlist, pv, pv_plist);
 		m->md.pv_list_count--;
 		TAILQ_REMOVE(&m->md.pv_list, pv, pv_list);
 		if (TAILQ_FIRST(&m->md.pv_list) == NULL) {
 			vm_page_aflag_clear(m, PGA_WRITEABLE);
 		}
-		pmap_unuse_pt(pv->pv_pmap, pv->pv_va, pv->pv_ptem);
+		pmap_unuse_pt(pmap, pv->pv_va, *pde);
 		free_pv_entry(pv);
 	}
 	pmap_invalidate_all(pmap);
Index: mips/include/pmap.h
===================================================================
--- mips/include/pmap.h	(revision 239097)
+++ mips/include/pmap.h	(working copy)
@@ -90,7 +90,6 @@ struct pmap {
 		u_int32_t gen:ASIDGEN_BITS;	/* its generation number */
 	} pm_asid[MAXSMPCPU];
 	struct pmap_statistics pm_stats;	/* pmap statistics */
-	struct vm_page *pm_ptphint;	/* pmap ptp hint */
 	struct mtx pm_mtx;
 };
@@ -126,7 +125,6 @@ typedef struct pv_entry {
 	vm_offset_t pv_va;	/* virtual address for mapping */
 	TAILQ_ENTRY(pv_entry) pv_list;
 	TAILQ_ENTRY(pv_entry) pv_plist;
-	vm_page_t pv_ptem;	/* VM page for pte */
 } *pv_entry_t;
 
 /*

From owner-freebsd-mips@FreeBSD.ORG Thu Aug  9 15:36:31 2012
From: Jayachandran C.
Date: Thu, 9 Aug 2012 21:06:29 +0530
To: Alan Cox
Cc: mips@freebsd.org
Subject: Re: mips pmap patch

On Wed, Aug 8, 2012 at 9:40 PM, Alan Cox wrote:
> Can someone please test this patch?  It applies some changes to the mips
> pmap that were made a long time ago to the amd64 and i386 pmaps.  In
> particular, it reduces the size of a pv entry.
>
> Briefly, the big picture is that in order to move forward with further
> locking refinements to the VM system's machine-independent layer, I need
> to eliminate all uses of the page queues lock from every pmap.
> In order to remove the page queues lock from the mips pmap, I need to
> port the new pv entry allocator from the amd64 and i386 pmaps.  This
> patch is preparation for that.

Tested the patch on XLP for about an hour ('make -j 64 buildworld' on
32 cpu mips64) and did not see any issues.

JC.

From owner-freebsd-mips@FreeBSD.ORG Fri Aug 10 11:57:52 2012
From: Paul Ambrose
Date: Fri, 10 Aug 2012 19:57:50 +0800
To: freebsd-mips@freebsd.org
Subject: tlb.c tlb_invalidate_all_user simplified

when I saw this commit,

commit e60abd7cb37fb1b66e8b0c48050aa56732c73e90
Author: alc
Date:   Fri Aug 10 05:00:50 2012 +0000

    Merge r134393 from amd64/i386:
      The machine-independent parts of the virtual memory system always
      pass a valid pmap to the pmap functions that require one.  Remove
      the checks for NULL.  (These checks have their origins in the Mach
      pmap.c that was integrated into BSD.  None of the new code written
      specifically for FreeBSD included them.)

according to the description here, pmap should not be null, I think
tlb_invalidate_all_user(struct pmap *pmap), in mips/mips/tlb.c

---------------------------------------------------------------------------
tlb_invalidate_all_user(struct pmap *pmap)
{
	register_t asid;
	register_t s;
	unsigned i;

	s = intr_disable();
	asid = mips_rd_entryhi() & TLBHI_ASID_MASK;

	for (i = mips_rd_wired(); i < num_tlbentries; i++) {
		register_t uasid;

		mips_wr_index(i);
		tlb_read();

		uasid = mips_rd_entryhi() & TLBHI_ASID_MASK;
		if (pmap == NULL) {
			/*
			 * Invalidate all non-kernel entries.
			 */
			if (uasid == 0)
				continue;
		} else {
			/*
			 * Invalidate this pmap's entries.
			 */
			if (uasid != pmap_asid(pmap))
				continue;
		}
		tlb_invalidate_one(i);
	}

	mips_wr_entryhi(asid);
	intr_restore(s);
}
---------------------------------------------------------------------------
could be simplified like this:
---------------------------------------------------------------------------
tlb_invalidate_all_user(struct pmap *pmap)
{
	register_t asid;
	register_t s;
	unsigned i;

	s = intr_disable();
	asid = mips_rd_entryhi() & TLBHI_ASID_MASK;

	for (i = mips_rd_wired(); i < num_tlbentries; i++) {
		register_t uasid;

		mips_wr_index(i);
		tlb_read();

		uasid = mips_rd_entryhi() & TLBHI_ASID_MASK;
		if ((uasid != pmap_asid(pmap)) || (uasid == 0))
			continue;
		tlb_invalidate_one(i);
	}

	mips_wr_entryhi(asid);
	intr_restore(s);
}
---------------------------------------------------------------------------

function tlb_invalidate_all_user(struct pmap *pmap) is ONLY called like
this: pmap_activate -> pmap_alloc_asid -> tlb_invalidate_all_user, I think
the pmap passed fulfils the assumption.  How do you like it?

From owner-freebsd-mips@FreeBSD.ORG Fri Aug 10 17:50:59 2012
From: Juli Mallett
Date: Fri, 10 Aug 2012 10:50:23 -0700
To: Paul Ambrose
Cc: freebsd-mips@freebsd.org
Subject: Re: tlb.c tlb_invalidate_all_user simplified

Actually, tlb_invalidate_all_user is beneath the pmap layer, and checks
for a NULL pmap are useful here because tlb_invalidate_all_user may be
passed a NULL pmap by the caller deliberately.

When a NULL pmap is passed, the behavior is changed: rather than ejecting
from the TLB any entries which belong to a specific user pmap, it ejects
all entries from the TLB that do not belong to the kernel.  This is
necessary because user address spaces are partitioned within the (shared)
TLB by something called an Address Space ID (ASID).
We allocate these linearly and keep a generation count to know whether one
is currently valid.  (This way we can avoid removing entries owned by a
task from the TLB during most context switches.)  That is, to know whether
a given pmap already has a valid ASID assigned to it.  Because the ASID
field is 8 bits, we must increment that generation count every time we
allocate our 256th (actually 255th since ASID 0 is reserved) ASID, and
know that any entries in the TLB belonging to tasks from the previous run
of ASIDs must be removed.  So we call tlb_invalidate_all_user(NULL), which
flushes all user entries, without flushing kernel entries and requiring
the kernel to reload a bunch of TLB entries just to keep running.

I hope that's remotely clear.  The need and use of this function touch
upon just about every complicated result of the MIPS TLB being software
managed.

Thanks,
Juli.

On Fri, Aug 10, 2012 at 4:57 AM, Paul Ambrose wrote:
> when I saw this commit,
>
> commit e60abd7cb37fb1b66e8b0c48050aa56732c73e90
> Author: alc
> Date:   Fri Aug 10 05:00:50 2012 +0000
>
>     Merge r134393 from amd64/i386:
>       The machine-independent parts of the virtual memory system always
>       pass a valid pmap to the pmap functions that require one.  Remove
>       the checks for NULL.  (These checks have their origins in the Mach
>       pmap.c that was integrated into BSD.  None of the new code written
>       specifically for FreeBSD included them.)
>
> according to the description here, pmap should not be null, I think
> tlb_invalidate_all_user(struct pmap *pmap), in mips/mips/tlb.c
> [... original and simplified versions of tlb_invalidate_all_user,
> quoted above in full, trimmed here ...]
>
> function tlb_invalidate_all_user(struct pmap *pmap) is ONLY called like
> this: pmap_activate -> pmap_alloc_asid -> tlb_invalidate_all_user, I
> think the pmap passed fulfils the assumption.  How do you like it?
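As an illustration of the mechanism Juli describes, a linear ASID allocator
with a generation counter can be sketched as follows.  The field and
constant names here are invented for the example and do not match
mips/mips/pmap.c; only tlb_invalidate_all_user(NULL) is taken from the
thread itself:

#define	ASID_BITS	8
#define	NUM_ASIDS	(1 << ASID_BITS)	/* 256; ASID 0 is the kernel's */

static unsigned next_asid = 1;		/* ASIDs are handed out linearly */
static unsigned asid_generation = 1;	/* bumped on each rollover */

/* Ensure 'pmap' has an ASID that is valid for the current generation. */
static void
asid_alloc(struct pmap *pmap)
{
	if (pmap->pm_asid_gen == asid_generation)
		return;			/* still valid; nothing to do */
	if (next_asid == NUM_ASIDS) {
		/*
		 * Ran out: start a new generation, invalidating every
		 * previously assigned ASID, and flush all non-kernel
		 * TLB entries without touching kernel entries.
		 */
		asid_generation++;
		next_asid = 1;
		tlb_invalidate_all_user(NULL);
	}
	pmap->pm_asid = next_asid++;
	pmap->pm_asid_gen = asid_generation;
}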
> _______________________________________________ > freebsd-mips@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-mips > To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org" From owner-freebsd-mips@FreeBSD.ORG Sat Aug 11 17:48:13 2012 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1CD4F106566B for ; Sat, 11 Aug 2012 17:48:13 +0000 (UTC) (envelope-from alc@rice.edu) Received: from mh11.mail.rice.edu (mh11.mail.rice.edu [128.42.199.30]) by mx1.freebsd.org (Postfix) with ESMTP id DA4798FC0C for ; Sat, 11 Aug 2012 17:48:12 +0000 (UTC) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 883234C01F0; Sat, 11 Aug 2012 12:48:06 -0500 (CDT) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 867964C01EB; Sat, 11 Aug 2012 12:48:06 -0500 (CDT) X-Virus-Scanned: by amavis-2.7.0 at mh11.mail.rice.edu, auth channel Received: from mh11.mail.rice.edu ([127.0.0.1]) by mh11.mail.rice.edu (mh11.mail.rice.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id w4xJ6o90reIf; Sat, 11 Aug 2012 12:48:06 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated sender: alc) by mh11.mail.rice.edu (Postfix) with ESMTPSA id 79D934C01D5; Sat, 11 Aug 2012 12:48:05 -0500 (CDT) Message-ID: <50269AD4.9050804@rice.edu> Date: Sat, 11 Aug 2012 12:48:04 -0500 From: Alan Cox User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:8.0) Gecko/20111113 Thunderbird/8.0 MIME-Version: 1.0 To: "Jayachandran C." References: <50228F5C.1000408@rice.edu> In-Reply-To: Content-Type: multipart/mixed; boundary="------------070100090001080509040401" X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: mips@freebsd.org Subject: Re: mips pmap patch X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Aug 2012 17:48:13 -0000 This is a multi-part message in MIME format. --------------070100090001080509040401 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 08/09/2012 10:36, Jayachandran C. wrote: > On Wed, Aug 8, 2012 at 9:40 PM, Alan Cox > wrote: > > Can someone please test this patch? It applies some changes to > the mips pmap that were made a long time ago to the amd64 and i386 > pmaps. In particular, it reduces the size of a pv entry. > > Briefly, the big picture is that in order to move forward with > further locking refinements to the VM system's machine-independent > layer, I need to eliminate all uses of the page queues lock from > every pmap. In order to remove the page queues lock from the mips > pmap, I need to port the new pv entry allocator from the amd64 and > i386 pmaps. This patch is preparation for that. > > > Tested the patch on XLP for about an hour ('make -j 64 buildworld' on > 32 cpu mips64) and did not see any issues. > Thank you for the quick response. I am attaching the next patch for testing. This patch does two things: 1. It ports the new PV entry allocator from x86. This new allocator has two virtues. First, it doesn't use the page queues lock. 
Second, it shrinks the size of a PV entry by almost half. 2. I observed and fixed a rather serious bug in pmap_remove_write(). After removing write access from the physical page's first mapping, pmap_remove_write() then used the wrong "next" pointer. So, the page's second, third, etc. mapping would not be write protected. Instead, some arbitrary mapping for a completely different page would be write protected, likely leading to spurious page faults later to reestablish write access to that mapping. This patch needs testing in both 32 bit and 64 bit kernels. Thanks, Alan --------------070100090001080509040401 Content-Type: text/plain; name="mips_pmap15.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="mips_pmap15.patch" Index: conf/options.mips =================================================================== --- conf/options.mips (revision 239097) +++ conf/options.mips (working copy) @@ -75,3 +75,7 @@ IF_RT_DEBUG opt_if_rt.h IF_RT_PHY_SUPPORT opt_if_rt.h IF_RT_RING_DATA_COUNT opt_if_rt.h +# +# Options that affect the pmap. +# +PV_STATS opt_pmap.h Index: mips/mips/pmap.c =================================================================== --- mips/mips/pmap.c (revision 239170) +++ mips/mips/pmap.c (working copy) @@ -69,6 +69,7 @@ __FBSDID("$FreeBSD$"); #include "opt_ddb.h" +#include "opt_pmap.h" #include #include @@ -77,6 +78,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #ifdef DDB #include #endif @@ -116,6 +118,12 @@ __FBSDID("$FreeBSD$"); #define PMAP_INLINE #endif +#ifdef PV_STATS +#define PV_STAT(x) do { x ; } while (0) +#else +#define PV_STAT(x) do { } while (0) +#endif + /* * Get PDEs and PTEs for user/kernel address space */ @@ -152,12 +160,13 @@ static void pmap_asid_alloc(pmap_t pmap); /* * Data for the pv entry allocation mechanism */ -static uma_zone_t pvzone; -static struct vm_object pvzone_obj; -static int pv_entry_count = 0, pv_entry_max = 0, pv_entry_high_water = 0; +static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); +static int pv_entry_count; -static PMAP_INLINE void free_pv_entry(pv_entry_t pv); -static pv_entry_t get_pv_entry(pmap_t locked_pmap); +static void free_pv_chunk(struct pv_chunk *pc); +static void free_pv_entry(pmap_t pmap, pv_entry_t pv); +static pv_entry_t get_pv_entry(pmap_t pmap, boolean_t try); +static vm_page_t pmap_pv_reclaim(pmap_t locked_pmap); static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va); static pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va); @@ -472,7 +481,7 @@ pmap_create_kernel_pagetable(void) PMAP_LOCK_INIT(kernel_pmap); kernel_pmap->pm_segtab = kernel_segmap; CPU_FILL(&kernel_pmap->pm_active); - TAILQ_INIT(&kernel_pmap->pm_pvlist); + TAILQ_INIT(&kernel_pmap->pm_pvchunk); kernel_pmap->pm_asid[0].asid = PMAP_ASID_RESERVED; kernel_pmap->pm_asid[0].gen = 0; kernel_vm_end += nkpt * NPTEPG * PAGE_SIZE; @@ -591,7 +600,6 @@ pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); - m->md.pv_list_count = 0; m->md.pv_flags = 0; } @@ -599,23 +607,10 @@ pmap_page_init(vm_page_t m) * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. - * pmap_init has been enhanced to support in a fairly consistant - * way, discontiguous physical memory. */ void pmap_init(void) { - - /* - * Initialize the address space (zone) for the pv entries. Set a - * high water mark so that the system can recover from excessive - * numbers of pv entries. 
- */ - pvzone = uma_zcreate("PV ENTRY", sizeof(struct pv_entry), NULL, NULL, - NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); - pv_entry_max = PMAP_SHPGPERPROC * maxproc + cnt.v_page_count; - pv_entry_high_water = 9 * (pv_entry_max / 10); - uma_zone_set_obj(pvzone, &pvzone_obj, pv_entry_max); } /*************************************************** @@ -1012,7 +1007,7 @@ pmap_pinit0(pmap_t pmap) pmap->pm_asid[i].gen = 0; } PCPU_SET(curpmap, pmap); - TAILQ_INIT(&pmap->pm_pvlist); + TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); } @@ -1070,7 +1065,7 @@ pmap_pinit(pmap_t pmap) pmap->pm_asid[i].asid = PMAP_ASID_RESERVED; pmap->pm_asid[i].gen = 0; } - TAILQ_INIT(&pmap->pm_pvlist); + TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); return (1); @@ -1296,127 +1291,320 @@ pmap_growkernel(vm_offset_t addr) } /*************************************************** -* page management routines. + * page management routines. ***************************************************/ +CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE); +#ifdef __mips_n64 +CTASSERT(_NPCM == 3); +CTASSERT(_NPCPV == 168); +#else +CTASSERT(_NPCM == 11); +CTASSERT(_NPCPV == 336); +#endif + +static __inline struct pv_chunk * +pv_to_chunk(pv_entry_t pv) +{ + + return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK)); +} + +#define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap) + +#ifdef __mips_n64 +#define PC_FREE0_1 0xfffffffffffffffful +#define PC_FREE2 0x000000fffffffffful +#else +#define PC_FREE0_9 0xfffffffful /* Free values for index 0 through 9 */ +#define PC_FREE10 0x0000fffful /* Free values for index 10 */ +#endif + +static const u_long pc_freemask[_NPCM] = { +#ifdef __mips_n64 + PC_FREE0_1, PC_FREE0_1, PC_FREE2 +#else + PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, + PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, + PC_FREE0_9, PC_FREE0_9, PC_FREE0_9, + PC_FREE0_9, PC_FREE10 +#endif +}; + +static SYSCTL_NODE(_vm, OID_AUTO, pmap, CTLFLAG_RD, 0, "VM/pmap parameters"); + +SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0, + "Current number of pv entries"); + +#ifdef PV_STATS +static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail; + +SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0, + "Current number of pv entry chunks"); +SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs, 0, + "Current number of pv entry chunks allocated"); +SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0, + "Current number of pv entry chunks frees"); +SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail, 0, + "Number of times tried to get a chunk page but failed."); + +static long pv_entry_frees, pv_entry_allocs; +static int pv_entry_spare; + +SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees, 0, + "Current number of pv entry frees"); +SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs, 0, + "Current number of pv entry allocs"); +SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0, + "Current number of spare pv entries"); +#endif + /* + * We are in a serious low memory condition. Resort to + * drastic measures to free some pages so we can allocate + * another pv entry chunk. 
+ */ +static vm_page_t +pmap_pv_reclaim(pmap_t locked_pmap) +{ + struct pch newtail; + struct pv_chunk *pc; + pd_entry_t *pde; + pmap_t pmap; + pt_entry_t *pte, oldpte; + pv_entry_t pv; + vm_offset_t va; + vm_page_t m, m_pc; + u_long inuse; + int bit, field, freed, idx; + + PMAP_LOCK_ASSERT(locked_pmap, MA_OWNED); + pmap = NULL; + m_pc = NULL; + TAILQ_INIT(&newtail); + while ((pc = TAILQ_FIRST(&pv_chunks)) != NULL) { + TAILQ_REMOVE(&pv_chunks, pc, pc_lru); + if (pmap != pc->pc_pmap) { + if (pmap != NULL) { + pmap_invalidate_all(pmap); + if (pmap != locked_pmap) + PMAP_UNLOCK(pmap); + } + pmap = pc->pc_pmap; + /* Avoid deadlock and lock recursion. */ + if (pmap > locked_pmap) + PMAP_LOCK(pmap); + else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap)) { + pmap = NULL; + TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); + continue; + } + } + + /* + * Destroy every non-wired, 4 KB page mapping in the chunk. + */ + freed = 0; + for (field = 0; field < _NPCM; field++) { + for (inuse = ~pc->pc_map[field] & pc_freemask[field]; + inuse != 0; inuse &= ~(1UL << bit)) { + bit = ffsl(inuse) - 1; + idx = field * sizeof(inuse) * NBBY + bit; + pv = &pc->pc_pventry[idx]; + va = pv->pv_va; + pde = pmap_pde(pmap, va); + KASSERT(pde != NULL && *pde != 0, + ("pmap_pv_reclaim: pde")); + pte = pmap_pde_to_pte(pde, va); + oldpte = *pte; + KASSERT(!pte_test(&oldpte, PTE_W), + ("wired pte for unwired page")); + if (is_kernel_pmap(pmap)) + *pte = PTE_G; + else + *pte = 0; + pmap_invalidate_page(pmap, va); + m = PHYS_TO_VM_PAGE(TLBLO_PTE_TO_PA(oldpte)); + if (pte_test(&oldpte, PTE_D)) + vm_page_dirty(m); + if (m->md.pv_flags & PV_TABLE_REF) + vm_page_aflag_set(m, PGA_REFERENCED); + TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); + if (TAILQ_EMPTY(&m->md.pv_list)) { + vm_page_aflag_clear(m, PGA_WRITEABLE); + m->md.pv_flags &= ~(PV_TABLE_REF | + PV_TABLE_MOD); + } + pc->pc_map[field] |= 1UL << bit; + pmap_unuse_pt(pmap, va, *pde); + freed++; + } + } + if (freed == 0) { + TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); + continue; + } + /* Every freed mapping is for a 4 KB page. */ + pmap->pm_stats.resident_count -= freed; + PV_STAT(pv_entry_frees += freed); + PV_STAT(pv_entry_spare += freed); + pv_entry_count -= freed; + TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); + for (field = 0; field < _NPCM; field++) + if (pc->pc_map[field] != pc_freemask[field]) { + TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, + pc_list); + TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); + + /* + * One freed pv entry in locked_pmap is + * sufficient. + */ + if (pmap == locked_pmap) + goto out; + break; + } + if (field == _NPCM) { + PV_STAT(pv_entry_spare -= _NPCPV); + PV_STAT(pc_chunk_count--); + PV_STAT(pc_chunk_frees++); + /* Entire chunk is free; return it. 
*/ + m_pc = PHYS_TO_VM_PAGE(MIPS_DIRECT_TO_PHYS( + (vm_offset_t)pc)); + break; + } + } +out: + TAILQ_CONCAT(&pv_chunks, &newtail, pc_lru); + if (pmap != NULL) { + pmap_invalidate_all(pmap); + if (pmap != locked_pmap) + PMAP_UNLOCK(pmap); + } + return (m_pc); +} + +/* * free the pv_entry back to the free list */ -static PMAP_INLINE void -free_pv_entry(pv_entry_t pv) +static void +free_pv_entry(pmap_t pmap, pv_entry_t pv) { + struct pv_chunk *pc; + int bit, field, idx; + mtx_assert(&vm_page_queue_mtx, MA_OWNED); + PMAP_LOCK_ASSERT(pmap, MA_OWNED); + PV_STAT(pv_entry_frees++); + PV_STAT(pv_entry_spare++); pv_entry_count--; - uma_zfree(pvzone, pv); + pc = pv_to_chunk(pv); + idx = pv - &pc->pc_pventry[0]; + field = idx / (sizeof(u_long) * NBBY); + bit = idx % (sizeof(u_long) * NBBY); + pc->pc_map[field] |= 1ul << bit; + for (idx = 0; idx < _NPCM; idx++) + if (pc->pc_map[idx] != pc_freemask[idx]) { + /* + * 98% of the time, pc is already at the head of the + * list. If it isn't already, move it to the head. + */ + if (__predict_false(TAILQ_FIRST(&pmap->pm_pvchunk) != + pc)) { + TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); + TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, + pc_list); + } + return; + } + TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); + free_pv_chunk(pc); } +static void +free_pv_chunk(struct pv_chunk *pc) +{ + vm_page_t m; + + TAILQ_REMOVE(&pv_chunks, pc, pc_lru); + PV_STAT(pv_entry_spare -= _NPCPV); + PV_STAT(pc_chunk_count--); + PV_STAT(pc_chunk_frees++); + /* entire chunk is free, return it */ + m = PHYS_TO_VM_PAGE(MIPS_DIRECT_TO_PHYS((vm_offset_t)pc)); + vm_page_unwire(m, 0); + vm_page_free(m); +} + /* * get a new pv_entry, allocating a block from the system * when needed. - * the memory allocation is performed bypassing the malloc code - * because of the possibility of allocations at interrupt time. */ static pv_entry_t -get_pv_entry(pmap_t locked_pmap) +get_pv_entry(pmap_t pmap, boolean_t try) { - static const struct timeval printinterval = { 60, 0 }; - static struct timeval lastprint; - struct vpgqueues *vpq; - pd_entry_t *pde; - pt_entry_t *pte, oldpte; - pmap_t pmap; - pv_entry_t allocated_pv, next_pv, pv; - vm_offset_t va; + struct pv_chunk *pc; + pv_entry_t pv; vm_page_t m; + int bit, field, idx; - PMAP_LOCK_ASSERT(locked_pmap, MA_OWNED); mtx_assert(&vm_page_queue_mtx, MA_OWNED); - allocated_pv = uma_zalloc(pvzone, M_NOWAIT); - if (allocated_pv != NULL) { - pv_entry_count++; - if (pv_entry_count > pv_entry_high_water) - pagedaemon_wakeup(); - else - return (allocated_pv); - } - /* - * Reclaim pv entries: At first, destroy mappings to inactive - * pages. After that, if a pv entry is still needed, destroy - * mappings to active pages. - */ - if (ratecheck(&lastprint, &printinterval)) - printf("Approaching the limit on PV entries, " - "increase the vm.pmap.shpgperproc tunable.\n"); - vpq = &vm_page_queues[PQ_INACTIVE]; + PMAP_LOCK_ASSERT(pmap, MA_OWNED); + PV_STAT(pv_entry_allocs++); + pv_entry_count++; retry: - TAILQ_FOREACH(m, &vpq->pl, pageq) { - if ((m->flags & PG_MARKER) != 0 || m->hold_count || m->busy) - continue; - TAILQ_FOREACH_SAFE(pv, &m->md.pv_list, pv_list, next_pv) { - va = pv->pv_va; - pmap = pv->pv_pmap; - /* Avoid deadlock and lock recursion. 
*/ - if (pmap > locked_pmap) - PMAP_LOCK(pmap); - else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap)) - continue; - pmap->pm_stats.resident_count--; - pde = pmap_pde(pmap, va); - KASSERT(pde != NULL && *pde != 0, - ("get_pv_entry: pde")); - pte = pmap_pde_to_pte(pde, va); - oldpte = *pte; - if (is_kernel_pmap(pmap)) - *pte = PTE_G; - else - *pte = 0; - KASSERT(!pte_test(&oldpte, PTE_W), - ("wired pte for unwired page")); - if (m->md.pv_flags & PV_TABLE_REF) - vm_page_aflag_set(m, PGA_REFERENCED); - if (pte_test(&oldpte, PTE_D)) - vm_page_dirty(m); - pmap_invalidate_page(pmap, va); - TAILQ_REMOVE(&pmap->pm_pvlist, pv, pv_plist); - m->md.pv_list_count--; - TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); - pmap_unuse_pt(pmap, va, *pde); - if (pmap != locked_pmap) - PMAP_UNLOCK(pmap); - if (allocated_pv == NULL) - allocated_pv = pv; - else - free_pv_entry(pv); + pc = TAILQ_FIRST(&pmap->pm_pvchunk); + if (pc != NULL) { + for (field = 0; field < _NPCM; field++) { + if (pc->pc_map[field]) { + bit = ffsl(pc->pc_map[field]) - 1; + break; + } } - if (TAILQ_EMPTY(&m->md.pv_list)) { - vm_page_aflag_clear(m, PGA_WRITEABLE); - m->md.pv_flags &= ~(PV_TABLE_REF | PV_TABLE_MOD); + if (field < _NPCM) { + idx = field * sizeof(pc->pc_map[field]) * NBBY + bit; + pv = &pc->pc_pventry[idx]; + pc->pc_map[field] &= ~(1ul << bit); + /* If this was the last item, move it to tail */ + for (field = 0; field < _NPCM; field++) + if (pc->pc_map[field] != 0) { + PV_STAT(pv_entry_spare--); + return (pv); /* not full, return */ + } + TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); + TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); + PV_STAT(pv_entry_spare--); + return (pv); } } - if (allocated_pv == NULL) { - if (vpq == &vm_page_queues[PQ_INACTIVE]) { - vpq = &vm_page_queues[PQ_ACTIVE]; + /* No free items, allocate another chunk */ + m = vm_page_alloc_freelist(VM_FREELIST_DIRECT, VM_ALLOC_NORMAL | + VM_ALLOC_WIRED); + if (m == NULL) { + if (try) { + pv_entry_count--; + PV_STAT(pc_chunk_tryfail++); + return (NULL); + } + m = pmap_pv_reclaim(pmap); + if (m == NULL) goto retry; - } - panic("get_pv_entry: increase the vm.pmap.shpgperproc tunable"); } - return (allocated_pv); + PV_STAT(pc_chunk_count++); + PV_STAT(pc_chunk_allocs++); + pc = (struct pv_chunk *)MIPS_PHYS_TO_DIRECT(VM_PAGE_TO_PHYS(m)); + pc->pc_pmap = pmap; + pc->pc_map[0] = pc_freemask[0] & ~1ul; /* preallocated bit 0 */ + for (field = 1; field < _NPCM; field++) + pc->pc_map[field] = pc_freemask[field]; + TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); + pv = &pc->pc_pventry[0]; + TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); + PV_STAT(pv_entry_spare += _NPCPV - 1); + return (pv); } /* - * Revision 1.370 - * - * Move pmap_collect() out of the machine-dependent code, rename it - * to reflect its new location, and add page queue and flag locking. - * - * Notes: (1) alpha, i386, and ia64 had identical implementations - * of pmap_collect() in terms of machine-independent interfaces; - * (2) sparc64 doesn't require it; (3) powerpc had it as a TODO. - * - * MIPS implementation was identical to alpha [Junos 8.2] - */ - -/* * If it is the first entry on the list, it is actually * in the header and we must copy the following entry up * to the header. 
Otherwise we must search the list for @@ -1428,24 +1616,13 @@ pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, { pv_entry_t pv; - PMAP_LOCK_ASSERT(pmap, MA_OWNED); mtx_assert(&vm_page_queue_mtx, MA_OWNED); - if (pvh->pv_list_count < pmap->pm_stats.resident_count) { - TAILQ_FOREACH(pv, &pvh->pv_list, pv_list) { - if (pmap == pv->pv_pmap && va == pv->pv_va) - break; + TAILQ_FOREACH(pv, &pvh->pv_list, pv_list) { + if (pmap == PV_PMAP(pv) && va == pv->pv_va) { + TAILQ_REMOVE(&pvh->pv_list, pv, pv_list); + break; } - } else { - TAILQ_FOREACH(pv, &pmap->pm_pvlist, pv_plist) { - if (va == pv->pv_va) - break; - } } - if (pv != NULL) { - TAILQ_REMOVE(&pvh->pv_list, pv, pv_list); - pvh->pv_list_count--; - TAILQ_REMOVE(&pmap->pm_pvlist, pv, pv_plist); - } return (pv); } @@ -1458,7 +1635,7 @@ pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm KASSERT(pv != NULL, ("pmap_pvh_free: pv not found, pa %lx va %lx", (u_long)VM_PAGE_TO_PHYS(member2struct(vm_page, md, pvh)), (u_long)va)); - free_pv_entry(pv); + free_pv_entry(pmap, pv); } static void @@ -1482,14 +1659,9 @@ pmap_try_insert_pv_entry(pmap_t pmap, vm_page_t mp PMAP_LOCK_ASSERT(pmap, MA_OWNED); mtx_assert(&vm_page_queue_mtx, MA_OWNED); - if (pv_entry_count < pv_entry_high_water && - (pv = uma_zalloc(pvzone, M_NOWAIT)) != NULL) { - pv_entry_count++; + if ((pv = get_pv_entry(pmap, TRUE)) != NULL) { pv->pv_va = va; - pv->pv_pmap = pmap; - TAILQ_INSERT_TAIL(&pmap->pm_pvlist, pv, pv_plist); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_list); - m->md.pv_list_count++; return (TRUE); } else return (FALSE); @@ -1648,6 +1820,7 @@ void pmap_remove_all(vm_page_t m) { pv_entry_t pv; + pmap_t pmap; pd_entry_t *pde; pt_entry_t *pte, tpte; @@ -1659,29 +1832,30 @@ pmap_remove_all(vm_page_t m) vm_page_aflag_set(m, PGA_REFERENCED); while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { - PMAP_LOCK(pv->pv_pmap); + pmap = PV_PMAP(pv); + PMAP_LOCK(pmap); /* * If it's last mapping writeback all caches from * the page being destroyed */ - if (m->md.pv_list_count == 1) + if (TAILQ_NEXT(pv, pv_list) == NULL) mips_dcache_wbinv_range_index(pv->pv_va, PAGE_SIZE); - pv->pv_pmap->pm_stats.resident_count--; + pmap->pm_stats.resident_count--; - pde = pmap_pde(pv->pv_pmap, pv->pv_va); + pde = pmap_pde(pmap, pv->pv_va); KASSERT(pde != NULL && *pde != 0, ("pmap_remove_all: pde")); pte = pmap_pde_to_pte(pde, pv->pv_va); tpte = *pte; - if (is_kernel_pmap(pv->pv_pmap)) + if (is_kernel_pmap(pmap)) *pte = PTE_G; else *pte = 0; if (pte_test(&tpte, PTE_W)) - pv->pv_pmap->pm_stats.wired_count--; + pmap->pm_stats.wired_count--; /* * Update the vm_page_t clean and reference bits. 
@@ -1692,14 +1866,12 @@ pmap_remove_all(vm_page_t m) __func__, (void *)pv->pv_va, (uintmax_t)tpte)); vm_page_dirty(m); } - pmap_invalidate_page(pv->pv_pmap, pv->pv_va); + pmap_invalidate_page(pmap, pv->pv_va); - TAILQ_REMOVE(&pv->pv_pmap->pm_pvlist, pv, pv_plist); TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); - m->md.pv_list_count--; - pmap_unuse_pt(pv->pv_pmap, pv->pv_va, *pde); - PMAP_UNLOCK(pv->pv_pmap); - free_pv_entry(pv); + pmap_unuse_pt(pmap, pv->pv_va, *pde); + free_pv_entry(pmap, pv); + PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); @@ -1894,14 +2066,11 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_prot_t KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva, ("pmap_enter: managed mapping within the clean submap")); if (pv == NULL) - pv = get_pv_entry(pmap); + pv = get_pv_entry(pmap, FALSE); pv->pv_va = va; - pv->pv_pmap = pmap; - TAILQ_INSERT_TAIL(&pmap->pm_pvlist, pv, pv_plist); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_list); - m->md.pv_list_count++; } else if (pv != NULL) - free_pv_entry(pv); + free_pv_entry(pmap, pv); /* * Increment counters @@ -2397,7 +2566,7 @@ pmap_page_exists_quick(pmap_t pmap, vm_page_t m) rv = FALSE; vm_page_lock_queues(); TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { - if (pv->pv_pmap == pmap) { + if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } @@ -2422,8 +2591,11 @@ pmap_remove_pages(pmap_t pmap) { pd_entry_t *pde; pt_entry_t *pte, tpte; - pv_entry_t pv, npv; + pv_entry_t pv; vm_page_t m; + struct pv_chunk *pc, *npc; + u_long inuse, bitmask; + int allfree, bit, field, idx; if (pmap != vmspace_pmap(curthread->td_proc->p_vmspace)) { printf("warning: pmap_remove_pages called with non-current pmap\n"); @@ -2431,46 +2603,61 @@ pmap_remove_pages(pmap_t pmap) } vm_page_lock_queues(); PMAP_LOCK(pmap); - for (pv = TAILQ_FIRST(&pmap->pm_pvlist); pv != NULL; pv = npv) { + TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { + allfree = 1; + for (field = 0; field < _NPCM; field++) { + inuse = ~pc->pc_map[field] & pc_freemask[field]; + while (inuse != 0) { + bit = ffsl(inuse) - 1; + bitmask = 1UL << bit; + idx = field * sizeof(inuse) * NBBY + bit; + pv = &pc->pc_pventry[idx]; + inuse &= ~bitmask; - pde = pmap_pde(pmap, pv->pv_va); - KASSERT(pde != NULL && *pde != 0, ("pmap_remove_pages: pde")); - pte = pmap_pde_to_pte(pde, pv->pv_va); - if (!pte_test(pte, PTE_V)) - panic("pmap_remove_pages: page on pm_pvlist has no pte"); - tpte = *pte; + pde = pmap_pde(pmap, pv->pv_va); + KASSERT(pde != NULL && *pde != 0, + ("pmap_remove_pages: pde")); + pte = pmap_pde_to_pte(pde, pv->pv_va); + if (!pte_test(pte, PTE_V)) + panic("pmap_remove_pages: bad pte"); + tpte = *pte; /* * We cannot remove wired pages from a process' mapping at this time */ - if (pte_test(&tpte, PTE_W)) { - npv = TAILQ_NEXT(pv, pv_plist); - continue; - } - *pte = is_kernel_pmap(pmap) ? PTE_G : 0; + if (pte_test(&tpte, PTE_W)) { + allfree = 0; + continue; + } + *pte = is_kernel_pmap(pmap) ? PTE_G : 0; - m = PHYS_TO_VM_PAGE(TLBLO_PTE_TO_PA(tpte)); - KASSERT(m != NULL, - ("pmap_remove_pages: bad tpte %#jx", (uintmax_t)tpte)); + m = PHYS_TO_VM_PAGE(TLBLO_PTE_TO_PA(tpte)); + KASSERT(m != NULL, + ("pmap_remove_pages: bad tpte %#jx", + (uintmax_t)tpte)); - pmap->pm_stats.resident_count--; + /* + * Update the vm_page_t clean and reference bits. + */ + if (pte_test(&tpte, PTE_D)) + vm_page_dirty(m); - /* - * Update the vm_page_t clean and reference bits. 
- */ - if (pte_test(&tpte, PTE_D)) { - vm_page_dirty(m); + /* Mark free */ + PV_STAT(pv_entry_frees++); + PV_STAT(pv_entry_spare++); + pv_entry_count--; + pc->pc_map[field] |= bitmask; + pmap->pm_stats.resident_count--; + TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); + if (TAILQ_EMPTY(&m->md.pv_list)) + vm_page_aflag_clear(m, PGA_WRITEABLE); + pmap_unuse_pt(pmap, pv->pv_va, *pde); + } } - npv = TAILQ_NEXT(pv, pv_plist); - TAILQ_REMOVE(&pmap->pm_pvlist, pv, pv_plist); - - m->md.pv_list_count--; - TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); - if (TAILQ_FIRST(&m->md.pv_list) == NULL) { - vm_page_aflag_clear(m, PGA_WRITEABLE); + if (allfree) { + TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); + free_pv_chunk(pc); } - pmap_unuse_pt(pmap, pv->pv_va, *pde); - free_pv_entry(pv); } pmap_invalidate_all(pmap); PMAP_UNLOCK(pmap); @@ -2486,21 +2673,20 @@ static boolean_t pmap_testbit(vm_page_t m, int bit) { pv_entry_t pv; + pmap_t pmap; pt_entry_t *pte; boolean_t rv = FALSE; if (m->oflags & VPO_UNMANAGED) return (rv); - if (TAILQ_FIRST(&m->md.pv_list) == NULL) - return (rv); - mtx_assert(&vm_page_queue_mtx, MA_OWNED); TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { - PMAP_LOCK(pv->pv_pmap); - pte = pmap_pte(pv->pv_pmap, pv->pv_va); + pmap = PV_PMAP(pv); + PMAP_LOCK(pmap); + pte = pmap_pte(pmap, pv->pv_va); rv = pte_test(pte, bit); - PMAP_UNLOCK(pv->pv_pmap); + PMAP_UNLOCK(pmap); if (rv) break; } @@ -2514,6 +2700,7 @@ static __inline void pmap_changebit(vm_page_t m, int bit, boolean_t setem) { pv_entry_t pv; + pmap_t pmap; pt_entry_t *pte; if (m->oflags & VPO_UNMANAGED) @@ -2525,11 +2712,12 @@ pmap_changebit(vm_page_t m, int bit, boolean_t set * setting RO do we need to clear the VAC? */ TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { - PMAP_LOCK(pv->pv_pmap); - pte = pmap_pte(pv->pv_pmap, pv->pv_va); + pmap = PV_PMAP(pv); + PMAP_LOCK(pmap); + pte = pmap_pte(pmap, pv->pv_va); if (setem) { *pte |= bit; - pmap_update_page(pv->pv_pmap, pv->pv_va, *pte); + pmap_update_page(pmap, pv->pv_va, *pte); } else { pt_entry_t pbits = *pte; @@ -2541,10 +2729,10 @@ pmap_changebit(vm_page_t m, int bit, boolean_t set } else { *pte = pbits & ~bit; } - pmap_update_page(pv->pv_pmap, pv->pv_va, *pte); + pmap_update_page(pmap, pv->pv_va, *pte); } } - PMAP_UNLOCK(pv->pv_pmap); + PMAP_UNLOCK(pmap); } if (!setem && bit == PTE_D) vm_page_aflag_clear(m, PGA_WRITEABLE); @@ -2569,7 +2757,7 @@ pmap_page_wired_mappings(vm_page_t m) return (count); vm_page_lock_queues(); TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { - pmap = pv->pv_pmap; + pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte = pmap_pte(pmap, pv->pv_va); if (pte_test(pte, PTE_W)) @@ -2586,9 +2774,9 @@ pmap_page_wired_mappings(vm_page_t m) void pmap_remove_write(vm_page_t m) { - pv_entry_t pv, npv; - vm_offset_t va; - pt_entry_t *pte; + pmap_t pmap; + pt_entry_t pbits, *pte; + pv_entry_t pv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_write: page %p is not managed", m)); @@ -2602,20 +2790,25 @@ pmap_remove_write(vm_page_t m) if ((m->oflags & VPO_BUSY) == 0 && (m->aflags & PGA_WRITEABLE) == 0) return; - - /* - * Loop over all current mappings setting/clearing as appropos. 
- */ vm_page_lock_queues(); - for (pv = TAILQ_FIRST(&m->md.pv_list); pv; pv = npv) { - npv = TAILQ_NEXT(pv, pv_plist); - pte = pmap_pte(pv->pv_pmap, pv->pv_va); - if (pte == NULL || !pte_test(pte, PTE_V)) - panic("page on pm_pvlist has no pte"); - - va = pv->pv_va; - pmap_protect(pv->pv_pmap, va, va + PAGE_SIZE, - VM_PROT_READ | VM_PROT_EXECUTE); + TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { + pmap = PV_PMAP(pv); + PMAP_LOCK(pmap); + pte = pmap_pte(pmap, pv->pv_va); + KASSERT(pte != NULL && pte_test(pte, PTE_V), + ("page on pv_list has no pte")); + pbits = *pte; + if (pte_test(&pbits, PTE_D)) { + pte_clear(&pbits, PTE_D); + vm_page_dirty(m); + m->md.pv_flags &= ~PV_TABLE_MOD; + } + pte_set(&pbits, PTE_RO); + if (pbits != *pte) { + *pte = pbits; + pmap_update_page(pmap, pv->pv_va, pbits); + } + PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); vm_page_unlock_queues(); Index: mips/include/pmap.h =================================================================== --- mips/include/pmap.h (revision 239152) +++ mips/include/pmap.h (working copy) @@ -66,9 +66,9 @@ * Pmap stuff */ struct pv_entry; +struct pv_chunk; struct md_page { - int pv_list_count; int pv_flags; TAILQ_HEAD(, pv_entry) pv_list; }; @@ -82,8 +82,7 @@ struct md_page { struct pmap { pd_entry_t *pm_segtab; /* KVA of segment table */ - TAILQ_HEAD(, pv_entry) pm_pvlist; /* list of mappings in - * pmap */ + TAILQ_HEAD(, pv_chunk) pm_pvchunk; /* list of mappings in pmap */ cpuset_t pm_active; /* active on cpus */ struct { u_int32_t asid:ASID_BITS; /* TLB address space tag */ @@ -121,13 +120,30 @@ extern struct pmap kernel_pmap_store; * mappings of that page. An entry is a pv_entry_t, the list is pv_table. */ typedef struct pv_entry { - pmap_t pv_pmap; /* pmap where mapping lies */ vm_offset_t pv_va; /* virtual address for mapping */ TAILQ_ENTRY(pv_entry) pv_list; - TAILQ_ENTRY(pv_entry) pv_plist; } *pv_entry_t; /* + * pv_entries are allocated in chunks per-process. This avoids the + * need to track per-pmap assignments. + */ +#ifdef __mips_n64 +#define _NPCM 3 +#define _NPCPV 168 +#else +#define _NPCM 11 +#define _NPCPV 336 +#endif +struct pv_chunk { + pmap_t pc_pmap; + TAILQ_ENTRY(pv_chunk) pc_list; + u_long pc_map[_NPCM]; /* bitmap; 1 = free */ + TAILQ_ENTRY(pv_chunk) pc_lru; + struct pv_entry pc_pventry[_NPCPV]; +}; + +/* * physmem_desc[] is a superset of phys_avail[] and describes all the * memory present in the system. * --------------070100090001080509040401--
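The core mechanism in the patch above is the per-chunk free bitmap: each page-sized pv_chunk carries _NPCM words in pc_map[], where a 1 bit marks a free slot in pc_pventry[], so allocating a pv entry is a find-first-set plus a bit clear, and freeing is the inverse index arithmetic. Below is a minimal stand-alone sketch of that bookkeeping under stated assumptions: the names pv_alloc(), pv_free(), and first_set() are invented for illustration, the constants mirror the 32-bit kernel case from the patch (_NPCM = 11, _NPCPV = 336), and the kernel scans with ffsl() where this sketch uses a portable helper. It is a user-space illustration, not the kernel code itself:
---------------------------------------------------------------------------
/*
 * Sketch of the pv_chunk free-bitmap arithmetic used by
 * get_pv_entry() and free_pv_entry() in the patch above.
 */
#include <stdio.h>
#include <stdint.h>

#define NBBY	8	/* bits per byte */
#define NPCM	11	/* bitmap words per chunk (32-bit case) */
#define NPCPV	336	/* pv entries per chunk */

/* A 1 bit marks a free slot; 336 = 10 full 32-bit words + 16 bits. */
static const uint32_t pc_freemask[NPCM] = {
	0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
	0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
	0xffffffff, 0xffffffff, 0x0000ffff
};

static uint32_t pc_map[NPCM];

/* Portable stand-in for ffsl(): lowest set bit, or -1 if none. */
static int
first_set(uint32_t w)
{
	int b;

	for (b = 0; b < 32; b++)
		if (w & (1u << b))
			return (b);
	return (-1);
}

/* Allocate: clear the first free bit, return the pc_pventry[] index. */
static int
pv_alloc(void)
{
	int bit, field;

	for (field = 0; field < NPCM; field++)
		if (pc_map[field] != 0) {
			bit = first_set(pc_map[field]);
			pc_map[field] &= ~(1u << bit);
			return (field * (int)sizeof(pc_map[0]) * NBBY + bit);
		}
	return (-1);	/* chunk is full */
}

/* Free: map the index back to (field, bit) and set the bit again. */
static void
pv_free(int idx)
{
	int field = idx / ((int)sizeof(pc_map[0]) * NBBY);
	int bit = idx % ((int)sizeof(pc_map[0]) * NBBY);

	pc_map[field] |= 1u << bit;
}

int
main(void)
{
	int i;

	for (i = 0; i < NPCM; i++)
		pc_map[i] = pc_freemask[i];	/* fresh chunk: all 336 free */
	for (i = 0; i < 5; i++)
		printf("allocated slot %d\n", pv_alloc());	/* 0..4 */
	pv_free(2);
	printf("reused slot %d\n", pv_alloc());	/* prints 2 again */
	return (0);
}
---------------------------------------------------------------------------
Because a chunk is exactly one page and page-aligned (note the CTASSERT in the patch), pv_to_chunk() can recover the owning chunk from a pv_entry pointer by masking with ~PAGE_MASK; that is why free_pv_entry() needs no per-entry back pointer and why the patch can drop pv_pmap from struct pv_entry, shrinking it as described.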