From: Svatopluk Kraus
To: Alan Cox
Cc: alc@freebsd.org, FreeBSD Arch
Date: Tue, 30 Sep 2014 14:37:01 +0200
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE

On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox wrote:
> On 09/27/2014 03:51, Svatopluk Kraus wrote:
>> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote:
>>> On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus wrote:
>>>> Hi,
>>>>
>>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>>>> problem we have dealt with somehow, but now we would like to handle
>>>> it better. It concerns physical pages which are allocated before the
>>>> vm subsystem is initialized. While such pages can later be found in
>>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, this
>>>> is not true for the VM_PHYSSEG_SPARSE memory model. And the ARM
>>>> world uses the VM_PHYSSEG_SPARSE model.
>>>>
>>>> It really would be nice to utilize vm_page_array for such
>>>> preallocated physical pages even when the VM_PHYSSEG_SPARSE memory
>>>> model is used. Things could be much easier then. In our case, it's
>>>> about pages which are used for level 2 page tables. In the
>>>> VM_PHYSSEG_SPARSE model, we have two sets of such pages: the first
>>>> ones are preallocated, and the second ones are allocated after the
>>>> vm subsystem has been initialized. We must deal with each set
>>>> differently, so the code is more complex and so is debugging.
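>>>>
>>>> To illustrate the difference (a simplified sketch of the existing
>>>> lookup in vm/vm_page.c; under SPARSE, a page lying outside every
>>>> vm_phys segment has no vm_page entry and the lookup returns NULL,
>>>> which is exactly what happens to our preallocated pages):
>>>>
>>>>     vm_page_t
>>>>     PHYS_TO_VM_PAGE(vm_paddr_t pa)
>>>>     {
>>>>     #if defined(VM_PHYSSEG_SPARSE)
>>>>             /* Walks vm_phys_segs[]; NULL if pa is in no segment. */
>>>>             return (vm_phys_paddr_to_vm_page(pa));
>>>>     #elif defined(VM_PHYSSEG_DENSE)
>>>>             /* Every frame from first_page up has an array entry. */
>>>>             return (&vm_page_array[atop(pa) - first_page]);
>>>>     #endif
>>>>     }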
>>>>
>>>> Thus we need some way to say that a part of physical memory should
>>>> be included in vm_page_array, but that the pages from that region
>>>> should not be put on the free list during initialization. We think
>>>> such a possibility could be utilized in general. There could be a
>>>> need for some physical space which:
>>>>
>>>> (1) is needed only during boot and can later be freed and given to
>>>> the vm subsystem,
>>>>
>>>> (2) is needed for something else, where the vm_page_array code could
>>>> be used without some kind of duplication.
>>>>
>>>> There is already some code which deals with blacklisted pages in
>>>> vm_page.c. So the easiest way to deal with the presented situation
>>>> is to add a callback to that part of the code which will be able to
>>>> exclude either a whole [phys_avail[i], phys_avail[i+1]] region or
>>>> single pages. As the biggest phys_avail region is used for vm
>>>> subsystem allocations, some more coding is needed. (However,
>>>> blacklisted pages are not dealt with in that part of the region.)
>>>>
>>>> We would like to know if there is any objection:
>>>>
>>>> (1) to dealing with the presented problem,
>>>> (2) to dealing with it in the presented way.
>>>>
>>>> Any help is much appreciated. Thanks.
>>>
>>> As an experiment, try modifying vm_phys.c to use dump_avail instead
>>> of phys_avail when sizing vm_page_array. On amd64, where the same
>>> problem exists, this allowed me to use VM_PHYSSEG_SPARSE. Right now,
>>> this is probably my preferred solution. The catch is that not all
>>> architectures implement dump_avail, but my recollection is that arm
>>> does.
>>
>> Frankly, I would prefer this too, but there is one big open question:
>> What is dump_avail for?
>
> dump_avail[] is solving a similar problem in the minidump code, hence
> the prefix "dump_" in its name. In other words, the minidump code
> couldn't use phys_avail[] either, because it didn't describe the full
> range of physical addresses that might be included in a minidump, so
> dump_avail[] was created.
>
> There is already precedent for what I'm suggesting. dump_avail[] is
> already (ab)used outside of the minidump code on x86 to solve this
> same problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>
>> Using it for vm_page_array initialization and segmentation means that
>> phys_avail must be a subset of it. And this must be stated and be
>> visible enough. Maybe it should even be checked in code. I like the
>> idea of thinking about dump_avail as something that describes all
>> memory in a system, but that's not how dump_avail is defined in the
>> archs now.
>
> When you say "it's not how dump_avail is defined in archs now", I'm
> not sure whether you're talking about the code or the comments. In
> terms of code, dump_avail[] is a superset of phys_avail[], and I'm not
> aware of any code that would have to change. In terms of comments, I
> did a grep looking for comments defining what dump_avail[] is, because
> I couldn't remember any. I found one ... on arm. So, I don't think
> it's an onerous task to change the definition of dump_avail[]. :-)
>
> Already, as things stand today with dump_avail[] being used outside of
> the minidump code, one could reasonably argue that it should be
> renamed to something like phys_exists[].
>
>> I will experiment with it on Monday then. However, it's not only
>> about how memory segments are created in vm_phys.c, but also about
>> how the vm_page_array size is computed in vm_page.c.
>
> Yes, and there is also a place in vm_reserv.c that needs to change.
> I've attached the patch that I developed and tested a long time ago.
> It still applies cleanly and runs ok on amd64.
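If I understand the experiment correctly, it amounts to something like
this in the VM_PHYSSEG_SPARSE path of vm_page_startup() (just my
sketch, not your patch; both arrays are zero-terminated lists of
[start, end) address pairs):

    /* Size vm_page_array from dump_avail[] instead of phys_avail[],
     * so pages allocated before VM startup still get a vm_page. */
    page_range = 0;
    for (i = 0; dump_avail[i + 1] != 0; i += 2)
            page_range += atop(dump_avail[i + 1] - dump_avail[i]);

plus the matching change wherever vm_phys.c walks phys_avail[] to
create the segments.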
I've thought about it once again, considering what I learned yesterday
from all the archs in the tree.

(1) On arm, two extra arrays are defined - hwregions and exregions -
which already describe all memory on the platform. These regions can be
filled from the FDT.

=> There is no need to describe all memory on the platform for the MI
kernel's sake, but there is a need to describe the memory which (a)
should be kept in vm_page_array and (b) is available to the vm
subsystem. For (a), phys_held[] or phys_known[] or phys_filed[] or
phys_managed[] or ...; for (b), phys_avail[].

(2) On mips, dump_avail[] is defined from phys_avail[] and is the same.

=> One step is to rename the misused dump_avail, for example to
phys_known[]. But why not take another step? phys_known[] must be a
superset of phys_avail[], and is thus the superior array. So
phys_avail[] should be defined from phys_known[].

(3) On sparc64 and powerpc, dump_avail[] is not defined.

=> Why force all archs to use both phys_known[] and phys_avail[]? If a
pointer to phys_avail[] is defined, then, when it's not NULL, the array
it points to will be used; otherwise, phys_known[] will be used. The
pointer can be defined in MI code and initialized to NULL, so archs
which do not need both arrays will not have to do anything.

vm_reserv_init() and vm_phys_init() are called from vm_page_startup().
These three functions are the only ones in MI code involved in this.
The first two take no arguments now. If that is changed so that a
pointer to the array which should be processed is passed to them, then
the only real work remains in vm_page_startup(). A rough sketch follows
below my signature.

What do you think about this?

Svata
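P.S. A minimal sketch of the MI side I have in mind. All names here
(phys_known[], phys_avail_ptr) are illustrative placeholders, not
existing identifiers:

    /* MD code fills phys_known[] with zero-terminated [start, end)
     * pairs covering every page that must have a struct vm_page. */
    extern vm_paddr_t phys_known[];

    /* An arch sets this when the pages available to the free lists
     * are only a subset of phys_known[]; others leave it NULL and
     * need not change at all. */
    vm_paddr_t *phys_avail_ptr = NULL;

    /* In vm_page_startup(): vm_page_array is always sized from
     * phys_known[], while vm_phys_init() and vm_reserv_init() get
     * whichever array the pointer selects. */
    vm_paddr_t *avail;

    avail = (phys_avail_ptr != NULL) ? phys_avail_ptr : phys_known;
    vm_phys_init(avail);
    vm_reserv_init(avail);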