From: Elena Mihailescu
Date: Sun, 1 Jul 2018 16:34:20 +0300
Subject: Re: Inspect pages created after a vm_object is marked as copy-on-write
To: Mark Johnston
Cc: Konstantin Belousov, Mihai Carabas, freebsd-virtualization@freebsd.org, freebsd-amd64@freebsd.org
List-Id: Porting FreeBSD to the AMD64 platform

On 1 July 2018 at 11:29, Mark Johnston wrote:
>
> On Sun, Jul 01, 2018 at 01:34:01AM +0300, Konstantin Belousov wrote:
> > On Sat, Jun 30, 2018 at 05:59:56PM -0400, Mark Johnston wrote:
> > > On Sat, Jun 30, 2018 at 10:38:21AM +0300, Mihai Carabas wrote:
> > > > On Sat, Jun 30, 2018 at 1:52 AM, Mark Johnston wrote:
> > > > > On Fri, Jun 29, 2018 at 11:58:31AM +0300, Elena Mihailescu wrote:
> > > > >> Is there anything I am doing wrong? Maybe I misunderstood something about
> > > > >> the way the virtual memory works in FreeBSD.
> > > > >
> > > > > I'll note that inspecting and manipulating vm_map_entry and vm_object
> > > > > structures in the bhyve code constitutes something of an abstraction
> > > > > violation, though it's reasonable to proceed this way while working on a
> > > > > prototype of the feature. That is, I think you should keep trying your
> > > > > current approach, but just be aware that you are using the copy-on-write
> > > > > mechanism in a way that the VM system isn't really expecting.
> > > > >
> > > >
> > > > Can you point out the right approach in our case?
> > >
> > > I am merely suggesting that once the required VM interactions are fully
> > > understood, the mechanism implemented for bhyve should be generalized
> > > and lifted into the VM code. It's hard to say what the "right" approach
> > > is, since I don't fully understand the proposed algorithm. It sounds
> > > like you might be attempting something like:
> > >
> > > 1. mark the mappings of to-be-migrated objects as NEEDS_COW, so that a
> > > subsequent write fault triggers creation of a shadow object
> > It is actually MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY.
>
> Indeed.
>
> > Note that setting an entry to COW changes the behaviour of mprotect(2),
> > at least.
> >
> > > 2. invalidate all physical mappings of pages in the object to be copied,
> > > so that subsequent writes trigger a fault
> > I do not think this is needed to detect writes after the COW is set.
> > It is enough to remove the write permissions. Same as fork() does,
> > see the vm_map_copy_entry() code for the handling of MAP_ENTRY_NEEDS_COPY
> > case.
>
> Ah, right.
>
> > > 3. copy pages from the backing object to the destination
> > As I understand, this is done right after the entry is marked
> > as COW.
> >
> > > 4. copy any pages from the shadow object to the destination
> > And this is done after all backing data is copied and the process is
> > suspended.
>
> Right, I see.
> Some reading suggests that in general we might perform
> multiple iterations of this procedure before suspending the process and
> performing any remaining copies.
>

For the live migration implementation, we only need this approach to send the guest's physical memory. In my tests, I observed that the guest's memory was represented by a single vm_map_entry for a virtual machine with 512MB of RAM. I found that by inspecting the vm_map_entry structures of each bhyve instance and comparing their start and end addresses with the virtual machine's base address and memory size.

What we are trying to implement is to send the initial memory while the virtual machine is running (that's why we need to mark that entry copy-on-write, so that the copy is consistent), then stop the virtual machine's execution (freeze the vCPUs) and send only the pages/memory areas that were modified in the meantime (using the copy-on-write mechanism, we can determine which pages were modified). This is "a simplified live migration".

A full live-migration feature will have multiple rounds to send the virtual memory (round 1: send the entire memory; round 2: send only the differences between round 1 and round 2; round 3: send only the differences between round 2 and round 3; and so on, until at a certain point the virtual machine is stopped and only the remaining differences and the guest's CPU state are sent to the destination).

After the memory migration is completed (and, of course, after the guest's CPU state is migrated), the virtual machine will be destroyed, so the objects created by the copy-on-write mechanism will be destroyed as well. The number of rounds will be chosen later, based on performance tests, but it shouldn't be too large.

Elena

> > > 5. collapse the backing object into the shadow
> > > 6. if the shadow object exists and was non-empty before the collapse,
> > > goto 1
> > Are you trying to describe how to undo the COW marking?
> > Marking an entry as COW really changes its semantics, and we do not need the undo
> > operation in the base so far. Collapsing the objects would lessen
> > the pressure on the system pollution with objects, but it does not
> > change back the meaning of mappings, e.g. their behaviour on inheritance
> > on fork.
>
> I was just thinking about how to avoid creating a long chain of objects
> if multiple iterations are in fact needed.
> _______________________________________________
> freebsd-amd64@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-amd64
> To unsubscribe, send any mail to "freebsd-amd64-unsubscribe@freebsd.org"
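The multi-round scheme described in the reply above, combined with Konstantin's point that removing write permission is enough to notice later writes, can be sketched as a small user-space simulation. This is a hypothetical Python model under stated assumptions, not bhyve or FreeBSD VM code: `GuestMemory`, `precopy_migrate`, `busy_guest`, and the stop rule (`max_rounds`, `stop_threshold`) are invented for illustration, and each page is reduced to a single integer. Every round write-protects all pages, copies the pages still to be sent, and lets the guest run; pages the guest writes are recorded as dirty and become the next round's work. The final stop-and-copy phase sends whatever is still dirty while the vCPUs are frozen.

```python
from typing import Callable, Dict, List, Set, Tuple


class GuestMemory:
    """Toy model of guest RAM (hypothetical): one integer per page plus a
    write-protect flag per page.

    Write-protecting every page at the start of a round means the first write
    to each page afterwards "faults"; the simulated fault handler records the
    page as dirty and re-enables writes to it.
    """

    def __init__(self, npages: int) -> None:
        self.pages: List[int] = [0] * npages
        self.write_protected: List[bool] = [False] * npages
        self.dirty: Set[int] = set()

    def protect_all(self) -> None:
        # New round: forget old dirty pages and write-protect everything.
        self.dirty.clear()
        self.write_protected = [True] * len(self.pages)

    def write(self, index: int, value: int) -> None:
        if self.write_protected[index]:
            # Simulated write fault: remember the page, then allow the write.
            self.dirty.add(index)
            self.write_protected[index] = False
        self.pages[index] = value


def precopy_migrate(
    mem: GuestMemory,
    run_guest: Callable[[int], None],
    max_rounds: int = 5,
    stop_threshold: int = 2,
) -> Tuple[Dict[int, int], List[int]]:
    """Multi-round pre-copy followed by a final stop-and-copy phase.

    Returns the destination's copy of memory and the number of pages sent
    in each round (the last entry is the stop-and-copy round).
    """
    dest: Dict[int, int] = {}
    to_send: Set[int] = set(range(len(mem.pages)))  # first round: all memory
    sent_per_round: List[int] = []
    for round_no in range(max_rounds):
        mem.protect_all()  # writes from now on are recorded as dirty
        for index in sorted(to_send):
            dest[index] = mem.pages[index]
        sent_per_round.append(len(to_send))
        # Simplification: the guest runs after the copy rather than during it;
        # either way, any write after protect_all() lands in mem.dirty.
        run_guest(round_no)
        to_send = set(mem.dirty)
        if len(to_send) <= stop_threshold:
            break
    # Stop-and-copy: the vCPUs are frozen, so these pages can no longer change.
    for index in sorted(to_send):
        dest[index] = mem.pages[index]
    sent_per_round.append(len(to_send))
    return dest, sent_per_round


mem = GuestMemory(16)


def busy_guest(round_no: int) -> None:
    # Hypothetical workload: rewrites 4, then 2, then 1 page(s).
    for index in range(4 >> round_no):
        mem.write(index, 100 + round_no)


dest, sent_per_round = precopy_migrate(mem, busy_guest, max_rounds=5, stop_threshold=1)
print(sent_per_round)  # [16, 4, 2, 1]
```

Because this workload dirties fewer pages each round, the loop stops once at most one page is dirty. A guest that keeps rewriting many pages would instead run until `max_rounds`, which is one reason the number of rounds has to be bounded and tuned.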
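Mark's concern about a long chain of objects can be illustrated with a second toy model, again a hypothetical Python sketch rather than kernel code (`VMObject`, `collapse`, `chain_length`, and `run_rounds` are invented names). Each COW round leaves a new shadow object on top of the chain, and a page lookup walks the chain from the top until it finds the page, so pages that were never rewritten end up ever deeper. Collapsing the backing object into the shadow after each round, as in steps 5 and 6, keeps the chain from growing while preserving the newest copy of every page. As Konstantin points out, in the real VM system collapsing only reduces the number of objects; it does not turn a COW mapping back into an ordinary one, and this model does not represent mappings at all.

```python
from typing import Dict, Optional, Tuple


class VMObject:
    """Toy stand-in (hypothetical) for a VM object: resident pages plus an
    optional backing object."""

    def __init__(self, backing: Optional["VMObject"] = None) -> None:
        self.pages: Dict[int, int] = {}
        self.backing = backing

    def lookup(self, pindex: int) -> Tuple[int, int]:
        """Return (value, depth): walk the shadow chain until the page is found."""
        obj: Optional[VMObject] = self
        depth = 0
        while obj is not None:
            if pindex in obj.pages:
                return obj.pages[pindex], depth
            obj = obj.backing
            depth += 1
        raise KeyError(pindex)


def chain_length(obj: Optional[VMObject]) -> int:
    length = 0
    while obj is not None:
        length += 1
        obj = obj.backing
    return length


def collapse(shadow: VMObject) -> None:
    """Fold the backing object into the shadow.

    Pages the shadow already holds are newer and win; every other page is
    pulled up from the backing object, which is then unlinked.
    """
    backing = shadow.backing
    if backing is None:
        return
    for pindex, value in backing.pages.items():
        shadow.pages.setdefault(pindex, value)
    shadow.backing = backing.backing


def run_rounds(rounds: int, collapse_each_round: bool, npages: int = 8) -> VMObject:
    top = VMObject()
    top.pages = {pindex: 0 for pindex in range(npages)}
    for round_no in range(rounds):
        # Each COW round ends up with a new shadow object on top of the chain
        # (created eagerly here; the kernel creates it on the first write fault).
        top = VMObject(backing=top)
        top.pages[round_no] = round_no + 1  # the guest dirties one page
        if collapse_each_round:
            collapse(top)
    return top


chained = run_rounds(3, collapse_each_round=False)
collapsed = run_rounds(3, collapse_each_round=True)
print(chain_length(chained), chained.lookup(5))      # 4 (0, 3)
print(chain_length(collapsed), collapsed.lookup(5))  # 1 (0, 0)
```

Both chains return the same page contents; only the number of objects a lookup has to walk differs.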