Date:      Sun, 1 Jul 2018 16:34:20 +0300
From:      Elena Mihailescu <elenamihailescu22@gmail.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, Mihai Carabas <mihai.carabas@gmail.com>,  freebsd-virtualization@freebsd.org, freebsd-amd64@freebsd.org
Subject:   Re: Inspect pages created after a vm_object is marked as copy-on-write
Message-ID:  <CAGOCPLiXzc1o_J18XDbLMY100HEDX8MrpKVpouDrTBZukM9VLg@mail.gmail.com>
In-Reply-To: <20180701082919.GB3926@pesky.lan>
References:  <CAGOCPLjrbv5nyXTQwqrsJhF9wFEFACeEyuS1aW0jdHycWNYz8g@mail.gmail.com> <20180629225209.GA4238@pesky.lan> <CANg1yUupb0hQb7jQHjD%2BvY8=3m4v%2BcH819qNGS7TjDUVFMgL=Q@mail.gmail.com> <20180630215956.GA1282@pesky.lan> <20180630223401.GW2430@kib.kiev.ua> <20180701082919.GB3926@pesky.lan>

On 1 July 2018 at 11:29, Mark Johnston <markj@freebsd.org> wrote:
>
> On Sun, Jul 01, 2018 at 01:34:01AM +0300, Konstantin Belousov wrote:
> > On Sat, Jun 30, 2018 at 05:59:56PM -0400, Mark Johnston wrote:
> > > On Sat, Jun 30, 2018 at 10:38:21AM +0300, Mihai Carabas wrote:
> > > > On Sat, Jun 30, 2018 at 1:52 AM, Mark Johnston <markj@freebsd.org> wrote:
> > > > > On Fri, Jun 29, 2018 at 11:58:31AM +0300, Elena Mihailescu wrote:
> > > > >> Is there anything I am doing wrong? Maybe I misunderstood something about
> > > > >> the way the virtual memory works in FreeBSD.
> > > > >
> > > > > I'll note that inspecting and manipulating vm_map_entry and vm_object
> > > > > structures in the bhyve code constitutes something of an abstraction
> > > > > violation, though it's reasonable to proceed this way while working on a
> > > > > prototype of the feature.  That is, I think you should keep trying your
> > > > > current approach, but just be aware that you are using the copy-on-write
> > > > > mechanism in a way that the VM system isn't really expecting.
> > > > >
> > > >
> > > > Can you point out the right approach in our case?
> > >
> > > I am merely suggesting that once the required VM interactions are fully
> > > understood, the mechanism implemented for bhyve should be generalized
> > > and lifted into the VM code.  It's hard to say what the "right" approach
> > > is, since I don't fully understand the proposed algorithm.  It sounds
> > > like you might be attempting something like:
> > >
> > > 1. mark the mappings of to-be-migrated objects as NEEDS_COW, so that a
> > >    subsequent write fault triggers creation of a shadow object
> > It is actually MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY.
>
> Indeed.
>
> > Note that setting an entry to COW changes the behaviour of mprotect(2),
> > at least.
> >
> > > 2. invalidate all physical mappings of pages in the object to be copied,
> > >    so that subsequent writes trigger a fault
> > I do not think this is needed to detect writes after the COW is set.
> > It is enough to remove the write permissions, the same as fork() does;
> > see the vm_map_copy_entry() code for the handling of the
> > MAP_ENTRY_NEEDS_COPY case.
>
> Ah, right.
>
> > > 3. copy pages from the backing object to the destination
> > As I understand it, this is done right after the entry is marked
> > as COW.
> >
> > > 4. copy any pages from the shadow object to the destination
> > And this is done after all backing data is copied and the process is
> > suspended.
>
> Right, I see.  Some reading suggests that in general we might perform
> multiple iterations of this procedure before suspending the process and
> performing any remaining copies.
>
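Steps 1-4 above can be illustrated with a small user-space model. This is a minimal sketch, not kernel code: struct region_model, mark_cow(), guest_write(), copy_backing() and copy_shadow() are hypothetical names, and the per-region "shadow" array only stands in for the shadow object that a write fault against a MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY entry creates. It assumes a single region, a single round, and that revoking write access is enough to catch every write.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES  8   /* hypothetical guest size, in pages */
#define PAGE_SZ 16  /* tiny "pages" keep the model readable */

/* Hypothetical model of one guest memory region (not struct vm_object). */
struct region_model {
	unsigned char backing[NPAGES][PAGE_SZ]; /* contents frozen at the COW point */
	unsigned char shadow[NPAGES][PAGE_SZ];  /* private copies made on write faults */
	bool in_shadow[NPAGES];                 /* page has been copied into the shadow */
	bool write_protected;                   /* writes must "fault" */
};

/* Steps 1-2: mark a fresh region copy-on-write by revoking write access. */
static void mark_cow(struct region_model *r)
{
	memset(r->in_shadow, 0, sizeof(r->in_shadow));
	r->write_protected = true;
}

/*
 * A guest write.  While the region is write-protected, the first write
 * to a page "faults": the page is copied into the shadow and modified
 * there, so the backing copy stays unchanged.
 */
static void guest_write(struct region_model *r, size_t pg, size_t off,
    unsigned char val)
{
	if (!r->write_protected) {
		r->backing[pg][off] = val;
		return;
	}
	if (!r->in_shadow[pg]) {
		memcpy(r->shadow[pg], r->backing[pg], PAGE_SZ);
		r->in_shadow[pg] = true;
	}
	r->shadow[pg][off] = val;
}

/* Step 3: copy the frozen backing data while the guest keeps running. */
static void copy_backing(const struct region_model *r,
    unsigned char dst[NPAGES][PAGE_SZ])
{
	memcpy(dst, r->backing, sizeof(r->backing));
}

/* Step 4: with the guest suspended, copy only the pages written since mark_cow(). */
static size_t copy_shadow(const struct region_model *r,
    unsigned char dst[NPAGES][PAGE_SZ])
{
	size_t n = 0;

	for (size_t pg = 0; pg < NPAGES; pg++) {
		if (r->in_shadow[pg]) {
			memcpy(dst[pg], r->shadow[pg], PAGE_SZ);
			n++;
		}
	}
	return n;
}
```

The return value of copy_shadow() is the number of pages that had to be resent after the suspension, which is exactly the set of pages the copy-on-write mechanism identified as modified.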

For the live migration implementation, we only need this approach to
send the guest's physical memory. In my tests, the guest's memory was
represented by a single vm_map_entry for a virtual machine with 512MB
of RAM. I found that entry by inspecting each vm_map_entry of the bhyve
instance and comparing its start and end addresses with the virtual
machine's base address and memory size.
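Locating the guest-memory entry as described above amounts to a search over the entries' start/end ranges. The sketch below uses a flat array of hypothetical struct entry_model records instead of walking a real vm_map, and find_guest_entry() is an illustrative name; it assumes, as observed for the 512MB guest, that the guest memory is covered by exactly one entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end fields of a map entry. */
struct entry_model {
	uintptr_t start; /* first address covered (inclusive) */
	uintptr_t end;   /* first address past the entry (exclusive) */
};

/*
 * Return the index of the entry whose range is exactly
 * [base, base + size), or -1 if no entry matches.  Requiring an exact
 * match assumes the guest memory is backed by a single entry.
 */
static ptrdiff_t find_guest_entry(const struct entry_model *entries,
    size_t n, uintptr_t base, size_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (entries[i].start == base && entries[i].end == base + size)
			return (ptrdiff_t)i;
	}
	return -1;
}
```

If the guest memory were ever split across several entries, the exact-match test would fail and a range-overlap search would be needed instead.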

What we are trying to implement is to send the initial memory while
the virtual machine is running (that's why we need to mark that entry
copy-on-write, so that the copy stays consistent), then stop the
virtual machine's execution (freeze the vCPUs) and send only the
pages/memory areas that were modified in the meantime (the
copy-on-write mechanism tells us which pages were modified). This is a
"simplified live migration". A full live-migration feature sends the
virtual memory in multiple rounds (round 1: send the entire memory;
round 2: send only the differences between round 1 and round 2;
round 3: send only the differences between round 2 and round 3; and so
on), until at some point the virtual machine is stopped and only the
remaining differences and the guest's CPU state are sent to the
destination.
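The multi-round scheme can be simulated to see how quickly the rounds shrink. This sketch rests on a strong simplifying assumption: while n pages are being sent, the guest dirties a fixed fraction dirty_num/dirty_den of n (capped at the guest size). precopy_plan(), its parameters and its stop rule (stop when the remainder drops to stop_threshold pages or after max_rounds rounds) are hypothetical choices, not part of bhyve.

```c
#include <stddef.h>

/* Outcome of the hypothetical pre-copy schedule below. */
struct precopy_result {
	unsigned rounds;    /* live rounds (vCPUs running) */
	size_t live_pages;  /* pages sent while the guest was running */
	size_t final_pages; /* pages sent after freezing the vCPUs */
};

/*
 * Round 1 sends every page; each later round sends only the pages
 * dirtied while the previous round was in flight.  The remainder (plus
 * the CPU state, not modeled) is sent with the guest frozen.
 */
static struct precopy_result precopy_plan(size_t total_pages,
    size_t dirty_num, size_t dirty_den, size_t stop_threshold,
    unsigned max_rounds)
{
	struct precopy_result res = { 0, 0, 0 };
	size_t to_send = total_pages;

	for (;;) {
		res.rounds++;
		res.live_pages += to_send;
		/* Pages dirtied while this round was being sent. */
		size_t dirtied = to_send * dirty_num / dirty_den;
		to_send = dirtied < total_pages ? dirtied : total_pages;
		if (to_send <= stop_threshold || res.rounds >= max_rounds)
			break;
	}
	res.final_pages = to_send;
	return res;
}
```

For example, with 1000 pages, a ratio of 1/4 and a threshold of 20, the model runs three live rounds (1000, 250 and 62 pages) and leaves 15 pages for the stop-and-copy phase; with a ratio above 1 it never converges and stops only because of max_rounds.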

After the memory migration is completed (and, of course, after the
guest's CPU state has been migrated), the virtual machine will be
destroyed, and so will the objects created by the copy-on-write
mechanism.

As for the number of rounds, it will be chosen later based on various
performance tests, but it shouldn't be too large.
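A back-of-the-envelope model suggests why the number of rounds need not be large. Assume (hypothetically) a constant transfer rate of B pages/s, a constant rate D of newly dirtied distinct pages per second, and a guest of M pages. Round i sends V_i pages, and after n live rounds the remaining V_{n+1} pages are sent with the vCPUs frozen:

```latex
\begin{aligned}
\rho &:= \frac{D}{B}, \qquad V_1 = M, \qquad V_{i+1} = \rho\,V_i
  \;\Longrightarrow\; V_i = M\,\rho^{\,i-1},\\
T_{\text{down}}(n) &\approx \frac{V_{n+1}}{B} = \frac{M}{B}\,\rho^{\,n},\\
\sum_{i=1}^{n+1} V_i &= M\,\frac{1-\rho^{\,n+1}}{1-\rho} \qquad (\rho \neq 1).
\end{aligned}
```

If rho < 1, the downtime shrinks geometrically, so each extra round buys less while the total traffic approaches M/(1 - rho); if rho >= 1, extra rounds do not reduce the remainder at all. Real dirty rates are not constant, which is one reason to pick the round count from measurements.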

Elena

> > > 5. collapse the backing object into the shadow
> > > 6. if the shadow object exists and was non-empty before the collapse,
> > >    goto 1
> > Are you trying to describe how to undo the COW marking?  Marking an
> > entry as COW really changes its semantics, and so far we have not needed
> > an undo operation in the base system.  Collapsing the objects would
> > lessen the pressure from the proliferation of objects in the system, but
> > it does not change back the meaning of the mappings, e.g. their
> > inheritance behaviour on fork.
>
> I was just thinking about how to avoid creating a long chain of objects
> if multiple iterations are in fact needed.
> _______________________________________________
> freebsd-amd64@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-amd64
> To unsubscribe, send any mail to "freebsd-amd64-unsubscribe@freebsd.org"
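The concern about a long chain of objects across iterations can be illustrated with a toy shadow chain. In this sketch, struct obj_model and its helpers are hypothetical and are not the kernel's vm_object or vm_object_collapse(): each round pushes a fresh shadow on top, and once that round's dirty pages have been sent, the backing object is merged into the shadow, so the chain never grows beyond two objects. As Konstantin notes above, collapsing only limits the number of objects; it does not undo the COW semantics of the mapping, which this model does not represent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NPAGES 8 /* hypothetical object size, in pages */

/* Hypothetical model of an object with an optional backing object. */
struct obj_model {
	bool present[NPAGES];
	int data[NPAGES];
	struct obj_model *backing; /* next object down the shadow chain */
};

/* Look a page up the way a fault walks a shadow chain: topmost copy wins. */
static const int *chain_lookup(const struct obj_model *top, size_t pg)
{
	for (const struct obj_model *o = top; o != NULL; o = o->backing)
		if (o->present[pg])
			return &o->data[pg];
	return NULL;
}

/* Writes always land in the topmost object. */
static void obj_write(struct obj_model *top, size_t pg, int val)
{
	top->data[pg] = val;
	top->present[pg] = true;
}

/* Push a new, empty shadow on top of the chain (one per round). */
static struct obj_model *shadow_push(struct obj_model *top)
{
	struct obj_model *s = calloc(1, sizeof(*s));

	if (s != NULL)
		s->backing = top;
	return s;
}

/*
 * Move the pages the shadow does not override from its backing object
 * into the shadow, then unlink and free the backing object.
 */
static void collapse(struct obj_model *shadow)
{
	struct obj_model *b = shadow->backing;

	if (b == NULL)
		return;
	for (size_t pg = 0; pg < NPAGES; pg++) {
		if (b->present[pg] && !shadow->present[pg]) {
			shadow->data[pg] = b->data[pg];
			shadow->present[pg] = true;
		}
	}
	shadow->backing = b->backing;
	free(b);
}

/* Number of objects in the chain, starting from the top. */
static size_t chain_length(const struct obj_model *top)
{
	size_t n = 0;

	for (; top != NULL; top = top->backing)
		n++;
	return n;
}
```

Without collapse(), five rounds would leave a chain of six objects, and every lookup of an untouched page would have to walk all of them.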


