Date:      Tue, 11 Apr 2017 16:55:00 +0300
From:      Flavius Anton <f.v.anton@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: On COW memory mapping in d_mmap_single
Message-ID:  <CANXdjjZrjxhbqhZ13sAuZP7cqpvYU8CJusQ2NEpGuRCVMgr0=g@mail.gmail.com>
In-Reply-To: <CANXdjjYajtvWK%2Bq3OK4j5uPFR4sVUrhrQD8zZSpoJ1hwZhVS5Q@mail.gmail.com>
References:  <CANXdjjYajtvWK%2Bq3OK4j5uPFR4sVUrhrQD8zZSpoJ1hwZhVS5Q@mail.gmail.com>

>On Tue, Apr 11, 2017 at 04:00:21PM +0300, Konstantin Belousov wrote:
>>On Tue, Apr 11, 2017 at 03:37:26PM +0300, Flavius Anton wrote:
>> Hi everyone,
>>
>> I'll start by giving some context, so you can better understand the
>> problem I'm trying to solve. I've been working for a while on
>> bhyve trying to implement save/restore [1]. We've currently managed to
>> get it working for VMs using a ramdisk and no devices, so just vCPU
>> and memory states are saved and restored so far.
>>
>> Last week I started looking into network devices, specifically
>> virtio-net devices. The problem was that when I issue a checkpoint
>> operation, the guest virtio driver stops working. After digging for a
>> while, I figured out the problem is marking VM memory as COW. If I
>> don't do this, the driver continues with no problem after
>> checkpointing.
>>
>> Each VM has an associated vmspace and a /dev/vmm/VM_NAME device. When
>> user space does an mmap on the /dev device, we would like to mark
>> VM memory as COW, so that the VM can continue touching pages while
>> user space writes the 'frozen', COW-marked memory to persistent
>> storage. We do this by iterating through all vm_entries in the VM's
>> vmspace to find the entry that maps the object holding VM memory,
>> and then we essentially just set MAP_ENTRY_COW and
>> MAP_ENTRY_NEEDS_COPY on that entry. You can see the code here [2].
>
>This is a very strange operation, to put it mildly.  First, do other vCPUs
>operate while you do your 'COW' ?  If yes, you are guaranteed to get an
>inconsistent snapshot.  If not, then you do not need 'COW'.

Yes, all vCPUs are locked before calling mmap(). I agree that we don't
need 'COW', as long as we keep all vCPUs locked while we copy the
entire VM memory. But this might take a while: imagine a VM with 32GB
or more of RAM. Writing that to disk could take minutes, and we don't
want the VM to be frozen for that long. That's the reason we'd like to
map the memory COW and then unlock the vCPUs.

>More, what kinds of VM objects are mapped into the vmspace ? FreeBSD VM
>does not support shadowing of device objects (which means, inserting
>shadow objects into the device object chain breaks VM invariants). One
>of the main reasons why it did not need to be supported is that a shadow
>copy cannot see changes which are performed on the shadowed pages,
>supposedly by the device. If vmm mmaps some devices into the guest vmspace,
>the devices would kind of 'freeze' from the guest PoV.

It's an OBJT_DEFAULT. It's not a device object, it's the memory object
given to the guest to use as physical memory.

>Next, how do you undo the damage done by your 'COW' ?

This is one thing that we've thought about, but we don't have a
solution for now. I agree it is very important, though. I figured that
it might be possible to 'unmark' the memory object as COW with some
additional tricks.

>> I'm not sure if the above is sufficient for our purpose. In other
>> words, how would you do this? You have a vm_object that is referenced
>> via a vm_entry by process A (the user space). Somebody else, process B
>> let's say, does an mmap() on your device and you'd like to freeze that
>> object, such that process B can see a consistent snapshot of it, while
>> you want process A to be able to continue reading and writing from/to
>> it.
>This is not supported. I have no idea why a copy of a page which
>reflects the device state would even be considered a good idea. But you
>cannot make a consistent copy without device cooperation anyway, since
>the device might modify its state while the CPU reads.

I'm sorry if I haven't been clear enough. The object that I'm trying to
map as COW is not a device object. It's just the object that contains
VM memory. That object shouldn't change if all VM vCPUs are locked, and
I make sure they are when calling mmap().

Thanks for your input on this.

--
Flavius

>> I've also read through Design Elements of the FreeBSD VM system [3],
>> but I am still afraid (I am sure) that I have some misunderstandings.
>>
>> Thank you very much for bearing with me and going through this wall of text.
>>
>> [1] https://github.com/flaviusanton/freebsd/tree/bhyve-save-restore
>> [2] https://github.com/flaviusanton/freebsd/blob/bhyve-save-restore/sys/amd64/vmm/vmm_dev.c#L862
>> [3] https://www.freebsd.org/doc/en/articles/vm-design/index.html
