Date:      Mon, 08 Dec 2014 07:23:14 +0200
From:      Tinker <tinkr@openmailbox.org>
To:        Peter Grehan <grehan@freebsd.org>
Cc:        freebsd-virtualization@freebsd.org
Subject:   Re: Can a host OS user process create a zillion BHyVe VM:s and microcontrol them?
Message-ID:  <0fd8dd84712a2d78e4397ac89d78326c@openmailbox.org>
In-Reply-To: <5483BA9F.2000905@freebsd.org>
References:  <CAG=rPVccq7R5%2Bcbm6nR1WCZDM=-xwwkmF=cw8PCuk58oHPA-gQ@mail.gmail.com> <1423616F-F44D-47E5-8595-DE862DC04464@bsdimp.com> <546A34C8.6060004@freebsd.org> <CAG=rPVeEEuK874g6%2BfVpHa5J_4V%2BA%2BQNbB5bCpXiS86jZW_U3Q@mail.gmail.com> <546C8812.2070904@FreeBSD.org> <20141119195923.GS24601@funkthat.com> <CAG=rPVdrjim_28ntxUv6qJxb-_bGhGabZipFyYzNuydZ2XPm5Q@mail.gmail.com> <69A8C06F-A7F6-49EC-8601-91AC4CDBFB13@FreeBSD.org> <547364EB.7090505@freebsd.org> <CAG=rPVeAM2_EEVYyhQiuXV7i%2Bpvw-uPYRBbjXeZhZKbrZzHRMg@mail.gmail.com> <547AEB93.3050600@freebsd.org> <CAF05609-FCB4-4CB6-9533-A1F32A7F3F22@neville-neil.com> <alpine.BSF.2.11.1412041456190.87680@fledge.watson.org> <5fa49b79a601363b471babbfc577590d@openmailbox.org> <5483BA9F.2000905@freebsd.org>

Hi Peter,

Thank you for your response!


Looking at Capsicum, I think it has an even lower safety profile than 
NaCl. My use case might just run any beastly binary code, so the sandbox 
wall needs to be the toughest you've got, which is why using BHyVe here makes sense.

Also, there's a "generality problem": running x86 from scratch is a very 
general problem. The instruction set doesn't change, so there are no 
version conflicts, and the host program could be ported to other 
environments while maintaining guest binary compatibility, so a safe 
x86 sandbox makes a lot of sense.


Great to hear that all of this can be done, and that it falls 
comfortably within the scope of BHyVe's intended use!

Right, so a VT-x context switch takes time: at about 2500 cycles per 
roundtrip you get roughly one million of them per second on a 2.5 GHz 
core. Fair enough.


Though, as you point out, this is a hardcore use case indeed: the VM is 
launched absolutely bare.

So then, strategies would need to be devised for how to run the code.

I guess this could range all the way from running a barebones memory 
management and pthreads implementation (if I got it right, both exist at 
least for amd64, though I'm not sure how up to date they are),

...to running a minimal BSD/Linux kernel within the VM for maximum 
compatibility.

Perhaps some neat memory-dumping trick could be applied to cut the 
BSD/Linux kernel boot time to zero, i.e. the kernel isn't booted at all 
but simply loaded from a hibernated memory image.

I guess the most critical thing about the BSD/Linux kernel option would 
be how much memory it consumes by itself, and that would need to be 
evaluated too.


Okay, so, there are two aspects to this problem now:

  (1) The BHyVe-specific parts: how to use BHyVe's interfaces to 
deliver the virtualization/sandbox aspect, and

  (2) The guest-specific parts: how to run the intended 
libraries/binaries as close to the metal as possible, with as little RAM 
(and CPU) overhead as possible, within the sandbox.


I guess (2) is off-topic for this mailing list, so it should be looked 
into separately.

If you have any spontaneous thoughts on that one though, of course feel 
free to share them :)


And (1) is on topic for this ML, so I can direct any follow-up questions 
about it here.

If you have any particular references for header files, man pages or 
example program source files, feel free to pass them on!


Thanks!
Tinkr

On 2014-12-07 04:25, Peter Grehan wrote:
>> Can the following under any circumstance be done with BHyVe?
> 
>  Depends on how you define "any" :)
> 
>> BHyVe_sandbox* s = BHyVe_createSandbox(100KB RAM, 2 CPU cores,void*
>> pointer to the audio codec binary code,int size of the same code);
>> 
>> and then pass in data for procedure invocation like BHyVe_call(s,
>> "encoder_iterate",   my arguments);
> 
>  bhyve provides what looks like a bare-metal system. While you can
> inject code and register state (ala bhyveload/grub-bhyve), the code
> has to be essentially an operating system, since the only interface
> provided is hardware.
> 
>  For example, to run with > 1 CPUs, you would have to program APIC
> registers to start up additional CPUs - very much less friendly than
> pthread_create() in a POSIX environment.
> 
>> And I'd have 50 000 of these running simultaneously on one machine, 
>> but
>> their respective heap size is 100KB each so that's only 5GB RAM in 
>> total.
>> 
>> As I get more data to encode from the network from any of the 50 000
>> audio streams I'm listening to, I feed the respective audio encoder
>> instance with the respective data with it, reactively.
>> 
>> Key characteristics here are
>>   * How light a BHyVe instance is (like, does a BHyVe instance require
>> allocation of any device file or FD otherwise), and
> 
>  Not too much. It requires an amount of kernel memory for guest data
> structures, maybe in the order of 10s of KBs.
> 
>>   * How lightweight trampolines between OS host process and guest code 
>> are,
> 
>  You have to do a VT-x context switch into the guest each time you'd
> want to run it - that's a few thousand cycles.
> 
>>   * I guess, if an OS scheduler context switch is needed for switching
>> between my host "my_audio_encoding_service_program.c" code and the 
>> guest
>> library code - can this be done like 100 000 times in a second.
> 
>  It could be done, but perhaps something like FreeBSD's capsicum
> sandboxing would be a better fit for this type of task than a VM.
> 
>  That's not to say it couldn't be done, just that it's a lot of work
> to get there.
> 
> later,
> 
> Peter.


