From: Willem Jan Withagen
Date: Tue, 20 May 2014 22:11:43 +0200
To: Anish, Nils Beyer
Cc: FreeBSD virtualization
Subject: Re: bhyve: svm (amd-v) update
Message-ID: <537BB6FF.5080909@digiware.nl>

On 18-5-2014 16:44, Anish wrote:
> Thanks for testing it.
>> Your patch applied cleanly to the working copy of the "bhyve_svm"-project.
>> I was then able to merge with HEAD
>> (using "theirs-full" on one file) and compile the kernel. So, to me it
>> looks OK to commit.
> Yes, that's correct. You have to retain the changes in sys/amd64/vmm/amd/amdv.c
> from the bhyve_svm branch.
>
>> Unfortunately, I am still not able to boot CentOS 6.5 using my Phenom
>> 1055T. It produces 200% load on the
>> host CPU, and the emulated machine generates endlessly:
> It's 200% load because the guest has 2 vCPUs. It gets stuck in a loop even
> with a single processor (1 vCPU) after PCI probing [debug messages with Linux
> .....earlyprintk=serial debug]:
>
> [ 3.684243] UDP hash table entries: 1024 (order: 3, 32768 bytes)
> [ 3.686484] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
> [ 3.691987] NET: Registered protocol family 1
> [ 3.693382] pci 0000:00:01.0: Activating ISA DMA hang workarounds
> [ 3.695214] PCI: CLS 64 bytes, default 64
> [ 3.698176] Trying to unpack rootfs image as initramfs...
> [ 30.595279] BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
>
> [ 3.505631] pnp: PnP ACPI: found 5 devices
> [ 3.506417] ACPI: bus type PNP unregistered
> [ 3.635781] pci 0000:00:06.0: no compatible bridge window for [mem 0xfe440000-0xfe45ffff pref]
> [ 3.637555] pci 0000:00:06.0: BAR 6: assigned [mem 0x80000000-0x8001ffff pref]
> [ 3.638986] pci 0000:00:01.0: BAR 6: assigned [mem 0x80020000-0x800207ff pref]
> [ 3.640416] pci 0000:00:04.0: BAR 6: assigned [mem 0x80020800-0x80020fff pref]
> [ 3.641864] pci 0000:00:05.0: BAR 6: assigned [mem 0x80021000-0x800217ff pref]
> [ 3.643259] pci 0000:00:00.0: not setting up bridge for bus 0000:01
> [ 3.644550] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7]
> [ 3.645670] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff]
> [ 3.646795] pci_bus 0000:00: resource 6 [mem 0x80000000-0xdfffffff]
> [ 3.648031] pci_bus 0000:00: resource 7 [mem 0xd000000000-0xfcffffffff]
> [ 3.650970] NET: Registered protocol family 2
> [ 3.661491] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
> [ 3.671854] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
> [ 3.681116] TCP: Hash tables configured (established 16384 bind 16384)
> [ 3.683335] TCP: reno registered
> [ 3.684243] UDP hash table entries: 1024 (order: 3, 32768 bytes)
> [ 3.686484] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
> [ 3.691987] NET: Registered protocol family 1
> [ 3.693382] pci 0000:00:01.0: Activating ISA DMA hang workarounds
> [ 3.695214] PCI: CLS 64 bytes, default 64
> [ 3.698176] Trying to unpack rootfs image as initramfs...
> [ 30.595279] BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
> [ 30.596366] Modules linked in:
>> Additionally, it produces a lot of MSR requests:
> Yes, on AMD, Linux touches more MSRs (AMD-specific, address 0xC00XXXX)
> than on Intel.
>
> Thanks and regards,
> Anish
>
>
> On Fri, May 16, 2014 at 2:17 PM, Nils Beyer wrote:
>
>> Hi Anish,
>>
>> Anish wrote:
>>> If the patches look good to you, we can submit them. I have been testing
>>> them on a Phenom box, which lacks some of the newer SVM features.
>>
>> Your patch applied cleanly to the working copy of the "bhyve_svm"-project.
>> I was then able to merge with HEAD
>> (using "theirs-full" on one file) and compile the kernel. So, to me it
>> looks OK to commit.
>>
>> Unfortunately, I am still not able to boot CentOS 6.5 using my Phenom
>> 1055T. It produces 200% load on the
>> host CPU, and the emulated machine generates endlessly:
>>
>> =======================================================================================
>> BUG: soft lockup - CPU#0 stuck for 67s! [swapper:1]
>> Modules linked in:
>> CPU 0
>> Modules linked in:
>>
>> Pid: 1, comm: swapper Not tainted 2.6.32-431.el6.x86_64 #1 BHYVE
And more...
>> I'd love to see CentOS perfectly running on my Phenom as it runs perfectly
>> on an Intel i3.
>>
>> If you need any further information/debug, please let me know...

I've been trying to get Ubuntu, CentOS and the like to run on AMD, and I am
currently compiling a kernel, but it is dirt slow.

Attached is a patch I use to debug more of the MSRs; it also contains what I
did to get the TSC running. It helps, but things are still as slow as molasses.

For Ubuntu I also needed to fix part of the AHCI code, since it bails out on
ATA FLUSH.

I'm going to take a look at the recently posted diff, which should bring
bhyve_svm in line with head, and see if that speeds up my Ubuntu kernels.

--WjW

Content-Disposition: attachment; filename="msr-tsc.patch"

Index: sys/amd64/vmm/amd/svm.c
===================================================================
--- sys/amd64/vmm/amd/svm.c	(revision 264582)
+++ sys/amd64/vmm/amd/svm.c	(working copy)
@@ -82,6 +82,8 @@
 static bool svm_vmexit(struct svm_softc *svm_sc, int vcpu,
 			struct vm_exit *vmexit);
 static int svm_msr_rw_ok(uint8_t *btmap, uint64_t msr);
+static int svm_msr_ro_ok(uint8_t *btmap, uint64_t msr);
+static int svm_msr_rw_ro_ok(uint8_t *btmap, uint64_t msr, int mask);
 static int svm_msr_index(uint64_t msr, int *index, int *bit);
 
 static uint32_t svm_feature;	/* AMD SVM features. */
@@ -315,9 +317,24 @@
 /*
  * Give virtual cpu the complete access to MSR(read & write).
  */
+#define MSR_RO	1
+#define MSR_RW	3
+
 static int
 svm_msr_rw_ok(uint8_t *perm_bitmap, uint64_t msr)
 {
+	return svm_msr_rw_ro_ok(perm_bitmap, msr, MSR_RW);
+}
+
+static int
+svm_msr_ro_ok(uint8_t *perm_bitmap, uint64_t msr)
+{
+	return svm_msr_rw_ro_ok(perm_bitmap, msr, MSR_RO);
+}
+
+static int
+svm_msr_rw_ro_ok(uint8_t *perm_bitmap, uint64_t msr, int mask)
+{
 	int index, bit, err;
 
 	err = svm_msr_index(msr, &index, &bit);
@@ -336,8 +353,12 @@
 	}
 
 	/* Disable intercept for read and write. */
-	perm_bitmap[index] &= ~(3 << bit);
-	CTR1(KTR_VMM, "Guest has full control on SVM:MSR(0x%lx).\n", msr);
+	perm_bitmap[index] &= ~(mask << bit);
+	if (mask==MSR_RW) {
+		CTR1(KTR_VMM, "Guest has Read/Write control on SVM:MSR(0x%lx).\n", msr );
+	} else {
+		CTR1(KTR_VMM, "Guest has Read-only control on SVM:MSR(0x%lx).\n", msr );
+	}
 
 	return (0);
 }
@@ -415,10 +436,26 @@
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_CS_MSR);
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_ESP_MSR);
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_SYSENTER_EIP_MSR);
-	
+
+#define AMD_MSR_TSEG_BASE	0xc0010112
+#define AMD_MSR_OSVW_ID_LENGTH	0xc0010140	/* read */
+#define AMD_MSR_OSVW_STATUS	0xc0010141	/* read */
+#define AMD_MSR_MC4_CTL_MASK	0xc0010048
+
 	/* For Nested Paging/RVI only. */
 	svm_msr_rw_ok(svm_sc->msr_bitmap, MSR_PAT);
 
+	svm_msr_rw_ok(svm_sc->msr_bitmap, AMD_MSR_OSVW_ID_LENGTH);
+	svm_msr_rw_ok(svm_sc->msr_bitmap, AMD_MSR_OSVW_STATUS);
+	/*
+	 * MSRs that are allowed to be read.
+	 * most obvious one is the TSC read which could be time critical
+	 */
+	svm_msr_ro_ok(svm_sc->msr_bitmap, MSR_TSC);
+	svm_msr_ro_ok(svm_sc->msr_bitmap, MSR_HWCR);
+	svm_msr_ro_ok(svm_sc->msr_bitmap, AMD_MSR_TSEG_BASE);
+	svm_msr_ro_ok(svm_sc->msr_bitmap, AMD_MSR_MC4_CTL_MASK);
+
 	/* Intercept access to all I/O ports. */
 	memset(svm_sc->iopm_bitmap, 0xFF, sizeof(svm_sc->iopm_bitmap));
 
@@ -566,6 +603,13 @@
 			svm_efer(svm_sc, vcpu, info1);
 			break;
 		}
+		if (ecx == MSR_TSC) {
+			uint64_t tscval = rdtsc();
+			VCPU_CTR0(svm_sc->vm, vcpu,"VMEXIT TSC MSR\n");
+			state->rax = tscval & 0xffffffff;
+			ctx->e.g.sctx_rdx = tscval >> 32;
+			break;
+		}
 
 		retu = false;
 		if (info1) {
Index: sys/amd64/vmm/intel/vmx.c
===================================================================
--- sys/amd64/vmm/intel/vmx.c	(revision 264582)
+++ sys/amd64/vmm/intel/vmx.c	(working copy)
@@ -109,6 +109,9 @@
 #define	guest_msr_rw(vmx, msr) \
 	msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_RW)
 
+#define	guest_msr_ro(vmx, msr) \
+	msr_bitmap_change_access((vmx)->msr_bitmap, (msr), MSR_BITMAP_ACCESS_READ)
+
 #define	HANDLED		1
 #define	UNHANDLED	0
 
@@ -786,6 +789,11 @@
 	 * MSR_EFER is saved and restored in the guest VMCS area on a
 	 * VM exit and entry respectively. It is also restored from the
 	 * host VMCS area on a VM exit.
+	 *
+	 * The TSC MSR is exposed read-only. Writes are disallowed as that
+	 * will impact the host TSC.
+	 * XXX Writes would be implemented with a wrmsr trap, and
+	 * then modifying the TSC offset in the VMCS.
 	 */
 	if (guest_msr_rw(vmx, MSR_GSBASE) ||
 	    guest_msr_rw(vmx, MSR_FSBASE) ||
@@ -793,7 +801,8 @@
 	    guest_msr_rw(vmx, MSR_SYSENTER_ESP_MSR) ||
 	    guest_msr_rw(vmx, MSR_SYSENTER_EIP_MSR) ||
 	    guest_msr_rw(vmx, MSR_KGSBASE) ||
-	    guest_msr_rw(vmx, MSR_EFER))
+	    guest_msr_rw(vmx, MSR_EFER) ||
+	    guest_msr_ro(vmx, MSR_TSC))
 		panic("vmx_vminit: error setting guest msr access");
 
 	/*
Index: sys/amd64/vmm/io/vlapic.c
===================================================================
--- sys/amd64/vmm/io/vlapic.c	(revision 264582)
+++ sys/amd64/vmm/io/vlapic.c	(working copy)
@@ -143,7 +143,7 @@
 #define	VLAPIC_TIMER_UNLOCK(vlapic)	mtx_unlock_spin(&((vlapic)->timer_mtx))
 #define	VLAPIC_TIMER_LOCKED(vlapic)	mtx_owned(&((vlapic)->timer_mtx))
 
-#define VLAPIC_BUS_FREQ	tsc_freq
+#define VLAPIC_BUS_FREQ	(128*1024*1024)
 
 static __inline uint32_t
 vlapic_get_id(struct vlapic *vlapic)
Index: sys/amd64/vmm/vmm_msr.c
===================================================================
--- sys/amd64/vmm/vmm_msr.c	(revision 264582)
+++ sys/amd64/vmm/vmm_msr.c	(working copy)
@@ -113,6 +113,9 @@
 		case MSR_MCG_CAP:
 			guest_msrs[i] = 0;
 			break;
+		case MSR_TSC:
+			guest_msrs[i] = rdtsc();
+			break;
 		case MSR_PAT:
 			guest_msrs[i] = PAT_VALUE(0, PAT_WRITE_BACK)	|
 				PAT_VALUE(1, PAT_WRITE_THROUGH)		|
Index: sys/amd64/vmm/vmm_msr.h
===================================================================
--- sys/amd64/vmm/vmm_msr.h	(revision 264582)
+++ sys/amd64/vmm/vmm_msr.h	(working copy)
@@ -29,7 +29,7 @@
 #ifndef _VMM_MSR_H_
 #define	_VMM_MSR_H_
 
-#define	VMM_MSR_NUM	16
+#define	VMM_MSR_NUM	17
 struct vm;
 
 void	vmm_msr_init(void);
Index: usr.sbin/bhyve/bhyverun.c
===================================================================
--- usr.sbin/bhyve/bhyverun.c	(revision 264582)
+++ usr.sbin/bhyve/bhyverun.c	(working copy)
@@ -52,6 +52,7 @@
 #include
 
 #include "bhyverun.h"
+#include "compiledate.h"
 #include "acpi.h"
 #include "inout.h"
 #include "dbgport.h"
@@ -75,6 +76,8 @@
 
 #define MB		(1024UL * 1024)
 #define GB		(1024UL * MB)
 
+#define FALSE 0
+#define TRUE (!FALSE)
 typedef int (*vmexit_handler_t)(struct vmctx *, struct vm_exit *, int *vcpu);
 
@@ -139,8 +142,8 @@
 		" -S: legacy PCI slot config\n"
 		" -l: LPC device configuration\n"
 		" -m: memory size in MB\n"
-		" -w: ignore unimplemented MSRs\n",
-		progname, (int)strlen(progname), "");
+		" -w: ignore unimplemented MSRs\n"
+		,progname, (int)strlen(progname), "");
 
 	exit(code);
 }
@@ -287,10 +290,6 @@
 	if (vme->u.inout.string || vme->u.inout.rep)
 		return (VMEXIT_ABORT);
 
-	/* Special case of guest reset */
-	if (out && port == 0x64 && (uint8_t)eax == 0xFE)
-		return (vmexit_catch_reset());
-
 	/* Extra-special case of host notifications */
 	if (out && port == GUEST_NIO_PORT)
 		return (vmexit_handle_notify(ctx, vme, pvcpu, eax));
@@ -315,16 +314,16 @@
 	uint64_t val;
 	uint32_t eax, edx;
 	int error;
+	val = 0;
 
-	val = 0;
 	error = emulate_rdmsr(ctx, *pvcpu, vme->u.msr.code, &val);
+
 	if (error != 0) {
-		fprintf(stderr, "rdmsr to register %#x on vcpu %d\n",
+		fprintf(stderr, "rdmsr to register %#x ignored on vcpu %d\n\r",
 		    vme->u.msr.code, *pvcpu);
 		if (strictmsr)
 			return (VMEXIT_ABORT);
 	}
-
 	eax = val;
 	error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RAX, eax);
 	assert(error == 0);
@@ -332,7 +331,6 @@
 	edx = val >> 32;
 	error = vm_set_register(ctx, *pvcpu, VM_REG_GUEST_RDX, edx);
 	assert(error == 0);
-
 	return (VMEXIT_CONTINUE);
 }
 
@@ -343,7 +341,7 @@
 
 	error = emulate_wrmsr(ctx, *pvcpu, vme->u.msr.code, vme->u.msr.wval);
 	if (error != 0) {
-		fprintf(stderr, "wrmsr to register %#x(%#lx) on vcpu %d\n",
+		fprintf(stderr, "wrmsr to register %#x(%#lx) ignored on vcpu %d\n\r",
 		    vme->u.msr.code, vme->u.msr.wval, *pvcpu);
 		if (strictmsr)
 			return (VMEXIT_ABORT);
@@ -676,6 +674,7 @@
 	argc -= optind;
 	argv += optind;
 
+	printf("BHyve compiled: %s \n\r\n\r", compiledate );
 	if (argc != 1)
 		usage(1);
 
Index: usr.sbin/bhyve/xmsr.c
===================================================================
--- usr.sbin/bhyve/xmsr.c	(revision 264582)
+++ usr.sbin/bhyve/xmsr.c	(working copy)
@@ -38,24 +38,72 @@
 #include
 #include "xmsr.h"
+#include "xmsr-info.h"
+#define BIT(b) (1<