From owner-freebsd-virtualization@FreeBSD.ORG Mon Dec 23 19:44:29 2013 Return-Path: Delivered-To: virtualization@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3E6AFB8A for ; Mon, 23 Dec 2013 19:44:29 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id EEB20194C for ; Mon, 23 Dec 2013 19:44:28 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id CE8F1B9CC for ; Mon, 23 Dec 2013 14:44:27 -0500 (EST) From: John Baldwin To: virtualization@freebsd.org Subject: [PATCH] Support for S5 (soft power off) in bhyve Date: Mon, 23 Dec 2013 14:43:39 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20130906; KDE/4.5.5; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201312231443.39282.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 23 Dec 2013 14:44:28 -0500 (EST) X-BeenThere: freebsd-virtualization@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussion of various virtualization techniques FreeBSD supports." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Dec 2013 19:44:29 -0000 One minor nit I've found annoying is that there is no easy way to break out of the loop in vmrun.sh. Breaking into the loader prompt and typing 'reboot' does work, but it requires you to be on the console at the time. What I really want is for 'shutdown -p' or 'poweroff' inside a guest to cleanly shut down and exit the loop in vmrun.sh. To that end, I've implemented support for a few more registers (such of which are non-optional in ACPI) including the Reset Control register (0xcf9), ACPI Power Management 1 Event registers (PM1_EVT) and the ACPI Power Management 1 Control register (PM1_CNT). I added a valid _S5 package and catch writes of the value _S5 specifies to ask bhyve to exit gracefully but with an exit code of 1 (so the loop in vmrun terminates). Most of the PM1 support is very simple (it doesn't support signalling anything for external events as most of them don't make sense). It does perform a reset for writes to the 0xcf9 register however, and also advertises 0xcf9 as the ACPI reset register. (Tested by commenting out the current 0x64 hack.) One thing I had to do was make it possible for an I/O port to request a reset or poweroff, so I added some #define's for additional return codes from in/out handlers. I haven't gone through and changed any existing handlers to use the new constants, but the new handlers make use of INOT_RESET and INOUT_POWEROFF. Note that this also means that the current 0x64 hack could be reimplemented as an in-out handler rather than a special case. I did not explicitly catch VMEXIT_POWEROFF and use a custom exit value in the main loop in bhyverun.c. Instead, I let it fall through to the default and do an exit(1). The patch is all in bhyve itself: --- //depot/vendor/freebsd/src/usr.sbin/bhyve/Makefile +++ //depot/user/jhb/bhyve/usr.sbin/bhyve/Makefile @@ -10,7 +10,7 @@ SRCS= acpi.c atpic.c bhyverun.c block_if.c consport.c dbgport.c elcr.c SRCS+= inout.c legacy_irq.c mem.c mevent.c mptbl.c pci_ahci.c SRCS+= pci_emul.c pci_hostbridge.c pci_lpc.c pci_passthru.c pci_virtio_block.c -SRCS+= pci_virtio_net.c pci_uart.c pit_8254.c pmtmr.c post.c rtc.c +SRCS+= pci_virtio_net.c pci_uart.c pit_8254.c pm.c pmtmr.c post.c rtc.c SRCS+= uart_emul.c virtio.c xmsr.c spinup_ap.c .PATH: ${.CURDIR}/../../sys/amd64/vmm --- //depot/vendor/freebsd/src/usr.sbin/bhyve/acpi.c +++ //depot/user/jhb/bhyve/usr.sbin/bhyve/acpi.c @@ -85,6 +85,8 @@ #define BHYVE_ASL_SUFFIX ".aml" #define BHYVE_ASL_COMPILER "/usr/sbin/iasl" +#define BHYVE_PM1A_EVT_ADDR 0x400 +#define BHYVE_PM1A_CNT_ADDR 0x404 #define BHYVE_PM_TIMER_ADDR 0x408 static int basl_keep_temps; @@ -332,9 +344,11 @@ EFPRINTF(fp, "[0001]\t\tACPI Disable Value : 00\n"); EFPRINTF(fp, "[0001]\t\tS4BIOS Command : 00\n"); EFPRINTF(fp, "[0001]\t\tP-State Control : 00\n"); - EFPRINTF(fp, "[0004]\t\tPM1A Event Block Address : 00000000\n"); + EFPRINTF(fp, "[0004]\t\tPM1A Event Block Address : %08X\n", + BHYVE_PM1A_EVT_ADDR); EFPRINTF(fp, "[0004]\t\tPM1B Event Block Address : 00000000\n"); - EFPRINTF(fp, "[0004]\t\tPM1A Control Block Address : 00000000\n"); + EFPRINTF(fp, "[0004]\t\tPM1A Control Block Address : %08X\n", + BHYVE_PM1A_CNT_ADDR); EFPRINTF(fp, "[0004]\t\tPM1B Control Block Address : 00000000\n"); EFPRINTF(fp, "[0004]\t\tPM2 Control Block Address : 00000000\n"); EFPRINTF(fp, "[0004]\t\tPM Timer Block Address : %08X\n", @@ -397,10 +411,10 @@ EFPRINTF(fp, "[0001]\t\tBit Width : 08\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 01 [Byte Access:8]\n"); - EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000001\n"); + EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000CF9\n"); EFPRINTF(fp, "\n"); - EFPRINTF(fp, "[0001]\t\tValue to cause reset : 00\n"); + EFPRINTF(fp, "[0001]\t\tValue to cause reset : 06\n"); EFPRINTF(fp, "[0003]\t\tReserved : 000000\n"); EFPRINTF(fp, "[0008]\t\tFACS Address : 00000000%08X\n", basl_acpi_base + FACS_OFFSET); @@ -412,7 +426,8 @@ EFPRINTF(fp, "[0001]\t\tBit Width : 20\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 02 [Word Access:16]\n"); - EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000001\n"); + EFPRINTF(fp, "[0008]\t\tAddress : 00000000%08X\n", + BHYVE_PM1A_EVT_ADDR); EFPRINTF(fp, "\n"); EFPRINTF(fp, @@ -431,7 +446,8 @@ EFPRINTF(fp, "[0001]\t\tBit Width : 10\n"); EFPRINTF(fp, "[0001]\t\tBit Offset : 00\n"); EFPRINTF(fp, "[0001]\t\tEncoded Access Width : 02 [Word Access:16]\n"); - EFPRINTF(fp, "[0008]\t\tAddress : 0000000000000001\n"); + EFPRINTF(fp, "[0008]\t\tAddress : 00000000%08X\n", + BHYVE_PM1A_CNT_ADDR); EFPRINTF(fp, "\n"); EFPRINTF(fp, @@ -603,6 +619,11 @@ EFPRINTF(fp, "DefinitionBlock (\"bhyve_dsdt.aml\", \"DSDT\", 2," "\"BHYVE \", \"BVDSDT \", 0x00000001)\n"); EFPRINTF(fp, "{\n"); + EFPRINTF(fp, " Name (_S5, Package (0x02)\n"); + EFPRINTF(fp, " {\n"); + EFPRINTF(fp, " 0x05,\n"); + EFPRINTF(fp, " Zero,\n"); + EFPRINTF(fp, " })\n"); EFPRINTF(fp, " Scope (_SB)\n"); EFPRINTF(fp, " {\n"); EFPRINTF(fp, " Device (PCI0)\n"); --- //depot/vendor/freebsd/src/usr.sbin/bhyve/bhyverun.c +++ //depot/user/jhb/bhyve/usr.sbin/bhyve/bhyverun.c @@ -72,6 +72,7 @@ #define VMEXIT_RESTART 2 /* restart current instruction */ #define VMEXIT_ABORT 3 /* abort the vm run loop */ #define VMEXIT_RESET 4 /* guest machine has reset */ +#define VMEXIT_POWEROFF 5 /* guest machine has powered off */ #define MB (1024UL * 1024) #define GB (1024UL * MB) @@ -296,12 +297,17 @@ return (vmexit_handle_notify(ctx, vme, pvcpu, eax)); error = emulate_inout(ctx, vcpu, in, port, bytes, &eax, strictio); - if (error == 0 && in) + if (error == INOUT_OK && in) error = vm_set_register(ctx, vcpu, VM_REG_GUEST_RAX, eax); - if (error == 0) + switch (error) { + case INOUT_OK: return (VMEXIT_CONTINUE); - else { + case INOUT_RESET: + return (VMEXIT_RESET); + case INOUT_POWEROFF: + return (VMEXIT_POWEROFF); + default: fprintf(stderr, "Unhandled %s%c 0x%04x\n", in ? "in" : "out", bytes == 1 ? 'b' : (bytes == 2 ? 'w' : 'l'), port); --- //depot/vendor/freebsd/src/usr.sbin/bhyve/inout.h +++ //depot/user/jhb/bhyve/usr.sbin/bhyve/inout.h @@ -33,6 +33,12 @@ struct vmctx; +/* Handler return values. */ +#define INOUT_ERROR -1 +#define INOUT_OK 0 +#define INOUT_RESET 1 +#define INOUT_POWEROFF 2 + typedef int (*inout_func_t)(struct vmctx *ctx, int vcpu, int in, int port, int bytes, uint32_t *eax, void *arg); --- /dev/null +++ //depot/user/jhb/bhyve/usr.sbin/bhyve/pm.c @@ -0,0 +1,139 @@ +/*- + * Copyright (c) 2013 Advanced Computing Technologies LLC + * Written by: John H. Baldwin + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD"); + +#include + +#include "inout.h" + +#define PM1A_EVT_ADDR 0x400 +#define PM1A_CNT_ADDR 0x404 + +/* + * Reset Control register at I/O port 0xcf9. Bit 2 forces a system + * reset when it transitions from 0 to 1. Bit 1 selects the type of + * reset to attempt: 0 selects a "soft" reset, and 1 selects a "hard" + * reset. + */ +static int +reset_handler(struct vmctx *ctx, int vcpu, int in, int port, int bytes, + uint32_t *eax, void *arg) +{ + static uint8_t reset_control; + + if (bytes != 1) + return (-1); + if (in) + *eax = reset_control; + else { + reset_control = *eax; + + /* Treat hard and soft resets the same. */ + if (reset_control & 0x4) + return (INOUT_RESET); + } + return (0); +} +INOUT_PORT(reset_reg, 0xCF9, IOPORT_F_INOUT, reset_handler); + +/* + * Power Management 1 Event Registers + * + * bhyve doesn't support any power management events currently, so the + * status register always returns zero. The enable register preserves + * its value but has no effect. + */ +static int +pm1_status_handler(struct vmctx *ctx, int vcpu, int in, int port, int bytes, + uint32_t *eax, void *arg) +{ + + if (bytes != 2) + return (-1); + if (in) + *eax = 0; + return (0); +} + +static int +pm1_enable_handler(struct vmctx *ctx, int vcpu, int in, int port, int bytes, + uint32_t *eax, void *arg) +{ + static uint16_t pm1_enable; + + if (bytes != 2) + return (-1); + if (in) + *eax = pm1_enable; + else + pm1_enable = *eax; + return (0); +} +INOUT_PORT(pm1_status, PM1A_EVT_ADDR, IOPORT_F_INOUT, pm1_status_handler); +INOUT_PORT(pm1_enable, PM1A_EVT_ADDR + 2, IOPORT_F_INOUT, pm1_enable_handler); + +/* + * Power Management 1 Control Register + * + * This is mostly unimplemented except that we wish to handle writes that + * set SPL_EN to handle S5 (soft power off). + */ +#define PM1_SLP_TYP 0x1c00 +#define PM1_SLP_EN 0x2000 +#define PM1_ALWAYS_ZERO 0xc003 + +static int +pm1_control_handler(struct vmctx *ctx, int vcpu, int in, int port, int bytes, + uint32_t *eax, void *arg) +{ + static uint16_t pm1_control; + + if (bytes != 2) + return (-1); + if (in) + *eax = pm1_control; + else { + /* + * Various bits are write-only or reserved, so force them + * to zero in pm1_control. + */ + pm1_control = *eax & ~(PM1_SLP_EN | PM1_ALWAYS_ZERO); + + /* + * If SLP_EN is set, check for S5. Bhyve's _S5_ method + * says that '5' should be stored in SLP_TYP for S5. + */ + if (*eax & PM1_SLP_EN) { + if ((pm1_control & PM1_SLP_TYP) >> 10 == 5) + return (INOUT_POWEROFF); + } + } + return (0); +} +INOUT_PORT(pm1_control, PM1A_CNT_ADDR, IOPORT_F_INOUT, pm1_control_handler); + -- John Baldwin