From: Jia-Shiun Li
Date: Thu, 2 Apr 2015 19:19:44 +0800
Subject: Re: Bhyve: Investigating poor guest performance when host is busy
To: Peter Grehan
Cc: freebsd-virtualization@freebsd.org, Stefan Andritoiu
In-Reply-To: <551B367B.4020104@freebsd.org>
References: <551B367B.4020104@freebsd.org>

On Wed, Apr 1, 2015 at 8:06 AM, Peter Grehan wrote:

>> 1. Can anyone tell me what the cause might be? What may be happening?
>> 2. Do you know if there is currently any work investigating this
>> problem? Or anything related?
>> 3. Is Gang Scheduling or Coscheduling implemented in FreeBSD?
>> 4. Do you know of any other solution to this kind of problem?
>> 5. Can you recommend me any papers/videos/links in any way related to this?
>
> I answered these in the FreeBSD forums post, but reproduced again here
> for the list:
>
> 1. The main issue is 'lock holder preemption', where a vCPU that is
> holding a spinlock has been pre-empted by the host scheduler, causing
> other vCPUs that are trying to acquire that lock to spin for full quanta.
>
> Booting is a variant of this for FreeBSD since the AP spins on a memory
> location waiting for a BSP to start up.
>
> 2. There's some minor investigation going on.
>
> 3. No.
>
> 4. I don't know that 'classic' gang scheduling is the answer (see 5). What
> has been thought of for bhyve at least is to a) have the concept of vCPU
> 'groups' in the scheduler, b) provide metrics to assist the scheduler in
> trying to spread out threads associated with a vCPU group so they don't end
> up on the same physical CPU (avoidance of lock-holder preemption), and c)
> implement pause-loop exits (see the Intel SDM, 24.6.13) in the hypervisor
> and provide that information to the scheduler so it can give a temporary
> priority boost to vCPUs that have been preempted but aren't currently
> running.
>
> 5. The classic reference on this is VMware's scheduler paper:
> www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf

I have recently been seeing similar behavior on a Windows host with VMware
Workstation. The guest boots several times slower than before, and console
messages apparently scroll more slowly once all the APs have been started.
Booting natively is not slowed nearly as much.

I guess that's related to the recent SMP changes that bring APs online early?
I wonder whether the APs could be halted, or brought up later, once the kernel
can really begin to schedule tasks onto them.

-Jia-Shiun
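
To make the 'lock holder preemption' effect from answer 1 concrete, here is a minimal toy model. It is only a sketch: time is counted in abstract ticks, the host preempts the lock-holding vCPU for exactly one quantum, and the function name and parameters (spin_ticks, critical_section, preempt_at, quantum) are hypothetical and not taken from bhyve or the FreeBSD scheduler.

```python
def spin_ticks(critical_section, preempt_at=None, quantum=0):
    """Return how many ticks a waiting vCPU spins on a held spinlock.

    critical_section: ticks of real work the lock holder needs (assumed).
    preempt_at:       tick at which the host deschedules the holder, or None.
    quantum:          length of that preemption in ticks (assumed).
    """
    tick = 0
    done = 0  # critical-section work completed by the lock holder so far
    while done < critical_section:
        preempted = (
            preempt_at is not None
            and preempt_at <= tick < preempt_at + quantum
        )
        if not preempted:
            done += 1  # the holder only makes progress while it is running
        tick += 1      # the waiter burns a tick spinning either way
    return tick
```

In this model, spin_ticks(5) returns 5, while spin_ticks(5, preempt_at=2, quantum=100) returns 105: the waiter spins through the holder's entire host quantum even though the critical section itself is only 5 ticks long.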