Date:      Fri, 27 Mar 2015 23:13:23 +0000
From:      Andrew Daugherity <adaugherity@tamu.edu>
To:        "freebsd-xen@freebsd.org" <freebsd-xen@freebsd.org>
Subject:   Poor performance with FreeBSD 10.1 under Xen 4.2
Message-ID:  <115BE54D-078A-4C45-8904-861DAB316C03@tamu.edu>

Summary: FreeBSD 10.1/amd64 under Xen 4.2.5 is much slower than FreeBSD 9.3
in the same environment, especially at fork()


I recently installed a FreeBSD-10.1 VM under Xen, and was pleased to see
the XENHVM stuff is now integrated into GENERIC.  However, the system
seemed a little slow and lacking in "snappiness" -- the first
fetch/extraction of portsnap was particularly bad, taking at least 20
minutes.  It had been a while since I'd done that (as opposed to
'portsnap fetch update') so I wasn't sure how abnormal that was, but then
I noticed building stuff from ports, especially stuff using libtool, like
security/sssd, was extremely slow compared to physical hardware, so I
tested a 9.3 VM, which was much faster.

Importantly, it was not a typical case of a slow/overloaded CPU but more
like slow context switching/forking.  I would see high (40%) system CPU
percentage but low user, and usually the process at the top of the list
was sh.  It would take a long time between compiling files but when cc
finally ran it was quite fast, compiling each file in a second or two.
The system was not swapping and iostat (also xentop on the host) showed
minimal I/O load.

Tracing the sh process (which was libtool-related) with truss, I would see
it do some stuff, fork, wait several seconds, then do some more stuff,
rinse and repeat.  Using 'truss -f' to follow the child processes, there
was a noticeable delay associated with each fork() call.

This led me to do some benchmarking.  I found a fork() benchmark at [1]
and ran it on various systems.  Notably, on FreeBSD 10.1 (also 10.0) under
Xen, it was reasonably fast shortly after bootup (though still slower than
9.3), but would get slower on repeated runs, and significantly slower
after compiling some ports.  It would also run slowly if the system had
booted and then sat idle for a while.  The speed was inconsistent, as
occasionally after a period of idleness it would run somewhat faster again
without rebooting; also configure and compilation times of sssd were
inconsistent, but generally "slow", sometimes drastically so.
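
For readers who want to reproduce this kind of measurement without
building the tool at [1], here is a minimal sketch of a fork timing loop
in Python.  It is not the benchmark from [1]: it only forks and reaps
children one at a time (no exec step), and the function name time_forks
is made up for illustration.  Absolute numbers will therefore differ from
the figures in this message; only the trend across repeated runs is
comparable.

```python
import os
import time


def time_forks(count):
    """Fork `count` children one at a time, reap each, return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping Python cleanup handlers.
            os._exit(0)
        # Parent: reap the child so only one child exists at a time.
        os.waitpid(pid, 0)
    return time.perf_counter() - start


if __name__ == "__main__":
    for count in (100, 1000):
        elapsed = time_forks(count)
        print("Forked and reaped %d processes in %.6f seconds (%.0f/s)."
              % (count, elapsed, count / elapsed))
```

Running it several times in a row, and again after a port build, would
show whether the per-fork cost drifts upward as described above.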

FreeBSD 9.3 (with xenhvm_load="YES" in loader.conf) on the same Xen host
does not have this problem -- it fork()s more quickly and consistently;
FreeBSD 10.1 on KVM (unfortunately not on the same hardware) also appears
normal, as does 8.4 on (different but similar vintage) physical hardware,
and a Linux VM on the same Xen host.  Using one or two virtual CPUs does
not make much difference, and the host machine is otherwise idle, so it
does not appear to be an SMP issue.  I was using ZFS, but I have ruled
that out as a factor, as the problem occurs even without zfs.ko loaded
(/ is ufs).  Varying the memory between 1 and 8 GB did not seem to affect
anything either.  I also built a "NOHVM" 10.1 kernel to see if the Xen
drivers were at issue, but that did not help (it was actually a bit
slower), so it appears to be something deeper in the kernel or scheduler.

The Xen host is running Xen 4.2.5_02-0.7.1 with SLES 11 SP3 as the Dom0,
on a Dell 2950 with 8 physical CPU cores (dual socket, quad-core Xeon
E5420).  I have not experienced performance problems with any other guest
OS.

As FreeBSD 9.3 runs fine, I am using that for my FreeBSD VMs for now, but
hopefully 10.x can be fixed before 9-STABLE goes EOL!  Following are the
VM config, dmesg, and some benchmarks.


-Andrew

[1] https://github.com/mondalaci/fork-benchmark



Xen DomU config:
========
name="fbsd10"
description="FreeBSD 10.1 - testing"
uuid="ed88195c-dee4-0e44-5943-3deceac8a56c"
#memory=4096
memory=1024
maxmem=1024
vcpus=2
on_poweroff="destroy"
on_reboot="restart"
on_crash="preserve"
localtime=0
keymap="en-us"

builder="hvm"
device_model="/usr/lib/xen/bin/qemu-dm"
kernel="/usr/lib/xen/boot/hvmloader"
boot="c"
disk=[ 'phy:/dev/xc-test/fbsd10,hda,w', 'file:/root/FreeBSD-10.1-RELEASE-amd64-dvd1.iso,hdc:cdrom,r', ]
vif=[ 'mac=00:16:3e:3a:57:7a,bridge=br0,type=netfront', ]

stdvga=0
vnc=1
vncunused=1
viridian=0
acpi=1
pae=1
serial="pty"
========


dmesg:
========
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.1-RELEASE-p6 #0: Tue Feb 24 19:00:21 UTC 2015
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
XEN: Hypervisor version 4.2 detected.
CPU: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz (2493.90-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Family = 0x6  Model = 0x17  Stepping = 6
  Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x81282201<SSE3,SSSE3,CX16,SSE4.1,x2APIC,TSCDLT,HV>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
real memory  = 1073741824 (1024 MB)
avail memory = 1010737152 (963 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <Xen HVM>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  2
ioapic0: Changing APIC ID to 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-47 on motherboard
kbd1 at kbdmux0
random: <Software, Yarrow> initialized
xen_et0: <Xen PV Clock> on motherboard
Event timer "XENTIMER" frequency 1000000000 Hz quality 950
Timecounter "XENTIMER" frequency 1000000000 Hz quality 950
acpi0: <Xen> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xb008-0xb00b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc100-0xc10f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xf0000000-0xf1ffffff,0xf3000000-0xf3000fff at device 2.0 on pci0
vgapci0: Boot video device
xenpci0: <Xen Platform Device> port 0xc000-0xc0ff mem 0xf2000000-0xf2ffffff irq 28 at device 3.0 on pci0
xenstore0: <XenStore> on xenpci0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fdc0: No FDOUT register!
Timecounters tick every 10.000 msec
xctrl0: <Xen Control Device> on xenstore0
xenbusb_front0: <Xen Frontend Devices> on xenstore0
cd0 at ata1 bus 0 scbus1 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 0.10> Removable CD-ROM SCSI-0 device
cd0: Serial Number QM00003
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: cd present [1262221 x 2048 byte records]
xbd0: 6144MB <Virtual Block Device> at device/vbd/768 on xenbusb_front0
xbd0: attaching as ada0
xbd0: features: flush, write_barrier
xbd0: synchronize cache commands enabled.
xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0
xn0: Ethernet address: 00:16:3e:3a:57:7a
xenbusb_back0: <Xen Backend Devices> on xenstore0
xn0: backend features: feature-sg feature-gso-tcp4
random: unblocking device.
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ada0p2 [rw]...
xn0: 2 link states coalesced
========


"NOHVM" kernel config (not the dmesg above, but presented for completeness):
========
include GENERIC
ident NOHVM

# NOTE: XENHVM depends on xenpci.  They must be added or removed together.
nooptions 	XENHVM			# Xen HVM kernel infrastructure
nodevice	xenpci			# Xen HVM Hypervisor services driver
========


Benchmarks:
===========
Fork benchmark -- ./fork-benchmark <numprocs>:

10.1, 2 CPU, fresh boot:
Forked, executed and destroyed 100 processes in 0.268835 seconds.
Forked, executed and destroyed 1000 processes in 2.362202 seconds.
Forked, executed and destroyed 1000 processes in 2.642716 seconds.
Forked, executed and destroyed 10000 processes in 28.75984 seconds.
Forked, executed and destroyed 10000 processes in 34.568837 seconds.
Forked, executed and destroyed 10000 processes in 52.69006 seconds.
Forked, executed and destroyed 10000 processes in 53.41585 seconds.

10.1, 1 CPU, after compiling sssd:
Forked, executed and destroyed 100 processes in 5.684971 seconds.
Forked, executed and destroyed 1000 processes in 60.330680 seconds.

10.1, 2 CPU, NOHVM kernel, after compiling sssd:
Forked, executed and destroyed 5000 processes in 102.849662 seconds.
Forked, executed and destroyed 5000 processes in 107.160831 seconds.
Forked, executed and destroyed 100 processes in 2.524160 seconds.
Forked, executed and destroyed 1000 processes in 19.592753 seconds.

9.3, 1 CPU:
Forked, executed and destroyed 5000 processes in 8.416964 seconds.

9.3, 2 CPU:
1: Forked, executed and destroyed 5000 processes in 9.951971 seconds.
2: Forked, executed and destroyed 5000 processes in 10.185864 seconds.
3: Forked, executed and destroyed 5000 processes in 10.124263 seconds.
(remains consistent)
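
To compare runs with different process counts, it can help to convert
each result line into processes per second.  The sketch below does that
for three of the lines above; the helper name rate and the sample labels
are chosen here for illustration and are not part of the benchmark.

```python
import re

# Matches the count and time in a fork-benchmark result line.
LINE = re.compile(r"destroyed (\d+) processes in ([\d.]+) seconds")


def rate(line):
    """Return processes per second for one benchmark result line."""
    match = LINE.search(line)
    if match is None:
        raise ValueError("not a benchmark result line: %r" % line)
    return int(match.group(1)) / float(match.group(2))


samples = {
    "10.1, 2 CPU, fresh boot":
        "Forked, executed and destroyed 10000 processes in 28.75984 seconds.",
    "10.1, 1 CPU, after compiling sssd":
        "Forked, executed and destroyed 1000 processes in 60.330680 seconds.",
    "9.3, 1 CPU":
        "Forked, executed and destroyed 5000 processes in 8.416964 seconds.",
}

for label, line in samples.items():
    print("%-36s %8.1f processes/s" % (label, rate(line)))
```

On these three lines the rates come out at roughly 348, 17, and 594
processes per second respectively.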

Compilation times -- cd /usr/ports/security/sssd; make clean; time make configure; time make build
configure:
9.3, 1 CPU:	22.804u 10.764s 0:40.19 83.5% 1400+2497k 816+7885io 456pf+0w
9.3, 2 CPU:	25.732u 14.651s 0:42.38 95.2%   1326+2432k 164+7885io 30pf+0w
10.1, 1 CPU:	148.992u 68.372s 3:38.52 99.4% 2325+197k 0+294io 3pf+0w
10.1, 2 CPU:	1.156u 29.289s 1:02.47 96.7% 4602+225k 774+300io 654pf+0w
(again):	35.229u 21.117s 0:49.30 114.2% 4667+221k 0+291io 0pf+0w
10.1 NOHVM:	80.236u 51.313s 1:51.45 118.0% 2930+200k 0+296io 30pf+0w

build:
9.3, 1 CPU:	233.998u 145.352s 6:22.51 99.1% 1360+2777k 287+3966io 32pf+0w
9.3, 2 CPU:	280.641u 230.728s 4:24.23 193.5% 1157+2675k 0+3968io 0pf+0w
10.1, 1 CPU:	3199.849u 764.871s 1:06:26.72 99.4% 753+182k 203+28io 86pf+0w
10.1, 2 CPU:	744.318u 549.327s 11:02.38 195.3% 2388+193k 235+28io 86pf+0w
(again):	1072.863u 747.565s 15:30.05 195.7% 2119+192k 3+29io 0pf+0w
10.1 NOHVM:	1173.692u 823.116s 17:06.46 194.5% 1725+188k 0+28io 0pf+0w

Note the 10.1/1 CPU build took over an hour!  I'm fairly certain I had a
10.1/2 CPU build also take around an hour, but I didn't manage to capture
it with time(1).
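
As a worked comparison of the build timings above, the elapsed field of
the csh-style time output ([h:]m:s) can be converted to seconds and
divided.  This is only a sketch; the helper name to_seconds is
hypothetical, and the values are copied from the "build:" table.

```python
def to_seconds(elapsed):
    """Convert an elapsed field such as '6:22.51' or '1:06:26.72' to seconds."""
    total = 0.0
    for part in elapsed.split(":"):
        # Each colon-separated field is worth 60x the one after it.
        total = total * 60 + float(part)
    return total


# Elapsed wall-clock times of 'make build' from the table above.
builds = {
    "1 CPU": {"9.3": "6:22.51", "10.1": "1:06:26.72"},
    "2 CPU": {"9.3": "4:24.23", "10.1": "11:02.38"},
}

for cpus, times in builds.items():
    ratio = to_seconds(times["10.1"]) / to_seconds(times["9.3"])
    print("%s: 10.1 build took %.1fx as long as 9.3" % (cpus, ratio))
```

For the runs captured here that works out to a bit over 10x with one CPU
and about 2.5x with two.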


