Zvol backed bhyve guest

Things have moved forward in the world of bhyve since I last looked at it over a year ago, support for zvol backed guests where fixed long ago among other things such as the birth of vmrc by Michael Dexter.
To run a guest with a ZFS zvol as its disk is no different to running with a disk image, only thing is that my version of /usr/share/examples/bhyve/vmrun.sh (11.0-CURRENT r267869 at the time of writing) fails to start from the disk once the OS has been installed.

A typical deployment scenario would be

Create a zvol of some size, e.g. 10GB

zfs create -V 10g zroot/guest0

Start a guest which’ll boot from the FreeBSD install CD iso & install onto the zvol

# sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 1024M -t tap0 -d /dev/zvol/zroot/guest0 -i -I FreeBSD-10.0-RELEASE-amd64-disc1.iso guest0

Use the “ZFS – Automatic Root-on-ZFS” option from the Partitioning menu

As instructed in the bhyve section of the handbook, before rebooting, drop to the shell & edit /etc/ttys & replace the console line with

console "/usr/libexec/getty std.9600" xterm on secure

Shutdown the operating system
halt -p

Kill the guest
/usr/sbin/bhyvectl --destroy --vm=guest0

Create a new guest
bhyveload -m 4G -d /dev/zvol/zroot/guest0 guest0

Boot the new guest from the zvol
bhyve -c 1 -m 4G -A -H -P -s0:0,hostbridge -s 1:0,virtio-net,tap0 -s 2:0,ahci-hd,/dev/zvol/zroot/guest0 -s 31,lpc -l com1,stdio guest0

These instruction skip the creation of networking which is covered in the FreeBSD handbook as linked to above.

Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014
root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (3399.54-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x306a9 Family = 0x6 Model = 0x3a Stepping = 9
Features=0x8f83ab7f
Features2=0xfe9a6257
AMD Features=0x20100800
AMD Features2=0x1
Standard Extended Features=0x200
TSC: P-state invariant
real memory = 5368709120 (5120 MB)
avail memory = 4103143424 (3913 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table:
ioapic0 irqs 0-23 on motherboard
random: initialized
module_register_init: MOD_LOAD (vesa, 0xffffffff80cfa5e0, 0) error 19
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
atrtc0: port 0x70-0x71 irq 8 on acpi0
Event timer "HPET" frequency 10000000 Hz quality 550
Event timer "HPET1" frequency 10000000 Hz quality 450
Event timer "HPET2" frequency 10000000 Hz quality 450
Event timer "HPET3" frequency 10000000 Hz quality 450
Event timer "HPET4" frequency 10000000 Hz quality 450
Event timer "HPET5" frequency 10000000 Hz quality 450
Event timer "HPET6" frequency 10000000 Hz quality 450
Event timer "HPET7" frequency 10000000 Hz quality 450
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: port 0x408-0x40b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
virtio_pci0: port 0x2000-0x201f mem 0xc0000000-0xc0001fff irq 16 at device 1.0 on pci0
vtnet0: on virtio_pci0
virtio_pci0: host features: 0x1018020
virtio_pci0: negotiated features: 0x18020
vtnet0: Ethernet address: 00:a0:98:7f:5a:41
virtio_pci1: port 0x2040-0x207f mem 0xc0002000-0xc0003fff irq 17 at device 2.0 on pci0
vtblk0: on virtio_pci1
virtio_pci1: host features: 0x10000044
virtio_pci1: negotiated features: 0x10000044
vtblk0: 40960MB (83886080 512 byte sectors)
isab0: at device 31.0 on pci0
isa0: on isab0
uart0: port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
uart1: port 0x2f8-0x2ff irq 3 on acpi0
sc0: at flags 0x100 on isa0
sc0: MDA
vga0: at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff on isa0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: cannot reserve I/O port range
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
random: unblocking device.
Netvsc initializing... Timecounter "TSC-low" frequency 1699769676 Hz quality 1000
Trying to mount root from zfs:zroot/ROOT/default []...

bhyve – BSD Hypervisor

With the videos released last month from euroBSDcon 2012, I watched Michael Dexter’s talk on bhyve, the BSD hypervisor has come along way since I last tried it over a year ago & Michael has helped a with it’s progress by writing articles on CFT & scripts for running bhyve.
Last week I decided to get myself a server which I could use to do builds quickly & to run virtual machines for testing. Hetzner do high spec consumer hardware as servers,  €59 per month get you a i7 with 32GB of RAM & 2x 3TB HDD, I ordered the server along with a 16GB USB flash drive with the plan of running SmartOS, once my login details for the server came through, I raised a support ticket for access to a IP KVM, within the hour I was given access & the installation went seamlessly. SmartOS was running on my server & it all went down hill from there.
As there is a IPv4 address shortage, hetzner charge a premium for additional addresses as a routed subnet, along with an additional fee for having the ability to request additional addressses as a “flexi pack”, a /27 would cost €47, I was not going to pay this so decided to go IPv6 only as I have connectivity at home & work. Unfortunately, though IPv6 support is there in the core of SmartOS by interitence from OpenSolaris, the additions from Joyent for KVM don’t, main culprit being vmadm(1m), after losing two days trying to get things working I came to the conclusion that A) it would be a big pain to maintain going forward as the burden would be on me to work around the shortfalls of the system B) I didn’t want to maintain my own release with third party patches which were not in yet C) I didn’t like the way I would have to extend the system to add functionality eg to set the hostname for your system persistently you have to use a script D) getting IPv6 support to guests was painful.

The majority of the work I’m doing is oriented around FreeBSD, it takes over 4 hours to do a build world & kernel on my ThinkPad X61s with a 1.6GHz Core2Duo so anything that can prolong it’s life & give me new builds quickly is good. I placed another support request for IP KVM (LARA in the world of hetzner) & once I had the login details I netbooted the server to  their FreeBSD rescue environment which is a FreeBSD 8.3 based copy of mfsBSD. From there I fetched the latest FreeBSD-CURRENT usb image & wrote it to the flash drive using dd(1) & went about setting up a mirrored zpool to install FreeBSD onto.

Once the installation was complete & the system was up & running I revisited Michael’s talk, slides & scripts.
His scripts are numbered sequentially so you can easily go from creating a disk image to running & managing your virtual machines. This article covers a summary of what is involved to get a guest VM ready with FreeBSD-CURRENT built from source which are taken from his scripts & slides. As development has progressed since the talk, some things which are performed are no longer required. Essentially, you can boot a stock system from a disk image with only 2 necessary modifications to stock configuration files for dealing with the console.
There is also a vmrun.sh script which simplifies the whole process to try out (see instructions)

First build world & kernel (not necessary, you can use the precompiled binary instead if you choose)

On the host add the following to /boot/loader.conf
vmm_load="YES"
if_tap_load="YES"
bridgestp_load="YES"
if_bridge_load="YES"
bridgestp_load="YES"

Create a file which will be used as your disk, eg a 80GB one
truncate -s 80G disk.img
Create a md(4) disk with the file you just created
mdconfig disk.img
Initialise the disk to use the entire disk as a freebsd slice
fdisk -BI md0

You’ll receive the following error which can be safely ignored
******* Working on device /dev/md0 *******
fdisk: invalid fdisk partition table found

Write a standard label & boot code to slice 1
bsdlabel -wB /dev/md0s1
Write a filesystem to slice 1a
newfs -U /dev/md0s1a
Mount it to /mnt
mount /dev/md0s1a /mnt

From /usr/src, install world, kernel & distribution (contents of /etc) onto the disk image
make installworld DESTDIR=/mnt
make installkernel DESTDIR=/mnt
make distribution DESTDIR=/mnt

Setup your fstab to mount root from /dev/vtbd0s1a
echo "/dev/vtbd0s1a / ufs rw 1 1" > /mnt/etc/fstab
Configure your console
echo 'console "/usr/libexec/getty std.9600" vt100 on secure' > /mnt/etc/ttys
echo 'console="userboot"' > /mnt/boot/loader.conf

Aside from configuring /etc/rc.conf the instructions above cover the bare minimum to get a booting VM.

From Michael’s 2-install-guest.sh I’ve skipped loading the virtio drivers in /boot/loader.conf as they’re loaded by default in FreeBSD-CURRENT & the following though I’ve not given it more testing
Helps Kernel detected it’s running in a virtualised environment
smbios.bios.vendor="Bochs"
Avoid clock drift
kern.timecounter.hardware="TSC"
kern.timecounter.invariant_tsc="1"

PCI pass-through support as it caused hangs
hw.pci.enable_msix="0"
hw.pci.honor_msi_blacklist="0"

Unmount the file system
umount /mnt
Detach the file from md(4)
mdconfig -d -u 0
Assuming you’re using md0
You can get a list of configured devices with
mdconfig -l

As covered in 3-host-prep.sh you can load the required kernel modules for bhyve & guest networking by running
kldload vmm
kldload if_tap
kldload bridgestp
kldload if_bridge
or rebooting 🙂

Before starting your VM, you need to create the needed interfaces, a tap(4) interfaces with a bridge(4) linked to the interface you want the VM to be able to communicate on, in my case a re(4)
ifconfig tap0 create up
ifconfig bridge0 create up
ifconfig bridge0 addm tap0 addm re0 up

Because of STP, once you have started the virtual machine, you should pause at the boot menu by pressing space & waiting 20 seconds until STP has stabilised otherwise you may find strange issues with you guest not being able to communicate properly.
If you restart a VM, it is also important to destroy the tap & bridge interfaces before starting up again or you will again experience odd behaviour e.g I was seeing traffic come in to the VM but not going out.
ifconfig tap0 destroy
ifconfig bridge0

To start a VM with less than 4GB RAM issue
sudo bhyveload -d /path/to/disk.img -m 256 vmname && sudo bhyve -c 1 -a -A -m 256 -I -H -g 0 -s 0:0,hostbridge -s 2:0,virtio-blk,/path/to/disk.img -s 1:0,virtio-net,tap0 -S 31,uart,stdio vmname
This will start a VM called vmname which uses 256MB RAM.

To start a VM which uses 4GB or more you’ll have to specify memory settings differently as you need to lead space for PCI MMIO decode below 4GB, so for example, if you wanted to use 8GB RAM, you’d issue
sudo bhyveload -d /path/to/disk.img -m 2048 -M 6144 vmname && sudo bhyve -c 1 -a -A -m 2048 -M 6144 -I -H -g 0 -s 0:0,hostbridge -s 2:0,virtio-blk,/path/to/disk.img -s 1:0,virtio-net,tap0 -S 31,uart,stdio vmname

To shutdown a VM issue
bhyvectl --vm=vmname --destroy

My next step is to now see how to use a ZFS filesystem instead of a file based disk for the VM.