Revision as of 08:59, 10 October 2013

Automatic Ballooning

Introduction

When a Linux host is running out of memory, the kernel will take action to reclaim memory. This action may be detrimental to KVM guests' performance (e.g. swapping) or, in the extreme case, the kernel may kill a VM or an important virt stack component.

To help avoid this scenario, a KVM guest could automatically give memory to the host when the host is facing memory pressure. By doing so the guest may itself come under memory pressure, so we need a way for the guest to automatically get memory back.

Design

KVM guests have a driver called the balloon driver. This driver allows guests to shrink and grow their memory. The balloon driver supports two operations:

Inflate: memory is taken from the guest and given to the host (guest shrinks)
Deflate: memory is returned from the host to the guest (guest grows)

Today, both operations are manual. The automatic ballooning project is about making them completely automatic, based on host and guest needs.

Automatic Inflate

Automatic inflate is performed by QEMU (i.e. on the KVM host). QEMU registers for memory pressure events so that it's notified when the host is under memory pressure.

Current patches have pre-defined values to be used by QEMU when it receives a memory pressure notification from the host kernel. Those values are:

1MB on LOW pressure
2MB on MEDIUM pressure
4MB on CRITICAL pressure

For example, suppose the host is facing MEDIUM pressure and notifies QEMU. When QEMU receives the event, it asks the guest to inflate its balloon by 2MB. The guest in turn will shrink itself by 2MB and give that memory to the host.
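The mapping from pressure level to inflate size can be sketched as a small lookup. This is only an illustrative Python model, not the QEMU code: the names `INFLATE_STEP` and `on_pressure_event` and the `floor` parameter are assumptions made for the example.

```python
MIB = 1024 * 1024

# Inflate step per pressure level, using the pre-defined values listed above.
INFLATE_STEP = {
    "low": 1 * MIB,
    "medium": 2 * MIB,
    "critical": 4 * MIB,
}


def on_pressure_event(level, guest_size, floor=0):
    """Return the new guest size (in bytes) after one pressure notification.

    `floor` is a hypothetical lower bound so the model never shrinks the
    guest below a chosen size; the patches may not have such a limit.
    """
    step = INFLATE_STEP[level]
    return max(floor, guest_size - step)
```

With a 1024MB guest, a single MEDIUM event in this model gives a new target of 1022MB.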

Automatic Deflate

Automatic deflate is performed by the guest or, more specifically, by the balloon driver.

The virtio-balloon driver registers a callback with the shrinker API. That callback is called when the guest kernel is facing memory pressure, and the number of pages to be returned to the kernel is passed to the callback. The balloon driver shrink callback deflates the guest's balloon.

For example, suppose the guest shrunk itself because of pressure in the host. But after some time, the guest is running more applications, which causes memory pressure in the guest due to its reduced size. The guest kernel then starts reclaiming memory and calls all shrink callbacks. That's when the balloon driver's shrink callback runs, and deflates the balloon by the number of pages specified. This causes the guest to grow again.
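The behaviour of the shrink callback can be modelled in a few lines. The real driver is kernel C code; the Python class below is only a toy model, and the names `Balloon`, `inflate` and `shrink` are made up for the illustration.

```python
class Balloon:
    """Toy model of the guest balloon, counted in pages."""

    def __init__(self):
        self.pages = 0  # pages currently held by the balloon (given to the host)

    def inflate(self, nr_pages):
        # Host asked for memory: the guest shrinks by nr_pages.
        self.pages += nr_pages

    def shrink(self, nr_to_scan):
        """Model of the shrink callback.

        The guest kernel passes the number of pages it wants back; the
        balloon deflates by at most that many pages and reports how many
        it actually released.
        """
        freed = min(nr_to_scan, self.pages)
        self.pages -= freed
        return freed
```

In this model the balloon never returns more pages than it holds, so a request larger than the balloon simply empties it.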

GIT trees

Latest RFC version posted upstream

QEMU:

git://repo.or.cz/qemu/qmp-unstable.git balloon/auto-ballooning/rfc.v2 (or grab the patch from http://repo.or.cz/w/qemu/qmp-unstable.git/commit/59fec94e6396d6c32cdce47777440c3a988be63d)

Guest kernel:

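git://repo.or.cz/linux-2.6/luiz-linux-2.6.git virtio-balloon/auto-deflate/rfc (or grab the first two patches from the web interface at http://repo.or.cz/w/linux-2.6/luiz-linux-2.6.git/shortlog/refs/heads/virtio-balloon/auto-deflate/rfc)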

Development branches

QEMU:

git://repo.or.cz/qemu/qmp-unstable.git balloon/auto-ballooning/current

Guest kernel:

git://repo.or.cz/linux-2.6/luiz-linux-2.6.git virtio-balloon/auto-deflate/current

Testing

You have to set up the following before experimenting with auto-ballooning:

Install kernel 3.10 or higher in your host. Make sure the kernel options CONFIG_CGROUPS and CONFIG_MEMCG are enabled
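Build and install QEMU from git://repo.or.cz/qemu/qmp-unstable.git balloon/auto-ballooning/rfc.v2 (or grab the patch from http://repo.or.cz/w/qemu/qmp-unstable.git/commit/59fec94e6396d6c32cdce47777440c3a988be63d)
Build and install the guest kernel from git://repo.or.cz/linux-2.6/luiz-linux-2.6.git virtio-balloon/auto-deflate/rfc (or grab the first two patches from the web interface at http://repo.or.cz/w/linux-2.6/luiz-linux-2.6.git/shortlog/refs/heads/virtio-balloon/auto-deflate/rfc)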
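One way to check the host kernel options is to scan the kernel config file for the two required symbols. The helper below is a minimal sketch; `missing_options` is a made-up name, and where the config file lives (for example under /boot, or /proc/config.gz on some kernels) depends on the distribution, so the caller is expected to supply its text.

```python
REQUIRED = ("CONFIG_CGROUPS", "CONFIG_MEMCG")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in a kernel config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_MEMCG is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]
```

An empty list means both options are built in.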

After setting up the above, do the following to experiment with automatic deflate:

Pass -balloon virtio,auto-balloon=true when starting QEMU
Wait for the guest to boot, then generate some memory pressure in the guest (say a kernel build with lots of jobs)
Switch to QEMU's monitor and shrink the guest (say from 1G to 200MB)
Watch the guest increase its memory by running "free" within the guest (or "info balloon" in QEMU)
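The shrink and watch steps can also be scripted over QMP instead of typing them in the monitor. The sketch below assumes QEMU was additionally started with a QMP unix socket (for instance `-qmp unix:/tmp/qmp.sock,server,nowait`; the socket path is an assumption). It uses the QMP `balloon` command, whose `value` argument is in bytes, and `query-balloon`. The helper names are invented for the example.

```python
import json
import socket

MIB = 1024 * 1024


def balloon_cmd(target_mib):
    """Build the QMP command that sets the guest's balloon target size."""
    return {"execute": "balloon", "arguments": {"value": target_mib * MIB}}


def qmp_session(path, commands):
    """Send QMP commands over a unix socket and return their replies."""
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        stream = sock.makefile("rw")
        stream.readline()  # discard the QMP greeting banner
        # qmp_capabilities must be sent before any other command.
        for cmd in [{"execute": "qmp_capabilities"}] + list(commands):
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            while True:
                msg = json.loads(stream.readline())
                # Skip asynchronous events; keep only command replies.
                if "return" in msg or "error" in msg:
                    replies.append(msg)
                    break
    return replies
```

For example, `qmp_session("/tmp/qmp.sock", [balloon_cmd(200), {"execute": "query-balloon"}])` would shrink the guest to 200MB and then report the current balloon size; calling it with only `query-balloon` again later shows whether the guest has grown.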

There are two ways to play with automatic inflate:

The simplest thing is to create a memory constrained cgroup, but this will require code changes in QEMU because it's hardcoded to use the root cgroup (yes, we need a command-line option for that)
If you don't want to play with cgroups, you can overcommit your host by running several VMs in parallel. The VMs have to run a heavy memory workload
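For the cgroup route, it may help to see what registering for memory pressure events looks like from user space. The sketch below uses the cgroup v1 interface (an eventfd registered through `cgroup.event_control` against `memory.pressure_level`). It is a Python illustration, not the QEMU patch; the function names and the cgroup directory passed in are assumptions, and `os.eventfd` requires Python 3.10 or later on Linux.

```python
import os

LEVELS = ("low", "medium", "critical")


def registration_line(event_fd, pressure_fd, level):
    """Format the string written to cgroup.event_control."""
    if level not in LEVELS:
        raise ValueError("unknown pressure level: %s" % level)
    return "%d %d %s" % (event_fd, pressure_fd, level)


def wait_for_pressure(cgroup_dir, level):
    """Block until the memory cgroup at cgroup_dir reports the given pressure level."""
    efd = os.eventfd(0)
    pfd = os.open(os.path.join(cgroup_dir, "memory.pressure_level"), os.O_RDONLY)
    try:
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(registration_line(efd, pfd, level))
        # Blocks until the kernel signals the eventfd; returns the event count.
        return os.eventfd_read(efd)
    finally:
        os.close(pfd)
        os.close(efd)
```

Pointing `cgroup_dir` at a memory-constrained cgroup rather than the root one is the kind of change the first option above refers to.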

