Small look inside


This text gives a short explanation of what KVM does internally. It was written against kvm-54, so later versions of KVM may differ from what is described here.

svm = secure virtual machine (AMD)

vmx = virtual machine extensions (Intel)

Loading Modules

svm (AMD)

If one loads the module svm.ko, the function registered with module_init() is invoked. As in most modules, this is the module's own init function, here called svm_init(). This function lives in svm.c.

The svm_init() function does nothing special. It calls kvm_init() with a struct kvm_x86_ops. This structure is defined in x86.h. kvm_init() is the init function in kvm_main.c. If we look at the init functions of svm.c and vmx.c, we see that both call kvm_init(), each with its own set of kvm_x86_ops and a different size argument: sizeof(struct vcpu_svm) or sizeof(struct vcpu_vmx).


  • We have a struct kvm_x86_ops (svm.c) filled with a lot of function pointers.
  • We have a struct vcpu_svm (kvm_svm.h), of which we initially only need the size.
  • We call kvm_init() (kvm_main.c).
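The pattern in the three points above (a table of function pointers handed to a generic init function through an opaque pointer, together with a structure size) can be sketched in plain userspace C. This is only an illustration, not the kvm-54 code: every demo_* name is hypothetical, and the real struct kvm_x86_ops has many more members.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct kvm_x86_ops. */
struct demo_ops {
    const char *name;
    int (*cpu_has_support)(void);
    int (*hardware_setup)(void);
};

/* Hypothetical stand-in for struct vcpu_svm; only its size matters here. */
struct demo_vcpu_svm {
    int common_state;
    unsigned long vendor_state;
};

/* Plays the role of the global kvm_x86_ops pointer. */
static const struct demo_ops *active_ops;
static size_t active_vcpu_size;

static int demo_svm_has_support(void) { return 1; }
static int demo_svm_hardware_setup(void) { return 0; }

static struct demo_ops demo_svm_ops = {
    .name = "svm",
    .cpu_has_support = demo_svm_has_support,
    .hardware_setup = demo_svm_hardware_setup,
};

/* Generic init: knows nothing about SVM or VMX, only about the ops table. */
static int demo_kvm_init(void *opaque, size_t vcpu_size)
{
    struct demo_ops *ops = opaque;

    if (active_ops)
        return -1;                 /* a vendor module is already registered */
    if (!ops->cpu_has_support())
        return -2;                 /* no hardware support on this machine */

    active_ops = ops;
    active_vcpu_size = vcpu_size;
    return active_ops->hardware_setup();
}

/* Vendor-specific init: only supplies its ops table and vcpu size. */
static int demo_svm_init(void)
{
    return demo_kvm_init(&demo_svm_ops, sizeof(struct demo_vcpu_svm));
}
```

A second vendor module would differ only in the ops table and the size it passes, which is why the generic code never needs to know which one it is talking to.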

In kvm_init() the first thing that happens is a call to kvm_init_debug(). This function creates some debugfs entries. The kvm_stats_debugfs_item array is initialized in x86.c; there you can check which debugfs entries will exist. debugfs must be mounted before it can be used:

mount -t debugfs none /sys/kernel/debug

If you want it mounted at boot, add this line to /etc/fstab:

none /sys/kernel/debug debugfs defaults 0 0

Then it calls the function kvm_arch_init() with the struct kvm_x86_ops that svm.c passed in through the opaque parameter. The function kvm_arch_init() is defined in x86.c.

When we look at kvm_arch_init() we see that it calls kvm_mmu_module_init(), which is defined in mmu.c. This function creates three lookaside caches. Then it calls kvm_init_msr_list(). MSR stands for model-specific register.

In kvm_init_msr_list() we see that it probes the model-specific registers listed in the array msrs_to_save[] with rdmsr_safe() and keeps only those entries that can actually be read on this machine.

We go back to x86.c and kvm_arch_init(). The next task is to check whether the computer has hardware virtualization support, and then to assign the passed-in struct kvm_x86_ops to the global pointer named kvm_x86_ops. Then it calls kvm_mmu_set_nonpresent_ptes(), which is defined in mmu.c. Now it returns to kvm_init() in kvm_main.c.
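The probing step can be sketched as an in-place filter over a list of MSR indices. This is a userspace illustration under stated assumptions: demo_rdmsr_safe() is a hypothetical stand-in for rdmsr_safe() with a simplified signature, and it just pretends that odd-numbered indices do not exist on this machine.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for rdmsr_safe(): returns 0 if the register could
 * be read, nonzero otherwise. Here odd indices are treated as unreadable.
 */
static int demo_rdmsr_safe(uint32_t msr, uint64_t *value)
{
    if (msr & 1)
        return -1;
    *value = 0;
    return 0;
}

/*
 * Compacts the list in place so that only readable MSRs remain at the
 * front, and returns how many entries survived.
 */
static size_t demo_init_msr_list(uint32_t *msrs, size_t count)
{
    size_t i, j = 0;
    uint64_t dummy;

    for (i = 0; i < count; i++) {
        if (demo_rdmsr_safe(msrs[i], &dummy) != 0)
            continue;              /* not present: drop it from the list */
        if (j < i)
            msrs[j] = msrs[i];     /* close the gap left by dropped entries */
        j++;
    }
    return j;
}
```

The point of the "safe" read is that a missing register produces an error code instead of a fault, so the list can be pruned once at load time.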

Then the function allocates a page for the global struct page pointer named bad_page. Then it calls kvm_arch_hardware_setup(), which is defined in x86.c and simply returns kvm_x86_ops->hardware_setup(). We know we are using the svm module, so the kvm_x86_ops struct was initialized with functions from svm.c. So we look for .hardware_setup in our svm_x86_ops struct and see that it points to svm_hardware_setup().

In svm_hardware_setup() we first allocate two pages. Then memset() fills the memory at the page address of iopm_pages with the byte 0xff.... Then we allocate one page and do the same. After that it calls set_msr_interception() to set up which MSRs should be intercepted. Then comes the macro for_each_online_cpu(), which expands to a loop over all online CPUs in the computer; inside the loop the function svm_cpu_init() is called.
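Why fill the pages with 0xff? In an SVM I/O permission map each I/O port is represented by one bit, and a set bit means "intercept accesses to this port", so 0xff everywhere intercepts everything until individual bits are cleared. The sketch below shows that idea in userspace C. The two-page size follows the text above, and all demo_* names are hypothetical.

```c
#include <string.h>

#define DEMO_PAGE_SIZE  4096u
#define DEMO_IOPM_PAGES 2u       /* page count as described in the text */
#define DEMO_NR_PORTS   (DEMO_PAGE_SIZE * DEMO_IOPM_PAGES * 8u)

/* One bit per I/O port; a set bit means "intercept". */
static unsigned char demo_iopm[DEMO_PAGE_SIZE * DEMO_IOPM_PAGES];

/* Intercept every port by default, like the memset() with 0xff. */
static void demo_iopm_init(void)
{
    memset(demo_iopm, 0xff, sizeof(demo_iopm));
}

/* Clear one bit so that the guest may access this port directly. */
static void demo_allow_port(unsigned int port)
{
    if (port >= DEMO_NR_PORTS)
        return;
    demo_iopm[port / 8u] &= (unsigned char)~(1u << (port % 8u));
}

/* Returns 1 if accesses to the port are intercepted, 0 otherwise. */
static int demo_port_intercepted(unsigned int port)
{
    if (port >= DEMO_NR_PORTS)
        return 1;                /* out of range: treat as intercepted */
    return (demo_iopm[port / 8u] >> (port % 8u)) & 1;
}
```

Starting from "intercept everything" is the safe default: forgetting to clear a bit costs performance, while forgetting to set one would give the guest direct hardware access.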

The function svm_cpu_init() allocates memory for a struct svm_cpu_data. Its member "cpu" is set to the number of the CPU being initialized, and its member "save_area" gets one freshly allocated page. Then comes the line per_cpu(svm_data, cpu) = svm_data;. The per_cpu macro is defined in percpu.h and in turn uses the RELOC_HIDE() macro from compiler-gcc.h. At the beginning of svm.c, the line static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); creates a per-CPU variable at compile time. To get the value for the current CPU one can use get_cpu_var(); to access another processor's copy of the variable, one uses per_cpu(). So the assignment stores the pointer to the new svm_cpu_data struct in exactly one CPU's copy of svm_data. Then the for_each_online_cpu loop continues with the next CPU. Afterwards every CPU in the system has its own svm_cpu_data struct and one allocated page for svm_data->save_area.
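The per-CPU bookkeeping can be pictured as an array with one slot per CPU. The following userspace sketch uses a plain array in place of DEFINE_PER_CPU() and calloc() in place of the kernel allocators; the CPU count, the page size and all demo_* names are assumptions made for illustration.

```c
#include <stdlib.h>

#define DEMO_NR_CPUS   4
#define DEMO_PAGE_SIZE 4096

/* Hypothetical, reduced stand-in for struct svm_cpu_data. */
struct demo_cpu_data {
    int cpu;
    void *save_area;
};

/* Stand-in for the per-CPU variable: one pointer slot per CPU. */
static struct demo_cpu_data *demo_svm_data[DEMO_NR_CPUS];

/* Allocates the data for one CPU and stores it in that CPU's slot. */
static int demo_cpu_init(int cpu)
{
    struct demo_cpu_data *data = calloc(1, sizeof(*data));

    if (!data)
        return -1;
    data->cpu = cpu;
    data->save_area = calloc(1, DEMO_PAGE_SIZE);   /* "one page" */
    if (!data->save_area) {
        free(data);
        return -1;
    }
    demo_svm_data[cpu] = data;   /* like per_cpu(svm_data, cpu) = svm_data */
    return 0;
}

/* Stand-in for the for_each_online_cpu() loop in the setup function. */
static int demo_hardware_setup(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        if (demo_cpu_init(cpu) != 0)
            return -1;
    }
    return 0;
}
```

In the kernel, the per-CPU mechanism additionally places each copy in memory local to its CPU, which a simple array does not model.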

Now we are back in the kvm_init() function in kvm_main.c. We loop over every online CPU and call smp_call_function_single() for each one, which runs a function on a specific CPU. The function we hand to smp_call_function_single() is kvm_arch_check_processor_compat(), which lives in x86.c and only calls kvm_x86_ops->check_processor_compatibility. We look into svm.c again at the svm_check_processor_compat() function and see that this function gives back a NULL pointer (is that correct?). It seems that only the return value of the function smp_call_function_single() in smpcommon.c is interesting. It returns smp_ops.smp_call_function_mask(mask, func, info, wait);.

After that we are back in kvm_init(), where the macro on_each_cpu calls the function hardware_enable. This function gets the CPU number from raw_smp_processor_id() and checks it with cpu_isset(). If this CPU is not set yet, it calls cpu_set() and then kvm_arch_hardware_enable(). The function kvm_arch_hardware_enable() only calls kvm_x86_ops->hardware_enable.

We look at svm.c, into the svm_hardware_enable() function, and see that it also calls raw_smp_processor_id() to get the CPU id. Then it checks whether this CPU has SVM support built in. After that it fetches the svm_cpu_data struct with per_cpu(svm_data, me) and checks that svm_data is not NULL. Now it assigns values to the svm_cpu_data struct members (?). Then it reads and writes some model-specific registers, which enables the SVM extension on the CPU. The next wrmsrl() call writes ... (?)
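The check-then-set logic in hardware_enable makes the operation idempotent per CPU. A minimal userspace sketch, with an unsigned long standing in for the cpumask and hypothetical demo_* names throughout:

```c
/* Stand-in for the cpumask of CPUs that already have virtualization on. */
static unsigned long demo_cpus_hardware_enabled;

/* Counts how often the arch-specific enable hook actually ran. */
static int demo_enable_calls;

/* Stand-in for kvm_arch_hardware_enable(). */
static void demo_arch_hardware_enable(int cpu)
{
    (void)cpu;
    demo_enable_calls++;
}

/* Enables virtualization on a CPU only if it is not enabled already. */
static void demo_hardware_enable(int cpu)
{
    unsigned long bit = 1UL << cpu;

    if (demo_cpus_hardware_enabled & bit)   /* like cpu_isset() */
        return;
    demo_cpus_hardware_enabled |= bit;      /* like cpu_set() */
    demo_arch_hardware_enable(cpu);
}
```

Calling demo_hardware_enable() twice for the same CPU runs the arch hook only once, which matters because the same function is reused later from the hotplug path.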

Back in kvm_init() we look at register_cpu_notifier, which, while holding a mutex, calls raw_notifier_chain_register() to add a notifier to a raw notifier chain. The callback in this notifier_block is kvm_cpu_hotplug(). This function reacts to three notifications, CPU_DYING, CPU_UP_CANCELED and CPU_ONLINE, and disables or enables virtualization on the affected CPU.

register_reboot_notifier registers a function that will be called at reboot time.

Now we register the sysdev class with sysdev_class_register().

Then we add a system device to the tree with sysdev_register().

Now we create a slab cache with kmem_cache_create(), using vcpu_size as the object size. This kmem cache lets us meet the alignment requirements of fx_save. (?)

When we called kvm_init() from svm_init() we passed THIS_MODULE to kvm_init(), and now we set the svm module as owner of kvm_chardev_ops, which is a file_operations struct. This struct has only two members initialized, .unlocked_ioctl and .compat_ioctl, both of which point to the kvm_dev_ioctl function.
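A notifier chain is essentially a linked list of callbacks that is walked whenever an event occurs. The userspace sketch below mimics that and a hotplug callback reacting to the three events named above. The event constants, the per-CPU flag array and all demo_* names are hypothetical; the real kernel chain also handles priorities and return codes that are left out here.

```c
#include <stddef.h>

enum demo_cpu_event {
    DEMO_CPU_ONLINE,
    DEMO_CPU_DYING,
    DEMO_CPU_UP_CANCELED,
    DEMO_CPU_OTHER
};

/* Reduced stand-in for struct notifier_block. */
struct demo_notifier {
    int (*call)(struct demo_notifier *self, enum demo_cpu_event ev, int cpu);
    struct demo_notifier *next;
};

static struct demo_notifier *demo_chain;   /* head of the chain */
static int demo_virt_enabled[8];           /* per-CPU "virtualization on" flag */

/* Adds a notifier at the head of the chain. */
static void demo_chain_register(struct demo_notifier *nb)
{
    nb->next = demo_chain;
    demo_chain = nb;
}

/* Delivers one event to every registered notifier. */
static void demo_chain_call(enum demo_cpu_event ev, int cpu)
{
    for (struct demo_notifier *nb = demo_chain; nb != NULL; nb = nb->next)
        nb->call(nb, ev, cpu);
}

/* Mimics the hotplug callback: disable on the way down, enable on the way up. */
static int demo_cpu_hotplug(struct demo_notifier *self,
                            enum demo_cpu_event ev, int cpu)
{
    (void)self;
    switch (ev) {
    case DEMO_CPU_DYING:
    case DEMO_CPU_UP_CANCELED:
        demo_virt_enabled[cpu] = 0;
        break;
    case DEMO_CPU_ONLINE:
        demo_virt_enabled[cpu] = 1;
        break;
    default:
        break;                             /* all other events are ignored */
    }
    return 0;
}

static struct demo_notifier demo_kvm_cpu_notifier = {
    .call = demo_cpu_hotplug,
};
```

After demo_chain_register(&demo_kvm_cpu_notifier), a call such as demo_chain_call(DEMO_CPU_ONLINE, 2) sets the flag for CPU 2, and a later DEMO_CPU_DYING event clears it again.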

Once this is set, we call misc_register() to register the miscellaneous device. Miscellaneous devices have the major device number 10. You can check this with

cat /proc/devices|grep misc

The kvm device is then called /dev/kvm for userspace access.
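The /proc/devices check mentioned above can also be done from a program. The following sketch parses text in the "major name" format of /proc/devices and looks up a driver name; demo_find_major() is a hypothetical helper, and it takes the text as a string so that it does not depend on the machine it runs on.

```c
#include <stdio.h>
#include <string.h>

/*
 * Searches text in /proc/devices format ("<major> <name>" per line) for
 * the given name and returns its major number, or -1 if it is not listed.
 * Section headers such as "Character devices:" do not start with a number
 * and are skipped automatically.
 */
static int demo_find_major(const char *text, const char *name)
{
    const char *line = text;

    while (*line != '\0') {
        int major;
        char dev[64];

        if (sscanf(line, "%d %63s", &major, dev) == 2 &&
            strcmp(dev, name) == 0)
            return major;

        line = strchr(line, '\n');   /* advance to the next line */
        if (line == NULL)
            break;
        line++;
    }
    return -1;
}
```

To use it on a live system, one would read /proc/devices into a buffer and pass that buffer in together with the name "misc".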

After that, we set kvm_sched_in and kvm_sched_out as the members of kvm_preempt_ops, which is of type struct preempt_ops. With these functions a task can ask the scheduler to notify it whenever it is preempted or scheduled back in. This allows the task to swap any special-purpose registers, like the FPU or Intel's VT registers.

Now we call kvm_init_anon_inode(), which calls anon_inode_init() in anon_inodes.c. This function creates an anonymous inode.

Then we call the function preempt_notifier_sys_init() which ... (?)

After that, the kvm module is loaded.


  • Create 3 lookaside caches for ...
  • Read MSRs into the array msrs_to_save for ...
  • Set up kvm_x86_ops
  • Set up which MSRs should be intercepted
  • Allocate a struct svm_cpu_data for every online CPU
  • Allocate a page for the save_area member of every svm_cpu_data
  • Check hardware compatibility
  • Register CPU notifiers for hotplug and reboot

Here is a picture that shows everything that is created during kvm_init(). picture1

Here is a picture that shows what is destroyed by kvm_exit(). picture2

vmx (Intel)

If one loads the module vmx.ko, the function registered with module_init() is invoked. As in most modules, this is the module's own init function, here called vmx_init(). This function lives in vmx.c.