Memory: Difference between revisions

From KVM
Revision as of 15:46, 29 January 2010

There are two modes in which KVM can work.



The qemu/kvm process runs mostly like a normal Linux program. It allocates its memory with normal malloc() or mmap() calls. If a guest is going to have 1GB of physical memory, qemu/kvm will effectively do a malloc(1GB), which allocates 1GB of host virtual space. However, just like a normal program doing a malloc(), there is no actual physical memory allocated at this point. It will not actually be allocated until the first time it is touched.
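
The lazy allocation described above can be observed from any ordinary process. The following is a minimal sketch in Python, not what qemu/kvm itself does: it reserves an anonymous mapping (scaled down to 64MB instead of 1GB), touches some pages, and prints the resident page count before and after. The helper names are made up for this illustration, and resident_pages() assumes a Linux /proc filesystem; elsewhere it simply returns None.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE
GUEST_RAM_BYTES = 64 * 1024 * 1024  # scaled down from the 1GB in the text


def resident_pages():
    """Return this process's resident page count, or None if unavailable.

    Reads /proc/self/statm, which only exists on Linux (assumption).
    """
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None


def touch_pages(ram, count):
    """Write one byte into each of the first `count` pages of `ram`."""
    for page in range(count):
        ram[page * PAGE_SIZE] = 1


# Reserve host virtual address space only; no page has been written yet.
guest_ram = mmap.mmap(-1, GUEST_RAM_BYTES)
before = resident_pages()

# Touching pages forces the kernel to back them with physical memory.
touch_pages(guest_ram, 1024)
after = resident_pages()

print("resident pages before touching:", before)
print("resident pages after touching: ", after)
```

On a Linux host the second number should normally be larger than the first by roughly the number of pages touched, although the exact figures depend on the interpreter and the system.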

Once the guest is running, it sees that malloc()'d memory area as being its physical memory. If the guest's kernel were to access what it sees as physical address 0x0, it would see the first page of that malloc() done by the qemu/kvm process.
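
To make the relationship concrete, here is a small sketch of the address arithmetic this implies: a guest physical address is just an offset into the host virtual region that the qemu/kvm process allocated. The class name, field names and the example base address are hypothetical and chosen only for illustration.

```python
class GuestMemoryRegion:
    """One contiguous block of guest physical memory backed by host virtual memory."""

    def __init__(self, guest_phys_base, size, host_virt_base):
        self.guest_phys_base = guest_phys_base  # where the guest sees the block
        self.size = size                        # length of the block in bytes
        self.host_virt_base = host_virt_base    # where the host process mapped it

    def gpa_to_hva(self, gpa):
        """Translate a guest physical address to a host virtual address."""
        offset = gpa - self.guest_phys_base
        if not 0 <= offset < self.size:
            raise ValueError("guest physical address outside this region")
        return self.host_virt_base + offset


# Hypothetical example: 1GB of guest RAM starting at guest physical 0x0,
# placed by malloc()/mmap() at an arbitrary host virtual address.
ram = GuestMemoryRegion(guest_phys_base=0x0, size=1 << 30,
                        host_virt_base=0x7F0000000000)
print(hex(ram.gpa_to_hva(0x0)))
print(hex(ram.gpa_to_hva(0x1000)))
```

Guest physical address 0x0 lands on the first byte of the allocation, and every other guest physical address is the same offset further in.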


It used to be that every time a KVM guest changed its page tables, the host had to be involved. The host would validate that the entries the guest put in its page tables were valid and that they did not access any memory which was not allowed. It did this with two mechanisms. One was that the actual set of page tables being used by the virtualization hardware was separate from the page tables that the guest *thought* were being used. This concept is called shadow page tables and it is a very common technique in virtualization. The second part (and the key to this technique) was that the VMX/AMD-V extensions allowed the host to trap whenever the guest tried to set the register pointing to the base page table (CR3).
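
The idea can be modelled in a few lines. This is a toy sketch, not KVM's implementation: page tables are flattened to dictionaries of page numbers, and the names ShadowMMU, on_guest_cr3_write and GuestAccessError are invented for the example. It shows the two mechanisms together: the trap on a CR3 write, and a separate shadow table that the host builds after validating the guest's entries.

```python
class GuestAccessError(Exception):
    """Raised when a guest page table entry points at memory it may not use."""


def build_shadow_table(guest_table, host_table):
    """Compose guest and host mappings into one shadow table.

    guest_table: guest virtual page  -> guest physical page (written by the guest)
    host_table:  guest physical page -> host physical page  (owned by the host)
    Returns:     guest virtual page  -> host physical page  (used by the hardware)
    """
    shadow = {}
    for gva_page, gpa_page in guest_table.items():
        if gpa_page not in host_table:
            raise GuestAccessError(f"guest physical page {gpa_page} not allowed")
        shadow[gva_page] = host_table[gpa_page]
    return shadow


class ShadowMMU:
    def __init__(self, host_table):
        self.host_table = host_table
        self.shadow = {}

    def on_guest_cr3_write(self, guest_table):
        """Called when the guest's attempt to load CR3 traps to the host."""
        self.shadow = build_shadow_table(guest_table, self.host_table)

    def translate(self, gva_page):
        """What the hardware does: look only at the shadow table."""
        return self.shadow[gva_page]


mmu = ShadowMMU(host_table={0: 100, 1: 101})
mmu.on_guest_cr3_write({10: 0, 11: 1})
print(mmu.translate(10), mmu.translate(11))
```

The guest never sees the shadow table; it keeps editing its own tables, and the host rebuilds or patches the shadow copy whenever it traps.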

This technique works fine, but it has some serious performance implications. A single access to a guest page can take up to 25 memory accesses to complete, which gets very costly. See this paper: http://developer.amd.com/assets/NPT-WP-1%201-final-TM.pdf for more information. The basic problem is that every access to memory must go through both the page tables of the guest and then the page tables of the host. The two-dimensional part comes in because the page tables of the guest must *themselves* go through the page tables of the host.
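
One way to arrive at the figure of 25 is to count the accesses in such a two-dimensional walk. The sketch below assumes four-level page tables on both the guest and the host side and no help from TLBs or other caches; the text above does not state these assumptions, so treat the derivation as illustrative.

```python
def two_dimensional_walk_accesses(guest_levels=4, host_levels=4):
    """Count memory accesses for one guest memory access with cold caches."""
    # Each guest page table level is located by a guest physical address.
    # Finding it costs a full host walk, then one read of the guest entry.
    guest_table_reads = guest_levels * (host_levels + 1)
    # The final guest physical address of the data needs one more host walk.
    final_translation = host_levels
    # And finally the data itself is read.
    data_access = 1
    return guest_table_reads + final_translation + data_access


print(two_dimensional_walk_accesses())
```

With four levels on each side this gives 4 * (4 + 1) + 4 + 1 = 25.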

It can also be very costly for the host to verify and maintain the shadow page tables.


Both AMD and Intel sought solutions to these problems and came up with similar answers, called NPT (AMD) and EPT (Intel). They specify a set of structures recognized by the hardware which can quickly translate guest physical addresses to host physical addresses *without* going through the host page tables. This shortcut removes the costly two-dimensional page table walks.

The problem is that the host page tables are what we use to enforce things like process separation. If a page were to be unmapped from the host (when it is swapped out, for instance), it becomes difficult to coordinate that with these new hardware EPT/NPT structures.

The solution in software is something Linux calls mmu_notifiers.
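
The coordination problem and the notifier idea can be illustrated with a toy model. This is only a sketch of the callback pattern, not the kernel's mmu_notifier interface: every class and method name here is hypothetical, and pages are plain integers. The host address space calls registered callbacks before it removes a mapping, which gives the owner of the EPT/NPT-like structure a chance to drop its own stale entry.

```python
class HostAddressSpace:
    """Toy host page table: host virtual page -> host physical page."""

    def __init__(self):
        self.mappings = {}
        self.notifiers = []

    def register_notifier(self, callback):
        self.notifiers.append(callback)

    def map_page(self, hva_page, hpa_page):
        self.mappings[hva_page] = hpa_page

    def unmap_page(self, hva_page):
        # Tell interested parties first, then remove the mapping.
        for callback in self.notifiers:
            callback(hva_page)
        self.mappings.pop(hva_page, None)


class SecondaryMMU:
    """Stand-in for EPT/NPT: guest physical page -> host physical page."""

    def __init__(self, host, gpa_to_hva):
        self.host = host
        self.gpa_to_hva = dict(gpa_to_hva)
        self.entries = {}
        host.register_notifier(self.invalidate)

    def fault_in(self, gpa_page):
        """Fill an entry from the host page table on first use."""
        hva_page = self.gpa_to_hva[gpa_page]
        self.entries[gpa_page] = self.host.mappings[hva_page]

    def invalidate(self, hva_page):
        """Notifier callback: drop any entry backed by the unmapped page."""
        for gpa_page, backing in self.gpa_to_hva.items():
            if backing == hva_page:
                self.entries.pop(gpa_page, None)


host = HostAddressSpace()
host.map_page(500, 9000)
guest_mmu = SecondaryMMU(host, gpa_to_hva={0: 500})
guest_mmu.fault_in(0)
print(guest_mmu.entries)   # entry present while the host page is mapped
host.unmap_page(500)       # e.g. the host swaps the page out
print(guest_mmu.entries)   # stale entry has been removed
```

Without the callback, the second structure would keep pointing at a host physical page that the host has already reclaimed.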