KVM

From KVM

(Difference between revisions)
Revision as of 12:45, 16 April 2012
AviKivity (Talk | contribs)

Revision as of 14:50, 11 July 2012
AviKivity (Talk | contribs)
(Clean up completed tasks)
Revision as of 14:50, 11 July 2012

ToDo

The following items need some love. Please post to the list if you are interested in helping out:

  • Emulate MSR_IA32_DEBUGCTL for guests which use it
  • Bring up Windows 95 and Windows 98 guests
  • Implement ACPI memory hotplug
  • Improve ballooning to try to use 2MB pages when possible ( in progress - kern.devel@gmail.com )


MMU related:

  • Improve mmu page eviction algorithm (currently FIFO, change to approximate LRU).
  • Add a read-only memory type.
    • possibly using mprotect()?
  • Implement A20M (A20 address line masking) for DOS and the like.
  • O(1) write protection by protecting the PML4Es, then on demand PDPTEs, PDEs, and PTEs
  • Simpler variant: don't drop large ptes when write protecting; just write protect them. When taking a write fault, either drop the large pte, or convert it to small ptes and write protect those (like O(1) write protection).
  • O(1) mmu invalidation using a generation number
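
The generation-number idea in the last item can be sketched as follows. This is a minimal illustration, not kernel code: `struct mmu`, `struct shadow_page` and all function names are hypothetical stand-ins. Each shadow page is stamped with the generation it was created in, and invalidating everything is a single counter increment; stale pages are reclaimed lazily when they are next encountered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct shadow_page {
    uint64_t gen;     /* generation in which this page was created */
    bool     in_use;
};

struct mmu {
    uint64_t gen;     /* the only generation currently considered valid */
};

/* Stamp a freshly allocated shadow page with the current generation. */
void mmu_init_page(const struct mmu *mmu, struct shadow_page *sp)
{
    assert(mmu && sp);
    sp->gen = mmu->gen;
    sp->in_use = true;
}

/* O(1): every existing page becomes stale without being visited. */
void mmu_invalidate_all(struct mmu *mmu)
{
    mmu->gen++;
}

/* A page is usable only if it belongs to the current generation. */
bool mmu_page_valid(const struct mmu *mmu, const struct shadow_page *sp)
{
    return sp->in_use && sp->gen == mmu->gen;
}

/* Lazy reclaim: free a stale page when it is next encountered.
 * Returns true if the page was stale and has been released. */
bool mmu_reap_if_stale(const struct mmu *mmu, struct shadow_page *sp)
{
    if (sp->in_use && sp->gen != mmu->gen) {
        sp->in_use = false;
        return true;
    }
    return false;
}
```

The cost of freeing pages does not disappear; it is deferred and spread over later lookups instead of being paid in one long pass.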

x86 emulator updates:

  • On-demand register access; copying all registers on every emulator entry and exit is wasteful.
    • Can be done by adding 'available' and 'dirty' bitmasks
  • Implement mmx and sse memory move instructions; useful for guests that use multimedia extensions for accessing vga (partially done)
  • Implement an operation queue for the emulator. The emulator often calls userspace to perform a read or a write, but due to inversion of control it actually restarts instead of continuing. The queue would allow it to replay all previous operations until it reaches the point it last stopped.
    • if this is done, we can retire ->read_std() in favour of ->read_emulated().
  • convert more instructions to direct dispatch (function pointer in decode table)
  • move init_emulate_ctxt() into x86_decode_insn() and other emulator entry points
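
The 'available' and 'dirty' bitmasks mentioned under on-demand register access might look roughly like this. It is a sketch under assumptions: `struct vcpu_regs`, `struct reg_cache` and the function names are invented for illustration and do not correspond to the emulator's actual types. A register is fetched only on first read, and only registers that were written are copied back.

```c
#include <assert.h>
#include <stdint.h>

#define NR_REGS 16

/* Hypothetical backing store standing in for the real vcpu register file. */
struct vcpu_regs {
    uint64_t r[NR_REGS];
    unsigned reads;    /* number of fetches from the backing store */
    unsigned writes;   /* number of write-backs to the backing store */
};

struct reg_cache {
    struct vcpu_regs *vcpu;
    uint64_t val[NR_REGS];
    uint32_t avail;    /* bit n set: val[n] holds a current value */
    uint32_t dirty;    /* bit n set: val[n] must be written back */
};

/* Fetch a register from the backing store only on first use. */
uint64_t reg_read(struct reg_cache *c, unsigned n)
{
    assert(n < NR_REGS);
    if (!(c->avail & (1u << n))) {
        c->val[n] = c->vcpu->r[n];
        c->vcpu->reads++;
        c->avail |= 1u << n;
    }
    return c->val[n];
}

/* A write makes the cached value both available and dirty. */
void reg_write(struct reg_cache *c, unsigned n, uint64_t v)
{
    assert(n < NR_REGS);
    c->val[n] = v;
    c->avail |= 1u << n;
    c->dirty |= 1u << n;
}

/* Copy back only the registers that were actually modified. */
void reg_writeback(struct reg_cache *c)
{
    for (unsigned n = 0; n < NR_REGS; n++) {
        if (c->dirty & (1u << n)) {
            c->vcpu->r[n] = c->val[n];
            c->vcpu->writes++;
        }
    }
    c->dirty = 0;
}
```

An instruction that touches two registers then costs two fetches and at most two write-backs instead of a full copy in each direction.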

Interactivity improvements:

  • If for several frames in a row a large proportion of the framebuffer pages are changing, then for the next few frames don't bother to get the dirty page log from kvm, but instead assume that all pages are dirty. This will reduce page fault overhead on highly interactive workloads.
  • When detecting keyboard/video/mouse activity, scale up the frame rate; when activity dies down, scale it back down (applicable to qemu as well).
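
The first heuristic above can be sketched as a small state machine. Everything here is an assumption made for illustration: the thresholds are arbitrary, and `struct fb_heuristic` and both functions are hypothetical names, not existing qemu or kvm interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative thresholds only; real values would need measurement. */
#define BUSY_PERCENT 50   /* "large proportion" of framebuffer pages dirty */
#define BUSY_STREAK   3   /* "several frames in a row" */
#define SKIP_FRAMES   5   /* "next few frames" */

struct fb_heuristic {
    unsigned busy_streak;  /* consecutive frames at or above BUSY_PERCENT */
    unsigned skip_left;    /* frames left to treat as fully dirty */
};

/* Returns true if the caller should skip the dirty-log query for this
 * frame and simply redraw everything. */
bool fb_should_skip_dirty_log(struct fb_heuristic *h)
{
    if (h->skip_left > 0) {
        h->skip_left--;
        return true;
    }
    return false;
}

/* Feed the result of a real dirty-log query back into the heuristic. */
void fb_record_frame(struct fb_heuristic *h,
                     unsigned dirty_pages, unsigned total_pages)
{
    assert(total_pages > 0 && dirty_pages <= total_pages);
    if ((unsigned long long)dirty_pages * 100 >=
        (unsigned long long)total_pages * BUSY_PERCENT) {
        if (++h->busy_streak >= BUSY_STREAK) {
            h->skip_left = SKIP_FRAMES;
            h->busy_streak = 0;
        }
    } else {
        h->busy_streak = 0;
    }
}
```

A display loop would call `fb_should_skip_dirty_log()` once per frame, and call `fb_record_frame()` only on frames where it actually queried the dirty log.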

Pass-through/VT-d related:

  • Enhance KVM/QEMU to return an error message if the user attempts to pass through an unsupported device:
    • Devices with shared host IOAPIC interrupt
    • Conventional PCI devices
    • Devices without FLR capability
  • The QEMU PCI pass-through patch needs to be brought up to the same functionality as the corresponding file in Xen
    • Remove direct HW access by QEMU for probing PCI BAR size
    • PCI handling of various PCI configuration registers
    • Other enhancements that were done in Xen
  • Host shared interrupt support
  • VT-d2 support (WIP in Linux Kernel)
    • Queued invalidation
    • Interrupt remapping
    • ATS
  • USB 2.0 (EHCI) support

Bug fixes:

  • Less glamorous but always needed: fixing bugs is one of the most important contributions

Random improvements

  • Utilize the SVM interrupt queue to avoid extra exits when guest interrupts are disabled

For the adventurous:

  • Emulate the VMX instruction set on qemu. This would be very beneficial to debugging kvm ( working on this - kern.devel@gmail.com ).
  • Add vmgl support to qemu. Port to virtio. Write a Windows driver.
  • Keep this TODO up to date

Nested VMX

  • Implement performance features such as EPT and VPID

KVM Safe Mode

An ioctl() from userspace that tells KVM to disable one or more of the following features:

  • shadow paging (force direct mapping)
  • instruction emulation (require virtio or mmio hypercall)
  • task switches
  • mode switches (long mode / legacy mode / real mode)
  • IDT/GDT/LDT changes
  • IDT/GDT/LDT write protect
  • write protect important MSRs (*STAR etc)

The idea is both to protect the guest from attacks, and to protect the host from the guest.
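
One way such a request could be modelled is as a bitmask of restrictions. The sketch below is purely hypothetical: no such ioctl exists, and every name in it (`SAFE_*`, `struct vm_policy`, `vm_set_safe_mode()`, `vm_restricted()`) is invented here. It also assumes that restrictions can only be added, never lifted, which the proposal does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; no such ioctl or constants exist in KVM. */
enum safe_mode_flag {
    SAFE_NO_SHADOW_PAGING  = 1u << 0,  /* force direct mapping */
    SAFE_NO_EMULATION      = 1u << 1,  /* require virtio or mmio hypercall */
    SAFE_NO_TASK_SWITCH    = 1u << 2,
    SAFE_NO_MODE_SWITCH    = 1u << 3,  /* long / legacy / real mode */
    SAFE_NO_DT_CHANGE      = 1u << 4,  /* IDT/GDT/LDT changes */
    SAFE_DT_WRITE_PROTECT  = 1u << 5,  /* IDT/GDT/LDT write protect */
    SAFE_MSR_WRITE_PROTECT = 1u << 6,  /* *STAR etc */
    SAFE_ALL_FLAGS         = (1u << 7) - 1,
};

struct vm_policy {
    uint32_t safe_flags;
};

/* Models the handler: unknown bits are rejected, and (by assumption)
 * restrictions accumulate and cannot be removed later. */
int vm_set_safe_mode(struct vm_policy *p, uint32_t flags)
{
    assert(p);
    if (flags & ~(uint32_t)SAFE_ALL_FLAGS)
        return -1;
    p->safe_flags |= flags;
    return 0;
}

/* True if the given restriction is in force for this VM. */
bool vm_restricted(const struct vm_policy *p, enum safe_mode_flag f)
{
    return (p->safe_flags & (uint32_t)f) != 0;
}
```

Making the flags one-way would mean a compromised guest or userspace process cannot re-enable a feature once it has been switched off.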

