Revision as of 09:27, 24 May 2013

This page should cover all networking related activity in KVM, currently most info is related to virtio-net.

TODO: add bugzilla entry links.

=== projects in progress. contributions are still very welcome!

  • vhost-net scalability tuning: threading for many VMs
     Plan: switch to workqueue shared by many VMs
     http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html

http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument

     Developer: Shirley Ma?, MST?
     Testing: netperf guest to guest
  • multiqueue support in macvtap
      multiqueue is only supported for tun.
      Add support for macvtap.
      Developer: Jason Wang
  • enable multiqueue by default
      Multiqueue causes a regression in some workloads, so
      it is off by default. Detect this and enable/disable it
      automatically so we can turn it on by default.
      The regression occurs because GSO tends to batch less when mq is enabled.
      https://patchwork.kernel.org/patch/2235191/
      Developer: Jason Wang
  • rework flow caches
      The current hlist implementation of flow caches has several limitations:
      1) worst-case lookup degrades to a linear search
      2) it does not scale
      https://patchwork.kernel.org/patch/2025121/
      Developer: Jason Wang
      
  • eliminate the extra copy in the virtio-net driver
      We currently do an extra copy of 128 bytes for every packet.
      For small packets this could be eliminated by:
      1) using build_skb() and a head frag
      2) a bigger vnet header length ( >= NET_SKB_PAD + NET_IP_ALIGN )
      Or use a dedicated queue for receiving small packets? (reordering)
      Developer: Jason Wang
  • make pktgen work with virtio-net ( or partially orphan )
      virtio-net orphans the skb during tx,
      which makes pktgen wait forever on the refcnt.
      Jason's idea: introduce a flag to tell pktgen not to wait
      Discussion here: https://patchwork.kernel.org/patch/1800711/
      MST's idea: add a .ndo_tx_polling, not only for pktgen
      Developer: Jason Wang
  • Add HW_VLAN_TX support for tap
      Eliminates the extra data move for tagged packets
      Developer: Jason Wang
  • Announce self by guest driver
      Send gratuitous ARP from the guest driver. The guest part is finished;
      the QEMU part is ongoing.
      V7 patches are here:
      http://lists.nongnu.org/archive/html/qemu-devel/2013-03/msg01127.html
      Developer: Jason Wang
  • guest programmable mac/vlan filtering with macvtap
       Developer: Dragos Tatulea?, Amos Kong
       Status: GuestProgrammableMacVlanFiltering
  • bridge without promisc mode in NIC
 given hardware support, teach bridge
 to program mac/vlan filtering in NIC
 Helps performance and security on noisy LANs
 http://comments.gmane.org/gmane.linux.network/266546
 Developer: Vlad Yasevich
  • reduce networking latency:
 allow handling short packets from softirq or VCPU context
 Plan:
   We are going through the scheduler 3 times
   (could be up to 5 if softirqd is involved)
   Consider RX: host irq -> io thread -> VCPU thread ->
   guest irq -> guest thread.
   This adds a lot of latency.
   We can cut it by some 1.5x if we do a bit of work
   either in the VCPU or softirq context.
 Testing: netperf TCP RR - should be improved drastically
          netperf TCP STREAM guest to host - no regression
 Developer: MST
  • Flexible buffers: put virtio header inline with packet data
 https://patchwork.kernel.org/patch/1540471/
 Developer: MST
  • device failover to allow migration with assigned devices
 https://fedoraproject.org/wiki/Features/Virt_Device_Failover
 Developer: Gal Hammer, Cole Robinson, Laine Stump, MST
  • Reuse vringh code for better maintainability
 Developer: Rusty Russell
  • Improve stats, make them more helpful for perf analysis
 Developer: Sriram Narasimhan
  • Bug: e1000 & rtl8139: changing the macaddr in the guest is not updated in qemu (info network)
 Developer: Amos Kong

=== projects that are not started yet - no owner

  • netdev polling for virtio.
 See http://lkml.indiana.edu/hypermail/linux/kernel/1303.0/00553.html
  • receive side zero copy
 The ideal is a NIC with accelerated RFS support,
 so we can feed the virtio rx buffers into the correct NIC queue.
 Depends on non-promisc NIC support in the bridge.
  • IPoIB infiniband bridging
 Plan: implement macvtap for ipoib and virtio-ipoib
  • RDMA bridging
  • DMA engine (IOAT) use in tun
 Old patch here: [PATCH RFC] tun: dma engine support
 It does not speed things up. Need to see why and
 what can be done.
  • use kvm eventfd support for injecting level interrupts,
 enable vhost by default for level interrupts
  • virtio API extension: improve small packet/large buffer performance:
 support "reposting" buffers for mergeable buffers,
 support pool for indirect buffers
  • more GSO type support:
      The kernel does not yet support additional GSO types: FCoE, GRE, UDP_TUNNEL

=== vague ideas: path to implementation not clear

  • ring redesign:
     find a way to test raw ring performance 
     fix cacheline bounces 
     reduce interrupts


  • support more queues
    We limit TUN to 8 queues, but we really want
    1 queue per guest CPU. The limit comes from the net
    core; we need to teach it to allocate an array of
    pointers and not an array of queues.
    Jason has a draft patch to use a flex array.
    Another thing is to move the flow caches out of tun_struct.
    Developer: Jason Wang
  • irq/numa affinity:
    networking goes much faster with irq pinning:
    both with and without numa.
    what can be done to make the non-pinned setup go faster?
  • reduce conflict with VCPU thread
    if VCPU and networking run on the same CPU,
    they conflict, resulting in bad performance.
    Fix that; push the vhost thread out to another CPU
    more aggressively.
  • rx mac filtering in tun
       the need for this is still not understood as we have filtering in bridge
       we have a small table of addresses, need to make it larger
       if we only need filtering for unicast (multicast is handled by IMP filtering)
  • vlan filtering in tun
       the need for this is still not understood as we have filtering in bridge
  • vlan filtering in bridge
       IGMP snooping in bridge should take vlans into account


=== testing projects

Keeping networking stable is the highest priority.

  • Run weekly test on upstream HEAD covering test matrix with autotest

=== non-virtio-net devices

  • e1000: stabilize

=== test matrix

DOA test matrix (all combinations should work):

       vhost: test both on and off, obviously
       test: hotplug/unplug, vlan/mac filtering, netperf,
            file copy both ways: scp, NFS, NTFS
       guests: linux: release and debug kernels, windows
       conditions: plain run, run while under migration,
               vhost on/off migration
       networking setup: simple, qos with cgroups
       host configuration: host-guest, external-guest
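The combinations above multiply quickly; a small shell sketch (the variable names are illustrative and the echo stands in for the real autotest invocation) that enumerates one slice of the matrix:

```shell
#!/bin/sh
# Enumerate vhost x guest x condition combinations from the matrix above.
# "echo" is a placeholder for the actual test run (e.g. autotest).
for vhost in on off; do
  for guest in linux-release linux-debug windows; do
    for cond in plain migration vhost-toggle-migration; do
      echo "vhost=$vhost guest=$guest cond=$cond"
    done
  done
done
```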