From KVM

Networking Performance Testing

This is a summary of performance acceptance criteria for changes in hypervisor virt networking. The matrix of configurations we are interested in is built combining possible options. Naturally the bigger a change the more exhaustive would we want the coverage to be.

We can get different configurations by selecting different options in the following categories: Networking setup, CPU setup, Guest setup, Traffic load. For each of these we are interested in a set of Performance metrics. A test would need to be performed under a controlled Hardware configuration, for each relevant Hypervisor setup and/or Guest setup (depending on which change is tested) on the same hardware. Ideally we'd note the Hardware configuration and person performing the test to increase the chance it can be reproduced later.

Performance metrics

Generally for a given setup and traffic we want to know the Latency and the CPU load. We generally might care about minimal, average (or median) and maximum latencies.


Latency is generally time until you get a response. For some workloads you don't measure latencies directly, instead you measure peak throughput.

CPU load

The only metric that makes sense is probably host system load, of which the only someone quantifiable component seems to be the CPU load. Need take into account the fact that CPU speed might change with time, so load should probably be in seconds (%CPU/speed) rather than plain %CPU.

Some derive metrics from this are:

peak throughput

How high we can load the system until latencies sharply become unreasonable

service demand

Load divided by CPU utilization

Networking setup

CPU setup

Guest setup

Hypervisor setup

Traffic load

Available tools

Hardware configuration

<mst> yes <jasonwang> can we let the perf team to do that? <mst> they likely won't do it in time <mst> I started making up a list of what we need to measure <mst> have a bit of time to discuss? <jasonwang> you mean we need to do it ourself? <mst> at least part of it <jasonwang> I'm sorry, I need to attend the autotest meeting in 10 minutes <jasonwang> mst ok <mst> will have time afterward? <mst> I know it's late in your TZ <jasonwang> ok <mst> cool, then I'll stay connected on irc just ping me <jasonwang> ok <mst> thanks! <jasonwang> you are welcome <jasonwang> hi, just back from the meeting <mst> hi <mst> okay so let's see what we have <jasonwang> okay <mst> first we have the various connection options <jasonwang> yes <mst> we can do: <mst> host to guest <mst> guest to host <mst> ext to guest <mst> ext to host <mst> guest to guest on local <jasonwang> ok <mst> guest to guest across the net <mst> for comparison it's probably useful to do "baremetal": loopback and external<->host <jasonwang> yes <mst> a bit more advanced: bidirectional tests <mst> many to many is probably to hard to setup <jasonwang> yes, so we need only test some key options <mst> yes, for now let's focus on things that are easy to define <mst> ok now what kind of traffic we care about <jasonwang> (ext)host to guest, guest to (ext)host ? <mst> no I mean scheduler is heavily involved <jasonwang> so guest to guest on local is also needed? <mst> yes, think so <mst> so I think we need to try just defaults <mst> (no pinning) <jasonwang> yes, that is usual case <mst> as well as pinned scenario where qemu is pinned to cpus <jasonwang> ok <mst> and for external pinning irqs as well <jasonwang> set irq affinity? <mst> do you know whether virsh let you pin the iothread? <mst> yes, affinity <jasonwang> no, I don't use virsh <mst> need to find out, only pin what virsh let us pin <jasonwang> okay <mst> note vhost-net thread is created on demand, so it is not very practical to pin it <mst> if we do need this capability it will have to be added, I am hoping scheduler does the right thing <jasonwang> yes, it's a workqueue in RHEL6.1 <mst> workqueue is just a list + thread, or we can change it if we like <jasonwang> do you man if we need we can use a dedicated thread like upstream which is easy to be pinned? <mst> upstream is not easier to be pinned <mst> the issue is mostly that thread is only created on driver OK now <jasonwang> yes <mst> so guest can destroy it and recreate and it loses what you set <mst> in benchmark it works but not for real users <jasonwang> yes, agree <mst> maybe cgroups can be used somehow since it inherits the cgroups of the owner <mst> another option is to let qemu control the pinning <mst> either let it specify the thread to do the work <mst> or just add ioctl for pinning <jasonwang> looks possible <mst> in mark wagner's tests it seemed to work well without <mst> so need to see if it's needed, it's not hard to add this interface <mst> but once we add it must maintain forever <mst> so I think irq affinity and cpu pinning are two options to try tweaking <jasonwang> yes, have saw some performance discussion of vhost upstream <mst> need to make sure we try on a numa box <mst> at the moment kernel structures are allocated on first use <jasonwang> yes <mst> I hope it all fits in cache so should not matter <mst> but need to check, not yet sure what exactly <jasonwang> yes, things would be more complicated when using numa <mst> not sure what exactly are the configurations to check <mst> ok so we have the network setup and we have the cpu setup <mst> let thing is traffic to check <mst> let->last <jasonwang> yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else? <mst> let's focus on the protocols first <mst> so we can do TCP, this has a strange property of coalescing messages <mst> but OTOH it's the most used protocol <mst> and it has hard requirements e.g. on the ordering of packets <jasonwang> yes, TCP must to be tested <mst> UDP is only working well up to mtu packet size <mst> but otherwise it let us do pretty low level stuff <jasonwang> yes, agree <mst> ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad) <mst> what kind of traffic we care about? ideally a range of message sizes, and a range of loads <mst> (in terms of messages per second) <jasonwang> yes <mst> what do we want to measure? <jasonwang> bandwidth and latency <mst> I think this not really it <mst> this is what tools like to give us <jasonwang> yes and maybe also the cpu usage <mst> if you think about it in terms of an application, it is always latency that you care about in the end <mst> e.g. I have this huge file what is the latency to send it over the network <mst> and for us also what is the cpu load, you are right <jasonwang> yes <mst> so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second <mst> we want to know the latency and the cpu load <jasonwang> yes <mst> and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable <mst> this last is a bit subjective <mst> but generally any system would gadually become less responsive with more load <mst> then at some point it just breaks <mst> cou load is a bit hard to define <mst> cpu <jasonwang> yes and it looks hard to do the measuring then <mst> I think in the end, what we care about is how many cpu cycles the host burns <jasonwang> yes, but how to measure that? <mst> well we have simple things like /proc/stat <jasonwang> understood and maybe perf can also help <mst> yes quite possibly <mst> in other words we'll need to measure this in parallel while test is running <mst> netperf can report local/remote CPU <mst> but I do not understand what it really means <mst> especially for a guest <jasonwang> yes, if we want to use netperf it's better to know how it does the calculation <mst> well it just looks at /proc/stat AFAIK <jasonwang> yes, I try to take a look at its source <mst> this is the default but it has other heuristics <mst> that can be configured at compile time <jasonwang> ok, understand <mst> ok and I think load divided by CPU is a useful metric <jasonwang> so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB <mst> netperf can report service demand <mst> I do not understand what it is <jasonwang> From its manual its how many us the cpu spend on a KB <mst> well the answer will be it depends :) <mst> also, we have packet loss <mst> I think at some level we only care about packets that were delivered <mst> so e.g. with UDP we only care about received messages <jasonwang> yes, the packet loss may have concerns with guest drivers <mst> with TCP if you look at messages, there's no loss <jasonwang> yes TCP have flow control itself <mst> ok so let's see what tools we have <mst> the simplest is flood ping <jasonwang> yes, it's very simple and easy to use <mst> it gives you control over message size, packets per second, gets you back latency <mst> it is always bidirectional I think <mst> and we need to measure CPU ourselves <mst> that last seems to be true anyway <jasonwang> yes, maybe easy to be understand and analysis than netperf <mst> packet loss when it occurs complicates things <mst> e.g. with 50% packet loss the real load is anywhere in between <jasonwang> yes <mst> that's the only problem: it's always bidirectional so tx/rx problems are hard to separate <jasonwang> yes, vhost is currently half-duplex <mst> I am also not sure it detect reordering <jasonwang> yes, it has sequence no. <jasonwang> but for ping, as you've said it's ICMP and was not the most of the cases <mst> ok, next we have netperf <mst> afaik it can do two things <mst> it can try sending as many packets as it can <jasonwang> yes <mst> or it can send a single one back and forth <mst> not a lot of data, but ok <jasonwang> yes <mst> and similar with UDP <mst> got to go have lunch <mst> So I will try and write all this up <mst> do you have any hardware for testing? <mst> if yes we'll add it too, I'll put up a wiki <mst> back in half an hour <jasonwang> yes, write all things up would help <jasonwang> go home now, please send me mail

  • jasonwang has quit (Quit: Leaving)
  • Loaded log from Wed Dec 15 15:07:24 2010