
Networking Performance Testing

This is a summary of performance acceptance criteria for changes to hypervisor virtual networking. The matrix of configurations we are interested in is built by combining the possible options. Naturally, the bigger a change, the more exhaustive we want the coverage to be.

We can get different configurations by selecting different options in each of the following categories: Networking setup, CPU setup, Guest setup and Traffic load. For each configuration we are interested in a set of Performance metrics. Tests need to be performed on the same, controlled hardware for each relevant Hypervisor setup and/or Guest setup (depending on which change is tested). Ideally, we'd record the Hardware configuration and the person performing the test, to increase the chance that the results can be reproduced later.
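The configuration matrix described above is the Cartesian product of the options in each category. A minimal Python sketch, using option values paraphrased from the IRC discussion on this page; the variable and function names are hypothetical, and Guest setup and Hypervisor setup are left out because this page does not list their options yet:

```python
from itertools import product

# Option values paraphrased from the IRC discussion on this page; the
# names and grouping are illustrative, not an agreed test plan.
NETWORKING_SETUPS = [
    "host -> guest",
    "guest -> host",
    "ext -> guest",
    "ext -> host",
    "guest -> guest (local)",
    "guest -> guest (across the net)",
]
CPU_SETUPS = [
    "defaults (no pinning)",
    "qemu pinned to cpus",
    "qemu pinned + irq affinity (external traffic)",
]
PROTOCOLS = ["TCP", "UDP", "ICMP"]

# Baremetal baselines are measured once for comparison rather than
# crossed with the guest CPU setups.
BAREMETAL_BASELINES = ["loopback", "external <-> host"]


def build_matrix(networking, cpu_setups, protocols):
    """Return one dict per combination: the Cartesian product of the options."""
    return [
        {"networking": net, "cpu": cpu, "protocol": proto}
        for net, cpu, proto in product(networking, cpu_setups, protocols)
    ]


if __name__ == "__main__":
    matrix = build_matrix(NETWORKING_SETUPS, CPU_SETUPS, PROTOCOLS)
    print(len(matrix), "configurations")
    for cfg in matrix:
        print(cfg)
    print("baremetal baselines:", ", ".join(BAREMETAL_BASELINES))
```

For a small change one would prune the lists before taking the product, rather than run all 54 combinations.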

Performance metrics
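The IRC discussion below settles on latency and host CPU load as the core metrics, with /proc/stat as the simplest source of CPU numbers, "load divided by CPU" as a derived figure, and netperf's service demand (CPU microseconds per KB, as quoted from the netperf manual in the log). A minimal sketch of computing these from two /proc/stat samples taken around a test run, assuming the Linux /proc/stat field layout and KB = 1024 bytes; the function names are hypothetical, and this is not a claim about how netperf itself computes its numbers:

```python
import time

# Aggregate /proc/stat line: cpu user nice system idle iowait irq softirq steal guest guest_nice
# guest and guest_nice are already counted inside user and nice, so only the
# first eight fields are summed to avoid double counting.
_FIELDS_IN_TOTAL = 8


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    ticks = [int(v) for v in parts[1:1 + _FIELDS_IN_TOTAL]]
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
    return idle, sum(ticks)


def read_cpu_line(path="/proc/stat"):
    """The first line of /proc/stat is the aggregate over all CPUs."""
    with open(path) as f:
        return f.readline()


def busy_fraction(before_line, after_line):
    """Fraction of all CPU time that was not idle between two samples."""
    idle0, total0 = parse_cpu_line(before_line)
    idle1, total1 = parse_cpu_line(after_line)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("samples are not in increasing time order")
    return 1.0 - (idle1 - idle0) / elapsed


def service_demand_us_per_kb(busy, ncpus, seconds, nbytes):
    """CPU microseconds spent per KB moved (KB assumed to be 1024 bytes)."""
    cpu_us = busy * ncpus * seconds * 1e6
    return cpu_us / (nbytes / 1024)


def messages_per_cpu_second(messages, busy, ncpus, seconds):
    """'Load divided by CPU': messages handled per second of busy CPU time."""
    cpu_seconds = busy * ncpus * seconds
    if cpu_seconds <= 0:
        raise ValueError("no CPU time was consumed")
    return messages / cpu_seconds


if __name__ == "__main__":
    # Sample in parallel with the test; here we only sleep instead of running one.
    before = read_cpu_line()
    time.sleep(1.0)
    after = read_cpu_line()
    print(f"host busy fraction: {busy_fraction(before, after):.3f}")
```

Since the busy fraction is taken over the aggregate line, multiplying it by the CPU count and the run time gives busy CPU-seconds, which both derived metrics divide by or into.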

Networking setup

CPU setup
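The IRC discussion below names two CPU setups beyond the defaults: pinning qemu to CPUs, and, for external traffic, setting IRQ affinity. The log prefers to pin only what virsh allows; as an assumption-labeled alternative for quick experiments, here is a minimal sketch that pins directly from Python. The function names are hypothetical; writing /proc/irq/<N>/smp_affinity requires root, and a running irqbalance daemon may rewrite the affinity afterwards:

```python
import os


def cpus_to_smp_affinity_mask(cpus):
    """Hex CPU bitmask in the comma-separated 32-bit groups used by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid cpu number: {cpu}")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("need at least one CPU")
    groups = []
    while mask:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")  # lowest 32 bits first
        mask >>= 32
    return ",".join(reversed(groups))  # most significant group first


def set_irq_affinity(irq, cpus):
    """Hypothetical helper: route an IRQ (e.g. the NIC's) to the given CPUs. Needs root."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpus_to_smp_affinity_mask(cpus))


def pin_process_threads(pid, cpus):
    """Hypothetical helper: pin every existing thread of a process (e.g. qemu).

    Threads created later, such as the on-demand vhost-net worker discussed
    in the log, are not covered by this.
    """
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), set(cpus))
```

Pinning per thread matters because on Linux sched_setaffinity applies to the single thread whose ID is passed, not to the whole process.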

Guest setup

Hypervisor setup

Traffic load
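The IRC discussion below asks for a range of message sizes and a range of loads (messages per second), notes that UDP only works well up to the MTU, and wants the peak rate reached before latency becomes unreasonable. A minimal sketch under stated assumptions: IPv4 headers without options, payload sizes doubling from a hypothetical start of 64 bytes, and a caller-chosen latency bound, since the log itself calls that threshold subjective. All names are hypothetical:

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
IPV6_HEADER = 40       # bytes, fixed header only
UDP_HEADER = 8
ICMP_ECHO_HEADER = 8   # so ping -s sizes follow the same limit as UDP


def max_unfragmented_payload(mtu, l4_header=UDP_HEADER, ipv6=False):
    """Largest payload that fits in one packet of the given MTU."""
    payload = mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - l4_header
    if payload <= 0:
        raise ValueError("MTU too small for the headers")
    return payload


def message_sizes(limit, start=64):
    """Doubling sizes from start, always ending exactly at limit."""
    sizes = []
    size = start
    while size < limit:
        sizes.append(size)
        size *= 2
    sizes.append(limit)
    return sizes


def find_peak_rate(results, max_latency_us):
    """Highest messages/s whose latency stays within the bound.

    results: iterable of (messages_per_second, latency_us). Rates are scanned
    in increasing order and the scan stops at the first violation, since
    latency is expected to degrade gradually and then break.
    """
    peak = None
    for rate, latency in sorted(results):
        if latency > max_latency_us:
            break
        peak = rate
    return peak


if __name__ == "__main__":
    limit = max_unfragmented_payload(1500)
    print("UDP sizes to try:", message_sizes(limit))
```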

Hardware configuration
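The introduction asks that each result record the hardware configuration and the person performing the test, and the IRC log below stresses testing on a NUMA box. A minimal sketch of collecting such a record on a Linux host; the field names are hypothetical, and details such as the NIC model would still have to go into the free-form notes:

```python
import datetime
import getpass
import glob
import json
import os
import platform


def cpu_model(path="/proc/cpuinfo"):
    """First 'model name' entry from /proc/cpuinfo, else platform's guess."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return platform.processor() or "unknown"


def numa_node_count():
    """Number of NUMA nodes visible in sysfs (0 if sysfs is unavailable)."""
    return len(glob.glob("/sys/devices/system/node/node[0-9]*"))


def hardware_record(tester=None, notes=""):
    """Collect what is needed to rerun a test on comparable hardware."""
    uname = platform.uname()
    return {
        "tester": tester if tester is not None else getpass.getuser(),
        "date": datetime.date.today().isoformat(),
        "hostname": uname.node,
        "kernel": uname.release,
        "arch": uname.machine,
        "cpu_model": cpu_model(),
        "cpu_count": os.cpu_count(),
        "numa_nodes": numa_node_count(),
        "notes": notes,
    }


if __name__ == "__main__":
    print(json.dumps(hardware_record(), indent=2))
```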

<mst> yes
<jasonwang> can we let the perf team to do that?
<mst> they likely won't do it in time
<mst> I started making up a list of what we need to measure
<mst> have a bit of time to discuss?
<jasonwang> you mean we need to do it ourself?
<mst> at least part of it
<jasonwang> I'm sorry, I need to attend the autotest meeting in 10 minutes
<jasonwang> mst ok
<mst> will have time afterward?
<mst> I know it's late in your TZ
<jasonwang> ok
<mst> cool, then I'll stay connected on irc just ping me
<jasonwang> ok
<mst> thanks!
<jasonwang> you are welcome
<jasonwang> hi, just back from the meeting
<mst> hi
<mst> okay so let's see what we have
<jasonwang> okay
<mst> first we have the various connection options
<jasonwang> yes
<mst> we can do:
<mst> host to guest
<mst> guest to host
<mst> ext to guest
<mst> ext to host
<mst> guest to guest on local
<jasonwang> ok
<mst> guest to guest across the net
<mst> for comparison it's probably useful to do "baremetal": loopback and external<->host
<jasonwang> yes
<mst> a bit more advanced: bidirectional tests
<mst> many to many is probably to hard to setup
<jasonwang> yes, so we need only test some key options
<mst> yes, for now let's focus on things that are easy to define
<mst> ok now what kind of traffic we care about
<jasonwang> (ext)host to guest, guest to (ext)host ?
<mst> no I mean scheduler is heavily involved
<jasonwang> so guest to guest on local is also needed?
<mst> yes, think so
<mst> so I think we need to try just defaults
<mst> (no pinning)
<jasonwang> yes, that is usual case
<mst> as well as pinned scenario where qemu is pinned to cpus
<jasonwang> ok
<mst> and for external pinning irqs as well
<jasonwang> set irq affinity?
<mst> do you know whether virsh let you pin the iothread?
<mst> yes, affinity
<jasonwang> no, I don't use virsh
<mst> need to find out, only pin what virsh let us pin
<jasonwang> okay
<mst> note vhost-net thread is created on demand, so it is not very practical to pin it
<mst> if we do need this capability it will have to be added, I am hoping scheduler does the right thing
<jasonwang> yes, it's a workqueue in RHEL6.1
<mst> workqueue is just a list + thread, or we can change it if we like
<jasonwang> do you man if we need we can use a dedicated thread like upstream which is easy to be pinned?
<mst> upstream is not easier to be pinned
<mst> the issue is mostly that thread is only created on driver OK now
<jasonwang> yes
<mst> so guest can destroy it and recreate and it loses what you set
<mst> in benchmark it works but not for real users
<jasonwang> yes, agree
<mst> maybe cgroups can be used somehow since it inherits the cgroups of the owner
<mst> another option is to let qemu control the pinning
<mst> either let it specify the thread to do the work
<mst> or just add ioctl for pinning
<jasonwang> looks possible
<mst> in mark wagner's tests it seemed to work well without
<mst> so need to see if it's needed, it's not hard to add this interface
<mst> but once we add it must maintain forever
<mst> so I think irq affinity and cpu pinning are two options to try tweaking
<jasonwang> yes, have saw some performance discussion of vhost upstream
<mst> need to make sure we try on a numa box
<mst> at the moment kernel structures are allocated on first use
<jasonwang> yes
<mst> I hope it all fits in cache so should not matter
<mst> but need to check, not yet sure what exactly
<jasonwang> yes, things would be more complicated when using numa
<mst> not sure what exactly are the configurations to check
<mst> ok so we have the network setup and we have the cpu setup
<mst> let thing is traffic to check
<mst> let->last
<jasonwang> yes, TCP_STREAM/UDP_STREAM/TCP_RR and something else?
<mst> let's focus on the protocols first
<mst> so we can do TCP, this has a strange property of coalescing messages
<mst> but OTOH it's the most used protocol
<mst> and it has hard requirements e.g. on the ordering of packets
<jasonwang> yes, TCP must to be tested
<mst> UDP is only working well up to mtu packet size
<mst> but otherwise it let us do pretty low level stuff
<jasonwang> yes, agree
<mst> ICMP is very low level (good), has a disadvantage that it might be special-cased in hardware and software (bad)
<mst> what kind of traffic we care about? ideally a range of message sizes, and a range of loads
<mst> (in terms of messages per second)
<jasonwang> yes
<mst> what do we want to measure?
<jasonwang> bandwidth and latency
<mst> I think this not really it
<mst> this is what tools like to give us
<jasonwang> yes and maybe also the cpu usage
<mst> if you think about it in terms of an application, it is always latency that you care about in the end
<mst> e.g. I have this huge file what is the latency to send it over the network
<mst> and for us also what is the cpu load, you are right
<jasonwang> yes
<mst> so for a given traffic, which we can approximate by setting message size (both ways) protocol and messages per second
<mst> we want to know the latency and the cpu load
<jasonwang> yes
<mst> and we want the peak e.g. we want to know how high we can go in messages per second until latencies become unreasonable
<mst> this last is a bit subjective
<mst> but generally any system would gadually become less responsive with more load
<mst> then at some point it just breaks
<mst> cou load is a bit hard to define
<mst> cpu
<jasonwang> yes and it looks hard to do the measuring then
<mst> I think in the end, what we care about is how many cpu cycles the host burns
<jasonwang> yes, but how to measure that?
<mst> well we have simple things like /proc/stat
<jasonwang> understood and maybe perf can also help
<mst> yes quite possibly
<mst> in other words we'll need to measure this in parallel while test is running
<mst> netperf can report local/remote CPU
<mst> but I do not understand what it really means
<mst> especially for a guest
<jasonwang> yes, if we want to use netperf it's better to know how it does the calculation
<mst> well it just looks at /proc/stat AFAIK
<jasonwang> yes, I try to take a look at its source
<mst> this is the default but it has other heuristics
<mst> that can be configured at compile time
<jasonwang> ok, understand
<mst> ok and I think load divided by CPU is a useful metric
<jasonwang> so the ideal result is to get how many cpu cycles does vhost spend on send or receive a KB
<mst> netperf can report service demand
<mst> I do not understand what it is
<jasonwang> From its manual its how many us the cpu spend on a KB
<mst> well the answer will be it depends :)
<mst> also, we have packet loss
<mst> I think at some level we only care about packets that were delivered
<mst> so e.g. with UDP we only care about received messages
<jasonwang> yes, the packet loss may have concerns with guest drivers
<mst> with TCP if you look at messages, there's no loss
<jasonwang> yes TCP have flow control itself
<mst> ok so let's see what tools we have
<mst> the simplest is flood ping
<jasonwang> yes, it's very simple and easy to use
<mst> it gives you control over message size, packets per second, gets you back latency
<mst> it is always bidirectional I think
<mst> and we need to measure CPU ourselves
<mst> that last seems to be true anyway
<jasonwang> yes, maybe easy to be understand and analysis than netperf
<mst> packet loss when it occurs complicates things
<mst> e.g. with 50% packet loss the real load is anywhere in between
<jasonwang> yes
<mst> that's the only problem: it's always bidirectional so tx/rx problems are hard to separate
<jasonwang> yes, vhost is currently half-duplex
<mst> I am also not sure it detect reordering
<jasonwang> yes, it has sequence no.
<jasonwang> but for ping, as you've said it's ICMP and was not the most of the cases
<mst> ok, next we have netperf
<mst> afaik it can do two things
<mst> it can try sending as many packets as it can
<jasonwang> yes
<mst> or it can send a single one back and forth
<mst> not a lot of data, but ok
<jasonwang> yes
<mst> and similar with UDP
<mst> got to go have lunch
<mst> So I will try and write all this up
<mst> do you have any hardware for testing?
<mst> if yes we'll add it too, I'll put up a wiki
<mst> back in half an hour
<jasonwang> yes, write all things up would help
<jasonwang> go home now, please send me mail

jasonwang has quit (Quit: Leaving)

Loaded log from Wed Dec 15 15:07:24 2010
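The log above treats flood ping as the simplest tool, but notes it is always bidirectional, so with packet loss "the real load is anywhere in between". One reading of that: each lost exchange lost either the request (no reply was ever generated) or the reply (both directions carried a packet), so the packets actually handled lie between transmitted + received and 2 × transmitted. A minimal sketch that parses the ping summary and computes those bounds, assuming the iputils summary format (BSD-style "packets received" and "round-trip ... stddev" lines are also accepted); the function names are hypothetical, and CPU load still has to be measured separately:

```python
import re
import sys

_COUNTS = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
_RTT = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(text):
    """Extract packet counts, loss fraction and RTTs (ms) from ping's summary."""
    counts = _COUNTS.search(text)
    if counts is None:
        raise ValueError("no ping statistics found")
    transmitted, received = int(counts.group(1)), int(counts.group(2))
    result = {
        "transmitted": transmitted,
        "received": received,
        "loss": 1.0 - received / transmitted if transmitted else 0.0,
    }
    rtt = _RTT.search(text)
    if rtt:  # the RTT line is absent when no reply came back
        names = ("rtt_min_ms", "rtt_avg_ms", "rtt_max_ms", "rtt_mdev_ms")
        result.update(zip(names, map(float, rtt.groups())))
    return result


def packet_load_bounds(transmitted, received):
    """(lower, upper) bound on packets actually carried in both directions."""
    if received > transmitted:
        raise ValueError("received more replies than requests sent")
    return transmitted + received, 2 * transmitted


if __name__ == "__main__":
    stats = parse_ping_summary(sys.stdin.read())
    print(stats)
    print("packets carried:", packet_load_bounds(stats["transmitted"], stats["received"]))
```

For example, as root: `ping -f -c 10000 -s 1472 <guest-ip> | python3 ping_summary.py`, where the script name is hypothetical. With 50% loss, 100 requests give bounds of 150 and 200 packets.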