Stupid benchmark

To add to the pool of braindead benchmarks, but perhaps with a little more reason, I'm adding this one; take it for what it is. If anything, it shows that performance is generally not the primary argument for choosing an intermediary. This is what I've been preaching: yes, performance is important, but most servers available today will handle a ridiculous amount of HTTP traffic.

This is a test against an AMD Phenom(tm) II X4 940 processor (very cheap), running across a GigE network. There are two Linksys switches between the load-generating host and the server, but no routing or packet filtering. The payload is 100 bytes plus a fairly small header, and the test runs with keep-alive.

In all tests, all logging is disabled.

Varnish

The configs are mostly the defaults; the main change was that I had to jack up the minimum number of threads, and 200 seems to be a reasonable number for this test. During the test, the load goes to 300. The version of Varnish is v2.1.5, from the Fedora repository.
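For reference, raising the minimum threads can be done with varnishd's -w min,max,timeout option (the settings quoted in the comments below); the listen address and VCL path here are illustrative, not taken from the test setup:

```shell
# Start varnishd with 200 minimum / 1000 maximum worker threads
# and a 120-second idle thread timeout (-w min,max,timeout).
varnishd -a :80 -f /etc/varnish/default.vcl -w 200,1000,120
```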

5719306 fetches on 450 conns, 450 max parallel, 5.776500E+08 bytes, in 60 seconds
101 mean bytes/fetch
95320.7 fetches/sec, 9.627390E+06 bytes/sec
msecs/connect: 3.955 mean, 6.453 max, 0.162 min
msecs/first-response: 3.546 mean, 1005.235 max, 0.076 min


Nginx

This is running the older v0.8.53 version, since that's what was available in the Fedora repo. The configs had to be tuned some: increasing the number of worker processes, setting open_file_cache high, and also setting keepalive_requests high.
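A sketch of what that tuning might look like in nginx.conf; the directive names are standard nginx, but the exact values used in the test weren't recorded, so these are plausible stand-ins:

```nginx
worker_processes  4;                         # one per core on the X4 940

http {
    keepalive_requests  1000000;             # don't close keep-alive conns early
    open_file_cache     max=10000 inactive=60s;
    access_log          off;                 # all logging disabled for the test
}
```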

5848823 fetches on 450 conns, 450 max parallel, 5.848820E+08 bytes, in 60 seconds
100 mean bytes/fetch
97480.4 fetches/sec, 9.748040E+06 bytes/sec
msecs/connect: 1.340 mean, 3.558 max, 0.469 min
msecs/first-response: 3.522 mean, 280.463 max, 0.067 min


Apache Traffic Server

This is the winner, of course; otherwise I wouldn't have published these results ;). This is running ATS v2.1.8, with a mostly stock config. The primary configuration changes were setting the number of worker threads to 5 and turning off the verbose Via and Server strings.
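Those two changes would roughly correspond to a records.config fragment like this; the setting names follow the ATS documentation, but treat the exact lines as a sketch rather than the config actually used:

```
# Pin the number of worker threads instead of autoconfiguring
CONFIG proxy.config.exec_thread.autoconfig INT 0
CONFIG proxy.config.exec_thread.limit INT 5
# Trim the verbose Via and Server response strings
CONFIG proxy.config.http.insert_response_via_str INT 0
CONFIG proxy.config.http.response_server_enabled INT 0
```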

6944993 fetches on 450 conns, 450 max parallel, 6.945000E+08 bytes, in 60 seconds
100 mean bytes/fetch
115748.6 fetches/sec, 1.157486E+07 bytes/sec
msecs/connect: 1.805 mean, 2.995 max, 0.519 min
msecs/first-response: 1.736 mean, 218.573 max, 0.081 min


Update: I've refreshed the numbers with the latest results from ATS v2.1.9; they are only marginally different.

Comments

Re: Stupid benchmark

I'm not sure; either it's a bug in the load generator (I've seen that before), or perhaps nginx is simply dropping / closing the connections. I'm obviously no nginx expert ;)

Re: Stupid benchmark

Ah, figured it out: there's a setting for this, keepalive_requests. I set it way high and re-ran the tests; it increases nginx's results a bit :). I've updated the benchmark numbers for nginx accordingly.

Re: Stupid benchmark

Try giving Varnish a higher number of threads, say 20,000, and see if your results change.

A load of 300 is not normal. You are doing it wrong®! ;)

Re: Stupid benchmark

You mean increase the minimum threads to 20,000? My current settings are

-w 200,1000,120

Why would it need 20k threads to handle 450 concurrent connections? I should say, most of the rest of the configs are defaults, so if I'm doing something wrong, I'd say the Varnish default configs are doing it wrong ;).

Re: Stupid benchmark

The benchmark numbers look spectacular, but on a Core 2 Duo 2.4 GHz machine with 4 GB of PC800 memory, running ATS and httperf on the same machine and serving requests from the RAM cache, ATS 3.0.0 seems to max out at 10K requests/sec (with only one connection). The only bottleneck is memory bandwidth, which at 6.4 GB/s is way faster than a 1 Gb network connection. What tools/settings are you using to drive and measure the traffic?

Re: Stupid benchmark

Hmmm, you are seeing your traffic_server peg all CPUs at only 10k QPS? With only one connection, your limitation is the response time: imagine it takes 1 ms to respond; the most you could possibly do on one connection is 1,000 req/sec. So if you are seeing 10k QPS with a single connection, that's very good, since it means your average response time is 0.1 ms or less. (I'm assuming you are not testing with pipelining; that math changes otherwise.)
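The arithmetic in that reply can be sketched with a toy helper (not part of any benchmark tool): with serial requests and no pipelining, each connection can complete at most one request per response time.

```python
def max_qps(connections, response_time_ms):
    """Upper bound on requests/sec when each connection issues
    requests serially (no pipelining): one request must complete
    before the next is sent."""
    return connections * 1000.0 / response_time_ms

# One connection, 1 ms per response: at most 1,000 req/sec.
print(max_qps(1, 1.0))   # 1000.0
# Observing 10k QPS on one connection implies <= 0.1 ms per response.
print(max_qps(1, 0.1))   # 10000.0
```

The same bound explains why the 450-connection tests above can sustain ~100k fetches/sec despite multi-millisecond mean first-response times.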


FWIW, I use a souped-up version of http_load, but even so, to drive it to 150,000 QPS or more, I need to run three instances of it. Usually I run a few hundred connections per client to make it a reasonable test (but optimal is probably less than 200 total).
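For anyone wanting to reproduce something similar, a stock http_load run looks roughly like this (the URL file and connection count are illustrative; the souped-up version mentioned above may take different options):

```shell
# 150 parallel keep-alive connections per instance, for 60 seconds;
# urls.txt contains the URL(s) of the small test object.
http_load -parallel 150 -seconds 60 urls.txt

# To reach ~150k QPS, run three such instances, ideally from
# separate load-generating hosts.
```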