leif's blog

May 2011 Apache Traffic Server performance

I just ran a small test against Apache Traffic Server to see how performance has improved since last time. My test box is my desktop, an Intel Core i7-920 at 2.67GHz (no overclocking), and the client runs on a separate box over a cheap GigE network (two switches in between). Here are the latest results:

2,306,882 fetches on 450 conns, 450 max parallel, 2.306882E+08 bytes, in 15 seconds
100 mean bytes/fetch
153,792.0 fetches/sec, 1.537920E+07 bytes/sec
msecs/connect: 5.326 mean, 13.078 max, 0.149 min
msecs/first-response: 2.094 mean, 579.752 max, 0.099 min

 

This is of course for very small objects (100 bytes) served out of RAM cache, with HTTP keep-alive. Still respectable: close to 154k QPS out of a very low-end commodity box.
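
The derived rates in a report like this are simply the raw totals divided by each other. As a quick sanity check, here is a small sketch (the report_rates function is hypothetical, not part of any benchmarking tool) that recomputes them from the fetch count, byte count and duration. It yields 153,792 fetches/sec; the tool's own figure differs slightly in the decimals, presumably because it divides by the measured elapsed time rather than exactly 15 seconds.

```shell
#!/bin/sh
# Hypothetical helper: recompute the derived numbers of a load-test report
# from its raw totals (fetches, total bytes, elapsed seconds).
report_rates() {
  fetches="$1"; bytes="$2"; seconds="$3"
  awk -v f="$fetches" -v b="$bytes" -v s="$seconds" 'BEGIN {
    printf "%.0f mean bytes/fetch\n", b / f
    printf "%.0f fetches/sec\n", f / s
    printf "%.0f bytes/sec\n", b / s
  }'
}

# Totals from the May 2011 run above.
report_rates 2306882 230688200 15
```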


Forcing a "check" on a Linux md RAID device

To be proactive, I've found that it can be a good idea to force a Linux RAID array (mirror or RAID5/6) to be checked for consistency once in a while (perhaps via a cron job). This can easily be done from the command line, with something like

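% echo check | sudo tee /sys/block/md0/md/sync_action

(Note the "sudo tee": with "sudo echo check > ...", the redirection is done by your own unprivileged shell rather than by sudo, so the write fails with "Permission denied".)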
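
For the cron job variant, here is a small sketch that requests a check on every md array that is currently idle, so it does not interrupt a running resync or rebuild. Assumptions: md_check_all is a hypothetical helper, it has to run as root (e.g. from root's crontab), and the SYSFS variable exists only so it can be exercised against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: start a consistency check on every idle md array.
# SYSFS defaults to /sys; override it only for testing.
md_check_all() {
  sysfs="${SYSFS:-/sys}"
  for action in "$sysfs"/block/md*/md/sync_action; do
    [ -e "$action" ] || continue          # glob matched nothing: no md arrays
    # Only start a check when nothing else (resync, recover, ...) is running.
    if [ "$(cat "$action")" = "idle" ]; then
      echo check > "$action"
      # .../block/md0/md/sync_action -> md0
      echo "started check on $(basename "$(dirname "$(dirname "$action")")")"
    fi
  done
}

# Example root crontab entry (hypothetical script path that calls md_check_all),
# Sundays at 03:00:
#   0 3 * * 0 /usr/local/sbin/md-check-all.sh
```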

Also, to repair an array with inconsistencies, something like

% echo repair | sudo tee /sys/block/md0/md/sync_action

This second command solved a problem for me, where I'd get an email warning once a week saying "WARNING: mismatch_cnt is not 0 on /dev/md0". This was verified with

% sudo cat /sys/block/md0/md/mismatch_cnt
256
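
To see at a glance which arrays have a non-zero count (the value reflects the most recent check or repair pass), a sketch that loops over all md devices. As above, md_mismatches is a hypothetical helper and SYSFS is only a test hook.

```shell
#!/bin/sh
# Hypothetical helper: list md arrays whose mismatch_cnt is not 0.
md_mismatches() {
  sysfs="${SYSFS:-/sys}"
  for cnt in "$sysfs"/block/md*/md/mismatch_cnt; do
    [ -e "$cnt" ] || continue           # no md arrays present
    n=$(cat "$cnt")
    if [ "$n" -ne 0 ]; then
      dev=$(basename "$(dirname "$(dirname "$cnt")")")
      echo "/dev/$dev: mismatch_cnt=$n"
    fi
  done
}

md_mismatches
```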

 


Upgrading Fedora with yum

I've been doing a number of Fedora Core upgrades recently, and was semi-disappointed with how slow the upgrade process from DVD was. Every upgrade first installed well over 1000 packages from the DVD, and then the "yum update" to finish it off upgraded at least another 1000 packages again. For my next upgrade, I decided to update the live system using yum directly, following the excellent upgrade instructions from the Fedora project. It worked surprisingly well (and much, much faster). On top of that, I also cleaned out some groups that I no longer thought were necessary.

Here's the quick summary of the commands I ran:

# Run an update first, just in case
yum update

# Cleanup/merge config updates
yum install rpmconf
rpmconf -a

# Find and review unused packages
yum install yum-utils
package-cleanup --leaves
# Now yum remove the packages you think should be removed

# Find and review lost packages
package-cleanup --orphans

# Cleanup
yum clean all

# Do the upgrade, preferably in run-level 3 (no GUI)

# For FC13 -> 14 upgrade
rpm --import https://fedoraproject.org/static/97A1071F.txt
# For FC14 -> 15 upgrade
rpm --import https://fedoraproject.org/static/069C8460.txt
# For FC15 -> 16 upgrade
rpm --import https://fedoraproject.org/static/A82BA4B7.txt
# For FC16 -> 17 upgrade
rpm --import https://fedoraproject.org/static/10D90A9E.txt
# For FC17 -> 18 upgrade
rpm --import https://fedoraproject.org/static/DE7F38BD.txt
# For FC18 -> 19 upgrade
rpm --import https://fedoraproject.org/static/FB4B18E6.txt
# For FC19 -> 20 upgrade
rpm --import https://fedoraproject.org/static/246110C1.txt

yum update yum
# The following steps are a bit different from the Fedora recommendations; they
# say to run the updates with --skip-broken, but I don't like that.
yum check
# and then resolve any duplicates. When running the following, you will get
# conflicts; resolve those by uninstalling the offending packages.
# For FC13 -> 14
yum --releasever=14 distro-sync
# For FC14 -> 15
yum --releasever=15 distro-sync
yum groupupdate Base
# For FC15 -> 16
yum --releasever=16 --disableplugin=presto distro-sync
yum groupupdate Base
# For FC16 -> 17
yum --releasever=17 --disableplugin=presto distro-sync
yum groupupdate Base
# For FC17 -> 18
yum --releasever=18 --disableplugin=presto distro-sync
yum groupupdate Base
# For FC18 -> 19
yum --releasever=19 --disableplugin=presto distro-sync
yum groupupdate Base
# For FC19 -> 20
yum --releasever=20 --disableplugin=presto distro-sync
yum groupupdate Base
 

# Examine other groups and update (or remove) as necessary
yum grouplist
yum groupupdate "Administration Tools" "Server Configuration Tools" ...

# Prepare for boot, assuming your boot device is /dev/sda (change as appropriate)
/sbin/grub-install /dev/sda
# Or, if you are switching to GRUB 2 (as in FC16)
/sbin/grub2-mkconfig -o /boot/grub2/grub.cfg
/sbin/grub2-install /dev/sda

# Update startup order; this is not for FC16 or later!
cd /etc/rc.d/init.d; for f in *; do /sbin/chkconfig "$f" resetpriorities; done

# Find packages that haven't been upgraded
package-cleanup --orphans
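
The per-release steps above differ only in the release number, the signing key, and a couple of changes from FC15 and FC16 on. As a hedged sketch, the hypothetical plan_upgrade function below prints (rather than runs) that core sequence for a target release, with the key IDs copied from the list above. It does not cover the group, boot loader or cleanup steps, and its output is meant to be reviewed, not blindly piped into a root shell.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the core upgrade commands listed above
# for a target release, e.g. "plan_upgrade 16" for an FC15 -> 16 upgrade.
plan_upgrade() {
  target="$1"
  # Signing key IDs, copied from the list above.
  case "$target" in
    14) key=97A1071F ;;
    15) key=069C8460 ;;
    16) key=A82BA4B7 ;;
    17) key=10D90A9E ;;
    18) key=DE7F38BD ;;
    19) key=FB4B18E6 ;;
    20) key=246110C1 ;;
    *) echo "unsupported target release: $target" >&2; return 1 ;;
  esac
  echo "rpm --import https://fedoraproject.org/static/$key.txt"
  echo "yum update yum"
  echo "yum check"
  # The list above disables the presto plugin from FC16 on.
  if [ "$target" -ge 16 ]; then
    echo "yum --releasever=$target --disableplugin=presto distro-sync"
  else
    echo "yum --releasever=$target distro-sync"
  fi
  # ... and updates the Base group from FC15 on.
  if [ "$target" -ge 15 ]; then
    echo "yum groupupdate Base"
  fi
}

# Example: print the FC14 -> 15 steps.
plan_upgrade 15
```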

I made a small shell script available to make it a little easier to manage the updates (and additions/removals) of yum groups. It's far from perfect, and does indeed have bugs, but I still find it helpful. Right now, it still runs the necessary yum commands "interactively", to give you some extra safety when deciding whether to proceed with a particular operation. The script supports upgrading, removing and adding yum groups. It is available from GitHub at https://github.com/zwoop/scripts, named yumgroup.sh.

 

Note: I take no responsibility for failed upgrades, but this process worked nicely for me, upgrading from FC13 to FC14.


Linux / Fedora Core and nf_conntrack

While testing and benchmarking some new features in Apache Traffic Server on my dev box, I started having really odd problems with inconsistently lost connections or really slow performance. Upon examining the logs, I found a large amount of kernel barf like

Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.
Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.
Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.
Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.

Clearly this could not be good. I poked around, and could not find any way to turn off this kernel module; at least on my Fedora Core 13, it seems to be compiled straight into the kernel (i.e. no .ko). There are ways to increase the table size (via sysctls), but that seemed like a band-aid at best. Bummer. So, I went to the mountain (Noah F.) and asked for advice. After some poking around, we found that telling iptables to skip connection tracking helps, a lot. E.g.

iptables -t raw -I OUTPUT -j NOTRACK

seems to do the trick for my particular case. This turns it off for all protocols, but you could probably limit it further (e.g. to TCP only, with a "-p tcp" option). You might also want to do this for incoming packets, e.g.

iptables -t raw -I PREROUTING -j NOTRACK
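
Following the "-p tcp" idea, the rules can be narrowed further to just the proxy's listening port. A sketch, assuming Traffic Server listens on port 8080 (substitute your own); notrack_rules is a hypothetical helper that only prints the commands. One caveat worth checking: untracked packets are in the UNTRACKED state, so they will not match INPUT rules such as "-m state --state ESTABLISHED,RELATED -j ACCEPT", and a restrictive firewall may need an explicit ACCEPT for that port.

```shell
#!/bin/sh
# Hypothetical helper: print iptables rules that skip connection tracking
# only for TCP traffic to/from one local port, instead of for everything.
notrack_rules() {
  port="$1"
  # Incoming packets addressed to the server port.
  echo "iptables -t raw -I PREROUTING -p tcp --dport $port -j NOTRACK"
  # Outgoing replies sent from the server port.
  echo "iptables -t raw -I OUTPUT -p tcp --sport $port -j NOTRACK"
}

# Review the output first; then e.g.: notrack_rules 8080 | sudo sh
notrack_rules 8080
```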

We found this old report against Fedora Core 10 as well, which describes the same problem, but no solution.
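
Before (or instead of) disabling tracking, it helps to see how close the table actually is to its limit. The kernel exposes the current number of entries and the limit as the net.netfilter.nf_conntrack_count and net.netfilter.nf_conntrack_max sysctls. The hypothetical conntrack_usage helper below reads them from /proc/sys; the PROCSYS override exists only so it can be tested against a fake directory.

```shell
#!/bin/sh
# Hypothetical helper: report how full the nf_conntrack table is.
conntrack_usage() {
  procsys="${PROCSYS:-/proc/sys}"
  count_file="$procsys/net/netfilter/nf_conntrack_count"
  max_file="$procsys/net/netfilter/nf_conntrack_max"
  if [ ! -r "$count_file" ] || [ ! -r "$max_file" ]; then
    echo "nf_conntrack not loaded" >&2
    return 1
  fi
  count=$(cat "$count_file")
  max=$(cat "$max_file")
  # Integer percentage; 100% means new flows get "table full, dropping packet".
  echo "$count/$max entries ($((count * 100 / max))% full)"
}

conntrack_usage
```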


yum, rpm and duplicate versions

Apparently, when a "yum update" fails miserably, you can end up with duplicate versions of packages in the RPM database. This seems harmless, but is annoying. yum provides a tool to check for this, but I was not able to find anything that would automatically repair it. So here's a little tip:

$ yum check duplicates | awk '/is a duplicate/ {print $6}' > /tmp/DUPES
$ yum remove `cat /tmp/DUPES`

Of course, before you remove the dupes, examine the tmp file (/tmp/DUPES) and make sure it looks OK.
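
The same one-liner can be wrapped as a reusable filter that also lists each package only once. This sketch assumes yum reports each problem as "<package> is a duplicate with <other package>", which is what the $6 field in the awk command above relies on; dupes_from_yum_check is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "yum check duplicates" output on stdin and print
# the package named after "is a duplicate with" (field 6), once per package.
dupes_from_yum_check() {
  awk '/ is a duplicate with / { print $6 }' | sort -u
}

# Usage: write the list to a file and review it before removing anything.
#   yum check duplicates | dupes_from_yum_check > /tmp/DUPES
#   cat /tmp/DUPES
#   yum remove $(cat /tmp/DUPES)
```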

Update:

There seems to be a command for this after all; package-cleanup has an option for it, e.g.

$ package-cleanup --cleandupes

However, testing this command on a second box with the same problem gave bad results: it seems to have uninstalled the "real" packages too.


Slow SSH into my MacOSX laptop

I recently upgraded Michelle's old laptop to MacOSX 10.6 (for myself), and noticed it would take excruciatingly long to ssh or scp into the box. As it turns out, sshd tries to do a reverse DNS lookup of my internal IPs, which fails. I don't know why it takes so long to realize this (the reverse lookup ought to fail immediately), but I found a simple solution: in /etc/sshd_config, I simply turned off DNS lookups:

UseDNS no
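
To apply the same change from a script (say, on several machines), a helper can rewrite the file idempotently. This is a sketch under assumptions: set_usedns_no is a hypothetical name, it edits whatever path you pass (here /etc/sshd_config, which needs sudo), it keeps a .bak copy, and it places the setting before any Match block, since sshd applies keywords that follow a Match line only to matching connections.

```shell
#!/bin/sh
# Hypothetical helper: make sure an sshd_config file contains exactly one
# active "UseDNS no" line. The original is kept as "<file>.bak".
set_usedns_no() {
  conf="$1"
  cp "$conf" "$conf.bak" || return 1
  awk '
    # Replace the first active UseDNS line; drop any later ones.
    tolower($1) == "usedns" { if (!done) print "UseDNS no"; done = 1; next }
    # No UseDNS seen yet: add it just before the first Match block.
    tolower($1) == "match" && !done { print "UseDNS no"; done = 1 }
    { print }
    # Still none: append it at the end.
    END { if (!done) print "UseDNS no" }
  ' "$conf.bak" > "$conf"
}

# Usage: sudo sh -c '. ./set_usedns_no.sh; set_usedns_no /etc/sshd_config'
```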


July 2010 Apache Traffic Server benchmark

I reran my benchmarks with the latest "trunk" of Apache Traffic Server, to make sure we're not regressing. I also tweaked the number of worker threads a little; my gut feeling tells me that with Hyper-Threading, our auto-scaling algorithm isn't optimal (and it really isn't). Here are the latest numbers, running over a GigE network (two Linksys el-cheapo switches between clients and server):

3,160,237 fetches on 3,666 conns, 1,800 max parallel, 1.58012e+09 bytes in 30 seconds
500 mean bytes/fetch
105,341.10 fetches/sec, 5.26704e+07 bytes/sec
msecs/connect: 1.46781 mean, 6.674 max, 0.093666 min
msecs/first-response: 16.3333 mean, 615.34033 max, 0.121333 min

 

That is, 105k QPS (with keep-alive) for small objects, over the network. It's pushing about 52MB/s of payload at this speed, but remember that the average object size is very small (500 bytes). My box is an Intel Core i7-920 (quad core).


Why Traffic Server defaults to not allow forward proxying

We have discussed numerous times on the Apache mailing lists why Apache Traffic Server ships with a default configuration that is almost entirely locked down. Our argument has been that we want to ensure that someone testing TS doesn't accidentally end up running an open proxy in the wild.

I recently moved all of www.ogre.com to be served via a TS reverse proxy setup. Within minutes of setting it up, while watching the log files, I found entries like these:

1273927263.211 0 125.45.109.166 ERR_CONNECT_FAIL/404 485 GET http://proxyjudge1.proxyfire.net/fastenv - NONE/- tex -
1274057848.081 0 125.45.109.166 ERR_CONNECT_FAIL/404 485 GET http://proxyjudge1.proxyfire.net/fastenv - NONE/- tex -
1274236765.403 4 125.45.109.166 ERR_CONNECT_FAIL/404 485 GET http://proxyjudge1.proxyfire.net/fastenv - NONE/- tex -

Of course, my server doesn't allow this, since it's set up to only accelerate www.ogre.com traffic. But this is a case in point: I think we did the right thing by shipping Apache Traffic Server with a very restrictive configuration.
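
To check your own logs for similar probes, a small filter can count requests whose URL names a host other than the one you accelerate. This sketch assumes the squid-style log lines shown above (client IP in field 3, URL in field 7) and that legitimate requests are logged with the www.ogre.com host; if your logs record the remapped origin URL instead, adjust accordingly. foreign_proxy_attempts is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: from squid-style access log lines on stdin, count
# requests per client IP whose absolute URL points at a host other than $1.
foreign_proxy_attempts() {
  awk -v host="$1" '
    $7 ~ /^https?:\/\// {
      split($7, parts, "/")    # "http://h/p" -> parts[3] is "h" (host[:port])
      if (parts[3] != host) hits[$3]++
    }
    END { for (ip in hits) print hits[ip], ip }
  ' | sort -rn
}

# Usage (log path depends on your installation):
#   foreign_proxy_attempts www.ogre.com < squid.log
```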
