Linux / Fedora Core and nf_conntrack

While testing and benchmarking some new features in Apache Traffic Server on my dev box, I started having really odd problems: intermittently lost connections, or really slow performance. Upon examining the logs, I found a large amount of kernel barf like this:

Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.
Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.
Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.
Oct 29 11:53:51 loki kernel: nf_conntrack: table full, dropping packet.

Clearly this could not be good. I poked around and could not find any way to turn off this kernel module; at least on my Fedora Core 13, it seems to be compiled straight into the kernel (i.e. there is no .ko). There are ways to increase the table size (via sysctls), but that seemed like a band-aid at best. Bummer. So, I went to the mountain (Noah F.) and asked for advice. After some poking around, we found that telling iptables to skip connection tracking helps, a lot. E.g.

iptables -t raw -I OUTPUT -j NOTRACK

seems to do the trick for my particular case. This turns it off for all protocols, but you could probably limit it further (e.g. to TCP only, with a "-p tcp" option). You might also want to do this for incoming packets, e.g.

iptables -t raw -I PREROUTING -j NOTRACK

We also found this old report against Fedora Core 10, which describes the same problem but offers no solution.
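To make the rule variations explicit, here is a small sketch that builds the NOTRACK commands as argument lists, for both the OUTPUT chain (locally generated packets) and the PREROUTING chain (incoming packets), optionally restricted to one protocol. The helper name notrack_rules() is hypothetical, and the sketch only prints the commands, since actually applying them needs root. Note that newer iptables releases document NOTRACK as equivalent to "-j CT --notrack", so check which form your version prefers.

```python
"""Build the raw-table NOTRACK rules as iptables argv lists (sketch only)."""
import shlex
from typing import List, Optional, Tuple


def notrack_rules(protocol: Optional[str] = None,
                  chains: Tuple[str, ...] = ("OUTPUT", "PREROUTING")
                  ) -> List[List[str]]:
    """Return one `iptables -t raw -I <chain> [-p <proto>] -j NOTRACK` per chain.

    OUTPUT covers locally generated packets, PREROUTING covers incoming ones.
    protocol="tcp" limits the exemption to TCP; None exempts all protocols.
    """
    rules = []
    for chain in chains:
        argv = ["iptables", "-t", "raw", "-I", chain]
        if protocol is not None:
            argv += ["-p", protocol]
        argv += ["-j", "NOTRACK"]
        rules.append(argv)
    return rules


if __name__ == "__main__":
    # Print shell-quoted commands for review; nothing is executed here.
    for argv in notrack_rules(protocol="tcp"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Generating argv lists (rather than strings) keeps the commands safe to hand to subprocess later without shell quoting issues.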

Hacking: 

yum, rpm and duplicate versions

Apparently, when a "yum update" fails miserably, you can end up with duplicate versions of packages in the RPM database. This seems harmless, but it is annoying. yum provides a tool to check for this, but I was not able to find anything that would automatically repair it. So here's a little tip:

$ yum check duplicates | awk '/is a duplicate/ {print $6}' > /tmp/DUPES
$ yum remove `cat /tmp/DUPES`

Of course, before you remove the dupes, examine the temporary file (/tmp/DUPES) and make sure it looks OK.
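For reference, here is the same extraction as the awk one-liner written in Python, which makes the field logic explicit. It assumes, as the awk script does, that each duplicate is reported on a single line of the form "<pkg> is a duplicate with <other-pkg>", so the sixth whitespace-separated field is what gets printed. The function names are hypothetical, and write_dupes_file() is defined but not called, since it needs a yum-based system.

```python
"""Python version of: yum check duplicates | awk '/is a duplicate/ {print $6}'"""
import subprocess
from typing import Iterable, List


def duplicate_packages(lines: Iterable[str]) -> List[str]:
    """Return field 6 (awk's 1-based $6) of every line mentioning 'is a duplicate'."""
    dupes = []
    for line in lines:
        if "is a duplicate" in line:
            fields = line.split()
            # awk would print an empty line if $6 is missing; we skip instead.
            if len(fields) >= 6:
                dupes.append(fields[5])
    return dupes


def write_dupes_file(path: str = "/tmp/DUPES") -> List[str]:
    """Run yum, save the candidate list to `path` for manual review, return it."""
    out = subprocess.run(["yum", "check", "duplicates"],
                         capture_output=True, text=True).stdout
    dupes = duplicate_packages(out.splitlines())
    with open(path, "w") as fh:
        fh.write("".join(name + "\n" for name in dupes))
    return dupes
```

As with the shell version, review the file before feeding it to "yum remove".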

Update:

There seems to be a command for this after all: package-cleanup has an option for it, e.g.

$ package-cleanup --cleandupes

However, testing this command on a second box with the same problem gave bad results: it seems to have uninstalled the "real" packages too.

Hacking: 

Slow SSH into my MacOSX laptop

I recently upgraded Michelle's old laptop to MacOSX 10.6 (for myself), and noticed it would take excruciatingly long to ssh or scp into the box. As it turns out, sshd tries to do a reverse lookup of my internal IPs, which fails. I don't know why it takes so long to give up (the reverse lookup ought to fail immediately), but I found a simple solution. In /etc/sshd_config, I simply turned off DNS lookups:

UseDNS no
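If you want to confirm that a slow reverse lookup is the culprit before changing sshd_config, a quick way is to time the lookup yourself. This is a diagnostic sketch: reverse_lookup() is a hypothetical helper, and 127.0.0.1 is only a placeholder for the client IP that connects slowly.

```python
"""Time a reverse DNS (PTR) lookup, the step sshd performs when UseDNS is on."""
import socket
import time
from typing import Optional, Tuple


def reverse_lookup(ip: str) -> Tuple[float, Optional[str]]:
    """Return (seconds elapsed, hostname), with hostname None if the lookup failed."""
    start = time.monotonic()
    try:
        hostname: Optional[str] = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        hostname = None
    return time.monotonic() - start, hostname


if __name__ == "__main__":
    elapsed, name = reverse_lookup("127.0.0.1")
    print(f"{elapsed:.2f}s -> {name}")
```

A lookup that takes many seconds before returning None matches the symptom described above.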

Hacking: 

July 2010 Apache Traffic Server benchmark

I reran my benchmarks with the latest "trunk" of Apache Traffic Server, to make sure we're not regressing. I also tweaked the number of worker threads a little; a gut feeling tells me that with Hyper-Threading, our auto-scaling algorithm isn't optimal (and it really isn't). Here are the latest numbers, running over a GigE network (two el-cheapo Linksys switches between the clients and the server):

3,160,237 fetches on 3,666 conns, 1,800 max parallell, 1.58012e+09 bytes in 30 seconds
500 mean bytes/fetch
105,341.10 fetches/sec, 5.26704e+07 bytes/sec
msecs/connect: 1.46781 mean, 6.674 max, 0.093666 min
msecs/first-response: 16.3333 mean, 615.34033 max, 0.121333 min

 

That is, 105k QPS (with keep-alive) for small objects, over the network. It's pushing about 52 MB/s of payload at this rate, but remember that the average object size is very small (500 bytes). My box is a quad-core Intel i7 920.
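The derived figures can be recomputed from the summary line, which also shows how far this is from saturating the GigE link. A sketch, assuming the summary line keeps exactly the format shown above; derived_rates() is a hypothetical helper.

```python
"""Recompute throughput figures from the benchmark summary line."""
import re

SUMMARY = ("3,160,237 fetches on 3,666 conns, 1,800 max parallell, "
           "1.58012e+09 bytes in 30 seconds")

_PATTERN = re.compile(
    r"([\d,]+) fetches on [\d,]+ conns, [\d,]+ max parallell?, "
    r"([\d.e+]+) bytes in ([\d.]+) seconds")


def derived_rates(summary: str) -> dict:
    m = _PATTERN.match(summary)
    if m is None:
        raise ValueError("unrecognized summary line")
    fetches = int(m.group(1).replace(",", ""))
    total_bytes = float(m.group(2))
    seconds = float(m.group(3))
    bytes_per_sec = total_bytes / seconds
    return {
        "fetches_per_sec": fetches / seconds,
        "bytes_per_fetch": total_bytes / fetches,
        "bytes_per_sec": bytes_per_sec,
        # Payload only; HTTP/TCP/IP headers add more on the wire.
        "mbit_per_sec": bytes_per_sec * 8 / 1e6,
    }


if __name__ == "__main__":
    for key, value in derived_rates(SUMMARY).items():
        print(f"{key}: {value:,.2f}")
```

This gives roughly 105,341 fetches/sec (the small difference from the reported 105,341.10 is presumably rounding of the 30-second duration), 500 bytes per fetch, and about 421 Mbit/s of payload, i.e. a bit over 40% of a 1 Gbit/s link before counting protocol overhead.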

Hacking: 

Why Traffic Server defaults to not allow forward proxying

We have discussed numerous times on the Apache mailing lists why Apache Traffic Server ships with a default configuration that is almost entirely locked down. Our argument has been that we want to ensure that someone testing TS does not accidentally end up running it in the wild as an open proxy.

I recently moved all of www.ogre.com to be served via a TS reverse proxy setup. Within minutes of setting it up, while watching the log files, I found entries like these:

1273927263.211 0 125.45.109.166 ERR_CONNECT_FAIL/404 485 GET http://proxyjudge1.proxyfire.net/fastenv - NONE/- tex -
1274057848.081 0 125.45.109.166 ERR_CONNECT_FAIL/404 485 GET http://proxyjudge1.proxyfire.net/fastenv - NONE/- tex -
1274236765.403 4 125.45.109.166 ERR_CONNECT_FAIL/404 485 GET http://proxyjudge1.proxyfire.net/fastenv - NONE/- tex -

Of course, my server doesn't allow this, since it's set up to only accelerate www.ogre.com traffic. But this is a case in point: I think we did the right thing by shipping Apache Traffic Server with a very restrictive configuration.
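Spotting these probes by eye gets old quickly. Here is a small sketch that flags requests for any host other than the one being accelerated. It assumes the field order seen in the lines above (timestamp, elapsed time, client IP, result code/status, bytes, method, URL, ...); the helper name proxy_attempts() and the ALLOWED_HOSTS set are hypothetical.

```python
"""Flag forward-proxy attempts in a Squid-style access log."""
from typing import FrozenSet, Iterable, List, Tuple
from urllib.parse import urlsplit

ALLOWED_HOSTS: FrozenSet[str] = frozenset({"www.ogre.com"})


def proxy_attempts(lines: Iterable[str],
                   allowed: FrozenSet[str] = ALLOWED_HOSTS
                   ) -> List[Tuple[str, str]]:
    """Return (client_ip, host) for each request whose URL host is not allowed."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete log entry
        client, url = fields[2], fields[6]
        # .hostname is lowercased and excludes any :port, so
        # http://www.ogre.com:8080/ still counts as www.ogre.com.
        host = urlsplit(url).hostname
        if host is not None and host not in allowed:
            hits.append((client, host))
    return hits
```

Relative URLs (no host) are skipped, so only absolute-URL requests aimed at other sites are reported.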

Hacking: 

Ogre is now running on Apache Traffic Server

I've just switched over to serving all of www.ogre.com out of Apache Traffic Server. The site is still managed and created using Apache HTTPD, PHP and Drupal, but that now runs as an "origin" server behind ATS. This gives me a few benefits over serving straight out of Apache HTTPD:

  • Static content is automatically "cached" on the ATS server, which can serve such content very fast, with low latency.
  • I can jack up keep-alive much higher than I dared with HTTPD. FWIW, I still use the prefork MPM, so I have a limited number of processes and can't afford to tie those up with idle keep-alive connections.
  • In a pinch, I could make the HTML generated by Drupal cacheable, and serve it straight out of ATS. I'm contemplating making this setting automatic, so that when the load on the box hits a certain level, all HTML will also be cached by ATS. That would increase my capacity by at least an order of magnitude, I think.

This change required no changes on my Drupal site, but I did change the port on my Apache HTTPD virtual host:

NameVirtualHost 209.126.158.218:8080

<VirtualHost 209.126.158.218:8080>
    ServerName www.ogre.com
...

I then installed Apache Traffic Server to listen on port 80, and told it to bind only to a specific IP on my server (I have three IPs for different things). I also increased the RAM cache size and the keep-alive timeouts, so I now have these changes in etc/trafficserver/records.config:

CONFIG proxy.config.proxy_name STRING kramer3.ogre.com
CONFIG proxy.config.http.server_port INT 80
CONFIG proxy.config.http.keep_alive_no_activity_timeout_in INT 60
CONFIG proxy.config.http.keep_alive_no_activity_timeout_out INT 1
CONFIG proxy.config.http.transaction_no_activity_timeout_in INT 15
CONFIG proxy.config.http.transaction_no_activity_timeout_out INT 30
CONFIG proxy.config.cache.ram_cache.size LLONG 33554432

LOCAL proxy.local.incoming_ip_to_bind STRING 209.126.158.218
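Each records.config line above has the shape "<SCOPE> <name> <TYPE> <value>". Here is a minimal parser sketch, useful for sanity-checking values; parse_records() is hypothetical, and it only knows the INT, LLONG and STRING types that appear above (anything else is kept as a string). It also makes the units easy to check: 33554432 bytes is a 32 MiB RAM cache, and likewise the 134217728 in storage.config is 128 MiB.

```python
"""Parse records.config lines of the form: <SCOPE> <name> <TYPE> <value>."""
from typing import Callable, Dict, Iterable, Union

Value = Union[int, str]

# Casts for the TYPE tokens used above; unknown types stay strings.
_CASTS: Dict[str, Callable[[str], Value]] = {"INT": int, "LLONG": int, "STRING": str}


def parse_records(lines: Iterable[str]) -> Dict[str, Value]:
    records: Dict[str, Value] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 3)  # value may contain spaces, so cap at 4
        if len(parts) != 4:
            continue
        _scope, name, rtype, value = parts  # scope is CONFIG or LOCAL here
        records[name] = _CASTS.get(rtype, str)(value)
    return records
```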

Next, I added a disk cache to use for ATS, etc/trafficserver/storage.config:

/disk/tmp 134217728

This creates a 128MB cache in /disk/tmp. I know, very small, but this is still experimental. Finally, I added a remapping rule to etc/trafficserver/remap.config:

map http://www.ogre.com/ http://www.ogre.com:8080/

After starting everything up, the entire site is now reverse proxied (or accelerated) through Apache Traffic Server! As you can see, the changes necessary to ATS are fairly small and pretty straightforward; most of the default settings 'just work'. It's a miracle.
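Conceptually, the single "map" rule in remap.config above acts like a URL prefix rewrite. The toy model below is only a simplification under that assumption: remap() is hypothetical, and the real ATS matching is more involved (for example, it ignores host case and supports more rule types). It returns None when no rule matches.

```python
"""Toy model of one remap rule as a URL prefix rewrite (simplified)."""
from typing import Optional


def remap(url: str, from_prefix: str, to_prefix: str) -> Optional[str]:
    """Rewrite url when it starts with from_prefix, else return None (no match)."""
    if url.startswith(from_prefix):
        return to_prefix + url[len(from_prefix):]
    return None


if __name__ == "__main__":
    rule = ("http://www.ogre.com/", "http://www.ogre.com:8080/")
    print(remap("http://www.ogre.com/node/42", *rule))
```

With this rule, a request for a page on www.ogre.com is forwarded to the Apache HTTPD virtual host on port 8080, while a request for any other site matches nothing.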

Hacking: 

Adobe Photoshop CS5 vs The Gimp

I've only used CS5 for a few hours, but I have to say, it beats The Gimp hands down in every way... The content-aware fill is by far the coolest thing I've seen in a long time, and the ease of doing a lot of my "common" tasks (for example, Unsharp Mask on luminosity only) is awesome. Don't get me wrong, The Gimp is great, particularly with its Scheme scriptability, but there's no question Photoshop CS5 is vastly superior. I'm still running the trial version on my Windows 7 VirtualBox installation, and so far, it rocks.

Update: Yes, there is a similar plugin for the Gimp...

Hacking: 

First "alpha" release of Apache Traffic Server

We've finally produced our first "alpha" release of Apache Traffic Server. It can be fetched from your local Apache Download Mirror. This first version, v2.0.0-alpha, should be reasonably stable, since it does not contain a ton of changes over the old Yahoo code. The 2.0.0 releases also support only Linux, but a number of 32-bit and 64-bit distros have been tested.

We're hoping to get some testing done on this release in the next week or two, so we can make the "final" 2.0.0 release. After that, the plan is to aggressively start making "developer" releases of trunk, which has some impressive improvements (like, up to 2x the performance in some cases). Apache Traffic Server is adopting the same versioning scheme as Apache HTTPD, so v2.0.x is a "stable" release, while v2.1.x is a developer release. This implies, of course, that the next stable release will be v2.2.0, and we're hoping to get that out the door sometime this summer!
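The HTTPD-style scheme boils down to "even minor number = stable series, odd minor number = developer series". A small sketch of that rule; the function names are hypothetical, and it assumes plain "major.minor.patch" versions with an optional "-suffix" such as "-alpha".

```python
"""HTTPD-style versioning: even minor = stable series, odd minor = developer."""
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    core = version.split("-", 1)[0]  # "2.0.0-alpha" -> "2.0.0"
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def release_kind(version: str) -> str:
    """Return which series a version belongs to: 'stable' or 'developer'."""
    _, minor, _ = parse_version(version)
    return "stable" if minor % 2 == 0 else "developer"


def next_stable(version: str) -> str:
    """Smallest stable x.y.0 with a minor number greater than the current one."""
    major, minor, _ = parse_version(version)
    nxt = minor + 2 if minor % 2 == 0 else minor + 1
    return f"{major}.{nxt}.0"
```

For example, 2.1.x trunk builds are in the developer series, and the next stable line after them is 2.2.0.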

People interested in Apache Traffic Server are highly encouraged to join our mailing lists (see the incubator page), or come join us on #traffic-server on freenode.net.

Hacking: 
