Traffic Server

Jenkins slaves, Java Web Start and proxies

So we (Apache Traffic Server) have our Jenkins CI system behind a proxy, naturally. This works very well. We have a few remote slaves that use the "local" Java Web Start processes, and when they fetch the .jnlp file, the destination host and port for talking to Jenkins itself are wrong. It (of course) tries to talk to the proxy host, which doesn't work! This was fairly easy to fix: in the Node configuration for the slaves, click the advanced options, and add a host:port value for "Tunnel connection through".
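The value is just the host and port the slaves should use to talk directly to Jenkins, bypassing the proxy. Hypothetically, something like this (the hostname is a placeholder, and 50000 is just a commonly used JNLP agent port, so check what your master actually listens on):

jenkins.internal.example.com:50000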


Forward proxy over HTTPS

Most clients support what we call Forward Proxying: you explicitly tell the client which server (and port) to use as a proxy. This has traditionally been done over HTTP, with the addition of support for the CONNECT method for HTTPS requests. We are now starting to see some clients supporting Forward Proxy over HTTPS, and you might wonder why? Well, a few reasons could include:

  • Even with CONNECT there can be some leakage of information: the CONNECT request includes the destination server and port, in clear text.
  • Authentication to the proxy can happen over an encrypted connection.
  • Overall, we're transitioning away from plain HTTP.

I saw this tweet from Daniel Stenberg, looking for volunteers to implement support for this in curl. I don't know if he's got any takers yet :). Firefox and Chrome are both working on this feature, with Chrome already having the basics available. Since I work on a proxy server (Apache Traffic Server), I took the opportunity to test it with the latest Chrome. Lo and behold, it simply worked right out of the box! I started Chrome with this (OSX) command:

% /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server=https://localhost:443
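For this to work, the proxy end obviously has to terminate TLS on the forward-proxy port. A minimal sketch of the Traffic Server side (the certificate file name is a placeholder): in records.config, mark the port as SSL,

CONFIG proxy.config.http.server_ports STRING 443:ssl

and give ssl_multicert.config a certificate to serve:

dest_ip=*    ssl_cert_name=proxy.pem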


Drupal 7 problems behind a proxy

I run Drupal behind an Apache Traffic Server caching proxy. In my setup, the proxy listens on port 80, and the real Apache HTTPD server listens on port 82 (which is firewalled off). In my Traffic Server remap.config, I have a rule like

map http://www.boot.org  http://www.boot.org:82

Granted, in retrospect, this is not the best of setups, but it causes serious problems with Drupal 7 that it does not cause with Drupal 6. In D7, the favicon.ico and all JS and CSS URLs in the head are generated as absolute URLs. I don't set an explicit $base_url in my Drupal settings.php (more on that later), and this causes the URLs to get the wrong base! These URLs all end up in a form like

http://www.boot.org:82/misc/favicon.ico

Yikes! This obviously fails, since port 82 is not accessible from the outside. Browsing the forums, the "solution" seems to be to set $base_url in the Drupal settings.php configuration file, e.g.

$base_url = 'http://www.boot.org';  // NO trailing slash!

This does indeed solve the problem; however, it now breaks when I want to use e.g. https://www.boot.org for admin access. Besides, why these URLs should be absolute is a mystery to me; they certainly were not in D6.

The solution I'm ending up with is of course to change Apache Traffic Server to use what we call "pristine host headers", so that the Origin server (Apache HTTPD and Drupal) sees the original client Host: header. I could not get any help from the Drupal IRC or forums, but if anyone has any insight into why D7 is doing this crazy stuff with absolute URLs, please post. In an ideal world, they really should change these to be relative, e.g. /misc/favicon.ico.
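For the record, pristine host headers is a single switch in records.config:

CONFIG proxy.config.url_remap.pristine_host_hdr INT 1

With this on, the remap rules still select the origin, but the client's original Host: header is forwarded untouched, so Drupal builds its base URL from the name the client actually used.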


Drupal, Traffic Server, HTTPS and CDNs

I use Drupal for most of my sites. It generally works well, despite all the weirdness it does (Drupal 7 does strange things behind a proxy, more on that later). One thing is, I've started using a CDN (NetDNA) for my site. With HTTPS, this generally doesn't work well, since I'm not enabling HTTPS for the CDN (at least not yet). The CDN module in Drupal generally works well, but I couldn't see an option to prevent it from using the CDN with HTTPS. This would generate those annoying mixed-content warnings, from Internet Explorer for example.

Since I'm also using an Apache Traffic Server proxy in front of Apache HTTPD, the protocol information was lost once it hit Apache, PHP and Drupal. Bummer. I browsed through the CDN module code, and noticed it does indeed honor an X-Forwarded-Proto header, which, if set to "https", will prevent the CDN from being used. I added a plugin to my remap rules, with a config like

[SEND_REQUEST_HDR]
        X-Forwarded-Proto =https=

And I activated this for the https rules in remap.config for Apache Traffic Server. With this, my Drupal site now stops using the CDN when Apache Traffic Server maps from https:// to the http://localhost URL.
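For completeness, the remap rule tying it together looks something like this; I'm assuming here that it's the header_filter plugin loading the snippet above, and the hostname and config file name are placeholders for your own:

map https://www.example.com http://localhost:82 @plugin=header_filter.so @pparam=add_xfp.conf

The [SEND_REQUEST_HDR] hook fires just before the request is forwarded to the origin, which is exactly where X-Forwarded-Proto needs to be set for Apache, PHP and Drupal to see it.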


Filtering Drupal comment spam

I get a fair amount of comment spam on my blog, and even after I changed all comments to be moderated, the spammers still persist. I decided to do something about this, and working under the assumption that most spammers are from a few countries, I implemented a Geo-location filter for Apache Traffic Server. The code is currently available at http://svn.apache.org/repos/asf/trafficserver/plugins/geoip_acl/, and only works with MaxMind's APIs (but I'd be more than happy to add support for other Geo-location APIs). The plugin also requires PCRE, but that's already a requirement for building ATS, so it shouldn't be a problem.

Once compiled and installed (see the README), setting this up is fairly straightforward. In my remap.config, I now have the following rule:

map http://www.ogre.com http://localhost:69 @plugin=geoip_acl.so @pparam=country \
       @pparam=regex::/home/server/etc/deny_spam.conf


This says to apply a country-based Geo-location filter on this rule, using the additional configuration from deny_spam.conf. That file contains one single line:

^comment/       deny    CN RU IN

This might look draconian, but for now I'm denying all comment posts from China, Russia and India. For more details on the plugin's configuration and features, again see the README in the source above.
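Each line in that file is a PCRE matched against the URL path, an action (deny, in this case), and a list of two-letter country codes. So, hypothetically, also blocking node edits from the same countries would just be one more line (this second rule is an illustration, not something I actually run):

^comment/       deny    CN RU IN
^node/.*/edit   deny    CN RU IN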

Enjoy!


Performance tuning

This section contains various configurations that can improve performance. Some might not be appropriate for your environment, but all of these options are known to have an impact on performance.

Threads

This is probably the most important configuration for your system, particularly for benchmarking. The default configuration is good for most systems, but can obviously be improved. The relevant settings are:
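These are the exec_thread knobs in records.config. With autoconfig enabled, Traffic Server scales the number of worker threads to the number of cores (multiplied by the scale factor); with it disabled, the limit is used as-is. The values below are illustrative, not recommendations:

CONFIG proxy.config.exec_thread.autoconfig INT 1
CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1.5
CONFIG proxy.config.exec_thread.limit INT 2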


May 2011 Apache Traffic Server performance

I just ran a small test against Apache Traffic Server, to see how performance has improved since last time. My test box is my desktop, an Intel Core i7-920 at 2.67GHz (no overclocking), and the client runs on a separate box, over a cheap GigE network (two switches in between). Here are the latest results:

2,306,882 fetches on 450 conns, 450 max parallel, 2.306882E+08 bytes, in 15 seconds
100 mean bytes/fetch
153,792.0 fetches/sec, 1.537920E+07 bytes/sec
msecs/connect: 5.326 mean, 13.078 max, 0.149 min
msecs/first-response: 2.094 mean, 579.752 max, 0.099 min


This is of course for very small objects (100 bytes) served out of RAM cache, with HTTP keep-alive. Still respectable: close to 154k QPS out of a very low-end, commodity box.


Apache Traffic Server recipes

Apache Traffic Server is a high-performance, customizable HTTP proxy server. This is a collection of small "recipes" showing how to accomplish various tasks. This cookbook is a work in progress; I haven't quite decided yet how I want to handle this "documentation". Perhaps it belongs in the official Apache docs, but for now I find it easier to use the tools I'm used to here on ogre.com to maintain it.


July 2010 Apache Traffic Server benchmark

I reran my benchmarks with the latest "trunk" of Apache Traffic Server, to make sure we're not regressing. I also tweaked the number of worker threads a little; a gut feeling tells me that with Hyper-Threading, our auto-scaling algorithm isn't optimal (and, it really isn't). Here are the latest numbers, running over a GigE network (two Linksys el-cheapo switches between clients and server):

3,160,237 fetches on 3,666 conns, 1,800 max parallel, 1.58012e+09 bytes in 30 seconds
500 mean bytes/fetch
105,341.10 fetches/sec, 5.26704e+07 bytes/sec
msecs/connect: 1.46781 mean, 6.674 max, 0.093666 min
msecs/first-response: 16.3333 mean, 615.34033 max, 0.121333 min


That is, 105k QPS (with keep-alive) for small objects, over the network. It's pushing 52MB/sec of payload at this speed, but remember the average object size is very small (500 bytes). My box is an Intel i7 920, quad-core.

