Not all (Fedora) kernels are created equal

Introduction


While working on a completely unrelated piece of code, I discovered what seemed impossible: FreeBSD was much faster than my 2.6 Linux system on certain system calls. The difference was quite noticeable, an order of magnitude or more. This clearly couldn't be right, so I ran some tests and wrote a blog entry about it. The problem has persisted, so I decided to expand on the topic.

Test setup


The primary test system is a Pentium 4 2.8GHz with HT, so all kernels are built with SMP support. For comparison, I've run the same tests on a few other platforms, including FreeBSD. I don't claim any of these tests to be scientifically correct in any way, but something is definitely wrong here.

Benchmarks


I ran the same code on my older system, which runs a 2.4 kernel on an old Celeron CPU. To my satisfaction, it was indeed very fast, even faster than FreeBSD. I double-checked on my PIII machine running a 2.6.6 kernel, and it was also reasonably fast.

So, why was my main Linux system so damn slow? Well, as it turns out, not all kernels are created (i.e. built) equal. I ran a number of benchmarks on my system and discovered that pretty much all my pre-built kernels (from Fedora Core 2 "rawhide") were slow, but only on a Pentium 4 system. My custom-built kernel was fast.

The numbers below are clock cycles per system call (smaller is better):

Kernel                           gettimeofday()   uname()   chdir()   open()
FreeBSD 4.x on P4                          1591     29446      8977     2583
RedHat 2.4 on Celeron                       461       592      1198      864
FC2 2.6.6 on PIII                          1076      1286      2547     1425
FC2 2.6.3 i686 8k stack on P4              8373      1681     33115    29725
FC2 2.6.7 i686 4k stack on P4              8453      8680     41695    37885
FC3 2.6.9 i686 on P4                       8720      9058     14154     9887
custom 2.6.7 8k stack on P4                 890      1031      4601     1396
custom 2.6.7 4k stack on P4                 828      1002      4295     1377
custom 2.6.9 on P4                          814       990      4514     1315
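To put the gap in perspective, here is a small sketch that divides the Fedora numbers by the custom-kernel numbers for the two 2.6.7 4k-stack rows. The cycle counts are copied from the table; the names `COLUMNS`, `FEDORA_267_4K`, `CUSTOM_267_4K` and `slowdown` are only illustrative.

```python
# Cycle counts copied from the table, in column order.
COLUMNS = ("gettimeofday()", "uname()", "chdir()", "open()")
FEDORA_267_4K = (8453, 8680, 41695, 37885)  # FC2 2.6.7 i686 4k stack on P4
CUSTOM_267_4K = (828, 1002, 4295, 1377)     # custom 2.6.7 4k stack on P4


def slowdown(slow, fast):
    """Return how many times slower each call is on the slow kernel."""
    return {name: s / f for name, s, f in zip(COLUMNS, slow, fast)}


if __name__ == "__main__":
    for name, factor in slowdown(FEDORA_267_4K, CUSTOM_267_4K).items():
        print(f"{name:<16} {factor:5.1f}x slower on the Fedora kernel")
```

For these two rows the factor is roughly 9x to 10x for gettimeofday(), uname() and chdir(), and about 27x for open().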

My custom kernel configuration is "optimized" for my particular system; the configs are available here. The simple program that I used for these benchmarks is available here. The code only works on x86 platforms.
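For readers who want a quick sanity check without the original program, the following is a rough sketch of the same idea, not the benchmark used above. It is an assumption-laden stand-in: it reports wall-clock nanoseconds per call rather than clock cycles, it includes Python interpreter overhead, `time.time()` is only an approximation of gettimeofday() (and may not enter the kernel at all on newer systems), and the open() case also has to close the descriptor. The helper names `ns_per_call`, `open_close` and `run_benchmark` are made up for illustration.

```python
import os
import time


def ns_per_call(func, iterations=20000):
    """Return the average wall-clock nanoseconds spent in one call of func."""
    func()  # call once first so one-time setup costs are not measured
    start = time.perf_counter_ns()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


def open_close():
    # The descriptor must be closed again, so this times open() plus close().
    os.close(os.open(os.curdir, os.O_RDONLY))


CALLS = {
    "time.time()": time.time,  # stand-in for gettimeofday()
    "uname()": os.uname,
    "chdir()": lambda: os.chdir(os.curdir),
    "open()+close()": open_close,
}


def run_benchmark(iterations=20000):
    """Time every entry in CALLS and return {name: ns per call}."""
    return {name: ns_per_call(func, iterations) for name, func in CALLS.items()}


if __name__ == "__main__":
    for name, ns in run_benchmark().items():
        print(f"{name:<16} {ns:10.0f} ns/call")
```

The absolute numbers are not comparable with the cycle counts in the table. Only the relative difference between two kernels on the same machine is meaningful.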


Observations


First of all, this is obviously only a problem with P4 systems, so I did a bit of research (but not a lot). Apparently P4 CPUs have a new instruction for handling system calls, SYSENTER, while older Pentium systems use the 0x80 interrupt. As far as I can tell, there's a new virtual system call layer that is supposed to handle this, called vsyscall. There's some info on this in this Kerneltrap.org article. I guess I assumed that vsyscall would handle all combinations of CPUs and builds, but maybe it doesn't?
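One way to see whether a CPU advertises SYSENTER at all is to look for the `sep` flag (SYSENTER/SYSEXIT present) in /proc/cpuinfo on an x86 Linux box. The sketch below does only that; it says nothing about which entry path a given kernel build actually uses, and the function names `cpu_flags` and `has_sysenter` are my own.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_sysenter(cpuinfo_text):
    """True if the CPU reports the 'sep' (SYSENTER/SYSEXIT) feature flag."""
    return "sep" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only; other systems will not have this file
    if os.path.exists(path):
        with open(path) as handle:
            print("SYSENTER advertised:", has_sysenter(handle.read()))
    else:
        print("no /proc/cpuinfo on this system")
```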

"Solution"


The solution for me was to recompile the "stock" Linux kernel, based on the FC3 kernel configuration file. I made quite a few changes to the config, and I also didn't apply any of the RedHat/Fedora patches to the kernel source.

The latter is the key to the solution. After trying numerous configurations and options, I narrowed it down to the set of patches Fedora applies. Dave Jones kindly confirmed this, and pointed to the exact set of patches: exec-shield. If your kernel applies the same or a similar patch, you should definitely be aware of this problem.

I haven't yet found an easy way to build the Fedora kernel with everything except the exec-shield patches, but I'm still working on that. Unfortunately the patches depend on each other a bit, so it's not as simple as just leaving out that one patch.

I've filed a bug against Fedora Core for this, see Bugzilla bug #139318.


Effects


Obviously system call performance isn't a huge factor in overall system performance, although I can see certain applications suffering more than others from this problem. To get a somewhat interesting comparison, I compiled the kernel source on a "slow" kernel and on my custom-built one. Here are the results, as printed by the time command:

"Slow" kernel:

# time gmake
real    21m10.495s
user    19m0.013s
sys     3m6.801s

"Fast" kernel:

# time gmake
real    19m17.596s
user    17m55.808s
sys     1m51.115s
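To make the two runs easier to compare, here is a small sketch that converts the time output above to seconds and computes how much less time the fast kernel needed. The values are copied from the two runs; the helper names `to_seconds` and `percent_saved` are only illustrative.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '21m10.495s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the two `time gmake` runs above.
SLOW = {"real": "21m10.495s", "user": "19m0.013s", "sys": "3m6.801s"}
FAST = {"real": "19m17.596s", "user": "17m55.808s", "sys": "1m51.115s"}


def percent_saved(slow, fast):
    """Return, per category, the percentage of time saved on the fast kernel."""
    saved = {}
    for key in slow:
        before = to_seconds(slow[key])
        after = to_seconds(fast[key])
        saved[key] = 100.0 * (before - after) / before
    return saved


if __name__ == "__main__":
    for key, pct in percent_saved(SLOW, FAST).items():
        print(f"{key:<5} {pct:5.1f}% less time on the fast kernel")
```

By this arithmetic the build spends roughly 40% less time in sys on the fast kernel, while the total real time drops by about 9%.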