I recently compiled all Python versions from v2.2 to v3.0b to see how their performance compares. I decided not to use pybench, and instead took some of the benchmarks from the [http://shootout.alioth.debian.org/gp4/|Computer Language Benchmarks Game], hoping they are somewhat more representative of real-world use. I compiled all versions of Python identically, with the same compiler (4.3.0) and the same optimization options ("-O3 -march=core2 -mtune=core2"). Each benchmark was run 20 times on every Python version, and the fastest run for each benchmark/interpreter pair was kept. I think this gives a "best case" scenario. The alternative would have been to take the median or average, but I wanted to avoid any unfairness caused by system/OS activity.
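A minimal sketch of the "run 20 times, keep the fastest" methodology, written in modern Python 3. It assumes each benchmark is a standalone script launched as a separate process per interpreter, and that wall-clock time (including interpreter startup) is what gets measured. The helper names `time_once` and `summarize` are hypothetical, and the median is returned alongside the minimum only to show the alternative mentioned above.

```python
import statistics
import subprocess
import time


def time_once(interpreter, script_args):
    """Run one benchmark process and return its wall-clock time in seconds."""
    start = time.perf_counter()
    # Discard the benchmark's output so printing doesn't dominate the timing.
    subprocess.run([interpreter, *script_args], check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def summarize(interpreter, script_args, runs=20):
    """Run the benchmark `runs` times; return (fastest, median) times.

    The fastest run is the "best case" figure; the median is the
    alternative that is less sensitive to a single lucky run.
    """
    times = [time_once(interpreter, script_args) for _ in range(runs)]
    return min(times), statistics.median(times)
```

A call such as `summarize("/opt/python-2.5/bin/python", ["nbody.py", "5000000"])` would time one benchmark on one interpreter; that path and argument list are hypothetical placeholders.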
The benchmarks had to be ported to Python3000 (v3.0b3), but the changes were mostly trivial (__print__ and __xrange__ calls), so I don't think they affect the results. My test system (a Core2 Duo box with plenty of RAM) was otherwise idle during the entire test run, which took over 6 hours to complete.

Alright, so what are the results? The most interesting figure is the relative performance index: each benchmark's result for a given version is compared against Python v2.2.3, and these per-benchmark ratios are averaged. Python v2.2.3 therefore has an index of "1.0", every benchmark carries equal weight in the total index, and a higher index is better.
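A minimal sketch of how such an index could be computed, assuming each ratio is the baseline (v2.2.3) time divided by the other version's time, so that faster versions score above 1.0, and that the ratios are combined with a plain arithmetic mean. The function name `relative_index` and the timings below are made up for illustration; they are not the measured results.

```python
from statistics import mean


def relative_index(baseline, candidate):
    """Average speed ratio of `candidate` versus `baseline`.

    Both arguments map benchmark name -> best time in seconds.
    Each benchmark contributes baseline_time / candidate_time, so every
    benchmark has equal weight and a higher index means faster.
    """
    missing = baseline.keys() - candidate.keys()
    if missing:
        raise ValueError(f"candidate lacks benchmarks: {sorted(missing)}")
    return mean(baseline[name] / candidate[name] for name in sorted(baseline))


# Hypothetical timings, for illustration only.
v223 = {"bench_a": 10.0, "bench_b": 4.0}
other = {"bench_a": 5.0, "bench_b": 4.0}

print(relative_index(v223, v223))   # baseline against itself -> 1.0
print(relative_index(v223, other))  # (2.0 + 1.0) / 2 -> 1.5
```

A geometric mean of the ratios is a common alternative, since it treats a 2x speedup and a 2x slowdown symmetrically; the arithmetic mean is used here only because "average" is the simplest reading of the text.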
I'm also including the results for each individual benchmark in the following graph (times in seconds; lower is better):
__Update__: At a friend's request, I tried compiling with "-Os" instead of "-O3", and, not surprisingly, compiling for size does not pay off on my Core2 box. This is in line with the results of the Firefox tests I ran earlier. Again, the 4MB L2 cache probably negates any benefit from compiling for size.
I won't speculate about what might have happened after v2.4.x, but it's good to see that Python3k is showing very promising results.