Python

Python performance through time

I recently compiled every Python version from v2.2 to 3.0b to see how their performance compares. I decided not to use pybench, and instead took some of the benchmarks from the Computer Language Benchmarks Game (hoping they are slightly closer to "real use"). I compiled all versions of Python identically, using the same compiler (4.3.0) and the same optimization options ("-O3 -march=core2 -mtune=core2"). All benchmarks were run 20 times for every Python version, and the fastest run for each benchmark and interpreter was picked. This obviously gives a "best case" scenario (I think). The alternative would be to use a median or average, but I wanted to avoid any unfairness due to system/OS activity.
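The "run 20 times, keep the fastest" rule can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not the harness I actually used: it times a function in-process rather than launching separate interpreters, and `best_of` and `sample_benchmark` are made-up names for the example.

```python
import timeit


def best_of(func, runs=20):
    """Return the fastest wall-clock time (in seconds) over `runs` executions."""
    # timeit.repeat returns one timing per repetition; number=1 means each
    # timing covers exactly one call of func.
    times = timeit.repeat(func, repeat=runs, number=1)
    # Taking the minimum discards runs that were slowed down by other
    # system/OS activity, which is the "best case" choice described above.
    return min(times)


def sample_benchmark():
    # Hypothetical stand-in workload, NOT one of the Benchmarks Game programs.
    return sum(i * i for i in range(10000))


fastest = best_of(sample_benchmark)
print("fastest of 20 runs: %.6f s" % fastest)
```

Swapping `min` for a median would give the more "typical case" number, at the cost of letting background noise leak into the result.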

The benchmarks had to be ported to support Python3000 (v3.0b3), but these changes were mostly trivial (print and xrange), so I don't think they should affect the results. My test system (a Core2 Duo box with plenty of RAM) was "unused" during the entire test run (which took over 6 hours to complete). Alright, so what are the results? The most interesting data is the relative performance index. This is the average of each test's result relative to Python v2.2.3, which therefore has an index of "1.0". This also means that each test has equal weight in the total index calculation (a higher index is better).


py-performance-index.png
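For the curious, the index calculation can be sketched like this. The exact formula here is an assumption on my part for illustration (each benchmark's baseline time divided by the version's time, then a plain average), and the benchmark names and timings are invented, not the measured results.

```python
def performance_index(times, baseline):
    """Average per-benchmark speedup of `times` relative to `baseline`.

    Both arguments map a benchmark name to its best time in seconds.
    A ratio above 1.0 means the version is faster than the baseline.
    """
    ratios = [baseline[name] / times[name] for name in baseline]
    # A plain mean gives every benchmark the same weight in the index.
    return sum(ratios) / len(ratios)


# Hypothetical timings in seconds, NOT the data behind the graph.
baseline = {"bench_a": 10.0, "bench_b": 20.0}
candidate = {"bench_a": 5.0, "bench_b": 20.0}

print(performance_index(baseline, baseline))   # 1.0 (baseline against itself)
print(performance_index(candidate, baseline))  # 1.5 (mean of 2.0 and 1.0)
```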



I'm also including the results for each individual benchmark, in the following graph (times in seconds, lower is better):


py-performance-bench.png

Update: On request from a friend, I tried compiling with "-Os" instead of "-O3", and not surprisingly, compiling for size is not advantageous on my Core2 box. This is in line with the results from the Firefox tests I did before. Again, the 4MB L2 cache probably negates any benefits from compiling for size.

I'm not going to make any comments about what might have happened after v2.4.x, but it's good to see that Python3k is getting very promising results.

Yahoo Search Web Services

Well, it's finally official: Yahoo released the search APIs to the public. The Developer Network has all sorts of documentation, and downloads of example code and APIs. Obviously, you should use Python. Swaroop has already begun hacking, writing a Qt image search tool thingie. The Python APIs are hosted on SourceForge, at http://pysearch.sf.net/ .

I'm about 50% done with my next search project, a Gimp plugin that lets me search for images, preview the thumbnails, and then import the selected image(s) as layers into the Gimp. I'll post with screenshots once it's looking less gimpy.

Python httplib.py problems with Akamai

I spent a long night debugging a problem in my current project (image/content spam prevention): I had trouble retrieving URLs that resolved into the Akamai distributed proxy mesh. And yes, I did file this as a SourceForge bug.

I don't know who's at fault here, but httplib.py (erroneously) interprets the HTTP response from an Akamai server as meaning the socket will be closed once the request is finished. That does not happen, because Akamai returns HTTP/1.0 headers with Connection: keep-alive. The following diff to httplib.py (Python 2.3.2) does solve the problem, although I'm not sure if it's the right solution:

--- /usr/lib/python2.3/httplib.py    2003-10-06 09:11:52.000000000 -0700
+++ httplib.py    2004-01-11 03:10:18.000000000 -0800
@@ -355,6 +355,12 @@
         # An HTTP/1.0 response with a Connection header is probably
         # the result of a confused proxy.  Ignore it.
 
+        # Akamai returns HTTP 1.0 headers, with connection: keep-alive, so
+        # the socket will not close.
+        conn = self.msg.getheader('connection')
+        if conn and conn.lower().find("keep-alive") >= 0:
+            return False
+
         # For older HTTP, Keep-Alive indiciates persistent connection.
         if self.msg.getheader('keep-alive'):
             return False

As an alternative, you can subclass the HTTPResponse class, to override the _check_close() method:

import httplib

class HTTPResponse(httplib.HTTPResponse):
    def _check_close(self):
        if self.version == 11:
            # An HTTP/1.1 proxy is assumed to stay open unless
            # explicitly closed.
            conn = self.msg.getheader('connection')
            if conn and conn.lower().find("close") >= 0:
                return True
            return False

        # Akamai returns HTTP 1.0 headers, with connection: keep-alive, so
        # the socket will not close.
        conn = self.msg.getheader('connection')
        if conn and conn.lower().find("keep-alive") >= 0:
            return False

        # For older HTTP, Keep-Alive indicates persistent connection.
        if self.msg.getheader('keep-alive'):
            return False

        # Proxy-Connection is a netscape hack.
        pconn = self.msg.getheader('proxy-connection')
        if pconn and pconn.lower().find("keep-alive") >= 0:
            return False

        # otherwise, assume it will close
        return True

# Make every HTTPConnection use the patched response class.
httplib.HTTPConnection.response_class = HTTPResponse
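To see what the patched logic decides for a few header combinations without touching a socket, here is a standalone sketch of the same decision tree. It is written for modern Python 3 and works on a plain dict; `will_close` is a made-up helper for illustration, not part of httplib, and it assumes header names are already lower-case.

```python
def will_close(version, headers):
    """Return True if the connection is expected to close after the response.

    version: 10 for HTTP/1.0, 11 for HTTP/1.1 (the way httplib stores it).
    headers: dict mapping lower-case header names to values (a simplification).
    """
    conn = headers.get("connection", "").lower()

    if version == 11:
        # HTTP/1.1 stays open unless the server explicitly says "close".
        return "close" in conn

    # The Akamai case: HTTP/1.0 headers with "Connection: keep-alive".
    if "keep-alive" in conn:
        return False

    # For older HTTP, a Keep-Alive header indicates a persistent connection.
    if headers.get("keep-alive"):
        return False

    # Proxy-Connection is a Netscape hack.
    if "keep-alive" in headers.get("proxy-connection", "").lower():
        return False

    # Otherwise, assume it will close.
    return True


print(will_close(10, {"connection": "keep-alive"}))  # False
print(will_close(10, {}))                            # True
print(will_close(11, {"connection": "close"}))       # True
print(will_close(11, {}))                            # False
```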

New Python features, and super()

I'm a long-time Python user (since 1994) and fan, and lately I've been trying to catch up on some of the many changes the language has been going through. I must say, pretty much everything they have changed or added since the old 1.x days is awesome! As if Python wasn't a great language already, it's shaping up to be a very strong contender indeed.

If you haven't looked at Python lately, take a quick look at the following "summaries" of changes that have been made in recent years:

Anyways, last week I was playing with the super() function, to clean up some old code which used the old-style calling convention for accessing superclass members. This new built-in function was added in Python 2.2, to better support the new multiple inheritance lookup rules (see above). The following example shows the two different styles (but read the docs above for a more in-depth analysis of why using super() is useful):

class Foo(BaseFoo):
    def __init__(self, arg1, arg2):
        BaseFoo.__init__(self, arg1, arg2)
        ...

class Bar(BaseBar):
    def __init__(self, arg1, arg2):
        super(Bar, self).__init__(arg1, arg2)
        ...
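As a quick illustration of why this matters with multiple inheritance, here is a small sketch of a "diamond" hierarchy. The class names are made up for the example, and it is written so it also runs on modern Python 3. With super(), each __init__ in the lookup order runs exactly once; calling the base classes explicitly from Left and Right would initialize Base twice.

```python
class Base(object):
    def __init__(self):
        # Record the order in which the initializers finish.
        self.calls = ["Base"]


class Left(Base):
    def __init__(self):
        super(Left, self).__init__()
        self.calls.append("Left")


class Right(Base):
    def __init__(self):
        super(Right, self).__init__()
        self.calls.append("Right")


class Child(Left, Right):
    def __init__(self):
        # super() follows the lookup order Child -> Left -> Right -> Base,
        # so Left's super() call lands in Right, not directly in Base.
        super(Child, self).__init__()
        self.calls.append("Child")


print(Child().calls)  # ['Base', 'Right', 'Left', 'Child']
```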

This worked most of the time in my code, except when I tried to sub-class some old-style classes, for instance:

from HTMLParser import HTMLParser, HTMLParseError

class HTMLImageParser(HTMLParser):
    def __init__(self, callback=None):
        super(HTMLImageParser, self).__init__()
        ...

This would generate an error like:

TypeError: super() argument 1 must be type, not classobj

The simple solution was to change my sub-class to also be a sub-class of the object class, making it a new-style class:

class HTMLImageParser(HTMLParser, object):
    def __init__(self, callback=None):
        super(HTMLImageParser, self).__init__()
        ...