Python's GIL is EVIL

Lately I've been doing some Python multi-threading to make the best use of some of our amazing server resources. As I was pondering the reasons why one of our 8-core servers reported 83% idle despite 8 threads banging away, I re-discovered the Global Interpreter Lock.

BLECH!

The GIL enforces Python's requirement that only a single bytecode operation is executed at a time. My nicely coded multi-threaded app was only being executed serially!! Sadly, this seems unlikely to change, even in Python 3000. Last year Guido said:

"Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions."

I was brought up to believe that threading was dirty and independent communicating processes were the way to go. But even I realize that this just isn't practical in these days of GUIs, multi-core processors, and application servers.

Why does the Python community accept the GIL? Is it because most people only use Python as a scripting language? Are there simple workarounds (e.g. not forking, shared memory, or the like) that I'm missing?
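The serial behaviour described above is easy to reproduce. The following is only a minimal sketch, assuming a standard CPython build with the GIL enabled; the function names and the workload size are made up for illustration. It times the same pure-Python loop run four times in a row and then once in each of four threads. On such a build the two timings are typically close, no matter how many cores the machine has.

```python
import threading
import time


def count_down(n):
    """CPU-bound pure-Python loop; it needs the GIL for every bytecode."""
    while n > 0:
        n -= 1


def timed_serial(n, repeats):
    """Run count_down `repeats` times, one after another."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def timed_threaded(n, num_threads):
    """Run count_down once in each of `num_threads` threads."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 2_000_000  # arbitrary workload size, chosen only for illustration
    print("serial:   %.2f s" % timed_serial(N, 4))
    print("threaded: %.2f s" % timed_threaded(N, 4))
```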

GIL and performance issues

Python has a GIL as opposed to fine-grained locking for several reasons:

--- It is faster in the single-threaded case.

--- It is faster in the multi-threaded case for i/o bound programs.

--- It is faster in the multi-threaded case for cpu bound programs that do their compute-intensive work in C libraries.

--- It makes C extensions easier to write: there will be no switch of Python threads except where you allow it to happen (i.e. between the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros).

--- It makes wrapping C libraries easier. You don't have to worry about thread-safety. If the library is not thread-safe, you simply keep the GIL locked while you call it.

The GIL can be released by C extensions. Python's standard library releases the GIL around each blocking i/o call. Thus the GIL has no real consequence for the performance of i/o bound servers. You can create networking servers in Python using processes (fork), threads or asynchronous i/o, and the GIL will not get in your way.
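A small sketch of the i/o bound case, with time.sleep standing in for a blocking call (like real blocking i/o in the standard library, it releases the GIL while waiting). The helper names are hypothetical. Four threads that each block for 0.2 seconds should finish in roughly 0.2 seconds overall rather than 0.8.

```python
import threading
import time


def fake_io(seconds):
    """Stand-in for a blocking i/o call; time.sleep releases the GIL."""
    time.sleep(seconds)


def run_concurrently(num_threads, seconds):
    """Start `num_threads` blocking calls at once and time the whole batch."""
    threads = [threading.Thread(target=fake_io, args=(seconds,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_concurrently(4, 0.2)
    print("4 blocking calls took %.2f s in total" % elapsed)
```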

Numerical libraries in C or Fortran can similarly be called with the GIL released. While your C extension is waiting for an FFT to complete, the interpreter will be executing other Python threads. A GIL is thus easier and faster than fine-grained locking in this case as well. Calls like these constitute the bulk of numerical work. The NumPy extension releases the GIL whenever possible.

Threads are usually a bad way to write most server programs. If the load is low, forking is easier. If the load is high, asynchronous i/o and event-driven programming (e.g. using Python's Twisted framework) is better. The only excuse for using threads is the lack of os.fork on Windows.

The GIL is a problem if, and only if, you are doing CPU-intensive work in pure Python. Here you can get a cleaner design using processes and message-passing (e.g. mpi4py). There is also a 'processing' module in the Python Cheese Shop that gives processes the same interface as threads (i.e. replace threading.Thread with processing.Process).
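The 'processing' package later entered the standard library as multiprocessing (PEP 371), so a sketch of the swap can use that name. This is only an illustration under that assumption: count_primes and worker are hypothetical names, and the limits are arbitrary. Each child process has its own interpreter and its own GIL, and results come back as messages on a queue.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound pure-Python work: count primes below `limit` by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def worker(limit, results):
    """Runs in a child process; sends (limit, answer) back as a message."""
    results.put((limit, count_primes(limit)))


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]  # arbitrary example workloads
    results = multiprocessing.Queue()
    # Same shape as threading.Thread(target=..., args=...), but one process each.
    procs = [multiprocessing.Process(target=worker, args=(limit, results))
             for limit in limits]
    for p in procs:
        p.start()
    # Collect one message per child before joining.
    answers = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    for limit in limits:
        print(limit, answers[limit])
```

The `if __name__ == "__main__":` guard matters on platforms without fork, where each child re-imports the main module.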

Threads can be used to maintain responsiveness of a GUI regardless of the GIL. If the GIL impairs your performance (cf. the discussion above), you can let your thread spawn a process and wait for it to finish.
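One way to read "let your thread spawn a process and wait for it" is sketched below. It is an assumption about what the commenter had in mind, not a reference implementation; the function names are invented and the child simply runs a one-line computation in a second interpreter. While the background thread waits on the child it is blocked in i/o, so the main thread stays free to service an event loop.

```python
import subprocess
import sys
import threading


def heavy_work_in_child(n, results):
    """Runs in a background thread: start a child interpreter and wait for it."""
    code = "print(sum(i * i for i in range(%d)))" % n
    completed = subprocess.run([sys.executable, "-c", code],
                               capture_output=True, text=True, check=True)
    results.append(int(completed.stdout))


def start_background_job(n):
    """Start the job and return the thread plus the list it will fill in."""
    results = []
    thread = threading.Thread(target=heavy_work_in_child, args=(n, results))
    thread.start()
    return thread, results


if __name__ == "__main__":
    thread, results = start_background_job(1_000_000)
    # A real GUI would keep running its event loop here instead of printing.
    while thread.is_alive():
        print("main thread still responsive...")
        thread.join(timeout=0.05)
    print("result:", results[0])
```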

pyProcessing

pyProcessing will be included in Python 2.6 and 3.0:
http://www.python.org/dev/peps/pep-0371/

http://pyprocessing.berlios.de/

I really stumbled upon your blog and the pep by accident while looking for other things.

Use Erlang!

I think you could consider Erlang and you will probably love it. It has many powerful and elegant list operations just like Python, but it is designed to be totally parallel. No threads, but inter-process messaging. With Erlang it's easy to use all your cores/processors fully.

Consider Scala

You might consider the Scala programming language: I think it offers the best of Python but running on the JVM. It's very nearly as fast as Java (and thus much faster than Python), and sometimes faster. It's more OO and more functional than Java, and I think both Python and Java programmers will be very happy with it. And while Scala is a statically typed language, it's much more pleasant than Java, thanks to type inference and a few other nice features.

And it has scripting capabilities and an interpreter shell, like Python (and unlike Java).

http://www.scala-lang.org/

It's definitely not because

It's definitely not because everyone's using python just for quick scripts, or whatever.

I spent a bit of time poking around the GIL issue a few jobs ago, on a Zope system, and came to the conclusion that there are enough alternate approaches for most problems that the advantages of Python generally make it worthwhile. Having said that, we were dependent on the ZODB, Zope's object database, to get multiple things done at once, and it really couldn't manage.

a pity

I think we accept the GIL because we love python so much that by the time we encounter it, we can't give it up. I discovered the dreaded GIL during a class project with genetic algorithms. I had four processors to work with and was hoping for a 4x speed-up, by evaluating four "genes" at a time. It was depressing.

There is Stackless...

There is/was something called Stackless Python which did, I believe, do away with the GIL (it also introduced green threads and continuations, though not necessarily all at the same time).

I'm not sure why they accept the GIL. It could be that they've found that any cases they have where it's really a problem can be dealt with using multiple processes, and ditching it would add significant complexity to the runtime (concurrent garbage collection and having a lock on every object to protect its reference count or a global refcount lock to use when adjusting reference counts are two of the things that come to mind).

But yeah, the present state... I don't know. It does seem that they're somewhat crippling themselves given that increasing parallelism is the direction of increased performance for the foreseeable future.

I don't believe that

I don't believe that Stackless did away with the GIL. If it had, we would all be using Stackless.

As for easy concurrency, look at the processing package. It offers an interface similar to the threading module in Python, only it uses processes. There are queues and other shared data structures that allow each process to easily pass data around (including sockets on Windows and Linux).

Regarding why Python programmers "accept" the GIL, it's not about "acceptance", it's about "it's really hard to get rid of a single coarse-grained lock and replace it with a fine-grained lock". And doing so doesn't necessarily guarantee an increase in performance.

Although not a definitive

Although not a definitive answer by any means, Stackless certainly seems to keep all my cores working hard when it needs to.

Threads seem to be Evil

Of course, I meant that Jython and IronPython do not have a GIL.

Threads seem to be Evil

Hi,

From what I read, a lot of people believe that Threads are Evil. Guido van Rossum says that one should try to avoid threads as much as possible, because they make programs complex and they lead to deadlocks and race conditions. Some programs have been in production for over 4 years without any problem and suddenly they deadlock! It is incredibly difficult, even for simple multi-threaded programs, to think of every possible scenario. For more info, read this article:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

Apparently the Stackless Python implementation got rid of the GIL, but the overhead was 100%! In other words, programs ran twice as fast on the regular C Python interpreter as on Stackless Python (on a single CPU). This means that you only got some performance benefit out of Stackless Python in heavily multithreaded programs running on machines with 3 or more CPUs. Thus this project seems to have been abandoned. Guido van Rossum says that it is not worth the effort, and he definitely will not even try, but he still encourages anyone who has (a lot of) spare time to write a GIL-free Python interpreter.

In the meantime, if you really want to take advantage of threads on multiple-CPU machines, you may use Jython (based on Java) or IronPython (based on .NET), which both rely on real threads.

Hope this helps