How to configure Django/mod_wsgi to avoid a frozen Apache because of the Python GIL

Did you know that a single long-running request can block the whole Apache process that's hosting it?

We ran into this problem today, and it took us quite a while to figure out. Here is the story.

Trunk.ly allows our users to add a new link directly via the web interface. Behind the scenes, the Python process hosted by mod_wsgi "within" Apache needs to resolve the DNS name, expand any URL shortener wrapped around the link, download the actual HTML and submit it to our backend search server. We noticed that if someone added a URL that takes a long time for our crawler to download, the whole Apache server froze. Further user requests were blocked and eventually the front-end nginx started throwing 500 Timeout errors. (We use nginx to serve static files and pass dynamic requests to Apache as explained here. Note that this tutorial has the same flaw, as we'll see soon.)

So one request can bring the whole server down. Bad.  

Our settings are: 

  • MPM Worker for Apache: Apache runs a group of a few processes, and each process has a number of threads. Each of these threads takes a request and processes it. (The internals of Apache's scheduling/worker system and some nice diagrams can be found here.)
  • mod_wsgi works in daemon mode: This means the Python code runs in its own processes, separate from Apache's, and mod_wsgi acts as a bridge between Apache and those Python processes.
  • mod_wsgi has 1 process and 25 threads, as specified in mod_wsgi's IntegrationWithDjango: WSGIDaemonProcess site-1 user=user-1 group=user-1 threads=25

The above settings basically say: create a single process with 25 threads in it and let each thread deal with incoming HTTP requests. At the beginning I didn't think the whole Apache process was frozen because of the Python GIL. Theoretically the GIL should be released whenever a thread waits on I/O, but somehow this didn't happen in the setup described above. So a natural solution is to assign each request to a process instead of a thread. Here are the new mod_wsgi settings:

WSGIDaemonProcess site-1 user=user-1 group=user-1 processes=9 threads=1

The above setting tells mod_wsgi to spawn 9 processes with 1 thread each. This is basically the same thing Python's multiprocessing does to replace threading. After restarting Apache, the frozen process problem disappeared.
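To check which process and thread actually serve each request under a given WSGIDaemonProcess setting, a tiny diagnostic WSGI app can help. This is only a sketch and not part of our site's code. It assumes mod_wsgi exposes the daemon process group name under the mod_wsgi.process_group key in the WSGI environ, and it falls back to a placeholder on other servers.

```python
import os
import threading


def application(environ, start_response):
    """Minimal WSGI app reporting which process and thread served the request."""
    lines = [
        "pid: %d" % os.getpid(),
        "thread: %s" % threading.current_thread().name,
        # Assumption: mod_wsgi sets this key in daemon mode; the fallback
        # covers other servers and direct calls.
        "process group: %s" % environ.get("mod_wsgi.process_group", "(not set)"),
    ]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

With processes=9 threads=1, repeated requests should report up to nine different pids. With a single process and threads=25, the pid should stay the same while the thread name varies.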

Notes

  1. One thing I still haven't figured out is why the GIL isn't released. If you have any hints, please leave a comment.
  2. Our backend crawler and processing pipeline have been running on gevent for many months, so gunicorn looks like a very natural next step for us. I'd love to hear about your experience if you have moved from mod_wsgi to gunicorn.
  3. Here are a few more links trunk.ly users have shared about the GIL: Python Global Interpreter Lock
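
As a sanity check on the first note, here is a minimal sketch of what the theory predicts in plain CPython, outside Apache and mod_wsgi. time.sleep stands in for a slow download, and the names blocking_task and run_in_threads are made up for this illustration. It doesn't explain the freeze we saw; it only shows that blocking waits in separate threads normally overlap.

```python
import threading
import time


def blocking_task(delay):
    # time.sleep stands in for a slow network call; like a blocking
    # socket read, it releases the GIL while it waits.
    time.sleep(delay)


def run_in_threads(n_threads, delay):
    """Run n_threads blocking tasks concurrently; return elapsed seconds."""
    threads = [threading.Thread(target=blocking_task, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_in_threads(n_threads=4, delay=0.5)
    # If the waits overlap, the total is close to 0.5s rather than 2s.
    print("4 threads, 0.5s each: %.2fs total" % elapsed)
```

If the total comes out close to a single delay, the threads are not blocking each other, which is what makes the behaviour under mod_wsgi so puzzling.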

 

About

A Programming Artist believes in Minimalism. CTO of http://trunk.ly/. Proud owner of vim, zsh, and wikiReader. A man without a mobile phone.

http://alexdong.com/
http://twitter.com/alexdong/
http://trunk.ly/alexdong/
