Case in point: I was talking to my students in Unix Systems Programming about multi-threaded producer-consumer systems and how to use queues to coordinate them. While going through some examples, I noticed that I had made a really bad call some time ago when I integrated an SQLite database with a multi-threaded web application.
Some background? I have a web application written in CherryPy, a very nice but also very multi-threaded Python framework. I decided to use SQLite as the database for my application because I didn't want to deal with the complexities of setting up MySQL or something similar. You may say "That's your mistake right there!" but hey, it's what I did and I don't want to change databases right now. (I also don't want to switch the application to some ORM at this point, but of course I should probably have used SQLAlchemy from the beginning.)
In case you don't know: SQLite doesn't like multiple threads to begin with, as it uses one global lock for the whole database. Also, the Python interface to SQLite doesn't like multiple threads: you can't share objects created through the interface among multiple threads. So I had to do two things:
- Get each CherryPy thread its own database connection (the only way to generate more SQLite objects).
- Handle the (inevitable) case that two threads want to access the database concurrently.
The first was easy to solve: I maintain a dictionary of database connections indexed by thread. When a thread wants to execute a query, I open a connection for it if it doesn't have one already. The only "problem" here was that I had to close and re-open connections once an exception occurred, but this wasn't too hard.
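A minimal sketch of what that per-thread registry might look like (the class and method names here are mine, not from the actual app):

```python
import sqlite3
import threading

class ConnectionRegistry:
    """One SQLite connection per thread, created lazily (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path
        self.connections = {}          # thread id -> connection
        self.lock = threading.Lock()   # protects the dictionary itself

    def get(self):
        """Return this thread's connection, opening one if needed."""
        tid = threading.get_ident()
        with self.lock:
            conn = self.connections.get(tid)
            if conn is None:
                conn = sqlite3.connect(self.path)
                self.connections[tid] = conn
            return conn

    def reset(self):
        """Close and drop this thread's connection, e.g. after an exception;
        the next get() opens a fresh one."""
        tid = threading.get_ident()
        with self.lock:
            conn = self.connections.pop(tid, None)
        if conn is not None:
            conn.close()
```

Within one thread, `get()` always hands back the same connection; `reset()` implements the close-and-re-open step after an exception.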
The second gave me more trouble: The Python interface to SQLite responds to concurrent accesses by throwing an exception. So if some transaction is in progress and another thread tries to start one, that thread fails. Obviously that's not acceptable, so I had to somehow handle the exception and retry the "failed" transaction. For some reason I got inspired by the Ethernet protocol and its collision handling by exponential backoff. After lots of performance experiments I added a pinch of randomness and capped the maximum timeout at two seconds. Yes, it may seem like a dumb idea in retrospect, but of course it didn't seem all that dumb at the time: I didn't have much experience with Python threads, I needed to get the application done, and all this actually worked. Amazing. :-D
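The backoff loop was roughly in this spirit (a sketch, not the original code; the function name and the 10 ms starting delay are my inventions, only the two-second cap comes from the text above):

```python
import random
import sqlite3
import time

MAX_BACKOFF = 2.0  # the two-second cap found by experiment

def execute_with_backoff(conn, sql, params=()):
    """Retry a statement on 'database is locked' errors, doubling the
    wait each time and adding random jitter, Ethernet-style."""
    delay = 0.01  # assumed starting delay
    while True:
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as e:
            if "locked" not in str(e):
                raise  # not a collision, let the caller see it
            time.sleep(delay + random.uniform(0, delay))
            delay = min(delay * 2, MAX_BACKOFF)
```

When there is no contention the statement just runs once; only a "database is locked" error triggers the sleep-and-retry path.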
Back to my lecture epiphany: General producer-consumer systems assume n producers and m consumers, but what do we get for m=1? A beautiful special case that solves my problem:
- If only one thread talks to the database, I only ever need a single database connection.
- If only one thread talks to the database, all transactions will be completely serialized and there will never be an exception due to concurrent access.
Perfect! Of course this seemed too good to be true, so I didn't really believe I had seen all of the issues yet. Yesterday I finally had the time to re-implement the concurrency handling using a producer-consumer model. And guess what? Learning all I had to about threads was a breeze, the code is less complicated than the previous version, and the whole thing performs better too.
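The single-consumer model can be sketched like this, assuming requests are tuples on a queue and callers block on an event until the worker is done (all names are mine; error handling is deliberately left out here):

```python
import queue
import sqlite3
import threading

class DBWorker:
    """The one consumer thread owning the one database connection (sketch)."""

    def __init__(self, path):
        self.requests = queue.Queue()
        threading.Thread(target=self._run, args=(path,), daemon=True).start()

    def _run(self, path):
        conn = sqlite3.connect(path)  # the single connection, worker-only
        while True:
            sql, params, done, holder = self.requests.get()
            holder["result"] = conn.execute(sql, params).fetchall()
            conn.commit()
            done.set()  # wake up the waiting producer

    def execute(self, sql, params=()):
        """Called from any producer thread; blocks until the worker is done."""
        done = threading.Event()
        holder = {}
        self.requests.put((sql, params, done, holder))
        done.wait()
        return holder["result"]
```

Since only `_run()` ever touches the connection, transactions are serialized by the queue and the concurrent-access exception can't happen.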
Lessons? First, lecturers are people too, so we make bad decisions all the time. Second, think carefully about concurrency issues before you start hacking your next web application. Third, don't be afraid to re-factor an essential part of your application. Fourth, don't get too attached to cute ideas: once you have a better, albeit more bland, approach, throw out the cute one. And finally, learn what I don't seem to be able to: how to write concise blog posts. :-D
Update 2009/11/10: Three things to point out: First, I am still not using the new interface in production, but that's mostly because I changed a lot of other features in my app and I don't want to release too many at once.
Second, I had three processes running before: the web app and two "helpers" that would work on the database every now and then. This worked because each process used the same database interface, which would retry transactions if they failed. However, the new database interface doesn't retry, so I can't have multiple processes running on the database: I had to rewrite my helpers as threads of the main web application. That actually worked out well, and I now have more control over them since I get configuration and logging support for free from the web framework.
Third, in all my excitement about not having to catch database exceptions and retry transactions, I forgot that there are exceptions that I do want to let the caller know about, for example if an integrity constraint is violated. Since the exception now happens in the worker thread but I have to tell the calling thread about it, a brief moment of hilarity followed. I was actually thinking I had finally found The Problem with the new worker thread approach. Alas, Python to the rescue. :-D I simply catch the exception in the worker thread and stuff it into the request object before waking up the calling thread. In the calling thread I check the exception entry in the request before I check the result entry, and if I find an exception, I (re-)raise it in the calling thread. And that works! :-D
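Sketched as a hypothetical worker class (again, all names are mine), that exception hand-off looks like this:

```python
import queue
import sqlite3
import threading

class DBWorker:
    """Single-consumer worker that forwards exceptions to callers (sketch)."""

    def __init__(self, path):
        self.requests = queue.Queue()
        threading.Thread(target=self._run, args=(path,), daemon=True).start()

    def _run(self, path):
        conn = sqlite3.connect(path)
        while True:
            sql, params, done, holder = self.requests.get()
            try:
                holder["result"] = conn.execute(sql, params).fetchall()
                conn.commit()
            except Exception as e:       # e.g. an integrity violation
                conn.rollback()
                holder["exception"] = e  # stuff it into the request
            done.set()                   # wake up the caller either way

    def execute(self, sql, params=()):
        done = threading.Event()
        holder = {}
        self.requests.put((sql, params, done, holder))
        done.wait()
        if "exception" in holder:   # check the exception entry first
            raise holder["exception"]  # (re-)raise in the calling thread
        return holder["result"]
```

From the caller's perspective nothing changed: bad statements still raise, they just take a detour through the worker thread first.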
So far I am still very happy with the new approach. I will deploy it sometime next week for general use, and if something goes wrong I'll update the post again. I hope nothing bad happens, of course; this has cost a lot of time already...
Update 2010/05/22: I finally put the new database interface into production two weeks ago. It works great, the performance of my web application is through the roof, and everybody using it seems just as happy as I am. Small successes... :-D