Wednesday, June 9, 2010
Today was all about hacking. First Walter went over the basic commands for git, then we started working on his Abacus activity. Walter suggested adding a new kind of Abacus, but I saw a piece of code that was repeated like 10 times in a row and decided to replace that with a loop instead. Yeah, I don't follow instructions well... :-D But hey, now I've left my (very small) stamp on a piece of free software that others actually work on as well, not too shabby!
The lucky thing about the Abacus was that it runs fine outside of Sugar. I like Sugar in principle, but I don't want to hack inside its unusual confines: I want my actual Unix system! The activity we picked for the homework tonight, Measure, sadly doesn't run outside of Sugar. I played with Physics a bit too, but couldn't get that to work well outside of Sugar either. Also there are a few Physics forks, some of which work better in Gnome than others. Sadly mainline doesn't work well at all. :-( So I am not too sure what to work on tonight, but maybe I'll just keep working on Abacus instead. :-D
I had a great experience with IRC today: I actually got help from a complete stranger! We have a "side task" not related to code, and I picked translating some leftover strings from the Measure activity to German. But the registration for translate.sugarlabs.org was borked for a while this afternoon (has been fixed since). Luckily one of the admins was around and approved my account without the confirmation email that never got to my gmail.com inbox. Of course now that they fixed stuff, I finally got the email. :-D
I am looking forward to having dinner again tonight with the other three people who are stranded in Worcester hotels: Kristina, Mihaela, and Peter (yes, another one!). Last night we had some delicious Mexican food, and tonight I believe we're all going to "smell like meat" (to quote Mihaela) when we roll around in Brazilian BBQ. :-D
Update: Actually, we went to a place called Brew City and they had Bavarian beer: Ayinger! I don't usually drink that, but I had to have two bottles even at $8.50 a pop (wow). Very good stuff, and the food was decent too!
Monday, June 7, 2010
POSSE Worcester, Day 1
Just a quick note that we got started with POSSE today. Yay! We all introduced ourselves and explained why we're here, then Mel gave an overview of FOSS development and Walter introduced Sugar, the project we'll be working on for the rest of the week.
The exercises came next, so I now have even more accounts to keep track of: bugs.sugarlabs.org, git.sugarlabs.org, wiki.sugarlabs.org, admin.fedoraproject.org, and bugzilla.redhat.com. I even have a user page on the Sugarlabs wiki, courtesy of Karl. :-D
Lunch at Worcester State College was amazing, possibly the best chili I've had on any campus anywhere. At least outside of Texas, anyway!
On a more serious note, Walter got me thinking about the Python course I am designing for next Fall. Seems that Sugar may actually be a very good environment to start people out in. It seems to offer a path that gradually introduces more Python and de-emphasizes Sugar, at least if I understand it correctly. Not sure if that should be called "Lighter" then? :-D I'll play with Sugar much more this week, so I'll certainly find out whether it's a good fit by Friday.
Thursday, May 27, 2010
Python ORMs
There is really no point to this post except to remind me of all the Python ORMs I promised myself I'd look at over the summer. I want to convert my web application from raw SQL(ite) to some ORM, but there are way too many of them. I could roll dice, but that doesn't seem appropriate somehow... :-D
Autumn
Axiom
DejaVu
Elixir
Membrane
Storm
SQLAlchemy
SQLObject
XRecord
And then there are these Gadfly and buzhug and SnakeSQL things, not ORMs of course, but interesting anyway...
Update 2010/06/24: Alright, I've looked at a few of these in more detail now. Seems that Autumn has not been updated in a while, so it's off the list. Also, Elixir being a declarative layer over SQLAlchemy seems a little strange now that SQLAlchemy has its own declarative layer, so it's off the list too.
I implemented a basic model layer for my web application using both SQLObject and Storm, so those two I actually sort of grok now. The first major difference is that Storm requires writing some raw SQL to create tables and related schema stuff, whereas SQLObject tries to hide SQL even for those tasks. The second major difference is that Storm separates object creation from persistence whereas SQLObject combines the two to some extent; this can be good or bad depending on what your application needs to do. Also, coming from a "raw SQL" background, I find that both Storm and SQLObject have some "issues" when it comes to formulating complex queries. Nothing much to be done about that I guess, but I still don't like it all that much.
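To make those differences concrete, here's roughly what the same tiny model looks like in the two libraries. This is a sketch written from memory, so take the exact API details with a grain of salt, and the Person table is just an example, not anything from my app:

# SQLObject: no raw SQL for the schema, and creating an instance
# also INSERTs it right away
from sqlobject import SQLObject, StringCol, connectionForURI, sqlhub

sqlhub.processConnection = connectionForURI('sqlite:/:memory:')

class Person(SQLObject):
    name = StringCol()

Person.createTable()
someone = Person(name='Peter')  # the row is in the database now

# Storm: the schema is plain SQL, and persistence is a separate,
# explicit step after creating the object
from storm.locals import create_database, Store, Int, Unicode

store = Store(create_database('sqlite:'))
store.execute('CREATE TABLE person (id INTEGER PRIMARY KEY, name VARCHAR)')

class StormPerson(object):
    __storm_table__ = 'person'
    id = Int(primary=True)
    name = Unicode()

someone_else = StormPerson()
someone_else.name = u'Petra'
store.add(someone_else)  # nothing written yet...
store.commit()           # ...now it is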
If I had to pick an ORM right now, I'd probably pick SQLObject. But I have a few more to evaluate so stay tuned. :-D
Sunday, May 23, 2010
QEMU: The Machine Park Replacement?
Alright, so teaching 600.318/418: Operating Systems last semester forced me to finally take a serious look at QEMU. And guess what? I liked it! :-D (Special thanks to Venkatesh Srinivas for helping me get used to QEMU!)
Since QEMU supports some "exotic" platforms like PowerPC and SPARC, platforms I love to use for 600.328/428: Compilers and Interpreters, I started a little project to see if I could replace my aging "machine park" with QEMU instances.
The bad news is that installing a standard Linux distro for anything but x86 is somewhat complicated on QEMU, at least I had a very hard time with it. The good news is that I found some very nice Debian images tailored specifically for QEMU. And most of those even work! :-D
A few minutes ago I was finally able to SSH into my first MIPS QEMU instance! Now I'll work on finishing the same setup for ARM and PowerPC. So far it looks like I can replace at least my Cobalt Qube (MIPS) and my iMac (PowerPC) with QEMU instances. SPARC is a bit of a problem child right now, so I'll keep working on my Ultra 60 (I made some good progress there BTW, a later post will have the details).
But why replace my "machine park" in the first place? Granted, it's great fun to keep those old machines running and to hack compiler backends on them. However, it's also a big drain on my time. And what's worse, it's obvious that eventually each of these machines is going to fail beyond repair. In addition, it's going to be easier to back up QEMU images every now and then, and having them all hosted on the gaming lab server with its nice RAID-6 makes things a tad more reliable and predictable as well. So while I am not planning on actually getting rid of my old machines just yet, overall QEMU seems to be a much better tradeoff for instructional purposes.
Getting Debian ARM to work with screen: I like to run my QEMU instances in screen, so I want the console output to go to, well, the console. Here is how:
qemu-system-arm -M versatilepb -m 256 -kernel vmlinuz-2.6.26-2-versatile -initrd initrd.img-2.6.26-2-versatile -hda arm.img -append "root=/dev/sda1 console=ttyAMA0" -nographic
The important part is the ttyAMA0 thing, which convinces the Debian ARM kernel to use stdin/stdout for everything. So now I am happy with my virtual ARM box. :-D
Getting Debian PowerPC to work with screen: At first I couldn't get the PowerPC QEMU instance to work with screen at all, so I had to "fake" a display as follows:
qemu-system-ppc -m 256 -hda powerpc.img -no-reboot -vnc :0
This worked fine, but it just wasn't very satisfying. So I poked around a bit more and found a getty process listening on ttyPZ0. Given my experience with the other QEMU instances, I figured this must be where the serial console is. However, since I didn't provide a kernel to QEMU directly, I couldn't append anything to the kernel command line either. So I copied the kernel out of the PowerPC Debian image and tried booting that way, but it wouldn't mount the root partition. So I copied out the Debian initrd image as well, and voila:
qemu-system-ppc -m 256 -hda powerpc.img -no-reboot -nographic -initrd initrd.img-2.6.26-1-powerpc -kernel vmlinux-2.6.26-1-powerpc -append "root=/dev/hda3 console=ttyPZ0"
Now I have a PowerPC QEMU instance that works with screen. Not much more I can ask for at this point: I now have three platforms for the next compilers course, which means my students will finally have to write native ELF backends next time around. Yay! :-D
Saturday, April 10, 2010
A Modern Baldur's Gate Install?
I've been playing Baldur's Gate since it first came out in 1998, and I still think it's one of the best computer role-playing games ever made. It may be a little surprising, but in 2010 there's still a very active community around Baldur's Gate and other Infinity Engine games. How many other mainstream computer games can boast of a 12 year development cycle? I can only think of Quake 3 as coming even close...
Getting Baldur's Gate and friends to run on my Linux box wasn't very complicated, mostly because Wine provides pretty excellent support for it out of the box. However, I have not yet tapped into the vast array of modifications and customizations available for these games, something I want to rectify now. The problem? There are too many modifications! Some of those even overlap, so installation order determines the resulting gameplay experience in somewhat non-obvious ways.
The only "basic" decision I've made so far is that I will use the Baldur's Gate Trilogy framework which integrates all of the existing Baldur's Gate titles (Baldur's Gate, Tales of the Sword Coast, Shadows of Amn, Throne of Bhaal) into a single game. There are other ways of getting the more advanced Baldur's Gate 2 engine to play original Baldur's Gate content, but Baldur's Gate Trilogy seems better maintained than BG1Tutu as far as I can tell.
What to install on top of that? That's what I hope to detail in updates to this post. I'll experiment with various modifications and their installation and I'll try to document the effects of those here for fellow Baldur's Gate fanatics. I'll first try to work my way up to the install suggested in Dan Simpson's FAQ, which will probably take a few weeks given my schedule. Stay tuned! :-D
Notes on Wine: I had to update Wine to 1.1.42 to get Shadows of Amn installed. I am not sure which version broke it since an older Wine installed SoA fine before. Also, after being unable to install Throne of Bhaal into a new path due to a previously installed ToB, I decided to start with a fresh Wine directory for all of this. Maybe you want to do the same.
Baldur's Gate and Tales of the Sword Coast: I have the 4 in 1 boxed set, so Baldur's Gate was already patched to 1.1.4315 on install. However, Tales of the Sword Coast was at 1.3.5508 by default, so I patched it to 1.3.5512 before continuing. I also applied the DirectX 8 patch, which may or may not be a good idea, we'll see.
Shadows of Amn and Throne of Bhaal: I installed Shadows of Amn and patched it to 23037, then installed Throne of Bhaal and patched it to 26498 before continuing. I also applied the 26499 beta patch, which once again may or may not be a good idea, we'll see.
Checkpoint: I made a backup copy of the .wine folder at this point, mainly so I won't ever have to sit through the demo movies again (the BG and TotSC installers won't let you break out of those horrible movies). The .tar.gz for this was 4.2 GB. Pretty darn big! :-D
Baldur's Gate Trilogy: The first thing I tried was to install BGT without anything else. That fails bigtime since the installation scripts rely on various Windowsisms (yeah, that's a word!) that are not present in Wine. Luckily someone already worked out the kinks, but it makes the installation process a little more involved; I found this article helpful too. Roughly, the steps are:
- Install dos2unix and mmv (I used emerge in Gentoo).
- Grab mospack and compile it using "make -f makefile.unix" in the source directory, then add the source directory to your PATH.
- Grab Baldur's Gate Trilogy and bgt_linux.rar and extract them into your Shadows of Amn directory.
- Grab the Linux version of WeiDU, extract it, and add the resulting directory to your PATH.
- Run the "tolower" program from WeiDU in both the Baldur's Gate and the Shadows of Amn folders to convert everything to lower case.
Note that for the scripts to work you also need to remove the spaces and capital letters from the path names of your installation directories.
Update 2010/05/22: Bad news. I had to give up on this project because the install scripts simply assume way too much about Windoze to be processed nicely with Wine. So I bit the bullet and installed an ancient Windows 2000 CD I had lying around in VirtualBox instead. Now I can play all my favorite games with all the fun extensions I want. And over the last few weeks, the "guilt" I felt whenever I started VirtualBox also diminished. Of course it's still frustrating that I had to give up. :-(
Saturday, November 7, 2009
Stuff for Gentoo Servers
I've set up a lot of servers using Gentoo lately, so I've collected a few things that I really don't want to be without on a server. In no particular order, here's what you really have to install after the first boot (or while still in the chroot if you prefer):
- screen
- sudo
- mirrorselect (the automatic modes work too!)
- ccache
- logrotate
- syslog-ng
- vixie-cron
- slocate
- vim
- gentoolkit
- netcat
- iptables
- htop
- localepurge
- logsentry
For ccache you need to make some changes to /etc/make.conf, in my experience even if the emerge message tells you that things are going to be configured automatically. Add this (whatever size you want):
- FEATURES="ccache"
- CCACHE_SIZE="1G"
In terms of USE flags in /etc/make.conf, most servers can do without a lot of stuff that just costs time in recompiles and installs. All of the following should be OFF, that is with a minus in front:
- nls (although some packages sort-of want it after all, sigh)
- fortran
- java
- X
- xorg
- qt
- qt3
- qt3support
- qt4
- gtk
- gtk2
- gnome
- kde
- wxwindows
- sdl
- cups (unless you're running a print server of course, lol)
- alsa
- oss
- pcmcia
- wifi
There are also a few USE flags that you should consider switching ON instead:
- lzma
- bash-completion (you still have to enable it using eselect)
- vim-syntax
- ssh
- ssl
- threads
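Put together, the USE-related part of my /etc/make.conf ends up looking roughly like this (your mileage and exact flag set will vary, of course):
- USE="-nls -fortran -java -X -xorg -qt -qt3 -qt3support -qt4 -gtk -gtk2 -gnome -kde -wxwindows -sdl -cups -alsa -oss -pcmcia -wifi lzma bash-completion vim-syntax ssh ssl threads"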
I am sure I'll find some more useful stuff, and I am sure I forgot some things as well. I'll update the post as I make more progress. :-D
Thursday, November 5, 2009
Threads and SQLite
One great thing about teaching is that as you tell students how to do something, you constantly get to re-evaluate what you did yourself in a similar situation.
Case in point: I was talking to my students in Unix Systems Programming about multi-threaded producer-consumer systems and how to use queues to coordinate them. While going through some examples, I noticed that I had made a really bad call some time ago when I integrated an SQLite database with a multi-threaded web application.
Some background? I have a web application written in CherryPy, a very nice but also very multi-threaded Python framework. I decided to use SQLite as the database for my application because I didn't want to deal with the complexities of setting up MySQL or something similar. You may say "That's your mistake right there!" but hey, it's what I did and I don't want to change databases right now. (I also don't want to switch the application to some ORM at this point, but of course I should probably have used SQLAlchemy from the beginning.)
In case you don't know: SQLite doesn't like multiple threads to begin with as it uses a global lock for the whole database. Also, the Python interface to SQLite doesn't like multiple threads: You can't share objects created through the interface among multiple threads. So I had to do two things:
- Get each CherryPy thread its own database connection (the only way to generate more SQLite objects).
- Handle the (inevitable) case that two threads want to access the database concurrently.
The first was easy to solve: I maintain a dictionary of database connections indexed by thread. When a thread wants to execute a query, I open a connection for it if it doesn't have one already. The only "problem" here was that I had to close and re-open connections once an exception occurred, but this wasn't too hard.
The second gave me more trouble: The Python interface to SQLite responds to concurrent accesses by throwing an exception. So if some transaction is in progress and another thread tries to start one, that thread fails. Obviously that's not acceptable, so I had to somehow handle the exception and retry the "failed" transaction. For some reason I got inspired by the Ethernet protocol and the idea of collision handling by exponential backoff. I added a pinch of randomness and limited the maximum timeout to two seconds after lots of performance experiments, but that's what I did. Yes, it may seem like a dumb idea in retrospect, but of course it didn't seem all that dumb at the time: I didn't have much experience with Python threads, I needed to get the application done, and all this actually worked. Amazing. :-D
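For the curious, the old scheme boiled down to something like the following. This is a reconstructed sketch, not the actual code from my app, and the database path and query details are made up:

import random
import sqlite3
import threading
import time

connections = {}  # one SQLite connection per thread, keyed by thread name

def get_connection(path='app.db'):
    # sqlite3 objects must not be shared across threads, so every
    # CherryPy thread gets (and keeps) its very own connection;
    # in the real thing I also closed and re-opened the connection
    # whenever an exception occurred
    me = threading.current_thread().name
    if me not in connections:
        connections[me] = sqlite3.connect(path)
    return connections[me]

def run_query(sql, params=()):
    # Ethernet-style collision handling: retry with exponential backoff
    # plus a pinch of randomness, with the sleep capped at two seconds
    wait = 0.01
    while True:
        try:
            connection = get_connection()
            rows = connection.execute(sql, params).fetchall()
            connection.commit()
            return rows
        except sqlite3.OperationalError:  # typically "database is locked"
            time.sleep(wait)
            wait = min(2.0, wait * 2 + random.uniform(0.0, 0.01))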
Back to my lecture epiphany: General producer-consumer systems assume n producers and m consumers, but what do we get for m=1? A beautiful special case that solves my problem:
- If only one thread talks to the database, I only ever need a single database connection.
- If only one thread talks to the database, all transactions will be completely serialized and there will never be an exception due to concurrent access.
Perfect! Of course this seemed too good to be true, so I didn't really believe I had seen all of the issues yet. Yesterday I finally had the time to re-implement the concurrency handling using a producer-consumer model. And guess what? Learning all I had to about threads was a breeze, the code is less complicated than the previous version, and the whole thing performs better too.
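In sketch form, the producer-consumer version looks something like this (again simplified, with made-up names, and minus all error handling for now):

import sqlite3
import threading
import Queue  # the module is called "queue" on Python 3

requests = Queue.Queue()

def database_worker(path='app.db'):
    # the one and only consumer: the single thread that talks to SQLite,
    # so there is exactly one connection and all transactions are serialized
    connection = sqlite3.connect(path)
    while True:
        request = requests.get()
        request['result'] = connection.execute(
            request['sql'], request['params']).fetchall()
        connection.commit()
        request['done'].set()

def run_query(sql, params=()):
    # producers (the CherryPy threads) queue a request and block until done
    request = {'sql': sql, 'params': params, 'result': None,
               'done': threading.Event()}
    requests.put(request)
    request['done'].wait()
    return request['result']

worker = threading.Thread(target=database_worker)
worker.daemon = True  # don't keep the process alive just for the worker
worker.start()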
Lessons? First, lecturers are people too, so we make bad decisions all the time. Second, think carefully about concurrency issues before you start hacking your next web application. Third, don't be afraid to re-factor an essential part of your application. Fourth, don't get too attached to cute ideas: once you have a better, albeit more bland, approach, throw out the cute one. And finally, learn what I don't seem to be able to: how to write concise blog posts. :-D
Update 2009/11/10: Three things to point out: First, I am still not using the new interface in production, but that's mostly because I changed a lot of other features in my app and I don't want to release too many at once.
Second, I had three processes running before: the web app and two "helpers" that would work on the database every now and then. This worked because each process used the same database interface, which would retry transactions if they failed. However, the new database interface doesn't retry, so I can't have multiple processes working on the database: I had to rewrite my helper processes as threads of the main web application. That actually worked out well, especially since I now have more control over them: I get configuration and logging support for free from the web framework.
Third, in all my excitement about not having to catch database exceptions and retry transactions, I forgot that there are exceptions that I do want to let the caller know about, for example if an integrity constraint is violated. Since the exception now happens in the worker thread but I have to tell the calling thread about it, a brief moment of hilarity followed. I was actually thinking I had finally found The Problem with the new worker thread approach. Alas, Python to the rescue. :-D I simply catch the exception in the worker thread and stuff it into the request object before waking up the calling thread. In the calling thread I check the exception entry in the request before I check the result entry, and if I find an exception, I (re-)raise it in the calling thread. And that works! :-D
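Concretely, it's just a small tweak to the two functions from the sketch above; everything else stays the same (and again, this is an illustration, not my actual code):

def database_worker(path='app.db'):
    connection = sqlite3.connect(path)
    while True:
        request = requests.get()
        try:
            request['result'] = connection.execute(
                request['sql'], request['params']).fetchall()
            connection.commit()
        except Exception as error:  # e.g. an integrity constraint violation
            request['exception'] = error
        request['done'].set()

def run_query(sql, params=()):
    request = {'sql': sql, 'params': params, 'result': None,
               'exception': None, 'done': threading.Event()}
    requests.put(request)
    request['done'].wait()
    if request['exception'] is not None:
        raise request['exception']  # (re-)raise in the calling thread
    return request['result']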
So far I am still very happy with the new approach. I will deploy it sometime next week for general use, and if something goes wrong then I'll update the post again. I hope nothing bad happens of course, this cost a lot of time already...
Update 2010/05/22: I finally put the new database interface into production two weeks ago. It works great, the performance of my web application is through the roof, and everybody using it seems just as happy as I am. Small successes... :-D