Saturday, November 7, 2009

Stuff for Gentoo Servers

I've set up a lot of servers using Gentoo lately, so I've collected a few things that I really don't want to be without on a server. In no particular order, here's what you really have to install after the first boot (or while still in the chroot if you prefer); see the one-line emerge command after the list:

  • screen
  • sudo
  • mirrorselect (the automatic modes work too!)
  • ccache
  • logrotate
  • syslog-ng
  • vixie-cron
  • slocate
  • vim
  • gentoolkit
  • netcat
  • iptables
  • htop
  • localepurge
  • logsentry
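If you want to grab the whole list in one go, a single emerge should do it. A sketch (some of these short names may need their category prefix if portage complains about ambiguity):

emerge -av screen sudo mirrorselect ccache logrotate syslog-ng vixie-cron slocate vim gentoolkit netcat iptables htop localepurge logsentry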

For ccache you need to make some changes to /etc/make.conf; in my experience this holds even if the emerge message tells you that things are going to be configured automatically. Add this (whatever size you want):

  • FEATURES="ccache"
  • CCACHE_SIZE="1G"
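To convince yourself after a few emerges that the cache is actually being used, you can ask ccache for its statistics. This assumes portage's default cache directory of /var/tmp/ccache; adjust if you set CCACHE_DIR yourself:

CCACHE_DIR=/var/tmp/ccache ccache -s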

In terms of USE flags in /etc/make.conf, most servers can do without a lot of stuff that just costs time in recompiles and installs. All of the following should be OFF, that is with a minus in front:

  • nls (although some packages sort-of want it after all, sigh)
  • fortran
  • java
  • X
  • xorg
  • qt
  • qt3
  • qt3support
  • qt4
  • gtk
  • gtk2
  • gnome
  • kde
  • wxwindows
  • sdl
  • cups (unless you're running a print server of course, lol)
  • alsa
  • oss
  • pcmcia
  • wifi

There are also a few USE flags that you should consider switching ON instead; a combined make.conf line for both lists follows below:

  • lzma
  • bash-completion (you still have to enable it using eselect)
  • vim-syntax
  • ssh
  • ssl
  • threads
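Putting both lists together, the relevant make.conf line ends up looking something like this (a sketch; trim it to taste for your box):

USE="-nls -fortran -java -X -xorg -qt -qt3 -qt3support -qt4 -gtk -gtk2 -gnome -kde -wxwindows -sdl -cups -alsa -oss -pcmcia -wifi lzma bash-completion vim-syntax ssh ssl threads"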

I am sure I'll find some more useful stuff, and I am sure I forgot some things as well. I'll update the post as I make more progress. :-D

Thursday, November 5, 2009

Threads and SQLite

One great thing about teaching is that as you tell students how to do something, you constantly get to re-evaluate what you did yourself in a similar situation.

Case in point: I was talking to my students in Unix Systems Programming about multi-threaded producer-consumer systems and how to use queues to coordinate them. While going through some examples, I noticed that I had made a really bad call some time ago when I integrated an SQLite database with a multi-threaded web application.

Some background: I have a web application written in CherryPy, a very nice but also very multi-threaded Python framework. I decided to use SQLite as the database for my application because I didn't want to deal with the complexities of setting up MySQL or something similar. You may say "That's your mistake right there!" but hey, it's what I did and I don't want to change databases right now. (I also don't want to switch the application to some ORM at this point, but of course I should probably have used SQLAlchemy from the beginning.)

In case you don't know: SQLite doesn't like multiple threads to begin with as it uses a global lock for the whole database. Also, the Python interface to SQLite doesn't like multiple threads: You can't share objects created through the interface among multiple threads. So I had to do two things:

  1. Get each CherryPy thread its own database connection (the only way to generate more SQLite objects).
  2. Handle the (inevitable) case that two threads want to access the database concurrently.

The first was easy to solve: I maintain a dictionary of database connections indexed by thread. When a thread wants to execute a query, I open a connection for it if it doesn't have one already. The only "problem" here was that I had to close and re-open connections once an exception occurred, but this wasn't too hard.
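Here's a minimal sketch of that per-thread bookkeeping (made-up names, and using the modern sqlite3 module; my actual code differs):

import sqlite3
import threading

connections = {}  # thread id -> that thread's private connection
registry_lock = threading.Lock()

def get_connection(db_path):
    # Each thread gets its own connection, opened on demand.
    tid = threading.get_ident()
    with registry_lock:
        if tid not in connections:
            connections[tid] = sqlite3.connect(db_path)
        return connections[tid]

def reset_connection(db_path):
    # After an exception: close the thread's connection and open a fresh one.
    tid = threading.get_ident()
    with registry_lock:
        connection = connections.pop(tid, None)
    if connection is not None:
        connection.close()
    return get_connection(db_path)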

The second gave me more trouble: The Python interface to SQLite responds to concurrent accesses by throwing an exception. So if some transaction is in progress and another thread tries to start one, that thread fails. Obviously that's not acceptable, so I had to somehow handle the exception and retry the "failed" transaction. For some reason I got inspired by the Ethernet protocol and its collision handling by exponential backoff. I added a pinch of randomness and, after lots of performance experiments, capped the maximum timeout at two seconds. Yes, it may seem like a dumb idea in retrospect, but of course it didn't seem all that dumb at the time: I didn't have much experience with Python threads, I needed to get the application done, and all this actually worked. Amazing. :-D
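In sketch form, the Ethernet-inspired retry looked roughly like this (hypothetical names; the randomness and the two-second cap are the bits mentioned above):

import random
import sqlite3
import time

MAX_TIMEOUT = 2.0  # seconds; the cap found by lots of performance experiments

def run_transaction(connection, statements):
    # Retry with randomized exponential backoff until the transaction commits.
    timeout = 0.01
    while True:
        try:
            with connection:  # commits on success, rolls back on exception
                for sql, params in statements:
                    connection.execute(sql, params)
            return
        except sqlite3.OperationalError:  # "database is locked" and friends
            time.sleep(random.uniform(0, timeout))
            timeout = min(2 * timeout, MAX_TIMEOUT)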

Back to my lecture epiphany: General producer-consumer systems assume n producers and m consumers, but what do we get for m=1? A beautiful special case that solves my problem:

  1. If only one thread talks to the database, I only ever need a single database connection.
  2. If only one thread talks to the database, all transactions will be completely serialized and there will never be an exception due to concurrent access.

Perfect! Of course this seemed too good to be true, so I didn't really believe I had seen all of the issues yet. Yesterday I finally had the time to re-implement the concurrency handling using a producer-consumer model. And guess what? Learning all I had to about threads was a breeze, the code is less complicated than the previous version, and the whole thing performs better too.
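For the curious, here's a sketch of the m=1 setup in modern Python (not my actual code; the request objects are just dictionaries with an event to wake the caller):

import queue
import sqlite3
import threading

requests = queue.Queue()  # producers put requests here, the worker takes them

def db_worker(db_path):
    # The single consumer: the only thread that ever touches the database.
    connection = sqlite3.connect(db_path)  # one connection is all we need now
    while True:
        request = requests.get()
        request["result"] = connection.execute(
            request["sql"], request["params"]).fetchall()
        connection.commit()
        request["done"].set()  # wake up the producer that queued this request

def execute(sql, params=()):
    # Called from any producer thread; blocks until the worker is done.
    request = {"sql": sql, "params": params, "done": threading.Event()}
    requests.put(request)
    request["done"].wait()
    return request["result"]

threading.Thread(target=db_worker, args=("app.db",), daemon=True).start()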

Lessons? First, lecturers are people too, so we make bad decisions all the time. Second, think carefully about concurrency issues before you start hacking your next web application. Third, don't be afraid to re-factor an essential part of your application. Fourth, don't get too attached to cute ideas: once you have a better albeit more bland approach, throw out the cute one. And finally, learn what I don't seem to be able to: how to write concise blog posts. :-D

Update 2009/11/10: Three things to point out: First, I am still not using the new interface in production, but that's mostly because I changed a lot of other features in my app and I don't want to release too many at once.

Second, I had three processes running before: the web app and two "helpers" that would work on the database every now and then. This worked because each process used the same database interface, which would retry transactions if they failed. However, the new database interface doesn't retry, so I can't have multiple processes hitting the database: I had to rewrite my helpers as threads of the main web application. That actually worked out well, especially since I now have more control over them: I get configuration and logging support for free from the web framework.

Third, in all my excitement about not having to catch database exceptions and retry transactions, I forgot that there are exceptions that I do want to let the caller know about, for example if an integrity constraint is violated. Since the exception now happens in the worker thread but I have to tell the calling thread about it, a brief moment of hilarity followed. I was actually thinking I had finally found The Problem with the new worker thread approach. But no, Python to the rescue. :-D I simply catch the exception in the worker thread and stuff it into the request object before waking up the calling thread. In the calling thread I check the exception entry in the request before I check the result entry, and if I find an exception, I (re-)raise it there. And that works! :-D
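In sketch form, revising the worker and caller from the sketch above (same imports and queue):

def db_worker(db_path):
    # Revised consumer: exceptions no longer kill the worker, they travel back.
    connection = sqlite3.connect(db_path)
    while True:
        request = requests.get()
        try:
            request["result"] = connection.execute(
                request["sql"], request["params"]).fetchall()
            connection.commit()
            request["error"] = None
        except Exception as error:  # e.g. an integrity constraint violation
            connection.rollback()
            request["error"] = error  # stuff it into the request object...
        request["done"].set()         # ...before waking the calling thread

def execute(sql, params=()):
    # Revised producer side: check the exception entry before the result.
    request = {"sql": sql, "params": params, "done": threading.Event()}
    requests.put(request)
    request["done"].wait()
    if request["error"] is not None:
        raise request["error"]  # (re-)raise in the calling thread
    return request["result"]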

So far I am still very happy with the new approach. I will deploy it sometime next week for general use, and if something goes wrong I'll update the post again. I hope nothing bad happens of course; this has cost a lot of time already...

Update 2010/05/22: I finally put the new database interface into production two weeks ago. It works great, the performance of my web application is through the roof, and everybody using it seems just as happy as I am. Small successes... :-D

Wednesday, October 14, 2009

Meet Sparcacus

Good news if you're thinking of taking my compilers course next time I teach it: We have yet another new member in the machine park! Meet Sparcacus, an old Sun box with SPARC processors (yes, plural!). Once again my pal uber came through for all of us by giving me this box as a present! Thanks dude! This means the next compilers course will force you to generate code for either MIPS, PowerPC, or SPARC. Oh glorious future that is found in our RISCy past. :-D

PS: Sparcacus is not one of the Sun boxes I had gotten my fingers on previously; none of those could be made to work. So now I have plenty of Sun hardware left over for people to pick up. Just email me if you're interested.

Update 2009/11/05: I've had some trouble with the disk in Sparcy, so now I got a hold of a few more disks. I'll have to try those and see if I can find enough working disks to build a little RAID. But as of right now, Sparcy is a bit... Unstable? :-(

Wednesday, September 16, 2009

Got distcc(d) working!

I finally have distcc and distccd working, so now my ancient PowerPC iMac actually has all its code compiled by a semi-modern dual-core AMD machine. Woohoo!

Most of this was Gentoo magic that worked right out of the box, at least for portage. I still have not been able to prove to myself that I can actually use distcc manually as well. Of course that's not nearly as important... :-D
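From memory, the portage side boils down to a few make.conf entries plus telling distcc about your helpers (double-check the Gentoo distcc guide, and the host list is obviously yours to fill in):

# in /etc/make.conf
FEATURES="distcc"
MAKEOPTS="-j5"
# then, once, as root
/usr/bin/distcc-config --set-hosts "192.168.0.2 192.168.0.3"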

Next I have to try pump mode, and then it's off to setting up the old SPARC box I have lying around, also with distcc most likely. (It's equally straightforward to generate another cross-compiler using Gentoo's crossdev script. Sweet.)

Monday, September 7, 2009

I <3 LVM2

So when I installed Debian on my MSI Wind Netbook the other day, I was asked whether I wanted to have encrypted LVM2 volumes. I said yes, and I am already glad I did today: resizing works! :-D

Debian decided to give me a 4GB / and a 138GB /home, which was a little unbalanced since I needed to install a lot of stuff. Also, it made the swap space 2.3GB instead of the 4GB I would have liked. I was really bummed out for a few minutes until I remembered the LVM2 stuff.

So I went ahead and used resize2fs to make /home smaller, then lvreduce to shrink the volume under /home, then lvextend to grow the swap volume (followed by a fresh mkswap on it), and finally lvextend to make / 16GB and resize2fs to expand that file system as well. And it all worked. Amazing! :-D
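For the record, the dance goes roughly like this, with hypothetical volume names and sizes, and with /home not in use (shrink the file system before the volume, grow the volume before the file system!):

# shrink /home: file system a bit smaller than the target, then the volume,
# then grow the file system back to exactly fill the volume
umount /home
e2fsck -f /dev/vg0/home
resize2fs /dev/vg0/home 119G
lvreduce -L 120G /dev/vg0/home
resize2fs /dev/vg0/home
mount /home
# grow the swap volume and rebuild the swap signature
swapoff -a
lvextend -L 4G /dev/vg0/swap
mkswap /dev/vg0/swap
swapon -a
# growing /: volume first, then file system
lvextend -L 16G /dev/vg0/root
resize2fs /dev/vg0/root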

Makes me wish I had done an LVM2 install on my desktop at home too, but sadly I didn't feel comfortable enough with it back then. I am already using it on the gaming lab server, but I never had a reason to resize anything on there yet. Good to know that it'll work if I ever need to. I <3 LVM2! :-D

DenyHosts on Gentoo

When I set up a server, I like to move the port for sshd away from 22 to some high location, say 32767. At JHU, however, high ports are blocked by the good folks in IT. So machines I host on campus actually get attacked a good deal more than machines I host off campus where I control the firewall. Talk about "security" measures around here. :-(

I looked around for a nice way to ban attackers who try to get into my machines and settled on DenyHosts as my favorite. One emerge later I was editing the configuration file, and after I got done with that the trouble started.

First, sshd completely ignored the /etc/hosts.deny file that DenyHosts 2.6-r1 writes into. Maybe I forgot to install tcp-wrappers? Nope, those are there. Maybe I forgot to build sshd with the tcpd USE flag? No, that's there. It turns out that the default sshd configuration binds to all interfaces on your machine, and for some reason that leads to entries in /etc/hosts.deny not being respected. The details are muddy, at least to me, but adding a ListenAddress your.ip.here.please solves the problem. And you have to put your actual IP address!
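In sshd_config terms, the fix is one extra line (192.0.2.10 standing in for your actual address, together with the high port from above):

Port 32767
ListenAddress 192.0.2.10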

So once that's working, I try the init script to (re)start DenyHosts. And it fails. At least that's what the init script says; in htop I can clearly see that I have a denyhosts process running. What do you know, the init script that comes with DenyHosts 2.6-r1 on Gentoo is broken. You need to replace --name denyhosts with --name /path/to/python instead. Yes, you'll have to change it every time you update the Python interpreter to a new major version. What can I say? Someone needs to rewrite the init script from scratch, I guess.

So now I have DenyHosts running, and script kiddies who try to get into my machine are banned. What else could I wish for? I don't know, a similar tool for Apache maybe? :-D

Sunday, September 6, 2009

The Domains of JHU

Don't ask me how I got into looking this stuff up; what matters is that I did look it up... :-D Apparently Johns Hopkins owns somewhere between 200 and 450 domain names. I am not sure how reliable the statistics are, but here are the links to what I saw:

low estimate
high estimate

Yes, I took the liberty of rounding the numbers, both up and down. I started wondering how many domain names other universities own. Here's my biased sample (data taken from the same site):

harvard.edu 242-391
yale.edu 186-405
princeton.edu 50-247
umd.edu 86-479
ucsd.edu 48
uci.edu 19-53
umbc.edu 4-8
ucr.edu 4-7

And just for reference, microsoft.com has 24,609-29,017 domains associated with it. So while we at JHU are "small fish" compared to M$, it's still somewhat surprising to me that universities, especially some who consider themselves "Ivy League," are domain hogs. :-)

Friday, April 17, 2009

Cooling Down Your Hot iMac G3 Server

I had a post here long ago about how the local ACM chapter handed me two old iMacs. About time I did something with them! :-)

Well, one of them, the faster one, got a CD stuck in its throat, so that's something for a later post. Slot-loading drives be damned! But the really old one, the original Series A iMac they gave me, is alive and kicking: Meet Zach!

As you can see, I am running Gentoo Linux on it instead of my usual choice for old machines, NetBSD. Turns out that NetBSD is a real bummer on these machines and that Gentoo just worked right out of the proverbial box. I even did the install remotely and it booted properly the first time, something that never worked out for me before. Excellent!

Well... Not quite. :-( At first I had the same problem I had when I ran NetBSD: the display wouldn't shut down. Now these old iMacs get really, really warm when the display is on. Not actually hot, but really, really warm. So running that thing as a 24/365 server wasn't really an option at first. But now it is! :-)

Thanks to my pal uber, I figured out how to make the monitor power down. Now I have a quiet and cool PowerPC server, just the thing I needed to be a little happier. Here's how you do it:
setterm -powersave powerdown
setterm -powerdown 1
setterm -blank 1
For some reason I had to do all three; I am not exactly sure why the last one is needed. But in any case, now the display shuts off after one minute. You have to run these from the actual console, but you can also add them to whatever local startup file your Linux distribution uses. On Gentoo, you can slap them into /etc/conf.d/local.start for example. Presumably your kernel has to support power management somehow (not really sure about that). Just in case I checked my kernel configuration, and the only power-related option I have set seems to be this one:
CONFIG_APM_POWER=y
As far as I can tell, one kernel option (which is a default on most Linux distributions I think) and three commands in a local startup file is all it takes: You can all turn your old iMacs into sexy servers now! So join the revolution! :-)

Zach has been running for about two weeks without any problems, heat or otherwise. The only painful thing with Gentoo is that everything gets compiled from source, which can take over a day on a slow box if you have to build gcc as well. Hint: USE="-fortran" is your friend. Eventually I'll look into distcc more seriously so modern boxes can do the grunt work instead of those tiny little PowerPC chips. Probably a blog post for next year...

Tuesday, February 3, 2009

Overdue Update

Well, it's been a while. :-) Since my last blogging craze I have been busy working mostly on stuff related to the Johns Hopkins Gaming Lab and the new course we're offering this semester.

In addition to the file server, I've built two gaming machines which are (or will be for the second one) over in the Mattin Center, ready for our students to crank. I have pictures of those two builds, but I am not sure I'll have the time to post them soon.

The course is going pretty well, teams are assembled and have mentors, and the first meetings and blog posts are starting to appear. Woohoo! :-) Might be a good time to thank everybody who is helping with our efforts: Thank you! (This thank you applies to all of you, you know who you are!)

I expect to be quite busy for the rest of the semester, but maybe I can make it on here every now and then to let you know what's up. In the meantime: Play more games! Hack more on your games! And enjoy your stay at JHU... :-)