Friday, April 22, 2016

Tracking Positions in Go: Why Composition Rocks

In this post I discuss the genesis of the streampos package for Go that I recently posted on github. I don't normally write about code I post, but in this case I learned something cute that I wanted to share. Maybe you'll find it enjoyable too?

The story starts with me writing some Go code to process XML files. Of course I used the encoding/xml package from the Go standard library to do this. A few hours into the job I needed to generate error messages when something is not quite right in the XML file. (If you've dealt with large-ish XML files that get edited manually, you probably know that it's quite easy to make the occasional mistake that syntax highlighting alone will not protect you from.) In order for those error messages to be useful, they should tell the user where in the XML file the problem was detected. And that's when I ran into trouble: There's no straightforward way to get, say, line numbers out of encoding/xml!

Sure, their code tracks line numbers in support of their error messages (check SyntaxError for example) but if you have your own errors that go beyond what the standard library checks, you're out of luck. Well, not completely out of luck. If you're using Decoder to do stream-based XML processing, you can get something moderately useful: the InputOffset method will give you the current offset in bytes since the beginning of the stream.

What do you have to do to turn that into error messages of the kind users expect, that is, messages in terms of line (and maybe column) numbers? First you somehow have to get your hands on the raw input stream. Then you look for newlines and build a data structure that allows you to map a range of offsets in the stream into a line number. With a little more code on top, you even get out column numbers if you want them. Sounds like fun, but just how should we do it?
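To make the mapping idea concrete, here is a minimal sketch that works on a complete byte slice rather than a stream. The function names newlineOffsets and position are made up for this illustration and are not taken from streampos; columns are counted in bytes, which is an assumption that ignores multi-byte characters.

```go
package main

import (
	"fmt"
	"sort"
)

// newlineOffsets returns the byte offsets of all '\n' characters in data.
func newlineOffsets(data []byte) []int {
	var offs []int
	for i, b := range data {
		if b == '\n' {
			offs = append(offs, i)
		}
	}
	return offs
}

// position maps a byte offset to a 1-based line and column, given the
// sorted newline offsets of the input. Columns count bytes, not runes.
func position(newlines []int, offset int) (line, col int) {
	// n is the number of newlines strictly before offset.
	n := sort.SearchInts(newlines, offset)
	lineStart := 0
	if n > 0 {
		lineStart = newlines[n-1] + 1
	}
	return n + 1, offset - lineStart + 1
}

func main() {
	data := []byte("<a>\n  <b>\n</a>\n")
	nl := newlineOffsets(data)
	line, col := position(nl, 6) // 6 is the offset of "<b>"
	fmt.Printf("line %d, column %d\n", line, col)
}
```

The binary search keeps each lookup logarithmic in the number of lines, which matters little for error messages but costs nothing either.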

To use Decoder you have to give it a Reader to grab input from. This is promising because Go very much prides itself in the power that simple interfaces like Reader and Writer provide. Indeed, the pattern of wrapping one Reader inside another (or one Writer inside another) is fundamental in much of the standard I/O library. But we can also find those interfaces in "unrelated" parts of the library: The Hash interface, for example, is a Writer that computes hashes over the data written to it.

So the Decoder wants a Reader, but thanks to interfaces any Reader will do. It's not a big leap to think "Hey, I can hack my own Reader that tracks line numbers, and I'll pass that one to Decoder instead of the original Reader!" That's indeed what I considered doing for a few minutes. Luckily I then realized that by hacking a Reader I am actually making myself more problems than I had to begin with.

I want to solve the problem of building a mapping from offsets to line numbers. However, as a Reader, I also have to solve the problem of providing data to whoever is calling me. So whatever the original problem was, as soon as I decide to solve it inside a Reader, I immediately get a second problem. And it's not an entirely trivial problem either! For example, I have to consider what the code should do when asked to provide 37 bytes of data but the underlying Reader I am wrapping only gave me 25. (The answer, at least in Go land, is to return the short read. But notice that it's something I had to think about for a little while, so it cost me time.) On a more philosophical level, inserting a Reader makes my code an integral part of the entire XML thing. I never set out to do that! I just wanted a way to collect position information "on the side" without getting in anybody's way.

In the specific case of encoding/xml things actually get even funnier. It turns out that Decoder checks whether the Reader we hand it is actually a ByteReader. If it's not, Decoder chooses to wrap the Reader we hand it again, this time in a bufio.Reader. So either I have to implement a ByteReader myself, or I have to live with the fact that plugging in my own Reader causes another level of indirection to be added, unnecessarily to some extent. (That's yet another problem I don't want to have to deal with!) There really should be a better way.

And there is: Instead of hacking a Reader, just hack a Writer! I can almost see you shaking your head at this point. "If the Decoder wants a Reader, what good is it going to do you to hack a Writer?" I'll get to that in a second, first let's focus on what being a Writer instead of a Reader buys us.

The most important thing is that as a Writer, we can decide to be "the sink" where data disappears. That is, after all, exactly what the Hash interface does: It's a Writer that turns a stream into a hash value and nothing else, the stream itself disappears in the process. What's good enough for Hash is good enough for us: We can be a Writer that turns a stream into a data structure that maps offsets to line numbers. Note that a Reader doesn't have this luxury. Not ever. True, there could be Readers that are "the source" where data appears out of thin air, but there are no (sensible) Readers that can be "the sink" as described above.

A secondary effect of "being the sink" is that we don't have to worry about dealing with an "underlying Writer" that we wrap. (As a Reader, we'd have to deal with "both ends" as it were, at least in our scenario.) Also, just like in the case of Hash, any write we deal with cannot actually fail. (Except of course for things like running out of memory, but to a large degree those memory issues are something Go doesn't let us worry about in detail anyway.) This "no failures" property will actually come in handy.
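A Writer of this "sink" kind can be very small. The sketch below is a hypothetical stand-in, not the actual streampos.Writer: it only records where newlines occur and always reports success.

```go
package main

import (
	"fmt"
	"io"
)

// newlineSink is a hypothetical Writer that swallows the stream and keeps
// only the absolute offsets of the newlines it has seen.
type newlineSink struct {
	seen     int64   // total number of bytes written so far
	newlines []int64 // absolute offsets of '\n' bytes
}

// Write scans p for newlines. It has no underlying Writer to forward to,
// so it always consumes all of p and never returns an error.
func (s *newlineSink) Write(p []byte) (int, error) {
	for i, b := range p {
		if b == '\n' {
			s.newlines = append(s.newlines, s.seen+int64(i))
		}
	}
	s.seen += int64(len(p))
	return len(p), nil
}

// Compile-time check that *newlineSink satisfies io.Writer.
var _ io.Writer = (*newlineSink)(nil)

func main() {
	s := &newlineSink{}
	// Chunk boundaries are arbitrary; the recorded offsets stay absolute.
	io.WriteString(s, "ab\ncd")
	io.WriteString(s, "e\n\nf")
	fmt.Println(s.seen, s.newlines) // prints "9 [2 6 7]"
}
```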

Okay, so those are all nice things that will make our code simpler as long as we hack a Writer and not a Reader. But how the heck are we going to make our Writer "play nice" with the Decoder that after all requires a Reader? Enter a glorious little thing called TeeReader. (No, not TeaReader!) A TeeReader takes two arguments, a Reader r and a Writer w, and returns another Reader t. When we read from t, that request is forwarded to r. But before the data from r gets returned through t, it's also written to w. Problem solved:

lines := &streampos.Writer{}
tee := io.TeeReader(os.Stdin, lines)
dec := xml.NewDecoder(tee)

There's just one small problem with TeeReader: If the write we're doing "on the side" fails, that write error turns into a read error for the client of TeeReader. Of course that client doesn't really know that there's a writer involved anywhere, so things could get confusing. Luckily, as I pointed out above, our Writer for position information never fails, so we cannot possibly generate additional errors for the client.

I could end the post here. But I don't want to sweep under the rug that there's a little cheating going on. Where? Well, you see, TeeReader is not a ByteReader either. So regardless of how nice the code in our Writer is, and regardless of how cute the setup above is, we incur the cost of an extra indirection when NewDecoder decides to wrap the TeeReader. What we're doing is shoving the problem back to the standard library. It's possible that TeeReader will eventually grow a ReadByte method at which point the needless wrapping would cease. However, that's not very likely given what TeeReader is designed to do. But note that this concern arises specifically in connection with encoding/xml. There are probably many applications that do not require methods beyond the Reader interface.

Speaking of other applications. In the Go ecosystem, interfaces such as Reader and Writer are extremely prominent. Lots of people write their code to take advantage of them. The nice thing is that streampos.Writer coupled with TeeReader provides a generic way to handle position information for all applications that use a Reader to grab textual data. Of course not all applications do, and not all applications will be able to take full advantage of it. But if you're writing one that does, and if you want to have position information for error messages, well, it's three lines of code as long as you already track offsets. And you have to track something yourself because after all only your application knows what parts of a stream are interesting.

I very much like that Go encourages this kind of reusability by composing small-ish, independently developed pieces. (Actually, that even confirms a few of the claims I made in my 2003 dissertation. Yay me!) The only "trouble" is that there are already a few places in the standard library where similar position tracking code exists: At the very least in text/scanner and in the Go compiler itself. Whether that code could use my little library I don't know for sure, maybe not. But I guess it should be a goal of the standard library to refactor itself on occasion. We'll see if it does...

One last note: I've been teaching a course on compilers since 2001, and since about 2003 I've told students to use byte offsets as their model of positions. I've always sold this by explaining that offsets can be turned into lines and columns later but we don't have to worry about those details in the basic compiler. Strangely enough I never actually wrote the code to perform that transformation, until now that is. So once I teach the course mostly in Go, I can use my own little library. Neat. :-)

Wednesday, April 20, 2016

Terminal Multiplexers: Simplified and Unified

I have a love-hate relationship with terminal multiplexers. I've been using screen for years, but mostly on remote servers, and mostly just to keep something running on logout and later get back to it again. But on my home machine or my laptop, I've avoided terminal multiplexers like the plague, mostly because of their strange (or is "horrid" more appropriate?) user interfaces.

For a really long time now, I've simply used LXTerminal and its tabs as a crutch, but I've recently grown rather tired of that approach. When you're writing (more or less complicated) client/server software, it really pays off to have both ends running side-by-side: switching tabs, even with a quick key combination, gets old fast. Also LXTerminal lacks quite a few features, true-color among them. What I really wanted to use was st, but that lean beast doesn't even have a scrollback buffer (so forget about tabs or a contextual menu).

Wait, so why don't I just use a tiling window manager like dwm and open several terminals? Sadly I've been a spoiled Openbox person since 2008 and a spoiled "overlapping windows" person since 1987 or so (thank the Amiga for that). I like the idea of a tiling window manager exactly for a bunch of terminals and not much else. In the long run I may actually become a dwm nut, but not just yet.

So I had to face it, the time was right to actually learn a terminal multiplexer for real. But which one? For text editors there's an easy (if controversial) answer: just learn vi or emacs, both of those you're likely to find on any UNIX system you may ever have to work with. (Heck even busybox has a vi clone.) That seems to suggest that I should spend time on learning screen for real: It's the oldest terminal multiplexer out there, so it's most likely to be available just about everywhere.

The only problem is that it sucks. If you want to see why it sucks, just start htop in screen. I don't care if that's a terminfo problem or not, in screen the htop interface looks messed up. It's still usable, but you know, the eyes want to be pleased as well (yes, even in the terminal). But it sucks even more: Just split the terminal and then run cat </dev/urandom in one of the splits. Chances are you'll get random garbage spewed all over the other split as well. Doesn't inspire much confidence in how well-isolated those splits are, does it?

So of course I tried tmux next, a much more recent project and maybe "better" because it doesn't have as much buggy legacy code. Sadly it immediately fails the htop test as well, but at least it does a little better when hit with the random hammer: No more spewing into the other split, but still the status line gets trashed and once you stop hammering some of the UI elements are just a little out of it. A little groggy most likely?

One more attempt, let's try dvtm instead. If you don't know, that's a dwm clone inside the terminal. And wow, it passes both the htop test and the random hammer with flying colors, only a few of the border elements get trashed. On the downside, it's least likely to be installed on a random UNIX system you need to work on. That, and it has a bunch of opinionated layout assumptions that may not be to your liking. I, however, like them just fine, at least for the most part.

At this point I started asking myself what I actually need in my terminal multiplexer, and I arrived at a rather short list of features:

  • create a new terminal window with a shell running in it (splitting the terminal area horizontally or vertically or at least sanely)
  • switch from one terminal window to another quickly and reliably
  • destroy an existing terminal window and whatever is running in it

That's really it as far as interactive use on my home machine or laptop is concerned. Sure, being able to detach and reattach sessions later is great, especially when working remotely. But for that use-case I already have screen wired into my brain and fingers. (Also it turns out that dvtm doesn't persist sessions in any which way, so I'd need to use another tool like dtach or (more likely) abduco with dvtm.)

So what am I to do? Which one of these am I to learn and put deep into my muscle memory over the next few weeks? That's when it struck me: I can probably learn none of them and all of them at the same time! After all, I have very few requirements (see above) and as luck would have it, all of the tools are highly configurable. Especially in regards to their keybindings! Can it be done?

Can I configure all three tools in such a way that one set of keybindings will get me all the features I actually need?

Let's see what we find in each program for each use-case, then try to generalize from there. Let's start with screen:

  • CTRL-a-c create a new full-sized window (and start a shell)
  • CTRL-a-n switch to the next full-sized window
  • CTRL-a-p switch to the previous full-sized window
  • CTRL-a-S split current region into top/bottom halves (no shell started)
  • CTRL-a-| split current region into left/right halves (no shell started)
  • CTRL-a-TAB switch to the next region
  • CTRL-a-X remove current region (shell turns into full-sized window)
  • CTRL-a-k kill current window/region (including the shell)
  • CTRL-a-d detach screen

So obviously the concepts of window and region are a bit strange, but for better or worse that's what we get. I am pretty sure that regions (splitting one terminal into several "subterminals") were added later? In terms of my use cases it's a bit sad that creating a new region does not also immediately launch a shell in it, instead I have to move to the new region with CTRL-a-TAB and then hit CTRL-a-c to do that manually. It's also strange that removing a region doesn't kill the shell running in it but turns that shell into a "window" instead. And of course navigating windows is different from navigating regions. Let's see how tmux does it:

  • CTRL-b-c create new full-sized window (and start a shell)
  • CTRL-b-n switch to next full-sized window
  • CTRL-b-p switch to previous full-sized window
  • CTRL-b-" split current pane into top/bottom halves (and start a shell)
  • CTRL-b-% split current pane into left/right halves (and start a shell)
  • CTRL-b-o switch to next pane
  • CTRL-b-UP/DOWN/LEFT/RIGHT switch to pane in that direction
  • CTRL-b-x kill current pane (as well as shell running in it)
  • CTRL-b-& kill current window (as well as shell running in it)
  • CTRL-b-d detach tmux

So we can pretty much do the same things, but not quite. That's annoying. Note how creating a new "pane" launches a shell and how killing a "pane" kills the shell running in it? That's exactly what screen doesn't do. Also note that tmux has a notion of "direction" with regards to panes, something completely lacking in screen. And note that "windows" are still different from "panes" in terms of navigation. Finally, what's dvtm like?

  • CTRL-g-c create a new window (and start a shell, splitting automatically)
  • CTRL-g-x close current window (and kill the shell running in it)
  • CTRL-g-TAB switch to previously selected window
  • CTRL-g-j switch to next window
  • CTRL-g-k switch to previous window
  • CTRL-g-SPACE switch between defined layouts
  • CTRL-\ detach (using abduco, no native session support)

Note that dvtm's notion of "window" unifies what the other two tools call "window" (full-sized) and "region" or "pane" (split-view). But the biggest difference is that dvtm actually has an opinion about what the layout should be. You can choose between a few predefined layouts (with CTRL-g-SPACE) but that's it. One of these layouts shows one "window" at a time, but the navigation between "windows" stays consistent regardless of how many you can see in your terminal.

So what's the outcome here? Personally, I very much prefer what dvtm does over what tmux does over what screen does. Obviously your mileage may vary, but what I'll try to do here is make all the programs behave (more or less, and within reason) like dvtm.

Of course there are plenty of issues to resolve. For starters, two of the programs care about whether a split is vertical (top/bottom) or horizontal (left/right) while one doesn't. The simplest solution I can think of is to have separate key bindings for each but to map them to the same underlying command in dvtm. This way I'll learn the key bindings for the worst case while enjoying the features of the best case.

Next, screen doesn't automatically start a shell in a new split while the other two do. Luckily that's easy to resolve by writing new and slightly more complex key bindings. There are a few additional issues we'll get into below when we talk about how to configure each program, but first we need to "switch gears" as it were: It's time to do some very basic user interface design.

Let's face the most important decision first, namely which "leader key" should I use? We have CTRL-a, CTRL-b, and CTRL-g as precedents, but we don't necessarily have to follow those.

I played with various keys for a while to see what's easiest for me to trigger. My favorite "leader keys" would have to be CTRL-c and CTRL-d. Sadly, those already have "deep meaning" for shells, so I don't want to use either of them. The next-best keys (for my hands anyway) would be CTRL-e and CTRL-f. Sadly, CTRL-e is used for "move to end of line" in bash and I tend to use that quite a bit; screen's CTRL-a sort of shares that problem. In terms of "finger distance" I also find CTRL-a "too close and squishy" and CTRL-b/CTRL-g "too far and stretchy" for myself.

Based on this very unscientific analysis, I ended up picking CTRL-f as the "unified leader key" for all three multiplexers. The drawback of this is that I'll train my muscle memory for "the wrong key" but between three programs, any key would have been "the right key" only for one of them anyway. Big deal. (Yes, in bash CTRL-f triggers "move one character forward" but that's not really a big loss either, is it?)

With the leader settled, how shall we map each use-case to actual key strokes? The guiding principle I'll follow is to rank things by how often I am likely to do them. (Of course that's just a guess for now, if it turns out to be wrong I'll adapt later.) I believe that the ranking is as follows:

  1. Navigate between splits.
  2. Create a new split.
  3. Remove the current split.

If that's true, then moving to the next split should be on the same key as the leader, so the sequence would be CTRL-f-f. That even works mnemonically if we interpret "f" as "forward" or something. Now let's find keys to create splits. At first I thought "something around f" would be good, the idea being that I'd hit the key with the same finger I used to hit "f" a moment before. But that's actually slower than hitting a key with another finger on the same hand. The way I type, I hit "f" with my index finger and my middle finger naturally hovers over "w" in the process. So CTRL-f-w it is, after all that's again a mnemonic, for "window" this time. But what kind of split should it be, horizontal or vertical? I settled on horizontal because in dvtm's default layout, the first split will also be horizontal (left/right). So I'll get some consistency after all. The other key that's quick for me to hit is "e" so CTRL-f-e shall be the vertical (top/bottom) split. That leaves removing the current split, and for that CTRL-f-r is good enough, again with some mnemonic goodness. Here's the summary:

  • CTRL-f-f switch to next split
  • CTRL-f-w split horizontally (and start a new shell)
  • CTRL-f-e split vertically (and start a new shell)
  • CTRL-f-r remove current split (and terminate the shell)

Sounds workable to me. All that remains is actually making the three programs behave "similarly enough" for those keystrokes. As per usual, we'll start with screen. The file to create/edit is ~/.screenrc and here's what we'll do:

hardstatus ignore
startup_message off
escape ^Ff
bind f eval "focus"
bind ^f eval "focus"
bind e eval "split" "focus" "screen"
bind ^e eval "split" "focus" "screen"
bind w eval "split -v" "focus" "screen"
bind ^w eval "split -v" "focus" "screen"
bind r eval "kill" "remove"
bind ^r eval "kill" "remove"

I'll admit right away that my understanding of "hardstatus" is lacking. What I am hoping this command does is turn off any status line that would cost us terminal real estate: I want to focus on the applications I am using, not on the terminal multiplexer and what it thinks of the world. The "startup_message" bit just makes sure that there's no such thing; many distros disable it by default anyway, but for some strange reason Ubuntu leaves it on. (I am all for giving people credit for their work, but that message requires a key press to go away and therefore it's annoying as heck.)

The remaining lines simply establish the key bindings described above, albeit in a somewhat repetitive manner: I define each key twice, once with CTRL and once without, that way it doesn't matter how quickly I release the CTRL key during the sequence of key presses. (If you have a more concise way of doing the same thing, please let me know!)

We should briefly look at what we lose compared to the default key bindings. Our use of "f" overrides "flow control" and luckily I cannot think of many reasons why there should be "flow control" these days. Our use of "w" overrides "list of windows" but since I intend to mostly use splits that's not a big problem. Our use of "e" comes for free because it's not used in the default configuration. Finally, our use of "r" overrides line-wrapping, something I don't imagine caring about a lot. So we really don't lose too much of the basic functionality here, do we?

Next we need to configure tmux. The file to create/edit is ~/.tmux.conf and here's how that one works:

set-option -g status off
set-option -g prefix C-f
unbind-key C-b
bind-key C-f send-prefix
bind-key f select-pane -t :.+
bind-key C-f select-pane -t :.+
bind-key w split-window -h
bind-key C-w split-window -h
bind-key e split-window
bind-key C-e split-window
bind-key r kill-pane
bind-key C-r kill-pane

Different syntax, same story. First we switch off the status bar to get one more line of terminal real estate back. Then we change the leader key to CTRL-f and define (repetitively, I know) the key bindings we settled on above. Easy. (Well, except for figuring out the select-pane arguments, I need to credit Josh Clayton for that. Yes, it's in the man page, but it's hard to grok at first.)

What do we lose? By using "f" we lose the "find text in windows" functionality, something I don't foresee having much use for. By using "w" we lose "choose window interactively" which seems equally useless. Luckily "e" is once again a freebie, a key not used by the default configuration. Finally, by using "r" we lose "force redraw" which is hopefully not something I'll need very often. Seems alright by me!

Last but not least, let's configure dvtm. In true suckless style there is of course no configuration file because that would "attract too many users with stupid questions" and the like. (Arrogance really is bliss, isn't it? Let me just say for the record that I really appreciate the suckless ideals and their software. But that doesn't change the fact that configuration files are convenient for all users (not just idiots!) who don't want to manually compile every last bit of code on their machines. But I digress...) So we have to edit config.h and recompile the application, which in Gentoo amounts to (a) setting the savedconfig USE flag, (b) editing the file

/etc/portage/savedconfig/app-misc/dvtm-0.14

and then (c) re-emerging the application. Here's the (grisly?) gist of it:

#define BAR_POS BAR_OFF
#define MOD CTRL('f')
...
{{MOD, 'f',}, {focusnext, {NULL}}},
{{MOD, CTRL('f'),}, {focusnext, {NULL}}},
{{MOD, 'w',}, {create, {NULL}}},
{{MOD, CTRL('w'),}, {create, {NULL}}},
{{MOD, 'e',}, {create, {NULL}}},
{{MOD, CTRL('e'),}, {create, {NULL}}},
{{MOD, 'r',}, {killclient, {NULL}}},
{{MOD, CTRL('r'),}, {killclient, {NULL}}},

There are a few more modifications I didn't show, but just to remove existing key bindings that conflict with ours. Which brings us to the question of what we lose. Our use of "f" costs us the ability to choose one specific layout, something I can live without for sure. Our use of "w" is a freebie, it's unused by default. Our use of "e" costs us "copymode" which, so far, I didn't need; eventually I may have to revisit this decision and maybe remap the functionality elsewhere. Finally, our use of "r" costs us being able to "redraw" the screen, something I hope I won't need too much.

Wow, what a project. Had I known I'd spend about 10 hours on learning all the relevant things and writing this blog post, maybe I would not ever have started. But now that it's all done, I am actually enjoying the fruits of my labor: Splitting my terminal regardless of what ancient UNIX system I am on? Using fancier tools on newer machines, including my own? Debugging client/server stuff with a lot less need to grab the mouse and click something? It's paradise! Well, close to it anyway...

Update 2016/04/23: My "unified" configuration files are now available on github.com if you want to grab them directly.