1. I Asked AI to Rebase My Git Branch and Accidentally Discovered the Future (and Uncovered the Past)

    One of the things I initially missed most when shifting from emacs to zed was magit, which has been my main git interface for, IDK, let's say about 20 years 🤷.

    I'm a bit of a sloppy worker, but a meticulous git user, lots of atomic annotated commits, many linear branches, I'm cherry picking and rebasing, and stashing, and magit integrates all of this naturally into the normal flow.

    Zed has git integration, obviously, but it's entirely basic - pull, push, commit, branch, diff, that's about it. It does let you stage incrementally, which is most of the problem solved, and of course, there is a terminal integration, so you can pop a shell and use git CLI (like a farmer 🤪).

    It's OK, but clunky, obviously. A little bit of friction and dissonance in this slick world of modern native editors that were actually made this century. I think the kids are a lot looser with git than I am, and perhaps they have a point, but I like the discipline.

    Here comes the machine mind

    You probably figured out what's coming next, if you bothered to read this far.

    I've been using the zed LLM integration to write commit messages for a while - it's pretty well suited to that. Given a bit of context, you can generate a draft commit message that summarises the changes, and you tweak and approve it before you apply it. It's a pretty good example of the kind of low-hanging, small improvement you can achieve with even a simple model, precisely applied to a narrow context that involves generating prose. Smoothing out busywork.

    Obviously, zed is pretty agentic 🤢, because that stupid word is all the rage these days. I guess you can open a chat box and ask your editor to vibe code your whole application. (Good luck with that, if you do, I think that's liable to create more work in integration than it saves in writing, maybe that's just me)

    I use the agents for a bit of boilerplate here and there - refactor this, replace these magic numbers with proper constants, redo this part to use an iterator, what is the type checker complaining about here, how do I configure the language server to disable a misfeature. Again, it's not too bad at doing this kind of drone effort, and there is a small but appreciable productivity gain to be had.

    I cross the streams, and surprise myself

    This last week, I was suddenly inspired to cross the streams, and something interesting happened, surprising me enough to bother drafting a post on the topic. I don't really like to thought-leader, but occasionally something will delight me enough to want to share.

    I wanted to tidy up a messy WIP branch that collected a couple of different ideas in progress (and had also coincided with me correctly figuring out how to enable auto-linting in zed, so I had a lot of aesthetic formatting corrections dropped suddenly into an already untidy sandbox).

    A minute or two fiddling with rebase in magit, but in zed...? Time to roll up my sleeves, and flex, and take out the ol' git pitchfork, or hitch up the rebase propagator to the reflog tractor (I do not really know what farmers do). But, wait. I wonder if...

    So I pull up the agent and ask it "can you run an interactive rebase on this branch please, and group all the white-space only changes into one commit, the other formatting changes into another, and then separate the removal of the obsolete class from the other feature work?"

    And, it kind of worked! It got stuck a couple of times, and I had to pop in and edit a couple of things, and I restarted it over with the order of the commits I wanted a little more explicitly instructed once I recognised where the conflicts were going to land, but I got the result I intended, certainly with no more fiddling about than I would have had to do if I'd been performing the task manually, maybe less? It's hard to measure, but I enjoyed the experience.
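
    The first step of that request, telling white-space only changes apart from everything else, can be sketched in a few lines. This is a rough illustration and not what the agent actually ran: the function names and the sample data are hypothetical, and the test is deliberately crude, since white-space can be significant (inside string literals, or as indentation in some languages).

```python
def is_whitespace_only_change(before: str, after: str) -> bool:
    """Return True when two versions of a text differ only in whitespace."""
    if before == after:
        return False  # nothing changed at all
    # Drop every whitespace character and compare what is left.
    return "".join(before.split()) == "".join(after.split())


def sort_changes(changes):
    """Split (path, before, after) tuples into whitespace-only paths and the rest."""
    whitespace_only, other = [], []
    for path, before, after in changes:
        if is_whitespace_only_change(before, after):
            whitespace_only.append(path)
        else:
            other.append(path)
    return whitespace_only, other


# Hypothetical sample data: one reformatted file, one real edit.
changes = [
    ("a.py", "x = 1\n", "x  =  1\n\n"),
    ("b.py", "x = 1\n", "x = 2\n"),
]
print(sort_changes(changes))
```

    Git's own `git diff -w` ignores white-space in a similar spirit, which is a handy way to check what is left once the formatting noise is out of the way.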

    One thing I definitely realised. It was less irritating. I was able to complete a disruptive, yet necessary chore, while in the middle of doing something more interesting, with much less context switching than performing it manually. And that got me thinking about interfaces a bit more. Chat bots are unquestionably a very ergonomic interface.

    Increasingly, I am starting to think that a key part of unlocking the value of LLMs, may be through thinking about them more as solutions for interface design problems.

    Considering git interfaces

    Git is a classic case. Git's interface sucks space balls, everyone knows it. I mean I know a few people who like it, and I'm happy for them (but I think they are weirdos). It does allow for a bunch of studly machismo and nerd flexing, for anyone who has invested enough time in learning its arcana to impress people with stunt-git trick shots, and that can be fun. I have definitely enjoyed being the knuckle-cracking "stand back everyone and stop panicking, I know how to fix this" guy on a number of occasions, but that is a sideshow. You can do circus tricks with power tools, and some people do, but it's not the reason the tools were made.

    Git has a compelling storage model, and a commit graph workflow that solves a bunch of annoying challenges with incremental and concurrent code editing and integration, efficiently and better than previous source code version systems. That's why it became a huge success.

    Git's horrible ergonomics and implicit barriers to entry, but compelling powers of sharing and integration, allowed GitHub to spring into existence, as a multi trillion zillion company from out of nowhere, just by slapping a nicer set of user abstractions on top of git's ugly robotic core. (in the process accidentally inventing some other significant ergonomic problems, like pull request based workflows, and IDK, tag driven releases, but I guess that's a different blog post)

    Git's foul ergonomics are what pulled me into learning magit, which has a reasonably steep learning curve of its own, but also follows the emacs way of having a lovely manual. It's a better thought out UI. It also leverages many common emacs behaviours, so when you're working in emacs already, which I typically was, again, you get this reduction of context switching.

    The mythical 'Flow State'

    Why is this so important? At this point, it's tempting to dive off into a side bar about "programmer flow state", a long held shibboleth of the developer community with which I don't have much truck, but many thousands of words can be found about it on the web already. I don't like anything that reinforces programmer identity as a higher state of being, and I dislike how easily this concept is weaponised towards shunning collaboration and social work (e.g. "coders must not be interrupted during holy flow state"). Again, this is clearly a different adjacent blog post, but the notion is not fundamentally baseless.

    Programming tasks often require holding a lot of accrued context about a chain of thought, and carefully expressing those in a narrow, precise domain, incrementally progressing towards a well defined future state. It's not easy for a human mind to do that, it takes a bit of effort. Effortlessly breaking out of one domain into another mode of expression isn't really possible. To do that even passably well, perhaps you would need a different kind of "mind", even?

    I think version control is interesting here, precisely because it's liminal stuff. It's programming-adjacent work in a certain sense, it's a chore - any time you have to break thread to address some version control nonsense or busy work, you are in essence interrupting yourself. It makes a lot of sense to try and find a low friction simplified user interface to mediate these kinds of tasks. Like GitHub, or magit. The ideal is to minimise the amount of disruption you face while working on these background or side channel threads of work. You can mitigate this in two main ways I think. You can look for ways to simplify the UI to better suit a particular working context, as we have been discussing; alternatively you can divide the work and make the secondary context a first class task that's managed separately.

    One way to do this is to structure the way you work so you can plan your version control stories a little ahead of time, and use discipline in your task management to make it better fit onto the VC, with strategies like formal branching and ticketing protocols, rigorous task mapping that accounts for tech debt, probably integrated into a project management system.

    Another way you can do it is to make it literally someone's job and push the load out sideways - examples of this might be code review protocols, gatekeepers for merges (in the olden times, with less sophisticated version control software, I've often worked on teams where there was a nominated 'merge master', whose entire job, or a large portion of it would be to basically do the integration and harder version control work on behalf of the feature developers), also other adjacent roles like scrum masters, or DBAs.

    How about SQL?

    That train of thought got me thinking about SQL. I think SQL is another interesting example of 'programming-adjacent work', although it might be a bit more subtle of an example than it first appears. Let's have another digression then. I really like SQL, although it's obviously covered in warts and sharp edges, I've always appreciated its utility, and to a certain extent its ergonomics. It does share some of the properties I've discussed with git - it's very often a boundary, and context shift away from the main thread of programming, and programmers tend to hate it, and like to avoid it, and make up nasty memes about it for slack, and that kind of thing - just like version control (or meetings 😘), it's essential and necessary work in a lot of software development, but it's another liminal place, where you have to get pulled out of the context of thinking about your feature work, and software architecture, and land in another place for a while with an annoying external syntax, and a lot of aggravating round trips into different tooling.

    Indeed, over the years, a very common programming pattern is to try and slap a more program-ergonomic abstraction layer in front of the SQL, once again to try and minimise the friction and narrow the interface - I'm thinking of things like various ORMs, 'noSQL' database engines that bring the data modelling and querying closer to the application layer, all of them moderately successful, and yet SQL still hangs around everywhere, slightly annoying everyone, like a remote senior cousin inevitably invited to every wide social gathering, tolerated, rather than enthusiastically invited.

    That's because SQL is already an ergonomic abstraction. It's kind of the ur-DSL. SQL is there because databases have a lot of inertia associated with them. All the data is often where the money is. Data is the raw stockpile of materials, the raw ingredients of the information that's necessary to run large information technology applications. Data tends to accrue value cumulatively, and you want to keep it all in a big lump in one place (this is why we have terms like 'data-mining' and 'data-warehousing'), so you can correlate it, and leverage sexy network effects from having it all integrated into one humongous data domain. Once you pass a certain critical data mass, you need to access it multi-modally, i.e. there will be many different use cases for the same information sets, different users and different applications will emerge from, or require access to, various intersectional pieces of that data blob. Now, reading and updating that data blob by hand, in your preferred programming stack would really sting. You'd still have to leave your application software context, but you'd now need to delve into a world of low level file systems, and data packing, and indexed data access, and write locking, and concurrent editing, and wire protocols, and cache invalidation, and the whole nine yards of that side of computer science.

    I pause a little here, because I realise I'm probably making it sound kind of fun to a particular audience segment (amongst which I include myself, periodically), but let's not lose track of the core point. If your intended task is to make a cool dating app that helps your users get laid, low level storage systems coding is a horrible, high effort context switch away from the feature track you're working on. And of course, all this data access code you're having to do while you go will also need tracking in your version control system, more context switches. Instead, we SQL.

    LLMs suck at SQL though

    As an aside, I have found LLM coding assistants to be generally pretty bad at SQL writing. I have two hand-wavy personal theories about why this is the case:

    1. Programmers, on average, in my experience are really pretty bad at SQL, probably because of the reasons we delved into above. So the training set of the models is full of low quality material. (cheap shot, maybe? 😅 Cut me some slack, I've spent many hours of my life fixing other people's bad SQL)
    2. Effective SQL generation requires a lot of external context the model doesn't have - not only the entire database schema, but also ideas about the data contents, and distributions that a pre-trained model doesn't necessarily have any access to. I think getting good SQL results from LLMs would need a lot of context prompting, or specific training. Still doesn't entirely explain why they're so bad at basic join syntax.

    I'm tempted to infer something from those two sub-points about SQL requiring fundamentally different kinds of reasoning to other forms of writing, but I'm probably just seeing the face of Poseidon appearing in the patterns of my mental sea foam... I'll leave it there for now. (a third parallel blog post? This stuff is getting fractal). BTW - The name for that fascinating phenomenon is *pareidolia*, and it's something worth keeping in mind when discussing "AI" concepts...

    The surprising invention and nature of SQL

    Ahem... Another really interesting aside is to have a look back at the history of SQL. SQL is rather old. It's basically my age, and I've already pointed out I've been using emacs professionally for at least a few decades. SQL emerges from IBM in the 1970s, as a research project, greatly influenced by E.F. Codd's classic article 'A relational model of data for large shared data banks'.

    The primary designers of SQL were Don Chamberlin and Ray Boyce, part of IBM's System R research project, who were tasked with looking into ways to apply Codd's relational / mathematical principles of database modelling to IBM's database businesses. IBM's database business at this point was most of IBM's business, and IBM's database business was pretty huge. Prior to relational databases, the existing big iron database management systems were awkward weird transactional / hierarchical databases, like IMS/360, where you pretty much had to write a specific computer program to be batch executed from a queued transaction management system. Each 'query' was more akin to an independent program. In order to change the report, you'd develop a new program, and you would need appropriate programmer time and skill to do it, and your best turnaround for results would be several hours, probably more like days.

    So the System R researchers wanted to make this more flexible, but their ambitions didn't end there. Chamberlin and Boyce wanted to make information retrieval accessible to non-programmers. Here's Chamberlin:

    "Ray and I hoped to design a relational language based on concepts that would be familiar to a wider population of users. We also hoped to extend the language to encompass database updates and administrative tasks such as the creation of new tables and views, which had traditionally been outside the scope of a query language.[...] What we thought we were doing was making it possible for non-programmers to interact with databases. We thought that this was going to open up access to data to a whole new class of people who could do things that were never possible before because they didn’t know how to program."

    Can you see where I'm heading? SQL is a frantically successful example of what we used to call '4GLs', (fourth generation languages), when I was a school kid, (although by that point, the text books - and what we didn't yet call the hype cycle - were already breathlessly excited by the imminent arrival of the Fifth Generation Languages, and systems...). The terminology is dated, and stretchy, and marketing-fed, but the central gist is - 4GLs are languages that were operating at a higher abstraction level. Your inputs and controls would describe a program at an abstraction level much higher than the operation of the system, and the 4GL would write a program for you that ran at the lower level. All a bit hand-wavy, but SQL querying has some really interesting properties related to this delegation.

    • it's declarative. Your query describes the data structures you want to retrieve, and the actual details of how the data is retrieved from disk is decided for you by a query planner.
    • it's live and interactive - you can interrogate the system interactively at a REPL
    • it's got a weird-as-hell syntax full of special cases that tries valiantly to use ENGLISH LIKE words, all IN UPPER CASE SHOUTING that deal in terms of database structures and relational operators, not computer terms like bytes and loops and sorts.

    The query planner is worth a little thought - The query planner uses a bit of maths, and some heuristics and a bunch of information and sampled data about your system, and works out a series of reads and sorts and filters that produce the data structures your query is requesting. Crucially you don't tell it HOW to do it, just WHAT you want it to do. (sorry, the upper case is a bit addicting, I'll stop). Most of the time you don't think about it too much more than that. However, most SQL database systems will show you their plans if you ask them, typically by using the EXPLAIN keyword, which will show you what the planner thinks it should do to build the result set you wanted. If you don't like what it's decided you mostly can't tell it to do things differently, but you may teach it to do things differently, typically by updating the available indexes and constraints, or maybe by re-balancing the statistics it uses to decide about cardinality and seek times, and that kind of thing.
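
    Here's a small sketch of that "ask for the plan, then teach it" loop, using the SQLite that ships with Python. The table, index name and data are made up for illustration, and SQLite's spelling is `EXPLAIN QUERY PLAN` rather than a bare `EXPLAIN`; other databases word their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"customer-{i % 50}", float(i)) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer = ?"


def plan_for(sql, params):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the textual description of the step.
    return [row[-1] for row in rows]


# Same query text both times; only the available index changes.
before = plan_for(query, ("customer-7",))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan_for(query, ("customer-7",))

print("before:", before)
print("after: ", after)
```

    The query itself never changes. The first plan should describe a scan of the whole table, and the second should mention the new index, which is the "teach it, don't tell it" part.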

    Here again we have a division of labour - the idea is that the structural and statistical and optimising and runtime bits of maintaining the SQL system can be delegated to the programmer and technician classes, who can be more concerned with the implementation and operational parts, and the query writer (a non programmer, if you remember) can just get on with expressing their tasks in a lower friction, narrowed domain, where they don't have to context switch out as hard from their task at hand, writing lovely business reports for the sales and finance teams to make slide decks from.

    Now I'm not saying that SQL writing is 1970s prompt engineering, but I'm also not not saying that, right?

    Here's Don again, after the fact:

    "Ray and I were wrong about the predominant usage of SQL. Typically, SQL is embedded in a host programming language and used by professional programmers"

    Is SQL any good though?

    SQL was, as I have said, a spectacular success. It's still bloody everywhere, fifty years on. It was also an abject failure. The syntax is a mess of unaligned clauses with special cases everywhere (INSERT and UPDATE have such radically different approaches to clauses but sort of do the same thing. So does DELETE really). You need to understand maths to do it properly. You also need to understand the precise details of the database schema. The database schema is maintained in a separate dialect that's somehow intertwined with the querying DSL and uses the same interfaces, but again obeys a different syntax and semantics. Record locking is both implicit and explicit. The concurrency model is insane and nobody actually understands it properly. NULL breaks everything including query logic. And most damningly, it's never used as an interactive REPL by non-programmers. It's folded into programs mostly. In fact these days, an eye-wateringly huge number of applications bundle an entire SQLite embedded RDBMS inside their application deployment.
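
    To make the NULL complaint concrete, here's a minimal sketch against an in-memory SQLite database. The table and names are invented for the example; the behaviour shown is standard three-valued logic, so other engines should behave the same way, but I've only run it through SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ann", None), ("bob", "ann"), ("cat", "bob")],
)

# NULL = NULL is not true; it evaluates to NULL, which Python sees as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# An equality test never matches a NULL, so ann is not found this way...
eq_rows = conn.execute("SELECT name FROM people WHERE manager = NULL").fetchall()
# ...and IS NULL has to be used instead.
is_rows = conn.execute("SELECT name FROM people WHERE manager IS NULL").fetchall()

# "Who is nobody's manager?" looks like it should return cat, but the
# subquery contains a NULL, so NOT IN can never evaluate to true.
not_managers = conn.execute(
    "SELECT name FROM people WHERE name NOT IN (SELECT manager FROM people)"
).fetchall()

print(null_equals_null, eq_rows, is_rows, not_managers)
```

    The last query is the nasty one: a single NULL in the subquery quietly empties the whole result.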

    Repeated interface patterns, and why I care

    There's a pattern emerging here I think - querying and reporting wasn't quite entirely programming adjacent busywork, but it was business-adjacent drone work, and SQL was an attempt to narrow the interface with an ergonomic DSL that got out of the user's way and reduced the scope of the context switch needed to engage with the reporting system. It kind of failed at the interface, if you ask me (and most folks who have to use it), but it did succeed amazingly at reducing the complexity of the context switch. Using SQL to mediate data persistence inside your application is kind of like using magit to fold your git operations right inside your emacs workflow, they both serve to shrink down the cost of flipping out of your primary task domain, into some other essential, dependent domain.

    Either reduce it to a tight DSL, or extract the work and organise it so it can be delegated to another worker. Sometimes, you put a DSL on top of a DSL, to tune it down even further. Sometimes you build a team to own the work in this domain. What if you build a DSL over a DSL but the DSL could sort of work like a team you can delegate tasks to? That's a compelling interface, that is.

    When I look at it like this, using an agent REPL in zed to run git operations strongly reminds me of that (failed) SQL/4GL promise. I tell the thing the result I want from it, and it builds a plan of execution, and I can iterate on that, interactively. Only this time, I'm using literal English sentences to describe the narrow domain the system has trained itself on, not some bastard awful pidgin version of it, that somehow ends up with the worst features of both natural languages, and programming languages. Sorry SQL, it has to be said. And you know what, I kind of love you anyway.

    I think this pattern can be found in several places of software development, if you squint right. Programming-adjacent work gets reduced with DSLs or narrow abstractions, which bring the benefits of reducing that tedious, expensive context switch. I guess we have lots of it in CI automation, pipeline building, that kind of thing. DevOps is built out of this stuff, infrastructure as code, deployment charts, meta deployment charts, run-books, playbooks. These domains are also places I've found LLM-based assistants super helpful - help me grind out some YAML please so I can add a pull request pipeline that does this thing I just thought of, without me having to spend quite so much time reading up the stupid YAML syntax for this week's CI system, and spending hours sitting in a push/fail/edit/push loop on some git forge. GraphQL over REST APIs - maybe? Unit test generation and test harness design is mostly busywork in a DSL following some declarable constraints. I think I already mentioned figuring out the precise type annotations for things. It makes me think that coding-assistants are perhaps more of a user interface paradigm than they are a coding one. More like a 4GL than a semantic IDE.

    In conclusion

    What's my point? I'm not completely sure (he says after several thousand words)

    • I like zed, I'm finding it more useful than I thought I would, and I've stuck using it on and off for a few months now.
    • I keep finding small useful things for LLMs to do, and I enjoy that process.
    • The most value might be in 'programming-adjacent' things - e.g. as I mentioned, I'm now personally only writing a small proportion of my commit messages by hand, and I think my commit messages are considerably better off for it.
    • Tools that reduce the context switches for 'programming-adjacent' work will win
    • Conversational interfaces are very low friction for human minds.
    • LLMs are much better than humans at context switching to a different domain, while keeping track of and applying accrued context across different tasks
    • Is there something you need to do that applies a tight but boring DSL across a defined data set? A model might actually be quite good at that. Another weird thing I've found them almost spectacularly good at is networking and debugging - describe a topology, give it a tcpdump, and watch it spit out a bunch of diagnostic suggestions and potential remedies for you to use.
    • These tools are one of the biggest tech shifts I've seen in my rather long, slightly storied, career that spans a whole bunch of tech paradigm shifts.

    Summary footnotes

    1. This is just some opinions, inspired by how astonished I was to successfully use an "agentic" tool to run some gnarly git stuff I would have had to pull out the manual to do. There's all sorts of important discourse about LLMs and the out of control tech hype cycle around them that I don't go near, and I'm not writing a manifesto.
    2. I'm not a historian, or a first hand witness of the pre-RDBMS data scene, although I did work alongside people who came from that background, so I do have a lot of secondary source exposure. I haven't done much research other than light web searching, so please take my historical characterisations with the appropriate amount of salt. I know pretty much zilch about how IMS/360 actually worked, I just remember the name, I'm generalising.
    3. I wrote this blog post by hand, it's not the sort of thing I think LLMs are much use at. I'm trying to express my own opinions here, in my own voice, because I was excited about a couple of ideas and wanted to try sharing them.
    4. I wrote this blog post in emacs though.
    5. I probably won't write any of those other parallel blog posts, who has time to write blog posts?
    6. I did use Claude to fact check the post. Don't laugh, they can actually do that sort of thing now. This stuff moves fast.
    7. Finally, I feel I ought to note that Ray Boyce died tragically young at 27 in 1974, shortly after the presentation of the SEQUEL design paper. He left an astonishingly outsize impact after such a short career. Put a dent in the world, as they say. Don Chamberlin, happily, is still with us so far as I know.
    posted by cms on
  2. Last week I started a new job and with a new job comes a new computer. These days, I am once again a fairly committed desktop linux user, for my sins, and so I asked for a Lenovo ThinkPad x270. Not without trepidation, because even though I'm a fairly expert user, it's been some time since I put a linux distribution onto a modern PC-laptop flat, and more recent hardware can present some driver challenges. It all went on pretty well, nonetheless, with just a moderate amount of tweaking, and a week or so in I can report that I'm pretty delighted with it. It's a wonderfully solid and useful piece of kit, everything works. Screen, keyboard and portability are spot on, battery life is a phenomenon, and there's at least one of every kind of useful port I care about.

    It's my first ThinkPad with a 'chiclet' keyboard. It's my first without a seven-row keyboard actually. I was a little bit worried about that. Keyboards are one of those things you like ThinkPads for, if you're the kind of person who likes ThinkPads, and of course I am. Actually, the keyboard is great. I think I can type faster on it than I can on my well-loved x220 model, which is basically my high-water mark for a laptop keyboard. Trackpoint is present and works as well as ever, trackpad is a huge improvement. I am not going to say that I wouldn't like the missing keys and ThinkLight back, but I'm not aggravated by their absence. After all I can use my 3l337 remapping skills to make sure I have everything I need somewhere that I can access it, and the less often used things can just go on mod key combinations and function shifts. It has the makings of a truly great keyboard if I'm honest, although I accept these things are subjective. There was just one amusing wrinkle though.

    [Image: IMG_20171111_150121]

    For some reason they've put the PrtSc key in where the menu key was. This seemed pretty weird, but it could be worse. At least I still have a balanced group of three modifier keys either side of the space bar. It goes LCTRL WINDOWS ALT SPACE BAR ALTGR PRTSC RCTRL. I just modified my xkb settings very slightly to redefine PRTSC, and I was back to using my happy path of SUPER LMETA LCTRL SPACE BAR RCTRL RMETA SUPER, and emacsing about with gay abandon. Right up to the first time I hit backward-sexp whilst cheerfully editing code, and to my astonishment my laptop immediately rebooted without any warnings. I was so stunned I immediately tried that again. Same result. I was dumbfounded for maybe sixty seconds before I figured it out.

    PrtSc is an old key, although unlike many of the old dedicated PC buttons, (Scroll lock anyone?), it's managed to reinvent itself for modern generations. Typically it is used to trigger a screenshot. GNOME sets it up for that, and while I was remapping it I figured I would be able to manage just fine without a dedicated key for screenshotting. Print screen often used to share a key with another ancient button, SysReq, and System Request is a really interesting beast. Turns out, even though it's not labelled like that, the PrtSc key on my x270 was also a SysReq. And system requests are the key to this laptop narcolepsy.

    System Request was a button deliberately designed to bypass as much of your software as possible, and send a hardware interrupt direct to the operating system hardware event loop. Normal keyboard handling is entirely bypassed. It's a brain probe. No matter how elaborate your interface, or hotkey macros become, you have a dedicated batphone right there on your keyboard, a zap line into the mainframe. Even in its most locked up system crash, this is a signal that could still get through.

    Originally, SysReq had its own proud dedicated button. Then, as its usage was a little bit esoteric, it became seconded to PrtSc. If you wanted to access the magic zap you still could. You just hit Alt in conjunction with PrtSc. And that happens to be the second piece of our puzzle. SysReq lingered on over there for some decades, largely entirely unused. A vestigial organ, like an appendix, or a supernumerary nipple. One of those dorky joke keys on a PC nobody understands or uses, that cool Apple systems condescendingly wink at. Linux uses it though. Linux doesn't mind being dorky, and can always use a spare modifier key. Especially one with a hardware function.

    It's called the Magic SysRq key. Linux has a special interrupt handler sat there in the kernel listening for it. You can hit Alt + SysReq and then another key, and trigger special, super low level system recovery or debugging features, such as triggering a crash dump, forcing an OOM kill, or yes, rebooting the system. And that's where I was hitting it. With my remapping in place, ALT is CTRL, PRTSC is META, so when I am editing a lisp file, and hit C-M-b to move backwards one sexp, I'm actually banging on the chord that bypasses all my software stack, and pushes a reboot lever deep in my computer's lizard brain, which it dutifully obeys. A little bit frustrating, but honestly, as soon as I figured out what must be going on, it made me chuckle out loud.

    Linux being linux, it's entirely configurable of course. You can build a kernel with the feature missing, you can disable it in software, or you can configure a bitmask to define which key sequences are trapped and acted upon. I have opted to disable it for now. I would rather have my META key where I like it to be, than have an easy access debugger's powertool. Now everything is closer to perfect.
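
    For the curious, here's a small sketch that reads the current setting and decodes the bitmask. The bit meanings are the ones I recall from the kernel's sysrq documentation for the `kernel.sysrq` sysctl, so treat the table as an assumption and check the docs for your own kernel; the function names are just illustrative.

```python
from pathlib import Path

# Assumed bit meanings for kernel.sysrq (verify against your kernel's docs).
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Turn a kernel.sysrq value into a list of enabled function groups."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting, or return None where it is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


current = read_sysrq()
if current is not None:
    print(current, describe_sysrq(current))
```

    A value of 0 is the "disable it entirely" option; a mask that leaves out 128 would keep the other functions while taking the reboot lever away from C-M-b.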

    posted by cms on
  3. Why I care about keyboard modifiers

    This is a merry dance.

    One of those things I generally expect to be part of the routine of running a linux desktop is a certain amount of manual effort necessary to keep things running smoothly. Sometimes this is the classic multi-decade horror story of "sound card" configuration. Sometimes it's the inevitable friction between under-specified hardware and volunteer-maintained drivers. Sometimes it's the CADT principle, where everything changes around you, just because it can, as part of the generational cycle of collaborative software development. Sometimes it's just the self-induced consequence of having a system where you can tweak and configure everything to work however you'd like it to, and therefore you choose to, and sometimes that's quite a deep rabbit hole. This tale covers a small handful of these categories, although it's primarily a consequence of that lattermost case.

    Mod life

    Mod keys. Modifier keys, that is. Those would be the keys you hold down alongside other keys to change their behaviour. The most obvious and venerable of these is SHIFT. Hold that down and type an alpha-numeric key, and you generate a different character. With the alphabetical keys, you GET THE CAPITAL LETTER FORM. Other keys, like the numerals, get you punctuation. You may also be aware of the Alt/AltGr modifier keys, which hang around on most keyboards, and generally allow access to a different shift level, further symbols and accented characters. And then there's control, or CTRL. Maybe you know that guy as the menu-shortcut accelerator key, or maybe for a couple of shortcuts you might use in the shell, if you're a shell user. Actually the kids all call it 'terminal' these days, because that's what Apple does. And then they all use iTerm anyway, for I don't know why really but I'm sure it's great. Anyway, calm down Mac-loving readers, this story is about how terrible linux is, you're all wonderful. So CTRL in the shell - CTRL + C cancels things, CTRL + D ends a session. CTRL + A takes you to the start of the line. CTRL + E takes you back to the end. Assuming you're using a fairly standard bash shell. Those last two are slightly more interesting, and immensely relevant to this story.

    Enter Emacs

    They're readline bindings. Readline is a GNU library used to make command line shell editing a little more interactive. And because bash is the GNU shell, it uses readline by default. Those keybindings are the default readline bindings, and they work the same in any application that uses readline. Typically this means other interactive shells. These keybindings are taken from Emacs. Emacs is a text editor, that is to say it's an application for interactively working on so-called 'plain text' files. Not that there's any such thing as a plain text file. Emacs is one of the most ancient, convoluted, complex, crufty, awkward pieces of software you're ever likely to encounter. It's one of the original fundamental components of the GNU system. You might say it was the standard editor. Emacs is also one of my all-time favourite things. So those readline keybindings we were discussing are intended to bring some of the more capable text editing commands from the GNU text editor across to the GNU shell, making use of modifier keys. Emacs really really likes modifier keys.

    Because Emacs is a very old piece of software, its design was heavily influenced by the keyboards typically used on the systems of its time. Computer use was a lot more text and command oriented, and the large, pre-PC era keyboards tended to reflect this by having a large number of mod keys and function keys available. Commonly cited examples are the MIT or Symbolics lisp machine 'space cadet' keyboards, and the Knight keyboard. As a consequence of this, emacs can understand a lot of different modifier keys, and has a UI that is organised around layering functionality onto different keyboard 'chord' operations. You really need at least two distinct mod keys as a bare minimum to get emacs to do anything useful at all. We already met CTRL a couple of paragraphs back, but you also need another key called META. Emacs uses them in fundamental, and interestingly composable ways. For example, you can move the cursor forward one position by typing CTRL + F, but you can move the cursor one word forward by typing META + F. It's powerful, and sort of intuitive once you understand the fundamentals quite well. Unfortunately, they mostly stopped making keyboards with META keys on them some while back.

    Welcoming the X Window System to the fray

    Now I am getting quite old, but I'm not ancient enough to have run emacs on pre-Internet era hardware. I did use it a little bit on 7-bit serial terminals, and limped along using ESC as a prefix modifier, like a farmer, but by the time I started really learning how to use emacs to any serious degree, I'd made the jump to UNIX machines using X11 as a graphical user terminal. Some of the UNIX workstations had a META key. Some of them didn't, but had a few other modifier keys. Increasingly, UNIX graphical workstation started to mean 'Linux and XFree86 on PC hardware'. Now IBM-derived PC keyboards don't have a META key and never did. The original PC keyboards didn't really offer many modifier keys, but by this time period, everything had mostly standardised on the 101/102 key IBM extended model archetype. This doesn't have META keys, but it does have a pair of prominent ALT modifier keys. And so, we begin to remap.

    X11 is maybe one of the canonical reference points for design by committee. Fully intended to offer a portable, graphical, networked user interface across a variety of dissimilar UNIX systems, it tries very hard to offer the broadest possible set of abstractions across similar base behaviours, trying to build a unifying API in all aspects. Screen dimensions and orientation, color model and layout, pointers, input devices, key-types, you name it. So you can usually configure your equipment in a bewildering, verging on frustratingly flexible manner. X11 is a very broad church and welcomes all kinds of keyboards. X11 allows you a whole byte for modifier keys (I think), so you can have Shift, Lock, Control, and then five others called Mod1 through Mod5. You can freely map key codes onto key symbols, and then assign key symbols to one or more modifiers. So obviously this all took fourteen hours to decipher in the first instance, but I gradually became reasonably adept at using the Xmodmap utility to set ALT to be both ALT and META, CAPSLK to be another CTRL and life was mostly good. You'd tweak your .Xmodmaprc file every time your keyboard changed significantly, load it in as part of your login, and everything would work. PC-104/105 keyboards came along, with windows keys, and this meant that you could perhaps add a SUPER or even a HYPER key, and bind those to other emacs macros. The system was working, and everyone got rich on the proceeds! Or not. Nonetheless, although linux desktop software was fairly terrible, it was a fine environment for running Emacs, and running Emacs was where most of the work got done after all.
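    For the curious, the "whole byte" of modifiers can be pictured as eight bit flags in the state mask that accompanies each key event. The following Python sketch names the flags in the bit order the core X11 protocol defines; decode_modifier_state is a hypothetical helper written for illustration, not part of any X library.

```python
# The eight modifier bits of the core X11 protocol, lowest bit first.
X11_MODIFIERS = ["Shift", "Lock", "Control", "Mod1", "Mod2", "Mod3", "Mod4", "Mod5"]


def decode_modifier_state(state):
    """Name the modifiers set in the low byte of an X11 event state mask."""
    return [name for bit, name in enumerate(X11_MODIFIERS) if state & (1 << bit)]


if __name__ == "__main__":
    # 0x0C has bits 2 and 3 set: Control plus Mod1.
    print(decode_modifier_state(0x0C))
```

    Which physical keys end up feeding Mod1 through Mod5 is exactly the part that the key symbol and modifier mappings decide, which is why ALT, META, SUPER and HYPER can be shuffled around so freely.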

    A detour into Macintosh

    Times have changed however, and systems have changed, and uses have changed, and so have I. Like everyone else, I started using laptops more. Modifier keys started getting scarce again, as the machines shrank down to be portable, and interfaces just grew ever more graphical. For about a decade, I used Macintosh systems, which represent their own series of keyboard configuration challenges. Macs are actually pretty good for modifiers, if I'm being fair - they have their own dedicated command key for all the system key shortcuts, and so you're free to map control and option as you see fit to control and meta. They even give you a little GUI configurator for managing and assigning modifier keys, which is way more convenient than spending hours searching for information about xmodmap. The main suckitude is that they don't have a symmetrical set of them on their laptop keyboards. You don't get a right hand CTRL. Symmetry is important for healthy typing habits with key chords, because it's vastly better for your hands if you use both of them for combinations. So you ideally want to be able to hold down CTRL META SUPER or some combination of them with one hand whilst you type the activation keys. So CTRL + C is best expressed as a right hand finger holding CTRL, whilst your left middle finger taps the C. So my life as a Macintosh Emacs user was constantly blighted by crazy-ass schemes to find keyboard layouts that allowed unstressful ways to type CTRL key combinations.

    Desktop Linux in the present day

    For the last few years though, I've been back on the linux horse (and why is a different story, for another day), and my main laptop, a battered lenovo ThinkPad, has a full set of three modifiers either side of the space bar, where they were intended to go. The Debian GNOME 3 desktop is configured to use the windows and menu keys for desktop commands, and the ALTGR key, which I have on the right, as some kind of compose prefix. Even though it's X.org now, not XFree86, and Xmodmap is heavily deprecated in favour of the XKeyboard and Xinput extensions, using the GNOME configurator and then some of my old Xmodmap ways, I could make this go away, and map ALTGR to a right meta, ALT to a left meta and the windows and menu key to SUPER and HYPER. The lenovo x220 I use has a particularly excellent keyboard and all was right in the world.

    And then GNOME 3.22 switched to Wayland as a display server, rather than X. And this year's Debian defaulted to this. Even though there is an X11 compatibility layer, GTK+ and GNOME on Wayland do not talk to X11 directly for mediated key events any more, and this means that Xmodmap can't be used to universally set modifier maps. GNOME 3 on Wayland will still use xkb for key configurations, and this meant another fourteen hours of fiddling about in order to come up with a keyboard scheme that works for both GNOME and legacy X using the XKeyboard extension (XKB). This was not made any easier by the fact that all the attempts to search for information on this get bogged down in legacy explanations about Xmodmap or how to enable XKB for X11. But I got there in the end.

    It seems like there's not actually any supported, or easily documented way to load user configurations into GNOME 3 + Wayland's XKB environment, so I ended up slightly disappointedly hacking them into the system options files. Of course this meant that several months after I did this, a system upgrade overwrote all of my changes, and I was left without a keyboard, and a scant recollection of how I ever did it, or what any of the bits were even called.

    Finally I fix it

    So this morning I figured out how to assemble it all again from first principles. To make it more worth my while, this time I decided to transpose all of the mod keys as I went, so I can have CTRL on the inside of META as it was originally intended to be, and push the other modifiers to the outside edge. To save myself the bother the next time this breaks underneath me, I thought I'd write down the exact sequences here. I am not going to attempt to explain XKB here. There are several documents on the web that do that job, to varying degrees of success. I'm not going to pretend that I understand how it all works, I just experimented with setxkbmap and xkbcomp under an X11 desktop until I understood how to express what I needed to work under Wayland. Here are the steps.

    System-wide keyboard configuration is fine for configuring the basic keyboard layout - using the Debian keyboard configurator, I can pick either a ThinkPad or a pc-105 model with a gb layout. The modifier layout can then be selected using xkboptions. I can tell GNOME what XKB options to apply from its database, using the dconf configuration key /org/gnome/desktop/input-sources/xkb-options.

    dconf editor in all its glory
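    The same key can also be set without the graphical editor. As a rough sketch, the Python below builds the gsettings command line for that dconf key; the option name cmswin:cms_modkeys is the custom option this post goes on to define, the helper names are invented for illustration, and it assumes the gsettings tool is available in your GNOME session.

```python
import subprocess


def xkb_options_argv(options):
    """Build the gsettings command line that sets GNOME's xkb-options key."""
    # The key holds an array of strings, written in GVariant text form.
    value = "[" + ", ".join("'{}'".format(opt) for opt in options) + "]"
    return ["gsettings", "set", "org.gnome.desktop.input-sources",
            "xkb-options", value]


def apply_xkb_options(options):
    """Run the command; only meaningful inside a GNOME session."""
    subprocess.run(xkb_options_argv(options), check=True)


if __name__ == "__main__":
    # Print rather than run, so the sketch is safe to try anywhere.
    print(xkb_options_argv(["cmswin:cms_modkeys"]))
```

    Bear in mind that setting the key replaces whatever list was there before, so any existing options need to be passed in alongside the new one.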

    If you're playing along at home, you may have spotted that cmswin is not the name of any valid xkb layout. The wrinkle is that none of the built-in options offer quite the right set of combinations. So this is how I added my own custom XKB option.

    1: Define an option

    I added a new file /usr/share/X11/xkb/symbols/cmswin to define my partial keymap.

    Its contents:

    // alts are ctrls, winkeys are metas, ctrls are supers
    partial modifier_keys
    xkb_symbols "cms_modkeys" {
        // left-hand side, inside to outside: CTRL, META, SUPER
        replace key <LALT> { [ Control_L, Control_L ] };
        replace key <LWIN> { [ Alt_L, Meta_L ] };
        replace key <LCTL> { [ Super_L ] };
        // right-hand side mirrors the left
        replace key <RALT> { [ Control_R, Control_R ] };
        replace key <MENU> { [ Alt_R, Meta_R ] };
        replace key <RCTL> { [ Super_R ] };
    }; // end
    

    That defines the option.

    2: Add it to the rules database

    Further to this, I modified /usr/share/X11/xkb/rules/evdev

    adding the line

       cmswin:cms_modkeys            =       +cmswin(cms_modkeys)  
    

    to the section

      ! option        =       symbols  
    

    I believe this adds an option named cmswin:cms_modkeys to the rules database, and maps it onto the 'cms_modkeys' entry in the 'cmswin' file in the symbols subdirectory. The convention in xkb is to name all the different symbols using the same substrings, and it's terribly confusing when you're trying to remember which part does what, although slightly helpful when you're trying to perform the reverse map and locate which file is responsible for which option, I suppose.
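    To untangle which substring does what, here is a small Python sketch that splits a rule line of this shape into its three parts. It only handles the simple "option = +file(entry)" form used above, and parse_option_rule is a hypothetical helper, not anything XKB itself provides.

```python
import re

# Option name, "=", then "+file(entry)"; layout assumed from the rule above.
RULE_RE = re.compile(r"^\s*(\S+)\s*=\s*\+(\w+)\((\w+)\)\s*$")


def parse_option_rule(line):
    """Split an '! option = symbols' rule into (option, file, entry)."""
    match = RULE_RE.match(line)
    if match is None:
        raise ValueError("not an option rule: {!r}".format(line))
    return match.groups()


if __name__ == "__main__":
    rule = "   cmswin:cms_modkeys            =       +cmswin(cms_modkeys)  "
    option, symbols_file, entry = parse_option_rule(rule)
    # option is what goes in xkb-options; symbols_file is the file under
    # symbols/; entry is the quoted xkb_symbols name inside that file.
    print(option, symbols_file, entry)
```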

    3: Make it available for GNOME

    The final step is to add the line

    cmswin:cms_modkeys   fix keys for emacs  
    

    into the file /usr/share/X11/xkb/rules/evdev.lst

    I think this does something like import the option into the environment. There is also an evdev.xml file in the rules directory, which looks like it marks up the options to be used by the GNOME gui, but I didn't bother with that one, because life is too short to hand write XML for computers to parse, and I'd already spent half a day setting this all up. To give you an idea of how tedious this all was, for a while I'd added the evdev option into the section marked !option = types rather than symbols, and this caused Wayland to get stuck in a crash loop as soon as I loaded the XKB option into the dconf key (with no visible error logs! yum!)
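    Since a slip like that is easy to make by hand, and since a system upgrade can silently undo the edit, it may be worth scripting it. The following is only a sketch: add_option_rule is an invented helper that works on the text of the rules file, puts the rule directly under the symbols header (and never the types one), and does nothing if the rule is already there. Reading and writing the real file, and doing so as root, is left out on purpose.

```python
RULE = "  cmswin:cms_modkeys\t=\t+cmswin(cms_modkeys)"


def add_option_rule(rules_text, rule=RULE):
    """Insert `rule` under the '! option = symbols' header, once only."""
    lines = rules_text.splitlines()
    # Compare on tokens so differences in spacing do not cause duplicates.
    if any(line.split() == rule.split() for line in lines):
        return rules_text  # already present, nothing to do
    for index, line in enumerate(lines):
        # Match the symbols header only, however its whitespace is laid out.
        if line.split() == ["!", "option", "=", "symbols"]:
            lines.insert(index + 1, rule)
            return "\n".join(lines) + "\n"
    raise ValueError("no '! option = symbols' section found")


if __name__ == "__main__":
    sample = (
        "! option\t=\ttypes\n"
        "  caps:internal\t=\t+caps(internal)\n"
        "! option\t=\tsymbols\n"
        "  ctrl:nocaps\t=\t+ctrl(nocaps)\n"
    )
    print(add_option_rule(sample))
```

    Running it twice over the same text leaves the second result unchanged, so it should be safe to call after every upgrade.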

    4: Retire, rich from the proceeds

    With all of this in place however, everything works fine. For now. GNOME seems to be in a bit of a transitory phase with regard to keyboard and input configuration, it looks like they're reworking everything to use IBUS in the long term, so I expect I'll be doing some form of this dance again within a year. Until then though, this document can serve as a reference for the next time I, or anyone else interested enough, need to figure out how to do this.

    2017 then, and nothing seems to have really changed that much at all. Desktop Linux is still terrible, and desktop linux is still awesome. Emacs is still terrible, and Emacs is still the best tool I have.

    posted by cms on