Understanding Keyboard Events BetterWritten at 2023-12-02 - Updated at 2023-12-04
In this essay, I describe how I made a Node.js module to listen to keypresses across the system on Linux machines using X. This experience helped me grasp how the OS and Window Managers handle keyboard inputs, clarifying the reasons behind an unexpected behavior I had encountered before, which I also mention in the essay.
If you’re interested in learning more about how keyboard events are handled, this essay might be of interest to you.
An Issue About Remapping Keys
As a person who uses VIM for many things I do, seamlessly transitioning between VIM modes is essential for my workflow. However, the default key for returning back to normal VIM mode is the Escape key, necessitating me to remove my hands from the keyboard in order to reach it.
This is exactly why many VIM users, including myself, remap the Escape key to the Caps Lock key. This adjustment is particularly beneficial for those who are already accustomed to pressing the Shift key for uppercase characters.
However, I’ve encountered a minor issue with this setting: Certain applications seem indifferent to the remappings I’ve configured. Occasionally, when I press the Caps Lock key, the operating system interprets it as an Escape key press, while the specific program I’m using still recognizes it as the original physical key pressed.
I have encountered this problem both in my Linux computer, and Windows computer. So the problem itself is OS-agnostic. However I’ve unintentionally identified the reason behind this occasional discrepancy while I was working on a recreational project.
Linux and Keyboard Events
Recently, a friend asked me if it’s possible to create macros using Node.js. I confidently said, ‘Sure, it’s probably easy.’ After a quick search, I found a desktop automation library called robotJS and wrote a simple script where specific keys are pressed regularly by the script.
However, I started wondering if it was possible to trigger those keypresses after a user presses a certain key. To achieve this, I needed to listen to keypress events on a system-wide level. I searched for suitable Node.js libraries for this task on Linux, but I couldn’t find one that worked seamlessly.
There were libraries like
iohook, but they seemed to lack support for
listening to Linux keyboard events in the latest versions of Node.js. Some
solutions only focused on capturing keyboard events within the current window
associated with the process.
I stumbled upon a library called
xev-emitter but it didn’t provide what I
needed as it mainly dealt with listening to xevents of a specific X windows.
After some contemplation, I decided to create my own Node.js module using
xinput underneath, just for the sake of it and out of curiosity.
is a Linux tool that allows listening to keyboard events and provides an
interface to monitor events from connected keyboards.
For instance, running the command
xinput gives me a list of available input
devices connected to my PC:
⎡ Virtual core pointer id=2 [master pointer (3)]
⎜ ↳ Virtual core XTEST pointer id=4 [slave pointer (2)]
⎜ ↳ 2.4G Mouse id=10 [slave pointer (2)]
⎜ ↳ 2.4G Mouse Consumer Control id=11 [slave pointer (2)]
⎜ ↳ Synaptics TM3336-004 id=14 [slave pointer (2)]
⎣ Virtual core keyboard id=3 [master keyboard (2)]
↳ Virtual core XTEST keyboard id=5 [slave keyboard (3)]
↳ Power Button id=6 [slave keyboard (3)]
↳ Video Bus id=7 [slave keyboard (3)]
↳ Power Button id=8 [slave keyboard (3)]
↳ 2.4G Mouse id=9 [slave keyboard (3)]
↳ 2.4G Mouse System Control id=12 [slave keyboard (3)]
↳ Ideapad extra buttons id=13 [slave keyboard (3)]
↳ AT Translated Set 2 keyboard id=15 [slave keyboard (3)]
↳ 2.4G Mouse Consumer Control id=16 [slave keyboard (3)]
xinput also has a command type that lets you listen to a specific input
device. For instance, if I use
xinput test 15, it listens to the device with
the specified ID 15. When I run the command
xinput test 15 and then press the
‘a,’ ’s,’ and ’d’ keys on my keyboard, the output I get is as follows:
key press 38
key release 38
key press 39
key release 39
key press 40
key release 40
Now, with these two commands, we can iterate through all the input devices
related to keyboards and listen to them. We can create a script that first
lists the available input devices, filters them, and then runs the command
xinput test for each of them.
However, there is still a minor problem. How do we understand which key is pressed just by looking at the numbers that xinput gave us? How can we know that 38 stands for the key ‘a’?
The numbers provided by xinput are known as X Key Codes. These codes represent the physical keys pressed on the X layer. They are essentially similar to Linux Input Event Codes, which the Linux Operating System generates to represent the physical keys pressed. For reasons I’m not aware of, X Key Codes are incremented by 8 compared to Linux keycodes
Now, the challenge lies in making sense of each X Key Code. We need a mapping
between the X Key Codes and their corresponding keys. However, what they
correspond to can be configured by users. In fact, I’ve configured this using
setxkbmap -option "caps:swapescape". So, although pressing the
same keys on a physical keyboard will result in the same key codes, the
interpretation by your operating system or window management server can be
configured. Therefore, the correspondence of each keycode with what you’ve
pressed might vary from one environment to another. In the X protocol, you can
view the mapping between X Key Codes and X KeySyms by running the command
This is essentially what I did in the Node.js module I created to listen to keyboard events using X:
Obtain the list of available input devices by running
xinputas a subprocess.
Filter the devices; you don’t need all of them, just their IDs.
For each ID, run the command
xinput test id.
Use the result of
xmodmap -pketo understand the semantic meaning assigned to each physical keypress, known as a KeySym.
If you’re curious, you can check out the module I created, Node XInput Events.
Probably a Better Approach
After implementing the module mentioned earlier, I discovered the existence of a Linux utility called showkey that allows listening to pressed keys.
It’s also possible to create a similar script using showkey under the hood. In fact, this might have been a better approach compared to what I did above because it operates on a more fundamental level than X.
Similar to how we mapped between X Key Codes and their corresponding X Key Codes, we could create a mapping between Linux Event Codes and their meanings by examining the Linux source code for the input event codes
Moreover, using scripts like xinput as subprocesses under our script might not be the optimal approach for implementing an EventEmitter library to listen to system-wide keypresses. The conventional way is likely to interact with the X server using an X library. Unfortunately, I couldn’t build the nodeJS x11 library on my computer and chose not to delve into it much.
The series of experimental processes I went through greatly enhanced my understanding of what happens behind the scenes when I simply press a key on my physical keyboard.
Returning to the initial scenario I described, when you’re developing a program, you can act upon the values of key syms or key codes. While the key codes might remain the same, the key syms—the meanings attached to those key codes—can differ. It appears that some applications focus on key codes, disregarding your local options.
Essentially, at the kernel layer, there are only keycodes. Your operating system assigns meaning to these keycodes through specific configuration files, which you can either directly modify or use another program for modification (in this case, the X Window Management server). Since it’s generally more convenient to alter settings in the window management layer, most people configure their preferences through utilities provided by their window manager, and the window manager handles the interaction with the OS.
This serves as a compelling example of how casually experimenting with things can significantly contribute to one’s understanding of the core concepts they are dealing with.