Linux Kernel Module Programming

Kernel modules are pieces of code that get loaded into the kernel at runtime: no reboot, no recompilation. They run in ring0, the same privilege level as the kernel itself. Drivers, filesystems, network stacks: all of these can be kernel modules. And yeah, rootkits too.

Linux uses two rings: ring0 for the kernel, ring3 for userland. User processes can’t touch kernel memory, can’t execute privileged instructions, can’t do shit outside their sandbox. Kernel modules bypass all of that. They run with full access to everything: hardware, memory, every process on the system. That’s why only root (strictly, a process with CAP_SYS_MODULE) can load them.

Load a module:

insmod module.ko

Unload it:

rmmod module

That’s it. Now let’s write one.

Hello World

The bare minimum is a module that prints a message on load and unload:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("");
MODULE_DESCRIPTION("Hello World");

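/* Runs once when the module is loaded; a negative errno return aborts the load. */
static int __init mod_init(void)
{
	printk(KERN_ALERT "Hello world!\n");
	return 0;
}

/* Runs once when the module is unloaded. */
static void __exit mod_exit(void)
{
	printk(KERN_ALERT "Bye world\n");
}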

module_init(mod_init);
module_exit(mod_exit);

module_init and module_exit register the entry and exit functions. The function names don’t matter; the macros handle the mapping. printk is the kernel’s printf: it writes to the kernel log, not stdout. KERN_ALERT is the priority level (there are 8, from KERN_EMERG down to KERN_DEBUG, defined in linux/kern_levels.h on current kernels). If the priority is higher than the console log level, the message also prints to the console. Otherwise check dmesg.
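To make the level rule concrete, here is a small user-space sketch (not kernel code). The helpers klevel_name and reaches_console are hypothetical names made up for illustration; the level numbers and macro names are the kernel’s own. The rule modeled is that a message reaches the console when its level number is lower than the console log level, which is the first number in /proc/sys/kernel/printk.

```c
#include <stddef.h>

/* Kernel log levels 0..7 and the macro that selects each one. */
static const char *const klevel_names[8] = {
	"KERN_EMERG", "KERN_ALERT", "KERN_CRIT", "KERN_ERR",
	"KERN_WARNING", "KERN_NOTICE", "KERN_INFO", "KERN_DEBUG",
};

/* Hypothetical helper: macro name for a level, NULL if out of range. */
static const char *klevel_name(int level)
{
	if (level < 0 || level > 7)
		return NULL;
	return klevel_names[level];
}

/* Hypothetical helper: a lower number means a higher priority. */
static int reaches_console(int level, int console_loglevel)
{
	return level < console_loglevel;
}
```

With a console log level of 4, a KERN_ALERT (1) message shows up on the console, while a KERN_INFO (6) message only lands in the log buffer.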

Makefile:

obj-m += hello.o
all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Compile with make, load with insmod hello.ko, check dmesg for output. modinfo hello.ko shows the metadata you set with the MODULE macros.

For multi-file modules:

obj-m += big.o
big-objs += one.o two.o

Procfs

Linux exposes kernel and process info through /proc, a virtual filesystem. /proc/version gives you the kernel version, /proc/<pid>/ has per-process info. Modules can create their own entries here to communicate with userland.
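From the userland side, a /proc entry is read like any other text file. A minimal sketch, assuming nothing beyond the C standard library; read_text_file is a hypothetical helper name, not part of any kernel or libc API:

```c
#include <stdio.h>

/*
 * Hypothetical helper: read up to cap-1 bytes of a text file into buf
 * and NUL-terminate it. Returns the byte count, or -1 on error.
 */
static long read_text_file(const char *path, char *buf, size_t cap)
{
	FILE *f;
	size_t n;

	if (cap == 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, cap - 1, f);
	buf[n] = '\0';
	fclose(f);
	return (long)n;
}
```

Calling read_text_file("/proc/version", buf, sizeof(buf)) on a Linux box fills buf with the kernel version string. The same call works on any entry a module creates.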

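Each procfs entry is a proc_dir_entry struct. The interface shown here is the legacy one: create_proc_entry and the read_proc/write_proc callbacks were removed in kernel 3.10, where proc_create with a file_operations table (struct proc_ops since 5.6) took over. On an older kernel, the fields that matter: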

name - entry name in /proc
mode - permissions (e.g. 0666)
read_proc - function called when userland reads the entry
write_proc - function called when userland writes to it

The read callback:

typedef int (read_proc_t)(char *page, char **start, off_t off,
                          int count, int *eof, void *data);

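page is the buffer you write into, off is the offset the reader has reached, and count is how many bytes the reader wants. Set *eof to 1 when there is nothing left, and return the number of bytes written.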

The write callback:

typedef int (write_proc_t)(struct file *file, const char __user *buffer,
                           unsigned long count, void *data);

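buffer points to what userland wrote (a userspace pointer, hence __user), count is its size in bytes. Return the number of bytes consumed, or a negative errno on failure.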

Here’s a module that creates /proc/test_proc. You can write to it and read back whatever you wrote:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <linux/string.h>
#include <linux/uaccess.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("");
MODULE_DESCRIPTION("Simple procfs module");

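static char our_buf[256];

/* Called when userland reads /proc/test_proc. */
static int buf_read(char *buf, char **start, off_t offset, int count, int *eof, void *data)
{
	*eof = 1;	/* everything fits in one call */
	/* scnprintf returns what was actually written, never more than count - 1 */
	return scnprintf(buf, count, "%s", our_buf);
}

/* Called when userland writes to /proc/test_proc. */
static int buf_write(struct file *file, const char __user *buf, unsigned long count, void *data)
{
	if (count > sizeof(our_buf) - 1)
		count = sizeof(our_buf) - 1;	/* 255 bytes max, one left for '\0' */
	if (copy_from_user(our_buf, buf, count))
		return -EFAULT;		/* bad userland pointer */
	our_buf[count] = '\0';
	return count;
}

static int __init start_module(void)
{
	struct proc_dir_entry *de = create_proc_entry("test_proc", 0666, NULL);

	if (!de)
		return -ENOMEM;
	de->read_proc = buf_read;
	de->write_proc = buf_write;
	sprintf(our_buf, "hello");
	return 0;
}

static void __exit exit_module(void)
{
	remove_proc_entry("test_proc", NULL);
}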

module_init(start_module);
module_exit(exit_module);

Test it:

echo "foo" > /proc/test_proc
cat /proc/test_proc

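The copy_from_user call is important: you can’t just dereference userland pointers from kernel space. The function handles the address space boundary safely and returns the number of bytes it could not copy, so a nonzero result means the pointer was bad. The 255-byte cap prevents buffer overflows on our_buf and leaves one byte for the terminating '\0'.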
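The clamp-then-terminate logic can be tried out in user space. This is only a model: store_clamped is a hypothetical name, and memcpy stands in for copy_from_user, so it shows the bounds handling and nothing about the user/kernel address check.

```c
#include <string.h>

#define BUF_SIZE 256

/*
 * User-space model of the write handler's bounds logic. memcpy stands
 * in for copy_from_user here, so this shows the clamping only, not the
 * user/kernel address-space check.
 */
static size_t store_clamped(char dst[BUF_SIZE], const char *src, size_t count)
{
	if (count > BUF_SIZE - 1)
		count = BUF_SIZE - 1;	/* leave one byte for the terminator */
	memcpy(dst, src, count);
	dst[count] = '\0';
	return count;
}
```

A 1000-byte write is cut down to 255 bytes, and dst[255] holds the terminator, so the last index of the 256-byte buffer is never overrun.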

Notifiers

The kernel has a notification system for broadcasting events to interested subsystems: notifier chains. When something happens (key press, network event, module load), the kernel walks a linked list of notifier_block structs and calls each registered callback.

struct notifier_block {
	int (*notifier_call)(struct notifier_block *self, unsigned long x, void *data);
	struct notifier_block *next;
	int priority;
};

notifier_call is your callback. priority controls execution order (higher = called first, usually just set to 0). next chains to the next registered block.
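The mechanics are easy to model outside the kernel. The sketch below is a user-space toy, not the kernel’s implementation: every toy_ name and the TOY_ return codes are made up for illustration, and there is no locking. It does keep the two properties described above: the list stays sorted by descending priority, and a walk calls each callback in that order.

```c
#include <stddef.h>

#define TOY_DONE 0	/* keep walking the chain */
#define TOY_STOP 1	/* stop after this callback */

struct toy_notifier {
	int (*call)(struct toy_notifier *self, unsigned long event, void *data);
	struct toy_notifier *next;
	int priority;
};

/* Insert nb so the list stays sorted by descending priority. */
static void toy_register(struct toy_notifier **head, struct toy_notifier *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Unlink nb from the chain; returns 0 on success, -1 if not found. */
static int toy_unregister(struct toy_notifier **head, struct toy_notifier *nb)
{
	for (; *head; head = &(*head)->next) {
		if (*head == nb) {
			*head = nb->next;
			return 0;
		}
	}
	return -1;
}

/* Walk the chain, calling each callback; returns how many ran. */
static int toy_call_chain(struct toy_notifier *head, unsigned long event, void *data)
{
	int ran = 0;

	while (head) {
		struct toy_notifier *next = head->next;

		ran++;
		if (head->call(head, event, data) == TOY_STOP)
			break;
		head = next;
	}
	return ran;
}

/* Example callback: records its own priority in a log passed via data. */
struct toy_log {
	int seen[8];
	int n;
};

static int toy_record(struct toy_notifier *self, unsigned long event, void *data)
{
	struct toy_log *log = data;

	(void)event;
	if (log->n < 8)
		log->seen[log->n++] = self->priority;
	return TOY_DONE;
}
```

Registering blocks with priorities 0, 10, and 5 (in that order) and then walking the chain calls them as 10, 5, 0. Unregistering a block simply unlinks it, which is why a module has to do that before its code goes away.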

For keyboard events, you register with register_keyboard_notifier. The callback gets a keyboard_notifier_param struct with the key value. You only act on the KBD_KEYSYM stage; that’s when the key has been fully processed.

Here’s a dumb random number generator that uses key presses as entropy. It is not cryptographically sound, just a demo of the notifier mechanism:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/keyboard.h>
#include <linux/notifier.h>
#include <linux/proc_fs.h>
#include <linux/string.h>

MODULE_LICENSE("GPL");

static unsigned long long random_num;

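static int kbd_notify(struct notifier_block *self, unsigned long stage, void *data)
{
	struct keyboard_notifier_param *param = data;
	int value;

	/* Only act on fully processed key presses, not releases or earlier stages. */
	if (stage != KBD_KEYSYM || !param->down)
		return NOTIFY_DONE;

	value = param->value - 0xf000;
	if (value <= 0)		/* multiplying by 0 would pin the state at 0 */
		return NOTIFY_DONE;

	if (random_num > 1000000000)
		random_num -= 10 * value;
	else
		random_num *= value;

	return NOTIFY_DONE;
}

static struct notifier_block kbd_nb = {
	.notifier_call = kbd_notify,
};

static int random_read(char *buf, char **start, off_t off, int count, int *peof, void *data)
{
	*peof = 1;	/* the number always fits in one call */
	return sprintf(buf, "%llu", random_num);
}

static int __init random_init(void)
{
	struct proc_dir_entry *de = create_proc_entry("random_simple", 0444, NULL);

	if (!de)
		return -ENOMEM;
	de->read_proc = random_read;
	random_num = 1;		/* seed before the first callback can fire */
	register_keyboard_notifier(&kbd_nb);
	return 0;
}

static void __exit random_exit(void)
{
	/* Tear down in reverse order of setup. */
	unregister_keyboard_notifier(&kbd_nb);
	remove_proc_entry("random_simple", NULL);
}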

module_init(random_init);
module_exit(random_exit);

Load it, press some keys, then cat /proc/random_simple to see the “random” number. Every keypress mutates the value. Unload with rmmod. Always unregister your notifier in the exit function, or the kernel will call into freed memory and oops or panic.

The notify chain pattern shows up everywhere in the kernel: it’s the same mechanism used for module load notifications, network events, CPU hotplug. Understanding it here means you’ll recognize it when you see register_module_notifier, register_netdevice_notifier, etc.
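To see how the value moves, the arithmetic from kbd_notify can be lifted into a plain user-space function. This is a sketch for tracing by hand; mix is a hypothetical name, and the inputs below are arbitrary numbers rather than real key values.

```c
/*
 * User-space copy of the arithmetic in kbd_notify: subtract while the
 * state is above one billion, multiply otherwise.
 */
static unsigned long long mix(unsigned long long state, int value)
{
	if (state > 1000000000ULL)
		state -= 10 * value;
	else
		state *= value;
	return state;
}
```

Starting from 1, the values 97 and 98 give 97 and then 9506. A value of 0 in the multiply branch drives the state to 0, and it stays there for good, which is one more reason to filter events before mixing them in.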

The Linux Kernel Module Programming Guide