The Nature of Linux Kernel Development — Difference Between Rules of Kernel Level and User-Space Application Level
Planted February 4, 2024
Preface: This article explains the distinction between the core principles of Linux Kernel development and user-space application development. The information is based on my research into Kernel development from various sources, and I have tried to make it as accurate, simple, and concise as possible.
Introduction to the Nature of the Linux Kernel
The Linux Kernel is the core of the operating system: it is the layer between the hardware and the user-space software that runs on top of it. Most application developers are familiar with the standard ways of building applications, but they work within the comfort of the operating system and often underestimate how much it provides. The Kernel sits at the lowest level, so it has none of that support to lean on. For example, an application would normally call printf to print a message. printf is not part of the C language itself; it is a library function that the C library supplies and that relies on the operating system to reach a terminal. The Kernel has no such library to link against, and on bare metal there is no console until the Kernel itself sets one up. Hence, development at such a low level is different from the usual application-level development. It’s not difficult or tedious, it’s different. I would like to present a perspective on this in the next section.
The Perspective of the Author on this Topic
These are my personal opinions. They may not be entirely accurate, but they are offered with the best intentions.
When developers learn to program, they usually learn in the comfort of an operating system, with a Kernel underneath doing its job. Application-level developers rarely need to go deep into low-level details, and it is usually better for them to sharpen their application-level skills. On the other hand, this limits their ability to step outside that zone and appreciate the low-level programs that make their lives easier. Programming at such a low level does not involve external dependencies, so it is worth noting that programming is not just about getting work done with them. External dependencies are not the language itself. Writing printf when you need console output is convenient, but printf is a library function, not part of the syntax of the programming language. The deeper you go, the more this convenience falls away, and it’s time to understand the real “syntax”. The syntax of the language stays the same, but what you implement with it changes. The code is no longer producing user-friendly output; it is defining how the underlying hardware behaves.
While studying Kernel development, I like to think of C as a kind of portable assembly. This brings the code closer to the processor and frames things from a firmware-development perspective. Since the processor ultimately executes machine instructions, and this is how the core of any application works, thinking this way makes it easier to imagine the step-by-step execution of the code.
The deeper you go, the more you get out of the illusion of development and understand the real rules of computers.
The Seven Important Points in Kernel Development
Here are the most important points about Kernel development:
- The Kernel has access to neither the C library nor the standard C headers. Nothing outside the Kernel’s own source tree is available at this level.
- The Kernel is written in GNU C.
- The Kernel lacks the memory protection afforded to user-space (so developers who are used to application-level programming need to let go of that safety net).
- The Kernel cannot easily execute floating-point operations.
- The Kernel has a small per-process fixed-size stack.
- Because the Kernel has asynchronous interrupts, is preemptive, and supports SMP, synchronization and concurrency are major concerns within the Kernel.
- Portability is important (the Kernel needs to run on a variety of CPU architectures).
No libc or Standard Headers
A header file declares functions, types, and macros so that code written once can be reused in many places. In user space, headers such as stdio.h describe the C library, an external dependency installed on the system. The Kernel cannot depend on anything installed on the system, because the Kernel is what brings the system up in the first place; underneath it there is only hardware (the motherboard, in the case of a computer). Hence, the Kernel must contain and define all of its own code. It does have its own header files for reusable code and common functions, but they were written for the Kernel’s purposes and have to be studied in their own right. For example, there is no printf in the Kernel, but there is a function called printk that records Kernel-generated messages in the Kernel log (you can use the dmesg command on your Linux system to print these messages on the console). The console is something the Kernel itself has to set up; it does not exist beforehand. printk therefore writes to the Kernel log buffer, and messages of sufficient priority are also sent to whichever console the Kernel has registered, such as the one shown on the monitor.
However, the Kernel does implement many of the familiar libc functions itself, for the parts of the code where they are essential and reusable. For example, the common string manipulation functions are implemented in lib/string.c and are made available by including linux/string.h.
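To make the idea concrete, here is a minimal sketch of string helpers written without calling any C library function, in the spirit of what lib/string.c provides. The names sketch_strlen and sketch_strlcpy are hypothetical and chosen for illustration; this is not the Kernel's actual code, and it is written so that it also compiles as an ordinary user-space program.

```c
#include <stddef.h>  /* size_t only; no C library functions are called */

/* Hypothetical helper: counts the bytes before the terminating NUL,
 * the same job strlen() does in user space. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical helper: copies at most size - 1 bytes from src to dst
 * and always NUL-terminates dst when size > 0.
 * Returns the full length of src so callers can detect truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = sketch_strlen(src);

    if (size > 0) {
        size_t n = (len >= size) ? size - 1 : len;
        size_t i;

        for (i = 0; i < n; i++)
            dst[i] = src[i];
        dst[n] = '\0';
    }
    return len;
}
```

Copying "kernel" into a 4-byte buffer with sketch_strlcpy stores "ker" and returns 6, which tells the caller that the string was truncated.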
The Use of GNU C Programming Language
Here is where Kernel development deviates from usual application-level development. Applications are typically written in standard (ANSI/ISO) C, whereas the Linux Kernel is written in GNU C, that is, standard C plus the language extensions provided by gcc. The Kernel is compiled primarily with gcc; Clang is also supported in current kernels, and the Kernel has at times also been buildable with the Intel compiler. GNU C is used because of the extensions it provides, which are essential for this kind of low-level development. Some of the essential extensions are explained here:
- Inline Functions: Both C99 and GNU C support inline functions. The body of an inline function is inserted directly at each call site. This eliminates the overhead of function invocation and return (saving and restoring registers) and allows for potentially greater optimisation, as the compiler can optimise the caller and the called function as one.
- Inline Assembly: The gcc compiler allows assembly instructions to be embedded in otherwise normal C functions. This is used in the parts of the Kernel that are specific to a given system architecture.
- Branch Annotation: The gcc compiler has a built-in directive that lets the programmer mark a conditional branch as either very likely or very unlikely to be taken, so the compiler can optimise for the common case. The Kernel wraps the directive in easy-to-use macros, likely() and unlikely().
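The three extensions above can be tried out in an ordinary user-space program built with gcc or Clang. The sketch below is only an illustration: the likely() and unlikely() definitions are simplified stand-ins for the Kernel's macros, and barrier_sketch(), clamp_int(), and checked_div() are hypothetical names that do not come from the Kernel.

```c
/* Branch annotation: simplified stand-ins for the Kernel's macros,
 * built on gcc's __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Inline assembly: an empty instruction with a "memory" clobber.
 * It emits no machine code but stops the compiler from reordering
 * memory accesses across this point. (Hypothetical name.) */
#define barrier_sketch() __asm__ __volatile__("" ::: "memory")

/* Inline function: small enough that the compiler can paste the body
 * into each caller instead of emitting a call. */
static inline int clamp_int(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* Division with the error path marked as unlikely, so the compiler
 * lays out the common case as the straight-line path.
 * Returns 0 on success, -1 if den is zero (*out is left untouched). */
int checked_div(int num, int den, int *out)
{
    if (unlikely(den == 0))
        return -1;

    *out = num / den;
    barrier_sketch();  /* placed here purely to show the syntax */
    return 0;
}
```

Calling checked_div(10, 2, &q) returns 0 and stores 5 in q, while checked_div(1, 0, &q) returns -1 without modifying q. The annotations change only how the compiler arranges the code, never the result.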
No Memory Protection
Memory protection is something application developers largely take for granted. Languages like Python manage memory automatically, whereas languages like Rust (one of my favourite languages at the time of writing this article) use mechanisms like the borrow checker to ensure safe memory management. On top of that, when a user-space program attempts an illegal memory access, the Kernel can trap the error and terminate the process, typically with a SIGSEGV signal. The Kernel itself has no such safety net: it has direct access to the machine’s memory, and nothing sits above it to catch its mistakes. An illegal access inside the Kernel typically results in a Kernel oops, and this is also where Kernel exploits come from, for example through overflows where the memory-handling logic is flawed. Hence, the Kernel has to impose rules on itself to control how memory is used and to prevent unwanted behaviour.
No (Easy) Use of Floating Point
When a user-space process uses floating-point instructions, the Kernel manages the transition from integer to floating-point mode. What the Kernel has to do differs between architectures, but it typically catches a trap and then initiates the transition from integer to floating-point mode.
The Kernel does not have the luxury of catching such a trap for its own code (it can’t easily trap itself). Using floating point inside the Kernel means manually saving and restoring the floating-point registers, among other chores. This is tedious, so floating point is avoided in Kernel code and used only where there is a strict requirement.
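One common way to live without floating point is fixed-point arithmetic on plain integers. The article itself does not prescribe this technique; the sketch below is an assumption about how fractional values could be handled, using a hypothetical Q16.16 format (16 integer bits, 16 fractional bits) with made-up names such as fp16_t and fp_mul.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point format: the value is stored as an
 * integer scaled by 2^16, so no floating-point instruction is needed. */
#define FP_SHIFT 16
#define FP_ONE   ((int32_t)1 << FP_SHIFT)  /* the value 1.0 */

typedef int32_t fp16_t;

/* Convert a whole number to fixed point (must fit in 16 bits). */
static inline fp16_t fp_from_int(int32_t v)
{
    return (fp16_t)(v * FP_ONE);
}

/* Convert back to a whole number, truncating toward zero. */
static inline int32_t fp_to_int(fp16_t v)
{
    return v / FP_ONE;
}

/* Multiply two fixed-point values. The product is formed in 64 bits
 * so the intermediate result does not overflow, then scaled back. */
static inline fp16_t fp_mul(fp16_t a, fp16_t b)
{
    return (fp16_t)(((int64_t)a * b) / FP_ONE);
}
```

For example, 2.5 is represented as fp_from_int(2) + FP_ONE / 2, and multiplying it by fp_from_int(4) with fp_mul yields the representation of 10. Range and precision are limited by the chosen format, which is the price paid for staying in integer registers.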
Small, Fixed-Size Stack
User-space applications can place large amounts of data in local variables because they have a large stack that can grow dynamically. This is not the case in the Kernel. The Kernel stack is neither large nor dynamic; it is small and fixed in size. The exact size depends on the architecture and the build configuration; historically it has been two pages, which generally means 8KB on 32-bit architectures and 16KB on 64-bit architectures. Each process receives its own Kernel stack.
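In practice this means large buffers should not be declared as local variables. The sketch below shows the habit in user-space form: the 4KB scratch buffer is allocated dynamically instead of living on the stack. malloc() and free() stand in for the Kernel's kmalloc() and kfree(), and the function name and SCRATCH_SIZE are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 4096  /* hypothetical working-buffer size */

/* Sums the bytes of data using a scratch copy.
 * Declaring "unsigned char scratch[SCRATCH_SIZE];" as a local variable
 * would consume half of an 8KB stack, so the buffer is allocated
 * dynamically instead. Returns 0 on success, -1 on failure. */
int checksum_with_heap_scratch(const unsigned char *data, size_t len,
                               unsigned int *sum)
{
    unsigned char *scratch;
    unsigned int acc = 0;
    size_t i;

    if (len > SCRATCH_SIZE)
        return -1;

    /* Kernel code would call kmalloc(SCRATCH_SIZE, GFP_KERNEL) here. */
    scratch = malloc(SCRATCH_SIZE);
    if (!scratch)
        return -1;

    memcpy(scratch, data, len);
    for (i = 0; i < len; i++)
        acc += scratch[i];

    free(scratch);  /* kfree() in the Kernel */
    *sum = acc;
    return 0;
}
```

The same reasoning applies to deep recursion and to large structures passed by value: both use up stack space that the Kernel does not have to spare.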
Synchronization and Concurrency
The Kernel is susceptible to race conditions. Unlike a single-threaded user-space application, it has many paths that can access shared resources concurrently, so synchronization is required to prevent races. The main reasons are:
- Linux is a preemptive multitasking operating system: the scheduler can stop any process when its timeslice runs out and run another one (more on this in another article). Hence, the Kernel needs to synchronise between the tasks it switches among.
- Linux supports symmetric multiprocessing (SMP). Therefore, without proper protection, Kernel code executing simultaneously on two or more processors can concurrently access the same resource.
- Interrupts occur asynchronously with respect to the currently executing code. Therefore, without proper protection, an interrupt can occur in the midst of accessing a resource, and the interrupt handler can then access the same resource.
- The Linux Kernel itself is preemptive. Therefore, without proper protection, Kernel code can be preempted in favour of different code that then accesses the same resource.
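The "proper protection" mentioned in the points above usually takes the form of a lock around the shared resource. Below is a minimal user-space sketch of a spinlock built on C11 atomics, guarding a shared counter. All names (sketch_lock, sketch_counter, and so on) are hypothetical; the Kernel has its own spinlock implementation, which additionally deals with preemption and, for data shared with interrupt handlers, with interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock: a single atomic flag that is set while held. */
struct sketch_lock {
    atomic_flag held;
};

/* Shared resource protected by the lock. */
struct sketch_counter {
    struct sketch_lock lock;
    long value;
};

/* Try to take the lock once; returns true if we now hold it. */
static inline bool sketch_trylock(struct sketch_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is free, then take it. */
static inline void sketch_lock_acquire(struct sketch_lock *l)
{
    while (!sketch_trylock(l))
        ;  /* spin */
}

/* Release the lock so another context can enter. */
static inline void sketch_unlock(struct sketch_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

void counter_init(struct sketch_counter *c)
{
    atomic_flag_clear(&c->lock.held);
    c->value = 0;
}

/* The read-modify-write on value happens only while the lock is held,
 * so two concurrent callers cannot interleave their updates. */
void counter_add(struct sketch_counter *c, long n)
{
    sketch_lock_acquire(&c->lock);
    c->value += n;
    sketch_unlock(&c->lock);
}
```

Without the lock, two processors running counter_add at the same time could both read the old value and one update would be lost. With it, the second caller spins until the first has finished.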
Portability of the Kernel
The Linux Kernel needs to be portable because of its wide range of applications: it powers large servers and infrastructure as well as embedded devices. Hence, most of the Kernel’s C code must be architecture-independent, so that it compiles and runs on a wide range of systems, while architecture-specific code is kept separate under the arch/ directory of the source tree. Running on a wide range of systems is at the heart of the Linux Kernel.
Conclusion
Kernel development is a wide topic, and its nature is very different from that of application-level development. Different does not mean difficult, though. Understanding and exploring it takes you deep into the low levels of the computer and teaches a lot about the real nature of the hardware. It’s just that most developers are so used to the rules of user-space application development that the low levels feel rigorous and tedious to them, which is not the case. This is simply how things work under the hood, and these are the rules that govern it.