10. Technical Details
10.1. How They Work
insmod makes an init_module
system call to load the LKM into kernel memory. Loading it is the
easy part, though. How does the kernel know to use it? The answer is
that the init_module system call invokes the
LKM's initialization routine right after it loads the LKM.
insmod passes to init_module
the address of the subroutine in the LKM named
init_module as its initialization routine.
(This is confusing -- every LKM has a subroutine named
init_module, and the base kernel has a system
call by that same name, which is accessible via a subroutine in the
standard C library also named init_module).
The LKM author sets up init_module to call a
kernel function that registers the subroutines that the LKM contains.
For example, a character device driver's
init_module subroutine might call the kernel's
register_chrdev subroutine, passing the major
number of the device it intends to drive and the addresses of its
own routines, such as its "open" routine, among the arguments.
register_chrdev records in base kernel tables
that when the kernel wants to open that particular device, it should
call the open routine in our LKM.
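The registration step can be pictured with a toy model. The sketch below is ordinary Python, not kernel code. The table, the simplified register_chrdev signature, the kernel_open helper, the major number 240, and the name mylkm are all assumptions made up for illustration. It registers by major number only and shows nothing beyond the idea that the LKM hands the base kernel a pointer to its own routine, which the base kernel later calls.

```python
# Toy model of driver registration; not kernel code.
chrdev_table = {}  # major number -> the driver's name and routines


def register_chrdev(major, name, fops):
    """Record that device `major` is handled by the routines in `fops`."""
    if major in chrdev_table:
        return -16  # stand-in for -EBUSY: the major number is already taken
    chrdev_table[major] = {"name": name, "fops": fops}
    return 0


def kernel_open(major, minor):
    """What the 'base kernel' does when a process opens a device file."""
    entry = chrdev_table.get(major)
    if entry is None:
        return -19  # stand-in for -ENODEV: no driver registered
    return entry["fops"]["open"](minor)


# --- the "LKM" ---
def mylkm_open(minor):
    return f"mylkm opened minor {minor}"


def init_module():
    # Hypothetical major number and driver name.
    return register_chrdev(240, "mylkm", {"open": mylkm_open})
```

Before init_module() runs, kernel_open(240, 0) fails because nothing is registered. Afterward, the same call lands in mylkm_open.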
But the astute reader will now ask how the LKM's
init_module subroutine knew the address of the
base kernel's register_chrdev subroutine. This
is not a system call, but an ordinary subroutine bound into the base
kernel. Calling it means branching to its address. So how does our
LKM, which was not compiled anywhere near the base kernel, know that
address? The answer to this is insmod relocation.
insmod functions as a relocating linker/loader.
The LKM object file contains an external reference to the symbol
register_chrdev. insmod does a
query_module system call to find out the
addresses of various symbols that the existing kernel exports.
register_chrdev is among these.
query_module returns the address for which
register_chrdev stands and
insmod patches that into the LKM where the LKM
refers to register_chrdev.
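Here is a minimal sketch of that patching step, in Python rather than anything insmod actually contains. The addresses, the 12-byte "module image", and the (offset, symbol) relocation list are invented for illustration. Real relocation records come in several types (many are relative to the instruction address), which this toy ignores.

```python
import struct

# Addresses the running kernel exports (values are made up for illustration).
kernel_symbols = {"register_chrdev": 0xC0134A60, "printk": 0xC0115B20}


def relocate(image, relocations, symbols):
    """Patch each external reference in `image` with the symbol's address.

    `relocations` is a list of (offset, symbol_name) pairs, meaning "the
    4 bytes at `offset` must hold the address of `symbol_name`".
    """
    patched = bytearray(image)
    for offset, name in relocations:
        if name not in symbols:
            raise LookupError(f"unresolved symbol {name}")
        # Write the address as a 32-bit little-endian value.
        struct.pack_into("<I", patched, offset, symbols[name])
    return bytes(patched)


# A fake 12-byte module image with a zeroed 4-byte hole at offset 4.
image = b"\xAA\xBB\xCC\xDD" + b"\x00" * 4 + b"\xEE\xFF\x11\x22"
loaded = relocate(image, [(4, "register_chrdev")], kernel_symbols)
```

A reference to a symbol the kernel does not export raises an error here, which is the toy counterpart of an "unresolved symbol" failure at load time.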
If you want to see the kind of information insmod
can get from a query_module system call, look at
the contents of /proc/ksyms.
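If you want to poke at that listing from a script, something like the following may help. It assumes each line holds a hexadecimal address, a symbol name, and, for symbols that belong to an LKM, the module name in square brackets. That layout is my assumption about the file, and the sample lines are invented, so check it against your own /proc/ksyms.

```python
def parse_ksyms(text):
    """Return a list of (address, symbol, module_or_None) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        module = None
        if len(fields) >= 3 and fields[2].startswith("[") and fields[2].endswith("]"):
            module = fields[2][1:-1]  # strip the square brackets
        entries.append((int(fields[0], 16), fields[1], module))
    return entries


# Invented sample in the assumed layout.
sample = (
    "c0134a60 register_chrdev\n"
    "d0812340 fat_add_cluster\t[fat]\n"
)
table = parse_ksyms(sample)
```

On a system that has the file, parse_ksyms(open("/proc/ksyms").read()) would give the full table.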
Note that some LKMs call subroutines in other LKMs. They can do this
because of the __ksymtab and
.kstrtab sections in the LKM object files. These
sections together list the external symbols within the LKM object file
that are supposed to be accessible by other LKMs inserted in the
future. insmod looks at
__ksymtab and .kstrtab and tells
the kernel to add those symbols to its exported kernel symbols table.
To see this for yourself, insert the LKM msdos.o
and then notice in /proc/ksyms the symbol
fat_add_cluster (which is the name of a subroutine
in the fat.o LKM). Any subsequently inserted LKM
can branch to fat_add_cluster, and in fact
msdos.o does just that.
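The bookkeeping behind that can be sketched as a toy model. Nothing here is insmod's real logic. The insert_module function and its arguments are hypothetical, and the list of symbols each module needs is abbreviated. The point is only that an LKM's exported symbols join the same table that later insertions are resolved against, which is why order matters.

```python
class UnresolvedSymbol(Exception):
    pass


def insert_module(exported, name, needs, provides):
    """Toy insmod: resolve `needs` against `exported`, then add `provides`.

    `exported` maps symbol name -> owning module (None for the base kernel).
    """
    missing = [sym for sym in needs if sym not in exported]
    if missing:
        raise UnresolvedSymbol(f"{name}: unresolved symbol(s) {', '.join(missing)}")
    for sym in provides:
        exported[sym] = name


# Pretend the base kernel exports just this one symbol.
exported = {"register_filesystem": None}
```

With this table, inserting "msdos" (which needs fat_add_cluster) before "fat" raises UnresolvedSymbol. Inserting "fat" first, with fat_add_cluster among its provided symbols, lets the later "msdos" insertion succeed.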
10.2. The .modinfo Section
An ELF object file consists of various named sections. Some of them
are basic parts of an object file, for example the
.text section contains executable code that a
loader loads. But you can make up any section you want and have it
used by special programs. For the purposes of Linux LKMs, there is
the .modinfo section. An LKM doesn't have to have
a section named .modinfo to work, but the macros
you're supposed to use to code an LKM cause one to be generated, so
they generally do.
To see the sections of an object file, including the
.modinfo section if it exists, use the
objdump program. For example:
To see all the sections in the object file for the msdos LKM:
objdump msdos.o --section-headers
To see the contents of the .modinfo section:
objdump msdos.o --full-contents --section=.modinfo
You can use the modinfo program to interpret the
contents of the .modinfo section.
So what is in the .modinfo section and who uses it?
insmod uses the .modinfo section
for the following:
It contains the kernel release number for which the module was built.
I.e. of the kernel source tree whose header files were used in
compiling the module.
insmod uses that information to check whether the module was built
for the running kernel.
It describes the form of the LKM's parameters.
insmod uses this information to format the
parameters you supply on the insmod command line
into data structure initial values, which insmod
inserts into the LKM as it loads it.
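To get a feel for what such a section holds, here is a small parser sketch. It assumes the section is a run of null-terminated "key=value" strings. The keys kernel_version and parm_io and their values are hypothetical examples from my recollection of the format, not something stated above, so compare with what objdump shows for a real module.

```python
def parse_modinfo(section_bytes):
    """Split raw .modinfo bytes into a dict of key -> value strings."""
    info = {}
    for item in section_bytes.split(b"\0"):
        if not item:
            continue  # padding or the trailing terminator
        key, sep, value = item.decode("ascii", "replace").partition("=")
        if sep:  # ignore anything that is not key=value
            info[key] = value
    return info


# Hypothetical contents: a kernel release and one parameter named "io".
raw = b"kernel_version=2.4.18\0parm_io=i\0"
info = parse_modinfo(raw)
```

The modinfo program does this kind of interpretation for you. The sketch is only meant to show that there is nothing magic in the section.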
10.3. The __ksymtab And .kstrtab Sections
Two other sections you often find in an LKM object file are named
__ksymtab and .kstrtab.
Together, they list symbols in the LKM that should be accessible
(exported) to other parts of the kernel. A symbol is just a text name
for an address in the LKM. LKM A's object file can refer to an
address in LKM B by name (say, getBinfo). When
you insert LKM A, after having inserted LKM B,
insmod can insert into LKM A the actual address
within LKM B where the data/subroutine named
getBinfo is loaded.
See Section 10.1 for more mind-numbing details of symbol binding.
10.4. Ksymoops Symbols
insmod adds a bunch of exported symbols to the LKM
as it loads it. These symbols are all intended to help
ksymoops do its job. ksymoops
is a program that interprets an "oops" display. An
"oops" display is stuff that the Linux kernel displays when
it detects an internal kernel error (and consequently terminates a
process). This information contains a bunch of addresses in the
kernel, in hexadecimal.
ksymoops looks at the hexadecimal addresses, looks
them up in the kernel symbol table (which you see in
/proc/ksyms), and translates the addresses in the
oops message to symbolic addresses, which you might be able to look
up in an assembler listing.
So let's say you have an LKM crash on you. The oops message contains
the address of the instruction that choked, and what you want
ksymoops to tell you is 1) in what LKM is that
instruction, and 2) where is the instruction relative to an assembler
listing of that LKM? Similar questions arise for the data addresses
in the oops message.
To answer those questions, ksymoops must be able to
get the loadpoints and lengths of the various sections of the LKM from
the kernel symbol table.
Well, insmod knows those addresses, so it just
creates symbols for them and includes them in the symbols it loads
with the LKM.
In particular, those symbols are named (and you can see this for
yourself by looking at /proc/ksyms):
name is the LKM name (as you would see in
sectionname is the section name, e.g. .text
(don't forget the leading period).
length is the length of the section, in decimal.
The value of the symbol is, of course, the address of the section.
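The lookup that these symbols make possible can be sketched as follows. The symbol name pattern used here, __insmod_name_Ssectionname_Llength, is my recollection of what insmod generates and should be treated as an assumption. The addresses and lengths are invented. Verify the pattern against /proc/ksyms on a real system before relying on it.

```python
import re

# Assumed pattern: __insmod_<name>_S<sectionname>_L<length in decimal>
SECTION_SYMBOL = re.compile(r"^__insmod_(.+)_S(\.\S+)_L(\d+)$")


def locate(address, ksyms):
    """Return (module, section, offset) for `address`, or None.

    `ksyms` is a list of (symbol_address, symbol_name) pairs.
    """
    for start, symbol in ksyms:
        match = SECTION_SYMBOL.match(symbol)
        if not match:
            continue  # an ordinary exported symbol, not a section marker
        module, section = match.group(1), match.group(2)
        length = int(match.group(3))
        if start <= address < start + length:
            return module, section, address - start
    return None


# Invented sample table.
ksyms = [
    (0xD0812000, "__insmod_fat_S.text_L31200"),
    (0xD081A000, "__insmod_fat_S.data_L1024"),
    (0xD0812340, "fat_add_cluster"),
]
```

With this table, locate(0xD0812345, ksyms) reports module fat, section .text, offset 0x345. That offset is the number you would look up in an assembler listing of the LKM.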
insmod also adds a pretty useful symbol that tells from what file
the LKM was loaded. That symbol's name is
name is the LKM name, as above.
filespec is the file specification that was used to
identify the file containing the LKM when it was loaded. Note that it
isn't necessarily still under that name, and there are multiple
file specifications that might have been used to refer to the same
file. For example, ../dir1/mylkm.o and
mtime is the modification time of that file, in
the standard Unix representation (seconds since the start of 1970), in hexadecimal.
version tells the kernel version level for which
the LKM was built (same as in the .modinfo section).
It is the value of the macro LINUX_VERSION_CODE
in Linux's linux/version.h file. For example,
The value of this symbol is meaningless.
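Two of those fields are easy to decode by hand. The sketch below uses the arithmetic of the kernel's KERNEL_VERSION macro, (a << 16) + (b << 8) + c, which is how LINUX_VERSION_CODE is formed. It also treats the mtime field as a plain hexadecimal count of seconds. The helper names are mine, and the sketch takes an already-extracted field value as input.

```python
from datetime import datetime, timezone


def kernel_version_code(major, minor, patch):
    """Same arithmetic as the kernel's KERNEL_VERSION(a, b, c) macro."""
    return (major << 16) + (minor << 8) + patch


def decode_version_code(code):
    """Split a LINUX_VERSION_CODE value back into (major, minor, patch)."""
    return (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF


def decode_mtime(hex_text):
    """Turn the hexadecimal mtime field into a UTC datetime."""
    return datetime.fromtimestamp(int(hex_text, 16), tz=timezone.utc)
```

For instance, kernel_version_code(2, 4, 18) is 132114, which is 0x020412, and decode_version_code(132114) gives back (2, 4, 18).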
10.5. Other Symbols
insmod adds another symbol, similar to the ksymoops
symbols. This one tells where the persistent data lives in the
LKM, which rmmod needs to know in order to save
the persistent data.
10.6. Debugging Symbols
There is another kind of symbol that relates to an LKM: kallsyms
symbols. These are not exported symbols; they do not show up in
/proc/ksyms. They refer to addresses in the
kernel that are nobody's business except the module they are in, and
are not meant to be referenced by anything except a debugger. Kdb,
the kernel debugger that comes with the Linux kernel, is one user of
kallsyms symbols.
The kallsyms facility works for both the base kernel and LKMs. For the
base kernel, you select it when you build the base kernel, with the
CONFIG_KALLSYMS configuration option. When you do that, the kernel
contains a kallsyms symbol for all the symbols in the base kernel
object files. You know your base kernel is participating in the kallsyms
facility if you see the symbol __start___kallsyms in /proc/ksyms.
For an LKM, you decide at load time whether it will contain kallsyms
symbols. You include the kallsyms definitions in the data you pass to
the init_module system call to load the LKM.
insmod does this if either 1) you specify the
--kallsyms option, or 2) insmod
determines, by looking at /proc/ksyms, that the
base kernel is participating in the kallsyms facility. The kallsyms that
insmod defines are all the symbols in the LKM
object file. To wit, those are the symbols you see when you run
nm on the LKM object file.
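To list those symbols from a script, you can parse nm's default output. The sketch assumes the usual layout of a hexadecimal value, a one-letter type, and the name, with undefined symbols shown as just "U" and the name. The sample text is invented, not taken from a real LKM.

```python
def parse_nm(text):
    """Return (defined, undefined) from `nm` output.

    `defined` maps symbol name -> (value, type letter); `undefined` is a list
    of names the object file refers to but does not define (type 'U').
    """
    defined, undefined = {}, []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "U":
            undefined.append(fields[1])
        elif len(fields) == 3:
            defined[fields[2]] = (int(fields[0], 16), fields[1])
    return defined, undefined


# Invented sample output for a hypothetical mylkm.o.
sample = (
    "00000060 T init_module\n"
    "000000a0 T cleanup_module\n"
    "00000000 t mylkm_open\n"
    "         U register_chrdev\n"
)
defined, undefined = parse_nm(sample)
```

The undefined entries are the external references that insmod has to resolve against the kernel's exported symbols. The defined ones, including local ones like the lowercase "t" entry, are what a debugger would want to know about.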
Each loaded LKM that is participating in kallsyms has its own kallsyms
symbol table. When the base kernel is participating in the kallsyms
facility, the individual LKM kallsyms symbol tables are linked into a
master symbol table so that a debugger can look up a symbol anywhere
in the kernel. When the base kernel is not participating in kallsyms,
a debugger must look explicitly at a particular LKM to find symbols
for that LKM. Kdb, for one, cannot do this. So the basic rule is: If
you're going to do any kernel debugging, use CONFIG_KALLSYMS.
Note that the __kallsyms section has nothing to do
with LKMs. That's a section in the base kernel object module. The
base kernel doesn't have the luxury of something as
sophisticated as insmod to load it, so it needs
that extra object file section to facilitate its participation in
kallsyms.
Similarly, the program kallsyms has nothing to do with
LKMs. It is what creates the __kallsyms section.
There is another kind of debugging symbol -- the kind that
gcc creates with its -g option.
These are unrelated to the kallsyms facility. They do not get loaded
into kernel memory. Kdb does not use them. But Kgdb (which gets
information both from kernel memory and source and object files) does.
10.7. Memory Allocation For Loading
This section is about how Linux allocates memory in which to load an LKM.
It is not about how an LKM dynamically allocates memory, which is the same
as for any other part of the kernel.
The memory where an LKM resides is a little different from that where
the base kernel resides. The base kernel is always loaded into one
big contiguous area of real memory, whose real addresses are equal to
its virtual addresses. That's possible because the base kernel is the
first thing ever to get loaded (besides the loader) -- it has a wide
open empty space in which to load. And since the Linux kernel is not
pageable, it stays in its homestead forever.
By the time you load an LKM, real memory is all fragmented -- you
can't simply add the LKM to the end of the base kernel. But the LKM
needs to be in contiguous virtual memory, so Linux uses
vmalloc to allocate a contiguous area of virtual
memory (in the kernel address space), which is probably not contiguous
in real memory. But the memory is still not
pageable. The LKM gets loaded into real page frames from
the start, and stays in those real page frames until it gets unloaded.
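A toy model may make the "contiguous in virtual memory, scattered in real memory" idea concrete. This is not how vmalloc is implemented. The function name toy_vmalloc, the 4096-byte page size, the frame numbers, and the starting virtual page are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


def toy_vmalloc(size, free_frames, next_virtual_page):
    """Map `size` bytes onto consecutive virtual pages.

    `free_frames` is a list of free physical frame numbers, in whatever
    scattered order fragmentation has left them. Returns a page table as a
    dict of virtual page number -> physical frame number.
    """
    pages_needed = -(-size // PAGE_SIZE)  # round up to whole pages
    if pages_needed > len(free_frames):
        raise MemoryError("not enough free page frames")
    page_table = {}
    for i in range(pages_needed):
        # Virtual pages are consecutive; frames are taken as they come.
        page_table[next_virtual_page + i] = free_frames.pop(0)
    return page_table


free_frames = [17, 3, 250, 42, 8]
mapping = toy_vmalloc(10000, free_frames, next_virtual_page=0xD0812)
```

A 10000-byte LKM needs three pages here. The virtual page numbers come out consecutive while the frames behind them (17, 3, 250) are scattered. Each of those mappings is a separate page table entry, unlike the single entry that can cover the base kernel on some machines.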
Some CPUs can take advantage of the properties of the base kernel to
effect faster access to base kernel memory. For example, on one
machine, the entire base kernel is covered by one page table entry
and consequently one entry in the translation lookaside buffer (TLB).
Naturally, that TLB entry is virtually always present. For LKMs on
this machine, there is a page table entry for each memory page into
which the LKM is loaded. Much more often, the entry for a page is not
in the TLB when the CPU goes to access it, which means a slower access.
This effect is probably trivial.
It is also said that PowerPC Linux does something with its address
translation so that transferring from accessing base kernel memory
to accessing LKM memory is costly. I don't know anything solid about
that.
The base kernel contains within its prized contiguous domain a large
expanse of reusable memory -- the kmalloc pool. In some versions of
Linux, the module loader tries first to get contiguous memory from
that pool into which to load an LKM and, only if a large enough space
is not available, goes to the vmalloc space. Andi Kleen submitted code
to do that in Linux 2.5 in October 2002. He claims the difference is
in the several per cent range.
10.8. Linux internals
If you're interested in the internal workings of the Linux kernel with
respect to LKMs, this section can get you started. You should not need
to know any of this in order to develop, build, and use LKMs.
The code to handle LKMs is in the source file
kernel/module.c in the Linux source tree.
The kernel module loader (see Section 5.4) lives in
(Ok, that wasn't much of a start, but at least I have a framework here
for adding this information in the future).