E0 initial submission: due tonight
P0 released: due in two weeks
CLARIFICATION: When re-submitting your peer review, reply to the original cover letter and not to your own email
When grading your peer review, we will only look at the latest reply to the latest cover letter
Change the line "set sort=threads" to "set sort=reverse-threads" to make the newest emails appear first, which may help
Suggestion for homework workflow: make a private fork of the ILKD_submissions repository and push your local changes to GitHub as a backup
Please don't hesitate to ask any questions in #questions
Take note of the new "Practical Reference" section at the bottom of this page
Last week, we started building a minimal Linux distribution
During Lecture 02, we compiled and booted the kernel using QEMU and a stub init program packaged in an initramfs
During Lecture 03, we compiled busybox to upgrade our userspace with basic core Linux utilities and a shell
As a reminder, busybox is a single binary containing many common Linux CLI utilities like mkdir, ln, and cat
ps and reboot don't work because we are missing the /proc filesystem
Let's take a brief detour to add a couple of kernel-backed filesystems to our system
A kernel-backed filesystem is one whose contents are generated by the kernel rather than read from a storage medium
The /proc filesystem provides access to internal kernel data structures and exposes a number of configuration knobs
The sysfs filesystem mounted at /sys is similar but structured differently
The most important feature of /proc is that it provides information about running processes
We can use the mount utility to add /proc and /sys to our system
The kernel immediately populates the contents of both
We can add commands to our init script to mount these filesystems automatically
We will return to /proc and /sys later in the course
We add a C compiler and C library to our system
We use the Tiny C Compiler (tcc), since it is tiny and has some fun features
We use the GNU C Library (glibc) as our C library, since it's the most widely used
Build the C compiler (tcc) and C library (glibc) and install them into the initramfs
We create a directory tree to package as a cpio archive that we can use as the kernel's initial filesystem (initramfs)
Without a dynamic linker, executable binaries must be statically linked
The C runtime includes the headers and libraries necessary to run a C program, as well as a dynamic linker
First, we configure, build, and install tcc into our root filesystem directory tree
Second, we configure, build, and install glibc into our root filesystem directory tree
We compile and run a "Hello world" C program to demonstrate that our system works
Our first attempt yields a couple of "file not found" errors that we can fix by specifying additional include and link paths
We eliminate the need to specify these options at each tcc invocation by defining environment variables in our init script
With these fixes, tcc works as expected and we build and run "hello world" successfully
With our VM containing core utilities, a C compiler, and a C runtime, we have a minimal Linux distribution ready to roll
We now turn our attention to some of the advanced features of C used frequently by the Linux kernel
We will look at stringification and token/string concatenation
We will see some variations of for_each* macros
We will see examples of assembly source files that combine usage of assembly and C macros
This section contains a rundown of the commands and scripts we use in this demo.
To begin, create a directory and either link to or install a built Linux source tree in the linux subdirectory.
Starting the system quickly: start_vm.sh
To avoid rebuilding the cpio archive by hand and manually editing and invoking QEMU, we can use a simple script to package whatever is in the rootfs directory as a usable initramfs and immediately run QEMU.
find . lists the files inside the subtree of the filesystem starting at the current directory. All paths listed in the output are relative to the current directory.
The cpio utility requires a list of paths to files to include in the archive. The program reads this list from the standard input stream.
Therefore, we can pipe the standard output stream from find . into cpio -co to create a cpio-formatted archive of a filesystem directory tree starting from the current directory.
$ cat start_vm.sh
#!/bin/sh
# package our initial root filesystem tree for use as initramfs
cd rootfs
find . | cpio -co > ../rootfs.cpio
cd ..
# invoke QEMU with our kernel image and the initramfs from above
# note: a comment cannot follow a line-continuation backslash, so the options are described here
# -machine virt: machine type (virt is a general purpose option)
# -cpu cortex-a53: cpu model (cortex-a53 is an arbitrary choice -- it's used in one of the raspberry pi computers)
# -smp 1: smp = symmetric multi-processing, and we specify that we only require a single virtual CPU core
# -m 1024: m = memory, and we only need 1024MB
# -kernel: path to the Linux kernel image
# -initrd: path to the file containing either the initial root filesystem (initramfs) or initial ramdisk (initrd)
# -display none: don't display any video output
# -serial stdio: connect the terminal's standard input/output to the serial console
# -no-reboot: exit instead of rebooting
# -append: add these arguments to the Linux kernel boot commandline options
#   console=ttyAMA0 will use the AMA0 device serial port as the main system console
#   this is the main serial port on the raspberry pi that we are sort of virtualizing
#   panic=-1 will set the kernel to reboot immediately in the case of a kernel panic
qemu-system-aarch64 \
    -machine virt \
    -cpu cortex-a53 \
    -smp 1 \
    -m 1024 \
    -kernel linux/arch/arm64/boot/Image \
    -initrd rootfs.cpio \
    -display none \
    -serial stdio \
    -no-reboot \
    -append "console=ttyAMA0 panic=-1"
More information about the arguments can be found in this documentation
Our init script
This is the /init we used at the end of L03, installed in our root filesystem as /init.
$ cat rootfs/init
#!/bin/ash
exec ash
Recall that /bin/ash is a symlink to /bin/busybox generated by the busybox build system.
We initially booted our system without these symlinks (and just the /busybox binary) and therefore we needed a different init script:
$ cat rootfs/init
#!/busybox ash
exec /busybox ash
I made the mistake of attempting to use the first script in this situation, which led to some confusion. This second script fixed the problem and booted correctly, though using our system was annoying since every command needed to be prefixed by /busybox, e.g. /busybox mkdir.
This /init script is enhanced to automatically mount /proc and /sys:
$ cat rootfs/init
#!/bin/ash
# mount <device> <path> -t <filesystem type>
mkdir /proc
mount none /proc -t proc
mkdir /sys
mount none /sys -t sysfs
exec ash
We have a short article about /proc available containing material we will return to later in the course.
Make sure this script is executable!
Otherwise, the kernel will fail to run /init and fall back to trying to execute several other paths before reaching a panic.
Build busybox for our rootfs:
Build busybox from source.
$ git clone git://busybox.net/busybox.git
$ cd busybox && git checkout 1_36_stable # Use the latest stable branch instead of master
$ make defconfig # Generate the default build configuration file .config
We patch the default configuration to build busybox as a statically-linked binary and disable the tc utility that breaks compilation.
@@ -40,7 +40,7 @@ CONFIG_FEATURE_SYSLOG=y
#
# Build Options
#
-# CONFIG_STATIC is not set
+CONFIG_STATIC=y
# CONFIG_PIE is not set
# CONFIG_NOMMU is not set
# CONFIG_BUILD_LIBBUSYBOX is not set
@@ -968,8 +968,8 @@ CONFIG_PSCAN=y
CONFIG_ROUTE=y
CONFIG_SLATTACH=y
CONFIG_SSL_CLIENT=y
-CONFIG_TC=y
-CONFIG_FEATURE_TC_INGRESS=y
+# CONFIG_TC is not set
+# CONFIG_FEATURE_TC_INGRESS is not set
CONFIG_TCPSVD=y
CONFIG_UDPSVD=y
CONFIG_TELNET=y
These options should be set using a tool like make menuconfig.
For make menuconfig, users of newer gcc versions may need to patch the busybox source like so:
diff --git a/scripts/kconfig/lxdialog/check-lxdialog.sh b/scripts/kconfig/lxdialog/check-lxdialog.sh
index 5075ebf2d..4e138366d 100755
--- a/scripts/kconfig/lxdialog/check-lxdialog.sh
+++ b/scripts/kconfig/lxdialog/check-lxdialog.sh
@@ -47,7 +47,7 @@ trap "rm -f $tmp" 0 1 2 3 15
check() {
$cc -x c - -o $tmp 2>/dev/null <<'EOF'
#include CURSES_LOC
-main() {}
+int main() {}
EOF
if [ $? != 0 ]; then
echo " *** Unable to find the ncurses libraries or the" 1>&2
To build and install in the rootfs, assuming busybox and the rootfs reside in the same parent directory:
make
make install
cp -r _install/* ../rootfs
This is where we left off at the end of L03, but with the addition of /proc and /sys mounted by the /init script seen above.
Clone tinycc
git clone git://repo.or.cz/tinycc.git
If for some reason the mob branch is broken, this was the HEAD commit used in this demo:
3b943bec5de423e234b5f92d9a8f110ad66a85a1
Configure and build:
./configure
make
Note: It is no longer necessary to compile a statically linked binary since we are about to add a dynamic linker. One may choose to build either a statically linked or dynamically linked binary at their own discretion.
Since there is no obvious equivalent of DESTDIR, we can use this simple script to install tcc in our rootfs:
#!/bin/sh
ROOTFS=../rootfs
mkdir -p $ROOTFS/bin
mkdir -p $ROOTFS/lib/tcc/include
mkdir -p $ROOTFS/include
cp tcc $ROOTFS/bin/tcc
cp libtcc1.a $ROOTFS/lib/tcc/libtcc1.a
cp runmain.o $ROOTFS/lib/tcc/runmain.o
cp bt-exe.o $ROOTFS/lib/tcc/bt-exe.o
cp bt-log.o $ROOTFS/lib/tcc/bt-log.o
cp bcheck.o $ROOTFS/lib/tcc/bcheck.o
cp include/float.h $ROOTFS/lib/tcc/include/float.h
cp include/stdalign.h $ROOTFS/lib/tcc/include/stdalign.h
cp include/stdarg.h $ROOTFS/lib/tcc/include/stdarg.h
cp include/stdatomic.h $ROOTFS/lib/tcc/include/stdatomic.h
cp include/stdbool.h $ROOTFS/lib/tcc/include/stdbool.h
cp include/stddef.h $ROOTFS/lib/tcc/include/stddef.h
cp include/stdnoreturn.h $ROOTFS/lib/tcc/include/stdnoreturn.h
cp include/tccdefs.h $ROOTFS/lib/tcc/include/tccdefs.h
cp include/tgmath.h $ROOTFS/lib/tcc/include/tgmath.h
cp include/varargs.h $ROOTFS/lib/tcc/include/varargs.h
cp tcclib.h $ROOTFS/lib/tcc/include/tcclib.h
cp libtcc.a $ROOTFS/lib/libtcc.a
cp libtcc.h $ROOTFS/include/libtcc.h
Get the GNU C library (glibc) and create build and staging directories:
git clone git://sourceware.org/git/glibc.git
mkdir glibc-build /tmp/glibc-staging
cd glibc && git checkout release/2.39/master
glibc has a reputation for being difficult to build from source, but we have relatively few obstacles to deal with to accomplish our purpose.
Note of little importance: GNU/Hurd mentioned in INSTALL
Configure glibc to use the standard /usr prefix, then build and install it (including headers) into the staging directory and copy the result into the rootfs:
cd ../glibc-build
../glibc/configure --prefix=/usr
DESTDIR=/tmp/glibc-staging make install
DESTDIR=/tmp/glibc-staging make install-headers
cp -r /tmp/glibc-staging/* ../rootfs
At this point, we have a dynamic linker, so we no longer need to compile busybox as a static binary. We can recompile busybox to use the dynamic linker and our system will continue to work.
Because of some configuration quirks, we will need to pass -I/lib/tcc/include -L/lib/tcc as arguments to each invocation of tcc.
To avoid the need to type this every time, we can modify init like so:
#!/bin/ash
# mount <device> <path> -t <filesystem type>
mkdir /proc
mount none /proc -t proc
mkdir /sys
mount none /sys -t sysfs
export CPATH="/lib/tcc/include"
export LIBRARY_PATH="/lib/tcc"
exec ash
At this point, we can compile and run C programs in our minimal Linux distribution.
One fun little feature of tcc is the ability to run C files as a script.
Using a hello world program like this:
$ cat hello.c
#!/bin/tcc -run -I/lib/tcc/include -L/lib/tcc
#include <stdio.h>
int main(void) {
    printf("Hello, world!\n");
    return 0;
}
We can execute as follows:
$ chmod +x hello.c
$ ./hello.c
Hello, world!
Funky C interlude
Is this valid?
int main(void) {
int;
;short
;;;;;int;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;float;
void;;;;;;
return 0;
}
What is the meaning of this C expression: 0["Hello"]?
How about: 5["Hello"]?
This course assumes advanced knowledge of the C language and compilation process.
We have a short article that breaks down the four stages of the C source to binary compilation process for anyone who would like a quick refresher.
A quick preprocessor-centric tour of some funky looking kernel code
Definition of functions by macros using token concatenation, such as some of the first couple of macros defined in the arm64-specific atomic.h. Make sure to also be aware of stringification and the fact that adjacent string literals are concatenated by the compiler in a translation phase that runs after macro expansion
for_each* macros like for_each_prime_number and list_for_each
A combination of C macros and arm64 assembly macros in the arm64 entry source, where Linux defines the entry points for arm64 syscalls. We will return to this later.