file descriptors: open syscall

Learning objective

Follow the outline of the open syscall

Overview

  1. Invocation of open

  2. Internal translation to main handler

  3. Manipulating the file descriptor table

What does open need to do?

Thoughts?

  • Get a handle to an inode

  • Create entry in the file descriptor table

  • Invoke underlying filesystem-specific code

FDT: file descriptor table

struct files_struct

->files in task_struct

General process

  1. Resolve a path

  2. Perform validation

  3. Create entry in file descriptor table

The obvious entry point

SYSCALL_DEFINE3(open,...

openat(2) is preferred

force_o_largefile()

Do we have ARCH_32BIT_OFF_T?

  1. If not, all file offsets can be 64-bit

demo

strace ./twinkle_twinkle

C Library prefers openat

open becomes openat

  1. SYSCALL_CANCEL() makes the call a thread cancellation point

Non-AT versions deprecated

list of syscalls

A fake file descriptor

AT_FDCWD

  1. Open relative to current working directory
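A minimal userspace sketch of the two starting points for a relative path: a real directory fd, and the AT_FDCWD placeholder. The scratch directory template, the file name note.txt, and the function name are invented for this demo.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Open the same file twice: once relative to a real directory fd, and
 * once relative to the cwd via AT_FDCWD after an fchdir().
 * Returns 0 if both opens succeed, -1 otherwise.
 */
int demo_openat_dirfd(void)
{
	char dir[] = "/tmp/openat_demo_XXXXXX";
	int cwdfd = -1, dirfd = -1, fd1 = -1, fd2 = -1, ok;

	if (!mkdtemp(dir))
		return -1;

	cwdfd = open(".", O_RDONLY | O_DIRECTORY);
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (cwdfd >= 0 && dirfd >= 0) {
		/* AT_FDCWD is a negative sentinel, never a real table index. */
		assert(dirfd != AT_FDCWD);

		/* Relative path resolved against dirfd, not the cwd. */
		fd1 = openat(dirfd, "note.txt", O_CREAT | O_WRONLY, 0600);

		/* Move the cwd into the scratch dir, then resolve via AT_FDCWD. */
		if (fchdir(dirfd) == 0) {
			fd2 = openat(AT_FDCWD, "note.txt", O_RDONLY);
			if (fchdir(cwdfd) != 0) {
				/* Could not restore the cwd: treat the demo as failed. */
				if (fd2 >= 0)
					close(fd2);
				fd2 = -1;
			}
		}
		unlinkat(dirfd, "note.txt", 0);
	}

	ok = (fd1 >= 0 && fd2 >= 0);
	if (fd1 >= 0)
		close(fd1);
	if (fd2 >= 0)
		close(fd2);
	if (dirfd >= 0)
		close(dirfd);
	if (cwdfd >= 0)
		close(cwdfd);
	rmdir(dir);
	return ok ? 0 : -1;
}
```

With a relative path, open(path, flags) behaves like openat(AT_FDCWD, path, flags), which is why the C library can route one through the other.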

New: struct open_how

  1. Built from int flags and umode_t mode

  2. For openat, done in do_sys_open()

  3. Added in fddb5d430ad9f ("open: introduce openat2(2) syscall")

Why the need for another open?

  1. openat didn't check for unknown flags

  2. Extending with new features troublesome

New structure

struct open_how

Open's name resolution

Follow symlinks in each component of path

By default, follow for last component, unless O_NOFOLLOW

  1. O_PATH: another exception, open path without following
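The last-component rule can be observed from userspace. This sketch assumes a writable /tmp; the file names and the helper try_open() are invented for the demo. It opens a symlink three ways and records the outcome of each.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open name relative to dirfd; return 0 on success or errno on failure. */
static int try_open(int dirfd, const char *name, int flags)
{
	int fd = openat(dirfd, name, flags);

	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}

/*
 * Build target.txt and link.txt -> target.txt in a scratch directory,
 * then record the result of three opens of the link. Each output is 0
 * on success or the errno of the failure.
 * Returns 0 if the setup worked, -1 otherwise.
 */
int demo_nofollow(int *plain, int *nofollow, int *path_nofollow)
{
	char dir[] = "/tmp/nofollow_demo_XXXXXX";
	int dirfd, fd, ret = -1;

	assert(plain && nofollow && path_nofollow);
	if (!mkdtemp(dir))
		return -1;
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dirfd < 0)
		goto out_rmdir;

	fd = openat(dirfd, "target.txt", O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		goto out_close;
	close(fd);
	if (symlinkat("target.txt", dirfd, "link.txt") != 0)
		goto out_unlink;

	/* Default: the final symlink is followed to target.txt. */
	*plain = try_open(dirfd, "link.txt", O_RDONLY);
	/* O_NOFOLLOW: refuse to follow a symlink in the last component. */
	*nofollow = try_open(dirfd, "link.txt", O_RDONLY | O_NOFOLLOW);
	/* O_PATH | O_NOFOLLOW: get a handle to the symlink itself. */
	*path_nofollow = try_open(dirfd, "link.txt", O_PATH | O_NOFOLLOW);
	ret = 0;

	unlinkat(dirfd, "link.txt", 0);
out_unlink:
	unlinkat(dirfd, "target.txt", 0);
out_close:
	close(dirfd);
out_rmdir:
	rmdir(dir);
	return ret;
}
```

On Linux the O_NOFOLLOW attempt is expected to fail with ELOOP, while the other two opens succeed.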

New RESOLVE flag highlights

  1. RESOLVE_BENEATH: all resolutions in subtree

  2. RESOLVE_NO_XDEV: can't cross mount points

  3. RESOLVE_NO_SYMLINKS: Like O_NOFOLLOW for whole path resolution process
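A hedged sketch of calling openat2 directly with a RESOLVE flag. To avoid depending on <linux/openat2.h>, it uses a local mirror of struct open_how and local copies of the flag values; these, the fallback syscall number, and the function names are assumptions of this sketch. It needs a kernel with openat2 (5.6 or later) that is not blocked by a sandbox.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of the UAPI struct open_how (three 64-bit fields). */
struct demo_open_how {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/* Local copies of the RESOLVE_* values highlighted on the slide. */
#define DEMO_RESOLVE_NO_XDEV     0x01
#define DEMO_RESOLVE_NO_SYMLINKS 0x04
#define DEMO_RESOLVE_BENEATH     0x08

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumed fallback for older libc headers */
#endif

/* Thin wrapper: returns an fd, or -1 with errno set. */
static long demo_openat2(int dirfd, const char *path,
			 uint64_t flags, uint64_t resolve)
{
	struct demo_open_how how;

	assert(sizeof(how) == 24);
	memset(&how, 0, sizeof(how));
	how.flags = flags;
	how.resolve = resolve;
	return syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}

/*
 * Try to escape a scratch directory with ".." under RESOLVE_BENEATH.
 * Returns the errno of the failed open, 0 if the open unexpectedly
 * succeeded, or -1 if the setup failed.
 */
int demo_resolve_beneath(void)
{
	char dir[] = "/tmp/openat2_demo_XXXXXX";
	int dirfd, err;
	long fd;

	if (!mkdtemp(dir))
		return -1;
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dirfd < 0) {
		rmdir(dir);
		return -1;
	}

	errno = 0;
	fd = demo_openat2(dirfd, "..", O_RDONLY, DEMO_RESOLVE_BENEATH);
	err = fd < 0 ? errno : 0;
	if (fd >= 0)
		close((int)fd);
	close(dirfd);
	rmdir(dir);
	return err;
}
```

Where openat2 is available, the escape attempt is expected to fail with EXDEV. ENOSYS or EPERM suggests an older kernel or a seccomp filter.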

Creating the new structure

build_open_how()

  1. Invalid flags quietly discarded

  2. openat2 returns -EINVAL on bad flags
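The difference between the two policies can be modeled in a few lines. This is a toy userspace model, not the kernel code: DEMO_VALID_FLAGS is a made-up stand-in for the kernel's valid-flags mask, and both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's mask of known open flags. */
#define DEMO_VALID_FLAGS 0x0000ffffu

/* openat-style policy: unknown bits are masked off, the call goes on. */
uint64_t demo_flags_openat(uint64_t flags)
{
	return flags & DEMO_VALID_FLAGS;
}

/*
 * openat2-style policy: any unknown bit fails the call.
 * Returns 0 and stores the flags in *out, or returns -EINVAL.
 */
int demo_flags_openat2(uint64_t flags, uint64_t *out)
{
	assert(out != NULL);
	if (flags & ~(uint64_t)DEMO_VALID_FLAGS)
		return -EINVAL;
	*out = flags;
	return 0;
}
```

The strict policy is what lets new flags be added later: a program that passes a flag to an older kernel gets an error instead of silently different behavior.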

A note on mode

S_IALLUGO: permission bits

read, write, exec for user, group, and other

  1. Also: setuid, setgid, sticky bit
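S_IALLUGO is kernel-internal, but the same mask can be rebuilt from the userspace constants in <sys/stat.h>. The DEMO_ names and demo_mask_mode() below are invented for this sketch.

```c
#define _GNU_SOURCE
#include <sys/stat.h>

/* rwx for user, group, and other. */
#define DEMO_S_IRWXUGO (S_IRWXU | S_IRWXG | S_IRWXO)
/* Add setuid, setgid, and the sticky bit. */
#define DEMO_S_IALLUGO (S_ISUID | S_ISGID | S_ISVTX | DEMO_S_IRWXUGO)

/* Keep only the permission and special bits, dropping file-type bits. */
unsigned int demo_mask_mode(unsigned int mode)
{
	return mode & DEMO_S_IALLUGO;
}
```

The mask works out to octal 07777: three special bits plus nine permission bits.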

More validation

build_open_flags()

struct open_how => struct open_flags

Name resolution

getname() calls getname_flags()

  1. Like a copy_from_user for filenames

  2. allocates memory, performs validation

  3. namei: name-to-inode
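The validation done while copying the filename in is visible from userspace as specific errno values. A minimal sketch, assuming Linux; the function names are invented for this demo.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try to open path read-only; return 0 on success or errno on failure. */
int demo_open_errno(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}

/*
 * Build a name of PATH_MAX characters, which cannot fit in PATH_MAX
 * bytes once the terminating NUL is counted, and try to open it.
 * Returns the resulting errno, or -1 if allocation failed.
 */
int demo_too_long(void)
{
	size_t len = PATH_MAX;
	char *path = malloc(len + 1);
	int err;

	if (!path)
		return -1;
	memset(path, 'a', len);
	path[len] = '\0';
	err = demo_open_errno(path);
	free(path);
	return err;
}
```

An empty string is expected to fail with ENOENT and the oversized name with ENAMETOOLONG, both before any filesystem-specific code runs.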

easter egg?

What the heck is "omirr"?

Important enough for fundamental changes to pathname lookup

In and out of the kernel

  1. What became of "omirr"?

FDT Origin

SYSCALL_DEFINE0(fork)

kernel_clone() calls copy_process()

copy_process() calls copy_files()

copy_files()
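One observable consequence of the table being copied at fork: closing a descriptor in the child does not close it in the parent. A small sketch, assuming /dev/null is available; the function name is invented.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, let the child close a descriptor it inherited, then check that
 * the parent's descriptor is still valid.
 * Returns 1 if the parent's fd survived, 0 if not, -1 on setup failure.
 */
int demo_fork_copies_table(void)
{
	int fd = open("/dev/null", O_RDONLY);
	int status, alive;
	pid_t pid;

	if (fd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		/* Child: this clears the slot in the child's own table only. */
		close(fd);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0) {
		close(fd);
		return -1;
	}

	/* F_GETFD fails with EBADF if the slot is empty in this table. */
	alive = fcntl(fd, F_GETFD) != -1;
	close(fd);
	return alive;
}
```

The tables are separate, but both slots initially point at the same open file, so state such as the file offset is shared until one side closes or reopens.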

Back into the file code

dup_fd()

  1. New files_struct comes from a slab cache

alloc_fdtable()

Get next available fd

get_unused_fd_flags()
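"Next available" means the lowest free number, which can be checked from userspace. A sketch assuming a single-threaded caller and an available /dev/null; the function name is invented.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open two descriptors, free the lower one, and open again.
 * Returns 1 if the freed number was handed out again, 0 if not,
 * -1 on setup failure.
 */
int demo_lowest_fd_reuse(void)
{
	int a = open("/dev/null", O_RDONLY);
	int b = open("/dev/null", O_RDONLY);
	int c, reused;

	if (a < 0 || b < 0) {
		if (a >= 0)
			close(a);
		if (b >= 0)
			close(b);
		return -1;
	}
	/* a was the lowest free slot, so b must sit above it. */
	assert(b > a);

	close(a); /* a is now the lowest free slot again */
	c = open("/dev/null", O_RDONLY);
	reused = (c == a);

	close(b);
	if (c >= 0)
		close(c);
	return reused;
}
```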

Back to the FDT

do_filp_open()

  1. Relies on path_openat()

  2. Finally: do_open()

    1. Calls vfs_open()

Enter the virtual filesystem

vfs_open()

  1. Uses struct dentry in struct path

  2. dentry: cached directory entry, ties a path component to its inode

Calling into module handler

call to fops_get

call to fops->open()
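The hand-off to the module can be pictured as a function-pointer dispatch. This is a toy userspace model: every demo_ type and function is hypothetical and only loosely mirrors the kernel structures.

```c
#include <stddef.h>

struct demo_inode;
struct demo_file;

/* Toy analogue of struct file_operations with only an open hook. */
struct demo_file_operations {
	int (*open)(struct demo_inode *inode, struct demo_file *file);
};

struct demo_inode {
	const struct demo_file_operations *i_fop; /* set by the driver or fs */
};

struct demo_file {
	const struct demo_file_operations *f_op;  /* filled in at open time */
	int opened_by_driver;
};

/* Hypothetical driver handler, standing in for a module's open(). */
static int demo_driver_open(struct demo_inode *inode, struct demo_file *file)
{
	(void)inode;
	file->opened_by_driver = 1;
	return 0;
}

const struct demo_file_operations demo_driver_fops = {
	.open = demo_driver_open,
};

/*
 * Generic layer: copy the operations table from the inode to the file,
 * then call the open hook if the driver provided one.
 */
int demo_generic_open(struct demo_inode *inode, struct demo_file *file)
{
	file->f_op = inode->i_fop;
	if (file->f_op && file->f_op->open)
		return file->f_op->open(inode, file);
	return 0;
}
```

The generic code never names the driver; it only follows the pointer the inode carries, which is how one open path can serve every filesystem and device.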

Putting the file in the table

fd_install()
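Reserving a number and publishing the file are two separate steps. The toy model below uses a fixed-size table with a bitmap; all names and the table size are invented, and the real table grows dynamically and handles concurrency.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FDS 8

struct demo_file {
	int id;
};

/* Toy fd table: a reservation bitmap plus an array of file pointers. */
struct demo_fdtable {
	unsigned int open_fds;              /* bit n set: fd n is reserved */
	struct demo_file *fd[DEMO_MAX_FDS]; /* NULL until installed */
};

/* Reserve the lowest free number. Returns the fd, or -1 if full. */
int demo_get_unused_fd(struct demo_fdtable *t)
{
	for (int n = 0; n < DEMO_MAX_FDS; n++) {
		if (!(t->open_fds & (1u << n))) {
			t->open_fds |= 1u << n;
			return n;
		}
	}
	return -1;
}

/* Publish the file in a slot that was reserved earlier. */
void demo_fd_install(struct demo_fdtable *t, int fd, struct demo_file *f)
{
	assert(fd >= 0 && fd < DEMO_MAX_FDS);
	assert(t->open_fds & (1u << fd));
	t->fd[fd] = f;
}

/* Look up a descriptor: NULL if free, or reserved but not installed. */
struct demo_file *demo_fd_lookup(const struct demo_fdtable *t, int fd)
{
	if (fd < 0 || fd >= DEMO_MAX_FDS)
		return NULL;
	return t->fd[fd];
}
```

Splitting the steps means the number can be picked before the slow path lookup runs, and given back if the open fails, without the half-open file ever being visible through the table.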

demo

sudo bpftrace -e 'k:kkey_open { printf("%s\n", kstack); }'

Summary

Open resolves a path into an inode

An open file descriptor refers to a struct file in the current->files structure

The newer openat2 provides useful symlink resolution options

Everything is a file descriptor in Linux, and open is the first step

End