file descriptors: ioctl and lseek syscalls

Learning objective

Revisit familiar patterns and round out our understanding of file descriptors

Overview

  1. ioctl(2)

    1. Background and history

    2. Entry point and codepath

    3. IOCTLs common to all file descriptors

  2. lseek(2)

    1. History and offset extension

    2. Entry point and codepath

IOCTL

  1. Commonly pronounced "eye-ock-toll"

  2. Abbreviation: Input/Output Control

  3. General purpose interface

Origins

  1. Introduced in Version 7 Unix

    1. Released in 1979
  2. Handled operations beyond read/write

  3. Became the standard method for device-specific communication

  4. Replaced the stty and gtty syscalls (now unimplemented)

Standardization

  1. Included in POSIX.1-2001

  2. Widely used in Linux and friends

  3. Compare to DeviceIoControl() in Win32

An unusual interface

int ioctl(int fildes, int request, ... /* arg */);

  1. Variable number of arguments!

  2. From the current standard:

  3. "For non-STREAMS devices, the functions performed by this call are unspecified"

    1. STREAMS is an obsolete framework for character device I/O
  4. "The ioctl() function may be removed in a future version."

In Linux

  1. man 2 ioctl

  2. In glibc:

int
__ioctl (int fd, unsigned long int request, ...)
  1. Relies on crazy macros

  2. Notice that any arguments after the first variadic arg are ignored
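To make the variadic convention concrete, here is a minimal sketch, assuming Linux and glibc, of a wrapper in the same style as __ioctl(): it reads exactly one optional argument with va_arg and hands it to the raw syscall. my_ioctl() is a hypothetical name; glibc's real wrapper goes through its internal syscall macros rather than syscall(2).

```c
#define _GNU_SOURCE
#include <stdarg.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper illustrating the variadic convention; not glibc's code. */
int my_ioctl(int fd, unsigned long request, ...)
{
    va_list ap;
    void *arg;

    /* Fetch exactly one optional argument; anything after it is never read. */
    va_start(ap, request);
    arg = va_arg(ap, void *);
    va_end(ap);

    /* Raw syscall: returns -1 and sets errno on failure. */
    return (int)syscall(SYS_ioctl, fd, request, arg);
}
```

Calling my_ioctl(fd, FIONREAD, &n, 12345) behaves exactly like my_ioctl(fd, FIONREAD, &n): the trailing 12345 is never fetched.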

Entering the kernel

SYSCALL_DEFINE3(ioctl,...)

  1. The unsigned long int request from userspace is implicitly converted to the unsigned int cmd

  2. The unsigned long arg is wide enough to hold a pointer

  3. No ksys_ioctl() here!

    1. Used to exist but was removed years ago

Overview

SYSCALL_DEFINE3(ioctl,...)

  1. Validate the fd and take a reference to the file

  2. Ask the security modules to approve the operation

  3. Perform underlying IOCTL

  4. Release the file reference

fdget() covered elsewhere

See the slides on read

  1. This check makes sure the fd is valid

Security check

security_file_ioctl()

  1. Similar to the file_permission hook covered in the write slides

  2. Checks depend on cmd

  3. Example in SELinux

  4. Not present in AppArmor

First, the common

do_vfs_ioctl()

  1. Common to any file descriptor

  2. Not specific to any filesystem or device

First, the common

do_vfs_ioctl()

FIOCLEX and FIONCLEX: Set or clear the "close-on-exec" flag

  1. Can also do this with fcntl(2) (F_SETFD with FD_CLOEXEC) or open(2) (the O_CLOEXEC flag)

  2. The fd is closed when the current process successfully calls execve(2)
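A minimal sketch, assuming Linux and glibc, contrasting the ioctl pair with the fcntl(2) route for the same per-descriptor flag; set_cloexec_ioctl(), set_cloexec_fcntl(), and get_cloexec() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>

/* Set (on != 0) or clear (on == 0) close-on-exec with the ioctl pair.
 * FIOCLEX/FIONCLEX take no argument: the request itself says what to do. */
int set_cloexec_ioctl(int fd, int on)
{
    return ioctl(fd, on ? FIOCLEX : FIONCLEX);
}

/* Same effect through fcntl(2): read-modify-write the descriptor flags. */
int set_cloexec_fcntl(int fd, int on)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags);
}

/* Returns 1 if close-on-exec is set, 0 if not, -1 on error. */
int get_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags < 0 ? -1 : (flags & FD_CLOEXEC) != 0;
}
```

Either helper changes the flag the other one reads; open(path, O_RDONLY | O_CLOEXEC) sets it atomically at open time, avoiding a window where another thread could fork and exec first.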

First, the common

do_vfs_ioctl()

FIONBIO: Uses ioctl_fionbio() to set or clear the nonblocking IO flag

  1. Note that a single cmd both sets and clears the flag; the int that arg points to selects which
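A minimal sketch, assuming Linux and glibc, of toggling nonblocking mode with FIONBIO and checking the result through fcntl(2); set_nonblock_ioctl() and get_nonblock() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>      /* errno/EAGAIN, used in the usage example */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>     /* pipe()/read(), used in the usage example */

/* One request, FIONBIO, both sets and clears O_NONBLOCK:
 * the int pointed to by the argument selects which. */
int set_nonblock_ioctl(int fd, int on)
{
    int val = on ? 1 : 0;
    return ioctl(fd, FIONBIO, &val);
}

/* Returns 1 if O_NONBLOCK is set on the open file description, 0 if not,
 * -1 on error. */
int get_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : (flags & O_NONBLOCK) != 0;
}
```

Once nonblocking, a read() from an empty pipe fails immediately with EAGAIN instead of sleeping.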

First, the common

do_vfs_ioctl()

FIOASYNC: Uses ioctl_fioasync() to enable or disable asynchronous IO notifications

  1. Note -ENOTTY means this IOCTL doesn't apply to this fd

  2. Makes sense: f_op->fasync() must be defined
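A minimal sketch, assuming Linux and glibc, of FIOASYNC succeeding on a pipe (pipes implement f_op->fasync()) and failing with ENOTTY on /dev/null (whose file operations do not). set_async() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Enable (on != 0) or disable SIGIO-style async notification on fd.
 * Returns 0 on success; -1 with errno == ENOTTY when the request would
 * change the FASYNC state but f_op->fasync() is missing. Requesting the
 * state the fd is already in returns 0 without touching fasync(). */
int set_async(int fd, int on)
{
    int val = on ? 1 : 0;
    return ioctl(fd, FIOASYNC, &val);
}
```

Enabling notification alone delivers no signals; an owner still has to be set, for example with fcntl(fd, F_SETOWN, getpid()).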

First, the common

do_vfs_ioctl()

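FIOQSIZE: Get the number of bytes allocated to a file (via inode_get_bytes())

  1. Works for directories and links, not just regular files

  2. For a directory, this is the space allocated to its entries, not the total size of the files it contains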
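A minimal sketch, assuming Linux and glibc (which exposes FIOQSIZE through <sys/ioctl.h>), of querying a directory and being refused on a pipe; query_size() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>      /* ENOTTY, used in the usage example */
#include <fcntl.h>      /* open(), used in the usage example */
#include <sys/ioctl.h>
#include <unistd.h>     /* pipe(), used in the usage example */

/* FIOQSIZE writes a 64-bit loff_t through the argument pointer; long long
 * has the same size. Returns 0 on success, -1 with errno set (ENOTTY for
 * anything other than a directory, regular file, or symlink). */
int query_size(int fd, long long *out)
{
    return ioctl(fd, FIOQSIZE, out);
}
```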

First, the common

do_vfs_ioctl()

FIFREEZE and FITHAW: freeze or thaw a filesystem

  1. Useful for snapshotting and backups

  2. Interaction with write(2) covered in the write slides

  3. Uses ioctl_fsfreeze() and ioctl_fsthaw()

First, the common

do_vfs_ioctl()

FS_IOC_FIEMAP: Get the physical layout of a file on disk

  1. Useful for optimization and defragmentation

  2. See ioctl_fiemap() for more info

First, the common

do_vfs_ioctl()

FIGETBSZ: get the block size of a filesystem

  1. Reads s_blocksize from the superblock of this file's inode

  2. Not always relevant

  3. A simple operation
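A minimal sketch, assuming Linux and glibc with the kernel UAPI header <linux/fs.h> installed, of asking for a filesystem's block size; fs_block_size() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* open(), used in the usage example */
#include <linux/fs.h>    /* FIGETBSZ */
#include <sys/ioctl.h>

/* Block size of the filesystem holding fd, or -1 on error.
 * The kernel copies the superblock's s_blocksize into an int. */
int fs_block_size(int fd)
{
    int bsz = 0;
    if (ioctl(fd, FIGETBSZ, &bsz) < 0)
        return -1;
    return bsz;
}
```

Any open fd on the filesystem works, including a directory opened with O_RDONLY | O_DIRECTORY.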

First, the common

do_vfs_ioctl()

FICLONE, FICLONERANGE, and FIDEDUPERANGE: Copy-on-write file cloning

  1. First can clone a whole file (ioctl_file_clone())

  2. Second can clone part of a file (ioctl_file_clone_range())

  3. Third can deduplicate data across multiple files (ioctl_file_dedupe_range())

demo

A simple cp implementation
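A minimal sketch of such a cp, assuming Linux and glibc with <linux/fs.h> installed; simple_cp() and copy_loop() are hypothetical names, not the demo's code. It first tries FICLONE (the argument is the source fd, and the ioctl is issued on the destination), which only succeeds on filesystems with reflink support such as Btrfs or XFS, and otherwise falls back to an ordinary read/write loop.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>    /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/* Plain copy through a buffer, handling short writes. */
static int copy_loop(int in, int out)
{
    char buf[65536];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* Copy src to dst, preferring a copy-on-write clone. Returns 0 or -1. */
int simple_cp(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* Clone shares extents; on any failure (e.g. EOPNOTSUPP, EXDEV)
     * fall back to copying the bytes ourselves. */
    int ret = ioctl(out, FICLONE, in) == 0 ? 0 : copy_loop(in, out);

    close(in);
    if (close(out) < 0)
        ret = -1;
    return ret;
}
```

On tmpfs or ext4 the clone fails and the loop does the work; on a reflink-capable filesystem the copy completes without reading any file data.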

First, the common

do_vfs_ioctl()

FIONREAD: How many bytes left to read in a file?

  1. For anything but a regular file, this falls through to vfs_ioctl(), one place IOCTL calls into a filesystem and/or module

  2. For a regular file, this is a simple subtraction: i_size minus the current offset
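A minimal sketch, assuming Linux and glibc, of the subtraction in action: after seeking 30 bytes into a 100-byte file, 70 bytes remain. bytes_left() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* open(), used in the usage example */
#include <sys/ioctl.h>
#include <unistd.h>      /* write()/lseek(), used in the usage example */

/* Bytes still readable from fd, or -1 on error. For a regular file the
 * kernel answers i_size - f_pos itself; for pipes, sockets, and ttys the
 * request is passed on to the underlying implementation. */
int bytes_left(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```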

First, the common

do_vfs_ioctl()

FS_IOC_GETFLAGS and FS_IOC_SETFLAGS: Set and get file flags

  1. Different from those that can be set with open(2) or fcntl(2)

  2. Many are persistent beyond this fd

  3. E.g. FS_APPEND_FL makes a file append-only

  4. E.g. FS_IMMUTABLE_FL makes a file immutable

  5. Uses ioctl_getflags() and ioctl_setflags()

First, the common

do_vfs_ioctl()

FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR: Get and set extended filesystem-level attributes

  1. Exposes struct fsxattr: extended inode flags, extent size hints, project ID

  2. Originally XFS-specific, later hoisted into the VFS

  3. Related to, but different from, xattrs (name/value pairs such as SELinux labels)

Regular files

file_ioctl()

A couple of commands only relevant for regular files, including:

  1. Mapping logical to physical block numbers

  2. Allocate uninitialized space for a file

  3. Deallocate the physical space for a file

  4. Zero out a file range

Next, the specific implementation

vfs_ioctl()

  1. Call f_op->unlocked_ioctl() if it exists

  2. Unlocked == the global Big Kernel Lock (BKL) is not taken

  3. The BKL was removed long ago, so this is the only option

This concludes ioctl(2)

LSEEK

  1. Short for "long seek"

  2. Changes the offset of an open file

  3. Implies a historical non-long seek
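A minimal sketch, assuming Linux and glibc, of the three classic whence values; current_offset() and file_size() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* open(), used in the usage example */
#include <unistd.h>

/* Report the current offset without moving it: seek 0 bytes from SEEK_CUR. */
off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Return the file size by seeking to the end, then restore the old offset. */
off_t file_size(int fd)
{
    off_t saved = current_offset(fd);
    if (saved < 0)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0 || lseek(fd, saved, SEEK_SET) < 0)
        return -1;
    return end;
}
```

After writing "abcdefghij", lseek(fd, 3, SEEK_SET) followed by lseek(fd, 2, SEEK_CUR) leaves the offset at 5, so the next read() returns 'f'.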

History of lseek

  1. In the beginning (~1970), there was seek()

  2. Used a signed 16-bit offset

  3. Very limited!

  4. 2^15 bytes per file

History of lseek

  1. lseek() was introduced to support much larger files

  2. Now, the offset was a signed 32-bit integer

  3. Files could be an entire 2GB!

  4. POSIX standardized lseek() but not seek()

  5. Therefore, seek() found the dustbin of history

Current standard

"...off_t shall be [a] signed integer [type]" -- POSIX

  1. off_t => __kernel_off_t in <linux/types.h>

  2. __kernel_off_t => __kernel_long_t in <asm-generic/posix_types.h>

  3. Finally: __kernel_long_t => long in the same file, per POSIX

Longer offsets

  1. An loff_t, however, is a long long (64-bit)

  2. On 64-bit systems, the long type is also 64 bits

  3. 2^63 bytes = 8,589,934,592 GiB (8 EiB)

  4. This should be enough for all humans
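The limits quoted in these slides all follow from one formula: a signed offset of width n can address 2^(n-1) bytes. A minimal sketch (max_file_bytes() is a hypothetical helper name):

```c
#include <stdint.h>

/* Bytes addressable by a signed offset of the given width: offsets run
 * from 0 to 2^(bits-1) - 1, so 2^(bits-1) bytes. Valid for bits 1..64. */
uint64_t max_file_bytes(unsigned bits)
{
    return (uint64_t)1 << (bits - 1);
}
```

With 16 bits this gives 32,768 bytes (seek()), with 32 bits 2,147,483,648 bytes (2 GiB, 32-bit lseek()), and with 64 bits 2^63 bytes, which is 8,589,934,592 GiB.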

Back to the code

SYSCALL_DEFINE3(lseek,...)

  1. Another ksys_* instance

  2. Used by the 32-bit compatibility entry point too

A familiar pattern

ksys_lseek()

  1. Get a valid reference to the file descriptor or exit

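  2. Make sure whence is within range (at most SEEK_MAX)

    1. whence selects how offset is applied: from the start (SEEK_SET), the current position (SEEK_CUR), the end (SEEK_END), ...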
  3. Perform the operation

  4. Check for overflow: downcast the 64-bit result to off_t, upcast it back, and return -EOVERFLOW if the value changed

  5. Release the reference and return
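The downcast/upcast check can be modeled in userspace. A minimal sketch, assuming a 32-bit off_t (stood in for by int32_t) receiving a 64-bit loff_t result; narrow_lseek_result() is a hypothetical name, not a kernel function.

```c
#include <errno.h>
#include <stdint.h>

/* Model of the overflow check: downcast the 64-bit result, upcast it
 * again, and compare. If the value changed, it did not fit in the
 * narrower type, so report -EOVERFLOW instead. Negative error codes
 * such as -EINVAL fit and pass through unchanged. The out-of-range
 * cast is implementation-defined in C; GCC and Clang wrap modulo 2^32. */
int32_t narrow_lseek_result(int64_t res)
{
    int32_t retval = (int32_t)res;       /* downcast */
    if (res != (int64_t)retval)          /* upcast and compare */
        retval = -EOVERFLOW;
    return retval;
}
```

On 64-bit systems off_t and loff_t are the same width, so this check can only trip on 32-bit ABIs.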

The long becomes longer

vfs_llseek()

  1. Bail if the file is not seekable (no FMODE_LSEEK), e.g. a pipe, socket, or FIFO

  2. ESPIPE is a specific error for seeking on a pipe

  3. If all goes well, call into the filesystem or module

  4. llseek: long long seek (64-bit)
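From userspace, the ESPIPE bail-out is easy to observe. A minimal sketch, assuming Linux and glibc; is_unseekable() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* open(), used in the usage example */
#include <unistd.h>

/* Returns 1 if fd refuses seeks with ESPIPE (pipe, FIFO, socket), 0 if a
 * no-op seek succeeds, -1 on any other error. Seeking 0 bytes from
 * SEEK_CUR leaves the offset of a seekable file unchanged. */
int is_unseekable(int fd)
{
    if (lseek(fd, 0, SEEK_CUR) >= 0)
        return 0;
    return errno == ESPIPE ? 1 : -1;
}
```

A pipe's read end reports 1, while /dev/null reports 0 because its driver accepts (and ignores) seeks.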

This concludes lseek(2)

Summary

Many system calls have a varied and interesting history that explains many of their quirks

Summary

ioctl(2) provides a versatile way to implement all sorts of interfaces to kernel modules

Summary

Though quite a simple syscall, understanding lseek(2) provides insight into Linux, Unix, and computer history.

Summary

After seeing six syscall implementations, many common patterns should become apparent

Summary

This code is being actively worked on upstream. Contribute!

End