Gain a deeper understanding of file descriptors by seeing how read(2) uses them
read(2) entry
read(2)
Advanced reference count optimization
Reading through the virtual filesystem
SYSCALL_DEFINE3(read, ...)
Just calls ksys_read()
ksys_read()
Only one other caller in s390 compat code
Originally there were more callers
Obtain a reference to the file position or bail
Create a local copy
Perform virtual filesystem (vfs) read
If needed, update the file position
Drop any held reference
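The step order above can be modeled in userspace. This is a minimal sketch, not the kernel code: `struct model_file`, `model_vfs_read()` and `model_ksys_read()` are hypothetical stand-ins, and reference handling is left out so the local copy of the position stands out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file: a byte buffer plus a position. */
struct model_file {
    const char *data;
    size_t      size;
    long long   f_pos;
};

/* Stand-in for vfs_read(): reads at *pos and advances *pos, never f_pos. */
static long model_vfs_read(struct model_file *f, char *buf, size_t count,
                           long long *pos)
{
    if (*pos < 0)
        return -1;                      /* stand-in for an error code */
    if ((size_t)*pos >= f->size)
        return 0;                       /* end of file */
    size_t avail = f->size - (size_t)*pos;
    if (count > avail)
        count = avail;
    memcpy(buf, f->data + *pos, count);
    *pos += (long long)count;
    return (long)count;
}

/* Follows the step order of ksys_read(), minus the reference handling. */
static long model_ksys_read(struct model_file *f, char *buf, size_t count)
{
    long long pos = f->f_pos;           /* create a local copy */
    long ret = model_vfs_read(f, buf, count, &pos);

    if (ret >= 0)
        f->f_pos = pos;                 /* update the file position */
    return ret;
}
```

Because the read works on the local copy, a failed read leaves the stored position untouched.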
fdget_pos()
What is this struct fd and why might we want something more than just the struct file?
struct fd
struct file
We don't always need to have our own reference to the struct file
We need to keep track of which references need to be dropped
Get an unsigned long
unsigned long
Split it into a struct fd
static inline struct fd __to_fd(unsigned long v) { return (struct fd){(struct file *)(v & ~3),v & 3}; }
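The split can be tried outside the kernel. A minimal sketch, assuming a hypothetical `struct model_file` aligned to at least 4 bytes and using `uintptr_t` where the kernel uses `unsigned long`; `model_pack()` and `model_to_fd()` are illustrative names, and the two flag values are the ones the low bits carry in this version of the code.

```c
#include <assert.h>
#include <stdint.h>

#define FDPUT_FPUT       1U   /* we hold our own reference */
#define FDPUT_POS_UNLOCK 2U   /* we hold the file position lock */

/* Hypothetical stand-in; the alignment keeps the two low address bits zero. */
struct model_file { _Alignas(4) int dummy; };

struct model_fd {
    struct model_file *file;
    unsigned int       flags;
};

/* Pack a pointer and up to two flag bits into one word. */
static uintptr_t model_pack(struct model_file *file, unsigned int flags)
{
    assert(((uintptr_t)file & 3) == 0);   /* low bits must be free */
    return (uintptr_t)file | (flags & 3);
}

/* Same split as __to_fd(): mask for the pointer, low bits for the flags. */
static struct model_fd model_to_fd(uintptr_t v)
{
    return (struct model_fd){ (struct model_file *)(v & ~(uintptr_t)3),
                              (unsigned int)(v & 3) };
}
```

A value of 0 splits into a NULL file with no flags, which is how "no such file" travels through the same word.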
__fdget_pos()
First, do we need our own reference to the file?
Then, do we need the file position lock?
__fdget()
Get a reference to a file descriptor unless it's opened in path mode
__fget_light()
If the file descriptor table's refcount is 1, we can borrow the table's reference to the file
Otherwise, we need our own reference
Use atomic_read_acquire() to get the current reference count
atomic_read_acquire()
Call files_lookup_fd_raw() directly
files_lookup_fd_raw()
The unsigned long return value will be cast
unsigned long
__fget()
FMODE_PATH
__fget_files()
fget_files_rcu()
get_file_rcu()
In the case we cannot borrow, mark the lower bits of the pointer
The following line should now be clearer:
struct file *file = (struct file *)(v & ~3);
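The borrow-or-reference decision can be sketched with C11 atomics. This is a simplified userspace model, not the kernel's implementation: the real slow path goes through RCU lookups, and `struct model_files`, `struct model_file`, `MODEL_NFDS` and `model_fget_light()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FDPUT_FPUT 1U
#define MODEL_NFDS 4

struct model_file { atomic_long f_count; };   /* references to the open file */

struct model_files {
    atomic_int         count;                 /* users of this descriptor table */
    struct model_file *fd[MODEL_NFDS];        /* tiny fixed-size table */
};

static uintptr_t model_fget_light(struct model_files *files, unsigned int fd)
{
    struct model_file *file;

    if (fd >= MODEL_NFDS)
        return 0;

    /* Sole user of the table: nobody else can close the fd under us. */
    if (atomic_load_explicit(&files->count, memory_order_acquire) == 1)
        return (uintptr_t)files->fd[fd];      /* borrow, or 0 if the slot is empty */

    file = files->fd[fd];
    if (!file)
        return 0;
    atomic_fetch_add(&file->f_count, 1);      /* take our own reference */
    return (uintptr_t)file | FDPUT_FPUT;      /* low bit: drop it later */
}
```

In the borrowed case the file's count never changes, which is the whole point of the optimization: a single-threaded process pays no atomic increment and decrement per read.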
file_needs_f_pos_lock()
When do we need the file position lock?
Any regular file or directory has FMODE_ATOMIC_POS set
FMODE_ATOMIC_POS
in do_dentry_open()
do_dentry_open()
POSIX.1-2017 2.9.7
In addition, we check the file_count and for a shared iterator
To finish up, lock and set another bit if needed
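The position-lock decision combines the mode flag with the two extra checks. A minimal sketch, assuming a hypothetical `struct model_file` whose fields stand in for the kernel's f_mode, file count, and shared-iterator check; `MODEL_ATOMIC_POS` is a placeholder for the flag set in do_dentry_open().

```c
#include <stdbool.h>

#define MODEL_ATOMIC_POS 0x1u   /* placeholder for the f_mode flag */

struct model_file {
    unsigned int f_mode;
    long         f_count;          /* references to this open file */
    bool         has_shared_iter;  /* stand-in for a shared directory iterator */
};

/* Lock only when the position must be atomic AND someone could race on it. */
static bool model_needs_f_pos_lock(const struct model_file *f)
{
    return (f->f_mode & MODEL_ATOMIC_POS) &&
           (f->f_count > 1 || f->has_shared_iter);
}
```

A regular file with a single reference skips the lock entirely, since nobody else can be moving its position.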
__to_fd()
Not used much yet, but may be soon
DEFINE_CLASS(fd,...)
#define DEFINE_CLASS(...)
First check whether the file is open with f.file
f.file
Maybe soon to become fd_empty() and fd_file()
fd_empty()
fd_file()
Recent patchset by maintainer
file_ppos()
Returns NULL for stream files; otherwise, this just gets the address of the file position
vfs_read()
Overview:
Validate the operation and its inputs
Execute the specific read handler
Notify of completion
First three checks
Make sure the file is open for reading
Make sure that the file can be read
Make sure the output buffer is a sane address
rw_verify_area()
Sanity check the file position
Verify read access
security_file_permission()
Check that count isn't too big
count >= MAX_RW_COUNT
MAX_RW_COUNT
Ensures the maximum value is rounded down to a page boundary
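The rounding is a single mask. A minimal sketch, assuming a 4096-byte page size (it varies by architecture); the `MODEL_` names and `model_clamp_count()` are illustrative, not kernel identifiers.

```c
#include <limits.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE    4096UL                        /* assumed page size */
#define MODEL_PAGE_MASK    (~(MODEL_PAGE_SIZE - 1))
#define MODEL_MAX_RW_COUNT ((size_t)(INT_MAX & MODEL_PAGE_MASK))

/* Oversized requests are shortened rather than rejected. */
static size_t model_clamp_count(size_t count)
{
    return count > MODEL_MAX_RW_COUNT ? MODEL_MAX_RW_COUNT : count;
}
```

With 4096-byte pages the limit works out to 0x7ffff000 bytes, just under 2 GiB, so a larger request comes back as a short read.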
Call the actual read!
Call the read() member of file operations
read()
Otherwise, call read_iter()
read_iter()
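The dispatch is an ordered check of two function pointers. A simplified sketch: the real read_iter() takes an I/O control block and an iterator rather than a plain buffer, so both handlers share one hypothetical signature here, and `model_direct()` and `model_iter()` are toy handlers.

```c
#include <errno.h>
#include <stddef.h>

struct model_file;

/* Simplified: both handlers share one signature in this model. */
struct model_file_ops {
    long (*read)(struct model_file *f, char *buf, size_t count, long long *pos);
    long (*read_iter)(struct model_file *f, char *buf, size_t count, long long *pos);
};

struct model_file { const struct model_file_ops *f_op; };

/* Toy handlers so the dispatch can be exercised. */
static long model_direct(struct model_file *f, char *buf, size_t count, long long *pos)
{
    (void)f; (void)count;
    buf[0] = 'r';
    *pos += 1;
    return 1;
}

static long model_iter(struct model_file *f, char *buf, size_t count, long long *pos)
{
    (void)f; (void)count;
    buf[0] = 'i'; buf[1] = 'i';
    *pos += 2;
    return 2;
}

/* Prefer read(); fall back to read_iter(); fail if neither exists. */
static long model_call_read(struct model_file *f, char *buf, size_t count, long long *pos)
{
    if (f->f_op->read)
        return f->f_op->read(f, buf, count, pos);
    if (f->f_op->read_iter)
        return f->f_op->read_iter(f, buf, count, pos);
    return -EINVAL;
}
```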
If we are successful:
Tell fsnotify to let others know of this access
Account for the task's bytes read
Unconditionally: count the read syscall for the task
See struct task_io_accounting
struct task_io_accounting
Last steps to wrap up
Update the file position if relevant
Drop any references we may have
Return the number of bytes read or an error
fdput_pos()
If we locked the file position: __f_unlock_pos()
__f_unlock_pos()
If we took our own reference to the file: fdput() calls fput()
fdput()
fput()
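The teardown reads the same two low bits that were set on the way in. A minimal userspace sketch, assuming a hypothetical `struct model_file` where a boolean stands in for the position mutex and a plain counter for the reference count.

```c
#include <stdbool.h>

#define FDPUT_FPUT       1U
#define FDPUT_POS_UNLOCK 2U

struct model_file {
    long f_count;       /* reference count */
    bool pos_locked;    /* stand-in for the file position mutex */
};

struct model_fd {
    struct model_file *file;
    unsigned int       flags;
};

/* Undo only what the flags say was done when the fd was obtained. */
static void model_fdput_pos(struct model_fd f)
{
    if (f.flags & FDPUT_POS_UNLOCK)
        f.file->pos_locked = false;   /* corresponds to __f_unlock_pos() */
    if (f.flags & FDPUT_FPUT)
        f.file->f_count--;            /* corresponds to fput() */
}
```

A borrowed, unlocked descriptor has both bits clear, so the teardown does nothing at all.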
Read doesn't need to do as much as open or write
Small optimizations on file descriptor operations add up to significant performance improvements
Watch out for data storage in unexpected places like the lower bits of a pointer!