@node Reference Guide
@appendix Reference Guide

This chapter is a reference for the PintOS code. The reference guide
does not cover all of the code in PintOS, but it does cover those
pieces that students most often find troublesome. You may find that
you want to read each part of the reference guide as you work on the
task where it becomes important.

We recommend using ``tags'' to follow along with references to function
and variable names (@pxref{Tags}).

@menu
* PintOS Loading::
* Threads::
* Synchronization::
* Interrupt Handling::
* Memory Allocation::
* Virtual Addresses::
* Page Table::
* Linked List::
* Hash Table::
@end menu

@node PintOS Loading
@section Loading

This section covers the PintOS loader and basic kernel
initialization.

@menu
* PintOS Loader::
* Low-Level Kernel Initialization::
* High-Level Kernel Initialization::
* Physical Memory Map::
@end menu

@node PintOS Loader
@subsection The Loader

The first part of PintOS that runs is the loader, in
@file{threads/loader.S}. The PC BIOS loads the loader into memory.
The loader, in turn, is responsible for finding the kernel on disk,
loading it into memory, and then jumping to its start. It's
not important to understand exactly how the loader works, but if
you're interested, read on. You should probably read along with the
loader's source. You should also understand the basics of the
80@var{x}86 architecture as described by chapter 3, ``Basic Execution
Environment,'' of @bibref{IA32-v1}.

The PC BIOS loads the loader from the first sector of the first hard
disk, called the @dfn{master boot record} (MBR). PC conventions
reserve 64 bytes of the MBR for the partition table, and PintOS uses
about 128 additional bytes for kernel command-line arguments. This
leaves a little over 300 bytes for the loader's own code. This is a
severe restriction that means, practically speaking, the loader must
be written in assembly language.
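
The byte budget described above can be checked with a little arithmetic.
The following is only a sketch: it assumes the conventional 512-byte
sector and the 2-byte boot signature that a PC BIOS expects at the end of
the MBR, it takes the ``about 128 bytes'' figure at face value, and the
constant and function names are illustrative rather than taken from the
PintOS source.

```c
#include <assert.h>

/* Illustrative constants; the names are not taken from the PintOS source. */
enum
  {
    SECTOR_SIZE = 512,          /* Size of the MBR sector (assumed). */
    PARTITION_TABLE_SIZE = 64,  /* Reserved by PC conventions. */
    BOOT_SIGNATURE_SIZE = 2,    /* 0x55, 0xaa at the end of the sector (assumed). */
    COMMAND_LINE_SIZE = 128     /* Approximate space used for kernel arguments. */
  };

/* Returns the number of bytes left in the MBR for the loader's own code. */
static int
loader_code_budget (void)
{
  return SECTOR_SIZE - PARTITION_TABLE_SIZE - BOOT_SIGNATURE_SIZE
         - COMMAND_LINE_SIZE;
}
```

Under these assumptions the budget comes to 318 bytes, which matches the
``little over 300 bytes'' quoted above.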

The PintOS loader and kernel don't have to be on the same disk, nor
is the kernel required to be in any particular location on a
given disk. The loader's first job, then, is to find the kernel by
reading the partition table on each hard disk, looking for a bootable
partition of the type used for a PintOS kernel.

When the loader finds a bootable kernel partition, it reads the
partition's contents into memory at physical address @w{128 kB}. The
kernel is at the beginning of the partition, which might be larger
than necessary due to partition boundary alignment conventions, so the
loader reads no more than @w{512 kB} (and the PintOS build process
will refuse to produce kernels larger than that). Reading more data
than this would cross into the region from @w{640 kB} to @w{1 MB} that
the PC architecture reserves for hardware and the BIOS, and a standard
PC BIOS does not provide any means to load the kernel above @w{1 MB}.
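
The size limit follows directly from the addresses involved. As a
minimal sketch, with macro and function names that are hypothetical and
not part of PintOS, the check amounts to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the numeric values come from the text above. */
#define KERNEL_LOAD_ADDR  (128u * 1024)   /* Where the loader places the kernel. */
#define KERNEL_MAX_SIZE   (512u * 1024)   /* Most the loader will read. */
#define RESERVED_START    (640u * 1024)   /* Start of the hardware/BIOS region. */

/* Returns true if a kernel image of SIZE bytes, loaded at 128 kB,
   stays below the reserved region that begins at 640 kB. */
static bool
kernel_fits (uint32_t size)
{
  return size <= KERNEL_MAX_SIZE
         && KERNEL_LOAD_ADDR + size <= RESERVED_START;
}
```

A kernel of exactly @w{512 kB} ends right where the reserved region
begins, which is why the limit cannot be raised without moving the load
address.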

The loader's final job is to extract the entry point from the loaded
kernel image and transfer control to it. The entry point is not at a
predictable location, but the kernel's ELF header contains a pointer
to it. The loader extracts the pointer and jumps to the location it
points to.
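
The real loader does this in assembly language, but the idea can be
sketched in C. The example below assumes a 32-bit little-endian ELF
image, in which the entry point field (@code{e_entry}) sits 24 bytes into
the header; the function name is illustrative and does not appear in
PintOS.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the e_entry field in a 32-bit ELF header:
   16 bytes of e_ident, then e_type (2), e_machine (2), e_version (4). */
#define ELF32_E_ENTRY_OFFSET 24

/* Returns the entry point stored in the ELF header at IMAGE.
   Reads the field byte by byte so that it does not depend on the
   host's alignment rules or byte order. */
static uint32_t
elf32_entry_point (const uint8_t *image)
{
  const uint8_t *p = image + ELF32_E_ENTRY_OFFSET;
  return (uint32_t) p[0]
         | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16
         | (uint32_t) p[3] << 24;
}
```

The loader would then jump to the returned address instead of returning
it to a caller.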

The PintOS kernel command line
is stored in the boot loader. The @command{pintos} program actually
modifies a copy of the boot loader on disk each time it runs the kernel,
inserting whatever command-line arguments the user supplies to the kernel,
and then the kernel at boot time reads those arguments out of the boot
loader in memory. This is not an elegant solution, but it is simple
and effective.

@node Low-Level Kernel Initialization
@subsection Low-Level Kernel Initialization

The loader's last action is to transfer control to the kernel's entry
point, which is @func{start} in @file{threads/start.S}. The job of
this code is to switch the CPU from legacy 16-bit ``real mode'' into
the 32-bit ``protected mode'' used by all modern 80@var{x}86 operating
systems.

The startup code's first task is actually to obtain the machine's
memory size by asking the BIOS for it. The
simplest BIOS function to do this can only detect up to 64 MB of RAM,
so that's the practical limit that PintOS can support. The startup code
stores the memory size, in pages, in global variable
@code{init_ram_pages}.

The first part of CPU initialization is to enable the A20 line, that
is, the CPU's address line numbered 20. For historical reasons, PCs
boot with this address line fixed at 0, which means that attempts to
access memory beyond the first 1 MB (2 raised to the 20th power) will
fail. PintOS wants to access more memory than this, so we have to
enable it.
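
The effect of a disabled A20 line can be modeled in a few lines of C.
This is a simplified model, not PintOS code: it assumes that with A20
disabled, bit 20 of every address is forced to 0 before it reaches
memory, so an access just above 1 MB silently lands in low memory
instead of where it was aimed. The names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A20_BIT (UINT32_C (1) << 20)   /* Address line number 20. */

/* Returns the address that actually reaches memory when the CPU
   issues ADDR, given the state of the A20 line. */
static uint32_t
bus_address (uint32_t addr, bool a20_enabled)
{
  return a20_enabled ? addr : addr & ~A20_BIT;
}
```

In this model an access to @t{0x00100000} with A20 disabled reaches
address 0, while addresses below 1 MB are unaffected.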

Next, the startup code creates a basic page table. This page table maps
the 64 MB at the base of virtual memory (starting at virtual address
0) directly to the identical physical addresses. It also maps the
same physical memory starting at virtual address
@code{LOADER_PHYS_BASE}, which defaults to @t{0xc0000000} (3 GB). The
PintOS kernel only wants the latter mapping, but there's a
chicken-and-egg problem if we don't include the former: our current
virtual address is roughly @t{0x20000}, the location where the loader
put us, and we can't jump to @t{0xc0020000} until we turn on the
page table, but if we turn on the page table without jumping there,
then we've just pulled the rug out from under ourselves.
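
The double mapping can be illustrated with a small model. This is a
sketch under simplifying assumptions, not the code in
@file{threads/start.S}: each page directory entry is collapsed into the
physical base address of the 4 MB region it maps, individual page table
entries are not modeled, and every name other than
@code{LOADER_PHYS_BASE} is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE            4096u
#define PT_ENTRIES        1024u                   /* Entries per table or directory. */
#define PT_SPAN           (PGSIZE * PT_ENTRIES)   /* 4 MB mapped per page table. */
#define MAPPED_BYTES      (64u * 1024 * 1024)     /* Memory mapped at startup. */
#define LOADER_PHYS_BASE  0xc0000000u
#define NOT_MAPPED        UINT32_MAX

/* pd[i] holds the physical base mapped by directory entry i,
   or NOT_MAPPED.  This stands in for a real page directory. */
static uint32_t pd[PT_ENTRIES];

/* Maps the first 64 MB of physical memory twice: once at virtual
   address 0 and once at LOADER_PHYS_BASE. */
static void
startup_paging_model_init (void)
{
  uint32_t i;

  for (i = 0; i < PT_ENTRIES; i++)
    pd[i] = NOT_MAPPED;
  for (i = 0; i < MAPPED_BYTES / PT_SPAN; i++)
    {
      pd[i] = i * PT_SPAN;                               /* Identity mapping. */
      pd[LOADER_PHYS_BASE / PT_SPAN + i] = i * PT_SPAN;  /* Kernel mapping. */
    }
}

/* Translates VADDR to a physical address, or NOT_MAPPED. */
static uint32_t
translate (uint32_t vaddr)
{
  uint32_t base = pd[vaddr / PT_SPAN];
  return base == NOT_MAPPED ? NOT_MAPPED : base + vaddr % PT_SPAN;
}
```

Both @t{0x20000} and @t{0xc0020000} translate to physical address
@t{0x20000} in this model, so the code keeps running at its current
address when paging is switched on and can then jump to the high
mapping.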

After the page table is initialized, we load the CPU's control
registers to turn on protected mode and paging, and set up the segment
registers. We aren't yet equipped to handle interrupts in protected
mode, so we disable interrupts. The final step is to call @func{main}.

@node High-Level Kernel Initialization
@subsection High-Level Kernel Initialization

The kernel proper starts with the @func{main} function. The
@func{main} function is written in C, as will be most of the code we
encounter in PintOS from here on out.

When @func{main} starts, the system is in a pretty raw state. We're
in 32-bit protected mode with paging enabled, but hardly anything else is
ready. Thus, the @func{main} function consists primarily of calls
into other PintOS modules' initialization functions.
These are usually named @func{@var{module}_init}, where
@var{module} is the module's name, @file{@var{module}.c} is the
module's source code, and @file{@var{module}.h} is the module's
header.

The first step in @func{main} is to call @func{bss_init}, which clears
out the kernel's ``BSS'', the traditional name for a
segment that should be initialized to all zeros. In most C
implementations, whenever you
declare a variable outside a function without providing an
initializer, that variable goes into the BSS. Because it's all zeros, the
BSS isn't stored in the image that the loader brought into memory. We
just use @func{memset} to zero it out.

Next, @func{main} calls @func{read_command_line} to break the kernel command
line into arguments, then @func{parse_options} to read any options at
the beginning of the command line. (Actions specified on the
command line execute later.)

@func{thread_init} initializes the thread system. We defer full
discussion to the section on PintOS threads below. It is called so
early in initialization because a valid thread structure is a
prerequisite for acquiring a lock, and lock acquisition in turn is
important to other PintOS subsystems. Then we initialize the console
and print a startup message to the console.

The next block of functions we call initializes the kernel's memory
system. @func{palloc_init} sets up the kernel page allocator, which
doles out memory one or more pages at a time (@pxref{Page Allocator}).
@func{malloc_init} sets
up the allocator that handles allocations of arbitrary-size blocks of
memory (@pxref{Block Allocator}).
@func{paging_init} sets up a page table for the kernel (@pxref{Page
Table}).

In tasks 2 and later, @func{main} also calls @func{tss_init} and
@func{gdt_init}.

The next set of calls initializes the interrupt system.
@func{intr_init} sets up the CPU's @dfn{interrupt descriptor table}
(IDT) to ready it for interrupt handling (@pxref{Interrupt
Infrastructure}), then @func{timer_init} and @func{kbd_init} prepare for
handling timer interrupts and keyboard interrupts, respectively.
@func{input_init} sets up to merge serial and keyboard input into one
stream. In
tasks 2 and later, we also prepare to handle interrupts caused by
user programs using @func{exception_init} and @func{syscall_init}.

Now that interrupts are set up, we can start the scheduler
with @func{thread_start}, which creates the idle thread and enables
interrupts.
With interrupts enabled, interrupt-driven serial port I/O becomes
possible, so we use
@func{serial_init_queue} to switch to that mode. Finally,
@func{timer_calibrate} calibrates the timer for accurate short delays.

If the file system is compiled in, as it will starting in task 2, we
initialize the IDE disks with @func{ide_init}, then the
file system with @func{filesys_init}.

The PintOS boot is then complete, so we print a message.

Function @func{run_actions} now parses and executes actions specified on
the kernel command line, such as @command{run} to run a test (in task
1) or a user program (in later tasks).

Finally, if @option{-q} was specified on the kernel command line, we
call @func{shutdown_power_off} to terminate the machine simulator. Otherwise,
@func{main} calls @func{thread_exit}, which allows any other running
threads to continue running.

@node Physical Memory Map
@subsection Physical Memory Map

@multitable {@t{00000000}--@t{00000000}} {Hardware} {Some much longer explanatory text}
@headitem Memory Range
@tab Owner
@tab Contents

@item @t{00000000}--@t{000003ff} @tab CPU @tab Real mode interrupt table.
@item @t{00000400}--@t{000005ff} @tab BIOS @tab Miscellaneous data area.
@item @t{00000600}--@t{00007bff} @tab --- @tab ---
@item @t{00007c00}--@t{00007dff} @tab PintOS @tab Loader.
@item @t{0000e000}--@t{0000efff} @tab PintOS
@tab Stack for loader; kernel stack and @struct{thread} for initial
kernel thread.
@item @t{0000f000}--@t{0000ffff} @tab PintOS
@tab Page directory for startup code.
@item @t{00010000}--@t{0001ffff} @tab PintOS
@tab Page tables for startup code.
@item @t{00020000}--@t{0009ffff} @tab PintOS
@tab Kernel code, data, and uninitialized data segments.
@item @t{000a0000}--@t{000bffff} @tab Video @tab VGA display memory.
@item @t{000c0000}--@t{000effff} @tab Hardware
@tab Reserved for expansion card RAM and ROM.
@item @t{000f0000}--@t{000fffff} @tab BIOS @tab ROM BIOS.
@item @t{00100000}--@t{03ffffff} @tab PintOS @tab Dynamic memory allocation.
@end multitable

@node Threads
@section Threads

@menu
* struct thread::
* Thread Functions::
* Thread Switching::
@end menu

@node struct thread
@subsection @code{struct thread}

The main PintOS data structure for threads is @struct{thread},
declared in @file{threads/thread.h}.

@deftp {Structure} {struct thread}
Represents a thread or a user process. In the tasks, you will have
to add your own members to @struct{thread}. You may also change or
delete the definitions of existing members.

Every @struct{thread} occupies the beginning of its own page of
memory. The rest of the page is used for the thread's stack, which
grows downward from the end of the page. It looks like this:

@example
@group
                  4 kB +---------------------------------+
                       |          kernel stack           |
                       |                |                |
                       |                |                |
                       |                V                |
                       |         grows downward          |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
sizeof (struct thread) +---------------------------------+
                       |              magic              |
                       |                :                |
                       |                :                |
                       |             status              |
                       |               tid               |
                  0 kB +---------------------------------+
@end group
@end example

This has two consequences. First, @struct{thread} must not be allowed
to grow too big. If it does, then there will not be enough room for the
kernel stack. The base @struct{thread} is only a few bytes in size. It
probably should stay well under 1 kB.

Second, kernel stacks must not be allowed to grow too large. If a stack
overflows, it will corrupt the thread state. Thus, kernel functions
should not allocate large structures or arrays as non-static local
variables. Use dynamic allocation with @func{malloc} or
@func{palloc_get_page} instead (@pxref{Memory Allocation}).
@end deftp
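
One practical consequence of this layout is that the start of the page,
and therefore the @struct{thread}, can be recovered from any address on
the thread's stack by rounding down to a page boundary. The sketch
below assumes 4 kB pages; the helper names are illustrative and are not
the ones used in the PintOS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u   /* Bytes in a page (assumed). */

/* Returns the address of the start of the page that contains
   STACK_POINTER, which is where the page's struct thread lives. */
static uintptr_t
thread_page_base (uintptr_t stack_pointer)
{
  return stack_pointer & ~(uintptr_t) (PGSIZE - 1);
}

/* Returns the number of bytes left for the kernel stack when the
   thread structure occupies THREAD_STRUCT_SIZE bytes of the page. */
static size_t
kernel_stack_budget (size_t thread_struct_size)
{
  assert (thread_struct_size <= PGSIZE);
  return PGSIZE - thread_struct_size;
}
```

For instance, a structure that grew to 1 kB would leave only 3 kB of
the page for the stack, which is why the structure should stay small.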

@deftypecv {Member} {@struct{thread}} {tid_t} tid
The thread's thread identifier or @dfn{tid}. Every thread must have a
tid that is unique over the entire lifetime of the kernel. By
default, @code{tid_t} is a @code{typedef} for @code{int} and each new
thread receives the numerically next higher tid, starting from 1 for
the initial process. You can change the type and the numbering scheme
if you like.
@end deftypecv
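
The default numbering scheme amounts to a counter. The following is a
minimal single-threaded sketch; the function name is hypothetical, and
a real kernel would also have to protect the counter against concurrent
callers, which this model omits.

```c
#include <assert.h>

typedef int tid_t;

/* Next tid to hand out.  The initial thread receives 1. */
static tid_t next_tid = 1;

/* Returns a tid that has not been handed out before.
   Not safe for concurrent use; synchronization is omitted here. */
static tid_t
allocate_tid_model (void)
{
  return next_tid++;
}
```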

@deftypecv {Member} {@struct{thread}} {enum thread_status} status
@anchor{Thread States}
The thread's state, one of the following:

@defvr {Thread State} @code{THREAD_RUNNING}
The thread is running. Exactly one thread is running at a given time.
@func{thread_current} returns the running thread.
@end defvr

@defvr {Thread State} @code{THREAD_READY}
The thread is ready to run, but it's not running right now. The
thread could be selected to run the next time the scheduler is
invoked. Ready threads are kept in a doubly linked list called
@code{ready_list}.
@end defvr

@defvr {Thread State} @code{THREAD_BLOCKED}
The thread is waiting for something, e.g.@: a lock to become
available or an interrupt to be invoked. The thread won't be scheduled
again until it transitions to the @code{THREAD_READY} state with a
call to @func{thread_unblock}. This is most conveniently done
indirectly, using one of the PintOS synchronization primitives that
block and unblock threads automatically (@pxref{Synchronization}).

There is no @i{a priori} way to tell what a blocked thread is waiting
for, but a backtrace can help (@pxref{Backtraces}).
@end defvr

@defvr {Thread State} @code{THREAD_DYING}
The thread will be destroyed by the scheduler after switching to the
next thread.
@end defvr
@end deftypecv
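
The four states and the transitions between them form a small state
machine. The sketch below encodes the transitions described in this
chapter (scheduling, yielding, blocking, unblocking, and exiting); the
@code{valid_transition} helper is hypothetical and exists only to make
the rules explicit.

```c
#include <assert.h>
#include <stdbool.h>

enum thread_status
  {
    THREAD_RUNNING,   /* Running thread. */
    THREAD_READY,     /* Not running but ready to run. */
    THREAD_BLOCKED,   /* Waiting for an event to trigger. */
    THREAD_DYING      /* About to be destroyed. */
  };

/* Returns true if a thread may move directly from FROM to TO. */
static bool
valid_transition (enum thread_status from, enum thread_status to)
{
  switch (from)
    {
    case THREAD_RUNNING:
      /* Yield, block, or exit. */
      return to == THREAD_READY || to == THREAD_BLOCKED
             || to == THREAD_DYING;
    case THREAD_READY:
      /* Chosen by the scheduler. */
      return to == THREAD_RUNNING;
    case THREAD_BLOCKED:
      /* Unblocked by thread_unblock(). */
      return to == THREAD_READY;
    case THREAD_DYING:
      /* A dying thread never runs again. */
      return false;
    }
  return false;
}
```

Note that a blocked thread never becomes running directly: it must
first become ready and then be picked by the scheduler.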

@deftypecv {Member} {@struct{thread}} {char} name[16]
The thread's name as a string, or at least the first few characters of
it.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {uint8_t *} stack
Every thread has its own stack to keep track of its state. When the
thread is running, the CPU's stack pointer register tracks the top of
the stack and this member is unused. But when the CPU switches to
another thread, this member saves the thread's stack pointer. No
other members are needed to save the thread's registers, because the
other registers that must be saved are saved on the stack.

When an interrupt occurs, whether in the kernel or a user program, an
@struct{intr_frame} is pushed onto the stack. When the interrupt occurs
in a user program, the @struct{intr_frame} is always at the very top of
the page. @xref{Interrupt Handling}, for more information.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {int} priority
A thread priority, ranging from @code{PRI_MIN} (0) to @code{PRI_MAX}
(63). Lower numbers correspond to lower priorities, so that
priority 0 is the lowest priority and priority 63 is the highest.
PintOS, as initially provided, ignores thread priorities, but you will implement
priority scheduling in task 1 (@pxref{Priority Scheduling}).
@end deftypecv

@deftypecv {Member} {@struct{thread}} {@struct{list_elem}} allelem
This ``list element'' is used to link the thread into the list of all
threads. Each thread is inserted into this list when it is created
and removed when it exits. The @func{thread_foreach} function should
be used to iterate over all threads.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {@struct{list_elem}} elem
A ``list element'' used to put the thread into doubly linked lists,
either @code{ready_list} (the list of threads ready to run) or a list of
threads waiting on a semaphore in @func{sema_down}. It can do double
duty because a thread waiting on a semaphore is not ready, and vice
versa.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {uint32_t *} pagedir
Only present in task 2 and later. @xref{Page Tables}.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {unsigned} magic
Always set to @code{THREAD_MAGIC}, which is just an arbitrary number defined
in @file{threads/thread.c}, and used to detect stack overflow.
@func{thread_current} checks that the @code{magic} member of the running
thread's @struct{thread} is set to @code{THREAD_MAGIC}. Stack overflow
tends to change this value, triggering the assertion. For greatest
benefit, as you add members to @struct{thread}, leave @code{magic} at
the end.
@end deftypecv
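
The following model shows why keeping @code{magic} at the end of the
structure gives the earliest warning. It is only a simulation under
stated assumptions: one page is represented by a byte array, the
structure is a cut-down stand-in for @struct{thread}, the magic value
used here is arbitrary, and all helper names are made up for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096
#define THREAD_MAGIC 0xcd6abf4bu   /* Any arbitrary value works in this model. */

/* Cut-down stand-in for struct thread, with magic as the last member. */
struct thread_model
  {
    int tid;
    int status;
    char name[16];
    unsigned magic;
  };

static uint8_t page[PGSIZE];   /* One thread page: struct at offset 0. */
static size_t sp;              /* Offset of the simulated stack pointer. */

/* Sets up an empty stack at the top of the page and stores the magic. */
static void
model_init (void)
{
  unsigned magic = THREAD_MAGIC;
  memset (page, 0, sizeof page);
  memcpy (page + offsetof (struct thread_model, magic), &magic, sizeof magic);
  sp = PGSIZE;
}

/* Simulates pushing N bytes: the stack grows downward toward the struct. */
static void
push_bytes (size_t n)
{
  assert (n <= sp);
  sp -= n;
  memset (page + sp, 0xaa, n);
}

/* Mirrors the check described above: is magic still THREAD_MAGIC? */
static bool
magic_intact (void)
{
  unsigned magic;
  memcpy (&magic, page + offsetof (struct thread_model, magic), sizeof magic);
  return magic == THREAD_MAGIC;
}
```

In this model the stack can fill the whole page down to the end of the
structure without harm, but the very first byte pushed beyond that
lands in @code{magic}, because it is the member closest to the stack.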

@node Thread Functions
@subsection Thread Functions

@file{threads/thread.c} implements several public functions for thread
support. Let's take a look at the most useful:

@deftypefun void thread_init (void)
Called by @func{main} to initialize the thread system. Its main
purpose is to create a @struct{thread} for PintOS's initial thread.
This is possible because the PintOS loader puts the initial
thread's stack at the top of a page, in the same position as any other
PintOS thread.

Before @func{thread_init} runs,
@func{thread_current} will fail because the running thread's
@code{magic} value is incorrect. Lots of functions call
@func{thread_current} directly or indirectly, including
@func{lock_acquire} for locking a lock, so @func{thread_init} is
called early in PintOS initialization.
@end deftypefun

@deftypefun void thread_start (void)
Called by @func{main} to start the scheduler. Creates the idle
thread, that is, the thread that is scheduled when no other thread is
ready. Then enables interrupts, which as a side effect enables the
scheduler because the scheduler runs on return from the timer interrupt, using
@func{intr_yield_on_return} (@pxref{External Interrupt Handling}).
@end deftypefun

@deftypefun void thread_tick (void)
Called by the timer interrupt at each timer tick. It keeps track of
thread statistics and triggers the scheduler when a time slice expires.
@end deftypefun

@deftypefun void thread_print_stats (void)
Called during PintOS shutdown to print thread statistics.
@end deftypefun

@deftypefun tid_t thread_create (const char *@var{name}, int @var{priority}, thread_func *@var{func}, void *@var{aux})
Creates and starts a new thread named @var{name} with the given
@var{priority}, returning the new thread's tid. The thread executes
@var{func}, passing @var{aux} as the function's single argument.

@func{thread_create} allocates a page for the thread's
@struct{thread} and stack and initializes its members, then it sets
up a set of fake stack frames for it (@pxref{Thread Switching}). The
thread is initialized in the blocked state, then unblocked just before
returning, which allows the new thread to
be scheduled (@pxref{Thread States}).

@deftp {Type} {void thread_func (void *@var{aux})}
This is the type of the function passed to @func{thread_create}, whose
@var{aux} argument is passed along as the function's argument.
@end deftp
@end deftypefun

@deftypefun void thread_block (void)
Transitions the running thread from the running state to the blocked
state (@pxref{Thread States}). The thread will not run again until
@func{thread_unblock} is
called on it, so you'd better have some way arranged for that to happen.
Because @func{thread_block} is so low-level, you should prefer to use
one of the synchronization primitives instead (@pxref{Synchronization}).
@end deftypefun

@deftypefun void thread_unblock (struct thread *@var{thread})
Transitions @var{thread}, which must be in the blocked state, to the
ready state, allowing it to resume running (@pxref{Thread States}).
This is called when the event that the thread is waiting for occurs,
e.g.@: when the lock that
the thread is waiting on becomes available.
@end deftypefun

@deftypefun {struct thread *} thread_current (void)
Returns the running thread.
@end deftypefun

@deftypefun {tid_t} thread_tid (void)
Returns the running thread's thread id. Equivalent to
@code{thread_current ()->tid}.
@end deftypefun

@deftypefun {const char *} thread_name (void)
Returns the running thread's name. Equivalent to @code{thread_current
()->name}.
@end deftypefun

@deftypefun void thread_exit (void) @code{NO_RETURN}
Causes the current thread to exit. Never returns, hence
@code{NO_RETURN} (@pxref{Function and Parameter Attributes}).
@end deftypefun

@deftypefun void thread_yield (void)
Yields the CPU to the scheduler, which picks a new thread to run. The
new thread might be the current thread, so you can't depend on this
function to keep this thread from running for any particular length of
time.
@end deftypefun

@deftypefun void thread_foreach (thread_action_func *@var{action}, void *@var{aux})
Iterates over all threads @var{t} and invokes @code{action(t, aux)} on each.
@var{action} must refer to a function that matches the signature
given by @func{thread_action_func}:

@deftp {Type} {void thread_action_func (struct thread *@var{thread}, void *@var{aux})}
Performs some action on a thread, given @var{aux}.
@end deftp
@end deftypefun
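
To illustrate how @var{action} and @var{aux} work together, here is a
self-contained model of the callback pattern. It is not the kernel's
implementation: a fixed array stands in for the kernel's list of all
threads, @struct{thread} is reduced to two members, and
@code{max_priority_action} is a hypothetical action written for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the kernel's struct thread. */
struct thread
  {
    int tid;
    int priority;
  };

typedef void thread_action_func (struct thread *t, void *aux);

/* Stand-in for the kernel's list of all threads. */
static struct thread all_threads[] = { {1, 31}, {2, 63}, {3, 0} };

/* Model of thread_foreach(): invokes ACTION on every thread,
   passing AUX through unchanged. */
static void
thread_foreach (thread_action_func *action, void *aux)
{
  size_t i;

  for (i = 0; i < sizeof all_threads / sizeof *all_threads; i++)
    action (&all_threads[i], aux);
}

/* Example action: AUX points to an int that tracks the highest
   priority seen so far. */
static void
max_priority_action (struct thread *t, void *aux)
{
  int *max = aux;

  if (t->priority > *max)
    *max = t->priority;
}
```

The @var{aux} pointer lets one action function carry state across
calls without resorting to a global variable.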

@deftypefun int thread_get_priority (void)
@deftypefunx void thread_set_priority (int @var{new_priority})
Stubs to get and set the thread priority. @xref{Priority Scheduling}.
@end deftypefun

@deftypefun int thread_get_nice (void)
@deftypefunx void thread_set_nice (int @var{new_nice})
@deftypefunx int thread_get_recent_cpu (void)
@deftypefunx int thread_get_load_avg (void)
Stubs for the advanced scheduler. @xref{4.4BSD Scheduler}.
@end deftypefun

@node Thread Switching
@subsection Thread Switching

@func{schedule} is responsible for switching threads. It
is internal to @file{threads/thread.c} and called only by the three
public thread functions that need to switch threads:
@func{thread_block}, @func{thread_exit}, and @func{thread_yield}.
Before any of these functions call @func{schedule}, they disable
interrupts (or ensure that they are already disabled) and then change
the running thread's state to something other than running.

@func{schedule} is short but tricky. It records the
current thread in local variable @var{cur}, determines the next thread
to run as local variable @var{next} (by calling
@func{next_thread_to_run}), and then calls @func{switch_threads} to do
the actual thread switch. The thread we switched to was also running
inside @func{switch_threads}, as are all the threads not currently
running, so the new thread now returns out of
@func{switch_threads}, returning the previously running thread.

@func{switch_threads} is an assembly language routine in
@file{threads/switch.S}. It saves registers on the stack, saves the
CPU's current stack pointer in the current @struct{thread}'s @code{stack}
member, restores the new thread's @code{stack} into the CPU's stack
pointer, restores registers from the stack, and returns.

The rest of the scheduler is implemented in @func{thread_schedule_tail}. It
marks the new thread as running. If the thread we just switched from
is in the dying state, then it also frees the page that contained the
dying thread's @struct{thread} and stack. This page couldn't be freed
prior to the thread switch because the switch needed to use it.

Running a thread for the first time is a special case. When
@func{thread_create} creates a new thread, it goes through a fair
amount of trouble to get it started properly. In particular, the new
thread hasn't started running yet, so there's no way for it to be
running inside @func{switch_threads} as the scheduler expects. To
solve the problem, @func{thread_create} creates some fake stack frames
in the new thread's stack:

@itemize @bullet
@item
The topmost fake stack frame is for @func{switch_threads}, represented
by @struct{switch_threads_frame}. The important part of this frame is
its @code{eip} member, the return address. We point @code{eip} to
@func{switch_entry}, indicating it to be the function that called
@func{switch_threads}.

@item
The next fake stack frame is for @func{switch_entry}, an assembly
language routine in @file{threads/switch.S} that adjusts the stack
pointer,@footnote{This is because @func{switch_threads} takes
arguments on the stack and the 80@var{x}86 SVR4 calling convention
requires the caller, not the called function, to remove them when the
call is complete. See @bibref{SysV-i386} chapter 3 for details.}
calls @func{thread_schedule_tail} (this special case is why
@func{thread_schedule_tail} is separate from @func{schedule}), and returns.
We fill in its stack frame so that it returns into
@func{kernel_thread}, a function in @file{threads/thread.c}.

@item
The final stack frame is for @func{kernel_thread}, which enables
interrupts and calls the thread's function (the function passed to
@func{thread_create}). If the thread's function returns, it calls
@func{thread_exit} to terminate the thread.
@end itemize
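
The three fake frames can be pictured as structures carved off the top
of the new thread's page, one below the other. The sketch below is a
hosted model rather than kernel code, and several details are
assumptions made for illustration: the member lists of the three frame
structures (only the @code{eip} member of @struct{switch_threads_frame}
is described above), the @code{alloc_frame} and
@code{build_fake_frames} helpers, and the stub functions that stand in
for @func{kernel_thread} and @func{switch_entry}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE 4096

typedef void thread_func (void *aux);

/* Frame layouts.  Member lists are assumptions for this model. */
struct kernel_thread_frame
  {
    void *eip;                /* Return address; never used. */
    thread_func *function;    /* Function the thread will run. */
    void *aux;                /* Argument for FUNCTION. */
  };

struct switch_entry_frame
  {
    void (*eip) (void);       /* Where switch_entry() returns to. */
  };

struct switch_threads_frame
  {
    uint32_t edi, esi, ebp, ebx;   /* Saved registers. */
    void (*eip) (void);            /* Where switch_threads() returns to. */
    void *cur;                     /* Arguments of switch_threads(). */
    void *next;
  };

struct fake_frames
  {
    struct kernel_thread_frame *kf;
    struct switch_entry_frame *ef;
    struct switch_threads_frame *sf;
    uint8_t *stack;           /* Saved stack pointer for the new thread. */
  };

/* Stubs standing in for the real routines. */
static void kernel_thread_stub (void) {}
static void switch_entry_stub (void) {}
static void worker (void *aux) { (void) aux; }

/* Reserves SIZE bytes at the top of the stack and returns them. */
static void *
alloc_frame (uint8_t **stack, size_t size)
{
  *stack -= size;
  return *stack;
}

/* Builds the three fake frames in PAGE, which must be PGSIZE bytes. */
static struct fake_frames
build_fake_frames (void *page, thread_func *function, void *aux)
{
  uint8_t *stack = (uint8_t *) page + PGSIZE;
  struct fake_frames f;

  f.kf = alloc_frame (&stack, sizeof *f.kf);
  f.kf->eip = NULL;
  f.kf->function = function;
  f.kf->aux = aux;

  f.ef = alloc_frame (&stack, sizeof *f.ef);
  f.ef->eip = kernel_thread_stub;   /* switch_entry "returns" here. */

  f.sf = alloc_frame (&stack, sizeof *f.sf);
  f.sf->eip = switch_entry_stub;    /* switch_threads "returns" here. */
  f.sf->ebp = 0;

  f.stack = stack;
  return f;
}
```

The frame for @func{switch_threads} ends up at the lowest address, that
is, at the top of the stack, so it is the first one consumed when the
scheduler switches to the new thread.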

@node Synchronization
@section Synchronization

If sharing of resources between threads is not handled in a careful, controlled fashion, the result is usually a big mess.
This is especially the case in operating system kernels, where faulty sharing can crash the entire machine.
PintOS provides several synchronization primitives to help out.

@cartouche
@noindent@strong{Important:} For the scope of all PintOS tasks, you may assume that any 1, 2 or 4 byte read or write operation on aligned memory is atomic.
All other read or write operations could potentially be interrupted or descheduled.
@end cartouche

@menu
* Disabling Interrupts::
* Semaphores::
* Locks::
* Monitors::
* Optimization Barriers::
@end menu

@node Disabling Interrupts
@subsection Disabling Interrupts

The crudest way to do synchronization is to disable interrupts, that
is, to temporarily prevent the CPU from responding to interrupts. If
interrupts are off, no other thread will preempt the running thread,
because thread preemption is driven by the timer interrupt. If
interrupts are on, as they normally are, then the running thread may
be preempted by another at any time, whether between two C statements
or even within the execution of one.

Incidentally, this means that PintOS is a ``preemptible kernel,'' that
is, kernel threads can be preempted at any time. Traditional Unix
systems are ``nonpreemptible,'' that is, kernel threads can only be
preempted at points where they explicitly call into the scheduler.
(User programs can be preempted at any time in both models.) As you
might imagine, preemptible kernels require more explicit
synchronization.

You should have little need to set the interrupt state directly. Most
of the time you should use the other synchronization primitives
described in the following sections. The main reason to disable
interrupts is to synchronize kernel threads with external interrupt
handlers, which cannot sleep and thus cannot use most other forms of
synchronization (@pxref{External Interrupt Handling}).

Some external interrupts cannot be postponed, even by disabling
interrupts. These interrupts, called @dfn{non-maskable interrupts}
(NMIs), are supposed to be used only in emergencies, e.g.@: when the
computer is on fire. PintOS does not handle non-maskable interrupts.

Types and functions for disabling and enabling interrupts are in
@file{threads/interrupt.h}.

@deftp Type {enum intr_level}
One of @code{INTR_OFF} or @code{INTR_ON}, denoting that interrupts are
disabled or enabled, respectively.
@end deftp

@deftypefun {enum intr_level} intr_get_level (void)
Returns the current interrupt state.
@end deftypefun

@deftypefun {enum intr_level} intr_set_level (enum intr_level @var{level})
Turns interrupts on or off according to @var{level}. Returns the
previous interrupt state.
@end deftypefun

@deftypefun {enum intr_level} intr_enable (void)
Turns interrupts on. Returns the previous interrupt state.
@end deftypefun

@deftypefun {enum intr_level} intr_disable (void)
Turns interrupts off. Returns the previous interrupt state.
@end deftypefun
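
Because these functions return the previous interrupt state, a critical
section can save that state on entry and restore it on exit, which keeps
nested critical sections from turning interrupts back on too early. The
sketch below demonstrates the idiom against a hosted model: the
@code{intr_*} functions here only toggle a variable that stands in for
the CPU's interrupt flag, and the two counter functions are hypothetical.

```c
#include <assert.h>

enum intr_level { INTR_OFF, INTR_ON };

/* Stand-in for the CPU's interrupt flag. */
static enum intr_level current_level = INTR_ON;

static enum intr_level
intr_get_level (void)
{
  return current_level;
}

static enum intr_level
intr_set_level (enum intr_level level)
{
  enum intr_level old_level = current_level;
  current_level = level;
  return old_level;
}

static enum intr_level
intr_disable (void)
{
  return intr_set_level (INTR_OFF);
}

static int shared_counter;   /* Data shared with an interrupt handler. */

/* Updates the shared data with interrupts off, then restores
   whatever state the caller had. */
static void
increment_shared_counter (void)
{
  enum intr_level old_level = intr_disable ();
  shared_counter++;
  intr_set_level (old_level);
}

/* Calls the function above from inside another critical section.
   Interrupts must stay off until the outer section ends. */
static void
increment_twice (void)
{
  enum intr_level old_level = intr_disable ();
  increment_shared_counter ();
  assert (intr_get_level () == INTR_OFF);
  increment_shared_counter ();
  intr_set_level (old_level);
}
```

Had the inner function called @func{intr_enable} instead of restoring
the saved level, the outer critical section would have been cut short.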

@node Semaphores
@subsection Semaphores

A @dfn{semaphore} is a nonnegative integer together with two operators
that manipulate it atomically, which are:

@itemize @bullet
@item
``Down'' or ``P'': wait for the value to become positive, then
decrement it.

@item
``Up'' or ``V'': increment the value (and wake up one waiting thread,
if any).
@end itemize

A semaphore initialized to 0 may be used to wait for an event
that will happen exactly once. For example, suppose thread @var{A}
starts another thread @var{B} and wants to wait for @var{B} to signal
that some activity is complete. @var{A} can create a semaphore
initialized to 0, pass it to @var{B} as it starts it, and then
``down'' the semaphore. When @var{B} finishes its activity, it
``ups'' the semaphore. This works regardless of whether @var{A}
``downs'' the semaphore or @var{B} ``ups'' it first.

A semaphore initialized to 1 is typically used for controlling access
to a resource. Before a block of code starts using the resource, it
``downs'' the semaphore, then after it is done with the resource it
``ups'' the semaphore. In such a case a lock, described below, may be
more appropriate.

Semaphores can also be initialized to values larger than 1. These are
rarely used.

Semaphores were invented by Edsger Dijkstra and first used in the THE
operating system (@bibref{Dijkstra}).

PintOS's semaphore type and operations are declared in
@file{threads/synch.h}.
|
|
|
|
@deftp {Type} {struct semaphore}
|
|
Represents a semaphore.
|
|
@end deftp
|
|
|
|
@deftypefun void sema_init (struct semaphore *@var{sema}, unsigned @var{value})
|
|
Initializes @var{sema} as a new semaphore with the given initial
|
|
@var{value}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void sema_down (struct semaphore *@var{sema})
|
|
Executes the ``down'' or ``P'' operation on @var{sema}, waiting for
|
|
its value to become positive and then decrementing it by one.
|
|
@end deftypefun
|
|
|
|
@deftypefun bool sema_try_down (struct semaphore *@var{sema})
|
|
Tries to execute the ``down'' or ``P'' operation on @var{sema},
|
|
without waiting. Returns true if @var{sema}
|
|
was successfully decremented, or false if it was already
|
|
zero and thus could not be decremented without waiting. Calling this
|
|
function in a
|
|
tight loop wastes CPU time, so use @func{sema_down} or find a
|
|
different approach instead.
|
|
@end deftypefun
|
|
|
|
@deftypefun void sema_up (struct semaphore *@var{sema})
|
|
Executes the ``up'' or ``V'' operation on @var{sema},
|
|
incrementing its value. If any threads are waiting on
|
|
@var{sema}, wakes one of them up.
|
|
|
|
Unlike most synchronization primitives, @func{sema_up} may be called
|
|
inside an external interrupt handler (@pxref{External Interrupt
|
|
Handling}).
|
|
@end deftypefun
|
|
|
|
Semaphores are internally implemented by disabling interrupts
|
|
(@pxref{Disabling Interrupts}) and blocking and unblocking threads
|
|
(@func{thread_block} and @func{thread_unblock}). Each semaphore maintains
|
|
a list of waiting threads, using the linked list
|
|
implementation in @file{lib/kernel/list.c}.
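
To make the counter semantics concrete, here is a minimal single-threaded
model of the value bookkeeping only.  It is a sketch, not PintOS code:
the names @code{toy_sema}, @code{toy_sema_init},
@code{toy_sema_try_down}, and @code{toy_sema_up} are hypothetical, and
the model deliberately leaves out the waiter list, the interrupt
disabling, and the blocking that a real implementation needs.

@example
#include <stdbool.h>

/* Hypothetical toy model: tracks only the semaphore's value. */
struct toy_sema @{
  unsigned value;
@};

/* Sets the initial value, as sema_init does for a real semaphore. */
void toy_sema_init (struct toy_sema *s, unsigned value) @{
  s->value = value;
@}

/* Non-blocking "down": succeeds only if the value is positive. */
bool toy_sema_try_down (struct toy_sema *s) @{
  if (s->value == 0)
    return false;
  s->value--;
  return true;
@}

/* "Up": increments the value.  A real "up" would also wake one waiter. */
void toy_sema_up (struct toy_sema *s) @{
  s->value++;
@}
@end example

With an initial value of 1, the first @code{toy_sema_try_down} call
succeeds and the second fails until @code{toy_sema_up} is called, which
mirrors the mutual-exclusion use described above.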
|
|
|
|
@node Locks
|
|
@subsection Locks
|
|
|
|
A @dfn{lock} is like a semaphore with an initial value of 1
|
|
(@pxref{Semaphores}). A lock's equivalent of ``up'' is called
|
|
``release'', and the ``down'' operation is called ``acquire''.
|
|
|
|
Compared to a semaphore, a lock has one added restriction: only the
|
|
thread that acquires a lock, called the lock's ``owner'', is allowed to
|
|
release it. If this restriction is a problem, it's a good sign that a
|
|
semaphore should be used, instead of a lock.
|
|
|
|
Locks in PintOS are not ``recursive,'' that is, it is an error for the
|
|
thread currently holding a lock to try to acquire that lock.
|
|
|
|
Lock types and functions are declared in @file{threads/synch.h}.
|
|
|
|
@deftp {Type} {struct lock}
|
|
Represents a lock.
|
|
@end deftp
|
|
|
|
@deftypefun void lock_init (struct lock *@var{lock})
|
|
Initializes @var{lock} as a new lock.
|
|
The lock is not initially owned by any thread.
|
|
@end deftypefun
|
|
|
|
@deftypefun void lock_acquire (struct lock *@var{lock})
|
|
Acquires @var{lock} for the current thread, first waiting for
|
|
any current owner to release it if necessary.
|
|
@end deftypefun
|
|
|
|
@deftypefun bool lock_try_acquire (struct lock *@var{lock})
|
|
Tries to acquire @var{lock} for use by the current thread, without
|
|
waiting. Returns true if successful, false if the lock is already
|
|
owned. Calling this function in a tight loop is a bad idea because it
|
|
wastes CPU time, so use @func{lock_acquire} instead.
|
|
@end deftypefun
|
|
|
|
@deftypefun void lock_release (struct lock *@var{lock})
|
|
Releases @var{lock}, which the current thread must own.
|
|
@end deftypefun
|
|
|
|
@deftypefun bool lock_held_by_current_thread (const struct lock *@var{lock})
|
|
Returns true if the running thread owns @var{lock},
|
|
false otherwise.
|
|
There is no function to test whether an arbitrary thread owns a lock,
|
|
because the answer could change before the caller could act on it.
|
|
@end deftypefun
|
|
|
|
@node Monitors
|
|
@subsection Monitors
|
|
|
|
A @dfn{monitor} is a higher-level form of synchronization than a
|
|
semaphore or a lock. A monitor consists of data being synchronized,
|
|
plus a lock, called the @dfn{monitor lock}, and one or more
|
|
@dfn{condition variables}. Before it accesses the protected data, a
|
|
thread first acquires the monitor lock. It is then said to be ``in the
|
|
monitor''. While in the monitor, the thread has control over all the
|
|
protected data, which it may freely examine or modify. When access to
|
|
the protected data is complete, it releases the monitor lock.
|
|
|
|
Condition variables allow code in the monitor to wait for a condition to
|
|
become true. Each condition variable is associated with an abstract
|
|
condition, e.g.@: ``some data has arrived for processing'' or ``over 10
|
|
seconds have passed since the user's last keystroke''.  When code in the
|
|
monitor needs to wait for a condition to become true, it ``waits'' on
|
|
the associated condition variable, which releases the lock and waits for
|
|
the condition to be signaled. If, on the other hand, it has caused one
|
|
of these conditions to become true, it ``signals'' the condition to wake
|
|
up one waiter, or ``broadcasts'' the condition to wake all of them.
|
|
|
|
The theoretical framework for monitors was laid out by C.@: A.@: R.@:
|
|
Hoare (@bibref{Hoare}). Their practical usage was later elaborated in a
|
|
paper on the Mesa operating system (@bibref{Lampson}).
|
|
|
|
Condition variable types and functions are declared in
|
|
@file{threads/synch.h}.
|
|
|
|
@deftp {Type} {struct condition}
|
|
Represents a condition variable.
|
|
@end deftp
|
|
|
|
@deftypefun void cond_init (struct condition *@var{cond})
|
|
Initializes @var{cond} as a new condition variable.
|
|
@end deftypefun
|
|
|
|
@deftypefun void cond_wait (struct condition *@var{cond}, struct lock *@var{lock})
|
|
Atomically releases @var{lock} (the monitor lock) and waits for
|
|
@var{cond} to be signaled by some other piece of code. After
|
|
@var{cond} is signaled, reacquires @var{lock} before returning.
|
|
@var{lock} must be held before calling this function.
|
|
|
|
Sending a signal and waking up from a wait are not an atomic operation.
|
|
Thus, typically, @func{cond_wait}'s caller must recheck the condition
|
|
after the wait completes and, if necessary, wait again. See the next
|
|
section for an example.
|
|
@end deftypefun
|
|
|
|
@deftypefun void cond_signal (struct condition *@var{cond}, struct lock *@var{lock})
|
|
If any threads are waiting on @var{cond} (protected by monitor lock
|
|
@var{lock}), then this function wakes up one of them. If no threads are
|
|
waiting, returns without performing any action.
|
|
@var{lock} must be held before calling this function.
|
|
@end deftypefun
|
|
|
|
@deftypefun void cond_broadcast (struct condition *@var{cond}, struct lock *@var{lock})
|
|
Wakes up all threads, if any, waiting on @var{cond} (protected by
|
|
monitor lock @var{lock}). @var{lock} must be held before calling this
|
|
function.
|
|
@end deftypefun
|
|
|
|
@subsubsection Monitor Example
|
|
|
|
The classical example of a monitor is handling a buffer into which one
|
|
or more
|
|
``producer'' threads write characters and out of which one or more
|
|
``consumer'' threads read characters. To implement this we need,
|
|
besides the monitor lock, two condition variables which we will call
|
|
@var{not_full} and @var{not_empty}:
|
|
|
|
@example
char buf[BUF_SIZE];         /* @r{Buffer.} */
size_t n = 0;               /* @r{0 <= n <= @var{BUF_SIZE}: # of characters in buffer.} */
size_t head = 0;            /* @r{@var{buf} index of next char to write (mod @var{BUF_SIZE}).} */
size_t tail = 0;            /* @r{@var{buf} index of next char to read (mod @var{BUF_SIZE}).} */
struct lock lock;           /* @r{Monitor lock.} */
struct condition not_empty; /* @r{Signaled when the buffer is not empty.} */
struct condition not_full;  /* @r{Signaled when the buffer is not full.} */

@dots{}@r{initialize the lock and condition variables}@dots{}

void put (char ch) @{
  lock_acquire (&lock);
  while (n == BUF_SIZE) @{            /* @r{Can't add to @var{buf} as long as it's full.} */
    cond_wait (&not_full, &lock);
  @}
  buf[head++ % BUF_SIZE] = ch;       /* @r{Add @var{ch} to @var{buf}.} */
  n++;
  cond_signal (&not_empty, &lock);   /* @r{@var{buf} can't be empty anymore.} */
  lock_release (&lock);
@}

char get (void) @{
  char ch;
  lock_acquire (&lock);
  while (n == 0) @{                   /* @r{Can't read @var{buf} as long as it's empty.} */
    cond_wait (&not_empty, &lock);
  @}
  ch = buf[tail++ % BUF_SIZE];       /* @r{Get @var{ch} from @var{buf}.} */
  n--;
  cond_signal (&not_full, &lock);    /* @r{@var{buf} can't be full anymore.} */
  lock_release (&lock);
  return ch;
@}
@end example
|
|
|
|
Note that @code{BUF_SIZE} must divide evenly into @code{SIZE_MAX + 1}
|
|
for the above code to be completely correct. Otherwise, it will fail
|
|
the first time @code{head} wraps around to 0. In practice,
|
|
@code{BUF_SIZE} would ordinarily be a power of 2.
|
|
|
|
@node Optimization Barriers
|
|
@subsection Optimization Barriers
|
|
|
|
@c We should try to come up with a better example.
|
|
@c Perhaps something with a linked list?
|
|
|
|
An @dfn{optimization barrier} is a special statement that prevents the
|
|
compiler from making assumptions about the state of memory across the
|
|
barrier. The compiler will not reorder reads or writes of variables
|
|
across the barrier or assume that a variable's value is unmodified
|
|
across the barrier, except for local variables whose address is never
|
|
taken. In PintOS, @file{threads/synch.h} defines the @code{barrier()}
|
|
macro as an optimization barrier.
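
On GCC-compatible compilers, a barrier of this kind is commonly written
as an empty inline assembly statement that declares a memory clobber.
The following is a minimal sketch under that assumption.  The names
@code{toy_barrier} and @code{toy_busy_wait} are hypothetical, and the
macro is not claimed to match the definition in
@file{threads/synch.h}.

@example
/* Assumes GCC or Clang.  The empty asm emits no instructions, but the
   "memory" clobber tells the compiler that memory may have changed. */
#define toy_barrier() __asm__ __volatile__ ("" : : : "memory")

/* Counts LOOPS down to zero with a barrier in each iteration and
   returns the number of iterations actually executed. */
long toy_busy_wait (long loops) @{
  long done = 0;
  while (loops-- > 0) @{
    toy_barrier ();
    done++;
  @}
  return done;
@}
@end example

Because the assembly statement is marked volatile, the compiler must
keep it in every iteration, so it cannot delete the loop.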
|
|
|
|
One reason to use an optimization barrier is when data can change
|
|
asynchronously, without the compiler's knowledge, e.g.@: by another
|
|
thread or an interrupt handler. The @func{too_many_loops} function in
|
|
@file{devices/timer.c} is an example. This function starts out by
|
|
busy-waiting in a loop until a timer tick occurs:
|
|
|
|
@example
/* Wait for a timer tick. */
int64_t start = ticks;
while (ticks == start) @{
  barrier ();
@}
@end example
|
|
|
|
@noindent
|
|
Without an optimization barrier in the loop, the compiler could
|
|
conclude that the loop would never terminate, because @code{start} and
|
|
@code{ticks} start out equal and the loop itself never changes them.
|
|
It could then ``optimize'' the function into an infinite loop, which
|
|
would definitely be undesirable.
|
|
|
|
Optimization barriers can be used to avoid other compiler
|
|
optimizations. The @func{busy_wait} function, also in
|
|
@file{devices/timer.c}, is an example. It contains this loop:
|
|
|
|
@example
while (loops-- > 0) @{
  barrier ();
@}
@end example
|
|
|
|
@noindent
|
|
The goal of this loop is to busy-wait by counting @code{loops} down
|
|
from its original value to 0. Without the barrier, the compiler could
|
|
delete the loop entirely, because it produces no useful output and has
|
|
no side effects. The barrier forces the compiler to pretend that the
|
|
loop body has an important effect.
|
|
|
|
Finally, optimization barriers can be used to force the ordering of
|
|
memory reads or writes. For example, suppose we add a ``feature''
|
|
that, whenever a timer interrupt occurs, the character in global
|
|
variable @code{timer_put_char} is printed on the console, but only if
|
|
global Boolean variable @code{timer_do_put} is true. The best way to
|
|
set up @samp{x} to be printed is then to use an optimization barrier,
|
|
like this:
|
|
|
|
@example
timer_put_char = 'x';
barrier ();
timer_do_put = true;
@end example
|
|
|
|
Without the barrier, the code is buggy because the compiler is free to
|
|
reorder operations when it doesn't see a reason to keep them in the
|
|
same order. In this case, the compiler doesn't know that the order of
|
|
assignments is important, so its optimizer is permitted to exchange
|
|
their order. There's no telling whether it will actually do this, and
|
|
it is possible that passing the compiler different optimization flags
|
|
or using a different version of the compiler will produce different
|
|
behaviour.
|
|
|
|
Another solution is to disable interrupts around the assignments.
|
|
This does not prevent reordering, but it prevents the interrupt
|
|
handler from intervening between the assignments. It also has the
|
|
extra runtime cost of disabling and re-enabling interrupts:
|
|
|
|
@example
enum intr_level old_level = intr_disable ();
timer_put_char = 'x';
timer_do_put = true;
intr_set_level (old_level);
@end example
|
|
|
|
A third solution is to mark the declarations of
|
|
@code{timer_put_char} and @code{timer_do_put} as @samp{volatile}. This
|
|
keyword tells the compiler that the variables are externally observable
|
|
and restricts its latitude for optimization. However, the semantics of
|
|
@samp{volatile} are not well-defined, so it is not a good general
|
|
solution. The base PintOS code does not use @samp{volatile} at all.
|
|
|
|
The following is @emph{not} a solution, because locks neither prevent
|
|
interrupts nor prevent the compiler from reordering the code within the
|
|
region where the lock is held:
|
|
|
|
@example
lock_acquire (&timer_lock);         /* INCORRECT CODE */
timer_put_char = 'x';
timer_do_put = true;
lock_release (&timer_lock);
@end example
|
|
|
|
The compiler treats invocation of any function defined externally,
|
|
that is, in another source file, as a limited form of optimization
|
|
barrier. Specifically, the compiler assumes that any externally
|
|
defined function may access any statically or dynamically allocated
|
|
data and any local variable whose address is taken. This often means
|
|
that explicit barriers can be omitted. It is one reason that PintOS
|
|
contains few explicit barriers.
|
|
|
|
A function defined in the same source file, or in a header included by
|
|
the source file, cannot be relied upon as an optimization barrier.
|
|
This applies even to invocation of a function before its
|
|
definition, because the compiler may read and parse the entire source
|
|
file before performing optimization.
|
|
|
|
@node Interrupt Handling
|
|
@section Interrupt Handling
|
|
|
|
An @dfn{interrupt} notifies the CPU of some event. Much of the work
|
|
of an operating system relates to interrupts in one way or another.
|
|
For our purposes, we classify interrupts into two broad categories:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
@dfn{Internal interrupts}, that is, interrupts caused directly by CPU
|
|
instructions. System calls, attempts at invalid memory access
|
|
(@dfn{page faults}), and attempts to divide by zero are some activities
|
|
that cause internal interrupts. Because they are caused by CPU
|
|
instructions, internal interrupts are @dfn{synchronous} or synchronized
|
|
with CPU instructions. @func{intr_disable} does not disable internal
|
|
interrupts.
|
|
|
|
@item
|
|
@dfn{External interrupts}, that is, interrupts originating outside the
|
|
CPU. These interrupts come from hardware devices such as the system
|
|
timer, keyboard, serial ports, and disks. External interrupts are
|
|
@dfn{asynchronous}, meaning that their delivery is not
|
|
synchronized with instruction execution. Handling of external interrupts
|
|
can be postponed with @func{intr_disable} and related functions
|
|
(@pxref{Disabling Interrupts}).
|
|
@end itemize
|
|
|
|
The CPU treats both classes of interrupts largely the same way,
|
|
so PintOS has common infrastructure to handle both classes.
|
|
The following section describes this
|
|
common infrastructure. The sections after that give the specifics of
|
|
external and internal interrupts.
|
|
|
|
If you haven't already read chapter 3, ``Basic Execution Environment,''
|
|
in @bibref{IA32-v1}, it is recommended that you do so now. You might
|
|
also want to skim chapter 5, ``Interrupt and Exception Handling,'' in
|
|
@bibref{IA32-v3a}.
|
|
|
|
@menu
|
|
* Interrupt Infrastructure::
|
|
* Internal Interrupt Handling::
|
|
* External Interrupt Handling::
|
|
@end menu
|
|
|
|
@node Interrupt Infrastructure
|
|
@subsection Interrupt Infrastructure
|
|
|
|
When an interrupt occurs, the CPU saves
|
|
its most essential state on the current stack (determined by esp)
|
|
and jumps to an interrupt handler routine.
|
|
The 80@var{x}86 architecture supports 256
|
|
interrupts, numbered 0 through 255, each with an independent
|
|
handler defined in an array called the @dfn{interrupt
|
|
descriptor table} or IDT.
|
|
|
|
In PintOS, @func{intr_init} in @file{threads/interrupt.c} sets up the
|
|
IDT so that each entry points to a unique entry point in
|
|
@file{threads/intr-stubs.S} named @func{intr@var{NN}_stub}, where
|
|
@var{NN} is the interrupt number in
|
|
hexadecimal. Because the CPU doesn't give
|
|
us any other way to find out the interrupt number, this entry point
|
|
pushes the interrupt number on the stack. Then it jumps to
|
|
@func{intr_entry}, which pushes all the registers that the processor
|
|
didn't already push for us, and then calls @func{intr_handler}, which
|
|
brings us back into C in @file{threads/interrupt.c}.
|
|
|
|
The main job of @func{intr_handler} is to call the function
|
|
registered for handling the particular interrupt. (If no
|
|
function is registered, it dumps some information to the console and
|
|
panics.) It also does some extra processing for external
|
|
interrupts (@pxref{External Interrupt Handling}).
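
The dispatch step can be pictured as a table of function pointers
indexed by vector number.  The sketch below is a simplified, hosted-C
model and not the code in @file{threads/interrupt.c}: the names
@code{toy_frame}, @code{toy_register}, @code{toy_dispatch}, and
@code{toy_syscall} are hypothetical, the frame holds only two fields,
and a missing handler is reported by returning false where the kernel
would panic.

@example
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INTR_CNT 256        /* One slot per interrupt vector. */

/* Hypothetical, heavily reduced stand-in for an interrupt frame. */
struct toy_frame @{
  uint32_t vec_no;              /* Interrupt vector number. */
  uint32_t eax;                 /* One saved register, for illustration. */
@};

typedef void toy_handler_func (struct toy_frame *);

/* Registered handlers; a null entry means "none registered". */
static toy_handler_func *toy_handlers[TOY_INTR_CNT];

/* Records HANDLER as the function to call for vector VEC. */
void toy_register (uint8_t vec, toy_handler_func *handler) @{
  toy_handlers[vec] = handler;
@}

/* Calls the handler registered for F's vector.  Returns false if
   there is none (the point where a kernel would panic instead). */
bool toy_dispatch (struct toy_frame *f) @{
  toy_handler_func *handler;
  if (f->vec_no >= TOY_INTR_CNT)
    return false;
  handler = toy_handlers[f->vec_no];
  if (handler == NULL)
    return false;
  handler (f);
  return true;
@}

/* Sample handler: writes a value into the saved register. */
void toy_syscall (struct toy_frame *f) @{
  f->eax = 42;
@}
@end example

After @code{toy_register (0x30, toy_syscall)}, dispatching a frame
whose @code{vec_no} is @code{0x30} runs the handler and leaves 42 in
the frame's @code{eax} field.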
|
|
|
|
When @func{intr_handler} returns, the assembly code in
|
|
@file{threads/intr-stubs.S} restores all the CPU registers saved
|
|
earlier and directs the CPU to return from the interrupt.
|
|
|
|
The following types and functions are common to all interrupts:
|
|
|
|
@deftp {Type} {void intr_handler_func (struct intr_frame *@var{frame})}
|
|
This is how an interrupt handler function must be declared. Its @var{frame}
|
|
argument (see below) allows it to determine the cause of the interrupt
|
|
and the state of the thread that was interrupted.
|
|
@end deftp
|
|
|
|
@deftp {Type} {struct intr_frame}
|
|
The stack frame of an interrupt handler, as saved by the CPU, the interrupt
|
|
stubs, and @func{intr_entry}. Its most interesting members are described
|
|
below.
|
|
@end deftp
|
|
|
|
@deftypecv {Member} {@struct{intr_frame}} uint32_t edi
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint32_t esi
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint32_t ebp
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint32_t esp_dummy
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint32_t ebx
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint32_t edx
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint32_t ecx
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint32_t eax
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint16_t es
|
|
@deftypecvx {Member} {@struct{intr_frame}} uint16_t ds
|
|
Register values in the interrupted thread, pushed by @func{intr_entry}.
|
|
The @code{esp_dummy} value isn't actually used (refer to the
|
|
description of @code{PUSHA} in @bibref{IA32-v2b} for details).
|
|
@end deftypecv
|
|
|
|
@deftypecv {Member} {@struct{intr_frame}} uint32_t vec_no
|
|
The interrupt vector number, ranging from 0 to 255.
|
|
@end deftypecv
|
|
|
|
@deftypecv {Member} {@struct{intr_frame}} uint32_t error_code
|
|
The ``error code'' pushed on the stack by the CPU for some internal
|
|
interrupts.
|
|
@end deftypecv
|
|
|
|
@deftypecv {Member} {@struct{intr_frame}} void (*eip) (void)
|
|
The address of the next instruction to be executed by the interrupted
|
|
thread.
|
|
@end deftypecv
|
|
|
|
@deftypecv {Member} {@struct{intr_frame}} {void *} esp
|
|
The interrupted thread's stack pointer.
|
|
@end deftypecv
|
|
|
|
@deftypefun {const char *} intr_name (uint8_t @var{vec})
|
|
Returns the name of the interrupt numbered @var{vec}, or
|
|
@code{"unknown"} if the interrupt has no registered name.
|
|
@end deftypefun
|
|
|
|
@node Internal Interrupt Handling
|
|
@subsection Internal Interrupt Handling
|
|
|
|
Internal interrupts are caused directly by CPU instructions executed by
|
|
the running kernel thread or user process (from task 2 onward). An
|
|
internal interrupt is therefore said to arise in a ``process context.''
|
|
|
|
In an internal interrupt's handler, it can make sense to examine the
|
|
@struct{intr_frame} passed to the interrupt handler, or even to modify
|
|
it. When the interrupt returns, modifications in @struct{intr_frame}
|
|
become changes to the calling thread or process's state. For example,
|
|
the PintOS system call handler returns a value to the user program by
|
|
modifying the saved EAX register (@pxref{System Call Details}).
|
|
|
|
There are no special restrictions on what an internal interrupt
|
|
handler can or can't do. Generally they should run with interrupts
|
|
enabled, just like other code, so they can be preempted by other
|
|
kernel threads. Thus, they do need to synchronize with other threads
|
|
on shared data and other resources (@pxref{Synchronization}). Of course, this
|
|
only makes sense if they are not updating critical system data at the time.
|
|
|
|
Internal interrupt handlers can be invoked recursively. For example,
|
|
the system call handler might cause a page fault while attempting to
|
|
read user memory. Deep recursion would risk overflowing the limited
|
|
kernel stack (@pxref{struct thread}), but should be unnecessary.
|
|
|
|
@deftypefun void intr_register_int (uint8_t @var{vec}, int @var{dpl}, enum intr_level @var{level}, intr_handler_func *@var{handler}, const char *@var{name})
|
|
Registers @var{handler} to be called when internal interrupt numbered
|
|
@var{vec} is triggered. Names the interrupt @var{name} for debugging
|
|
purposes.
|
|
|
|
If @var{level} is @code{INTR_ON}, external interrupts will be processed
|
|
normally during the interrupt handler's execution, which is normally
|
|
desirable. Specifying @code{INTR_OFF} will cause the CPU to disable
|
|
external interrupts when it invokes the interrupt handler. The effect
|
|
is slightly different from calling @func{intr_disable} inside the
|
|
handler, because that leaves a window of one or more CPU instructions in
|
|
which external interrupts are still enabled. This is important for the
|
|
page fault handler; refer to the comments in @file{userprog/exception.c}
|
|
for details.
|
|
|
|
@var{dpl} determines how the interrupt can be invoked. If @var{dpl} is
|
|
0, then the interrupt can be invoked only by kernel threads. Otherwise
|
|
@var{dpl} should be 3, which allows user processes to invoke the
|
|
interrupt with an explicit INT instruction. The value of @var{dpl}
|
|
doesn't affect user processes' ability to invoke the interrupt
|
|
indirectly, e.g.@: an invalid memory reference will cause a page fault
|
|
regardless of @var{dpl}.
|
|
@end deftypefun
|
|
|
|
@node External Interrupt Handling
|
|
@subsection External Interrupt Handling
|
|
|
|
External interrupts are caused by events outside the CPU.
|
|
They are asynchronous, so they can be invoked at any time that
|
|
interrupts have not been disabled. We say that an external interrupt
|
|
runs in an ``interrupt context.''
|
|
|
|
In an external interrupt, the @struct{intr_frame} passed to the
|
|
handler is not very meaningful. It describes the state of the thread
|
|
or process that was interrupted, but there is no way to predict which
|
|
one that is. It is possible, although rarely useful, to examine it, but
|
|
modifying it is a recipe for disaster.
|
|
|
|
Only one external interrupt may be processed at a time. Neither
|
|
internal nor external interrupts may nest within an external interrupt
|
|
handler. Thus, an external interrupt's handler must run with interrupts
|
|
disabled (@pxref{Disabling Interrupts}).
|
|
|
|
An external interrupt handler must not sleep or yield, which rules out
|
|
calling @func{lock_acquire}, @func{thread_yield}, and many other
|
|
functions. Sleeping in interrupt context would effectively put the
|
|
interrupted thread to sleep, too, until the interrupt handler was again
|
|
scheduled and returned. This would be unfair to the unlucky thread, and
|
|
it would deadlock if the handler were waiting for the sleeping thread
|
|
to, e.g., release a lock.
|
|
|
|
An external interrupt handler effectively monopolizes the machine and delays
|
|
all other activities. Therefore, external interrupt handlers should complete
|
|
as quickly as they can. Anything that requires a significant amount of CPU
|
|
time should instead run in a kernel thread, possibly one that the interrupt
|
|
triggers using a synchronization primitive.
|
|
|
|
External interrupts are controlled by a
|
|
pair of devices outside the CPU called @dfn{programmable interrupt
|
|
controllers}, @dfn{PICs} for short. When @func{intr_init} sets up the
|
|
CPU's IDT, it also initializes the PICs for interrupt handling. The
|
|
PICs also must be ``acknowledged'' at the end of processing for each
|
|
external interrupt. @func{intr_handler} takes care of that by calling
|
|
@func{pic_end_of_interrupt}, which properly signals the PICs.
|
|
|
|
The following functions relate to external interrupts:
|
|
|
|
@deftypefun void intr_register_ext (uint8_t @var{vec}, intr_handler_func *@var{handler}, const char *@var{name})
|
|
Registers @var{handler} to be called when external interrupt numbered
|
|
@var{vec} is triggered. Names the interrupt @var{name} for debugging
|
|
purposes. The handler will run with interrupts disabled.
|
|
@end deftypefun
|
|
|
|
@deftypefun bool intr_context (void)
|
|
Returns true if we are running in an interrupt context, otherwise
|
|
false. Mainly used in functions that might sleep
|
|
or that otherwise should not be called from interrupt context, in this
|
|
form:
|
|
@example
ASSERT (!intr_context ());
@end example
|
|
@end deftypefun
|
|
|
|
@deftypefun void intr_yield_on_return (void)
|
|
When called in an interrupt context, causes @func{thread_yield} to be
|
|
called just before the interrupt returns. Used
|
|
in the timer interrupt handler when a thread's time slice expires, to
|
|
cause a new thread to be scheduled.
|
|
@end deftypefun
|
|
|
|
@node Memory Allocation
|
|
@section Memory Allocation
|
|
|
|
PintOS contains two memory allocators, one that allocates memory in
|
|
units of a page, and one that can allocate blocks of any size.
|
|
|
|
@menu
|
|
* Page Allocator::
|
|
* Block Allocator::
|
|
@end menu
|
|
|
|
@node Page Allocator
|
|
@subsection Page Allocator
|
|
|
|
The page allocator declared in @file{threads/palloc.h} allocates
|
|
memory in units of a page. It is most often used to allocate memory
|
|
one page at a time, but it can also allocate multiple contiguous pages
|
|
at once.
|
|
|
|
The page allocator divides the memory it allocates into two pools,
|
|
called the kernel and user pools. By default, each pool gets half of
|
|
system memory above @w{1 MB}, but the division can be changed with the
|
|
@option{-ul} kernel
|
|
command line
|
|
option (@pxref{Why PAL_USER?}). An allocation request draws from one
|
|
pool or the other. If one pool becomes empty, the other may still
|
|
have free pages. The user pool should be used for allocating memory
|
|
for user processes and the kernel pool for all other allocations.
|
|
This will only become important starting with task 3. Until then,
|
|
all allocations should be made from the kernel pool.
|
|
|
|
Each pool's usage is tracked with a bitmap, one bit per page in
|
|
the pool. A request to allocate @var{n} pages scans the bitmap
|
|
for @var{n} consecutive bits set to
|
|
false, indicating that those pages are free, and then sets those bits
|
|
to true to mark them as used. This is a ``first fit'' allocation
|
|
strategy (@pxref{Wilson}).
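
As a rough illustration of first fit, the sketch below scans an array
of per-page flags for the first run of @var{n} free pages and marks
that run as used.  It is not the PintOS implementation: the name
@code{toy_first_fit} and the @code{TOY_NO_FIT} sentinel are
hypothetical, and a plain @code{bool} array stands in for the packed
bitmap.

@example
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_FIT ((size_t) -1)    /* Returned when no run is found. */

/* Scans USED[0..PAGE_CNT) for the first run of N consecutive free
   (false) entries.  On success marks the run as used (true) and
   returns its starting index; otherwise returns TOY_NO_FIT. */
size_t toy_first_fit (bool *used, size_t page_cnt, size_t n) @{
  size_t start, i;

  if (n == 0 || n > page_cnt)
    return TOY_NO_FIT;

  for (start = 0; start + n <= page_cnt; start++) @{
    for (i = 0; i < n; i++)
      if (used[start + i])
        break;
    if (i == n) @{
      /* Found a free run: claim it. */
      for (i = 0; i < n; i++)
        used[start + i] = true;
      return start;
    @}
  @}
  return TOY_NO_FIT;
@}
@end example

With pages 0, 2, and 5 of an 8-page pool in use, a request for 2 pages
returns index 3.  A request for 3 pages fails even though 5 pages are
free, because no 3 of the free pages are adjacent.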
|
|
|
|
The page allocator is subject to fragmentation. That is, it may not
|
|
be possible to allocate @var{n} contiguous pages even though @var{n}
|
|
or more pages are free, because the free pages are separated by used
|
|
pages. In fact, in pathological cases it may be impossible to
|
|
allocate 2 contiguous pages even though half of the pool's pages are free.
|
|
Single-page requests can't fail due to fragmentation, so
|
|
requests for multiple contiguous pages should be limited as much as
|
|
possible.
|
|
|
|
Pages may not be allocated from interrupt context, but they may be
|
|
freed.
|
|
|
|
When a page is freed, all of its bytes are cleared to @t{0xcc}, as
|
|
a debugging aid (@pxref{Debugging Tips}).
|
|
|
|
Page allocator types and functions are described below:
|
|
|
|
@deftypefun {void *} palloc_get_page (enum palloc_flags @var{flags})
|
|
@deftypefunx {void *} palloc_get_multiple (enum palloc_flags @var{flags}, size_t @var{page_cnt})
|
|
Obtains and returns one page, or @var{page_cnt} contiguous pages,
|
|
respectively. Returns a null pointer if the pages cannot be allocated.
|
|
|
|
The @var{flags} argument may be any combination of the following flags:
|
|
|
|
@defvr {Page Allocator Flag} @code{PAL_ASSERT}
|
|
If the pages cannot be allocated, panic the kernel. This is only
|
|
appropriate during kernel initialization. User processes
|
|
should never be permitted to panic the kernel.
|
|
@end defvr
|
|
|
|
@defvr {Page Allocator Flag} @code{PAL_ZERO}
|
|
Zero all the bytes in the allocated pages before returning them. If not
|
|
set, the contents of newly allocated pages are unpredictable.
|
|
@end defvr
|
|
|
|
@defvr {Page Allocator Flag} @code{PAL_USER}
|
|
Obtain the pages from the user pool. If not set, pages are allocated
|
|
from the kernel pool.
|
|
@end defvr
|
|
@end deftypefun
|
|
|
|
@deftypefun void palloc_free_page (void *@var{page})
|
|
@deftypefunx void palloc_free_multiple (void *@var{pages}, size_t @var{page_cnt})
|
|
Frees one page, or @var{page_cnt} contiguous pages, respectively,
|
|
starting at @var{pages}. All of the pages must have been obtained using
|
|
@func{palloc_get_page} or @func{palloc_get_multiple}.
|
|
@end deftypefun
|
|
|
|
@node Block Allocator
|
|
@subsection Block Allocator
|
|
|
|
The block allocator, declared in @file{threads/malloc.h}, can allocate
|
|
blocks of any size. It is layered on top of the page allocator
|
|
described in the previous section. Blocks returned by the block
|
|
allocator are obtained from the kernel pool.
|
|
|
|
The block allocator uses two different strategies for allocating memory.
|
|
The first strategy applies to blocks that are 1 kB or smaller
|
|
(one-fourth of the page size). These allocations are rounded up to the
|
|
nearest power of 2, or 16 bytes, whichever is larger. Then they are
|
|
grouped into a page used only for allocations of that size.
|
|
|
|
The second strategy applies to blocks larger than 1 kB.
|
|
These allocations (plus a small amount of overhead) are rounded up to
|
|
the nearest page in size, and then the block allocator requests that
|
|
number of contiguous pages from the page allocator.
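
The two rounding rules can be sketched as follows.  This illustrates
only the arithmetic described above, assuming a 4 kB page.  The names
@code{toy_small_block_size} and @code{toy_large_page_cnt} are
hypothetical, and the per-allocation overhead is passed in as a
parameter because its exact size is not stated here.

@example
#include <stddef.h>

#define TOY_PGSIZE 4096         /* Assumed page size in bytes. */

/* Block size used for a small request of SIZE bytes (SIZE <= 1024):
   the smallest power of 2 that is >= SIZE, but never below 16. */
size_t toy_small_block_size (size_t size) @{
  size_t block = 16;
  while (block < size)
    block *= 2;
  return block;
@}

/* Number of contiguous pages needed for a large request of SIZE
   bytes plus OVERHEAD bytes of bookkeeping, rounded up to whole
   pages. */
size_t toy_large_page_cnt (size_t size, size_t overhead) @{
  return (size + overhead + TOY_PGSIZE - 1) / TOY_PGSIZE;
@}
@end example

For example, a 17-byte request occupies a 32-byte block.  A 4,096-byte
request with any nonzero overhead needs two contiguous pages, which is
why allocations near or above the page size are the ones most exposed
to fragmentation.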
|
|
|
|
In either case, the difference between the requested allocation size
|
|
and the actual block size is wasted. A real operating system would
|
|
carefully tune its allocator to minimize this waste, but this is
|
|
unimportant in an instructional system like PintOS.
|
|
|
|
As long as a page can be obtained from the page allocator, small
|
|
allocations always succeed. Most small allocations do not require a
|
|
new page from the page allocator at all, because they are satisfied
|
|
using part of a page already allocated. However, large allocations
|
|
always require calling into the page allocator, and any allocation
|
|
that needs more than one contiguous page can fail due to fragmentation,
|
|
as already discussed in the previous section. Thus, you should
|
|
minimize the number of large allocations in your code, especially
|
|
those over approximately 4 kB each.
|
|
|
|
When a block is freed, all of its bytes are cleared to @t{0xcc}, as
|
|
a debugging aid (@pxref{Debugging Tips}).
|
|
|
|
The block allocator may not be called from interrupt context.
|
|
|
|
The block allocator functions are described below. Their interfaces are
|
|
the same as the standard C library functions of the same names.
|
|
|
|
@deftypefun {void *} malloc (size_t @var{size})
|
|
Obtains and returns a new block, from the kernel pool, at least
|
|
@var{size} bytes long. Returns a null pointer if @var{size} is zero or
|
|
if memory is not available.
|
|
@end deftypefun
|
|
|
|
@deftypefun {void *} calloc (size_t @var{a}, size_t @var{b})
|
|
Obtains and returns a new block, from the kernel pool, at least
|
|
@code{@var{a} * @var{b}} bytes long. The block's contents will be
|
|
cleared to zeros. Returns a null pointer if @var{a} or @var{b} is zero
|
|
or if insufficient memory is available.
|
|
@end deftypefun
|
|
|
|
@deftypefun {void *} realloc (void *@var{block}, size_t @var{new_size})
|
|
Attempts to resize @var{block} to @var{new_size} bytes, possibly moving
|
|
it in the process. If successful, returns the new block, in which case
|
|
the old block must no longer be accessed. On failure, returns a null
|
|
pointer, and the old block remains valid.
|
|
|
|
A call with @var{block} null is equivalent to @func{malloc}. A call
|
|
with @var{new_size} zero is equivalent to @func{free}.
|
|
@end deftypefun
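
Because a failed @func{realloc} leaves the old block valid, the result should not be assigned directly over the only pointer to that block. The sketch below shows one way to handle this. The helper name @code{grow_buffer} is hypothetical, and the code is written against the standard C library interface, which these functions share.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Resizes *BUF to NEW_SIZE bytes, which must be nonzero (a
   zero size would free the block).  On success, updates *BUF
   and returns true.  On failure, leaves *BUF untouched, since
   the old block is still valid, and returns false. */
bool
grow_buffer (char **buf, size_t new_size)
{
  char *p = realloc (*buf, new_size);
  if (p == NULL)
    return false;
  *buf = p;
  return true;
}
```

Writing @code{buf = realloc (buf, n)} instead would lose the old block on failure, leaking it.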
|
|
|
|
@deftypefun void free (void *@var{block})
|
|
Frees @var{block}, which must have been previously returned by
|
|
@func{malloc}, @func{calloc}, or @func{realloc} (and not yet freed).
|
|
@end deftypefun
|
|
|
|
@node Virtual Addresses
|
|
@section Virtual Addresses
|
|
|
|
A 32-bit virtual address can be divided into a 20-bit @dfn{page number}
|
|
and a 12-bit @dfn{page offset} (or just @dfn{offset}), like this:
|
|
|
|
@example
|
|
@group
|
|
31 12 11 0
|
|
+-------------------+-----------+
|
|
| Page Number | Offset |
|
|
+-------------------+-----------+
|
|
Virtual Address
|
|
@end group
|
|
@end example
|
|
|
|
Header @file{threads/vaddr.h} defines these functions and macros for
|
|
working with virtual addresses:
|
|
|
|
@defmac PGSHIFT
|
|
@defmacx PGBITS
|
|
The bit index (0) and number of bits (12) of the offset part of a
|
|
virtual address, respectively.
|
|
@end defmac
|
|
|
|
@defmac PGMASK
|
|
A bit mask with the bits in the page offset set to 1, the rest set to 0
|
|
(@t{0xfff}).
|
|
@end defmac
|
|
|
|
@defmac PGSIZE
|
|
The page size in bytes (4,096).
|
|
@end defmac
|
|
|
|
@deftypefun unsigned pg_ofs (const void *@var{va})
|
|
Extracts and returns the page offset in virtual address @var{va}.
|
|
@end deftypefun
|
|
|
|
@deftypefun uintptr_t pg_no (const void *@var{va})
|
|
Extracts and returns the page number in virtual address @var{va}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {void *} pg_round_down (const void *@var{va})
|
|
Returns the start of the virtual page that @var{va} points within, that
|
|
is, @var{va} with the page offset set to 0.
|
|
@end deftypefun
|
|
|
|
@deftypefun {void *} pg_round_up (const void *@var{va})
|
|
Returns @var{va} rounded up to the nearest page boundary.
|
|
@end deftypefun
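
The arithmetic behind these macros and functions can be sketched on plain 32-bit integers. The macro values follow the descriptions above. The @code{va_} function names are hypothetical stand-ins that take a @code{uint32_t} rather than a pointer, so that the sketch also compiles on a 64-bit host.

```c
#include <stdint.h>

#define PGSHIFT 0                                  /* First offset bit. */
#define PGBITS 12                                  /* Number of offset bits. */
#define PGSIZE (1u << PGBITS)                      /* Bytes in a page. */
#define PGMASK (((1u << PGBITS) - 1) << PGSHIFT)   /* Page offset bits. */

/* Page offset: the low 12 bits. */
uint32_t
va_ofs (uint32_t va)
{
  return va & PGMASK;
}

/* Page number: the high 20 bits. */
uint32_t
va_no (uint32_t va)
{
  return va >> PGBITS;
}

/* Start of the page containing VA. */
uint32_t
va_round_down (uint32_t va)
{
  return va & ~PGMASK;
}

/* VA rounded up to the nearest page boundary. */
uint32_t
va_round_up (uint32_t va)
{
  return (va + PGSIZE - 1) & ~PGMASK;
}
```

For example, address @t{0x08049abc} has page number @t{0x8049} and offset @t{0xabc}.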
|
|
|
|
Virtual memory in PintOS is divided into two regions: user virtual
|
|
memory and kernel virtual memory (@pxref{Virtual Memory Layout}). The
|
|
boundary between them is @code{PHYS_BASE}:
|
|
|
|
@defmac PHYS_BASE
|
|
Base address of kernel virtual memory. It defaults to @t{0xc0000000} (3
|
|
GB), but it may be changed to any multiple of @t{0x10000000} from
|
|
@t{0x80000000} to @t{0xf0000000}.
|
|
|
|
User virtual memory ranges from virtual address 0 up to
|
|
@code{PHYS_BASE}. Kernel virtual memory occupies the rest of the
|
|
virtual address space, from @code{PHYS_BASE} up to 4 GB.
|
|
@end defmac
|
|
|
|
@deftypefun {bool} is_user_vaddr (const void *@var{va})
|
|
@deftypefunx {bool} is_kernel_vaddr (const void *@var{va})
|
|
Returns true if @var{va} is a user or kernel virtual address,
|
|
respectively, false otherwise.
|
|
@end deftypefun
|
|
|
|
The 80@var{x}86 architecture doesn't provide any way to directly access memory given
|
|
a physical address. This ability is often necessary in an operating
|
|
system kernel, so PintOS works around it by mapping kernel virtual
|
|
memory one-to-one to physical memory. That is, virtual address
|
|
@code{PHYS_BASE} accesses physical address 0, virtual address
|
|
@code{PHYS_BASE} + @t{0x1234} accesses physical address @t{0x1234}, and
|
|
so on up to the size of the machine's physical memory. Thus, adding
|
|
@code{PHYS_BASE} to a physical address obtains a kernel virtual address
|
|
that accesses that address; conversely, subtracting @code{PHYS_BASE}
|
|
from a kernel virtual address obtains the corresponding physical
|
|
address. Header @file{threads/vaddr.h} provides a pair of functions to
|
|
do these translations:
|
|
|
|
@deftypefun {void *} ptov (uintptr_t @var{pa})
|
|
Returns the kernel virtual address corresponding to physical address
|
|
@var{pa}, which should be between 0 and the number of bytes of physical
|
|
memory.
|
|
@end deftypefun
|
|
|
|
@deftypefun {uintptr_t} vtop (void *@var{va})
|
|
Returns the physical address corresponding to @var{va}, which must be a
|
|
kernel virtual address.
|
|
@end deftypefun
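
The following sketch models the user/kernel boundary test and the two translations on plain 32-bit integers. It assumes the default @code{PHYS_BASE} of @t{0xc0000000}. The function names are hypothetical and differ from the real ones because they do not operate on pointers.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHYS_BASE 0xc0000000u   /* Default base of kernel virtual memory. */

/* True if VA lies below PHYS_BASE, that is, in user virtual memory. */
bool
addr_is_user (uint32_t va)
{
  return va < PHYS_BASE;
}

/* True if VA lies at or above PHYS_BASE. */
bool
addr_is_kernel (uint32_t va)
{
  return va >= PHYS_BASE;
}

/* Physical address PA to kernel virtual address. */
uint32_t
phys_to_kvirt (uint32_t pa)
{
  return pa + PHYS_BASE;
}

/* Kernel virtual address VA to physical address.  VA must
   satisfy addr_is_kernel(); this sketch does not check it. */
uint32_t
kvirt_to_phys (uint32_t va)
{
  return va - PHYS_BASE;
}
```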
|
|
|
|
@node Page Table
|
|
@section Page Table
|
|
|
|
The code in @file{pagedir.c} is an abstract interface to the 80@var{x}86
|
|
hardware page table, also called a ``page directory'' by Intel processor
|
|
documentation. The page table interface uses a @code{uint32_t *} to
|
|
represent a page table because this is convenient for accessing its
|
|
internal structure.
|
|
|
|
The sections below describe the page table interface and internals.
|
|
|
|
@menu
|
|
* Page Table Creation Destruction Activation::
|
|
* Page Tables Inspection and Updates::
|
|
* Page Table Accessed and Dirty Bits::
|
|
* Page Table Details::
|
|
@end menu
|
|
|
|
@node Page Table Creation Destruction Activation
|
|
@subsection Creation, Destruction, and Activation
|
|
|
|
These functions create, destroy, and activate page tables. The base
|
|
PintOS code already calls these functions where necessary, so it should
|
|
not be necessary to call them yourself.
|
|
|
|
@deftypefun {uint32_t *} pagedir_create (void)
|
|
Creates and returns a new page table. The new page table contains
|
|
PintOS's normal kernel virtual page mappings, but no user virtual
|
|
mappings.
|
|
|
|
Returns a null pointer if memory cannot be obtained.
|
|
@end deftypefun
|
|
|
|
@deftypefun void pagedir_destroy (uint32_t *@var{pd})
|
|
Frees all of the resources held by @var{pd}, including the page table
|
|
itself and the frames that it maps.
|
|
@end deftypefun
|
|
|
|
@deftypefun void pagedir_activate (uint32_t *@var{pd})
|
|
Activates @var{pd}. The active page table is the one used by the CPU to
|
|
translate memory references.
|
|
@end deftypefun
|
|
|
|
@node Page Tables Inspection and Updates
|
|
@subsection Inspection and Updates
|
|
|
|
These functions examine or update the mappings from pages to frames
|
|
encapsulated by a page table. They work on both active and inactive
|
|
page tables (that is, those for running and suspended processes),
|
|
flushing the TLB as necessary.
|
|
|
|
@deftypefun bool pagedir_set_page (uint32_t *@var{pd}, void *@var{upage}, void *@var{kpage}, bool @var{writable})
|
|
Adds to @var{pd} a mapping from user page @var{upage} to the frame identified
|
|
by kernel virtual address @var{kpage}. If @var{writable} is true, the
|
|
page is mapped read/write; otherwise, it is mapped read-only.
|
|
|
|
User page @var{upage} must not already be mapped in @var{pd}.
|
|
|
|
Kernel page @var{kpage} should be a kernel virtual address obtained from
|
|
the user pool with @code{palloc_get_page(PAL_USER)} (@pxref{Why
|
|
PAL_USER?}).
|
|
|
|
Returns true if successful, false on failure. Failure will occur if
|
|
additional memory required for the page table cannot be obtained.
|
|
@end deftypefun
|
|
|
|
@deftypefun {void *} pagedir_get_page (uint32_t *@var{pd}, const void *@var{uaddr})
|
|
Looks up the frame mapped to @var{uaddr} in @var{pd}. Returns the
|
|
kernel virtual address for that frame, if @var{uaddr} is mapped, or a
|
|
null pointer if it is not.
|
|
@end deftypefun
|
|
|
|
@deftypefun void pagedir_clear_page (uint32_t *@var{pd}, void *@var{page})
|
|
Marks @var{page} ``not present'' in @var{pd}. Later accesses to
|
|
the page will fault.
|
|
|
|
Other bits in the page table for @var{page} are preserved, permitting
|
|
the accessed and dirty bits (see the next section) to be checked.
|
|
|
|
This function has no effect if @var{page} is not mapped.
|
|
@end deftypefun
|
|
|
|
@node Page Table Accessed and Dirty Bits
|
|
@subsection Accessed and Dirty Bits
|
|
|
|
80@var{x}86 hardware provides some assistance for implementing page
|
|
replacement algorithms, through a pair of bits in the page table entry
|
|
(PTE) for each page. On any read or write to a page, the CPU sets the
|
|
@dfn{accessed bit} to 1 in the page's PTE, and on any write, the CPU
|
|
sets the @dfn{dirty bit} to 1. The CPU never resets these bits to 0,
|
|
but the OS may do so.
|
|
|
|
Proper interpretation of these bits requires understanding of
|
|
@dfn{aliases}, that is, two (or more) pages that refer to the same
|
|
frame. When an aliased frame is accessed, the accessed and dirty bits
|
|
are updated in only one page table entry (the one for the page used for
|
|
access). The accessed and dirty bits for the other aliases are not
|
|
updated.
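
One consequence is that a check of a single PTE can miss writes made through another alias. A possible way to account for this, sketched below, is to consult the PTEs of all known aliases. The function name @code{frame_is_dirty} is hypothetical, and the bit values assume the accessed and dirty bits are bits 5 and 6 of a PTE.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_A 0x20u   /* Accessed bit (bit 5). */
#define PTE_D 0x40u   /* Dirty bit (bit 6). */

/* Returns true if the frame was written through either of two
   aliases, given the PTE for each alias. */
bool
frame_is_dirty (uint32_t pte_alias1, uint32_t pte_alias2)
{
  return ((pte_alias1 | pte_alias2) & PTE_D) != 0;
}
```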
|
|
|
|
@xref{Accessed and Dirty Bits}, on applying these bits in implementing
|
|
page replacement algorithms.
|
|
|
|
@deftypefun bool pagedir_is_dirty (uint32_t *@var{pd}, const void *@var{page})
|
|
@deftypefunx bool pagedir_is_accessed (uint32_t *@var{pd}, const void *@var{page})
|
|
Returns true if page directory @var{pd} contains a page table entry for
|
|
@var{page} that is marked dirty (or accessed). Otherwise,
|
|
returns false.
|
|
@end deftypefun
|
|
|
|
@deftypefun void pagedir_set_dirty (uint32_t *@var{pd}, const void *@var{page}, bool @var{value})
|
|
@deftypefunx void pagedir_set_accessed (uint32_t *@var{pd}, const void *@var{page}, bool @var{value})
|
|
If page directory @var{pd} has a page table entry for @var{page}, then
|
|
its dirty (or accessed) bit is set to @var{value}.
|
|
@end deftypefun
|
|
|
|
@node Page Table Details
|
|
@subsection Page Table Details
|
|
|
|
The functions provided with PintOS are sufficient to implement the
|
|
tasks. However, you may still find it worthwhile to understand the
|
|
hardware page table format, so we'll go into a little detail in this
|
|
section.
|
|
|
|
@menu
|
|
* Page Table Structure::
|
|
* Page Table Entry Format::
|
|
* Page Directory Entry Format::
|
|
@end menu
|
|
|
|
@node Page Table Structure
|
|
@subsubsection Structure
|
|
|
|
The top-level paging data structure is a page called the ``page
|
|
directory'' (PD) arranged as an array of 1,024 32-bit page directory
|
|
entries (PDEs), each of which represents 4 MB of virtual memory. Each
|
|
PDE may point to the physical address of another page called a
|
|
``page table'' (PT) arranged, similarly, as an array of 1,024
|
|
32-bit page table entries (PTEs), each of which translates a single 4
|
|
kB virtual page to a physical page.
|
|
|
|
Translation of a virtual address into a physical address follows
|
|
the three-step process illustrated in the diagram
|
|
below:@footnote{Actually, virtual to physical translation on the
|
|
80@var{x}86 architecture occurs via an intermediate ``linear
|
|
address,'' but PintOS (and most modern 80@var{x}86 OSes) set up the CPU
|
|
so that linear and virtual addresses are one and the same. Thus, you
|
|
can effectively ignore this CPU feature.}
|
|
|
|
@enumerate 1
|
|
@item
|
|
The most-significant 10 bits of the virtual address (bits 22@dots{}31)
|
|
index the page directory. If the PDE is marked ``present,'' the
|
|
physical address of a page table is read from the PDE thus obtained.
|
|
If the PDE is marked ``not present'' then a page fault occurs.
|
|
|
|
@item
|
|
The next 10 bits of the virtual address (bits 12@dots{}21) index
|
|
the page table. If the PTE is marked ``present,'' the physical
|
|
address of a data page is read from the PTE thus obtained. If the PTE
|
|
is marked ``not present'' then a page fault occurs.
|
|
|
|
@item
|
|
The least-significant 12 bits of the virtual address (bits 0@dots{}11)
|
|
are added to the data page's physical base address, yielding the final
|
|
physical address.
|
|
@end enumerate
|
|
|
|
@example
|
|
@group
|
|
31 22 21 12 11 0
|
|
+----------------------+----------------------+----------------------+
|
|
| Page Directory Index | Page Table Index | Page Offset |
|
|
+----------------------+----------------------+----------------------+
|
|
| | |
|
|
_______/ _______/ _____/
|
|
/ / /
|
|
/ Page Directory / Page Table / Data Page
|
|
/ .____________. / .____________. / .____________.
|
|
|1,023|____________| |1,023|____________| | |____________|
|
|
|1,022|____________| |1,022|____________| | |____________|
|
|
|1,021|____________| |1,021|____________| \__\|____________|
|
|
|1,020|____________| |1,020|____________| /|____________|
|
|
| | | | | | | |
|
|
| | | \____\| |_ | |
|
|
| | . | /| . | \ | . |
|
|
\____\| . |_ | . | | | . |
|
|
/| . | \ | . | | | . |
|
|
| . | | | . | | | . |
|
|
| | | | | | | |
|
|
|____________| | |____________| | |____________|
|
|
4|____________| | 4|____________| | |____________|
|
|
3|____________| | 3|____________| | |____________|
|
|
2|____________| | 2|____________| | |____________|
|
|
1|____________| | 1|____________| | |____________|
|
|
0|____________| \__\0|____________| \____\|____________|
|
|
/ /
|
|
@end group
|
|
@end example
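
The three steps can also be sketched in C against a small simulated physical memory. Everything here is illustrative: @code{phys_mem}, @code{NFRAMES}, and @code{translate} are hypothetical names, and the function returns false where real hardware would raise a page fault.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRESENT 0x1u   /* Present bit of a PDE or PTE. */
#define NFRAMES 8      /* Size of the simulated physical memory. */

/* Simulated physical memory: frame N starts at physical
   address N << 12 and holds 1,024 32-bit words. */
static uint32_t phys_mem[NFRAMES][1024];

/* Translates VA using the page directory stored in frame
   PD_FRAME.  On success, stores the physical address in *PA
   and returns true.  Every frame number stored in a present
   entry must be less than NFRAMES. */
bool
translate (uint32_t pd_frame, uint32_t va, uint32_t *pa)
{
  /* Step 1: bits 22...31 index the page directory. */
  uint32_t pde = phys_mem[pd_frame][va >> 22];
  if (!(pde & PRESENT))
    return false;

  /* Step 2: bits 12...21 index the page table. */
  uint32_t pte = phys_mem[pde >> 12][(va >> 12) & 0x3ff];
  if (!(pte & PRESENT))
    return false;

  /* Step 3: bits 0...11 are added to the data page's base. */
  *pa = (pte & 0xfffff000u) + (va & 0xfff);
  return true;
}
```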
|
|
|
|
PintOS provides some macros and functions that are useful for working
|
|
with raw page tables:
|
|
|
|
@defmac PTSHIFT
|
|
@defmacx PTBITS
|
|
The starting bit index (12) and number of bits (10), respectively, in a
|
|
page table index.
|
|
@end defmac
|
|
|
|
@defmac PTMASK
|
|
A bit mask with the bits in the page table index set to 1 and the rest
|
|
set to 0 (@t{0x3ff000}).
|
|
@end defmac
|
|
|
|
@defmac PTSPAN
|
|
The number of bytes of virtual address space that a single page table
|
|
page covers (4,194,304 bytes, or 4 MB).
|
|
@end defmac
|
|
|
|
@defmac PDSHIFT
|
|
@defmacx PDBITS
|
|
The starting bit index (22) and number of bits (10), respectively, in a
|
|
page directory index.
|
|
@end defmac
|
|
|
|
@defmac PDMASK
|
|
A bit mask with the bits in the page directory index set to 1 and other
|
|
bits set to 0 (@t{0xffc00000}).
|
|
@end defmac
|
|
|
|
@deftypefun uintptr_t pd_no (const void *@var{va})
|
|
@deftypefunx uintptr_t pt_no (const void *@var{va})
|
|
Returns the page directory index or page table index, respectively, for
|
|
virtual address @var{va}. These functions are defined in
|
|
@file{threads/pte.h}.
|
|
@end deftypefun
|
|
|
|
@deftypefun unsigned pg_ofs (const void *@var{va})
|
|
Returns the page offset for virtual address @var{va}. This function is
|
|
defined in @file{threads/vaddr.h}.
|
|
@end deftypefun
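
The values quoted above all follow from the shift and bit counts. The sketch below derives them and extracts both indexes from a 32-bit integer. The names @code{dir_index} and @code{table_index} are hypothetical stand-ins for the pointer-based functions.

```c
#include <stdint.h>

#define PGBITS 12                                  /* Offset bits. */
#define PTSHIFT PGBITS                             /* First page table bit. */
#define PTBITS 10                                  /* Page table index bits. */
#define PTMASK (((1u << PTBITS) - 1) << PTSHIFT)   /* 0x003ff000 */
#define PTSPAN (1u << PTBITS << PGBITS)            /* 4 MB per page table. */
#define PDSHIFT (PTSHIFT + PTBITS)                 /* First directory bit. */
#define PDBITS 10                                  /* Directory index bits. */
#define PDMASK (((1u << PDBITS) - 1) << PDSHIFT)   /* 0xffc00000 */

/* Page directory index of VA (bits 22...31). */
uint32_t
dir_index (uint32_t va)
{
  return va >> PDSHIFT;
}

/* Page table index of VA (bits 12...21). */
uint32_t
table_index (uint32_t va)
{
  return (va & PTMASK) >> PTSHIFT;
}
```

For instance, address @t{0xc0001234} has page directory index 768 and page table index 1.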
|
|
|
|
@node Page Table Entry Format
|
|
@subsubsection Page Table Entry Format
|
|
|
|
You do not need to understand the PTE format to do the PintOS
|
|
tasks, unless you wish to incorporate the page table into your
|
|
supplemental page table (@pxref{Managing the Supplemental Page Table}).
|
|
|
|
The actual format of a page table entry is summarized below. For
|
|
complete information, refer to section 3.7, ``Page Translation Using
|
|
32-Bit Physical Addressing,'' in @bibref{IA32-v3a}.
|
|
|
|
@example
|
|
@group
|
|
31 12 11 9 6 5 2 1 0
|
|
+---------------------------------------+----+----+-+-+---+-+-+-+
|
|
| Physical Address | AVL| |D|A| |U|W|P|
|
|
+---------------------------------------+----+----+-+-+---+-+-+-+
|
|
@end group
|
|
@end example
|
|
|
|
Some more information on each bit is given below. The names are
|
|
@file{threads/pte.h} macros that represent the bits' values:
|
|
|
|
@defmac PTE_P
|
|
Bit 0, the ``present'' bit. When this bit is 1, the
|
|
other bits are interpreted as described below. When this bit is 0, any
|
|
attempt to access the page will page fault. The remaining bits are then
|
|
not used by the CPU and may be used by the OS for any purpose.
|
|
@end defmac
|
|
|
|
@defmac PTE_W
|
|
Bit 1, the ``read/write'' bit. When it is 1, the page
|
|
is writable. When it is 0, write attempts will page fault.
|
|
@end defmac
|
|
|
|
@defmac PTE_U
|
|
Bit 2, the ``user/supervisor'' bit. When it is 1, user
|
|
processes may access the page. When it is 0, only the kernel may access
|
|
the page (user accesses will page fault).
|
|
|
|
PintOS clears this bit in PTEs for kernel virtual memory, to prevent
|
|
user processes from accessing them.
|
|
@end defmac
|
|
|
|
@defmac PTE_A
|
|
Bit 5, the ``accessed'' bit. @xref{Page Table Accessed
|
|
and Dirty Bits}.
|
|
@end defmac
|
|
|
|
@defmac PTE_D
|
|
Bit 6, the ``dirty'' bit. @xref{Page Table Accessed and
|
|
Dirty Bits}.
|
|
@end defmac
|
|
|
|
@defmac PTE_AVL
|
|
Bits 9@dots{}11, available for operating system use.
|
|
PintOS, as provided, does not use them and sets them to 0.
|
|
@end defmac
|
|
|
|
@defmac PTE_ADDR
|
|
Bits 12@dots{}31, the top 20 bits of the physical address of a frame.
|
|
The low 12 bits of the frame's address are always 0.
|
|
@end defmac
|
|
|
|
The other bits are either reserved or uninteresting in a PintOS context and
|
|
should be set to@tie{}0.
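
As a sketch of how these bits combine, the code below decodes a PTE value. The macro values are derived from the bit positions listed above. The helper names @code{pte_user_can_write} and @code{pte_frame_addr} are hypothetical and are not part of @file{threads/pte.h}.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P 0x1u             /* Bit 0: present. */
#define PTE_W 0x2u             /* Bit 1: read/write. */
#define PTE_U 0x4u             /* Bit 2: user/supervisor. */
#define PTE_A 0x20u            /* Bit 5: accessed. */
#define PTE_D 0x40u            /* Bit 6: dirty. */
#define PTE_AVL 0xe00u         /* Bits 9...11: available to the OS. */
#define PTE_ADDR 0xfffff000u   /* Bits 12...31: frame address. */

/* True if a user process may write the page: the PTE must be
   present, writable, and marked for user access. */
bool
pte_user_can_write (uint32_t pte)
{
  uint32_t need = PTE_P | PTE_W | PTE_U;
  return (pte & need) == need;
}

/* Physical address of the frame that PTE points to. */
uint32_t
pte_frame_addr (uint32_t pte)
{
  return pte & PTE_ADDR;
}
```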
|
|
|
|
Header @file{threads/pte.h} defines three functions for working with
|
|
page table entries:
|
|
|
|
@deftypefun uint32_t pte_create_kernel (uint32_t *@var{page}, bool @var{writable})
|
|
Returns a page table entry that points to @var{page}, which should be a
|
|
kernel virtual address. The PTE's present bit will be set. It will be
|
|
marked for kernel-only access. If @var{writable} is true, the PTE will
|
|
also be marked read/write; otherwise, it will be read-only.
|
|
@end deftypefun
|
|
|
|
@deftypefun uint32_t pte_create_user (uint32_t *@var{page}, bool @var{writable})
|
|
Returns a page table entry that points to @var{page}, which should be
|
|
the kernel virtual address of a frame in the user pool (@pxref{Why
|
|
PAL_USER?}). The PTE's present bit will be set and it will be marked to
|
|
allow user-mode access. If @var{writable} is true, the PTE will also be
|
|
marked read/write; otherwise, it will be read-only.
|
|
@end deftypefun
|
|
|
|
@deftypefun {void *} pte_get_page (uint32_t @var{pte})
|
|
Returns the kernel virtual address for the frame that @var{pte} points
|
|
to. The @var{pte} may be present or not-present; if it is not-present
|
|
then the pointer returned is only meaningful if the address bits in the PTE
|
|
actually represent a physical address.
|
|
@end deftypefun
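
The construction of a PTE can be sketched as follows. This is a model of the behavior described above, not the code in @file{threads/pte.h}: the function names are hypothetical, addresses are plain 32-bit integers, and @code{PHYS_BASE} is assumed to have its default value.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHYS_BASE 0xc0000000u   /* Assumed default. */
#define PTE_P 0x1u              /* Present. */
#define PTE_W 0x2u              /* Read/write. */
#define PTE_U 0x4u              /* User/supervisor. */

/* Builds a present, kernel-only PTE for the page-aligned
   kernel virtual address KVA. */
uint32_t
make_kernel_pte (uint32_t kva, bool writable)
{
  uint32_t frame_pa = kva - PHYS_BASE;   /* Kernel virtual to physical. */
  return frame_pa | PTE_P | (writable ? PTE_W : 0);
}

/* Same, but also allows user-mode access. */
uint32_t
make_user_pte (uint32_t kva, bool writable)
{
  return make_kernel_pte (kva, writable) | PTE_U;
}
```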
|
|
|
|
@node Page Directory Entry Format
|
|
@subsubsection Page Directory Entry Format
|
|
|
|
Page directory entries have the same format as PTEs, except that the
|
|
physical address points to a page table page instead of a frame. Header
|
|
@file{threads/pte.h} defines two functions for working with page
|
|
directory entries:
|
|
|
|
@deftypefun uint32_t pde_create (uint32_t *@var{pt})
|
|
Returns a page directory entry that points to @var{pt}, which should be the
|
|
kernel virtual address of a page table page. The PDE's present bit will
|
|
be set, it will be marked to allow user-mode access, and it will be
|
|
marked read/write.
|
|
@end deftypefun
|
|
|
|
@deftypefun {uint32_t *} pde_get_pt (uint32_t @var{pde})
|
|
Returns the kernel virtual address for the page table page that
|
|
@var{pde}, which must be marked present, points to.
|
|
@end deftypefun
|
|
|
|
@node Linked List
|
|
@section Linked List
|
|
|
|
PintOS provides a (doubly) linked list data structure in @file{lib/kernel/list.c}.
|
|
When used, the header file @file{lib/kernel/list.h} needs to be included with @code{#include <list.h>}.
|
|
This is often already done for you, as PintOS already uses this linked list structure in several places
|
|
(such as the @code{ready_list} in @file{src/threads/thread.c} and semaphore @code{waiters} in @file{src/threads/synch.h}).
|
|
|
|
@menu
|
|
* List Data Types::
|
|
* Basic List Functions::
|
|
* List Traversal Functions::
|
|
* List Insertion Functions::
|
|
* List Removal Functions::
|
|
* Ordered List Operations::
|
|
* List Example::
|
|
* List Auxiliary Data::
|
|
* List Synchronisation::
|
|
@end menu
|
|
|
|
@node List Data Types
|
|
@subsection Data Types
|
|
|
|
The PintOS implementation of a doubly linked list does not require use of dynamically allocated memory.
|
|
Instead, each structure that is a potential list element must embed a @code{struct list_elem} member.
|
|
All of the list functions operate on these @code{struct list_elem}'s.
|
|
The @code{list_entry} macro allows conversion from a @code{struct list_elem} back to a structure object that contains it.
|
|
|
|
A linked list is represented by @struct{list}.
|
|
|
|
@deftp {Type} {struct list}
|
|
Represents an entire linked list. The actual members of @struct{list} are ``opaque.''
|
|
That is, code that uses a linked list should not access @struct{list} members directly, nor should it need to.
|
|
Instead, use list functions and macros.
|
|
@end deftp
|
|
|
|
Internally, these lists have two sentinel elements: the ``head'' just before the first element and the ``tail'' just after the last element.
|
|
The @var{prev} link of the head sentinel is null, as is the @var{next} link of the tail sentinel.
|
|
Their other two links point toward each other via the interior elements of the list.
|
|
|
|
An empty list looks like this:
|
|
|
|
@example
|
|
@group
|
|
+------+ +------+
|
|
<---| head |<--->| tail |--->
|
|
+------+ +------+
|
|
@end group
|
|
@end example
|
|
|
|
A list with two elements in it looks like this:
|
|
|
|
@example
|
|
@group
|
|
+------+ +-------+ +-------+ +------+
|
|
<---| head |<--->| 1 |<--->| 2 |<--->| tail |--->
|
|
+------+ +-------+ +-------+ +------+
|
|
@end group
|
|
@end example
|
|
|
|
The symmetry of this arrangement eliminates a lot of special cases in list processing.
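
To see why the sentinels help, consider this minimal sketch of a list built the same way. The types and functions (@code{struct elem}, @code{struct mini_list}, and the @code{mini_} functions) are hypothetical and much simpler than the code in @file{lib/kernel/list.c}. Note that insertion and removal contain no checks for an empty list or for the first or last element.

```c
#include <stdbool.h>
#include <stddef.h>

struct elem
  {
    struct elem *prev, *next;
  };

struct mini_list
  {
    struct elem head, tail;   /* Sentinels. */
  };

/* Makes L an empty list: head and tail point at each other. */
void
mini_init (struct mini_list *l)
{
  l->head.prev = NULL;
  l->head.next = &l->tail;
  l->tail.prev = &l->head;
  l->tail.next = NULL;
}

/* Inserts E just before BEFORE, which may be an interior
   element or the tail sentinel. */
void
mini_insert (struct elem *before, struct elem *e)
{
  e->prev = before->prev;
  e->next = before;
  before->prev->next = e;
  before->prev = e;
}

/* Removes E from its list and returns the element after it. */
struct elem *
mini_remove (struct elem *e)
{
  e->prev->next = e->next;
  e->next->prev = e->prev;
  return e->next;
}

/* True if L has no interior elements. */
bool
mini_empty (const struct mini_list *l)
{
  return l->head.next == &l->tail;
}
```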
|
|
|
|
@page
|
|
The linked list operates on elements of type @struct{list_elem}.
|
|
|
|
@deftp {Type} {struct list_elem}
|
|
Embed a @struct{list_elem} member in the structure you want to include in a linked list.
|
|
Like @struct{list}, @struct{list_elem} is opaque.
|
|
All functions for operating on linked list elements actually take and return pointers to @struct{list_elem},
|
|
not pointers to your linked list's real element type.
|
|
@end deftp
|
|
|
|
You will often need to obtain a @struct{list_elem} given a real element of the linked list, and vice versa.
|
|
Given a real element of the linked list, you may use the @samp{&} operator to obtain a pointer to its @struct{list_elem}.
|
|
Use the @code{list_entry()} macro to go the other direction.
|
|
|
|
@deftypefn {Macro} {@var{type} *} list_entry (struct list_elem *@var{elem}, @var{type}, @var{member})
|
|
Returns a pointer to the structure that @var{elem}, a pointer to a @struct{list_elem}, is embedded within.
|
|
You must provide @var{type}, the name of the structure that @var{elem} is inside, and @var{member}, the name of the member in @var{type} that @var{elem} points to.
|
|
|
|
For example, suppose @code{l} is a @code{struct list_elem *} variable that points to a @struct{thread} member (of type @struct{list_elem}) named @code{l_elem}.
|
|
Then, @code{list_entry@tie{}(l, struct thread, l_elem)} yields the address of the @struct{thread} that @code{l} points within.
|
|
@end deftypefn
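
A conversion of this kind is usually built on @code{offsetof}. The sketch below shows the idea with hypothetical names (@code{entry_of}, @code{struct elem_demo}, @code{struct thread_demo}). It is a simplified model and not the definition found in @file{lib/kernel/list.h}.

```c
#include <stddef.h>

struct elem_demo
  {
    struct elem_demo *prev, *next;
  };

struct thread_demo
  {
    int tid;
    struct elem_demo l_elem;   /* Embedded list element. */
  };

/* Converts ELEM, a pointer to the MEMBER field of a TYPE, into
   a pointer to the enclosing TYPE by subtracting the member's
   byte offset within the structure. */
#define entry_of(ELEM, TYPE, MEMBER) \
  ((TYPE *) ((char *) (ELEM) - offsetof (TYPE, MEMBER)))

/* Returns the tid of the thread_demo that L is embedded in. */
int
tid_from_elem (struct elem_demo *l)
{
  struct thread_demo *t = entry_of (l, struct thread_demo, l_elem);
  return t->tid;
}
```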
|
|
|
|
@xref{List Example}, for an example.
|
|
|
|
@node Basic List Functions
|
|
@subsection Basic Functions
|
|
|
|
These functions create linked lists and inspect their properties.
|
|
|
|
@deftypefun void list_init (struct list *@var{list})
|
|
Initializes @var{list} as an empty linked list.
|
|
@end deftypefun
|
|
|
|
@deftypefun size_t list_size (struct list *@var{list})
|
|
Returns the number of elements currently stored in @var{list}.
|
|
@end deftypefun
|
|
|
|
@deftypefun bool list_empty (struct list *@var{list})
|
|
Returns true if @var{list} currently contains no elements,
|
|
false if @var{list} contains at least one element.
|
|
@end deftypefun
|
|
|
|
@node List Traversal Functions
|
|
@subsection Traversal Functions
|
|
|
|
Each of these functions allows for the traversal of a linked list.
|
|
|
|
@deftypefun {struct list_elem *} list_head (struct list *@var{list})
|
|
@deftypefunx {struct list_elem *} list_rend (struct list *@var{list})
|
|
Returns the head sentinel of @var{list}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_tail (struct list *@var{list})
|
|
@deftypefunx {struct list_elem *} list_end (struct list *@var{list})
|
|
Returns the tail sentinel of @var{list}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_begin (struct list *@var{list})
|
|
Returns the first element of @var{list}, or the tail sentinel if the list is empty.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_rbegin (struct list *@var{list})
|
|
Returns the last element of @var{list}, or the head sentinel if the list is empty.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_front (struct list *@var{list})
|
|
Returns the front element in @var{list}.
|
|
Undefined behaviour if @var{list} is empty.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_back (struct list *@var{list})
|
|
Returns the last element in @var{list}.
|
|
Undefined behaviour if @var{list} is empty.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_next (struct list_elem *@var{elem})
|
|
Returns the element after @var{elem} in its list.
|
|
If @var{elem} is the last element in its list, returns the list tail sentinel.
|
|
Results are undefined if @var{elem} is itself a list tail sentinel.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_prev (struct list_elem *@var{elem})
|
|
Returns the element before @var{elem} in its list.
|
|
If @var{elem} is the first element in its list, returns the list head sentinel.
|
|
Results are undefined if @var{elem} is itself a list head sentinel.
|
|
@end deftypefun
|
|
|
|
@node List Insertion Functions
|
|
@subsection Insertion Functions
|
|
|
|
Each of these functions allows for the insertion of an element into a linked list.
|
|
|
|
@deftypefun void list_push_front (struct list *@var{list}, struct list_elem *@var{elem})
|
|
Inserts @var{elem} at the beginning of @var{list}, so that it becomes the first element in @var{list}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void list_push_back (struct list *@var{list}, struct list_elem *@var{elem})
|
|
Inserts @var{elem} at the end of @var{list}, so that it becomes the last element in @var{list}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void list_insert (struct list_elem *@var{before}, struct list_elem *@var{elem})
|
|
Inserts @var{elem} just before @var{before}, which may be either an interior element or a tail sentinel of a list.
|
|
The latter case is equivalent to @code{list_push_back}.
|
|
Undefined behaviour if @var{elem} is already in the list.
|
|
@end deftypefun
|
|
|
|
@deftypefun void list_splice (struct list_elem *@var{before}, struct list_elem *@var{first}, struct list_elem *@var{last})
|
|
Removes elements @var{first} through @var{last} (exclusive) from their current list,
|
|
then inserts them just before @var{before}, which may be either an interior element or a tail sentinel.
|
|
@end deftypefun
|
|
|
|
@node List Removal Functions
|
|
@subsection Removal Functions
|
|
|
|
Each of these functions allows for the removal of an element from a linked list.
|
|
|
|
@deftypefun {struct list_elem *} list_remove (struct list_elem *@var{elem})
|
|
Removes @var{elem} from its list and returns the element that followed it.
|
|
Undefined behaviour if @var{elem} is not in a list.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_pop_front (struct list *@var{list})
|
|
Removes the first element from @var{list} and returns it.
|
|
Undefined behaviour if @var{list} is empty.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_pop_back (struct list *@var{list})
|
|
Removes the last element from @var{list} and returns it.
|
|
Undefined behaviour if @var{list} is empty.
|
|
@end deftypefun
|
|
|
|
A list element must be treated very carefully after removing it from its list.
|
|
Calling @code{list_next(elem)} or @code{list_prev(elem)} will return the item that was previously after or before @var{elem}, respectively,
|
|
but @code{list_prev(list_next(elem))} is no longer @var{elem}!
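
This matters most when removing elements during a traversal. Because removal returns the element that followed the removed one, a loop can continue from that return value instead of asking the removed element for its successor. The sketch below models the pattern on a hypothetical stand-alone element type, with @code{unlink_elem} playing the role of @func{list_remove}.

```c
#include <stddef.h>

struct elem
  {
    struct elem *prev, *next;
    int value;
  };

/* Unlinks E and returns its successor.  E's own prev and next
   links are left unchanged, so they still point into the list
   even though the list no longer points back at E. */
struct elem *
unlink_elem (struct elem *e)
{
  e->prev->next = e->next;
  e->next->prev = e->prev;
  return e->next;
}

/* Removes every element with a negative value between the
   HEAD and TAIL sentinels.  Returns the number removed. */
size_t
remove_negatives (struct elem *head, struct elem *tail)
{
  size_t removed = 0;
  struct elem *e = head->next;
  while (e != tail)
    {
      if (e->value < 0)
        {
          e = unlink_elem (e);   /* Continue from the successor. */
          removed++;
        }
      else
        e = e->next;
    }
  return removed;
}
```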
|
|
|
|
@node Ordered List Operations
|
|
@subsection Ordered List Operations
|
|
|
|
The PintOS list implementation includes some functions specifically designed for working with ordered lists.
|
|
Each of these functions relies on the definition of an ordering function.
|
|
|
|
@deftypefun {typedef bool} list_less_func (const struct list_elem *@var{a}, const struct list_elem *@var{b}, void *@var{aux})
|
|
Compares the value of two list elements @var{a} and @var{b}, given auxiliary data @var{aux}.
|
|
Returns true if @var{a} is less than @var{b}, or false if @var{a} is greater than or equal to @var{b}.
|
|
@end deftypefun
|
|
|
|
@xref{List Auxiliary Data}, for an explanation of @var{aux}.
|
|
|
|
A list can then be sorted into an ordered list.
|
|
|
|
@deftypefun void list_sort (struct list *@var{list}, list_less_func *@var{less}, void *@var{aux})
|
|
Sorts @var{list} according to @var{less} given auxiliary data @var{aux}.
|
|
@end deftypefun
|
|
|
|
Each of these functions allows operations over an ordered list.
|
|
|
|
@deftypefun void list_insert_ordered (struct list *@var{list}, struct list_elem *@var{elem}, list_less_func *@var{less}, void *@var{aux})
|
|
Inserts @var{elem} in the proper position in @var{list}, which must be sorted according to @var{less} given auxiliary data @var{aux}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void list_unique (struct list *@var{list}, struct list *@var{duplicates}, list_less_func *@var{less}, void *@var{aux})
|
|
Iterates through @var{list} and removes all but the first in each set of adjacent elements that are equal according to @var{less} given auxiliary data @var{aux}.
|
|
If @var{duplicates} is non-null, then the elements removed from @var{list} are appended to @var{duplicates}.
|
|
@end deftypefun
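
A ``less than'' function is enough to decide equality: two values can be treated as equal when neither is less than the other. The sketch below illustrates that idea, and the removal of adjacent duplicates, on a sorted array of @code{int} rather than on a linked list. All names in it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef bool int_less_func (const int *a, const int *b, void *aux);

/* Plain numeric ordering.  AUX is unused. */
bool
int_less (const int *a, const int *b, void *aux)
{
  (void) aux;
  return *a < *b;
}

/* A and B are equal if neither is less than the other. */
bool
equal_by_less (const int *a, const int *b, int_less_func *less, void *aux)
{
  return !less (a, b, aux) && !less (b, a, aux);
}

/* Keeps only the first of each run of adjacent equal values
   in V[0...N-1].  Returns the number of values kept. */
size_t
unique_adjacent (int *v, size_t n, int_less_func *less, void *aux)
{
  if (n == 0)
    return 0;

  size_t kept = 1;
  for (size_t i = 1; i < n; i++)
    if (!equal_by_less (&v[kept - 1], &v[i], less, aux))
      v[kept++] = v[i];
  return kept;
}
```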
|
|
|
|
@deftypefun {struct list_elem *} list_max (struct list *@var{list}, list_less_func *@var{less}, void *@var{aux})
|
|
Returns the element in @var{list} with the largest value according to @var{less} given auxiliary data @var{aux}.
|
|
If there is more than one maximum, returns the one that appears earlier in the list.
|
|
If the list is empty, returns its tail sentinel.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct list_elem *} list_min (struct list *@var{list}, list_less_func *@var{less}, void *@var{aux})
|
|
Returns the element in @var{list} with the smallest value according to @var{less} given auxiliary data @var{aux}.
|
|
If there is more than one minimum, returns the one that appears earlier in the list.
|
|
If the list is empty, returns its tail sentinel.
|
|
@end deftypefun
|
|
|
|
@node List Example
|
|
@subsection List Example
|
|
|
|
Suppose there is a need for a list of @code{struct foo}.
|
|
First define @code{struct foo} to include a @code{struct list_elem} member:
|
|
|
|
@example
|
|
struct foo
|
|
@{
|
|
struct list_elem elem;
|
|
int bar;
|
|
...other members...
|
|
@};
|
|
@end example
|
|
|
|
Then a list of @code{struct foo} can be declared and initialised like this:
|
|
|
|
@example
|
|
struct list foo_list;
|
|
|
|
list_init (&foo_list);
|
|
@end example
|
|
|
|
Now we can add, traverse and remove items from the list as desired.
|
|
|
|
Iteration is a typical situation where it is necessary to convert from a @code{struct list_elem} back to its enclosing structure.
|
|
Here's an example using our @code{foo_list}:
|
|
|
|
@example
|
|
struct list_elem *e;
|
|
|
|
for (e = list_begin (&foo_list); e != list_end (&foo_list);
|
|
e = list_next (e))
|
|
@{
|
|
struct foo *f = list_entry (e, struct foo, elem);
|
|
...do something with f...
|
|
@}
|
|
@end example
|
|
|
|
If we wanted to order our @code{foo_list} in terms of each element's @code{bar} member, then we would need to define a @code{list_less_func} as follows:
|
|
|
|
@example
|
|
static bool sort_foos_by_bar(const struct list_elem *a_,
|
|
const struct list_elem *b_,
|
|
void *aux UNUSED)
|
|
@{
|
|
const struct foo *a = list_entry (a_, struct foo, elem);
|
|
const struct foo *b = list_entry (b_, struct foo, elem);
|
|
|
|
return a->bar < b->bar;
|
|
@}
|
|
@end example
|
|
|
|
We can then order the list by calling:
|
|
|
|
@example
|
|
list_sort (&foo_list, sort_foos_by_bar, NULL);
|
|
@end example
|
|
|
|
@node List Auxiliary Data
|
|
@subsection Auxiliary Data
|
|
|
|
In simple cases like the example above, there's no need for the @var{aux} parameters for the ordered list operations.
|
|
In these cases, just pass a null pointer to the functions for @var{aux}.
|
|
(You'll get a compiler warning if you don't use the @var{aux} parameter,
|
|
but you can turn that off with the @code{UNUSED} macro, as shown in the example, or you can just ignore it.)
|
|
|
|
@var{aux} is useful when you have some property of the elements in the list that is both constant and needed for element comparisons,
|
|
but not stored in the elements themselves.
|
|
For example, if the elements in a list are fixed-length strings,
|
|
but the elements themselves don't indicate what that fixed length is,
|
|
you could pass the length as an @var{aux} parameter.
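As a rough illustration of the fixed-length string case, a comparison function might read the shared length through @var{aux}. This is only a sketch: @code{struct name}, @code{name_less}, and the stand-in types @code{struct list_elem_sketch} and @code{LIST_ENTRY_SKETCH} are hypothetical and exist so that the code compiles outside PintOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the PintOS list types. */
struct list_elem_sketch { struct list_elem_sketch *prev, *next; };
#define LIST_ENTRY_SKETCH(ELEM, STRUCT, MEMBER) \
  ((STRUCT *) ((char *) (ELEM) - offsetof (STRUCT, MEMBER)))

/* An element holding a fixed-length string whose length is not stored here. */
struct name
  {
    struct list_elem_sketch elem;
    const char *text;   /* Only the first *AUX bytes are significant. */
  };

/* Comparison in the style of list_less_func.
   AUX must point to a size_t holding the common string length. */
static bool
name_less (const struct list_elem_sketch *a_,
           const struct list_elem_sketch *b_, void *aux)
{
  const struct name *a = LIST_ENTRY_SKETCH (a_, struct name, elem);
  const struct name *b = LIST_ENTRY_SKETCH (b_, struct name, elem);
  size_t len;

  assert (aux != NULL);
  len = *(const size_t *) aux;
  return memcmp (a->text, b->text, len) < 0;
}
```

The same pointer would be passed as the @var{aux} argument of each ordered list operation, so the length must stay constant for as long as the list relies on that ordering.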
|
|
|
|
@node List Synchronisation
|
|
@subsection Synchronisation
|
|
|
|
The linked list does not do any internal synchronization. It is the caller's responsibility to synchronize calls to list functions.
|
|
In general, any number of functions that examine but do not modify the list, such as @func{list_size} or @func{list_next}, may execute simultaneously.
|
|
However, these functions cannot safely execute at the same time as any function that may modify a given list,
|
|
such as @func{list_insert} or @func{list_remove}, nor may more than one function that can modify a given list execute safely at once.
|
|
|
|
It is also the caller's responsibility to synchronize access to data in list elements.
|
|
How to synchronize access to this data depends on how it is designed and organized, as with any other data structure.
|
|
|
|
@node Hash Table
|
|
@section Hash Table
|
|
|
|
PintOS provides a hash table data structure in @file{lib/kernel/hash.c}.
|
|
To use it you will need to include its header file,
|
|
@file{lib/kernel/hash.h}, with @code{#include <hash.h>}.
|
|
No code provided with PintOS uses the hash table, which means that you
|
|
are free to use it as is, modify its implementation for your own
|
|
purposes, or ignore it, as you wish.
|
|
|
|
Most implementations of the virtual memory task use a hash table to
|
|
translate pages to frames. You may find other uses for hash tables as
|
|
well.
|
|
|
|
@menu
|
|
* Hash Data Types::
|
|
* Basic Hash Functions::
|
|
* Hash Search Functions::
|
|
* Hash Iteration Functions::
|
|
* Hash Table Example::
|
|
* Hash Auxiliary Data::
|
|
* Hash Synchronization::
|
|
@end menu
|
|
|
|
@node Hash Data Types
|
|
@subsection Data Types
|
|
|
|
A hash table is represented by @struct{hash}.
|
|
|
|
@deftp {Type} {struct hash}
|
|
Represents an entire hash table. The actual members of @struct{hash}
|
|
are ``opaque.'' That is, code that uses a hash table should not access
|
|
@struct{hash} members directly, nor should it need to. Instead, use
|
|
hash table functions and macros.
|
|
@end deftp
|
|
|
|
The hash table operates on elements of type @struct{hash_elem}.
|
|
|
|
@deftp {Type} {struct hash_elem}
|
|
Embed a @struct{hash_elem} member in the structure you want to include
|
|
in a hash table. Like @struct{hash}, @struct{hash_elem} is opaque.
|
|
All functions for operating on hash table elements actually take and
|
|
return pointers to @struct{hash_elem}, not pointers to your hash table's
|
|
real element type.
|
|
@end deftp
|
|
|
|
You will often need to obtain a @struct{hash_elem} given a real element
|
|
of the hash table, and vice versa. Given a real element of the hash
|
|
table, you may use the @samp{&} operator to obtain a pointer to its
|
|
@struct{hash_elem}. Use the @code{hash_entry()} macro to go the other
|
|
direction.
|
|
|
|
@deftypefn {Macro} {@var{type} *} hash_entry (struct hash_elem *@var{elem}, @var{type}, @var{member})
|
|
Returns a pointer to the structure that @var{elem}, a pointer to a
|
|
@struct{hash_elem}, is embedded within. You must provide @var{type},
|
|
the name of the structure that @var{elem} is inside, and @var{member},
|
|
the name of the member in @var{type} that @var{elem} points to.
|
|
|
|
For example, suppose @code{h} is a @code{struct hash_elem *} variable
|
|
that points to a @struct{thread} member (of type @struct{hash_elem})
|
|
named @code{h_elem}. Then, @code{hash_entry@tie{}(h, struct thread, h_elem)}
|
|
yields the address of the @struct{thread} that @code{h} points within.
|
|
@end deftypefn
|
|
|
|
@xref{Hash Table Example}, for an example.
|
|
|
|
Each hash table element must contain a key, that is, data that
|
|
identifies and distinguishes elements, which must be unique
|
|
among elements in the hash table. (Elements may
|
|
also contain non-key data that need not be unique.) While an element is
|
|
in a hash table, its key data must not be changed. Instead, if need be,
|
|
remove the element from the hash table, modify its key, then reinsert
|
|
the element.
|
|
|
|
For each hash table, you must write two functions that act on keys: a
|
|
hash function and a comparison function. These functions must match the
|
|
following prototypes:
|
|
|
|
@deftp {Type} {unsigned hash_hash_func (const struct hash_elem *@var{element}, void *@var{aux})}
|
|
Returns a hash of @var{element}'s data, as a value anywhere in the range
|
|
of @code{unsigned int}. The hash of an element should be a
|
|
pseudo-random function of the element's key. It must not depend on
|
|
non-key data in the element or on any non-constant data other than the
|
|
key. PintOS provides the following functions as a suitable basis for
|
|
hash functions.
|
|
|
|
@deftypefun unsigned hash_bytes (const void *@var{buf}, size_t @var{size})
|
|
Returns a hash of the @var{size} bytes starting at @var{buf}. The
|
|
implementation is the general-purpose
|
|
@uref{http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash, Fowler-Noll-Vo
|
|
hash} for 32-bit words.
|
|
@end deftypefun
|
|
|
|
@deftypefun unsigned hash_string (const char *@var{s})
|
|
Returns a hash of null-terminated string @var{s}.
|
|
@end deftypefun
|
|
|
|
@deftypefun unsigned hash_int (int @var{i})
|
|
Returns a hash of integer @var{i}.
|
|
@end deftypefun
|
|
|
|
If your key is a single piece of data of an appropriate type, it is
|
|
sensible for your hash function to directly return the output of one of
|
|
these functions. For multiple pieces of data, you may wish to combine
|
|
the output of more than one call to them using, e.g., the @samp{^}
|
|
(exclusive or)
|
|
operator. Finally, you may entirely ignore these functions and write
|
|
your own hash function from scratch, but remember that your goal is to
|
|
build an operating system kernel, not to design a hash function.
|
|
|
|
@xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
|
|
@end deftp
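To make the description of @func{hash_bytes} and the @samp{^} combination more concrete, here is a stand-alone sketch. It assumes the FNV-1 variant (multiply, then exclusive-or) for 32-bit values; check @file{lib/kernel/hash.c} for the variant PintOS actually uses. The names @code{fnv1_32}, @code{struct key}, and @code{key_hash} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV_32_PRIME 16777619u
#define FNV_32_BASIS 2166136261u

/* 32-bit FNV-1: multiply by the prime, then XOR in the next byte. */
static uint32_t
fnv1_32 (const void *buf_, size_t size)
{
  const unsigned char *buf = buf_;
  uint32_t hash = FNV_32_BASIS;

  assert (buf != NULL || size == 0);
  while (size-- > 0)
    hash = (hash * FNV_32_PRIME) ^ *buf++;
  return hash;
}

/* Hypothetical two-part key: an owner ID and a page number. */
struct key
  {
    int owner;
    int page_no;
  };

/* Hashes each part separately and combines the results with XOR. */
static uint32_t
key_hash (const struct key *k)
{
  return fnv1_32 (&k->owner, sizeof k->owner)
         ^ fnv1_32 (&k->page_no, sizeof k->page_no);
}
```

One caveat of plain exclusive-or is that it is symmetric: in this sketch the keys (1, 2) and (2, 1) hash to the same value, and any key whose two parts are equal hashes to zero. If such keys are common in your data, hashing the whole key in a single call may distribute better.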
|
|
|
|
@deftp {Type} {bool hash_less_func (const struct hash_elem *@var{a}, const struct hash_elem *@var{b}, void *@var{aux})}
|
|
Compares the keys stored in elements @var{a} and @var{b}. Returns
|
|
true if @var{a} is less than @var{b}, false if @var{a} is greater than
|
|
or equal to @var{b}.
|
|
|
|
If two elements compare equal, then they must hash to equal values.
|
|
|
|
@xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
|
|
@end deftp
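Only a ``less than'' function is supplied, so equality of keys has to be derived from it: two elements are equal when neither is less than the other. The sketch below shows that derivation on its own; @code{struct hash_elem_sketch}, @code{less_func_sketch}, @code{elems_equal}, and @code{struct item} are hypothetical names for this illustration and are not part of the PintOS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct hash_elem and hash_less_func. */
struct hash_elem_sketch { struct hash_elem_sketch *next; };
typedef bool less_func_sketch (const struct hash_elem_sketch *a,
                               const struct hash_elem_sketch *b, void *aux);

/* Two elements are treated as equal when neither is less than the other. */
static bool
elems_equal (less_func_sketch *less, const struct hash_elem_sketch *a,
             const struct hash_elem_sketch *b, void *aux)
{
  return !less (a, b, aux) && !less (b, a, aux);
}

struct item
  {
    struct hash_elem_sketch elem;
    int key;
  };

static bool
item_less (const struct hash_elem_sketch *a_,
           const struct hash_elem_sketch *b_, void *aux)
{
  const struct item *a
    = (const struct item *) ((const char *) a_ - offsetof (struct item, elem));
  const struct item *b
    = (const struct item *) ((const char *) b_ - offsetof (struct item, elem));

  (void) aux;
  return a->key < b->key;
}
```

This is also why the comparison function must look only at key data: if it consulted non-key fields, two elements with the same key could compare unequal and both end up in the table.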
|
|
|
|
@xref{Hash Table Example}, for hash and comparison function examples.
|
|
|
|
A few functions accept a pointer to a third kind of
|
|
function as an argument:
|
|
|
|
@deftp {Type} {void hash_action_func (struct hash_elem *@var{element}, void *@var{aux})}
|
|
Performs some kind of action, chosen by the caller, on @var{element}.
|
|
|
|
@xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
|
|
@end deftp
|
|
|
|
@node Basic Hash Functions
|
|
@subsection Basic Functions
|
|
|
|
These functions create, destroy, and inspect hash tables.
|
|
|
|
@deftypefun bool hash_init (struct hash *@var{hash}, hash_hash_func *@var{hash_func}, hash_less_func *@var{less_func}, void *@var{aux})
|
|
Initializes @var{hash} as a hash table with @var{hash_func} as hash
|
|
function, @var{less_func} as comparison function, and @var{aux} as
|
|
auxiliary data.
|
|
Returns true if successful, false on failure. @func{hash_init} calls
|
|
@func{malloc} and fails if memory cannot be allocated.
|
|
|
|
@xref{Hash Auxiliary Data}, for an explanation of @var{aux}, which is
|
|
most often a null pointer.
|
|
@end deftypefun
|
|
|
|
@deftypefun void hash_clear (struct hash *@var{hash}, hash_action_func *@var{action})
|
|
Removes all the elements from @var{hash}, which must have been
|
|
previously initialized with @func{hash_init}.
|
|
|
|
If @var{action} is non-null, then it is called once for each element in
|
|
the hash table, which gives the caller an opportunity to deallocate any
|
|
memory or other resources used by the element. For example, if the hash
|
|
table elements are dynamically allocated using @func{malloc}, then
|
|
@var{action} could @func{free} the element. This is safe because
|
|
@func{hash_clear} will not access the memory in a given hash element
|
|
after calling @var{action} on it. However, @var{action} must not call
|
|
any function that may modify the hash table, such as @func{hash_insert}
|
|
or @func{hash_delete}.
|
|
@end deftypefun
|
|
|
|
@deftypefun void hash_destroy (struct hash *@var{hash}, hash_action_func *@var{action})
|
|
If @var{action} is non-null, calls it for each element in the hash, with
|
|
the same semantics as a call to @func{hash_clear}. Then, frees the
|
|
memory held by @var{hash}. Afterward, @var{hash} must not be passed to
|
|
any hash table function, absent an intervening call to @func{hash_init}.
|
|
@end deftypefun
|
|
|
|
@deftypefun size_t hash_size (struct hash *@var{hash})
|
|
Returns the number of elements currently stored in @var{hash}.
|
|
@end deftypefun
|
|
|
|
@deftypefun bool hash_empty (struct hash *@var{hash})
|
|
Returns true if @var{hash} currently contains no elements,
|
|
false if @var{hash} contains at least one element.
|
|
@end deftypefun
|
|
|
|
@node Hash Search Functions
|
|
@subsection Search Functions
|
|
|
|
Each of these functions searches a hash table for an element that
|
|
compares equal to one provided. Based on the success of the search,
|
|
they perform some action, such as inserting a new element into the hash
|
|
table, or simply return the result of the search.
|
|
|
|
@deftypefun {struct hash_elem *} hash_insert (struct hash *@var{hash}, struct hash_elem *@var{element})
|
|
Searches @var{hash} for an element equal to @var{element}. If none is
|
|
found, inserts @var{element} into @var{hash} and returns a null pointer.
|
|
If the table already contains an element equal to @var{element}, it is
|
|
returned without modifying @var{hash}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct hash_elem *} hash_replace (struct hash *@var{hash}, struct hash_elem *@var{element})
|
|
Inserts @var{element} into @var{hash}. Any element equal to
|
|
@var{element} already in @var{hash} is removed. Returns the element
|
|
removed, or a null pointer if @var{hash} did not contain an element
|
|
equal to @var{element}.
|
|
|
|
The caller is responsible for deallocating any resources associated with
|
|
the returned element, as appropriate. For example, if the hash table
|
|
elements are dynamically allocated using @func{malloc}, then the caller
|
|
must @func{free} the element after it is no longer needed.
|
|
@end deftypefun
|
|
|
|
The element passed to the following functions is only used for hashing
|
|
and comparison purposes. It is never actually inserted into the hash
|
|
table. Thus, only key data in the element needs to be initialized, and
|
|
other data in the element will not be used. It often makes sense to
|
|
declare an instance of the element type as a local variable, initialize
|
|
the key data, and then pass the address of its @struct{hash_elem} to
|
|
@func{hash_find} or @func{hash_delete}. @xref{Hash Table Example}, for
|
|
an example. (Large structures should not be
|
|
allocated as local variables. @xref{struct thread}, for more
|
|
information.)
|
|
|
|
@deftypefun {struct hash_elem *} hash_find (struct hash *@var{hash}, struct hash_elem *@var{element})
|
|
Searches @var{hash} for an element equal to @var{element}. Returns the
|
|
element found, if any, or a null pointer otherwise.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct hash_elem *} hash_delete (struct hash *@var{hash}, struct hash_elem *@var{element})
|
|
Searches @var{hash} for an element equal to @var{element}. If one is
|
|
found, it is removed from @var{hash} and returned. Otherwise, a null
|
|
pointer is returned and @var{hash} is unchanged.
|
|
|
|
The caller is responsible for deallocating any resources associated with
|
|
the returned element, as appropriate. For example, if the hash table
|
|
elements are dynamically allocated using @func{malloc}, then the caller
|
|
must @func{free} the element after it is no longer needed.
|
|
@end deftypefun
|
|
|
|
@node Hash Iteration Functions
|
|
@subsection Iteration Functions
|
|
|
|
These functions allow iterating through the elements in a hash table.
|
|
Two interfaces are supplied. The first requires writing and supplying a
|
|
@code{hash_action_func} to act on each element (@pxref{Hash Data Types}).
|
|
|
|
@deftypefun void hash_apply (struct hash *@var{hash}, hash_action_func *@var{action})
|
|
Calls @var{action} once for each element in @var{hash}, in arbitrary
|
|
order. @var{action} must not call any function that may modify the hash
|
|
table, such as @func{hash_insert} or @func{hash_delete}. @var{action}
|
|
must not modify key data in elements, although it may modify any other
|
|
data.
|
|
@end deftypefun
|
|
|
|
The second interface is based on an ``iterator'' data type.
|
|
Idiomatically, iterators look like:
|
|
|
|
@example
|
|
struct hash_iterator i;
|
|
|
|
hash_first (&i, h);
|
|
while (hash_next (&i))
|
|
@{
|
|
struct foo *f = hash_entry (hash_cur (&i), struct foo, elem);
|
|
@r{@dots{}do something with @i{f}@dots{}}
|
|
@}
|
|
@end example
|
|
|
|
@deftp {Type} {struct hash_iterator}
|
|
Represents a position within a hash table. Calling any function that
|
|
may modify a hash table, such as @func{hash_insert} or
|
|
@func{hash_delete}, invalidates all iterators within that hash table.
|
|
|
|
Like @struct{hash} and @struct{hash_elem}, @struct{hash_iterator} is opaque.
|
|
@end deftp
|
|
|
|
@deftypefun void hash_first (struct hash_iterator *@var{iterator}, struct hash *@var{hash})
|
|
Initializes @var{iterator} to just before the first element in
|
|
@var{hash}.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct hash_elem *} hash_next (struct hash_iterator *@var{iterator})
|
|
Advances @var{iterator} to the next element in @var{hash}, and returns
|
|
that element. Returns a null pointer if no elements remain. After
|
|
@func{hash_next} returns null for @var{iterator}, calling it again
|
|
yields undefined behaviour.
|
|
@end deftypefun
|
|
|
|
@deftypefun {struct hash_elem *} hash_cur (struct hash_iterator *@var{iterator})
|
|
Returns the value most recently returned by @func{hash_next} for
|
|
@var{iterator}. Yields undefined behaviour after @func{hash_first} has
|
|
been called on @var{iterator} but before @func{hash_next} has been
|
|
called for the first time.
|
|
@end deftypefun
|
|
|
|
@node Hash Table Example
|
|
@subsection Hash Table Example
|
|
|
|
Suppose you have a structure, called @struct{page}, that you
|
|
want to put into a hash table. First, define @struct{page} to include a
|
|
@struct{hash_elem} member:
|
|
|
|
@example
|
|
struct page
|
|
@{
|
|
struct hash_elem hash_elem; /* @r{Hash table element.} */
|
|
void *addr; /* @r{Virtual address.} */
|
|
/* @r{@dots{}other members@dots{}} */
|
|
@};
|
|
@end example
|
|
|
|
We write a hash function and a comparison function using @var{addr} as
|
|
the key. A pointer can be hashed based on its bytes, and the @samp{<}
|
|
operator works fine for comparing pointers:
|
|
|
|
@example
|
|
/* @r{Returns a hash value for page @var{p}.} */
|
|
unsigned
|
|
page_hash (const struct hash_elem *p_, void *aux UNUSED)
|
|
@{
|
|
const struct page *p = hash_entry (p_, struct page, hash_elem);
|
|
return hash_bytes (&p->addr, sizeof p->addr);
|
|
@}
|
|
|
|
/* @r{Returns true if page @var{a} precedes page @var{b}.} */
|
|
bool
|
|
page_less (const struct hash_elem *a_, const struct hash_elem *b_,
|
|
void *aux UNUSED)
|
|
@{
|
|
const struct page *a = hash_entry (a_, struct page, hash_elem);
|
|
const struct page *b = hash_entry (b_, struct page, hash_elem);
|
|
|
|
return a->addr < b->addr;
|
|
@}
|
|
@end example
|
|
|
|
@noindent
|
|
(The use of @code{UNUSED} in these functions' prototypes suppresses a
|
|
warning that @var{aux} is unused. @xref{Function and Parameter
|
|
Attributes}, for information about @code{UNUSED}. @xref{Hash Auxiliary
|
|
Data}, for an explanation of @var{aux}.)
|
|
|
|
@page
|
|
Then, we can create a hash table like this:
|
|
|
|
@example
|
|
struct hash pages;
|
|
|
|
hash_init (&pages, page_hash, page_less, NULL);
|
|
@end example
|
|
|
|
Now we can manipulate the hash table we've created. If @code{@var{p}}
|
|
is a pointer to a @struct{page}, we can insert it into the hash table
|
|
with:
|
|
|
|
@example
|
|
hash_insert (&pages, &p->hash_elem);
|
|
@end example
|
|
|
|
@noindent If there's a chance that @var{pages} might already contain a
|
|
page with the same @var{addr}, then we should check @func{hash_insert}'s
|
|
return value.
|
|
|
|
To search for an element in the hash table, use @func{hash_find}. This
|
|
takes a little setup, because @func{hash_find} takes an element to
|
|
compare against. Here's a function that will find and return a page
|
|
based on a virtual address, assuming that @var{pages} is defined at file
|
|
scope:
|
|
|
|
@example
|
|
/* @r{Returns the page containing the given virtual @var{address},}
|
|
@r{or a null pointer if no such page exists.} */
|
|
struct page *
|
|
page_lookup (const void *address)
|
|
@{
|
|
struct page p;
|
|
struct hash_elem *e;
|
|
|
|
p.addr = (void *) address;
|
|
e = hash_find (&pages, &p.hash_elem);
|
|
return e != NULL ? hash_entry (e, struct page, hash_elem) : NULL;
|
|
@}
|
|
@end example
|
|
|
|
@noindent
|
|
@struct{page} is allocated as a local variable here on the assumption
|
|
that it is fairly small. Large structures should not be allocated as
|
|
local variables. @xref{struct thread}, for more information.
|
|
|
|
A similar function could delete a page by address using
|
|
@func{hash_delete}.
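A deletion function along those lines might look like the sketch below. So that it compiles outside PintOS, the first part is a simplified, hypothetical stand-in for @file{hash.h} (a fixed number of buckets, no resizing, @var{aux} always null), and @code{page_hash} uses a simple shift instead of @func{hash_bytes}; in PintOS you would include the real header and keep the functions from the example above. Only @code{page_delete} at the end is the point of the sketch, and its name is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* --- Hypothetical stand-in for <hash.h>, just enough to run the sketch. --- */
#define BUCKET_CNT 8
struct hash_elem { struct hash_elem *next; };
#define hash_entry(ELEM, STRUCT, MEMBER) \
  ((STRUCT *) ((char *) (ELEM) - offsetof (STRUCT, MEMBER)))
typedef unsigned hash_hash_func (const struct hash_elem *, void *aux);
typedef bool hash_less_func (const struct hash_elem *,
                             const struct hash_elem *, void *aux);
struct hash
  {
    struct hash_elem *buckets[BUCKET_CNT];
    hash_hash_func *hash;
    hash_less_func *less;
  };

/* Returns the link pointing at an element equal to E, or at the chain's end. */
static struct hash_elem **
find_link (struct hash *h, struct hash_elem *e)
{
  struct hash_elem **link = &h->buckets[h->hash (e, NULL) % BUCKET_CNT];
  while (*link != NULL
         && (h->less (*link, e, NULL) || h->less (e, *link, NULL)))
    link = &(*link)->next;
  return link;
}

static struct hash_elem *
hash_insert (struct hash *h, struct hash_elem *new)
{
  struct hash_elem **link = find_link (h, new);
  if (*link != NULL)
    return *link;               /* Equal element already present. */
  new->next = NULL;
  *link = new;
  return NULL;
}

static struct hash_elem *
hash_delete (struct hash *h, struct hash_elem *e)
{
  struct hash_elem **link = find_link (h, e);
  struct hash_elem *found = *link;
  if (found != NULL)
    *link = found->next;        /* Unlink the element from its chain. */
  return found;
}

/* --- The page table from the example. --- */
struct page
  {
    struct hash_elem hash_elem;
    void *addr;
  };

static unsigned
page_hash (const struct hash_elem *p_, void *aux)
{
  const struct page *p = hash_entry (p_, struct page, hash_elem);
  (void) aux;
  return (unsigned) ((uintptr_t) p->addr >> 12);
}

static bool
page_less (const struct hash_elem *a_, const struct hash_elem *b_, void *aux)
{
  const struct page *a = hash_entry (a_, struct page, hash_elem);
  const struct page *b = hash_entry (b_, struct page, hash_elem);
  (void) aux;
  return (uintptr_t) a->addr < (uintptr_t) b->addr;
}

static struct hash pages = { { NULL }, page_hash, page_less };

/* Removes the page at ADDRESS from the table and returns it, or returns
   a null pointer if there is none.  The caller frees the page if needed. */
static struct page *
page_delete (void *address)
{
  struct page p;
  struct hash_elem *e;

  p.addr = address;             /* Only the key needs to be initialized. */
  e = hash_delete (&pages, &p.hash_elem);
  return e != NULL ? hash_entry (e, struct page, hash_elem) : NULL;
}
```

Returning the removed page, rather than freeing it inside @code{page_delete}, leaves the decision about deallocation to the caller, which matches the contract described for @func{hash_delete}.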
|
|
|
|
@node Hash Auxiliary Data
|
|
@subsection Auxiliary Data
|
|
|
|
In simple cases like the example above, there's no need for the
|
|
@var{aux} parameters. In these cases, just pass a null pointer to
|
|
@func{hash_init} for @var{aux} and ignore the values passed to the hash
|
|
function and comparison functions. (You'll get a compiler warning if
|
|
you don't use the @var{aux} parameter, but you can turn that off with
|
|
the @code{UNUSED} macro, as shown in the example, or you can just ignore
|
|
it.)
|
|
|
|
@var{aux} is useful when you have some property of the data in the
|
|
hash table that is both constant and needed for hashing or comparison,
|
|
but not stored in the data items themselves. For example, if
|
|
the items in a hash table are fixed-length strings, but the items
|
|
themselves don't indicate what that fixed length is, you could pass
|
|
the length as an @var{aux} parameter.
|
|
|
|
@node Hash Synchronization
|
|
@subsection Synchronization
|
|
|
|
The hash table does not do any internal synchronization. It is the
|
|
caller's responsibility to synchronize calls to hash table functions.
|
|
In general, any number of functions that examine but do not modify the
|
|
hash table, such as @func{hash_find} or @func{hash_next}, may execute
|
|
simultaneously.  However, these functions cannot safely execute at the
|
|
same time as any function that may modify a given hash table, such as
|
|
@func{hash_insert} or @func{hash_delete}, nor may more than one function
|
|
that can modify a given hash table execute safely at once.
|
|
|
|
It is also the caller's responsibility to synchronize access to data in
|
|
hash table elements. How to synchronize access to this data depends on
|
|
how it is designed and organized, as with any other data structure.
|
|
|