uvm - virtual memory system external interface
#include <sys/param.h>
#include <uvm/uvm.h>
The UVM virtual memory system manages access to the computer's memory
resources. User processes and the kernel access these resources through
UVM's external interface. UVM's external interface includes functions
that:
- initialise UVM sub-systems
- manage virtual address spaces
- resolve page faults
- memory map files and devices
- perform uio-based I/O to virtual memory
- allocate and free kernel virtual memory
- allocate and free physical memory
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper. The pagedaemon process sleeps until
physical memory becomes scarce. When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not been
recently used. The swapper process swaps in runnable processes that are
currently swapped out, if there is room.
There are also several miscellaneous functions.
void uvm_init(void)
void uvm_init_limits(struct proc *p)
void uvm_setpagesize(void)
void uvm_swap_init(void)
uvm_init() sets up the UVM system at system boot time, after the copyright
has been printed. It initialises global state, the page, map, kernel
virtual memory state, machine-dependent physical map, kernel memory
allocator, pager and anonymous memory sub-systems, and then enables paging
of kernel objects.
uvm_init_limits() initialises process limits for the named process. This
is for use by the system startup for process zero, before any other processes
are created.
uvm_setpagesize() initialises the uvmexp members pagesize (if not already
done by machine-dependent code), pageshift and pagemask. It should be
called by machine-dependent code early in the pmap_init() call (see
pmap(9)).
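The relationship between the three page constants can be sketched in ordinary userland C. This is only an illustration of the arithmetic, not the kernel's code: struct page_consts and sketch_setpagesize() are hypothetical names standing in for the uvmexp members and uvm_setpagesize().

```c
#include <assert.h>

/* Hypothetical stand-in for the three uvmexp page constants. */
struct page_consts {
	int pagesize;	/* bytes per page, must be a power of two */
	int pagemask;	/* pagesize - 1: low bits select a byte in a page */
	int pageshift;	/* log2(pagesize) */
};

/* Derive pagemask and pageshift from an already-set pagesize. */
static void
sketch_setpagesize(struct page_consts *pc)
{
	int shift = 0;

	/* A power of two has exactly one bit set. */
	assert(pc->pagesize > 0 && (pc->pagesize & (pc->pagesize - 1)) == 0);

	pc->pagemask = pc->pagesize - 1;
	while ((1 << shift) != pc->pagesize)
		shift++;
	pc->pageshift = shift;
}
```

With these values, address & pagemask gives the offset within a page and address >> pageshift gives the page number.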
uvm_swap_init() initialises the swap sub-system.
VIRTUAL ADDRESS SPACE MANAGEMENT
int uvm_map(struct vm_map *map, vaddr_t *startp, vsize_t size,
    struct uvm_object *uobj, voff_t uoffset, uvm_flag_t flags)
int uvm_map_pageable(struct vm_map *map, vaddr_t start, vaddr_t end,
    boolean_t new_pageable, int lockflags)
boolean_t uvm_map_checkprot(struct vm_map *map, vaddr_t start, vaddr_t end,
    vm_prot_t protection)
int uvm_map_protect(struct vm_map *map, vaddr_t start, vaddr_t end,
    vm_prot_t new_prot, boolean_t set_max)
int uvm_deallocate(struct vm_map *map, vaddr_t start, vsize_t size)
struct vmspace * uvmspace_alloc(vaddr_t min, vaddr_t max, int pageable)
void uvmspace_exec(struct proc *p, vaddr_t start, vaddr_t end)
struct vmspace * uvmspace_fork(struct vmspace *vm1)
void uvmspace_free(struct vmspace *vm)
void uvmspace_share(struct proc *p1, struct proc *p2)
void uvmspace_unshare(struct proc *p)
uvm_map() establishes a valid mapping in map map, which must be unlocked.
The new mapping has size size, which must be in PAGE_SIZE units. The
uobj and uoffset arguments can have four meanings. When uobj is NULL and
uoffset is UVM_UNKNOWN_OFFSET, uvm_map() does not use the machine-dependent
PMAP_PREFER function. If uoffset is any other value, it is used as
the hint to PMAP_PREFER. When uobj is not NULL and uoffset is
UVM_UNKNOWN_OFFSET, uvm_map() finds the offset based upon the virtual
address, passed as startp. If uoffset is any other value, we are doing a
normal mapping at this offset. The start address of the map will be
returned in startp.
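The four combinations of uobj and uoffset described above can be written out as a small classifier. This is a userland sketch for illustration only: sk_voff_t, SK_UNKNOWN_OFFSET (assumed here to be an all-ones sentinel) and sk_classify() are hypothetical names, not part of the UVM interface.

```c
#include <stddef.h>

typedef long long sk_voff_t;			/* stand-in for voff_t */
#define SK_UNKNOWN_OFFSET ((sk_voff_t)-1)	/* assumed sentinel value */

enum sk_map_case {
	SK_NO_PREFER,		/* uobj NULL, offset unknown: PMAP_PREFER unused */
	SK_PREFER_HINT,		/* uobj NULL, offset given: hint to PMAP_PREFER */
	SK_OFFSET_FROM_VA,	/* uobj set, offset unknown: derive from *startp */
	SK_NORMAL_MAPPING	/* uobj set, offset given: map at that offset */
};

/* Decide which of the four documented cases a (uobj, uoffset) pair is. */
static enum sk_map_case
sk_classify(const void *uobj, sk_voff_t uoffset)
{
	if (uobj == NULL)
		return uoffset == SK_UNKNOWN_OFFSET ?
		    SK_NO_PREFER : SK_PREFER_HINT;
	return uoffset == SK_UNKNOWN_OFFSET ?
	    SK_OFFSET_FROM_VA : SK_NORMAL_MAPPING;
}
```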
The flags passed to uvm_map() are typically created using the
UVM_MAPFLAG(vm_prot_t prot, vm_prot_t maxprot, vm_inherit_t inh, int
advice, int flags) macro, which uses the following values. The values
that prot and maxprot can take are:
#define UVM_PROT_MASK 0x07 /* protection mask */
#define UVM_PROT_NONE 0x00 /* protection none */
#define UVM_PROT_ALL 0x07 /* everything */
#define UVM_PROT_READ 0x01 /* read */
#define UVM_PROT_WRITE 0x02 /* write */
#define UVM_PROT_EXEC 0x04 /* exec */
#define UVM_PROT_R 0x01 /* read */
#define UVM_PROT_W 0x02 /* write */
#define UVM_PROT_RW 0x03 /* read-write */
#define UVM_PROT_X 0x04 /* exec */
#define UVM_PROT_RX 0x05 /* read-exec */
#define UVM_PROT_WX 0x06 /* write-exec */
#define UVM_PROT_RWX 0x07 /* read-write-exec */
The values that inh can take are:
#define UVM_INH_MASK 0x30 /* inherit mask */
#define UVM_INH_SHARE 0x00 /* "share" */
#define UVM_INH_COPY 0x10 /* "copy" */
#define UVM_INH_NONE 0x20 /* "none" */
#define UVM_INH_DONATE 0x30 /* "donate" << not used */
The values that advice can take are:
#define UVM_ADV_NORMAL 0x0 /* 'normal' */
#define UVM_ADV_RANDOM 0x1 /* 'random' */
#define UVM_ADV_SEQUENTIAL 0x2 /* 'sequential' */
#define UVM_ADV_MASK 0x7 /* mask */
The values that flags can take are:
#define UVM_FLAG_FIXED 0x010000 /* find space */
#define UVM_FLAG_OVERLAY 0x020000 /* establish overlay */
#define UVM_FLAG_NOMERGE 0x040000 /* don't merge map entries */
#define UVM_FLAG_COPYONW 0x080000 /* set copy_on_write flag */
#define UVM_FLAG_AMAPPAD 0x100000 /* for bss: pad amap to reduce malloc() */
#define UVM_FLAG_TRYLOCK 0x200000 /* fail if we can not lock map */
The UVM_MAPFLAG macro arguments can be combined with an or operator.
There are several special purpose macros for checking protection combinations,
e.g. the UVM_PROT_WX macro. There are also some additional macros
to extract bits from the flags. The UVM_PROTECTION, UVM_INHERIT,
UVM_MAXPROTECTION and UVM_ADVICE macros return the protection, inheritance,
maximum protection and advice, respectively. uvm_map() returns a
standard UVM return value.
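The packing and extraction of the five fields can be sketched as follows. The constant values are copied from the listings above. The SK_* macros are hypothetical, and the bit positions chosen for maxprot (bits 8 to 10) and advice (bits 12 to 14) are an assumption made so that the fields do not overlap the documented masks; the real macros live in the UVM headers and should be used in kernel code.

```c
/* Values copied from the listings above. */
#define UVM_PROT_MASK		0x07
#define UVM_PROT_RW		0x03
#define UVM_PROT_ALL		0x07
#define UVM_INH_MASK		0x30
#define UVM_INH_COPY		0x10
#define UVM_ADV_MASK		0x7
#define UVM_ADV_RANDOM		0x1
#define UVM_FLAG_COPYONW	0x080000

/*
 * Assumed layout: prot in bits 0-2, inherit in bits 4-5 (pre-shifted),
 * maxprot in bits 8-10, advice in bits 12-14, flags from bit 16 up.
 */
#define SK_MAPFLAG(prot, maxprot, inh, advice, flags) \
	((prot) | (inh) | ((maxprot) << 8) | ((advice) << 12) | (flags))

/* Extraction macros mirroring the packing above. */
#define SK_PROTECTION(x)	((x) & UVM_PROT_MASK)
#define SK_INHERIT(x)		((x) & UVM_INH_MASK)
#define SK_MAXPROTECTION(x)	(((x) >> 8) & UVM_PROT_MASK)
#define SK_ADVICE(x)		(((x) >> 12) & UVM_ADV_MASK)
```

For example, SK_MAPFLAG(UVM_PROT_RW, UVM_PROT_ALL, UVM_INH_COPY, UVM_ADV_RANDOM, UVM_FLAG_COPYONW) describes a read-write, copy-on-write mapping that a child inherits by copy.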
uvm_map_pageable() changes the pageability of the pages in the range from
start to end in map map to new_pageable. uvm_map_pageable() returns a
standard UVM return value.
uvm_map_checkprot() checks the protection of the range from start to end
in map map against protection. This returns either TRUE or FALSE.
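The kind of check this performs can be illustrated with a simplified model. The sketch below assumes a map is a sorted array of non-overlapping entries; struct sk_entry and sk_checkprot() are hypothetical and do not reflect the kernel's actual map data structures.

```c
#include <stddef.h>

/* Hypothetical map entry covering [start, end) with protection bits. */
struct sk_entry {
	unsigned long start, end;
	int prot;
};

/*
 * Return 1 if every address in [start, end) is covered by an entry
 * whose protection includes all bits of `want`, else 0.  Entries must
 * be sorted by address and must not overlap.
 */
static int
sk_checkprot(const struct sk_entry *e, size_t n, unsigned long start,
    unsigned long end, int want)
{
	size_t i;

	for (i = 0; i < n && start < end; i++) {
		if (e[i].end <= start)
			continue;	/* entry lies below the range */
		if (e[i].start > start)
			return 0;	/* hole in the mapping */
		if ((e[i].prot & want) != want)
			return 0;	/* a requested bit is missing */
		start = e[i].end;	/* advance past this entry */
	}
	return start >= end;
}
```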
uvm_map_protect() changes the protection of the range from start to end
in map map to new_prot, also setting the maximum protection of the region
to new_prot if set_max is non-zero. This function returns a standard UVM
return value.
uvm_deallocate() deallocates kernel memory in map map from address start
to start + size.
uvmspace_alloc() allocates and returns a new address space, with ranges
from min to max, setting the pageability of the address space to
pageable.
uvmspace_exec() either reuses the address space of process p if there are
no other references to it, or creates a new one with uvmspace_alloc().
The range of valid addresses in the address space is reset to start
through end.
uvmspace_fork() creates and returns a new address space based upon the
vm1 address space, typically used when allocating an address space for a
child process.
uvmspace_free() lowers the reference count on the address space vm, freeing
the data structures if there are no other references.
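The reference-counting pattern described here can be sketched in userland C. The names struct sk_vmspace, sk_alloc(), sk_ref() and sk_free() are hypothetical, and the sketch omits the locking that a kernel implementation would need.

```c
#include <stdlib.h>

/* Hypothetical address space carrying only a reference count. */
struct sk_vmspace {
	int refcnt;
};

/* Create a space holding one reference; returns NULL on failure. */
static struct sk_vmspace *
sk_alloc(void)
{
	struct sk_vmspace *vm = calloc(1, sizeof(*vm));

	if (vm != NULL)
		vm->refcnt = 1;
	return vm;
}

/* Take an additional reference, as a sharing process would. */
static void
sk_ref(struct sk_vmspace *vm)
{
	vm->refcnt++;
}

/* Drop one reference; free on the last one.  Returns 1 if freed. */
static int
sk_free(struct sk_vmspace *vm)
{
	if (--vm->refcnt > 0)
		return 0;
	free(vm);
	return 1;
}
```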
uvmspace_share() causes process p2 to share the address space of p1.
uvmspace_unshare() ensures that process p has its own, unshared address
space, by creating a new one if necessary by calling uvmspace_fork().
int uvm_fault(struct vm_map *orig_map, vaddr_t vaddr, vm_fault_t
fault_type, vm_prot_t access_type)
uvm_fault() is the main entry point for faults. It takes orig_map as the
map the fault originated in, vaddr as the offset into the map at which
the fault occurred, fault_type describing the type of fault, and
access_type describing the type of access requested. uvm_fault() returns
a standard UVM return value.
MEMORY MAPPING FILES AND DEVICES
struct uvm_object * uvn_attach(void *arg, vm_prot_t accessprot)
void uvm_vnp_setsize(struct vnode *vp, voff_t newsize)
void * ubc_alloc(struct uvm_object *uobj, voff_t offset, vsize_t *lenp,
    int flags)
void ubc_release(void *va, int flags)
uvn_attach() attaches a UVM object to vnode arg, creating the object if
necessary. The object is returned.
uvm_vnp_setsize() sets the size of vnode vp to newsize. Caller must hold
a reference to the vnode. If the vnode shrinks, pages no longer used are
discarded.
ubc_alloc() creates a kernel mapping of uobj starting at offset offset.
The desired length of the mapping is pointed to by lenp, but the actual
mapping may be smaller than this. lenp is updated to contain the actual
length mapped. The flags must be one of
#define UBC_READ 0x01 /* mapping will be accessed for read */
#define UBC_WRITE 0x02 /* mapping will be accessed for write */
Currently, uobj must actually be a vnode object. Once the mapping is
created, it must be accessed only by methods that can handle faults, such
as uiomove() or kcopy(). Page faults on the mapping will result in the
vnode's VOP_GETPAGES() method being called to resolve the fault.
ubc_release() frees the mapping at va for reuse. The mapping may be
cached to speed future accesses to the same region of the object. The
flags are currently unused.
VIRTUAL MEMORY I/O
int uvm_io(struct vm_map *map, struct uio *uio)
uvm_io() performs the I/O described in uio on the memory described in
map.
ALLOCATION OF KERNEL MEMORY
vaddr_t uvm_km_alloc(struct vm_map *map, vsize_t size)
vaddr_t uvm_km_zalloc(struct vm_map *map, vsize_t size)
vaddr_t uvm_km_alloc1(struct vm_map *map, vsize_t size, boolean_t zeroit)
vaddr_t uvm_km_kmemalloc(struct vm_map *map, struct uvm_object *obj,
    vsize_t size, int flags)
vaddr_t uvm_km_valloc(struct vm_map *map, vsize_t size)
vaddr_t uvm_km_valloc_wait(struct vm_map *map, vsize_t size)
struct vm_map * uvm_km_suballoc(struct vm_map *map, vaddr_t *min,
    vaddr_t *max, vsize_t size, boolean_t pageable, boolean_t fixed,
    struct vm_map *submap)
void uvm_km_free(struct vm_map *map, vaddr_t addr, vsize_t size)
void uvm_km_free_wakeup(struct vm_map *map, vaddr_t addr, vsize_t size)
uvm_km_alloc() and uvm_km_zalloc() allocate size bytes of wired kernel
memory in map map. In addition to allocation, uvm_km_zalloc() zeros the
memory. Both of these functions are defined as macros in terms of
uvm_km_alloc1(), and should almost always be used in preference to
uvm_km_alloc1().
uvm_km_alloc1() allocates and returns size bytes of wired memory in the
kernel map, zeroing the memory if the zeroit argument is non-zero.
uvm_km_kmemalloc() allocates and returns size bytes of wired kernel memory
into obj. The flags can be any of:
#define UVM_KMF_NOWAIT 0x1 /* matches M_NOWAIT */
#define UVM_KMF_VALLOC 0x2 /* allocate VA only */
#define UVM_KMF_TRYLOCK UVM_FLAG_TRYLOCK /* try locking only */
UVM_KMF_NOWAIT causes uvm_km_kmemalloc() to return immediately if no memory
is available. UVM_KMF_VALLOC causes no pages to be allocated, only a
virtual address. UVM_KMF_TRYLOCK causes uvm_km_kmemalloc() to use
simple_lock_try() when locking maps.
uvm_km_valloc() and uvm_km_valloc_wait() return a newly allocated
zero-filled address in the kernel map of size size. uvm_km_valloc_wait()
will also wait for kernel memory to become available, if there is a
memory shortage.
uvm_km_free() and uvm_km_free_wakeup() free size bytes of memory in the
kernel map, starting at address addr. uvm_km_free_wakeup() calls
wakeup() on the map before unlocking the map.
uvm_km_suballoc() allocates submap from map, creating a new map if submap
is NULL. The addresses of the submap can be specified exactly by setting
the fixed argument to non-zero, which causes the min argument to specify
the beginning of the address in the submap. If fixed is zero, any address
of size size will be allocated from map and the start and end addresses
returned in min and max. If pageable is non-zero, entries in the map may
be paged out.
ALLOCATION OF PHYSICAL MEMORY
struct vm_page * uvm_pagealloc(struct uvm_object *uobj, voff_t off,
    struct vm_anon *anon, int flags)
void uvm_pagerealloc(struct vm_page *pg, struct uvm_object *newobj,
    voff_t newoff)
void uvm_pagefree(struct vm_page *pg)
int uvm_pglistalloc(psize_t size, paddr_t low, paddr_t high,
    paddr_t alignment, paddr_t boundary, struct pglist *rlist, int nsegs,
    int waitok)
void uvm_pglistfree(struct pglist *list)
void uvm_page_physload(vaddr_t start, vaddr_t end, vaddr_t avail_start,
    vaddr_t avail_end, int free_list)
uvm_pagealloc() allocates a page of memory at virtual address off in
either the object uobj or the anonymous memory anon, which must be locked
by the caller. Only one of uobj and anon can be non-NULL. Returns NULL
when no page can be found. The flags can be any of
#define UVM_PGA_USERESERVE 0x0001 /* ok to use reserve pages */
#define UVM_PGA_ZERO 0x0002 /* returned page must be zero'd */
UVM_PGA_USERESERVE means to allocate a page even if that will result in
the number of free pages being lower than uvmexp.reserve_pagedaemon (if
the current thread is the pagedaemon) or uvmexp.reserve_kernel (if the
current thread is not the pagedaemon). UVM_PGA_ZERO causes the returned
page to be filled with zeroes, either by allocating it from a pool of
pre-zeroed pages or by zeroing it in-line as necessary.
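The reserve rule described above can be sketched as a simple predicate. This follows the prose description only and is a simplification: struct sk_counts and sk_may_alloc() are hypothetical, and the kernel's actual test may take further conditions into account.

```c
#define UVM_PGA_USERESERVE 0x0001	/* value from the listing above */

/* Hypothetical subset of the uvmexp counters used by the check. */
struct sk_counts {
	int free;			/* free pages right now */
	int reserve_pagedaemon;		/* pages held back for the pagedaemon */
	int reserve_kernel;		/* pages held back for the kernel */
};

/*
 * Return 1 if an allocation may proceed.  Without UVM_PGA_USERESERVE
 * the caller may not dip into the reserve that applies to it.
 */
static int
sk_may_alloc(const struct sk_counts *c, int flags, int is_pagedaemon)
{
	int reserve = is_pagedaemon ?
	    c->reserve_pagedaemon : c->reserve_kernel;

	if (c->free > reserve)
		return 1;		/* plenty of pages, no reserve needed */
	/* Below the reserve: only allowed with the flag and a free page. */
	return (flags & UVM_PGA_USERESERVE) != 0 && c->free > 0;
}
```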
uvm_pagerealloc() reallocates page pg to a new object newobj, at a new
offset newoff.
uvm_pagefree() frees the physical page pg.
uvm_pglistalloc() allocates a list of pages covering size bytes, subject
to various constraints. low and high describe the lowest and highest addresses
acceptable for the list. If alignment is non-zero, it describes the
required alignment of the list, in power-of-two notation. If boundary is
non-zero, no segment of the list may cross this power-of-two boundary,
relative to zero. The nsegs and waitok arguments are currently ignored.
uvm_pglistfree() frees the list of pages pointed to by list.
uvm_page_physload() loads physical memory segments into VM space on the
specified free_list. It must be called at system boot time to set up
physical memory management pages. The arguments describe the start and
end of the physical addresses of the segment, and the available start and
end addresses of pages not already in use.
void uvm_pageout(void)
void uvm_scheduler(void)
void uvm_swapin(struct proc *p)
uvm_pageout() is the main loop for the page daemon.
uvm_scheduler() is the process zero main loop, which is to be called
after the system has finished starting other processes. It handles the
swapping in of runnable, swapped out processes in priority order.
uvm_swapin() swaps in the named process.
int uvm_loan(struct vm_map *map, vaddr_t start, vsize_t len, void *v,
    int flags)
void uvm_unloan(void *v, int npages, int flags)
uvm_loan() loans pages in a map out to anons or to the kernel. map
should be unlocked, and start and len should be multiples of PAGE_SIZE.
Argument flags should be one of
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
v should be a pointer to an array of pointers to struct vm_anon or struct
vm_page, as appropriate. The caller has to allocate memory for the array
and ensure it is big enough to hold len / PAGE_SIZE pointers. Returns 0
on success, or an appropriate error number otherwise.
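Sizing the pointer array before a loan can be sketched as below. This is a userland illustration: SK_PAGE_SIZE is an assumed 4096-byte page, sk_loan_array() is a hypothetical helper, and kernel code would use the kernel allocator rather than calloc().

```c
#include <stdlib.h>

#define SK_PAGE_SIZE 4096UL	/* assumed page size for the sketch */

/*
 * Allocate an array with one pointer slot per page in the range.
 * Returns NULL if start or len is not a multiple of the page size,
 * if len is zero, or if memory is exhausted.  The slot count is
 * stored through npagesp; the same count is later needed for unloan.
 */
static void **
sk_loan_array(unsigned long start, unsigned long len, size_t *npagesp)
{
	if (len == 0 || (start % SK_PAGE_SIZE) != 0 ||
	    (len % SK_PAGE_SIZE) != 0)
		return NULL;
	*npagesp = len / SK_PAGE_SIZE;
	return calloc(*npagesp, sizeof(void *));
}
```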
uvm_unloan() kills loans on pages or anons. v must point to the array
of pointers initialized by a previous call to uvm_loan(). npages should
match the number of pages allocated for the loan, which is also the
number of items in the array. Argument flags should be one of
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
and should match what was used for the previous call to uvm_loan().
MISCELLANEOUS FUNCTIONS
struct uvm_object * uao_create(vsize_t size, int flags)
void uao_detach(struct uvm_object *uobj)
void uao_reference(struct uvm_object *uobj)
boolean_t uvm_chgkprot(caddr_t addr, size_t len, int rw)
void uvm_kernacc(caddr_t addr, size_t len, int rw)
boolean_t uvm_useracc(caddr_t addr, size_t len, int rw)
int uvm_vslock(struct proc *p, caddr_t addr, size_t len, vm_prot_t prot)
void uvm_vsunlock(struct proc *p, caddr_t addr, size_t len)
void uvm_meter(void)
int uvm_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp,
    void *newp, size_t newlen, struct proc *p)
void uvm_fork(struct proc *p1, struct proc *p2, boolean_t shared)
int uvm_grow(struct proc *p, vaddr_t sp)
int uvm_coredump(struct proc *p, struct vnode *vp, struct ucred *cred,
    struct core *chdr)
void uvn_findpages(struct uvm_object *uobj, voff_t offset, int *npagesp,
    struct vm_page **pps, int flags)
void uvm_swap_stats(int cmd, struct swapent *sep, int sec,
    register_t *retval)
The uao_create(), uao_detach() and uao_reference() functions operate on
anonymous memory objects, such as those used to support System V shared
memory. uao_create() returns an object of size size with flags:
#define UAO_FLAG_KERNOBJ 0x1 /* create kernel object */
#define UAO_FLAG_KERNSWAP 0x2 /* enable kernel swap */
which can only be used once each at system boot time. uao_reference()
creates an additional reference to the named anonymous memory object.
uao_detach() removes a reference from the named anonymous memory object,
destroying it if removing the last reference.
uvm_chgkprot() changes the protection of kernel memory from addr to addr
+ len to the value of rw. This is primarily useful for debuggers, for
setting breakpoints. This function is only available with options KGDB.
uvm_kernacc() and uvm_useracc() check whether the range from addr to addr
+ len permits rw access, in the kernel address space and the current
process's address space respectively.
uvm_vslock() and uvm_vsunlock() control the wiring and unwiring of pages
for process p from addr to addr + len. These functions are normally used
to wire memory for I/O.
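Because wiring operates on whole pages, a byte range that is not page-aligned covers every page it touches. The arithmetic can be sketched as follows, assuming a 4096-byte page; the sk_* helpers are hypothetical userland stand-ins and not the kernel's own macros.

```c
#define SK_PAGE_SIZE 4096UL			/* assumed page size */
#define SK_PAGE_MASK (SK_PAGE_SIZE - 1)

/* Round an address down to the start of its page. */
static unsigned long
sk_trunc_page(unsigned long a)
{
	return a & ~SK_PAGE_MASK;
}

/* Round an address up to the next page boundary. */
static unsigned long
sk_round_page(unsigned long a)
{
	return (a + SK_PAGE_MASK) & ~SK_PAGE_MASK;
}

/* Number of pages touched by the byte range [addr, addr + len). */
static unsigned long
sk_pages_spanned(unsigned long addr, unsigned long len)
{
	return (sk_round_page(addr + len) - sk_trunc_page(addr)) /
	    SK_PAGE_SIZE;
}
```

A two-page buffer that starts in the middle of a page therefore spans three pages.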
uvm_meter() calculates the load average and wakes up the swapper if necessary.
uvm_sysctl() provides support for the CTL_VM domain of the sysctl(3)
hierarchy. uvm_sysctl() handles the VM_LOADAVG, VM_METER and VM_UVMEXP
calls, which return the current load averages, calculate and return the
current VM totals, and return the uvmexp structure, respectively. The
load averages are accessed from userland using the getloadavg(3)
function. The uvmexp structure holds all global state of the UVM system,
and has the following members:
/* vm_page constants */
int pagesize; /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask; /* page mask */
int pageshift; /* page shift */
/* vm_page counters */
int npages; /* number of pages we manage */
int free; /* number of free pages */
int active; /* number of active pages */
int inactive; /* number of pages that we free'd but may want back */
int paging; /* number of pages in the process of being paged out */
int wired; /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */
/* pageout params */
int freemin; /* min number of free pages */
int freetarg; /* target number of free pages */
int inactarg; /* target number of inactive pages */
int wiredmax; /* max number of wired pages */
/* swap */
int nswapdev; /* number of configured swap devices in system */
int swpages; /* number of PAGE_SIZE'ed swap pages */
int swpginuse; /* number of swap pages in use */
int nswget; /* number of times fault calls uvm_swap_get() */
int nanon; /* number total of anon's in system */
int nfreeanon; /* number of free anon's */
/* stat counters */
int faults; /* page fault count */
int traps; /* trap count */
int intrs; /* interrupt count */
int swtch; /* context switch count */
int softs; /* software interrupt count */
int syscalls; /* system calls */
int pageins; /* pagein operation count */
/* pageouts are in pdpageouts below */
int swapins; /* swapins */
int swapouts; /* swapouts */
int pgswapin; /* pages swapped in */
int pgswapout; /* pages swapped out */
int forks; /* forks */
int forks_ppwait; /* forks where parent waits */
int forks_sharevm; /* forks where vmspace is shared */
/* fault subcounters */
int fltnoram; /* number of times fault was out of ram */
int fltnoanon; /* number of times fault was out of anons */
int fltpgwait; /* number of times fault had to wait on a page */
int fltpgrele; /* number of times fault found a released page */
int fltrelck; /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget; /* number of times fault gets anon page */
int fltanretry; /* number of times fault retrys an anon get */
int fltamcopy; /* number of times fault clears "needs copy" */
int fltnamap; /* number of times fault maps a neighbor anon page */
int fltnomap; /* number of times fault maps a neighbor obj page */
int fltlget; /* number of times fault does a locked pgo_get */
int fltget; /* number of times fault does an unlocked get */
int flt_anon; /* number of times fault anon (case 1a) */
int flt_acow; /* number of times fault anon cow (case 1b) */
int flt_obj; /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */
/* daemon counters */
int pdwoke; /* number of times daemon woke up */
int pdrevs; /* number of times daemon rev'd clock hand */
int pdswout; /* number of times daemon called for swapout */
int pdfreed; /* number of pages daemon freed since boot */
int pdscans; /* number of pages daemon scanned since boot */
int pdanscan; /* number of anonymous pages scanned by daemon */
int pdobscan; /* number of object pages scanned by daemon */
int pdreact; /* number of pages daemon reactivated since boot */
int pdbusy; /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending; /* number of times daemon got a pending pageout */
int pddeact; /* number of pages daemon deactivates */
uvm_fork() forks a virtual address space for the (old) process p1 and the
(new) process p2. If the shared argument is non-zero, p1 shares its
address space with p2; otherwise a new address space is created. This
function currently has no return value, and thus cannot fail. In the
future, this function will be changed to allow it to fail in low memory
conditions.
uvm_grow() increases the stack segment of process p to include sp.
uvm_coredump() generates a coredump on vnode vp for process p with credentials
cred and core header description in chdr.
uvn_findpages() looks up or creates pages in uobj at offset offset, marks
them busy and returns them in the pps array. Currently uobj must be a
vnode object. The number of pages requested is pointed to by npagesp,
and this value is updated with the actual number of pages returned. The
flags can be
#define UFP_ALL 0x00 /* return all pages requested */
#define UFP_NOWAIT 0x01 /* don't sleep */
#define UFP_NOALLOC 0x02 /* don't allocate new pages */
#define UFP_NOCACHE 0x04 /* don't return pages which already exist */
#define UFP_NORDONLY 0x08 /* don't return PG_READONLY pages */
UFP_ALL is a pseudo-flag meaning all requested pages should be returned.
UFP_NOWAIT means that we must not sleep. UFP_NOALLOC causes any pages
which do not already exist to be skipped. UFP_NOCACHE causes any pages
which do already exist to be skipped. UFP_NORDONLY causes any pages
which are marked PG_READONLY to be skipped.
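The skip rules for the three filtering flags can be summarised as a predicate over a single page slot. This sketch covers only the decision described above: sk_want_page() is hypothetical, and UFP_NOWAIT is left out because it affects sleeping rather than which pages are selected.

```c
/* Values copied from the listing above. */
#define UFP_ALL		0x00
#define UFP_NOALLOC	0x02
#define UFP_NOCACHE	0x04
#define UFP_NORDONLY	0x08

/*
 * Return 1 if a page slot should be returned, 0 if it is skipped.
 * `exists` says whether the page is already present in the object,
 * `readonly` whether an existing page is marked PG_READONLY.
 */
static int
sk_want_page(int exists, int readonly, int flags)
{
	if (!exists)
		return (flags & UFP_NOALLOC) == 0; /* would need allocating */
	if (flags & UFP_NOCACHE)
		return 0;		/* existing pages are unwanted */
	if (readonly && (flags & UFP_NORDONLY))
		return 0;		/* read-only pages are unwanted */
	return 1;
}
```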
uvm_swap_stats() implements the SWAP_STATS and SWAP_OSTATS operations of
the swapctl(2) system call. cmd is the requested command, SWAP_STATS or
SWAP_OSTATS. The function will copy no more than sec entries into the
array pointed to by sep. On return, retval holds the actual number of
entries copied into the array.
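The bounded-copy contract (at most sec entries, with the actual count reported through retval) can be sketched in userland C. struct sk_swapent is a hypothetical placeholder and not the real struct swapent, and sk_swap_stats() only models the counting behaviour described above.

```c
#include <stddef.h>

/* Hypothetical swap entry; the real struct swapent has other fields. */
struct sk_swapent {
	int se_dev;	/* device identifier */
	int se_nblks;	/* total blocks */
	int se_inuse;	/* blocks in use */
};

/*
 * Copy up to `sec` of the `ndevs` configured entries into `sep` and
 * store the number actually copied through `retval`.
 */
static void
sk_swap_stats(const struct sk_swapent *devs, int ndevs,
    struct sk_swapent *sep, int sec, long *retval)
{
	int i, n = ndevs < sec ? ndevs : sec;

	if (n < 0)
		n = 0;		/* a negative request copies nothing */
	for (i = 0; i < n; i++)
		sep[i] = devs[i];
	*retval = n;
}
```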
uvm_chgkprot() is only available if the kernel has been compiled with
options KGDB.
All structure and types whose names begin with ``vm_'' will be renamed to
``uvm_''.
swapctl(2), getloadavg(3), kvm(3), sysctl(3), ddb(4), options(4), pmap(9)
UVM is a new VM system developed at Washington University in St. Louis
(Missouri). UVM's roots lie partly in the Mach-based 4.4BSD VM system,
the FreeBSD VM system, and the SunOS4 VM system. UVM's basic structure
is based on the 4.4BSD VM system. UVM's new anonymous memory system is
based on the anonymous memory system found in the SunOS4 VM (as described
in papers published by Sun Microsystems, Inc.). UVM also includes a number
of features new to BSD, including page loanout, map entry passing,
simplified copy-on-write, and clustered anonymous memory pageout. UVM is
also further documented in an August 1998 dissertation by Charles D. Cranor.
UVM appeared in NetBSD 1.4.
Charles D. Cranor <chuck@ccrc.wustl.edu> designed and implemented UVM.
Matthew Green <mrg@eterna.com.au> wrote the swap-space management code
and handled the logistical issues involved with merging UVM into the
NetBSD source tree.
Chuck Silvers <chuq@chuq.com> implemented the aobj pager, thus allowing
UVM to support System V shared memory and process swapping. He also
designed and implemented the UBC part of UVM, which uses UVM pages to
cache vnode data rather than the traditional buffer cache buffers.
BSD December 24, 2001 BSD