proc(4) proc(4)
proc - process (debug) filesystem
#include <sys/procfs.h>
/proc is a filesystem that provides access to the image of each active
process in the system. This was historically mounted as /debug. /proc
does not consume any disk resources. This interface provides a richer
set of functionality and replaces the now obsolete dbg(4), debug(4)
interface. The "files" of this filesystem are of the form /proc/nnnnn
and /proc/pinfo/nnnnn, where nnnnn is a decimal number corresponding to
the process-ID. These files actually consume no disk space, and are only
convenient handles by which a debugger can attach to a process. The
owner of each ``file'' is determined by the process's user-ID. Files of
the form /proc/nnnnn have permission mode 0600 while files of the form
/proc/pinfo/nnnnn have permission mode 0444. The /proc/pinfo files are
intended for use by unprivileged programs that wish to access
miscellaneous process information such as that provided by ps(1) and
top(1).
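As an illustration, the two name forms can be built as follows. This is a
minimal sketch: proc_path() is a hypothetical helper, not part of
<sys/procfs.h>, and it assumes the process-ID is accepted in plain decimal
without padding.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Build the name of a process file.
 * want_pinfo == 0 selects /proc/<pid>       (mode 0600, debugger handle);
 * want_pinfo != 0 selects /proc/pinfo/<pid> (mode 0444, ps-style queries).
 * Returns the length of the name, or -1 if it does not fit in buf.
 * Assumption: the unpadded decimal process-ID is a valid file name.
 */
int proc_path(char *buf, size_t buflen, pid_t pid, int want_pinfo)
{
    int n;

    assert(buf != NULL && pid >= 0);
    n = snprintf(buf, buflen, want_pinfo ? "/proc/pinfo/%ld" : "/proc/%ld",
                 (long)pid);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

An unprivileged monitoring tool would normally ask for the pinfo form, since
the 0600 form can be opened only by the owner of the process or the superuser.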
The statfs(2) system call will return valid information concerning the
proc filesystem. The total and free blocks as reported by df(1)
respectively represent the total virtual memory (real memory plus swap
space) available and currently free.
Standard system call interfaces are used to access /proc files: open(2),
close(2), read(2), write(2), and ioctl(2). Note that read(2) and
write(2) are not allowed for /proc/pinfo files. Furthermore only the
PIOCACINFO, PIOCPSINFO, PIOCUSAGE, PIOCGETPTIMER and PIOCCRED commands
may be specified to ioctl(2) for /proc/pinfo files. An open for reading
and writing enables process control; a read-only open allows inspection
but not control. As with ordinary files, more than one process can open
the same /proc file at the same time. Exclusive open is provided to
allow controlling processes to avoid collisions: an open(2) for writing
that specifies O_EXCL fails if the file is already open for writing; if
such an exclusive open succeeds, subsequent attempts to open the file for
writing, with or without the O_EXCL flag, fail until the exclusively-opened
file descriptor is closed. (Exception: a superuser open(2) that
does not specify O_EXCL succeeds even if the file is exclusively opened.)
There can be any number of read-only opens, even when an exclusive write
open is in effect on the file. On a successful open the inherit-on-fork
(PR_FORK) and run-on-last-close (PR_RLC) flags are set by default, if no
other process has the file open. On the last close for writing, if the
kill-on-last-close (PR_KLC) or the PR_RLC flags are set, then all the
controlling flags are cleared and either a SIGKILL is sent to the process
or the process is set running again. If neither of these two flags is
set, the controlling flags are not cleared.
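The open modes described above might be wrapped as shown below. This is a
sketch only: open_for_control() and open_for_inspection() are hypothetical
names, and nothing is assumed about which errno value a colliding exclusive
open reports.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Open a process file for control. O_RDWR enables the control operations;
 * O_EXCL makes the open fail if another process already has the file open
 * for writing. Returns the descriptor, or -1 with errno set by open(2).
 */
int open_for_control(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDWR | O_EXCL);
}

/*
 * Open a process file for inspection only. Read-only opens are permitted
 * even while an exclusive write open is in effect.
 */
int open_for_inspection(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY);
}
```

A controlling process that obtains a descriptor this way should still set or
reset PR_FORK, PR_RLC, and PR_KLC explicitly rather than rely on the defaults.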
Data may be transferred from or to any locations in the traced process's
address space by applying lseek(2) to position the file at the virtual
address of interest followed by read(2) or write(2). The PIOCMAP
operation can be applied to determine the accessible areas (mappings) of
the address space. A contiguous area of the address space may appear as
multiple mappings due to varying read/write/execute permissions. I/O
transfers may span contiguous mappings. An I/O request extending into an
unmapped area is truncated at the boundary.
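The seek-then-read sequence can be sketched as follows. proc_read_at() and
proc_peek() are hypothetical helpers; the sketch assumes the virtual address
is representable as a non-negative off_t, which may not hold for every
address on every machine.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to len bytes starting at virtual address addr of the traced
 * process into buf. fd is an open /proc/nnnnn descriptor.
 * Returns the number of bytes transferred, which is smaller than len when
 * the request runs into an unmapped area, or -1 on error.
 */
ssize_t proc_read_at(int fd, off_t addr, void *buf, size_t len)
{
    assert(buf != NULL);
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len);
}

/* Convenience wrapper: open path read-only, fetch len bytes at addr, close. */
ssize_t proc_peek(const char *path, off_t addr, void *buf, size_t len)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = proc_read_at(fd, addr, buf, len);
    close(fd);
    return n;
}
```

Callers should compare the return value with len; a short count is the only
indication that the transfer was cut off at a mapping boundary.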
Information and control operations are provided through ioctl(2). These
have the form:
#include <sys/types.h>
#include <sys/signal.h>
#include <sys/fault.h>
#include <sys/syscall.h>
#include <sys/procfs.h>
void *p;
retval = ioctl(fildes, code, p);
The argument p is a generic pointer whose type depends on the specific
ioctl code. Where not specifically mentioned below, its value should be
zero. <sys/procfs.h> contains definitions of ioctl codes and data
structures used by the operations.
Process information and control operations involve the use of sets of
flags. The set types sigset_t, fltset_t, and sysset_t correspond,
respectively, to signal, fault, and system call enumerations defined in
<sys/signal.h>, <sys/fault.h>, and <sys/syscall.h>. Each set type is
large enough to hold flags for its own enumeration. Although they are of
different sizes, they have a common structure and can be manipulated by
these macros:
prfillset(&set); /* turn on all flags in set */
premptyset(&set); /* turn off all flags in set */
praddset(&set, flag); /* turn on the specified flag */
prdelset(&set, flag); /* turn off the specified flag */
r = prismember(&set, flag); /* != 0 iff flag is turned on */
One of prfillset() or premptyset() must be used to initialize set before
it is used in any other operation. flag must be a member of the
enumeration corresponding to set.
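The semantics of these macros can be modelled with an ordinary bit vector.
The code below is a stand-in written for illustration, not the definitions
from <sys/procfs.h>: the demo_ names, the set size of 128 flags, and the
convention that flag n occupies bit n-1 are all assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-in for sigset_t, fltset_t, or sysset_t. */
#define DEMO_NFLAGS 128
#define DEMO_WBITS  (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS ((DEMO_NFLAGS + DEMO_WBITS - 1) / DEMO_WBITS)

typedef struct { unsigned long word[DEMO_NWORDS]; } demo_set_t;

/* Assumption: flags are numbered 1..DEMO_NFLAGS and flag n uses bit n-1. */
#define DEMO_WORD(f) (((unsigned)(f) - 1) / DEMO_WBITS)
#define DEMO_BIT(f)  (1UL << (((unsigned)(f) - 1) % DEMO_WBITS))

/* Counterpart of prfillset(): turn on every flag. */
void demo_fillset(demo_set_t *s)  { memset(s, 0xff, sizeof *s); }

/* Counterpart of premptyset(): turn off every flag. */
void demo_emptyset(demo_set_t *s) { memset(s, 0, sizeof *s); }

/* Counterpart of praddset(): turn on one flag. */
void demo_addset(demo_set_t *s, int f)
{
    assert(f >= 1 && f <= DEMO_NFLAGS);
    s->word[DEMO_WORD(f)] |= DEMO_BIT(f);
}

/* Counterpart of prdelset(): turn off one flag. */
void demo_delset(demo_set_t *s, int f)
{
    assert(f >= 1 && f <= DEMO_NFLAGS);
    s->word[DEMO_WORD(f)] &= ~DEMO_BIT(f);
}

/* Counterpart of prismember(): nonzero iff the flag is turned on. */
int demo_ismember(const demo_set_t *s, int f)
{
    assert(f >= 1 && f <= DEMO_NFLAGS);
    return (s->word[DEMO_WORD(f)] & DEMO_BIT(f)) != 0;
}
```

The model shows why a set must first be filled or emptied: the add, delete,
and membership operations touch a single bit and leave the rest of the
storage as they found it.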
The allowable ioctl codes follow. Certain of these can be used only if
the process file descriptor is open for writing; these include all
operations that affect process control. Those requiring write access are
marked with an asterisk (*). Except where noted, an ioctl to a process
that has terminated elicits the error ENOENT.
PIOCSTATUS
PIOCSTATUS returns status information for the process; p is a pointer to
a prstatus structure containing at least the following fields:
typedef struct prstatus {
long pr_flags; /* Flags */
short pr_why; /* Reason for stop (if stopped) */
short pr_what; /* More detailed reason */
short pr_cursig; /* Current signal */
sigset_t pr_sigpend; /* Set of pending signals */
sigset_t pr_sighold; /* Set of held signals */
struct siginfo pr_info; /* Info associated with signal/fault */
struct sigaltstack pr_altstack; /* Alternate signal stack info */
struct sigaction pr_action;/* Signal action for current signal */
short pr_syscall; /* System call # (if in syscall) */
short pr_nsysarg; /* # of arguments to this syscall */
long pr_errno; /* Error number from system call */
long pr_rval1; /* System call return value 1 */
long pr_rval2; /* System call return value 2 */
long pr_sysarg[PRSYSARGS]; /* Arguments to this syscall */
pid_t pr_pid; /* Process id */
pid_t pr_ppid; /* Parent process id */
pid_t pr_pgrp; /* Process group id */
pid_t pr_sid; /* Session id */
timespec_t pr_utime; /* Process user cpu time */
timespec_t pr_stime; /* Process system cpu time */
timespec_t pr_cutime; /* Sum of children's user times */
timespec_t pr_cstime; /* Sum of children's system times */
char pr_clname[8]; /* Scheduling class name */
long pr_instr; /* Current instruction */
gregset_t pr_reg; /* General registers */
} prstatus_t;
pr_flags is a bit-mask holding these flags:
PR_STOPPED Process is stopped
PR_ISTOP Process is stopped on an event of interest (see
PIOCSTOP).
PR_DSTOP Process has a stop directive in effect (see
PIOCSTOP).
PR_STEP Process has a single-step directive in effect (see
PIOCRUN).
PR_ASLEEP Process is in an interruptible sleep within a system
call.
PR_PCINVAL Process's current instruction (pr_instr) is
undefined.
PR_ISSYS Process is a system process (see PIOCSTOP).
PR_FORK Process has its inherit-on-fork flag set (see
PIOCSET).
PR_RLC Process has its run-on-last-close flag set (see
PIOCSET).
PR_KLC Process has its kill-on-last-close flag set (see
PIOCSET).
PR_PTRACE Process is being traced via ptrace(2).
pr_why and pr_what together describe, for a stopped process, the reason
that the process is stopped. Possible values of pr_why are:
PR_REQUESTED The stop occurred in response to a stop directive,
normally because PIOCSTOP was applied. pr_what is
unused in this case.
PR_SIGNALLED The process stopped on receipt of a signal (see
PIOCSTRACE); pr_what holds the signal number that
caused the stop (for a newly-stopped process, the
same value is in pr_cursig).
PR_FAULTED The process stopped on incurring a hardware fault
(see PIOCSFAULT); pr_what holds the fault number
that caused the stop.
PR_SYSENTRY and PR_SYSEXIT
A stop on entry to or exit from a system call (see
PIOCSENTRY and PIOCSEXIT); pr_what holds the system
call number.
PR_JOBCONTROL The process stopped due to the default action of a
job control stop signal (see sigaction(2)); pr_what
holds the stopping signal number.
pr_cursig names the current signal, that is, the next signal to be
delivered to the process. pr_sigpend identifies any other signals
pending for the process. pr_sighold identifies those signals whose
delivery is being delayed if sent to the process.
pr_info, when the process is in a PR_SIGNALLED or PR_FAULTED stop,
contains additional information pertinent to the particular signal or
fault (see <sys/siginfo.h>).
pr_altstack contains the alternate signal stack information for the
process (see sigaltstack(2)). pr_action contains the signal action
information pertaining to the current signal (see sigaction(2)); it is
undefined if pr_cursig is zero.
pr_syscall is the number of the system call, if any, being executed by
the traced process; it is non-zero if the process is stopped on
PR_SYSENTRY or PR_SYSEXIT, is asleep within a system call (PR_ASLEEP is
set), or is stopped on a watchpoint trap incurred within a system call
(see PIOCSWATCH). If pr_syscall is non-zero, pr_nsysarg is the number of
arguments to the system call and the pr_sysarg array contains the actual
arguments; pr_errno contains the value of errno returned at the last
system call; and pr_rval1 and pr_rval2 contain the return values from the
last system call.
pr_pid, pr_ppid, pr_pgrp, and pr_sid are, respectively, the process id,
the id of the process's parent, the process's process group id, and the
process's session id.
pr_utime, pr_stime, pr_cutime, and pr_cstime are, respectively, the user
CPU and system CPU time consumed by the process, and the cumulative user
CPU and system CPU time consumed by the process's children, in seconds
and nanoseconds.
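A total such as user plus system time has to carry nanoseconds into seconds.
The sketch below uses struct timespec from <time.h> and assumes that
timespec_t has the same tv_sec and tv_nsec members; ts_add() and ts_seconds()
are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Add two second/nanosecond pairs, carrying nanoseconds into seconds. */
struct timespec ts_add(struct timespec a, struct timespec b)
{
    struct timespec sum;

    assert(a.tv_nsec >= 0 && a.tv_nsec < 1000000000L);
    assert(b.tv_nsec >= 0 && b.tv_nsec < 1000000000L);
    sum.tv_sec = a.tv_sec + b.tv_sec;
    sum.tv_nsec = a.tv_nsec + b.tv_nsec;
    if (sum.tv_nsec >= 1000000000L) {
        sum.tv_sec += 1;
        sum.tv_nsec -= 1000000000L;
    }
    return sum;
}

/* Convert to floating-point seconds, for display only. */
double ts_seconds(struct timespec t)
{
    return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}
```

For example, ts_add(status.pr_utime, status.pr_stime) would give the total
CPU time of the process itself, excluding its children.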
pr_clname contains the name of the process's scheduling class.
pr_instr contains the machine instruction to which the program counter
refers. The amount of data retrieved from the process is machine-dependent;
on SGI machines, it is a 32-bit word. In general, the size is
that of the machine's smallest instruction. If PR_PCINVAL is set,
pr_instr is undefined; this occurs whenever the process is not stopped or
when the program counter refers to an invalid address.
pr_reg is an array holding the contents of the general registers for a
stopped process. For SGI machines the structure gregset_t is defined in
<sys/ucontext.h>. If the process is not stopped, register values are
undefined.
*PIOCSTOP PIOCWSTOP
PIOCSTOP directs the process to stop and waits until it has stopped;
PIOCWSTOP simply waits for the process to stop. These operations
complete when the process stops on an event of interest, immediately if
already so stopped. If p is non-zero it points to an instance of
prstatus_t to be filled with status information for the stopped process.
An ``event of interest'' is either a PR_REQUESTED stop or a stop that has
been specified in the process's tracing flags (set by PIOCSTRACE,
PIOCSFAULT, PIOCSENTRY, and PIOCSEXIT). A PR_JOBCONTROL stop is
specifically not an event of interest. (A process may stop twice due to
a stop signal, first showing PR_SIGNALLED if the signal is traced and
again showing PR_JOBCONTROL if the process is set running without
clearing the signal.) If the process is controlled by ptrace(2), it
comes to a PR_SIGNALLED stop on receipt of any signal; this is an event
of interest only if the signal is in the traced signal set. If PIOCSTOP
is applied to a process that is stopped, but not on an event of interest,
the stop directive takes effect when the process is restarted by the
competing mechanism; at that time the process enters a PR_REQUESTED stop
before executing any user-level code.
ioctl()s are interruptible by signals so that, for example, an alarm(2)
can be set to avoid waiting forever for a process that may never stop on
an event of interest. If PIOCSTOP is interrupted, the stop directive
remains in effect even though the ioctl() returns an error.
A system process (indicated by the PR_ISSYS flag) never executes at user
level, has no user-level address space visible through /proc, and cannot
be stopped. Applying PIOCSTOP or PIOCWSTOP to a system process elicits
the error EBUSY.
*PIOCRUN
The traced process is made runnable again after a stop. If p is non-zero
it points to a prrun structure describing additional actions to be
performed. The prrun structure contains at least the following fields:
typedef struct prrun {
long pr_flags; /* Flags */
sigset_t pr_trace; /* Set of signals to be traced */
sigset_t pr_sighold; /* Set of signals to be held */
fltset_t pr_fault; /* Set of faults to be traced */
caddr_t pr_vaddr; /* Virtual address at which to resume */
} prrun_t;
pr_flags is a bit-mask describing optional actions; the remainder of the
entries are meaningful only if the appropriate bits are set in pr_flags.
Flag definitions:
PRCSIG Clears the current signal, if any (see PIOCSSIG).
PRCFAULT Clears the current fault, if any (see PIOCCFAULT).
PRSTRACE Sets the traced signal set to pr_trace (see
PIOCSTRACE).
PRSHOLD Sets the held signal set to pr_sighold (see
PIOCSHOLD).
PRSFAULT Sets the traced fault set to pr_fault (see
PIOCSFAULT).
PRSVADDR Sets the address at which execution resumes to
pr_vaddr.
PRSTEP Directs the process to single-step, that is, to run
and to execute a single machine instruction. On
completion of the instruction, a trace trap occurs.
If FLTTRACE is being traced, the process stops,
otherwise it is sent SIGTRAP; if SIGTRAP is being
traced and not held, the process stops. This
operation requires hardware and operating system
support and may not be implemented on all
processors. It is implemented on SGI machines.
PRCSTEP Cancels any outstanding single-step directive and
any PRSTEP directive set in the current request.
PRSABORT Meaningful only if the process is in a PR_SYSENTRY
stop or is marked PR_ASLEEP; it instructs the
process to abort execution of the system call (see
PIOCSENTRY, PIOCSEXIT).
PRSTOP Directs the process to stop again as soon as
possible after resuming execution (see PIOCSTOP).
In particular if the process is stopped on
PR_SIGNALLED or PR_FAULTED, the next stop will show
PR_REQUESTED, no other stop will have intervened,
and the process will not have executed any user-level
code.
PIOCRUN fails (EBUSY) if applied to a process that is not stopped on
an event of interest. Once PIOCRUN has been applied, the process is
no longer stopped on an event of interest even if, due to a
competing mechanism, it remains stopped.
*PIOCSTRACE
This defines a set of signals to be traced: the receipt of one of these
signals causes the traced process to stop. The set of signals is defined
via an instance of sigset_t addressed by p. Receipt of SIGKILL cannot be
traced.
If a signal that is included in the held signal set is sent to the traced
process, the signal is not received and does not cause a process stop
until it is removed from the held signal set, either by the process
itself or by setting the held signal set with PIOCSHOLD or the PRSHOLD
option of PIOCRUN.
PIOCGTRACE
The current traced signal set is returned in an instance of sigset_t
addressed by p.
*PIOCSSIG
The current signal and its associated signal information are set
according to the contents of the siginfo structure addressed by p (see
<sys/siginfo.h>). If the specified signal number is zero or if p is
zero, the current signal is cleared. Setting the current signal to
SIGKILL terminates the process immediately, even if it is stopped. All
other signals will be sent after the process is made runnable, if it is
currently stopped.
*PIOCKILL
A signal is sent to the process with semantics identical to those of
kill(2). p points to an int naming the signal. Sending SIGKILL
terminates the process immediately.
*PIOCUNKILL
A signal is deleted, that is, it is removed from the set of pending
signals. The current signal (if any) is unaffected. p points to an int
naming the signal. It is an error to attempt to delete SIGKILL.
PIOCGHOLD *PIOCSHOLD
PIOCGHOLD returns the set of held signals (signals whose delivery will be
delayed if sent to the process) in an instance of sigset_t addressed by
p. PIOCSHOLD correspondingly sets the held signal set but does not allow
SIGKILL or SIGSTOP to be held.
PIOCMAXSIG PIOCACTION
These operations provide information about the signal actions associated
with the traced process (see sigaction(2)). PIOCMAXSIG returns, in the
int addressed by p, the maximum signal number understood by the system.
This can be used to allocate storage for use with the PIOCACTION
operation, which returns the traced process's signal actions in an array
of sigaction structures addressed by p. Signal numbers are displaced by
1 from array indices, so that the action for signal number n appears in
position n-1 of the array.
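The displacement by one is easy to get wrong, so a lookup helper may be worth
having. The sketch below is illustrative: action_for_signal() is a
hypothetical name, and the table is assumed to hold exactly as many entries
as PIOCMAXSIG reported.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Look up the action for signal signo in a table filled by PIOCACTION.
 * maxsig is the value reported by PIOCMAXSIG.
 * Returns NULL when signo is outside the range 1..maxsig.
 */
const struct sigaction *
action_for_signal(const struct sigaction *table, int maxsig, int signo)
{
    assert(table != NULL);
    if (signo < 1 || signo > maxsig)
        return NULL;
    return &table[signo - 1];   /* signal n is stored at index n-1 */
}
```

The table itself would be allocated with room for maxsig elements of type
struct sigaction before PIOCACTION is issued.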
*PIOCSFAULT
This defines a set of hardware faults to be traced: on incurring one of
these faults the traced process stops. The set is defined via an
instance of fltset_t addressed by p. Fault names are defined in
<sys/fault.h> and include the following. Some of these may not occur on
all processors; there may be processor-specific faults in addition to
these.
FLTILL illegal instruction
FLTPRIV privileged instruction
FLTBPT breakpoint trap
FLTTRACE trace trap
FLTWATCH watchpoint trap
FLTKWATCH kernel watchpoint trap
FLTACCESS memory access fault
FLTBOUNDS memory bounds violation
FLTIOVF integer overflow
FLTIZDIV integer zero divide
FLTFPE floating-point exception
FLTSTACK unrecoverable stack fault
FLTPAGE recoverable page fault
When not traced, a fault normally results in the posting of a signal to
the process that incurred the fault. If the process stops on a fault,
the signal is posted to the process when execution is resumed unless the
fault is cleared by PIOCCFAULT or by the PRCFAULT option of PIOCRUN.
FLTPAGE and FLTKWATCH are exceptions; no signal is posted. There may be
additional processor-specific faults like this. pr_info in the prstatus
structure identifies the signal to be sent and contains machine-specific
information about the fault.
PIOCGFAULT
The current traced fault set is returned in an instance of fltset_t
addressed by p.
*PIOCCFAULT
The current fault (if any) is cleared; the associated signal is not sent
to the process.
*PIOCSENTRY *PIOCSEXIT
These operations instruct the process to stop on entry to or exit from
specified system calls. The set of system calls to be traced is defined
via an instance of sysset_t addressed by p.
When entry to a system call is being traced, the traced process stops
after having begun the call to the system but before the system call
arguments have been fetched from the process. When exit from a system
call is being traced, the traced process stops on completion of the
system call just prior to checking for signals and returning to user
level. At this point all return values have been stored into the traced
process's registers.
If the traced process is stopped on entry to a system call (PR_SYSENTRY)
or when sleeping in an interruptible system call (PR_ASLEEP is set), it
may be instructed to go directly to system call exit by specifying the
PRSABORT flag in a PIOCRUN request. Unless exit from the system call is
being traced the process returns to user level showing error EINTR.
PIOCGENTRY PIOCGEXIT
These return the current traced system call entry or exit set in an
instance of sysset_t addressed by p.
PIOCNWATCH
PIOCNWATCH returns, in the int addressed by p, the number of watched
areas supported by the system. This can be used to allocate storage for
use with the PIOCSWATCH and PIOCGWATCH operations, each of which must
provide an array whose number of elements equals the supported number of
watched areas.
*PIOCSWATCH
PIOCSWATCH establishes or clears a set of watched areas in the traced
process; p points to a prwatch structure containing at least the following
fields:
typedef struct prwatch {
caddr_t pr_vaddr; /* Virtual address of watched area */
u_long pr_size; /* Size of watched area in bytes */
long pr_wflags; /* Watch type flags */
} prwatch_t;
pr_vaddr specifies the virtual address of an area of memory to be watched
in the traced process. pr_size specifies the size of the area, in bytes.
pr_wflags specifies the type of memory access to be monitored as a bit-mask
of one or more of the following flags (see also PIOCMAP):
MA_READ read access
MA_WRITE write access
MA_EXEC execution access
An entry with a zero value for pr_size clears any previously-established
watched area starting at the specified virtual address. An entry with a
non-empty pr_wflags bit-mask establishes a watched area for the virtual
address range specified by pr_vaddr and pr_size. An entry with an empty
pr_wflags bit-mask is ignored.
A watchpoint is triggered when the traced process makes a memory
reference that covers at least one byte of a watched area and the memory
reference is a mode of interest as specified in pr_wflags. When a
watchpoint is triggered, the process incurs a watchpoint trap. If
FLTWATCH is being traced, the process stops; otherwise it is sent
SIGTRAP; if SIGTRAP is being traced and not held, the process stops. If
the access is a write access, the memory is not modified. If the process
stops, its program counter refers to the instruction that triggered the
watchpoint. pr_info in the prstatus structure contains information
pertinent to the watchpoint trap. In particular, the si_addr field
contains the virtual address of the memory reference that triggered the
watchpoint and the si_code field contains one of MA_READ, MA_WRITE, or
MA_EXEC, indicating read, write or execute access, respectively.
A watchpoint may be triggered while executing a system call that makes
reference to the traced process's memory. Such a system call completes
normally; a kernel watchpoint fault is taken after the system call
completes but before the process returns to user level. If more than one
watchpoint would be triggered by the system call, the first one
encountered is the one reported.
PIOCSWATCH fails with EINVAL if an attempt is made to specify overlapping
watched areas or to specify a watchpoint whose virtual address range
includes invalid virtual addresses in the traced process. PIOCSWATCH
fails with E2BIG if an attempt is made to establish more than the
supported number of watched areas and with ESRCH if an attempt is made to
delete a non-existent watchpoint. An attempt to delete watchpoints on a
running process could result in failure with errno set to EBUSY. This is
a temporary condition that occurs when the kernel is stepping over a
watchpoint; a subsequent attempt should succeed. This does not
happen if the process is stopped.
Access to a process's memory through /proc will not trigger a watchpoint,
even if the access is from the process itself (which must have opened its
own /proc entry).
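Because overlapping areas are rejected with EINVAL, a caller may wish to
check a request before issuing PIOCSWATCH. The sketch below uses a stand-in
structure, demo_prwatch_t, with simplified field types; the helper names are
hypothetical, and the kernel's own validation may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in mirroring the prwatch fields, with simplified types. */
typedef struct demo_prwatch {
    unsigned long pr_vaddr;   /* virtual address of watched area */
    unsigned long pr_size;    /* size of watched area in bytes */
    long          pr_wflags;  /* watch type flags */
} demo_prwatch_t;

/* Nonzero if byte ranges [a, a+alen) and [b, b+blen) share a byte. */
static int ranges_overlap(unsigned long a, unsigned long alen,
                          unsigned long b, unsigned long blen)
{
    if (alen == 0 || blen == 0)
        return 0;
    return a < b ? (b - a < alen) : (a - b < blen);
}

/* Nonzero if the entry establishes an area (not a clear or ignored entry). */
static int establishes(const demo_prwatch_t *w)
{
    return w->pr_size != 0 && w->pr_wflags != 0;
}

/*
 * Return the index of the first establishing entry that overlaps an
 * earlier establishing entry, or -1 if there is no overlap.
 */
int first_overlap(const demo_prwatch_t *w, size_t n)
{
    size_t i, j;

    assert(w != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!establishes(&w[i]))
            continue;
        for (j = 0; j < i; j++)
            if (establishes(&w[j]) &&
                ranges_overlap(w[i].pr_vaddr, w[i].pr_size,
                               w[j].pr_vaddr, w[j].pr_size))
                return (int)i;
    }
    return -1;
}

/*
 * Nonzero if a reference of len bytes at addr, with access mode bits in
 * mode, touches at least one watched byte in a mode of interest.
 */
int watch_triggers(const demo_prwatch_t *w, unsigned long addr,
                   unsigned long len, long mode)
{
    assert(w != NULL);
    return ranges_overlap(w->pr_vaddr, w->pr_size, addr, len) &&
           (w->pr_wflags & mode) != 0;
}
```

The mode argument of watch_triggers() is expected to be built from MA_READ,
MA_WRITE, and MA_EXEC; the helper itself does not depend on their values.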
PIOCGWATCH
PIOCGWATCH returns, in the array of prwatch structures addressed by p,
the set of watched areas currently in effect. Elements beyond the number
of actually established watched areas are filled with zeros.
*PIOCSET *PIOCRESET
PIOCSET sets one or more modes of operation for the traced process.
PIOCRESET resets these modes. The modes to be set or reset are specified
by flags in a long addressed by p:
PR_FORK (inherit-on-fork) When set, the process's tracing
flags are inherited by the child of a fork(2). When
reset, child processes start with all tracing flags
cleared.
PR_RLC (run-on-last-close) When set and the last writable
/proc file descriptor referring to the traced
process is closed, all of the process's tracing
flags are cleared, any outstanding stop directive is
canceled, and if the process is stopped, it is set
running as though PIOCRUN had been applied to it.
When reset, the process's tracing flags are retained
and the process is not set running on last close.
PR_KLC (kill-on-last-close) When set and the last writable
/proc file descriptor referring to the traced
process is closed, the process is terminated with
SIGKILL.
It is an error (EINVAL) to specify flags other than those described above
or to apply these operations to a system process. The current modes are
reported in the prstatus structure (see PIOCSTATUS).
Note that a process using /proc cannot assume any default settings for
these flags, as some other process may have attached to the target
earlier and reset the flags and then detached.
PIOCGREG *PIOCSREG
These operations respectively get and set the process general registers
into or out of an array addressed by p; the array has type gregset_t.
Register contents are accessible using a set of predefined indices (see
PIOCSTATUS). No bits of the processor-status register (PSR) or other
privileged registers can be modified by PIOCSREG.
PIOCSREG fails (EBUSY) if applied to a process that is not stopped on an
event of interest. If the process is not stopped, the register values
returned by PIOCGREG are undefined.
PIOCGFPREG *PIOCSFPREG
These operations respectively get and set the process floating-point
registers into or out of a structure addressed by p; the structure has
type fpregset_t. An error (EINVAL) is returned if there is no floating
point hardware on the machine. PIOCSFPREG fails (EBUSY) if applied to a
process that is not stopped on an event of interest. If the process is
not stopped, the register values returned by PIOCGFPREG are undefined.
*PIOCNICE
The traced process's nice(2) priority is incremented by the amount
contained in the int addressed by p. Only the superuser may better a
process's priority in this way, but any user may make the priority worse.
PIOCPSINFO
This returns miscellaneous process information such as that reported by
ps(1). p is a pointer to a prpsinfo structure containing at least the
following fields:
typedef struct prpsinfo {
char pr_state; /* numeric process state (see pr_sname) */
char pr_sname; /* printable character representing pr_state */
char pr_zomb; /* !=0: process terminated but not waited for */
char pr_nice; /* nice for cpu usage */
u_long pr_flag; /* process flags */
uid_t pr_uid; /* real user id */
gid_t pr_gid; /* real group id */
pid_t pr_pid; /* unique process id */
pid_t pr_ppid; /* process id of parent */
pid_t pr_pgrp; /* pid of process group leader */
pid_t pr_sid; /* session id */
caddr_t pr_addr; /* physical address of process */
long pr_size; /* size of process image in pages */
long pr_rssize; /* resident set size in pages */
long pr_pagesize; /* system page size, in bytes */
caddr_t pr_wchan; /* wait addr for sleeping process */
timespec_t pr_start; /* process start time, sec+nsec since epoch */
timespec_t pr_time; /* usr+sys cpu time for this process */
long pr_pri; /* priority, high value is high priority */
char pr_oldpri; /* pre-SVR4, low value is high priority */
char pr_cpu; /* pre-SVR4, cpu usage for scheduling */
dev_t pr_ttydev; /* controlling tty device (PRNODEV if none) */
char pr_clname[8]; /* Scheduling class name */
char pr_fname[PRCOMSIZ]; /* last component of exec()ed pathname */
char pr_psargs[PRARGSZ]; /* initial characters of arg list */
u_int pr_pset; /* associated processor set name */
cpuid_t pr_sonproc; /* processor running on */
timespec_t pr_ctime; /* usr+sys cpu time for all children */
} prpsinfo_t;
Some of the entries in prpsinfo, such as pr_state and pr_flag, are
system-specific and should not be expected to retain their meanings
across different versions of the operating system. pr_addr is a vestige
of the past and has no real meaning in current systems.
PIOCPSINFO can be applied to a zombie process (one that has terminated
but whose parent has not yet performed a wait(2) on it).
PIOCNMAP PIOCMAP
These operations provide information about the memory mappings (virtual
address ranges) associated with the traced process. PIOCNMAP returns, in
the int addressed by p, the number of mappings that are currently active.
The PIOCMAP operation may be used to obtain the list of currently active
mappings, which is an array of structures of type prmap_t. PIOCNMAP may
be used to determine the minimum amount of storage that needs to be
allocated to receive these structures, but the programmer should not
assume that it is the maximum amount needed. If the PIOCNMAP and PIOCMAP
calls are made on a process that is not stopped, the number of mappings
could change between the two ioctl calls, and the caller could fault if
too few elements were allocated to hold the results of PIOCMAP. Note: for
a better interface, see PIOCMAP_SGI below. For PIOCMAP, p addresses an array of
elements of type prmap_t; one array element (one structure) is returned
for each mapping, plus an additional element containing all zeros to mark
the end of the list. The prmap structure contains at least the following
fields:
typedef struct prmap {
caddr_t pr_vaddr; /* Virtual address */
u_long pr_size; /* Size of mapping in bytes */
off_t pr_off; /* Offset into mapped object, if any */
long pr_mflags; /* Protection and attribute flags */
} prmap_t;
pr_vaddr is the virtual address of the mapping within the traced process
and pr_size is its size in bytes. pr_off is the offset within the mapped
object (if any) to which the virtual address is mapped.
pr_mflags is a bit-mask of protection and attribute flags:
MA_READ mapping is readable by the traced process
MA_WRITE mapping is writable by the traced process
MA_EXEC mapping is executable by the traced process
MA_SHARED mapping changes are shared by the mapped object
MA_BREAK mapping is grown by the brk(2) system call
MA_STACK mapping is grown automatically on stack faults
MA_PHYS mapping corresponds to a physical device mapping
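Walking the returned list relies on the all-zero terminator. The sketch below
uses a stand-in structure, demo_prmap_t, that mirrors the prmap fields with
simplified types; the helper names are hypothetical, and the cap argument
guards against a buffer that was never terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in mirroring the prmap fields, with simplified types. */
typedef struct demo_prmap {
    unsigned long pr_vaddr;   /* virtual address */
    unsigned long pr_size;    /* size of mapping in bytes */
    long          pr_off;     /* offset into mapped object, if any */
    long          pr_mflags;  /* protection and attribute flags */
} demo_prmap_t;

/* An element with every field zero marks the end of the list. */
static int is_terminator(const demo_prmap_t *m)
{
    return m->pr_vaddr == 0 && m->pr_size == 0 &&
           m->pr_off == 0 && m->pr_mflags == 0;
}

/* Count the mappings before the terminator; never looks past cap elements. */
size_t count_mappings(const demo_prmap_t *maps, size_t cap)
{
    size_t i;

    assert(maps != NULL || cap == 0);
    for (i = 0; i < cap && !is_terminator(&maps[i]); i++)
        continue;
    return i;
}

/* Return the mapping that contains addr, or NULL if addr is unmapped. */
const demo_prmap_t *
find_mapping(const demo_prmap_t *maps, size_t cap, unsigned long addr)
{
    size_t i;

    assert(maps != NULL || cap == 0);
    for (i = 0; i < cap && !is_terminator(&maps[i]); i++)
        if (addr >= maps[i].pr_vaddr &&
            addr - maps[i].pr_vaddr < maps[i].pr_size)
            return &maps[i];
    return NULL;
}
```

A lookup of this kind can be used before an lseek(2) and read(2) on the
process file to learn how many bytes are available before the mapping ends.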
PIOCMAP_SGI
This operation provides detailed information about the memory mappings
(virtual address ranges) associated with the traced process. In effect
it performs both a PIOCNMAP and a PIOCMAP call (with additional
information) with one ioctl. The PIOCMAP_SGI operation may be used to
obtain the list of currently active mappings, which is an array of
structures of type prmap_sgi_t. The user must preallocate an array of
the maximum number of mapping structures they are willing to receive.
One array element (one structure) is returned for each mapping, plus an
additional element containing all zeros that also marks the end of the
list. There is an upper limit to the number of memory mappings that can
be returned by this call, which is defined as PRMAPMAX in the procfs.h
header file. An attempt to request more than PRMAPMAX mappings returns
only PRMAPMAX mappings. PIOCMAP_SGI returns
either -1 or the number of mappings that are currently active.
For PIOCMAP_SGI, p addresses a pointer to a structure called
prmap_sgi_arg_t. It contains the following fields:
typedef struct prmap_sgi_arg {
caddr_t pr_vaddr; /* Base of map buffer */
ulong_t pr_size; /* Size of buffer in bytes */
} prmap_sgi_arg_t;
pr_vaddr is the virtual address of the buffer to hold the mappings for
the traced process and pr_size is its size in bytes. The prmap_sgi_t
structure contains at least the following fields:
typedef struct prmap_sgi {
caddr_t pr_vaddr; /* Virtual base address */
ulong_t pr_size; /* Size of mapping in bytes */
off_t pr_off; /* Offset into mapped object, if any */
ulong_t pr_mflags; /* Protection and attribute flags */
pgno_t pr_vsize; /* # valid pages in this segment */
pgno_t pr_psize; /* # private pages in this segment */
pgno_t pr_wsize; /* Cost for this proc weighted base 256 */
pgno_t pr_rsize; /* # referenced pages in this segment */
pgno_t pr_msize; /* # modified pages in this segment */
dev_t pr_dev; /* Device # of segment iff mapped */
ino_t pr_ino; /* Inode # of segment iff mapped */
} prmap_sgi_t;
pr_vaddr is the virtual address of the mapping within the traced process
and pr_size is its size in bytes. pr_off is the offset within the mapped
object (if any) to which the virtual address is mapped. pr_vsize,
pr_psize, pr_wsize, pr_rsize, pr_msize are page counts for the virtual
mapping. pr_dev and pr_ino identify the filesystem resident object from
which the mapping originates (if one exists).
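The list layout described above (one element per mapping, then an all-zero element) can be walked as in the following minimal sketch. It does not call ioctl(2); demo_map_t is a reduced local stand-in for prmap_sgi_t, and the function names are illustrative only. The sketch assumes that a terminator is an element whose fields are all zero.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for prmap_sgi_t; the real layout is in <sys/procfs.h>. */
typedef struct demo_map {
    char          *pr_vaddr;   /* virtual base address */
    unsigned long  pr_size;    /* size of mapping in bytes */
    unsigned long  pr_mflags;  /* protection and attribute flags */
} demo_map_t;

/* An element ends the list when every field is zero. */
static int demo_is_terminator(const demo_map_t *m)
{
    return m->pr_vaddr == NULL && m->pr_size == 0 && m->pr_mflags == 0;
}

/*
 * Count the mappings in buf and add up their sizes.  cap is the number
 * of elements the caller allocated, so the walk stops there even if no
 * terminator is present.
 */
static size_t demo_count_maps(const demo_map_t *buf, size_t cap,
                              unsigned long *total_bytes)
{
    size_t n = 0;

    assert(buf != NULL && total_bytes != NULL);
    *total_bytes = 0;
    while (n < cap && !demo_is_terminator(&buf[n])) {
        *total_bytes += buf[n].pr_size;
        n++;
    }
    return n;
}
```

Bounding the walk by the allocated element count is a defensive choice of this sketch, not something the interface requires.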
pr_mflags is a bit-mask of protection and attribute flags:
MA_READ          mapping is readable by the traced process
MA_WRITE         mapping is writable by the traced process
MA_EXEC          mapping is executable by the traced process
MA_SHARED        mapping changes are shared by the mapped object
MA_BREAK         mapping is grown by the brk(2) system call
MA_STACK         mapping is grown automatically on stack faults
MA_PHYS          mapping corresponds to a physical device mapping
MA_PRIMARY       mapping is one of the process's core segments
MA_COW           mapping corresponds to a copy on write segment
MA_NOTCACHED     mapped address segment is not cached
MA_SHMEM         mapping corresponds to a shared memory mapping
MA_REFCNT_SHIFT  amount to shift right mflags to get reference count
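As a hedged illustration of how a pr_mflags word might be decoded, the sketch below renders the protection bits as a short string and extracts the reference count by shifting right. The DEMO_ constants are placeholder values chosen for this example; the real MA_* values, including MA_REFCNT_SHIFT, must be taken from <sys/procfs.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the real MA_* constants are in <sys/procfs.h>. */
#define DEMO_MA_READ          0x01UL
#define DEMO_MA_WRITE         0x02UL
#define DEMO_MA_EXEC          0x04UL
#define DEMO_MA_SHARED        0x08UL
#define DEMO_MA_REFCNT_SHIFT  16

/* Write a 4-character, NUL-terminated summary such as "r-xp" into out. */
static void demo_prot_string(unsigned long mflags, char *out)
{
    assert(out != NULL);
    out[0] = (mflags & DEMO_MA_READ)   ? 'r' : '-';
    out[1] = (mflags & DEMO_MA_WRITE)  ? 'w' : '-';
    out[2] = (mflags & DEMO_MA_EXEC)   ? 'x' : '-';
    out[3] = (mflags & DEMO_MA_SHARED) ? 's' : 'p';
    out[4] = '\0';
}

/* The reference count occupies the bits above the shift amount. */
static unsigned long demo_refcnt(unsigned long mflags)
{
    return mflags >> DEMO_MA_REFCNT_SHIFT;
}
```

The caller supplies a buffer of at least five bytes for demo_prot_string().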
PIOCPGD_SGI
This operation provides information about the interior of the memory
mappings (virtual address ranges) associated with the traced process.
The PIOCPGD_SGI operation is used to obtain the list of page
descriptors, which is an array of structures of type pgd_t. The
PIOCMAP_SGI ioctl may be used to determine the amount of storage that
needs to be allocated to receive these structures. For PIOCPGD_SGI, p
points to a prpgd_sgi_t structure that contains an array of elements of
type pgd_t. The pgd_t structure contains at least the
following fields:
typedef struct pgd { /* per-page data */
short pr_flags; /* flags */
short pr_value; /* page count/fault offset */
} pgd_t;
The prpgd_sgi_t structure contains at least the following fields:
typedef struct prpgd_sgi {
caddr_t pr_vaddr; /* virtual base address of region to stat */
pgno_t pr_pglen; /* number of pages in data list... */
pgd_t pr_data[1]; /* variable length array of page flags */
} prpgd_sgi_t;
pr_vaddr is the virtual address of the mapping within the traced process
and pr_pglen is the length of the pr_data array.
The pr_flags field for each page contains the following flags:
PGF_REFERENCED  page is currently valid in system page table
PGF_GLOBAL      page is marked global in system page table
PGF_WRITEABLE   page is currently writeable in system page table
PGF_NOTCACHED   page is marked non-cacheable in system page table
PGF_ISVALID     page is marked valid for this process
PGF_ISDIRTY     page is marked dirty for this process
PGF_PRIVATE     page is marked private to this process
PGF_FAULT       the pr_value field contains a fault offset
PGF_USRHISTORY  accumulating history flag for caller
PGF_REFHISTORY  page has been marked referenced
PGF_WRTHISTORY  page has been marked dirty
PGF_VALHISTORY  page has been marked valid
PGF_CLEAR       clear valid & writeable bits in page table
The pr_value field for each page contains either a reference count or a
fault offset value if the PGF_CLEAR operation was set on a previous call.
This can be used to determine which function or variable inside a page
the process references or writes frequently.
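Because prpgd_sgi_t ends in a one-element array that is really of variable length, the caller has to size the allocation for the number of pages it wants described. The following is a minimal sketch of that arithmetic using local stand-in types (demo_pgd_t, demo_prpgd_t); pgno_t is assumed here to be a long, and how the page count is derived (for example from a mapping's pr_size and the page size) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-ins; the real pgd_t and prpgd_sgi_t are in <sys/procfs.h>. */
typedef struct demo_pgd {
    short pr_flags;   /* flags */
    short pr_value;   /* page count/fault offset */
} demo_pgd_t;

typedef struct demo_prpgd {
    char       *pr_vaddr;    /* virtual base address of region to stat */
    long        pr_pglen;    /* number of pages in data list (pgno_t assumed) */
    demo_pgd_t  pr_data[1];  /* variable length array of page flags */
} demo_prpgd_t;

/* Bytes needed for a request describing npages pages (npages >= 1). */
static size_t demo_prpgd_bytes(size_t npages)
{
    assert(npages >= 1);
    /* One element is already part of the structure itself. */
    return sizeof(demo_prpgd_t) + (npages - 1) * sizeof(demo_pgd_t);
}

/* Allocate a zeroed request for npages pages starting at vaddr. */
static demo_prpgd_t *demo_prpgd_alloc(char *vaddr, size_t npages)
{
    demo_prpgd_t *req = calloc(1, demo_prpgd_bytes(npages));

    if (req != NULL) {
        req->pr_vaddr = vaddr;
        req->pr_pglen = (long)npages;
    }
    return req;
}
```

The caller releases the request with free(3) once the page descriptors have been examined.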
PIOCOPENM
The return value retval provides a read-only file descriptor for a mapped
object associated with the traced process. If p is zero the traced
process's exec(2)ed file is found. This enables a debugger to find the
object file symbol table without having to know the pathname of the
executable file. If p is non-zero it points to a caddr_t containing a
virtual address within the traced process and the mapped object, if any,
associated with that address is found; this can be used to get a file
descriptor for a shared library that is attached to the process. On
error (invalid address, physical device mapping, or no mapped object for
the designated address), -1 is returned and errno is set to EINVAL.
PIOCCRED
Fetch the set of credentials associated with the process. p points to an
instance of prcred_t that is filled by the operation. The prcred
structure contains at least the following fields:
typedef struct prcred {
uid_t pr_euid; /* Effective user id */
uid_t pr_ruid; /* Real user id */
uid_t pr_suid; /* Saved user id (from exec) */
gid_t pr_egid; /* Effective group id */
gid_t pr_rgid; /* Real group id */
gid_t pr_sgid; /* Saved group id (from exec) */
u_int pr_ngroups; /* Number of supplementary groups */
} prcred_t;
PIOCGROUPS
Fetch the set of supplementary group IDs associated with the process. p
points to an array of elements of type gid_t, that will be filled by the
operation. PIOCCRED can be applied beforehand to determine the number of
groups (pr_ngroups) that will be returned and the amount of storage that
should be allocated to hold them.
PIOCTLBMISS
Enable special user TLB handling. The TLB is a hardware coprocessor that
makes virtual-to-physical address translations. p points to an integer
that specifies the handling desired. If the value is TLB_COUNT, a record
will be kept of every virtual-address TLB refill that occurs while the
process mapped by fildes is running. If the value is TLB_STD, counting
will be disabled (the default mode). It is important to note that
monitoring TLB efficiency can be a useful tool, but the performance of
the code that refills the TLB will be degraded.
The TLB refill counts can be obtained by PIOCUSAGE. The struct prusage
field pu_utlb accounts for TLB refills that occurred while the process
was running in user mode, and the field pu_ktlb accounts for refills that
occurred while executing system calls on behalf of the user or while
handling hardware interrupts while the user process was scheduled.
PIOCUSAGE
PIOCUSAGE returns process usage information. p points to a prusage
structure that is filled by the operation. The fields in a prusage
structure are implementation dependent; no application can assume
portability in this area. See <sys/procfs.h> for the exact definition
for a particular implementation.
The SGI implementation supports the following fields:
typedef struct prusage {
timespec_t pu_tstamp; /* time stamp */
timespec_t pu_starttime; /* process start time */
timespec_t pu_utime; /* user CPU time */
timespec_t pu_stime; /* system CPU time */
u_long pu_minf; /* minor (mapping) page faults */
u_long pu_majf; /* major (disk) page faults */
u_long pu_utlb; /* user TLB misses */
u_long pu_nswap; /* number of swaps */
u_long pu_gbread; /* gigabytes ... */
u_long pu_bread; /* and bytes read */
u_long pu_gbwrit; /* gigabytes ... */
u_long pu_bwrit; /* and bytes written */
u_long pu_sigs; /* signals received */
u_long pu_vctx; /* voluntary context switches */
u_long pu_ictx; /* involuntary context switches */
u_long pu_sysc; /* system calls */
u_long pu_syscr; /* read() system calls */
u_long pu_syscw; /* write() system calls */
u_long pu_syscps; /* poll() or select() system calls */
u_long pu_sysci; /* ioctl() system calls */
u_long pu_graphfifo; /* graphics pipeline stalls */
u_long pu_graph_req[8]; /* graphics resource requests */
u_long pu_graph_wait[8]; /* graphics resource waits */
u_long pu_size; /* size of swappable image in pages */
u_long pu_rss; /* resident size of swappable image */
u_long pu_inblock; /* block input operations */
u_long pu_oublock; /* block output operations */
u_long pu_vfault; /* total number of vfaults */
u_long pu_ktlb; /* kernel TLB misses */
} prusage_t;
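The read and write volumes are each split across a gigabyte counter and a byte counter. A minimal sketch of recombining them into one 64-bit total follows. It assumes that one gigabyte here means 2^30 bytes, which the text above does not state; demo_io_t is a local stand-in holding only the four relevant fields.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a "gigabyte" in these counters is 2^30 bytes. */
#define DEMO_BYTES_PER_GB (1ULL << 30)

/* Stand-in carrying only the I/O volume fields of prusage_t. */
typedef struct demo_io {
    unsigned long pu_gbread;  /* gigabytes ... */
    unsigned long pu_bread;   /* and bytes read */
    unsigned long pu_gbwrit;  /* gigabytes ... */
    unsigned long pu_bwrit;   /* and bytes written */
} demo_io_t;

/* Fold a (gigabytes, bytes) pair into a single byte count. */
static unsigned long long demo_combine(unsigned long gb, unsigned long bytes)
{
    return (unsigned long long)gb * DEMO_BYTES_PER_GB + bytes;
}

static unsigned long long demo_bytes_read(const demo_io_t *u)
{
    assert(u != NULL);
    return demo_combine(u->pu_gbread, u->pu_bread);
}

static unsigned long long demo_bytes_written(const demo_io_t *u)
{
    assert(u != NULL);
    return demo_combine(u->pu_gbwrit, u->pu_bwrit);
}
```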
PIOCGETPTIMER
PIOCGETPTIMER returns an array of timers indicating the amount of time
the process has spent in each of the following states:
#include <time.h>
#include <sys/timers.h>
struct timespec ptime[MAX_PROCTIMER];
AS_USR_RUN      running in user mode
AS_SYS_RUN      running in system mode
AS_INT_RUN      running in interrupt mode
AS_BIO_WAIT     waiting for block I/O
AS_MEM_WAIT     waiting for memory
AS_SELECT_WAIT  waiting in select
AS_JCL_WAIT     stopped because of job control
AS_RUNQ_WAIT    waiting to run on run queue
AS_SLEEP_WAIT   waiting for resource
AS_STRMON_WAIT  waiting for the stream monitor
AS_PHYSIO_WAIT  waiting for raw I/O
p is a pointer to an array of MAX_PROCTIMER timespec structures.
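Once the timer array has been filled in, a common use is to express one state as a share of all accounted time. The sketch below shows that calculation on an already populated array. The DEMO_ index values and array length are placeholders for this example; the real AS_* indices and MAX_PROCTIMER come from <sys/timers.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Placeholder indices and length; see <sys/timers.h> for the real ones. */
enum { DEMO_AS_USR_RUN = 0, DEMO_AS_SYS_RUN = 1, DEMO_MAX_PROCTIMER = 11 };

/* Convert a timespec to seconds as a double. */
static double demo_ts_seconds(const struct timespec *ts)
{
    return (double)ts->tv_sec + (double)ts->tv_nsec / 1e9;
}

/*
 * Fraction of all accounted time spent in the given state, or 0.0 if
 * no time has been accounted at all.
 */
static double demo_state_share(const struct timespec *ptime,
                               size_t ntimers, size_t state)
{
    double total = 0.0;
    size_t i;

    assert(ptime != NULL && state < ntimers);
    for (i = 0; i < ntimers; i++)
        total += demo_ts_seconds(&ptime[i]);
    return total > 0.0 ? demo_ts_seconds(&ptime[state]) / total : 0.0;
}
```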
PIOCOPENPD
PIOCOPENPD is not currently implemented on SGI machines. It is under
consideration for future releases.
The return value retval provides a read-only file descriptor for a ``page
data file'', enabling tracking of address space references and
modifications on a per-page basis.
A read(2) of the page data file descriptor returns structured page data
and atomically clears the page data maintained for the file by the
system. That is to say, each read returns data collected since the last
read; the first read returns data collected since the file was opened.
When the call completes, the read buffer contains the following structure
as its header and thereafter contains a number of variable length
structures that must be accessed by walking linearly through the buffer.
typedef struct prpageheader {
timespec_t tstamp; /* real time time stamp */
u_long nmap; /* number of address space mappings */
u_long npage; /* total number of pages */
} prpageheader_t;
The header is followed by nmap variable-length prasmap structures:
typedef struct prasmap {
caddr_t vaddr; /* virtual address */
u_long npage; /* number of pages in mapping */
u_char data[1]; /* referenced, modified, present flags */
} prasmap_t;
The data[] array is of variable length, with one entry for each page in
the mapping, npage entries altogether, rounded up with empty entries at
the end so that the structure size is an integral number of longs.
data[] entries may contain these flags:
PG_PRESENT     page is resident in memory now
PG_REFERENCED  page has been referenced since last read
PG_MODIFIED    page has been modified since last read
If the read buffer is not large enough to contain all of the page data,
the read fails with E2BIG and the page data is not cleared. The required
size of the read buffer can be determined through fstat(2). Application
of lseek(2) to the page data file descriptor is ineffective. Closing the
page data file terminates the system overhead associated with collecting
the data.
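Walking the read buffer linearly, as described above, might look like the following sketch. Since PIOCOPENPD is not implemented on SGI machines, this is purely illustrative: the structures are local stand-ins, the DEMO_PG_ values are placeholders for the real PG_* constants, and the sketch assumes that the first prasmap record starts directly after the header and that records are packed back to back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Placeholder flag values; the real PG_* constants are in <sys/procfs.h>. */
#define DEMO_PG_PRESENT    0x01
#define DEMO_PG_REFERENCED 0x02
#define DEMO_PG_MODIFIED   0x04

typedef struct demo_pageheader {
    struct timespec tstamp;   /* real time time stamp */
    unsigned long   nmap;     /* number of address space mappings */
    unsigned long   npage;    /* total number of pages */
} demo_pageheader_t;

typedef struct demo_asmap {
    char          *vaddr;     /* virtual address */
    unsigned long  npage;     /* number of pages in mapping */
    unsigned char  data[1];   /* one flag byte per page */
} demo_asmap_t;

/* Record size: fixed part plus npage bytes, rounded up to whole longs. */
static size_t demo_asmap_bytes(unsigned long npage)
{
    size_t raw = offsetof(demo_asmap_t, data) + npage;

    return (raw + sizeof(long) - 1) / sizeof(long) * sizeof(long);
}

/* Count pages flagged modified; returns -1 if the buffer is truncated. */
static long demo_count_modified(const unsigned char *buf, size_t len)
{
    demo_pageheader_t hdr;
    size_t off = sizeof hdr;
    unsigned long m, p;
    long modified = 0;

    assert(buf != NULL);
    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr);   /* memcpy avoids alignment traps */

    for (m = 0; m < hdr.nmap; m++) {
        unsigned long npage;
        size_t rec;

        if (len - off < offsetof(demo_asmap_t, data))
            return -1;
        memcpy(&npage, buf + off + offsetof(demo_asmap_t, npage),
               sizeof npage);
        if (npage > len - off)
            return -1;
        rec = demo_asmap_bytes(npage);
        if (len - off < rec)
            return -1;
        for (p = 0; p < npage; p++)
            if (buf[off + offsetof(demo_asmap_t, data) + p] & DEMO_PG_MODIFIED)
                modified++;
        off += rec;                  /* step to the next record */
    }
    return modified;
}
```

The length checks are a precaution of this sketch so that a short or corrupt buffer is rejected rather than read past its end.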
PIOCGETPR PIOCGETU
These operations copy, respectively, the traced process's proc structure
and user area into the buffer addressed by p. They are provided for
completeness but it should be unnecessary to access either of these
structures directly since relevant status information is available
through other control operations. Their use is discouraged because a
program making use of them is tied to a particular version of the
operating system.
PIOCGETPR can be applied to a zombie process (see PIOCPSINFO).
PIOCACINFO
PIOCACINFO returns the currently accumulated accounting information for
the process. p points to a pracinfo structure that is filled in by the
operation. The fields in pracinfo are implementation dependent; no
application can assume portability in this area. See <sys/procfs.h> and
<sys/extacct.h> for the exact definition for a particular implementation.
The SGI implementation supports the following fields:
typedef struct pracinfo {
char pr_version; /* Accounting data version */
char pr_flag; /* Miscellaneous flags */
char pr_nice; /* Nice value */
unchar pr_sched; /* Scheduling discipline */
/* (see sys/schedctl.h) */
__int32_t pr_spare1; /* reserved */
ash_t pr_ash; /* Array session handle */
prid_t pr_prid; /* Project ID */
time_t pr_btime; /* Begin time (in secs since 1970)*/
time_t pr_etime; /* Elapsed time (in HZ) */
__int32_t pr_spare2[2]; /* reserved */
struct acct_timers pr_timers; /* Assorted timers: see extacct.h */
struct acct_counts pr_counts; /* Assorted counters: (ditto) */
__int64_t pr_spare3[8]; /* reserved */
} pracinfo_t;
PIOCGETSN0EXTREFCNTRS PIOCGETSN0REFCNTRS
PIOCGETSN0EXTREFCNTRS returns the extended memory reference counter
values in an Origin system for a specified virtual address space range.
See refcnt(5).
The third argument specifies the virtual address space range and the
user buffer in which to store the counter values. This argument is of
type sn0_refcnt_args_t, as defined in <sys/SN/hwcntrs.h>:
typedef struct sn0_refcnt_args {
caddr_t vaddr;
long len;
sn0_refcnt_buf_t* buf;
} sn0_refcnt_args_t;
The first field vaddr is the base of the virtual address space range, the
field len is the corresponding length in bytes, and the field buf is a
pointer to a user buffer where the system will store the counter values
and additional information. This buffer is an array of elements of type
sn0_refcnt_buf_t, where each element corresponds to the counter
information associated with one hardware page:
typedef struct sn0_refcnt_buf {
sn0_refcnt_set_t refcnt_set;
__uint64_t paddr;
__uint64_t page_size;
cnodeid_t cnodeid;
} sn0_refcnt_buf_t;
The field refcnt_set contains the set of counters associated with the
virtual address passed via sn0_refcnt_args, paddr is the address of the
physical page associated with this virtual address, page_size is the page
size being used to map it, and cnodeid is the physical page home node,
expressed in terms of Compact Node Identifiers which can be mapped back
to node names using the command topology(1). The refcnt_set type is
defined by
typedef struct sn0_refcnt_set {
refcnt_t refcnt[SN0_REFCNT_MAX_COUNTERS];
__uint64_t flags;
} sn0_refcnt_set_t;
The field refcnt is the actual set of counters (one counter per node),
and flags is a state vector reserved for future use. The counters in
refcnt are ordered according to the Compact Node Identifiers, also known
as cnodeids (numa(5)).
PIOCGETSN0REFCNTRS instructs the system to return the actual hardware
counter values instead of the extended software counter values returned
by PIOCGETSN0EXTREFCNTRS.
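Since the counters in refcnt are indexed by cnodeid, a typical post-processing step is to find the node that referenced a page most often. The sketch below does this for one counter set. The counter type and array width are placeholders chosen for the example; the real refcnt_t and SN0_REFCNT_MAX_COUNTERS are defined in <sys/SN/hwcntrs.h>, and numnodes is assumed to have been obtained elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder width and counter type; see <sys/SN/hwcntrs.h>. */
#define DEMO_MAX_COUNTERS 64
typedef unsigned long long demo_refcnt_t;

typedef struct demo_refcnt_set {
    demo_refcnt_t      refcnt[DEMO_MAX_COUNTERS];  /* one counter per node */
    unsigned long long flags;                      /* reserved */
} demo_refcnt_set_t;

/*
 * Return the cnodeid with the largest counter among the first numnodes
 * entries.  Ties resolve to the lowest cnodeid.
 */
static int demo_hottest_node(const demo_refcnt_set_t *set, int numnodes)
{
    int node;
    int best = 0;

    assert(set != NULL && numnodes >= 1 && numnodes <= DEMO_MAX_COUNTERS);
    for (node = 1; node < numnodes; node++)
        if (set->refcnt[node] > set->refcnt[best])
            best = node;
    return best;
}
```

Comparing the result with the cnodeid field of the corresponding sn0_refcnt_buf_t would indicate whether the page's most frequent user is also its home node.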
The following code fragment shows an example of how to use this interface:
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <malloc.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/prctl.h>
#include <procfs/procfs.h>
#include <sys/syssgi.h>
#include <sys/sysmp.h>
#include <sys/SN/hwcntrs.h>
/*
* This routine makes two assumptions that may not
* be true in all systems:
* Length of hardware page (counter granularity): 0x1000 bytes
* Length of base software page (smallest mappable memory area): 0x4000 bytes
*/
void
print_refcounters(char* vaddr, int len)
{
pid_t pid = getpid();
char pfile[256];
int fd;
sn0_refcnt_buf_t* refcnt_buffer;
sn0_refcnt_buf_t* direct_refcnt_buffer;
sn0_refcnt_args_t* refcnt_args;
int npages;
int gen_start;
int numnodes;
int page;
int node;
sprintf(pfile, "/proc/%05d", pid);
if ((fd = open(pfile, O_RDONLY)) < 0) {
fprintf(stderr, "Can't open %s\n", pfile);
exit(1);
}
vaddr = (char *)( (unsigned long)vaddr & ~0xfff );
npages = (len + 0xfff) >> 12;
if ((refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
perror("malloc refcnt_buffer");
exit(1);
}
if ((direct_refcnt_buffer = malloc(sizeof(sn0_refcnt_buf_t) * npages)) == NULL) {
perror("malloc direct_refcnt_buffer");
exit(1);
}