LIBPERFEX(3C) LIBPERFEX(3C)
libperfex, start_counters, read_counters, print_counters, print_costs,
load_costs - A procedural interface to processor event counters
int start_counters( int e0, int e1 );
int read_counters( int e0, long long *c0, int e1, long long *c1);
int print_counters( int e0, long long c0, int e1, long long c1);
int print_costs( int e0, long long c0, int e1, long long c1);
int load_costs(char *CostFileName);
FORTRAN SYNOPSIS
INTEGER*8 c0, c1
INTEGER e0, e1
CHARACTER(*n) CostFileName
INTEGER*4 function start_counters( e0, e1 )
INTEGER*4 function read_counters( e0, c0, e1, c1 )
INTEGER*4 function print_counters( e0, c0, e1, c1 )
INTEGER*4 function print_costs( e0, c0, e1, c1 )
INTEGER*4 function load_costs( CostFileName )
These routines provide simple access to the hardware event counters. The
arguments e0 and e1 are int types specifying which events to count. For
descriptions of the counters themselves, see the perfex(1) or the
r10k_counters(5) man page.
The counts are returned in the long long arguments c0 and c1. The
print_counters routine prints the counts to standard error. Two events
which must be counted on the same hardware counter will cause a
conflicting counters error. The arguments e0 and e1 can be overridden by
setting the environment variables T5_EVENT0 and T5_EVENT1. Calls to
start_counters implicitly zero out the internal software counters before
starting them, while read_counters implicitly stops the counters after
reading them. Thus if you want to accumulate counts over multiple
start/read calls, you must save out the counts and do the accumulation
yourself. Each of these incurs a typical system call overhead which can
amount to a few hundred microseconds per start/stop call pair. The
print_counters procedure is just a formatting convenience and has no
effect on the state of the counters.
TIME ESTIMATES AND COST TABLES
The print_costs routine prints the counts together with approximate time
estimates as described for perfex. The load_costs procedure allows a
cost table to be loaded from a file. Cost ranges reflect both the width
of the cost distribution and uncertainty in the degree of overlap. Costs
for different events are not exclusive due to the overlap of one type of
event with others. By default a table of costs for each event
appropriate to the host system is used. If the file /etc/perfex.costs is
present, it is used instead. If a table is loaded using load_costs, it is
used instead. Partial tables may be used; the remaining entries taking
Page 1
LIBPERFEX(3C) LIBPERFEX(3C)
the appropriate default values. The format of the cost table and how to
dump the default cost table for the current system are described in the
perfex manpage.
Normal completion returns a positive integer, the generation number.
Negative return values signal an error. The underlying operating system
counter interface may produce other errors. Among these are indications
that the counter resource is in use, possibly because of monitoring by
another user on the system (with root privileges) in system mode. The
generation numbers increment on a per-process basis each time the
counters are enabled or stopped. Because the counters may have been used
by a supervisor-privilege process after being enabled by the user with
start_counters, but before being read by read_counters, all correct uses
of the interface must check that the generation number returned by
read_counters matches the generation number from the corresponding
start_counters, for each process. For example, To collect instruction
and data scache miss counts on a program normally executed by
int e0,e1
long long c0,c1;
int gen_start, gen_read
/* ... set e0 and e1 ... */
if((gen_start=start_counters(e0,e1)) < 0) {
perror("start_counters");
}
/* user code */
if((gen_read=read_counters(e0,&c0,e1,&c1))<0) {
perror("read_counters");
}
if(gen_read != gen_start) {
perror("lost counters!, aborting...");
exit(1);
}
/* do something with c0,c1 */
illustrates correct use. Counter measurements are typically not precisely
reproducible for most events. For example, counts of cache misses depend
on the change in the cache population due to other processes when the
target process is not running. Invalidation and intervention counts may
depend on the temporal ordering of memory accesses by different
processors, which is not guaranteed. Issued instruction counts depend on
the R10000's dynamic scheduling of instructions, which in turn can depend
on the pattern of instruction cache misses. Since the cache misses are
affected by the runtime environment as noted above, this count is also
not precisely reproducible. In almost all cases, however, while the
exact count may not be reproducible, the amount of time attributable to
any fluctuations is a tiny fraction of the execution time. Conversely,
Page 2
LIBPERFEX(3C) LIBPERFEX(3C)
any event which accounts for an important fraction of the execution time
will have a count with small relative fluctuations from run to run.
This interface is not reentrant or (pthread) thread-safe. The
start_counters, read_counters, print_counters and print_costs, routines
can be called by individual sproc threads. However there is only
provision for a single global cost table, so load_costs should be called
only in a sequential region, and that cost table must be applied to the
counts from all threads.
/usr/lib/libperfex.so /usr/lib32/libperfex.so /usr/lib64/libperfex.so
These procedures are only available on systems with hardware performance
counters (systems with R10000 or R12000 processors). Use on systems with
mixed processor types will have undefined results in systems with mixed
processor types.
Manipulating the counters with these routines and simultaneously through
perfex or the OS counter interface ioctl procedures can produce an error
if, for example, the counters are enabled twice without being released in
the interim.
perfex(1), r10k_counters(5), abi(5), mips4(5), mips3(5)
PPPPaaaaggggeeee 3333 [ Back ]
|