SPEEDSHOP(1) SPEEDSHOP(1)
SpeedShop, speedshop - an integrated package of performance tools
SpeedShop is the generic name for an integrated package of performance
tools to run performance experiments on executables, and to examine the
results of those experiments. It also supports starting a process, in
such a way as to permit a debugger to attach to it, and it supports
running Purify on executables.
For Purify and for some experiments, instrumentation is necessary; if so,
it will be performed automatically, and the resulting instrumented
executable is run to generate the data.
SUPPORTED EXECUTABLES
SpeedShop works under IRIX 6.2, or later, and supports executables
compiled with the IRIX 6.2 compilers (o32, n32 and 64), or with the
MIPSPro 7.x compilers (n32 and 64). SpeedShop supports C, C++, FORTRAN,
ADA, and asm programs. Programs must be built using shared libraries
(DSOs); nonshared or stripped executables are not supported.
RECORDING EXPERIMENTS
Experiments are recorded using the ssrun(1) command, as follows:
ssrun -<exptype> <a.out-name> <a.out arguments>
where <exptype> is one of the named experiments listed below.
The result of an experiment is one or more files that are named by the
following convention:
<a.out-name>.<exptype>.<code><pid>
where <code> is:
'm' for the master process created by ssrun;
'p' for a process created by a call to sproc();
'f' for a process created by a call to fork();
'e' for a process created by a call to exec();
's' for a process created by a call to system(); and
'fe' for the exec'd process created by calls to fork() and exec()
with environment variable _SPEEDSHOP_TRACE_FORK_TO_EXEC being set to False.
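The naming convention above can be taken apart mechanically. The following
is a minimal sketch and not part of SpeedShop: the function name
split_ss_name is hypothetical, and it assumes that <pid> is purely numeric
and that <exptype> contains no dots.

```shell
# Hypothetical helper (not part of SpeedShop): split an experiment file name
# of the form <a.out-name>.<exptype>.<code><pid> into its fields.
# Assumes <pid> is all digits and <exptype> contains no '.'.
split_ss_name() (
    file=$1
    tail=${file##*.}          # "<code><pid>", e.g. "fe4711"
    rest=${file%.*}           # "<a.out-name>.<exptype>"
    exptype=${rest##*.}
    prog=${rest%.*}           # may itself contain dots, e.g. "a.out"
    pid=${tail##*[!0-9]}      # trailing run of digits
    code=${tail%"$pid"}       # whatever precedes the digits
    case $code in
        m)  how="master process created by ssrun" ;;
        p)  how="created by sproc()" ;;
        f)  how="created by fork()" ;;
        e)  how="created by exec()" ;;
        s)  how="created by system()" ;;
        fe) how="created by fork() then exec()" ;;
        *)  how="unknown code" ;;
    esac
    printf '%s|%s|%s|%s|%s\n' "$prog" "$exptype" "$code" "$pid" "$how"
)

split_ss_name a.out.usertime.m1234
split_ss_name a.out.pcsamp.fe4711
```

Because the fields are peeled off from the right, an executable name that
itself contains a dot (such as a.out) is handled without special cases.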
To start the target process running, and leave it in a state to attach
a debugger, add the -hang flag:
ssrun -hang -<exptype> <a.out-name> <a.out arguments>
To get more detailed information about the run, add the -v
flag:
ssrun -v -<exptype> <a.out-name> <a.out arguments>
-or-
ssrun -v -hang -<exptype> <a.out-name> <a.out arguments>
To run Purify on an executable, use:
ssrun -purify <a.out-name> <a.out arguments>
Purify and performance experiments are mutually exclusive.
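The invocation forms above can be assembled by a small wrapper. This is
only a sketch: ssrun_cmdline is a hypothetical name, it prints the command
line rather than running ssrun, and it knows only the flags named on this
page (-hang, -v, -purify and -<exptype>).

```shell
# Hypothetical dry-run helper: print the ssrun command line for a request
# without executing it, and refuse to mix -purify with an experiment.
# usage: ssrun_cmdline [-hang] [-v] {-purify | -<exptype>} a.out [args...]
ssrun_cmdline() (
    flags="" exp="" purify=""
    while [ $# -gt 0 ]; do
        case $1 in
            -hang|-v) flags="$flags $1" ;;
            -purify)  purify=$1 ;;
            -*)       exp=$1 ;;       # treated as -<exptype>
            *)        break ;;        # first non-flag word is the target
        esac
        shift
    done
    if [ -n "$purify" ] && [ -n "$exp" ]; then
        echo "error: $purify cannot be combined with $exp" >&2
        exit 1
    fi
    if [ -z "$purify$exp" ] || [ $# -eq 0 ]; then
        echo "error: need -purify or -<exptype>, then the target" >&2
        exit 1
    fi
    echo "ssrun$flags $purify$exp $*"
)

ssrun_cmdline -v -hang -usertime ./a.out input.dat
```

The function body runs in a subshell, so the exit calls end only the
helper, not the calling shell.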
ssrun may take additional arguments; see its man page for further
information.
The following experiment types, specified by <exptype> above, are
supported in the current release:
usertime and totaltime
uses statistical callstack profiling, based on process virtual time
(including time spent when the system is running on behalf of the
process) for usertime and wall clock time for totaltime, with a time
sample interval of 30 milliseconds.
Note: o32 executables must explicitly link with -lexc for these
experiments to work. Program execution may show significant slowdown
compared to the original executable. The stack unwind code sometimes
fails to completely unwind the stack; when that happens, caller
attribution cannot be done beyond the point of failure.
[f]pcsamp[x]
uses statistical PC sampling, using 16-bit bins, based on user and
system time, with a sample interval of 10 milliseconds. If the
optional f prefix is specified, a sample interval of 1 millisecond
will be used. If the optional x suffix is specified, a 32-bit bin
size will be used.
ideal
uses basic-block counting, done by instrumenting the executable.
fpe does tracing of all floating-point exceptions.
io does tracing of various I/O system calls.
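The prefix and suffix rules of the [f]pcsamp[x] entry above give four
variants. As a sketch (pcsamp_params is a hypothetical name, and the
values are simply those stated in the entry), they can be tabulated as:

```shell
# Sketch: decode an [f]pcsamp[x] experiment name into the sample interval
# and bin width stated in the entry above.
pcsamp_params() (
    case $1 in
        pcsamp)   ms=10 bits=16 ;;
        fpcsamp)  ms=1  bits=16 ;;   # f prefix: 1 ms interval
        pcsampx)  ms=10 bits=32 ;;   # x suffix: 32-bit bins
        fpcsampx) ms=1  bits=32 ;;
        *) echo "not a pcsamp experiment: $1" >&2; exit 1 ;;
    esac
    echo "interval=${ms}ms bins=${bits}-bit"
)

pcsamp_params pcsamp
pcsamp_params fpcsampx
```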
On machines with hardware performance counters (R10000 machines), the
following additional types are supported:
[f]gi_hwc
uses statistical PC sampling, based on overflows of the
graduated-instruction counter, at an overflow interval of 32771. If the
optional f prefix is used, the overflow interval will be 6553.
[f]cy_hwc
uses statistical PC sampling, based on overflows of the cycle
counter, at an overflow interval of 16411. If the optional f prefix
is used, the overflow interval will be 3779.
[f]ic_hwc
uses statistical PC sampling, based on overflows of the primary
instruction-cache miss counter, at an overflow interval of 2053. If
the optional f prefix is used, the overflow interval will be 419.
[f]isc_hwc
uses statistical PC sampling, based on overflows of the secondary
instruction-cache miss counter, at an overflow interval of 131. If
the optional f prefix is used, the overflow interval will be 29.
[f]dc_hwc
uses statistical PC sampling, based on overflows of the primary
data-cache miss counter, at an overflow interval of 2053. If the
optional f prefix is used, the overflow interval will be 419.
[f]dsc_hwc
uses statistical PC sampling, based on overflows of the secondary
data-cache miss counter, at an overflow interval of 131. If the
optional f prefix is used, the overflow interval will be 29.
[f]tlb_hwc
uses statistical PC sampling, based on overflows of the TLB miss
counter, at an overflow interval of 257. If the optional f prefix
is used, the overflow interval will be 53.
[f]gfp_hwc
uses statistical PC sampling, based on overflows of the graduated
floating-point instruction counter, at an overflow interval of
32771. If the optional f prefix is used, the overflow interval will
be 6553.
[f]fsc_hwc
uses statistical PC sampling, based on overflows of the failed store
conditionals counter, at an overflow interval of 2003. If the
optional f prefix is used, the overflow interval will be 401.
prof_hwc
uses statistical PC sampling, based on overflows of the counter
specified by the environment variable _SPEEDSHOP_HWC_COUNTER_NUMBER,
at an interval given by the environment variable
_SPEEDSHOP_HWC_COUNTER_OVERFLOW. Note that these environment
variables can not be used to override the counter number or interval
for the other defined experiments. They are examined only when the
prof_hwc experiment is specified. The default counter is the
primary instruction-cache miss counter and the default overflow
interval is 2053.
gi_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the graduated-instruction counter, at an
overflow interval of 1000003.
cy_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the cycle counter, at an overflow interval of
10000019.
ic_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the primary instruction-cache-miss counter, at
an overflow interval of 8009.
isc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the secondary instruction-cache-miss counter,
at an overflow interval of 2003.
dc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the primary data-cache-miss counter, at an
overflow interval of 8009.
dsc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the secondary data-cache-miss counter, at an
overflow interval of 2003.
tlb_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the TLB miss counter, at an overflow interval
of 2521.
gfp_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the graduated floating-point instruction
counter, at an overflow interval of 10007.
fsc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the failed store conditionals counter, at an
overflow interval of 5003.
prof_hwctime
profiles the counter specified by the environment variable
_SPEEDSHOP_HWC_COUNTER_PROF_NUMBER using statistical call-stack
sampling, based on overflows of the counter specified by the
environment variable _SPEEDSHOP_HWC_COUNTER_NUMBER, at an interval
given by the environment variable _SPEEDSHOP_HWC_COUNTER_OVERFLOW.
Note that these environment variables can not be used to override
the counter numbers or interval for the other defined experiments.
They are examined only when the prof_hwctime experiment is
specified. The default overflow and profiling counter is the cycle
counter and the default overflow interval is 10000019.
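The default overflow intervals listed above can be collected into one
lookup. This is a sketch only: hwc_overflow is a hypothetical name, and the
numbers are copied from the entries above. It covers the fixed experiments,
not prof_hwc or prof_hwctime, whose intervals come from the environment.

```shell
# Sketch: default overflow interval for a named hardware-counter experiment.
# Note that fsc_hwc is the ordinary failed-store-conditional experiment;
# its fast variant is ffsc_hwc.
hwc_overflow() (
    case $1 in
        gi_hwc|gfp_hwc)          echo 32771 ;;
        fgi_hwc|fgfp_hwc)        echo 6553 ;;
        cy_hwc)                  echo 16411 ;;
        fcy_hwc)                 echo 3779 ;;
        ic_hwc|dc_hwc)           echo 2053 ;;
        fic_hwc|fdc_hwc)         echo 419 ;;
        isc_hwc|dsc_hwc)         echo 131 ;;
        fisc_hwc|fdsc_hwc)       echo 29 ;;
        tlb_hwc)                 echo 257 ;;
        ftlb_hwc)                echo 53 ;;
        fsc_hwc)                 echo 2003 ;;
        ffsc_hwc)                echo 401 ;;
        gi_hwctime)              echo 1000003 ;;
        cy_hwctime)              echo 10000019 ;;
        ic_hwctime|dc_hwctime)   echo 8009 ;;
        isc_hwctime|dsc_hwctime) echo 2003 ;;
        tlb_hwctime)             echo 2521 ;;
        gfp_hwctime)             echo 10007 ;;
        fsc_hwctime)             echo 5003 ;;
        *) echo "no fixed interval for: $1" >&2; exit 1 ;;
    esac
)

hwc_overflow fcy_hwc
hwc_overflow tlb_hwctime
```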
One additional experiment type may be recorded, but no report generation
for it is yet supported. It is:
heap does tracing of all malloc, free, and related calls, and also supports
various options for debugging heap usage.
Custom experiments will be supported in future releases.
Report generation is done through the prof(1) command:
prof <output file> . . . <output file>
It will add the data from all of the output files, and produce a listing
which depends on the particular experiment type. For all experiments, it
will produce a list of functions, annotated with the appropriate metric.
For [f]pcsamp[x], and the various *_hwc experiments, the function list is
annotated with the exclusive metric; for the PC sampling experiments,
the metric is exclusive time, for the various hardware counter profiling
experiments the metric is exclusive counts.
For ideal experiments, the function list is annotated with a cycle count
and percentage, a cumulative percentage for that function and all others
above it in the list, an estimate of idealized time, an instruction
execution count, and a call count. If the -b[utterfly] flag is added, a
list of callers and callees of each function is also produced.
For usertime and totaltime and the various *_hwctime experiments, the
function list is annotated with percentage of time or counts for the
function, the time in that function, and the time or counts in that
function and its descendants, and a count of the number of callstacks
containing that function. If the -b[utterfly] flag is added, a list of
callers and callees of each function is also produced.
For fpe experiments, the function list is annotated with the percentage
of FPEs in that function, and counts for the function and its
descendants. If the -b[utterfly] flag is added, a list of callers and
callees of each function is also produced.
For io experiments, the function list is annotated with the percentage of
IO calls in that function, and counts for the function and its
descendants. If the -b[utterfly] flag is added, a list of callers and
callees of each function is also produced.
There are many additional options to prof; see the prof(1) man page for
further details.
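Since prof adds the data from every output file it is given, it is often
convenient to collect all files of one experiment first. The sketch below
assumes the file naming convention described under RECORDING EXPERIMENTS;
prof_cmdline is a hypothetical name, and the function only prints the
command line instead of running prof.

```shell
# Hypothetical helper: print a prof command line covering every output file
# of one experiment found in a directory (default: current directory).
# usage: prof_cmdline <a.out-name> <exptype> [dir]
prof_cmdline() (
    prog=$1; exptype=$2; dir=${3:-.}
    files=""
    # <code> is one or two letters and <pid> ends in a digit.
    for f in "$dir/$prog.$exptype".[a-z]*[0-9]; do
        [ -f "$f" ] && files="$files $f"
    done
    if [ -z "$files" ]; then
        echo "no $exptype data for $prog in $dir" >&2
        exit 1
    fi
    echo "prof$files"
)

# Example (prints nothing useful unless such files exist):
#   prof_cmdline a.out usertime /tmp/experiments
```

File names containing white space are not handled; for a printed command
line that limitation seemed acceptable.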
CALIPER SAMPLES
In the current release, caliper samples may be recorded, and the
-calipers option to prof will allow you to see the data for any
caliper setting.
Caliper samples are supported in three different ways. First, the user
can explicitly link with the SpeedShop runtime, and call its API routine
to record a caliper sample; second, the user can define a signal to be
used to record a caliper sample, by specifying the environment variable
_SPEEDSHOP_CALIPER_POINT_SIG and sending the target the specified signal;
third, a caliper-sample trap may be set in either dbx or the WorkShop
debugger. In the current debuggers, this is done by planting a stop
trap (breakpoint), and, when the process stops, evaluating the
expression:
ssrt_caliper_point(0, 0)
The evaluation of the expression always returns zero, but a side effect
of the evaluation is the recording of the appropriate data. After
evaluation, process execution may be resumed. See the ssapi(3) man page
for further details.
USER ENVIRONMENT VARIABLE CONTROLS
Various environment variables are normally used to control the operation
of SpeedShop. They are:
_SPEEDSHOP_VERBOSE
causes a log of each program's operation to be written to stderr.
If it is set to an empty string, only major events are logged; if it
is set to a non-empty string, more detailed events are logged.
_SPEEDSHOP_SILENT
if set, suppresses all output, other than fatal error messages from
SpeedShop. If both _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT are
set, _SPEEDSHOP_SILENT wins.
_SPEEDSHOP_CALIPER_POINT_SIG <signal-number>
if specified, gives a signal number to be used for recording a
caliper-point in the experiment.
_SPEEDSHOP_POLLPOINT_CALIPER_POINT <timer_type>,<timer_interval>
if specified, defines the timer type and the timer interval (in
secs) for pollpoint caliper points.
_SPEEDSHOP_OUTPUT_DIRECTORY
if specified, the output data files and the instrumented binaries
will be put in the named directory.
_SPEEDSHOP_OUTPUT_FD
if specified, gives the number of the file descriptor to be used for
writing the output file. Note: this option is not supported in the
current release.
_SPEEDSHOP_REUSE_FILE_DESCRIPTORS
if set, opens and closes the file descriptors for the output files
every time performance data is to be written. If the target program
is using chdir(), then the _SPEEDSHOP_OUTPUT_DIRECTORY environment
variable should also be set to the full pathname of the directory
where the output files are to be put.
_SPEEDSHOP_OUTPUT_FILENAME
if specified, the given name will be used for the output file; if
_SPEEDSHOP_OUTPUT_DIRECTORY is also specified, it will be prepended
to the name.
_SPEEDSHOP_HWC_COUNTER_NUMBER
specifies the overflow counter to be used for prof_hwc or
prof_hwctime experiments. Counters are numbered between 0 and 31,
and are described in the MIPS R10000 Microprocessor User's Manual,
Chapter 14. Counter 0 counters are numbered 0-15, and counter 1
counters are numbered 16-31.
_SPEEDSHOP_HWC_COUNTER_OVERFLOW
specifies the overflow value for the counter to be used in prof_hwc
or prof_hwctime experiments. The value chosen may be any number
greater than 0. Some choices may produce data that is not
statistically random, but rather reflects a correlation between the
overflow interval and a cyclic behavior in the application. Users
may want to do two or more runs with different overflow values.
_SPEEDSHOP_HWC_COUNTER_PROF_NUMBER
specifies the profiling counter to be used for prof_hwctime
experiments. Counters are numbered between 0 and 31, and are
described in the MIPS R10000 Microprocessor User's Manual, Chapter
14. Counter 0 counters are numbered 0-15, and counter 1 counters
are numbered 16-31.
_SPEEDSHOP_HWC_COUNTER_SIGNAL_NUMBER
specifies the profiling signal to be used for prof_hwctime
experiments.
_SPEEDSHOP_OUTPUT_NOCOMPRESS
if set, disables the compression of performance data.
Other variables will be documented in future releases.
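The counter numbering used by _SPEEDSHOP_HWC_COUNTER_NUMBER and
_SPEEDSHOP_HWC_COUNTER_PROF_NUMBER (0-15 on counter 0, 16-31 on counter 1)
amounts to a simple division. The sketch below makes that explicit;
hwc_split is a hypothetical name, and reading the remainder as an "event
index" within the counter is an interpretation, not wording from this page.

```shell
# Sketch: split a counter number (0-31) into the hardware counter it
# belongs to (0 or 1) and its position within that counter (0-15).
# Expects a plain decimal number without leading zeros.
hwc_split() (
    n=$1
    case $n in
        ''|*[!0-9]*) echo "not a number: $n" >&2; exit 1 ;;
    esac
    if [ "$n" -gt 31 ]; then
        echo "out of range (0-31): $n" >&2
        exit 1
    fi
    echo "counter=$((n / 16)) event=$((n % 16))"
)

hwc_split 14
hwc_split 26
```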
PROCESS TRACKING ENVIRONMENT VARIABLE CONTROLS
Various environment variables may be used for controlling the treatment
of processes spawned from the original target. They are:
_SPEEDSHOP_TRACE_FORK {True|False}
if True, specifies that processes spawned by calls to fork() will be
monitored, if they do not call exec(). If they do call exec(), and
_SPEEDSHOP_TRACE_FORK_TO_EXEC is not set to True, the data covering
the time between the fork() and the exec() will be discarded. It is
True by default. Note: in the current release, data will be
recorded independent of whether the process calls exec() or not.
_SPEEDSHOP_TRACE_FORK_TO_EXEC {True|False}
if True, specifies that processes spawned by calls to fork() will be
monitored, even if they also call exec(). It is False by default.
_SPEEDSHOP_TRACE_EXEC {True|False}
if True, specifies that processes spawned by calls to any of the
various flavors of exec() will be monitored. It is True by default.
_SPEEDSHOP_TRACE_SPROC {True|False}
if True, specifies that processes spawned by calls to sproc() will be
monitored. It is True by default.
_SPEEDSHOP_TRACE_SYSTEM {True|False}
if True, specifies that processes spawned by calls to system() will be
monitored. It is False by default.
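To see which kinds of spawned processes would be monitored in a given
environment, the documented defaults can be applied to whatever is
currently set. This is a sketch: ss_trace_settings is a hypothetical name,
and treating an empty value like an unset one is an assumption about
behavior that this page does not specify.

```shell
# Sketch: print the effective process-tracking settings, falling back to
# the defaults documented above when a variable is unset (or empty).
ss_trace_settings() (
    echo "fork=${_SPEEDSHOP_TRACE_FORK:-True}"
    echo "fork_to_exec=${_SPEEDSHOP_TRACE_FORK_TO_EXEC:-False}"
    echo "exec=${_SPEEDSHOP_TRACE_EXEC:-True}"
    echo "sproc=${_SPEEDSHOP_TRACE_SPROC:-True}"
    echo "system=${_SPEEDSHOP_TRACE_SYSTEM:-False}"
)

ss_trace_settings
```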
EXPERT-MODE ENVIRONMENT VARIABLE CONTROLS
Various additional environment variables may be used for debugging and
finer control of the operation of SpeedShop. They are:
_SPEEDSHOP_SAMPLING_MODE
for PC-sampling and hardware-counter profiling, if set to 1, will
generate data for the base executable only. If it is not set, or
set to anything other than 1, data is generated for the executable
and all DSOs it uses.
_SPEEDSHOP_INIT_DEFERRED_SIG <signal-number>
If specified, initialization of the experiment will not be performed
when the target process starts, but rather will be delayed until the
specified signal is sent to the process. A handler for the given
signal will be installed when the process starts, and it is the
user's responsibility to ensure that it is not overridden by the
target code. If the process terminates before the signal is
received, no data will be recorded.
_SPEEDSHOP_SHUTDOWN_SIG <signal-number>
If specified, termination of the experiment will not be performed
when the target process exits, but rather will happen when the
specified signal is sent to the process. A handler for the given
signal will be installed when the process starts, and it is the
user's responsibility to ensure that it is not overridden by the
target code. If the process terminates before the signal is
received, data is recorded normally.
_SPEEDSHOP_EXPERIMENT_TYPE
passes the name of the experiment to the runtime. It is normally
set by ssrun(1), but may be overwritten.
_SPEEDSHOP_MARCHING_ORDERS
passes the marching orders of the experiment to the runtime. It is
normally set by ssrun(1) from the experiment type, but may be
overwritten.
_SPEEDSHOP_SBRK_BUFFER_LENGTH
defines the segment grow size for the internal malloc arena used.
This arena is completely separate from the user's arena, and it
usually grows in default segments of size 0x100000.
_SPEEDSHOP_SBRK_BUFFER_ADDR
defines the preferred starting address to be used for the internal
malloc arena. This option has to be used with extreme care, since it
might result in overlapping memory regions.
_SPEEDSHOP_FILE_BUFFER_LENGTH
defines the size of the buffer used for writing the experiment
files. The default length is 64KB. The buffer is only used for
writing many small records to the file (as in tracing experiments);
large records are written directly, to avoid the buffering overhead.
_SPEEDSHOP_DEBUG_NO_SIG_TRAPS
if set, disables the normal setting of signal handlers for all fatal
and exit signals.
_SPEEDSHOP_DEBUG_NO_STACK_UNWIND
if set, suppresses the stack unwind as done in usertime, totaltime,
or other callstack-based experiments. The option is used as a
workaround for various unwind bugs in libexc.
_SPEEDSHOP_RLD
defines the full pathname to rld to be used and enables rld
profiling (for pcsamp and hwc experiments only).
_SPEEDSHOP_INSTR_ARGS
defines additional instrumentation arguments.
Instrumentation is done with the pixie(1) command, invoked automatically
by ssrun(1), and, if necessary for DSOs that are opened during a run, by
the runtime library. Users normally would not invoke pixie(1) directly.
In the current release, instrumented executables and DSOs appear in the
current working directory. In a future release, the DSOs will be cached.
SPEEDSHOP API ROUTINES
The SpeedShop API routines are defined in the include file
"SpeedShop/api.h", installed in /usr/include. It defines three entry
points, described in the SpeedShop API man page, ssapi(3).
SPEEDSHOP CUSTOM DATA CAPTURE ROUTINES
The SpeedShop facility for users to add custom data capture routines is
not available in the current release.
MISCELLANEOUS UTILITY PROGRAMS
Several utility routines are provided, in addition to the main
functionality in SpeedShop. They are:
sscord and ssorder
are used to generate cord feedback files from recorded
data. sswsextr is a script to produce the working-set files used
for cord computations. See their respective man pages for more
information.
ssusage
is a variant of time(1) that prints more information about the
resource usage of a program. See ssusage(1) for more information.
ssdump
is a program which produces a formatted dump of a SpeedShop
experiment.
squeeze
is a program which allocates and locks down memory, making the
system behave as if it had less physical memory than it really does.
See squeeze(1) for more information.
thrash
is a program that allocates memory, and then touches all of the
pages, in order to force other pages out of the system's physical
memory. See thrash(1) for more information.
fbdump
is a program that dumps out the contents of the compiler feedback
files produced by the -feedback option to prof(1). See fbdump(1)
and prof(1) for more information.
Depending on the revision of the R10000 CPUs there is a difference in the
interpretation of counter number 14 (``Virtual coherency condition'' for
parts before revision 3.1 or ``ALU/FPU completion cycles'' for parts at
revision 3.1 or later). There are also some subtle differences in the
semantics of some of the counters.
In systems with a homogeneous deployment of CPUs at the same revision,
SpeedShop will adjust the reported information accordingly.
For systems with a mixed deployment of CPU revisions including some
before 3.1 and some at or after 3.1, the interpretation of counter 14 is
undefined, and there may be some slight inaccuracies due to aggregation
of counters with different semantics across all CPUs.
Identification of the revisions for all CPUs can be made using the -v
option to hinv(1).
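The revision rule for counter 14 described above can be written as a small
comparison. This is a sketch under stated assumptions: counter14_meaning is
a hypothetical name, and the revision is assumed to be supplied as a
"major.minor" string; the exact format printed by hinv(1) is not described
on this page.

```shell
# Sketch: which event counter 14 represents for a given R10000 revision.
# The revision is assumed to look like "2.6" or "3.1" (major.minor).
counter14_meaning() (
    rev=$1
    major=${rev%%.*}
    minor=${rev#*.}
    [ "$minor" = "$rev" ] && minor=0      # no '.' present: treat as X.0
    if [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 1 ]; }; then
        echo "Virtual coherency condition"
    else
        echo "ALU/FPU completion cycles"
    fi
)

counter14_meaning 2.6
counter14_meaning 3.1
```

On a system that mixes revisions on both sides of 3.1, no single answer
applies, as the text above explains.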
SEE ALSO
ssrun(1), ssdump(1), prof(1), pixie(1), fbdump(1), ssusage(1),
squeeze(1), thrash(1), malloc_ss(3), fpe_ss(3), io_ss(3), ssapi(3),
perfex(1), r10k_counters(5), sscord(1), ssorder(1), sswsextr(1)
SPEEDSHOP(1) SPEEDSHOP(1)
SpeedShop, speedshop - an integrated package of performance tools
SpeedShop is the generic name for an integrated package of performance
tools to run performance experiments on executables, and to examine the
results of those experiments. It also supports starting a process, in
such a way as to permit a debugger to attach to it, and it supports
running Purify on executables.
For Purify and for some experiments instrumentation is necessary; if so,
it will be performed automatically, and the resulting instrumented
executable run to generate the data.
SUPPORTED EXECUTABLES [Toc] [Back] SpeedShop works under IRIX 6.2, or later, and supports executables
compiled with the IRIX 6.2 compilers (o32, n32 and 64), or with the
MIPSPro 7.x compilers (n32 and 64). SpeedShop supports C, C++, FORTRAN,
ADA, and asm programs. Programs must be built using shared libraries
(DSOs); nonshared or stripped executables are not supported.
RECORDING EXPERIMENTS [Toc] [Back] Experiments are recorded using the ssrun(1) command, as follows:
ssrun -<exptype> <a.out-name> <a.out arguments>
where <exptype> is one of the named experiments listed below.
The result of an experiment is one or more files that are named by the
following convention:
<a.out-name>.<exptype>.<code><pid>
where <code> is:
'm' for the master process created by ssrun;
'p' for a process created by a call to sproc();
'f' for a process created by a call to fork();
'e' for a process created by a call to exec();
's' for a process created by a call to system(); and
'fe' for the exec'd process created by calls to fork() and exec()
with environment variable _SPEEDSHOP_TRACE_FORK_TO_EXEC being set to False.
To start the target process running, and leave it in a state to attach
a debugger, add the -hang flag:
ssrun -hang -<exptype> <a.out-name> <a.out arguments>
To get more detailed information about the run, add the -v
flag:
ssrun -v -<exptype> <a.out-name> <a.out arguments>
-orssrun
-v -hang -<exptype> <a.out-name> <a.out arguments>
To run Purify on an executable, use:
ssrun -purify <a.out-name> <a.out arguments>
Purify and performance experiments are mutually exclusive.
Page 1
SPEEDSHOP(1) SPEEDSHOP(1)
ssrun may take additional arguments; see its man page for further
information.
The following experiment types, specified by <exptype> above, are
supported in the current release:
usertime and totaltime
uses statistical callstack profiling, based on process virtual time
(including time spent when the system is running on behalf of the
process) for usertime and wall clock time for totaltime, with a time
sample interval of 30 milliseconds.
Note: o32 executables must explicitly link with -lexc for these
experiments to work; program execution may show significant slowdown
compared to the original executable; the stack unwind code sometimes
fails to completely unwind the stack; consequently, caller
attribution can not be done beyond the point of failure.
[f]pcsamp[x]
uses statistical PC sampling, using 16-bit bins, based on user and
system time, with a sample interval of 10 milliseconds. If the
optional f prefix is specified, a sample interval of 1 millisecond
will be used. If the optional x suffix is specified, a 32-bit bin
size will be used.
ideal
uses basic-block counting, done by instrumenting the executable.
fpe does tracing of all floating-point exceptions.
io does tracing of various I/O system calls.
On machines with hardware performance counters, (R10000 machines), the
following additional types are supported:
[f]gi_hwc
uses statistical PC sampling, based on overflows of the graduatedinstruction
counter, at an overflow interval of 32771. If the
optional f prefix is used, the overflow interval will be 6553.
[f]cy_hwc
uses statistical PC sampling, based on overflows of the cycle
counter, at an overflow interval of 16411. If the optional f prefix
is used, the overflow interval will be 3779.
[f]ic_hwc
uses statistical PC sampling, based on overflows of the primary
instruction-cache miss counter, at an overflow interval of 2053. If
the optional f prefix is used, the overflow interval will be 419.
Page 2
SPEEDSHOP(1) SPEEDSHOP(1)
[f]isc_hwc
uses statistical PC sampling, based on overflows of the secondary
instruction-cache miss counter, at an overflow interval of 131. If
the optional f prefix is used, the overflow interval will be 29.
[f]dc_hwc
uses statistical PC sampling, based on overflows of the primary
data-cache miss counter, at an overflow interval of 2053. If the
optional f prefix is used, the overflow interval will be 419.
[f]dsc_hwc
uses statistical PC sampling, based on overflows of the secondary
data-cache miss counter, at an overflow interval of 131. If the
optional f prefix is used, the overflow interval will be 29.
[f]tlb_hwc
uses statistical PC sampling, based on overflows of the TLB miss
counter, at an overflow interval of 257. If the optional f prefix
is used, the overflow interval will be 53.
[f]gfp_hwc
uses statistical PC sampling, based on overflows of the graduated
floating-point instruction counter, at an overflow interval of
32771. If the optional f prefix is used, the overflow interval will
be 6553.
[f]fsc_hwc
uses statistical PC sampling, based on overflows of the failed store
conditionals counter, at an overflow interval of 2003. If the
optional f prefix is used, the overflow interval will be 401.
prof_hwc
uses statistical PC sampling, based on overflows of the counter
specified by the environment variable _SPEEDSHOP_HWC_COUNTER_NUMBER,
at an interval given by the environment variable
_SPEEDSHOP_HWC_COUNTER_OVERFLOW. Note that these environment
variables can not be used to override the counter number or interval
for the other defined experiments. They are examined only when the
prof_hwc experiment is specified. The default counter is the
primary instruction-cache miss counter and the default overflow
interval is 2053.
gi_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the graduated-instruction counter, at an
overflow interval of 1000003.
cy_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the cycle counter, at an overflow interval of
10000019.
Page 3
SPEEDSHOP(1) SPEEDSHOP(1)
ic_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the primary instruction-cache-miss counter, at
an overflow interval of 8009.
isc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the secondary instruction-cache-miss counter,
at an overflow interval of 2003.
dc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the primary data-cache-miss counter, at an
overflow interval of 8009.
dsc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the secondary data-cache-miss counter, at an
overflow interval of 2003.
tlb_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the TLB miss counter, at an overflow interval
of 2521.
gfp_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the graduated floating-point instruction
counter, at an overflow interval of 10007.
fsc_hwctime
profiles the cycle counter using statistical call-stack sampling,
based on overflows of the failed store conditionals counter, at an
overflow interval of 5003.
prof_hwctime
profiles the counter specified by the environment variable
_SPEEDSHOP_HWC_COUNTER_PROF_NUMBER using statistical call-stack
sampling, based on overflows of the counter specified by the
environment variable _SPEEDSHOP_HWC_COUNTER_NUMBER, at an interval
given by the environment variable _SPEEDSHOP_HWC_COUNTER_OVERFLOW.
Note that these environment variables can not be used to override
the counter numbers or interval for the other defined experiments.
They are examined only when the prof_hwctime experiment is
specified. The default overflow and profling counter is the cycle
counter and the default overflow interval is 10000019.
One additional experiment type may be recorded, but no report generation
for it is yet supported. It is:
Page 4
SPEEDSHOP(1) SPEEDSHOP(1)
heap does tracing of all malloc and free, etc. calls, and also supports
various options for debugging heap usage.
Custom experiments will be supported in future releases.
Report generation is done through the prof(1) command:
prof <output file> . . . <output file>
It will add the data from all of the output files, and produce a listing
which depends on the particular experiment type. For all experiments, it
will produce a list of functions, annotated with the appropriate metric.
For [f]pcsamp[x], and the various *_hwc experiments, the function list is
annotated with the exclusive metric; for the PC sampling experiments,
the metric is exclusive time, for the various hardware counter profiling
experiments the metric is exclusive counts.
For ideal experiments, the function list is annotated with a cycle count
and percentage, a cumulative percentage for that function and all others
above it in the list, an estimated of idealized time, an instruction
execution count, and a call count. If the -b[utterfly] flag is added, a
list of callers and callees of each function is also produced.
For usertime and totaltime and the various *_hwctime experiments, the
function list is annotated with percentage of time or counts for the
function, the time in that function, and the time or counts in that
function and its descendants, and a count of the number of callstacks
containing that function. If the -b[utterfly] flag is added, a list of
callers and callees of each function is also produced.
For fpe experiments, the function list is annotated with the percentage
of FPEs in that function, and counts for the function and its
descendants. If the -b[utterfly] flag is added, a list of callers and
callees of each function is also produced.
For io experiments, the function list is annotated with the percentage of
IO calls in that function, and counts for the function and its
descendants. If the -b[utterfly] flag is added, a list of callers and
callees of each function is also produced.
There are many additional options to prof; see the prof(1) man page for
further details.
CALIPER SAMPLES
In the current releases, caliper samples may be recorded, and the
-calipers option to prof, will allow you to see the data for any
caliper-setting.
Caliper samples are supported in three different ways. First, the user
can explicitly link with the SpeedShop runtime, and call its API routine
to record a caliper sample; second, the user can define a signal to be
used to record a caliper sample, by specifying the environment variable
_SPEEDSHOP_CALIPER_POINT_SIG and send the target the specified signal;
third, a caliper-sample trap may be set in either dbx, or the WorkShop
debugger. In the current debuggers, this is done by planting a stop
trap (breakpoint), and, when the process stops, evaluating the
expression:
ssrt_caliper_point(0, 0)
The evaluation of the expression always returns zero, but a side effect
of the evaluation is the recording of the appropriate data. After
evaluation, process execution may be resumed. See the ssapi(3) man page
for further details.
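The signal-based method can be driven from a small script. The following
Python sketch is illustrative only: the helper names caliper_env and
record_caliper_point are hypothetical, and it assumes that the variable
takes the signal number in decimal. It builds an environment for the
target and sends the chosen signal to a running process.

```python
import os
import signal


def caliper_env(sig, base_env=None):
    """Return a copy of the environment with the caliper-point signal set.

    Assumption: the variable's value is the signal number in decimal.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_SPEEDSHOP_CALIPER_POINT_SIG"] = str(int(sig))
    return env


def record_caliper_point(pid, sig):
    """Ask the target process to record a caliper sample by signalling it."""
    os.kill(pid, int(sig))


# Example: SIGUSR1 as the caliper signal (an arbitrary choice).
print(caliper_env(signal.SIGUSR1, base_env={}))
```

The returned environment would be handed to whatever launches ssrun(1),
for instance through the env argument of subprocess.Popen, and
record_caliper_point would then be called with the target's process ID
each time a sample is wanted.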
USER ENVIRONMENT VARIABLE CONTROLS
Various environment variables are normally used to control the operation
of SpeedShop. They are:
_SPEEDSHOP_VERBOSE
causes a log of each program's operation to be written to stderr.
If it is set to an empty string, only major events are logged; if it
is set to a non-empty string, more detailed events are logged.
_SPEEDSHOP_SILENT
if set, suppresses all output, other than fatal error messages from
SpeedShop. If both _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT are
set, _SPEEDSHOP_SILENT wins.
_SPEEDSHOP_CALIPER_POINT_SIG <signal-number>
if specified, gives a signal number to be used for recording a
caliper-point in the experiment.
_SPEEDSHOP_POLLPOINT_CALIPER_POINT <timer_type>,<timer_interval>
if specified, defines the timer type and the timer interval (in
secs) for pollpoint caliper points.
_SPEEDSHOP_OUTPUT_DIRECTORY
if specified, the output data files and the instrumented binaries
will be put in the named directory.
_SPEEDSHOP_OUTPUT_FD
if specified, gives the number of the file descriptor to be used for
writing the output file. Note: this option is not supported in the
current release.
_SPEEDSHOP_REUSE_FILE_DESCRIPTORS
if set, opens and closes the file descriptors for the output files
every time performance data is to be written. If the target program
is using chdir(), then the _SPEEDSHOP_OUTPUT_DIRECTORY environment
variable should also be set to the full pathname of the directory
where the output files are to be put.
_SPEEDSHOP_OUTPUT_FILENAME
if specified, the given name will be used for the output file; if
_SPEEDSHOP_OUTPUT_DIRECTORY is also specified, it will be prepended
to the name.
_SPEEDSHOP_HWC_COUNTER_NUMBER
specifies the overflow counter to be used for prof_hwc or
prof_hwctime experiments. Counters are numbered between 0 and 31,
and are described in the MIPS R10000 Microprocessor User's Manual,
Chapter 14. Counter 0 counters are numbered 0-15, and counter 1
counters are numbered 16-31.
_SPEEDSHOP_HWC_COUNTER_OVERFLOW
specifies the overflow value for the counter to be used in prof_hwc
or prof_hwctime experiments. The value chosen may be any number
greater than 0. Some choices may produce data that is not
statistically random, but rather reflects a correlation between the
overflow interval and a cyclic behavior in the application. Users
may want to do two or more runs with different overflow values.
_SPEEDSHOP_HWC_COUNTER_PROF_NUMBER
specifies the profiling counter to be used for prof_hwctime
experiments. Counters are numbered between 0 and 31, and are
described in the MIPS R10000 Microprocessor User's Manual, Chapter
14. Counter 0 counters are numbered 0-15, and counter 1 counters
are numbered 16-31.
_SPEEDSHOP_HWC_COUNTER_SIGNAL_NUMBER
specifies the profiling signal to be used for prof_hwctime
experiments.
_SPEEDSHOP_OUTPUT_NOCOMPRESS
if set, disables the compression of performance data.
Other variables will be documented in future releases.
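The interaction between _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT can be
summarized in a few lines. The Python sketch below is not SpeedShop code;
the function name log_mode and its return labels are hypothetical, and it
assumes that "set" means the variable is present in the environment, even
with an empty value.

```python
def log_mode(env):
    """Classify the logging behaviour implied by the two variables.

    Returns one of "silent", "detailed", "major", or "default".
    Assumption: a variable counts as set if it is present at all.
    """
    if "_SPEEDSHOP_SILENT" in env:
        # _SPEEDSHOP_SILENT wins when both variables are set.
        return "silent"
    if "_SPEEDSHOP_VERBOSE" in env:
        # Empty string: major events only; non-empty: more detailed events.
        return "detailed" if env["_SPEEDSHOP_VERBOSE"] else "major"
    return "default"


print(log_mode({"_SPEEDSHOP_VERBOSE": "", "_SPEEDSHOP_SILENT": "1"}))
```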
PROCESS TRACKING ENVIRONMENT VARIABLE CONTROLS
Various environment variables may be used for controlling the treatment
of processes spawned from the original target. They are:
_SPEEDSHOP_TRACE_FORK {True|False}
if True, specifies that processes spawned by calls to fork() will be
monitored, if they do not call exec(). If they do call exec(), and
_SPEEDSHOP_TRACE_FORK_TO_EXEC is not set to True, the data covering
the time between the fork() and the exec() will be discarded. It is
True by default. Note: in the current release, data will be
recorded independent of whether the process calls exec() or not.
_SPEEDSHOP_TRACE_FORK_TO_EXEC {True|False}
if True, specifies that processes spawned by calls to fork() will be
monitored, even if they also call exec(). It is False by default.
_SPEEDSHOP_TRACE_EXEC {True|False}
if True, specifies that processes spawned by calls to any of the
various flavors of exec() will be monitored. It is True by default.
_SPEEDSHOP_TRACE_SPROC {True|False}
if True, specifies that processes spawned by calls to sproc() will be
monitored. It is True by default.
_SPEEDSHOP_TRACE_SYSTEM {True|False}
if True, specifies that processes spawned by calls to system() will be
monitored. It is False by default.
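The defaults listed above can be collected into a single table. The
following Python sketch is only an illustration: the names TRACE_DEFAULTS
and trace_settings are hypothetical, and it assumes that only the exact
strings "True" and "False" are recognized, with any other value leaving
the default in place.

```python
# Defaults as stated in the list above.
TRACE_DEFAULTS = {
    "_SPEEDSHOP_TRACE_FORK": True,
    "_SPEEDSHOP_TRACE_FORK_TO_EXEC": False,
    "_SPEEDSHOP_TRACE_EXEC": True,
    "_SPEEDSHOP_TRACE_SPROC": True,
    "_SPEEDSHOP_TRACE_SYSTEM": False,
}


def trace_settings(env):
    """Overlay True/False values found in env on the documented defaults.

    Assumption: values other than "True" or "False" are ignored.
    """
    settings = dict(TRACE_DEFAULTS)
    for name in TRACE_DEFAULTS:
        value = env.get(name)
        if value in ("True", "False"):
            settings[name] = (value == "True")
    return settings


# Example: also monitor processes created by system().
for name, enabled in trace_settings({"_SPEEDSHOP_TRACE_SYSTEM": "True"}).items():
    print(name, enabled)
```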
EXPERT-MODE ENVIRONMENT VARIABLE CONTROLS
Various additional environment variables may be used for debugging and
finer control of the operation of SpeedShop. They are:
_SPEEDSHOP_SAMPLING_MODE
for PC-sampling and hardware-counter profiling, if set to 1, will
generate data for the base executable only. If it is not set, or
set to anything other than 1, data is generated for the executable
and all DSOs it uses.
_SPEEDSHOP_INIT_DEFERRED_SIG <signal-number>
If specified, initialization of the experiment will not be performed
when the target process starts, but rather will be delayed until the
specified signal is sent to the process. A handler for the given
signal will be installed when the process starts, and it is the
user's responsibility to ensure that it is not overridden by the
target code. If the process terminates before the signal is
received, no data will be recorded.
_SPEEDSHOP_SHUTDOWN_SIG <signal-number>
If specified, termination of the experiment will not be performed
when the target process exits, but rather will happen when the
specified signal is sent to the process. A handler for the given
signal will be installed when the process starts, and it is the
user's responsibility to ensure that it is not overridden by the
target code. If the process terminates before the signal is
received, data is recorded normally.
_SPEEDSHOP_EXPERIMENT_TYPE
passes the name of the experiment to the runtime. It is normally
set by ssrun(1), but may be overwritten.
_SPEEDSHOP_MARCHING_ORDERS
passes the marching orders of the experiment to the runtime. It is
normally set by ssrun(1) from the experiment type, but may be
overwritten.
_SPEEDSHOP_SBRK_BUFFER_LENGTH
defines the segment grow size for the internal malloc arena used.
This arena is completely separate from the user's arena, and it
usually grows in default segments of size 0x100000.
_SPEEDSHOP_SBRK_BUFFER_ADDR
defines the preferred starting address to be used for the internal
malloc arena. This option has to be used with extreme care, since it
might result in overlapping memory regions.
_SPEEDSHOP_FILE_BUFFER_LENGTH
defines the size of the buffer used for writing the experiment
files. The default length is 64KB. The buffer is only used for
writing many small records to the file (as in tracing experiments);
large records are written directly, to avoid the buffering overhead.
_SPEEDSHOP_DEBUG_NO_SIG_TRAPS
if set, disables the normal setting of signal handlers for all fatal
and exit signals.
_SPEEDSHOP_DEBUG_NO_STACK_UNWIND
if set, suppresses the stack unwind as done in usertime, totaltime,
or other callstack-based experiments. The option is used as a
workaround for various unwind bugs in libexc.
_SPEEDSHOP_RLD
defines the full pathname to rld to be used and enables rld
profiling (for pcsamp and hwc experiments only).
_SPEEDSHOP_INSTR_ARGS
defines additional instrumentation arguments.
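The effect of _SPEEDSHOP_INIT_DEFERRED_SIG and _SPEEDSHOP_SHUTDOWN_SIG on
the recorded interval can be stated as a small decision function. This
Python sketch only restates the rules given above; the function name
experiment_window and its string labels are hypothetical.

```python
def experiment_window(init_sig_set, init_sig_received,
                      shutdown_sig_set, shutdown_sig_received):
    """Return (start, end) labels for the recorded interval, or None.

    None means no data is recorded: initialization was deferred and the
    process terminated before the signal arrived.
    """
    if init_sig_set:
        if not init_sig_received:
            return None
        start = "init signal"
    else:
        start = "process start"
    # A shutdown signal that never arrives is harmless: data is still
    # recorded normally when the process exits.
    if shutdown_sig_set and shutdown_sig_received:
        end = "shutdown signal"
    else:
        end = "process exit"
    return (start, end)


print(experiment_window(True, True, True, False))
```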
Instrumentation is done with the pixie(1) command, invoked automatically
by ssrun(1), and, if necessary for DSOs that are opened during a run, by
the runtime library. Users normally would not invoke pixie(1) directly.
In the current release, instrumented executables and DSOs appear in the
current working directory. In a future release, the DSOs will be cached.
SPEEDSHOP API ROUTINES
The SpeedShop API routines are defined in the include file
"SpeedShop/api.h", installed in /usr/include. It defines three entry
points, described in the SpeedShop API man page, ssapi(3).
SPEEDSHOP CUSTOM DATA CAPTURE ROUTINES
The SpeedShop facility for users to add custom data capture routines is
not available in the current release.
MISCELLANEOUS UTILITY PROGRAMS
Several utility programs are provided, in addition to the main
functionality in SpeedShop. They are:
sscord
and ssorder are used to generate cord feedback files from recorded
data. sswsextr is a script to produce the working-set files used
for cord computations. See their respective man pages for more
information.
ssusage
is a variant of time(1) that prints more information about the
resource usage of a program. See ssusage(1) for more information.
ssdump
is a program which produces a formatted dump of a SpeedShop
experiment.
squeeze
is a program which allocates and locks down memory, making the
system behave as if it had less physical memory than it really does.
See squeeze(1) for more information.
thrash
is a program that allocates memory, and then touches all of the
pages, in order to force other pages out of the system's physical
memory. See thrash(1) for more information.
fbdump
is a program that dumps out the contents of the compiler feedback
files produced by the -feedback option to prof(1). See fbdump(1)
and prof(1) for more information.
Depending on the revision of the R10000 CPUs there is a difference in the
interpretation of counter number 14 (``Virtual coherency condition'' for
parts before revision 3.1 or ``ALU/FPU completion cycles'' for parts at
revision 3.1 or later). There are also some subtle differences in the
semantics of some of the counters.
In systems with a homogeneous deployment of CPUs at the same revision,
SpeedShop will adjust the reported information accordingly.
For systems with a mixed deployment of CPU revisions including some
before 3.1 and some at or after 3.1, the interpretation of counter 14 is
undefined, and there may be some slight inaccuracies due to aggregation
of counters with different semantics across all CPUs.
Identification of the revisions for all CPUs can be made using the -v
option to hinv(1).
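The counter numbering (0-15 on counter 0, 16-31 on counter 1) and the
revision-dependent meaning of counter 14 can be expressed compactly. The
Python sketch below is an illustration, not part of SpeedShop: the
function names are hypothetical, and CPU revisions are assumed to be
given as (major, minor) tuples.

```python
def split_event(number):
    """Map a counter number 0-31 to (physical counter, index within it)."""
    if not 0 <= number <= 31:
        raise ValueError("counter numbers run from 0 to 31")
    return divmod(number, 16)


def counter14_meaning(revisions):
    """Interpretation of counter 14 for the given R10000 CPU revisions.

    revisions: iterable of (major, minor) tuples, for example (3, 1).
    Returns None when revisions straddle 3.1, where the text above says
    the interpretation is undefined.
    """
    newer = [rev >= (3, 1) for rev in revisions]
    if not newer:
        raise ValueError("at least one CPU revision is required")
    if all(newer):
        return "ALU/FPU completion cycles"
    if not any(newer):
        return "Virtual coherency condition"
    return None


print(split_event(14), counter14_meaning([(2, 6), (2, 6)]))
```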
ssrun(1), ssdump(1), prof(1), pixie(1), fbdump(1), ssusage(1),
squeeze(1), thrash(1), malloc_ss(3), fpe_ss(3), io_ss(3), ssapi(3),
perfex(1), r10k_counters(5), sscord(1), ssorder(1), sswsextr(1)