speedshop - IRIX

NAME
DESCRIPTION
SUPPORTED EXECUTABLES
RECORDING EXPERIMENTS
EXPERIMENT TYPES
REPORT GENERATION
USER ENVIRONMENT VARIABLE CONTROLS
INSTRUMENTATION
SPEEDSHOP API ROUTINES
SPEEDSHOP CUSTOM DATA CAPTURE ROUTINES
MISCELLANEOUS UTILITY PROGRAMS
CAVEATS
SEE ALSO


SPEEDSHOP(1)							  SPEEDSHOP(1)

NAME

     SpeedShop,	speedshop - an integrated package of performance tools

DESCRIPTION

     SpeedShop is the generic name for an integrated package of	performance
     tools to run performance experiments on executables, and to examine the
     results of	those experiments.  It also supports starting a	process, in
     such a way	as to permit a debugger	to attach to it, and it	supports
     running Purify on executables.

     For Purify and for some experiments, instrumentation is necessary; if so,
     it will be performed automatically, and the resulting instrumented
     executable is run to generate the data.

SUPPORTED EXECUTABLES

     SpeedShop works under IRIX	6.2, or	later, and supports executables
     compiled with the IRIX 6.2	compilers (o32,	n32 and	64), or	with the
     MIPSPro 7.x compilers (n32	and 64).  SpeedShop supports C,	C++, FORTRAN,
     ADA, and asm programs.  Programs must be built using shared libraries
     (DSOs); nonshared or stripped executables are not supported.

RECORDING EXPERIMENTS

     Experiments are recorded using the	ssrun(1) command, as follows:
	  ssrun	-<exptype> <a.out-name>	<a.out arguments>
     where <exptype> is	one of the named experiments listed below.

     The result	of an experiment is one	or more	files that are named by	the
     following convention:
		    <a.out-name>.<exptype>.<code><pid>
     where <code> is:
     'm' for the master	process	created	by ssrun;
     'p' for a process created by a call to sproc();
     'f' for a process created by a call to fork();
     'e' for a process created by a call to exec();
     's' for a process created by a call to system(); and
     'fe' for the exec'd process created by calls to fork() and exec()
     when the environment variable _SPEEDSHOP_TRACE_FORK_TO_EXEC is set to
     False.
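
     The naming convention above can be decoded mechanically.  The following
     is a minimal Python sketch, not part of SpeedShop; the names
     parse_experiment_filename and ExperimentFile are hypothetical, and it
     assumes that <pid> is written as decimal digits and that <exptype>
     contains no '.' characters.

```python
import re
from typing import NamedTuple

# Process codes taken from the naming convention above.
PROCESS_CODES = {
    "m": "master process created by ssrun",
    "p": "process created by sproc()",
    "f": "process created by fork()",
    "e": "process created by exec()",
    "s": "process created by system()",
    "fe": "exec'd process created by fork() and exec()",
}

# "fe" is listed first so that it is preferred over the one-letter "f" code.
_PATTERN = re.compile(
    r"^(?P<name>.+)\.(?P<exptype>[^.]+)\.(?P<code>fe|m|p|f|e|s)(?P<pid>\d+)$"
)


class ExperimentFile(NamedTuple):
    executable: str
    exptype: str
    code: str
    pid: int


def parse_experiment_filename(filename: str) -> ExperimentFile:
    """Split <a.out-name>.<exptype>.<code><pid> into its parts."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a SpeedShop experiment file name: {filename!r}")
    return ExperimentFile(
        match["name"], match["exptype"], match["code"], int(match["pid"])
    )
```

     For example, parse_experiment_filename("a.out.usertime.m12345") yields
     the executable name "a.out", the experiment type "usertime", the code
     "m", and the pid 12345.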

     To	start the target process running, and leave it in a state to attach
     a debugger, add the -hang flag:
	  ssrun	-hang -<exptype> <a.out-name> <a.out arguments>

     To	get more detailed information about the	run, add the -v
     flag:
	  ssrun -v -<exptype> <a.out-name> <a.out arguments>
	       -or-
	  ssrun -v -hang -<exptype> <a.out-name> <a.out arguments>

     To	run Purify on an executable, use:
	  ssrun	-purify	<a.out-name> <a.out arguments>

     Purify and performance experiments are mutually exclusive.

     ssrun may take additional arguments; see its man page for further
     information.

EXPERIMENT TYPES

     The following experiment types, specified by <exptype> above, are
     supported in the current release:

     usertime and totaltime
	  use statistical callstack profiling, based on process virtual time
	  (including time spent when the system is running on behalf of the
	  process) for usertime and wall clock time for totaltime, with a time
	  sample interval of 30 milliseconds.
	  Note: o32 executables must explicitly link with -lexc for these
	  experiments to work.  Program execution may show significant slowdown
	  compared to the original executable.  The stack unwind code sometimes
	  fails to completely unwind the stack; consequently, caller
	  attribution cannot be done beyond the point of failure.

     [f]pcsamp[x]
	  uses statistical PC sampling,	using 16-bit bins, based on user and
	  system time, with a sample interval of 10 milliseconds.  If the
	  optional f prefix is specified, a sample interval of 1 millisecond
	  will be used.	 If the	optional x suffix is specified,	a 32-bit bin
	  size will be used.

     ideal
	  uses basic-block counting, done by instrumenting the executable.

     fpe  does tracing of all floating-point exceptions.

     io	  does tracing of various I/O system calls.
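
     The optional f prefix and x suffix of the pcsamp experiment combine
     independently.  The following Python sketch, not part of SpeedShop,
     spells out the four variants and builds the matching ssrun argument
     list; the helper names pcsamp_settings and ssrun_argv are hypothetical,
     and the sketch only constructs the command line without running it.

```python
from typing import List, NamedTuple


class PcsampSettings(NamedTuple):
    exptype: str      # experiment name to pass to ssrun
    interval_ms: int  # PC sampling interval in milliseconds
    bin_bits: int     # size of each sampling bin in bits


def pcsamp_settings(fast: bool = False, wide_bins: bool = False) -> PcsampSettings:
    """Return the [f]pcsamp[x] variant described in the text above."""
    name = ("f" if fast else "") + "pcsamp" + ("x" if wide_bins else "")
    return PcsampSettings(name, 1 if fast else 10, 32 if wide_bins else 16)


def ssrun_argv(exptype: str, program: str, *args: str) -> List[str]:
    """Build 'ssrun -<exptype> <a.out-name> <a.out arguments>' as a list."""
    return ["ssrun", f"-{exptype}", program, *args]
```

     For example, pcsamp_settings(fast=True, wide_bins=True) names the
     fpcsampx experiment, with a 1 millisecond interval and 32-bit bins.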

     On machines with hardware performance counters (R10000 machines), the
     following additional types	are supported:

     [f]gi_hwc
	  uses statistical PC sampling, based on overflows of the
	  graduated-instruction counter, at an overflow interval of 32771.
	  If the optional f prefix is used, the overflow interval will be 6553.

     [f]cy_hwc
	  uses statistical PC sampling,	based on overflows of the cycle
	  counter, at an overflow interval of 16411.  If the optional f	prefix
	  is used, the overflow	interval will be 3779.

     [f]ic_hwc
	  uses statistical PC sampling,	based on overflows of the primary
	  instruction-cache miss counter, at an	overflow interval of 2053.  If
	  the optional f prefix is used, the overflow interval will be 419.

     [f]isc_hwc
	  uses statistical PC sampling,	based on overflows of the secondary
	  instruction-cache miss counter, at an	overflow interval of 131.  If
	  the optional f prefix	is used, the overflow interval will be 29.

     [f]dc_hwc
	  uses statistical PC sampling,	based on overflows of the primary
	  data-cache miss counter, at an overflow interval of 2053.  If	the
	  optional f prefix is used, the overflow interval will	be 419.

     [f]dsc_hwc
	  uses statistical PC sampling,	based on overflows of the secondary
	  data-cache miss counter, at an overflow interval of 131.  If the
	  optional f prefix is used, the overflow interval will	be 29.

     [f]tlb_hwc
	  uses statistical PC sampling,	based on overflows of the TLB miss
	  counter, at an overflow interval of 257.  If the optional f prefix
	  is used, the overflow	interval will be 53.

     [f]gfp_hwc
	  uses statistical PC sampling,	based on overflows of the graduated
	  floating-point instruction counter, at an overflow interval of
	  32771.  If the optional f prefix is used, the	overflow interval will
	  be 6553.

     [f]fsc_hwc
	  uses statistical PC sampling,	based on overflows of the failed store
	  conditionals counter,	at an overflow interval	of 2003.  If the
	  optional f prefix is used, the overflow interval will	be 401.

     prof_hwc
	  uses statistical PC sampling,	based on overflows of the counter
	  specified by the environment variable	_SPEEDSHOP_HWC_COUNTER_NUMBER,
	  at an	interval given by the environment variable
	  _SPEEDSHOP_HWC_COUNTER_OVERFLOW.  Note that these environment
	  variables can	not be used to override	the counter number or interval
	  for the other	defined	experiments.  They are examined	only when the
	  prof_hwc experiment is specified.  The default counter is the
	  primary instruction-cache miss counter and the default overflow
	  interval is 2053.

     gi_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the graduated-instruction counter, at an
	  overflow interval of 1000003.

     cy_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the cycle counter, at an overflow interval of
	  10000019.

     ic_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the primary instruction-cache-miss counter, at
	  an overflow interval of 8009.

     isc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the secondary instruction-cache-miss counter,
	  at an	overflow interval of 2003.

     dc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the primary data-cache-miss counter, at	an
	  overflow interval of 8009.

     dsc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the secondary data-cache-miss counter, at an
	  overflow interval of 2003.

     tlb_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the TLB	miss counter, at an overflow interval
	  of 2521.

     gfp_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the graduated floating-point instruction
	  counter, at an overflow interval of 10007.

     fsc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the failed store conditionals counter, at an
	  overflow interval of 5003.

     prof_hwctime
	  profiles the counter specified by the	environment variable
	  _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER using statistical call-stack
	  sampling, based on overflows of the counter specified	by the
	  environment variable _SPEEDSHOP_HWC_COUNTER_NUMBER, at an interval
	  given	by the environment variable _SPEEDSHOP_HWC_COUNTER_OVERFLOW.
	  Note that these environment variables	can not	be used	to override
	  the counter numbers or interval for the other	defined	experiments.
	  They are examined only when the prof_hwctime experiment is
	  specified.  The default overflow and profiling counter is the cycle
	  counter and the default overflow interval is 10000019.
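
     The default overflow intervals of the [f]*_hwc experiments listed above
     can be collected into a lookup table.  The following Python sketch is
     not part of SpeedShop; the names HWC_OVERFLOW and hwc_overflow_interval
     are hypothetical, and the values are copied from the descriptions
     above.

```python
# (normal interval, interval with the optional f prefix), keyed by the
# counter abbreviation used in the experiment name.
HWC_OVERFLOW = {
    "gi": (32771, 6553),   # graduated instructions
    "cy": (16411, 3779),   # cycles
    "ic": (2053, 419),     # primary instruction-cache misses
    "isc": (131, 29),      # secondary instruction-cache misses
    "dc": (2053, 419),     # primary data-cache misses
    "dsc": (131, 29),      # secondary data-cache misses
    "tlb": (257, 53),      # TLB misses
    "gfp": (32771, 6553),  # graduated floating-point instructions
    "fsc": (2003, 401),    # failed store conditionals
}


def hwc_overflow_interval(exptype: str) -> int:
    """Return the default overflow interval for an [f]<counter>_hwc name."""
    suffix = "_hwc"
    if not exptype.endswith(suffix):
        raise ValueError(f"not a *_hwc experiment: {exptype!r}")
    stem = exptype[: -len(suffix)]
    # Check the plain name first, so that "fsc" is not misread as f + "sc".
    if stem in HWC_OVERFLOW:
        return HWC_OVERFLOW[stem][0]
    if stem.startswith("f") and stem[1:] in HWC_OVERFLOW:
        return HWC_OVERFLOW[stem[1:]][1]
    raise ValueError(f"unknown hardware counter experiment: {exptype!r}")
```

     The plain name is checked before the f prefix is stripped because
     fsc_hwc itself begins with the letter f.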

     One additional experiment type may	be recorded, but no report generation
     for it is yet supported.  It is:

     heap does tracing of all malloc and free, etc. calls, and also supports
	  various options for debugging	heap usage.

     Custom experiments	will be	supported in future releases.

REPORT GENERATION

     Report generation is done through the prof(1) command:
	  prof <output file> . . . <output file>
     It	will add the data from all of the output files,	and produce a listing
     which depends on the particular experiment	type.  For all experiments, it
     will produce a list of functions, annotated with the appropriate metric.

     For [f]pcsamp[x] and the various *_hwc experiments, the function list is
     annotated with the exclusive metric.  For the PC sampling experiments,
     the metric is exclusive time; for the various hardware counter profiling
     experiments, the metric is exclusive counts.

     For ideal experiments, the	function list is annotated with	a cycle	count
     and percentage, a cumulative percentage for that function and all others
     above it in the list, an estimate of idealized time, an instruction
     execution count, and a call count.	 If the	-b[utterfly] flag is added, a
     list of callers and callees of each function is also produced.

     For usertime and totaltime	and the	various	*_hwctime experiments, the
     function list is annotated	with percentage	of time	or counts for the
     function, the time	in that	function, and the time or counts in that
     function and its descendants, and a count of the number of	callstacks
     containing	that function.	If the -b[utterfly] flag is added, a list of
     callers and callees of each function is also produced.

     For fpe experiments, the function list is annotated with the percentage
     of	FPEs in	that function, and counts for the function and its
     descendants.  If the -b[utterfly] flag is added, a	list of	callers	and
     callees of	each function is also produced.

     For io experiments, the function list is annotated	with the percentage of
     IO	calls in that function,	and counts for the function and	its
     descendants.  If the -b[utterfly] flag is added, a	list of	callers	and
     callees of	each function is also produced.

     There are many additional options to prof;	see the	prof(1)	man page for
     further details.
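
     Because prof adds the data from all of the output files it is given,
     it can be convenient to gather every file of one experiment by name.
     The following Python sketch is not part of SpeedShop; the helper name
     prof_argv is hypothetical, and it assumes that every file matching
     <a.out-name>.<exptype>.* in the given directory is an experiment output
     file.  It only builds the argument list and does not run prof.

```python
import glob
import os
from typing import List


def prof_argv(executable: str, exptype: str, butterfly: bool = False,
              directory: str = ".") -> List[str]:
    """Build a prof command line over all output files of one experiment."""
    # Escape the fixed parts so that only the trailing '*' acts as a wildcard.
    pattern = os.path.join(
        glob.escape(directory), glob.escape(f"{executable}.{exptype}.") + "*"
    )
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no experiment files match {pattern!r}")
    argv = ["prof"]
    if butterfly:
        # -b[utterfly] adds the list of callers and callees of each function.
        argv.append("-b")
    return argv + files
```

     The butterfly flag is only meaningful for the experiment types for which
     the text above mentions -b[utterfly].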

CALIPER	SAMPLES
     In the current releases, caliper samples may be recorded, and the
     -calipers option to prof will allow you to see the data for any
     caliper setting.

     Caliper samples are supported in three different ways.  First, the user
     can explicitly link with the SpeedShop runtime, and call its API routine
     to record a caliper sample.  Second, the user can define a signal to be
     used to record a caliper sample, by specifying the environment variable
     _SPEEDSHOP_CALIPER_POINT_SIG and sending the target the specified signal.
     Third, a caliper-sample trap may be set in either dbx or the WorkShop
     debugger.  In the current debuggers, this is done by planting a stop
     trap (breakpoint), and, when the process stops, evaluating the
     expression:
	       ssrt_caliper_point(0, 0)
     The evaluation of the expression always returns zero, but a side effect
     of the evaluation is the recording of the appropriate data.  After
     evaluation, process execution may be resumed.  See the ssapi(3) man page
     for further details.
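
     The signal-based method can be driven from a small controlling script.
     The following Python sketch is not part of SpeedShop; the helper names
     caliper_env and record_caliper_point are hypothetical, and the choice of
     SIGUSR1 is an assumption: any signal the target program does not
     otherwise use should serve.

```python
import os
import signal
from typing import Dict, Mapping, Optional


def caliper_env(sig: int = signal.SIGUSR1,
                base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return an environment that names the caliper-point signal by number."""
    env = dict(os.environ if base_env is None else base_env)
    env["_SPEEDSHOP_CALIPER_POINT_SIG"] = str(int(sig))
    return env


def record_caliper_point(pid: int, sig: int = signal.SIGUSR1) -> None:
    """Send the agreed signal to the target to request a caliper sample."""
    os.kill(pid, int(sig))
```

     The environment returned by caliper_env would be passed to the ssrun
     invocation that starts the target, and record_caliper_point would then
     be called with the target's pid at each point of interest.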

USER ENVIRONMENT VARIABLE CONTROLS

     Various environment variables are normally	used to	control	the operation
     of	SpeedShop.  They are:

     _SPEEDSHOP_VERBOSE
	  causes a log of each program's operation to be written to stderr.
	  If it	is set to an empty string, only	major events are logged; if it
	  is set to a non-empty	string,	more detailed events are logged.

     _SPEEDSHOP_SILENT
	  if set, suppresses all output, other than fatal error	messages from
	  SpeedShop.  If both _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT are
	  set, _SPEEDSHOP_SILENT wins.

     _SPEEDSHOP_CALIPER_POINT_SIG <signal-number>
	  if specified,	gives a	signal number to be used for recording a
	  caliper-point	in the experiment.

     _SPEEDSHOP_POLLPOINT_CALIPER_POINT	<timer_type>,<timer_interval>
	  if specified,	defines	the timer type and the timer interval (in
	  secs)	for pollpoint caliper points.

     _SPEEDSHOP_OUTPUT_DIRECTORY
	  if specified,	the output data	files and the instrumented binaries
	  will be put in the named directory.

     _SPEEDSHOP_OUTPUT_FD
	  if specified,	gives the number of the	file descriptor	to be used for
	  writing the output file.  Note: this option is not supported in the
	  current release.

     _SPEEDSHOP_REUSE_FILE_DESCRIPTORS
	  if set, opens	and closes the file descriptors	for the	output files
	  every	time performance data is to be written.	If the target program
	  is using chdir(), then the _SPEEDSHOP_OUTPUT_DIRECTORY environment
	  variable should also be set to the full pathname of the directory
	  where	the output files are to	be put.

     _SPEEDSHOP_OUTPUT_FILENAME
	  if specified,	the given name will be used for	the output file;  if
	  _SPEEDSHOP_OUTPUT_DIRECTORY is also specified, it will be prepended
	  to the name.

     _SPEEDSHOP_HWC_COUNTER_NUMBER
	  specifies the	overflow counter to be used for	prof_hwc or
	  prof_hwctime experiments.  Counters are numbered between 0 and 31,
	  and are described in the MIPS	R10000 Microprocessor User's Manual,
	  Chapter 14.  Counter 0 counters are numbered 0-15, and counter 1
	  counters are numbered 16-31.

     _SPEEDSHOP_HWC_COUNTER_OVERFLOW
	  specifies the	overflow value for the counter to be used in prof_hwc
	  or prof_hwctime experiments.	The value chosen may be	any number
	  greater than 0.  Some	choices	may produce data that is not
	  statistically	random,	but rather reflects a correlation between the
	  overflow interval and	a cyclic behavior in the application.  Users
	  may want to do two or	more runs with different overflow values.

     _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER
	  specifies the	profiling counter to be	used for prof_hwctime
	  experiments.	Counters are numbered between 0	and 31,	and are
	  described in the MIPS	R10000 Microprocessor User's Manual, Chapter
	  14.  Counter 0 counters are numbered 0-15, and counter 1 counters
	  are numbered 16-31.

     _SPEEDSHOP_HWC_COUNTER_SIGNAL_NUMBER
	  specifies the	profiling signal to be used for	prof_hwctime
	  experiments.

     _SPEEDSHOP_OUTPUT_NOCOMPRESS
	  if set, disables the compression of performance data.

     Other variables will be documented in future releases.
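
     The counter numbering and overflow constraints described for the
     prof_hwc variables can be checked before a run.  The following Python
     sketch is not part of SpeedShop; the helper names hwc_counter_bank and
     prof_hwc_env are hypothetical.  It relies only on the ranges stated
     above: numbers 0-15 belong to counter 0, numbers 16-31 belong to
     counter 1, and the overflow value must be greater than 0.

```python
from typing import Dict


def hwc_counter_bank(counter_number: int) -> int:
    """Return 0 or 1: which hardware counter a 0-31 counter number uses."""
    if not 0 <= counter_number <= 31:
        raise ValueError("counter numbers run from 0 to 31")
    return counter_number // 16


def prof_hwc_env(counter_number: int, overflow: int) -> Dict[str, str]:
    """Build the environment variables examined by the prof_hwc experiment."""
    hwc_counter_bank(counter_number)  # validates the range
    if overflow <= 0:
        raise ValueError("the overflow value must be greater than 0")
    return {
        "_SPEEDSHOP_HWC_COUNTER_NUMBER": str(counter_number),
        "_SPEEDSHOP_HWC_COUNTER_OVERFLOW": str(overflow),
    }
```

     The resulting variables would be added to the environment of an
     ssrun -prof_hwc run; they are not examined by the other predefined
     experiments.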

PROCESS	TRACKING ENVIRONMENT VARIABLE CONTROLS
     Various environment variables may be used for controlling the treatment
     of	processes spawned from the original target.  They are:

     _SPEEDSHOP_TRACE_FORK {True|False}
	  if True, specifies that processes spawned by calls to	fork() will be
	  monitored, if	they do	not call exec().  If they do call exec(), and
	  _SPEEDSHOP_TRACE_FORK_TO_EXEC	is not set to True, the	data covering
	  the time between the fork() and the exec() will be discarded.	 It is
	  True by default.  Note: in the current release, data will be
	  recorded independent of whether the process calls exec() or not.

     _SPEEDSHOP_TRACE_FORK_TO_EXEC {True|False}
	  if True, specifies that processes spawned by calls to fork() will be
	  monitored, even if they also call exec().  It	is False by default.

     _SPEEDSHOP_TRACE_EXEC {True|False}
	  if True, specifies that processes spawned by calls to any of the
	  various flavors of exec() will be monitored.  It is True by default.

     _SPEEDSHOP_TRACE_SPROC {True|False}
	  if True, specifies that processes spawned by calls to sproc() will be
	  monitored.  It is True by default.

     _SPEEDSHOP_TRACE_SYSTEM {True|False}
	  if True, specifies that processes spawned by calls to system() will be
	  monitored.  It is False by default.
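
     The defaults of the process tracking variables can be summarized in a
     table and combined with any overrides.  The following Python sketch is
     not part of SpeedShop; the names TRACE_DEFAULTS and trace_settings are
     hypothetical.  It accepts only the literal spellings True and False
     shown above; how SpeedShop itself treats other spellings is not stated
     here.

```python
from typing import Dict, Mapping

# Defaults as documented for each process tracking variable.
TRACE_DEFAULTS = {
    "_SPEEDSHOP_TRACE_FORK": True,
    "_SPEEDSHOP_TRACE_FORK_TO_EXEC": False,
    "_SPEEDSHOP_TRACE_EXEC": True,
    "_SPEEDSHOP_TRACE_SPROC": True,
    "_SPEEDSHOP_TRACE_SYSTEM": False,
}


def trace_settings(env: Mapping[str, str]) -> Dict[str, bool]:
    """Merge the documented defaults with overrides found in env."""
    settings = dict(TRACE_DEFAULTS)
    for name in settings:
        if name in env:
            value = env[name]
            if value not in ("True", "False"):
                raise ValueError(f"{name} must be True or False, got {value!r}")
            settings[name] = value == "True"
    return settings
```

     For example, trace_settings({"_SPEEDSHOP_TRACE_SYSTEM": "True"}) turns
     on monitoring of processes created by system() and leaves the other
     settings at their defaults.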

EXPERT-MODE ENVIRONMENT	VARIABLE CONTROLS
     Various additional	environment variables may be used for debugging	and
     finer control of the operation of SpeedShop.  They	are:

     _SPEEDSHOP_SAMPLING_MODE
	  for PC-sampling and hardware-counter profiling, if set to 1, will
	  generate data	for the	base executable	only.  If it is	not set, or
	  set to anything other	than 1,	data is	generated for the executable
	  and all DSOs it uses.

     _SPEEDSHOP_INIT_DEFERRED_SIG <signal-number>
	  If specified,	initialization of the experiment will not be performed
	  when the target process starts, but rather will be delayed until the
	  specified signal is sent to the process.   A handler for the given
	  signal will be installed when	the process starts, and	it is the
	  user's responsibility to ensure that it is not overridden by the
	  target code.	If the process terminates before the signal is
	  received, no data will be recorded.

     _SPEEDSHOP_SHUTDOWN_SIG <signal-number>
	  If specified,	termination of the experiment will not be performed
	  when the target process exits, but rather will happen	when the
	  specified signal is sent to the process.   A handler for the given
	  signal will be installed when	the process starts, and	it is the
	  user's responsibility to ensure that it is not overridden by the
	  target code.	If the process terminates before the signal is
	  received, data is recorded normally.

     _SPEEDSHOP_EXPERIMENT_TYPE
	  passes the name of the experiment to the runtime.  It	is normally
	  set by ssrun(1), but may be overwritten.

     _SPEEDSHOP_MARCHING_ORDERS
	  passes the marching orders of	the experiment to the runtime.	It is
	  normally set by ssrun(1) from	the experiment type, but may be
	  overwritten.

     _SPEEDSHOP_SBRK_BUFFER_LENGTH
	  defines the segment grow size	for the	internal malloc	arena used.
	  This arena is	completely separate from the user's arena, and it
	  usually grows in default segments of size 0x100000.

     _SPEEDSHOP_SBRK_BUFFER_ADDR
	  defines the preferred	starting address to be used for	the internal
	  malloc arena.  This option has to be used with extreme care, since
	  it might result in overlapping memory regions.

     _SPEEDSHOP_FILE_BUFFER_LENGTH
	  defines the size of the buffer used for writing the experiment
	  files.  The default length is	64KB.  The buffer is only used for
	  writing many small records to	the file (as in	tracing	experiments);
	  large	records	are written directly, to avoid the buffering overhead.

     _SPEEDSHOP_DEBUG_NO_SIG_TRAPS
	  if set, disables the normal setting of signal	handlers for all fatal
	  and exit signals.

     _SPEEDSHOP_DEBUG_NO_STACK_UNWIND
	  if set, suppresses the stack unwind as done in usertime, totaltime,
	  or other callstack-based experiments.	 The option is used as a
	  workaround for various unwind	bugs in	libexc.

     _SPEEDSHOP_RLD
	  defines the full pathname to rld to be used and enables rld
	  profiling (for pcsamp	and hwc	experiments only).

     _SPEEDSHOP_INSTR_ARGS
	  defines additional instrumentation arguments.

INSTRUMENTATION

     Instrumentation is	done with the pixie(1) command,	invoked	automatically
     by	ssrun(1), and, if necessary for	DSOs that are opened during a run, by
     the runtime library.  Users normally would	not invoke pixie(1) directly.

     In	the current release, instrumented executables and DSOs appear in the
     current working directory.	 In a future release, the DSOs will be cached.

SPEEDSHOP API ROUTINES

     The SpeedShop API routines	are defined in the include file
     "SpeedShop/api.h",	installed in /usr/include.  It defines three entry
     points, described in the SpeedShop API man page, ssapi(3).

SPEEDSHOP CUSTOM DATA CAPTURE ROUTINES

     The SpeedShop facility for	users to add custom data capture routines is
     not available in the current release.

MISCELLANEOUS UTILITY PROGRAMS

     Several utility programs are provided, in addition to the main
     functionality in SpeedShop.  They are:

     sscord and ssorder
	  are used to generate cord feedback files from recorded data.

     sswsextr
	  is a script to produce the working-set files used for cord
	  computations.  See the respective man pages of these programs for
	  more information.

     ssusage
	  is a variant of time(1) that prints more information about the
	  resource usage of a program.	See ssusage(1) for more	information.

     ssdump
	  is a program which produces a	formatted dump of a SpeedShop
	  experiment.

     squeeze
	  is a program which allocates and locks down memory, making the
	  system behave as if it had less physical memory than it really does.
	  See squeeze(1) for more information.

     thrash
	  is a program that allocates memory, and then touches all of the
	  pages, in order to force other pages out of the system's physical
	  memory.  See thrash(1) for more information.

     fbdump
	  is a program that dumps out the contents of the compiler feedback
	  files produced by the -feedback option to prof(1).  See fbdump(1)
	  and prof(1) for more information.

CAVEATS

     Depending on the revision of the R10000 CPUs there	is a difference	in the
     interpretation of counter number 14 (``Virtual coherency condition'' for
     parts before revision 3.1 or ``ALU/FPU completion cycles''	for parts at
     revision 3.1 or later).  There are	also some subtle differences in	the
     semantics of some of the counters.

     In	systems	with a homogeneous deployment of CPUs at the same revision,
     speedshop will adjust the reported	information accordingly.

     For systems with a	mixed deployment of CPU	revisions including some
     before 3.1	and some at or after 3.1, the interpretation of	counter	14 is
     undefined,	and there may be some slight inaccuracies due to aggregation
     of	counters with different	semantics across all CPUs.

     Identification of the revisions for all CPUs can be made using the	-v
     option to hinv(1).

NAME [Toc] [Back]

     SpeedShop,	speedshop - an integrated package of performance tools

DESCRIPTION [Toc] [Back]

     SpeedShop is the generic name for an integrated package of	performance
     tools to run performance experiments on executables, and to examine the
     results of	those experiments.  It also supports starting a	process, in
     such a way	as to permit a debugger	to attach to it, and it	supports
     running Purify on executables.

     For Purify	and for	some experiments instrumentation is necessary; if so,
     it	will be	performed automatically, and the resulting instrumented
     executable	run to generate	the data.

SUPPORTED EXECUTABLES [Toc] [Back]

     SpeedShop works under IRIX	6.2, or	later, and supports executables
     compiled with the IRIX 6.2	compilers (o32,	n32 and	64), or	with the
     MIPSPro 7.x compilers (n32	and 64).  SpeedShop supports C,	C++, FORTRAN,
     ADA, and asm programs.  Programs must be built using shared libraries
     (DSOs); nonshared or stripped executables are not supported.

RECORDING EXPERIMENTS [Toc] [Back]

     Experiments are recorded using the	ssrun(1) command, as follows:
	  ssrun	-<exptype> <a.out-name>	<a.out arguments>
     where <exptype> is	one of the named experiments listed below.

     The result	of an experiment is one	or more	files that are named by	the
     following convention:
		    <a.out-name>.<exptype>.<code><pid>
     where <code> is:
     'm' for the master	process	created	by ssrun;
     'p' for a process created by a call to sproc();
     'f' for a process created by a call to fork();
     'e' for a process created by a call to exec();
     's' for a process created by a call to system(); and
     'fe' for the exec'd process created by calls to fork() and	exec()
     with environment variable _SPEEDSHOP_TRACE_FORK_TO_EXEC being set to False.

     To	start the target process running, and leave it in a state to attach
     a debugger, add the -hang flag:
	  ssrun	-hang -<exptype> <a.out-name> <a.out arguments>

     To	get more detailed information about the	run, add the -v
     flag:
	  ssrun	-v -<exptype> <a.out-name> <a.out arguments>
	       -orssrun
	-v -hang -<exptype> <a.out-name> <a.out	arguments>

     To	run Purify on an executable, use:
	  ssrun	-purify	<a.out-name> <a.out arguments>

     Purify and	performance experiments	are mutually exclusive.



									Page 1






SPEEDSHOP(1)							  SPEEDSHOP(1)



     ssrun may take additional arguments; see its man page for further
     information.

EXPERIMENT TYPES [Toc] [Back]

     The following experiment types, specified by <exptype> above, are
     supported in the current release:

     usertime and totaltime
	  uses statistical callstack profiling,	based on process virtual time
	  (including time spent	when the system	is running on behalf of	the
	  process) for usertime	and wall clock time for	totaltime, with	a time
	  sample interval of 30	milliseconds.
	  Note:	o32 executables	must explicitly	link with -lexc	for these
	  experiments to work; program execution may show significant slowdown
	  compared to the original executable; the stack unwind	code sometimes
	  fails	to completely unwind the stack;	consequently, caller
	  attribution can not be done beyond the point of failure.

     [f]pcsamp[x]
	  uses statistical PC sampling,	using 16-bit bins, based on user and
	  system time, with a sample interval of 10 milliseconds.  If the
	  optional f prefix is specified, a sample interval of 1 millisecond
	  will be used.	 If the	optional x suffix is specified,	a 32-bit bin
	  size will be used.

     ideal
	  uses basic-block counting, done by instrumenting the executable.

     fpe  does tracing of all floating-point exceptions.

     io	  does tracing of various I/O system calls.

     On	machines with hardware performance counters, (R10000 machines),	the
     following additional types	are supported:

     [f]gi_hwc
	  uses statistical PC sampling,	based on overflows of the graduatedinstruction
 counter, at an overflow interval of 32771.  If the
	  optional f prefix is used, the overflow interval will	be 6553.

     [f]cy_hwc
	  uses statistical PC sampling,	based on overflows of the cycle
	  counter, at an overflow interval of 16411.  If the optional f	prefix
	  is used, the overflow	interval will be 3779.

     [f]ic_hwc
	  uses statistical PC sampling,	based on overflows of the primary
	  instruction-cache miss counter, at an	overflow interval of 2053.  If
	  the optional f prefix	is used, the overflow interval will be 419.






									Page 2






SPEEDSHOP(1)							  SPEEDSHOP(1)



     [f]isc_hwc
	  uses statistical PC sampling,	based on overflows of the secondary
	  instruction-cache miss counter, at an	overflow interval of 131.  If
	  the optional f prefix	is used, the overflow interval will be 29.

     [f]dc_hwc
	  uses statistical PC sampling,	based on overflows of the primary
	  data-cache miss counter, at an overflow interval of 2053.  If	the
	  optional f prefix is used, the overflow interval will	be 419.

     [f]dsc_hwc
	  uses statistical PC sampling,	based on overflows of the secondary
	  data-cache miss counter, at an overflow interval of 131.  If the
	  optional f prefix is used, the overflow interval will	be 29.

     [f]tlb_hwc
	  uses statistical PC sampling,	based on overflows of the TLB miss
	  counter, at an overflow interval of 257.  If the optional f prefix
	  is used, the overflow	interval will be 53.

     [f]gfp_hwc
	  uses statistical PC sampling,	based on overflows of the graduated
	  floating-point instruction counter, at an overflow interval of
	  32771.  If the optional f prefix is used, the	overflow interval will
	  be 6553.

     [f]fsc_hwc
	  uses statistical PC sampling,	based on overflows of the failed store
	  conditionals counter,	at an overflow interval	of 2003.  If the
	  optional f prefix is used, the overflow interval will	be 401.

     prof_hwc
	  uses statistical PC sampling,	based on overflows of the counter
	  specified by the environment variable	_SPEEDSHOP_HWC_COUNTER_NUMBER,
	  at an	interval given by the environment variable
	  _SPEEDSHOP_HWC_COUNTER_OVERFLOW.  Note that these environment
	  variables can	not be used to override	the counter number or interval
	  for the other	defined	experiments.  They are examined	only when the
	  prof_hwc experiment is specified.  The default counter is the
	  primary instruction-cache miss counter and the default overflow
	  interval is 2053.

     gi_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the graduated-instruction counter, at an
	  overflow interval of 1000003.

     cy_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the cycle counter, at an overflow interval of
	  10000019.




									Page 3






SPEEDSHOP(1)							  SPEEDSHOP(1)



     ic_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the primary instruction-cache-miss counter, at
	  an overflow interval of 8009.

     isc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the secondary instruction-cache-miss counter,
	  at an	overflow interval of 2003.

     dc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the primary data-cache-miss counter, at	an
	  overflow interval of 8009.

     dsc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the secondary data-cache-miss counter, at an
	  overflow interval of 2003.

     tlb_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the TLB	miss counter, at an overflow interval
	  of 2521.

     gfp_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the graduated floating-point instruction
	  counter, at an overflow interval of 10007.

     fsc_hwctime
	  profiles the cycle counter using statistical call-stack sampling,
	  based	on overflows of	the failed store conditionals counter, at an
	  overflow interval of 5003.

     prof_hwctime
	  profiles the counter specified by the	environment variable
	  _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER using statistical call-stack
	  sampling, based on overflows of the counter specified	by the
	  environment variable _SPEEDSHOP_HWC_COUNTER_NUMBER, at an interval
	  given	by the environment variable _SPEEDSHOP_HWC_COUNTER_OVERFLOW.
	  Note that these environment variables cannot be used to override
	  the counter numbers or interval for the other defined experiments.
	  They are examined only when the prof_hwctime experiment is
	  specified.  The default overflow and profiling counter is the cycle
	  counter and the default overflow interval is 10000019.
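
     As an illustration of the prof_hwc controls described above, the
     following shell sketch selects an overflow counter and an overflow
     interval and then records the experiment.  The counter number 9, the
     interval 4099, and the target ./a.out are hypothetical values chosen
     for the example; ssrun(1) is invoked only if it is installed.

```sh
#!/bin/sh
# Hypothetical values, chosen only for illustration.
target=./a.out                         # program to profile (assumed)
_SPEEDSHOP_HWC_COUNTER_NUMBER=9        # overflow counter, must be 0-31
_SPEEDSHOP_HWC_COUNTER_OVERFLOW=4099   # overflow interval, must be > 0
export _SPEEDSHOP_HWC_COUNTER_NUMBER _SPEEDSHOP_HWC_COUNTER_OVERFLOW

# The two variables are examined only for prof_hwc and prof_hwctime.
if command -v ssrun >/dev/null 2>&1 && [ -x "$target" ]; then
    ssrun -prof_hwc "$target"
else
    echo "ssrun or $target not available; would run: ssrun -prof_hwc $target"
fi
```

     For prof_hwctime, _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER would be set and
     exported in the same way before calling ssrun -prof_hwctime.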

     One additional experiment type may	be recorded, but no report generation
     for it is yet supported.  It is:

     heap
	  traces all calls to malloc, free, and related routines, and also
	  supports various options for debugging heap usage.

     Custom experiments	will be	supported in future releases.

REPORT GENERATION

     Report generation is done through the prof(1) command:
	  prof <output file> . . . <output file>
     It	will add the data from all of the output files,	and produce a listing
     which depends on the particular experiment	type.  For all experiments, it
     will produce a list of functions, annotated with the appropriate metric.

     For [f]pcsamp[x], and the various *_hwc experiments, the function list is
     annotated with the	exclusive metric;  for the PC sampling experiments,
     the metric	is exclusive time, for the various hardware counter profiling
     experiments the metric is exclusive counts.

     For ideal experiments, the	function list is annotated with	a cycle	count
     and percentage, a cumulative percentage for that function and all others
     above it in the list, an estimate of idealized time, an instruction
     execution count, and a call count.	 If the	-b[utterfly] flag is added, a
     list of callers and callees of each function is also produced.

     For usertime and totaltime	and the	various	*_hwctime experiments, the
     function list is annotated	with percentage	of time	or counts for the
     function, the time	in that	function, and the time or counts in that
     function and its descendants, and a count of the number of	callstacks
     containing	that function.	If the -b[utterfly] flag is added, a list of
     callers and callees of each function is also produced.

     For fpe experiments, the function list is annotated with the percentage
     of	FPEs in	that function, and counts for the function and its
     descendants.  If the -b[utterfly] flag is added, a	list of	callers	and
     callees of	each function is also produced.

     For io experiments, the function list is annotated	with the percentage of
     IO	calls in that function,	and counts for the function and	its
     descendants.  If the -b[utterfly] flag is added, a	list of	callers	and
     callees of	each function is also produced.

     There are many additional options to prof;	see the	prof(1)	man page for
     further details.
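
     The prof invocation described above can be sketched as a small shell
     helper that builds the command line, with or without the butterfly
     flag.  The helper name build_prof_cmd and the file names are
     hypothetical, and placing the flag before the output files is an
     assumption; prof(1) is run only if it and the first data file exist.

```sh
#!/bin/sh
# Build a prof(1) command line for one or more experiment output files.
# $1 is "yes" to request caller/callee lists; the rest are output files.
build_prof_cmd() {
    butterfly=$1
    shift
    if [ "$butterfly" = "yes" ]; then
        echo "prof -butterfly $*"
    else
        echo "prof $*"
    fi
}

# Hypothetical output files from two usertime runs of the same program.
cmd=$(build_prof_cmd yes run1.usertime run2.usertime)
echo "$cmd"

# Run the command only where prof(1) and the data actually exist.
if command -v prof >/dev/null 2>&1 && [ -f run1.usertime ]; then
    $cmd
fi
```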

CALIPER	SAMPLES
     In	the current releases, caliper samples may be recorded, and the
     -calipers option to prof will allow you to see the data for any
     caliper-setting.

     Caliper samples are supported in three different ways.  First, the	user
     can explicitly link with the SpeedShop runtime, and call its API routine
     to	record a caliper sample; second, the user can define a signal to be
     used to record a caliper sample, by specifying the environment variable
     _SPEEDSHOP_CALIPER_POINT_SIG and send the target the specified signal;
     third, a caliper-sample trap may be set in	either dbx, or the WorkShop
     debugger.  In the current debuggers, this is done by planting a stop
     trap (breakpoint), and, when the process stops, evaluating the
     expression:
	       ssrt_caliper_point(0, 0)
     The evaluation of the expression always returns zero, but a side effect
     of	the evaluation is the recording	of the appropriate data.  After
     evaluation, process execution may be resumed.  See	the ssapi(3) man page
     for further details.
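
     The signal-based method can be sketched in shell as follows.  The
     signal number 16, the target ./a.out, and the five-second delay are
     hypothetical; the sketch also assumes that the background process ID
     reported by the shell is that of the target, and that the target does
     not install its own handler for the chosen signal.

```sh
#!/bin/sh
# Hypothetical signal number and target; adjust both for your system.
_SPEEDSHOP_CALIPER_POINT_SIG=16
export _SPEEDSHOP_CALIPER_POINT_SIG
target=./a.out

if command -v ssrun >/dev/null 2>&1 && [ -x "$target" ]; then
    ssrun -usertime "$target" &          # record in the background
    pid=$!
    sleep 5                              # wait for the phase of interest
    kill -"$_SPEEDSHOP_CALIPER_POINT_SIG" "$pid"   # record a caliper sample
    wait "$pid"
else
    echo "ssrun not available; caliper signal would be $_SPEEDSHOP_CALIPER_POINT_SIG"
fi
```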

USER ENVIRONMENT VARIABLE CONTROLS

     Various environment variables are normally	used to	control	the operation
     of	SpeedShop.  They are:

     _SPEEDSHOP_VERBOSE
	  causes a log of each program's operation to be written to stderr.
	  If it	is set to an empty string, only	major events are logged; if it
	  is set to a non-empty	string,	more detailed events are logged.

     _SPEEDSHOP_SILENT
	  if set, suppresses all output, other than fatal error	messages from
	  SpeedShop.  If both _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT are
	  set, _SPEEDSHOP_SILENT wins.

     _SPEEDSHOP_CALIPER_POINT_SIG <signal-number>
	  if specified,	gives a	signal number to be used for recording a
	  caliper-point	in the experiment.

     _SPEEDSHOP_POLLPOINT_CALIPER_POINT	<timer_type>,<timer_interval>
	  if specified,	defines	the timer type and the timer interval (in
	  seconds) for pollpoint caliper points.

     _SPEEDSHOP_OUTPUT_DIRECTORY
	  if specified,	the output data	files and the instrumented binaries
	  will be put in the named directory.

     _SPEEDSHOP_OUTPUT_FD
	  if specified,	gives the number of the	file descriptor	to be used for
	  writing the output file.  Note: this option is not supported in the
	  current release.

     _SPEEDSHOP_REUSE_FILE_DESCRIPTORS
	  if set, opens	and closes the file descriptors	for the	output files
	  every	time performance data is to be written.	If the target program
	  is using chdir(), then the _SPEEDSHOP_OUTPUT_DIRECTORY environment
	  variable should also be set to the full pathname of the directory
	  where	the output files are to	be put.

     _SPEEDSHOP_OUTPUT_FILENAME
	  if specified,	the given name will be used for	the output file;  if
	  _SPEEDSHOP_OUTPUT_DIRECTORY is also specified, it will be prepended
	  to the name.

     _SPEEDSHOP_HWC_COUNTER_NUMBER
	  specifies the	overflow counter to be used for	prof_hwc or
	  prof_hwctime experiments.  Counters are numbered between 0 and 31,
	  and are described in the MIPS	R10000 Microprocessor User's Manual,
	  Chapter 14.  Events for hardware counter 0 are numbered 0-15, and
	  events for hardware counter 1 are numbered 16-31.

     _SPEEDSHOP_HWC_COUNTER_OVERFLOW
	  specifies the	overflow value for the counter to be used in prof_hwc
	  or prof_hwctime experiments.	The value chosen may be	any number
	  greater than 0.  Some	choices	may produce data that is not
	  statistically	random,	but rather reflects a correlation between the
	  overflow interval and	a cyclic behavior in the application.  Users
	  may want to do two or	more runs with different overflow values.

     _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER
	  specifies the	profiling counter to be	used for prof_hwctime
	  experiments.	Counters are numbered between 0	and 31,	and are
	  described in the MIPS	R10000 Microprocessor User's Manual, Chapter
	  14.  Events for hardware counter 0 are numbered 0-15, and events
	  for hardware counter 1 are numbered 16-31.

     _SPEEDSHOP_HWC_COUNTER_SIGNAL_NUMBER
	  specifies the	profiling signal to be used for	prof_hwctime
	  experiments.

     _SPEEDSHOP_OUTPUT_NOCOMPRESS
	  if set, disables the compression of performance data.

     Other variables will be documented in future releases.
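
     The counter numbering used by _SPEEDSHOP_HWC_COUNTER_NUMBER and
     _SPEEDSHOP_HWC_COUNTER_PROF_NUMBER can be illustrated with a small
     shell function that splits a number in the range 0-31 into a hardware
     counter and an index within that counter.  The function name
     hwc_describe is hypothetical, and treating the index as the number
     modulo 16 is an assumption based on the ranges given above.

```sh
#!/bin/sh
# Split a SpeedShop counter number (0-31) into the hardware counter
# (0 or 1) and the index within it (assumed to be the number modulo 16).
hwc_describe() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;   # reject non-numeric input
    esac
    if [ "$n" -gt 31 ]; then
        echo "invalid"
        return 1
    fi
    echo "counter $((n / 16)), event $((n % 16))"
}

hwc_describe 0     # prints "counter 0, event 0"
hwc_describe 17    # prints "counter 1, event 1"
hwc_describe 31    # prints "counter 1, event 15"
```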

PROCESS	TRACKING ENVIRONMENT VARIABLE CONTROLS
     Various environment variables may be used for controlling the treatment
     of	processes spawned from the original target.  They are:

     _SPEEDSHOP_TRACE_FORK {True|False}
	  if True, specifies that processes spawned by calls to	fork() will be
	  monitored, if	they do	not call exec().  If they do call exec(), and
	  _SPEEDSHOP_TRACE_FORK_TO_EXEC	is not set to True, the	data covering
	  the time between the fork() and the exec() will be discarded.	 It is
	  True by default.  Note: in the current release, data will be
	  recorded independent of whether the process calls exec() or not.

     _SPEEDSHOP_TRACE_FORK_TO_EXEC {True|False}
	  if True, specifies that processes spawned by calls to fork() will be
	  monitored, even if they also call exec().  It	is False by default.

     _SPEEDSHOP_TRACE_EXEC {True|False}
	  if True, specifies that processes spawned by calls to any of the
	  various flavors of exec() will be monitored.  It is True by default.

     _SPEEDSHOP_TRACE_SPROC {True|False}
	  if True, specifies that processes spawned by calls to sproc() will be
	  monitored.  It is True by default.

     _SPEEDSHOP_TRACE_SYSTEM {True|False}
	  if True, specifies that processes spawned by calls to system() will be
	  monitored.  It is False by default.
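
     As a sketch of how the process tracking controls above might be
     combined, the following shell fragment asks SpeedShop to follow every
     kind of spawned process, overriding the two defaults that are False.
     It only sets and lists the variables; a later ssrun(1) started from
     the same shell would inherit them.

```sh
#!/bin/sh
# Follow children created by fork(), exec(), sproc() and system().
# True and False are the values documented above.
_SPEEDSHOP_TRACE_FORK=True
_SPEEDSHOP_TRACE_FORK_TO_EXEC=True    # default is False
_SPEEDSHOP_TRACE_EXEC=True
_SPEEDSHOP_TRACE_SPROC=True
_SPEEDSHOP_TRACE_SYSTEM=True          # default is False
export _SPEEDSHOP_TRACE_FORK _SPEEDSHOP_TRACE_FORK_TO_EXEC \
       _SPEEDSHOP_TRACE_EXEC _SPEEDSHOP_TRACE_SPROC _SPEEDSHOP_TRACE_SYSTEM

# Show the settings that a subsequent ssrun(1) would inherit.
env | grep '^_SPEEDSHOP_TRACE_' | sort
```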

EXPERT-MODE ENVIRONMENT	VARIABLE CONTROLS
     Various additional	environment variables may be used for debugging	and
     finer control of the operation of SpeedShop.  They	are:

     _SPEEDSHOP_SAMPLING_MODE
	  for PC-sampling and hardware-counter profiling, if set to 1, will
	  generate data	for the	base executable	only.  If it is	not set, or
	  set to anything other	than 1,	data is	generated for the executable
	  and all DSOs it uses.

     _SPEEDSHOP_INIT_DEFERRED_SIG <signal-number>
	  If specified,	initialization of the experiment will not be performed
	  when the target process starts, but rather will be delayed until the
	  specified signal is sent to the process.   A handler for the given
	  signal will be installed when	the process starts, and	it is the
	  user's responsibility to ensure that it is not overridden by the
	  target code.	If the process terminates before the signal is
	  received, no data will be recorded.

     _SPEEDSHOP_SHUTDOWN_SIG <signal-number>
	  If specified,	termination of the experiment will not be performed
	  when the target process exits, but rather will happen	when the
	  specified signal is sent to the process.   A handler for the given
	  signal will be installed when	the process starts, and	it is the
	  user's responsibility to ensure that it is not overridden by the
	  target code.	If the process terminates before the signal is
	  received, data is recorded normally.

     _SPEEDSHOP_EXPERIMENT_TYPE
	  passes the name of the experiment to the runtime.  It	is normally
	  set by ssrun(1), but may be overwritten.

     _SPEEDSHOP_MARCHING_ORDERS
	  passes the marching orders of	the experiment to the runtime.	It is
	  normally set by ssrun(1) from	the experiment type, but may be
	  overwritten.

     _SPEEDSHOP_SBRK_BUFFER_LENGTH
	  defines the segment grow size	for the	internal malloc	arena used.
	  This arena is	completely separate from the user's arena, and it
	  usually grows in default segments of size 0x100000.

     _SPEEDSHOP_SBRK_BUFFER_ADDR
	  defines the preferred	starting address to be used for	the internal
	  malloc arena.  This option has to be used with extreme care, since
	  it might result in overlapping memory regions.

     _SPEEDSHOP_FILE_BUFFER_LENGTH
	  defines the size of the buffer used for writing the experiment
	  files.  The default length is	64KB.  The buffer is only used for
	  writing many small records to	the file (as in	tracing	experiments);
	  large	records	are written directly, to avoid the buffering overhead.

     _SPEEDSHOP_DEBUG_NO_SIG_TRAPS
	  if set, disables the normal setting of signal	handlers for all fatal
	  and exit signals.

     _SPEEDSHOP_DEBUG_NO_STACK_UNWIND
	  if set, suppresses the stack unwind as done in usertime, totaltime,
	  or other callstack-based experiments.	 The option is used as a
	  workaround for various unwind	bugs in	libexc.

     _SPEEDSHOP_RLD
	  defines the full pathname to rld to be used and enables rld
	  profiling (for pcsamp	and hwc	experiments only).

     _SPEEDSHOP_INSTR_ARGS
	  defines additional instrumentation arguments.

INSTRUMENTATION

     Instrumentation is	done with the pixie(1) command,	invoked	automatically
     by	ssrun(1), and, if necessary for	DSOs that are opened during a run, by
     the runtime library.  Users normally would	not invoke pixie(1) directly.

     In	the current release, instrumented executables and DSOs appear in the
     current working directory.	 In a future release, the DSOs will be cached.

SPEEDSHOP API ROUTINES

     The SpeedShop API routines	are defined in the include file
     "SpeedShop/api.h",	installed in /usr/include.  It defines three entry
     points, described in the SpeedShop API man page, ssapi(3).

SPEEDSHOP CUSTOM DATA CAPTURE ROUTINES

     The SpeedShop facility for	users to add custom data capture routines is
     not available in the current release.

MISCELLANEOUS UTILITY PROGRAMS

     Several utility programs are provided, in addition to the main
     functionality in SpeedShop.  They are:

     sscord, ssorder
	  are used to generate cord feedback files from recorded
	  data.	 sswsextr is a script to produce the working-set files used
	  for cord computations.  See their respective man pages for more
	  information.

     ssusage
	  is a variant of time(1) that prints more information about the
	  resource usage of a program.	See ssusage(1) for more	information.

     ssdump
	  is a program which produces a	formatted dump of a SpeedShop
	  experiment.

     squeeze
	  is a program which allocates and locks down memory, making the
	  system behave as if it had less physical memory than it really does.
	  See squeeze(1) for more information.

     thrash
	  is a program that allocates memory, and then touches all of the
	  pages, in order to force other pages out of the system's physical
	  memory.  See thrash(1) for more information.

     fbdump
	  is a program that dumps out the contents of the compiler feedback
	  files produced by the -feedback option to prof(1).  See fbdump(1)
	  and prof(1) for more information.

CAVEATS

     Depending on the revision of the R10000 CPUs there	is a difference	in the
     interpretation of counter number 14 (``Virtual coherency condition'' for
     parts before revision 3.1 or ``ALU/FPU completion cycles''	for parts at
     revision 3.1 or later).  There are	also some subtle differences in	the
     semantics of some of the counters.

     In	systems	with a homogeneous deployment of CPUs at the same revision,
     speedshop will adjust the reported	information accordingly.

     For systems with a	mixed deployment of CPU	revisions including some
     before 3.1	and some at or after 3.1, the interpretation of	counter	14 is
     undefined,	and there may be some slight inaccuracies due to aggregation
     of	counters with different	semantics across all CPUs.

     Identification of the revisions for all CPUs can be made using the	-v
     option to hinv(1).