availmon - overview of system availability monitoring facilities
The availability monitor (availmon) is a set of programs that are
integrated with SGI Embedded Support Partner (a.k.a ESP; see esp(5) for
more details) to collectively monitor and report the availability of a
system and the diagnosis of system crashes. For unexpected reboots,
availmon identifies the cause of the reboot by gathering information from
diagnostic programs such as icrash(1M), which includes results from the
FRU analyzer when available, and syslog (see syslog(3C)), and system
configuration information from configmon(1M), versions(1M), hinv(1M) and
Availmon can send availability and diagnostic information to various
locations, depending on configuration; it can provide local system
availability statistics and reboot history reporting.
All availmon capabilities are configurable from SGI ESP User Interface.
Availmon, by default will not automatically send availmon reports on
reboot. In all cases, the Automatic e-mail distribution flag must be
enabled for availmon to send reports.
Availmon reporting centers around events. Any system reboot is an
availmon event, whether a controlled shutdown or an "unscheduled" reboot,
such as a power interruption or a "crash". An event record contains the
time at which the system was previously booted, which starts the event
period, the time the event occurred, which ends the period of "uptime",
the reason for the event, and the time that the system was rebooted. If
the system stopped as a result of a hang, the exact instant at which it
stopped is not easily known; this time is obtained from SGI ESP Event
Monitor (see eventmond(1M) for more details) if amconfig tickerd flag is
Events are grouped as either "Service Action" events, or "Unscheduled"
events. Service Action events are controlled shutdowns, initiated by
operators through shutdown(1M), hal
). For such
controlled shutdowns, a (configurable) prompt is given to identify the
reason for the shutdown. Unscheduled events include system panics, and
system interrupts (power failures, power cycles, system resets etc.).
Panics are identified as either due to hardware or due to software or due
to unknown reasons. This distinction is based strictly on results of the
FRU analyzer, if present.
Availmon generates three types of reports: availability, diagnosis and
pager. Availability reports consist of the system serial number, full
hostname/internet address, the previous system start time, the time of
the event, the reason for the event (the event code), uptime, start time
(following the reboot), and a summary of the reason for the event where
Diagnosis reports include all data from an availability report, and
additionally may contain the icrash analysis report, FRU analyzer result,
important syslog messages, and system hardware/software configuration and
version information. Important syslog messages include error messages
and all messages logged by sysctlrd and syslogd, since the last reboot.
Duplicated messages are eliminated even if not consecutive; the first
such message is retained with its time stamp, and the number of
duplicated messages and the last time stamp are appended. System
software version information is limited to version output for the
operating system and installed patches.
Pager reports are intended for "chatty pagers", and include only the
system hostname, a brief description of the reason for the event, and the
summary, if present.
Availability information for the local system is always permanently
stored in SGI ESP database with the help of esplogger(1). Files in
/var/adm/avail are maintained by availmon and should not be deleted,
modified, or moved.
Once availmon is installed, "registration" is required before availmon
reports are automatically distributed, and if desired, other options may
also be configured. Registration of a system can normally be
accomplished simply by enabling the flag autoemail using amconfig
autoemail on. The default distribution of email reports is to send a
diagnosis report to email@example.com, which enters the report into
the SGI Availmon Database. Further adjustments of Email distribution are
described in the next paragraph.
There are several other configuration options that can prove useful. One
is to configure sending availmon reports from one or more systems to a
standard system administrator email alias. This provides real-time
notification of system activity. Another similar option is to configure
availmon pager reports for real-time notification to "chatty" pagers.
Or, availmon diagnostic reports may be sent to a local support office, or
to a system administrator for detailed evaluation. To perform those
adjustmentd of Email distribution, just use amconfig autoemail.list.
Availmon can also generate periodic status reports that indicate that a
system is still running and "registered" to send email reports. This is
controlled by the Number of days between status updates configuration
value, which defaults to 7 days. Such reports are sent by the eventmond,
so they are sent only if the amconfig tickerd configuration flag is on.
NOTE:That option is now deprecated in favor of an eventmond command-line
Even where sending of availmon reports is not enabled, local system
availability data is always maintained, and Reports->Availability option
can be chosen from SGI ESP User Interface to produce statistical or event
detail reports for the local system.
The Reports->Availability option of User Interface reviews saved
availability report information and provides statistical and event
history reports. Also, espreport availability command ASCII interface
can be used. By default, it processes the availability data on the local
system. It can also process aggregate site data; that is, an
accumulation of availmon data from different systems. Please refer to
SGI ESP User Guide on how to setup your system to collect availability
data from different systems.
uptime in seconds since Jan 1, 1970 (written by eventmond)
location temporary availmon files: availreport.*, diagreport.*,
init script that logs start/stop and initiates notification
espreport(1), esplogger(1), Mail(1), amconfig(1M), amparse(1M),
amreceive(1M), amsyslog(1M), amtime1970(1M), configmon(1M),
eventmond(1M), halt(1M), hinv(1M), icrash(1M), init(1M), shutdown(1M),
versions(1M), syslogd(1M), syslog(3C), esp(5).
SGI Embedded Support Partner User Guide.
PPPPaaaaggggeeee 3333 [ Back ]