volwatch - Monitors the Logical Storage Manager (LSM) for
failure events and performs hot sparing
/usr/sbin/volwatch [-m] [-s] [-o] [mail-addresses...]
-m  Runs volwatch with mail notification support to notify
    root (by default) or other specified users when a
    failure occurs. This option is enabled by default.

-s  Runs volwatch with hot-spare support.

-o  Specifies an argument to pass directly to volrecover if
    it is running and hot-spare support is enabled.
The volwatch command monitors LSM, waiting for exception
events to occur. When an exception event occurs, the volwatch
command uses mailx(1) to send mail to:

  -  The root account.

  -  The user accounts specified when you use the rcmgr
     command to set the VOLWATCH_USERS variable in the
     /etc/rc.config.common file.

  -  Any user accounts that you specify on the command line
     with the volwatch command.
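For example, to have failure mail sent to an additional
account, you might set the VOLWATCH_USERS variable with a
command such as the following, where admin1 is a
hypothetical account and the -c flag (directing the change
to /etc/rc.config.common) is an assumption; see rcmgr(8):

     # rcmgr -c set VOLWATCH_USERS "root admin1"

Alternatively, per the synopsis, mail addresses can be
given directly on the command line:

     # volwatch -m root admin1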
The volwatch command uses the volnotify command to wait
for events to occur. When an event occurs, there is a
15-second delay before the failure is analyzed and the message
is sent. This delay allows a group of related events
to be collected and reported in a single mail message. By
default, the volwatch command automatically starts when
the system boots.
You can enter the volwatch -s command to start the volwatch
command with hot-spare support. Hot-spare support:

  -  Detects LSM events resulting from the failure of a
     disk, plex, or RAID5 subdisk.

  -  Sends mail to the root account (and other specified
     accounts) notifying them of the failure and identifying
     the affected LSM objects.

  -  Determines which subdisks to relocate, finds space for
     those subdisks in the disk group, relocates the
     subdisks, and notifies the root account (and other
     specified accounts) of these actions and their success
     or failure.
When a partial disk failure occurs (that is, a
failure affecting only some subdisks on a disk),
redundant data on the failed portion of the disk is
relocated, and the existing volumes composed of the
unaffected portions of the disk remain accessible.
Note
Hot-sparing is performed only for redundant (mirrored or
RAID5) subdisks on a failed disk. Non-redundant subdisks
on a failed disk are not relocated, but you are notified
of the failure.
Only one volwatch daemon can be running on a system or
cluster node at any time.
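To verify that no other volwatch daemon is running before
starting one, a generic process listing suffices; for
example:

     # ps -e | grep volwatch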
Hot-sparing does not guarantee the same layout of data or
the same performance after relocation. You may want to
make some configuration changes after hot-sparing occurs.
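For example, to review the resulting layout before making
changes, you might display the disk group configuration
with volprint, where dg_name is a placeholder for the
affected disk group:

     # volprint -g dg_name -ht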
Mail Notification Support
The following is a sample mail notification when a failure
is detected:

Failures have been detected by the Logical Storage Manager:
failed disks:
medianame
...
failed plexes:
plexname
...
failed log plexes:
plexname
...
failing disks:
medianame
...
failed subdisks:
subdiskname
...
The Logical Storage Manager will attempt to find spare
disks, relocate failed subdisks and then recover the data
in the failed plexes.
The following describes the sections of the mail message:

  -  The medianame list under failed disks specifies disks
     that appear to have completely failed.

  -  The medianame list under failing disks indicates a
     partial disk failure or a disk that is in the process
     of failing. When a disk has failed completely, the same
     medianame appears under both failed disks and failing
     disks.

  -  The plexname list under failed plexes shows plexes that
     were detached due to I/O failures encountered while
     attempting to do I/O to the subdisks they contain.

  -  The plexname list under failed log plexes indicates
     RAID5 or dirty region log (DRL) plexes that have
     experienced failures.

  -  The subdiskname list under failed subdisks specifies
     subdisks in RAID5 volumes that were detached due to I/O
     errors.
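To corroborate the failures named in the message, you can
list disk status and object states. The exact output fields
vary by release, so treat the following as a sketch:

     # voldisk list
     # volprint -Aht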
Enabling Hot-Sparing
By default, hot-sparing is disabled. To enable hot-sparing,
enter the volwatch command with the -s option. For example:

     # volwatch -s
To use hot-spare support, you should configure a disk as a
spare, which identifies the disk as an available site for
relocating failed subdisks. Disks that are identified as
spares are not used for normal allocations unless you
explicitly specify otherwise. This ensures that there is a
pool of spare disk space available for relocating failed
subdisks and that this disk space is not consumed by normal
operations.
Spare disk space is the first space used to relocate
failed subdisks. However, if no spare disk space is
available or if the available spare disk space is not
suitable or sufficient, free disk space is used.
You must initialize a spare disk and place it in a disk
group as a spare before it can be used for replacement
purposes. If no disks are designated as spares when a
failure occurs, LSM automatically uses any available free
disk space in the disk group in which the failure occurs.
If there is not enough spare disk space, a combination of
spare disk space and free disk space is used.
When hot-sparing selects a disk for relocation, it preserves
the redundancy characteristics of the LSM object to
which the relocated subdisk belongs. For example, hot-sparing
ensures that subdisks from a failed plex are not
relocated to a disk containing a mirror of the failed
plex. If redundancy cannot be preserved using available
spare disks and/or free disk space, hot-sparing does not
take place. If relocation is not possible, mail is sent
indicating that no action was taken.
When hot-sparing takes place, the failed subdisk is
removed from the configuration database and LSM takes precautions
to ensure that the disk space used by the failed
subdisk is not recycled as free disk space.
Initializing and Removing Hot-Spare Disks
Although hot-sparing does not require you to designate
disks as spares, HP recommends that you initialize at
least one disk as a spare within each disk group; this
gives you control over which disks are used for relocation.
If no spare disks exist, LSM uses available free
disk space within the disk group. When free disk space is
used for relocation purposes, it is likely that there may
be performance degradation after the relocation.
Follow these guidelines when choosing a disk to configure
as a spare:

  -  The hot-spare feature works best if you specify at
     least one spare disk in each disk group containing
     mirrored or RAID5 volumes.

  -  If a given disk group spans multiple controllers and
     has more than one spare disk, set up the spare disks on
     different controllers (in case one of the controllers
     fails).

  -  For a mirrored volume, the disk group must have at
     least one disk that does not already contain one of the
     volume's mirrors. This disk should either be a spare
     disk with some available space or a regular disk with
     some free space.

  -  For a mirrored and striped volume, the disk group must
     have at least one disk that does not already contain
     one of the volume's mirrors or another subdisk in the
     striped plex. This disk should either be a spare disk
     with some available space or a regular disk with some
     free space.

  -  For a RAID5 volume, the disk group must have at least
     one disk that does not already contain the volume's
     RAID5 plex or one of its log plexes. This disk should
     either be a spare disk with some available space or a
     regular disk with some free space.

  -  If a mirrored volume has a DRL log subdisk as part of
     its data plex (that is, volprint does not list the plex
     length as LOGONLY), that plex cannot be relocated.
     Therefore, place log subdisks in plexes that contain no
     data (log plexes). By default, the volassist command
     creates log plexes, as shown in the example after this
     list.

  -  For mirroring the root disk, the rootdg disk group
     should contain an empty spare disk that satisfies the
     restrictions for mirroring the root disk.

  -  Although it is possible to build LSM objects on spare
     disks, it is preferable to use spare disks for
     hot-spare only.

  -  When relocating subdisks off a failed disk, LSM
     attempts to use a spare disk large enough to hold all
     data from the failed disk.
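As referenced in the guidelines above, the following is a
minimal sketch of placing a DRL log in a dedicated log
plex, assuming the volassist addlog operation and a
hypothetical volume vol01 in disk group dg_name:

     # volassist -g dg_name addlog vol01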
To initialize a disk that has no associated subdisks as a
spare, use the voldiskadd command and enter y at the
following prompt:

     Add disk as a spare disk for newdg? [y,n,q,?] (default: n) y
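For example, a session might begin with a command such as
the following, where dsk10 is a hypothetical disk device
name (see voldiskadm(8) for the menu-driven equivalent):

     # voldiskadd dsk10

The command then prompts for a disk group and, as shown
above, whether to add the disk as a spare.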
To initialize an existing LSM disk as a spare disk, enter:

     # voledit set spare=on medianame

For example, to initialize a disk called test03 as a spare
disk, enter:

     # voledit set spare=on test03

To remove a disk as a spare, enter:

     # voledit set spare=off medianame

For example, to make a disk called test03 available for
normal use, enter:

     # voledit set spare=off test03
Replacement Procedure
In the event of a disk failure, mail is sent, and, if
volwatch was started with hot-spare support (the -s
option), volwatch attempts to relocate any subdisks that
appear to have failed. This involves finding an appropriate
spare disk or free disk space in the same disk group as the
failed subdisk.
To determine which disk from among the eligible spare
disks to use, volwatch tries to use the disk that is closest
to the failed disk. Closeness depends on the controller,
target, and disk number of the failed
disk. For example, a disk on the same controller as the
failed disk is closer than a disk on a different controller;
a disk under the same target as the failed disk
is closer than one under a different target.
If no spare or free disk space is found, the following
mail message is sent explaining the disposition of volumes
on the failed disk:

Relocation was not successful for subdisks on disk dm_name
in volume v_name in disk group dg_name. No replacement was
made and the disk is still unusable.
The following volumes have storage on medianame:
volumename ...
These volumes are still usable, but the redundancy of
those volumes is reduced. Any RAID-5 volumes with storage
on the failed disk may become unusable in the face of further
failures.
If non-RAID5 volumes are made unusable due to the failure
of the disk, the following is included in the mail message:

The following volumes:
volumename ...
have data on medianame but have no other usable mirrors on
other disks. These volumes are now unusable and the data
on them is unavailable. These volumes must have their
data restored.
If RAID5 volumes are made unavailable due to the disk
failure, the following is included in the mail message:

The following RAID-5 volumes:
volumename ...
have storage on medianame and have experienced other failures.
These RAID-5 volumes are now unusable and data on
them is unavailable. These RAID-5 volumes must have their
data restored.
If spare disk space is found, LSM attempts to set up a
subdisk on the spare disk and use it to replace the failed
subdisk. If this is successful, the volrecover command
runs in the background to recover the data in volumes on
the failed disk.
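If you need to rerun the recovery manually, a sketch such
as the following could start it in the background, assuming
the -s (start volumes) and -b (background) options behave
as described in volrecover(8), with dg_name a placeholder
for the disk group:

     # volrecover -g dg_name -sb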
If the relocation fails, the following mail message is
sent:

Relocation was not successful for subdisks on disk dm_name
in volume v_name in disk group dg_name. No replacement was
made and the disk is still unusable.
error message
If any volumes (RAID5 or otherwise) are rendered unusable
due to the failure, the following is included in the mail
message:

The following volumes:
volumename ...
have data on dm_name but have no other usable mirrors on
other disks. These volumes are now unusable and the data
on them is unavailable. These volumes must have their data
restored.
If the relocation procedure completes successfully and
recovery is under way, the following mail message is sent:

Volume v_name Subdisk sd_name relocated to newsd_name, but
not yet recovered.
Once recovery has completed, a message is sent relaying
the outcome of the recovery procedure. If the recovery was
successful, the following is included in the mail message:

Recovery complete for volume v_name in disk group dg_name.

If the recovery was not successful, the following is
included in the mail message:

Failure recovering v_name in disk group dg_name.
See Also

mailx(1), rcmgr(8), voldiskadm(8), voledit(8), volintro(8),
volrecover(8), volrootmir(8)