raidctl - OpenBSD

RAIDCTL(8)

NAME
SYNOPSIS
DESCRIPTION
- Configuration file
EXAMPLES
WARNINGS
FILES
SEE ALSO
HISTORY
BUGS
COPYRIGHT

NAME

     raidctl - configuration utility for the RAIDframe disk driver

SYNOPSIS

     raidctl [-v] [-afFgrR component] [-BGipPsSu] [-cC config_file]
             [-A [yes | no | root]] [-I serial_number] dev

DESCRIPTION

     raidctl is the user-land control program for raid(4), the RAIDframe disk
     device.  raidctl is primarily used to dynamically configure and
     unconfigure RAIDframe disk devices.  For more information about the
     RAIDframe disk device, see raid(4).

     This document assumes the reader has at least rudimentary knowledge of
     RAID and RAID concepts.

     The device used by raidctl is specified by dev.  dev may be either the
     full name of the device, e.g. /dev/rraid0c, or just simply raid0 (for
     /dev/rraid0c).

     For several commands (-BGipPsSu), raidctl can accept the word all as the
     dev argument.  If all is used, raidctl will execute the requested action
     for all the configured raid(4) devices.

     The command-line options for raidctl are as follows:

     -a component dev
             Add component as a hot spare for the device dev.

     -A yes dev
             Make the RAID set auto-configurable.  The RAID set will be
             automatically configured at boot before the root file system is
             mounted.  Note that all components of the set must be of type
             RAID in the disklabel.

     -A no dev
             Turn off auto-configuration for the RAID set.

     -A root dev
             Make the RAID set auto-configurable, and also mark the set as
             being eligible to contain the root partition.  A RAID set
             configured this way will override the use of the boot disk as the
             root device.  All components of the set must be of type RAID in
             the disklabel.  Note that the kernel being booted must currently
             reside on a non-RAID set and, in order to have the root file
             system correctly mounted from it, the RAID set must have its `a'
             partition (aka raid[0..n]a) set up.

     -B dev  Initiate a copyback of reconstructed data from a spare disk to
             its original disk.  This is performed after a component has
             failed, and the failed drive has been reconstructed onto a spare
             drive.

     -c config_file dev
             Configure the RAIDframe device dev according to the configuration
             given in config_file.  A description of the contents of
             config_file is given later.

     -C config_file dev
             As for -c, but forces the configuration to take place.  This is
             required the first time a RAID set is configured.

     -f component dev
             This marks the specified component as having failed, but does not
             initiate a reconstruction of that component.

     -F component dev
             Fails the specified component of the device, and immediately
             begins a reconstruction of the failed disk onto an available hot
             spare.  This is one of the mechanisms used to start the
             reconstruction process if a component does have a hardware
             failure.

     -g component dev
             Get the component label for the specified component.

     -G dev  Generate the configuration of the RAIDframe device in a format
             suitable for use with raidctl -c or -C.

     -i dev  Initialize the RAID device.  In particular, (re-write) the parity
             on the selected device.  This MUST be done for all RAID sets
             before the RAID device is labeled and before file systems are
             created on the RAID device.

     -I serial_number dev
             Initialize the component labels on each component of the device.
             serial_number is used as one of the keys in determining whether a
             particular set of components belong to the same RAID set.  While
             not strictly enforced, different serial numbers should be used
             for different RAID sets.  This step MUST be performed when a new
             RAID set is created.

     -p dev  Check the status of the parity on the RAID set.  Displays a
             status message, and returns successfully if the parity is
             up-to-date.

     -P dev  Check the status of the parity on the RAID set, and initialize
             (re-write) the parity if the parity is not known to be
             up-to-date.  This is normally used after a system crash (and
             before a fsck(8)) to ensure the integrity of the parity.

     -r component dev
             Remove the spare disk specified by component from the set of
             available spare components.

     -R component dev
             Fails the specified component, if necessary, and immediately
             begins a reconstruction back to component.  This is useful for
             reconstructing back onto a component after it has been replaced
             following a failure.

     -s dev  Display the status of the RAIDframe device for each of the
             components and spares.

     -S dev  Check the status of parity re-writing, component reconstruction,
             and component copyback.  The output indicates the amount of
             progress achieved in each of these areas.

     -u dev  Unconfigure the RAIDframe device.

     -v      Be more verbose.  For operations such as reconstructions, parity
             re-writing, and copybacks, provide a progress  indicator.

   Configuration file
     The format of the configuration file is complex, and only an abbreviated
     treatment is given here.  In the configuration files, a `#' indicates the
     beginning of a comment.

     There are 4 required sections of a configuration file, and 2 optional
     sections.  Each section begins with a `START', followed by the section
     name, and the configuration parameters associated with that section.  The
     first section is the `array' section, and it specifies the number of
     rows, columns, and spare disks in the RAID set.  For example:

           START array
           1 3 0

     indicates an array with 1 row, 3 columns, and 0 spare disks.  Note that
     although multi-dimensional arrays may be specified, they are NOT
     supported in the driver.

     The second section, the `disks' section, specifies the actual components
     of the device.  For example:

           START disks
           /dev/sd0e
           /dev/sd1e
           /dev/sd2e

     specifies the three component disks to be used in the RAID device.  If
     any of the specified drives cannot be found when the RAID device is
     configured, then they will be marked as `failed', and the system will
     operate in degraded mode.  Note that it is imperative that the order of
     the components in the configuration file does not change between
     configurations of a RAID device.  Changing the order of the components
     will result in data loss if the set is configured with the -C option.
     In normal circumstances, the RAID set will not configure if only -c is
     specified, and the components are out-of-order.

     The next section, which is the `spare' section, is optional, and, if
     present, specifies the devices to be used as `hot spares' -- devices
     which are on-line, but are not actively used by the RAID driver unless
     one of the main components fails.  A simple `spare' section might be:

           START spare
           /dev/sd3e

     for a configuration with a single spare component.  If no spare drives
     are to be used in the configuration, then the `spare' section may be
     omitted.

     The next section is the `layout' section.  This section describes the
     general layout parameters for the RAID device, and provides such
     information as sectors per stripe unit, stripe units per parity unit,
     stripe units per reconstruction unit, and the parity configuration to
     use.  This section might look like:

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
           32 1 1 5

     The sectors per stripe unit specifies, in blocks, the interleave factor;
     i.e. the number of contiguous sectors to be written to each component for
     a single stripe.  Appropriate selection of this value (32 in this
     example) is the subject of much research in RAID architectures.  The
     stripe units per parity unit and stripe units per reconstruction unit are
     normally each set to 1.  While certain values above 1 are permitted, a
     discussion of valid values and the consequences of using anything other
     than 1 are outside the scope of this document.  The last value in this
     section (5 in this example) indicates the parity configuration desired.
     Valid entries include:

     0     RAID level 0.  No parity, only simple striping.

     1     RAID level 1.  Mirroring.  The parity is the mirror.

     4     RAID level 4.  Striping across components, with parity stored on
           the last component.

     5     RAID level 5.  Striping across components, parity distributed
           across all components.

     There are other valid entries here, including those for Even-Odd parity,
     RAID level 5 with rotated sparing, Chained declustering, and Interleaved
     declustering, but as of this writing the code for those parity operations
     has not been tested with OpenBSD.
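
     The layout values translate into on-disk sizes with simple arithmetic.
     The following is a minimal sketch and not part of raidctl.  It assumes
     512-byte sectors and the usual definitions of the RAID levels listed
     above (RAID 4 and 5 give up one column per stripe to parity, RAID 1
     stores one copy of the data per stripe).  The name stripe_geometry is
     hypothetical.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def stripe_geometry(sect_per_su, num_col, raid_level):
    """Return (bytes per stripe unit, data bytes per full stripe)."""
    if raid_level == 0:
        data_cols = num_col          # striping only, no parity
    elif raid_level == 1:
        data_cols = 1                # the other column is the mirror
    elif raid_level in (4, 5):
        data_cols = num_col - 1      # one column per stripe holds parity
    else:
        raise ValueError(f"RAID level {raid_level} is not covered by this sketch")
    su_bytes = sect_per_su * SECTOR_BYTES
    return su_bytes, su_bytes * data_cols


# "32 1 1 5" on a 3-column array: 16 KB stripe units, 32 KB of data per stripe.
print(stripe_geometry(32, 3, 5))
```

     A write that covers a whole stripe lets the driver compute parity
     without first reading old data, which is one reason the choice of
     sectors per stripe unit matters.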

     The next required section is the `queue' section.  This is most often
     specified as:

           START queue
           fifo 100

     where the queuing method is specified as FIFO (First-In, First-Out), and
     the size of the per-component queue is limited to 100 requests.  Other
     queuing methods may also be specified, but a discussion of them is beyond
     the scope of this document.

     The final section, the `debug' section, is optional.  For more details on
     this the reader is referred to the RAIDframe documentation discussed in
     the HISTORY section.  See EXAMPLES for a more complete configuration file
     example.
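
     Because a mistake in the configuration file can be costly, it may help
     to sanity-check a file before handing it to raidctl.  The following is a
     minimal sketch of such a check, based only on the section rules
     described above.  It is not how raidctl itself parses the file, and the
     names parse_raid_config and check_config are hypothetical.

```python
REQUIRED = ("array", "disks", "layout", "queue")


def parse_raid_config(text):
    """Split configuration text into {section name: [fields of each line]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # a '#' begins a comment
        if not line:
            continue
        fields = line.split()
        if fields[0] == "START":
            if len(fields) != 2:
                raise ValueError(f"malformed section header: {raw!r}")
            current = fields[1]
            sections[current] = []
        elif current is None:
            raise ValueError(f"parameter before any START line: {raw!r}")
        else:
            sections[current].append(fields)
    return sections


def check_config(sections):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"missing required section: {name}"
                for name in REQUIRED if name not in sections]
    if problems:
        return problems
    array = sections["array"]
    if len(array) != 1 or len(array[0]) != 3:
        return ["array section must hold one line: numRow numCol numSpare"]
    rows, cols, spares = (int(v) for v in array[0])
    if rows != 1:
        problems.append("multi-row arrays are not supported by the driver")
    if len(sections["disks"]) != rows * cols:
        problems.append("number of disks does not match numRow * numCol")
    if len(sections.get("spare", [])) != spares:
        problems.append("number of spares does not match numSpare")
    return problems


EXAMPLE = """\
START array
# numRow numCol numSpare
1 3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
"""

config = parse_raid_config(EXAMPLE)
print(check_config(config))  # prints [] when no problems are found
```

     The check is deliberately shallow.  It cannot detect components listed
     in the wrong order, which is the mistake that matters most.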

EXAMPLES

     It is highly recommended that before using the RAID driver for real file
     systems that the system administrator(s) become quite familiar with the
     use of raidctl, and that they understand how the component reconstruction
     process works.  The examples in this section will focus on configuring a
     number of different RAID sets of varying degrees of redundancy.  By
     working through these examples, administrators should be able to develop
     a good feel for how to configure a RAID set, and how to initiate
     reconstruction of failed components.

     In the following examples `raid0' will be used to denote the RAID device.
     `/dev/rraid0c' may be used in place of `raid0'.

   Initialization and Configuration
     The initial step in configuring a RAID set is to identify the components
     that will be used in the RAID set.  All components should be the same
     size.  Each component should have a disklabel type of FS_RAID, and a
     typical disklabel entry for a RAID component might look like:

           f:   1800000   200495      RAID               # (Cyl.  405*- 4041*)

     While FS_BSDFFS (e.g. 4.2BSD) will also work as the component type, the
     type FS_RAID (e.g. RAID) is preferred for RAIDframe use, as it is
     required for features such as auto-configuration.  As part of the initial
     configuration of each RAID set, each component will be given a `component
     label'.  A `component label' contains important information about the
     component, including a user-specified serial number, the row and column
     of that component in the RAID set, the redundancy level of the RAID set,
     a `modification counter', and whether the parity information (if any) on
     that component is known to be correct.  Component labels are an integral
     part of the RAID set, since they are used to ensure that components are
     configured in the correct order, and used to keep track of other vital
     information about the RAID set.  Component labels are also required for
     the auto-detection and auto-configuration of RAID sets at boot time.  For
     a component label to be considered valid, that particular component label
     must be in agreement with the other component labels in the set.  For
     example, the serial number, `modification counter', number of rows and
     number of columns must all be in agreement.  If any of these are
     different, then the component is not considered to be part of the set.
     See raid(4) for more information about component labels.
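
     The agreement rule can be illustrated with a short sketch.  It groups
     components by the four fields named above, so a set is consistent when
     exactly one group results.  This is only an illustration of the rule as
     stated here: the dictionary keys and the name group_by_agreement are
     hypothetical, and how the driver itself resolves a disagreement is not
     covered.

```python
from collections import defaultdict

# Fields that must match across all component labels of one RAID set.
AGREEMENT_KEYS = ("serial_number", "mod_counter", "num_rows", "num_columns")


def group_by_agreement(labels):
    """Map each distinct tuple of agreement fields to the components holding it."""
    groups = defaultdict(list)
    for component, label in labels.items():
        key = tuple(label[k] for k in AGREEMENT_KEYS)
        groups[key].append(component)
    return dict(groups)


labels = {
    "/dev/sd1e": {"serial_number": 112341, "mod_counter": 65,
                  "num_rows": 1, "num_columns": 3},
    "/dev/sd2e": {"serial_number": 112341, "mod_counter": 65,
                  "num_rows": 1, "num_columns": 3},
    # Hypothetical stale component: its modification counter lags behind.
    "/dev/sd3e": {"serial_number": 112341, "mod_counter": 64,
                  "num_rows": 1, "num_columns": 3},
}

groups = group_by_agreement(labels)
for key, members in groups.items():
    print(key, members)
print("consistent" if len(groups) == 1 else "labels disagree")
```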

     Once the components have been identified, and the disks have appropriate
     labels, raidctl is then used to configure the raid(4) device.  To
     configure the device, a configuration file which looks something like:

           START array
           # numRow numCol numSpare
           1 3 1

           START disks
           /dev/sd1e
           /dev/sd2e
           /dev/sd3e

           START spare
           /dev/sd4e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
           32 1 1 5

           START queue
           fifo 100

     is created in a file.  The above configuration file specifies a RAID 5
     set consisting of the components /dev/sd1e, /dev/sd2e, and /dev/sd3e,
     with /dev/sd4e available as a `hot spare' in case one of the three main
     drives should fail.  A RAID 0 set would be specified in a similar way:

           START array
           # numRow numCol numSpare
           1 4 0

           START disks
           /dev/sd10e
           /dev/sd11e
           /dev/sd12e
           /dev/sd13e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
           64 1 1 0

           START queue
           fifo 100

     In this case, devices /dev/sd10e, /dev/sd11e, /dev/sd12e, and /dev/sd13e
     are the components that make up this RAID set.  Note that there are no
     hot spares for a RAID 0 set, since there is no way to recover data if any
     of the components fail.

     For a RAID 1 (mirror) set, the following configuration might be used:

           START array
           # numRow numCol numSpare
           1 2 0

           START disks
           /dev/sd20e
           /dev/sd21e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
           128 1 1 1

           START queue
           fifo 100

     In this case, /dev/sd20e and /dev/sd21e are the two components of the
     mirror set.  While no hot spares have been specified in this
     configuration, they easily could be, just as they were specified in the
     RAID 5 case above.  Note as well that RAID 1 sets are currently limited
     to only 2 components.  At present, n-way mirroring is not possible.

     The first time a RAID set is configured, the -C option must be used:

           # raidctl -C raid0.conf raid0

     where `raid0.conf' is the name of the RAID configuration file.  The -C
     forces the configuration to succeed, even if any of the component labels
     are incorrect.  The -C option should not be used lightly in situations
     other than initial configurations: if the system is refusing to
     configure a RAID set, there is probably a very good reason for it.  After
     the initial configuration is done (and appropriate component labels are
     added with the -I option) then raid0 can be configured normally with:

           # raidctl -c raid0.conf raid0

     When the RAID set is configured for the first time, it is necessary to
     initialize the component labels, and to initialize the parity on the RAID
     set.  Initializing the component labels is done with:

           # raidctl -I 112341 raid0

     where `112341' is a user-specified serial number for the RAID set.  This
     initialization step is required for all RAID sets.  Also, using different
     serial numbers between RAID sets is strongly encouraged, as using the
     same serial number for all RAID sets will only serve to decrease the
     usefulness of the component label checking.

     Initializing the RAID set is done via the -i option.  This initialization
     MUST be done for all RAID sets, since among other things it verifies that
     the parity (if any) on the RAID set is correct.  Since this
     initialization may be quite time-consuming, the -v option may be also
     used in conjunction with -i:

           # raidctl -iv raid0

     This will give more verbose output on the status of the initialization:

           Initiating re-write of parity
           Parity Re-write status:
            10%  |****                                    |  ETA: 06:03 /

     The output provides a `Percent Complete' in both a numeric and graphical
     format, as well as an estimated time to completion of the operation.

     Since it is the parity that provides the `redundancy' part of RAID, it is
     critical that the parity is correct as much as possible.  If the parity
     is not correct, then there is no guarantee that data will not be lost if
     a component fails.

     Once the parity is known to be correct, it is then safe to perform
     disklabel(8), newfs(8), or fsck(8) on the device or its filesystems, and
     then to mount the filesystems for use.
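
     The first-time sequence described above (force the configuration, write
     the component labels, then initialize the parity) can be summarized as
     an ordered command list.  The sketch below only builds and prints the
     commands; it does not run them.  The options used are the ones
     documented on this page, while the name first_time_setup is
     hypothetical.

```python
def first_time_setup(dev, config_file, serial_number, verbose=True):
    """Return the raidctl invocations for a brand-new RAID set, in order."""
    init_flag = "-iv" if verbose else "-i"
    return [
        ["raidctl", "-C", config_file, dev],         # forced initial configuration
        ["raidctl", "-I", str(serial_number), dev],  # write component labels
        ["raidctl", init_flag, dev],                 # initialize (re-write) parity
    ]


for argv in first_time_setup("raid0", "raid0.conf", 112341):
    print("# " + " ".join(argv))
```

     Only after the last command has completed is it safe to continue with
     disklabel(8) and newfs(8).  On later boots the set would be configured
     with -c instead of -C.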

     Under certain circumstances (e.g. the additional component has not
     arrived, or data is being migrated off of a disk destined to become a
     component) it may be desirable to configure a RAID 1 set with only a
     single component.  This can be achieved by configuring the set with a
     physically existing component (as either the first or second component)
     and with a `fake' component.  In the following:

           START array
           # numRow numCol numSpare
           1 2 0

           START disks
           /dev/sd6e
           /dev/sd0e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
           128 1 1 1

           START queue
           fifo 100

     /dev/sd0e is the real component, and will be the second disk of a RAID 1
     set.  The component /dev/sd6e, which must exist, but have no physical
     device associated with it, is simply used as a placeholder.
     Configuration (using -C and -I 12345 as above) proceeds normally, but
     initialization of the RAID set will have to wait until all physical
     components are present.  After configuration, this set can be used
     normally, but will be operating in degraded mode.  Once a second physical
     component is obtained, it can be hot-added, the existing data mirrored,
     and normal operation resumed.

   Maintenance of the RAID set
     After the parity has been initialized for the first time, the command:

           # raidctl -p raid0

     can be used to check the current status of the parity.  To check the
     parity and rebuild it if necessary (for example, after an unclean
     shutdown) the command:

           # raidctl -P raid0

     is used.  Note that re-writing the parity can be done while other
     operations on the RAID set are taking place (e.g. while doing an fsck(8)
     on a file system on the RAID set).  However: for maximum effectiveness of
     the RAID set, the parity should be known to be correct before any data on
     the set is modified.

     To see how the RAID set is doing, the following command can be used to
     show the RAID set's status:

           # raidctl -s raid0

     The output will look something like:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: optimal
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare
           Parity status: clean
           Reconstruction is 100% complete.
           Parity Re-write is 100% complete.
           Copyback is 100% complete.

     This indicates that all is well with the RAID set.  Of importance here
     are the component lines which read `optimal', and the `Parity status'
     line which indicates that the parity is up-to-date.  Note that if there
     are file systems open on the RAID set, the individual components will not
     be `clean' but the set as a whole can still be clean.
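
     For monitoring, the status text can be reduced to the two facts singled
     out above: every component reads `optimal' and the parity status is
     clean.  The following is a minimal sketch that assumes output laid out
     like the sample shown; the real output may differ between releases, and
     the names parse_status and is_healthy are hypothetical.

```python
def parse_status(text):
    """Extract component states, spare states and parity status from -s output."""
    status = {"components": {}, "spares": {}, "parity": None}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Components:":
            section = "components"
        elif line == "Spares:":
            section = "spares"
        elif line == "No spares." or line.startswith("Component label"):
            section = None
        elif line.startswith("Parity status:"):
            status["parity"] = line.split(":", 1)[1].strip()
            section = None
        elif section and ":" in line:
            name, state = line.rsplit(":", 1)
            status[section][name.strip()] = state.strip()
    return status


def is_healthy(status):
    """True when at least one component exists, all are optimal, parity is clean."""
    states = status["components"].values()
    return bool(states) and all(s == "optimal" for s in states) \
        and status["parity"] == "clean"


SAMPLE = """\
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
"""

status = parse_status(SAMPLE)
print(status["components"], is_healthy(status))
```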

     The -v option may be also used in conjunction with -s:

           # raidctl -sv raid0

     In this case, the components' label information (see the -g option) will
     be given as well:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: optimal
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare
           Component label for /dev/sd1e:
              Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
           Component label for /dev/sd2e:
              Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
           Component label for /dev/sd3e:
              Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
           Parity status: clean
           Reconstruction is 100% complete.
           Parity Re-write is 100% complete.
           Copyback is 100% complete.

     To check the component label of /dev/sd1e, the following is used:

           # raidctl -g /dev/sd1e raid0

     The output of this command will look something like:

           Component label for /dev/sd1e:
              Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
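
     The label text is a series of `name: value' pairs, several per line, so
     it is easy to turn into a dictionary for scripting.  The following is a
     minimal sketch that assumes the layout shown in the sample output; the
     real output may differ, and the name parse_component_label is
     hypothetical.

```python
import re

# A field name (letters and spaces), a colon, then a single value token.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(\S+)")
HEADER = "Component label for "


def parse_component_label(text):
    """Return the fields of one component label as a dictionary."""
    label = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(HEADER):
            label["component"] = line[len(HEADER):].rstrip(":")
            continue
        for name, value in PAIR.findall(line):
            label[name] = int(value) if value.isdigit() else value
    return label


SAMPLE = """\
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
"""

label = parse_component_label(SAMPLE)
print(label["component"], label["Serial Number"], label["RAID Level"])
```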

   Dealing with Component Failures
     If for some reason (perhaps to test reconstruction) it is necessary to
     pretend a drive has failed, the following will perform that function:

           # raidctl -f /dev/sd2e raid0

     The system will then be performing all operations in degraded mode, where
     missing data is re-computed from existing data and the parity.  In this
     case, obtaining the status of raid0 will return (in part):

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: failed
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare

     Note that with the use of -f a reconstruction has not been started.  To
     both fail the disk and start a reconstruction, the -F option must be
     used:

           # raidctl -F /dev/sd2e raid0

     The -f option may be used first, and then the -F option used later, on
     the same disk, if desired.  Immediately after the reconstruction is
     started, the status will report:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: reconstructing
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: used_spare
           [...]
           Parity status: clean
           Reconstruction is 10% complete.
           Parity Re-write is 100% complete.
           Copyback is 100% complete.

     This indicates that a reconstruction is in progress.  To find out how the
     reconstruction is progressing the -S option may be used.  This will
     indicate the progress in terms of the percentage of the reconstruction
     that is completed.  When the reconstruction is finished the -s option
     will show:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: spared
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: used_spare
           [...]
           Parity status: clean
           Reconstruction is 100% complete.
           Parity Re-write is 100% complete.
           Copyback is 100% complete.

     At this point there are at least two options.  First, if /dev/sd2e is
     known to be good (i.e. the failure was either caused by -f or -F, or the
     failed disk was replaced), then a copyback of the data can be initiated
     with the -B option.  In this example, this would copy the entire contents
     of /dev/sd4e to /dev/sd2e.  Once the copyback procedure is complete, the
     status of the device would be (in part):

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: optimal
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare

     and the system is back to normal operation.

     The second option after the reconstruction is to simply use /dev/sd4e in
     place of /dev/sd2e in the configuration file.  For example, the
     configuration file (in part) might now look like:

           START array
           1 3 0

           START disks
           /dev/sd1e
           /dev/sd4e
           /dev/sd3e

     This can be done as /dev/sd4e is completely interchangeable with
     /dev/sd2e at this point.  Note that extreme care must be taken when
     changing the order of the drives in a configuration.  This is one of the
     few instances where the devices and/or their orderings can be changed
     without loss of data!  In general, the ordering of components in a
     configuration file should never be changed.

     If a component fails and there are no hot spares available on-line, the
     status of the RAID set might (in part) look like:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: failed
                      /dev/sd3e: optimal
           No spares.

     In this case there are a number of options.  The  first  option is to add a
     hot spare using:

           # raidctl -a /dev/sd4e raid0

     After the hot add, the status would then be:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: failed
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare

     Reconstruction could then take place using -F as described above.

     A second option is to rebuild directly onto /dev/sd2e.  Once the disk
     containing /dev/sd2e has been replaced, one can simply use:

           # raidctl -R /dev/sd2e raid0

     to rebuild the /dev/sd2e component.  As the rebuilding is in progress,
     the status will be:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: reconstructing
                      /dev/sd3e: optimal
           No spares.

     and when completed, will be:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: optimal
                      /dev/sd3e: optimal
           No spares.

     In circumstances where a particular component is completely unavailable
     after a reboot, a special component name will be used to indicate the
     missing component.  For example:

           Components:
                      /dev/sd2e: optimal
                     component1: failed
           No spares.

     indicates that the second component of this RAID set was not detected at
     all by the auto-configuration code.  The name `component1' can be used
     anywhere a normal component name would be used.  For example, to add a
     hot spare to the above set, and rebuild to that hot spare, the following
     could be done:

           # raidctl -a /dev/sd3e raid0
           # raidctl -F component1 raid0

     at which point the data missing from `component1' would be reconstructed
     onto /dev/sd3e.
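
     The recovery paths in this section can be condensed into a small
     decision helper.  The sketch below only suggests commands and runs
     nothing.  The order of preference (an idle hot spare first, then a newly
     added spare, then an in-place rebuild) is a choice made for this
     illustration and not something raidctl enforces, and the name
     recovery_commands is hypothetical.

```python
def recovery_commands(components, spares, dev="raid0",
                      new_spare=None, disk_replaced=False):
    """Suggest raidctl invocations for the first failed component, if any.

    components and spares map a component name to its reported state.
    """
    failed = [name for name, state in components.items() if state == "failed"]
    if not failed:
        return []
    target = failed[0]
    if any(state == "spare" for state in spares.values()):
        # An idle hot spare exists: reconstruct onto it.
        return [["raidctl", "-F", target, dev]]
    if new_spare is not None:
        # No spare on-line: hot-add one, then reconstruct onto it.
        return [["raidctl", "-a", new_spare, dev],
                ["raidctl", "-F", target, dev]]
    if disk_replaced:
        # The failed disk was swapped: rebuild back onto the same component.
        return [["raidctl", "-R", target, dev]]
    return []  # nothing can be done until a disk is added or replaced


components = {"/dev/sd2e": "optimal", "component1": "failed"}
for argv in recovery_commands(components, {}, new_spare="/dev/sd3e"):
    print("# " + " ".join(argv))
```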

   RAID on RAID
     RAID sets can be layered to create more complex and much larger RAID
     sets.  A RAID 0 set, for example, could be constructed from four RAID 5
     sets.  The following configuration file shows such a setup:

           START array
           # numRow numCol numSpare
           1 4 0

           START disks
           /dev/raid1e
           /dev/raid2e
           /dev/raid3e
           /dev/raid4e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
           128 1 1 0

           START queue
           fifo 100

     A similar configuration file might be used for a RAID 0 set constructed
     from components on RAID 1 sets.  In such a configuration, the mirroring
     provides a high degree of redundancy, while the striping provides
     additional speed benefits.
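
     The space cost of a layered set follows from the per-level rules.  The
     following is a rough sketch that ignores the space RAIDframe reserves on
     each component and any rounding to stripe boundaries.  The number of
     components in each inner set (three) is an assumption made for the
     example, and the name usable_size is hypothetical.

```python
def usable_size(raid_level, num_components, component_size):
    """Approximate usable size of one RAID set built from equal components."""
    if raid_level == 0:
        return num_components * component_size        # striping only
    if raid_level == 1:
        return component_size                         # every component is a copy
    if raid_level in (4, 5):
        return (num_components - 1) * component_size  # one component's worth of parity
    raise ValueError(f"RAID level {raid_level} is not covered by this sketch")


# A RAID 0 set over four RAID 5 sets, each assumed to have three components
# of one unit each: 12 units of raw disk in total.
inner = usable_size(5, 3, 1)
outer = usable_size(0, 4, inner)
print(inner, outer)
```

     The same function gives the figure for a RAID 0 set over RAID 1 sets:
     half of the raw space is usable.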

   Auto-configuration and Root on RAID
     RAID sets can also be auto-configured at boot.  To make a set
     auto-configurable, simply prepare the RAID set as above, and then do a:

           # raidctl -A yes raid0

     to turn on auto-configuration for that set.  To turn off
     auto-configuration, use:

           # raidctl -A no raid0

     RAID sets which are auto-configurable will be configured before the root
     file system is mounted.  These RAID sets are thus available for use as a
     root file system, or for any other file system.  A primary advantage of
     using the auto-configuration is that RAID components become more
     independent of the disks they reside on.  For example, SCSI ID's can
     change, but auto-configured sets will always be configured correctly,
     even if the SCSI ID's of the component disks have become scrambled.

     Having a system's root file system (/) on a RAID set is also allowed,
     with the `a' partition of such a RAID set being used for /.  To use
     raid0a as the root file system, simply use:

           # raidctl -A root raid0

     To return raid0 to be just an auto-configuring set simply use the -A yes
     arguments.

     Note that kernels can't be directly read from a RAID component.  To
     support the root file system on RAID sets, some mechanism must be used to
     get a kernel booting.  For example, a small partition containing only the
     secondary boot-blocks and an alternate kernel (or two) could be used.
     Once a kernel is booting however, and an auto-configured RAID set is
     found that is eligible to be root, then that RAID set will be
     auto-configured and its `a' partition (aka raid[0..n]a) will be used as
     the root file system.  If two or more RAID sets claim to be root devices,
     then the user will be prompted to select the root device.  At this time,
     RAID 0, 1, 4, and 5 sets are all supported as root devices.

     A typical RAID 1 setup with root on RAID might be as follows:

     1.   wd0a - a small partition, which contains a complete, bootable, basic
          OpenBSD installation.

     2.   wd1a - also contains a complete, bootable, basic OpenBSD
          installation.

     3.   wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.

     4.   wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
          swap space.

     5.   wd0g and wd1g - a RAID 1 set, raid2, used for /usr, /home, or other
          data, if desired.

     6.   wd0h and wd1h - a RAID 1 set, raid3, if desired.

     RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
     raid0 is marked as being a root-able raid.  When new kernels are
     installed, the kernel is not only copied to /, but also to wd0a and wd1a.
     The kernel on wd0a is required, since that is the kernel the system boots
     from.  The kernel on wd1a is also required, since that will be the kernel
     used should wd0 fail.  The important point here is to have redundant
     copies of the kernel available, in the event that one of the drives
     fails.

     There is no requirement that the root file system be on the same disk as
     the kernel.  For example, obtaining the kernel from wd0a, and using sd0e
     and sd1e for raid0, and the root file system, is fine.  It is critical,
     however, that there be multiple kernels available, in the event of media
     failure.

     Multi-layered RAID devices (such as a RAID 0 set made up of RAID 1 sets)
     are not supported as root devices or auto-configurable devices at this
     point.  (Multi-layered RAID devices are supported in general, however, as
     mentioned earlier.)  Note that in order to enable component
     auto-detection and auto-configuration of RAID devices, the line:

           option    RAID_AUTOCONFIG

     must be in the kernel configuration file.  See raid(4) for more details.

   Unconfiguration
     The final operation performed by raidctl is to unconfigure a raid(4)
     device.  This is accomplished via a simple:

           # raidctl -u raid0

     at which point the device is ready to be reconfigured.

   Performance Tuning
     Selection of the various parameter values which result in the best
     performance can be quite tricky, and often requires a bit of
     trial-and-error to get those values most appropriate for a given system.
     A whole range of factors come into play, including:

     1.   Types of components (e.g. SCSI vs. IDE) and their bandwidth

     2.   Types of controller cards and their bandwidth

     3.   Distribution of components among controllers

     4.   IO bandwidth

     5.   File system access patterns

     6.   CPU speed

     As with most performance tuning, benchmarking under real-life loads may
     be the only way to measure expected performance.  Understanding some of
     the underlying technology is also useful in tuning.  The goal of this
     section is to provide pointers to those parameters which may make
     significant differences in performance.

     For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
     Since data in a RAID 1 set is arranged in a linear fashion on each
     component, selecting an appropriate stripe size is somewhat less critical
     than it is for a RAID 5 set.  However, a stripe size that is too small
     will cause large IOs to be broken up into a number of smaller ones,
     hurting performance.  At the same time, a large stripe size may cause
     problems with concurrent accesses to stripes, which may also affect
     performance.  Thus values in the range of 32 to 128 are often the most
     effective.

     Tuning RAID 5 sets is trickier.  In the best case, IO is presented to the
     RAID set one stripe at a time.  Since the entire stripe is available at
     the beginning of the IO, the parity of that stripe can be calculated
     before the stripe is written, and then the stripe data and parity can be
     written in parallel.  When the amount of data being written is less than
     a full stripe worth, the `small write' problem occurs.  Since a `small
     write' means only a portion of the stripe on the components is going to
     change, the data (and parity) on the components must be updated slightly
     differently.  First, the `old parity' and `old data' must be read from
     the components.  Then the new parity is constructed, using the new data
     to be written, and the old data and old parity.  Finally, the new data
     and new parity are written.  All this extra data shuffling results in a
     serious loss of performance; a small write is typically 2 to 4 times
     slower than a full stripe write (or read).  To combat this problem in the
     real world, it may be useful to ensure that stripe sizes are small enough
     that a `large IO' from the system will use exactly one large stripe
     write.  As is seen later, there are some file system dependencies which
     may come into play here as well.
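
     The read-modify-write sequence described above can be sketched in a few
     lines of Python.  This is only an illustration, assuming the XOR parity
     conventionally used for RAID 5; the function names and the tiny 4-byte
     stripe units are hypothetical and are not taken from RAIDframe itself.

```python
from functools import reduce


def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def full_stripe_parity(data_units: list[bytes]) -> bytes:
    """Parity for a full-stripe write: XOR of every data stripe unit."""
    return xor_bytes(*data_units)


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after rewriting one stripe unit (read-modify-write)."""
    return xor_bytes(old_parity, old_data, new_data)


# Hypothetical stripe: 4 data units of 4 bytes each (real units are far larger).
stripe = [
    b"\x01\x02\x03\x04",
    b"\x10\x20\x30\x40",
    b"\xaa\xbb\xcc\xdd",
    b"\x00\xff\x00\xff",
]
parity = full_stripe_parity(stripe)

# Small write to unit 2: read old data and old parity, then write both anew.
new_unit = b"\xde\xad\xbe\xef"
updated_parity = small_write_parity(parity, stripe[2], new_unit)
stripe[2] = new_unit

# The incrementally updated parity equals a full recomputation.
assert updated_parity == full_stripe_parity(stripe)
```

     The small write costs two reads and two writes to change a single stripe
     unit, whereas the full-stripe path needs no reads at all; that extra IO
     is the source of the slowdown noted above.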

     Since the size of a `large IO' is often (currently) only 32K or 64K, on a
     5-drive RAID 5 set it may be desirable to select a SectPerSU value of 16
     blocks (8K) or 32 blocks (16K).  Since there are 4 data stripe units per
     stripe, the maximum data per stripe is 64 blocks (32K) or 128 blocks
     (64K).  Again, empirical measurement will provide the best indicators of
     which values will yield better performance.
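
     The arithmetic behind those figures can be checked with a short Python
     sketch.  It assumes 512-byte sectors (consistent with 16 blocks being 8K
     above) and one parity stripe unit per stripe; the helper names are
     hypothetical and exist only for this illustration.

```python
SECTOR_BYTES = 512  # assumed sector size; matches "16 blocks (8K)" above


def stripe_data_bytes(sect_per_su: int, num_components: int) -> int:
    """Data bytes in one full RAID 5 stripe (one stripe unit holds parity)."""
    data_units = num_components - 1
    return sect_per_su * SECTOR_BYTES * data_units


def sect_per_su_for(large_io_bytes: int, num_components: int) -> int:
    """SectPerSU that makes one full stripe hold exactly large_io_bytes."""
    data_units = num_components - 1
    sectors, remainder = divmod(large_io_bytes, SECTOR_BYTES * data_units)
    if remainder:
        raise ValueError("IO size does not divide evenly across the data units")
    return sectors


for io_kib in (32, 64):
    su = sect_per_su_for(io_kib * 1024, num_components=5)
    print(f"{io_kib}K large IO on 5 drives -> SectPerSU {su}")
```

     For a different number of components the same calculation applies, but a
     suitable value only exists when the IO size divides evenly across the
     data stripe units.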

     The parameters used for the file system are also critical to good
     performance.  For newfs(8), for example, increasing the block size to 32K
     or 64K may improve performance dramatically.  Also, changing the
     cylinders-per-group parameter from 16 to 32 or higher is often not only
     necessary for larger file systems, but may also have positive performance
     implications.

   Summary
     Despite the length of this man-page, configuring a RAID set is a
     relatively straight-forward process.  All that needs to be done is the
     following steps:

     1.   Use disklabel(8) to create the components (of type RAID).

     2.   Construct a RAID configuration file: e.g.  `raid0.conf'

     3.   Configure the RAID set with:

                # raidctl -C raid0.conf raid0

     4.   Initialize the component labels with:

                # raidctl -I 123456 raid0

     5.   Initialize other important parts of the set with:

                # raidctl -i raid0

     6.   Get the default label for the RAID set:

                # disklabel raid0 > /tmp/label

     7.   Edit the label:

                # vi /tmp/label

     8.   Put the new label on the RAID set:

                # disklabel -R -r raid0 /tmp/label

     9.   Create the file system:

                # newfs /dev/rraid0e

     10.  Mount the file system:

                # mount /dev/raid0e /mnt

     11.  Use:

                # raidctl -c raid0.conf raid0

          to re-configure the RAID set the next time it is needed, or put
          raid0.conf into /etc where it will automatically be started by the
          /etc/rc scripts.

WARNINGS

     Certain RAID levels (1, 4, 5, 6, and others) can protect against some
     data loss due to component failure.  However, the loss of two components
     of a RAID 4 or 5 system, or the loss of a single component of a RAID 0
     system, will result in the entire file system being lost.  RAID is NOT a
     substitute for good backup practices.

     Recomputation of parity MUST be performed whenever there is a chance that
     it may have been compromised.  This includes after system crashes, or
     before a RAID device has been used for the first time.  Failure to keep
     parity correct will be catastrophic should a component ever fail -- it is
     better to use RAID 0 and get the additional space and speed, than it is
     to use parity, but not keep the parity correct.  At least with RAID 0
     there is no perception of increased data security.
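
     Why stale parity is so dangerous can be seen in a small Python sketch.
     It assumes XOR parity as conventionally used for RAID 4 and 5; the
     4-byte stripe units and function names are hypothetical, chosen only to
     keep the illustration short.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild_lost_unit(surviving_data: list[bytes], parity: bytes) -> bytes:
    """Reconstruct the one missing data unit from the survivors and parity."""
    return xor_blocks(surviving_data + [parity])


# Hypothetical stripe of three data units plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# With correct parity, losing unit 1 is recoverable.
good = rebuild_lost_unit([data[0], data[2]], parity)

# A crash lands new data in unit 0 but not the matching parity update.
data[0] = b"ZZZZ"
bad = rebuild_lost_unit([data[0], data[2]], parity)  # parity is now stale

print(good == b"BBBB")  # True
print(bad == b"BBBB")   # False: unit 1 is rebuilt as garbage
```

     Nothing in the rebuild step can detect that the parity was stale; the
     wrong data is simply returned.  Recomputing the parity from the current
     data before any component fails restores the ability to rebuild
     correctly.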

FILES

     /dev/{,r}raid*  raid device special files.

HISTORY

     RAIDframe is a framework for rapid prototyping of RAID structures
     developed by the folks at the Parallel Data Laboratory at Carnegie Mellon
     University (CMU).  A more complete description of the internals and
     functionality of RAIDframe is found in the paper "RAIDframe: A Rapid
     Prototyping Tool for RAID Systems", by William V. Courtright II, Garth
     Gibson, Mark Holland, LeAnn Neal Reilly, and Jim Zelenka, and published
     by the Parallel Data Laboratory of Carnegie Mellon University.

     The raidctl command first appeared as a program in CMU's RAIDframe v1.1
     distribution.  This version of raidctl is a complete rewrite, and first
     appeared in NetBSD 1.4 from where it was ported to OpenBSD 2.5.

BUGS

     Hot-spare removal is currently not available.

COPYRIGHT

     The RAIDframe Copyright is as follows:

     Copyright (c) 1994-1996 Carnegie-Mellon University.
     All rights reserved.

     Permission to use, copy, modify and distribute this software and
     its documentation is hereby granted, provided that both the copyright
     notice and this permission notice appear in all copies of the
     software, derivative works or modified versions, and any portions
     thereof, and that both notices appear in supporting documentation.

     CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
     CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
     FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

     Carnegie Mellon requests users of this software to return to

      Software Distribution Coordinator   or   Software.Distribution@CS.CMU.EDU
      School of Computer Science
      Carnegie Mellon University
      Pittsburgh PA 15213-3890

     any improvements or extensions that they make and grant Carnegie the
     rights to redistribute these changes.


OpenBSD 3.6                      July 10, 2001
