raid - RAIDframe disk driver
pseudo-device raid [count]
The raid driver provides RAID 0, 1, 4, and 5 capabilities (among others)
to OpenBSD. This document assumes that the reader has at least some
familiarity with RAID concepts. The reader is also assumed to know how
to configure disks and pseudo-devices into kernels, how to build
kernels, and how to partition disks.
RAIDframe provides a number of different RAID levels including:
RAID 0 provides simple data striping across the components.
RAID 1 provides mirroring.
RAID 4 provides data striping across the components, with
parity stored
on a dedicated drive (in this case, the last component).
RAID 5 provides data striping across the components, with
parity distributed
across all the components.
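The difference between RAID 4 and RAID 5 parity placement can be
sketched as follows. This is only an illustration: the function name is
hypothetical, and the RAID 5 rotation order shown is an assumption, not
necessarily the layout RAIDframe uses.

```python
def parity_component(level, stripe, n_components):
    """Return the index of the component holding parity for a stripe."""
    if level == 4:
        # RAID 4: parity lives on a dedicated drive, here the last component.
        return n_components - 1
    if level == 5:
        # RAID 5: parity moves from stripe to stripe.  The direction of
        # rotation is an assumption made for this sketch.
        return (n_components - 1 - stripe) % n_components
    raise ValueError("this sketch only models RAID 4 and RAID 5")


if __name__ == "__main__":
    for stripe in range(4):
        print(stripe,
              parity_component(4, stripe, 4),
              parity_component(5, stripe, 4))
```

With four components, the RAID 4 column stays on the last component for
every stripe, while the RAID 5 column visits each component once every
four stripes.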
A wide variety of other RAID levels is supported by RAIDframe, including
Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering. The reader is referred to
the RAIDframe documentation mentioned in the HISTORY section for more
detail on these various RAID configurations.
Depending on the parity level configured, the device driver
can support
the failure of component drives. The number of failures allowed depends
on the parity level selected. If the driver is able to handle drive
failures, and a drive does fail, then the system is operating in "degraded
mode". In this mode, all missing data must be reconstructed from the
data and parity present on the other components. This results in much
slower data accesses, but does mean that a failure need not
bring the
system to a complete halt.
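For single-parity levels such as RAID 4 and 5, reconstruction in
degraded mode amounts to XORing the surviving data with the parity. A
minimal sketch, using made-up two-byte blocks and a hypothetical helper
name, and not the driver's actual code:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


# Three data components and the parity computed over them.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = xor_blocks(data)

# Component 1 fails: every read of it must now touch all survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

Each read of the missing component needs a read from every remaining
component, which is why accesses are much slower in degraded mode.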
The RAID driver supports and enforces the use of `component
labels'. A
`component label' contains important information about the
component, including
a user-specified serial number, the row and column
of that component
in the RAID set, and whether the data (and parity) on
the component
is `clean'. If the driver determines that the labels are
very inconsistent
with respect to each other (e.g. two or more serial
numbers do not
match) or that the component label is not consistent with
its assigned
place in the set (e.g., the component label claims the component should
be the 3rd one of a 6-disk set, but the RAID set has it as
the 3rd component
in a 5-disk set) then the device will fail to configure. If the
driver determines that exactly one component label seems to
be incorrect,
and the RAID set is being configured as a set that supports
a single
failure, then the RAID set will be allowed to configure, but
the incorrectly
labeled component will be marked as `failed', and the
RAID set
will begin operation in degraded mode. If all of the components are consistent
among themselves, the RAID set will configure normally.
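The decision described above can be modelled roughly as follows. The
Label fields, the classify function, and the majority-vote rule for
picking the expected serial number are all assumptions made for this
sketch; the real component label holds more information, and the driver
applies its own checks.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical, simplified component label."""
    serial: int       # user-specified serial number
    column: int       # position this component claims in the set
    num_columns: int  # size of the set this component claims to belong to


def classify(labels, tolerates_one_failure):
    """Return 'normal', 'degraded' or 'refuse' for a set of labels."""
    # Assumption: the serial number seen most often is the right one.
    expected_serial = Counter(l.serial for l in labels).most_common(1)[0][0]
    bad = [
        i for i, l in enumerate(labels)
        if l.serial != expected_serial
        or l.column != i
        or l.num_columns != len(labels)
    ]
    if not bad:
        return "normal"
    if len(bad) == 1 and tolerates_one_failure:
        return "degraded"   # the odd one out is marked as failed
    return "refuse"         # the set fails to configure
```

For example, three labels with matching serial numbers and positions
yield 'normal', while one stray serial number in a RAID 5 set yields
'degraded'.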
Component labels are also used to support the auto-detection
and autoconfiguration
of RAID sets. A RAID set can be flagged as
auto-configurable,
in which case it will be configured automatically
during the kernel
boot process. RAID filesystems which are automatically
configured
are also eligible to be the root filesystem. There is currently no support
for booting a kernel directly from a RAID set. To use
a RAID set as
the root filesystem, a kernel is usually obtained from a
small non-RAID
partition, after which any auto-configuring RAID set can be
used for the
root filesystem. See raidctl(8) for more information on auto-configuration
of RAID sets.
The driver supports `hot spares', disks which are on-line,
but are not
actively used in an existing filesystem. Should a disk
fail, the driver
is capable of reconstructing the failed disk onto a hot
spare or back onto
a replacement drive. If the components are hot swappable, the failed
disk can then be removed, a new disk put in its place, and a copyback
operation performed. The copyback operation, as its name indicates,
copies the reconstructed data from the hot spare to the previously
failed (and now replaced) disk. Hot spares can also be hot-added using
raidctl(8).
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as `failed'.
The user-land utility for all raid configuration and other operations
is raidctl(8). Most importantly, raidctl(8) must be used with the -i
option to initialize all RAID sets. In particular, this initialization
includes rebuilding the parity data. Rebuilding the parity data is
required both when a new RAID device is brought up for the first time
and after an unclean shutdown of a RAID device. By using the -P option
to raidctl(8) to recompute all parity on demand before running fsck(8)
or newfs(8), filesystem integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is required before any
filesystems are created or used on the RAID device. If the parity is
not correct, then missing data cannot be correctly recovered.
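Why stale parity is dangerous can be shown with a toy two-component
stripe. The values and the helper name are made up for illustration, and
the interrupted write is simulated by updating the data without updating
the parity:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


data = [b"\x0f", b"\xf0"]
parity = xor_blocks(data)          # parity is correct at this point

# Unclean shutdown: component 0 was rewritten, parity was not.
data[0] = b"\xff"

# Component 1 now fails and is rebuilt from component 0 and stale parity.
wrong = xor_blocks([data[0], parity])
assert wrong != b"\xf0"            # silently returns the wrong contents

# Had parity been recomputed first, the rebuild would have been correct.
parity = xor_blocks(data)
assert xor_blocks([data[0], parity]) == b"\xf0"
```

The rebuild with stale parity raises no error; it simply yields bad
data, which is why parity must be recomputed before the set is trusted.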
RAID levels may be combined in a hierarchical fashion. For
example, a
RAID 0 device can be constructed out of a number of RAID 5
devices
(which, in turn, may be constructed out of the physical
disks, or of other
RAID devices).
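As a rough illustration of such a hierarchy, the usable capacity of a
RAID 0 device built from RAID 5 devices can be estimated as below. The
function names are hypothetical, and the sketch assumes that every
component contributes only as much as the smallest one and ignores space
reserved for labels and stripe rounding.

```python
def raid0_capacity(sizes):
    """Striping: every component contributes, limited by the smallest."""
    return len(sizes) * min(sizes)


def raid5_capacity(sizes):
    """Single parity: one component's worth of space holds parity."""
    return (len(sizes) - 1) * min(sizes)


# Three RAID 5 sets, each made of four 100-unit disks ...
raid5_sets = [raid5_capacity([100, 100, 100, 100]) for _ in range(3)]

# ... striped together into one RAID 0 device.
total = raid0_capacity(raid5_sets)
print(raid5_sets, total)
```

Twelve 100-unit disks arranged this way give 900 units of usable space,
and each RAID 5 set can still lose one disk.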
It is important that drives be hard-coded at their respective addresses
(i.e., not left free-floating, where a drive with SCSI ID of
4 can end up
as /dev/sd0c) for well-behaved functioning of the RAID device. This is
true for all types of drives, including IDE, HP-IB, etc.
For normal SCSI
drives, for example, the following can be used to fix the
device addresses:
sd0 at scsibus0 target 0 lun ? # SCSI disk drives
sd1 at scsibus0 target 1 lun ? # SCSI disk drives
sd2 at scsibus0 target 2 lun ? # SCSI disk drives
sd3 at scsibus0 target 3 lun ? # SCSI disk drives
sd4 at scsibus0 target 4 lun ? # SCSI disk drives
sd5 at scsibus0 target 5 lun ? # SCSI disk drives
sd6 at scsibus0 target 6 lun ? # SCSI disk drives
See sd(4) for more information. The rationale for fixing the device
addresses is as follows: Consider a system with three SCSI drives at
SCSI IDs 4, 5, and 6, which map to components /dev/sd0e, /dev/sd1e, and
/dev/sd2e of a RAID 5 set. If the drive with SCSI ID 5 fails, and the
system reboots, the old /dev/sd2e will show up as /dev/sd1e. The RAID
driver is able to detect that component positions have changed, and will
not allow normal configuration. If the device addresses are hard-coded,
however, the RAID driver will detect that the middle component is
unavailable, and bring the RAID 5 set up in degraded mode.
Note that the
auto-detection and auto-configuration code does not care
about where the
components live. The auto-configuration code will correctly
configure a
device even after any number of the components have been rearranged.
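The renumbering problem in the SCSI example above can be mimicked with a
small model. Both functions are hypothetical; the free-floating case
assumes units are handed out in ascending SCSI ID order, which is an
assumption made for this sketch.

```python
def floating_names(present_ids):
    """Free-floating: sdN goes to whichever drives are found, in order."""
    return {scsi_id: f"sd{n}"
            for n, scsi_id in enumerate(sorted(present_ids))}


def fixed_names(present_ids, wiring):
    """Hard-coded: each SCSI ID keeps the unit its config line assigns."""
    return {scsi_id: wiring[scsi_id] for scsi_id in sorted(present_ids)}


wiring = {4: "sd4", 5: "sd5", 6: "sd6"}

# All three drives present.
print(floating_names({4, 5, 6}))        # ID 6 is sd2
# The drive at ID 5 has failed and the system has rebooted.
print(floating_names({4, 6}))           # ID 6 has slid down to sd1
print(fixed_names({4, 6}, wiring))      # ID 6 is still sd6, sd5 is absent
```

With fixed addresses the surviving drives keep their names, so the
missing middle component is the only thing that changes.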
The first step to using the raid driver is to ensure that it
is suitably
configured in the kernel. This is done by adding a line
similar to:
pseudo-device raid 4 # RAIDframe disk device
to the kernel configuration file. The `count' argument (`4', in this
case) specifies the number of RAIDframe devices to configure. To turn
on component auto-detection and auto-configuration of RAID sets, simply
add:
option RAID_AUTOCONFIG
to the kernel configuration file.
All component partitions must be of the type FS_BSDFFS (i.e., 4.2BSD) or
FS_RAID (i.e., RAID). The use of the latter is strongly encouraged, and
is required if auto-configuration of the RAID set is desired. Since
RAIDframe leaves room for disklabels, RAID components can be
simply raw
disks, or partitions which use an entire disk. Note that
some platforms
(such as SUN) do not allow using the FS_RAID partition type.
On these
platforms, the raid driver can still auto-configure from
FS_BSDFFS partitions.
A more detailed treatment of actually using a raid device is found in
raidctl(8). It is highly recommended that the steps to reconstruct,
copyback, and recompute parity be well understood by the system
administrator(s) before a component failure occurs. Doing the wrong
thing when a component fails may result in data loss.
Additional debug information can be sent to the console by
specifying:
option RAIDDEBUG
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure. However, the loss of two components
of a RAID 4 or 5 system, or the loss of a single component of a RAID 0
system, will result in all filesystems on that RAID device being
lost. RAID is NOT a substitute for good backup practices.
Recomputation of parity MUST be performed whenever there is
a chance that
it may have been compromised. This includes after system
crashes, or before
a RAID device has been used for the first time. Failure to keep
parity correct will be catastrophic should a component ever
fail -- it is
better to use RAID 0 and get the additional space and speed,
than it is
to use parity, but not keep the parity correct. At least
with RAID 0
there is no perception of increased data security.
/dev/{,r}raid* raid device special files.
ccd(4), sd(4), wd(4), MAKEDEV(8), config(8), fsck(8),
mount(8), newfs(8),
raidctl(8)
The raid driver in OpenBSD is a port of RAIDframe, a framework for rapid
prototyping of RAID structures developed by the folks at the
Parallel Data
Laboratory at Carnegie Mellon University (CMU). RAIDframe, as originally
distributed by CMU, provides a RAID simulator for a
number of different
architectures, and a user-level device driver and a
kernel device
driver for Digital UNIX. The raid driver is a kernelized
version of
RAIDframe v1.1.
A more complete description of the internals and functionality of RAIDframe
is found in the paper "RAIDframe: A Rapid Prototyping
Tool for RAID
Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn
Neal Reilly, and Jim Zelenka, and published by the Parallel
Data Laboratory
of Carnegie Mellon University. The raid driver first
appeared in
NetBSD 1.4 from where it was ported to OpenBSD 2.5.
The RAIDframe Copyright is as follows:
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
Permission to use, copy, modify and distribute this software
and
its documentation is hereby granted, provided that both the
copyright
notice and this permission notice appear in all copies of
the
software, derivative works or modified versions, and any
portions
thereof, and that both notices appear in supporting documentation.
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS
IS"
CONDITION.
CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR ANY
DAMAGES
WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
Carnegie Mellon requests users of this software to return to
Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 15213-3890
any improvements or extensions that they make and grant
Carnegie the
rights to redistribute these changes.
OpenBSD 3.6 November 9, 1998