grio - guaranteed-rate I/O
Guaranteed-rate I/O (GRIO) refers to a guarantee made by the system to a
user process indicating that the given process will receive data from a
system resource at a predefined rate regardless of any other activity on
the system. The purpose of this mechanism is to manage the sharing of
scarce I/O resources amongst a number of competing processes, and to
permit a given process to reserve a portion of the system's resources for
its exclusive use for a period of time.
Currently, the only system resources that can be reserved using the GRIO
mechanism are files stored on an XFS filesystem.
A GRIO guarantee is defined as the number of bytes that can be read or
written to a given file by a given process over a set period of time. If
a process has a GRIO guarantee on a file, it can write data to or read
data from the file at the guaranteed rate regardless of other I/O
activity on the system. If the process issues I/O requests at a size or
rate greater than the guarantee, the behavior of the system is determined
by the type of rate guarantee. The excess requests may be blocked until
such time as they fall within the scope of the guarantee, or they may be
allowed to proceed up to the limit of the device bandwidth.
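The notion of a guarantee as "a number of bytes over a set period of time" can be illustrated with a small model. The sketch below assumes a simple fixed-window policy in which an excess request is refused until the next period begins, which corresponds to the blocking behavior described above. The structure and function names are hypothetical. They are not part of the GRIO library, and the real kernel and ggd scheduling are not claimed to work this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a rate guarantee: up to bytes_per_period bytes
 * may be transferred in each period of period_ms milliseconds.
 * These names are illustrative only, not real GRIO structure fields. */
struct rate_guarantee {
    uint64_t bytes_per_period;
    uint64_t period_ms;
    uint64_t window_start_ms; /* start time of the current period */
    uint64_t used_bytes;      /* bytes already issued in this period */
};

/* Returns true if a request of len bytes issued at now_ms falls within
 * the guarantee; false means it exceeds the guarantee and, under a
 * blocking policy, must wait for a later period. */
static bool model_admit(struct rate_guarantee *g, uint64_t now_ms,
                        uint64_t len)
{
    assert(g->period_ms > 0);
    /* Advance to the period containing now_ms and reset the budget. */
    if (now_ms >= g->window_start_ms + g->period_ms) {
        uint64_t elapsed = now_ms - g->window_start_ms;
        g->window_start_ms += (elapsed / g->period_ms) * g->period_ms;
        g->used_bytes = 0;
    }
    if (g->used_bytes + len > g->bytes_per_period)
        return false; /* excess request: outside the guarantee */
    g->used_bytes += len;
    return true;
}
```

With a guarantee of 1M bytes per second and 256K-byte requests, this model admits four requests in a period, refuses a fifth, and admits requests again once the next one-second period starts.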
The following types of rate guarantees are supported:

PER_FILE_GUAR - the GRIO reservation is associated with a
    single file and may not be transferred
PER_FILE_SYS_GUAR - the GRIO reservation may be transferred to any
    file on a given file system

PROC_PRIVATE_GUAR - the GRIO reservation may not be transferred to
    another process
PROC_SHARE_GUAR - the GRIO reservation may be transferred to
    another process

FIXED_ROTOR_GUAR - the GRIO reservation is the VOD (i.e. ROTOR)
    type of reservation, and the rotor position is
    established at the start of the reservation
SLIP_ROTOR_GUAR - the GRIO reservation is the VOD (i.e. ROTOR)
    type of reservation, and the rotor position will
    vary according to the access pattern of the process
NON_ROTOR_GUAR - the GRIO reservation is a regular
    (i.e. NONROTOR) type of reservation

REALTIME_SCHED_GUAR - the GRIO reservation specifies the rate at
    which data will be provided to the process
NON_SCHED_GUAR - the I/O requests associated with the GRIO
    reservation are non-scheduled; this will
    affect other GRIO reservations on the system
Only one GRIO reservation characteristic may be chosen from each group.
The PROC_SHARE_GUAR, non REALTIME_SCHED_GUAR, and NON_ROTOR_GUAR
characteristics are set by default.
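The rule that only one characteristic may be chosen from each group can be checked mechanically. The sketch below assumes that the characteristics are combined as bit flags. The numeric values are hypothetical and will almost certainly differ from the constants in the system's GRIO header. The sketch only shows how the four groups listed above constrain a combination.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only; the real constants
 * come from the system's GRIO header and are not reproduced here. */
enum {
    PER_FILE_GUAR       = 1u << 0,
    PER_FILE_SYS_GUAR   = 1u << 1,
    PROC_PRIVATE_GUAR   = 1u << 2,
    PROC_SHARE_GUAR     = 1u << 3,
    FIXED_ROTOR_GUAR    = 1u << 4,
    SLIP_ROTOR_GUAR     = 1u << 5,
    NON_ROTOR_GUAR      = 1u << 6,
    REALTIME_SCHED_GUAR = 1u << 7,
    NON_SCHED_GUAR      = 1u << 8
};

/* The four groups; each may contribute at most one characteristic. */
static const unsigned grio_groups[] = {
    PER_FILE_GUAR | PER_FILE_SYS_GUAR,
    PROC_PRIVATE_GUAR | PROC_SHARE_GUAR,
    FIXED_ROTOR_GUAR | SLIP_ROTOR_GUAR | NON_ROTOR_GUAR,
    REALTIME_SCHED_GUAR | NON_SCHED_GUAR
};

/* Returns true if flags selects at most one characteristic per group. */
static bool grio_flags_valid(unsigned flags)
{
    for (unsigned i = 0; i < sizeof grio_groups / sizeof grio_groups[0]; i++) {
        unsigned chosen = flags & grio_groups[i];
        /* chosen & (chosen - 1) is non-zero iff two or more bits are set */
        if (chosen & (chosen - 1))
            return false;
    }
    return true;
}
```

For example, PROC_SHARE_GUAR | NON_ROTOR_GUAR passes this check. FIXED_ROTOR_GUAR | SLIP_ROTOR_GUAR fails it because both flags come from the rotor group.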
There are a number of components in the GRIO mechanism. The first is the
guarantee-granting daemon, ggd. This is a user-level process that is
started when the system is booted. It controls the granting of
guarantees, the initiation and expiration of existing guarantees, and the
monitoring of the available bandwidths of each I/O device on the system.
User processes communicate with the daemon via the grio library, using the
following calls:
grio_associate_file() - associate a file with a guarantee
grio_query_fs() - query filesystem
grio_action_list() - issue a list of GRIO reservation requests
grio_reserve_file() - issue a GRIO reservation request for a file
grio_reserve_fs() - issue a GRIO reservation request for a filesystem
grio_unreserve_bw() - remove a GRIO reservation
When ggd is started, it reads the file /etc/grio_disks and uses its
built-in knowledge of the system to determine the bandwidths of the various
devices on the system. The /etc/grio_disks file may be edited by the
system administrator to tune performance. If ggd is terminated, all
existing rate guarantees are removed.
The next component of the GRIO mechanism is the XLV volume manager. Rate
guarantees may be obtained on files on the real-time and non-real-time
subvolumes of an XFS filesystem, as well as on non-XLV disk partitions
containing XFS filesystems. The disk driver command retry mechanism is
disabled on the disks that make up the real-time subvolume, which means
that if a drive error occurs, the data is lost. Retries are disabled
because the intent of real-time files is to read and write data from the
disk as rapidly as possible: if the device driver were forced to retry one
process's disk request, the requests from other processes would be
delayed.
If one partition of a disk is used in a real-time subvolume, the entire
disk is considered to be used for real-time operation. If one disk on a
SCSI controller is used for real-time operation then all the other
devices on that controller must be used for real-time operation as well.
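Taken together, these two rules imply a single check. If any partition on a SCSI controller is real-time, then every disk and device on that controller must also be real-time. The disk-level rule follows from this because a disk's partitions share its controller. The sketch below validates a configuration under that reading. The dev_entry structure is a hypothetical description of the system, and it assumes each entry is either a disk partition or a non-disk device such as a tape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one disk partition or other device. */
struct dev_entry {
    int  controller; /* SCSI controller the device is attached to */
    bool realtime;   /* used for real-time operation */
};

/* Returns true if no controller mixes real-time and non-real-time use. */
static bool realtime_layout_ok(const struct dev_entry *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!d[i].realtime)
            continue;
        /* Entry i is real-time, so everything on its controller must be. */
        for (size_t j = 0; j < n; j++)
            if (d[j].controller == d[i].controller && !d[j].realtime)
                return false;
    }
    return true;
}
```

A configuration with two real-time partitions on controller 0 and an ordinary disk on controller 1 passes. Adding a non-real-time partition or a CD-ROM on controller 0 would make it fail.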
In order to use the guaranteed-rate I/O mechanism effectively, the XLV
volume and XFS filesystem must be set up properly. The next section
gives an example.
By default, the ggd daemon will allow four process streams to obtain rate
guarantees. If support for more streams is desired, it is necessary to
obtain licenses for the additional streams. The license information is
stored in the /usr/var/netls/nodelock file and interpreted by the ggd
daemon on startup.
When configuring a system that will be used for guaranteed-rate I/O, it
is important to recognize that the system setup can affect GRIO. For
example, if nonstandard disk drives are added and the grio_disks file is
updated, this might have an impact on the performance of the SCSI
controllers (which should then be tuned through grio_disks). It is also
recommended that the real-time disks used for GRIO be placed by themselves
on the SCSI bus, because devices such as CD-ROMs and tapes might hold up
the bus if they are accessed while a GRIO request is being serviced. GRIO
does not model filesystem metadata writes, so keeping the data and log
subvolumes of the XLV volume on different SCSI buses from the real-time
disks helps in satisfying the guarantees. For real-time disks, it is
strongly recommended that each disk be partitioned to have only one
partition, the real-time partition. Finally, some system components carry
other data traffic as well as GRIO traffic, so it is important not to
overload these components with other data requests while GRIO requests are
being serviced.
The example in this section describes a method of laying out the disks,
filesystem, and real-time file that enables the greatest number of
processes to obtain guarantees on a single file concurrently. It is not
necessary to construct a file in this manner in order to use GRIO;
however, fewer processes can then obtain rate guarantees on the file. It
is also not necessary to use a real-time file; however, guarantees
obtained on non-real-time files can only be considered "soft" guarantees
at best, which may not be sufficient for some applications. Assume that
there are four disk partitions available for
the real-time subvolume of an XLV volume. Each one of the partitions is
on a different physical disk.
Before setting up the XFS filesystem, the I/O request size used by the
user process must be determined. To get the greatest I/O rate, the file
data should be striped across all the disks in the subvolume. To avoid
filesystem fragmentation and to force all I/O operations to be on stripe
boundaries, the file extent size should be an exact multiple of the volume
stripe width. Rate guarantees should be made with sizes equal to exact
multiples of the I/O request size, and all I/O request sizes must be exact
multiples of the optimal I/O size of the underlying disk devices.
The optimal I/O size is specified on a per-device basis in the
/etc/grio_disks file. Disk device characteristics are supplied for optimal
I/O sizes of 64K, 128K, 256K, and 512K bytes; the grio_bandwidth(1M)
utility can be used to determine the device characteristics for other
optimal I/O sizes. For simplicity, this example uses an optimal I/O size
of 64K bytes, and the stripe size of the XLV real-time subvolume for this
filesystem is set to an exact multiple of 64K bytes. With four disks
available, let the stripe step size be 64K bytes, so that the volume
stripe width is 4 x 64K = 256K bytes. The file extent size should be a
multiple of the volume stripe width; in this example, let the file extent
size be equal to the stripe width. Assume that the application always
issues I/O requests in sizes equal to the extent size.
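The sizing rules in this example reduce to a few divisibility checks. The sketch below encodes them for the four-disk, 64K-byte layout. The rt_layout structure and the guarantee_size parameter (the number of bytes reserved per period) are hypothetical names introduced here for illustration. The checks follow the rules stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KB 1024u

/* Hypothetical summary of the example layout. */
struct rt_layout {
    uint32_t ndisks;       /* disks in the real-time subvolume */
    uint32_t opt_io;       /* optimal I/O size per disk (/etc/grio_disks) */
    uint32_t stripe_step;  /* bytes placed on one disk before the next */
    uint32_t extent_size;  /* file extent size */
    uint32_t request_size; /* size of each application I/O request */
};

/* Volume stripe width: one stripe step on every disk. */
static uint32_t stripe_width(const struct rt_layout *l)
{
    return l->stripe_step * l->ndisks;
}

/* True if x is a non-zero exact multiple of unit. */
static bool is_multiple(uint64_t x, uint64_t unit)
{
    assert(unit != 0);
    return x != 0 && x % unit == 0;
}

/* Check the layout rules; guarantee_size is the bytes reserved per period. */
static bool rt_layout_ok(const struct rt_layout *l, uint64_t guarantee_size)
{
    return is_multiple(l->stripe_step, l->opt_io)
        && is_multiple(l->extent_size, stripe_width(l))
        && is_multiple(l->request_size, l->opt_io)
        && is_multiple(guarantee_size, l->request_size);
}
```

With ndisks = 4 and a 64K-byte stripe step, stripe_width() yields 256K bytes. A guarantee of 1M bytes per period (four 256K-byte requests) then satisfies every rule, whereas a 384K-byte guarantee or a 192K-byte extent size does not.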
Once the XLV volume and XFS filesystem have been created, the application
can create the real-time file. Real-time files must be read or written
using direct, synchronous I/O requests. (This is also true for GRIO
accesses to non-real-time files.) The open(2) manual page describes the
usage and buffer alignment restrictions of direct I/O. When
creating a real-time file, the F_FSSETXATTR command must be issued to set
the XFS_XFLAG_REALTIME flag. This can only be issued on a newly created
file. It is not possible to mark a file as real-time once non-real-time
data blocks have been allocated to it.
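Because real-time and GRIO files must use direct I/O, every buffer and transfer size must meet the filesystem's direct-I/O constraints. The sketch below shows one way to allocate a suitably aligned buffer with posix_memalign() and to check a request before issuing it. It assumes the limits are already known. We believe that on XFS they are normally obtained with the F_DIOINFO fcntl, but this sketch does not call it. Instead it takes them from a hypothetical dio_limits structure supplied by the caller. Setting XFS_XFLAG_REALTIME with F_FSSETXATTR is not shown.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the direct-I/O limits of a file; the caller
 * supplies the values (on XFS, presumably from F_DIOINFO). */
struct dio_limits {
    size_t mem_align; /* required buffer address alignment */
    size_t min_io;    /* transfer sizes must be multiples of this */
    size_t max_io;    /* largest single transfer */
};

/* True if buf/len may be used for one direct I/O transfer. */
static bool dio_request_ok(const struct dio_limits *lim,
                           const void *buf, size_t len)
{
    return ((uintptr_t)buf % lim->mem_align) == 0
        && len != 0 && len % lim->min_io == 0
        && len <= lim->max_io;
}

/* Allocate a buffer meeting the alignment rule; NULL on failure. */
static void *dio_alloc(const struct dio_limits *lim, size_t len)
{
    void *p = NULL;
    /* posix_memalign needs a power-of-two multiple of sizeof(void *). */
    assert((lim->mem_align & (lim->mem_align - 1)) == 0);
    assert(lim->mem_align % sizeof(void *) == 0);
    if (posix_memalign(&p, lim->mem_align, len) != 0)
        return NULL;
    return p;
}
```

In the layout used in this example, the application would allocate each 256K-byte request buffer with dio_alloc(). It would then confirm with dio_request_ok() before issuing the read or write, and release the buffer with free().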
After the real-time file has been created, the application can issue a
grio_reserve_fs(3X) and grio_associate_file(3X) pair to obtain the rate
guarantee. Once the rate guarantee is established, any read or write
requests that the application issues to the file will be completed within
the parameters of the guarantee. This will continue until the file is
closed, the guarantee is removed by the application via
grio_unreserve_bw(3X), or the guarantee expires.
Any process can use the grio_associate_file() call to switch the GRIO
reservation to itself if the PROC_SHARE_GUAR characteristic is set. This
causes the process that previously held the reservation to lose the rate
guarantee and the calling process to receive it. Similarly, the
grio_associate_file() call can be
used to switch the GRIO reservation from one file to another, within the
same filesystem, if the PER_FILE_SYS_GUAR characteristic is set.
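The transfer rules for grio_associate_file() can be summarized as a small state model. In the sketch below, a reservation records its owner, filesystem, and current file. A move to another process requires PROC_SHARE_GUAR, and a move to another file requires PER_FILE_SYS_GUAR and the same filesystem. The structure, the model_associate() function, and the NO_FILE marker for a reservation not yet associated with a file are all hypothetical. This is a model of the documented behavior, not the library's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define NO_FILE (-1) /* hypothetical: reservation not yet tied to a file */

/* Hypothetical model of a reservation; ggd keeps the real state. */
struct reservation {
    pid_t owner;      /* process currently holding the guarantee */
    int   fs_id;      /* filesystem the reservation was made on */
    int   file_id;    /* file currently associated, or NO_FILE */
    bool  proc_share; /* PROC_SHARE_GUAR (else PROC_PRIVATE_GUAR) */
    bool  per_fs;     /* PER_FILE_SYS_GUAR (else PER_FILE_GUAR) */
};

/* Models the documented association rules; returns true and updates r
 * if the caller may take the reservation for the given file. */
static bool model_associate(struct reservation *r, pid_t caller,
                            int fs_id, int file_id)
{
    if (caller != r->owner && !r->proc_share)
        return false; /* private: cannot move to another process */
    if (fs_id != r->fs_id)
        return false; /* never moves outside its filesystem */
    if (r->file_id != NO_FILE && file_id != r->file_id && !r->per_fs)
        return false; /* per-file: cannot move to another file */
    r->owner = caller; /* any previous holder loses the guarantee */
    r->file_id = file_id;
    return true;
}
```

In this model, a shared per-file reservation can pass between processes but stays bound to its first file. Setting per_fs also lets it move to another file on the same filesystem, but never to a file on a different one.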
If a rate cannot be guaranteed, ggd returns an error to the requesting
process. It also returns the amount of bandwidth currently available on
the device. The process can then determine if this amount is sufficient
and if so issue another rate guarantee request.
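The admission behavior described in this paragraph (refuse the request but report what remains) can be sketched as a simple bandwidth ledger for one device. The device_bw structure and model_reserve() are hypothetical. ggd's real accounting, which is based on /etc/grio_disks and its knowledge of the system, is more detailed than a single total per device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device ledger, in bytes per second. */
struct device_bw {
    uint64_t total;    /* usable bandwidth of the device */
    uint64_t reserved; /* bandwidth already guaranteed to others */
};

/* Try to reserve want bytes/s. Returns 0 on success; otherwise returns
 * -1 and stores the bandwidth currently available in *avail. */
static int model_reserve(struct device_bw *d, uint64_t want,
                         uint64_t *avail)
{
    assert(d->reserved <= d->total);
    uint64_t free_bw = d->total - d->reserved;
    if (want > free_bw) {
        *avail = free_bw; /* report what could still be granted */
        return -1;
    }
    d->reserved += want;
    *avail = free_bw - want;
    return 0;
}
```

A caller whose request fails can inspect *avail. If that amount is sufficient, the caller can call model_reserve() again with it, which mirrors the retry described above.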
ggd(1M), grio(1M), grio_bandwidth(1M), grio_associate_file(3X),
grio_query_fs(3X), grio_action_list(3X), grio_reserve_file(3X),
grio_reserve_fs(3X), grio_unreserve_bw(3X), grio_disks(4)
To make GRIO more secure, processes requesting guaranteed-rate I/O need
the CAP_DEVICE_MGMT privilege or root permissions; otherwise, their
requests will fail.