NAME

     regexp - obsolete regular expression routines

SYNOPSIS

     #include <regexp.h>

     regexp *
     regcomp(const char *exp);

     int
     regexec(const regexp *prog, const char *string);

     void
     regsub(const regexp *prog, const char *source, char *dest);

DESCRIPTION

     This interface is made obsolete by regex(3).  It is available from the
     compatibility library, libcompat.

     The regcomp(), regexec(), regsub(), and regerror() functions implement
     egrep(1)-style regular expressions and supporting facilities.

     The regcomp() function compiles a regular expression into a structure of
     type regexp, and returns a pointer to it.  The space has been allocated
     using malloc(3) and may be released by free(3).

     The regexec() function matches a NUL-terminated string against the
     compiled regular expression in prog.  It returns 1 for success and 0 for
     failure, and adjusts the contents of prog's startp and endp (see below)
     accordingly.

     The members of a regexp structure include at least the  following (not
     necessarily in order):

           char *startp[NSUBEXP];
           char *endp[NSUBEXP];

     where NSUBEXP is defined (as 10) in the header file.  Once a successful
     regexec() has been done using the regexp, each startp-endp pair
     describes one substring within the string, with the startp pointing to
     the first character of the substring and the endp pointing to the first
     character following the substring.  The 0th substring is the substring
     of string that matched the whole regular expression.  The others are
     those substrings that matched parenthesized expressions within the
     regular expression, with parenthesized expressions numbered in
     left-to-right order of their opening parentheses.
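
     Because each endp points one past the end of its substring, the length
     of substring n is endp[n] - startp[n].  The following is a minimal
     sketch of copying one substring out, not code from the library.  The
     struct regexp_sketch type and the copy_substring() helper are
     hypothetical stand-ins, and the sketch assumes that a parenthesized
     expression which took no part in the match leaves NULL pointers, which
     this page does not state.

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10

/* Stand-in holding only the two documented members; not the real regexp. */
struct regexp_sketch {
	char *startp[NSUBEXP];
	char *endp[NSUBEXP];
};

/*
 * Copy substring n into buf as a NUL-terminated string.
 * Returns its length, or -1 if n is out of range, the substring is
 * unset (assumed to be marked by NULL pointers), or buf is too small.
 */
long
copy_substring(const struct regexp_sketch *prog, int n, char *buf,
    size_t bufsize)
{
	size_t len;

	if (n < 0 || n >= NSUBEXP)
		return -1;
	if (prog->startp[n] == NULL || prog->endp[n] == NULL)
		return -1;
	/* endp is one past the last character, so this is the length. */
	len = (size_t)(prog->endp[n] - prog->startp[n]);
	if (len >= bufsize)
		return -1;
	memcpy(buf, prog->startp[n], len);
	buf[len] = '\0';
	return (long)len;
}
```

     With a real regexp, the same pointer arithmetic would apply to
     prog->startp[n] and prog->endp[n] after a successful regexec().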

     The regsub() function copies source to dest, making substitutions
     according to the most recent regexec() performed using prog.  Each
     instance of `&' in source is replaced by the substring indicated by
     startp[0] and endp[0].  Each instance of `\n', where n is a digit, is
     replaced by the substring indicated by startp[n] and endp[n].  To get a
     literal `&' or `\n' into dest, prefix it with `\'; to get a literal `\'
     preceding `&' or `\n', prefix it with another `\'.
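
     The substitution rules can be sketched in a few lines of C.  This is an
     illustration of the behavior described above, not the library's source.
     The struct regexp_sketch type and the regsub_sketch() function are
     hypothetical, and unlike regsub() the sketch takes the size of dest and
     fails rather than writing past it.  It also assumes that unset
     substrings are marked by NULL pointers and contribute nothing.

```c
#include <stddef.h>
#include <string.h>

#define NSUBEXP 10

/* Stand-in holding only the two documented members; not the real regexp. */
struct regexp_sketch {
	char *startp[NSUBEXP];
	char *endp[NSUBEXP];
};

/*
 * Hypothetical regsub()-like routine.  Returns 0 on success, or -1 if
 * the result (including the terminating NUL) does not fit in dest.
 */
int
regsub_sketch(const struct regexp_sketch *prog, const char *source,
    char *dest, size_t destsize)
{
	size_t used = 0;
	char c;

	while ((c = *source++) != '\0') {
		int no = -1;	/* substring number, or -1 for none */

		if (c == '&')
			no = 0;
		else if (c == '\\' && *source >= '0' && *source <= '9')
			no = *source++ - '0';

		if (no < 0) {
			/* `\&' and `\\' drop the backslash; the rest is copied. */
			if (c == '\\' && (*source == '\\' || *source == '&'))
				c = *source++;
			if (used + 1 >= destsize)
				return -1;
			dest[used++] = c;
		} else if (prog->startp[no] != NULL && prog->endp[no] != NULL) {
			size_t len = (size_t)(prog->endp[no] - prog->startp[no]);

			if (used + len >= destsize)
				return -1;
			memcpy(dest + used, prog->startp[no], len);
			used += len;
		}
	}
	if (used >= destsize)
		return -1;
	dest[used] = '\0';
	return 0;
}
```

     Note that in C source each backslash in the template must itself be
     doubled, so the template `\2 \1' is written "\\2 \\1".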

     The regerror() function is called whenever an error is detected in
     regcomp(), regexec(), or regsub().  The default regerror() writes the
     string msg, with a suitable indicator of origin, on the standard error
     output and invokes exit(3).  The regerror() function can be replaced by
     the user if other actions are desirable.
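
     A replacement might record the message instead of exiting, as in the
     sketch below.  The prototype shown, taking a single const char *msg
     argument and returning nothing, is an assumption, since the SYNOPSIS
     above does not list regerror().  The last_regexp_error buffer and the
     regexp_last_error() accessor are hypothetical names.

```c
#include <stdio.h>

/* Most recent message passed to regerror(); empty if none so far. */
static char last_regexp_error[128];

/*
 * Hypothetical user-supplied regerror(): remember the message
 * (truncated to fit the buffer) and return instead of calling exit(3).
 */
void
regerror(const char *msg)
{
	snprintf(last_regexp_error, sizeof last_regexp_error, "%s", msg);
}

/* Accessor so callers need not touch the buffer directly. */
const char *
regexp_last_error(void)
{
	return last_regexp_error;
}
```

     When regerror() returns rather than exiting, the caller has to check
     results itself, for example the NULL that regcomp() returns on failure.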

REGULAR EXPRESSION SYNTAX

     A regular expression is zero or more branches, separated by `|'.  It
     matches anything that matches one of the branches.

     A branch is zero or more pieces, concatenated.  It matches a match for
     the first, followed by a match for the second, etc.

     A piece is an atom possibly followed by `*', `+', or `?'.  An atom
     followed by `*' matches a sequence of 0 or more matches of the atom.  An
     atom followed by `+' matches a sequence of 1 or more matches of the
     atom.  An atom followed by `?' matches a match of the atom, or the null
     string.

     An atom is a regular expression in parentheses (matching a match for the
     regular expression), a range (see below), `.' (matching any single
     character), `^' (matching the null string at the beginning of the input
     string), `$' (matching the null string at the end of the input string),
     a `\' followed by a single character (matching that character), or a
     single character with no other significance (matching that character).

     A range is a sequence of characters enclosed in `[]'.  It normally
     matches any single character from the sequence.  If the sequence begins
     with `^', it matches any single character not from the rest of the
     sequence.  If two characters in the sequence are separated by `-', this
     is shorthand for the full list of ASCII characters between them (e.g.,
     `[0-9]' matches any decimal digit).  To include a literal `]' in the
     sequence, make it the first character (following a possible `^').  To
     include a literal `-', make it the first or last character.
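
     The range rules can be made concrete with a small matcher.  This is a
     sketch of the rules as stated above, not the library's implementation.
     The range_match_sketch() function is hypothetical.  It takes a pointer
     to the character just after the opening `[', and it does not diagnose
     reversed spans such as `z-a', which simply match nothing here.

```c
#include <stddef.h>

/*
 * Decide whether c is matched by the range whose text starts at p,
 * the character just after the opening '['.
 * Returns 1 for a match, 0 for no match, -1 if the closing ']' is missing.
 */
int
range_match_sketch(const char *p, char c)
{
	int negate = 0, found = 0;
	const char *first;

	if (*p == '^') {	/* a leading '^' inverts the set */
		negate = 1;
		p++;
	}
	first = p;
	/* A ']' in first position is a literal, not the terminator. */
	while (*p != '\0' && (*p != ']' || p == first)) {
		unsigned char lo = (unsigned char)*p;
		unsigned char hi = lo;

		/* "x-y" is a span unless the '-' is the last character. */
		if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
			hi = (unsigned char)p[2];
			p += 2;
		}
		if ((unsigned char)c >= lo && (unsigned char)c <= hi)
			found = 1;
		p++;
	}
	if (*p != ']')
		return -1;
	return negate ? !found : found;
}
```

     For instance, range_match_sketch("0-9]", '7') corresponds to testing
     the character `7' against `[0-9]'.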

AMBIGUITY

     If a regular expression could match two different parts of the input
     string, it will match the one which begins earliest.  If both begin in
     the same place but match different lengths, or match the same length in
     different ways, life gets messier, as follows.

     In general, the possibilities in a list of branches are considered in
     left-to-right order, the possibilities for `*', `+', and `?' are
     considered longest-first, nested constructs are considered from the
     outermost in, and concatenated constructs are considered leftmost-first.
     The match that will be chosen is the one that uses the earliest
     possibility in the first choice that has to be made.  If there is more
     than one choice, the next will be made in the same manner (earliest
     possibility) subject to the decision on the first choice.  And so forth.

     For example, `(ab|a)b*c' could match `abc' in one of two ways.  The
     first choice is between `ab' and `a'; since `ab' is earlier, and does
     lead to a successful overall match, it is chosen.  Since the `b' is
     already spoken for, the `b*' must match its last possibility--the empty
     string--since it must respect the earlier choice.

     In the particular case where no `|'s are present and there is only one
     `*', `+', or `?', the net effect is that the longest possible match will
     be chosen.  So `ab*', presented with `xabbbby', will match `abbbb'.
     Note that if `ab*' is tried against `xabyabbbz', it will match `ab' just
     after `x', due to the begins-earliest rule.  (In effect, the decision on
     where to start the match is the first choice to be made; hence
     subsequent choices must respect it even if this leads them to
     less-preferred alternatives.)
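
     The two `ab*' cases can be tried on a current system with the sketch
     below.  It is written against POSIX regex(3), the interface that
     replaces this one, because <regexp.h> may not be installed.  Note that
     regcomp() and regexec() there share names with the routines on this
     page but take different arguments.  POSIX picks the leftmost, longest
     match, which for these two single-`*' patterns agrees with the rules
     above; the two interfaces are not claimed to agree in every ambiguous
     case.  The first_match() helper is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the text that pattern matches in string into buf.
 * Returns 1 on a match, 0 on no match, -1 on a compile error, an
 * execution error, or a buffer that is too small.
 */
int
first_match(const char *pattern, const char *string, char *buf,
    size_t bufsize)
{
	regex_t re;
	regmatch_t m[1];
	size_t len;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	rc = regexec(&re, string, 1, m, 0);
	regfree(&re);		/* m[] stays valid after the free */
	if (rc == REG_NOMATCH)
		return 0;
	if (rc != 0)
		return -1;
	/* rm_so and rm_eo are offsets, playing the role of startp/endp. */
	len = (size_t)(m[0].rm_eo - m[0].rm_so);
	if (len >= bufsize)
		return -1;
	memcpy(buf, string + m[0].rm_so, len);
	buf[len] = '\0';
	return 1;
}
```

     Calling first_match("ab*", "xabbbby", buf, sizeof buf) and then
     first_match("ab*", "xabyabbbz", buf, sizeof buf) exercises the
     longest-match and begins-earliest cases respectively.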

RETURN VALUES

     The regcomp() function returns NULL for a failure (regerror()
     permitting), where failures are syntax errors, exceeding implementation
     limits, or applying `+' or `*' to a possibly null operand (one that can
     match the null string).

SEE ALSO

     ed(1), egrep(1), ex(1), expr(1), fgrep(1), grep(1), regex(3)

HISTORY

     Both code and manual page for regcomp(), regexec(), regsub(), and
     regerror() were written at the University of Toronto and appeared in
     4.3BSD-Tahoe.  They are intended to be compatible with the Bell V8
     regexp(), but are not derived from Bell code.

BUGS

     Empty branches and empty regular expressions are not portable to V8.

     The restriction against applying `*' or `+' to a possibly null operand
     is an artifact of the simplistic implementation.

     Does not support egrep(1)'s newline-separated branches; neither does the
     V8 regexp() though.

     Due to emphasis on compactness and simplicity, it's not strikingly fast.
     It does give special attention to handling simple cases quickly.

OpenBSD 3.6                      June 4, 1993