advance, advance_r, compile, compile_r, step, step_r -
Regular expression compile and match routines
#define INIT declarations
#define GETC getc code
#define PEEKC peek code
#define UNGETC(c) ungetc code
#define RETURN(ptr) return code
#define ERROR(val) error code
#include <regexp.h>
char *compile(
        char *instring,
        char *expbuf,
        const char *endbuf,
        int eof );
int step(
        const char *string,
        const char *expbuf );
int advance(
        const char *string,
        const char *expbuf );
extern char *loc1, *loc2, *locs;
The following functions do not conform to current standards
and are supported only for backward compatibility:
char *compile_r(
        char *instring,
        char *expbuf,
        char *endbuf,
        int eof,
        struct regexp_data *regexp_data );
int advance_r(
        char *string,
        char *expbuf,
        struct regexp_data *regexp_data );
int step_r(
        char *string,
        char *expbuf,
        struct regexp_data *regexp_data );
Interfaces documented on this reference page conform to
industry standards as follows:
advance(), compile(), step(): XSH4.2
Refer to the standards(5) reference page for more information
about industry standards and associated tags.
c
        The value of the next character (byte) in the regular
        expression pattern. Returned by the next call to the
        GETC() and PEEKC() macros.
ptr
        Specifies a pointer to the character following the
        last character of the compiled regular expression.
val
        Specifies an error value.
instring
        Specifies a string to be passed to the compile()
        function. The instring parameter is never used
        explicitly by the compile() function, but you can use
        it in your macros. For example, you may want to pass
        the string containing a pattern as the instring
        parameter to the compile() function and use the INIT()
        macro to set a pointer to the beginning of this
        string. When your macros do not use instring, call
        the compile() function with a value of ((char *) 0)
        for this parameter.
expbuf
        Points to a character array where the compiled
        regular expression is stored.
endbuf
        Points to the location that immediately follows the
        character array where the compiled regular expression
        is stored. When the compiled expression cannot be
        contained in (endbuf-expbuf) bytes, a call to the
        ERROR(_BIGREGEXP) macro is made (see the ERRORS
        section).
eof
        Specifies the character that marks the end of the
        regular expression. For example, in ed this character
        is usually a / (slash).
string
        Points to a null-terminated string of characters, in
        the step() function, to be searched for a match.
regexp_data
        Is data for the compile_r(), step_r(), and
        advance_r() functions.
The compile(), advance(), and step() functions are used
for general-purpose expression matching.
The compile() function takes a simple regular expression
as input and produces a compiled expression that can be
used with the step() and advance() functions.
The following six macros, used in the compile() function,
must be defined before the #include <regexp.h> statement
in programs. The GETC(), PEEKC(), and UNGETC() macros
operate on the regular expression provided as input for
the compile() function.

INIT
        The INIT() macro is used for dependent declarations
        and initializations. In the regexp.h header file,
        this macro is placed immediately after the
        declaration of the compile() function and its
        opening { (left brace). Your INIT() declarations
        must end with a ; (semicolon). The INIT() macro is
        frequently used to set a register variable to point
        to the beginning of the regular expression, so that
        this pointer can be used in the definitions of
        GETC(), PEEKC(), and UNGETC(). Alternatively, you
        can use INIT() to declare external variables that
        GETC(), PEEKC(), and UNGETC() need.
GETC
        The GETC() macro returns the value of the next
        character (byte) in the regular-expression pattern.
        Successive calls to GETC() return successive
        characters of the regular expression.
PEEKC
        The PEEKC() macro returns the next character (byte)
        in the regular expression. Immediate subsequent
        calls to this macro return the same byte, which is
        also the next character returned by the GETC() macro.
UNGETC(c)
        The UNGETC() macro causes the c parameter to be
        returned by the next call to the GETC() and PEEKC()
        macros. No more than one character of pushback is
        ever needed because this character is guaranteed to
        be the last character read by the GETC() macro. The
        value of the UNGETC() macro is always ignored.
RETURN(ptr)
        The RETURN() macro is used for normal exit of the
        compile() function. The value of the ptr parameter
        is a pointer to the character following the last
        character of the compiled regular expression. This
        is useful in programs that manage memory allocation.
ERROR(val)
        The ERROR() macro is the abnormal return from the
        compile() function. A call to this macro should
        never return a value. In this macro, val is an
        error number, which is described in the ERRORS
        section of this reference page.
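
The input macros described above can be illustrated without
including <regexp.h>. The following is a minimal sketch, assuming
the pattern is held in an ordinary null-terminated string. The
function count_pattern_bytes() is hypothetical; it only stands in
for a function body in which the macros are expanded and is not
the compile() function.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the macros a caller would define
 * before including <regexp.h>. INIT declares the cursor that the
 * other macros share; its text must end with a semicolon. */
#define INIT        const char *sp = instring;
#define GETC        (*sp++)   /* return next byte and advance       */
#define PEEKC       (*sp)     /* return next byte without advancing */
#define UNGETC(c)   (--sp)    /* push back the byte just read       */

/* Illustration only: walk the pattern with the macros and stop at
 * the eof delimiter or at the end of the string. Returns the
 * number of bytes that precede the stopping point. */
size_t count_pattern_bytes(const char *instring, int eof)
{
    INIT
    size_t n = 0;
    int c;

    while (PEEKC != '\0') {
        c = GETC;
        if (c == eof) {
            UNGETC(c);        /* leave the delimiter unread */
            break;
        }
        n++;
    }
    return n;
}
```

In this sketch a call such as count_pattern_bytes("abc/def", '/')
consumes the three bytes before the slash and pushes the slash
back, which shows why a single character of pushback is enough.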
The step() function finds the first substring of the
string parameter that matches the compiled expression
pointed to by the expbuf parameter. When there is no
match, the step() function returns a value of 0 (zero).
When there is a match, the step() function returns a
nonzero value and sets two global character pointers:
loc1, which points to the first character of the substring
that matches the pattern, and loc2, which points to the
character immediately following the substring that matches
the pattern. When the regular expression matches the
entire string, loc1 points to the first character of
the string parameter and loc2 points to the null character
at the end of that string.
The step() function uses the integer variable circf, which
is set by the compile() function when the regular expression
begins with a ^ (circumflex). When this variable is
set, the step() function only tries to match the regular
expression to the beginning of the string. When you compile
more than one regular expression before executing the
first one, save the value of circf for each compiled
expression and set circf to the saved value before each
call to step().
The advance() function tests whether an initial substring
of the string parameter matches the expression pointed to
by the expbuf parameter. Using the same parameters that
were passed to it, the step() function calls the advance()
function. The step() function increments a pointer through
the string parameter characters and calls advance() until
a nonzero value, which indicates a match, is returned, or
until the end of the string pointed to by the string
parameter is reached. To constrain the match unconditionally
to the beginning of the string, call the
advance() function directly instead of calling step().
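
The relationship between the two functions can be sketched with a
toy matcher. The code below is an illustration under stated
assumptions, not the implementation in regexp.h: the names
toy_advance(), toy_step(), toy_loc1, and toy_loc2 are hypothetical,
the pattern is matched as literal bytes only, and the anchored
argument plays the role that circf plays for step().

```c
#include <stddef.h>

/* Hypothetical analogs of the loc1 and loc2 globals. */
const char *toy_loc1, *toy_loc2;

/* Anchored test: does pat (literal bytes only) match at the
 * start of s? On success, record where the match ended. */
int toy_advance(const char *s, const char *pat)
{
    while (*pat != '\0') {
        if (*s == '\0' || *s != *pat)
            return 0;
        s++;
        pat++;
    }
    toy_loc2 = s;
    return 1;
}

/* Search: try toy_advance() at each position in turn. When
 * anchored is nonzero, only the first position is tried. */
int toy_step(const char *s, const char *pat, int anchored)
{
    do {
        if (toy_advance(s, pat)) {
            toy_loc1 = s;
            return 1;
        }
    } while (!anchored && *s++ != '\0');
    return 0;
}
```

Calling toy_advance() directly corresponds to calling advance()
instead of step(): the match is attempted only at the start of the
string, whatever the pattern contains.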
When the advance() function encounters an * (asterisk) or
a \{\} sequence in the regular expression, it advances its
pointer to the string to be matched as far as possible and
recursively calls itself, trying to match the remainder of
the regular expression. As long as there is no match, the
advance() function backs up along the string until the
function finds a match or reaches the point in the string
where the initial match with the * or \{\} character
occurred.
It is sometimes desirable to stop this backing up before
the initial pointer position in the string is reached.
When the locs global character pointer is equal to the
pointer position in the string at some point during the
backing-up process, the advance() function breaks out of
the loop that backs up and returns the value 0 (zero).
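
The backing-up behavior and the early exit can be sketched for the
simplest case, a single character followed by * and then a literal
remainder. This is a toy under stated assumptions, not the code in
regexp.h: the names match_rest() and toy_star() are hypothetical,
and the stop argument stands in for the locs pointer.

```c
#include <stddef.h>

/* Literal prefix test used for the remainder of the pattern. */
static int match_rest(const char *s, const char *rest)
{
    while (*rest != '\0') {
        if (*s != *rest)
            return 0;
        s++;
        rest++;
    }
    return 1;
}

/* Match "c*" followed by the literal rest at the start of s.
 * stop plays the role of locs: backing up ends without a match
 * if the cursor reaches it. Pass NULL for no early stop. */
int toy_star(int c, const char *rest, const char *s, const char *stop)
{
    const char *start = s;          /* where the c* match began */
    const char *p = s;

    while (*p != '\0' && *p == c)   /* take as many c's as possible */
        p++;

    for (;;) {
        if (p == stop)
            return 0;               /* early exit, as with locs */
        if (match_rest(p, rest))
            return 1;
        if (p == start)
            return 0;               /* backed up to the start */
        p--;
    }
}
```

With the string "aaab" and the pattern a*ab, the sketch first
consumes three a characters, fails to match the remainder, backs up
one position, and succeeds. If stop points at that position, the
function returns 0 (zero) instead.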
The compile_r(), step_r(), and advance_r() functions are
the reentrant versions of the compile(), step(), and
advance() functions. They are supported in order to maintain
backward compatibility with operating system versions
prior to Tru64 UNIX Version 4.0.
The regexp.h header file defines the regexp_data structure.
This interface has been deprecated in favor of the regcomp()
interface specified by the POSIX and X/Open standards
and may be retired. If possible, you should migrate
regexp() regular expression routines to the routines
offered under the regcomp() and regexec() interfaces (see
regcomp(3)).
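
As a starting point for such a migration, the following is a
minimal sketch of a search that uses regcomp() and regexec()
instead of compile() and step(). The helper name find_match() and
its return convention are assumptions made for this example. The
rm_so and rm_eo offsets of the first match take the place of the
loc1 and loc2 pointers.

```c
#include <regex.h>
#include <stddef.h>

/* Search line for the basic regular expression pattern.
 * On a match, store the start and end offsets and return 1.
 * Return 0 for no match and -1 if the pattern does not compile. */
int find_match(const char *pattern, const char *line,
               size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, 0) != 0)   /* 0 selects basic REs */
        return -1;

    rc = regexec(&re, line, 1, m, 0);
    if (rc == 0) {
        *start = (size_t)m[0].rm_so;     /* analog of loc1 */
        *end = (size_t)m[0].rm_eo;       /* analog of loc2 */
    }
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Because the compiled expression lives in a caller-owned regex_t,
no macros have to be defined before the header is included, and no
global state such as circf has to be saved between patterns.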
The regexp interface is provided to support System V
applications. Traditional BSD applications use different
functions for regular expression handling. See the
re_comp(3) and re_exec(3) reference pages.
The advance(), compile(), and step() functions are scheduled
to be withdrawn from a future version of the X/Open
CAE Specification.
Upon successful completion, the compile() function calls
the RETURN() macro. Upon failure, this function calls the
ERROR() macro.
Whenever a successful match occurs, the step() and
advance() functions return a nonzero value. Upon failure,
these functions return a value of 0 (zero).
[Tru64 UNIX] The compile_r(), step_r(), and advance_r()
functions return the same values as their non-reentrant
counterparts.
If any of the following conditions occurs, the compile()
or compile_r() functions call the ERROR() macro with an
error value as its argument:

        The range endpoint is too large.
        A bad number was received.
        The number in \digit is out of range.
        There is an illegal or missing delimiter.
        There is no remembered search string.
        The use of a pair of \( and \) is unbalanced.
        There are too many \( and \) pairs (exceeds the
        maximum value set for _NBRA in regexp.h, usually 9).
        More than two numbers are given in the \{ and \} pair.
        A } character was expected after a \.
        The first number exceeds the second in the \{ and \}
        pair.
        There is a [ ] pair imbalance.
        There is a regular expression overflow.
        [Tru64 UNIX] There was an unknown error.
The following is an example of the regular expression
macros and calls from the grep command:
#define INIT       register char *sp=instring;
#define GETC       (*sp++)
#define PEEKC      (*sp)
#define UNGETC(c)  (--sp)
#define RETURN(c)  return;
#define ERROR(c)   regerr
#include <regexp.h>
. . .
compile (patstr, expbuf, &expbuf[ESIZE], '\0');
. . .
if (step (linebuf, expbuf))
succeed( );
. . .
Functions: ctype(3), fnmatch(3), glob(3), regcomp(3),
re_comp(3)
Commands: ed(1), sed(1), grep(1)
Standards: standards(5)
regexp(3)