wsregexp(3W) wsregexp(3W)
wsregexp: wsrecompile, wsrestep, wsrematch, wsreerr - wide-character-based
regular expression compile and match routines
#include <wsregexp.h>
#include <widec.h>
long wsrecompile(struct rexdata *prex, long *expbuf,
long *endbuf, wchar_t eof);
int wsrestep(struct rexdata *prex, wchar_t *wstr, long *expbuf);
int wsrematch(struct rexdata *prex, wchar_t *wstr, long *expbuf);
char *wsreerr(int err);
These functions are general purpose internationalized regular expression
matching routines to be used in programs that perform regular expression
matching. These functions are defined by the wsregexp.h header file.
The function wsrecompile takes as input an internationalized regular
expression as defined below (in addition to the normal regular expressions
defined by regexp) and produces a compiled expression that can be used
with wsrestep or wsrematch.
struct rexdata {
short sed; /* flag for sed */
wchar_t *str; /* regular expression */
int err; /* returned error code, 0 = no error */
wchar_t *loc1;
wchar_t *loc2;
int circf;
...
};
The first parameter, prex, is a pointer to the specification of the
regular expression. prex->sed should be non-zero if sed style delimiter
syntax is to be adopted. prex->str should point to the regular expression
that needs to be compiled. The regular expression string should be in
wide character format. prex->err indicates any error during the
compilation and use of this regular expression. expbuf points to the
place where the compiled regular expression will be placed. endbuf points
to the first long after the space where the compiled regular expression
may be placed. (endbuf-expbuf) should be large enough for the compiled
regular expression to fit. eof is the wide character which marks the end
of the regular expression. This character is usually a / (slash).
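Because prex->str must be in wide character format, a pattern written as
an ordinary multibyte string has to be converted first. The following is a
minimal sketch of that step using only the standard C function mbstowcs;
the helper name to_wide_pattern is hypothetical and is not part of
wsregexp. It reports failure when the pattern is invalid in the current
locale or when the destination has no room for the terminating null.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multibyte pattern to a null-terminated wide-character string.
 * dstlen is the size of dst in wide characters, including the terminator.
 * Returns the number of wide characters stored (excluding the terminator),
 * or (size_t)-1 if the pattern is invalid in the current locale or does
 * not fit. Hypothetical helper, not part of wsregexp.
 */
size_t to_wide_pattern(wchar_t *dst, size_t dstlen, const char *src)
{
    size_t n;

    if (dstlen == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, dstlen);
    if (n == (size_t)-1)
        return (size_t)-1;      /* invalid multibyte sequence */
    if (n == dstlen)
        return (size_t)-1;      /* dst filled up; no terminator was stored */
    return n;
}
```

On success, dst can be assigned to prex->str before calling wsrecompile.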
If wsrecompile was successful, it returns the pointer to the end of the
regular expression, endbuf. Otherwise, 0 is returned and the error code
is set in prex->err.
The functions wsrestep and wsrematch do pattern matching given a null
terminated wide character string wstr and a compiled regular expression
expbuf as input. expbuf for these functions should be the compiled
regular expression which was obtained by a call to the function
wsrecompile.
The function wsrestep returns non-zero if some substring of wstr matches
the regular expression in expbuf and zero if there is no match. The
function wsrematch returns non-zero if a substring of wstr starting from
the beginning matches the regular expression in expbuf and zero if there
is no match. If there is a match, prex->loc1 and prex->loc2 are set.
prex->loc1 points to the first wide character that matched the regular
expression; prex->loc2 points to the wide character after the last wide
character that matches the regular expression. Thus if the regular
expression matches the entire input string, prex->loc1 will point to the
first wide character of wstr and prex->loc2 will point to the null at the
end of wstr.
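Since prex->loc1 and prex->loc2 delimit the half-open span
[loc1, loc2) inside wstr, the matched text can be copied out with plain
pointer arithmetic. The sketch below assumes the two pointers come from a
successful wsrestep or wsrematch call; copy_match is a hypothetical helper
name, not part of wsregexp.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Copy the matched span [loc1, loc2) into out and null-terminate it.
 * outlen is the size of out in wide characters, including the terminator.
 * Returns the match length in wide characters, or (size_t)-1 if out is
 * too small. Hypothetical helper, not part of wsregexp.
 */
size_t copy_match(const wchar_t *loc1, const wchar_t *loc2,
                  wchar_t *out, size_t outlen)
{
    size_t len = (size_t)(loc2 - loc1);   /* loc2 is one past the match */

    if (len >= outlen)
        return (size_t)-1;
    wmemcpy(out, loc1, len);
    out[len] = L'\0';
    return len;
}
```

When the whole input matches, loc2 points at the terminating null of wstr,
so the returned length equals the length of wstr.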
wsrestep uses the variable circf of struct rexdata which is set by
wsrecompile if the regular expression begins with ^ (caret). If this is
set then wsrestep will try to match the regular expression to the
beginning of the string only. If more than one regular expression is to
be compiled before the first is executed, the value of prex->circf should
be saved for each compiled expression and should be set to that saved
value before each call to wsrestep.
wsreerr returns the error message corresponding to the error code in the
language of the current locale. The error code err should be one returned
by the wsregexp functions in the err variable of struct rexdata.
The internationalized regular expressions available for use with the
wsregexp functions are constructed as follows:
Expression Meaning
c the character c where c is not a special character.
[[:class:]] class is any character type as defined by the LC_CTYPE locale
category. class can be one of the following:
alpha a letter
upper an upper-case letter
lower a lower-case letter
digit a decimal digit
xdigit a hexadecimal digit
alnum an alphanumeric character
space any whitespace character
punct a punctuation character
print a printable character
graph a character that has a visible representation
cntrl a control character
[[=c=]] An equivalence class, that is, any collation element defined as
having the same relative order in the current collation
sequence as c. As an example, if A and a belong to the same
equivalence class, then both [[=A=]b] and [[=a=]b] are
equivalent to [Aab].
[[.cc.]] This represents a multi-character collating symbol. Multi-character
collating elements must be represented as collating
symbols to distinguish them from single-character collating
elements. As an example, if the string ab is a valid
collating element, then [[.ab.]] will be treated as an
element and will match the same string of characters, while
ab will match the list of characters a and b. If the multi-character
collating symbol is not a valid collating element
in the current collating sequence definition, the symbol will
be treated as an invalid expression.
[[c-c]] Any collation element in the character expression range c-c,
where c can identify a collating symbol or an equivalence
class. If the character - (hyphen) appears immediately after
an opening square bracket, e.g. [-c], or immediately prior to
a closing square bracket, e.g. [c-], it has no special
meaning.
Immediately following an opening square bracket ^ means the complement
of, e.g. [^c]. Otherwise, it has no special meaning.
Within square brackets, a . that is not part of a [[.cc.]] sequence, or
a : that is not part of a [[:class:]] sequence, matches itself.
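The class names listed above for [[:class:]] are also the names accepted
by the standard C function wctype, so membership of a wide character in a
class can be checked independently of any regular expression. The sketch
below illustrates this with wctype and iswctype; whether wsregexp itself is
implemented this way is not stated here, and in_class is a hypothetical
helper name.

```c
#include <wctype.h>

/*
 * Return non-zero if wc belongs to the named character class in the
 * current locale, and zero if it does not or if the class name is
 * unknown. Hypothetical helper, not part of wsregexp.
 */
int in_class(wint_t wc, const char *class_name)
{
    wctype_t desc = wctype(class_name);   /* 0 for an unknown class name */

    if (desc == (wctype_t)0)
        return 0;
    return iswctype(wc, desc) != 0;
}
```

Results depend on the locale in effect, so a program that calls setlocale
may see different answers for characters outside the portable character set.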
regexp(5)
Errors are:
ERR_NORMBR no remembered search string
ERR_REOVFLOW regexp overflow
This happens when wsrecompile cannot fit the
compiled regular expression in (endbuf-expbuf).
ERR_BRA ( ) imbalance
ERR_DELIM illegal or missing delimiter.
ERR_NBR bad number in { }
ERR_2MNBR more than 2 numbers given in { }
ERR_DIGIT digit out of range
ERR_2MLBRA too many (
ERR_RANGE range number too large
ERR_MISSB } expected after \
ERR_BADRNG first number exceeds second in { }.
ERR_SIMBAL [ ] imbalance.
ERR_SYNTAX illegal regular expression
ERR_ILLCLASS illegal [:class:]
ERR_EQUIL illegal [=class=]
ERR_COLL illegal [.cc.]
The following is an example of how the regular expression macros and
calls might be defined by an application program:
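#include <stdio.h> /* BUFSIZ, fprintf */
#include <stdlib.h> /* mbstowcs, mbtowc */
#include <string.h> /* strlen */
#include <wsregexp.h>
#include <widec.h>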
. . .
struct rexdata rex;
long expbuf [BUFSIZ]; /* Buffer for the compiled RE */
/* Define a RE to identify a capitalized word */
char *regexp = "[[:space:]][[:upper:]]";
wchar_t wregexp [512];
wchar_t weof; /* The end of regular expression */
char eof = '\0';
wchar_t linebuf [BUFSIZ]; /* Buffer for the input string */
. . .
(void) mbstowcs(wregexp, regexp, strlen(regexp)+1);
(void) mbtowc(&weof, &eof, 1);
rex.str = wregexp;
rex.sed = 0;
rex.err = 0;
if (!wsrecompile(&rex, expbuf, &expbuf[BUFSIZ], weof))
fprintf(stderr, "%s\n", wsreerr(rex.err));
. . .
if (wsrestep(&rex, linebuf, expbuf))
succeed;