regexp(3Tcl)							  regexp(3Tcl)
NAME
     regexp - Match a regular expression against a string
SYNOPSIS
     regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
DESCRIPTION
     Determines whether the regular expression exp matches part or all of
     string and returns 1 if it does, 0 if it doesn't.

     If additional arguments are specified after string then they are
     treated as the names of variables in which to return information about
     which part(s) of string matched exp.  MatchVar will be set to the range
     of string that matched all of exp.  The first subMatchVar will contain
     the characters in string that matched the leftmost parenthesized
     subexpression within exp, the next subMatchVar will contain the
     characters that matched the next parenthesized subexpression to the
     right in exp, and so on.
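     A minimal sketch of the variable arguments, assuming a made-up address
     string; the names address, whole, user, and host are illustrative only.
     Braces around the pattern keep Tcl from substituting inside it.

```tcl
# Split a hypothetical "user@host" string with a single regexp call.
set address "jdoe@example.com"

# whole receives the text matched by all of exp; user and host receive
# the first and second parenthesized subexpressions, in that order.
if {[regexp {([^@]+)@(.+)} $address whole user host]} {
    puts "whole=$whole user=$user host=$host"
}
# prints whole=jdoe@example.com user=jdoe host=example.com
```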

     If the initial arguments to regexp start with - then they are treated
     as switches.  The following switches are currently supported:

     -nocase   Causes upper-case characters in string to be treated as
               lower case during the matching process.
     -indices  Changes what is stored in matchVar and the subMatchVars.
               Instead of storing the matching characters from string, each
               variable will contain a list of two decimal strings giving
               the indices in string of the first and last characters in the
               matching range of characters.
     --        Marks the end of switches.  The argument following this one
               will be treated as exp even if it starts with a -.
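     A short sketch of each switch against made-up strings; the result
     variables (plain, folded, all, run, dash, bad) are hypothetical names.
     The catch line assumes that regexp reports an error for an unrecognized
     switch such as -v, which is why -- is needed when exp itself begins
     with -.

```tcl
# -nocase: case differences are ignored while matching.
set plain  [regexp {hello} "Say HELLO"]          ;# 0
set folded [regexp -nocase {hello} "Say HELLO"]  ;# 1

# -indices: the variables receive "first last" index pairs, not text.
regexp -indices {(l+)o} "hello" all run
# all is "2 4" (the span llo), run is "2 3" (the span ll)

# --: exp begins with "-", so the switches are ended explicitly.
set dash [regexp -- {-v} "run -v now"]           ;# 1

# Without --, "-v" is read as an unknown switch and regexp fails.
set bad [catch {regexp -v "run -v now"} msg]     ;# 1
```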

     If there are more subMatchVars than parenthesized subexpressions
     within exp, or if a particular subexpression in exp doesn't match the
     string (e.g. because it was in a portion of the expression that wasn't
     matched), then the corresponding subMatchVar will be set to ``-1 -1''
     if -indices has been specified or to an empty string otherwise.
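     The sketch below shows both cases: a subexpression that takes no part
     in the match, and one variable more than there are subexpressions.
     All variable names are hypothetical.

```tcl
# Only the second branch takes part in the match, so the first
# parenthesized subexpression matches nothing.
regexp {(a)|(b)} "b" m first second
# first is "", second is "b"

regexp -indices {(a)|(b)} "b" m ifirst isecond
# ifirst is "-1 -1", isecond is "0 0"

# extra is one variable more than the single subexpression in exp.
regexp {(x)} "x" m only extra
# only is "x", extra is ""
```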

REGULAR EXPRESSIONS
     Regular expressions are implemented using Henry Spencer's package
     (thanks, Henry!), and much of the description of regular expressions
     below is copied verbatim from his manual entry.

     A regular expression is zero or more branches, separated by ``|''.  It
     matches anything that matches one of the branches.
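     A sketch of a two-branch expression; found, none, and m are
     illustrative names.  The string matches as long as one branch matches
     somewhere in it.

```tcl
# Two branches separated by |.
set found [regexp {cat|dog} "hotdog" m]   ;# 1, m is "dog"
set none  [regexp {cat|dog} "bird"]       ;# 0, neither branch matches
```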

     A branch is zero or more pieces, concatenated.  It matches a match for
     the first, followed by a match for the second, etc.

     A piece is an atom possibly followed by ``*'', ``+'', or ``?''.  An
     atom followed by ``*'' matches a sequence of 0 or more matches of the
     atom.  An atom followed by ``+'' matches a sequence of 1 or more
     matches of the atom.  An atom followed by ``?'' matches a match of the
     atom, or the null string.
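     The three piece operators, sketched with made-up strings; the result
     variable names are hypothetical.

```tcl
set zero [regexp {ab*c} "ac" m1]        ;# 1: b* also accepts no b; m1 is "ac"
set need [regexp {ab+c} "ac"]           ;# 0: b+ needs at least one b
set many [regexp {ab+c} "abbbc" m2]     ;# 1: m2 is "abbbc"
set opt  [regexp {colou?r} "color" m3]  ;# 1: the u is optional; m3 is "color"
```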

     An atom is a regular expression in parentheses (matching a match for
     the regular expression), a range (see below), ``.'' (matching any
     single character), ``^'' (matching the null string at the beginning of
     the input string), ``$'' (matching the null string at the end of the
     input string), a ``\'' followed by a single character (matching that
     character), or a single character with no other significance (matching
     that character).
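     A sketch of the atom kinds listed above.  The patterns are braced so
     that Tcl passes the backslash in \. to regexp unchanged; the variable
     names are illustrative.

```tcl
set dot   [regexp {^a.c$} "abc"]       ;# 1: . matches the single character b
set start [regexp {^a.c$} "xabc"]      ;# 0: ^ requires the match to begin at index 0
set esc   [regexp {^a\.c$} "abc"]      ;# 0: \. matches only a literal period
set lit   [regexp {^a\.c$} "a.c"]      ;# 1
set group [regexp {^(ab)+$} "ababab"]  ;# 1: the parenthesized atom repeats as a unit
```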

     A range is a sequence of characters enclosed in ``[]''.  It normally
     matches any single character from the sequence.  If the sequence
     begins with ``^'', it matches any single character not from the rest
     of the sequence.  If two characters in the sequence are separated by
     ``-'', this is shorthand for the full list of ASCII characters between
     them (e.g. ``[0-9]'' matches any decimal digit).  To include a literal
     ``]'' in the sequence, make it the first character (following a
     possible ``^'').  To include a literal ``-'', make it the first or last
     character.
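     Range examples, including the placement rules for a literal ``]'' and
     ``-''.  This is a sketch with made-up inputs and hypothetical variable
     names; braces keep Tcl from treating the brackets as command
     substitution.

```tcl
set digit   [regexp {[0-9]} "abc7" d]    ;# 1: d is "7"
set nondig  [regexp {^[^0-9]+$} "abc"]   ;# 1: no digits anywhere in the string
set bracket [regexp {[]x]} "a]b" m1]     ;# 1: m1 is the literal close bracket
set hyphen  [regexp {[a-]} "x-y" m2]     ;# 1: m2 is "-"
```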

CHOOSING AMONG ALTERNATIVE MATCHES
     In general there may be more than one way to match a regular
     expression to an input string.  For example, consider the command

          regexp  (a*)b*  aabaaabb  x  y

     Considering only the rules given so far, x and y could end up with the
     values aabb and aa, aaab and aaa, ab and a, or any of several other
     combinations.  To resolve this potential ambiguity regexp chooses among
     alternatives using the rule ``first then longest''.  In other words, it
     considers the possible matches in order working from left to right
     across the input string and the pattern, and it attempts to match
     longer pieces of the input string before shorter ones.  More
     specifically, the following rules apply in decreasing order of
     priority:

     [1]  If a regular expression could match two different parts of an
          input string then it will match the one that begins earliest.
     [2]  If a regular expression contains | operators then the leftmost
          matching sub-expression is chosen.
     [3]  In *, +, and ? constructs, longer matches are chosen in
          preference to shorter ones.
     [4]  In sequences of expression components the components are
          considered from left to right.
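     A sketch of rules 1 and 3 in isolation, assuming a tclsh whose regexp
     follows the rules above; early, m, greedy, and rest are hypothetical
     names.

```tcl
# Rule 1: the match that starts earliest wins, even though a longer
# run of b's appears later in the string.
regexp {b+} "abbcbbbb" early              ;# early is "bb"

# Rule 3: the first quantified piece takes as much as it can,
# leaving nothing for the second.
regexp {(a+)(a*)} "aaaa" m greedy rest    ;# greedy is "aaaa", rest is ""
```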

     In the example from above, (a*)b* matches aab:  the (a*) portion of
     the pattern is matched first and it consumes the leading aa; then the
     b* portion of the pattern consumes the next b.  Or, consider the
     following example:

          regexp  (ab|a)(b*)c  abc  x  y  z

     After this command x will be abc, y will be ab, and z will be an empty
     string.  Rule 4 specifies that (ab|a) gets first shot at the input
     string and Rule 2 specifies that the ab sub-expression is checked
     before the a sub-expression.  Thus the b has already been claimed
     before the (b*) component is checked and (b*) must match an empty
     string.
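     The two examples from this section can be checked directly.  This
     sketch braces the patterns (not required here, but a safe habit) and
     uses the hypothetical names x1 and y1 for the first example so that
     its results survive the second call.

```tcl
regexp {(a*)b*} aabaaabb x1 y1
puts "x1=$x1 y1=$y1"        ;# prints x1=aab y1=aa

regexp {(ab|a)(b*)c} abc x y z
puts "x=$x y=$y z=<$z>"     ;# prints x=abc y=ab z=<>
```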

KEYWORDS
     match, regular expression, string