If / is the delimiter then the initial 'm' is optional. With the 'm' you can use any pair of non-alphanumeric characters as delimiters. This is particularly useful for matching Unix path names that contain '/'. If the final delimiter is followed by the optional letter 'i', the matching is done in a case-insensitive manner. PATTERN may contain references to scalar variables, which will be interpolated (and the pattern recompiled) every time the pattern search is evaluated. (Note that $) and $| may not be interpolated because they look like end-of-string tests.) If you want such a pattern to be compiled only once, add an "o" after the trailing delimiter. This avoids expensive run-time recompilations, and is useful when the value you are interpolating won't change over the life of the script. If the PATTERN evaluates to a null string, the most recent successful regular expression is used instead.
If used in a context that requires an array value, a pattern match returns an array consisting of the subexpressions matched by the parentheses in the pattern, i.e. ($1, $2, $3...). It does NOT actually set $1, $2, etc. in this case, nor does it set $+, $`, $& or $'. If the match fails, a null array is returned. If the match succeeds, but there were no parentheses, an array value of (1) is returned.
Examples:
    open(tty, '/dev/tty');
    <tty> =~ /^y/i && do foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;    # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2 and $Etc. The conditional is true if any variables were assigned, i.e. if the pattern matched.
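The three possible results of a match in an array context (captured subexpressions, the list (1), or a null array) can be sketched as follows. This is only an illustration, assuming a Perl 5 interpreter; the input string, the path and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input line, chosen only for illustration.
my $line = "alpha beta the rest of the line";

# Parentheses present, match succeeds: the subexpressions are returned.
my ($first, $second, $rest) = ($line =~ /^(\S+)\s+(\S+)\s*(.*)/);
print "first=$first second=$second rest=$rest\n";

# No parentheses, match succeeds: the list (1) is returned.
my @ok = ("/usr/spool/uucp/LOGFILE" =~ m#^/usr/spool/uucp#);
print "matched, got (@ok)\n";

# Match fails: a null array is returned, so a conditional on it is false.
my @none = ($line =~ /^(\d+)/);
print "no match, got ", scalar(@none), " elements\n";
```

Because a failed match yields a null array, the list assignment itself can serve as the condition of an if, as in the last example above.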
The "g" modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In an array context, it returns a list of all the substrings matched by all the parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern. In a scalar context, it iterates through the string, returning TRUE each time it matches, and FALSE when it eventually runs out of matches. (In other words, it remembers where it left off last time and restarts the search at that point.) It presumes that you have not modified the string since the last match. Modifying the string between matches may result in undefined behavior. (You can actually get away with in-place modifications via substr() that do not change the length of the entire string. In general, however, you should be using s///g for such modifications.) Examples:
    # array context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = ""; $* = 1;
    while ($paragraph = <>) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";
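The difference between the two contexts may be easier to see on a fixed string. The following is a minimal sketch, assuming a Perl 5 interpreter; the sample text stands in for real uptime output and is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical uptime-style text, hard-coded so the sketch does not
# depend on running the real command.
my $text = "load average: 0.42, 1.05, 2.50";

# Array context: every match of the parenthesized group comes back at once.
my @loads = ($text =~ /(\d+\.\d+)/g);
print "loads: @loads\n";

# Scalar context: each evaluation resumes where the previous match ended
# and returns false once no further match is found.
my $count = 0;
while ($text =~ /\d+\.\d+/g) {
    $count++;
}
print "count: $count\n";
```

The string is left unmodified between iterations of the while loop, in keeping with the restriction described above.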
    s/\bgreen\b/mauve/g;    # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/;    # run-time pattern

    ($foo = $bar) =~ s/bar/foo/;

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;                 # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;    # yields 'abc  246xyz'
    s/\w/$& x 2/eg;               # yields 'aabbcc  224466xxyyzz'

    s/([^ ]*) *([^ ]*)/$2 $1/;    # reverse 1st two fields

(Note the use of $ instead of \ in the last example. See section on regular expressions.)
For example, here is a loop which inserts index-producing entries before any line containing a certain pattern:
    while (<>) {
        study;
        print ".IX foo\n" if /\bfoo\b/;
        print ".IX bar\n" if /\bbar\b/;
        print ".IX blurfl\n" if /\bblurfl\b/;
        ...
        print;
    }

In searching for /\bfoo\b/, only those locations in $_ that contain 'f' will be looked at, because 'f' is rarer than 'o'. In general, this is a big win except in pathological cases. The only question is whether it saves you more time than it took to build the linked list in the first place.
Note that if you have to look for strings that you don't know till runtime, you can build an entire loop as a string and eval that to avoid recompiling all your patterns all the time. Together with undefining $/ to input entire files as one record, this can be very fast, often faster than specialized programs like fgrep. The following scans a list of files (@files) for a list of words (@words), and prints out the names of those files that contain a match:
    $search = 'while (<>) { study;';
    foreach $word (@words) {
        $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
    }
    $search .= "}";
    @ARGV = @files;
    undef $/;
    eval $search;    # this screams
    $/ = "\n";       # put back to normal input delim
    foreach $file (sort keys(%seen)) {
        print $file, "\n";
    }
If the c modifier is specified, the SEARCHLIST character set is complemented. If the d modifier is specified, any characters specified by SEARCHLIST that are not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some tr programs, which delete anything they find in the SEARCHLIST, period.) If the s modifier is specified, sequences of characters that were translated to the same character are squashed down to 1 instance of the character.
If the d modifier was used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated till it is long enough. If the REPLACEMENTLIST is null, the SEARCHLIST is replicated. This latter is useful for counting characters in a class, or for squashing character sequences in a class.
Examples:
    $ARGV[1] =~ y/A-Z/a-z/;    # canonicalize to lower case

    $cnt = tr/*/*/;            # count the stars in $_

    $cnt = tr/0-9//;           # count the digits in $_

    tr/a-zA-Z//s;              # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    y/a-zA-Z/ /cs;             # change non-alphas to single space

    tr/\200-\377/\0-\177/;     # delete 8th bit
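The rules for a short or null REPLACEMENTLIST and for the d, s and c modifiers can be checked side by side on small strings. This is a sketch only, assuming a Perl 5 interpreter; all strings and variable names are hypothetical.

```perl
use strict;
use warnings;

my ($padded, $deleted, $squashed, $spaced, $counted);

# REPLACEMENTLIST shorter than SEARCHLIST: the final character 'x' is
# replicated, so both 'b' and 'c' are translated to 'x'.
($padded = "aabbcc") =~ tr/abc/Ax/;
print "$padded\n";

# d modifier: REPLACEMENTLIST is taken exactly as given, so 'b' and 'c',
# which have no counterpart, are deleted.
($deleted = "aabbcc") =~ tr/abc/A/d;
print "$deleted\n";

# Null REPLACEMENTLIST with s: letters map to themselves and runs of the
# same letter are squashed to one instance.
($squashed = "bookkeeper") =~ tr/a-zA-Z//s;
print "$squashed\n";

# c and s together: each run of non-letters becomes a single space.
($spaced = "one, two;  three") =~ tr/a-zA-Z/ /cs;
print "$spaced\n";

# Null REPLACEMENTLIST alone: the string is unchanged and the return
# value is the number of characters matched.
my $digits = ($counted = "tel 555-0100") =~ tr/0-9//;
print "$digits digits\n";
```

Copying into a fresh variable first, as in ($HOST = $host) above, keeps the original string intact while the copy is translated.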