Simple Patterns

In Perl, patterns are enclosed in a pattern match operator, which is usually written as m//. A simple pattern might look like this:


m/Simon/


The preceding pattern matches the letters S-i-m-o-n in sequence. But where is it looking for Simon? Previously, you learned that the Perl variable $_ is frequently used when Perl needs a default value. Pattern matches occur against $_ unless you tell Perl otherwise (which you'll learn about later). So the preceding pattern looks for S-i-m-o-n in the scalar variable $_.

If the pattern specified by m// is found anywhere in the variable $_, the match operator returns true. Thus, the normal place to see pattern matches is in a conditional expression, as shown here:


if (m/Piglet/) {
    # the pattern "Piglet" is in $_
}
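Here is a minimal self-contained sketch of the default target in action. A foreach loop with no loop variable assigns each string to $_ in turn, and the bare m/Piglet/ is tested against it. The sample strings and the @found array are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines; each one becomes $_ in turn.
my @lines = (
    "Piglet and Pooh went for a walk",
    "Eeyore lost his tail",
    "Christopher Robin found Piglet",
);

my @found;
foreach (@lines) {          # foreach with no loop variable sets $_
    if (m/Piglet/) {        # no target given, so Perl searches $_
        push @found, $_;
    }
}

print "$_\n" for @found;
```

Only the first and third strings contain Piglet, so only those two are printed.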


Inside the pattern, every character matches itself unless it is a metacharacter. Most of the "normal" characters match themselves: A to Z, a to z, and digits. Metacharacters are characters that change the behavior of the pattern match. The list of metacharacters is as follows:


^ $ ( ) \ | @ [ { ? . + *


You'll shortly read about what the metacharacters do. If your pattern contains a metacharacter that you want to match for its literal value, simply precede the metacharacter with a backslash, as shown here:


m/I won \$10 at the fair/;   # The $ is treated as a literal dollar sign.
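The following sketch tests a few escaped metacharacters against $_. The sample strings are made up. Note the single quotes, which keep Perl from treating $10 inside the string as a variable. The unescaped + relies on a metacharacter meaning that is explained later; it is included only to show why the backslash matters.

```perl
use strict;
use warnings;

# Single quotes keep $10 from being interpolated as a variable.
$_ = 'I won $10 at the fair';

# \$ makes the dollar sign an ordinary character instead of a metacharacter.
my $escaped = m/I won \$10 at the fair/ ? 1 : 0;

# \+ matches only a literal plus sign; \. matches only a literal period.
$_ = '2+2 is 4.';
my $plus_and_dot = m/2\+2 is 4\./ ? 1 : 0;

# Unescaped, + keeps its special meaning ("one or more of the preceding 2"),
# so this pattern does NOT match the literal text "2+2 is 4.".
my $unescaped = m/2+2 is 4\./ ? 1 : 0;

print "escaped dollar: $escaped\n";
print "escaped plus and dot: $plus_and_dot\n";
print "unescaped plus: $unescaped\n";
```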


Earlier, you read that the pattern match operator is usually represented by m//. In reality, you can replace the slashes with almost any other punctuation character, such as the commas in the following example:


if (m,Waldo,) { print "Found Waldo.\n"; }
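As a quick sketch, the script below runs the same pattern with several delimiters against one made-up string. Besides single characters such as commas or exclamation marks, bracketing characters like { } can also serve as delimiters: the opening character starts the pattern and its partner ends it.

```perl
use strict;
use warnings;

$_ = "Where's Waldo? Waldo is hiding in the crowd.";

# The same pattern written with different delimiters.
my @results = (
    (m/Waldo/ ? 1 : 0),    # the usual slashes
    (m,Waldo, ? 1 : 0),    # commas, as in the text
    (m!Waldo! ? 1 : 0),    # exclamation marks
    (m{Waldo} ? 1 : 0),    # a bracketing pair: { opens, } closes
);

print "@results\n";
```

All four matches succeed, so the script prints 1 1 1 1.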


The slash or other character that marks the beginning and end of the pattern is called the delimiter. You typically change the delimiter when the pattern itself contains slashes (/), because Perl would otherwise mistake the first slash inside the pattern for the end of the pattern. If you keep slashes as the delimiter, each slash inside the pattern must be preceded by a backslash, as shown here:


if (m/\/usr\/local\/bin\/hangman/) { print "Found the hangman game!" }


By changing the delimiter, you could write the preceding example more legibly as follows:


if (m:/usr/local/bin/hangman:) { print "Found the hangman game!" }
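A small sketch comparing the two spellings; the path string is a hypothetical example. Both patterns describe exactly the same text, so both matches succeed.

```perl
use strict;
use warnings;

# A hypothetical line of text containing the path.
$_ = "Installed: /usr/local/bin/hangman";

my $with_slashes = m/\/usr\/local\/bin\/hangman/ ? 1 : 0;   # each / escaped
my $with_colons  = m:/usr/local/bin/hangman: ? 1 : 0;       # no escaping needed

print "slashes: $with_slashes, colons: $with_colons\n";
```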


If the delimiters around the pattern are slashes, you can omit the m: m/Cheetos/ can also be written as /Cheetos/. Unless you need delimiters other than slashes, pattern matches are normally written with just slashes and no m.
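A minimal check, using a made-up string, that dropping the m changes nothing when the delimiters are slashes:

```perl
use strict;
use warnings;

$_ = "a bag of Cheetos";

my $long  = m/Cheetos/ ? 1 : 0;   # explicit m with slashes
my $short = /Cheetos/  ? 1 : 0;   # m omitted; possible only because the delimiters are slashes

print "m/Cheetos/: $long, /Cheetos/: $short\n";
```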

Variables can also be used in a regular expression. If a scalar variable is seen in a regular expression, Perl first evaluates the scalar and interpolates it, just as in a double-quoted string (recall Hour 2, "Perl's Building Blocks: Numbers and Strings"); then it examines the regular expression. This capability allows you to build regular expressions dynamically. The regular expression in the following if statement is based on user input:


$pat = <STDIN>;
chomp $pat;
$_ = "The phrase that pays";
if (/$pat/) {    # Look for the user's pattern
    print "\"$_\" contains the pattern $pat\n";
}
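Reading from STDIN makes that example interactive. The sketch below replaces the prompt with a hypothetical list of patterns a user might type, so it can run unattended. Each entry is interpolated into /$pat/ and tested against $_.

```perl
use strict;
use warnings;

$_ = "The phrase that pays";

# Hypothetical stand-ins for what a user might type at the prompt.
my @user_patterns = ("phrase", "pays", "Phrase", "that p");

my @matched;
foreach my $pat (@user_patterns) {    # a named loop variable leaves $_ alone
    # $pat is interpolated first; the result is then used as the pattern.
    push @matched, $pat if /$pat/;
}

print "matched: @matched\n";
```

This prints matched: phrase pays that p. The entry Phrase fails because matching is case-sensitive. Because the variable is interpolated before the regular expression is examined, any metacharacters a user types keep their special meaning, and an invalid pattern (a lone "(", for example) stops the program with an error.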


By the Way

Regular expressions in the manual pages and in other documentation are sometimes called REs or regexps. For clarity, I'll continue to refer to them as regular expressions throughout this book.


Rules of the Game

As you begin to write regular expressions in Perl, you should know a few rules that govern how Perl interprets them. There are not many, and most of them make sense once you think about them. They are as follows:

  1. Normally, pattern matches start at the left of the target string and work their way to the right.

  2. A pattern match returns true (in whatever context) if and only if the entire pattern matches somewhere in the target string; matching only part of the pattern does not count.

  3. The first possible match (the leftmost) in the target string is matched first. Regular expressions don't leave behind one good match to go looking for another further along. However…

  4. The largest possible first match is taken. Your regular expressions might find a match immediately and then try to stretch that match as far as possible. Regular expressions are greedy, meaning they try to match as much as possible.
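You can check these rules with a short sketch. It uses two features covered later: parentheses, which capture the matched text into $1, and the + metacharacter, which means "one or more of the preceding item." The special element $-[0] holds the offset at which the most recent successful match began. The sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Rule 2: the entire pattern must match; "Simo says" lacks "Simon".
$_ = "Simo says";
my $partial = /Simon/ ? 1 : 0;

# Rules 1 and 3: the search starts at the left and takes the first match.
# $-[0] is the offset where the last successful match started.
$_ = "one cat, two cats";
my $cat_start = /cat/ ? $-[0] : -1;

# Rule 4: a+ is greedy, so it stretches over every "a" it can.
$_ = "baaad";
my $greedy = /(a+)/ ? $1 : "";

# Rules 3 and 4 together: the leftmost match wins even though a longer
# run of "a" appears later; greed only stretches a match from where it starts.
$_ = "a baaad";
my $leftmost = /(a+)/ ? $1 : "";

print "partial=$partial cat_start=$cat_start greedy=$greedy leftmost=$leftmost\n";
```

It prints partial=0 cat_start=4 greedy=aaa leftmost=a. The last case shows rules 3 and 4 working together: the single a at offset 0 is found first, so the longer run of a characters later in the string is never reached.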

    © 2000- NIV