8.3. Anchors

By default, if a pattern doesn't match at the start of the string, it can "float" on down the string trying to match somewhere else. But a number of anchors may be used to hold the pattern at a particular point in a string.

The caret[*] anchor (^) marks the beginning of the string, and the dollar sign ($) marks the end.[†] So, the pattern /^fred/ will match fred only at the start of the string; it wouldn't match manfred mann. And /rock$/ will match rock only at the end of the string; it wouldn't match knute rockne.

[*] Yes, you've seen the caret used in another way in patterns. As the first character of a character class, it negates the class. But outside of a character class, it's a metacharacter in a different way, being the start-of-string anchor. There are only so many characters, so you have to use some of them twice.

[†] Actually, it matches either at the end of the string or just before a newline at the end of the string. That's so you can match the end of the string whether it has a trailing newline or not. Most folks don't worry about this distinction much, but once in a while it's important to remember that /^fred$/ will match "fred" or "fred\n" with equal ease.
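A short sketch may make the two anchors, and the trailing-newline detail, easier to see. The helper name check and the sample strings "fred flintstone" and "solid rock" are made up for this illustration; the other strings come from the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a string matches a compiled pattern.
sub check {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? "match" : "no match";
}

print check("fred flintstone", qr/^fred/), "\n";   # fred at the start
print check("manfred mann",    qr/^fred/), "\n";   # fred is not at the start
print check("solid rock",      qr/rock$/), "\n";   # rock at the end
print check("knute rockne",    qr/rock$/), "\n";   # rock is not at the end

# $ also matches just before a single newline at the end of the string,
# but not before a newline that is followed by more characters.
print check("fred",     qr/^fred$/), "\n";
print check("fred\n",   qr/^fred$/), "\n";
print check("fred\n\n", qr/^fred$/), "\n";
```

The first, third, fifth, and sixth calls should report a match; the others should not.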

Sometimes, you'll want to use both of these anchors to ensure that the pattern matches an entire string. A common example is /^\s*$/, which matches a blank line. But this "blank" line may include some whitespace characters, like tabs and spaces, which are invisible. Any line that matches this pattern looks like any other one on paper, so this pattern treats all blank lines equally. Without the anchors, it would match nonblank lines as well.
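To see why the anchors matter here, the following sketch runs the anchored and unanchored patterns over a handful of lines. The sample lines are invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines: empty, whitespace-only, nonblank, and a bare newline.
my @lines = ("", "   ", "\t \t", "  text  ", "\n");

# Anchored: the whole line must consist of nothing but optional whitespace.
my @blank = grep { /^\s*$/ } @lines;

# Unanchored: \s* can match zero characters anywhere, so every line matches.
my @unanchored = grep { /\s*/ } @lines;

printf "anchored pattern matched %d of %d lines\n",   scalar @blank,      scalar @lines;
printf "unanchored pattern matched %d of %d lines\n", scalar @unanchored, scalar @lines;
```

With these samples the anchored pattern should select every line except "  text  ", while the unanchored one selects them all.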

8.3.1. Word Anchors

Anchors aren't just at the ends of the string. The word-boundary anchor, \b, matches at either end of a word.[†] So you can use /\bfred\b/ to match the word fred but not frederick, alfred, or manfred mann. This is similar to the feature often called "match whole words only" in a word processor's search command.

[†] Some regular expression implementations have one anchor for start-of-word and another for end-of-word, but Perl uses \b for both.

Alas, these aren't words as you and I are likely to think of them; they're those \w-type words made up of ordinary letters, digits, and underscores. The \b anchor matches at the start or end of a group of \w characters.
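As a rough illustration, the sketch below tries /\bfred\b/ against the strings mentioned above plus two extra ones, "fred's car" and "fred_2", which are added here as assumptions to show the \w-based definition of a word. The helper name has_word_fred is likewise made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text contains fred as a whole \w-type word.
sub has_word_fred {
    my ($text) = @_;
    return $text =~ /\bfred\b/ ? 1 : 0;
}

for my $text ("fred", "frederick", "alfred", "manfred mann", "fred's car", "fred_2") {
    my $result = has_word_fred($text) ? "matches" : "does not match";
    print qq{"$text" $result\n};
}
```

The apostrophe in "fred's car" is not a \w character, so a boundary follows fred there; the underscore in "fred_2" is a \w character, so no boundary follows fred and the pattern fails.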

In Figure 8-1, a gray underline is under each "word," and the arrows show the corresponding places where \b could match. There are always an even number of word boundaries in a given string since there's an end-of-word for every start-of-word.

Figure 8-1. Word-boundary matches with \b

The "words" are sequences of letters, digits, and underscores. A word in this sense is what's matched by /\w+/. There are five words in the sentence shown in the figure: That, s, a, word, and boundary.[*] The quote marks around word don't change the word boundaries; these words are made of \w characters.

[*] You can see why we wish we could change the definition of "word"; That's should be one word, not two words with an apostrophe in between. Even in text that may be mostly ordinary English, it's normal to find a soupçon of other characters spicing things up.

Each arrow points to the beginning or the end of one of the gray underlines since the word-boundary anchor \b matches only at the beginning or the end of a group of word characters.
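The word and boundary counts can be checked with a few lines of Perl. The sample sentence below is an assumption, reconstructed from the five words the text lists; it may not be character-for-character what the figure shows.

```perl
use strict;
use warnings;

# Assumed sample sentence, rebuilt from the five words named in the text.
my $sentence = q{That's a "word" boundary!};

my @words      = $sentence =~ /\w+/g;       # every run of \w characters
my $boundaries = () = $sentence =~ /\b/g;   # count the zero-width \b matches

print "words: @words\n";
print "word boundaries: $boundaries\n";
```

Each of the five words contributes one start and one end, so the boundary count comes out to twice the word count.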

The word-boundary anchor is useful to ensure we don't accidentally find cat in delicatessen, dog in boondoggle, or fish in selfishness. Sometimes, you'll want one word-boundary anchor, as when using /\bhunt/ to match words like hunt or hunting or hunter, but not shunt, or when using /stone\b/ to match words like sandstone or flintstone but not capstones.
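A minimal sketch of the one-sided cases, using only the words from the paragraph above; the array names are chosen for this example.

```perl
use strict;
use warnings;

# \b only at the front: hunt must begin a word, but the word may continue.
my @hunt = grep { /\bhunt/ } qw(hunt hunting hunter shunt);

# \b only at the back: stone must end a word, but the word may start earlier.
my @stone = grep { /stone\b/ } qw(sandstone flintstone capstones);

print "matched by /\\bhunt/: @hunt\n";
print "matched by /stone\\b/: @stone\n";
```

Here shunt is left out because an s precedes hunt, and capstones is left out because an s follows stone.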

The nonword-boundary anchor is \B; it matches at any point where \b would not match. So, the pattern /\bsearch\B/ will match searches, searching, and searched but not search or researching.
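The same kind of check works for \B. This sketch filters the five words from the paragraph above; the array name is an assumption of the example.

```perl
use strict;
use warnings;

# \b before search requires a word start; \B after it requires that the
# word keeps going, so a bare "search" is rejected.
my @hits = grep { /\bsearch\B/ } qw(searches searching searched search researching);

print "matched: @hits\n";
```

Only searches, searching, and searched should survive the filter.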
