Section 9.1.  Substitutions with s///

If you think of the m// pattern match as being like your word processor's "search" feature, the "search and replace" feature would be Perl's s/// substitution operator. This replaces whichever part of a variable[*] matches a pattern with a replacement string:

[*] Unlike m//, which can match against any string expression, s/// is modifying data that must be contained in what's known as an lvalue. This is nearly always a variable, though it could be anything that could be used on the left side of an assignment operator.

    $_ = "He's out bowling with Barney tonight.";
    s/Barney/Fred/;  # Replace Barney with Fred
    print "$_\n";

If the match fails, nothing happens, and the variable is untouched:

    # Continuing from above; $_ has "He's out bowling with Fred tonight."
    s/Wilma/Betty/;  # Replace Wilma with Betty (fails)

The pattern and the replacement string could be more complex. Here, the replacement string uses the first memory variable, $1, which is set by the pattern match:

    s/with (\w+)/against $1's team/;
    print "$_\n";  # says "He's out bowling against Fred's team tonight."

Here are some other possible substitutions. (These are here only as samples; in the real world, it would not be typical to do so many unrelated substitutions in a row.)

    $_ = "green scaly dinosaur";
    s/(\w+) (\w+)/$2, $1/;  # Now it's "scaly, green dinosaur"
    s/^/huge, /;            # Now it's "huge, scaly, green dinosaur"

    s/,.*een//;             # Empty replacement: Now it's "huge dinosaur"
    s/green/red/;           # Failed match: still "huge dinosaur"
    s/\w+$/($`!)$&/;        # Now it's "huge (huge !)dinosaur"
    s/\s+(!\W+)/$1 /;       # Now it's "huge (huge!) dinosaur"
    s/huge/gigantic/;       # Now it's "gigantic (huge!) dinosaur"

There's a useful Boolean value from s///. It's true if a substitution was successful; otherwise it's false:

    $_ = "fred flintstone";
    if (s/fred/wilma/) {
      print "Successfully replaced fred with wilma!\n";
    }
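The return value can also be captured directly, rather than tested in an if. The following is a minimal sketch of that idea; the variable names $first and $second are ours, chosen only for illustration:

```perl
use strict;
use warnings;

$_ = "fred flintstone";

# A successful substitution returns a true value...
my $first  = (s/fred/wilma/)   ? "replaced" : "no match";

# ...and a failed one returns false, leaving $_ untouched.
my $second = (s/barney/betty/) ? "replaced" : "no match";

print "$first, $second: $_\n";  # replaced, no match: wilma flintstone
```

Because a failed substitution changes nothing, it's generally safe to attempt one without checking first whether the pattern would match.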

9.1.1. Global Replacements with /g

As you may have noticed in a previous example, s/// will make only one replacement even if others are possible. Of course, that's just the default. The /g modifier tells s/// to make all possible nonoverlapping[*] replacements:

[*] It's nonoverlapping because each new match starts looking just beyond the latest replacement.

    $_ = "home, sweet home!";
    s/home/cave/g;
    print "$_\n";  # "cave, sweet cave!"
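To see what "nonoverlapping" means in practice, here is a small sketch using made-up strings of repeated letters. It also relies on the fact that s/// with /g returns the number of replacements it made; the variable names are ours:

```perl
use strict;
use warnings;

# "aaaa" contains three overlapping occurrences of "aa" (at offsets 0, 1,
# and 2), but only the two nonoverlapping ones are replaced.
$_ = "aaaa";
my $count = s/aa/b/g;
my $pairs = $_;

# The replacement text is not rescanned, so this doesn't loop forever.
$_ = "aaa";
s/a/aa/g;
my $doubled = $_;

print "$pairs ($count replacements), $doubled\n";  # bb (2 replacements), aaaaaa
```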

A fairly common use of a global replacement is to collapse whitespace; that is, to turn any arbitrary whitespace into a single space:

    $_ = "Input  data\t may have    extra whitespace.";
    s/\s+/ /g;  # Now it says "Input data may have extra whitespace."

Once we show collapsing whitespace, everyone wants to know about stripping leading and trailing whitespace. That's easy enough, in two steps:

    s/^\s+//;  # Replace leading whitespace with nothing
    s/\s+$//;  # Replace trailing whitespace with nothing

We could do that in one step with an alternation and the /g flag, but that turned out to be a bit slower, at least when we wrote this. The regular expression engine is always being tuned, though; to find out what makes regular expressions fast (or slow), you can get Mastering Regular Expressions (O'Reilly).

    s/^\s+|\s+$//g;  # Strip leading, trailing whitespace
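Either approach is easy to wrap in a subroutine. The sketch below assumes two hypothetical helper names, trim_two_step and trim_one_step, and uses the binding operator (as with m//) to work on a copy of the argument instead of $_. It only shows that both forms give the same result; it makes no claim about which is faster on your version of Perl:

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading, then trailing whitespace.
sub trim_two_step {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

# Hypothetical helper: the same job with one alternation and /g.
sub trim_one_step {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

my $raw = "  \t fred flintstone \n";
my $two = trim_two_step($raw);
my $one = trim_one_step($raw);

print "[$two] [$one]\n";  # [fred flintstone] [fred flintstone]
```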

9.1.2. Different Delimiters

As we did with m// and qw//, we can change the delimiters for s///. But the substitution uses three delimiter characters, so things are a little different.

With ordinary (non-paired) characters, which don't have a left and right variety, use three of them as we did with the forward slash. Here, we've chosen the pound sign[*] as the delimiter:

[*] With apologies to our British friends, to whom the pound sign is something else. Though the pound sign is generally the start of a comment in Perl, it won't start a comment when the parser knows to expect a delimiter; in this case, immediately after the s that starts the substitution.

    s#^https://#http://#;

But if you use paired characters, which have a left and right variety, you have to use two pairs: one to hold the pattern and one to hold the replacement string. In this case, the delimiters around the replacement string don't have to be the same kind as the ones around the pattern. In fact, the delimiters of the replacement string could even be non-paired. These are all the same:

    s{fred}{barney};
    s[fred](barney);
    s<fred>#barney#;

9.1.3. Option Modifiers

In addition to the /g modifier,[*] substitutions may use the /i, /x, and /s modifiers that you saw in ordinary pattern matching. The order of modifiers isn't significant.

[*] We still speak of the modifiers with names like "/i" even if the delimiter is something other than a slash.

    s#wilma#Wilma#gi;  # replace every WiLmA or WILMA with Wilma
    s{__END__.*}{}s;   # chop off the end marker and all following lines
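The /x modifier pairs nicely with paired delimiters, since the pattern can then be spread over several commented lines. Here's a minimal sketch with a made-up input string; note that /x applies only to the pattern, so whitespace in the replacement string is kept as written:

```perl
use strict;
use warnings;

$_ = "FRED   flintstone met fred\tFlintstone";

my $count = s{
    fred          # first name, in any case because of /i
    \s+           # one or more whitespace characters
    flintstone    # last name
}{Fred Flintstone}gix;

print "$count: $_\n";  # 2: Fred Flintstone met Fred Flintstone
```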

9.1.4. The Binding Operator

As you saw with m//, we can choose a different target for s/// by using the binding operator:

    $file_name =~ s#^.*/##s;  # In $file_name, remove any Unix-style path
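The target of the binding operator can be any lvalue, not just a plain scalar variable. The sketch below uses invented data to show a hash element and an array element as targets. It also shows one common way to keep the original intact: assign to a copy in parentheses, then bind the substitution to that copy:

```perl
use strict;
use warnings;

# Copy first, then modify only the copy; $path keeps its value.
my $path = "/home/fred/data/scores.txt";
(my $base = $path) =~ s#^.*/##s;

# A hash element is an lvalue, so it can be a target too.
my %person = (name => "fred flintstone");
$person{name} =~ s/fred/wilma/;

# So is an array element.
my @lines = ("  one", "  two");
$lines[1] =~ s/^\s+//;

print "$base | $path | $person{name} | [$lines[0]] [$lines[1]]\n";
# scores.txt | /home/fred/data/scores.txt | wilma flintstone | [  one] [two]
```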

9.1.5. Case Shifting

It often happens in a substitution that you'll want to ensure that a replacement word is properly capitalized (or not, as the case may be). That's easy to accomplish with Perl, by using some backslash escapes. The \U escape forces what follows to all uppercase:

    $_ = "I saw Barney with Fred.";
    s/(fred|barney)/\U$1/gi;  # $_ is now "I saw BARNEY with FRED."

Similarly, the \L escape forces lowercase:

    s/(fred|barney)/\L$1/gi;  # $_ is now "I saw barney with fred."

By default, these affect the rest of the (replacement) string. You can turn off case shifting with \E:

    s/(\w+) with (\w+)/\U$2\E with $1/i;  # $_ is now "I saw FRED with barney."

When written in lowercase (\l and \u), they affect only the next character:

    s/(fred|barney)/\u$1/ig;  # $_ is now "I saw FRED with Barney."

You can even stack them up. Using \u with \L means "all lowercase, but capitalize the first letter":[*]

[*] The \L and \u may appear together in either order. Larry realized that people would sometimes get those two backward, so he made Perl figure out that you want the first letter capitalized and the rest lowercase. Larry is a pretty nice guy.

    s/(fred|barney)/\u\L$1/ig;  # $_ is now "I saw Fred with Barney."

As it happens, though we're covering case shifting in relation to substitutions, these escape sequences are available in any double-quotish string:

    print "Hello, \L\u$name\E, would you like to play a game?\n";
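As a small sketch of these escapes outside a substitution, the following builds a few strings from an oddly capitalized name. The value of $name and the other variable names are invented for the example:

```perl
use strict;
use warnings;

my $name = "bARNEY";

# \u and \L together: lowercase everything, then capitalize the first letter.
my $proper   = "\u\L$name\E";
# The two escapes may appear in either order with the same result.
my $swapped  = "\L\u$name\E";
# \U applies until \E, so " rubble" is left alone.
my $shouting = "\U$name\E rubble";

print "$proper, $swapped, $shouting\n";  # Barney, Barney, BARNEY rubble
```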
