Chapter 15: Other Data Transformation

15.5 Transliteration

When you want to take a string and replace every instance of some character with some new character, or delete every instance of some character, you can already do that with carefully selected s/// commands. But suppose you had to change all of the a's into b's, and all of the b's into a's? You can't do that with two s/// commands because the second one would undo all of the changes the first one made.
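As a rough illustration of that failure, the sketch below runs the two substitutions in sequence on a sample string. The variable names and the sample text are made up for this example.

```perl
use strict;
use warnings;

my $text = "abba cab";

# Naive attempt at swapping a and b with two substitutions.
(my $naive = $text) =~ s/a/b/g;   # every a becomes b: "bbbb cbb"
$naive =~ s/b/a/g;                # now every b, old or new, becomes a

print "$naive\n";                 # prints "aaaa caa", not the intended "baab cba"
```

The original b's and the freshly created b's are indistinguishable by the time the second substitution runs, so everything ends up as a.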

From the UNIX shell, however, such a data transformation is simple: just use the standard tr(1) command:

tr ab ba <indata >outdata

(If you don't know anything about the tr command, please look at the tr(1) manpage; it's a useful tool for your bag of tricks.) Similarly, Perl provides a tr operator that works in much the same way:

tr/ab/ba/;

The tr operator takes two arguments: an old string and a new string. These arguments work like the two arguments to s///; in other words, there's some delimiter that appears immediately after the tr keyword that separates and terminates the two arguments (in this case, a slash, but nearly any character will do).
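To make the point about delimiters concrete, here is a small sketch that applies the same swap three times with different delimiters. The choice of # and braces is only illustrative; paired bracketing delimiters enclose each argument separately, as they do with s///.

```perl
use strict;
use warnings;

$_ = "abba";
tr/ab/ba/;      # slash delimiters: $_ is now "baab"
tr#ab#ba#;      # pound-sign delimiters swap them back: "abba"
tr{ab}{ba};     # paired braces around each argument: "baab"
print "$_\n";   # prints "baab"
```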

The arguments to the tr operator are similar to the arguments to the tr(1) command. The tr operator modifies the contents of the $_ variable (just like s///), looking for characters of the old string within the $_ variable. All such characters found are replaced with the corresponding characters in the new string. Here are some examples:

$_ = "fred and barney";
tr/fb/bf/;        # $_ is now "bred and farney"
tr/abcde/ABCDE/;  # $_ is now "BrED AnD fArnEy"
tr/a-z/A-Z/;      # $_ is now "BRED AND FARNEY"

Notice how a range of characters can be indicated by two characters separated by a dash. If you need a literal dash in either string, precede it with a backslash.
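A brief sketch of a range combined with an escaped dash might look like this. The sample string is invented for the example.

```perl
use strict;
use warnings;

$_ = "room 4-b, wing a-7";
tr/a-z\-/A-Z_/;   # letters are uppercased; the escaped dash becomes an underscore
print "$_\n";     # prints "ROOM 4_B, WING A_7"
```

Here the old string is the 26 lowercase letters plus a literal dash, and the new string is the 26 uppercase letters plus an underscore, so each character lines up with its replacement.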

If the new string is shorter than the old string, the last character of the new string is repeated enough times to make the strings equal length, like so:

$_ = "fred and barney";
tr/a-z/x/; # $_ is now "xxxx xxx xxxxxx"
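The same padding rule applies when the new string has more than one character. This is a minimal sketch with a made-up sample string; $letters is a hypothetical variable used only to hold the intermediate result.

```perl
use strict;
use warnings;

$_ = "abcde 12345";
tr/a-e/XY/;          # a => X, b => Y, and c, d, e => Y as well (Y is repeated)
my $letters = $_;    # "XYYYY 12345"
tr/0-9/#/;           # every digit maps to the single character #
print "$letters\n";  # prints "XYYYY 12345"
print "$_\n";        # prints "XYYYY #####"
```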

To prevent this behavior, append a d to the end of the tr/// operator, meaning delete. In this case, the last character is not replicated. Any character that matches in the old string without a corresponding character in the new string is simply removed from the string.

$_ = "fred and barney";
tr/a-z/ABCDE/d; # $_ is now "ED AD BAE"

Notice how any letter after e disappears because there's no corresponding letter in the new list, and that spaces are unaffected because they don't appear in the old list. This is similar in operation to the -d option of the tr command.
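For contrast, the following sketch runs the same transliteration without the d option, and then uses d with an empty new string to delete characters outright. The variables $padded and $stripped are hypothetical names introduced only to keep the two results.

```perl
use strict;
use warnings;

$_ = "fred and barney";
tr/a-z/ABCDE/;        # no d: E is repeated for every letter after e
my $padded = $_;      # "EEED AED BAEEEE"

$_ = "fred and barney";
tr/aeiou//d;          # d with an empty new string: the vowels are deleted
my $stripped = $_;    # "frd nd brny"

print "$padded\n$stripped\n";
```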

If the new list is empty and there's no d option, the new list is taken to be the same as the old list. This may seem silly (why replace a 1 with a 1 and a 2 with a 2?), but it actually does something useful. The return value of the tr/// operator is the number of characters matched by the old string, so by changing characters into themselves, you can count the characters of that kind within the string.[3] For example:

$_ = "fred and barney";
$count = tr/a-z//;      # $_ unchanged, but $count is 13
$count2 = tr/a-z/A-Z/;  # $_ is uppercased, and $count2 is 13

[3] This works only for single characters. To count strings, use the /g flag to a pattern match:

while (/pattern/g) {
    $count++;
}
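Putting the two counting techniques side by side, a sketch along these lines counts single characters with tr and a multi-character string with a /g match. The sample string and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

$_ = "fred and barney and wilma";

my $vowels = tr/aeiou//;   # counts single characters: 7 vowels, $_ unchanged

my $ands = 0;
$ands++ while /and/g;      # counts occurrences of the string "and": 2

print "$vowels vowels, $ands occurrences of 'and'\n";
```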

If you append a c (just as you appended the d), the old string is complemented with respect to the full character set (all 256 characters in the 8-bit character sets assumed here). Any character you list in the old string is removed from the set of all possible characters; the remaining characters, taken in sequence from lowest to highest, form the resulting old string. So, a way to count or change the nonletters in our string could be:

$_ = "fred and barney";
$count = tr/a-z//c; # $_ unchanged, but $count is 2
tr/a-z/_/c;         # $_ is now "fred_and_barney" (non-letters => _)
tr/a-z//cd;         # $_ is now "fredandbarney" (delete non-letters)

Notice that the options can be combined, as shown in that last example, where we first complement the set (the list of letters becomes the list of all nonletters) and then use the d option to delete any character in that set.
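A common use of the c and d combination is keeping only one class of characters. The sketch below, using an invented phone-number string, counts the nondigits and then strips them.

```perl
use strict;
use warnings;

$_ = "(555) 123-4567";
my $non_digits = tr/0-9//c;   # counts everything that is not a digit: 4
tr/0-9//cd;                   # deletes everything that is not a digit
print "$non_digits\n";        # prints "4"
print "$_\n";                 # prints "5551234567"
```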

The final option for tr/// is s, which squeezes multiple consecutive copies of the same resulting translated letter into one copy. As an example, look at this:

$_ = "aaabbbcccdefghi";
tr/defghi/abcddd/s; # $_ is now "aaabbbcccabcd"

Note that def became abc, and ghi (which would have become ddd without the s option) became a single d. Also note that the consecutive letters in the first part of the string were not squeezed because they didn't result from a translation. Here are some more examples:

$_ = "fred and barney, wilma and betty";
tr/a-z/X/s;  # $_ is now "X X X, X X X"
$_ = "fred and barney, wilma and betty";
tr/a-z/_/cs; # $_ is now "fred_and_barney_wilma_and_betty"

In the first example, each word (consecutive letters) was squeezed down to a single letter X. In the second example, all chunks of consecutive nonletters became a single underscore.
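The s option also combines with an empty new list: each character is then translated to itself, and only the squeezing has a visible effect. A minimal sketch, with $spaced as a hypothetical variable holding the first result:

```perl
use strict;
use warnings;

$_ = "fred    and   barney";
tr/ //s;             # runs of spaces collapse to a single space
my $spaced = $_;     # "fred and barney"

$_ = "bookkeeper";
tr/a-z//s;           # doubled letters collapse: "bokeper"

print "$spaced\n$_\n";
```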

Like s///, the tr operator can be targeted at another string besides $_ using the =~ operator:

$names = "fred and barney";
$names =~ tr/aeiou/X/; # $names now "frXd Xnd bXrnXy"
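The return value is available with =~ as well, and a copy can be transliterated while the original stays untouched. The following sketch assumes two extra variables, $vowels and $shout, that are not part of the original example.

```perl
use strict;
use warnings;

my $names = "fred and barney";

my $vowels = ($names =~ tr/aeiou//);   # counts the vowels: 4, $names unchanged
(my $shout = $names) =~ tr/a-z/A-Z/;   # copy first, then uppercase the copy

print "$vowels\n";   # prints "4"
print "$shout\n";    # prints "FRED AND BARNEY"
print "$names\n";    # prints "fred and barney"
```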

