Useful Things to Do with a Hash

Hashes are often used in Perl for more than storing records by key for later retrieval. The advantages of a hash are fast access to individual values by key and the guarantee that every key in a hash is unique. These properties lend themselves to some useful data manipulations. Not surprisingly, because arrays and hashes are so similar, many of the interesting things you can do with hashes are array manipulations.

Determining Frequency Distributions

In Hour 6, "Pattern Matching," you learned how to take a line of text and split it into words. Examine the following snippet of code:


while ( <> ) {
   while ( /(\w[\w-]*)/g ) {   # Iterate over words, setting $1 to each.
      $Words{$1}++;
   }
}

The first line reads the input one line at a time, setting $_ to each line. (The <> operator reads from any files named on the command line, or from standard input if none are given.)

The next while() loop then iterates over each word in $_. Recall from Hour 6 that using the pattern-matching operator (//) in a scalar context with the g modifier returns each pattern match until no more are left. The pattern being looked for is a word character \w, followed by zero or more word characters or dashes [\w-]*. In this case, you use parentheses to remember the string matched in the special variable $1.

The next line, although short, is where the snippet gets interesting. $1 is set, in turn, to each word matched by the pattern on the second line. That word is used as the key to the hash %Words. The first time the word is seen, the key does not already exist in the hash, so Perl returns a value of undef for that key-value pair. By incrementing it, Perl sets the value to 1, creating the pair. The second time a word is seen, the key (that word) already exists in the hash %Words, and it is incremented from 1 to 2. This process continues until no input is left.
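The undef-to-1 step is easy to check in isolation. The following is a minimal sketch, assuming a fixed string in place of real input; the names $text and %count are illustrative and do not appear in the original snippet.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the lines read by <>.
my $text = "red fish blue fish";
my %count;

# In scalar context with /g, each successful match advances through $text.
while ( $text =~ /(\w[\w-]*)/g ) {
    # First sighting: the value is undef, and ++ turns it into 1.
    $count{$1}++;
}

print "fish: $count{fish}\n";
print "red: $count{red}\n";
```

Here 'fish' ends up with a count of 2 and 'red' with a count of 1, and no explicit initialization of the hash values is needed.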

When you're finished, the hash %Words contains a frequency distribution of the words read in. To look at the frequency distribution, you can use the following code:


foreach ( keys %Words ) {
   print "$_ $Words{$_}\n";
}
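Because keys returns the keys in no particular order, the listing above prints the words unordered. One way to show the most frequent words first is to sort the keys by their counts. This is only a sketch: the hand-built %Words hash and its counts are made up for illustration, and @by_count is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical counts, as the counting loop might leave them.
my %Words = ( fish => 4, one => 1, two => 1, red => 1, blue => 1 );

# Sort by descending count; break ties alphabetically so the order is predictable.
my @by_count = sort { $Words{$b} <=> $Words{$a} or $a cmp $b } keys %Words;

foreach my $word (@by_count) {
    print "$word $Words{$word}\n";
}
```

The alphabetical tie-break is a design choice, not a requirement; without it, words that share a count would appear in whatever order the hash happens to return them.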

Finding Unique Elements in Arrays

The technique shown in the preceding code is also useful for reducing an array to its unique elements, so that each value appears only once. Suppose you have already extracted all the words from the input into an array instead of a hash, and you made no effort to check whether a word was already in the list before adding it again. In this case, you would have a list with a lot of duplicated words.

If your input text were the opening lines of One Fish, Two Fish, the list would look something like the following:


@fishwords=('one', 'fish', 'two', 'fish', 'red', 'fish', 'blue', 'fish');

If you are given this list of words (in @fishwords) and need only its unique elements, a hash works nicely, as shown in Listing 7.1.

Listing 7.1. Finding Unique Elements in an Array


1:   %seen = ();
2:   foreach (@fishwords) {
3:      $seen{$_} = 1;
4:   }
5:   @uniquewords = keys %seen;

Line 1: This line initializes a temporary
