Section 8.14.  Utilities


Use the "non-builtin builtins".

This guideline covers a number of common wheels that ought not be re-invented. Perl itself encourages the re-use of existing wheels by providing so many built-in functions in the first place. But there are a few gaps in its coverage: a few common tasks for which it doesn't provide a convenient builtin.

That's where the Scalar::Util, List::Util, and List::MoreUtils modules can help. They provide commonly needed list and scalar processing functions, which are implemented in C for performance. Scalar::Util and List::Util[*] are part of the Perl standard library (since Perl 5.8), and all three are also available on CPAN.

[*] There is also a standard Hash::Util module in 5.8 and later, but its use is not recommended. See Chapter 15.

The Scalar::Util module provides the following functions:

blessed $scalar

If $scalar contains a reference to an object, blessed( ) returns a true value (specifically, the name of the class). Otherwise, it returns undef.

refaddr $scalar

If $scalar contains a reference, refaddr( ) returns an integer representing the memory address that reference points to. If $scalar doesn't contain a reference, the subroutine returns undef. This result is useful for generating unique identifiers for variables or objects (see Chapter 15).

reftype $scalar

If $scalar contains a reference, reftype( ) returns the standard string that describes the type of the referent (e.g., 'SCALAR', 'HASH', 'ARRAY', 'CODE', 'GLOB'). In particular, if the reference is to a blessed object, reftype( ) still returns the standard string representing the underlying (pre-blessed) type of the object. If $scalar doesn't contain a reference, reftype( ) returns undef.

readonly $scalar

Returns a true value if $scalar has been marked as read-only (e.g., via the Readonly module).

tainted $scalar

Returns a true value if $scalar contains data from an untrusted source. See the perlsec manpage.

openhandle $scalar

Returns the contents of $scalar if those contents can be used as a filehandle and the resulting filehandle is already open. Otherwise, returns undef. Handy for verifying arguments to I/O subroutines that are supposed to be passed a usable filehandle.

weaken $scalar

This subroutine expects $scalar to contain a reference to something. It takes that reference and "hides" it from the reference-counting garbage collector. See "Cyclic References" in Chapter 11 for an example of why this might be a useful thing to do.

isweak $scalar

Returns a true value if $scalar contains a reference that has already been weakened.

looks_like_number $scalar

Returns a true value if the entire contents of $scalar is something that Perl can treat as a number (e.g., an actual number, or a string that can be wholly converted to a number, or a reference). If $scalar contains a string that could only partially be converted to a number (such as '802.11b'), then looks_like_number( ) will return false. This function is often a better choice for verifying numeric input than simply relying on Perl's implicit numeric conversions. On the other hand, looks_like_number( ) also accepts the strings 'Inf' and 'Infinity' as numbers. Whether this is a bug or a feature will depend on your personal mathematical philosophy.
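As a minimal sketch of the input-checking use described for looks_like_number( ), the following classifies a handful of strings. The sample values and the %is_numeric hash are assumptions made up for illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Sample inputs chosen for illustration only
my @inputs = ('42', '-3.14', '1e6', '802.11b', 'Inf');

# Record whether each input is wholly numeric
my %is_numeric;
for my $input (@inputs) {
    $is_numeric{$input} = looks_like_number($input) ? 1 : 0;
}

# Report the classification of each input
for my $input (@inputs) {
    printf "%-10s %s\n", "'$input'",
        $is_numeric{$input} ? 'number' : 'not a number';
}
```

Only '802.11b' should be reported as not a number; note that 'Inf' is accepted, as discussed above.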

Scalar::Util also provides several other exportable subroutines that are not described here. Those additional subroutines are not recommended, because their intended uses (such as identifying vstrings and setting subroutine prototypes) directly contravene specific guidelines in this book.
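The recommended Scalar::Util functions might be exercised roughly as follows. This is only a sketch: the class name My::Widget and all the variable names are hypothetical, and the weakened back-reference simply mirrors the cyclic-reference situation mentioned under weaken.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype refaddr openhandle weaken isweak );

# My::Widget is a hypothetical class name, used only for illustration
my $object = bless { name => 'widget' }, 'My::Widget';
my $plain  = { name => 'widget' };
my $alias  = $object;

my $object_class  = blessed $object;    # Class name of the object
my $plain_class   = blessed $plain;     # undef: an unblessed reference
my $object_type   = reftype $object;    # Underlying type, ignoring the class
my $same_referent = refaddr($object) == refaddr($alias);

# openhandle() returns its argument only for an open filehandle
my $open_handle = openhandle(\*STDOUT);
my $not_handle  = openhandle('not a filehandle');

# Build a two-node cycle, then weaken the back-reference so that
# the cycle no longer keeps both hashes alive
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;
weaken($child->{parent});
my $back_ref_is_weak = isweak($child->{parent}) ? 1 : 0;

print "class: $object_class, type: $object_type\n";
```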

The List::Util module allows you to export any of the following functions:

first { <condition> } @list

Returns the first element of @list that satisfies the condition specified in the block. first( ) is similar to grep, but stops processing the list as soon as it finds the first successful match. See Chapters 6 and 9 for examples.

max @list

Returns the largest element of @list, as determined by numeric comparison (>).

maxstr @list

Returns the largest element of @list, as determined by string comparison (gt).

min @list

Returns the smallest element of @list, as determined by numeric comparison (<).

minstr @list

Returns the smallest element of @list, as determined by string comparison (lt).

shuffle @list

Returns the elements of @list in an unbiased (pseudo-)randomized order[*].

[*] "Fairness" in a shuffle is actually quite tricky to get right. Which makes this particular wheel one that's especially worth not re-inventing. See "Randomizing an Array" in Chapter 4 of Perl Cookbook (O'Reilly, 2003) for a full discussion.

sum @list

Returns the sum of the individual elements of @list (that is: $list[0] + $list[1] + $list[2] +...+ $list[$#list]).

reduce { <binary_op> } @list

Applies the specified binary operation cumulatively across @list: first to the first two elements, then to that result and the third element, and so on, returning the final result. The binary operation must be specified in terms of the operands $a and $b (as in a sort block). For example, to multiply all the elements of a list together:

    my $overall_probability = reduce { $a * $b } @partial_probabilities;

Or to merge a list of array references into a single reference to an array of their distinct elements (using uniq from List::MoreUtils, described below):

    my $universal_set_ref = reduce { [ uniq @{$a}, @{$b} ] } @individual_sets;

In this last example, reduce( ) starts with the first two array references in @individual_sets (calling them $a and $b inside its block), dereferences them (@{$a} and @{$b}), concatenates the resulting lists (@{$a}, @{$b}), keeps only the unique elements (uniq @{$a}, @{$b}), and then puts the result into a new anonymous array ([ uniq @{$a}, @{$b} ]). On each subsequent step, $a holds that accumulated array reference and $b holds the next element of @individual_sets.
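The List::Util functions listed above can be tried out with a short script such as the following sketch. The data and variable names are invented for illustration; the second reduce( ) call merely builds a string that makes the left-to-right accumulation visible.

```perl
use strict;
use warnings;
use List::Util qw( first max maxstr min minstr shuffle sum reduce );

# Sample data, invented for illustration
my @samples = (3, 17, 8, 42, 5);
my @names   = qw( pear apple quince fig );

my $first_big  = first { $_ > 10 } @samples;    # Stops at the first match
my $largest    = max @samples;                  # Numeric comparison
my $smallest   = min @samples;
my $total      = sum @samples;
my $last_name  = maxstr @names;                 # String comparison
my $first_name = minstr @names;
my @shuffled   = shuffle @samples;              # Same elements, random order

# reduce() carries its running result forward in $a
my $product  = reduce { $a * $b } 1..5;
my $grouping = reduce { "($a $b)" } qw( w x y z );

print "first > 10: $first_big, max: $largest, min: $smallest, sum: $total\n";
print "maxstr: $last_name, minstr: $first_name\n";
print "product: $product, grouping: $grouping\n";
```

The $grouping string comes out as (((w x) y) z), which shows that each step combines the result so far with the next element, rather than treating every adjacent pair independently.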

The List::MoreUtils CPAN module provides efficient implementations for many additional list processing functions. Some of the most useful include:

all { <condition> } @list

Returns true if all of the items in @list satisfy the condition specified in the block. There are also any( ), notall( ), and none( ) variants, which test whether the corresponding numbers of list elements satisfy the condition. For example:

    croak q{Can't handle an undefined value}
        if any {!defined} @args;
    carp "All values are large. This may take a while...\n"
        if all {$_ > $FAST_LIMIT} @args;

first_index { <condition> } @list

Returns the index of the first element in @list for which the condition in the block is true. There is also a last_index( ) version.

apply { <transform> } @list

This function applies the operation(s) in the block to copies of each list element (passed in $_), and then returns the list of those modified copies. For example, instead of:

    my @nice_words
        = map {
              my $copy = $_;
              $copy =~ s/$EXPLETIVE/[DELETED]/gxms;
              $copy;
          } @words;

you can simply write:

    my @nice_words = apply { s/$EXPLETIVE/[DELETED]/gxms } @words;

pairwise { <binary_op> } @array1, @array2

Walks through the elements of @array1 and @array2 in parallel, applying the binary operation specified in the block to one element of @array1 (accessed via $a) and the corresponding element of @array2 (accessed via $b). Returns a list of the results of each such binary operation. For example:

    my @revenue_from_items = pairwise { $a * $b } @sales_of_items, @price_of_items;

zip @array1, @array2, ...

Returns a list that interleaves the elements of each array: $array1[0], $array2[0], $array1[1], $array2[1], $array1[2], $array2[2], etc. The name derives from the interleaving of teeth in a zipper. This subroutine is particularly handy for populating an anonymous hash from two arrays:

    my $hash_ref = { zip @keys, @values };

uniq @list

Returns a list consisting of all the elements in @list, but with any repeated elements removed. Preserves the original order of the elements it does return. If called in a scalar context, returns the number of unique elements in @list. Note that the list doesn't have to be sorted, nor do the repeated elements have to be adjacent.
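A short sketch of these List::MoreUtils functions working together is shown below. It assumes the module has been installed from CPAN, and the item, sales, and price data are invented for illustration.

```perl
use strict;
use warnings;
use List::MoreUtils qw( all any first_index apply pairwise zip uniq );

# Sample data, invented for illustration
my @items  = qw( pen desk clip );
my @sales  = (3, 0, 12);
my @prices = (2.50, 10.00, 1.25);

my $all_priced = (all { $_ > 0 }  @prices) ? 1 : 0;
my $any_unsold = (any { $_ == 0 } @sales)  ? 1 : 0;

my $desk_index = first_index { $_ eq 'desk' } @items;

# pairwise() walks both arrays in step, with $a and $b as the operands
my @revenue = pairwise { $a * $b } @sales, @prices;
my @labels  = pairwise { "$a: $b" } @items, @revenue;

# zip() interleaves keys and values, ready to initialize a hash
my %price_of = zip @items, @prices;

# uniq() keeps the first occurrence of each element, in order
my @distinct = uniq qw( b a b c a );

# apply() modifies copies, leaving @items itself untouched
my @capitalized = apply { s/\A(\w)/\u$1/xms } @items;

print "$_\n" for @labels;
print "distinct: @distinct; capitalized: @capitalized\n";
```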

The functions in Scalar::Util, List::Util, and List::MoreUtils are efficiently implemented and widely used, so they're fast and thoroughly debugged. They are also well named, so using them can improve the readability of your code. For example, instead of writing:

    my $max_sample = $samples[0];
    for my $sample (@samples[1..$#samples]) {
        if ($sample > $max_sample) {
            $max_sample = $sample;
        }
    }

it's cleaner, clearer, more robust, more scalable, more maintainable, and faster to write:

    my $max_sample = max @samples;

Even when you're only deciding between two values:

    my $upper_limit = $last_seen gt $last_predicted ? $last_seen : $last_predicted;

it's still better to write:

    my $upper_limit = maxstr($last_seen, $last_predicted);

Although calling the subroutine is approximately 25% slower than using a "raw" ternary operator, it's still blisteringly fast. And the maxstr( ) version definitely wins on cleanliness, clarity, reliability, scalability, and extensibility.
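The relative cost on a particular system can be checked with the standard Benchmark module. The following is only a sketch: the sample values and subroutine names are hypothetical, and because both alternatives are wrapped in subroutines here, the measured difference will not match the raw figure quoted above.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
use List::Util qw( maxstr );

# Hypothetical sample values, for illustration only
my ($last_seen, $last_predicted) = ('2005-07-01', '2005-09-15');

# The "raw" ternary version
sub upper_limit_ternary {
    return $last_seen gt $last_predicted ? $last_seen : $last_predicted;
}

# The maxstr() version
sub upper_limit_maxstr {
    return maxstr($last_seen, $last_predicted);
}

# Run each alternative for at least one CPU second and compare rates
cmpthese(-1, {
    ternary => \&upper_limit_ternary,
    maxstr  => \&upper_limit_maxstr,
});
```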
