
6.2 Cargo Cult Perl

Perl provides so many ways of solving a problem that it's not surprising that sometimes people pick ways that do work, but are suboptimal for reasons of performance, readability, or maintainability. It might sound harsh to characterize something that gets the job done as a mistake, but we're working on being better than just "good enough" here.

During World War II, air crews visited islands in the South Pacific, such as those of Melanesia, that had seen few, if any, white people before. The appearance and vehicles of these crews resembled some local legends and they were revered as deities. Following their departure, islanders emulated the behavior and clothing of these airmen in the hope that they would return, even building landing strips to encourage their arrival. These ad hoc religions were termed cargo cults because they venerated the cargo left behind by the airmen.

Cargo cult code is a pejorative but colorful term for code that an author uses without knowing what it does, simply because he or she saw someone else use it and assumed it was right for the purpose. Remember, the more you think about your programming, the more valuable you will be as a programmer. Be conscious of when you make blind use of code idioms, because the next person to read your code may also have seen the advice I am about to give: When you see cargo cult code in programs you are given to maintain, it will help you gauge the experience level of the author. This helps you decide how seriously to take code that you cannot understand at all.

As in any art, the most advanced practitioners of Perl make their own rules. If you know that the code you're inheriting came from someone who knew what they were doing, the presence of apparent cargo cult code in their program probably does not mean they were suffering from premature dementia. They likely had a good reason for putting it in.

Here are some examples of cargo cult code. They range from the harmlessly redundant to the just plain wrong, but all of them are worth looking out for, to make sure that you don't use them unwittingly.

6.2.1 Useless Stringification

When I see a scalar variable enclosed in double quotes with nothing to keep it company—like so:


print "$x";   # Useless stringification

I think, "Ah, a shell programmer was here." This code is identical to:


print $x;     # No stringification

(with a caveat I'll get to in a moment). In the Bourne shell, there are times you must put a variable in double quotes to avoid a possible error; there are no such times in Perl.

Harmless, you say? Mostly. But the reason we refer to quoting an expression as "stringification" is more than a desire to coin a horrible neologism. In fact it describes an actual operation. Not everything in Perl is unaffected by stringification. References—including objects—are not the same after being stringified:


% perl -Mstrict -Mwarnings

my $ref = { dog => 'bark', cat => 'purr', llama => 'hum' };

my $notref = "$ref";

print "Reference: $ref->{dog}\n";

print "String:    $notref->{dog}\n";

^D

Reference: bark

Can't use string ("HASH(0x80fbb0c)") as a HASH ref while

"strict refs" in use...

Stringification is also a process that can be affected by the programmer. If an object in a class overloads stringification, putting quotes around it will invoke its stringify method. This may sound too esoteric for you to be concerned about, but there are a number of common modules that do in fact overload stringification, such as URI.pm:


% perl -Mstrict -Mwarnings -MURI

my $url = URI->new("http://www.perlmedic.com");

my $nonurl = "$url";

print "Object: ", $url->host, "\n";

print "String: ", $nonurl->host;

^D

Object: www.perlmedic.com

Can't locate object method "host" via package 

"http://www.perlmedic.com" (perhaps you forgot to load "http://

www.perlmedic.com"?)...

Useless stringification is also inefficient. If you're stringifying a scalar that started life as a number only to use it as a number again anyway, Perl ends up performing a redundant string conversion, and that overhead could be detrimental in a loop that gets executed often.
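As a quick check on the reference example above, the built-in ref() function reports whether a value is still a reference. This is only a minimal sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $ref  = { dog => 'bark' };
my $copy = $ref;      # plain assignment: still a hash reference
my $text = "$ref";    # stringified: just a string such as HASH(0x...)

# ref() returns the empty string for anything that is not a reference.
print "copy is a ", (ref($copy) || 'plain string'), "\n";
print "text is a ", (ref($text) || 'plain string'), "\n";
```

The copy can still be dereferenced; the stringified version cannot be turned back into a reference.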

6.2.2 Pointless Concatenation

This is not using variables between double quotes enough. (A number of these sections will seem to contradict others. There isn't really a contradiction, but I don't mind you speculating that there is one if that encourages you to think about everything.) This isn't an error per se, but more of a cumbersome style. Many languages don't have variable interpolation, so people who are used to those languages write Perl code like this:


$title = "Report at " . $time . " on " . $date . "\n";

I've even seen—don't laugh:


$title = join (" ", "Report at", $time, "on", $date, "\n");

the result of which happens to differ from the first by an extra space before the newline, but regardless of that is needlessly obfuscated. Instead, use interpolation:


$title = "Report at $time on $date\n";
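To see the three forms side by side, here is a small sketch. The values of $time and $date are sample data invented for the comparison.

```perl
use strict;
use warnings;

my ($time, $date) = ('09:30', '2003-04-18');   # sample values

my $concat = "Report at " . $time . " on " . $date . "\n";
my $joined = join(" ", "Report at", $time, "on", $date, "\n");
my $interp = "Report at $time on $date\n";

print "concatenation and interpolation agree\n" if $concat eq $interp;
print "the join() version differs\n"            if $joined ne $interp;
```

The join() version differs because join() also puts its separator between the date and the newline.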

6.2.3 Superfluous Use of scalar()

There is no point in using the scalar() function if the context is already scalar. Understanding list and scalar contexts is crucial to being a good Perl programmer, and the canonical reference work [WALL00] does not mince words: "You will be miserable until you learn the difference between scalar and list context" (p. 69).

Therefore you have to wonder how well someone who wrote this:


if (scalar @items) { ... }

understood context. Clearly the code is intended to execute the following block if there is anything in the @items array. But the condition of an if statement is already in scalar context.[1] It couldn't be otherwise; the condition has to be evaluated to see whether it's true or false; both of those alternatives are single values (i.e., scalars).

[1] Strictly speaking, it's in boolean context. But that's a special case of scalar context (as are string and numeric contexts), and you'd be hard pressed to tell the difference.

Even more blatant is code like this:


my $n_values = scalar(values %hash);

Assigning anything to a scalar will put the expression in scalar context anyway, so the scalar() is superfluous.

Understanding context will make a world of difference in, for example, how you use regular expressions in conditions. Can you tell the difference between the following two constructs?

  1. while ( /(\w+)\s+(\d+)/g) { ... }

  2. foreach (/(\w+)\s+(\d+)/g) { ... }

The while statement puts its condition in scalar context, and a /g regular expression match in a scalar context acts as an iterator, repeatedly returning true while the pattern matcher advances along the input string ($_ in this case) until there are no more matches. We're saving the portions of interest in $1 and $2 via the capturing parentheses, and therefore the code in the block had better use them.

The foreach statement imposes a list context between the parentheses, and a /g regular expression match in a list context returns a list of all the capturing parentheses matches. Inside the block $_ will be set in turn to each of the successive $1 and $2 from each match. If you actually wanted to differentiate between the $1 and $2, this would not be good code, because you'd have no idea inside the block whether your $_ represented the word ($1) or the number ($2), unless you counted how many times you'd been through the block and looked to see if that count was even or odd.

Let's look at an example to make that clearer:


$_ = <<EOT;

  apples  42

  oranges 57

  lemons  21

  pears   91 

EOT



my %stock;

while (/(\w+)\s+(\d+)/g)

{

  my ($fruit, $count) = ($1, $2);

  $stock{$fruit} = $count;

}

This is using the right tool for the job; because we need to differentiate between the fruit and the count in the loop, we want to get two elements at a time. Had I tried this with a foreach loop, not only would the loop not have started executing until I'd accumulated all the matches in a list, but the block would have executed first for apples, then for 42, then for oranges, and so on.
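The difference between the two loops can be sketched with a shortened version of the same data. The names $text, @seen, and $capture are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $text = "apples 42\noranges 57\n";

# List context: foreach receives one flat list of every captured string.
my @seen;
foreach my $capture ($text =~ /(\w+)\s+(\d+)/g) {
    push @seen, $capture;
}
print "foreach saw: @seen\n";

# Scalar context: while steps through one match at a time,
# so $1 and $2 always belong to the same line.
my %stock;
while ($text =~ /(\w+)\s+(\d+)/g) {
    $stock{$1} = $2;
}
print "$_ => $stock{$_}\n" for sort keys %stock;
```

In the foreach version the fruit names and counts arrive interleaved; in the while version each pass has both halves of one match available.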

If you understood that, try this example. Suppose we decide to combine the regular expression match with the assignment to $fruit and $count:


while (my ($fruit, $count) = /(\w+)\s+(\d+)/g) #Wrong!

{

  $stock{$fruit} = $count;

}

This is a big mistake! The code will loop forever on just the first values of $fruit and $count. We've confused the behavior of m// in a list context with the behavior of /g in a scalar context. In a list context, a match will return the list of what matched in capturing parentheses, and a global (/g) match will return the list of all lists of capturing parentheses matches. There is no iterative behavior; once the /g match is complete, another attempt to execute it will start from the beginning again.

Even though the while statement puts its condition in a scalar context, the list assignment is inside that condition and therefore takes precedence for setting the context of the m//g. The list assignment itself is evaluated in scalar context, and perlop tells us that a list assignment in scalar context evaluates to the number of elements produced by the expression on its right-hand side. So as long as there was a match, that number is nonzero, and the condition will be true.
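The value that the while condition actually tests can be made visible by capturing it in a variable. This is a sketch with invented names ($assigned, $none) and a shortened input string.

```perl
use strict;
use warnings;

my $text = "apples 42 oranges 57";

# The list assignment puts m//g in list context, so all four captures come
# back at once. The assignment as a whole, evaluated in scalar context,
# yields the number of elements on its right-hand side.
my $assigned = ( my ($fruit, $count) = $text =~ /(\w+)\s+(\d+)/g );
print "fruit=$fruit count=$count assigned=$assigned\n";

# With no match the right-hand side is empty, so the result is 0 (false).
my $none = ( my ($f, $c) = "no digits here" =~ /(\w+)\s+(\d+)/g );
print "without a match: $none\n";
```

Because every evaluation restarts the match from the beginning of the string, the first result never changes, which is why the loop in the text never ends.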

When I need to force a scalar context, I seldom use scalar(); it's just too much typing. Instead I take advantage of an operator that imposes a scalar context. Here's what I mean: Say I want to print a report of the number of elements in an array. These don't work:


print "There are ", @items, " items in the array\n"; # Wrong

print "There are @items in the array\n";        # Also wrong

The first puts @items in a list context and therefore prints out all the items; the second puts @items in a string context and therefore prints all the items with a space between consecutive elements. But wait! Don't reach for that scalar() button! With much less effort, . will do the job:


print "There are " . @items . " items in the array\n";

(Either dot could have been replaced with a comma; one is enough to force @items into scalar context.)

Now suppose I have an array @recs of data records and I need to set a count parameter in a hash to the number of elements in @recs, say for some web page template. I can't do this:


my %param = (count => @recs, ... );

It's tempting to think, "Why not? @recs is right where I'd put a scalar in a hash initialization list, so isn't it in scalar context?" No it isn't; elements in a list are in list context, and there's no special hash context (until Perl 6); a hash on the left-hand side of an assignment puts the right-hand side in list context, and there's only one kind of list context.

Here I can make a case for scalar(); to see why, let's look at the alternative. I want a count, so I can use an arithmetical operator to force scalar context:


my %param = (count => 0+@recs, ... );

Yes, I could use other variants like 1*@recs, but 0+@recs is the least surprising to the eyes. But here I am using the + for its side effect of imposing scalar context, and I have no interest in the addition per se, whereas in the previous example I certainly did want to perform string concatenation and the dot operator was therefore not surprising. So the next person to look at this might wonder why I am performing a superfluous addition. They may realize the reason within a fraction of a second, but good coding requires paying attention to even that level of detail, because any program will contain many potential places where the reader might waste that fraction of a second. So by all means write that as:


my %param = (count => scalar @recs, ... );

It's handy to know the alternatives, though, in case one day you're writing a throwaway script in a hurry and want to minimize typing.
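What goes wrong without the scalar() is easy to demonstrate. The contents of @recs below are sample data chosen so the list has an even number of elements and no warning is triggered.

```perl
use strict;
use warnings;

my @recs = ('rec A', 'rec B', 'rec C');

# @recs is flattened into the list: (count => 'rec A', 'rec B' => 'rec C')
my %flattened = (count => @recs);

# scalar() forces the count: (count => 3)
my %counted = (count => scalar @recs);

print "flattened count: $flattened{count}\n";
print "counted count:   $counted{count}\n";
```

The flattened version silently stores the first record under count and turns the remaining records into a spurious key and value.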

6.2.4 Useless Slices

Beginners often get confused over the syntax for an array element, and write @array[4] where they should have written $array[4]. Today we can ascribe a more charitable interpretation to their actions and say that they are merely anticipating Perl 6, where that is the syntax; however, in Perl 5 it means something else. The programmer has inadvertently stumbled across the array slice, a useful way of getting several elements from an array:


my @pilots = qw(Scott Virgil Alan Gordon John);

my @ships = (0, 3, 4);

for (@pilots[@ships]) { ... }  # Scott Gordon John

What appears between the brackets is the list of indices to use. You can take a slice of just one element; with warnings enabled, however, Perl will tell you about it:


% perl -Mstrict -Mwarnings -le 'print @INC[2]'

Scalar value @INC[2] better written as $INC[2] at -e line 1.

/usr/lib/perl5/site_perl/5.6.1/i386-linux

Why should it bother? Since the definition of an array slice is that @array[a,b,c,...] is equivalent to the list ($array[a], $array[b], $array[c],...), then @array[2] is the same as $array[2], isn't it?

Not quite, and the difference again is that magic word context. @array[2] is a list containing one element; but $array[2] is a scalar. Therefore each will impose a different context if given the chance:


% perl 

$element[0] = localtime;

@slice[0]   = localtime;

print "Element: ", $element[0], "\n";

print "Slice:   ", @slice[0], "\n";

^D

Element: Fri Apr 18 20:10:09 2003

Slice:   9

The warning is only generated for a literal list of one element. If you use an expression for the index list—say, @slice[@items]—you don't get a warning even if @items happens to contain one element. Otherwise you'd be forced to check the size of @items first, and that kind of constraint is quite anti-Perlish.
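The context imposed by each form can be observed directly with wantarray(). The context() helper here is hypothetical, written only for this sketch, and the index list is held in a variable so that no warning is generated.

```perl
use strict;
use warnings;

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? 'list' : 'scalar' }

my (@element, @slice);
my @index = (0);

$element[0]    = context();   # element assignment: scalar context
@slice[@index] = context();   # slice assignment: list context

print "element assignment: $element[0] context\n";
print "slice assignment:   $slice[0] context\n";
```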

Here's an even better reason not to use a slice accidentally:


% perl -Mstrict -Mwarnings

my @cats = ( {name => 'Jake', fur => 'orange'} );  # etc

print @cats[0]{name};

^D

Scalar value @cats[0] better written as $cats[0] at - line 2.

syntax error at - line 2, near "]{name"

Execution of - aborted due to compilation errors.

I tried to access the name element of the first hashref in the array @cats; Perl warned me that I probably didn't want a slice, but left the code the way it was, which turned out to be a syntax error. (As a curiosity, there is no syntax error if the implied arrow is put back: @cats[0]->{name}. But I'd sooner you stay off the road with the landmines than keep going just because I point out where the first one is.)

Lists and hashes can also be sliced:


my ($uid, $gid) = (stat $file)[4,5];

my @fields = @data{@field_names};

That second line shows a handy way of extracting hash values in a particular order, useful in code for database applications. See [GUTTMAN98] for more information.

6.2.5 Not Testing a Regular Expression Before Using $1

This is when you do a regular expression match that captures text, and then use $1 and friends without checking that the regex matched:


$line =~ /height = (\d+) in, weight = (\d+) lbs/;

print "Go metric! Height = ", $1 * 2.54, "cm, weight = ",

      $2 * 0.45, "kg\n";

and it is flat out wrong. Assume that a regular expression matched when using $1 and you may end up with a $1 that was set by a previous regular expression match. Always check to see if the regex match succeeded, or you could end up with quite bogus results like this:


% perl -Mstrict -Mwarnings

for (qw(1 2 3 banana 4))

{

  /(\d+)/;

  print "\$1 = $1\n";

}

^D

$1 = 1

$1 = 2

$1 = 3

$1 = 3

$1 = 4

and no warning that anything is amiss. The same goes for s/// (although it's less common to want to capture text in a substitution). Proper code looks like:


for (qw(1 2 3 banana 4))

{

  /(\d+)/ and print "\$1 = $1\n";

}

or some other variant of only performing the action if the match succeeded.
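Applied to the height and weight example, one possible variant captures in list context and bails out when the match fails. The to_metric() helper and its output format are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the metric text, or nothing at all if
# $line does not have the expected shape.
sub to_metric {
    my ($line) = @_;
    my ($in, $lbs) = $line =~ /height = (\d+) in, weight = (\d+) lbs/
        or return;
    return sprintf "Height = %.1f cm, weight = %.1f kg",
                   $in * 2.54, $lbs * 0.45;
}

for my $line ("height = 70 in, weight = 150 lbs", "banana") {
    my $metric = to_metric($line);
    print defined $metric ? "$metric\n" : "No match: $line\n";
}
```

Because the captures are copied into lexical variables only when the match succeeds, a stale $1 from an earlier match can never leak into the result.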

6.2.6 Not Checking Return Codes from System Functions

This is also wrong. The only time it is appropriate not to check the return code from a function that might fail is when your program should do the same thing whether or not the function succeeds. Usually that requires more thinking than putting in a check every time. Most system functions return false and set $! on failure, so checking is as simple as this:


open my $fh, $filename or die "Can't open $filename: $!\n";

Look in perlfunc at the lists under the headings "Input and output functions" and "Functions for filehandles, files, and directories" for system functions. Note that people rarely check the return code of print(), even though it can return false if, for instance, it is printing to a file on a filesystem that has run out of space. Although you see that error when you call close(),[2] if you've spent a lot of time or written (so you think) a lot of data before getting to the close(), this may be small comfort. So check the return code from print() if you want to be really safe. Remember, if the return code signals an error, your program should get the reason for that error from $! and put it somewhere that it can be seen.

[2] Don't count on it; I found a situation with AFS where this wasn't the case.
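A sketch of checking every step, including print() and close(), might look like the following. It writes to a scratch file created with the core File::Temp module, and it uses the three-argument form of open, which is a choice made for this example and not something the text above prescribes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(UNLINK => 1);   # scratch file for the sketch

print {$fh} "apples 10\n" or die "Can't write to $filename: $!\n";
close $fh                 or die "Can't close $filename: $!\n";

open my $in, '<', $filename or die "Can't open $filename: $!\n";
my $line = <$in>;
close $in                   or die "Can't close $filename: $!\n";

print "Read back: $line";
```

Each message includes $!, so a failure reports the reason as well as the operation that failed.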

6.2.7 map() in Void Context

The reason I bring this one up is that a large number of people think of it as cargo cult programming, even though it's a small nit to pick. And one of those people may be evaluating your competence based on your use of this idiom. What they are all worked up about is using map() for its side effects and ignoring its return value. Because map() is designed to transform a list and therefore its raison d'être is to return another list, to ignore that output list is to obscure your point; and here the detractors are right. They also argue that Perl wastes memory constructing this list, and they're right again there, although Perl as of version 5.8.1 has been modified not to bother constructing the list in a void context. The final nail in the coffin is that there's no savings in readability or typing, because:


map EXPR, LIST 

is identical to


EXPR for LIST

except the second doesn't create the unnecessary list, and it's even one character shorter. The same goes for the block form:


map BLOCK LIST

is identical to


for (LIST) BLOCK

although granted, the second is a whopping two characters longer.
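The distinction comes down to whether the returned list is used. In this sketch (the word list is sample data), the side effect goes in a for modifier and map() is kept for the transformation whose result is wanted.

```perl
use strict;
use warnings;

my @words = qw(larch pine larch fir);

# Side effect only: a for statement modifier says so plainly.
my %seen;
$seen{$_}++ for @words;

# Transformation: the list that map() returns is the whole point.
my @lengths = map { length $_ } @words;

print "larch seen $seen{larch} times\n";
print "lengths: @lengths\n";
```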

6.2.8 Reading an Entire Stream into Memory Only to Process It Line by Line

For reasons I have never been able to discover, a surprising number of people write code like this:


# Don't really do this

my @lines;

while (<>)

{

  push @lines, $_;

}

for (@lines)

{

  # Process $_

}

There are two things wrong with this approach. First, processing doesn't start until the whole stream has been read. Second, memory is wasted on a temporary array. Both of these problems make this program a poor implementation for a filter in a pipeline processing an arbitrarily long stream. Just write this instead:


while (<>)

{

  # Process $_

}

If the stream has to be processed in a different order from its input—via some kind of sorting, say—then you will need the temporary array after all. Don't confuse that requirement with the need to sort some product of the lines; if, for example, you're reading lines that contain stock inventory such as:


apples 10

oranges 16

lemons 12

and you need to work with this data sorted in descending order of stock count, then populate a hash on the fly:


my %stock;

while (<>)

{

  my ($fruit, $count) = /(\w+)\s+(\d+)/ or next;

  $stock{$fruit} = $count;

}

my @sorted_fruit = sort { $stock{$b} <=> $stock{$a} }

                        keys %stock;

Or, if speed is of the essence, use an external sorting program:


open my $fh, "sort -nr -k2 @ARGV|" or die $!;

my (%stock, @sorted_fruit);

while (<$fh>)

{

  my ($fruit, $count) = /(\w+)\s+(\d+)/ or next;

  $stock{$fruit} = $count;

  push @sorted_fruit, $fruit;

}

6.2.9 Reinventing File::Find

This one sounds like it should be a public service announcement. Imagine Charlton Heston—or Alan Alda, if you prefer—solemnly intoning, "Every year, hundreds of beginner Perl programmers get caught in an insidious trap. Seduced by the temptation to try out recursion and the apparent simplicity of filesystem traversal, they write programs and even modules to descend through directory trees. But along the way, something goes horribly wrong . . . "

It looks so easy and fun to write directory traversal code. In five minutes one can dash off a find() function that takes a callback routine to call on each file found under a set of starting points. In fact, just to prove the point, I'll take five minutes to write one:


sub find                # Don't really use this!

{

  my ($callback, @tops) = @_;

  for my $top (@tops)

  {

    _dofile($top, $callback);

  }

}



sub _dofile

{

  my ($file, $callback) = @_;

  -d $file and _dirfind($file, $callback);

  $callback->($file);

}



sub _dirfind

{

  my ($dir, $callback) = @_;

  opendir my $dh, $dir or warn "Can't open $dir: $!" and return;

  while (defined(my $file = readdir $dh))

  {

    next if $file =~ /^\.\.?$/;

    _dofile("$dir/$file", $callback);

  }

}

So quick, so simple, so wrong. The insidious nature of this beast is that it appears to work at first, and might in fact work for every test case you happen to use it for. Unfortunately, in addition to interface flaws (no order of traversal is guaranteed, and some programmers may want to traverse all the children of a directory before the directory itself, or vice versa), there are potentially fatal problems. One of those is the handling of cyclic symbolic links; the preceding code will loop until the filesystem gets tired of generating ever longer paths. Another is portability across filesystems.

Other optimizations performed, and problems solved, by File::Find, I deliberately omit here; the point is that you don't need to know what they are, because the authors of File::Find have done the work for you. Instead of typing all the preceding code, all you need to type instead is this:


use File::Find;

which is a massive win for efficiency, maintainability, readability, and portability. If you don't mind going to CPAN, Richard Clamp has written File::Find::Rule (http://search.cpan.org/dist/File-Find-Rule/), a module that makes it even easier to construct filesystem traversal code.
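For comparison with the hand-rolled version, here is a minimal sketch of the File::Find equivalent. It builds a small scratch tree with File::Temp so that it can run anywhere; the file names are invented for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Scratch tree so the sketch is self-contained.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "Can't mkdir $top/sub: $!\n";
for my $path ("$top/a.txt", "$top/sub/b.txt") {
    open my $fh, '>', $path or die "Can't create $path: $!\n";
    close $fh               or die "Can't close $path: $!\n";
}

# The callback sees the entry name in $_ and its full path in
# $File::Find::name; here it collects plain files only.
my @files;
find(sub { push @files, $File::Find::name if -f $_ }, $top);

print "$_\n" for sort @files;
```

The callback plus the call to find() replace all three subroutines of the homemade version.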

6.2.10 Useless Parentheses

This is being very nitpicky, but in statements like this:


my ($cheese) = shift;

my ($wine);

the parentheses are unnecessary. You only need them when you are declaring more than one variable or when you need to put the right-hand side in list context. Their use like that suggests that the programmer thought that my is a function rather than a variable modifier and would therefore be unaware of the more subtle aspects of my, such as its separate compile time and run time effects.[3]

[3] Yes, I know my is listed in perlfunc. Whether that's the right place for it is debatable.

If you do have more than one variable to declare, make sure you do have the parentheses; I once spent a long time debugging a problem caused by:


my $words, $lines = (0, 0);

But that was in the Dark Ages before I was using strict everywhere. Had I used strict, Perl would have told me that $lines was undeclared.
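The case where the parentheses do matter, putting the right-hand side in list context, can be sketched as follows. The array contents are sample data.

```perl
use strict;
use warnings;

my @args = ('brie', 'cheddar');

my $count   = @args;    # scalar context: the number of elements
my ($first) = @args;    # list context: the first element

print "count=$count first=$first\n";
```

With shift on the right-hand side, as in the example from the text, there is only ever one value, so the parentheses change nothing.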

6.2.11 Superfluous Initialization

As long as I'm picking nits, hashes and arrays automatically start life empty; there is no equivalent of undef for an aggregate. Therefore this:


my @foo = ();

my %bar = ();

is just the same as this:


my (@foo, %bar);

It might be a small point, but when I'm compressing legacy code so I can see enough of it to understand it better, every line saved helps.

6.2.12 Thinking tr and the Right Side of s/// Use Regular Expressions

This is another one for the Just Plain Wrong camp. For some reason, tr (transliterate) is an operator that is bound to a variable via =~, just as the operators m// and s/// are. When Perl was invented, transliteration constituted a more important part of the language and its applications than it does now, and so it made sense to minimize its syntax; if you're transliterating $_, you don't even need to use =~ at all, for example:


tr/a-zA-Z/n-za-mN-ZA-M/;         # rot-13

Unfortunately, seeing tr in the company of =~ so often leads some people to believe that it is an operator that takes regular expressions, so they write code like this:


tr/[a-z]/[A-Z]/;                 # Don't use this

thinking that they are employing character classes that will uppercase text. By coincidence, they get the result they wanted, without realizing that this code means, "Turn left square brackets into left square brackets, turn lowercase letters to uppercase letters, and turn right square brackets into right square brackets." They have made an incorrect assumption that will lead them into trouble when they try something different with tr.

To find out how tr is really used, see perlop.

Likewise, because the second argument of the substitution operator, s///, is so close to the first one, people sometimes put regular expression syntax there. This can be benign:


s/\.txt\b/\.doc/g     # Superfluous backslash

or malignant:


s/\.txt\b/\.doc\b/g   # Superfluous backslash and backspace

The second argument—or right side, if you like—of s/// is treated as a double-quoted string, which obeys different syntax rules from a regular expression.
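Both misconceptions can be exposed with a few throwaway strings. This is a sketch; the sample strings are invented, and the braces in the tr replacement list were chosen only to make the literal treatment of the brackets visible.

```perl
use strict;
use warnings;

# In tr, brackets are ordinary characters: [ maps to {, ] maps to }.
(my $shout = "[abc]") =~ tr/[a-z]/{A-Z}/;
print "$shout\n";

# The right side of s/// is a plain interpolated string.
(my $file = "notes.txt") =~ s/\.txt\b/.doc/;
print "$file\n";

# On the right side, \b is a backspace character, not a word boundary.
(my $oops = "notes.txt") =~ s/\.txt\b/.doc\b/;
printf "last character code: %d\n", ord substr($oops, -1);
```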

6.2.13 Using Symbolic References

I promised back in Section 5.3.3 that I would expand on how to avoid symbolic references. The first symbolic reference avoidance tool is the hash. When you see code like this:


$bob = 42;  # Bob's age

$jim = 71;  # Jim's age

$art = 57;  # You get the idea...



$name = 'bob';  # Or set $name some other way

$age = $$name;  # Get their age with a symbolic reference

it means that the programmer didn't know about hashes. This is a pity, because it's a rare Perl program that can't make good use of them. In this case, the code should be turned into:


my %age = (bob => 42, jim => 71, art => 57);

my $name = 'bob';       # Or set $name some other way

my $age = $age{$name};  # Get their age from the hash

The second symbolic reference avoidance tool is the "hard" reference. Too few programmers know how to use hard references. (I'll mostly call them "references" from now on, and say "symbolic references" when I mean the other kind, because that's how Perl people talk.) Unfortunately the most common tutorial for learning Perl [SCHWARTZ01] doesn't cover them. Fortunately, Randal Schwartz and Tom Phoenix rectified that omission with a book specifically covering references and objects [SCHWARTZ03a].

A reference is preferable to a symbolic reference because the only way to create a reference is to have something to refer to. So if you receive a reference, you know you can dereference it. A symbolic reference, on the other hand, is just a string. There may or may not be an entity in your program with the same name; who knows how the symbolic reference was constructed. So, as your mother might tell you, put that symbolic reference down, you don't know where it's been.
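The contrast can be sketched using the ages hash from earlier in this section. Under strict, a symbolic dereference is a run-time error, which the eval below traps so the sketch can report it; the names $age_ref, $ok, and $error are assumptions of this example.

```perl
use strict;
use warnings;

my %age  = (bob => 42, jim => 71);
my $name = 'bob';

# A hard reference can only be made to something that exists.
my $age_ref = \$age{$name};
$$age_ref++;                      # Bob has a birthday
print "bob is now $age{bob}\n";

# A symbolic dereference is just a guess at a variable name,
# and strict refuses to make that guess.
my $ok    = eval { my $unused = $$name; 1 };
my $error = $@;
print "symbolic dereference refused\n" unless $ok;
```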

Here's some code that doesn't know about references:

Example 6.1. Symbolic References

1  our (%title, %author, %pubyear, %artist);  # symbolic refs see only package variables
2  my $isbn        = '0-123-79526-4';
3  $title{$isbn}   = "Perl Medic";
4  $author{$isbn}  = "Peter Scott";
5  $pubyear{$isbn} = 2004;
6  $artist{$isbn}  = "Ann Palmer";
7
8  # Later that same program...
9
10 foreach my $isbn (sort keys %title)
11 {
12   print "$isbn: ";
13   foreach $attribute (qw(title author pubyear artist))
14   {
15     print "$attribute: ", $$attribute{$isbn}, " ";
16   }
17   print "\n";
18 }

Aside from the use of a symbolic reference in line 15, this code commits at least two other offenses: First, it separates attributes (title, author, etc.) that all belong to the same entity (a book) into different data structures. Secondly, because of that separation, it is forced to assume that all those data structures have the same keys and arbitrarily pick one of them to get a list of keys from. This also wastes space.

But with references, we can make a multidimensional data structure:

Example 6.2. Multidimensional Hash

1  my %book;
2  my $isbn              = '0-123-45678-9';
3  $book{$isbn}{title}   = "Perl Medic";
4  $book{$isbn}{author}  = "Peter Scott";
5  $book{$isbn}{pubyear} = 2004;
6  $book{$isbn}{artist}  = "Ann Palmer";
7
8  # Later that same program...
9
10 foreach my $isbn (sort keys %book)
11 {
12   print "$isbn: ";
13   foreach my $attribute (sort keys %{$book{$isbn}})
14   {
15     print "$attribute: $book{$isbn}{$attribute}, ";
16   }
17   print "\n";
18 }

This is good, but our itch to remove superfluous code needs scratching. Our knowledge of references enables us to remove duplicate code in lines 3–6:

Example 6.3. Loading Multidimensional Hash

1  $book{$isbn} = { title   => "Perl Medic",
2                   author  => "Peter Scott",
3                   pubyear => 2004,
4                   artist  => "Ann Palmer"
5                 };

By just assigning an anonymous hash reference to the primary key we now have code that looks exactly like what it does with no extra verbiage.

6.2.14 Reinventing CGI.pm

Chapter 8 covers this mistake in detail, but it is so endemic among novice and intermediate users that I must mention it here. If reinventing File::Find deserves a public service announcement, then reinventing CGI.pm merits a hazmat team from the Centers for Disease Control. Countless people waste untold hours writing code to parse form submissions to CGI scripts. The lines of code for decoding hexadecimal encoding multiply around the Internet like the Black Death devastating Europe. Aside from being occasionally egged on by misguided bystanders pretending to know what they're talking about, there is no rhyme or reason to people's spreading this plague.

It is not easy to do CGI decoding properly; Lincoln Stein has done a fantastic job of staying on top of all the issues in his CGI.pm. In Section 8.3.1, I'll show you how to use it.

6.2.15 Returning undef

All subroutines return something, whether you use it or not. You don't even need to use a return statement in a subroutine to make it happen; the value of a subroutine is the value of the last expression evaluated in the subroutine if Perl didn't encounter a return statement before the end of the subroutine.

Often we write a subroutine whose job is to extract some feature from an item of data, and we have to decide what the subroutine should do in the event that feature doesn't exist. For example, suppose we're returning a person's work phone number, but they're unemployed. This is a natural use of Perl's undef value, so you sometimes see people write that like this:


sub work_phone

{

  # ...

  # No work phone!  Not even self-employed...

  return undef;

}

But this is a mistake. Granted, it works when used like this:


$bus_fone = work_phone(...);

or:


if (defined(work_phone(...)))

But what if it is called in a list context? For instance:


push @biz_fonez, work_phone(...);

Then the array will have an undefined element in it. What makes more sense is for nothing to get pushed on the array in the first place; that is, work_phone() should return an empty list when called in a list context. "A-ha!" you say, "I know that wantarray() will tell me what context I am called in." And so you write:


sub work_phone

{

  # ...

  # No work phone!  Not even self-employed...

  return wantarray ? () : undef;

}

Well, guess what? That's exactly what the default argument to return is. So you could have just said:


return;

to begin with.
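The difference shows up as soon as both versions are called in list context. In this sketch the %phone data and the two subroutine names are invented for the comparison.

```perl
use strict;
use warnings;

my %phone = (alice => '555-0100');   # sample data; bob has no work phone

sub work_phone_undef {
    my ($who) = @_;
    return undef unless exists $phone{$who};   # the idiom to avoid
    return $phone{$who};
}

sub work_phone {
    my ($who) = @_;
    return unless exists $phone{$who};   # empty list or undef, by context
    return $phone{$who};
}

my (@with_undef, @clean);
push @with_undef, work_phone_undef($_) for qw(alice bob);
push @clean,      work_phone($_)       for qw(alice bob);

print "return undef: ", scalar(@with_undef), " elements\n";
print "bare return:  ", scalar(@clean),      " elements\n";
```

In scalar context both versions behave identically, so nothing is lost by using the bare return.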

6.2.16 Using for Instead of foreach

Iterating through an array using the C-style for loop:


for (my $i = 0; $i <= $#array; $i++)

{

  # Do something with $array[$i]

}

reveals that the programmer is unaware of the much clearer option of foreach:


foreach my $item (@array) 

{

  # Do something with $item

}

Perhaps they knew about foreach, but wanted to change each element of the array, and thought that the foreach loop variable was just a copy of each element. Not so; it is in fact an alias for any writable member of the foreach list:


foreach my $item (@numbers)

{

  $item **= 2;     # Square it

}

Not only that, but foreach uses $_ as a default loop variable, and can also be used in a suffix form. Perl even allows you to use for as a synonym for foreach (because it can tell the difference based on whether it sees a list or statements separated by semicolons). Combining all these features gives us a highly concise construction:


$_ **= 2 for @numbers;   # Square each element of the array

In general, $#array is abused far more often than it is used appropriately, so question every use of it that you see.
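The aliasing is easy to confirm, and when the index itself is genuinely needed, a range over 0 .. $#array is one readable alternative to the C-style loop. This is a sketch with sample data; @labelled is a name made up for the example.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3);
$_ **= 2 for @numbers;          # each $_ is an alias: @numbers is changed in place
print "squared: @numbers\n";

# When the index is part of the job, iterate over a range of indices.
my @labelled;
for my $i (0 .. $#numbers) {
    push @labelled, "$i=$numbers[$i]";
}
print "labelled: @labelled\n";
```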

6.2.17 Useless Regular Expressions

Using Perl's regular expression engine is like summoning a genie: You don't want to invoke that kind of power just to ask for a diet soda. So using a regular expression to test for straightforward string equality:


if ($tree =~ /^larch$/)

is unnecessarily confusing. If you want to test for equality, say so:


if ($tree eq "larch") 

Conversely, don't be afraid to use a regular expression even where there are alternatives if it's clearer:


if ($tree =~ /larch/)            # Regex check for substring...

if (index($tree, "larch") >= 0)  # ... is clearer than this

Use index() if you need to know the position the substring matched at, if you need to start searching from a particular offset, or if performance is crucial.[4]

[4] Okay, so you might have spotted that there's a tiny difference between the effect of the match and the effect of eq: the regular expression can match not only the string "larch" but the same string with a newline appended. That's one of those convenience features of Perl (in this case, the meaning of the $ in the regular expression) that many people aren't aware of, but just does what they want anyway. But most people who write /^larch$/ aren't doing it because they know the string might end with a newline.
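The newline behavior described in the footnote can be seen with a string that has not been chomped. This is a sketch; the \A and \z anchors are shown only as a contrast and are not mentioned in the text above.

```perl
use strict;
use warnings;

my $tree = "larch\n";    # for example, a line read from a file and not chomped

my $dollar_says = $tree =~ /^larch$/   ? 'match' : 'no match';
my $strict_says = $tree =~ /\Alarch\z/ ? 'match' : 'no match';
my $eq_says     = $tree eq 'larch'     ? 'equal' : 'not equal';

print "/^larch\$/:   $dollar_says\n";
print "/\\Alarch\\z/: $strict_says\n";
print "eq:          $eq_says\n";
```

Only the pattern ending in $ tolerates the trailing newline; \z and eq both demand an exact end of string.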
