Section 7.4.  Closures

Table of Contents

7.4. Closures

We could also use File::Find to find out some other things about files, such as their size. For the callback's convenience, the current working directory is the item's containing directory, and the item's name within that directory is found in $_.

Maybe you have noticed that, in the previous code, we used $File::Find::name for the item's name. So which name is real, $_ or $File::Find::name? $File::Find::name gives the name relative to the starting directory, but during the callback, the working directory is the one that holds the item just found. For example, suppose that we want find to look for files in the current working directory, so we give it (".") as the list of directories to search. If we call find when the current working directory is /usr, find looks below that directory. When find locates /usr/bin/perl, the current working directory (during the callback) is /usr/bin. $_ holds perl, and $File::Find::name holds ./bin/perl, which is the name relative to the directory in which you started the search.

All of this means that the file tests, such as -s, automatically report on the just-found item. Although this is convenient, the current directory inside the callback is different from the search's starting directory.

What if we want to use File::Find to accumulate the total size of all files seen? The callback subroutine cannot take arguments, and the caller discards its result. But that doesn't matter. When dereferenced, a subroutine reference can see all visible lexical variables when the reference to the subroutine is taken. For example:

use File::Find;

my $total_size = 0;
find(sub { $total_size += -s if -f }, '.');
print $total_size, "\n";

As before, we call the find routine with two parameters: a reference to an anonymous subroutine and the starting directory. When it finds names within that directory (and its subdirectories), it calls the anonymous subroutine.

Note that the subroutine accesses the $total_size variable. We declare this variable outside the scope of the subroutine but still visible to the subroutine. Thus, even though find invokes the callback subroutine (and would not have direct access to $total_size), the callback subroutine accesses and updates the variable.

The kind of subroutine that can access all lexical variables that existed at the time we declared it is called a closure (a term borrowed from the world of mathematics). In Perl terms, a closure is just a subroutine that references a lexical variable that has gone out of scope.

Furthermore, the access to the variable from within the closure ensures that the variable remains alive as long as the subroutine reference is alive. For example, let's number the output files:[*]

[*] This code seems to have an extra semicolon at the end of the line that assigns to $callback, doesn't it? But remember, the construct sub { ... } is an expression. Its value (a coderef) is assigned to $callback, and there's a semicolon at the end of that statement. It's easy to forget to put the proper punctuation after the closing curly brace of an anonymous subroutine declaration.

use File::Find;

my $callback;
  my $count = 0;
  $callback = sub { print ++$count, ": $File::Find::name\n" };
find($callback, '.');

Here, we declare a variable to hold the callback. We cannot declare this variable within the naked block (the block following that is not part of a larger Perl syntax construct), or Perl will recycle it at the end of that block. Next, the lexical $count variable is initialized to 0. We then declare an anonymous subroutine and place its reference into $callback. This subroutine is a closure because it refers to the lexical $count variable.

At the end of the naked block, the $count variable goes out of scope. However, because it is still referenced by subroutine in $callback, it stays alive as an anonymous scalar variable.[*] When the callback is invoked from find, the value of the variable formerly known as $count is incremented from 1 to 2 to 3, and so on.

[*] To be more accurate, the closure declaration increases the reference count of the referent, as if another reference had been taken explicitly. Just before the end of the naked block, the reference count of $count is two, but after the block has exited, the value still has a reference count of one. Although no other code may access $count, it will still be kept in memory as long as the reference to the sub is available in $callback or elsewhere.

Table of Contents
© 2000- NIV