Приглашаем посетить
Полевой Н.А. (polevoy.lit-info.ru)

Section 9.7.  Building Recursively Defined Data

Previous
Table of Contents
Next

9.7. Building Recursively Defined Data

Suppose we wanted to capture information about a filesystem, including the filenames and directory names, and their included contents. Represent a directory as a hash, in which the keys are the names of the entries within the directory, and values are undef for plain files. A sample /bin directory looks like:

my $bin_directory = {
  cat  => undef,
  cp   => undef,
  date => undef,
  ... and so on ...
};

Similarly, the Skipper's home directory might also contain a personal bin directory (at something like ~skipper/bin) that contains personal tools:

my $skipper_bin = {
  navigate            => undef,
  discipline_gilligan => undef,
  eat                 => undef,
};

Nothing in either structure tells where the directory is located in the hierarchy. It just represents the contents of some directory.

Go up one level to the Skipper's home directory, which is likely to contain a few files along with the personal bin directory:

my $skipper_home = {
  '.cshrc'                        => undef,
    'Please_rescue_us.pdf'        => undef,
    'Things_I_should_have_packed' => undef,
  bin                             => $skipper_bin,
};

Ahh, notice that we have three files, but the fourth entry bin doesn't have undef for a value but rather the hash reference created earlier for the Skipper's personal bin directory. This is how we indicate subdirectories. If the value is undef, it's a plain file; if it's a hash reference, we have a subdirectory, with its own files and subdirectories. Of course, we can have combined these two initializations:

my $skipper_home = {
  '.cshrc'                    => undef,
  Please_rescue_us.pdf        => undef,
  Things_I_should_have_packed => undef,

  bin => {
    navigate            => undef,
    discipline_gilligan => undef,
    eat                 => undef,
  },
};

Now the hierarchical nature of the data starts to come into play.

Obviously, we don't want to create and maintain a data structure by changing literals in the program. We should fetch the data by using a subroutine. Write a subroutine that returns undef for a given pathname if the path is a file, or a hash reference of the directory contents if the path is a directory. The base case of looking at a file is the easiest, so let's write that:

sub data_for_path {
  my $path = shift;
  if (-f $path) {
    return undef;
  }
  if (-d $path) {
    ...
  }
  warn "$path is neither a file nor a directory\n";
  return undef;
}

If the Skipper calls this on .cshrc, he'll get back an undef value, indicating that a file was seen.

Now for the directory part. We need a hash reference, which we declare as a named hash inside the subroutine. For each element of the hash, we call ourselves to populate the value of that hash element. It goes something like this:

sub data_for_path {
  my $path = shift;
  if (-f $path or -l $path) {        # files or symbolic links
    return undef;
  }
  if (-d $path) {
    my %directory;
    opendir PATH, $path or die "Cannot opendir $path: $!";
    my @names = readdir PATH;
    closedir PATH;
    for my $name (@names) {
        next if $name eq '.' or $name eq '..';
        $directory{$name} = data_for_path("$path/$name");
    }
    return \%directory;
  }
  warn "$path is neither a file nor a directory\n";
  return undef;
}

The base cases in this recursive algorithm are the files and symbolic links. This algorithm wouldn't correctly traverse the filesystem if it followed symbolic links to directories as if they were true (hard) links, since it could end up in a circular loop if the symlink pointed to a directory that contained the symlink.[*] It would also fail to correctly traverse a malformed filesystemthat is, one in which the directories form a ring rather than a tree structure, say. Although malformed filesystems may not often be an issue, recursive algorithms in general are vulnerable to errors in the structure of the recursive data.

[*] Not that any of us have ever done that and wondered why the program took forever. The second time really wasn't our fault anyway, and the third time was just bad luck. That's our story and we're sticking to it.

For each file within the directory being examined, the response from the recursive call to data_for_path is undef. This populates most elements of the hash. When the reference to the named hash is returned, the reference becomes a reference to an anonymous hash because the name immediately goes out of scope. (The data itself doesn't change, but the number of ways in which we can access the data changes.)

If there is a subdirectory, the nested subroutine call uses readdir to extract the contents of that directory and returns a hash reference, which is inserted into the hash structure created by the caller.

At first, it may look a bit mystifying, but if we walk through the code slowly, we'll see it's always doing the right thing. Test the results of this subroutine by calling it on . (the current directory) and inspecting the result:

use Data::Dumper;
print Dumper(data_for_path('.'));

Obviously, this will be more interesting if our current directory contains subdirectories.


Previous
Table of Contents
Next