Pipes

Pipes in Unix and MS-DOS/Windows connect processes together so that the output of one process becomes the input of the next. Consider the following set of commands, which work in MS-DOS and also work in Unix if you change dir to ls:

dir > outfile
sort outfile > newfile
more newfile

The output of dir is collected in outfile. Then sort is used to sort outfile, and the output from sort is stored in newfile. Then more shows the contents of newfile a screenful at a time.

Pipes allow you to perform that same sequence, but without outfile and newfile, like this:

dir | sort | more

The output of dir is given to sort, which then sorts the data. The output from sort is then given to more for page-at-a-time display. No redirection (>) or temporary files are needed.

This kind of command line is called a pipeline, and the vertical line between the commands is called a pipe. Unix relies heavily on pipes to connect its small utilities. MS-DOS and Windows support pipes but have far fewer command-line utilities that work with them.

Perl programs can participate in a pipeline in different ways. First of all, if you have a Perl program that accepts standard input, transforms it, and sends it to standard output, then you can write a command line to insert that Perl program into a pipeline, as in the following example:

dir /B | sort | perl Totaler | more

In the preceding pipeline, Totaler could be a Perl program you write to print the sum of the directory listing and maybe some statistics, along with the directory listing itself. If you're using Unix, change the dir /B to ls -1, and the pipeline works as expected. Listing 11.1 contains the Totaler program.

Listing 11.1. Complete Listing for `Totaler`

1:   #!/usr/bin/perl
2:
3:   use strict;
4:   my($dirs,$sizes,$total)=(0,0,0);
5:
6:   while(<STDIN>) {
7:       chomp;
8:       $total++;
9:       if (-d $_) {
10:          $dirs++;
11:          print "$_\n";
12:          next;
13:      }
14:      $sizes+=(stat($_))[7];
15:      print "$_\n";
16:  }
17:  print "$total files, $dirs directories\n";
18:  print "Average file size: ", ($total-$dirs ? $sizes/($total-$dirs) : 0), "\n";

Line 6: Each line of input is read from STDIN and assigned to $_. On a pipeline, a program's STDIN is connected to the previous program's STDOUT. So, in the example given, STDIN is being fed by dir /B through sort.
Lines 9-13: If a directory is encountered, it is counted separately in $dirs, the directory name is printed, and the loop moves on to the next line of input.
Lines 14-15: Otherwise, the sizes of the files are accumulated in $sizes, and the filenames are printed.
Lines 17-18: The total number of files and directories is printed, followed by the average file size.

The other way Perl can participate in a pipeline is to treat a pipeline like a file that can either be read from or written to. This is done with the open function in Perl, as shown here:

# Replace "dir /B" with "ls -1" for Unix
open(RHANDLE, "dir /B| sort |") || die "Cannot open pipe for reading: $!";

In the preceding snippet of code, the open function opens a pipeline for reading from dir /B | sort. The fact that Perl is reading from this pipeline is indicated by having the final pipe (|) on the right. When the open function is run, Perl starts the dir /B | sort commands. When the filehandle RHANDLE is read, the output from sort is read into the Perl program.
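As a rough sketch of how RHANDLE might then be used, the following program reads the sorted listing one line at a time. The $lister variable and the @names array are illustrative names that do not appear in the original snippet, and the choice of command by operating system is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: pick the listing command by platform ("dir /B" or "ls -1").
my $lister = $^O eq 'MSWin32' ? 'dir /B' : 'ls -1';

# The trailing | tells open to start the pipeline and read its output.
open(RHANDLE, "$lister | sort |") || die "Cannot open pipe for reading: $!";

my @names;
while (my $line = <RHANDLE>) {
    chomp $line;             # remove the trailing newline
    push @names, $line;      # keep each sorted name
}

# Closing the handle waits for the pipeline to finish.
close(RHANDLE) || warn "pipeline did not finish cleanly (status $?)";

print scalar(@names), " entries read from the pipeline\n";
print "$_\n" for @names;
```

Reading from RHANDLE works just like reading from an ordinary file; the only difference is that the data comes from the running commands.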

Now consider this example:

open(WHANDLE, "| more") || die "Cannot open pipe for writing: $!";

This open function opens a pipeline for writing to the more command. The pipe symbol on the left means that Perl is going to write to the pipe. All printing to the WHANDLE filehandle is buffered by more and displayed a page at a time. Writing the function like this might be a good way to get your program's output displayed a page at a time.
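A minimal sketch of paging a program's output this way might look like the following. The report lines and the $lines counter are invented for illustration, and the sketch assumes a more command is available on the system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# If the reader quits more early, further writes would otherwise
# stop this program with SIGPIPE; ignoring it lets print fail quietly.
$SIG{PIPE} = 'IGNORE';

# The leading | tells open to start more and write to its input.
open(WHANDLE, "| more") || die "Cannot open pipe for writing: $!";

# Everything printed to WHANDLE becomes standard input for more.
my $lines = 0;
for my $n (1 .. 100) {
    print WHANDLE "Line $n of the report\n";
    $lines++;
}

# Closing the handle waits for more to exit.
close(WHANDLE)
    || warn($! ? "pipe to more failed: $!" : "more exited with status $?");
```

Because close waits for more to exit, the Perl program does not finish until the reader has paged through the output or quit.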

When you are done with a filehandle that has been opened to a program, like RHANDLE and WHANDLE, it is very important that you close the filehandle properly. The reason is that the programs started by open must be properly shut down, and using close on the filehandle ensures that. Failing to close the filehandle when you're done with it could result in the programs continuing to run even after your Perl program has ended.

When closing a filehandle that's been opened on a pipe, the close function indicates whether the pipeline was successful. Therefore, you should be careful to check the return value of close like this:

close(WHANDLE) || warn($! ? "pipe to more failed: $!" : "more exited with status $?");

By the Way

The reason that the open function might not tell you whether the pipeline was successfully started has to do with Unix's design. When Perl constructs the pipeline and starts it, it's not sure that the pipeline will actually work; if the pipeline is assembled properly and starts, it's assumed that it will finish properly. When the last program in the pipe completes, it should return a successful exit status. The close function can read that status to tell whether everything went all right; otherwise, an error results.
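To see close report a failure that open could not, consider the following sketch. It uses a child Perl process that exits with status 3 as a stand-in for a failing pipeline; the $command, $ok, $os_error, and $exit_code names are hypothetical, and the sketch assumes the path in $^X contains no spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical failing command: a child Perl that exits with status 3.
my $command = qq{$^X -e "exit 3"};

# open succeeds because the command starts, even though it will fail.
open(CHILD, "$command |") || die "Cannot start pipeline: $!";
my @output = <CHILD>;        # the child prints nothing

# close returns false when the command exits with a nonzero status.
my $ok        = close(CHILD);
my $os_error  = $!;          # 0 if the only problem was the exit status
my $exit_code = $? >> 8;     # the exit status is in the high byte of $?

if (!$ok) {
    if ($os_error) {
        print "close failed: $os_error\n";
    } else {
        print "command exited with status $exit_code\n";
    }
}
# prints "command exited with status 3"
```

When the command merely exits with a nonzero status, $! is set to 0 and the details are in $?, which is why the message is chosen by testing $! first.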
