
Section 14.4.  In-situ Arguments



Allow the same filename to be specified for both input and output.

When users want to do in-situ processing on a file, they often specify it as both the input and output file:

    > lustrate -i sample_data -o sample_data -op=normalize

But if the -i and -o flags are processed independently, the program will usually open the file for input, open it again for output (at which point the file will be truncated to zero length), and then attempt to read in the first line of the now-empty file:

    # Open both filehandles...
    use Fatal qw( open );
    open my $src,  '<', $source_file;
    open my $dest, '>', $destination_file;

    # Read, process, and output data, line-by-line...
    while (my $line = <$src>) {
        print {$dest} transform($line);
    }

Not only does this not perform the requested transformation on the file, it also destroys the original data, which conveniently prevents users from feeling frustrated, by making them irate instead.

Clobbering data files in this way during an in-situ update is perhaps the single commonest command-line interface design error. Fortunately, it's extremely easy to avoid: just make sure that you unlink the output file before you open it:


    
    # Open both filehandles...
    use Fatal qw( open );
    open my $src,  '<', $source_file;
    unlink $destination_file;
    open my $dest, '>', $destination_file;

    # Read, process, and output data, line-by-line...
    while (my $line = <$src>) {
        print {$dest} transform($line);
    }

If the input and output files are different, unlinking the output file merely removes a file that was about to be rewritten anyway. Then the second open simply recreates the output file, ready for writing.

If the two filenames actually refer to a single in-situ file, unlinking the output filename removes that filename from its directory, but doesn't remove the file itself from the filesystem. The file is already open through the filehandle in $src, so the filesystem will preserve the unlinked file until that input filehandle is closed. The second open then creates a new version of the in-situ file, ready for writing.
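The behaviour described above can be checked on a scratch file. The following is only a minimal sketch, assuming a Unix-like filesystem (where an open file can be unlinked); the helper name transform_file() and the upper-casing transformation are hypothetical, chosen purely for illustration:

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical helper: apply $transform to every line of $source_file and
# write the results to $destination_file (which may name the same file).
sub transform_file {
    my ($source_file, $destination_file, $transform) = @_;

    open my $src, '<', $source_file
        or die "Can't open '$source_file' for reading: $!";

    # Remove the output name first. If it is also the input file, the data
    # stays reachable through $src until that filehandle is closed.
    unlink $destination_file;

    open my $dest, '>', $destination_file
        or die "Can't open '$destination_file' for writing: $!";

    # Read, process, and output data, line-by-line...
    while (my $line = <$src>) {
        print {$dest} $transform->($line);
    }

    close $dest or die "Can't close '$destination_file': $!";
    close $src;
    return;
}

# Build a small scratch file in a temporary directory...
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample_data";

open my $out, '>', $file or die "Can't create '$file': $!";
print {$out} "alpha\n", "beta\n";
close $out or die "Can't close '$file': $!";

# Use the same name for input and output...
transform_file($file, $file, sub { uc $_[0] });

# Read the result back to confirm the data survived the in-situ update...
open my $in, '<', $file or die "Can't reopen '$file': $!";
my @lines = <$in>;
close $in;
```

Without the unlink, the same call would leave an empty file, because the second open would truncate the data that $src is about to read.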

The only limitation of this technique is that it changes the inode of any in-situ file[*]. That can be a problem if the file has any hard-linked aliases, or if other applications are identifying the file by its inode number. If either of those situations is possible, you can preserve the in-situ file's inode by using the IO::InSitu CPAN module instead:

[*] The inode of a file is the internal data structure that a Unix file system uses to represent and access that file. Within a particular storage device, every file is uniquely identified by the index of its inode: its inode number.


    
    # Open both filehandles...
    use IO::InSitu;
    my ($src, $dest) = open_rw($source_file, $destination_file);

    # Read, process, and output data, line-by-line...
    while (my $line = <$src>) {
        print {$dest} transform($line);
    }

The open_rw( ) subroutine takes the names of two files: one to be opened for reading, the other for writing. It returns a list of two filehandles, opened to those two files. However, if the two filenames refer to the same file, open_rw( ) first makes a temporary copy of the file, which it opens for input. It then opens the original file for output. In such cases, when the input filehandle is eventually closed, IO::InSitu arranges for the temporary file to be automatically deleted.

This approach preserves the original file's inode, but at the cost of making a temporary copy of the file. The name of the temporary copy is usually formed by appending '.bak' to the original filename, but this can be changed by passing an option to open_rw( ).
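To make the copy-then-rewrite idea concrete, here is a hand-rolled sketch that uses only core modules. It is not IO::InSitu's actual implementation: the helper name open_via_copy() is hypothetical, the '.bak' suffix is hard-coded, and the temporary copy is deleted explicitly rather than automatically when the input filehandle closes:

```perl
use strict;
use warnings;
use File::Copy qw( copy );
use File::Temp qw( tempdir );

# Hypothetical helper (NOT part of IO::InSitu): copy $file to "$file.bak",
# then return a read handle on the copy and a write handle on the original.
sub open_via_copy {
    my ($file) = @_;
    my $backup = "$file.bak";

    copy($file, $backup)
        or die "Can't copy '$file' to '$backup': $!";

    open my $src, '<', $backup
        or die "Can't open '$backup' for reading: $!";

    # Opening with '>' truncates the existing file rather than replacing it,
    # so the original inode is retained.
    open my $dest, '>', $file
        or die "Can't open '$file' for writing: $!";

    return ($src, $dest, $backup);
}

# Build a small scratch file in a temporary directory...
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample_data";

open my $out, '>', $file or die "Can't create '$file': $!";
print {$out} "alpha\n", "beta\n";
close $out or die "Can't close '$file': $!";

my $inode_before = (stat $file)[1];

# Read from the copy, write back into the original file...
my ($src, $dest, $backup) = open_via_copy($file);
while (my $line = <$src>) {
    print {$dest} uc $line;
}
close $dest or die "Can't close '$file': $!";
close $src;

# Clean up the temporary copy by hand...
unlink $backup;

my $inode_after = (stat $file)[1];
```

Comparing $inode_before with $inode_after shows the trade-off described above: the inode is unchanged, but the file's entire contents were duplicated on disk for the duration of the update.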
