3.1. Formats and Text::Autoformat

Formats have been in Perl since version 1.0. They're not used very much these days, but for a lot of what people want from text formatting, they're precisely the right thing.

Perl formats allow you to draw up a picture of the data you want to output, and then paint the data into the format. For instance, in a recent application, I needed to display a set of IDs, dates, email addresses, and email subjects with one line per mail. If we assume that the line is fixed at 80 columns, we may need to truncate some of those fields and pad others to wider than their natural width. In pure Perl, there are basically three ways to get this sort of formatted output. There's sprintf (or printf) and substr:

    for (@mails) {
        printf "%5i %10s %40s %21s\n",
            $_->id,
            substr($_->received,0,10),
            substr($_->from_address,-40,40),
            substr($_->subject,0,21);
    }
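As a sketch of a related approach (not what the application above did): a precision on %s truncates as well as pads, so `%-10.10s` keeps at most the first ten characters without any substr call. Keeping the last 40 characters of an address still needs substr. The hash-based record and its sample values are invented for illustration.

```perl
use strict;
use warnings;

# Format one mail record as a fixed-width line.
# The record is a plain hash here (an assumption); the objects used in the
# text would be read with $mail->id and friends instead.
sub mail_line {
    my ($mail) = @_;
    my $from = $mail->{from_address};
    $from = substr($from, -40) if length($from) > 40;   # keep the tail end
    return sprintf "%5d %-10.10s %40s %-21.21s\n",
        $mail->{id},
        $mail->{received},    # ".10" cuts the value to its first 10 characters
        $from,
        $mail->{subject};     # ".21" does the same for the subject
}

print mail_line({
    id           => 7,
    received     => '2002-12-10 09:30:00',
    from_address => 'fred@funglyfoobar.com',
    subject      => 'Widgets: a much longer subject than fits',
});
```

The widths add up to 79 columns plus the newline, the same layout as the printf call above.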

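Then there's pack, which everyone forgets about (and which doesn't give as much control over truncation, since an A field always keeps the leftmost characters):

    for (@mails) {
        # Whitespace in a pack template is ignored, so each "A1" packs an
        # empty string into a one-space gap, and "\n" is printed separately.
        print pack("A5 A1 A10 A1 A40 A1 A21",
          $_->id, "", $_->received, "", $_->from_address, "", $_->subject), "\n";
    }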
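The behaviour of pack's A format is easiest to see in isolation. The following minimal sketch uses made-up values to show the padding, the truncation, and the fact that whitespace in the template is only there for readability.

```perl
use strict;
use warnings;

# "A" pads short values with spaces and drops anything past the width.
my $short = pack("A5", "ab");        # padded on the right to five columns
my $long  = pack("A5", "abcdefgh");  # only the first five characters survive

# Whitespace inside the template is ignored: the fields are simply
# concatenated, with no separator between them.
my $row = pack("A3 A3", "x", "y");

printf "[%s] [%s] [%s]\n", $short, $long, $row;
# prints: [ab   ] [abcde] [x  y  ]
```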

And then there's the format:

    format STDOUT =
    @<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<
    $_->id, $_->received, $_->from_address,                    $_->subject
    .

    for (@mails) {
         write;
    }
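For experimenting, here is one way the format approach might be made self-contained. The Mail class, the format name MAIL_LINE, and the sample record are all hypothetical; writing to an in-memory filehandle is just a convenience so the result comes back as a string. Note that the expressions on the argument line are separated by commas.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the mail objects used in the text.
{
    package Mail;
    sub new          { my ($class, %args) = @_; return bless {%args}, $class }
    sub id           { $_[0]{id} }
    sub received     { $_[0]{received} }
    sub from_address { $_[0]{from_address} }
    sub subject      { $_[0]{subject} }
}

format MAIL_LINE =
@<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<
$_->id, $_->received, $_->from_address,                   $_->subject
.

# Render records into a string by writing to an in-memory filehandle.
sub render_mails {
    my @mails  = @_;
    my $buffer = '';
    open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
    my $previous = select $out;   # write() targets the selected handle
    $~ = 'MAIL_LINE';             # name of the format to use on that handle
    write for @mails;             # each record reaches the format as $_
    select $previous;
    close $out;
    return $buffer;
}

print render_mails(
    Mail->new(
        id           => 1,
        received     => '10/12/02',
        from_address => 'fred@funglyfoobar.com',
        subject      => 'Widgets',
    ),
);
```

Setting `$~` is only needed because the handle is not STDOUT; a format named STDOUT is picked up automatically when you write to standard output.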

Personally, I think this is much neater and more intuitive than the other two solutions, and it has the bonus that it takes the formatting away from the main loop, making the code less cluttered.[*]

[*] As it happens, I didn't actually use formats in my code, because I wanted to have a variable-width instead of a fixed-width display. But for cases where a fixed-width output is acceptable, this solution is perfect.

Formats are associated with a particular filehandle; as you can see from the example, we've determined that this format should apply to anything we write on standard output. The picture language of formats is pretty simple: fields begin with @ or ^ and are followed by <, |, or > characters specifying left, center, and right justification, respectively. After each line of fields comes a line of expressions that fill those fields, one expression for each field, separated by commas. If we like, we can change the format to multiple lines of fields and expressions:

    format STDOUT =
    Id      : @<<<<
    $_->id
    Date    : @<<<<<<<
    $_->received
    From    : @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $_->from_address
    Subject : @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $_->subject

    .

We've seen examples of the @-type field. If you're dealing with multi-line formats, you might find that you want to break up a value and show it across several lines of the format. For instance, to display the start of an email alongside metadata about it:

    Id      : 1                                  Hi Simon, Thank you for the
    Date    : 10/12/02                           supply of widgets that you sent
    From    : fred@funglyfoobar.com              me last week. I can assure you
    Subject : Widgets                            that they have all been put ...

This is where the other type of field, the ^ field, comes in: you can achieve the preceding output by using a format like this:

    format STDOUT =
    Id      : @<<<<                              ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $_->id,                                      $message
    Date    : @<<<<<<<                           ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $_->received,                                $message
    From    : @<<<<<<<<<<<<<<<<<<<<              ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $_->from_address,                            $message
    Subject : @<<<<<<<<<<<<<<<<<<<<...           ^<<<<<<<<<<<<<<<<<<<<<<<<<<<...
    $_->subject,                                 $message

    .

Unlike an @ field, which can be filled from any Perl expression, a ^ field must be given an ordinary scalar variable, because the format modifies it. Each time the format processor sees a ^ field, it outputs as much of the supplied value as fits and then chops that much off the beginning of the variable, ready for the next ^ field that uses it. The ... at the end of a field indicates that if the supplied value is too long, the format should truncate it and show three dots instead. If you use ^ fields with values held in lexical variables, such as $message in the previous example, you need to declare the lexical variable before the format, or else the format won't be able to see it.
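The way a ^ field eats its variable can be watched directly by calling formline, the built-in that write uses underneath, and reading the result from the accumulator `$^A`. This is only a sketch: the helper name caret_lines, the 20-column field, and the sample text are invented for illustration.

```perl
use strict;
use warnings;

# Fill a 20-column ^ field repeatedly, collecting one output line per pass.
# Returns the lines plus whatever text was left unconsumed.
sub caret_lines {
    my ($text, $max_lines) = @_;
    my $message = $text;              # work on a copy: the ^ field eats it
    my @lines;
    while (length $message and @lines < $max_lines) {
        $^A = '';                     # clear the format accumulator
        formline("^<<<<<<<<<<<<<<<<<<<\n", $message);
        push @lines, $^A;
    }
    $^A = '';
    return (\@lines, $message);
}

my $body = "Hi Simon, Thank you for the supply of widgets "
         . "that you sent me last week.";
my ($lines, $rest) = caret_lines($body, 2);
print @$lines;
print "left over: $rest\n";
```

Because the variable is consumed, code that needs the full text afterwards should hand the format a copy, as the helper does.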

Another boon of using formats is that you can set a header to be sent out at the top of each page; Perl keeps track of how many lines have been printed by a format, so it knows when to send out the next page. The header for a particular filehandle is a format named with _TOP appended to the filehandle's name. The simplest use of this is to give column headers to your one-line records:

    format STDOUT_TOP =
    ID    Received   From                                     Subject
    ======================================================= ====================
    .

    format STDOUT =
    @<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<
    $_->id, $_->received, $_->from_address,                    $_->subject
    .

Formats are quite handy, especially as you can associate different formats with different filehandles and send data out to multiple locations in different ways. On the other hand, they have some serious shortcomings that you should bear in mind if you're thinking of using them in a bigger application.

First, they're a camping ground for obscure special variables: $% is the current format page number, $= is the number of printable lines per page, $- is the number of lines currently left on the page, $~ is the name of the current output format, $^ is the name of the current header format, and so on. I could not remember a single one of these variables and had to look them up in perlvar.
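The core English module gives these variables longer names, which may be easier to live with. The sketch below assumes records stored as plain hashes, invented format names REPORT and REPORT_TOP, and an in-memory filehandle; it sets a deliberately short page so that the header repeats and the page counter moves.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

format REPORT_TOP =
ID    Subject
===== ====================
.

format REPORT =
@<<<< @<<<<<<<<<<<<<<<<<<<
$_->{id}, $_->{subject}
.

# Write the records with a given page length and report how many pages
# Perl counted. The format variables apply to the selected filehandle.
sub paged_report {
    my ($lines_per_page, @records) = @_;
    my $buffer = '';
    open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
    my $previous = select $out;
    $FORMAT_NAME           = 'REPORT';          # $~
    $FORMAT_TOP_NAME       = 'REPORT_TOP';      # $^
    $FORMAT_LINES_PER_PAGE = $lines_per_page;   # $=
    write for @records;
    my $pages = $FORMAT_PAGE_NUMBER;            # $%
    select $previous;
    close $out;
    return ($buffer, $pages);
}

my @records = map { { id => $_, subject => "Message $_" } } 1 .. 6;
my ($report, $pages) = paged_report(5, @records);
print $report;
print "pages: $pages\n";
```

With five lines per page and a two-line header, three records fit on each page, so six records should come out as two pages.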

Formats also deal pretty badly with lexical variables, changing filehandles, variable-length lines, changing formats on the fly, and so on. But they're handy for neat little hacks.

For complete details on Perl's built-in formats, read perlform.
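perlform also describes formline and the accumulator variable `$^A`, which sit underneath write. Calling formline directly is one way to fill a picture that is held in an ordinary string and chosen at run time. The helper below follows the swrite idea from that document; the picture and values are made up for illustration.

```perl
use strict;
use warnings;

# Fill a picture held in a plain string and return the formatted text.
# $^A is the accumulator that formline appends to.
sub swrite {
    my ($picture, @values) = @_;
    $^A = '';
    formline($picture, @values);
    my $text = $^A;
    $^A = '';
    return $text;
}

# Single quotes keep Perl from interpolating the @ characters.
my $picture = '@<<<< @>>>>>>> @|||||||||' . "\n";
print swrite($picture, 1, '10/12/02', 'Widgets');
```

Since the picture is just a string, it can be built or swapped at run time, which a compiled format declaration does not allow as easily.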


3.1.1. Text::Autoformat

There's a more 21st-century way to deal with formatting, however, and that's the Text::Autoformat module. This has two main purposes: it wraps text more sensitively than the usual Text::Wrap module or the Unix fmt command, and it provides a syntactically simpler but more featureful replacement for the built-in format language.

Text::Autoformat's text wrapping capabilities are only tangentially related to templating, but they're still worth mentioning here.

The idea behind autoformat is to solve the problem of wrapping structured text; it was created specifically for email messages (with special consideration for quoted text, signatures, etc.), but it's applicable to any structured textual data. For instance, given the text:

    You have:
        * a splitting headache
        * no tea
        * your gown (being worn)
          It looks like your gown contains:
            . a thing your aunt gave you which you don't know what it is
            . a buffered analgesic
            . pocket fluff

fmt fails rather spectacularly:

    You have:
        * a splitting headache * no tea * your gown
        (being worn)
          It looks like your gown contains:
            . a thing your aunt gave you which
            you don't know what it is . a buffered
            analgesic . pocket fluff

In this case, the autoformat subroutine does things a lot better, as it looks ahead at the structure of the text it's formatting:

    You have:
        * a splitting headache
        * no tea
        * your gown (being worn) It looks like your
          gown contains:
            . a thing your aunt gave you which you
              don't know what it is
            . a buffered analgesic
            . pocket fluff
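A call along the following lines is one way to get this kind of result. It is a sketch under assumptions: Text::Autoformat has to be installed from CPAN, and the `all` and `right` options with a margin of 50 are choices made here, not necessarily the settings that produced the output above. The wrapper name wrap_structured is invented.

```perl
use strict;
use warnings;

# Reformat every paragraph of $text to the given right margin.
# Returns nothing when Text::Autoformat (a CPAN module) is not installed.
sub wrap_structured {
    my ($text, $right_margin) = @_;
    return unless eval { require Text::Autoformat; 1 };
    # Called for its return value, autoformat hands back the text
    # instead of printing it.
    my $wrapped = Text::Autoformat::autoformat(
        $text, { all => 1, right => $right_margin });
    return $wrapped;
}

my $inventory = <<'END';
You have:
    * a splitting headache
    * no tea
    * your gown (being worn)
END

my $result = wrap_structured($inventory, 50);
print defined $result ? $result : "Text::Autoformat is not installed\n";
```

Without `all => 1`, autoformat only reformats the first paragraph of its input.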

Text::Autoformat's format language is quite similar to Perl's native one, but with some simplifications. First, the distinction between filling @ fields and continuing ^ fields is made by the choice of picture character, not the prefix to the field. Hence, what was:

    @<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<

now simply becomes:

    <<<<< <<<<<<<<<< <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< <<<<<<<<<<<<<<<<<<<<

For continuation formats, you now use [ and ], which repeat as necessary on subsequent lines:

    Id      : <<<<<
    Message :
            [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[

This will produce output like the following:

    Id      :     1
    Message :
            Hi Simon, Thank you for the supply of widgets that you sent me
            last week. I can assure you that they have all been put to good...

Unlike Perl's built-in continuation formats, however, be aware that the [ and ] lines repeat the entire format time and time again until the variable is completely printed out. So this, for instance, won't do what you expect:

    Id      : <<<<<   [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[

Instead, it'll produce output something like this:

    Id      :     1   Hi Simon, Thank you for the supply of widgets that you sent
    Id      :         me last week. I can assure you that they have all been put
    Id      :         to good use, and have been found, as usual to be the very...

with even more spectacularly bad results for formats longer than one line.

One big advantage, though, is that with Text::Autoformat, formats are just plain strings instead of cleverly compiled patterns interleaved with code. These strings are processed with the form function, which is not exported by default and has to be requested explicitly on the use line:

    use Text::Autoformat qw(form);

    my $format = <<EOF;
    Id      : <<<<<
    Date    : <<<<<<<<
    From    : <<<<<<<<<<<<<<<<<<<<<
    Subject : <<<<<<<<<<<<<<<<<<<<<...
    EOF
    my $id = 10;
    my $date = "20/12/02";
    my $from = "Fred Foonly";
    my $subject = "Autoformatted message";
    print form($format, $id, $date, $from, $subject);

Text::Autoformat also provides extremely flexible control over the hyphenation of form fields in a multi-line block, including the ability to plug in other hyphenation routines such as Jan Pazdziora's TeX::Hyphen, which implements the hyphenation algorithm used in Donald Knuth's TeX. The main disadvantage, however, is that you don't get the same control over headers and footers as you would with write.

Both Perl formats and Text::Autoformat are great for producing formatted output in the style of 1980s form-based programs, but when people think of forms these days, they're more likely to think of things like form letters. Let's move on to look at modules that are more suited to this style of templating.
