Dealing with Table Data

Previous Table of Contents Next

Dealing with Table Data

To continue the example, for the moment have the recipient of the email simply cut and paste it into your Perl program. Now you'll need a Perl program to read the incoming email. Figure 19.1 is the same as the previous example of table data, except it's marked up a bit to indicate what the Perl script is going to have to deal with.

The data occurs in four distinct blocks:

  • The email header. You can discard this, as it will vary from email system to email system and isn't a reliable source for information.

  • The email address and postal address of the customer. There are two lines for the street address, and in this example the second line is blank.

  • A payment block. This seems to be three sections: a card type, card number, and card expiration.

  • The order information. This consists of part numbers and a quantity.

Now that you've discovered the structure of the data (which you should confirm with the originator), you can write a Perl program to pull it apart.

Example: Email Order Taker

Initially, your email order program is going to read the pasted email and determine if the structure is valid and the order is complete. You'll add some more features later on. The program to read the email is presented in Listing 19.1. Most of the program is a subroutine to create a structure to contain the order. You'll expand on it shortly to do something with the order.

This listing contains a feature you haven't seen before, the global variable $/. The global variable $/ controls how Perl reads text files. If set to undef, the next read will continue to the end of the file. If set to the empty string ("") the next read will read the next paragraph of data. That is, it will continue reading until the next empty line.

Listing 19.1. Email Reader

1: #!/usr/bin/perl -w


3: use strict;


5: sub read_message {

6:       {

7:               local $/="";

8:               my $headers = scalar <STDIN>;

9:       }

10:      my @body=<STDIN>;

11:      chomp @body;

12:      my($name, $addr1, $addr2, $city, $state, $zip)=splice(@body, 0, 6);


14:      shift @body;    # Throw one blank line away

15:      $_=shift @body; # Grab the next line, should be CC Info

16:      my($cardname, $cardno, $exp);

17:      if (/^(\w+)\s(\S+)\s(\d+\/\d+)/) {

18:              $cardname=$1;

19:              $cardno=$2;

20:              $exp=$3;

21:      }


23:      # Process the item lines

24:      my @items;

25:      foreach(@body) {

26:              if (! /^([-\d]+)\s+(\d+)/) {

27:                      next;

28:              }

29:              push @items, [ $1, $2];

30:      }

31:      return { name => $name,

32:               addr1 => $addr1, addr2 => $addr2,

33:               city  => $city,  state => $state,

34:               zip   => $zip,

35:               card  => $cardname, cardno => $cardno,

36:               expires => $exp,

37:               items => \@items

38:               };

39: }

Line 6: Setting $/ to "" will cause the next read to slurp up an entire paragraph. By using the local keyword on this global value, at the end of the block $/ will revert to its previous value.

Line 7: The first paragraph is read and discarded. This will eliminate the email headers.

Lines 1011: The remainder of the input is read into @body, and the trailing newline characters are removed.

Line 12: The first six lines of @body are removed and assigned to meaningful variables.

Line 17: The value in $_ is pattern-matched for a word, followed by something surrounded by spaces, and then a pair of numbers separated by slashes.

Lines 1820: The matched values are assigned to meaningful variables.

Lines 2627: Each remaining line in the body is matched to see if it's a series of dashes and digits followed by a space and then more digits. Invalid lines are thrown away.

Line 29: An array reference consisting of two elementsthe item # and quantityare pushed into @items. When you're done, @items will be an array of array references.

Lines 3138: A hash reference is created containing all the elements of the order. The element items are a reference to the array @items.

To read the order, you'd call the subroutine with something like this:

my $order = read_message();

When that's done, use the hash reference to get to individual elements of the order:

print $order->{name};          # Name on the order

print $order->{items}->[0]->[0];  # First item's item number on the order

Now add a function to print an order. Print it to a file, and then launch an editor so that it can be printed. In this way, the order can follow the usual paper-trail through the office. Append Listing 19.2 onto Listing 19.1 to enable you to print the order.

Listing 19.2. Order Printer

1: sub print_order {

2:       my ($order)=@_;

3:       open(OF, ">orderfile.txt") || die "Can't open orderfile.txt: $!";

4:       print OF "Customer:\t$order->{name}\n";

5:       print OF "\t\t$order->{addr1}\n";

6:       print OF "\t\t$order->{addr2}\n";

7:       print OF "\t\t$order->{city}, $order->{state}  $order->{zip}\n\n";

8:       print OF "$order->{card} $order->{cardno} $order->{expires}\n";

9:       print OF "\nItem #\tQuantity\n";

10:      foreach my $item (@{ $order->{items}}) {

11:              print OF "$item->[0]\t$item->[1]\n";

12:      }

13:      close(OF);

14:      system("notepad orderfile.txt");

15: }

There's no real rocket science in this program that you haven't seen before. An order file is created, filled with some data, and then closed. Afterwards, you make an external call to the operating system to open an editor. Presumably, the order can be printed from there.

By the Way

There is a cure for the bulky print statements that make up half of this function in Listing 19.2. In Hour 23, "Complex Forms," you'll read about here documents that allow for an easier presentation of this code.

Now to use these functions, you'd write the main body of the program like this:

my $order = read_message();


When you start the program, it will wait for input. Cut and paste the order email message (or type it) into the waiting program and then press Control+D (Unix, Mac OS X) or Control+Z, Control+Z (Windows) to end the message.

This program could be modified easily to read input from a file containing the mail message, if you'd prefer.

Example: Verifier for the Email Order

For the next task, the boss has noted that the new order forms don't print the name of the item on themjust the item number, like so:

Item #        Quantity

12-31441    1

99-00129    1

He wants the item's description to print on the form to make filling the order easier. Fortunately for you, the vendor supplies a copy of the parts list as a table in a text file. (Had the vendor supplied the data in a different format such as XML or Excel, you'd soon be able to handle that as well; I discuss it later in this hour and in Hour 20.) The vendor's parts list file looks something like Listing 19.3.

Listing 19.3. Vendor Parts List

1231441        1' seat post clamp

3511221        brake cable set

3512314        brake lever assembly

3588123        brake calipers

6692818        1 1/8 carbon fork

6055232        fork oil

1600112        derailleur hanger

1619921        derailleur housing end caps

9900127        vinyl seat cover, red

9900129        vinyl seat cover, blue

For the rest of this example to work, Listing 19.3 should be saved in a file called partslist.txt.

At this point, stop, step back, and take a good long look at the data here. Time spent now will make things easier later on. A few observations:

  • The table is relatively straightforward: a part number, some number of spaces (or tabs), and then a description that goes to the end of the line.

  • Your part numbers contain dashes; the vendor's do not. Your warehouse people like the dashes because they denote the type of item, and where to find them. So keep the dashes, but find a way to match the tables up somehow.

  • Part numbers are usually unique. This case seems no different. Remember from Hour 7, "Hashes," that tables that have a unique index are quickly and efficiently dealt with in hashes. This part table should wind up in a hash eventually.

Listing 19.4 should be appended to Listing 19.1 (and Listing 19.2) to make the final program.

Listing 19.4. Add Descriptions to Parts

1: sub add_descriptions {

2:       my($order) = @_;


4:       my %parts;

5:       open(PL, "partslist.txt") || die "Can't read parts listing: $!";

6:       while(<PL>) {

7:               if (/^(\d\d)(\d+)\s+(.*)/) {

8:                       $parts{"$1-$2"}=$3;

9:               }

10:      }

11:      close(PL);


13:      foreach my $item (@{ $order->{items} }) {

14:              $item->[2] = $parts{$item->[0]};

15:      }

16: }

Lines 56: The parts table is opened and processed one line at a time.

Lines 78: Each line is matched. The first two digits are saved in $1, subsequent digits in $2, spaces are skipped, and then everything else is stored in $3. These values are then stored in the parts table %parts.

Line 13: Loop through each item on the order, assigning it to $item.

Line 14: Add a third element ([2]) to the item, which is the looked-up description from %parts.

In addition, Listing 19.2, line 11 should be altered to read the following:

print OF "$item->[0]\t$item->[1]\t$item->[2]\n";

So your description will print to the right of the quantity ordered.

To use your subroutines, only three function calls are needed:

my $order = read_message();



Your order will be read, descriptions added, and then displayed in Notepad ready for printing.

    Previous Table of Contents Next
    © 2000- NIV