Hour 19. Data Processing


What You'll Learn in This Hour:

If you're old enough, the phrase "data processing" probably conjures up images of magazine ads from the 1970s proclaiming that you could "make big money as a Data Processor," or minicomputers with whirling tape drives from the 1980s.

But that's not really what we're talking about in this hour. This hour is about taking data from its presented form and making it more useful. This takes a few different stages.

First, you have to examine the raw data to see whether it can be cut apart, sliced up, stretched, and massaged into the final form that you need. Usually this step is obvious, but it should not be skipped. If you've got a CD collection, is it possible to assemble a (partial) discography for each band in the collection? Sure. Is it possible to take that same collection and assemble a telephone directory for your company? No, because the raw data you need just isn't in there.

Next you need to pick your tools to read the data, pull it apart, and reassemble it. Our tool of choice, of course, is Perl.

Finally (and this is the part that requires some creativity), examine the data to determine how to pull it apart. Should you cut it into vertical slices (columns)? Horizontal slices (rows)? Make new tables and manipulate those? Do you have to glue two different sources of data together?
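To make those choices concrete, here is a minimal sketch of each kind of slice. The record layout (fields separated by "|"), the sample records, and the names @albums and %country are assumptions made up for illustration; they are not taken from the techniques shown later in the hour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: one record per element, fields separated by "|".
my @albums = (
    "Rush|Moving Pictures|1981",
    "Rush|Signals|1982",
    "Yes|Fragile|1971",
);

# A second, equally hypothetical source of data, keyed by band name.
my %country = (
    Rush => "Canada",
    Yes  => "England",
);

# Vertical slice (a column): take the second field of every record.
my @titles = map { (split /\|/, $_)[1] } @albums;

# Horizontal slice (rows): keep only the records whose first field is "Rush".
my @rush_rows = grep { (split /\|/, $_)[0] eq "Rush" } @albums;

# Gluing two sources together: look up each band in %country
# and append the result to the record as a new field.
my @joined = map {
    my ($band) = split /\|/, $_;
    "$_|$country{$band}";
} @albums;

print "Titles: ", join(", ", @titles), "\n";
print "Rush rows: ", scalar(@rush_rows), "\n";
print "$_\n" for @joined;
```

Here map builds a new list from every record, which suits column slices and joins, while grep keeps or discards whole records, which suits row slices. Real data is rarely this tidy, so treat the split pattern as a placeholder for whatever separates the fields in your own data.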

This hour will show you some basic techniques for pulling apart your data and reassembling it into a useful form.

By the Way

For further reading, an entire book has been written on the subject of using Perl to manipulate table data and XML data and to parse unstructured data: Data Munging with Perl by David Cross.
