How to Look at Data

To give this chapter some context, I'm going to invent a fictional company, WidgetCo, and you are going to take on the role of programmer. WidgetCo is a small (10 employee) shop that sells retail replacement parts for antique bicycles. WidgetCo primarily sells through catalogs to bicycle enthusiasts and repair shops.

Your boss, Mr. Widget, has determined that his company will now be taking orders from the Internet. In fact, he would like you, as Chief Programmer, to streamline the process. Initially, orders will be received through email sent by someone else's web site. (The exact mechanism for taking such orders over the web will be covered in Hours 21-24.) For now, your concern is dealing with incoming mail messages.

So what can you expect to see?

Unstructured Data

One kind of data representation is called unstructured. For example, the web site could simply have users type what they want in a text box and mail the results to you. What you'll likely see is something like the following mail message:


Subject: Order

I love your catalog! I have a 1962 Schwinn Astro Special that I'm trying to get working again. It used to be my older brother's bike, but he broke the seat clamp so I'll need one of those. The one on page 62 of the catalog in the right-hand corner should do nicely. Also, can you send me a half yard of the seat material on the next page? My VISA number is 0000-11-2222-3333 and expires in July 2006. Please ship it to 1000 Rose St in Monroe, MI 48089.

This message contains all the relevant information, but there's little hope of having a computer program dissect it reliably. A human is still going to have to read the message and re-key the information into your ordering system. Unstructured data like this is not suitable for automation.

Table Data

Most likely, the web site will be designed to send data in a structured form. The most common representation is a table. Your email order might arrive and look like the following:


Subject: Order

404 Garden St.

Royal Oak



VISA 0000-11-2222-3333 06/06

12-31441 1

99-00129 1

This is more typical of the kind of data that would be sent through a web application (or any other kind of program). If you've ordered products on the web from an online retailer, you've probably received a confirmation email that looked very much like this. It's still human-readable, but it's structured in such a way that it can be easily processed by software.
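To see why this form is easy for software to handle, here is a minimal sketch in Python that sorts each line of the sample message into address, payment, or item data. The line layouts it assumes (a payment line of card type, number, and expiration date; an item line of part number and quantity) are guesses based only on the sample above, and the names parse_order, PAYMENT_RE, and ITEM_RE are hypothetical.

```python
import re

# Message body copied from the sample order above.
SAMPLE_ORDER = """\
404 Garden St.
Royal Oak
VISA 0000-11-2222-3333 06/06
12-31441 1
99-00129 1
"""

# Assumed layouts (not confirmed by any specification):
#   payment line: "<card type> <card number> <MM/YY>"
#   item line:    "<part number> <quantity>"
PAYMENT_RE = re.compile(r"^([A-Z]+) ([\d-]+) (\d{2}/\d{2})$")
ITEM_RE = re.compile(r"^(\d{2}-\d{5}) (\d+)$")


def parse_order(body):
    """Sort each non-blank line into address, payment, or item data."""
    order = {"address": [], "payment": None, "items": []}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        payment = PAYMENT_RE.match(line)
        item = ITEM_RE.match(line)
        if payment:
            card_type, number, expires = payment.groups()
            order["payment"] = {
                "type": card_type,
                "number": number,
                "expires": expires,
            }
        elif item:
            order["items"].append(
                {"part": item.group(1), "quantity": int(item.group(2))}
            )
        else:
            # Anything that is not payment or item data is treated as address text.
            order["address"].append(line)
    return order


order = parse_order(SAMPLE_ORDER)
print(order["items"])
```

Notice that the program never has to guess what the customer meant. Every line fits one of a few known shapes, which is exactly what the free-form message lacked.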

Hierarchical Data

Hierarchical data is structured, like table data, but its items are also arranged in parent-child (container) relationships.

The Table of Contents of this book is an example of hierarchical data. Each line in the Table of Contents isn't really meaningful unless you know its context. There are 24 entries called "Summary," but each Summary is only meaningful if you know in which chapter it appears. The Index of this book is also hierarchical. There are at least a half-dozen entries for the word "arrays," sometimes as a minor topic (data types/arrays, references/arrays, and so on) and once as a major topic. Exactly where it appears depends a great deal on what is being talked about.
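The Table of Contents idea can be sketched with a small nested structure in Python. The chapter and section titles below are invented for illustration, and full_paths is a hypothetical helper; the point is only that an entry such as "Summary" becomes unambiguous once it carries its parent along with it.

```python
# Hypothetical fragment of a table of contents; the chapter and
# section titles are invented for illustration.
toc = {
    "Chapter 1": ["Getting Started", "Summary"],
    "Chapter 2": ["Numbers and Strings", "Summary"],
}


def full_paths(tree):
    """Return each entry qualified by its parent, e.g. 'Chapter 1/Summary'."""
    return [
        f"{parent}/{child}"
        for parent, children in tree.items()
        for child in children
    ]


paths = full_paths(toc)
summaries = [p for p in paths if p.endswith("/Summary")]
print(summaries)  # ['Chapter 1/Summary', 'Chapter 2/Summary']
```

On its own, "Summary" appears twice and tells you nothing. Qualified by its container, each entry is distinct.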

You'll look at hierarchical data in a little while, specifically at a method of representing hierarchical data called XML.

Binary Data

The last category of data is binary data. It's usually quite unreadable by humans. Loading a binary file into your text editor results in gibberish.

Binary data is useful when interchanging data between programs where space and speed are concerns, and human readability is not. Normally to decode binary data, you have to be supplied with the data layout by the person who's also supplying the data. It's very structured, like the table data in Figure 19.1; it's just that the structure isn't apparent. JPEG images, ZIP files, and MP3 music files are all examples of binary data.
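As a rough illustration of why the layout matters, the following Python sketch packs one order line into eight bytes with the standard struct module and then decodes it again. The layout itself (a 2-byte part prefix, a 4-byte part suffix, and a 2-byte quantity, all little-endian) is made up for this example and does not come from any real file format.

```python
import struct

# Assumed layout, invented for this sketch: little-endian, a 2-byte part
# prefix, a 4-byte part suffix, and a 2-byte quantity.
ITEM_LAYOUT = "<HIH"

# Encode part number 12-31441, quantity 1.
packed = struct.pack(ITEM_LAYOUT, 12, 31441, 1)
print(packed)  # raw bytes, not readable text

# Decoding only works because the same layout string is known on this side.
prefix, suffix, quantity = struct.unpack(ITEM_LAYOUT, packed)
print(f"{prefix:02d}-{suffix:05d} x{quantity}")  # 12-31441 x1
```

Without ITEM_LAYOUT, the eight bytes are just noise. With it, they decode to exactly the same item that appeared as a readable line in the table data.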

Figure 19.1. Marked-up incoming order
