6.1. What Is a Hash?
A hash is a data structure like an array, in that it can hold any number of values and retrieve these values at will. However, instead of indexing the values by number, as we did with arrays, we'll look up the values by name. That is, the indices (here, we'll call them keys) aren't numbers but are arbitrary unique strings (see Figure 6-1).
The keys are strings, first of all, so instead of getting element number 3 from an array, we'll be accessing the hash element named wilma. These keys are arbitrary strings: you can use any string expression for a hash key. And they are unique strings; just as there's only one array element numbered 3, there's only one hash element named wilma.
Another way to think of a hash is that it's like a barrel of data (see Figure 6-2), where each piece of data has a tag attached. You can reach into the barrel, pull out any tag, and see what piece of data is attached. But there's no "first" item in the barrel; it's just a jumble. In an array, we'd start with element 0, then element 1, element 2, and so on. But in a hash, there's no fixed order and no first element. It's just a collection of key/value pairs.
Figure 6-1. Hash keys and values
Figure 6-2. A hash as a barrel of data
The keys and values are both arbitrary scalars, but the keys are always converted to strings. So, if you used the numeric expression 50/20 as the key,[*] it would be turned into the three-character string "2.5", which is one of the keys shown in Figure 6-1.
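Here's a minimal sketch of those ideas in Perl. The hash %family and its names are made up for illustration (hypothetical). It shows looking up a value by key instead of by index, the numeric key 50/20 being stored as the string "2.5", and an assignment to an existing key replacing its value rather than creating a second key with the same name.

```perl
use strict;
use warnings;

# Hypothetical hash keyed by first name; the values are surnames.
my %family = (
    fred   => 'flintstone',
    wilma  => 'flintstone',
    barney => 'rubble',
);

# Look up a value by its key (a name), not by a numeric index.
my $surname = $family{'wilma'};
print "wilma => $surname\n";

# Keys are always converted to strings: 50/20 is stored under the key "2.5".
my %h;
$h{50/20} = 'two and a half';
my $found = exists $h{'2.5'};
print $found ? "found key 2.5\n" : "no key 2.5\n";

# Keys are unique: assigning to an existing key replaces its value.
$family{'wilma'} = 'slate';
print scalar(keys %family), " keys\n";
```

Note that the order in which keys %family hands back the keys is not the order they were written in; as the barrel picture suggests, a hash has no first element.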
As usual, Perl's no-unnecessary-limits philosophy applies: a hash may be of any size, from an empty hash with zero key/value pairs up to whatever fills up your memory. Some implementations of hashes (such as in the original awk language, from which Larry borrowed the idea) slow down as the hashes get larger. This is not the case in Perl; it has a good, efficient, scalable algorithm.[*] So, if a hash has only three key/value pairs, it's quick to "reach into the barrel" and pull out any one of those. If the hash has 3 million key/value pairs, it should be about as quick to pull out any one of those. A big hash is nothing to fear.
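As a small illustration of "any size", this sketch starts with an empty hash, fills it with 100,000 pairs (an arbitrary size picked for the example, with made-up "keyN" keys), and counts the pairs by evaluating keys in scalar context. It does not measure speed; it only shows that fetching one element is the same single expression whether the hash is small or large.

```perl
use strict;
use warnings;

my %big;                          # an empty hash: zero key/value pairs
my $empty_count = keys %big;      # keys in scalar context returns the pair count
print "empty: $empty_count pairs\n";

# Fill the hash with 100_000 hypothetical pairs: "key1" => 2, "key2" => 4, ...
for my $n (1 .. 100_000) {
    $big{"key$n"} = $n * 2;
}
my $count = keys %big;
print "filled: $count pairs\n";

# Pulling one element out of the barrel looks the same at any size.
print "key50000 => $big{'key50000'}\n";
```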
Keys are unique, though the values can be duplicated. The values of a hash may be all numbers, strings, undef values, or a mixture,[] but the keys are arbitrary, unique strings.
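The following sketch (with a hypothetical %mixed hash) shows values that repeat and mix numbers, strings, and undef, while the keys stay unique. It also uses exists and defined to show that a key can be present even when its value is undef, and that in a list assignment a later pair with the same key replaces the earlier one.

```perl
use strict;
use warnings;

# Hypothetical hash: values may be numbers, strings, or undef, and may repeat.
my %mixed = (
    apple  => 3,        # a number
    banana => 3,        # the same value again; duplicate values are fine
    cherry => 'red',    # a string
    date   => undef,    # an undef value
);

# Count values equal to "3"; the defined check avoids comparing undef.
my $threes = grep { defined $_ && $_ eq '3' } values %mixed;
print "values equal to 3: $threes\n";

# A key can exist even though its value is undef.
my $has_date     = exists $mixed{date}  ? 1 : 0;
my $date_defined = defined $mixed{date} ? 1 : 0;
print "date exists: $has_date, defined: $date_defined\n";

# Keys cannot repeat: the later pair for key "a" replaces the earlier one.
my %dup = (a => 1, a => 2);
print scalar(keys %dup), " key, a => $dup{a}\n";
```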
6.1.1. Why Use a Hash?
When you first hear about hashes, especially if you've lived a long and productive life as a programmer using languages that don't have hashes, you may wonder why anyone would want one of these strange beasts. Well, the general idea is that you'll have one set of data "related to" another set of data. For example, here are some hashes you might find in typical applications of Perl:
Another way to think of a hash is as a simple database, in which one piece of data may be filed under each key. If your task description includes phrases like "finding duplicates," "unique," "cross-reference," or "lookup table," it's likely that a hash will be useful in the implementation.
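To make "finding duplicates" and "unique" concrete, here is one common approach, sketched with a made-up list of names: use each name as a key and let the value count how many times it was seen. Because the keys are unique, the keys of the resulting hash are the unique names, and any key with a count above 1 is a duplicate.

```perl
use strict;
use warnings;

# Hypothetical input list containing some repeated names.
my @names = qw(fred barney wilma fred betty barney fred);

# Tally each name: the name is the key, the value is how often it appeared.
my %count;
$count{$_}++ for @names;

# Each key appears once, so the keys are the unique names (sorted for display).
my @unique = sort keys %count;

# Names seen more than once are the duplicates.
my @duplicates = grep { $count{$_} > 1 } keys %count;
@duplicates = sort @duplicates;

print "unique: @unique\n";
print "duplicates: @duplicates\n";
```

The same %count hash then doubles as a lookup table: $count{'fred'} answers "how many times did fred appear?" without scanning the list again.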