Exercise: Statistics

Now that you've learned about subroutines, you should begin to see the benefits of encapsulating code in self-contained subroutines. They provide code that can be easily reused. In this exercise, three subs provide some analysis on groups of numbers.

To refresh your memory from school: the mean, also called the arithmetic mean or average, of a set of numbers is the sum of all the numbers in the set divided by how many numbers the set contains. The median is the number that would be in the middle if you sorted the set numerically; with an even number of elements, the median is the average of the two middle numbers. The standard deviation gives an idea of how "bunched" the numbers are around the mean. A high standard deviation means the numbers are widely distributed; a small one means they're bunched tightly around the average. In many sets of numbers commonly found in nature, about 68 percent of the values fall within one standard deviation of the mean, and about 95 percent fall within two standard deviations. Here is a program that finds the mean, median, and standard deviation for a set of numbers that the user types at the keyboard.
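For reference, the three definitions above can be written as formulas. This is only a sketch in conventional notation: the symbols x_i (the data values), n (how many values there are), and x_(k) (the k-th smallest value) are introduced here and do not appear in the original text. The standard deviation is shown in its sample form, dividing by n - 1, which matches the description of std_dev() later in this exercise.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\text{median} =
\begin{cases}
x_{((n+1)/2)} & n \text{ odd} \\[4pt]
\dfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & n \text{ even}
\end{cases}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}
```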

Using your text editor, type the program from Listing 8.1 and save it as Stats. Again, do not type line numbers, and if possible, be sure to make the program executable according to the instructions you learned in Hour 1, "Introduction to the Perl Language."

When you're all done, try running the program by typing the following at a command line:

Stats

or, if your system cannot make the program executable,

perl Stats

Listing 8.1. Complete Listing for Stats Program

1:   #!/usr/bin/perl -w
2:
3:   use strict;
4:   sub mean {
5:       my(@data) = @_;
6:       my $sum;
7:       foreach(@data) {
8:           $sum += $_;
9:       }
10:      return($sum / @data);
11:  }
12:  sub median {
13:      my(@data)=sort { $a <=> $b} @_;
14:      if (scalar(@data) % 2) {
15:          return($data[@data / 2]);
16:      } else {
17:          my($upper, $lower);
18:          $lower=$data[@data / 2 - 1];
19:          $upper=$data[@data / 2];
20:          return(mean($lower, $upper));
21:      }
22:  }
23:  sub std_dev {
24:      my(@data)=@_;
25:      my($sq_dev_sum, $avg)=(0,0);
26:
27:      $avg = mean(@data);
28:      foreach my $elem (@data) {
29:          $sq_dev_sum += ($avg - $elem) **2;
30:      }
31:      return(sqrt($sq_dev_sum / ( @data - 1 )));
32:  }
33:  my($data, @dataset);
34:  print "Please enter data, separated by commas: ";
35:  $data = <STDIN>;  chomp $data;
36:  @dataset = split(/[\s,]+/, $data);
37:
38:  print "Median: ", median(@dataset), "\n";
39:  print "Mean: ", mean(@dataset), "\n";
40:  print "Standard Dev.: ", std_dev(@dataset), "\n";

Line 1: This line contains the path to the interpreter (you can change it so that it's appropriate to your system) and the -w switch. Always have warnings enabled!

Line 3: The use strict directive means that all variables must be declared with my and that bare words must be quoted.

Lines 4-11: The mean() function uses a foreach loop to add up all the numbers in $sum and then divides that total by how many numbers there are.
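One thing the description above leaves open is what happens when mean() receives no numbers at all: the division is then by zero, and Perl stops with an "Illegal division by zero" error. The following is a minimal sketch of one way to guard against that. The name safe_mean() is hypothetical and is not part of Listing 8.1.

```perl
use strict;
use warnings;

# Hypothetical variant of mean(): returns undef for an empty list
# instead of stopping with a division-by-zero error.
sub safe_mean {
    my @data = @_;
    return unless @data;          # nothing to average
    my $sum = 0;
    $sum += $_ foreach @data;     # accumulate the total
    return $sum / @data;          # @data in numeric context is the element count
}

my $avg = safe_mean(2, 4, 9);
print "Mean: $avg\n";             # prints "Mean: 5"
print "No data, so no mean\n" unless defined safe_mean();
```

Whether returning undef is the right choice depends on the caller; the sketch only shows that the check belongs before the division.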

Lines 12-22: The median() function first sorts its arguments numerically and then works in one of two ways. With an odd number of elements, it picks the middle element by dividing the length of the array by two and using the integer portion as the index. With an even number of elements, it does the same but takes the two middle numbers instead. Those numbers, in $upper and $lower, are then averaged with the mean() function and returned as the median.
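The index arithmetic described above is easier to follow with concrete lists. The sketch below uses two small, already sorted arrays chosen for illustration (the names @odd and @even are not from the listing) and relies on the fact that Perl truncates a fractional array subscript to an integer.

```perl
use strict;
use warnings;

# Five elements: 5 / 2 is 2.5, and the subscript is truncated to 2,
# which is the middle element.
my @odd = (6, 8, 9, 10, 34);
my $middle = $odd[@odd / 2];
print "Middle of odd-length list: $middle\n";               # prints 9

# Six elements: 6 / 2 is 3, so the two middle elements sit at
# indexes 2 and 3.
my @even = (6, 8, 9, 10, 14.5, 34);
my $lower = $even[@even / 2 - 1];
my $upper = $even[@even / 2];
print "Middle pair of even-length list: $lower and $upper\n"; # prints 9 and 10
```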

Lines 23-32: The std_dev() function is simple; it's mostly just math. In short, each element in @data is subtracted from the mean and the difference is squared. The results are accumulated in $sq_dev_sum. To find the standard deviation, the sum of the squared differences is divided by the number of elements minus 1, and then the square root is taken.
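To see the arithmetic step by step, here is a small worked sketch using the made-up data set (2, 4, 6), chosen so that every intermediate value is a whole number. It repeats the same calculation as std_dev() inline rather than calling the subs from the listing.

```perl
use strict;
use warnings;

my @data = (2, 4, 6);

# Step 1: the mean is (2 + 4 + 6) / 3 = 4.
my $sum = 0;
$sum += $_ foreach @data;
my $avg = $sum / @data;

# Step 2: square each difference from the mean and add them up:
# (4-2)**2 + (4-4)**2 + (4-6)**2 = 4 + 0 + 4 = 8.
my $sq_dev_sum = 0;
$sq_dev_sum += ($avg - $_) ** 2 foreach @data;

# Step 3: divide by the number of elements minus 1, then take the
# square root: sqrt(8 / 2) = 2.
my $std_dev = sqrt($sq_dev_sum / (@data - 1));

print "Mean: $avg\n";                               # prints "Mean: 4"
print "Sum of squared differences: $sq_dev_sum\n";  # prints 8
print "Standard Dev.: $std_dev\n";                  # prints 2
```

Note that with a single number, the number of elements minus 1 is zero, so the division in the last step fails with a division-by-zero error; the calculation needs at least two values.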

Lines 33-36: The variables needed in the main body of the program are declared as lexicals (with my), and the user is prompted for $data. The variable $data is then split into the array @dataset using the pattern /[\s,]+/. This pattern splits the line on any run of commas and whitespace, so extra spaces and commas between the numbers are ignored.
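The behavior of the split pattern can be checked on its own. The sketch below feeds it two made-up input strings; the second shows one case the description does not cover, namely that a separator at the very start of the line produces an empty first field. The grep filter is an illustrative addition and is not part of Listing 8.1.

```perl
use strict;
use warnings;

# Mixed separators: comma plus space, a doubled comma, a run of spaces.
my $data = "14.5, 6,,8   9";
my @dataset = split(/[\s,]+/, $data);
print join("|", @dataset), "\n";        # prints "14.5|6|8|9"

# A separator at the very start of the line yields an empty first field.
my @leading = split(/[\s,]+/, " 10,20");
print scalar(@leading), " fields\n";    # prints "3 fields"

# One way to drop such empty fields before doing any arithmetic.
my @clean = grep { length } @leading;
print join("|", @clean), "\n";          # prints "10|20"
```

Left in place, an empty field would be treated as zero in the calculations and would trigger a "isn't numeric" warning under -w.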

Lines 38-40: The output is produced. Notice that this isn't the only place the functions are called. They also call each other: std_dev() and median() both use mean(), which is a good example of code reuse!

Listing 8.2 shows a sample of the statistics program's output.

Listing 8.2. Sample Output from Stats

Please enter data, separated by commas: 14.5,6,8,9,10,34
Median: 9.5
Mean: 13.5833333333333
Standard Dev.: 10.3943093405318
