8.1 The Case for CPAN

It surprises and disappoints me how often people waste time reinventing code for which a module already exists. Here are some reasons why they might be doing that:

  • They don't know that there are non-core modules for Perl, or they don't know where to find them.

  • They don't trust non-core modules.

  • It takes them too long to find a module to do what they want.

  • Their management has a policy against using non-core modules.

  • They think it will be faster to write the code themselves than to figure out how to use a module.

Let me address each of these issues in turn.

8.1.1 Yes, Virginia, There Is a Code Repository

The first concern is easily handled. There is a repository of contributed code; it is called the Comprehensive Perl Archive Network, or CPAN; and it is as easy to reach as you can imagine. It is replicated on hundreds of computers around the world for your convenience, far more than is strictly necessary these days (the mirroring structure was set up when networks were less reliable, bandwidth much smaller, and machines less powerful). The upshot is that, under any circumstances short of World War III, at least part of CPAN will be accessible to you at all times if you can reach any reasonable chunk of the Internet. Elaine Ashton's all-encompassing FAQ also makes entertaining reading (http://www.cpan.org/misc/cpan-faq.html). See Section 8.1.3 for more information on how to use CPAN.

8.1.2 Trusting CPAN

How much can you trust code from CPAN? The bad news is that there is little quality control on CPAN: anyone can upload any code they want, and the only central control is over the apportionment of the module namespace. There is a testing group (see http://testers.cpan.org/), but it only checks that a module passes its own tests on as many different platforms and configurations as possible.[2] Technically, someone could place malicious code on CPAN.

[2] You can also reach the test results for any module via search.cpan.org, if you navigate to the root of a module's distribution, then click on the "CPAN Testers" link. See Figure 8.4.

The good news is that this has not yet happened. Every CPAN distribution has a corresponding checksum (generated automatically by the server when the distribution is uploaded) stored on CPAN so that when you download code you can compare the checksum with what it is supposed to be. (CPAN.pm does this automatically. See Section 8.2.1.)
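
The comparison itself is simple enough to sketch by hand. The following is a minimal illustration, not CPAN.pm's actual code: it assumes the published checksum is a SHA-256 hex digest, and the helper name checksum_ok and the stand-in "tarball" are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempfile);

# Hypothetical helper: true if the file's SHA-256 digest matches the
# expected hex string (for example, one read from a published checksum list).
sub checksum_ok {
    my ($path, $expected_hex) = @_;
    my $sha = Digest::SHA->new(256);
    $sha->addfile($path, 'b');    # 'b' reads the file in binary mode
    return lc($sha->hexdigest) eq lc($expected_hex);
}

# Stand-in for a downloaded tarball, so that the sketch runs on its own.
my ($fh, $tarball) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "pretend this is a distribution tarball\n";
close $fh or die "close failed: $!";

# Assumption for the demo only: in real use the expected value comes from
# the archive, never from the file you are trying to verify.
my $published = Digest::SHA->new(256)->addfile($tarball, 'b')->hexdigest;

for my $expected ($published, '0' x 64) {
    print checksum_ok($tarball, $expected)
        ? "checksum matches\n"
        : "checksum MISMATCH\n";
}
```

A mismatch means the file you received is not the file that was uploaded, whatever the cause, and it should not be installed.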

The reliability of code on CPAN is a separate issue, but still not a pressing concern. I have found the modules I have tested to be of very high quality, but that doesn't mean that there are no poor CPAN modules. I would suggest you apply these heuristics to estimate the pedigree of a module:

  • See if there are reviews of the module at the new site http://cpanratings.perl.org/.

  • Is the module written by a prominent name in the Perl community? If you don't recognize the name, have they written any modules you recognize? (Using search.cpan.org, click on the author's ID near the top of the page to get a listing of their modules.) Do a Google search for their name and "Perl" in Web pages and Usenet articles, and see if you like what you read. At the time of writing there are more than 2,800 authors listed on CPAN and there are so many specialties I have not investigated that it would be grossly unfair to give any kind of list of reputable authors at this point in the context of assessing module reliability.

  • Does the module pass its tests? If it doesn't have tests, or isn't delivered to work properly with CPAN.pm (or Module::Build), those would be red flags.

  • Check the date the module was last updated; if it has been altered in the last three months then the author is giving it regular care and you are more likely to be able to get their attention if you have a problem than you would if the module is very old.

  • If the module has not been touched in two years, it is likely moribund; investigate to see if it has been superseded by another module. However, it is quite possible that if it is a simple module, it doesn't need frequent changing. The module Geography::States, for instance, contains mappings between state names and abbreviations for several countries, and these just don't change often.[3]

    [3] As luck would have it, I looked at Geography::States and found that it hadn't been updated with the (recent!) assignment of the abbreviation for Nunavut in Canada. (The definition of "state" is a bit broad in this module.) I notified the author.

  • Complex modules often have SourceForge projects[4] so that multiple people can cleanly work on them concurrently. This site will be a valuable source of information about the current state of the module.

    [4] http://sourceforge.net/

  • Do a Google search of Usenet articles for the module and sort by date to see if there have been any recent postings indicating approval or otherwise.
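
The two date-based rules of thumb above can be written down as a small function. This is only a sketch: the helper name release_status is hypothetical, and 90 and 730 days are my approximations of "three months" and "two years."

```perl
use strict;
use warnings;

use constant DAY => 24 * 60 * 60;    # seconds in a day

# Hypothetical helper: classify a module by the age of its latest release.
# Both arguments are epoch seconds.
sub release_status {
    my ($released_epoch, $now_epoch) = @_;
    my $age_days = ($now_epoch - $released_epoch) / DAY;
    return 'actively maintained' if $age_days <= 90;
    return 'possibly moribund'   if $age_days >= 730;
    return 'no strong signal';
}

my $now = time;
for my $days_ago (30, 400, 900) {
    printf "%4d days old: %s\n", $days_ago,
        release_status($now - $days_ago * DAY, $now);
}
```

Treat the result as a prompt for further investigation rather than a verdict; as the Geography::States example shows, a simple module can be old and still perfectly serviceable.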

8.1.3 How to Find a CPAN Module

Like many things, finding a CPAN module is very easy when you know how. The easiest way to access CPAN is through http://search.cpan.org/ (see Figure 8.1).

Figure 8.1. Front page of search.cpan.org.

If you want to find out more about CPAN and get some different views of its data, you can go to http://www.cpan.org and browse. One type of data you can find through www.cpan.org but not search.cpan.org is scripts (pre-written programs). search.cpan.org just has modules (reusable libraries of functions and methods).

After entering a query on search.cpan.org, clicking on the desired hit (Figure 8.2) will give you the documentation for the module: its POD automatically rendered into HTML (Figure 8.3). Clicking on the distribution link in the header will take you to a listing of the files that make up the distribution (Figure 8.4). This is a handy way to find the README file that comes with the module, and the page also displays the cpanratings.perl.org rating. If the module is delivered with any examples, you can inspect them here without having to download the whole distribution. The Download link on this page gives you a direct link to the tarball containing that version of the module, but use it only if CPAN.pm won't work for you, because CPAN.pm is a much easier way to install a module (see Section 8.2.1).
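
If you are curious what "POD rendered into HTML" involves, you can reproduce the idea locally. This sketch uses Pod::Simple::HTML as a stand-in; I am not claiming it is what search.cpan.org runs, and the module name My::Example is made up for the demonstration.

```perl
use strict;
use warnings;
use Pod::Simple::HTML;

# A tiny POD document, built line by line so that the POD directives
# do not confuse tools that scan this file for documentation.
my $pod = join "\n",
    '=head1 NAME',
    '',
    'My::Example - a hypothetical module used only to demonstrate POD',
    '',
    '=head1 DESCRIPTION',
    '',
    'This paragraph becomes ordinary HTML text.',
    '',
    '=cut',
    '';

my $renderer = Pod::Simple::HTML->new;
$renderer->output_string(\my $html);    # collect the HTML in a scalar
$renderer->parse_string_document($pod);

print $html;
```

The same approach works on an installed module's source file if you call parse_file with its path instead of parse_string_document.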

Figure 8.2. Hits from a search on search.cpan.org.

Figure 8.3. Module documentation seen via search.cpan.org.

Figure 8.4. Module distribution seen via search.cpan.org.

Clicking on the Source link under the header will take you to the module source code, which allows you to answer any remaining questions you might have about how the module works.

8.1.4 Dealing with an Anti-CPAN Policy

A policy that says "No CPAN code" is born of fear or ignorance. Every day, companies and governments trust millions, even billions of dollars to CPAN modules. Refer your management to "Perl Success Stories" [OREILLY99] for examples. If they are concerned about quality control, point out that modules such as Perl/Tk and DBI are nearly as complex as the entire Perl core (and presumably they already trust the Perl core or you wouldn't be reading this) and receive attention from dozens of maintainers and thousands of users. Sadly, though, many organizations suffer from the "Not Invented Here" syndrome, which causes them to be blind to important new technologies.

8.1.5 Make or Buy?

Our final concern arrives in the form of a venerable question that takes on a special aura when applied to modules. We programmers have a well-known tendency to underestimate the time it takes to create code. If we were right the first time about interface and design decisions, never hit any problems, didn't have to write tests because the code worked to begin with, and could leave out documentation, then our schedule estimates might be too optimistic by only a factor of two.

Otherwise, with the exception of a few people in SW-CMM (Software Engineering Institute Capability Maturity Model for Software) level 5 environments [CMU03] or devout adherents of the Personal Software Process [HUMPHREY94], we're likely to be way off the mark. "Create a module for parsing CGI form inputs? No problem! Three, four hours at the most," we think. This is not just abusing the virtue of hubris, this is mugging it in a dark alley.

Before you decide to hack out your own module without looking for an equivalent on CPAN, consider what you will have to do, one way or another, sooner or later, in your module:

  • Document it (in POD of course).

  • Comment it for maintenance programmers.

  • Port it to any platforms or versions of Perl that might be required by users.

  • Write regression tests for users.

  • Respond to user feedback and requests.

The only time it makes sense not to use a CPAN module that already does what you want is if it is such a bad fit in other respects that you will spend longer learning its documentation or working around unnecessary interfaces than it would take to create a module that does only what you want. Of course, false hubris makes us think that this is true 90 percent of the time. But if you truly believe that a particular CPAN module is not worth the effort after reading this chapter, then as a last resort before you proceed to do it yourself, consider whether you can eliminate the interface problems by subclassing the module.
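
Subclassing to tame an awkward interface might look something like the following sketch. Everything here is hypothetical: Some::CPAN::Formatter stands in for a CPAN module with a more general interface than you need (it is defined inline only so that the example runs on its own), and My::Formatter is the thin subclass you would actually write.

```perl
use strict;
use warnings;

# Stand-in for a CPAN module whose interface is more general than we need.
# In real life this package would come from CPAN, not from your own file.
package Some::CPAN::Formatter;

sub new {
    my ($class, %opts) = @_;
    return bless { width => 72, indent => 0, %opts }, $class;
}

sub render {
    my ($self, $text, %per_call) = @_;
    my %cfg = (%$self, %per_call);    # per-call options override defaults
    return (' ' x $cfg{indent}) . substr($text, 0, $cfg{width});
}

# The subclass fixes the options once and exposes the single call we want.
package My::Formatter;
our @ISA = ('Some::CPAN::Formatter');

sub new {
    my ($class) = @_;
    return $class->SUPER::new(width => 20, indent => 2);
}

sub short_line {
    my ($self, $text) = @_;
    return $self->render($text);
}

package main;

my $fmt = My::Formatter->new;
print $fmt->short_line('Reuse the module; reshape only the interface.'), "\n";
```

The subclass is a dozen lines, and the parent module's documentation, tests, portability work, and bug fixes all remain someone else's responsibility.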
