2.3 Find the Dependencies


Look for documentation that describes everything that needs to exist for this program to work. Complex systems can depend on code written in other languages, on data files produced by other systems, or on network connections with external services. If you can find interface agreements or other documents that describe these dependencies, they will make the job of code analysis much easier. Otherwise you will be reduced to trial and error: copying over the main program, running it repeatedly, and identifying missing dependencies until it appears to work.

A common type of dependency is a custom Perl module. Quite possibly the program uses some modules that should have been delivered to you but weren't. Get a list of modules that the program uses in operation and compare it with what you were given and what is in the Perl core. Again, this is easier to do with the currently operating version of the program. First try the simple approach of searching for lines beginning with "use " or "require ". On UNIX, you can use egrep:


% egrep '^(use|require) ' files...

Remember to also search all the modules that are part of the code you were given. Let's say that I did that and the output was:


use strict;
use warnings;
use lib qw(/opt/lib/perl);
use WWW::Mechanize;

Can I be certain I've found all the modules the program loads? No. For one thing, there's no law that use and require have to be at the beginning of a line; in fact I commonly have require statements embedded in do blocks in conditionals, for instance.
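As a rough illustration of the first problem, the search can be loosened so that it also catches use and require statements that are not at the start of a line. This is only a sketch: find_module_names is a hypothetical helper, the sample source is made up, and a textual match like this can still be fooled by strings, comments, and POD.

```perl
use strict;
use warnings;

# Hypothetical helper: return the module names that follow "use" or "require"
# anywhere in a chunk of Perl source, not only at the beginning of a line.
# Version numbers (use 5.006) and quoted file names (require "x.pl") are skipped
# because the name must start with a letter or underscore.
sub find_module_names {
    my ($source) = @_;
    my %seen;
    while ($source =~ /\b(?:use|require)\s+([A-Za-z_]\w*(?:::\w+)*)/g) {
        $seen{$1} = 1;
    }
    my @names = sort keys %seen;
    return @names;
}

# Made-up sample: one require is buried in a do block inside a conditional.
my $sample = <<'END_SAMPLE';
use strict;
my $ok = $debug ? do { require Data::Dumper; 1 } : 0;
    use WWW::Mechanize;
END_SAMPLE

print "$_\n" for find_module_names($sample);
```

The plain egrep pattern would report only use strict for this sample, because the other two statements do not start in the first column.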

The other reason this search can't be foolproof is that Perl programs are capable of loading modules dynamically based on conditions that are unknown until run time. Although there is no completely foolproof way of finding out all the modules the program might use, a pretty close way is to add this code to the program:


END {
  print "Directories searched:\n\t",
        join ("\n\t" => @INC),
        "\nModules loaded:\n\t",
        join ("\n\t" => sort values %INC),
        "\n";
}

Then run the program. You'll get output looking something like this:


Directories searched:
   /opt/lib/perl
   /usr/lib/perl5/5.6.1/i386-linux
   /usr/lib/perl5/5.6.1
   /usr/lib/perl5/site_perl/5.6.1/i386-linux
   /usr/lib/perl5/site_perl/5.6.1
   /usr/lib/perl5/site_perl/5.6.0
   /usr/lib/perl5/site_perl
   /usr/lib/perl5/vendor_perl/5.6.1/i386-linux
   /usr/lib/perl5/vendor_perl/5.6.1
   /usr/lib/perl5/vendor_perl
   .
Modules loaded:
   /usr/lib/perl5/5.6.1/AutoLoader.pm
   /usr/lib/perl5/5.6.1/Carp.pm
   /usr/lib/perl5/5.6.1/Exporter.pm
   /usr/lib/perl5/5.6.1/Exporter/Heavy.pm
   /usr/lib/perl5/5.6.1/Time/Local.pm
   /usr/lib/perl5/5.6.1/i386-linux/Config.pm
   /usr/lib/perl5/5.6.1/i386-linux/DynaLoader.pm
   /usr/lib/perl5/5.6.1/lib.pm
   /usr/lib/perl5/5.6.1/overload.pm
   /usr/lib/perl5/5.6.1/strict.pm
   /usr/lib/perl5/5.6.1/vars.pm
   /usr/lib/perl5/5.6.1/warnings.pm
   /usr/lib/perl5/5.6.1/warnings/register.pm
   /usr/lib/perl5/site_perl/5.6.1/HTML/Form.pm
   /usr/lib/perl5/site_perl/5.6.1/HTTP/Date.pm
   /usr/lib/perl5/site_perl/5.6.1/HTTP/Headers.pm
   /usr/lib/perl5/site_perl/5.6.1/HTTP/Message.pm
   /usr/lib/perl5/site_perl/5.6.1/HTTP/Request.pm
   /usr/lib/perl5/site_perl/5.6.1/HTTP/Response.pm
   /usr/lib/perl5/site_perl/5.6.1/HTTP/Status.pm
   /usr/lib/perl5/site_perl/5.6.1/LWP.pm
   /usr/lib/perl5/site_perl/5.6.1/LWP/Debug.pm
   /usr/lib/perl5/site_perl/5.6.1/LWP/MemberMixin.pm
   /usr/lib/perl5/site_perl/5.6.1/LWP/Protocol.pm
   /usr/lib/perl5/site_perl/5.6.1/LWP/UserAgent.pm
   /usr/lib/perl5/site_perl/5.6.1/URI.pm
   /usr/lib/perl5/site_perl/5.6.1/URI/Escape.pm
   /usr/lib/perl5/site_perl/5.6.1/URI/URL.pm
   /usr/lib/perl5/site_perl/5.6.1/URI/WithBase.pm
   /opt/lib/perl/WWW/Mechanize.pm
   /usr/lib/perl5/site_perl/5.6.1/i386-linux/Clone.pm
   /usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/Entities.pm
   /usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/Parser.pm
   /usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/PullParser.pm
   /usr/lib/perl5/site_perl/5.6.1/i386-linux/HTML/TokeParser.pm

That doesn't mean that the user code itself loaded all 35 modules listed; in fact, it loaded 4 (strict, warnings, lib, and WWW::Mechanize), one of which (WWW::Mechanize) loaded the rest, mostly via other modules that in turn loaded other modules that... well, you get the picture. Now you want to verify that the program isn't somehow loading modules that your egrep command didn't find, so create a program containing just the results of the egrep command and add the END block, like so:


use strict;
use warnings;
use lib qw(/opt/lib/perl);
use WWW::Mechanize;

END {
  print "Directories searched:\n\t",
        join ("\n\t" => @INC),
        "\nModules loaded:\n\t",
        join ("\n\t" => sort values %INC),
        "\n";
}

Run it. If the output is identical to what you got when you added the END block to the entire program, then egrep almost certainly found all the dependencies. If it isn't, you'll have to dig deeper.

Even if the outputs match, it's conceivable, although unlikely, that you haven't found all the dependencies. Why? Just because one set of modules was loaded by the program the time you ran it with your reporting code doesn't mean it couldn't load another set some other time. You can't be certain the code isn't doing that until you've inspected every eval and require statement in it. For instance, DBI (the DataBase Independent module) decides which DBD driver module it needs depending on part of a string passed to its connect() method. Fortunately, code that complicated is rare.
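To make the run-time loading problem concrete, here is a minimal sketch of the kind of code that neither egrep nor a single test run will fully expose. It is not how DBI itself is implemented; the dispatch table %handler_for, the function load_handler, and the REPORT_FORMAT environment variable are all hypothetical names chosen for illustration. JSON::PP is assumed to be available (it ships with perl 5.14 and later), but it is only loaded if that format is requested.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: which module handles which output format.
my %handler_for = (
    dump => 'Data::Dumper',
    json => 'JSON::PP',
);

# Load a module whose name is known only at run time.
sub load_handler {
    my ($format) = @_;
    my $module = $handler_for{$format}
        or die "No handler for format '$format'\n";
    (my $file = "$module.pm") =~ s{::}{/}g;   # Data::Dumper -> Data/Dumper.pm
    require $file;                            # no module name here for egrep to find
    return ($module, $INC{$file});
}

# The format, and therefore the module loaded, depends on the environment.
my $format = $ENV{REPORT_FORMAT} || 'dump';
my ($module, $path) = load_handler($format);
print "Format '$format' loaded $module from $path\n";
```

An END block report from a run with the default format would list Data/Dumper.pm but not JSON/PP.pm, even though the program can load either one.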

Now check that the system you need to port the program to contains all the required modules. Take the list output by egrep and prefix each module with -M in a one-liner like so:


% perl -Mstrict -Mwarnings -MWWW::Mechanize -e 0

This runs a trivial program (0) after loading the required modules. If the modules loaded okay, you won't see any errors. If one or more modules don't exist on this system, you'll see a message starting, "Can't locate module.pm in @INC . . . "

That's quite likely what will happen with the preceding one-liner, and the reason is the use lib statement in the source. Like warnings and strict, lib is a pragma, meaning that it's a module that affects the behavior of the Perl compiler. In this case it was used to add the directory /opt/lib/perl to @INC, the list of directories perl searches for modules in. Seeing that in a program you need to port indicates that it uses modules that are not part of the Perl core. It could mean, as it did here, that it is pointing perl toward a non-core Perl module (WWW::Mechanize) that is nevertheless maintained by someone else and downloaded from CPAN. Or it could indicate the location of private modules that were written by the developers of the program you are porting. Find out which case applies: Look on CPAN for any missing modules. The easiest way to do this is to go to http://search.cpan.org/ and enter the name of each missing module, telling the search engine to search in "modules".[2]

[2] Unless you're faced with a huge list to check, in which case you can script searches using the CPAN.pm module's expand method.

So if we want to write a one-liner that searches the same module directories as the original code, we would have to use Perl's -I flag:


% perl -Mstrict -Mwarnings -I/opt/lib/perl -MWWW::Mechanize \
  -e 0

However, in the new environment you're porting the program to, there may not be a /opt/lib/perl; there may be another location you should install third-party modules to. If possible, install CPAN modules where CPAN.pm wants to put them; that is, in the @INC site-specific directory. (Local business policies might prevent this, in which case you put them where local policy specifies and insert use lib statements pointing to that location in your programs.)
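The -M one-liner stops at the first module it cannot find. For a longer list, the same availability check can be scripted so that every missing module is reported in one pass. The following is a sketch under stated assumptions: missing_modules is a hypothetical helper, the module list is the one from the egrep output above, and the use lib line plays the role of -I (a directory that does not exist is simply never matched).

```perl
use strict;
use warnings;
use lib qw(/opt/lib/perl);    # same effect as -I/opt/lib/perl on the command line

# Hypothetical helper: return the subset of the given module names that
# cannot be loaded on this system. A module that exists but fails to
# compile is also reported, since require fails in that case too.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;   # WWW::Mechanize -> WWW/Mechanize.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @wanted  = qw(strict warnings WWW::Mechanize);
my @missing = missing_modules(@wanted);
print @missing ? "Missing: @missing\n" : "All modules found\n";
```

Changing the use lib path to the new system's module directory, or removing it, shows whether the modules can be found where you intend to install them.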

If you find a missing module on CPAN, see if you can download the same version that is used by the currently operational program—not necessarily the latest version. Remember, you want first of all to re-create the original environment as closely as possible to minimize the number of places you'll have to look for bugs if it doesn't work. Again, if you're dealing with a relatively small, unprepossessing program, this level of caution may not be worth the trouble and you will usually spend less time overall if you just run it against the latest version of everything it needs.

To find out what version of a module (Foo::Bar, say) the original program uses, run this command on the operational system:


% perl -MFoo::Bar -le 'print $Foo::Bar::VERSION'
0.33

Old or poorly written modules may not define a $VERSION package variable, leaving you to decide just how much effort you want to put into finding exactly the same historical version, because you'll have to compare the actual source code texts (unless you have the source your module was installed from and the version number is embedded in the directory name). Don't try getting multiple versions of the same module to coexist in the same perl installation unless you're desperate; this takes considerable expertise.
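When there are many modules to check, the version query can be wrapped in a small script that also copes with modules that define no $VERSION. This is a sketch only: module_version is a hypothetical helper, and the module list is an example in which Foo::Bar stands in for whatever the program really uses. It relies on the VERSION method that every package inherits from UNIVERSAL, which returns undef when no $VERSION is defined.

```perl
use strict;
use warnings;

# Hypothetical helper: describe the installed version of a module.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 'not installed' unless eval { require $file; 1 };
    my $version = $module->VERSION;   # inherited from UNIVERSAL; undef if unset
    return defined $version ? $version : 'no $VERSION defined';
}

# Example list; replace with the modules found on the operational system.
printf "%-20s %s\n", $_, module_version($_) for qw(strict Carp Foo::Bar);
```

Run it on the operational system and keep the output; it gives you the version numbers to look for on CPAN.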

You can find tools for reporting dependencies in programs and modules in Tom Christiansen's pmtools distribution (http://language.perl.com/misc/pmtools-1.00.tar.gz).

2.3.1 Gobbledygook

What if you look at a program and it really makes no sense at all? No indentation, meaningless variable names, line breaks in bizarre places, little or no white space? You're looking at a deliberately obfuscated program, likely one that was created by running a more intelligible program through an obfuscator.[3]

[3] Granted, some programs written by humans can appear obfuscated even when there was no intention that they appear that way. See Section 1.5.

Clearly, you'd prefer to have the more intelligible version. That's the one the developer used; what you've got is something they delivered in an attempt to provide functionality while making it difficult for the customer to make modifications or understand the code. You're now the developer, so you're entitled to the original source code; find it. If it's been lost, don't despair; much of the work of reconstructing a usable version of the program can be done by a beautifier, discussed in Section 4.5. A tool specifically designed for helping you in this situation is Joshua ben Jore's module B::Deobfuscate (http://search.cpan.org/dist/B-Deobfuscate/).
