1.3. Unexpected Code

The final set of advanced techniques in this chapter covers anything where Perl code runs at a time that might not be obvious: tying, for instance, runs code when a variable is accessed or assigned to; overloading runs code when various operations are called on a value; and time shifting allows us to run code out of order or delayed until the end of scope.

Some of the most striking effects in Perl can be obtained by arranging for code to be run at unexpected moments, but this must be tempered with care. The whole point of unexpected code is that it's unexpected, and that breaks the well-known Principle of Least Surprise: programming Perl should not be surprising.

On the other hand, these are powerful techniques. Let's take a look at how to make the best use of them.

1.3.1. Overloading

Overloading, in a Perl context, is a way of making an object look like it isn't an object. More specifically, it's a way of making an object respond to methods when used in an operation or other context that doesn't look like a method call.

The problem with such overloading is that it can quickly get wildly out of hand. C++ overloads the left bit-shift operator, <<, on filehandles to mean print:

    cout << "Hello world";

since it looks like the string is heading into the stream. Ruby, on the other hand, overloads the same operator on arrays to mean push. If we make flagrant use of overloading in Perl, we end up having to look at least twice at code like:

    $object *= $value;

We look once to see it as a multiplication, once to realize it's actually a method call, and once more to work out what class $object is in at this point and hence what method has been called.

That said, for classes that more or less represent the sort of things you're overloading (numbers, strings, and so on), overloading works fine. Now, how do we do it?

1.3.1.1 Simple operator overloading

The classic example of operator overloading is a module that represents time. Indeed, Time::Seconds, from the Time::Piece distribution, does just this. Let's make some new Time::Seconds objects:

    my $min  = Time::Seconds->new(60);
    my $hour = Time::Seconds->new(3600);

The point of Time::Seconds is that, as well as merely representing a number of seconds, you can convert between different units of duration:

    my $longtime = Time::Seconds->new(123456);
    print $longtime->hours; # 34.2933..
    print $longtime->days;  # 1.42888..

These objects definitely represent a number: a number of seconds. Normally, we'd have to add them together with some ugly hack like this:

    my $new = $min->add($hour);

And even then it's not clear whether or not that alters the original $min. So one natural use of operator overloading would be to enable us to say $min + $hour, and get back an object representing 3,660 seconds. And that is precisely what happens:

    my $new = $min + $hour;
    print $new->seconds; # 3660

This is done by the following bit of code in the Time::Seconds module:

    use overload '+' => \&add;
    # ...
    sub add {
        my ($lhs, $rhs) = _get_ovlvals(@_);
        return Time::Seconds->new($lhs + $rhs);
    }

    sub _get_ovlvals {
        my ($lhs, $rhs, $reverse) = @_;
        $lhs = $lhs->seconds;

        if (UNIVERSAL::isa($rhs, 'Time::Seconds')) {
            $rhs = $rhs->seconds;
        } elsif (ref($rhs)) {
            die "Can't use non Seconds object in operator overload";
        }

        if ($reverse) { return $rhs, $lhs; }
        return $lhs, $rhs;
    }

The overload pragma is the key to it all. It tells Perl to look more carefully at operations involving objects of that class, and it registers methods for the given operators in a look-up table. When an object is involved in an overloaded operation, the operation is looked up in the table and the resulting method called. In this case, $obj + $other will call $obj->add($other, 0).

The reason Perl passes three parameters to the method is that in the case of $other + $obj, where $other is not an object that overloads +, we still expect the add method to be called on $obj. In this case, however, Perl will call $obj->add($other, 1), to signify that the arguments have been reversed.

The _get_ovlvals subroutine looks at the two arguments to an operator and tries to coerce them into numbers: other Time::Seconds objects are turned into numbers by having the seconds method called on them, ordinary numbers are passed through, and any other kind of object causes a fatal error. Then the arguments are reordered to the original order.

Once we have two ordinary numbers, we can add them together and return a new Time::Seconds object based on the sum.

The other operators are based on this principle, such as <=>, which implements all of the comparison operators:

    use overload '<=>' => \&compare;
    sub compare {
        my ($lhs, $rhs) = _get_ovlvals(@_);
        return $lhs <=> $rhs;
    }
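
To see the reversed-argument flag at work without installing anything, here is a minimal, self-contained sketch. The class My::Duration and its helper _plain are hypothetical stand-ins for illustration, not the Time::Seconds source. Subtraction is used because, unlike addition, it gives a wrong answer if the swap is ignored.

```perl
use strict;
use warnings;

package My::Duration;    # hypothetical class, not part of Time::Piece

use Scalar::Util qw(blessed);
use overload
    '-'   => \&subtract,
    '<=>' => \&compare;

sub new     { my ($class, $seconds) = @_; return bless { seconds => $seconds }, $class }
sub seconds { return $_[0]{seconds} }

# Reduce an operand to a plain number of seconds.
sub _plain {
    my ($value) = @_;
    return $value->seconds if blessed($value) && $value->isa(__PACKAGE__);
    die "Can't use a non-duration reference in arithmetic\n" if ref $value;
    return $value;
}

sub subtract {
    my ($self, $other, $swapped) = @_;
    my ($lhs, $rhs) = ($self->seconds, _plain($other));
    ($lhs, $rhs) = ($rhs, $lhs) if $swapped;    # restore the order as written
    return My::Duration->new($lhs - $rhs);
}

sub compare {
    my ($self, $other, $swapped) = @_;
    my ($lhs, $rhs) = ($self->seconds, _plain($other));
    ($lhs, $rhs) = ($rhs, $lhs) if $swapped;
    return $lhs <=> $rhs;
}

package main;

my $hour      = My::Duration->new(3600);
my $shorter   = $hour - 600;     # handler sees ($hour, 600) with a false third argument
my $remainder = 4000 - $hour;    # handler sees ($hour, 4000) with a true third argument
my $is_longer = $hour > 60;      # derived by Perl from the <=> handler

print $shorter->seconds, " ", $remainder->seconds, "\n";    # 3000 400
```

Note that the object is always the first argument to the handler, whichever side of the operator it was written on; only the third argument records the original order.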

Time::Seconds also overloads assignment operators += and -=:

    use overload '-=' => \&subtract_from;
    sub subtract_from {
        my $lhs = shift;
        my $rhs = shift;
        $rhs = $rhs->seconds if UNIVERSAL::isa($rhs, 'Time::Seconds');
        $$lhs -= $rhs;
        return $lhs;
    }

The += handler is written in the same way, which allows you to say $new += 60 to add another minute to the new duration.

Finally, to avoid having to write such subroutines for every kind of operator, Time::Seconds uses a feature of overload called fallback. This instructs Perl to attempt to automatically generate reasonable methods from the ones specified: for instance, the $x++ operator will be implemented in terms of $x += 1, and so on. Time::Seconds sets fallback to undef, which means that Perl will try to use an autogenerated method but will die if it cannot find one.

    use overload 'fallback' => undef;   # the value undef; the string 'undef' would count as true

Alternate values for fallback include some true value, which is the most general fallback: if it cannot find an autogenerated method, it will do what it can, assuming if necessary that overloading does not exist. In other words, it will always produce some value, somehow.

Suppose you're using overloading just to add a shortcut operator or two onto an otherwise object-based class; for example, you might want to emulate C++'s (rather dodgy) use of the << operator to write to a filehandle:

    $file << "This is ugly\n";

In that case you should set fallback to a value that is defined but false, such as 0. This means that no automatic method generation will be tried, and any attempt to use the object with one of the operations you have not overloaded will cause a fatal error.
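
The difference between a true and a defined-but-false fallback can be checked with a small sketch. The two classes below are hypothetical and overload only stringification; the eq operator is deliberately left without a handler so that the outcome depends on fallback alone.

```perl
use strict;
use warnings;

package My::NoFallback;      # hypothetical: fallback is defined but false
use overload '""' => sub { 'label' }, fallback => 0;
sub new { my ($class) = @_; return bless {}, $class }

package My::WithFallback;    # hypothetical: same handler, fallback is true
use overload '""' => sub { 'label' }, fallback => 1;
sub new { my ($class) = @_; return bless {}, $class }

package main;

# With a true fallback, Perl behaves as if eq were not overloaded at all
# and compares the stringified object.
my $lenient_equal = (My::WithFallback->new() eq 'label');

# With fallback => 0 there is no handler and nothing to fall back on,
# so the comparison is a fatal error that we trap with eval.
my $strict_survived = eval { my $unused = (My::NoFallback->new() eq 'label'); 1 };
my $strict_error    = $@;

print "true fallback:  ", ($lenient_equal   ? "equal"    : "different"), "\n";
print "false fallback: ", ($strict_survived ? "no error" : "died"),      "\n";
```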

However, as well as performing arithmetic operations on Time::Seconds objects, there's something else you can do with them:

    print $new; # 3660

If we use the object as an ordinary string or a number, we don't get object-like behavior (the dreaded Time::Seconds=SCALAR(0xf00)) but instead it acts just like we should expect from something representing a number: it looks like a number. How does it do that?

1.3.1.2 Other operator overloading

As well as being able to overload the basic arithmetic and string operators, Perl allows you to overload the sorts of things that you wouldn't normally think of as operators. The two most useful of these we have just seen with Time::Seconds: the ability to dictate how an object is converted to a string or integer when used as such.

This is done by assigning methods to two special operator names: the "" operator for stringification and the 0+ operator for numification:

    use overload '0+' => \&seconds,
                 '""' => \&seconds;

Now anytime the Time::Seconds object is used as a string or a number, the seconds method gets called, returning the number of seconds that the object contains:

    print "One hour plus one minute is $new seconds\n";
    # One hour plus one minute is 3660 seconds

These are the most common methods to make an overloaded object look and behave like the thing it's meant to represent. There are a few other methods you can play with for more obscure effects.

For instance, you can overload the way that an object is dereferenced in various ways, allowing a scalar reference to pretend that it's a list reference or vice versa. There are few sensible reasons to do this; the curious Object::MultiType overloads the @{ }, %{ }, &{ }, and *{ } operators to allow a single object to pretend to be an array, hash, subroutine, or glob, depending on how it's used.
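
As a rough illustration of dereference overloading, the sketch below lets a hash-based object answer to array syntax. The class My::Playlist is invented for this example and is far simpler than what Object::MultiType does.

```perl
use strict;
use warnings;

package My::Playlist;    # hypothetical hash-based class

# When the object is dereferenced as an array, hand back the inner array.
# The hash access inside the handler is safe because %{} is not overloaded.
use overload '@{}' => sub { $_[0]{tracks} };

sub new {
    my ($class, @tracks) = @_;
    return bless { tracks => [@tracks] }, $class;
}

package main;

my $list = My::Playlist->new(qw(intro verse chorus));

my $count = @$list;         # array dereference goes through the handler
my $last  = $list->[-1];    # so does element access
push @$list, 'outro';       # and so does modification

print "$count tracks, last was $last, now ", scalar(@$list), "\n";
```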

1.3.1.3 Non-operator overloading

One little-known extension of the overload mechanism is hidden away in the documentation for overload:

For some application Perl parser [sic] mangles constants too much. It is possible to hook into this process via overload::constant( ) and overload::remove_constant( ) functions.

These functions take a hash as an argument. The recognized keys of this hash are

integer: to overload integer constants,

float: to overload floating point constants,

binary: to overload octal and hexadecimal constants,

q: to overload "q"-quoted strings, constant pieces of "qq"- and "qx"-quoted strings and here-documents,

qr: to overload constant pieces of regular expressions.

That is to say, you can cause the Perl parser to run a subroutine of your choice every time it comes across some kind of constant. Naturally, this is again something that should be used with care but can be used to surprising effect.

The subroutines supplied to overload::constant are passed three parameters: the first is the raw form as the parser saw it, the second is the default interpretation, and the third is a mnemonic for the context in which the constant occurs. For instance, given "camel\nalpaca\npanther", the first parameter would be camel\nalpaca\npanther, whereas the second would be:

    camel
    alpaca
    panther

As this is a double-quoted (qq) string, the third parameter would be qq.
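
The three parameters can be inspected with a handler that simply records what it is given and returns the default interpretation unchanged. This is a sketch under a few assumptions: the names record_constant and @seen are made up, and the handler is installed from a BEGIN block inside a bare block so that it only applies to the constants compiled in the rest of that block.

```perl
use strict;
use warnings;
use overload ();    # load overload.pm without overloading anything

our @seen;          # each entry: [raw text, default interpretation, context]

sub record_constant {
    my ($raw, $cooked, $context) = @_;
    push @seen, [$raw, $cooked, $context];
    return $cooked;    # leave the constant's value unchanged
}

my ($double, $single);
{
    # The handler is lexically scoped, so only the two strings below see it.
    BEGIN { overload::constant('q' => \&record_constant) }
    $double = "camel\nalpaca";
    $single = 'llama';
}

for my $entry (@seen) {
    my ($raw, $cooked, $context) = @$entry;
    print "context $context, raw text: $raw\n";
}
```

Because the handler runs while the program is being compiled, @seen is already filled in before the first statement executes.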

For instance, the high-precision math libraries Math::BigInt and Math::BigFloat provide the ability to automatically create high-precision numbers, by overloading the constant operation.

    % perl -MMath::BigFloat=:constant -le 'print ref (1234567890123456789012345678901234567890)'
    Math::BigFloat

This allows the libraries to get at all the numbers in a program, providing high-precision math without the explicit creation of overloaded Math::BigFloat objects. The code that does it is stunningly simple:

    sub import {
        my $self = shift;
        # ...
        overload::constant float => sub { $self->new(shift); };
    }

When the parser sees a floating point number (one too large to be stored as an integer) it passes the raw string as the first parameter of the subroutine reference. This is equivalent to calling:

    Math::BigFloat->new("1234567890123456789012345678901234567890")

at compile time.

The Math::Big* libraries can get away with this because they are relatively well behaved; that is, a Perl program should not notice any difference if all the numbers are suddenly overloaded Math::BigInt objects.

On the other hand, here's a slightly more crazy use of overloading...

I've already mentioned Ruby as being another favorite language of mine. One of the draws of Ruby is that absolutely everything is an object:

    % irb
    irb(main):001:0> 2
    => 2
    irb(main):002:0> 2.class
    => Fixnum
    irb(main):003:0> 2.class.class
    => Class
    irb(main):004:0> 2.class.class.class
    => Class
    irb(main):005:0> 2.methods
    => ["<=", "to_f", "abs", "-", "upto", "succ", "|", "/", "type",
    "times", "%", "-@", "&", "~", "<", "**", "zero?", "^", "<=>", "to_s",
    "step", "[]", ">", "==", "modulo", "next", "id2name", "size", "<<",
    "*", "downto", ">>", ">=", "divmod", "+", "floor", "to_int", "to_i",
    "chr", "truncate", "round", "ceil", "integer?", "prec_f", "prec_i",
    "prec", "coerce", "nonzero?", "+@", "remainder", "eql?", "===",
    "clone", "between?", "is_a?", "equal?", "singleton_methods", "freeze",
    "instance_of?", "send", "methods", "tainted?", "id",
    "instance_variables", "extend", "dup", "protected_methods", "=~",
    "frozen?", "kind_of?", "respond_to?", "class", "nil?",
    "instance_eval", "public_methods", "__send__", "untaint", "__id__",
    "inspect", "display", "taint", "method", "private_methods", "hash",
    "to_a"]

I like that you can call methods on a 2. I like that you can define your own methods to call on a 2. Of course, you can't do that in Perl; 2 is not an object.

But we can fake it. Ruby.pm was a proof-of-concept module I started work on to demonstrate that you can do this sort of thing in Perl. Here's what it looks like:

        use Ruby;
        print 2->class;                     # "Fixnum"
        print "Hello World"->class->class;  # "Class"
        print 2->class->to_s->class;        # "String"
        print 2->class->to_s->length;       # "6"
        print ((2+2)->class);               # "Fixnum"

        # Or even:
        print 2.class.to_s.class;           # "String"

How can this possibly work? Obviously, the only things that we can call methods on are objects, so constants like 2 and Hello World need to return objects. This tells us we need to be overloading these constants to return objects. We can do that easily enough:

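        package Ruby;
        use overload ();    # makes overload::constant available
        sub import {
            # Double-quoted strings also go through the q handler;
            # qq is not one of the recognized keys.
            overload::constant(integer => sub { return Fixnum->new(shift) },
                               q       => sub { return String->new(shift) });
        }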

We can make these objects blessed scalar references:

        package Fixnum;
        sub new { return bless \$_[1], $_[0] }

        package String;
        sub new { return bless \$_[1], $_[0] }

This allows us to fill the classes up with methods that can be called on the constants. That's a good start. The problem is that our constants now behave like objects, instead of like the strings and numbers they represent. We want "Hello World" to look like and act like "Hello World" instead of like "String=SCALAR(0x80ba0c)".
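
Here is a cut-down, runnable version of the idea so far, restricted to integers. It is a sketch rather than the Ruby.pm source: the class name My::Fixnum is hypothetical, and the handler is installed from a BEGIN block inside a bare block instead of from an import method, which keeps everything in one file.

```perl
use strict;
use warnings;
use overload ();    # load overload.pm so that overload::constant exists

package My::Fixnum;    # hypothetical stand-in for the Fixnum class

sub new   { my ($class, $value) = @_; return bless \$value, $class }
sub class { return ref $_[0] }

package main;

my ($class, $value);
{
    # Every integer constant compiled in the rest of this block
    # is replaced by a My::Fixnum object.
    BEGIN { overload::constant('integer' => sub { My::Fixnum->new($_[0]) }) }
    $class = 2->class;
    $value = 42;
}

print "$class\n";          # My::Fixnum
print ref($value), "\n";   # My::Fixnum
```

The stored object is a blessed reference to the constant's text, so ${$value} still holds 42; what it cannot yet do is behave like 42 in arithmetic or in a string.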

To get around this, we need to overload again: we've overloaded the constants to become objects, and now we need to overload those objects to look like constants again. Let's look at the string class first. The first thing we need to overload is obviously stringification; when the object is used as a string, it needs to display its string value to Perl, which we do by dereferencing the reference.

    use overload '""' => sub { ${$_[0]} };

This will get us most of the way there; we can now print out our Strings and use them anywhere that a normal Perl string would be expected. Next, we take note of the fact that in Ruby, Strings can't be coerced into numbers. You can't simply say 2 + "10", because this is an operation between two disparate types.

To make this happen in our String class, we have to overload numification, too:

    use Carp;
    use overload "0+" => sub { croak "String can't be coerced into Fixnum"};

You might like the fact that Perl converts between types magically, but the reason Ruby can't do it is that it uses the + operator for both numeric addition and string concatenation, just like Java and Python. Let's overload + to give us string concatenation:

    use overload "+"  => sub { String->new(${$_[0]} . "$_[1]") };

There are two things to note about this. The first is that we have to be sure that any operations that manipulate strings will themselves return String objects, or otherwise we will end up with ordinary strings that we can no longer call methods on. The same care is needed in the Fixnum analogue to ensure that (2+2)->class still works. The other thing is that we must explicitly force stringification on the right-hand operand, for reasons soon to become apparent.

Turning temporarily to the numeric class, we can fill in two of the overload methods in the same sort of way:

    use overload '""' => sub { croak "failed to convert Fixnum into String" },
                 "0+" => sub { ${ $_[0] } };

However, methods like + have to be treated carefully. We might first try doing something like this:

    use overload '+'  => sub { ${ $_[0] } + $_[1] };

However, if we then try 2 + "12" then we get the bizarre result 122, and further prodding finds that this is a String. Why?

What happens is that Perl first sees Fixnum + String and calls the overloaded method we've just created. Inside this method, it converts the Fixnum object to its integer value and now has integer + String.

The integer is not overloaded, but the String object is. If Perl can see an overloaded operation, it will try to call it, reordering the operation as String + integer. Since String has an overloaded + method, too, that gets called, creating a new string, which concatenates the String and the integer. Oops.
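
The surprise is easy to reproduce in isolation. The following sketch uses two hypothetical classes, My::String and My::Fixnum, that carry only the handlers discussed so far, including the naive + for numbers.

```perl
use strict;
use warnings;

package My::String;    # hypothetical: + means concatenation
use overload
    '""' => sub { ${ $_[0] } },
    '+'  => sub { My::String->new(${ $_[0] } . "$_[1]") };
sub new { my ($class, $value) = @_; return bless \$value, $class }

package My::Fixnum;    # hypothetical: naive numeric +
use overload
    '0+' => sub { ${ $_[0] } },
    '+'  => sub { ${ $_[0] } + $_[1] };    # right operand is left as it is
sub new { my ($class, $value) = @_; return bless \$value, $class }

package main;

# My::Fixnum's handler computes 2 + $string, which in turn triggers
# My::String's handler with the operands swapped: "12" . "2".
my $result = My::Fixnum->new(2) + My::String->new('12');

print ref($result), " ", "$result", "\n";    # My::String 122
```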

Ideally, we would find a way of converting the right-hand side of the + operation on a Fixnum to an honest-to-goodness number. Unfortunately, while Perl has an explicit stringification operator, "", which we used to avoid this problem in the String case, there isn't an explicit numification operator; overload uses 0+ as a convenient mnemonic for numification, but this is merely describing the operation in terms of the + operator, which can be overloaded. So to fix up our + method, we have to get a little technical:

    use overload '+' => \&sum;

    sub sum {
        my ($left, $right) = @_;
        my $rval;
        if (my $numify = overload::Method($right, "0+")) {
            $rval = $right->$numify;
        } else {
            $rval = $right;
        }
        Fixnum->new($$left + $rval);
    }

To explicitly numify the right-hand side, we ask overload if that value has an overloaded numification. If it does, Method will return the method, and we can call it and explicitly numify the value into $rval. Once we've got two plain old numbers, we add them together and return a new number out of the two.
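
Put together with the String class from before, the repaired method behaves as intended. The sketch below again uses hypothetical class names and die in place of croak; mixing the two types now fails loudly instead of quietly producing a string.

```perl
use strict;
use warnings;

package My::String;    # hypothetical: refuses to become a number
use overload
    '""' => sub { ${ $_[0] } },
    '0+' => sub { die "String can't be coerced into Fixnum\n" },
    '+'  => sub { My::String->new(${ $_[0] } . "$_[1]") };
sub new { my ($class, $value) = @_; return bless \$value, $class }

package My::Fixnum;
use overload
    '0+' => sub { ${ $_[0] } },
    '+'  => \&sum;
sub new { my ($class, $value) = @_; return bless \$value, $class }

sub sum {
    my ($left, $right) = @_;
    # Ask overload for the right operand's numification handler, if any,
    # and call it directly so that no overloaded + is triggered.
    my $numify = overload::Method($right, '0+');
    my $rval   = $numify ? $right->$numify : $right;
    return My::Fixnum->new($$left + $rval);
}

package main;

my $four = My::Fixnum->new(2) + My::Fixnum->new(2);
my $five = My::Fixnum->new(2) + 3;

my $mixed_ok    = eval { my $unused = My::Fixnum->new(2) + My::String->new('12'); 1 };
my $mixed_error = $@;

print ref($four), " $$four, ", ref($five), " $$five\n";
print "mixed addition: ", ($mixed_ok ? "succeeded" : "died: $mixed_error");
```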

Next, we add use overload fallback => 1; to each class, to provide do-what-I-mean (DWIM) methods for the operators that we don't define. This is what you want to do for any case where you want an object to completely emulate a standard built-in type, rather than just add one or two overloaded methods onto something that's essentially an object.

Finally, as a little flourish, we want to make the last line of our example work:

    print 2.class.to_s.class # "String"

One of the reasons Ruby's concatenation operator is + is to free up . for the preferred use in most OO languages: method calls. This isn't very easy to do in Perl, but we can fake it enough for a rigged demo. Obviously we're going to need to overload the concatenation operator. The key to working out how to make it work is to realize what those things like class are in a Perl context: they're bare words, or just ordinary strings. Hence if we see a concatenation between one of our Ruby objects and an ordinary string, we should call the method whose name is in the string:

    use overload "." => sub { my ($obj,$meth)=@_; $obj->$meth };

And presto, we have Ruby-like objects and Ruby-like method calls. The method call magic isn't perfect (we'll see later how it can be improved), but the Ruby-like objects can now respond to any methods we want to put into their classes. It's not hard to build up a full class hierarchy just like Ruby's own.

Limitations

Of course, our overloading shenanigans do not manage to deal with, for instance, turning arrays into objects. Although Perl is pretty flexible, that really can't be done without changing the way the method call operator works.

That doesn't necessarily stop people; the hacker known only as "chocolateboy" has created a module called autobox, which requires a patch to the Perl core, but which allows you to treat any built-in Perl data type as an object.

1.3.2. Time Shifting

The final fundamental advanced technique we want to look at is that of postponing or reordering the execution of Perl code. For instance, we might want to wait until all modules have been loaded before manipulating the symbol table, we might want to construct some code and run it immediately with eval, or we might want to run code at the end of a scope.

There are Perl keywords for all of these concepts, and judicious use of them can be effective in achieving a wide variety of effects.

1.3.2.1 Doing things now with eval/BEGIN

The basic interface to time-shifting is through a series of named blocks. These are like special subroutines that Perl stores in a queue and runs at strategic points during the lifetime of a program.

A BEGIN block is executed as soon as Perl compiles the code:

    print "I come second!\n";
    BEGIN { print "I come first!\n"; }

The second line appears first because Perl does not ordinarily run code as it sees it; it waits until it has compiled a program and all of its dependencies into the sort of op tree we saw in our section on B, and then runs it all. However, BEGIN forces Perl to run the code as soon as the individual block has been compiled, before the official runtime.

In fact, the use directive to load a module can be thought of as:

    BEGIN { require Module::Name; Module::Name->import(@stuff); }

because it causes the module's code to be loaded up and its import method to be run immediately.

One use of the immediate execution nature of the BEGIN block is in the AnyDBM_File module. This module tries to find an appropriate DBM module to inherit from, meaning that so long as one of the five supported DBM modules is available, any code using DBMs ought to work.

Unfortunately, some DBM implementations are more reliable than others, or optimized for different types of application, so you might want to specify a preferred search order that is different from the default. But when? As AnyDBM_File loads, it sets up its @ISA array and requires the DBM modules.

The trick is to use BEGIN; if AnyDBM_File sees that someone else has put an @ISA array into its namespace, it won't overwrite it with its default one. So we say:

    BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File); }
    use AnyDBM_File;

This wouldn't work without the BEGIN, since the statement would then only be executed at runtime, well after the use had set up AnyDBM_File.

As well as a BEGIN, there's also an END block, which stores up code to run right at the end of the program, and, in fact, there are a series of other special blocks as well, as shown in Figure 1-7.

Figure 1-7. Named blocks

The CHECK blocks and the INIT blocks are pretty much indistinguishable, running just before and just after execution begins. The only difference is that executing perl with the -c switch (compilation checks) will run CHECK blocks but not INIT blocks. (This also means that if you load a module at runtime, its CHECK and INIT blocks won't be run, because the transition between the global compilation phase and the global runtime execution has already passed.) Let's take a look at what we can do with a CHECK block.
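
The ordering of the named blocks can be observed directly. In this small sketch the blocks are deliberately written in the reverse of the order in which they run, to show that position in the file does not decide when they fire; the array @order is just an illustrative log.

```perl
use strict;
use warnings;

our @order;

push @order, 'runtime';    # an ordinary statement, run during the main run

END   { print join(' ', @order, 'END'), "\n" }
INIT  { push @order, 'INIT' }
CHECK { push @order, 'CHECK' }
BEGIN { push @order, 'BEGIN' }

# Run as a script, this prints: BEGIN CHECK INIT runtime END
```

If the same code is pulled in at runtime instead (for example through require or a string eval), the CHECK and INIT blocks are skipped, as described above.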

1.3.2.2 Doing things later with CHECK

Earlier, we talked about messing with inheritance relationships and stealing ideas from other languages. Let's now implement a new module, which gives us the Java concept of final methods. A final method is one that cannot be overridden by inheritance:

    package Beverage::Hot;
    sub serve :final { # I have exclusive rights to defining this method!
        my ($self, $who) = @_;
        if ($who->waitress) { $who->waitress->serve($self, $who); }
        else                { $who->take($self); }
    }

    package Tea;
    use base 'Beverage::Hot';

    sub serve { # Compile-time error.
    }

We'll do this by allowing a user to specify a :final attribute on a method. This attribute will mark a method for later checking. Once compile time has finished, we'll check out all the classes that derive from the marked class, and die with an error if the derived class implements the final method.

Attributes

The idea of attributes came in Perl 5.005, with the attrs module. This was part of threading support and allowed you to mark a subroutine as being a method or being locked for threading; that is, it only allows one thread to access the subroutine or the method's invocant at once. In 5.6.0, the syntax was changed to the now-familiar sub name :attr, and it also allowed user-defined attributes.

Perhaps the easiest way to get into attribute programming for anything tricky is to use Damian Conway's Attribute::Handlers module: this allows you to define subroutines to be called when an attribute is seen.

The first thing we want to do is take a note of those classes and methods marked final. We need to switch to the UNIVERSAL class, so that our attribute is visible everywhere. We'll also use a hash, %marked, to group the marked methods by package:

    package UNIVERSAL;
    use Attribute::Handlers;
    sub final :ATTR {
        my ($pack, $ref) = @_;
        push @{$marked{$pack}}, *{$ref}{NAME};
    }

The Attribute::Handlers package arranges for our handler to be called with various parameters, of which we are only interested in the first two (the package that has the marked subroutine in it and the glob reference for the subroutine itself) because we can get the subroutine's name from that. (NAME is one of the magic names we can use to access a glob's slot; it returns the name of the symbol table entry. *{Tea::serve}{NAME} would return serve.)
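
Glob slots are easy to explore on their own, without any attribute machinery. The sketch below builds a glob reference by name, which is an assumption made for the sake of a standalone example (the attribute handler is handed one directly), and then reads the NAME, PACKAGE, and CODE slots.

```perl
use strict;
use warnings;

package Tea;
sub serve { return 'tea' }

package main;

my ($glob_ref, $missing_code);
{
    no strict 'refs';                        # needed to look globs up by name
    $glob_ref     = \*{'Tea::serve'};
    $missing_code = *{'Tea::pour'}{CODE};    # no such sub, so the slot is empty
}

my $name    = *{$glob_ref}{NAME};       # name of the symbol table entry
my $package = *{$glob_ref}{PACKAGE};    # package the glob lives in
my $code    = *{$glob_ref}{CODE};       # code reference, if a sub is defined

print "${package}::$name returns ", $code->(), "\n";    # Tea::serve returns tea
print "Tea::pour is ", (defined $missing_code ? "defined" : "not defined"), "\n";
```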

Now we've got our list of marked methods. We need to find a way to interrupt Perl just before it runs the script but after all the modules that we plan to use have been compiled and all the inheritance relationships set up, so that we can check nobody has been naughty and overridden a finalized method.

The CHECK keyword gives us a way to do this. It registers a block of code to be called after compilation has been finished but before execution begins.^[*]

^[*] Incidentally, the O compiler module we mentioned earlier works by means of CHECK blocks: after all the code has been compiled, O has the selected compiler backend visit the opcode tree and spit out whatever it wants to do, then exits before the code is run.

To enable us to test the module, it turns out we want to have our CHECK block call another function. This is because we can then run the checker twice, once without an offending method and once with:

    CHECK { Attribute::Final->check }

What will our checking method do, though? It needs to visit all the classes that derive from those classes we have in our %marked hash, and to do that, it has to know all the packages in the system. So first we'll write a little function to recursively walk over the symbol table, collecting names of packages it sees.

The symbol table is just a hash, and we can find glob names by looking at the keys of the hash. To make matters even easier, package names are just hash keys that end in ::. So our collector function looks like this:

    sub fill_packages {
        no strict 'refs';
        my $root = shift;
        my @subs = grep s/::$//, keys %{$root."::"};
        push @all_packages, $root;
        for (@subs) {
            next if $root eq "main" and $_ eq "main"; # Loop
            fill_packages($root."::".$_);
        }
    }

The next statement inside the loop avoids the potential trap of looping forever, because the main:: package contains an entry to itself. Now we can start looking at the check function. It only has to deal with those packages that have some kind of inheritance relationship, so if a package does not have an @ISA, then we can discard it:

    sub check {
        no strict 'refs';
        fill_packages("main") unless @all_packages;
        for my $derived_pack (@all_packages) {
            next unless @{$derived_pack."::ISA"};
            ...
        }
    }

Next, we have a list of marked packages that contain final methods. We want to look specifically at circumstances where a derived package derives from a marked package:

    for my $derived_pack (@all_packages) {
        next unless @{$derived_pack."::ISA"};
        for my $marked_pack (keys %marked) {
            next unless $derived_pack->isa($marked_pack);
            ...

At this point, we know we have a suspect package. It has the right kind of inheritance relationship, but does it override the finalized method?

            for my $meth (@{$marked{$marked_pack}}) {
                my $glob_ref = \*{$derived_pack."::".$meth};
                if (*{$glob_ref}{CODE}) {

If the code slot is populated, then we have indeed found a naughty method. At this point, all that's left to do is report where it came from. We can do that with the B technique: by turning the glob into a B::GV object, we gain access to the otherwise unreachable FILE and LINE methods, which tell us where the glob entry was constructed.

                    my $name = $marked_pack."::".$meth;
                    my $b = B::svref_2object($glob_ref);
                    die "Cannot override final method $name at ".
                        $b->FILE. ", line ".$b->LINE."\n";

And that is the essence of working with CHECK blocks: they allow us to do things with the symbol table once everything is in place, once all the modules have been loaded, and once the inheritance relationships and other factors have been set up. If you ever feel you need to do something in a module but you don't want to do it quite yet, putting it in a CHECK block might just be the right technique.
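
The pieces of the checker can be tried out together in one file. What follows is a simplified sketch rather than the module itself: %marked is filled in by hand instead of by the :final attribute, the helper names collect_packages and find_overrides are invented, offenders are returned as a list instead of being reported with die, and package names are collected without the leading main:: prefix.

```perl
use strict;
use warnings;

{ package Beverage::Hot; sub serve { return 'hot drink' } }
{ package Tea;    our @ISA = ('Beverage::Hot'); sub serve { return 'tea' } }    # overrides
{ package Coffee; our @ISA = ('Beverage::Hot'); }                               # inherits

# Filled in by hand here; in the text the attribute handler does this.
my %marked = ('Beverage::Hot' => ['serve']);

# Walk the symbol table: keys ending in :: are nested packages.
sub collect_packages {
    my ($prefix) = @_;    # '' for the top level, otherwise something like 'Beverage::'
    no strict 'refs';
    my @found;
    for my $key (keys %{ $prefix eq '' ? 'main::' : $prefix }) {
        next unless $key =~ /^(\w+)::$/;
        my $name = $1;
        next if $prefix eq '' and $name eq 'main';    # main:: contains itself
        push @found, $prefix . $name, collect_packages($prefix . $key);
    }
    return @found;
}

sub find_overrides {
    no strict 'refs';
    my @violations;
    for my $derived (collect_packages('')) {
        next unless @{ $derived . '::ISA' };          # no inheritance, nothing to check
        for my $marked_pack (sort keys %marked) {
            next unless $derived->isa($marked_pack);
            for my $meth (@{ $marked{$marked_pack} }) {
                push @violations, "${derived}::$meth"
                    if defined &{ $derived . '::' . $meth };
            }
        }
    }
    return sort @violations;
}

my @violations = find_overrides();
print "Overrides of final methods: @violations\n";
```

In the real module the equivalent of find_overrides runs from a CHECK block, so the walk happens after every use statement has been processed but before the program starts.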

1.3.2.3 Doing things at the end with DESTROY

We've referred to the special DESTROY method, which is called when an object goes out of scope. Generally this is used for writing out state to disk, breaking circular references, and other finalization tasks. However, you can use DESTROY to arrange for things to be done at the end of a scope:

    sub do_later (&) { bless shift, "Do::Later" }
    sub Do::Later::DESTROY { $_[0]->() }

    {
       my $later = do_later { print "End of block!\n"; };
       ...
    }

So long as $later sticks around, the code doesn't get called. When it goes out of scope, gets undefined, or the final reference to it goes away, then the code block is called. Hook::LexWrap, one of the modules we looked at earlier in the chapter, actually uses a similar trick to turn off the wrapping of a subroutine at the end of a lexical scope:

        my $unwrap;
        $imposter = sub {
            if ($unwrap) { goto &$original }
            ...
        }
        ...
        return bless sub { $unwrap=1 }, 'Hook::LexWrap::Cleanup';

While you keep hold of the return value from wrap, the imposter calls the wrapping code. However, once that value goes out of scope, the closure sets $unwrap to a true value, and from then on the imposter simply jumps to the original routine.
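
The same mechanism can be sketched in a few lines. The function wrap_temporarily and the class My::Cleanup below are hypothetical and much cruder than Hook::LexWrap (only a "before" hook, no argument checking), but they show the guard object switching the wrapper off when it is destroyed.

```perl
use strict;
use warnings;

sub My::Cleanup::DESTROY { $_[0]->() }    # run the stored closure on destruction

sub wrap_temporarily {
    my ($name, $before) = @_;             # $name is fully qualified, e.g. 'main::greet'
    no strict 'refs';
    no warnings 'redefine';
    my $original = \&{$name};
    my $unwrap;
    *{$name} = sub {
        goto &$original if $unwrap;       # wrapper switched off: act like the original
        $before->(@_);
        goto &$original;
    };
    return bless sub { $unwrap = 1 }, 'My::Cleanup';
}

my @log;
sub greet { return "hello, $_[0]" }

{
    my $guard = wrap_temporarily('main::greet', sub { push @log, "called with $_[0]" });
    greet('inside');     # logged
}                        # $guard is destroyed here, which sets $unwrap
greet('outside');        # not logged

print scalar(@log), " logged call: @log\n";
```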

1.3.2.4 Case study: Acme::Dot

One example that puts it all together (messing about with the symbol table, shifting the timing of code execution, and overloading) is my own Acme::Dot module.

If you're not familiar with CPAN's Acme::* hierarchy, we'll cover it in more detail in Chapter 10, but for now you should know it's for modules that are not entirely serious. Acme::Dot is far from serious, but it demonstrates a lot of serious advanced techniques.

The idea of Acme::Dot was to abstract the $variable.method overloaded . operator from Ruby.pm and allow third-party modules to use it. It also goes a little further, allowing $variable.method(@arguments) to work. And, of course, it does so without using source filters or any other non-Perl hackery; that would be cheating, or at least inelegant.

So, how do we make this work? We know the main trick, from Ruby.pm, of overloading concatenation on an object. However, there are two niggles. The first is that, whereas $foo.class was a variable "concatenated" with a literal string, $foo.method(@args) is going to be parsed as a subroutine call. That's fine, for the time being; we'll assume that there isn't going to be a subroutine called method kicking around anywhere for now, and later we'll fix up the case where there is one. We want Perl to call the undefined subroutine method, because if an undefined subroutine gets called, we can catch it with AUTOLOAD and subvert it.

In what way do we need to subvert it? In the Ruby.pm case, we simply took the right-hand side of the concatenation (class in $var.class) and used that as a method name. In this case, we need to know not only the method name but the method's parameters as well. So, our AUTOLOAD routine has to return a data structure that holds the method name and the parameters. A hash is a natural way of doing this, although an array would do just as well:

    sub AUTOLOAD {
         $AUTOLOAD =~ /.*::(.*)/;
         return if $1 eq "DESTROY";
         return { data => \@_, name => $1 }
    }
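As a quick check of what that routine hands back, here is a small strict-clean sketch. The subroutine name frobnicate is made up and deliberately left undefined, and the arguments are copied into a fresh array rather than returned as a reference to @_:

```perl
use strict;
use warnings;

our $AUTOLOAD;                            # filled in by Perl before AUTOLOAD is invoked

sub AUTOLOAD {
    my ($name) = $AUTOLOAD =~ /.*::(.*)/;
    return if $name eq 'DESTROY';
    return { data => [@_], name => $name };   # a copy of the arguments, plus the name
}

# frobnicate() is a hypothetical, undefined subroutine, so the call lands in AUTOLOAD.
my $call = frobnicate("Winnie-the-Pooh", 42);

printf "%s(%s)\n", $call->{name}, join(", ", @{ $call->{data} });
```

Note that the call needs its parentheses; without them, strict would complain about a bareword instead of compiling a subroutine call.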

As usual, we take care to avoid clobbering DESTROY. Now that we have the arguments and the name, we can write our overload subroutine to fire the correct method call on concatenation. On the left will be the object, and on the right will be the result of our AUTOLOAD routine: the data structure that tells us which method to fire and with what parameters.

    use overload "." => sub {
        my ($obj, $stuff) = @_;
        @_ = ($obj, @{$stuff->{data}});
        goto &{$obj->can($stuff->{name})};
    }, fallback => 1;

Just as in Ruby.pm, we use the goto trick to avoid upsetting anything that relies on caller.^[*] Now we have the easy part done.

^[*] Although, to be honest, I don't believe there really is (or ought to be) anything that relies on the behavior of caller; at least, nothing that isn't doing advanced things itself.
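Putting those two pieces side by side in a single file gives a feel for the "easy part." This is only a sketch: the Bear class and its name accessor are invented for the example, and the calling code simply lives in main.

```perl
use strict;
use warnings;

package Bear;                             # hypothetical class, for illustration only

use overload "." => sub {
    my ($obj, $stuff) = @_;
    @_ = ($obj, @{ $stuff->{data} });
    goto &{ $obj->can($stuff->{name}) };  # jump to the method, leaving caller untouched
}, fallback => 1;

sub new  { bless { name => undef }, shift }
sub name { my $self = shift; $self->{name} = shift if @_; $self->{name} }

package main;
no warnings 'void';                       # the "concatenation" is used for its side effect

our $AUTOLOAD;
sub AUTOLOAD {
    my ($name) = $AUTOLOAD =~ /.*::(.*)/;
    return if $name eq 'DESTROY';
    return { data => [@_], name => $name };
}

my $bear = Bear->new;
$bear.name("Winnie-the-Pooh");            # parsed as $bear . name("Winnie-the-Pooh")
my $got = $bear.name();                   # parsed as $bear . name()
print "$got\n";
```

Everything here depends on main having no real subroutine called name; if it had one, Perl would call it and AUTOLOAD would never get a look in.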

I say this is the easy part because we know how to do this for one package. So far we've glossed over the fact that the methods and the overload routine are going to live in one class, and the AUTOLOAD subroutine has to be present wherever the $var.method method calls are going to be made. To make matters worse, our Acme::Dot module is going to be neither of these packages. We're going to see something like this:

    package My::Class;
    use Acme::Dot;
    use base 'Class::Accessor';
    __PACKAGE__->mk_accessors(qw/name age/);

    package End::User;
    use My::Class;

    my $x = new My::Class;
    $x.name("Winnie-the-Pooh");

It's the OO class that needs to use Acme::Dot directly, and it will have the overload routine. We can take care of this easily by making Acme::Dot's import method set up the overloading in its caller:

    my ($call_pack);

    sub import {
        no strict 'refs';
        $call_pack = (caller())[0];
        eval <<EOT
     package $call_pack;
    use overload "." => sub {
        my (\$obj, \$stuff) = \@_;
        \@_ = (\$obj, \@{\$stuff->{data}});
        goto \&{\$obj->can(\$stuff->{name})};
    }, fallback => 1;

    EOT
        ;
    }
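If you want to try this step in isolation, the following single-file sketch may help. Dot::Sketch is a made-up stand-in for Acme::Dot, the BEGIN block imitates what use would do if the module lived in its own file, and the hand-built hash plays the role of the AUTOLOAD result. Unlike the listing above, it also reports a failed eval rather than ignoring it.

```perl
use strict;
use warnings;

package Dot::Sketch;                      # hypothetical stand-in for Acme::Dot

sub import {
    my $call_pack = (caller())[0];
    # String eval, so that "use overload" is compiled inside the caller's package.
    eval qq{
        package $call_pack;
        use overload "." => sub {
            my (\$obj, \$stuff) = \@_;
            \@_ = (\$obj, \@{ \$stuff->{data} });
            goto &{ \$obj->can(\$stuff->{name}) };
        }, fallback => 1;
        1;
    } or die "could not install overloading in $call_pack: $@";
}

package My::Class;
BEGIN { Dot::Sketch->import }             # what "use Dot::Sketch" would do

sub new  { bless {}, shift }
sub name { my $self = shift; $self->{name} = shift if @_; $self->{name} }

package main;

my $x      = My::Class->new;
my $record = { name => 'name', data => ['Winnie-the-Pooh'] };  # hand-built call record
my $result = $x . $record;                # dispatches to $x->name('Winnie-the-Pooh')

print overload::Overloaded($x) ? "overloaded\n" : "plain\n";
print "$result\n";
```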

However, there's the third package, the End::User package, which actually never sees Acme::Dot at all. It just uses My::Class and expects to get the dot-operator functionality as part of that class. Meanwhile, our poor Acme::Dot class has to somehow find out which class is the end user and install an AUTOLOAD routine into it.

Thankfully, we know that the end-user class will call My::Class->import, so we can use glob assignment to make My::Class::import convey some information back to Acme::Dot. We can modify Acme::Dot's import routine a little:

    my ($call_pack, $end_user);

    sub import {
        no strict 'refs';
        $call_pack = (caller())[0];
        *{$call_pack."::import"} = sub { $end_user = (caller())[0]; };
        eval <<EOT
     package $call_pack;
    use overload "." => sub {
        my (\$obj, \$stuff) = \@_;
        \@_ = (\$obj, \@{\$stuff->{data}});
        goto \&{\$obj->can(\$stuff->{name})};
    }, fallback => 1;

    EOT
        ;
    }

As you can see, we've now glob assigned My::Class's import routine and made it save away the name of the package that used it: the end-user class.
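Stripped of the overloading, the bookkeeping looks like the sketch below. Again, Dot::Sketch is a hypothetical stand-in, the end_user accessor exists only so we can peek at the result, and the BEGIN blocks imitate the two use statements.

```perl
use strict;
use warnings;

package Dot::Sketch;                      # hypothetical stand-in for Acme::Dot

my $end_user;

sub import {
    my $call_pack = (caller())[0];
    no strict 'refs';
    # Give the OO class an import method that reports who used it.
    *{ $call_pack . "::import" } = sub { $end_user = (caller())[0] };
}

sub end_user { $end_user }                # hypothetical accessor, for inspection only

package My::Class;
BEGIN { Dot::Sketch->import }             # stands in for "use Dot::Sketch"

package End::User;
BEGIN { My::Class->import }               # stands in for "use My::Class"

package main;
print Dot::Sketch::end_user(), "\n";
```

One limitation is visible straight away: there is a single $end_user, so only the most recent package to use the class is remembered.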

And now, since everything is set up, we are at the point where we can inject the AUTOLOAD into the end user's class. We use a CHECK block to time-shift this to the end of compilation:

    CHECK {
       # At this point, everything is ready, and $end_user contains
       # the calling package's calling package.
       no strict;
       if ($end_user) {
           *{$end_user."::AUTOLOAD"} = sub {
                $AUTOLOAD =~ /.*::(.*)/;
                return if $1 eq "DESTROY";
                return { data => \@_, name => $1 }
           }
       }
    }
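The timing is easier to see in a cut-down sketch. Here the CHECK block appears in the file before the code that records the end user, yet it still runs afterward, because CHECK blocks wait until the main program has finished compiling. The names Dot::Sketch and remember_end_user are invented, and the handler reads $Dot::Sketch::AUTOLOAD on the assumption (which Acme::Dot itself relies on) that Perl sets $AUTOLOAD in the package where the handler was compiled. This only works when the code is part of the main compilation; code pulled in at runtime is too late for CHECK.

```perl
use strict;
use warnings;

package Dot::Sketch;                      # hypothetical stand-in for Acme::Dot

my $end_user;
our $AUTOLOAD;                            # the handler below is compiled in this package

sub remember_end_user { $end_user = shift }   # stands in for the import bookkeeping

CHECK {
    # Runs once compilation is over, so $end_user has been filled in by now.
    no strict 'refs';
    if ($end_user) {
        *{ $end_user . "::AUTOLOAD" } = sub {
            my ($name) = $AUTOLOAD =~ /.*::(.*)/;
            return if $name eq 'DESTROY';
            return { data => [@_], name => $name };
        };
    }
}

package End::User;

BEGIN { Dot::Sketch::remember_end_user(__PACKAGE__) }

my $call = name("Winnie-the-Pooh");       # undefined here, so the injected AUTOLOAD fires
print "$call->{name}: @{ $call->{data} }\n";
```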

And that is essentially how Acme::Dot operates. It isn't perfect; if there's a subroutine in the end-user package with the same name as a method on the object, AUTOLOAD won't be called, and we will run into problems. It's possible to work around that by moving all the subroutines to another package, dispatching everything via AUTOLOAD, and using B to work out whether we're in the context of a concatenation operator, but...hey, it's only an Acme::* module. And I hope it's made its point already.
