Nice article by Chromatic from Perl.com, so here it is:
Friday October 12, 2007 6:00AM
by chromatic in Technical
Adrian Howard asked on PerlMonks, “Is there any way to access the contents of a block eval?” After catching an exception in Perl, can you see the source code that threw the exception, without using a source filter?
I wrote many of my parts of Perl Hacks by trying to do impossible things that no one had ever done before. (I don’t know that no one had ever done them, but I’d never heard of them before, which was close enough.) Because everyone thought it was impossible, I decided to try.
Introspection
The B modules provide Perl-level access to the internal data structures which represent executing Perl programs. They’ve been around in the core for years, but they rarely get much attention. (Their interfaces are occasionally opaque, and the documentation isn’t always clear.)
Plenty of Perl programmers know that B::Deparse is usually very good at making idiomatic code clearer. From the command line, run perl -MO=Deparse some_messy_file.pl to get a better sense of what’s happening. (This is your second stop after Perl::Tidy.)
Fewer Perl programmers know that you can use B::Deparse from within your programs. Given a subroutine reference and a B::Deparse object, call the coderef2text() method on the object, passing the reference, to receive deparsed source code.
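As a minimal sketch of that in-program use, the following deparses an anonymous subroutine. The subroutine and the variable names are hypothetical; only B::Deparse->new() and coderef2text() come from the module.

```perl
use strict;
use warnings;
use B::Deparse;

# A hypothetical subroutine to deparse; any code reference works.
my $code = sub { my ($n) = @_; return $n * 2 };

my $deparse = B::Deparse->new();

# coderef2text() returns the body, braces included, but not the
# leading "sub" keyword, so add that when printing.
my $text = $deparse->coderef2text($code);

print "sub $text\n";
```

The deparsed text is regenerated from the optree, so its layout and any pragma declarations may differ from the original source.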
I knew that much was possible before I read Adrian’s post. I figured I could start from there to see what I could find.
More Introspection
I decided to set a few constraints. First, I wanted a simple function which could return the source code of the previous eval block. Second, I decided that I’d only handle the case where the eval block occurred inside a Perl subroutine. With a little bit of work, my solution will also work when the eval occurs at the top level of a program, but I didn’t want to code around every corner case in this proof of concept. Finally, I decided that I would assume that there would only be one eval block on a line. That’s a reasonable assumption as well.
I wrote a little bit of code for a test case:
    sub main
    {
        my $x = 10;
        my $y = 20;
        eval { my $x = 1; my $y = $x; die 'aaaarrrr' };
        print( $@, get_eval_text( __LINE__ - 1 ) ) if $@;
    }
    main();
The only interesting code is the call to get_eval_text(). It takes one argument, the line number of the eval call. This was my first interface, and it could be slightly better. In particular, the __LINE__ token (comparable to __LINE__ in C) is unnecessary. The line number itself is unnecessary too; it’s possible to find this precise call in the optree and then the most recent eval by walking backwards, but that’s polish now that I know it’s possible.
get_eval_text() is reasonably simple, too:
    use B;
    use B::Deparse;
    sub get_eval_text
    {
        my $start = shift;
        # turn the invoking subroutine into a B::CV object;
        # caller() reports the sub name already qualified with its package
        my $sub    = (caller(1))[3];
        my $subref = do { no strict 'refs'; *{ $sub }{CODE} };
        my $cv     = B::svref_2object( $subref );
        # create a B::Deparse object and give it a sub to deparse (in part)
        my $deparse = B::Deparse->new();
        $deparse->{curcv} = $cv;
        # search the optree for the first op on the eval {} line
        my $op = deparse_from($cv->START, $start);
        # and deparse from that op
        return $deparse->deparse($op, 0) if $op;
    }
There’s a little more magic here. First, it uses the caller() operator to get information about the calling subroutine, the one that contains the eval block. From there, it only takes a symbolic lookup to get a reference to that subroutine. (Yes, getting a reference to an anonymous subroutine would be slightly more difficult, but certainly not impossible.) With that reference, the code then calls B::svref_2object to receive a B::CV object.
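The caller() step can be tried on its own. This is a small sketch with hypothetical names (calling_sub_ref and outer are not part of the article’s code). It relies on the fact that the fourth element of caller()’s return list is the subroutine name already qualified with its package.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name of, and a reference to, the
# named subroutine that called the function which called this helper.
sub calling_sub_ref {
    # element 3 of caller(1) is the package-qualified sub name
    my $name = (caller(1))[3];

    # symbolic lookup of the CODE slot in that glob
    no strict 'refs';
    return ($name, *{$name}{CODE});
}

sub outer { return calling_sub_ref() }

my ($name, $ref) = outer();
print "$name\n";    # main::outer
```

The returned reference is the same one that \&outer yields, so it can go straight to B::svref_2object.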
Optrees
Perl represents all subroutines internally with data structures called CVs (for Code Value). B wraps these into objects with the appropriate methods. A CV contains pointers to its optree, which is a tree of opcodes that Perl executes. To see a textual representation of the optree, use B::Concise.
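To poke at a CV from inside a program, something like the following sketch may help. The add_one subroutine is hypothetical. The B calls (svref_2object, GV, ROOT, START) and the B::Concise functions (compile, walk_output) are the documented ones, but the exact listing depends on the Perl version.

```perl
use strict;
use warnings;
use B;
use B::Concise ();

# Hypothetical subroutine to inspect.
sub add_one { my $n = shift; return $n + 1 }

# Wrap the code reference in a B::CV object.
my $cv = B::svref_2object(\&add_one);

print ref($cv), "\n";            # B::CV
print $cv->GV->NAME, "\n";       # add_one
print $cv->ROOT->name, "\n";     # leavesub

# START is the first op in execution order; ROOT is the top of the tree.
print $cv->START->name, "\n";

# Render the optree with B::Concise, capturing the text in a string
# instead of printing it to STDOUT.
my $walker = B::Concise::compile(\&add_one);
B::Concise::walk_output(\my $listing);
$walker->();
print $listing;
```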
Next, the code creates a B::Deparse object. Instead of immediately passing it the subroutine reference, the code sets a property of the object to the B::CV object. This is because I didn’t want to deparse the entire subroutine, just the eval block. B::Deparse doesn’t really quite have a public interface for the approach I took, yet.
The next call is to deparse_from(), a function which takes the first op (first in execution order) of the subroutine as well as the line number on which the eval block occurs. It returns a B::Op or derivative object representing the branch of the block.
Finally, the result gets passed to the B::Deparse object’s deparse() method, if there’s any result. That method returns the deparsed code of just that branch of the optree.
Here’s the code for deparse_from():
    sub deparse_from
    {
        my ($start, $line) = @_;
        for (my $op = $start; $$op; $op = $op->next())
        {
            # look for nextstate ops
            next unless $op->isa( 'B::COP' );
            # ... specifically the one representing the start of the eval {}
            next unless $op->line == $line;
            # then grab the sibling op in the tree: leavetry
            return $op->sibling;
        }
        return;
    }
The heart is a loop, which just walks the optree from one op to the next, in execution order. The optree for main() looks like:
    $ perl -MO=Concise,main get_at_eval.pl
    main::main:
    m  <1> leavesub[1 ref] K/REFC,1 ->(end)
    -     <@> lineseq KP ->m
    1        <;> nextstate(main 2016 get_at_eval.pl:49) v/2 ->2
    4        <2> sassign vKS/2 ->5
    2           <$> const[IV 10] s ->3
    3           <0> padsv[$x:2016,2021] sRM*/LVINTRO ->4
    5        <;> nextstate(main 2017 get_at_eval.pl:50) v/2 ->6
    8        <2> sassign vKS/2 ->9
    6           <$> const[IV 20] s ->7
    7           <0> padsv[$y:2017,2021] sRM*/LVINTRO ->8
    9        <;> nextstate(main 2021 get_at_eval.pl:51) v/2 ->a
    b        <@> leavetry vKP ->c
    a           <|> entertry(other->b) v ->n
    n           <;> nextstate(main 2018 get_at_eval.pl:51) v/2 ->o
    q           <2> sassign vKS/2 ->r
    o              <$> const[IV 1] s ->p
    p              <0> padsv[$x:2018,2020] sRM*/LVINTRO ->q
    r           <;> nextstate(main 2019 get_at_eval.pl:51) v/2 ->s
    u           <2> sassign vKS/2 ->v
    s              <0> padsv[$x:2018,2020] s ->t
    t              <0> padsv[$y:2019,2020] sRM*/LVINTRO ->u
    v           <;> nextstate(main 2020 get_at_eval.pl:51) v/2 ->w
    y           <@> die[t5] vK/1 ->b
    w              <0> pushmark s ->x
    x              <$> const[PV "aaaarrrr"] s ->y
    c        <;> nextstate(main 2021 get_at_eval.pl:52) v/2 ->d
    -        <1> null K/1 ->-
    e           <|> and(other->f) K/1 ->m
    -              <1> ex-rv2sv sK/3 ->e
    d                 <#> gvsv[*@] s ->e
    l              <@> print sK ->m
    f                 <0> pushmark s ->g
    -                 <1> ex-rv2sv sK/3 ->h
    g                    <#> gvsv[*@] s ->h
    k                 <1> entersub[t9] lKS/TARG,3 ->l
    -                    <1> ex-list lK ->k
    h                       <0> pushmark s ->i
    i                       <$> const[IV 51] sM ->j
    -                       <1> ex-rv2cv sK/3 ->-
    j                          <#> gv[*get_eval_text] s ->k
    get_at_eval.pl syntax OK
I realize that’s a big chunk of information, but you don’t have to understand much about it. The numbers and letters to the far left represent the execution sequence. The START op of the code is 1, the next op is 2, and so forth.
The loop walks the ops, looking for COP (Control OPeration) nodes. The concise output represents node types symbolically; <;> indicates a COP. Note that they’re all named nextstate. These nodes more or less represent sequence points in the program. They also have a line attribute, which the loop uses to see if it’s reached the correct line number for the eval block. Otherwise, it keeps stepping through the code.
When the loop has reached the correct line, it grabs the sibling node (in tree order, not execution order) which is (probably) the leavetry node representing the eval block. Following the next pointer would lead to the entertry op, which isn’t sufficient to deparse the whole block structure as it isn’t the parent op of the entire block. Don’t worry if you don’t follow all of that; just trust me that a parent op executes only after all of its children have executed in the proper order.
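The difference between execution order and tree order is easier to see side by side. This sketch walks a hypothetical subroutine (sample) both ways: once by following next pointers from START, and once depth-first from ROOT through first and sibling. The op names it collects depend on the Perl version.

```perl
use strict;
use warnings;
use B;

# Hypothetical subroutine to walk.
sub sample { my $x = 1; $x + 1 }

my $cv = B::svref_2object(\&sample);

# Execution order: follow ->next from START until the null op.
my @exec;
for (my $op = $cv->START; $$op; $op = $op->next) {
    push @exec, $op->name;
}

# Tree order: depth-first from ROOT, descending through ->first and
# moving across through ->sibling.
my @tree;
sub walk_tree {
    my ($op, $depth) = @_;
    push @tree, ('  ' x $depth) . $op->name;

    # only ops flagged as having children provide ->first
    return unless $op->flags & B::OPf_KIDS();
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        walk_tree($kid, $depth + 1);
    }
}
walk_tree($cv->ROOT, 0);

print "execution order: @exec\n";
print "tree order:\n", join("\n", @tree), "\n";
```

In the first list the parent leavesub comes last, after all of its children; in the second it comes first, as the root of the tree.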
The leavetry op is deparseable, and it gives only the eval block and not any other parts of the subroutine.
Concluding Thoughts
I mentioned earlier that there are a couple of optimizations. Because caller knows the line number of the call to get_eval_text(), it’s possible to scan the relevant CV optree for the COP with that line number, then scan backwards for the most recent leavetry sibling. That heuristic will probably work, but it’s more complex than I wanted to show here (and it only occurred to me as I started to write this explanation).
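One way that heuristic might look is sketched below. It is not the article’s code: last_eval_op and demo are hypothetical names, and the walk remembers the latest leavetry on the way forward rather than scanning backwards. It shares the limitations of the original, namely a named calling subroutine and a walk that only follows next pointers.

```perl
use strict;
use warnings;
use B;

# Hypothetical helper: find the op for the most recent eval {} block
# that precedes the calling statement in the calling (named) subroutine.
sub last_eval_op {
    my $line = (caller(0))[2];    # line of the statement that called us
    my $name = (caller(1))[3];    # package-qualified name of its sub

    my $subref = do { no strict 'refs'; *{$name}{CODE} };
    my $cv     = B::svref_2object($subref);

    my $found;
    for (my $op = $cv->START; $$op; $op = $op->next) {
        next unless $op->isa('B::COP');

        # reached the calling statement: report the latest eval seen
        return $found if $op->line == $line;

        # remember any eval {} that directly follows this COP
        my $sibling = $op->sibling;
        $found = $sibling if $$sibling && $sibling->name eq 'leavetry';
    }
    return;
}

sub demo {
    eval { die "oops\n" };
    return last_eval_op();
}

my $op = demo();
print $op ? $op->name : 'nothing found', "\n";
```

The returned op could then be handed to the deparse() call from get_eval_text() in place of the line-number lookup.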
It’s also possible that there will be no calling subroutine. In that case, there are a couple of B functions that can grab the top-level CV in the invoking package for introspection. I also didn’t show that because it would complicate the code.
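For reference, the documented B functions for the top level are main_cv, main_root, and main_start. The following is only a sketch of how they might stand in for the subroutine’s CV; the line-collecting loop is an illustration, not part of the article’s code.

```perl
use strict;
use warnings;
use B;

# B::main_cv returns a (faked) CV for the top-level code of the program.
my $main_cv = B::main_cv();
print ref($main_cv), "\n";    # B::CV

# B::main_start is the first top-level op in execution order, the
# counterpart of $cv->START for a subroutine. Collect the line numbers
# of the statements reachable by following ->next.
my @cop_lines;
for (my $op = B::main_start(); $$op; $op = $op->next) {
    push @cop_lines, $op->line if $op->isa('B::COP');
}
print "statement lines: @cop_lines\n";
```

A top-level variant of deparse_from() could begin its walk at B::main_start() in the same way.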
If you run this code, you’ll see that the deparsed code block likely contains the use of the warnings and strict pragmata. I didn’t look in B::Deparse how to disable ambient pragmas, mostly because I considered removing that code also polish. However, it might be useful to see what is in effect in the block anyway.
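B::Deparse does document an ambient_pragmas() method for telling it which pragmas to assume are already in effect. A sketch of using it follows, with a hypothetical subroutine. Whether every declaration disappears may depend on the Perl version, so treat the result as something to check rather than a guarantee.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical subroutine, compiled under strict and warnings.
my $code = sub { my $n = shift; return $n + 1 };

# Default behavior: pragmas in effect are spelled out in the output.
my $plain = B::Deparse->new->coderef2text($code);

# Tell B::Deparse to assume strict and warnings are ambient, so it
# has no reason to declare them again.
my $quiet = B::Deparse->new;
$quiet->ambient_pragmas(strict => 'all', warnings => 'all');
my $trimmed = $quiet->coderef2text($code);

print "default:\n$plain\n\n";
print "with ambient pragmas:\n$trimmed\n";
```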
I realize that Lisp-like languages make this sort of behavior much easier, but it’s at least possible in Perl, even if the interface isn’t always easy. One of the plans for Parrot is to make this introspection much less painful. Once that’s in place, it should be possible to give better introspective and manipulation capabilities to all languages running on Parrot, from each other.