File Coverage

blib/lib/PPI/Token/HereDoc.pm
Criterion Covered Total %
statement 74 76 97.3
branch 41 42 97.6
condition 5 6 83.3
subroutine 7 8 87.5
pod 3 3 100.0
total 130 135 96.3


line stmt bran cond sub pod time code
1             package PPI::Token::HereDoc;
2              
3             =pod
4              
5             =head1 NAME
6              
7             PPI::Token::HereDoc - Token class for the here-doc
8              
9             =head1 INHERITANCE
10              
11             PPI::Token::HereDoc
12             isa PPI::Token
13             isa PPI::Element
14              
15             =head1 DESCRIPTION
16              
17             Here-docs are incredibly handy when writing Perl, but incredibly tricky
18             when parsing it, primarily because they don't follow the general flow of
19             input.
20              
21             They jump ahead and nab lines directly off the input buffer. Whitespace
22             and newlines may not matter in most Perl code, but they matter in here-docs.
23              
24             They are also tricky to store as an object. They look sort of like an
25             operator and a string, but they don't act like it. And they have a second
26             section that should be something like a separate token, but isn't because a
27             string can span from above the here-doc content to below it.
28              
29             So when parsing, this is what we do.
30              
31             Firstly, the PPI::Token::HereDoc object, does not represent the C<<< << >>>
32             operator, or the "END_FLAG", or the content, or even the terminator.
33              
34             It represents all of them at once.
35              
36             The token itself has only the declaration part as its "content".
37              
38             # This is what the content of a HereDoc token is
39             <
40            
41             # Or this
42             <<"FOO"
43            
44             # Or even this
45             << 'FOO'
46              
47             That is, the "operator", any whitespace separator, and the quoted or bare
48             terminator. So when you call the C method on a HereDoc token, you
49             get '<< "FOO"'.
50              
51             As for the content and the terminator, when treated purely in "content" terms
52             they do not exist.
53              
54             The content is made available with the C method, and the name of
55             the terminator with the C method.
56              
57             To make things work in the way you expect, PPI has to play some games
58             when doing line/column location calculation for tokens, and also during
59             the content parsing and generation processes.
60              
61             Documents cannot simply by recreated by stitching together the token
62             contents, and involve a somewhat more expensive procedure, but the extra
63             expense should be relatively negligible unless you are doing huge
64             quantities of them.
65              
66             Please note that due to the immature nature of PPI in general, we expect
67             C to be a rich (bad) source of corner-case bugs for quite a while,
68             but for the most part they should more or less DWYM.
69              
70             =head2 Comparison to other string types
71              
72             Although technically it can be considered a quote, for the time being
73             C are being treated as a completely separate C subclass,
74             and will not be found in a search for L or
75             L objects.
76              
77             This may change in the future, with it most likely to end up under
78             QuoteLike.
79              
80             =head1 METHODS
81              
82             Although it has the standard set of C methods, C objects
83             have a relatively large number of unique methods all of their own.
84              
85             =cut
86              
87 67     67   500 use strict;
  67         4160  
  67         116671  
88              
89             our $VERSION = '1.28401'; # TRIAL
90              
91             our @ISA = "PPI::Token";
92              
93              
94              
95              
96              
97             #####################################################################
98             # PPI::Token::HereDoc Methods
99              
100             =pod
101              
102             =head2 heredoc
103              
104             The C method is the authoritative method for accessing the contents
105             of the C object.
106              
107             It returns the contents of the here-doc as a list of newline-terminated
108             strings. If called in scalar context, it returns the number of lines in
109             the here-doc, B the terminator line.
110              
111             =cut
112              
113 695     695 1 88073 sub heredoc { @{shift->{_heredoc}} }
  695         3438  
114              
115             =pod
116              
117             =head2 indentation
118              
119             The C method returns the indentation string of an indented
120             here-doc if that can be determined. If the indented here-doc is damaged
121             (say, missing terminator) or the here-doc was not indented, it returns
122             C.
123              
124             =cut
125              
126 631     631 1 5218 sub indentation { shift->{_indentation} }
127              
128             =pod
129              
130             =head2 terminator
131              
132             The C method returns the name of the terminating string for the
133             here-doc.
134              
135             Returns the terminating string as an unescaped string (in the rare case
136             the terminator has an escaped quote in it).
137              
138             =cut
139              
140             sub terminator {
141 0     0 1 0 shift->{_terminator};
142             }
143              
144             sub _is_terminator {
145 1503     1503   4487 my ( $self, $terminator, $line, $indented ) = @_;
146 1503 100       3540 if ( $indented ) {
147 729         35868 return $line =~ /^\s*\Q$terminator\E$/;
148             } else {
149 774         2308 return $line eq $terminator;
150             }
151             }
152              
153             sub _indent {
154 16     16   30 my ( $self, $token ) = @_;
155 16         51 my ($indent) = $token->{_terminator_line} =~ /^(\s*)/;
156 16         31 return $indent;
157             }
158              
159             sub _is_match_indent {
160 16     16   35 my ( $self, $token, $indent ) = @_;
161 16 100       21 return (grep { /^$indent/ || $_ eq "\n" } @{$token->{_heredoc}}) == @{$token->{_heredoc}};
  33         153  
  16         36  
  16         49  
162             }
163              
164              
165              
166              
167             #####################################################################
168             # Tokenizer Methods
169              
170             # Parse in the entire here-doc in one call
171             sub __TOKENIZER__on_char {
172 745     745   2387 my ( $self, $t ) = @_;
173              
174             # We are currently located on the first char after the <<
175              
176             # Handle the most common form first for simplicity and speed reasons
177             ### FIXME - This regex, and this method in general, do not yet allow
178             ### for the null here-doc, which terminates at the first
179             ### empty line.
180 745         2979 pos $t->{line} = $t->{line_cursor};
181              
182 745 100       6628 if ( $t->{line} !~ m/\G( ~? \s* (?: "[^"]*" | '[^']*' | `[^`]*` | \\?\w+ ) )/gcx ) {
183             # Degenerate to a left-shift operation
184 99         621 $t->{token}->set_class('Operator');
185 99         410 return $t->_finalize_token->__TOKENIZER__on_char( $t );
186             }
187              
188             # Add the rest of the token, work out what type it is,
189             # and suck in the content until the end.
190 646         2009 my $token = $t->{token};
191 646         3698 $token->{content} .= $1;
192 646         1850 $t->{line_cursor} += length $1;
193              
194             # Find the terminator, clean it up and determine
195             # the type of here-doc we are dealing with.
196 646         1892 my $content = $token->{content};
197 646 100       8387 if ( $content =~ /^\<\<(~?)(\w+)$/ ) {
    100          
    100          
    100          
    50          
198             # Bareword
199 134         426 $token->{_mode} = 'interpolate';
200 134 100       543 $token->{_indented} = 1 if $1 eq '~';
201 134         616 $token->{_terminator} = $2;
202              
203             } elsif ( $content =~ /^\<\<(~?)\s*\'(.*)\'$/ ) {
204             # ''-quoted literal
205 273         1108 $token->{_mode} = 'literal';
206 273 100       1691 $token->{_indented} = 1 if $1 eq '~';
207 273         1141 $token->{_terminator} = $2;
208 273         1230 $token->{_terminator} =~ s/\\'/'/g;
209              
210             } elsif ( $content =~ /^\<\<(~?)\s*\"(.*)\"$/ ) {
211             # ""-quoted literal
212 37         145 $token->{_mode} = 'interpolate';
213 37 100       217 $token->{_indented} = 1 if $1 eq '~';
214 37         152 $token->{_terminator} = $2;
215 37         139 $token->{_terminator} =~ s/\\"/"/g;
216              
217             } elsif ( $content =~ /^\<\<(~?)\s*\`(.*)\`$/ ) {
218             # ``-quoted command
219 182         842 $token->{_mode} = 'command';
220 182 100       1426 $token->{_indented} = 1 if $1 eq '~';
221 182         856 $token->{_terminator} = $2;
222 182         637 $token->{_terminator} =~ s/\\`/`/g;
223              
224             } elsif ( $content =~ /^\<\<(~?)\\(\w+)$/ ) {
225             # Legacy forward-slashed bareword
226 20         67 $token->{_mode} = 'literal';
227 20 100       111 $token->{_indented} = 1 if $1 eq '~';
228 20         75 $token->{_terminator} = $2;
229              
230             } else {
231             # WTF?
232 0         0 return undef;
233             }
234              
235             # Suck in the HEREDOC
236 646         2407 $token->{_heredoc} = \my @heredoc;
237 646         1945 my $terminator = $token->{_terminator} . "\n";
238 646         2698 while ( defined( my $line = $t->_get_line ) ) {
239 1216 100       8056 if ( $self->_is_terminator( $terminator, $line, $token->{_indented} ) ) {
240             # Keep the actual termination line for consistency
241             # when we are re-assembling the file
242 175         564 $token->{_terminator_line} = $line;
243              
244 175 100       611 if ( $token->{_indented} ) {
245 10         23 my $indent = $self->_indent( $token );
246 10         17 $token->{_indentation} = $indent;
247             # Indentation of here-doc doesn't match delimiter
248 10 100       25 unless ( $self->_is_match_indent( $token, $indent ) ) {
249 1         2 push @heredoc, $line;
250 1         3 last;
251             }
252              
253 9         77 s/^$indent// for @heredoc, $token->{_terminator_line};
254             }
255              
256             # The HereDoc is now fully parsed
257 174         692 return $t->_finalize_token->__TOKENIZER__on_char( $t );
258             }
259              
260             # Add the line
261 1041         4018 push @heredoc, $line;
262             }
263              
264             # End of file.
265             # Error: Didn't reach end of here-doc before end of file.
266              
267             # If the here-doc block is not empty, look at the last line to determine if
268             # the here-doc terminator is missing a newline (which Perl would fail to
269             # compile but is easy to detect) or if the here-doc block was just not
270             # terminated at all (which Perl would fail to compile as well).
271 472         1454 $token->{_terminator_line} = undef;
272 472 100 66     2668 if ( @heredoc and defined $heredoc[-1] ) {
273             # See PPI::Tokenizer, the algorithm there adds a space at the end of the
274             # document that we need to make sure we remove.
275 287 100       1107 if ( $t->{source_eof_chop} ) {
276 271         781 chop $heredoc[-1];
277 271         701 $t->{source_eof_chop} = '';
278             }
279              
280             # Check if the last line of the file matches the terminator without
281             # newline at the end. If so, remove it from the content and set it as
282             # the terminator line.
283             $token->{_terminator_line} = pop @heredoc
284 287 100       1266 if $self->_is_terminator( $token->{_terminator}, $heredoc[-1], $token->{_indented} );
285             }
286              
287 472 100 100     6774 if ( $token->{_indented} && $token->{_terminator_line} ) {
288 6         16 my $indent = $self->_indent( $token );
289 6         15 $token->{_indentation} = $indent;
290 6 100       13 if ( $self->_is_match_indent( $token, $indent ) ) {
291             # Remove indent from here-doc as much as possible
292 5         35 s/^$indent// for @heredoc;
293             }
294              
295 6         29 s/^$indent// for $token->{_terminator_line};
296             }
297              
298             # Set a hint for PPI::Document->serialize so it can
299             # inexpensively repair it if needed when writing back out.
300 472         1908 $token->{_damaged} = 1;
301              
302             # The HereDoc is not fully parsed
303 472         2026 $t->_finalize_token->__TOKENIZER__on_char( $t );
304             }
305              
306             1;
307              
308             =pod
309              
310             =head1 TO DO
311              
312             - Implement PPI::Token::Quote interface compatibility
313              
314             - Check CPAN for any use of the null here-doc or here-doc-in-s///e
315              
316             - Add support for the null here-doc
317              
318             - Add support for here-doc in s///e
319              
320             =head1 SUPPORT
321              
322             See the L in the main module.
323              
324             =head1 AUTHOR
325              
326             Adam Kennedy Eadamk@cpan.orgE
327              
328             =head1 COPYRIGHT
329              
330             Copyright 2001 - 2011 Adam Kennedy.
331              
332             This program is free software; you can redistribute
333             it and/or modify it under the same terms as Perl itself.
334              
335             The full text of the license can be found in the
336             LICENSE file included with this module.
337              
338             =cut