line |
stmt |
bran |
cond |
sub |
pod |
time |
code |
1
|
|
|
|
|
|
|
package Pod::Cats; |
2
|
|
|
|
|
|
|
|
3
|
5
|
|
|
5
|
|
143689
|
use warnings; |
|
5
|
|
|
|
|
12
|
|
|
5
|
|
|
|
|
171
|
|
4
|
5
|
|
|
5
|
|
25
|
use strict; |
|
5
|
|
|
|
|
11
|
|
|
5
|
|
|
|
|
144
|
|
5
|
5
|
|
|
5
|
|
133
|
use 5.010; |
|
5
|
|
|
|
|
22
|
|
|
5
|
|
|
|
|
218
|
|
6
|
|
|
|
|
|
|
|
7
|
5
|
|
|
5
|
|
2769
|
use Pod::Cats::Parser::MGC; |
|
5
|
|
|
|
|
20
|
|
|
5
|
|
|
|
|
178
|
|
8
|
5
|
|
|
5
|
|
52
|
use List::Util qw(min max); |
|
5
|
|
|
|
|
10
|
|
|
5
|
|
|
|
|
695
|
|
9
|
5
|
|
|
5
|
|
27
|
use Carp; |
|
5
|
|
|
|
|
12
|
|
|
5
|
|
|
|
|
12900
|
|
10
|
|
|
|
|
|
|
|
11
|
|
|
|
|
|
|
=head1 NAME |
12
|
|
|
|
|
|
|
|
13
|
|
|
|
|
|
|
Pod::Cats - The POD-like markup language written for podcats.in |
14
|
|
|
|
|
|
|
|
15
|
|
|
|
|
|
|
=head1 VERSION |
16
|
|
|
|
|
|
|
|
17
|
|
|
|
|
|
|
Version 0.06 |
18
|
|
|
|
|
|
|
|
19
|
|
|
|
|
|
|
=head1 DESCRIPTION |
20
|
|
|
|
|
|
|
|
21
|
|
|
|
|
|
|
POD is an expressive markup language - like Perl is an expressive programming |
22
|
|
|
|
|
|
|
language - and for a plain text file format there is little finer. Pod::Cats is |
23
|
|
|
|
|
|
|
an extension of the POD semantics that adds more syntax and more flexibility to |
24
|
|
|
|
|
|
|
the language. |
25
|
|
|
|
|
|
|
|
26
|
|
|
|
|
|
|
Pod::Cats is designed to be extended and doesn't implement any default |
27
|
|
|
|
|
|
|
commands or entities. |
28
|
|
|
|
|
|
|
|
29
|
|
|
|
|
|
|
=head1 SYNTAX |
30
|
|
|
|
|
|
|
|
31
|
|
|
|
|
|
|
Pod::Cats syntax borrows ideas from POD and adds its own. |
32
|
|
|
|
|
|
|
|
33
|
|
|
|
|
|
|
A paragraph is any block of text delimited by blank lines (whitespace ignored). |
34
|
|
|
|
|
|
|
This is the same as POD, and basically allows you to use hard word wrapping in |
35
|
|
|
|
|
|
|
your markup without having to join them all together for output later. |
36
|
|
|
|
|
|
|
|
37
|
|
|
|
|
|
|
There are three command paragraphs, which are defined by their first character. |
38
|
|
|
|
|
|
|
This character must be in the first column; whitespace at the start of a |
39
|
|
|
|
|
|
|
paragraph is syntactically relevant. |
40
|
|
|
|
|
|
|
|
41
|
|
|
|
|
|
|
=over 4 |
42
|
|
|
|
|
|
|
|
43
|
|
|
|
|
|
|
=item C<=COMMAND CONTENT> |
44
|
|
|
|
|
|
|
X |
45
|
|
|
|
|
|
|
|
46
|
|
|
|
|
|
|
A line beginning with the C<=> symbol denotes a single I. Usually this |
47
|
|
|
|
|
|
|
will be some sort of header, perhaps the equivalent of a C<< >>, something |
48
|
|
|
|
|
|
|
like that. It is roughly equivalent to the self-closing tag in XML. B |
49
|
|
|
|
|
|
|
is just text that may or may not be present. The relationship of B to |
50
|
|
|
|
|
|
|
the B is for you to define, as is the meaning of B. |
51
|
|
|
|
|
|
|
|
52
|
|
|
|
|
|
|
When a C<=COMMAND> block is completed, it is passed to L. |
53
|
|
|
|
|
|
|
|
54
|
|
|
|
|
|
|
=item C<+NAME CONTENT> |
55
|
|
|
|
|
|
|
X |
56
|
|
|
|
|
|
|
|
57
|
|
|
|
|
|
|
A line beginning with C<+> opens a named block; its name is B. Similar to |
58
|
|
|
|
|
|
|
C<=COMMAND>, the B is arbitrary, and its relationship to the B of |
59
|
|
|
|
|
|
|
the block is up to you. |
60
|
|
|
|
|
|
|
|
61
|
|
|
|
|
|
|
When this is encountered you are invited to L. |
62
|
|
|
|
|
|
|
|
63
|
|
|
|
|
|
|
=item C<-NAME> |
64
|
|
|
|
|
|
|
X |
65
|
|
|
|
|
|
|
|
66
|
|
|
|
|
|
|
A line beginning with C<-> is the end of the named block previously started. |
67
|
|
|
|
|
|
|
These must match in reverse order to the C<+> block with the matching B - |
68
|
|
|
|
|
|
|
basically the same as XML's pairs. It is passed to L, |
69
|
|
|
|
|
|
|
and unlike the other two command paragraphs it accepts no content. |
70
|
|
|
|
|
|
|
|
71
|
|
|
|
|
|
|
=back |
72
|
|
|
|
|
|
|
|
73
|
|
|
|
|
|
|
Then there are two types of text paragraph, for which the text is not |
74
|
|
|
|
|
|
|
syntactically relevant but whitespace still is: |
75
|
|
|
|
|
|
|
|
76
|
|
|
|
|
|
|
=over 4 |
77
|
|
|
|
|
|
|
|
78
|
|
|
|
|
|
|
=item Verbatim paragraphs |
79
|
|
|
|
|
|
|
|
80
|
|
|
|
|
|
|
A line whose first character is whitespace is considered verbatim. No removal of |
81
|
|
|
|
|
|
|
whitespace is done to the rest of the paragraph if the first character is |
82
|
|
|
|
|
|
|
whitespace; all your text is repeated verbatim, hence the name |
83
|
|
|
|
|
|
|
|
84
|
|
|
|
|
|
|
The verbatim paragraph continues until the first non-verbatim paragraph is |
85
|
|
|
|
|
|
|
encountered. A blank line is no longer considered to end the paragraph. |
86
|
|
|
|
|
|
|
Therefore, two verbatim paragraphs can only be separated by a non-verbatim |
87
|
|
|
|
|
|
|
paragraph with non-whitespace content. The special formatting code C<< ZZ<><> >> |
88
|
|
|
|
|
|
|
can be used on its own to separate them with zero-width content. |
89
|
|
|
|
|
|
|
|
90
|
|
|
|
|
|
|
All lines in the verbatim paragraph will have their leading whitespace removed. |
91
|
|
|
|
|
|
|
This is done intelligently: the I amount of leading whitespace found on |
92
|
|
|
|
|
|
|
any line is removed from all lines. This allows you to indent other lines (even |
93
|
|
|
|
|
|
|
the first one) relative to the syntactic whitespace that defines the verbatim |
94
|
|
|
|
|
|
|
paragraph without your indentation being parsed out. |
95
|
|
|
|
|
|
|
|
96
|
|
|
|
|
|
|
L are not parsed in verbatim paragraphs, as expected. |
97
|
|
|
|
|
|
|
|
98
|
|
|
|
|
|
|
When a verbatim paragraph has been collated, it is passed to L. |
99
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
=item Paragraphs |
101
|
|
|
|
|
|
|
|
102
|
|
|
|
|
|
|
Everything that doesn't get caught by one of the above rules is deemed to be a |
103
|
|
|
|
|
|
|
plain text paragraph. As with all paragraphs, a single line break is removed by |
104
|
|
|
|
|
|
|
the parser and a blank line causes the paragraph to be processed. It is passed |
105
|
|
|
|
|
|
|
to L. |
106
|
|
|
|
|
|
|
|
107
|
|
|
|
|
|
|
=back |
108
|
|
|
|
|
|
|
|
109
|
|
|
|
|
|
|
And finally the inline formatting markup, entities. |
110
|
|
|
|
|
|
|
|
111
|
|
|
|
|
|
|
=over |
112
|
|
|
|
|
|
|
|
113
|
|
|
|
|
|
|
=item C<< XZ<><> >> |
114
|
|
|
|
|
|
|
X X |
115
|
|
|
|
|
|
|
|
116
|
|
|
|
|
|
|
An entity is defined as a capital letter followed by a delimiter that is |
117
|
|
|
|
|
|
|
repeated n times, then any amount of text up to a matching quantity of a |
118
|
|
|
|
|
|
|
balanced delimiter. |
119
|
|
|
|
|
|
|
|
120
|
|
|
|
|
|
|
In normal POD the only delimiter is C<< < >>, so entities have the format C<< |
121
|
|
|
|
|
|
|
XZ<><> >>; except that the opening delimiter may be duplicated as long as the |
122
|
|
|
|
|
|
|
closing delimiter matches, allowing you to put the delimiter itself inside the |
123
|
|
|
|
|
|
|
entity: C<<< XZ<><<>> >>>; in Pod::Cats you can use any delimiter, removing the |
124
|
|
|
|
|
|
|
requirement to duplicate it at all: C<< C[ XZ<><> ] >>. |
125
|
|
|
|
|
|
|
|
126
|
|
|
|
|
|
|
Once an entity has begun, nested entities are only considered if the delimiters |
127
|
|
|
|
|
|
|
are the same as those used for the outer entity: C<< B[ I[bold-italic] ] >>; |
128
|
|
|
|
|
|
|
C<< B[IZ<>] >>. |
129
|
|
|
|
|
|
|
|
130
|
|
|
|
|
|
|
Apart from the special entity C<< ZZ<><> >>, the letter used for the entity has |
131
|
|
|
|
|
|
|
no inherent meaning to Pod::Cats. The parsed entity is provided to |
132
|
|
|
|
|
|
|
L. C<< ZZ<><> >> retains its meaning from POD, which is to be a |
133
|
|
|
|
|
|
|
zero-width 'divider' to break up things that would otherwise be considered |
134
|
|
|
|
|
|
|
syntax. You are not given C<< ZZ<><> >> to handle, and C<< ZZ<><> >> itself will |
135
|
|
|
|
|
|
|
produce undef if it is the only content to an element. A paragraph comprising solely |
136
|
|
|
|
|
|
|
C<< ZZ<><> >> will never generate a parsed paragraph; it will be skipped. |
137
|
|
|
|
|
|
|
|
138
|
|
|
|
|
|
|
=back |
139
|
|
|
|
|
|
|
|
140
|
|
|
|
|
|
|
=head1 METHODS |
141
|
|
|
|
|
|
|
|
142
|
|
|
|
|
|
|
=cut |
143
|
|
|
|
|
|
|
|
144
|
|
|
|
|
|
|
our $VERSION = '0.06'; |
145
|
|
|
|
|
|
|
|
146
|
|
|
|
|
|
|
=head2 new |
147
|
|
|
|
|
|
|
|
148
|
|
|
|
|
|
|
Create a new parser. Options are provided as a hashref, but there is currently |
149
|
|
|
|
|
|
|
only one: |
150
|
|
|
|
|
|
|
|
151
|
|
|
|
|
|
|
=over |
152
|
|
|
|
|
|
|
|
153
|
|
|
|
|
|
|
=item delimiters |
154
|
|
|
|
|
|
|
|
155
|
|
|
|
|
|
|
A string containing delimiters to use. Bracketed delimiters will be balanced; |
156
|
|
|
|
|
|
|
other delimiters will simply be used as-is. This echoes the delimiter philosophy |
157
|
|
|
|
|
|
|
of Perl syntax such as regexes and C. The string should be all the possible |
158
|
|
|
|
|
|
|
delimiters, listed once each, and only the opening brackets of balanced pairs. |
159
|
|
|
|
|
|
|
|
160
|
|
|
|
|
|
|
The default is C<< '<' >>, same as POD. |
161
|
|
|
|
|
|
|
|
162
|
|
|
|
|
|
|
=back |
163
|
|
|
|
|
|
|
|
164
|
|
|
|
|
|
|
=cut |
165
|
|
|
|
|
|
|
|
166
|
|
|
|
|
|
|
sub new { |
167
|
4
|
|
|
4
|
1
|
104
|
my $class = shift; |
168
|
4
|
|
100
|
|
|
26
|
my $opts = shift || {}; |
169
|
4
|
|
|
|
|
14
|
my $self = bless $opts, $class; # FIXME |
170
|
|
|
|
|
|
|
|
171
|
4
|
|
|
|
|
14
|
return $self; |
172
|
|
|
|
|
|
|
} |
173
|
|
|
|
|
|
|
|
174
|
|
|
|
|
|
|
=head2 parse |
175
|
|
|
|
|
|
|
|
176
|
|
|
|
|
|
|
Parses a string containing whatever Pod::Cats code you have. |
177
|
|
|
|
|
|
|
|
178
|
|
|
|
|
|
|
=cut |
179
|
|
|
|
|
|
|
|
180
|
|
|
|
|
|
|
sub parse { |
181
|
0
|
|
|
0
|
1
|
0
|
my ($self, $string) = @_; |
182
|
|
|
|
|
|
|
|
183
|
0
|
|
|
|
|
0
|
return $self->parse_lines(split /\n/, $string); |
184
|
|
|
|
|
|
|
} |
185
|
|
|
|
|
|
|
|
186
|
|
|
|
|
|
|
=head2 parse_file |
187
|
|
|
|
|
|
|
|
188
|
|
|
|
|
|
|
Opens the file given by filename and reads it all in and then parses that. |
189
|
|
|
|
|
|
|
|
190
|
|
|
|
|
|
|
=cut |
191
|
|
|
|
|
|
|
|
192
|
|
|
|
|
|
|
sub parse_file { |
193
|
0
|
|
|
0
|
1
|
0
|
my ($self, $filename) = @_; |
194
|
|
|
|
|
|
|
|
195
|
0
|
0
|
|
|
|
0
|
carp "File not found: " . $filename unless -e $filename; |
196
|
|
|
|
|
|
|
|
197
|
0
|
|
|
|
|
0
|
open my $fh, "<", $filename; |
198
|
0
|
|
|
|
|
0
|
chomp(my @lines = <$fh>); |
199
|
0
|
|
|
|
|
0
|
close $fh; |
200
|
|
|
|
|
|
|
|
201
|
0
|
|
|
|
|
0
|
return $self->parse_lines(@lines); |
202
|
|
|
|
|
|
|
} |
203
|
|
|
|
|
|
|
|
204
|
|
|
|
|
|
|
=head2 parse_lines |
205
|
|
|
|
|
|
|
|
206
|
|
|
|
|
|
|
L and L both come here, which just takes the markup text |
207
|
|
|
|
|
|
|
as an array of lines and parses them. This is where the logic happens. It is |
208
|
|
|
|
|
|
|
exposed publicly so you can parse an array of your own if you want. |
209
|
|
|
|
|
|
|
|
210
|
|
|
|
|
|
|
=cut |
211
|
|
|
|
|
|
|
|
212
|
|
|
|
|
|
|
sub parse_lines { |
213
|
4
|
|
|
4
|
1
|
138
|
my ($self, @lines) = @_; |
214
|
|
|
|
|
|
|
|
215
|
4
|
|
|
|
|
11
|
my $result = ""; |
216
|
|
|
|
|
|
|
|
217
|
|
|
|
|
|
|
# The buffer type goes in the first element, and its |
218
|
|
|
|
|
|
|
# contents, if any, in the rest. |
219
|
4
|
|
|
|
|
8
|
my @buffer; |
220
|
4
|
|
|
|
|
34
|
$self->{dom} = []; |
221
|
|
|
|
|
|
|
|
222
|
|
|
|
|
|
|
# Special lines are: |
223
|
|
|
|
|
|
|
# - a blank line. An exception is between verbatim paragraphs, so we will |
224
|
|
|
|
|
|
|
# simply re-merge verbatim paras later on |
225
|
|
|
|
|
|
|
# - A line starting with =, + or -. Command paragraph. Process the previous |
226
|
|
|
|
|
|
|
# buffer and start a new one with this. |
227
|
|
|
|
|
|
|
# - Anything else continues the previous buffer, or starts a normal paragraph |
228
|
|
|
|
|
|
|
|
229
|
4
|
|
|
|
|
40
|
shift @lines while $lines[0] !~ /\S/; # shift off leading blank lines! |
230
|
|
|
|
|
|
|
|
231
|
4
|
|
|
|
|
11
|
for my $line (@lines) { |
232
|
57
|
|
|
|
|
76
|
for ($line) { |
233
|
57
|
|
|
|
|
131
|
when (/^\s*$/) { |
234
|
25
|
|
|
|
|
83
|
$self->_process_buffer(@buffer); |
235
|
25
|
|
|
|
|
69
|
@buffer = (); |
236
|
|
|
|
|
|
|
} |
237
|
32
|
|
|
|
|
60
|
when (/^([=+-])/) { |
238
|
3
|
|
|
|
|
9
|
my $type = $1; |
239
|
3
|
50
|
|
|
|
8
|
if (@buffer) { |
240
|
0
|
|
|
|
|
0
|
warn "$type command found without leading blank line."; |
241
|
|
|
|
|
|
|
|
242
|
0
|
|
|
|
|
0
|
$self->_process_buffer(@buffer); |
243
|
0
|
|
|
|
|
0
|
@buffer = (); |
244
|
|
|
|
|
|
|
} |
245
|
|
|
|
|
|
|
|
246
|
3
|
50
|
|
|
|
18
|
push @buffer, { |
247
|
|
|
|
|
|
|
'+' => 'begin', |
248
|
|
|
|
|
|
|
'-' => 'end', |
249
|
|
|
|
|
|
|
'=' => 'command', |
250
|
|
|
|
|
|
|
}->{$type} or die "Don't know what to do with $type"; |
251
|
|
|
|
|
|
|
|
252
|
|
|
|
|
|
|
# find and push the command name onto it; the rest is the first |
253
|
|
|
|
|
|
|
# bit of buffer contents. |
254
|
3
|
|
|
|
|
63
|
push @buffer, grep {$_} ($line =~ /^\Q$type\E(.+?)\b\s*(.*)$/); |
|
6
|
|
|
|
|
19
|
|
255
|
|
|
|
|
|
|
} |
256
|
29
|
|
|
|
|
48
|
when (/^\s+\S/) { |
257
|
10
|
100
|
|
|
|
23
|
push @buffer, "verbatim" if !@buffer; |
258
|
10
|
|
|
|
|
26
|
push @buffer, $line; |
259
|
|
|
|
|
|
|
} |
260
|
19
|
|
|
|
|
23
|
default { |
261
|
|
|
|
|
|
|
# Nothing special, continue previous buffer or start a paragraph. |
262
|
19
|
100
|
|
|
|
57
|
push @buffer, "paragraph" if !@buffer; |
263
|
19
|
|
|
|
|
55
|
push @buffer, $line; |
264
|
|
|
|
|
|
|
} |
265
|
|
|
|
|
|
|
} |
266
|
|
|
|
|
|
|
} |
267
|
|
|
|
|
|
|
|
268
|
4
|
50
|
|
|
|
25
|
$self->_process_buffer(@buffer) if @buffer; |
269
|
4
|
|
|
|
|
32
|
$self->_postprocess_dom(); |
270
|
|
|
|
|
|
|
|
271
|
4
|
|
|
|
|
41
|
$self->_postprocess_paragraphs(); |
272
|
4
|
|
|
|
|
2273
|
return $self->{dom}; |
273
|
|
|
|
|
|
|
} |
274
|
|
|
|
|
|
|
|
275
|
|
|
|
|
|
|
# Adds the buffer and some metadata to the DOM, returning nothing. |
276
|
|
|
|
|
|
|
sub _process_buffer { |
277
|
29
|
|
|
29
|
|
125
|
my ($self, @buffer) = @_; |
278
|
|
|
|
|
|
|
|
279
|
29
|
50
|
|
|
|
64
|
return '' unless @buffer; |
280
|
|
|
|
|
|
|
|
281
|
29
|
|
|
|
|
37
|
my $buffer_type = shift @buffer; |
282
|
|
|
|
|
|
|
|
283
|
29
|
|
|
|
|
69
|
my $node = { |
284
|
|
|
|
|
|
|
type => $buffer_type |
285
|
|
|
|
|
|
|
}; |
286
|
|
|
|
|
|
|
|
287
|
29
|
|
|
|
|
46
|
for ($buffer_type) { |
288
|
29
|
|
|
|
|
71
|
when('paragraph') { |
289
|
|
|
|
|
|
|
# concatenate the lines and normalise whitespace. |
290
|
17
|
|
|
|
|
36
|
my $para = join " ", @buffer; |
291
|
17
|
|
|
|
|
114
|
$node->{content} = $para; |
292
|
|
|
|
|
|
|
} |
293
|
12
|
|
|
|
|
38
|
when('verbatim') { |
294
|
|
|
|
|
|
|
# find the lowest level of indentation in this buffer and strip it |
295
|
9
|
|
|
|
|
12
|
my $indent_level = min map { /^(\s+)/; length $1 } @buffer; |
|
10
|
|
|
|
|
20
|
|
|
10
|
|
|
|
|
38
|
|
296
|
9
|
|
|
|
|
20
|
$node->{content} = join "\n", @buffer; |
297
|
9
|
|
|
|
|
24
|
$node->{indent_level} = $indent_level; |
298
|
|
|
|
|
|
|
} |
299
|
3
|
|
100
|
|
|
12
|
when($_ eq 'command' || $_ eq 'begin') { |
300
|
2
|
|
|
|
|
10
|
$node->{name} = shift @buffer; |
301
|
2
|
|
|
|
|
5
|
my $content = join " ", @buffer; |
302
|
2
|
|
|
|
|
5
|
$node->{content} = $content; |
303
|
|
|
|
|
|
|
} |
304
|
1
|
|
|
|
|
3
|
when('end') { |
305
|
1
|
|
|
|
|
4
|
$node->{name} = shift @buffer; # end tags take no content |
306
|
|
|
|
|
|
|
} |
307
|
|
|
|
|
|
|
} |
308
|
|
|
|
|
|
|
|
309
|
29
|
|
|
|
|
41
|
push @{$self->{dom}}, $node; |
|
29
|
|
|
|
|
72
|
|
310
|
|
|
|
|
|
|
} |
311
|
|
|
|
|
|
|
|
312
|
|
|
|
|
|
|
# This is basically just to merge verbatims together |
313
|
|
|
|
|
|
|
sub _postprocess_dom { |
314
|
4
|
|
|
4
|
|
9
|
my $self = shift; |
315
|
|
|
|
|
|
|
|
316
|
4
|
|
|
|
|
7
|
my @new_dom; |
317
|
|
|
|
|
|
|
my $last_node; |
318
|
4
|
|
|
|
|
9
|
for my $node (@{$self->{dom}}) { |
|
4
|
|
|
|
|
20
|
|
319
|
29
|
100
|
50
|
|
|
63
|
$last_node = $node and next unless defined $last_node; |
320
|
|
|
|
|
|
|
|
321
|
|
|
|
|
|
|
# Don't change the last node until we stop finding verbatims. |
322
|
|
|
|
|
|
|
# That way we can keep using it as the concatenated node. |
323
|
25
|
100
|
100
|
|
|
89
|
if ($last_node->{type} eq 'verbatim' && $node->{type} eq 'verbatim') { |
324
|
|
|
|
|
|
|
# The smallest indent level is considered the level for the merged node. |
325
|
4
|
|
|
|
|
9
|
$last_node->{indent_level} = |
326
|
|
|
|
|
|
|
min( $last_node->{indent_level}, $node->{indent_level}); |
327
|
4
|
|
|
|
|
9
|
$last_node->{content} .= "\n\n" . $node->{content}; |
328
|
|
|
|
|
|
|
|
329
|
|
|
|
|
|
|
} else { |
330
|
|
|
|
|
|
|
# Node type changed, push old one |
331
|
21
|
100
|
|
|
|
73
|
if ($last_node->{type} eq 'verbatim') { |
332
|
5
|
|
|
|
|
6
|
my $to_remove = $last_node->{indent_level}; |
333
|
5
|
50
|
|
|
|
54
|
$last_node->{content} =~ s/^ {$to_remove}//mg if $to_remove; |
334
|
|
|
|
|
|
|
} |
335
|
21
|
|
|
|
|
32
|
push @new_dom, $last_node; |
336
|
21
|
|
|
|
|
36
|
$last_node = $node; |
337
|
|
|
|
|
|
|
} |
338
|
|
|
|
|
|
|
} |
339
|
|
|
|
|
|
|
|
340
|
4
|
|
|
|
|
15
|
push @new_dom, $last_node; |
341
|
4
|
|
|
|
|
13
|
$self->{dom} = \@new_dom; |
342
|
|
|
|
|
|
|
} |
343
|
|
|
|
|
|
|
|
344
|
|
|
|
|
|
|
# Now is the sax-like bit, where it goes through and fires the user's events for |
345
|
|
|
|
|
|
|
# the various types. TODO: what's the point in sax-like if you already made a |
346
|
|
|
|
|
|
|
# DOM? Make this part of the parsing process and create the DOM out of the SAX. |
347
|
|
|
|
|
|
|
sub _postprocess_paragraphs { |
348
|
4
|
|
|
4
|
|
7
|
my $self = shift; |
349
|
|
|
|
|
|
|
|
350
|
4
|
|
|
|
|
14
|
for my $node (@{ $self->{dom} }) { |
|
4
|
|
|
|
|
19
|
|
351
|
25
|
|
|
|
|
14754
|
for ($node->{type}) { |
352
|
25
|
|
|
|
|
60
|
when ('paragraph') { |
353
|
|
|
|
|
|
|
# If _process_entities gives us undef, that was a single Z<>, which should not |
354
|
|
|
|
|
|
|
# generate a new paragraph. |
355
|
17
|
|
100
|
|
|
67
|
$node->{content} = $self->_process_entities($node->{content}) // next; |
356
|
16
|
|
|
|
|
27
|
$self->handle_paragraph(@{ $node->{content} }); |
|
16
|
|
|
|
|
66
|
|
357
|
|
|
|
|
|
|
} |
358
|
8
|
|
|
|
|
16
|
when ('begin') { |
359
|
1
|
|
|
|
|
4
|
$node->{content} = $self->_process_entities($node->{content}); |
360
|
|
|
|
|
|
|
# Check for balance later |
361
|
1
|
|
|
|
|
2
|
push @{$self->{begin_stack}}, $node->{name}; |
|
1
|
|
|
|
|
4
|
|
362
|
|
|
|
|
|
|
|
363
|
1
|
|
50
|
|
|
3
|
$self->handle_begin($node->{name}, @{ $node->{content} // [] }); |
|
1
|
|
|
|
|
21
|
|
364
|
|
|
|
|
|
|
} |
365
|
7
|
|
|
|
|
14
|
when ('end') { |
366
|
1
|
|
|
|
|
6
|
warn "$node->{name} is ended out of sync!" |
367
|
1
|
50
|
|
|
|
2
|
if pop @{$self->{begin_stack}} ne $node->{name}; |
368
|
|
|
|
|
|
|
|
369
|
1
|
|
|
|
|
5
|
$self->handle_end($node->{name}); |
370
|
|
|
|
|
|
|
} |
371
|
6
|
|
|
|
|
10
|
when ('command') { |
372
|
1
|
|
|
|
|
8
|
$node->{content} = $self->_process_entities($node->{content}); |
373
|
1
|
|
50
|
|
|
4
|
$self->handle_command($node->{name}, @{ $node->{content} // [] }); |
|
1
|
|
|
|
|
6
|
|
374
|
|
|
|
|
|
|
} |
375
|
5
|
|
|
|
|
9
|
when ('verbatim') { |
376
|
5
|
|
|
|
|
16
|
$self->handle_verbatim($node->{content}); |
377
|
|
|
|
|
|
|
} |
378
|
|
|
|
|
|
|
} |
379
|
|
|
|
|
|
|
} |
380
|
|
|
|
|
|
|
} |
381
|
|
|
|
|
|
|
|
382
|
|
|
|
|
|
|
=head2 handle_verbatim |
383
|
|
|
|
|
|
|
|
384
|
|
|
|
|
|
|
The verbatim paragraph as it was in the code, except with the minimum amount of |
385
|
|
|
|
|
|
|
whitespace stripped from each line as described in L. |
386
|
|
|
|
|
|
|
Passed in as a single string with line breaks preserved. |
387
|
|
|
|
|
|
|
|
388
|
|
|
|
|
|
|
Do whatever you want. Default is to return the string straight back atcha. |
389
|
|
|
|
|
|
|
|
390
|
|
|
|
|
|
|
=cut |
391
|
|
|
|
|
|
|
|
392
|
|
|
|
|
|
|
sub handle_verbatim { |
393
|
5
|
|
|
5
|
1
|
25
|
shift; |
394
|
5
|
|
|
|
|
12
|
shift; |
395
|
|
|
|
|
|
|
} |
396
|
|
|
|
|
|
|
|
397
|
|
|
|
|
|
|
=head2 handle_entity |
398
|
|
|
|
|
|
|
|
399
|
|
|
|
|
|
|
Passed the letter of the L as the first argument and its content |
400
|
|
|
|
|
|
|
as the rest of @_. The content will alternate between plain text and the return |
401
|
|
|
|
|
|
|
value of this function for any nested entities inside this one. |
402
|
|
|
|
|
|
|
|
403
|
|
|
|
|
|
|
For this reason you should return a scalar from this method, be it text or a |
404
|
|
|
|
|
|
|
ref. The default is to concatenate @_, thus replacing entities with their |
405
|
|
|
|
|
|
|
contents. |
406
|
|
|
|
|
|
|
|
407
|
|
|
|
|
|
|
Note that this method is the only one whose return value is of relevance to the |
408
|
|
|
|
|
|
|
parser, since what you return from this will appear in another handler, |
409
|
|
|
|
|
|
|
depending on what type of paragraph the entity is in. |
410
|
|
|
|
|
|
|
|
411
|
|
|
|
|
|
|
You will never get the C<< ZZ<><> >> entity. |
412
|
|
|
|
|
|
|
|
413
|
|
|
|
|
|
|
=cut |
414
|
|
|
|
|
|
|
|
415
|
|
|
|
|
|
|
sub handle_entity { |
416
|
5
|
|
|
5
|
1
|
6440
|
shift; shift; |
|
5
|
|
|
|
|
7
|
|
417
|
5
|
|
100
|
|
|
40
|
join ' ', map $_ // '', @_; |
418
|
|
|
|
|
|
|
} |
419
|
|
|
|
|
|
|
|
420
|
|
|
|
|
|
|
# preprocess paragraph before giving it to the user. handle_entity is called |
421
|
|
|
|
|
|
|
# from the parser itself. |
422
|
|
|
|
|
|
|
sub _process_entities { |
423
|
19
|
|
|
19
|
|
30
|
my ($self, $para) = @_; |
424
|
|
|
|
|
|
|
|
425
|
|
|
|
|
|
|
# 1. replace POD-like Z<...> with user-defined functions. |
426
|
|
|
|
|
|
|
# Z itself is the only actual exception to that. |
427
|
19
|
|
100
|
|
|
125
|
$self->{parser} ||= Pod::Cats::Parser::MGC->new( |
|
|
|
66
|
|
|
|
|
428
|
|
|
|
|
|
|
object => $self, |
429
|
|
|
|
|
|
|
delimiters => $self->{delimiters} // '<' |
430
|
|
|
|
|
|
|
); |
431
|
|
|
|
|
|
|
|
432
|
19
|
|
|
|
|
80
|
my $parsed = $self->{parser}->from_string( $para ); |
433
|
|
|
|
|
|
|
|
434
|
|
|
|
|
|
|
# Single return of undef was Z<> |
435
|
19
|
100
|
66
|
|
|
1722
|
return defined $parsed->[0] || @$parsed > 1 ? $parsed : (); |
436
|
|
|
|
|
|
|
} |
437
|
|
|
|
|
|
|
|
438
|
|
|
|
|
|
|
=head2 handle_paragraph |
439
|
|
|
|
|
|
|
|
440
|
|
|
|
|
|
|
The paragraph is split into sections that alternate between plain text and the |
441
|
|
|
|
|
|
|
return values of L as described above. These |
442
|
|
|
|
|
|
|
sections are arrayed in @_. Note that the paragraph could start with an entity. |
443
|
|
|
|
|
|
|
|
444
|
|
|
|
|
|
|
By default it returns @_ concatenated, since the default behaviour of |
445
|
|
|
|
|
|
|
L is to remove the formatting but keep the |
446
|
|
|
|
|
|
|
contents. |
447
|
|
|
|
|
|
|
|
448
|
|
|
|
|
|
|
=cut |
449
|
|
|
|
|
|
|
|
450
|
|
|
|
|
|
|
sub handle_paragraph { |
451
|
14
|
|
100
|
14
|
1
|
37
|
shift; join ' ', map $_ // '', @_; |
|
14
|
|
|
|
|
152
|
|
452
|
|
|
|
|
|
|
} |
453
|
|
|
|
|
|
|
|
454
|
|
|
|
|
|
|
=head2 handle_command |
455
|
|
|
|
|
|
|
|
456
|
|
|
|
|
|
|
When a L is encountered it comes here. The first argument is |
457
|
|
|
|
|
|
|
the B (from B<=COMMAND>); the rest of the arguments follow the rules of |
458
|
|
|
|
|
|
|
L and alternate between plain text and parsed |
459
|
|
|
|
|
|
|
entities. |
460
|
|
|
|
|
|
|
|
461
|
|
|
|
|
|
|
By default it returns @_ concatenated, same as paragraphs. |
462
|
|
|
|
|
|
|
|
463
|
|
|
|
|
|
|
=cut |
464
|
|
|
|
|
|
|
|
465
|
|
|
|
|
|
|
sub handle_command { |
466
|
0
|
|
0
|
0
|
1
|
|
shift; shift; join ' ', map $_ // '', @_; |
|
0
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
|
467
|
|
|
|
|
|
|
} |
468
|
|
|
|
|
|
|
|
469
|
|
|
|
|
|
|
=head2 handle_begin |
470
|
|
|
|
|
|
|
|
471
|
|
|
|
|
|
|
This is handled the same as L, except it is called when a |
472
|
|
|
|
|
|
|
L command is encountered. The same rules apply. |
473
|
|
|
|
|
|
|
|
474
|
|
|
|
|
|
|
=cut |
475
|
|
|
|
|
|
|
|
476
|
|
|
|
|
|
|
sub _handle_begin { |
477
|
0
|
|
0
|
0
|
|
|
shift; shift; join ' ', map $_ // '', @_; |
|
0
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
|
478
|
|
|
|
|
|
|
} |
479
|
|
|
|
|
|
|
|
480
|
|
|
|
|
|
|
=head2 handle_end |
481
|
|
|
|
|
|
|
|
482
|
|
|
|
|
|
|
The counterpart to the begin handler. This is called when the L paragraph |
483
|
|
|
|
|
|
|
is encountered. The parser will already have discovered whether your begins and |
484
|
|
|
|
|
|
|
ends are not balanced so you don't need to worry about that. |
485
|
|
|
|
|
|
|
|
486
|
|
|
|
|
|
|
Note that there is no content for an end paragraph so the only argument this |
487
|
|
|
|
|
|
|
gets is the command name. |
488
|
|
|
|
|
|
|
|
489
|
|
|
|
|
|
|
=cut |
490
|
|
|
|
|
|
|
|
491
|
0
|
|
|
0
|
1
|
|
sub handle_end { } |
492
|
|
|
|
|
|
|
|
493
|
|
|
|
|
|
|
=head1 TODO |
494
|
|
|
|
|
|
|
|
495
|
|
|
|
|
|
|
=over |
496
|
|
|
|
|
|
|
|
497
|
|
|
|
|
|
|
=item The document is parsed into DOM, then events are fired SAX-like. |
498
|
|
|
|
|
|
|
Preferable to fire the events and build the DOM from that. |
499
|
|
|
|
|
|
|
|
500
|
|
|
|
|
|
|
=item Currently the matching of begin/end commands is a bit naive. |
501
|
|
|
|
|
|
|
|
502
|
|
|
|
|
|
|
=item Line numbers of errors are not yet reported. |
503
|
|
|
|
|
|
|
|
504
|
|
|
|
|
|
|
=back |
505
|
|
|
|
|
|
|
|
506
|
|
|
|
|
|
|
=head1 AUTHOR |
507
|
|
|
|
|
|
|
|
508
|
|
|
|
|
|
|
Altreus, C<< >> |
509
|
|
|
|
|
|
|
|
510
|
|
|
|
|
|
|
=head1 BUGS |
511
|
|
|
|
|
|
|
|
512
|
|
|
|
|
|
|
Bug reports to github please: http://github.com/Altreus/Pod-Cats/issues |
513
|
|
|
|
|
|
|
|
514
|
|
|
|
|
|
|
=head1 SUPPORT |
515
|
|
|
|
|
|
|
|
516
|
|
|
|
|
|
|
You are reading the only documentation for this module. |
517
|
|
|
|
|
|
|
|
518
|
|
|
|
|
|
|
For more help, give me a holler on irc.freenode.com #perl |
519
|
|
|
|
|
|
|
|
520
|
|
|
|
|
|
|
=head1 ACKNOWLEDGEMENTS |
521
|
|
|
|
|
|
|
|
522
|
|
|
|
|
|
|
Paul Evans (LeoNerd) basically wrote Parser::MGC because I was whining about not |
523
|
|
|
|
|
|
|
being able to parse these entity delimiters with any of the token parsers I |
524
|
|
|
|
|
|
|
could find; and then he wrote a POD example that I only had to tweak in order to |
525
|
|
|
|
|
|
|
do so. So a lot of the credit should go to him! |
526
|
|
|
|
|
|
|
|
527
|
|
|
|
|
|
|
=head1 LICENSE AND COPYRIGHT |
528
|
|
|
|
|
|
|
|
529
|
|
|
|
|
|
|
Copyright 2013 Altreus. |
530
|
|
|
|
|
|
|
|
531
|
|
|
|
|
|
|
This module is released under the MIT licence. |
532
|
|
|
|
|
|
|
|
533
|
|
|
|
|
|
|
=cut |
534
|
|
|
|
|
|
|
|
535
|
|
|
|
|
|
|
1; |