line |
stmt |
bran |
cond |
sub |
pod |
time |
code |
1
|
|
|
|
|
|
|
package Pod::Cats; |
2
|
|
|
|
|
|
|
|
3
|
5
|
|
|
5
|
|
107615
|
use warnings; |
|
5
|
|
|
|
|
12
|
|
|
5
|
|
|
|
|
155
|
|
4
|
5
|
|
|
5
|
|
23
|
use strict; |
|
5
|
|
|
|
|
11
|
|
|
5
|
|
|
|
|
94
|
|
5
|
5
|
|
|
5
|
|
101
|
use 5.010; |
|
5
|
|
|
|
|
20
|
|
6
|
|
|
|
|
|
|
|
7
|
5
|
|
|
5
|
|
2684
|
use Pod::Cats::Parser::MGC; |
|
5
|
|
|
|
|
15
|
|
|
5
|
|
|
|
|
188
|
|
8
|
5
|
|
|
5
|
|
43
|
use List::Util qw(min max); |
|
5
|
|
|
|
|
45
|
|
|
5
|
|
|
|
|
549
|
|
9
|
5
|
|
|
5
|
|
26
|
use Carp; |
|
5
|
|
|
|
|
8
|
|
|
5
|
|
|
|
|
303
|
|
10
|
5
|
|
|
5
|
|
4274
|
use String::Util qw/crunch/; |
|
5
|
|
|
|
|
18298
|
|
|
5
|
|
|
|
|
8492
|
|
11
|
|
|
|
|
|
|
|
12
|
|
|
|
|
|
|
=head1 NAME |
13
|
|
|
|
|
|
|
|
14
|
|
|
|
|
|
|
Pod::Cats - The POD-like markup language written for podcats.in |
15
|
|
|
|
|
|
|
|
16
|
|
|
|
|
|
|
=head1 VERSION |
17
|
|
|
|
|
|
|
|
18
|
|
|
|
|
|
|
Version 0.05 |
19
|
|
|
|
|
|
|
|
20
|
|
|
|
|
|
|
=head1 DESCRIPTION |
21
|
|
|
|
|
|
|
|
22
|
|
|
|
|
|
|
POD is an expressive markup language - like Perl is an expressive programming |
23
|
|
|
|
|
|
|
language - and for a plain text file format there is little finer. Pod::Cats is |
24
|
|
|
|
|
|
|
an extension of the POD semantics that adds more syntax and more flexibility to |
25
|
|
|
|
|
|
|
the language. |
26
|
|
|
|
|
|
|
|
27
|
|
|
|
|
|
|
Pod::Cats is designed to be extended and doesn't implement any default |
28
|
|
|
|
|
|
|
commands or entities. |
29
|
|
|
|
|
|
|
|
30
|
|
|
|
|
|
|
=head1 SYNTAX |
31
|
|
|
|
|
|
|
|
32
|
|
|
|
|
|
|
Pod::Cats syntax borrows ideas from POD and adds its own. |
33
|
|
|
|
|
|
|
|
34
|
|
|
|
|
|
|
A paragraph is any block of text delimited by blank lines (whitespace ignored). |
35
|
|
|
|
|
|
|
This is the same as POD, and basically allows you to use hard word wrapping in |
36
|
|
|
|
|
|
|
your markup without having to join them all together for output later. |
37
|
|
|
|
|
|
|
|
38
|
|
|
|
|
|
|
There are three command paragraphs, which are defined by their first character. |
39
|
|
|
|
|
|
|
This character must be in the first column; whitespace at the start of a |
40
|
|
|
|
|
|
|
paragraph is syntactically relevant. |
41
|
|
|
|
|
|
|
|
42
|
|
|
|
|
|
|
=over 4 |
43
|
|
|
|
|
|
|
|
44
|
|
|
|
|
|
|
=item C<=COMMAND CONTENT> |
45
|
|
|
|
|
|
|
X |
46
|
|
|
|
|
|
|
|
47
|
|
|
|
|
|
|
A line beginning with the C<=> symbol denotes a single I. Usually this |
48
|
|
|
|
|
|
|
will be some sort of header, perhaps the equivalent of a C<< >>, something |
49
|
|
|
|
|
|
|
like that. It is roughly equivalent to the self-closing tag in XML. B |
50
|
|
|
|
|
|
|
is just text that may or may not be present. The relationship of B to |
51
|
|
|
|
|
|
|
the B is for you to define, as is the meaning of B. |
52
|
|
|
|
|
|
|
|
53
|
|
|
|
|
|
|
When a C<=COMMAND> block is completed, it is passed to L. |
54
|
|
|
|
|
|
|
|
55
|
|
|
|
|
|
|
=item C<+NAME CONTENT> |
56
|
|
|
|
|
|
|
X |
57
|
|
|
|
|
|
|
|
58
|
|
|
|
|
|
|
A line beginning with C<+> opens a named block; its name is B. Similar to |
59
|
|
|
|
|
|
|
C<=COMMAND>, the B is arbitrary, and its relationship to the B of |
60
|
|
|
|
|
|
|
the block is up to you. |
61
|
|
|
|
|
|
|
|
62
|
|
|
|
|
|
|
When this is encountered you are invited to L. |
63
|
|
|
|
|
|
|
|
64
|
|
|
|
|
|
|
=item C<-NAME> |
65
|
|
|
|
|
|
|
X |
66
|
|
|
|
|
|
|
|
67
|
|
|
|
|
|
|
A line beginning with C<-> is the end of the named block previously started. |
68
|
|
|
|
|
|
|
These must match in reverse order to the C<+> block with the matching B - |
69
|
|
|
|
|
|
|
basically the same as XML's pairs. It is passed to L, |
70
|
|
|
|
|
|
|
and unlike the other two command paragraphs it accepts no content. |
71
|
|
|
|
|
|
|
|
72
|
|
|
|
|
|
|
=back |
73
|
|
|
|
|
|
|
|
74
|
|
|
|
|
|
|
Then there are two types of text paragraph, for which the text is not |
75
|
|
|
|
|
|
|
syntactically relevant but whitespace still is: |
76
|
|
|
|
|
|
|
|
77
|
|
|
|
|
|
|
=over 4 |
78
|
|
|
|
|
|
|
|
79
|
|
|
|
|
|
|
=item Verbatim paragraphs |
80
|
|
|
|
|
|
|
|
81
|
|
|
|
|
|
|
A line whose first character is whitespace is considered verbatim. No removal of |
82
|
|
|
|
|
|
|
whitespace is done to the rest of the paragraph if the first character is |
83
|
|
|
|
|
|
|
whitespace; all your text is repeated verbatim, hence the name |
84
|
|
|
|
|
|
|
|
85
|
|
|
|
|
|
|
The verbatim paragraph continues until the first non-verbatim paragraph is |
86
|
|
|
|
|
|
|
encountered. A blank line is no longer considered to end the paragraph. |
87
|
|
|
|
|
|
|
Therefore, two verbatim paragraphs can only be separated by a non-verbatim |
88
|
|
|
|
|
|
|
paragraph with non-whitespace content. The special formatting code C<< ZZ<><> >> |
89
|
|
|
|
|
|
|
can be used on its own to separate them with zero-width content. |
90
|
|
|
|
|
|
|
|
91
|
|
|
|
|
|
|
All lines in the verbatim paragraph will have their leading whitespace removed. |
92
|
|
|
|
|
|
|
This is done intelligently: the I amount of leading whitespace found on |
93
|
|
|
|
|
|
|
any line is removed from all lines. This allows you to indent other lines (even |
94
|
|
|
|
|
|
|
the first one) relative to the syntactic whitespace that defines the verbatim |
95
|
|
|
|
|
|
|
paragraph without your indentation being parsed out. |
96
|
|
|
|
|
|
|
|
97
|
|
|
|
|
|
|
L are not parsed in verbatim paragraphs, as expected. |
98
|
|
|
|
|
|
|
|
99
|
|
|
|
|
|
|
When a verbatim paragraph has been collated, it is passed to L. |
100
|
|
|
|
|
|
|
|
101
|
|
|
|
|
|
|
=item Paragraphs |
102
|
|
|
|
|
|
|
|
103
|
|
|
|
|
|
|
Everything that doesn't get caught by one of the above rules is deemed to be a |
104
|
|
|
|
|
|
|
plain text paragraph. As with all paragraphs, a single line break is removed by |
105
|
|
|
|
|
|
|
the parser and a blank line causes the paragraph to be processed. It is passed |
106
|
|
|
|
|
|
|
to L. |
107
|
|
|
|
|
|
|
|
108
|
|
|
|
|
|
|
=back |
109
|
|
|
|
|
|
|
|
110
|
|
|
|
|
|
|
And finally the inline formatting markup, entities. |
111
|
|
|
|
|
|
|
|
112
|
|
|
|
|
|
|
=over |
113
|
|
|
|
|
|
|
|
114
|
|
|
|
|
|
|
=item C<< XZ<><> >> |
115
|
|
|
|
|
|
|
X X |
116
|
|
|
|
|
|
|
|
117
|
|
|
|
|
|
|
An entity is defined as a capital letter followed by a delimiter that is |
118
|
|
|
|
|
|
|
repeated n times, then any amount of text up to a matching quantity of a |
119
|
|
|
|
|
|
|
balanced delimiter. |
120
|
|
|
|
|
|
|
|
121
|
|
|
|
|
|
|
In normal POD the only delimiter is C<< < >>, so entities have the format C<< |
122
|
|
|
|
|
|
|
XZ<><> >>; except that the opening delimiter may be duplicated as long as the |
123
|
|
|
|
|
|
|
closing delimiter matches, allowing you to put the delimiter itself inside the |
124
|
|
|
|
|
|
|
entity: C<<< XZ<><<>> >>>; in Pod::Cats you can use any delimiter, removing the |
125
|
|
|
|
|
|
|
requirement to duplicate it at all: C<< C[ XZ<><> ] >>. |
126
|
|
|
|
|
|
|
|
127
|
|
|
|
|
|
|
Once an entity has begun, nested entities are only considered if the delimiters |
128
|
|
|
|
|
|
|
are the same as those used for the outer entity: C<< B[ I[bold-italic] ] >>; |
129
|
|
|
|
|
|
|
C<< B[IZ<>] >>. |
130
|
|
|
|
|
|
|
|
131
|
|
|
|
|
|
|
Apart from the special entity C<< ZZ<><> >>, the letter used for the entity has |
132
|
|
|
|
|
|
|
no inherent meaning to Pod::Cats. The parsed entity is provided to |
133
|
|
|
|
|
|
|
L. C<< ZZ<><> >> retains its meaning from POD, which is to be a |
134
|
|
|
|
|
|
|
zero-width 'divider' to break up things that would otherwise be considered |
135
|
|
|
|
|
|
|
syntax. You are not given C<< ZZ<><> >> to handle, and C<< ZZ<><> >> itself will |
136
|
|
|
|
|
|
|
produce undef if it is the only content to an element. A paragraph comprising solely |
137
|
|
|
|
|
|
|
C<< ZZ<><> >> will never generate a parsed paragraph; it will be skipped. |
138
|
|
|
|
|
|
|
|
139
|
|
|
|
|
|
|
=back |
140
|
|
|
|
|
|
|
|
141
|
|
|
|
|
|
|
=head1 METHODS |
142
|
|
|
|
|
|
|
|
143
|
|
|
|
|
|
|
=cut |
144
|
|
|
|
|
|
|
|
145
|
|
|
|
|
|
|
our $VERSION = '0.08'; |
146
|
|
|
|
|
|
|
|
147
|
|
|
|
|
|
|
=head2 new |
148
|
|
|
|
|
|
|
|
149
|
|
|
|
|
|
|
Create a new parser. Options are provided as a hashref, but there is currently |
150
|
|
|
|
|
|
|
only one: |
151
|
|
|
|
|
|
|
|
152
|
|
|
|
|
|
|
=over |
153
|
|
|
|
|
|
|
|
154
|
|
|
|
|
|
|
=item delimiters |
155
|
|
|
|
|
|
|
|
156
|
|
|
|
|
|
|
A string containing delimiters to use. Bracketed delimiters will be balanced; |
157
|
|
|
|
|
|
|
other delimiters will simply be used as-is. This echoes the delimiter philosophy |
158
|
|
|
|
|
|
|
of Perl syntax such as regexes and C. The string should be all the possible |
159
|
|
|
|
|
|
|
delimiters, listed once each, and only the opening brackets of balanced pairs. |
160
|
|
|
|
|
|
|
|
161
|
|
|
|
|
|
|
The default is C<< '<' >>, same as POD. |
162
|
|
|
|
|
|
|
|
163
|
|
|
|
|
|
|
=back |
164
|
|
|
|
|
|
|
|
165
|
|
|
|
|
|
|
=cut |
166
|
|
|
|
|
|
|
|
167
|
|
|
|
|
|
|
sub new { |
168
|
3
|
|
|
3
|
1
|
83
|
my $class = shift; |
169
|
3
|
|
100
|
|
|
20
|
my $opts = shift || {}; |
170
|
3
|
|
|
|
|
9
|
my $self = bless $opts, $class; # FIXME |
171
|
|
|
|
|
|
|
|
172
|
3
|
|
|
|
|
9
|
return $self; |
173
|
|
|
|
|
|
|
} |
174
|
|
|
|
|
|
|
|
175
|
|
|
|
|
|
|
=head2 parse |
176
|
|
|
|
|
|
|
|
177
|
|
|
|
|
|
|
Parses a string containing whatever Pod::Cats code you have. |
178
|
|
|
|
|
|
|
|
179
|
|
|
|
|
|
|
=cut |
180
|
|
|
|
|
|
|
|
181
|
|
|
|
|
|
|
sub parse { |
182
|
0
|
|
|
0
|
1
|
0
|
my ($self, $string) = @_; |
183
|
|
|
|
|
|
|
|
184
|
0
|
|
|
|
|
0
|
return $self->parse_lines(split /\n/, $string); |
185
|
|
|
|
|
|
|
} |
186
|
|
|
|
|
|
|
|
187
|
|
|
|
|
|
|
=head2 parse_file |
188
|
|
|
|
|
|
|
|
189
|
|
|
|
|
|
|
Opens the file given by filename and reads it all in and then parses that. |
190
|
|
|
|
|
|
|
|
191
|
|
|
|
|
|
|
=cut |
192
|
|
|
|
|
|
|
|
193
|
|
|
|
|
|
|
sub parse_file { |
194
|
0
|
|
|
0
|
1
|
0
|
my ($self, $filename) = @_; |
195
|
|
|
|
|
|
|
|
196
|
0
|
0
|
|
|
|
0
|
carp "File not found: " . $filename unless -e $filename; |
197
|
|
|
|
|
|
|
|
198
|
0
|
|
|
|
|
0
|
open my $fh, "<", $filename; |
199
|
0
|
|
|
|
|
0
|
chomp(my @lines = <$fh>); |
200
|
0
|
|
|
|
|
0
|
close $fh; |
201
|
|
|
|
|
|
|
|
202
|
0
|
|
|
|
|
0
|
return $self->parse_lines(@lines); |
203
|
|
|
|
|
|
|
} |
204
|
|
|
|
|
|
|
|
205
|
|
|
|
|
|
|
=head2 parse_lines |
206
|
|
|
|
|
|
|
|
207
|
|
|
|
|
|
|
L and L both come here, which just takes the markup text |
208
|
|
|
|
|
|
|
as an array of lines and parses them. This is where the logic happens. It is |
209
|
|
|
|
|
|
|
exposed publicly so you can parse an array of your own if you want. |
210
|
|
|
|
|
|
|
|
211
|
|
|
|
|
|
|
=cut |
212
|
|
|
|
|
|
|
|
213
|
|
|
|
|
|
|
sub parse_lines { |
214
|
3
|
|
|
3
|
1
|
103
|
my ($self, @lines) = @_; |
215
|
|
|
|
|
|
|
|
216
|
3
|
|
|
|
|
9
|
my $result = ""; |
217
|
3
|
|
|
|
|
23
|
$self->_new_buffer; |
218
|
|
|
|
|
|
|
|
219
|
|
|
|
|
|
|
# Special lines are: |
220
|
|
|
|
|
|
|
# - a blank line. An exception is between verbatim paragraphs, so we will |
221
|
|
|
|
|
|
|
# simply re-merge verbatim paras later on |
222
|
|
|
|
|
|
|
# - A line starting with =, + or -. Command paragraph. Process the previous |
223
|
|
|
|
|
|
|
# buffer and start a new one with this. |
224
|
|
|
|
|
|
|
# - Anything else continues the previous buffer, or starts a normal paragraph |
225
|
|
|
|
|
|
|
|
226
|
3
|
|
|
|
|
5
|
my $line_num = @lines; |
227
|
3
|
|
|
|
|
22
|
shift @lines while $lines[0] !~ /\S/; # shift off leading blank lines! |
228
|
|
|
|
|
|
|
|
229
|
|
|
|
|
|
|
# Adjust for lines we removed. |
230
|
3
|
|
|
|
|
6
|
$line_num = $line_num - @lines; |
231
|
|
|
|
|
|
|
|
232
|
3
|
|
50
|
0
|
|
32
|
my $original_warn = $SIG{__WARN__} || sub { warn $_[0] }; |
|
0
|
|
|
|
|
0
|
|
233
|
|
|
|
|
|
|
local $SIG{__WARN__} = sub { |
234
|
0
|
|
|
0
|
|
0
|
my $warning = shift; |
235
|
0
|
|
|
|
|
0
|
$warning =~ s/at .+/at line $line_num/; |
236
|
0
|
|
|
|
|
0
|
$original_warn->($warning); |
237
|
3
|
|
|
|
|
24
|
}; |
238
|
|
|
|
|
|
|
|
239
|
3
|
|
|
|
|
10
|
for my $line (@lines) { |
240
|
46
|
|
|
|
|
61
|
$line_num++; |
241
|
|
|
|
|
|
|
|
242
|
46
|
|
|
|
|
85
|
for ($line) { |
243
|
46
|
100
|
|
|
|
222
|
if (/^\s*$/) { |
|
|
50
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
244
|
21
|
|
|
|
|
60
|
$self->_handle_blank_line; |
245
|
|
|
|
|
|
|
} |
246
|
|
|
|
|
|
|
elsif (/^([=+-])/) { |
247
|
0
|
|
|
|
|
0
|
my $type = $1; |
248
|
|
|
|
|
|
|
my $buftype = { |
249
|
|
|
|
|
|
|
'+' => 'begin', |
250
|
|
|
|
|
|
|
'-' => 'end', |
251
|
|
|
|
|
|
|
'=' => 'command', |
252
|
0
|
|
|
|
|
0
|
}->{$type}; |
253
|
|
|
|
|
|
|
|
254
|
|
|
|
|
|
|
# First word is command name; rest is buffer. |
255
|
0
|
|
|
|
|
0
|
my ($command, $buffer) = $line =~ /^\Q$type\E(\w+)\s*(.*)$/; |
256
|
|
|
|
|
|
|
|
257
|
0
|
|
|
|
|
0
|
$self->_handle_command_line($buftype, $command); |
258
|
0
|
|
|
|
|
0
|
$self->_buffer($buffer); |
259
|
|
|
|
|
|
|
} |
260
|
|
|
|
|
|
|
elsif (/^\s+\S/) { |
261
|
10
|
|
|
|
|
30
|
$self->_handle_verbatim_line($line); |
262
|
|
|
|
|
|
|
} |
263
|
|
|
|
|
|
|
else { |
264
|
15
|
|
|
|
|
53
|
$self->_handle_normal_line($line); |
265
|
|
|
|
|
|
|
} |
266
|
|
|
|
|
|
|
} |
267
|
|
|
|
|
|
|
} |
268
|
|
|
|
|
|
|
|
269
|
3
|
|
|
|
|
24
|
$self->_end_of_line; |
270
|
|
|
|
|
|
|
} |
271
|
|
|
|
|
|
|
|
272
|
|
|
|
|
|
|
sub _buffer { |
273
|
54
|
|
|
54
|
|
76
|
my $self = shift; |
274
|
54
|
|
|
|
|
71
|
push @{ $self->{buffer} }, @_; |
|
54
|
|
|
|
|
157
|
|
275
|
|
|
|
|
|
|
} |
276
|
|
|
|
|
|
|
|
277
|
|
|
|
|
|
|
sub _new_buffer { |
278
|
38
|
|
|
38
|
|
54
|
my $self = shift; |
279
|
|
|
|
|
|
|
# The buffer type goes in the first element, and its |
280
|
|
|
|
|
|
|
# contents, if any, in the rest. |
281
|
38
|
|
|
|
|
237
|
$self->{buffer} = []; |
282
|
|
|
|
|
|
|
} |
283
|
|
|
|
|
|
|
|
284
|
|
|
|
|
|
|
sub _end_previous_buffer { |
285
|
10
|
|
|
10
|
|
18
|
my $self = shift; |
286
|
10
|
100
|
|
|
|
21
|
return unless $self->_buftype; |
287
|
|
|
|
|
|
|
|
288
|
5
|
50
|
|
|
|
13
|
if ($self->_buftype eq 'verbatim') { |
289
|
|
|
|
|
|
|
warn "Verbatim paragraph ended without blank line" |
290
|
5
|
50
|
|
|
|
18
|
if $self->{buffer}->[-1] =~ /\S/; |
291
|
|
|
|
|
|
|
|
292
|
5
|
|
|
|
|
12
|
$self->_process_buffer; |
293
|
|
|
|
|
|
|
return |
294
|
5
|
|
|
|
|
10
|
} |
295
|
|
|
|
|
|
|
|
296
|
0
|
|
|
|
|
0
|
warn $self->_buftype . " ended without blank line"; |
297
|
0
|
|
|
|
|
0
|
$self->_process_buffer; |
298
|
|
|
|
|
|
|
|
299
|
0
|
|
|
|
|
0
|
return 0; |
300
|
|
|
|
|
|
|
} |
301
|
|
|
|
|
|
|
|
302
|
|
|
|
|
|
|
sub _handle_blank_line { |
303
|
21
|
|
|
21
|
|
27
|
my $self = shift; |
304
|
21
|
50
|
|
|
|
45
|
return unless $self->_buftype; |
305
|
|
|
|
|
|
|
|
306
|
|
|
|
|
|
|
# Handle a verbatim buffer when a different buffer starts, or we run out of |
307
|
|
|
|
|
|
|
# lines. |
308
|
21
|
100
|
|
|
|
49
|
if ($self->_buftype eq 'verbatim') { |
309
|
9
|
|
|
|
|
20
|
$self->_buffer(''); |
310
|
9
|
|
|
|
|
23
|
return; |
311
|
|
|
|
|
|
|
} |
312
|
|
|
|
|
|
|
|
313
|
12
|
|
|
|
|
44
|
$self->_process_buffer; |
314
|
|
|
|
|
|
|
} |
315
|
|
|
|
|
|
|
|
316
|
|
|
|
|
|
|
sub _handle_command_line { |
317
|
0
|
|
|
0
|
|
0
|
my $self = shift; |
318
|
0
|
|
|
|
|
0
|
my $type = shift; |
319
|
0
|
|
|
|
|
0
|
my $command = shift; |
320
|
|
|
|
|
|
|
|
321
|
0
|
|
|
|
|
0
|
$self->_end_previous_buffer; |
322
|
|
|
|
|
|
|
|
323
|
0
|
|
|
|
|
0
|
$self->_buffer($type, $command); |
324
|
|
|
|
|
|
|
} |
325
|
|
|
|
|
|
|
|
326
|
|
|
|
|
|
|
# Verbatim lines just get buffered until something else happens. |
327
|
|
|
|
|
|
|
sub _handle_verbatim_line { |
328
|
10
|
|
|
10
|
|
14
|
my $self = shift; |
329
|
|
|
|
|
|
|
|
330
|
10
|
100
|
|
|
|
25
|
if ($self->_buftype ne 'verbatim') { |
331
|
5
|
|
|
|
|
16
|
$self->_end_previous_buffer; |
332
|
5
|
|
|
|
|
12
|
$self->_buffer('verbatim'); |
333
|
|
|
|
|
|
|
} |
334
|
10
|
|
|
|
|
26
|
$self->_buffer(@_); |
335
|
|
|
|
|
|
|
} |
336
|
|
|
|
|
|
|
|
337
|
|
|
|
|
|
|
sub _handle_normal_line { |
338
|
15
|
|
|
15
|
|
31
|
my $self = shift; |
339
|
15
|
|
|
|
|
23
|
my $line = shift; |
340
|
|
|
|
|
|
|
|
341
|
|
|
|
|
|
|
# Nothing special, continue previous buffer or start a paragraph. |
342
|
15
|
100
|
|
|
|
101
|
if ($self->_buftype eq 'verbatim') { |
343
|
5
|
|
|
|
|
66
|
$self->_end_previous_buffer; |
344
|
|
|
|
|
|
|
} |
345
|
|
|
|
|
|
|
|
346
|
15
|
50
|
|
|
|
40
|
if (not $self->_buftype) { |
347
|
15
|
|
|
|
|
49
|
$self->_buffer('paragraph'); |
348
|
|
|
|
|
|
|
} |
349
|
|
|
|
|
|
|
|
350
|
15
|
|
|
|
|
35
|
$self->_buffer($line); |
351
|
|
|
|
|
|
|
} |
352
|
|
|
|
|
|
|
|
353
|
|
|
|
|
|
|
sub _buftype { |
354
|
117
|
|
|
117
|
|
149
|
my $self = shift; |
355
|
117
|
100
|
|
|
|
129
|
if (@{ $self->{buffer} }) { |
|
117
|
|
|
|
|
320
|
|
356
|
82
|
|
|
|
|
257
|
return $self->{buffer}->[0]; |
357
|
|
|
|
|
|
|
} |
358
|
35
|
|
|
|
|
108
|
return ''; |
359
|
|
|
|
|
|
|
} |
360
|
|
|
|
|
|
|
|
361
|
|
|
|
|
|
|
sub _end_of_line { |
362
|
3
|
|
|
3
|
|
8
|
my $self = shift; |
363
|
3
|
|
|
|
|
10
|
$self->_process_buffer; |
364
|
|
|
|
|
|
|
} |
365
|
|
|
|
|
|
|
|
366
|
|
|
|
|
|
|
sub _process_buffer { |
367
|
20
|
|
|
20
|
|
28
|
my $self = shift; |
368
|
|
|
|
|
|
|
|
369
|
20
|
50
|
|
|
|
45
|
return '' unless $self->_buftype; |
370
|
|
|
|
|
|
|
|
371
|
20
|
|
|
|
|
45
|
my @buffer = @{$self->{buffer}}; |
|
20
|
|
|
|
|
55
|
|
372
|
|
|
|
|
|
|
|
373
|
20
|
|
|
|
|
42
|
for (shift @buffer) { |
374
|
20
|
100
|
0
|
|
|
53
|
if($_ eq 'paragraph') { |
|
|
50
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
375
|
|
|
|
|
|
|
# concatenate the lines and normalise whitespace. |
376
|
15
|
|
|
|
|
72
|
my $para = crunch(join " ", @buffer); |
377
|
15
|
|
|
|
|
252
|
$self->handle_paragraph($self->_process_entities($para)); |
378
|
15
|
|
|
|
|
6675
|
$self->_new_buffer; |
379
|
|
|
|
|
|
|
} |
380
|
|
|
|
|
|
|
elsif($_ eq 'verbatim') { |
381
|
|
|
|
|
|
|
# There will have been a warning if this is not the empty string |
382
|
5
|
50
|
|
|
|
15
|
pop @buffer if $buffer[-1] eq ''; |
383
|
|
|
|
|
|
|
|
384
|
|
|
|
|
|
|
# find the lowest level of indentation in this buffer and strip it |
385
|
|
|
|
|
|
|
# Avoid zeroes from blank lines in between |
386
|
5
|
100
|
|
|
|
9
|
my $indent_level = min map { /^(\s*)/; length $1 || () } @buffer; |
|
14
|
|
|
|
|
32
|
|
|
14
|
|
|
|
|
64
|
|
387
|
5
|
|
|
|
|
68
|
s/^\s{$indent_level}// for @buffer; |
388
|
|
|
|
|
|
|
|
389
|
5
|
|
|
|
|
14
|
my $para = join "\n", @buffer; |
390
|
5
|
|
|
|
|
16
|
$self->handle_verbatim($para); |
391
|
|
|
|
|
|
|
} |
392
|
|
|
|
|
|
|
elsif($_ eq 'command' || $_ eq 'begin') { |
393
|
0
|
|
|
|
|
0
|
my $type = shift @buffer; |
394
|
0
|
|
|
|
|
0
|
my $content = crunch(join " ", @buffer); |
395
|
0
|
|
|
|
|
0
|
$self->${\"handle_$_"}($type, $self->_process_entities($content)); |
|
0
|
|
|
|
|
0
|
|
396
|
|
|
|
|
|
|
} |
397
|
|
|
|
|
|
|
elsif($_ eq 'end') { |
398
|
0
|
|
|
|
|
0
|
$self->handle_end($buffer[0]); |
399
|
|
|
|
|
|
|
} |
400
|
|
|
|
|
|
|
} |
401
|
|
|
|
|
|
|
|
402
|
20
|
|
|
|
|
5266
|
$self->_new_buffer; |
403
|
|
|
|
|
|
|
} |
404
|
|
|
|
|
|
|
|
405
|
|
|
|
|
|
|
=head2 handle_verbatim |
406
|
|
|
|
|
|
|
|
407
|
|
|
|
|
|
|
The verbatim paragraph as it was in the code, except with the minimum amount of |
408
|
|
|
|
|
|
|
whitespace stripped from each line as described in L. |
409
|
|
|
|
|
|
|
Passed in as a single string with line breaks preserved. |
410
|
|
|
|
|
|
|
|
411
|
|
|
|
|
|
|
Do whatever you want. Default is to return the string straight back atcha. |
412
|
|
|
|
|
|
|
|
413
|
|
|
|
|
|
|
=cut |
414
|
|
|
|
|
|
|
|
415
|
|
|
|
|
|
|
sub handle_verbatim { |
416
|
5
|
|
|
5
|
1
|
23
|
shift; |
417
|
5
|
|
|
|
|
11
|
shift; |
418
|
|
|
|
|
|
|
} |
419
|
|
|
|
|
|
|
|
420
|
|
|
|
|
|
|
=head2 handle_entity |
421
|
|
|
|
|
|
|
|
422
|
|
|
|
|
|
|
Passed the letter of the L as the first argument and its content |
423
|
|
|
|
|
|
|
as the rest of @_. The content will alternate between plain text and the return |
424
|
|
|
|
|
|
|
value of this function for any nested entities inside this one. |
425
|
|
|
|
|
|
|
|
426
|
|
|
|
|
|
|
For this reason you should return a scalar from this method, be it text or a |
427
|
|
|
|
|
|
|
ref. The default is to concatenate @_, thus replacing entities with their |
428
|
|
|
|
|
|
|
contents. |
429
|
|
|
|
|
|
|
|
430
|
|
|
|
|
|
|
Note that this method is the only one whose return value is of relevance to the |
431
|
|
|
|
|
|
|
parser, since what you return from this will appear in another handler, |
432
|
|
|
|
|
|
|
depending on what type of paragraph the entity is in. |
433
|
|
|
|
|
|
|
|
434
|
|
|
|
|
|
|
You will never get the C<< ZZ<><> >> entity. |
435
|
|
|
|
|
|
|
|
436
|
|
|
|
|
|
|
=cut |
437
|
|
|
|
|
|
|
|
438
|
|
|
|
|
|
|
sub handle_entity { |
439
|
5
|
|
|
5
|
1
|
6827
|
shift; shift; |
|
5
|
|
|
|
|
6
|
|
440
|
5
|
|
100
|
|
|
38
|
join ' ', map $_ // '', @_; |
441
|
|
|
|
|
|
|
} |
442
|
|
|
|
|
|
|
|
443
|
|
|
|
|
|
|
# preprocess paragraph before giving it to the user. handle_entity is called |
444
|
|
|
|
|
|
|
# from the parser itself. |
445
|
|
|
|
|
|
|
sub _process_entities { |
446
|
15
|
|
|
15
|
|
26
|
my ($self, $para) = @_; |
447
|
|
|
|
|
|
|
|
448
|
|
|
|
|
|
|
# 1. replace POD-like Z<...> with user-defined functions. |
449
|
|
|
|
|
|
|
# Z itself is the only actual exception to that. |
450
|
|
|
|
|
|
|
$self->{parser} ||= Pod::Cats::Parser::MGC->new( |
451
|
|
|
|
|
|
|
object => $self, |
452
|
15
|
|
100
|
|
|
89
|
delimiters => $self->{delimiters} // '<' |
|
|
|
66
|
|
|
|
|
453
|
|
|
|
|
|
|
); |
454
|
|
|
|
|
|
|
|
455
|
15
|
|
|
|
|
57
|
my $parsed = $self->{parser}->from_string( $para ); |
456
|
|
|
|
|
|
|
|
457
|
|
|
|
|
|
|
# Single return of undef was Z<> |
458
|
15
|
100
|
66
|
|
|
1231
|
return defined $parsed->[0] || @$parsed > 1 ? @$parsed : (); |
459
|
|
|
|
|
|
|
} |
460
|
|
|
|
|
|
|
|
461
|
|
|
|
|
|
|
=head2 handle_paragraph |
462
|
|
|
|
|
|
|
|
463
|
|
|
|
|
|
|
The paragraph is split into sections that alternate between plain text and the |
464
|
|
|
|
|
|
|
return values of L as described above. These |
465
|
|
|
|
|
|
|
sections are arrayed in @_. Note that the paragraph could start with an entity. |
466
|
|
|
|
|
|
|
|
467
|
|
|
|
|
|
|
By default it returns @_ concatenated, since the default behaviour of |
468
|
|
|
|
|
|
|
L is to remove the formatting but keep the |
469
|
|
|
|
|
|
|
contents. |
470
|
|
|
|
|
|
|
|
471
|
|
|
|
|
|
|
=cut |
472
|
|
|
|
|
|
|
|
473
|
|
|
|
|
|
|
sub handle_paragraph { |
474
|
15
|
|
100
|
15
|
1
|
37
|
shift; join ' ', map $_ // '', @_; |
|
15
|
|
|
|
|
90
|
|
475
|
|
|
|
|
|
|
} |
476
|
|
|
|
|
|
|
|
477
|
|
|
|
|
|
|
=head2 handle_command |
478
|
|
|
|
|
|
|
|
479
|
|
|
|
|
|
|
When a L is encountered it comes here. The first argument is |
480
|
|
|
|
|
|
|
the B (from B<=COMMAND>); the rest of the arguments follow the rules of |
481
|
|
|
|
|
|
|
L and alternate between plain text and parsed |
482
|
|
|
|
|
|
|
entities. |
483
|
|
|
|
|
|
|
|
484
|
|
|
|
|
|
|
By default it returns @_ concatenated, same as paragraphs. |
485
|
|
|
|
|
|
|
|
486
|
|
|
|
|
|
|
=cut |
487
|
|
|
|
|
|
|
|
488
|
|
|
|
|
|
|
sub handle_command { |
489
|
0
|
|
0
|
0
|
1
|
|
shift; shift; join ' ', map $_ // '', @_; |
|
0
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
|
490
|
|
|
|
|
|
|
} |
491
|
|
|
|
|
|
|
|
492
|
|
|
|
|
|
|
=head2 handle_begin |
493
|
|
|
|
|
|
|
|
494
|
|
|
|
|
|
|
This is handled the same as L, except it is |
495
|
|
|
|
|
|
|
called when a L command is encountered. The same rules apply. |
496
|
|
|
|
|
|
|
|
497
|
|
|
|
|
|
|
=cut |
498
|
|
|
|
|
|
|
|
499
|
|
|
|
|
|
|
sub handle_begin { |
500
|
0
|
|
0
|
0
|
1
|
|
shift; shift; join ' ', map $_ // '', @_; |
|
0
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
|
501
|
|
|
|
|
|
|
} |
502
|
|
|
|
|
|
|
|
503
|
|
|
|
|
|
|
=head2 handle_end |
504
|
|
|
|
|
|
|
|
505
|
|
|
|
|
|
|
The counterpart to the begin handler. This is called when the L paragraph |
506
|
|
|
|
|
|
|
is encountered. The parser will already have discovered whether your begins and |
507
|
|
|
|
|
|
|
ends are not balanced so you don't need to worry about that. |
508
|
|
|
|
|
|
|
|
509
|
|
|
|
|
|
|
Note that there is no content for an end paragraph so the only argument this |
510
|
|
|
|
|
|
|
gets is the command name. |
511
|
|
|
|
|
|
|
|
512
|
|
|
|
|
|
|
=cut |
513
|
|
|
|
|
|
|
|
514
|
|
|
|
0
|
1
|
|
sub handle_end { } |
515
|
|
|
|
|
|
|
|
516
|
|
|
|
|
|
|
=head1 TODO |
517
|
|
|
|
|
|
|
|
518
|
|
|
|
|
|
|
=over |
519
|
|
|
|
|
|
|
|
520
|
|
|
|
|
|
|
=item The document is parsed into DOM, then events are fired SAX-like. |
521
|
|
|
|
|
|
|
Preferable to fire the events and build the DOM from that. |
522
|
|
|
|
|
|
|
|
523
|
|
|
|
|
|
|
=item Currently the matching of begin/end commands is a bit naive. |
524
|
|
|
|
|
|
|
|
525
|
|
|
|
|
|
|
=item Line numbers of errors are not yet reported. |
526
|
|
|
|
|
|
|
|
527
|
|
|
|
|
|
|
=back |
528
|
|
|
|
|
|
|
|
529
|
|
|
|
|
|
|
=head1 AUTHOR |
530
|
|
|
|
|
|
|
|
531
|
|
|
|
|
|
|
Altreus, C<< >> |
532
|
|
|
|
|
|
|
|
533
|
|
|
|
|
|
|
=head1 BUGS |
534
|
|
|
|
|
|
|
|
535
|
|
|
|
|
|
|
Bug reports to github please: http://github.com/Altreus/Pod-Cats/issues |
536
|
|
|
|
|
|
|
|
537
|
|
|
|
|
|
|
=head1 SUPPORT |
538
|
|
|
|
|
|
|
|
539
|
|
|
|
|
|
|
You are reading the only documentation for this module. |
540
|
|
|
|
|
|
|
|
541
|
|
|
|
|
|
|
For more help, give me a holler on irc.freenode.com #perl |
542
|
|
|
|
|
|
|
|
543
|
|
|
|
|
|
|
=head1 ACKNOWLEDGEMENTS |
544
|
|
|
|
|
|
|
|
545
|
|
|
|
|
|
|
Paul Evans (LeoNerd) basically wrote Parser::MGC because I was whining about not |
546
|
|
|
|
|
|
|
being able to parse these entity delimiters with any of the token parsers I |
547
|
|
|
|
|
|
|
could find; and then he wrote a POD example that I only had to tweak in order to |
548
|
|
|
|
|
|
|
do so. So a lot of the credit should go to him! |
549
|
|
|
|
|
|
|
|
550
|
|
|
|
|
|
|
=head1 LICENSE AND COPYRIGHT |
551
|
|
|
|
|
|
|
|
552
|
|
|
|
|
|
|
Copyright 2013 Altreus. |
553
|
|
|
|
|
|
|
|
554
|
|
|
|
|
|
|
This module is released under the MIT licence. |
555
|
|
|
|
|
|
|
|
556
|
|
|
|
|
|
|
=cut |
557
|
|
|
|
|
|
|
|
558
|
|
|
|
|
|
|
1; |