File Coverage

blib/lib/Pod/Cats.pm
Criterion Covered Total %
statement 104 140 74.2
branch 28 42 66.6
condition 13 23 56.5
subroutine 22 30 73.3
pod 10 10 100.0
total 177 245 72.2


line stmt bran cond sub pod time code
1             package Pod::Cats;
2              
3 5     5   107615 use warnings;
  5         12  
  5         155  
4 5     5   23 use strict;
  5         11  
  5         94  
5 5     5   101 use 5.010;
  5         20  
6              
7 5     5   2684 use Pod::Cats::Parser::MGC;
  5         15  
  5         188  
8 5     5   43 use List::Util qw(min max);
  5         45  
  5         549  
9 5     5   26 use Carp;
  5         8  
  5         303  
10 5     5   4274 use String::Util qw/crunch/;
  5         18298  
  5         8492  
11              
12             =head1 NAME
13              
14             Pod::Cats - The POD-like markup language written for podcats.in
15              
16             =head1 VERSION
17              
18             Version 0.05
19              
20             =head1 DESCRIPTION
21              
22             POD is an expressive markup language - like Perl is an expressive programming
23             language - and for a plain text file format there is little finer. Pod::Cats is
24             an extension of the POD semantics that adds more syntax and more flexibility to
25             the language.
26              
27             Pod::Cats is designed to be extended and doesn't implement any default
28             commands or entities.
29              
30             =head1 SYNTAX
31              
32             Pod::Cats syntax borrows ideas from POD and adds its own.
33              
34             A paragraph is any block of text delimited by blank lines (whitespace ignored).
35             This is the same as POD, and basically allows you to use hard word wrapping in
36             your markup without having to join them all together for output later.
37              
38             There are three command paragraphs, which are defined by their first character.
39             This character must be in the first column; whitespace at the start of a
40             paragraph is syntactically relevant.
41              
42             =over 4
43              
44             =item C<=COMMAND CONTENT>
45             X
46              
47             A line beginning with the C<=> symbol denotes a single I. Usually this
48             will be some sort of header, perhaps the equivalent of a C<<
>>, something
49             like that. It is roughly equivalent to the self-closing tag in XML. B
50             is just text that may or may not be present. The relationship of B to
51             the B is for you to define, as is the meaning of B.
52              
53             When a C<=COMMAND> block is completed, it is passed to L.
54              
55             =item C<+NAME CONTENT>
56             X
57              
58             A line beginning with C<+> opens a named block; its name is B. Similar to
59             C<=COMMAND>, the B is arbitrary, and its relationship to the B of
60             the block is up to you.
61              
62             When this is encountered you are invited to L.
63              
64             =item C<-NAME>
65             X
66              
67             A line beginning with C<-> is the end of the named block previously started.
68             These must match in reverse order to the C<+> block with the matching B -
69             basically the same as XML's pairs. It is passed to L,
70             and unlike the other two command paragraphs it accepts no content.
71              
72             =back
73              
74             Then there are two types of text paragraph, for which the text is not
75             syntactically relevant but whitespace still is:
76              
77             =over 4
78              
79             =item Verbatim paragraphs
80              
81             A line whose first character is whitespace is considered verbatim. No removal of
82             whitespace is done to the rest of the paragraph if the first character is
83             whitespace; all your text is repeated verbatim, hence the name
84              
85             The verbatim paragraph continues until the first non-verbatim paragraph is
86             encountered. A blank line is no longer considered to end the paragraph.
87             Therefore, two verbatim paragraphs can only be separated by a non-verbatim
88             paragraph with non-whitespace content. The special formatting code C<< ZZ<><> >>
89             can be used on its own to separate them with zero-width content.
90              
91             All lines in the verbatim paragraph will have their leading whitespace removed.
92             This is done intelligently: the I amount of leading whitespace found on
93             any line is removed from all lines. This allows you to indent other lines (even
94             the first one) relative to the syntactic whitespace that defines the verbatim
95             paragraph without your indentation being parsed out.
96              
97             L are not parsed in verbatim paragraphs, as expected.
98              
99             When a verbatim paragraph has been collated, it is passed to L.
100              
101             =item Paragraphs
102              
103             Everything that doesn't get caught by one of the above rules is deemed to be a
104             plain text paragraph. As with all paragraphs, a single line break is removed by
105             the parser and a blank line causes the paragraph to be processed. It is passed
106             to L.
107              
108             =back
109              
110             And finally the inline formatting markup, entities.
111              
112             =over
113              
114             =item C<< XZ<><> >>
115             X X
116              
117             An entity is defined as a capital letter followed by a delimiter that is
118             repeated n times, then any amount of text up to a matching quantity of a
119             balanced delimiter.
120              
121             In normal POD the only delimiter is C<< < >>, so entities have the format C<<
122             XZ<><> >>; except that the opening delimiter may be duplicated as long as the
123             closing delimiter matches, allowing you to put the delimiter itself inside the
124             entity: C<<< XZ<><<>> >>>; in Pod::Cats you can use any delimiter, removing the
125             requirement to duplicate it at all: C<< C[ XZ<><> ] >>.
126              
127             Once an entity has begun, nested entities are only considered if the delimiters
128             are the same as those used for the outer entity: C<< B[ I[bold-italic] ] >>;
129             C<< B[IZ<>] >>.
130              
131             Apart from the special entity C<< ZZ<><> >>, the letter used for the entity has
132             no inherent meaning to Pod::Cats. The parsed entity is provided to
133             L. C<< ZZ<><> >> retains its meaning from POD, which is to be a
134             zero-width 'divider' to break up things that would otherwise be considered
135             syntax. You are not given C<< ZZ<><> >> to handle, and C<< ZZ<><> >> itself will
136             produce undef if it is the only content to an element. A paragraph comprising solely
137             C<< ZZ<><> >> will never generate a parsed paragraph; it will be skipped.
138              
139             =back
140              
141             =head1 METHODS
142              
143             =cut
144              
145             our $VERSION = '0.08';
146              
147             =head2 new
148              
149             Create a new parser. Options are provided as a hashref, but there is currently
150             only one:
151              
152             =over
153              
154             =item delimiters
155              
156             A string containing delimiters to use. Bracketed delimiters will be balanced;
157             other delimiters will simply be used as-is. This echoes the delimiter philosophy
158             of Perl syntax such as regexes and C. The string should be all the possible
159             delimiters, listed once each, and only the opening brackets of balanced pairs.
160              
161             The default is C<< '<' >>, same as POD.
162              
163             =back
164              
165             =cut
166              
167             sub new {
168 3     3 1 83 my $class = shift;
169 3   100     20 my $opts = shift || {};
170 3         9 my $self = bless $opts, $class; # FIXME
171              
172 3         9 return $self;
173             }
174              
175             =head2 parse
176              
177             Parses a string containing whatever Pod::Cats code you have.
178              
179             =cut
180              
181             sub parse {
182 0     0 1 0 my ($self, $string) = @_;
183              
184 0         0 return $self->parse_lines(split /\n/, $string);
185             }
186              
187             =head2 parse_file
188              
189             Opens the file given by filename and reads it all in and then parses that.
190              
191             =cut
192              
193             sub parse_file {
194 0     0 1 0 my ($self, $filename) = @_;
195              
196 0 0       0 carp "File not found: " . $filename unless -e $filename;
197              
198 0         0 open my $fh, "<", $filename;
199 0         0 chomp(my @lines = <$fh>);
200 0         0 close $fh;
201              
202 0         0 return $self->parse_lines(@lines);
203             }
204              
205             =head2 parse_lines
206              
207             L and L both come here, which just takes the markup text
208             as an array of lines and parses them. This is where the logic happens. It is
209             exposed publicly so you can parse an array of your own if you want.
210              
211             =cut
212              
213             sub parse_lines {
214 3     3 1 103 my ($self, @lines) = @_;
215              
216 3         9 my $result = "";
217 3         23 $self->_new_buffer;
218              
219             # Special lines are:
220             # - a blank line. An exception is between verbatim paragraphs, so we will
221             # simply re-merge verbatim paras later on
222             # - A line starting with =, + or -. Command paragraph. Process the previous
223             # buffer and start a new one with this.
224             # - Anything else continues the previous buffer, or starts a normal paragraph
225              
226 3         5 my $line_num = @lines;
227 3         22 shift @lines while $lines[0] !~ /\S/; # shift off leading blank lines!
228              
229             # Adjust for lines we removed.
230 3         6 $line_num = $line_num - @lines;
231              
232 3   50 0   32 my $original_warn = $SIG{__WARN__} || sub { warn $_[0] };
  0         0  
233             local $SIG{__WARN__} = sub {
234 0     0   0 my $warning = shift;
235 0         0 $warning =~ s/at .+/at line $line_num/;
236 0         0 $original_warn->($warning);
237 3         24 };
238              
239 3         10 for my $line (@lines) {
240 46         61 $line_num++;
241              
242 46         85 for ($line) {
243 46 100       222 if (/^\s*$/) {
    50          
    100          
244 21         60 $self->_handle_blank_line;
245             }
246             elsif (/^([=+-])/) {
247 0         0 my $type = $1;
248             my $buftype = {
249             '+' => 'begin',
250             '-' => 'end',
251             '=' => 'command',
252 0         0 }->{$type};
253              
254             # First word is command name; rest is buffer.
255 0         0 my ($command, $buffer) = $line =~ /^\Q$type\E(\w+)\s*(.*)$/;
256              
257 0         0 $self->_handle_command_line($buftype, $command);
258 0         0 $self->_buffer($buffer);
259             }
260             elsif (/^\s+\S/) {
261 10         30 $self->_handle_verbatim_line($line);
262             }
263             else {
264 15         53 $self->_handle_normal_line($line);
265             }
266             }
267             }
268              
269 3         24 $self->_end_of_line;
270             }
271              
272             sub _buffer {
273 54     54   76 my $self = shift;
274 54         71 push @{ $self->{buffer} }, @_;
  54         157  
275             }
276              
277             sub _new_buffer {
278 38     38   54 my $self = shift;
279             # The buffer type goes in the first element, and its
280             # contents, if any, in the rest.
281 38         237 $self->{buffer} = [];
282             }
283              
284             sub _end_previous_buffer {
285 10     10   18 my $self = shift;
286 10 100       21 return unless $self->_buftype;
287              
288 5 50       13 if ($self->_buftype eq 'verbatim') {
289             warn "Verbatim paragraph ended without blank line"
290 5 50       18 if $self->{buffer}->[-1] =~ /\S/;
291              
292 5         12 $self->_process_buffer;
293             return
294 5         10 }
295              
296 0         0 warn $self->_buftype . " ended without blank line";
297 0         0 $self->_process_buffer;
298              
299 0         0 return 0;
300             }
301              
302             sub _handle_blank_line {
303 21     21   27 my $self = shift;
304 21 50       45 return unless $self->_buftype;
305              
306             # Handle a verbatim buffer when a different buffer starts, or we run out of
307             # lines.
308 21 100       49 if ($self->_buftype eq 'verbatim') {
309 9         20 $self->_buffer('');
310 9         23 return;
311             }
312              
313 12         44 $self->_process_buffer;
314             }
315              
316             sub _handle_command_line {
317 0     0   0 my $self = shift;
318 0         0 my $type = shift;
319 0         0 my $command = shift;
320              
321 0         0 $self->_end_previous_buffer;
322              
323 0         0 $self->_buffer($type, $command);
324             }
325              
326             # Verbatim lines just get buffered until something else happens.
327             sub _handle_verbatim_line {
328 10     10   14 my $self = shift;
329              
330 10 100       25 if ($self->_buftype ne 'verbatim') {
331 5         16 $self->_end_previous_buffer;
332 5         12 $self->_buffer('verbatim');
333             }
334 10         26 $self->_buffer(@_);
335             }
336              
337             sub _handle_normal_line {
338 15     15   31 my $self = shift;
339 15         23 my $line = shift;
340              
341             # Nothing special, continue previous buffer or start a paragraph.
342 15 100       101 if ($self->_buftype eq 'verbatim') {
343 5         66 $self->_end_previous_buffer;
344             }
345              
346 15 50       40 if (not $self->_buftype) {
347 15         49 $self->_buffer('paragraph');
348             }
349              
350 15         35 $self->_buffer($line);
351             }
352              
353             sub _buftype {
354 117     117   149 my $self = shift;
355 117 100       129 if (@{ $self->{buffer} }) {
  117         320  
356 82         257 return $self->{buffer}->[0];
357             }
358 35         108 return '';
359             }
360              
361             sub _end_of_line {
362 3     3   8 my $self = shift;
363 3         10 $self->_process_buffer;
364             }
365              
366             sub _process_buffer {
367 20     20   28 my $self = shift;
368              
369 20 50       45 return '' unless $self->_buftype;
370              
371 20         45 my @buffer = @{$self->{buffer}};
  20         55  
372              
373 20         42 for (shift @buffer) {
374 20 100 0     53 if($_ eq 'paragraph') {
    50          
    0          
    0          
375             # concatenate the lines and normalise whitespace.
376 15         72 my $para = crunch(join " ", @buffer);
377 15         252 $self->handle_paragraph($self->_process_entities($para));
378 15         6675 $self->_new_buffer;
379             }
380             elsif($_ eq 'verbatim') {
381             # There will have been a warning if this is not the empty string
382 5 50       15 pop @buffer if $buffer[-1] eq '';
383              
384             # find the lowest level of indentation in this buffer and strip it
385             # Avoid zeroes from blank lines in between
386 5 100       9 my $indent_level = min map { /^(\s*)/; length $1 || () } @buffer;
  14         32  
  14         64  
387 5         68 s/^\s{$indent_level}// for @buffer;
388              
389 5         14 my $para = join "\n", @buffer;
390 5         16 $self->handle_verbatim($para);
391             }
392             elsif($_ eq 'command' || $_ eq 'begin') {
393 0         0 my $type = shift @buffer;
394 0         0 my $content = crunch(join " ", @buffer);
395 0         0 $self->${\"handle_$_"}($type, $self->_process_entities($content));
  0         0  
396             }
397             elsif($_ eq 'end') {
398 0         0 $self->handle_end($buffer[0]);
399             }
400             }
401              
402 20         5266 $self->_new_buffer;
403             }
404              
405             =head2 handle_verbatim
406              
407             The verbatim paragraph as it was in the code, except with the minimum amount of
408             whitespace stripped from each line as described in L.
409             Passed in as a single string with line breaks preserved.
410              
411             Do whatever you want. Default is to return the string straight back atcha.
412              
413             =cut
414              
415             sub handle_verbatim {
416 5     5 1 23 shift;
417 5         11 shift;
418             }
419              
420             =head2 handle_entity
421              
422             Passed the letter of the L as the first argument and its content
423             as the rest of @_. The content will alternate between plain text and the return
424             value of this function for any nested entities inside this one.
425              
426             For this reason you should return a scalar from this method, be it text or a
427             ref. The default is to concatenate @_, thus replacing entities with their
428             contents.
429              
430             Note that this method is the only one whose return value is of relevance to the
431             parser, since what you return from this will appear in another handler,
432             depending on what type of paragraph the entity is in.
433              
434             You will never get the C<< ZZ<><> >> entity.
435              
436             =cut
437              
438             sub handle_entity {
439 5     5 1 6827 shift; shift;
  5         6  
440 5   100     38 join ' ', map $_ // '', @_;
441             }
442              
443             # preprocess paragraph before giving it to the user. handle_entity is called
444             # from the parser itself.
445             sub _process_entities {
446 15     15   26 my ($self, $para) = @_;
447              
448             # 1. replace POD-like Z<...> with user-defined functions.
449             # Z itself is the only actual exception to that.
450             $self->{parser} ||= Pod::Cats::Parser::MGC->new(
451             object => $self,
452 15   100     89 delimiters => $self->{delimiters} // '<'
      66        
453             );
454              
455 15         57 my $parsed = $self->{parser}->from_string( $para );
456              
457             # Single return of undef was Z<>
458 15 100 66     1231 return defined $parsed->[0] || @$parsed > 1 ? @$parsed : ();
459             }
460              
461             =head2 handle_paragraph
462              
463             The paragraph is split into sections that alternate between plain text and the
464             return values of L as described above. These
465             sections are arrayed in @_. Note that the paragraph could start with an entity.
466              
467             By default it returns @_ concatenated, since the default behaviour of
468             L is to remove the formatting but keep the
469             contents.
470              
471             =cut
472              
473             sub handle_paragraph {
474 15   100 15 1 37 shift; join ' ', map $_ // '', @_;
  15         90  
475             }
476              
477             =head2 handle_command
478              
479             When a L is encountered it comes here. The first argument is
480             the B (from B<=COMMAND>); the rest of the arguments follow the rules of
481             L and alternate between plain text and parsed
482             entities.
483              
484             By default it returns @_ concatenated, same as paragraphs.
485              
486             =cut
487              
488             sub handle_command {
489 0   0 0 1   shift; shift; join ' ', map $_ // '', @_;
  0            
  0            
490             }
491              
492             =head2 handle_begin
493              
494             This is handled the same as L, except it is
495             called when a L command is encountered. The same rules apply.
496              
497             =cut
498              
499             sub handle_begin {
500 0   0 0 1   shift; shift; join ' ', map $_ // '', @_;
  0            
  0            
501             }
502              
503             =head2 handle_end
504              
505             The counterpart to the begin handler. This is called when the L paragraph
506             is encountered. The parser will already have discovered whether your begins and
507             ends are not balanced so you don't need to worry about that.
508              
509             Note that there is no content for an end paragraph so the only argument this
510             gets is the command name.
511              
512             =cut
513              
514       0 1   sub handle_end { }
515              
516             =head1 TODO
517              
518             =over
519              
520             =item The document is parsed into DOM, then events are fired SAX-like.
521             Preferable to fire the events and build the DOM from that.
522              
523             =item Currently the matching of begin/end commands is a bit naive.
524              
525             =item Line numbers of errors are not yet reported.
526              
527             =back
528              
529             =head1 AUTHOR
530              
531             Altreus, C<< >>
532              
533             =head1 BUGS
534              
535             Bug reports to github please: http://github.com/Altreus/Pod-Cats/issues
536              
537             =head1 SUPPORT
538              
539             You are reading the only documentation for this module.
540              
541             For more help, give me a holler on irc.freenode.com #perl
542              
543             =head1 ACKNOWLEDGEMENTS
544              
545             Paul Evans (LeoNerd) basically wrote Parser::MGC because I was whining about not
546             being able to parse these entity delimiters with any of the token parsers I
547             could find; and then he wrote a POD example that I only had to tweak in order to
548             do so. So a lot of the credit should go to him!
549              
550             =head1 LICENSE AND COPYRIGHT
551              
552             Copyright 2013 Altreus.
553              
554             This module is released under the MIT licence.
555              
556             =cut
557              
558             1;