blib/lib/HTML/ParagraphSplit.pm | |||
---|---|---|---|
Criterion | Covered | Total | % |
statement | 98 | 103 | 95.1 |
branch | 25 | 28 | 89.2 |
condition | 5 | 8 | 62.5 |
subroutine | 12 | 12 | 100.0 |
pod | 2 | 2 | 100.0 |
total | 142 | 153 | 92.8 |
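
The figures in the summary can be cross-checked with a few lines of Perl. This is a minimal sketch, not part of the report tooling: `coverage_percent` is a hypothetical helper, and truncating (rather than rounding) to one decimal place is an assumption inferred from the 89.2 shown for 25 of 28 branches.

```perl
use strict;
use warnings;

# Covered/total counts copied from the summary table above.
my @rows = (
    [ statement  => 98, 103 ],
    [ branch     => 25, 28  ],
    [ condition  => 5,  8   ],
    [ subroutine => 12, 12  ],
    [ pod        => 2,  2   ],
);

# Hypothetical helper: percentage truncated (not rounded) to one decimal.
# Truncation is an assumption chosen so that 25/28 gives 89.2, as in the table.
sub coverage_percent {
    my ($covered, $total) = @_;
    return '100.0' unless $total;    # assumption: nothing to cover counts as full
    return sprintf '%.1f', int($covered * 1000 / $total) / 10;
}

my ($covered_sum, $total_sum) = (0, 0);
for my $row (@rows) {
    my ($name, $covered, $total) = @$row;
    $covered_sum += $covered;
    $total_sum   += $total;
    printf "%-10s %3d / %3d  %5s\n",
        $name, $covered, $total, coverage_percent($covered, $total);
}

# The total row is the sum over all criteria, not an average of percentages.
printf "%-10s %3d / %3d  %5s\n",
    'total', $covered_sum, $total_sum, coverage_percent($covered_sum, $total_sum);
```

Summing the criteria gives 142 covered out of 153, which matches the total row.
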
line | stmt | bran | cond | sub | pod | time | code |
---|---|---|---|---|---|---|---|
1 | package HTML::ParagraphSplit; | ||||||
2 | |||||||
3 | 9 | 9 | 245354 | use strict; | |||
9 | 23 | ||||||
9 | 761 | ||||||
4 | 9 | 9 | 51 | use warnings; | |||
9 | 18 | ||||||
9 | 897 | ||||||
5 | |||||||
6 | our $VERSION = '1.05'; | ||||||
7 | |||||||
8 | require Exporter; | ||||||
9 | |||||||
10 | our @ISA = qw( Exporter ); | ||||||
11 | |||||||
12 | our @EXPORT_OK = qw( split_paragraphs split_paragraphs_to_text ); | ||||||
13 | |||||||
14 | 9 | 9 | 9807 | use HTML::Entities; | |||
9 | 66328 | ||||||
9 | 933 | ||||||
15 | 9 | 9 | 12493 | use HTML::TreeBuilder; | |||
9 | 329774 | ||||||
9 | 158 | ||||||
16 | 9 | 9 | 435 | use HTML::Tagset; | |||
9 | 19 | ||||||
9 | 250 | ||||||
17 | 9 | 9 | 52 | use Scalar::Util qw/ blessed /; | |||
9 | 19 | ||||||
9 | 1163 | ||||||
18 | |||||||
19 | 9 | 9 | 56 | use vars qw( %p_content ); | |||
9 | 18 | ||||||
9 | 24411 | ||||||
20 | *p_content = *HTML::Tagset::is_Possible_Strict_P_Content; | ||||||
21 | |||||||
22 | |||||||
23 | =head1 NAME | ||||||
24 | |||||||
25 | HTML::ParagraphSplit - Change text containing HTML into a formatted HTML fragment | ||||||
26 | |||||||
27 | =head1 SYNOPSIS | ||||||
28 | |||||||
29 | use HTML::ParagraphSplit qw( split_paragraphs_to_text split_paragraphs ); | ||||||
30 | |||||||
31 | # Read in from a file handle, output text | ||||||
32 | print split_paragraphs_to_text(\*ARGV); | ||||||
33 | |||||||
34 | # Convert text to nicely split text | ||||||
35 | print split_paragraphs_to_text(<<END_OF_MARKUP); | ||||||
36 | This is one paragraph. | ||||||
37 | |||||||
38 | This is another paragraph. | ||||||
39 | END_OF_MARKUP | ||||||
40 | |||||||
41 | # Convert to an HTML::Element object instead | ||||||
42 | my $tree = split_paragraphs($html_input); | ||||||
43 | print $tree->as_HTML; | ||||||
44 | |||||||
45 | # Create your own HTML::Element object and split it | ||||||
46 | my $tree = HTML::TreeBuilder->new; | ||||||
47 | $tree->parse($text); | ||||||
48 | $tree->eof; | ||||||
49 | |||||||
50 | split_paragraphs($tree); | ||||||
51 | |||||||
52 | my $html_fragment = $tree->guts->as_HTML; | ||||||
53 | $tree->delete; | ||||||
54 | |||||||
55 | =head1 DESCRIPTION | ||||||
56 | |||||||
57 | The purpose of this library is to provide methods for converting double line-breaks in text to HTML paragraphs (i.e., wrap in C |
||||||
58 | |||||||
59 | For example, given this input (the initial text was generated by DadaDodo L |
||||||
60 | |||||||
61 | I see over the noise but I don't understand sometimes. | ||||||
62 | |||||||
63 |
|
||||||
64 | |||||||
65 | Fortunately, we've traded the club you can't skimp on the do because This | ||||||
66 | week! Presented by code Lounge: except, for controlling Knox video cameras | ||||||
67 | Linux well that the reason, the runlevel to run some reason number of coming | ||||||
68 | back next server; sees you Control display a steep | ||||||
69 | and I tagged with specifications of six feet, moving to Code, flyer main room | ||||||
70 | motel balcony, and airflow in which define the ability to run a common. We |
||||||
71 | need to current in a manner than six months and that already gotten a |
||||||
72 | webcast is roughly long and bulk: and up the src page: and updates on a: | ||||||
73 | user will probably does this. | ||||||
74 | |||||||
75 | This would be converted into the following: | ||||||
76 | |||||||
77 | I see over the noise but I don't understand sometimes. |
||||||
78 | |||||||
79 |
|
||||||
80 | |||||||
81 | Fortunately, we've traded the club you can't skimp on the do because This |
||||||
82 | week! Presented by code Lounge: except, for controlling Knox video cameras | ||||||
83 | Linux well that the reason, the runlevel to run some reason number of coming | ||||||
84 | back next server; sees you Control display a steep | ||||||
85 | and I tagged with specifications of six feet, moving to Code, flyer main room | ||||||
86 | motel balcony, | ||||||
87 | and airflow in which define the ability to run a common. We need to |
||||||
88 | current in a manner | ||||||
89 | than six months and that already gotten a |
||||||
90 | webcast | ||||||
91 | is roughly long and bulk: and up the src page: and updates on a: user will |
||||||
92 | probably does this. | ||||||
93 | |||||||
94 | This allows authors to use some HTML markup without having to cope with getting their paragraph tags right. | ||||||
95 | |||||||
96 | This library depends upon L |
||||||
97 | |||||||
98 | =head1 METHODS | ||||||
99 | |||||||
100 | The primary method of this library is C |
||||||
101 | |||||||
102 | =head2 split_paragraphs | ||||||
103 | |||||||
104 | =over | ||||||
105 | |||||||
106 | =item $element = split_paragraphs($handle, \%options) | ||||||
107 | |||||||
108 | =item $element = split_paragraphs($text, \%options) | ||||||
109 | |||||||
110 | =item $element = split_paragraphs($element, \%options) | ||||||
111 | |||||||
112 | =back | ||||||
113 | |||||||
114 | This method has three forms, which vary only in the input they receive. If the first argument is a file handle, C<$handle>, then that handle will be read, parsed, and split. If the first argument is a scalar, C<$text>, then that text will be parsed and split. If the first argument is a subclass of L |
||||||
115 | |||||||
116 | If you use the third form, your tree will be modified in place and the same tree will be returned. You will want to clone the tree ahead of time if you need to preserve the old tree. | ||||||
117 | |||||||
118 | All forms take an optional second parameter, C<\%options>, which is a reference to a hash of options which modify the default behavior. See below for details. | ||||||
119 | |||||||
120 | The first two forms perform an extra step, but are handled essentially the same after the input is parsed into an L |
||||||
121 | |||||||
122 | This method will search down the element tree, find the first node with non-implicit child nodes, and use that as the root of operations. | ||||||
123 | |||||||
124 | The C |
||||||
125 | |||||||
126 | Any text found within a block-level node may also be paragraphified. Those blocks of text will not be wrapped in paragraphs unless they contain a double-line break (that way we're not inserting C<p>-tags without an explicit need for them). |
||||||
127 | |||||||
128 | Note also that this will insert C<p>-tags conservatively. If more than two line-breaks are present, even if they are mixed with other whitespace, all of that whitespace will be treated as the same paragraph break. No empty C<p>-tags or C<p>-tags containing only whitespace will be inserted (mostly). The only exception is when the whitespace is created by whitespace entities, such as C<&nbsp;>. |
||||||
129 | |||||||
130 | All of that is the default behavior, which may be modified by options passed in the second parameter. | ||||||
131 | |||||||
132 | Here's the list of options and what they do: | ||||||
133 | |||||||
134 | =over | ||||||
135 | |||||||
136 | =item p_on_breaks_only =E |
||||||
137 | |||||||
138 | If this option is used, then paragraphs will not be added to your text unless there is at least one double-line break. This option is used internally to make sure nested elements do not get unnecessary C<p>-tags. |
||||||
139 | |||||||
140 | =item single_line_breaks_to_br =E |
||||||
141 | |||||||
142 | If this option is given, then single line breaks will also be converted to C<br>-tags. |
||||||
143 | |||||||
144 | =item br_only_if_can_tighten =E |
||||||
145 | |||||||
146 | This option modifies the C -tags are not added within blocks that cannot be tightened (i.e., aren't set in C<%canTighten> of L -tags or C |