line | stmt | bran | cond | sub | pod | time | code
1 | | | | | | | package App::Rsnapshot::XML::Tiny;
2 | | | | | | |
3 | 13 | | | 13 | | 34091 | use strict;
| 13 | | | | | 24 |
| 13 | | | | | 535 |
4 | | | | | | |
5 | | | | | | | require Exporter;
6 | | | | | | |
7 | 13 | | | 13 | | 66 | use vars qw($VERSION @EXPORT_OK @ISA);
| 13 | | | | | 20 |
| 13 | | | | | 21856 |
8 | | | | | | |
9 | | | | | | | $VERSION = '1.12';
10 | | | | | | | @EXPORT_OK = qw(parsefile);
11 | | | | | | | @ISA = qw(Exporter);
12 | | | | | | |
13 | | | | | | | # localising prevents the warningness leaking out of this module
14 | | | | | | | local $^W = 1; # can't use warnings as that's a 5.6-ism
15 | | | | | | |
16 | | | | | | | =head1 NAME
17 | | | | | | |
18 | | | | | | | App::Rsnapshot::XML::Tiny - simple lightweight parser for a subset of XML
19 | | | | | | |
20 | | | | | | | =head1 DESCRIPTION
21 | | | | | | |
22 | | | | | | | App::Rsnapshot::XML::Tiny is a simple lightweight parser for a subset of XML.
23 | | | | | | |
24 | | | | | | | =head1 SYNOPSIS
25 | | | | | | |
26 | | | | | | |     use App::Rsnapshot::XML::Tiny qw(parsefile);
27 | | | | | | |     open($xmlfile, 'something.xml');
28 | | | | | | |     my $document = parsefile($xmlfile);
29 | | | | | | |
30 | | | | | | | This will leave C<$document> looking something like this:
31 | | | | | | |
32 | | | | | | |     [
33 | | | | | | |         {
34 | | | | | | |             type    => 'e',
35 | | | | | | |             attrib  => { ... },
36 | | | | | | |             name    => 'rootelementname',
37 | | | | | | |             content => [
38 | | | | | | |                 ...
39 | | | | | | |                 more elements and text content
40 | | | | | | |                 ...
41 | | | | | | |             ]
42 | | | | | | |         }
43 | | | | | | |     ]
44 | | | | | | |
45 | | | | | | | =head1 FUNCTIONS
46 | | | | | | |
47 | | | | | | | The C<parsefile> function is optionally exported. By default nothing is
48 | | | | | | | exported. There is no objecty interface.
49 | | | | | | |
50 | | | | | | | =head2 parsefile
51 | | | | | | |
52 | | | | | | | This takes at least one parameter, optionally more. The compulsory
53 | | | | | | | parameter may be:
54 | | | | | | |
55 | | | | | | | =over 4
56 | | | | | | |
57 | | | | | | | =item a filename
58 | | | | | | |
59 | | | | | | | in which case the file is read and parsed;
60 | | | | | | |
61 | | | | | | | =item a string of XML
62 | | | | | | |
63 | | | | | | | in which case it is read and parsed. How do we tell if we've got a string
64 | | | | | | | or a filename? If it begins with C<_TINY_XML_STRING_> then it's
65 | | | | | | | a string. That prefix is, of course, ignored when it comes to actually
66 | | | | | | | parsing the data. This is intended primarily for use by wrappers which
67 | | | | | | | want to retain compatibility with Ye Aunciente Perl. Normal users who want
68 | | | | | | | to pass in a string would be expected to use L.
69 | | | | | | |
70 | | | | | | | =item a glob-ref or IO::Handle object
71 | | | | | | |
72 | | | | | | | in which case again, the file is read and parsed.
73 | | | | | | |
74 | | | | | | | =back
75 | | | | | | |
76 | | | | | | | The former case is for compatibility with older perls, but makes no
77 | | | | | | | attempt to properly deal with character sets. If you open a file in a
78 | | | | | | | character-set-friendly way and then pass in a handle / object, then the
79 | | | | | | | method should Do The Right Thing as it only ever works with character
80 | | | | | | | data.
81 | | | | | | |
82 | | | | | | | The remaining parameters are a list of key/value pairs to make a hash of
83 | | | | | | | options:
84 | | | | | | |
85 | | | | | | | =over 4
86 | | | | | | |
87 | | | | | | | =item fatal_declarations
88 | | | | | | |
89 | | | | | | | If set to true, E<lt>!ENTITY...E<gt> and E<lt>!DOCTYPE...E<gt> declarations
90 | | | | | | | in the document
91 | | | | | | | are fatal errors - otherwise they are *ignored*.
92 | | | | | | |
93 | | | | | | | =item no_entity_parsing
94 | | | | | | |
95 | | | | | | | If set to true, the five built-in entities are passed through unparsed.
96 | | | | | | | Note that special characters in CDATA and attributes may have been turned
97 | | | | | | | into C<&amp;>, C<&lt;> and friends.
98 | | | | | | |
99 | | | | | | | =item strict_entity_parsing
100 | | | | | | |
101 | | | | | | | If set to true, any unrecognised entities (ie, those outside the core five
102 | | | | | | | plus numeric entities) cause a fatal error. If you set both this and
103 | | | | | | | C<no_entity_parsing> (but why would you do that?) then the latter takes
104 | | | | | | | precedence.
105 | | | | | | |
106 | | | | | | | Obviously, if you want to maximise compliance with the XML spec, you should
107 | | | | | | | turn on fatal_declarations and strict_entity_parsing.
108 | | | | | | |
109 | | | | | | | =back
110 | | | | | | |
111 | | | | | | | The function returns a structure describing the document. This contains
112 | | | | | | | one or more nodes, each being either an 'element' node or a 'text' node.
113 | | | | | | | The structure is an arrayref which contains a single 'element' node which
114 | | | | | | | represents the document entity. The arrayref is redundant, but exists for
115 | | | | | | | compatibility with L<XML::Parser::EasyTree>.
116 | | | | | | |
117 | | | | | | | Element nodes are hashrefs with the following keys:
118 | | | | | | |
119 | | | | | | | =over 4
120 | | | | | | |
121 | | | | | | | =item type
122 | | | | | | |
123 | | | | | | | The node's type, represented by the letter 'e'.
124 | | | | | | |
125 | | | | | | | =item name
126 | | | | | | |
127 | | | | | | | The element's name.
128 | | | | | | |
129 | | | | | | | =item attrib
130 | | | | | | |
131 | | | | | | | A hashref containing the element's attributes, as key/value pairs where
132 | | | | | | | the key is the attribute name.
133 | | | | | | |
134 | | | | | | | =item content
135 | | | | | | |
136 | | | | | | | An arrayref of the element's contents. The array's contents are a list of
137 | | | | | | | nodes, in the order they were encountered in the document.
138 | | | | | | |
139 | | | | | | | =back
140 | | | | | | |
141 | | | | | | | Text nodes are hashrefs with the following keys:
142 | | | | | | |
143 | | | | | | | =over 4
144 | | | | | | |
145 | | | | | | | =item type
146 | | | | | | |
147 | | | | | | | The node's type, represented by the letter 't'.
148 | | | | | | |
149 | | | | | | | =item content
150 | | | | | | |
151 | | | | | | | A scalar piece of text.
152 | | | | | | |
153 | | | | | | | =back
154 | | | | | | |
155 | | | | | | | =cut
156 | | | | | | |
157 | | | | | | | my %regexps = (
158 | | | | | | |     name => '[:a-z][\\w:\\.-]*'
159 | | | | | | | );
160 | | | | | | |
161 | | | | | | | my $strict_entity_parsing; # mmm, global. don't worry, parsefile sets it
162 | | | | | | |                            # explicitly every time
163 | | | | | | | sub parsefile {
164 | 212 | | | 212 | 1 | 39290 |     my($arg, %params) = @_;
165 | 212 | | | | | 701 |     my($file, $elem) = ('', { content => [] });
166 | 212 | | | | | 525 |     local $/; # sluuuuurp
167 | | | | | | |
168 | 212 | | | | | 281 |     $strict_entity_parsing = $params{strict_entity_parsing};
169 | | | | | | |
170 | 212 | 100 | | | | 604 |     if(ref($arg) eq '') { # we were passed a filename or a string
171 | 210 | 100 | | | | 408 |         if($arg =~ /^_TINY_XML_STRING_/) { # it's a string
172 | 22 | | | | | 227 |             $file = substr($arg, 17);
173 | | | | | | |         } else {
174 | 188 | | | | | 332 |             local *FH;
175 | 188 | 100 | | | | 8015 |             open(FH, $arg) || die(__PACKAGE__."::parsefile: Can't open $arg\n");
176 | 187 | | | | | 3729 |             $file = <FH>;
177 | 187 | | | | | 1795 |             close(FH);
178 | | | | | | |         }
179 | 2 | | | | | 50 |     } else { $file = <$arg>; }
180 | 211 | 100 | 66 | | | 1555 |     die("No elements\n") if (!defined($file) || $file =~ /^\s*$/);
181 | | | | | | |
182 | | | | | | |     # illegal low-ASCII chars
183 | 209 | 100 | | | | 649 |     die("Not well-formed\n") if($file =~ /[\x00-\x08\x0b\x0c\x0e-\x1f]/);
184 | | | | | | |
185 | | | | | | |     # turn CDATA into PCDATA
186 | 204 | | | | | 324 |     $file =~ s{<!\[CDATA\[(.*?)]]>}{
187 | 8 | | | | | 22 |         $_ = $1.chr(0);         # this makes sure that empty CDATAs become
188 | 8 | | | | | 14 |         s/([&<>])/              # the empty string and aren't just thrown away.
189 | 4 | 100 | | | | 19 |             $1 eq '&' ? '&amp;' :
| | 100 | | | | |
190 | | | | | | |             $1 eq '<' ? '&lt;'  :
191 | | | | | | |                         '&gt;'
192 | | | | | | |         /eg;
193 | 8 | | | | | 23 |         $_;
194 | | | | | | |     }egs;
195 | | | | | | |
196 | 217 | 100 | | | | 1074 |     die("Not well-formed\n") if(
197 | | | | | | |         $file =~ /]]>/ ||                      # ]]> not delimiting CDATA
198 | | | | | | |         $file =~ /<!--(.*?)--->/s ||           # ---> can't end a comment
199 | 204 | 100 | 100 | | | 9368 |         grep { $_ && /--/ } ($file =~ /^\s+|<!--(.*?)-->|\s+$/gs) # -- in comm
| | | 100 | | | |
200 | | | | | | |     );
201 | | | | | | |
202 | | | | | | |     # strip leading/trailing whitespace and comments (which don't nest - phew!)
203 | 194 | | | | | 7392 |     $file =~ s/^\s+|<!--(.*?)-->|\s+$//gs;
204 | | | | | | |
205 | | | | | | |     # turn quoted > in attribs into &gt;
206 | | | | | | |     # double- and single-quoted attrib values get done separately
207 | 194 | | | | | 12232 |     while($file =~ s/($regexps{name}\s*=\s*"[^"]*)>([^"]*")/$1&gt;$2/gsi) {}
208 | 194 | | | | | 12934 |     while($file =~ s/($regexps{name}\s*=\s*'[^']*)>([^']*')/$1&gt;$2/gsi) {}
209 | | | | | | |
210 | 194 | 100 | 100 | | | 1449 |     if($params{fatal_declarations} && $file =~ /<!(ENTITY|DOCTYPE)/) {
211 | 111 | | | | | 1054 |         die("I can't handle this document\n");
212 | | | | | | |     }
213 | | | | | | |
214 | | | | | | |     # ignore empty tokens/whitespace tokens
215 | 83 | 100 | | | | 1197 |     foreach my $token (grep { length && $_ !~ /^\s+$/ }
| 1290 | | | | | 11686 |
216 | | | | | | |       split(/(<[^>]+>)/, $file)) {
217 | 749 | 100 | 100 | | | 1318440 |         if(
| | 100 | | | | |
| | 100 | | | | |
| | 100 | | | | |
218 | | | | | | |           $token =~ /<\?$regexps{name}.*?\?>/is ||  # PI
219 | | | | | | |           $token =~ /^<!(ENTITY|DOCTYPE)/i
220 | | | | | | |         ) {
221 | 6 | | | | | 18 |             next;
222 | | | | | | |         } elsif($token =~ m!^</($regexps{name})\s*>!i) { # close tag
223 | 241 | 100 | | | | 863 |             die("Not well-formed\n\tat $token\n") if($elem->{name} ne $1);
224 | 238 | | | | | 633 |             $elem = delete $elem->{parent};
225 | | | | | | |         } elsif($token =~ /^<$regexps{name}(\s[^>]*)*(\s*\/)?>/is) { # open tag
226 | 351 | | | | | 2940 |             my($tagname, $attribs_raw) = ($token =~ m!<(\S*)(.*?)(\s*/)?>!s);
227 | | | | | | |             # first make attribs into a list so we can spot duplicate keys
228 | 351 | | | | | 4389 |             my $attrib = [
229 | | | | | | |                 # do double- and single- quoted attribs separately
230 | | | | | | |                 $attribs_raw =~ /\s($regexps{name})\s*=\s*"([^"]*?)"/gi,
231 | | | | | | |                 $attribs_raw =~ /\s($regexps{name})\s*=\s*'([^']*?)'/gi
232 | | | | | | |             ];
233 | 351 | 100 | | | | 501 |             if(@{$attrib} == 2 * keys %{{@{$attrib}}}) {
| 351 | | | | | 559 |
| 351 | | | | | 378 |
| 351 | | | | | 1373 |
234 | 350 | | | | | 404 |                 $attrib = { @{$attrib} }
| 350 | | | | | 718 |
235 | 1 | | | | | 12 |             } else { die("Not well-formed - duplicate attribute\n"); }
236 | | | | | | |
237 | | | | | | |             # now trash any attribs that we *did* manage to parse and see
238 | | | | | | |             # if there's anything left
239 | 350 | | | | | 3491 |             $attribs_raw =~ s/\s($regexps{name})\s*=\s*"([^"]*?)"//gi;
240 | 350 | | | | | 2130 |             $attribs_raw =~ s/\s($regexps{name})\s*=\s*'([^']*?)'//gi;
241 | 350 | 100 | 100 | | | 1277 |             die("Not well-formed\n$attribs_raw") if($attribs_raw =~ /\S/ || grep { /</ } values %{$attrib});
| 154 | | | | | 625 |
| 340 | | | | | 1130 |
242 | | | | | | |
243 | 338 | 100 | | | | 731 |             unless($params{no_entity_parsing}) {
244 | 337 | | | | | 369 |                 foreach my $key (keys %{$attrib}) {
| 337 | | | | | 1679 |
245 | 152 | | | | | 342 |                     $attrib->{$key} = _fixentities($attrib->{$key})
246 | | | | | | |                 }
247 | | | | | | |             }
248 | | | | | | |             $elem = {
249 | 334 | | | | | 1639 |                 content => [],
250 | | | | | | |                 name => $tagname,
251 | | | | | | |                 type => 'e',
252 | | | | | | |                 attrib => $attrib,
253 | | | | | | |                 parent => $elem
254 | | | | | | |             };
255 | 334 | | | | | 400 |             push @{$elem->{parent}->{content}}, $elem;
| 334 | | | | | 736 |
256 | | | | | | |             # now handle self-closing tags
257 | 334 | 100 | | | | 1670 |             if($token =~ /\s*\/>$/) {
258 | 70 | | | | | 160 |                 $elem->{name} =~ s/\/$//;
259 | 70 | | | | | 509 |                 $elem = delete $elem->{parent};
260 | | | | | | |             }
261 | | | | | | |         } elsif($token =~ /^</) { # some token taggish thing
262 | 13 | | | | | 145 |             die("I can't handle this document\n\tat $token\n");
263 | | | | | | |         } else { # ordinary content
264 | 138 | | | | | 204 |             $token =~ s/\x00//g; # get rid of our CDATA marker
265 | 138 | 100 | | | | 318 |             unless($params{no_entity_parsing}) { $token = _fixentities($token); }
| 137 | | | | | 271 |
266 | 130 | | | | | 209 |             push @{$elem->{content}}, { content => $token, type => 't' };
| 130 | | | | | 696 |
267 | | | | | | |         }
268 | | | | | | |     }
269 | 42 | 50 | | | | 286 |     die("Not well-formed\n") if(exists($elem->{parent}));
270 | 42 | 100 | | | | 92 |     die("Junk after end of document\n") if($#{$elem->{content}} > 0);
| 42 | | | | | 235 |
271 | 30 | | | | | 218 |     die("No elements\n") if(
272 | 30 | 100 | 66 | | | 61 |         $#{$elem->{content}} == -1 || $elem->{content}->[0]->{type} ne 'e'
273 | | | | | | |     );
274 | 29 | | | | | 6139 |     return $elem->{content};
275 | | | | | | | }
276 | | | | | | |
277 | | | | | | | sub _fixentities {
278 | 289 | | | 289 | | 412 |     my $thingy = shift;
279 | | | | | | |
280 | 289 | 100 | | | | 507 |     my $junk = ($strict_entity_parsing) ? '|.*' : '';
281 | 289 | | | | | 1506 |     $thingy =~ s/&((#(\d+|x[a-fA-F0-9]+);)|lt;|gt;|quot;|apos;|amp;$junk)/
282 | 219 | 100 | | | | 1531 |         $3 ? (
| | 100 | | | | |
| | 100 | | | | |
| | 100 | | | | |
| | 100 | | | | |
| | 100 | | | | |
| | 100 | | | | |
283 | | | | | | |             substr($3, 0, 1) eq 'x' ?     # using a =~ match here clobbers $3
284 | | | | | | |                 chr(hex(substr($3, 1))) : # so don't "fix" it!
285 | | | | | | |                 chr($3)
286 | | | | | | |         ) :
287 | | | | | | |         $1 eq 'lt;'   ? '<' :
288 | | | | | | |         $1 eq 'gt;'   ? '>' :
289 | | | | | | |         $1 eq 'apos;' ? "'" :
290 | | | | | | |         $1 eq 'quot;' ? '"' :
291 | | | | | | |         $1 eq 'amp;'  ? '&' :
292 | | | | | | |                         die("Illegal ampersand or entity\n\tat $1\n")
293 | | | | | | |     /ge;
294 | 277 | | | | | 929 |     $thingy;
295 | | | | | | | }
296 | | | | | | |
297 | | | | | | | =head1 COMPATIBILITY
298 | | | | | | |
299 | | | | | | | =head2 With other modules
300 | | | | | | |
301 | | | | | | | The C<parsefile> function is so named because it is intended to work in a
302 | | | | | | | similar fashion to L<XML::Parser> with the L<XML::Parser::EasyTree> style.
303 | | | | | | | Instead of saying this:
304 | | | | | | |
305 | | | | | | |     use XML::Parser;
306 | | | | | | |     use XML::Parser::EasyTree;
307 | | | | | | |     $XML::Parser::EasyTree::Noempty=1;
308 | | | | | | |     my $p=new XML::Parser(Style=>'EasyTree');
309 | | | | | | |     my $tree=$p->parsefile('something.xml');
310 | | | | | | |
311 | | | | | | | you would say:
312 | | | | | | |
313 | | | | | | |     use App::Rsnapshot::XML::Tiny;
314 | | | | | | |     my $tree = App::Rsnapshot::XML::Tiny::parsefile('something.xml');
315 | | | | | | |
316 | | | | | | | Any valid document that can be parsed like that using App::Rsnapshot::XML::Tiny should
317 | | | | | | | produce identical results if you use the above example of how to use
318 | | | | | | | L<XML::Parser::EasyTree>.
319 | | | | | | |
320 | | | | | | | If you find a document where that is not the case, please report it as
321 | | | | | | | a bug.
322 | | | | | | |
323 | | | | | | | =head2 With perl 5.004
324 | | | | | | |
325 | | | | | | | The module is intended to be fully compatible with every version of perl
326 | | | | | | | back to and including 5.004, and may be compatible with even older
327 | | | | | | | versions of perl 5.
328 | | | | | | |
329 | | | | | | | The lack of Unicode and friends in older perls means that App::Rsnapshot::XML::Tiny
330 | | | | | | | does nothing with character sets. If you have a document with a funny
331 | | | | | | | character set, then you will need to open the file in an appropriate
332 | | | | | | | mode using a character-set-friendly perl and pass the resulting file
333 | | | | | | | handle to the module.
334 | | | | | | |
335 | | | | | | | =head2 The subset of XML that we understand
336 | | | | | | |
337 | | | | | | | =over 4
338 | | | | | | |
339 | | | | | | | =item Element tags and attributes
340 | | | | | | |
341 | | | | | | | Including "self-closing" tags like E<lt>pie type = 'steak n kidney' /E<gt>;
342 | | | | | | |
343 | | | | | | | =item Comments
344 | | | | | | |
345 | | | | | | | Which are ignored;
346 | | | | | | |
347 | | | | | | | =item The five "core" entities
348 | | | | | | |
349 | | | | | | | ie C<&amp;>, C<&lt;>, C<&gt;>, C<&apos;> and C<&quot;>;
350 | | | | | | |
351 | | | | | | | =item Numeric entities
352 | | | | | | |
353 | | | | | | | eg C<&#65;> and C<&#x41;>;
354 | | | | | | |
355 | | | | | | | =item CDATA
356 | | | | | | |
357 | | | | | | | This is simply turned into PCDATA before parsing. Note how this may interact
358 | | | | | | | with the various entity-handling options;
359 | | | | | | |
360 | | | | | | | =back
361 | | | | | | |
362 | | | | | | | The following parts of the XML standard are handled incorrectly or not at
363 | | | | | | | all - this is not an exhaustive list:
364 | | | | | | |
365 | | | | | | | =over 4
366 | | | | | | |
367 | | | | | | | =item Namespaces
368 | | | | | | |
369 | | | | | | | While documents that use namespaces will be parsed just fine, there's no
370 | | | | | | | special treatment of them. Their names are preserved in element and
371 | | | | | | | attribute names like 'rdf:RDF'.
372 | | | | | | |
373 | | | | | | | =item DTDs and Schemas
374 | | | | | | |
375 | | | | | | | This is not a validating parser. E<lt>!DOCTYPE...E<gt> declarations are ignored
376 | | | | | | | if you've not made them fatal.
377 | | | | | | |
378 | | | | | | | =item Entities and references
379 | | | | | | |
380 | | | | | | | E<lt>!ENTITY...E<gt> declarations are ignored if you've not made them fatal.
381 | | | | | | | Unrecognised entities are ignored by default, as are naked & characters.
382 | | | | | | | This means that if entity parsing is enabled you won't be able to tell
383 | | | | | | | the difference between C<&amp;nbsp;> and C<&nbsp;>. If your
384 | | | | | | | document might use any non-core entities then please consider using
385 | | | | | | | the C<no_entity_parsing> option, and then use something like
386 | | | | | | | L.
387 | | | | | | |
388 | | | | | | | =item Processing instructions
389 | | | | | | |
390 | | | | | | | These are ignored.
391 | | | | | | |
392 | | | | | | | =item Whitespace
393 | | | | | | |
394 | | | | | | | We do not guarantee to correctly handle leading and trailing whitespace.
395 | | | | | | |
396 | | | | | | | =item Character sets
397 | | | | | | |
398 | | | | | | | This is not practical with older versions of perl.
399 | | | | | | |
400 | | | | | | | =back
401 | | | | | | |
402 | | | | | | | =head1 PHILOSOPHY and JUSTIFICATION
403 | | | | | | |
404 | | | | | | | While feedback from real users about this module has been uniformly
405 | | | | | | | positive and helpful, some people seem to take issue with this module
406 | | | | | | | because it doesn't implement every last jot and tittle of the XML
407 | | | | | | | standard and merely implements a useful subset. A very useful subset,
408 | | | | | | | as it happens, which can cope with common light-weight XML-ish tasks
409 | | | | | | | such as parsing the results of queries to the Amazon Web Services.
410 | | | | | | | Many, perhaps most, users of XML do not in fact need a full implementation
411 | | | | | | | of the standard, and are understandably reluctant to install large complex
412 | | | | | | | pieces of software which have many dependencies. In fact, when they
413 | | | | | | | realise what installing and using a full implementation entails, they
414 | | | | | | | quite often don't *want* it. Another class of users, people
415 | | | | | | | distributing applications, often cannot rely on users being able to
416 | | | | | | | install modules from the CPAN, or even having tools like make or a shell
417 | | | | | | | available. App::Rsnapshot::XML::Tiny exists for those people.
418 | | | | | | |
419 | | | | | | | =head1 BUGS and FEEDBACK
420 | | | | | | |
421 | | | | | | | I welcome feedback about my code, including constructive criticism.
422 | | | | | | | Bug reports should be made using L or by email,
423 | | | | | | | and should include the smallest possible chunk of code, along with
424 | | | | | | | any necessary XML data, which demonstrates the bug. Ideally, this
425 | | | | | | | will be in the form of a file which I can drop in to the module's
426 | | | | | | | test suite. Please note that such files must work in perl 5.004.
427 | | | | | | |
428 | | | | | | | If you are feeling particularly generous you can encourage me in my
429 | | | | | | | open source endeavours by buying me something from my wishlist:
430 | | | | | | | L
431 | | | | | | |
432 | | | | | | | =head1 SEE ALSO
433 | | | | | | |
434 | | | | | | | =over 4
435 | | | | | | |
436 | | | | | | | =item For more capable XML parsers:
437 | | | | | | |
438 | | | | | | | L
439 | | | | | | |
440 | | | | | | | L
441 | | | | | | |
442 | | | | | | | =item The requirements for a module to be Tiny
443 | | | | | | |
444 | | | | | | | L
445 | | | | | | |
446 | | | | | | | =back
447 | | | | | | |
448 | | | | | | | =head1 AUTHOR
449 | | | | | | |
450 | | | | | | | David Cantrell EFE
451 | | | | | | |
452 | | | | | | | Thanks to David Romano for some compatibility patches for Ye Aunciente Perl;
453 | | | | | | |
454 | | | | | | | to Matt Knecht and David Romano for prodding me to support attributes,
455 | | | | | | | and to Matt for providing code to implement it in a quick n dirty minimal
456 | | | | | | | kind of way;
457 | | | | | | |
458 | | | | | | | to the people on L and elsewhere who have been kind
459 | | | | | | | enough to point out ways it could be improved;
460 | | | | | | |
461 | | | | | | | to Sergio Fanchiotti for pointing out a bug in handling self-closing tags,
462 | | | | | | | and for reporting another bug that I introduced when fixing the first one;
463 | | | | | | |
464 | | | | | | | to 'Corion' for finding a bug with localised filehandles and providing a fix.
465 | | | | | | |
466 | | | | | | | =head1 COPYRIGHT and LICENCE
467 | | | | | | |
468 | | | | | | | Copyright 2007 David Cantrell
469 | | | | | | |
470 | | | | | | | This module is free-as-in-speech software, and may be used, distributed,
471 | | | | | | | and modified under the same terms as Perl itself.
472 | | | | | | |
473 | | | | | | | =head1 CONSPIRACY
474 | | | | | | |
475 | | | | | | | This module is also free-as-in-mason software.
476 | | | | | | |
477 | | | | | | | =cut
478 | | | | | | |
479 | | | | | | | 'zero';
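The parsefile documentation in this report describes the structure it returns. That structure is an arrayref holding a single element node (type 'e', with name, attrib and content keys), and that node's content is a list of element nodes and text nodes (type 't'). The sketch below shows how to walk such a structure, assuming only that documented layout. The helper name text_of and the sample document are hypothetical. The structure is built by hand to stand in for what parsefile would be expected to return, so the snippet runs without App::Rsnapshot::XML::Tiny installed.

```perl
use strict;

# Hypothetical helper: concatenate all text beneath a node, assuming the
# documented layout where every node is either type 'e' (element) or 't' (text).
sub text_of {
    my $node = shift;
    return $node->{content} if $node->{type} eq 't';   # text node: a scalar
    return join('', map { text_of($_) } @{$node->{content}});
}

# Hand-built stand-in for a parsefile result for the (hypothetical) input
#   <greeting lang="en">Hello <b>world</b></greeting>
my $document = [
    {
        type    => 'e',
        name    => 'greeting',
        attrib  => { lang => 'en' },
        content => [
            { type => 't', content => 'Hello ' },
            {
                type    => 'e',
                name    => 'b',
                attrib  => {},
                content => [ { type => 't', content => 'world' } ],
            },
        ],
    },
];

# The outer arrayref is documented to hold exactly one element node.
my $root = $document->[0];
print "$root->{name} ($root->{attrib}{lang}): ", text_of($root), "\n";
# prints "greeting (en): Hello world"
```

Dispatching on type is all the walker needs, because the documentation says every node is one of those two kinds. Attribute values remain available through the attrib hashref of each element node.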