line |
stmt |
bran |
cond |
sub |
pod |
time |
code |
1
|
|
|
|
|
|
|
package XML::Tiny; |
2
|
|
|
|
|
|
|
|
3
|
14
|
|
|
14
|
|
39847
|
use strict; |
|
14
|
|
|
|
|
24
|
|
|
14
|
|
|
|
|
684
|
|
4
|
|
|
|
|
|
|
|
5
|
|
|
|
|
|
|
require Exporter; |
6
|
|
|
|
|
|
|
|
7
|
14
|
|
|
14
|
|
72
|
use vars qw($VERSION @EXPORT_OK @ISA); |
|
14
|
|
|
|
|
22
|
|
|
14
|
|
|
|
|
26979
|
|
8
|
|
|
|
|
|
|
|
9
|
|
|
|
|
|
|
$VERSION = '2.06'; |
10
|
|
|
|
|
|
|
@EXPORT_OK = qw(parsefile); |
11
|
|
|
|
|
|
|
@ISA = qw(Exporter); |
12
|
|
|
|
|
|
|
|
13
|
|
|
|
|
|
|
# localising prevents the warningness leaking out of this module |
14
|
|
|
|
|
|
|
local $^W = 1; # can't use warnings as that's a 5.6-ism |
15
|
|
|
|
|
|
|
|
16
|
|
|
|
|
|
|
=head1 NAME |
17
|
|
|
|
|
|
|
|
18
|
|
|
|
|
|
|
XML::Tiny - simple lightweight parser for a subset of XML |
19
|
|
|
|
|
|
|
|
20
|
|
|
|
|
|
|
=head1 DESCRIPTION |
21
|
|
|
|
|
|
|
|
22
|
|
|
|
|
|
|
XML::Tiny is a simple lightweight parser for a subset of XML |
23
|
|
|
|
|
|
|
|
24
|
|
|
|
|
|
|
=head1 SYNOPSIS |
25
|
|
|
|
|
|
|
|
26
|
|
|
|
|
|
|
use XML::Tiny qw(parsefile); |
27
|
|
|
|
|
|
|
open($xmlfile, 'something.xml); |
28
|
|
|
|
|
|
|
my $document = parsefile($xmlfile); |
29
|
|
|
|
|
|
|
|
30
|
|
|
|
|
|
|
This will leave C<$document> looking something like this: |
31
|
|
|
|
|
|
|
|
32
|
|
|
|
|
|
|
[ |
33
|
|
|
|
|
|
|
{ |
34
|
|
|
|
|
|
|
type => 'e', |
35
|
|
|
|
|
|
|
attrib => { ... }, |
36
|
|
|
|
|
|
|
name => 'rootelementname', |
37
|
|
|
|
|
|
|
content => [ |
38
|
|
|
|
|
|
|
... |
39
|
|
|
|
|
|
|
more elements and text content |
40
|
|
|
|
|
|
|
... |
41
|
|
|
|
|
|
|
] |
42
|
|
|
|
|
|
|
} |
43
|
|
|
|
|
|
|
] |
44
|
|
|
|
|
|
|
|
45
|
|
|
|
|
|
|
=head1 FUNCTIONS |
46
|
|
|
|
|
|
|
|
47
|
|
|
|
|
|
|
The C function is optionally exported. By default nothing is |
48
|
|
|
|
|
|
|
exported. There is no objecty interface. |
49
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
|
=head2 parsefile |
51
|
|
|
|
|
|
|
|
52
|
|
|
|
|
|
|
This takes at least one parameter, optionally more. The compulsory |
53
|
|
|
|
|
|
|
parameter may be: |
54
|
|
|
|
|
|
|
|
55
|
|
|
|
|
|
|
=over 4 |
56
|
|
|
|
|
|
|
|
57
|
|
|
|
|
|
|
=item a filename |
58
|
|
|
|
|
|
|
|
59
|
|
|
|
|
|
|
in which case the file is read and parsed; |
60
|
|
|
|
|
|
|
|
61
|
|
|
|
|
|
|
=item a string of XML |
62
|
|
|
|
|
|
|
|
63
|
|
|
|
|
|
|
in which case it is read and parsed. How do we tell if we've got a string |
64
|
|
|
|
|
|
|
or a filename? If it begins with C<_TINY_XML_STRING_> then it's |
65
|
|
|
|
|
|
|
a string. That prefix is, of course, ignored when it comes to actually |
66
|
|
|
|
|
|
|
parsing the data. This is intended primarily for use by wrappers which |
67
|
|
|
|
|
|
|
want to retain compatibility with Ye Aunciente Perl. Normal users who want |
68
|
|
|
|
|
|
|
to pass in a string would be expected to use L. |
69
|
|
|
|
|
|
|
|
70
|
|
|
|
|
|
|
=item a glob-ref or IO::Handle object |
71
|
|
|
|
|
|
|
|
72
|
|
|
|
|
|
|
in which case again, the file is read and parsed. |
73
|
|
|
|
|
|
|
|
74
|
|
|
|
|
|
|
=back |
75
|
|
|
|
|
|
|
|
76
|
|
|
|
|
|
|
The former case is for compatibility with older perls, but makes no |
77
|
|
|
|
|
|
|
attempt to properly deal with character sets. If you open a file in a |
78
|
|
|
|
|
|
|
character-set-friendly way and then pass in a handle / object, then the |
79
|
|
|
|
|
|
|
method should Do The Right Thing as it only ever works with character |
80
|
|
|
|
|
|
|
data. |
81
|
|
|
|
|
|
|
|
82
|
|
|
|
|
|
|
The remaining parameters are a list of key/value pairs to make a hash of |
83
|
|
|
|
|
|
|
options: |
84
|
|
|
|
|
|
|
|
85
|
|
|
|
|
|
|
=over 4 |
86
|
|
|
|
|
|
|
|
87
|
|
|
|
|
|
|
=item fatal_declarations |
88
|
|
|
|
|
|
|
|
89
|
|
|
|
|
|
|
If set to true, E!ENTITY...E and E!DOCTYPE...E declarations |
90
|
|
|
|
|
|
|
in the document |
91
|
|
|
|
|
|
|
are fatal errors - otherwise they are *ignored*. |
92
|
|
|
|
|
|
|
|
93
|
|
|
|
|
|
|
=item no_entity_parsing |
94
|
|
|
|
|
|
|
|
95
|
|
|
|
|
|
|
If set to true, the five built-in entities are passed through unparsed. |
96
|
|
|
|
|
|
|
Note that special characters in CDATA and attributes may have been turned |
97
|
|
|
|
|
|
|
into C<&>, C<<> and friends. |
98
|
|
|
|
|
|
|
|
99
|
|
|
|
|
|
|
=item strict_entity_parsing |
100
|
|
|
|
|
|
|
|
101
|
|
|
|
|
|
|
If set to true, any unrecognised entities (ie, those outside the core five |
102
|
|
|
|
|
|
|
plus numeric entities) cause a fatal error. If you set both this and |
103
|
|
|
|
|
|
|
C (but why would you do that?) then the latter takes |
104
|
|
|
|
|
|
|
precedence. |
105
|
|
|
|
|
|
|
|
106
|
|
|
|
|
|
|
Obviously, if you want to maximise compliance with the XML spec, you should |
107
|
|
|
|
|
|
|
turn on fatal_declarations and strict_entity_parsing. |
108
|
|
|
|
|
|
|
|
109
|
|
|
|
|
|
|
=back |
110
|
|
|
|
|
|
|
|
111
|
|
|
|
|
|
|
The function returns a structure describing the document. This contains |
112
|
|
|
|
|
|
|
one or more nodes, each being either an 'element' node or a 'text' mode. |
113
|
|
|
|
|
|
|
The structure is an arrayref which contains a single 'element' node which |
114
|
|
|
|
|
|
|
represents the document entity. The arrayref is redundant, but exists for |
115
|
|
|
|
|
|
|
compatibility with L. |
116
|
|
|
|
|
|
|
|
117
|
|
|
|
|
|
|
Element nodes are hashrefs with the following keys: |
118
|
|
|
|
|
|
|
|
119
|
|
|
|
|
|
|
=over 4 |
120
|
|
|
|
|
|
|
|
121
|
|
|
|
|
|
|
=item type |
122
|
|
|
|
|
|
|
|
123
|
|
|
|
|
|
|
The node's type, represented by the letter 'e'. |
124
|
|
|
|
|
|
|
|
125
|
|
|
|
|
|
|
=item name |
126
|
|
|
|
|
|
|
|
127
|
|
|
|
|
|
|
The element's name. |
128
|
|
|
|
|
|
|
|
129
|
|
|
|
|
|
|
=item attrib |
130
|
|
|
|
|
|
|
|
131
|
|
|
|
|
|
|
A hashref containing the element's attributes, as key/value pairs where |
132
|
|
|
|
|
|
|
the key is the attribute name. |
133
|
|
|
|
|
|
|
|
134
|
|
|
|
|
|
|
=item content |
135
|
|
|
|
|
|
|
|
136
|
|
|
|
|
|
|
An arrayref of the element's contents. The array's contents is a list of |
137
|
|
|
|
|
|
|
nodes, in the order they were encountered in the document. |
138
|
|
|
|
|
|
|
|
139
|
|
|
|
|
|
|
=back |
140
|
|
|
|
|
|
|
|
141
|
|
|
|
|
|
|
Text nodes are hashrefs with the following keys: |
142
|
|
|
|
|
|
|
|
143
|
|
|
|
|
|
|
=over 4 |
144
|
|
|
|
|
|
|
|
145
|
|
|
|
|
|
|
=item type |
146
|
|
|
|
|
|
|
|
147
|
|
|
|
|
|
|
The node's type, represented by the letter 't'. |
148
|
|
|
|
|
|
|
|
149
|
|
|
|
|
|
|
=item content |
150
|
|
|
|
|
|
|
|
151
|
|
|
|
|
|
|
A scalar piece of text. |
152
|
|
|
|
|
|
|
|
153
|
|
|
|
|
|
|
=back |
154
|
|
|
|
|
|
|
|
155
|
|
|
|
|
|
|
If you prefer a DOMmish interface, then look at L on the CPAN. |
156
|
|
|
|
|
|
|
|
157
|
|
|
|
|
|
|
=cut |
158
|
|
|
|
|
|
|
|
159
|
|
|
|
|
|
|
my %regexps = ( |
160
|
|
|
|
|
|
|
name => '[:_a-z][\\w:\\.-]*' |
161
|
|
|
|
|
|
|
); |
162
|
|
|
|
|
|
|
|
163
|
|
|
|
|
|
|
my $strict_entity_parsing; # mmm, global. don't worry, parsefile sets it |
164
|
|
|
|
|
|
|
# explicitly every time |
165
|
|
|
|
|
|
|
|
166
|
|
|
|
|
|
|
sub parsefile { |
167
|
223
|
|
|
223
|
1
|
42237
|
my($arg, %params) = @_; |
168
|
223
|
|
|
|
|
982
|
my($file, $elem) = ('', { content => [] }); |
169
|
223
|
|
|
|
|
661
|
local $/; # sluuuuurp |
170
|
|
|
|
|
|
|
|
171
|
223
|
|
|
|
|
356
|
$strict_entity_parsing = $params{strict_entity_parsing}; |
172
|
|
|
|
|
|
|
|
173
|
223
|
100
|
|
|
|
500
|
if(ref($arg) eq '') { # we were passed a filename or a string |
174
|
221
|
100
|
|
|
|
484
|
if($arg =~ /^_TINY_XML_STRING_/) { # it's a string |
175
|
29
|
|
|
|
|
182
|
$file = substr($arg, 17); |
176
|
|
|
|
|
|
|
} else { |
177
|
192
|
|
|
|
|
405
|
local *FH; |
178
|
192
|
100
|
|
|
|
7863
|
open(FH, $arg) || die(__PACKAGE__."::parsefile: Can't open $arg\n"); |
179
|
191
|
|
|
|
|
9005
|
$file = ; |
180
|
191
|
|
|
|
|
2811
|
close(FH); |
181
|
|
|
|
|
|
|
} |
182
|
2
|
|
|
|
|
60
|
} else { $file = <$arg>; } |
183
|
|
|
|
|
|
|
|
184
|
|
|
|
|
|
|
# strip any BOM |
185
|
222
|
|
|
|
|
881
|
$file =~ s/^(\xff\xfe(\x00\x00)?|(\x00\x00)?\xfe\xff|\xef\xbb\xbf)//; |
186
|
|
|
|
|
|
|
|
187
|
222
|
100
|
66
|
|
|
1478
|
die("No elements\n") if (!defined($file) || $file =~ /^\s*$/); |
188
|
|
|
|
|
|
|
|
189
|
|
|
|
|
|
|
# illegal low-ASCII chars |
190
|
220
|
100
|
|
|
|
715
|
die("Not well-formed (Illegal low-ASCII chars found)\n") if($file =~ /[\x00-\x08\x0b\x0c\x0e-\x1f]/); |
191
|
|
|
|
|
|
|
|
192
|
|
|
|
|
|
|
# turn CDATA into PCDATA |
193
|
214
|
|
|
|
|
501
|
$file =~ s{}{ |
194
|
9
|
|
|
|
|
27
|
$_ = $1.chr(0); # this makes sure that empty CDATAs become |
195
|
9
|
|
|
|
|
24
|
s/([&<>'"])/ # the empty string and aren't just thrown away. |
196
|
5
|
100
|
|
|
|
52
|
$1 eq '&' ? '&' : |
|
|
50
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
197
|
|
|
|
|
|
|
$1 eq '<' ? '<' : |
198
|
|
|
|
|
|
|
$1 eq '"' ? '"' : |
199
|
|
|
|
|
|
|
$1 eq "'" ? ''' : |
200
|
|
|
|
|
|
|
'>' |
201
|
|
|
|
|
|
|
/eg; |
202
|
9
|
|
|
|
|
37
|
$_; |
203
|
|
|
|
|
|
|
}egs; |
204
|
|
|
|
|
|
|
|
205
|
176
|
100
|
|
|
|
1132
|
die("Not well-formed (CDATA not delimited or bad comment)\n") if( |
206
|
|
|
|
|
|
|
$file =~ /]]>/ || # ]]> not delimiting CDATA |
207
|
|
|
|
|
|
|
$file =~ //s || # ---> can't end a comment |
208
|
214
|
100
|
100
|
|
|
8154
|
grep { $_ && /--/ } ($file =~ /^\s+||\s+$/gs) # -- in comm |
|
|
|
100
|
|
|
|
|
209
|
|
|
|
|
|
|
); |
210
|
|
|
|
|
|
|
|
211
|
|
|
|
|
|
|
# strip leading/trailing whitespace and comments (which don't nest - phew!) |
212
|
204
|
|
|
|
|
6400
|
$file =~ s/^\s+||\s+$//gs; |
213
|
|
|
|
|
|
|
|
214
|
|
|
|
|
|
|
# turn quoted > in attribs into > |
215
|
|
|
|
|
|
|
# double- and single-quoted attrib values get done seperately |
216
|
204
|
|
|
|
|
10532
|
while($file =~ s/($regexps{name}\s*=\s*"[^"]*)>([^"]*")/$1>$2/gsi) {} |
217
|
204
|
|
|
|
|
8976
|
while($file =~ s/($regexps{name}\s*=\s*'[^']*)>([^']*')/$1>$2/gsi) {} |
218
|
|
|
|
|
|
|
|
219
|
204
|
100
|
100
|
|
|
1081
|
if($params{fatal_declarations} && $file =~ /
|
220
|
111
|
|
|
|
|
1536
|
die("I can't handle this document\n"); |
221
|
|
|
|
|
|
|
} |
222
|
|
|
|
|
|
|
|
223
|
|
|
|
|
|
|
# ignore empty tokens/whitespace tokens |
224
|
93
|
100
|
|
|
|
962
|
foreach my $token (grep { length && $_ !~ /^\s+$/ } |
|
924
|
|
|
|
|
3677
|
|
225
|
|
|
|
|
|
|
split(/(<[^>]+>)/, $file)) { |
226
|
533
|
100
|
100
|
|
|
8938
|
if( |
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
227
|
|
|
|
|
|
|
$token =~ /<\?$regexps{name}.*?\?>/is || # PI |
228
|
|
|
|
|
|
|
$token =~ /^
|
229
|
|
|
|
|
|
|
) { |
230
|
6
|
|
|
|
|
18
|
next; |
231
|
|
|
|
|
|
|
} elsif($token =~ m!^($regexps{name})\s*>!i) { # close tag |
232
|
168
|
100
|
|
|
|
633
|
die("Not well-formed\n\tat $token\n") if($elem->{name} ne $1); |
233
|
165
|
|
|
|
|
461
|
$elem = delete $elem->{parent}; |
234
|
|
|
|
|
|
|
} elsif($token =~ /^<$regexps{name}(\s[^>]*)*(\s*\/)?>/is) { # open tag |
235
|
241
|
|
|
|
|
1905
|
my($tagname, $attribs_raw) = ($token =~ m!<(\S*)(.*?)(\s*/)?>!s); |
236
|
|
|
|
|
|
|
# first make attribs into a list so we can spot duplicate keys |
237
|
241
|
|
|
|
|
3082
|
my $attrib = [ |
238
|
|
|
|
|
|
|
# do double- and single- quoted attribs seperately |
239
|
|
|
|
|
|
|
$attribs_raw =~ /\s($regexps{name})\s*=\s*"([^"]*?)"/gi, |
240
|
|
|
|
|
|
|
$attribs_raw =~ /\s($regexps{name})\s*=\s*'([^']*?)'/gi |
241
|
|
|
|
|
|
|
]; |
242
|
241
|
100
|
|
|
|
328
|
if(@{$attrib} == 2 * keys %{{@{$attrib}}}) { |
|
241
|
|
|
|
|
365
|
|
|
241
|
|
|
|
|
241
|
|
|
241
|
|
|
|
|
819
|
|
243
|
240
|
|
|
|
|
305
|
$attrib = { @{$attrib} } |
|
240
|
|
|
|
|
518
|
|
244
|
1
|
|
|
|
|
15
|
} else { die("Not well-formed - duplicate attribute\n"); } |
245
|
|
|
|
|
|
|
|
246
|
|
|
|
|
|
|
# now trash any attribs that we *did* manage to parse and see |
247
|
|
|
|
|
|
|
# if there's anything left |
248
|
240
|
|
|
|
|
1884
|
$attribs_raw =~ s/\s($regexps{name})\s*=\s*"([^"]*?)"//gi; |
249
|
240
|
|
|
|
|
1388
|
$attribs_raw =~ s/\s($regexps{name})\s*=\s*'([^']*?)'//gi; |
250
|
240
|
100
|
100
|
|
|
769
|
die("Not well-formed\n$attribs_raw") if($attribs_raw =~ /\S/ || grep { / } values %{$attrib}); |
|
73
|
|
|
|
|
310
|
|
|
230
|
|
|
|
|
844
|
|
251
|
|
|
|
|
|
|
|
252
|
228
|
100
|
|
|
|
477
|
unless($params{no_entity_parsing}) { |
253
|
227
|
|
|
|
|
238
|
foreach my $key (keys %{$attrib}) { |
|
227
|
|
|
|
|
598
|
|
254
|
71
|
|
|
|
|
152
|
($attrib->{$key} = _fixentities($attrib->{$key})) =~ s/\x00//g; # get rid of CDATA marker |
255
|
|
|
|
|
|
|
} |
256
|
|
|
|
|
|
|
} |
257
|
|
|
|
|
|
|
$elem = { |
258
|
224
|
|
|
|
|
1006
|
content => [], |
259
|
|
|
|
|
|
|
name => $tagname, |
260
|
|
|
|
|
|
|
type => 'e', |
261
|
|
|
|
|
|
|
attrib => $attrib, |
262
|
|
|
|
|
|
|
parent => $elem |
263
|
|
|
|
|
|
|
}; |
264
|
224
|
|
|
|
|
266
|
push @{$elem->{parent}->{content}}, $elem; |
|
224
|
|
|
|
|
500
|
|
265
|
|
|
|
|
|
|
# now handle self-closing tags |
266
|
224
|
100
|
|
|
|
1030
|
if($token =~ /\s*\/>$/) { |
267
|
33
|
|
|
|
|
110
|
$elem->{name} =~ s/\/$//; |
268
|
33
|
|
|
|
|
260
|
$elem = delete $elem->{parent}; |
269
|
|
|
|
|
|
|
} |
270
|
|
|
|
|
|
|
} elsif($token =~ /^) { # some token taggish thing |
271
|
13
|
|
|
|
|
170
|
die("I can't handle this document\n\tat $token\n"); |
272
|
|
|
|
|
|
|
} else { # ordinary content |
273
|
105
|
|
|
|
|
173
|
$token =~ s/\x00//g; # get rid of our CDATA marker |
274
|
105
|
100
|
|
|
|
236
|
unless($params{no_entity_parsing}) { $token = _fixentities($token); } |
|
104
|
|
|
|
|
222
|
|
275
|
97
|
|
|
|
|
131
|
push @{$elem->{content}}, { content => $token, type => 't' }; |
|
97
|
|
|
|
|
461
|
|
276
|
|
|
|
|
|
|
} |
277
|
|
|
|
|
|
|
} |
278
|
52
|
50
|
|
|
|
251
|
die("Not well-formed (Duplicated parent)\n") if(exists($elem->{parent})); |
279
|
52
|
100
|
|
|
|
67
|
die("Junk after end of document\n") if($#{$elem->{content}} > 0); |
|
52
|
|
|
|
|
334
|
|
280
|
40
|
|
|
|
|
386
|
die("No elements\n") if( |
281
|
40
|
100
|
66
|
|
|
57
|
$#{$elem->{content}} == -1 || $elem->{content}->[0]->{type} ne 'e' |
282
|
|
|
|
|
|
|
); |
283
|
39
|
|
|
|
|
5255
|
return $elem->{content}; |
284
|
|
|
|
|
|
|
} |
285
|
|
|
|
|
|
|
|
286
|
|
|
|
|
|
|
sub _fixentities { |
287
|
175
|
|
|
175
|
|
322
|
my $thingy = shift; |
288
|
|
|
|
|
|
|
|
289
|
175
|
100
|
|
|
|
418
|
my $junk = ($strict_entity_parsing) ? '|.*' : ''; |
290
|
175
|
|
|
|
|
1224
|
$thingy =~ s/&((#(\d+|x[a-fA-F0-9]+);)|lt;|gt;|quot;|apos;|amp;$junk)/ |
291
|
220
|
100
|
|
|
|
1548
|
$3 ? ( |
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
292
|
|
|
|
|
|
|
substr($3, 0, 1) eq 'x' ? # using a =~ match here clobbers $3 |
293
|
|
|
|
|
|
|
chr(hex(substr($3, 1))) : # so don't "fix" it! |
294
|
|
|
|
|
|
|
chr($3) |
295
|
|
|
|
|
|
|
) : |
296
|
|
|
|
|
|
|
$1 eq 'lt;' ? '<' : |
297
|
|
|
|
|
|
|
$1 eq 'gt;' ? '>' : |
298
|
|
|
|
|
|
|
$1 eq 'apos;' ? "'" : |
299
|
|
|
|
|
|
|
$1 eq 'quot;' ? '"' : |
300
|
|
|
|
|
|
|
$1 eq 'amp;' ? '&' : |
301
|
|
|
|
|
|
|
die("Illegal ampersand or entity\n\tat $1\n") |
302
|
|
|
|
|
|
|
/ge; |
303
|
163
|
|
|
|
|
591
|
$thingy; |
304
|
|
|
|
|
|
|
} |
305
|
|
|
|
|
|
|
|
306
|
|
|
|
|
|
|
=head1 COMPATIBILITY |
307
|
|
|
|
|
|
|
|
308
|
|
|
|
|
|
|
=head2 With other modules |
309
|
|
|
|
|
|
|
|
310
|
|
|
|
|
|
|
The C function is so named because it is intended to work in a |
311
|
|
|
|
|
|
|
similar fashion to L with the L style. |
312
|
|
|
|
|
|
|
Instead of saying this: |
313
|
|
|
|
|
|
|
|
314
|
|
|
|
|
|
|
use XML::Parser; |
315
|
|
|
|
|
|
|
use XML::Parser::EasyTree; |
316
|
|
|
|
|
|
|
$XML::Parser::EasyTree::Noempty=1; |
317
|
|
|
|
|
|
|
my $p=new XML::Parser(Style=>'EasyTree'); |
318
|
|
|
|
|
|
|
my $tree=$p->parsefile('something.xml'); |
319
|
|
|
|
|
|
|
|
320
|
|
|
|
|
|
|
you would say: |
321
|
|
|
|
|
|
|
|
322
|
|
|
|
|
|
|
use XML::Tiny; |
323
|
|
|
|
|
|
|
my $tree = XML::Tiny::parsefile('something.xml'); |
324
|
|
|
|
|
|
|
|
325
|
|
|
|
|
|
|
Any valid document that can be parsed like that using XML::Tiny should |
326
|
|
|
|
|
|
|
produce identical results if you use the above example of how to use |
327
|
|
|
|
|
|
|
L. |
328
|
|
|
|
|
|
|
|
329
|
|
|
|
|
|
|
If you find a document where that is not the case, please report it as |
330
|
|
|
|
|
|
|
a bug. |
331
|
|
|
|
|
|
|
|
332
|
|
|
|
|
|
|
=head2 With perl 5.004 |
333
|
|
|
|
|
|
|
|
334
|
|
|
|
|
|
|
The module is intended to be fully compatible with every version of perl |
335
|
|
|
|
|
|
|
back to and including 5.004, and may be compatible with even older |
336
|
|
|
|
|
|
|
versions of perl 5. |
337
|
|
|
|
|
|
|
|
338
|
|
|
|
|
|
|
The lack of Unicode and friends in older perls means that XML::Tiny |
339
|
|
|
|
|
|
|
does nothing with character sets. If you have a document with a funny |
340
|
|
|
|
|
|
|
character set, then you will need to open the file in an appropriate |
341
|
|
|
|
|
|
|
mode using a character-set-friendly perl and pass the resulting file |
342
|
|
|
|
|
|
|
handle to the module. BOMs are ignored. |
343
|
|
|
|
|
|
|
|
344
|
|
|
|
|
|
|
=head2 The subset of XML that we understand |
345
|
|
|
|
|
|
|
|
346
|
|
|
|
|
|
|
=over 4 |
347
|
|
|
|
|
|
|
|
348
|
|
|
|
|
|
|
=item Element tags and attributes |
349
|
|
|
|
|
|
|
|
350
|
|
|
|
|
|
|
Including "self-closing" tags like Epie type = 'steak n kidney' /E; |
351
|
|
|
|
|
|
|
|
352
|
|
|
|
|
|
|
=item Comments |
353
|
|
|
|
|
|
|
|
354
|
|
|
|
|
|
|
Which are ignored; |
355
|
|
|
|
|
|
|
|
356
|
|
|
|
|
|
|
=item The five "core" entities |
357
|
|
|
|
|
|
|
|
358
|
|
|
|
|
|
|
ie C<&>, C<<>, C<>>, C<'> and C<">; |
359
|
|
|
|
|
|
|
|
360
|
|
|
|
|
|
|
=item Numeric entities |
361
|
|
|
|
|
|
|
|
362
|
|
|
|
|
|
|
eg C<A> and C<A>; |
363
|
|
|
|
|
|
|
|
364
|
|
|
|
|
|
|
=item CDATA |
365
|
|
|
|
|
|
|
|
366
|
|
|
|
|
|
|
This is simply turned into PCDATA before parsing. Note how this may interact |
367
|
|
|
|
|
|
|
with the various entity-handling options; |
368
|
|
|
|
|
|
|
|
369
|
|
|
|
|
|
|
=back |
370
|
|
|
|
|
|
|
|
371
|
|
|
|
|
|
|
The following parts of the XML standard are handled incorrectly or not at |
372
|
|
|
|
|
|
|
all - this is not an exhaustive list: |
373
|
|
|
|
|
|
|
|
374
|
|
|
|
|
|
|
=over 4 |
375
|
|
|
|
|
|
|
|
376
|
|
|
|
|
|
|
=item Namespaces |
377
|
|
|
|
|
|
|
|
378
|
|
|
|
|
|
|
While documents that use namespaces will be parsed just fine, there's no |
379
|
|
|
|
|
|
|
special treatment of them. Their names are preserved in element and |
380
|
|
|
|
|
|
|
attribute names like 'rdf:RDF'. |
381
|
|
|
|
|
|
|
|
382
|
|
|
|
|
|
|
=item DTDs and Schemas |
383
|
|
|
|
|
|
|
|
384
|
|
|
|
|
|
|
This is not a validating parser. declarations are ignored |
385
|
|
|
|
|
|
|
if you've not made them fatal. |
386
|
|
|
|
|
|
|
|
387
|
|
|
|
|
|
|
=item Entities and references |
388
|
|
|
|
|
|
|
|
389
|
|
|
|
|
|
|
declarations are ignored if you've not made them fatal. |
390
|
|
|
|
|
|
|
Unrecognised entities are ignored by default, as are naked & characters. |
391
|
|
|
|
|
|
|
This means that if entity parsing is enabled you won't be able to tell |
392
|
|
|
|
|
|
|
the difference between C< > and C< >. If your |
393
|
|
|
|
|
|
|
document might use any non-core entities then please consider using |
394
|
|
|
|
|
|
|
the C option, and then use something like |
395
|
|
|
|
|
|
|
L. |
396
|
|
|
|
|
|
|
|
397
|
|
|
|
|
|
|
=item Processing instructions |
398
|
|
|
|
|
|
|
|
399
|
|
|
|
|
|
|
These are ignored. |
400
|
|
|
|
|
|
|
|
401
|
|
|
|
|
|
|
=item Whitespace |
402
|
|
|
|
|
|
|
|
403
|
|
|
|
|
|
|
We do not guarantee to correctly handle leading and trailing whitespace. |
404
|
|
|
|
|
|
|
|
405
|
|
|
|
|
|
|
=item Character sets |
406
|
|
|
|
|
|
|
|
407
|
|
|
|
|
|
|
This is not practical with older versions of perl |
408
|
|
|
|
|
|
|
|
409
|
|
|
|
|
|
|
=back |
410
|
|
|
|
|
|
|
|
411
|
|
|
|
|
|
|
=head1 PHILOSOPHY and JUSTIFICATION |
412
|
|
|
|
|
|
|
|
413
|
|
|
|
|
|
|
While feedback from real users about this module has been uniformly |
414
|
|
|
|
|
|
|
positive and helpful, some people seem to take issue with this module |
415
|
|
|
|
|
|
|
because it doesn't implement every last jot and tittle of the XML |
416
|
|
|
|
|
|
|
standard and merely implements a useful subset. A very useful subset, |
417
|
|
|
|
|
|
|
as it happens, which can cope with common light-weight XML-ish tasks |
418
|
|
|
|
|
|
|
such as parsing the results of queries to the Amazon Web Services. |
419
|
|
|
|
|
|
|
Many, perhaps most, users of XML do not in fact need a full implementation |
420
|
|
|
|
|
|
|
of the standard, and are understandably reluctant to install large complex |
421
|
|
|
|
|
|
|
pieces of software which have many dependencies. In fact, when they |
422
|
|
|
|
|
|
|
realise what installing and using a full implementation entails, they |
423
|
|
|
|
|
|
|
quite often don't *want* it. Another class of users, people |
424
|
|
|
|
|
|
|
distributing applications, often can not rely on users being able to |
425
|
|
|
|
|
|
|
install modules from the CPAN, or even having tools like make or a shell |
426
|
|
|
|
|
|
|
available. XML::Tiny exists for those people. |
427
|
|
|
|
|
|
|
|
428
|
|
|
|
|
|
|
=head1 BUGS and FEEDBACK |
429
|
|
|
|
|
|
|
|
430
|
|
|
|
|
|
|
I welcome feedback about my code, including constructive criticism. |
431
|
|
|
|
|
|
|
Bug reports should be made using L or by email, |
432
|
|
|
|
|
|
|
and should include the smallest possible chunk of code, along with |
433
|
|
|
|
|
|
|
any necessary XML data, which demonstrates the bug. Ideally, this |
434
|
|
|
|
|
|
|
will be in the form of a file which I can drop in to the module's |
435
|
|
|
|
|
|
|
test suite. Please note that such files must work in perl 5.004. |
436
|
|
|
|
|
|
|
|
437
|
|
|
|
|
|
|
=head1 SEE ALSO |
438
|
|
|
|
|
|
|
|
439
|
|
|
|
|
|
|
=over 4 |
440
|
|
|
|
|
|
|
|
441
|
|
|
|
|
|
|
=item For more capable XML parsers: |
442
|
|
|
|
|
|
|
|
443
|
|
|
|
|
|
|
L |
444
|
|
|
|
|
|
|
|
445
|
|
|
|
|
|
|
L |
446
|
|
|
|
|
|
|
|
447
|
|
|
|
|
|
|
L |
448
|
|
|
|
|
|
|
|
449
|
|
|
|
|
|
|
=item The requirements for a module to be Tiny |
450
|
|
|
|
|
|
|
|
451
|
|
|
|
|
|
|
L |
452
|
|
|
|
|
|
|
|
453
|
|
|
|
|
|
|
=back |
454
|
|
|
|
|
|
|
|
455
|
|
|
|
|
|
|
=head1 AUTHOR, COPYRIGHT and LICENCE |
456
|
|
|
|
|
|
|
|
457
|
|
|
|
|
|
|
David Cantrell EFE |
458
|
|
|
|
|
|
|
|
459
|
|
|
|
|
|
|
Thanks to David Romano for some compatibility patches for Ye Aunciente Perl; |
460
|
|
|
|
|
|
|
|
461
|
|
|
|
|
|
|
to Matt Knecht and David Romano for prodding me to support attributes, |
462
|
|
|
|
|
|
|
and to Matt for providing code to implement it in a quick n dirty minimal |
463
|
|
|
|
|
|
|
kind of way; |
464
|
|
|
|
|
|
|
|
465
|
|
|
|
|
|
|
to the people on L and elsewhere who have been kind |
466
|
|
|
|
|
|
|
enough to point out ways it could be improved; |
467
|
|
|
|
|
|
|
|
468
|
|
|
|
|
|
|
to Sergio Fanchiotti for pointing out a bug in handling self-closing tags, |
469
|
|
|
|
|
|
|
for reporting another bug that I introduced when fixing the first one, |
470
|
|
|
|
|
|
|
and for providing a patch to improve error reporting; |
471
|
|
|
|
|
|
|
|
472
|
|
|
|
|
|
|
to 'Corion' for finding a bug with localised filehandles and providing a fix; |
473
|
|
|
|
|
|
|
|
474
|
|
|
|
|
|
|
to Diab Jerius for spotting that element and attribute names can begin |
475
|
|
|
|
|
|
|
with an underscore; |
476
|
|
|
|
|
|
|
|
477
|
|
|
|
|
|
|
to Nick Dumas for finding a bug when attribs have their quoting character |
478
|
|
|
|
|
|
|
in CDATA, and providing a patch; |
479
|
|
|
|
|
|
|
|
480
|
|
|
|
|
|
|
to Mathieu Longtin for pointing out that BOMs exist. |
481
|
|
|
|
|
|
|
|
482
|
|
|
|
|
|
|
Copyright 2007-2010 David Cantrell Edavid@cantrell.org.ukE |
483
|
|
|
|
|
|
|
|
484
|
|
|
|
|
|
|
This software is free-as-in-speech software, and may be used, |
485
|
|
|
|
|
|
|
distributed, and modified under the terms of either the GNU |
486
|
|
|
|
|
|
|
General Public Licence version 2 or the Artistic Licence. It's |
487
|
|
|
|
|
|
|
up to you which one you use. The full text of the licences can |
488
|
|
|
|
|
|
|
be found in the files GPL2.txt and ARTISTIC.txt, respectively. |
489
|
|
|
|
|
|
|
|
490
|
|
|
|
|
|
|
=head1 CONSPIRACY |
491
|
|
|
|
|
|
|
|
492
|
|
|
|
|
|
|
This module is also free-as-in-mason software. |
493
|
|
|
|
|
|
|
|
494
|
|
|
|
|
|
|
=cut |
495
|
|
|
|
|
|
|
|
496
|
|
|
|
|
|
|
'zero'; |