| line | stmt | bran | cond | sub | pod | time | code | 
| 1 |  |  |  |  |  |  | package App::Rsnapshot::XML::Tiny; | 
| 2 |  |  |  |  |  |  |  | 
| 3 | 13 |  |  | 13 |  | 34091 | use strict; | 
|  | 13 |  |  |  |  | 24 |  | 
|  | 13 |  |  |  |  | 535 |  | 
| 4 |  |  |  |  |  |  |  | 
| 5 |  |  |  |  |  |  | require Exporter; | 
| 6 |  |  |  |  |  |  |  | 
| 7 | 13 |  |  | 13 |  | 66 | use vars qw($VERSION @EXPORT_OK @ISA); | 
|  | 13 |  |  |  |  | 20 |  | 
|  | 13 |  |  |  |  | 21856 |  | 
| 8 |  |  |  |  |  |  |  | 
| 9 |  |  |  |  |  |  | $VERSION = '1.12'; | 
| 10 |  |  |  |  |  |  | @EXPORT_OK = qw(parsefile); | 
| 11 |  |  |  |  |  |  | @ISA = qw(Exporter); | 
| 12 |  |  |  |  |  |  |  | 
| 13 |  |  |  |  |  |  | # localising prevents the warningness leaking out of this module | 
| 14 |  |  |  |  |  |  | local $^W = 1;    # can't use warnings as that's a 5.6-ism | 
| 15 |  |  |  |  |  |  |  | 
| 16 |  |  |  |  |  |  | =head1 NAME | 
| 17 |  |  |  |  |  |  |  | 
| 18 |  |  |  |  |  |  | App::Rsnapshot::XML::Tiny - simple lightweight parser for a subset of XML | 
| 19 |  |  |  |  |  |  |  | 
| 20 |  |  |  |  |  |  | =head1 DESCRIPTION | 
| 21 |  |  |  |  |  |  |  | 
| 22 |  |  |  |  |  |  | App::Rsnapshot::XML::Tiny is a simple lightweight parser for a subset of XML | 
| 23 |  |  |  |  |  |  |  | 
| 24 |  |  |  |  |  |  | =head1 SYNOPSIS | 
| 25 |  |  |  |  |  |  |  | 
| 26 |  |  |  |  |  |  | use App::Rsnapshot::XML::Tiny qw(parsefile); | 
| 27 |  |  |  |  |  |  | open($xmlfile, 'something.xml); | 
| 28 |  |  |  |  |  |  | my $document = parsefile($xmlfile); | 
| 29 |  |  |  |  |  |  |  | 
| 30 |  |  |  |  |  |  | This will leave C<$document> looking something like this: | 
| 31 |  |  |  |  |  |  |  | 
| 32 |  |  |  |  |  |  | [ | 
| 33 |  |  |  |  |  |  | { | 
| 34 |  |  |  |  |  |  | type   => 'e', | 
| 35 |  |  |  |  |  |  | attrib => { ... }, | 
| 36 |  |  |  |  |  |  | name   => 'rootelementname', | 
| 37 |  |  |  |  |  |  | content => [ | 
| 38 |  |  |  |  |  |  | ... | 
| 39 |  |  |  |  |  |  | more elements and text content | 
| 40 |  |  |  |  |  |  | ... | 
| 41 |  |  |  |  |  |  | ] | 
| 42 |  |  |  |  |  |  | } | 
| 43 |  |  |  |  |  |  | ] | 
| 44 |  |  |  |  |  |  |  | 
| 45 |  |  |  |  |  |  | =head1 FUNCTIONS | 
| 46 |  |  |  |  |  |  |  | 
| 47 |  |  |  |  |  |  | The C function is optionally exported.  By default nothing is | 
| 48 |  |  |  |  |  |  | exported.  There is no objecty interface. | 
| 49 |  |  |  |  |  |  |  | 
| 50 |  |  |  |  |  |  | =head2 parsefile | 
| 51 |  |  |  |  |  |  |  | 
| 52 |  |  |  |  |  |  | This takes at least one parameter, optionally more.  The compulsory | 
| 53 |  |  |  |  |  |  | parameter may be: | 
| 54 |  |  |  |  |  |  |  | 
| 55 |  |  |  |  |  |  | =over 4 | 
| 56 |  |  |  |  |  |  |  | 
| 57 |  |  |  |  |  |  | =item a filename | 
| 58 |  |  |  |  |  |  |  | 
| 59 |  |  |  |  |  |  | in which case the file is read and parsed; | 
| 60 |  |  |  |  |  |  |  | 
| 61 |  |  |  |  |  |  | =item a string of XML | 
| 62 |  |  |  |  |  |  |  | 
| 63 |  |  |  |  |  |  | in which case it is read and parsed.  How do we tell if we've got a string | 
| 64 |  |  |  |  |  |  | or a filename?  If it begins with C<_TINY_XML_STRING_> then it's | 
| 65 |  |  |  |  |  |  | a string.  That prefix is, of course, ignored when it comes to actually | 
| 66 |  |  |  |  |  |  | parsing the data.  This is intended primarily for use by wrappers which | 
| 67 |  |  |  |  |  |  | want to retain compatibility with Ye Aunciente Perl.  Normal users who want | 
| 68 |  |  |  |  |  |  | to pass in a string would be expected to use L. | 
| 69 |  |  |  |  |  |  |  | 
| 70 |  |  |  |  |  |  | =item a glob-ref or IO::Handle object | 
| 71 |  |  |  |  |  |  |  | 
| 72 |  |  |  |  |  |  | in which case again, the file is read and parsed. | 
| 73 |  |  |  |  |  |  |  | 
| 74 |  |  |  |  |  |  | =back | 
| 75 |  |  |  |  |  |  |  | 
| 76 |  |  |  |  |  |  | The former case is for compatibility with older perls, but makes no | 
| 77 |  |  |  |  |  |  | attempt to properly deal with character sets.  If you open a file in a | 
| 78 |  |  |  |  |  |  | character-set-friendly way and then pass in a handle / object, then the | 
| 79 |  |  |  |  |  |  | method should Do The Right Thing as it only ever works with character | 
| 80 |  |  |  |  |  |  | data. | 
| 81 |  |  |  |  |  |  |  | 
| 82 |  |  |  |  |  |  | The remaining parameters are a list of key/value pairs to make a hash of | 
| 83 |  |  |  |  |  |  | options: | 
| 84 |  |  |  |  |  |  |  | 
| 85 |  |  |  |  |  |  | =over 4 | 
| 86 |  |  |  |  |  |  |  | 
| 87 |  |  |  |  |  |  | =item fatal_declarations | 
| 88 |  |  |  |  |  |  |  | 
| 89 |  |  |  |  |  |  | If set to true, E!ENTITY...E and E!DOCTYPE...E declarations | 
| 90 |  |  |  |  |  |  | in the document | 
| 91 |  |  |  |  |  |  | are fatal errors - otherwise they are *ignored*. | 
| 92 |  |  |  |  |  |  |  | 
| 93 |  |  |  |  |  |  | =item no_entity_parsing | 
| 94 |  |  |  |  |  |  |  | 
| 95 |  |  |  |  |  |  | If set to true, the five built-in entities are passed through unparsed. | 
| 96 |  |  |  |  |  |  | Note that special characters in CDATA and attributes may have been turned | 
| 97 |  |  |  |  |  |  | into C<&>, C<<> and friends. | 
| 98 |  |  |  |  |  |  |  | 
| 99 |  |  |  |  |  |  | =item strict_entity_parsing | 
| 100 |  |  |  |  |  |  |  | 
| 101 |  |  |  |  |  |  | If set to true, any unrecognised entities (ie, those outside the core five | 
| 102 |  |  |  |  |  |  | plus numeric entities) cause a fatal error.  If you set both this and | 
| 103 |  |  |  |  |  |  | C (but why would you do that?) then the latter takes | 
| 104 |  |  |  |  |  |  | precedence. | 
| 105 |  |  |  |  |  |  |  | 
| 106 |  |  |  |  |  |  | Obviously, if you want to maximise compliance with the XML spec, you should | 
| 107 |  |  |  |  |  |  | turn on fatal_declarations and strict_entity_parsing. | 
| 108 |  |  |  |  |  |  |  | 
| 109 |  |  |  |  |  |  | =back | 
| 110 |  |  |  |  |  |  |  | 
| 111 |  |  |  |  |  |  | The function returns a structure describing the document.  This contains | 
| 112 |  |  |  |  |  |  | one or more nodes, each being either an 'element' node or a 'text' mode. | 
| 113 |  |  |  |  |  |  | The structure is an arrayref which contains a single 'element' node which | 
| 114 |  |  |  |  |  |  | represents the document entity.  The arrayref is redundant, but exists for | 
| 115 |  |  |  |  |  |  | compatibility with L. | 
| 116 |  |  |  |  |  |  |  | 
| 117 |  |  |  |  |  |  | Element nodes are hashrefs with the following keys: | 
| 118 |  |  |  |  |  |  |  | 
| 119 |  |  |  |  |  |  | =over 4 | 
| 120 |  |  |  |  |  |  |  | 
| 121 |  |  |  |  |  |  | =item type | 
| 122 |  |  |  |  |  |  |  | 
| 123 |  |  |  |  |  |  | The node's type, represented by the letter 'e'. | 
| 124 |  |  |  |  |  |  |  | 
| 125 |  |  |  |  |  |  | =item name | 
| 126 |  |  |  |  |  |  |  | 
| 127 |  |  |  |  |  |  | The element's name. | 
| 128 |  |  |  |  |  |  |  | 
| 129 |  |  |  |  |  |  | =item attrib | 
| 130 |  |  |  |  |  |  |  | 
| 131 |  |  |  |  |  |  | A hashref containing the element's attributes, as key/value pairs where | 
| 132 |  |  |  |  |  |  | the key is the attribute name. | 
| 133 |  |  |  |  |  |  |  | 
| 134 |  |  |  |  |  |  | =item content | 
| 135 |  |  |  |  |  |  |  | 
| 136 |  |  |  |  |  |  | An arrayref of the element's contents.  The array's contents is a list of | 
| 137 |  |  |  |  |  |  | nodes, in the order they were encountered in the document. | 
| 138 |  |  |  |  |  |  |  | 
| 139 |  |  |  |  |  |  | =back | 
| 140 |  |  |  |  |  |  |  | 
| 141 |  |  |  |  |  |  | Text nodes are hashrefs with the following keys: | 
| 142 |  |  |  |  |  |  |  | 
| 143 |  |  |  |  |  |  | =over 4 | 
| 144 |  |  |  |  |  |  |  | 
| 145 |  |  |  |  |  |  | =item type | 
| 146 |  |  |  |  |  |  |  | 
| 147 |  |  |  |  |  |  | The node's type, represented by the letter 't'. | 
| 148 |  |  |  |  |  |  |  | 
| 149 |  |  |  |  |  |  | =item content | 
| 150 |  |  |  |  |  |  |  | 
| 151 |  |  |  |  |  |  | A scalar piece of text. | 
| 152 |  |  |  |  |  |  |  | 
| 153 |  |  |  |  |  |  | =back | 
| 154 |  |  |  |  |  |  |  | 
| 155 |  |  |  |  |  |  | =cut | 
| 156 |  |  |  |  |  |  |  | 
| 157 |  |  |  |  |  |  | my %regexps = ( | 
| 158 |  |  |  |  |  |  | name => '[:a-z][\\w:\\.-]*' | 
| 159 |  |  |  |  |  |  | ); | 
| 160 |  |  |  |  |  |  |  | 
| 161 |  |  |  |  |  |  | my $strict_entity_parsing; # mmm, global. don't worry, parsefile sets it | 
| 162 |  |  |  |  |  |  | # explicitly every time | 
| 163 |  |  |  |  |  |  | sub parsefile { | 
| 164 | 212 |  |  | 212 | 1 | 39290 | my($arg, %params) = @_; | 
| 165 | 212 |  |  |  |  | 701 | my($file, $elem) = ('', { content => [] }); | 
| 166 | 212 |  |  |  |  | 525 | local $/; # sluuuuurp | 
| 167 |  |  |  |  |  |  |  | 
| 168 | 212 |  |  |  |  | 281 | $strict_entity_parsing = $params{strict_entity_parsing}; | 
| 169 |  |  |  |  |  |  |  | 
| 170 | 212 | 100 |  |  |  | 604 | if(ref($arg) eq '') { # we were passed a filename or a string | 
| 171 | 210 | 100 |  |  |  | 408 | if($arg =~ /^_TINY_XML_STRING_/) { # it's a string | 
| 172 | 22 |  |  |  |  | 227 | $file = substr($arg, 17); | 
| 173 |  |  |  |  |  |  | } else { | 
| 174 | 188 |  |  |  |  | 332 | local *FH; | 
| 175 | 188 | 100 |  |  |  | 8015 | open(FH, $arg) || die(__PACKAGE__."::parsefile: Can't open $arg\n"); | 
| 176 | 187 |  |  |  |  | 3729 | $file = ; | 
| 177 | 187 |  |  |  |  | 1795 | close(FH); | 
| 178 |  |  |  |  |  |  | } | 
| 179 | 2 |  |  |  |  | 50 | } else { $file = <$arg>; } | 
| 180 | 211 | 100 | 66 |  |  | 1555 | die("No elements\n") if (!defined($file) || $file =~ /^\s*$/); | 
| 181 |  |  |  |  |  |  |  | 
| 182 |  |  |  |  |  |  | # illegal low-ASCII chars | 
| 183 | 209 | 100 |  |  |  | 649 | die("Not well-formed\n") if($file =~ /[\x00-\x08\x0b\x0c\x0e-\x1f]/); | 
| 184 |  |  |  |  |  |  |  | 
| 185 |  |  |  |  |  |  | # turn CDATA into PCDATA | 
| 186 | 204 |  |  |  |  | 324 | $file =~ s{}{ | 
| 187 | 8 |  |  |  |  | 22 | $_ = $1.chr(0);          # this makes sure that empty CDATAs become | 
| 188 | 8 |  |  |  |  | 14 | s/([&<>])/               # the empty string and aren't just thrown away. | 
| 189 | 4 | 100 |  |  |  | 19 | $1 eq '&' ? '&' : | 
|  |  | 100 |  |  |  |  |  | 
| 190 |  |  |  |  |  |  | $1 eq '<' ? '<'  : | 
| 191 |  |  |  |  |  |  | '>' | 
| 192 |  |  |  |  |  |  | /eg; | 
| 193 | 8 |  |  |  |  | 23 | $_; | 
| 194 |  |  |  |  |  |  | }egs; | 
| 195 |  |  |  |  |  |  |  | 
| 196 | 217 | 100 |  |  |  | 1074 | die("Not well-formed\n") if( | 
| 197 |  |  |  |  |  |  | $file =~ /]]>/ ||                          # ]]> not delimiting CDATA | 
| 198 |  |  |  |  |  |  | $file =~ //s ||               # ---> can't end a comment | 
| 199 | 204 | 100 | 100 |  |  | 9368 | grep { $_ && /--/ } ($file =~ /^\s+||\s+$/gs) # -- in comm | 
|  |  |  | 100 |  |  |  |  | 
| 200 |  |  |  |  |  |  | ); | 
| 201 |  |  |  |  |  |  |  | 
| 202 |  |  |  |  |  |  | # strip leading/trailing whitespace and comments (which don't nest - phew!) | 
| 203 | 194 |  |  |  |  | 7392 | $file =~ s/^\s+||\s+$//gs; | 
| 204 |  |  |  |  |  |  |  | 
| 205 |  |  |  |  |  |  | # turn quoted > in attribs into > | 
| 206 |  |  |  |  |  |  | # double- and single-quoted attrib values get done seperately | 
| 207 | 194 |  |  |  |  | 12232 | while($file =~ s/($regexps{name}\s*=\s*"[^"]*)>([^"]*")/$1>$2/gsi) {} | 
| 208 | 194 |  |  |  |  | 12934 | while($file =~ s/($regexps{name}\s*=\s*'[^']*)>([^']*')/$1>$2/gsi) {} | 
| 209 |  |  |  |  |  |  |  | 
| 210 | 194 | 100 | 100 |  |  | 1449 | if($params{fatal_declarations} && $file =~ / | 
| 211 | 111 |  |  |  |  | 1054 | die("I can't handle this document\n"); | 
| 212 |  |  |  |  |  |  | } | 
| 213 |  |  |  |  |  |  |  | 
| 214 |  |  |  |  |  |  | # ignore empty tokens/whitespace tokens | 
| 215 | 83 | 100 |  |  |  | 1197 | foreach my $token (grep { length && $_ !~ /^\s+$/ } | 
|  | 1290 |  |  |  |  | 11686 |  | 
| 216 |  |  |  |  |  |  | split(/(<[^>]+>)/, $file)) { | 
| 217 | 749 | 100 | 100 |  |  | 1318440 | if( | 
|  |  | 100 |  |  |  |  |  | 
|  |  | 100 |  |  |  |  |  | 
|  |  | 100 |  |  |  |  |  | 
| 218 |  |  |  |  |  |  | $token =~ /<\?$regexps{name}.*?\?>/is ||  # PI | 
| 219 |  |  |  |  |  |  | $token =~ /^ | 
| 220 |  |  |  |  |  |  | ) { | 
| 221 | 6 |  |  |  |  | 18 | next; | 
| 222 |  |  |  |  |  |  | } elsif($token =~ m!^($regexps{name})\s*>!i) {     # close tag | 
| 223 | 241 | 100 |  |  |  | 863 | die("Not well-formed\n\tat $token\n") if($elem->{name} ne $1); | 
| 224 | 238 |  |  |  |  | 633 | $elem = delete $elem->{parent}; | 
| 225 |  |  |  |  |  |  | } elsif($token =~ /^<$regexps{name}(\s[^>]*)*(\s*\/)?>/is) {   # open tag | 
| 226 | 351 |  |  |  |  | 2940 | my($tagname, $attribs_raw) = ($token =~ m!<(\S*)(.*?)(\s*/)?>!s); | 
| 227 |  |  |  |  |  |  | # first make attribs into a list so we can spot duplicate keys | 
| 228 | 351 |  |  |  |  | 4389 | my $attrib  = [ | 
| 229 |  |  |  |  |  |  | # do double- and single- quoted attribs seperately | 
| 230 |  |  |  |  |  |  | $attribs_raw =~ /\s($regexps{name})\s*=\s*"([^"]*?)"/gi, | 
| 231 |  |  |  |  |  |  | $attribs_raw =~ /\s($regexps{name})\s*=\s*'([^']*?)'/gi | 
| 232 |  |  |  |  |  |  | ]; | 
| 233 | 351 | 100 |  |  |  | 501 | if(@{$attrib} == 2 * keys %{{@{$attrib}}}) { | 
|  | 351 |  |  |  |  | 559 |  | 
|  | 351 |  |  |  |  | 378 |  | 
|  | 351 |  |  |  |  | 1373 |  | 
| 234 | 350 |  |  |  |  | 404 | $attrib = { @{$attrib} } | 
|  | 350 |  |  |  |  | 718 |  | 
| 235 | 1 |  |  |  |  | 12 | } else { die("Not well-formed - duplicate attribute\n"); } | 
| 236 |  |  |  |  |  |  |  | 
| 237 |  |  |  |  |  |  | # now trash any attribs that we *did* manage to parse and see | 
| 238 |  |  |  |  |  |  | # if there's anything left | 
| 239 | 350 |  |  |  |  | 3491 | $attribs_raw =~ s/\s($regexps{name})\s*=\s*"([^"]*?)"//gi; | 
| 240 | 350 |  |  |  |  | 2130 | $attribs_raw =~ s/\s($regexps{name})\s*=\s*'([^']*?)'//gi; | 
| 241 | 350 | 100 | 100 |  |  | 1277 | die("Not well-formed\n$attribs_raw") if($attribs_raw =~ /\S/ || grep { / } values %{$attrib}); | 
|  | 154 |  |  |  |  | 625 |  | 
|  | 340 |  |  |  |  | 1130 |  | 
| 242 |  |  |  |  |  |  |  | 
| 243 | 338 | 100 |  |  |  | 731 | unless($params{no_entity_parsing}) { | 
| 244 | 337 |  |  |  |  | 369 | foreach my $key (keys %{$attrib}) { | 
|  | 337 |  |  |  |  | 1679 |  | 
| 245 | 152 |  |  |  |  | 342 | $attrib->{$key} = _fixentities($attrib->{$key}) | 
| 246 |  |  |  |  |  |  | } | 
| 247 |  |  |  |  |  |  | } | 
| 248 |  |  |  |  |  |  | $elem = { | 
| 249 | 334 |  |  |  |  | 1639 | content => [], | 
| 250 |  |  |  |  |  |  | name => $tagname, | 
| 251 |  |  |  |  |  |  | type => 'e', | 
| 252 |  |  |  |  |  |  | attrib => $attrib, | 
| 253 |  |  |  |  |  |  | parent => $elem | 
| 254 |  |  |  |  |  |  | }; | 
| 255 | 334 |  |  |  |  | 400 | push @{$elem->{parent}->{content}}, $elem; | 
|  | 334 |  |  |  |  | 736 |  | 
| 256 |  |  |  |  |  |  | # now handle self-closing tags | 
| 257 | 334 | 100 |  |  |  | 1670 | if($token =~ /\s*\/>$/) { | 
| 258 | 70 |  |  |  |  | 160 | $elem->{name} =~ s/\/$//; | 
| 259 | 70 |  |  |  |  | 509 | $elem = delete $elem->{parent}; | 
| 260 |  |  |  |  |  |  | } | 
| 261 |  |  |  |  |  |  | } elsif($token =~ /^) { # some token taggish thing | 
| 262 | 13 |  |  |  |  | 145 | die("I can't handle this document\n\tat $token\n"); | 
| 263 |  |  |  |  |  |  | } else {                          # ordinary content | 
| 264 | 138 |  |  |  |  | 204 | $token =~ s/\x00//g; # get rid of our CDATA marker | 
| 265 | 138 | 100 |  |  |  | 318 | unless($params{no_entity_parsing}) { $token = _fixentities($token); } | 
|  | 137 |  |  |  |  | 271 |  | 
| 266 | 130 |  |  |  |  | 209 | push @{$elem->{content}}, { content => $token, type => 't' }; | 
|  | 130 |  |  |  |  | 696 |  | 
| 267 |  |  |  |  |  |  | } | 
| 268 |  |  |  |  |  |  | } | 
| 269 | 42 | 50 |  |  |  | 286 | die("Not well-formed\n") if(exists($elem->{parent})); | 
| 270 | 42 | 100 |  |  |  | 92 | die("Junk after end of document\n") if($#{$elem->{content}} > 0); | 
|  | 42 |  |  |  |  | 235 |  | 
| 271 | 30 |  |  |  |  | 218 | die("No elements\n") if( | 
| 272 | 30 | 100 | 66 |  |  | 61 | $#{$elem->{content}} == -1 || $elem->{content}->[0]->{type} ne 'e' | 
| 273 |  |  |  |  |  |  | ); | 
| 274 | 29 |  |  |  |  | 6139 | return $elem->{content}; | 
| 275 |  |  |  |  |  |  | } | 
| 276 |  |  |  |  |  |  |  | 
| 277 |  |  |  |  |  |  | sub _fixentities { | 
| 278 | 289 |  |  | 289 |  | 412 | my $thingy = shift; | 
| 279 |  |  |  |  |  |  |  | 
| 280 | 289 | 100 |  |  |  | 507 | my $junk = ($strict_entity_parsing) ? '|.*' : ''; | 
| 281 | 289 |  |  |  |  | 1506 | $thingy =~ s/&((#(\d+|x[a-fA-F0-9]+);)|lt;|gt;|quot;|apos;|amp;$junk)/ | 
| 282 | 219 | 100 |  |  |  | 1531 | $3 ? ( | 
|  |  | 100 |  |  |  |  |  | 
|  |  | 100 |  |  |  |  |  | 
|  |  | 100 |  |  |  |  |  | 
|  |  | 100 |  |  |  |  |  | 
|  |  | 100 |  |  |  |  |  | 
|  |  | 100 |  |  |  |  |  | 
| 283 |  |  |  |  |  |  | substr($3, 0, 1) eq 'x' ?     # using a =~ match here clobbers $3 | 
| 284 |  |  |  |  |  |  | chr(hex(substr($3, 1))) : # so don't "fix" it! | 
| 285 |  |  |  |  |  |  | chr($3) | 
| 286 |  |  |  |  |  |  | ) : | 
| 287 |  |  |  |  |  |  | $1 eq 'lt;'   ? '<' : | 
| 288 |  |  |  |  |  |  | $1 eq 'gt;'   ? '>' : | 
| 289 |  |  |  |  |  |  | $1 eq 'apos;' ? "'" : | 
| 290 |  |  |  |  |  |  | $1 eq 'quot;' ? '"' : | 
| 291 |  |  |  |  |  |  | $1 eq 'amp;'  ? '&' : | 
| 292 |  |  |  |  |  |  | die("Illegal ampersand or entity\n\tat $1\n") | 
| 293 |  |  |  |  |  |  | /ge; | 
| 294 | 277 |  |  |  |  | 929 | $thingy; | 
| 295 |  |  |  |  |  |  | } | 
| 296 |  |  |  |  |  |  |  | 
| 297 |  |  |  |  |  |  | =head1 COMPATIBILITY | 
| 298 |  |  |  |  |  |  |  | 
| 299 |  |  |  |  |  |  | =head2 With other modules | 
| 300 |  |  |  |  |  |  |  | 
| 301 |  |  |  |  |  |  | The C function is so named because it is intended to work in a | 
| 302 |  |  |  |  |  |  | similar fashion to L with the L style. | 
| 303 |  |  |  |  |  |  | Instead of saying this: | 
| 304 |  |  |  |  |  |  |  | 
| 305 |  |  |  |  |  |  | use XML::Parser; | 
| 306 |  |  |  |  |  |  | use XML::Parser::EasyTree; | 
| 307 |  |  |  |  |  |  | $XML::Parser::EasyTree::Noempty=1; | 
| 308 |  |  |  |  |  |  | my $p=new XML::Parser(Style=>'EasyTree'); | 
| 309 |  |  |  |  |  |  | my $tree=$p->parsefile('something.xml'); | 
| 310 |  |  |  |  |  |  |  | 
| 311 |  |  |  |  |  |  | you would say: | 
| 312 |  |  |  |  |  |  |  | 
| 313 |  |  |  |  |  |  | use App::Rsnapshot::XML::Tiny; | 
| 314 |  |  |  |  |  |  | my $tree = App::Rsnapshot::XML::Tiny::parsefile('something.xml'); | 
| 315 |  |  |  |  |  |  |  | 
| 316 |  |  |  |  |  |  | Any valid document that can be parsed like that using App::Rsnapshot::XML::Tiny should | 
| 317 |  |  |  |  |  |  | produce identical results if you use the above example of how to use | 
| 318 |  |  |  |  |  |  | L. | 
| 319 |  |  |  |  |  |  |  | 
| 320 |  |  |  |  |  |  | If you find a document where that is not the case, please report it as | 
| 321 |  |  |  |  |  |  | a bug. | 
| 322 |  |  |  |  |  |  |  | 
| 323 |  |  |  |  |  |  | =head2 With perl 5.004 | 
| 324 |  |  |  |  |  |  |  | 
| 325 |  |  |  |  |  |  | The module is intended to be fully compatible with every version of perl | 
| 326 |  |  |  |  |  |  | back to and including 5.004, and may be compatible with even older | 
| 327 |  |  |  |  |  |  | versions of perl 5. | 
| 328 |  |  |  |  |  |  |  | 
| 329 |  |  |  |  |  |  | The lack of Unicode and friends in older perls means that App::Rsnapshot::XML::Tiny | 
| 330 |  |  |  |  |  |  | does nothing with character sets.  If you have a document with a funny | 
| 331 |  |  |  |  |  |  | character set, then you will need to open the file in an appropriate | 
| 332 |  |  |  |  |  |  | mode using a character-set-friendly perl and pass the resulting file | 
| 333 |  |  |  |  |  |  | handle to the module. | 
| 334 |  |  |  |  |  |  |  | 
| 335 |  |  |  |  |  |  | =head2 The subset of XML that we understand | 
| 336 |  |  |  |  |  |  |  | 
| 337 |  |  |  |  |  |  | =over 4 | 
| 338 |  |  |  |  |  |  |  | 
| 339 |  |  |  |  |  |  | =item Element tags and attributes | 
| 340 |  |  |  |  |  |  |  | 
| 341 |  |  |  |  |  |  | Including "self-closing" tags like Epie type = 'steak n kidney' /E; | 
| 342 |  |  |  |  |  |  |  | 
| 343 |  |  |  |  |  |  | =item Comments | 
| 344 |  |  |  |  |  |  |  | 
| 345 |  |  |  |  |  |  | Which are ignored; | 
| 346 |  |  |  |  |  |  |  | 
| 347 |  |  |  |  |  |  | =item The five "core" entities | 
| 348 |  |  |  |  |  |  |  | 
| 349 |  |  |  |  |  |  | ie C<&>, C<<>, C<>>, C<'> and C<">; | 
| 350 |  |  |  |  |  |  |  | 
| 351 |  |  |  |  |  |  | =item Numeric entities | 
| 352 |  |  |  |  |  |  |  | 
| 353 |  |  |  |  |  |  | eg C<A> and C<A>; | 
| 354 |  |  |  |  |  |  |  | 
| 355 |  |  |  |  |  |  | =item CDATA | 
| 356 |  |  |  |  |  |  |  | 
| 357 |  |  |  |  |  |  | This is simply turned into PCDATA before parsing.  Note how this may interact | 
| 358 |  |  |  |  |  |  | with the various entity-handling options; | 
| 359 |  |  |  |  |  |  |  | 
| 360 |  |  |  |  |  |  | =back | 
| 361 |  |  |  |  |  |  |  | 
| 362 |  |  |  |  |  |  | The following parts of the XML standard are handled incorrectly or not at | 
| 363 |  |  |  |  |  |  | all - this is not an exhaustive list: | 
| 364 |  |  |  |  |  |  |  | 
| 365 |  |  |  |  |  |  | =over 4 | 
| 366 |  |  |  |  |  |  |  | 
| 367 |  |  |  |  |  |  | =item Namespaces | 
| 368 |  |  |  |  |  |  |  | 
| 369 |  |  |  |  |  |  | While documents that use namespaces will be parsed just fine, there's no | 
| 370 |  |  |  |  |  |  | special treatment of them.  Their names are preserved in element and | 
| 371 |  |  |  |  |  |  | attribute names like 'rdf:RDF'. | 
| 372 |  |  |  |  |  |  |  | 
| 373 |  |  |  |  |  |  | =item DTDs and Schemas | 
| 374 |  |  |  |  |  |  |  | 
| 375 |  |  |  |  |  |  | This is not a validating parser.   declarations are ignored | 
| 376 |  |  |  |  |  |  | if you've not made them fatal. | 
| 377 |  |  |  |  |  |  |  | 
| 378 |  |  |  |  |  |  | =item Entities and references | 
| 379 |  |  |  |  |  |  |  | 
| 380 |  |  |  |  |  |  | declarations are ignored if you've not made them fatal. | 
| 381 |  |  |  |  |  |  | Unrecognised entities are ignored by default, as are naked & characters. | 
| 382 |  |  |  |  |  |  | This means that if entity parsing is enabled you won't be able to tell | 
| 383 |  |  |  |  |  |  | the difference between C< > and C< >.  If your | 
| 384 |  |  |  |  |  |  | document might use any non-core entities then please consider using | 
| 385 |  |  |  |  |  |  | the C option, and then use something like | 
| 386 |  |  |  |  |  |  | L. | 
| 387 |  |  |  |  |  |  |  | 
| 388 |  |  |  |  |  |  | =item Processing instructions | 
| 389 |  |  |  |  |  |  |  | 
| 390 |  |  |  |  |  |  | These are ignored. | 
| 391 |  |  |  |  |  |  |  | 
| 392 |  |  |  |  |  |  | =item Whitespace | 
| 393 |  |  |  |  |  |  |  | 
| 394 |  |  |  |  |  |  | We do not guarantee to correctly handle leading and trailing whitespace. | 
| 395 |  |  |  |  |  |  |  | 
| 396 |  |  |  |  |  |  | =item Character sets | 
| 397 |  |  |  |  |  |  |  | 
| 398 |  |  |  |  |  |  | This is not practical with older versions of perl | 
| 399 |  |  |  |  |  |  |  | 
| 400 |  |  |  |  |  |  | =back | 
| 401 |  |  |  |  |  |  |  | 
| 402 |  |  |  |  |  |  | =head1 PHILOSOPHY and JUSTIFICATION | 
| 403 |  |  |  |  |  |  |  | 
| 404 |  |  |  |  |  |  | While feedback from real users about this module has been uniformly | 
| 405 |  |  |  |  |  |  | positive and helpful, some people seem to take issue with this module | 
| 406 |  |  |  |  |  |  | because it doesn't implement every last jot and tittle of the XML | 
| 407 |  |  |  |  |  |  | standard and merely implements a useful subset.  A very useful subset, | 
| 408 |  |  |  |  |  |  | as it happens, which can cope with common light-weight XML-ish tasks | 
| 409 |  |  |  |  |  |  | such as parsing the results of queries to the Amazon Web Services. | 
| 410 |  |  |  |  |  |  | Many, perhaps most, users of XML do not in fact need a full implementation | 
| 411 |  |  |  |  |  |  | of the standard, and are understandably reluctant to install large complex | 
| 412 |  |  |  |  |  |  | pieces of software which have many dependencies.  In fact, when they | 
| 413 |  |  |  |  |  |  | realise what installing and using a full implementation entails, they | 
| 414 |  |  |  |  |  |  | quite often don't *want* it.  Another class of users, people | 
| 415 |  |  |  |  |  |  | distributing applications, often can not rely on users being able to | 
| 416 |  |  |  |  |  |  | install modules from the CPAN, or even having tools like make or a shell | 
| 417 |  |  |  |  |  |  | available.  App::Rsnapshot::XML::Tiny exists for those people. | 
| 418 |  |  |  |  |  |  |  | 
| 419 |  |  |  |  |  |  | =head1 BUGS and FEEDBACK | 
| 420 |  |  |  |  |  |  |  | 
| 421 |  |  |  |  |  |  | I welcome feedback about my code, including constructive criticism. | 
| 422 |  |  |  |  |  |  | Bug reports should be made using L or by email, | 
| 423 |  |  |  |  |  |  | and should include the smallest possible chunk of code, along with | 
| 424 |  |  |  |  |  |  | any necessary XML data, which demonstrates the bug.  Ideally, this | 
| 425 |  |  |  |  |  |  | will be in the form of a file which I can drop in to the module's | 
| 426 |  |  |  |  |  |  | test suite.  Please note that such files must work in perl 5.004. | 
| 427 |  |  |  |  |  |  |  | 
| 428 |  |  |  |  |  |  | If you are feeling particularly generous you can encourage me in my | 
| 429 |  |  |  |  |  |  | open source endeavours by buying me something from my wishlist: | 
| 430 |  |  |  |  |  |  | L | 
| 431 |  |  |  |  |  |  |  | 
| 432 |  |  |  |  |  |  | =head1 SEE ALSO | 
| 433 |  |  |  |  |  |  |  | 
| 434 |  |  |  |  |  |  | =over 4 | 
| 435 |  |  |  |  |  |  |  | 
| 436 |  |  |  |  |  |  | =item For more capable XML parsers: | 
| 437 |  |  |  |  |  |  |  | 
| 438 |  |  |  |  |  |  | L | 
| 439 |  |  |  |  |  |  |  | 
| 440 |  |  |  |  |  |  | L | 
| 441 |  |  |  |  |  |  |  | 
| 442 |  |  |  |  |  |  | =item The requirements for a module to be Tiny | 
| 443 |  |  |  |  |  |  |  | 
| 444 |  |  |  |  |  |  | L | 
| 445 |  |  |  |  |  |  |  | 
| 446 |  |  |  |  |  |  | =back | 
| 447 |  |  |  |  |  |  |  | 
| 448 |  |  |  |  |  |  | =head1 AUTHOR | 
| 449 |  |  |  |  |  |  |  | 
| 450 |  |  |  |  |  |  | David Cantrell EFE | 
| 451 |  |  |  |  |  |  |  | 
| 452 |  |  |  |  |  |  | Thanks to David Romano for some compatibility patches for Ye Aunciente Perl; | 
| 453 |  |  |  |  |  |  |  | 
| 454 |  |  |  |  |  |  | to Matt Knecht and David Romano for prodding me to support attributes, | 
| 455 |  |  |  |  |  |  | and to Matt for providing code to implement it in a quick n dirty minimal | 
| 456 |  |  |  |  |  |  | kind of way; | 
| 457 |  |  |  |  |  |  |  | 
| 458 |  |  |  |  |  |  | to the people on L and elsewhere who have been kind | 
| 459 |  |  |  |  |  |  | enough to point out ways it could be improved; | 
| 460 |  |  |  |  |  |  |  | 
| 461 |  |  |  |  |  |  | to Sergio Fanchiotti for pointing out a bug in handling self-closing tags, | 
| 462 |  |  |  |  |  |  | and for reporting another bug that I introduced when fixing the first one; | 
| 463 |  |  |  |  |  |  |  | 
| 464 |  |  |  |  |  |  | to 'Corion' for finding a bug with localised filehandles and providing a fix. | 
| 465 |  |  |  |  |  |  |  | 
| 466 |  |  |  |  |  |  | =head1 COPYRIGHT and LICENCE | 
| 467 |  |  |  |  |  |  |  | 
| 468 |  |  |  |  |  |  | Copyright 2007 David Cantrell | 
| 469 |  |  |  |  |  |  |  | 
| 470 |  |  |  |  |  |  | This module is free-as-in-speech software, and may be used, distributed, | 
| 471 |  |  |  |  |  |  | and modified under the same terms as Perl itself. | 
| 472 |  |  |  |  |  |  |  | 
| 473 |  |  |  |  |  |  | =head1 CONSPIRACY | 
| 474 |  |  |  |  |  |  |  | 
| 475 |  |  |  |  |  |  | This module is also free-as-in-mason software. | 
| 476 |  |  |  |  |  |  |  | 
| 477 |  |  |  |  |  |  | =cut | 
| 478 |  |  |  |  |  |  |  | 
| 479 |  |  |  |  |  |  | 'zero'; |