line |
stmt |
bran |
cond |
sub |
pod |
time |
code |
1
|
|
|
|
|
|
|
package XML::Filter::Dispatcher ; |
2
|
|
|
|
|
|
|
|
3
|
|
|
|
|
|
|
$VERSION = 0.52; |
4
|
|
|
|
|
|
|
|
5
|
|
|
|
|
|
|
=head1 NAME |
6
|
|
|
|
|
|
|
|
7
|
|
|
|
|
|
|
XML::Filter::Dispatcher - Path based event dispatching with DOM support |
8
|
|
|
|
|
|
|
|
9
|
|
|
|
|
|
|
=head1 SYNOPSIS |
10
|
|
|
|
|
|
|
|
11
|
|
|
|
|
|
|
use XML::Filter::Dispatcher qw( :all ); |
12
|
|
|
|
|
|
|
|
13
|
|
|
|
|
|
|
my $f = XML::Filter::Dispatcher->new( |
14
|
|
|
|
|
|
|
Rules => [ |
15
|
|
|
|
|
|
|
'foo' => \&handle_foo_start_tag, |
16
|
|
|
|
|
|
|
'@bar' => \&handle_bar_attr, |
17
|
|
|
|
|
|
|
|
18
|
|
|
|
|
|
|
## Send any elts and their contents to $handler |
19
|
|
|
|
|
|
|
'snarf//self::node()' => $handler, |
20
|
|
|
|
|
|
|
|
21
|
|
|
|
|
|
|
## Print the text of all elements |
22
|
|
|
|
|
|
|
'description' |
23
|
|
|
|
|
|
|
=> [ 'string()' => sub { push @out, xvalue } ], |
24
|
|
|
|
|
|
|
], |
25
|
|
|
|
|
|
|
|
26
|
|
|
|
|
|
|
Vars => { |
27
|
|
|
|
|
|
|
"id" => [ string => "12a" ], |
28
|
|
|
|
|
|
|
}, |
29
|
|
|
|
|
|
|
); |
30
|
|
|
|
|
|
|
|
31
|
|
|
|
|
|
|
=head1 DESCRIPTION |
32
|
|
|
|
|
|
|
|
33
|
|
|
|
|
|
|
B: Beta code alert. |
34
|
|
|
|
|
|
|
|
35
|
|
|
|
|
|
|
A SAX2 filter that dispatches SAX events based on "EventPath" patterns |
36
|
|
|
|
|
|
|
as the SAX events arrive. The SAX events are not buffered or converted |
37
|
|
|
|
|
|
|
to an in-memory document representation like a DOM tree. This provides |
38
|
|
|
|
|
|
|
for low lag operation because the actions associated with each pattern |
39
|
|
|
|
|
|
|
are executed as soon as possible, usually in an element's |
40
|
|
|
|
|
|
|
C event method. |
41
|
|
|
|
|
|
|
|
42
|
|
|
|
|
|
|
This differs from traditional XML pattern matching tools like |
43
|
|
|
|
|
|
|
XPath and XSLT (which is XPath-based) which require the entire |
44
|
|
|
|
|
|
|
document to be built in memory (as a "DOM tree") before queries can be |
45
|
|
|
|
|
|
|
executed. In SAX terms, this means that they have to build a DOM tree |
46
|
|
|
|
|
|
|
from SAX events and delay pattern matching until the C |
47
|
|
|
|
|
|
|
event method is called. |
48
|
|
|
|
|
|
|
|
49
|
|
|
|
|
|
|
=head2 Rules |
50
|
|
|
|
|
|
|
|
51
|
|
|
|
|
|
|
A rule is composed of a pattern and an action. Each |
52
|
|
|
|
|
|
|
XML::Filter::Dispatcher instance has a list of rules. As SAX events are |
53
|
|
|
|
|
|
|
received, the rules are evaluated and one rule's action is executed. If |
54
|
|
|
|
|
|
|
more than one rule matches an event, the rule with the highest score |
55
|
|
|
|
|
|
|
wins; by default a rule's score is its position in the rule list, so |
56
|
|
|
|
|
|
|
the last matching rule the list will be acted on. |
57
|
|
|
|
|
|
|
|
58
|
|
|
|
|
|
|
A simple rule list looks like: |
59
|
|
|
|
|
|
|
|
60
|
|
|
|
|
|
|
Rules => [ |
61
|
|
|
|
|
|
|
'a' => \&handle_a, |
62
|
|
|
|
|
|
|
'b' => \&handle_b, |
63
|
|
|
|
|
|
|
], |
64
|
|
|
|
|
|
|
|
65
|
|
|
|
|
|
|
=head3 Actions |
66
|
|
|
|
|
|
|
|
67
|
|
|
|
|
|
|
There are several types of actions: |
68
|
|
|
|
|
|
|
|
69
|
|
|
|
|
|
|
=over |
70
|
|
|
|
|
|
|
|
71
|
|
|
|
|
|
|
=item * |
72
|
|
|
|
|
|
|
|
73
|
|
|
|
|
|
|
CODE reference |
74
|
|
|
|
|
|
|
|
75
|
|
|
|
|
|
|
Rules => [ |
76
|
|
|
|
|
|
|
'a' => \&foo, |
77
|
|
|
|
|
|
|
'b' => sub { print "got a !\n" }, |
78
|
|
|
|
|
|
|
], |
79
|
|
|
|
|
|
|
|
80
|
|
|
|
|
|
|
=item * |
81
|
|
|
|
|
|
|
|
82
|
|
|
|
|
|
|
SAX handler |
83
|
|
|
|
|
|
|
|
84
|
|
|
|
|
|
|
Handler => $h, ## A downstream handler |
85
|
|
|
|
|
|
|
Rules => [ |
86
|
|
|
|
|
|
|
'a' => "Handler", |
87
|
|
|
|
|
|
|
'b' => $h2, ## Another handler |
88
|
|
|
|
|
|
|
], |
89
|
|
|
|
|
|
|
|
90
|
|
|
|
|
|
|
=item * |
91
|
|
|
|
|
|
|
|
92
|
|
|
|
|
|
|
undef |
93
|
|
|
|
|
|
|
|
94
|
|
|
|
|
|
|
Rules => [ |
95
|
|
|
|
|
|
|
'//node()' => $h, |
96
|
|
|
|
|
|
|
'b' => undef, |
97
|
|
|
|
|
|
|
], |
98
|
|
|
|
|
|
|
|
99
|
|
|
|
|
|
|
Useful for preventing other actions for some events |
100
|
|
|
|
|
|
|
|
101
|
|
|
|
|
|
|
=item * |
102
|
|
|
|
|
|
|
|
103
|
|
|
|
|
|
|
Perl code |
104
|
|
|
|
|
|
|
|
105
|
|
|
|
|
|
|
Rules => [ |
106
|
|
|
|
|
|
|
'b' => \q{print "got a !\n"}, |
107
|
|
|
|
|
|
|
], |
108
|
|
|
|
|
|
|
|
109
|
|
|
|
|
|
|
Lower overhead than a CODE reference. |
110
|
|
|
|
|
|
|
|
111
|
|
|
|
|
|
|
B. |
112
|
|
|
|
|
|
|
|
113
|
|
|
|
|
|
|
=back |
114
|
|
|
|
|
|
|
|
115
|
|
|
|
|
|
|
|
116
|
|
|
|
|
|
|
=head2 EventPath Patterns |
117
|
|
|
|
|
|
|
|
118
|
|
|
|
|
|
|
Note: this section describes EventPath and discusses differences between |
119
|
|
|
|
|
|
|
EventPath and XPath. If you are not familiar with XPath you may want |
120
|
|
|
|
|
|
|
to skim those bits; they're provided for the benefit of people coming |
121
|
|
|
|
|
|
|
from an XPath background but hopefully don't hinder others. A working |
122
|
|
|
|
|
|
|
knowledge of SAX is necessary for the advanced bits. |
123
|
|
|
|
|
|
|
|
124
|
|
|
|
|
|
|
EventPath patterns may match the document, elements, attributes, text |
125
|
|
|
|
|
|
|
nodes, comments, processing instructions, and (not yet implemented) |
126
|
|
|
|
|
|
|
namespace nodes. Patterns like this are referred to as "location paths" |
127
|
|
|
|
|
|
|
and resemble Unix file paths or URIs in appearance and functionality. |
128
|
|
|
|
|
|
|
|
129
|
|
|
|
|
|
|
Location paths describe a location (or set of locations) in the document |
130
|
|
|
|
|
|
|
much the same way a filespec describes a location in a filesystem. The |
131
|
|
|
|
|
|
|
path C could refer to a directory named C on a filesystem or |
132
|
|
|
|
|
|
|
a set of CcE> elements in an XML document. In either case, |
133
|
|
|
|
|
|
|
the path indicates that C must be a child of C, C must be |
134
|
|
|
|
|
|
|
's, and is a root level entity. More examples later. |
135
|
|
|
|
|
|
|
|
136
|
|
|
|
|
|
|
EventPath patterns may also extract strings, numbers and boolean values |
137
|
|
|
|
|
|
|
from a document. These are called "expression patterns" and are only |
138
|
|
|
|
|
|
|
said to match when the values they extract are "true" according to XPath |
139
|
|
|
|
|
|
|
semantics (XPath truth-ness differs from Perl truth-ness, see |
140
|
|
|
|
|
|
|
EventPath Truth below). Expression patterns look |
141
|
|
|
|
|
|
|
like C or C, and if the result |
142
|
|
|
|
|
|
|
is true, the action will be executed and the result can be retrieved |
143
|
|
|
|
|
|
|
using the L method. |
144
|
|
|
|
|
|
|
|
145
|
|
|
|
|
|
|
TODO: rename xvalue to be ep_result or something. |
146
|
|
|
|
|
|
|
|
147
|
|
|
|
|
|
|
We cover patterns in more detail below, starting with some examples. |
148
|
|
|
|
|
|
|
|
149
|
|
|
|
|
|
|
If you'd like to get some experience with pattern matching in an |
150
|
|
|
|
|
|
|
interactive XPath web site, there's a really good XPath/XSLT based |
151
|
|
|
|
|
|
|
tutorial and lab at |
152
|
|
|
|
|
|
|
L. |
153
|
|
|
|
|
|
|
|
154
|
|
|
|
|
|
|
=head2 Actions |
155
|
|
|
|
|
|
|
|
156
|
|
|
|
|
|
|
Two kinds of actions are supported: Perl subroutine calls and |
157
|
|
|
|
|
|
|
dispatching events to other SAX processors. When a pattern matches, the |
158
|
|
|
|
|
|
|
associated action |
159
|
|
|
|
|
|
|
|
160
|
|
|
|
|
|
|
=head2 Examples |
161
|
|
|
|
|
|
|
|
162
|
|
|
|
|
|
|
This is perhaps best introduced by some examples. Here's a routine that runs a |
163
|
|
|
|
|
|
|
rather knuckleheaded document through a dispatcher: |
164
|
|
|
|
|
|
|
|
165
|
|
|
|
|
|
|
use XML::SAX::Machines qw( Pipeline ); |
166
|
|
|
|
|
|
|
|
167
|
|
|
|
|
|
|
sub run { Pipeline( shift )->parse_string( <
|
168
|
|
|
|
|
|
|
|
169
|
|
|
|
|
|
|
|
170
|
|
|
|
|
|
|
Bully |
171
|
|
|
|
|
|
|
|
172
|
|
|
|
|
|
|
|
173
|
|
|
|
|
|
|
Klutz |
174
|
|
|
|
|
|
|
|
175
|
|
|
|
|
|
|
Middleman |
176
|
|
|
|
|
|
|
|
177
|
|
|
|
|
|
|
|
178
|
|
|
|
|
|
|
|
179
|
|
|
|
|
|
|
Fool |
180
|
|
|
|
|
|
|
|
181
|
|
|
|
|
|
|
|
182
|
|
|
|
|
|
|
|
183
|
|
|
|
|
|
|
|
184
|
|
|
|
|
|
|
|
185
|
|
|
|
|
|
|
|
186
|
|
|
|
|
|
|
|
187
|
|
|
|
|
|
|
XML_END |
188
|
|
|
|
|
|
|
|
189
|
|
|
|
|
|
|
=over |
190
|
|
|
|
|
|
|
|
191
|
|
|
|
|
|
|
=item Counting Stooges |
192
|
|
|
|
|
|
|
|
193
|
|
|
|
|
|
|
Let's count the number of stooge characters in that document. To do that, we'd |
194
|
|
|
|
|
|
|
like a rule that fires on almost all CstoogeE> elements: |
195
|
|
|
|
|
|
|
|
196
|
|
|
|
|
|
|
my $count; |
197
|
|
|
|
|
|
|
|
198
|
|
|
|
|
|
|
run( |
199
|
|
|
|
|
|
|
XML::Filter::Dispatcher->new( |
200
|
|
|
|
|
|
|
Rules => [ |
201
|
|
|
|
|
|
|
'stooge' => sub { ++$count }, |
202
|
|
|
|
|
|
|
], |
203
|
|
|
|
|
|
|
) |
204
|
|
|
|
|
|
|
); |
205
|
|
|
|
|
|
|
|
206
|
|
|
|
|
|
|
print "$count\n"; ## 7 |
207
|
|
|
|
|
|
|
|
208
|
|
|
|
|
|
|
Hmmm, that's one too many: it's picking up on Shemp twice since the document |
209
|
|
|
|
|
|
|
shows that Shemp had two periods of stoogedom. The second node has a |
210
|
|
|
|
|
|
|
convenient C attribute we can use to ignore the duplicate. |
211
|
|
|
|
|
|
|
|
212
|
|
|
|
|
|
|
We can ignore the duplicate element by adding a "predicate" |
213
|
|
|
|
|
|
|
expression to the pattern to accept only those elements with no C |
214
|
|
|
|
|
|
|
attribute. Changing that rule to |
215
|
|
|
|
|
|
|
|
216
|
|
|
|
|
|
|
'stooge[not(@repeat)]' => ... |
217
|
|
|
|
|
|
|
|
218
|
|
|
|
|
|
|
or even the more pedantic |
219
|
|
|
|
|
|
|
|
220
|
|
|
|
|
|
|
'stooge[not(@repeat) or not(@repeat = "yes")]' => ... |
221
|
|
|
|
|
|
|
|
222
|
|
|
|
|
|
|
yields the expected answer (6). |
223
|
|
|
|
|
|
|
|
224
|
|
|
|
|
|
|
=item Hairstyles and Attitudes |
225
|
|
|
|
|
|
|
|
226
|
|
|
|
|
|
|
Now let's try to figure out the hairstyles the stooges wore. To extract |
227
|
|
|
|
|
|
|
just the names of hairstyles, we could do something like: |
228
|
|
|
|
|
|
|
|
229
|
|
|
|
|
|
|
my %styles; |
230
|
|
|
|
|
|
|
|
231
|
|
|
|
|
|
|
run( |
232
|
|
|
|
|
|
|
XML::Filter::Dispatcher->new( |
233
|
|
|
|
|
|
|
Rules => [ |
234
|
|
|
|
|
|
|
'stooge' => [ |
235
|
|
|
|
|
|
|
'string( @hairstyle )' => sub { $styles{xvalue()} = 1 }, |
236
|
|
|
|
|
|
|
], |
237
|
|
|
|
|
|
|
], |
238
|
|
|
|
|
|
|
) |
239
|
|
|
|
|
|
|
); |
240
|
|
|
|
|
|
|
|
241
|
|
|
|
|
|
|
print join( ", ", sort keys %styles ), "\n"; |
242
|
|
|
|
|
|
|
|
243
|
|
|
|
|
|
|
which prints "bald, bowl cut, bushy, mop". That rule extracts the text |
244
|
|
|
|
|
|
|
of each C attribute and the C returns it. |
245
|
|
|
|
|
|
|
|
246
|
|
|
|
|
|
|
The text contents of elements like CattitudesE> can also be |
247
|
|
|
|
|
|
|
sussed out by using a rule like: |
248
|
|
|
|
|
|
|
|
249
|
|
|
|
|
|
|
'string( attitude )' => sub { $styles{xvalue()} = 1 }, |
250
|
|
|
|
|
|
|
|
251
|
|
|
|
|
|
|
which prints "Bully, Fool, Klutz, Middleman". |
252
|
|
|
|
|
|
|
|
253
|
|
|
|
|
|
|
Finally, we might want to correlate hairstyles and attitudes by using |
254
|
|
|
|
|
|
|
a rule like: |
255
|
|
|
|
|
|
|
|
256
|
|
|
|
|
|
|
my %styles; |
257
|
|
|
|
|
|
|
|
258
|
|
|
|
|
|
|
run( |
259
|
|
|
|
|
|
|
XML::Filter::Dispatcher->new( |
260
|
|
|
|
|
|
|
Rules => [ |
261
|
|
|
|
|
|
|
'stooge' => [ |
262
|
|
|
|
|
|
|
'concat(@hairstyle,"=>",attitude)' => sub { |
263
|
|
|
|
|
|
|
$styles{$1} = $2 if xvalue() =~ /(.+)=>(.+)/; |
264
|
|
|
|
|
|
|
}, |
265
|
|
|
|
|
|
|
], |
266
|
|
|
|
|
|
|
], |
267
|
|
|
|
|
|
|
) |
268
|
|
|
|
|
|
|
); |
269
|
|
|
|
|
|
|
|
270
|
|
|
|
|
|
|
print map "$_ => $styles{$_}\n", sort keys %styles; |
271
|
|
|
|
|
|
|
|
272
|
|
|
|
|
|
|
which prints: |
273
|
|
|
|
|
|
|
|
274
|
|
|
|
|
|
|
bald => Fool |
275
|
|
|
|
|
|
|
bowl cut => Bully |
276
|
|
|
|
|
|
|
bushy => Middleman |
277
|
|
|
|
|
|
|
mop => Klutz |
278
|
|
|
|
|
|
|
|
279
|
|
|
|
|
|
|
=back |
280
|
|
|
|
|
|
|
|
281
|
|
|
|
|
|
|
=head3 Examples that need to be written |
282
|
|
|
|
|
|
|
|
283
|
|
|
|
|
|
|
=over |
284
|
|
|
|
|
|
|
|
285
|
|
|
|
|
|
|
=item * Examples for accumulating data |
286
|
|
|
|
|
|
|
|
287
|
|
|
|
|
|
|
=item * Advanced pattern matching examples |
288
|
|
|
|
|
|
|
|
289
|
|
|
|
|
|
|
=back |
290
|
|
|
|
|
|
|
|
291
|
|
|
|
|
|
|
=cut |
292
|
|
|
|
|
|
|
|
293
|
|
|
|
|
|
|
=head2 Sending Trees and Events to Other SAX Handlers |
294
|
|
|
|
|
|
|
|
295
|
|
|
|
|
|
|
When a blessed object C<$handler> is provided as an action for a rule: |
296
|
|
|
|
|
|
|
|
297
|
|
|
|
|
|
|
my $foo = XML::Handler::Foo->new(); |
298
|
|
|
|
|
|
|
my $d = XML::Filter::Dispatcher->new( |
299
|
|
|
|
|
|
|
Rules => [ |
300
|
|
|
|
|
|
|
'foo' => $handler, |
301
|
|
|
|
|
|
|
], |
302
|
|
|
|
|
|
|
Handler => $h, |
303
|
|
|
|
|
|
|
); |
304
|
|
|
|
|
|
|
|
305
|
|
|
|
|
|
|
the selected SAX events are sent to C<$handler>. |
306
|
|
|
|
|
|
|
|
307
|
|
|
|
|
|
|
=head3 Element Forwarding |
308
|
|
|
|
|
|
|
|
309
|
|
|
|
|
|
|
If the event is selected is a C or C |
310
|
|
|
|
|
|
|
event and it is selected without using the C or |
311
|
|
|
|
|
|
|
C axes, then the handler (C<$foo>) replaces the |
312
|
|
|
|
|
|
|
existing handler of the dispatcher (C<$h>) until after the corresponding |
313
|
|
|
|
|
|
|
C event is received. |
314
|
|
|
|
|
|
|
|
315
|
|
|
|
|
|
|
This causes the entire element (CfooE>) to be sent to the |
316
|
|
|
|
|
|
|
temporary handler (C<$foo>). In the example, each CfooE> |
317
|
|
|
|
|
|
|
element will be sent to C<$foo> as a separate document, so if |
318
|
|
|
|
|
|
|
(whitespace shown as underscores) |
319
|
|
|
|
|
|
|
|
320
|
|
|
|
|
|
|
|
321
|
|
|
|
|
|
|
____.... |
322
|
|
|
|
|
|
|
____.... |
323
|
|
|
|
|
|
|
____.... |
324
|
|
|
|
|
|
|
|
325
|
|
|
|
|
|
|
|
326
|
|
|
|
|
|
|
is fed in to C<$d>, then C<$foo> will receive 3 separate |
327
|
|
|
|
|
|
|
|
328
|
|
|
|
|
|
|
... |
329
|
|
|
|
|
|
|
|
330
|
|
|
|
|
|
|
documents (C and C events are emitted |
331
|
|
|
|
|
|
|
as necessary) and C<$h> will receive a single document without any |
332
|
|
|
|
|
|
|
CfooE> elements: |
333
|
|
|
|
|
|
|
|
334
|
|
|
|
|
|
|
|
335
|
|
|
|
|
|
|
____ |
336
|
|
|
|
|
|
|
____ |
337
|
|
|
|
|
|
|
____ |
338
|
|
|
|
|
|
|
|
339
|
|
|
|
|
|
|
|
340
|
|
|
|
|
|
|
This can be useful for parsing large XML files in small chunks, often in |
341
|
|
|
|
|
|
|
conjunction with L or |
342
|
|
|
|
|
|
|
L. |
343
|
|
|
|
|
|
|
|
344
|
|
|
|
|
|
|
But what if you don't want C<$foo> to see three separate documents? |
345
|
|
|
|
|
|
|
What if you're excerpting chunks of a document to create another |
346
|
|
|
|
|
|
|
document? This can be done by telling the dispatcher to emit the main |
347
|
|
|
|
|
|
|
document to C<$foo> and using rules with an action of C to elide |
348
|
|
|
|
|
|
|
the events that are not wanted. This setup: |
349
|
|
|
|
|
|
|
|
350
|
|
|
|
|
|
|
my $foo = XML::Handler::Foo->new(); |
351
|
|
|
|
|
|
|
my $d = XML::Filter::Dispatcher->new( |
352
|
|
|
|
|
|
|
Rules => [ |
353
|
|
|
|
|
|
|
'/' => $foo, |
354
|
|
|
|
|
|
|
'bar' => undef, |
355
|
|
|
|
|
|
|
'foo' => $foo, |
356
|
|
|
|
|
|
|
], |
357
|
|
|
|
|
|
|
Handler => $h, |
358
|
|
|
|
|
|
|
); |
359
|
|
|
|
|
|
|
|
360
|
|
|
|
|
|
|
, when fed this document: |
361
|
|
|
|
|
|
|
|
362
|
|
|
|
|
|
|
|
363
|
|
|
|
|
|
|
__hork |
364
|
|
|
|
|
|
|
__ |
365
|
|
|
|
|
|
|
__.... |
366
|
|
|
|
|
|
|
__.... |
367
|
|
|
|
|
|
|
__.... |
368
|
|
|
|
|
|
|
__ |
369
|
|
|
|
|
|
|
__ |
370
|
|
|
|
|
|
|
__ |
371
|
|
|
|
|
|
|
|
372
|
|
|
|
|
|
|
|
373
|
|
|
|
|
|
|
results in C<$foo> receiving a single document of input looking like |
374
|
|
|
|
|
|
|
this: |
375
|
|
|
|
|
|
|
|
376
|
|
|
|
|
|
|
|
377
|
|
|
|
|
|
|
__ |
378
|
|
|
|
|
|
|
__.... |
379
|
|
|
|
|
|
|
__.... |
380
|
|
|
|
|
|
|
__.... |
381
|
|
|
|
|
|
|
__ |
382
|
|
|
|
|
|
|
|
383
|
|
|
|
|
|
|
|
384
|
|
|
|
|
|
|
XML::Filter::Dispatcher keeps track of each handler and sends |
385
|
|
|
|
|
|
|
C and C at the appropriate times, so |
386
|
|
|
|
|
|
|
the CfooE> elements are "hoisted" out of the CbarE> |
387
|
|
|
|
|
|
|
element in this example without any untoward C<..._document()> events. |
388
|
|
|
|
|
|
|
|
389
|
|
|
|
|
|
|
B: support forwarding to multiple documents at a time. At the |
390
|
|
|
|
|
|
|
present, using multiple handlers for the same event is not supported. |
391
|
|
|
|
|
|
|
|
392
|
|
|
|
|
|
|
=head3 Discrete Event Forwarding |
393
|
|
|
|
|
|
|
|
394
|
|
|
|
|
|
|
B: At the moment, selecting and forwarding individual events is |
395
|
|
|
|
|
|
|
not supported. When it is, any events other than those covered above |
396
|
|
|
|
|
|
|
will be forwarded individually |
397
|
|
|
|
|
|
|
|
398
|
|
|
|
|
|
|
=head2 Tracing |
399
|
|
|
|
|
|
|
|
400
|
|
|
|
|
|
|
XML::Filter::Dispatcher checks when it is first loaded to see if |
401
|
|
|
|
|
|
|
L is loaded. If so, it will emit tracing |
402
|
|
|
|
|
|
|
messages. Typical use looks like |
403
|
|
|
|
|
|
|
|
404
|
|
|
|
|
|
|
perl -d:Devel::TraceSAX script_using_x_f_dispatcher |
405
|
|
|
|
|
|
|
|
406
|
|
|
|
|
|
|
If you are C |
407
|
|
|
|
|
|
|
loaded before XML::Filter::Dispatcher. |
408
|
|
|
|
|
|
|
|
409
|
|
|
|
|
|
|
TODO: Allow tracing to be enabled/disabled independantly of Devel::TraceSAX. |
410
|
|
|
|
|
|
|
|
411
|
|
|
|
|
|
|
=head2 Namespace Support |
412
|
|
|
|
|
|
|
|
413
|
|
|
|
|
|
|
XML::Filter::Dispatcher offers namespace support in matching and by |
414
|
|
|
|
|
|
|
providing functions like local-name(). If the documents you are |
415
|
|
|
|
|
|
|
processing don't use namespaces, or you only care about elements and |
416
|
|
|
|
|
|
|
attributes in the default namespace (ie without a "foo:" namespace |
417
|
|
|
|
|
|
|
prefix), then you need not worry about engaging |
418
|
|
|
|
|
|
|
XML::Filter::Dispatcher's namespace support. You do need it if your |
419
|
|
|
|
|
|
|
patterns contain the C construct (that C<*> is literal). |
420
|
|
|
|
|
|
|
|
421
|
|
|
|
|
|
|
To specify the namespaces, pass in an option like |
422
|
|
|
|
|
|
|
|
423
|
|
|
|
|
|
|
Namespaces => { |
424
|
|
|
|
|
|
|
"" => "uri0", ## Default namespace |
425
|
|
|
|
|
|
|
prefix1 => "uri1", |
426
|
|
|
|
|
|
|
prefix2 => "uri2", |
427
|
|
|
|
|
|
|
}, |
428
|
|
|
|
|
|
|
|
429
|
|
|
|
|
|
|
Then use C and C whereever necessary in patterns. |
430
|
|
|
|
|
|
|
|
431
|
|
|
|
|
|
|
A missing prefix on an element always maps to the default namespace URI, |
432
|
|
|
|
|
|
|
which is "" by default. Attributes are treated likewise, though this |
433
|
|
|
|
|
|
|
is probably a bug. |
434
|
|
|
|
|
|
|
|
435
|
|
|
|
|
|
|
If your patterns contain prefixes (like the C in C), and |
436
|
|
|
|
|
|
|
you don't provide a Namespaces option, then the element names will |
437
|
|
|
|
|
|
|
silently be matched literally as "foo:bar", whether or not the source |
438
|
|
|
|
|
|
|
document declares namespaces. B
|
439
|
|
|
|
|
|
|
much user confusion>. |
440
|
|
|
|
|
|
|
|
441
|
|
|
|
|
|
|
XML::Filter::Dispatcher follows the XPath specification rather literally |
442
|
|
|
|
|
|
|
and does not allow C<:*>, which you might think would match all nodes in |
443
|
|
|
|
|
|
|
the default namespace. To do this, ass a prefixe for the default |
444
|
|
|
|
|
|
|
namespace URI: |
445
|
|
|
|
|
|
|
|
446
|
|
|
|
|
|
|
Namespaces => { |
447
|
|
|
|
|
|
|
"" => "uri0", ## Default namespace |
448
|
|
|
|
|
|
|
"default" => "uri0", ## Default namespace |
449
|
|
|
|
|
|
|
prefix1 => "uri1", |
450
|
|
|
|
|
|
|
prefix2 => "uri2", |
451
|
|
|
|
|
|
|
}, |
452
|
|
|
|
|
|
|
|
453
|
|
|
|
|
|
|
then use "default:*" to match it. |
454
|
|
|
|
|
|
|
|
455
|
|
|
|
|
|
|
B: Currently, all rules must exist in the same namespace |
456
|
|
|
|
|
|
|
context. This will be changed when I need to change it (contact me |
457
|
|
|
|
|
|
|
if you need it changed). The current idear is to allow a special |
458
|
|
|
|
|
|
|
function "Namespaces( { .... }, @rules )" that enables a temporary |
459
|
|
|
|
|
|
|
namespace context, although abbreviated forms might be possible. |
460
|
|
|
|
|
|
|
|
461
|
|
|
|
|
|
|
=head2 EventPath Dialect |
462
|
|
|
|
|
|
|
|
463
|
|
|
|
|
|
|
"EventPath" patterns are that large subset of XPath patterns that can be |
464
|
|
|
|
|
|
|
run in a SAX environment without a DOM. There are a few crucial |
465
|
|
|
|
|
|
|
differences between the environments that EventPath and XPath each |
466
|
|
|
|
|
|
|
operate in. |
467
|
|
|
|
|
|
|
|
468
|
|
|
|
|
|
|
XPath operates on a tree of "nodes" where each entity in an XML document |
469
|
|
|
|
|
|
|
has only one corresponding node. The tree metaphor used in XPath has a |
470
|
|
|
|
|
|
|
literal representation in memory. For instance, an element |
471
|
|
|
|
|
|
|
CfooE> is represented by a single node which contains other |
472
|
|
|
|
|
|
|
nodes. |
473
|
|
|
|
|
|
|
|
474
|
|
|
|
|
|
|
EventPath operates on a series of events instead of a tree of nodes. |
475
|
|
|
|
|
|
|
For instance elements, which are represented by nodes in DOM trees, are |
476
|
|
|
|
|
|
|
represented by two event method calls, C and |
477
|
|
|
|
|
|
|
C. This means that EventPath patterns may match in a |
478
|
|
|
|
|
|
|
C method or an C method, or even both if you try |
479
|
|
|
|
|
|
|
hard enough. |
480
|
|
|
|
|
|
|
|
481
|
|
|
|
|
|
|
The only times an EventPath pattern will match in an |
482
|
|
|
|
|
|
|
C method are when the pattern refers to an element's contents |
483
|
|
|
|
|
|
|
or it uses the XXXX function (described below) to do so |
484
|
|
|
|
|
|
|
intentionally. |
485
|
|
|
|
|
|
|
|
486
|
|
|
|
|
|
|
The tree metaphor is used to arrange and describe the |
487
|
|
|
|
|
|
|
relationships between events. In the DOM trees an XPath engine operates |
488
|
|
|
|
|
|
|
on, a document or an element is represented by a single entity, called a |
489
|
|
|
|
|
|
|
node. In the event streams that EventPath operates on, documents and |
490
|
|
|
|
|
|
|
element |
491
|
|
|
|
|
|
|
|
492
|
|
|
|
|
|
|
=head3 Why EventPath and not XPath? |
493
|
|
|
|
|
|
|
|
494
|
|
|
|
|
|
|
EventPath is not a standard of any kind, but XPath can't cope with |
495
|
|
|
|
|
|
|
situations where there is no DOM and there are some features that |
496
|
|
|
|
|
|
|
EventPath need (start_element() vs. end_element() processing for |
497
|
|
|
|
|
|
|
example) that are not compatible with XPath. |
498
|
|
|
|
|
|
|
|
499
|
|
|
|
|
|
|
Some of the features of XPath require that the source document be fully |
500
|
|
|
|
|
|
|
translated in to a DOM tree of nodes before the features can be evaluated. |
501
|
|
|
|
|
|
|
(Nodes are things like elements, attributes, text, comments, processing |
502
|
|
|
|
|
|
|
instructions, namespace mappings etc). |
503
|
|
|
|
|
|
|
|
504
|
|
|
|
|
|
|
These features are not supported and are not likely to be, you might |
505
|
|
|
|
|
|
|
want to use L for "full" XPath |
506
|
|
|
|
|
|
|
support (tho it be in an XSLT framework) or wait for |
507
|
|
|
|
|
|
|
L to grow SAX support. |
508
|
|
|
|
|
|
|
|
509
|
|
|
|
|
|
|
Rather than build a DOM, XML::Filter::Dispatcher only keeps a bare |
510
|
|
|
|
|
|
|
minimum of nodes: the current node and its parent, grandparent, and so |
511
|
|
|
|
|
|
|
on, up to the document ("root") node (basically the /ancestor-or-self:: |
512
|
|
|
|
|
|
|
axis). This is called the "context stack", although you may not need to |
513
|
|
|
|
|
|
|
know that term unless you delve in to the guts. |
514
|
|
|
|
|
|
|
|
515
|
|
|
|
|
|
|
=head3 EventPath Truth |
516
|
|
|
|
|
|
|
|
517
|
|
|
|
|
|
|
EventPath borrows a lot from XPath including its notion of truth. |
518
|
|
|
|
|
|
|
This is different from Perl's notion of truth; presumably to make |
519
|
|
|
|
|
|
|
document processing easier. Here's a table that may help, the |
520
|
|
|
|
|
|
|
important differences are towards the end: |
521
|
|
|
|
|
|
|
|
522
|
|
|
|
|
|
|
Expression EventPath XPath Perl |
523
|
|
|
|
|
|
|
========== ========= ===== ==== |
524
|
|
|
|
|
|
|
false() FALSE FALSE n/a (not applicable) |
525
|
|
|
|
|
|
|
true() TRUE TRUE n/a |
526
|
|
|
|
|
|
|
0 FALSE FALSE FALSE |
527
|
|
|
|
|
|
|
-0 FALSE** FALSE n/a |
528
|
|
|
|
|
|
|
NaN FALSE** FALSE n/a (not fully, anyway) |
529
|
|
|
|
|
|
|
1 TRUE TRUE TRUE |
530
|
|
|
|
|
|
|
"" FALSE FALSE FALSE |
531
|
|
|
|
|
|
|
"1" TRUE TRUE TRUE |
532
|
|
|
|
|
|
|
|
533
|
|
|
|
|
|
|
"0" TRUE TRUE FALSE |
534
|
|
|
|
|
|
|
|
535
|
|
|
|
|
|
|
* To be regarded as a bug in this implementation |
536
|
|
|
|
|
|
|
** Only partially implemented/supported in this implementation |
537
|
|
|
|
|
|
|
|
538
|
|
|
|
|
|
|
Note: it looks like XPath 2.0 is defining a more workable concept |
539
|
|
|
|
|
|
|
for document processing that uses something resembling Perl's empty |
540
|
|
|
|
|
|
|
lists, C<()>, to indicate empty values, so C<""> and C<()> will be |
541
|
|
|
|
|
|
|
distinct and C<"0"> can be interpreted as false like in Perl. XPath2 |
542
|
|
|
|
|
|
|
is I provided by this module yet and won't be for a long time |
543
|
|
|
|
|
|
|
(patches welcome ;). |
544
|
|
|
|
|
|
|
|
545
|
|
|
|
|
|
|
=head3 EventPath Examples |
546
|
|
|
|
|
|
|
|
547
|
|
|
|
|
|
|
All of this means that only a portion of XPath is available. Luckily, |
548
|
|
|
|
|
|
|
that portion is also quite useful. Here are examples of working XPath |
549
|
|
|
|
|
|
|
expressions, followed by known unimplemented features. |
550
|
|
|
|
|
|
|
|
551
|
|
|
|
|
|
|
TODO: There is also an extension function available to differentiate between |
552
|
|
|
|
|
|
|
C and C events. By default |
553
|
|
|
|
|
|
|
|
554
|
|
|
|
|
|
|
=head2 Examples |
555
|
|
|
|
|
|
|
|
556
|
|
|
|
|
|
|
Expression Event Type Description (event type) |
557
|
|
|
|
|
|
|
========== ========== ======================== |
558
|
|
|
|
|
|
|
/ start_document Selects the document node |
559
|
|
|
|
|
|
|
|
560
|
|
|
|
|
|
|
/a start_element Root elt, if it's "" |
561
|
|
|
|
|
|
|
|
562
|
|
|
|
|
|
|
a start_element All "a" elements |
563
|
|
|
|
|
|
|
|
564
|
|
|
|
|
|
|
b//c start_element All "c" descendants of "b" elt.s |
565
|
|
|
|
|
|
|
|
566
|
|
|
|
|
|
|
@id start_element All "id" attributes |
567
|
|
|
|
|
|
|
|
568
|
|
|
|
|
|
|
string( foo ) end_element matches at the first or |
569
|
|
|
|
|
|
|
in the current element; |
570
|
|
|
|
|
|
|
xvalue() returns the |
571
|
|
|
|
|
|
|
text contained in "..." |
572
|
|
|
|
|
|
|
|
573
|
|
|
|
|
|
|
string( @name ) start_element the first "name" attributes; |
574
|
|
|
|
|
|
|
xvalue() returns the |
575
|
|
|
|
|
|
|
text of the attribute. |
576
|
|
|
|
|
|
|
|
577
|
|
|
|
|
|
|
=head2 Methods and Functions |
578
|
|
|
|
|
|
|
|
579
|
|
|
|
|
|
|
There are several APIs provided: general, xstack, and EventPath |
580
|
|
|
|
|
|
|
variable handling. |
581
|
|
|
|
|
|
|
|
582
|
|
|
|
|
|
|
The general API provides C and C, C, and |
583
|
|
|
|
|
|
|
C. |
584
|
|
|
|
|
|
|
|
585
|
|
|
|
|
|
|
The variables API provides C and C. |
586
|
|
|
|
|
|
|
|
587
|
|
|
|
|
|
|
The xstack API provides C, C, C, |
588
|
|
|
|
|
|
|
C, C and C. |
589
|
|
|
|
|
|
|
|
590
|
|
|
|
|
|
|
All of the "xfoo()" APIs may be called as a method or, |
591
|
|
|
|
|
|
|
within rule handlers, called as a function: |
592
|
|
|
|
|
|
|
|
593
|
|
|
|
|
|
|
$d = XML::Filter::Dispatcher->new( |
594
|
|
|
|
|
|
|
Rules => [ |
595
|
|
|
|
|
|
|
"/" => sub { |
596
|
|
|
|
|
|
|
xpush "foo\n"; |
597
|
|
|
|
|
|
|
print xpeek; ## Prints "foo\n" |
598
|
|
|
|
|
|
|
|
599
|
|
|
|
|
|
|
my $self = shift; |
600
|
|
|
|
|
|
|
print $self->xpeek; ## Also prints "foo\n" |
601
|
|
|
|
|
|
|
}, |
602
|
|
|
|
|
|
|
], |
603
|
|
|
|
|
|
|
); |
604
|
|
|
|
|
|
|
|
605
|
|
|
|
|
|
|
print $d->xpeek; ## Yup, prints "foo\n" as well. |
606
|
|
|
|
|
|
|
|
607
|
|
|
|
|
|
|
This dual nature allows you to import the APIs as functions and call them |
608
|
|
|
|
|
|
|
using a concise function-call style, or to leave them as methods and |
609
|
|
|
|
|
|
|
use object-oriented style. |
610
|
|
|
|
|
|
|
|
611
|
|
|
|
|
|
|
Each call may be imported by name: |
612
|
|
|
|
|
|
|
|
613
|
|
|
|
|
|
|
use XML::Filter::Dispatcher qw( xpush xpeek ); |
614
|
|
|
|
|
|
|
|
615
|
|
|
|
|
|
|
or by one of three API category tags: |
616
|
|
|
|
|
|
|
|
617
|
|
|
|
|
|
|
use XML::Filter::Dispatcher ":general"; ## xvalue() |
618
|
|
|
|
|
|
|
use XML::Filter::Dispatcher ":variables"; ## xset_var(), xget_var() |
619
|
|
|
|
|
|
|
use XML::Filter::Dispatcher ":xstack"; ## xpush(), xpop(), and xpeek() |
620
|
|
|
|
|
|
|
|
621
|
|
|
|
|
|
|
or en mass: |
622
|
|
|
|
|
|
|
|
623
|
|
|
|
|
|
|
use XML::Filter::Dispatcher ":all"; |
624
|
|
|
|
|
|
|
|
625
|
|
|
|
|
|
|
=cut |
626
|
|
|
|
|
|
|
|
627
|
|
|
|
|
|
|
require Exporter; |
628
|
|
|
|
|
|
|
*import = \&Exporter::import; |
629
|
|
|
|
|
|
|
|
630
|
|
|
|
|
|
|
BEGIN { |
631
|
16
|
|
|
16
|
|
410291
|
my @general_API = qw( xvalue xvaluetype xvalue_type xevent_type xrun_next_action ); |
632
|
16
|
|
|
|
|
58
|
my @xstack_API = qw( xpeek xpop xadd xset xoverwrite xpush xstack_empty xstack_max ); |
633
|
16
|
|
|
|
|
38
|
my @variables_API = qw( xset_var xget_var ); |
634
|
16
|
|
|
|
|
57
|
@EXPORT_OK = ( @general_API, @variables_API, @xstack_API ); |
635
|
16
|
|
|
|
|
413
|
%EXPORT_TAGS = ( |
636
|
|
|
|
|
|
|
all => \@EXPORT_OK, |
637
|
|
|
|
|
|
|
general => \@general_API, |
638
|
|
|
|
|
|
|
xstack => \@xstack_API, |
639
|
|
|
|
|
|
|
autostack => \@xstack_API, # deprecated |
640
|
|
|
|
|
|
|
variables => \@variables_API, |
641
|
|
|
|
|
|
|
); |
642
|
|
|
|
|
|
|
} |
643
|
|
|
|
|
|
|
|
644
|
|
|
|
|
|
|
|
645
|
16
|
|
|
16
|
|
122
|
use strict ; |
|
16
|
|
|
|
|
24
|
|
|
16
|
|
|
|
|
583
|
|
646
|
|
|
|
|
|
|
|
647
|
16
|
|
|
16
|
|
80
|
use Carp qw( confess ); |
|
16
|
|
|
|
|
39
|
|
|
16
|
|
|
|
|
1271
|
|
648
|
0
|
|
|
0
|
|
|
sub croak { goto &Carp::confess } |
649
|
|
|
|
|
|
|
|
650
|
|
|
|
|
|
|
#use XML::SAX::Base; |
651
|
|
|
|
|
|
|
#use XML::NamespaceSupport; |
652
|
16
|
|
|
16
|
|
25373
|
use XML::SAX::EventMethodMaker qw( compile_missing_methods sax_event_names ); |
|
0
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
|
653
|
|
|
|
|
|
|
|
654
|
|
|
|
|
|
|
use constant is_tracing => defined $Devel::TraceSAX::VERSION; |
655
|
|
|
|
|
|
|
# Devel::TraceSAX does not work under perl5.8.0 |
656
|
|
|
|
|
|
|
#use constant is_tracing => 1; |
657
|
|
|
|
|
|
|
#sub emit_trace_SAX_message { warn @_ }; |
658
|
|
|
|
|
|
|
|
659
|
|
|
|
|
|
|
use constant show_buffer_highwater => |
660
|
|
|
|
|
|
|
$ENV{XFDSHOWBUFFERHIGHWATER} || 0; |
661
|
|
|
|
|
|
|
|
662
|
|
|
|
|
|
|
BEGIN { |
663
|
|
|
|
|
|
|
eval( is_tracing |
664
|
|
|
|
|
|
|
? 'use Devel::TraceSAX qw( emit_trace_SAX_message ); 1' |
665
|
|
|
|
|
|
|
: 'sub emit_trace_SAX_message; 1' |
666
|
|
|
|
|
|
|
) or die $@; |
667
|
|
|
|
|
|
|
} |
668
|
|
|
|
|
|
|
|
669
|
|
|
|
|
|
|
|
670
|
|
|
|
|
|
|
## TODO: Prefix all of the hash keys in $self with XFD_ to avoid |
671
|
|
|
|
|
|
|
## conflict with X::S::B and subclasses / hander CODE. |
672
|
|
|
|
|
|
|
|
673
|
|
|
|
|
|
|
## TODO: local $_ = xvalue before calling in to a sub |
674
|
|
|
|
|
|
|
|
675
|
|
|
|
|
|
|
## |
676
|
|
|
|
|
|
|
## $ctx->{Vars} a HASH of variables passed in from the parent context |
677
|
|
|
|
|
|
|
## (or the Perl world in the doc root node context). Set |
678
|
|
|
|
|
|
|
## in all child contexts. |
679
|
|
|
|
|
|
|
## |
680
|
|
|
|
|
|
|
## $ctx->{ChildVars} a HASH of variables set in by this node, passed |
681
|
|
|
|
|
|
|
## on to all child contexts, but erased before this |
682
|
|
|
|
|
|
|
## node's siblings can see it. |
683
|
|
|
|
|
|
|
## |
684
|
|
|
|
|
|
|
|
685
|
|
|
|
|
|
|
=head1 General API |
686
|
|
|
|
|
|
|
|
687
|
|
|
|
|
|
|
=over |
688
|
|
|
|
|
|
|
|
689
|
|
|
|
|
|
|
=item new |
690
|
|
|
|
|
|
|
|
691
|
|
|
|
|
|
|
my $f = XML::Filter::Dispatcher->new( |
692
|
|
|
|
|
|
|
Rules => [ ## Order is significant |
693
|
|
|
|
|
|
|
"/foo/bar" => sub { |
694
|
|
|
|
|
|
|
## Code to execute |
695
|
|
|
|
|
|
|
}, |
696
|
|
|
|
|
|
|
], |
697
|
|
|
|
|
|
|
); |
698
|
|
|
|
|
|
|
|
699
|
|
|
|
|
|
|
Must be called as a method, unlike other API calls provided. |
700
|
|
|
|
|
|
|
|
701
|
|
|
|
|
|
|
=cut |
702
|
|
|
|
|
|
|
|
703
|
|
|
|
|
|
|
my @every_names = qw( |
704
|
|
|
|
|
|
|
attribute |
705
|
|
|
|
|
|
|
characters |
706
|
|
|
|
|
|
|
comment |
707
|
|
|
|
|
|
|
processing_instruction |
708
|
|
|
|
|
|
|
start_element |
709
|
|
|
|
|
|
|
start_prefix_mapping |
710
|
|
|
|
|
|
|
); |
711
|
|
|
|
|
|
|
|
712
|
|
|
|
|
|
|
|
713
|
|
|
|
|
|
|
sub new { |
714
|
|
|
|
|
|
|
my $proto = shift ; |
715
|
|
|
|
|
|
|
my %handlers; |
716
|
|
|
|
|
|
|
my $self = bless { |
717
|
|
|
|
|
|
|
FoldContstants => 1, |
718
|
|
|
|
|
|
|
Handlers => { |
719
|
|
|
|
|
|
|
Handler => undef, ## Setting this here always allows "Handler" |
720
|
|
|
|
|
|
|
## in actions, so that a call to |
721
|
|
|
|
|
|
|
## set_handler( $h ) can be used when the |
722
|
|
|
|
|
|
|
## handler is not set when new() is called. |
723
|
|
|
|
|
|
|
}, |
724
|
|
|
|
|
|
|
}, ref $proto || $proto; |
725
|
|
|
|
|
|
|
|
726
|
|
|
|
|
|
|
while ( @_ ) { |
727
|
|
|
|
|
|
|
my ( $key, $value ) = ( shift, shift ); |
728
|
|
|
|
|
|
|
|
729
|
|
|
|
|
|
|
if ( substr( $key, -7 ) eq "Handler" ) { |
730
|
|
|
|
|
|
|
$self->{Handlers}->{$key} = $value; |
731
|
|
|
|
|
|
|
} |
732
|
|
|
|
|
|
|
elsif ( $key eq "Handlers" ) { |
733
|
|
|
|
|
|
|
$self->{Handlers}->{$_} = $value->{$_} |
734
|
|
|
|
|
|
|
for keys %$value; |
735
|
|
|
|
|
|
|
} |
736
|
|
|
|
|
|
|
else { |
737
|
|
|
|
|
|
|
$self->{$key} = $value; |
738
|
|
|
|
|
|
|
} |
739
|
|
|
|
|
|
|
|
740
|
|
|
|
|
|
|
} |
741
|
|
|
|
|
|
|
|
742
|
|
|
|
|
|
|
$self->{Debug} ||= $ENV{XFDDEBUG} || 0; |
743
|
|
|
|
|
|
|
|
744
|
|
|
|
|
|
|
$self->{SortAttributes} = 1 |
745
|
|
|
|
|
|
|
unless defined $self->{SortAttributes}; |
746
|
|
|
|
|
|
|
$self->{SetXValue} = 1 |
747
|
|
|
|
|
|
|
unless defined $self->{SetXValue}; |
748
|
|
|
|
|
|
|
|
749
|
|
|
|
|
|
|
|
750
|
|
|
|
|
|
|
$self->{Rules} ||= []; |
751
|
|
|
|
|
|
|
# $self->{Rules} = [ %{$self->{Rules}} ] |
752
|
|
|
|
|
|
|
# if ref $self->{Rules} eq "HASH"; |
753
|
|
|
|
|
|
|
|
754
|
|
|
|
|
|
|
my $doc_ctx = $self->{DocCtx} = $self->{CtxStack}->[0] = {}; |
755
|
|
|
|
|
|
|
$doc_ctx->{ChildCtx} = {}; |
756
|
|
|
|
|
|
|
|
757
|
|
|
|
|
|
|
for ( keys %{$self->{Vars}} ) { |
758
|
|
|
|
|
|
|
$self->xset_var( $_, @{$self->{Vars}->{$_}} ); |
759
|
|
|
|
|
|
|
} |
760
|
|
|
|
|
|
|
|
761
|
|
|
|
|
|
|
if ( @{$self->{Rules}} ) { |
762
|
|
|
|
|
|
|
require XML::Filter::Dispatcher::Compiler; |
763
|
|
|
|
|
|
|
my $c = XML::Filter::Dispatcher::Compiler->new( %$self ); |
764
|
|
|
|
|
|
|
|
765
|
|
|
|
|
|
|
my $code; |
766
|
|
|
|
|
|
|
|
767
|
|
|
|
|
|
|
## Use the internal use only compiler internals. |
768
|
|
|
|
|
|
|
( $code, $self->{Actions} ) = $c->_compile; |
769
|
|
|
|
|
|
|
|
770
|
|
|
|
|
|
|
$self->{DocSub} = eval $code; |
771
|
|
|
|
|
|
|
if ( ! $self->{DocSub} ) { |
772
|
|
|
|
|
|
|
my $c = $code; |
773
|
|
|
|
|
|
|
my $ln = 1; |
774
|
|
|
|
|
|
|
$c =~ s{^}{sprintf "%4d|", $ln++}gme; |
775
|
|
|
|
|
|
|
die $@, $c; |
776
|
|
|
|
|
|
|
} |
777
|
|
|
|
|
|
|
|
778
|
|
|
|
|
|
|
} |
779
|
|
|
|
|
|
|
|
780
|
|
|
|
|
|
|
return $self ; |
781
|
|
|
|
|
|
|
} |
782
|
|
|
|
|
|
|
|
783
|
|
|
|
|
|
|
|
784
|
|
|
|
|
|
|
=item xvalue |
785
|
|
|
|
|
|
|
|
786
|
|
|
|
|
|
|
"string( foo )" => sub { my $v = xvalue }, # if imported |
787
|
|
|
|
|
|
|
"string( foo )" => sub { my $v = shift->xvalue }, # if not |
788
|
|
|
|
|
|
|
|
789
|
|
|
|
|
|
|
Returns the result of the last EventPath expression evaluated; this is |
790
|
|
|
|
|
|
|
the result that fired the current rule. The example prints all text |
791
|
|
|
|
|
|
|
node children of CfooE> elements, for instance. |
792
|
|
|
|
|
|
|
|
793
|
|
|
|
|
|
|
For matching expressions, this is equivalent to $_[1] in action |
794
|
|
|
|
|
|
|
subroutines. |
795
|
|
|
|
|
|
|
|
796
|
|
|
|
|
|
|
=cut |
797
|
|
|
|
|
|
|
|
798
|
|
|
|
|
|
|
sub xvalue() { |
799
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
800
|
|
|
|
|
|
|
|
801
|
|
|
|
|
|
|
return $XFD::cur_self->{XValue}; |
802
|
|
|
|
|
|
|
exists $XFD::cur_self->{XValue} |
803
|
|
|
|
|
|
|
? $XFD::cur_self->{XValue} |
804
|
|
|
|
|
|
|
: $XFD::ctx && $XFD::ctx->{Node}; |
805
|
|
|
|
|
|
|
} |
806
|
|
|
|
|
|
|
|
807
|
|
|
|
|
|
|
=item xvalue_type |
808
|
|
|
|
|
|
|
|
809
|
|
|
|
|
|
|
Returns the type of the result returned by xvalue. This is either a SAX |
810
|
|
|
|
|
|
|
event name or "attribute" for path rules ("//a"), or "" (for a string), |
811
|
|
|
|
|
|
|
"HASH" for a hash (note that struct() also returns a hash; these types |
812
|
|
|
|
|
|
|
are Perl data structure types, not EventPath types). |
813
|
|
|
|
|
|
|
|
814
|
|
|
|
|
|
|
This is the same as xeventtype for all rules that don't evaluate |
815
|
|
|
|
|
|
|
functions like "string()" as their top level expression. |
816
|
|
|
|
|
|
|
|
817
|
|
|
|
|
|
|
=cut |
818
|
|
|
|
|
|
|
|
819
|
|
|
|
|
|
|
sub xvalue_type() { |
820
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
821
|
|
|
|
|
|
|
|
822
|
|
|
|
|
|
|
return $XFD::cur_self->{XValue} == $XFD::ctx->{Node} |
823
|
|
|
|
|
|
|
? $XFD::ctx->{EventType} |
824
|
|
|
|
|
|
|
: ref $XFD::ctx->{Node}; |
825
|
|
|
|
|
|
|
} |
826
|
|
|
|
|
|
|
sub xvaluetype() { goto \&xvalue_type } ## deprecated syntax |
827
|
|
|
|
|
|
|
|
828
|
|
|
|
|
|
|
=item xeventtype |
829
|
|
|
|
|
|
|
|
830
|
|
|
|
|
|
|
Returns the type of the current SAX event. |
831
|
|
|
|
|
|
|
|
832
|
|
|
|
|
|
|
=cut |
833
|
|
|
|
|
|
|
|
834
|
|
|
|
|
|
|
sub xevent_type() { |
835
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
836
|
|
|
|
|
|
|
|
837
|
|
|
|
|
|
|
return $XFD::ctx->{EventType}; |
838
|
|
|
|
|
|
|
} |
839
|
|
|
|
|
|
|
|
840
|
|
|
|
|
|
|
=item xrun_next_action |
841
|
|
|
|
|
|
|
|
842
|
|
|
|
|
|
|
Runs the next action for the current node. Ordinarily, |
843
|
|
|
|
|
|
|
XML::Filter::Dispatcher runs only one action per node; this allows an |
844
|
|
|
|
|
|
|
action to call down to the next action. |
845
|
|
|
|
|
|
|
|
846
|
|
|
|
|
|
|
This is especially useful in filters that tweak a document on the way |
847
|
|
|
|
|
|
|
by. This tweaky sort of filter establishes a default "pass-through" |
848
|
|
|
|
|
|
|
rule and then additional override rules to tweak the values being |
849
|
|
|
|
|
|
|
passed through. |
850
|
|
|
|
|
|
|
|
851
|
|
|
|
|
|
|
Let's suppose you want to convert some mtimes from seconds since the |
852
|
|
|
|
|
|
|
epoch to a human readable format. Here's a set of rules that might |
853
|
|
|
|
|
|
|
do that: |
854
|
|
|
|
|
|
|
|
855
|
|
|
|
|
|
|
Rules => [ |
856
|
|
|
|
|
|
|
'node()' => "Handler", ## Pass everything through by default. |
857
|
|
|
|
|
|
|
|
858
|
|
|
|
|
|
|
'file[@mtime]' => sub { ## intercept and tweak the node. |
859
|
|
|
|
|
|
|
my $attr = $_[1]->{Attributes}->{"{}mtime"}; |
860
|
|
|
|
|
|
|
|
861
|
|
|
|
|
|
|
## Localize the changes: never assume that it is safe |
862
|
|
|
|
|
|
|
## to alter SAX elements on the way by in a general purpose |
863
|
|
|
|
|
|
|
## filter. Some smart aleck might send the same events |
864
|
|
|
|
|
|
|
## to another filter with a Tee fitting or even back |
865
|
|
|
|
|
|
|
## through your filter multiple times from a cache. |
866
|
|
|
|
|
|
|
local $attr->{Value} = localtime $attr->{Value}; |
867
|
|
|
|
|
|
|
|
868
|
|
|
|
|
|
|
## Now that the changes are localised, fall through to |
869
|
|
|
|
|
|
|
## the default rule. |
870
|
|
|
|
|
|
|
xrun_next_action; |
871
|
|
|
|
|
|
|
|
872
|
|
|
|
|
|
|
## We could emit other events here as well, but need not |
873
|
|
|
|
|
|
|
## in this example. |
874
|
|
|
|
|
|
|
}, |
875
|
|
|
|
|
|
|
], |
876
|
|
|
|
|
|
|
|
877
|
|
|
|
|
|
|
=cut |
878
|
|
|
|
|
|
|
|
879
|
|
|
|
|
|
|
sub xrun_next_action() { |
880
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
881
|
|
|
|
|
|
|
$XFD::cur_self->_execute_next_action; |
882
|
|
|
|
|
|
|
} |
883
|
|
|
|
|
|
|
|
884
|
|
|
|
|
|
|
|
885
|
|
|
|
|
|
|
=back |
886
|
|
|
|
|
|
|
|
887
|
|
|
|
|
|
|
=head2 EventPath Variables |
888
|
|
|
|
|
|
|
|
889
|
|
|
|
|
|
|
EventPath variables may be set in the current context using |
890
|
|
|
|
|
|
|
C, and accessed using C. Variables set in a |
891
|
|
|
|
|
|
|
given context are visible to all child contexts. If you want a variable |
892
|
|
|
|
|
|
|
to be set in an enclosed context and later retrieved in an enclosing |
893
|
|
|
|
|
|
|
context, you must set it in the enclosing context first, then alter it |
894
|
|
|
|
|
|
|
in the enclosed context, then retrieve it. |
895
|
|
|
|
|
|
|
|
896
|
|
|
|
|
|
|
EventPath variables are typed. |
897
|
|
|
|
|
|
|
|
898
|
|
|
|
|
|
|
EventPath variables set in a context are visible within that context and |
899
|
|
|
|
|
|
|
all enclosed contexts, but not outside of them. |
900
|
|
|
|
|
|
|
|
901
|
|
|
|
|
|
|
=cut |
902
|
|
|
|
|
|
|
|
903
|
|
|
|
|
|
|
=over |
904
|
|
|
|
|
|
|
|
905
|
|
|
|
|
|
|
=item xset_var |
906
|
|
|
|
|
|
|
|
907
|
|
|
|
|
|
|
"foo" => sub { xset_var( bar => string => "bingo" ) }, # if imported |
908
|
|
|
|
|
|
|
"foo" => sub { shift->xset_var( bar => boolean => 1 ) }, |
909
|
|
|
|
|
|
|
|
910
|
|
|
|
|
|
|
Sets an XPath variables visible in the current context and all child |
911
|
|
|
|
|
|
|
contexts. Will not be visible in parent contexts or sibling contexts. |
912
|
|
|
|
|
|
|
|
913
|
|
|
|
|
|
|
Legal types are C, C, and C. Node sets and |
914
|
|
|
|
|
|
|
nodes are unsupported at this time, and "other" types are not useful |
915
|
|
|
|
|
|
|
unless you work in your own functions that handle them. |
916
|
|
|
|
|
|
|
|
917
|
|
|
|
|
|
|
Variables are visible as C<$bar> variable references in XPath expressions and |
918
|
|
|
|
|
|
|
using xget_var in Perl code. Setting a variable to a new value temporarily |
919
|
|
|
|
|
|
|
overrides any existing value, somewhat like using Perl's C. |
920
|
|
|
|
|
|
|
|
921
|
|
|
|
|
|
|
=cut |
922
|
|
|
|
|
|
|
|
923
|
|
|
|
|
|
|
sub xset_var { |
924
|
|
|
|
|
|
|
my $self = @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ) ? shift : $XFD::cur_self; |
925
|
|
|
|
|
|
|
croak |
926
|
|
|
|
|
|
|
"Wrong number of parameters (" . @_ . ") passed to xset_var, need 3.\n" |
927
|
|
|
|
|
|
|
if @_ != 3; |
928
|
|
|
|
|
|
|
|
929
|
|
|
|
|
|
|
my ( $name, $type, $value ) = @_; |
930
|
|
|
|
|
|
|
croak "undefined type passed to xset_var\n" unless defined $type; |
931
|
|
|
|
|
|
|
croak "undefined name passed to xset_var\n" unless defined $name; |
932
|
|
|
|
|
|
|
croak "undefined value passed to xset_var\n" unless defined $name; |
933
|
|
|
|
|
|
|
|
934
|
|
|
|
|
|
|
## TODO: rename the type non-classes to st other than "string", etc. |
935
|
|
|
|
|
|
|
$self->{CtxStack}->[-1]->{Vars}->{$name} = bless \$value, $type; |
936
|
|
|
|
|
|
|
} |
937
|
|
|
|
|
|
|
|
938
|
|
|
|
|
|
|
|
939
|
|
|
|
|
|
|
## Used in compiled XPath exprs only; only minimal safeties engaged. |
940
|
|
|
|
|
|
|
sub _look_up_var { |
941
|
|
|
|
|
|
|
my $self = shift; |
942
|
|
|
|
|
|
|
my ( $vname ) = @_; |
943
|
|
|
|
|
|
|
|
944
|
|
|
|
|
|
|
my $ctx = $self->{CtxStack}->[-1]; |
945
|
|
|
|
|
|
|
return $ctx->{Vars}->{$vname} if exists $ctx->{Vars}->{$vname}; |
946
|
|
|
|
|
|
|
|
947
|
|
|
|
|
|
|
die "Unknown variable '\$$vname' referenced in XPath expression\n"; |
948
|
|
|
|
|
|
|
} |
949
|
|
|
|
|
|
|
|
950
|
|
|
|
|
|
|
|
951
|
|
|
|
|
|
|
=item xget_var |
952
|
|
|
|
|
|
|
|
953
|
|
|
|
|
|
|
"bar" => sub { print xget_var( "bar" ) }, # if imported |
954
|
|
|
|
|
|
|
"bar" => sub { print shift->xget_var( "bar" ) }, |
955
|
|
|
|
|
|
|
|
956
|
|
|
|
|
|
|
Retrieves a single variable from the current context. This may have |
957
|
|
|
|
|
|
|
been set by a parent or by a previous rule firing on this node, but |
958
|
|
|
|
|
|
|
not by children or preceding siblings. |
959
|
|
|
|
|
|
|
|
960
|
|
|
|
|
|
|
Returns C if the variable is not set (or if it was set to undef). |
961
|
|
|
|
|
|
|
|
962
|
|
|
|
|
|
|
=cut |
963
|
|
|
|
|
|
|
|
964
|
|
|
|
|
|
|
sub xget_var { |
965
|
|
|
|
|
|
|
my $self = @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ) ? shift : $XFD::cur_self; |
966
|
|
|
|
|
|
|
croak "No variable name passed to xget_var.\n" |
967
|
|
|
|
|
|
|
unless @_; |
968
|
|
|
|
|
|
|
croak "More than one variable name passed to xget_var.\n" |
969
|
|
|
|
|
|
|
unless @_ == 1; |
970
|
|
|
|
|
|
|
|
971
|
|
|
|
|
|
|
my ( $vname ) = @_; |
972
|
|
|
|
|
|
|
|
973
|
|
|
|
|
|
|
croak "Undefined variable name passed to xget_var.\n" |
974
|
|
|
|
|
|
|
unless defined $vname; |
975
|
|
|
|
|
|
|
|
976
|
|
|
|
|
|
|
my $ctx = $self->{CtxStack}->[-1]; |
977
|
|
|
|
|
|
|
return |
978
|
|
|
|
|
|
|
exists $ctx->{Vars}->{$vname} |
979
|
|
|
|
|
|
|
? ${$ctx->{Vars}->{$vname}} |
980
|
|
|
|
|
|
|
: undef; |
981
|
|
|
|
|
|
|
} |
982
|
|
|
|
|
|
|
|
983
|
|
|
|
|
|
|
|
984
|
|
|
|
|
|
|
=item xget_var_type |
985
|
|
|
|
|
|
|
|
986
|
|
|
|
|
|
|
"bar" => sub { print xget_var_type( "bar" ) }, # if imported |
987
|
|
|
|
|
|
|
"bar" => sub { shift->xget_var_type( "bar" ) }, |
988
|
|
|
|
|
|
|
|
989
|
|
|
|
|
|
|
Retrieves the type of a variable from the current context. This may have |
990
|
|
|
|
|
|
|
been set by a parent or by a previous rule firing on this node, but |
991
|
|
|
|
|
|
|
not by children or preceding siblings. |
992
|
|
|
|
|
|
|
|
993
|
|
|
|
|
|
|
Returns C if the variable is not set. |
994
|
|
|
|
|
|
|
|
995
|
|
|
|
|
|
|
=cut |
996
|
|
|
|
|
|
|
|
997
|
|
|
|
|
|
|
sub xget_var_type { |
998
|
|
|
|
|
|
|
my $self = @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ) ? shift : $XFD::cur_self; |
999
|
|
|
|
|
|
|
croak "No variable name passed to xget_var_type.\n" |
1000
|
|
|
|
|
|
|
unless @_; |
1001
|
|
|
|
|
|
|
croak "More than one variable name passed to xget_var_type.\n" |
1002
|
|
|
|
|
|
|
unless @_ == 1; |
1003
|
|
|
|
|
|
|
|
1004
|
|
|
|
|
|
|
my ( $vname ) = @_; |
1005
|
|
|
|
|
|
|
|
1006
|
|
|
|
|
|
|
croak "Undefined variable name passed to xget_var.\n" |
1007
|
|
|
|
|
|
|
unless defined $vname; |
1008
|
|
|
|
|
|
|
|
1009
|
|
|
|
|
|
|
my $ctx = $self->{CtxStack}->[-1]; |
1010
|
|
|
|
|
|
|
return |
1011
|
|
|
|
|
|
|
exists $ctx->{Vars}->{$vname} |
1012
|
|
|
|
|
|
|
? ref $ctx->{Vars}->{$vname} |
1013
|
|
|
|
|
|
|
: undef; |
1014
|
|
|
|
|
|
|
} |
1015
|
|
|
|
|
|
|
|
1016
|
|
|
|
|
|
|
=back |
1017
|
|
|
|
|
|
|
|
1018
|
|
|
|
|
|
|
=head2 Handlers |
1019
|
|
|
|
|
|
|
|
1020
|
|
|
|
|
|
|
XML::Filter::Dispatcher allows you to register handlers using |
1021
|
|
|
|
|
|
|
C and C, and then to refer to them |
1022
|
|
|
|
|
|
|
by name in actions. These are part of the "general API". |
1023
|
|
|
|
|
|
|
|
1024
|
|
|
|
|
|
|
You may use any string for handler names that you like, including |
1025
|
|
|
|
|
|
|
strings with spaces. It is wise to avoid those standard, rarely used |
1026
|
|
|
|
|
|
|
handlers recognized by parsers, such as: |
1027
|
|
|
|
|
|
|
|
1028
|
|
|
|
|
|
|
DTDHandler |
1029
|
|
|
|
|
|
|
ContentHandler |
1030
|
|
|
|
|
|
|
DocumentHandler |
1031
|
|
|
|
|
|
|
DeclHandler |
1032
|
|
|
|
|
|
|
ErrorHandler |
1033
|
|
|
|
|
|
|
EntityResolver |
1034
|
|
|
|
|
|
|
LexicalHandler |
1035
|
|
|
|
|
|
|
|
1036
|
|
|
|
|
|
|
unless you are using them for the stated purpose. (List taken from |
1037
|
|
|
|
|
|
|
L). |
1038
|
|
|
|
|
|
|
|
1039
|
|
|
|
|
|
|
Handlers may be set in the constructor in two ways: by using a name |
1040
|
|
|
|
|
|
|
ending in "Handler" and passing it as a top level option: |
1041
|
|
|
|
|
|
|
|
1042
|
|
|
|
|
|
|
my $f = XML::Filter::Dispatcher->new( |
1043
|
|
|
|
|
|
|
Handler => $h, |
1044
|
|
|
|
|
|
|
FooHandler => $foo, |
1045
|
|
|
|
|
|
|
BarHandler => $bar, |
1046
|
|
|
|
|
|
|
Rules => [ |
1047
|
|
|
|
|
|
|
... |
1048
|
|
|
|
|
|
|
], |
1049
|
|
|
|
|
|
|
); |
1050
|
|
|
|
|
|
|
|
1051
|
|
|
|
|
|
|
Or, for oddly named handlers, by passing them in the Handlers hash: |
1052
|
|
|
|
|
|
|
|
1053
|
|
|
|
|
|
|
my $f = XML::Filter::Dispatcher->new( |
1054
|
|
|
|
|
|
|
Handlers => { |
1055
|
|
|
|
|
|
|
Manny => $foo, |
1056
|
|
|
|
|
|
|
Moe => $bar, |
1057
|
|
|
|
|
|
|
Jack => $bat, |
1058
|
|
|
|
|
|
|
}, |
1059
|
|
|
|
|
|
|
Rules => [ |
1060
|
|
|
|
|
|
|
... |
1061
|
|
|
|
|
|
|
], |
1062
|
|
|
|
|
|
|
); |
1063
|
|
|
|
|
|
|
|
1064
|
|
|
|
|
|
|
Once declared in new(), handler names can be |
1065
|
|
|
|
|
|
|
used as actions. The "well known" handler name "Handler" need not be |
1066
|
|
|
|
|
|
|
predeclared. |
1067
|
|
|
|
|
|
|
|
1068
|
|
|
|
|
|
|
For exampled, this forwards all events except the C |
1069
|
|
|
|
|
|
|
and C events for the root element's children, thus |
1070
|
|
|
|
|
|
|
"hoisting" everything two levels below the root up a level: |
1071
|
|
|
|
|
|
|
|
1072
|
|
|
|
|
|
|
Rules => [ |
1073
|
|
|
|
|
|
|
'/*/*' => undef, |
1074
|
|
|
|
|
|
|
'node()' => "Handler", |
1075
|
|
|
|
|
|
|
], |
1076
|
|
|
|
|
|
|
|
1077
|
|
|
|
|
|
|
By default, no events are forwarded to any handlers: you must send |
1078
|
|
|
|
|
|
|
individual events to an individual handlers. |
1079
|
|
|
|
|
|
|
|
1080
|
|
|
|
|
|
|
Normally, when a handler is used in this manner, XML::Filter::Dispatcher |
1081
|
|
|
|
|
|
|
makes sure to send C and C events to |
1082
|
|
|
|
|
|
|
it just before the first event and just after the last event. This |
1083
|
|
|
|
|
|
|
prevents sending the document events unless a handler actually receives |
1084
|
|
|
|
|
|
|
other events, which is what most people expect (the alternative would be |
1085
|
|
|
|
|
|
|
to preemptively always send a C to all handlers when |
1086
|
|
|
|
|
|
|
when the dispatcher receives its C: ugh). |
1087
|
|
|
|
|
|
|
|
1088
|
|
|
|
|
|
|
To disable this for all handlers, pass the C
|
1089
|
|
|
|
|
|
|
=> 1> option. |
1090
|
|
|
|
|
|
|
|
1091
|
|
|
|
|
|
|
=over |
1092
|
|
|
|
|
|
|
|
1093
|
|
|
|
|
|
|
=item set_handler |
1094
|
|
|
|
|
|
|
|
1095
|
|
|
|
|
|
|
$self->set_handler( $handler ); |
1096
|
|
|
|
|
|
|
$self->set_handler( $name => $handler ); |
1097
|
|
|
|
|
|
|
|
1098
|
|
|
|
|
|
|
=cut |
1099
|
|
|
|
|
|
|
|
1100
|
|
|
|
|
|
|
sub set_handler { |
1101
|
|
|
|
|
|
|
my $self = shift; |
1102
|
|
|
|
|
|
|
my $name = @_ > 1 ? shift : "Handler"; |
1103
|
|
|
|
|
|
|
$self->{Handlers}->{$name} = shift; |
1104
|
|
|
|
|
|
|
} |
1105
|
|
|
|
|
|
|
|
1106
|
|
|
|
|
|
|
=item get_handler |
1107
|
|
|
|
|
|
|
|
1108
|
|
|
|
|
|
|
$self->set_handler( $handler ); |
1109
|
|
|
|
|
|
|
$self->set_handler( $name => $handler ); |
1110
|
|
|
|
|
|
|
|
1111
|
|
|
|
|
|
|
=cut |
1112
|
|
|
|
|
|
|
|
1113
|
|
|
|
|
|
|
sub get_handler { |
1114
|
|
|
|
|
|
|
my $self = shift; |
1115
|
|
|
|
|
|
|
my $name = @_ > 1 ? shift : "Handler"; |
1116
|
|
|
|
|
|
|
return $self->{Handlers}->{$name} |
1117
|
|
|
|
|
|
|
if exists $self->{Handlers}->{$name} |
1118
|
|
|
|
|
|
|
} |
1119
|
|
|
|
|
|
|
|
1120
|
|
|
|
|
|
|
=back |
1121
|
|
|
|
|
|
|
|
1122
|
|
|
|
|
|
|
|
1123
|
|
|
|
|
|
|
=head2 The xstack |
1124
|
|
|
|
|
|
|
|
1125
|
|
|
|
|
|
|
The xstack is a stack mechanism provided by XML::Filter::Dispatcher that |
1126
|
|
|
|
|
|
|
is automatically unwound after end_element, end_document, and all other |
1127
|
|
|
|
|
|
|
events other than start_element or start_document. This sounds |
1128
|
|
|
|
|
|
|
limiting, but it's quite useful for building data structures that mimic |
1129
|
|
|
|
|
|
|
the structure of the XML input. I've found this to be common when |
1130
|
|
|
|
|
|
|
dealing with data structures in XML and a creating nested hierarchies of |
1131
|
|
|
|
|
|
|
objects and/or Perl data structures. |
1132
|
|
|
|
|
|
|
|
1133
|
|
|
|
|
|
|
Here's an example of how to build and return a graph: |
1134
|
|
|
|
|
|
|
|
1135
|
|
|
|
|
|
|
use Graph; |
1136
|
|
|
|
|
|
|
|
1137
|
|
|
|
|
|
|
my $d = XML::Filter::Dispatcher->new( |
1138
|
|
|
|
|
|
|
Rules => [ |
1139
|
|
|
|
|
|
|
## These two create and, later, return the Graph object. |
1140
|
|
|
|
|
|
|
'graph' => sub { xpush( Graph->new ) }, |
1141
|
|
|
|
|
|
|
'end::graph' => \&xpop, |
1142
|
|
|
|
|
|
|
|
1143
|
|
|
|
|
|
|
## Every vertex must have a name, so collect in and add it |
1144
|
|
|
|
|
|
|
## to the Graph object using its add_vertex( $name ) method. |
1145
|
|
|
|
|
|
|
'vertex' => [ 'string( @name )' => sub { xadd } ], |
1146
|
|
|
|
|
|
|
|
1147
|
|
|
|
|
|
|
## Edges are a little more complex: we need to collect the |
1148
|
|
|
|
|
|
|
## from and to attributes, which we do using a hash, then |
1149
|
|
|
|
|
|
|
## pop the hash and use it to add an edge. You could |
1150
|
|
|
|
|
|
|
## also use a single rule, see below. |
1151
|
|
|
|
|
|
|
'edge' => [ 'string()' => sub { xpush {} } ], |
1152
|
|
|
|
|
|
|
'edge/@*' => [ 'string()' => sub { xset } ], |
1153
|
|
|
|
|
|
|
'end::edge' => sub { |
1154
|
|
|
|
|
|
|
my $edge = xpop; |
1155
|
|
|
|
|
|
|
xpeek->add_edge( @$edge{"from","to"} ); |
1156
|
|
|
|
|
|
|
}, |
1157
|
|
|
|
|
|
|
], |
1158
|
|
|
|
|
|
|
); |
1159
|
|
|
|
|
|
|
|
1160
|
|
|
|
|
|
|
my $graph = QB->new( "graph", <playback( $d ); |
1161
|
|
|
|
|
|
|
|
1162
|
|
|
|
|
|
|
|
1163
|
|
|
|
|
|
|
|
1164
|
|
|
|
|
|
|
|
1165
|
|
|
|
|
|
|
|
1166
|
|
|
|
|
|
|
END_XML |
1167
|
|
|
|
|
|
|
|
1168
|
|
|
|
|
|
|
print $graph, $graph->is_sparse ? " is sparse!\n" : "\n"; |
1169
|
|
|
|
|
|
|
|
1170
|
|
|
|
|
|
|
should print "0,1-2,2-1 is sparse!\n". |
1171
|
|
|
|
|
|
|
|
1172
|
|
|
|
|
|
|
This is good if you can tell what object to add to the stack before |
1173
|
|
|
|
|
|
|
seeing content. Some XML parsing is more general than that: if you see |
1174
|
|
|
|
|
|
|
no child elements, you want to create one class to contain just |
1175
|
|
|
|
|
|
|
character content, otherwise you want to add a container class to |
1176
|
|
|
|
|
|
|
contain the child nodes. |
1177
|
|
|
|
|
|
|
|
1178
|
|
|
|
|
|
|
An faster alternative to the 3 edge rules relies on the fact that |
1179
|
|
|
|
|
|
|
SAX's start_element events carry the attributes, so you can actually |
1180
|
|
|
|
|
|
|
do a single rule instead of the three we show above: |
1181
|
|
|
|
|
|
|
|
1182
|
|
|
|
|
|
|
'edge' => sub { |
1183
|
|
|
|
|
|
|
xpeek->add_edge( |
1184
|
|
|
|
|
|
|
$_[1]->{Attributes}->{"{}from"}->{Value}, |
1185
|
|
|
|
|
|
|
$_[1]->{Attributes}->{"{}to" }->{Value}, |
1186
|
|
|
|
|
|
|
); |
1187
|
|
|
|
|
|
|
}, |
1188
|
|
|
|
|
|
|
|
1189
|
|
|
|
|
|
|
=over |
1190
|
|
|
|
|
|
|
|
1191
|
|
|
|
|
|
|
=item xpush |
1192
|
|
|
|
|
|
|
|
1193
|
|
|
|
|
|
|
Push values on to the xstack. These will be removed from the xstack at |
1194
|
|
|
|
|
|
|
the end of the current element. The topmost item on the |
1195
|
|
|
|
|
|
|
xstack is available through the peek method. Elements xpushed before |
1196
|
|
|
|
|
|
|
the first element (usually in the C event) remain on |
1197
|
|
|
|
|
|
|
the stack after the document has been parsed and a call like |
1198
|
|
|
|
|
|
|
|
1199
|
|
|
|
|
|
|
my $elt = $dispatcher->xpop; |
1200
|
|
|
|
|
|
|
|
1201
|
|
|
|
|
|
|
can be used to retrieve them. |
1202
|
|
|
|
|
|
|
|
1203
|
|
|
|
|
|
|
=cut |
1204
|
|
|
|
|
|
|
|
1205
|
|
|
|
|
|
|
sub xpush { |
1206
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1207
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: xpush()ing ", @_ if is_tracing; |
1208
|
|
|
|
|
|
|
push @{$XFD::cur_self->{XStack}}, map( { "#elt" => $_ }, @_ ); |
1209
|
|
|
|
|
|
|
} |
1210
|
|
|
|
|
|
|
|
1211
|
|
|
|
|
|
|
=item xadd |
1212
|
|
|
|
|
|
|
|
1213
|
|
|
|
|
|
|
Tries to add a possibly named item to the element on the top of the |
1214
|
|
|
|
|
|
|
stack and push the item on to the stack. It makes a guess about how to |
1215
|
|
|
|
|
|
|
add items depending on what the current top of the stack is. |
1216
|
|
|
|
|
|
|
|
1217
|
|
|
|
|
|
|
xadd $name, $new_item; |
1218
|
|
|
|
|
|
|
|
1219
|
|
|
|
|
|
|
does this: |
1220
|
|
|
|
|
|
|
|
1221
|
|
|
|
|
|
|
Top of Stack Action |
1222
|
|
|
|
|
|
|
============ ====== |
1223
|
|
|
|
|
|
|
scalar xpeek .= $new_item; |
1224
|
|
|
|
|
|
|
SCALAR ref ${xpeek} .= $new_item; |
1225
|
|
|
|
|
|
|
ARRAY ref push @{xpeek()}, $new_item; |
1226
|
|
|
|
|
|
|
HASH ref push @{xpeek->{$name}} = $new_item; |
1227
|
|
|
|
|
|
|
blessed object xpeek->$method( $new_item ); |
1228
|
|
|
|
|
|
|
|
1229
|
|
|
|
|
|
|
The $method in the last item is one of (in order) "add_$name", |
1230
|
|
|
|
|
|
|
"push_$name", or "$name". |
1231
|
|
|
|
|
|
|
|
1232
|
|
|
|
|
|
|
After the above action, an |
1233
|
|
|
|
|
|
|
|
1234
|
|
|
|
|
|
|
xpush $new_item; |
1235
|
|
|
|
|
|
|
|
1236
|
|
|
|
|
|
|
is done. |
1237
|
|
|
|
|
|
|
|
1238
|
|
|
|
|
|
|
$name defaults to the LocalName of the current node if it is an |
1239
|
|
|
|
|
|
|
attribute or element, so |
1240
|
|
|
|
|
|
|
|
1241
|
|
|
|
|
|
|
xadd $foo; |
1242
|
|
|
|
|
|
|
|
1243
|
|
|
|
|
|
|
will DWYM. TODO: search up the current node's ancestry for a LocalName |
1244
|
|
|
|
|
|
|
when handling other event types. |
1245
|
|
|
|
|
|
|
|
1246
|
|
|
|
|
|
|
If no parameters are provided, xvalue is used. |
1247
|
|
|
|
|
|
|
|
1248
|
|
|
|
|
|
|
If the stack is empty, it just xpush()es on the stack. |
1249
|
|
|
|
|
|
|
|
1250
|
|
|
|
|
|
|
=cut |
1251
|
|
|
|
|
|
|
|
1252
|
|
|
|
|
|
|
sub xadd { |
1253
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1254
|
|
|
|
|
|
|
my $name = @_ > 1 |
1255
|
|
|
|
|
|
|
? shift |
1256
|
|
|
|
|
|
|
: do { |
1257
|
|
|
|
|
|
|
croak "$XFD::ctx->{EventType} has no LocalName" |
1258
|
|
|
|
|
|
|
unless exists $XFD::ctx->{Node}->{LocalName} |
1259
|
|
|
|
|
|
|
&& defined exists $XFD::ctx->{Node}->{LocalName}; |
1260
|
|
|
|
|
|
|
croak "$XFD::ctx->{EventType} a LocalName of ''" |
1261
|
|
|
|
|
|
|
unless length $XFD::ctx->{Node}->{LocalName}; |
1262
|
|
|
|
|
|
|
$XFD::ctx->{Node}->{LocalName}; |
1263
|
|
|
|
|
|
|
}; |
1264
|
|
|
|
|
|
|
|
1265
|
|
|
|
|
|
|
my $new_item = @_ ? shift : $XFD::cur_self->xvalue; |
1266
|
|
|
|
|
|
|
|
1267
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: xadd()ing ", $name, " => ", $new_item if is_tracing; |
1268
|
|
|
|
|
|
|
|
1269
|
|
|
|
|
|
|
if ( @{$XFD::cur_self->{XStack}} ) { |
1270
|
|
|
|
|
|
|
my $e = $XFD::cur_self->{XStack}->[-1]; |
1271
|
|
|
|
|
|
|
my $top = $e->{"#elt"}; |
1272
|
|
|
|
|
|
|
my $t = ref $top; |
1273
|
|
|
|
|
|
|
my $meth; |
1274
|
|
|
|
|
|
|
|
1275
|
|
|
|
|
|
|
if ( $t eq "" ) { |
1276
|
|
|
|
|
|
|
$XFD::cur_self->{XStack}->[-1]->{"#elt"} .= ""; |
1277
|
|
|
|
|
|
|
$e->{scalar}++; |
1278
|
|
|
|
|
|
|
} |
1279
|
|
|
|
|
|
|
elsif ( $t eq "SCALAR" ) { |
1280
|
|
|
|
|
|
|
$e->{scalar}++; |
1281
|
|
|
|
|
|
|
$$top .= ""; |
1282
|
|
|
|
|
|
|
} |
1283
|
|
|
|
|
|
|
elsif ( $t eq "ARRAY" ) { |
1284
|
|
|
|
|
|
|
$e->{scalar}++; |
1285
|
|
|
|
|
|
|
push @$top, $new_item; |
1286
|
|
|
|
|
|
|
} |
1287
|
|
|
|
|
|
|
elsif ( $t eq "HASH" ) { |
1288
|
|
|
|
|
|
|
croak |
1289
|
|
|
|
|
|
|
"element '", |
1290
|
|
|
|
|
|
|
$name, |
1291
|
|
|
|
|
|
|
"' of the HASH on top of the xstack is a ", |
1292
|
|
|
|
|
|
|
do { |
1293
|
|
|
|
|
|
|
my $t = ref $top->{$name}; |
1294
|
|
|
|
|
|
|
! $t ? "scalar" : "$t reference"; |
1295
|
|
|
|
|
|
|
}, |
1296
|
|
|
|
|
|
|
", not an ARRAY ref" |
1297
|
|
|
|
|
|
|
if defined $top->{$name} && ! ref $top->{$name}; |
1298
|
|
|
|
|
|
|
push @{$top->{$name}}, $new_item; |
1299
|
|
|
|
|
|
|
$e->{$name}++; |
1300
|
|
|
|
|
|
|
} |
1301
|
|
|
|
|
|
|
## See if it's a blessed object that can add thingamies" |
1302
|
|
|
|
|
|
|
elsif ( $meth = ( |
1303
|
|
|
|
|
|
|
UNIVERSAL::can( $top, "add_$name" ) |
1304
|
|
|
|
|
|
|
|| UNIVERSAL::can( $top, "push_$name" ) |
1305
|
|
|
|
|
|
|
|| UNIVERSAL::can( $top, "add" ) |
1306
|
|
|
|
|
|
|
) ) { |
1307
|
|
|
|
|
|
|
$top->$meth( $new_item ); |
1308
|
|
|
|
|
|
|
$e->{$name}++; |
1309
|
|
|
|
|
|
|
} |
1310
|
|
|
|
|
|
|
else { |
1311
|
|
|
|
|
|
|
croak "don't know how to xadd() a '", |
1312
|
|
|
|
|
|
|
ref( $new_item ) || "scalar", |
1313
|
|
|
|
|
|
|
"' ", |
1314
|
|
|
|
|
|
|
defined $name ? $name : "item", |
1315
|
|
|
|
|
|
|
" to a '$t' (which is what is on the top of the xstack)"; |
1316
|
|
|
|
|
|
|
} |
1317
|
|
|
|
|
|
|
|
1318
|
|
|
|
|
|
|
} |
1319
|
|
|
|
|
|
|
|
1320
|
|
|
|
|
|
|
$XFD::cur_self->xpush( $new_item ) |
1321
|
|
|
|
|
|
|
if ref $new_item && ref $new_item ne "SCALAR"; |
1322
|
|
|
|
|
|
|
return $new_item; |
1323
|
|
|
|
|
|
|
} |
1324
|
|
|
|
|
|
|
|
1325
|
|
|
|
|
|
|
|
1326
|
|
|
|
|
|
|
=item xset |
1327
|
|
|
|
|
|
|
|
1328
|
|
|
|
|
|
|
Like C, but tries to set a named value. Dies if the value is |
1329
|
|
|
|
|
|
|
already defined (so duplicate values aren't silently ignored). |
1330
|
|
|
|
|
|
|
|
1331
|
|
|
|
|
|
|
xset $name, $new_item; |
1332
|
|
|
|
|
|
|
|
1333
|
|
|
|
|
|
|
does this: |
1334
|
|
|
|
|
|
|
|
1335
|
|
|
|
|
|
|
Top of Stack Action |
1336
|
|
|
|
|
|
|
============ ====== |
1337
|
|
|
|
|
|
|
scalar xpeek = $new_item; |
1338
|
|
|
|
|
|
|
SCALAR ref ${xpeek} = $new_item; |
1339
|
|
|
|
|
|
|
HASH ref xpeek->{$name} = $new_item; |
1340
|
|
|
|
|
|
|
blessed object xpeek->$name( $new_item ); |
1341
|
|
|
|
|
|
|
|
1342
|
|
|
|
|
|
|
Trying to xset any other types results in an exception. |
1343
|
|
|
|
|
|
|
|
1344
|
|
|
|
|
|
|
After the above action (except when the top is a scalar or SCALAR ref), an |
1345
|
|
|
|
|
|
|
|
1346
|
|
|
|
|
|
|
xpush $new_item; |
1347
|
|
|
|
|
|
|
|
1348
|
|
|
|
|
|
|
is done so that more may be added to the item. |
1349
|
|
|
|
|
|
|
|
1350
|
|
|
|
|
|
|
$name defaults to the LocalName of the current node if it is an |
1351
|
|
|
|
|
|
|
attribute or element, so |
1352
|
|
|
|
|
|
|
|
1353
|
|
|
|
|
|
|
xset $foo; |
1354
|
|
|
|
|
|
|
|
1355
|
|
|
|
|
|
|
will DWYM. TODO: search up the current node's ancestry for a LocalName |
1356
|
|
|
|
|
|
|
when handling other event types. |
1357
|
|
|
|
|
|
|
|
1358
|
|
|
|
|
|
|
If no parameters are provided, xvalue is used. |
1359
|
|
|
|
|
|
|
|
1360
|
|
|
|
|
|
|
=cut |
1361
|
|
|
|
|
|
|
|
1362
|
|
|
|
|
|
|
sub xset { |
1363
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1364
|
|
|
|
|
|
|
my $name = @_ > 1 |
1365
|
|
|
|
|
|
|
? shift |
1366
|
|
|
|
|
|
|
: do { |
1367
|
|
|
|
|
|
|
croak "$XFD::ctx->{EventType} has no LocalName" |
1368
|
|
|
|
|
|
|
unless exists $XFD::ctx->{Node}->{LocalName} |
1369
|
|
|
|
|
|
|
&& defined exists $XFD::ctx->{Node}->{LocalName}; |
1370
|
|
|
|
|
|
|
croak "$XFD::ctx->{EventType} has a LocalName of ''" |
1371
|
|
|
|
|
|
|
unless length $XFD::ctx->{Node}->{LocalName}; |
1372
|
|
|
|
|
|
|
$XFD::ctx->{Node}->{LocalName}; |
1373
|
|
|
|
|
|
|
}; |
1374
|
|
|
|
|
|
|
|
1375
|
|
|
|
|
|
|
|
1376
|
|
|
|
|
|
|
my $new_item = @_ ? shift : $XFD::cur_self->xvalue; |
1377
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: xset()ing ", $name, " => ", $new_item if is_tracing; |
1378
|
|
|
|
|
|
|
|
1379
|
|
|
|
|
|
|
unless ( @{$XFD::cur_self->{XStack}} ) { |
1380
|
|
|
|
|
|
|
$XFD::cur_self->xpush( $new_item ); |
1381
|
|
|
|
|
|
|
} |
1382
|
|
|
|
|
|
|
else { |
1383
|
|
|
|
|
|
|
my $e = $XFD::cur_self->{XStack}->[-1]; |
1384
|
|
|
|
|
|
|
my $top = $e->{"#elt"}; |
1385
|
|
|
|
|
|
|
my $t = ref $top; |
1386
|
|
|
|
|
|
|
my $meth; |
1387
|
|
|
|
|
|
|
|
1388
|
|
|
|
|
|
|
if ( $t eq "" ) { |
1389
|
|
|
|
|
|
|
croak "already xset() scalar on top of xstack" |
1390
|
|
|
|
|
|
|
if $e->{scalar}++; |
1391
|
|
|
|
|
|
|
$e->{"#elt"} = $new_item; |
1392
|
|
|
|
|
|
|
} |
1393
|
|
|
|
|
|
|
elsif ( $t eq "SCALAR" ) { |
1394
|
|
|
|
|
|
|
croak "already xset() SCALAR reference on top of xstack" |
1395
|
|
|
|
|
|
|
if $e->{scalar}++; |
1396
|
|
|
|
|
|
|
$$top = $new_item; |
1397
|
|
|
|
|
|
|
} |
1398
|
|
|
|
|
|
|
elsif ( $t eq "HASH" ) { |
1399
|
|
|
|
|
|
|
croak "already xset() element '$name' of HASH on top of xstack" |
1400
|
|
|
|
|
|
|
if $e->{$name}++; |
1401
|
|
|
|
|
|
|
$top->{$name} = $new_item; |
1402
|
|
|
|
|
|
|
} |
1403
|
|
|
|
|
|
|
## See if it's a blessed object that can add thingamies" |
1404
|
|
|
|
|
|
|
elsif ( |
1405
|
|
|
|
|
|
|
( $meth = UNIVERSAL::can( $top, $name ) ) |
1406
|
|
|
|
|
|
|
|| ( $meth = UNIVERSAL::can( $top, "set_$name" ) ) |
1407
|
|
|
|
|
|
|
) { |
1408
|
|
|
|
|
|
|
croak "already xset() accessor $name() of ", ref $top, " on top of xstack" |
1409
|
|
|
|
|
|
|
if $e->{$name}++; |
1410
|
|
|
|
|
|
|
$top->$meth( $new_item ); |
1411
|
|
|
|
|
|
|
} |
1412
|
|
|
|
|
|
|
else { |
1413
|
|
|
|
|
|
|
croak "don't know how to xset() $name for a '$t' (which is what is on the top of the xstack)"; |
1414
|
|
|
|
|
|
|
} |
1415
|
|
|
|
|
|
|
|
1416
|
|
|
|
|
|
|
$XFD::cur_self->xpush( $new_item ) |
1417
|
|
|
|
|
|
|
if ref $new_item && ref $new_item ne "SCALAR"; |
1418
|
|
|
|
|
|
|
} |
1419
|
|
|
|
|
|
|
|
1420
|
|
|
|
|
|
|
return $new_item; |
1421
|
|
|
|
|
|
|
} |
1422
|
|
|
|
|
|
|
|
1423
|
|
|
|
|
|
|
|
1424
|
|
|
|
|
|
|
|
1425
|
|
|
|
|
|
|
=item xoverwrite |
1426
|
|
|
|
|
|
|
|
1427
|
|
|
|
|
|
|
Exactly like xset but does not complain if the value has already been |
1428
|
|
|
|
|
|
|
xadd(), xset() or xoverwrite(). |
1429
|
|
|
|
|
|
|
|
1430
|
|
|
|
|
|
|
=cut |
1431
|
|
|
|
|
|
|
|
1432
|
|
|
|
|
|
|
sub xoverwrite { |
1433
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1434
|
|
|
|
|
|
|
my $name = @_ > 1 |
1435
|
|
|
|
|
|
|
? shift |
1436
|
|
|
|
|
|
|
: do { |
1437
|
|
|
|
|
|
|
croak "$XFD::ctx->{EventType} has no LocalName" |
1438
|
|
|
|
|
|
|
unless exists $XFD::ctx->{Node}->{LocalName} |
1439
|
|
|
|
|
|
|
&& defined exists $XFD::ctx->{Node}->{LocalName}; |
1440
|
|
|
|
|
|
|
croak "$XFD::ctx->{EventType} a LocalName of ''" |
1441
|
|
|
|
|
|
|
unless length $XFD::ctx->{Node}->{LocalName}; |
1442
|
|
|
|
|
|
|
$XFD::ctx->{Node}->{LocalName}; |
1443
|
|
|
|
|
|
|
}; |
1444
|
|
|
|
|
|
|
|
1445
|
|
|
|
|
|
|
|
1446
|
|
|
|
|
|
|
my $new_item = @_ ? shift : $XFD::cur_self->xvalue; |
1447
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: xoverwrite()ing ", $name, " => ", $new_item if is_tracing; |
1448
|
|
|
|
|
|
|
|
1449
|
|
|
|
|
|
|
unless ( @{$XFD::cur_self->{XStack}} ) { |
1450
|
|
|
|
|
|
|
$XFD::cur_self->xpush( $new_item ); |
1451
|
|
|
|
|
|
|
} |
1452
|
|
|
|
|
|
|
else { |
1453
|
|
|
|
|
|
|
my $e = $XFD::cur_self->{XStack}->[-1]; |
1454
|
|
|
|
|
|
|
my $top = $e->{"#elt"}; |
1455
|
|
|
|
|
|
|
|
1456
|
|
|
|
|
|
|
my $t = ref $top; |
1457
|
|
|
|
|
|
|
my $meth; |
1458
|
|
|
|
|
|
|
if ( $t eq "" ) { |
1459
|
|
|
|
|
|
|
$XFD::cur_self->{XStack}->[-1]->{"#elt"} = $new_item; |
1460
|
|
|
|
|
|
|
$e->{scalar}++; |
1461
|
|
|
|
|
|
|
} |
1462
|
|
|
|
|
|
|
elsif ( $t eq "SCALAR" ) { |
1463
|
|
|
|
|
|
|
$$top = $new_item; |
1464
|
|
|
|
|
|
|
$e->{scalar}++; |
1465
|
|
|
|
|
|
|
} |
1466
|
|
|
|
|
|
|
elsif ( $t eq "HASH" ) { |
1467
|
|
|
|
|
|
|
$top->{$name} = $new_item; |
1468
|
|
|
|
|
|
|
$e->{$name}++; |
1469
|
|
|
|
|
|
|
} |
1470
|
|
|
|
|
|
|
## See if it's a blessed object that can add thingamies" |
1471
|
|
|
|
|
|
|
elsif ( |
1472
|
|
|
|
|
|
|
( $meth = UNIVERSAL::can( $top, $name ) ) |
1473
|
|
|
|
|
|
|
|| ( $meth = UNIVERSAL::can( $top, "set_$name" ) ) |
1474
|
|
|
|
|
|
|
) { |
1475
|
|
|
|
|
|
|
$top->$meth( $new_item ); |
1476
|
|
|
|
|
|
|
$e->{$name}++; |
1477
|
|
|
|
|
|
|
} |
1478
|
|
|
|
|
|
|
else { |
1479
|
|
|
|
|
|
|
croak "don't know how to xoverwrite $name for a '$t' (which is what is on the top of the xstack)"; |
1480
|
|
|
|
|
|
|
} |
1481
|
|
|
|
|
|
|
$XFD::cur_self->xpush( $new_item ) |
1482
|
|
|
|
|
|
|
if ref $new_item && ref $new_item ne "SCALAR"; |
1483
|
|
|
|
|
|
|
} |
1484
|
|
|
|
|
|
|
|
1485
|
|
|
|
|
|
|
return $new_item; |
1486
|
|
|
|
|
|
|
} |
1487
|
|
|
|
|
|
|
|
1488
|
|
|
|
|
|
|
|
1489
|
|
|
|
|
|
|
|
1490
|
|
|
|
|
|
|
=item xpeek |
1491
|
|
|
|
|
|
|
|
1492
|
|
|
|
|
|
|
Rules => [ |
1493
|
|
|
|
|
|
|
"foo" => sub { |
1494
|
|
|
|
|
|
|
my $elt = $_[1]; |
1495
|
|
|
|
|
|
|
xpeek->set_name( $elt->{Attributes}->{"{}name"} ); |
1496
|
|
|
|
|
|
|
}, |
1497
|
|
|
|
|
|
|
"/end::*" => sub { |
1498
|
|
|
|
|
|
|
my $self = shift; |
1499
|
|
|
|
|
|
|
XXXXXXXXXXXXXXXXXXXX |
1500
|
|
|
|
|
|
|
} |
1501
|
|
|
|
|
|
|
], |
1502
|
|
|
|
|
|
|
|
1503
|
|
|
|
|
|
|
|
1504
|
|
|
|
|
|
|
Returns the top element on the xstack, which was the last thing |
1505
|
|
|
|
|
|
|
pushed in the current context. Throws an exception if the xstack is |
1506
|
|
|
|
|
|
|
empty. To check for an empty stack, use eval: |
1507
|
|
|
|
|
|
|
|
1508
|
|
|
|
|
|
|
my $stack_not_empty = eval { xpeek }; |
1509
|
|
|
|
|
|
|
|
1510
|
|
|
|
|
|
|
To peek down the xstack, use a Perlish index value. The most |
1511
|
|
|
|
|
|
|
recently pushed element is index number -1: |
1512
|
|
|
|
|
|
|
|
1513
|
|
|
|
|
|
|
$xpeek( -1 ); ## Same as $self->peek |
1514
|
|
|
|
|
|
|
|
1515
|
|
|
|
|
|
|
The first element pushed on the xstack is element 0: |
1516
|
|
|
|
|
|
|
|
1517
|
|
|
|
|
|
|
$xpeek( 0 ); |
1518
|
|
|
|
|
|
|
|
1519
|
|
|
|
|
|
|
An exception is thrown if the index is off either end of the stack. |
1520
|
|
|
|
|
|
|
|
1521
|
|
|
|
|
|
|
=cut |
1522
|
|
|
|
|
|
|
|
1523
|
|
|
|
|
|
|
sub xpeek { |
1524
|
|
|
|
|
|
|
unless ( @_ ) { |
1525
|
|
|
|
|
|
|
croak "xpeek() called on empty stack" |
1526
|
|
|
|
|
|
|
unless @{$XFD::cur_self->{XStack}}; |
1527
|
|
|
|
|
|
|
|
1528
|
|
|
|
|
|
|
return $XFD::cur_self->{XStack}->[-1]->{"#elt"}; |
1529
|
|
|
|
|
|
|
} |
1530
|
|
|
|
|
|
|
|
1531
|
|
|
|
|
|
|
local $XFD::cur_self = shift if UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1532
|
|
|
|
|
|
|
|
1533
|
|
|
|
|
|
|
my $index = shift; |
1534
|
|
|
|
|
|
|
$index = -1 unless defined $index; |
1535
|
|
|
|
|
|
|
|
1536
|
|
|
|
|
|
|
croak "xpeek( $index ) off the end of the stack" |
1537
|
|
|
|
|
|
|
if $index > $#{$XFD::cur_self->{XStack}} |
1538
|
|
|
|
|
|
|
|| $index < -1 - $#{$XFD::cur_self->{XStack}}; |
1539
|
|
|
|
|
|
|
|
1540
|
|
|
|
|
|
|
return $XFD::cur_self->{XStack}->[$index]->{"#elt"}; |
1541
|
|
|
|
|
|
|
} |
1542
|
|
|
|
|
|
|
|
1543
|
|
|
|
|
|
|
=item xpop |
1544
|
|
|
|
|
|
|
|
1545
|
|
|
|
|
|
|
|
1546
|
|
|
|
|
|
|
my $d = XML::Filter::Dispatcher->new( |
1547
|
|
|
|
|
|
|
Rules => [ |
1548
|
|
|
|
|
|
|
....rules to build an object hierarchy... |
1549
|
|
|
|
|
|
|
], |
1550
|
|
|
|
|
|
|
); |
1551
|
|
|
|
|
|
|
|
1552
|
|
|
|
|
|
|
my $result = $d->xpop |
1553
|
|
|
|
|
|
|
|
1554
|
|
|
|
|
|
|
Removes an element from the xstack and returns it. Usually |
1555
|
|
|
|
|
|
|
called in a end_document handler or after the document returns to |
1556
|
|
|
|
|
|
|
retrieve a "root" object placed on the stack before the root element |
1557
|
|
|
|
|
|
|
was started. |
1558
|
|
|
|
|
|
|
|
1559
|
|
|
|
|
|
|
=cut |
1560
|
|
|
|
|
|
|
|
1561
|
|
|
|
|
|
|
sub xpop { |
1562
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1563
|
|
|
|
|
|
|
|
1564
|
|
|
|
|
|
|
croak "xpop() called on empty stack" |
1565
|
|
|
|
|
|
|
unless @{$XFD::cur_self->{XStack}}; |
1566
|
|
|
|
|
|
|
|
1567
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: xpop()ing ", $XFD::cur_self->{XStack}->[-1]->{"#elt"} if is_tracing; |
1568
|
|
|
|
|
|
|
return (pop @{$XFD::cur_self->{XStack}})->{"#elt"}; |
1569
|
|
|
|
|
|
|
} |
1570
|
|
|
|
|
|
|
|
1571
|
|
|
|
|
|
|
=item xstack_empty |
1572
|
|
|
|
|
|
|
|
1573
|
|
|
|
|
|
|
Handy for detecting a nonempty stack: |
1574
|
|
|
|
|
|
|
|
1575
|
|
|
|
|
|
|
warn xpeek unless xstack_empty; |
1576
|
|
|
|
|
|
|
|
1577
|
|
|
|
|
|
|
Because C and C throw exceptions on an empty stack, |
1578
|
|
|
|
|
|
|
C is needed to detect whether it's safe to call them. |
1579
|
|
|
|
|
|
|
|
1580
|
|
|
|
|
|
|
=cut |
1581
|
|
|
|
|
|
|
|
1582
|
|
|
|
|
|
|
sub xstack_empty { |
1583
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1584
|
|
|
|
|
|
|
return ! @{$XFD::cur_self->{XStack}}; |
1585
|
|
|
|
|
|
|
} |
1586
|
|
|
|
|
|
|
|
1587
|
|
|
|
|
|
|
=item xstack_max |
1588
|
|
|
|
|
|
|
|
1589
|
|
|
|
|
|
|
Handy for walking the stack: |
1590
|
|
|
|
|
|
|
|
1591
|
|
|
|
|
|
|
for my $i ( reverse 0 .. xstack_max ) { ## from top to bottom |
1592
|
|
|
|
|
|
|
use BFD;d xpeek( $i ); |
1593
|
|
|
|
|
|
|
} |
1594
|
|
|
|
|
|
|
|
1595
|
|
|
|
|
|
|
Because C and C throw exceptions on an empty stack, |
1596
|
|
|
|
|
|
|
C may be used to walk the stack safely. |
1597
|
|
|
|
|
|
|
|
1598
|
|
|
|
|
|
|
=cut |
1599
|
|
|
|
|
|
|
|
1600
|
|
|
|
|
|
|
sub xstack_max { |
1601
|
|
|
|
|
|
|
local $XFD::cur_self = shift if @_ && UNIVERSAL::isa( $_[0], __PACKAGE__ ); |
1602
|
|
|
|
|
|
|
return $#{$XFD::cur_self->{XStack}}; |
1603
|
|
|
|
|
|
|
} |
1604
|
|
|
|
|
|
|
|
1605
|
|
|
|
|
|
|
=back |
1606
|
|
|
|
|
|
|
|
1607
|
|
|
|
|
|
|
=cut |
1608
|
|
|
|
|
|
|
|
1609
|
|
|
|
|
|
|
## |
1610
|
|
|
|
|
|
|
## A little help for debugging |
1611
|
|
|
|
|
|
|
## |
1612
|
|
|
|
|
|
|
{ |
1613
|
|
|
|
|
|
|
## This allows us to number events as the arrive and then |
1614
|
|
|
|
|
|
|
## look those numbers up using their memory addresses |
1615
|
|
|
|
|
|
|
## This is a 1..N numbering system: the left hand side of the |
1616
|
|
|
|
|
|
|
## ||= preallocates the key, so there's already 1 key in the |
1617
|
|
|
|
|
|
|
## hash when this is first called. |
1618
|
|
|
|
|
|
|
## We also hold on to the events to keep them from reusing numbers. |
1619
|
|
|
|
|
|
|
my %events; |
1620
|
|
|
|
|
|
|
sub _ev($) { |
1621
|
|
|
|
|
|
|
return ( |
1622
|
|
|
|
|
|
|
$events{int $_[0]} ||= |
1623
|
|
|
|
|
|
|
{ ctx => $_, ev => 0+keys %events } |
1624
|
|
|
|
|
|
|
)->{ev}; |
1625
|
|
|
|
|
|
|
} |
1626
|
|
|
|
|
|
|
} |
1627
|
|
|
|
|
|
|
|
1628
|
|
|
|
|
|
|
{ |
1629
|
|
|
|
|
|
|
## This allows us to number events as the arrive and then |
1630
|
|
|
|
|
|
|
## look those numbers up using their memory addresses |
1631
|
|
|
|
|
|
|
## This is a 1..N numbering system: the left hand side of the |
1632
|
|
|
|
|
|
|
## ||= preallocates the key, so there's already 1 key in the |
1633
|
|
|
|
|
|
|
## hash when this is first called. |
1634
|
|
|
|
|
|
|
my %postponements; |
1635
|
|
|
|
|
|
|
sub _po($) { |
1636
|
|
|
|
|
|
|
return ( |
1637
|
|
|
|
|
|
|
$postponements{int $_[0]} ||= |
1638
|
|
|
|
|
|
|
{ ctx => $_, po => 0+keys %postponements} |
1639
|
|
|
|
|
|
|
)->{po}; |
1640
|
|
|
|
|
|
|
} |
1641
|
|
|
|
|
|
|
} |
1642
|
|
|
|
|
|
|
|
1643
|
|
|
|
|
|
|
## |
1644
|
|
|
|
|
|
|
## SAX handlers |
1645
|
|
|
|
|
|
|
## |
1646
|
|
|
|
|
|
|
|
1647
|
|
|
|
|
|
|
## a helper called by the handlers... |
1648
|
|
|
|
|
|
|
## TODO: optimize most of this away for all events except start_document and |
1649
|
|
|
|
|
|
|
## start_ element, since these are the only two events that can have |
1650
|
|
|
|
|
|
|
## child events. The others should be able to get away with much simpler logic. |
1651
|
|
|
|
|
|
|
|
1652
|
|
|
|
|
|
|
sub _call_queued_subs { |
1653
|
|
|
|
|
|
|
my $self = shift ; |
1654
|
|
|
|
|
|
|
my $event_type = shift; |
1655
|
|
|
|
|
|
|
|
1656
|
|
|
|
|
|
|
local $XFD::cur_self = $self; |
1657
|
|
|
|
|
|
|
|
1658
|
|
|
|
|
|
|
$XFD::ctx->{EventType} = $event_type; |
1659
|
|
|
|
|
|
|
$XFD::ctx->{Node} = $_[0]; |
1660
|
|
|
|
|
|
|
$XFD::ctx->{HighScore} = -1; |
1661
|
|
|
|
|
|
|
|
1662
|
|
|
|
|
|
|
for ( @{$XFD::ctx->{$event_type}} ) { |
1663
|
|
|
|
|
|
|
$_->[0]->( @{$_}[1..$#$_], @_ ); |
1664
|
|
|
|
|
|
|
} |
1665
|
|
|
|
|
|
|
|
1666
|
|
|
|
|
|
|
$self->_queue_pending_event( $XFD::ctx ); |
1667
|
|
|
|
|
|
|
} |
1668
|
|
|
|
|
|
|
|
1669
|
|
|
|
|
|
|
|
1670
|
|
|
|
|
|
|
sub _call_queued_end_subs { |
1671
|
|
|
|
|
|
|
my $self = shift ; |
1672
|
|
|
|
|
|
|
my $event_type = shift; |
1673
|
|
|
|
|
|
|
my $start_ctx = shift; |
1674
|
|
|
|
|
|
|
|
1675
|
|
|
|
|
|
|
local $XFD::cur_self = $self; |
1676
|
|
|
|
|
|
|
|
1677
|
|
|
|
|
|
|
local $XFD::ctx = $self->_build_end_ctx_from_start_ctx( $start_ctx ); |
1678
|
|
|
|
|
|
|
$XFD::ctx->{EventType} = $event_type; |
1679
|
|
|
|
|
|
|
$XFD::ctx->{Node} = $_[0]; |
1680
|
|
|
|
|
|
|
$XFD::ctx->{HighScore} = -1; |
1681
|
|
|
|
|
|
|
|
1682
|
|
|
|
|
|
|
my $i = 0; |
1683
|
|
|
|
|
|
|
if ( exists $start_ctx->{EndSubs} ) { |
1684
|
|
|
|
|
|
|
## EndSubs are subroutines that are placed in a start_ context |
1685
|
|
|
|
|
|
|
## to be run when the matching end_ event is reached. Their |
1686
|
|
|
|
|
|
|
## purpose is usually to evaluate some postponement and queue |
1687
|
|
|
|
|
|
|
## an action for that postponement if need be. |
1688
|
|
|
|
|
|
|
my $end_subs = $start_ctx->{EndSubs}; |
1689
|
|
|
|
|
|
|
while ( @$end_subs ) { |
1690
|
|
|
|
|
|
|
local $_ = pop @$end_subs; |
1691
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: *** calling EndSub ", $i++, " for event ", _ev $start_ctx, " ***" if is_tracing; |
1692
|
|
|
|
|
|
|
|
1693
|
|
|
|
|
|
|
$_->[0]->( @{$_}[1..$#$_], @_ ) |
1694
|
|
|
|
|
|
|
} |
1695
|
|
|
|
|
|
|
} |
1696
|
|
|
|
|
|
|
|
1697
|
|
|
|
|
|
|
# $start_ctx->{EndContext} = $XFD::ctx; |
1698
|
|
|
|
|
|
|
|
1699
|
|
|
|
|
|
|
$self->_queue_pending_event( $XFD::ctx ); |
1700
|
|
|
|
|
|
|
} |
1701
|
|
|
|
|
|
|
|
1702
|
|
|
|
|
|
|
sub _execute_next_action { |
1703
|
|
|
|
|
|
|
my $self = shift; |
1704
|
|
|
|
|
|
|
|
1705
|
|
|
|
|
|
|
my $actions = $XFD::ctx->{PendingActions}; |
1706
|
|
|
|
|
|
|
my $key = ( sort { $a <=> $b } keys %$actions)[-1]; |
1707
|
|
|
|
|
|
|
return unless defined $key; |
1708
|
|
|
|
|
|
|
|
1709
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: *** executing action $key for event ", _ev $XFD::ctx, " ***" if is_tracing; |
1710
|
|
|
|
|
|
|
my $sub = shift @{$actions->{$key}}; |
1711
|
|
|
|
|
|
|
delete $actions->{$key} unless @{$actions->{$key}}; |
1712
|
|
|
|
|
|
|
|
1713
|
|
|
|
|
|
|
$self->{LastHandlerResult} = $sub->( $self, $XFD::ctx->{Node} ); |
1714
|
|
|
|
|
|
|
|
1715
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: result set to ", defined $self->{LastHandlerResult} ? "'$self->{LastHandlerResult}'" : "undef" if is_tracing; |
1716
|
|
|
|
|
|
|
|
1717
|
|
|
|
|
|
|
# if ( exists $XFD::ctx->{EndContext} && exists $XFD::ctx->{EndSubs} ) { |
1718
|
|
|
|
|
|
|
# my $i = 0; |
1719
|
|
|
|
|
|
|
# while ( @{$XFD::ctx->{EndSubs}} ) { |
1720
|
|
|
|
|
|
|
# local $_ = shift @{$XFD::ctx->{EndSubs}}; |
1721
|
|
|
|
|
|
|
# emit_trace_SAX_message "EventPath: *** calling delayed (due to postponed start_) EndSub ", $i++, " for event ", _ev $XFD::ctx, " ***" if is_tracing; |
1722
|
|
|
|
|
|
|
# local $XFD::ctx = $XFD::ctx->{EndContext}; |
1723
|
|
|
|
|
|
|
# |
1724
|
|
|
|
|
|
|
# $_->[0]->( @{$_}[1..$#$_], @_ ) |
1725
|
|
|
|
|
|
|
# } |
1726
|
|
|
|
|
|
|
# } |
1727
|
|
|
|
|
|
|
|
1728
|
|
|
|
|
|
|
} |
1729
|
|
|
|
|
|
|
|
1730
|
|
|
|
|
|
|
|
1731
|
|
|
|
|
|
|
sub _queue_pending_event { |
1732
|
|
|
|
|
|
|
my $self = shift; |
1733
|
|
|
|
|
|
|
local $XFD::cur_self = $self; |
1734
|
|
|
|
|
|
|
|
1735
|
|
|
|
|
|
|
my ( $ctx ) = @_; |
1736
|
|
|
|
|
|
|
|
1737
|
|
|
|
|
|
|
# if ( exists $ctx->{PendingActions} |
1738
|
|
|
|
|
|
|
# || ( $ctx->{Postponements} && @{$ctx->{Postponements}} ) |
1739
|
|
|
|
|
|
|
# ) { |
1740
|
|
|
|
|
|
|
# emit_trace_SAX_message "EventPath: queuing pending event ", _ev $ctx if is_tracing; |
1741
|
|
|
|
|
|
|
push @{$self->{PendingEvents}}, $ctx; |
1742
|
|
|
|
|
|
|
# } |
1743
|
|
|
|
|
|
|
# else { |
1744
|
|
|
|
|
|
|
# emit_trace_SAX_message "EventPath: not queuing event ", _ev $ctx if is_tracing; |
1745
|
|
|
|
|
|
|
# } |
1746
|
|
|
|
|
|
|
|
1747
|
|
|
|
|
|
|
while ( @{$self->{PendingEvents}} |
1748
|
|
|
|
|
|
|
&& ! $self->{PendingEvents}->[0]->{PostponementCount} |
1749
|
|
|
|
|
|
|
) { |
1750
|
|
|
|
|
|
|
my $c = shift @{$self->{PendingEvents}}; |
1751
|
|
|
|
|
|
|
|
1752
|
|
|
|
|
|
|
if ( show_buffer_highwater |
1753
|
|
|
|
|
|
|
&& @{$self->{PendingEvents}} |
1754
|
|
|
|
|
|
|
&& @{$self->{PendingEvents}} >= $self->{BufferHighwater} |
1755
|
|
|
|
|
|
|
) { |
1756
|
|
|
|
|
|
|
$self->{BufferHighwater} = @{$self->{PendingEvents}}; |
1757
|
|
|
|
|
|
|
if ( @{$self->{PendingEvents}} > $self->{BufferHighwater} ) { |
1758
|
|
|
|
|
|
|
@{$self->{BufferHighwaterEvents}} = ( $c ); |
1759
|
|
|
|
|
|
|
} |
1760
|
|
|
|
|
|
|
else { |
1761
|
|
|
|
|
|
|
push @{$self->{BufferHighwaterEvents}}, $c; |
1762
|
|
|
|
|
|
|
} |
1763
|
|
|
|
|
|
|
} |
1764
|
|
|
|
|
|
|
|
1765
|
|
|
|
|
|
|
if ( |
1766
|
|
|
|
|
|
|
substr( $c->{EventType}, 0, 4 ) ne "end_" |
1767
|
|
|
|
|
|
|
# substr( $c->{EventType}, 0, 6 ) eq "start_" |
1768
|
|
|
|
|
|
|
# || $c->{EventType} eq "attribute" |
1769
|
|
|
|
|
|
|
) { |
1770
|
|
|
|
|
|
|
push @{$self->{XStackMarks}}, scalar @{$self->{XStack}}; |
1771
|
|
|
|
|
|
|
} |
1772
|
|
|
|
|
|
|
|
1773
|
|
|
|
|
|
|
if ( exists $c->{PendingActions} ) { |
1774
|
|
|
|
|
|
|
## The "-1" here implements the "last match wins" logic. |
1775
|
|
|
|
|
|
|
## All rules are evaluated in order; the last rule to evaluate |
1776
|
|
|
|
|
|
|
## queues its action last. TODO: test this in the face of |
1777
|
|
|
|
|
|
|
## precursors; actions may need to be set based on action |
1778
|
|
|
|
|
|
|
## numbers or something. |
1779
|
|
|
|
|
|
|
local $XFD::ctx = $c; |
1780
|
|
|
|
|
|
|
$self->_execute_next_action; |
1781
|
|
|
|
|
|
|
} |
1782
|
|
|
|
|
|
|
else { |
1783
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: discarding event ", _ev $c if is_tracing; |
1784
|
|
|
|
|
|
|
} |
1785
|
|
|
|
|
|
|
|
1786
|
|
|
|
|
|
|
if ( |
1787
|
|
|
|
|
|
|
# $c->{EventType} eq "end_element" |
1788
|
|
|
|
|
|
|
## Note that we don't unwind on end_document. Perhaps we should. |
1789
|
|
|
|
|
|
|
# || $c->{EventType} eq "attribute" |
1790
|
|
|
|
|
|
|
substr( $c->{EventType}, 0, 6 ) ne "start_" |
1791
|
|
|
|
|
|
|
) { |
1792
|
|
|
|
|
|
|
my $level = pop @{$self->{XStackMarks}}; |
1793
|
|
|
|
|
|
|
if ( $level < @{$self->{XStack}} ) { |
1794
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: unwinding from xstack: ", splice @{$self->{XStack}}, $level if is_tracing; |
1795
|
|
|
|
|
|
|
splice @{$self->{XStack}}, $level; |
1796
|
|
|
|
|
|
|
} |
1797
|
|
|
|
|
|
|
} |
1798
|
|
|
|
|
|
|
} |
1799
|
|
|
|
|
|
|
|
1800
|
|
|
|
|
|
|
emit_trace_SAX_message |
1801
|
|
|
|
|
|
|
"EventPath: ", |
1802
|
|
|
|
|
|
|
@{$self->{PendingEvents}} . " events pending (", |
1803
|
|
|
|
|
|
|
join( ", ", |
1804
|
|
|
|
|
|
|
map |
1805
|
|
|
|
|
|
|
$_->{PostponementCount} |
1806
|
|
|
|
|
|
|
? _ev( $_ ).":$_->{PostponementCount}:".( |
1807
|
|
|
|
|
|
|
exists $_->{PendingActions} |
1808
|
|
|
|
|
|
|
? "action" ## TODO: dump actions? |
1809
|
|
|
|
|
|
|
: "" |
1810
|
|
|
|
|
|
|
) |
1811
|
|
|
|
|
|
|
: (), |
1812
|
|
|
|
|
|
|
@{$self->{PendingEvents}} |
1813
|
|
|
|
|
|
|
), |
1814
|
|
|
|
|
|
|
")" if is_tracing; |
1815
|
|
|
|
|
|
|
} |
1816
|
|
|
|
|
|
|
|
1817
|
|
|
|
|
|
|
|
1818
|
|
|
|
|
|
|
sub _build_ctx { |
1819
|
|
|
|
|
|
|
my $self = shift; |
1820
|
|
|
|
|
|
|
|
1821
|
|
|
|
|
|
|
my $parent_ctx = $self->{CtxStack}->[-1]; |
1822
|
|
|
|
|
|
|
my $ctx = { %{$parent_ctx->{ChildCtx}} }; |
1823
|
|
|
|
|
|
|
$ctx->{Vars} = { %{$parent_ctx->{Vars}} } |
1824
|
|
|
|
|
|
|
if exists $parent_ctx->{Vars}; |
1825
|
|
|
|
|
|
|
|
1826
|
|
|
|
|
|
|
if ( exists $ctx->{EndSubs} ) { |
1827
|
|
|
|
|
|
|
## When descendant-or-self:: queues up cloned postponements |
1828
|
|
|
|
|
|
|
## for the child contexts, the child contexts don't exist yet |
1829
|
|
|
|
|
|
|
## so it puts undef where they should be. This loop replaces |
1830
|
|
|
|
|
|
|
## those undefs with the freshly minted context. Only |
1831
|
|
|
|
|
|
|
## descendant-or-self:: does this, so we can assume those are |
1832
|
|
|
|
|
|
|
## the only kind of EndSubs we'll find. |
1833
|
|
|
|
|
|
|
$ctx->{EndSubs} = [ map [ @$_ ], @{$ctx->{EndSubs}} ]; |
1834
|
|
|
|
|
|
|
for ( @{$ctx->{EndSubs}} ) { |
1835
|
|
|
|
|
|
|
die "The first param to the child's EndSubs should be undef not $_->[1]" |
1836
|
|
|
|
|
|
|
if defined $_->[1]; |
1837
|
|
|
|
|
|
|
$_->[1] = $ctx; |
1838
|
|
|
|
|
|
|
} |
1839
|
|
|
|
|
|
|
} |
1840
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: built event ", _ev $ctx, " (from event ", _ev $parent_ctx, ")" if is_tracing; |
1841
|
|
|
|
|
|
|
return $ctx; |
1842
|
|
|
|
|
|
|
} |
1843
|
|
|
|
|
|
|
|
1844
|
|
|
|
|
|
|
|
1845
|
|
|
|
|
|
|
sub _build_end_ctx_from_start_ctx { |
1846
|
|
|
|
|
|
|
my $self = shift; |
1847
|
|
|
|
|
|
|
|
1848
|
|
|
|
|
|
|
my ( $start_ctx ) = @_; |
1849
|
|
|
|
|
|
|
|
1850
|
|
|
|
|
|
|
## The $start_ctx's actions have yet to run. They may |
1851
|
|
|
|
|
|
|
## add actions to the end event. |
1852
|
|
|
|
|
|
|
$start_ctx->{PendingEndActions} ||= {}; |
1853
|
|
|
|
|
|
|
|
1854
|
|
|
|
|
|
|
my $end_ctx = { |
1855
|
|
|
|
|
|
|
PendingActions => $start_ctx->{PendingEndActions}, |
1856
|
|
|
|
|
|
|
}; |
1857
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: built end_ event ", _ev $end_ctx, " (from event ", _ev $start_ctx, ")" if is_tracing; |
1858
|
|
|
|
|
|
|
return $end_ctx; |
1859
|
|
|
|
|
|
|
} |
1860
|
|
|
|
|
|
|
|
1861
|
|
|
|
|
|
|
|
1862
|
|
|
|
|
|
|
sub start_document { |
1863
|
|
|
|
|
|
|
my $self = shift ; |
1864
|
|
|
|
|
|
|
|
1865
|
|
|
|
|
|
|
$self->{XStack} = []; |
1866
|
|
|
|
|
|
|
$self->{XStackMarks} = []; |
1867
|
|
|
|
|
|
|
delete $self->{DocStartedFlags}; |
1868
|
|
|
|
|
|
|
$self->{PendingEvents} = []; |
1869
|
|
|
|
|
|
|
|
1870
|
|
|
|
|
|
|
if ( $self->{DocSub} ) { |
1871
|
|
|
|
|
|
|
## The "[]" is the postponement record to pass in. |
1872
|
|
|
|
|
|
|
@{$self->{DocCtx}->{start_document}} = [ $self->{DocSub}, $self, [] ]; |
1873
|
|
|
|
|
|
|
} |
1874
|
|
|
|
|
|
|
|
1875
|
|
|
|
|
|
|
|
1876
|
|
|
|
|
|
|
if ( show_buffer_highwater ) { |
1877
|
|
|
|
|
|
|
$self->{BufferHighwater} = 0; |
1878
|
|
|
|
|
|
|
$self->{BufferHighwaterEvents} = []; |
1879
|
|
|
|
|
|
|
} |
1880
|
|
|
|
|
|
|
|
1881
|
|
|
|
|
|
|
local $XFD::ctx = $self->{DocCtx}; |
1882
|
|
|
|
|
|
|
emit_trace_SAX_message "EventPath: using doc event ", _ev $XFD::ctx if is_tracing; |
1883
|
|
|
|
|
|
|
$self->{CtxStack} = [ $XFD::ctx ]; |
1884
|
|
|
|
|
|
|
$self->_call_queued_subs( "start_document", @_ ); |
1885
|
|
|
|
|
|
|
|
1886
|
|
|
|
|
|
|
return; |
1887
|
|
|
|
|
|
|
} |
1888
|
|
|
|
|
|
|
|
1889
|
|
|
|
|
|
|
|
1890
|
|
|
|
|
|
|
sub end_document { |
1891
|
|
|
|
|
|
|
my $self = shift ; |
1892
|
|
|
|
|
|
|
my ( $doc ) = @_; |
1893
|
|
|
|
|
|
|
|
1894
|
|
|
|
|
|
|
confess "Bug: context stack should not be empty!" |
1895
|
|
|
|
|
|
|
unless @{$self->{CtxStack}}; |
1896
|
|
|
|
|
|
|
|
1897
|
|
|
|
|
|
|
my $start_ctx = pop @{$self->{CtxStack}}; |
1898
|
|
|
|
|
|
|
die "end_document() mismatch: ", |
1899
|
|
|
|
|
|
|
defined $start_ctx ? $start_ctx->{EventType} : "undef", |
1900
|
|
|
|
|
|
|
" from the context stack\n" |
1901
|
|
|
|
|
|
|
unless $start_ctx->{EventType} eq "start_document"; |
1902
|
|
|
|
|
|
|
|
1903
|
|
|
|
|
|
|
confess "Bug: context stack should be empty!" |
1904
|
|
|
|
|
|
|
unless ! @{$self->{CtxStack}}; |
1905
|
|
|
|
|
|
|
|
1906
|
|
|
|
|
|
|
$self->_call_queued_end_subs( end_document => $start_ctx, @_ ); |
1907
|
|
|
|
|
|
|
|
1908
|
|
|
|
|
|
|
if ( exists $self->{AutoStartedHandlers} ) { |
1909
|
|
|
|
|
|
|
for ( reverse @{$self->{AutoStartedHandlers}} ) { |
1910
|
|
|
|
|
|
|
$self->{LastHandlerResult} = $_->end_document( {} ); |
1911
|
|
|
|
|
|
|
} |
1912
|
|
|
|
|
|
|
} |
1913
|
|
|
|
|
|
|
|
1914
|
|
|
|
|
|
|
@{$self->{XStack}} = (); |
1915
|
|
|
|
|
|
|
|
1916
|
|
|
|
|
|
|
if ( show_buffer_highwater ) { |
1917
|
|
|
|
|
|
|
warn ref $self, |
1918
|
|
|
|
|
|
|
" buffer highwater mark was ", |
1919
|
|
|
|
|
|
|
$self->{BufferHighwater} + 1, |
1920
|
|
|
|
|
|
|
$self->{BufferHighwater} |
1921
|
|
|
|
|
|
|
? ( |
1922
|
|
|
|
|
|
|
" for event", |
1923
|
|
|
|
|
|
|
@{$self->{BufferHighwaterEvents}} > 1 |
1924
|
|
|
|
|
|
|
? "s" |
1925
|
|
|
|
|
|
|
: (), |
1926
|
|
|
|
|
|
|
":\n", |
1927
|
|
|
|
|
|
|
map { |
1928
|
|
|
|
|
|
|
my $n = $_->{Node}; |
1929
|
|
|
|
|
|
|
join( "", |
1930
|
|
|
|
|
|
|
" $_->{EventType}", |
1931
|
|
|
|
|
|
|
defined $n->{Name} |
1932
|
|
|
|
|
|
|
? ( " ", $n->{Name} ) |
1933
|
|
|
|
|
|
|
: (), |
1934
|
|
|
|
|
|
|
defined $n->{Data} |
1935
|
|
|
|
|
|
|
? ( " \"", |
1936
|
|
|
|
|
|
|
length $n->{Data} > 40 |
1937
|
|
|
|
|
|
|
? ( substr( $n->{Data}, 0, 40 ), "..." ) |
1938
|
|
|
|
|
|
|
: $n->{Data}, |
1939
|
|
|
|
|
|
|
"\"" |
1940
|
|
|
|
|
|
|
) |
1941
|
|
|
|
|
|
|
: (), |
1942
|
|
|
|
|
|
|
"\n" |
1943
|
|
|
|
|
|
|
); |
1944
|
|
|
|
|
|
|
} @{$self->{BufferHighwaterEvents}} |
1945
|
|
|
|
|
|
|
) |
1946
|
|
|
|
|
|
|
: ( " (no events were buffered)\n" ); |
1947
|
|
|
|
|
|
|
@{$self->{BufferHighwaterEvents}} = (); |
1948
|
|
|
|
|
|
|
} |
1949
|
|
|
|
|
|
|
|
1950
|
|
|
|
|
|
|
return $self->{LastHandlerResult}; |
1951
|
|
|
|
|
|
|
} |
1952
|
|
|
|
|
|
|
|
1953
|
|
|
|
|
|
|
|
1954
|
|
|
|
|
|
|
sub start_element { |
1955
|
|
|
|
|
|
|
my $self = shift ; |
1956
|
|
|
|
|
|
|
my ( $elt ) = @_ ; |
1957
|
|
|
|
|
|
|
|
1958
|
|
|
|
|
|
|
push @{$self->{CtxStack}}, local $XFD::ctx = $self->_build_ctx; |
1959
|
|
|
|
|
|
|
|
1960
|
|
|
|
|
|
|
{ |
1961
|
|
|
|
|
|
|
local $XFD::cur_self = $self; |
1962
|
|
|
|
|
|
|
|
1963
|
|
|
|
|
|
|
$XFD::ctx->{EventType} = "start_element"; |
1964
|
|
|
|
|
|
|
$XFD::ctx->{Node} = $_[0]; |
1965
|
|
|
|
|
|
|
$XFD::ctx->{HighScore} = -1; |
1966
|
|
|
|
|
|
|
|
1967
|
|
|
|
|
|
|
for ( @{$XFD::ctx->{start_element}} ) { |
1968
|
|
|
|
|
|
|
$_->[0]->( @{$_}[1..$#$_], @_ ); |
1969
|
|
|
|
|
|
|
} |
1970
|
|
|
|
|
|
|
|
1971
|
|
|
|
|
|
|
$self->{start_elementSub}->( $self, [] ) |
1972
|
|
|
|
|
|
|
if $self->{start_elementSub}; |
1973
|
|
|
|
|
|
|
|
1974
|
|
|
|
|
|
|
$self->_queue_pending_event( $XFD::ctx ); |
1975
|
|
|
|
|
|
|
} |
1976
|
|
|
|
|
|
|
|
1977
|
|
|
|
|
|
|
if ( |
1978
|
|
|
|
|
|
|
( |
1979
|
|
|
|
|
|
|
$self->{attributeSub} |
1980
|
|
|
|
|
|
|
|| exists $XFD::ctx->{ChildCtx}->{attribute} ## Any attr handlers? |
1981
|
|
|
|
|
|
|
) |
1982
|
|
|
|
|
|
|
&& exists $elt->{Attributes} ## Any attrs? |
1983
|
|
|
|
|
|
|
) { |
1984
|
|
|
|
|
|
|
$XFD::ctx->{ChildCtx} ||= {}; |
1985
|
|
|
|
|
|
|
|
1986
|
|
|
|
|
|
|
## Put attrs in a reproducible order. perl5.6.1 and perl5.8.0 |
1987
|
|
|
|
|
|
|
## use different hashing algs, this helps keep code stable |
1988
|
|
|
|
|
|
|
## across versions. |
1989
|
|
|
|
|
|
|
my @attrs = values %{$elt->{Attributes}}; |
1990
|
|
|
|
|
|
|
@attrs = sort { |
1991
|
|
|
|
|
|
|
( $a->{Name} || "" ) cmp ( $b->{Name} || "" ) |
1992
|
|
|
|
|
|
|
} @attrs if $self->{SortAttributes}; |
1993
|
|
|
|
|
|
|
|
1994
|
|
|
|
|
|
|
for my $attr ( @attrs ) { |
1995
|
|
|
|
|
|
|
local $XFD::ctx = $self->_build_ctx; |
1996
|
|
|
|
|
|
|
|
1997
|
|
|
|
|
|
|
$XFD::ctx->{EventType} = "attribute"; |
1998
|
|
|
|
|
|
|
$XFD::ctx->{Node} = $attr; |
1999
|
|
|
|
|
|
|
$XFD::ctx->{HighScore} = -1; |
2000
|
|
|
|
|
|
|
|
2001
|
|
|
|
|
|
|
for ( @{$XFD::ctx->{attribute}} ) { |
2002
|
|
|
|
|
|
|
$_->[0]->( @{$_}[1..$#$_], @_ ); |
2003
|
|
|
|
|
|
|
} |
2004
|
|
|
|
|
|
|
|
2005
|
|
|
|
|
|
|
$self->{attributeSub}->( $self, [] ) |
2006
|
|
|
|
|
|
|
if $self->{attributeSub}; |
2007
|
|
|
|
|
|
|
|
2008
|
|
|
|
|
|
|
$self->_queue_pending_event( $XFD::ctx ); |
2009
|
|
|
|
|
|
|
} |
2010
|
|
|
|
|
|
|
} |
2011
|
|
|
|
|
|
|
|
2012
|
|
|
|
|
|
|
return; |
2013
|
|
|
|
|
|
|
} |
2014
|
|
|
|
|
|
|
|
2015
|
|
|
|
|
|
|
|
2016
|
|
|
|
|
|
|
sub end_element { |
2017
|
|
|
|
|
|
|
my $self = shift ; |
2018
|
|
|
|
|
|
|
my ( $elt ) = @_ ; |
2019
|
|
|
|
|
|
|
|
2020
|
|
|
|
|
|
|
my $start_ctx = pop @{$self->{CtxStack}}; # Remove the child context |
2021
|
|
|
|
|
|
|
|
2022
|
|
|
|
|
|
|
$self->_call_queued_end_subs( end_element => $start_ctx, @_ ); |
2023
|
|
|
|
|
|
|
|
2024
|
|
|
|
|
|
|
return; |
2025
|
|
|
|
|
|
|
} |
2026
|
|
|
|
|
|
|
|
2027
|
|
|
|
|
|
|
|
2028
|
|
|
|
|
|
|
sub start_prefix_mapping { |
2029
|
|
|
|
|
|
|
my $self = shift ; |
2030
|
|
|
|
|
|
|
my ( $elt ) = @_ ; |
2031
|
|
|
|
|
|
|
|
2032
|
|
|
|
|
|
|
## Prefix mappings aren't containers, but they need to |
2033
|
|
|
|
|
|
|
## have contexts saved and restored in order like containers. |
2034
|
|
|
|
|
|
|
## So we have a stack within a stack to take care of them. |
2035
|
|
|
|
|
|
|
push @{$self->{CtxStack}->[-1]->{PrefixContexts}}, |
2036
|
|
|
|
|
|
|
local $XFD::ctx = $self->_build_ctx; |
2037
|
|
|
|
|
|
|
|
2038
|
|
|
|
|
|
|
{ |
2039
|
|
|
|
|
|
|
local $XFD::cur_self = $self; |
2040
|
|
|
|
|
|
|
|
2041
|
|
|
|
|
|
|
$XFD::ctx->{EventType} = "start_prefix_mapping"; |
2042
|
|
|
|
|
|
|
$XFD::ctx->{Node} = $_[0]; |
2043
|
|
|
|
|
|
|
$XFD::ctx->{HighScore} = -1; |
2044
|
|
|
|
|
|
|
|
2045
|
|
|
|
|
|
|
for ( @{$XFD::ctx->{start_prefix_mapping}} ) { |
2046
|
|
|
|
|
|
|
$_->[0]->( @{$_}[1..$#$_], @_ ); |
2047
|
|
|
|
|
|
|
} |
2048
|
|
|
|
|
|
|
|
2049
|
|
|
|
|
|
|
$self->{start_prefix_mappingSub}->( $self, [] ) |
2050
|
|
|
|
|
|
|
if $self->{start_prefix_mappingSub}; |
2051
|
|
|
|
|
|
|
|
2052
|
|
|
|
|
|
|
$self->_queue_pending_event( $XFD::ctx ); |
2053
|
|
|
|
|
|
|
} |
2054
|
|
|
|
|
|
|
|
2055
|
|
|
|
|
|
|
return; |
2056
|
|
|
|
|
|
|
} |
2057
|
|
|
|
|
|
|
|
2058
|
|
|
|
|
|
|
|
2059
|
|
|
|
|
|
|
sub end_prefix_mapping { |
2060
|
|
|
|
|
|
|
my $self = shift ; |
2061
|
|
|
|
|
|
|
my ( $elt ) = @_ ; |
2062
|
|
|
|
|
|
|
|
2063
|
|
|
|
|
|
|
my $start_ctx = pop @{$self->{CtxStack}->[-1]->{PrefixContexts}}; |
2064
|
|
|
|
|
|
|
|
2065
|
|
|
|
|
|
|
$self->_call_queued_end_subs( end_prefix_mapping => $start_ctx, @_ ); |
2066
|
|
|
|
|
|
|
|
2067
|
|
|
|
|
|
|
return; |
2068
|
|
|
|
|
|
|
} |
2069
|
|
|
|
|
|
|
|
2070
|
|
|
|
|
|
|
|
2071
|
|
|
|
|
|
|
compile_missing_methods __PACKAGE__, <<'CODE_END', sax_event_names; |
2072
|
|
|
|
|
|
|
#line 1 XML::Filter::Dispatcher::() |
2073
|
|
|
|
|
|
|
sub { |
2074
|
|
|
|
|
|
|
my $self = shift ; |
2075
|
|
|
|
|
|
|
return unless ( |
2076
|
|
|
|
|
|
|
@{$self->{CtxStack}} |
2077
|
|
|
|
|
|
|
&& $self->{CtxStack}->[-1]->{ChildCtx}->{} |
2078
|
|
|
|
|
|
|
) |
2079
|
|
|
|
|
|
|
|| $self->{Sub}; |
2080
|
|
|
|
|
|
|
|
2081
|
|
|
|
|
|
|
my ( $data ) = @_; |
2082
|
|
|
|
|
|
|
|
2083
|
|
|
|
|
|
|
local $XFD::cur_self = $self; |
2084
|
|
|
|
|
|
|
|
2085
|
|
|
|
|
|
|
local $XFD::ctx = $self->_build_ctx; |
2086
|
|
|
|
|
|
|
|
2087
|
|
|
|
|
|
|
$XFD::ctx->{EventType} = ""; |
2088
|
|
|
|
|
|
|
$XFD::ctx->{Node} = $data; |
2089
|
|
|
|
|
|
|
$XFD::ctx->{HighScore} = -1; |
2090
|
|
|
|
|
|
|
|
2091
|
|
|
|
|
|
|
for ( @{$XFD::ctx->{}} ) { |
2092
|
|
|
|
|
|
|
$_->[0]->( @{$_}[1..$#$_], @_ ); |
2093
|
|
|
|
|
|
|
} |
2094
|
|
|
|
|
|
|
|
2095
|
|
|
|
|
|
|
$self->{Sub}->( $self, [] ) |
2096
|
|
|
|
|
|
|
if $self->{Sub}; |
2097
|
|
|
|
|
|
|
|
2098
|
|
|
|
|
|
|
$self->_queue_pending_event( $XFD::ctx ); |
2099
|
|
|
|
|
|
|
|
2100
|
|
|
|
|
|
|
$self->_call_queued_end_subs( @_ ) if $XFD::ctx->{EndSubs}; |
2101
|
|
|
|
|
|
|
|
2102
|
|
|
|
|
|
|
return undef; |
2103
|
|
|
|
|
|
|
} |
2104
|
|
|
|
|
|
|
CODE_END |
2105
|
|
|
|
|
|
|
|
2106
|
|
|
|
|
|
|
=head2 Notes for XPath Afficianados |
2107
|
|
|
|
|
|
|
|
2108
|
|
|
|
|
|
|
This section assumes familiarity with XPath in order to explain some of |
2109
|
|
|
|
|
|
|
the particulars and side effects of the incremental XPath engine. |
2110
|
|
|
|
|
|
|
|
2111
|
|
|
|
|
|
|
=over |
2112
|
|
|
|
|
|
|
|
2113
|
|
|
|
|
|
|
=item * |
2114
|
|
|
|
|
|
|
|
2115
|
|
|
|
|
|
|
Much of XPath's power comes from the concept of a "node set". A node |
2116
|
|
|
|
|
|
|
set is a set of nodes returned by many XPath expressions. |
2117
|
|
|
|
|
|
|
Event XPath fires a rule once for each node the rule applies to. If there |
2118
|
|
|
|
|
|
|
is a location path in the expression, the rule will fire once for each |
2119
|
|
|
|
|
|
|
matching event (perhaps twice if both start and end SAX events are |
2120
|
|
|
|
|
|
|
trapped, see XXXX below. |
2121
|
|
|
|
|
|
|
|
2122
|
|
|
|
|
|
|
Expressions like C<0>, C, C<1>, and C<'a'> have no location |
2123
|
|
|
|
|
|
|
path and apply to all nodes (including namespace nodes and processing |
2124
|
|
|
|
|
|
|
instructions). |
2125
|
|
|
|
|
|
|
|
2126
|
|
|
|
|
|
|
=item * |
2127
|
|
|
|
|
|
|
|
2128
|
|
|
|
|
|
|
The XPath parser catches some simple mistakes Perlers might make in typing |
2129
|
|
|
|
|
|
|
XPath expressions, such as using C<&&> or C<==> instead of C or C<=>. |
2130
|
|
|
|
|
|
|
|
2131
|
|
|
|
|
|
|
=item * |
2132
|
|
|
|
|
|
|
|
2133
|
|
|
|
|
|
|
SAX does not define events for attributes; these are passed in to the |
2134
|
|
|
|
|
|
|
start_element (but not end_element) methods as part of the element node. |
2135
|
|
|
|
|
|
|
XML::Filter::Dispatcher emulates an event for each attribute in order to |
2136
|
|
|
|
|
|
|
allow selecting attribute nodes. |
2137
|
|
|
|
|
|
|
|
2138
|
|
|
|
|
|
|
=item * |
2139
|
|
|
|
|
|
|
|
2140
|
|
|
|
|
|
|
Axes in path steps (/foo::...) |
2141
|
|
|
|
|
|
|
|
2142
|
|
|
|
|
|
|
Only some axes can be reasonably supported within a SAX framework without |
2143
|
|
|
|
|
|
|
building a DOM and/or queueing SAX events for in-document-order delivery. |
2144
|
|
|
|
|
|
|
|
2145
|
|
|
|
|
|
|
On the other hand, lots of SAX-specific Axes are supported. |
2146
|
|
|
|
|
|
|
|
2147
|
|
|
|
|
|
|
=item * |
2148
|
|
|
|
|
|
|
|
2149
|
|
|
|
|
|
|
text node aggregation |
2150
|
|
|
|
|
|
|
|
2151
|
|
|
|
|
|
|
SAX does not guarantee that C events will be aggregated as |
2152
|
|
|
|
|
|
|
much as possible, as text() nodes do in XPath. Generally, however, |
2153
|
|
|
|
|
|
|
this is not a problem; instead of writing |
2154
|
|
|
|
|
|
|
|
2155
|
|
|
|
|
|
|
"quotation/text()" => sub { |
2156
|
|
|
|
|
|
|
## BUG: may be called several times within each quotation elt. |
2157
|
|
|
|
|
|
|
my $self = shift; |
2158
|
|
|
|
|
|
|
print "He said '", $self->current_node->{Data}, "'\n'"; |
2159
|
|
|
|
|
|
|
}, |
2160
|
|
|
|
|
|
|
|
2161
|
|
|
|
|
|
|
write |
2162
|
|
|
|
|
|
|
|
2163
|
|
|
|
|
|
|
"string( quotation )" => sub { |
2164
|
|
|
|
|
|
|
my $self = shift; |
2165
|
|
|
|
|
|
|
print "He said '", xvalue, "'\n'"; |
2166
|
|
|
|
|
|
|
}, |
2167
|
|
|
|
|
|
|
|
2168
|
|
|
|
|
|
|
The former is unsafe; consider the XML: |
2169
|
|
|
|
|
|
|
|
2170
|
|
|
|
|
|
|
I am GREAT! |
2171
|
|
|
|
|
|
|
|
2172
|
|
|
|
|
|
|
Rules like C<.../text()> will fire twice, which is not what is needed |
2173
|
|
|
|
|
|
|
here. |
2174
|
|
|
|
|
|
|
|
2175
|
|
|
|
|
|
|
Rules like C will fire once, at the end_element event, |
2176
|
|
|
|
|
|
|
with all descendant text of quotation as the expression result. |
2177
|
|
|
|
|
|
|
|
2178
|
|
|
|
|
|
|
You can also place an L |
2179
|
|
|
|
|
|
|
instance upstream of XML::Filter::Dispatcher if you really want to use |
2180
|
|
|
|
|
|
|
the former syntax (but the C example will still generate more |
2181
|
|
|
|
|
|
|
than one event due to the comment). |
2182
|
|
|
|
|
|
|
|
2183
|
|
|
|
|
|
|
=item * |
2184
|
|
|
|
|
|
|
|
2185
|
|
|
|
|
|
|
Axes |
2186
|
|
|
|
|
|
|
|
2187
|
|
|
|
|
|
|
All axes are implemented except for those noted below as "todo" or "not |
2188
|
|
|
|
|
|
|
soon". |
2189
|
|
|
|
|
|
|
|
2190
|
|
|
|
|
|
|
Also except where noted, axes have a principal event type of |
2191
|
|
|
|
|
|
|
C. This node type is used by the C<*> node type test. |
2192
|
|
|
|
|
|
|
|
2193
|
|
|
|
|
|
|
Note: XML::Filter::Dispatcher tries to die() on nonsensical paths like |
2194
|
|
|
|
|
|
|
C or C/start-cdata::*>, but it may miss some. |
2195
|
|
|
|
|
|
|
This is meant to help in debugging user code; the eventual goal is to |
2196
|
|
|
|
|
|
|
catch all such nonsense. |
2197
|
|
|
|
|
|
|
|
2198
|
|
|
|
|
|
|
=over |
2199
|
|
|
|
|
|
|
|
2200
|
|
|
|
|
|
|
=item * |
2201
|
|
|
|
|
|
|
|
2202
|
|
|
|
|
|
|
ancestor:: (XPath, todo, will be limited) |
2203
|
|
|
|
|
|
|
|
2204
|
|
|
|
|
|
|
=item * |
2205
|
|
|
|
|
|
|
|
2206
|
|
|
|
|
|
|
ancestor-or-self:: (XPath, todo, will be limited) |
2207
|
|
|
|
|
|
|
|
2208
|
|
|
|
|
|
|
=item * |
2209
|
|
|
|
|
|
|
|
2210
|
|
|
|
|
|
|
C (XPath, C) |
2211
|
|
|
|
|
|
|
|
2212
|
|
|
|
|
|
|
=item * |
2213
|
|
|
|
|
|
|
|
2214
|
|
|
|
|
|
|
C (XPath) |
2215
|
|
|
|
|
|
|
|
2216
|
|
|
|
|
|
|
Selects start_element, end_element, start_prefix_mapping, |
2217
|
|
|
|
|
|
|
end_prefix_mapping, characters, comment, and |
2218
|
|
|
|
|
|
|
processing_instruction events that are direct "children" of the context |
2219
|
|
|
|
|
|
|
element or document. |
2220
|
|
|
|
|
|
|
|
2221
|
|
|
|
|
|
|
=item * |
2222
|
|
|
|
|
|
|
|
2223
|
|
|
|
|
|
|
C (XPath) |
2224
|
|
|
|
|
|
|
|
2225
|
|
|
|
|
|
|
=item * |
2226
|
|
|
|
|
|
|
|
2227
|
|
|
|
|
|
|
C (XPath) |
2228
|
|
|
|
|
|
|
|
2229
|
|
|
|
|
|
|
=item * |
2230
|
|
|
|
|
|
|
|
2231
|
|
|
|
|
|
|
C (SAX, C) |
2232
|
|
|
|
|
|
|
|
2233
|
|
|
|
|
|
|
Like C, but selects the C event of the |
2234
|
|
|
|
|
|
|
element context node. |
2235
|
|
|
|
|
|
|
|
2236
|
|
|
|
|
|
|
This is usually used in preference to C due to its |
2237
|
|
|
|
|
|
|
brevity. |
2238
|
|
|
|
|
|
|
|
2239
|
|
|
|
|
|
|
Because this selects the end element event, most of the path tests that |
2240
|
|
|
|
|
|
|
may follow other axes are not valid following this axis. self:: and |
2241
|
|
|
|
|
|
|
attribute:: are the only legal axes that may occur to the right of this |
2242
|
|
|
|
|
|
|
axis. |
2243
|
|
|
|
|
|
|
|
2244
|
|
|
|
|
|
|
=item * |
2245
|
|
|
|
|
|
|
|
2246
|
|
|
|
|
|
|
C (SAX, C) |
2247
|
|
|
|
|
|
|
|
2248
|
|
|
|
|
|
|
Like C, but selects the C event of the document |
2249
|
|
|
|
|
|
|
context node. |
2250
|
|
|
|
|
|
|
|
2251
|
|
|
|
|
|
|
Note: Because this selects the end document event, most of the path tests |
2252
|
|
|
|
|
|
|
that may follow other axes are not valid following this axis. |
2253
|
|
|
|
|
|
|
self:: are the only legal axes that may occur to the |
2254
|
|
|
|
|
|
|
right of this axis. |
2255
|
|
|
|
|
|
|
|
2256
|
|
|
|
|
|
|
=item * |
2257
|
|
|
|
|
|
|
|
2258
|
|
|
|
|
|
|
C (SAX, C) |
2259
|
|
|
|
|
|
|
|
2260
|
|
|
|
|
|
|
B. This axis is not necessary given C. |
2261
|
|
|
|
|
|
|
|
2262
|
|
|
|
|
|
|
Like C, but selects the C event of the element |
2263
|
|
|
|
|
|
|
context node. This is like C, but different from |
2264
|
|
|
|
|
|
|
C. |
2265
|
|
|
|
|
|
|
|
2266
|
|
|
|
|
|
|
Note: Because this selects the end element event, most of the path tests |
2267
|
|
|
|
|
|
|
that may follow other axes are not valid following this axis. |
2268
|
|
|
|
|
|
|
attribute:: and self:: are the only legal axes that may occur to the |
2269
|
|
|
|
|
|
|
right of this axis. |
2270
|
|
|
|
|
|
|
|
2271
|
|
|
|
|
|
|
=item * |
2272
|
|
|
|
|
|
|
|
2273
|
|
|
|
|
|
|
C (XPath, B) |
2274
|
|
|
|
|
|
|
|
2275
|
|
|
|
|
|
|
=item * |
2276
|
|
|
|
|
|
|
|
2277
|
|
|
|
|
|
|
C (XPath, B) |
2278
|
|
|
|
|
|
|
|
2279
|
|
|
|
|
|
|
Implementing following axes will take some fucky postponement logic and |
2280
|
|
|
|
|
|
|
are likely to wait until I have time. Until then, setting a flag in |
2281
|
|
|
|
|
|
|
$self in one handler and checking in another should suffice for most |
2282
|
|
|
|
|
|
|
uses. |
2283
|
|
|
|
|
|
|
|
2284
|
|
|
|
|
|
|
=item * |
2285
|
|
|
|
|
|
|
|
2286
|
|
|
|
|
|
|
C (XPath, C, B) |
2287
|
|
|
|
|
|
|
|
2288
|
|
|
|
|
|
|
=item * |
2289
|
|
|
|
|
|
|
|
2290
|
|
|
|
|
|
|
C (XPath, B) |
2291
|
|
|
|
|
|
|
|
2292
|
|
|
|
|
|
|
parent/ancestor paths will not allow you to descend the tree, that would |
2293
|
|
|
|
|
|
|
require DOM building and SAX event queueing. |
2294
|
|
|
|
|
|
|
|
2295
|
|
|
|
|
|
|
=item * |
2296
|
|
|
|
|
|
|
|
2297
|
|
|
|
|
|
|
C (XPath, B) |
2298
|
|
|
|
|
|
|
|
2299
|
|
|
|
|
|
|
=item * |
2300
|
|
|
|
|
|
|
|
2301
|
|
|
|
|
|
|
C (XPath, B) |
2302
|
|
|
|
|
|
|
|
2303
|
|
|
|
|
|
|
Implementing reverse axes will take some fucky postponement logic and |
2304
|
|
|
|
|
|
|
are likely to wait until I have time. Until then, setting a flag in |
2305
|
|
|
|
|
|
|
$self in one handler and checking in another should suffice for most |
2306
|
|
|
|
|
|
|
uses. |
2307
|
|
|
|
|
|
|
|
2308
|
|
|
|
|
|
|
=item * |
2309
|
|
|
|
|
|
|
|
2310
|
|
|
|
|
|
|
C (XPath) |
2311
|
|
|
|
|
|
|
|
2312
|
|
|
|
|
|
|
=item * |
2313
|
|
|
|
|
|
|
|
2314
|
|
|
|
|
|
|
C (SAX, C ) |
2315
|
|
|
|
|
|
|
|
2316
|
|
|
|
|
|
|
This is like child::, but selects the C events. This is |
2317
|
|
|
|
|
|
|
usually used in preference to C due to its brevity. |
2318
|
|
|
|
|
|
|
|
2319
|
|
|
|
|
|
|
C is rarely used to drive code handlers because rules that |
2320
|
|
|
|
|
|
|
match document or element events already only fire code handlers on the |
2321
|
|
|
|
|
|
|
C event and not the C event (however, when a |
2322
|
|
|
|
|
|
|
SAX handler is used, such expressions send both start and end events to |
2323
|
|
|
|
|
|
|
the downstream handler, so start:: has utility there). |
2324
|
|
|
|
|
|
|
|
2325
|
|
|
|
|
|
|
=item * |
2326
|
|
|
|
|
|
|
|
2327
|
|
|
|
|
|
|
C (SAX, C) |
2328
|
|
|
|
|
|
|
|
2329
|
|
|
|
|
|
|
B. This axis is confusing compared to and |
2330
|
|
|
|
|
|
|
C, and is not necessary given C. |
2331
|
|
|
|
|
|
|
|
2332
|
|
|
|
|
|
|
This is like C, but selects only the C events. |
2333
|
|
|
|
|
|
|
|
2334
|
|
|
|
|
|
|
=item * |
2335
|
|
|
|
|
|
|
|
2336
|
|
|
|
|
|
|
C (SAX, C) |
2337
|
|
|
|
|
|
|
|
2338
|
|
|
|
|
|
|
B. This axis is not necessary given C. |
2339
|
|
|
|
|
|
|
|
2340
|
|
|
|
|
|
|
This is like C, but selects only the C events. |
2341
|
|
|
|
|
|
|
|
2342
|
|
|
|
|
|
|
=back |
2343
|
|
|
|
|
|
|
|
2344
|
|
|
|
|
|
|
=item * |
2345
|
|
|
|
|
|
|
|
2346
|
|
|
|
|
|
|
Implemented XPath Features |
2347
|
|
|
|
|
|
|
|
2348
|
|
|
|
|
|
|
Anything not on this list or listed as unimplemented is a TODO. Ring me |
2349
|
|
|
|
|
|
|
up if you need it. |
2350
|
|
|
|
|
|
|
|
2351
|
|
|
|
|
|
|
=over |
2352
|
|
|
|
|
|
|
|
2353
|
|
|
|
|
|
|
=item * |
2354
|
|
|
|
|
|
|
|
2355
|
|
|
|
|
|
|
String Functions |
2356
|
|
|
|
|
|
|
|
2357
|
|
|
|
|
|
|
=over |
2358
|
|
|
|
|
|
|
|
2359
|
|
|
|
|
|
|
=item * |
2360
|
|
|
|
|
|
|
|
2361
|
|
|
|
|
|
|
concat( string, string, string* ) |
2362
|
|
|
|
|
|
|
|
2363
|
|
|
|
|
|
|
=item * |
2364
|
|
|
|
|
|
|
|
2365
|
|
|
|
|
|
|
contains( string, string ) |
2366
|
|
|
|
|
|
|
|
2367
|
|
|
|
|
|
|
=item * |
2368
|
|
|
|
|
|
|
|
2369
|
|
|
|
|
|
|
normalize-space( string? ) |
2370
|
|
|
|
|
|
|
|
2371
|
|
|
|
|
|
|
C is equivalent to C. |
2372
|
|
|
|
|
|
|
|
2373
|
|
|
|
|
|
|
=item * |
2374
|
|
|
|
|
|
|
|
2375
|
|
|
|
|
|
|
starts-with( string, string ) |
2376
|
|
|
|
|
|
|
|
2377
|
|
|
|
|
|
|
=item * |
2378
|
|
|
|
|
|
|
|
2379
|
|
|
|
|
|
|
string(), string( object ) |
2380
|
|
|
|
|
|
|
|
2381
|
|
|
|
|
|
|
Object may be a number, boolean, string, or the result of a location path: |
2382
|
|
|
|
|
|
|
|
2383
|
|
|
|
|
|
|
string( 10 ); |
2384
|
|
|
|
|
|
|
string( /a/b/c ); |
2385
|
|
|
|
|
|
|
string( @id ); |
2386
|
|
|
|
|
|
|
|
2387
|
|
|
|
|
|
|
C is equivalent to C. |
2388
|
|
|
|
|
|
|
|
2389
|
|
|
|
|
|
|
=item * |
2390
|
|
|
|
|
|
|
|
2391
|
|
|
|
|
|
|
string-length( string? ) |
2392
|
|
|
|
|
|
|
|
2393
|
|
|
|
|
|
|
string-length() not supported; can't stringify the context node without |
2394
|
|
|
|
|
|
|
keeping all of the context node's children in mempory. Could enable it |
2395
|
|
|
|
|
|
|
for leaf nodes, I suppose, like attrs and #PCDATA containing elts. Drop |
2396
|
|
|
|
|
|
|
me a line if you need this (it's not totally trivial or I'd have done it). |
2397
|
|
|
|
|
|
|
|
2398
|
|
|
|
|
|
|
=item * |
2399
|
|
|
|
|
|
|
|
2400
|
|
|
|
|
|
|
substring( string, number, number? ) |
2401
|
|
|
|
|
|
|
|
2402
|
|
|
|
|
|
|
=item * |
2403
|
|
|
|
|
|
|
|
2404
|
|
|
|
|
|
|
substring-after( string, string ) |
2405
|
|
|
|
|
|
|
|
2406
|
|
|
|
|
|
|
=item * |
2407
|
|
|
|
|
|
|
|
2408
|
|
|
|
|
|
|
substring-before( string, string ) |
2409
|
|
|
|
|
|
|
|
2410
|
|
|
|
|
|
|
=item * |
2411
|
|
|
|
|
|
|
|
2412
|
|
|
|
|
|
|
translate( string, string, string ) |
2413
|
|
|
|
|
|
|
|
2414
|
|
|
|
|
|
|
=back |
2415
|
|
|
|
|
|
|
|
2416
|
|
|
|
|
|
|
=item * |
2417
|
|
|
|
|
|
|
|
2418
|
|
|
|
|
|
|
Boolean Functions, Operators |
2419
|
|
|
|
|
|
|
|
2420
|
|
|
|
|
|
|
=over |
2421
|
|
|
|
|
|
|
|
2422
|
|
|
|
|
|
|
=item * |
2423
|
|
|
|
|
|
|
|
2424
|
|
|
|
|
|
|
boolean( object ) |
2425
|
|
|
|
|
|
|
|
2426
|
|
|
|
|
|
|
See notes about node sets for the string() function above. |
2427
|
|
|
|
|
|
|
|
2428
|
|
|
|
|
|
|
=item * |
2429
|
|
|
|
|
|
|
|
2430
|
|
|
|
|
|
|
false() |
2431
|
|
|
|
|
|
|
|
2432
|
|
|
|
|
|
|
=item * |
2433
|
|
|
|
|
|
|
|
2434
|
|
|
|
|
|
|
lang( string ) B. |
2435
|
|
|
|
|
|
|
|
2436
|
|
|
|
|
|
|
=item * |
2437
|
|
|
|
|
|
|
|
2438
|
|
|
|
|
|
|
not( boolean ) |
2439
|
|
|
|
|
|
|
|
2440
|
|
|
|
|
|
|
=item * |
2441
|
|
|
|
|
|
|
|
2442
|
|
|
|
|
|
|
true() |
2443
|
|
|
|
|
|
|
|
2444
|
|
|
|
|
|
|
=back |
2445
|
|
|
|
|
|
|
|
2446
|
|
|
|
|
|
|
=item * |
2447
|
|
|
|
|
|
|
|
2448
|
|
|
|
|
|
|
Number Functions, Operators |
2449
|
|
|
|
|
|
|
|
2450
|
|
|
|
|
|
|
=over |
2451
|
|
|
|
|
|
|
|
2452
|
|
|
|
|
|
|
=item * |
2453
|
|
|
|
|
|
|
|
2454
|
|
|
|
|
|
|
ceil( number ) |
2455
|
|
|
|
|
|
|
|
2456
|
|
|
|
|
|
|
=item * |
2457
|
|
|
|
|
|
|
|
2458
|
|
|
|
|
|
|
floor( number ) |
2459
|
|
|
|
|
|
|
|
2460
|
|
|
|
|
|
|
=item * |
2461
|
|
|
|
|
|
|
|
2462
|
|
|
|
|
|
|
number( object? ) |
2463
|
|
|
|
|
|
|
|
2464
|
|
|
|
|
|
|
Converts strings, numbers, booleans, or the result of a location path |
2465
|
|
|
|
|
|
|
(C). |
2466
|
|
|
|
|
|
|
|
2467
|
|
|
|
|
|
|
Unlike real XPath, this dies if the object cannot be cleanly converted |
2468
|
|
|
|
|
|
|
in to a number. This is due to Perl's varying level of support for NaN, |
2469
|
|
|
|
|
|
|
and may change in the future. |
2470
|
|
|
|
|
|
|
|
2471
|
|
|
|
|
|
|
C is equivalent to C. |
2472
|
|
|
|
|
|
|
|
2473
|
|
|
|
|
|
|
=item * |
2474
|
|
|
|
|
|
|
|
2475
|
|
|
|
|
|
|
round ( number ) |
2476
|
|
|
|
|
|
|
|
2477
|
|
|
|
|
|
|
=item * sum( node-set ) B. |
2478
|
|
|
|
|
|
|
|
2479
|
|
|
|
|
|
|
=back |
2480
|
|
|
|
|
|
|
|
2481
|
|
|
|
|
|
|
=item * |
2482
|
|
|
|
|
|
|
|
2483
|
|
|
|
|
|
|
Node Set Functions |
2484
|
|
|
|
|
|
|
|
2485
|
|
|
|
|
|
|
Many of these cannot be fully implemented in an event oriented |
2486
|
|
|
|
|
|
|
environment. |
2487
|
|
|
|
|
|
|
|
2488
|
|
|
|
|
|
|
=over |
2489
|
|
|
|
|
|
|
|
2490
|
|
|
|
|
|
|
=item * |
2491
|
|
|
|
|
|
|
|
2492
|
|
|
|
|
|
|
last() B. |
2493
|
|
|
|
|
|
|
|
2494
|
|
|
|
|
|
|
=item * |
2495
|
|
|
|
|
|
|
|
2496
|
|
|
|
|
|
|
position() B. |
2497
|
|
|
|
|
|
|
|
2498
|
|
|
|
|
|
|
=item * |
2499
|
|
|
|
|
|
|
|
2500
|
|
|
|
|
|
|
count( node-set ) B. |
2501
|
|
|
|
|
|
|
|
2502
|
|
|
|
|
|
|
=back |
2503
|
|
|
|
|
|
|
|
2504
|
|
|
|
|
|
|
=item * |
2505
|
|
|
|
|
|
|
|
2506
|
|
|
|
|
|
|
id( object ) B. |
2507
|
|
|
|
|
|
|
|
2508
|
|
|
|
|
|
|
=item * |
2509
|
|
|
|
|
|
|
|
2510
|
|
|
|
|
|
|
local-name( node-set? ) |
2511
|
|
|
|
|
|
|
|
2512
|
|
|
|
|
|
|
=item * |
2513
|
|
|
|
|
|
|
|
2514
|
|
|
|
|
|
|
namespace-uri( node-set? ) |
2515
|
|
|
|
|
|
|
|
2516
|
|
|
|
|
|
|
=item * |
2517
|
|
|
|
|
|
|
|
2518
|
|
|
|
|
|
|
name( node-set? ) |
2519
|
|
|
|
|
|
|
|
2520
|
|
|
|
|
|
|
=item * |
2521
|
|
|
|
|
|
|
|
2522
|
|
|
|
|
|
|
All relational operators |
2523
|
|
|
|
|
|
|
|
2524
|
|
|
|
|
|
|
No support for nodesets, though. |
2525
|
|
|
|
|
|
|
|
2526
|
|
|
|
|
|
|
=item * |
2527
|
|
|
|
|
|
|
|
2528
|
|
|
|
|
|
|
All logical operators |
2529
|
|
|
|
|
|
|
|
2530
|
|
|
|
|
|
|
Supports limited nodesets, see the string() function description for details. |
2531
|
|
|
|
|
|
|
|
2532
|
|
|
|
|
|
|
=back |
2533
|
|
|
|
|
|
|
|
2534
|
|
|
|
|
|
|
=item * |
2535
|
|
|
|
|
|
|
|
2536
|
|
|
|
|
|
|
Missing Features |
2537
|
|
|
|
|
|
|
|
2538
|
|
|
|
|
|
|
Some features are entirely or just currently missing due to the lack of |
2539
|
|
|
|
|
|
|
nodesets or the time needed to work around their lack. This is an |
2540
|
|
|
|
|
|
|
incomplete list; it's growing as I find new things not to implement. |
2541
|
|
|
|
|
|
|
|
2542
|
|
|
|
|
|
|
=over |
2543
|
|
|
|
|
|
|
|
2544
|
|
|
|
|
|
|
=item * |
2545
|
|
|
|
|
|
|
|
2546
|
|
|
|
|
|
|
count() |
2547
|
|
|
|
|
|
|
|
2548
|
|
|
|
|
|
|
No nodesets => no count() of nodes in a node set. |
2549
|
|
|
|
|
|
|
|
2550
|
|
|
|
|
|
|
=item * |
2551
|
|
|
|
|
|
|
|
2552
|
|
|
|
|
|
|
last() |
2553
|
|
|
|
|
|
|
|
2554
|
|
|
|
|
|
|
With SAX, you can't tell when you are at the end of what would be a node set |
2555
|
|
|
|
|
|
|
in XPath. |
2556
|
|
|
|
|
|
|
|
2557
|
|
|
|
|
|
|
=item * |
2558
|
|
|
|
|
|
|
|
2559
|
|
|
|
|
|
|
position() |
2560
|
|
|
|
|
|
|
|
2561
|
|
|
|
|
|
|
I will implement pieces of this as I can. None are implemented as yet. |
2562
|
|
|
|
|
|
|
|
2563
|
|
|
|
|
|
|
=back |
2564
|
|
|
|
|
|
|
|
2565
|
|
|
|
|
|
|
=item * |
2566
|
|
|
|
|
|
|
|
2567
|
|
|
|
|
|
|
Todo features |
2568
|
|
|
|
|
|
|
|
2569
|
|
|
|
|
|
|
=over |
2570
|
|
|
|
|
|
|
|
2571
|
|
|
|
|
|
|
=item * |
2572
|
|
|
|
|
|
|
|
2573
|
|
|
|
|
|
|
id() |
2574
|
|
|
|
|
|
|
|
2575
|
|
|
|
|
|
|
=item * |
2576
|
|
|
|
|
|
|
|
2577
|
|
|
|
|
|
|
lang() |
2578
|
|
|
|
|
|
|
|
2579
|
|
|
|
|
|
|
=item * |
2580
|
|
|
|
|
|
|
|
2581
|
|
|
|
|
|
|
sum( node-set ) |
2582
|
|
|
|
|
|
|
|
2583
|
|
|
|
|
|
|
=back |
2584
|
|
|
|
|
|
|
|
2585
|
|
|
|
|
|
|
=item * |
2586
|
|
|
|
|
|
|
|
2587
|
|
|
|
|
|
|
Extensions |
2588
|
|
|
|
|
|
|
|
2589
|
|
|
|
|
|
|
=over |
2590
|
|
|
|
|
|
|
|
2591
|
|
|
|
|
|
|
=item * |
2592
|
|
|
|
|
|
|
|
2593
|
|
|
|
|
|
|
is-start-event(), is-end-event() |
2594
|
|
|
|
|
|
|
|
2595
|
|
|
|
|
|
|
XPath has no concept of time; it's meant to operate on a tree of nodes. SAX |
2596
|
|
|
|
|
|
|
has C and C events and C and |
2597
|
|
|
|
|
|
|
C events. |
2598
|
|
|
|
|
|
|
|
2599
|
|
|
|
|
|
|
By default, XML::Filter::Dispatcher acts on start events and not end events |
2600
|
|
|
|
|
|
|
(note that all rules are evaluated on both, but the actions are not run on end_ |
2601
|
|
|
|
|
|
|
events by default). |
2602
|
|
|
|
|
|
|
|
2603
|
|
|
|
|
|
|
By including a call to the C or C functions in a |
2604
|
|
|
|
|
|
|
predicate the rule may be forced to fire only on end events or on both start |
2605
|
|
|
|
|
|
|
and end events (using a C<[is-start-event() or is-end-event()]> idiom). |
2606
|
|
|
|
|
|
|
|
2607
|
|
|
|
|
|
|
=back |
2608
|
|
|
|
|
|
|
|
2609
|
|
|
|
|
|
|
=back |
2610
|
|
|
|
|
|
|
|
2611
|
|
|
|
|
|
|
=head1 TODO |
2612
|
|
|
|
|
|
|
|
2613
|
|
|
|
|
|
|
=over |
2614
|
|
|
|
|
|
|
|
2615
|
|
|
|
|
|
|
=item * |
2616
|
|
|
|
|
|
|
|
2617
|
|
|
|
|
|
|
Namespace support. |
2618
|
|
|
|
|
|
|
|
2619
|
|
|
|
|
|
|
=item * |
2620
|
|
|
|
|
|
|
|
2621
|
|
|
|
|
|
|
Text node aggregation so C handlers fire once per text node |
2622
|
|
|
|
|
|
|
instead of once per C event. |
2623
|
|
|
|
|
|
|
|
2624
|
|
|
|
|
|
|
=item * |
2625
|
|
|
|
|
|
|
|
2626
|
|
|
|
|
|
|
Nice messages on legitimate but unsupported axes. |
2627
|
|
|
|
|
|
|
|
2628
|
|
|
|
|
|
|
=item * |
2629
|
|
|
|
|
|
|
|
2630
|
|
|
|
|
|
|
/../ (parent node) |
2631
|
|
|
|
|
|
|
|
2632
|
|
|
|
|
|
|
=item * |
2633
|
|
|
|
|
|
|
|
2634
|
|
|
|
|
|
|
C, C, C methods. |
2635
|
|
|
|
|
|
|
|
2636
|
|
|
|
|
|
|
=back |
2637
|
|
|
|
|
|
|
|
2638
|
|
|
|
|
|
|
=head1 OPTIMIZING |
2639
|
|
|
|
|
|
|
|
2640
|
|
|
|
|
|
|
Pass Assume_xvalue => 0 flag to tell X::F::D not to support xvalue |
2641
|
|
|
|
|
|
|
and xvalue_type, which lets it skip some instructions and run faster. |
2642
|
|
|
|
|
|
|
|
2643
|
|
|
|
|
|
|
Pass SortAttributes => 0 flag to prevent calling sort() for each |
2644
|
|
|
|
|
|
|
element's attributes (note that Perl changes hashing algorithms |
2645
|
|
|
|
|
|
|
occasionally, so setting this to 0 may expose ordering dependancies |
2646
|
|
|
|
|
|
|
in your code). |
2647
|
|
|
|
|
|
|
|
2648
|
|
|
|
|
|
|
=head1 DEBUGGING |
2649
|
|
|
|
|
|
|
|
2650
|
|
|
|
|
|
|
NOTE: this section describes things that may change from version to |
2651
|
|
|
|
|
|
|
version as I need different views in to the internals. |
2652
|
|
|
|
|
|
|
|
2653
|
|
|
|
|
|
|
Set the option Debug => 1 to see the Perl code for the compiled ruleset. |
2654
|
|
|
|
|
|
|
If you have GraphViz.pm and ee installed and working, set Debug => 2 to |
2655
|
|
|
|
|
|
|
see a graph diagram of the intermediate tree generated by the compiler. |
2656
|
|
|
|
|
|
|
|
2657
|
|
|
|
|
|
|
Set the env. var XFDSHOWBUFFERHIGHWATER=1 to see what events were |
2658
|
|
|
|
|
|
|
postponed the most (in terms of how many events had to pile up behind |
2659
|
|
|
|
|
|
|
them). This can be of some help if you experience lots of buffering or |
2660
|
|
|
|
|
|
|
high latency through the filter. Latency meaning the lag between when |
2661
|
|
|
|
|
|
|
an event arrives at this filter and when it is dispatched to its |
2662
|
|
|
|
|
|
|
actions. This will only report events that were actually postponed. If |
2663
|
|
|
|
|
|
|
you have a 0 latency filter, the report will list no events. |
2664
|
|
|
|
|
|
|
|
2665
|
|
|
|
|
|
|
Set the env. var XFDOPTIMIZE=0 to prevent all sorts of optimizations. |
2666
|
|
|
|
|
|
|
|
2667
|
|
|
|
|
|
|
=head1 LIMITATIONS |
2668
|
|
|
|
|
|
|
|
2669
|
|
|
|
|
|
|
=over |
2670
|
|
|
|
|
|
|
|
2671
|
|
|
|
|
|
|
=item * |
2672
|
|
|
|
|
|
|
|
2673
|
|
|
|
|
|
|
NaN is not handled properly due to mediocre support in C, |
2674
|
|
|
|
|
|
|
especially across some platforms that it apparently isn't easily supported on. |
2675
|
|
|
|
|
|
|
|
2676
|
|
|
|
|
|
|
=item * |
2677
|
|
|
|
|
|
|
|
2678
|
|
|
|
|
|
|
-0 (negative zero) is not provided or handled properly |
2679
|
|
|
|
|
|
|
|
2680
|
|
|
|
|
|
|
=item * |
2681
|
|
|
|
|
|
|
|
2682
|
|
|
|
|
|
|
+/- Infinity is not handled properly due to mediocre support in C, |
2683
|
|
|
|
|
|
|
especially across some platforms that it apparently isn't easily supported on. |
2684
|
|
|
|
|
|
|
|
2685
|
|
|
|
|
|
|
=back |
2686
|
|
|
|
|
|
|
|
2687
|
|
|
|
|
|
|
This is more of a frustration than a limitation, but this class requires that |
2688
|
|
|
|
|
|
|
you pass in a type when setting variables (in the C ctor parameter or |
2689
|
|
|
|
|
|
|
when calling C). This is so that the engine can tell what type a |
2690
|
|
|
|
|
|
|
variable is, since string(), number() and boolean() all treat the Perlian C<0> |
2691
|
|
|
|
|
|
|
differently depending on its type. In Perl the digit C<0> means C, |
2692
|
|
|
|
|
|
|
C<0> or C<'0'>, depending on context, but it's a consistent semantic. When |
2693
|
|
|
|
|
|
|
passing a C<0> from Perl lands to XPath-land, we need to give it a type so that |
2694
|
|
|
|
|
|
|
C can, for instance, decide whether to convert it to C<'0'> or |
2695
|
|
|
|
|
|
|
C<'false'>. |
2696
|
|
|
|
|
|
|
|
2697
|
|
|
|
|
|
|
=head1 THANKS |
2698
|
|
|
|
|
|
|
|
2699
|
|
|
|
|
|
|
...to Kip Hampton, Robin Berjon and Matt Sergeant for sanity checks and |
2700
|
|
|
|
|
|
|
to James Clark (of Expat fame) for posting a Yacc XPath grammar where |
2701
|
|
|
|
|
|
|
I could snarf it years later and add lots of Perl code to it. |
2702
|
|
|
|
|
|
|
|
2703
|
|
|
|
|
|
|
=head1 AUTHOR |
2704
|
|
|
|
|
|
|
|
2705
|
|
|
|
|
|
|
Barrie Slaymaker |
2706
|
|
|
|
|
|
|
|
2707
|
|
|
|
|
|
|
=head1 COPYRIGHT |
2708
|
|
|
|
|
|
|
|
2709
|
|
|
|
|
|
|
Copyright 2002, Barrie Slaymaker, All Rights Reserved. |
2710
|
|
|
|
|
|
|
|
2711
|
|
|
|
|
|
|
You may use this module under the terms of the Artistic or GNU Pulic |
2712
|
|
|
|
|
|
|
licenses your choice. Also, a portion of XML::Filter::Dispatcher::Parser |
2713
|
|
|
|
|
|
|
is covered by: |
2714
|
|
|
|
|
|
|
|
2715
|
|
|
|
|
|
|
The Parse::Yapp module and its related modules and shell scripts are |
2716
|
|
|
|
|
|
|
copyright (c) 1998-1999 Francois Desarmenien, France. All rights |
2717
|
|
|
|
|
|
|
reserved. |
2718
|
|
|
|
|
|
|
|
2719
|
|
|
|
|
|
|
You may use and distribute them under the terms of either the GNU |
2720
|
|
|
|
|
|
|
General Public License or the Artistic License, as specified in the |
2721
|
|
|
|
|
|
|
Perl README file. |
2722
|
|
|
|
|
|
|
|
2723
|
|
|
|
|
|
|
Note: Parse::Yapp is only needed if you want to modify |
2724
|
|
|
|
|
|
|
lib/XML/Filter/Dispatcher/Grammar.pm |
2725
|
|
|
|
|
|
|
|
2726
|
|
|
|
|
|
|
=cut |
2727
|
|
|
|
|
|
|
|
2728
|
|
|
|
|
|
|
1 ; |