line |
stmt |
bran |
cond |
sub |
pod |
time |
code |
1
|
|
|
|
|
|
|
package Text::RewriteRules; |
2
|
|
|
|
|
|
|
|
3
|
7
|
|
|
7
|
|
208571
|
use Data::Dumper; |
|
7
|
|
|
|
|
82725
|
|
|
7
|
|
|
|
|
566
|
|
4
|
7
|
|
|
7
|
|
7374
|
use Filter::Simple; |
|
7
|
|
|
|
|
257282
|
|
|
7
|
|
|
|
|
58
|
|
5
|
|
|
|
|
|
|
|
6
|
7
|
|
|
7
|
|
567
|
use warnings; |
|
7
|
|
|
|
|
19
|
|
|
7
|
|
|
|
|
178
|
|
7
|
7
|
|
|
7
|
|
39
|
use strict; |
|
7
|
|
|
|
|
14
|
|
|
7
|
|
|
|
|
231
|
|
8
|
|
|
|
|
|
|
|
9
|
7
|
|
|
7
|
|
316
|
use 5.010000; # 5.10.0 |
|
7
|
|
|
|
|
28
|
|
|
7
|
|
|
|
|
47561
|
|
10
|
|
|
|
|
|
|
|
11
|
|
|
|
|
|
|
=encoding UTF-8 |
12
|
|
|
|
|
|
|
|
13
|
|
|
|
|
|
|
=head1 NAME |
14
|
|
|
|
|
|
|
|
15
|
|
|
|
|
|
|
Text::RewriteRules - A system to rewrite text using regexp-based rules |
16
|
|
|
|
|
|
|
|
17
|
|
|
|
|
|
|
=cut |
18
|
|
|
|
|
|
|
|
19
|
|
|
|
|
|
|
our $VERSION = '0.25'; |
20
|
|
|
|
|
|
|
|
21
|
|
|
|
|
|
|
=head1 SYNOPSIS |
22
|
|
|
|
|
|
|
|
23
|
|
|
|
|
|
|
use Text::RewriteRules; |
24
|
|
|
|
|
|
|
|
25
|
|
|
|
|
|
|
RULES email |
26
|
|
|
|
|
|
|
\.==> DOT |
27
|
|
|
|
|
|
|
@==> AT |
28
|
|
|
|
|
|
|
ENDRULES |
29
|
|
|
|
|
|
|
|
30
|
|
|
|
|
|
|
print email("ambs@cpan.org") # prints ambs AT cpan DOT org |
31
|
|
|
|
|
|
|
|
32
|
|
|
|
|
|
|
RULES/m inc |
33
|
|
|
|
|
|
|
(\d+)=e=> $1+1 |
34
|
|
|
|
|
|
|
ENDRULES |
35
|
|
|
|
|
|
|
|
36
|
|
|
|
|
|
|
print inc("I saw 11 cats and 23 dogs") # prints I saw 12 cats and 24 dogs |
37
|
|
|
|
|
|
|
|
38
|
|
|
|
|
|
|
=head1 ABSTRACT |
39
|
|
|
|
|
|
|
|
40
|
|
|
|
|
|
|
This module uses a simplified syntax for regexp-based rules for |
41
|
|
|
|
|
|
|
rewriting text. You define a set of rules, and the system applies them |
42
|
|
|
|
|
|
|
until no more rule can be applied. |
43
|
|
|
|
|
|
|
|
44
|
|
|
|
|
|
|
Two variants are provided: |
45
|
|
|
|
|
|
|
|
46
|
|
|
|
|
|
|
=over 4 |
47
|
|
|
|
|
|
|
|
48
|
|
|
|
|
|
|
=item 1 |
49
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
|
traditional rewrite (RULES function): |
51
|
|
|
|
|
|
|
|
52
|
|
|
|
|
|
|
while it is possible do substitute |
53
|
|
|
|
|
|
|
| apply first substitution rule |
54
|
|
|
|
|
|
|
|
55
|
|
|
|
|
|
|
=item 2 |
56
|
|
|
|
|
|
|
|
57
|
|
|
|
|
|
|
cursor based rewrite (RULES/m function): |
58
|
|
|
|
|
|
|
|
59
|
|
|
|
|
|
|
add a cursor to the beginning of the string |
60
|
|
|
|
|
|
|
while not reach end of string |
61
|
|
|
|
|
|
|
| apply substitute just after cursor and advance cursor |
62
|
|
|
|
|
|
|
| or advance cursor if no rule can be applied |
63
|
|
|
|
|
|
|
|
64
|
|
|
|
|
|
|
=back |
65
|
|
|
|
|
|
|
|
66
|
|
|
|
|
|
|
=head1 DESCRIPTION |
67
|
|
|
|
|
|
|
|
68
|
|
|
|
|
|
|
A lot of computer science problems can be solved using rewriting |
69
|
|
|
|
|
|
|
rules. |
70
|
|
|
|
|
|
|
|
71
|
|
|
|
|
|
|
Rewriting rules consist of mainly two parts: a regexp (LHS: Left Hand |
72
|
|
|
|
|
|
|
Side) that is matched with the text, and the string to use to |
73
|
|
|
|
|
|
|
substitute the content matched with the regexp (RHS: Right Hand Side). |
74
|
|
|
|
|
|
|
|
75
|
|
|
|
|
|
|
Now, why don't use a simple substitute? Because we want to define a |
76
|
|
|
|
|
|
|
set of rules and match them again and again, until no more regexp of |
77
|
|
|
|
|
|
|
the LHS matches. |
78
|
|
|
|
|
|
|
|
79
|
|
|
|
|
|
|
A point of discussion is the syntax to define this system. A brief |
80
|
|
|
|
|
|
|
discussion shown that some users would prefer a function to receive an |
81
|
|
|
|
|
|
|
hash with the rules, some other, prefer some syntax sugar. |
82
|
|
|
|
|
|
|
|
83
|
|
|
|
|
|
|
The approach used is the last: we use C such that we |
84
|
|
|
|
|
|
|
can add a specific non-perl syntax inside the Perl script. This |
85
|
|
|
|
|
|
|
improves legibility of big rewriting rules systems. |
86
|
|
|
|
|
|
|
|
87
|
|
|
|
|
|
|
This documentation is divided in two parts: first we will see the |
88
|
|
|
|
|
|
|
reference of the module. Kind of, what it does, with a brief |
89
|
|
|
|
|
|
|
explanation. Follows a tutorial which will be growing through time and |
90
|
|
|
|
|
|
|
releases. |
91
|
|
|
|
|
|
|
|
92
|
|
|
|
|
|
|
=head1 SYNTAX REFERENCE |
93
|
|
|
|
|
|
|
|
94
|
|
|
|
|
|
|
Note: most of the examples are very stupid, but that is the easiest |
95
|
|
|
|
|
|
|
way to explain the basic syntax. |
96
|
|
|
|
|
|
|
|
97
|
|
|
|
|
|
|
The basic syntax for the rewrite rules is a block, started by the |
98
|
|
|
|
|
|
|
keyword C and ended by the C. Everything between |
99
|
|
|
|
|
|
|
them is handled by the module and interpreted as rules or comments. |
100
|
|
|
|
|
|
|
|
101
|
|
|
|
|
|
|
The C keyword can handle a set of flags (we will see that |
102
|
|
|
|
|
|
|
later), and requires a name for the rule-set. This name will be used |
103
|
|
|
|
|
|
|
to define a function for that rewriting system. |
104
|
|
|
|
|
|
|
|
105
|
|
|
|
|
|
|
RULES functioname |
106
|
|
|
|
|
|
|
... |
107
|
|
|
|
|
|
|
ENDRULES |
108
|
|
|
|
|
|
|
|
109
|
|
|
|
|
|
|
The function is defined in the main namespace where the C |
110
|
|
|
|
|
|
|
block appears. |
111
|
|
|
|
|
|
|
|
112
|
|
|
|
|
|
|
In this block, each line can be a comment (Perl style), an empty line |
113
|
|
|
|
|
|
|
or a rule. |
114
|
|
|
|
|
|
|
|
115
|
|
|
|
|
|
|
=head2 Basic Rule |
116
|
|
|
|
|
|
|
|
117
|
|
|
|
|
|
|
A basic rule is a simple substitution: |
118
|
|
|
|
|
|
|
|
119
|
|
|
|
|
|
|
RULES foobar |
120
|
|
|
|
|
|
|
foo==>bar |
121
|
|
|
|
|
|
|
ENDRULES |
122
|
|
|
|
|
|
|
|
123
|
|
|
|
|
|
|
The arrow C<==E> is used as delimiter. At its left is the regexp |
124
|
|
|
|
|
|
|
to match, at the right side, the substitution. So, the previous block |
125
|
|
|
|
|
|
|
defines a C function that substitutes all C by |
126
|
|
|
|
|
|
|
C. |
127
|
|
|
|
|
|
|
|
128
|
|
|
|
|
|
|
Although this can seems similar to a global substitution, it is |
129
|
|
|
|
|
|
|
not. With a global substitution you can't do an endless loop. With |
130
|
|
|
|
|
|
|
this module it is very simple. I know you will get the idea. |
131
|
|
|
|
|
|
|
|
132
|
|
|
|
|
|
|
You can use the syntax of Perl both on the left and right hand side of |
133
|
|
|
|
|
|
|
the rule, including C<$1...>. |
134
|
|
|
|
|
|
|
|
135
|
|
|
|
|
|
|
=head2 Execution Rule |
136
|
|
|
|
|
|
|
|
137
|
|
|
|
|
|
|
If the Perl substitution supports execution, why not to support it, |
138
|
|
|
|
|
|
|
also? So, you got the idea. Here is an example: |
139
|
|
|
|
|
|
|
|
140
|
|
|
|
|
|
|
RULES foo |
141
|
|
|
|
|
|
|
(\d+)b=e=>'b' x $1 |
142
|
|
|
|
|
|
|
(\d+)a=eval=>'a' x ($1*2) |
143
|
|
|
|
|
|
|
ENDRULES |
144
|
|
|
|
|
|
|
|
145
|
|
|
|
|
|
|
So, for any number followed by a C, we replace by that number of |
146
|
|
|
|
|
|
|
C. For each number followed by an C, we replace them by twice |
147
|
|
|
|
|
|
|
that number of C. |
148
|
|
|
|
|
|
|
|
149
|
|
|
|
|
|
|
Also, you mean evaluation using an C or C inside the arrow. I |
150
|
|
|
|
|
|
|
should remind you can mix all these rules together in the same |
151
|
|
|
|
|
|
|
rewriting system. |
152
|
|
|
|
|
|
|
|
153
|
|
|
|
|
|
|
=head2 Conditional Rule |
154
|
|
|
|
|
|
|
|
155
|
|
|
|
|
|
|
On some cases we want to perform a substitution if the pattern matches |
156
|
|
|
|
|
|
|
B a set of conditions about that pattern (or not) are true. |
157
|
|
|
|
|
|
|
|
158
|
|
|
|
|
|
|
For that, we use a three part rule. We have the common rule plus the |
159
|
|
|
|
|
|
|
condition part, separated from the rule by C. These conditional |
160
|
|
|
|
|
|
|
rules can be applied both for basic and execution rules. |
161
|
|
|
|
|
|
|
|
162
|
|
|
|
|
|
|
RULES translate |
163
|
|
|
|
|
|
|
([[:alpha:]]+)=e=>$dic{$1}!! exists($dic{$1}) |
164
|
|
|
|
|
|
|
ENDRULES |
165
|
|
|
|
|
|
|
|
166
|
|
|
|
|
|
|
The previous example would translate all words that exist on the |
167
|
|
|
|
|
|
|
dictionary. |
168
|
|
|
|
|
|
|
|
169
|
|
|
|
|
|
|
=head2 Begin Rule |
170
|
|
|
|
|
|
|
|
171
|
|
|
|
|
|
|
Sometimes it is useful to change something on the string before |
172
|
|
|
|
|
|
|
starting to apply the rules. For that, there is a special rule named |
173
|
|
|
|
|
|
|
C (or C for abbreviate) just with a RHS. This RHS is Perl |
174
|
|
|
|
|
|
|
code. Any Perl code. If you want to modify the string, use C<$_>. |
175
|
|
|
|
|
|
|
|
176
|
|
|
|
|
|
|
RULES foo |
177
|
|
|
|
|
|
|
=b=> $_.=" END" |
178
|
|
|
|
|
|
|
ENDRULES |
179
|
|
|
|
|
|
|
|
180
|
|
|
|
|
|
|
=head2 Last Rule |
181
|
|
|
|
|
|
|
|
182
|
|
|
|
|
|
|
As you use C on Perl to skip the remaining code on a loop, you |
183
|
|
|
|
|
|
|
can also call a C (or C) rule when a specific pattern |
184
|
|
|
|
|
|
|
matches. |
185
|
|
|
|
|
|
|
|
186
|
|
|
|
|
|
|
Like the C rule with only a RHS, the C rule has only a |
187
|
|
|
|
|
|
|
LHS: |
188
|
|
|
|
|
|
|
|
189
|
|
|
|
|
|
|
RULES foo |
190
|
|
|
|
|
|
|
foobar=l=> |
191
|
|
|
|
|
|
|
ENDRULES |
192
|
|
|
|
|
|
|
|
193
|
|
|
|
|
|
|
This way, the rules iterate until the string matches with C. |
194
|
|
|
|
|
|
|
|
195
|
|
|
|
|
|
|
You can also supply a condition in a last rule: |
196
|
|
|
|
|
|
|
|
197
|
|
|
|
|
|
|
RULES bar |
198
|
|
|
|
|
|
|
f(o+)b(a+)r=l=> !! length($1) == 2 * length($2); |
199
|
|
|
|
|
|
|
|
200
|
|
|
|
|
|
|
=head2 Rules with /x mode |
201
|
|
|
|
|
|
|
|
202
|
|
|
|
|
|
|
It is possible to use the regular expressions /x mode in the rewrite rules. |
203
|
|
|
|
|
|
|
In this case: |
204
|
|
|
|
|
|
|
|
205
|
|
|
|
|
|
|
=over 4 |
206
|
|
|
|
|
|
|
|
207
|
|
|
|
|
|
|
=item 1 |
208
|
|
|
|
|
|
|
|
209
|
|
|
|
|
|
|
there must be an empty line between rules |
210
|
|
|
|
|
|
|
|
211
|
|
|
|
|
|
|
=item 2 |
212
|
|
|
|
|
|
|
|
213
|
|
|
|
|
|
|
you can insert space and line breaks into the regular expression: |
214
|
|
|
|
|
|
|
|
215
|
|
|
|
|
|
|
RULES/x f1 |
216
|
|
|
|
|
|
|
(\d+) |
217
|
|
|
|
|
|
|
(\d{3}) |
218
|
|
|
|
|
|
|
(000) |
219
|
|
|
|
|
|
|
==>$1 milhao e $2 mil!! $1 == 1 |
220
|
|
|
|
|
|
|
|
221
|
|
|
|
|
|
|
ENDRULES |
222
|
|
|
|
|
|
|
|
223
|
|
|
|
|
|
|
=back |
224
|
|
|
|
|
|
|
|
225
|
|
|
|
|
|
|
=head1 POWER EXPRESSIONS |
226
|
|
|
|
|
|
|
|
227
|
|
|
|
|
|
|
To facilitate matching complex languages Text::RewriteRules defines a |
228
|
|
|
|
|
|
|
set of regular expressions that you can use (without defining them). |
229
|
|
|
|
|
|
|
|
230
|
|
|
|
|
|
|
=head2 Parenthesis |
231
|
|
|
|
|
|
|
|
232
|
|
|
|
|
|
|
There are three kind of usual parenthesis: the standard parenthesis, |
233
|
|
|
|
|
|
|
brackets or curly braces. You can match a balanced string of |
234
|
|
|
|
|
|
|
parenthesis using the power expressions C<[[:PB:]]>, C<[[:BB:]]> and |
235
|
|
|
|
|
|
|
C<[[:CBB:]]> for these three kind of parenthesis. |
236
|
|
|
|
|
|
|
|
237
|
|
|
|
|
|
|
For instance, if you apply this rule: |
238
|
|
|
|
|
|
|
|
239
|
|
|
|
|
|
|
[[:BB:]]==>foo |
240
|
|
|
|
|
|
|
|
241
|
|
|
|
|
|
|
to this string |
242
|
|
|
|
|
|
|
|
243
|
|
|
|
|
|
|
something [ a [ b] c [d ]] and something more |
244
|
|
|
|
|
|
|
|
245
|
|
|
|
|
|
|
then, you will get |
246
|
|
|
|
|
|
|
|
247
|
|
|
|
|
|
|
something foo and something more |
248
|
|
|
|
|
|
|
|
249
|
|
|
|
|
|
|
Note that if you apply it to |
250
|
|
|
|
|
|
|
|
251
|
|
|
|
|
|
|
something [[ not ] balanced [ here |
252
|
|
|
|
|
|
|
|
253
|
|
|
|
|
|
|
then you will get |
254
|
|
|
|
|
|
|
|
255
|
|
|
|
|
|
|
something [foo balanced [ here |
256
|
|
|
|
|
|
|
|
257
|
|
|
|
|
|
|
=head2 XML tags |
258
|
|
|
|
|
|
|
|
259
|
|
|
|
|
|
|
The power expression C<[[:XML:]]> match a XML tag (with or without |
260
|
|
|
|
|
|
|
children XML tags. Note that this expression matches only well formed |
261
|
|
|
|
|
|
|
XML tags. |
262
|
|
|
|
|
|
|
|
263
|
|
|
|
|
|
|
As an example, the rule |
264
|
|
|
|
|
|
|
|
265
|
|
|
|
|
|
|
[[:XML:]]=>tag |
266
|
|
|
|
|
|
|
|
267
|
|
|
|
|
|
|
applied to the string |
268
|
|
|
|
|
|
|
|
269
|
|
|
|
|
|
|
and |
270
|
|
|
|
|
|
|
|
271
|
|
|
|
|
|
|
will result in |
272
|
|
|
|
|
|
|
|
273
|
|
|
|
|
|
|
and tag |
274
|
|
|
|
|
|
|
|
275
|
|
|
|
|
|
|
|
276
|
|
|
|
|
|
|
=cut |
277
|
|
|
|
|
|
|
|
278
|
|
|
|
|
|
|
our $DEBUG = 0; |
279
|
|
|
|
|
|
|
our $count = 0; |
280
|
|
|
|
|
|
|
our $NL = qr/\r?\n\r?/; |
281
|
|
|
|
|
|
|
my %pseudo_classes=( |
282
|
|
|
|
|
|
|
TEXENV => 'TEXENV', |
283
|
|
|
|
|
|
|
PB => 'PB', |
284
|
|
|
|
|
|
|
BB => 'BB', |
285
|
|
|
|
|
|
|
CBB => 'CBB', |
286
|
|
|
|
|
|
|
XML => 'XMLtree', |
287
|
|
|
|
|
|
|
'XML+1' => \&_tag_re, |
288
|
|
|
|
|
|
|
); |
289
|
|
|
|
|
|
|
|
290
|
|
|
|
|
|
|
sub _regular_expressions { |
291
|
13
|
|
|
13
|
|
134
|
return <<'EORE'; |
292
|
|
|
|
|
|
|
|
293
|
|
|
|
|
|
|
our $__XMLattrs = qr/(?: |
294
|
|
|
|
|
|
|
\s+[a-zA-Z0-9:-]+\s* |
295
|
|
|
|
|
|
|
= |
296
|
|
|
|
|
|
|
\s*(?: '[^']+' | "[^"]+" ))*/x; |
297
|
|
|
|
|
|
|
|
298
|
|
|
|
|
|
|
### This (?\n) is a BIG hack! |
299
|
|
|
|
|
|
|
our $__XMLempty = qr/<(?[a-zA-Z0-9:-]+)(?\b)$__XMLattrs\/>/x; |
300
|
|
|
|
|
|
|
|
301
|
|
|
|
|
|
|
our $__XMLtree2 = qr/$__XMLempty | |
302
|
|
|
|
|
|
|
(? |
303
|
|
|
|
|
|
|
<(?[a-zA-Z0-9:-]+)$__XMLattrs> |
304
|
|
|
|
|
|
|
(?: $__XMLempty | [^<]++ | (?&XML) )*+ |
305
|
|
|
|
|
|
|
<\/\k> |
306
|
|
|
|
|
|
|
)/x; |
307
|
|
|
|
|
|
|
our $__XMLtree = qr/$__XMLempty | |
308
|
|
|
|
|
|
|
(? |
309
|
|
|
|
|
|
|
<(?[a-zA-Z0-9:-]+)$__XMLattrs> |
310
|
|
|
|
|
|
|
(?(?: $__XMLempty | [^<]++ | $__XMLtree2 )*+) |
311
|
|
|
|
|
|
|
<\/\k> |
312
|
|
|
|
|
|
|
)/x; |
313
|
|
|
|
|
|
|
our $__XMLinner = qr/(?: [^<]++ | $__XMLempty | $__XMLtree2 )*+/x; |
314
|
|
|
|
|
|
|
|
315
|
|
|
|
|
|
|
our $__CBB = qr{ (? \{ (?(?:[^\{\}]++|(?&cbb1))*+) \} ) }sx; |
316
|
|
|
|
|
|
|
our $__BB = qr{ (? \[ (? (?:[^\[\]]++|(?&bb1) )*+) \] ) }sx; |
317
|
|
|
|
|
|
|
our $__PB = qr{ (? \( (? (?:[^\(\)]++|(?&pb1) )*+) \) ) }sx; |
318
|
|
|
|
|
|
|
|
319
|
|
|
|
|
|
|
our $__TEXENV = qr{\\begin\{(\w+)\}(.*?)\\end\{\1\}}s; ## \begin{$1}$2\end |
320
|
|
|
|
|
|
|
our $__TEXENV1 = qr{\\begin\{(\w+)\}($__BB?)($__CBB)(.*?)\\end\{\1\}}s; ## \begin{$1}[$2]{$3}$4\end |
321
|
|
|
|
|
|
|
|
322
|
|
|
|
|
|
|
|
323
|
|
|
|
|
|
|
EORE |
324
|
|
|
|
|
|
|
} |
325
|
|
|
|
|
|
|
|
326
|
|
|
|
|
|
|
sub _tag_re { |
327
|
6
|
|
|
6
|
|
12
|
my $tagname = shift; |
328
|
6
|
|
|
|
|
34
|
return "<$tagname\$__XMLattrs(?:\/>|>\$__XMLinner<\/$tagname>)"; |
329
|
|
|
|
|
|
|
} |
330
|
|
|
|
|
|
|
|
331
|
|
|
|
|
|
|
|
332
|
|
|
|
|
|
|
sub _expand_pseudo_classes { |
333
|
126
|
|
|
126
|
|
175
|
my $rules = shift; |
334
|
|
|
|
|
|
|
|
335
|
126
|
50
|
|
|
|
209
|
$rules =~ s/(\[\[:(\w+):\]\])/$pseudo_classes{$2}?"\$__$pseudo_classes{$2}":$1/ge; |
|
18
|
|
|
|
|
80
|
|
336
|
126
|
|
|
|
|
186
|
$rules =~ s/\[\[:(\w+)\(([^,\(\)]+)\):\]\]/$pseudo_classes{"$1+1"}->($2)/ge; |
|
6
|
|
|
|
|
20
|
|
337
|
|
|
|
|
|
|
|
338
|
126
|
|
|
|
|
258
|
return $rules; |
339
|
|
|
|
|
|
|
} |
340
|
|
|
|
|
|
|
|
341
|
|
|
|
|
|
|
sub _mrules { |
342
|
18
|
|
|
18
|
|
50
|
my ($conf, $name, $rules) = @_; |
343
|
18
|
|
|
|
|
26
|
++$count; |
344
|
|
|
|
|
|
|
|
345
|
18
|
|
|
|
|
35
|
my $code = "sub $name {\n"; |
346
|
18
|
|
|
|
|
25
|
$code .= " my \$p = shift;\n"; |
347
|
18
|
|
|
|
|
25
|
$code .= " my \$_M = \"\\x01\";\n"; |
348
|
18
|
|
|
|
|
23
|
$code .= " for (\$p) {\n"; |
349
|
18
|
|
|
|
|
21
|
$code .= " my \$modified = 1;\n"; |
350
|
18
|
|
|
|
|
22
|
$code .= " \$_ = \$_M.\$_;\n"; |
351
|
18
|
|
|
|
|
30
|
$code .= " #__$count#\n"; |
352
|
18
|
|
|
|
|
27
|
$code .= " my \$iteration = 0;\n"; |
353
|
18
|
|
|
|
|
23
|
$code .= " MAIN: while (\$modified) {\n"; |
354
|
18
|
|
|
|
|
22
|
$code .= " \$iteration++;\n"; |
355
|
|
|
|
|
|
|
|
356
|
18
|
50
|
|
|
|
41
|
if ($DEBUG) { |
357
|
0
|
|
|
|
|
0
|
$code .= " print STDERR \" >\$_\\n\";\n" |
358
|
|
|
|
|
|
|
} |
359
|
|
|
|
|
|
|
|
360
|
18
|
|
|
|
|
20
|
$code .= " \$modified = 0;\n"; |
361
|
|
|
|
|
|
|
|
362
|
18
|
50
|
|
|
|
218
|
my $ICASE = exists($conf->{i})?"i":""; |
363
|
18
|
100
|
|
|
|
37
|
my $DX = exists($conf->{x})?"x":""; |
364
|
18
|
50
|
|
|
|
37
|
if (exists($conf->{d})) { |
365
|
0
|
|
|
|
|
0
|
$code .= " print STDERR \"Iteration on $name: \$iteration\n\$p\n\";"; |
366
|
|
|
|
|
|
|
} |
367
|
|
|
|
|
|
|
|
368
|
18
|
|
|
|
|
21
|
my @rules; |
369
|
18
|
100
|
|
|
|
40
|
if ($DX eq "x") { |
370
|
2
|
|
|
|
|
61
|
@rules = split /$NL$NL/, $rules; |
371
|
|
|
|
|
|
|
} else { |
372
|
16
|
|
|
|
|
138
|
@rules = split /$NL/, $rules; |
373
|
|
|
|
|
|
|
} |
374
|
|
|
|
|
|
|
|
375
|
18
|
|
|
|
|
40
|
for my $rule (@rules) { |
376
|
34
|
|
|
|
|
196
|
$rule =~ s/$NL$//; |
377
|
|
|
|
|
|
|
|
378
|
34
|
100
|
|
|
|
492
|
if ($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/) { |
|
|
50
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
379
|
2
|
|
|
|
|
12
|
my ($ant,$con,$cond) = ($1,$3,$4); |
380
|
2
|
50
|
|
|
|
10
|
$ICASE = "i" if $2 =~ m!i!; |
381
|
2
|
|
|
|
|
7
|
$ant = _expand_pseudo_classes($ant); |
382
|
|
|
|
|
|
|
|
383
|
2
|
|
|
|
|
8
|
$code .= " while (m{\${_M}(?:$ant)}g$ICASE) {\n"; |
384
|
2
|
|
|
|
|
5
|
$code .= " if ($cond) {\n"; |
385
|
2
|
|
|
|
|
6
|
$code .= " s{\${_M}(?:$ant)}{$con\${_M}}$ICASE;\n"; |
386
|
2
|
|
|
|
|
5
|
$code .= " pos = undef;\n"; |
387
|
2
|
|
|
|
|
3
|
$code .= " \$modified = 1;\n"; |
388
|
2
|
|
|
|
|
24
|
$code .= " next MAIN\n"; |
389
|
2
|
|
|
|
|
4
|
$code .= " }\n"; |
390
|
2
|
|
|
|
|
7
|
$code .= " }\n"; |
391
|
|
|
|
|
|
|
|
392
|
|
|
|
|
|
|
} elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/) { |
393
|
0
|
|
|
|
|
0
|
my ($ant,$con,$cond) = ($1,$3,$4); |
394
|
0
|
0
|
|
|
|
0
|
$ICASE = "i" if $2 =~ m!i!; |
395
|
0
|
|
|
|
|
0
|
$ant = _expand_pseudo_classes($ant); |
396
|
|
|
|
|
|
|
|
397
|
0
|
|
|
|
|
0
|
$code .= " while (m{\${_M}(?:$ant)}g$ICASE) {\n"; |
398
|
0
|
|
|
|
|
0
|
$code .= " if ($cond) {\n"; |
399
|
0
|
|
|
|
|
0
|
$code .= " s{\${_M}(?:$ant)}{eval{$con}.\${_M}}e$ICASE;\n"; |
400
|
0
|
|
|
|
|
0
|
$code .= " pos = undef;\n"; |
401
|
0
|
|
|
|
|
0
|
$code .= " \$modified = 1;\n"; |
402
|
0
|
|
|
|
|
0
|
$code .= " next MAIN\n"; |
403
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
404
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
405
|
|
|
|
|
|
|
|
406
|
|
|
|
|
|
|
} elsif ($rule =~ m/(.*?)(=i?=>)(.*)/) { |
407
|
22
|
|
|
|
|
53
|
my ($ant,$con) = ($1,$3); |
408
|
22
|
100
|
|
|
|
64
|
$ICASE = "i" if $2 =~ m!i!; |
409
|
22
|
|
|
|
|
41
|
$ant = _expand_pseudo_classes($ant); |
410
|
|
|
|
|
|
|
|
411
|
22
|
|
|
|
|
51
|
$code .= " if (m{\${_M}(?:$ant)}$ICASE) {\n"; |
412
|
22
|
|
|
|
|
177
|
$code .= " s{\${_M}(?:$ant)}{$con\${_M}}$ICASE;\n"; |
413
|
22
|
|
|
|
|
28
|
$code .= " \$modified = 1;\n"; |
414
|
22
|
|
|
|
|
27
|
$code .= " next\n"; |
415
|
22
|
|
|
|
|
52
|
$code .= " }\n"; |
416
|
|
|
|
|
|
|
|
417
|
|
|
|
|
|
|
} elsif($rule =~ m/=b(?:egin)?=>(.*)/s) { |
418
|
|
|
|
|
|
|
|
419
|
0
|
|
|
|
|
0
|
my $ac = $1; |
420
|
0
|
|
|
|
|
0
|
$code =~ s/(#__$count#\n)/$ac;\n$1/; |
421
|
|
|
|
|
|
|
|
422
|
|
|
|
|
|
|
} elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/) { |
423
|
6
|
|
|
|
|
17
|
my ($ant,$con) = ($1,$3); |
424
|
6
|
100
|
|
|
|
23
|
$ICASE = "i" if $2 =~ m!i!; |
425
|
6
|
|
|
|
|
14
|
$ant = _expand_pseudo_classes($ant); |
426
|
|
|
|
|
|
|
|
427
|
6
|
|
|
|
|
17
|
$code .= " if (m{\${_M}(?:$ant)}$ICASE) {\n"; |
428
|
6
|
|
|
|
|
174
|
$code .= " s{\${_M}(?:$ant)}{eval{$con}.\"\$_M\"}e$ICASE;\n"; |
429
|
6
|
|
|
|
|
10
|
$code .= " \$modified = 1;\n"; |
430
|
6
|
|
|
|
|
7
|
$code .= " next\n"; |
431
|
6
|
|
|
|
|
18
|
$code .= " }\n"; |
432
|
|
|
|
|
|
|
|
433
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=(?:i=)?l(?:ast)?=>\s*!!(.*))/s) { |
434
|
2
|
|
|
|
|
7
|
my ($ant,$cond) = ($1,$3); |
435
|
2
|
50
|
|
|
|
11
|
$ICASE = "i" if $2 =~ m!i!; |
436
|
2
|
|
|
|
|
6
|
$ant = _expand_pseudo_classes($ant); |
437
|
|
|
|
|
|
|
|
438
|
2
|
|
|
|
|
8
|
$code .= " if (m{\${_M}(?:$ant)}$ICASE$DX) {\n"; |
439
|
2
|
|
|
|
|
6
|
$code .= " if ($cond) {\n"; |
440
|
2
|
|
|
|
|
3
|
$code .= " s{\${_M}}{};\n"; |
441
|
2
|
|
|
|
|
3
|
$code .= " last\n"; |
442
|
2
|
|
|
|
|
4
|
$code .= " }\n"; |
443
|
2
|
|
|
|
|
6
|
$code .= " }\n"; |
444
|
|
|
|
|
|
|
|
445
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=(?:i=)?l(?:ast)?=>)/s) { |
446
|
2
|
|
|
|
|
6
|
my ($ant) = ($1); |
447
|
2
|
50
|
|
|
|
11
|
$ICASE = "i" if $2 =~ m!i!; |
448
|
2
|
|
|
|
|
5
|
$ant = _expand_pseudo_classes($ant); |
449
|
|
|
|
|
|
|
|
450
|
2
|
|
|
|
|
9
|
$code .= " if (m{\${_M}(?:$ant)}$ICASE$DX) {\n"; |
451
|
2
|
|
|
|
|
4
|
$code .= " s{\${_M}}{};\n"; |
452
|
2
|
|
|
|
|
3
|
$code .= " last\n"; |
453
|
2
|
|
|
|
|
53
|
$code .= " }\n"; |
454
|
|
|
|
|
|
|
|
455
|
|
|
|
|
|
|
} else { |
456
|
0
|
0
|
|
|
|
0
|
warn "Unknown rule: $rule\n" unless $rule =~ m!^\s*(#|$)!; |
457
|
|
|
|
|
|
|
} |
458
|
|
|
|
|
|
|
} |
459
|
|
|
|
|
|
|
##--- |
460
|
|
|
|
|
|
|
|
461
|
|
|
|
|
|
|
# Make it walk... |
462
|
18
|
|
|
|
|
205
|
$code .= " if (m{\${_M}(.|\\n)}) {\n"; |
463
|
18
|
|
|
|
|
21
|
$code .= " s{\${_M}(.|\\n)}{\$1\${_M}};\n"; |
464
|
18
|
|
|
|
|
26
|
$code .= " \$modified = 1;\n"; |
465
|
18
|
|
|
|
|
18
|
$code .= " next\n"; |
466
|
18
|
|
|
|
|
20
|
$code .= " }\n"; |
467
|
|
|
|
|
|
|
|
468
|
18
|
|
|
|
|
18
|
$code .= " }\n"; |
469
|
18
|
|
|
|
|
24
|
$code .= " s/\$_M//;\n"; |
470
|
18
|
|
|
|
|
19
|
$code .= " }\n"; |
471
|
18
|
|
|
|
|
19
|
$code .= " return \$p;\n"; |
472
|
18
|
|
|
|
|
20
|
$code .= "}\n"; |
473
|
|
|
|
|
|
|
|
474
|
18
|
|
|
|
|
416
|
$code; |
475
|
|
|
|
|
|
|
} |
476
|
|
|
|
|
|
|
|
477
|
|
|
|
|
|
|
sub _rules { |
478
|
62
|
|
|
62
|
|
97
|
my ($conf, $name, $rules) = @_; |
479
|
62
|
|
|
|
|
62
|
++$count; |
480
|
|
|
|
|
|
|
|
481
|
62
|
|
|
|
|
302
|
my $code = "sub $name {\n"; |
482
|
62
|
|
|
|
|
80
|
$code .= " my \$p = shift;\n"; |
483
|
62
|
|
|
|
|
73
|
$code .= " for (\$p) {\n"; |
484
|
62
|
|
|
|
|
71
|
$code .= " my \$modified = 1;\n"; |
485
|
62
|
|
|
|
|
95
|
$code .= " #__$count#\n"; |
486
|
62
|
|
|
|
|
76
|
$code .= " my \$iteration = 0;\n"; |
487
|
62
|
|
|
|
|
78
|
$code .= " MAIN: while(\$modified) {\n"; |
488
|
62
|
50
|
|
|
|
243
|
$code .= " print STDERR \$_;\n" if $DEBUG > 1; |
489
|
62
|
|
|
|
|
78
|
$code .= " \$modified = 0;\n"; |
490
|
62
|
|
|
|
|
179
|
$code .= " \$iteration++;\n"; |
491
|
|
|
|
|
|
|
|
492
|
|
|
|
|
|
|
##--- |
493
|
|
|
|
|
|
|
|
494
|
62
|
100
|
|
|
|
124
|
my $DICASE = exists($conf->{i})?"i":""; |
495
|
62
|
100
|
|
|
|
110
|
my $DX = exists($conf->{x})?"x":""; |
496
|
62
|
50
|
|
|
|
119
|
if (exists($conf->{d})) { |
497
|
0
|
|
|
|
|
0
|
$code .= " print STDERR \"Iteration on $name: \$iteration\n\$p\n\";"; |
498
|
|
|
|
|
|
|
} |
499
|
|
|
|
|
|
|
|
500
|
62
|
|
|
|
|
58
|
my @rules; |
501
|
62
|
100
|
|
|
|
152
|
if ($DX eq "x") { |
502
|
4
|
|
|
|
|
67
|
@rules = split /$NL$NL/, $rules; |
503
|
|
|
|
|
|
|
} else { |
504
|
58
|
|
|
|
|
427
|
@rules = split /$NL/, $rules; |
505
|
|
|
|
|
|
|
} |
506
|
|
|
|
|
|
|
|
507
|
62
|
|
|
|
|
112
|
for my $rule (@rules) { |
508
|
74
|
|
|
|
|
287
|
$rule =~ s/$NL$//; |
509
|
|
|
|
|
|
|
|
510
|
74
|
|
|
|
|
88
|
my $ICASE = $DICASE; |
511
|
|
|
|
|
|
|
|
512
|
74
|
100
|
|
|
|
1323
|
if($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/s) { |
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
513
|
4
|
|
|
|
|
16
|
my ($ant,$con,$cond) = ($1,$3,$4); |
514
|
4
|
100
|
|
|
|
15
|
$ICASE = "i" if $2 =~ m!i!; |
515
|
4
|
|
|
|
|
8
|
$ant = _expand_pseudo_classes($ant); |
516
|
|
|
|
|
|
|
|
517
|
4
|
|
|
|
|
11
|
$code .= " while (m{$ant}g$ICASE$DX) {\n"; |
518
|
4
|
|
|
|
|
14
|
$code .= " if ($cond) {\n"; |
519
|
4
|
|
|
|
|
10
|
$code .= " s{$ant\\G}{$con}$ICASE$DX;\n"; |
520
|
4
|
|
|
|
|
6
|
$code .= " pos = undef;\n"; |
521
|
4
|
|
|
|
|
6
|
$code .= " \$modified = 1;\n"; |
522
|
4
|
|
|
|
|
5
|
$code .= " next MAIN\n"; |
523
|
4
|
|
|
|
|
6
|
$code .= " }\n"; |
524
|
4
|
|
|
|
|
12
|
$code .= " }\n"; |
525
|
|
|
|
|
|
|
|
526
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/s) { |
527
|
4
|
|
|
|
|
13
|
my ($ant,$con,$cond) = ($1,$3,$4); |
528
|
4
|
100
|
|
|
|
17
|
$ICASE = "i" if $2 =~ m!i!; |
529
|
4
|
|
|
|
|
8
|
$ant = _expand_pseudo_classes($ant); |
530
|
|
|
|
|
|
|
|
531
|
4
|
|
|
|
|
12
|
$code .= " while (m{$ant}g$ICASE$DX) {\n"; |
532
|
4
|
|
|
|
|
9
|
$code .= " if ($cond) {\n"; |
533
|
4
|
|
|
|
|
10
|
$code .= " s{$ant\\G}{$con}e${ICASE}${DX};\n"; |
534
|
4
|
|
|
|
|
6
|
$code .= " pos = undef;\n"; |
535
|
4
|
|
|
|
|
22
|
$code .= " \$modified = 1;\n"; |
536
|
4
|
|
|
|
|
5
|
$code .= " next MAIN\n"; |
537
|
4
|
|
|
|
|
6
|
$code .= " }\n"; |
538
|
4
|
|
|
|
|
12
|
$code .= " }\n"; |
539
|
|
|
|
|
|
|
|
540
|
|
|
|
|
|
|
} elsif ($rule =~ m/(.*?)(=i?=>)(.*)/s) { |
541
|
50
|
|
|
|
|
114
|
my ($ant,$con) = ($1,$3); |
542
|
50
|
100
|
|
|
|
128
|
$ICASE = "i" if $2 =~ m!i!; |
543
|
50
|
|
|
|
|
214
|
$ant = _expand_pseudo_classes($ant); |
544
|
|
|
|
|
|
|
|
545
|
50
|
|
|
|
|
119
|
$code .= " if (m{$ant}$ICASE$DX) {\n"; |
546
|
50
|
|
|
|
|
113
|
$code .= " s{$ant}{$con}$ICASE$DX;\n"; |
547
|
50
|
|
|
|
|
56
|
$code .= " \$modified = 1;\n"; |
548
|
50
|
|
|
|
|
62
|
$code .= " next\n"; |
549
|
50
|
|
|
|
|
133
|
$code .= " }\n"; |
550
|
|
|
|
|
|
|
|
551
|
|
|
|
|
|
|
} elsif ($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/s) { |
552
|
8
|
|
|
|
|
21
|
my ($ant,$con) = ($1,$3); |
553
|
8
|
100
|
|
|
|
27
|
$ICASE = "i" if $2 =~ m!i!; |
554
|
8
|
|
|
|
|
16
|
$ant = _expand_pseudo_classes($ant); |
555
|
|
|
|
|
|
|
|
556
|
8
|
|
|
|
|
24
|
$code .= " if (m{$ant}$ICASE$DX) {\n"; |
557
|
8
|
|
|
|
|
20
|
$code .= " s{$ant}{$con}e$ICASE$DX;\n"; |
558
|
8
|
|
|
|
|
11
|
$code .= " \$modified = 1;\n"; |
559
|
8
|
|
|
|
|
9
|
$code .= " next\n"; |
560
|
8
|
|
|
|
|
22
|
$code .= " }\n"; |
561
|
|
|
|
|
|
|
|
562
|
|
|
|
|
|
|
} elsif($rule =~ m/=b(?:egin)?=>(.*)/s) { |
563
|
|
|
|
|
|
|
|
564
|
2
|
|
|
|
|
5
|
my $ac = $1; |
565
|
2
|
|
|
|
|
39
|
$code =~ s/(#__$count#\n)/$ac;\n$1/; |
566
|
|
|
|
|
|
|
|
567
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=(i=)?l(ast)?=>\s*!!(.*))/s) { |
568
|
2
|
|
|
|
|
9
|
my ($ant,$cond) = ($1,$5); |
569
|
2
|
50
|
|
|
|
10
|
$ICASE = "i" if $2 =~ m!i!; |
570
|
2
|
|
|
|
|
6
|
$ant = _expand_pseudo_classes($ant); |
571
|
|
|
|
|
|
|
|
572
|
2
|
|
|
|
|
9
|
$code .= " if (m{$ant}$ICASE$DX) {\n"; |
573
|
2
|
|
|
|
|
4
|
$code .= " if ($cond) {\n"; |
574
|
2
|
|
|
|
|
5
|
$code .= " last\n"; |
575
|
2
|
|
|
|
|
3
|
$code .= " }\n"; |
576
|
2
|
|
|
|
|
6
|
$code .= " }\n"; |
577
|
|
|
|
|
|
|
|
578
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=(i=)?l(ast)?=>)/s) { |
579
|
4
|
|
|
|
|
9
|
my ($ant) = ($1); |
580
|
4
|
100
|
|
|
|
14
|
$ICASE = "i" if $2 =~ m!i!; |
581
|
4
|
|
|
|
|
9
|
$ant = _expand_pseudo_classes($ant); |
582
|
|
|
|
|
|
|
|
583
|
4
|
|
|
|
|
22
|
$code .= " if (m{$ant}$ICASE$DX) {\n"; |
584
|
4
|
|
|
|
|
5
|
$code .= " last\n"; |
585
|
4
|
|
|
|
|
11
|
$code .= " }\n"; |
586
|
|
|
|
|
|
|
|
587
|
|
|
|
|
|
|
} else { |
588
|
0
|
0
|
|
|
|
0
|
warn "Unknown rule: $rule\n" unless $rule =~ m!^\s*(#|$)!; |
589
|
|
|
|
|
|
|
} |
590
|
|
|
|
|
|
|
} |
591
|
|
|
|
|
|
|
|
592
|
|
|
|
|
|
|
##--- |
593
|
|
|
|
|
|
|
|
594
|
62
|
|
|
|
|
89
|
$code .= " }\n"; |
595
|
62
|
|
|
|
|
68
|
$code .= " }\n"; |
596
|
62
|
|
|
|
|
62
|
$code .= " return \$p;\n"; |
597
|
62
|
|
|
|
|
131
|
$code .= "}\n"; |
598
|
|
|
|
|
|
|
|
599
|
62
|
|
|
|
|
1544
|
$code; |
600
|
|
|
|
|
|
|
} |
601
|
|
|
|
|
|
|
|
602
|
|
|
|
|
|
|
sub _lrules { |
603
|
8
|
|
|
8
|
|
14
|
my ($conf, $name, $rules) = @_; |
604
|
8
|
|
|
|
|
10
|
++$count; |
605
|
|
|
|
|
|
|
|
606
|
8
|
|
|
|
|
11
|
my $code = "my \$${name}_input = \"\";\n"; |
607
|
8
|
|
|
|
|
17
|
$code .= "sub ${name}_init {\n"; |
608
|
8
|
|
|
|
|
17
|
$code .= " \$${name}_input = shift;\n"; |
609
|
8
|
|
|
|
|
9
|
$code .= " return 1;\n"; |
610
|
8
|
|
|
|
|
9
|
$code .= "}\n\n"; |
611
|
|
|
|
|
|
|
|
612
|
8
|
|
|
|
|
12
|
$code .= "sub $name {\n"; |
613
|
8
|
|
|
|
|
16
|
$code .= " return undef if not defined \$${name}_input;\n"; |
614
|
8
|
50
|
|
|
|
94
|
$code .= " print STDERR \$_;\n" if $DEBUG > 1; |
615
|
8
|
|
|
|
|
14
|
$code .= " for (\$${name}_input) {\n"; |
616
|
|
|
|
|
|
|
|
617
|
|
|
|
|
|
|
##--- |
618
|
|
|
|
|
|
|
|
619
|
8
|
50
|
|
|
|
21
|
my $DICASE = exists($conf->{i})?"i":""; |
620
|
8
|
100
|
|
|
|
20
|
my $DX = exists($conf->{x})?"x":""; |
621
|
|
|
|
|
|
|
|
622
|
8
|
|
|
|
|
7
|
my @rules; |
623
|
8
|
100
|
|
|
|
17
|
if ($DX eq "x") { |
624
|
2
|
|
|
|
|
52
|
@rules = split /$NL$NL/, $rules; |
625
|
|
|
|
|
|
|
} else { |
626
|
6
|
|
|
|
|
183
|
@rules = split /$NL/, $rules; |
627
|
|
|
|
|
|
|
} |
628
|
|
|
|
|
|
|
|
629
|
8
|
|
|
|
|
19
|
for my $rule (@rules) { |
630
|
22
|
|
|
|
|
90
|
$rule =~ s/$NL$//; |
631
|
|
|
|
|
|
|
|
632
|
22
|
|
|
|
|
29
|
my $ICASE = $DICASE; |
633
|
|
|
|
|
|
|
|
634
|
22
|
50
|
|
|
|
410
|
if ($rule =~ m/=EOF=>(.*)/s) { |
|
|
100
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
|
|
100
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
635
|
|
|
|
|
|
|
|
636
|
0
|
|
|
|
|
0
|
my $act = $1; |
637
|
0
|
|
|
|
|
0
|
$code .= " if (m{^\$}) {\n"; |
638
|
0
|
|
|
|
|
0
|
$code .= " \$${name}_input = undef;\n"; |
639
|
0
|
|
|
|
|
0
|
$code .= " return \"$act\";\n"; |
640
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
641
|
|
|
|
|
|
|
|
642
|
|
|
|
|
|
|
} elsif ($rule =~ m/=EOF=e=>(.*)/s) { |
643
|
|
|
|
|
|
|
|
644
|
2
|
|
|
|
|
5
|
my $act = $1; |
645
|
2
|
|
|
|
|
4
|
$code .= " if (m{^\$}) {\n"; |
646
|
2
|
|
|
|
|
6
|
$code .= " \$${name}_input = undef;\n"; |
647
|
2
|
|
|
|
|
20
|
$code .= " return $act;\n"; |
648
|
2
|
|
|
|
|
8
|
$code .= " }\n"; |
649
|
|
|
|
|
|
|
|
650
|
|
|
|
|
|
|
} elsif ($rule =~ m/(.*?)(=(?:i=)?ignore=>)(.*)!!(.*)/s) { |
651
|
0
|
|
|
|
|
0
|
my ($ant,$cond) = ($1, $4); |
652
|
0
|
0
|
|
|
|
0
|
$ICASE = "i" if $2 =~ m!i!; |
653
|
0
|
|
|
|
|
0
|
$ant = _expand_pseudo_classes($ant); |
654
|
|
|
|
|
|
|
|
655
|
0
|
|
|
|
|
0
|
$code .= " if (m{^$ant}g$ICASE$DX) {\n"; |
656
|
0
|
|
|
|
|
0
|
$code .= " if ($cond) {\n"; |
657
|
0
|
|
|
|
|
0
|
$code .= " s{$ant}{}$ICASE$DX;\n"; |
658
|
0
|
|
|
|
|
0
|
$code .= " pos = undef;\n"; |
659
|
0
|
|
|
|
|
0
|
$code .= " return $name();\n"; |
660
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
661
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
662
|
|
|
|
|
|
|
|
663
|
|
|
|
|
|
|
} elsif ($rule =~ m/(.*?)(=(?:i=)?ignore=>)(.*)/s) { |
664
|
4
|
|
|
|
|
10
|
my ($ant) = ($1); |
665
|
4
|
50
|
|
|
|
15
|
$ICASE = "i" if $2 =~ m!i!; |
666
|
4
|
|
|
|
|
8
|
$ant = _expand_pseudo_classes($ant); |
667
|
|
|
|
|
|
|
|
668
|
4
|
|
|
|
|
12
|
$code .= " if (m{^$ant}g$ICASE$DX) {\n"; |
669
|
4
|
|
|
|
|
11
|
$code .= " s{$ant}{}$ICASE$DX;\n"; |
670
|
4
|
|
|
|
|
6
|
$code .= " pos = undef;\n"; |
671
|
4
|
|
|
|
|
8
|
$code .= " return $name();\n"; |
672
|
4
|
|
|
|
|
9
|
$code .= " }\n"; |
673
|
|
|
|
|
|
|
|
674
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=i?=>)(.*)!!(.*)/s) { |
675
|
0
|
|
|
|
|
0
|
my ($ant,$con,$cond) = ($1,$3,$4); |
676
|
0
|
0
|
|
|
|
0
|
$ICASE = "i" if $2 =~ m!i!; |
677
|
0
|
|
|
|
|
0
|
$ant = _expand_pseudo_classes($ant); |
678
|
|
|
|
|
|
|
|
679
|
0
|
|
|
|
|
0
|
$code .= " if (m{^$ant}g$ICASE$DX) {\n"; |
680
|
0
|
|
|
|
|
0
|
$code .= " if ($cond) {\n"; |
681
|
0
|
|
|
|
|
0
|
$code .= " s{$ant}{}$ICASE$DX;\n"; |
682
|
0
|
|
|
|
|
0
|
$code .= " pos = undef;\n"; |
683
|
0
|
|
|
|
|
0
|
$code .= " return \"$con\"\n"; |
684
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
685
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
686
|
|
|
|
|
|
|
|
687
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)!!(.*)/s) { |
688
|
0
|
|
|
|
|
0
|
my ($ant,$con,$cond) = ($1,$3,$4); |
689
|
0
|
0
|
|
|
|
0
|
$ICASE = "i" if $2 =~ m!i!; |
690
|
0
|
|
|
|
|
0
|
$ant = _expand_pseudo_classes($ant); |
691
|
|
|
|
|
|
|
|
692
|
0
|
|
|
|
|
0
|
$code .= " if (m{^$ant}g$ICASE$DX) {\n"; |
693
|
0
|
|
|
|
|
0
|
$code .= " if ($cond) {\n"; |
694
|
0
|
|
|
|
|
0
|
$code .= " s{$ant}{}${ICASE}${DX};\n"; |
695
|
0
|
|
|
|
|
0
|
$code .= " pos = undef;\n"; |
696
|
0
|
|
|
|
|
0
|
$code .= " return $con;\n"; |
697
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
698
|
0
|
|
|
|
|
0
|
$code .= " }\n"; |
699
|
|
|
|
|
|
|
|
700
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=i?=>)(.*)/s) { |
701
|
|
|
|
|
|
|
|
702
|
4
|
|
|
|
|
11
|
my ($ant,$con) = ($1,$3); |
703
|
4
|
50
|
|
|
|
16
|
$ICASE = "i" if $2 =~ m!i!; |
704
|
4
|
|
|
|
|
11
|
$ant = _expand_pseudo_classes($ant); |
705
|
|
|
|
|
|
|
|
706
|
4
|
|
|
|
|
13
|
$code .= " if (m{^$ant}g$ICASE$DX) {\n"; |
707
|
4
|
|
|
|
|
9
|
$code .= " s{$ant}{}$ICASE$DX;\n"; |
708
|
4
|
|
|
|
|
6
|
$code .= " pos = undef;\n"; |
709
|
4
|
|
|
|
|
10
|
$code .= " return \"$con\"\n"; |
710
|
4
|
|
|
|
|
11
|
$code .= " }\n"; |
711
|
|
|
|
|
|
|
|
712
|
|
|
|
|
|
|
} elsif($rule =~ m/(.*?)(=(?:i=)?e(?:val)?=>)(.*)/s) { |
713
|
12
|
|
|
|
|
36
|
my ($ant,$con) = ($1,$3); |
714
|
12
|
50
|
|
|
|
53
|
$ICASE = "i" if $2 =~ m!i!; |
715
|
12
|
|
|
|
|
32
|
$ant = _expand_pseudo_classes($ant); |
716
|
|
|
|
|
|
|
|
717
|
12
|
|
|
|
|
34
|
$code .= " if (m{^$ant}g$ICASE$DX) {\n"; |
718
|
12
|
|
|
|
|
24
|
$code .= " s{$ant}{}${ICASE}${DX};\n"; |
719
|
12
|
|
|
|
|
14
|
$code .= " pos = undef;\n"; |
720
|
12
|
|
|
|
|
16
|
$code .= " return $con;\n"; |
721
|
12
|
|
|
|
|
30
|
$code .= " }\n"; |
722
|
|
|
|
|
|
|
|
723
|
|
|
|
|
|
|
} else { |
724
|
0
|
0
|
|
|
|
0
|
warn "Unknown rule in lexer mode: $rule\n" unless $rule =~ m!^\s*(#|$)!; |
725
|
|
|
|
|
|
|
} |
726
|
|
|
|
|
|
|
} |
727
|
|
|
|
|
|
|
|
728
|
|
|
|
|
|
|
##--- |
729
|
|
|
|
|
|
|
|
730
|
8
|
|
|
|
|
15
|
$code .= " }\n"; |
731
|
8
|
|
|
|
|
10
|
$code .= " return undef;\n"; |
732
|
8
|
|
|
|
|
10
|
$code .= "}\n"; |
733
|
|
|
|
|
|
|
|
734
|
8
|
|
|
|
|
113
|
$code; |
735
|
|
|
|
|
|
|
} |
736
|
|
|
|
|
|
|
|
737
|
|
|
|
|
|
|
|
738
|
|
|
|
|
|
|
FILTER { |
739
|
|
|
|
|
|
|
return if m!^(\s|\n)*$!; |
740
|
|
|
|
|
|
|
|
741
|
|
|
|
|
|
|
s!^!_regular_expressions()!e; |
742
|
|
|
|
|
|
|
|
743
|
|
|
|
|
|
|
print STDERR "BEFORE>>>>\n$_\n<<<<\n" if $DEBUG; |
744
|
|
|
|
|
|
|
|
745
|
|
|
|
|
|
|
s!^MRULES +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES!_mrules({}, $1,$2)!gem; |
746
|
|
|
|
|
|
|
|
747
|
|
|
|
|
|
|
s!^LRULES +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES!_lrules({}, $1,$2)!gem; |
748
|
|
|
|
|
|
|
|
749
|
|
|
|
|
|
|
s{^RULES((?:\/\w+)?) +(\w+)\s*?\n((?:.|\n)*?)^ENDRULES}{ |
750
|
|
|
|
|
|
|
my ($a,$b,$c) = ($1,$2,$3); |
751
|
|
|
|
|
|
|
my $conf = {map {($_=>$_)} split //,$a}; |
752
|
|
|
|
|
|
|
if (exists($conf->{'l'})) { |
753
|
|
|
|
|
|
|
_lrules($conf, $b, $c) |
754
|
|
|
|
|
|
|
} elsif (exists($conf->{'m'})) { |
755
|
|
|
|
|
|
|
_mrules($conf,$b,$c) |
756
|
|
|
|
|
|
|
} else { |
757
|
|
|
|
|
|
|
_rules($conf,$b,$c) |
758
|
|
|
|
|
|
|
} |
759
|
|
|
|
|
|
|
}gem; |
760
|
|
|
|
|
|
|
|
761
|
|
|
|
|
|
|
|
762
|
|
|
|
|
|
|
|
763
|
|
|
|
|
|
|
print STDERR "AFTER>>>>\n$_\n<<<<\n" if $DEBUG; |
764
|
|
|
|
|
|
|
|
765
|
|
|
|
|
|
|
$_ |
766
|
|
|
|
|
|
|
}; |
767
|
|
|
|
|
|
|
|
768
|
|
|
|
|
|
|
sub _compiler{ |
769
|
|
|
|
|
|
|
|
770
|
0
|
|
|
0
|
|
0
|
local $/ = undef; |
771
|
0
|
|
|
|
|
0
|
$_ = <>; |
772
|
|
|
|
|
|
|
|
773
|
0
|
|
|
|
|
0
|
print __compiler($_); |
774
|
|
|
|
|
|
|
} |
775
|
|
|
|
|
|
|
|
776
|
|
|
|
|
|
|
sub __compiler { |
777
|
6
|
|
|
6
|
|
2734
|
my $str = shift; |
778
|
|
|
|
|
|
|
|
779
|
6
|
|
|
|
|
14
|
for ($str) { |
780
|
|
|
|
|
|
|
|
781
|
6
|
|
|
|
|
35
|
s!use Text::RewriteRules;!_regular_expressions()!e; |
|
6
|
|
|
|
|
16
|
|
782
|
|
|
|
|
|
|
|
783
|
6
|
|
|
|
|
52
|
s!^MRULES +(\w+)\s*\n((?:.|\n)*?)^ENDRULES!_mrules({}, $1,$2)!gem; |
|
7
|
|
|
|
|
107
|
|
784
|
|
|
|
|
|
|
|
785
|
6
|
|
|
|
|
51
|
s!^LRULES +(\w+)\s*\n((?:.|\n)*?)^ENDRULES!_lrules({}, $1,$2)!gem; |
|
0
|
|
|
|
|
0
|
|
786
|
|
|
|
|
|
|
|
787
|
6
|
|
|
|
|
81
|
s{^RULES((?:\/\w+)?) +(\w+)\s*\n((?:.|\n)*?)^ENDRULES}{ |
788
|
37
|
|
|
|
|
108
|
my ($a,$b,$c) = ($1,$2,$3); |
789
|
|
|
|
|
|
|
|
790
|
37
|
|
|
|
|
77
|
my $conf = {map {($_=>$_)} split //,$a}; |
|
22
|
|
|
|
|
68
|
|
791
|
37
|
100
|
|
|
|
125
|
if (exists($conf->{'l'})) { |
|
|
100
|
|
|
|
|
|
792
|
4
|
|
|
|
|
9
|
_lrules($conf,$b,$c) |
793
|
|
|
|
|
|
|
} elsif (exists($conf->{'m'})) { |
794
|
2
|
|
|
|
|
11
|
_mrules($conf,$b,$c) |
795
|
|
|
|
|
|
|
} else { |
796
|
31
|
|
|
|
|
66
|
_rules($conf,$b,$c) |
797
|
|
|
|
|
|
|
} |
798
|
|
|
|
|
|
|
}gem; |
799
|
|
|
|
|
|
|
} |
800
|
|
|
|
|
|
|
|
801
|
6
|
|
|
|
|
231
|
return $str; |
802
|
|
|
|
|
|
|
} |
803
|
|
|
|
|
|
|
|
804
|
|
|
|
|
|
|
=head1 TUTORIAL |
805
|
|
|
|
|
|
|
|
806
|
|
|
|
|
|
|
At the moment, just a set of commented examples. |
807
|
|
|
|
|
|
|
|
808
|
|
|
|
|
|
|
Example1 -- from number to portuguese words (using traditional rewriting) |
809
|
|
|
|
|
|
|
|
810
|
|
|
|
|
|
|
Example2 -- Naif translator (using cursor-based rewriting) |
811
|
|
|
|
|
|
|
|
812
|
|
|
|
|
|
|
=head1 Conversion between numbers and words |
813
|
|
|
|
|
|
|
|
814
|
|
|
|
|
|
|
Yes, you can use L and similar (for other |
815
|
|
|
|
|
|
|
languages). Meanwhile, before it existed we needed to write such a |
816
|
|
|
|
|
|
|
conversion tool. |
817
|
|
|
|
|
|
|
|
818
|
|
|
|
|
|
|
Here I present a subset of the rules (for numbers bellow 1000). The |
819
|
|
|
|
|
|
|
generated text is Portuguese but I think you can get the idea. I'll |
820
|
|
|
|
|
|
|
try to create a version for English very soon. |
821
|
|
|
|
|
|
|
|
822
|
|
|
|
|
|
|
You can check the full code on the samples directory (file |
823
|
|
|
|
|
|
|
C). |
824
|
|
|
|
|
|
|
|
825
|
|
|
|
|
|
|
use Text::RewriteRules; |
826
|
|
|
|
|
|
|
|
827
|
|
|
|
|
|
|
RULES num2words |
828
|
|
|
|
|
|
|
100==>cem |
829
|
|
|
|
|
|
|
1(\d\d)==>cento e $1 |
830
|
|
|
|
|
|
|
0(\d\d)==>$1 |
831
|
|
|
|
|
|
|
200==>duzentos |
832
|
|
|
|
|
|
|
300==>trezentos |
833
|
|
|
|
|
|
|
400==>quatrocentos |
834
|
|
|
|
|
|
|
500==>quinhentos |
835
|
|
|
|
|
|
|
600==>seiscentos |
836
|
|
|
|
|
|
|
700==>setecentos |
837
|
|
|
|
|
|
|
800==>oitocentos |
838
|
|
|
|
|
|
|
900==>novecentos |
839
|
|
|
|
|
|
|
(\d)(\d\d)==>${1}00 e $2 |
840
|
|
|
|
|
|
|
|
841
|
|
|
|
|
|
|
10==>dez |
842
|
|
|
|
|
|
|
11==>onze |
843
|
|
|
|
|
|
|
12==>doze |
844
|
|
|
|
|
|
|
13==>treze |
845
|
|
|
|
|
|
|
14==>catorze |
846
|
|
|
|
|
|
|
15==>quinze |
847
|
|
|
|
|
|
|
16==>dezasseis |
848
|
|
|
|
|
|
|
17==>dezassete |
849
|
|
|
|
|
|
|
18==>dezoito |
850
|
|
|
|
|
|
|
19==>dezanove |
851
|
|
|
|
|
|
|
20==>vinte |
852
|
|
|
|
|
|
|
30==>trinta |
853
|
|
|
|
|
|
|
40==>quarenta |
854
|
|
|
|
|
|
|
50==>cinquenta |
855
|
|
|
|
|
|
|
60==>sessenta |
856
|
|
|
|
|
|
|
70==>setenta |
857
|
|
|
|
|
|
|
80==>oitenta |
858
|
|
|
|
|
|
|
90==>noventa |
859
|
|
|
|
|
|
|
0(\d)==>$1 |
860
|
|
|
|
|
|
|
(\d)(\d)==>${1}0 e $2 |
861
|
|
|
|
|
|
|
|
862
|
|
|
|
|
|
|
1==>um |
863
|
|
|
|
|
|
|
2==>dois |
864
|
|
|
|
|
|
|
3==>três |
865
|
|
|
|
|
|
|
4==>quatro |
866
|
|
|
|
|
|
|
5==>cinco |
867
|
|
|
|
|
|
|
6==>seis |
868
|
|
|
|
|
|
|
7==>sete |
869
|
|
|
|
|
|
|
8==>oito |
870
|
|
|
|
|
|
|
9==>nove |
871
|
|
|
|
|
|
|
0$==>zero |
872
|
|
|
|
|
|
|
0==> |
873
|
|
|
|
|
|
|
==> |
874
|
|
|
|
|
|
|
,==>, |
875
|
|
|
|
|
|
|
ENDRULES |
876
|
|
|
|
|
|
|
|
877
|
|
|
|
|
|
|
num2words(123); # returns "cento e vinte e três" |
878
|
|
|
|
|
|
|
|
879
|
|
|
|
|
|
|
=head2 Naif translator (using cursor-based rewriting) |
880
|
|
|
|
|
|
|
|
881
|
|
|
|
|
|
|
use Text::RewriteRules; |
882
|
|
|
|
|
|
|
%dict=(driver=>"motorista", |
883
|
|
|
|
|
|
|
the=>"o", |
884
|
|
|
|
|
|
|
of=>"de", |
885
|
|
|
|
|
|
|
car=>"carro"); |
886
|
|
|
|
|
|
|
|
887
|
|
|
|
|
|
|
$word='\b\w+\b'; |
888
|
|
|
|
|
|
|
|
889
|
|
|
|
|
|
|
if( b(a("I see the Driver of the car")) eq "(I) (see) o Motorista do carro" ) |
890
|
|
|
|
|
|
|
{print "ok\n"} |
891
|
|
|
|
|
|
|
else {print "ko\n"} |
892
|
|
|
|
|
|
|
|
893
|
|
|
|
|
|
|
RULES/m a |
894
|
|
|
|
|
|
|
($word)==>$dict{$1}!! defined($dict{$1}) |
895
|
|
|
|
|
|
|
($word)=e=> ucfirst($dict{lc($1)}) !! defined($dict{lc($1)}) |
896
|
|
|
|
|
|
|
($word)==>($1) |
897
|
|
|
|
|
|
|
ENDRULES |
898
|
|
|
|
|
|
|
|
899
|
|
|
|
|
|
|
RULES/m b |
900
|
|
|
|
|
|
|
\bde o\b==>do |
901
|
|
|
|
|
|
|
ENDRULES |
902
|
|
|
|
|
|
|
|
903
|
|
|
|
|
|
|
=head1 AUTHOR |
904
|
|
|
|
|
|
|
|
905
|
|
|
|
|
|
|
Alberto Simões, C<< >> |
906
|
|
|
|
|
|
|
|
907
|
|
|
|
|
|
|
José João Almeida, C<< >> |
908
|
|
|
|
|
|
|
|
909
|
|
|
|
|
|
|
=head1 BUGS |
910
|
|
|
|
|
|
|
|
911
|
|
|
|
|
|
|
We know documentation is missing and you all want to use this module. |
912
|
|
|
|
|
|
|
In fact we are using it a lot, what explains why we don't have the |
913
|
|
|
|
|
|
|
time to write documentation. |
914
|
|
|
|
|
|
|
|
915
|
|
|
|
|
|
|
Please report any bugs or feature requests to |
916
|
|
|
|
|
|
|
C, or through the web interface at |
917
|
|
|
|
|
|
|
L. I will be notified, and then you'll automatically |
918
|
|
|
|
|
|
|
be notified of progress on your bug as I make changes. |
919
|
|
|
|
|
|
|
|
920
|
|
|
|
|
|
|
=head1 ACKNOWLEDGEMENTS |
921
|
|
|
|
|
|
|
|
922
|
|
|
|
|
|
|
Damian Conway for Filter::Simple |
923
|
|
|
|
|
|
|
|
924
|
|
|
|
|
|
|
=head1 COPYRIGHT & LICENSE |
925
|
|
|
|
|
|
|
|
926
|
|
|
|
|
|
|
Copyright 2004-2012 Alberto Simões and José João Almeida, All Rights Reserved. |
927
|
|
|
|
|
|
|
|
928
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify it |
929
|
|
|
|
|
|
|
under the same terms as Perl itself. |
930
|
|
|
|
|
|
|
|
931
|
|
|
|
|
|
|
=cut |
932
|
|
|
|
|
|
|
|
933
|
|
|
|
|
|
|
1; # End of Text::RewriteRules |