File Coverage

blib/lib/PPIx/Regexp.pm

Criterion	Covered	Total	%
statement	117	119	98.3
branch	41	50	82.0
condition	12	12	100.0
subroutine	29	30	96.6
pod	18	18	100.0
total	217	229	94.7

line	stmt	bran	cond	sub	pod	time	code
1							=head1 NAME
2
3							PPIx::Regexp - Represent a regular expression of some sort
4
5							=head1 SYNOPSIS
6
7							use PPIx::Regexp;
8							use PPIx::Regexp::Dumper;
9							my $re = PPIx::Regexp->new( 'qr{foo}smx' );
10							PPIx::Regexp::Dumper->new( $re )
11							->print();
12
13							=head1 DEPRECATION NOTICE
14
15							The C<postderef> argument to L<new()|/new> is retracted, and
16							postfix dereferences are always recognized.
17
18							Starting with version 0.074_01, the first use of this argument warned.
19							With version 0.079_01, all uses warned. With version 0.080_01,
20							all uses became fatal. With version 0.084_01, all mention of this
21							argument was removed, except for this notice.
22
23							=head1 INHERITANCE
24
25							C<PPIx::Regexp> is a L<PPIx::Regexp::Node|PPIx::Regexp::Node>.
26
27							C<PPIx::Regexp> has no descendants.
28
29							=head1 DESCRIPTION
30
31							The purpose of the F<PPIx-Regexp> package is to parse regular
32							expressions in a manner similar to the way the L<PPI|PPI> package parses
33							Perl. This class forms the root of the parse tree, playing a role
34							similar to L<PPI::Document|PPI::Document>.
35
36							This package shares with L<PPI|PPI> the property of being round-trip
37							safe. That is,
38
39							my $expr = 's/ ( \d+ ) ( \D+ ) /$2$1/smxg';
40							my $re = PPIx::Regexp->new( $expr );
41							print $re->content() eq $expr ? "yes\n" : "no\n";
42
43							should print 'yes' for any valid regular expression.
44
45							Navigation is similar to that provided by L<PPI|PPI>. That is to say,
46							things like C<children>, C<find_first>, C<snext_sibling> and so on all
47							work pretty much the same way as in L<PPI|PPI>.
48
49							The class hierarchy is also similar to L<PPI|PPI>. Except for some
50							utility classes (the dumper, the lexer, and the tokenizer), all classes
51							are descended from L<PPIx::Regexp::Element|PPIx::Regexp::Element>, which
52							provides basic navigation. Tokens are descended from
53							L<PPIx::Regexp::Token|PPIx::Regexp::Token>, which provides content. All
54							containers are descended from L<PPIx::Regexp::Node|PPIx::Regexp::Node>,
55							which provides for children, and all structure elements are descended
56							from L<PPIx::Regexp::Structure|PPIx::Regexp::Structure>, which provides
57							beginning and ending delimiters, and a type.
58
59							There are two features of L<PPI|PPI> that this package does not provide:
60							mutability and operator overloading. There are no plans for serious
61							mutability, though something like L<PPI|PPI>'s C<prune> functionality
62							might be considered. Similarly, there are no plans for operator
63							overloading, which appears to the author to represent a performance hit
64							for little tangible gain.
65
66							=head1 NOTICE
67
68							The author will attempt to preserve the documented interface, but if the
69							interface needs to change to correct some egregiously bad design or
70							implementation decision, then it will change. Any incompatible changes
71							will go through a deprecation cycle.
72
73							The goal of this package is to parse well-formed regular expressions
74							correctly. A secondary goal is not to blow up on ill-formed regular
75							expressions. The correct identification and characterization of
76							ill-formed regular expressions is B<not> a goal of this package, nor is
77							the consistent parsing of ill-formed regular expressions from release to
78							release.
79
80							This package attempts to track features in development releases as well
81							as public releases. However, features added in a development release and
82							then removed before the next production release B<will not> be tracked,
83							and any functionality relating to such features B<will be removed>. The
84							issue here is the potential re-use (with different semantics) of syntax
85							that did not make it into the production release.
86
87							From time to time the Perl regular expression engine changes in ways
88							that change the parse of a given regular expression. When these changes
89							occur, C<PPIx::Regexp> will be changed to produce the more modern parse.
90							Known examples of this include:
91
92							=over
93
94							=item C<$(> no longer interpolates as of Perl 5.005, per C<perl5005delta>.
95
96							Newer Perls seem to parse this as C<qr{$}> (i.e. an end-of-string or
97							newline assertion) followed by an open parenthesis, and that is what
98							C<PPIx::Regexp> does.
99
100							=item C<$)> and C<$|> also seem to parse as the C<$> assertion
101
102							followed by the relevant meta-character, though I have no documentation
103							reference for this.
104
105							=item C<@+> and C<@-> no longer interpolate as of Perl 5.9.4
106
107							per C<perl594delta>. Subsequent Perls treat C<@+> as a quantified
108							literal and C<@-> as two literals, and that is what C<PPIx::Regexp>
109							does. Note that subscripted references to these arrays B<do>
110							interpolate, and are so parsed by C<PPIx::Regexp>.
111
112							=item Only space and horizontal tab are whitespace as of Perl 5.23.4
113
114							when inside a bracketed character class inside an extended bracketed
115							character class, per C<perl5234delta>. Formerly any white space
116							character parsed as whitespace. This change in C<PPIx::Regexp> will be
117							reverted if the change in Perl does not make it into Perl 5.24.0.
118
119							=item Unescaped literal left curly brackets
120
121							These are being removed in positions where quantifiers are legal, so
122							that they can be used for new functionality. Some of them are gone in
123							5.25.1, others will be removed in a future version of Perl. In
124							situations where they have been removed,
125							L<perl_version_removed()|PPIx::Regexp::Element/perl_version_removed>
126							will return the version in which they were removed. When the new
127							functionality appears, the parse produced by this software will reflect
128							the new functionality.
129
130							B<NOTE> that the situation with a literal left curly after a literal
131							character is complicated. It was made an error in Perl 5.25.1, and
132							remained so through all 5.26 releases, but became a warning again in
133							5.27.1 due to its use in GNU Autoconf. Whether it will ever become
134							illegal again is not clear to me based on the contents of
135							F<perl5271delta>. At the moment
136							L<perl_version_removed()|PPIx::Regexp::Element/perl_version_removed>
137							returns C<undef>, but obviously that is not the whole story, and methods
138							L<accepts_perl()|PPIx::Regexp::Element/accepts_perl> and
139							L<requirements_for_perl()|PPIx::Regexp::Element/requirements_for_perl>
140							were introduced to deal with this complication.
141
142							=item C<\o{...}>
143
144							is parsed as the octal equivalent of C<\x{...}>. This is its meaning as
145							of Perl 5.13.2. Before 5.13.2 it was simply a literal C<'o'>, a literal C<'{'>, and so on.
146
147							=item C<x{,3}>
148
149							(with first count omitted) is allowed as a quantifier as of Perl 5.33.6.
150							The previous parse made this all literals.
151
152							=item C<x{ 0 , 3 }>
153
154							(with spaces inside but adjacent to curly brackets, or around the comma
155							if any) is allowed as a quantifier as of Perl 5.33.6. The previous parse
156							made this all literals.
157
158							=back
159
160							There are very probably other examples of this. When they come to light
161							they will be documented as producing the modern parse, and the code
162							modified to produce this parse if necessary.
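The two quantifier forms listed above can be illustrated with a regular expression that distinguishes a Perl 5.33.6-style brace quantifier from literal text. This is a minimal sketch, not the logic the tokenizer actually uses. It assumes that "spaces" means blanks (space or tab), and that C<{,}>, with neither count present, is still literal.

```perl
use strict;
use warnings;

# Sketch only: recognize the brace-quantifier forms described above.
# Assumptions: "spaces" means blanks (space or tab), and '{,}' with
# neither count present is still literal text.
my $quantifier = qr/
    \A \{ [ \t]*
    (?:
        \d+ [ \t]* (?: , [ \t]* \d* [ \t]* )?   # {n}, {n,} or {n,m}
      | , [ \t]* \d+ [ \t]*                     # {,m}, new in 5.33.6
    )
    \} \z
/smx;

foreach my $text ( '{,3}', '{ 0 , 3 }', '{2}', '{,}', '{a}' ) {
    printf "%-10s %s\n", $text,
        $text =~ $quantifier ? 'quantifier' : 'literal';
}
```

Under C</x>, whitespace inside a bracketed character class is still significant, which is why C<[ \t]> matches a literal space or tab.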
163
164							=head1 METHODS
165
166							This class provides the following public methods. Methods not documented
167							here are private, and unsupported in the sense that the author reserves
168							the right to change or remove them without notice.
169
170							=cut
171
172							package PPIx::Regexp;
173
174	9			9		430940	use strict;
	9					26
	9					243
175	9			9		33	use warnings;
	9					22
	9					362
176
177	9			9		33	use base qw{ PPIx::Regexp::Node };
	9					13
	9					3809
178
179	9			9		44	use Carp;
	9					10
	9					532
180	9					813	use PPIx::Regexp::Constant qw{
181							ARRAY_REF
182							LOCATION_LINE
183							LOCATION_CHARACTER
184							LOCATION_COLUMN
185							LOCATION_LOGICAL_LINE
186							LOCATION_LOGICAL_FILE
187							@CARP_NOT
188	9			9		37	};
	9					12
189	9			9		4178	use PPIx::Regexp::Lexer ();
	9					31
	9					202
190	9			9		48	use PPIx::Regexp::Token::Modifier (); # For its modifier manipulations.
	9					16
	9					127
191	9			9		29	use PPIx::Regexp::Tokenizer;
	9					14
	9					235
192	9					413	use PPIx::Regexp::Util qw{
193							__choose_tokenizer_class
194							__instance
195	9			9		28	};
	9					12
196	9			9		36	use Scalar::Util qw{ refaddr };
	9					13
	9					10560
197
198							our $VERSION = '0.091';
199
200							=head2 new
201
202							my $re = PPIx::Regexp->new('/foo/');
203
204							This method instantiates a C<PPIx::Regexp> object from a string, a
205							L<PPI::Token::QuoteLike::Regexp|PPI::Token::QuoteLike::Regexp>, a
206							L<PPI::Token::Regexp::Match|PPI::Token::Regexp::Match>, or a
207							L<PPI::Token::Regexp::Substitute|PPI::Token::Regexp::Substitute>.
208							Honestly, any L<PPI::Element|PPI::Element> will work, but only the three
209							Regexp classes mentioned previously are likely to do anything useful.
210
211							Whatever form the argument takes, it is assumed to consist entirely of a
212							valid match, substitution, or C<< qr<> >> string.
213
214							Optionally you can pass one or more name/value pairs after the regular
215							expression. The possible options are:
216
217							=over
218
219							=item default_modifiers array_reference
220
221							This option specifies a reference to an array of default modifiers to
222							apply to the regular expression being parsed. Each modifier is specified
223							as a string. Any actual modifiers found supersede the defaults.
224
225							When applying the defaults, C<'?'> and C<'/'> are completely ignored,
226							and C<'^'> is ignored unless it occurs at the beginning of the modifier.
227							The first dash (C<'-'>) causes subsequent modifiers to be negated.
228
229							So, for example, if you wish to produce a C<PPIx::Regexp> object
230							representing the regular expression in
231
232							use re '/smx';
233							{
234							no re '/x';
235							m/ foo /;
236							}
237
238							you would (after some help from L<PPI|PPI> in finding the relevant
239							statements) do something like
240
241							my $re = PPIx::Regexp->new( 'm/ foo /',
242							default_modifiers => [ '/smx', '-/x' ] );
243
244							=item encoding name
245
246							This option specifies the encoding of the regular expression. This is
247							passed to the tokenizer, which will C<decode> the regular expression
248							string before it tokenizes it. For example:
249
250							my $re = PPIx::Regexp->new( '/foo/',
251							encoding => 'iso-8859-1',
252							);
253
254							=item index_locations Boolean
255
256							This Boolean option specifies whether the locations of the elements in
257							the regular expression should be indexed.
258
259							If unspecified or specified as C<undef> a default value is used. This
260							default is true if the argument is a L<PPI::Element|PPI::Element> or the
261							C<location> option was specified. Otherwise the default is false.
262
263							=item location array_reference
264
265							This option specifies the location of the new object in the document
266							from which it was created. It is a reference to a five-element array
267							compatible with that returned by the C<location()> method of
268							L<PPI::Element|PPI::Element>.
269
270							If not specified, the location of the original string is used if it was
271							specified as a L<PPI::Element|PPI::Element>.
272
273							If no location can be determined, the various C<location()> methods will
274							return C<undef>.
275
276							=item postderef Boolean
277
278							B<THIS ARGUMENT IS DEPRECATED>.
279							See L<DEPRECATION NOTICE|/DEPRECATION NOTICE> above for the details.
280
281							This option is passed on to the tokenizer, where it specifies whether
282							postfix dereferences are recognized in interpolations and code. This
283							experimental feature was introduced in Perl 5.19.5.
284
285							As of version 0.074_01, the default is true. Through release 0.074, the
286							default was the value of
287							C<$PPIx::Regexp::Tokenizer::DEFAULT_POSTDEREF>, which was true. When
288							originally introduced this was false, but was documented as becoming
289							true when and if postfix dereferencing became mainstream. The intent to
290							mainstream was announced with Perl 5.23.1, and became official (so to
291							speak) with Perl 5.24.0, so the default became true with L<PPIx::Regexp>
292							0.049_01.
293
294							Note that if L<PPI|PPI> starts unconditionally recognizing postfix
295							dereferences, this argument will immediately become ignored, and will be
296							put through a deprecation cycle and removed.
297
298							=item strict Boolean
299
300							This option is passed on to the tokenizer and lexer, where it specifies
301							whether the parse should assume C<use re 'strict'> is in effect.
302
303							The C<'strict'> pragma was introduced in Perl 5.22, and its
304							documentation says that it is experimental, and that there is no
305							commitment to backward compatibility. The same applies to the
306							parse produced when this option is asserted. Also, the usual caveat
307							applies: if C<use re 'strict'> ends up being retracted, this option and
308							all related functionality will be removed as well.
309
310							Given the nature of C<use re 'strict'>, you should expect that if you
311							assert this option, regular expressions that previously parsed without
312							error might no longer do so. If an element ends up being declared an
313							error because this option is set, its C<perl_version_introduced()> will
314							be the Perl version at which C<use re 'strict'> started rejecting these
315							elements.
316
317							The default is false.
318
319							=item trace number
320
321							If greater than zero, this option causes trace output from the parse.
322							The author reserves the right to change or eliminate this without
323							notice.
324
325							=back
326
327							Passing optional input other than the above is not an error, but neither
328							is it supported.
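The rules for C<default_modifiers> can be made concrete with a small folding function. This is a sketch under stated assumptions, not how C<PPIx::Regexp> implements them. C<fold_modifiers> is a hypothetical name. The documentation does not say what a leading C<'^'> does, so the sketch assumes it clears any modifiers accumulated so far, loosely mirroring Perl's C<(?^...)>. Passing the regexp's own modifiers last would let them supersede the defaults.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PPIx::Regexp: fold modifier strings
# such as '/smx' and '-/x' into a hash of modifier => 1 (asserted) or
# 0 (negated), following the rules listed for default_modifiers.
sub fold_modifiers {
    my @specs = @_;
    my %state;
    foreach my $spec ( @specs ) {
        # Assumption: a leading '^' clears what came before, loosely
        # mirroring Perl's (?^...) construct.
        %state = () if $spec =~ s/ \A \^ //smx;
        my $value = 1;
        foreach my $char ( split //, $spec ) {
            next if $char eq '?' || $char eq '/' || $char eq '^';  # ignored
            if ( $char eq '-' ) {
                $value = 0;    # the dash negates the modifiers after it
                next;
            }
            $state{$char} = $value;
        }
    }
    return \%state;
}

# The default_modifiers example from the documentation above.
my $state = fold_modifiers( '/smx', '-/x' );
print join( ', ', map { "$_=$state->{$_}" } sort keys %state ), "\n";
```

For the C<use re '/smx'> / C<no re '/x'> example this prints C<m=1, s=1, x=0>: C<s> and C<m> remain asserted while C<x> is turned off.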
329
330							=cut
331
332							{
333
334							my $errstr;
335
336							sub new {
337	335			335	1	302794	my ( $class, $content, %args ) = @_;
338	335	50				968	ref $class and $class = ref $class;
339
340							# We have to do this very early so the tokenizer can see it.
341							defined $args{index_locations}
342							or $args{index_locations} = (
343	335	50	100			2333	!! $args{location} || __instance( $content, 'PPI::Element' ) );
344
345	335					665	$errstr = undef;
346
347							# As of 0.068_01 this either fails or returns
348							# PPIx::Regexp::Tokenizer
349	335					1024	my $tokenizer_class = __choose_tokenizer_class( $content, \%args );
350
351							my $tokenizer = $tokenizer_class->new(
352	335	100				2132	$content, %args ) or do {
353	1					4	$errstr = PPIx::Regexp::Tokenizer->errstr();
354	1					3	return;
355							};
356
357	334					1964	my $lexer = PPIx::Regexp::Lexer->new( $tokenizer, %args );
358	334					1296	my @nodes = $lexer->lex();
359	334					1091	my $self = $class->SUPER::__new( @nodes );
360	334					873	$self->{index_locations} = $args{index_locations};
361	334					770	$self->{source} = $content;
362	334					984	$self->{failures} = $lexer->failures();
363							$self->{effective_modifiers} =
364	334					1215	$tokenizer->__effective_modifiers();
365	334	100				889	if ( $args{location} ) {
366							ARRAY_REF eq ref $args{location}
367	1	50				3	or croak q<Argument 'location' must be an array reference>;
368	1					3	foreach my $inx ( 0 .. 3 ) {
369	4	50				9	$args{location}[$inx] =~ m/ [^0-9] /smx
370							and croak "Argument 'location' element $inx must be an unsigned integer";
371							}
372	1					6	$self->{location} = $args{location};
373							}
374	334					5203	return $self;
375							}
376
377							sub errstr {
378	2			2	1	3	return $errstr;
379							}
380
381							}
382
383							=head2 new_from_cache
384
385							This static method wraps L</new> in a caching mechanism. Only one object
386							will be generated for a given L<PPI::Element|PPI::Element>, no matter
387							how many times this method is called. Calls after the first for a given
388							L<PPI::Element|PPI::Element> simply return the same C<PPIx::Regexp>
389							object.
390
391							When the C<PPIx::Regexp> object is returned from cache, the values of
392							the optional arguments are ignored.
393
394							Calls to this method with the regular expression in a string rather than
395							a L<PPI::Element|PPI::Element> will not be cached.
396
397							B<Caveat:> This method is provided for code like
398							L<Perl::Critic|Perl::Critic> which might instantiate the same object
399							multiple times. The cache will persist until L</flush_cache> is called.
400
401							=head2 flush_cache
402
403							$re->flush_cache(); # Remove $re from cache
404							PPIx::Regexp->flush_cache(); # Empty the cache
405
406							This method flushes the cache used by L</new_from_cache>. If called as a
407							static method with no arguments, the entire cache is emptied. Otherwise
408							any objects specified are removed from the cache.
409
410							=cut
411
412							{
413
414							my %cache;
415
416							our $DISABLE_CACHE; # Leave this undocumented, at least for
417							# now.
418
419							sub __cache_size {
420	8			8		74	return scalar keys %cache;
421							}
422
423							sub new_from_cache {
424	6			6	1	5626	my ( $class, $content, %args ) = @_;
425
426	6	100				26	__instance( $content, 'PPI::Element' )
427							or return $class->new( $content, %args );
428
429	5	100				18	$DISABLE_CACHE and return $class->new( $content, %args );
430
431	3					5	my $addr = refaddr( $content );
432	3	100				8	exists $cache{$addr} and return $cache{$addr};
433
434	2	50				12	my $self = $class->new( $content, %args )
435							or return;
436
437	2					6	$cache{$addr} = $self;
438
439	2					5	return $self;
440
441							}
442
443							sub flush_cache {
444	4			4	1	4992	my @args = @_;
445
446	4	100				23	ref $args[0] or shift @args;
447
448	4	100				15	if ( @args ) {
449	3					8	foreach my $obj ( @args ) {
450	3	100	100			19	if ( __instance( $obj, __PACKAGE__ ) &&
451							__instance( ( my $parent = $obj->source() ),
452							'PPI::Element' ) ) {
453	1					7	delete $cache{ refaddr( $parent ) };
454							}
455							}
456							} else {
457	1					4	%cache = ();
458							}
459	4					8	return;
460							}
461
462							}
463
464	0			0	1	0	sub can_be_quantified { return; }
465
466							=head2 capture_names
467
468							foreach my $name ( $re->capture_names() ) {
469							print "Capture name '$name'\n";
470							}
471
472							This convenience method returns the capture names found in the regular
473							expression.
474
475							This method is equivalent to
476
477							$self->regular_expression()->capture_names();
478
479							except that if C<< $self->regular_expression() >> returns C<undef>
480							(meaning that something went terribly wrong with the parse) this method
481							will simply return.
482
483							=cut
484
485							sub capture_names {
486	3			3	1	7	my ( $self ) = @_;
487	3	100				7	my $re = $self->regular_expression() or return;
488	2					7	return $re->capture_names();
489							}
490
491							=head2 delimiters
492
493							print join("\t", PPIx::Regexp->new('s/foo/bar/')->delimiters());
494							# prints '//', a tab, and '//'
495
496							When called in list context, this method returns either one or two
497							strings, depending on whether the parsed expression has a replacement
498							string. In the case of non-bracketed substitutions, the start delimiter
499							of the replacement string is considered to be the same as its finish
500							delimiter, as illustrated by the above example.
501
502							When called in scalar context, you get the delimiters of the regular
503							expression; that is, element 0 of the array that is returned in list
504							context.
505
506							Optionally, you can pass an index value and the corresponding delimiters
507							will be returned; index 0 represents the regular expression's
508							delimiters, and index 1 represents the replacement string's delimiters,
509							which may be undef. For example,
510
511							print PPIx::Regexp->new('s{foo}<bar>')->delimiters(1);
512							# prints '<>'
513
514							If the object was not initialized with a valid regexp of some sort, the
515							results of this method are undefined.
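The delimiter rules above, in particular that a non-bracketed replacement's start delimiter is taken to be the same as its finish delimiter, can be sketched for simple substitutions. C<substitution_delimiters> is a hypothetical helper, not the method's implementation. It assumes non-alphanumeric delimiters, no escaped or nested delimiters inside the pattern, and nothing between the two bracketed parts.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PPIx::Regexp: derive the delimiter
# pairs of a simple s/// string the way delimiters() describes them.
# Assumes non-alphanumeric delimiters, no escaped or nested delimiters
# in the pattern, and nothing between the two bracketed parts.
my %close_for = ( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );

sub substitution_delimiters {
    my ( $expr ) = @_;
    ( my $body = $expr ) =~ s/ [[:alpha:]]* \z //smx;    # drop modifiers
    my $open  = substr $body, 1, 1;    # the character after the 's'
    my $close = $close_for{$open};

    # Non-bracketed: the replacement's start delimiter is taken to be
    # the same as its finish delimiter, so both pairs look alike.
    defined $close
        or return ( "$open$open", "$open$open" );

    # Bracketed: the replacement has its own opening delimiter, which
    # immediately follows the pattern's closing delimiter.
    my $open2  = substr $body, index( $body, $close, 2 ) + 1, 1;
    my $close2 = $close_for{$open2} // $open2;
    return ( "$open$close", "$open2$close2" );
}

print join( ' ', substitution_delimiters( 's{foo}<bar>' ) ), "\n";
```

A real parser must also handle escaped delimiters and whitespace between bracketed parts, which is why the documented method works from the parse tree rather than from the raw string.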
516
517							=cut
518
519							sub delimiters {
520	63			63	1	132	my ( $self, $inx ) = @_;
521
522	63					84	my @rslt;
523	63					111	foreach my $method ( qw{ regular_expression replacement } ) {
524	126	100				342	defined ( my $obj = $self->$method() ) or next;
525	68					333	push @rslt, $obj->delimiters();
526							}
527
528	63	100				130	defined $inx and return $rslt[$inx];
529	57	50				117	wantarray and return @rslt;
530	57	50				238	defined wantarray and return $rslt[0];
531	0					0	return;
532							}
533
534							=head2 errstr
535
536							This static method returns the error string from the most recent attempt
537							to instantiate a C<PPIx::Regexp>. It will be C<undef> if the most recent
538							attempt succeeded.
539
540							=cut
541
542							# defined above, just after sub new.
543
544							sub explain {
545	1			1	1	3	return;
546							}
547
548							=head2 extract_regexps
549
550							my $doc = PPI::Document->new( $path );
551							$doc->index_locations();
552							my @res = PPIx::Regexp->extract_regexps( $doc );
553
554							This convenience (well, sort-of) static method takes as its argument a
555							L<PPI::Document|PPI::Document> object and returns C<PPIx::Regexp>
556							objects corresponding to all regular expressions found in it, in the
557							order in which they occur in the document. You will need to keep a
558							reference to the original L<PPI::Document|PPI::Document> object if you
559							wish to be able to recover the original L<PPI::Element|PPI::Element>
560							objects via the L<PPIx::Regexp|PPIx::Regexp>
561							L<source()|PPIx::Regexp/source> method.
562
563							=cut
564
565							sub extract_regexps {
566	2			2	1	672093	my ( $class, $doc ) = @_;
567	2	100				5	my @found = map { @{ $doc->find( $_ ) || [] } } qw{
	6					43660
	6					22
568							PPI::Token::QuoteLike::Regexp
569							PPI::Token::Regexp::Match
570							PPI::Token::Regexp::Substitute
571							};
572	3					14	return ( map { $class->new( $_ ) } map { $_->[0] }
	3					217
573	1	50				32	sort { $a->[1][0] <=> $b->[1][0] || $a->[1][1] <=> $b->[1][1] }
574	2					21737	map { [ $_, $_->location() ] }
	3					25329
575							@found
576							);
577							}
578
579							=head2 failures
580
581							print "There were ", $re->failures(), " parse failures\n";
582
583							This method returns the number of parse failures. This is a count of the
584							number of unknown tokens plus the number of unterminated structures plus
585							the number of unmatched right brackets of any sort.
586
587							=cut
588
589							sub failures {
590	287			287	1	591	my ( $self ) = @_;
591	287					708	return $self->{failures};
592							}
593
594							=head2 max_capture_number
595
596							print "Highest used capture number ",
597							$re->max_capture_number(), "\n";
598
599							This convenience method returns the highest capture number used by the
600							regular expression. If there are no captures, the return will be 0.
601
602							This method is equivalent to
603
604							$self->regular_expression()->max_capture_number();
605
606							except that if C<< $self->regular_expression() >> returns C<undef>
607							(meaning that something went terribly wrong with the parse) this method
608							will too.
609
610							=cut
611
612							sub max_capture_number {
613	6			6	1	12	my ( $self ) = @_;
614	6	100				15	my $re = $self->regular_expression() or return;
615	5					16	return $re->max_capture_number();
616							}
617
618							=head2 modifier
619
620							my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
621							print $re->modifier()->content(), "\n";
622							# prints 'smx'.
623
624							This method retrieves the modifier of the object. This comes from the
625							end of the initializing string or object and will be a
626							L<PPIx::Regexp::Token::Modifier|PPIx::Regexp::Token::Modifier>.
627
628							B<Note> that this object represents the actual modifiers present on the
629							regexp, and does not take into account any that may have been applied by
630							default (i.e. via the C<default_modifiers> argument to C<new()>). For
631							something that takes account of default modifiers, see
632							L<modifier_asserted()|/modifier_asserted>, below.
633
634							In the event of a parse failure, there may not be a modifier present, in
635							which case nothing is returned.
636
637							=cut
638
639							sub modifier {
640	3			3	1	7	my ( $self ) = @_;
641	3					12	return $self->_component( 'PPIx::Regexp::Token::Modifier' );
642							}
643
644							=head2 modifier_asserted
645
646							my $re = PPIx::Regexp->new( '/ . /',
647							default_modifiers => [ 'smx' ] );
648							print $re->modifier_asserted( 'x' ) ? "yes\n" : "no\n";
649							# prints 'yes'.
650
651							This method returns true if the given modifier is asserted for the
652							regexp, whether explicitly or by the modifiers passed in the
653							C<default_modifiers> argument.
654
655							Starting with version 0.036_01, if the argument is a
656							single-character modifier followed by an asterisk (intended as a wild
657							card character), the return is the number of times that modifier
658							appears. In this case an exception will be thrown if you specify a
659							multi-character modifier (e.g. C<'ee*'>), or if you specify one of the
660							match semantics modifiers (e.g. C<'a*'>).
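The wildcard form can be illustrated with a small counting function. This is a sketch of the documented behavior, not the code behind C<modifier_asserted()>. C<count_modifier> is a hypothetical name, and the sketch assumes the match semantics modifiers are C<a>, C<d>, C<l>, and C<u>.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper illustrating the documented 'x*' query: count how
# many times a single-character modifier appears in a modifier string.
# Assumption: the match semantics modifiers are a, d, l and u.
sub count_modifier {
    my ( $modifiers, $query ) = @_;
    my ( $char ) = $query =~ m/ \A ( . ) \* \z /smx
        or croak "'$query' is not one character followed by '*'";
    $char =~ m/ \A [adlu] \z /smx
        and croak "Can not count match semantics modifier '$char'";
    # Assigning the match to an empty list yields the number of matches.
    my $count = () = $modifiers =~ m/ \Q$char\E /smxg;
    return $count;
}

printf "%d\n", count_modifier( 'xxs', 'x*' );
```

Counting matters for modifiers such as C<x>, where C<xx> has a different meaning than a single C<x>.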
661
662							=cut
663
664							sub modifier_asserted {
665	15			15	1	23	my ( $self, $modifier ) = @_;
666							return PPIx::Regexp::Token::Modifier::__asserts(
667							$self->{effective_modifiers},
668	15					66	$modifier,
669							);
670							}
671
672							# This is a kluge for both determining whether the object asserts
673							# modifiers (hence the 'ducktype') and determining whether the given
674							# modifier is actually asserted. The signature is the invocant and the
675							# modifier name, which must not be undef. The return is a Boolean.
676							*__ducktype_modifier_asserted = \&modifier_asserted;
677
678							# As of Perl 5.21.1 you can not leave off the type of a '?'-delimited
679							# regexp. Because this is not associated with any single child we
680							# compute it here.
681							sub perl_version_removed {
682	56			56	1	128	my ( $self ) = @_;
683	56					228	my $v = $self->SUPER::perl_version_removed();
684	56	100	100			174	defined $v
685							and $v <= 5.021001
686							and return $v;
687	55	50				154	defined( my $delim = $self->delimiters() )
688							or return $v;
689	55	100	100			164	'??' eq $delim
690							and '' eq $self->type()->content()
691							and return '5.021001';
692	54					162	return $v;
693							}
694
695							=head2 regular_expression
696
697							my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
698							print $re->regular_expression()->content(), "\n";
699							# prints '/(foo)/'.
700
701							This method returns that portion of the object which actually represents
702							a regular expression.
703
704							=cut
705
706							sub regular_expression {
707	78			78	1	130	my ( $self ) = @_;
708	78					204	return $self->_component( 'PPIx::Regexp::Structure::Regexp' );
709							}
710
711							=head2 replacement
712
713							my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
714							print $re->replacement()->content(), "\n";
715							# prints '${1}bar/'.
716
717							This method returns that portion of the object which represents the
718							replacement string. This will be C<undef> unless the regular expression
719							actually has a replacement string. Delimiters will be included, but
720							there will be no beginning delimiter unless the regular expression was
721							bracketed.
722
723							=cut
724
725							sub replacement {
726	65			65	1	119	my ( $self ) = @_;
727	65					113	return $self->_component( 'PPIx::Regexp::Structure::Replacement' );
728							}
729
730							=head2 source
731
732							my $source = $re->source();
733
734							This method returns the object or string that was used to instantiate
735							the object.
736
737							=cut
738
739							sub source {
740	5			5	1	11	my ( $self ) = @_;
741	5					20	return $self->{source};
742							}
743
744							=head2 type
745
746							my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
747							print $re->type()->content(), "\n";
748							# prints 's'.
749
750							This method retrieves the type of the object. This comes from the
751							beginning of the initializing string or object, and will be a
752							L<PPIx::Regexp::Token::Structure|PPIx::Regexp::Token::Structure>
753							whose C<content> is one of 's',
754							'm', 'qr', or ''.
755
756							=cut
757
758							sub type {
759	4			4	1	7	my ( $self ) = @_;
760	4					10	return $self->_component( 'PPIx::Regexp::Token::Structure' );
761							}
762
763							sub _component {
764	150			150		242	my ( $self, $class ) = @_;
765	150					295	foreach my $elem ( $self->children() ) {
766	371	100				1277	$elem->isa( $class ) and return $elem;
767							}
768	60					157	return;
769							}
770
771							1;
772
773							__END__
774
775							=head1 RESTRICTIONS
776
777							By the nature of this module, it is never going to get everything right.
778							Many of the known problem areas involve interpolations one way or
779							another.
780
781							=head2 Ambiguous Syntax
782
783							Perl's regular expressions contain cases where the syntax is ambiguous.
784							A particularly egregious example is an interpolation followed by square
785							or curly brackets, for example C<$foo[...]>. There is nothing in the
786							syntax to say whether the programmer wanted to interpolate an element of
787							array C<@foo>, or whether he wanted to interpolate scalar C<$foo>, and
788							then follow that interpolation by a character class.
789
790							The F<perlop> documentation notes that in this case what Perl does is to
791							guess. That is, it employs various heuristics on the code to try to
792							figure out what the programmer wanted. These heuristics are documented
793							as being undocumented (!) and subject to change without notice. As an
794							example of the problems even F<perl> faces in parsing Perl, see
795							L<https://github.com/perl/perl5/issues/16478>.
796
797							Given this situation, this module's chances of duplicating every Perl
798							version's interpretation of every regular expression are pretty much nil.
799							What it currently does is assume that square brackets containing B<only> an
800							integer or an interpolation represent a subscript; otherwise they
801							represent a character class. Similarly, curly brackets containing
802							B<only> a bareword or an interpolation are a subscript; otherwise they
803							represent a quantifier.
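
A minimal standalone sketch of the bracket heuristic described above follows. It only illustrates the documented rule and is not how C<PPIx::Regexp> implements it internally. The function name C<classify_brackets> is hypothetical. Recognizing an interpolation as a sigil followed by word characters is a simplifying assumption.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented heuristic only; it is
# NOT PPIx::Regexp's actual implementation.
sub classify_brackets {
    my ( $open, $content ) = @_;
    # Assumption: an "interpolation" is approximated as a sigil plus word characters.
    my $interp = qr{ \A [\$\@] \w+ \z }x;
    if ( $open eq '[' ) {
        # Only an integer or an interpolation: subscript; otherwise character class.
        return ( $content =~ m/ \A -? \d+ \z /x || $content =~ $interp )
            ? 'subscript' : 'character class';
    }
    if ( $open eq '{' ) {
        # Only a bareword or an interpolation: subscript; otherwise quantifier.
        return ( $content =~ m/ \A [[:alpha:]_] \w* \z /x || $content =~ $interp )
            ? 'subscript' : 'quantifier';
    }
    die "Unexpected bracket '$open'\n";
}

my %close = ( '[' => ']', '{' => '}' );
for my $case ( [ '[', '0' ], [ '[', '$i' ], [ '[', 'a-z' ],
               [ '{', 'key' ], [ '{', '2,3' ] ) {
    my ( $open, $content ) = @$case;
    printf "\$foo%s%s%s -> %s\n", $open, $content, $close{$open},
        classify_brackets( $open, $content );
}
```

Under this rule, C<$foo[a-z]> is read as C<$foo> followed by a character class, and C<$foo{2,3}> as C<$foo> followed by a quantifier. Perl's own undocumented heuristics may decide otherwise.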
804
805							=head2 Changes in Syntax
806
807							Sometimes the introduction of new syntax changes the way a regular
808							expression is parsed. For example, the C<\v> character class was
809							introduced in Perl 5.9.5. Before that version it was not a syntax error;
810							it was simply parsed as C<v>. So
811
812							$ perl -le 'print "v" =~ m/\v/ ? "yes" : "no"'
813
814							prints "yes" under Perl 5.8.9, but "no" under 5.10.0. C<PPIx::Regexp>
815							generally assumes the more modern parse in cases like this.
816
817							=head2 Equivocation
818
819							Very occasionally, a construction is removed and then added back (and
820							then, conceivably, removed again). In this case, the plan is for
821							L<perl_version_introduced()\|PPIx::Regexp::Element/perl_version_introduced>
822							to return the earliest version in which the construction appeared, and
823							L<perl_version_removed()\|PPIx::Regexp::Element/perl_version_removed> to
824							return the version after the last version in which it appeared (whether
825							production or development), or C<undef> if it is in the highest-numbered
826							Perl.
827
828							The constructions involved in this are:
829
830							=head3 Un-escaped literal left curly after literal
831
832							That is, something like C<< qr<x{> >>.
833
834							This was made an error in C<5.25.1>, and remained an error in C<5.26.0>.
835							But it became a warning again in C<5.27.1>. The F<perl5271delta> says it
836							was reinstated because the change broke GNU Autoconf, and the warning
837							message says it will be removed in Perl C<5.30>.
838
839							Accordingly,
840							L<perl_version_introduced()\|PPIx::Regexp::Element/perl_version_introduced>
841							returns C<5.0>. At the moment
842							L<perl_version_removed()\|PPIx::Regexp::Element/perl_version_removed> returns
843							C<'5.025001'>. But if it is present with or without warning in C<5.28>,
844							L<perl_version_removed()\|PPIx::Regexp::Element/perl_version_removed> will become
845							C<undef>. If you need finer resolution than this, see
846							L<PPIx::Regexp::Element\|PPIx::Regexp::Element> methods
847							L<accepts_perl()\|PPIx::Regexp::Element/accepts_perl> and
848							L<requirements_for_perl()\|PPIx::Regexp::Element/requirements_for_perl>.
849
850							=head2 Static Parsing
851
852							It is well known that Perl cannot be statically parsed. That is, you
853							cannot completely parse a piece of Perl code without executing that
854							same code.
855
856							Nevertheless, this class is trying to statically parse regular
857							expressions. The main problem with this is that there is no way to know
858							what is being interpolated into the regular expression by an
859							interpolated variable. This is a problem because the interpolated value
860							can change the interpretation of adjacent elements.
861
862							This module deals with this by making assumptions about what is in an
863							interpolated variable. These assumptions will not be enumerated here,
864							but in general the principle is to assume the interpolated value does
865							not change the interpretation of the regular expression. For example,
866
867							my $foo = 'a-z]';
868							my $re = qr{[$foo};
869
870							is fine with the Perl interpreter, but will confuse the dickens out of
871							this module. Similarly and more usefully, something like
872
873							my $mods = 'i';
874							my $re = qr{(?$mods:foo)};
875
876							or maybe
877
878							my $mods = 'i';
879							my $re = qr{(?$mods)$foo};
880
881							probably sets a modifier of some sort, and that is how this module
882							interprets it. If the interpolation is B<not> about modifiers, this
883							module will get it wrong. Another such semi-benign example is
884
885							my $foo = $] >= 5.010 ? '?<foo>' : '';
886							my $re = qr{($foo\w+)};
887
888							which will parse, but this module will never realize that it might be
889							looking at a named capture.
890
891							=head2 Non-Standard Syntax
892
893							There are modules out there that alter the syntax of Perl. If the syntax
894							of a regular expression is altered, this module has no way to understand
895							that it has been altered, much less to adapt to the alteration. The
896							following modules are known to cause problems:
897
898							L<Acme::PerlML\|Acme::PerlML>, which renders Perl as XML.
899
900							C<Data::PostfixDeref>, which causes Perl to interpret suffixed empty
901							brackets as dereferencing the thing they suffix. This module by Ben
902							Morrow (C<BMORROW>) appears to have been retracted.
903
904							L<Filter::Trigraph\|Filter::Trigraph>, which recognizes ANSI C trigraphs,
905							allowing Perl to be written in the ISO 646 character set.
906
907							L<Perl6::Pugs\|Perl6::Pugs>. Enough said.
908
909							L<Perl6::Rules\|Perl6::Rules>, which back-ports some of the Perl 6
910							regular expression syntax to Perl 5.
911
912							L<Regexp::Extended\|Regexp::Extended>, which extends regular expressions
913							in various ways, some of which seem to conflict with Perl 5.010.
914
915							=head1 SEE ALSO
916
917							L<Regexp::Parsertron\|Regexp::Parsertron>, which uses
918							L<Marpa::R2\|Marpa::R2> to parse the regexp, and L<Tree\|Tree> for
919							navigation. Unlike C<PPIx::Regexp>,
920							L<Regexp::Parsertron\|Regexp::Parsertron> supports modification of the
921							parse tree.
922
923							L<Regexp::Parser\|Regexp::Parser>, which parses a bare regular expression
924							(without enclosing C<qr{}>, C<m//>, or whatever) and uses a different
925							navigation model. After a long hiatus, this module has been adopted, and
926							is again supported.
927
928							L<YAPE::Regex\|YAPE::Regex>, which provides the parse tree, and has a
929							mechanism to subclass the various element classes for customization. The
930							most recent release dates from 2011, but the CPAN Testers results are still all
931							green. Companion module L<YAPE::Regex::Explain\|YAPE::Regex::Explain>
932							says what the various pieces of a regex do, though constructs added in
933							Perl 5.10 and later are not supported. I have no idea how I missed this
934							when I originally went looking for C<Regexp> parsers.
935
936							L<PPR\|PPR>, which recognizes Perl of all sorts, including regular
937							expressions, but does not actually provide a parse of the recognized
938							constructs.
939
940							=head1 SUPPORT
941
942							Support is by the author. Please file bug reports at
943							L<https://rt.cpan.org/Public/Dist/Display.html?Name=PPIx-Regexp>,
944							L<https://github.com/trwyant/perl-PPIx-Regexp/issues>, or in
945							electronic mail to the author.
946
947							=head1 AUTHOR
948
949							Thomas R. Wyant, III F<wyant at cpan dot org>
950
951							=head1 COPYRIGHT AND LICENSE
952
953							Copyright (C) 2009-2023, 2025 by Thomas R. Wyant, III
954
955							This program is free software; you can redistribute it and/or modify it
956							under the same terms as Perl 5.10.0. For more details, see the full text
957							of the licenses in the directory LICENSES.
958
959							This program is distributed in the hope that it will be useful, but
960							without any warranty; without even the implied warranty of
961							merchantability or fitness for a particular purpose.
962
963							=cut
964
965							# ex: set textwidth=72 :