File Coverage

blib/lib/XS/Parse/Keyword.pm
Criterion Covered Total %
statement 5 5 100.0
branch n/a
condition n/a
subroutine 2 2 100.0
pod n/a
total 7 7 100.0


line stmt bran cond sub pod time code
1             # You may distribute under the terms of either the GNU General Public License
2             # or the Artistic License (the same terms as Perl itself)
3             #
4             # (C) Paul Evans, 2021-2022 -- leonerd@leonerd.org.uk
5              
6             package XS::Parse::Keyword 0.27;
7              
8 22     22   1462059 use v5.14;
  22         284  
9 22     22   121 use warnings;
  22         42  
  22         3606  
10              
11             require XSLoader;
12             XSLoader::load( __PACKAGE__, our $VERSION );
13              
14             =head1 NAME
15              
16             C<XS::Parse::Keyword> - XS functions to assist in parsing keyword syntax
17              
18             =head1 DESCRIPTION
19              
20             This module provides some XS functions to assist in writing syntax modules
21             that provide new perl-visible syntax, primarily for authors of keyword plugins
22             using the C<PL_keyword_plugin> hook mechanism. It is unlikely to be of much
23             use to anyone else; and highly unlikely to be any use when writing perl code
24             using these. Unless you are writing a keyword plugin using XS, this module is
25             not for you.
26              
27             This module is also currently experimental, and the design is still evolving
28             and subject to change. Later versions may break ABI compatibility, requiring
29             changes or at least a rebuild of any module that depends on it.
30              
31             =cut
32              
33             =head1 XS FUNCTIONS
34              
35             =head2 boot_xs_parse_keyword
36              
37             void boot_xs_parse_keyword(double ver);
38              
39             Call this function from your C<BOOT> section in order to initialise the module
40             and parsing hooks.
41              
42             I<ver> should either be 0 or a decimal number for the module version
43             requirement; e.g.
44              
45             boot_xs_parse_keyword(0.14);
46              
47             =head2 register_xs_parse_keyword
48              
49             void register_xs_parse_keyword(const char *keyword,
50             const struct XSParseKeywordHooks *hooks, void *hookdata);
51              
52             This function installs a set of parsing hooks to be associated with the given
53             keyword. Such a keyword will then be handled automatically by a keyword parser
54             installed by C<XS::Parse::Keyword> itself.
55              
56             =cut
57              
58             =head1 PARSE HOOKS
59              
60             The C<XSParseKeywordHooks> structure provides the following hook stages, which
61             are invoked in the given order.
62              
63             =head2 flags
64              
65             The following flags are defined:
66              
67             =over 4
68              
69             =item C<XPK_FLAG_EXPR>
70              
71             The parse or build function is expected to return C<KEYWORD_PLUGIN_EXPR>.
72              
73             =item C<XPK_FLAG_STMT>
74              
75             The parse or build function is expected to return C<KEYWORD_PLUGIN_STMT>.
76              
77             These two flags are largely for the benefit of giving static information at
78             registration time to assist static parsing or other related tasks to know what
79             kind of grammatical element this keyword will produce.
80              
81             =item C<XPK_FLAG_AUTOSEMI>
82              
83             The syntax forms a complete statement, which should be followed by a statement
84             separator semicolon (C<;>). This semicolon is optional at the end of a block.
85              
86             The semicolon, if present, will be consumed automatically.
87              
88             =back
89              
90             =head2 The C<permit> Stage
91              
92             const char *permit_hintkey;
93             bool (*permit) (pTHX_ void *hookdata);
94              
95             Called by the installed keyword parser hook which is used to handle keywords
96             registered by L</register_xs_parse_keyword>.
97              
98             As a shortcut for the common case, the C<permit_hintkey> may point to a string
99             to look up from the hints hash. If the given key name is not found in the
100             hints hash then the keyword is not permitted. If the key is present then the
101             C<permit> function is invoked as normal.
102              
103             Unless the keyword has already been rejected because its hint key is missing
104             from the hints hash, the function part of the stage is called next. It should
105             decide whether the keyword is permitted at this time, perhaps by inspecting
106             other lexical clues, and return true only if the keyword is permitted.
107              
108             Both the string and the function are optional. Either or both may be present.
109             If neither is present then the keyword is always permitted - which is likely
110             not what you wanted to do.
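
The decision order described above can be modelled in plain C. This is only an illustrative sketch, not the module's implementation: C<struct PermitHooks>, C<hints_has_key()>, C<keyword_permitted()> and C<always_refuse()> are hypothetical names, the real hook takes C<pTHX_>, and the hints hash is stood in for by a NULL-terminated list of strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the two permit-related hook fields.
 * The real structure has more members and its function takes pTHX_. */
struct PermitHooks {
    const char *permit_hintkey;
    bool (*permit)(void *hookdata);
};

/* Hypothetical stand-in for the hints hash: a NULL-terminated key list. */
static bool hints_has_key(const char *const *hints, const char *key)
{
    for (size_t i = 0; hints[i] != NULL; i++)
        if (strcmp(hints[i], key) == 0)
            return true;
    return false;
}

/* Returns true if the keyword would be permitted under the rules above. */
static bool keyword_permitted(const struct PermitHooks *hooks,
                              const char *const *hints, void *hookdata)
{
    /* A hint key that is set but missing from the hints rejects at once. */
    if (hooks->permit_hintkey && !hints_has_key(hints, hooks->permit_hintkey))
        return false;

    /* Otherwise the function, if there is one, has the final say. */
    if (hooks->permit)
        return hooks->permit(hookdata);

    /* No function: a found hint key, or no restriction at all, permits. */
    return true;
}

/* Example permit function that never allows the keyword. */
static bool always_refuse(void *hookdata)
{
    (void)hookdata;
    return false;
}
```

With both fields left NULL the model permits the keyword everywhere, which is the case the text warns is likely not what you wanted.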
111              
112             =head2 The C<check> Stage
113              
114             void (*check)(pTHX_ void *hookdata);
115              
116             Invoked once the keyword has been permitted. If present, this hook function
117             can check the surrounding lexical context, state, or other information and
118             throw an exception if it is unhappy that the keyword should apply in this
119             position.
120              
121             =head2 The C<parse> Stage
122              
123             This stage is invoked once the keyword has been checked, and actually
124             parses the incoming text into an optree. It is implemented by calling the
125             B<first> of the following function pointers which is not NULL. The invoked
126             function may optionally build an optree to represent the parsed syntax, and
127             place it into the variable addressed by C<out>. If it does not, then a simple
128             C<OP_NULL> will be constructed in its place.
129              
130             C<lex_read_space()> is called both before and after this stage is invoked, so
131             in many simple cases the hook function itself does not need to bother with it.
132              
133             int (*parse)(pTHX_ OP **out, void *hookdata);
134              
135             If present, this should consume text from the parser buffer by invoking
136             C<lex_*> or C<parse_*> functions and eventually return a C<KEYWORD_PLUGIN_*>
137             result value.
138              
139             This is the most generic and powerful of the options, but requires the most
140             implementation work.
141              
142             int (*build)(pTHX_ OP **out, XSParseKeywordPiece *args[], size_t nargs, void *hookdata);
143              
144             If C<parse> is not present, this is called instead after parsing a sequence of
145             arguments, of types given by the I<pieces> field, which should be a
146             zero-terminated array of piece types.
147              
148             This alternative is somewhat less generic and powerful than providing C<parse>
149             yourself, but involves much less parsing work and is shorter and easier to
150             implement.
151              
152             int (*build1)(pTHX_ OP **out, XSParseKeywordPiece *arg0, void *hookdata);
153              
154             If neither C<parse> nor C<build> is present, this is called as a simpler
155             variant of C<build> when only a single argument is required. It takes its type
156             from the C<piece1> field instead.
157              
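The "first function pointer which is not NULL" rule can be sketched as a small self-contained model. The names C<struct ParseHooks>, C<enum Chosen>, C<choose_parse_hook()> and C<dummy_hook()> are hypothetical, and the real hook signatures (with C<pTHX_>, C<OP **out> and piece arguments) are reduced to a single placeholder signature so that only the selection order is shown.

```c
#include <stddef.h>

/* Simplified stand-ins for the three alternatives of the parse stage.
 * The real members have different signatures; only NULL-ness matters here. */
struct ParseHooks {
    int (*parse)(void *hookdata);
    int (*build)(void *hookdata);
    int (*build1)(void *hookdata);
};

enum Chosen { CHOSE_NONE, CHOSE_PARSE, CHOSE_BUILD, CHOSE_BUILD1 };

/* Reports which alternative would run: the first one that is not NULL. */
static enum Chosen choose_parse_hook(const struct ParseHooks *hooks)
{
    if (hooks->parse)  return CHOSE_PARSE;
    if (hooks->build)  return CHOSE_BUILD;
    if (hooks->build1) return CHOSE_BUILD1;
    return CHOSE_NONE;
}

/* Placeholder hook body used only to obtain a non-NULL pointer. */
static int dummy_hook(void *hookdata)
{
    (void)hookdata;
    return 0;
}
```

In this model a structure that sets both C<build> and C<build1> always selects C<build>, so the later pointer is ignored.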
158             =cut
159              
160             =head1 PIECES AND PIECE TYPES
161              
162             When using the C<build> or C<build1> alternatives for the C<parse> phase, the
163             actual syntax is parsed automatically by this module, according to the
164             specification given by the I<pieces> or I<piece1> field. The result of that
165             parsing step is placed into the I<args> or I<arg0> parameter to the invoked
166             function, using a C<struct> type consisting of the following fields:
167              
168                typedef struct {
169                  union {
170                    OP *op;
171                    CV *cv;
172                    SV *sv;
173                    int i;
174                    struct {
175                      SV *name;
176                      SV *value;
177                    } attr;
178                    PADOFFSET padix;
179                    struct XSParseInfixInfo *infix;
180                  };
181                  int line;
182                } XSParseKeywordPiece;
183              
184             Which field of the anonymous union is set depends on the type of the piece.
185             The I<line> field contains the line number of the source file where parsing of
186             that piece began.
187              
188             Some piece types are "atomic", whose definition is self-contained. Others are
189             structural, defined in terms of inner pieces. Together these form an entire
190             tree-shaped definition of the syntax that the keyword expects to find.
191              
192             Atomic types generally provide exactly one argument into the list of I<args>
193             (with the exception of literal matches, which do not provide anything).
194             Structural types may provide an initial argument themselves, followed by a
195             list of the values of each sub-piece they contained inside them. Thus, while
196             the data structure defining the syntax shape is a tree, the argument values it
197             parses into are passed as a flat array to the C<build> function.
198              
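To illustrate the flattening, the sketch below walks the argument array that a hypothetical grammar of C<XPK_IDENT> followed by C<XPK_REPEATED(XPK_LITERAL("with"), XPK_IDENT)> would produce: one name, then the repeat count, then one name per repeat (the literal contributes nothing). It is a plain C model, not code that links against this module. C<ModelPiece> and C<collect_names()> are assumed names, and a C<const char *> stands in for the real C<SV *> field.

```c
#include <stddef.h>

/* Cut-down model of the piece structure: only the fields used here. */
typedef struct {
    union {
        int i;            /* counts, flags and choice indexes */
        const char *sv;   /* stand-in for the real SV * field */
    };
    int line;
} ModelPiece;

/* Copies every identifier found in the flat array into names[] and
 * returns how many were stored. */
static size_t collect_names(const ModelPiece *args, size_t nargs,
                            const char **names, size_t max_names)
{
    size_t argi = 0, n = 0;

    if (nargs < 2 || max_names == 0)
        return 0;

    names[n++] = args[argi++].sv;      /* the leading XPK_IDENT */

    int repeats = args[argi++].i;      /* XPK_REPEATED's own count argument */
    for (int r = 0; r < repeats && argi < nargs && n < max_names; r++)
        names[n++] = args[argi++].sv;  /* one XPK_IDENT per repeat */

    return n;
}
```

The structural piece therefore costs one array slot of its own, and the consumer has to advance its index past the sub-piece values itself.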
199             Some structural types need to be able to determine whether or not syntax
200             relating to some optional part of them is present in the incoming source text. In
201             this case, the pieces relating to those optional parts must support "probing".
202             This ability is also noted below.
203              
204             The type of each piece should be one of the following macro values.
205              
206             =head2 XPK_BLOCK
207              
208             I
209              
210             XPK_BLOCK
211              
212             A brace-delimited block of code is expected, passed as an optree in the I<op>
213             field. This will be parsed as a block within the current function scope.
214              
215             This can be probed by checking for the presence of an open-brace (C<{>)
216             character.
217              
218             Be careful defining grammars with this because an open-brace is also a valid
219             character to start a term expression, for example. Given a choice between
220             C<XPK_BLOCK> and C<XPK_TERMEXPR>, either of them could try to consume such
221             code as
222              
223             { 123, 456 }
224              
225             =head2 XPK_BLOCK_VOIDCTX, XPK_BLOCK_SCALARCTX, XPK_BLOCK_LISTCTX
226              
227             Variants of C<XPK_BLOCK> which wrap a void, scalar or list-context scope
228             around the block.
229              
230             =head2 XPK_PREFIXED_BLOCK
231              
232             I
233              
234             XPK_PREFIXED_BLOCK(pieces ...)
235              
236             Some pieces are expected, followed by a brace-delimited block of code, which
237             is passed as an optree in the I<op> field. The prefix pieces are parsed first,
238             and their results are passed before the block itself.
239              
240             The entire sequence, including the prefix items, is contained within a pair of
241             C<block_start()> / C<block_end()> calls. This permits the prefix pieces to
242             introduce new items into the lexical scope of the block - for example by the
243             use of C<XPK_LEXVAR_MY>.
244              
245             A call to C<intro_my()> is automatically made at the end of the prefix pieces,
246             before the block itself is parsed, ensuring any new lexical variables are now
247             visible.
248              
249             In addition, the following extra piece types are recognised here:
250              
251             =over 4
252              
253             =item XPK_SETUP
254              
255             void setup(pTHX_ void *hookdata);
256              
257             XPK_SETUP(&setup)
258              
259             I
260              
261             This piece type runs a function given by pointer. Typically this function may
262             be used to introduce new lexical state into the parser, or in some other way
263             have some side-effect on the parsing context of the block to be parsed.
264              
265             =back
266              
267             =head2 XPK_PREFIXED_BLOCK_ENTERLEAVE
268              
269             A variant of C<XPK_PREFIXED_BLOCK> which additionally wraps the entire parsing
270             operation, including the C<block_start()>, C<block_end()> and any calls to
271             C<XPK_SETUP> functions, within a C<ENTER>/C<LEAVE> pair.
272              
273             This should not make a difference to the standard parser pieces provided here,
274             but may be useful behaviour for the code in the setup function, especially if
275             it wishes to modify parser state and use the savestack to ensure it is
276             restored again when parsing has finished.
277              
278             =head2 XPK_ANONSUB
279              
280             I
281              
282             A brace-delimited block of code is expected, and assembled into the body of a
283             new anonymous subroutine. This will be passed as a protosub CV in the I<cv>
284             field.
285              
286             =head2 XPK_ARITHEXPR
287              
288             I
289              
290             XPK_ARITHEXPR
291              
292             An arithmetic expression is expected, parsed using C<parse_arithexpr()>, and
293             passed as an optree in the I<op> field.
294              
295             =head2 XPK_ARITHEXPR_VOIDCTX, XPK_ARITHEXPR_SCALARCTX
296              
297             Variants of C<XPK_ARITHEXPR> which put the expression in void or scalar context.
298              
299             =head2 XPK_TERMEXPR
300              
301             I
302              
303             XPK_TERMEXPR
304              
305             A term expression is expected, parsed using C<parse_termexpr()>, and passed as
306             an optree in the I<op> field.
307              
308             =head2 XPK_TERMEXPR_VOIDCTX, XPK_TERMEXPR_SCALARCTX
309              
310             Variants of C<XPK_TERMEXPR> which put the expression in void or scalar context.
311              
312             =head2 XPK_LISTEXPR
313              
314             I
315              
316             XPK_LISTEXPR
317              
318             A list expression is expected, parsed using C<parse_listexpr()>, and passed as
319             an optree in the I<op> field.
320              
321             =head2 XPK_LISTEXPR_LISTCTX
322              
323             Variant of C<XPK_LISTEXPR> which puts the expression in list context.
324              
325             =head2 XPK_IDENT, XPK_IDENT_OPT
326              
327             I
328              
329             A bareword identifier name is expected, and passed as an SV containing a PV
330             in the I<sv> field. An identifier is not permitted to contain a double colon
331             (C<::>).
332              
333             The C<_OPT>-suffixed version is optional; if no identifier is found then I<sv>
334             is set to C<NULL>.
335              
336             =head2 XPK_PACKAGENAME, XPK_PACKAGENAME_OPT
337              
338             I
339              
340             A bareword package name is expected, and passed as an SV containing a PV in
341             the I<sv> field. A package name is similar to an identifier, except it permits
342             double colons in the middle.
343              
344             The C<_OPT>-suffixed version is optional; if no package name is found then
345             I<sv> is set to C<NULL>.
346              
347             =head2 XPK_LEXVARNAME
348              
349             I
350              
351             XPK_LEXVARNAME(kind)
352              
353             A lexical variable name is expected, and passed as an SV containing a PV in
354             the I<sv> field. The C<kind> argument specifies what kinds of variable are
355             permitted, and should be a bitmask of one or more bits from
356             C<XPK_LEXVAR_SCALAR>, C<XPK_LEXVAR_ARRAY> and C<XPK_LEXVAR_HASH>. A convenient
357             shortcut C<XPK_LEXVAR_ANY> permits all three.
358              
359             =head2 XPK_ATTRIBUTES
360              
361             I
362              
363             A list of C<:>-prefixed attributes is expected, in the same format as sub or
364             variable attributes. An optional leading C<:> indicates the presence of
365             attributes, then one or more of them are parsed. Attributes may be optionally
366             separated by additional C<:>s, but this is not required.
367              
368             Each attribute is expected to be an identifier name, followed by an optional
369             value wrapped in parentheses. Whitespace is B<NOT> permitted between the name
370             and value, as per standard Perl parsing rules.
371              
372             :attrname
373             :attrname(value)
374              
375             The I<i> field indicates how many attributes were found. That number of
376             additional arguments are then passed, each containing two SVs in the
377             I<attr.name> and I<attr.value> fields. This number may be zero.
378              
379             It is not an error for there to be no attributes present, or for the optional
380             colon to be missing. In this case I<i> will be set to zero.
381              
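The count-then-entries layout described above can be walked as in the following plain C model. It does not use the real Perl types: C<ModelAttrPiece> and C<find_attr()> are hypothetical names, and C<const char *> stands in for the two C<SV *> members. Representing an attribute without a parenthesised value as a NULL value pointer is an assumption of this sketch, not something the text above states.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down model of the piece structure: the count and the attr pair. */
typedef struct {
    union {
        int i;
        struct {
            const char *name;   /* stand-in for SV *name */
            const char *value;  /* stand-in for SV *value; NULL here if absent */
        } attr;
    };
    int line;
} ModelAttrPiece;

/* args[0].i holds the attribute count and the attributes follow it.
 * Returns the index within args of the attribute called `want`, or -1. */
static int find_attr(const ModelAttrPiece *args, size_t nargs,
                     const char *want)
{
    if (nargs == 0)
        return -1;

    int count = args[0].i;
    for (int n = 0; n < count && (size_t)(n + 1) < nargs; n++)
        if (strcmp(args[n + 1].attr.name, want) == 0)
            return n + 1;

    return -1;
}
```

When no attributes were present the count is zero, the loop body never runs, and the lookup reports -1.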
382             =head2 XPK_VSTRING, XPK_VSTRING_OPT
383              
384             I
385              
386             A version string is expected, of the form C<v1.234> including the leading C<v>
387             character. It is passed as a L<version> SV object in the I<sv> field.
388              
389             The C<_OPT>-suffixed version is optional; if no version string is found then
390             I<sv> is set to C<NULL>.
391              
392             =head2 XPK_LEXVAR_MY
393              
394             I
395              
396             XPK_LEXVAR_MY(kind)
397              
398             A lexical variable name is expected, added to the current pad as if specified
399             in a C<my> expression, and passed as the pad index in the I<padix> field.
400              
401             The C<kind> argument specifies what kinds of variable are permitted, as per
402             C<XPK_LEXVARNAME>.
403              
404             =head2 XPK_COMMA, XPK_COLON, XPK_EQUALS
405              
406             I
407              
408             A literal character (C<,>, C<:> or C<=>) is expected. No argument value is passed.
409              
410             =head2 XPK_AUTOSEMI
411              
412             I
413              
414             A literal semicolon (C<;>) as a statement terminator is optionally expected.
415             If the next token is a closing brace to indicate the end of a block, then a
416             semicolon is not required. If anything else is encountered an error will be
417             raised.
418              
419             This piece type is the same as specifying the C<XPK_FLAG_AUTOSEMI> flag. It is
420             useful to put at the end of a sequence that forms part of a choice of syntax,
421             where some forms indicate a statement ending in a semicolon, whereas others
422             may end in a full block that does not need one.
423              
424             =head2 XPK_INFIX_*
425              
426             I
427              
428             An infix operator as recognised by L<XS::Parse::Infix>. The returned pointer
429             points to a structure allocated by C<XS::Parse::Infix> describing the
430             operator.
431              
432             Various versions of the macro are provided, each using a different selection
433             filter to choose certain available infix operators:
434              
435             XPK_INFIX_RELATION # any relational operator
436             XPK_INFIX_EQUALITY # an equality operator like `==` or `eq`
437             XPK_INFIX_MATCH_NOSMART # any sort of "match"-like operator, except smartmatch
438             XPK_INFIX_MATCH_SMART # XPK_INFIX_MATCH_NOSMART plus smartmatch
439              
440             =head2 XPK_LITERAL
441              
442             I
443              
444             XPK_LITERAL("literal")
445              
446             A literal string match is expected. No argument value is passed.
447              
448             This form should generally be avoided if at all possible, because it is very
449             easy to abuse to make syntaxes which confuse humans and code tools alike.
450             Generally it is best reserved just for the first component of a
451             C<XPK_OPTIONAL> or C<XPK_REPEATED> sequence, to provide a "secondary keyword"
452             that such a repeated item can look out for.
453              
454             =head2 XPK_KEYWORD
455              
456             I
457              
458             XPK_KEYWORD("keyword")
459              
460             A literal string match is expected. No argument value is passed.
461              
462             This is similar to C<XPK_LITERAL> except that it additionally checks that the
463             following character is not an identifier character. This ensures that the
464             expected keyword-like behaviour is preserved. For example, given the input
465             C<"keyword">, the piece C<XPK_LITERAL("key")> would match it, whereas
466             C<XPK_KEYWORD("key")> would not because of the subsequent C<"w"> character.
467              
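The difference between the two kinds of match can be shown with a pair of small string functions. These are a stand-alone illustration rather than the module's own matching code: C<match_literal()>, C<match_keyword()> and C<is_ident_char()> are hypothetical names, and identifier characters are assumed to be ASCII letters, digits and underscore only.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: only ASCII word characters count as identifier characters. */
static bool is_ident_char(char c)
{
    return c == '_' || isalnum((unsigned char)c);
}

/* Literal-style match: the input merely has to start with the string. */
static bool match_literal(const char *input, const char *lit)
{
    return strncmp(input, lit, strlen(lit)) == 0;
}

/* Keyword-style match: as above, and the character that follows must
 * not be able to continue an identifier. */
static bool match_keyword(const char *input, const char *kw)
{
    size_t len = strlen(kw);

    /* input[len] is safe to read once the prefix has matched; at worst
     * it is the terminating NUL, which is not an identifier character. */
    return strncmp(input, kw, len) == 0 && !is_ident_char(input[len]);
}
```

Given the input C<"keyword">, C<match_literal(input, "key")> succeeds while C<match_keyword(input, "key")> fails, mirroring the example in the text.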
468             =head2 XPK_SEQUENCE
469              
470             I
471              
472             XPK_SEQUENCE(pieces ...)
473              
474             A structural type which contains a number of pieces. This is normally
475             equivalent to simply placing the pieces in sequence inside their own
476             container, but it is useful inside C<XPK_CHOICE> or C<XPK_TAGGEDCHOICE>.
477              
478             An C<XPK_SEQUENCE> supports probe if its first contained piece does; i.e.
479             it is transparent to probing.
480              
481             =head2 XPK_OPTIONAL
482              
483             I
484              
485             XPK_OPTIONAL(pieces ...)
486              
487             A structural type which may expect to find its contained pieces, or is happy
488             not to. This will pass an argument whose I<i> field contains either 1 or 0,
489             depending whether the contents were found. The first piece type within must
490             support probe.
491              
492             =head2 XPK_REPEATED
493              
494             I
495              
496             XPK_REPEATED(pieces ...)
497              
498             A structural type which expects to find zero or more repeats of its contained
499             pieces. This will pass an argument whose I<i> field contains the count of the
500             number of repeats it found. The first piece type within must support probe.
501              
502             =head2 XPK_CHOICE
503              
504             I
505              
506             XPK_CHOICE(options ...)
507              
508             A structural type which expects to find one of a number of alternative
509             options. An ordered list of types is provided, all of which must support
510             probe. This will pass an argument whose I<i> field gives the index of the
511             first choice that was accepted. The first option takes the value 0.
512              
513             As each of the options is interpreted as an alternative, not a sequence, you
514             should use C<XPK_SEQUENCE> if a sequence of multiple items should be
515             considered as a single alternative.
516              
517             It is not an error if no choice matches. At that point, the I<i> field will be
518             set to -1.
519              
520             If you require a failure message in this case, set the final choice to be of
521             type C<XPK_FAILURE>. This will cause an error message to be printed instead.
522              
523             XPK_FAILURE("message string")
524              
525             =head2 XPK_TAGGEDCHOICE
526              
527             I
528              
529             XPK_TAGGEDCHOICE(choice, tag, ...)
530              
531             A structural type similar to C<XPK_CHOICE>, except that each choice type is
532             followed by an element of type C<XPK_TAG> which gives an integer. It is that
533             integer value, rather than the positional index of the choice within the list,
534             which is passed in the I<i> field.
535              
536             XPK_TAG(value)
537              
538             As each of the options is interpreted as an alternative, not a sequence, you
539             should use C<XPK_SEQUENCE> if a sequence of multiple items should be
540             considered as a single alternative.
541              
542             =head2 XPK_COMMALIST
543              
544             I
545              
546             XPK_COMMALIST(pieces ...)
547              
548             A structural type which expects to find one or more repeats of its contained
549             pieces, separated by literal comma (C<,>) characters. This is somewhat similar
550             to C<XPK_REPEATED>, except that it needs at least one copy, needs commas
551             between its items, but does not require that the first contained piece support
552             probe (the comma itself is sufficient to indicate a repeat).
553              
554             An C<XPK_COMMALIST> supports probe if its first contained piece does; i.e.
555             it is transparent to probing.
556              
557             =head2 XPK_PARENSCOPE
558              
559             I
560              
561             XPK_PARENSCOPE(pieces ...)
562              
563             A structural type which expects to find a sequence of pieces, all contained in
564             parentheses as C<( ... )>. This will pass no extra arguments.
565              
566             =head2 XPK_ARGSCOPE
567              
568             I
569              
570             XPK_ARGSCOPE(pieces ...)
571              
572             A structural type similar to C<XPK_PARENSCOPE>, except that the parentheses
573             themselves are optional; much like Perl's parsing of calls to known functions.
574              
575             If parentheses are encountered in the input, they will be consumed by this
576             piece and it will behave identically to C<XPK_PARENSCOPE>. If there is no open
577             parenthesis, this piece will behave like C<XPK_SEQUENCE> and consume all the
578             pieces inside it, without expecting a closing parenthesis.
579              
580             =head2 XPK_BRACKETSCOPE
581              
582             I
583              
584             XPK_BRACKETSCOPE(pieces ...)
585              
586             A structural type which expects to find a sequence of pieces, all contained in
587             square brackets as C<[ ... ]>. This will pass no extra arguments.
588              
589             =head2 XPK_BRACESCOPE
590              
591             I
592              
593             XPK_BRACESCOPE(pieces ...)
594              
595             A structural type which expects to find a sequence of pieces, all contained in
596             braces as C<{ ... }>. This will pass no extra arguments.
597              
598             Note that this is not necessary to use with C<XPK_BLOCK> or C<XPK_ANONSUB>;
599             those will already consume a set of braces. This is intended for special
600             constrained syntax that should not just accept an arbitrary block.
601              
602             =head2 XPK_CHEVRONSCOPE
603              
604             I
605              
606             XPK_CHEVRONSCOPE(pieces ...)
607              
608             A structural type which expects to find a sequence of pieces, all contained in
609             angle brackets as C<< < ... > >>. This will pass no extra arguments.
610              
611             Remember that expressions like C<< a > b >> are valid term expressions, so the
612             contents of this scope shouldn't allow arbitrary expressions or the closing
613             bracket will be ambiguous.
614              
615             =head2 XPK_PARENSCOPE_OPT, XPK_BRACKETSCOPE_OPT, XPK_BRACESCOPE_OPT, XPK_CHEVRONSCOPE_OPT
616              
617             I
618              
619             XPK_PARENSCOPE_OPT(pieces ...)
620             XPK_BRACKETSCOPE_OPT(pieces ...)
621             XPK_BRACESCOPE_OPT(pieces ...)
622             XPK_CHEVRONSCOPE_OPT(pieces ...)
623              
624             Each of the four C<XPK_...SCOPE> macros above has an optional variant, whose
625             name is suffixed by C<_OPT>. These pass an argument whose I<i> field is either
626             true or false, indicating whether the scope was found, followed by the values
627             from the scope itself.
628              
629             This is a convenient shortcut to nesting the scope within a C<XPK_OPTIONAL>
630             macro.
631              
632             =cut
633              
634             =head1 AUTHOR
635              
636             Paul Evans
637              
638             =cut
639              
640             0x55AA;