File Coverage

blib/lib/Text/BibTeX/Structure.pm
Criterion    Covered  Total      %
statement         83    179   46.3
branch            12     70   17.1
condition          6     36   16.6
subroutine        15     33   45.4
pod               16     16  100.0
total            132    334   39.5


line stmt bran cond sub pod time code
1             # ----------------------------------------------------------------------
2             # NAME : BibTeX/Structure.pm
3             # CLASSES : Text::BibTeX::Structure, Text::BibTeX::StructuredEntry
4             # RELATIONS :
5             # DESCRIPTION: Provides the two base classes needed to implement
6             # Text::BibTeX structure modules.
7             # CREATED : in original form: Apr 1997
8             # completely redone: Oct 1997
9             # MODIFIED :
10             # VERSION : $Id$
11             # COPYRIGHT : Copyright (c) 1997-2000 by Gregory P. Ward. All rights
12             # reserved.
13             #
14             # This file is part of the Text::BibTeX library. This
15             # library is free software; you may redistribute it and/or
16             # modify it under the same terms as Perl itself.
17             # ----------------------------------------------------------------------
18              
19             package Text::BibTeX::Structure;
20              
21             require 5.004; # for 'isa' and 'can'
22              
23 1     1   7 use strict;
  1         1  
  1         28  
24 1     1   4 use Carp;
  1         2  
  1         50  
25              
26 1     1   5 use vars qw'$VERSION';
  1         15  
  1         48  
27             $VERSION = 0.87;
28              
29 1     1   6 use Text::BibTeX ('check_class');
  1         1  
  1         1431  
30              
31             =head1 NAME
32              
33             Text::BibTeX::Structure - provides base classes for user structure modules
34              
35             =head1 SYNOPSIS
36              
37             # Define a 'Foo' structure for BibTeX databases: first, the
38             # structure class:
39              
40             package Text::BibTeX::FooStructure;
41             @ISA = ('Text::BibTeX::Structure');
42              
43             sub known_option
44             {
45             my ($self, $option) = @_;
46              
47             ...
48             }
49              
50             sub default_option
51             {
52             my ($self, $option) = @_;
53              
54             ...
55             }
56              
57             sub describe_entry
58             {
59             my $self = shift;
60              
61             $self->set_fields ($type,
62             \@required_fields,
63             \@optional_fields,
64             [$constraint_1, $constraint_2, ...]);
65             ...
66             }
67              
68              
69             # Now, the structured entry class
70              
71             package Text::BibTeX::FooEntry;
72             @ISA = ('Text::BibTeX::StructuredEntry');
73              
74             # define whatever methods you like
75              
76             =head1 DESCRIPTION
77              
78             The module C<Text::BibTeX::Structure> provides two classes that form the
79             basis of the B<btOOL> "structure module" system. This system is how
80             database structures are defined and imposed on BibTeX files, and
81             provides an elegant synthesis of object-oriented techniques with
82             BibTeX-style database structures. Nothing described here is
83             particularly deep or subtle; anyone familiar with object-oriented
84             programming should be able to follow it. However, a fair bit of jargon
85             is invented and tossed around, so pay attention.
86              
87             A I<database structure>, in B<btOOL> parlance, is just a set of allowed
88             entry types and the rules for fields in each of those entry types.
89             Currently, there are three kinds of rules that apply to fields: some
90             fields are I<required>, meaning they must be present in every entry for
91             a given type; some are I<optional>, meaning they may be present, and
92             will be used if they are; other fields are members of I<constraint
93             sets>, which are explained in L<"Field lists and constraint sets">
94             below.
95              
96             A B<btOOL> structure is implemented with two classes: the I<structure
97             class> and the I<structured entry class>. The former defines everything
98             that applies to the structure as a whole (allowed types and field
99             rules). The latter provides methods that operate on individual entries
100             which conform (or are supposed to conform) to the structure. The two
101             classes provided by the C<Text::BibTeX::Structure> module are
102             C<Text::BibTeX::Structure> and C<Text::BibTeX::StructuredEntry>; these
103             serve as base classes for, respectively, all structure classes and all
104             structured entry classes. One canonical structure is provided as an
105             example with B<btOOL>: the C<Bib> structure, which (via the
106             C<BibStructure> and C<BibEntry> classes) provides the same functionality
107             as the standard style files of BibTeX 0.99. It is hoped that other
108             programmers will write new bibliography-related structures, possibly
109             deriving from the C<Bib> structure, to emulate some of the functionality
110             that is available through third-party BibTeX style files.
111              
112             The purpose of this manual page is to describe the whole "structure
113             module" system. It is mainly for programmers wishing to implement a new
114             database structure for data files with BibTeX syntax; if you are
115             interested in the particular rules for the BibTeX-emulating C<Bib>
116             structure, see L<Text::BibTeX::Bib>.
117              
118             Please note that the C<Text::BibTeX> prefix is dropped from most module
119             and class names in this manual page, except where necessary.
120              
121             =head1 STRUCTURE CLASSES
122              
123             Structure classes have two roles: to define the list of allowed types
124             and field rules, and to handle I<structure options>.
125              
126             =head2 Field lists and constraint sets
127              
128             Field lists and constraint sets define the database structure for a
129             particular entry type: that is, they specify the rules which an entry
130             must follow to conform to the structure (assuming that entry is of an
131             allowed type). There are three components to the field rules for each
132             entry type: a list of required fields, a list of optional fields, and
133             I<field constraints>. Required and optional fields should be obvious to
134             anyone with BibTeX experience: all required fields must be present, and
135             any optional fields that are present have some meaning to the structure.
136             (One could conceive of a "strict" interpretation, where any field not
137             mentioned in the official definition is disallowed; this would be
138             contrary to the open spirit of BibTeX databases, but could be useful in
139             certain applications where a stricter level of control is desired.
140             Currently, B<btOOL> does not offer such an option.)
141              
142             Field constraints capture the "one or the other, but not both" type of
143             relationships present for some entry types in the BibTeX standard style
144             files. Most BibTeX documentation glosses over the distinction between
145             mutually constrained fields and required/optional fields. For instance,
146             one of the standard entry types is C<book>, and "C<author> or C<editor>"
147             is given in the list of required fields for that type. The meaning of
148             this is that an entry of type C<book> must have I<either> the C<author>
149             or C<editor> fields, but not both. Likewise, the "C<volume> or
150             C<number>" are listed under the "optional fields" heading for C<book>
151             entries; it would be more accurate to say that every C<book> entry may
152             have one or the other, or neither, of C<volume> or C<number>---but not
153             both.
154              
155             B<btOOL> attempts to clarify this situation by creating a third category
156             of fields, those that are mutually constrained. For instance, neither
157             C<author> nor C<editor> appears in the list of required fields for
158             the C<book> type according to B<btOOL>; rather, a field constraint is
159             created to express this relationship:
160              
161             [1, 1, ['author', 'editor']]
162              
163             That is, a field constraint is a reference to a three-element list. The
164             last element is a reference to the I<constraint set>, the list of fields
165             to which the constraint applies. (Calling this a set is a bit
166             inaccurate, as there are conditions in which the order of fields
167             matters---see the C method in L<"METHODS 2:
168             BASE STRUCTURED ENTRY CLASS">.) The first two elements are the minimum
169             and maximum number of fields from the constraint set that must be
170             present for an entry to conform to the constraint. This constraint thus
171             expresses that there must be exactly one (>= 1 and <= 1) of the fields
172             C<author> and C<editor> in a C<book> entry.
173              
174             The "either one or neither, but not both" constraint that applies to the
175             C<volume> and C<number> fields for C<book> entries is expressed slightly
176             differently:
177              
178             [0, 1, ['volume', 'number']]
179              
180             That is, either 0 or 1, but not the full 2, of C<volume> and C<number>
181             may be present.
182              
183             It is important to note that checking and enforcing field constraints is
184             based purely on counting which fields from a set are actually present;
185             this mechanism can't capture "x must be present if y is" relationships.
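
The counting rule can be sketched in a few lines of standalone Perl. This is only an illustration, not code taken from the library: satisfies_constraint is a hypothetical helper, and the entry is modelled as a plain hash of field names rather than a real entry object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::BibTeX): true if the number of
# fields from the constraint set that are present in %$entry lies
# between $min and $max, inclusive.
sub satisfies_constraint {
    my ($entry, $constraint) = @_;
    my ($min, $max, $fields) = @$constraint;

    # grep in scalar context counts the fields that are present
    my $present = grep { exists $entry->{$_} } @$fields;
    return $present >= $min && $present <= $max;
}

my $author_or_editor = [1, 1, ['author', 'editor']];
my $volume_or_number = [0, 1, ['volume', 'number']];

# A made-up entry, represented as a simple field => value hash
my %book = (author => 'A. N. Author', title => 'A Title', volume => 1);

for my $constraint ($author_or_editor, $volume_or_number) {
    my $verdict = satisfies_constraint(\%book, $constraint) ? 'ok' : 'violated';
    print join('/', @{ $constraint->[2] }), ": $verdict\n";
}
```

Because only the count matters, an entry with both author and editor would fail the first constraint, while nothing here can express a dependency of one field on another.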
186              
187             The requirements imposed on the actual structure class are simple: it
188             must provide a method C<describe_entry> which sets up a fancy data
189             structure describing the allowed entry types and all the field rules for
190             those types. The C<Structure> class provides methods (inherited by a
191             particular structure class) to help particular structure classes create
192             this data structure in a consistent, controlled way. For instance, the
193             C<describe_entry> method in the BibTeX 0.99-emulating
194             C<BibStructure> class is quite simple:
195              
196             sub describe_entry
197             {
198             my $self = shift;
199              
200             # series of 13 calls to $self->set_fields (one for each standard
201             # entry type)
202             }
203              
204             One of those calls to the C<set_fields> method defines the rules for
205             C<book> entries:
206              
207             $self->set_fields ('book',
208             [qw(title publisher year)],
209             [qw(series address edition month note)],
210             [1, 1, [qw(author editor)]],
211             [0, 1, [qw(volume number)]]);
212              
213             The first field list is the list of required fields, and the second is
214             the list of optional fields. Any number of field constraints may follow
215             the list of optional fields; in this case, there are two, one for each
216             of the constraints (C<author>/C<editor> and C<volume>/C<number>)
217             described above. At no point is a list of allowed types explicitly
218             supplied; rather, each call to C<set_fields> adds one more allowed type.
219              
220             New structure modules that derive from existing ones will probably use the
221             C<add_fields> method (and possibly C<add_constraints>) to augment an
222             existing entry type. Adding new types should be done with C<set_fields>,
223             though.
224              
225             =head2 Structure options
226              
227             The other responsibility of structure classes is to handle I<structure
228             options>. These are scalar values that let the user customize the
229             behaviour of both the structure class and the structured entry class.
230             For instance, one could have an option to enable "extended structure",
231             which might add on a bunch of new entry types and new fields. (In this
232             case, the C<describe_entry> method would have to pay attention to this
233             option and modify its behaviour accordingly.) Or, one could have
234             options to control how the structured entry class sorts or formats
235             entries (for bibliography structures such as C<Bib>).
236              
237             The easy way to handle structure options is to provide two methods,
238             C<known_option> and C<default_option>. These return, respectively,
239             whether a given option is supported, and what its default value is. (If
240             your structure doesn't support any options, you can just inherit these
241             methods from the C<Structure> class. The default C<known_option>
242             returns false for all options, and its companion C<default_option>
243             crashes with an "unknown option" error.)
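
As a rough sketch of what such a pair of methods might look like, the following standalone package keeps its defaults in a single hash. The package name My::FooStructure and the options sortby and labels are invented for illustration. A real structure class would also inherit from Text::BibTeX::Structure, which is left out here so that the sketch runs without the library installed.

```perl
use strict;
use warnings;

package My::FooStructure;    # hypothetical structure class
use Carp;

# Hypothetical options and their defaults; a real structure module
# would document its own set.
my %default_options = (
    sortby => 'name',
    labels => 'numeric',
);

# True if the structure supports the named option
sub known_option {
    my ($self, $option) = @_;
    return exists $default_options{$option};
}

# Default value of a supported option; complains about anything else
sub default_option {
    my ($self, $option) = @_;
    croak "unknown option \"$option\""
        unless exists $default_options{$option};
    return $default_options{$option};
}

package main;

for my $option (qw(sortby labels)) {
    print "$option defaults to ", My::FooStructure->default_option($option), "\n";
}
```

Keeping both methods backed by the same hash means an option cannot be "known" without also having a default.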
244              
245             Once C<known_option> and C<default_option> are provided, the structure
246             class can sit back and inherit the more visible C<set_options> and
247             C<get_options> methods from the C<Structure> class. These are the
248             methods actually used to modify/query options, and will be used by
249             application programs to customize the structure module's behaviour, and
250             by the structure module itself to pay attention to the user's wishes.
251              
252             Options should generally have pure string values, so that the generic
253             set_options method doesn't have to parse user-supplied strings into some
254             complicated structure. However, C<set_options> will take any scalar
255             value, so if the structure module clearly documents its requirements,
256             the application program could supply a structure that meets its needs.
257             Keep in mind that this requires cooperation between the application and
258             the structure module; the intermediary code in
259             C<Text::BibTeX::Structure> knows nothing about the format or syntax of
260             your structure's options, and whatever scalar the application passes via
261             C<set_options> will be stored for your module to retrieve via
262             C<get_options>.
263              
264             As an example, the C<Bib> structure supports a number of "markup"
265             options that allow applications to control the markup language used for
266             formatting bibliographic entries. These options are naturally paired,
267             as formatting commands in markup languages generally have to be turned
268             on and off. The C<Bib> structure thus expects references to two-element
269             lists for markup options; to specify LaTeX 2e-style emphasis for book
270             titles, an application such as C<btformat> would set the C<btitle_mkup>
271             option as follows:
272              
273             $structure->set_options (btitle_mkup => ['\emph{', '}']);
274              
275             Other options for other structures might have a more complicated
276             structure, but it's up to the structure class to document and enforce
277             this.
278              
279             =head1 STRUCTURED ENTRY CLASSES
280              
281             A I<structured entry class> defines the behaviour of individual entries
282             under the regime of a particular database structure. This is the
283             I<raison d'E<ecirc>tre> for any database structure: the structure class
284             merely lays out the rules for entries to conform to the structure, but
285             the structured entry class provides the methods that actually operate on
286             individual entries. Because this is completely open-ended, the
287             requirements of a structured entry class are much less rigid than for a
288             structure class. In fact, all of the requirements of a structured entry
289             class can be met simply by inheriting from
290             C<Text::BibTeX::StructuredEntry>, the other class provided by the
291             C<Text::BibTeX::Structure> module. (For the record, those requirements
292             are: a structured entry class must provide the entry
293             parse/query/manipulate methods of the C<Entry> class, and it must
294             provide the C<check>, C<coerce>, and C<silently_coerce> methods of the
295             C<StructuredEntry> class. Since C<StructuredEntry> inherits from
296             C<Entry>, both of these requirements are met "for free" by structured
297             entry classes that inherit from C<Text::BibTeX::StructuredEntry>, so
298             naturally this is the recommended course of action!)
299              
300             There are deliberately no other methods required of structured entry
301             classes. A particular application (eg. C<btformat> for bibliography
302             structures) will require certain methods, but it's up to the application
303             and the structure module to work out the requirements through
304             documentation.
305              
306             =head1 CLASS INTERACTIONS
307              
308             Imposing a database structure on your entries sets off a chain reaction
309             of interactions between various classes in the C<Text::BibTeX> library
310             that should be transparent when all goes well. It could prove confusing
311             if things go wrong and you have to go wading through several levels of
312             application program, core C<Text::BibTeX> classes, and some structure
313             module.
314              
315             The justification for this complicated behaviour is that it allows you
316             to write programs that will use a particular structured module without
317             knowing the name of the structure when you write the program. Thus, the
318             user can supply a database structure, and ultimately the entry objects
319             you manipulate will be blessed into a class supplied by the structure
320             module. A short example will illustrate this.
321              
322             Typically, a C<Text::BibTeX>-based program is based around a kernel of
323             code like this:
324              
325             $bibfile = Text::BibTeX::File->new("foo.bib");
326             while ($entry = Text::BibTeX::Entry->new($bibfile))
327             {
328             # process $entry
329             }
330              
331             In this case, nothing fancy is happening behind the scenes: the
332             C<$bibfile> object is blessed into the C<Text::BibTeX::File> class, and
333             C<$entry> is blessed into C<Text::BibTeX::Entry>. This is the
334             conventional behaviour of Perl classes, but it is not the only possible
335             behaviour. Let us now suppose that C<$bibfile> is expected to conform
336             to a database structure specified by C<$structure> (presumably a
337             user-supplied value, and thus unknown at compile-time):
338              
339             $bibfile = Text::BibTeX::File->new("foo.bib");
340             $bibfile->set_structure ($structure);
341             while ($entry = Text::BibTeX::Entry->new($bibfile))
342             {
343             # process $entry
344             }
345              
346             A lot happens behind the scenes with the call to C<$bibfile>'s
347             C<set_structure> method. First, a new structure object is created from
348             C<$structure>. The structure name implies the name of a Perl
349             module---the structure module---which is C<require>'d by the
350             C<Structure> constructor. (The main consequence of this is that any
351             compile-time errors in your structure module will not be revealed until
352             a C<Text::BibTeX::File::set_structure> or
353             C<Text::BibTeX::Structure::new> call attempts to load it.)
354              
355             Recall that the first responsibility of a structure module is to define
356             a structure class. The "structure object" created by the
357             C<set_structure> method call is actually an object of this class; this
358             is the first bit of trickery---the structure object (buried behind the
359             scenes) is blessed into a class whose name is not known until run-time.
360              
361             Now, the behaviour of the C<Text::BibTeX::Entry::new> constructor
362             changes subtly: rather than returning an object blessed into the
363             C<Text::BibTeX::Entry> class as you might expect from the code, the
364             object is blessed into the structured entry class associated with
365             C<$structure>.
366              
367             For example, if the value of C<$structure> is C<"Foo">, that means the
368             user has supplied a module implementing the C<Foo> structure.
369             (Ordinarily, this module would be called C<Text::BibTeX::Foo>---but you
370             can customize this.) Calling the C<set_structure> method on C<$bibfile>
371             will attempt to create a new structure object via the
372             C<Text::BibTeX::Structure> constructor, which loads the structure module
373             C<Text::BibTeX::Foo>. Once this module is successfully loaded, the new
374             object is blessed into its structure class, which will presumably be
375             called C<Text::BibTeX::FooStructure> (again, this is customizable). The
376             new object is supplied with the user's structure options via the
377             C<set_options> method (usually inherited), and then it is asked to
378             describe the actual entry layout by calling its C<describe_entry>
379             method. This, in turn, will usually call the inherited C<set_fields>
380             method for each entry type in the database structure. When the
381             C<Structure> constructor is finished, the new structure object is stored
382             in the C<File> object (remember, we started all this by calling
383             C<set_structure> on a C<File> object) for future reference.
384              
385             Then, when a new C<Entry> object is created and parsed from that
386             particular C<File> object, some more trickery happens. Trivially, the
387             structure object stored in the C<File> object is also stored in the
388             C<Entry> object. (The idea is that entries could belong to a database
389             structure independently of any file, but usually they will just get the
390             structure that was assigned to their database file.) More importantly,
391             the new C<Entry> object is re-blessed into the structured entry class
392             supplied by the structure module---presumably, in this case,
393             C<Text::BibTeX::FooEntry> (also customizable).
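
The re-blessing step relies on nothing more exotic than Perl's two-argument bless, which accepts a class name held in a variable. The sketch below shows the general technique with made-up Demo:: classes. It is not the library's actual code, only an illustration of how an object created as one class can end up answering to methods of a class chosen at run time.

```perl
use strict;
use warnings;

package Demo::Entry;         # hypothetical stand-in for a generic entry class
sub new  { my $class = shift; return bless {}, $class }
sub kind { return 'generic' }

package Demo::FooEntry;      # hypothetical stand-in for a structured entry class
our @ISA = ('Demo::Entry');
sub kind { return 'structured' }

package main;

# In a real program this name would come from the structure object,
# so it is only known at run time.
my $entry_class = 'Demo::FooEntry';

my $entry = Demo::Entry->new;    # starts life as a generic entry
bless $entry, $entry_class;      # re-bless the same object in place

print ref($entry), ' is a ', $entry->kind, " entry\n";
```

Because the structured entry class inherits from the generic one, every method the application could call before re-blessing is still available afterwards.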
394              
395             Once all this sleight-of-hand is accomplished, the application may treat
396             its entry objects as objects of the structured entry class for the
397             C<Foo> structure---they may call the check/coerce methods inherited from
398             C<Text::BibTeX::StructuredEntry>, and they may also call any methods
399             specific to entries for this particular database structure. What these
400             methods might be is up to the structure implementor to decide and
401             document; thus, applications may be specific to one particular database
402             structure, or they may work on all structures that supply certain
403             methods. The choice is up to the application developer, and the range
404             of options open to him depends on which methods structure implementors
405             provide.
406              
407             =head1 EXAMPLE
408              
409             For example code, please refer to the source of the C<Bib> module and
410             the C<btcheck>, C<btsort>, and C<btformat> applications supplied with
411             C<Text::BibTeX>.
412              
413             =head1 METHODS 1: BASE STRUCTURE CLASS
414              
415             The first class provided by the C<Text::BibTeX::Structure> module is
416             C<Text::BibTeX::Structure>. This class is intended to provide methods
417             that will be inherited by user-supplied structure classes; such classes
418             should not override any of the methods described here (except
419             C<known_option> and C<default_option>) without very good reason.
420             Furthermore, overriding the C<new> method would be useless, because in
421             general applications won't know the name of your structure class---they
422             can only call C<Text::BibTeX::Structure::new> (usually via
423             C<Text::BibTeX::File::set_structure>).
424              
425             Finally, there are three methods that structure classes should
426             implement: C<known_option>, C<default_option>, and C<describe_entry>.
427             The first two are described in L<"Structure options"> above, the latter
428             in L<"Field lists and constraint sets">. Note that C<describe_entry>
429             depends heavily on the C<set_fields>, C<add_fields>, and
430             C<add_constraints> methods described here.
431              
432             =head2 Constructor/simple query methods
433              
434             =over 4
435              
436             =item new (STRUCTURE, [OPTION =E<gt> VALUE, ...])
437              
438             Constructs a new structure object---I<not> a C<Text::BibTeX::Structure>
439             object, but rather an object blessed into the structure class associated
440             with STRUCTURE. More precisely:
441              
442             =over 4
443              
444             =item *
445              
446             Loads (with C<require>) the module implementing STRUCTURE. In the
447             absence of other information, the module name is derived by appending
448             STRUCTURE to C<"Text::BibTeX::">---thus, the module C<Text::BibTeX::Foo>
449             implements the C<Foo> structure. Use the pseudo-option C<module> to
450             override this module name. For instance, if the structure C<Foo> is
451             implemented by the module C<Foo>:
452              
453             $structure = Text::BibTeX::Structure->new
454             ('Foo', module => 'Foo');
455              
456             This method C<die>s if there are any errors loading/compiling the
457             structure module.
458              
459             =item *
460              
461             Verifies that the structure module provides a structure class and a
462             structured entry class. The structure class is named by appending
463             C<"Structure"> to the name of the module, and the structured entry class
464             by appending C<"Entry">. Thus, in the absence of a C<module> option,
465             these two classes (for the C<Foo> structure) would be named
466             C<Text::BibTeX::FooStructure> and C<Text::BibTeX::FooEntry>. Either or
467             both of the default class names may be overridden by having the
468             structure module return a reference to a hash (as opposed to the
469             traditional C<1> returned by modules). This hash could then supply a
470             C<structure_class> element to name the structure class, and an
471             C<entry_class> element to name the structured entry class.
472              
473             Apart from ensuring that the two classes actually exist, C<new> verifies
474             that they inherit correctly (from C<Text::BibTeX::Structure> and
475             C<Text::BibTeX::StructuredEntry> respectively), and that the structure
476             class provides the required C<known_option>, C<default_option>, and
477             C<describe_entry> methods.
478              
479             =item *
480              
481             Creates the new structure object, and blesses it into the structure
482             class. Supplies it with options by passing all (OPTION, VALUE) pairs to
483             its C<set_options> method. Calls its C<describe_entry> method, which
484             should list the field requirements for all entry types recognized by
485             this structure. C<describe_entry> will most likely use some or all of
486             the C<set_fields>, C<add_fields>, and C<add_constraints>
487             methods---described below---for this.
488              
489             =back
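
The class-naming rules in the list above can be condensed into a small standalone function. The sketch below mirrors the logic of the constructor shown in this file, but resolve_classes is a hypothetical helper written for illustration, and My::FooEntry is an invented class name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the naming rules described above.
# $module_info is whatever the structure module returned when it was
# loaded: a plain true value, or a hash reference that may contain
# 'structure_class' and/or 'entry_class' elements.
sub resolve_classes {
    my ($module, $module_info) = @_;

    my ($structure_class, $entry_class);
    if (ref $module_info eq 'HASH') {
        $structure_class = $module_info->{structure_class};
        $entry_class     = $module_info->{entry_class};
    }

    # Fall back to the default names derived from the module name
    $structure_class ||= $module . 'Structure';
    $entry_class     ||= $module . 'Entry';

    return ($structure_class, $entry_class);
}

# Module returned the traditional 1: both defaults apply
my @default = resolve_classes('Text::BibTeX::Foo', 1);

# Module returned a hash ref overriding only the entry class
my @custom = resolve_classes('Text::BibTeX::Foo',
                             { entry_class => 'My::FooEntry' });

print "default: @default\n";
print "custom:  @custom\n";
```

A structure module that wants non-default names would therefore end with a hash reference as its final statement instead of the usual 1.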
490              
491             =cut
492              
493             sub new
494             {
495 1     1 1 4 my ($type, $name, %options) = @_;
496              
497             # - $type is presumably "Text::BibTeX::Structure" (if called from
498             # Text::BibTeX::File::set_structure), but shouldn't assume that
499             # - $name is the name of the user-supplied structure; it also
500             # determines the module we will attempt to load here, unless
501             # a 'module' option is given in %options
502             # - %options is a mix of options recognized here (in particular
503             # 'module'), by Text::BibTeX::StructuredEntry (? 'check', 'coerce',
504             # 'warn' flags), and by the user structure classes
505              
506 1   33     8 my $module = (delete $options{'module'}) || ('Text::BibTeX::' . $name);
507              
508 1         62 my $module_info = eval "require $module";
509 1 50       7 die "Text::BibTeX::Structure: unable to load module \"$module\" for " .
510             "user structure \"$name\": $@\n"
511             if $@;
512              
513 1         3 my ($structure_class, $entry_class);
514 1 50       5 if (ref $module_info eq 'HASH')
515             {
516 0         0 $structure_class = $module_info->{'structure_class'};
517 0         0 $entry_class = $module_info->{'entry_class'};
518             }
519 1   33     9 $structure_class ||= $module . 'Structure';
520 1   33     6 $entry_class ||= $module . 'Entry';
521              
522 1         7 check_class ($structure_class, "user structure class",
523             'Text::BibTeX::Structure',
524             ['known_option', 'default_option', 'describe_entry']);
525 1         5 check_class ($entry_class, "user entry class",
526             'Text::BibTeX::StructuredEntry',
527             []);
528              
529 1         3 my $self = bless {}, $structure_class;
530 1         6 $self->{entry_class} = $entry_class;
531 1         3 $self->{name} = $name;
532 1         8 $self->set_options (%options); # these methods are both provided by
533 1         5 $self->describe_entry; # the user structure class
534 1         6 $self;
535             }
536              
537              
538             =item name ()
539              
540             Returns the name of the structure described by the object.
541              
542             =item entry_class ()
543              
544             Returns the name of the structured entry class associated with this
545             structure.
546              
547             =back
548              
549             =cut
550              
551 0     0 1 0 sub name { shift->{'name'} }
552              
553 2     2 1 8 sub entry_class { shift->{'entry_class'} }
554              
555              
556             =head2 Field structure description methods
557              
558             =over 4
559              
560             =item add_constraints (TYPE, CONSTRAINT, ...)
561              
562             Adds one or more field constraints to the structure. A field constraint
563             is specified as a reference to a three-element list; the last element is
564             a reference to the list of fields affected, and the first two elements
565             are the minimum and maximum number of fields from the constraint set
566             allowed in an entry of type TYPE. See L<"Field lists and constraint
567             sets"> for a full explanation of field constraints.
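
To make the shape of a valid constraint record concrete, here is a standalone sketch of the two sanity checks that the implementation below performs before accepting a record. The function constraint_error is hypothetical. It returns a message instead of calling croak so that it can be tried out in isolation.

```perl
use strict;
use warnings;

# Hypothetical stand-alone validator: returns an error message for a
# malformed constraint record, or nothing if the record is acceptable.
sub constraint_error {
    my ($constraint) = @_;

    # Must be [MIN, MAX, [FIELD, ...]]
    return 'constraint record must be a 3-element list, ' .
           'with the last element a list ref'
        unless ref $constraint eq 'ARRAY'
            && @$constraint == 3
            && ref $constraint->[2] eq 'ARRAY';

    my ($min, $max, $fields) = @$constraint;

    # Must satisfy 0 <= MIN <= MAX <= number of fields in the set
    return "constraint record must have 0 <= 'min' <= 'max' " .
           "<= length of field list"
        unless $min >= 0 && $max >= $min && $max <= @$fields;

    return;
}

for my $record ([1, 1, ['author', 'editor']],
                [0, 3, ['volume', 'number']]) {
    my $error = constraint_error($record);
    print defined $error ? "rejected: $error\n" : "accepted\n";
}
```

The second record is rejected because a maximum of 3 can never be reached with only two fields in the set.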
568              
569             =cut
570              
571             sub add_constraints
572             {
573 14     14 1 38 my ($self, $type, @constraints) = @_;
574 14         17 my ($constraint);
575              
576 14         27 foreach $constraint (@constraints)
577             {
578 9         15 my ($min, $max, $fields) = @$constraint;
579 9 50 33     36 croak "add_constraints: constraint record must be a 3-element " .
580             "list, with the last element a list ref"
581             unless (@$constraint == 3 && ref $fields eq 'ARRAY');
582 9 50 33     41 croak "add_constraints: constraint record must have 0 <= 'min' " .
      33        
583             "<= 'max' <= length of field list"
584             unless ($min >= 0 && $max >= $min && $max <= @$fields);
585 9         13 map { $self->{fields}{$type}{$_} = $constraint } @$fields;
  18         43  
586             }
587 14         19 push (@{$self->{fieldgroups}{$type}{'constraints'}}, @constraints);
  14         43  
588              
589             } # add_constraints
590              
591              
592             =item add_fields (TYPE, REQUIRED [, OPTIONAL [, CONSTRAINT, ...]])
593              
594             Adds fields to the required/optional lists for entries of type TYPE.
595             Can also add field constraints, but you can just as easily use
596             C<add_constraints> for that.
597              
598             REQUIRED and OPTIONAL, if defined, should be references to lists of
599             fields to add to the respective field lists. The CONSTRAINTs, if given,
600             are exactly as described for C<add_constraints> above.
601              
602             =cut
603              
604             sub add_fields # add fields for a particular type
605             {
606 0     0 1 0 my ($self, $type, $required, $optional, @constraints) = @_;
607              
608             # to be really robust and inheritance-friendly, we should:
609             # - check that no field is in > 1 list (just check $self->{fields}
610             # before we start assigning stuff)
611             # - allow sub-classes to delete fields or move them to another group
612              
613 0 0       0 if ($required)
614             {
615 0         0 push (@{$self->{fieldgroups}{$type}{'required'}}, @$required);
  0         0  
616 0         0 map { $self->{fields}{$type}{$_} = 'required' } @$required;
  0         0  
617             }
618              
619 0 0       0 if ($optional)
620             {
621 0         0 push (@{$self->{fieldgroups}{$type}{'optional'}}, @$optional);
  0         0  
622 0         0 map { $self->{fields}{$type}{$_} = 'optional' } @$optional;
  0         0  
623             }
624              
625 0         0 $self->add_constraints ($type, @constraints);
626              
627             } # add_fields
628              
629              
630             =item set_fields (TYPE, REQUIRED [, OPTIONAL [, CONSTRAINTS, ...]])
631              
632             Sets the lists of required/optional fields for entries of type TYPE.
633             Identical to C<add_fields>, except that the field lists and list of
634             constraints are set from scratch here, rather than being added to.
635              
636             =back
637              
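The difference between adding to a field list and setting it from scratch can be sketched on a plain hash. This is an illustration only: C<add_required> and C<set_required> are hypothetical stand-ins that model just the C<required> list, and the sketch copies the list it is given, whereas C<set_fields> below stores the reference passed in.

```perl
use strict;
use warnings;

# Hypothetical stand-ins working on a plain hash rather than a
# Text::BibTeX::Structure object; only the 'required' list is modelled.
sub add_required
{
   my ($structure, $type, $required) = @_;
   # append to whatever is already there (autovivifies an empty list)
   push @{ $structure->{fieldgroups}{$type}{required} }, @$required;
}

sub set_required
{
   my ($structure, $type, $required) = @_;
   # discard the old list and start again from a copy of the new one
   $structure->{fieldgroups}{$type}{required} = [@$required];
}

my %structure;
set_required (\%structure, 'book', ['title', 'year']);
add_required (\%structure, 'book', ['publisher']);     # list grows
my @after_add = @{ $structure{fieldgroups}{book}{required} };

set_required (\%structure, 'book', ['title']);         # list is replaced
my @after_set = @{ $structure{fieldgroups}{book}{required} };
```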
638             =cut
639              
640             sub set_fields
641             {
642 14     14 1 29 my ($self, $type, $required, $optional, @constraints) = @_;
643 14         23 my ($constraint, $field);
644              
645 14         17 undef %{$self->{fields}{$type}};
  14         41  
646              
647 14 50       32 if ($required)
648             {
649 14         37 $self->{fieldgroups}{$type}{'required'} = $required;
650 14         24 map { $self->{fields}{$type}{$_} = 'required' } @$required;
  41         88  
651             }
652              
653 14 50       25 if ($optional)
654             {
655 14         24 $self->{fieldgroups}{$type}{'optional'} = $optional;
656 14         20 map { $self->{fields}{$type}{$_} = 'optional' } @$optional;
  82         166  
657             }
658              
659 14         20 undef @{$self->{fieldgroups}{$type}{'constraints'}};
  14         28  
660 14         64 $self->add_constraints ($type, @constraints);
661              
662             } # set_fields
663              
664              
665             =head2 Field structure query methods
666              
667             =over 4
668              
669             =item types ()
670              
671             Returns the list of entry types supported by the structure.
672              
673             =item known_type (TYPE)
674              
675             Returns true if TYPE is a supported entry type.
676              
677             =item known_field (TYPE, FIELD)
678              
679             Returns true if FIELD is in the required list, optional list, or one of
680             the constraint sets for entries of type TYPE.
681              
682             =item required_fields (TYPE)
683              
684             Returns the list of required fields for entries of type TYPE.
685              
686             =item optional_fields (TYPE)
687              
688             Returns the list of optional fields for entries of type TYPE.
689              
690             =item field_constraints (TYPE)
691              
692             Returns the list of field constraints (in the format supplied to
693             C<add_constraints>) for entries of type TYPE.
694              
695             =back
696              
697             =cut
698              
699             sub types
700             {
701 0     0 1 0 my $self = shift;
702              
703 0         0 keys %{$self->{'fieldgroups'}};
  0         0  
704             }
705              
706             sub known_type
707             {
708 0     0 1 0 my ($self, $type) = @_;
709              
710 0         0 exists $self->{'fieldgroups'}{$type};
711             }
712              
713             sub _check_type
714             {
715 0     0   0 my ($self, $type) = @_;
716              
717             croak "unknown entry type \"$type\" for $self->{'name'} structure"
718 0 0       0 unless exists $self->{'fieldgroups'}{$type};
719             }
720              
721             sub known_field
722             {
723 0     0 1 0 my ($self, $type, $field) = @_;
724              
725 0         0 $self->_check_type ($type);
726 0         0 $self->{'fields'}{$type}{$field}; # either 'required', 'optional', or
727             } # a constraint record (or undef!)
728              
729             sub required_fields
730             {
731 0     0 1 0 my ($self, $type) = @_;
732              
733 0         0 $self->_check_type ($type);
734 0         0 @{$self->{'fieldgroups'}{$type}{'required'}};
  0         0  
735             }
736              
737             sub optional_fields
738             {
739 0     0 1 0 my ($self, $type) = @_;
740              
741 0         0 $self->_check_type ($type);
742 0         0 @{$self->{'fieldgroups'}{$type}{'optional'}};
  0         0  
743             }
744              
745             sub field_constraints
746             {
747 0     0 1 0 my ($self, $type) = @_;
748              
749 0         0 $self->_check_type ($type);
750 0         0 @{$self->{'fieldgroups'}{$type}{'constraints'}};
  0         0  
751             }
752              
753              
754             =head2 Option methods
755              
756             =over 4
757              
758             =item known_option (OPTION)
759              
760             Returns false. This is mainly for the use of derived structures that
761             don't have any options, and thus don't need to provide their own
762             C<known_option> method. Structures that actually offer options should
763             override this method; it should return true if OPTION is a supported
764             option.
765              
766             =cut
767              
768             sub known_option
769             {
770 0     0 1 0 return 0;
771             }
772              
773              
774             =item default_option (OPTION)
775              
776             Crashes with an "unknown option" message. Again, this is mainly for use
777             by derived structure classes that don't actually offer any options.
778             Structures that handle options should override this method; every option
779             handled by C<known_option> should have a default value (which might just
780             be C<undef>) that is returned by C<default_option>. Your
781             C<default_option> method should crash on an unknown option, perhaps by
782             calling C<SUPER::default_option> (in order to ensure consistent error
783             messages). For example:
784              
785             sub default_option
786             {
787             my ($self, $option) = @_;
788             return $default_options{$option}
789             if exists $default_options{$option};
790             $self->SUPER::default_option ($option); # crash
791             }
792              
793             The default value for an option is returned by C<get_options> when that
794             option has not been explicitly set with C<set_options>.
795              
796             =cut
797              
798             sub default_option
799             {
800 0     0 1 0 my ($self, $option) = @_;
801              
802 0         0 croak "unknown option \"$option\" for structure \"$self->{'name'}\"";
803             }
804              
805              
806             =item set_options (OPTION =E<gt> VALUE, ...)
807              
808             Sets one or more option values. (You can supply as many
809             C<OPTION =E<gt> VALUE> pairs as you like, as long as there is an even
810             number of arguments.) Each OPTION must be handled by the structure
811             module (as indicated by the C<known_option> method); if not,
812             C<set_options> will C<croak>. Each VALUE may be any scalar value; it's
813             up to the structure module to validate them.
814              
815             =cut
816              
817             sub set_options
818             {
819 9     9 1 16166 my $self = shift;
820 9         20 my ($option, $value);
821              
822 9 50       27 croak "must supply an even number of arguments (option/value pairs)"
823             unless @_ % 2 == 0;
824 9         23 while (@_)
825             {
826 10         24 ($option, $value) = (shift, shift);
827 10 50       29 croak "unknown option \"$option\" for structure \"$self->{'name'}\""
828             unless $self->known_option ($option);
829 10         35 $self->{'options'}{$option} = $value;
830             }
831             }
832              
833              
834             =item get_options (OPTION, ...)
835              
836             Returns the value(s) of one or more options. Any OPTION that has not
837             been set by C<set_options> will return its default value, fetched using
838             the C<default_option> method. If OPTION is not supported by the
839             structure module, then your program either already crashed (when it
840             tried to set it with C<set_options>), or it will crash here (thanks to
841             calling C<default_option>).
842              
843             =back
844              
845             =cut
846              
847             sub get_options
848             {
849 49     49 1 68 my $self = shift;
850 49         80 my ($options, $option, $value, @values);
851              
852 49         72 $options = $self->{'options'};
853 49         107 while (@_)
854             {
855 58         83 $option = shift;
856             $value = (exists $options->{$option})
857 58 100       141 ? $options->{$option}
858             : $self->default_option ($option);
859 58         136 push (@values, $value);
860             }
861              
862 49 100       147 wantarray ? @values : $values[0];
863             }
864              
865              
866              
867             # ----------------------------------------------------------------------
868             # Text::BibTeX::StructuredEntry methods dealing with entry structure
869              
870             package Text::BibTeX::StructuredEntry;
871 1     1   9 use strict;
  1         2  
  1         34  
872 1     1   6 use vars qw(@ISA $VERSION);
  1         2  
  1         88  
873             $VERSION = 0.87;
874              
875 1     1   7 use Carp;
  1         2  
  1         70  
876              
877             @ISA = ('Text::BibTeX::Entry');
878 1     1   6 use Text::BibTeX qw(:metatypes display_list);
  1         2  
  1         867  
879              
880             =head1 METHODS 2: BASE STRUCTURED ENTRY CLASS
881              
882             The other class provided by the C<Text::BibTeX::Structure> module is
883             C<StructuredEntry>, the base class for all structured entry classes.
884             This class inherits from C<Text::BibTeX::Entry>, so all of its entry
885             query/manipulation methods are available. C<StructuredEntry> adds
886             methods for checking that an entry conforms to the database structure
887             defined by a structure class.
888              
889             It only makes sense for C<StructuredEntry> to be used as a base class;
890             you would never create standalone C<StructuredEntry> objects. The
891             superficial reason for this is that only particular structured-entry
892             classes have an actual structure class associated with them;
893             C<StructuredEntry> on its own doesn't have any information about allowed
894             types, required fields, field constraints, and so on. For a deeper
895             understanding, consult L<"CLASS INTERACTIONS"> above.
896              
897             Since C<StructuredEntry> derives from C<Entry>, it naturally operates on
898             BibTeX entries. Hence, the following descriptions refer to "the
899             entry"---this is just the object (entry) being operated on. Note that
900             these methods are presented in bottom-up order, meaning that the methods
901             you're most likely to actually use---C<check>, C<coerce>, and
902             C<silently_coerce>---are at the bottom. On a first reading, you'll
903             probably want to skip down to them for a quick summary.
904              
905             =over 4
906              
907             =item structure ()
908              
909             Returns the object that defines the structure to which the entry is
910             supposed to conform. This will be an instantiation of some structure
911             class, and exists mainly so the check/coerce methods can query the
912             structure about the types and fields it recognizes. If, for some
913             reason, you wanted to query an entry's structure about the validity of
914             type C<foo>, you might do this:
915              
916             # assume $entry is an object of some structured entry class, i.e.
917             # it inherits from Text::BibTeX::StructuredEntry
918             $structure = $entry->structure;
919             $foo_known = $structure->known_type ('foo');
920              
921             =cut
922              
923             sub structure
924             {
925 49     49   68 my $self = shift;
926 49         138 $self->{'structure'};
927             }
928              
929              
930             =item check_type ([WARN])
931              
932             Returns true if the entry has a valid type according to its structure.
933             If WARN is true, then an invalid type results in a warning being
934             printed.
935              
936             =cut
937              
938             sub check_type
939             {
940 0     0     my ($self, $warn) = @_;
941              
942 0           my $type = $self->{'type'};
943 0 0         if (! $self->{'structure'}->known_type ($type))
944             {
945 0 0         $self->warn ("unknown entry type \"$type\"") if $warn;
946 0           return 0;
947             }
948 0           return 1;
949             }
950              
951              
952             =item check_required_fields ([WARN [, COERCE]])
953              
954             Checks that all required fields are present in the entry. If WARN is
955             true, then a warning is printed for every missing field. If COERCE is
956             true, then missing fields are set to the empty string.
957              
958             This isn't generally used by other code; see the C<check> and C<coerce>
959             methods below.
960              
961             =cut
962              
963             sub check_required_fields
964             {
965 0     0     my ($self, $warn, $coerce) = @_;
966 0           my ($field, $warning);
967 0           my $num_errors = 0;
968            
969 0           foreach $field ($self->{'structure'}->required_fields ($self->type))
970             {
971 0 0         if (! $self->exists ($field))
972             {
973 0 0         $warning = "required field '$field' not present" if $warn;
974 0 0         if ($coerce)
975             {
976 0 0         $warning .= " (setting to empty string)" if $warn;
977 0           $self->set ($field, '');
978             }
979 0 0         $self->warn ($warning) if $warn;
980 0           $num_errors++;
981             }
982             }
983            
984             # Coercion is always successful, so if $coerce is true return true.
985             # Otherwise, return true if no errors found.
986              
987 0   0       return $coerce || ($num_errors == 0);
988              
989             } # check_required_fields
990              
991              
992             =item check_field_constraints ([WARN [, COERCE]])
993              
994             Checks that the entry conforms to all of the field constraints imposed
995             by its structure. Recall that a field constraint consists of a list of
996             fields, and a minimum and maximum number of those fields that must be
997             present in an entry. For each constraint, C<check_field_constraints>
998             simply counts how many fields in the constraint's field set are present.
999             If this count falls below the minimum or above the maximum for that
1000             constraint and WARN is true, a warning is issued. In general, this
1001             warning is of the form "between x and y of fields foo, bar, and baz must
1002             be present". The more common cases are handled specially to generate
1003             more useful and human-friendly warning messages.
1004              
1005             If COERCE is true, then the entry is modified to force it into
1006             conformance with all field constraints. How this is done depends on
1007             whether the violation is a matter of not enough fields present in the
1008             entry, or of too many fields present. In the former case, just enough
1009             fields are added (as empty strings) to meet the requirements of the
1010             constraint; in the latter case, fields are deleted. Which fields to add
1011             or delete is controlled by the order of fields in the constraint's field
1012             list.
1013              
1014             An example should clarify this. For instance, a field constraint
1015             specifying that exactly one of C<author> or C<editor> must appear in an
1016             entry would look like this:
1017              
1018             [1, 1, ['author', 'editor']]
1019              
1020             Suppose the following entry is parsed and expected to conform to this
1021             structure:
1022              
1023             @inbook{unknown:1997a,
1024             title = "An Unattributed Book Chapter",
1025             booktitle = "An Unedited Book",
1026             publisher = "Foo, Bar \& Company",
1027             year = 1997
1028             }
1029              
1030             If C<check_field_constraints> is called on this entry with COERCE true
1031             (which is done by any of the C<full_check>, C<coerce>, and
1032             C<silently_coerce> methods), then the C<author> field is set to the
1033             empty string. (We go through the list of fields in the constraint's
1034             field set in order -- since C<author> is the first missing field, we
1035             supply it; with that done, the entry now conforms to the
1036             C<author>/C<editor> constraint, so we're done.)
1037              
1038             However, if the same structure was applied to this entry:
1039              
1040             @inbook{smith:1997a,
1041             author = "John Smith",
1042             editor = "Fred Jones",
1043             ...
1044             }
1045              
1046             then the C<editor> field would be deleted. In this case, we allow the
1047             first field in the constraint's field list---C<author>. Since only one
1048             field from the set may be present, all fields after the first one are in
1049             violation, so they are deleted.
1050              
1051             Again, this method isn't generally used by other code; rather, it is
1052             called by C<check> and its friends below.
1053              
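The coercion rule described above can be reproduced on a plain hash. This is a sketch under stated assumptions: C<coerce_constraint> is a hypothetical helper, the entry is modelled as a hash of field names to values rather than a C<StructuredEntry> object, and warnings are left out. The two list slices are the same ones used by the method below.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::BibTeX: forces the hash $entry
# into conformance with one [MIN, MAX, [FIELD, ...]] constraint.
sub coerce_constraint
{
   my ($entry, $constraint) = @_;
   my ($min, $max, $fields) = @$constraint;

   # count how many fields of the constraint set are present
   my $num_seen = grep { exists $entry->{$_} } @$fields;

   if ($num_seen < $min)
   {
      # too few: blank the fields at positions $num_seen .. $min-1
      $entry->{$_} = '' for @{$fields}[$num_seen .. $min-1];
   }
   elsif ($num_seen > $max)
   {
      # too many: delete the fields at positions $max .. $num_seen-1
      delete @{$entry}{ @{$fields}[$max .. $num_seen-1] };
   }
   return $entry;
}

my $constraint = [1, 1, ['author', 'editor']];

# neither field present: 'author' is supplied as an empty string
my $unattributed = coerce_constraint (
   { title => 'An Unattributed Book Chapter' }, $constraint);

# both fields present: 'editor' is deleted, 'author' is kept
my $two_names = coerce_constraint (
   { author => 'John Smith', editor => 'Fred Jones' }, $constraint);
```

Note that both slices select fields by position in the constraint's field list, so the order in which fields are listed determines which ones are blanked or deleted.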
1054             =cut
1055              
1056             sub check_field_constraints
1057             {
1058 0     0     my ($self, $warn, $coerce) = @_;
1059              
1060 0           my $num_errors = 0;
1061 0           my $constraint;
1062              
1063 0           foreach $constraint ($self->{'structure'}->field_constraints ($self->type))
1064             {
1065 0           my ($warning);
1066 0           my ($min, $max, $fields) = @$constraint;
1067              
1068 0           my $field;
1069 0           my $num_seen = 0;
1070 0 0         map { $num_seen++ if $self->exists ($_) } @$fields;
  0            
1071              
1072 0 0 0       if ($num_seen < $min || $num_seen > $max)
1073             {
1074 0 0         if ($warn)
1075             {
1076 0 0 0       if ($min == 0 && $max > 0)
    0 0        
    0          
1077             {
1078 0           $warning = sprintf ("at most %d of fields %s may be present",
1079             $max, display_list ($fields, 1));
1080             }
1081             elsif ($min < @$fields && $max == @$fields)
1082             {
1083 0           $warning = sprintf ("at least %d of fields %s must be present",
1084             $min, display_list ($fields, 1));
1085             }
1086             elsif ($min == $max)
1087             {
1088 0 0         $warning = sprintf ("exactly %d of fields %s %s be present",
1089             $min, display_list ($fields, 1),
1090             ($num_seen < $min) ? "must" : "may");
1091             }
1092             else
1093             {
1094 0           $warning = sprintf ("between %d and %d of fields %s " .
1095             "must be present",
1096             $min, $max, display_list ($fields, 1))
1097             }
1098             }
1099              
1100 0 0         if ($coerce)
1101             {
1102 0 0         if ($num_seen < $min)
    0          
1103             {
1104 0           my @blank = @{$fields}[$num_seen .. ($min-1)];
  0            
1105 0 0         $warning .= sprintf (" (setting %s to empty string)",
1106             display_list (\@blank, 1))
1107             if $warn;
1108 0           @blank = map (($_, ''), @blank);
1109 0           $self->set (@blank);
1110             }
1111             elsif ($num_seen > $max)
1112             {
1113 0           my @delete = @{$fields}[$max .. ($num_seen-1)];
  0            
1114 0 0         $warning .= sprintf (" (deleting %s)",
1115             display_list (\@delete, 1))
1116             if $warn;
1117 0           $self->delete (@delete);
1118             }
1119             } # if $coerce
1120              
1121 0 0         $self->warn ($warning) if $warn;
1122 0           $num_errors++;
1123             } # if $num_seen out-of-range
1124              
1125             } # foreach $constraint
1126              
1127             # Coercion is always successful, so if $coerce is true return true.
1128             # Otherwise, return true if no errors found.
1129              
1130 0   0       return $coerce || ($num_errors == 0);
1131              
1132             } # check_field_constraints
1133              
1134              
1135             =item full_check ([WARN [, COERCE]])
1136              
1137             Returns true if an entry's type and fields are all valid. That is, it
1138             calls C<check_type>, C<check_required_fields>, and
1139             C<check_field_constraints>; if all of them return true, then so does
1140             C<full_check>. WARN and COERCE are simply passed on to the three
1141             C<check_*> methods: the first controls the printing of warnings, and the
1142             second decides whether we should modify the entry to force it into
1143             conformance.
1144              
1145             =cut
1146              
1147             sub full_check
1148             {
1149 0     0     my ($self, $warn, $coerce) = @_;
1150              
1151 0 0         return 1 unless $self->metatype == &BTE_REGULAR;
1152 0 0         return unless $self->check_type ($warn);
1153 0   0       return $self->check_required_fields ($warn, $coerce) &&
1154             $self->check_field_constraints ($warn, $coerce);
1155             }
1156              
1157              
1158             # Front ends for full_check -- there are actually four possible wrappers,
1159             # but having both $warn and $coerce false is pointless.
1160              
1161             =item check ()
1162              
1163             Checks that the entry conforms to the requirements of its associated
1164             database structure: the type must be known, all required fields must be
1165             present, and all field constraints must be met. See C<check_type>,
1166             C<check_required_fields>, and C<check_field_constraints> for details.
1167              
1168             Calling C<check> is the same as calling C<full_check> with WARN true and
1169             COERCE false.
1170              
1171             =item coerce ()
1172              
1173             Same as C<check>, except entries are coerced into conformance with the
1174             database structure---that is, it's just like C<full_check> with both
1175             WARN and COERCE true.
1176              
1177             =item silently_coerce ()
1178              
1179             Same as C<coerce>, except warnings aren't printed---that is, it's just
1180             like C<full_check> with WARN false and COERCE true.
1181              
1182             =back
1183              
1184             =cut
1185              
1186 0     0     sub check { shift->full_check (1, 0) }
1187              
1188 0     0     sub coerce { shift->full_check (1, 1) }
1189              
1190 0     0     sub silently_coerce { shift->full_check (0, 1) }
1191              
1192             1;
1193              
1194             =head1 SEE ALSO
1195              
1196             L<Text::BibTeX>, L<Text::BibTeX::Entry>, L<Text::BibTeX::File>
1197              
1198             =head1 AUTHOR
1199              
1200             Greg Ward
1201              
1202             =head1 COPYRIGHT
1203              
1204             Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file
1205             is part of the Text::BibTeX library. This library is free software; you
1206             may redistribute it and/or modify it under the same terms as Perl itself.