# ----------------------------------------------------------------------
# NAME       : BibTeX/NameFormat.pm
# CLASSES    : Text::BibTeX::NameFormat
# RELATIONS  :
# DESCRIPTION: Provides a way to format already-parsed BibTeX-style
#              author names.  (The parsing is done by the
#              Text::BibTeX::Name class.)
# CREATED    : Nov 1997, Greg Ward
# MODIFIED   :
# VERSION    : $Id$
# COPYRIGHT  : Copyright (c) 1997-2000 by Gregory P. Ward.  All rights
#              reserved.
#
#              This file is part of the Text::BibTeX library.  This
#              library is free software; you may redistribute it and/or
#              modify it under the same terms as Perl itself.
# ----------------------------------------------------------------------

package Text::BibTeX::NameFormat;

require 5.004;

use strict;
use Carp;
use vars qw'$VERSION';
$VERSION = 0.87;

=head1 NAME

Text::BibTeX::NameFormat - format BibTeX-style author names

=head1 SYNOPSIS

   use Text::BibTeX::NameFormat;

   $format = Text::BibTeX::NameFormat->new ($parts, $abbrev_first);

   $format->set_text ($part,
                      $pre_part, $post_part,
                      $pre_token, $post_token);

   $format->set_options ($part, $abbrev, $join_tokens, $join_part);

   ## Uses the encoding/binmode and normalization form stored in $name
   $formatted_name = $format->apply ($name);

=head1 DESCRIPTION

After splitting a name into its component parts (represented as a
C<Text::BibTeX::Name> object), you often want to put it back together
again as a single string formatted in a consistent way.
C<Text::BibTeX::NameFormat> provides a very flexible way to do this,
generally in two stages: first, you create a "name format" which
describes how to put the tokens and parts of any name back together, and
then you apply the format to a particular name.

The "name format" is encapsulated in a C<Text::BibTeX::NameFormat>
object.  The constructor (C<new>) chooses sensible defaults behind the
scenes, so you can usually get away with calling it alone, and not need
to do any customization of the format object.  If you do need to
customize the format, though, the C<set_text> and C<set_options>
methods provide that capability.

Note that C<Text::BibTeX::NameFormat> is a fairly direct translation of
the name-formatting C interface in the B<btparse> library.  This manual
page is meant to provide enough information to use the Perl class, but
for more details and examples, consult L<bt_format_names>.

=head1 CONSTANTS

Two enumerated types for dealing with names and name formatting have
been brought from C into Perl.  In the B<btparse> documentation, you'll
see references to C<bt_namepart> and C<bt_joinmethod>.  The former lists
the four "parts" of a BibTeX name: first, von, last, and jr; its values
(in both C and Perl) are C<BTN_FIRST>, C<BTN_VON>, C<BTN_LAST>, and
C<BTN_JR>.  The latter lists the ways in which C<bt_format_name()> (the
C function that corresponds to C<Text::BibTeX::NameFormat>'s C<apply>
method) can join adjacent tokens together: C<BTJ_MAYTIE>, C<BTJ_SPACE>,
C<BTJ_FORCETIE>, and C<BTJ_NOTHING>.  Both sets of values may be
imported from the C<Text::BibTeX> module, using the import tags
C<nameparts> and C<joinmethods>.  For instance:

   use Text::BibTeX qw(:nameparts :joinmethods);
   use Text::BibTeX::Name;
   use Text::BibTeX::NameFormat;

The "name part" constants are used to specify surrounding text or
formatting options on a per-part basis: for instance, you can supply the
"pre-token" text, or the "abbreviate" flag, for a single part without
affecting other parts.  The "join methods" are two of the three
formatting options that you can set for a part: you can control how to
join the individual tokens of a name (C<"JR Smith">, or C<"J R Smith">,
or C<"J~R Smith">), and you can control how the final token of one part
is joined to the next part (C<"la Roche"> versus C<"la~Roche">).
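
To make the token-join choices concrete, here is a minimal pure-Perl
sketch of three of the join methods.  It is an illustration only: the
C<%JOIN_TEXT> table and the C<join_tokens> helper are hypothetical and
not part of this module, and the discretionary method (C<BTJ_MAYTIE>) is
left out because its rules are documented in L<bt_format_names> rather
than here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for three of the join methods.  The real
# constants are exported by Text::BibTeX; their actual values are not
# assumed here.
my %JOIN_TEXT = (
   nothing  => '',     # cf. BTJ_NOTHING
   space    => ' ',    # cf. BTJ_SPACE
   forcetie => '~',    # cf. BTJ_FORCETIE (a TeX tie)
);

# Join the tokens of one name part with the text for the given method.
sub join_tokens
{
   my ($method, @tokens) = @_;
   die "unknown join method '$method'\n" unless exists $JOIN_TEXT{$method};
   return join ($JOIN_TEXT{$method}, @tokens);
}

for my $method (qw(nothing space forcetie))
{
   print join_tokens ($method, 'J', 'R'), " Smith\n";
}
# prints "JR Smith", "J R Smith" and "J~R Smith", one per line
```

With the real class, the same choice is made per part through the
JOIN_TOKENS argument of C<set_options>.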

=head1 METHODS

=over 4

=item new(PARTS, ABBREV_FIRST)

Creates a new name format, with the two most common customizations: which
parts to include (and in what order), and whether to abbreviate the first
name.  PARTS should be a string with at most four characters, one representing
each part that you want to occur in a formatted name (defaults to C<"fvlj">).
For example, C<"fvlj"> means to format names in "first von last jr" order,
while C<"vljf"> denotes "von last jr first."  ABBREV_FIRST is just a boolean
value: false to print out the first name in full, and true to abbreviate it
with periods after each token and discretionary ties between tokens (defaults
to false).  All intra- and inter-token punctuation and spacing is independently
controllable with the C<set_text> and C<set_options> methods, although these
will rarely be necessary: sensible defaults are chosen for everything, based
on the PARTS and ABBREV_FIRST values that you supply.  See the description of
C<bt_create_name_format()> in L<bt_format_names> for full details of the
choices made.
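
As an illustration of the argument handling described above, here is a
minimal pure-Perl sketch of how PARTS and ABBREV_FIRST could be
defaulted and checked before a format is created.  The helper name
C<check_format_args> is hypothetical and not part of
C<Text::BibTeX::NameFormat>; it only mirrors the documented defaults
and the one-to-four-character restriction on PARTS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::BibTeX::NameFormat): applies the
# documented defaults and checks that PARTS has one to four characters
# drawn from f, v, l and j.
sub check_format_args
{
   my ($parts, $abbrev_first) = @_;

   $parts ||= 'fvlj';                       # default: first von last jr
   $abbrev_first = defined $abbrev_first ? $abbrev_first : 0;

   die "invalid PARTS string '$parts'\n"
      unless $parts =~ /^[fvlj]{1,4}$/;

   return ($parts, $abbrev_first);
}

my ($parts, $abbrev) = check_format_args ('vljf', 1);
print "$parts $abbrev\n";                   # prints "vljf 1"

($parts, $abbrev) = check_format_args ();
print "$parts $abbrev\n";                   # prints "fvlj 0"
```

Note that the pattern in this sketch restricts only the length and the
alphabet of PARTS; it does not by itself reject a repeated letter such
as C<"ff">.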

=cut

sub new
{
   my ($class, $parts, $abbrev_first) = @_;

   $parts ||= "fvlj";
   $abbrev_first = defined($abbrev_first)? $abbrev_first : 0;

   die unless $parts =~ /^[fvlj]{1,4}$/;

   $class = ref ($class) || $class;
   my $self = bless {}, $class;
   $self->{_cstruct} = create ($parts, $abbrev_first);
   $self;
}


sub DESTROY
{
   my $self = shift;
   free ($self->{'_cstruct'})
      if defined $self->{'_cstruct'};
}


=item set_text (PART, PRE_PART, POST_PART, PRE_TOKEN, POST_TOKEN)

Allows you to customize some or all of the surrounding text for a single
name part.  Every name part has four possible chunks of text that go
around or within it: before/after the part as a whole, and before/after
each token in the part.  For instance, if you are abbreviating first
names and wish to control the punctuation after each token in the first
name, you would set the "post token" text:

   $format->set_text ('first', undef, undef, undef, '');

This sets the post-token text to the empty string, resulting in names
like C<"J R Smith">.  (Normally, abbreviated first names will have a
period after each token: C<"J. R. Smith">.)  Note that supplying
C<undef> for the other three values leaves them unchanged.

See L<bt_format_names> for full information on formatting names.
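
The four chunks of text can be pictured with a small pure-Perl model.
This is only a sketch under simplifying assumptions, not how B<btparse>
implements formatting: the C<wrap_part> helper and its named arguments
are hypothetical, tokens are always joined with a single space, and an
undefined chunk is treated as empty rather than "unchanged".

```perl
use strict;
use warnings;

# Simplified model of the four text chunks around one name part.  This is
# an illustration only; tokens are always joined with a single space here.
sub wrap_part
{
   my ($tokens, %text) = @_;

   # An undefined chunk is treated as the empty string in this model.
   my ($pre_part, $post_part, $pre_token, $post_token) =
      map { defined $text{$_} ? $text{$_} : '' }
         qw(pre_part post_part pre_token post_token);

   # Wrap every token, then wrap the part as a whole.
   my @wrapped = map { $pre_token . $_ . $post_token } @$tokens;
   return $pre_part . join (' ', @wrapped) . $post_part;
}

print wrap_part (['J', 'R'], post_token => '.'), " Smith\n";   # J. R. Smith
print wrap_part (['J', 'R'], post_token => ''),  " Smith\n";   # J R Smith
print wrap_part (['Jr'], pre_part => ', '), "\n";              # , Jr
```

The first two calls correspond to the default and the customized
post-token text discussed above.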

=cut

sub set_text
{
   my ($self, $part, $pre_part, $post_part, $pre_token, $post_token) = @_;

   # Engage in a little conspiracy with the XS code (_set_text) and the
   # underlying C function (bt_set_format_text) here.  In particular,
   # neither of those functions copies the strings we pass in here -- they
   # just copy the C pointers.  Ultimately, those refer back to the Perl
   # strings that we're passing in now.  Thus, if those Perl strings
   # were to go away (ref count drops to zero), then the C code might
   # have dangling pointers to free'd strings -- oops!  The solution is
   # to keep references to those Perl strings here, so that their ref
   # count can never drop to zero without our assent.  Every time
   # set_text is called, the old references are overwritten (ref count
   # drops), and when the NameFormat object is destroyed, we destroy
   # them (ref count drops).  Other than that, there will always be some
   # reference to the strings passed in to set_text.

   # XXX what if some of these are undef?

   $self->{'textrefs'} = [\$pre_part, \$post_part, \$pre_token, \$post_token];

   _set_text ($self->{'_cstruct'},
              $part,
              $pre_part,
              $post_part,
              $pre_token,
              $post_token);
   1;
}


=item set_options (PART, ABBREV, JOIN_TOKENS, JOIN_PART)

Allows further customization of a name format: you can set the
abbreviation flag and the two token-join methods.  Alas, there is no
mechanism for leaving a value unchanged; you must set everything with
C<set_options>.

For example, let's say that just dropping periods from abbreviated
tokens in the first name isn't enough; you I<really> want to save
space by jamming the abbreviated tokens together: C<"JR Smith"> rather
than C<"J R Smith">.  Assuming the format was created with abbreviation
turned on and the C<set_text> call shown above has been made, the
following will finish the job:

   $format->set_options (BTN_FIRST,
                         1,             # keep same value for abbrev flag
                         BTJ_NOTHING,   # jam tokens together
                         BTJ_SPACE);    # space after final token of part

Note that we unfortunately had to know (and supply) the current values
for the abbreviation flag and post-part join method, even though we were
only setting the intra-part join method.
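
Because C<set_options> cannot leave a value unchanged, one possible
workaround is to remember the values you last chose and resend all
three whenever one of them changes.  The sketch below assumes that this
bookkeeping is done by the caller: C<update_options>, the C<%current>
hash, and the C<Stub::Format> class are all hypothetical, and the stub
merely records its arguments so that the example runs without the real
module.

```perl
use strict;
use warnings;

# Stand-in for a Text::BibTeX::NameFormat object so the sketch runs
# without the real module; it only records the arguments it receives.
package Stub::Format;
sub new         { return bless { calls => [] }, shift }
sub set_options { my $self = shift; push @{ $self->{calls} }, [@_]; return 1 }

package main;

# Hypothetical helper: merge the requested changes into the remembered
# options for PART, then pass all three values to set_options.
sub update_options
{
   my ($format, $current, $part, %changes) = @_;

   my $opts = $current->{$part}
      or die "no remembered options for part '$part'\n";
   %$opts = (%$opts, %changes);

   return $format->set_options ($part,
                                @{$opts}{qw(abbrev join_tokens join_part)});
}

# The caller seeds the cache with the values it originally chose.  With
# the real module the join values would be BTJ_* constants, not strings.
my %current = (first => { abbrev => 1, join_tokens => 'space',
                          join_part => 'space' });
my $format = Stub::Format->new;

update_options ($format, \%current, 'first', join_tokens => 'nothing');
print join (' ', @{ $format->{calls}[0] }), "\n";   # first 1 nothing space
```

The cache has to be seeded by hand because the interface documented
here offers no way to read the current option values back.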

=cut

sub set_options
{
   my ($self, $part, $abbrev, $join_tokens, $join_part) = @_;

   _set_options ($self->{'_cstruct'}, $part,
                 $abbrev, $join_tokens, $join_part);
   1;
}


=item apply (NAME)

Once a name format has been created and customized to your heart's
content, you can use it to format any number of names using the C<apply>
method.  NAME must be a C<Text::BibTeX::Name> object (i.e., a pre-split
name); C<apply> returns a string containing the parts of the name
formatted according to the C<Text::BibTeX::NameFormat> structure it is
called on.

=cut

sub apply
{
   my ($self, $name) = @_;

   my $name_struct = $name->{'_cstruct'} ||
      croak "invalid Name object: no C structure";
   my $format_struct = $self->{'_cstruct'} ||
      croak "invalid NameFormat object: no C structure";

   my $ans = format_name ($name_struct, $format_struct);

   $ans = Text::BibTeX->_process_result($ans, $name->{binmode}, $name->{normalization});

   return $ans;
}

=back

=head1 EXAMPLES

Although the process of splitting and formatting names may sound
complicated and convoluted from reading the above (along with
L<Text::BibTeX::Name>), it's actually quite simple.  There are really
only three steps to worry about: split the name (create a
C<Text::BibTeX::Name> object), create and customize the format
(C<Text::BibTeX::NameFormat> object), and apply the format to the name.

The first step is covered in L<Text::BibTeX::Name>; here's a brief
example:

   $orig_name = q{Charles Louis Xavier Joseph de la Vall{\'e}e Poussin};
   $name = Text::BibTeX::Name->new($orig_name);

The various parts of the name can now be accessed through
C<Text::BibTeX::Name> methods; for instance C<$name-E<gt>part('von')>
returns the list C<("de","la")>.

Creating the name format is equally simple:

   $format = Text::BibTeX::NameFormat->new('vljf', 1);

creates a format that will print the name in "von last jr first" order,
with the first name abbreviated.  And for no extra charge, you get the
right punctuation at the right place: a comma before any `jr' or `first'
tokens, and periods after each `first' token.

For instance, we can perform no further customization on this format,
and apply it immediately to C<$name>.  There are in fact two ways to do
this, depending on whether you prefer to think of it in terms of
"applying the format to a name" or "formatting a name".  The first is
done with C<Text::BibTeX::NameFormat>'s C<apply> method:

   $formatted_name = $format->apply ($name);

while the second uses C<Text::BibTeX::Name>'s C<format> method:

   $formatted_name = $name->format ($format);

which is just a wrapper around C<Text::BibTeX::NameFormat::apply>.  In
either case, the result with the example name and format shown is

   de~la Vall{\'e}e~Poussin, C.~L. X.~J.

Note the strategic insertion of TeX "ties" (non-breakable spaces) at
sensitive spots in the name.  (The exact rules for insertion of
discretionary ties are given in L<bt_format_names>.)

=head1 SEE ALSO

L<Text::BibTeX::Entry>, L<Text::BibTeX::Name>, L<bt_format_names>.

=head1 AUTHOR

Greg Ward

=head1 COPYRIGHT

Copyright (c) 1997-2000 by Gregory P. Ward.  All rights reserved.  This file
is part of the Text::BibTeX library.  This library is free software; you
may redistribute it and/or modify it under the same terms as Perl itself.

=cut


1;