File Coverage

blib/lib/PDF/Builder/Content/Column_docs.pm
Criterion Covered Total %
statement 6 6 100.0
branch n/a
condition n/a
subroutine 2 3 66.6
pod n/a
total 8 9 88.8


line stmt bran cond sub pod time code
1             package PDF::Builder::Content::Column_docs;
2              
3 1     1   1657 use strict;
  1         2  
  1         30  
4 1     1   3 use warnings;
  1         2  
  1         314  
5              
6             our $VERSION = '3.028'; # VERSION
7             our $LAST_UPDATE = '3.028'; # manually update whenever code is changed
8              
9             # originally mostly part of Content/Text.pm, it was split out due to its length
10             #
11             # WARNING: be sure to keep in synch with changes to code and POD elsewhere
12             #
13             # do not attempt to use Unicode entities, such as E<euro> -- the POD to
14             # HTML converter will barf on them!
15              
16             =head1 NAME
17              
18             PDF::Builder::Content::Column_docs -- column text formatting system
19              
20             =head1 PDF::Builder::Content::Text/column and related routines
21              
22             These routines form a sub-library for support of complex columnar output with
23             high level markup languages. Currently, a single rectangular layout may be
24             defined on a page, to be filled by user-defined content. Any content which
25             could not be fit within the column confines is returned in an internal array
26             format, and may be passed to the next C<column()> call to finish the
27             formatting.
28              
29             Future plans call for non-rectangular columns to be definable, as well as
30             flow from one column to another on a page, and column balancing. Other
31             possible enhancements call for support of non-Western writing systems
32             (e.g., bidirectional text, using the HarfBuzz library), proper
33             word-splitting and paragraph shaping (possibly using the Knuth-Plass
34             algorithm), and additional markup languages.
35              
36             =head2 column
37              
38             ($rc, $next_y, $unused) = $text->column($page, $text, $grfx, $markup, $txt, %opts)
39              
40             =over
41              
42             This method fills out a column of text on a page, returning any unused portion
43             that could not be fit, and where it left off on the page.
44              
45             Tag names, CSS entries, markup type, etc. are case-sensitive (usually
46             lower-case letters only). For example, you cannot give a <P> paragraph in
47             HTML or a B<P> selector in CSS styling.
48              
49             B<$page> is the page context. Currently, its only use is for page annotations
50             for links ('md1' []() and 'html' E<lt>aE<gt>), so if you're not using those,
51             you may pass anything such as C<undef> for C<$page> if you wish.
52              
53             B<$text> is the text context, so that various font and text-output operations
54             may be performed. It is often, but not necessarily always, the same as the
55             object containing the "column" method.
56              
57             B<$grfx> is the graphics (gfx) context. It may be a dummy (e.g., undef) if
58             I<no> graphics are to be drawn, but graphical items such as the column outline
59             ('outline' option) and horizontal rule (<hr> in HTML markup) use it.
60             Currently, I<text-decoration> underline (default for links, 'md1' C<[]()> and
61             'html' C<E<lt>aE<gt>>) or line-through or overline use the text context, but
62             may in the future require a valid graphics context. Images (when implemented)
63             will require a graphics context.
64              
65             B<$markup> is information on what sort of I<markup> is being used to format
66             and lay out the column's text:
67              
68             =over
69              
70             =item 'pre'
71              
72             The input material has already been processed and is already in the desired
73             form. C<$txt> is an array reference to the list of hashes. This I<must> be used
74             when you are calling C<column()> a second (or later)
75             time to output material left over from the first call. It may also be used when
76             the caller application has already processed the text into the appropriate
77             format, and other markup isn't being used.
78              
79             =item 'none'
80              
81             If I<none> is specified, there is no markup in use. At most, a blank line or
82             a new text array element specifies a new paragraph, and that's it. C<$txt> may
83             be a single string, or an array (list) of strings.
84              
85             The input B<txt> is a list (anonymous array reference) of strings, each
86             containing one or more paragraphs. A single string may also be given. An empty
87             line between paragraphs may be used to separate the paragraphs. Paragraphs may
88             not span array elements.
89              
90             =item 'md1'
91              
92             This specifies a certain flavor of Markdown compatible with Text::Markdown.
93             See the full description below.
94              
95             There are other flavors of Markdown, so other mdI<n> flavors I<may> be defined
96             in the future, as may other markup such as Perl's POD.
97              
98             =item 'html'
99              
100             This specifies that a large subset of HTML markup is used, along with some
101             attributes and CSS.
102              
103             Numeric entities (decimal &#nnn; and hexadecimal &#xnnn;) are supported,
104             as well as named entities (&mdash; for example).
105              
106             The input B<txt> is a list (anonymous array reference) of strings, each
107             containing one or more paragraphs and other markup. A single string may also be
108             given. Per normal HTML practice, paragraph tags should be used to mark
109             paragraphs. I<Note that HTML::TreeBuilder is configured to automatically
110             mark top body-level text with paragraph tags, in case you forget to do so,
111             although it is probably better to do it yourself, to maintain more control
112             over the processing.>
113             Separate array elements will first be glued together into a single string
114             before processing, permitting paragraphs to span array elements if desired.
115              
116             =item Other input formats
117              
118             There are other markup languages out there, such as HTML-like Pango,
119             nroff-like man page, Markdown-like wikimedia, and Perl's POD, that
120             might be supported in the future (provided there are supported Perl libraries
121             for them). It is very unlikely that TeX or LaTeX will
122             ever be supported, as they both already have excellent PDF output.
123              
124             PDF::Builder currently only supports the markup languages described above.
125             If you want to use something else (e.g., Perl's POD, or I<man> format, or even
126             MS Word or some other WYSIWYG format), you will need to find a converter
127             utility to convert it to a supported flavor of Markdown or HTML. Many such
128             converters already exist, so take a look (although you may well have to do some
129             cleanup before C<column()> accepts the resulting HTML as input).
130              
131             Perhaps in the future, PDF::Builder will directly support additional formats,
132             but no promises.
133              
134             =back
135              
136             B<$txt> is the input text: a string, an array reference to multiple strings,
137             or an array reference to hashes. See C<$markup> for details.
138              
139             B<%opts> Options -- a number of these are, despite the name, mandatory.
140              
141             =over
142              
143             =item 'rect' => [x, y, width, height]
144              
145             This defines a column as a rectangular area of a given width and height (both
146             in points) on the current page. I<In the future, it is expected that more
147             elaborate non-rectangular areas will be definable, but for now, a simple
148             rectangle is all that is permitted.> The column's upper left coordinate is
149             C<x, y>.
150              
151             The top text baseline is assumed to be relative to the UL corner (based on the
152             determined line height), and the column outline
153             clips that baseline, as it does additional baselines down the page (interline
154             spacing is C<leading> multiplied by the largest C<font_size> or image height
155             needed on that line).
156              
157             I<Currently, 'rect' is required, as it is the only column shape supported.>
158              
159             =item 'relative' => [ x, y, scale(s) ]
160              
161             C<'relative'> defaults to C<[ 0, 0, 1, 1 ]>, and allows a column outline
162             (currently only 'rect') to be either absolute or relative. C<x> and C<y> are
163             added to each C<x,y> coordinate pair, I<after> scaling. Scaling values:
164              
165             =over
166              
167             =item (none) The scaling defaults to 1 in both x and y dimensions (no change).
168              
169             =item scale (one value) The scaling in both the x (width) and y (height)
170             dimensions uses this value.
171              
172             =item scale_x, scale_y (two values) There are two separate scaling factors
173             for the x dimension (width) and y dimension (height).
174              
175             =back
176              
177             This permits a generically-shaped outline to be defined, scaled (perhaps
178             not preserving the aspect ratio) and placed anywhere on the page. This could
179             save you from having to define similarly-shaped columns from scratch multiple
180             times.
181             If you want to define a relative outline, the lower left corner (whether or
182             not it contains a point, and whether or not it's the first one listed) would
183             usually be C<0, 0>, to have scaling work as expected. In other words, your
184             outline template should be in the lower left corner of the page.
185              
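As a worked illustration of the scaling and offset arithmetic described for 'relative', the sketch below places a 'rect' template on the page. The helper name place_rect is hypothetical and is not a PDF::Builder routine. It assumes that the width and height are scaled by the same factors as the coordinates, which is one reading of the description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a PDF::Builder routine): apply a 'relative'
# setting of [x, y], [x, y, scale], or [x, y, scale_x, scale_y] to a
# 'rect' template [x, y, width, height].
sub place_rect {
    my ($rect, $relative) = @_;
    my ($dx, $dy, $sx, $sy) = @{ $relative };
    $sx = 1   unless defined $sx;   # no scale given: leave size unchanged
    $sy = $sx unless defined $sy;   # one scale given: use it for x and y
    my ($x, $y, $w, $h) = @{ $rect };
    # scale first, then add the offset to the x,y pair
    return [ $x*$sx + $dx, $y*$sy + $dy, $w*$sx, $h*$sy ];
}

# a 200 x 300 template whose lower left corner is at 0,0 (upper left 0,300)
my $placed = place_rect([ 0, 300, 200, 300 ], [ 50, 100, 1.5 ]);
print join(', ', @{ $placed }), "\n";   # prints "50, 550, 300, 450"
```

Keeping the template's lower left corner at 0,0 means the scale changes only the size, and the offset alone decides where the outline lands.
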
186             =item 'start_y' => $start_y
187              
188             If omitted, it is assumed that you want to start at the top of the defined
189             column (the maximum C<y> value minus the maximum vertical extent of this line).
190             If used, the normal value is the C<next_y> returned from the previous
191             C<column()> call. It is the deepest extent reached by the previous line (plus
192             leading), and is the top-most point of the new first line of this C<column()>
193             call.
194              
195             Note that the C<x> position will be determined by the column shape and size
196             (the left-most point of the baseline), so there is no place to explicitly set
197             an C<x> position to start at.
198              
199             =item 'font_size' => $font_size
200              
201             This is the starting font size (in points) to be used. Over the course of
202             the text, it may be modified by markup. The default is 12pt. It is in turn
203             overridden by any CSS or HTML font-size settings.
204              
205             The starting font size may be set in a number of ways. It may be inherited from
206             a previous C<$text-E<gt>font(..., font-size)> statement; it may be set via the
207             C<font_size> option (overriding any font method inheritance); it may default to
208             12pt (if neither explicit way is given). For HTML markup, it may of course be
209             modified by the C<font> tag or by CSS styling C<font-size>. For Markdown, it
210             may be modified by CSS styling.
211              
212             =item 'font_info' => $string
213              
214             This permits the user to specify the starting font used in C<column()> (body
215             font-family, font-style, font-weight, color). C<column()> will pick up any
216             font already
217             loaded (C<$text-E<gt>font($font, $size);>, or using FontManager), and use that
218             as the "current" font. If no font has been loaded, and no other instructions
219             are given, the FontManager default (core Times-Roman) will be used.
220              
221             The C<font_info> option for C<column()> may be given to override either of the
222             two above methods. You may specify a C<$string> of B<'-fm-'> to instruct
223             C<column()> to use the FontManager "default" font (Times face core font).
224             Or, you may pick a font
225             face I<known> to FontManager (added by user code if not one of the 28 core
226             fonts), and optionally give it style and weight: C<$string> of
227             B<'face:style:weight:color'>. The style defaults to 'normal' (non-italic), or
228             'normal' or '0' may be given. For italics, use 'italic' or '1'. The weight
229             defaults to 'normal' (unbolded weight), or 'normal' or '0' may be given. For
230             bold (heavy) text, use 'bold' or '1'. Finally, a color may be given.
231              
232             Finally, the C<style> option for C<column()> may be given to override any of
233             the above settings, e.g., B<'style'=E<gt>'body { font-family:... }'> and set
234             the initial current font. Remember that, as with anything font-related that
235             C<column()> does, the 'face' (family) used must already be known to FontManager
236             (explicitly loaded with C<add_font()> if not one of the 28 core fonts).
237             Remember that the first 14 fonts are standard PDF, and the second 14 are
238             normally supplied with Windows (but not always with other operating systems).
239              
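To make the 'face:style:weight:color' layout concrete, here is a small sketch that splits such a string and fills in the defaults described for 'font_info'. The helper parse_font_info is hypothetical and is not how PDF::Builder itself reads the option. Treating an empty or missing field as "use the default" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::Builder): split a
# 'face:style:weight:color' string into its fields, applying the
# defaults described above for omitted or empty fields.
sub parse_font_info {
    my ($string) = @_;
    return { default => 1 } if $string eq '-fm-';  # FontManager default font

    my ($face, $style, $weight, $color) = split /:/, $string;
    $style  = 'normal' if !defined $style  || $style  eq '' || $style  eq '0';
    $style  = 'italic' if $style eq '1';
    $weight = 'normal' if !defined $weight || $weight eq '' || $weight eq '0';
    $weight = 'bold'   if $weight eq '1';
    return { face => $face, style => $style, weight => $weight, color => $color };
}

my $info = parse_font_info('Helvetica:1:bold:blue');
print "$info->{face} $info->{style} $info->{weight} $info->{color}\n";
# prints "Helvetica italic bold blue"
```
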
240             =item 'marker_width' => $marker_width
241              
242             =item 'marker_gap' => $marker_gap
243              
244             This is the width of the gutter to the left of a list item, where (for the
245             first line of the item) the marker lives. The marker contains the symbol (for
246             bulleted/unordered lists) or formatted number and "before" and "after" text
247             (for numbered/ordered lists). Both have a single space (marker_gap = 1em)
248             before the item text starts. The number is a length, in points.
249              
250             The default is 1 em (1 times the font_size passed to C<column()>), and is not
251             adjusted for any changes of font_size in the markup, so that lists are indented
252             I<consistently>. This is usually fine for unordered (bulleted) lists and single
253             digit ordered (numbered) lists, although you may need to make it wider for
254             two or three digit numbered lists. An explicit value passed
255             in is also not changed -- the gutter width for the marker will be the same in
256             all lists (keeping them aligned). If you plan to have exceptionally long
257             markers, such as an ordered list of years in Roman numerals, e.g.,
258             B<(MCMXCIX)>, you may want to make this gutter a bit wider.
259              
260             A value may be given for the marker_gap, which is the gap between the
261             (C<$marker_width> wide) I<marker> and the start of the list item's text.
262             The default is $fs points (1 em), set by the font_size in the markup.
263              
264             The C<list-style-position> CSS property may be given as the standard 'outside'
265             (the default) or 'inside', or (extension to CSS) to indent the left side of
266             second, third, etc. E<lt>liE<gt> lines to somewhere between the 'inside' and
267             'outside' positions. Be sure to consider the C<_marker-align> extended
268             property to left, center, or right (default) align the marker within the
269             marker gutter (C<marker_width> wide).
270              
271             =item 'leading' => $leading
272              
273             This is the leading I<ratio> used throughout the column text.
274             The C<$x, $y> position through C<$x + width> is assumed to be the first
275             text baseline. The next line down will be C<$y - $leading*$font_size>. If the
276             font_size changes for any reason over the course of the column, the baseline
277             spacing (leading * font_size) will also change. The B<default> leading ratio
278             is 1.125 (12.5% added to the font size).
279              
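The baseline spacing rule for 'leading' can be checked with a few lines of arithmetic. The function name baselines is only for illustration, and the sketch assumes a single font size for all the lines.

```perl
use strict;
use warnings;

# Baseline positions for successive lines of one font size, using the
# spacing rule above: each baseline is leading * font_size below the last.
sub baselines {
    my ($first_y, $font_size, $leading, $count) = @_;
    return map { $first_y - $_ * $leading * $font_size } 0 .. $count - 1;
}

# default leading ratio of 1.125 with 12pt text: lines 13.5pt apart
print join(' ', baselines(700, 12, 1.125, 3)), "\n";   # prints "700 686.5 673"
```
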
280             =item 'para' => [ $indent, $top-margin ]
281              
282             When starting a new paragraph, these are the I<default> indentation (in points),
283             and the extra vertical spacing for a top margin on a paragraph. If this
284             option is omitted, the default is
285             C<[ 1*$font_size, 0 ]> (1em indent, 0 additional vertical space). Either may
286             be overridden by the appropriate CSS settings. An I<outdent> may be defined
287             with a negative indentation value. These apply to all C<$markup> types.
288              
289             At the top of a column, any top margin (not just for paragraphs) is ignored.
290              
291             =item 'outline' => "color string"
292              
293             You may optionally request that the column be outlined in a given color, to aid
294             in debugging fitting problems. This will require that the graphics context be
295             provided to C<column()>.
296              
297             =item 'color' => "color string"
298              
299             The color to draw the text (or rule or other graphic) in. The default is
300             black (#000000).
301              
302             =item 'style' => "CSS styling"
303              
304             You may define CSS (selectors and properties lists) to override the built-in
305             CSS defaults. These will be applied for the entire C<column()> call. You can
306             use this, or C<style> tags in 'html', but for 'none' or 'md1', you will need to
307             use this method to set styling. See also the C<font_info=E<gt>> option to set
308             initial font settings.
309              
310             Note that, unlike the C<style=> I<attribute> in HTML tags, the C<style=E<gt>>
311             option is formatted like a E<lt>styleE<gt> I<tag> -- that is, with B<selector {>
312             I<property>: I<value>;... B<}>. If you want to set I<global> values, use the
313             B<body> selector.
314              
315             =item 'substitute' => [ [ 'char or string', 'before', 'replace', 'after'],... ]
316              
317             When a certain Unicode code point (character) or string is found, insert
318             I<before> text before the character, replace the character or string with
319             I<replace> text, and insert I<after> text after the character. This may make
320             it easier to insert HTML code (font, color, etc.) into Markdown text, if the
321             desired settings and character can not be produced by your Markdown editor.
322             This applies both to 'md1' and 'html' markup. Multiple substitutions may be
323             defined via multiple array elements.
324             If you want to leave the original character or string I<itself> unchanged, you
325             should define the I<replace> text to be the same as C<'char or string'>.
326             'before' and/or 'after' text may be empty strings if you don't want to insert
327             some sort of markup there.
328              
329             Example: to insert a red cross (X-out) and green tick (check) mark
330              
331             'substitute' => [
332             [ '%cross%', '<font face="ZapfDingbats" color="red">', '8', '</font>' ],
333             [ '%tick%', '<font face="ZapfDingbats" color="green">', '4', '</font>' ],
334             ]
335              
336             should change C<%cross%> in Markdown text ('md1') or HTML text ('html')
337             to C<E<lt>font face="ZapfDingbats" color="red"E<gt>8E<lt>/fontE<gt>>
338             and similarly for C<%tick%>. This is done I<after> the Markdown is converted
339             to HTML (but before HTML is parsed), so make sure that your macro text (e.g.,
340             C<%tick%>) isn't something that Markdown will try to interpret by itself! Also,
341             Perl's regular expression parser seems to get upset with some characters, such
342             as C<|>, so don't use them as delimiters (e.g., C<|cross|>). You don't I<have>
343             to wrap your macro name in delimiters, but it can make the text structure
344             clearer, and may be necessary in order not to do substitutions in the wrong
345             place.
346              
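The effect of a 'substitute' list on the intermediate HTML can be pictured with the sketch below. It is an illustration only, not PDF::Builder's own code, and apply_substitutes is a hypothetical name. It matches the target literally, which the library may not do, so the caution above about characters such as | still applies to real column() calls.

```perl
use strict;
use warnings;

# Illustration only (not PDF::Builder's own code): apply a list of
# [ target, before, replace, after ] entries to a string of HTML.
sub apply_substitutes {
    my ($html, $subs) = @_;
    for my $entry (@{ $subs }) {
        my ($target, $before, $replace, $after) = @{ $entry };
        # \Q...\E makes the target match literally
        $html =~ s/\Q$target\E/$before$replace$after/g;
    }
    return $html;
}

my $subs = [
    [ '%cross%', '<font face="ZapfDingbats" color="red">',   '8', '</font>' ],
    [ '%tick%',  '<font face="ZapfDingbats" color="green">', '4', '</font>' ],
];
print apply_substitutes('<p>Done: %tick%</p>', $subs), "\n";
# prints: <p>Done: <font face="ZapfDingbats" color="green">4</font></p>
```
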
347             =item 'state' => \%state
348              
349             This is the state of processing, including (in particular), information on all
350             the requested references (<a>, <_ref>) and targets (<_reft> and specific id's).
351             Before use, it must be created and initialized. During multiple passes across
352             multiple column() calls, 'state' preserves all the link information. It can
353             even preserve information across the creation of multiple related PDFs, though
354             this may require writing and reading back from a file. There is no information
355             in 'state' that is likely to be of interest to a user (i.e., all internal data).
356             If 'state' is not given, it will (in most cases) be impossible to define various
357             kinds of links (including cross references). A URL link to a browser does not
358             need C<'state'>, but all other kinds of links to this or other PDF files do.
359              
360             =item 'page' => [ $ppn, $extfile, $fpn, $LR, $bind ]
361              
362             This array of values gives C<column()> information needed for generating links
363             (both I<goto> and I<pdf> annotations), and (TBD) left- and right-hand page
364             processing, including how much to shift C<column()> definitions to the outside
365             of the page for binding purposes (TBD). The link information is as follows:
366              
367             =over
368              
369             =item $ppn
370              
371             This is the Physical Page Number of the page currently being generated. It is
372             always an integer greater than 0, and takes a value 1,2,3,... It is needed if
373             this page is used as the target for an external (across PDFs) link, using a
374             physical page number and not a Named Destination.
375             Remember to increment it every time the code calls the C<page()> method.
376             It may be left undefined if you are sure you're never going to generate a link
377             (via C<pdf> call, not using a Named Destination) to this PDF file from another
378             PDF.
379              
380             =item $extfile
381              
382             This describes the external path, filename, and extension of B<this> PDF being
383             created. It is needed if this page is used as the target for an external
384             (across PDFs) link. Remember that this is the I<final> location and name of
385             where this file will live when in use, not necessarily where it is being
386             I<created> at this moment!
387             It may be left undefined or a random name if you are sure you're never going to
388             generate a link (via C<pdf> call) to I<this> PDF file from another PDF.
389              
390             =item $fpn
391              
392             This is the I<Formatted> Page Number of the page being generated. In the
393             simplest case, it is equal to the Physical Page Number, but often you will want
394             to "get fancy" with numbering, such as a prefix for an appendix ('C-2',
395             'Glossary-5', etc.), lowercase Roman numerals in the front matter, etc. You
396             might even want to carry one single sequence of decimal page numbers across
397             multiple PDFs, thus starting at other than "1". If you leave it undefined,
398             certain kinds of links and cross reference formats (where the formatted page
399             number is shown) will not be possible.
400              
401             =item $LR
402              
403             This says whether it's a left-hand page or a right-hand page, for purposes of
404             formatting layout and shifting the C<column()> outline left or right (towards
405             the "outside" of the page) to allow binding space. If undefined, it defaults
406             to an 'R' right-hand page. This ability is currently unused.
407              
408             =item $bind
409              
410             This is the number of points to shift the C<column()> coordinates towards the
411             "outside" of the page for purposes of binding multiple pages together, whether
412             left-right alternation or all right-hand pages (e.g., punched for a notebook or
413             spiral binding, or just stapled on the inside, or glued or sewn into a
414             paperback or hard-cover binding). If undefined, the default is 0. This ability
415             is currently unused.
416              
417             =back
418              
419             =item 'restore' => flag
420              
421             This integer flag determines what sort of cleanup C<column()> will do upon
422             exit, to restore (or not) the font state (face, bold or normal weight,
423             italic or normal style, size, and color).
424              
425             =over
426              
427             =item for rc = 0 (all input markup was used up, without running out of column)
428              
429             =over
430              
431             =item restore => 0
432              
433             This is the B<default>. Upon exiting, C<column()> will attempt to restore the
434             state to what one would see if there was yet more text to be output. Note that
435             this is I<not> necessarily what one would see if the entire state was restored
436             to entry conditions. The intent is that another C<column()> call can be
437             immediately made, using whatever font state was left by the previous call, as
438             though the two calls' markup inputs were concatenated.
439              
440             =item restore => 1
441              
442             This value of C<restore> commands that I<no> change be made to the font state,
443             that is, C<column()> exits with the font state as the last text output left it.
444             This may or may not be desirable, especially if the last text output left the
445             text in an unexpected state.
446              
447             =item restore => 2
448              
449             This value of C<restore> attempts to bring the font state all the way back to
450             what it was upon I<entry> to the routine, as if it had never been called. Note
451             that if C<column()> was called with no global font settings, that can not be
452             undone, although the color I<can> be changed back to its original state,
453             usually black.
454              
455             B<CAUTION:> The Font Manager is not synchronized with whatever state the font
456             is returned to. You should not request the 'current' font, but should instead
457             explicitly set it to a specific face, etc., which resets 'current'.
458              
459             =back
460              
461             =item for rc = 1 (ran out of column space before all the input markup was used up)
462              
463             =over
464              
465             =item restore => 0
466              
467             This is the B<default>. Upon exiting, no changes will be made to the font
468             state. As the code will be in the middle of some output, the font state is
469             kept the same, so the next C<column()> call (for the overflow) can pick up
470             where the previous call left off, with regard to the font state.
471              
472             It is equivalent to C<restore =E<gt> 1>.
473              
474             =item restore => 1
475              
476             This is the same as C<restore =E<gt> 0>.
477              
478             =item restore => 2
479              
480             This value of C<restore> attempts to bring the font state all the way back to
481             what it was upon I<entry> to the routine, as if it had never been called. Note
482             that if C<column()> was called with no global font settings, that can not be
483             undone, although the color I<can> be changed back to its original state,
484             usually black.
485              
486             B<CAUTION:> The Font Manager is not synchronized with whatever state the font
487             is returned to. You should not request the 'current' font, but should instead
488             explicitly set it to a specific face, etc., which resets 'current'.
489              
490             =back
491              
492             =back
493              
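The six combinations of return code and 'restore' value described above can be summarized as a lookup table. This is only a restatement of the text for quick reference. The labels and the function name font_state_on_exit are made up for this sketch.

```perl
use strict;
use warnings;

# Summary of the cases above as a lookup: $outcome{$rc}{$restore}.
#   'continue' = state as if more text were to follow
#   'as-is'    = font state left untouched
#   'entry'    = attempt to return to the state on entry to column()
my %outcome = (
    0 => { 0 => 'continue', 1 => 'as-is', 2 => 'entry' },
    1 => { 0 => 'as-is',    1 => 'as-is', 2 => 'entry' },
);

sub font_state_on_exit {
    my ($rc, $restore) = @_;
    $restore = 0 unless defined $restore;   # 0 is the documented default
    return $outcome{$rc}{$restore};
}

print font_state_on_exit(1), "\n";   # prints "as-is"
```
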
494             =back
495              
496             B<Data returned by this call>
497              
498             If there is more text than can be accommodated by the column size, the unused
499             portion is returned, with a return code of 1. It is an empty list if all the
500             text could be formatted, and the return code is 0.
501             C<next_y> is the y coordinate where any additional text (C<column()> call)
502             could be added to a column (as C<start_y>) that wasn't completely filled.
503             This would be at the starting point of a new column (i.e., the
504             last paragraph is ended). Note that the application code should check if this
505             position is too far down the page (in the bottom margin) and not blindly use
506             it! Also, as 'md1' is first converted to HTML, any unused portion will be
507             returned as 'pre' markup, rather than Markdown or HTML. Be sure to specify
508             'pre' for any continuation of the column (with one or more additional
509             C<column()> calls), rather than 'none', 'md1', or 'html'.
510              
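The continuation rule described above (switch to 'pre' and pass back the unused portion) can be sketched as a loop over a list of column rectangles. The function flow_text and the stand-in $fake are hypothetical. In real use the code reference would wrap the actual C<column()> call, as shown in the comment, and would also take care of creating pages and contexts.

```perl
use strict;
use warnings;

# Sketch of chaining column() calls. $fill is a code ref supplied by the
# caller that wraps the real call, for example
#   sub { my ($markup, $txt, $rect) = @_;
#         return $text->column($page, $text, $grfx, $markup, $txt,
#                              'rect' => $rect) }
sub flow_text {
    my ($fill, $markup, $txt, @rects) = @_;
    my $next_y;
    for my $rect (@rects) {
        my ($rc, $y, $unused) = $fill->($markup, $txt, $rect);
        $next_y = $y;
        return (0, $next_y) if $rc == 0;      # all input was used up
        ($markup, $txt) = ('pre', $unused);   # leftovers are always 'pre'
    }
    return (1, $next_y);   # ran out of rectangles with text still unused
}

# stand-in for column(): pretends each rectangle holds two paragraphs
my $fake = sub {
    my ($markup, $txt, $rect) = @_;
    my @left = @{ $txt };
    splice(@left, 0, 2);
    return (@left ? 1 : 0, $rect->[1] - $rect->[3], \@left);
};
my ($rc, $y) = flow_text($fake, 'none', [ 'a', 'b', 'c' ],
                         [ 50, 750, 250, 700 ], [ 320, 750, 250, 700 ]);
print "rc=$rc\n";   # prints "rc=0"
```

When the final return code is 0, the returned y value is a candidate for 'start_y' in a later call, subject to the bottom-margin check mentioned above.
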
511             =over
512              
513             =item $rc
514              
515             The return code.
516              
517             =over
518              
519             =item '0'
520              
521             A return code of 0 indicates that the call completed, while using up all the
522             input C<$txt>. It did I<not> run out of defined column space.
523              
524             B<NOTE:> if C<restore> has a value of 1, the C<column()> call makes no effort
525             to "restore" conditions to any
526             starting values. If your last bit of text left the "current" font with some
527             "odd" face/family, size, I<italic> or B<bold> state, or color, that will be
528             what is used by the next column call (or other PDF::Builder text calls). This
529             is done in order to allow you to easily chain from one column to the next,
530             without having to manually tell the system what font, color, etc. you want
531             to return to. On the other hand, in some cases you may want to start from the
532             same initial conditions as usual. You
533             may want to add C<get_font()>, C<font()>, C<fillcolor()>, and
534             C<strokecolor()> calls as necessary before the next text output, to get the
535             expected text characteristics. Or, you can simply let C<restore> default to
536             0 to get the same effect.
537              
538             =item '1'
539              
540             A return code of 1 indicates that the call completed by filling up the defined
541             column space. It did I<not> run out of input C<$txt>. You will need to make
542             one or more calls with empty column space (to fill), to use up the remaining
543             input text (with "pre" I<$markup>).
544              
545             If C<restore> defaults to 0 (or is set to 1), the text settings in the
546             "current" font are left as-is, so that whatever you
547             were doing when you ran out of defined column (as regards to font face/family,
548             size, italic and bold states, and color) should automatically be the same when
549             you make the next C<column()> call to make more output.
550              
551             =back
552              
553             Additional return codes I<may> be added in the future, to indicate failures
554             of one sort or another.
555              
556             =item $next_y
557              
558             The next page "y" coordinate to start at, if using the same column definition
559             as the previous C<column()> call did (i.e., you didn't completely fill
560             the column, and received a return code of 0). In that case, C<$next_y> would
561             give the page "y" coordinate to pass to C<column()> (as C<start_y>) to start a
562             new paragraph at.
563              
564             If the return code C<$rc> was 1 (column was used up), the C<$next_y> returned
565             will be -1, as it would be meaningless to use it.
566              
567             =item $unused
568              
569             This is the unused portion of the input text (return code C<$rc> is 1), in a
570             format ("pre" C<$markup>) suitable for input as C<$txt>. It will be a
571             I<reference> to an array of hashes.
572              
573             If C<$rc> is 0 (all input was used up), C<$unused> is an empty anonymous array.
574             It contains nothing to be used.
575              
576             =back
577              
578             =back
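
The return values described above suggest a simple continuation loop: keep calling until the return code is 0, and switch the markup to 'pre' after the first call. The following is a minimal sketch, not the library's own code. The C<fill_columns> helper, the code reference wrapping C<column()>, the rectangle values, and the stand-in C<$fake_column> are all assumptions made for illustration. The real call is assumed to take the form C<$text-E<gt>column($page, $text, $grfx, $markup, $txt, %opts)>.

```perl
use strict;
use warnings;

# Hypothetical driver. $column is a code ref wrapping the real call, e.g.
#   sub { $text->column($page, $text, $grfx, @_) }
# (an assumed wrapper; adapt to your own page/text/graphics objects).
sub fill_columns {
    my ($column, $markup, $txt, @rects) = @_;

    my ($rc, $next_y, $unused) = (1, -1, $txt);
    my $used = 0;
    for my $rect (@rects) {
        ($rc, $next_y, $unused) = $column->($markup, $unused, 'rect' => $rect);
        $used++;
        last if $rc == 0;   # all input consumed; $next_y may be reused
        $markup = 'pre';    # leftover text is always returned in 'pre' format
    }
    return ($rc, $next_y, $unused, $used);
}

# Stand-in for column(), for demonstration only: it "formats" at most two
# array elements per column and returns the rest as unused.
my $fake_column = sub {
    my ($markup, $txt, %opts) = @_;
    my @rest = @$txt;
    splice(@rest, 0, 2);
    return @rest ? (1, -1, \@rest) : (0, $opts{'rect'}[1] - 100, []);
};

my @rects = ([50, 750, 250, 700], [320, 750, 250, 700]);
my ($rc, $next_y, $unused, $used) =
    fill_columns($fake_column, 'md1', [qw(a b c)], @rects);
print "rc=$rc next_y=$next_y columns used=$used\n";
```

If the loop ends with a return code of 1, the remaining C<$unused> content still needs a new column (typically on a new page). With a return code of 0, check C<$next_y> against the bottom margin before using it as C<start_y>.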
579              
580             =head3 Special notes on saving and restoring the font
581              
582             It is important to let C<column()> know what font face (font-family), weight,
583             and style to use, so it can switch between normal, bold, and italic as desired.
584             There are several methods to I<explicitly select> a font face (font-family) and
585             its variants (weight, style) upon entry to C<column()>. One is to use the
586             C<font_info> option to C<column()>, including "-fm-" (default) to use
587             FontManager's default font (core Times-Roman). Another is to use the C<style>
588             option to C<column()> to override the B<body> default CSS. A third, if using
589             HTML or Markdown, is to add a E<lt>styleE<gt> tag to the beginning of the text
590             markup, in order to set the B<body> CSS (as with C<style>). All of these
591             methods will set the B<body>'s font.
592              
593             If nothing special is done, the font selection upon entry to C<column()> will
594             default to using the default FontManager settings (core Times-Roman, equivalent
595             to C<'font_info'=E<gt>'-fm-'>). C<font_info> may also be explicitly set to
596             specify the body text font-family (optionally also style, weight, and color).
597             C<'font_info'=E<gt>'-ext-'> may be given to tell FontManager to pick up an
598             already-loaded font in this text context. It will label that font
599             B<-external-> and use it as the current font. I<However>, be aware that if
600             doing this, C<column()> will B<not> know the actual face (font-family) of
601             whatever font this is, and thus can not change the font-weight (bold) or
602             font-style (italic). These change requests will be ignored. If no font is
603             already loaded, the FontManager's default font (C<-fm-> core Times-Roman) will
604             be selected (and no "-external-" font defined). Whatever way is used to specify
605             the body font-family on the command line, it may be overridden by a
606             C<E<lt>styleE<gt>> tag or C<'style'=E<gt>> command line CSS specification.
607              
608             Once C<column()> has already been called within a given text context, whatever
609             font is in force at the end of the call will be preserved by the text context,
610             available to be picked up by the next C<column()> call with
611             C<'font_info'=E<gt>'-ext-'> within I<this> text context. I<column() will still
612             B<not> know the font-family, since this information is not carried in the text
613             context!> Note that a text context is limited to a single page of a PDF, at
614             most (it must be defined by the C<$page-E<gt>text()> call, and is reset with
615             each new page). The user code may of course choose to load a
616             new font externally to C<column()>, in order to use that one upon entry. An
617             C<-external-> font still cannot change style or weight.
618              
619             Any font "face" used must be first registered with FontManager. The standard
620             core fonts (as well as Windows extensions) are preregistered. If user code
621             loads an arbitrary font outside of C<column()>, it will only be known as
622             "-external-" (as described above). C<column()> calls (including CSS font-family)
623             only recognize registered faces, so it knows where to find the font file and
624             other information, and can cache the loaded font. It can keep track of which
625             font is currently being used, and know how to set bold and italic variants.
626              
627             When the end of the defined column is reached (before the text source is
628             exhausted), all open tags are preserved, so that the next C<column()> call
629             (with I<pre> formatting) can pick up with the same font settings as before.
630             However, this works only as long as the complete font description is set in
631             the tags (including the face). If the font face is not given in the tags, it
632             will not be known, and bold and italic will likely not work at the next change.
633             If the text is in the middle of a highlighted phrase (e.g., bold or italic, or
634             a different font), that particular font should be picked up again. However, the
635             B<body> font face and variant may not be correctly resumed if it is assumed
636             that the proper font has been inherited by the next C<column()> call.
637             Explicitly setting the B<body> font should allow the font to return to a known
638             starting condition, although it is possible that (based on nesting of font
639             changes at the column break) other aspects might be incorrect.
640              
641             To summarize, the best practice is to register (C<add_font>) to FontManager any
642             fonts you wish to use, and then explicitly use C<font_info> or C<style> to
643             let C<column()> know what the base font is for your text. This is better than
644             externally loading a font, and depending on its being inherited from the text
645             context, which may in turn leave it in some other state after a C<column()>
646             call, as well as not being able to change bold and italic.
647              
648             =head2 Markup supported
649              
650             =head3 pre (already formatted from another format)
651              
652             This is an internal format, essentially the output from HTML::TreeBuilder.
653             As this data is consumed by output, it is removed from the array. If any is
654             left over when the column is filled, it is returned to the user, and may be
655             used in a 'pre' format call to C<column()> to finish the job.
656              
657             If you wish to manually generate 'pre' format data, you may do so, although it
658             is usually easier to use a higher level markup such as 'md1' or 'html'.
659              
660             =head3 none
661              
662             This format simply has empty lines separating paragraphs. Otherwise it has no
663             markup.
664              
665             =head3 md1 (Markdown)
666              
667             This is the version of the Markdown language supported by the Text::Markdown
668             library. It is converted into the equivalent HTML and then processed by
669             HTML::TreeBuilder.
670              
671             =head4 Standard Markdown
672              
673             =over
674              
675             =item *
676              
677             * or _ produces italic text
678              
679             =item *
680              
681             ** produces bold text
682              
683             =item *
684              
685             *** produces bold+italic text
686              
687             =item *
688              
689             * (in column 1) produces a bulleted list
690              
691             =item *
692              
693             1. (2., 3., etc. if desired) in column 1 produces a numbered list 1., 2., etc.
694              
695             =item *
696              
697             # produces a level 1 heading.
698             ## produces a level 2 heading, etc. (up to ###### level 6 heading)
699              
700             =item *
701              
702             ---, ===, or ___ produces a horizontal rule
703              
704             =item *
705              
706             ~~ enclose a section of text to be line-through'd (strike-out)
707              
708             =item *
709              
710             [label](URL) external links (to HTML page or within this document, see '<a>' for URL/href formats)
711              
712             =item *
713              
714             [label][n] reference-style links B<NOT> currently supported
715              
716             =item *
717              
718             [label][^n] footnote-style links B<NOT> currently supported
719              
720             =item *
721              
722             ` (backticks) enclose a "code" format phrase
723              
724             =item *
725              
726             ``` (backticks) enclose a "code" format I<block>, B<NOT> currently supported
727              
728             =item *
729              
730             ![alt text](path_to_image) image, B<NOT> currently supported
731              
732             =item *
733              
734             table entries with | and - (or HTML tags) B<NOT> currently supported
735              
736             =item *
737              
738             superscripts (^) and subscripts (~) (or HTML tags) B<NOT> currently supported
739              
740             =item *
741              
742             definition lists with : B<NOT> currently supported
743              
744             =item *
745              
746             task lists - [ ] B<NOT> currently supported
747              
748             =item *
749              
750             emojis will B<NEVER> be supported. We have a perfectly good alphabet.
751              
752             =item *
753              
754             highlighting (inline == or HTML E<lt>mark>) B<NOT> currently supported
755              
756             =back
757              
758             HTML (see below) may be mixed in as desired (although not within "code" blocks
759             marked by backticks, where E<lt>, E<gt>, and E<amp> get turned into HTML
760             entities, disabling the intended tags).
761             Markdown will be converted into HTML, which will then be interpreted into PDF.
762             I<Note that Text::Markdown may produce HTML for certain features, that is not
763             yet supported by HTML processing (see 'html' section below). Let us know if
764             you need such a feature!>
765              
766             The input B<txt> is a list (anonymous array reference) of strings, each
767             containing one or more paragraphs and other markup. A single string may also be
768             given. Per Markdown formatting, an empty line between paragraphs may be used to
769             separate the paragraphs. Separate array elements will first be glued together
770             into a single string before processing, permitting paragraphs to span array
771             elements if desired.
772              
773             =head4 Extended Markdown
774              
775             CSS (Cascading Style Sheets) may be defined for resulting HTML tags (or "body"
776             for global settings), via the C<style=E<gt>> C<column()> option. You may also
777             prepend a C<E<lt>styleE<gt>> HTML tag, with CSS markup, to your Markdown source.
778              
779             Standard Markdown permits an 'id' to be defined in a heading, by suffixing the
780             text with C<{#id_name}>. This is equivalent to C<id="id_name"> in HTML markup.
781             Although Text::Markdown does not currently support it, C<column()> implements
782             this way of defining a target's id, and in fact extends it to permit an id to
783             be defined for any tag with child text.
784              
785             Markdown is further extended by C<column()> to permit a 'title' to be defined
786             for any tag with child text, by use of C<{^title_text}>. Note that this 'title'
787             is the I<link title> to be used, B<not> browser style "hover" popup text. It is
788             the equivalent of C<title="title_text"> in E<lt>a> or E<lt>_ref> HTML markup.
789             Any link tag may define the PDF "fit" to use at the target, by
790             C<{%fit_type,parm(s)}> or C<fit="fit_type,parm(s)"> in E<lt>a> or E<lt>_ref>
791             HTML markup.
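
To make the "fit_type,parm(s)" notation concrete, here is a small sketch of how such a string could be split into its parts. The C<parse_fit> helper is hypothetical and is not part of PDF::Builder. It only illustrates the comma-separated layout, accepting either the attribute form or the Markdown C<{%...}> form, and treats "null" or "undef" as an undefined value.

```perl
use strict;
use warnings;

# Hypothetical helper (not a PDF::Builder routine): split "fit_type,parm(s)"
# into the fit type and a list of parameters.
sub parse_fit {
    my ($spec) = @_;
    $spec =~ s/^\{%//;      # accept the Markdown form {%fit_type,parm(s)}
    $spec =~ s/\}$//;
    my ($type, @parms) = split /\s*,\s*/, $spec;
    # "null" or "undef" stands for an undefined item
    @parms = map { ($_ =~ /^(?:null|undef)$/i) ? undef : $_ } @parms;
    return (lc $type, \@parms);
}

my ($type, $parms) = parse_fit('{%xyz,45,600,null}');
print "type=$type parms=",
      join(',', map { defined $_ ? $_ : 'undef' } @$parms), "\n";
```

Here the zoom value is left undefined, which for an "xyz" fit is assumed to mean "leave the zoom unchanged".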
792              
793             There are other HTML equivalents defined by Standard Markdown which may not
794             be implemented (converted) by Text::Markdown. Among these are C<~~> line-through
795             (strike-out) and C<===> horizontal rule, which have been fixed with
796             post-processing of the generated HTML. Let us know if you find any more such
797             cases, and we may be able to extend the functionality of 'md1' formatting, or
798             if necessary, implement 'md2' format to use a different library. By default,
799             Text::Markdown disables extended E<lt>_tagname> calls, but these all should be
800             handled properly in post-processing. There are also Markdown features that may
801             be implemented by Text::Markdown, but the resulting HTML is not supported by
802             C<column()> (yet). If you are missing a needed feature, ask about our moving it
803             up on the priority list.
804              
805             =head3 html (HTML)
806              
807             This is the HTML language to be processed by the HTML::TreeBuilder library. It
808             is processed into an array of tags and text strings ('pre' format), which is
809             interpreted by C<column()>. A substantial subset of CSS (Cascading Style Sheets)
810             is also interpreted by C<column()>, although selectors are primitive compared
811             to what a browser supports.
812              
813             =head4 Standard HTML tags
814              
815             A good many HTML tags are implemented, although not all of them:
816              
817             =over
818              
819             =item *
820              
821             B<E<lt>iE<gt> or E<lt>emE<gt>>
822             produces italic or slanted/oblique text, where available through FontManager
823              
824             =item *
825              
826             B<E<lt>bE<gt> or E<lt>strongE<gt>>
827             produces bold text, where available through FontManager
828              
829             =item *
830              
831             B<E<lt>sE<gt>, E<lt>strikeE<gt>, and E<lt>delE<gt>>
832             produce text line-through (strike-out or strike-through)
833              
834             =item *
835              
836             B<E<lt>uE<gt> and E<lt>insE<gt>>
837             produce underlined text
838              
839             =item *
840              
841             B<E<lt>codeE<gt>>
842             produces 'code'-style fixed-pitch text
843              
844             =item *
845              
846             B<E<lt>h1E<gt> through E<lt>h6E<gt>>
847             produce level 1 through 6 headings and subheadings
848              
849             =item *
850              
851             B<E<lt>hrE<gt>>
852             produces a horizontal rule
853              
854             The C<width="length"> attribute gives a length (width, in pixels) less than the full column width, and C<size="height"> attribute gives the height (thickness) of the rule. CSS properties C<width> and C<height> are the equivalent, permitting other units of measure.
855              
856             The default C<width> is the full column, and C<size> (thickness) of the line is 0.5pt.
857              
858             Note that most browsers default to I<center> alignment if the width is less than the full column, and that is also the default here.
859             The B<align> attribute is available here to specify I<left> alignment of the rule, I<center> alignment (default), or I<right>
860             alignment. Note that this attribute is deprecated in the HTML standard; however, PDF::Builder does not yet support the suggested
861             CSS methods (properties) for doing this.
862              
863             =item *
864              
865             B<E<lt>blockquoteE<gt>>
866             produces a quotation block, a paragraph indented on both sides and of smaller font size
867              
868             =item *
869              
870             B<E<lt>pE<gt>>
871             produces a paragraph
872              
873             =item *
874              
875             B<E<lt>font face="font-family" color="color" size="font-size"E<gt>>
876             selects font face, color, and size (considered better to use CSS)
877              
878             =item *
879              
880             B<E<lt>spanE<gt>>
881             needs a style= attribute with CSS to do anything useful
882              
883             =item *
884              
885             B<E<lt>ulE<gt>>
886             produces an unordered (bulleted) list. The C<type> attribute to override the default marker is supported
887              
888             =item *
889              
890             B<E<lt>olE<gt>>
891             produces an ordered (numbered) list. C<start=>, C<type=>, and C<reverse=> attributes are supported to override the default starting count, format, and direction
892              
893             =item *
894              
895             B<E<lt>liE<gt>>
896             adds a list item to a list (ul, ol, or _sl). The C<value=> attribute may be given to override the ordered list counter, and the C<type=> attribute may be given to override the default marker type
897              
898             =item *
899              
900             B<E<lt>a href="URL"E<gt>>
901             produces a link to a browser URL or to this or another PDF document. "URL" is anchor/link, web page URL or this document target C<#p[-x-y[-z]]> (p is physical page number), or path to external PDF document before the #. ##NamedDest and extPDF##NamedDest are supported. Otherwise treat as an "id" (id=).
902              
903             =over
904              
905             =item *
906              
907             B<href="protocol://...">
908             a link will be generated to an HTML browser or (for protocol "mailto") an email client. The tag remains E<lt>a> and a's CSS properties are used. Otherwise, internally the tag will be changed to E<lt>_ref>, whose CSS properties will be used
909              
910             =item *
911              
912             B<href="#p" or "#p-x-y" or "#p-x-y-zoom">
913             a link will be generated to a physical page number "p" in this document. Optionally, an x-y location on the page (for fit="xyz") may be given with an optional zoom factor
914              
915             =item *
916              
917             B<"PDF_document_path#p" etc.>
918             a link will be generated to a physical page number "p" in an external document. Note that the path and filename must point to either an absolute address or to one relative to where this PDF will be located
919              
920             =item *
921              
922             B<"##Named_destination">
923             a link will be generated to a Named Destination in this document (see E<lt>_nameddest>). Note that the Named Destination itself will define the "fit" to be used
924              
925             =item *
926              
927             B<"PDF_document_path##Named_destination">
928             a link will be generated to a Named Destination defined in an external document. Note that the path and filename must point to either an absolute address or to one relative to where this PDF will be located, and that the Named Destination itself will define the "fit" to be used
929              
930             =item *
931              
932             B<"#id_name" or "id_name">
933             a link will be generated to the "id" of given name, which may be in this PDF document or another (if processed in the same run). If a "#" is used, the name must B<not> be all decimal digits, or all decimal digits followed by a "-" and other parts, as this will be interpreted as a "#p" physical page link!
934              
935             =back
936              
937             The link's child text, if not empty, will be used for the resulting link. If there is none, any "title" attribute or C<{^title text}> provided with the target (such as E<lt>_reft>) will be used. Finally, any native "child text" (e.g., a heading's text content) will be used.
938              
939             If using HTML markup, any tag with an id= may be a target. Especially for Markdown use, any tag with child text (not just a heading's text) may include C<{#idname}> to be parsed out as C<id="idname"> (and thus usable as a target).
940              
941             An explicit "fit=" attribute may be given in the E<lt>a> tag, to specify the page fit used by the PDF Reader at the target location. For example, fit="xyz,45,600,1.5" to place the window upper left corner at 45,600 and 150% zoom factor. For Markdown usage, {%xyz,45,600,1.5} in the link text (title) would be the equivalent (xyz fit, at 45,600, zoom 1.5). For a page number target (#p), -x-y (and optionally -zoom) may be added for the same effect (xyz fit). "null" or "undef"
942             may be used for undefined items. For any fit, %x and %y may be used for the
943             target's x and y location, to use the actual target location and not a fixed
944             location on a page.
945              
946             =item *
947              
948             B<In plan, but not yet implemented>
949              
950             =over
951              
952             =item *
953              
954             'pre' (preformatted blocks),
955              
956             =item *
957              
958             'cite', 'q', 'samp', 'var', 'kbd' (various highlights),
959              
960             =item *
961              
962             'big', 'bigger', 'small', 'smaller' (various font sizes),
963              
964             =item *
965              
966             'img' (image display),
967              
968             =item *
969              
970             'br', 'nobr' (line break, line break suppression),
971              
972             =item *
973              
974             'sup', 'sub' (superscript and subscript),
975              
976             =item *
977              
978             'dl', 'dt', 'dd' (definition lists),
979              
980             =item *
981              
982             'table', 'thead', 'tbody', 'tfoot', 'tr', 'th', 'td' (tables),
983              
984             =item *
985              
986             'mark' (highlighting... requires ability to set background color),
987              
988             =item *
989              
990             'div' (handle div's in some manner),
991              
992             =item *
993              
994             'center',
995              
996             =item *
997              
998             'caption', 'figure', 'figcap' (optional sub- and super-sections)
999              
1000             =item *
1001              
1002             'nav', 'header', 'footer', 'address', 'article', 'aside', 'canvas', 'section', 'summary' (possibly some sectioning)
1003              
1004             =back
1005            
1006             =back
1007              
1008             I<Numbered> (decimal and hexadecimal) entities are supported, as well as I<named> entities (e.g., C<E<amp>mdash;>). Lists get a "gutter" (for the marker) of I<marker_width> points wide, and a "gap" between the marker's field and the start of the item's text (I<marker_gap> points wide), so list alignments are consistent over the call.
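
The href forms listed above for E<lt>a> can be summarized as a set of pattern checks. The sketch below is illustrative only: C<classify_href> and its category names are hypothetical and are not how C<column()> is implemented. It follows the documented rule that a "#" followed by all decimal digits (optionally followed by "-" and more) is a physical page link, not an id. A path followed by "#id_name" is not covered by the list above and simply falls through to 'id' here.

```perl
use strict;
use warnings;

# Illustrative only: classify an href by the forms documented for <a>.
# The returned labels are hypothetical names, not values used by column().
sub classify_href {
    my ($href) = @_;

    return 'url'            if $href =~ m{^[A-Za-z][A-Za-z0-9+.-]*://};
    return 'url'            if $href =~ m{^mailto:}i;
    return 'named_dest'     if $href =~ m{^##.};
    return 'ext_named_dest' if $href =~ m{^[^#]+##.};
    return 'page'           if $href =~ m{^#\d+(?:-|$)};
    return 'ext_page'       if $href =~ m{^[^#]+#\d+(?:-|$)};
    return 'id';            # "#id_name" or bare "id_name"
}

for my $href ('https://www.example.com/', '#12-50-700-1.5', 'other.pdf#3',
              '##Chapter2', 'other.pdf##Chapter2', '#intro') {
    printf "%-22s %s\n", $href, classify_href($href);
}
```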
1009              
1010              
1011             =head4 Extended HTML tags
1012              
1013             A number of HTML-like tags, whose names start with an underscore "_", have
1014             been implemented to perform various tasks. These include:
1015              
1016             =over
1017              
1018             =item *
1019              
1020             B<E<lt>_refE<gt>>
1021             defines an I<alternative> form of a link to this or another PDF document (not for URLs to HTML or email; for that, use E<lt>a>). The attribute C<tgtid=> is required, and equivalent to E<lt>a>'s C<href=>. The attribute C<title=> is optional, and provides title text for the link. The attribute C<fit=> is optional, and provides a non-default "fit" for the target page (note that a Named Destination target provides its own "fit"). For a 'fit' of 'xyz', you may use '%x' for the x value and '%y' for the y value to use the current positions, rather than fixed values. Note that any E<lt>a> link I<not> to a browser or email client will be internally converted to E<lt>_ref>, so CSS for formatting the link text (title) will be defined under '_ref'
1022              
1023             =item *
1024              
1025             B<E<lt>_reftE<gt>>
1026             defines a target id for a link (via the C<id=> attribute), especially if an existing id is not conveniently at hand. See C<id=> attributes in most HTML tags, and C<{#id_name}> for many Markdown "tags". An optional attribute C<title=> may be given to provide a default link text for the link (E<lt>a> or E<lt>_ref>) referring to this id
1027              
1028             =item *
1029              
1030             B<E<lt>_nameddestE<gt>>
1031             defines a Named Destination within this document, accessible via a "##" format link href, or from some PDF Readers on the command line (one or more of C<#ND_name>, C<#name=ND_name>, or C<#nameddest=ND_name> will usually work, when appended to the PDF file path and name, just like with HTML anchor id's). The attribute C<name="ND_name"> is required, to globally name this Destination (the character set allowed and maximum length vary among Readers!). The optional attribute C<fit="fit_info"> may be given to specify a non-default "fit" when invoked. It is the type of fit (e.g., "xyz", "fith", etc.) followed by any location values required by that fit, all separated by commas. C<xyz,%x-100,%y+100,null> is the default fit
1032              
1033             =item *
1034              
1035             B<E<lt>_markerE<gt>>
1036             provides a place to specify, via CSS, on a I<per list item basis>, overrides to default marker settings (see also C<_marker-*> CSS extensions below). If omitted, the same HTML list markers and CSS properties are used for each list item (per usual practice). The intent of this tag is to permit styling changes such as font, color, and alignment to an individual list item (E<lt>li>). This tag is placed immediately I<before> the <li> it applies to
1037              
1038             =item *
1039              
1040             B<E<lt>_moveE<gt>>
1041             provides a way to explicitly move the current write point left or right. Attribute C<x="value"> is an absolute move (in points), while attribute C<dx="value"> is a relative move from the current write point. Along with the "text-align" CSS property, this can provide a way to fine tune text position within a column line.
1042              
1043             An x value that is a bare number (no units) is assumed to specify I<points>, equivalent to units of C<pt>. The unit may also be C<%>, where 0% is the left end of the column, 50% is the center, and 100% is the right end.
1044             A dx value that is a bare number (no units) is assumed to specify I<points>, equivalent to units of C<pt>. The unit may also be C<%>, a fraction of the column width to move (+ right, - left). Note that results are unpredictable if you move beyond the edge of the column in either direction
1045              
1046             =item *
1047              
1048             B<E<lt>_slE<gt>>
1049             provides a I<simple list>, very similar to an I<unordered list>, except for no list markers
1050              
1051             =item *
1052              
1053             B<In plan, but not yet implemented>
1054              
1055             =over
1056              
1057             =item *
1058              
1059             '_k' (manual kerning control)
1060              
1061             =item *
1062              
1063             '_ovl' (overline, similar to underline and line-through)
1064              
1065             =item *
1066              
1067             '_lig' (specify a particular ligature to use here)
1068              
1069             =item *
1070              
1071             '_nolig' (suppress ligatures by HarfBuzz)
1072              
1073             =item *
1074              
1075             '_swash' and '_altg' (specify a particular alternate glyph to use here)
1076              
1077             =item *
1078              
1079             '_sc' (specify "small caps" font variant, with forced end after N words)
1080              
1081             =item *
1082              
1083             '_pc' (specify "petite caps" font variant, with forced end after N words)
1084              
1085             =item *
1086              
1087             '_dc' (specify "dropped cap" font variant in some manner, also CSS)
1088              
1089             =item *
1090              
1091             ? (specify conditional and unconditional page breaks)
1092              
1093             =back
1094              
1095             =back
1096              
1097             =head4 Standard CSS properties and values
1098              
1099             CSS (Cascading Style Sheets) may be defined for HTML tags (or "body"
1100             for global settings), via the C<style=E<gt>> C<column()> option. You may also
1101             add one or more C<E<lt>styleE<gt>> HTML tags, with CSS markup, to your HTML
1102             source. Such entries will be combined into a global style section.
1103              
1104             E<lt>styleE<gt> tags may be placed in an optional E<lt>headE<gt> section, or
1105             within the E<lt>bodyE<gt>. In the latter case, style tags will be pulled out
1106             of the body and added (in order) on to the end of any style tag(s) defined in
1107             a head section. Multiple style tags will be condensed into a single collection
1108             (later definitions of equal precedence overriding earlier). These stylings will
1109             have global effect, as though they were defined in the head. As with normal CSS,
1110             the hierarchy of a given property (in decreasing precedence) is
1111              
1112             appearance in a style= tag attribute
1113             appearance in a tag attribute (possibly a different name than the property)
1114             appearance in a #IDname selector in a <style>
1115             appearance in a .classname selector in a <style>
1116             appearance in a tag name selector in a <style>
1117              
1118             Selectors are quite simple: a single tag name (e.g., B<body>),
1119             a single class (.cname), or a single ID (#iname).
1120             There are I<no> combinations (e.g.,
1121             C<p.abstract> or C<ol, ul>), hierarchies (e.g., C<ol E<gt> li>), specified
1122             numbers of appearances, pseudotags, or other such complications as found in a
1123             browser's CSS. Sorry!
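
The precedence order listed above can be illustrated with a small lookup. This is only a sketch of the rule as described, not the code C<column()> uses; the level names (C<style_attr>, C<tag_attr>, and so on) and the C<resolve_property> helper are hypothetical labels invented for the example.

```perl
use strict;
use warnings;

# Precedence order from the list above, highest first. These key names are
# hypothetical labels for this sketch, not identifiers used by column().
my @PRECEDENCE = qw(style_attr tag_attr id_selector class_selector tag_selector);

# Return the value from the highest-precedence source that defines the
# property, or undef (empty list) if no source defines it.
sub resolve_property {
    my ($prop, $sources) = @_;
    for my $level (@PRECEDENCE) {
        my $props = $sources->{$level} or next;
        return $props->{$prop} if defined $props->{$prop};
    }
    return;
}

my %sources = (
    tag_selector   => { color => 'black', 'font-size' => '12pt' },
    class_selector => { color => 'blue' },
    style_attr     => { 'font-size' => '14pt' },
);
print resolve_property('color', \%sources), "\n";      # blue
print resolve_property('font-size', \%sources), "\n";  # 14pt
```

Here the class selector overrides the tag name selector for C<color>, and the C<style=> attribute overrides everything for C<font-size>.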
1124              
1125             =head4 Length Measures
1126              
1127             Property values which are lengths (including C<font-size>) may have units of B<pt>
1128             (points, 72 to the inch), B<px> (pixels, currently fixed at 78 to the inch),
1129             B<in> (inches), B<cm>, B<mm>, B<em> (equal to font-size), B<en> (0.5em), and B<ex> (currently
1130             0.5em, but in the future may be able to query the font's actual x-height).
1131             % (percentage) of the current font-size (in most cases, unless otherwise noted) is allowed, although
1132             some properties may in the future support % of the enclosing object size.
1133             Sizes may be negative numbers (useful only for margins).
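
As a rough illustration of the units described above, the following sketch converts a length string to points. It is not the library's own parser: the C<length_to_points> name is hypothetical, treating a bare number as points is an assumption of the sketch, and % is always taken as a percentage of font-size (the per-property exceptions noted below are ignored).

```perl
use strict;
use warnings;

# Points per unit for the absolute units named above.
my %PT_PER = (
    pt => 1,
    px => 72 / 78,     # pixels are "currently fixed at 78 to the inch"
    in => 72,
    cm => 72 / 2.54,
    mm => 72 / 25.4,
);
# Multiples of the current font-size for the relative units.
my %FS_PER = ( em => 1, en => 0.5, ex => 0.5 );

# Convert a CSS-style length string to points. $font_size is in points.
# Assumption of this sketch: a bare number (no unit) means points.
sub length_to_points {
    my ($value, $font_size) = @_;
    my ($num, $unit) =
        $value =~ /^\s*([-+]?(?:\d+\.?\d*|\.\d+))\s*([a-z%]*)\s*$/i
        or die "unrecognized length '$value'\n";
    $unit = lc $unit;
    return $num                               if $unit eq '';
    return $num * $PT_PER{$unit}              if exists $PT_PER{$unit};
    return $num * $FS_PER{$unit} * $font_size if exists $FS_PER{$unit};
    return $num / 100 * $font_size            if $unit eq '%';
    die "unknown unit '$unit' in '$value'\n";
}

print length_to_points('1in',   12), "\n";   # 72
print length_to_points('1.5em', 12), "\n";   # 18
print length_to_points('50%',   12), "\n";   # 6
```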
1134              
1135             =over
1136              
1137             =item *
1138              
1139             For property
1140             I<list-style-position>, % is relative to the marker width+gap, not font-size (and pt values may
1141             be given, where "inside" = 0% and "outside" = 100% of marker width+gap). The standard 'outside'
1142             (default) and 'inside' values may also be given.
1143              
1144             =item *
1145              
1146             For the I<E<lt>hr>> tag, the "width" and "size" attributes are in points. For the CSS "width"
1147             property, absolute units may be given, or % of available column width. For the CSS "height"
1148             property, absolute units may be given.
1149              
1150             =back
1151              
1152             B<Note> that eventually we may support C<li::marker>, which is now standard CSS,
1153             but there does not appear to be a way to support changes via C<style=>, because
1154             the same property names (e.g., I<color>) would apply to both the marker and the
1155             list item text. This will require extensive changes to CSS style to permit
1156             complex selectors, which C<column()> does not currently offer. Even doing that,
1157             we may retain the current "marker" tags and CSS introduced here. I think W3C
1158             may have missed the boat by not doing something like an optional C<_marker> to
1159             permit normal properties for markers alone, but configurable in-line with
1160             C<style=>.
1161              
1162             Supported CSS properties:
1163              
1164             =over
1165              
1166             =item *
1167              
1168             B<color> (foreground color, in standard PDF::Builder formats)
1169              
1170             =item *
1171              
1172             B<display> (I<inline> or I<block>)
1173              
1174             =item *
1175              
1176             B<font-family> (name as defined to FontManager, e.g. Times)
1177              
1178             =item *
1179              
1180             B<font-size> (length measure)
1181              
1182             Note that B<body> C<font-size> is the starting point, and so if given,
1183             must be a bare number (greater than 0) or number + 'pt'. C<font-size>s for
1184             other tags may be given as % of inherited font-size or em (100% of font-size),
1185             en (50%), or ex (currently fixed at 50%).
1186              
1187             Unless otherwise prohibited, any tag's CSS may first change the font-size,
1188             and then properties such as margins defined as % of font-size will be
1189             calculated using the new font-size, rather than the inherited one.
1190              
1191             =item *
1192              
1193             B<font-style> (I<normal> or I<italic>)
1194              
1195             =item *
1196              
1197             B<font-weight> (I<normal> or I<bold>)
1198              
1199             =item *
1200              
1201             B<height> (pt, bare number)
1202              
1203             Thickness (height) of B<horizontal rule>. The HTML attribute is C<size>.
1204              
1205             =item *
1206              
1207             B<list-style-position> (outside, inside, B<extension:> number pt or % to indent)
1208              
1209             =item *
1210              
1211             B<list-style-type> (marker description, see also _marker-text/before/after)
1212              
1213             =item *
1214              
1215             B<margin-top/right/bottom/left> (length measure)
1216              
1217             Note that adjacent bottom and top margins will be collapsed to use the
1218             I<larger> amount of the two. Negative margin values ("pulling" objects towards
1219             each other) are allowed, and positive margin values "push" objects away from
1220             each other.
1221              
1222             =item *
1223              
1224             B<text-decoration> (none, underline, line-through, overline)
1225              
1226             May use more than one value (except 'none') separated by spaces.
1227              
1228             B<Note 1:> various HTML tags (such as I<u>, I<ins>, I<del>, I<s>) make use of
1229             this CSS property, and may of course be changed in the styling.
1230              
1231             B<Note 2:> both I<underline> and I<overline> are solid lines, which will collide with
1232             glyph descenders and ascenders respectively. We are investigating means of
1233             implementing something like the CSS I<text-decoration-skip-ink: auto> property.
1234             PDF does not appear to currently define a way of doing this (to be handled by
1235             the Reader). I<line-through> is
1236             also a solid line that collides with glyph strokes, but the usual intent I<is>
1237             to obscure the text, so there are no plans to change this default behavior.
1238              
1239             B<Note 3:> these decorations are made as escapes within the text object, rather
1240             than within the graphics object. We reserve the right to (in the future) change
1241             this to require a graphics object to draw them. Some lead time will be given so
1242             that you have a chance to update your code.
1243              
1244             B<Note 4:> I<line-through> uses a fixed % of ascender height, rather than of I<ex>
1245             height. In some fonts, this may result in a I<line-through> "floating" above the
1246             bulk of the characters (intersecting only ascenders), i.e., it is above the x-height.
1247             It's on the "to do" list to address this.
1248              
1249             =item *
1250              
1251             B<line-height> (leading, as ratio of baseline-spacing to font-size).
1252              
1253             Currently, percentage of font-size and absolute units (e.g., pt) are B<not> supported.
1254             The default value is 1.125 (18pt line-to-line for font-size 16).
1255              
1256             B<Note:> B<text-height>, the former I<incorrect> name for this property, is
1257             still supported (as an alias for B<line-height>) through release 3.029, but may
1258             be withdrawn as soon as release 3.030. Update your code if you use it!
1259              
1260             =item *
1261              
1262             B<text-indent> (length measure)
1263              
1264             For paragraph indentation.
1265              
1266             =item *
1267              
1268             B<text-align> (left/center/right justify at current text position)
1269              
1270             B<Note:> if center or right justified, you should keep the text short enough
1271             to fit within the left and right bounds of the column. Center and right
1272             justification need an explicit position defined (usually via <_move>) and will
1273             not properly wrap to a new line.
1274              
1275             =item *
1276              
1277             B<width> (length measure) width (length) of B<horizontal rule>
1278              
1279             Currently only used for E<lt>hr>. In the future it may be expanded to other
1280             object types. E<lt>hr> may be permitted in the future to be a percentage
1281             of the enclosing parent's width.
1282              
1283             =item *
1284              
1285             B<height> (length measure) height (thickness) of B<horizontal rule>
1286              
1287             Currently only used for E<lt>hr>. In the future it may be expanded to other
1288             object types. E<lt>hr> may be permitted in the future to be a percentage
1289             of the enclosing parent's height.
1290              
1291             The equivalent HTML attribute is C<size>.
1292              
1293             =item *
1294              
1295             B<In plan, but not yet implemented>
1296              
1297             =over
1298              
1299             =item *
1300              
1301             white-space (treatment of line-ends and various spaces),
1302              
1303             =item *
1304              
1305             /* and */ comments in CSS
1306              
1307             =item *
1308              
1309             border and border-* (border properties),
1310              
1311             =item *
1312              
1313             padding and padding-* (padding properties),
1314              
1315             =item *
1316              
1317             list-style-image (use an image as a list bullet),
1318              
1319             =item *
1320              
1321             margin (update the four C<margin-*> properties in one setting, add 'auto' value)
1322              
1323             =item *
1324              
1325             background-color (also for <mark> tag),
1326              
1327             =back
1328              
1329             =back
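
Two of the rules in the property list above involve a little arithmetic: adjacent margins collapse to the larger value, and C<line-height> is a ratio applied to the font-size. The sketch below shows both under those stated rules only; the function names are hypothetical and this is not the code C<column()> runs.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Gap between two stacked blocks: the adjacent bottom and top margins
# collapse to the larger of the two rather than being added together.
sub collapsed_margin {
    my ($bottom_of_upper, $top_of_lower) = @_;
    return max($bottom_of_upper, $top_of_lower);
}

# Baseline-to-baseline distance: line-height is a ratio of the font-size.
sub baseline_spacing {
    my ($font_size, $line_height) = @_;
    $line_height //= 1.125;    # documented default ratio
    return $font_size * $line_height;
}

printf "%.1f\n", collapsed_margin(12, 8);   # 12.0, not 20.0
printf "%.1f\n", baseline_spacing(16);      # 18.0
```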
1330              
1331             See the L</Length Measures> section to see what measurements are allowed.
1332              
1333             B<CAUTION:> comments /* and */ are NOT
1334             currently supported in CSS -- perhaps in the future.
1335              
1336             =head4 Extended CSS properties and values
1337              
1338             A number of additional (non-standard) CSS properties and/or values have been
1339             defined for additional functionality for C<column()>. Note that if you set
1340             _marker-* properties in a list, all nested lists will, as usual, inherit these
1341             properties. If you don't want that, you will need to cancel the new settings by
1342             resetting them to standard values in the nested <ul> or <ol> tag's style.
1343              
1344             =over
1345              
1346             =item *
1347              
1348             B<_marker-before> (constant text to insert I<before> an E<lt>ol> marker, default nothing)
1349              
1350             =item *
1351              
1352             B<_marker-after> (constant text to insert I<after> an E<lt>ol> marker, default period ".")
1353              
1354             =item *
1355              
1356             B<_marker-text> (define text to use as marker instead of the system-generated text)
1357              
1358             =item *
1359              
1360             B<_marker-color> (change color from default, such as color-coded E<lt>ul> bullets)
1361              
1362             =item *
1363              
1364             B<_marker-font> (change marker font face (font-family))
1365              
1366             =item *
1367              
1368             B<_marker-style> (change marker font style, e.g., italic)
1369              
1370             =item *
1371              
1372             B<_marker-size> (change marker font size)
1373              
1374             =item *
1375              
1376             B<_marker-weight> (change marker font weight)
1377              
1378             =item *
1379              
1380             B<_marker-align> (left/center/right justify within marker_width gutter)
1381              
1382             =item *
1383              
1384             B<list-style-position> standard (inside or outside) or numeric (points or percentage of marker_width gutter + marker_gap)
1385              
1386             =back
1387              
1388             There are additional non-standard CSS "properties" that you would normally
1389             B<not> set in CSS. They are internal state trackers:
1390              
1391             =over
1392              
1393             =item *
1394              
1395             B<_parent-fs> (current running font size, in points)
1396              
1397             This is actually the parent of this tag's font-size, which the current tag inherits
1398             and may set to a new value if desired (with C<font-size> property).
1399              
1400             =item *
1401              
1402             B<_href> (URL for <a>, normally provided by href= attribute)
1403              
1404             =item *
1405              
1406             B<_left> (running number of points to indent on the left, from margin-left and list nesting)
1407              
1408             =item *
1409              
1410             B<_left_nest> (amount to indent next nested list)
1411              
1412             =item *
1413              
1414             B<_right> (running number of points to indent on the right, from margin-right)
1415              
1416             =back
1417              
1418             =head2 General Comments
1419              
1420             The Font Manager system is used to supply the requested fonts, so it is up to
1421             the application to preload the desired font information I<before> C<column()>
1422             is called. Any request to change the encoding within C<column()> will be
1423             ignored, as the fonts have already been specified for a specific encoding.
1424             Needless to say, the encoding used in creating the input text needs to match
1425             the specified font encoding.
1426              
1427             Absent any markup changing the font face or styling, whatever is defined by
1428             Font Manager as the I<current> font will be what is used. This way, you may
1429             inherit the font from the previous C<column()>, or call
1430             C<$text-E<gt>font($pdf-E<gt>get_font(), size)> to set both the font and size, or
1431             just call C<$pdf-E<gt>get_font()> to set only the font, relying on the C<font_size>
1432             option or CSS markup to set the size.
1433              
1434             Line fitting (paragraph shaping) is currently quite primitive. Words will
1435             not be split (hyphenated). I<It is planned to eventually add Knuth-Plass
1436             paragraph shaping, along with proper language-dependent hyphenation.>
1437              
1438             Each change of font automatically supplies its maximum ascender and minimum
1439             descender, the B<extents> above and below the text line's baseline. Each block
1440             of text with a given face and variant, or change of font size, will be given
1441             the same I<vertical> extents -- the extents are font-wide, and not determined
1442             on a per-glyph basis. So, unfortunately, a block of text "acemnorsuvwz" will
1443             have the same vertical extents as a block of text "bdfghijklpqty". For a given
1444             line of text, the highest ascender and the lowest descender (plus leading) will
1445             be used to position the line at the appropriate distance below the previous
1446             line (or the top of the column). No attempt is made to "fit" projections into
1447             recesses (jigsaw-puzzle like). If there is an inset into the side of a column,
1448             or it is otherwise not a straight vertical line,
1449             so long as the baseline fits within the column outline, no check is made
1450             whether descenders or ascenders will fall outside the defined column (i.e.,
1451             project into the inset). We suggest that you try to keep font sizes fairly
1452             consistent, to keep reasonably consistent text vertical extents.
1453              
1454             =cut
1455              
1456             # ---------------------------------------------------------------------------
1457             # function defined in Builder.pm
1458             =head2 init_state
1459              
1460             %state = PDF::Builder->init_state(%lists)
1461              
1462             %state = PDF::Builder->init_state()
1463              
1464             This method is used in L<PDF::Builder::Content::Text> to create and initialize
1465             the hash structure that permits transfer of data between
1466             C<column()> calls, as well as accumulating link information to build
1467             intra- and inter-PDF file jumps for a variety of uses.
1468              
1469             B<%lists> is optional, and allows the user to define lists of tags (those which
1470             have an id= ) for various purposes. These are anonymous lists. Element '_reft' is
1471             predefined for cross reference targets, and already includes the <_reft> tag
1472             as '_reft'. B<Do not add '_reft' to the '_reft' list!> The user may wish to add
1473             other tags (which have id= ) to be used, and define other lists to be
1474             accumulated. For example,
1475              
1476             {'_reft' => [ 'h1', 'h2', 'h3', 'h4' ],
1477             'TOC' => [ '_part', '_chap', 'h1', 'h2', 'h3' ], }
1478              
1479             adds the top 4 heading levels to cross references ('_reft' is already there),
1480             and creates a 5-level list of tags to build a Table of Contents. Additional
1481             lists might be built for an Index, glossary, List of Tables, List of Figures
1482             (Illustrations, Photos), List of Equations, etc. I<TOC, Index, etc. have not
1483             yet been implemented, but are planned for the near future!>
1484              
1485             If no C<%lists> parameter is given, you will be limited to cross references
1486             from <_reft> only, and no entries specifically for TOC etc. will be defined.
1487             Remember, only tags with C<id=>s in your markup will be used as link targets.
1488              
1489             If you are using B<markdown> for your source, you may not be able to define
1490             C<id=>s for all your "tags" (HTML tags produced after translation from
1491             markdown), and thus will need to use C<E<lt>_reftE<gt>>s as link targets, which
1492             should be passed through to HTML. For applications such as a TOC, you I<may>
1493             be able to postprocess the _reft list to separate out (based on id given) this
1494             large group of target ids into groups for specific purposes, such as a TOC.
1495              
1496             %state = PDF::Builder->init_state(%lists)
1497              
1498             This creates the state structure (hash) to be passed to C<column()> calls, and
1499             it saves information from invocation to invocation. It must be initialized
1500             I<before> the first pass of the loop which invokes one or more C<column()>
1501             formatting calls at each pass (for a different part of the document).
1502              
1503             It is defined in PDF::Builder (Builder.pm) as L<PDF::Builder::init_state>,
1504             rather than here in PDF::Builder::Content::Text, because C<$text> does not
1505             yet exist when it needs to be called.
1506              
1507             =cut
1508              
1509             # ---------------------------------------------------------------------------
1510             # function defined in Builder.pm
1511             =head2 pass_start_state
1512              
1513             $rc = $pdf->pass_start_state($pass_count, $max_passes, \%state)
1514              
1515             This does whatever is necessary at the I<start> of a pass (number $pass_count).
1516             Currently, this resets the 'changed_target' hash list.
1517              
1518             It is defined in PDF::Builder (Builder.pm) as
1519             L<PDF::Builder::pass_start_state>, rather than here in
1520             PDF::Builder::Content::Text, because C<$text> does not yet exist when it
1521             needs to be called.
1522              
1523             =cut
1524              
1525             # ---------------------------------------------------------------------------
1526             # function in Content::Text
1527             =head2 pass_end_state
1528              
1529             $rc = $text->pass_end_state($pass_count, $max_passes, $pdf, $state, %opts)
1530              
1531             This examines the state structure (hash), resolves any content changes that
1532             need to be made, and builds a list of all refs (by target id C<tgtid>) which
1533             are still changing at this pass. If any have changed, a non-zero return code
1534             (number of cases) is returned, but if everything has settled down, the return
1535             code is 0.
1536              
1537             =over
1538              
1539             =item $pass_count
1540              
1541             What pass number we are on. It starts at 1, and must be no greater than
1542             C<max_passes>.
1543              
1544             =item $max_passes
1545              
1546             The pass number of the last permitted pass, if reached. We may exit before
1547             this if things settle down quickly enough. If
1548              
1549             =over
1550              
1551             =item 1.
1552              
1553             page numbers are not output in link text (C<page_numbers == 0>) I<and>
1554              
1555             =item 2.
1556              
1557             C<title=> is given in all '_ref' tags, I<or> all _ref's without title
1558             attributes are backwards references (all forward _ref's have a title)
1559              
1560             =back
1561              
1562             you may often be able to get away with a single pass (C<max_passes == 1>).
1563             You still may be informed that not all cross references have settled.
1564              
1565             =item $pdf
1566              
1567             The PDF object.
1568              
1569             =item $state
1570              
1571             Hashref to state structure, which includes, among other things, lists of
1572             link sources (_ref tags) and link targets (_reft and other listed tags).
1573              
1574             =item %opts
1575              
1576             Options.
1577              
1578             =over
1579              
1580             =item 'debug' => 1
1581              
1582             Draw a border around the link text (the source, not the target), so you can
1583             see where a click would take effect.
1584              
1585             =item 'deltas' => [ 20, 20 ]
1586              
1587             To show some context around the target text (if I<xyz> fit is used without a
1588             specific x and y), the upper left corner of the target window is placed these
1589             amounts (units I<points>) from the left (delta x) and top (delta y) edges of
1590             the target text. The default is 20 (points) each, roughly a couple of lines'
1591             worth. The left side is limited to the page edge, and the top side is limited
1592             to the page top.
1593              
1594             Note that the upper edge of the text is where the I<previous> line left off,
1595             so if there is a top margin on the target text (e.g., it's a heading), the
1596             offset will be from there, not the text itself, and the view window may
1597             therefore be up higher on the page than you would otherwise expect. This has
1598             been known to confuse users with a PDF Reader which displays a fixed-size popup
1599             window showing the target a link will go to, which might even miss the target
1600             text entirely if the deltas are too large.
1601              
1602             =back
1603              
1604             =back
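
The C<deltas> placement described above can be sketched as a small calculation. This is an illustration only: it assumes PDF user space with the origin at the lower-left page corner (left edge at x = 0, top edge at y = page height) and that (tx, ty) is the upper-left of the target text. The C<target_window_corner> name and its argument order are hypothetical.

```perl
use strict;
use warnings;

# Upper-left corner of the view window for a link target.
# Assumptions of this sketch: origin at the lower-left page corner, so the
# left page edge is x = 0 and the page top is y = $page_height.
sub target_window_corner {
    my ($tx, $ty, $page_height, $deltas) = @_;
    my ($dx, $dy) = @{ $deltas // [ 20, 20 ] };   # default is 20 points each
    my $x = $tx - $dx;    # move left of the target text
    my $y = $ty + $dy;    # move above the target text
    $x = 0            if $x < 0;              # limited to the page edge
    $y = $page_height if $y > $page_height;   # limited to the page top
    return ($x, $y);
}

my ($x, $y) = target_window_corner(72, 700, 792);
print "$x $y\n";   # 52 720
```

A target close to the top-left corner, such as (10, 785) on the same page, is clamped to (0, 792).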
1605              
1606             If all references include their own title string and do B<not> show a page
1607             (only the title string as the annotation link text), a document should take
1608             only one pass. Often two passes are enough to resolve even forward references
1609             which need to pick up text from later in the document,
1610             but sometimes (especially if special formatting of page numbers is involved),
1611             a target may move back and forth between two pages and not settle down. In
1612             such cases, you may need to simplify or rearrange the text, such as moving a
1613             target back from the end of a page, or changing from specialty formats (such
1614             as "on following page") to a fixed "on page N".
1615              
1616             B<Fields in %state structure:>
1617              
1618             settings = hold settings between column() calls
1619             TBD
1620              
1621             xrefs = source of link (<_ref>) info needed
1622             [ ] = array of each link source
1623             id = target's id, tag that defines a target
1624             fit = any fit information provided
1625             tfn = target filename (FINAL position and name) used for
1626             external links
1627             tppn = target physical page number (integer > 0)
1628             sppn = source physical page number (integer > 0)
1629             other_pg = text for "other page" if page_numbers > 0
1630             prev_other_pg = previous value (to detect change)
1631             tfpn = formatted page number (string, may be '')
1632             tx,ty = coordinates of target on page (used for fit)
1633             title = text for link. if not defined in <_ref>, use one
1634             in <_reft> (if defined), else "natural text" such
1635             as heading <hX> child text
1636             prev_title = previous value (to detect change)
1637             tag = tag that produced this target (useful for formatting,
1638             e.g., indenting TOC entries based on hX level)
1639             click = [ ] of one or more click areas, each element is
1640             [sppn, [x,y, x,y]]
1641              
1642             xreft = tag that created a target for a link (<_reft> et al.)
1643             _reft = entries for cross reference targets (_reft list)
1644             id
1645             tfn = filepath for external links
1646             tppn = target physical page number
1647             tfpn = target formatted page number
1648             tx,ty = coordinates of target on page
1649             title = title, defaulting to "natural text", to update source
1650             tag = tag type that produced this entry
1651             $another_list = other tag list name list of targets (e.g., TOC)
1652             id...
1653             etc.
1654              
1655             changed_target = hash of tgtids (in xrefs id) that changed AFTER link text
1656             and page text output, requiring another pass
1657              
1658             tag_lists = anon list of tags (with id) to put in various lists.
1659             see 'init_state()' for building tag lists
1660             _reft = [ ] predefined for cross references, may add more (such
1661             as hX heading tags)
1662             TOC = [ ] NOT predefined, add if desired ...TBD
1663             Index = [ ] NOT predefined, add if desired, etc. ...TBD
1664              
1665             nameddest = hash of named destinations to be defined
1666             $name = name of the destination
1667             fit = fit information (location, parms)
1668             ppn = physical page number in this PDF
1669             x,y = x and y coordinates on page
1670              
1671             Note that the link text ('title') and any page information ('on page X') need
1672             to be output at each pass, to determine where everything is, while other
1673             information is stored until the last pass, to actually generate the annotation
1674             links. The "last pass" will be either when it is found that all link information
1675             has "settled down", or the C<max_passes> limit is reached.
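
The overall pass loop implied by these methods might look like the sketch below. To keep it runnable without PDF::Builder installed, it uses a hypothetical stand-in class (C<Sketch::Fake>) whose C<pass_end_state> simply returns scripted counts; only the call shapes follow the signatures documented here, and C<run_passes> is likewise an invented helper.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real $pdf and $text objects, so the loop can
# run by itself. It is NOT PDF::Builder; only the call shapes match the POD.
package Sketch::Fake {
    sub new { my ($class, @rcs) = @_; return bless { rcs => [@rcs] }, $class }

    # Start of pass: reset the 'changed_target' list.
    sub pass_start_state {
        my ($self, $pass_count, $max_passes, $state) = @_;
        $state->{'changed_target'} = {};
        return 0;
    }

    # End of pass: report how many refs are still changing (scripted here).
    sub pass_end_state {
        my ($self, $pass_count, $max_passes, $pdf, $state, %opts) = @_;
        my $rc = shift @{ $self->{rcs} } // 0;
        $state->{'changed_target'}{"tgt$_"} = 1 for 1 .. $rc;
        return $rc;
    }
}

package main;

# Run passes until everything settles (rc == 0) or $max_passes is reached.
sub run_passes {
    my ($max_passes, @scripted) = @_;
    my %state = ( 'changed_target' => {} );  # real code would call init_state()
    my $obj   = Sketch::Fake->new(@scripted);
    my ($passes, $rc) = (0, 0);
    for my $pass_count (1 .. $max_passes) {
        $passes = $pass_count;
        $obj->pass_start_state($pass_count, $max_passes, \%state);
        # ... one or more column() calls would go here ...
        $rc = $obj->pass_end_state($pass_count, $max_passes, $obj, \%state);
        last if $rc == 0;                    # all link information has settled
    }
    return ($passes, $rc);
}

my ($passes, $rc) = run_passes(5, 2, 1, 0);
print "stopped after pass $passes with rc=$rc\n";
```

With the scripted counts 2, 1, 0 the loop stops after the third pass. If the counts never reach 0, it stops at C<max_passes> with a non-zero return code, which is the point at which a real caller might consult C<unstable_state()>.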
1676              
1677             =cut
1678              
1679             # ---------------------------------------------------------------------------
1680             # function in Content::Text
1681             =head2 unstable_state
1682              
1683             @list = $text->unstable_state(\%state)
1684              
1685             This returns a list (array) of string target ids (tgtid) which appear to still
1686             be changing at the end of the loop, i.e., have not settled down.
1687              
1688             If this method is called when C<pass_end_state()> returned a 0, the list will
1689             be empty. It may also be called at each pass, for diagnostic purposes.
1690              
1691             =cut
1692              
1693             # ---------------------------------------------------------------------------
1694       0     sub _cdocs {
1695             # dummy stub
1696             }
1697              
1698             1;