File Coverage

blib/lib/Data/CTable.pm
Criterion Covered Total %
statement 1306 1470 88.8
branch 366 608 60.2
condition 205 416 49.2
subroutine 164 179 91.6
pod 6 147 4.0
total 2047 2820 72.5


line stmt bran cond sub pod time code
1             #!/usr/bin/perl
2             ## Emacs: -*- tab-width: 4; -*-
3              
4 1     1   13071 use strict;
  1         4  
  1         211  
5              
6             package Data::CTable;
7              
8 1     1   8 use vars qw($VERSION); $VERSION = '1.01';
  1         4  
  1         333  
9              
10             =pod
11              
12             =head1 NAME
13              
14             Data::CTable - Read, write, manipulate tabular data
15              
16             =head1 SYNOPSIS
17              
18             ## Read some data files in various tabular formats
19             use Data::CTable;
20             my $People = Data::CTable->new("people.merge.mac.txt");
21             my $Stats = Data::CTable->new("stats.tabs.unix.txt");
22              
23             ## Clean stray whitespace in fields
24             $People->clean_ws();
25             $Stats ->clean_ws();
26              
27             ## Retrieve columns
28             my $First = $People->col('FirstName');
29             my $Last = $People->col('LastName' );
30              
31             ## Calculate a new column based on two others
32             my $Full = [map {"$First->[$_] $Last->[$_]"} @{$People->all()}];
33              
34             ## Add new column to the table
35             $People->col(FullName => $Full);
36              
37             ## Another way to calculate a new column
38             $People->col('Key');
39             $People->calc(sub {no strict 'vars'; $Key = "$Last,$First";});
40              
41             ## "Left join" records matching Stats:PersonID to People:Key
42             $Stats->join($People, PersonID => 'Key');
43              
44             ## Find certain records
45             $Stats->select_all();
46             $Stats->select(Department => sub {/Sale/i }); ## Sales depts
47             $Stats->omit (Department => sub {/Resale/i}); ## not Resales
48             $Stats->select(UsageIndex => sub {$_ > 20.0}); ## high usage
49              
50             ## Sort the found records
51             $Stats->sortspec('DeptNum' , {SortType => 'Integer'});
52             $Stats->sortspec('UsageIndex', {SortType => 'Number' });
53             $Stats->sort([qw(DeptNum UsageIndex Last First)]);
54              
55             ## Make copy of table with only found/sorted data, in order
56             my $Report = $Stats->snapshot();
57              
58             ## Write an output file
59             $Report->write(_FileName => "Rept.txt", _LineEnding => "mac");
60              
61             ## Print a final progress message.
62             $Stats->progress("Done!");
63              
64             ## Dozens more methods and parameters available...
65              
66             =head1 OVERVIEW
67              
68             Data::CTable is a comprehensive utility for reading, writing,
69             manipulating, cleaning and otherwise transforming tabular data. The
70             distribution includes several illustrative subclasses and utility
71             scripts.
72              
73             A Columnar Table represents a table as a hash of data columns, making
74             it easy to do data cleanup, formatting, searching, calculations,
75             joins, or other complex operations.
76              
77             The object's hash keys are the field names and the hash values hold
78             the data columns (as array references).
79              
80             Tables also store a "selection" -- a list of selected / sorted record
81             numbers, and a "field list" -- an ordered list of all or some fields
82             to be operated on. Select() and sort() methods manipulate the
83             selection list. Later, you can optionally rewrite the table in memory
84             or on disk to reflect changes in the selection list or field list.
85              
86             Data::CTable reads and writes any tabular text file format including
87             Merge, CSV, Tab-delimited, and variants. It transparently detects,
88             reads, and preserves Unix, Mac, and/or DOS line endings and tab or
89             comma field delimiters -- regardless of the runtime platform.
90              
91             In addition to reading data files, CTable is a good way to gather,
92             store, and operate on tabular data in memory, and to export data to
93             delimited text files to be read by other programs or interactive
94             productivity applications.
95              
96             To achieve extremely fast data loading, CTable caches data file
97             contents using the Storable module. This can be helpful in CGI
98             environments or when operating on very large data files. CTable can
99             read an entire cached table of about 120 megabytes into memory in
100             about 10 seconds on an average mid-range computer.
101              
102             For simple data-driven applications needing to store and quickly
103             retrieve simple tabular data sets, CTable provides a credible
104             alternative to DBM files or SQL.
105              
106             For data hygiene applications, CTable forms the foundation for writing
107             utility scripts or compilers to transfer data from external sources,
108             such as FileMaker, Excel, Access, personal organizers, etc. into
109             compiled or validated formats -- or even as a gateway to loading data
110             into SQL databases or other destinations. You can easily write short,
111             repeatable scripts in Perl to do reporting, error checking, analysis,
112             or validation that would be hard to duplicate in less-flexible
113             application environments.
114              
115             The data representation is simple and open so you can directly access
116             the data in the object if you feel like it -- or you can use accessors
117             to request "clean" structures containing only the data or copies of
118             it. Or you can build your own columns in memory and then when you're
119             ready, turn them into a table object using the very flexible new()
120             method.
121              
122             The highly factored interface and implementation allow fine-grained
123             subclassing so you can easily create useful lightweight subclasses.
124             Several subclasses are included with the distribution.
125              
126             Most defaults and parameters can be customized by subclassing,
127             overridden at the instance level (avoiding the need to subclass too
128             often), and further overridden via optional named-parameter arguments
129             to most major method calls.
130              
131             =head2 Similar / related modules on CPAN
132              
133             The Data::Table module by Yingyao Zhou & Guangzhou Zou offers similar
134             functionality, but uses a different underlying data representation
135             (2-dimensional array), and has a somewhat different feature set.
136             Check it out. Maybe you will prefer it for your application.
137              
138             http://search.cpan.org/search?mode=module&query=Data::Table
139              
140             The Data::ShowTable module renders tables in various viewable formats.
141             CTable relies on ShowTable's ShowBoxTable method to implement its own
142             format() and out() methods.
143              
144             http://search.cpan.org/search?mode=module&query=Data::ShowTable
145              
146              
147             =head2 Prerequisites
148              
149             The CTable documentation, source code, and examples assume familiarity
150             with large nested data structures, object-oriented syntax and
151             terminology, and comfort working with array and hash references and
152             array and hash slice syntax.
153              
154             See the perlref man page for more on these topics.
155              
156              
157             =head2 How to learn more
158              
159             Dozens more methods, parameters, and examples are described below.
160              
161             See the full source code in CTable.pm.
162              
163             Or, after installing, read the man page using:
164              
165             man Data::CTable
166             perldoc Data::CTable
167              
168             See the eg/ (examples) folder in the Data::CTable distribution and the
169             test.pl script for scripts demonstrating every CTable method.
170              
171             For latest version and other news, check the Data::CTable home page:
172              
173             http://christhorman.com/projects/perl/Data-CTable/
174              
175             Or search CPAN:
176              
177             http://search.cpan.org/search?mode=module&query=Data::CTable
178              
179             =head1 INSTALLATION
180              
181             Using CPAN module:
182              
183             perl -MCPAN -e 'install Data::CTable'
184              
185             Or manually:
186              
187             tar xzvf Data-CTable*gz
188             cd Data-CTable-?.??
189             perl Makefile.PL
190             make
191             make test
192             make install
193              
194             =head1 INCLUDED SUBCLASSES AND UTILITIES
195              
196             In addition to the module itself, there are a number of subclasses and
197             simple utilities included with the Data::CTable distribution.
198              
199             =head2 Subclases
200              
201             The Data::CTable distribution includes these example subclasses. Each
202             is installed in your Perl environment along with the main module, and
203             so may be used by your scripts. Each has its own man/perldoc page
204             containing more detail.
205              
206             B is a subclass that logs all progress
207             messages to a list within the object itself rather than (merely)
208             echoing them to STDERR. Later, you may retrieve and examine the list.
209              
210             B is a virtual subclass that includes class and
211             object methods that make it easy to write a simple interactive
212             command-line program that parses options and outputs a table.
213              
214             B is a very useful subclass of
215             Data::CTable::Script that implements a souped-up Unix-like "ls" (file
216             listing) command -- it first gets an optionally-recursive listing of
217             any number of files and/or directories, builds a list of their full
218             absolute or relative paths, then build a Data::CTable::Listing object
219             that contains all the paths, plus about 25+ other pieces of useful
220             information about each file or directory.
221              
222             The "tls" utility, below, is simply a command-line cover for this
223             class, but you could use this class in your own scripts in order to
224             get detailed file listings.
225              
226             =head2 Utilities
227              
228             Each of these utilities is provided mainly so you can see mature
229             examples of how to use Data::CTable in real-world scripts.
230              
231             But each is also genuinely useful, too, and you may enjoy adding them
232             to your regular bag of tricks or using them as an easily-modifiable
233             basis for scripts of your own.
234              
235             On most systems, these will be installed in an appropriate directory
236             in your path when you install CTable, and hence will be executable
237             just by typing their file name. (On Windows, they'll be wrapped by a
238             .bat file and installed in C:\Perl\bin or equivalent. On *nix,
239             they'll be in /usr/bin/ or equivalent.)
240              
241             B is a command-line utility that wraps Data::CTable::Listing to
242             implement a variant on the classic Unix "ls" command in which an
243             internal CTable object is used to hold and calculate a very large
244             amount of meta-data about each file and directory and then output that
245             data in formatted tables, or as any kind of delimited text file, and
246             with much more flexibility and control over included data, and sort
247             and sub-sort order than with ls.
248              
249             B is a command-line utility that reads each of its arguments as
250             a Data::CTable file and then calls the out() method to display its
251             entire contents. (Warning: out() is slow with very large data sets.)
252              
253             B is a command-line utility that takes a US zip code,
254             grabs the local weather report from a popular weather web site and
255             uses a CTable object to store, process, and clean, and present the
256             table of weather data that results in a simple text format.
257              
258             =cut
259            
260             ## Required dependencies
261              
262 1     1   1022 use IO::File;
  1         12860  
  1         134  
263 1     1   10 use Config; qw(%Config);
  1         1  
  1         49  
264              
265 1     1   4 use Carp qw(croak carp confess cluck);
  1         2  
  1         64  
266 1     1   1141 use Storable qw(store nstore retrieve dclone);
  1         3435  
  1         111  
267 1     1   8 use File::Basename qw(fileparse);
  1         2  
  1         137  
268              
269              
270             ## Optional dependencies
271              
272             my $HaveDumper;
273             my $HaveShowTable;
274              
275             BEGIN
276             {
277 1     1   70 eval "
  1     1   1657  
  1     1   10438  
  1         83  
  1         311  
  0         0  
  0         0  
278             use Data::Dumper qw(Dumper);
279             use Data::ShowTable qw(ShowBoxTable);
280             ";
281            
282 1         3 $HaveDumper = $Data::Dumper::VERSION;
283 1         21 $HaveShowTable = exists($Data::ShowTable::{ShowBoxTable});
284             };
285              
286              
287             ## We optionally export a few general-purpose utility routines.
288              
289 1     1   4 use Exporter; use vars qw(@ISA @EXPORT_OK); @ISA=qw(Exporter);
  1     1   1  
  1         33  
  1         5  
  1         2  
  1         3376  
290             @EXPORT_OK = qw(
291             &ISORoman8859_1ToMacRoman
292             &MacRomanToISORoman8859_1
293            
294             &ISORoman8859_1ToMacRoman_clean
295             &MacRomanToISORoman8859_1_clean
296              
297             &guess_endings
298             &guess_delimiter
299              
300             &path_info
301             &path_is_absolute
302              
303             &min
304             &max
305             );
306              
307             =pod
308              
309             =head1 CREATING TABLE OBJECTS
310              
311             ## Create an object / read file(s) / override params
312              
313             use Data::CTable;
314              
315             $t = Data::CTable->new()
316             $t = Data::CTable->new($File)
317             $t = Data::CTable->new($File1, $File2....)
318             $t = Data::CTable->new($Params)
319             $t = Data::CTable->new($Params, $File1)
320             $t = Data::CTable->new($Params, $File1, $File2....)
321              
322             ## Internal initializer (subclassable): called for you by new()
323              
324             $t->initialize()
325              
326             If the first argument to new() is a hash ref, it is the $Params hash
327             of initial parameters and/or data columns which, if supplied will form
328             the starting point for the object being created. Any non-hash and/or
329             further arguments to new() are treated as file names to be opened.
330              
331             If supplied, data in the $Params hash will be shallowly copied -- the
332             original hash object passed will not be used, but any sub-structures
333             within it will now "belong" to the resulting new object which will
334             feel free to manipulate them or discard them.
335              
336             Then, any parameters not supplied (usually most of them) will be
337             defaulted for you because new() will call the internal method
338             initialize() before the object is finished and returned.
339              
340             See the PARAMETER REFERENCE section below for the parameters you can
341             choose to supply or have defaulted for you.
342              
343             Any file name arguments will be read and appended into a single
344             object. If any of the files fails to be read, then new() will fail
345             (return a false value) and no object will be created.
346              
347             initialize() makes sure there is a legal and consistent value for
348             every internal parameter in the object. Generally, initialize()
349             leaves alone any parameters you supplied to new(), and simply sets
350             default values for any that were not yet supplied.
351              
352             You should never need to call initialize() directly.
353              
354             After calling initialize(), new() then calls the append_files_new()
355             method to process the filename arguments. This method then calls
356             read() on the first filename, and then append_file on all subsequent
357             file names, appending them to the first, in sequence.
358              
359             See append() for an explanation of how the data from multiple files is
360             combined into a single table.
361              
362             =head2 Advanced: Using a template object
363              
364             ## Calling new() with a template object
365              
366             $v = $t->new()
367             $v = $t->new($File)
368             $v = $t->new($File1, $File2....)
369             $v = $t->new($Params)
370             $v = $t->new($Params, $File1)
371             $v = $t->new($Params, $File1, $File2....)
372              
373             You can also call $t->new() to use an existing object, $t, as a
374             template for the new object. $t->new() will create a new object of
375             the same class or subclass as the template object. Furthermore, the
376             template object, if provided, will be used as a starting point for the
377             resulting object -- in fact, it will initially share shallow copies of
378             all data columns, if any, and all internal parameters and data
379             structures, if any.
380              
381             This advanced shared-data technique could be used to create two
382             separate table objects that share and operate on the same underlying
383             data columns in memory but have different custom field lists, custom
384             selections, sort behavior, etc. But don't do this unless you're sure
385             you understand what you're doing, because changing data in one table
386             would change it in the other.
387              
388             =head1 PARAMETER REFERENCE
389              
390             The parameters listed here are recognized by new(), initialize(), and
391             by many functions that use the named-parameter calling convention,
392             such as read(), write(), sort(), and others.
393              
394             Any parameter listed may be specified when using new().
395              
396             Most parameters should not be directly accessed or manipulated once
397             the object has been created, except those that have appropriate
398             accessor methods described throughout in this documentation.
399              
400             Each parameter in the lists below is listed along with its defaulting
401             logic as performed by new() via initialize().
402              
403             =head2 Custom field list and custom selection (record list)
404              
405             =over 4
406              
407             =item _FieldList ||= undef;
408              
409             This is the ordered list (array reference) of columns / fields present
410             in the object. It is set by read() to reflect the names and order of
411             the fields encountered or actually read from in the incoming data
412             file, if any. Initially this list is undefined, and if removed or
413             left undefined, then the de-facto field list will be a list of all
414             columns present in the table object, in alphabetical order (see
415             fieldlist() and fieldlist_all()).
416              
417             Normally, all fields present in the table object would be listed in
418             the field list.
419              
420             However, this parameter may be set in the object (or overridden in
421             named-parameter function calls like read(), write(), etc.) to cause a
422             subset of fields to be read, written or otherwise used. If a subset
423             field list is specified before reading a data file, then ONLY fields
424             listed will be read -- this is a way to read just certain fields from
425             a very large file, but may not always be what you want.
426              
427             Specifying the field list before calling read() is required if the
428             data file has no header row giving names to its fields -- the names
429             you specify in the field list, in order, will be applied to the data
430             file being read. See the _HeaderRow parameter, below.
431              
432             If a subset field list is used after columns are already loaded in
433             memory, the columns not listed in the field list will still be present
434             in the object (and can be listed by calling fieldlist_all()), but they
435             will be omitted from most operations that iterate over fields.
436              
437             =item _Selection ||= undef;
438              
439             This is a list of the record numbers of "selected" records in the
440             table, possibly indicating sorted order.
441              
442             If absent, then all records are considered to be selected, in
443             "natural" order -- i.e. the order they occur in the file.
444              
445             You can create and set your own selection list or get and modify an
446             existing one. Deleting it resets the selection (for example, by
447             calling select_all()).
448              
449             Calling sort() will create a _Selection if none existed. Otherwise it
450             operates by modifying the existing _Selection, which may be a subset
451             of all record numbers.
452              
453             =back
454              
455             =head2 Cache behavior controls
456              
457             See sections related to Cacheing, below.
458              
459             =over 4
460              
461             =item _CacheOnRead = 1 unless exists
462              
463             Boolean: whether data files read by the read() method should be cached
464             after reading. Once cached, the data will be read from the cache
465             instead of the original file the NEXT TIME READ() IS CALLED, but only
466             if: 1) the cache file is found, and 2) its date is later than the
467             original. Otherwise, the cache file is ignored or re-written.
468             Cacheing can be up to 10x faster than parsing the file, so it's almost
469             always worth doing in any situation where you'll be reading a data
470             file more often than writing it.
471              
472             This parameter defaults to true.
473              
474             =item _CacheOnWrite = 0 unless exists
475              
476             Boolean: whether tables written by the write() method should be cached
477             after writing. This defaults to false on the assumption that the
478             program won't need to re-read a file it just wrote. However, this
479             behavior would be useful if a later step in your program or another
480             program will be reading the file that was written and would benefit
481             from having the cacheing already done. Cacheing a file after writing
482             is quite fast since the data is already in memory and of course it
483             speeds up subsequent read() operations by up to 10x.
484              
485             =item _CacheExtension = ".cache" unless exists
486              
487             This is the file name extension that is added to a file's name to
488             determine the name of its corresponding cache file. (First, any
489             existing extension, if any, is removed.) If this extension is empty,
490             then the cache file will be named the same as the original assuming
491             the cache file is being stored in a different directory. (See next
492             setting.)
493              
494             =item _CacheSubDir = "cache" unless exists
495              
496             This is the absolute or relative path to the subdirectory that should
497             be used to store the cache files. The default value is the relative
498             path to a directory called "cache". Relative paths will be appended
499             to the directory path containing the original file being read.
500              
501             Absolute cache paths (such as /tmp or c:\temp\) can also be used.
502              
503             Override _CacheExtension and _CacheSubdir in a subclass, in each
504             object, or in each call to read() or write() in order to have the
505             cache files stored elsewhere. But remember: unless you use the same
506             cache settings next time you read the same file, the cache files will
507             be orphaned.
508              
509             =back
510              
511             =head2 Progress routine / setting
512              
513             =over 4
514              
515             =item _Progress = undef unless exists
516              
517             The _Progress setting controls the routing of diagnostic messages.
518             Four possible settings are recognized:
519              
520             undef (default) The class's progress settings are used.
521             subroutine reference Your own custom progress routine.
522             true Built-in progress_default() method used.
523             0/false No progress messages for this object.
524              
525             See the PROGRESS section for a description of the interface of custom
526             progress routines and for details on how the builtin one works.
527              
528             =back
529              
530             =head2 File format settings
531              
532             =over 4
533              
534             =item _LineEnding ||= undef;
535              
536             _LineEnding indicates the line ending string or setting to be used to
537             read a file, the setting that actually I used to read a file,
538             and/or the line ending that will be used to write a file.
539              
540             Set this parameter to force a particular encoding to be used.
541              
542             Otherwise, leave it undef. The program will Do What You Mean.
543              
544             If _LineEnding is undef when read() is called, read() will try to
545             guess the line ending type by inspecting the first file it reads.
546             Then it will set this setting for you. It can detect DOS, Unix, and
547             Mac line endings.
548              
549             If _LineEnding is undef when write() is called, write() will use
550             C<"\n">, which yields different strings depending on the current
551             runtime platform: \x0A on Unix; \x0D in MacPerl, \x0D\x0A on DOS.
552              
553             Otherwise, write() uses the value defined in _LineEnding, which would
554             match the value filled in by read() if this object's data originally
555             had been read from a file. So if you read a file and then later write
556             it out, the line endings in the written file will match the format of
557             original unless you override _LineEnding specifically.
558              
559             Since Data::CTable supports reading and writing all common endings,
560             base your decision on line ending format during write() on the needs
561             of other programs you might be using.
562              
563             For example: FileMaker Pro and Excel crash / hang if Unix line endings
564             are used, so be sure to use the ending format that matches the needs
565             of the other programs you plan to use.
566              
567             As a convenience, you may specify and retrieve the _LineEnding setting
568             using the mnemonic symbols "mac", "dos" and "unix." These special
569             values are converted to the string values shown in this chart:
570              
571             symbol string value chars decimal octal control
572             -------------------------------------------------------------
573             dos "\x0D\x0A" CR/LF 13,10 "\015\012" ^M^J
574             mac "\x0D" CR 13 "\015" ^M
575             unix "\x0A" LF 10 "\012" ^J
576              
577             See the section LINE ENDINGS, below, for accessor methods and
578             conversion utilities that help you get/set this parameter in either
579             symbolic format or string format as you prefer.
580              
581             =item _FDelimiter ||= undef;
582              
583             _FDelimiter is the field delimiter between field names in the header
584             row (if any) and also between fields in the body of the file. If
585             undef, read() will try to guess whether it is tab C<"\t"> or comma
586             <",">, and set this parameter accordingly. If there is only one field
587             in the file, then comma is assumed by read() and will be used by
588             write().
589              
590             To guess the delimiter, the program looks for the first comma or tab
591             character in the header row (if present) or in the first record.
592             Whichever character is found first is assumed to be the delimiter.
593              
594             If you don't want the program to guess, or you have a data file format
595             that uses a custom delimiter, specify the delimiter explicitly in the
596             object or when calling read() or make a subclass that initializes this
597             value differently. On write(), this will default to comma if it is
598             empty or undef.
599              
600             =item _QuoteFields = undef unless exists
601              
602             _QuoteFields controls how field values are quoted by write() when
603             writing the table to a delimited text file.
604              
605             An undef value (the default) means "auto" -- each field is checked
606             individually and if it contains either the _FDelimiter character or a
607             double-quote character, the field value will be surrounded by
608             double-quotes as it is written to the file. This method is slower to
609             write but faster to read, and may make the output easier for humans to
610             read.
611              
612             A true value means always put double-quotes around every field value.
613             This mode is faster to write but slower to read.
614              
615             A zero value means never to use double-quotes around field values and
616             not to check for the need to use them. This method is the fastest to
617             read and write. You may use it when you are certain that your data
618             can't contain any special characters. However, if you're wrong, this
619             mode will produce a corrupted file in the event that one of the fields
620             does contain the active delimiter (such as comma or tab) or a quote.
621              
622             =item _HeaderRow = 1 unless exists
623              
624             _HeaderRow is a boolean that says whether to expect a header row in
625             data files. The default is true: a header row is required. If false,
626             _FieldList MUST be present before calling read() or an error will be
627             generated. In this latter case, _FieldList will be assumed to give
628             the correct names of the fields in the file, in order, before the file
629             is read. In other words, the object expects that either a) it can get
630             the field names from the file's header row or b) you will supply them
631             before read() opens the file.
632              
633             =back
634              
635             =head2 Encoding of return characters within fields
636              
637             =over 4
638              
639             =item _ReturnMap = 1 unless exists
640              
641             _ReturnMap says that returns embedded in fields should be decoded on
642             read() and encoded again on write(). The industry-standard encoding
643             for embedded returns is ^K (ascii 11 -- but see next setting to change
644             it). This defaults to true but can be turned off if you want data
645             untouched by read(). This setting has no effect on data files where
646             no fields contain embedded returns. However, it is vital to leave
647             this option ON when writing any data file whose fields could contain
648             embedded returns -- if you have such data and call write() with
649             _ReturnMap turned off, the resulting file will be an invalid Merge/CSV
650             file and might not be re-readable.
651              
652             When these fields are decoded on read(), encoded returns are converted
653             to C<"\n"> in memory, whatever its interpretation may be on the current
654             platform (\x0A on Unix or DOS; \x0D on MacPerl).
655              
656             IMPORTANT NOTE: When these fields are encoded by write(), any
657             occurrence of the current _LineEnding being used to write the file is
658             searched and encoded FIRST, and THEN, any occurrence of "\n" is also
659             searched and encoded. For example, if using mac line endings (^M) to
660             write a file on a Unix machine, any ^M characters in fields will be
661             encoded, and then any "\n" (^J) characters will ALSO be encoded. This
662             may not be what you want, so be sure you know how your data is encoded
663             in cases where your field values might contain any ^J and/or ^M
664             characters.
665              
666             IMPORTANT NOTE: If you turn _ReturnMap off, fields with returns in
667             them will still be double-quoted correctly. Some parsers of tab- or
668             comma-delimited files are able to support reading such files.
669             HOWEVER, the parser in this module's read() method DOES NOT currently
670             support reading files in which a single field value appears to span
671             multiple lines in the file. If you have a need to read such a file,
672             you may need to write your own parser as a subclass of this module.
673              
674             =item _ReturnEncoding ||= "\x0B";
675              
676             This is the default encoding to assume when embedding return
677             characters within fields. The industry standard is "\x0B" (ascii 11 /
678             octal \013 / ^K) so you should probably not ever change this setting.
679              
680             When fields are encoded on write(), C<"\n"> is converted to this
681             value. Note that different platforms use different ascii values for
682             C<"\n">, which is another good reason to leave the ReturnEncoding
683             feature enabled when calling write().
684              
685             To summarize: this module likes to assume, and you should too, that
686             returns in data files on disk are encoded as "\x0B", but once loaded
687             into memory, they are encoded as the current platform's value of
688             C<"\n">.
689              
690             =item _MacRomanMap = undef unless exists
691              
692             Data::CTable assumes by default that you want field data in memory to
693             be in the ISO 8859-1 character set (the standard for Latin 1 Roman
694             characters on Unix and Windows in the English and Western European
695             languages -- and also the default encoding for HTML Web pages).
696              
697             _MacRomanMap controls the module's optional mapping of Roman
698             characters from Mac format on disk to ISO format in memory when
699             reading and writing data files. These settings are recognized:
700              
701             undef ## Auto: Read/write Mac chars if using Mac line endings
702             1 ## On: Assume Mac char set in all fields
703             0 ## Off: Don't do any character mapping at all
704              
705             The default setting is undef, which enables "Auto" mode: files found
706             to contain Mac line endings will be assumed to contain Mac upper-ASCII
707             characters and will be mapped to ISO on read(); and files to be
708             written with Mac line endings will mapped back from ISO to Mac format
709             on write().
710              
711             If your data uses any non-Latin-1 character sets, or binary data, or
712             you really want Mac upper-ASCII characters in memory, or you just
713             don't want this module messing with your encodings, set this option to
714             0 (Off) or make a subclass that always sets it to 0.
715              
716             See also the clean() methods that can help you translate just the
717             columns you want after reading a file or before writing it, which may
718             be faster for you if only a few fields might contain high-ASCII
719             characters.
720              
721             =item _FileName ||= undef;
722              
723             This is the name of the file that should be read from or WAS read
724             from. (read() will set _FileName to the value it used to read the
725             file, even if _FileName was only supplied as a named parameter.)
726              
727             This name will also be used, unless overridden, to re-write the file
728             again, but with an optional extension added. (See next setting.)
729              
730             =item _WriteExtension = ".out" unless exists
731              
732             The _WriteExtension is provided so that CTable won't overwrite your
733             input data file unless you tell it to.
734              
735             _WriteExtension will be added to the object's _FileName setting to
736             create a new, related file name, before writing.... UNLESS _FileName
737             is supplied as an direct or named parameter when calling write().
738              
739             In the latter case, write() uses the file name you supply and adds no
740             extension, even if this would mean overwriting the original data file.
741              
742             To add _WriteExtension, write() places it prior to any existing final
743             extension in the _FileName:
744              
745             _FileName default file name used by write()
746             --------------------------------------------------------------
747             People.merge.txt People.merge.out.txt
748             People People.out
749              
750             If you want to always overwrite the original file without having to
751             supply _FileName each time, simply set _WriteExtension to undef in a
752             subclass or in each instance.
753              
754             If _CacheOnWrite is true, then the _WriteExtension logic is applied
755             first to arrive at the actual name of the file to be written, and then
756             the _CacheExtension logic is applied to that name to arrive at the
757             name of the cache file to be written.
758              
759             =back
760              
761             =head2 Sorting-related parameters
762              
763             =over 4
764              
765             =item _SortOrder ||= undef;
766              
767             _SortOrder is the list of fields which should be used as primary,
768             secondary, etc. sort keys when sort() is called. Like other
769             parameters, it may be initialized by a subclass, stored in the object,
770             or provided as a named parameter on each call to sort().
771              
772             If _SortOrder is empty or undefined, then sort() sorts the records by
773             record number (i.e. they are returned to their "natural" order).
774              
775             =item _SortSpecs ||= {};
776              
777             _SortSpecs is a hash of specifications for the SortType and
778             SortDirection of fields on which sorting may be done. For any field
779             missing a sort spec or the SortType or SortDirection components of its
780             sort spec, the _DefaultSortType and _DefaultSortDirection settings
781             will be used. So, for example, if all fields are of type String and
782             you want them to sort Ascending, then you don't need to worry about
783             _SortSpecs. You only need to provide specs for fields that don't take
784             the default settings.
785              
786             _SortSpecs might look like this:
787              
788             {Age => {SortType => 'Integer'},
789             NameKey => {SortType => 'Text', SortDirection => -1}}
790              
791             =item _SRoutines ||= {};
792              
793             _SRoutines is a hash mapping any new SortTypes invented by you to your
794             custom subroutines for sorting that type of data. (See the section on
795             sort routines, below, for a full discussion.)
796              
797             =back
798              
799             =head2 Sorting defaults
800              
801             =over 4
802              
803             =item _DefaultSortType ||= 'String';
804              
805             If you sort using a field with no sort spec supplied, or whose sort
806             spec omits the SortType, it will get its SortType from this parameter.
807              
808             See the sections below on SORT TYPES and SORT ROUTINES.
809              
810             =item _DefaultSortDirection ||= 1;
811              
812             If you sort using a field with no sort spec supplied, or whose sort
813             spec omits the SortDirection, it will get its SortDirection from this
814             parameter.
815              
816             Legal sort directions are: 1 (Ascending) or -1 (Descending).
817              
818             See the section below on DEFAULT SORT DIRECTION.
819              
820             =back
821              
822             =head2 Miscellaneous parameters
823              
824             =over 4
825              
826             =item _ErrorMsg ||= "";
827              
828             This parameter is set by read() or write() methods that encounter an
829             error (usually a parameter error or file-system error) that prevents
830             them from completing. If those methods or any methods that call them
831             return a false value indicating failure, then _ErrorMsg will contain a
832             string explaining the problem. The message will also have been passed
833             to the progress() method for possible console feedback.
834              
835             =item _Subset
836              
837             This parameter is set to 1 (true) by read() if the last call to read()
838             brought in a subset of the fields available in the file; 0 otherwise.
839              
840             The object uses this field internally so it knows to abandon any cache
841             files that might not contain all requested fields upon read().
842              
843             =back
844              
845             =head1 SUBCLASSING
846              
847             Most subclasses will override initialize() to set default values for
848             the parameters of the parent class and then they may provide default
849             values for other subclass-specific parameters. Then, the subclass's
850             initialize() should call SUPER::initialize() to let the parent
851             class(es) take care of the remaining ones.
852              
853             Every initialize() method should always allow for parameters to have
854             already been provided by the $Params hash or template object. It
855             should not overwrite any valid values that already exist.
856              
857             The following sample subclass changes the default setting of the
858             _Progress parameter from undef to 1 and then overrides the
859             progress_default() method to log all progress messages into a new
860             "_ProgrLog" (progress log) parameter stored in the object.
861              
862             BEGIN
863             { ## Data::CTable::ProgressLogger: store messages in the object
864              
865             package Data::CTable::ProgressLogger;
866             use vars qw(@ISA); @ISA=qw(Data::CTable);
867              
868             sub initialize ## Add a new param; change one default
869             {
870             my $this = shift;
871             $this->{_Progress} = 1 unless exists($this->{_Progress});
872             $this->{_ProgrLog} ||= [];
873             $this->SUPER::initialize();
874             }
875              
876             sub progress_default ## Log message to object's ProgMsgs list
877             {
878             my $this = shift;
879             my ($msg) = @_;
880             chomp $msg;
881             push @{$this->{_ProgrLog}}, localtime() . " $msg";
882              
883             return(1);
884             }
885              
886             sub show_log ## Use Dumper to spit out the log list
887             {
888             my $this = shift;
889             $this->dump($this->{_ProgrLog});
890             }
891             }
892              
893             ## Later...
894              
895             my $Table = Data::CTable::ProgressLogger->new("mydata.txt");
896             # ... do stuff...
897             $Table->write();
898             $Table->show_log();
899              
900             =cut
901            
902             {}; ## Get emacs to indent correctly.
903              
904             sub new
905             {
906             ## First arg to new is always either class name or a template
907             ## object. This allows $obj->new() or CLASS->new().
908              
909             ## Second argument (if and only if it is a hash ref or an object
910             ## whose underlying representation is a hash ref) is an optional
911             ## anonymous hash of parameters which if supplied, will override
912             ## any parameters already found in the template object, if any.
913              
914             ## See the initialize method, below, for a list of parameters that
915             ## can be supplied (and will be defaulted for you if not
916             ## supplied).
917              
918             ## Note that the template object and the params hash will be
919             ## SHALLOWLY copied -- the original hash objects passed will not
920             ## be used, but any sub-structures within them will now "belong"
921             ## to the resulting new object which will feel free to manipulate
922             ## them, possibly invalidating the integrity of the original
923             ## template object.
924              
925 109     109 0 3853 my $ClassOrObj = shift;
926 109 100       531 my ($Params) = {%{shift()}} if UNIVERSAL::isa($_[0], 'HASH');
  82         855  
927              
928             ## Shallow-copy all params from template object and/or optional
929             ## $Params hash into new hash. DON'T re-use caller's obj or hash.
930              
931 109 50       673 my $this =
932 109 100       862 {%{(UNIVERSAL::isa($ClassOrObj, 'HASH') ? $ClassOrObj : {})},
933 109         212 %{(UNIVERSAL::isa($Params, 'HASH') ? $Params : {})}};
934            
935             ## Bless the new object into the class
936            
937 109   33     545 my $class = ref($ClassOrObj) || $ClassOrObj;
938 109         273 bless $this, $class;
939            
940 109         133 my $Success;
941              
942             ## Run the subclassable initialize() method to create default
943             ## settings for any private parameters.
944              
945 109 50       268 goto done unless $this->initialize();
946              
947             ## Finally, process any (other) arguments to new(), if any.
948            
949 109         211 my $RemainingArgs = [@_];
950            
951 109 50       305 goto done unless $this->process_new_args($RemainingArgs, $Params);
952            
953 109         125 $Success = 1;
954 109 50       556 done:
955             return ($Success ? $this : undef);
956             }
957              
958             ### process_new_args
959              
960             ### Any optional remaining (non-HASH ref) arguments to new() are
961             ### treated as file names of files to open and append to the in-memory
962             ### table, creating new columns as necessary. We call the
963             ### subclassable append_files_new() method to process these.
964              
965             sub process_new_args
966             {
967 109     109 0 155 my $this = shift;
968 109         148 my ($RemainingArgs, $Params) = @_;
969              
970 109         169 my $Success;
971              
972 109 50       301 $Success = $this->append_files_new($RemainingArgs, $Params) or goto done;
973            
974 109         132 $Success = 1;
975 109         263 done:
976             return ($Success);
977             }
978              
979             ### initialize
980              
981             ### Assumptions made by initialize() (and all other methods, too):
982              
983             ### The blessed object is a hash ref.
984              
985             ### All hash keys beginning with _ are reserved for non-data columns.
986              
987             ### Hash keys beginning with a single _ are reserved for future
988             ### versions of this parent class implementation. Subclasses might
989             ### want to use double-underscore for additional slots.
990              
991             ### All other hash keys are field names; their values are data
992             ### columns (array references).
993              
994             ### initialize() sets / validates initial settings for all parameters
995             ### recognized by this parent class. It exercises caution to not
996             ### override any legal values previously set by the
997             ### subclass::initialize() or by new().
998              
999             sub initialize
1000             {
1001 109 50   109 0 306 my $this = shift or goto done;
1002              
1003 109         121 my $Success;
1004              
1005             ## Reading / writing
1006              
1007 109   100     361 $this->{_FileName} ||= undef; ## Path of file that was read
1008              
1009 109 100       329 $this->{_WriteExtension} = ".out" unless exists($this->{_WriteExtension});
1010              
1011             ## Cache settings
1012              
1013 109 100       297 $this->{_CacheOnRead} = 1 unless exists($this->{_CacheOnRead});
1014 109 100       298 $this->{_CacheOnWrite} = 0 unless exists($this->{_CacheOnWrite});
1015 109 100       288 $this->{_CacheExtension} = ".cache" unless exists($this->{_CacheExtension});
1016 109 100       323 $this->{_CacheSubDir} = "cache" unless exists($this->{_CacheSubDir});
1017              
1018             ## File format settings
1019              
1020 109   100     374 $this->{_LineEnding} ||= undef;
1021 109   100     361 $this->{_FDelimiter} ||= undef;
1022 109 100       284 $this->{_QuoteFields} = undef unless exists ($this->{_QuoteFields});
1023 109 100       281 $this->{_HeaderRow} = 1 unless exists ($this->{_HeaderRow});
1024              
1025             ## Return encodings
1026              
1027 109 100       261 $this->{_ReturnMap} = 1 unless exists ($this->{_ReturnMap});
1028 109   100     356 $this->{_ReturnEncoding} ||= "\x0B"; ## Char to use for return chars
1029 109 100       288 $this->{_MacRomanMap} = undef unless exists ($this->{_MacRomanMap});
1030              
1031             ## Sorting defaults
1032              
1033 109   100     334 $this->{_DefaultSortType} ||= 'String';
1034 109   100     330 $this->{_DefaultSortDirection} ||= 1; ## Ascending (-1 = desc)
1035              
1036             ## Progress routine / setting
1037              
1038 109 100       373 $this->{_Progress} = undef unless exists ($this->{_Progress});
1039              
1040             ## Internal meta-structures
1041              
1042 109   100     346 $this->{_FieldList} ||= undef; ## List of fields; undef means all fields, alpha order
1043 109   100     487 $this->{_Selection} ||= undef; ## List of rec #s; undef means all records, natural order
1044 109   50     422 $this->{_SortOrder} ||= undef; ## List of fields; undef/empty means sort by record number
1045              
1046 109   100     359 $this->{_SortSpecs} ||= {}; ## Hash: FieldName => Sortspec
1047 109   100     348 $this->{_SRoutines} ||= {}; ## Hash: SortType => custom sort routine for type
1048              
1049             ## Miscellaneous
1050              
1051 109   50     371 $this->{_ErrorMsg} ||= ""; ## Explains last read/write failure
1052 109   50     394 $this->{_Subset} ||= 0; ## Flag indicating subset of available fields were read
1053              
1054 109         141 $Success = 1;
1055 109         291 done:
1056             return($Success);
1057             }
1058              
1059             =pod
1060              
1061             =head1 FIELD LIST
1062              
1063             ## Getting / setting the object's _FieldList
1064              
1065             $t->fieldlist() ## Get _FieldList or fieldlist_all()
1066             $t->fieldlist_get()
1067             $t->fieldlist_hash() ## Get fieldlist() as keys in a hash
1068            
1069             $t->fieldlist_all() ## Get all fields (ignore _FieldList)
1070            
1071             $t->fieldlist($MyList) ## Set field list (_FieldList param)
1072             $t->fieldlist_set($MyList)
1073              
1074             $t->fieldlist(0) ## Remove field list (use default)
1075             $t->fieldlist_set()
1076              
1077             $t->fieldlist_force($MyList)## Set list; remove non-matching cols
1078              
1079             $t->fieldlist_truncate() ## Just remove nonmatching cols
1080              
1081             $t->fieldlist_default() ## Default field list (alpha-sorted)
1082              
1083             $t->fieldlist_add($MyName) ## Append new name to custom list.
1084             $t->fieldlist_delete($MyName) ## Delete name from custom list.
1085              
1086             A CTable object can optionally have a custom field list. The custom
1087             field list can store both the ORDER of the fields (which otherwise
1088             would be unordered since they are stored as keys in a hash), and also
1089             can be a subset of the fields actually in the object, allowing you to
1090             temporarily ignore certain effectively-hidden fields for the benefit
1091             of certain operations. The custom field list can be changed or
1092             removed at any time.
1093              
1094             The custom field list is stored in the private _FieldList parameter.
1095              
1096             fieldlist() always returns a list (reference). The list is either the
1097             same list as _FieldList, if present, or it is the result of calling
1098             fieldlist_default(). In CTable, fieldlist_default() in turn calls
1099             fieldlist_all() -- hence fieldlist() would yield an auto-generated
1100             list of all fields in alphabetical order.
1101              
1102             fieldlist_all() can be called directly to get a list of all fields
1103             present regardless of the presence of a _FieldList parameter. The
1104             list is an alphabetical case-insensitively sorted list of all hash
1105             keys whose names do not begin with an underscore.
1106              
1107             You could override this method if you want a different behavior. Or,
1108             you could create your own custom field list by calling fieldlist_all()
1109             and removing fields or ordering them differently.
1110              
1111             To set a custom field list (in _FieldList), call fieldlist() or
1112             fieldlist_set() with a list (reference). The list must be a list of
1113             strings (field names) that do not begin with underscore. The object
1114             owns the list you supply.
1115              
1116             To remove a custom field list (and let the default be used), call
1117             fieldlist(0) or fieldlist_set() with no arguments (these will return
1118             the fieldlist that was deleted, if any).
1119              
1120             fieldlist_freeze() "freezes" the fieldlist in its current state. This
1121             is equivalent to the following:
1122              
1123             $t->fieldlist_set($t->fieldlist());
1124              
1125             ... which would force the fieldlist to $t->fieldlist_all() if and only
1126             if there is not already a custom _FieldList present.
1127              
1128             IMPORTANT NOTE ABOUT PARTIAL FIELD LISTS: When setting a field list,
1129             the object ensures that all fields (columns) mentioned in the list are
1130             present in the object -- it creates empty columns of the correct
1131             length as necessary. However, it does NOT delete any fields not
1132             mentioned in the field list. This allows you to manipulate the field
1133             list in order to have certain fields be temporarily ignored by all
1134             other methods, then alter, restore, or remove it (allow it to revert
1135             to default) and they will be effectively unhidden again. Some methods
1136             (such as cols(), write(), etc.) also allow you to specify a custom
1137             field list that will override any other list just during the execution
1138             of that method call but will not modify the object itself.
1139              
1140             Call fieldlist_force() to set the list AND have any non-listed fields
1141             also deleted at the same time (by calling fieldlist_truncate()
1142             internally). You can also just delete individual columns one-by-one,
1143             of course, using the column-manipulation methods and the custom
1144             fieldlist, if any, will be appropriately updated for you.
1145              
1146             fieldlist_truncate() deletes any fields found in the table but not
1147             currently present in _FieldList. A hash of the deleted columns is
1148             returned to the caller. If there is no _FieldList, then this method
1149             does nothing.
1150              
1151             fieldlist_default() just calls fieldlist_all() in this implementation,
1152             but could be changed in subclasses.
1153              
1154             fieldlist_add() is the internal method that adds a new field name to
1155             the custom field list (if present) and if the field name was not
1156             already on the list. It is called by other methods any time a new
1157             column is added to the table. Don't call it directly unless you know
1158             what you're doing because the corresponding column won't be created.
1159             (Instead, use col().) The field name is appended to the end of the
1160             existing custom field list. If there is no custom field list, nothing
1161             is done.
1162              
1163             fieldlist_delete() is the internal method that deletes a field name
1164             from the custom field list (if present). It is called by other
1165             methods when columns are deleted, but it does not actually delete the
1166             columns themselves, so use with caution: deleting a field from the
1167             custom field list effectively hides the field. This method has no
1168             effect, however, if there is no custom field list present. So don't
1169             call this method directly unless you know what you're doing.
1170              
1171             =cut
1172              
1173             sub fieldlist_all
1174             {
1175 223     223 0 298 my $this = shift;
1176 223         1020 my $FieldList = [sort {lc($a) cmp lc($b)} grep {!/^_/} keys %$this];
  883         1558  
  5726         12123  
1177            
1178 223         899 return($FieldList);
1179             }
1180              
1181             sub fieldlist_default ## Same as fieldlist_all() in this class.
1182             {
1183 40     40 0 51 my $this = shift;
1184 40         73 my $FieldList = $this->fieldlist_all();
1185              
1186 40         136 return($FieldList);
1187             }
1188              
1189             sub fieldlist
1190             {
1191 805     805 0 2485 my $this = shift;
1192 805         927 my ($FieldList) = @_;
1193              
1194             ## Set if specified.
1195 805 100       1560 $this->fieldlist_set($FieldList) if defined($FieldList);
1196              
1197             ## Get and return.
1198 805         1336 $FieldList = $this->fieldlist_get();
1199              
1200 805         1857 return($FieldList);
1201             }
1202              
1203             sub fieldlist_get
1204             {
1205 805     805 0 1291 my $this = shift;
1206 805   66     1821 my $FieldList = $this->{_FieldList} || $this->fieldlist_default();
1207              
1208 805         1250 return($FieldList);
1209             }
1210              
1211             sub fieldlist_hash ## ([$FieldList])
1212             {
1213 96     96 0 124 my $this = shift;
1214 96         123 my ($FieldList) = @_;
1215 96   66     280 $FieldList ||= $this->fieldlist();
1216 96         170 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  96         388  
1217            
1218 96         189 return($FieldHash);
1219             }
1220              
1221             sub fieldlist_set
1222             {
1223 36     36 0 62 my $this = shift;
1224 36         50 my ($FieldList) = @_;
1225              
1226 36         118 return($this->fieldlist_set_internal($FieldList, 0));
1227             }
1228              
1229             sub fieldlist_freeze
1230             {
1231 0     0 0 0 my $this = shift;
1232 0         0 return($this->fieldlist_set($this->fieldlist()));
1233             }
1234              
1235             sub fieldlist_force
1236             {
1237 3     3 0 7 my $this = shift;
1238 3         5 my ($FieldList) = @_;
1239              
1240 3         9 return($this->fieldlist_set_internal($FieldList, 1));
1241             }
1242              
1243             sub fieldlist_set_internal
1244             {
1245 39     39 0 49 my $this = shift;
1246 39         50 my ($FieldList, $Force) = @_;
1247              
1248 39 100       106 if (ref($FieldList) eq 'ARRAY')
1249             {
1250              
1251             ## Whether forcing or not, ensure all fields mentioned in the
1252             ## list actually exist and are the correct length.
1253 33         88 $this->fieldlist_check($FieldList);
1254              
1255             ## Set the custom list
1256 33         68 $this->{_FieldList} = $FieldList;
1257            
1258             ## In "force" mode, remove any non-listed columns.
1259 33 100       92 $this->fieldlist_truncate() if ($Force);
1260             }
1261             else
1262             {
1263             ## Remove the custom field list.
1264 6         17 $FieldList = delete $this->{_FieldList};
1265             }
1266              
1267 39         73 return($FieldList); ## Return the one that was set or deleted.
1268             }
1269              
1270             sub fieldlist_check
1271             {
1272 33     33 0 39 my $this = shift;
1273 33         65 my ($FieldList) = @_;
1274            
1275 33   33     81 $FieldList ||= $this->fieldlist();
1276              
1277             ## Visit each field name in the current list. Make sure it is
1278             ## present.
1279              
1280 33         60 foreach my $FieldName (@$FieldList)
1281             {
1282             ## The col method will the column exist if not present.
1283 100         214 $this->col($FieldName);
1284             }
1285             }
1286              
1287             sub fieldlist_truncate
1288             {
1289 3     3 0 5 my $this = shift;
1290              
1291 3         8 my $FieldList = $this->fieldlist();
1292 3         9 my $AllFields = $this->fieldlist_all();
1293 3         6 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  3         11  
1294              
1295 3         6 my $DeletedCols = {};
1296            
1297 3         7 foreach my $FieldName (@$AllFields)
1298             {
1299 8 100       23 if (!exists($FieldHash->{$FieldName}))
1300             {
1301 3         9 $DeletedCols->{$FieldName} = delete $this->{$FieldName};
1302             }
1303             }
1304            
1305 3         12 return($DeletedCols);
1306             }
1307              
1308             sub fieldlist_add
1309             {
1310 43     43 0 53 my $this = shift;
1311 43         56 my ($FieldName) = @_;
1312            
1313 43 100       124 if (ref($this->{_FieldList}) eq 'ARRAY')
1314             {
1315 34         47 my $FieldList = $this->{_FieldList};
1316 34         52 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  34         112  
1317            
1318 34 50       85 if (!exists($FieldHash->{$FieldName}))
1319             {
1320 34         117 push @$FieldList, $FieldName;
1321             }
1322             }
1323             }
1324              
1325             sub fieldlist_delete
1326             {
1327 3     3 0 10 my $this = shift;
1328 3         8 my ($FieldName) = @_;
1329            
1330 3 50       14 if (ref($this->{_FieldList}) eq 'ARRAY')
1331             {
1332 3         6 $this->{_FieldList} = [grep {$_ ne $FieldName} @{$this->{_FieldList}}];
  11         33  
  3         11  
1333             }
1334             }
1335              
1336             =pod
1337              
1338             =head1 DATA COLUMNS (FIELD DATA)
1339              
1340             ## Getting or setting data in entire columns
1341              
1342             $t->{$ColName} ## Get a column you know exists
1343             $t->col($ColName) ## Get a column or make empty one.
1344             $t->col_get($ColName)
1345              
1346             $t->col($ColName, $ListRef) ## Set all of a column all at once.
1347             $t->col_set($ColName, $ListRef)
1348             $t->col_force($ColName, $ListRef) ## Add but don't check size or
1349             ## add to custom field list
1350              
1351             $t->col_set($ColName, undef) ## Delete a column completely
1352             $t->col_delete($ColName)
1353              
1354             $t->col_empty() ## An empty col presized for table
1355             $t->col_empty(22) ## An empty col of another length
1356             $t->col_empty($Col) ## An empty col sized to match another
1357              
1358             $t->col_default() ## Default if req. column not found.
1359              
1360             $t->col_exists($Field) ## Check existence of column
1361             $t->col_active($Field) ## Restrict check to fieldlist()
1362              
1363             $t->cols($ColList) ## Get list of multiple named columns
1364             $t->cols_hash($ColList) ## Get hash " " "
1365              
1366             $t->col_rename($Old => $New) ## Change name of columns
1367             $t->col_rename($Old1 => $New1, $Old2 => $New2) ## Change several
1368              
1369             A "column" is a field in the table and all its data. The column's
1370             field name is a key in the object itself, and may also optionally be
1371             listed in a custom field list if present. The column's data is the
1372             key's value in the hash and is an array ref of values presumed to be
1373             of the same data type (e. g. string, integer, etc.)
1374              
1375             Sometimes the terms "column" and "field" are used interchangeably in
1376             this documentation.
1377              
1378             If you already know that a column exists (because you got it from the
1379             fieldlist() method and you've not previously manipulated _FieldList
1380             directly but instead carefully used the method calls available for
1381             that), then you can safely get the column by just looking it up in the
1382             object itself.
1383              
1384             The col() method does the same thing, but forces the column to spring
1385             into existence if it did not already (which can also have the
1386             potentially unwanted side-effect of hiding coding errors in which you
1387             retreive mis-named columns: so beware). Columns brought into
1388             existence this way will automatically be pre-sized (i.e. they will
1389             will be created and set to whatever col_default() returns).
1390              
1391             The col() or col_set() methods can also be used to set a column. When
1392             the column is set, the list you pass is automatically sized
1393             (lengthened or truncated) to match the current length of the table.
1394             If this is not what you want, then call col_force() which will not
1395             check whether the new column matches the size of the others.
1396              
1397             No matter how you set it, the object now "owns" the list you gave it.
1398              
1399             As a convenience, col(), col_set() and col_force() return the column
1400             that was set. They silently discard any previous column.
1401              
1402             All three methods of column setting will append the column to the
1403             custom field list if one is present and the column name is not already
1404             listed there (by calling fieldlist_add()). They will also call the
1405             extend() method to ensure all columns have the same length (either
1406             others will be extended to match the length of the new one, or the new
1407             one will be extended to match the length of the others).
1408              
1409             col_delete() deletes a column.
1410              
1411             col_empty() returns an anonymous list reference that is pre-sized to
1412             the length of the table (by default). You could use it to get an
1413             empty column that you intend to fill up and then later insert into the
1414             table or use to hold the results of an operation on other columns. If
1415             you want a different length, specify it as a number or as an array ref
1416             whose length should be matched.
1417              
1418             col_default() is the internal method that implements the "springing
1419             into existence" of missing columns. Currently it just calls
1420             col_empty(). Other subclasses might want to have it return undef or a
1421             string like "NO_SUCH_COLUMN" in order to help track programming errors
1422             where nonexistent columns are requested.
1423              
1424             cols($FieldList) returns an ordered list of the requested column
1425             names. If no list is given, then fieldlist() is used.
1426              
1427             cols_hash($FieldList) does the same as cols(), but the result is a
1428             hash whose keys are the field names and whose values are the columns
1429             -- much like the original object itself, but not blessed into the
1430             class. The resulting hash, however, could be used as the prototype
1431             for a new Data::CTable object (by calling the new() method). However,
1432             be warned that both objects will think they "own" the resulting shared
1433             so be careful what you do..... which brings us to this:
1434              
1435             IMPORTANT NOTE ABOUT GETTING COLUMNS: The columns you retrieve from a
1436             table are still "owned" by the table object as long as it lives. If
1437             you modify them, you are modifying the table's data. If you change
1438             their length, then you may be invalidating the table's own
1439             expectations that all its columns have the same length. So beware.
1440              
1441             Just make yourself a copy of the data if that isn't what you want.
1442             For example, instead of this:
1443              
1444             my $Foo = $Table->col('Foo'); ## Reference to actual column
1445              
1446             Do this:
1447              
1448             my $Foo = [@{$Table->col('Foo')}]; ## Shallow copy of the column
1449              
1450             =cut
1451              
1452             sub col ## ($ColName, [$Vector])
1453             {
1454 1472     1472 0 2828 my $this = shift;
1455 1472         1831 my ($ColName, $Vector) = @_;
1456              
1457             ## Set if specified.
1458 1472 100       3406 my $FoundVector = $this->col_set($ColName, $Vector) if defined($Vector);
1459              
1460             ## Get and return.
1461             ## If not specified, create it with col_default()
1462 1472         2428 my $Col = $this->col_get($ColName);
1463              
1464 1472         3961 return($Col);
1465             }
1466              
1467             sub col_get
1468             {
1469 1475     1475 0 1730 my $this = shift;
1470 1475         2103 my ($ColName) = @_;
1471              
1472 1475   66     3413 my $Col = ($this->{$ColName} || $this->col_add($ColName));
1473              
1474 1475         2320 return($Col);
1475             }
1476              
1477             sub col_add
1478             {
1479 28     28 0 38 my $this = shift;
1480 28         43 my ($ColName) = @_;
1481 28         61 my $Col = $this->{$ColName} = $this->col_empty();
1482              
1483 28         67 $this->fieldlist_add($ColName);
1484 28         80 return($Col);
1485             }
1486              
1487             sub col_set_internal ## ($ColName, [$Vector], [$Force])
1488             {
1489 16     16 0 18 my $this = shift;
1490 16         26 my ($ColName, $Vector, $Force) = @_;
1491              
1492 16         28 my $Valid = (ref($Vector) eq 'ARRAY');
1493 16         26 my $Existing = (ref($this->{$ColName}) eq 'ARRAY');
1494            
1495             ## Delete existing vector by this name...
1496 16 100 66     99 if (!$Valid && $Existing)
    100 66        
    50          
1497             {
1498 1         3 $Vector = delete $this->{$ColName}; ## Delete and save to return to caller.
1499 1         6 $this->fieldlist_delete($ColName); ## Delete from field list if needed.
1500             }
1501             ## ...or add one...
1502             elsif ($Valid && !$Existing)
1503             {
1504 11         24 $this->{$ColName} = $Vector;
1505              
1506 11 50       20 if (!$Force)
1507             {
1508 11         23 $this->extend(); ## Extend all vectors as needed to ensure same length.
1509 11         25 $this->fieldlist_add($ColName); ## Add to custom field list if needed.
1510             }
1511             }
1512             ## ...otherwise replace.
1513             elsif ($Valid)
1514             {
1515 4         9 $this->{$ColName} = $Vector;
1516              
1517 4 50       18 if (!$Force)
1518             {
1519 4         11 $this->extend(); ## Extend all vectors as needed to ensure same length.
1520             }
1521             }
1522            
1523 16         44 return($Vector); ## Return added or deleted vector for convenience.
1524             }
1525              
1526             sub col_delete ## ($ColName)
1527             {
1528 1     1 0 7 my $this = shift;
1529 1         3 my ($ColName) = @_;
1530              
1531 1         4 return($this->col_set_internal($ColName));
1532             }
1533              
1534             sub col_set ## ($ColName, $Vector)
1535             {
1536 15     15 0 27 my $this = shift;
1537 15         25 my ($ColName, $Vector) = @_;
1538              
1539 15         37 return($this->col_set_internal($ColName, $Vector));
1540             }
1541              
1542             sub col_force ## ($ColName, $Vector)
1543             {
1544 0     0 0 0 my $this = shift;
1545 0         0 my ($ColName, $Vector) = @_;
1546              
1547 0         0 return($this->col_set_internal($ColName, $Vector, 1));
1548             }
1549              
1550             sub col_empty
1551             {
1552 37     37 0 56 my $this = shift;
1553 37         46 my ($Length) = @_;
1554              
1555             ## Default to table length. Or get length from sample column.
1556 37 100       114 $Length = $this->length() unless defined($Length);
1557 37 100       89 $Length = @$Length if ref($Length) eq 'ARRAY';
1558              
1559 37         54 my $Col = [];
1560 37         119 $#$Col = $Length - 1;
1561              
1562 37         93 return($Col);
1563             }
1564              
1565             sub col_default
1566             {
1567 1     1 0 2 my $this = shift;
1568 1         4 my $Col = $this->col_empty();
1569              
1570 1         6 return($Col);
1571             }
1572              
1573             sub cols ## ($ColNames)
1574             {
1575 33     33 0 68 my $this = shift;
1576 33         49 my ($ColNames) = @_;
1577 33   66     89 $ColNames ||= $this->fieldlist();
1578 33         62 my $Cols = [map {$this->col($_)} @$ColNames];
  106         214  
1579              
1580 33         106 return($Cols);
1581             }
1582              
1583             sub cols_hash ## ($ColNames)
1584             {
1585 25     25 0 60 my $this = shift;
1586 25         44 my ($ColNames) = @_;
1587 25   66     88 $ColNames ||= $this->fieldlist();
1588 25         73 my $Cols = $this->cols($ColNames);
1589 25         56 my $ColsHash = {}; @$ColsHash{@$ColNames} = @$Cols;
  25         93  
1590              
1591 25         225 return($ColsHash);
1592             }
1593              
1594             sub col_exists ## ($ColName, [$FieldList])
1595             {
1596 24     24 0 93 my $this = shift;
1597 24         40 my ($ColName, $FieldList) = @_;
1598              
1599             ## Default list to search is ALL fields in object.
1600 24   66     78 $FieldList ||= $this->fieldlist_all();
1601              
1602             ## Disallow column names starting with underscore.
1603 24 50       60 return(0) if $ColName =~ /^_/;
1604              
1605 24         55 my $FieldHash = $this->fieldlist_hash($FieldList);
1606 24         49 my $Exists = exists($FieldHash->{$ColName});
1607              
1608 24         99 return($Exists);
1609             }
1610              
1611             sub col_active ## ($ColName, [$FieldList])
1612             {
1613 8     8 0 42 my $this = shift;
1614 8         14 my ($ColName, $FieldList) = @_;
1615              
1616             ## Default list to search is only ACTIVE fields in object.
1617 8   33     30 $FieldList ||= $this->fieldlist();
1618              
1619             ## Disallow column names starting with underscore.
1620 8 50       22 return(0) if $ColName =~ /^_/;
1621              
1622 8         20 my $FieldHash = $this->fieldlist_hash($FieldList);
1623 8         17 my $Exists = exists($FieldHash->{$ColName});
1624              
1625 8         30 return($Exists);
1626             }
1627              
1628             sub col_rename ## ($Old => $New, [$Old => New...])
1629             {
1630 3     3 0 19 my $this = shift;
1631            
1632 3         4 my $Success;
1633              
1634 3         8 my $Fields = $this->fieldlist_all();
1635              
1636 3         5 my ($Old, $New);
1637 3         18 while (($Old, $New) = splice(@_, 0, 2))
1638             {
1639 3 50       17 $this->warn("Invalid column name: $New"), next
1640             unless ($New =~ /^[^_]+/);
1641            
1642 3 50       10 $this->warn("Column to be renamed does not exist: $Old"), next
1643             unless $this->col_exists($Old, $Fields);
1644            
1645 3 50 0     8 (($Old ne $New) && $this->warn("Failed to rename column $Old to $New: $New exists.")), next
1646             if $this->col_exists($New, $Fields);
1647            
1648 3         9 my $Col = $this->col($Old); ## Creates if not present.
1649            
1650             ## Rename the column...
1651 3         11 $this->{$New} = delete $this->{$Old};
1652            
1653             ## Then make the same change to _FieldList, _SortOrder, _SortSpecs
1654            
1655 3 100       10 $this->{_FieldList} = [map {$_ = $New if $_ eq $Old; $_} @{$this->{_FieldList}}] if (defined($this->{_FieldList}));
  10 50       21  
  10         25  
  3         7  
1656 3 0       12 $this->{_SortOrder} = [map {$_ = $New if $_ eq $Old; $_} @{$this->{_SortOrder}}] if (defined($this->{_SortOrder}));
  0 50       0  
  0         0  
  0         0  
1657 3 50 33     29 $this->{_SortSpecs}->{$New} = delete $this->{_SortSpecs}->{$Old} if (defined($this->{_SortSpecs}) &&
1658             ( $this->{_SortSpecs}->{$Old}));
1659             }
1660            
1661 3         5 $Success = 1;
1662 3         9 done:
1663             return($Success);
1664             }
1665              
1666             =pod
1667              
1668             =head1 CLEANUP AND VALIDATION
1669              
1670             ## Performing your own cleanups or validations
1671              
1672             $t->clean($Sub) ## Clean with custom subroutine
1673             $t->clean($Sub, $Fields) ## Clean specified columns only
1674              
1675             ## Cleaning whitespace
1676              
1677             $t->clean_ws() ## Clean whitespace in fieldlist() cols
1678             $t->clean_ws($Fields) ## Clean whitespace in specified cols
1679              
1680             ## Cleaning methods that map character sets
1681              
1682             $t->clean_mac_to_iso8859()
1683             $t->clean_mac_to_iso8859($Fields)
1684              
1685             $t->clean_iso8859_to_mac()
1686             $t->clean_iso8859_to_mac($Fields)
1687              
1688             ## Character mapping utilities (not methods)
1689              
1690             use Data::CTable qw(
1691             ISORoman8859_1ToMacRoman
1692             MacRomanToISORoman8859_1
1693              
1694             ISORoman8859_1ToMacRoman_clean
1695             MacRomanToISORoman8859_1_clean
1696             );
1697              
1698             &ISORoman8859_1ToMacRoman(\ $Str) ## Pass pointer to buffer
1699             &MacRomanToISORoman8859_1(\ $Str) ## Pass pointer to buffer
1700              
1701             &ISORoman8859_1ToMacRoman_clean() ## Operates on $_
1702             &MacRomanToISORoman8859_1_clean() ## Operates on $_
1703              
1704             One of the most important things you can do with your data once it's
1705             been placed in a table in Perl is to use the power of Perl to scrub it
1706             like crazy.
1707              
1708             The built-in clean_ws() method applies a standard white-space cleanup
1709             to selected records in every field in the fieldlist() or other list of
1710             fields you optionally supply (such as fieldlist_all()).
1711              
1712             It does the following cleanups that are deemed correct for the
1713             majority of data out there:
1714              
1715             - Remove all leading whitespace, including returns (\n)
1716             - Remove all trailing whitespace, including returns (\n)
1717             - Convert runs of spaces to a single space
1718             - Convert empty string values back to undef to save space
1719              
1720             Of course, depending on your data, clean_ws() might just be the first
1721             thing you do in your cleanup pass. There might be many more cleanups
1722             you'd like to apply.
1723              
1724             clean() is like clean_ws() except you supply as the first argument
1725             your own cleaning subroutine (code reference). It should do its work
1726             by modifying $_.
1727              
1728             Both clean_ws() and clean() apply cleaning ONLY to selected records.
1729             If this isn't what you want, then select_all() before cleaning.
1730              
1731             Since a cleanup subroutine can do ANY modifications to a field that it
1732             likes, you can imagine some cleanup routines that also supply default
1733             values and do other validations.
1734              
1735             For example, a cleanup routine could convert every value in each field
1736             to an integer, or apply minimum or maximum numerical limits:
1737              
1738             sub {$_ = int($_) }
1739             sub {$_ = max(int($_), 0) }
1740             sub {$_ = min(int($_), 200)}
1741              
1742             Or your cleanup routine could use regular expressions to do
1743             capitalizations or other regularizations of data:
1744              
1745             sub Capitalize {/\b([a-z])([a-z]+)\b)/\U$1\E$2/g}
1746              
1747             $t->clean(\ &Capitalize , ['FirstName', 'LastName']);
1748             $t->clean(\ &PhoneFormat, ['Phone', 'Fax' ]);
1749             $t->clean(\ &LegalUSZip, ['HomeZip', 'WorkZip' ]);
1750              
1751             ... and so on. Cleanups are easy to write and quick and easy to apply
1752             with Data::CTable. Do them early! Do them often!
1753              
1754             =head2 Hints for writing cleanup routines
1755              
1756             If your cleanup routine may be used to clean up fields that could be
1757             empty/undef and empty/undef is a legal value, it should not touch any
1758             undef values (unintentionally converting them to strings).
1759              
1760             Finally, instead of setting any values to the empty string, it should
1761             set them to undef instead. This includes any values it might have
1762             left empty during cleanup. (Using undef instead of empty string to
1763             represent empty values is one way that Data::CTable likes to save
1764             memory in tables that may have lots of those.)
1765              
1766             For an example of a well-behaved cleanup routine, consider the
1767             following implementation of the builtin CleanWhitespace behavior:
1768              
1769             sub CleanWhitespace
1770             {
1771             return unless defined; ## Empty/undef values stay that way
1772             s/ \s+$//sx; ## Remove trailing whitespace
1773             s/^\s+ //sx; ## Remove leading whitespace
1774             s/ +/ /g; ## Runs of spaces to single space
1775             $_ = undef unless length; ## (Newly?) empty strings to undef
1776             }
1777              
1778             =head2 Roman character set mapping
1779              
1780             The character set mapping cleanup routines can be used to convert
1781             upper-ASCII characters bidirectionally between two popular Roman
1782             Character sets -- Mac Roman 1 and ISO 8859-1 (also sometimes called
1783             ISO Latin 1) -- i.e. the Western European Roman character sets.
1784              
1785             By default, read() converts all incoming data fields in data files
1786             with Mac line endings to ISO format when reading in. Conversely,
1787             write() does the reverse mapping (ISO to Mac) when writing a file with
1788             Mac line endings.
1789              
1790             However, you may wish to turn off these default behaviors and instead
1791             apply the mappings manually, possibly just to certain fields.
1792              
1793             For example, if a table contains fields with non-Roman character sets,
1794             you would definitely not want to apply these mappings, and instead
1795             might want to apply some different ones that you create yourself.
1796              
1797             =head2 Utility routines for character mapping
1798              
1799             This module can optionally export four utility routines for mapping
1800             character Latin 1 character sets. Always be sure to map the correct
1801             direction -- otherwise you'll end up with garbage! Be careful to only
1802             pass Western Roman strings -- not double-byte strings or strings
1803             encoded in any single-byte Eastern European Roman or non-Roman
1804             character set.
1805              
1806             &ISORoman8859_1ToMacRoman(\ $Str) ## Pass pointer to buffer
1807             &MacRomanToISORoman8859_1(\ $Str) ## Pass pointer to buffer
1808              
1809             These routines translate characters whose values are 128-255 from one
1810             Western Roman encoding to another. The argument is a string buffer of
1811             any size passed by reference.
1812              
1813             The functions return a count of the number of characters that were
1814             mapped (zero or undef if none were).
1815              
1816             &ISORoman8859_1ToMacRoman_clean() ## Operates on $_
1817             &MacRomanToISORoman8859_1_clean() ## Operates on $_
1818              
1819             These routines are variants of the above, but they're versions that
1820             are compatible with clean() -- they operate on $_ and will take care
1821             to leave undefined values undefined. They do not have return values.
1822              
1823             =head2 More advanced cleaning and validation
1824              
1825             Unfortunately, clean() only lets you operate on a single field value
1826             at a time -- and there's no way to know the record number or other
1827             useful information inside the cleaning routine.
1828              
1829             For really powerful cleaning and validation involving access to all
1830             fields of a record as well as record numbers, see the discussion of
1831             the calc() method and other methods for doing complex field
1832             calculations in the next section.
1833              
1834             =cut
1835              
1836             sub clean
1837             {
1838 8     8 0 20 my $this = shift;
1839 8         14 my ($Sub, $Fields) = @_;
1840            
1841             ## Default is fields in the list.
1842 8   33     36 $Fields ||= $this->fieldlist();
1843              
1844 8         22 my $Sel = $this->selection();
1845            
1846 8         18 foreach (@$Fields) {foreach (@{$this->col($_)}[@$Sel]) {&$Sub()}};
  38         76  
  38         81  
  114         237  
1847             }
1848              
1849             sub clean_ws
1850             {
1851 6     6 0 32 my $this = shift;
1852 6         26 return($this->clean(\ &CleanWhitespace, @_));
1853             }
1854              
1855             sub CleanWhitespace
1856             {
1857 84 50   84 0 156 return unless defined; ## Empty/undef values stay that way
1858 84         180 s/ \s+$//sx; ## Remove trailing whitespace
1859 84         153 s/^\s+ //sx; ## Remove leading whitespace
1860 84         169 s/ +/ /g; ## Runs of spaces to single space
1861 84 50       257 $_ = undef unless length; ## (Newly?) empty strings to undef
1862             }
1863              
1864             sub clean_mac_to_iso8859
1865             {
1866 1     1 0 8 my $this = shift;
1867 1         5 return($this->clean(\ &MacRomanToISORoman8859_1_clean, @_));
1868             }
1869              
1870             sub clean_iso8859_to_mac
1871             {
1872 0     0 0 0 my $this = shift;
1873 0         0 return($this->clean(\ &ISORoman8859_1ToMacRoman_clean, @_));
1874             }
1875              
1876             sub ISORoman8859_1ToMacRoman
1877             {
1878 20     20 0 32 return($ {$_[0]} =~
  20         49  
1879            
1880             tr/\x80-\xFF/\xDE\xDF\xE2\xC4\xE3\xC9\xA0\xE0\xF6\xE4\xBA\xDC\xCE\xAD\xB3\xB2\xB0\xD4\xD5\xD2\xD3\xA5\xF8\xD1\xF7\xAA\xF9\xDD\xCF\xF0\xDA\xD9\xCA\xC1\xA2\xA3\xDB\xB4\xF5\xA4\xAC\xA9\xBB\xC7\xC2\xD0\xA8\xC3\xA1\xB1\xFA\xFE\xAB\xB5\xA6\xE1\xFC\xFF\xBC\xC8\xC5\xFD\xFB\xC0\xCB\xE7\xE5\xCC\x80\x81\xAE\x82\xE9\x83\xE6\xE8\xED\xEA\xEB\xEC\xC6\x84\xF1\xEE\xEF\xCD\x85\xD7\xAF\xF4\xF2\xF3\x86\xB7\xB8\xA7\x88\x87\x89\x8B\x8A\x8C\xBE\x8D\x8F\x8E\x90\x91\x93\x92\x94\x95\xB6\x96\x98\x97\x99\x9B\x9A\xD6\xBF\x9D\x9C\x9E\x9F\xBD\xB9\xD8/);
1881              
1882             }
1883              
1884             sub ISORoman8859_1ToMacRoman_clean
1885             {
1886 0 0   0 0 0 return unless defined; ## Empty/undef values stay that way
1887              
1888 0         0 tr/\x80-\xFF/\xDE\xDF\xE2\xC4\xE3\xC9\xA0\xE0\xF6\xE4\xBA\xDC\xCE\xAD\xB3\xB2\xB0\xD4\xD5\xD2\xD3\xA5\xF8\xD1\xF7\xAA\xF9\xDD\xCF\xF0\xDA\xD9\xCA\xC1\xA2\xA3\xDB\xB4\xF5\xA4\xAC\xA9\xBB\xC7\xC2\xD0\xA8\xC3\xA1\xB1\xFA\xFE\xAB\xB5\xA6\xE1\xFC\xFF\xBC\xC8\xC5\xFD\xFB\xC0\xCB\xE7\xE5\xCC\x80\x81\xAE\x82\xE9\x83\xE6\xE8\xED\xEA\xEB\xEC\xC6\x84\xF1\xEE\xEF\xCD\x85\xD7\xAF\xF4\xF2\xF3\x86\xB7\xB8\xA7\x88\x87\x89\x8B\x8A\x8C\xBE\x8D\x8F\x8E\x90\x91\x93\x92\x94\x95\xB6\x96\x98\x97\x99\x9B\x9A\xD6\xBF\x9D\x9C\x9E\x9F\xBD\xB9\xD8/;
1889             }
1890              
1891             sub MacRomanToISORoman8859_1
1892             {
1893 61     61 0 73 return($ {$_[0]} =~
  61         142  
1894            
1895             tr/\x80-\xFF/\xC4\xC5\xC7\xC9\xD1\xD6\xDC\xE1\xE0\xE2\xE4\xE3\xE5\xE7\xE9\xE8\xEA\xEB\xED\xEC\xEE\xEF\xF1\xF3\xF2\xF4\xF6\xF5\xFA\xF9\xFB\xFC\x86\xB0\xA2\xA3\xA7\x95\xB6\xDF\xAE\xA9\x99\xB4\xA8\x8D\xC6\xD8\x90\xB1\x8F\x8E\xA5\xB5\xF0\xDD\xDE\xFE\x8A\xAA\xBA\xFD\xE6\xF8\xBF\xA1\xAC\xAF\x83\xBC\xD0\xAB\xBB\x85\xA0\xC0\xC3\xD5\x8C\x9C\xAD\x97\x93\x94\x91\x92\xF7\xD7\xFF\x9F\x9E\xA4\x8B\x9B\x80\x81\x87\xB7\x82\x84\x89\xC2\xCA\xC1\xCB\xC8\xCD\xCE\xCF\xCC\xD3\xD4\x9D\xD2\xDA\xDB\xD9\xA6\x88\x98\x96\x9A\xB2\xBE\xB8\xBD\xB3\xB9/);
1896              
1897             }
1898              
1899             sub MacRomanToISORoman8859_1_clean
1900             {
1901 15 50   15 0 27 return unless defined; ## Empty/undef values stay that way
1902              
1903 15         33 tr/\x80-\xFF/\xC4\xC5\xC7\xC9\xD1\xD6\xDC\xE1\xE0\xE2\xE4\xE3\xE5\xE7\xE9\xE8\xEA\xEB\xED\xEC\xEE\xEF\xF1\xF3\xF2\xF4\xF6\xF5\xFA\xF9\xFB\xFC\x86\xB0\xA2\xA3\xA7\x95\xB6\xDF\xAE\xA9\x99\xB4\xA8\x8D\xC6\xD8\x90\xB1\x8F\x8E\xA5\xB5\xF0\xDD\xDE\xFE\x8A\xAA\xBA\xFD\xE6\xF8\xBF\xA1\xAC\xAF\x83\xBC\xD0\xAB\xBB\x85\xA0\xC0\xC3\xD5\x8C\x9C\xAD\x97\x93\x94\x91\x92\xF7\xD7\xFF\x9F\x9E\xA4\x8B\x9B\x80\x81\x87\xB7\x82\x84\x89\xC2\xCA\xC1\xCB\xC8\xCD\xCE\xCF\xCC\xD3\xD4\x9D\xD2\xDA\xDB\xD9\xA6\x88\x98\x96\x9A\xB2\xBE\xB8\xBD\xB3\xB9/;
1904              
1905             }
1906              
1907             =pod
1908              
1909             =head1 CALCULATIONS USING calc()
1910              
1911             ## Calculate a new field's values based on two others
1912              
1913             $t->calc($Sub) ## Run $Sub for each row, with
1914             ## fields bound to local vars
1915              
1916             $t->calc($Sub, $Sel) ## Use these row nums
1917             $t->calc($Sub, undef, $Fields) ## Use only these fields
1918             $t->calc($Sub, $Sel, $Fields) ## Use custom rows, fields
1919              
1920             my $Col = $t->calc($Sub) ## Gather return vals in vector
1921              
1922              
1923             ## Example 1: Overwrite values in an existing column.
1924              
1925             $t->calc(sub{no strict 'vars'; $Size = (stat($Path))[7]});
1926              
1927              
1928             ## Example 2: Create empty column; fill fields 1 by 1
1929              
1930             $t->col('PersonID');
1931             $t->calc(sub{no strict 'vars'; $PersonID = "$Last$First"});
1932              
1933              
1934             ## Example 3: Calculate values; put into to table if desired
1935              
1936             $PersonID = $t->calc(sub{no strict 'vars'; "$Last$First"});
1937             $t->sel('PersonID', $PersonID);
1938              
1939              
1940             ## Example 4: Using fully-qualified variable names
1941              
1942             $t->calc(sub{$main::PersonID = "$main::Last$main::First"});
1943              
1944             calc() runs your custom calculation subroutine $Sub once for every row
1945             in the current selection() or other list of row numbers that you
1946             specify in the optional $Sel argument.
1947              
1948             This lets you apply a complex calculation to every record in a table
1949             in a single statement, storing the results in one or more columns, or
1950             retrieving them as a list.
1951              
1952             Your custom subroutine may refer to the value in any field in the
1953             current row by using a global variable with the field's name:
1954              
1955             For example, if the table has fields First, Last, and Age, then $Sub
1956             may use, modify, and set the variables $First, $Last, $Age. (Also
1957             known as $main::First, $main::Last, $main::Age).
1958              
1959             Modifying any of these specially-bound variables actually modifies the
1960             data in the correct record and field within the table.
1961              
1962             By default, the fields available to $Sub are all fields in the table.
1963             calc() must bind all the field names for you for each row, which can
1964             be time-consuming for tables with very large numbers of fields.
1965              
1966             You can speed up the operation of calc() by listing only the fields
1967             your $Sub needs in the optional parameter $Fields. Any field names
1968             you don't mention won't be available to $Sub. Conversely, calc() will
1969             run faster because it can bind only the fields you actually need.
1970              
1971             If you include non-existent fields in your custom $Fields list, calc()
1972             creates them for you before $Sub runs the first time. Then your $Sub
1973             can store field values into the new column, referring to it by name.
1974              
1975             Variables in $Sub are in the "main" package. So you should set $Sub
1976             to use pacakge "main" in case the rest of your code is not in "main".
1977              
1978             Similarly, if you "use strict", Perl will complain about global
1979             variables in $Sub. So you may need to assert "no strict 'vars'".
1980              
1981             { package Foo; use strict;
1982            
1983             $t = ...;
1984             $t->calc(sub {package main; no strict 'vars';
1985             $Age = int($Age)});
1986             }
1987              
1988             ## Or this:
1989              
1990             { package Foo; use strict;
1991              
1992             $t = ...;
1993             { package main; no strict 'vars';
1994             my $Sub = sub {$Age = int($Age)};
1995             }
1996             $t->calc($Sub);
1997             }
1998              
1999             You may be able to get around both problems more easily by prefixing
2000             each variable reference in $Sub with "main::". This takes care of the
2001             package name issue and bypasses "use strict" at the same time, at the
2002             slight cost of making the calculation itself a bit harder to read.
2003              
2004             $t->calc(sub {$main::Age = int($main::Age)}); ## OK in any package
2005              
2006             In addition to the field names, the following three values are defined
2007             during each invocation of $Sub:
2008              
2009             $_r ($main::_r) -- the row number in the entire table
2010             $_s ($main::_s) -- the item number in selection or $Recs
2011             $_t ($main::_t) -- the table object itself
2012              
2013             You could use these values to print diagnostic information or to
2014             access any of the data, parameters, or methods of the table itself
2015             from within $Sub. Or you could even calculate field values using $_r
2016             or $_s.
2017              
2018             For example, after searching & sorting, you could make a field which
2019             preserves the resulting sort order for future reference:
2020              
2021             $t->col('Ranking'); ## Create the empty column first.
2022             $t->calc(sub{$main::Ranking = $main::_s});
2023              
2024             This last example is equivalent to:
2025              
2026             $t->sel(Ranking => [0 .. $#{$t->selection()}]); ## See sel() below
2027              
2028              
2029             =head1 "MANUAL" CALCULATIONS
2030              
2031             calc() (see previous secion) is the briefer, more elegant way to do
2032             batch calculations on entire columns in a table, but it can be
2033             slightly slower than doing the calculations yourself.
2034              
2035             If you have extremely large tables, and you notice the processing time
2036             for your calculations taking more than a second, you might want to
2037             rewrite your calculations to use the more efficient techniques shown
2038             here.
2039              
2040             You will often need to create new calculated columns based on one or
2041             more existing ones, and then either insert the columns back in the
2042             tables or use them for further calculations or indexing.
2043              
2044             Examples 1a and 1b create a new field 'NameOK' containing either the
2045             string "OK" or undef (empty) depending on whether the field 'Name' is
2046             empty. Just use map() to iterate over the existing values in the
2047             other column, binding $_ to each value in turn.
2048              
2049             ### Example 1a: Calculation based on one other column
2050              
2051             ## Retrieve column
2052             my $Name = $t->col('Name');
2053            
2054             ## Make new column
2055             my $NameOK = [map {!!length && 'OK'} @$Name];
2056              
2057             ## Insert column back into table:
2058             $t->col(NameOK => $NameOK);
2059              
2060             ### Example 1b: Same calculation, in a single statement:
2061              
2062             $t->col(NameOK => [map {!!length && 'OK'} @{$t->col('Name')}]);
2063              
2064             In order to iterate over MULTIPLE columns at once, you need a list of
2065             the row numbers generated by $t->all() so you can index the two
2066             columns in tandem. Then, you use map to bind $_ to each row number,
2067             and then use the expression $t->col($ColName)->[$_] to retreive each
2068             value.
2069              
2070             Examples 2a and 2b demonstrate this method. They create a new field
2071             'FullName' which is a string joining the values in the 'First' and
2072             'Last' columns with a space between.
2073              
2074             ### Example 2a: Calculation based on multiple columns
2075              
2076             ## Retrieve columns
2077             my $First = $t->col('First');
2078             my $Last = $t->col('Last' );
2079              
2080             ## Retreive row nums
2081             my $Nums = $t->all();
2082              
2083             ## Calculate a new column based on two others
2084             my $Full = [map {"$First->[$_] $Last->[$_]"} @$Nums];
2085              
2086             ## Add new column to the table
2087             $t->col(FullName => $Full);
2088              
2089             ### Example 2b: Same calculation, in a single statement:
2090              
2091             $t->col(FullName =>
2092             [map {"$t->col('First')->[$_] t->col('Last')->[$_]"}
2093             @{$t->all()}]);
2094              
2095             In examples 1 and 2, you create entirely new columns and then add or
2096             replace them in the table.
2097              
2098             Using the approach in Examples 3a and 3b, you can assign calculated
2099             results directly into each value of an existing column as you go.
2100              
2101             ## Example 3a: Calculate by assigning directly into fields...
2102              
2103             my $A = $t->col->('A'); ## This column will be modified
2104             my $B = $t->col->('B');
2105             my $C = $t->col->('C');
2106              
2107             foreach @($t->all()) {$A->[$_] = $B->[$_] + $C->[$_];}
2108              
2109             ## Example 3b: Same calculation, in a single statement:
2110              
2111             foreach @($t->all()) {($t->col('A')->[$_] =
2112             $t->col('B')->[$_] +
2113             $t->col('C')->[$_])};
2114              
2115             Before writing your code, think about which calculation paradigms best
2116             suit your needs and your data set.
2117              
2118             Just as Perl Hackers know: There's More Than One Way To Do It!
2119              
2120             =cut
2121              
2122             {}; ## Get emacs to indent correctly.
2123              
2124             sub calc
2125             {
2126             ## We operate in package main for this subroutine so the local
2127             ## object and row number can be available to the caller's $Sub.
2128              
2129             ## The following local vars will be available to $Sub
2130             ## $_r ($main::_r) -- the row number in the entire table
2131             ## $_s ($main::_s) -- the row number in selection or $Recs
2132             ## $_t ($main::_t) -- the table object itself
2133              
2134             package main;
2135 1     1   13 use vars qw($_r $_s $_t);
  1         2  
  1         139  
2136 3     3 0 37 local ($_r, $_s, $_t);
2137              
2138 3         6 $_t = shift;
2139 3         7 my ($Sub, $Recs, $Fields) = @_;
2140              
2141             ## These optional params default to current field and current sel
2142              
2143 3   33     21 $Recs ||= $_t->selection();
2144 3   33     16 $Fields ||= $_t->fieldlist_all();
2145            
2146             ## Local copy of symbol table. Didn't seem to help. Odd.
2147             ## local %main::;
2148            
2149             ## We'll build a column of return values from $Sub if needed.
2150 3         5 my $WantVals = defined(wantarray);
2151 3 50       10 my $Vals = $_t->col_empty() if $WantVals;
2152              
2153             ## Call col() on each field in list to make sure it exists.
2154 3         9 foreach (@$Fields) {$_t->col($_)};
  15         31  
2155              
2156 3         12 foreach $_s (0..$#$Recs)
2157             {
2158 9         28 $_r = $Recs->[$_s];
2159              
2160             ## Bind $FieldName1, $FieldName2, (etc. for each field name in
2161             ## $Fields) point to address of the current value for that
2162             ## field in this record.
2163              
2164 1     1   5 no strict 'refs';
  1         2  
  1         4377  
2165 9         18 foreach my $F (@$Fields) {*{$F} = \ $_t->{$F}->[$_r]};
  45         64  
  45         113  
2166              
2167             ## Now $Sub may refer to $_r, $_s, $_t, and ${any field name}
2168              
2169             ## Call $Sub and capture return values iff caller wants them
2170 9 50       30 ($WantVals ? $Vals->[$_r] = &$Sub() : &$Sub());
2171             }
2172              
2173             ## Return scalar column ref unless return context is undef
2174 3 50       22 return($WantVals ? $Vals : ());
2175             }
2176              
2177             =pod
2178              
2179             =head1 INDEXES
2180              
2181             ## Make indexes of columns or just selected data in columns
2182              
2183             my $Index1 = $t->index_all($Key); ## entire column
2184             my $Index2 = $t->index_sel($Key); ## selected data only
2185              
2186             ## Make hashes of 2 columns or just selected data in columns
2187              
2188             my $Index1 = $t->hash_all($KeyFld, $ValFld); ## entire column
2189             my $Index2 = $t->hash_sel($KeyFld, $ValFld); ## selected data
2190              
2191             index_all() creates and returns a hash (reference) that maps keys
2192             found in the column called $Key to corresponding record numbers.
2193              
2194             Ideally, values in $Key would be unique (that's up to you).
2195              
2196             If any values in $Key are NOT unique, then later values (higher record
2197             numbers) will be ignored.
2198              
2199             index_sel() creates and returns a hash (ref) that maps keys found in
2200             the SELECTED RECORDS of column $Key to corresponding record numbers.
2201              
2202             Any keys in unselected records are ignored. Otherwise, the behavior
2203             is equivalent to index_all().
2204              
2205             hash_all() and hash_sel() are similar, except they create and return
2206             hashes whose keys are taken from column $KeyFld, but whose values are
2207             from $ValFld in the corresponding records.
2208              
2209             So, for example, imagine you have a tab-delimited file on disk with
2210             just a single tab per line (2 fields) and no header row. The entries
2211             on the left side of the tab on each line are keys and the right side
2212             are values. You could convert that file into a hash in memory like
2213             this:
2214              
2215             my $t = Data::CTable->new({_HeaderRow=>0, _FieldList=>[qw(F1 F2)]},
2216             "DeptLookup.txt");
2217              
2218             my $DeptLookup = $t->hash_all(qw(F1 F2));
2219              
2220             =head2 Reverse indexes
2221              
2222             If you'd like an index mapping record number to key, just get
2223             $t->col($Key). That's what the data columns in Data::CTable are.
2224              
2225             =cut
2226              
2227             sub index_all
2228             {
2229 2     2 0 8 my $this = shift;
2230 2         4 my ($Key) = @_;
2231              
2232 2         4 my $Index = {}; @$Index{reverse @{$this->col($Key)}} = reverse @{$this->all()};
  2         3  
  2         6  
  2         5  
2233              
2234 2         16 return($Index);
2235             }
2236              
2237             sub index_sel
2238             {
2239 2     2 0 6 my $this = shift;
2240 2         3 my ($Key) = @_;
2241            
2242 2         5 my $Index = {}; @$Index{reverse @{$this->sel($Key)}} = reverse @{$this->selection()};
  2         3  
  2         5  
  2         6  
2243            
2244 2         15 return($Index);
2245             }
2246              
2247              
2248             sub hash_all
2249             {
2250 2     2 0 3 my $this = shift;
2251 2         5 my ($Key, $Val) = @_;
2252              
2253 2         3 my $Hash = {}; @$Hash{reverse @{$this->col($Key)}} = reverse @{$this->col($Val)};
  2         4  
  2         5  
  2         6  
2254              
2255 2         15 return($Hash);
2256             }
2257              
2258             sub hash_sel
2259             {
2260 2     2 0 4 my $this = shift;
2261 2         4 my ($Key, $Val) = @_;
2262            
2263 2         5 my $Hash = {}; @$Hash{reverse @{$this->sel($Key)}} = reverse @{$this->sel($Val)};
  2         3  
  2         7  
  2         6  
2264            
2265 2         15 return($Hash);
2266             }
2267              
2268              
2269             =pod
2270              
2271             =head1 DATA ROWS (RECORDS)
2272              
2273             ## Getting or setting rows / records
2274              
2275             $t->row($Num) ## Get a row or make empty one.
2276             $t->row_get($Num)
2277              
2278             $t->row($Num, $HashRef) ## Set all of a row all at once.
2279             $t->row_set($Num, $HashRef)
2280              
2281             $t->row_set($Num, undef) ## Delete a row completely
2282             $t->row_delete($Num)
2283             $t->row_delete($Beg, $End)## Delete a range of rows
2284              
2285             $t->row_move($Old, $New) ## Move a row to before $New
2286              
2287             $t->row_empty() ## An empty hash
2288             $t->row_exists($Num) ## True if $Num < $t->length()
2289              
2290             $t->rows($RowList) ## Get list of multiple row nums
2291              
2292             $t->row_list($Num) ## Get row vals as a list
2293             $t->row_list($Num, $Fields) ## Get row vals: specified fields
2294              
2295             $t->row_list_set($Num, undef, $Vals) ## Set row vals as a list
2296             $t->row_list_set($Num, $Fields, $Vals) ## Set row vals as a list
2297             $t->row_list_set($Num, $Fields) ## Set vals to empty/undef
2298              
2299             Usually, when working with Data::CTable objects, you are operating on
2300             entire columns or tables at a time (after all: any transformation you
2301             do on one record you almost always want to do on all records or all
2302             selected ones).
2303              
2304             You should very rarely need to access data by retrieving rows or
2305             setting rows, moving them around individually, and so on. (It's much
2306             cleaner, and much more efficient to manipulate the selection() (list
2307             of selected row numbers) instead -- just delete a row number from the
2308             selection, for example, and then for most operations it's almost as if
2309             the row is gone from the table, except the data is really still
2310             there.)
2311              
2312             However, if on rare occasions you really do need direct row
2313             operations, you're reading the right section.
2314              
2315             A row is generally accessed as a hash. The hash you provide or get
2316             back is not saved by the object in any way. Data values are always
2317             copied in or out of it, so you always "own" the hash.
2318              
2319             Rows are specified by $Num -- the row number with in the unsorted
2320             columns (the raw data in the table). These numbers are just array
2321             indices into the data columns, and so their legal range is:
2322              
2323             [0 .. ($t->length() - 1)] ## (Zero-based row numbering.)
2324              
2325             The row hash (or "record") has keys that are field names and values
2326             that are copies of the scalar values stored in the data columns within
2327             the table.
2328              
2329             row() always copies only the fields in fieldlist(), except for
2330             row_list() which allows you to specify an optional $Fields parameter
2331             which can override the current fieldlist().
2332              
2333             If the fieldlist happens to be a subset of all fields, but you really
2334             want to get all fields in your record, then call fieldlist_set(0)
2335             first to permanently or temporarily delete it.
2336              
2337             row() and row_get() always return a hash.
2338              
2339             row($Num, $Hash), row_set() take a hash and set just the fields you
2340             specify in the hash (in the given row of course). Any non-existent
2341             field names in the hash are created, so be careful.
2342              
2343             In fact, in general with either getting or setting rows, any
2344             non-existent fields mentioned will be created for you (by internally
2345             calling col()). So you could build a whole table of 100 rows by
2346             starting with an empty, new, table and setting row 99 from a hash that
2347             gives the field names.
2348              
2349             Setting a row number higher than any existing row number with row(),
2350             row_set() or row_force() will automatically set the new length of the
2351             entire table to match (extending all the columns with empty rows as
2352             necessary).
2353              
2354             IMPORTANT: IF YOU SIMPLY MUST ADD ROWS SEQUENTIALLY, do not let the
2355             table auto-extend by one with each row you set. This is slow and gets
2356             rapidly slower if there's lots of data because the arrays holding the
2357             data columns will keep getting reallocated on every insert. Instead,
2358             first pre-extend the table to your highest row number by calling
2359             length($Len), and then set your rows. Or easier: if convenient just
2360             set your rows starting with the highest-numbered one first. If you
2361             don't know how many you'll have, guess or estimate and pre-extend to
2362             the estimated number and then cut back later. This will be faster
2363             than extending all columns by one each time.
2364              
2365             row_delete() removes a row or range of rows completely from the table.
2366             Any rows above the deleted ones will move down and the table's
2367             length() will decrease. If the data columns are very large, this
2368             could be a bit slow because a lot of data could be moved around. The
2369             low and high row numbers will be limited for you to 0 and length() -
2370             1, respectively. Null ranges are OK and are silently ignored. The
2371             range is inclusive, so to delete just row 99, call row_delete(99) or
2372             row_delete(99,99).
2373              
2374             EFFICIENCY NOTE: Don't call row_delete() to remove lots of individual
2375             rows. Instead, select those row numbers by setting the selection (if
2376             not already selected), and then invert the selection using
2377             selection_invert(), so the undesired rows are deselected, and then use
2378             the cull() method to rewrite the entire table at once. The deselected
2379             rows will be omitted very efficiently this way.
2380              
2381             row_move() moves a row from its $Old row number to the position before
2382             the row currently in row $New (specify $New = length() to move the row
2383             to the end). Again, in shuffling data in columns, lots of data could
2384             get moved around by this operation, so expect it to be slow. If as
2385             with row_delete(), if you will be doing several moves, consider
2386             building an appropriate selection() first, and then using cull()
2387             instead.
2388              
2389             Using row_delete() and row_move() to shift records around changes the
2390             record numbers of the affected records and many others in the table.
2391             The record numbers in the custom selection, if any, are updated to
2392             reflect these changes, so the records that were selected before will
2393             still be selected after the move (except those that were deleted of
2394             course). If you had a private copy of the selection, your copy will
2395             likely become outdated after these operations. You should get it
2396             again by calling selection().
2397              
2398             row_empty() returns a hash whose keys are the entries in fieldlist()
2399             and whose values are undef. (You could use it to fill in values
2400             before calling row_set()). Note: in this class, row_empty() does
2401             exactly the same thing as fieldlist_hash() when the latter is called
2402             with no arguments.
2403              
2404             row_exists() returns true if C<(($Num E= 0) && ($Num E $t-Elength()))>.
2405              
2406             rows() calls row() for each row num in a list and returns a list of
2407             the resulting hashes.
2408              
2409             row_list() gets row values as a list instead of a hash. They appear
2410             in the order specified in fieldlist() unless you supply an optional
2411             $Fields parameter listing the fields you want to get.
2412              
2413             row_list_set() sets row values as a list instead of a hash. Pass your
2414             own $Fields list or undef and fieldlist() will be used. $Values
2415             should be a list with the same number of values as fields expected;
2416             any shortage will result in undef/empty values being set.
2417              
2418             =cut
2419              
2420             {}; ## Get emacs to indent correctly.
2421              
2422             sub row
2423             {
2424 69     69 0 139 my $this = shift;
2425 69         95 my ($Num, $Row) = @_;
2426              
2427             ## Set if specified.
2428 69 50       141 return($this->row_set($Num, $Row)) if defined($Row);
2429              
2430             ## Else get.
2431 69         132 return($this->row_get($Num));
2432             }
2433              
2434             sub row_get
2435             {
2436 69     69 0 81 my $this = shift;
2437 69         94 my ($Num) = @_;
2438              
2439 69         123 my $Fields = $this->fieldlist();
2440 69         100 my $Row = {}; @$Row{@$Fields} = map {$this->col($_)->[$Num]} @$Fields;
  69         115  
  270         532  
2441              
2442 69         615 return($Row);
2443             }
2444              
2445             sub row_set
2446             {
2447 3     3 0 18 my $this = shift;
2448 3         6 my ($Num, $Row) = @_;
2449              
2450             ## We thoughtfully sort the keys in case columns will get created
2451             ## in this order.
2452              
2453 3         14 my $Fields = [sort keys %$Row];
2454              
2455             ## Pre-extend the table to accommodate row $Num if necessary.
2456             ## This will do nothing if there are not yet any fields in the
2457             ## table (and the length will still be effectively zero).
2458              
2459 3 100       10 $this->length($Num + 1) unless $this->length() >= $Num + 1;
2460            
2461             ## Insert into columns, creating them if necessary.
2462 3         8 foreach (@$Fields) {$this->col($_)->[$Num] = $Row->{$_}};
  10         27  
2463            
2464 3         12 return($Row); ## Why not?
2465             }
2466              
2467             sub row_delete
2468             {
2469 10     10 0 92 my $this = shift;
2470 10         18 my ($First, $Last) = @_;
2471              
2472             ## Nothing to do if $First not specified.
2473 10 100       23 return() unless defined($First);
2474              
2475             ## Default is $Last is same as first (remove one row only)
2476 9 100       23 $Last = $First unless defined($Last);
2477              
2478 9         18 my $LastIndex = $this->length() - 1;
2479              
2480             ## Restrict the range to meaningful values.
2481 9         23 $First = max($First, 0 ); ## First could be very high, to indicate a null range.
2482 9         23 $Last = min($Last, $LastIndex); ## Last could be negative, like -1, to indicate null range.
2483              
2484             ## Nothing to do if the range is empty.
2485 9 100       23 return() if $Last < $First;
2486              
2487 8         17 my $Fields = $this->fieldlist();
2488              
2489 8         13 my $RangeSize = ($Last - $First + 1);
2490            
2491 8         18 foreach (@$Fields) {splice @{$this->col($_)}, $First, $RangeSize};
  32         39  
  32         58  
2492            
2493             ## Here we could have trapped all the list segments we spliced out
2494             ## and return them in nice CTable-ish hash. Maybe we will some
2495             ## day. This would be a way to split a range of rows out of a
2496             ## table object to create another table object. We could even
2497             ## call it the "split" method...
2498            
2499             ## After deleting the rows, we need to adjust the _Selection if
2500             ## present. Deleted row numbers in the selection need to be
2501             ## omitted; row numbers greater than the range of deleted ones
2502             ## need to be decreased by the size of the range reduction, and
2503             ## others need to be left untouched.
2504            
2505 0 0       0 $this->{_Selection} =
    0          
2506             [map
2507             {
2508 0         0 ($_ < $First ? $_ : ## Before range: pass through.
2509             ($_ <= $Last ? () : ## In range: omit.
2510             $_ - $RangeSize)) ## After range: reduce by range size
2511 8 50       33 } @{ $this->{_Selection}}]
2512             if $this->{_Selection};
2513             }
2514              
2515             sub row_move
2516             {
2517 8     8 0 13 my $this = shift;
2518 8         12 my ($Old, $New) = @_;
2519            
2520             ## $Old and $New are required params and must not be undef.
2521 8 50 33     41 goto done unless defined($Old) && defined($New);
2522            
2523             ## If $Old and $New are the same or one apart, there's nothing to do.
2524 8 100       23 goto done if ($New == $Old); ## This would mean a no-op.
2525 7 100       20 goto done if ($New == $Old + 1); ## This would mean a no-op.
2526            
2527 5         11 my $Length = $this->length();
2528            
2529             ## Ensure both $Old and $New are legal indices.
2530 5 50 33     28 goto done if (($Old < 0) || ($Old > $Length - 1));
2531 5 50 33     26 goto done if (($New < 0) || ($New > $Length )); ## New has a range up to $Length, meaning move to end.
2532              
2533 5         11 my $Fields = $this->fieldlist_all();
2534              
2535 5 100       14 if ($Old < $New) ## Move forward (to higher / later row num)
2536             {
2537             ## Delete from the lower position and insert into higher -- MINUS ONE to account for prior shortening.
2538 2         3 my $Row = {};
2539 2         4 foreach (@$Fields) {$Row->{$_} = splice(@{$this->col($_)}, $Old , 1 )};
  8         95  
  8         19  
2540 2         6 foreach (@$Fields) { splice(@{$this->col($_)}, $New - 1, 0, $Row->{$_})};
  8         11  
  8         16  
2541              
2542             }
2543             else ## Move backward (to lower / earlier row num)
2544             {
2545             ## Delete from the higher position and insert into lower.
2546 3         5 my $Row = {};
2547 3         7 foreach (@$Fields) {$Row->{$_} = splice(@{$this->col($_)}, $Old , 1 )};
  12         12  
  12         26  
2548 3         6 foreach (@$Fields) { splice(@{$this->col($_)}, $New , 0, $Row->{$_})};
  12         14  
  12         24  
2549              
2550             }
2551              
2552             ## After moving the rows, we need to adjust the _Selection if
2553             ## present. Row numbers outside the shuffled range stay the same;
2554             ## the moved row number(s) change; others (inside the range) get
2555             ## shifted down or up by 1.
2556              
2557 5 100       17 if ($Old < $New) ## Move forward / higher / later
2558             {
2559 6 100       25 $this->{_Selection} =
    50          
    100          
2560             [map
2561             {
2562 2         5 ($_ == $Old ? $New - 1 : ## Moved row: change num to new - 1
2563             ($_ < $Old ? $_ : ## Less than $Old: no change
2564             ($_ >= $New ? $_ : ## Grtr= than $New: no change
2565             $_ - 1))) ## In range: shift down by 1.
2566 2 50       7 } @{ $this->{_Selection}}]
2567             if $this->{_Selection};
2568            
2569             }
2570             else ## Move backward / lower / earlier
2571             {
2572 9 100       34 $this->{_Selection} =
    100          
    100          
2573             [map
2574             {
2575 3         7 ($_ == $Old ? $New : ## Moved row: change num to new
2576             ($_ >= $Old ? $_ : ## Grtr= than $Old: no change
2577             ($_ < $New ? $_ : ## Less than $New: no change
2578             $_ + 1))) ## In range: shift up by 1.
2579 3 50       10 } @{ $this->{_Selection}}]
2580             if $this->{_Selection};
2581             }
2582              
2583 8         25 done:
2584             return;
2585             }
2586              
2587             sub row_empty
2588             {
2589 2     2 0 9 my $this = shift;
2590              
2591 2         5 my $Fields = $this->fieldlist();
2592 2         4 my $Row = {}; @$Row{@$Fields} = undef;
  2         10  
2593              
2594 2         12 return($Row);
2595             }
2596              
2597             sub row_exists
2598             {
2599 14     14 0 38 my $this = shift;
2600 14         16 my ($Num) = @_;
2601              
2602 14   100     46 return(($Num >= 0) && ($Num < $this->length()));
2603             }
2604              
2605             sub rows
2606             {
2607 4     4 0 13 my $this = shift;
2608 4         6 my ($Nums) = @_;
2609            
2610 4         7 return([map {$this->row($_)} @$Nums]);
  8         20  
2611             }
2612              
2613             sub row_list
2614             {
2615 19     19 0 33 my $this = shift;
2616 19         26 my ($Num, $Fields) = @_;
2617              
2618             ## $Fields argument is optional and defaults to fieldlist();
2619 19   66     54 $Fields ||= $this->fieldlist();
2620              
2621 19         32 my $Row = [map {$this->col($_)->[$Num]} @$Fields];
  105         193  
2622 19         193 return($Row);
2623             }
2624              
2625             sub row_list_set
2626             {
2627 6     6 0 39 my $this = shift;
2628 6         9 my ($Num, $Fields, $Vals) = @_;
2629              
2630             ## $Fields argument is optional and defaults to fieldlist();
2631 6   66     19 $Fields ||= $this->fieldlist();
2632              
2633             ## $Vals is optional and defaults to [].
2634 6   50     10 $Vals ||= [];
2635              
2636             ## Pre-extend the table to accommodate row $Num if necessary.
2637 6 50       14 $this->length($Num + 1) unless $this->length() >= $Num + 1;
2638            
2639             ## Set the $Vals in row $Num in the order given by $Fields.
2640 6         15 foreach (0..$#$Fields) {$this->col($Fields->[$_])->[$Num] = $Vals->[$_]}
  18         35  
2641             }
2642              
2643             =pod
2644              
2645             =head1 ROW / RECORD COUNT (TABLE LENGTH)
2646              
2647             ## Getting or setting table length
2648              
2649             $t->length() ## Get length
2650             $t->length_get()
2651              
2652             $t->length(22) ## Set length (truncate or pre-extend)
2653             $t->length_set(22)
2654              
2655             $t->extend() ## Set length of all columns to match longest
2656              
2657             The length* methods assume the table already has columns of equal
2658             length. So the length of the table is the length of any field taken
2659             at random. We choose the first one in the field list.
2660              
2661             Setting the length will truncate or pre-extend every column in the
2662             table to a given length as required.
2663              
2664             (Pre-extending means setting each column's length via $# so that it
2665             has the correct number of entries already allocated (and filled with
2666             undef) so that operations that fill up the table can be done much more
2667             quickly than with push().
2668              
2669             However, if a new column has been added directly, or a table has been
2670             constructed out of columns whose length may not initially match, the
2671             extend() method may be (should be) called to inspect all columns and
2672             extend them all to match the longest one. Note that extend() operates
2673             on all fields in the object, ignoring the custom _FieldList if any.
2674              
2675             The length of a table with no columns is zero.
2676              
2677             =cut
2678              
2679             sub length
2680             {
2681 470     470 0 743 my $this = shift;
2682 470         560 my ($Length) = @_;
2683            
2684 470 100       1172 return(defined($Length) ?
2685             $this->length_set($Length) :
2686             $this->length_get());
2687             }
2688              
2689             sub length_get
2690             {
2691 468     468 0 581 my $this = shift;
2692              
2693 468         784 my $FieldList = $this->fieldlist();
2694 468         722 my $FirstField = $FieldList->[0];
2695 468         869 my $Col = $this->{$FirstField};
2696 468 100       1119 my $Length = (ref($Col) eq 'ARRAY' ? @$Col+0 : 0);
2697              
2698 468         1835 return($Length);
2699             }
2700              
2701             sub length_set
2702             {
2703 26     26 0 36 my $this = shift;
2704 26         35 my ($Length) = @_;
2705              
2706             ## Apply the length-setting logic to any field found in the hash
2707             ## OR listed in the field list. They will all be created if not
2708             ## already present.
2709              
2710 26         27 my $FieldList = [@{$this->fieldlist_all()}, @{$this->fieldlist()}];
  26         49  
  26         51  
2711            
2712 26         60 foreach my $FieldName (@$FieldList)
2713             {
2714 223         254 $#{$this->col($FieldName)} = ($Length - 1); ## $Length = 0 => $# = -1 => empty list.
  223         357  
2715             };
2716              
2717             ## Since records might have been deleted, re-validate the
2718             ## _Selection, if it is present.
2719              
2720 26         62 $this->selection_validate();
2721            
2722 26         65 return($Length);
2723             }
2724              
2725             sub extend
2726             {
2727 22     22 0 32 my $this = shift;
2728 22         29 my $Length = 0;
2729              
2730             ## Find the length of the longest vector...
2731            
2732 22         28 foreach (@{$this->fieldlist_all()}) {$Length = max($Length, $#{$this->{$_}} + 1)};
  22         41  
  114         132  
  114         265  
2733              
2734             ## ...and set them all to be that length.
2735              
2736 22         64 $this->length_set($Length);
2737             }
2738              
2739             =pod
2740              
2741             =head1 SELECTIONS
2742              
2743             ## Getting or setting the custom selection list itself (_Selection)
2744              
2745             $t->selection() ## Get sel if any; else all()
2746             $t->selection_get()
2747              
2748             $t->selection($List) ## Set sel (list of rec nums)
2749             $t->selection_set($List)
2750            
2751             $t->selection(0) ## Remove sel (select all)
2752             $t->selection_set(undef)
2753             $t->selection_delete()
2754             $t->select_all()
2755              
2756             $t->selection_inverse() ## Get inverse copy of selection
2757             $t->select_inverse() ## Invert the selection
2758              
2759             $t->selection_validate() ## Remove invalid #s from sel
2760              
2761             ## List of all rec nums present (regardless of selection)
2762              
2763             $t->all()
2764              
2765             ## Getting or setting just selected fields in columns
2766             ## (as contrasted with col() and friends).
2767              
2768             $t->sel($ColName) ## Get col but only records in sel
2769             $t->sel_get($ColName)
2770              
2771             $t->sel($ColName, $ListRef) ## Set selected fields in col...
2772             $t->sel_set($ColName, $ListRef) ##... in selection order
2773              
2774             $t->sel_set($ColName) ## Set selected fields to undef
2775             $t->sel_clear($ColName)
2776              
2777             $t->sels($ColList) ## Like cols, but selected fields
2778             $t->sels_hash($ColList) ## " " cols_hash()... " " " "
2779              
2780             ## Finding out size of selection (number of rows)
2781              
2782             $t->sel_len() ## Get number of selected rows.
2783              
2784             A selection is an ordered list of record numbers. The record numbers
2785             in the selection may be a subset of available records. Furthermore,
2786             they may be in non-record-number order, indicating that the records
2787             have been sorted.
2788              
2789             Record numbers are numeric array indices into the columns in the
2790             table. It is an error for any selection list to contain an index less
2791             than zero or greater than (length() - 1), so if you set a selection
2792             explicitly, be careful.
2793              
2794             Any selection list you get or set belongs to the object. Be careful
2795             of modifying its contents.
2796              
2797             The custom selection, if any, is stored internally in the _Selection
2798             parameter. If this parameter is absent, the selection defaults to
2799             all() -- i.e. a list of all record numbers, in order:
2800             [0..($this->length() - 1)] (which becomes [] if length() is 0).
2801              
2802             REMEMBER: length() is one-based, but record numbers are zero-based.
2803              
2804             Removing the selection (that is, removing the LIST itself of which
2805             records are selected), is the same as selecting all records.
2806             consequently, selection(0), selection_delete(), and select_all() are
2807             all synonymous.
2808              
2809             selection_validate() removes any entries from the current _Selection
2810             list (if any) that are not valid record numbers -- i.e. it removes any
2811             record whose integer value is < 0 or greater than length() - 1. This
2812             routine is mainly used by other methods that might delete records,
2813             such as length_set().
2814              
2815             Getting or setting just selected data from columns
2816              
2817             Sometimes, you don't want to get/set entire columns, you instead want
2818             to get or set data in just the selected fields in a column.
2819              
2820             The sel(), sel_get(), sel_set(), sels() and sels_hash() methods are
2821             analagous to the corresponding col(), ... cols_hash() methods except
2822             in these two ways:
2823              
2824             - the 'sels' variants get or set just selected data, as determined by
2825             the current selection(), which gives an ordered list of the selected /
2826             sorted records.
2827              
2828             - the 'sels' variants all make COPIES of the data you request or
2829             supply -- the data is copied out of or into the correspnding column.
2830             So, you "own" any vector you pass or receive in reply.
2831              
2832             So, for example, imagine you have just set selection() to only list
2833             record numbers where the LastName field is not empty. Then you have
2834             called sort() to sort those record numbers by the LastName field. You
2835             could then call $t->sel('LastName') to get a sorted list of all
2836             non-empty last names.
2837              
2838             It might be helpful to think of "sel" as short for "selected". So
2839             $t->sel('LastName') would mean "get the selected field values from the
2840             LastName field".
2841              
2842             =cut
2843              
2844             sub selection
2845             {
2846 477     477 1 673 my $this = shift;
2847 477         597 my ($Selection) = @_;
2848              
2849             ## Set if specified.
2850 477 100       842 $this->selection_set($Selection) if defined($Selection);
2851            
2852             ## Get and return.
2853 477         770 $Selection = $this->selection_get();
2854              
2855 477         1178 return($Selection);
2856             }
2857              
2858             sub selection_get
2859             {
2860 478     478 0 546 my $this = shift;
2861              
2862 478   66     1353 my $Selection = $this->{_Selection} || $this->selection_default();
2863              
2864 478         784 return($Selection);
2865             }
2866              
2867             sub selection_set
2868             {
2869 159     159 0 186 my $this = shift;
2870 159         185 my ($Selection) = @_;
2871              
2872 159 100       295 if (ref($Selection) eq 'ARRAY')
2873             {
2874             ## Set if specified...
2875 36         69 $this->{_Selection} = $Selection;
2876             }
2877             else
2878             {
2879             ## Otherwise, delete and return original if any.
2880 123         246 $Selection = delete $this->{_Selection};
2881             }
2882            
2883 159         288 return($Selection);
2884             }
2885              
2886             sub selection_delete
2887             {
2888 2     2 0 5 my $this = shift;
2889 2         6 $this->selection_set();
2890             }
2891              
2892             sub select_all
2893             {
2894 70     70 0 115 my $this = shift;
2895 70         167 $this->selection_set();
2896             }
2897              
2898             sub select_none
2899             {
2900 4     4 0 22 my $this = shift;
2901 4         10 $this->selection_set([]);
2902             }
2903              
2904             sub selection_default
2905             {
2906 322     322 0 344 my $this = shift;
2907              
2908 322         8686 my $Selection = $this->all();
2909              
2910 322         885 return($Selection);
2911             }
2912              
2913             sub all
2914             {
2915 343     343 0 384 my $this = shift;
2916 343         580 my $RowNums = [0..($this->length() - 1)];
2917              
2918 343         587 return($RowNums);
2919             }
2920              
2921             sub selection_inverse
2922             {
2923 9     9 0 14 my $this = shift;
2924 9         15 my $Sel = $this->selection();
2925 9         18 my $All = $this->all();
2926              
2927 9         18 @$All[@$Sel] = undef;
2928 9         17 $All = [grep {defined} @$All];
  27         54  
2929              
2930 9         34 return($All);
2931             }
2932              
2933             sub select_inverse
2934             {
2935 1     1 0 2 my $this = shift;
2936              
2937 1         4 return($this->{_Selection} = $this->selection_inverse());
2938             }
2939              
2940             sub selection_validate
2941             {
2942 27     27 0 29 my $this = shift;
2943              
2944 27 100       84 if (ref($this->{_Selection}) eq 'ARRAY')
2945             {
2946 1         4 $this->{_Selection} = $this->selection_validate_internal($this->{_Selection});
2947             }
2948             }
2949              
2950             sub selection_validate_internal
2951             {
2952 19     19 0 31 my $this = shift;
2953 19         29 my ($Selection) = @_;
2954              
2955 19         37 my $Length = $this->length();
2956              
2957 19 100       39 $Selection = [grep {(($_ >= 0) && ($_ < $Length))} @$Selection];
  71         299  
2958              
2959 19         43 return($Selection);
2960             }
2961              
2962             sub sel ## ($ColName, [$Vector])
2963             {
2964 284     284 0 474 my $this = shift;
2965 284         380 my ($ColName, $Vector) = @_;
2966              
2967             ## Set if specified.
2968 284 50       447 if (defined($Vector))
2969             {
2970 0         0 $this->sel_set($ColName, $Vector);
2971             ## Nothing to return.
2972             }
2973             ## Get and return.
2974             else
2975             {
2976 284         521 my $Sel = $this->sel_get($ColName);
2977 284         798 return($Sel);
2978             }
2979             }
2980              
2981             sub sel_get
2982             {
2983 318     318 0 500 my $this = shift;
2984 318         473 my ($ColName, $Selection) = @_;
2985              
2986 318         646 my $Col = $this->col($ColName);
2987 318   66     1098 $Selection ||= $this->selection();
2988 318         1083 my $Sel = [@$Col[@$Selection]];
2989            
2990 318         601 return($Sel);
2991             }
2992              
2993             sub sel_set ## ($ColName, [$Vector])
2994             {
2995 5     5 0 18 my $this = shift;
2996 5         7 my ($ColName, $Vector) = @_;
2997              
2998 5         11 my $Col = $this->col($ColName);
2999 5         11 my $Selection = $this->selection();
3000              
3001 5 100 66     24 if (defined($Vector) && (ref($Vector) eq 'ARRAY'))
3002             {
3003 3         13 @$Col[@$Selection] = @$Vector;
3004             }
3005             else
3006             {
3007 2         6 @$Col[@$Selection] = undef;
3008             }
3009             }
3010              
3011             sub sel_clear ## ($ColName)
3012             {
3013 2     2 0 8 my $this = shift;
3014 2         3 my ($ColName) = @_;
3015            
3016 2         6 $this->sel_set($ColName);
3017             }
3018              
3019             sub sel_len
3020             {
3021 20     20 0 86 my $this = shift;
3022            
3023 16         50 return(ref($this->{_Selection}) eq 'ARRAY' ?
3024 20 100       51 @{$this->{_Selection}}+0 :
3025             $this->length());
3026             }
3027              
3028             sub sels ## ($ColNames)
3029             {
3030 56     56 0 71 my $this = shift;
3031 56         76 my ($ColNames) = @_;
3032 56   66     128 $ColNames ||= $this->fieldlist();
3033 56         196 my $Sels = [map {$this->sel($_)} @$ColNames];
  216         427  
3034              
3035 56         268 return($Sels);
3036             }
3037              
3038             sub sels_hash ## ($ColNames)
3039             {
3040 4     4 0 8 my $this = shift;
3041 4         7 my ($ColNames) = @_;
3042 4   66     18 $ColNames ||= $this->fieldlist();
3043 4         9 my $Sels = $this->sels($ColNames);
3044 4         8 my $SelsHash = {}; @$SelsHash{@$ColNames} = @$Sels;
  4         16  
3045              
3046 4         35 return($SelsHash);
3047             }
3048              
3049             =pod
3050              
3051             =head1 SEARCHING / SELECTING RECORDS
3052              
3053             ## Modifying the table's custom selection (_Selection)
3054              
3055             $t->select_all() ## Set _Selection = $t->all() or undef
3056             $t->select_none() ## Set _Selection = []
3057             $t->select_inverse() ## Invert the curr. sel. (and get it)
3058              
3059             ## Specific searches: "the select() methods"
3060              
3061             $t->select($Field1=>$Sub1, ## Del nonmatching recs from sel.
3062             $Field2=>$Sub2, ## i.e. narrow sel. to match
3063             ...);
3064              
3065             $t->omit ($Field1=>$Sub1, ## Del matching recs from sel.
3066             $Field2=>$Sub2,
3067             ...);
3068              
3069             $t->add ($Field1=>$Sub1, ## Add matching recs to sel.
3070             $Field2=>$Sub2,
3071             ...);
3072              
3073             $t->but ($Field1=>$Sub1, ## Add nonmatching recs to sel.
3074             $Field2=>$Sub2,
3075             ...);
3076              
3077             ## Getting useful lists of record numbers...
3078              
3079             $t->all() ## Get "full" sel. (all record #s)
3080             $t->selection() ## Get current selection
3081             $t->selection_inverse() ## Get inverse copy of curr. sel.
3082              
3083             ## Example 1: Refine a selection by narrowing down...
3084              
3085             $t->select_all()
3086             $t->select(Field1 => sub {$_});
3087             $t->select(Field2 => sub {$_});
3088             $t->select(Field3 => sub {$_});
3089              
3090             ## Example 2: Manually refine and set the selection...
3091              
3092             $Sel = [grep {$t->col($Field1)->[$_]} @{$t->all ()}];
3093             $Sel = [grep {$t->col($Field2)->[$_]} @$Sel];
3094             $Sel = [grep {$t->col($Field3)->[$_]} @$Sel];
3095             $t->selection($Sel); ## Set the selection when done.
3096              
3097             ## Example 3: Complex manual search using calculated value
3098            
3099             my $A = $t->col('A');
3100             my $B = $t->col('B');
3101             my $S = [grep
3102             {my $X = $A->[$_] + $B->[$_]; ($X > 100 && $X < 200);}
3103             @{$t->all()}]; ## Or could start with $t->selection().
3104             $t->selection($S); ## Set the selection when done.
3105              
3106             ## Example 4: Refine a selection by building up...
3107              
3108             $t->select_none()
3109             $t->add(Field1 => sub {$_});
3110             $t->add(Field2 => sub {$_});
3111             $t->add(Field3 => sub {$_});
3112              
3113             ## Example 5: Combining the select() methods to build a query...
3114              
3115             $t->select_all()
3116             $t->select(Status => sub {/prime/i });
3117             $t->omit (DueDate => sub {$_ > $Today});
3118             $t->add (Force => sub {$_ });
3119              
3120             select() and its friends omit(), add(), and but(), known collectively
3121             as "the select() methods," all work similarly: they take a series of
3122             one or more pairs indicating matches to be done, where each match is
3123             specified as (FieldName => Subroutine).
3124              
3125             In addition to the field names already present in the table, the
3126             FieldName in any Spec may also be one of these two special
3127             pseudo-fields:
3128              
3129             =over 4
3130              
3131             =item _RecNum
3132              
3133             the record number of the record being compared
3134              
3135             =item _SelNum
3136              
3137             the numerical position of the record being compared within the
3138             previous selection (only usable with select() and omit() since add()
3139             and but() by definition operate on non-selected records).
3140              
3141             =back
3142              
3143             For example:
3144              
3145             ## Match 2nd 100 rec numbers
3146             $t->select(_RecNum => sub {$_ >= 100 && $_ <= 199});
3147              
3148             ## Match 2nd 100 currently selected/sorted items
3149             $t->select(_SelNum => sub {$_ >= 100 && $_ <= 199});
3150              
3151             Be careful when using _SelNum in a search. In the above _SelNum search
3152             example, since the selection itself will be modified by select(), the
3153             items that were formerly selection items 100 - 199 will now be _SelNum
3154             0 - 99 in the new selection.
3155              
3156             The Subroutine is an anonymous grep-style predicate that operates on
3157             $_ and should return true/false to indicate a match with an element of
3158             the field FieldName.
3159              
3160             The order of multiple matches in a single method call is significant
3161             only in that the searches can be faster if the field that will match
3162             the fewest records is listed first.
3163              
3164             A given FieldName may be listed in the specs more than once if it has
3165             multiple search criteria that you prefer to execute as multiple
3166             subroutines (though it would be more efficient on very large tables to
3167             combine their logic into one subroutine joined with "&&").
3168              
3169             Each field match will be applied (with an implied AND joining them) to
3170             determine whether the record itself matches. Then, based on whether
3171             the record itself matches, it will either be added or deleted from the
3172             selection based on which method is being called:
3173              
3174             method... operates on... action....
3175             ------------------------------------------------------------------
3176             select() selected records Keep only recs that DO match
3177             omit() selected records Keep only recs that DO NOT match
3178             add() non-selected recs Add recs that DO match
3179             but() non-selected recs Add recs that DO NOT match
3180              
3181             Here's how to think about what's going on:
3182              
3183             methods... think...
3184             ------------------------------------------------------------------
3185             select() "SELECT things matching this"...
3186             omit() "... then OMIT those matching this."
3187              
3188             select() "SELECT things matching this"...
3189             add() "... and ADD any others matching this."
3190              
3191             select() "SELECT things matching this"...
3192             but() "... and add any others BUT those matching this."
3193              
3194             select() and omit() both NARROW the selection.
3195              
3196             add() and but() both INCREASE the selection.
3197              
3198             IMPORTANT: You DO NOT need to use these select() routines to work with
3199             selections. It may be much easier for you to clarify your logic, or
3200             more efficient to express your search, using a single grep or series
3201             of grep operations as in Examples 2 or 3 above.
3202              
3203             Building the selection manually is required if you want to filter
3204             based on any COMPLEX RELATIONSHIPS BETWEEN FIELDS. For example, if
3205             you want to add two fields and match or reject the record based on the
3206             sum of the fields.
3207              
3208             In Example 3 above, we add the values in fields "A" and "B" and then
3209             match the record only if the SUM is between 100 and 199. By grepping
3210             to produce a subset of @{$t->all()}, you end up with a Selection -- a
3211             list of record numbers you want "selected". Then you call
3212             $t->selection() to put the selection you built into the object.
3213              
3214             If you had instead wanted to narrow an existing selection in the above
3215             example, you would start with $t->selection() (which defaults to
3216             $t->all()) instead of starting with $t->all().
3217              
3218             Each of the select() methods returns $this->selection() as a
3219             convenience.
3220              
3221             =head2 The effects of modifying a sorted selection
3222              
3223             Generally, you should sort AFTER finding, and you should not generally
3224             rely on sort order after doing a find. But in case you want to know,
3225             the following chart explains what happens to the sort order after the
3226             various select() commands are called (at least in the current
3227             implementation, which may change without notice):
3228              
3229             method... effect on an existing sort order...
3230             ------------------------------------------------------------------
3231             select() relative sort order is preserved (stay sorted)
3232             omit() all selected recs restored to "natural" order (unsorted)
3233             add() orig. recs preserved; new recs appended: "natural" order
3234             but() orig. recs preserved; new recs appended: "natural" order
3235              
3236             In other words, you could sort() first and then call select() to
3237             narrow down the selection repeatedly without disrupting the sort
3238             order. However, any of the other methods will disrupt the sort order
3239             and you would need to re-sort. The preservation of order when using
3240             select(), and other sort order effects, are likely but not guaranteed
3241             to be preserved in future implementations.
3242              
3243             =head2 Hints about Boolean logic
3244              
3245             Consider the following example and the alternative below it. You
3246             might initially think these are equivalent, but they're not:
3247              
3248             ## Case 1:
3249              
3250             my $Sel = $t->add(Force => sub {$_ == 1 });
3251             my $Sel = $t->add(Status => sub {$_ eq 'Prime'});
3252              
3253             ## Case 2:
3254              
3255             my $Sel = $t->add(Force => sub {$_ == 1 },
3256             Status => sub {$_ eq 'Prime'});
3257              
3258             Case 1 extends the selection by adding all records where Force == 1,
3259             and then extends it again by adding all additional records where
3260             Status eq 'Prime'.
3261              
3262             Case 2 adds only those records where: Force == 1 AND ALSO, IN THE SAME
3263             RECORD, Status eq 'Prime'.
3264              
3265             One final note about logic. This is not SQL and these select() etc.
3266             routines are not meant to replace the full power of a programming
3267             language.
3268              
3269             If you want full Boolean expressions, use the power of Perl to form
3270             your own arbitrarily complex query using grep as in Example 3 above.
3271              
3272             Writing your own grep is also almost always faster than chaining the
3273             builtin select() methods or using multiple Field / Sub specifications,
3274             so keep that in mind when working with extremely large data sets.
3275              
3276             With tables of only a few thousand records or so, you probably won't
3277             notice the difference in efficiency.
3278              
3279             =cut
3280              
3281             {}; ## Get emacs to indent correctly.
3282              
3283 4     4 0 35 sub select {my $this = shift; return($this->select_internal(!'Add', !'Not', @_))}
  4         13  
3284 5     5 0 29 sub omit {my $this = shift; return($this->select_internal(!'Add', 'Not', @_))}
  5         19  
3285 3     3 0 23 sub add {my $this = shift; return($this->select_internal( 'Add', !'Not', @_))}
  3         8  
3286 1     1 0 8 sub but {my $this = shift; return($this->select_internal( 'Add', 'Not', @_))}
  1         4  
3287              
3288             sub select_internal ## Implements all 4 "select() methods"
3289             {
3290 13     13 0 23 my $this = shift;
3291 13         34 my ($Add, $Not, @Specs) = @_;
3292            
3293             ## In "Add" mode, we only operate on not-yet-selected records.
3294             ## Otherwise, we operate on the current selection.
3295              
3296             ## Either way, start out with all of one or the other.
3297              
3298 13 100       40 my $Start = ($Add ? $this->selection_inverse() : $this->selection());
3299              
3300             ## Then grep repeatedly for each Field/Sub spec we were given, in
3301             ## order. For a record to match, ALL specs must match -- i.e. the
3302             ## record number must make it through the grep gauntlet once for
3303             ## each Field/Sub in the Specs.
3304            
3305 13         22 my $Pseudo = {}; ## hold pseudo-columns _RecNum, _SelNum if requested.
3306            
3307 13         19 my $Matches = $Start;
3308            
3309 13         18 my $i = 0;
3310 13         36 while ($i < (@Specs - 1))
3311             {
3312 13         31 my ($Field, $Sub) = @Specs[$i++, $i++];
3313            
3314 13 50 33     87 next unless ((length($Field)) && (ref($Sub) eq 'CODE'));
3315              
3316             ## Create pseudo-fields _RecNum / _SelNum if needed, but at
3317             ## most once per invocation.
3318              
3319 13 50 0     33 $Pseudo->{_RecNum} ||= $this->all() if ($Field eq '_RecNum');
3320 13 50 0     28 $Pseudo->{_SelNum} ||= $this->selnum_mask() if ($Field eq '_SelNum');
3321            
3322             ## Narrow down $Matches using this Field/Spec, then move on to next.
3323 36 50 33     142 $Matches =
3324             [grep
3325             {
3326             ## Locally bind $_ to value of field in this column / this record.
3327 13         23 local $_ = $ {($this ->{$Field} || ## 98% of time: good field name
  36         178  
3328             $Pseudo->{$Field} || ## 1% of time: _RecNum or _SelNum
3329             ($this ->warn("Bad field name: $Field"),
3330             $this ->col_empty())
3331             )}[$_]; ## look up value in record $_ of column
3332            
3333             ## Call the sub & let it yield the Boolean value.
3334 36         73 &$Sub();
3335             }
3336             @$Matches];
3337             }
3338            
3339             ## IMPLEMENTATION NOTE:
3340            
3341             ## The logic below to support "Not" looks complicated, and indeed
3342             ## it could be made cleaner if we were to process the "Not" logic
3343             ## during the previous step. However, doing that would make the
3344             ## above nested loop(s) above much less efficient because we'd
3345             ## have to move the while() loop inside the grep -- repeating that
3346             ## loop up to several times for each record instead of just a few
3347             ## times total. So the logic below will actually save execution
3348             ## time. Besides, using array-slicing to achieve the selection
3349             ## masking is quite fast.
3350              
3351             ## "Add" means append the matching record numbers to the existing
3352             ## selection, if any.
3353            
3354             ## sub select()
3355 13 100 100     173 if (!$Not && !$Add) ## ... i.e. remove unfound (i.e. keep (only) the ones we found)...
    100 66        
    100 66        
    50 33        
3356             {
3357 4         11 $this->{_Selection} = $Matches; ## sort order is preserved...
3358             }
3359            
3360             ## sub add() ## ... i.e. add the ones we found...
3361             elsif (!$Not && $Add)
3362             {
3363 3         5 push @{$this->selection()}, @$Matches; ## order preserved in first part only
  3         8  
3364             }
3365            
3366             ## ! "Add" means replace the existing selection with those that
3367             ## matched (effectively removing any non-matching ones).
3368              
3369             ## "Not" means select the opposite of the set of records we just
3370             ## matched.
3371              
3372             ## sub omit()
3373             elsif ( $Not && !$Add) ## ... i.e. remove the opposite of the ones we found...
3374             {
3375 5         20 my $Sel = $this->col_empty(); ## Start with empty mask (all entries undef).
3376 5         17 @$Sel[@$Start] = @$Start; ## Mask in those in the original selection.
3377 5         14 @$Sel[@$Matches] = undef; ## Mask out those we found.
3378 5         13 my $NonMatches = [grep {defined} @$Sel]; ## The remaining ones are the non-matches.
  15         32  
3379              
3380 5         19 $this->{_Selection} = $NonMatches; ## The new selection IS the non-matches.
3381              
3382             ## selection order not preserved
3383             }
3384              
3385             ## sub but()
3386             elsif ( $Not && $Add) ## ... i.e. add the opposite of the ones we found...
3387             {
3388 1         5 my $Sel = $this->all(); ## Start with a full selection mask.
3389 1         2 @$Sel[@{$this->selection()}] = undef; ## Mask out those in original selection.
  1         5  
3390 1         4 @$Sel[@$Matches ] = undef; ## Mask out those we found.
3391 1         3 my $NonMatches = [grep {defined} @$Sel]; ## The remaining ones are the non-matches.
  3         8  
3392            
3393 1         3 push @{$this->selection()}, @$NonMatches; ## Add the non-matches to the selection.
  1         4  
3394              
3395             ## order preserved in first part only
3396             }
3397              
3398 13         28 return($this->selection());
3399             }
3400              
3401             sub selnum_mask ## Create mask mapping RecNum -> selected item num or undef if not selected
3402             {
3403 0     0 0 0 my $this = shift;
3404 0         0 my $Mask = $this->col_empty();
3405 0         0 my $Sel = $this->selection();
3406 0         0 @$Mask[@$Sel] = [0..$#$Sel];
3407              
3408 0         0 return($Mask);
3409             }
3410              
3411             =pod
3412              
3413             =head1 SORTING
3414              
3415             ## Sort the current table's _Selection
3416              
3417             $t->sort() ## Use existing/default params
3418             $t->sort([qw(Last First Phone)]) ## Specify _SortOrder (fields)
3419             $t->sort( ## Named-parameter call:
3420             _SortOrder => [...], ## override sort-related params.
3421             _Selection => [...], ## (See param lists above).
3422             _SortSpecs => {...},
3423             _SRoutines => {...},
3424             _DefaultSortType=>'Integer',
3425             _DefaultSortDirection=>-1,
3426             );
3427              
3428             The sort() method modifies the _Selection (creating one with all
3429             records if it was missing, undef, or not supplied by caller) so that
3430             the record numbers listed there are sorted according to the criteria
3431             implied by _SortOrder, _SortSpecs, _SRoutines, etc.
3432              
3433             For example, before sorting, a table's "natural" order might be:
3434              
3435             Rec# First Last Age State
3436             0 Chris Zack 43 CA
3437             1 Marco Bart 22 NV
3438             2 Pearl Muth 15 HI
3439              
3440             ... and its selection() method would yield: [0, 1, 2] -- which is a
3441             list of all the records, in order.
3442              
3443             After calling $t->sort([Last]), selection() would yield [1, 2, 0]. So
3444             displaying the table in "selection" order would yield:
3445              
3446             Rec# First Last Age State
3447             1 Marco Bart 22 NV
3448             2 Pearl Muth 15 HI
3449             0 Chris Zack 43 CA
3450              
3451             IMPORTANT: sorting does not alter any data in the table. It merely
3452             alters the _Selection parameter (which you can then get and set using
3453             the selection() methods described above).
3454              
3455             If you want to permanently alter the table's data in memory so that
3456             the new sorted order becomes the "natural" order, you can use the
3457             cull() method to modify the original object, the snapshot() method to
3458             make a new object, or use the write() method to write the data to disk
3459             in selected/sorted order and then read() it back again.
3460              
3461             =head2 Using the Named-parameter calling convention with sort()
3462              
3463             You may specify any combination of the parameters listed above when
3464             calling sort(). Any you specify will be used IN PLACE OF the
3465             corresponding parameters already found in the object.
3466              
3467             If you specify _Selection using the named-parameter calling, the
3468             sort() method reserves the right to "own" the list you provide, and
3469             use it as the object's new _Selection, possibly discarding the
3470             previous _Selection, if any and modifying the one you provided. So
3471             don't make any assumptions about ownership of that list object after
3472             calling sort(). Luckily, you will rarely need to provide _Selection
3473             explicitly since generally you'll want to be sorting the selection()
3474             already inherent in the object.
3475              
3476             sort() returns the _Selection list owned by the object (the same list
3477             that would be returned if you called the selection() method
3478             immediately after calling sort()).
3479              
3480             See the next sections for complete descriptions of _SortOrder and
3481             other sorting parameters.
3482              
3483             =cut
3484              
3485             {}; ## Get emacs to indent correctly.
3486              
3487             sub sort
3488             {
3489 10     10 1 64 my $this = shift;
3490 10 100       43 my $Params = (@_ == 1 ? {_SortOrder => $_[0]} : {@_});
3491              
3492 60         111 my($SortOrder, $Selection, $SortSpecs, $SRoutines, $DefaultSortType, $DefaultSortDirection)
3493 10         23 = map {$this->getparam($Params, $_)}
3494             qw(_SortOrder _Selection _SortSpecs _SRoutines _DefaultSortType _DefaultSortDirection);
3495            
3496             ## Validate / rectify all parameters...
3497              
3498             ## Default sort order is Record Number
3499              
3500 10 50 33     69 $SortOrder = [qw(_RecNum)] unless ((ref($SortOrder) eq 'ARRAY') && @$SortOrder);
3501              
3502             ## Note if we're going to sort on _RecNum ( requires extra work).
3503              
3504 10         18 my $NeedRecNum = grep {$_ eq '_RecNum'} @$SortOrder;
  14         38  
3505              
3506             ## Default list of record numbers is all of them.
3507              
3508 10 100       34 $Selection = $this->selection() unless (ref($Selection) eq 'ARRAY');
3509 10         29 $Selection = $this->selection_validate_internal($Selection);
3510              
3511             ## Our private copy of SortSpecs includes a spec for _RecNum
3512              
3513 10 50       32 $SortSpecs = {} unless (ref($SortSpecs) eq 'HASH');
3514 10         24 $SortSpecs = {%$SortSpecs};
3515 10 50 0     24 $SortSpecs ->{_RecNum} ||= {SortType => '_RecNum', SortDirection => 1} if $NeedRecNum;
3516              
3517             ## Our private copy of SRoutines also has the builtin entries
3518             ## added in (including one for _RecNum)
3519              
3520 10 50       30 $SRoutines = {} unless (ref($SRoutines) eq 'HASH');
3521 10         20 $SRoutines = {%$SRoutines, %{$this->sortroutines_builtin()}};
  10         26  
3522              
3523             ## Ensure that DefaultSortType has a reasonable value for which we
3524             ## have a sort routine.
3525            
3526 10 50 33     77 $DefaultSortType = 'String' unless (length($DefaultSortType) &&
3527             exists($SRoutines->{$DefaultSortType}));
3528            
3529             ## Ensure that DefaultSortDirection has a legal value (1 or -1;
3530             ## undef/0 will be treated as -1 (descending))
3531            
3532 10   50     37 $DefaultSortDirection = (max(min(int($DefaultSortDirection), 1), -1) || -1);
3533            
3534             ## Make some optimized lists of things to speed sorting.
3535              
3536             ## Get a hash of all data columns in $this plus a temporary one
3537             ## for _RecNum if needed.
3538 10 50       13 my $Cols = {%{$this->cols_hash($this->fieldlist_all())},
  10         26  
3539             ($NeedRecNum ? (_RecNum => $this->all()) : ())};
3540            
3541             ## Get a list mapping field numbers in $SortOrder to data columns
3542             ## Get a list mapping field numbers in $SortOrder to sort directions
3543             ## Get a list mapping field numbers in $SortOrder to sort types
3544             ## Get a list mapping field numbers in $SortOrder to sort routines
3545             ## Get a list of the field numbers in $SortOrder
3546              
3547 10 50       43 my $SortCols = [map { $Cols->{$_} || $this->col($_) } @$SortOrder];
  14         53  
3548 10 100       21 my $SortDirs = [map {$ {$SortSpecs->{$_}||{}}{SortDirection} || $DefaultSortDirection} @$SortOrder];
  14 100       17  
  14         108  
3549 10 100       18 my $SortTypes = [map {$ {$SortSpecs->{$_}||{}}{SortType } || $DefaultSortType } @$SortOrder];
  14 100       20  
  14         104  
3550 10 50 33 0   17 my $SortSubs = [map { $SRoutines->{$_} || $SRoutines->{'String'} || sub {0} } @$SortTypes];
  14         58  
  0         0  
3551 10         29 my $FieldNums = [0 .. $#$SortOrder];
3552              
3553             ## Construct a sort subroutine that sorts record numbers by
3554             ## examining values in given fields in the table, in the order
3555             ## specified in $SortOrder.
3556              
3557             ## If a given field's sort routine produces a zero $CmpVal, it
3558             ## means that the values are considered equal, and so to
3559             ## disambiguate, we keep trying the next fields in the sort order,
3560             ## until we've found one that compares non-zero or exhausted all
3561             ## the fields. If we get through all the specified sort fields
3562             ## and still get zeroes, the values must be equal in all the
3563             ## fields, and so the records are considered equal, so return 0.
3564              
3565 10         18 my $ProgCount;
3566             my $ShowedProgress;
3567 58         60 $Selection =
3568             [sort
3569             {
3570 10         30 my $CmpVal;
3571 58         89 foreach (@$FieldNums)
3572             {
3573             ## $a and $b are record numbers to be compared.
3574             ## $_ is the number of a field in the above lists.
3575            
3576 62         107 $CmpVal = (&{ $SortSubs->[$_] } ## Call the sort routine for field $_ with...
  62         114  
3577             (\ $SortCols->[$_]->[$a], ## 1st arg: ref to value in field $_, record $a
3578             \ $SortCols->[$_]->[$b]) ## 2nd arg: ref to value in field $_, record $b.
3579             * $SortDirs->[$_] ); ## Then invert cmp value if descending (-1)
3580            
3581             # print "($_, $SortCols->[$_]->[$a], $SortCols->[$_]->[$b]) ==> $CmpVal\n";
3582            
3583 62 100       143 last if $CmpVal; ## Keep going if $CmpVal == 0 (same)
3584             }
3585            
3586             ## Maybe show timed progress (only after 2 seconds have elapsed)
3587 58 100       142 my $Did = $this->progress_timed("Sorting", $ProgCount, undef, undef, 1)
3588             if ((($ProgCount++) % 200) == 0);
3589 58   33     165 $ShowedProgress ||= $Did;
3590              
3591 58         87 $CmpVal;
3592             }
3593             @$Selection];
3594              
3595             ## If no progress shown yet (sort took less than 2 seconds or 200
3596             ## operations), show a message now.
3597              
3598 10 50       47 $this->progress("Sorted.") unless $ShowedProgress;
3599            
3600             ## Replace any existing selection with the new, sorted, one.
3601 10         17 $this->{_Selection} = $Selection;
3602              
3603 10         70 done:
3604             return($Selection);
3605             }
3606              
3607             =pod
3608              
3609             =head1 SORT ORDER
3610              
3611             ## Getting / Setting table's default _SortOrder
3612              
3613             $t->sortorder() ## Get sortorder (default is [])
3614            
3615             my $Order = [qw(Last First State Zip)];
3616             $t->sortorder($Order) ## Set sortorder (use [] for none)
3617              
3618             $t->sortorder_default() ## Get the object's default sort order ([])
3619              
3620             The sort order is an optional list of field names on which to sort and
3621             sub-sort the data when sorting is requested. The field names must be
3622             the names of actual columns in the table. The names in the sort order
3623             do not necessarily need to coincide with the custom fieldlist if any.
3624              
3625             There is one special value that can be included: _RecNum. This sorts
3626             on the imaginary "record number" field. So for example, you could
3627             specify a sort order this way:
3628              
3629             [qw(Last First _RecNum)]
3630              
3631             (There is no point in putting _RecNum anywhere except at the end of
3632             the sort order because no two records will ever have the same record
3633             number so there will be no further need to disambiguate by referring
3634             to additional fields.)
3635              
3636             Sorting by _RecNum adds a bit of computational overhead because sort()
3637             first builds a record number vector for use in sorting, so for very
3638             large tables, don't do it unless you really need it.
3639              
3640             A sort order can be specified each time the object is sorted (see the
3641             sort() method for details).
3642              
3643             Or, the object's sort order can be set once, and then sort() will use
3644             that sort order when no other sort order is specified.
3645              
3646             If sorting is done when there is no sort order present in the object
3647             or specifed for the sort() method, the selection is sorted by record
3648             number (i.e. it is "unsorted" or returned to its "natural" order).
3649              
3650             In order words, a sortorder that is undef or [] is considered the same
3651             as: [qw(_RecNum)]. This is sometimes called "unsorting".
3652              
3653             In order to decide how values in each field should be compared, sort()
3654             is informed by SortSpecs (specifying SortType and SortDirection for
3655             each field) and by SortRoutines, each of which may similarly either be
3656             pre-set for the object or specified when calling sort() -- see below
3657             for further details.
3658              
3659             =cut
3660              
3661             {}; ## Get emacs to indent correctly.
3662              
3663             sub sortorder
3664             {
3665 8     8 0 21 my $this = shift;
3666 8         13 my ($SortOrder) = @_;
3667              
3668 8   100     32 my $Valid = ((ref($SortOrder) eq 'ARRAY') && (@$SortOrder > 0));
3669              
3670             ## Set if specified.
3671 8 100 100     37 if (defined($SortOrder) && $Valid)
    100          
3672             {
3673 2         4 $this->{_SortOrder} = $SortOrder;
3674             }
3675             elsif (defined($SortOrder))
3676             {
3677 1         4 $this->{_SortOrder} = undef; ## Store undef instead of []
3678             }
3679            
3680             ## Get and return.
3681 8   66     33 $SortOrder = $this->{_SortOrder} || $this->sortorder_default();
3682            
3683 8         31 return($SortOrder);
3684             }
3685              
3686             sub sortorder_default
3687             {
3688 5     5 0 13 my $this = shift;
3689              
3690 5         8 my $SortOrder = [];
3691              
3692 5         19 return($SortOrder);
3693             }
3694              
3695             sub sortorder_check
3696             {
3697 63     63 0 80 my $this = shift;
3698 63         149 my $FieldsHash = $this->fieldlist_hash();
3699              
3700             ## Remove any bogus field names from the sort order, if any.
3701              
3702 4         13 $this->{_SortOrder} = [grep {exists($FieldsHash->{$_})}
  1         3  
3703 63 100       244 @{$this->{_SortOrder}}] if defined $this->{_SortOrder};
3704             }
3705              
3706             =pod
3707              
3708             =head1 SORT SPECIFICATIONS
3709              
3710             ## Getting / Setting table's default _SortSpecs
3711              
3712             $t->sortspecs() ## Get sortspecs (default is {} -- none)
3713              
3714             my $Specs = {Last => {SortType => 'String' ,
3715             SortDirection => -1 },
3716             Zip => {SortType => 'Integer' }};
3717              
3718             $t->sortspecs($Specs) ## Set sortspecs
3719              
3720             $t->sortspecs_default() ## Get the object's default sort specs ({})
3721              
3722             The sortspecs are an optional hash mapping field names to "sort
3723             specifications".
3724              
3725             Each field's sort specification may specify zero or more of these
3726             values:
3727              
3728             =over 4
3729              
3730             =item SortType
3731              
3732             the sort type to use (For example: String, Integer)
3733              
3734             =item SortDirection
3735              
3736             the sort direction (1: ascending, -1: descending)
3737              
3738             =back
3739              
3740             Sortspecs can be specified when calling the sort() routine, or, a set
3741             of specs can be placed beforehand into the object itself and those
3742             will be used by sort() if no other specs are given.
3743              
3744             For any field listed in the sort order at the time of sorting, but
3745             lacking a sort spec or any component of the sort spec, the object's
3746             default sort type (see sorttype_default()) and default sort direction
3747             (see sortdirection_default()) will be used.
3748              
3749             In addition to getting/setting sort specs as a whole, they may be
3750             gotten/set on a per-field basis, too:
3751              
3752             sortspec($Field) ## Get sortspec for $Field or default spec
3753              
3754             my $Spec = {SortType => 'Integer', SortDirection => -1};
3755             sortspec('Zip', $Spec) ## Set sortspec
3756              
3757             sortspec_default() ## Get a sortspec with all defaults filled in
3758              
3759             For any $Field not found in the object's sortspecs, sortspec($Field)
3760             returns the same thing returned by sortspec_default(), which is a
3761             sortspec filled in with the default sort type and sort direction (see
3762             below).
3763              
3764             For a list of available built-in SortTypes, and instructions for how
3765             to define your own, see SORT ROUTINES, below.
3766              
3767             =cut
3768              
3769             {}; ## Get emacs to indent correctly.
3770              
3771             sub sortspecs
3772             {
3773 46     46 0 62 my $this = shift;
3774 46         55 my ($SortSpecs) = @_;
3775              
3776             ## Set if specified.
3777 46 100       86 $this->{_SortSpecs} = $SortSpecs if $SortSpecs;
3778            
3779             ## Get and return
3780 46   33     101 $SortSpecs = $this->{_SortSpecs} || $this->sortspecs_default();
3781              
3782 46         93 return($SortSpecs);
3783             }
3784              
3785             sub sortspecs_default
3786             {
3787 1     1 0 2 my $this = shift;
3788              
3789 1         3 my $SortSpecs = {};
3790              
3791 1         6 return($SortSpecs);
3792             }
3793              
3794             sub sortspec
3795             {
3796 4     4 0 17 my $this = shift;
3797 4         9 my ($FieldName, $SortSpec) = @_;
3798              
3799             ## Set if specified.
3800 4 100       18 $this->{_SortSpecs}->{$FieldName} = $SortSpec if $SortSpec;
3801            
3802             ## Get and return.
3803 4   33     19 my $SortSpec = ($this->{_SortSpecs}->{$FieldName} ||
3804             $this->sortspec_default($FieldName));
3805            
3806             ## Provide defaults for needed fields of sort spec.
3807 4   33     14 $SortSpec->{SortType} ||= $this->sorttype_default ();
3808 4 50       15 $SortSpec->{SortDirection} = $this->sortdirection_default()
3809             unless defined($SortSpec->{SortDirection});
3810            
3811 4         14 return($SortSpec);
3812             }
3813              
3814             sub sortspec_default
3815             {
3816 2     2 0 17 my $this = shift;
3817 2         4 my ($FieldName) = @_;
3818              
3819             ## Default sortspec for a field
3820              
3821 2         8 my $SortType = $this->sorttype_default ();
3822 2         7 my $SortDir = $this->sortdirection_default();
3823            
3824 2         8 my $Spec = {SortType => $SortType, SortDirection => $SortDir};
3825              
3826 2         8 return($Spec);
3827             }
3828              
3829              
3830             =pod
3831              
3832             =head1 DEFAULT SORT DIRECTION
3833              
3834             ## Getting / Setting table's default _DefaultSortDirection
3835              
3836             $t->sortdirection_default() ## Get default sort direction
3837              
3838             $t->sortdirection_default(-1) ## Set default sort direction
3839              
3840             Each element in a sort specification can optionally specify a sort
3841             direction.
3842              
3843             1 = ascending, -1 = descending
3844              
3845             For any sort specs that don't specify a direction, the object's
3846             default sort direction will be used. Use these routines to get/set
3847             the default sort direction.
3848              
3849             =cut
3850              
3851             {}; ## Get emacs to indent correctly.
3852              
3853             sub sortdirection_default
3854             {
3855 6     6 0 14 my $this = shift;
3856 6         10 my ($DefaultSortDir) = @_;
3857            
3858 6 100       17 if (defined($DefaultSortDir))
3859             {
3860             ## Set if specified. Force to 1 or -1. Treat 0 as -1 (descending).
3861 1   50     6 $this->{_DefaultSortDirection} =
3862             (max(min(int($DefaultSortDir), 1), -1) || -1);
3863             }
3864            
3865             ## Get and return. If not present, then 1 (ascending) is the default.
3866 6 50       16 my $SortDir = (defined($this->{_DefaultSortDirection}) ?
3867             $this->{_DefaultSortDirection} :
3868             1);
3869 6         21 return($SortDir);
3870             }
3871              
3872             =pod
3873              
3874             =head1 DEFAULT SORT TYPE
3875              
3876             ## Getting / Setting table's default _DefaultSortType
3877              
3878             $t->sorttype_default() ## Get default sort type
3879              
3880             $t->sorttype_default('Integer') ## Set default sort type
3881              
3882             Each element in a sort specification can optionally specify a sort
3883             type. The sort type is a string (like 'String' or 'Integer' or
3884             'Date') that selects from one or more sort routines. (See Sort
3885             Routines, below).
3886              
3887             There are several sort routines built into the CTable object, and you
3888             can also add as many of your own routines (and hence Sort Types) as
3889             you like or need. This allows for very flexible sorting.
3890              
3891             For any sort specs that don't specify a type, the object's default
3892             sort type will be used. Use these routines to get/set the default
3893             sort type, which initially is 'String'.
3894              
3895             =cut
3896              
3897             {}; ## Get emacs to indent correctly.
3898              
3899             sub sorttype_default
3900             {
3901 6     6 0 10 my $this = shift;
3902 6         9 my ($DefaultSortType) = @_;
3903            
3904 6 100       15 if (defined($DefaultSortType))
3905             {
3906             ## Set if specified.
3907 1         4 $this->{_DefaultSortType} = "$DefaultSortType";
3908             }
3909              
3910             ## Get and return. If not present, then 'String' is the default.
3911 6 50       21 my $SortDir = (defined($this->{_DefaultSortType}) ?
3912             $this->{_DefaultSortType} :
3913             'String');
3914 6         18 return($SortDir);
3915             }
3916              
3917             =pod
3918              
3919             =head1 SORT ROUTINES: BUILTIN AND CUSTOM
3920              
3921             ## Getting / Setting table's custom sort routines (_SRoutines)
3922              
3923             $t->sortroutine($Type) ## Get a sort routine for $Type
3924              
3925             $t->sortroutine($Type, $Sub) ## Set a sort routine for $Type
3926              
3927             $t->sortroutine($Type, 0 ) ## Remove sort routine for $Type
3928             $t->sortroutine_set($Type)
3929            
3930             $t->sortroutines() ## Get hash of any sort routines
3931            
3932             $t->sortroutines_builtin() ## Get hash of builtin routines
3933              
3934             Each SortType in the sortspecs should have a corresponding sort
3935             routine (any unrecognized type will be sorted using the 'String' sort
3936             routine).
3937              
3938             The sort() command looks up the appropriate sort routine for each
3939             field it is asked to sort, based on the SortType for that field, as
3940             given in the sortspecs, as described above.
3941              
3942             Builtin sort types, recognized and implemented natively by this
3943             module, are:
3944              
3945             String ## Fastest case-sensitive compare (data is string)
3946             Text ## Insensitive compare (lowercases, then compares)
3947             Number ## Number works for floats or integers
3948             Integer ## Faster than floats. Uses "use integer"
3949             DateSecs ## Same as integer; assumes date in seconds
3950             Boolean ## Treats item as a Perlish boolean (empty/undef = false)
3951              
3952             The above sort types are always recognized. Additional sort types may
3953             be added by subclasses (and could shadow the builtin implementations
3954             of the above types if desired) and/or may be added to instances (and
3955             again could shadow the above implementations), and/or may be specified
3956             when the sort() method is called, once again optionally shadowing any
3957             deeper definitions.
3958              
3959             =head1 CUSTOM SORT ROUTINE INTERFACE
3960              
3961             A custom sort routine is called with two arguments, each of which is a
3962             pointer to a scalar. The sort routine should dereference each pointer
3963             and compare the resulting scalars, returning -1 if the first scalar is
3964             smaller than the second, 1 if it is larger, and 0 if they are
3965             considered equal.
3966              
3967             For example, here is the built-in comparison routine for 'String':
3968              
3969             sub { $ {$_[0]} cmp $ {$_[1]} }
3970              
3971             NOTE: Your custom sort routines should NOT compare $a and $b as with
3972             Perl's builtin sort() command.
3973              
3974             Examine the variable $BuiltinSortRoutines in this module's
3975             implementation to see some additional examples of sort routines.
3976              
3977             Internally, sort() calls the sortroutines() method to get a hash that
3978             should consist of all builtin sort routines with the per-object sort
3979             routines, if any, overlaid. sortroutines() in turn calls the
3980             sortroutines_builtin() method to get a copy of the hash of all builtin
3981             sort routines for the object. (So a subclass could easily add
3982             additional SortTypes or reimplement them by just overriding
3983             sortroutines_builtin() and adding its own additional routines to the
3984             resulting hash.)
3985              
3986             sortroutine() may be called to get or set a custom sort routine for a
3987             given type in the given object.
3988              
3989             There is no way to directly manipulate the builtin sort routines for
3990             the entire class. To accomplish that, you should define and use a
3991             subclass that extends sortroutines_builtin() to add its own routines.
3992              
3993             For example:
3994              
3995             BEGIN
3996             { ## A subclass of Data::CTable with an INetAddr SortType.
3997             package IATable; use vars qw(@ISA); @ISA = qw(Data::CTable);
3998              
3999             sub sortroutines_builtin
4000             {
4001             my $this = shift;
4002             my $CustomRoutines =
4003             {INetAddr =>
4004             sub {use integer; ip2int($ {$_[0]}) <=> ip2int($ {$_[1]})}};
4005             my $AllRoutines =
4006             {%{$this->SUPER::sortroutines_builtin()} %$CustomRoutines};
4007             return($AllRoutines);
4008             };
4009              
4010             sub ip2int {.....} $# Could memoize & inline for efficiency
4011             }
4012              
4013             my $Table = IATable::new(......);
4014              
4015             The IATable class would then have all the same features of
4016             Data::CTable but would then also support the INetAddr SortType.
4017              
4018             =cut
4019              
4020             {}; ## Get emacs to indent correctly.
4021              
4022             BEGIN
4023             {
4024             my $BuiltinSortRoutines =
4025             {(
4026 64         68 String => sub { $ {$_[0]} cmp $ {$_[1]} },
  64         79  
  64         122  
4027 5         6 Text => sub { lc($ {$_[0]}) cmp lc($ {$_[1]})},
  5         8  
  5         15  
4028 8         9 Number => sub { $ {$_[0]} <=> $ {$_[1]} },
  8         11  
  8         21  
4029 1     1   1074 Integer => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
  1         11  
  1         4  
  8         10  
  8         16  
  8         21  
4030 1     1   54 DateSecs => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
  1         1  
  1         51  
  4         6  
  4         7  
  4         9  
4031 1     1   48 _RecNum => sub {use integer; $ {$_[0]} <=> $ {$_[1]} },
  1         2  
  1         4  
  4         6  
  4         7  
  4         9  
4032 4         5 Boolean => sub { !!$ {$_[0]} <=> !!$ {$_[1]} },
  4         7  
  4         12  
4033 1     1   956 )};
4034            
4035             sub sortroutines_builtin ## Class or instance method
4036             {
4037 53     53 0 351 return({%$BuiltinSortRoutines}); ## Copy of above private hash.
4038             }
4039             }
4040              
4041             sub sortroutine
4042             {
4043 38     38 0 179 my $this = shift;
4044 38         59 my ($Type, $Routine) = @_;
4045              
4046 38 100       78 if (defined($Routine))
4047             {
4048             ## Set if $Routine provided.
4049 3         10 $this->sortroutine_set($Type, $Routine);
4050             }
4051              
4052             ## Get and return.
4053 38         70 $Routine = $this->sortroutine_get($Type);
4054              
4055 38         123 return($Routine);
4056             }
4057              
4058             sub sortroutine_get
4059             {
4060 38     38 0 45 my $this = shift;
4061 38         42 my ($Type) = @_;
4062 38         72 my $Routines = $this->sortroutines();
4063 38   66     96 my $Routine = $Routines->{$Type} || $Routines->{'String'};
4064            
4065 38         99 return($Routine);
4066             }
4067              
4068             sub sortroutine_set
4069             {
4070 4     4 0 5 my $this = shift;
4071 4         6 my ($Type, $Routine) = @_;
4072              
4073 4         7 my $Valid = (ref($Routine) eq 'CODE');
4074              
4075 4 100       10 if ($Valid)
4076             {
4077             ## Add / replace if a routine was supplied.
4078 2   50     3 $ {$this->{_SRoutines} ||= {}}{$Type} = $Routine;
  2         8  
4079             }
4080             else
4081             {
4082             ## Otherwise delete.
4083 2   50     3 $Routine = delete $ {$this->{_SRoutines} ||= {}}{$Type};
  2         8  
4084             }
4085            
4086 4         14 return($Routine);
4087             }
4088              
4089             sub sortroutines
4090             {
4091 42     42 0 54 my $this = shift;
4092            
4093 42 50       86 my $Routines = {%{$this->sortroutines_builtin()}, ## First builtin ones
  42         186  
4094 42         46 %{$this->{_SRoutines} || {}}}; ## Shadow with object's own
4095            
4096 42         193 return($Routines);
4097             }
4098              
4099             =pod
4100              
4101             =head1 FREEZING SELECTION & FIELD LIST
4102              
4103             ## Freeze data layout: re-order columns; omit unused fields
4104            
4105             $t->cull(...params...) ## Rebuild table in order
4106             my $s = $t->snapshot(...params...) ## Make copy as if rebuilt
4107              
4108             The cull() method re-writes all data in the table to be in the order
4109             indicated in _Selection (if present). This will cause any records not
4110             listed in _Selection to be omitted (unless selection is null in which
4111             case all records are retained in original order).
4112              
4113             In addition, if there is a custom field list present, it removes any
4114             fields NOT mentioned in _FieldList.
4115              
4116             The snapshot() method is similar, except instead of modifying the
4117             object itself, it makes a copy of the object that's equivalent to what
4118             cull() would have created, and returns that new object, leaving the
4119             original untouched. (All data structures are deep-copied from the old
4120             object to the new one, leaving the objects totally independent.)
4121              
4122             cull() and snapshot() both take two optional named parameters:
4123             _FieldList and/or _Selection to be used in place of the corresponding
4124             parameters found in the object.
4125              
4126             If only a single argument is supplied, it is assumed to be _Selection.
4127              
4128             =cut
4129              
4130             sub cull
4131             {
4132 1     1 0 2 my $this = shift;
4133 1 50       5 my $Params = (@_ == 1 ? {_Selection => $_[0]} : {@_});
4134            
4135 1         2 my($Selection, $FieldList) = map {$this->getparam($Params, $_)}
  2         4  
4136             qw(_Selection _FieldList);
4137            
4138 1   33     3 $FieldList ||= $this->{_FieldList};
4139 1   33     4 $Selection ||= $this->{_Selection};
4140              
4141             ## First cull any fields/columns not mentioned in _FieldList, if any.
4142 1 50       3 if ($FieldList)
4143             {
4144 1         2 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  1         4  
4145 1         2 my $DeadFields = [grep {!exists($FieldHash->{$_})} @{$this->fieldlist_all()}];
  4         8  
  1         4  
4146 1         4 delete @$this{@$DeadFields};
4147              
4148             ## Set the (possibly) new field list in the object.
4149 1         5 $this->fieldlist_set($FieldList);
4150             }
4151              
4152             ## Then cull / rearrange all the columns
4153 1 50       5 if ($Selection)
4154             {
4155             ## Temporarily set the selection() to be the one we may have been given...
4156 1         5 $this->selection($Selection);
4157            
4158             ## Get the de-facto field list if we don't already have it explicitly
4159 1   33     4 $FieldList ||= $this->fieldlist();
4160            
4161             ## Rearrange each column
4162 1         2 foreach my $FieldName (@$FieldList)
4163             {
4164 3         7 $this->{$FieldName} = $this->sel($FieldName);
4165             }
4166              
4167             ## Remove the _Selection since it is no longer valid.
4168 1         3 $this->selection_delete();
4169             }
4170             }
4171              
4172             sub snapshot
4173             {
4174 50     50 0 1096 my $this = shift;
4175 50 50       151 my $Params = (@_ == 1 ? {_Selection => $_[0]} : {@_});
4176            
4177 50         85 my($Selection, $FieldList) = map {$this->getparam($Params, $_)}
  100         234  
4178             qw(_Selection _FieldList);
4179            
4180 50   33     296 $FieldList ||= $this->{_FieldList};
4181 50   66     230 $Selection ||= $this->{_Selection};
4182              
4183             ## First make a shallow copy of $this
4184 50         812 my $copy = {%$this};
4185              
4186             ## Then delete any (references to) data columns owned by $this...
4187 50         144 delete @$copy{@{$this->fieldlist_all()}};
  50         125  
4188              
4189             ## Then deep-copy all other parameters from $this...
4190 50         2619 $copy = dclone($copy);
4191            
4192             ## Then new/bless/initialize the copy into the same class as $this...
4193 50         401 $copy = ref($this)->new($copy);
4194              
4195             ## Temporarily override selection if necessary.
4196 50         240 my $OldSel = $this->{_Selection};
4197 50         122 $this->selection($Selection);
4198              
4199             ## Then insert all the rearranged columns into $copy...
4200 50         72 @$copy{@$FieldList} = @{$this->sels($FieldList)};
  50         119  
4201            
4202             ## Restore old selection, if any.
4203 50         151 $this->selection_set($OldSel);
4204              
4205             ## Remove the selection in copy.
4206 50         86 delete $copy->{_Selection};
4207              
4208             ## Set copy's fieldlist to a copy of the one we used.
4209 50         175 $copy->{_FieldList} = [@$FieldList];
4210              
4211 50         204 return($copy);
4212             }
4213              
4214             =pod
4215              
4216             =head1 LINE ENDINGS
4217              
4218             ## Get current value
4219              
4220             $t->lineending() ## Get actual setting: string or symbol
4221             $t->lineending_symbol() ## Get setting's symbolic value if possible
4222             $t->lineending_string() ## Get setting's string value if possible
4223              
4224             ## Set value
4225              
4226             $t->lineending($Ending) ## Will be converted internally to symbol
4227              
4228             ## Convert a value to symbol or string form
4229              
4230             $t->lineending_symbol($L) ## Convert string form to symbolic form
4231             $t->lineending_string($L) ## Convert symbol form to string form
4232              
4233             ## Get internal conversion hash tables
4234              
4235             $t->lineending_symbols() ## Hash ref mapping known strings to symbols
4236             $t->lineending_strings() ## Hash ref mapping known symbols to strings
4237              
4238             Use these accessor functions to get/set the _LineEnding parameter.
4239              
4240             You can set the parameter in either string or symbol form as you wish.
4241             You can get it in its raw, as-stored, form, or, you can get it in
4242             string form or symbol form as desired.
4243              
4244             Finally, some utility conversion calls allows you to convert a string
4245             you have on hand to a symbolic form. For example:
4246              
4247             $L = "\x0D";
4248             print ("This file uses " . $t->lineending_symbol($L) . " endings.");
4249              
4250             This would print:
4251            
4252             This file uses mac endings.
4253              
4254             =cut
4255              
4256             {}; ## Get emacs to indent correctly.
4257              
4258             BEGIN
4259             {
4260             ## Map any recognized _LineEnding value to its actual string
4261 1     1   8 my $LineEnding_Strings =
4262             {(
4263             dos => "\x0D\x0A",
4264             mac => "\x0D",
4265             unix => "\x0A",
4266             "\x0D\x0A" => "\x0D\x0A",
4267             "\x0D" => "\x0D",
4268             "\x0A" => "\x0A",
4269             )};
4270            
4271             ## Map any recognized _LineEnding value to its logical form
4272 1         781 my $LineEnding_Symbols =
4273             {(
4274             "\x0D\x0A" => "dos",
4275             "\x0D" => "mac",
4276             "\x0A" => "unix",
4277             dos => "dos",
4278             mac => "mac",
4279             unix => "unix",
4280             )};
4281            
4282             sub lineending_strings ## Class or instance method
4283             {
4284 33     33 0 311 return({%$LineEnding_Strings}); ## Copy of above private hash.
4285             }
4286              
4287             sub lineending_symbols ## Class or instance method
4288             {
4289 25     25 0 1388 return({%$LineEnding_Symbols}); ## Copy of above private hash.
4290             }
4291             }
4292              
4293             sub lineending()
4294             {
4295 3     3 0 12 my $this = shift;
4296 3         3 my ($LineEnding) = @_;
4297            
4298             ## Set if specified. Try to convert to symbolic form if possible.
4299 3 50 0     7 $this->{_LineEnding} = $this->lineending_symbol($LineEnding) || $LineEnding if $LineEnding;
4300              
4301             ## Otherwise / either case, get whatever value we have and return it....
4302 3         4 my $LineEnding = $this->{_LineEnding};
4303              
4304 3         7 return($LineEnding);
4305             }
4306              
4307             sub lineending_symbol
4308             {
4309 22     22 0 34 my $this = shift;
4310 22         30 my ($LineEnding) = @_;
4311              
4312 22   66     57 $LineEnding ||= $this->{_LineEnding};
4313              
4314 22   33     30 return($ {$this->lineending_symbols()}{$LineEnding} || $LineEnding);
4315             }
4316              
4317              
4318             sub lineending_string
4319             {
4320 30     30 0 50 my $this = shift;
4321 30         42 my ($LineEnding) = @_;
4322              
4323 30   100     125 $LineEnding ||= $this->{_LineEnding};
4324              
4325 30   66     33 return($ {$this->lineending_strings()}{$LineEnding} || $LineEnding);
4326             }
4327              
4328             =pod
4329              
4330             =head1 AUTOMATIC CACHEING
4331              
4332             By default, Data::CTable makes cached versions of files it reads so it
4333             can read them much more quickly the next time. Optionally, it can
4334             also cache any file it writes for quicker re-reading later.
4335              
4336             On Unix systems, cache files are always created with 0666
4337             (world-write) permissions for easy cleanup.
4338              
4339             When reading files, Data::CTable checks the _CacheOnRead parameter.
4340             If that parameter is true, which it is by default, the module tries to
4341             find an up-to-date cache file to read instead of the original.
4342             Reading a cache file can be 10x faster than reading and parsing the
4343             original text file.
4344              
4345             In order to look for the cache file, it must first calculate the path
4346             where the cache file should be located, based on the _FileName of the
4347             file to be read.
4348              
4349             The path of the cache file is calculated as follows:
4350              
4351             If the _CacheSubDir parameter is a RELATIVE PATH, then it is appended
4352             to the directory component of _FileName to arrive at the directory to
4353             use to store the cache file. If it is an ABSOLUTE PATH, then
4354             _CacheSubDir is used by itself. (The trailing path separator is
4355             optional and an appropriate one will be added by Data::CTable if it is
4356             missing.)
4357              
4358             The file name of the cache file is calculated as follows:
4359              
4360             If the _CacheExtension parameter is specified, it is appended to the
4361             base file name component from the _FileName parameter. If you want
4362             the cached file name to be the same as the name of the original file,
4363             you can set _CacheExtension to "", which is not recommended.
4364              
4365             Then, the cache path and cache file name are joined to arrive at the
4366             name of the cache file. If both _CacheSubDir and _CacheExtension were
4367             empty, then the cache file path will be the same as the _FileName, and
4368             Data::CTable will refuse to either read or write a cache file, so
4369             setting these fields both to empty is equivalent to setting
4370             _CacheOnRead to false.
4371              
4372             The cache file contains a highly-efficient representation of all the
4373             following data that would otherwise have to be determined by reading
4374             and parsing the entire text file:
4375              
4376             - All the data columns (field values)
4377             - _FieldList: The list of fields, in order
4378             - _HeaderRow: Whether a header row is / should be present
4379             - _LineEnding: The line ending setting
4380             - _FDelimiter: The field delimiter setting
4381              
4382             If found prior to a read(), AND, the date of the cache file is LATER
4383             than the date of the original file, the cache file is used instead.
4384             (If the date is EARLIER, then the cache file is ignored because it can
4385             be presumed that the data inside the text file is newer.)
4386              
4387             If cacheing is ON, then after successfully reading the text file
4388             (either because there was no cache file yet or the cache file was out
4389             of date or corrupted or otherwise unusable), read() will then try to
4390             create a cache file. This, of course, takes some time, but the time
4391             taken will be more than made up in the speedup of the next read()
4392             operation on the same file.
4393              
4394             If creating the cache file fails (for example, because file
4395             permissions didn't allow the cache directory to be created or the
4396             cache file to be written), read() generates a warning explaining why
4397             cacheing failed, but the read() operation itself still succeeds.
4398              
4399             No parameters in the object itself are set or modified to indicate the
4400             success or failure of writing the cache file.
4401              
4402             Similarly, there is no way to tell whether a successful read()
4403             operation read from the cache or from the original data file. If you
4404             want to be SURE the reading was from the data file, either turn off
4405             _CacheOnRead, or call the read_file() method instead of read().
4406              
4407             NOTE: because the name of the cache file to be used is calculated just
4408             before the read() is actually done, the cache file can only be found
4409             if the _CacheSubDir and _CacheExtension are the same as they were when
4410             the cache was last created. If you change these parameters after
4411             having previously cached a file, the older caches could be "orphaned"
4412             and just sit around wasting disk space.
4413              
4414             =head2 Cacheing on write()
4415              
4416             You may optionally set _CacheOnWrite (default = false) to true. If
4417             done, then a cache file will be saved for files written using the
4418             write() command. Read about write() below for more about why you
4419             might want to do this.
4420              
4421             =head1 AUTOMATIC DIRECTORY CREATION
4422              
4423             When Data::CTable needs to write a file, (a cache file or a data
4424             file), it automatically tries to create any directories or
4425             subdirectories you specify in the _FileName or _CacheSubDir
4426             parameters.
4427              
4428             If it fails while writing a data file, write() will fail (and you will
4429             be warned). If it fails to create a directory while writing a cache
4430             file, a warning will be issued, but the overall read() or write()
4431             operation will still return a result indicating success.
4432              
4433             Any directories created will have the permissions 0777 (world-write)
4434             for easy cleanup.
4435              
4436             Generally, the only directory the module will have to create is a
4437             subdirectory to hold cache files.
4438              
4439             However, since other directories could be created, be sure to exercise
4440             caution when allowing the module to create any directories for you on
4441             any system where security might be an issue.
4442              
4443             Also, if the 0666 permissions on the cache files themselves are too
4444             liberal, you can either 1) turn off cacheing, or 2) call the
4445             prep_cache_file() method to get the name of the cache file that would
4446             have been written, if any, and then restrict its permissions:
4447              
4448             chmod (0600, $this->prep_cache_file());
4449              
4450             =head1 READING DATA FILES
4451              
4452             ## Replacing data in table with data read from a file
4453              
4454             $t->read($Path) ## Simple calling convention
4455              
4456             $t->read( ## Named-parameter convention
4457              
4458             ## Params that override params in the object if supplied...
4459              
4460             _FileName => $Path, ## Full or partial path of file to read
4461              
4462             _FieldList => [...], ## Fields to read; others to be discarded
4463              
4464             _HeaderRow => 0, ## No header row (_FieldList required!)
4465              
4466             _LineEnding => undef, ## Text line ending (undef means guess)
4467             _FDelimiter => undef, ## Field delimiter (undef means guess)
4468              
4469             _ReturnMap => 1, ## Whether to decode internal returns
4470             _ReturnEncoding=>"\x0B", ## How to decode returns.
4471             _MacRomanMap => undef, ## Whether/when to read Mac char set
4472              
4473             _CacheOnRead => 0, ## Enable/disable cacheing behavior
4474             _CacheExtension=> ".x", ## Extension to add to cache file name
4475             _CacheSubDir => "", ## (Sub-)dir, if any, for cache files
4476              
4477             ## Params specific to the read()/write() methods...
4478              
4479             _MaxRecords => 200, ## Limit on how many records to read
4480             )
4481              
4482             $t->read_file() ## Internal: same as read(); ignores cacheing
4483              
4484             read() opens a Merge, CSV, or Tab-delimited file and reads in all or
4485             some fields, and all or some records, REPLACING ANY EXISTING DATA in
4486             the CTable object.
4487              
4488             Using the simple calling convention, just pass it a file name. All
4489             other parameters will come from the object (or will be defaulted if
4490             absent). To specify additional parameters or override any parameters
4491             in the object while reading, use the named-parameter calling
4492             convention.
4493              
4494             See the full PARAMETER LIST, above, or read on for some extra details:
4495              
4496             _ReturnMap controls whether return characters encoded as ASCII 11
4497             should be mapped back to real newlines (C<"\n">) when read into memory.
4498             If false, they are left as ASCII 11 characters. (default is "true")
4499              
4500             _ReturnEncoding controls the character that returns are encoded as, if
4501             different from ASCII 11.
4502              
4503             _FieldList is an array (reference) listing the names of fields to
4504             import, in order (and will become the object's _FieldList upon
4505             successful completion of the read() operation). If not provided and
4506             not found in the object, or empty, then all fields found in the file
4507             are imported and the object's field list will be set from those found
4508             in the file, in the order found there. If _HeaderRow is false, then
4509             this parameter is required (either in the object or as a formal
4510             parameter) and is assumed to give the correct names for the fields as
4511             they actually occur in the file. If _HeaderRow is true and _FieldList
4512             is provided, then _FieldList specifies the (sub-)set of fields to be
4513             read from the file and others will be ignored.
4514              
4515             _HeaderRow, which defaults to true, if set to false, tells read() to
4516             not expect a header row showing the field names in the file. Instead,
4517             it assumes that the _FieldList gives those (and _FieldList must
4518             therefore be specified either as a parameter or an existing parameter
4519             in the object).
4520              
4521             _MaxRecords (optional) is an upper limit on the number of fields to
4522             import. If not specified, or zero, or undef, then there is no limit;
4523             all records will be imported or memory will be exhausted.
4524              
4525             read() returns a Boolean "success" code.
4526              
4527             If read() returns false, then it will also have set the _ErrorMsg
4528             parameter in the object. It may or may not have partially altered
4529             data in the object if an error is encountered.
4530              
4531             After a successful read:
4532              
4533             fieldlist() (the object's _FieldList parameter) tells which
4534             fields were actually read, in what order. It may omit any fields
4535             requested in _FieldList that were not actually found in the file for
4536             whatever reason.
4537              
4538             length() tells how many fields were read.
4539              
4540             The selection() is reset to no selection (all selected / unsorted)
4541              
4542             The object's _FileName parameter contains the path to the file
4543             that was read. If the _FileName you specified did not have a path,
4544             then _FileName will be prepended with a path component indicating
4545             "current directory" (e.g. "./" on Unix).
4546              
4547             _FDelimiter will contain the actual delimiter character that was
4548             used to read the file (either tab or comma if the delimiter was
4549             guessed, or whatever delimiter you specified).
4550              
4551             _LineEnding will contain the actual line-ending setting used to
4552             read the file. This will be either "mac" ("\x0D"), "unix" ("\x0D"),
4553             or "dos" ("\x0D\x0A") if the line endings were guessed by read().
4554             Otherwise it will be whatever _LineEnding you specified.
4555              
4556              
4557             =head1 FILE FORMAT NOTES
4558              
4559             As mentioned, read() allows the following flexibilities in reading
4560             text-based tabular data files:
4561              
4562             You may specify the line endings (record delimiters), or it can
4563             guess them (mac, unix, dos are supported).
4564              
4565             You may specify the field delimiters, or it can guess them (tab
4566             and comma are supported).
4567              
4568             It can get field names from a header row, or, if there is no
4569             header row, you can tell it the field names, in order.
4570              
4571             You can tell it whether or not to decode embedded returns in
4572             data fields, and if so, which character they were encoded as.
4573              
4574             Beyond supporting the above flexible options, read() makes the
4575             following non-flexible assumptions:
4576              
4577             Fields must NOT contain unencoded returns -- that is: whatever
4578             character sequence is specified for _LineEnding will NEVER occur
4579             inside a field in the text file; in addition, the current platform's
4580             definition of C<"\n"> will NEVER occur; these characters if present in
4581             field data, MUST have been encoded to some safe character string
4582             before the file was created.
4583              
4584             Each field may OPTIONALLY be surrounded with double-quote marks.
4585             However, if the field data itself contains either a double-quote
4586             character (C<">) or the current file's field delimiter (such as tab or
4587             comma), then the field MUST be surrounded with double-quotes.
4588             (Currently, all data written by Data::CTable have all field values
4589             surrounded by double-quotes, but a more selective policy may be used
4590             in the future.)
4591              
4592             If a field contains a double-quote character, then each double-quote
4593             character in the field must be encoded as C<""> -- i.e. each C<"> in the
4594             original data becomes C<""> in the text file.
4595              
4596             Data files may not mix line-ending types or field delimiter types.
4597             Once determined, the same endings and delimiters will be used to read
4598             the entire file.
4599              
4600             The fields recognized on each line will either be determined by
4601             the header row or the _FieldList provided by the caller. Any extra
4602             fields on any given line will be ignored. Any missing fields will be
4603             treated as undef/empty.
4604              
4605             If you are having trouble reading a delimited text file, check that
4606             all data in the file obeys these assumptions.
4607              
4608             =cut
4609              
4610             sub read ## Read, cacheing if possible
4611             {
4612 62     62 1 99 my $this = shift;
4613 62         191 return($this->read_file_or_cache(@_));
4614             }
4615              
4616             sub read_file ## Read, ignoring cacheing
4617             {
4618 19     19 0 32 my $this = shift;
4619 19 50       82 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
4620              
4621 171         312 my($FileName, $FieldList, $MaxRecords, $LineEnding, $FDelimiter, $ReturnMap, $ReturnEncoding, $MacRomanMap, $HeaderRow)
4622 19         54 = map {$this->getparam($Params, $_)}
4623             qw(_FileName _FieldList _MaxRecords _LineEnding _FDelimiter _ReturnMap _ReturnEncoding _MacRomanMap _HeaderRow);
4624              
4625 19         34 my $Success;
4626              
4627             ## Default error message is none.
4628 19         38 $this->{_ErrorMsg} = "";
4629              
4630             ## Default for HeaderRow is true.
4631 19 50       45 $HeaderRow = 1 unless defined($HeaderRow);
4632              
4633             ## Default for ReturnEncoding is "\x0B" (control-K; ASCII 11)
4634 19 50       51 $ReturnEncoding = "\x0B" unless length($ReturnEncoding);
4635              
4636             ## Default for ReturnMap is true.
4637 19 50       42 $ReturnMap = 1 unless defined($ReturnMap);
4638              
4639             ## Default for MacRomanMap is undef ("Auto");
4640 19 100       49 $MacRomanMap = undef unless defined($MacRomanMap);
4641              
4642             ## Default for MaxRecords is 0 (meaning import all records)
4643 19 50       69 $MaxRecords = 0 unless (int($MaxRecords) == $MaxRecords);
4644              
4645             ## Precompile a regex for the return encoding since we'll call it often (on each field!) later.
4646 19         101 my $RetRegex = qr/$ReturnEncoding/;
4647              
4648 19         70 $this->progress("Reading $FileName...");
4649              
4650             ## Open the data file.
4651 0         0 my $File = IO::File->new("<$FileName") or
4652 19 50       135 do {$this->{_ErrorMsg} = "Failed to open $FileName: $!"; goto done};
  0         0  
4653            
4654             ## Get its total file size (useful for estimating table size later on).
4655 19 50       1437 my $FileSize = (stat($File))[7] or
4656             $this->{_ErrorMsg} = "File $FileName contains no data.", goto done;
4657            
4658             ## Convert from optional "dos", "mac", "unix" symbolic values.
4659 19         68 $LineEnding = $this->lineending_string($LineEnding);
4660              
4661             ## Default for LineEnding is found by inspecting data in the file.
4662 19 50 33     114 $LineEnding ||= guess_endings($File) or
4663             $this->{_ErrorMsg} = "Could not find any line endings in the file $FileName.", goto done;
4664            
4665             ## DoMacMapping is the actual setting for auto charset mapping
4666 19   66     110 my $DoMacMapping =
4667             ((!defined($MacRomanMap) && ($LineEnding eq "\x0D")) || ## Auto
4668             ($MacRomanMap)); ## On
4669              
4670 19 100       146 $this->progress("Will convert upper-ascii characters if any, from Mac Roman to ISO 8859-1.") if $DoMacMapping;
4671              
4672             ## FieldList is usable is it is a list and has at least one entry.
4673 19   66     73 my $FieldListValid = ((ref($FieldList) eq 'ARRAY') && @$FieldList);
4674            
4675             ## Set <$File> to use the line ending sequence we no known we are looking for.
4676 19         91 local $/ = $LineEnding;
4677            
4678             ## We use $_ explicitly, so must localize.
4679 19         27 local $_;
4680            
4681 19         29 my $IncomingFields;
4682              
4683 19 50       48 if ($HeaderRow)
4684             {
4685             ## Get the list of fields available in the file (first line of file).
4686              
4687 19 50       190 $_ = <$File> or
4688             $this->{_ErrorMsg} = "Could not find a first line with field names in $FileName.", goto done;
4689              
4690             ## Try to guess file delimiter from the header row if not yet specified.
4691 19 50 33     93 $FDelimiter ||= guess_delimiter($_) or
4692             $this->{_ErrorMsg} = "Could not find comma or tab delimiters in $FileName.", goto done;
4693            
4694             ## Maybe convert entire line (all records) Mac to ISO before splitting.
4695 19 100       78 &MacRomanToISORoman8859_1(\ $_) if $DoMacMapping;
4696              
4697 19         48 chomp;
4698            
4699 19         42 s/^\"//; s/\"$//; ## remove possible leading, trailing quotes surrounding header row (rare)
  19         31  
4700              
4701             ## Split header row into field names, removing optional "" around each at the same time.
4702 19         306 $IncomingFields = [split(/\"?$FDelimiter\"?/, $_)];
4703            
4704             }
4705             else
4706             {
4707             ## Otherwise, require that the caller specifies it in _FieldList
4708              
4709 0 0       0 $this->{_ErrorMsg} = "Must specify a _FieldList if _HeaderRow says no header row is present.", goto done
4710             unless $FieldListValid;
4711            
4712 0         0 $IncomingFields = [@$FieldList];
4713             }
4714              
4715             ## Remove any leading underscores in the names of the incoming
4716             ## fields (not allowed because such field names are reserved for
4717             ## other object data). Note: this could result in
4718             ## duplicate/overwritten field names that were otherwise
4719             ## apparently unique in the incoming data file.
4720              
4721 19         55 $IncomingFields = [map {(/^_*(.*)/)[0]} @$IncomingFields];
  79         259  
4722            
4723             ## Make a hash that can be used to map these fields' names to their numbers.
4724 1     1   725 my $IncomingFieldNameToNum = {}; @$IncomingFieldNameToNum{@$IncomingFields} = ($[ .. $#$IncomingFields);
  1         409  
  1         2533  
  19         59  
  19         188  
4725            
4726             ## Make a list of the fields we'll be importing (by taking the
4727             ## list the caller requested, and paring it down to only those
4728             ## fields that are actually available in the table.)
4729              
4730 71         172 my $FieldsToGet =
4731 19 100       61 [grep {exists($IncomingFieldNameToNum->{$_})}
4732             ($FieldListValid ? @$FieldList : @$IncomingFields)];
4733              
4734             ## Make a note of whether we're getting a subset of available
4735             ## fields because the caller requested such. If we are, we'll add
4736             ## a _Subset => 1 marker to the data for use later in ensuring the
4737             ## cache is OK.
4738            
4739 19   100     59 my $GettingSubset = ($FieldListValid && ("@{[sort @$IncomingFields]}" ne
4740             "@{[sort @$FieldList ]}"));
4741            
4742             ## Make an array of the incoming indices of these fields.
4743              
4744             ## Allocate a list of empty arrays into which we can import the
4745             ## data. Initially they'll each have 100 empty slots for data;
4746             ## after we have imported 100 records, we'll re-consider the size
4747             ## estimate. When we're all done, we'll prune them back.
4748              
4749 19         69 my $FieldNums = [@$IncomingFieldNameToNum{@$FieldsToGet}];
4750 19         29 my $FieldVectors = []; foreach (@$FieldNums) {$#{$FieldVectors->[$_] = []} = 100};
  19         95  
  71         78  
  71         431  
4751              
4752             ## We want to be cool and support any embedded NULL (ascii zero)
4753             ## characters should they exist in the data, even though we are
4754             ## going to use NULL chars to encode embedded delimiters before we
4755             ## split....
4756              
4757             ## First we create a sufficiently obscure placeholder for any
4758             ## ascii zero characters in the input text (a rare occurrence
4759             ## anyway).
4760              
4761 19         78 my $ZeroMarker = "\001ASCII_ZERO" . time() . "\001";
4762            
4763             ## Now ready to go through the file line-by-line (record-by-record)
4764              
4765 19         26 my $WroteProg;
4766 19         23 my $RecordsRead = 0;
4767 19         86 while (<$File>)
4768             {
4769             ## Try to guess file delimiter from the header row if not yet specified.
4770 57 50 33     164 $FDelimiter ||= guess_delimiter($_) or
4771             $this->{_ErrorMsg} = "Could not find comma or tab delimiters in $FileName.", goto done;
4772            
4773             ## Maybe convert entire line (all records) ISO to Mac before splitting.
4774 57 100       152 &MacRomanToISORoman8859_1(\ $_) if $DoMacMapping;
4775            
4776             ## Manipulate the single line of data fields into a splittable format.
4777            
4778 57         197 chomp;
4779            
4780             ## Replace any delimiters inside quotes with ASCII 0.
4781             ## Split fields on delimiters.
4782             ## Delete leading or trailing quote marks from each field.
4783             ## Restore delimiters ASCII 0 back to delimiters.
4784            
4785             ## Protect delimiters inside fields.
4786 57         88 s/\000/$ZeroMarker/go; ## Preserve genuine ASCII 0 chars.
4787 57         75 my $InQuote = 0; ## Initialize InQuote flag to zero.
4788 57         368 s/(\")|($FDelimiter)/ ## Replace delimiters inside quotes with ASCII 0 ...
4789 180 50       1239 ($1 ? do{$InQuote^=1; $1} : ## ... if quote char, toggle InQuote flag
  0 50       0  
  0         0  
4790             ($InQuote ? "\000" : $2))/eg; ## ... if delimiter, InQuote sez whether to replace or retain.
4791              
4792             ## Split record into fields, then clean each field.
4793              
4794 57         119 s/^\"//; s/\"$//; ## Kill leading, trailing quotes surrounding each record
  57         90  
4795             my @FieldVals =
4796             map
4797 57 50       439 {if (length($_))
  237         473  
4798             {
4799 237         290 s/\"\"/\"/g; ## Restore Merge format's quoted double-quotes. ("" ==> ")
4800 237         259 s/\000/$FDelimiter/g; ## Restore delimiters inside fields
4801 237         262 s/$ZeroMarker/\000/go; ## Restore preserved ASCII 0 chars.
4802 237 50       836 s/$RetRegex/\n/g if $ReturnMap;## Restore return characters that were coded as ASCII 11 (^K)
4803             }
4804 237         504 $_;} ## Return field val after above mods.
4805             split(/\"?$FDelimiter\"?/, $_); ## Split on delimiters, killing optional surrounding quotes at same time.
4806            
4807             ## Put the data into the vectors
4808 57         140 foreach (@$FieldNums)
4809             {
4810 213 50       623 $FieldVectors->[$_]->[$RecordsRead] = $FieldVals[$_] if (length($FieldVals[$_]));
4811             }
4812 57         80 $RecordsRead++;
4813            
4814             ## Stop if we've read all the records we wanted.
4815 57 50 33     175 last if ($MaxRecords && ($RecordsRead >= $MaxRecords));
4816            
4817             ## Optimization:
4818              
4819             ## After importing 100, 200, 300, 400, etc. records, we
4820             ## re-estimate the size of the table. To help make the field
4821             ## insertion more efficient (by avoiding frequent
4822             ## array-resizing), we can set the sizes of the field vectors
4823             ## to hold at least our estimated number of records).
4824             ## Ideally, this estimation/resize step will happen at most 2
4825             ## or 3 times no matter how big the incoming data file is.
4826              
4827 57         59 my $EstTotalRecords;
4828 57 50 33     140 if ((($RecordsRead % 100) == 0) && ## If we're on a record divisble by 100...
  0         0  
4829             (($RecordsRead + 100 > $#{$FieldVectors->[$FieldNums->[0]]}))) ## ... and we're getting close to max size...
4830             {
4831             ## Then estimate the size we'd like to resize it to.
4832 0         0 $EstTotalRecords = (100 + int($RecordsRead * ($FileSize / tell($File))));
4833 0 0 0     0 $EstTotalRecords = $MaxRecords if ($MaxRecords && ($MaxRecords < $EstTotalRecords));
4834            
4835             ## If this size is greater than the actual size...
4836 0 0       0 if ($EstTotalRecords > $#{$FieldVectors->[$FieldNums->[0]]})
  0         0  
4837             {
4838             ## Then resize all the vectors.
4839             ## $this->progress("$RecordsRead: Resizing to $EstTotalRecords...\n"); ## Debugging
4840 0         0 foreach (@$FieldNums) {$#{$FieldVectors->[$_]} = $EstTotalRecords};
  0         0  
  0         0  
4841             }
4842             }
4843              
4844             ## Try doing timed (throttled to 1 per 2 secs) progress at
4845             ## most every 100th record.
4846 57 0       111 my $Did = ($EstTotalRecords ?
    50          
4847             $this->progress_timed("Reading", "$RecordsRead of $EstTotalRecords (est.)", tell($File), $FileSize, 1) :
4848             $this->progress_timed("Reading", "$RecordsRead" , tell($File), $FileSize, 1))
4849             if (($RecordsRead % 100) == 0);
4850 57   33     477 $WroteProg ||= $Did;
4851             }
4852              
4853             ## If we wrote timed progress but didn't get to give the 100%
4854             ## message yet, print the 100% message now.
4855              
4856 19 50       48 if ($WroteProg)
4857             {
4858 0 0       0 $this->progress_timed("Reading", "$RecordsRead of $RecordsRead", $FileSize, $FileSize, 1)
4859             unless (($RecordsRead % 100) == 0);
4860             }
4861              
4862             ## Print the regular Done message.
4863 19         78 $this->progress("Read $FileName.");
4864              
4865             ## Set the field vectors' length to the exact length we really
4866             ## read.
4867              
4868             ## $this->progress("$RecordsRead: Truncating to @{[$RecordsRead - 1]}... \n"); ## Debugging
4869 19         41 foreach (@$FieldNums) {$#{$FieldVectors->[$_]} = ($RecordsRead - 1)};
  71         79  
  71         160  
4870            
4871             ## Delete any existing columns in the object.
4872 19         25 delete @$this{@{$this->fieldlist_all()}};
  19         64  
4873              
4874             ## Put the new columns into the object.
4875 19         96 @$this{@$FieldsToGet} = @$FieldVectors[@$FieldNums];
4876              
4877             ## Set fieldlist to fields we actually read, in order.
4878 19         60 $this->fieldlist($FieldsToGet);
4879              
4880             ## Remember the line ending char or chars that were successfully
4881             ## used to read the file. The same ending will be used by default
4882             ## to write any file based on this object.
4883 19         57 $this->{_LineEnding} = $this->lineending_symbol($LineEnding);
4884            
4885             ## Remember the field delimiter that was successfully used to read
4886             ## the file. The same delimiter will be used by default to write
4887             ## any file based on this object.
4888 19         66 $this->{_FDelimiter} = $FDelimiter;
4889              
4890             ## Remember the header row setting.
4891 19         29 $this->{_HeaderRow} = $HeaderRow;
4892              
4893             ## Remember the filename used for reading.
4894 19         30 $this->{_FileName} = $FileName;
4895              
4896             ## Remember whether we read (and maybe will cache) a subset of available fields.
4897 19   100     76 $this->{_Subset} = $GettingSubset || 0;
4898              
4899             ## Clean out _Selection and verify _SortOrder to ensure compatibility
4900             ## with current _FieldList.
4901 19         49 $this->read_postcheck();
4902              
4903             ## Other informational data and options, like sort specs, sort
4904             ## routines and so on, need not be changed or replaced when data
4905             ## changes.
4906              
4907 19         26 $Success = 1;
4908              
4909 19 50       40 done:
4910            
4911             $this->warn("FAILURE: $this->{_ErrorMsg}") unless $Success;
4912              
4913 19 50       281 close $File if $File;
4914 19         258 return($Success);
4915             }
4916              
4917             sub read_postcheck ## Called to clean up after a successful read
4918             {
4919 62     62 0 92 my $this = shift;
4920              
4921             ## Run select_all to empty out the _Selection.
4922 62         180 $this->select_all();
4923              
4924             ## Remove any bogus field names from the sort order, if any.
4925 62         167 $this->sortorder_check();
4926             }
4927              
4928             sub read_file_or_cache ## Read, cacheing if possible
4929             {
4930 62     62 0 91 my $this = shift;
4931 62 100       219 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
4932            
4933 62         117 my($FileName, $FieldList, $CacheOnRead, $CacheExtension, $CacheSubDir) = map {$this->getparam($Params, $_)}
  310         784  
4934             qw(_FileName _FieldList _CacheOnRead _CacheExtension _CacheSubDir);
4935              
4936 62         106 my $Success;
4937              
4938             ## If cacheing is turned off, just bail prematurely and treat this
4939             ## as a call to read_file().
4940              
4941 62 100       163 return($this->read_file(%$Params)) unless $CacheOnRead;
4942              
4943             ## Otherwise... check if cacheing is possible.
4944              
4945             ## Calculate the cache file name. If it comes back empty, it
4946             ## means the cache directory probably could not be created, or the
4947             ## cache file itself either does not exist or could not be
4948             ## preflighted (either read or touched/deleted).
4949              
4950 51         185 my $CacheFileName = $this->prep_cache_file($FileName, $CacheExtension, $CacheSubDir);
4951              
4952             ## If the cache file preflight failed, treat this as a regular
4953             ## read_file() without cacheing.
4954            
4955 51 50       141 return($this->read_file(%$Params)) unless length($CacheFileName);
4956            
4957             ## At this point we believe we'll either be able to read or write
4958             ## the cache file as needed.
4959              
4960             ## Try to read the cache if both files exist and the mod date is
4961             ## later.
4962              
4963 51         63 my $Data;
4964 51 100 66     2501 if ((-e $FileName ) &&
      100        
4965             (-e $CacheFileName ) &&
4966             (((stat($CacheFileName))[9]) > ((stat($FileName))[9])) )
4967             {
4968 46         205 $this->progress("Thawing $CacheFileName...");
4969 46         237 $Data = &retrieve($CacheFileName);
4970            
4971 46 50       4512 $this-warn("Cache restore from $CacheFileName failed: $!"), unlink($CacheFileName)
4972             unless defined ($Data);
4973             }
4974            
4975 51 100       217 if (ref($Data) eq 'HASH')
4976             {
4977             ## Retrieval succeeded.
4978            
4979             ## Verify that the data in the cache is usable.
4980              
4981             ## First, check newline-encoding compatibility.
4982             (
4983 46 50       135 $this->warn("Abandoning cache due to incompatible newline encoding"),
4984             unlink $CacheFileName, goto cache_failed) unless $Data->{_Newline} eq "\n";
4985            
4986             ## Simulate an actual read_file() using data
4987             ## from the cache instead of the original file.
4988              
4989 46 100 66     218 if ((ref($FieldList) eq 'ARRAY') && @$FieldList)
    100          
4990             {
4991             ## If $FieldList requests fields not found in the cache,
4992             ## we abandon (delete and maybe rewrite) the cache: maybe
4993             ## we previously cached a different subset of fields from
4994             ## a previous request and so the cache is no longer
4995             ## adequate.
4996              
4997 18         33 my $MissingFields = [grep {!exists($Data->{$_})} @$FieldList];
  48         134  
4998             (
4999             ## $this->warn("Abandoning cache due to change in requested field list"),
5000 18 100       79 unlink $CacheFileName, goto cache_failed) if @$MissingFields;
5001            
5002             ## If there was a _FieldList supplied in $Params or $this,
5003             ## we might need to omit any fields read from cache but
5004             ## not mentioned (just as read_file() would have done).
5005              
5006 16         113 my $FieldHash = {}; @$FieldHash{@$FieldList} = undef;
  16         53  
5007 16         61 my $OmitFields = [grep {!exists($FieldHash->{$_})} grep {!/^_/} keys %$Data];
  64         120  
  160         327  
5008            
5009 16         52 delete @$Data{@$OmitFields};
5010            
5011             ## Finally, also pare down _FieldList to only mention
5012             ## those fields that were desired....
5013            
5014 16         26 my $AvailFields = $Data->{_FieldList};
5015 16         27 $Data->{_FieldList} = [grep {exists($Data->{$_})} @$FieldList];
  42         105  
5016              
5017             ## We might have ended up reading a subset of the
5018             ## available fields in the cache. (This is logically
5019             ## equivalent to reading a subset of the available fields
5020             ## in the real file.) If so, then change the value of
5021             ## $Data->{_Subset} to indicate that.
5022              
5023 16   100     80 my $GettingSubset = (@$OmitFields && 1 || 0);
5024 16   66     92 $Data->{_Subset} ||= $GettingSubset;
5025             }
5026             elsif ($Data->{_Subset})
5027             {
5028              
5029             ## Conversely, if no field list was specified (hence all
5030             ## fields are desired), but a subset of fields has
5031             ## previously been cached, we have to delete / abandon the
5032             ## cache and re-read the file so we are getting all
5033             ## fields.
5034              
5035             (
5036             ## $this->warn("Abandoning partial cache due to request of full field list"),
5037 1         20 unlink $CacheFileName, goto cache_failed);
5038             }
5039              
5040             ## Copy all elements from $Data, including possibly overridden
5041             ## _FieldList element if any, but excepting the cache-only
5042             ## _Newline element, into $this.
5043              
5044 43         102 delete $Data->{_Newline};
5045 43         312 @$this{keys %$Data} = values %$Data;
5046              
5047             ## Set the file name to the name of the original (not the
5048             ## cache) file, just as read() would have done.
5049              
5050 43         125 $this->{_FileName} = $FileName;
5051              
5052             ## Run the "read post-check" -- the same things we do in
5053             ## read_file() after completing the read process: Clean out
5054             ## _Selection and verify _SortOrder to ensure compatibility
5055             ## with current _FieldList.
5056              
5057 43         176 $this->read_postcheck();
5058              
5059 43         148 $this->progress("Thawed $FileName.");
5060              
5061 43         100 $Success = 1;
5062 43         621 goto done; ## Successful completion: we read from the cache.
5063             }
5064              
5065             ## Could not retrieve for whatever reason (maybe cache did not
5066             ## exist yet or was out of date or had to be abandoned). So just
5067             ## read normally and possibly write the cache.
5068              
5069             cache_failed:
5070             {
5071 8 50       15 $Success = $this->read_file(%$Params) or goto done;
  8         70  
5072              
5073             ## Now, having read successfully, we try to write the cache
5074             ## for next time. Writing the cache is optional; failing to
5075             ## write it is not a failure of the method.
5076              
5077             { ## Code in this block may fail and that's OK.
5078              
5079             ## First, pre-flight.
5080 8 50       19 $this->warn("Cache file $CacheFileName cannot be created/overwritten: $!"),
  8         37  
5081             goto done ## Successful completion.
5082             unless $this->try_file_write($CacheFileName);
5083              
5084             ## The data to be stored is:
5085              
5086             ## 1) All data columns read by read_file()
5087             ## 2) Any parameters set by read_file()
5088             ## 3) _Subset param indicating partial fieldlist was read from file.
5089             ## 4) _Newline setting so we know if it's compatabile when read back.
5090              
5091             ## No other parameters should be cached because we want a
5092             ## read from the cache to produce exactly the same result
5093             ## as a read from the file itself would have produced.
5094              
5095             ## After a read, fieldlist() will contain the fields
5096             ## actually read, so cols_hash() WILL yield all the
5097             ## columns.
5098              
5099 8         39 my $Data = {(
5100             ## Refs to each column read by read_file()
5101             %{ $this->cols_hash() },
5102              
5103             ## Other parameters set by read_file()
5104             _FieldList => $this->{_FieldList },
5105             _LineEnding => $this->{_LineEnding},
5106             _FDelimiter => $this->{_FDelimiter},
5107             _HeaderRow => $this->{_HeaderRow },
5108             _Subset => $this->{_Subset },
5109 8         23 _Newline => "\n",
5110             )};
5111            
5112 8 50       48 $this->warn("Failed to cache $CacheFileName"),
5113             unlink($CacheFileName),
5114             goto done ## Successful completion.
5115             unless $this->write_cache($Data, $CacheFileName);
5116 8         235 chmod 0666, $CacheFileName; ## Liberal perms if possible.
5117             }
5118              
5119 8         73 goto done; ## Successful completion: we read from the file & maybe saved cache.
5120             }
5121            
5122             done:
5123 51         317 return ($Success);
5124             }
5125              
5126             =pod
5127              
5128             =head1 WRITING DATA FILES
5129              
5130             ## Writing some or all data from table into a data file
5131              
5132             $t->write($Path) ## Simple calling convention
5133              
5134             $t->write( ## Named-parameter convention
5135              
5136             ## Params that override params in the object if supplied...
5137              
5138             _FileName => $Path, ## "Base path"; see _WriteExtension
5139              
5140             _WriteExtension=> ".out",## Insert/append extension to _FileName
5141              
5142             _FieldList => [...], ## Fields to write; others ignored
5143             _Selection => [...], ## Record (#s) to write; others ignored
5144              
5145             _HeaderRow => 0, ## Include header row in file
5146              
5147             _LineEnding => undef, ## Record delimiter (default is "\n")
5148             _FDelimiter => undef, ## Field delimiter (default is comma)
5149              
5150             _ReturnMap => 1, ## Whether to encode internal returns
5151             _ReturnEncoding=>"\x0B", ## How to encode returns
5152             _MacRomanMap => undef, ## Whether/when to write Mac char set
5153              
5154              
5155             _CacheOnWrite => 1, ## Enable saving cache after write()
5156             _CacheExtension=> ".x", ## Extension to add to cache file name
5157             _CacheSubDir => "", ## (Sub-)dir, if any, for cache files
5158              
5159             ## Params specific to the read()/write() methods...
5160              
5161             _MaxRecords => 200, ## Limit on how many records to write
5162             )
5163              
5164             $t->write_file() ## Internal: same as write(); ignores cacheing
5165              
5166             write() writes a Merge, CSV, or Tab-delimited file.
5167              
5168             It uses parameters as described above. Any parameters not supplied
5169             will be gotten from the object.
5170              
5171             Using the simple calling convention, just pass it a path which will
5172             override the _FileName parameter in the object, if any.
5173              
5174             All other parameters will come from the object (or will be defaulted
5175             if absent).
5176              
5177             If no _FileName or path is specified, or it is the special string "-"
5178             (dash), then the file handle \ * STDOUT will be used by default (and
5179             you could redirect it to a file). You can supply any open file handle
5180             or IO::File object of your own for the _FileName parameter.
5181              
5182             If write() is writing to a file handle by default or because you
5183             specified one, then no write-cacheing will occur.
5184              
5185             To specify additional parameters or override any parameters in the
5186             object while reading, use the named-parameter calling convention.
5187              
5188             If the object's data was previously filled in using new() or read(),
5189             then the file format parameters from the previous read() method will
5190             still be in the object, so the format of the written file will
5191             correspond as much as possible to the file that was read().
5192              
5193             write() returns the path name of the file actually written, or the
5194             empty string if a supplied file handle or STDOUT was written to, or
5195             undef if there was a failure.
5196              
5197             If write() returns undef, then it will also have set the _ErrorMsg
5198             parameter in the object.
5199              
5200             write() never modifies any data in the object itself.
5201              
5202             Consequently, if you specify a _FieldList or a _Selection, only those
5203             fields or records will be written, but the corresponding parameters in
5204             the object itself will be left untouched.
5205              
5206             =head2 How write() calculates the Path
5207              
5208             The _FileName parameter is shared with the read() method. This
5209             parameter is set by read() and may be overridden when calling write().
5210              
5211             In the base implementation of Data::CTable, write() will try not to
5212             overwrite the same file that was read, which could possibly cause data
5213             loss.
5214              
5215             To avoid this, it does not use the _FileName parameter directly.
5216             Instead, it starts with _FileName and inserts or appends the value of
5217             the _WriteExtension parameter (which defaults to ".out") into the file
5218             name before writing.
5219              
5220             If the _FileName already has an extension at the end, write() will
5221             place the _WriteExtension BEFORE the final extension; otherwise the
5222             _WriteExtension will be placed at the end of the _FileName.
5223              
5224             For example:
5225              
5226             Foobar.txt ==> Foobar.out.txt
5227             Foobar.merge.txt ==> Foobar.merge.out.txt
5228             My_Merge_Data ==> My_Merge_Data.out
5229              
5230             If you DON'T want write() to add a _WriteExtension to _FileName before
5231             it writes the file, then you must set _WriteExtension to empty/undef
5232             either in the object or when calling write(). Or, you could make a
5233             subclass that initializes _WriteExtension to be empty. If
5234             _WriteExtension is empty, then _FileName will be used exactly, which
5235             may result in overwriting the original data file.
5236              
5237             Remember: write() returns the path name it actually used to
5238             successfully write the file. Just as with read(), if the _FileName
5239             you specified did not have a path, then write() will prepend a path
5240             component indicating "current directory" (e.g. "./" on Unix) and this
5241             will be part of the return value.
5242              
5243              
5244             =head2 Cacheing with write()
5245              
5246             By default, Data::CTable only creates a cached version of a file when
5247             it reads that file for the first time (on the assumption that it will
5248             need to read the file again more often than the file's data will
5249             change.)
5250              
5251             But by default, it does not create a cached version of a file when
5252             writing it, on the assumption that the current program probably will
5253             not be re-reading the written file and any other program that wants to
5254             read it can cache it at that time.
5255              
5256             However, if you want write() to create a cache for its output file, it
5257             is much faster to create it on write() than waiting for the next
5258             read() because the next read() will be able to use the cache the very
5259             first time.
5260              
5261             To enable write-cacheing, set _CacheOnWrite to true. Then, after the
5262             write() successfully completes (and only if it does), the cached
5263             version will be written.
5264              
5265             =head1 FORMATTED TABLES (Using Data::ShowTable)
5266              
5267             ## Get formatted data in memory
5268              
5269             my $StringRef = $t->format(); ## Format same data as write()
5270             my $StringRef = $t->format(10); ## Limit records to 10
5271             my $StringRef = $t->format(...); ## Specify arbitrary params
5272             print $$StringRef;
5273              
5274             ## Write formatted table to file or terminal
5275              
5276             $t->out($Dest, ....);## $Dest as follows; other params to format()
5277             $t->out($Dest, 10, ....) ## Limit recs to 10; params to format()
5278              
5279             $t->out() ## print formatted data to STDOUT
5280             $t->out(\*STDERR) ## print to STDERR (or any named handle)
5281             $t->out("Foo.txt") ## print to any path (file to be overwritten)
5282             $t->out($FileObj) ## print to any object with a print() method
5283              
5284             out() takes a first argument specifying a destination for the output,
5285             then passes all other arguments to format() to create a nice-looking
5286             table designed to be human-readable; it takes the resulting buffer and
5287             print()s it to the destination you specified.
5288              
5289             Sample output:
5290              
5291             +-------+------+-----+-------+
5292             | First | Last | Age | State |
5293             +-------+------+-----+-------+
5294             | Chris | Zack | 43 | CA |
5295             | Marco | Bart | 22 | NV |
5296             | Pearl | Muth | 15 | HI |
5297             +-------+------+-----+-------+
5298              
5299             (Note extra space character before each line.)
5300              
5301             The destination may be a file handle (default if undef is \*STDOUT), a
5302             string (treated as a path to be overwritten), or any object that has a
5303             print() method, especially an object of type IO::File.
5304              
5305             The main purpose of out() is to give you a quick way to dump a table
5306             when debugging. out() calls format() to create the output, so read
5307             on...
5308              
5309             format() produces a human-readable version of a table, in the form of
5310             a reference to a string buffer (which could be very large), and
5311             returns the buffer to you. Dereference the resulting string reference
5312             before using.
5313              
5314             If format() is given one argument, that argument is the _MaxRecords
5315             parameter, which limits the length of the output.
5316              
5317             Otherwise, format() takes the following named-parameter arguments,
5318             which can optionally override the corresponding parameters, if any, in
5319             the object:
5320              
5321             _FieldList ## Fields to include in table
5322             _Selection ## Records to be included, in order
5323              
5324             _SortSpecs ## SortType controls number formatting
5325             _DefaultSortType
5326              
5327             _MaxRecords ## Limit number of records output
5328              
5329             _MaxWidth ## Limit width of per-col. data in printout
5330              
5331             format() will obey _MaxRecords, if you'd like to limit the number of
5332             rows to be output. _MaxRecords can also be a single argument to
5333             format(), or a second argument to out() if no other parameters are
5334             passed.
5335              
5336             format() also recognizes the _SortSpecs->{SortType} and
5337             _DefaultSortType parameters to help it determine the data types of the
5338             fields being formatted. Fields of type "Number" are output as
5339             right-justified floats; "Integer" or "Boolean" are output as
5340             right-justified integers, and all others (including the default:
5341             String) are output as left-justified strings.
5342              
5343             In addition, there is one parameter uniquely supported by format() and
5344             out():
5345              
5346             =over 4
5347              
5348             =item _MaxWidth ||= 15;
5349              
5350             =back
5351              
5352             _MaxWidth specifies the maximum width of columns. If unspecifed, this
5353             will be 15; the minimum legal value is 2. Each column may actually
5354             take up 3 more characters than _MaxWidth due to divider characters.
5355              
5356             The data to be output will be examined, and only the necessary width
5357             will be used for each column. _MaxWidth just limits the upper bound,
5358             not the lower.
5359              
5360             Data values that are too wide to fit in _MaxWidth spaces will be
5361             truncated and the tilde character "~" will appear as the last
5362             character to indicate the truncation.
5363              
5364             Data values with internal returns will have the return characters
5365             mapped to slashes for display.
5366              
5367             format() and out() will NOT wrap entries onto a second line,
5368             like you may have seen Data::ShowTable::ShowBoxTable do in some cases.
5369             Each record will get exactly one line.
5370              
5371             format() and out() ignore the _HeaderRow parameter. A header
5372             row showing the field names is always printed.
5373              
5374             format() and out() make no attempt to map upper-ascii characters from
5375             or to any particular dataset. The encoding used in memory (generally
5376             ISO 8859-1 by default) is the encoding used in the output. If you
5377             want to manipulate the encoding, first call format(), then change the
5378             encoding, then format the resulting table.
5379              
5380             =cut
5381              
5382             sub write ## Write, cacheing afterward if possible
5383             {
5384 8     8 1 122 my $this = shift;
5385 8         26 return($this->write_file_and_cache(@_));
5386             }
5387              
5388             sub write_file_and_cache ## Write, cacheing afterward if possible
5389             {
5390 8     8 0 14 my $this = shift;
5391              
5392 8 50       34 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
5393            
5394 8         20 my($FieldList, $LineEnding, $FDelimiter, $HeaderRow, $CacheOnWrite, $CacheExtension, $CacheSubDir) = map {$this->getparam($Params, $_)}
  56         99  
5395             qw($FieldList _LineEnding _FDelimiter _HeaderRow _CacheOnWrite _CacheExtension _CacheSubDir);
5396              
5397             ## First write the file and go to done if it failed.
5398 8 50       36 my $WriteFileName = $this->write_file(@_) or goto done;
5399              
5400             ## Only try to cache if we got a non-empty $WriteFileName back.
5401             ## We won't get a name back in the case where we wrote directly to
5402             ## an open file handle.
5403            
5404 8 50       21 goto done unless $WriteFileName;
5405            
5406             ## Only try to cache if $CacheOnWrite has been turned ON.
5407 8 50       566 goto done unless $CacheOnWrite;
5408              
5409             ## Now, having written successfully, we try to write the cache for
5410             ## next time. Writing the cache is always optional; failing to
5411             ## write it is not a failure of this method. Consequently, any
5412             ## "goto done" statements beyond this point will still result in a
5413             ## successful outcome since $WriteFileName will have a value.
5414              
5415             ## Calculate the name of the cache file and fail the directory
5416             ## creation fails. prep_cache_file will have generated a warning
5417             ## if an attempt to create needed subdirectories has failed.
5418              
5419 0 0       0 my $CacheFileName = $this->prep_cache_file($WriteFileName, $CacheExtension, $CacheSubDir)
5420             or goto done;
5421            
5422             ## Pre-flight the cache file for writing.
5423 0 0       0 $this->warn("Cache file $CacheFileName cannot be created/overwritten: $!"),
5424             goto done ## Successful completion.
5425             unless $this->try_file_write($CacheFileName);
5426              
5427             ## The data to be stored is:
5428            
5429             ## 1) All data columns written by write_file()
5430             ## 2) Any file format parameters used by write_file()
5431              
5432             ## Calculate the main writing-related parameters using the same
5433             ## logic that write_file() uses...
5434              
5435             ## Default for FieldList is all fields.
5436 0   0     0 $FieldList ||= $this->fieldlist();
5437            
5438             ## Convert from optional "dos", "mac", "unix" symbolic values.
5439 0         0 $LineEnding = $this->lineending_string($LineEnding);
5440            
5441             ## Default for LineEnding is "\n" (CR on Mac; LF on Unix; CR/LF on DOS)
5442 0 0       0 $LineEnding = "\n" unless length($LineEnding);
5443            
5444             ## Default for FDelimiter is comma
5445 0 0       0 $FDelimiter = ',' unless length($FDelimiter);
5446              
5447             ## Default for HeaderRow is true.
5448 0 0       0 $HeaderRow = 1 unless defined($HeaderRow);
5449              
5450             ## No other parameters should be cached because we want a
5451             ## read from the cache to produce exactly the same result
5452             ## as a read from the file itself would have produced.
5453            
5454 0         0 my $Data = {(
5455             ## Refs to each column written
5456 0   0     0 %{ $this->cols_hash($FieldList)},
5457            
5458             ## Other relevant file-format parameters
5459             _FieldList => $FieldList,
5460             _LineEnding => $LineEnding,
5461             _FDelimiter => $FDelimiter,
5462             _HeaderRow => $HeaderRow,
5463             _Subset => $this->{_Subset} || 0,
5464             _Newline => "\n",
5465            
5466             ## We don't need to save _ReturnMap and
5467             ## _ReturnEncoding because those only are relevant
5468             ## when reading physical files. Cached data has the
5469             ## return chars already encoded as returns.
5470            
5471             )};
5472            
5473 0 0       0 $this->warn("Failed to cache $CacheFileName"),
5474             unlink($CacheFileName), ## Delete cache if failure
5475             goto done ## Successful completion.
5476             unless $this->write_cache($Data, $CacheFileName);
5477 0         0 chmod 0666, $CacheFileName; ## Liberal perms if possible.
5478            
5479 8         36 done:
5480             return($WriteFileName);
5481             }
5482              
5483             sub write_cache
5484             {
5485 8     8 0 17 my $this = shift;
5486 8         15 my ($Data, $CacheFileName) = @_;
5487            
5488 8         34 $this->progress("Storing $CacheFileName...");
5489            
5490 8         42 my $Success = nstore($Data, $CacheFileName);
5491            
5492 8 50       1641 $this->progress("Stored $CacheFileName.") if $Success;
5493            
5494 8         36 done:
5495             return($Success);
5496             }
5497              
5498             sub write_file ## Just write; don't worry about cacheing
5499             {
5500 8     8 0 14 my $this = shift;
5501 8 50       31 my $Params = (@_ == 1 ? {_FileName => $_[0]} : {@_});
5502              
5503 96         267 my($FileName, $FieldList, $Selection, $MaxRecords, $LineEnding, $FDelimiter, $QuoteFields, $ReturnMap, $ReturnEncoding, $MacRomanMap, $HeaderRow, $WriteExtension)
5504 8         23 = map {$this->getparam($Params, $_)}
5505             qw(_FileName _FieldList _Selection _MaxRecords _LineEnding _FDelimiter _QuoteFields _ReturnMap _ReturnEncoding _MacRomanMap _HeaderRow _WriteExtension);
5506              
5507 8         17 my $Success;
5508            
5509 8         27 $this->{_ErrorMsg} = "";
5510              
5511             ## if FileName is unspecified, or is the single character "-",
5512             ## then default to STDOUT.
5513 8 50       42 $FileName = \ *STDOUT if ($FileName =~ /^-?$/);
5514            
5515             ## If we have a regular file handle, bless it into IO::File.
5516 8 50       30 $FileName = bless ($FileName, 'IO::File') if ref($FileName) =~ /(HANDLE)|(GLOB)/;
5517            
5518             ## If we have a file handle either passed or constructed, make note of that fact.
5519 8         18 my $GotHandle = ref($FileName) eq 'IO::File';
5520            
5521 8 50 33     6343 $this->{_ErrorMsg} = "FileName must be specified for write()", goto done
5522             unless $GotHandle or length($FileName);
5523            
5524             ## Default for FieldList is all fields.
5525 8   33     20 $FieldList ||= $this->fieldlist();
5526            
5527             ## Default for Selection is all records.
5528 8   33     38 $Selection ||= $this->selection();
5529              
5530             ## Default for MaxRecords is 0 (meaning write all records)
5531 8 50       27 $MaxRecords = 0 unless (int($MaxRecords) == $MaxRecords);
5532              
5533             ## Convert from optional "dos", "mac", "unix" symbolic values.
5534 8         20 $LineEnding = $this->lineending_string($LineEnding);
5535              
5536             ## Default for LineEnding is "\n" (CR on Mac; LF on Unix; CR/LF on DOS)
5537 8 50       30 $LineEnding = "\n" unless length($LineEnding);
5538              
5539             ## Default for FDelimiter is comma
5540 8 50       22 $FDelimiter = ',' unless length($FDelimiter);
5541              
5542             ## Default for QuoteFields is undef (auto)
5543 8 50       81 $QuoteFields = undef unless defined($QuoteFields);
5544              
5545             ## "QuoteCheck" mode means check each field -- this is the "auto"
5546             ## mode that kicks in when _QuoteFields is undef.
5547 8         13 my $QuoteCheck = (!defined($QuoteFields));
5548            
5549             ## Default for ReturnMap is true.
5550 8 50       22 $ReturnMap = 1 unless defined($ReturnMap);
5551              
5552             ## Default for MacRomanMap is undef ("Auto");
5553 8 50       17 $MacRomanMap = undef unless defined($MacRomanMap);
5554              
5555             ## DoMacMapping is the actual setting for auto charset mapping
5556 8   66     54 my $DoMacMapping =
5557             ((!defined($MacRomanMap) && ($LineEnding eq "\x0D")) || ## Auto
5558             ($MacRomanMap)); ## On
5559            
5560 8 100       28 $this->progress("Will convert upper-ascii characters if any, from ISO-8859-1 to Mac Roman.") if $DoMacMapping;
5561              
5562             ## Default for ReturnEncoding is "\x0B" (control-K; ASCII 11)
5563 8 50       22 $ReturnEncoding = "\x0B" unless length($ReturnEncoding);
5564              
5565             ## Default for HeaderRow is true.
5566 8 50       17 $HeaderRow = 1 unless defined($HeaderRow);
5567              
5568             ## Default for $WriteExtension is "" (none) -- meaning use exact $FileName
5569 8 50       18 $WriteExtension = "" unless defined($WriteExtension);
5570              
5571             ## Get a hash of fields actually present...
5572              
5573 8         22 my $AllFields = $this->fieldlist_all();
5574 8         14 my $AllFieldsHash = {}; @$AllFieldsHash{@$AllFields} = undef;
  8         32  
5575              
5576             ## Cull $FieldList to only include fields we have...
5577              
5578 8         18 $FieldList = [grep {exists($AllFieldsHash->{$_})} @$FieldList];
  33         74  
5579            
5580             ## Ensure $Selection contains only valid record numbers...
5581              
5582 8         31 $Selection = $this->selection_validate_internal($Selection);
5583              
5584             ## Get an ordered list of the columns.
5585 8         34 my $Columns = [@$this{@$FieldList}];
5586              
5587             ## Calculate the name of the file we'll write to, if any, and get
5588             ## the file handle either from the one we were given or by opening
5589             ## the specified file for writing.
5590              
5591 8         12 my $WriteFileName;
5592             my $OutFile;
5593              
5594 8 50       26 if ($GotHandle)
5595             {
5596 0         0 $WriteFileName = "";
5597 0         0 $OutFile = $FileName; ## Actually a handle.
5598             }
5599             else
5600             {
5601             ## Calculate the name of the file we're going to try to write to.
5602 8 50       36 $WriteFileName =
5603            
5604             ## Use file name exactly if specified in write() call.
5605             ($Params->{_FileName} ? $FileName :
5606            
5607             ## Otherwise, calculate the name by adding/appending the _WriteExtension.
5608             $this->write_file_name($FileName, $WriteExtension));
5609            
5610             ## Ensure the directory that will hold the file actually exists.
5611 1     1   7 use File::Basename qw(fileparse);
  1         2  
  1         212  
5612 8         182 my ($Basename, $Path, $Ext) = fileparse($WriteFileName, '\.[^\.]+');
5613 8         18 my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
  8         17  
5614 8   33     26 $Path ||= $Cur; ## Once again, default $Path to cwd just in case.
5615            
5616 8 50       24 $this->{_ErrorMsg} = "Can't make directory $Path to save $WriteFileName: $!", goto done
5617             unless $this->verify_or_create_path($Path, $Sep);
5618            
5619             ## Ensure the directory is writeable and the file is overwriteable
5620             ## if it exists.
5621 8 50       93 $this->{_ErrorMsg} = "Directory $Path is not writeable.", goto done
5622             unless (-w $Path);
5623            
5624 8 50 33     206 $this->{_ErrorMsg} = "File $WriteFileName cannot be overwritten.", goto done
5625             if (-e $WriteFileName && !(-w $WriteFileName));
5626            
5627             ## Open the file for write.
5628            
5629 1     1   6 use IO::File;
  1         2  
  1         1613  
5630 8         60 $OutFile = IO::File->new(">$WriteFileName");
5631 8 50       983 $this->{_ErrorMsg} = "Failed to open $WriteFileName for writing: $!", goto done
5632             unless $OutFile;
5633            
5634 8         37 $this->progress("Writing $WriteFileName...");
5635             }
5636              
5637             ## Figure out the line initiator & ender strings, and delimiter sequence.
5638              
5639 8         12 my ($LineStartQuote, $Delim, $LineEndQuote);
5640 8 50       17 if ($QuoteFields)
5641             {
5642             ## In "forced" quote mode, we just do the quoting by always
5643             ## putting them at the start and end of lines and in between
5644             ## each field.
5645              
5646 0         0 $LineStartQuote = "\"";
5647 0         0 $LineEndQuote = "\"";
5648 0         0 $Delim = "\"$FDelimiter\"";
5649             }
5650             else
5651             {
5652             ## In no-quote or auto-quote mode, we don't put the quotes in
5653             ## these places; they'll either be omitted entirely or
5654             ## inserted per-field.
5655              
5656 8         59 $LineStartQuote = '';
5657 8         82 $LineEndQuote = '';
5658 8         14 $Delim = $FDelimiter;
5659             }
5660              
5661             ## Precompile a regex that checks for quotes, the field delimiter
5662             ## sequence, the line ending sequence, or any line-ending-ish
5663             ## characters at all. In $QuoteCheck mode, we'll use this to
5664             ## identify fields needing double-quotes.
5665              
5666 8 50       211 my $QuoteOrDelimCheck = qr{(?:\Q$LineEnding\E)|(?:\Q$FDelimiter\E)|[\"\x0D\x0A]} if $QuoteCheck;
5667              
5668             ## Precompile a return-map-checking regex that checks first for
5669             ## the line ending actively in use and then for the platform's
5670             ## "\n". These may be the same, but are not necessarily always
5671             ## the same. We don't return-map any old \x0D or \x0A because,
5672             ## for example in a pure Unix world, \x0D would not be interpreted
5673             ## as line-ending-related and hence is a valid character in a
5674             ## field.
5675              
5676 8 50       125 my $ReturnMapCheck = qr{(?:\Q$LineEnding\E)|(?:\n)} if $ReturnMap;
5677              
5678             ## Print out the header row that lists the field names in order.
5679            
5680 8 50       27 if ($HeaderRow)
5681             {
5682 33         62 my $Line = ($LineStartQuote .
5683             join($Delim,
5684             map
5685             {
5686             ## Quote any " character as ""
5687 8         19 (my $X = $_) =~ s/\"/\"\"/g;
5688            
5689             ## In QuoteCheck mode _QuoteFields =>
5690             ## undef ("auto"): put quotes around
5691             ## field only if required.
5692            
5693 33 50 33     258 $X = "\"$X\"" if $QuoteCheck && $X =~ $QuoteOrDelimCheck;
5694            
5695             ## Convert returns back to \x0B
5696 33 50       142 $X =~ s/$ReturnMapCheck/$ReturnEncoding/g if $ReturnMap;
5697            
5698 33         105 $X;
5699             } @$FieldList) .
5700             $LineEndQuote .
5701             $LineEnding);
5702            
5703             ## Maybe convert entire line (all records) Mac to ISO before writing it.
5704 8 100       41 &ISORoman8859_1ToMacRoman(\ $Line) if $DoMacMapping;
5705            
5706 8 50       55 $OutFile->print($Line) if $HeaderRow;
5707             }
5708            
5709             ## Print out each row (record). Fields are output in $FieldList
5710             ## order (same order as they were in the header row, if any).
5711             ## Records are printed in the order specified in $Selection.
5712            
5713 8         114 my $WroteProg;
5714 8         16 my $TotalLen = @$Selection+0;
5715 8 50       16 my $RecordsToWrite = ($MaxRecords ? min($this, $MaxRecords, $TotalLen) : $TotalLen);
5716 8         9 my $RecordsWritten = 0;
5717              
5718 8         15 foreach my $i (@$Selection)
5719             {
5720 99         175 my $Line = ($LineStartQuote .
5721             join($Delim,
5722             map
5723             {
5724             ## Quote any " character as ""
5725 24         42 (my $X = $_->[$i]) =~ s/\"/\"\"/g;
5726            
5727             ## In QuoteCheck mode _QuoteFields =>
5728             ## undef ("auto"): put quotes around
5729             ## field only if required.
5730            
5731 99 50 33     713 $X = "\"$X\"" if $QuoteCheck && $X =~ $QuoteOrDelimCheck;
5732            
5733             ## Convert returns back to \x0B
5734 99 50       369 $X =~ s/$ReturnMapCheck/$ReturnEncoding/g if $ReturnMap;
5735            
5736 99         252 $X;
5737             } @$Columns) .
5738             $LineEndQuote .
5739             $LineEnding);
5740            
5741             ## Maybe convert entire line (all records) ISO to Mac before writing it.
5742 24 100       79 &ISORoman8859_1ToMacRoman(\ $Line) if $DoMacMapping;
5743            
5744 24         70 $OutFile->print($Line);
5745            
5746 24         117 $RecordsWritten++;
5747              
5748             ## Try doing timed (throttled to 1 per 2 secs) progress at
5749             ## most every 100th record.
5750 24 50       54 my $Did = $this->progress_timed("Writing", $RecordsWritten, $RecordsWritten, $RecordsToWrite, 1)
5751             if (($RecordsWritten % 100) == 0);
5752 24   33     82 $WroteProg ||= $Did;
5753            
5754             ## Stop if we have written all the records we wanted.
5755 24 100       65 last if ($RecordsWritten >= $RecordsToWrite);
5756             }
5757            
5758             ## If we wrote timed progress but didn't get to give the 100%
5759             ## message yet, print the 100% message now.
5760 8 50       18 if ($WroteProg)
5761             {
5762 0 0       0 my $FinalProg = $this->progress_timed("Writing", $RecordsWritten, $RecordsWritten, $RecordsToWrite, 1)
5763             unless (($RecordsWritten % 100) == 0);
5764             }
5765            
5766             ## Print the regular Done message.
5767 8 50       20 if ($GotHandle)
5768             {
5769 0         0 $this->progress("Done writing.");
5770 0         0 $Success = 1;
5771             }
5772             else
5773             {
5774 8         33 $this->progress("Wrote $WriteFileName.");
5775             }
5776              
5777 8 50       29 if (!$GotHandle)
5778             {
5779             ## Close the file and check the exit code.
5780 8         36 $OutFile->close();
5781 8         425 $Success = (($?>>8) == 0);
5782            
5783 8 50       27 $this->{_ErrorMsg} = "Unexpected failure writing $WriteFileName ($!)", goto done
5784             unless $Success;
5785             }
5786              
5787             done:
5788 8 50       22 $this->warn("FAILURE: $this->{_ErrorMsg}") unless $Success;
5789              
5790 8 50       105 return($Success ? $WriteFileName : undef);
5791             }
5792              
5793             sub write_file_name ## Calculate the name of the file to be written.
5794             {
5795 8     8 0 15 my $this = shift;
5796 8         14 my ($FileName, $WriteExtension) = @_;
5797              
5798             ## Break the path into its parts...
5799 1     1   6 use File::Basename qw(fileparse);
  1         2  
  1         4249  
5800 8         246 my ($Basename, $Path, $Ext) = fileparse($FileName, '\.[^\.]+');
5801              
5802             ## If no directory is explicitly named, set $Path to be the
5803             ## implicit "current directory" (e.g. "./")
5804            
5805 8         19 my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
  8         21  
5806 8   33     35 $Path ||= $Cur;
5807            
5808             ## If $WriteExtension is empty, which is allowed, then the result
5809             ## will be the same as $FileName, which could in some cases result
5810             ## in the overwriting of the same file that was read in (and may
5811             ## be what is intended).
5812              
5813 8         19 my $WriteFileName = "$Path$Basename$WriteExtension$Ext";
5814              
5815 8         24 return($WriteFileName);
5816             }
5817              
5818             sub out
5819             {
5820 0     0 0 0 my $this = shift;
5821 0         0 my $Dest = shift;
5822              
5823 0         0 my $Success;
5824              
5825             ## First do the formatting (or fail) -- format() will warn if needed.
5826 0 0       0 my $Data = $this->format(@_) or goto done;
5827              
5828             ## If $Dest is empty or not defined, use STDOUT.
5829 0   0     0 $Dest ||= \*STDOUT;
5830            
5831             ## If given an IO handle (such as \*STDERR), bless and use it.
5832 0 0       0 if (ref($Dest) eq 'HANDLE') {$Dest = bless($Dest , 'IO::File')};
  0         0  
5833            
5834             ## If it is not an object, treat it as a file name to be opened.
5835 0 0 0     0 if (!ref($Dest)) {$Dest = (IO::File->new(">$Dest") or
  0         0  
5836             $this->warn("Can't open file $Dest: $!"), goto done)};
5837              
5838             ## At this point treat $Dest as an object with a print method and
5839             ## complain if the print method doesn't return a true value.
5840            
5841 0 0       0 $Dest->print($$Data) or $this->warn("Had trouble writing file: $!"), goto done;
5842            
5843 0         0 $Success = 1;
5844 0         0 done:
5845             return($Success);
5846             }
5847              
5848             sub format ## use Data::ShowTable to format the table in a pretty way.
5849             {
5850 1     1 1 3 my $this = shift;
5851 1 50       7 my $Params = (@_ == 1 ? {_MaxRecords => $_[0]} : {@_});
5852              
5853 1         3 my($Selection, $FieldList, $SortSpecs, $DefaultSortType, $MaxRecords, $MaxWidth) = map {$this->getparam($Params, $_)}
  6         24  
5854             qw(_Selection _FieldList _SortSpecs _DefaultSortType _MaxRecords _MaxWidth);
5855              
5856             ## This method relies on Data::ShowTable.
5857 1 50       7 $this->warn("@{[__PACKAGE__]}::show() requires optional Data::ShowTable module."), goto done
  1         10  
5858             unless $HaveShowTable;
5859            
5860 0   0     0 $FieldList ||= $this->fieldlist();
5861 0   0     0 $Selection ||= $this->selection();
5862 0   0     0 $SortSpecs ||= {};
5863 0 0       0 $DefaultSortType = 'String' unless (length($DefaultSortType));
5864 0   0     0 $MaxRecords ||= 0; ## Default is no maximum (all records).
5865 0   0     0 $MaxWidth ||= 15; ## Zero or undef means use default.
5866 0         0 $MaxWidth = max(2, $MaxWidth); ## MaxWidth must not be less than 2
5867            
5868 0         0 my $TypeMap = {qw(string char
5869             text text
5870             integer int
5871             number numeric
5872             boolean int)};
5873              
5874 0 0 0     0 my $Types = [map {$TypeMap->{lc(@{$SortSpecs->{$_} || {}}{SortType} ||
  0         0  
5875             $DefaultSortType) } || 'string'}
5876             @$FieldList];
5877              
5878 0         0 my $TotalLen = @$Selection+0;
5879 0 0       0 my $RecordsToWrite = ($MaxRecords ? min($this, $MaxRecords, $TotalLen) : $TotalLen);
5880              
5881             ## The row-yielder subroutine and its private state variables.
5882 0         0 my $SelNum = 0;
5883             my $RowSub = sub ## A closure over the local vars in this subroutine.
5884             {
5885             ## We might be asked to rewind.
5886 0     0   0 my ($Rewind) = @_;
5887 0 0       0 $SelNum = 0, return(1) if ($Rewind);
5888            
5889             ## Done if we've written all rows.
5890 0 0       0 return() if $SelNum >= $RecordsToWrite;
5891            
5892             ## Otherwise, yield a row if we still can.
5893 0 0       0 my $List = [map
5894             {
5895             ## Truncate as needed.
5896 0         0 my $X = (length > $MaxWidth ?
5897             (substr($_, 0, ($MaxWidth - 1)) . '>') : $_);
5898              
5899             ## Encode returns, tabs as carets.
5900 0         0 $X =~ s{(?:\x0D\x0A)|[\x0D\x0A\x09]}{^}g;
5901            
5902 0         0 $X;
5903             }
5904 0         0 @{$this->row_list($Selection->[$SelNum++], $FieldList)}];
5905            
5906 0         0 return(@$List);
5907 0         0 };
5908              
5909             ## Locally replace put() and out() in Data::ShowTable so we can
5910             ## gather the data into memory instead of having it go right out
5911             ## to STDOUT before we may want it to.
5912              
5913             ## Too bad Data::ShowTable is not a subclassable object instead.
5914              
5915 0         0 my $Data = [""]; ## Array to hold output from out() and put()
5916             {
5917 0         0 local *{Data::ShowTable::out} = sub ## See sub out in ShowTable.pm
5918             {
5919 0     0   0 my $fmt = shift;
5920 0 0       0 $fmt .= "\n" unless $fmt =~ /\n$/;
5921 0         0 $Data->[-1] .= sprintf($fmt, @_);
5922 0         0 push @$Data, "";
5923 0         0 };
5924            
5925             local *{Data::ShowTable::put} = sub ## See sub put in ShowTable.pm
5926             {
5927 0     0   0 my $fmt = shift();
5928 0         0 $Data->[-1] .= sprintf($fmt, @_);
5929 0         0 };
5930            
5931 0         0 &ShowBoxTable({titles => $FieldList,
5932             types => $Types,
5933             row_sub => $RowSub,
5934             widths => [], ## Will calculate from the data
5935             });
5936             }
5937            
5938             ## Remove spurious extra newline entries at end of $Data
5939 0 0       0 pop @$Data if $Data->[-1] =~ /^\s*$/s;
5940 0 0       0 pop @$Data if $Data->[-1] =~ /^\s*$/s;
5941              
5942 0         0 my $Formatted = join("", @$Data);
5943              
5944 1         7 done:
5945             return(\ $Formatted);
5946             }
5947              
5948              
5949             =pod
5950              
5951             =head1 APPENDING / MERGING / JOINING TABLES
5952              
5953             ## Append all records from a second table
5954              
5955             $t->append($Other) ## Append records from $Other
5956             $t->append_file($File, $Params) ## Append from new($Params, $File)
5957             $t->append_files($Files, $Params) ## Call append_file for all files
5958             $t->append_files_new($Files, $Params) ## Internal helper routine
5959              
5960             ## Combine all fields from a second table
5961              
5962             $t->combine($Other) ## Combine fields from $Other
5963             $t->combine_file($File, $Params) ## Combine new($Params, $File)
5964             $t->combine_files($Files, $Params) ## combine_file on each file
5965              
5966             ## Left-join records from a second table (lookup field vals)
5967              
5968             $t->join ($Other, $KeyField1, [$KeyField2, $Fields])
5969             $t->join_file ($File, $Params, $KeyField1, [$KeyField2, $Fields])
5970             $t->join_files($Files, $Params, $KeyField1, [$KeyField2, $Fields])
5971              
5972             The append() method concatenates all the records from two CTable
5973             objects together -- even if the two tables didn't start out with
5974             exactly the same fields (or even any of the same fields).
5975              
5976             It takes all the data records from another CTable object and appends
5977             them into the present table. Any columns present in the $Other table
5978             but not in the first table, are created (and the corresponding field
5979             values in the first table will all be empty/undef). Similarly, any
5980             columns present in $t but not present in $Other will be extended
5981             to the correct new length as necessary and the field values in the
5982             original columns will be empty/undef. Columns present in both will,
5983             of course, have all the data from both the original sets of data.
5984              
5985             All data from the second table is brought into the first one. No
5986             attempt whatsoever is made to eliminate any duplicate records that
5987             might result.
5988              
5989             The number of records (length()) after this call is the sum of the
5990             length() of each of the tables before the operation.
5991              
5992             IMPORTANT NOTE: The data from the $Other object is COPIED in memory
5993             into the new object. This could be hard on memory if $Other is big.
5994             Might want to be sure to discard $Other when you're done with it.
5995              
5996             $Other is left untouched by the operation.
5997              
5998             All columns from both tables are combined whether or not they are
5999             mentioned in the custom field list of either.
6000              
6001             The custom field lists, if present in either table object, are
6002             concatenated into this object's custom field list, but with
6003             duplications eliminated, and order retained.
6004              
6005             Any existing custom selections, custom sort order, sort specs, and/or
6006             sort routines are also combined appropriately, with settings from this
6007             object taking precedence over those from $Other anywhere the two have
6008             conflicting settings.
6009              
6010             append_file() takes a file name and optional $Params hash. It uses
6011             those to create a new() object with data read from the file. Then,
6012             the new table is appended to $t using append() and then the new table
6013             is discarded.
6014              
6015             append_files() is a convenience function that calls append_file() on
6016             each file in a list, using the same optional $Params for each.
6017              
6018             append_files_new() is the internal routine that implements the
6019             processing done by new() on the optional list of files to be read. It
6020             does the following: It calls read() on the first file in the list.
6021             Then, it calls append_files() to read the remaining into their own
6022             new() objects of the same class as $t and using the same $Params to
6023             new() (if any were supplied). Then each of these is append()-ed in
6024             turn to $t and discarded. The final result will be that $t will hold
6025             a concatenation of all the data in all the files mentioned. However,
6026             consistent with the behavior of append(), the _FileName parameter and
6027             other read()-controlled settings will correspond to the first file
6028             read. The intermediate objects are discarded.
6029              
6030             NOTE: As with new() and read(), if a non-empty _FieldList Param is
6031             specified, the read() methods called internally by the append_file*()
6032             methods will read only the fields mentioned and will ignore any other
6033             fields in the files.
6034              
6035             =head2 Combining tables
6036              
6037             combine() adds columns from a second table into the current one.
6038              
6039             CAUTION: You should only use combine() when you have two tables where
6040             all the (possibly selected) records in the second table line up
6041             perfectly with all the (unselected) records in the first table -- in
6042             other words, each table before combine() should contain a few of the
6043             columns of the new table -- for example, maybe one table contains a
6044             column of file names, and the other contains columns of corresponding
6045             file sizes and modification times. If you don't understand the
6046             consequences of combine, don't use it or you could end up with some
6047             records whose field values don't refer to the same object. (Maybe you
6048             meant to use append() or join() instead.)
6049              
6050             If the second table has a custom field list, only those columns are
6051             brought in.
6052              
6053             If any column in the second table has the same name as one in the
6054             current table, the incoming column replaces the one by the same name.
6055              
6056             All columns are COPIED from the second table, so the first table owns
6057             the new data exclusively.
6058              
6059             If the second table has a selection, only those records are copied, in
6060             selection order. (select_all() first if that's not what you want.)
6061              
6062             The selection in the first table, if any, is ignored during the
6063             combine. If this isn't what you want, then consider using cull()
6064             before combine().
6065              
6066             Field list and sort order are concatenated (but retaining uniqueness:
6067             second mentions of a field in the combined lists are omitted).
6068              
6069             Custom sort routines and sort specs are combined, with those in the
6070             first table taking precedence over any copied in with the same name.
6071              
6072             The custom _Selection from the first table, if any, is retained. (It
6073             will initially omit any records added by extend()).
6074              
6075             All other parameters from the first table are retained, and from the
6076             second table are ignored.
6077              
6078             combine() calls extend() after combining to ensure that all columns
6079             have the same length: if either the older or newer columns were
6080             shorter, they will all be set to the length of the longest columns in
6081             the table -- creating some empty field values at the end of the
6082             lengthened columns.
6083              
6084             combine_file() does the same as combine() except starting with a file
6085             name, first creating the $Other object by creating it using
6086             new($Params, $File), then discarding it after combining.
6087              
6088             =head2 Joining tables (Looking up data from another table)
6089              
6090             join() looks up field values from a second table, based on common
6091             values in key fields which may have different or the same names in
6092             each table. It adds columns to the current table if necessary to hold
6093             any new field values that must be brought in.
6094              
6095             join() never adds any new or non-matching records to the table:
6096             records where the lookup fails will simply have empty/undef values in
6097             the corresponding columns.
6098              
6099             ## Example:
6100              
6101             $t->join ($People, 'FullName', 'FirstAndLast'); ## or
6102             $t->join_file("People.txt", {}, 'FullName', 'FirstAndLast');
6103              
6104              
6105             Here's how join() calculates the list of fields to bring in:
6106              
6107             - Legal field names from the optional $Fields list, if supplied
6108             - Otherwise, the fieldlist() from second table
6109             - ... minus any fields with same name as $KeyField1 or $KeyField2
6110              
6111             Join starts by adding new empty columns in the first table for any
6112             field to be brought in from the second but not yet present in the
6113             first.
6114              
6115             Here's how join() calculates the records eligible for lookup:
6116              
6117             - Join only modifies the selected records in the first table
6118             - Join only looks up values from selected records in second table
6119              
6120             (If you want all records to be used in both or either table, call the
6121             table's select_all() method before calling join().)
6122              
6123             Then, for every selected record in $t (using the example above), join
6124             examines the FullName field ($KeyField1), and looks up a corresponding
6125             entry (must be 'eq') in the FirstAndLast field ($KeyField2) in the
6126             second table.
6127              
6128             IMPORTANT NOTE ABOUT KEY LENGTH: To speed lookup, hash-based indices
6129             are made. The strings in $Key1 and $Key2 fields should not be so long
6130             that the hash lookups bog down or things could get ugly fast. There
6131             is no fixed limit to hash key length in Perl, but fewer than 128
6132             characters in length is longer than customary for such things. (Many
6133             systems require text-based keys to be no longer than 31 characters.)
6134             So be judicious about the values in $Key1 and $Key2 fields.
6135              
6136             The first record found in the second table's selection with a matching
6137             value in the key field is then copied over (but only the appropriate
6138             fields are copied, as explained above). Any field values being
6139             brought over will REPLACE corresponding field values in the first
6140             table, possibly overwriting any previous values if the field being
6141             looked up was already present in the first table and contained data.
6142              
6143             The first table's _FieldList is updated to reflect new fields added.
6144              
6145             Its _Selection is untouched.
6146              
6147             Its _SortOrder is untouched.
6148              
6149             Its _SortSpecs are augmented to include any entries from the second
6150             table that should be brought over due to the field additions.
6151              
6152             Its _SRoutines are augmented to add new ones from the second table.
6153              
6154             All other parameters of table 1 are untouched.
6155              
6156             The second table is not modified. No data structures will be shared
6157             between the tables. Data is only copied.
6158              
6159             join_file() calls join() after creating a seond table from your $File.
6160              
6161             join_files() calls join_file() repeatedly for each file in a list, but
6162             it is important to note that each file in the list of files to be
6163             joined must have a $Key2 field -- AND, that any values looked up from
6164             the second file will overwrite any values of the same key found in the
6165             first file, and so on. You probably will not ever need join_files().
6166             It is mainly here for completeness.
6167              
6168             =cut
6169            
6170             {}; ## Get emacs to indent correctly.
6171              
6172             sub append ## ($this, $OtherCTable)
6173             {
6174 10     10 0 41 my $this = shift;
6175 10         16 my ($that) = @_;
6176              
6177 10         13 my $Success;
6178              
6179             ## Get all fields in $this, but only selected ones in $that
6180 10         28 my $ThisFieldsAll = $this->fieldlist_all();
6181 10         23 my $ThatFields = $that->fieldlist();
6182              
6183             ## Figure out how many data fields in each of the tables.
6184 10         25 my $ThisFieldCount = @$ThisFieldsAll+0;
6185 10         15 my $ThatFieldCount = @$ThatFields +0;
6186              
6187             ## We're going to bring over only the selected records in $that.
6188 10         21 my $ThisSel = $this->selection();
6189 10         20 my $ThatSel = $that->selection();
6190              
6191             ## Figure out how many records there were to start with.
6192 10         22 my $ThisLength = $this->length();
6193 10         16 my $ThatLength = @$ThatSel+0;
6194              
6195             ## New record count is sum of the other two.
6196 10         13 my $NewLength = $ThisLength + $ThatLength;
6197              
6198             ## Create any missing columns not yet present in $this and,
6199             ## whether new or not, presize all vectors to the new length,
6200             ## which will create empty/undef entries as necessary.
6201              
6202 10   50     179 foreach (@$ThisFieldsAll, @$ThatFields) {$#{$this->{$_} ||= []} = ($NewLength - 1)};
  72         81  
  72         735  
6203              
6204             ## Then copy the field data from the second table into the already
6205             ## pre-sized columns in this one.
6206              
6207 10         185 foreach my $FieldName (@$ThatFields)
6208             {
6209 34         48 my $NewVector = $this->{$FieldName};
6210 34         71 my $OldVector = $that->sel_get($FieldName, $ThatSel);
6211            
6212 34         65 foreach my $RecordNum (0..$#$OldVector)
  86         750  
6213             {($NewVector->[$ThisLength + $RecordNum] =
6214             $OldVector->[ $RecordNum])};
6215             }
6216              
6217             ## Now all the data columns have been combined. We just have to
6218             ## combine any custom metadata.
6219            
6220             ## If either table had a custom fieldlist, then make a new custom
6221             ## field list which is the result of concatenating both field
6222             ## lists together, without duplicates, and of course preserving
6223             ## the original order as completely as possible (with the order
6224             ## given in the first table taking precedence).
6225              
6226 10 50 33     35 if (defined($this->{_FieldList}) ||
6227             defined($that->{_FieldList}))
6228            
6229             {
6230 10         23 my $ThisFields = $this->fieldlist();
6231 10         221 my $ThatFields = $that->fieldlist();
6232              
6233             ## Make a hash mapping field names from both tables to the
6234             ## order they should appear
6235              
6236 10         18 my $FieldOrderHash = {};
6237 10   66     21 foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} ||= (keys %$FieldOrderHash) + 1};
  69         226  
6238              
6239 10         33 my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
  42         71  
6240            
6241 10         37 $this->{_FieldList} = $FieldList;
6242             }
6243              
6244             ## If either table had a custom sortorder, then make a new custom
6245             ## sort order which is the result of concatenating both orders
6246             ## together, without duplicates, and of course preserving the
6247             ## original order as completely as possible (with the order given
6248             ## in the first table taking precedence).
6249            
6250 10 50 33     50 if (defined($this->{_SortOrder}) ||
6251             defined($that->{_SortOrder}))
6252             {
6253 0         0 my $ThisOrder = $this->sortorder();
6254 0         0 my $ThatOrder = $that->sortorder();
6255              
6256             ## Make a hash mapping field names from both lists to the
6257             ## order they should appear
6258              
6259 0         0 my $OrderHash = {};
6260 0   0     0 foreach (@$ThisOrder, @$ThatOrder) {$OrderHash->{$_} ||= (keys %$OrderHash) + 1};
  0         0  
6261            
6262 0         0 my $OrderList = [sort {$OrderHash->{$a} <=> $OrderHash->{$b}} keys %$OrderHash];
  0         0  
6263            
6264 0         0 $this->{_SortOrder} = $OrderList;
6265             }
6266              
6267             ## If either table had a custom selection, then create a new
6268             ## selection which is the concatenation of the two selections.
6269              
6270 10 100 100     46 if (defined($this->{_Selection}) ||
6271             defined($that->{_Selection}))
6272             {
6273 3         11 $this->{_Selection} = [@$ThisSel, ## Original selected records...
6274             ## Plus an adjusted entry for newly-added ones.
6275             ($ThisLength .. ($ThisLength + @$ThatSel - 1))
6276             ];
6277             }
6278              
6279             ## If either table had custom sortspecs, then create a new
6280             ## sortspecs hash by starting with all the entries from the other
6281             ## table and adding/overwriting with those from this table.
6282              
6283 10 50 33     34 if (defined($this->{_SortSpecs}) ||
6284             defined($that->{_SortSpecs}))
6285             {
6286 10         24 my $ThisSpecs = $this->sortspecs();
6287 10         20 my $ThatSpecs = $that->sortspecs();
6288              
6289 10         37 $this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6290             }
6291              
6292             ## If either table had custom sortroutines, then create a new
6293             ## sortroutines hash by starting with all the entries from the
6294             ## other table and adding/overwriting with those from this table.
6295              
6296 10 50 33     32 if (defined($this->{_SRoutines}) ||
6297             defined($that->{_SRoutines}))
6298             {
6299 10   50     24 my $ThisRoutines = $this->{_SRoutines} || {};
6300 10   50     22 my $ThatRoutines = $that->{_SRoutines} || {};
6301              
6302 10         29 $this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6303             }
6304              
6305             ## All other settings are kept from $this and those from $that are
6306             ## ignored.
6307              
6308 10         15 $Success = 1;
6309 10         41 done:
6310             return($Success);
6311             }
6312              
6313             sub append_file
6314             {
6315 2     2 0 3 my $this = shift;
6316 2         5 my ($FileName, $Params) = @_;
6317              
6318 2         2 my $Success;
6319              
6320             ## $Params argument is optional. If supplied, it must be a hash.
6321 2   100     8 $Params ||= {};
6322              
6323             ## Create a new empty table object of the same class as $this and
6324             ## read just the specified file into it. (note: this could be a
6325             ## recursive call here).
6326              
6327 2 50       13 my $that = ref($this)->new($Params, $FileName) or goto done;
6328            
6329             ## Append the data from $that table into this one.
6330              
6331 2 50       8 $this->append($that) or goto done;
6332              
6333 2         4 $Success = 1;
6334 2         17 done:
6335             return($Success);
6336             }
6337              
6338             sub append_files
6339             {
6340 109     109 0 134 my $this = shift;
6341 109         157 my ($FileNames, $Params) = @_;
6342            
6343 109         107 my $Success;
6344            
6345 109         255 foreach my $FileName (@$FileNames)
6346             {
6347 2 50       7 goto done unless $this->append_file($FileName, $Params);
6348             }
6349            
6350 109         128 $Success = 1;
6351 109         274 done:
6352             return($Success);
6353             }
6354              
6355             sub append_files_new ## Called by new() to process its file name args.
6356             {
6357 109     109 0 137 my $this = shift;
6358 109         149 my ($FileNames, $Params) = @_;
6359              
6360 109         133 my $Success;
6361              
6362             ## First we read the first file, if any, into this object using
6363             ## the read() method.
6364              
6365 109         155 my $FirstFile = shift @$FileNames;
6366 109 100       233 if (defined($FirstFile))
6367             {
6368 57 100       63 goto done unless $this->read(%{$Params || {}}, _FileName => $FirstFile);
  57 50       305  
6369             }
6370            
6371 109 50       366 goto done unless $this->append_files($FileNames, $Params);
6372              
6373 109         145 $Success = 1;
6374 109         266 done:
6375             return($Success);
6376             }
6377              
6378             sub combine
6379             {
6380 6     6 0 10 my $this = shift;
6381 6         8 my ($that) = @_;
6382              
6383 6         6 my $Success;
6384              
6385             ## Get a snapshot of field lists before any modifications.
6386 6         9 my $ThisFields = $this->fieldlist();
6387 6         12 my $ThatFields = $that->fieldlist();
6388              
6389             ## Bring in all (listed) fields from other table.
6390 6         8 my $IncomingFields = $ThatFields;
6391              
6392             ## Preserve any previous non-selection and force one to be saved
6393             ## in the interim. This will prevent $that->sel() from
6394             ## recalculating the selection each time if there is none.
6395 6         7 my $OldSel = $that->{_Selection};
6396 6         12 $that->{_Selection} = $that->selection(); ## Might be a no-op
6397            
6398             ## Copy columns from $that in selection order; (re)place into $this
6399 6         11 foreach (@$IncomingFields) {$this->col_set($_, $that->sel($_))};
  12         22  
6400            
6401             ## Restore the possibly-undef selection in other table.
6402 6         12 $that->{_Selection} = $OldSel; ## Might be a no-op
6403            
6404             ## Extend any short columns (whether originating from other table
6405             ## or from this one) to be the same length as all others.
6406            
6407 6         15 $this->extend();
6408              
6409             ## If either table had a custom fieldlist, then make a new custom
6410             ## field list which is the result of concatenating both field
6411             ## lists together, without duplicates, and of course preserving
6412             ## the original order as completely as possible (with the order
6413             ## given in the first table taking precedence).
6414            
6415 6 50 33     20 if (defined($this->{_FieldList}) ||
6416             defined($that->{_FieldList}))
6417             {
6418             ## Make a hash mapping field names from both tables to the
6419             ## order they should appear
6420 6         9 my $FieldOrderHash = {};
6421 6   66     13 foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} ||= (keys %$FieldOrderHash) + 1};
  47         148  
6422            
6423 6         20 my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
  60         75  
6424              
6425 6         19 $this->{_FieldList} = $FieldList;
6426             }
6427              
6428             ## If either table had a custom sortorder, then make a new custom
6429             ## sort order which is the result of concatenating both orders
6430             ## together, without duplicates, and of course preserving the
6431             ## original order as completely as possible (with the order given
6432             ## in the first table taking precedence).
6433            
6434 6 50 33     30 if (defined($this->{_SortOrder}) ||
6435             defined($that->{_SortOrder}))
6436             {
6437 0         0 my $ThisOrder = $this->sortorder();
6438 0         0 my $ThatOrder = $that->sortorder();
6439              
6440             ## Make a hash mapping field names from both lists to the
6441             ## order they should appear
6442              
6443 0         0 my $OrderHash = {};
6444 0   0     0 foreach (@$ThisOrder, @$ThatOrder) {$OrderHash->{$_} ||= (keys %$OrderHash) + 1};
  0         0  
6445            
6446 0         0 my $OrderList = [sort {$OrderHash->{$a} <=> $OrderHash->{$b}} keys %$OrderHash];
  0         0  
6447            
6448 0         0 $this->{_SortOrder} = $OrderList;
6449             }
6450              
6451             ## If either table had custom sortspecs, then create a new
6452             ## sortspecs hash by starting with all the entries from the other
6453             ## table and adding/overwriting with those from this table.
6454              
6455 6 50 33     19 if (defined($this->{_SortSpecs}) ||
6456             defined($that->{_SortSpecs}))
6457             {
6458 6         193 my $ThisSpecs = $this->sortspecs();
6459 6         14 my $ThatSpecs = $that->sortspecs();
6460              
6461 6         21 $this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6462             }
6463              
6464             ## If either table had custom sortroutines, then create a new
6465             ## sortroutines hash by starting with all the entries from the
6466             ## other table and adding/overwriting with those from this table.
6467              
6468 6 50 33     19 if (defined($this->{_SRoutines}) ||
6469             defined($that->{_SRoutines}))
6470             {
6471 6   50     16 my $ThisRoutines = $this->{_SRoutines} || {};
6472 6   50     12 my $ThatRoutines = $that->{_SRoutines} || {};
6473              
6474 6         16 $this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6475             }
6476              
6477             ## All other settings are kept from $this and those from $that are
6478             ## ignored.
6479              
6480 6         9 $Success = 1;
6481 6         21 done:
6482             return($Success);
6483             }
6484              
6485             sub combine_file
6486             {
6487 5     5 0 29 my $this = shift;
6488 5         7 my ($FileName, $Params) = @_;
6489              
6490 5         6 my $Success;
6491              
6492             ## $Params argument is optional. If supplied, it must be a hash.
6493 5   50     39 $Params ||= {};
6494              
6495             ## Create a new empty table object of the same class as $this and
6496             ## read just the specified file into it.
6497              
6498 5 50       17 my $that = ref($this)->new($Params, $FileName) or goto done;
6499              
6500             ## Combine the data from $that table into this one.
6501              
6502 5 50       15 $this->combine($that) or goto done;
6503              
6504 5         8 $Success = 1;
6505 5         35 done:
6506             return($Success);
6507             }
6508              
6509             sub combine_files
6510             {
6511 1     1 0 13 my $this = shift;
6512 1         3 my ($FileNames, $Params) = @_;
6513            
6514 1         2 my $Success;
6515            
6516 1         3 foreach my $FileName (@$FileNames)
6517             {
6518 1 50       6 goto done unless $this->combine_file($FileName, $Params);
6519             }
6520            
6521 1         3 $Success = 1;
6522 1         4 done:
6523             return($Success);
6524             }
6525              
6526             sub join
6527             {
6528 5     5 0 31 my $this = shift;
6529 5         9 my ($that, $Key1, $Key2, $Fields) = @_;
6530              
6531 5         7 my $Success;
6532              
6533             ## $Key1 is required.
6534 5 50       12 $this->warn("Key1 is required for join()"), goto done
6535             unless ($Key1);
6536              
6537             ## $Key2 defaults to the same as $Key1.
6538 5   33     10 $Key2 ||= $Key1;
6539            
6540             ## The fields we'll be getting can optionally be overridden by
6541             ## caller; otherwise, they're the field list of the other table.
6542 5   33     17 my $IncomingFields = $Fields || $that->fieldlist();
6543            
6544             ## Cull $Key1 and $Key2 from the incoming field list.
6545 5 50       9 $IncomingFields = [grep {($_ ne $Key1) && ($_ ne $Key2)} @$IncomingFields];
  26         89  
6546            
6547             ## Preserve any previous non-selection and force one to be saved
6548             ## in the interim. This will prevent $that->sel() from
6549             ## recalculating the selection each time if there is none.
6550 5         11 my $OldSel = $that->{_Selection};
6551 5         11 $that->{_Selection} = $that->selection(); ## Might be a no-op
6552            
6553             ## Make an index mapping values in $Key2 to record numbers in
6554             ## $that. We reverse the order of insertion into the $Index
6555             ## because we want the items earliest in selection order to have
6556             ## precedence in case keys are not unique as they should be.
6557            
6558 5         9 my $Index = {}; @$Index{reverse @{$that->sel($Key2)}} = reverse @{$that->{_Selection}};
  5         6  
  5         15  
  5         12  
6559              
6560             ## Get a list of record numbers in $this that we'll be copying data into.
6561 5         14 my $Recs1 = $this->selection();
6562              
6563             ## Get a corresponding list of keys we're going to look up.
6564 5         9 my $Key1s = $this->sel($Key1);
6565              
6566             ## The default "record number" in table 2 is $that->length()
6567             ## ... i.e. an invalid record number past the end of the table.
6568             ## This will ensure that failed lookups result in lookups to this
6569             ## illegal record number, correctly producing undef in the
6570             ## corresponding joined fields, whereas looking up "undef" would
6571             ## have produced record number zero.
6572              
6573 5         9 my $DefaultRecNum = $that->length();
6574              
6575             ## Look up @$Key1s in @$Index to get a list of data-source record
6576             ## numbers in $that. Failed lookups map to $DefaultRecNum.
6577            
6578 5 100       16 my $Recs2 = [map {defined() ? $_ : $DefaultRecNum} @$Index{@$Key1s}];
  15         29  
6579              
6580             ## Copy data into selected positions within columns of $this, one
6581             ## column at a time, creating pre-sized columns in $this as
6582             ## necessary (col()). These array slice operations are very, very
6583             ## fast.
6584              
6585 5         11 foreach my $Field (@$IncomingFields)
6586             {
6587 21         36 (@{$this->col($Field)}[@$Recs1] = ## Put values into selected records of $this
  21         36  
6588 21         26 @{$that->col($Field)}[@$Recs2]); ## Get values from looked-up records of $that
6589             }
6590              
6591             ## Restore the possibly-undef selection in other table.
6592 5         8 $that->{_Selection} = $OldSel; ## Might be a no-op
6593              
6594             ## If this table had a custom fieldlist, then make a new custom
6595             ## field list which is the result of concatenating both field
6596             ## lists together, without duplicates, and of course preserving
6597             ## the original order as completely as possible (with the order
6598             ## given in the first table taking precedence).
6599              
6600 5 50       13 if (defined($this->{_FieldList}))
6601             {
6602 5         10 my $ThisFields = $this->fieldlist();
6603 5         8 my $ThatFields = $IncomingFields;
6604              
6605             ## Make a hash mapping field names from both tables to the
6606             ## order they should appear
6607              
6608 5         7 my $FieldOrderHash = {};
6609 5   66     12 foreach (@$ThisFields, @$ThatFields) {$FieldOrderHash->{$_} ||= (keys %$FieldOrderHash) + 1};
  56         223  
6610              
6611 5         21 my $FieldList = [sort {$FieldOrderHash->{$a} <=> $FieldOrderHash->{$b}} keys %$FieldOrderHash];
  64         86  
6612            
6613 5         23 $this->{_FieldList} = $FieldList;
6614             }
6615              
6616             ## If either table had custom sortspecs, then create a new
6617             ## sortspecs hash by starting with all the entries from the other
6618             ## table and adding/overwriting with those from this table.
6619              
6620 5 50 33     19 if (defined($this->{_SortSpecs}) ||
6621             defined($that->{_SortSpecs}))
6622             {
6623 5         15 my $ThisSpecs = $this->sortspecs();
6624 5         9 my $ThatSpecs = $that->sortspecs();
6625              
6626 5         16 $this->{_SortSpecs} = {%$ThatSpecs, %$ThisSpecs};
6627             }
6628              
6629             ## If either table had custom sortroutines, then create a new
6630             ## sortroutines hash by starting with all the entries from the
6631             ## other table and adding/overwriting with those from this table.
6632              
6633 5 50 33     24 if (defined($this->{_SRoutines}) ||
6634             defined($that->{_SRoutines}))
6635             {
6636 5   50     12 my $ThisRoutines = $this->{_SRoutines} || {};
6637 5   50     11 my $ThatRoutines = $that->{_SRoutines} || {};
6638              
6639 5         13 $this->{_SRoutines} = {%$ThatRoutines, %$ThisRoutines};
6640             }
6641              
6642             ## All other settings are kept from $this and those from $that are
6643             ## ignored.
6644              
6645 5         7 $Success = 1;
6646 5         24 done:
6647             return($Success);
6648             }
6649              
6650             sub join_file
6651             {
6652 0     0 0 0 my $this = shift;
6653 0         0 my ($FileName, $Params, $Key1, $Key2, $Fields) = @_;
6654              
6655 0         0 my $Success;
6656              
6657             ## $Params argument may be undef. If supplied, it must be a hash.
6658 0   0     0 $Params ||= {};
6659              
6660             ## Create a new empty table object of the same class as $this and
6661             ## read just the specified file into it.
6662              
6663 0 0       0 my $that = ref($this)->new($Params, $FileName) or goto done;
6664            
6665             ## Join the data from $that table into this one.
6666              
6667 0 0       0 $this->join($that, $Key1, $Key2, $Fields) or goto done;
6668              
6669 0         0 $Success = 1;
6670 0         0 done:
6671             return($Success);
6672             }
6673              
6674             sub join_files
6675             {
6676 0     0 0 0 my $this = shift;
6677 0         0 my ($FileNames, $Params, $Key1, $Key2, $Fields) = @_;
6678            
6679 0         0 my $Success;
6680            
6681 0         0 foreach my $FileName (@$FileNames)
6682             {
6683 0 0       0 goto done unless $this->join_file($FileName, $Params, $Key1, $Key2, $Fields);
6684             }
6685            
6686 0         0 $Success = 1;
6687 0         0 done:
6688             return($Success);
6689             }
6690              
6691             =pod
6692              
6693             =head1 INVERTING A TABLE'S ROWS/COLUMNS
6694              
6695             ## Re-orient table's data using vals from $ColName as field names...
6696             $t-invert($ColName)
6697              
6698             Sometimes a situation gives you a table that's initially organized
6699             with column data in rows, and field names in one of the columns, so
6700             you need to flip the table in order to be able to work meaningfully
6701             with it.
6702              
6703             "Inverting" a table means to rewrite each row as a column. One row is
6704             designated to be used as the field names.
6705              
6706             For example, consider this table:
6707              
6708             F01 F02 F03 F04
6709             ------------------------
6710             First Chris Agnes James
6711             Last Bart Marco Nixon
6712             Age 22 33 44
6713              
6714             Calling invert() using field names from "F01"...
6715              
6716             $t->invert('F01');
6717              
6718             ... would change the table to look like this:
6719            
6720             First Last Age
6721             ----------------
6722             Chris Bart 22
6723             Agnes Marco 33
6724             James Nixon 44
6725              
6726             The field F01 which formerly contained the field names, is now gone,
6727             and the remaining data columns have been converted from their old row
6728             orientation into a column orientation.
6729              
6730             =cut
6731              
6732             sub invert
6733             {
6734 0     0 0 0 my $this = shift;
6735 0         0 my ($HeaderField) = @_;
6736              
6737 0         0 my $Success;
6738            
6739             ## Get new field names from an existing column and delete it at the same time.
6740 0 0       0 my $NewColNames = $this->col_delete($HeaderField) or
6741             $this->warn("Invalid field name given to invert() method"), goto done;
6742            
6743             ## Get a hash of all existing (remaining) data columns.
6744 0         0 my $OldColNames = $this->fieldlist_all();
6745 0         0 my $OldCols = $this->cols_hash($OldColNames);
6746            
6747             ## Make the new columns...
6748 0         0 my $NewCols = [map {$this->row_list($_, $OldColNames)} (0..$#$NewColNames)];
  0         0  
6749            
6750             ## Delete old columns from the object.
6751 0         0 delete @$this{@$OldColNames};
6752            
6753             ## Add new columns
6754 0         0 @$this{@$NewColNames} = @$NewCols;
6755              
6756             ## Set the field name list...
6757 0         0 $this->{_FieldList} = $NewColNames;
6758              
6759 0         0 $Success = 1;
6760 0         0 done:
6761             return($Success);
6762             }
6763              
6764             =pod
6765              
6766             =head1 PROGRESS MESSAGES
6767              
6768             ## Printing a progress message....
6769            
6770             $t->progress($Msg) ## Print a message per current settings
6771              
6772             ## Progress settings applying to this object only...
6773              
6774             $t->progress_get() ## Get current progress setting
6775              
6776             $t->progress_set(1) ## Use progress_default() method
6777             $t->progress_set($Sub) ## Set a custom progress routine
6778             $t->progress_set(0) ## Disable progress
6779             $t->progress_set(undef) ## Use class's settings (default)...
6780              
6781             ## Class's settings (for instances with _Progress == undef)
6782              
6783             $t->progress_class() ## Get current setting.
6784              
6785             $t->progress_class(1) ## Use progress_default() method
6786             $t->progress_class($Sub) ## Set shared custom prog routine
6787             $t->progress_class(0) ## Disable class-default progress
6788            
6789             Data::CTable->progress_class(..) ## Call without an object
6790              
6791             ## Call builtin default progress method regardless of settings
6792              
6793             $t->progress_default($Msg) ## Default prog. routine for class
6794              
6795             ## Generate a warning (used internally by other methods)
6796              
6797             $t->warn($Msg) ## In this class, calls progress_default()
6798              
6799             ## Timed progress: print msg to start, then at most once/2 sec
6800              
6801             $t->progress_timed($Op, $Item) ## Re-print msg every 2 sec
6802             $t->progress_timed($Op, $Item, $Pos, $Tot) ##... with % readout
6803             $t->progress_timed($Op, $Item, $Pos, $Tot, $Wait) ## Not 1st x
6804              
6805             $t->progress_timed_default($Msg) ## Called by progress_timed
6806              
6807             Data::CTable is especially useful in creating batch-oriented
6808             applications for processing data. As such, routines that may perform
6809             time-consuming tasks will, by default, generate helpful progress
6810             messages. The progress mechanism is highly customizable, however, to
6811             suit the needs of applications that don't require this output, or that
6812             require the output to go somewhere other than STDERR or the console.
6813              
6814             The default progress routine is one that prints a message with a
6815             date/time stamp to STDERR if and only if STDERR is an interactive
6816             terminal, and otherwise is silent.
6817              
6818             You could write a custom progress routine that does something else or
6819             something in addition (e.g. logs to a file or syslog). The custom
6820             routine could either be implemented by overriding the
6821             progress_default() method in a subclass, or by calling progress_set()
6822             in any instance.
6823              
6824             The custom progress routine, if any, is stored in the _Progress
6825             parameter of the object. But use progress_set() and progress_get() to
6826             access it.
6827              
6828             The interface for your custom progress routine should be:
6829              
6830             sub MyProgress {my ($Obj, $Message) = @_; chomp $Message; .....}
6831              
6832             In other words, the routine takes a single message which may or may
6833             not have a trailing newline. It should always chomp the newline if
6834             present, and then do its business... which generally will include
6835             printing or logging a message (usually with a newline added).
6836              
6837             The default, built-in progress routine for Data::CTable is:
6838              
6839             sub progress_default
6840             {
6841             my ($this, $msg) = @_;
6842             chomp $msg;
6843            
6844             print STDERR (localtime() . " $msg\n") if -t STDERR;
6845              
6846             return(1); ## Indicate progress actually completed
6847             }
6848              
6849             Of course, you are free to call this method directly at any time, and
6850             it will do its thing regardless of other progress-enabling settings.
6851             But the preferred way is to first set the settings and then call
6852             progress().
6853              
6854             The warn() method always calls progress_default() -- i.e. warnings
6855             will display even if progress is otherwise disabled or overridden at
6856             the object or class level. However, you could create a subclass that
6857             changes warn()'s behavior if desired. (For example, it could just
6858             call perl's builtin warn function, or be even more forceful,
6859             generating warnings even if STDERR is not a terminal, for example.)
6860              
6861             The progress_set() method may be used to override the progress routine
6862             for an individual object (set to 1/true for default behavior, or
6863             0/undef/false to disable progress for that object entirely).
6864              
6865             Call progress_class() to set similar values to control the global
6866             default behavior (e.g. turning on/off default progress behavior for
6867             all instances), but be cautious about using this approach in any
6868             environment where other programs might be accessing the same loaded
6869             class data, since the setting is stored in a class-owned global
6870             ($Data::CTable::DefaultProgress).
6871              
6872             Manipulating the class-default settings is only recommended in batch
6873             or shell-script environments, not in mod_perl Web applications where
6874             the module stays loaded into the Perl environment across multiple
6875             invocations, for example.
6876              
6877             If you want a particular method (e.g. read() but not write()) to be
6878             silent, you could make a subclass and could override that method with
6879             an implementation that first disables progress, calls the SUPER::
6880             method, and then restores the progress setting to its original
6881             setting.
6882              
6883             =head2 Timed progress
6884              
6885             Timed progress is a way of printing periodically-recurring progress
6886             messages about potentially time-consuming processes to the terminal.
6887              
6888             For example, consider the following messages which might appear every
6889             2 seconds during a lengthy read() operation:
6890              
6891             Reading... 0 (0%)
6892             Reading... 2000 (4%)
6893             ...
6894             Reading... 38000 (96%)
6895             Reading... 40000 (100%)
6896              
6897             The progress_timed() method is called internally by potentially
6898             time-consuming processes (read(), write(), and sort()), and you may
6899             want to call it yourself from your own scripts, to produce
6900             weary-programmer-soothing visual output during otherwise
6901             panic-producing long delays.
6902              
6903             Generally, progress_timed() is called with the $Wait parameter set to
6904             true, which delays the display of any messages until 2 seconds have
6905             passed, so no messages will be displayed unless the process actually
6906             does end up being slower than 2 seconds.
6907              
6908             Parameters are:
6909              
6910             $Op The string that identifies the "operation" taking place
6911             $Item A milestone such as a number or datum to indicate progress
6912             $Pos A numeric position against the (maybe estimated) baseline
6913             $Tot The baseline. If estimated, don't re-estimate too often
6914             $Wait If true, skip printing the first message for this $Op
6915              
6916             All parameters except $Op are optional.
6917              
6918             progress_timed() has a throttle that keeps it from re-triggering more
6919             often than every 2 seconds for any given sequence of the same $Op.
6920             The clock is restarted each time you call it with a different $Op or
6921             $Tot from the previous call (on the assumption that if the operation
6922             or the baseline changes then that fact should be noted).
6923              
6924             The messages printed will start with "$Op... ".
6925              
6926             If you supply $Item, which could be a number or a string, the messages
6927             will then show the $Item after the $Op.
6928              
6929             If you supply BOTH $Pos and $Tot, then a percentage will be calculated
6930             and added to the readout; otherwise omitted.
6931              
6932             If you supply $Wait, the first message (only) that uses this $Op will
6933             be skipped, and the next one won't appear for at least 2 seconds.
6934              
6935             If using $Pos and $Tot to display percentages for your user, be sure
6936             to call progress_timed() one final time when $Pos == $Tot so your user
6937             sees the satisfying 100% milestone. This "completion" call will not
6938             be skipped even if 2 seconds have not passed since the previous timed
6939             progress message was printed.
6940              
6941             Althought progress_timed() is designed to cut down on too much visual
6942             output when called often in a tight loop, remember that it still takes
6943             some processing time to call it and so if you call it too frequently,
6944             you're slowing down the very loop you wish were running faster.
6945              
6946             So, you might want to call it every tenth or 100th or even 1000th time
6947             through a tight loop, instead of every time through, using the mod (%)
6948             operator:
6949              
6950             $t->progress_timed(....) if ($LoopCount % 100) == 0;
6951              
6952             progress_timed_default() is the method called internally by
6953             progress_timed() to actually print the messages it has prepared. In
6954             this implementation, progress_timed_default() just calls
6955             progress_default(). That is, it ignores all other progress-inhibiting
6956             or -enhancing settings so delay-soothing messages will print on the
6957             terminal even if other messages are turned off.
6958              
6959             This is because the author assumes that even if you don't want all
6960             those other progress messages, you might still want these ones that
6961             explain long delays. If you REALLY don't, then just make yourself a
6962             lightweight subclass where progress_timed_default() is a no-op, or
6963             maybe calls regular progress(). For example:
6964              
6965             BEGIN {package Data::CTable::Silent; use vars qw(@ISA);
6966             @ISA=qw(Data::CTable); sub progress_timed_default{}}
6967              
6968             ## Later...
6969             my $t = Data::CTable::Silent->new(...);
6970              
6971              
6972             =cut
6973              
6974             $Data::CTable::DefaultProgress = 1;
6975              
6976             sub progress_set
6977             {
6978 5     5 0 30 my $this = shift;
6979 5         8 my ($ProgSetting) = @_;
6980            
6981 5         14 $this->{_Progress} = $ProgSetting;
6982             }
6983              
6984             sub progress_class
6985             {
6986 199     199 0 366 my $Ignored = shift;
6987 199         264 my ($ProgSetting) = @_;
6988              
6989             ## Set if specified...
6990 199 100       424 $Data::CTable::DefaultProgress = $ProgSetting if defined($ProgSetting);
6991              
6992             ## Return..
6993 199         346 return($Data::CTable::DefaultProgress);
6994             }
6995              
6996             sub progress_get
6997             {
6998 0     0 0 0 my $this = shift;
6999            
7000 0         0 my $ProgSetting = $this->{_Progress};
7001              
7002 0         0 return($ProgSetting);
7003             }
7004              
7005             sub progress_default
7006             {
7007 7     7 0 18 my ($this, $msg) = @_;
7008 7         23 chomp $msg;
7009            
7010 7 50       65 print STDERR (localtime() . " $msg\n") if -t STDERR;
7011              
7012 7         78 return(1); ## Indicate progress actually completed
7013             }
7014              
7015             sub progress
7016             {
7017 193     193 1 1965 my $this = shift;
7018 193         262 my ($msg) = @_;
7019            
7020 193         326 my $Prog1 = $this->{_Progress}; ## First check object's progress setting
7021 193         371 my $Prog2 = $this->progress_class(); ## Then check class's setting
7022              
7023             ## Calling regular progress resets the timers & ops in
7024             ## progress_timed...
7025              
7026 193         472 delete $this->{_ProgTimeInfo};
7027              
7028             ## First examine object setting to find a progress routine...
7029              
7030 193 100       376 return(&$Prog1($this, $msg)) if ref($Prog1) eq 'CODE'; ## Code ref: return it.
7031 191 100       389 return($this->progress_default($msg)) if $Prog1; ## true: use default progress.
7032 185 100       321 return(undef) if defined($Prog1); ## false but defined: no progress.
7033            
7034             ## undef: fall through to class settings...
7035            
7036 181 100       302 return(&$Prog2($this, $msg)) if ref($Prog2) eq 'CODE'; ## Code ref: return it.
7037 179 50       303 return($this->progress_default($msg)) if $Prog2; ## true: use default progress.
7038 179         268 return(undef); ## false/undef: no progress.
7039             }
7040              
7041             sub progress_timed
7042             {
7043 188159     188159 0 582041 my $this = shift;
7044 188159         282096 my ($Op, $Item, $Pos, $Tot, $Wait) = @_;
7045              
7046             ## Get params from previous call if any.
7047 188159 100       200829 my ($LastOp, $LastItem, $LastPos, $LastTot, $LastTime) = @{$this->{_ProgTimeInfo} || []};
  188159         555883  
7048              
7049             ## print &Dumper([$Op, $Item, $Pos, $Tot, $Wait], [$LastOp, $LastItem, $LastPos, $LastTot, $LastTime]);
7050              
7051             ## Get elapsed time.
7052 188159         241832 my $Time = time();
7053 188159   66     369130 my $Elapsed = $Time - ($LastTime || $Time);
7054              
7055             ## We're on the "same" operation if the $Op name is the same and
7056             ## the total (baseline) is the same. Otherwise treat as new op.
7057              
7058 188159         257075 my $SameOp = (($Op eq $LastOp));
7059 188159   66     641931 my $SameOpAndTot = ($SameOp && ($Tot == $LastTot));
7060 188159   100     593417 my $Finished = ($Tot && ($Pos == $Tot));
7061              
7062             ## We trigger a message to print if we've been on the same op for
7063             ## 2 seconds or more, OR this is a new op.
7064              
7065 188159   100     1554885 my $Trigger = (($SameOpAndTot && ($Elapsed >= 2)) || ## Yes if same op & time has passed...
7066             ($SameOpAndTot && $Finished) || ## Yes if we're finished (100%).
7067             !$SameOpAndTot); ## Yes if new op
7068            
7069             ## Quit now if nothing to do.
7070 188159 100       471098 goto done unless $Trigger;
7071            
7072             ## Otherwise print message and save details for next time around.
7073 17 100 66     125 my $Percent = sprintf("(%2d\%)", int(($Pos * 100) / $Tot)) if (defined($Pos) && $Tot);
7074              
7075             ## If we've been asked to "wait", we skip actually printing the
7076             ## message this the first time, but act as if we did (starting the timer).
7077            
7078 17 100 100     110 my $RetVal = $this->progress_timed_default("$Op... $Item $Percent")
7079             unless (!$SameOp && $Wait); ## Skip first-time message if $Wait.
7080            
7081 17         65 $this->{_ProgTimeInfo} = [$Op, $Item, $Pos, $Tot, $Time];
7082            
7083 188159         440607 done:
7084             return($RetVal);
7085             }
7086              
7087             sub progress_timed_default
7088             {
7089 6     6 0 82 my $this = shift;
7090 6         13 my ($msg) = @_;
7091              
7092 6         37 return($this->progress_default("$msg"));
7093             }
7094              
7095             sub warn
7096             {
7097 1     1 0 3 my $this = shift;
7098 1         4 my ($msg) = @_;
7099              
7100 1         6 return($this->progress_default("WARNING: $msg"));
7101             }
7102              
7103             =pod
7104              
7105             =head1 DEBUGGING / DUMPING
7106              
7107             ## Print some debugging output...
7108              
7109             $t->out() ## Pretty-print $t using Data::ShowTable
7110              
7111             $t->dump() ## Dump $t using Data::Dumper
7112             $t->dump($x, $y) ## Dump anything else using Data::Dumper
7113              
7114             ## Print some debugging output and then die.
7115              
7116             die $t->out() ## Same but die afterwards.
7117              
7118             die $t->dump() ## Same but die afterwards.
7119             die $t->dump($x, $y) ## Same but die afterwards.
7120              
7121             These two methods can be very helpful in debugging your scripts.
7122              
7123             The out() method, which has many options, is described in complete
7124             detail in the section below titled "FORMATTED TABLES". In short, it
7125             prints a nicely-formatted diagram of $t, obeying the custom field list
7126             if any and custom selection if any.
7127              
7128             The dump() method uses the Data::Dumper module to call &Dumper() on
7129             the table itself (by default) and prints the result to STDERR. If you
7130             specify any number of other values, those will be dumped instead using
7131             a single call to &Dumper (rather than individually).
7132              
7133             =head2 Optional module dependencies
7134              
7135             These methods require the otherwise-optional modules shown here:
7136              
7137             out() Data::ShowTable
7138             dump() Data::Dumper
7139              
7140             You'll get a warning at runtime if you try to call either method
7141             without the appropriate module installed on your system.
7142              
7143             =cut
7144              
7145             sub dump
7146             {
7147 0     0 0 0 my $this = shift;
7148 0         0 my (@Things) = @_;
7149              
7150             ## Default is to dump the object.
7151 0 0       0 @Things = ($this) unless @Things;
7152              
7153 0 0       0 if ($HaveDumper)
7154             {
7155 0         0 print STDERR &Dumper(@Things);
7156             }
7157             else
7158             {
7159 0         0 carp("Data::Dumper module is not installed. Can't dump.");
7160             }
7161            
7162 0         0 return(1);
7163             }
7164              
7165             =pod
7166              
7167             =head1 MISCELLANEOUS UTILITY METHODS
7168              
7169             The following utilities are methods of the Data::CTable object. They
7170             may be called directly by clients, subclassed, or used by subclass
7171             implementations as needed.
7172              
7173             ## Get cache file path (all args optional: defaulted from $t)
7174              
7175             $f = $t->prep_cache_file($FileName, $CacheExtension, $CacheSubDir)
7176              
7177             ## Verify all directories in a path, creating any needed ones.
7178              
7179             $ok = $t->verify_or_create_path($DirPath, $Sep)
7180              
7181             ## Testing readability / writeability of a proposed file
7182              
7183             $ok = $t->try_file_read ($Path); ## Opens for read; then closes
7184             $ok = $t->try_file_write($Path); ## Opens for write; then deletes
7185              
7186             ## Getting parameters from object with optional overrides
7187              
7188             $param = $t->getparam($Params, $Param)
7189              
7190             prep_cache_file() is the internal method used by both read() and
7191             write() to calculate the name of a cache file to be used for a given
7192             $FileName.
7193              
7194             It calculates the path to the cache file that corresponds to the given
7195             $FileName (which may be a bare file name, a relative path, or a
7196             partial path, as long as it obeys the current platform's path format
7197             rules). All arguments are optional and if absent (undef), will be
7198             defaulted from the corresponding parameters in $t.
7199              
7200             In addition to calculating the path and file name, it also prepends
7201             the "current directory" path if there was no path. Then it checks
7202             that all directories mentioned in the path actually exist. If not, it
7203             fails. Then, it checks that EITHER the file exists and is readable,
7204             OR it does not exist but would be writeable in that directory. If any
7205             of these directory creations or file checks fails, then undef is
7206             returned (and there would be no cache file).
7207              
7208             You may call it with no arguments on a file that has been read() to
7209             find the path to the cache file that may have been used and/or
7210             created, if any.
7211              
7212             You may call it with a file name that was written to, to see what the
7213             corresponding written cache file would be.
7214              
7215             For example:
7216              
7217             ## Get name of cache file used or created by read and delete it.
7218              
7219             $RCache = $t->prep_cache_file() and unlink($RCache);
7220              
7221             ## Cache on write() and get name of file and delete it.
7222              
7223             $Written = $t->write(_CacheOnWrite=>1, _FileName=>"Foo.txt");
7224             $WCache = $t->prep_cache_file($Written) and unlink($WCache);
7225              
7226             verify_or_create_path() is the internal routine used by read(),
7227             write(), and the cache-writing logic, that makes sure a requested file
7228             path exists (by creating it if necessary and possible) before any file
7229             is written by this module.
7230              
7231             (If you don't like this module's tendency to try to create
7232             directories, make yourself a subclass in which this routine simply
7233             checks -d on its $Path argument and returns the result.)
7234              
7235             It must be called with a full or partial path TO A DIRECTORY, NOT A
7236             FILE. You may supply $Sep, a platform-appropriate separator character
7237             (which defaults correctly for the runtime platform if you don't).
7238              
7239             Returns true if the path verification and/or creation ultimately
7240             succeeded, false otherwise (meaning that, after this call, there is no
7241             such directory on the system and so you should not try to write a file
7242             there).
7243              
7244             try_file_read() and try_file_write() are the internal methods called
7245             by prep_cache_file() as well as by read() and write() to preflight
7246             proposed file reading and writing locations.
7247              
7248             try_file_read() opens a file for read and closes it again; returns
7249             true if the open was possible.
7250              
7251             try_file_write() opens a file for write and closes it again, deleting
7252             it if successful. Returns true if the open for write and the delete
7253             were successful. (Be aware that this call will actually delete any
7254             existing file by this name.)
7255              
7256             The reason that failure to delete causes try_file_write() to fail is
7257             that successful cacheing depends on the ability to delete cache files
7258             as well as create them or write to them. A file in a location that
7259             couldn't be deleted will not be used for cacheing.
7260              
7261             getparam() looks up a named parameter in a params hash if it exists
7262             there, otherwise looks it up in the object, thereby allowing $Params
7263             to shadow any parameters in $this.
7264              
7265             This internal routine is used by any methods that allow overriding of
7266             parameters in the object when using a named-parameter calling
7267             interface. It should be used by any subclasses that also wish to use
7268             a named-parameter calling convention. For example:
7269              
7270             my $this = shift;
7271             my $Params = (@_ == 1 ? {_FieldList => $_[0]} : {@_});
7272              
7273             my($FieldList, $Selection) = map {$this->getparam($Params, $_)}
7274             qw(_FieldList _Selection);
7275              
7276             =cut
7277              
7278             {}; ## Get emacs to indent correctly.
7279              
7280             sub prep_cache_file
7281             {
7282 53     53 0 90 my $this = shift;
7283 53         89 my($FileName, $CacheExtension, $CacheSubDir) = @_;
7284            
7285 53         65 my $Success;
7286              
7287 53   66     115 $FileName ||= $this->{_FileName};
7288 53   66     102 $CacheExtension ||= $this->{_CacheExtension};
7289 53   66     93 $CacheSubDir ||= $this->{_CacheSubDir};
7290              
7291             ## Break the path into its parts...
7292 1     1   10 use File::Basename qw(fileparse);
  1         2  
  1         506  
7293 53         1923 my ($Basename, $Path, $Ext) = fileparse($FileName, '\.[^\.]+');
7294            
7295             ## Figure out what the path separator should be...
7296 53         98 my ($Sep, $Up, $Cur) = @{$this->path_info()}{qw(sep up cur)};
  53         187  
7297              
7298             ## FileDir is guaranteed to either be empty or have a trailing
7299             ## separator (see: man File::Basename). If empty, we set it to
7300             ## $Cur (the current directory).
7301              
7302 53 50       187 my $FileDir = (length($Path) ? $Path : $Cur);
7303              
7304             ## Ensure $CacheSubDir is either empty or has a trailing separator...
7305              
7306 53         414 $CacheSubDir =~ s/([^\Q$Sep\E])$/$1$Sep/;
7307              
7308             ## Check whether it's an absolute path (and not really a "sub"-dir)
7309 53         131 my $Absolute = &path_is_absolute($CacheSubDir);
7310              
7311             ## $CacheDir is $FileDir with $CacheSubDir appended. $CacheSubDir
7312             ## may be empty, meaning $FileDir is to be used for the cache
7313             ## files. But if it's an absolute path, it stands alone.
7314              
7315 53 100       155 my $CacheDir = ($Absolute ? $CacheSubDir : "$FileDir$CacheSubDir");
7316              
7317             ## Now we need to make sure that CacheDir exists OR try to create
7318             ## it. (Warning will have already happened if necessary.)
7319 53 50       152 goto done unless $this->verify_or_create_path($CacheDir, $Sep);
7320              
7321             ## Verify that the dir is writeable.
7322 53 50       673 $this->warn("Cache directory $CacheDir is read-only"), goto done
7323             unless -r $CacheDir;
7324              
7325             ## Full path: note $CacheExtension and $Ext may be empty.
7326 53         136 my $CacheFilePath = "$CacheDir$Basename$Ext$CacheExtension";
7327              
7328             ## If the cache path and the full path end up being the same
7329             ## (probably because the _CacheSubDir and _CacheExtension are both
7330             ## empty), we bail. Obviously, we don't want to risk overwriting
7331             ## the original data with the cache (or trying to read cache data
7332             ## from a text file).
7333              
7334 53 50       117 $this->warn("Can't cache $FileName without either _CacheSubDir or _CacheExtension"), goto done
7335             if $CacheFilePath eq $FileName;
7336            
7337             ## Pre-flight the cache file: ensure we can either read it or
7338             ## touch and then delete it.
7339            
7340 53 50 66     173 $this->warn("Cache file $CacheFilePath cannot be created/overwritten: $!"), goto done
7341             unless ($this->try_file_read ($CacheFilePath) ||
7342             $this->try_file_write($CacheFilePath));
7343            
7344 53         83 $Success = 1;
7345 53 50       522 done:
7346             return($Success ? $CacheFilePath : undef);
7347             }
7348              
7349             sub verify_or_create_path
7350             {
7351 61     61 0 101 my $this = shift;
7352 61         93 my ($Dir, $Sep) = @_;
7353              
7354             ## Get default value for $Sep if not supplied.
7355 61   33     135 $Sep ||= ${$this->path_info()}{sep};
  0         0  
7356              
7357             ## $Dir might end in $Sep; split strips trailing one if so.
7358 61         292 my $Parts = [split(/\Q$Sep\E/, $Dir)];
7359              
7360 61         138 my $WholePath = "";
7361 61         119 foreach (@$Parts)
7362             {
7363 114         193 $WholePath = "$WholePath$_$Sep";
7364 114 50       1737 next if -d $WholePath;
7365              
7366             ## Directory does not exist. We need to make it.
7367              
7368             ## mkdir($WholePath, 0777) or
7369             ## $this->warn("Failed to create directory '$WholePath': $!"), last;
7370            
7371             ## On some platforms (e.g. Darwin), perl's mkdir fails if
7372             ## there's a trailing separator. Others tolerate its absence,
7373             ## so we remove it.
7374            
7375 0         0 (my $TryDir = $WholePath) =~ s/\Q$Sep\E$//;
7376            
7377 0 0       0 mkdir($TryDir, 0777) or
7378             $this->warn("Failed to create directory '$TryDir': $!"), last;
7379             }
7380              
7381 61         777 return(-d $Dir);
7382             }
7383              
7384             sub try_file_write ## like a "touch" but deletes the file if it succeeds.
7385             {
7386 10     10 0 20 my $this = shift;
7387 10         17 my ($Path) = @_;
7388              
7389 10         14 my $Success;
7390              
7391             ## Try creating it.
7392 1     1   8 use IO::File;
  1         1  
  1         314  
7393 10         69 my $File = IO::File->new(">$Path");
7394            
7395             ## Created: close and delete it.
7396 10 50       1304 $File->close(), unlink($Path) if $File;
7397              
7398             ## Failed: bail.
7399 10 50       810 goto done unless $File;
7400              
7401             ## If we couldn't unlink, fail. The ability to delete a failed or
7402             ## part-written cache file is a critical part of cacheing.
7403              
7404 10 50       178 goto done if -e $Path;
7405              
7406 10         20 $Success = 1;
7407 10         59 done:
7408             return($Success);
7409             }
7410              
7411             sub try_file_read ## Verifies that a file exists / can be read...
7412             {
7413 53     53 0 73 my $this = shift;
7414 53         76 my ($Path) = @_;
7415              
7416 53         71 my $Success;
7417              
7418             ## Try opening it.
7419 1     1   5 use IO::File;
  1         2  
  1         516  
7420 53 100       394 my $File = IO::File->new("<$Path") or goto done;
7421              
7422 51         3867 $Success = 1;
7423 53         966 done:
7424             return($Success);
7425             }
7426              
7427             sub getparam
7428             {
7429 801     801 0 950 my $this = shift;
7430 801         1036 my ($Params, $Param) = @_;
7431              
7432 801 100       2778 return(exists($Params->{$Param}) ?
7433             ( $Params->{$Param}) :
7434             ( $this->{$Param}) );
7435             }
7436              
7437              
7438              
7439             =pod
7440              
7441             =head1 GENERAL-PURPOSE UTILITY FUNCTIONS
7442              
7443             These general-purpose utility routines are defined in the Data::CTable
7444             module but are not method calls. You may optionally import them or
7445             call them by their fully-qualified name.
7446              
7447             use Data::CTable qw(
7448             guess_endings
7449             guess_delimiter
7450             path_info
7451             path_is_absolute
7452             min
7453             max
7454             );
7455              
7456             ## File-format guessing
7457              
7458             my $E = &guess_endings($IOHandle) ## Guess txt file line endings
7459             my $D = &guess_delimiter($String) ## Tab if found, else comma
7460              
7461             ## Cross-platform file path analysis
7462              
7463             my $Info = path_info(); ## Hash: 3 of platform's path values:
7464             my ($Sep, ## ... path separator ( / on Unix)
7465             $Up, ## ... "up" component (../ on Unix)
7466             $Cur) = ## ... curr. dir path ( ./ on Unix)
7467             @$Info{qw(sep up cur)};
7468              
7469             my $Abs = path_is_absolute($Path) ## Check path type
7470              
7471             ## Our old favorites min and max
7472              
7473             $x = max($x, 0); ## Should have been part of Perl...
7474             $x = min($x, 100);
7475              
7476             guess_endings() tries to figure out whether an open IO::File handle
7477             has DOS, Mac, or Unix file endings. It reads successively large
7478             blocks of the file until it either finds evidence of at least two
7479             separate line endings (of any type, but presumably they are the same),
7480             or until it reaches the end of the file. Then, it takes the resulting
7481             block and searches for the first qualifying line ending sequence it
7482             finds, if any. This sequence is then returned to the caller. If it
7483             returns undef, it was not able to find any evidence of line endings in
7484             the file.
7485              
7486             guess_delimiter() takes a string buffer and returns a "," unless it
7487             finds a tab character before the first comma in the $String, if any,
7488             in which case a tab is returned.
7489              
7490             path_info() returns a hash of three helpful strings for building and
7491             parsing paths on the current platform. Knows about Mac, Dos/Win, and
7492             otherwise defaults to Unix.
7493              
7494             path_is_absolute($Path) returns true if it thinks the given path
7495             string is an absolute path on the current platform.
7496              
7497             =cut
7498            
7499             {}; ## Get emacs to indent correctly.
7500              
7501             sub guess_endings
7502             {
7503 19     19 0 29 my ($File) = @_;
7504              
7505 19         29 my $Ending = undef;
7506              
7507 19         25 my $ReadCount = 0;
7508 19         31 my $BlockSize = 512;
7509              
7510 19         26 my $Buf;
7511             my $Actual;
7512              
7513 19         104 while ($File->seek(0, 0), $Actual = $File->read($Buf, ($BlockSize * ++$ReadCount)))
7514             {
7515             ## Break out of the loop if it appears a line ending match is
7516             ## found (but disallow initial match at very end of buffer).
7517            
7518 19 50       792 last if $Buf =~ /((?:\x0D\x0A)|(?:\x0D)|(?:\x0A))[^\x0D\x0A]/;
7519            
7520             ## Break out of the loop if we just read any less than
7521             ## attempted (we are probably at the end of a very short,
7522             ## maybe one-line or even zero-line, file).
7523            
7524 0 0       0 last if $Actual < ($BlockSize * $ReadCount);
7525             }
7526              
7527             ## We can presume that the buffer we now have must either have
7528             ## line endings in it, or there is no line ending in the file at
7529             ## all. So we extract the first one we come to, (trying the DOS
7530             ## ending first since it contains the other two), if any, and we
7531             ## return it.
7532              
7533 19         181 my $Ending = ($Buf =~ /((\x0D\x0A)|(\x0D)|(\x0A))/)[0];
7534            
7535             ## &progress_default(undef, "DOS line endings") if $2; ## Debugging.
7536             ## &progress_default(undef, "Mac line endings") if $3; ## Debugging.
7537             ## &progress_default(undef, "Unix line endings") if $4; ## Debugging.
7538              
7539 19         70 done:
7540              
7541             ## We always seek back to zero when done.
7542             $File->seek(0, 0);
7543              
7544 19         223 return($Ending);
7545             }
7546              
7547             sub guess_delimiter
7548             {
7549 19     19 0 37 my ($String) = @_;
7550            
7551 19   50     162 return(($String =~ /([,\t])/)[0] || ",");
7552             }
7553              
7554             sub path_info
7555             {
7556 1     1   5 use Config qw(%Config);
  1         2  
  1         145  
7557 70     70 0 673 my $OSName = $Config{osname};
7558            
7559 70 50       277 return({sep =>':' , up =>'::' , cur =>':' }) if $OSName =~ /mac /ix;
7560 70 50       363 return({sep =>'\\', up =>'..\\', cur =>'.\\'}) if $OSName =~ /(?
7561 70         379 return({sep =>'/' , up =>'../' , cur =>'./' }) ;
7562             }
7563              
7564             sub path_is_absolute
7565             {
7566 53     53 0 86 my ($Path) = @_;
7567              
7568 1     1   6 use Config qw(%Config);
  1         2  
  1         494  
7569 53         316 my $OSName = $Config{osname};
7570            
7571 53 50       190 return($Path =~ /^[^:]/) if $OSName =~ /mac /ix;
7572 53 50       216 return($Path =~ /^(([a-z][:])|(\\\\))/i) if $OSName =~ /(?
7573 53         182 return($Path =~ /^\//) ;
7574             }
7575              
7576             ### min and max
7577              
7578 188169 100   188169 0 865915 sub min {return($_[0] < $_[1] ? $_[0] : $_[1])}
7579 134 100   134 0 323 sub max {return($_[0] > $_[1] ? $_[0] : $_[1])}
7580              
7581              
7582             =pod
7583              
7584             =head1 IMPLEMENTATION LIMITATIONS
7585              
7586             =over 4
7587              
7588             =item Column (field) names must not start with underscore
7589              
7590             This object is implemented as a blessed hash reference. By
7591             convention, keys that do not start with underscore are data columns
7592             and the key is the field name. Keys that do start with underscore
7593             refer to parameters or other data structures stored in the object.
7594              
7595             Consequently, no field names may start with underscore. When a file
7596             is read from disk, any field names that DO start with underscores will
7597             have the leading underscores stripped off. Strange things could then
7598             occur if the field names are then no longer unique. For example,
7599             field "A" and "_A" in the data file would be treated as the single
7600             field "A" after the file was read.
7601              
7602             =item Field values are always read as strings
7603              
7604             Field values when written to a file are necessarily converted to
7605             strings. When read back in, they are read as strings, regardless of
7606             original format. The sole exception is the empty string which is read
7607             back in as undef for efficiency.
7608              
7609             An exception is when the _CacheOnWrite feature is used: field values
7610             stored internally as integers or other scalar types may be saved and
7611             later restored as such. However, you should not rely on this
7612             behavior.
7613              
7614             =item Undef vs. empty
7615              
7616             Empty field values are stored as "undef" for efficiency. This means
7617             that programs should generally not rely on any differences between ""
7618             and undef in field values. However, when working with large but
7619             sparse tables, programs should take care not to convert undef values
7620             to empty strings unnecessarily since the separate string objects
7621             consume considerably more memory than undef.
7622              
7623             =back
7624              
7625             =head1 CONTRIBUTIONS
7626              
7627             Corrections, bug reports, bug fixes, or feature additions are
7628             encouraged. Please send additions or patches with a clear explanation
7629             of their purpose. Consider making additions in the form of a subclass
7630             if possible.
7631              
7632             I'm committed to bundling useful subclasses contributed by myself or
7633             others with this main distribution.
7634              
7635             So, if you've got a subclass of Data::CTable (which should have a name
7636             like Data::CTable::YourClassName) and you would like it included in
7637             the main distribution, please send it along with a test script and
7638             I'll review the code and add it (at my discretion).
7639              
7640             If you've got a module that uses, augments, or complements this one,
7641             let me know that, too, and I'll make appropriate mention of it.
7642              
7643             =head1 SEE ALSO
7644              
7645             The Data::CTable home page:
7646              
7647             http://christhorman.com/projects/perl/Data-CTable/
7648              
7649             The implementation in CTable.pm.
7650              
7651             The test.pl script, other subclasses, and examples.
7652              
7653             The Data::ShowTable module.
7654              
7655             The Data::Table module by Yingyao Zhou & Guangzhou Zou.
7656              
7657             The perlref manual page.
7658              
7659             =head1 AUTHOR
7660              
7661             Chris Thorman
7662              
7663             Copyright (c) 1995-2002 Chris Thorman. All rights reserved.
7664              
7665             This program is free software; you can redistribute it and/or modify
7666             it under the same terms as Perl itself.
7667              
7668             =cut
7669              
7670             1;
7671