File Coverage

blib/lib/Tie/FieldVals.pm
Criterion Covered Total %
statement 110 164 67.0
branch 23 68 33.8
condition 4 12 33.3
subroutine 16 25 64.0
pod 6 6 100.0
total 159 275 57.8


line stmt bran cond sub pod time code
1             package Tie::FieldVals;
2 8     8   121515 use strict;
  8         20  
  8         239  
3 8     8   46 use warnings;
  8         16  
  8         597  
4              
5             =head1 NAME
6              
7             Tie::FieldVals - an array tie for a file of enhanced Field:Value data
8              
9             =head1 VERSION
10              
11             This describes version B<0.6203> of Tie::FieldVals.
12              
13             =cut
14              
15             our $VERSION = '0.6203';
16              
17             =head1 SYNOPSIS
18              
19             use Tie::FieldVals;
20             use Tie::FieldVals::Row;
21              
22             # tie the array
23             my @records;
24             my $recs_obj = tie @records, 'Tie::FieldVals', datafile=>$datafile;
25              
26             # object methods
27             my @field_names = $recs_obj->field_names();
28              
29             =head1 DESCRIPTION
30              
31             This is a Tie object to map the records in an enhanced Field:Value data
32             file into an array. Each file has multiple records, each record has its
33             values defined by a Field:Value pair, with the enhancements that (a) the
34             Value part can extend over more than one line (because the Field names
35             are predefined) and (b) Fields can have multiple values by repeating
36             the Field:Value part for a given field.
37              
38             Because of its use of the Tie::File module, access to each record is
39             reasonably fast. The Tie::File module also ensures that (a) the whole file
40             doesn't have to be read into memory (b) record changes are written to the
41             file straight away (c) record changes don't require the whole file to be
42             rewritten, just the part of the file after the change.
43              
44             The advantage of this setup is that one can have useful data files which
45             are plain text, human readable, human editable, and at the same time able
46             to be accessed faster than using XML (I know, I wrote a version of my
47             reporting software using XML data, and even the fastest XML parsers weren't
48             as fast as this setup, once there were a reasonable number of records).
49             This also has advantages over a simpler setup where values are given one
50             per line with no indication of what value belongs to what field; the
51             problems with that is that it is harder to fix corrupted data by hand, and
52             it is harder to add new fields, and one can't have multi-line data.
53              
54             It is likewise better than a CSV (Comma-Separated Values) file, because
55             again, with a CSV file, the data is positional and therefore harder to fix
56             and harder to change, and again one can't have multi-line data.
57              
58             This module is both better and worse than file-oriented databases like
59             L and its variants and extensions (such as L). This module
60             does not require that each record have a unique key, and the fact that a
61             DBM file is binary makes it not only less correctable, but also less
62             portable. On the downside, this module isn't as fast.
63              
64             Naturally, if one's data needs are more complex, it is probably better to
65             use a fully-fledged database; this is oriented towards those who don't wish
66             to have the overhead of setting up and maintaining a relational database
67             server, and wish to use something more straightforward.
68              
69             This comes bundled with other support modules, such as the
70             Tie::FieldVals::Row module. The Tie::FieldVals::Select module is for
71             selecting and sorting a sub-set from a Tie::FieldVals array, and the
72             Tie::FieldVals::Join is a very simple method of joining two files on a
73             common field.
74              
75             This distribution includes the fv2xml script, which converts a
76             Tie::FieldVals data file into an XML file, and xml2fv which
77             converts an XML file into a Tie::FieldVals data file.
78              
79             =head1 FILE FORMAT
80              
81             The data file is in the form of Field:Value pairs, with each
82             record separated by a line with '=' on it. The first record
83             is an "empty" record, which just contains the field names;
84             this lets us know what the legal fields are.
85             A line which doesn't start with a recognised field is
86             considered to be part of the value of the most recent Field.
87              
88             =head2 Example 1
89              
90             Name:
91             Entry:
92             =
93             Name:fanzine
94             Entry:Fanzines are amateur magazines produced by fans.
95             =
96             Name:fan fiction (fanfic)
97             Entry:Original fiction written by fans of a particular
98             TV Show/Movie set in the universe depicted by that work.
99             =
100              
101             The first record just contains Name: and Entry: fields to show that those
102             are the legal fields for this file. The third record gives an example
103             of a value that goes over more than one line.
104              
105             =head2 Example 2
106              
107             Author:
108             AuthorEmail:
109             AuthorURL:
110             AuthorURLName:
111             =
112             Author:Adele
113             AuthorEmail:adele@example.com
114             AuthorEmail:adele@example.tas.edu
115             AuthorURL:
116             AuthorURLName:
117             =
118             Author:Danzer,Brenda
119             AuthorEmail:
120             AuthorURL:http://www.example.com/~danzer
121             AuthorURLName:Danzer Dancing
122             AuthorURL:http://www.brendance.com/
123             AuthorURLName:BrenDance
124             =
125              
126             This one gives examples of multi-valued fields.
127              
128             =head2 Gotchas
129              
130             Field names cannot have spaces in them, indeed, they must
131             consist of plain alphanumeric characters or underscores.
132             They are case-sensitive.
133              
134             The record separator (=) must be on a line by itself, and the last record
135             in the file must also have a record-separator after it.
136              
137             =cut
138              
139 8     8   194 use 5.006;
  8         26  
140 8     8   39 use strict;
  8         12  
  8         205  
141 8     8   39 use Carp;
  8         15  
  8         688  
142 8     8   7130 use Tie::Array;
  8         10201  
  8         232  
143 8     8   10031 use Tie::File;
  8         199678  
  8         293  
144 8     8   5918 use Tie::FieldVals::Row;
  8         22  
  8         293  
145 8     8   44 use Fcntl qw(:DEFAULT);
  8         12  
  8         3751  
146 8     8   21007 use Data::Dumper;
  8         68866  
  8         15638  
147              
148             our @ISA = qw(Tie::Array);
149              
150             # to make taint happy
151             $ENV{PATH} = "/bin:/usr/bin:/usr/local/bin";
152             $ENV{CDPATH} = '';
153             $ENV{BASH_ENV} = '';
154              
155             # for debugging
156             my $DEBUG = 0;
157              
158             =head1 PUBLIC FUNCTIONS
159              
160             =head2 find_field_names
161              
162             my @field_names = Tie::FieldVals::find_field_names($datafile);
163              
164             Read the field-name information from the file, if the file
165             exists and is readable.
166              
167             =cut
168             sub find_field_names ($) {
169 11 50   11 1 38 carp &whowasi if $DEBUG;
170 11         22 my $datafile = shift;
171              
172 11         23 my @field_names = ();
173 11 50       152 if (-r $datafile)
174             {
175             # make a temporary file object to look at
176 11         15 my @records;
177 11 50       94 my $file_obj = tie @records, 'Tie::File', "$datafile",
178             recsep =>"\n=\n", mode=>O_RDONLY, memory=>0
179             or croak "Tie::FieldVals::find_field_names - Could not open '",
180             $datafile, "'.";
181              
182             # the field info is in the first record
183 11         1839 my %row = ();
184 11         88 my $row_obj = tie %row,
185             'Tie::FieldVals::Row', fields=>['dummy'];
186 11         66 my $rec_str = $records[0];
187 11 50       1170 if (defined $rec_str)
188             {
189 11         55 $row_obj->set_from_string($rec_str,
190             override_keys=>1);
191 11         18 @field_names = @{$row_obj->field_names()};
  11         61  
192             }
193 11         22 undef $file_obj;
194 11         83 untie @records;
195 11         462 undef $row_obj;
196 11         74 untie %row;
197             }
198              
199 11         60 return @field_names;
200              
201             } # find_field_names
202              
203             =head1 OBJECT METHODS
204              
205             =head2 field_names
206              
207             Get the field names of this data.
208              
209             my @field_names = $recs_obj->field_names();
210              
211             =cut
212             sub field_names {
213 0 0   0 1 0 carp &whowasi if $DEBUG;
214 0         0 my $self = shift;
215              
216 0         0 @{$self->{field_names}};
  0         0  
217             }
218              
219             =head2 flock
220              
221             $recs_obj->flock(MODE);
222              
223             Locks the data file. "MODE" has the same meaning as the second
224             argument to the Perl built-in "flock" function; for example
225             "LOCK_SH" or "LOCK_EX | LOCK_NB". (These constants are provided
226             by the "use Fcntl ':flock';" declaration.)
227              
228             "MODE" is optional; the default is "LOCK_EX".
229              
230             When you use "flock" to lock the file, "Tie::FieldVals" assumes that the
231             record cache is no longer trustworthy, because another process might have
232             modified the file since the last time it was read. Therefore, a successful
233             call to "flock" discards the contents of the record cache.
234              
235             The best way to unlock a file is to discard the object and untie the
236             array. It is probably unsafe to unlock the file without also untying
237             it, because if you do, changes may remain unwritten inside the object.
238             That is why there is no shortcut for unlocking. If you really want to
239             unlock the file prematurely, you know what to do; if you don't know
240             what to do, then don't do it.
241              
242             See L for more information (this calls the
243             flock method of that module).
244              
245             =cut
246             sub flock {
247 0 0   0 1 0 carp &whowasi if $DEBUG;
248 0         0 my $self = shift;
249              
250             # call the Tie::File flock method
251 0 0       0 if ($self->{FILE_OBJ}->flock(@_))
252             {
253             # clear the cache
254 0         0 $self->{REC_CACHE} = {};
255             }
256             }
257              
258             =head1 TIE-ARRAY METHODS
259              
260             =head2 TIEARRAY
261              
262             Create a new instance of the object as tied to an array.
263              
264             tie @people, 'Tie::FieldVals', datafile=>$datafile;
265              
266             tie @people, 'Tie::FieldVals', datafile=>$datafile,
267             mode=>O_RDONLY, cache_size=>1000, memory=>0;
268              
269             tie @people, 'Tie::FieldVals', datafile=>$datafile,
270             fields=>[qw(Name Email)], mode=>(O_RDWR|O_CREAT);
271              
272             tie @people, 'Tie::FieldVals', datafile=>$datafile,
273             mode=>O_RDWR, cache_all=>1;
274              
275             Arguments:
276              
277             =over
278              
279             =item datafile
280              
281             The file with the data in it. (required)
282              
283             =item fields
284              
285             Field defintions for creating a new file. This is ignored if the
286             file already exists.
287              
288             =item mode
289              
290             The mode to open the file with. O_RDONLY means that the file is read-only.
291             O_RDWR means that the file is read-write.
292             (default: O_RDONLY)
293              
294             =item cache_all
295              
296             If true, cache all the records in the file. This will speed things up,
297             but consume more memory. (default: false)
298              
299             Note that this merely sets the cache_size to the size of the file when
300             the tie is initially made: if you add more records to the file, the
301             cache size will not be increased.
302              
303             =item cache_size
304              
305             The size of the cache (if we aren't caching all the records).
306             (default: 100) As ever, there is a trade-off between space and time.
307              
308             =item memory
309              
310             The upper limit on the memory consumed by C.
311             (See L).
312             (default: 10,000,000)
313              
314             Note that there are two caches: the cache of unparsed records maintained
315             by Tie::File, and the cache of parsed records maintained by Tie::FieldVals.
316             The B option affects the Tie::File cache, and the B
317             options affect the Tie::FieldVals cache.
318              
319             =back
320              
321             =cut
322             sub TIEARRAY {
323 10 50   10   1388 carp &whowasi if $DEBUG;
324 10         22 my $class = shift;
325 10         91 my %args = (
326             datafile=>'',
327             mode=>(O_RDONLY),
328             cache_size=>100,
329             cache_all=>0,
330             memory=>10_000_000,
331             fields=>undef,
332             @_
333             );
334              
335 10         23 my $self = {};
336              
337             # check if the file is readable while existing
338 10 50 66     367 if (-e $args{datafile} && !-r $args{datafile})
339             {
340 0         0 croak "Tie::FieldVals::TIEARRAY - Could not read '", $args{datafile}, "'.";
341             }
342 10         20 my @records;
343 10 100       127 if (-e $args{datafile})
344             {
345 9         41 @{$self->{field_names}} = find_field_names($args{datafile});
  9         50  
346             $self->{FILE_OBJ} = tie @records, 'Tie::File', "$args{datafile}",
347             recsep =>"\n=\n", mode=>$args{mode}, memory=>$args{memory}
348 9 50       70 or croak "Tie::FieldVals - Could not open '", $args{datafile}, "'.";
349 9         1191 $self->{FILE_RECS} = \@records;
350             }
351             else
352             {
353             # check that the fields have been given
354 1 50 33     9 if (!defined $args{fields}
355             || ref $args{fields} ne 'ARRAY')
356             {
357             croak "Tie::FieldVals - ", $args{datafile},
358 0         0 " does not exist and no field names were given";
359             }
360             # set the fields and tie the file
361 1         3 @{$self->{field_names}} = @{$args{fields}};
  1         4  
  1         3  
362              
363             $self->{FILE_OBJ} = tie @records, 'Tie::File', "$args{datafile}",
364             recsep =>"\n=\n", mode=>$args{mode}, memory=>$args{memory}
365 1 50       11 or croak "Tie::FieldVals - Could not open '", $args{datafile}, "'.";
366 1         237 $self->{FILE_RECS} = \@records;
367              
368 1         4 set_field_names($self);
369             }
370              
371 10         27 $self->{OPTIONS} = \%args;
372              
373             # set a hash of the field names
374 10         23 foreach my $fn (@{$self->{field_names}})
  10         34  
375             {
376 98         218 $self->{field_names_hash}->{$fn} = 1;
377             }
378              
379 10         28 $self->{REC_CACHE} = {};
380 10 100       41 if ($args{cache_all}) # set the cache to the size of the file
381             {
382 1         6 my $count = @records;
383 1         438 $self->{OPTIONS}->{cache_size} = $count;
384             }
385              
386 10   33     96 bless ($self, (ref $class || $class));
387             } # TIEARRAY
388              
389             =head2 FETCH
390              
391             Get a row from the array.
392              
393             $val = $array[$ind];
394              
395             Returns a reference to a Tie::FieldVals::Row hash, or undef.
396              
397             =cut
398             sub FETCH {
399 915 50   915   2405 carp &whowasi if $DEBUG;
400 915         1345 my ($self, $ind) = @_;
401              
402 915 100       2230 if (defined $self->{REC_CACHE}->{$ind})
403             {
404 301         1110 return $self->{REC_CACHE}->{$ind};
405             }
406             else # not cached, add to cache
407             {
408             # remove one from cache if cache full
409 614         707 my @cached = keys %{$self->{REC_CACHE}};
  614         8703  
410 614 100       3141 if (@cached >= $self->{OPTIONS}->{cache_size})
411             {
412 106         443 delete $self->{REC_CACHE}->{shift @cached};
413             }
414 614         833 %{$self->{REC_CACHE}->{$ind}} = ();
  614         1778  
415 614         2680 my $row_obj = tie %{$self->{REC_CACHE}->{$ind}},
416 614         755 'Tie::FieldVals::Row', fields=>$self->{field_names};
417             # remember, the 0 record is the empty fields record
418 614         2686 my $rec_str = $self->{FILE_RECS}->[$ind + 1];
419 614 50       50170 if (defined $rec_str)
420             {
421 614         1776 $row_obj->set_from_string($rec_str);
422 614         3932 return $self->{REC_CACHE}->{$ind};
423             }
424             else
425             {
426 0         0 delete $self->{REC_CACHE}->{$ind};
427 0         0 return undef;
428             }
429             }
430 0         0 return undef;
431             } # FETCH
432              
433             =head2 STORE
434              
435             Add a value to the array. Value must be a Tie::FieldVals::Row hash.
436              
437             $array[$ind] = $val;
438              
439             If $ind is bigger than the array, then just push, don't extend.
440              
441             =cut
442             sub STORE {
443 0 0   0   0 carp &whowasi if $DEBUG;
444 0         0 my ($self, $ind, $val) = @_;
445              
446             # only store a hash and if writing
447 0 0 0     0 if (ref $val eq 'HASH'
448             && $self->{OPTIONS}->{mode} & O_RDWR)
449             {
450 0 0       0 if ($ind > $self->FETCHSIZE())
451             {
452 0         0 $ind = $self->FETCHSIZE();
453 0         0 $self->{REC_CACHE}->{$ind} = $val;
454 0         0 my $row_obj = tied %{$val};
  0         0  
455 0         0 my $rec_str = $row_obj->get_as_string();
456 0         0 $self->{FILE_OBJ}->PUSH($rec_str);
457             }
458             else
459             {
460 0         0 $self->{REC_CACHE}->{$ind} = $val;
461 0         0 my $row_obj = tied %{$val};
  0         0  
462 0         0 my $rec_str = $row_obj->get_as_string();
463             # remember record 0 is the empty fields record
464 0         0 $self->{FILE_OBJ}->STORE($ind + 1, $rec_str);
465             }
466             }
467             } # STORE
468              
469             =head2 FETCHSIZE
470              
471             Get the size of the array.
472              
473             =cut
474             sub FETCHSIZE {
475 10 50   10   3379 carp &whowasi if $DEBUG;
476 10         112 my $self = shift;
477              
478             # remember record 0 is the empty fields record
479 10         57 return ($self->{FILE_OBJ}->FETCHSIZE() - 1);
480             } # FETCHSIZE
481              
482             =head2 STORESIZE
483              
484             Set the size of the array, if the file is writeable.
485              
486             =cut
487             sub STORESIZE {
488 0 0   0   0 carp &whowasi if $DEBUG;
489 0         0 my $self = shift;
490 0         0 my $count = shift;
491              
492 0 0       0 if ($self->{OPTIONS}->{mode} & O_RDWR)
493             {
494             # remember record 0 is the empty fields record
495 0         0 $self->{FILE_OBJ}->STORESIZE($count + 1);
496             }
497             } # STORESIZE
498              
499             =head2 EXISTS
500              
501             exists $array[$ind];
502              
503             =cut
504             sub EXISTS {
505 0 0   0   0 carp &whowasi if $DEBUG;
506 0         0 my $self = shift;
507 0         0 my $ind = shift;
508              
509             # remember record 0 is the empty fields record
510 0         0 return $self->{FILE_OBJ}->EXISTS($ind + 1);
511             } # EXISTS
512              
513             =head2 DELETE
514              
515             delete $array[$ind];
516              
517             Delete the value at $ind if the file is writeable.
518              
519             =cut
520             sub DELETE {
521 0 0   0   0 carp &whowasi if $DEBUG;
522 0         0 my $self = shift;
523 0         0 my $ind = shift;
524              
525 0 0       0 if ($self->{OPTIONS}->{mode} & O_RDWR)
526             {
527 0 0       0 if (exists $self->{REC_CACHE}->{$ind})
528             {
529 0         0 delete $self->{REC_CACHE}->{$ind};
530             }
531             # remember record 0 is the empty fields record
532 0         0 $self->{FILE_OBJ}->DELETE($ind + 1);
533             }
534             } # DELETE
535              
536             =head2 CLEAR
537              
538             @array = ();
539              
540             Clear the array if the file is writeable.
541              
542             =cut
543             sub CLEAR {
544 0 0   0   0 carp &whowasi if $DEBUG;
545 0         0 my $self = shift;
546 0         0 my $ind = shift;
547              
548 0 0       0 if ($self->{OPTIONS}->{mode} & O_RDWR)
549             {
550 0         0 $self->{REC_CACHE} = {};
551             # remember record 0 is the empty fields record
552 0         0 my $rec_str = $self->{FILE_RECS}->[0];
553 0         0 $self->{FILE_OBJ}->CLEAR();
554 0         0 $self->{FILE_RECS}->[0] = $rec_str;
555             }
556             } # CLEAR
557              
558             =head2 UNTIE
559              
560             untie @array;
561              
562             Untie the array.
563              
564             =cut
565             sub UNTIE {
566 1 50   1   4 carp &whowasi if $DEBUG;
567 1         2 my $self = shift;
568 1         2 my $count = shift;
569              
570 1 50       3 carp "untie attempted while $count inner references still exist" if $count;
571 1         3 $self->{REC_CACHE} = {};
572 1         4 undef $self->{FILE_OBJ};
573 1         2 untie @{$self->{FILE_RECS}};
  1         6  
574             } # UNTIE
575              
576             =head1 PRIVATE METHODS
577              
578             This documentation is for developer reference only.
579              
580             =head2 debug
581              
582             Set debugging on.
583              
584             =cut
585 0 0   0 1 0 sub debug { $DEBUG = @_ ? shift : 1 }
586              
587             =head2 whowasi
588              
589             For debugging: say who called this
590              
591             =cut
592 0     0 1 0 sub whowasi { (caller(1))[3] . '()' }
593              
594             =head2 set_field_names
595              
596             Set the field names in the data-file to be the given field names.
597             (Assumes the file didn't exist before).
598              
599             =cut
600             sub set_field_names ($) {
601 1 50   1 1 3 carp &whowasi if $DEBUG;
602 1         2 my $self = shift;
603              
604 1         2 my %row = ();
605             # set the row fields from the given fields
606             my $row_obj = tie %row,
607 1         10 'Tie::FieldVals::Row', fields=>$self->{field_names};
608             # give the row fields values of the empty string
609             # (right now they are undefined)
610 1         2 foreach my $fn (@{$self->{field_names}})
  1         3  
611             {
612 4         19 $row{$fn} = '';
613             }
614             # get the empty row as a string, and set the file record[0]
615             # to that string
616 1         5 my $rec_str = $row_obj->get_as_string();
617 1         7 $self->{FILE_RECS}->[0] = $rec_str;
618              
619             } # set_field_names
620              
621             =head1 REQUIRES
622              
623             Test::More
624              
625             Carp
626             Tie::Array
627             Tie::File
628             Fcntl
629             Data::Dumper
630              
631             Getopt::Long
632             Pod::Usage
633             Getopt::ArgvFile
634             File::Basename
635              
636             =head1 INSTALLATION
637              
638             To install this module, run the following commands:
639              
640             perl Build.PL
641             ./Build
642             ./Build test
643             ./Build install
644              
645             Or, if you're on a platform (like DOS or Windows) that doesn't like the
646             "./" notation, you can do this:
647              
648             perl Build.PL
649             perl Build
650             perl Build test
651             perl Build install
652              
653             In order to install somewhere other than the default, such as
654             in a directory under your home directory, like "/home/fred/perl"
655             go
656              
657             perl Build.PL --install_base /home/fred/perl
658              
659             as the first step instead.
660              
661             This will install the files underneath /home/fred/perl.
662              
663             You will then need to make sure that you alter the PERL5LIB variable to
664             find the modules, and the PATH variable to find the script.
665              
666             Therefore you will need to change:
667             your path, to include /home/fred/perl/script (where the script will be)
668              
669             PATH=/home/fred/perl/script:${PATH}
670              
671             the PERL5LIB variable to add /home/fred/perl/lib
672              
673             PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
674              
675              
676             =head1 SEE ALSO
677              
678             perl(1).
679             L
680             L
681             L
682             L
683              
684             =head1 BUGS
685              
686             Please report any bugs or feature requests to the author.
687              
688             =head1 AUTHOR
689              
690             Kathryn Andersen (RUBYKAT)
691             perlkat AT katspace dot com
692             http://www.katspace.com
693              
694             =head1 COPYRIGHT AND LICENCE
695              
696             Copyright (c) 2004-2008 by Kathryn Andersen
697              
698             This program is free software; you can redistribute it and/or modify it
699             under the same terms as Perl itself.
700              
701              
702             =cut
703              
704             1; # End of Tie::FieldVals
705             # vim: ts=8 sts=4 sw=4
706             __END__