line
stmt
bran
cond
sub
pod
time
code
1
package Statistics::Data;
2
4
4
65520
use strict;
4
8
4
155
3
4
4
16
use warnings FATAL => 'all';
4
4
4
160
4
4
4
18
use Carp qw(croak);
4
13
4
390
5
4
4
2385
use List::AllUtils qw(all);
4
46715
4
363
6
4
4
2283
use Number::Misc qw(is_even);
4
4320
4
230
7
4
4
22
use Scalar::Util qw(looks_like_number);
4
5
4
321
8
4
4
2090
use String::Util qw(hascontent nocontent);
4
17036
4
13203
9
our $VERSION = '0.09';
10
11
=head1 NAME
12
13
Statistics::Data - Load, access, update, check and save one or more sequences of data for statistical analysis
14
15
=head1 VERSION
16
17
This is documentation for Version 0.09 of Statistics/Data.pm, released February 2015.
18
19
=head1 SYNOPSIS
20
21
use Statistics::Data 0.09;
22
my $dat = Statistics::Data->new();
23
24
# managing labelled sequences:
25
$dat->load({'aname' => \@data1, 'anothername' => \@data2}); # labels are arbitrary
26
$aref = $dat->access(label => 'aname'); # gets back a copy of @data1
27
$dat->add(aname => [2, 3]); # pushes new values onto loaded copy of @data1
28
$dat->dump_list(); # print to check if both arrays are loaded and their number of elements
29
$dat->unload(label => 'anothername'); # only 'aname' data remains loaded
30
$aref = $dat->access(label => 'aname'); # $aref is a reference to a copy of @data1
31
$dat->dump_vals(label => 'aname', delim => ','); # proof in print it's back
32
33
# managing multiple anonymous sequences:
34
$dat->load(\@data1, \@data2); # any number of anonymous arrays
35
$dat->add([2], [6]); # pushes a single value apiece onto copies of @data1 and @data2
36
$aref = $dat->access(index => 1); # returns reference to copy of @data2, with its new values
37
$dat->unload(index => 0); # only @data2 remains loaded, and its index is now 0
38
39
# managing single anonymous sequence, over time:
40
$dat->load(1, 2, 2);
41
$dat->add(1); # loaded sequence is now 1, 2, 2, 1
42
$dat->dump_vals(); # same as: print @{$dat->access()}, "\n";
43
$dat->save_to_file(path => 'variables1.dat'); # write to disk
44
$dat->unload(); # all gone - go wild, knowing that you can ...
45
$dat->load_from_path(path => 'variables1.dat'); # back again - go to work
46
$dat->dump_vals(); # proof: same printed output as before
47
# do more work
48
49
=head1 DESCRIPTION
50
51
Handles data for some other statistics modules, as in loading, updating and retrieving data for analysis. Performs no actual statistical analysis itself.
52
53
Rationale is not wanting to write the same or similar load, add, etc. methods for every statistics module, not to provide an omnibus API for Perl stat modules. It, however, encompasses much of the variety of how Perl stats modules do the basic handling their data. Used for L (and its sub-tests).
54
55
=head1 SUBROUTINES/METHODS
56
57
Manages caches of one or more lists of data for use by some other statistics modules. The lists are ordered arrays comprised of literal scalars (numbers, strings). They can be loaded, added to (updated), accessed or unloaded by referring to the index (order) in which they have been loaded (or previously added to), or by a particular label. The lists are cached within the class object's '_DATA' aref as an aref itself, optionally associated with a 'label'. The particular structures supported here to load, update, retrieve, unload data are specified under L. Any module that uses this one as its base can still use its own rules to select the appropriate sequence, or provide the appropriate sequence within the call to itself.
58
59
=head2 Constructors
60
61
=head3 new
62
63
$dat = Statistics::Data->new();
64
65
Returns a new Statistics::Data object.
66
67
=cut
68
69
sub new {
70
4
4
1
34
my $class = shift;
71
4
50
17
my $self = bless {}, ref($class) ? ref($class) : $class;
72
4
21
$self->{_DATA} = [];
73
4
12
return $self;
74
}
75
76
=head3 clone
77
78
$new_self = $dat->clone();
79
80
I: B
81
82
Returns a copy of the class object with its data loaded (if any). Note this is not a copy of any particular data but the whole blessed hash. Alternatively, use L to get all the data added to a new object, or use L to load/add particular arrays of data into another object. Nothing modified in this new object affects the original.
83
84
=cut
85
86
sub clone {
87
0
0
1
0
my $self = shift;
88
0
0
require Clone;
89
0
0
return Clone::clone($self);
90
}
91
*copy = \*clone;
92
93
=head2 Setting data
94
95
Methods to cache and uncache data into the data-object.
96
97
=head3 load
98
99
$dat->load(@data); # CASE 1 - can be updated/retrieved anonymously, or as index => i (load order)
100
$dat->load(\@data); # CASE 2 - same, as aref
101
$dat->load(data => \@data); # CASE 3 - updated/retrieved as label => 'data' (arbitrary name, not just 'data'); or by index (order)
102
$dat->load({ data => \@data }) # CASE 4 - same as CASE 4, as hashref
103
$dat->load(blues => \@blue_data, reds => \@red_data); # CASE 5 - same as CASE 3 but with multiple named loads
104
$dat->load({ blues => \@blue_data, reds => \@red_data }); # CASE 6 - same as CASE 5 bu as hashref
105
$dat->load(\@blue_data, \@red_data); # CASE 7 - same as CASE 2 but with multiple aref loads
106
107
# Not supported:
108
#$dat->load(data => @data); # not OK - use CASE 3 instead
109
#$dat->load([\@blue_data, \@red_data]); # not OK - use CASE 7 instead
110
#$dat->load([ [blues => \@blue_data], [reds => \@red_data] ]); # not OK - use CASE 5 or CASE 6 instead
111
#$dat->load(blues => \@blue_data, reds => [\@red_data1, \@red_data2]); # not OK - too mixed to make sense
112
113
I: B
114
115
Cache a list of data as an array-reference. Each call removes previous loads, as does sending nothing. If data need to be cached without unloading previous loads, try L. Arguments with the following structures are acceptable as data, and will be Lible by either index or label as expected:
116
117
=over 4
118
119
=item load ARRAY
120
121
Load an anonymous array that has no named values. For example:
122
123
$dat->load(1, 4, 7);
124
$dat->load(@ari);
125
126
This is loaded as a single sequence, with an undefined label, and indexed as 0. Note that trying to load a labelled dataset with an unreferenced array is wrong for it will be treated like this case - the label will be "folded" into the sequence itself.
127
128
=item load AREF
129
130
Load a reference to a single anonymous array that has no named values, e.g.:
131
132
$dat->load([1, 4, 7]);
133
$dat->load(\@ari);
134
135
This is loaded as a single sequence, with an undefined label, and indexed as 0.
136
137
=item load ARRAY of AREF(s)
138
139
Same as above, but note that more than one unlabelled array-reference can also be loaded at once, e.g.:
140
141
$dat->load([1, 4, 7], [2, 5, 9]);
142
$dat->load(\@ari1, \@ari2);
143
144
Each sequence can be accessed, using L, by specifying B => index, the latter value representing the order in which these arrays were loaded.
145
146
=item load HASH of AREF(s)
147
148
Load one or more labelled references to arrays, e.g.:
149
150
$dat->load('dist1' => [1, 4, 7]);
151
$dat->load('dist1' => [1, 4, 7], 'dist2' => [2, 5, 9]);
152
153
This loads the sequence(s) with a label attribute, so that when calling L, they can be retrieved by name, e.g., passing B => 'dist1'. The load method involves a check that there is an even number of arguments, and that, if this really is a hash, all the keys are defined and not empty, and all the values are in fact array-references.
154
155
=item load HASHREF of AREF(s)
156
157
As above, but where the hash is referenced, e.g.:
158
159
$dat->load({'dist1' => [1, 4, 7], 'dist2' => [2, 5, 9]});
160
161
=back
162
163
This means that using the following forms will produce unexpected results, if they do not actually croak, and so should not be used:
164
165
$dat->load(data => @data); # no croak but wrong - puts "data" in @data - use \@data
166
$dat->load([\@blue_data, \@red_data]); # use unreferenced ARRAY of AREFs instead
167
$dat->load([ [blues => \@blue_data], [reds => \@red_data] ]); # treated as single AREF; use HASH of AREFs instead
168
$dat->load(blues => \@blue_data, reds => [\@red_data1, \@red_data2]); # mixed structures not supported
169
170
=cut
171
172
sub load
173
{ # load single aref: cannot load more than one sequence; keeps a direct reference to the data: any edits creep back.
174
17
17
1
14121
my ( $self, @args ) = @_;
175
17
66
$self->unload();
176
17
43
$self->add(@args);
177
17
38
return 1;
178
}
179
*load_data = \&load;
180
181
=head3 add
182
183
I: B, B, B
184
185
Same usage as above for L. Just push any value(s) or so along, or loads an entirely labelled sequence, without clobbering what's already in there (as L would). If data have not been loaded with a label, then appending data to them happens according to the order of array-refs set here, see L could even skip adding something to one previously loaded sequence by, e.g., going $dat->add([], \new_data) - adding nothing to the first loaded sequence, and initialising a second array, if none already, or appending these data to it.
186
187
=cut
188
189
sub add {
190
29
29
1
7010
my ( $self, @args ) = @_;
191
29
63
my $tmp = _init_data( $self, @args )
192
; # hashref of data sequence(s) keyed by index to use for loading or adding
193
29
32
while ( my ( $i, $val ) = each %{$tmp} ) {
62
159
194
33
100
61
if ( defined $val->{'lab'} ) { # newly labelled data
195
16
63
$self->{_DATA}->[$i] =
196
{ seq => $val->{'seq'}, lab => $val->{'lab'} };
197
}
198
else
199
{ # data to be added to existing cache, or an anonymous load, indexed only
200
17
18
push @{ $self->{_DATA}->[$i]->{'seq'} }, @{ $val->{'seq'} };
17
84
17
48
201
}
202
}
203
29
73
return;
204
}
205
*add_data = \&add;
206
*append_data = \&add;
207
*update = \&add;
208
209
=head3 unload
210
211
$dat->unload(); # deletes all cached data, named or not
212
$dat->unload(index => integer); # deletes the aref named 'data' whatever
213
$dat->unload(label => 'a name'); # deletes the aref named 'data' whatever
214
215
Empty, clear, clobber what's in there. Croaks if given index or label does not refer to any loaded data. This should be used whenever any already loaded or added data are no longer required ahead of another L, including via L or L.
216
217
=cut
218
219
sub unload {
220
19
19
1
574
my ( $self, @args ) = @_;
221
19
100
41
if ( !$args[0] ) {
222
18
34
$self->{_DATA} = [];
223
}
224
else {
225
1
2
splice @{ $self->{_DATA} }, _index_by_args( $self, @args ), 1;
1
2
226
}
227
19
48
return;
228
}
229
230
=head3 share
231
232
$dat_new->share($dat_old);
233
234
Adds all the data from one Statistics::Data object to another. Changes in the new copies do not affect the originals.
235
236
=cut
237
238
sub share {
239
1
1
1
4
my ( $self, $other ) = @_;
240
1
4
_add_from_object_aref( $self, $other->{_DATA} );
241
1
1
return 1;
242
}
243
244
=head3 load_from_file
245
246
$dat->load_from_file(path => 'medata.csv', format => 'xml|csv');
247
$dat->load_from_file(path => 'mysequences.csv', serializer => 'XML::Simple', compress => 1, secret => '123'); # serialization options
248
249
Loads data from a file, assuming there are data in the given path that have been saved in the format used in L. Basically a wrapper to L method in L; cf. for options; and then to L. If the data retreived are actually to be added to any data already cached via a previous load or add, define the optional parameter B => 1.
250
251
=cut
252
253
sub load_from_file {
254
0
0
1
0
my ( $self, @args ) = @_;
255
0
0
0
my $args = ref $args[0] ? $args[0] : {@args};
256
0
0
0
0
croak 'There is no path for loading data'
257
if nocontent( $args->{'path'} )
258
|| !-e $args->{'path'}; # is filepath valid?
259
0
0
require Data::Serializer;
260
0
0
my $serializer = Data::Serializer->new( %{$args} );
0
0
261
0
0
my $href = $serializer->retrieve( $args->{'path'} );
262
0
0
0
$self->unload() unless $args->{'keep'};
263
0
0
_add_from_object_aref( $self, $href->{DATA} );
264
0
0
return 1;
265
}
266
267
=head2 Getting data
268
269
To retrieve what has been previously loaded, simply call L, specifying the "label" or "index" that was used to load/add the data - i.e., when loaded as a hashref or an arrayref, respectively; specifying the list by B (as loaded hash-wise) or B (as loaded array-wise).
270
271
For retrieving more than one previously loaded dataset, use one of the "get" methods, choosing between getting back a hash- or an array-ref, or to get back a single list, as by L, after all. These "get" methods only support retrieving data loaded as hashrefs for now; use L to get back index-specific loads. They might be folded within L later on.
272
273
=head3 access
274
275
$aref = $dat->access(); #returns the first and/or only sequence loaded, if any
276
$aref = $dat->access(index => integer); #returns the ith sequence loaded
277
$aref = $dat->access(label => 'a_name'); # returns a particular named cache of data
278
279
I: B
280
281
Returns one referenced array being previously loaded/added to data by the given B (in a flat-list load) or B (in a hash-wise load). Same as calling L.
282
283
=cut
284
285
sub access {
286
24
24
1
7123
my ( $self, @args ) = @_;
287
24
67
return $self->{_DATA}->[ _index_by_args( $self, @args ) ]->{'seq'};
288
}
289
*read = \&access; # legacy only
290
291
=head3 get_hoa, get_hoa_by_lab
292
293
$href = $data->get_hoa(lab => [qw/fez boa/]); # retrieve 1 or more named data
294
$href = $data->get_hoa(); # retrieve all named data
295
296
Returns a hashref of arefs, where the keys are the names of the data, as previously given in a load, and the values are arefs of the list of data that has been loaded for that name.
297
298
The optional argument B should be a reference to a list of one or more data that have been given as keys in a hash-wise L. Any elements in this list that have not been used as names in a load are ignored. If none of the names has been used, an empty list is returned. If there is no B argument, then all of the loaded data are returned as a hashref of arefs; if there were no named data, this a reference to an empty hash.
299
300
This is useful in a module like L that needs to continuously cross-refer to multiple variables to make a single calculation while also being able to distinguish them by some meaningful key other than simply an index number.
301
302
For working with numerical data in particular, see the following two methods.
303
304
=cut
305
306
sub get_hoa_by_lab {
307
0
0
1
0
my ( $self, %args ) = @_;
308
0
0
0
0
$args{'lab'} = [ $args{'lab'} ]
309
if hascontent( $args{'lab'} )
310
and not ref $args{'lab'};
311
0
0
my %data = ();
312
0
0
0
if ( !ref $args{'lab'} ) { # get all data
313
0
0
for my $i ( 0 .. $self->ndata() - 1 ) {
314
0
0
0
if ( hascontent( $self->{_DATA}->[$i]->{'lab'} ) ) {
315
0
0
$data{ $self->{_DATA}->[$i]->{'lab'} } =
316
$self->{_DATA}->[$i]->{'seq'};
317
}
318
}
319
}
320
else { # get named data
321
0
0
for my $i ( 0 .. scalar @{ $args{'lab'} } - 1 )
0
0
322
{ # assume ref eq 'ARRAY'
323
0
0
my $j = _seq_index_by_label( $self, $args{'lab'}->[$i] )
324
; # is name loaded with data?
325
0
0
0
if ( defined $j ) {
326
0
0
$data{ $args{'lab'}->[$i] } = $self->{_DATA}->[$j]->{'seq'};
327
} # else ignore the given name
328
}
329
}
330
0
0
0
return wantarray ? %data : \%data;
331
}
332
*get_hoa = \&get_hoa_by_lab;
333
334
=head3 get_hoa_by_lab_numonly_indep
335
336
$hoa = $dat->get_hoa_by_lab_numonly_indep(); # same as get_hoa but each list culled of NaNs
337
338
Returns the variables given in the argument B (an aref of strings) culled on any empty or non-numeric values. This is done by treating each variable indpendently, with culls on one "list" not creating a cull on any other. This is the type of data useful for an independent ANOVA.
339
340
=cut
341
342
sub get_hoa_by_lab_numonly_indep {
343
0
0
1
0
my ( $self, %args ) = @_;
344
0
0
return _cull_hoa_indep( scalar $self->get_hoa_by_lab(%args),
345
\$self->{'purged'} );
346
}
347
348
=head3 get_hoa_by_lab_numonly_across
349
350
$hoa = $dat->get_hoa_by_lab_numonly_across(); # same as get_hoa but each list culled of NaNs at same i across lists
351
352
Returns hashref of previously loaded variable data (as arefs) culled of an empty or non-numerical values whereby even a valid value in one list is culled if it is at an index that is invalid in another list. This is the type of data useful for a dependent ANOVA.
353
354
=cut
355
356
sub get_hoa_by_lab_numonly_across {
357
0
0
1
0
my ( $self, %args ) = @_;
358
0
0
return _cull_hoa_across( scalar $self->get_hoa_by_lab(%args),
359
\$self->{'purged'} );
360
}
361
362
=head3 get_aoa, get_aoa_by_lab
363
364
$aref_of_arefs = $dat->get_aoa_by_lab();
365
366
Returns a reference to an array where each value is itself an array of data, as separately loaded under a different name or anonymously, in the order that they were loaded. If no B value is defined, all the loaded data are returned as a list of arefs.
367
368
=cut
369
370
sub get_aoa_by_lab {
371
0
0
1
0
my ( $self, %args ) = @_;
372
0
0
0
0
$args{'lab'} = [ $args{'lab'} ]
373
if hascontent( $args{'lab'} )
374
and not ref $args{'lab'};
375
0
0
my @data = ();
376
0
0
0
if ( !ref $args{'lab'} ) { # get all data
377
0
0
for my $i ( 0 .. $self->ndata() - 1 ) {
378
0
0
$data[$i] = $self->{_DATA}->[$i]->{'seq'};
379
}
380
}
381
else { # get named data
382
0
0
for my $i ( 0 .. scalar @{ $args{'lab'} } - 1 )
0
0
383
{ # assume ref eq 'ARRAY'
384
0
0
my $j = _seq_index_by_label( $self, $args{'lab'}->[$i] )
385
; # is name loaded with data?
386
0
0
0
if ( defined $j ) {
387
0
0
$data[$i] = $self->{_DATA}->[$j]->{'seq'};
388
} # else ignore the given name
389
}
390
}
391
0
0
0
return wantarray ? @data : \@data; # unreferenced for chance legacy for now
392
}
393
*get_aoa = \&get_aoa_by_lab;
394
395
=head3 get_aref_by_lab
396
397
$aref = $dat->get_aref_by_lab();
398
399
Returns a reference to a single, previously loaded hashref of arrayed of data, as specified in the named argument B. The array is empty if no data have been loaded, or if there is none with the given B. If B is not defined, the the last-loaded data, if any, is returned (as aref).
400
401
=cut
402
403
sub get_aref_by_lab {
404
0
0
1
0
my ( $self, %args ) = @_;
405
0
0
my $aref = [];
406
0
0
0
0
if ( nocontent( $args{'lab'} ) && ref $self->{_DATA}->[-1]->{'seq'} ) {
407
0
0
$aref = $self->{_DATA}->[-1]->{'seq'};
408
}
409
else {
410
0
0
my $j = _seq_index_by_label( $self, $args{'lab'} )
411
; # is name loaded with data?
412
0
0
0
if ( defined $j ) {
413
0
0
$aref = $self->{_DATA}->[$j]->{'seq'};
414
}
415
}
416
0
0
return $aref;
417
}
418
419
=head3 ndata
420
421
$n = $dat->ndata();
422
423
Returns the number of loaded variables.
424
425
=cut
426
427
sub ndata {
428
6
6
1
1982
my $self = shift;
429
6
8
return scalar( @{ $self->{'_DATA'} } );
6
28
430
}
431
432
=head3 labels
433
434
$aref = $dat->labels();
435
436
Returns a reference to an array of all the datanames (labels), if any.
437
438
=cut
439
440
sub labels {
441
0
0
1
0
my $self = shift;
442
0
0
my @names = ();
443
0
0
for ( 0 .. scalar @{ $self->{'_DATA'} } - 1 ) {
0
0
444
0
0
0
push @names, $self->{'_DATA'}->[$_]->{'lab'}
445
if hascontent( $self->{'_DATA'}->[$_]->{'lab'} );
446
}
447
0
0
return \@names;
448
}
449
450
=head2 Checking data
451
452
=head3 all_full
453
454
$bool = $dat->all_full(\@data); # test data are valid before loading them
455
$bool = $dat->all_full(label => 'mydata'); # checking after loading/adding the data (or key in 'index')
456
457
Checks not only if the data sequence, as named or indexed, exists, but if it is non-empty: has no empty elements, with any elements that might exist in there being checked with L.
458
459
=cut
460
461
sub all_full {
462
3
3
1
3988
my ( $self, @args ) = @_;
463
3
50
13
my $data = ref $args[0] ? shift @args : $self->access(@args);
464
3
5
my ( $bool, @vals ) = ();
465
3
5
foreach ( @{$data} ) {
3
9
466
24
100
36
$bool = nocontent($_) ? 0 : 1;
467
24
100
151
if (wantarray) {
468
5
100
13
push @vals, $_ if $bool;
469
}
470
else {
471
19
100
38
last if $bool == 0;
472
}
473
}
474
3
100
10
return wantarray ? ( \@vals, $bool ) : $bool;
475
}
476
477
=head3 all_numeric
478
479
$bool = $dat->all_numeric(); # test data first-loaded, if any
480
$bool = $dat->all_numeric(\@data); # test these data are valid before loading them
481
$bool = $dat->all_numeric(label => 'mydata'); # check specific data after loading/adding them by a 'label' or by their 'index' order
482
($aref, $bool) = $dat->all_numeric([3, '', 4.7, undef, 'b']); # returns ([3, 4.7], 0); - same for any loaded data
483
484
Given an aref of data, or reference to data previously loaded (see L), tests numeracy of each element, and return, if called in scalar context, a boolean scalar indicating if all data in this aref are defined and not empty (using C in L), and, if they have content, if these are all numerical, using C in L. Alternatively, if called in array context, returns the data (as an aref) less any values that failed this test, followed by the boolean.
485
486
=cut
487
488
sub all_numeric {
489
7
7
1
3055
my ( $self, @args ) = @_;
490
7
7
my ( $data, $bool ) = ();
491
7
100
13
if ( ref $args[0] eq 'ARRAY' ) {
492
3
5
$data = shift @args;
493
}
494
else {
495
4
9
$data = $self->{_DATA}->[ _index_by_args( $self, @args ) ]->{'seq'};
496
}
497
7
11
my (@vals) = ();
498
7
6
foreach ( @{$data} ) {
7
12
499
58
100
100
80
$bool = ( nocontent($_) or not looks_like_number($_) ) ? 0 : 1;
500
58
100
428
if (wantarray) {
501
5
100
7
push @vals, $_ if $bool;
502
}
503
else {
504
53
100
156
last if $bool == 0;
505
}
506
53
58
$data = \@vals;
507
}
508
7
100
21
return wantarray ? ( $data, $bool ) : $bool;
509
}
510
*all_numerical = \&all_numeric;
511
512
=head3 all_proportions
513
514
$bool = $dat->all_proportions(\@data); # test data are valid before loading them
515
$bool = $dat->all_proportions(label => 'mydata'); # checking after loading/adding the data (or key in 'index')
516
517
Ensure data are all proportions. Sometimes, the data a module needs are all proportions, ranging from 0 to 1 inclusive. A dataset might have to be cleaned
518
519
=cut
520
521
sub all_proportions {
522
5
5
1
796
my ( $self, @args ) = @_;
523
5
100
15
my $data = ref $args[0] ? shift @args : $self->access(@args);
524
5
5
my ( $bool, @vals ) = ();
525
5
9
foreach ( @{$data} ) {
5
9
526
15
100
18
if ( nocontent($_) ) {
100
527
3
15
$bool = 0;
528
}
529
elsif ( looks_like_number($_) ) {
530
11
100
66
132
$bool = ( $_ < 0 || $_ > 1 ) ? 0 : 1;
531
}
532
15
100
23
if (wantarray) {
533
5
100
9
push @vals, $_ if $bool;
534
}
535
else {
536
10
100
25
last if $bool == 0;
537
}
538
}
539
5
100
12
return wantarray ? ( \@vals, $bool ) : $bool;
540
}
541
542
=head3 all_counts
543
544
$bool = $dat->all_counts(\@data); # test data are valid before loading them
545
$bool = $dat->all_counts(label => 'mydata'); # checking after loading/adding the data (or key in 'index')
546
($aref, $bool) = $dat->all_counts(\@data);
547
548
Returns true if all values in given data are real positive integers or zero, as well as satisfying "hascontent" and "looks_like_number" methods; false otherwise. Called in array context, returns aref of data culled of any values that are false on this basis, and then the boolean. For example, [2.2, 3, 4] and [-1, 3, 4] both fail, but [1, 3, 4] is true. Integer test is simply if $v == int($v).
549
550
=cut
551
552
sub all_counts {
553
3
3
1
13
my ( $self, @args ) = @_;
554
3
50
8
my $data = ref $args[0] ? shift @args : $self->access(@args);
555
3
4
my ( $bool, @vals ) = ();
556
3
3
foreach ( @{$data} ) {
3
6
557
5
50
10
if ( nocontent($_) ) {
50
558
0
0
$bool = 0;
559
}
560
elsif ( looks_like_number($_) ) {
561
5
100
100
65
$bool = $_ >= 0 && $_ == int($_) ? 1 : 0;
562
}
563
else {
564
0
0
$bool = 0;
565
}
566
5
50
16
if (wantarray) {
567
0
0
0
push @vals, $_ if $bool;
568
}
569
else {
570
5
100
10
last if $bool == 0;
571
}
572
}
573
3
50
8
return wantarray ? ( \@vals, $bool ) : $bool;
574
}
575
576
=head3 all_pos
577
578
$bool = $dat->all_pos(\@data); # test data are valid before loading them
579
$bool = $dat->all_pos(label => 'mydata'); # checking after loading/adding the data (or key in 'index')
580
($aref, $bool) = $dat->all_pos(\@data);
581
582
Returns true if all values in given data are greater than zero, as well as "hascontent" and "looks_like_number"; false otherwise. Called in array context, returns aref of data culled of any values that are false on this basis, and then the boolean.
583
584
=cut
585
586
sub all_pos {
587
2
2
1
9
my ( $self, @args ) = @_;
588
2
50
22
my $data = ref $args[0] ? shift @args : $self->access(@args);
589
2
3
my ( $bool, @vals ) = ();
590
2
3
foreach ( @{$data} ) {
2
6
591
5
50
15
if ( nocontent($_) ) {
50
592
0
0
$bool = 0;
593
}
594
elsif ( looks_like_number($_) ) {
595
5
100
52
$bool = $_ > 0 ? 1 : 0;
596
}
597
5
50
10
if (wantarray) {
598
0
0
0
push @vals, $_ if $bool;
599
}
600
else {
601
5
100
19
last if $bool == 0;
602
}
603
}
604
2
50
8
return wantarray ? ( \@vals, $bool ) : $bool;
605
}
606
607
=head3 equal_n
608
609
If the given data or aref of variable names all have the same number of elements, then that number is returned; otherwise 0.
610
611
=cut
612
613
sub equal_n {
614
0
0
1
0
my ( $self, %args ) = ( shift, @_ );
615
0
0
0
my $data =
616
$args{'data'} ? delete $args{'data'} : $self->get_hoa_by_lab(%args);
617
0
0
0
my @data = values %{$data} if ref $data eq 'HASH';
0
0
618
0
0
my $n = scalar @{ $data[0] };
0
0
619
0
0
for ( 1 .. scalar @data - 1 ) {
620
0
0
my $tmp_count = scalar @{ $data[$_] };
0
0
621
0
0
0
if ( $tmp_count != $n ) {
622
0
0
$n = 0;
623
0
0
last;
624
}
625
else {
626
0
0
$n = $tmp_count;
627
}
628
}
629
0
0
return $n;
630
}
631
632
=head3 idx_anumeric
633
634
Given an aref, returns a reference to an array of indices for a particular dataset (list) where the values are either undefined, empty or non-numerical.
635
636
=cut
637
638
sub idx_anumeric
639
{ # List keyed by sample-names of their indices where invalid values lie
640
0
0
1
0
my ( $self, $dat ) = @_;
641
0
0
my %purge = ();
642
0
0
for my $i ( 0 .. scalar @{$dat} - 1 ) {
0
0
643
0
0
0
$purge{$i}++ if !looks_like_number( $dat->[$i] );
644
}
645
0
0
return \%purge;
646
}
647
648
=head2 Dumping data
649
650
=head3 dump_vals
651
652
$seq->dump_vals(delim => ", "); # assumes the first (only?) loaded sequence should be dumped
653
$seq->dump_vals(index => I, delim => ", "); # dump the i'th loaded sequence
654
$seq->dump_vals(label => 'mysequence', delim => ", "); # dump the sequence loaded/added with the given "label"
655
656
Prints to STDOUT a space-separated line (ending with "\n") of a loaded/added data's elements. Optionally, give a value for B to specify how the elements in each sequence should be separated; default is a single space.
657
658
=cut
659
660
sub dump_vals {
661
0
0
1
0
my ( $self, @args ) = @_;
662
0
0
0
my $args = ref $args[0] ? $args[0] : {@args};
663
0
0
0
my $delim = $args->{'delim'} || q{ };
664
0
0
0
print {*STDOUT} join( $delim, @{ $self->access($args) } ), "\n"
0
0
0
0
665
or croak 'Could not print line to STDOUT';
666
0
0
return 1;
667
}
668
669
=head3 dump_list
670
671
Dumps a list (using L) of the data currently loaded, without showing their actual elements. List is firstly by index, then by label (if any), then gives the number of elements in the associated sequence.
672
673
=cut
674
675
sub dump_list {
676
0
0
1
0
my ( $self, $i, $lim, $lab, $N, $len_lab, $len_n, $tbl, @rows, @maxlens ) =
677
(shift);
678
0
0
$lim = $self->ndata();
679
0
0
0
@maxlens = ( ( $lim > 5 ? $lim : 5 ), 5, 1 );
680
0
0
for my $i ( 0 .. $lim - 1 ) {
681
0
0
0
$lab =
682
defined $self->{_DATA}->[$i]->{lab}
683
? $self->{_DATA}->[$i]->{lab}
684
: q{-};
685
0
0
$N = scalar @{ $self->{_DATA}->[$i]->{seq} };
0
0
686
0
0
$len_lab = length $lab;
687
0
0
$len_n = length $N;
688
0
0
0
$maxlens[1] = $len_lab if $len_lab > $maxlens[1];
689
0
0
0
$maxlens[2] = $len_n if $len_n > $maxlens[2];
690
0
0
$rows[$i] = [ $i, $lab, $N ];
691
}
692
0
0
require Text::SimpleTable;
693
0
0
$tbl = Text::SimpleTable->new(
694
[ $maxlens[0], 'index' ],
695
[ $maxlens[1], 'label' ],
696
[ $maxlens[2], 'N' ]
697
);
698
0
0
$tbl->row( @{$_} ) foreach @rows;
0
0
699
0
0
0
print {*STDOUT} $tbl->draw or croak 'Could not print list of loaded data';
0
0
700
0
0
return 1;
701
}
702
703
=head3 save_to_file
704
705
$dat->save_to_file(path => 'mysequences.csv');
706
$dat->save_to_file(path => 'mysequences.csv', serializer => 'XML::Simple', compress => 1, secret => '123'); # serialization options
707
708
Saves the data presently loaded in the Statistics::Data object to a file, with the given B. This can be retrieved, with all the data added to the Statistics::Data object, via L. Basically a wrapper to C method in L; cf. for options.
709
710
=cut
711
712
sub save_to_file {
713
0
0
1
0
my ( $self, @args ) = @_;
714
0
0
0
my $args = ref $args[0] ? $args[0] : {@args};
715
0
0
0
croak 'There is no path for saving data' if nocontent( $args->{'path'} );
716
0
0
require Data::Serializer;
717
0
0
my $serializer = Data::Serializer->new( %{$args} );
0
0
718
0
0
$serializer->store( { DATA => $self->{_DATA} }, $args->{'path'} );
719
0
0
return 1;
720
}
721
*save = \&save_to_file;
722
723
# PRIVATE METHODS:
724
725
sub _cull_hoa_indep {
726
0
0
0
my $hoa = shift;
727
0
0
my $purged_n = shift;
728
0
0
my $purged = 0;
729
0
0
my %purged_data = ();
730
0
0
foreach my $name ( keys %{$hoa} ) {
0
0
731
0
0
my @clean = ();
732
0
0
for my $i ( 0 .. scalar( @{ $hoa->{$name} } ) - 1 ) {
0
0
733
0
0
0
0
if ( nocontent( $hoa->{$name}->[$i] )
734
or not looks_like_number( $hoa->{$name}->[$i] ) )
735
{
736
0
0
$purged++;
737
}
738
else {
739
0
0
push @clean, $hoa->{$name}->[$i];
740
}
741
}
742
croak
743
0
0
0
"Empty data for ANOVA following purge of invalid value(s) in list < $name >"
744
if !scalar @clean;
745
0
0
$purged_data{$name} = [@clean];
746
}
747
0
0
0
${$purged_n} = $purged if ref $purged_n;
0
0
748
0
0
return \%purged_data;
749
}
750
751
sub _cull_hoa_across {
752
0
0
0
my $hoa = shift;
753
0
0
my $purged_n = shift;
754
0
0
my ( $purged, %inv_1, %invalid_ids, %clean, %purged_data ) = ();
755
756
0
0
foreach my $name ( keys %{$hoa} ) {
0
0
757
0
0
for my $i ( 0 .. scalar( @{ $hoa->{$name} } ) - 1 ) {
0
0
758
0
0
0
$inv_1{$name}->{$i} = 1
759
if !looks_like_number( $hoa->{$name}->[$i] );
760
}
761
}
762
763
# List of all indices in all lists with invalid values; and copy of each group of data:
764
0
0
foreach my $name ( keys %{$hoa} ) {
0
0
765
0
0
$clean{$name} = $hoa->{$name}; #$self->get_aref_by_lab( lab => $name );
766
0
0
while ( my ( $key, $val ) = each %{ $inv_1{$name} } )
0
0
767
{ #each %{ $self->{'_purge'}->{$name} } ) {
768
0
0
$invalid_ids{$key} += $val;
769
}
770
}
771
0
0
0
$purged = scalar( keys(%invalid_ids) ) || 0;
772
773
# Purge by index (from highest to lowest):
774
0
0
my @invalid_ids = reverse( sort { $a <=> $b } keys(%invalid_ids) );
0
0
775
0
0
foreach my $cull (@invalid_ids) {
776
0
0
foreach my $name ( keys %clean ) {
777
0
0
splice( @{ $clean{$name} }, $cull, 1 );
0
0
778
}
779
}
780
781
0
0
foreach my $c ( keys %clean ) {
782
0
0
$purged_data{$c} = $clean{$c};
783
}
784
0
0
0
${$purged_n} = $purged if ref $purged_n;
0
0
785
0
0
return \%purged_data;
786
}
787
788
sub _init_data {
789
29
29
42
my ( $self, @args ) = @_;
790
29
43
my $tmp = {};
791
29
100
62
if ( _isa_hashref_of_arefs( $args[0] ) ) { # cases 4 & 6
100
100
50
792
5
15
$tmp = _init_labelled_data( $self, $args[0] );
793
}
794
elsif ( _isa_hash_of_arefs(@args) ) { # cases 3 & 5
795
14
52
$tmp = _init_labelled_data( $self, {@args} );
796
}
797
elsif ( _isa_array_of_arefs(@args) ) { # case 2 & 7
798
8
15
$tmp = _init_unlabelled_data(@args);
799
}
800
elsif ( ref $args[0] ) {
801
0
0
croak 'Don\'t know how to load/add data';
802
}
803
else { # assume @args is just a list of nude strings - case 1
804
2
15
$tmp->{0} = { seq => [@args], lab => undef };
805
}
806
29
85
return $tmp;
807
}
808
809
sub _isa_hashref_of_arefs {
810
29
29
29
my $arg = shift;
811
29
100
100
126
if ( not ref $arg or ref $arg ne 'HASH' ) {
812
24
79
return 0;
813
}
814
else {
815
5
6
return _isa_hash_of_arefs( %{$arg} );
5
21
816
}
817
}
818
819
sub _isa_hash_of_arefs {
820
821
# determines that:
822
# - scalar @args passes Number::Misc is_even, then that:
823
# - every odd indexed value 'hascontent' via String::Util
824
# - every even indexed value is aref
825
29
29
47
my @args = @_;
826
29
28
my $bool = 0;
827
29
100
80
if ( is_even( scalar @args ) )
828
{ # Number::Misc method - not odd number in assignment
829
21
562
my %args = @args; # so assume is hash
830
HASHCHECK:
831
21
76
while ( my ( $lab, $val ) = each %args ) {
832
25
100
66
72
if ( hascontent($lab) && ref $val eq 'ARRAY' ) {
833
23
257
$bool = 1;
834
}
835
else {
836
2
35
$bool = 0;
837
}
838
25
100
135
last HASHCHECK if $bool == 0;
839
}
840
}
841
else {
842
8
163
$bool = 0;
843
}
844
29
83
return $bool;
845
}
846
847
sub _isa_array_of_arefs {
848
10
10
16
my @args = @_;
849
10
100
10
74
if ( all { ref($_) eq 'ARRAY' } @args ) {
10
29
850
8
16
return 1;
851
}
852
else {
853
2
11
return 0;
854
}
855
}
856
857
sub _init_labelled_data {
858
19
19
20
my ( $self, $href ) = @_;
859
19
20
my ( $i, %tmp ) = ( scalar @{ $self->{_DATA} } );
19
42
860
19
20
while ( my ( $lab, $seq ) = each %{$href} ) {
42
110
861
23
45
my $j = _seq_index_by_label( $self, $lab );
862
23
100
41
if ( defined $j )
863
{ # there is already a label for these data, so don't need to define it for this init
864
7
9
$tmp{$j} = { seq => [ @{$seq} ], lab => undef };
7
33
865
}
866
else { # no aref labelled $lab yet: define for seq and label
867
16
16
$tmp{ $i++ } = { seq => [ @{$seq} ], lab => $lab };
16
113
868
}
869
}
870
19
37
return \%tmp;
871
}
872
873
sub _init_unlabelled_data {
874
8
8
11
my @args = @_;
875
8
11
my %tmp = ();
876
8
19
for my $i ( 0 .. scalar @args - 1 ) {
877
8
7
$tmp{$i} = { seq => [ @{ $args[$i] } ], lab => undef };
8
57
878
}
879
8
17
return \%tmp;
880
}
881
882
sub _index_by_args {
883
29
29
38
my ( $self, @args ) = @_;
884
29
23
my $i;
885
29
100
57
if ( !$args[0] ) {
886
10
12
$i = 0;
887
}
888
else {
889
19
50
69
my $args = ref $args[0] ? $args[0] : {@args};
890
19
50
72
if ( hascontent( $args->{'index'} ) ) {
50
891
0
0
$i = $args->{'index'};
892
}
893
elsif ( hascontent( $args->{'label'} ) ) {
894
19
219
$i = _seq_index_by_label( $self, $args->{'label'} );
895
}
896
else {
897
0
0
$i = 0;
898
}
899
} # print "index by args = $i\n";
900
29
50
33
204
return ( defined $i and ref $self->{_DATA}->[$i]->{'seq'} )
901
? $i
902
: croak __PACKAGE__, ' Data for accessing need to be loaded';
903
}
904
905
sub _seq_index_by_label {
906
42
42
48
my ( $self, $label ) = @_;
907
42
40
my ( $i, $k ) = ( 0, 0 );
908
42
41
for ( ; $i < scalar( @{ $self->{_DATA} } ) ; $i++ ) {
46
101
909
30
100
66
180
do { $k++; last; }
26
24
26
45
910
if $self->{_DATA}->[$i]->{lab}
911
and $self->{_DATA}->[$i]->{lab} eq $label;
912
}
913
42
100
94
return $k ? $i : undef;
914
}
915
916
sub _add_from_object_aref {
917
1
1
1
my ( $self, $aref ) = @_;
918
1
2
foreach my $dat ( @{$aref} ) {
1
2
919
2
50
5
if ( hascontent( $dat->{'lab'} ) ) {
920
2
14
$self->add( $dat->{'lab'} => $dat->{'seq'} );
921
}
922
else {
923
0
0
$self->add( $dat->{'seq'} );
924
}
925
}
926
1
2
return 1;
927
}
928
929
=head1 EXAMPLES
930
931
B<1. Multivariate data (a tale of horny frogs)>
932
933
In a study of how doing mental arithmetic affects arousal in self and others (i.e., how mind, body and world interact), three male frogs were maths-trained and then, as they did their calculations, were measured for pupillary dilation and perceived attractiveness. After four runs, average measures per frog can be loaded:
934
935
$frogs->load(Names => [qw/Freddo Kermit Larry/], Pupil => [59.2, 77.7, 56.1], Attract => [3.11, 8.79, 6.99]);
936
937
But one more frog still had to graudate from training, and data are now ready for loading:
938
939
$frogs->add(Names => ['Sleepy'], Pupil => [83.4], Attract => [5.30]);
940
$frogs->dump_data(label => 'Pupil'); # prints "59.2 77.7 56.1 83.4" : all 4 frogs' pupil data for analysis by some module
941
942
Say we're finished testing for now, so:
943
944
$frogs->save_to_file(path => 'frogs.csv');
945
$frogs->unload();
946
947
But another frog has been trained, measures taken:
948
949
$frogs->load_from_file(path => 'frogs.csv');
950
$frogs->add(Pupil => [93], Attract => [6.47], Names => ['Jack']); # add yet another frog's data
951
$frogs->dump_data(label => 'Pupil'); # prints "59.2 77.7 56.1 83.4 93": all 5 frogs' pupil data
952
953
Now we run another experiment, taking measures of heart-rate, and can add them to the current load of data for analysis:
954
955
$frogs->add(Heartrate => [.70, .50, .44, .67, .66]); # add entire new sequence for all frogs
956
print "heartrate data are bung" if ! $frogs->all_proportions(label => 'Heartrate'); # validity check (could do before add)
957
$frogs->dump_list(); # see all four data-sequences now loaded, each with 5 observations (1 per frog), i.e.:
958
.-------+-----------+----.
959
| index | label | N |
960
+-------+-----------+----+
961
| 0 | Names | 5 |
962
| 1 | Attract | 5 |
963
| 2 | Pupil | 5 |
964
| 3 | Heartrate | 5 |
965
'-------+-----------+----'
966
967
B<2. Using as a base module>
968
969
As L, and so its sub-modules, use this module as their base, it doesn't have to do much data-managing itself:
970
971
use Statistics::Sequences;
972
my $seq = Statistics::Sequences->new();
973
$seq->load(qw/f b f b b/); # using Statistics::Data method
974
say $seq->p_value(stat => 'runs', exact => 1); # using Statistics::Sequences::Runs method
975
976
Or if these data were loaded directly within Statistics::Data, the data can be shared around modules that use it as a base:
977
978
use Statistics::Data;
979
use Statistics::Sequences::Runs;
980
my $dat = Statistics::Data->new();
981
my $runs = Statistics::Sequences::Runs->new();
982
$dat->load(qw/f b f b b/);
983
$runs->pass($dat);
984
say $runs->p_value(exact => 1);
985
986
=head1 DIAGNOSTICS
987
988
=over 4
989
990
=item Don't know how to load/add data
991
992
Croaked when attempting to load or add data with an unsupported data structure where the first argument is a reference. See the examples under L for valid (and invalid) ways of sending data to them.
993
994
=item Data for accessing need to be loaded
995
996
Croaked when calling L, or any methods that use it internally -- viz., L and the validity checks L -- when it is called with a label for data that have not been loaded, or did not load successfully.
997
998
=item Data for unloading need to be loaded
999
1000
Croaked when calling L with an index or a label attribute and the data these refer to have not been loaded, or did not load successfully.
1001
1002
=item There is no path for saving (or loading) data
1003
1004
Croaked when calling L or L without a value for the required B argument, or if it does not exist when it's touched for a load.
1005
1006
=back
1007
1008
=head1 DEPENDENCIES
1009
1010
L - used for its C method when testing loads
1011
1012
L - used for its C method when testing loads
1013
1014
L - used for its C and C methods
1015
1016
L - required for L and L
1017
1018
L - required for L
1019
1020
L - required for L
1021
1022
=head1 BUGS AND LIMITATIONS
1023
1024
Some methods rely on accessing previously loaded data but should permit performing their operations on data submitted directly to them, just like, e.g., $dat->all_numeric(\@data) is ok. This is handled for now internally, but should be handled in the same way by modules using this one as its base - for at the moment they have to check for an aref to their data-manipulating methods ahead of accessing any loaded data by this module.
1025
1026
Please report any bugs or feature requests to C, or through the web interface at L. This will notify the author, and then you'll automatically be notified of progress on your bug as any changes are made.
1027
1028
=head1 SUPPORT
1029
1030
You can find documentation for this module with the perldoc command.
1031
1032
perldoc Statistics::Data
1033
1034
You can also look for information at:
1035
1036
=over 4
1037
1038
=item * RT: CPAN's request tracker
1039
1040
L
1041
1042
=item * AnnoCPAN: Annotated CPAN documentation
1043
1044
L
1045
1046
=item * CPAN Ratings
1047
1048
L
1049
1050
=item * Search CPAN
1051
1052
L
1053
1054
=back
1055
1056
=head1 AUTHOR
1057
1058
Roderick Garton, C<< >>
1059
1060
=head1 LICENSE AND COPYRIGHT
1061
1062
Copyright 2009-2015 Roderick Garton
1063
1064
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published
1065
by the Free Software Foundation; or the Artistic License. See L for more information.
1066
1067
=cut
1068
1069
1; # End of Statistics::Data