File Coverage

blib/lib/Statistics/Sequences.pm

Criterion	Covered	Total	%
statement	24	123	19.5
branch	0	68	0.0
condition	0	7	0.0
subroutine	8	19	42.1
pod	9	9	100.0
total	41	226	18.1
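
The summary figures appear to be plain ratios. The minimal Perl check below assumes that each percentage is Covered/Total for its criterion, rounded to one decimal place. It also assumes that the total row pools the counts rather than averaging the per-criterion percentages. The counts are copied from the table above, and the `pct` helper is hypothetical.

```perl
use strict;
use warnings;

# Covered and Total counts for each criterion, copied from the summary table.
my @rows = (
    [ statement  => 24, 123 ],
    [ branch     => 0,  68 ],
    [ condition  => 0,  7 ],
    [ subroutine => 8,  19 ],
    [ pod        => 9,  9 ],
);

# Hypothetical helper: percentage covered, to one decimal place as in the report.
sub pct {
    my ( $covered, $total ) = @_;
    return $total ? sprintf( '%.1f', 100 * $covered / $total ) : 'n/a';
}

my ( $sum_covered, $sum_total ) = ( 0, 0 );
for my $row (@rows) {
    my ( $name, $covered, $total ) = @{$row};
    $sum_covered += $covered;
    $sum_total   += $total;
    printf "%-10s %4d %4d %6s\n", $name, $covered, $total, pct( $covered, $total );
}

# The total row pools the counts; it is not the mean of the percentages above.
printf "%-10s %4d %4d %6s\n", 'total', $sum_covered, $sum_total,
  pct( $sum_covered, $sum_total );
```

Running it prints 19.5, 0.0, 0.0, 42.1, 100.0 and 18.1, which matches the report. Averaging the five percentages would give a different figure.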

line	stmt	bran	cond	sub	pod	time	code
1							package Statistics::Sequences;
2	2			2		26669	use strict;
	2					4
	2					51
3	2			2		6	use warnings FATAL => 'all';
	2					2
	2					71
4	2			2		7	use Carp qw(croak cluck);
	2					8
	2					125
5	2			2		969	use Statistics::Data 0.09;
	2					41296
	2					59
6	2			2		14	use base qw(Statistics::Data);
	2					4
	2					149
7	2			2		8	use Scalar::Util qw(looks_like_number);
	2					2
	2					545
8							$Statistics::Sequences::VERSION = '0.14';
9
10							=pod
11
12							=head1 NAME
13
14							Statistics::Sequences - Manage sequences (ordered list of literals) for testing their runs, joins, turns, trinomes, potential energy, etc.
15
16							=head1 VERSION
17
18							This is documentation for Version 0.14 of Statistics::Sequences.
19
20							=head1 SYNOPSIS
21
22							use Statistics::Sequences 0.14;
23							my $seq = Statistics::Sequences->new();
24							my @data = (1, 'a', 'a', 1); # ordered list of literal scalars (numbers, strings), as permitted by specific test
25							$seq->load(\@data); # or @data or dataname => \@data
26							print $seq->observed(stat => 'runs'); # or expected, variance, z_value, p_value; assumes sub-module Runs.pm is installed
27							print $seq->test(stat => 'vnomes', length => 2); # assumes sub-module Vnomes.pm is installed
28							$seq->dump(stat => 'runs', values => {observed => 1, z_value => 1, p_value => 1}, exact => 1, tails => 1);
29							# see also Statistics::Data for inherited methods
30
31							=head1 DESCRIPTION
32
33							Loading, updating and accessing data as an ordered list of literal scalars (numbers, strings) for statistical tests of their sequential structure via L<Statistics::Sequences::Joins>, L<Statistics::Sequences::Pot>, L<Statistics::Sequences::Runs>, L<Statistics::Sequences::Turns> and L<Statistics::Sequences::Vnomes>. Note that none of these sub-modules is installed by default; to use this module as intended, install one or more of them.
34
35							To access the tests, C<use> this base module to create a Statistics::Sequences object with C<new>, then C<load> data into it and access each test by calling the C<test> method, specifying the B<stat> attribute: joins, pot, runs, turns or vnomes, where the relevant sub-module is installed. This allows several tests to be run on the same data, as the data are immediately available to each test. See the L</SYNOPSIS> for a simple example.
36
37							Alternatively, C<use> each sub-module directly and restrict analyses to that sub-module's test; this module is then used implicitly as its base. That is, to perform a test of one type (e.g., runs), C<use> the relevant sub-package and load data via its constructor; see the SYNOPSIS of the particular test, i.e., L<Statistics::Sequences::Joins>, L<Statistics::Sequences::Pot>, L<Statistics::Sequences::Runs>, L<Statistics::Sequences::Turns> or L<Statistics::Sequences::Vnomes>. You won't be able to access other tests of the same data by this approach, unless you create another object for that test and then explicitly pass the data from the earlier object into the new one.
38
39							=head1 SUBROUTINES/METHODS
40
41							=head2 new
42
43							$seq = Statistics::Sequences->new();
44
45							Returns a new Statistics::Sequences object (inherited from L<Statistics::Data>) through which all the methods for caching, reading and testing data can be accessed, including each of the methods for performing the joins, pot, runs, turns or vnomes tests.
46
47							Sub-packages also have their own C<new> method, so each can be used on its own; for example, L<Statistics::Sequences::Runs> can be imported individually and its C<new> method called:
48
49							use Statistics::Sequences::Runs;
50							$runs = Statistics::Sequences::Runs->new();
51
52							In this case, data are not automatically shared across packages, and only one test (here, the Runs-test) can be accessed through the object returned by C<new>.
53
54							=head2 load, add, access, unload
55
56							All these operations on the basic data are inherited from L<Statistics::Data>; see its documentation for details of these and other methods.
57
58							B: Both the runs- and joins-tests expect dichotomous data: a binary (binomial or Bernoulli) sequence, with any two characters symbolizing the two possible events. Both tests check their loaded data to make sure it is dichotomous. To reduce numerical and categorical data to a dichotomous level, see the L, L, L, L, L and other methods in L.
59
60							=head2 observed, observation
61
62							$v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
63							$v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # plus any args needed by the particular stat
64							$v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', label => 'myLabelledLoadedData'); # plus any args needed by the particular stat
65
66							Returns the observed value of the statistic for the loaded data, or for data sent with this call, e.g., how many runs there are in the sequence (1, 1, 0, 1). See the particular statistic's manpage for any other required or optional arguments.
67
68							=cut
69
70	0			0	1		sub observed { return _feed( 'observed', @_ ); }
71							*observation = \&observed;
72
73							=head2 expected, expectation
74
75							$v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
76							$v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # plus any args needed by the particular stat
77
78							Returns the expected value of the statistic for the loaded data, or for data sent with this call, e.g., how many runs should occur in a 4-length sequence of two possible events. See the statistic's manpage for any other required or optional arguments.
79
80							=cut
81
82	0			0	1		sub expected { return _feed( 'expected', @_ ); }
83							*expectation = \&expected;
84
85							=head2 variance
86
87							$seq->variance(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
88							$seq->variance(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # plus any args needed by the particular stat
89
90							Returns the variance of the statistic, i.e., the expected squared deviation of its observed value from its expected value, for the given number of trials.
91
92							=cut
93
94	0			0	1		sub variance { return _feed( 'variance', @_ ); }
95
96							=head2 obsdev, observed_deviation
97
98							$v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
99							$v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # plus any args needed by the particular stat
100
101							Returns the deviation of (difference between) the observed and expected values of the statistic for the loaded/given sequence (I<observed> - I<expected>).
102
103							=cut
104
105							sub obsdev {
106	0			0	1		return observed(@_) - expected(@_);
107							}
108							*observed_deviation = \&obsdev;
109
110							=head2 stdev, standard_deviation
111
112							$v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
113							$v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # plus any args needed by the particular stat
114
115							Returns the square root of the variance.
116
117							=cut
118
119							sub stdev {
120	0			0	1		return sqrt variance(@_);
121							}
122							*standard_deviation = \&stdev;
123
124							=head2 z_value, zscore
125
126							$v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
127							$v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # plus any args needed by the particular stat
128
129							Returns the deviation ratio: observed deviation to standard deviation. Use argument B<ccorr> for continuity correction.
130
131							=cut
132
133	0			0	1		sub zscore { return _feed( 'zscore', @_ ); }
134							*z_value = \&zscore;
135
136							=head2 p_value, test
137
138							$p = $seq->test(stat => 'runs');
139							$p = $seq->test(stat => 'joins');
140							$p = $seq->test(stat => 'turns');
141							$p = $seq->test(stat => 'pot', state => 'a value appearing in the data');
142							$p = $seq->test(stat => 'vnomes', length => 'an integer greater than zero and less than sample-size');
143
144							Returns the probability of observing so many runs, joins, etc., versus those expected, relative to the expected variance.
145
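146							When using a Statistics::Sequences class-object, this method requires naming which test to perform, i.e., joins, pot, runs, turns or vnomes. This is I<not> required when the class-object already refers to one of the sub-modules, as created by the C<new> method within L<Statistics::Sequences::Joins>, L<Statistics::Sequences::Pot>, L<Statistics::Sequences::Runs>, L<Statistics::Sequences::Turns> or L<Statistics::Sequences::Vnomes>.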
147
148							=head3 Common options
149
150							Options common to all the sub-package tests are as follows.
151
152							=over 8
153
154							=item data => 'I'
155
156							Optionally specify the name of the data to be tested. By default, this is not required: the data tested are those that were last loaded, either anonymously, or as returned by one of the L methods. Otherwise, I, data that were previously loaded by name can be individually tested. For example, here are two sets of data that are loaded by name, and then a single test of one of them is performed.
157
158							@chimps = (qw/banana banana cheese banana cheese banana banana banana/);
159							@mice = (qw/banana cheese cheese cheese cheese cheese cheese cheese/);
160							$seq->load(chimps => \@chimps, mice => \@mice);
161							$p = $seq->test(stat => 'runs', data => 'chimps');
162
163							=item ccorr => I
164
165							Specify whether or not to perform the continuity correction on the observed deviation. Default is false. Relevant only for tests relying on a I<z>-test. See L.
166
167							=item tails => I<1>|I<2>
168
169							Specify whether the I<p>-value is calculated for both sides of the normal (or chi-square) distribution (2, the default for most tested data) or only one side (the default for data prepared with the B method).
170
171							=back
172
173							=head3 Test-specific required settings and options
174
175							Some sub-package tests need to have parameters defined in the call to C<test>, and/or have specific options, as follows.
176
177							B<Joins>: The Joins test allows the setting of a probability value; see C in the L<Statistics::Sequences::Joins> manpage.
178
179							B<Pot>: The Pot test I<requires> the setting of a state to be tested; see C in the L<Statistics::Sequences::Pot> manpage.
180
181							B<Vnomes>: The Serial test for v-nomes requires a length, i.e., the value of I<v>; see C in the L<Statistics::Sequences::Vnomes> manpage.
182
183							B<Runs>, B<Turns>: There are presently no specific requirements or options for the Runs- and Turns-tests.
184
185							=cut
186
187	0			0	1		sub p_value { return _feed( 'p_value', @_ ); }
188							*test = \&p_value;
189
190							=head2 stats_hash
191
192							$href = $seq->stats_hash(stat => 'runs', values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1});
193
194							Returns a hashref with values for any of the descriptives and the probability value relevant to the specified statistic (B<stat>). If no values are requested, the observed value and p_value are returned. Include any other required or optional arguments relevant to the values requested, e.g., B<ccorr> if getting a z_value, B and B if getting a p_value, B<state> if testing pot, B if testing joins, ... B, B ...
195
196							=cut
197
198							sub stats_hash {
199	0			0	1		my $self = shift;
200	0	0					my $args = ref $_[0] ? $_[0] : {@_};
201	0						my @methods = keys %{ $args->{'values'} };
	0
202	0						my (%stats_hash) = ();
203	2			2		9	no strict 'refs';
	2					2
	2					2131
204	0						foreach my $method (@methods) {
205	0	0					if ( $args->{'values'}->{$method} == 1 ) {
206	0						eval { $stats_hash{$method} = $self->$method($args); };
	0
207	0	0					croak "Method $method is not defined or correctly called for "
208							. __PACKAGE__
209							if $@;
210							}
211							}
212	0	0					if ( !scalar keys %stats_hash ) { # get default stats:
213	0						foreach my $method (qw/observed p_value/) {
214	0						eval { $stats_hash{$method} = $self->$method($args); };
	0
215	0	0					croak "Method $method is not defined or correctly called for "
216							. __PACKAGE__
217							if $@;
218							}
219							}
220	0						return \%stats_hash;
221							}
222
223							=head2 dump
224
225							$seq->dump(stat => 'runs\|joins\|pot ...', values => {}, format => 'string\|table', flag => '1\|0', precision_s => 'integer', precision_p => 'integer');
226
227							I: B
228
229							Print results of the last-conducted test to STDOUT. By default, if no parameters to C are passed, a single line of test statistics is printed. Options are as follows.
230
231							=over 8
232
233							=item values => hashref
234
235							Hashref of the statistical parameters to dump. Default is observed value and p-value for the given B.
236
237							=item flag => I
238
239							If true, the I -value associated with the I-value is appended with a single asterisk if the value if below .05, and with two asterisks if it is below .01.
240
241							If false (default), nothing is appended to the I -value.
242
243							=item format => 'table\|labline\|csv'
244
245							Default is 'csv', to print the stats hash as a comma-separated string (no newline), e.g., '4.0000,0.8596800". If specifying 'labline', you get something like "observed = 4.0000, p_value = 0.8596800\n". If specifying "table", this is a dump from L with the stat methods as headers and column length set to the maximum required for the given headers, level of precision, flag, etc. For example, with B => 4 and B => 7, you get:
246
247							.-----------+-----------.
248							\| observed \| p_value \|
249							+-----------+-----------+
250							\| 4.0000 \| 0.8596800 \|
251							'-----------+-----------'
252
253							=item verbose => 1\|0
254
255							If true, includes a title giving the name of the statistic, details about the hypothesis tested (if B => 1 in the B hashref), et al. No effect if B is not defined or equals 'csv'.
256
257							=item precision_s => 'I'
258
259							Precision of the statistic values (observed, expected, variance, z_value).
260
261							=item precision_p => 'I'
262
263							Specify rounding of the probability associated with the I-value to so many digits. If zero or undefined, you get everything available.
264
265							=back
266
267							=cut
268
269							sub dump {
270	0			0	1		my $self = shift;
271	0	0					my $args = ref $_[0] ? $_[0] : {@_};
272	0						my $stats_hash = $self->stats_hash($args);
273	0		0				$args->{'format'} \|\|= 'csv';
274	0						my @standard_methods =
275							(qw/observed expected variance obsdev stdev z_value p_value/);
276	0						my ( $maxlen, @strs, @headers, @wanted_methods ) = (0);
277	0						foreach my $method (@standard_methods)
278							{ # set up what has been requested in a meaningful order:
279	0	0					push( @wanted_methods, $method ) if defined $stats_hash->{$method};
280							}
281	0						foreach my $method ( keys %{$stats_hash} )
	0
282							{ # add any extra "non-standard" methods
283	0	0					push( @wanted_methods, $method ) if !grep /$method/, @wanted_methods;
284							}
285	0						foreach my $method (@wanted_methods) {
286	0						my $val = delete $stats_hash->{$method};
287	0						my $len;
288	0	0					if ( $method eq 'p_value' ) {
289	0						$val = _precisioned( $args->{'precision_p'}, $val );
290							$val .= ( $val < .05 ? ( $val < .01 ? q{*} : q{} ) : q{} )
291	0	0					if $args->{'flag'};
		0
		0
292							}
293							else {
294	0	0					if ( ref $val ) {
		0
295	0	0					if ( ref $val eq 'HASH' ) {
296	0						my %vals = %{$val};
	0
297	0						$val = q{};
298	0	0					my $delim = $args->{'format'} eq 'table' ? "\n" : q{,};
299	0						my ( $str, $this_len ) = (q{});
300	0						while ( my ( $k, $v ) = each %vals ) {
301	0						$str = "'$k' = $v";
302	0						$this_len = length($str);
303	0	0	0				$len = $this_len
304							if not defined $len or $this_len > $len;
305	0						$val .= $str;
306	0						$val .= $delim;
307							}
308	0	0					if ( $args->{'format'} ne 'table' ) {
309	0						chop $val;
310	0						$val = '(' . $val . ')';
311							}
312							}
313							else {
314	0						$val = join q{, }, @{$val};
	0
315							}
316							}
317							elsif ( looks_like_number($val) ) {
318	0						$val = _precisioned( $args->{'precision_s'}, $val );
319							}
320							}
321	0						push @headers, $method;
322	0						push( @strs, $val );
323	0	0					$len = length $val if !defined $len;
324	0	0					$maxlen = $len if $len > $maxlen;
325							}
326	0	0					if ( $args->{'format'} eq 'table' ) {
		0
327	0	0					$maxlen = 8 if $maxlen < 8;
328							my $title =
329							$args->{'verbose'}
330	0	0					? ucfirst( $args->{'stat'} ) . " statistics\n"
331							: q{};
332	0	0					print $title or croak 'Cannot print title for data-table';
333	0						my @hh = ();
334	0						push( @hh, [ $maxlen, $_ ] ) foreach @headers;
335	0						require Text::SimpleTable;
336	0						my $tbl = Text::SimpleTable->new(@hh);
337	0						$tbl->row(@strs);
338	0	0					print $tbl->draw or croak 'Cannot print data-table';
339							}
340							elsif ( $args->{'format'} eq 'labline' ) {
341	0						my @hh;
342	0						for ( my $i = 0 ; $i <= $#strs ; $i++ ) {
343	0						$hh[$i] = "$headers[$i] = $strs[$i]";
344							}
345	0						my $str = join( q{, }, @hh );
346	0	0					if ( $args->{'verbose'} ) {
347	0						$str = ucfirst( $args->{'stat'} ) . ': ' . $str;
348							}
349	0	0					print {*STDOUT} $str, "\n" or croak 'Cannot print data-string';
	0
350							}
351							else { # csv
352	0	0					print join( q{,}, @strs ) or croak 'Cannot print data-string';
353							}
354	0						return;
355							}
356							*print_summary = \&dump;
357
358							=head2 dump_data
359
360							$seq->dump_data(delim => "\n");
361
362							Prints to STDOUT a space-separated line of the tested data - as dichotomized and put to test. Optionally, give a value for B to specify how the datapoints should be separated. Inherited from L.
363
364							=cut
365
366							# PRIVATMETHODEN
367
368							sub _feed {
369	0			0			my $method = shift;
370	0						my $self = shift;
371	0	0					my $args = ref $_[0] ? $_[0] : {@_};
372	0		0				my $statname = $args->{'stat'} \|\| q{};
373	0						my $class = __PACKAGE__ . q{::} . ucfirst($statname);
374	0						eval "require $class";
375	0	0					croak __PACKAGE__,
376							" error: Requested sequences module '$class' is not valid/available. You might need to install '$class'"
377							if $@;
378	0						my ( $val, $nself ) = ( q{}, {} );
379
380							#my $nself = {};
381	0						bless( $nself, $class ); #$nself = $class->new();
382	0						$nself->{$_} = $self->{$_} foreach keys %{$self};
	0
383	2			2		8	no strict 'refs';
	2					2
	2					232
384	0						eval '$val = $nself->$method($args)'
385							; # but does not trap "deep recursion" if method not defined
386	0	0					croak __PACKAGE__, " error: Method '$method' is not defined for $class"
387							if $@;
388	0						$self->{'stat'} = $statname;
389	0						return $val;
390							}
391
392							sub _precisioned {
393	0	0		0			return $_[0]
		0
394							? sprintf( q{%.} . $_[0] . 'f', $_[1] )
395							: ( defined $_[1] ? $_[1] : q{} ); # don't lose any zero
396							}
397
398							=head1 BUNDLING
399
400							This module Cs its sub-modules implicitly - so a bundled program using this module might need to explicitly C its sub-modules if these need to be included in the bundle itself.
401
402							=head1 AUTHOR
403
404							Roderick Garton, C<< >>
405
406							=head1 SUPPORT
407
408							You can find documentation for this module with the perldoc command.
409
410							perldoc Statistics::Sequences
411
412							You can also look for information at:
413
414							=over 4
415
416							=item * RT: CPAN's request tracker (report bugs here)
417
418							L
419
420							=item * AnnoCPAN: Annotated CPAN documentation
421
422							L
423
424							=item * CPAN Ratings
425
426							L
427
428							=item * Search CPAN
429
430							L
431
432							=back
433
434							=head1 LICENSE AND COPYRIGHT
435
436							=over 4
437
438							=item Copyright (c) 2006-2016 Roderick Garton
439
440							This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see L).
441
442							=item Disclaimer
443
444							To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
445
446							=back
447
448							=cut
449
450							1; # end of Statistics::Sequences
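
The formatting rules described for C<dump> cover precision, the asterisk flag, and the 'csv' and 'labline' layouts. They can be illustrated with a small standalone Perl sketch that does not load Statistics::Sequences. The helper names `precisioned`, `flag_p` and `format_stats` are hypothetical. `precisioned` only mirrors the private `_precisioned` routine in the listing above. The flag thresholds follow the POD for the `flag` option: one asterisk below .05 and two below .01. The sketch assumes every statistic value is numeric.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring _precisioned(): round to $digits decimal
# places, or return the value unchanged if $digits is 0 or undef.
sub precisioned {
    my ( $digits, $value ) = @_;
    return $digits
      ? sprintf( '%.' . $digits . 'f', $value )
      : ( defined $value ? $value : q{} );
}

# Hypothetical helper for the documented flag rule:
# '**' if p < .01, '*' if p < .05, nothing otherwise.
sub flag_p {
    my ($p) = @_;
    return $p < .01 ? q{**} : $p < .05 ? q{*} : q{};
}

# Hypothetical formatter: returns the 'csv' and 'labline' strings for a
# hashref of numeric statistics, ordered like dump()'s standard methods.
sub format_stats {
    my ( $stats, %opt ) = @_;
    my @names = grep { defined $stats->{$_} }
      qw/observed expected variance obsdev stdev z_value p_value/;
    my @strs;
    for my $name (@names) {
        my $val = $stats->{$name};
        if ( $name eq 'p_value' ) {
            $val = precisioned( $opt{precision_p}, $val );
            $val .= flag_p($val) if $opt{flag};    # flag the rounded p-value
        }
        else {
            $val = precisioned( $opt{precision_s}, $val );
        }
        push @strs, $val;
    }
    my $csv     = join q{,}, @strs;
    my $labline = join q{, }, map { "$names[$_] = $strs[$_]" } 0 .. $#strs;
    return ( $csv, $labline );
}

my ( $csv, $labline ) = format_stats(
    { observed => 4, p_value => 0.85968 },
    precision_s => 4,
    precision_p => 7,
);
print "$csv\n$labline\n";
```

With the values from the POD example this prints "4.0000,0.8596800" and "observed = 4.0000, p_value = 0.8596800". With `flag => 1` and `precision_p => 3`, a p-value of 0.004 would be rendered as "0.004**".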