File Coverage

blib/lib/Statistics/ANOVA/Compare.pm
Criterion Covered Total %
statement 170 184 92.3
branch 32 58 55.1
condition 6 15 40.0
subroutine 18 19 94.7
pod 1 1 100.0
total 227 277 81.9


line stmt bran cond sub pod time code
1             package Statistics::ANOVA::Compare;
2            
3 3     3   42 use 5.006;
  3         6  
4 3     3   9 use strict;
  3         3  
  3         59  
5 3     3   8 use warnings FATAL => 'all';
  3         3  
  3         110  
6 3     3   12 use base qw(Statistics::Data);
  3         2  
  3         215  
7 3     3   1344 use Algorithm::Combinatorics qw(combinations);
  3         7371  
  3         165  
8 3     3   13 use Carp qw(carp croak);
  3         4  
  3         111  
9 3     3   11 use List::AllUtils qw(sum0);
  3         1  
  3         106  
10 3     3   9 use Math::Cephes qw(:dists);
  3         3  
  3         518  
11 3     3   13 use Statistics::Lite qw(count mean variance);
  3         3  
  3         4693  
12             $Statistics::ANOVA::Compare::VERSION = '0.01';
13            
14             =head1 NAME
15            
16             Statistics::ANOVA::Compare - Comparison procedures for ANOVA
17            
18             =head1 VERSION
19            
20             This is documentation for Version 0.01, released February 2015.
21            
22             The methods here reproduce those previously incorporated as part of L itself.
23            
24             =head1 SYNOPSIS
25            
26             use Statistics::ANOVA::Compare;
27             my $cmp = Statistics::ANOVA::Compare->new();
28             $cmp->load(HOA); # hash of arefs preferably
29             my $href = $cmp->run(parametric => BOOL, independent => BOOL);
30            
31             =head1 SUBROUTINES/METHODS
32            
33             =head2 new
34            
35             $cmp = Statistics::ANOVA::Compare->new();
36            
37             New object for accessing methods and storing results. This "isa" Statistics::Data object.
38            
39             =head2 load, add, unload
40            
41             $cmp->load(1 => [1, 4], 2 => [3, 7]);
42            
43             The given data can now be used by any of the following methods. This is inherited from L, and all its other methods are available here via the class object. Only passing of data as a hash of arrays (HOA) is supported for now. Alternatively, give each of the following methods the HOA for the optional named argument B.
44            
45             =head2 run
46            
47             Performs all possible pairwise comparisons, with Bonferroni control of experiment-wise error-rate. The particular tests depend on whether or not you want parametric (default) or nonparametric tests, and if the observations have been made independently (between groups, the default) or by repeated measures.
48            
49             If C =E 1 (default), it firstly checks if the variances are unequal> (I

E .05) by the L, and runs L. If the variances are equal, runs L. Alternatively, you get unadjusted use of the mean-square error, with no prior test of equality-of-variances, if the parameter C =E 0. On the other hand, force the procedure to use separate variances, as if unequal variances, if C =E 2.

50            
51             If C =E 1, performs non-parametric pairwise comparison. This derives the I-value and associated I

-value for the standardized (a) Wilcoxon (between-groups) sum-of-ranks if C =E 1 (B procedure), or (b) (merely) paired I-tests (TO DO: Friedman-type (within-groups) sum-of-ranks) if C =E 0.

52            
53             Nominality is always assumed; there is no accounting for ordinality of the variables.
54            
55             The I

-value is 2-tailed, by default, unless otherwise specified, as above. If the value of the argument C equals 1, then the probability values themselves will be adjusted according to the number of comparisons, alpha will remain as given or at .05. The correction is:

56            
57             =for html

    p' = 1 – (1 – p)N

58            
59             where I

is the probability returned by the relevant comparison procedure, and I is the number of pairwise comparisons performed.

60            
61             By default, returns a hashref of hashrefs, the outer hash keyed by the pairs compared (as a comma-separated string), each with a hashref with keys named C, C, C, C (= 1 or 0 depending on its being below or greater than/equal to C).
62            
63             Alternatively, if the value of C =E 1, you just get back a referenced array of strings that describe the results, e.g., G1 - G2: t(39) = 2.378, 2p = 0.0224.
64            
65             Give a value of 1 to C to automatically print these strings to STDOUT. (Distinct from earlier versions, there is no default dump to STDOUT of the results.)
66            
67             The output strings are appended with an asterisk if the logical value of the optional attribute C equals 1 and the C is less than the Bonferroni-adjusted alpha level. This alpha level, relative to the given or default alpha of .05, for the number of paired comparisons, is printed at the end of the list.
68            
69             An alternative (actually, legacy from earlier version) is to use I-tests, rather than I-tests, and this you get if the argument C =E 1. The module uses Perl's Statistics I-test modules for this purpose, with no accounting for the variance issue.
70            
71             =cut
72            
73             sub run {
74 5     5 1 9 my ($self, %args) = @_;
75 5         6 foreach (qw/independent parametric nominal adjust_e/)
76             { # 'nominal' an option not yet implemented
77 20 100       44 $args{$_} = 1 if !defined $args{$_};
78             }
79             my (
80 5         6 $data, $s_value, $a3, $a4, $p_value, $cmp_fn, $p_str, $flag, $flag_str, $ms_w,
81             $alpha, @all_pairs, @strings, %res
82             ) = ();
83 5   50     24 $args{'tails'} ||= 2;
84            
85             # Define the data and which routine to use based on values of independent and parametric:
86 5 100       10 if ( $args{'independent'} ) {
87 4         15 $data = $self->get_hoa_by_lab_numonly_indep(%args);
88             croak 'Not enough variables, if any, in the data for performing ANOVA'
89 4 50       488 if scalar( keys( %{$data} ) ) <= 1;
  4         11  
90 4 100       7 if ( $args{'parametric'} ) {
91 3 100       4 if ( !$args{'use_t'} ) {
92 2         8 require Statistics::ANOVA;
93 2         5 my $aov = Statistics::ANOVA->new;
94 2         14 $aov->share($self);
95 2         13 my %res = $aov->obrien(%args);
96 2 50       6 my $eq_var = $res{'p_value'} < .05 ? 0 : 1;
97 2         7 %res = $aov->anova(independent => 1, parametric => 1, ordinal => 0);
98 2         4 $ms_w = $res{'ms_w'};
99 2 100 66     11 if (defined $args{'adjust_e'} and $args{'adjust_e'} == 0) {
100 1         8 $cmp_fn = \&_cmp_indep_param_cat_by_poolederror;
101             }
102             else {
103 1 50 33     5 if (!$eq_var || $args{'adjust_e'}) {
104 1         12 $cmp_fn = \&_cmp_indep_param_cat_by_contrasts;
105             }
106             else {
107 0         0 $cmp_fn = \&_cmp_indep_param_cat_by_poolederror;
108             }
109             }
110             }
111             else { # legacy offer:
112 1         645 require Statistics::TTest;
113 1         28119 my $ttest = Statistics::TTest->new();
114             $cmp_fn = sub {
115 6     6   7 my $data_pairs = shift;
116 6         16 $ttest->load_data( $data_pairs->[0]->[1],
117             $data_pairs->[1]->[1] );
118 6         4485 $p_value = $ttest->{'t_prob'}
119             ; # returns the 2-tailed p_value via Statistics::Distributions
120 6 50       14 $p_value /= 2 if $args{'tails'} == 1;
121 6         19 return ( $ttest->t_statistic, $p_value, $ttest->df );
122 1         17 };
123             }
124             }
125             else {
126             #$cmp_fn = !$args{'ordinal'} ? \&_cmp_indep_dfree_cat : \&_cmp_indep_dfree_ord;
127 1         1 $cmp_fn = \&_cmp_indep_dfree_cat;
128             }
129             }
130             else {
131 1         6 $data = $self->get_hoa_by_lab_numonly_across(%args);
132 1         111 my $n_wt = $self->equal_n( data => $data );
133 1 50 33     17 croak
134             'Number of observations per variable need to be equal and greater than one for repeated measures ANOVA'
135             if !$n_wt or $n_wt == 1;
136 1 50       2 if ( $args{'parametric'} ) {
137 1         2 $cmp_fn = \&_cmp_rmdep_param_cat;
138             }
139             else {
140 0         0 carp
141             'Non-parametric multi-comparison procedure for dependent/repeated measures is not implemented';
142             }
143             }
144            
145             @all_pairs =
146 5         8 combinations( [ keys( %{$data} ) ], 2 ); # Algorithm::Combinatorics fn
  5         23  
147 5   50     341 $alpha = $args{'alpha'} || .05;
148             $alpha /= scalar(@all_pairs)
149 5 50       17 if !$args{'adjust_p'}; # divide by number of comparisons
150            
151             # Compare each pair:
152 5         8 foreach my $pairs (@all_pairs) {
153 27         25 $pairs = [ sort { $a cmp $b } @{$pairs} ];
  27         56  
  27         68  
154             ( $s_value, $p_value, $a3, $a4 ) = $cmp_fn->(
155             [
156             [ $pairs->[0], $data->{ $pairs->[0] } ],
157 27         114 [ $pairs->[1], $data->{ $pairs->[1] } ]
158             ],
159             ms_w => $ms_w,
160             %args,
161             );
162             $p_value = _pcorrect( $p_value, scalar(@all_pairs) )
163 27 50       145 if $args{'adjust_p'};
164 27         59 $a3 = _precisioned( $args{'precision_s'}, $a3 ); # degrees-of-freedom
165 27         61 $s_value = _precisioned( $args{'precision_s'}, $s_value );
166 27         44 $p_value = _precisioned( $args{'precision_p'}, $p_value );
167 27 50       45 $p_str = $args{'tails'} == 1 ? '1p' : '2p';
168 27 100       35 $flag = $p_value < $alpha ? 1 : 0;
169 27 0       37 $flag_str = $args{'flag'} ? $flag ? ' *' : '' : '';
    50          
170            
171 27 100       34 if ( $args{'parametric'} ) {
172 21         104 $res{"$pairs->[0],$pairs->[1]"} = {
173             t_value => $s_value,
174             p_value => $p_value,
175             df => $a3,
176             flag => $flag
177             };
178 21         135 push @strings,
179             "($pairs->[0] - $pairs->[1]), t($a3) = $s_value, $p_str = $p_value"
180             . $flag_str;
181             }
182             else {
183 6         19 $res{"$pairs->[0],$pairs->[1]"} = {
184             z_value => $s_value,
185             p_value => $p_value,
186             s_value => $a3,
187             flag => $flag
188             };
189 6         41 push @strings,
190             "($pairs->[0] - $pairs->[1]), Z(W) = $s_value, $p_str = $p_value"
191             . $flag_str;
192             }
193             } # end loop
194            
195 5 50       15 if ( $args{'dump'} ) {
196 0         0 print "$_\n" foreach @strings;
197 0         0 print "Alpha = $alpha\n";
198             }
199            
200 5 50       94 return $args{'str'} ? \@strings : \%res;
201             }
202            
203             =head2 indep_param_by_contrasts
204            
205             TO DO: use run() for now
206            
207             Performs parametric pairwise comparison by I-tests on each possible pair of observations, with respect to the value of C. This assumes that the variances are unequal, and uses the variance of each sample in the pair in the error-term of the I-value, and the denominator degrees-of-freedom is adjusted accordingly.
208            
209             =head2 indep_param_by_mse
210            
211             TO DO: use run() for now
212            
213             Performs parametric pairwise comparison by I-tests on each possible pair of observations, with respect to the value of C. This assumes that the variances are equal, so that the mean-square error ($aov-E{'ms_w'}) is used in the error-term of the I-value.
214            
215             =cut
216            
217             sub _cmp_indep_param_cat_by_contrasts {
218 6     6   11 my ( $data_pairs, %args ) = @_;
219             my @nn = (
220 6         13 count( @{ $data_pairs->[0]->[1] } ),
221 6         6 count( @{ $data_pairs->[1]->[1] } )
  6         36  
222             );
223 6         10 my @uu = ( mean( @{ $data_pairs->[0]->[1] } ),
224 6         30 mean( @{ $data_pairs->[1]->[1] } ) );
  6         100  
225 6         96 my @cc = ( 1, -1 )
226             ; # contrast coefficients for pairwise comparison of means (ui - uj = 0)
227 6         6 my ( $f_value, $df_w ) = ();
228 6         2 my $lu = sum0( map { $cc[$_] * $uu[$_] } ( 0, 1 ) )
  12         21  
229             ; # estimate of linear combo of means; s/= zero if no diff.
230 6         5 my $ss_cc = sum0( map { $cc[$_]**2 / $nn[$_] } ( 0, 1 ) )
  12         15  
231             ; # weighted sum of squared contrast coefficients
232            
233 6         9 ( $f_value, $df_w ) =
234             _var_by_contrasts( $data_pairs, $lu, $ss_cc, \@nn, \@cc );
235 6         19 my $p_value =
236             fdtrc( 1, $df_w, $f_value ); # s/be compared to alpha/nContrasts
237 6         15 return ( $f_value, $p_value, 1, $df_w );
238             }
239            
240             sub _cmp_indep_param_cat_by_poolederror {
241 6     6   11 my ( $data_pairs, %args ) = @_;
242             my @nn = (
243 6         25 count( @{ $data_pairs->[0]->[1] } ),
244 6         8 count( @{ $data_pairs->[1]->[1] } )
  6         38  
245             );
246 6         13 my @uu = ( mean( @{ $data_pairs->[0]->[1] } ),
247 6         28 mean( @{ $data_pairs->[1]->[1] } ) );
  6         101  
248 6         95 my @cc = ( 1, -1 )
249             ; # contrast coefficients for pairwise comparison of means (ui - uj = 0)
250 6         7 my $lu = sum0( map { $cc[$_] * $uu[$_] } ( 0, 1 ) )
  12         33  
251             ; # estimate of linear combo of means; s/= zero if no diff.
252 6         10 my $ss_cc = sum0( map { $cc[$_]**2 / $nn[$_] } ( 0, 1 ) )
  12         18  
253             ; # weighted sum of squared contrast coefficients
254             # use the pooled error-term (MSe):
255 6         9 my $f_value = $lu**2 / ( $ss_cc * $args{'ms_w'} )
256             ; # Maxwell & Delaney Eq 4.37 (p. 176); assumes equal variances
257             ##$f_value = ( $nn[0] * $nn[1] * ( $uu[0] - $uu[1])**2 ) / ( ( $nn[0] + $nn[1] ) * $self->{'_stat'}->{'ms_w'} );
258 6         6 my $df_w = ( $nn[0] + $nn[1] - 2 );
259 6         20 my $p_value =
260             fdtrc( 1, $df_w, $f_value ); # s/be compared to alpha/nContrasts
261 6         14 return ( $f_value, $p_value, 1, $df_w );
262             }
263            
264             sub _var_by_contrasts { # permits unequal variances; sep. error terms/contrast
265 6     6   4 my ( $data_pairs, $lu, $ss_cc, $nn, $cc ) = @_;
266            
267             #$cc ||= [1, -1];
268 6         6 my $ss_u = $lu**2 / $ss_cc;
269             my @vv = (
270 6         9 variance( @{ $data_pairs->[0]->[1] } ),
271 6         5 variance( @{ $data_pairs->[1]->[1] } )
  6         202  
272             );
273 6         196 my $den_a = sum0( map { ( $cc->[$_]**2 / $nn->[$_] ) * $vv[$_] } ( 0, 1 ) );
  12         19  
274 6         6 my $denom = $den_a / $ss_cc;
275 6         5 my $f_value = $ss_u / $denom; # Maxwell & Delaney Eq. 5.10 (p. 180)
276 6         5 my $df_n = sum0( map { ( $cc->[$_]**2 * $vv[$_] / $nn->[$_] ) } ( 0, 1 ) );
  12         25  
277             my $df_d = sum0(
278 6         6 map { ( $cc->[$_]**2 * $vv[$_] / $nn->[$_] )**2 / ( $nn->[$_] - 1 ) }
  12         18  
279             ( 0, 1 ) );
280 6         5 my $df_w = $df_n**2 / $df_d;
281 6         11 return ( $f_value, $df_w );
282             }
283            
284             sub _cmp_rmdep_param_cat { # not completed - only use Dependant t-test for now:
285 3     3   9 my ( $data_pairs, %args ) = @_;
286 3         179730 require Statistics::DependantTTest;
287 3         24800 my $ttest = Statistics::DependantTTest->new();
288 3         24 $ttest->load_data( $data_pairs->[0]->[0], @{ $data_pairs->[0]->[1] } );
  3         10  
289 3         733 $ttest->load_data( $data_pairs->[1]->[0], @{ $data_pairs->[1]->[1] } );
  3         9  
290 3         610 my ( $s_value, $df ) =
291             $ttest->perform_t_test( $data_pairs->[0]->[0], $data_pairs->[1]->[0] );
292 3         737 my $p_value =
293             stdtr( $df, -1 * abs($s_value) ); # Math::Cephes - left 1-tailed p_value
294 3 50       14 $p_value *= 2 unless $args{'tails'} == 1;
295 3         30 return ( $s_value, $p_value, $df );
296             }
297            
298             sub _cmp_indep_dfree_cat { # Dwass-Steel procedure
299 6     6   10 my ( $data_pairs, %args ) = @_;
300             my ( $n1, $n2 ) = (
301 6         13 count( @{ $data_pairs->[0]->[1] } ),
302 6         6 count( @{ $data_pairs->[1]->[1] } )
  6         54  
303             ); # Ns/variable
304 6         46 my $nm =
305             $data_pairs->[1]->[0]; # arbitrarily use second variable as reference data
306 6         21 require Statistics::Data::Rank;
307 6         7 my ( $ranks_href, $xtied, $gn, $ties_var ) =
308             Statistics::Data::Rank->ranks_between( data => _aref2href($data_pairs) )
309             ; # get joint ranks
310 6         1744 my $sum = sum0( @{ $ranks_href->{$nm} } )
  6         19  
311             ; # calc. sum-of-ranks for (arbitrarily) second member of pair
312 6         9 my $exp = ( $n2 * ( $gn + 1 ) ) / 2; # expected value
313 6         5 my $tie = sum0( map { ( $_ - 1 ) * $_ * ( $_ + 1 ) } @{$xtied} );
  78         80  
  6         6  
314 6         12 my $var =
315             ( ( $n1 * $n2 ) / 24 ) *
316             ( ( $gn + 1 - ( $tie / ( ($gn) * ( $gn - 1 ) ) ) ) ); # variance
317 6         7 my $z_value = ( $sum - $exp ) / sqrt($var); # standardized sum-of-ranks
318 6         38 my $p_value = ( 1 - ndtr( abs($z_value) ) ); # Math::Cephes fn
319 6 50       14 $p_value *= 2 unless $args{'tails'} == 1;
320 6         24 return ( $z_value, $p_value, $sum );
321             }
322            
323             sub _aref2href {
324 6     6   5 my $aref = shift;
325 6         5 my %hash = ();
326 6 50       10 $aref = [$aref] if !ref $aref->[0];
327 6         10 foreach ( @{$aref} ) {
  6         6  
328 12 50       17 if ( ref $_->[1] ) {
329 12         15 $hash{ $_->[0] } = $_->[1];
330             }
331             else {
332 0         0 my $name = shift( @{$_} );
  0         0  
333 0         0 $hash{$name} = [ @{$_} ];
  0         0  
334             }
335             }
336 6         14 return \%hash;
337             }
338            
339             sub _is_significant {
340 0     0   0 my ( $pvalue, $alpha, $exp_tail ) = @_;
341 0   0     0 $exp_tail ||= 2;
342 0 0       0 if ( ref $alpha ) {
343            
344             # Assume aref:
345 0 0       0 if ( $pvalue > $alpha->[0] ) {
346 0 0       0 if ($exp_tail) {
347            
348             }
349             }
350             }
351 0         0 return;
352             }
353            
354             sub _precisioned {
355 81 50   81   161 return $_[0]
    50          
356             ? sprintf( '%.' . $_[0] . 'f', $_[1] )
357             : ( defined $_[1] ? $_[1] : q{} ); # don't lose any zero
358             }
359            
360             =head1 AUTHOR
361            
362             Roderick Garton, C<< >>
363            
364             =head1 BUGS AND LIMITATIONS
365            
366             Please report any bugs or feature requests to C, or through
367             the web interface at L. I will be notified, and then you'll
368             automatically be notified of progress on your bug as I make changes.
369            
370             =head1 SUPPORT
371            
372             You can find documentation for this module with the perldoc command.
373            
374             perldoc Statistics::ANOVA::Compare
375            
376            
377             You can also look for information at:
378            
379             =over 4
380            
381             =item * RT: CPAN's request tracker (report bugs here)
382            
383             L
384            
385             =item * AnnoCPAN: Annotated CPAN documentation
386            
387             L
388            
389             =item * CPAN Ratings
390            
391             L
392            
393             =item * Search CPAN
394            
395             L
396            
397             =back
398            
399             =head1 LICENSE AND COPYRIGHT
400            
401             Copyright 2015 Roderick Garton.
402            
403             This program is free software; you can redistribute it and/or modify it
404             under the terms of either: the GNU General Public License as published
405             by the Free Software Foundation; or the Artistic License.
406            
407             See L for more information.
408            
409            
410             =cut
411            
412             1; # End of Statistics::ANOVA::Compare