File Coverage

blib/lib/Text/NSP.pm

Criterion	Covered	Total	%
statement	9	9	100.0
branch			n/a
condition			n/a
subroutine	3	3	100.0
pod			n/a
total	12	12	100.0

line	stmt	sub	time	code
1				=head1 NAME
2
3				Text::NSP - Extract collocations and Ngrams from text
4
5				=head1 SYNOPSIS
6
7				=head2 Basic Usage
8
9				use Text::NSP::Measures::2D::MI::ll;
10
11				my $npp = 60; my $n1p = 20; my $np1 = 20; my $n11 = 10;
12
13				my $ll_value = calculateStatistic( n11=>$n11,
14				n1p=>$n1p,
15				np1=>$np1,
16				npp=>$npp);
17
18				if( (my $errorCode = getErrorCode()))
19				{
20				print STDERR $errorCode." - ".getErrorMessage()."\n";
21				}
22				else
23				{
24				print getStatisticName()." value for bigram is ".$ll_value."\n";
25				}
26
27				=head1 DESCRIPTION
28
29				The Ngram Statistics Package (NSP) is a collection of perl modules
30				that aid in analyzing Ngrams in text files. We define an Ngram as a
31				sequence of 'n' tokens that occur within a window of at least 'n'
32				tokens in the text; what constitutes a "token" can be defined by the
33				user.
34
35				NSP.pm is a stub that doesn't have any real functionality. It serves
36				as a top level module in the hierarchy and allows us to group the
37				Text::NSP::Count and Text::NSP::Measures modules.
38
39				The modules under Text::NSP::Measures implement measures of
40				association that are used to evaluate whether the co-occurrence of the
41				words in an Ngram is purely by chance or statistically significant.
42				These measures compute a numerical score for each Ngram. This score can
43				be used to decide whether or not there is enough evidence to reject the
44				null hypothesis (that the words of the Ngram co-occur only by chance)
45				for that Ngram.
46
47				To use one of the measures you can either use the program statistic.pl
48				provided under the utils directory, or write your own driver program.
49				Program statistic.pl takes as input a list of Ngrams with their
50				frequencies (in the format output by count.pl) and runs a
51				user-selected statistical measure of association to compute the score
52				for each Ngram. The Ngrams, along with their scores, are output in
53				descending order of this score. For help on using utils/statistic.pl
54				please refer to its perldoc (perldoc utils/statistic.pl).
55
56				If you are writing your own driver program, a basic usage example is
57				provided above under SYNOPSIS. For further clarification please refer
58				to the documentation of Text::NSP::Measures (perldoc
59				Text::NSP::Measures).
60
61
62				=head2 Error Codes
63
64				The following list describes the error codes used in the
65				implementation.
66
67				Error codes common to all the association measures.
68
69				100 - Trying to create an object of an abstract class.
70
71				200 - one of the required values is missing.
72
73				201 - one of the observed frequencies is negative.
74
75				202 - the frequency value (n11) exceeds the total number of
76				bigrams (npp) or a marginal total (n1p, np1).
77
78				203 - one of the marginal totals (n1p, np1) exceeds the total bigram
79				count (npp).
80
81				204 - one of the marginal totals is negative.
82
83				Error codes required by the mutual information measures.
84
85				211 - one of the expected values is zero.
86
87				212 - one of the expected values is negative.
88
89
90				Error codes required by the CHI measures.
91
92				221 - one of the expected values is zero.
93
94				=head2 Methods
95
96				=over
97
98				=cut
99
100				package Text::NSP;
101
102	29	29	669	use strict;
	29		51
	29		1150
103	29	29	138	use Carp;
	29		45
	29		2319
104	29	29	126	use warnings;
	29		48
	29		2081
105
106				our ($VERSION, @ISA);
107
108				@ISA = qw(Exporter);
109
110				$VERSION = '1.31';
111
112				1;
113
114				__END__
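
The common error codes listed under Error Codes (200 to 204) describe consistency checks on the four counts passed to calculateStatistic. The following is a minimal sketch of such checks in plain Perl, assuming only what the list states. The function name check_bigram_counts, the order in which the checks run, and the return value 0 for "no error" are assumptions made for illustration; this is not the code used by Text::NSP::Measures.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::NSP: returns the first error code
# (200-204) that applies to a 2x2 bigram table, or 0 if the counts are
# consistent. The order of the checks is an assumption.
sub check_bigram_counts {
    my (%v) = @_;

    # 200: one of the required values is missing.
    for my $key (qw(n11 n1p np1 npp)) {
        return 200 unless defined $v{$key};
    }
    my ($n11, $n1p, $np1, $npp) = @v{qw(n11 n1p np1 npp)};

    # 204: one of the marginal totals is negative.
    return 204 if $n1p < 0 || $np1 < 0;

    # 203: one of the marginal totals exceeds the total bigram count.
    return 203 if $n1p > $npp || $np1 > $npp;

    # 202: n11 exceeds the total bigram count or a marginal total.
    return 202 if $n11 > $npp || $n11 > $n1p || $n11 > $np1;

    # 201: an observed frequency is negative. After the checks above,
    # n12 = n1p - n11 and n21 = np1 - n11 cannot be negative, so only
    # n11 and n22 remain to be tested.
    my $n22 = $npp - $n1p - $np1 + $n11;
    return 201 if $n11 < 0 || $n22 < 0;

    return 0;
}
```

Called with the values from the SYNOPSIS (n11 => 10, n1p => 20, np1 => 20, npp => 60), this sketch returns 0. Checking the marginal totals before the cell counts keeps each code tied to the condition the list names for it.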