File Coverage

Bio/Matrix/PSM/SiteMatrixI.pm
Criterion Covered Total %
statement 3 49 6.1
branch n/a
condition n/a
subroutine 1 24 4.1
pod 18 18 100.0
total 22 91 24.1


line stmt bran cond sub pod time code
1              
2             =head1 NAME
3              
4             Bio::Matrix::PSM::SiteMatrixI - SiteMatrixI implementation, holds a
5             position scoring matrix (or position weight matrix) and log-odds
6              
7             =head1 SYNOPSIS
8              
9             # You cannot use this module directly; see Bio::Matrix::PSM::SiteMatrix
10             # for an example implementation
11              
12             =head1 DESCRIPTION
13              
14             SiteMatrix is designed to provide some basic methods when working with position
15             scoring (weight) matrices, such as those for transcription factor binding
16             sites. A DNA PSM consists of four vectors with frequencies {A,C,G,T}. This is
17             the minimum information you should provide to construct a PSM object. The
18             vectors can be provided as strings of frequencies multiplied by 10 and rounded
19             to an integer, coded as {0..a}, where 'a' represents the maximum (10). This is
20             like MEME's compressed representation of a matrix and is quite useful when
21             working with a relational database. If arrays are provided as input (as array
22             references), the values can be any numbers, real or integer (frequency or count).
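
The 0..a string coding described above can be illustrated with a small standalone sketch. The helper name decode_freq_string is hypothetical and is not part of BioPerl; it only assumes what the text states, namely that each character is a frequency multiplied by 10, with 'a' standing for 10.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: decode a 0..a frequency string
# into a list of frequencies between 0 and 1.
sub decode_freq_string {
    my ($str) = @_;
    my @freq;
    for my $ch (split //, lc $str) {
        die "Unexpected character '$ch'\n" unless $ch =~ /\A[0-9a]\z/;
        my $tenths = $ch eq 'a' ? 10 : $ch;   # 'a' is the maximum (10)
        push @freq, $tenths / 10;
    }
    return @freq;
}

my @pA = decode_freq_string('a05');
print join(',', @pA), "\n";   # prints 1,0,0.5
```

Each character costs one byte per position, at the price of rounding every frequency to the nearest tenth.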
23              
24             When creating the object you can ask the constructor to make a simple pseudo
25             count correction by adding a number (typically 1) to all positions (with the
26             -correction option). After adding the number the frequencies will be
27             calculated. Only use correction when you supply counts, not frequencies.
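
As a rough illustration of the pseudocount correction, the sketch below adds a constant to every count and then normalises each position. The function counts_to_freqs and its data layout are assumptions made for this example; the actual constructor may handle the details differently.

```perl
use strict;
use warnings;

# Hypothetical helper: add a pseudocount to every cell, then convert each
# position (column) to frequencies that sum to 1.
sub counts_to_freqs {
    my ($counts, $correction) = @_;   # $counts = { A => [...], C => [...], ... }
    my @bases = qw(A C G T);
    my $width = scalar @{ $counts->{A} };
    my %freq;
    for my $i (0 .. $width - 1) {
        my $total = 0;
        $total += $counts->{$_}[$i] + $correction for @bases;
        die "Position $i is (0,0,0,0)\n" if $total == 0;
        $freq{$_}[$i] = ($counts->{$_}[$i] + $correction) / $total for @bases;
    }
    return \%freq;
}

my $freq = counts_to_freqs(
    { A => [8, 0], C => [0, 4], G => [0, 4], T => [0, 0] }, 1
);
printf "%.3f %.3f\n", $freq->{A}[0], $freq->{T}[1];   # prints 0.750 0.083
```

Applying the same correction to values that are already frequencies would distort them, which is why the text restricts it to counts.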
28              
29             Throws an exception if: you mix arrays and strings as input (for example, the A
30             vector is given as an array and C as a string); a position vector is (0,0,0,0); or
31             one of the probability vectors is shorter than the rest.
32              
33             Summary of the methods I use most frequently (details below):
34              
35             IUPAC - return IUPAC compliant consensus as a string
36             score - Returns the score as a real number
37             IC - information content. Returns a real number
38             id - identifier. Returns a string
39             accession_number - accession number. Returns a string
40             next_pos - return the sequence probability for each letter, IUPAC
41             symbol, IUPAC probability and simple sequence
42             consensus letter for this position. Rewind at the end. Returns a hash.
43             curpos - current position get/set. Returns an integer.
44             regexp - construct a regular expression based on IUPAC consensus.
45             For example AGWV will be [Aa][Gg][AaTt][AaCcGg]
46             width - site width
47             get_string - gets the probability vector for a single base as a string.
48             get_array - gets the probability vector for a single base as an array.
49             get_logs_array - gets the log-odds vector for a single base as an array.
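
The regexp entry in the list above gives AGWV as an example. A minimal standalone sketch of that conversion follows; the table and the function iupac_to_regexp are written for this illustration only. In particular, N is mapped here to the four bases as a simplifying assumption, whereas the module documents a broader match for N.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',   # assumption for this sketch only
);

# Hypothetical helper: build a case-insensitive character class per symbol.
sub iupac_to_regexp {
    my ($consensus) = @_;
    my $re = '';
    for my $sym (split //, uc $consensus) {
        my $bases = $IUPAC{$sym} or die "Unknown IUPAC symbol '$sym'\n";
        $re .= '[' . join('', map { $_ . lc($_) } split //, $bases) . ']';
    }
    return $re;
}

my $re = iupac_to_regexp('AGWV');
print "$re\n";                               # prints [Aa][Gg][AaTt][AaCcGg]
print "match\n" if 'agtc' =~ /\A$re\z/;      # prints match
```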
50              
51             A newer feature, which might be of interest to anyone who wants to store a PSM in a
52             relational database without creating an entry for each position, is the ability to
53             compress a PSM vector into a string, usually losing less than 1% of the data.
54             This can be done with:
55              
56             my $str=$matrix->get_compressed_freq('A');
57              
58             or
59              
60             my $str=$matrix->get_compressed_logs('A');
61              
62             Loading from a database should be done with new, but is not yet implemented.
63             However, you can still uncompress such a string with:
64              
65             my @arr=Bio::Matrix::PSM::_uncompress_string ($str,1,1); # for PSM
66              
67             or
68              
69             my @arr=Bio::Matrix::PSM::_uncompress_string ($str,1000,2); # for log odds
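
To give a feel for the one-byte-per-position idea, here is a sketch of the unsigned case only. The helpers compress_unsigned and uncompress_unsigned are hypothetical, and the linear scaling to 0..255 is an assumption; the real _compress_array and _uncompress_string may scale or round differently.

```perl
use strict;
use warnings;

# Hypothetical helpers (unsigned case only). Values are assumed to lie
# between 0 and $max; each one is scaled to a single byte (0..255).
sub compress_unsigned {
    my ($values, $max) = @_;
    return pack 'C*', map { int(255 * $_ / $max + 0.5) } @$values;
}

sub uncompress_unsigned {
    my ($str, $max) = @_;
    return map { $_ * $max / 255 } unpack 'C*', $str;
}

my @freq   = (0.25, 0.9, 0.0, 1.0);
my $packed = compress_unsigned(\@freq, 1);
my @back   = uncompress_unsigned($packed, 1);
printf "%d bytes: %s\n", length($packed),
    join(' ', map { sprintf '%.3f', $_ } @back);
```

With 256 levels the rounding error per value stays below max/255, which is in line with the "less than 1%" figure quoted in the text.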
70              
71             =head1 FEEDBACK
72              
73             =head2 Mailing Lists
74              
75             User feedback is an integral part of the evolution of this and other
76             Bioperl modules. Send your comments and suggestions preferably to one
77             of the Bioperl mailing lists. Your participation is much appreciated.
78              
79             bioperl-l@bioperl.org - General discussion
80             http://bioperl.org/wiki/Mailing_lists - About the mailing lists
81              
82             =head2 Support
83              
84             Please direct usage questions or support issues to the mailing list:
85              
86             I<bioperl-l@bioperl.org>
87              
88             rather than to the module maintainer directly. Many experienced and
89             responsive experts will be able to look at the problem and quickly
90             address it. Please include a thorough description of the problem
91             with code and data examples if at all possible.
92              
93             =head2 Reporting Bugs
94              
95             Report bugs to the Bioperl bug tracking system to help us keep track
96             of the bugs and their resolution. Bug reports can be submitted via the
97             web:
98              
99             https://github.com/bioperl/bioperl-live/issues
100              
101             =head1 AUTHOR - Stefan Kirov
102              
103             Email skirov@utk.edu
104              
105             =head1 APPENDIX
106              
107             =cut
108              
109              
110             # Let the code begin...
111              
112             package Bio::Matrix::PSM::SiteMatrixI;
113              
114             # use strict;
115 6     6   29 use base qw(Bio::Root::RootI);
  6         7  
  6         2561  
116              
117             =head2 calc_weight
118              
119             Title : calc_weight
120             Usage : $self->calc_weight({A=>0.2562,C=>0.2438,G=>0.2432,T=>0.2568});
121             Function: Recalculates the PSM (or weights) based on the PFM (the frequency matrix)
122             and user supplied background model.
123             Throws : if no model is supplied
124             Example :
125             Returns :
126             Args : reference to a hash with background frequencies for A,C,G and T
127              
128             =cut
129              
130             sub calc_weight {
131 0     0 1   my $self = shift;
132 0           $self->throw_not_implemented();
133             }
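
Since the interface leaves calc_weight unimplemented, the following standalone sketch shows one common way to derive log-odds weights from a frequency matrix and a background model. The function log_odds, the base-2 logarithm and the data layout are assumptions for illustration, not the formula used by any particular implementing class.

```perl
use strict;
use warnings;

# Hypothetical helper: weight = log2(frequency / background) per base and
# position. Frequencies must be greater than 0 (Perl's log dies on 0),
# which is one reason to apply a pseudocount correction first.
sub log_odds {
    my ($freq, $bg) = @_;
    die "No background model supplied\n" unless ref($bg) eq 'HASH';
    my %weights;
    for my $base (qw(A C G T)) {
        $weights{$base} =
            [ map { log($_ / $bg->{$base}) / log(2) } @{ $freq->{$base} } ];
    }
    return \%weights;
}

my $bg   = { A => 0.25, C => 0.25, G => 0.25, T => 0.25 };
my $freq = {
    A => [0.5,   0.25], C => [0.25,  0.25],
    G => [0.125, 0.25], T => [0.125, 0.25],
};
my $w = log_odds($freq, $bg);
printf "%.2f %.2f\n", $w->{A}[0], $w->{G}[0];   # prints 1.00 -1.00
```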
134              
135              
136             =head2 next_pos
137              
138             Title : next_pos
139             Usage : my %base=$site->next_pos;
140             Function:
141              
142             Retrieves the next position features: frequencies and weights for
143             A,C,G,T, the main letter (as in consensus) and the
144             probability for this letter to occur at this position and
145             the current position
146              
147             Throws :
148             Example :
149             Returns : hash (pA,pC,pG,pT,lA,lC,lG,lT,base,prob,rel)
150             Args : none
151              
152              
153             =cut
154              
155             sub next_pos {
156 0     0 1   my $self = shift;
157 0           $self->throw_not_implemented();
158             }
159              
160             =head2 curpos
161              
162             Title : curpos
163             Usage : my $pos=$site->curpos;
164             Function: Gets/sets the current position. Converts to 0 if the argument is negative and
165             to width if it is greater than width
166             Throws :
167             Example :
168             Returns : integer
169             Args : integer
170              
171             =cut
172              
173             sub curpos {
174 0     0 1   my $self = shift;
175 0           $self->throw_not_implemented();
176             }
177              
178             =head2 e_val
179              
180             Title : e_val
181             Usage : my $score=$site->e_val;
182             Function: Gets/sets the e-value
183             Throws :
184             Example :
185             Returns : real number
186             Args : real number
187              
188             =cut
189              
190             sub e_val {
191 0     0 1   my $self = shift;
192 0           $self->throw_not_implemented();
193             }
194              
195             =head2 consensus
196              
197             Title : consensus
198             Usage :
199             Function: Returns the consensus
200             Returns : string
201             Args : (optional) threshold value 1 to 10, default 5
202             '5' means the returned characters had a 50% or higher presence at
203             their position
204              
205             =cut
206              
207             sub consensus {
208 0     0 1   my $self = shift;
209 0           $self->throw_not_implemented();
210             }
211              
212             =head2 accession_number
213              
214             Title : accession_number
215             Usage :
216             Function: accession number; this will be a unique id for the SiteMatrix object as
217             well as for any other object inheriting from SiteMatrix
218             Throws :
219             Example :
220             Returns : string
221             Args : string
222              
223             =cut
224              
225             sub accession_number {
226 0     0 1   my $self = shift;
227 0           $self->throw_not_implemented();
228             }
229              
230              
231             =head2 width
232              
233             Title : width
234             Usage : my $width=$site->width;
235             Function: Returns the length of the site
236             Throws :
237             Example :
238             Returns : number
239             Args :
240              
241             =cut
242              
243             sub width {
244 0     0 1   my $self = shift;
245 0           $self->throw_not_implemented();
246             }
247              
248             =head2 IUPAC
249              
250             Title : IUPAC
251             Usage : my $iupac_consensus=$site->IUPAC;
252             Function: Returns IUPAC compliant consensus
253             Throws :
254             Example :
255             Returns : string
256             Args :
257              
258             =cut
259              
260             sub IUPAC {
261 0     0 1   my $self = shift;
262 0           $self->throw_not_implemented();
263             }
264              
265             =head2 IC
266              
267             Title : IC
268             Usage : my $ic=$site->IC;
269             Function: Information content
270             Throws :
271             Example :
272             Returns : real number
273             Args : none
274              
275             =cut
276              
277             sub IC {
278 0     0 1   my $self=shift;
279 0           $self->throw_not_implemented();
280             }
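
The POD for IC does not spell out a formula. A widely used definition for DNA, assuming a uniform background, is 2 + sum(p * log2 p) bits per position, summed over the site. The sketch below implements that definition as an assumption; an implementing class may use a different background or normalisation.

```perl
use strict;
use warnings;

# Hypothetical helper: information content in bits, assuming a uniform
# background (log2(4) = 2 bits maximum per position).
sub information_content {
    my ($freq) = @_;   # { A => [...], C => [...], G => [...], T => [...] }
    my $width = scalar @{ $freq->{A} };
    my $ic = 0;
    for my $i (0 .. $width - 1) {
        my $col = 2;
        for my $base (qw(A C G T)) {
            my $p = $freq->{$base}[$i];
            $col += $p * log($p) / log(2) if $p > 0;   # 0 * log(0) counts as 0
        }
        $ic += $col;
    }
    return $ic;
}

# Position 1 is always A (2 bits); position 2 is uniform (0 bits).
my $freq = {
    A => [1, 0.25], C => [0, 0.25], G => [0, 0.25], T => [0, 0.25],
};
printf "%.2f\n", information_content($freq);   # prints 2.00
```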
281              
282             =head2 get_string
283              
284             Title : get_string
285             Usage : my $freq_A=$site->get_string('A');
286             Function: Returns given probability vector as a string. Useful if you want to
287             store things in a relational database, where arrays are not the first choice
288             Throws : If the argument is outside {A,C,G,T}
289             Example :
290             Returns : string
291             Args : character {A,C,G,T}
292              
293             =cut
294              
295             sub get_string {
296 0     0 1   my $self=shift;
297 0           $self->throw_not_implemented();
298             }
299              
300             =head2 id
301              
302             Title : id
303             Usage : my $id=$site->id;
304             Function: Gets/sets the site id
305             Throws :
306             Example :
307             Returns : string
308             Args : string
309              
310             =cut
311              
312             sub id {
313 0     0 1   my $self = shift;
314 0           $self->throw_not_implemented();
315             }
316              
317             =head2 regexp
318              
319             Title : regexp
320             Usage : my $regexp=$site->regexp;
321             Function: Returns a regular expression which matches the IUPAC convention.
322             N will match X, N, - and .
323             Throws :
324             Example :
325             Returns : string
326             Args :
327              
328             =cut
329              
330             sub regexp {
331 0     0 1   my $self=shift;
332 0           $self->throw_not_implemented();
333             }
334              
335             =head2 regexp_array
336              
337             Title : regexp_array
338             Usage : my @regexp=$site->regexp_array;
339             Function: Returns a regular expression which matches the IUPAC convention.
340             N will match X, N, - and .
341             Throws :
342             Example :
343             Returns : array
344             Args :
345             To do : I have separated regexp and regexp_array, but
346             maybe they can be rewritten as one - just check what
347             should be returned
348              
349             =cut
350              
351             sub regexp_array {
352 0     0 1   my $self=shift;
353 0           $self->throw_not_implemented();
354             }
355              
356             =head2 get_array
357              
358             Title : get_array
359             Usage : my @freq_A=$site->get_array('A');
360             Function: Returns an array with frequencies for a specified base
361             Throws :
362             Example :
363             Returns : array
364             Args : char
365              
366             =cut
367              
368             sub get_array {
369 0     0 1   my $self=shift;
370 0           $self->throw_not_implemented();
371             }
372              
373              
374             =head2 _to_IUPAC
375              
376             Title : _to_IUPAC
377             Usage :
378             Function: Converts a single position to IUPAC compliant symbol and
379             returns its probability. For rules see the implementation.
380             Throws :
381             Example :
382             Returns : char, real number
383             Args : real numbers for A,C,G,T (positional)
384              
385             =cut
386              
387             sub _to_IUPAC {
388 0     0     my $self = shift;
389 0           $self->throw_not_implemented();
390             }
391              
392             =head2 _to_cons
393              
394             Title : _to_cons
395             Usage :
396             Function: Converts a single position to simple consensus character and
397             returns its probability. For rules see the implementation.
398             Throws :
399             Example :
400             Returns : char, real number
401             Args : real numbers for A,C,G,T (positional)
402              
403             =cut
404              
405             sub _to_cons {
406 0     0     my $self = shift;
407 0           $self->throw_not_implemented();
408             }
409              
410              
411             =head2 _calculate_consensus
412              
413             Title : _calculate_consensus
414             Usage :
415             Function: Internal stuff
416             Throws :
417             Example :
418             Returns :
419             Args :
420              
421             =cut
422              
423             sub _calculate_consensus {
424 0     0     my $self = shift;
425 0           $self->throw_not_implemented();
426             }
427              
428             =head2 _compress_array
429              
430             Title : _compress_array
431             Usage :
432             Function: Will compress an array of real signed numbers to a string (i.e. a vector of bytes):
433             -127 to +127 for bi-directional (signed) and 0..255 for unsigned
434             Throws :
435             Example : Internal stuff
436             Returns : String
437             Args : array reference, followed by a max value and
438             direction (optional, default 1 = unsigned); 1 is unsigned, any other value is signed.
439              
440             =cut
441              
442             sub _compress_array {
443 0     0     my $self = shift;
444 0           $self->throw_not_implemented();
445             }
446              
447             =head2 _uncompress_string
448              
449             Title : _uncompress_string
450             Usage :
451             Function: Will uncompress a string (vector of bytes) to create an array of real
452             signed numbers (the opposite of _compress_array)
453             Throws :
454             Example : Internal stuff
455             Returns : array
456             Args : string, followed by a max value and direction (optional, default
457             1 = unsigned); 1 is unsigned, any other value is signed.
458              
459             =cut
460              
461             sub _uncompress_string {
462 0     0     my $self = shift;
463 0           $self->throw_not_implemented();
464             }
465              
466             =head2 get_compressed_freq
467              
468             Title : get_compressed_freq
469             Usage :
470             Function: A method to provide a compressed frequency vector. It uses one byte to
471             code the frequency for one of the probability vectors for one position.
472             Useful for relational databases. Improvement on the previous 0..a coding.
473             Throws :
474             Example : my $strA=$self->get_compressed_freq('A');
475             Returns : String
476             Args : char
477              
478             =cut
479              
480             sub get_compressed_freq {
481 0     0 1   my $self = shift;
482 0           $self->throw_not_implemented();
483             }
484              
485             =head2 get_compressed_logs
486              
487             Title : get_compressed_logs
488             Usage :
489             Function: A method to provide a compressed log-odds vector. It uses one byte to
490             code the log value for one of the log-odds vectors for one position.
491             Throws :
492             Example : my $strA=$self->get_compressed_logs('A');
493             Returns : String
494             Args : char
495              
496             =cut
497              
498             sub get_compressed_logs {
499 0     0 1   my $self = shift;
500 0           $self->throw_not_implemented();
501             }
502              
503             =head2 sequence_match_weight
504              
505             Title : sequence_match_weight
506             Usage :
507             Function: This method will calculate the score of a match, based on the PWM
508             if such is associated with the matrix object. Returns undef if no
509             PWM data is available.
510             Throws : if the length of the sequence is different from the matrix width
511             Example : my $score=$matrix->sequence_match_weight('ACGGATAG');
512             Returns : Floating point
513             Args : string
514              
515             =cut
516              
517             sub sequence_match_weight {
518 0     0 1   my $self = shift;
519 0           $self->throw_not_implemented();
520             }
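
The documented contract of sequence_match_weight (undef without PWM data, an exception on a length mismatch, a floating point score otherwise) can be sketched as below. The function match_weight and the choice to sum per-position log-odds are assumptions for this example; the interface itself does not fix the scoring formula.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the log-odds weight of the observed base at each
# position. $logs = { A => [...], C => [...], G => [...], T => [...] }.
sub match_weight {
    my ($logs, $seq) = @_;
    return unless $logs;   # no PWM data available: undef in scalar context
    my $width = scalar @{ $logs->{A} };
    die "Sequence length differs from matrix width\n"
        if length($seq) != $width;
    die "Sequence contains characters outside {A,C,G,T}\n"
        unless $seq =~ /\A[ACGT]+\z/i;
    my @bases = split //, uc $seq;
    my $score = 0;
    $score += $logs->{ $bases[$_] }[$_] for 0 .. $#bases;
    return $score;
}

my $logs = { A => [1, -1], C => [-1, 1], G => [0, 0], T => [0, 0] };
print match_weight($logs, 'AC'), "\n";   # prints 2
```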
521              
522             =head2 get_all_vectors
523              
524             Title : get_all_vectors
525             Usage :
526             Function: returns all possible sequence vectors to satisfy the PFM under
527             a given threshold
528             Throws : If the threshold is outside 0..1
529             Example : my @vectors=$self->get_all_vectors(4);
530             Returns : Array of strings
531             Args : (optional) floating
532              
533             =cut
534              
535             sub get_all_vectors {
536 0     0 1   my $self = shift;
537 0           $self->throw_not_implemented();
538             }
539             1;