File Coverage

blib/lib/AI/NeuralNet/SOM.pm
Criterion Covered Total %
statement 58 63 92.0
branch 7 8 87.5
condition 1 2 50.0
subroutine 11 16 68.7
pod 10 11 90.9
total 87 100 87.0


line stmt bran cond sub pod time code
1             package AI::NeuralNet::SOM;
2              
3 4     4   38653 use strict;
  4         7  
  4         205  
4 4     4   27 use warnings;
  4         7  
  4         196  
5              
6             require Exporter;
7 4     4   34 use base qw(Exporter);
  4         7  
  4         377  
8              
9 4     4   4653 use Data::Dumper;
  4         44395  
  4         12172  
10              
11             =pod
12              
13             =head1 NAME
14              
15             AI::NeuralNet::SOM - Perl extension for Kohonen Maps
16              
17             =head1 SYNOPSIS
18              
19             use AI::NeuralNet::SOM::Rect;
20             my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6",
21             input_dim => 3);
22             $nn->initialize;
23             $nn->train (30,
24             [ 3, 2, 4 ],
25             [ -1, -1, -1 ],
26             [ 0, 4, -3]);
27              
28             my @mes = $nn->train (30, ...); # learn about the smallest errors
29             # during training
30              
31             print $nn->as_data; # dump the raw data
32             print $nn->as_string; # produce a pretty-printed string
33              
34             use AI::NeuralNet::SOM::Torus;
35             # similar to above
36              
37             use AI::NeuralNet::SOM::Hexa;
38             my $nn = new AI::NeuralNet::SOM::Hexa (output_dim => 6,
39             input_dim => 4);
40             $nn->initialize ( [ 0, 0, 0, 0 ] ); # all get this value
41              
42             $nn->value (3, 2, [ 1, 1, 1, 1 ]); # change value for a neuron
43             print $nn->value (3, 2);
44              
45             $nn->label (3, 2, 'Danger'); # add a label to the neuron
46             print $nn->label (3, 2);
47              
48              
49             =head1 DESCRIPTION
50              
51             This package is a stripped down implementation of the Kohonen Maps
52             (self organizing maps). It is B<mainly> meant as demonstration or for use
53             together with some visualisation software. And while it is not (yet)
54             optimized for speed, some consideration has been given that it is not
55             overly slow.
56              
57             Particular emphasis has been given that the package plays nicely with
58             others. So no use of files, no arcane dependencies, etc.
59              
60             =head2 Scenario
61              
62             The basic idea is that the neural network consists of a 2-dimensional
63             array of N-dimensional vectors. When the training is started these
64             vectors may be completely random, but over time the network learns
65             from the sample data, which is a set of N-dimensional vectors.
66              
67             Slowly, the vectors in the network will try to approximate the sample
68             vectors fed in. If in the sample vectors there were clusters, then
69             these clusters will be neighbourhoods within the rectangle (or
70             whatever topology you are using).
71              
72             Technically, you have reduced your dimension from N to 2.
73              
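As a quick sketch of this scenario (the sample values and the C<100> epochs
here are made up for illustration; the API is the one from the SYNOPSIS):

    use AI::NeuralNet::SOM::Rect;

    # two clusters of 3-dimensional sample vectors
    my @samples = ([ 9, 9, 9 ], [ 8, 9, 8 ],
                   [ 0, 1, 0 ], [ 1, 0, 1 ]);

    my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6",
                                           input_dim  => 3);
    $nn->initialize;
    $nn->train (100, @samples);

    # vectors from the same cluster should now map to nearby neurons
    my ($x1, $y1) = $nn->bmu ([ 9, 9, 9 ]);
    my ($x2, $y2) = $nn->bmu ([ 1, 0, 1 ]);
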
74             =head1 INTERFACE
75              
76             =head2 Constructor
77              
78             The constructor takes arguments:
79              
80             =over
81              
82             =item C<input_dim> : (mandatory, no default)
83              
84             A positive integer specifying the dimension of the sample vectors (and hence that of the vectors in
85             the grid).
86              
87             =item C<learning_rate>: (optional, default C<0.1>)
88              
89             This is a magic number which controls how strongly the vectors in the grid can be influenced. Stronger
90             movement can mean faster learning if the clusters are very pronounced. If not, then the movement is
91             like noise and the convergence is not good. To mitigate that effect, the learning rate is reduced
92             over the iterations.
93              
94             =item C<sigma0>: (optional, defaults to radius)
95              
96             A non-negative number representing the start value for the learning radius. Practically, the value
97             should be chosen in such a way that it covers a larger part of the map. During the learning process this
98             value will be narrowed down, so that the learning radius affects fewer and fewer neurons.
99              
100             B<NOTE>: Do not choose C<1>, as the C<log> function is used on this value (C<log 1> is zero, and C<train> divides by it when computing the decay constant).
101              
102             =back
103              
104             Subclasses will (re)define some of these parameters and add others:
105              
106             Example:
107              
108             my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6",
109             input_dim => 3);
110              
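The optional parameters are passed in the same way; the values below are
only for illustration, using the parameter names documented above:

    my $nn = new AI::NeuralNet::SOM::Rect (output_dim    => "5x6",
                                           input_dim     => 3,
                                           learning_rate => 0.1,
                                           sigma0        => 4);
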
111             =cut
112              
113 0     0 0 0 sub new { die; }
114              
115             =pod
116              
117             =head2 Methods
118              
119             =over
120              
121             =item I<initialize>
122              
123             I<$nn>->initialize
124              
125             You need to initialize all vectors in the map before training. There are several options
126             for how this is done:
127              
128             =over
129              
130             =item providing data vectors
131              
132             If you provide a list of vectors, these will be used in turn to seed the neurons. If the list is
133             shorter than the number of neurons, the list will be started over. That way it is trivial to
134             zero everything:
135              
136             $nn->initialize ( [ 0, 0, 0 ] );
137              
138             =item providing no data
139              
140             Then all vectors will get randomized values (in the range [ -0.5 .. 0.5 ]).
141              
142             =item using eigenvectors (see L</TODOs>)
143              
144             =back
145              
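A minimal sketch of the two readily available options, using the
3-dimensional setup from the SYNOPSIS:

    $nn->initialize;                  # random values in [ -0.5 .. 0.5 ]
    $nn->initialize ([ 0, 0, 0 ]);    # the list is recycled over the whole
                                      # map, so every neuron gets [ 0, 0, 0 ]
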
146             =item I<train>
147              
148             I<$nn>->train ( I<$epochs>, I<@vectors> )
149              
150             I<@mes> = I<$nn>->train ( I<$epochs>, I<@vectors> )
151              
152             The training uses the list of sample vectors to make the network learn. Each vector is simply a
153             reference to an array of values.
154              
155             The C<$epochs> parameter controls how many times the list of sample vectors is processed. The vectors
156             are B<not> used in sequence, but picked randomly from the list. For this reason it is wise to run
157             several epochs, not just one. But within one epoch B<all> vectors are visited exactly once.
158              
159             Example:
160              
161             $nn->train (30,
162             [ 3, 2, 4 ],
163             [ -1, -1, -1 ],
164             [ 0, 4, -3]);
165              
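When called in list context, C<train> collects one error value per vector and
epoch (the C<bmu> distance, see below), in order. A sketch of looking at the
errors of the final epoch only:

    my @vectors = ([ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3 ]);
    my @mes     = $nn->train (30, @vectors);

    my @last = @mes[ -@vectors .. -1 ];   # errors of the final epoch
    my $avg  = 0;
    $avg    += $_ for @last;
    $avg    /= scalar @last;
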
166             =cut
167              
168             sub train {
169 47     47 1 2911 my $self = shift;
170 47   50     142 my $epochs = shift || 1;
171 47 50       129 die "no data to learn" unless @_;
172              
173 47         279 $self->{LAMBDA} = $epochs / log ($self->{_Sigma0}); # educated guess?
174              
175 47         107 my @mes = (); # this will contain the errors during the epochs
176 47         116 for my $epoch (1..$epochs) {
177 3060         5211 $self->{T} = $epoch;
178 3060         9605 my $sigma = $self->{_Sigma0} * exp ( - $self->{T} / $self->{LAMBDA} ); # compute current radius
179 3060         8016 my $l = $self->{_L0} * exp ( - $self->{T} / $epochs ); # current learning rate
180              
181 3060         6264 my @veggies = @_; # make a local copy, that will be destroyed in the loop
182 3060         6900 while (@veggies) {
183 8780         22556 my $sample = splice @veggies, int (rand (scalar @veggies) ), 1; # find (and take out)
184              
185 8780         29744 my @bmu = $self->bmu ($sample); # find the best matching unit
186 8780 100       22235 push @mes, $bmu[2] if wantarray;
187 8780         30356 my $neighbors = $self->neighbors ($sigma, @bmu); # find its neighbors
188 8780         14777 map { _adjust ($self, $l, $sigma, $_, $sample) } @$neighbors; # bend them like Beckham
  57992         107416  
189             }
190             }
191 47         238 return @mes;
192             }
193              
194             sub _adjust { # http://www.ai-junkie.com/ann/som/som4.html
195 57992     57992   71494 my $self = shift;
196 57992         62607 my $l = shift; # the learning rate
197 57992         60432 my $sigma = shift; # the current radius
198 57992         78081 my $unit = shift; # which unit to change
199 57992         80791 my ($x, $y, $d) = @$unit; # it contains the distance
200 57992         58590 my $v = shift; # the vector which makes the impact
201              
202 57992         90497 my $w = $self->{map}->[$x]->[$y]; # find the data behind the unit
203 57992         126059 my $theta = exp ( - ($d ** 2) / (2 * $sigma ** 2)); # gaussian impact (using distance and current radius)
204              
205 57992         109832 foreach my $i (0 .. $#$w) { # adjusting values
206 173976         480692 $w->[$i] = $w->[$i] + $theta * $l * ( $v->[$i] - $w->[$i] );
207             }
208             }
209              
210             =pod
211              
212             =item I<bmu>
213              
214             (I<$x>, I<$y>, I<$distance>) = I<$nn>->bmu (I<$vector>)
215              
216             This method finds the I<best matching unit>, i.e. that neuron which is closest to the vector passed
217             in. The method returns the coordinates and the actual distance.
218              
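Example (using a vector from the SYNOPSIS):

    my ($x, $y, $d) = $nn->bmu ([ 3, 2, 4 ]);
    print "closest neuron: ($x, $y), distance: $d\n";
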
219             =cut
220              
221 0     0 1 0 sub bmu { die; }
222              
223             =pod
224              
225             =item I<mean_error>
226              
227             I<$me> = I<$nn>->mean_error (I<@vectors>)
228              
229             This method takes a number of vectors and produces the I<mean error>, i.e. the average I<distance>
230             which the SOM produces when finding the C<bmu>s for the vectors. At least one vector must be passed in.
231              
232             Obviously, the longer you let your SOM be trained, the smaller the error should become.
233              
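A sketch of watching the error shrink over a training session (the sample
vectors are made up):

    my @vectors = ([ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3 ]);
    print "before: ", $nn->mean_error (@vectors), "\n";
    $nn->train (30, @vectors);
    print "after:  ", $nn->mean_error (@vectors), "\n";
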
234             =cut
235            
236             sub mean_error {
237 81     81 1 352 my $self = shift;
238 81         111 my $error = 0;
239 243         360 map { $error += $_ } # then add them all up
  243         745  
240 81         180 map { ( $self->bmu($_) )[2] } # then find the distance
241             @_; # take all data vectors
242 81         491 return ($error / scalar @_); # return the mean value
243             }
244              
245             =pod
246              
247             =item I<neighbors>
248              
249             I<$ns> = I<$nn>->neighbors (I<$sigma>, I<$x>, I<$y>)
250              
251             Finds all neighbors of (X, Y) with a distance smaller than SIGMA. Returns a list reference of (X, Y,
252             distance) triples.
253              
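Example (the C<$sigma> value of C<2> is arbitrary):

    my ($x, $y) = $nn->bmu ([ 3, 2, 4 ]);
    my $ns = $nn->neighbors (2, $x, $y);
    for my $unit (@$ns) {
        my ($nx, $ny, $d) = @$unit;
        print "($nx, $ny) at distance $d\n";
    }
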
254             =cut
255              
256 0     0 1 0 sub neighbors { die; }
257              
258             =pod
259              
260             =item I<output_dim> (read-only)
261              
262             I<$dim> = I<$nn>->output_dim
263              
264             Returns the output dimensions of the map as passed in at constructor time.
265              
266             =cut
267              
268             sub output_dim {
269 2     2 1 5 my $self = shift;
270 2         9 return $self->{output_dim};
271             }
272              
273             =pod
274              
275             =item I<radius> (read-only)
276              
277             I<$radius> = I<$nn>->radius
278              
279             Returns the I<radius> of the map. Different topologies interpret this differently.
280              
281             =item I<map>
282              
283             I<$m> = I<$nn>->map
284              
285             This method returns a reference to the map data. See the appropriate subclass for the data
286             representation.
287              
288             =cut
289              
290             sub map {
291 6     6 1 2586 my $self = shift;
292 6         31 return $self->{map};
293             }
294              
295             =pod
296              
297             =item I<value>
298              
299             I<$val> = I<$nn>->value (I<$x>, I<$y>)
300              
301             I<$nn>->value (I<$x>, I<$y>, I<$val>)
302              
303             Set or get the current vector value for a particular neuron. The neuron is addressed via its
304             coordinates.
305              
306             =cut
307              
308             sub value {
309 45     45 1 14904 my $self = shift;
310 45         82 my ($x, $y) = (shift, shift);
311 45         52 my $v = shift;
312 45 100       216 return defined $v ? $self->{map}->[$x]->[$y] = $v : $self->{map}->[$x]->[$y];
313             }
314              
315             =pod
316              
317             =item I<label>
318              
319             I<$label> = I<$nn>->label (I<$x>, I<$y>)
320              
321             I<$nn>->label (I<$x>, I<$y>, I<$label>)
322              
323             Set or get the label for a particular neuron. The neuron is addressed via its coordinates.
324             The label can be anything; it is simply attached to the position.
325              
326             =cut
327              
328             sub label {
329 3     3 1 869 my $self = shift;
330 3         5 my ($x, $y) = (shift, shift);
331 3         4 my $l = shift;
332 3 100       18 return defined $l ? $self->{labels}->[$x]->[$y] = $l : $self->{labels}->[$x]->[$y];
333             }
334              
335             =pod
336              
337             =item I<as_string>
338              
339             print I<$nn>->as_string
340              
341             This method creates a pretty-print version of the current vectors.
342              
343             =cut
344              
345 0     0 1   sub as_string { die; }
346              
347             =pod
348              
349             =item I<as_data>
350              
351             print I<$nn>->as_data
352              
353             This method creates a string containing the raw vector data, row by
354             row. This can be fed into gnuplot, for instance.
355              
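A sketch of feeding the output into gnuplot (the file name C<som.dat> is
made up):

    open my $fh, '>', 'som.dat' or die "cannot write: $!";
    print $fh $nn->as_data;
    close $fh;
    # then, inside gnuplot, something like:   plot 'som.dat'
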
356             =cut
357              
358 0     0 1   sub as_data { die; }
359              
360             =pod
361              
362             =back
363              
364             =head1 HOWTOs
365              
366             =over
367              
368             =item I<using eigenvectors to initialize the SOM>
369              
370             See the example script in the directory C<examples> provided in the
371             distribution. It uses L<PDL> (for speed and scalability, but the
372             results are not as good as I had thought).
373              
374             =item I<saving and loading the SOM>
375              
376             See the example script in the directory C<examples>. It uses
377             C<Storable> to directly dump the data structure onto disk. Storage and
378             retrieval is quite fast; a minimal sketch follows this list.
379              
380             =back
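
A minimal sketch of the save/load cycle mentioned above (the file name
C<som.store> is made up; C<store> and C<retrieve> are the standard
L<Storable> interface):

    use Storable qw(store retrieve);

    store $nn, 'som.store';            # dump the whole SOM onto disk
    my $nn2 = retrieve ('som.store');  # ... and get it back later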
381              
382             =head1 FAQs
383              
384             =over
385              
386             =item I<Why do I get errors when training with my vectors?>
387              
388             There is most likely a mismatch between the C<input_dim> you
389             specified and the dimension your vectors actually have.
390              
391             =back
392              
393             =head1 TODOs
394              
395             =over
396              
397             =item maybe implement the SOM on top of PDL?
398              
399             =item provide a ::SOM::Compat to have compatibility with the original AI::NeuralNet::SOM?
400              
401             =item implement different window forms (bubble/gaussian), linear/random
402              
403             =item implement the format mentioned in the original AI::NeuralNet::SOM
404              
405             =item add methods as_html to individual topologies
406              
407             =item add iterators through vector lists for I and I
408              
409             =back
410              
411             =head1 SUPPORT
412              
413             Bugs should always be submitted via the CPAN bug tracker
414             L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=AI-NeuralNet-SOM>
415              
416             =head1 SEE ALSO
417              
418             Explanation of the algorithm:
419              
420             L<http://www.ai-junkie.com/ann/som/som1.html>
421              
422             Old version of AI::NeuralNet::SOM from Alexander Voischev:
423              
424             L<http://search.cpan.org/~voischev/>
425              
426             Subclasses:
427              
428             L<AI::NeuralNet::SOM::Rect>,
429             L<AI::NeuralNet::SOM::Torus>,
430             L<AI::NeuralNet::SOM::Hexa>
431              
432              
433             =head1 AUTHOR
434              
435             Robert Barta, E<lt>rho@devc.atE<gt>
436              
437             =head1 COPYRIGHT AND LICENSE
438              
439             Copyright (C) 200[78] by Robert Barta
440              
441             This library is free software; you can redistribute it and/or modify
442             it under the same terms as Perl itself, either Perl version 5.8.8 or,
443             at your option, any later version of Perl 5 you may have available.
444              
445             =cut
446              
447             our $VERSION = '0.07';
448              
449             1;
450              
451             __END__