=pod

=head1 NAME

ETL::Pipeline::Input - Role for ETL::Pipeline input sources

=head1 SYNOPSIS

  use Moose;
  with 'ETL::Pipeline::Input';

  sub run {
    # Add code to read your data here
    ...
  }

=head1 DESCRIPTION

An I<input source> feeds the B<extract> part of B<ETL>. This is where data comes
from. These are your data sources.

A data source may be anything - a file, a database, or maybe a socket. Each
I<format> is an L<ETL::Pipeline> input source. For example, Excel files
represent one input source. Perl reads every Excel file the same way. With a few
judicious attributes, we can re-use the same input source for just about any
type of Excel file.

L<ETL::Pipeline> defines an I<input source> as a Moose object with at least one
method - C<run>. This role basically defines the requirement for the B<run>
method. It should be consumed by B<all> input source classes. L<ETL::Pipeline>
relies on the input source having this role.

=head2 How do I create an I<input source>?

=over

=item 1. Start a new Perl module. I recommend putting it in the C<ETL::Pipeline::Input> namespace. L<ETL::Pipeline> will pick it up automatically.

=item 2. Make your module a L<Moose> class - C<use Moose;>.

=item 3. Consume this role - C<with 'ETL::Pipeline::Input';>.

=item 4. Write the L</run> method. L</run> follows this basic algorithm...

=over

=item a. Open the source.

=item b. Loop reading the records. Each iteration should call L<ETL::Pipeline/record> to trigger the I<transform> step.

=item c. Close the source.

=back

=item 5. Add any attributes for your class.

=back

The new source is ready to use, like this...

  $etl->input( 'YourNewSource' );

You can leave off the leading B<ETL::Pipeline::Input::>.

When L<ETL::Pipeline> calls L</run>, it passes the L<ETL::Pipeline> object as
the only parameter.
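
Here is a minimal sketch of what a complete input source might look like. The
class name, the C<path> attribute, and the field name are assumptions for
illustration only - check L<ETL::Pipeline/record> for the exact arguments it
expects.

  # A sketch, not a class shipped with ETL::Pipeline. "path" and the
  # record layout are illustrative assumptions.
  package ETL::Pipeline::Input::TextLines;

  use Moose;
  with 'ETL::Pipeline::Input';

  has 'path' => ( is => 'ro', isa => 'Str', required => 1 );

  sub run {
    my ($self, $etl) = @_;

    # a. Open the source.
    open my $fh, '<', $self->path or die 'Cannot open ' . $self->path . ": $!";

    # b. Loop reading the records. "record" hands each one to the
    #    transform step. I assume it takes a hash reference of fields.
    while (my $line = <$fh>) {
      chomp $line;
      $etl->record( { text => $line } );
    }

    # c. Close the source.
    close $fh;
  }

  no Moose;
  1;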

=head2 Why this way?

Input sources mostly follow the basic algorithm of open, read, process, and
close. I originally had the role define methods for each of these steps. That
was a lot of work, and kind of confusing. This way, the input source only
I<needs> one code block that does all of these steps - in one place. So it's
easier to troubleshoot and write new sources.

In the work that I do, we have one output destination that rarely changes. It's
far more common to write new input sources - especially customized sources.
Making new sources easier saves time. Making it simpler means that more
developers can pick up those tasks.

=head2 Does B<ETL::Pipeline> only work with files?

No. B<ETL::Pipeline::Input> works for any source of data, such as SQL queries,
CSV files, or network sockets. Tailor the C<run> method for whatever suits your
needs.

Because files are most common, B<ETL::Pipeline> comes with a helpful role -
L<ETL::Pipeline::Input::File>. Consume L<ETL::Pipeline::Input::File> in your
input source to access some standardized attributes.
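
For example, a file based source might consume both roles - a sketch with a
hypothetical class name; see L<ETL::Pipeline::Input::File> for the attributes
it actually adds...

  package ETL::Pipeline::Input::MySpreadsheet;

  use Moose;
  # Compose this role plus the file helper role in one "with".
  with 'ETL::Pipeline::Input', 'ETL::Pipeline::Input::File';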

=head2 Upgrading from older versions

L<ETL::Pipeline> version 3 is not compatible with input sources from older
versions. You will need to rewrite your custom input sources, along the lines
of the sketch after this list.

=over

=item Merge the C<setup>, C<finish>, and C<next_record> methods into L</run>.

=item Have L</run> call C<< $etl->record >> in place of C<next_record>.

=item Adjust attributes as necessary.

=back
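
Roughly speaking, the old methods collapse into one C<run> method. The sketch
below shows the general shape; C<_open_source>, C<_read_next>, and
C<_close_source> are hypothetical helpers standing in for whatever your old
C<setup>, C<next_record>, and C<finish> did.

  sub run {
    my ($self, $etl) = @_;

    # What "setup" used to do: open the source.
    $self->_open_source;

    # What "next_record" used to do, inside a loop that now calls "record".
    while (defined( my $record = $self->_read_next )) {
      $etl->record( $record );
    }

    # What "finish" used to do: close the source.
    $self->_close_source;
  }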

=cut

package ETL::Pipeline::Input;

use 5.014000;
use warnings;

use Moose::Role;


our $VERSION = '3.00';


=head1 METHODS & ATTRIBUTES

=head3 path (optional)

If you define this, the standard logging will include it. The attribute is
named for file inputs. But it can return any value that is meaningful to your
users.

=head3 position (optional)

If you define this, the standard logging includes it with error or informational
messages. It can be any value that helps users locate the correct place to
troubleshoot.
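
For instance, a consuming class might keep a simple record counter and expose
it - a sketch; the C<_count> attribute is an assumption, not part of this role.

  # Hypothetical counter, incremented by "run" as it reads records.
  has '_count' => ( is => 'rw', isa => 'Int', default => 0 );

  sub position { return shift->_count; }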

=head3 run (required)

You define this method in the consuming class. It should open the file, read
each record, call L<ETL::Pipeline/record> after each record, and close the file.
This method is the workhorse. It defines the main ETL loop.
L<ETL::Pipeline/record> acts as a callback.

I say I<file>. It really means I<input source> - whatever that might be.

Some important things to remember about C<run>...

=over

=item C<run> receives one parameter - the L<ETL::Pipeline> object.

=item It should include all the code to open, read, and close the input source.

=item After reading a record, call L<ETL::Pipeline/record>.

=back

If your code encounters an error, B<run> can call L<ETL::Pipeline/status> with
the error message. L<ETL::Pipeline/status> should automatically include the
record count with the error message. You should add any other troubleshooting
information such as file names or key fields.

  $etl->status( "ERROR", "Error message here for id $id" );

For fatal errors, I recommend using the C<croak> command from L<Carp>.
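
A sketch of both cases inside C<run> - C<path> here is an illustrative
attribute from the earlier example, not something this role provides.

  use Carp;

  # Recoverable problem: report it through the pipeline and keep reading.
  $etl->status( 'ERROR', 'Missing key field in ' . $self->source );

  # Fatal problem: stop with croak so the caller sees where it came from.
  open my $fh, '<', $self->path or croak 'Cannot open ' . $self->path . ": $!";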

=cut

requires 'run';


=head3 source

The location in the input source of the current record. For example, for files
this would be the file name and character position. The consuming class can set
this value in its L<run|ETL::Pipeline::Input/run> method.

L<Logging|ETL::Pipeline/log> uses this when displaying errors or informational
messages. The value should be something that helps the user troubleshoot issues.
It can be whatever is appropriate for the input source.

B<NOTE:> Don't capitalize the first letter, unless it's supposed to be.
L<Logging|ETL::Pipeline/log> will upper case the first letter if it's
appropriate.
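
For example, a file based C<run> might refresh C<source> before handing off
each record - a sketch; the C<path> attribute is an assumption, and C<$.> is
Perl's current input line number.

  # Inside "run", just before handing the record to the pipeline.
  $self->source( sprintf '%s, line %d', $self->path, $. );
  $etl->record( $record );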

=cut

has 'source' => (
	default => '',
	is      => 'rw',
	isa     => 'Str',
);

=head1 SEE ALSO

L<ETL::Pipeline>, L<ETL::Pipeline::Input::File>, L<ETL::Pipeline::Output>

=head1 AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vumc.org>

=head1 LICENSE

Copyright 2021 (c) Vanderbilt University Medical Center

This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut

no Moose;

# Required by Perl to load the module.
1;