File Coverage

blib/lib/XML/ASCX12.pm
Criterion Covered Total %
statement 27 202 13.3
branch 0 108 0.0
condition 0 46 0.0
subroutine 9 21 42.8
pod 4 4 100.0
total 40 381 10.5


line stmt bran cond sub pod time code
1             #
2             # $Id: ASCX12.pm,v 1.17 2004/09/28 14:59:34 brian.kaney Exp $
3             #
4             # XML::ASCX12
5             #
6             # Copyright (c) Vermonster LLC
7             #
8             # This library is free software; you can redistribute it and/or
9             # modify it under the terms of the GNU Lesser General Public
10             # License as published by the Free Software Foundation; either
11             # version 2.1 of the License, or (at your option) any later version.
12             #
13             # This library is distributed in the hope that it will be useful,
14             # but WITHOUT ANY WARRANTY; without even the implied warranty of
15             # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
16             # Lesser General Public License for more details.
17             #
18             # For questions, comments, contributions and/or commercial support
19             # please contact:
20             #
21             # Vermonster LLC
22             # 312 Stuart St. 2nd Floor
23             # Boston, MA 02116 US
24             #
25             # vim: set expandtab tabstop=4 shiftwidth=4
26             #
27              
28             =head1 NAME
29              
30             XML::ASCX12 - ASCX12 EDI to XML Module
31              
32             =head1 SYNOPSIS
33              
34             use XML::ASCX12;
35            
36             my $ascx12 = new XML::ASCX12();
37             $ascx12->convertfile("/path/to/edi_input", "/path/to/xml_output");
38              
39             =head1 INFORMATION
40              
41             =head2 Module Description
42              
43             XML::ASCX12 started as a project to process X12 EDI files from shipping
44             vendors (i.e. transaction sets 110, 820 and 997). However, this module can be
45             extended to support any valid transaction set (catalog).
46              
47             =head2 Why are you doing this?
48              
49             If you've ever taken a look at an ASCX12 document you'll see why. The EDI format is
50             very compact, which makes is great for transmission. However this comes at a cost.
51              
52             The main challenge when dealing with EDI data is parsing through the structure.
53             Here we find loops within loops within loops. In this non-extensible, flat format,
54             human parsing is nearly impossible and machine parsing is a task at best.
55              
56             A quick background of how a typical EDI is formed:
57              
58              
59             +----> ISA - Interchange Control Header
60             | GS - Functional Group Header <--------+
61             | ST - Transaction Set Header |
62             Envelope [transaction set specific] Functional Group
63             | SE - Transaction Set Trailer |
64             | GE - Functional Group Trailer <--------+
65             +----> ISE - Interchange Control Trailer
66              
67              
68              
69             The Transmission Envelope can have one or more Functional Group. A Functional Group
70             can have one or more Transaction Set. Then each specific catalog (transaction set)
71             can have it's own hierarchical rules.
72              
73             This sort of structure really lends itself to XML. So using the power of Perl,
74             this module was created to make accessing EDI information easier.
75              
76             To learn more, the official ASC X12 group has a website L.
77              
78             =head2 Module Limitations
79              
80             This is a new module and has a few limitations.
81              
82             =over 4
83              
84             =item * EDI -> XML
85              
86             This module converts from EDI to XML. If you want to go in the other direction, suggest
87             creating an XSL stylesheet and use L or similar to preform a transformation.
88              
89             =item * Adding Transaction Sets
90              
91             Adding new catalogs is a manual process. The L and
92             the L need to be manually updated. A future development effort
93             could store this information in dbm files with an import script if demand exists.
94              
95             =back
96              
97             =head2 Style Guide
98              
99             You will (hopefully) find consistent coding style throughout this module.
100             Any private variable or method if prefixed with an underscore (C<_>). Any
101             static method or variable is named in C.
102              
103             The tabs are set at 4 spaces and the POD is physically close to the stuff it
104             is describing to promote fantastic ongoing documentation.
105              
106             =cut
107             package XML::ASCX12;
108              
109 1     1   28683 use 5.008004;
  1         4  
  1         42  
110 1     1   5 use strict;
  1         1  
  1         33  
111 1     1   5 use warnings;
  1         6  
  1         38  
112              
113 1     1   5 no warnings 'utf8';
  1         1  
  1         43  
114 1     1   1088 use bytes;
  1         11  
  1         4  
115              
116             our $VERSION = '0.04';
117              
118             =head1 REQUIREMENTS
119              
120             We use the L module to handle errors. Some day there may be a better
121             error handler and maybe an error object to reference, but for now it croaks
122             when there is a problem.
123              
124             L module is required and probably part of this package,
125             as is the L.
126              
127             =cut
128 1     1   49 use Carp qw(croak);
  1         2  
  1         89  
129              
130             ## use Data::Dumper;
131 1     1   607 use XML::ASCX12::Catalogs qw($LOOPNEST load_catalog);
  1         2  
  1         304  
132 1     1   19471 use XML::ASCX12::Segments qw($SEGMENTS $ELEMENTS);
  1         5  
  1         224  
133              
134             =head1 VARIABLE AND METHODS
135              
136             =head2 Private Variables
137              
138             These variables are not exported and not intended to be accessed externally.
139             They are listed here for documentation purposes only.
140              
141             =over 4
142              
143             =item C<@_LOOPS>
144              
145             Dynamic and keeps track of which loop we are on.
146              
147             =item C<%_XMLREP>
148              
149             Static variable used to lookup bad XML characters.
150              
151             =item C<$_XMLHEAD>
152              
153             Static variable containing the XML header for the output.
154              
155             =back
156              
157             =cut
158 1     1   9 use vars qw(@_LOOPS %_XMLREP $_XMLHEAD);
  1         2  
  1         2801  
159              
160             %_XMLREP = (
161             '&' => '&'
162             ,'<' => '<'
163             ,'>' => '>'
164             ,'"' => '"'
165             );
166              
167             $_XMLHEAD = qq||;
168              
169             =head2 Public Methods
170              
171             =over 4
172              
173             =item object = new([$segment_terminator], [$data_element_separator], [$subelement_separator])
174              
175              
176             The new method is the OO constructor. The default for the segment terminator
177             is ASCII C<85> hex. The default for the data element separator is ASCII C<1D> hex. The default
178             for the sub-element separator is ASCII C<1F> hex.
179              
180             my $xmlrpc = new XML::ASCX12();
181              
182             The defaults can be overridden by passing them into the constructor.
183              
184             my $xmlrpc = new XML::ASCX12('\x0D', '\x2A', '\x3A');
185              
186             The object that returns is now ready to transform EDI files.
187              
188             =cut
189             sub new
190             {
191 0     0 1   my ($name, $st, $des, $sbs) = @_;
192              
193 0 0         $st = '\x85' unless $st;
194 0 0         $des = '\x1D' unless $des;
195 0 0         $sbs = '\x1F' unless $sbs;
196              
197 0   0       my $class = ref($name) || $name;
198 0           my $self = { ST=>$st, DES=>$des, SBS=>$sbs };
199              
200 0           bless ($self, $class);
201 0           return $self;
202             }
203              
204             =item boolean = $obj->convertfile($input, $output)
205              
206              
207             This method will transform and EDI file to XML using the configuration information
208             passed in from the constructor.
209              
210             my $xmlrpc = new XML::ASCX12();
211             $xmlrpc->convertfile('/path/to/EDI.dat', '/path/to/EDI.xml');
212              
213             You may also pass filehandles (or references to filehandles):
214              
215             $xmlrpc->convertfile(\*INFILE, \*OUTFILE);
216              
217             =cut
218             sub convertfile
219             {
220 0     0 1   my ($self, $in, $out) = @_;
221 0           my ($inhandle, $outhandle);
222 0           my ($bisinfile, $bisoutfile);
223              
224 0           $self->_unload_catalog();
225              
226 0 0 0       if (ref($out) eq "GLOB" or ref(\$out) eq "GLOB"
      0        
      0        
227             or ref($out) eq 'FileHandle' or ref($out) eq 'IO::Handle')
228             {
229 0           $outhandle = $out;
230             }
231             else
232             {
233 0           local(*XMLOUT);
234 0 0         open (XMLOUT, "> $out") || croak "Cannot open file \"$out\" for writing: $!";
235 0           $outhandle = *XMLOUT;
236 0           $bisoutfile = 1;
237             }
238              
239 0           my $st_check=0;
240 0           my $des_check=0;
241              
242 0           print {$outhandle} $XML::ASCX12::_XMLHEAD;
  0            
243             {
244 0 0 0       if (ref($in) eq "GLOB" or ref(\$in) eq "GLOB"
  0   0        
      0        
245             or ref($in) eq 'FileHandle' or ref($in) eq 'IO::Handle')
246             {
247 0           $inhandle = $in;
248             }
249             else
250             {
251 0           local(*EDIIN);
252 0 0         open (EDIIN, "< $in") || croak "Cannot open file \"$in\" file for reading: $!";
253 0           $inhandle = *EDIIN;
254 0           $bisinfile = 1;
255             }
256 0           binmode($inhandle);
257              
258 0           (my $eos = $self->{ST}) =~ s/^\\/0/;
259 0           local $/ = pack("C*", oct($eos));
260              
261             # Looping per-segment for processing
262 0           while (<$inhandle>)
263             {
264 0 0         if (!$st_check) { $st_check = 1 if m/$self->{ST}/; }
  0 0          
265 0 0         if (!$des_check) { $des_check = 1 if m/$self->{DES}/; }
  0 0          
266              
267 0           chomp;
268 0           print {$outhandle} $self->_proc_segment($_);
  0            
269             }
270             # This is done to close any open loops
271             # XXX Is there a better way to "run on more time"?
272 0           print {$outhandle} $self->_proc_segment('');
  0            
273             }
274 0           print {$outhandle} '';
  0            
275              
276 0 0 0       (close($inhandle) || croak "Cannot close output file \"$out\": $!") if $bisinfile;
277 0 0 0       (close($outhandle)|| croak "Cannot close input file \"$in\": $!") if $bisoutfile;
278              
279 0 0         croak "EDI Parsing Error: Segment Terminator \"$self->{ST}\" not found" unless $st_check;
280 0 0         croak "EDI Parsing Error: Data Element Seperator \"$self->{DES}\" not found" unless $des_check;
281              
282 0           return 1;
283             }
284              
285             =item string = $obj->convertdata($input)
286              
287              
288             This method will transform an EDI data stream, returning wellformed XML.
289              
290             my $xmlrpc = new XML::ASCX12();
291             my $xml = $xmlrpc->convertdata($binary_edi_data);
292              
293              
294             =cut
295             sub convertdata
296             {
297 0     0 1   my ($self, $in) = @_;
298              
299 0 0         croak "EDI Parsing Error: Segment Terminator \"$self->{ST}\" not found" unless ($in =~ m/$self->{ST}/);
300 0 0         croak "EDI Parsing Error: Data Element Seperator \"$self->{DES}\" not found" unless ($in =~ m/$self->{DES}/);
301              
302 0           my $out = $XML::ASCX12::_XMLHEAD;
303 0           (my $eos = $self->{ST}) =~ s/^\\/0/;
304 0           my @data = split(pack("C*", oct($eos)), $in);
305 0           foreach(@data)
306             {
307 0           $out .= $self->_proc_segment($_);
308             }
309 0           $out .= $self->_proc_segment('');
310              
311 0           return $out;
312             }
313              
314             =item string = XMLENC($string)
315              
316              
317             Static public method used to encode and return data suitable for ASCII XML CDATA
318              
319             $xml_ready_string = XML::ASCX12::XMLENC($raw_data);
320              
321             =cut
322             sub XMLENC
323             {
324 0     0 1   my $str = $_[0];
325 0 0         if ($str)
326             {
327 0           $str =~ s/([&<>"])/$_XMLREP{$1}/ge; # relace any &<>" characters
  0            
328 0           $str =~ s/[\x80-\xff]//ge; # get rid on any non-ASCII characters
329 0           $str =~ s/[\x01-\x1f]//ge; # get rid on any non-ASCII characters
330             }
331 0           return $str;
332             }
333              
334             =back
335              
336             =head2 Private Methods
337              
338             =over 4
339              
340             =item string = _proc_segment($segment_data);
341              
342              
343             This is an internal private method that processes a segment using $LOOPNEST.
344             It is called by C or C while looping per-segment.
345              
346             =cut
347             sub _proc_segment
348             {
349 0     0     my ($self, $segment) = @_;
350 0 0         if (defined $XML::ASCX12::Catalogs::IS_CHILD) {
351 0           return $self->_proc_segment_in_child($segment);
352             }
353 0           $segment =~ s/\n//g;
354 0 0         if ($segment =~ m/[0-9A-Za-z]*/)
355             {
356 0           my ($segcode, @elements) = split(/$self->{DES}/, $segment);
357 0 0 0       if ($segcode and $segcode eq "ST")
358             {
359 0           $self->_unload_catalog();
360 0           $self->load_catalog($elements[0]);
361             ## IS_CHILD not defined until after Catalog loaded
362             ## Use alternate parsing starting with "ST" segment
363 0 0         if (defined $XML::ASCX12::Catalogs::IS_CHILD) {
364 0           return $self->_proc_segment_in_child($segment);
365             }
366             }
367              
368             # check to see if we need to close a loop
369 0 0         my $curloop = $XML::ASCX12::Segments::SEGMENTS->{$segcode}[3] if $segcode;
370 0           my $xml = '';
371 0 0         if (my $tmp = $self->_closeloop($curloop, $self->{lastloop}, $segcode)) { $xml .= $tmp; }
  0            
372 0 0         if (@elements)
373             {
374             # check to see if we need to open a loop
375 0 0         if (my $tmp = $self->_openloop($curloop, $self->{lastloop})) { $xml .= $tmp; }
  0            
376              
377             # now the standard segment (and elements)
378 0           $xml .= '
379 0 0         $xml .= ' desc="'.XML::ASCX12::XMLENC($XML::ASCX12::Segments::SEGMENTS->{$segcode}[0]).'"' if $XML::ASCX12::Segments::SEGMENTS->{$segcode};
380 0           $xml .= '>';
381            
382             # make our elements
383 0           $xml .= $self->_proc_element($segcode, @elements);
384            
385             # close the segment
386 0           $xml .= '';
387              
388             # keep track
389 0           $self->{lastloop} = $curloop;
390             }
391 0           return $xml;
392             }
393             }
394              
395             =item string = _proc_segment_in_child($segment_data);
396              
397              
398             This is an internal private method that processes a segment using $IN_CHILD.
399             It is called by C<_proc_segment()> when $IN_CHILD is defined.
400              
401             =cut
402             sub _proc_segment_in_child
403             {
404 0     0     my ($self, $segment) = @_;
405 0           $segment =~ s/\n//g;
406 0   0       $self->{lastloop} ||= '';
407 0 0         if ($segment =~ m/[0-9A-Za-z]*/)
408             {
409 0           my ($segcode, @elements) = split(/$self->{DES}/, $segment);
410 0 0 0       if ($segcode and $segcode eq "ST")
    0          
411             {
412              
413             ## warn "segcode = $segcode\n";
414             ## warn Dumper $self, $XML::ASCX12::Catalogs::LOOPNEST,
415             ## \@_LOOPS, $XML::ASCX12::Catalogs::IS_CHILD;
416             }
417             elsif ($segcode) {
418             ## warn "segcode = $segcode\n";
419             }
420             else {
421             ## warn "no segcode\n";
422             ## final loop close
423 0           return $self->_closeloop('', $self->{lastloop}, '');
424             }
425 0           my $xml = '';
426 0           my $is_child;
427 0           my $curloop = $_LOOPS[-1];
428 0           until ( defined ($is_child =
429             $XML::ASCX12::Catalogs::IS_CHILD->{$curloop}->{$segcode}) ) {
430 0           $xml .= $self->_execclose($curloop);
431             ## warn "WCB close tag: $xml\n";
432             ## warn Dumper \@_LOOPS;
433 0           $curloop = $_LOOPS[-1];
434             }
435             ## warn "WCB IS_CHILD = $is_child, $curloop, $segcode, $_LOOPS[-1]\n";
436            
437 0 0         if (@elements)
438             {
439             # check to see if we need to open a loop
440 0 0         if ($is_child eq '0') {
441 0           push (@_LOOPS, $segcode);
442             ## warn 'WCB open tag: \n";
443 0           $xml .= '';
444             }
445              
446             # now the standard segment (and elements)
447 0           $xml .= '
448 0 0         $xml .= ' desc="'.XML::ASCX12::XMLENC($XML::ASCX12::Segments::SEGMENTS->{$segcode}[0]).'"' if $XML::ASCX12::Segments::SEGMENTS->{$segcode};
449 0           $xml .= '>';
450            
451             # make our elements
452 0           $xml .= $self->_proc_element($segcode, @elements);
453            
454             # close the segment
455 0           $xml .= '';
456              
457             # keep track
458 0           $self->{lastloop} = $curloop;
459             }
460 0           return $xml;
461             }
462             }
463              
464             =item string = _proc_element($segment_code, @elements)
465              
466              
467             This is a private method called by C<_proc_segment()>. Each segment consists of
468             elements, this is where they are processed.
469              
470             =cut
471             sub _proc_element
472             {
473 0     0     my ($self, $segcode, @elements) = @_;
474 0           my $i = 1;
475 0           my $xml = '';
476 0           foreach (@elements)
477             {
478 0 0         if ($_ =~ /[0-9A-Za-z]/)
479             {
480 0           my $elename;
481 0 0         $elename = $segcode.$i if $i >= 10;
482 0 0         $elename = $segcode.'0'.$i if $i < 10;
483 0           $xml .= '
484 0 0         $xml .= ' desc="'.XML::ASCX12::XMLENC($XML::ASCX12::Segments::ELEMENTS->{$elename}[0]).'"' if $XML::ASCX12::Segments::ELEMENTS->{$elename};
485 0           $xml .= '>'.XML::ASCX12::XMLENC($_).'';
486             }
487 0           $i++;
488             }
489 0           return $xml;
490             }
491              
492              
493             =item string = _openloop($loop_to_open, $last_opened_loop)
494              
495              
496             This is an internal private method. It will either open a loop if we can
497             or return nothing.
498              
499             =cut
500             sub _openloop
501             {
502 0     0     my ($self, $newloop, $lastloop) = @_;
503 0 0         if (XML::ASCX12::_CANHAVE($lastloop, $newloop))
504             {
505 0           push (@_LOOPS, $newloop);
506 0           return '';
507             }
508 0           return;
509             }
510              
511             =item void = _closeloop($loop_to_close, $last_opened_loop, $current_segment, $trigger)
512              
513              
514             This routine is a private method. It will recurse to close any open loops.
515              
516             =cut
517             sub _closeloop
518             {
519 0     0     my ($self, $newloop, $lastloop, $currentseg, $once) = @_;
520 0   0       $lastloop ||= '';
521 0 0         $once = 0 unless $once;
522 0           my $xml;
523             # Case when there are two consecutive loops
524 0 0 0       if ($newloop and $lastloop and $currentseg eq $lastloop and ($currentseg ne ""))
    0 0        
      0        
525             {
526 0           $xml = $self->_execclose($lastloop);
527 0           return $xml;
528             }
529             # "Standard Case"
530             elsif (XML::ASCX12::_CANHAVE($newloop, $lastloop))
531             {
532 0           $xml = $self->_execclose($lastloop);
533 0           return $xml;
534             }
535             # Recusrively close loops
536             else
537             {
538 0           my @parent_loops_to_close = ();
539 0 0         if (@_LOOPS)
540             {
541 0           foreach my $testloop (reverse @_LOOPS) #Close in reverse order
542             {
543             # found a loop, see which ones we ough to close
544 0 0         if ($testloop eq $newloop)
545             {
546 0 0         if (@parent_loops_to_close)
547             {
548 0           foreach my $closeme (@parent_loops_to_close)
549             {
550 0 0         $xml .= $self->_execclose($closeme) if $closeme;
551             }
552             # See if the current loop ought to be closed
553 0 0         if ($once != 1)
554             {
555 0 0         if (my $tmp = $self->_closeloop($newloop, $self->{lastloop}, $currentseg, 1))
556             {
557 0           $xml .= $tmp;
558             }
559             }
560 0           return $xml;
561             }
562             }
563             # Push into the loops to close
564             else
565             {
566 0 0         if ($testloop) { push (@parent_loops_to_close, $testloop); }
  0            
567             }
568             }
569             }
570             }
571 0           return;
572             }
573              
574             =item string = _execclose($loop_to_close)
575              
576              
577             Private internal method to actually return the XML that signifies a closed
578             loop. It is called by C<_closeloop()>.
579              
580             =cut
581             sub _execclose
582             {
583 0     0     my ($self, $loop) = @_;
584 0 0         return unless $loop;
585 0 0         if ($loop =~ /[A-Za-z0-9]*/)
586             {
587 0           pop @_LOOPS;
588 0           $self->{lastloop} = $_LOOPS[-1];
589 0 0         return '' if XML::ASCX12::XMLENC($loop);
590             }
591             }
592              
593             =item void = _unload_catalog()
594              
595              
596             Private method that clears out catalog data and loads standard ASCX12 structure.
597             Also initializes ISA and GS data common to all Catalogs.
598              
599             =cut
600             sub _unload_catalog
601             {
602 0     0     my $self = shift;
603 0           $XML::ASCX12::Catalogs::LOOPNEST = ();
604 0           $XML::ASCX12::Catalogs::IS_CHILD = undef;
605 0           $self->load_catalog(0);
606             }
607              
608             =item boolean = _CANHAVE($parent_loop, $child_loop)
609              
610              
611             This is a private static method. It uses the rules in the L
612             to determine if a parent is allowed to have the child loop. Returns C<0> or C<1>.
613              
614             =cut
615              
616             sub _CANHAVE
617             {
618 0     0     my ($parent, $child) = @_;
619 0 0         if (!$parent) { return 1; } # root-level can have anything
  0            
620 0 0         return 0 unless $child;
621 0 0         foreach (@{$XML::ASCX12::Catalogs::LOOPNEST->{$parent}}) { if ($_ eq $child) { return 1; } }
  0            
  0            
  0            
622 0           return 0;
623             }
624              
625             =back
626              
627             =head1 TODO
628              
629             Here are some things that would make this module even better. They are in no particular order:
630              
631             =over 4
632              
633             =item * Error Handling
634              
635             Maybe throw in an error object to keep track of things
636              
637             =item * Encoding Support
638              
639             Anyone that could review to make sure we are using the correct encodings
640             We basically read in the EDI file in binary and use the ASCII HEX-equivalent for the
641             separators. Many EDI-producing systems use EBCDIC and not UTF-8 so be careful when
642             specifying the values.
643              
644             =item * B Transaction Set (Catalog) Library
645              
646             Make a live repository of transaction set data (catalogs). I'd really like use XML to describe
647             each catalog and import them to local dbm files or tied hashes during install and via an update
648             script. This project will be driven if there is adaquate demand.
649              
650             According to the ASC X12 website (L), there are 315 transaction sets. This module has 4, so
651             there are 311 that could be added.
652              
653             Documentation for Catalog 175 (Court Notice Transaction Set) is available from
654             the US Bankruptcy Courts (L).
655              
656              
657             =item * XML Documentation
658              
659             Create a DTD and maybe even an XML Schema for the XML output. There ought to be better
660             documentation here.
661              
662             =back
663              
664              
665             =head1 AUTHORS
666              
667             Brian Kaney >, Jay Powers >
668              
669             L
670              
671             Copyright (c) 2004 Vermonster LLC. All rights reserved.
672              
673             This library is free software. You can redistribute it and/or modify
674             it under the terms of the GNU Lesser General Public License as
675             published by the Free Software Foundation; either version 2 of the
676             License, or (at your option) any later version.
677              
678             Basically you may use this library in commercial or non-commercial applications.
679             However, If you make any changes directly to any files in this library, you are
680             obligated to submit your modifications back to the authors and/or copyright holder.
681             If the modification is suitable, it will be added to the library and released to
682             the back to public. This way we can all benefit from each other's hard work!
683              
684             If you have any questions, comments or suggestions please contact the author.
685              
686             =head1 SEE ALSO
687              
688             L, L and L
689              
690             =cut
691             1;