File Coverage

blib/lib/Email/Address/List.pm
Criterion Covered Total %
statement 63 72 87.5
branch 25 44 56.8
condition 12 20 60.0
subroutine 6 6 100.0
pod 1 1 100.0
total 107 143 74.8


line stmt bran cond sub pod time code
1 6     6   1482807 use strict;
  6         14  
  6         247  
2 6     6   33 use warnings;
  6         12  
  6         459  
3 6     6   122 use 5.010;
  6         23  
4              
5             package Email::Address::List;
6              
7             our $VERSION = '0.07';
8 6     6   3647 use Email::Address;
  6         72253  
  6         15873  
9              
10             =head1 NAME
11              
12             Email::Address::List - RFC close address list parsing
13              
14             =head1 SYNOPSIS
15              
16             use Email::Address::List;
17              
18             my $header = <<'END';
19             Foo Bar , (an obsolete comment),,,
20             a group:
21             a . weird . address @
22             for-real .biz
23             ; invalid thingy, <
24             more@example.com
25             >
26             END
27              
28             my @list = Email::Address::List->parse($header);
29             foreach my $e ( @list ) {
30             if ($e->{'type'} eq 'mailbox') {
31             print "an address: ", $e->{'value'}->format ,"\n";
32             }
33             else {
34             print $e->{'type'}, "\n"
35             }
36             }
37              
38             # prints:
39             # an address: "Foo Bar"
40             # comment
41             # group start
42             # an address: a.weird.address@forreal.biz
43             # group end
44             # unknown
45             # an address: more@example.com
46              
47             =head1 DESCRIPTION
48              
49             Parser for From, To, Cc, Bcc, Reply-To, Sender and
50             previous prefixed with Resent- (eg Resent-From) headers.
51              
52             =head1 REASONING
53              
54             L is good at parsing addresses out of any text
55             even mentioned headers and this module is derived work
56             from Email::Address.
57              
58             However, mentioned headers are structured and contain lists
59             of addresses. Most of the time you want to parse such field
60             from start to end keeping everything even if it's an invalid
61             input.
62              
63             =head1 METHODS
64              
65             =head2 parse
66              
67             A class method that takes a header value (w/o name and :) and
68             a set of named options, for example:
69              
70             my @list = Email::Address::List->parse( $line, option => 1 );
71              
72             Returns list of hashes. Each hash at least has 'type' key that
73             describes the entry. Types:
74              
75             =over 4
76              
77             =item mailbox
78              
79             A mailbox entry with L object under value key.
80              
81             If mailbox has obsolete parts then 'obsolete' is true.
82              
83             If address (not display-name/phrase or comments, but
84             local-part@domain) contains not ASCII chars then 'not_ascii' is
85             set to true. According to RFC 5322 not ASCII chars are not
86             allowed within mailbox. However, there are no big problems if
87             those are used and actually RFC 6532 extends a few rules
88             from 5322 with UTF8-non-ascii. Either use the feature or just
89             skip such addresses with skip_not_ascii option.
90              
91             Whitespace and Unicode format characters (category C<\p{Cf}>:
92             ZERO WIDTH SPACE, ZWJ, ZWNJ, BOM, soft hyphen, bidi marks, etc.)
93             are stripped from C and C. Such characters
94             are typically introduced by copy/paste; left in, they produce
95             undeliverable mail. Stripping does not apply inside quoted
96             local-parts (their contents are preserved literally), nor to the
97             display-name or comments.
98              
99             =item group start
100              
101             Some headers with mailboxes may contain groupped addresses. This
102             element is returned for position where group starts. Under value
103             key you find name of the group. B that value is not post
104             processed at the moment, so it may contain spaces, comments,
105             quoted strings and other noise. Author willing to take patches
106             and warns that this will be changed at some point without additional
107             notifications, so if you need groups info then you better send a
108             patch :)
109              
110             Groups can not be nested, but one field may have multiple groups or
111             mix of addresses that are in a group and not in any.
112              
113             See skip_groups option.
114              
115             =item group end
116              
117             Returned when a group ends.
118              
119             =item comment
120              
121             Obsolete syntax allows one to use standalone comments between mailboxes
122             that can not be addressed to any mailbox. In such situations a comment
123             returned as an entry of this type. Comment itself is under value.
124              
125             =item unknown
126              
127             Returned if parser met something that shouldn't be there. Parser
128             tries to recover by jumping over to next comma (or semicolon if inside
129             group) that is out quoted string or comment, so "foo, bar, baz" string
130             results in three unknown entries. Jumping over comments and quoted strings
131             means that parser is very sensitive to unbalanced quotes and parens,
132             but it's on purpose.
133              
134             =back
135              
136             It can be controlled which elements are skipped, for example:
137              
138             Email::Address::List->parse($line, skip_unknown => 1, ...);
139              
140             =over 4
141              
142             =item skip_comments
143              
144             Skips comments between mailboxes. Comments inside and next to a mailbox
145             are not skipped, but returned as part of mailbox entry.
146              
147             =item skip_not_ascii
148              
149             Skips mailboxes where address part has not ASCII characters.
150              
151             =item skip_groups
152              
153             Skips group starts and end elements, however emails within groups are
154             still returned.
155              
156             =item skip_unknown
157              
158             Skip anything that is not recognizable. It still tries to recover as
159             described earlier.
160              
161             =back
162              
163             =cut
164              
165             # mailbox = name-addr / addr-spec
166             # display-name = phrase
167             #
168             # from = "From:" mailbox-list CRLF
169             # sender = "Sender:" mailbox CRLF
170             # reply-to = "Reply-To:" address-list CRLF
171             #
172             # to = "To:" address-list CRLF
173             # cc = "Cc:" address-list CRLF
174             # bcc = "Bcc:" [address-list / CFWS] CRLF
175             #
176             # resent-from = "Resent-From:" mailbox-list CRLF
177             # resent-sender = "Resent-Sender:" mailbox CRLF
178             # resent-to = "Resent-To:" address-list CRLF
179             # resent-cc = "Resent-Cc:" address-list CRLF
180             # resent-bcc = "Resent-Bcc:" [address-list / CFWS] CRLF
181             #
182             # obs-from = "From" *WSP ":" mailbox-list CRLF
183             # obs-sender = "Sender" *WSP ":" mailbox CRLF
184             # obs-reply-to = "Reply-To" *WSP ":" address-list CRLF
185             #
186             # obs-to = "To" *WSP ":" address-list CRLF
187             # obs-cc = "Cc" *WSP ":" address-list CRLF
188             # obs-bcc = "Bcc" *WSP ":" (address-list / (*([CFWS] ",") [CFWS])) CRLF
189             #
190             # obs-resent-from = "Resent-From" *WSP ":" mailbox-list CRLF
191             # obs-resent-send = "Resent-Sender" *WSP ":" mailbox CRLF
192             # obs-resent-date = "Resent-Date" *WSP ":" date-time CRLF
193             # obs-resent-to = "Resent-To" *WSP ":" address-list CRLF
194             # obs-resent-cc = "Resent-Cc" *WSP ":" address-list CRLF
195             # obs-resent-bcc = "Resent-Bcc" *WSP ":" (address-list / (*([CFWS] ",") [CFWS])) CRLF
196             # obs-resent-mid = "Resent-Message-ID" *WSP ":" msg-id CRLF
197             # obs-resent-rply = "Resent-Reply-To" *WSP ":" address-list CRLF
198              
199             our $COMMENT_NEST_LEVEL ||= 2;
200              
201             our %RE;
202             our %CRE;
203              
204             $RE{'CTL'} = q{\x00-\x1F\x7F};
205             $RE{'special'} = q{()<>\\[\\]:;@\\\\,."};
206              
207             $RE{'text'} = qr/[^\x0A\x0D]/;
208              
209             $RE{'quoted_pair'} = qr/\\$RE{'text'}/;
210              
211             $RE{'atext'} = qr/[^$RE{'CTL'}$RE{'special'}\s]/;
212             $RE{'ctext'} = qr/[^()\\]++/;
213             $RE{'qtext'} = qr/[^\\"]/;
214             $RE{'dtext'} = qr/[^\[\]\\]/;
215              
216             ($RE{'ccontent'}, $RE{'comment'}) = (q{})x2;
217             for (1 .. $COMMENT_NEST_LEVEL) {
218             $RE{'ccontent'} = qr/$RE{'ctext'}|$RE{'quoted_pair'}|$RE{'comment'}/;
219             $RE{'comment'} = qr/(?>\s*+\((?:\s*+$RE{'ccontent'})*+\s*+\)\s*+)/;
220             }
221             $RE{'cfws'} = qr/$RE{'comment'}++|\s*+/;
222              
223             $RE{'qcontent'} = qr/$RE{'qtext'}|$RE{'quoted_pair'}/;
224             $RE{'quoted-string'} = qr/$RE{'cfws'}"$RE{'qcontent'}*+"$RE{'cfws'}/;
225              
226             $RE{'atom'} = qr/$RE{'cfws'}$RE{'atext'}++$RE{'cfws'}/;
227              
228             $RE{'word'} = qr/$RE{'atom'} | $RE{'quoted-string'}/x;
229             $RE{'phrase'} = qr/$RE{'word'}+/x;
230             $RE{'display-name'} = $RE{'phrase'};
231              
232             $RE{'dot_atom_text'} = qr/$RE{'atext'}++(?:\.$RE{'atext'}++)*/;
233             $RE{'dot_atom'} = qr/$RE{'cfws'}$RE{'dot_atom_text'}$RE{'cfws'}/;
234             $RE{'local-part'} = qr/$RE{'dot_atom'}|$RE{'quoted-string'}/;
235              
236             $RE{'dcontent'} = qr/$RE{'dtext'}|$RE{'quoted_pair'}/;
237             $RE{'domain_literal'} = qr/$RE{'cfws'}\[(?:\s*$RE{'dcontent'})*\s*\]$RE{'cfws'}/;
238             $RE{'domain'} = qr/$RE{'dot_atom'}|$RE{'domain_literal'}/;
239              
240             $RE{'addr-spec'} = qr/$RE{'local-part'}\@$RE{'domain'}/;
241             $RE{'angle-addr'} = qr/$RE{'cfws'} < $RE{'addr-spec'} > $RE{'cfws'}/x;
242              
243             $RE{'name-addr'} = qr/$RE{'display-name'}?$RE{'angle-addr'}/;
244             $RE{'mailbox'} = qr/(?:$RE{'name-addr'}|$RE{'addr-spec'})$RE{'comment'}*/;
245              
246             $CRE{'addr-spec'} = qr/($RE{'local-part'})\@($RE{'domain'})/;
247             $CRE{'mailbox'} = qr/
248             (?:
249             ($RE{'display-name'})?($RE{'cfws'})<$CRE{'addr-spec'}>($RE{'cfws'})
250             |$CRE{'addr-spec'}
251             )($RE{'comment'}*+)
252             /x;
253              
254             $RE{'dword'} = qr/$RE{'cfws'} (?: $RE{'atom'} | \. | "$RE{'qcontent'}++" ) $RE{'cfws'}/x;
255             $RE{'obs-phrase'} = qr/$RE{'word'} $RE{'dword'}*+/x;
256             $RE{'obs-display-name'} = $RE{'obs-phrase'};
257             $RE{'obs-route'} = qr/
258             (?:$RE{'cfws'}|,)*
259             \@$RE{'domain'}
260             (?:,$RE{'cfws'}?(?:\@$RE{'domain'})?)*
261             :
262             /x;
263             $RE{'obs-domain'} = qr/$RE{'atom'}(?:\.$RE{'atom'})*|$RE{'domain_literal'}/;
264             $RE{'obs-local-part'} = qr/$RE{'word'}(?:\.$RE{'word'})*/;
265             $RE{'obs-addr-spec'} = qr/$RE{'obs-local-part'}\@$RE{'obs-domain'}/;
266             $CRE{'obs-addr-spec'} = qr/($RE{'obs-local-part'})\@($RE{'obs-domain'})/;
267             $CRE{'obs-mailbox'} = qr/
268             (?:
269             ($RE{'obs-display-name'})?
270             ($RE{'cfws'})< $RE{'obs-route'}? $CRE{'obs-addr-spec'} >($RE{'cfws'})
271             |$CRE{'obs-addr-spec'}
272             )($RE{'comment'}*+)
273             /x;
274              
275             sub parse {
276 324     324 1 2070630 my $self = shift;
277 324 50       2150 my %args = @_%2? (line => @_) : @_;
278 324         1024 my $line = delete $args{'line'};
279              
280 324         638 my $in_group = 0;
281              
282 324         598 my @res;
283 324         2122 while ($line =~ /\S/) {
284             # in obs- case we have number of optional comments/spaces/
285             # address-list = (address *("," address)) / obs-addr-list
286             # obs-addr-list = *([CFWS] ",") address *("," [address / CFWS]))
287 1732 100       10133 if ( $line =~ s/^(?:($RE{'cfws'})?,)//o ) {
288             push @res, {type => 'comment', value => $1 }
289 1004 50 66     6854 if $1 && !$args{'skip_comments'} && $1 =~ /($RE{'comment'})/;
      66        
290 1004         2728 next;
291             }
292 728         1858 $line =~ s/^\s+//o;
293              
294             # now it's only comma separated address where address is:
295             # address = mailbox / group
296              
297             # deal with groups
298             # group = display-name ":" [group-list] ";" [CFWS]
299             # group-list = mailbox-list / CFWS / obs-group-list
300             # obs-group-list = 1*([CFWS] ",") [CFWS])
301 728 50 33     9405 if ( !$in_group && $line =~ s/^($RE{'display-name'})://o ) {
302             push @res, {type => 'group start', value => $1 }
303 0 0       0 unless $args{'skip_groups'};
304 0         0 $in_group = 1; next;
  0         0  
305             }
306 728 50 33     1871 if ( $in_group && $line =~ s/^;// ) {
307 0 0       0 push @res, {type => 'group end'} unless $args{'skip_groups'};
308 0         0 $in_group = 0; next;
  0         0  
309             }
310              
311             # now we got rid of groups and cfws, 'address = mailbox'
312             # mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
313             # obs-mbox-list = *([CFWS] ",") mailbox *("," [mailbox / CFWS]))
314              
315             # so address-list is now comma separated list of mailboxes:
316             # address-list = (mailbox *("," mailbox))
317 728         1139 my $obsolete = 0;
318 728 100 100     111090 if ( $line =~ s/^($CRE{'mailbox'})($RE{cfws}*)(?=,|;|$)//o
      100        
319             || ($line =~ s/^($CRE{'obs-mailbox'})($RE{cfws}*)(?=,|;|$)//o and $obsolete = 1)
320             ) {
321 719         2737 my ($original, $phrase, $user, $host, @comments) = $self->_process_mailbox(
322             $1,$2,$3,$4,$5,$6,$7,$8,$9
323             );
324 719 50       2574 my $not_ascii = "$user\@$host" =~ /\P{ASCII}/? 1 : 0;
325 719 0 33     1491 next if $not_ascii && $args{skip_not_ascii};
326              
327 719         3996 push @res, {
328             type => 'mailbox',
329             value => Email::Address->new(
330             $phrase, "$user\@$host", join(' ', @comments),
331             $original,
332             ),
333             obsolete => $obsolete,
334             not_ascii => $not_ascii,
335             };
336 719         10912 next;
337             }
338              
339             # if we got here then something unknown on our way
340             # try to recorver
341 9 50       240 if ($in_group) {
342 0 0       0 if ( $line =~ s/^([^;,"\)]*+(?:(?:$RE{'quoted-string'}|$RE{'comment'})[^;,"\)]*+)*+)(?=;|,)//o ) {
343 0 0       0 push @res, { type => 'unknown', value => $1 } unless $args{'skip_unknown'};
344 0         0 next;
345             }
346             } else {
347 9 100       1186 if ( $line =~ s/^([^,"\)]*+(?:(?:$RE{'quoted-string'}|$RE{'comment'})[^,"\)]*+)*+)(?=,)//o ) {
348 2 50       20 push @res, { type => 'unknown', value => $1 } unless $args{'skip_unknown'};
349 2         10 next;
350             }
351             }
352 7 50       59 push @res, { type => 'unknown', value => $line } unless $args{'skip_unknown'};
353 7         17 last;
354             }
355 324         1309 return @res;
356             }
357              
358             my $dequote = sub {
359             local $_ = shift;
360             s/^"//; s/"$//; s/\\(.)/$1/g;
361             return "$_";
362             };
363             my $quote = sub {
364             local $_ = shift;
365             s/([\\"])/\\$1/g;
366             return qq{"$_"};
367             };
368              
369             sub _process_mailbox {
370 719     719   1337 my $self = shift;
371 719         1759 my $original = shift;
372 719         4422 my @rest = (@_);
373              
374 719         1053 my @comments;
375 719         3193 foreach ( grep defined, splice @rest ) {
376 3717         19999 s{ ($RE{'quoted-string'}) | ($RE{comment}) }
377 917 100       2208 { $1? $1 : do { push @comments, $2; $comments[-1] =~ /^\s|\s$/? ' ' : '' } }xgoe;
  780 100       1566  
  780         5322  
378 3717         6035 s/^\s+//; s/\s+$//;
  3717         6483  
379 3717 100       7073 next unless length;
380              
381 1932         3473 push @rest, $_;
382             }
383 719         2701 my ($host, $user, $phrase) = reverse @rest;
384              
385             # deal with spaces out of quoted strings
386 435 100       3280 s{ ($RE{'quoted-string'}) | \s+ }{ $1? $1 : ' ' }xgoe
387 719         5492 foreach grep defined, $phrase;
388             # Strip whitespace and invisible format characters that
389             # aren't inside a quoted-string.
390 308 100       2136 s{ ($RE{'quoted-string'}) | [\p{White_Space}\p{Cf}]+ }{ $1? $1 : '' }xgoe
391 719         11988 foreach $user, $host;
392              
393             # dequote
394 137         405 s{ ($RE{'quoted-string'}) }{ $dequote->($1) }xgoe
395 719         4989 foreach grep defined, $phrase, $user;
396 719 50       9326 $user = $quote->($user) unless $user =~ /^$RE{'dot_atom'}$/;
397              
398 719         1773 @comments = grep length, map { s/^\s+//; s/\s+$//; $_ } grep defined, @comments;
  780         1326  
  780         1541  
  780         1670  
399 719         3936 return $original, $phrase, $user, $host, @comments;
400             }
401              
402              
403             =head1 AUTHOR
404              
405             Best Practical Solutions, LLC Emodules@bestpractical.comE
406              
407             =head1 BUGS
408              
409             All bugs should be reported via email to
410              
411             L
412              
413             or via the web at
414              
415             L.
416              
417             =head1 LICENSE AND COPYRIGHT
418              
419             Copyright (C) 2012-2026 Best Practical Solutions, LLC.
420              
421             This library is free software; you can redistribute it and/or modify
422             it under the same terms as Perl itself.
423              
424             =cut
425              
426             1;