File Coverage

blib/lib/HTML/Email/Obfuscate.pm
Criterion Covered Total %
statement 47 53 88.6
branch 22 32 68.7
condition n/a
subroutine 11 13 84.6
pod 3 3 100.0
total 83 101 82.1


line stmt bran cond sub pod time code
1             package HTML::Email::Obfuscate;
2              
3             =pod
4              
5             =head1 NAME
6              
7             HTML::Email::Obfuscate - Obfuscated HTML email addresses that look normal
8              
9             =head1 DESCRIPTION
10              
11             I<"Don't put emails directly on the page, they will be scraped">
12              
13             Stuff that, I'm sick of looking at C. Why can't we
14             just write emails in a way that looks normal to people, but is very, very
15             difficult to scrape off. Most email scrapers only use very very simple
16             parsing methods. And it isn't as if it is hard to just do.
17              
18             # Before we search for email addresses...
19             $page =~ s/\s+at\s+/@/g;
20             $page =~ s/\s+dot\s+/./g;
21              
22             This is an arms war dammit, and I want nukes!
23              
24             =head2 About this Module
25              
26             This module was written during OSDC/YAPC.AU to demonstrate how quick and
27             easy it is to write a basic module and put it on CPAN. The code was
28             written in about 40 minutes, the documentation was added during a break
29             period before drinks and dinner, and the packing and test files were
30             added during the python keynote (significant whitespace... ew...).
31              
32             =head2 How this works
33              
34             This module starts by applying a fairly basic set of character escapes to
35             avoid the most basic scrapers, and then layers more and more crap on
36             randomly, so that any scraper will need to implement more and more of a
37             full web browser, while keeping the email looking "normal" to anyone
38             browsing.
39              
40             I've only scraped the surface of what we can achieve, and I'll leave it to
41             others to submit patches to improve it from here on.
42              
43             =head2 Using HTML::Email::Obfuscate
44              
45             This is a pretty simple module.
46              
47             First, create an obfuscator object. This is just a simple object that holds
48             some preferences about how extreme you want to be about the obfuscation.
49              
50             # Create a default obfuscation object
51             my $Email = HTML::Email::Obfuscate->new;
52              
53             Now to turn a normal email string into an obfuscated and fully escaped HTML
54             one, just provide it to the escape_html method.
55              
56             # Obfuscate my email address
57             my $html = $Email->escape_html( 'cpan@ali.as' );
58              
59             And we get something like this
60              
61             ***Example here once I get a chance to run it***
62              
63             The defaults are fairly insane, so for people that just want veeeery simple
64             escaping, we'll provide a lite version.
65              
66             # Create a "lite" obfuscator
67             my $Email = HTML::Email::Obfuscate->new( lite => 1 );
68            
69             # Access the lite escape method directly, regardless of the
70             # obfuscator's constructor params.
71             my $html = $Email->escape_html_lite( 'cpan@ali.as' );
72              
73             For the more serious people, we can also add some more extreme measures
74             that are probably not going to be compatible with everything, such as
75             JavaScript. :/
76              
77             # Allow the obfuscator to use JavaScript
78             my $Email = HTML::Email::Obfuscator->new( javascript => 1 );
79              
80             Best not to use that unless you have a JavaScript-capable browser.
81              
82             I think that just about covers it, and my 7 minute lightning talk is
83             probably almost up.
84              
85             =head1 METHODS
86              
87             =cut
88              
89 2     2   32404 use 5.005;
  2         7  
  2         82  
90 2     2   13 use strict;
  2         4  
  2         112  
91 2     2   2041 use HTML::Entities ();
  2         16111  
  2         89  
92              
93 2     2   17 use vars qw{$VERSION @WRAP_METHOD};
  2         4  
  2         150  
94             BEGIN {
95 2     2   5 $VERSION = '1.00';
96              
97             # The list of modifier methods
98 2         1440 @WRAP_METHOD = qw{
99             _random_modifier_span
100             _random_modifier_comment
101             _random_modifier_javascript
102             };
103             }
104              
105              
106              
107              
108              
109             #####################################################################
110             # Constructor
111              
112             =pod
113              
114             =head2 new $param => $value [, ... ]
115              
116             The C constructor creates a new obfuscation object, which use can
117             then use to obfuscate as many email addresses as you like, at whatever
118             severity you want it to be done.
119              
120             It takes two optional parameters.
121              
122             If you set the C<'javascript'> param, the obfuscator will add JavaScript
123             obfuscation (possibly, and randomly) to the mix of obfuscation routines.
124              
125             If you set the C<'lite'> param, the obfuscator will only use the most
126             basic form of escaping, which will only fool scanner that don't do
127             HTML entity decoding. Setting 'lite' implies that JavaScript should not
128             be used, even if you explicitly try to turn it on.
129              
130             Returns a new C object.
131              
132             =cut
133              
134             sub new {
135 3     3 1 22 my $class = shift;
136 3 50       11 my %args = ref $_[0] eq 'HASH' ? %{shift()} : @_;
  0         0  
137 3         8 %args = map { lc $_ } %args;
  4         12  
138              
139             # Create the defailt HTML generation object
140 3         11 my $self = bless {
141             lite => '',
142             javascript => '',
143             }, $class;
144              
145             # Flag control
146 3 100       8 $self->{javascript} = 1 if $args{javascript};
147 3 100       13 $self->{javascript} = '' if $args{lite};
148 3 100       8 $self->{lite} = 1 if $args{lite};
149              
150 3         8 $self;
151             }
152              
153             =pod
154              
155             =head2 escape_html_lite $email
156              
157             On an otherwise normal obfuscator, the C method provides
158             direct access to the lite method for obfuscating emails.
159              
160             Returns a HTML string, or C if passed no params, or and undefined
161             param.
162              
163             =cut
164              
165             sub escape_html_lite {
166 3     3 1 2219 my $either = shift;
167 3 50       8 my $email = defined $_[0] ? shift : return undef;
168 3 50       13 my $self = ref($either) ? $either : $either->new(@_) or return undef;
    50          
169              
170             # Just escape @ and add a single HTML comment
171 3         9 $email =~ s/\@/@/sg;
172              
173 3         10 $email;
174             }
175              
176             =pod
177              
178             =head2 escape_html $email
179              
180             The C method obfuscates an email according to the params
181             provided to the constructor.
182              
183             Returns a HTML string, or C if passed no params, or and undefined
184             param.
185              
186             =cut
187              
188             sub escape_html {
189 3     3 1 2199 my $either = shift;
190 3 50       8 my $email = defined $_[0] ? shift : return undef;
191 3 50       12 my $self = ref $either ? $either : $either->new(@_) or return undef;
    50          
192              
193             # Split into a set of characters
194 3         94 my @chars = split //, $email;
195              
196 3         8 foreach my $char ( @chars ) {
197             # Escape individual characters
198 48         81 $char = $self->_escape_char($char);
199              
200             # Randomly wrap 20% of characters
201 48 100       914 next unless rand(1) < 0.1;
202 2         10 $char = $self->_random_modifier($char);
203             }
204              
205             # Join and return
206 3         16 join '', @chars;
207             }
208              
209             sub _escape_char {
210 48     48   49 my $self = shift;
211 48         43 my $char = shift;
212              
213             # Handle various characters
214 48 100       94 return '@' if $char eq '@';
215 45 100       67 return '.' if $char eq '.';
216              
217             # Force the numberic escape of 20% of the characters.
218             # Allow the remaining 80% to escape by the normal rules.
219 42 100       157 return (rand(1) < 0.2)
220             ? HTML::Entities::encode_numeric($char, '^ ')
221             : HTML::Entities::encode_numeric($char);
222             }
223              
224             sub _random_modifier {
225 2     2   3 my $self = shift;
226              
227             # Which wrap style do we want to use?
228 2 50       6 my $max = $self->{javascript} ? 2 : 1;
229 2         6 my $method = $WRAP_METHOD[int(rand($max))];
230 2         5 $self->$method(shift);
231             }
232              
233             sub _random_modifier_span {
234 2     2   7 "$_[1]";
235             }
236              
237             sub _random_modifier_comment {
238 0 0   0     (rand > 0.5) ? "$_[1]" : "$_[1]";
239             }
240              
241             sub _random_modifier_javascript {
242 0     0     my $self = shift;
243 0           my $html = shift;
244 0           $html =~ s/'/"/g;
245 0           qq~~;
246             }
247              
248             1;
249              
250             =pod
251              
252             =head1 TO DO
253              
254             OK, other than compile testing, I admit that I haven't really done
255             anything significant in the way of testing. I mean, there was B
256             an interesting python talk on, and how on earth do you test something
257             that has randomised output. :/
258              
259             So yeah, it would be nice to write some better tests.
260              
261             =head1 SUPPORT
262              
263             Bugs should be reported via the CPAN bug tracker at
264              
265             L
266              
267             For other issues, or commercial enhancement or support, contact the author.
268              
269             =head1 AUTHORS
270              
271             Adam Kennedy Eadamk@cpan.orgE
272              
273             Thank you to Phase N (L) for permitting
274             the open sourcing and release of this distribution.
275              
276             =head1 COPYRIGHT
277              
278             Copyright 2004 - 2006 Adam Kennedy.
279              
280             This program is free software; you can redistribute
281             it and/or modify it under the same terms as Perl itself.
282              
283             The full text of the license can be found in the
284             LICENSE file included with this module.
285              
286             =cut