File Coverage

blib/lib/Roman/Unicode.pm

Criterion	Covered	Total	%
statement	94	102	92.1
branch	28	30	93.3
condition	9	9	100.0
subroutine	21	22	95.4
pod	8	8	100.0
total	160	171	93.5

line	stmt	bran	cond	sub	pod	time	code
1	5			5		120329	use utf8;
	5					36
	5					31
2	5			5		226	use 5.014;
	5					18
3
4							=encoding utf8
5
6							=head1 NAME
7
8							Roman::Unicode - Make roman numerals, using the Unicode characters for them
9
10							=head1 SYNOPSIS
11
12							use Roman::Unicode qw( to_roman is_roman to_perl );
13
14							my $perl_number = to_perl( $roman ) if is_roman( $roman );
15							my $roman_number = to_roman( $arabic );
16
17							=head1 DESCRIPTION
18
19							I made this module as a way to demonstrate various Unicode things without
20							mixing up natural language stuff. Surprisingly, roman numerals can do quite
21							a bit with that. You'll have to read the source to see it in action.
22
23							There are many fancy characters in this documentation, so you need a good
24							font that has the right glyphs. The Symbola font is a good one:
25							http://users.teilar.gr/~g1951d/
26
27							=head2 Functions
28
29							=over 4
30
31							=item is_roman( STRING )
32
33							Returns true if the string looks like a valid roman numeral. This
34							works with either the ASCII version or the ones using the characters
35							in the U+2160 to U+2188 range. You cannot mix the uppercase and lowercase
36							numerals.
37
38							=item to_perl( ROMAN )
39
40							If the argument is a valid roman numeral, C returns the Perl
41							number. Otherwise, it returns nothing.
42
43							=item to_roman( PERL_NUMBER )
44
45							If the argument is a valid Perl number, even if it is a string,
46							C returns the roman numeral representation. This uses the
47							characters in the U+2160 to U+2188 range.
48
49							If the number cannot be represented as roman numerals, this returns
50							nothing. Note that 0 doesn't have a roman numeral representation.
51
52							If you want the lowercase version, you can use C on the result.
53							However, some of the roman numerals don't have lowercase versions.
54
55							=item to_ascii( ROMAN )
56
57							If the argument is a valid roman numeral, it returns an ASCII
58							representation of it. Most of the numeral code points have compatible
59							decompositions, so the first step uses NFKD decomposition. For other
60							characters, it uses ASCII art representations:
61
62							Roman ASCII art
63							------ ----------
64							ↁ \|)
65							ↂ ((\|))
66							ↈ (((\|)))
67							ↇ \|))
68
69							=back
70
71							=head2 Case mapping
72
73							As a demonstration of case mapping, I supply one function that uses
74							L. You can lexically override the case-mapping functions
75							as described in that module's documentation.
76
77							=over 4
78
79							=item to_roman_lower
80
81							A subroutine you can use with C. It's a bit more special
82							because it turns the higher magnitude characters into ASCII versions. That
83							means that the return value might not be a valid according to C. It
84							returns nothing if the input isn't a valid Roman numeral string.
85
86							You can also use this as a stand-alone function instead of C. That's the
87							smart way to do it, but then you don't get to play with C.
88
89							=back
90
91							=head2 User-defined properties
92
93							Perl lets you define your own properties, as documented in L. This
94							module defines several.
95
96							=over 4
97
98							=item IsRoman
99
100							The C property is a combination of C and
101							C.
102
103							=item IsUppercaseRoman
104
105							The C property matches these code points:
106
107							Ⅰ U+2160 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ
108							Ⅴ U+2164 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ
109							Ⅹ U+2169 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴛᴇɴ
110							Ⅼ U+216C ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪꜰᴛʏ
111							Ⅽ U+216D ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ʜᴜɴᴅʀᴇᴅ
112							Ⅾ U+216E ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ ʜᴜɴᴅʀᴇᴅ
113							Ⅿ U+216F ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ᴛʜᴏᴜsᴀɴᴅ
114							ↁ U+2181 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ ᴛʜᴏᴜsᴀɴᴅ
115							ↂ U+2182 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴛᴇɴ ᴛʜᴏᴜsᴀɴᴅ
116							ↇ U+2187 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪꜰᴛʏ ᴛʜᴏᴜsᴀɴᴅ
117							ↈ U+2188 ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ʜᴜɴᴅʀᴇᴅ ᴛʜᴏᴜsᴀɴᴅ
118
119							This excludes the other Roman numeral code points, such as Ⅻ (U+216B, ʀᴏᴍᴀɴ
120							ɴᴜᴍᴇʀᴀʟ ᴛᴡᴇʟᴠᴇ) since they are not designed to be part of larger strings of
121							Roman numerals.
122
123							=item IsLowercaseRoman
124
125							The C is the set of lowercase code points derived from the
126							set of code points in C. It checks each code point in
127							C and checks the Unicode Character Database (UCD) through
128							L to see if it has a lowercase mapping. If there is a lowercase
129							mapping, it makes it part of this property.
130
131							=back
132
133							=head1 LIMITATIONS
134
135							By using just the defined roman numerals characters in the Unicode Character
136							Set, you're limited to numbers less than 400,000 (although you could make
137							ↈↈↈↈ if you wanted, since that's not unheard of).
138
139							=head1 AUTHOR
140
141							brian d foy C<< >>
142
143							This module started with the Roman module, credited to:
144
145							OZAWA Sakuro C<< >> 1995-1997
146
147							Alexandr Ciornii, C<< >> 2007
148
149							=head1 COPYRIGHT
150
151							Copyright © 2011-2022, brian d foy .
152
153							You can use this module under the terms of Artistic License 2.0.
154
155							=cut
156
157							package Roman::Unicode {
158	5			5		28	use feature qw(unicode_strings);
	5					9
	5					642
159
160	5			5		35	use strict;
	5					8
	5					124
161	5			5		25	use warnings;
	5					12
	5					168
162	5			5		502	use open IO => ':utf8';
	5					1255
	5					31
163
164	5			5		565	use Exporter 'import';
	5					10
	5					443
165							our @EXPORT_OK = qw( is_roman to_perl to_roman to_ascii );
166							our $VERSION = '1.034';
167
168	5			5		6867	use Unicode::UCD;
	5					264810
	5					332
169	5			5		42	use Unicode::Normalize qw(NFKD);
	5					13
	5					1301
170
171							# I'm specifically not using the characters for the other roman numerals
172							# because those are meant to stand alone, as they might in a clock face
173							our %valid_roman = map { $_, 1 } (
174							# the capitals U+2160 to U+216F, U+2180 to U+2182, U+2187 to U+2188
175							qw(Ⅰ Ⅴ Ⅹ Ⅼ Ⅽ Ⅾ Ⅿ ↁ ↂ ↇ ↈ ),
176							# the lowercase U+2170 to U+217f
177							qw(ⅰ ⅴ ⅹ ⅼ ⅽ ⅾ ⅿ),
178							# the ASCII
179							qw(I V X L C D M),
180							qw(i v x l c d m),
181
182							);
183
184							our %roman2arabic = qw(
185							Ⅰ 1 Ⅴ 5 Ⅹ 10
186							Ⅼ 50 Ⅽ 100 Ⅾ 500 Ⅿ 1000 ↁ 5000 ↂ 10000 ↇ 50000 ↈ 100000
187
188							ⅰ 1 ⅴ 5 ⅹ 10
189							ⅼ 50 ⅽ 100 ⅾ 500 ⅿ 1000
190							);
191
192	28			28		222	sub _get_chars { my @chars = $_[0] =~ /(\X)/ug }
193
194	57			57		279	sub _highest_value { (sort { $a <=> $b } values %roman2arabic)[-1] }
	3021					3827
195
196							sub is_roman($) {
197	122	100		122	1	40097	$_[0] =~ / \A \p{IsUppercaseRoman}+ \z /x
198							or
199							$_[0] =~ / \A \p{IsLowercaseRoman}+ \z /x
200							}
201
202							sub to_perl($) { # Stolen from Roman.pm, mostly
203	34	100		34	1	92	is_roman $_[0] or return;
204	28					79	my($last_digit) = _highest_value();
205	28					47	my($arabic);
206
207	28					63	foreach my $char ( _get_chars( $_[0] ) ) {
208	71					126	my $digit = $roman2arabic{$char};
209	71	100				137	$arabic -= 2 * $last_digit if $last_digit < $digit;
210	71					123	$arabic += ($last_digit = $digit);
211							}
212
213	28					140	$arabic;
214							}
215
216	0					0	BEGIN {
217
218	5			5		48	my %roman_digits = qw(
219							1 ⅠⅤ
220							10 ⅩⅬ
221							100 ⅭⅮ
222							1000 Ⅿↁ
223							10000 ↂↇ
224							100000 ↈↈↈↈ
225							);
226
227	5					48	my @figure = reverse sort keys %roman_digits;
228	5					1890	$roman_digits{$_} = [split(//, $roman_digits{$_}, 2)] foreach @figure;
229
230							sub to_roman($) { # stolen from Roman.pm, mostly
231	34			34	1	74	my( $arg ) = @_;
232
233							{
234	5			5		2834	no warnings 'numeric';
	5					14
	5					1526
	34					52
235	34	100	100			176	0 < $arg and $arg < 4 * _highest_value() or return;
236							}
237
238	28					69	my($x, $roman) = ( '', '' );
239	28					68	foreach my $figure ( @figure ) {
240	168					264	my( $digit, $i, $v ) = (int( $arg/$figure ), @{$roman_digits{$figure}});
	168					407
241
242	168					232	$roman .= do {
243	168	100	100			703	if( 1 <= $digit and $digit <= 3 ) { $i x $digit }
	16	100	100			40
		100
		100
		100
244	5					12	elsif( $digit == 4 ) { "$i$v" }
245	6					15	elsif( $digit == 5 ) { $v }
246	3					13	elsif( 6 <= $digit and $digit <= 8 ) { $v . $i x ($digit - 5) }
247	10					26	elsif( $digit == 9 ) { "$i$x" }
248							};
249
250	168					255	$arg -= $digit * $figure;
251	168					264	$x = $i;
252							}
253
254	28					271	$roman;
255							}
256							}
257
258							sub to_ascii {
259	27			27	1	695	my( $roman ) = @_;
260	27	100				59	return unless is_roman( $roman );
261
262	22					115	$roman = Unicode::Normalize::NFKD( $roman );
263
264	22					63	$roman =~ s/ↁ/\|))/g;
265	22					44	$roman =~ s/ↂ/((\|))/g;
266	22					50	$roman =~ s/ↈ/(((\|)))/g;
267	22					43	$roman =~ s/ↇ/\|)))/g;
268
269	22					99	$roman;
270							}
271
272							sub IsRoman {
273	3			3	1	35623	IsUppercaseRoman() . IsLowercaseRoman()
274							}
275
276							sub IsUppercaseRoman {
277	11			11	1	2483	return <<"CODE_NUMBERS";
278							2160
279							2164
280							2169
281							216C 216F
282							2181 2182
283							2187 2188
284							CODE_NUMBERS
285							}
286
287							sub IsLowercaseRoman {
288	8			8	1	2095	state $string;
289	8	100				44	return $string if defined $string;
290
291	3					9	my @codes = ();
292
293	3					9	my $uppers = IsUppercaseRoman();
294	3			3		22	open my $string_fh, '<', \ $uppers;
	3					7
	3					19
	3					101
295	3					2474	while( my $line = <$string_fh> ) {
296	18					32	my @n = map { hex } map { m/(\p{HexDigit}+)/g } $line;
	27					77
	18					84
297	18	100				46	if( @n == 1 ) { push @codes, $n[0] }
	9					17
298	18	100				47	if( @n == 2 ) { push @codes, $n[0] .. $n[1] };
	9					42
299							}
300
301	21					43	my @lowers = map { hex } map {
302	3					9	my $char_info = Unicode::UCD::charinfo( $_ );
	33					98
303	33	100				847785	$char_info->{lower} ? $char_info->{lower} : ();
304							} @codes;
305
306							$string = join "\n", map {
307	3					12	sprintf( '%04X', $_ )
	21					55
308							} @lowers;
309
310	3					46	$string .= "\n";
311							}
312
313							# Use this with Unicode::Casing, or not
314							sub to_roman_lower {
315	0	0		0	1		return unless &is_roman;
316
317	0						my $lower = CORE::lc( $_[0] );
318
319	0						$lower =~ s/ↁ/\|)/g; # ↁ U+2181
320	0						$lower =~ s/ↂ/((\|))/g; # ↂ U+2182
321	0						$lower =~ s/ↇ/\|))/g; # ↇ U+2187
322	0						$lower =~ s/ↈ/(((\|)))/g; # ↈ U+2188
323
324	0						return $lower;
325							}
326							}
327
328							1;