line |
stmt |
bran |
cond |
sub |
pod |
time |
code |
1
|
|
|
|
|
|
|
package Cheater; |
2
|
|
|
|
|
|
|
|
3
|
11
|
|
|
11
|
|
69002
|
use Cheater::Parser; |
|
11
|
|
|
|
|
53
|
|
|
11
|
|
|
|
|
1400
|
|
4
|
11
|
|
|
11
|
|
6010
|
use Cheater::AST; |
|
0
|
|
|
|
|
|
|
|
0
|
|
|
|
|
|
|
5
|
|
|
|
|
|
|
use Cheater::Eval; |
6
|
|
|
|
|
|
|
|
7
|
|
|
|
|
|
|
our $VERSION = '0.10'; |
8
|
|
|
|
|
|
|
|
9
|
|
|
|
|
|
|
1; |
10
|
|
|
|
|
|
|
__END__ |
11
|
|
|
|
|
|
|
|
12
|
|
|
|
|
|
|
=encoding utf-8 |
13
|
|
|
|
|
|
|
|
14
|
|
|
|
|
|
|
=head1 NAME |
15
|
|
|
|
|
|
|
|
16
|
|
|
|
|
|
|
Cheater - Generate random database based on rules |
17
|
|
|
|
|
|
|
|
18
|
|
|
|
|
|
|
=head1 VERSION |
19
|
|
|
|
|
|
|
|
20
|
|
|
|
|
|
|
This document describes Cheater 0.10 released on June 24, 2011. |
21
|
|
|
|
|
|
|
|
22
|
|
|
|
|
|
|
=head1 DESCRIPTION |
23
|
|
|
|
|
|
|
|
24
|
|
|
|
|
|
|
Cheater is a tool that can generate random database based on rules. It's being widely used |
25
|
|
|
|
|
|
|
within the LineZing team of Taobao.com. |
26
|
|
|
|
|
|
|
|
27
|
|
|
|
|
|
|
Compared to other similar tools, C<cheater> has the following advantages: |
28
|
|
|
|
|
|
|
|
29
|
|
|
|
|
|
|
=over |
30
|
|
|
|
|
|
|
|
31
|
|
|
|
|
|
|
=item * |
32
|
|
|
|
|
|
|
|
33
|
|
|
|
|
|
|
it can automatically handle the association and foreign key restrictions among |
34
|
|
|
|
|
|
|
data tables, so it's the real "database instance generator". |
35
|
|
|
|
|
|
|
|
36
|
|
|
|
|
|
|
=item * |
37
|
|
|
|
|
|
|
|
38
|
|
|
|
|
|
|
It defines a SQL-like little language to specify the data model that we want to generate from. |
39
|
|
|
|
|
|
|
|
40
|
|
|
|
|
|
|
=item * |
41
|
|
|
|
|
|
|
|
42
|
|
|
|
|
|
|
It supports powerful C<{a, b, c}> discrete enumation sets, numerical/time/date interval syntax C<a..b>, |
43
|
|
|
|
|
|
|
Perl regular expressions C</regex/>, constant values C<'string'>, C<1.32>, and etc, to describe the value |
44
|
|
|
|
|
|
|
range of data table field. |
45
|
|
|
|
|
|
|
|
46
|
|
|
|
|
|
|
=item * |
47
|
|
|
|
|
|
|
|
48
|
|
|
|
|
|
|
It can generate JSON or SQL insert statements to ease importing to RDMBSes like MySQL/PostgreSQL. |
49
|
|
|
|
|
|
|
|
50
|
|
|
|
|
|
|
=back |
51
|
|
|
|
|
|
|
|
52
|
|
|
|
|
|
|
Below is a very simple example to demonstrate its basic usage. |
53
|
|
|
|
|
|
|
|
54
|
|
|
|
|
|
|
First of all, we create a F<.cht> input file in our working directory (say, under F<~/work/>), |
55
|
|
|
|
|
|
|
in order to describe the data model that we want to geneate data from. Assuming we have |
56
|
|
|
|
|
|
|
a F<company.cht> file like this: |
57
|
|
|
|
|
|
|
|
58
|
|
|
|
|
|
|
# Empolyee table |
59
|
|
|
|
|
|
|
table employees ( |
60
|
|
|
|
|
|
|
id serial; |
61
|
|
|
|
|
|
|
name text /[A-Z]a-z{2,5} [A-Z]a-z{2,7}/ not null unique; |
62
|
|
|
|
|
|
|
age integer 18..60 not null; |
63
|
|
|
|
|
|
|
tel text /1[35]8\d{8}/; |
64
|
|
|
|
|
|
|
birthday date; |
65
|
|
|
|
|
|
|
height real 1.50 .. 1.90 not null; |
66
|
|
|
|
|
|
|
grades text {'A','B','C','D','E'} not null; |
67
|
|
|
|
|
|
|
department references departments.id; |
68
|
|
|
|
|
|
|
) |
69
|
|
|
|
|
|
|
|
70
|
|
|
|
|
|
|
# Department table |
71
|
|
|
|
|
|
|
table departments ( |
72
|
|
|
|
|
|
|
id serial; |
73
|
|
|
|
|
|
|
name text /\w{2,10}/ not null; |
74
|
|
|
|
|
|
|
) |
75
|
|
|
|
|
|
|
|
76
|
|
|
|
|
|
|
10 employees; |
77
|
|
|
|
|
|
|
2 departments; |
78
|
|
|
|
|
|
|
|
79
|
|
|
|
|
|
|
Here we're using the little language (or DSL) defined by C<cheater> itself. It's semantics |
80
|
|
|
|
|
|
|
is self-explanatory. In particular, the last two lines state that we want to generate 10 rows |
81
|
|
|
|
|
|
|
of data for the C<employees> table and 2 rows for the C<departments> table. |
82
|
|
|
|
|
|
|
|
83
|
|
|
|
|
|
|
And then, we use the F<cht-compile> command to compile our F<company.cht> file to generate a |
84
|
|
|
|
|
|
|
random database instance: |
85
|
|
|
|
|
|
|
|
86
|
|
|
|
|
|
|
$ cht-compile company.cht |
87
|
|
|
|
|
|
|
Wrote ./data/departments.schema.json |
88
|
|
|
|
|
|
|
Wrote ./data/departments.rows.json |
89
|
|
|
|
|
|
|
Wrote ./data/employees.schema.json |
90
|
|
|
|
|
|
|
Wrote ./data/employees.rows.json |
91
|
|
|
|
|
|
|
|
92
|
|
|
|
|
|
|
We see that it generates two F<.json> data files for the C<departments> and C<employees> tables, |
93
|
|
|
|
|
|
|
respectively. For example, the F<data/emplyees.rows.json> file on my machine resulting from |
94
|
|
|
|
|
|
|
a particular run looks like this: |
95
|
|
|
|
|
|
|
|
96
|
|
|
|
|
|
|
$ cat data/employees.rows.json |
97
|
|
|
|
|
|
|
[["id","name","age","tel","birthday","height","grades","department"], |
98
|
|
|
|
|
|
|
["7606","Kxhwcn Cflub",54,"15872171866","2011-04-01","1.67276","D","408862"], |
99
|
|
|
|
|
|
|
["63649","Whf Iajgw",55,"13850771916",null,"1.65297","E","844615"], |
100
|
|
|
|
|
|
|
["348161","Nnwe Obfkln",27,"15801601215","2011-03-06","1.69275","D","408862"], |
101
|
|
|
|
|
|
|
["353404","Shgpak Xvqxw",28,"15816453097",null,"1.67796","A","408862"], |
102
|
|
|
|
|
|
|
["445500","Bdt Mhepht",47,"13855517847",null,"1.89943","C","844615"], |
103
|
|
|
|
|
|
|
["513515","Ipsa Mcbtk",25,"13874017694","2011-01-06","1.79534","A","844615"], |
104
|
|
|
|
|
|
|
["658009","Lboe Etqo",27,null,"2011-04-14","1.85162","E","408862"], |
105
|
|
|
|
|
|
|
["716899","Gey Elacflr",18,"15804516095","2011-02-27","1.75681","A","844615"], |
106
|
|
|
|
|
|
|
["945911","Hsuz Qcmky",39,"13862516775","2011-05-31","1.75947","B","408862"], |
107
|
|
|
|
|
|
|
["960643","Qbmbe Ijnbqsb",24,"15872418765","2011-04-11","1.78864","B","844615"]] |
108
|
|
|
|
|
|
|
|
109
|
|
|
|
|
|
|
These are the "row data". On the other hand, F<./data/employees.schema.json> is the table structure |
110
|
|
|
|
|
|
|
definition for the C<employees> table. It looks like this on my side: |
111
|
|
|
|
|
|
|
|
112
|
|
|
|
|
|
|
[{"attrs":[],"name":"id","type":"serial"}, |
113
|
|
|
|
|
|
|
{"attrs":["not null","unique"],"name":"name","type":"text"}, |
114
|
|
|
|
|
|
|
{"attrs":["not null"],"name":"age","type":"integer"}, |
115
|
|
|
|
|
|
|
{"attrs":[],"name":"tel","type":"text"}, |
116
|
|
|
|
|
|
|
{"attrs":[],"name":"birthday","type":"date"}, |
117
|
|
|
|
|
|
|
{"attrs":["not null"],"name":"height","type":"real"}, |
118
|
|
|
|
|
|
|
{"attrs":["not null"],"name":"grades","type":"text"}, |
119
|
|
|
|
|
|
|
{"attrs":[],"name":"department","type":"serial"}] |
120
|
|
|
|
|
|
|
|
121
|
|
|
|
|
|
|
We can generate SQL DDL statement files accepted by RDBMSes like MySQL or PostgreSQL from the |
122
|
|
|
|
|
|
|
F<.schema.json> files like this: |
123
|
|
|
|
|
|
|
|
124
|
|
|
|
|
|
|
$ cht-schema2sql data/employees.schema.json |
125
|
|
|
|
|
|
|
Wrote ./sql/employees.schema.sql |
126
|
|
|
|
|
|
|
|
127
|
|
|
|
|
|
|
The output F<.sql> file looks like this: |
128
|
|
|
|
|
|
|
|
129
|
|
|
|
|
|
|
$ cat ./sql/employees.schema.sql |
130
|
|
|
|
|
|
|
drop table if exists employees; |
131
|
|
|
|
|
|
|
create table employees ( |
132
|
|
|
|
|
|
|
id serial primary key, |
133
|
|
|
|
|
|
|
name text not null unique, |
134
|
|
|
|
|
|
|
age integer not null, |
135
|
|
|
|
|
|
|
tel text, |
136
|
|
|
|
|
|
|
birthday date, |
137
|
|
|
|
|
|
|
height real not null, |
138
|
|
|
|
|
|
|
grades text not null, |
139
|
|
|
|
|
|
|
department serial |
140
|
|
|
|
|
|
|
); |
141
|
|
|
|
|
|
|
|
142
|
|
|
|
|
|
|
If we want to eliminate the drop table statement in the resulting SQL file, we can |
143
|
|
|
|
|
|
|
specify the C<-n> option while running the F<cht-schema2sql> utility. For instance, |
144
|
|
|
|
|
|
|
|
145
|
|
|
|
|
|
|
$ cht-schema2sql -n data/employees.schema.json |
146
|
|
|
|
|
|
|
Wrote ./sql/employees.schema.sql |
147
|
|
|
|
|
|
|
|
148
|
|
|
|
|
|
|
At last, we can use the F<cht-rows2sql> command to convert those F<.rows.json> data files to |
149
|
|
|
|
|
|
|
F<.sql> files that are ready for relation database systems to import the "row data". |
150
|
|
|
|
|
|
|
|
151
|
|
|
|
|
|
|
$ cht-rows2sql data/*.rows.json |
152
|
|
|
|
|
|
|
Wrote ./sql/departments.rows.sql |
153
|
|
|
|
|
|
|
Wrote ./sql/employees.rows.sql |
154
|
|
|
|
|
|
|
|
155
|
|
|
|
|
|
|
The F<sql/departments.rows.sql> looks like this on my side: |
156
|
|
|
|
|
|
|
|
157
|
|
|
|
|
|
|
$ cat sql/departments.rows.sql |
158
|
|
|
|
|
|
|
insert into departments (id,name) values |
159
|
|
|
|
|
|
|
(408862,'dJRq7LCXL'), |
160
|
|
|
|
|
|
|
(844615,'G_m9Nkh3q'); |
161
|
|
|
|
|
|
|
|
162
|
|
|
|
|
|
|
To prevent the resulting data from conflicting with extra unique key restrictions in the targeting |
163
|
|
|
|
|
|
|
RDMBS table, we can use the C<-r> option to make F<cht-rows2sql> generate SQL replace statements |
164
|
|
|
|
|
|
|
to work-around this: |
165
|
|
|
|
|
|
|
|
166
|
|
|
|
|
|
|
$ cht-rows2sql -r data/*.rows.json |
167
|
|
|
|
|
|
|
Wrote ./sql/departments.rows.sql |
168
|
|
|
|
|
|
|
Wrote ./sql/employees.rows.sql |
169
|
|
|
|
|
|
|
|
170
|
|
|
|
|
|
|
Now we're ready to import the random data into database systems like MySQL! |
171
|
|
|
|
|
|
|
|
172
|
|
|
|
|
|
|
$ mysql -u some_user -p dbname < sql/departments.rows.sql |
173
|
|
|
|
|
|
|
|
174
|
|
|
|
|
|
|
For now, C<cheater> is still in active development and lacking comprehensive documentation, |
175
|
|
|
|
|
|
|
the most complete documentation is its (declarative) test suite: |
176
|
|
|
|
|
|
|
|
177
|
|
|
|
|
|
|
L<http://github.com/agentzh/cheater/tree/master/t/> |
178
|
|
|
|
|
|
|
|
179
|
|
|
|
|
|
|
Open one of those F<.t> files, you can see lots of declarative test cases, like these: |
180
|
|
|
|
|
|
|
|
181
|
|
|
|
|
|
|
=== TEST 5: datetime range domain |
182
|
|
|
|
|
|
|
--- src |
183
|
|
|
|
|
|
|
table cats ( |
184
|
|
|
|
|
|
|
birthday datetime 2010-05-24 03:45:00..2010-06-05 18:46:05 not null; |
185
|
|
|
|
|
|
|
) |
186
|
|
|
|
|
|
|
|
187
|
|
|
|
|
|
|
5 cats; |
188
|
|
|
|
|
|
|
--- out |
189
|
|
|
|
|
|
|
cats |
190
|
|
|
|
|
|
|
birthday |
191
|
|
|
|
|
|
|
2010-06-02 14:59:02 |
192
|
|
|
|
|
|
|
2010-06-04 03:31:00 |
193
|
|
|
|
|
|
|
2010-06-03 01:51:41 |
194
|
|
|
|
|
|
|
2010-05-28 19:29:34 |
195
|
|
|
|
|
|
|
2010-06-02 13:31:38 |
196
|
|
|
|
|
|
|
|
197
|
|
|
|
|
|
|
=head1 INSTALLATION |
198
|
|
|
|
|
|
|
|
199
|
|
|
|
|
|
|
perl Makefile.PL |
200
|
|
|
|
|
|
|
make |
201
|
|
|
|
|
|
|
make test |
202
|
|
|
|
|
|
|
make install |
203
|
|
|
|
|
|
|
|
204
|
|
|
|
|
|
|
SOURCE REPOSITORY |
205
|
|
|
|
|
|
|
The source repository of this project is on GitHub: |
206
|
|
|
|
|
|
|
|
207
|
|
|
|
|
|
|
L<http://github.com/agentzh/cheater/> |
208
|
|
|
|
|
|
|
|
209
|
|
|
|
|
|
|
If you have found any bugs or feature request, feel free to create tickets on the GitHub issues page: |
210
|
|
|
|
|
|
|
|
211
|
|
|
|
|
|
|
L<http://github.com/agentzh/cheater/issues> |
212
|
|
|
|
|
|
|
|
213
|
|
|
|
|
|
|
=head1 AUTHOR |
214
|
|
|
|
|
|
|
|
215
|
|
|
|
|
|
|
Zhang "agentzh" Yichun (ç« äº¦æ¥) C<< <agentzh@gmail.com> >> |
216
|
|
|
|
|
|
|
|
217
|
|
|
|
|
|
|
=head1 COPYRIGHT & LICENSE |
218
|
|
|
|
|
|
|
|
219
|
|
|
|
|
|
|
Copyright (c) 2010-2011, Taobao Inc., Alibaba Group (L<http://www.taobao.com>). |
220
|
|
|
|
|
|
|
|
221
|
|
|
|
|
|
|
Copyright (c) 2010-2011, Zhang "agentzh" Yichun (ç« äº¦æ¥) C<< <agentzh@gmail.com> >>. |
222
|
|
|
|
|
|
|
|
223
|
|
|
|
|
|
|
This module is licensed under the terms of the BSD license. |
224
|
|
|
|
|
|
|
|
225
|
|
|
|
|
|
|
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: |
226
|
|
|
|
|
|
|
|
227
|
|
|
|
|
|
|
=over |
228
|
|
|
|
|
|
|
|
229
|
|
|
|
|
|
|
=item * |
230
|
|
|
|
|
|
|
|
231
|
|
|
|
|
|
|
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. |
232
|
|
|
|
|
|
|
|
233
|
|
|
|
|
|
|
=item * |
234
|
|
|
|
|
|
|
|
235
|
|
|
|
|
|
|
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. |
236
|
|
|
|
|
|
|
|
237
|
|
|
|
|
|
|
=item * |
238
|
|
|
|
|
|
|
|
239
|
|
|
|
|
|
|
Neither the name of the Taobao Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. |
240
|
|
|
|
|
|
|
|
241
|
|
|
|
|
|
|
=back |
242
|
|
|
|
|
|
|
|
243
|
|
|
|
|
|
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
244
|
|
|
|
|
|
|
|
245
|
|
|
|
|
|
|
=head1 SEE ALSO |
246
|
|
|
|
|
|
|
|
247
|
|
|
|
|
|
|
L<Parse::RandGen::Regexp>, L<Data::Random>. |
248
|
|
|
|
|
|
|
|