Gather a significant volume of data for your database and bulk load it into your relations. Use real data as much as possible. Write a program in any programming language you like to collect and process the data, then load the data into your DB relations. Your program will need to transform the data into files of records conforming to your DB schema.

If certain real data is not available, explain why and write a program to fabricate a large amount of data. Your program should produce records that are either random or structured (for instance, sequential) in accordance with your schema. The fabricated data should resemble genuine data. The aim of producing a large quantity of data is to let you experiment with a realistically sized database rather than a small "toy" database.
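
The transformation step described above (turning collected data into files of records that conform to your schema) might be sketched as follows. This is only an illustration under stated assumptions: the Course(number, title, credits) schema, the raw field names course_id, name, and credits, and the helper names are all hypothetical, and the output is assumed to be CSV.

```python
import csv
import io

# Hypothetical schema for illustration: Course(number, title, credits).
COURSE_COLUMNS = ["number", "title", "credits"]

def to_course_record(raw):
    """Map one raw source row (a dict of strings) to a tuple matching COURSE_COLUMNS.

    Returns None when the row cannot be made to fit the schema.
    """
    number = raw.get("course_id", "").strip()
    title = " ".join(raw.get("name", "").split())  # collapse stray whitespace
    try:
        credits = int(raw.get("credits", ""))
    except (TypeError, ValueError):
        return None
    if not number or not title:
        return None
    return (number, title, credits)

def write_records(raw_rows, out):
    """Write schema-conforming rows to the file-like object `out` as CSV.

    Returns the number of records written.
    """
    writer = csv.writer(out, lineterminator="\n")
    seen = set()  # key values already written
    written = 0
    for raw in raw_rows:
        record = to_course_record(raw)
        if record is None or record[0] in seen:
            continue  # skip malformed rows and duplicate keys
        seen.add(record[0])
        writer.writerow(record)
        written += 1
    return written
```

Passing an open file instead of an in-memory buffer as `out` would produce the record file to bulk load. Rows that are skipped here could instead be logged, so that you can explain which real data was unusable.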
When writing a program to fabricate data, there are two important points to keep in mind:
- Do not generate duplicate values for the keys of your relations or for other unique attributes (or sets of attributes).
- Make sure that relations expected to join with each other can actually do so. For example, a Student relation with attribute courseNo is expected to join with attribute number in relation Course. When generating data for these two relations, make sure that the generated values can actually join; otherwise, your join queries will return empty results!
To ensure join compatibility, first generate the values in one relation and then draw the joining values in the other relation from them. For instance, you might first generate number in relation Course and then use these numbers to populate the courseNo entries in the Student relation.
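
A minimal sketch of both points, using the Course and Student example above. The attributes other than number and courseNo (title, id, name), the function names, and the value ranges are assumptions made for illustration. Keys are generated sequentially so they cannot repeat, and each courseNo is drawn from the course numbers that were already generated.

```python
import random

def make_courses(n, seed=0):
    """Fabricate n Course rows (number, title) with unique keys."""
    rng = random.Random(seed)  # fixed seed makes the data reproducible
    subjects = ["Databases", "Algorithms", "Networks", "Compilers"]
    # Sequential numbers guarantee that the key never repeats.
    return [(1000 + i, f"{rng.choice(subjects)} {i}") for i in range(n)]

def make_students(n, courses, seed=1):
    """Fabricate n Student rows (id, name, courseNo) that join with Course."""
    rng = random.Random(seed)
    course_numbers = [number for number, _title in courses]
    # Student ids are sequential, hence unique; courseNo is drawn only
    # from existing Course numbers, so every Student row finds a match.
    return [
        (i + 1, f"student{i + 1}", rng.choice(course_numbers))
        for i in range(n)
    ]

courses = make_courses(100)
students = make_students(10_000, courses)
```

Generating Course first and passing it to `make_students` is the design choice that matters here: the second relation never invents a joining value on its own.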
Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script showing the loading of your data into the database.
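
As one possible shape for the loading script, the sketch below uses SQLite through Python's standard sqlite3 module. This choice is an assumption made purely for illustration; your DBMS will have its own bulk-loading facility, and the table definitions, the function name, and the sample rows here are hypothetical.

```python
import sqlite3

def load(conn, courses, students):
    """Create the two example relations, load the rows, and return the
    number of Student rows that join with a Course row."""
    conn.execute(
        "CREATE TABLE Course (number INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE Student ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " courseNo INTEGER NOT NULL REFERENCES Course(number))"
    )
    with conn:  # one transaction for the whole load; commits on success
        conn.executemany("INSERT INTO Course VALUES (?, ?)", courses)
        conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", students)
    row = conn.execute(
        "SELECT COUNT(*) FROM Student JOIN Course"
        " ON Student.courseNo = Course.number"
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
sample_courses = [(1000, "Databases"), (1001, "Algorithms")]
sample_students = [(1, "student1", 1000), (2, "student2", 1001)]
joined = load(conn, sample_courses, sample_students)
```

Comparing the join count with the number of Student rows loaded is a quick check that the generated values are join compatible.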