CS320 Project

Project part 4: Constraints, Triggers, and Indexing

100 points

Step 4 of Your DB

If you have modified your design because of feedbacks from the instructor (or any other reason), please explain the reasons in your report.

Create views, checks, and triggers that will facilitate your database and queries. Refer to the documentation for PostgreSQL on how to do this.
In previous you probably discovered that some queries run very slowly over your large database. An important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to find quickly all tuples in R with a given value for attribute A. This index is useful if a value of A is specified by your query (in the where-clause). It may also be useful if A is involved in a join that equates it to some other attribute. For example, in the query
```
     SELECT Bars.address
     FROM Frequents, Bars
     WHERE   Frequents.drinker = 'fred'
         AND Frequents.bar = Bars.name;
```
we might use an index on Frequents.drinker to help us find the tuple for drinker Fred quickly. We might also like an index on Bars.name, so we can take all the bars Fred frequents and quickly find the tuples for those bars to read their addresses.
In PostgreSQL, you can get an index by the command:
```
     CREATE INDEX <IndexName> ON <RelationName>(<Attribute List>);
```
Note:
- Postgres does allow non-unique indices, but indices used to enforce SQL keys are always unique
- PostgreSQL creates indexes automatically when you declare an attribute(s) PRIMARY KEY or UNIQUE. Thus, some indexes may exist before you create them and may also be making certain queries run faster than than they would with no indexes at all. Be sure to take this factor into account when trying to explain differences in running times. If you are unable to discover queries whose running-time is affected by the indexes that you declare, you could try temporarily removing your key declarations, so PostgreSQL will not create indexes ``behind the scenes.''
If the attribute list for an index contains more than one attribute, then the index requires values for all the listed attributes to find a tuple. That situation might be helpful if the attributes together form a key.

An illustration of the CREATE INDEX command is
```
     CREATE INDEX FreqDrinkerInd ON Frequents(drinker);
     CREATE INDEX BarInd ON Bars(name);
```
which creates the two indexes mentioned above. To get rid of an index, you can say DROP INDEX followed by the name of the index. Notice that each index must have a name, even though we only refer to the name if we want to drop the index.

Now, you are ready to run your experiments. Create at least two useful indexes for your PDA. Run your queries from previous lab on your large database with the indexes and without the indexes. To time your commands, you may type \timing after logging into PostgreSQL. This will start the timer and show the running time for each of your queries in milliseconds.

Naturally these times may be affected by external factors such as system load, etc. Still, you should see a dramatic difference between the execution times with indexes and the times without. Turn in a script showing your commands to create indexes, and showing the relative times of query execution with and without indexes.

Note: Often, students discover that indexes appear to slow down the execution of queries. There are two issues you should consider:
1. A query involving several relations will not speed up unless indexes to support all of the selections. A single index may only increase the number of disk I/O's needed (to get index blocks), without affecting the query as a whole.
2. The second time you run a query may take much less time than the first, because the second time data is cached in main memory. Data may be cached in the main memory of the machine running PostgreSQL. If many students are using PostgreSQL at the same time, the PostgreSQL machine's cache will probably drop your data if you wait a few seconds. We suggest that you perform only one timing experiment in a session; at least exit from PostgreSQL and start it again.

Adapted from CS145 @ Standford.