CS145 - Fall 2013
Introduction to Databases
Due Monday October 14
This assignment consists of writing a whole bunch of SQL queries
(and a few SQL modifications) over two different databases, along
three challenge problems. It may not look like much, but don't wait
until the last minute -- there are lots of queries in the exercises
and some of them are fairly difficult.
Log into the CS145 Class2Go
Site. Make sure you're using the CS145 portal and not the
public one. Your SUNet ID
should be required to access the site; logging in with SUNet also
ensures that we properly record your scores. Click on the Interactive Exercises tab. Under
SQL, complete four exercise
Automated SQL Query and Modification
You will also see SQL Movie-Rating
Query Exercises (extras) and SQL
Social-Network Query Exercises (extras). These exercise sets are not assigned and have no due date during the course.
They are simply there to provide you with extra practice if you
- SQL Movie-Rating Query
Exercises (assigned) - 9 problems
- SQL Movie-Rating Modification
Exercises - 4
- SQL Social-Network Query
- 9 problems
- SQL Social-Network
- 3 problems
As with the DTD and Relational Algebra exercises from Assignment #1,
the website is designed so you can interactively develop your
solutions in place: each problem allows you to run your proposed
solution and see a comparison between your result and the expected
one. Once you're happy with your solutions, you must press Submit Answers for your score
to be recorded.
If you prefer to develop your solutions offline, you can install a
SQL system on your computer and download the data files. The
automated system tests your queries using SQLite. Since there are
some differences in capabilities between SQLite, MySQL, and
PostgreSQL, if you plan to work offline we suggest you use SQLite.
Note that for the SQL modification exercises, the automated system
tests each modification command against the original database state,
runs a query after the modification to ensure that the database has
been modified correctly, then restores the original state.
Whichever way you work, don't
forget to enter your solutions and press "Submit Answers" to
obtain your score. You may do the exercises as many times
as you like. Although all scores are recorded, we will only use the
highest score you achieve as of the due date. As usual, automated
exercises receive no credit if submitted late.
For all three problems:
- We're looking for solutions that use the standard declarative
constructs of SQL as covered in the lecture videos and optional
readings. Try to write queries using those constructs, before
resorting to nondeclarative constructs or esoteric features
supported by some SQL systems. All of the problems can be solved
using standard constructs.
- We strongly suggest using SQLite, MySQL, or some other SQL
DBMS to test your solutions. It's an easy matter to download one
of the open-source systems (see our Quick
Guide), create and load a sample table, and play around.
That's really the only way you can be assured your queries are
working. That's also the only way we'll be assured they're
working. To be considered for a "Silver" or "Gold" score on this
set of challenge problems (see Challenge
Problem Grading), include in your solution for each
problem a transcript of running your SQL queries over a sample
table that is large enough to verify correctness for all cases.
Please include your data as well as your query results. Solid
attempts at solutions that haven't been tested on a running
system are still eligible for "Bronze".
Consider a table
the typical annual rainfall (in inches) for cities around the
United States. Attributes
[city,state] together form
(a) Write the simplest (i.e., shortest) SQL query you can
come up with that returns each state appearing in the table, along
with the lowest and highest rainfall for that state. The schema of
the result should be
duplicates. Note that if a state has only one city in the
(b) You probably used
aggregation in your solution to part (a). Do the same problem, but
do not use aggregation. Again, keep your query as short as
possible, but don't worry if it's longer than your solution for
(c) Try to modify your solutions for parts (a) and (b) so
they also include the cities that have the lowest and highest
rainfall for each state. That is, the schema of the result should
(state,low,lcity,high,hcity). You may assume each
state has a single city with the lowest rainfall and a single city
with the highest rainfall. Is one solution easier to modify than
the the other?
(d) Starting from your solution for part (a) or (b) (or a
new version altogether), modify the query so it now returns a
fourth value for each state containing the median rainfall for
that state. The schema of the result should be
You may assume each state has an odd number of cities, and that
the median (the middle number) is unique.
Consider a one-attribute table
positive integers, possibly including duplicates. Your goal is to
write a SQL query to find the smallest positive integer that is not
present in the table. You may assume the table contains at least
one tuple. Make sure your query works correctly on instances of
that contain the value 1, and on instances of
where 1 is the smallest integer not present. As always, try to
keep your query as compact as possible.
Consider a SQL database being used to store a tree data
structure. The database has one table:
(pID,cID) in this table encodes the fact
that the node with identifier
pID is the parent of
the node with identifier
cID in the tree structure.
There are no duplicates in the table. You can assume that the
table faithfully represents a tree: every node except one (the
root) has exactly one parent. You may also assume the maximum
height of the tree is four: there's no path from the root to a
leaf that's of length > 4. Do not make any other assumptions,
such as a fixed or maximum number of children per parent, or a
balanced tree structure (i.e., a uniform distance from root to
Your goal is to write a SQL query that finds the lowest
commmon ancestor of two given nodes, X and Y. Your query
should return the unique node Z that is an ancestor of both X and
Y, and no other node is an ancestor of X and Y but a descendent of
Z. (Put another way, Z is the common ancestor that's furthest from
the root.) Note that every node is an ancestor of itself.
- In your query, treat X and Y as constants. When the query is
run, X and Y should be replaced with two node IDs from the
database. The result of the query should be a single node ID.
- Your solution should work regardless of where X and Y are
located with respect to each other in the tree, including X = Y.
- If you're shooting for a "Silver" or "Gold" score by
including a transcript of running your query over a sample
database, please also include a rendering of your tree
structure, and make sure to show a representative suite of
different relationships between nodes X and Y.
- Finding a solution that can be expressed in a single standard
declarative SQL query is not easy! If you're completely stuck,
one approach you can try is to modularize a bit: run an initial
query or two and put the results into temporary tables, then run
a query over the temporary tables to get the final result. Once
you've got that working you can think about how to create a
single query that has the same effect.
Submission Instructions: Challenge Problems
should be submitted in pdf
format. For students who prefer hand-writing their solutions or
using some other format, solutions may be scanned and submitted as
. Solutions must be submitted through Coursework
and must be in a single file named
). Login to Coursework
on the CS145 tab at the top. In the left side-bar, click on Assignments
you will see Challenge Problems #2
. Upload your solution
file and submit for grading. You may resubmit as many times as you
like, however only the latest file and the latest timestamp are
saved, and those are what we will use for grading your work and
determining late penalties. Submissions via email will not be
Reminders: Late Policy and Honor Code
- Your score for each automated exercise set is the highest
score achieved as of 11:59:00 PM on the due date. No credit is
given for automated exercises submitted after the due date.
- The Challenge Problems
are submitted separately as specified above. They are due at
11:59:00 PM on the due date. Submissions after the
deadline but less than 24 hours late will be accepted but
penalized 10%, and submissions more than 24 hours but less than
48 hours late will be penalized 30%. No submissions will be
accepted more than 48 hours late.
- Since emergencies do arise, each student is allowed a total
of four unpenalized late days (four periods up to 24 hours each)
for non-automated work, but no single assignment may be more
than two days late.
- For detailed discussion of the Stanford Honor Code as it
pertains to CS145, please see the Assigned
Work page under Honor Code. In summary: You must
indicate on all submitted work any assistance (human or
otherwise) that you received. Any assistance received that is
not given proper citation will be considered a violation of the
Honor Code. In any event, you are responsible for understanding
and being able to explain on your own all material that you