Commit 074cf4ea authored by Dan Suciu

working files

parent f3c0fd6a
# CSE 344 Homework 1: SQLITE and SQL Basics
**Objectives:** To be able to create and manipulate tables in sqlite3, and write simple queries using SQL.
**Assignment tools:** [SQLite 3](
**Assigned date:** January 7, 2019
**Due date:** January 15, 2019. You have 1 week for this homework.
**Questions:** Make sure you post them on the [discussion board](
**What to turn in:**
One file for each of the questions below, containing your SQL and SQLite commands,
and SQL comments for your responses that are not in SQL (i.e., submit a `.sql` file that
can be executed directly against the database system).
No need to include any inputs or outputs. Name each file `hw1-q1.sql`, `hw1-q2.sql`, etc.
You will need to learn how to write SQL comments if you have not done so before.
Turn in your solution on [CSE's GitLab](
See [submission instructions](#submission) below.
We will use [SQLite]( for this assignment.
SQLite is a software library that implements a SQL database engine.
We will use SQLite in this assignment because it offers an extremely lightweight method to
create and analyze structured datasets (by structured we mean datasets in the form of tables
rather than, say, free text). Using SQLite is a minimal hassle approach to realizing the
benefits of a relational database management system.
Of course, SQLite does not do
everything, but we will get to that point in later assignments. In the meantime,
you can also learn [when to use SQLite and when not to use it](
- Some important SQLite commands:
  - To view help contents: `.help`
  - To view a list of all your tables: `.tables`
  - To exit: `.exit`
- [A simple guide]( for commonly used command-line functions in SQLite.
- [More information]( on formatting output in SQLite.
- [An index]( of more detailed information for SQL commands in SQLite.
- A [SQL style guide]( in case you are interested (FYI only).
## Assignment Details
To run SQLite do the following:
- Mac OS X or Linux: open a terminal and type `sqlite3` (if installed)
- Windows: there are two reasonable options:
  - Install the stand-alone Windows program from the [SQLite web site](
    (precompiled Windows command-line shell on the [download]( page)
  - (maybe a bit more complicated): Install [cygwin]( to get a
    Linux command shell, then open cygwin and type `sqlite3`
    (you may have to install it by running setup → database → sqlite3;
    it is probably already installed in the CSE labs).
Note that the course staff will only support the linux version of sqlite3
running on CSE machines (e.g., `attu`) and will also use that version for grading purposes.
*Make sure your submission runs there*.
### Problems
1. (20 points) First, create a simple table using the following steps:
- Create a table Edges(Source, Destination) where both Source and Destination are integers.
- Insert the tuples `(10,5)`, `(6,25)`, `(1,3)`, and `(4,4)`
- Write a SQL statement that returns all tuples.
- Write a SQL statement that returns only column Source for all tuples.
- Write a SQL statement that returns all tuples where Source > Destination.
- Now insert the tuple `('-1','2000')`. Do you get an error? Why? This is a tricky question; you might want to [check the documentation](
2. (15 points) Next, you will create a table with attributes of types integer, varchar, date, and Boolean.
However, SQLite does not have date and Boolean: you will use `varchar` and `int` instead. Some notes:
- 0 (false) and 1 (true) are the values used to interpret Booleans.
- Date strings in SQLite are in the form: 'YYYY-MM-DD'.
Examples of valid date strings include: `'1988-01-15'`, `'0000-12-31'`, and `'2011-03-28'`.
Examples of invalid date strings include: `'11-11-01'`, `'1900-1-20'`, `'2011-03-5'`, and `'2011-03-50'`.
- Examples of date operations on date strings (feel free to try them):
`select date('2011-03-28');`
`select date('now');`
`select date('now', '-5 year');`
`select date('now', '-5 year', '+24 hour');`
`select case when date('now') < date('2011-12-09') then 'Taking classes' when date('now') < date('2011-12-16') then 'Exams' else 'Vacation' end;` What does this query do? (no need to turn in your answer)
Create a table called `MyRestaurants` with the following attributes (you can pick your own names for the attributes, just make sure it is clear which one is for which):
- Name of the restaurant: a `varchar` field
- Type of food they make: a `varchar` field
- Distance (in minutes) from your house: an `int`
- Date of your last visit: a `varchar` field, interpreted as date
- Whether you like it or not: an `int`, interpreted as a Boolean
3. (13 points)
Insert at least five tuples using the SQL INSERT command five (or more) times.
You should insert at least one restaurant you liked, at least one restaurant you did not like,
and at least one restaurant where you leave the “I like” field `NULL`.
4. (13 points)
Write a SQL query that returns all restaurants in your table. Experiment with a few of SQLite's
output formats and show the command you use to format the output along with your query:
- print the results in comma-separated form
- print the results in list form, delimited by "` | `"
- print the results in column form, and make each column have width 15
- for each of the formats above, try printing/not printing the column headers with the results
5. (13 points)
Write a SQL query that returns only the name and distance of all restaurants within and
including 20 minutes of your house. The query should list the restaurants in alphabetical order of names.
6. (13 points)
Write a SQL query that returns all restaurants that you like but have not visited
in more than 3 months.
7. (13 points)
Write a SQL query that returns all restaurants that are within and including 10 mins from your house.
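The notes in problem 2 about storing Booleans as integers and dates as strings can be tried out without touching your homework database. The sketch below is one way to do that, using Python's built-in `sqlite3` module against an in-memory database. The `Cafes` table, its column names, and its rows are made up for illustration and are not part of the assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
conn.execute("CREATE TABLE Cafes (name VARCHAR(20), last_visit VARCHAR(10), liked INT)")
conn.executemany(
    "INSERT INTO Cafes VALUES (?, ?, ?)",
    [
        ("Alpha", "2018-06-01", 1),     # liked
        ("Beta", "2018-12-20", 0),      # not liked
        ("Gamma", "2018-11-05", None),  # no opinion: stored as NULL
    ],
)

# A comparison with NULL is never true, so 'Gamma' shows up in neither of the
# first two results and has to be found with IS NULL.
liked = [r[0] for r in conn.execute("SELECT C.name FROM Cafes AS C WHERE C.liked = 1")]
disliked = [r[0] for r in conn.execute("SELECT C.name FROM Cafes AS C WHERE C.liked = 0")]
unknown = [r[0] for r in conn.execute("SELECT C.name FROM Cafes AS C WHERE C.liked IS NULL")]

# 'YYYY-MM-DD' strings sort chronologically, so date() results can be compared.
before_december = [
    r[0]
    for r in conn.execute(
        "SELECT C.name FROM Cafes AS C "
        "WHERE date(C.last_visit) < date('2018-12-01') ORDER BY C.name"
    )
]

# date() applies modifiers to a valid date and yields NULL (None in Python)
# for a malformed date string.
shifted = conn.execute("SELECT date('2011-03-28', '-5 year')").fetchone()[0]
malformed = conn.execute("SELECT date('2011-03-50')").fetchone()[0]

print(liked, disliked, unknown, before_december, shifted, malformed)
```

The same SQL statements can be typed directly at the `sqlite3` prompt; Python is used here only so that the results can be captured and inspected.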
## Submission Instructions
<a name="submission"></a>
We will be using `git`, a source code control tool, for distributing and submitting homework assignments in this class.
This will allow you to download the code and instructions for the homework,
and also submit the labs in a standardized format that will streamline grading.
You will also be able to use `git` to commit your progress on the labs
as you go. This is **important**: Use `git` to back up your work. Back
up regularly by both committing and pushing your code as we describe below.
Course git repositories will be hosted as a repository in [CSE's
gitlab](, that is visible only to
you and the course staff.
### Getting started with Git
There are numerous guides on using `git` that are available. They range from being interactive to just text-based.
Find one that works and experiment -- making mistakes and fixing them is a great way to learn.
Here is a [link to resources](
that GitHub suggests starting with. If you have no experience with `git`, you may find this
[web-based tutorial helpful](
Git may already be installed in your environment; if it's not, you'll need to install it first.
For `bash`/Linux environments, git should be a simple `apt-get` / `yum` / etc. install.
More detailed instructions may be [found here](
Git is already installed on the CSE linux machines.
If you are using Eclipse or IntelliJ, many versions come with git already configured.
The instructions will be slightly different than the command line instructions listed but will work
for any OS. For Eclipse, detailed instructions can be found at
[EGit User Guide]( or the
[EGit Tutorial](
### Cloning your repository for homework assignments
We have created a git repository that you will use to commit and submit your homework assignments.
This repository is hosted on the [CSE's GitLab]( ,
and you can view it by visiting the GitLab website at
`[your CSE username]`.
You'll be using this **same repository** for each of the homework assignments this quarter,
so if you don't see this repository or are unable to access it, let us know immediately!
The first thing you'll need to do is set up an SSH key to allow communication with GitLab:
1. If you don't already have one, generate a new SSH key. See [these instructions]( for details on how to do this.
2. Visit the [GitLab SSH key management page]( You'll need to log in using your CSE account.
3. Click "Add SSH Key" and paste in your **public** key into the text area.
While you're logged into the GitLab website, browse around to see which projects you have access to.
You should have access to `cse344-[your username]`.
Spend a few minutes getting familiar with the directory layout and file structure. For now nothing will
be there except for the `hw1` directory with these instructions.
We next want to move the code from the GitLab repository onto your local file system.
To do this, you'll need to clone the 344 repository by issuing the following commands on the command line:
$ cd [directory that you want to put your 344 assignments]
$ git clone[your CSE username].git
$ cd cse344-[your CSE username]
This will make a complete replica of the repository locally. If you get an error that looks like:
Cloning into 'cse344-[your CSE username]'...
Permission denied (publickey).
fatal: Could not read from remote repository.
... then there is a problem with your GitLab configuration. Check to make sure that your GitLab username matches the repository suffix, that your private key is in your SSH directory (`~/.ssh`) and has the correct permissions, and that you can view the repository through the website.
Cloning will make a complete replica of the homework repository locally. Any time you `commit` and `push` your local changes, they will appear in the GitLab repository. Since we'll be grading the copy in the GitLab repository, it's important that you remember to push all of your changes!
### Adding an upstream remote
The repository you just cloned is a replica of your own private repository on GitLab.
The copy on your file system is a local copy, and the copy on GitLab is referred to as the `origin` remote copy. You can view a list of these remote links as follows:
$ git remote -v
There is one more level of indirection to consider.
When we created your `cse344-[your CSE username]` repository, we forked a copy of it from another
repository `cse344-2019wi`. In `git` parlance, this "original repository" is referred to as an `upstream` repository.
When we release bug fixes and subsequent homeworks, we will put our changes into the upstream repository, and you will need to be able to pull those changes into your own. See [the documentation]( for more details on working with remotes -- they can be confusing!
In order to be able to pull the changes from the upstream repository, we'll need to record a link to the `upstream` remote in your own local repository:
$ # Note that this repository does not have your username as a suffix!
$ git remote add upstream
For reference, your final remote configuration should read like the following when it's set up correctly:
$ git remote -v
origin[your CSE username].git (fetch)
origin[your CSE username].git (push)
upstream (fetch)
upstream (push)
In this configuration, the `origin` (default) remote links to **your** repository
where you'll be pushing your individual submission. The `upstream` remote points to **our**
repository where you'll be pulling subsequent homework and bug fixes (more on this below).
Let's test out the origin remote by doing a push of your master branch to GitLab. Do this by issuing the following commands:
$ touch empty_file
$ git add empty_file
$ git commit empty_file -m 'Testing git'
$ git push # ... to origin by default
The `git push` tells git to push all of your **committed** changes to a remote. If none is specified, `origin` is assumed by default (you can be explicit about this by executing `git push origin`). Since the `upstream` remote is read-only, you'll only be able to `pull` from it -- `git push upstream` will fail with a permission error.
After executing these commands, you should see something like the following:
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 286 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To[your CSE username].git
cb5be61..9bbce8d master -> master
We pushed a blank file to our origin remote, which isn't very interesting. Let's clean up after ourselves:
$ # Tell git we want to remove this file from our repository
$ git rm empty_file
$ # Now commit all pending changes (-a) with the specified message (-m)
$ git commit -a -m 'Removed test file'
$ # Now, push this change to GitLab
$ git push
If you don't know Git that well, this probably seemed very arcane. Just keep using Git and you'll understand more and more. We'll provide explicit instructions below on how to use these commands to actually indicate your final lab solution.
### Pulling from the upstream remote
If we release additional details or bug fixes for this homework,
we'll push them to the repository that you just added as an `upstream` remote. You'll need to `pull` and `merge` them into your own repository. (You'll also do this for subsequent homeworks!) You can do both of these things with the following command:
$ git pull upstream master
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
* branch master -> FETCH_HEAD
7f81148..b0c4a3e master -> upstream/master
Merge made by the 'recursive' strategy. | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Here we pulled and merged changes to the `` file. Git may open a text editor to allow you to specify a merge commit message; you may leave this as the default. Note that these changes are merged locally, but we will eventually want to push them to the GitLab repository (`git push`).
Note that it's possible that there aren't any pending changes in the upstream repository for you to pull. If so, `git` will tell you that everything is up to date.
### Collaboration
All CSE 344 assignments are to be completed **INDIVIDUALLY**! However, you may discuss your high-level approach to solving each lab with other students in the class.
### Submitting your assignment
You may submit your code multiple times; we will use the latest version you submit that arrives
before the deadline.
Put your `.sql` files in `hw1/submission`. (A reminder: for homework 1, each of your .sql files must be executable by sqlite.)
Your directory structure should
look like this after you have completed the assignment:
cse344-[your CSE username]
\-- # script for turning in hw
\-- hw1
\-- # this is the file that you are currently reading
\-- submission
\-- hw1-q1.sql # your solution to question 1
\-- hw1-q2.sql # your solution to question 2
\-- hw1-q3.sql # your solution to question 3
**Important**: In order for your write-up to be added to the git repo, you need to explicitly add it:
$ cd submission
$ git add hw1-q1.sql hw1-q2.sql ...
Or if you do
$ git add submission
Then it will add *all* the files inside the `submission` directory to the repo. Finally, you need to commit and push the changes:
$ git commit -a -m 'my latest changes are here (or any message you want)'
$ git push
The flag `-a` means "commit all changes" (the easiest way); you can also manually select which files you want to commit. Commit and push as often as you want to save your homework on GitLab before the deadline. When you want to submit, you should also add the tag `hw1` before pushing, by running
$ git tag hw1
The criterion for your homework being submitted on time is that your code must be tagged and pushed by the due date and time. This means that if one of the TAs or the instructor were to open up GitLab, they would be able to see your solutions on the GitLab web page.
**Just because your code has been committed on your local machine does not mean that it has been submitted -- it needs to be on GitLab!**
The tag can only be created once; if you want to update your submission, you need to delete it, then re-tag. There is a convenient bash script `` in the root level directory of your repository that does this automatically. You first commit, then run ``, which deletes any prior tag for the current lab, tags the current commit, and pushes the branch and tag to GitLab. If you are using Linux or Mac OSX, you should be able to run the following:
$ ./ hw1
You should see something like the following output:
$ ./ hw1
[master b155ba0] Homework 1
1 file changed, 1 insertion(+)
Deleted tag 'hw1' (was b26abd0)
To[your CSE username].git
- [deleted] hw1
Counting objects: 11, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 448 bytes | 0 bytes/s, done.
Total 6 (delta 3), reused 0 (delta 0)
To[your CSE username].git
ae31bce..b155ba0 master -> master
Counting objects: 1, done.
Writing objects: 100% (1/1), 152 bytes | 0 bytes/s, done.
Total 1 (delta 0), reused 0 (delta 0)
To[your CSE username].git
* [new tag] hw1 -> hw1
#### Final Word of Caution!
Git is a distributed version control system. This means everything operates offline until you run `git pull` or `git push`. This is a great feature.
The bad thing is that you may **forget to `git push` your changes**. This is why we strongly, strongly suggest that you **check GitLab to be sure that what you want us to see matches up with what you expect**. As a second sanity check, you can re-clone your repository in a different directory to confirm the changes:
$ git clone[your CSE username].git confirmation_directory
$ cd confirmation_directory
$ # ... make sure everything is as you expect ...
put your .sql files in this directory, one file per question.
# CSE 344 Homework 2: Basic SQL Queries
**Objectives:** To create and import databases and to practice simple SQL queries using SQLite.
**Assignment tools:** [SQLite 3](, the flights dataset hosted in `hw2` directory
on gitlab.
(Reminder: To extract the content of a tar file, run the following command in the terminal of your VM, after navigating to the directory containing `flights-small.tar.gz`:
`tar zxvf flights-small.tar.gz`)
**Assigned date:** Tuesday, January 15, 2019
**Due date:** Tuesday, January 22, 2019. You have 1 week for this assignment. Turn in your solution using the provided script.
**Questions:** Make sure you post them on [Piazza](
**What to turn in:**
`create-tables.sql` and `hw2-q1.sql`, `hw2-q2.sql`, etc (see below).
## Obtaining Changes
After you have set the upstream master correctly (see HW1 for details), run `git pull upstream master` in your repository directory and it will pull the HW2 contents (including this file) into your repository. We will announce on Piazza if we have pushed major changes after releasing the HW (and if so you should pull again).
## Assignment Details
In this homework, you will write several SQL queries on a relational flights database.
The data in this database is abridged from the [Bureau of Transportation Statistics](
The database consists of four tables regarding a subset of flights that took place in 2015:
FLIGHTS (fid int,
         month_id int,          -- 1-12
         day_of_month int,      -- 1-31
         day_of_week_id int,    -- 1-7, 1 = Monday, 2 = Tuesday, etc
         carrier_id varchar(7),
         flight_num int,
         origin_city varchar(34),
         origin_state varchar(47),
         dest_city varchar(34),
         dest_state varchar(46),
         departure_delay int,   -- in mins
         taxi_out int,          -- in mins
         arrival_delay int,     -- in mins
         canceled int,          -- 1 means canceled
         actual_time int,       -- in mins
         distance int,          -- in miles
         capacity int,
         price int              -- in $
         )
CARRIERS (cid varchar(7), name varchar(83))
MONTHS (mid int, month varchar(9))
WEEKDAYS (did int, day_of_week varchar(9))
(FYI All data except for the capacity and price columns are real.)
We leave it up to you to decide how to declare these tables and translate their types to sqlite.
But make sure that your relations include all the attributes listed above.
In addition, make sure you impose the following constraints to the tables above:
- The primary key of the `FLIGHTS` table is `fid`.
- The primary keys for the other tables are `cid`, `mid`, and `did` respectively. Other than these, *do not assume any other attribute(s) is a key / unique across tuples.*
- `Flights.carrier_id` references `Carriers.cid`
- `Flights.month_id` references `Months.mid`
- `Flights.day_of_week_id` references `Weekdays.did`
We provide the flights database as a set of plain-text data files in the linked
`.tar.gz` archive. Each file in this archive contains all the rows for the named table, one row per line.
In this homework, you need to do two things:
1. import the flights dataset into SQLite
2. run SQL queries to answer a set of questions about the data.
To import the flights database into SQLite, you will need to run sqlite3 with a new database file,
for example `sqlite3 hw2.db`. Then you can run `CREATE TABLE` statements to create the tables,
choosing appropriate types for each column and specifying all key constraints as described above:
CREATE TABLE table_name ( ... );
Currently, SQLite does not enforce foreign keys by default.
To enable foreign keys use the following command.
The command will have no effect if you installed your own version of SQLite and it was not compiled with foreign keys enabled.
In that case do not worry about it (i.e., you will need to enforce foreign key constraints yourself as
you insert data into the table).
PRAGMA foreign_keys=ON;
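To see what the pragma changes, the following sketch turns foreign keys on and then attempts an insert that violates a reference. It uses Python's built-in `sqlite3` module and an in-memory database. The `Authors` and `Books` tables are hypothetical stand-ins, not the homework schema, and the sketch assumes the SQLite library bundled with your Python was compiled with foreign key support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # applies to this connection only

conn.execute("CREATE TABLE Authors (aid INT PRIMARY KEY, name VARCHAR(40))")
conn.execute(
    "CREATE TABLE Books ("
    " bid INT PRIMARY KEY,"
    " author_id INT REFERENCES Authors(aid),"
    " title VARCHAR(80))"
)
conn.execute("INSERT INTO Authors VALUES (1, 'Ada')")
conn.execute("INSERT INTO Books VALUES (10, 1, 'Notes')")  # parent row exists

try:
    conn.execute("INSERT INTO Books VALUES (11, 99, 'Orphan')")  # no author 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The pragma reports 1 when enforcement is on; a build without foreign key
# support returns no row at all, which is treated as "off" here.
row = conn.execute("PRAGMA foreign_keys").fetchone()
fk_enabled = bool(row and row[0])
book_count = conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]
print(fk_enabled, rejected, book_count)
```

If `rejected` comes out false on your machine, your SQLite build is most likely ignoring the pragma, which is the situation described above.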
Then, you can use the SQLite `.import` command to read data from each text file into its table after setting the input data to be in CSV (comma separated value) form:
.mode csv
.import filename tablename
See examples of `.import` statements in the section notes, and also look at the SQLite
documentation or sqlite3's help online for details.
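As a rough mental model of what `.import` does in CSV mode, the sketch below reads comma-separated lines and inserts one row per line. This is only a Python analogue built on the standard `csv` and `sqlite3` modules, not the shell's actual implementation. The `Days` table and the three sample lines are invented for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a data file: one row per line, comma-separated, no header line.
sample_file = io.StringIO("1,Mon\n2,Tue\n3,Wed\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Days (did INT PRIMARY KEY, label VARCHAR(9))")

# Every CSV field arrives as text; the INT column's type affinity converts
# '1', '2', '3' to integers when the rows are stored.
rows = list(csv.reader(sample_file))
conn.executemany("INSERT INTO Days VALUES (?, ?)", rows)

loaded = conn.execute("SELECT D.did, D.label FROM Days AS D ORDER BY D.did").fetchall()
print(loaded)
```

For the homework itself you should still use `.mode csv` and `.import` in `create-tables.sql`, since that file must run directly in the `sqlite3` shell.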
Put all the code for this part (four `create table` statements and four `.import` statements)
into a file called `create-tables.sql` inside the `hw2/submission` directory.
### Writing SQL QUERIES (80 points, 10 points each)
**HINT: You should be able to answer all the questions below with SQL queries that do NOT contain any subqueries!**
For each question below, write a single SQL query to answer that question.
Put each of your queries in a separate `.sql` file as in HW1, i.e., `hw2-q1.sql`, `hw2-q2.sql`, etc.
Add a comment in each file indicating the number of rows in the query result.
**Important: The predicates in your queries should correspond to the English descriptions. For example, if a question asks you to find flights by Alaska Airlines Inc., the query should
include a predicate that checks for that specific name as opposed to checking for the matching carrier ID. Same for predicates over months, weekdays, etc.**
**Also, make sure you name the output columns as indicated! Do not change the output column names / return more or fewer columns!**
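A small illustration of both rules, under made-up data: the sketch below filters on a team's name by joining to the lookup table instead of hard-coding its ID, and it names the output column explicitly with `AS`. The `Teams` and `Games` tables are hypothetical and unrelated to the flights schema; Python's built-in `sqlite3` module is used only to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Teams (tid INT PRIMARY KEY, name VARCHAR(20));
    CREATE TABLE Games (gid INT PRIMARY KEY, team_id INT REFERENCES Teams(tid), points INT);
    INSERT INTO Teams VALUES (1, 'Otters'), (2, 'Herons');
    INSERT INTO Games VALUES (100, 1, 7), (101, 2, 3), (102, 1, 5);
    """
)

# The predicate mentions the name 'Otters', as an English question would,
# rather than the internal key value 1.
cursor = conn.execute(
    "SELECT G.points AS points "
    "FROM Games AS G, Teams AS T "
    "WHERE G.team_id = T.tid AND T.name = 'Otters' "
    "ORDER BY G.points"
)
column_names = [d[0] for d in cursor.description]  # output column names
points = [r[0] for r in cursor.fetchall()]
print(column_names, points)
```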
In the questions below, flights **include canceled flights as well, unless otherwise noted.**
Also, when asked to output times you can report them in minutes and don’t need to do minute-hour conversion.
If a query uses a `GROUP BY` clause, make sure that all attributes in your `SELECT` clause for that query
are either grouping keys or aggregate values. SQLite will let you select other attributes but that is wrong
as we discussed in lectures. Other database systems would reject the query in that case.
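The difference is easy to see on a toy table. In the sketch below (Python's built-in `sqlite3`, with a hypothetical `Sales` table that is not part of the homework), the first query follows the rule, while the second selects a column that is neither a grouping key nor an aggregate and SQLite runs it anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Sales (sid INT PRIMARY KEY, region VARCHAR(10), rep VARCHAR(10), amount INT);
    INSERT INTO Sales VALUES (1, 'east', 'ann', 10), (2, 'east', 'bob', 30), (3, 'west', 'cy', 5);
    """
)

# Well-formed: every selected column is a grouping key or an aggregate.
totals = conn.execute(
    "SELECT S.region AS region, SUM(S.amount) AS total "
    "FROM Sales AS S GROUP BY S.region ORDER BY S.region"
).fetchall()

# Ill-formed but accepted by SQLite: S.rep is neither grouped nor aggregated,
# so the rep reported for 'east' is an arbitrary member of that group.
sloppy = conn.execute(
    "SELECT S.region, S.rep, SUM(S.amount) "
    "FROM Sales AS S GROUP BY S.region ORDER BY S.region"
).fetchall()
print(totals, sloppy)
```

Because the value of `rep` in the second result is not determined by the query, code that relies on it may behave differently from one run or one database system to the next.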
1. (10 points) List the distinct flight numbers of all flights from Seattle to Boston by Alaska Airlines Inc. on Mondays.
Also notice that, in the database, the city names include the state. So Seattle appears as
Seattle WA.
Name the output column `flight_num`.
[Hint: Output relation cardinality: 3 rows]
2. (10 points) Find all itineraries from Seattle to Boston on July 15th. Search only for itineraries that have one stop (i.e., flight 1: Seattle -> [somewhere], flight2: [somewhere] -> Boston).
Both flights must depart on the same day (same day here means the date of flight) and must be with the same carrier. It's fine if the landing date is different from the departing date (i.e., in the case of an overnight flight). You don't need to check whether the first flight overlaps with the second one since the departing and arriving time of the flights are not provided.
The total flight time (`actual_time`) of the entire itinerary should be less than 7 hours
(but notice that `actual_time` is in minutes).
For each itinerary, the query should return the name of the carrier, the first flight number,
the origin and destination of that first flight, the flight time, the second flight number,
the origin and destination of the second flight, the second flight time, and finally the total flight time.
Only count flight times here; do not include any layover time.
Name the output columns `name` as the name of the carrier, `f1_flight_num`, `f1_origin_city`, `f1_dest_city`, `f1_actual_time`, `f2_flight_num`, `f2_origin_city`, `f2_dest_city`, `f2_actual_time`, and `actual_time` as the total flight time. List the output columns in this order.
[Output relation cardinality: 1472 rows]
3. (10 points) Find the day of the week with the longest average arrival delay.
Return the name of the day and the average delay.
Name the output columns `day_of_week` and `delay`, in that order. (Hint: consider using `LIMIT`. Look up what it does!)
[Output relation cardinality: 1 row]
4. (10 points) Find the names of all airlines that ever flew more than 1000 flights in one day
(i.e., a specific day/month, but not any 24-hour period).
Return only the names of the airlines. Do not return any duplicates
(i.e., airlines with the exact same name).
Name the output column `name`.
[Output relation cardinality: 12 rows]
5. (10 points) Find all airlines that had more than 0.5 percent of their flights out of Seattle be canceled.
Return the name of the airline and the percentage of canceled flights out of Seattle.
Order the results by the percentage of canceled flights in ascending order.
Name the output columns `name` and `percent`, in that order.
[Output relation cardinality: 6 rows]
6. (10 points) Find the maximum price of tickets between Seattle and New York, NY (i.e. Seattle to NY or NY to Seattle).
Show the maximum price for each airline separately.
Name the output columns `carrier` and `max_price`, in that order.
[Output relation cardinality: 3 rows]
7. (10 points) Find the total capacity of all direct flights that fly between Seattle and San Francisco, CA on July 10th (i.e. Seattle to SF or SF to Seattle).
Name the output column `capacity`.
[Output relation cardinality: 1 row]
8. (10 points) Compute the total departure delay of each airline
across all flights.
Name the output columns `name` and `delay`, in that order.
[Output relation cardinality: 22 rows]
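For the `LIMIT` hint in question 3, here is a generic sketch of the pattern on unrelated data: aggregate per group, sort by the aggregate, and keep only the first row. The `Readings` table and its values are hypothetical, and the code uses Python's built-in `sqlite3` module purely as a way to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Readings (rid INT PRIMARY KEY, sensor VARCHAR(10), value INT);
    INSERT INTO Readings VALUES
      (1, 'north', 4), (2, 'north', 8), (3, 'south', 9), (4, 'south', 1), (5, 'east', 7);
    """
)

# Averages are north = 6.0, south = 5.0, east = 7.0; sorting in descending
# order and keeping one row leaves only the sensor with the highest average.
top = conn.execute(
    "SELECT R.sensor AS sensor, AVG(R.value) AS avg_value "
    "FROM Readings AS R GROUP BY R.sensor "
    "ORDER BY avg_value DESC LIMIT 1"
).fetchall()
print(top)
```

Note that `LIMIT 1` returns a single row even when several groups tie for first place, so it fits questions that ask for exactly one answer.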
### Programming style
Remember to adhere to the SQL programming style from HW1. We repeat this below for your reference.
To encourage good SQL programming style please follow these two simple style rules:
- Give explicit names to all tables referenced in the `FROM` clause.
For instance, instead of writing:
select * from flights, carriers where carrier_id = cid
write:
select * from flights as F, carriers as C where F.carrier_id = C.cid
(notice the `as`) so that it is clear which table you are referring to.
- Similarly, references to all attributes must be qualified by the table name.
Instead of writing:
select * from flights where fid = 1
write:
select * from flights as F where F.fid = 1
This will be useful when you write queries involving self joins in later assignments.
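To show why the two style rules matter for self joins, the sketch below joins a table with itself. Without the aliases `E` and `M`, a name such as `name` or `sid` would be ambiguous. The `Staff` table is a hypothetical example, run here through Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Staff (sid INT PRIMARY KEY, name VARCHAR(20), manager_id INT);
    INSERT INTO Staff VALUES (1, 'Rae', NULL), (2, 'Sol', 1), (3, 'Tam', 1);
    """
)

# The same table plays two roles: E is the employee, M is that employee's manager.
pairs = conn.execute(
    "SELECT E.name AS employee, M.name AS manager "
    "FROM Staff AS E, Staff AS M "
    "WHERE E.manager_id = M.sid "
    "ORDER BY E.name"
).fetchall()
print(pairs)
```

Rae has no manager (`manager_id` is `NULL`), so she matches no row of `M` and does not appear as an employee in the result.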
To help you check on whether your query is compliant with the above, use the [Cosette tool]( developed by the [UW Database research group]( Cosette is built to do more than syntax checking. To use Cosette, sign up for an account using your *UW email*, read through the brief tutorial, and put your query as one of the input queries (make sure you declare the input table schemas correctly). If your query contains stylistic errors, you will get a "Syntax Error" in the results pane. Otherwise, it will return whether the two input queries are equivalent or not.
Note: Cosette currently can only check the syntax for read queries (i.e., no inserts, updates, or deletes).
## Submission Instructions
Answer each of the queries above and put your SQL query in a separate file.
Call them `hw2-q1.sql`, `hw2-q2.sql`, etc. Make sure you name the files exactly as shown. Put your
`.sql` files inside `hw2/submission` (along with your `create-tables.sql` from
the first part of this assignment).
Like HW1, you may submit your code multiple times; we will use the latest version you submit that
arrives before the deadline.
**Important**: To remind you, in order for your answers to be added to the git repo,
you need to explicitly add each file:
$ git add hw2-q1.sql hw2-q2.sql ...
**Again, just because your code has been committed on your local machine does not mean that it has been
submitted -- it needs to be on GitLab!**
Use the same bash script `` in the root level directory of your repository that
commits your changes, deletes any prior tag for the current lab, tags the current commit,
and pushes the branch and tag to GitLab.
If you are using the Linux VM or Mac OSX, you should be able to run the following: