Commit 319f76a3 authored by Alvin Cheung

updated HW5 instructions

parent 08bae2a3
......@@ -39,7 +39,7 @@ In this homework, you will be writing SQL++ queries over the semi-structured dat
3. Start the server. Go to the [Asterix documentation website](https://asterixdb.apache.org/docs/0.9.2/index.html) and follow the instructions listed under "Option 1: Using NC Services." Follow the instructions under “Quick Start”.
If you use the home VM, `start-sample-cluster.sh` is located in `<directory that you unzipped the file above>/opt/local/bin`. You can start Asterix by first going to the `bin` directory and then run `JAVA_HOME=/ ./start-sample-cluster.sh` (or `start-sample-cluster.bat` on windows, you might see a few extra windows pop up when it starts, you can ignore those).
If you use the home VM, `start-sample-cluster.sh` is located in `<directory that you unzipped the file above>/opt/local/bin`. You can start Asterix by first going to the `bin` directory and then run `JAVA_HOME=/usr/lib/jvm/java-1.8.0 sudo ./start-sample-cluster.sh` (or `start-sample-cluster.bat` on windows, you might see a few extra windows pop up when it starts, you can ignore those).
Running the script might seemingly perform nothing. But if it works then you should be able to open the web interface in your browser, by visiting `http://localhost:19001` as described in the website.
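Because the start script may print nothing, one way to confirm that something is listening on the web interface port is a quick connection check. The following is a minimal sketch in Python, not part of the Asterix tooling; the helper name `port_is_open` is hypothetical, and the host and port are taken from the `http://localhost:19001` address mentioned above.

```python
import socket


def port_is_open(host="localhost", port=19001, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: it only checks that some process accepts
    connections on the port, not that the process is Asterix.
    """
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open()` a few seconds after running the start script should return `True` once the web interface is available; a `False` result suggests the cluster has not finished starting or failed to start.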
In the web interface, select:
......@@ -131,7 +131,7 @@ In this homework, you will be writing SQL++ queries over the semi-structured dat
```
7. To shutdown Asterix, simply run `stop-sample-cluster.sh` in the terminal. If you are using the home VM, this script is located in
`<directory that you unzipped the file above>/opt/local/bin` (or `opt\local\bin\stop-sample-cluster.bat` on windows). If you are using the VM, go to the `bin` directory and then run `JAVA_HOME=/ ./stop-sample-cluster.sh` to shut down Asterix.
`<directory that you unzipped the file above>/opt/local/bin` (or `opt\local\bin\stop-sample-cluster.bat` on windows). If you are using the VM, go to the `bin` directory and then run `JAVA_HOME=/usr/lib/jvm/java-1.8.0 sudo ./stop-sample-cluster.sh` to shut down Asterix.
### B. Problems (100 points)
......@@ -140,33 +140,33 @@ In this homework, you will be writing SQL++ queries over the semi-structured dat
Use only the `mondial` dataset for problems 1-9. For problems 10-12 we will ask you to load in extra datasets provided in starter code.
1. Retrieve all the names of all cities located in Peru, sorted alphabetically. Name your output attribute ``cities``. [Result Size: 30 rows]
1. Retrieve all the names of all cities located in Peru, sorted alphabetically. Name your output attribute ``city``. [Result Size: 30 rows of `{"name":...}`]
2. For each country return its name, its population, and the number of religions, sorted alphabetically by country. Name your output attributes ``country``, ``population``, ``num_religions``. [Result Size: 238 rows]
2. For each country return its name, its population, and the number of religions, sorted alphabetically by country. Name your output attributes ``country``, ``population``, ``num_religions``. [Result Size: 238 rows of `{"num_religions":..., "country":..., "population":...}` (order of keys can differ)]
3. For each religion return the number of countries where it occurs; order them in decreasing number of countries. Name your output attributes ``religion``, ``num_countries``. [Result size: 37]
3. For each religion return the number of countries where it occurs; order them in decreasing number of countries. Name your output attributes ``religion``, ``num_countries``. [Result size: 37 of `{"religion":..., "num_countries":...}` (order of keys can differ)]
4. For each ethnic group, return the number of countries where it occurs, as well as the total population world-wide of that group. Hint: you need to multiply the ethnicity’s percentage with the country’s population. Use the functions `float(x)` and/or `int(x)` to convert a `string` to a `float` or to an `int`. Name your output attributes ``ethnic_group``, ``num_countries``, ``total_population``. You can leave your final `total_population` as a `float` if you like. [Result Size: 262]
4. For each ethnic group, return the number of countries where it occurs, as well as the total population world-wide of that group. Hint: you need to multiply the ethnicity’s percentage with the country’s population. Use the functions `float(x)` and/or `int(x)` to convert a `string` to a `float` or to an `int`. Name your output attributes ``ethnic_group``, ``num_countries``, ``total_population``. You can leave your final `total_population` as a `float` if you like. [Result Size: 262 of `{"ethnic_group":..., "num_countries":..., "total_population":...}` (order of keys can differ)]
5. Compute the list of all mountains, their heights, and the countries where they are located. Here you will join the "mountain" collection with the "country" collection, on the country code. You should return a list consisting of the mountain name, its height, the country code, and country name, in descending order of the height. Name your output attributes ``mountain``, ``height``, ``country_code``, ``country_name``. [Result Size: 272 rows]
5. Compute the list of all mountains, their heights, and the countries where they are located. Here you will join the "mountain" collection with the "country" collection, on the country code. You should return a list consisting of the mountain name, its height, the country code, and country name, in descending order of the height. Name your output attributes ``mountain``, ``height``, ``country_code``, ``country_name``. [Result Size: 272 rows of `{"mountain":..., "height":..., "country_code":..., "country_name":...}` (order of keys can differ)]
Hint: Some mountains can be located in more than one country. You need to output them for each country they are located in.
6. Compute a list of countries with all their mountains. This is similar to the previous problem, but now you will group the mountains for each country; return both the mountain name and its height. Your query should return a list where each element consists of the country code, country name, and a list of mountain names and heights; order the countries by the number of mountains they contain, in descending order. Name your output attributes ``country_code``, ``country_name``, ``mountain``, ``mountain_height``. [Result Size: 238]
6. Compute a list of countries with all their mountains. This is similar to the previous problem, but now you will group the mountains for each country; return both the mountain name and its height. Your query should return a list where each element consists of the country code, country name, and a list of mountain names and heights; order the countries by the number of mountains they contain, in descending order. Name your output attributes ``country_code``, ``country_name``, ``mountain``, ``mountain_height``. [Result Size: 238 rows of `{"country_code":..., "country_name":..., "mountains": [{"mountain":..., "mountain_height":...}, {"mountain":..., "mountain_height":...}, ...]}` (order of keys can differ)]
7. Find all countries bordering two or more seas. Here you need to join the "sea" collection with the "country" collection. For each country in your list, return its code, its name, and the list of bordering seas, in decreasing order of the number of seas. Name your output attributes ``country_code``, ``country_name``, ``sea``. [Result Size: 74]
7. Find all countries bordering two or more seas. Here you need to join the "sea" collection with the "country" collection. For each country in your list, return its code, its name, and the list of bordering seas, in decreasing order of the number of seas. Name your output attributes ``country_code``, ``country_name``, ``sea``. [Result Size: 74 rows of `{"country_code":..., "country_name":..., "seas": [{"sea":...}, {"sea":...}, ...]}` (order of keys can differ)]
8. Return all landlocked countries. A country is landlocked if it borders no sea. For each country in your list, return its code, its name, in decreasing order of the country's area. Note: this should be an easy query to derive from the previous one. Name your output attributes ``country_code``, ``country_name``, ``area``. [Result Size: 45]
8. Return all landlocked countries. A country is landlocked if it borders no sea. For each country in your list, return its code, its name, in decreasing order of the country's area. Note: this should be an easy query to derive from the previous one. Name your output attributes ``country_code``, ``country_name``, ``area``. [Result Size: 45 rows of `{"country_code":..., "country_name":..., "area":...}` (order of keys can differ)]
9. For this query you should also measure and report the runtime; it may be approximate (warning: it might run for a while) . Find all distinct pairs of countries that share both a mountain and a sea. Your query should return a list of pairs of country names. Avoid including a country with itself, like in (France,France), and avoid listing both (France,Korea) and (Korea,France) (not a real answer). Name your output attributes ``first_country``, ``second_country``. [Result Size: 7]
9. For this query you should also measure and report the runtime; it may be approximate (warning: it might run for a while) . Find all distinct pairs of countries that share both a mountain and a sea. Your query should return a list of pairs of country names. Avoid including a country with itself, like in (France,France), and avoid listing both (France,Korea) and (Korea,France) (not a real answer). Name your output attributes ``first_country``, ``second_country``. [Result Size: 7 rows of `{"first_country":..., "second_country":...}`]
10. Create a new dataverse called hw5index, then run the following commands:
......@@ -190,10 +190,10 @@ Use only the `mondial` dataset for problems 1-9. For problems 10-12 we will ask
Recall from lecture that Asterix only allows creating an index on a top-level collection; hence we provide the country, sea, etc. collections individually even though their data is already included in mondial.
11. Re-run the query from 9. (“pairs of countries that share both a mountain and a sea”) on the new dataverse `hw5index`. Report the new runtime. [Result Size: 7]
11. Re-run the query from 9. (“pairs of countries that share both a mountain and a sea”) on the new dataverse `hw5index`. Report the new runtime. [Result Size: 7 rows of `{"first_country":..., "second_country":...}`]
12. Modify the query from 11. to return, for each pair of countries, the list of common mountains, and the list of common seas. Name your output attributes ``first_country``, ``second_country``, ``mountain``, ``sea``. [Result Size: 7]
12. Modify the query from 11. to return, for each pair of countries, the list of common mountains, and the list of common seas. Name your output attributes ``first_country``, ``second_country``, ``mountain``, ``sea``. [Result Size: 7 rows of `{"first_country":..., "second_country":...}`]
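
Two ideas from the problems above are independent of SQL++ and can be illustrated on their own: the arithmetic in the hint for problem 4, and the rule in problem 9 for avoiding self-pairs and mirrored pairs. The sketch below is plain Python on made-up values, not a query solution. It assumes the percentage is stored as a string on a 0-100 scale, which you should verify against the data; the function names are hypothetical.

```python
def group_population(percentage: str, population: str) -> float:
    """Estimate a group's population in one country.

    Assumption: `percentage` is a string on a 0-100 scale and
    `population` is a string holding an integer count.
    """
    return float(percentage) / 100.0 * int(population)


def distinct_pairs(names):
    """Return each unordered pair of different names exactly once.

    Keeping only pairs with a < b drops (x, x) and keeps just one of
    (x, y) and (y, x).
    """
    return [(a, b) for a in names for b in names if a < b]


print(group_population("50", "1000"))
print(distinct_pairs(["Korea", "France"]))
```

The same comparison trick (a strict `<` between the two sides of a self-join) carries over to a query language, where it replaces the explicit loops.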
## Final Warning for HW6!
......