Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
# CSE 544 Homework 2: Finding the Mitochondrial Eve
**Objectives:**
To understand how queries are translated into the relational algebra. To master writing relational queries in a logic formalism using datalog.
**Assignment tools:**
Part 1: pen and paper; Part 2: Soufflé
**Assigned date:** January 21st, 2018
**Due date:** February 2nd, 2018
**What to turn in:** `hw2-q1.txt`, `hw2-q2.txt`, `hw2-q3.dl` along with its output `hw2-q3-1.ans`, `hw2-q3-2.ans`, `hw2-q3-3.ans`, `hw2-q3-4.ans`, `hw2-q3-5.ans`(see details below)
**Resources:**
- Soufflé (https://github.com/souffle-lang/souffle)
- Soufflé [language documentation](http://souffle-lang.org/docs/datalog/)
- [Soufflé tutorial](http://souffle-lang.org/pdf/SoufflePLDITutorial.pdf)
- Starter code in your personal repo for Part 2.
- General information for Part 2:
- The [Mitochondrial Eve](https://en.wikipedia.org/wiki/Mitochondrial_Eve)
- List of [women in the Bible](https://en.wikipedia.org/wiki/List_of_women_in_the_Bible)
- List of [minor biblical figures](https://en.wikipedia.org/wiki/List_of_minor_biblical_figures,_A%E2%80%93K)
- Note that the parent-child relationship is randomly generated and may change.
## Assignment Details
### Part 1: Warm Up with Relational Algebra
1. (10 points) Write the equivalent SQL query to this [relational algebra plan](figs/ra.pdf "Relational Algebra Plan"). Save your answer in `hw2-q1.txt`.
2. (10 points) Write a relational algebra plan for the following SQL query:
```sql
select a.p
from person_living a, male b
where a.p = b.name and
not exists (select *
from parent_child c, female d
where c.p1=d.name and c.p2=a.p)
```
You do not need to draw the query plan as a tree and can use the linear style instead. To make precedence clear, we ask you to break down your query plan by using *at most one* operator on each line. For example, given the query in question 1, you could write it as:
```sh
T1(x,p1,p2) = person_living(x) Join[x=p1] parent_child(p1,p2)
T2(p3,p4) = rename[p3,p4] parent_child(p3,p4)
T3(x,p1,p2,p3,p4) = T1(x,p1,p2) Join[p2=p3] T2(p3,p4)
T4(p1,p2,y) = GroupBy[p1,p2,count(*)->y] T3(x,p1,p2,p3,p4)
T5(p1,z) = GroupBy[p1,max(y)->z] T4(p1,p2,y)
```
where `T1`, `T2`, etc are temporary relations. Note that each line has at most one relational operator. You do not need to use the Greek symbols if you prefer. You also don't need to distinguish among the different flavors of join (just make sure that you write out the full join predicate).
Save your answer in `hw2-q2.txt`.
### Part 2. Finding the Mitochondrial Eve
Every human has a mother, who had her own mother, who in turn had her own mother. The matrilineal ancestor of an individual consists of the mother, the mother’s mother, and so on, following only the female lineage. A matrilinial common ancestor, MCA, is a matrilinial ancestor of all living humans. An MCA is very, very likely to exist (why?), and in fact there are many MCAs. The matrilineal most recent ancestor, or MRCA, is the only individual (woman) who is the MCA of all living humans and is the most recent such. Who is she? When did she live? In the 1980s three researchers, Cann, Stoneking and Wilson, analyzed the mitocondrial DNA of living humans and determined that the MRCA lived about 200,000 years ago. The researchers called her the [Mithcondrial Eve](https://en.wikipedia.org/wiki/Mitochondrial_Eve).
In this homework, you will analyze a database of 800 individuals, compute several things, culminating with the the computation of the Mithocondrial Eve. The genealogy database consists of over 800 biblical names, obtained from Wikipedia, with a randomly generated parent-child relationship.
### Getting Started
1. Install Soufflé
1. **Mac user**
* Download the [souffle-1.2.0.pkg](https://github.com/souffle-lang/souffle/releases/tag/1.2.0)
2. **Windows user**
* To ease the installation process, we recommand using the pre-built version of Soufflé on Debian
* Download the [VMPlayer](https://my.vmware.com/en/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/12_0)
* Download the [Debian Image](https://www.debian.org/distrib/netinst). Make sure you install the amd64 version.
* When VMplayer starts running, click on the "Open a Virtual Machine" link. Navigate to the folder where you sotre the Debian Image. Click "OK". Then click on the left-side tab that appears containing the VM name. Click "Play virtual machine".
* When Debian is setup, obtain the pre-built package [souffle_1.2.0-1_amd64.deb](https://github.com/souffle-lang/souffle/releases/tag/1.2.0)
* Open a terminal and navigate to the location where you downloaded the package (which is probably `~/Downloads`)
* Then type `sudo apt install ./souffle_1.2.0-1_amd64.deb`
2. Verify Soufflé is working:
```
$ cd hw2/starter-code
$ souffle hw2-q3.dl
```
Congratulations! You just ran your first datalog query.
### Questions
For each question below, write in the file `hw2-q3.dl` a program that computes the answer to that question. See the Example section below.
1. (10 points) Find all descendants of Priscilla and their descriptions. Name your predicate `p1(x,d)`. Write the output to a file called `hw2-q3-1.ans`(123 rows)
2. (10 points) Find the woman/women with the largest number of children and the man/men with the largest number of children. For each individual, you should return the name of that individual, his/her description, and the number of children. Name your predicate `p2(x,d,n)`. Write the output to a file called `hw2-q3-2.ans`(2 rows)
3. (20 points) For each person x, we call a "complete lineage" any sequence x0=x, x1, x2, … , xn where each person is the parent of the previous person, and the last person has no parents; the length of the sequence is n. If x has a complete lineage of length n, then we also say that "x is in generation n". Compute the minimum and maximum generation of each living person x.
Name your predicate `p3(x,m1,m2)`, where x is a living person, and `m1`, `m2` are the minimal/maximal generation. (Hint: You may want to first compute all generations for all x: think about when can you say that x is in generation 0, and when can you say that x is in generation n+1. Of course x can be in multiple generations, e.g., x's mother is in generation 0 and x's father is in generation 2. Once you know everybody's generations, you can answer the problem easily.) Write the output to a file called `hw2-q3-3.ans` (22 rows)
4. (20 points) Compute all matrilineal common ancestors, MCA. Name your predicate `p4(x)`. Write the output to a file called `hw2-q3-4.ans` (6 rows)
5. (20 points) Find the mitochondrial Eve. Name your predicate `p5(x)`. Remember that you can utilize your predicates defined earlier. Write the output to a file called `hw2-q3-5.ans` (1 row)
#### Example
For example, suppose the question were: find all children of Priscilla; return their names and their descriptions. Then you write this in the `hw3-q3.dl` file (it’s already there):
```c
.output p0(IO=stdout)
p0(x,d) :- parent_child("Priscilla",x), person(x,d). //NOTE the period at the end
```
## Submission Instructions
For Part 1, write your answers in a file `hw4-q1.txt`, and `hw4-q2.txt`.
For part 2, write your answers in the provided file `hw2-q3.dl` and name the output generated from p1, p2, p3, p4, p5: `hw2-q3-1.ans`, `hw2-q3-2.ans`, `hw2-q3-3.ans`, `hw2-q3-4.ans`, `hw2-q3-5.ans`
**Important**: To remind you, in order for your answers to be added to the git repo,
you need to explicitly add each file:
```sh
$ git add *.txt *.ans
```
**Again, just because your code has been committed on your local machine does not mean that it has been
submitted -- it needs to be on GitLab!**
Use the same bash script `turnInHw.sh` in the root level directory of your repository that
commits your changes, deletes any prior tag for the current lab, tags the current commit,
and pushes the branch and tag to GitLab.
If you are using Linux or Mac OSX, you should be able to run the following:
```sh
$ ./turnInHw.sh hw2
```
Like previous assignments, make sure you check the results afterwards to make sure that your file(s)
have been committed.