"In this lesson, we'll review the dictionary features and learn about the CSV data file format. By the end of this lesson, students will be able to:\n",
"\n",
"- Identify the list of dictionaries corresponding to some CSV data.\n",
"- Loop over a list of dictionaries (CSV rows) and access dictionary values (CSV columns)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77612e82-7c52-4a84-9de7-e30d90011783",
"metadata": {},
"outputs": [],
"source": [
"import doctest"
]
},
{
"cell_type": "markdown",
"id": "6859110d-2774-44a0-9dba-7751100c1481",
"metadata": {},
"source": [
"## Review: Dictionary functions\n",
"\n",
"Dictionaries, like lists, are also mutable data structures so they have functions to help store and retrieve elements.\n",
"\n",
"- `d.pop(key)` removes `key` from `d`.\n",
"- `d.keys()` returns a collection of all the keys in `d`.\n",
"- `d.values()` returns a collection of all the values in `d`.\n",
"- `d.items()` returns a collection of all `(key, value)` tuples in `d`.\n",
"\n",
"There are different ways to loop over a dictionary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ca794c6-2d26-4cce-8096-50070ecb0bb3",
"metadata": {},
"outputs": [],
"source": [
"dictionary = {\"a\": 1, \"b\": 2, \"c\": 3}\n",
"for key in dictionary:\n",
" print(key, dictionary[key])"
]
},
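{
"cell_type": "markdown",
"id": "4d8f2a6e-1b3c-4e5f-8a9b-0c1d2e3f4a5b",
"metadata": {},
"source": [
"Here is a quick sketch demonstrating the dictionary functions listed above on the same small dictionary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a1b2c3d-4e5f-4a6b-8c9d-0e1f2a3b4c5d",
"metadata": {},
"outputs": [],
"source": [
"dictionary = {\"a\": 1, \"b\": 2, \"c\": 3}\n",
"print(dictionary.keys())    # collection of all keys\n",
"print(dictionary.values())  # collection of all values\n",
"print(dictionary.items())   # collection of all (key, value) tuples\n",
"print(dictionary.pop(\"a\"))  # removes \"a\" and returns its value, 1\n",
"print(dictionary)           # \"a\" is no longer in the dictionary"
]
},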
{
"cell_type": "markdown",
"id": "c806f16e-f0c1-45a8-9e91-38114b94af42",
"metadata": {},
"source": [
"## None in Python\n",
"\n",
"In the lesson on File Processing, we saw a function to count the occurrences of each token in a file as a `dict` where the keys are words and the values are counts.\n",
"\n",
"Let's **debug** the following function `most_frequent` that takes a dictionary as *input* and returns the word with the highest count. If the input were a list, we could index the zero-th element from the list and loop over the remaining values by slicing the list. But it's harder to do this with a dictionary.\n",
"\n",
"Python has a special `None` keyword, like `null` in Java, that represents a placeholder value."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "084590d8-f37f-4852-bc76-f8ebbb0a7b7b",
"metadata": {},
"outputs": [],
"source": [
"def most_frequent(counts):\n",
" \"\"\"\n",
" Returns the token in the given dictionary with the highest count, or None if empty.\n",
"When we need keys and values, we can loop over and unpack each key-value pair by looping over the `dictionary.items()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7791f854-9d73-4a11-aad4-cf2b8a07b7f8",
"metadata": {},
"outputs": [],
"source": [
"dictionary = {\"a\": 1, \"b\": 2, \"c\": 3}\n",
"for key, value in dictionary.items():\n",
" print(key, value)"
]
},
{
"cell_type": "markdown",
"id": "abf82976-1a50-4125-9920-fff642782996",
"metadata": {},
"source": [
"Loop unpacking is not only useful for dictionaries, but also for looping over other sequences such as `enumerate` and `zip`. `enumerate` is a built-in function that takes a sequence and returns another sequence of pairs representing the element index and the element value."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "735741f6-b5df-4548-8b23-26ca0bd67cb0",
"metadata": {},
"outputs": [],
"source": [
"with open(\"poem.txt\") as f:\n",
" for i, line in enumerate(f.readlines()):\n",
" print(i, line[:-1])"
]
},
{
"cell_type": "markdown",
"id": "b55e4520-de85-40af-9dcf-da497e0fa675",
"metadata": {},
"source": [
"`zip` is another built-in function that takes one or more sequences and returns a *sequence of tuples* consisting of the first element from each given sequence, the second element from each given sequence, etc. If the sequences are not all the same length, `zip` stops after yielding all elements from the shortest sequence."
"for arabic, alpha, roman in zip(arabic_nums, alpha_nums, roman_nums):\n",
" print(arabic, alpha, roman)"
]
},
{
"cell_type": "markdown",
"id": "d994834c-2bf7-4ab4-ba85-8d32d1cc89be",
"metadata": {},
"source": [
"## Comma-separated values\n",
"\n",
"In data science, we often work with tabular data such as the following table representing the names and hours of some of our TAs.\n",
"\n",
"Name | Hours\n",
"-----|-----:\n",
"Diana | 10\n",
"Thrisha | 15\n",
"Yuxiang | 20\n",
"Sheamin | 12\n",
"\n",
"A **table** has two main components to it:\n",
"\n",
"- **Rows** corresponding to each entry, such as each individual TA.\n",
"- **Columns** corresponding to (required or optional) fields for each entry, such as TA name and TA hours.\n",
"\n",
"A **comma-separated values** (CSV) file is a particular way of representing a table using only plain text. Here is the corresponding CSV file for the above table. Each row is separated with a newline. Each column is separated with a single comma `,`.\n",
"\n",
"```\n",
"Name,Hours\n",
"Diana,10\n",
"Thrisha,15\n",
"Yuxiang,20\n",
"Sheamin,12\n",
"```\n",
"\n",
"We'll learn a couple ways of processing CSV data in this course, first of which is representing the data as a **list of dictionaries**."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "199d463e-9ecd-46c1-aacc-1bdffae1c00c",
"metadata": {},
"outputs": [],
"source": [
"staff = [\n",
" {\"Name\": \"Yuxiang\", \"Hours\": 20},\n",
" {\"Name\": \"Thrisha\", \"Hours\": 15},\n",
" {\"Name\": \"Diana\", \"Hours\": 10},\n",
" {\"Name\": \"Sheamin\", \"Hours\": 12},\n",
"]\n",
"staff"
]
},
{
"cell_type": "markdown",
"id": "8ec2eeac-0253-4bc3-b5b9-ec708d56a425",
"metadata": {},
"source": [
"To see the total number of TA hours available, we can loop over the list of dictionaries and sum the \"Hours\" value."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55dc7fda-d50c-4a74-8771-6ae1154bc0e8",
"metadata": {},
"outputs": [],
"source": [
"total_hours = 0\n",
"for ta in staff:\n",
" total_hours += ta[\"Hours\"]\n",
"total_hours"
]
},
{
"cell_type": "markdown",
"id": "6e76d800-bf01-4a9a-8578-1f187486fbd3",
"metadata": {},
"source": [
"What are some different ways to get the value of Thrisha's hours?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "930fd021-24ea-4d57-b0bd-762eccf3a88c",
"metadata": {},
"outputs": [],
"source": [
"for ta in staff:\n",
" if ta[\"Name\"] == \"Thrisha\":\n",
" print(ta[\"Hours\"])"
]
},
{
"cell_type": "markdown",
"id": "f8b8eb32",
"metadata": {},
"source": [
"Poll Question: select the right option"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d34d58f",
"metadata": {},
"outputs": [],
"source": [
"staff[1][\"Hours\"]\n",
"staff[\"Hours\"][1]\n",
"staff[\"Thrisha\"][\"Hours\"]\n",
"staff[\"Hours\"][\"Thrisha\"]"
]
},
{
"cell_type": "markdown",
"id": "f84463d4",
"metadata": {},
"source": [
"## Reading CSV files using Python's built-in csv package\n",
"Suppose we have a dataset of earthquakes around the world stored in the CSV file `earthquakes.csv`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53ae2878",
"metadata": {},
"outputs": [],
"source": [
"import csv"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d631d4a9",
"metadata": {},
"outputs": [],
"source": [
"earthquakes = []\n",
"with open(\"materials/earthquakes.csv\") as f:\n",
" reader = csv.DictReader(f)\n",
" for row in reader:\n",
" earthquakes.append(row)\n",
"earthquakes[:5]"
]
},
{
"cell_type": "markdown",
"id": "f87d17f0",
"metadata": {},
"source": [
"`csv.DictWriter` also exists; you can do the following to write a row into a csv file:\n",
"- `writeheader()`: Write a row with the field names (as specified in the constructor) to the writer’s file object.\n",
"- `writerow(row)` or `writerows(rows)`: Write the row/rows parameter to the writer’s file object.\n",
"\n",
"Here, `row` is a dictionary and `rows` is a list of dictionaries."
]
},
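{
"cell_type": "markdown",
"id": "5f6e7d8c-9b0a-4c1d-8e2f-3a4b5c6d7e8f",
"metadata": {},
"source": [
"For example, here is a minimal sketch that writes the `staff` list of dictionaries from earlier to a hypothetical output file `staff.csv` (the filename is just for illustration)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a1b2c3d-4e5f-4678-9abc-def012345678",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: write the staff list of dictionaries to a hypothetical staff.csv\n",
"with open(\"staff.csv\", \"w\", newline=\"\") as f:\n",
"    writer = csv.DictWriter(f, fieldnames=[\"Name\", \"Hours\"])\n",
"    writer.writeheader()     # header row: Name,Hours\n",
"    writer.writerows(staff)  # one row per dictionary in the list"
]
},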
{
"cell_type": "markdown",
"id": "12e0a542-abb4-4583-8abe-af435c250162",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Practice: Largest earthquake place\n",
"\n",
"Write a function `largest_earthquake_place` that takes the earthquake `data` represented as a list of dictionaries and returns the name of the location that experienced the largest earthquake. If there are no rows in the dataset (no data at all), return `None`.\n",
"\n",
"id | year | month | day | latitude | longitude | name | magnitude\n",
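{
"cell_type": "markdown",
"id": "8c9d0e1f-2a3b-4c5d-8e6f-7a8b9c0d1e2f",
"metadata": {},
"source": [
"One possible approach is sketched below, assuming the `magnitude` values read by `csv.DictReader` are strings that should be compared numerically."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f2e3d4c-5b6a-4978-8c7d-6e5f4a3b2c1d",
"metadata": {},
"outputs": [],
"source": [
"def largest_earthquake_place(data):\n",
"    \"\"\"\n",
"    Returns the name of the place that experienced the largest earthquake,\n",
"    or None if there are no rows in the dataset.\n",
"    \"\"\"\n",
"    largest = None  # row with the largest magnitude seen so far\n",
"    for row in data:\n",
"        if largest is None or float(row[\"magnitude\"]) > float(largest[\"magnitude\"]):\n",
"            largest = row\n",
"    if largest is None:\n",
"        return None\n",
"    return largest[\"name\"]\n",
"\n",
"largest_earthquake_place(earthquakes)"
]
},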