add lecture

41d80705 · Yuxuan Mei · 7d673bb6 · 41d80705
Commit 41d80705 authored 9 months ago by Yuxuan Mei
--- a/asymptotic-analysis.ipynb
+++ b/asymptotic-analysis.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "7b81ebea-c78a-48a3-bd24-88ffeeeb47ba",
+   "metadata": {},
+   "source": [
+    "# Asymptotic Analysis\n",
+    "\n",
+    "In the past, we've run into questions about efficiency when choosing what data structures to use (preferring sets or dictionaries instead of lists), as well as in designing our software to precompute and save results in the initializer rather than compute them just-in-time. By the end of this lesson, students will be able to:\n",
+    "\n",
+    "- Cache or memoize a function by defining a dictionary or using the `@cache` decorator.\n",
+    "- Analyze the runtime efficiency of a function involving loops.\n",
+    "- Describe the runtime efficiency of a function using asymptotic notation.\n",
+    "\n",
+    "There are many different ways software efficiency can be quantified and studied.\n",
+    "\n",
+    "- **Running Time**, or how long it takes a function to run (e.g. microseconds).\n",
+    "- **Space or Memory**, or how much memory is consumed while running a function (e.g. megabytes of memory).\n",
+    "- **Programmer Time**, or how long it took to implement a function correctly (e.g. hours or days).\n",
+    "- **Energy Consumption**, or how much electricity or natural resources a function takes to run (e.g. kilo-watts from a certain energy source)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e956695c-0e41-4b38-a59a-33af58656336",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install -q nb_mypy\n",
+    "%reload_ext nb_mypy\n",
+    "%nb_mypy mypy-options --strict"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "45284748-5d0a-4c65-a74e-0e11c452a4db",
+   "metadata": {},
+   "source": [
+    "## Data classes\n",
+    "\n",
+    "Earlier, we wrote a `University` class that needed to loop over all the students enrolled in the university in order to compute the list of students enrolled in a particular class. Let's take this opportunity to review this by also introducing one neat feature of Python that allows you to define simple data types called **data classes**. We can define a `Student` data class that takes a `str` name and `dict[str, int]` for courses and the credits for each course with the `@dataclass` **decorator**. Decorators will turn out to be a useful feature for the remainder of this lesson, and it's helpful to see it a more familiar context first."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4c17bf60-fcb1-448b-8b81-2c3a8970e9aa",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dataclasses import dataclass, field\n",
+    "\n",
+    "\n",
+    "@dataclass\n",
+    "class Student:\n",
+    "    \"\"\"Represents a student with a given name and dictionary mapping course names to credits.\"\"\"\n",
+    "    # These are instance fields, not class-wide fields!\n",
+    "    name: str\n",
+    "    courses: dict[str, int] = field(default_factory=dict)\n",
+    "    # field function specifies that the default value for courses is an empty dictionary\n",
+    "\n",
+    "    def get_courses(self) -> list[str]:\n",
+    "        \"\"\"Return a list of all the enrolled courses.\"\"\"\n",
+    "        from time import sleep\n",
+    "        sleep(0.0001) # The time it takes to get your course list from MyUW (jk\n",
+    "        return list(self.courses)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df4cf234-ae83-49b7-9f6f-bc65ffec12ba",
+   "metadata": {},
+   "source": [
+    "The `@dataclass` decorator \"wraps\" the `Student` class in this example by providing many useful dunder methods like `__init__`, `__repr__`, and `__eq__` (used to compare whether two instances are `==` to each other), among many others. This allows you to write code that actually describes your class, rather than the \"boilerplate\" or template code that we always need to write when defining a class.\n",
+    "\n",
+    "Default values for each field can be set with `=` assignment operator, and dataclasses also have a built-in way of handling mutable default parameters like fields that are assigned to lists or dictionaries using the `field` function as shown above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "59fc6500-a4e5-4ad8-892e-6ccdce0366ad",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# let's create a student instance for practice!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "09001a83-8d7a-42b8-8475-0525e770c54b",
+   "metadata": {},
+   "source": [
+    "Let's see how we can use this `Student` data class to implement our `University` class again."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a47c2c02-c821-4aad-974a-962856bf47b9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import Optional\n",
+    "\n",
+    "\n",
+    "class University:\n",
+    "    \"\"\"Represents a university with a name and zero or more enrolled students.\"\"\"\n",
+    "\n",
+    "    def __init__(self, name: str, students: Optional[list[Student]] = None) -> None:\n",
+    "        \"\"\"Initializes a new University with the given name and students (default None).\"\"\"\n",
+    "        self.name = name\n",
+    "        if students is None:\n",
+    "            self.enrolled_students = []\n",
+    "        else:\n",
+    "            self.enrolled_students = students\n",
+    "\n",
+    "        # self.courses: dict[str, list[Student]] = {}\n",
+    "        # for student in self.enrolled_students:\n",
+    "            # for course in student.get_courses():\n",
+    "                # if course not in self.courses:\n",
+    "                    # self.courses[course] = []\n",
+    "                # self.courses[course].append(student)\n",
+    "\n",
+    "    def students(self) -> list[Student]:\n",
+    "        \"\"\"Returns a list of all enrolled students sorted by alphabetical order.\"\"\"\n",
+    "        return sorted(self.enrolled_students, key=Student.get_name)\n",
+    "\n",
+    "    def enroll(self, student: Student) -> None:\n",
+    "        \"\"\"Enrolls the given student in this university.\"\"\"\n",
+    "        self.enrolled_students.append(student)\n",
+    "\n",
+    "    def roster(self, course: str) -> list[Student]:\n",
+    "        \"\"\"Returns a list of all students enrolled in the given course name.\"\"\"\n",
+    "        # if course not in self.courses:\n",
+    "            # return []\n",
+    "        # return self.courses[course]\n",
+    "        students = [student for student in self.enrolled_students if course in student.get_courses()]\n",
+    "        return students\n",
+    "\n",
+    "\n",
+    "uw = University(\"Udub\", [Student(868373, \"nicole.txt\"), Student(123456, \"nicole.txt\")])\n",
+    "uw.students()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "618b57eb-f9db-49ea-a1d5-929c79055276",
+   "metadata": {},
+   "source": [
+    "## Caching\n",
+    "\n",
+    "The approach of precomputing and saving results in the constructor is one way to make a function more efficient than simply computing the result each time we call the function. But these aren't the only two ways to approach this problem. In fact, there's another approach that we could have taken that sits in between these two approaches. We can **cache** or **memoize** the result by computing it once and then saving the result so that we don't have to compute it again when asked for the same result in the future.\n",
+    "\n",
+    "Let's start by copying and pasting the `University` class that we wrote above into the following code cell. Rename the new class to `CachedUniversity` so that we can compare them later."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "78743d7f-f3bc-4844-bc19-155e37399436",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "79fea560-2878-43f0-9820-a0c1abad4b81",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%time uw.roster(\"CSE163\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "203256c7-66d8-4620-a146-49b513c2e491",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%time uw.roster(\"CSE163\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "80fa293b-b505-4c2b-8cd7-7a08a6f9b71a",
+   "metadata": {},
+   "source": [
+    "Okay, so this doesn't seem that much faster since we're only working with 2 students. There are 60,081 students enrolled across the three UW campuses. Let's try generating that many students for our courses."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ecd853e9-b491-4951-adf2-9ca5da5055eb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "names = [\"Thrisha\", \"Kevin\", \"Nicole\"]\n",
+    "students = [Student(names[i % len(names)], {\"CSE163\": 4}) for i in range(60081)]\n",
+    "students[:10]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "278826dd-c2be-4baf-8cbe-7944a53f29c7",
+   "metadata": {},
+   "source": [
+    "Python provides a way to implement this caching feature using the `@cache` decorator. Like the `@dataclass` decorator, the `@cache` decorator helps us focus on expressing just the program logic that we want to communicate."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4f50a799-00a0-4bb2-b653-7eec9f53fb0e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from functools import cache\n",
+    "..."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "35e7c55e-369b-4402-8233-1a9d21633a1f",
+   "metadata": {},
+   "source": [
+    "## Counting steps\n",
+    "\n",
+    "A \"simple\" line of code takes one step to run. Some common simple lines of code are:\n",
+    "\n",
+    "- `print` statements\n",
+    "- Simple arithmetic (e.g. `1 + 2 * 3`)\n",
+    "- Variable assignment or access (e.g. `x = 4`, `1 + x`)\n",
+    "- List or dictionary accesses (e.g. `l[0]`, `l[1] = 4`, `'key' in d`).\n",
+    "\n",
+    "The number of steps for a sequence of lines is the sum of the steps for each line."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "80e5d5a0-2780-4f94-8bf8-408ef289c6a2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = 1\n",
+    "print(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "715fcea0-75be-44f1-a54f-72b077a2764a",
+   "metadata": {},
+   "source": [
+    "The number of steps to execute a loop will be the number of times that loop runs multiplied by the number of steps inside its body.\n",
+    "\n",
+    "The number of steps in a function is just the sum of all the steps of the code in its body. The number of steps for a function is the number of steps made whenever that function is called (e.g if a function has 100 steps in it and is called twice, that is 200 steps total).\n",
+    "\n",
+    "This notion of a \"step\" intentionally vague. We will see later that it's not important that you get the exact number of steps correct as long as you get an approximation of their relative scale (more on this later). Below we show some example code blocks and annotate their number of steps in the first comment of the code block."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c6f503fc-a6be-425f-9cd6-ccf3f2a7d9f9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Total: 4 steps\n",
+    "\n",
+    "print(1 + 2)    # 1 step\n",
+    "print(\"hello\")  # 1 step\n",
+    "x = 14 - 2      # 1 step\n",
+    "y = x ** 2      # 1 step"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ac9adfa5-4f2e-4817-9511-179a44fcba76",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Total: 51 steps\n",
+    "\n",
+    "print(\"Starting loop\")  # 1 step\n",
+    "\n",
+    "# Loop body has a total of 1 step. It runs 30 times for a total of 30\n",
+    "for i in range(30):     # Runs 30 times\n",
+    "    print(i)            #   - 1 step\n",
+    "\n",
+    " # Loop body has a total of 2 steps. It runs 10 times for a total of 20\n",
+    "for i in range(10):     # Runs 10 times\n",
+    "    x = i               #   - 1 step\n",
+    "    y = i**2            #   - 1 step"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0f8b855a-b9f2-45df-8067-96596c219df8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Total: 521 steps\n",
+    "\n",
+    "print(\"Starting loop\")   # 1 step\n",
+    "\n",
+    "# Loop body has a total of 26 steps. It runs 20 times for a total of 520\n",
+    "for i in range(20):      # Runs 20 times\n",
+    "    print(i)             #   - 1 step\n",
+    "\n",
+    "                         #   Loop body has a total of 1 step.\n",
+    "                         #   It runs 25 times for total of 25.\n",
+    "    for j in range(25):  #   - Runs 25 times\n",
+    "        result = i * j   #       - 1 step"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f590ba4f-f1ed-40be-84e7-13ad4f08d2dd",
+   "metadata": {},
+   "source": [
+    "## Big-O notation\n",
+    "\n",
+    "Now that we feel a little bit more confortable counting steps, lets consider two functions `sum1` and `sum2` to compute the sum of numbers from 1 to some input *n*."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5ecbfeb2-7238-4a84-a4b2-067730c78cbd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def sum1(n: int) -> int:\n",
+    "    total = 0\n",
+    "    for i in range(n + 1):\n",
+    "        total += i\n",
+    "    return total\n",
+    "\n",
+    "\n",
+    "def sum2(n: int) -> int:\n",
+    "    return n * (n + 1) // 2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b066f68-4a97-48cd-8479-ae0030898194",
+   "metadata": {},
+   "source": [
+    "The second one uses a clever formula discovered by Gauss that computes the sum of numbers from 1 to n using a formula.\n",
+    "\n",
+    "To answer the question of how many steps it takes to run these functions, we now need to talk about the number of steps in terms of the input *n* (since the number of steps might depend on *n*)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b749ba33-0e20-41dd-9aa9-f73e207c68c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def sum1(n: int) -> int:     # Total:\n",
+    "    total = 0                #\n",
+    "    for i in range(n + 1):   #\n",
+    "        total += i           #\n",
+    "    return total             #\n",
+    "\n",
+    "\n",
+    "def sum2(n: int) -> int:     # Total: \n",
+    "    return n * (n + 1) // 2  #"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43125198-e9bf-4fea-8fdb-05e20d117b5b",
+   "metadata": {},
+   "source": [
+    "Since everything we've done with counting steps is a super loose approximation, we don't really want to promise that the relationship for the **runtime** is something exactly like *n* + 3. Instead, we want to report that there is some dependence on *n* for the runtime so that we get the idea of how the function scales, without spending too much time quibbling over whether it's truly *n* + 3 or *n* + 2.\n",
+    "\n",
+    "Instead, computer scientists generally drop any constants or low-order terms from the expression *n* + 3 and just report that it \"grows like *n*.\" The notation we use is called **Big-O** notation. Instead of reporting *n* + 3 as the growth of steps, we instead report O(*n*). This means that the growth is proportional to *n* but we aren't saying the exact number of steps. We instead only report the terms that have the **biggest impact** on a function's runtime.\n",
+    "\n",
+    "Let's practice this! About how many steps are run in the loop for the given method?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d5f701b1-f5ee-44b1-89ee-d99d3305d4f8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def method(n: int) -> int:\n",
+    "    result = 0\n",
+    "    for i in range(9):\n",
+    "        for j in range(n):\n",
+    "            result += j\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "method(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29ec9792-94d1-44be-86bb-3d6711f655ed",
+   "metadata": {},
+   "source": [
+    "Big-O notation is helpful because it lets us communicate how the runtime of a function scales with respect to its input size. By scale, we mean quantifying approximately how much longer a function will run if you were to increase its input size. For example, if I told you a function had a runtime in O(*n*<sup>2</sup>), you would know that if I were to double the input size *n*, the function would take about 4 times as long to run.\n",
+    "\n",
+    "In addition to O(*n*) and O(*n*<sup>2</sup>), there are many other **orders of growth** in the [Big-O Cheat Sheet](https://www.bigocheatsheet.com/)."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "5d8de2fb-94b2-4e23-b1a4-e035d0c944ee",
+   "metadata": {},
+   "source": [
+    "## Data structures\n",
+    "\n",
+    "Most operations in a set or a dictionary, such as adding values, removing values, containment check, are actually constant time or O(1) operations. No matter how much data is stored in a set or dictionary, we can always answer questions like, \"Is this value a key in this dictionary?\" in constant time.\n",
+    "\n",
+    "In the [data-structures.ipynb](data-structures.ipynb) notebook, we wanted to count the number of unique words in the file, `moby-dick.txt`. We explored two different implementations that were nearly identical, except one used a `list` and the other used a `set`.\n",
+    "\n",
+    "```py\n",
+    "def count_unique_list(file_name: str) -> int:\n",
+    "    words = list()\n",
+    "    with open(file_name) as file:\n",
+    "        for line in file.readlines():\n",
+    "            for word in line.split():\n",
+    "                if word not in words:  # O(n)\n",
+    "                    words.append(word) # O(1)\n",
+    "    return len(words)\n",
+    "\n",
+    "def count_unique_set(file_name: str) -> int:\n",
+    "    words = set()\n",
+    "    with open(file_name) as file:\n",
+    "        for line in file.readlines():\n",
+    "            for word in line.split():\n",
+    "                if word not in words:  # O(1)\n",
+    "                    words.add(word)    # O(1)\n",
+    "    return len(words)\n",
+    "```\n",
+    "\n",
+    "The `not in` operator in the `list` version was responsible for the difference in runtime because it's an O(*n*) operation. Whereas, for `set`, the `not in` operator is an O(1) operation!\n",
+    "\n",
+    "The analysis ends up being a little bit more complex since the size of the list is changing throughout the function (as you add unique words), but with a little work you can show that the `list` version of this function takes O(*n*<sup>2</sup>) time where *n* is the number of words in the file while the `set` version only takes O(n) time."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
+%% Cell type:markdown id:7b81ebea-c78a-48a3-bd24-88ffeeeb47ba tags:
+
+# Asymptotic Analysis
+
+In the past, we've run into questions about efficiency when choosing what data structures to use (preferring sets or dictionaries instead of lists), as well as in designing our software to precompute and save results in the initializer rather than compute them just-in-time. By the end of this lesson, students will be able to:
+
+- Cache or memoize a function by defining a dictionary or using the `@cache` decorator.
+- Analyze the runtime efficiency of a function involving loops.
+- Describe the runtime efficiency of a function using asymptotic notation.
+
+There are many different ways software efficiency can be quantified and studied.
+
+- **Running Time**, or how long it takes a function to run (e.g. microseconds).
+- **Space or Memory**, or how much memory is consumed while running a function (e.g. megabytes of memory).
+- **Programmer Time**, or how long it took to implement a function correctly (e.g. hours or days).
+- **Energy Consumption**, or how much electricity or natural resources a function takes to run (e.g. kilo-watts from a certain energy source).
+
+%% Cell type:code id:e956695c-0e41-4b38-a59a-33af58656336 tags:
+
+``` python
+!pip install -q nb_mypy
+%reload_ext nb_mypy
+%nb_mypy mypy-options --strict
+```
+
+%% Cell type:markdown id:45284748-5d0a-4c65-a74e-0e11c452a4db tags:
+
+## Data classes
+
+Earlier, we wrote a `University` class that needed to loop over all the students enrolled in the university in order to compute the list of students enrolled in a particular class. Let's take this opportunity to review this by also introducing one neat feature of Python that allows you to define simple data types called **data classes**. We can define a `Student` data class that takes a `str` name and `dict[str, int]` for courses and the credits for each course with the `@dataclass` **decorator**. Decorators will turn out to be a useful feature for the remainder of this lesson, and it's helpful to see it a more familiar context first.
+
+%% Cell type:code id:4c17bf60-fcb1-448b-8b81-2c3a8970e9aa tags:
+
+``` python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Student:
+    """Represents a student with a given name and dictionary mapping course names to credits."""
+    # These are instance fields, not class-wide fields!
+    name: str
+    courses: dict[str, int] = field(default_factory=dict)
+    # field function specifies that the default value for courses is an empty dictionary
+
+    def get_courses(self) -> list[str]:
+        """Return a list of all the enrolled courses."""
+        from time import sleep
+        sleep(0.0001) # The time it takes to get your course list from MyUW (jk
+        return list(self.courses)
+```
+
+%% Cell type:markdown id:df4cf234-ae83-49b7-9f6f-bc65ffec12ba tags:
+
+The `@dataclass` decorator "wraps" the `Student` class in this example by providing many useful dunder methods like `__init__`, `__repr__`, and `__eq__` (used to compare whether two instances are `==` to each other), among many others. This allows you to write code that actually describes your class, rather than the "boilerplate" or template code that we always need to write when defining a class.
+
+Default values for each field can be set with `=` assignment operator, and dataclasses also have a built-in way of handling mutable default parameters like fields that are assigned to lists or dictionaries using the `field` function as shown above.
+
+%% Cell type:code id:59fc6500-a4e5-4ad8-892e-6ccdce0366ad tags:
+
+``` python
+# let's create a student instance for practice!
+```
+
+%% Cell type:markdown id:09001a83-8d7a-42b8-8475-0525e770c54b tags:
+
+Let's see how we can use this `Student` data class to implement our `University` class again.
+
+%% Cell type:code id:a47c2c02-c821-4aad-974a-962856bf47b9 tags:
+
+``` python
+from typing import Optional
+
+
+class University:
+    """Represents a university with a name and zero or more enrolled students."""
+
+    def __init__(self, name: str, students: Optional[list[Student]] = None) -> None:
+        """Initializes a new University with the given name and students (default None)."""
+        self.name = name
+        if students is None:
+            self.enrolled_students = []
+        else:
+            self.enrolled_students = students
+
+        # self.courses: dict[str, list[Student]] = {}
+        # for student in self.enrolled_students:
+            # for course in student.get_courses():
+                # if course not in self.courses:
+                    # self.courses[course] = []
+                # self.courses[course].append(student)
+
+    def students(self) -> list[Student]:
+        """Returns a list of all enrolled students sorted by alphabetical order."""
+        return sorted(self.enrolled_students, key=Student.get_name)
+
+    def enroll(self, student: Student) -> None:
+        """Enrolls the given student in this university."""
+        self.enrolled_students.append(student)
+
+    def roster(self, course: str) -> list[Student]:
+        """Returns a list of all students enrolled in the given course name."""
+        # if course not in self.courses:
+            # return []
+        # return self.courses[course]
+        students = [student for student in self.enrolled_students if course in student.get_courses()]
+        return students
+
+
+uw = University("Udub", [Student(868373, "nicole.txt"), Student(123456, "nicole.txt")])
+uw.students()
+```
+
+%% Cell type:markdown id:618b57eb-f9db-49ea-a1d5-929c79055276 tags:
+
+## Caching
+
+The approach of precomputing and saving results in the constructor is one way to make a function more efficient than simply computing the result each time we call the function. But these aren't the only two ways to approach this problem. In fact, there's another approach that we could have taken that sits in between these two approaches. We can **cache** or **memoize** the result by computing it once and then saving the result so that we don't have to compute it again when asked for the same result in the future.
+
+Let's start by copying and pasting the `University` class that we wrote above into the following code cell. Rename the new class to `CachedUniversity` so that we can compare them later.
+
+%% Cell type:code id:78743d7f-f3bc-4844-bc19-155e37399436 tags:
+
+``` python
+```
+
+%% Cell type:code id:79fea560-2878-43f0-9820-a0c1abad4b81 tags:
+
+``` python
+%time uw.roster("CSE163")
+```
+
+%% Cell type:code id:203256c7-66d8-4620-a146-49b513c2e491 tags:
+
+``` python
+%time uw.roster("CSE163")
+```
+
+%% Cell type:markdown id:80fa293b-b505-4c2b-8cd7-7a08a6f9b71a tags:
+
+Okay, so this doesn't seem that much faster since we're only working with 2 students. There are 60,081 students enrolled across the three UW campuses. Let's try generating that many students for our courses.
+
+%% Cell type:code id:ecd853e9-b491-4951-adf2-9ca5da5055eb tags:
+
+``` python
+names = ["Thrisha", "Kevin", "Nicole"]
+students = [Student(names[i % len(names)], {"CSE163": 4}) for i in range(60081)]
+students[:10]
+```
+
+%% Cell type:markdown id:278826dd-c2be-4baf-8cbe-7944a53f29c7 tags:
+
+Python provides a way to implement this caching feature using the `@cache` decorator. Like the `@dataclass` decorator, the `@cache` decorator helps us focus on expressing just the program logic that we want to communicate.
+
+%% Cell type:code id:4f50a799-00a0-4bb2-b653-7eec9f53fb0e tags:
+
+``` python
+from functools import cache
+...
+```
+
+%% Cell type:markdown id:35e7c55e-369b-4402-8233-1a9d21633a1f tags:
+
+## Counting steps
+
+A "simple" line of code takes one step to run. Some common simple lines of code are:
+
+- `print` statements
+- Simple arithmetic (e.g. `1 + 2 * 3`)
+- Variable assignment or access (e.g. `x = 4`, `1 + x`)
+- List or dictionary accesses (e.g. `l[0]`, `l[1] = 4`, `'key' in d`).
+
+The number of steps for a sequence of lines is the sum of the steps for each line.
+
+%% Cell type:code id:80e5d5a0-2780-4f94-8bf8-408ef289c6a2 tags:
+
+``` python
+x = 1
+print(x)
+```
+
+%% Cell type:markdown id:715fcea0-75be-44f1-a54f-72b077a2764a tags:
+
+The number of steps to execute a loop will be the number of times that loop runs multiplied by the number of steps inside its body.
+
+The number of steps in a function is just the sum of all the steps of the code in its body. The number of steps for a function is the number of steps made whenever that function is called (e.g if a function has 100 steps in it and is called twice, that is 200 steps total).
+
+This notion of a "step" intentionally vague. We will see later that it's not important that you get the exact number of steps correct as long as you get an approximation of their relative scale (more on this later). Below we show some example code blocks and annotate their number of steps in the first comment of the code block.
+
+%% Cell type:code id:c6f503fc-a6be-425f-9cd6-ccf3f2a7d9f9 tags:
+
+``` python
+# Total: 4 steps
+
+print(1 + 2)    # 1 step
+print("hello")  # 1 step
+x = 14 - 2      # 1 step
+y = x ** 2      # 1 step
+```
+
+%% Cell type:code id:ac9adfa5-4f2e-4817-9511-179a44fcba76 tags:
+
+``` python
+# Total: 51 steps
+
+print("Starting loop")  # 1 step
+
+# Loop body has a total of 1 step. It runs 30 times for a total of 30
+for i in range(30):     # Runs 30 times
+    print(i)            #   - 1 step
+
+ # Loop body has a total of 2 steps. It runs 10 times for a total of 20
+for i in range(10):     # Runs 10 times
+    x = i               #   - 1 step
+    y = i**2            #   - 1 step
+```
+
+%% Cell type:code id:0f8b855a-b9f2-45df-8067-96596c219df8 tags:
+
+``` python
+# Total: 521 steps
+
+print("Starting loop")   # 1 step
+
+# Loop body has a total of 26 steps. It runs 20 times for a total of 520
+for i in range(20):      # Runs 20 times
+    print(i)             #   - 1 step
+
+                         #   Loop body has a total of 1 step.
+                         #   It runs 25 times for total of 25.
+    for j in range(25):  #   - Runs 25 times
+        result = i * j   #       - 1 step
+```
+
+%% Cell type:markdown id:f590ba4f-f1ed-40be-84e7-13ad4f08d2dd tags:
+
+## Big-O notation
+
+Now that we feel a little bit more confortable counting steps, lets consider two functions `sum1` and `sum2` to compute the sum of numbers from 1 to some input *n*.
+
+%% Cell type:code id:5ecbfeb2-7238-4a84-a4b2-067730c78cbd tags:
+
+``` python
+def sum1(n: int) -> int:
+    total = 0
+    for i in range(n + 1):
+        total += i
+    return total
+
+
+def sum2(n: int) -> int:
+    return n * (n + 1) // 2
+```
+
+%% Cell type:markdown id:0b066f68-4a97-48cd-8479-ae0030898194 tags:
+
+The second one uses a clever formula discovered by Gauss that computes the sum of numbers from 1 to n using a formula.
+
+To answer the question of how many steps it takes to run these functions, we now need to talk about the number of steps in terms of the input *n* (since the number of steps might depend on *n*).
+
+%% Cell type:code id:b749ba33-0e20-41dd-9aa9-f73e207c68c8 tags:
+
+``` python
+def sum1(n: int) -> int:     # Total:
+    total = 0                #
+    for i in range(n + 1):   #
+        total += i           #
+    return total             #
+
+
+def sum2(n: int) -> int:     # Total:
+    return n * (n + 1) // 2  #
+```
+
+%% Cell type:markdown id:43125198-e9bf-4fea-8fdb-05e20d117b5b tags:
+
+Since everything we've done with counting steps is a super loose approximation, we don't really want to promise that the relationship for the **runtime** is something exactly like *n* + 3. Instead, we want to report that there is some dependence on *n* for the runtime so that we get the idea of how the function scales, without spending too much time quibbling over whether it's truly *n* + 3 or *n* + 2.
+
+Instead, computer scientists generally drop any constants or low-order terms from the expression *n* + 3 and just report that it "grows like *n*." The notation we use is called **Big-O** notation. Instead of reporting *n* + 3 as the growth of steps, we instead report O(*n*). This means that the growth is proportional to *n* but we aren't saying the exact number of steps. We instead only report the terms that have the **biggest impact** on a function's runtime.
+
+Let's practice this! About how many steps are run in the loop for the given method?
+
+%% Cell type:code id:d5f701b1-f5ee-44b1-89ee-d99d3305d4f8 tags:
+
+``` python
+def method(n: int) -> int:
+    result = 0
+    for i in range(9):
+        for j in range(n):
+            result += j
+    return result
+
+
+method(10)
+```
+
+%% Cell type:markdown id:29ec9792-94d1-44be-86bb-3d6711f655ed tags:
+
+Big-O notation is helpful because it lets us communicate how the runtime of a function scales with respect to its input size. By scale, we mean quantifying approximately how much longer a function will run if you were to increase its input size. For example, if I told you a function had a runtime in O(*n*<sup>2</sup>), you would know that if I were to double the input size *n*, the function would take about 4 times as long to run.
+
+In addition to O(*n*) and O(*n*<sup>2</sup>), there are many other **orders of growth** in the [Big-O Cheat Sheet](https://www.bigocheatsheet.com/).
+
+%% Cell type:markdown id:5d8de2fb-94b2-4e23-b1a4-e035d0c944ee tags:
+
+## Data structures
+
+Most operations in a set or a dictionary, such as adding values, removing values, containment check, are actually constant time or O(1) operations. No matter how much data is stored in a set or dictionary, we can always answer questions like, "Is this value a key in this dictionary?" in constant time.
+
+In the [data-structures.ipynb](data-structures.ipynb) notebook, we wanted to count the number of unique words in the file, `moby-dick.txt`. We explored two different implementations that were nearly identical, except one used a `list` and the other used a `set`.
+
+```py
+def count_unique_list(file_name: str) -> int:
+    words = list()
+    with open(file_name) as file:
+        for line in file.readlines():
+            for word in line.split():
+                if word not in words:  # O(n)
+                    words.append(word) # O(1)
+    return len(words)
+
+def count_unique_set(file_name: str) -> int:
+    words = set()
+    with open(file_name) as file:
+        for line in file.readlines():
+            for word in line.split():
+                if word not in words:  # O(1)
+                    words.add(word)    # O(1)
+    return len(words)
+```
+
+The `not in` operator in the `list` version was responsible for the difference in runtime because it's an O(*n*) operation. Whereas, for `set`, the `not in` operator is an O(1) operation!
+
+The analysis ends up being a little bit more complex since the size of the list is changing throughout the function (as you add unique words), but with a little work you can show that the `list` version of this function takes O(*n*<sup>2</sup>) time where *n* is the number of words in the file while the `set` version only takes O(n) time.