fix a line

ceddf0bc · Yuxuan Mei · da2c6132 · ceddf0bc
Commit ceddf0bc authored 9 months ago by Yuxuan Mei
--- a/objects.ipynb
+++ b/objects.ipynb
@@ -216,7 +216,7 @@
    "\"\"\"\n",
    "\n",
    "import io\n",
-    "staff = pd.read_csv(io.StringIO(csv))\n",
+    "staff = pd.read_csv(io.StringIO(csv), index_col=\"Name\")\n",
    "staff[\"Hours\"][\"Thrisha\"]"
   ]
  },

 %% Cell type:markdown id:c55af259-5103-44f5-9258-cb4fd6a90afd tags:

 # Objects

 Over the past few weeks, we've used the word "object" frequently without defining exactly what it means. In this lesson, we'll introduce objects and see how we can use them in real data programming work. By the end of this lesson, students will be able to:

 - Define a Python class to represent objects with specific states and behaviors.
 - Explain how the Python memory model allows multiple references to the same objects.
 - Add type annotations to variables, function definitions, and class fields.

 %% Cell type:code id:66dcbcfa-a83c-4726-9220-102944b563bd tags:

 ``` python
 import pandas as pd
 ```

 %% Cell type:markdown id:990607a9-cada-4c42-8189-d9b56528e608 tags:

 An **object** (aka instance) in Python is a way of combining into a distinct unit (aka encapsulating) two software concepts:

 - **State**, or data like the elements of a list.
 - **Behavior**, or methods like a function that can take a list and return the size of the list.

 Recently, we've been using `DataFrame` objects frequently. A `DataFrame` stores data (state) and has many methods (behaviors), such as `groupby`.
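
 As a small illustration using a built-in type rather than `pandas` (the variable names here are made up for this sketch), a list object carries its own state and exposes behavior through its methods:

```python
# A list is an object: its elements are the state, its methods are the behavior.
readings = [10, 20, 20]

readings.append(30)             # behavior that updates the state
twenties = readings.count(20)   # behavior that reads the state

print(readings)
print(twenties)
print(len(readings))
```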

 %% Cell type:code id:ac980b99-82fc-4932-8a70-5969405329d9 tags:

 ``` python
 seattle_air = pd.read_csv("seattle_air.csv", index_col="Time", parse_dates=True)
 seattle_air.groupby(seattle_air.index.year).count()
 ```

 %% Cell type:markdown id:3308f98e-3c10-43ee-82cb-442f73765d18 tags:

 ## Reference semantics

 When we call a method like `groupby` and then `count` each group, the result is a new object that is distinct from the original. If we now ask for the value of `seattle_air`, we'll see that the original `DataFrame` is still there with all its data intact and untouched by the `groupby` or `count` operations.

 %% Cell type:code id:800fd6a8-9de0-4347-a379-467bd7c2b238 tags:

 ``` python
 seattle_air
 ```

 %% Cell type:markdown id:db208d3f-72ec-4b25-b960-46a7d5e723d6 tags:

 However, unlike `groupby`, there are some `DataFrame` methods that can modify the underlying `DataFrame`. The `dropna` method for removing `NaN` values can modify the original when we include the keyword argument `inplace=True` (default `False`). Furthermore, if `inplace=True`, `dropna` will return `None` to more clearly communicate that instead of returning a new `DataFrame`, changes were made to the original `DataFrame`.
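
 The same distinction can be sketched with plain Python lists, used here only as an analogy for `dropna` (the variable names are hypothetical). Building a new list leaves the original alone, while an in-place method changes the one object that every reference points to and returns `None`:

```python
readings = [12.0, None, 7.5]
alias = readings                  # a second name for the same list object

# Returning a new object: the original list is untouched.
cleaned = [r for r in readings if r is not None]
print(cleaned)
print(readings)

# Modifying in place: list.remove changes the object itself and returns None.
result = readings.remove(None)
print(result)
print(alias)                      # the change is visible through every reference
print(alias is readings)
```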

 %% Cell type:code id:371037eb-1ea3-4600-8cf3-f6b95756a383 tags:

 ``` python
 seattle_air.dropna()
 ```

 %% Cell type:code id:7f1df1fb-7971-467d-ae3d-b4c4b6fd86a6 tags:

 ``` python
 seattle_air
 ```

 %% Cell type:markdown id:c1b3d08a-bb9e-4fdb-b2c8-4eb454caf1be tags:

 ## Defining classes

 Python allows us to create our own custom objects by defining a **class**: a blueprint or template for objects. The `pandas` developers defined a `DataFrame` class so that you can construct `DataFrame` objects to use. Here's a highly simplified outline of the code that they could have written to define the `DataFrame` class.

 %% Cell type:code id:180bc8db-2c98-4b86-a067-f118b52a2bed tags:

 ``` python
 class DataFrame:
    """Represents two-dimensional tabular data structured around an index and column names."""

    def __init__(self, index, columns, data):
        """Initializes a new DataFrame object from the given index, columns, and tabular data."""
        print("Initializing DataFrame")
        self.index = index
        self.columns = columns
        self.data = data

    def dropna(self, inplace=False):
        """
        Drops all rows containing NaN from this DataFrame. If inplace, returns None and modifies
        self. If not inplace, returns a new DataFrame without modifying self.
        """
        print("Calling dropna")
        if not inplace:
            return DataFrame([...], [...], [...])
        else:
            self.columns = [...]
            self.index = [...]
            self.data = [...]
            return None

    def __getitem__(self, column_or_indexer):
        """Given a column or indexer, returns the selection as a new Series or DataFrame object."""
        print("Calling __getitem__")
        if column_or_indexer in self.columns:
            return "Series" # placeholder for a Series
        else:
            return DataFrame([...], [...], [...])
 ```

 %% Cell type:markdown id:1147f0eb-6ffb-424b-8973-8a55981b409f tags:

 Let's break down each line of code.

 - `class DataFrame:` begins the class definition. We always name classes by capitalizing each word and removing spaces between words.
 - `def __init__(self, index, columns, data):` defines a special function called an initializer. The **initializer** is called whenever constructing a new object. Each `DataFrame` stores its own data in **fields** (variables associated with an object), in this case called `index`, `columns`, and `data`.
 - `def dropna(self, inplace=False):` defines a function that can be called on `DataFrame` objects. Like the initializer, it also takes a `self` parameter as well as a default parameter `inplace=False`. Depending on the value of `inplace`, it can either return a new `DataFrame` or `None`.
 - `def __getitem__(self, column_or_indexer):` defines a special function that is called when you use the square brackets for indexing.

 Notice how every **method** (function associated with an object) always takes `self` as the first parameter. The two special functions that we defined above are only "special" in the sense that they have a specific naming format preceded by two underscores and followed by two underscores. These **dunder methods** are used internally by Python to enable the convenient syntax that we're all used to using.

 Just like how we need to call a function to use it, we also need to create an object (instance) to use a class.
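
 To make the role of `self` more concrete, here is a minimal sketch with a hypothetical `Column` class (not part of `pandas`). It suggests how the square-bracket syntax is just another way of calling `__getitem__` with the object passed in as `self`:

```python
class Column:  # hypothetical class, for illustration only
    """Represents a named column of values."""

    def __init__(self, name, values):
        """Initializes a new Column with the given name and values."""
        self.name = name
        self.values = list(values)

    def __getitem__(self, position):
        """Returns the value stored at the given position."""
        return self.values[position]


pm25 = Column("PM2.5", [10, 20, 30])  # calling the class runs __init__

# These three expressions make the same call; self is bound to pm25 each time.
print(pm25[1])
print(pm25.__getitem__(1))
print(Column.__getitem__(pm25, 1))
```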

 %% Cell type:code id:29e94100-8960-44c5-942d-e702385e6e15 tags:

 ``` python
 example = DataFrame([0, 1, 2], ["PM2.5"], [10, 20, 30])
 example["PM2.5"]
 ```

 %% Cell type:markdown id:d3b85f67-bf2f-4438-9de6-50e0541dc0a8 tags:

 Another useful dunder method is the `__repr__` method, which should return a string representing the object. By default, `__repr__` just tells you the fully-qualified name of the object's class and where the object is stored in your computer's memory. But we can make it much more useful by defining our own `__repr__` method.
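
 As a rough sketch of the difference, compare two hypothetical classes, `Plain` and `Station` (both invented for this example). Only the second defines its own `__repr__`:

```python
class Plain:  # hypothetical class that relies on the default __repr__
    pass


class Station:  # hypothetical class with a custom __repr__
    def __init__(self, name, readings):
        """Initializes a new Station with the given name and readings."""
        self.name = name
        self.readings = list(readings)

    def __repr__(self):
        """Returns a string describing this Station."""
        return f"Station(name={self.name!r}, readings={len(self.readings)})"


print(repr(Plain()))                       # default: class name and memory address
print(repr(Station("Seattle", [10, 20])))  # Station(name='Seattle', readings=2)
```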

 %% Cell type:code id:705dad74-05e6-4153-817b-42aefa05bc09 tags:

 ``` python
 example
 ```

 %% Cell type:markdown id:672aa9ab tags:

 Poll questions: `staff["Hours"]["Thrisha"]`

 %% Cell type:code id:a9f0a780 tags:

 ``` python
 csv = """
 Name,Hours
 Diana,10
 Thrisha,15
 Yuxiang,20
 Sheamin,12
 """

 import io
-staff = pd.read_csv(io.StringIO(csv))
+staff = pd.read_csv(io.StringIO(csv), index_col="Name")
 staff["Hours"]["Thrisha"]
 ```

 %% Cell type:markdown id:c8c4ec93-e672-4db5-8b56-a146f4f6223d tags:

 ## Practice: `Student` class

 Write a `Student` class that represents a UW student, where each student has a `name`, a student `number`, and a `courses` dictionary that associates the name of each course to a number of credits. The `Student` class should include the following methods:

 - An initializer that takes the student number and the name of a file containing information about their schedule.
 - A method `__getitem__` that takes a `str` course name and returns the `int` number of credits for the course. If the student is not taking the given course, return `None`.
 - A method `get_courses` that returns a list of the courses the student is taking.

 Consider the following file `nicole.txt`.

 ```
 CSE163 4
 PHIL100 4
 CSE390HA 1
 ```

 The student's `name` is just the name of the file without the file extension. The file indicates they are taking CSE163 for 4 credits, PHIL100 for 4 credits, and CSE390HA for 1 credit.

 %% Cell type:code id:30c03ce0-e417-4a27-ab0d-fd0d8e76994a tags:

 ``` python
 class Student:
    ...


 nicole = Student(1234567, "nicole.txt")
 for course in nicole.get_courses():
    print(course, nicole[course])
 ```

 %% Cell type:markdown id:bc5e7876-0ff7-4322-9827-321069c9642b tags:

 ## Type annotations

 We've talked a lot about the types of each variable in the Python programs that we write, but we can also optionally write in the type of each variable or return value as a type hint. In certain assessments, we'll use `mypy` to check your type annotations. Let's read the [Type hints cheat sheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html) and practice adding type annotations to our previous class definitions.
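
 For a sense of what annotated code can look like, here is a minimal sketch that assumes Python 3.9 or later. The `Course` class and `find_credits` function are hypothetical names invented for this example. Python does not enforce these hints at runtime; a checker such as `mypy` reads them separately.

```python
from typing import Optional


class Course:  # hypothetical class for this sketch
    """Represents a course with a name and a number of credits."""

    def __init__(self, name: str, credits: int) -> None:
        """Initializes a new Course with the given name and credits."""
        self.name: str = name        # annotated field
        self.credits: int = credits  # annotated field


def find_credits(courses: list[Course], name: str) -> Optional[int]:
    """Returns the credits for the named course, or None if it is not listed."""
    for course in courses:
        if course.name == name:
            return course.credits
    return None


schedule: list[Course] = [Course("CSE163", 4), Course("PHIL100", 4)]
print(find_credits(schedule, "CSE163"))
print(find_credits(schedule, "MATH126"))
```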

 %% Cell type:code id:a7be2cc2-1cba-407a-a414-f58a09003188 tags:

 ``` python
 !pip install -q nb_mypy
 %reload_ext nb_mypy
 %nb_mypy mypy-options --strict
 ```

 %% Cell type:markdown id:c620677e-365f-45f5-9a31-2e692d8bac9a tags:

 ## Practice: `University` class

 Write a `University` class that represents one or more students enrolled in courses at a university. The `University` class should include the following methods:

 - An initializer that takes the university name and, optionally, a list of `Student` objects to enroll in this university.
 - A method `enrollments` that returns all the enrolled `Student` objects sorted in alphabetical order by student name.
 - A method `enroll` that takes a `Student` object and enrolls them in the university.

 Later, we'll add more methods to this class. How well does your approach stand up to changing requirements?

 %% Cell type:code id:02488b8c-6326-4a9a-bb58-ef17f7109365 tags:

 ``` python
 class University:
    ...


 uw = University("Udub", [nicole])
 uw.enrollments()
 ```

 %% Cell type:markdown id:bea2ec7d-d918-4a4c-90ec-c409a427133a tags:

 ## Mutable default parameters

 Default parameter values are evaluated and bound to the parameter when the function is defined. This can lead to some unanticipated results when using mutable values like lists or dictionaries as default parameter values.

 Say we make two new `University` objects without specifying a list of students to enroll. If the default value is a list, the initializer might then assign that same list to a field in both objects.

 %% Cell type:code id:0d910b70-cba7-4e89-9efb-e8e61418a2ed tags:

 ``` python
 wsu = University("Wazzu")
 wsu.enrollments()
 ```

 %% Cell type:code id:b615d9eb-ef01-4096-a391-6c1b80dc8cb1 tags:

 ``` python
 sea_u = University("SeaU")
 sea_u.enrollments()
 ```

 %% Cell type:markdown id:647347fb-ec0f-4e46-acbc-01f848cb01d1 tags:

 When we enroll a student in `sea_u`, the change will also affect `wsu` because both objects share the same default list. There are several ways to work around this; the most common approach is to change the default parameter value to `None` and add an `if` statement in the program logic.
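
 The pitfall and the `None` workaround can be sketched with two standalone functions instead of a class. The names `enroll_shared` and `enroll_fresh` are hypothetical and chosen only for this illustration:

```python
def enroll_shared(student, roster=[]):
    """Appends student to roster; the default list is created once, at definition time."""
    roster.append(student)
    return roster


def enroll_fresh(student, roster=None):
    """Appends student to roster, creating a new list on each call when none is given."""
    if roster is None:
        roster = []
    roster.append(student)
    return roster


first = enroll_shared("Nicole")
second = enroll_shared("Diana")
print(first is second)  # True: both calls used the same default list
print(second)           # ['Nicole', 'Diana']

third = enroll_fresh("Nicole")
fourth = enroll_fresh("Diana")
print(third is fourth)  # False: each call made its own list
print(fourth)           # ['Diana']
```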

 %% Cell type:code id:04de0e07-21bd-47b4-b285-e851b17ab1e0 tags:

 ``` python
 sea_u.enroll(nicole)
 sea_u.enrollments()
 ```

 %% Cell type:code id:3332a75a-ff7b-4f15-b411-40dcf797262b tags:

 ``` python
 wsu.enrollments()
 ```