Over the past few weeks, we've used the word "object" frequently without defining exactly what it means. In this lesson, we'll introduce objects and see how we can use them in real data programming work. By the end of this lesson, students will be able to:
- Define a Python class to represent objects with specific states and behaviors.
- Explain how the Python memory model allows multiple references to the same objects.
- Add type annotations to variables, function definitions, and class fields.
When we call a method like `groupby` and then `count` each group, the result is a new object that is distinct from the original. If we now ask for the value of `seattle_air`, we'll see that the original `DataFrame` is still there with all its data intact and untouched by the `groupby` or `count` operations.
However, unlike `groupby`, there are some `DataFrame` methods that can modify the underlying `DataFrame`. The `dropna` method for removing `NaN` values can modify the original when we include the keyword argument `inplace=True` (default `False`). Furthermore, if `inplace=True`, `dropna` will return `None` to more clearly communicate that instead of returning a new `DataFrame`, changes were made to the original `DataFrame`.
Python allows us to create our own custom objects by defining a **class**: a blueprint or template for objects. The `pandas` developers defined a `DataFrame` class so that you can construct `DataFrame` objects to use. Here's a highly simplified outline of the code that they could have written to define the `DataFrame` class.
- `class DataFrame:` begins the class definition. By convention, we name classes by capitalizing each word and removing the spaces between words.
- `def __init__(self, index, columns, data):` defines a special function called an initializer. The **initializer** is called whenever a new object is constructed. Each `DataFrame` stores its own data in **fields** (variables associated with an object), in this case called `index`, `columns`, and `data`.
- `def dropna(self, inplace=False):` defines a function that can be called on `DataFrame` objects. Like the initializer, it takes a `self` parameter, as well as an `inplace` parameter with a default value of `False`. Depending on the value of `inplace`, it returns either a new `DataFrame` or `None`.
- `def __getitem__(self, column_or_indexer):` defines a special function that is called when you use square brackets for indexing.
Notice how every **method** (function associated with an object) always takes `self` as the first parameter. The two special functions that we defined above are only "special" in the sense that they have a specific naming format preceded by two underscores and followed by two underscores. These **dunder methods** are used internally by Python to enable the convenient syntax that we're all used to using.
Just as we need to call a function to use it, we need to create an object (an instance) of a class to use that class.
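To make the outline described above concrete, here is a minimal sketch of what such a class could look like. This is a toy stand-in and not the real `pandas` implementation: it assumes `data` is a list of rows, handles only column names in `__getitem__`, and uses made-up sample values.

```python
import math


class DataFrame:
    """Toy stand-in for illustration only; NOT how pandas implements DataFrame."""

    def __init__(self, index, columns, data):
        # Fields: each DataFrame object stores its own copy of these values.
        self.index = index      # row labels
        self.columns = columns  # column names
        self.data = data        # assumption: a list of rows, each row a list of values

    def dropna(self, inplace=False):
        # Keep only the rows that do not contain a NaN value.
        kept = [
            (label, row)
            for label, row in zip(self.index, self.data)
            if not any(isinstance(v, float) and math.isnan(v) for v in row)
        ]
        index = [label for label, _ in kept]
        data = [row for _, row in kept]
        if inplace:
            # Modify this object and return None to signal the in-place change.
            self.index = index
            self.data = data
            return None
        # Otherwise leave this object untouched and return a new one.
        return DataFrame(index, self.columns, data)

    def __getitem__(self, column_or_indexer):
        # Assumption: only column names are supported in this sketch.
        position = self.columns.index(column_or_indexer)
        return [row[position] for row in self.data]


# Calling the class constructs a new object (instance); the values are hypothetical.
df = DataFrame(
    index=[0, 1, 2],
    columns=["city", "pm25"],
    data=[["Seattle", 7.5], ["Tacoma", float("nan")], ["Everett", 6.0]],
)

cleaned = df.dropna()                   # returns a new object
print(len(df.data), len(cleaned.data))  # 3 2

result = df.dropna(inplace=True)        # modifies df itself
print(result, len(df.data))             # None 2

print(df["city"])                       # ['Seattle', 'Everett']
```

Writing `df["city"]` is just convenient syntax for calling `df.__getitem__("city")`.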
Another useful dunder method is the `__repr__` method, which should return a string representing the object. By default, `__repr__` just tells you the fully-qualified name of the object's class and the memory address where the object is stored. But we can make it much more useful by defining our own `__repr__` method.
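As a small illustration, the sketch below compares the default representation with a custom one. The `PlainCourse` and `Course` classes are hypothetical examples made up for this comparison.

```python
class PlainCourse:
    """Hypothetical class that relies on the default __repr__."""

    def __init__(self, name, credits):
        self.name = name
        self.credits = credits


class Course:
    """Hypothetical class that defines its own __repr__."""

    def __init__(self, name, credits):
        self.name = name
        self.credits = credits

    def __repr__(self):
        # !r inserts the repr of each field, so strings keep their quotes.
        return f"Course({self.name!r}, {self.credits!r})"


# The default shows the class name and a memory address that changes from run to run.
print(repr(PlainCourse("CSE163", 4)))

print(repr(Course("CSE163", 4)))  # Course('CSE163', 4)
```

A common convention is to make the `__repr__` string look like the code you would write to construct the object.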
Write a `Student` class that represents a UW student, where each student has a `name`, a student `number`, and a `courses` dictionary that associates the name of each course to a number of credits. The `Student` class should include the following methods:
- An initializer that takes the student number and the name of a file containing information about their schedule.
- A method `__getitem__` that takes a `str` course name and returns the `int` number of credits for the course. If the student is not taking the given course, return `None`.
- A method `get_courses` that returns a list of the courses the student is taking.
Consider the following file `nicole.txt`.
```
CSE163 4
PHIL100 4
CSE390HA 1
```
The student's `name` is just the name of the file without the file extension. The file indicates they are taking CSE163 for 4 credits, PHIL100 for 4 credits, and CSE390HA for 1 credit.
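Here is one possible sketch of a `Student` class that meets the description above. It is not the only reasonable solution. It assumes that every non-blank line of the file has exactly two whitespace-separated tokens, and the student number `1234567` is made up. So that the example runs on its own, it writes the sample file into a temporary folder first.

```python
import os
import tempfile


class Student:
    def __init__(self, number, filename):
        self.number = number
        # "nicole.txt" -> "nicole": drop the folder and the file extension.
        self.name = os.path.splitext(os.path.basename(filename))[0]
        self.courses = {}
        with open(filename) as f:
            for line in f:
                tokens = line.split()
                if len(tokens) == 2:  # skip blank or malformed lines
                    course, credit_count = tokens
                    self.courses[course] = int(credit_count)

    def __getitem__(self, course):
        # dict.get returns None when the course is not in the dictionary.
        return self.courses.get(course)

    def get_courses(self):
        return list(self.courses)


# Recreate the sample file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "nicole.txt")
    with open(path, "w") as f:
        f.write("CSE163 4\nPHIL100 4\nCSE390HA 1\n")
    nicole = Student(1234567, path)

print(nicole.name)           # nicole
print(nicole["CSE163"])      # 4
print(nicole["MATH126"])     # None
print(nicole.get_courses())  # ['CSE163', 'PHIL100', 'CSE390HA']
```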
We've talked a lot about the types of each variable in the Python programs that we write, but we can also optionally write in the type of each variable, parameter, or return value as a type hint. In certain assessments, we'll use `mypy` to check your type annotations. Let's read the [Type hints cheat sheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html) and practice adding type annotations to our previous class definitions.
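As a rough sketch of what annotations look like on a variable, a function definition, and class fields, consider the simplified `Student` below. To keep the example short, it assumes the initializer receives the `courses` dictionary directly rather than reading a file, and `total_credits` is a hypothetical helper. The built-in generic syntax `dict[str, int]` requires Python 3.9 or later.

```python
from typing import Optional


class Student:
    def __init__(self, name: str, number: int, courses: dict[str, int]) -> None:
        # Class fields can be annotated where they are first assigned.
        self.name: str = name
        self.number: int = number
        self.courses: dict[str, int] = courses

    def __getitem__(self, course: str) -> Optional[int]:
        # Optional[int] means the result is either an int or None.
        return self.courses.get(course)

    def get_courses(self) -> list[str]:
        return list(self.courses)


def total_credits(student: Student) -> int:
    """Hypothetical helper: add up the credits across all courses."""
    return sum(student.courses.values())


# Variables can be annotated too, although mypy can usually infer these types.
nicole: Student = Student("nicole", 1234567, {"CSE163": 4, "PHIL100": 4, "CSE390HA": 1})
total: int = total_credits(nicole)
print(total)  # 9
```

Python does not enforce these annotations when the program runs; a separate tool such as `mypy` checks them.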
Write a `University` class that represents one or more students enrolled in courses at a university. The `University` class should include the following methods:
- An initializer that takes the university name and, optionally, a list of `Student` objects to enroll in this university.
- A method `enrollments` that returns all the enrolled `Student` objects sorted in alphabetical order by student name.
- A method `enroll` that takes a `Student` object and enrolls them in the university.
Later, we'll add more methods to this class. How well does your approach stand up to changing requirements?
Default parameter values are evaluated only once, when the function is defined, and the same value is reused on every call that omits that argument. This can lead to some unanticipated results when using mutable values like lists or dictionaries as default parameter values.
Say we make two new `University` objects without specifying a list of students to enroll. The initializer might then assign this list value to a field.
When we enroll a student in `sea_u`, the change will also affect `wsu` because both objects refer to the same list. There are several ways to work around this. The most common approach is to change the default parameter value to `None` and add an `if` statement that creates a new list inside the initializer.
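The sketch below reproduces the pitfall and then shows the `None` workaround. `BuggyUniversity` is a hypothetical name for the problematic version, and `Student` is reduced here to a minimal stand-in with only a `name` so that the example runs on its own.

```python
class Student:
    """Minimal stand-in with just a name, for illustration only."""

    def __init__(self, name):
        self.name = name


class BuggyUniversity:
    def __init__(self, name, students=[]):  # this list is created only once
        self.name = name
        self.students = students            # every default call shares that one list

    def enroll(self, student):
        self.students.append(student)


class University:
    def __init__(self, name, students=None):
        self.name = name
        if students is None:
            students = []                   # a fresh list for each new object
        self.students = students

    def enroll(self, student):
        self.students.append(student)

    def enrollments(self):
        # Sort alphabetically by student name without changing the stored list.
        return sorted(self.students, key=lambda student: student.name)


sea_u = BuggyUniversity("sea_u")
wsu = BuggyUniversity("wsu")
sea_u.enroll(Student("nicole"))
print(len(wsu.students))                 # 1, even though nobody enrolled in wsu
print(sea_u.students is wsu.students)    # True: two references to the same list

fixed_sea_u = University("sea_u")
fixed_wsu = University("wsu")
fixed_sea_u.enroll(Student("nicole"))
fixed_sea_u.enroll(Student("alex"))
print(len(fixed_wsu.students))                      # 0
print([s.name for s in fixed_sea_u.enrollments()])  # ['alex', 'nicole']
```

Using `None` as the default works because `None` is immutable, so sharing it between calls is harmless.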