Commit cbbbe242 authored by Yuxuan Mei

add Friday lecture material

parent 204ffdb1
%% Cell type:markdown id:55b51b2a-ffc7-41e2-8094-0db9824913f1 tags:
# Data Structures
**Data structures**, such as lists, can represent complex data. While lists are quite useful on their own, Python provides several other built-in data structures to make it easier to represent complex data. By the end of this lesson, students will be able to:
- Apply list comprehensions to define basic `list` sequences.
- Apply `set` operations to store and retrieve values in a set.
- Apply `dict` operations to store and retrieve values in a dictionary.
- Describe the difference between the various data structures' properties (`list`, `set`, `dict`, `tuple`).
%% Cell type:code id:69ed929a-8bea-44a5-81ef-7075ec9447d3 tags:
``` python
import doctest
```
%% Cell type:markdown id:cf8e9266-78a6-4c7b-bb07-f3619ed3ccab tags:
## List comprehensions
Another one of the best features of Python is the list comprehension. A **list comprehension** provides a concise expression for building up a list of values by looping over any type of sequence.
We already know how to create a list counting all the numbers between 0 and 10 (exclusive) by looping over a `range`.
%% Cell type:code id:f74c083a-68ac-4177-ace8-9c2b1a74637b tags:
``` python
nums = []
for i in range(10):
    nums.append(i)
nums
```
%% Cell type:markdown id:2e3be061-e14a-48b8-912a-925100a35522 tags:
A list comprehension provides a shorter expression for achieving the same result.
Its basic structure is `[expression for item in iterable if condition]`, where the `if condition` part is optional and additional `for` clauses can follow to express nested loops.
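To make the structure concrete, here is a small sketch pairing a comprehension with the loop it abbreviates. The names `evens_loop` and `evens_comp` are chosen for this illustration only.

```python
# Loop form: start with an empty list, filter, then append.
evens_loop = []
for i in range(10):
    if i % 2 == 0:
        evens_loop.append(i * 10)

# Comprehension form: expression first, then the loop, then the optional filter.
evens_comp = [i * 10 for i in range(10) if i % 2 == 0]

print(evens_comp)                # [0, 20, 40, 60, 80]
print(evens_loop == evens_comp)  # True
```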
%% Cell type:code id:42bf028a-7c75-4db8-865d-2d5ab3b66c0f tags:
``` python
[i for i in range(10)]
```
%% Cell type:markdown id:e4721136-aa2d-4c84-822a-d8764ca904b7 tags:
What if we wanted to compute all these values squared? A list comprehension can help us with this as well.
%% Cell type:code id:1eca254c-8414-4fa7-a005-8149e49091ce tags:
``` python
[i ** 2 for i in range(10)]
```
%% Cell type:markdown id:2a38161b-9b7e-4849-a0d8-cad9cdfed786 tags:
Or, what if we wanted to only include values of `i` that are even?
%% Cell type:code id:0a1073d8-ef95-4346-9cb4-2cbcae1ceb35 tags:
``` python
[i ** 2 for i in range(10) if i % 2 == 0]
```
%% Cell type:markdown id:d11f17fb-4d85-4d80-a561-07bd70231962 tags:
Before running the next block of code, what do you think it will output?
%% Cell type:code id:c99f0fb9-bcdd-4a96-a6a2-e29edfd374c5 tags:
``` python
words = "I saw a dog today".split()
[word[0] for word in words if len(word) >= 2]
```
%% Cell type:markdown id:957a9c28 tags:
Now let's try to rewrite the `count_odd` function from the file processing lesson using a list comprehension:
```
def count_odd(path):
    """
    For the file path, prints out each line number followed by the number of odd-length tokens.

    >>> count_odd("poem.txt")
    1 2
    2 1
    3 0
    4 3
    """
    with open(path) as f:
        lines = f.readlines()
        line_num = 1
        for line in lines:
            tokens = line.split()
            odd_count = 0
            for token in tokens:
                if len(token) % 2 == 1:
                    odd_count += 1
            print(line_num, odd_count)
            line_num += 1

doctest.run_docstring_examples(count_odd, globals())
```
%% Cell type:code id:009427c7 tags:
``` python
```
%% Cell type:markdown id:a7db0f78-0760-42bb-ba45-1790e0fb89c9 tags:
### Practice: Fun numbers
Fill in the blank with a list comprehension to complete the definition for `fun_numbers`.
%% Cell type:code id:7aadbbf2-74a5-4e94-9b46-0a00de377506 tags:
``` python
def fun_numbers(start, stop):
    """
    Returns an increasing list of all fun numbers between start (inclusive) and stop (exclusive).
    A fun number is defined as a number that is either divisible by 2 or divisible by 5.

    >>> fun_numbers(2, 16)
    [2, 4, 5, 6, 8, 10, 12, 14, 15]
    """
    return ...

doctest.run_docstring_examples(fun_numbers, globals())
```
%% Cell type:markdown id:3c2f121b-9a33-4e11-916c-5515b77139f7 tags:
## Tuples
Whereas lists represent mutable sequences of any elements, **tuples** (pronounced "two pull" or like "supple") represent immutable sequences of any elements. Just like strings, tuples are immutable, so the sequence of elements in a tuple cannot be modified after the tuple has been created.
While lists are defined with square brackets, **tuples are defined by commas alone** as in the expression `1, 2, 3`. We often add parentheses around the structure for clarity. In fact, when representing a tuple, Python will use parentheses to indicate a tuple.
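As a quick sketch of the "commas alone" rule, the snippet below compares a few expressions. The variable names are illustrative only. The one exception to the rule is the empty tuple, which is written `()`.

```python
# The comma, not the parentheses, is what creates a tuple.
pair = 1, 2        # tuple of two elements
single = 1,        # a trailing comma makes a one-element tuple
not_a_tuple = (1)  # parentheses alone just group an expression

print(type(pair).__name__)         # tuple
print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int
print(pair)                        # (1, 2)
```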
%% Cell type:code id:51ddc795-ffc2-43ef-92a1-65274924092d tags:
``` python
1, 2, 3
```
%% Cell type:code id:feba2e4a tags:
``` python
a = tuple([1, 2, 3])
a
```
%% Cell type:code id:28aeb52f tags:
``` python
a[2]
```
%% Cell type:code id:7a4a5919 tags:
``` python
a[2] = 0 # tuples are immutable!
```
%% Cell type:code id:eaa52a04 tags:
``` python
len(a)
```
%% Cell type:code id:494f5e92 tags:
``` python
for value in a:
    print(value)
```
%% Cell type:markdown id:d330db3c-384c-49fe-8e09-fd825dfd004e tags:
We learned that there are many list functions, most of which modify the original list. Since tuples are immutable, they have no equivalent of these modifying functions; only the non-mutating `count` and `index` methods are available. So why use tuples when we could just use lists instead?
> Your choice of data structure communicates information to other programmers. If you know exactly how many elements should go in your data structure and those elements don't need to change, a `tuple` is right for the job. **By choosing to use a tuple in this situation, we communicate to other programmers that the sequence of elements in this data structure cannot change!** Everyone working on your project doesn't have to worry about passing a `tuple` to a function and that function somehow destroying the data.
Tuples provide a helpful way to return more than one value from a function. For example, we can write a function that returns both the first letter and the second letter from a word.
%% Cell type:code id:df832f2e-5c24-43df-89db-c08a168bd18f tags:
``` python
def first_two_letters(word):
    return word[0], word[1]

a, b = first_two_letters('goodbye')
a
```
%% Cell type:markdown id:06e61e0b-1bc1-4c54-86d5-5957023e8ad0 tags:
## Sets
Whereas lists represent mutable sequences of any elements, **sets** represent mutable unordered collections of unique elements. Unlike lists, sets are not sequences so we cannot index into a set in the same way that we could for lists and tuples. Sets only represent unique elements, so attempts to add duplicate elements are ignored.
%% Cell type:code id:c281cc03-9f6a-40e7-8dab-42d0bb1f6c86 tags:
``` python
nums = set()
nums.add(1)
nums.add(2)
nums.add(3)
nums.add(2) # duplicate ignored
nums.add(-1)
nums
```
%% Cell type:markdown id:a4bbc637-5bc9-4b14-b181-f9d9bb9172d5 tags:
So what's the point of using a `set` over a `list`? Sets are often much faster than lists at determining whether a particular element is contained in the set. We can see this in action by comparing the time it takes to count the number of unique words in a large document. Using a list results in much slower code.
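Before timing a whole document, the difference can be sketched on a single membership test. This example uses made-up data (10,000 integers) and the standard `timeit` module; the exact timings depend on the machine, so none are shown here.

```python
import timeit

# Hypothetical data: the same 10,000 integers stored two ways.
values_list = list(range(10_000))
values_set = set(values_list)
target = 9_999  # worst case for the list: the last element

# Both containers give the same answer...
print(target in values_list, target in values_set)  # True True

# ...but the list is scanned element by element, while the set uses hashing.
list_time = timeit.timeit(lambda: target in values_list, number=200)
set_time = timeit.timeit(lambda: target in values_set, number=200)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```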
%% Cell type:code id:0fb31262-a0d6-49ec-8292-f36b3a7afdb0 tags:
``` python
def count_unique(path):
    unique = []
    with open(path) as f:
        for line in f.readlines():
            for token in line.split():
                if token not in unique:
                    unique.append(token)
    return len(unique)

%time count_unique("moby-dick.txt")
```
%% Cell type:markdown id:4f4e8a0d-9232-459b-8c07-b6ceff760e92 tags:
By combining sets and list comprehensions, we can compose our programs in more "Pythonic" ways.
%% Cell type:code id:aa4422d3-b392-4f66-8e79-1aebcc3ee5b0 tags:
``` python
def count_unique(path):
    with open(path) as f:
        return ...

%time count_unique("moby-dick.txt")
```
%% Cell type:markdown id:4c11b8b0-64e4-45f8-a143-75f0380452bf tags:
### Practice: Area codes
Fill in the blank to compose a "Pythonic" program that returns the number of unique area codes from the given list of phone numbers formatted as strings like `"123-456-7890"`. The area code is defined as the first 3 digits in a phone number.
%% Cell type:code id:010cef0c-46af-4ee1-8145-0ead4454fba8 tags:
``` python
def area_codes(phone_numbers):
    """
    Returns the number of unique area codes in the given sequence.

    >>> area_codes([
    ...     '123-456-7890',
    ...     '206-123-4567',
    ...     '123-000-0000',
    ...     '425-999-9999'
    ... ])
    3
    """
    return len(set(...))

doctest.run_docstring_examples(area_codes, globals())
```
%% Cell type:markdown id:78c7d550-7a6e-4a94-a7a1-cbfdc0a4100f tags:
## Dictionaries
A **dictionary** represents a mutable collection of key-value pairs, where the keys are immutable and unique. (Since Python 3.7, dictionaries preserve insertion order, but values are looked up by key rather than by position.) In other words, dictionaries are more flexible than lists: a list could be considered a dictionary whose "keys" are the integers from 0 to the length minus 1.
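A minimal sketch of the basic operations, using a made-up `ages` dictionary (the names and numbers are hypothetical):

```python
# Hypothetical example data: names mapped to ages.
ages = {"ada": 36, "grace": 45}

ages["alan"] = 41  # assigning to a new key adds a key-value pair
ages["ada"] = 37   # assigning to an existing key overwrites its value

print(ages["grace"])    # 45
print("linus" in ages)  # False: `in` checks the keys
print(len(ages))        # 3
```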
%% Cell type:code id:5a63650e tags:
``` python
empty_dictionary = dict()
empty_dictionary
```
%% Cell type:code id:f8e2f152 tags:
``` python
lecture_schedule = dict()
lecture_schedule["06/17"] = "welcome-and-control-structures"
lecture_schedule["06/19"] = "holiday: Juneteenth"
lecture_schedule["06/21"] = "files-and-data-structures"
lecture_schedule
```
%% Cell type:code id:c523e8ee tags:
``` python
lecture_schedule.keys()
```
%% Cell type:code id:2abe85f6 tags:
``` python
for i in range(len(lecture_schedule.keys())):
    print(lecture_schedule.keys()[i])  # TypeError: 'dict_keys' object is not subscriptable
```
%% Cell type:code id:8b27202d tags:
``` python
for k in lecture_schedule.keys():
    print(k)
```
%% Cell type:code id:da4d1365 tags:
``` python
lecture_schedule.values()
```
%% Cell type:code id:b5d1fd9e tags:
``` python
lecture_schedule.items()
```
%% Cell type:code id:da11753e tags:
``` python
for k, v in lecture_schedule.items():
    print(k, v)
```
%% Cell type:markdown id:99ee913b tags:
Dictionaries are often helpful for counting occurrences. Whereas the above example counted the total number of unique words in a text file, a dictionary can help us count the number of occurrences of each unique word in that file.
%% Cell type:code id:b52bc626-0581-4e72-80e6-a65a5880916a tags:
``` python
def count_tokens(path):
    counts = {}
    with open(path) as f:
        for token in f.read().split():
            if token not in counts:
                counts[token] = 1
            else:
                counts[token] += 1
    return counts

%time count_tokens("moby-dick.txt")
```
%% Cell type:markdown id:01259ae5-82ed-47d8-a6bf-15c4ddd16c2a tags:
As an aside, there's also a more Pythonic way to write this program using `collections.Counter`, which is a specialized dictionary. When a `Counter` is displayed, or when you call its `most_common()` method, the entries are listed in order from greatest to least count.
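A small sketch of how a `Counter` behaves, using a hypothetical token list in place of a file's contents:

```python
from collections import Counter

# Hypothetical token list standing in for the words of a file.
tokens = ["sea", "she", "sea", "shells", "sea", "she"]
counts = Counter(tokens)

print(counts["sea"])          # 3
print(counts["shore"])        # 0: missing keys count as zero instead of raising KeyError
print(counts.most_common(2))  # [('sea', 3), ('she', 2)]
```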
%% Cell type:code id:6eadca47-90d8-453c-aa66-b70619e133b1 tags:
``` python
def count_tokens(path):
    from collections import Counter
    with open(path) as f:
        return ...

%time count_tokens("moby-dick.txt")
```
%% Cell type:markdown id:4f2249b5-92f6-4b3e-80d1-80fd25b28868 tags:
### Practice: Count words by first letters
Suppose we want to compute a histogram (counts) for the number of words that begin with each character in a given text file. Your coworker has written the following code and would like your help to finish the program. Explain your fix.
%% Cell type:code id:942fcab5-2592-4960-9d2d-86ac099d4194 tags:
``` python
def count_by_first_letter(words):
    counts = {}
    for word in words:
        first_letter = word[0]
        counts[first_letter] += 1
    return counts

count_by_first_letter(['cats', 'dogs', 'deers'])
```
%% Cell type:markdown id:a3c2915e-0384-4c11-b758-d58c931580bc tags:
# File Processing
In this lesson, we'll introduce two ways to process files and synthesize what we've learned about debugging. By the end of this lesson, students will be able to:
- Read text files line-by-line (line processing).
- Read text files token-by-token (token processing).
- Write doctests and debug programs.
%% Cell type:code id:31af8b2f-8053-476e-938e-7f1cd47bc904 tags:
``` python
import doctest
```
%% Cell type:markdown id:cadb57a1-8b04-4f19-b033-da66fbf190f3 tags:
## Opening files in Python
In computers, data is stored in **files** that can represent text documents, pictures, structured spreadsheet-like data, etc. For now, we'll focus on files that represent text data that we indicate with the `.txt` file extension.
We can open and read files in Python using the built-in `open` function and specifying the **path** to the file. We will talk about file paths in a bit, but think of it like the full name of a file on a computer. The following code snippet opens the file path `poem.txt` and reads the text into the Python variable, `content`.
%% Cell type:code id:227ae0b2-1ad0-4d26-8c75-3c8b78435c33 tags:
``` python
with open("poem.txt") as f:
    content = f.read()
    print(content)
```
%% Cell type:markdown id:124a4a40-0dcc-417f-b14a-f35036ab2492 tags:
The `with open(...) as f` syntax negotiates access to the file with the computer's operating system by maintaining a **file handle**, which is assigned to the variable `f`. (You can use any variable name instead of `f`.) All the code contained in the `with` block has access to the file handle `f`. `f.read()` returns all the contents of the file as a string.
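One detail worth seeing in action: the file is closed automatically when the `with` block ends. The sketch below writes its own throwaway file (the `example.txt` name and its contents are assumptions for this illustration) so that it does not depend on `poem.txt`.

```python
import os
import tempfile

# Write a small throwaway file so the sketch does not depend on poem.txt.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("she sells\nsea\n")

with open(path) as f:
    content = f.read()
    print(f.closed)  # False: the file is open inside the block

print(f.closed)       # True: leaving the block closed it
print(repr(content))  # 'she sells\nsea\n'
```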
%% Cell type:markdown id:5f5e70aa-7bd6-411e-907c-bdef0cecdab1 tags:
## Line processing
It's often useful to read a text file **line-by-line** so that you can process each line separately. We could accomplish this by calling the `split` function on the content of the file, but Python conveniently provides an `f.readlines()` method that returns the text as a list of lines.
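The two approaches give slightly different lists, as this sketch shows. It writes its own throwaway file (a hypothetical `example.txt`) rather than relying on `poem.txt`.

```python
import os
import tempfile

# Throwaway two-line file, so the sketch does not depend on poem.txt.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("she sells\nsea\n")

with open(path) as f:
    by_readlines = f.readlines()

with open(path) as f:
    by_split = f.read().split("\n")

print(by_readlines)  # ['she sells\n', 'sea\n']
print(by_split)      # ['she sells', 'sea', '']
```

`readlines()` keeps the newline at the end of each line, while splitting on `"\n"` drops the newlines but leaves an empty string after the final one.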
The following code snippet prints out the file with a line number in front of each line. In this example, `lines` stores a list of the lines in the file. The loop keeps track of a counter and prints it before each line.
%% Cell type:code id:41b0d320-31c1-4f20-800a-34e0dbcd3bf2 tags:
``` python
with open("poem.txt") as f:
    lines = f.readlines()
    line_num = 1
    for line in lines:
        print(line_num, line[:-1]) # Slice-out the newline character at the end
        line_num += 1
```
%% Cell type:markdown id:623e33ab-3db9-491b-96e7-25de4cb9ec1b tags:
## Token processing
It's also often useful to process each line of text **token-by-token**. A **token** is a generalization of the idea of a "word" that allows for any sequence of characters separated by whitespace. For example, the string `'I really <3 dogs'` has 4 tokens in it.
Token processing extends the idea of line processing by splitting each line on whitespace using the `split` function. In this course, we will use "word" and "token" interchangeably.
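A short sketch of how `split` behaves on whitespace, using the example string from above plus two made-up strings:

```python
line = "I really <3 dogs\n"

tokens = line.split()  # no argument: split on any run of whitespace
print(tokens)          # ['I', 'really', '<3', 'dogs']
print(len(tokens))     # 4

# Extra spaces and the trailing newline do not produce empty tokens.
print("  the   sea shore \n".split())  # ['the', 'sea', 'shore']

# Splitting on a single space explicitly behaves differently.
print("the  sea".split(" "))           # ['the', '', 'sea']
```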
%% Cell type:code id:803a3f47-5b97-4680-ab80-464601750754 tags:
``` python
with open("poem.txt") as f:
    lines = f.readlines()
    line_num = 1
    for line in lines:
        tokens = line.split()
        print(line_num, tokens)
        line_num += 1
```
%% Cell type:markdown id:b9f0d195-a230-4d44-825f-9c8f9935c35e tags:
### Practice: Count odd-length tokens
How might we write a Python code snippet that takes the `poem.txt` file and prints the number of odd-length tokens per line?
%% Cell type:code id:1e202c48-ffa1-49f0-9b94-64a4e5165cea tags:
``` python
def count_odd(path):
    """
    For the file path, prints out each line number followed by the number of odd-length tokens.

    >>> count_odd("poem.txt")
    1 2
    2 1
    3 0
    4 3
    """
    ...

doctest.run_docstring_examples(count_odd, globals())
```
%% Cell type:markdown id:5efb0db2-0a67-4dc3-b5d1-836dc3961e25 tags:
### Practice: Debugging `first_tokens`
Let's help your coworker debug a function `first_tokens`, which should return a list containing the first token from each line in the specified file path. They sent you this message via team chat.
> Hey, do you have a minute to help me debug this function? There's an error when I run it but I can't figure out how to fix it.
Unfortunately, your teammate only provided the code but did not provide any information about the error message, sample inputs to reproduce the problem, or a description of what they already tried.
**Let's practice debugging this together and compose a helpful chat response to them.**
%% Cell type:code id:37e68a8e-cb42-4ce5-86fa-b79e49029b24 tags:
``` python
def first_tokens(path):
    result = []
    with open(path) as f:
        for line in f.readlines():
            line.split()
            result += line[0]
    return result
```
she sells
sea
shells by
the sea shore