diff --git a/spreadsheets.ipynb b/spreadsheets.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..75599fae04bf6a0bcfcd3efcb2766e90f45276b9
--- /dev/null
+++ b/spreadsheets.ipynb
@@ -0,0 +1,136 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "8a4f78b6-0e92-4965-88c3-23110651b371",
+   "metadata": {},
+   "source": [
+    "# Spreadsheets\n",
+    "\n",
+    "By the end of this lesson, students will be able to:\n",
+    "\n",
+    "- Design spreadsheet data models that enable reproducible data analysis.\n",
+    "- Convert a pivot table operation to `pandas` `groupby` and vice versa.\n",
+    "- Write spreadsheet formulas that apply a function over many cells.\n",
+    "\n",
+    "For this lesson, we'll spend most of our time in the preceding notebook, [groupby-and-indexing.ipynb](groupby-and-indexing.ipynb).\n",
+    "\n",
+    "Later, we'll download the `earthquakes.csv` file and use it to create a spreadsheet. In lecture, we will visit [sheets.new](https://sheets.new) to create a new Google Sheet."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "494a3641",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import seaborn as sns\n",
+    "\n",
+    "sns.set_theme()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "37a0cdc1",
+   "metadata": {},
+   "source": [
+    "### What is a pivot table?\n",
+    "\n",
+    "Let's first revisit the life expectancy dataset and use it to show what a pivot table looks like in `pandas`. A pivot table reshapes long-form data so that the unique values of one column become new columns, aggregating any rows that share the same row and column labels (by default, with the mean)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a344f81a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "life_expectancy = sns.load_dataset(\"healthexp\", index_col=[\"Year\", \"Country\"])\n",
+    "life_expectancy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b36c314e",
+   "metadata": {},
+   "source": [
+    "Let's pivot the table around the \"Country\" column so that each country becomes its own column, with one row per year. The documentation for `pivot_table` is [here](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "672500b8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pivoted_table = life_expectancy.pivot_table(index=\"Year\", columns=\"Country\", values=\"Life_Expectancy\")\n",
+    "pivoted_table.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1d994006",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "life_expectancy.columns, pivoted_table.columns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1eafc202",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sns.relplot(pivoted_table, kind=\"line\")\n",
+    "# produces essentially the same plot as the long-form call\n",
+    "# sns.relplot(life_expectancy, x=\"Year\", y=\"Life_Expectancy\", hue=\"Country\", kind=\"line\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d382e27c",
+   "metadata": {},
+   "source": [
+    "### Check the `pandas` version of the earthquakes pivot table"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a32efdf6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "earthquakes = pd.read_csv(\"earthquakes.csv\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.-1"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
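One of the lesson's stated goals is converting a pivot table operation to `pandas` `groupby` and back. The following is a minimal sketch of that round trip. It assumes a small, made-up long-form DataFrame shaped like the `healthexp` data (one row per `Year`/`Country` pair); the country labels `"A"`/`"B"` and the numbers are hypothetical stand-ins, so the example runs without downloading anything.

```python
import pandas as pd

# Hypothetical long-form data in the same shape as "healthexp":
# one row per (Year, Country) pair. Values are illustrative only.
long_df = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Country": ["A", "B", "A", "B"],
    "Life_Expectancy": [70.0, 75.0, 71.0, 76.0],
})

# Spreadsheet-style pivot: rows = Year, columns = Country,
# each cell = mean Life_Expectancy for that (Year, Country) pair.
pivoted = long_df.pivot_table(
    index="Year", columns="Country", values="Life_Expectancy", aggfunc="mean"
)

# The same reshaping with groupby: group by both keys, aggregate,
# then move the "Country" level of the row index into the columns.
grouped = (
    long_df.groupby(["Year", "Country"])["Life_Expectancy"]
    .mean()
    .unstack("Country")
)

# And back again: melt the wide table into long form,
# one row per (Year, Country) pair.
back_to_long = pivoted.reset_index().melt(
    id_vars="Year", var_name="Country", value_name="Life_Expectancy"
)

print(pivoted)
print(grouped)
print(back_to_long)
```

The `groupby(...).mean().unstack(...)` chain mirrors roughly how `pivot_table` works internally. Changing `aggfunc` in the pivot corresponds to swapping `.mean()` for another aggregation, such as `.sum()` or `.count()`, in the `groupby` version.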