diff --git a/convolutions.ipynb b/convolutions.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..0033c9a935a3ca019c2c2d303795b9a7db996823 --- /dev/null +++ b/convolutions.ipynb @@ -0,0 +1,422 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9b037257-c5aa-4e16-90fb-04e9b2eb3d27", + "metadata": {}, + "source": [ + "# Convolutions\n", + "\n", + "Last time, we learned about how to represent images in Python with `numpy`. In this lesson, we'll learn about convolutions, specifically on image data, and how to implement convolutions. By the end of this lesson, students will be able to:\n", + "\n", + "- Define convolutions in terms of the sliding window algorithm.\n", + "- Recognize common kernels for image convolution.\n", + "- Apply convolutions to recognize objects in an image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "971050e4-2d05-49e0-be42-dc0a63e0b70e", + "metadata": {}, + "outputs": [], + "source": [ + "import imageio.v3 as iio\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "\n", + "\n", + "def compare(*images):\n", + " \"\"\"Display one or more images side by side.\"\"\"\n", + " _, axs = plt.subplots(1, len(images), figsize=(15, 6))\n", + " for ax, image in zip(axs, images):\n", + " ax.imshow(image, cmap=\"gray\")\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "48d33575-059a-4249-aae7-9f2a3ac0e703", + "metadata": {}, + "source": [ + "## Convolutions as mathematical operators\n", + "\n", + "Convolution defined mathematically is an operator that takes as input two functions ($f$ and $g$) and produces an third function ($f * g$) as output. In this example, the function $f$ is our input image, Dubs!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e2e4853", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.chdir('materials')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "406f7d9c-be07-45b0-8f62-7b8decabccfd", + "metadata": {}, + "outputs": [], + "source": [ + "# Read image as grayscale\n", + "dubs = iio.imread(\"dubs.jpg\", mode=\"L\")\n", + "plt.imshow(dubs, cmap=\"gray\");" + ] + }, + { + "cell_type": "markdown", + "id": "0a7bd382-6613-4d3d-bdf4-0f08c8198ae2", + "metadata": {}, + "source": [ + "If $f$ represents our input image, what is the role of $g$? Let's watch a clip from 3Blue1Brown (Grant Sanderson)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7ace1238-65ab-408b-9a68-3a662bc5d674", + "metadata": {}, + "outputs": [], + "source": [ + "%%html\n", + "<iframe width=\"640\" height=\"360\" src=\"https://www.youtube-nocookie.com/embed/KuXjwB4LzSA?start=538&end=588\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>" + ] + }, + { + "cell_type": "markdown", + "id": "c7cd5f1b-041b-44ba-8553-27ddd59d9e10", + "metadata": {}, + "source": [ + "In this example, $g$ is a 3-by-3 [**box blur**](https://en.wikipedia.org/wiki/Box_blur) *kernel* that is applied to each patch or *subimage* in $f$.\n", + "\n", + "$$g = \\begin{bmatrix}\n", + "\\frac{1}{9} & \\frac{1}{9} & \\frac{1}{9}\\\\[0.3em]\n", + "\\frac{1}{9} & \\frac{1}{9} & \\frac{1}{9}\\\\[0.3em]\n", + "\\frac{1}{9} & \\frac{1}{9} & \\frac{1}{9}\n", + "\\end{bmatrix}$$\n", + "\n", + "A convolution is like a special way of looping over the pixels in an image. Instead of looping over an image one pixel at a time, a **convolution** loops over an image one **subimage** (patch) at a time. Convolutions are also often called \"sliding window algorithms\" because they start at the top row, generate the first subimage for the top leftmost corner, then *slide over* 1 pixel to the right, and repeats until every subimage has been generated.\n", + "\n", + "Here's how we can create this kernel in Python:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "574e8935-5563-46df-bae5-5656bb37a2f7", + "metadata": {}, + "outputs": [], + "source": [ + "box_blur_3x3 = 1 / 9 * np.ones((3, 3))\n", + "box_blur_3x3" + ] + }, + { + "cell_type": "markdown", + "id": "c1abb0d0-4db4-4afe-93be-28657ae0dcbc", + "metadata": {}, + "source": [ + "## Practice: Get subimages\n", + "\n", + "Uncomment the line code in the body of the loop that will correctly slice the subimage matching the given kernel shape." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6f1e74f-0b27-4415-99e1-7ef0e6f2dc24", + "metadata": {}, + "outputs": [], + "source": [ + "def get_subimages(image, kernel_shape):\n", + " \"\"\"Returns an array of subimages matching the given kernel shape.\"\"\"\n", + " image_h, image_w = image.shape\n", + " kernel_h, kernel_w = kernel_shape\n", + " subimages_h = image_h - kernel_h + 1\n", + " subimages_w = image_w - kernel_w + 1\n", + " subimages = np.zeros((subimages_h, subimages_w, kernel_h, kernel_w))\n", + " for ih in range(subimages_h):\n", + " for iw in range(subimages_w):\n", + " # subimages[ih, iw] = image[iw:iw + kernel_w, ih:ih + kernel_h]\n", + " # subimages[ih, iw] = image[iw:iw + kernel_w + 1, ih:ih + kernel_h + 1]\n", + " subimages[ih, iw] = image[ih:ih + kernel_h, iw:iw + kernel_w]\n", + " # subimages[ih, iw] = image[ih:ih + kernel_h + 1, iw:iw + kernel_w + 1]\n", + " return subimages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a8d317a", + "metadata": {}, + "outputs": [], + "source": [ + "dubs_eye = dubs[50:100, 300:350]\n", + "plt.imshow(dubs_eye, cmap=\"gray\");" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8936457", + "metadata": {}, + "outputs": [], + "source": [ + "dubs_eye_subimages = get_subimages(dubs_eye, (3, 3))\n", + "compare(dubs_eye_subimages[(30, 20)], dubs_eye[30:33, 20:23])\n", + "assert np.allclose(dubs_eye_subimages[(30, 20)], dubs_eye[30:33,20:23])" + ] + }, + { + "cell_type": "markdown", + "id": "c5c3e31e-17fc-4ac8-a4cd-4e0198d17a43", + "metadata": {}, + "source": [ + "## Practice: Sum of element-wise products\n", + "\n", + "Write a NumPy expression to compute the sum of element-wise products between the current subimage and the given kernel." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb2d5d3f-f754-49f2-9f7e-13ae2ea30efa", + "metadata": {}, + "outputs": [], + "source": [ + "def convolve(image, kernel):\n", + " \"\"\"Returns the convolution of the image and kernel where the image shrinks by kernel shape.\"\"\"\n", + " subimages = get_subimages(image, kernel.shape)\n", + " subimages_h, subimages_w = subimages.shape[:2]\n", + " result = np.zeros((subimages_h, subimages_w))\n", + " for ih in range(subimages_h):\n", + " for iw in range(subimages_w):\n", + " # TODO: compute the sum of element-wise product between subimage and kernel\n", + " result[ih, iw] = (subimages[ih, iw] * kernel).sum()\n", + " return result\n", + "\n", + "\n", + "dubs_blurred = convolve(dubs, box_blur_3x3)\n", + "compare(dubs, dubs_blurred)" + ] + }, + { + "cell_type": "markdown", + "id": "94643923-a935-4032-95fe-5593eeae18f7", + "metadata": {}, + "source": [ + "What will the result look like if we make the kernel larger? How about a 10-by-10 kernel instead of a 3-by-3 kernel?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b933b062-5247-4689-ac9c-ed0444231f08", + "metadata": {}, + "outputs": [], + "source": [ + "box_blur_10x10 = np.ones((10, 10))\n", + "box_blur_10x10 /= box_blur_10x10.size\n", + "box_blur_10x10" + ] + }, + { + "cell_type": "markdown", + "id": "ae47765d-6efb-4f91-b065-72017da722f7", + "metadata": {}, + "source": [ + "The entries in this kernel sum up to 1 to ensure that the resulting image is an average over the image. 10-by-10 is a somewhat awkward shape since we no longer have a notion of the exact center pixel, but the convolution code still works." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "692429fd-f4c3-45f5-9e64-043f7bc5e521", + "metadata": {}, + "outputs": [], + "source": [ + "dubs_blurred_10x10 = convolve(dubs, box_blur_10x10)\n", + "compare(dubs, dubs_blurred_10x10)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "40c081ec-d427-4fcf-8b4d-2b21b6ac86d5", + "metadata": {}, + "source": [ + "## Understanding kernels\n", + "\n", + "What other kernels exist? Let's try to deduce what a kernel might do to an image.\n", + "\n", + "This kernel has 0's all around except for a 1 in the center of the kernel." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "61a23830-337c-4e53-9dbd-f8c5d4564e3d", + "metadata": {}, + "outputs": [], + "source": [ + "kernel = np.zeros((3, 3))\n", + "kernel[1, 1] = 1\n", + "kernel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "086bc026-0778-4f1f-bb3a-21246281cc16", + "metadata": {}, + "outputs": [], + "source": [ + "compare(dubs, convolve(dubs, kernel))" + ] + }, + { + "cell_type": "markdown", + "id": "1f41b21b-d538-468e-8c3b-dd0017ff867f", + "metadata": {}, + "source": [ + "This kernel is computed by taking the double the previous mystery kernel and subtracting-out the box blur kernel." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5a7fdb5d-48f0-4a4d-a678-15b833c3bfdb", + "metadata": {}, + "outputs": [], + "source": [ + "# Setup the previous kernel\n", + "kernel = np.zeros((3, 3))\n", + "kernel[1, 1] = 1\n", + "# Double the previous kernel minus the box blur kernel\n", + "kernel = 2 * kernel - box_blur_3x3\n", + "kernel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a9504ffe-a2a9-489b-858a-2f2de1e7debf", + "metadata": {}, + "outputs": [], + "source": [ + "compare(dubs, convolve(dubs, kernel))" + ] + }, + { + "cell_type": "markdown", + "id": "1ffbf5ae-fa99-4444-bf34-3f2af1bcdaa1", + "metadata": {}, + "source": [ + "This 10-by-10 kernel places a bell curve around the center with values shrinking the further you get from the center." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00ab647f-dc42-47ce-9b8f-b87399b97201", + "metadata": {}, + "outputs": [], + "source": [ + "values = [[1, 2, 4, 7, 11, 11, 7, 4, 2, 1]]\n", + "kernel = np.array(values) * np.array(values).T\n", + "kernel = kernel / kernel.sum()\n", + "# Show values rounded to two decimal places\n", + "kernel.round(2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b375061a-edc4-4781-a1bb-b1e854745870", + "metadata": {}, + "outputs": [], + "source": [ + "compare(dubs, convolve(dubs, kernel))" + ] + }, + { + "cell_type": "markdown", + "id": "32e78a36-cb3b-4ae7-abc1-46c5faf7fff4", + "metadata": {}, + "source": [ + "## Practice: Template match\n", + "\n", + "Write a function `template_match` that finds instances of a small template inside a larger image. Given an image and a template that can be found within it, `template_match` returns an array of values representing the pixel-by-pixel similarity between the template and each corresponding subimage.\n", + "\n", + "This algorithm essentially applies a convolution over the image using the template as the kernel. However, using the raw pixel values will favor brighter pixels. Instead, we first have to de-mean both the template and the current subimage by subtracting the average pixel values." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3998f270-4ea4-4234-8249-16ae8e46522e", + "metadata": {}, + "outputs": [], + "source": [ + "def template_match(image, template):\n", + " \"\"\"\n", + " Given a grayscale image and template, returns a numpy array that stores the similarity of the\n", + " template at each position in the image.\n", + " \"\"\"\n", + " image_h, image_w = image.shape\n", + " template_h, template_w = template.shape\n", + "\n", + " # De-mean the template\n", + " demeaned_template = template - template.mean()\n", + "\n", + " # Construct result of the expected output size\n", + " result = np.zeros((..., ...))\n", + " result_h, result_w = result.shape\n", + "\n", + " for ih in range(result_h):\n", + " for iw in range(result_w):\n", + " # Select corresponding subimage\n", + " subimage = ...\n", + "\n", + " # De-mean the subimage\n", + " demeaned_subimage = subimage - subimage.mean()\n", + "\n", + " # Compute sum of element-wise products\n", + " similarity = ...\n", + " result[ih, iw] = similarity\n", + "\n", + " return result\n", + "\n", + "\n", + "match = template_match(dubs, dubs_eye)\n", + "compare(dubs, match)\n", + "np.unravel_index(match.argmax(), match.shape)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}