Data Server Tutorial
====================
This repository contains the `docker` + `django` tutorial for creating a data collection server, written by Elliot in May of 2017. It serves two purposes: it is an example, showing how such a data collection server can be created and why we design things the way we do, and it is a starting point for your own data collection servers. It is designed to be downloaded, modified, and uploaded to a new repository to host your own data collection server.
## Overall design
The purpose of this repository is to show off a data collection server webapp. In the fictional example shown here, we mock up the server side of a mobile health data collection app that needs to keep track of data readings (which we store as just a list of floating point numbers) along with which patient the data is related to and what device the data was collected on. I am now going to make a series of design decisions with hand-wavy justifications, which you are welcome to argue with me about in person if you want to understand why:
* We are going to store our information in a SQL database, because they solve a lot of problems we don't want to have to solve on our own.
* We are going to offer access to our database over HTTP with JSON payloads because "The Web Is The Future" (TM).
* We are going to write glue code between the public-facing HTTP side of our app and the private SQL side of our app, so that we can implement logic, sanity checks, etc..., to make this more than just a simple database that anyone can use for any purpose.
Let's match technologies to our design decisions:
* For a SQL database, we're going to use [PostgreSQL](https://www.postgresql.org/)
* For an HTTP server we're going to use [nginx](https://www.nginx.com/)
* For glue code we're going to use [django](https://www.djangoproject.com/).
* To put all these together (and avoid having to get a Master's Degree in server orchestration), we're going to use [docker](https://www.docker.com/).
These technologies will form the frame upon which we will hang our code and behavior. This repository has a [`docker-compose.yml`](docker-compose.yml) file that sets up the necessary services. As you read this documentation, feel free to run the example code and follow along; the only thing you need installed is `docker`. I have provided `Makefile`s throughout the codebase that allow you to type commands such as `make up` to bring the whole stack of components up, or `make stop` to stop the app from running, etc... but this only works if you have `make` installed, which is not the case on Windows. Tough luck for you poor suckers; you'll have to type out the `docker-compose` commands manually.
## Technology Introductions
See [`docs/docker.md`](docs/docker.md) for a brief introduction to Docker, [`docs/django.md`](docs/django.md) for a brief introduction to Django, and [`docs/nginx.md`](docs/nginx.md) for what might almost be considered a brief introduction to nginx.
# How to run this example code
**tl;dr** If you're on Linux/MacOS and therefore have make, just run `make initdata superuser`.
If you're on Windows and don't have `make`, run this:
```
docker run -ti -v "%cd%:/pwd" -w /pwd python python generate_env.py
docker-compose up --build -d
docker-compose exec app python manage.py initdata
docker-compose exec app python manage.py createsuperuser
```
These commands will ask you some questions about passwords and tokens and whatnot, so answer the questions, and then it will ask you for information about a "super user", which is the account you will use to log in to the admin interface.
## More in-depth instructions
Once you've got `docker` installed and ready to go, open a shell in this directory and run `docker-compose up --build` to build the containers for each service defined within the `docker-compose.yml` file, and to have it run in the foreground of that shell. I have created a [`Makefile`](Makefile) that provides shortcuts if you have `make` installed (this is provided by default on platforms such as Linux or MacOS) so that you can run e.g. `make up` and it will bring the application stack up, but if you don't have `make` you can just type out the corresponding commands yourself. Here is a list of useful commands:
* `make`/`make up`/`docker-compose up --build -d`: This will take the current code, package it into containers, and run them in the background. You can look at the currently running containers (if any) with `docker-compose ps`.
* `make stop`/`docker-compose stop`: This will stop the containers from running.
* `make logs`/`docker-compose logs -f`: This will scroll logs from the containers currently running in real time. Useful to see how your HTTP requests are being processed/why your python app is throwing errors, etc...
* `make shell`/`docker-compose exec app /bin/bash`: This will open up a `bash` shell within the python code container. This allows you to poke around, see what the files look like within the container, execute python commands, etc...
* `make initdata`/`docker-compose exec app python manage.py initdata`: This runs [some python code](django_app/data_app/management/commands/initdata.py) to initialize testing users/data within our database. This is not the kind of thing you want to run on the final production database, but it's useful for generating a set of testing data. As you adapt this repository to your own needs, this may or may not be useful to you.
* `make adminuser`/`docker-compose exec app python manage.py createsuperuser`: This creates a super user within the Django user model, which is important so that you can login to the administrative interface to directly edit database entries, add normal users, etc...
* `make .env`/`python generate_env.py`: This will generate a `.env` file in the current directory, which contains sensitive information such as SQL database passwords, tokens, etc... This file should never be checked into source control unencrypted, and in this case, it is automatically ignored from git through the `.gitignore` file. This file gets embedded within the python app container as a file called `secret.py`, and elements of this file are passed to the PostgreSQL container on startup.
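The repository's own [`generate_env.py`](generate_env.py) is the authoritative version, but the idea can be sketched in a few lines: generate fresh random secrets and write them out as `KEY=value` pairs. The variable names below (`POSTGRES_PASSWORD`, `SECRET_KEY`) are illustrative assumptions, not necessarily the exact keys the repository uses:

```python
import secrets

def generate_env_lines():
    """Build KEY=value lines for a .env file with freshly generated
    secrets. Variable names here are illustrative, not authoritative."""
    return [
        f"POSTGRES_PASSWORD={secrets.token_urlsafe(32)}",
        f"SECRET_KEY={secrets.token_urlsafe(50)}",
    ]

env_lines = generate_env_lines()
# A real script would then write "\n".join(env_lines) to a .env file.
```

Because the secrets are generated rather than typed, they never need to live in anyone's head or in source control.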
There are other commands hidden away within the [`Makefile`](Makefile), take a peek if you want to know more.
# A walk through the code
Conceptually, we set things up such that `nginx` sits at the interface between our server and the internet, keeping the seething masses at bay with strong code and intelligent configuration. Any worthy HTTP requests will get forwarded into our Django application running within its own container, and the Django code will be able to communicate with the PostgreSQL server because we have already configured it properly. The requests are filtered through the code in [`urls.py`](django_app/data_app/urls.py), pairing off request URLs with functions within [`views.py`](django_app/data_app/views.py). The views take in the requests, process them, and return responses that get sent back to the requesting client.
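The URL-to-view pairing above is the heart of how Django dispatches requests. The repository's actual `urls.py` uses Django's own machinery; as a miniature plain-Python stand-in for the concept (the endpoint names and views here are made up for illustration):

```python
# Miniature stand-in for Django's URL dispatching: a table pairs URL
# paths with view functions, and the matching view handles the request.
def list_devices(request):
    return {"status": 200, "body": ["device-1", "device-2"]}

def list_patients(request):
    return {"status": 200, "body": ["alice", "bob"]}

# In real Django this table is built with path()/url() calls in urls.py.
urlpatterns = {
    "devices/": list_devices,
    "patients/": list_patients,
}

def dispatch(path, request=None):
    """Look up the view for a path; unknown paths get a 404."""
    view = urlpatterns.get(path)
    if view is None:
        return {"status": 404, "body": "Not Found"}
    return view(request)
```

The real thing adds pattern matching, parameter capture, and middleware, but the shape is the same: a table of routes, and a function call per request.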
We define endpoints for getting patients, devices, and readings that have been previously defined within the database, as well as an endpoint for uploading new readings. Most of these endpoints (with the exception of getting the device list) require authentication, which is done by hitting the "login" endpoint, which sets a cookie within the requesting client's HTTP stack. All endpoints return data in JSON format. Sending data to the server is a little more custom, see the client examples for how to format your data when uploading.
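The client examples are the authoritative reference for the upload format, but to give a feel for the shape of the data, here is a hedged sketch of building a JSON payload that carries a list of floating point readings plus patient and device identifiers. The field names (`patient`, `device`, `data`) are assumptions for illustration, not the repository's actual wire format:

```python
import json

def make_reading_payload(patient_id, device_id, values):
    """Serialize one reading as JSON. Field names are hypothetical;
    consult the client examples for the real format."""
    if not all(isinstance(v, (int, float)) for v in values):
        raise ValueError("readings must be numeric")
    return json.dumps({
        "patient": patient_id,
        "device": device_id,
        "data": list(values),
    })

payload = make_reading_payload(1, 2, [98.6, 99.1])
```

A client would then `POST` this payload (with its login cookie attached) to the upload endpoint.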
This codebase is intended to showcase a small variety of ways in which you can write a simple data storage webapp. The intention here is for you to ingest the documentation here, download this repository, and start tweaking it to your use case. There are many topics that are not covered here such as uploading large binary files, dealing with spotty internet connections, etc... but this is at least a good place to start.
## A note on API design
The API displayed here (the collection of `GET` and `POST` requests that you can send to the various API endpoints) is simplistic, but not completely unrealistic. It's good practice to structure your API endpoints in some reasonable way, that gives the user a consistent, well-defined way of interacting with your webapp. Additionally, it's good to lock down access to the bare minimum, so instead of allowing both `GET` and `POST` requests to an endpoint that is a "read-only" style endpoint (such as listing all the registered patients, etc...), it's better to lock it down to only `GET` requests and explicitly error on the `POST` requests. This will reduce confusion in the future when you accidentally send a `POST` request and are confused by the errors you get when you try to load in parameters from what you think is a `GET` request, etc...
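The "lock it down to only `GET`" advice can be sketched as a small decorator that rejects every other method with a 405 before the view ever runs. This is a plain-Python illustration of the idea; in Django itself you would reach for the `require_GET` decorator from `django.views.decorators.http`:

```python
# Sketch of locking a read-only endpoint down to GET: any other method
# gets an explicit 405 instead of a confusing downstream error.
def require_get(view):
    def wrapped(method, *args, **kwargs):
        if method != "GET":
            return {"status": 405, "body": "Method Not Allowed"}
        return view(*args, **kwargs)
    return wrapped

@require_get
def list_patients():
    return {"status": 200, "body": ["alice", "bob"]}
```

Failing loudly and early like this is exactly what turns "weird errors when I POST by accident" into "oh, a 405, I used the wrong method".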
API design is more of an art than a science, so experiment, and don't be afraid to rewrite things a few times if your initial experimentation results in something that feels "clunky".
## Where to go from here
In order to turn this repository into something useful, first define the data types you want to store in the database. This will involve editing the [`models.py`](django_app/data_app/models.py) file to store everything you're interested in. Then, write views within [`views.py`](django_app/data_app/views.py) to allow access to those models. Then, once you're satisfied that the API you've written is meaningful and implemented correctly, write clients for it in Python, on Android, on iOS, using Javascript in webpages, whatever. The world is your burrito.
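Before opening `models.py`, it can help to sketch the data types on paper first. The real file expresses these as Django model classes with database fields and foreign keys; the plain-Python dataclass sketch below only illustrates the shape of this tutorial's data (patients, devices, and readings holding a list of floats), and the field names are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import List

# Plain-Python sketch of the tutorial's data types; the real models.py
# expresses these as Django models with CharField/ForeignKey etc.
@dataclass
class Patient:
    name: str

@dataclass
class Device:
    serial: str

@dataclass
class Reading:
    patient: Patient
    device: Device
    values: List[float] = field(default_factory=list)

reading = Reading(Patient("alice"), Device("dev-001"), [98.6, 99.1])
```

Once your equivalent of this sketch feels right, translating it into Django model fields is mostly mechanical.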
# Client examples
I have provided two examples of how to request data from this server; [one set using the Postman app](client_examples/postman), and [one set using Python](client_examples/python). Read their respective READMEs to see how to run them and how to look at the results.
## Acknowledgements
Many thanks to Clara Lu for being the guinea pig and helping to define many of the best practices exemplified within this repository.