Setting up the Infrastructure
In every data science project you will need several tools to help analyze the data in an efficient[1] manner. Examples include a place to store the data (e.g. a database management system, or DBMS); a way to put your model to work, i.e. a way for the model to ingest new data and make predictions (an API); and a way to examine the performance of trained models (e.g. monitoring tools).
This tutorial includes a script for managing the infrastructure[2] in a transparent way.
The infrastructure of this tutorial has three pieces:
- a postgresql database called food_db,
- a container that executes triage experiments (we will use this when trying to scale up),
- a container for interacting with the data called bastion.
bastion includes a postgresql client (so you can interact with the database)[3] and a full python environment (so you can write or modify the code for the tutorial).
The only thing you need installed on your laptop is docker.
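Before running the tutorial script, it can help to confirm that docker is actually available on your PATH. The following is a minimal sketch; the helper name require_cmd is hypothetical and is not part of tutorial.sh.

```shell
# require_cmd: hypothetical helper (not part of tutorial.sh).
# Succeeds if the given command is found on PATH, fails otherwise.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if require_cmd docker; then
    echo "docker found"
else
    echo "docker not found; install it before running the tutorial" >&2
fi
```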
From your command line (terminal) run the following command from the repo directory:
./tutorial.sh
Usage: ./tutorial.sh {up|down|build|rebuild|run|logs|status|clean}
OPTIONS:
-h|help Show this message
up Starts Food DB
down Stops Food DB
build Builds images (food_db and bastion)
rebuild Builds images (food_db and bastion) ignoring if they already exists
-l|logs Shows container's logs
status Shows status of the containers
-d|clean Removes containers, images, volumes, netrowrks
INFRASTRUCTURE:
Build the DB's infrastructure:
$ ./tutorial.sh up
Check the status of the containers:
$ ./tutorial.sh status
Stop the tutorial's DB's infrastructure:
$ ./tutorial.sh down
Destroy all the resources related to the tutorial:
$ ./tutorial.sh clean
View the infrastructure logs:
$ ./tutorial.sh -l
Following the instructions on the screen, we can start the infrastructure with:
./tutorial.sh up
You can check that everything is running smoothly with the status command:
./tutorial.sh status
Name Command State Ports
------------------------------------------------------------------------
tutorial_db docker-entrypoint.sh postgres Up 0.0.0.0:5434->5432/tcp
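If you want to check the state from a script rather than by eye, the status table can be filtered. This is a sketch that assumes the output keeps the layout shown above (container name in the first column, "Up" as a separate field); the helper name container_is_up is hypothetical.

```shell
# container_is_up: hypothetical helper (not part of tutorial.sh).
# Reads a status table on stdin and succeeds only if the row whose
# first column is the given name also has "Up" as one of its fields.
container_is_up() {
    awk -v name="$1" '
        $1 == name { for (i = 2; i <= NF; i++) if ($i == "Up") found = 1 }
        END { exit !found }
    '
}

# Usage (assumes the infrastructure was started with ./tutorial.sh up):
#   ./tutorial.sh status | container_is_up tutorial_db && echo "database is up"
```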
To access bastion, where the postgresql client is, run the command:
./tutorial.sh bastion
Your prompt should change to something like:
[triage@dirtyduck$:/dirtyduck]#
NOTE: If your prompt shows a container ID, it will be different from the one used when writing this tutorial (i.e. not 485373fb3c64).
Inside bastion, type the next command to connect to the database:
psql ${DATABASE_URL}
The prompt will change again to (or something very similar):
psql (12.3 (Debian 12.3-1.pgdg100+1), server 12.2 (Debian 12.2-2.pgdg100+1))
Type "help" for help.
food=#
The previous command uses psql, a powerful command line client for the PostgreSQL database. If you want to use this client fully, check psql's documentation.
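psql accepts a connection URI in place of separate host, port, and user flags, which is presumably what DATABASE_URL holds inside bastion. The sketch below composes such a URI from the connection details given in the footnote for local clients. The helper name make_db_url is hypothetical, and the actual value of DATABASE_URL inside bastion (including any password) may differ.

```shell
# make_db_url: hypothetical helper that builds a PostgreSQL connection URI.
make_db_url() {
    # $1 = user, $2 = host, $3 = port, $4 = database name
    printf 'postgresql://%s@%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

make_db_url food_user 0.0.0.0 5434 food
# prints postgresql://food_user@0.0.0.0:5434/food
```

A URI built this way can be passed to psql as its single positional argument, just like ${DATABASE_URL} above.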
The database is now running and is named food. It should contain a single table named inspections in the schema raw. Let's check the structure of the inspections table. Type the following command:
\d raw.inspections
Column | Type | Collation | Nullable | Default |
---|---|---|---|---|
inspection | character varying | | not null | |
dba_name | character varying | | | |
aka_name | character varying | | | |
license_num | numeric | | | |
facility_type | character varying | | | |
risk | character varying | | | |
address | character varying | | | |
city | character varying | | | |
state | character varying | | | |
zip | character varying | | | |
date | date | | | |
type | character varying | | | |
results | character varying | | | |
violations | character varying | | | |
latitude | numeric | | | |
longitude | numeric | | | |
location | character varying | | | |
That's it! We will work with this table of raw inspections data.
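For a quick scripted sanity check, the table can also be queried non-interactively from the bastion shell (not from the psql prompt). This is a minimal sketch that assumes DATABASE_URL is set as in the connection step above; the helper name count_rows is hypothetical.

```shell
# count_rows: hypothetical helper (not part of the tutorial).
# Prints the number of rows in the given table, using the DATABASE_URL
# that the bastion container is assumed to export.
count_rows() {
    # -X skips ~/.psqlrc, -A and -t print the bare value, -c runs one command
    psql -X -A -t -c "SELECT count(*) FROM $1;" "$DATABASE_URL"
}

# Usage, from the bastion shell:
#   count_rows raw.inspections
```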
You can disconnect from the database by typing \q. But don't leave the database yet! We still need to do a lot of things.[4]
[1] Reproducible, scalable, flexible, etc.
[2] It also handles other things throughout this tutorial, such as the execution of the model training.
[3] If you have a postgresql client installed, you can use psql -h 0.0.0.0 -p 5434 -d food -U food_user rather than the bastion container.
[4] Welcome to the not-so-sexy part of the (supposedly) sexiest job of the XXI century.