Some Tips for Handling Secrets
Keeping secrets (such as database passwords, API credentials, etc.) out of your code is important for the security of your systems and data. While there are many approaches to doing so, two simple options are environment variables and secret config files.
Option 1: Environment Variables
Environment variables are a good option for keeping secrets out of the code itself. You can set a variable at the bash command line by assigning it with an = sign (avoid any spaces around the =) and check its value using echo, placing a $ before the variable name. Prefix the assignment with export so that the variable is also passed to programs you start from that shell; without export, it is visible only to the shell itself:
you@server:~$ export FOO="HELLO WORLD"
you@server:~$ echo $FOO
HELLO WORLD
In Python, you can access these using the built-in os module. For instance, if you had your database password stored in the PGPASSWORD environment variable:
import os
db_pass = os.getenv('PGPASSWORD')
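Note that os.getenv returns None when the variable is not set, which can surface much later as a confusing connection error. A minimal sketch of failing early instead, where require_env is a hypothetical helper name and not part of the os module:

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)  # None when the variable is not set
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

With this in place, db_pass = require_env('PGPASSWORD') either returns the password or stops the program with a clear message.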
If you don't want to set the environment variables by hand every time you start a new terminal session, you can store them in a shell script that sets them when run. For instance, you might have a file called environment.sh with contents:
export FOO="HELLO WORLD"
export BAR="BAZ"
Importantly, you'll need to restrict access to this file: store it somewhere only you can access (e.g., your home directory), avoid committing it to a git repository, and change the permissions so that only you can read it using chmod 600 {filename}.
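If you want a script to refuse to use a secrets file that other users can read, the permission bits can also be checked from Python. A minimal sketch, assuming a POSIX system; is_private is a hypothetical helper name:

```python
import os
import stat


def is_private(path):
    """Return True if only the file's owner has any permissions on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # S_IRWXG and S_IRWXO cover every group and other permission bit
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file set with chmod 600 passes this check, while one left at the common default of 644 does not.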
Once you've created that file, any time you want to load the environment variables, you can run its contents in your current shell using source. For instance, if the file was named environment.sh:
you@server:~$ source environment.sh
Option 2: Secrets Config File
A second option is to store your secrets in a config file that your programs can read (any number of formats is reasonable: YAML, JSON, even plain text). For instance, you might create a file called secrets.yaml with contents such as:
db:
  host: database.mlpolicylab.dssg.io
  port: 5432
  dbname: group_students_database
  user: andrewid
  password: 12345
web_resource:
  api_key: 23b53ca9845f70424ad08f958c94b275
Then, you can access your secrets within your code with the appropriate loading utility, such as the following (here, the yaml module is not built-in, but comes from the PyYAML package):
import yaml
with open('path/to/secrets.yaml', 'r') as f:
    # loads contents of secrets.yaml into a python dictionary
    secret_config = yaml.safe_load(f.read())
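Once loaded, the secrets are ordinary nested dictionaries and can be looked up by key. A small sketch, assuming db and web_resource are nested sections as in the secrets.yaml example; db_connection_kwargs is a hypothetical helper, and the dictionary literal with placeholder values stands in for the result of yaml.safe_load so the snippet runs without the file:

```python
def db_connection_kwargs(secret_config):
    """Pull the database settings out of the loaded secrets dictionary."""
    db = secret_config["db"]
    required = ("host", "port", "dbname", "user", "password")
    missing = [key for key in required if key not in db]
    if missing:
        raise KeyError(f"secrets file is missing db settings: {missing}")
    return {key: db[key] for key in required}


# Stand-in for the dictionary yaml.safe_load would return (placeholder values)
secret_config = {
    "db": {"host": "localhost", "port": 5432, "dbname": "example",
           "user": "andrewid", "password": "not-a-real-password"},
    "web_resource": {"api_key": "not-a-real-key"},
}
conn_kwargs = db_connection_kwargs(secret_config)
api_key = secret_config["web_resource"]["api_key"]
```

One thing to keep in mind: YAML parses an unquoted value such as 12345 as an integer, so quote the password in the file or convert it with str() if the consuming library expects a string.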
This can be an easy way to feed secrets into your programs, but you'll need to ensure these secrets don't accidentally get committed to GitHub. You could either provide the path to the config file as an input parameter to your program (in which case, you could keep the secrets file somewhere entirely outside of the git repo, such as your home directory) or have it live in some expected location within the structure of the repo, but use a .gitignore file to avoid committing the secrets file itself.
To do so, edit (or create) the .gitignore file at the top level of your repo and add the following (in the example where the secrets are contained in secrets.yaml):
# ignore secrets config
secrets.yaml
Make sure you've added and committed the .gitignore file to your repo. You should then be able to confirm with git status that your secrets file isn't being tracked.
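For the other approach mentioned above, passing the path to the secrets file as an input parameter, a sketch might look like the following. The --secrets option name and both function names are hypothetical, and JSON is used here only so the example needs nothing beyond the standard library; the same shape works with yaml.safe_load:

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Example program that reads a secrets file")
    # --secrets is a hypothetical option name chosen for this sketch
    parser.add_argument("--secrets", required=True, help="path to the secrets file")
    return parser.parse_args(argv)


def load_secrets(path):
    """Read a JSON secrets file into a dictionary."""
    with open(path, "r") as f:
        return json.load(f)
```

A program would call parse_args() at startup and pass args.secrets to load_secrets, so the secrets file can live anywhere outside the repo.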