Getting and Keeping Data

Data comes in many forms and from many sources: you may get a database dump directly from a project partner, or you may need to scrape data from the web (see Basic Web Scraping). Either way, once you have the data, you'll need to load it into a database and format it so that you can use it for analysis. Command Line Tools will come in handy here. If your data is in a format that resembles CSV, these instructions will be helpful. Keep track of every step you take to go from raw data to model-ready data, so that the process can be rerun from scratch (Reproducible ETL).
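As a rough illustration of loading CSV-like data into a database, here is a minimal sketch using only Python's standard library (`csv` and `sqlite3`). The table name `raw_records`, the sample columns, and the helper `load_csv` are all hypothetical, chosen for illustration; a real project would more likely target a server database and use the tools linked above. Every column is loaded as text, and type cleanup is deferred to a later, scripted step.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so names with spaces or quotes are safe."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, table, csv_file):
    """Load a CSV file object into a new all-TEXT table; return the row count."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row is assumed to hold the column names
    columns = ", ".join(quote_ident(col) + " TEXT" for col in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))

    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
        rows,
    )
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for a raw file from a partner.
sample = io.StringIO("id,name,amount\n1,Ada,10.5\n2,Grace,7\n")

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
n_loaded = load_csv(conn, "raw_records", sample)

# Types are cast at query time because everything was loaded as TEXT.
total = conn.execute(
    "SELECT SUM(CAST(amount AS REAL)) FROM raw_records"
).fetchone()[0]
print(n_loaded, total)
```

Keeping the load step in a script like this, instead of importing by hand, is one simple way to make the path from raw data to the database repeatable.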

Often data science for social good projects will involve sensitive data, so it's important to be aware of some basic principles of data security: Data Security Primer.