
94889: Machine Learning for Public Policy Lab

Previous Versions: Fall 2023 | Fall 2022 | Fall 2021 | Fall 2020 | Spring 2020

Fall 2025: Tues & Thurs, 11am-12:20pm (HBH 2008), Lab Section: Wednesday 9:30-10:50am (HBH 2008)

Important

Wednesday Sessions

The first few weeks will be hands-on tech sessions. For the remainder of the semester, we'll use the time on Wednesdays to meet with teams and check in on their project progress.

Course Description

This is a project-based course designed to provide training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good.

Through lectures, discussions, readings, and project assignments, students will learn about and get hands-on experience building end-to-end machine learning systems, starting from project definition and scoping, to modeling, to field validation and turning their analysis into action. Through the course, students will develop skills in problem formulation, working with messy data, communicating about machine learning with non-technical stakeholders, model interpretability, understanding and mitigating algorithmic bias & disparities, evaluating the impact of deployed models, and understanding the ethical implications of design choices made throughout the ML pipeline.

Pre-Requisites: Students are expected to know Python (for data analysis and machine learning) and SQL, and to have prior graduate coursework in machine learning. This course assumes that you have already taken graduate machine learning courses and focuses on teaching how to use ML to solve real-world problems. Experience with the *nix command line, git(hub), and working on remote machines will be helpful and is highly recommended.

DRAFT SYLLABUS

People

Instructor

Rayid Ghani

GHC 8023
Office Hours:
Tuesday 12:30-1:30pm
Wednesday 3-4pm
Email me if you want to meet outside these hours

Teaching Assistant

Logan Crowl

HBH 3024 (slack if you can't find it)
Office Hours: Monday 1:30-2:30pm
Thursday 2-3pm

Grading

Throughout the semester, students will work together in small groups on a policy project using machine learning that will illustrate the concepts discussed in class and readings.

Graded components will include:

The data used for the course projects should be considered sensitive and private and must remain in the secure computing environment provided for the course. Any attempt to download any portion of the project data to a machine outside this environment will result in automatic failure of the class. Note that you may use tools like SQL clients, jupyter notebooks, etc. to interact with the data on the remote servers, but may not save the dataset (or a portion of it) to disk on a local machine.

Schedule

See the syllabus for much more detail, including links to required readings, information about group projects and grading, and helpful optional readings.

Week | Dates | Tuesday | Wednesday | Thursday | Assignments | Project Focus
1 | Tu: Aug 26; Th: Aug 28 | Intro/Overview + Project Overviews | Basic Tech Setup: Make sure students can connect to the server through ssh, have access to github, and access the db both from psql and from dbeaver | Scoping, Problem Definition, Balancing goals (equity, efficiency, effectiveness) | 1. Survey (Monday); 2. Project preferences + signature (Wednesday) | Get familiar with the class, goals, and understand project choices
2 | Tu: Sep 2; Th: Sep 4 | Case Studies + Discussion | Remote Tech Workflows | Acquiring Data, Privacy, Record Linkage | | Data Audit and Exploration
3 | Tu: Sep 9; Th: Sep 11 | Data Exploration + 30 min project team meeting/coordination | Git + GitHub | Project Work | | Data Stories and Finalize Project Scope
4 | Tu: Sep 16; Th: Sep 18 | Analytical Formulation / Baselines | Python + SQL | Building ML Pipelines | Project Proposal (Tuesday) | Initial ML Pipeline Setup; Analytical Formulation and Baselines
5 | Tu: Sep 23; Th: Sep 25 | Performance Metrics / Evaluation Part 1: Choosing Metrics | Triage Configuration Tech Session | Project Work | Proposal Reviews (Wednesday) | Iteration 1 - Build End to End Code Pipeline (Focus on end-to-end shell)
6 | Tu: Sep 30; Th: Oct 2 | Performance Metrics / Evaluation Part 2: Model Selection and Validation | Group Check-Ins | Temporal Deep Dive with projects | Analytic Formulation, Baselines, and Cohort/Label Queries (Monday) |
7 | Tu: Oct 7; Th: Oct 9 | Feature Engineering / Imputation | Group Check-Ins | Project Work | | Iteration 2 - End to End Code Pipeline (Focus on feature development)
 | Tu: Oct 14; Th: Oct 16 | FALL BREAK | FALL BREAK | FALL BREAK | |
8 | Tu: Oct 21; Th: Oct 23 | Features and Triage | Group Check-Ins | Triage office hours and Q&A | Modeling Plan and Temporal Validation Configuration (Monday) |
9 | Tu: Oct 28; Th: Oct 30 | ML Modeling in Practice | Group Check-Ins | Project Work | V0 Baseline Results and (Planned) Feature List (Monday) | Iteration 3 - End to End Code Pipeline (Focus on models and evaluation)
10 | Tu: Nov 4; Th: Nov 6 | No Classes | Group Check-Ins | Performance Metrics / Evaluation Pt. 3 (audition) | V0 Modeling Results (Monday) |
11 | Tu: Nov 11; Th: Nov 13 | Model Interpretability | Group Check-Ins | Ethics Workshop | Weekly Update Assignment (Monday) - More complete results over time | Iteration 4 - End to End Code Pipeline (Focus on interpreting the models)
12 | Tu: Nov 18; Th: Nov 20 | Bias and Fairness Pt 1 | Group Check-Ins | Field Trials and Wrap-Up | Weekly Update Assignment (Monday) - Feature Importances + Crosstabs |
13 | Tu: Nov 25; Th: Nov 27 | Bias and Fairness Pt II | HOLIDAY | HOLIDAY | Weekly Update Assignment (Monday) - Bias | Final model choice and understanding its performance and impact on disparities
14 | Tu: Dec 2; Th: Dec 4 | Project Work | Final Presentations | Presentations | | Project Report and Presentations
Finals Week | | | | | Final Report Due | Final Report, Code, Repo, Documentation

Textbook & Software

Textbook: The course has no required textbook. Each week, we’ll assign selected readings from a variety of sources, listed below.

Software: For project work, we will provide students with access to a shared data and ML infrastructure. Data will be available in a PostgreSQL database, and SQL and Python will be used throughout the course. Students will be expected to store project code in a shared GitHub repository, so you should create an account at github.com if you do not already have one. Additionally, we will use the machine learning pipeline package triage for modeling.
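Because project code lives in a shared repository while the data sits in a PostgreSQL database, one common pattern is to read database credentials from the environment instead of writing them into code. The following is a minimal sketch of that pattern, not the course's prescribed setup: the function name postgres_url_from_env is hypothetical, and it assumes the standard libpq environment variables (PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD) have been set on the remote server.

```python
import os
from urllib.parse import quote


def postgres_url_from_env(env=None):
    """Build a PostgreSQL connection URL from libpq-style environment variables.

    `env` defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    required = ("PGHOST", "PGDATABASE", "PGUSER", "PGPASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail loudly rather than falling back to a hardcoded credential.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    port = env.get("PGPORT", "5432")  # 5432 is the default PostgreSQL port
    # Percent-encode user and password so characters like '@' or '/' stay valid in a URL.
    user = quote(env["PGUSER"], safe="")
    password = quote(env["PGPASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{env['PGHOST']}:{port}/{env['PGDATABASE']}"
```

The resulting URL can be handed to whichever database client you use. Since it contains the password, avoid printing or logging it.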

Phone, Laptop, and Device Policy
Because much of the work in this course involves group discussions and responding thoughtfully to your colleagues’ progress reports, mobile devices are not permitted for use during the class. If you have a disability or other reason that necessitates the use of a mobile device, please speak to one of the instructors or teaching assistants.

Class Project

Beginning in the second week of class, groups of 3-4 students will work together throughout the semester on a machine learning project addressing one of several real-world public policy problems. Each week, every group will be expected to provide an update on its current status. In addition to helping connect readings and discussion topics to the policy domain, these updates and discussions will give you a chance to elicit input and feedback from your classmates about challenges you’re facing (and they likely are too!) in your analyses.
Throughout the semester, students will be responsible for several intermediate deliverables as they work on their group projects:

At the end of the semester, each group will be responsible for a final presentation (10 minutes in length plus 3 minutes for questions). While the deep dive presentations should be more technical in nature, the final presentation should be geared towards the relevant decision makers for your project, including an overview of the problem and approach, your results, policy recommendations, and limitations of the work.
Accompanying the final presentation is a written report, approximately 15 pages in length, which should include:

Tentative Schedule

In general, the course will be structured around three sessions each week:

Below is a preliminary schedule of the course, including the readings that will be assigned for that week. Please be sure to have read and be prepared to discuss the readings before the specified class session. Most of these topics can be (and often are) the focus of entire courses and generally we’ll only scratch the surface, but hopefully inspire you to delve deeper into areas that interest you (and you’ll find plenty of open research questions in each). Optional readings are also listed for most sessions which may be of interest to students who wish to delve deeper into a given area, as well as provide additional context for your related project work.

More Resources

You may find a number of books useful as general background reading, but these are by no means required texts for the course:

Additionally, the Global Communication Center (GCC) can provide assistance with the written or oral communication assignments in this class. The GCC is a free service, open to all students, and located in Hunt Library. You can learn more on the GCC website: cmu.edu/gcc.

Your Responsibilities

Attendance: Because much of this course is focused on discussion with your classmates, attending each session is important both to your ability to learn from the course and to what others get out of it. As such, you’ll be expected to attend every session, and your participation will factor into your grade as described above. Should anything come up that will require you to miss a class (illness, conferences, etc.), please let one of the course staff know in advance.

Academic Integrity: Violations of class and university academic integrity policies will not be tolerated. Any instances of copying, cheating, plagiarism, or other academic integrity violations will be reported to your advisor and the dean of students in addition to resulting in an immediate failure of the course.

Data Security: As noted above, the data used for the project work in this course should be considered sensitive and care must be taken to protect the privacy of those in the dataset. The data must remain on the computing environment provided for the class and attempts to download it to any other machine will result in failure of the course.

Additionally, care must be taken to avoid accidentally committing any raw data, queries containing identifiable information, or secrets (key files, database passwords, etc.) to GitHub. Should this occur, or should you have any reason to believe your personal computer or private key has been compromised, you must immediately notify the course staff of the issue.
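One way to lower the risk of committing secrets is to scan files for obvious patterns before each commit. The sketch below is illustrative only and is not a course-provided tool: the function name find_secrets and the three regular expressions are assumptions chosen for the example. They will miss many kinds of secrets and will also flag some harmless lines, so treat a clean result as a sanity check, not a guarantee.

```python
import re

# Hypothetical, deliberately simple heuristics for common secret formats.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password assignment": re.compile(
        r"(?i)(password|passwd|pwd)\s*[:=]\s*['\"]?[^\s'\"]+"
    ),
    "credentials in URL": re.compile(r"\b[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s:@]+@"),
}


def find_secrets(text):
    """Return (line_number, label) pairs for lines that match a secret pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

A check like this could be run over the files staged for a commit, refusing the commit whenever find_secrets returns any hits.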

AI Use Policy: We want this class to reflect what solving problems with ML in the real world looks like, which means different policies depending on 1) where you're working, 2) the data you’re using, and 3) the privacy and confidentiality requirements. For the data we are using in this class, please don't share or upload any confidential data or information to any web-based AI tool. Beyond that, you can use any tool you want. You're accountable for the output and the work you submit. Know that a lot of these models are trained on pretty bad ML code and practices :)

We also want this class to help you understand what the AI tools are good for, where they fall short, and how to use them best to solve real-world problems. So use them, but be skeptical, review and test the output, and be ready to share what you find with others in the class.

tl;dr

Resources

Students with Disabilities: We value inclusion and will work to ensure that all students have the resources they need to fully participate in our course. Please use the Office of Disability Resources’ online system to notify us of any necessary accommodations as early in the semester as possible. If you suspect that you have a disability but are not yet registered with the Office of Disability Resources, you can contact them at access@andrew.cmu.edu.

Health and Wellness: As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. These mental health concerns or stressful events may diminish your academic performance and/or reduce your ability to participate in daily activities. CMU services are available, and treatment does work.

All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is almost always helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

CaPS: 412-268-2922
Re:solve Crisis Network: 888-796-8226
If the situation is life threatening, call the police
On campus: CMU Police: 412-268-2323
Off campus: 911

Discrimination and Harassment: Everyone has a right to feel safe and respected on campus. If you or someone you know has been impacted by sexual harassment, assault, or discrimination, resources are available to help. You can make a report by contacting the University’s Office of Title IX Initiatives by email (tix@andrew.cmu.edu) or phone (412-268-7125).

Confidential reporting services are available through the Counseling and Psychological Services and University Health Center, as well as the Ethics Reporting Hotline at 877-700-7050 or www.reportit.net (user name: tartans; password: plaid).

You can learn more about these options, policies, and resources by visiting the University’s Title IX Office webpage at https://www.cmu.edu/title-ix/index.html

In case of an emergency, contact University Police 412-268-2323 on campus or call 911 off campus.

Student Academic Success Center (SASC): SASC focuses on creating spaces for students to engage in their coursework and approach learning through a variety of group and individual tutoring options. They offer many opportunities for students to deepen their understanding of who they are as learners, communicators, and scholars. Their workshops are free to the CMU community and meet the needs of all disciplines and levels of study. SASC programs to support student learning include the following (program titles link to webpages):