10718: Machine Learning in Practice

Previous Versions: Fall 2023 | Fall 2022 | Fall 2021 | Fall 2020 | Spring 2020

Fall 2025: Tues & Thurs, 2:00-3:20 (POS 153)

Important

All content will be on GitHub in this repo, including the schedule and detailed syllabus
All assignments will be posted on, and submitted through, Canvas
Class communication and announcements will be primarily through email and Piazza

Class Description

This is a project-based course designed to provide students training and experience in solving real-world problems using machine learning, while exploring the interface, translation, and gaps between research and practice.

The goal of this course is to give students exposure to the nuance of using machine learning in the real world, where common assumptions (like iid and stationarity) break down, and to the growing need for (and limitations of) approaches that go beyond optimizing for simple model accuracy measures and address concerns such as fairness, explainability, robustness, and uncertainty quantification. Through project assignments, lectures, discussions, and readings, students will learn about and experience building machine learning systems for real-world problems (using real-world data), as well as applying and evaluating the utility of proposed methods for enhancing the interpretability, uncertainty quantification, causal inference capabilities, robustness, and fairness of machine learning models. Students will develop skills in problem formulation, working with messy (aka real) data, making ML design choices appropriate for the problem at hand, model selection, model interpretability, understanding and mitigating bias & disparities, and evaluating the impact of deployed models in the real world.

Course Learning Objectives

Design and Development: Learn how to design and develop end-to-end ML systems that tackle real-world problems
Understand and Evaluate: the impact of various design choices across the machine learning workflow in the context of real-world problems.
Take real-world questions involving data and evaluate or develop appropriate methods to answer these questions.
Communications: Present technical material clearly, in spoken and written form, to various audiences

People

Instructor

Rayid Ghani
GHC 8023
Office Hours: Tuesday 12:30-1:30pm and Wednesday 4-5pm. Email me if you want to meet outside these hours.

Education Associate

Daniel Bird
Office: GHC 8120

Teaching Assistants

Chancharik Mitra	Namrata Deka	Rohan Venkatesh Kashyap
Office Hours: Tues 5pm and Thurs 11am GHC 8228	Office Hours: Mon 4pm and Fri 10am GHC 8228	Office Hours: Tues 12:30pm and Wed 11am GHC 8228

Grading

Project-related assignments

Project update assignments (30%)
Write-up on module 2 findings (15%)
Group presentations (mid-semester and end of semester) (10%)

Midterm take-home exam (20%)

Final reflection write-up (5%)

Class attendance and participation in discussions (15%)

Weekly check-in and feedback forms (5%)

Schedule

See the detailed syllabus below for more details, including links to required readings and information about group projects, grading, and helpful optional readings.

Week	Dates	Topic	Assignments	Readings
1	Tu: Aug 26	Class Intro and Overview
1	Th: Aug 28	Why ML systems can fail in practice [post-class discussion recap slides]		Individual research: websites, blogs, papers, videos, news articles
2	Tu: Sep 2	Scoping and Defining ML Projects	Individual Assignment: Getting to know the class project (due Tuesday); Project Team Selection	Required: ML Project Scoping Guide; Optional: Listed below
2	Th: Sep 4	Getting, Storing, and Linking Data		Optional: Listed below
3	Tu: Sep 9	Data Exploration	Assignment on Data Exploration (Due) and Team Presentations on Data Exploration
3	Th: Sep 11	Analytical Formulation / Baselines		Required: Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations; Problem Formulation and Fairness
4	Tu: Sep 16	Model Selection Methodology	Project Assignment 1: Formulation and Baseline (due Monday)	Required: Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure Optional: See below
4	Th: Sep 18	Performance Metrics		Required: The Misuse of AUC: What High Impact Risk Assessment Gets Wrong
5	Tu: Sep 23	Feature Engineering and Imputation	Project Assignment 2: Validation setup. Initial pipeline with train and validation set(s) and baseline implemented (due Monday)
5	Th: Sep 25	ML Pipelines (cancelled)
6	Tu: Sep 30	Models/hyperparameters in practice	Project Assignment 3: list of features and some subset implemented (due Monday)
6	Th: Oct 2	Team Presentations and Reviewing Modeling Results
7	Tu: Oct 7	Team Presentations	Project Assignment 4: modeling results (due Monday)
7	Th: Oct 9	No class (time for midterm)	Take-Home Midterm Available
8	Tu: Oct 14	No Class - Mid-semester break
8	Th: Oct 16	No Class - Mid-semester break
9	Tu: Oct 21	Team Presentations and Temporal Model Selection	Updated model results assignment (+ model selection) Due Tuesday
9	Th: Oct 23	ML Ethics Issues Overview		Required: Princeton Ethics Case Study 6: Public Sector Data Analysis
10	Tu: Oct 28	Understanding the Models
10	Th: Oct 30	Fairness in ML		Required: Listed below; Optional: Listed below
11	Tu: Nov 4	No class - Election Day
11	Th: Nov 6	Evaluating ML Systems in the Field
12	Tu: Nov 11	ML and Causal Inference
12	Th: Nov 13	Interpretability	importances + cross tabs assignment due
13	Tu: Nov 18	Uncertainty Quantification
13	Th: Nov 20	Domain Shift / Temporal Drift
14	Tu: Nov 25	ML Ops
14	Th: Nov 27	Thanksgiving holiday
15	Tu: Dec 2	Team Presentations	Module 2 Writeup Due
15	Th: Dec 4	Wrap-Up (and team presentations)
	Finals Week		Final Reflection Writeup Due

Projects and Deliverables

Broadly, the course will be divided into two modules: 1) applied end-to-end machine learning pipelines, and 2) key considerations when building ML systems in practice, such as interpretability, fairness, uncertainty quantification, privacy, and MLOps. Throughout the course, students will work in groups of 4 on an applied project based on a real-world problem to explore the ideas and methods covered in each module in detail. During the project, students will be responsible for several key deliverables:

Throughout the first module (covering applied ML pipelines), groups will submit short project update assignments weekly, and iterate based on feedback from the instructors.
At the end of the first module, there will be a take-home midterm exam focused on the concepts and skills emphasized in this portion of the course.
During the second half, each group will pick one topic (among a few choices listed below and chosen collaboratively with the class), implement it in their project, and present their results (through a short write-up and a team presentation).

More details about the class project

Public schools in the United States face large disparities in funding, often resulting in teachers and staff members filling these gaps by purchasing classroom supplies out of their own pockets. DonorsChoose is an online crowdfunding platform that tries to help alleviate this financial burden on teachers by allowing them to seek funding for projects and resources from the community (projects can include classroom basics like books and markers, larger items like lab equipment or musical instruments, specific experiences like field trips or guest speakers).

Projects on DonorsChoose expire after 4 months, and if the target funding level isn't reached, the project receives no funding. Since its launch in 2000, the platform has helped fund over 2 million projects at schools across the US, but about 1/3 of the projects that are posted nevertheless fail to meet their goal and go unfunded.

The Modeling Problem

For the purposes of the class project, DonorsChoose has hired a digital content expert who will review projects and help teachers improve their postings and increase their chances of reaching their funding threshold. Because this individualized review is a labor-intensive process, the digital content expert has time to review and support only 10% of the projects posted to the platform on a given day.

You are working with DonorsChoose, and your task is to help this content expert focus their limited resources on projects that most need the help. As such, you want to build a model to identify projects that are least likely to be fully funded before they expire and pass them off to the digital content expert for review.
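To make the resource constraint concrete, here is a minimal sketch of how a model's scores might be turned into a daily review list. It is an illustration only, not the required approach for the project: the dictionary keys (`id`, `posted_date`, `risk`), the function names, and the choice to round the 10% budget down are all assumptions, and `risk` stands in for whatever score your model produces (higher meaning less likely to be fully funded).

```python
from collections import defaultdict


def select_for_review(projects, percent=10):
    """Pick the highest-risk projects posted each day, up to `percent` of that day's postings.

    `projects` is a list of dicts with hypothetical keys:
      "id", "posted_date", and "risk" (higher = less likely to be fully funded).
    Returns a dict mapping each posting date to the list of selected project ids.
    """
    by_day = defaultdict(list)
    for project in projects:
        by_day[project["posted_date"]].append(project)

    selected = {}
    for day, day_projects in by_day.items():
        # Integer arithmetic avoids floating-point surprises; the budget rounds down.
        k = len(day_projects) * percent // 100
        ranked = sorted(day_projects, key=lambda p: p["risk"], reverse=True)
        selected[day] = [p["id"] for p in ranked[:k]]
    return selected


def precision_at_k(selected_ids, unfunded_ids):
    """Share of the selected projects that actually went unfunded."""
    if not selected_ids:
        return 0.0
    hits = sum(1 for pid in selected_ids if pid in unfunded_ids)
    return hits / len(selected_ids)


# Toy example: ten projects posted on one day, with made-up risk scores.
projects = [
    {"id": f"p{i}", "posted_date": "2025-09-01", "risk": i / 10}
    for i in range(10)
]
picked = select_for_review(projects)
print(picked)  # {'2025-09-01': ['p9']}
```

Under this reading, a model is judged by how many of the projects it flags truly go unfunded (precision within the top 10%), rather than by overall accuracy. Note that rounding down means a day with fewer than ten postings gets no reviews in this sketch; how to handle such days is a design choice to discuss with your team.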

Data

Download links and data set description

More details about Module 2

Module 2 will involve selecting one (or more) topic that you want to go deeper into once you've built an initial ML pipeline and a set of reasonable (correct and well-performing) models. This will involve exploring key considerations that are critical in real-world ML problems, including interpretability, fairness, uncertainty quantification, robustness, causality, and drift. The assignment in Module 2 will start from the models you've already built and will involve two deliverables:

Short write-up (under 4 pages) based on applying your selected topic to your class project, covering:

The need for this topic in your project (who will be the user and who will be impacted, and why it's important)
What question are you trying to answer?
Which methods within this topic did you choose to try, and why?
How was your implementation experience (easy-to-use package? difficulties in implementation?)
Results - What did you find? Do you know if it worked? Will it help the downstream DonorsChoose team, teachers, or students?
Your recommendations for DonorsChoose based on your work

Short 10-minute team presentation during the last week of class on your findings

Grace Days

Project teams receive 3 total grace days for use on their project deliverables. You may not use more than 1 grace day on any single assignment. We will automatically keep a tally of these grace days for you; they will be applied greedily.
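As an illustration of one way to read "applied greedily", the sketch below walks through a team's deliverables in deadline order and spends a grace day on each late one while any remain. This is an assumed interpretation, not the official tally: the function name and the input format are hypothetical, and it says nothing about how lateness beyond the grace days is handled.

```python
def apply_grace_days(days_late, total_grace=3, max_per_assignment=1):
    """Apply grace days to deliverables in deadline order (assumed reading of the policy).

    `days_late` lists how many days late each deliverable was, in deadline order.
    Returns two lists of the same length: grace days used per deliverable,
    and days still counted as late after grace is applied.
    """
    remaining = total_grace
    grace_used, still_late = [], []
    for late in days_late:
        # Spend as much grace as allowed: capped per assignment and by what is left.
        used = min(late, max_per_assignment, remaining)
        remaining -= used
        grace_used.append(used)
        still_late.append(late - used)
    return grace_used, still_late


# Five deliverables: on time, 2 days late, then three that are 1 day late each.
used, late_after = apply_grace_days([0, 2, 1, 1, 1])
print(used)        # [0, 1, 1, 1, 0]
print(late_after)  # [0, 1, 0, 0, 1]
```

In this example the second deliverable still counts as one day late (only 1 grace day may be used per assignment), and the last one receives no grace because all 3 days have been spent.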

Participation and Missing Days

Attendance in class and participation in class discussions are a large part of 10-718. Throughout the semester, your participation will be measured by your responses in class and via Slido. You are permitted to miss a maximum of 4 lectures and still be considered for full participation credit; missing more than this will begin to reduce your participation grade.

Structure

Below is a preliminary schedule of the course, including the readings that will be assigned for that week. Please be sure to have read and be prepared to discuss the readings before the specified class session. Most of these topics can be (and often are) the focus of entire courses and generally, we’ll only scratch the surface, but hopefully inspire you to delve deeper into areas that interest you (and you’ll find plenty of open research questions in each). Optional readings are also listed for most sessions, which may be of interest to students who wish to delve deeper into a given area, as well as provide additional context for your related project work.

MODULE 1: APPLYING ML TO PRACTICAL PROBLEMS

Tuesday, August 26:

Introduction

We’ll provide an introduction to the class, its goals, and an overview of the applied project we will be using as a motivating example throughout the semester.

Thursday, August 28:

Why ML Systems Can Fail in Practice

We'll discuss real-world failure modes of ML systems, moving beyond model accuracy to system-level issues including data, deployment, governance, incentives, etc. The goal here is to encourage critical thinking about preventing failures and to motivate the topics to be covered during the rest of the semester.

Tuesday, September 2:

ML Project Definition and Scoping

In this session, we’ll talk about scoping, problem definition, and understanding and balancing organizational goals. Before we start doing technical ML work, a decision needs to be made about whether a given problem can and should be addressed with machine learning: is the problem significant, feasible to solve with ML, and of sufficient importance to the organization that they will devote resources to implementing the solution? How will success be measured? How will (often competing) goals of efficiency, effectiveness, and equity be balanced?
