10718: Machine Learning in Practice

Previous Versions: Fall 2023 | Fall 2022 | Fall 2021 | Fall 2020 | Spring 2020

Fall 2025: Tues & Thurs, 2:00-3:20 (POS 153)

Important

Class Description

This is a project-based course designed to provide students training and experience in solving real-world problems using machine learning, while exploring the interface between research and practice.

The goal of this course is to give students exposure to the nuance of using machine learning in the real world, where common assumptions (like iid and stationarity) break down, and to the growing need for (and limitations of) approaches that go beyond optimizing for simple model accuracy measures and explore notions of fairness, explainability, robustness, etc. Through project assignments, lectures, discussions, and readings, students will learn about and experience building machine learning systems for real-world problems and data, as well as applying and evaluating the utility of proposed methods for enhancing the interpretability and fairness of machine learning models. Through the course, students will develop skills in problem formulation, working with messy (aka real) data, making ML design choices appropriate for the problem at hand, model selection, model interpretability, understanding and mitigating bias & disparities, and evaluating the impact of deployed models.

Course Learning Objectives

People

Instructor

Rayid Ghani

GHC 8023
Office Hours:
Tuesday 12:30-1:30pm
Wednesday 3-4pm
Email me if you want to meet outside these hours

Education Associate

Daniel Bird

Office: GHC 8120

Teaching Assistants

Chancharik Mitra

Office Hours: Tues 5pm and Thurs 11am GHC 8228

Namrata Deka

Office Hours: Mon 4pm and Fri 10am GHC 8228

Rohan Venkatesh Kashyap

Office Hours: Tues 12:30pm and Wed 11am GHC 8228

Grading

Project update assignments (30%)

Midterm take-home exam (20%)

Write-up on module 2 findings (10%)

Group presentation (10%)

Final reflection write-up (10%)

Class attendance and participation in discussions (15%)

Submitting weekly check-in and feedback forms (5%)

Schedule

See the detailed syllabus below for more information, including links to required readings and details about group projects, grading, and helpful optional readings.

Week | Dates | Topic | Assignments
1 | Tu: Aug 26 | Class Intro and Overview |
1 | Th: Aug 28 | Why ML systems can fail in practice |
2 | Tu: Sep 2 | Scoping and Defining ML Projects | Individual Assignment: Getting to know the class project (due Tuesday); Project Team Selection
2 | Th: Sep 4 | Getting, Storing, and Linking Data |
3 | Tu: Sep 9 | Data Exploration | Assignment on Data Exploration (due) and Team Presentations on Data Exploration
3 | Th: Sep 11 | Analytical Formulation / Baselines |
4 | Tu: Sep 16 | Model Selection Methodology | Project Assignment 1: Formulation and Baseline (due Monday)
4 | Th: Sep 18 | Performance Metrics |
5 | Tu: Sep 23 | Feature Engineering and Imputation | Project Assignment 2: validation setup; initial pipeline with train and validation set(s) and baseline implemented (due Monday)
5 | Th: Sep 25 | ML Pipelines |
6 | Tu: Sep 30 | Models/hyperparameters in practice | Project Assignment 3: list of features and some subset implemented (due Monday)
6 | Th: Oct 2 | Temporal Model Selection |
7 | Tu: Oct 7 | Module 1 Review: Applied ML - End to End Pipelines | Project Assignment 4: modeling results (due Monday)
7 | Th: Oct 9 | No class (time for midterm) | Take-Home Midterm Available
8 | Tu: Oct 14 | No Class - Mid-semester break |
8 | Th: Oct 16 | No Class - Mid-semester break |
9 | Tu: Oct 21 | Common ML Pitfalls | Add assignment: importances and cross tabs
9 | Th: Oct 23 | ML Ethics Issues Overview | Updated model results assignment (+ model selection) due Monday
10 | Tu: Oct 28 | Understanding the Models | Importances + cross tabs assignment due
10 | Th: Oct 30 | Interpretability |
11 | Tu: Nov 4 | No Class - Election Day |
11 | Th: Nov 6 | Fairness in ML |
12 | Tu: Nov 11 | Fairness in ML |
12 | Th: Nov 13 | ML and Causal Inference |
13 | Tu: Nov 18 | Evaluating ML Systems in the Field |
13 | Th: Nov 20 | ML Ops |
14 | Tu: Nov 25 | Uncertainty Quantification |
14 | Th: Nov 27 | Thanksgiving holiday |
15 | Tu: Dec 2 | TBD | Writeup Due
15 | Th: Dec 4 | Wrap-Up |
Finals Week | | | Final Reflection Writeup Due

Projects and Deliverables

Broadly, the course will be divided into two modules: 1) applied end-to-end machine learning pipelines, and 2) key considerations when building ML systems in practice, such as interpretability, fairness, uncertainty quantification, privacy, and MLOps. Throughout the course, students will work in groups of 4 on an applied project based on a real-world problem to explore the ideas and methods covered in each module in detail. During the project, students will be responsible for several key deliverables:

More details about the class project

Public schools in the United States face large disparities in funding, often resulting in teachers and staff members filling these gaps by purchasing classroom supplies out of their own pockets. DonorsChoose is an online crowdfunding platform that tries to help alleviate this financial burden on teachers by allowing them to seek funding for projects and resources from the community (projects can include classroom basics like books and markers, larger items like lab equipment or musical instruments, specific experiences like field trips or guest speakers).

Projects on DonorsChoose expire after 4 months, and if the target funding level isn't reached, the project receives no funding. Since its launch in 2000, the platform has helped fund over 2 million projects at schools across the US, but about 1/3 of the projects that are posted nevertheless fail to meet their goal and go unfunded.

The Modeling Problem

For the purposes of the class project, DonorsChoose has hired a digital content expert who will review projects and help teachers improve their postings and increase their chances of reaching their funding threshold. Because this individualized review is a labor-intensive process, the digital content expert has time to review and support only 10% of the projects posted to the platform on a given day.

You are working with DonorsChoose, and your task is to help this content expert focus their limited resources on projects that most need the help. As such, you want to build a model to identify projects that are least likely to be fully funded before they expire and pass them off to the digital content expert for review.
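
As a rough illustration of this framing (not the course's prescribed approach), the daily selection step could be sketched as below. It assumes a model has already assigned each project a score where higher means less likely to be fully funded. The function name `select_for_review`, the tuple layout, and the round-down rule for the 10% budget are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def select_for_review(scored_projects, review_fraction=0.10):
    """Pick the highest-risk projects posted each day for expert review.

    scored_projects: iterable of (project_id, posted_date, risk_score) tuples,
    where a higher risk_score means "less likely to be fully funded".
    All names and the data layout here are assumptions for illustration.
    """
    by_day = defaultdict(list)
    for project_id, posted_date, risk_score in scored_projects:
        by_day[posted_date].append((risk_score, project_id))

    selected = {}
    for day, items in by_day.items():
        # Assumed budget rule: round down so the expert is never over capacity.
        k = math.floor(review_fraction * len(items))
        # Sort by descending risk; ties are broken by project_id for determinism.
        ranked = sorted(items, key=lambda pair: (-pair[0], pair[1]))
        selected[day] = [project_id for _, project_id in ranked[:k]]
    return selected
```

Because only the top of each day's ranked list is acted on, one natural way to judge such a model is by how many of the selected projects actually went unfunded, rather than by overall accuracy.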

Data

Download links and data set description

Grace Days

Project teams receive 3 total grace days for use on your project deliverables. You may not use more than 1 grace day on any single assignment. We will automatically keep a tally of these grace days for you; they will be applied greedily.
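
One possible reading of "applied greedily" is sketched below, assuming deliverables are processed in due-date order and each late deliverable takes a grace day whenever one is available. The function `apply_grace_days` and its inputs are hypothetical; the course staff's actual bookkeeping may differ.

```python
def apply_grace_days(days_late, total_grace=3, max_per_assignment=1):
    """Return (grace_used, still_late) lists, one entry per deliverable.

    days_late: whole days late for each deliverable, in due-date order.
    Greedy rule (an assumed interpretation): each late deliverable takes as
    many grace days as allowed until the team's budget runs out.
    """
    remaining_budget = total_grace
    grace_used = []
    still_late = []
    for late in days_late:
        use = min(late, max_per_assignment, remaining_budget)
        remaining_budget -= use
        grace_used.append(use)
        still_late.append(late - use)
    return grace_used, still_late
```

Under this reading, a team cannot save grace days for a later deliverable: the first three late submissions each consume one day.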

Participation and Missing Days

Attendance in class and participation in class discussions are a large part of 10-718. Throughout the semester, your participation will be measured by your responses in class and via Slido. You may miss up to 4 lectures and still be considered for full participation credit; each absence beyond that will reduce your participation grade.

Structure

Below is a preliminary schedule of the course, including the readings that will be assigned for that week. Please be sure to have read and be prepared to discuss the readings before the specified class session. Most of these topics can be (and often are) the focus of entire courses and generally, we’ll only scratch the surface, but hopefully inspire you to delve deeper into areas that interest you (and you’ll find plenty of open research questions in each). Optional readings are also listed for most sessions which may be of interest to students who wish to delve deeper into a given area as well as provide additional context for your related project work.

MODULE 1: APPLYING ML TO PRACTICAL PROBLEMS

MODULE 2: Key Considerations Beyond Model Accuracy

This module will focus on topics such as ethics, interpretability, fairness, robustness, privacy, causality, field trials, uncertainty quantification, and supporting decision-makers. The topics to be covered will be chosen as we go through the semester collaboratively.

More Resources

You may find a number of books useful as general background reading on specific topics covered in class, but these are by no means required texts for the course:

Additionally, the Global Communication Center (GCC) can provide assistance with the written or oral communication assignments in this class. The GCC is a free service, open to all students, and located in Hunt Library. You can learn more on the GCC website: cmu.edu/gcc.

Your Responsibilities

Attendance: Because much of this course is focused on discussion with your classmates, attending each session is important to both your ability to learn from the course and to contribute to what others get out of it as well. As such, you’ll be expected to attend every session and your participation will factor into your grade as described above. Should anything come up that will require you to miss a class (illness, conferences, etc), please let one of the course staff know in advance.

Academic Integrity: Violations of class and university academic integrity policies will not be tolerated. Any instances of copying, cheating, plagiarism, or other academic integrity violations will be reported to your advisor and the dean of students in addition to resulting in an immediate failure of the course.

AI Use Policy: We want this class to reflect what solving problems with ML in the real world looks like, which means different policies depending on 1) where you're working, 2) the data you’re using, and 3) the privacy and confidentiality requirements. For the data we are using in this class, as long as you don't share or upload any confidential information to any AI tool (on the web), you can use any tool you want. You're accountable for the output and the work you submit. Know that a lot of these models are trained on pretty bad ML code and practices :)

We also want this class to help you understand what the AI tools are good for, where they fall short, and how to best use them to solve real-world problems. So use them, but be skeptical, review and test the output, and be ready to share what you find with others in the class.

tl;dr

Resources

Students with Disabilities: We value inclusion and will work to ensure that all students have the resources they need to fully participate in our course. Please use the Office of Disability Resources' online system to notify us of any necessary accommodations as early in the semester as possible. If you suspect that you have a disability but are not yet registered with the Office of Disability Resources, you can contact them at access@andrew.cmu.edu.

Health and Wellness: As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. These mental health concerns or stressful events may diminish your academic performance and/or reduce your ability to participate in daily activities. CMU services are available, and treatment does work.
All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is almost always helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:
CaPS: 412-268-2922
Re:solve Crisis Network: 888-796-8226

If the situation is life-threatening, call the police:
On campus: CMU Police: 412-268-2323
Off campus: 911

Discrimination and Harassment: Everyone has a right to feel safe and respected on campus. If you or someone you know has been impacted by sexual harassment, assault, or discrimination, resources are available to help. You can make a report by contacting the University’s Office of Title IX Initiatives by email (tix@andrew.cmu.edu) or phone (412-268-7125).

Confidential reporting services are available through the Counseling and Psychological Services and University Health Center, as well as the Ethics Reporting Hotline at 877-700-7050 or www.reportit.net (user name: tartans; password: plaid).
You can learn more about these options, policies, and resources by visiting the University’s Title IX Office webpage at https://www.cmu.edu/title-ix/index.html
In case of an emergency, contact University Police 412-268-2323 on campus or call 911 off campus.

Student Academic Success Center (SASC)

SASC focuses on creating spaces for students to engage in their coursework and approach learning through a variety of group and individual tutoring options. They offer many opportunities for students to deepen their understanding of who they are as learners, communicators, and scholars. Their workshops are free to the CMU community and meet the needs of all disciplines and levels of study. SASC programs to support student learning include the following (program titles link to webpages):