Welcome to Aequitas

The Aequitas toolkit is a flexible bias-audit utility for algorithmic decision-making models, accessible via Python API, command line interface (CLI), and through our web application.

Use Aequitas to evaluate model performance across several bias and fairness metrics, and utilize the most relevant metrics to your process in model selection.

Aequitas will help you:

  • Understand where biases exist in your model(s)

  • Compare the level of bias between groups in your sample population (bias disparity)

  • Visualize absolute bias metrics and their related disparities for rapid comprehension and decision-making

Our goal is to support informed and equitable action for both machine learnining practitioners and the decision-makers who rely on them.

Aequitas is compatible with: Python 3.6+

Getting started

You can audit your risk assessment system for two types of biases:

  • Biased actions or interventions that are not allocated in a way that’s representative of the population.

  • Biased outcomes through actions or interventions that are a result of your system being wrong about certain groups of people.

For both audits, you need the following data:

  • Data about the specific attributes (race, gender, age, income, etc.) you want to audit for the the overall population considered for interventions

  • The set of individuals in the above population that your risk assessment system recommended/ selected for intervention or action. It’s important t this set come from the assessments made after the system has been built, and not from the data the machine learning system was “trained” on if you’re using the audit as a factor in model selection.

If you want to audit for biases due to model or system errors, you also need to include actual outcomes (label values) for all individuals in the overall population.

Input data has slightly different requirements depending on whether you are using Aequitas via the webapp, CLI or Python package. In general, input data is a single table with the following columns:

  • score

  • label_value (for error-based metrics only)

  • at least one attribute e.g. race, sex and age_cat (attribute categories defined by user)

Bias measures tailored to your problem

After installing on your computer

Run aequitas-report on COMPAS data:

compas_for_aequitas.csv excerpt:










25 - 45



Native American


Less than 25

aequitas-report --input compas_for_aequitas.csv

Note: Disparites are always defined in relation to a reference group. By default, Aequitas uses the majority group within each attribute as the reference group. Defining a reference group

The Bias Report produces a pdf that returns descriptive interpretation of the results along with three sets of tables.

  • Fairness Measures Results

  • Bias Metrics Results

  • Group Metrics Results

Additionally, a csv is produced that contains the relevant data. More information about output here.

In the command line you will see The Bias Report, which returns counts for each attribute by group and then computes various fairness metrics. This is the same information that is captured in the csv output.

                    ___                    _ __
                   /   | ___  ____ ___  __(_) /_____ ______
                  / /| |/ _ \/ __ `/ / / / / __/ __ `/ ___/
                 / ___ /  __/ /_/ / /_/ / / /_/ /_/ (__  )
                /_/  |_\___/\__, /\__,_/_/\__/\__,_/____/

                      Bias and Fairness Audit Tool

Welcome to Aequitas-Audit
Fairness measures requested: Statistical Parity,Impact Parity,FDR Parity,FPR Parity,FNR Parity,FOR Parity
model_id, score_thresholds 1 {'rank_abs': [3317]}
COUNTS::: race
African-American    3696
Asian                 32
Caucasian           2454
Hispanic             637
Native American       18
Other                377
dtype: int64
COUNTS::: sex
Female    1395
Male      5819
dtype: int64
COUNTS::: age_cat
25 - 45            4109
Greater than 45    1576
Less than 25       1529
dtype: int64
audit: df shape from the crosstabs: (11, 26)
number of rows after bias majority ref group: 11
Any NaN?:  False
bias_df shape: (11, 38)
Fairness Threshold: 0.8
Fairness Measures: ['Statistical Parity', 'Impact Parity', 'FDR Parity', 'FPR Parity', 'FNR Parity', 'FOR Parity']