Aequitas API

src.aequitas.group

class src.aequitas.group.Group[source]
get_crosstabs(df, score_thresholds=None, model_id=1, attr_cols=None)[source]

Creates univariate groups and calculates group metrics.

Parameters
  • df – a dataframe containing, at minimum, the required columns score and label_value.

  • score_thresholds – optional, dictionary of threshold lists keyed by threshold type: { ‘rank_abs’: [], ‘rank_pct’: [], ‘score’: [] }

  • model_id – the model ID on which to subset the df.

  • attr_cols – optional, list of names of columns corresponding to group attributes (i.e., gender, age category, race, etc.).

Returns

A dataframe of group score, label, and error statistics and absolute bias metric values grouped by unique attribute values
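The kind of per-group statistics this method computes can be sketched in pure Python (hypothetical rows following the required schema; an illustration of the metrics, not the library implementation):

```python
# Illustrative sketch only: computes two of the absolute group metrics
# (fpr, fnr) that get_crosstabs() reports, grouped by one attribute.
# The rows mimic the required columns plus one attribute column (race).
rows = [
    {"score": 1, "label_value": 0, "race": "a"},
    {"score": 1, "label_value": 1, "race": "a"},
    {"score": 0, "label_value": 0, "race": "a"},
    {"score": 1, "label_value": 0, "race": "b"},
    {"score": 0, "label_value": 1, "race": "b"},
    {"score": 0, "label_value": 0, "race": "b"},
]

def group_rates(rows, attr):
    # For each unique attribute value, compute false positive rate
    # (fp / labeled negatives) and false negative rate (fn / labeled positives).
    out = {}
    for value in {r[attr] for r in rows}:
        grp = [r for r in rows if r[attr] == value]
        fp = sum(1 for r in grp if r["score"] == 1 and r["label_value"] == 0)
        fn = sum(1 for r in grp if r["score"] == 0 and r["label_value"] == 1)
        negatives = sum(1 for r in grp if r["label_value"] == 0)
        positives = sum(1 for r in grp if r["label_value"] == 1)
        out[value] = {
            "fpr": fp / negatives if negatives else None,
            "fnr": fn / positives if positives else None,
        }
    return out

print(group_rates(rows, "race"))
```

get_crosstabs() produces one row per (attribute, attribute value) pair with these and the other absolute metrics as columns.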

list_absolute_metrics(df)[source]

View list of all calculated absolute bias metrics in df

src.aequitas.bias

class src.aequitas.bias.Bias(key_columns=('model_id', 'score_threshold', 'attribute_name'), sample_df=None, non_attr_cols=('score', 'model_id', 'as_of_date', 'entity_id', 'rank_abs', 'rank_pct', 'id', 'label_value'), input_group_metrics=('ppr', 'pprev', 'precision', 'fdr', 'for', 'fpr', 'fnr', 'tpr', 'tnr', 'npv'), fill_divbyzero=None)[source]
get_disparity_major_group(df, original_df, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True, label_score_ref='fpr')[source]

Calculates disparities between groups for the predefined list of group metrics using the majority group within each attribute as the reference group (denominator).

Parameters
  • df – output dataframe of Group class get_crosstabs() method.

  • original_df – a dataframe of sample features and model results. Includes a required ‘score’ column and optional ‘label_value’ column.

  • key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are ‘model_id’, ‘score_threshold’, ‘attribute_name’.

  • input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.

  • fill_divbyzero – optional, fill value to use when a disparity calculation divides by zero. Default is None.

  • check_significance – measures for which to determine statistical significance beyond label_value and score. Default is all metrics.

  • alpha – statistical significance level to use in significance determination. Default is 5e-2 (0.05).

  • mask_significance – whether to display a T/F mask over calculated p-values from statistical significance determination. Default is True.

  • label_score_ref – metric (e.g., ‘fpr’) whose reference group is also used for score and label_value statistical significance calculations. Default is ‘fpr’.

Returns

A dataframe with same number of rows as the input (crosstab) with additional disparity metrics columns and ref_group_values for each metric.
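The majority-reference logic can be sketched as follows (hypothetical group counts and metric values; a conceptual illustration, not the library code):

```python
# Sketch of majority-group disparity: the largest group within the
# attribute is the reference (denominator) for every metric.
group_sizes = {"a": 700, "b": 300}     # hypothetical group counts
fpr = {"a": 0.125, "b": 0.25}          # hypothetical group fpr values

ref = max(group_sizes, key=group_sizes.get)          # majority group: "a"
fpr_disparity = {g: fpr[g] / fpr[ref] for g in fpr}  # ref disparity is 1.0
print(ref, fpr_disparity)
```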

get_disparity_min_metric(df, original_df, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True, label_score_ref='fpr')[source]

Calculates disparities between groups for the predefined list of group metrics using the group with the minimum value for each absolute bias metric as the reference group (denominator).

Parameters
  • df – output dataframe of Group class get_crosstabs() method.

  • original_df – a dataframe of sample features and model results. Includes a required ‘score’ column and optional ‘label_value’ column.

  • key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are ‘model_id’, ‘score_threshold’, ‘attribute_name’.

  • input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.

  • fill_divbyzero – optional, fill value to use when a disparity calculation divides by zero. Default is None.

  • check_significance – measures for which to determine statistical significance beyond label_value and score. Default is all metrics.

  • alpha – statistical significance level to use in significance determination. Default is 5e-2 (0.05).

  • mask_significance – whether to display a T/F mask over calculated p-values from statistical significance determination. Default is True.

  • label_score_ref – metric (e.g., ‘fpr’) whose reference group is also used for score and label_value statistical significance calculations. Default is ‘fpr’.

Returns

A dataframe with same number of rows as the input (crosstab) with additional disparity metrics columns and ref_group_values for each metric.
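Unlike the majority-group variant, the min-metric reference can differ for each metric, as this sketch illustrates (hypothetical values; not the library code):

```python
# Sketch of min-metric reference selection: for each metric, the group
# with the smallest value becomes the reference, so every disparity >= 1.
metrics = {
    "fpr": {"a": 0.10, "b": 0.30, "c": 0.20},   # hypothetical values
    "fnr": {"a": 0.40, "b": 0.10, "c": 0.20},
}
disparities = {}
for name, by_group in metrics.items():
    ref = min(by_group, key=by_group.get)       # may differ per metric
    disparities[name] = {g: v / by_group[ref] for g, v in by_group.items()}
print(disparities)
```

Here the fpr reference is group "a" while the fnr reference is group "b".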

get_disparity_predefined_groups(df, original_df, ref_groups_dict, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True)[source]

Calculates disparities between groups for the predefined list of group metrics using a predefined reference group (denominator) value for each attribute.

Parameters
  • df – output dataframe of Group class get_crosstabs() method.

  • original_df – dataframe of sample features and model results. Includes a required ‘score’ column and optional ‘label_value’ column.

  • ref_groups_dict – dictionary of format: {‘attribute_name’: ‘attribute_value’, …}

  • key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are ‘model_id’, ‘score_threshold’, ‘attribute_name’.

  • input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.

  • fill_divbyzero – optional, fill value to use when a disparity calculation divides by zero. Default is None.

  • check_significance – measures for which to determine statistical significance beyond label_value and score. Default is all metrics.

  • alpha – statistical significance level to use in significance determination. Default is 5e-2 (0.05).

  • mask_significance – whether to display a T/F mask over calculated p-values from statistical significance determination. Default is True.

Returns

A dataframe with same number of rows as the input (crosstab) with additional disparity metrics columns and ref_group_values for each metric.
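A sketch of the predefined-reference calculation (hypothetical attributes and values; ref_groups_dict follows the format documented above):

```python
# Sketch of predefined reference groups: the caller fixes the denominator
# group for each attribute via ref_groups_dict (hypothetical values).
ref_groups_dict = {"race": "a", "sex": "m"}
fpr = {
    "race": {"a": 0.20, "b": 0.30},
    "sex": {"m": 0.10, "f": 0.40},
}
fpr_disparity = {
    attr: {g: v / groups[ref_groups_dict[attr]] for g, v in groups.items()}
    for attr, groups in fpr.items()
}
print(fpr_disparity)
```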

list_absolute_metrics(df)[source]

View list of all calculated absolute bias metrics in df

list_disparities(df)[source]

View list of all calculated disparities in df

list_significance(df)[source]

View list of all calculated statistical significance values in df

src.aequitas.fairness

class src.aequitas.fairness.Fairness(fair_eval=None, tau=None, fair_measures_depend=None, type_parity_depend=None, high_level_fairness_depend=None)[source]
get_fairness_measures_supported(input_df)[source]

Determine fairness measures supported based on columns in data frame.

get_group_attribute_fairness(group_value_df, fair_measures_requested=None)[source]

Determines attribute-level fairness: for each fairness measure in fair_measures_requested, if any attribute_value within a group attribute_name has a ‘False’ determination, the determination for that attribute is False for the given fairness measure.

Parameters

group_value_df – output dataframe of get_group_value_fairness() method

Returns

A dataframe of fairness measures at the attribute level (no attribute_values)

get_group_value_fairness(bias_df, tau=None, fair_measures_requested=None)[source]

Calculates the fairness measures defined in fair_measures_requested dictionary and adds them as columns to the input bias_df.

Parameters
  • bias_df – the output dataframe from the bias/disparity calculation methods.

  • tau – optional, the threshold for the fair/unfair evaluation.

  • fair_measures_requested – optional, a dictionary containing fairness measures as keys and the corresponding input bias disparity as values.

Returns

The bias_df dataframe with additional columns for each of the fairness measures defined in the fair_measures_requested dictionary
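The underlying parity check can be sketched as follows, assuming a tau rule in the style of the "80% rule" (a disparity is deemed fair when it falls within [tau, 1/tau]; treat the exact rule as an assumption of this sketch):

```python
# Sketch of a tau-based parity check (assumed rule, not the library code):
# a disparity ratio is fair when it lies within [tau, 1 / tau].
tau = 0.8

def is_fair(disparity, tau=tau):
    return tau <= disparity <= 1 / tau

print(is_fair(1.1), is_fair(2.5))
```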

get_overall_fairness(group_attribute_df)[source]

Calculates overall fairness regardless of the group_attributes. Searches for ‘False’ parity determinations across group_attributes and outputs ‘True’ determination if all group_attributes are fair.

Parameters

group_attribute_df – the output df of the get_group_attribute_fairness() method

Returns

A dictionary of overall, unsupervised, and supervised fairness determinations
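The aggregation step can be sketched simply (hypothetical attribute-level determinations):

```python
# Sketch of get_overall_fairness-style aggregation: the overall
# determination is True only if every attribute-level determination is True.
attribute_fairness = {"race": True, "sex": False}   # hypothetical
overall = all(attribute_fairness.values())
print(overall)
```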

list_parities(df)[source]

View list of all parity determinations in df

src.aequitas.plotting

class src.aequitas.plotting.Plot(key_metrics=('pprev', 'ppr', 'fdr', 'for', 'fpr', 'fnr'), key_disparities=('pprev_disparity', 'ppr_disparity', 'fdr_disparity', 'for_disparity', 'fpr_disparity', 'fnr_disparity'))[source]

Plotting object allows for visualization of absolute group bias metrics and relative disparities calculated by Aequitas Group(), Bias(), and Fairness() class instances.

plot_disparity(disparity_table, group_metric, attribute_name, color_mapping=None, ax=None, fig=None, label_dict=None, title=True, highlight_fairness=False, min_group_size=None, significance_alpha=0.05)[source]

Create treemap based on a single bias disparity metric across attribute groups.

Adapted from https://plot.ly/python/treemaps/, https://gist.github.com/gVallverdu/0b446d0061a785c808dbe79262a37eea, and https://fcpython.com/visualisation/python-treemaps-squarify-matplotlib

Parameters
  • disparity_table – a disparity table. Output of bias.get_disparity or fairness.get_fairness function.

  • group_metric – the metric to plot. Must be a column in the disparity_table.

  • attribute_name – which attribute to plot group_metric across.

  • color_mapping – matplotlib colormapping for treemap value boxes.

  • ax – a matplotlib Axis. If not passed, a new figure will be created.

  • fig – a matplotlib Figure. If not passed, a new figure will be created.

  • label_dict – optional, dictionary of replacement labels for data. Default is None.

  • title – whether to include a title in visualizations. Default is True.

  • highlight_fairness – whether to highlight treemaps by disparity magnitude, or by related fairness determination.

  • min_group_size – minimum proportion of total group size (all data) a population group must meet in order to be included in bias metric visualization

  • significance_alpha – statistical significance level. Used to determine visual representation of significance (number of asterisks on treemap).

Returns

A Matplotlib axis
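The min_group_size filter shared by the plotting methods can be sketched as follows (hypothetical group counts; a conceptual illustration, not the library code):

```python
# Sketch of min_group_size filtering: groups whose share of all rows
# falls below the threshold proportion are dropped from the plot.
group_sizes = {"a": 800, "b": 150, "c": 50}   # hypothetical counts
total = sum(group_sizes.values())
min_group_size = 0.10
kept = [g for g, n in group_sizes.items() if n / total >= min_group_size]
print(kept)
```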

plot_disparity_all(data_table, attributes=None, metrics=None, fillzeros=True, title=True, label_dict=None, ncols=3, show_figure=True, min_group_size=None, significance_alpha=0.05)[source]

Plot multiple disparity metrics at once from a disparity or fairness table.

Parameters
  • data_table – output of group.get_crosstabs, bias.get_disparity, or fairness.get_fairness functions.

  • attributes – which attribute(s) to plot metrics for. If this value is null, will plot metrics against all attributes.

  • metrics

    which metric(s) to plot, or ‘all.’ If this value is null, will plot:

    • Predicted Prevalence Disparity (pprev_disparity),

    • Predicted Positive Rate Disparity (ppr_disparity),

    • False Discovery Rate Disparity (fdr_disparity),

    • False Omission Rate Disparity (for_disparity),

    • False Positive Rate Disparity (fpr_disparity),

    • False Negative Rate Disparity (fnr_disparity)

  • fillzeros – whether to fill null values with zeros. Default is True.

  • title – whether to display a title on each plot. Default is True.

  • label_dict – optional dictionary of label replacements. Default is None.

  • ncols – number of subplots per row in figure. Default is 3.

  • show_figure – whether to show figure (plt.show()). Default is True.

  • min_group_size – minimum proportion of total group size (all data) a population group must meet in order to be included in metric visualization.

  • significance_alpha – statistical significance level. Used to determine visual representation of significance (number of asterisks on treemap).

Returns

A Matplotlib figure

plot_fairness_disparity(fairness_table, group_metric, attribute_name, ax=None, fig=None, title=True, min_group_size=None, significance_alpha=0.05)[source]

Plot a single disparity metric across attribute groups, colored based on the calculated fairness determination.

Parameters
  • fairness_table – a fairness table. Output of fairness.get_fairness function.

  • group_metric – the metric to plot. Must be a column in the fairness_table.

  • attribute_name – which attribute to plot group_metric across.

  • ax – a matplotlib Axis. If not passed, a new figure will be created.

  • fig – a matplotlib Figure. If not passed, a new figure will be created.

  • title – whether to include a title in visualizations. Default is True.

  • min_group_size – minimum proportion of total group size (all data) a population group must meet in order to be included in bias metric visualization

  • significance_alpha – statistical significance level. Used to determine visual representation of significance (number of asterisks on treemap).

Returns

A Matplotlib axis

plot_fairness_disparity_all(fairness_table, attributes=None, metrics=None, fillzeros=True, title=True, label_dict=None, show_figure=True, min_group_size=None, significance_alpha=0.05)[source]

Plot multiple metrics at once from a fairness object table.

Parameters
  • fairness_table – output of fairness.get_fairness functions.

  • attributes – which attribute(s) to plot metrics for. If this value is null, will plot metrics against all attributes.

  • metrics

    which metric(s) to plot, or ‘all.’ If this value is null, will plot:

    • Predicted Prevalence Disparity (pprev_disparity),

    • Predicted Positive Rate Disparity (ppr_disparity),

    • False Discovery Rate Disparity (fdr_disparity),

    • False Omission Rate Disparity (for_disparity),

    • False Positive Rate Disparity (fpr_disparity),

    • False Negative Rate Disparity (fnr_disparity)

  • fillzeros – whether to fill null values with zeros. Default is True.

  • title – whether to display a title on each plot. Default is True.

  • label_dict – optional dictionary of label replacements. Default is None.

  • show_figure – whether to show figure (plt.show()). Default is True.

  • min_group_size – minimum proportion of total group size (all data) a population group must meet in order to be included in fairness visualization

  • significance_alpha – statistical significance level. Used to determine visual representation of significance (number of asterisks on treemap)

Returns

A Matplotlib figure

plot_fairness_group(fairness_table, group_metric, ax=None, ax_lim=None, title=False, label_dict=None, min_group_size=None)[source]

Plot absolute group metrics as indicated by the config file, colored based on calculated parity.

Parameters
  • fairness_table – a fairness table. Output of fairness.get_fairness function.

  • group_metric – the fairness metric to plot. Must be a column in the fairness_table.

  • ax – a matplotlib Axis. If not passed a new figure will be created.

  • ax_lim – maximum value on x-axis, used to match axes across subplots when plotting multiple metrics. Default is None.

  • title – whether to include a title in visualizations. Default is False.

  • label_dict – optional dictionary of replacement values for data. Default is None.

  • min_group_size – minimum proportion of total group size (all data) a population group must meet in order to be included in fairness visualization

Returns

A Matplotlib axis

plot_fairness_group_all(fairness_table, metrics=None, fillzeros=True, ncols=3, title=True, label_dict=None, show_figure=True, min_group_size=None)[source]

Plot multiple metrics at once from a fairness object table.

Parameters
  • fairness_table – output of fairness.get_fairness functions.

  • metrics

    which metric(s) to plot, or ‘all.’ If this value is null, will plot:

    • Predicted Prevalence (pprev),

    • Predicted Positive Rate (ppr),

    • False Discovery Rate (fdr),

    • False Omission Rate (for),

    • False Positive Rate (fpr),

    • False Negative Rate (fnr)

  • fillzeros – whether to fill null values with zeros. Default is True.

  • ncols – number of subplots per row in figure. Default is 3.

  • title – whether to display a title on each plot. Default is True.

  • label_dict – optional dictionary of label replacements. Default is None.

  • show_figure – whether to show figure (plt.show()). Default is True.

  • min_group_size – minimum proportion of total group size (all data) a population group must meet in order to be included in fairness visualization.

Returns

A Matplotlib figure

plot_group_metric(group_table, group_metric, ax=None, ax_lim=None, title=True, label_dict=None, min_group_size=None)[source]

Plot a single group metric across all attribute groups.

Parameters
  • group_table – group table. Output of group.get_crosstabs() or bias.get_disparity functions.

  • group_metric – the metric to plot. Must be a column in the group_table.

  • ax – a matplotlib Axis. If not passed, a new figure will be created.

  • title – whether to include a title in visualizations. Default is True.

  • label_dict – optional, dictionary of replacement labels for data. Default is None.

  • min_group_size – minimum size for groups to include in visualization (as a proportion of total sample)

Returns

A Matplotlib axis

plot_group_metric_all(data_table, metrics=None, fillzeros=True, ncols=3, title=True, label_dict=None, show_figure=True, min_group_size=None)[source]

Plot multiple group metrics at once from a group, bias, or fairness table.

Parameters
  • data_table – output of group.get_crosstabs, bias.get_disparity, or fairness.get_fairness functions.

  • metrics

    which metric(s) to plot, or ‘all.’ If this value is null, will plot:

    • Predicted Prevalence (pprev),

    • Predicted Positive Rate (ppr),

    • False Discovery Rate (fdr),

    • False Omission Rate (for),

    • False Positive Rate (fpr),

    • False Negative Rate (fnr)

  • fillzeros – whether to fill null values with zeros. Default is True.

  • ncols – number of subplots per row in figure. Default is 3.

  • title – whether to display a title on each plot. Default is True.

  • label_dict – optional dictionary of label replacements. Default is None.

  • show_figure – whether to show figure (plt.show()). Default is True.

  • min_group_size – minimum proportion of total group size (all data) a population group must meet in order to be included in group metric visualization.

Returns

A Matplotlib figure

src.aequitas.plotting.assemble_ref_groups(disparities_table, ref_group_flag='_ref_group_value', specific_measures=None, label_score_ref=None)[source]

Creates a dictionary of reference groups for each metric in a data_table.

Parameters
  • disparities_table – a disparity table. Output of bias.get_disparity or fairness.get_fairness functions.

  • ref_group_flag – string suffix marking the columns that hold reference group values. Default is ‘_ref_group_value’.

  • specific_measures – optional, limits the reference dictionary to only the specified metrics in a data table. Default is None.

  • label_score_ref – optional, a metric, e.g. ‘fpr’ (false positive rate), whose reference group is mimicked for label_value and score. Used for statistical significance calculations in the Bias() class. Default is None.

Returns

A dictionary of reference groups for each attribute and metric.
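The shape of the returned dictionary can be sketched by hand from a hypothetical disparity-table row with ‘_ref_group_value’ columns (an illustration of the structure, not the library implementation):

```python
# Sketch of the dictionary assemble_ref_groups() produces: attribute name
# mapped to {metric: reference group value}, read off the flag columns.
row = {
    "attribute_name": "race",
    "fpr_ref_group_value": "a",     # hypothetical flag columns
    "fnr_ref_group_value": "b",
}
ref_group_flag = "_ref_group_value"
ref_groups = {
    row["attribute_name"]: {
        col[: -len(ref_group_flag)]: val    # strip the suffix -> metric name
        for col, val in row.items()
        if col.endswith(ref_group_flag)
    }
}
print(ref_groups)
```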

src.aequitas.preprocessing

src.aequitas.preprocessing.check_required_cols(df, required_cols)[source]
Parameters
  • df – A data frame of model results

  • required_cols – Column names required for selected fairness measures

Returns

None if all required columns are present; otherwise raises a ValueError.

src.aequitas.preprocessing.discretize(df, target_cols)[source]
Parameters
  • df – A data frame of model results

  • target_cols – Names of columns to discretize

Returns

A data frame
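A rough sketch of the idea behind discretization, bucketing continuous values into quartile-style labels (the actual bin edges and labels used by discretize() may differ):

```python
# Sketch of discretization: map continuous values to coarse quartile-style
# category labels so they can be treated as group attributes.
values = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical continuous column

def quartile_label(v, values):
    # Rough quartile cut points taken from the sorted sample.
    ranked = sorted(values)
    q1 = ranked[len(ranked) // 4]
    q2 = ranked[len(ranked) // 2]
    q3 = ranked[3 * len(ranked) // 4]
    if v <= q1:
        return "low"
    if v <= q2:
        return "mid-low"
    if v <= q3:
        return "mid-high"
    return "high"

print([quartile_label(v, values) for v in values])
```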

src.aequitas.preprocessing.get_attr_cols(df, non_attr_cols)[source]
Parameters
  • df – A data frame of model results

  • non_attr_cols – Names of columns not associated with attributes

Returns

List of columns associated with sample attributes
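The column inference can be sketched as follows (hypothetical column list; an illustration of the set difference, not the library code):

```python
# Sketch of attribute-column inference: every column not listed in
# non_attr_cols is treated as a group attribute.
columns = ["score", "label_value", "model_id", "race", "sex", "age_cat"]
non_attr_cols = ("score", "model_id", "label_value")
attr_cols = [c for c in columns if c not in non_attr_cols]
print(attr_cols)
```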

src.aequitas.preprocessing.preprocess_input_df(df, required_cols=None)[source]
Parameters
  • df – A data frame of model results

  • required_cols – Names of columns required for bias calculations. Default is None.

Returns

A data frame, list of columns associated with sample attributes