Aequitas API
src.aequitas.group

class src.aequitas.group.Group

get_crosstabs(df, score_thresholds=None, model_id=1, attr_cols=None)
Creates univariate groups and calculates group metrics.
- Parameters
df – a dataframe containing the required columns score and label_value.
score_thresholds – optional, dictionary of threshold lists in the format {'rank_abs': [], 'rank_pct': [], 'score': []}.
model_id – the model ID on which to subset the df.
attr_cols – optional, list of names of columns corresponding to group attributes (e.g., gender, age category, race).
- Returns
A dataframe of group score, label, and error statistics, and absolute bias metric values, grouped by unique attribute values.
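A minimal sketch of a typical get_crosstabs() call, assuming the package is importable as aequitas (the src. prefix above reflects the repository layout) and a hypothetical scored dataframe with attribute columns race and sex:

```python
import pandas as pd
from aequitas.group import Group

# Hypothetical scored data: binary scores, true labels, and group attributes.
df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 1],
    "label_value": [1, 0, 0, 1, 1, 1],
    "race":        ["white", "black", "black", "white", "black", "white"],
    "sex":         ["M", "F", "M", "F", "F", "M"],
})

g = Group()
# Released versions return the crosstab together with the list of attribute
# columns that were used; adjust the unpacking if your version returns only
# the dataframe described above.
xtab, attr_cols = g.get_crosstabs(df, attr_cols=["race", "sex"])
print(xtab[["attribute_name", "attribute_value", "fpr", "fnr", "ppr", "pprev"]])
```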
src.aequitas.bias

class src.aequitas.bias.Bias(key_columns=('model_id', 'score_threshold', 'attribute_name'), sample_df=None, non_attr_cols=('score', 'model_id', 'as_of_date', 'entity_id', 'rank_abs', 'rank_pct', 'id', 'label_value'), input_group_metrics=('ppr', 'pprev', 'precision', 'fdr', 'for', 'fpr', 'fnr', 'tpr', 'tnr', 'npv'), fill_divbyzero=None)

get_disparity_major_group(df, original_df, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True, label_score_ref='fpr')
Calculates disparities between groups for the predefined list of group metrics, using the majority group within each attribute as the reference group (denominator).
- Parameters
df – output dataframe of the Group class get_crosstabs() method.
original_df – a dataframe of sample features and model results. Must include a 'score' column; may include a 'label_value' column.
key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are 'model_id', 'score_threshold', and 'attribute_name'.
input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.
fill_divbyzero – optional, fill value to use in place of division-by-zero results. Default is None.
check_significance – measures for which to determine statistical significance, beyond label_value and score. Default is all metrics.
alpha – statistical significance level to use in the significance determination. Default is 0.05.
mask_significance – whether to display a True/False mask over the calculated p-values from the statistical significance determination. Default is True.
label_score_ref – default reference group to use for score and label_value statistical significance calculations.
- Returns
A dataframe with the same number of rows as the input (crosstab), with additional disparity metric columns and ref_group_value columns for each metric.
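A brief sketch continuing from the get_crosstabs() example above, with xtab and the original scored dataframe df in scope; on toy data this small, significance calculations may produce NaNs or warnings:

```python
from aequitas.bias import Bias

b = Bias()
# Use the largest group within each attribute as the reference (denominator).
bdf_major = b.get_disparity_major_group(xtab, original_df=df, mask_significance=True)
print(bdf_major[["attribute_name", "attribute_value",
                 "fpr_disparity", "fpr_ref_group_value"]])
```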
get_disparity_min_metric(df, original_df, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True, label_score_ref='fpr')
Calculates disparities between groups for the predefined list of group metrics, using the group with the minimum value for each absolute bias metric as the reference group (denominator).
- Parameters
df – output dataframe of the Group class get_crosstabs() method.
original_df – a dataframe of sample features and model results. Must include a 'score' column; may include a 'label_value' column.
key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are 'model_id', 'score_threshold', and 'attribute_name'.
input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.
fill_divbyzero – optional, fill value to use in place of division-by-zero results. Default is None.
check_significance – measures for which to determine statistical significance, beyond label_value and score. Default is all metrics.
alpha – statistical significance level to use in the significance determination. Default is 0.05.
mask_significance – whether to display a True/False mask over the calculated p-values from the statistical significance determination. Default is True.
label_score_ref – default reference group to use for score and label_value statistical significance calculations.
- Returns
A dataframe with the same number of rows as the input (crosstab), with additional disparity metric columns and ref_group_value columns for each metric.
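The call mirrors get_disparity_major_group(); a sketch continuing the previous example, with b, xtab, and df in scope:

```python
# The reference group per metric is the group with the lowest value of that
# metric, so every resulting disparity ratio is >= 1 by construction.
bdf_min = b.get_disparity_min_metric(xtab, original_df=df)
```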
get_disparity_predefined_groups(df, original_df, ref_groups_dict, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True)
Calculates disparities between groups for the predefined list of group metrics, using a predefined reference group (denominator) value for each attribute.
- Parameters
df – output dataframe of the Group class get_crosstabs() method.
original_df – a dataframe of sample features and model results. Must include a 'score' column; may include a 'label_value' column.
ref_groups_dict – dictionary in the format {'attribute_name': 'attribute_value', …}.
key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are 'model_id', 'score_threshold', and 'attribute_name'.
input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.
fill_divbyzero – optional, fill value to use in place of division-by-zero results. Default is None.
check_significance – measures for which to determine statistical significance, beyond label_value and score. Default is all metrics.
alpha – statistical significance level to use in the significance determination. Default is 0.05.
mask_significance – whether to display a True/False mask over the calculated p-values from the statistical significance determination. Default is True.
- Returns
A dataframe with the same number of rows as the input (crosstab), with additional disparity metric columns and ref_group_value columns for each metric.
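A sketch with explicitly chosen reference groups, again using the hypothetical xtab and df from the get_crosstabs() example (the attribute values are illustrative):

```python
# One reference value per attribute; disparities are ratios against these groups.
ref_groups = {"race": "white", "sex": "M"}
bdf = b.get_disparity_predefined_groups(
    xtab,
    original_df=df,
    ref_groups_dict=ref_groups,
    alpha=0.05,
    mask_significance=True,
)
print(bdf[["attribute_name", "attribute_value", "ppr_disparity", "fdr_disparity"]])
```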
src.aequitas.fairness

class src.aequitas.fairness.Fairness(fair_eval=None, tau=None, fair_measures_depend=None, type_parity_depend=None, high_level_fairness_depend=None)

get_fairness_measures_supported(input_df)
Determines which fairness measures are supported, based on the columns present in the input dataframe.
get_group_attribute_fairness(group_value_df, fair_measures_requested=None)
Determines, for each fairness measure in fair_measures_requested, whether any attribute_value within a group attribute_name has a 'False' determination; if so, the determination for that attribute is False for that fairness measure.
- Parameters
group_value_df – output dataframe of the get_group_value_fairness() method.
- Returns
A dataframe of fairness measures at the attribute level (no attribute_values).
get_group_value_fairness(bias_df, tau=None, fair_measures_requested=None)
Calculates the fairness measures defined in the fair_measures_requested dictionary and adds them as columns to the input bias_df.
- Parameters
bias_df – the output dataframe of the bias/disparity calculation methods.
tau – optional, the threshold for the fair/unfair evaluation.
fair_measures_requested – optional, a dictionary containing fairness measures as keys and the corresponding input bias disparities as values.
- Returns
The bias_df dataframe with additional columns for each of the fairness measures defined in the fair_measures dictionary.
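A sketch that feeds the disparity table from the Bias examples above into the Fairness class (bdf from the predefined-groups example):

```python
from aequitas.fairness import Fairness

f = Fairness()
# Adds boolean parity columns computed against the default tau threshold;
# the exact column names added depend on the library version, so they are
# simply listed here.
fdf = f.get_group_value_fairness(bdf)
print(fdf.columns.tolist())
```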
get_overall_fairness(group_attribute_df)
Calculates overall fairness regardless of the group attributes: searches for 'False' parity determinations across group attributes and outputs a 'True' determination only if all group attributes are fair.
- Parameters
group_attribute_df – the output dataframe of the get_group_attribute_fairness() method.
- Returns
A dictionary of overall, unsupervised, and supervised fairness determinations.
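Continuing the sketch, rolling the group-value results up to attribute-level and overall determinations (fdf and f from the previous example):

```python
# Attribute-level determinations: one row per attribute_name.
gaf = f.get_group_attribute_fairness(fdf)

# Overall determinations: a plain dictionary covering overall, unsupervised,
# and supervised fairness (see the Returns description above).
gof = f.get_overall_fairness(gaf)
print(gof)
```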
src.aequitas.plotting

class src.aequitas.plotting.Plot(key_metrics=('pprev', 'ppr', 'fdr', 'for', 'fpr', 'fnr'), key_disparities=('pprev_disparity', 'ppr_disparity', 'fdr_disparity', 'for_disparity', 'fpr_disparity', 'fnr_disparity'))
Plotting object that allows for visualization of the absolute group bias metrics and relative disparities calculated by Aequitas Group(), Bias(), and Fairness() class instances.

plot_disparity(disparity_table, group_metric, attribute_name, color_mapping=None, ax=None, fig=None, label_dict=None, title=True, highlight_fairness=False, min_group_size=None, significance_alpha=0.05)
Creates a treemap based on a single bias disparity metric across attribute groups.
Adapted from https://plot.ly/python/treemaps/, https://gist.github.com/gVallverdu/0b446d0061a785c808dbe79262a37eea, and https://fcpython.com/visualisation/python-treemaps-squarify-matplotlib
- Parameters
disparity_table – a disparity table. Output of the bias.get_disparity or fairness.get_fairness methods.
group_metric – the metric to plot. Must be a column in the disparity_table.
attribute_name – which attribute to plot group_metric across.
color_mapping – matplotlib color mapping for the treemap value boxes.
ax – a matplotlib Axis. If not passed, a new figure will be created.
fig – a matplotlib Figure. If not passed, a new figure will be created.
label_dict – optional, dictionary of replacement labels for data. Default is None.
title – whether to include a title in visualizations. Default is True.
highlight_fairness – whether to highlight the treemap by the related fairness determination rather than by disparity magnitude. Default is False.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the bias metric visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib axis.
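A sketch of a single-metric treemap, assuming the hypothetical disparity table bdf and the 'race' attribute from the earlier examples:

```python
import matplotlib.pyplot as plt
from aequitas.plotting import Plot

aqp = Plot()
# Treemap of false positive rate disparity across the 'race' groups.
ax = aqp.plot_disparity(bdf, group_metric="fpr_disparity", attribute_name="race")
plt.show()
```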
plot_disparity_all(data_table, attributes=None, metrics=None, fillzeros=True, title=True, label_dict=None, ncols=3, show_figure=True, min_group_size=None, significance_alpha=0.05)
Plot multiple metrics at once from a fairness object table.
- Parameters
data_table – output of the group.get_crosstabs, bias.get_disparity, or fairness.get_fairness methods.
attributes – which attribute(s) to plot metrics for. If None, plots metrics for all attributes.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence Disparity (pprev_disparity),
Predicted Positive Rate Disparity (ppr_disparity),
False Discovery Rate Disparity (fdr_disparity),
False Omission Rate Disparity (for_disparity),
False Positive Rate Disparity (fpr_disparity),
False Negative Rate Disparity (fnr_disparity)
fillzeros – whether to fill null values with zeros. Default is True.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
ncols – number of subplots per row in the figure. Default is 3.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the metric visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib figure.
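A sketch plotting several disparity treemaps in one figure, again using the hypothetical bdf table and the aqp plotting object from above:

```python
# One treemap per attribute/metric combination; passing metrics=None instead
# falls back to the six default disparity metrics listed above.
fig = aqp.plot_disparity_all(bdf, attributes=["race"],
                             metrics=["fpr_disparity", "fnr_disparity"])
```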
plot_fairness_disparity(fairness_table, group_metric, attribute_name, ax=None, fig=None, title=True, min_group_size=None, significance_alpha=0.05)
Plots a single disparity metric for one attribute, colored based on the calculated parity determination.
- Parameters
fairness_table – a fairness table. Output of the fairness.get_fairness method.
group_metric – the metric to plot. Must be a column in the fairness_table.
attribute_name – which attribute to plot group_metric across.
ax – a matplotlib Axis. If not passed, a new figure will be created.
fig – a matplotlib Figure. If not passed, a new figure will be created.
title – whether to include a title in visualizations. Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the bias metric visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib axis.
plot_fairness_disparity_all(fairness_table, attributes=None, metrics=None, fillzeros=True, title=True, label_dict=None, show_figure=True, min_group_size=None, significance_alpha=0.05)
Plot multiple metrics at once from a fairness object table.
- Parameters
fairness_table – output of the fairness.get_fairness method.
attributes – which attribute(s) to plot metrics for. If None, plots metrics for all attributes.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence Disparity (pprev_disparity),
Predicted Positive Rate Disparity (ppr_disparity),
False Discovery Rate Disparity (fdr_disparity),
False Omission Rate Disparity (for_disparity),
False Positive Rate Disparity (fpr_disparity),
False Negative Rate Disparity (fnr_disparity)
fillzeros – whether to fill null values with zeros. Default is True.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the fairness visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib figure.
plot_fairness_group(fairness_table, group_metric, ax=None, ax_lim=None, title=False, label_dict=None, min_group_size=None)
This function plots absolute group metrics as indicated by the config file, colored based on calculated parity.
- Parameters
fairness_table – a fairness table. Output of the fairness.get_fairness method.
group_metric – the fairness metric to plot. Must be a column in the fairness_table.
ax – a matplotlib Axis. If not passed, a new figure will be created.
ax_lim – maximum value on the x-axis, used to match axes across subplots when plotting multiple metrics. Default is None.
title – whether to include a title in visualizations. Default is False.
label_dict – optional, dictionary of replacement values for data. Default is None.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the fairness visualization.
- Returns
A Matplotlib axis.
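A sketch plotting one absolute metric colored by its parity determination, assuming the fairness table fdf from the Fairness examples and the aqp and plt objects from the plotting sketch above:

```python
# Bar chart of the group-level false positive rate; coloring reflects the
# calculated parity determination for each group.
ax = aqp.plot_fairness_group(fdf, group_metric="fpr", title=True)
plt.show()
```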
plot_fairness_group_all(fairness_table, metrics=None, fillzeros=True, ncols=3, title=True, label_dict=None, show_figure=True, min_group_size=None)
Plot multiple metrics at once from a fairness object table.
- Parameters
fairness_table – output of the fairness.get_fairness method.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence (pprev),
Predicted Positive Rate (ppr),
False Discovery Rate (fdr),
False Omission Rate (for),
False Positive Rate (fpr),
False Negative Rate (fnr)
fillzeros – whether to fill null values with zeros. Default is True.
ncols – number of subplots per row in the figure. Default is 3.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the fairness visualization.
- Returns
A Matplotlib figure.
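A sketch of the grid variant, plotting every supported absolute metric from the hypothetical fdf table at once:

```python
# metrics="all" plots every supported metric; ncols controls the grid width.
fig = aqp.plot_fairness_group_all(fdf, metrics="all", ncols=3)
```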
plot_group_metric(group_table, group_metric, ax=None, ax_lim=None, title=True, label_dict=None, min_group_size=None)
Plot a single group metric across all attribute groups.
- Parameters
group_table – a group table. Output of the group.get_crosstabs() or bias.get_disparity methods.
group_metric – the metric to plot. Must be a column in the group_table.
ax – a matplotlib Axis. If not passed, a new figure will be created.
ax_lim – maximum value on the x-axis, used to match axes across subplots when plotting multiple metrics. Default is None.
title – whether to include a title in visualizations. Default is True.
label_dict – optional, dictionary of replacement labels for data. Default is None.
min_group_size – minimum size a group must meet to be included in the visualization (as a proportion of the total sample).
- Returns
A Matplotlib axis.
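A sketch plotting one absolute group metric straight from the crosstab xtab produced by get_crosstabs() above, using the same aqp and plt objects:

```python
# Plot predicted positive rate for every attribute group; groups smaller than
# 5% of the data are excluded via min_group_size.
ax = aqp.plot_group_metric(xtab, group_metric="ppr", min_group_size=0.05)
plt.show()
```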
plot_group_metric_all(data_table, metrics=None, fillzeros=True, ncols=3, title=True, label_dict=None, show_figure=True, min_group_size=None)
Plot multiple metrics at once from a fairness object table.
- Parameters
data_table – output of the group.get_crosstabs, bias.get_disparity, or fairness.get_fairness methods.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence (pprev),
Predicted Positive Rate (ppr),
False Discovery Rate (fdr),
False Omission Rate (for),
False Positive Rate (fpr),
False Negative Rate (fnr)
fillzeros – whether to fill null values with zeros. Default is True.
ncols – number of subplots per row in the figure. Default is 3.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the group metric visualization.
- Returns
A Matplotlib figure.
src.aequitas.plotting.assemble_ref_groups(disparities_table, ref_group_flag='_ref_group_value', specific_measures=None, label_score_ref=None)
Creates a dictionary of reference groups for each metric in a data table.
- Parameters
disparities_table – a disparity table. Output of the bias.get_disparity or fairness.get_fairness methods.
ref_group_flag – string suffix identifying the columns that hold reference group values. Default is '_ref_group_value'.
specific_measures – optional, limits the reference dictionary to only the specified metrics in the data table. Default is None.
label_score_ref – optional, a metric, e.g. 'fpr' (false positive rate), whose reference group is mimicked for label_value and score. Used for statistical significance calculations in the Bias() class. Default is None.
- Returns
A dictionary of reference groups for each metric.
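A sketch of this helper on the hypothetical bdf disparity table from the Bias examples; the exact structure of the returned dictionary may vary by version, so it is simply printed here:

```python
from aequitas.plotting import assemble_ref_groups

# Collect, for each metric of interest, the reference groups recorded in the
# *_ref_group_value columns of the disparity table.
ref_groups = assemble_ref_groups(bdf, specific_measures=["fpr", "fnr"])
print(ref_groups)
```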
src.aequitas.preprocessing

src.aequitas.preprocessing.check_required_cols(df, required_cols)
Checks that the dataframe contains the columns required for the selected fairness measures.
- Parameters
df – a dataframe of model results.
required_cols – column names required for the selected fairness measures.
- Returns
None; raises a ValueError if required columns are missing.
src.aequitas.preprocessing.discretize(df, target_cols)
Discretizes the values of the target columns into categorical bins.
- Parameters
df – a dataframe of model results.
target_cols – names of the columns to discretize.
- Returns
A dataframe with the target columns discretized.
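A brief sketch, assuming a hypothetical continuous 'age' column that should be binned before being treated as a group attribute by get_crosstabs():

```python
import pandas as pd
from aequitas.preprocessing import discretize

# Hypothetical frame with a continuous attribute to be binned before use.
raw = pd.DataFrame({"score": [1, 0, 1, 0], "label_value": [1, 0, 0, 1],
                    "age": [23, 35, 47, 61]})
binned = discretize(raw, target_cols=["age"])
print(binned["age"].unique())
```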