Aequitas API
src.aequitas.group

class src.aequitas.group.Group

get_crosstabs(df, score_thresholds=None, model_id=1, attr_cols=None)
Creates univariate groups and calculates group metrics.
- Parameters
df – a dataframe containing the required columns score and label_value.
score_thresholds – optional, dictionary of threshold lists in the format {'rank_abs': [], 'rank_pct': [], 'score': []}.
model_id – the model ID on which to subset the df.
attr_cols – optional, list of names of columns corresponding to group attributes (e.g., gender, age category, race).
- Returns
A dataframe of group score, label, and error statistics, and absolute bias metric values, grouped by unique attribute values.
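A minimal sketch of a typical get_crosstabs() call, assuming the package is importable as aequitas (the src. prefix above reflects the repository layout) and a hypothetical scored dataframe with attribute columns race and sex:

```python
import pandas as pd
from aequitas.group import Group

# Hypothetical scored data: binary scores, true labels, and group attributes.
df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 1],
    "label_value": [1, 0, 0, 1, 1, 1],
    "race":        ["white", "black", "black", "white", "black", "white"],
    "sex":         ["M", "F", "M", "F", "F", "M"],
})

g = Group()
# Released versions return the crosstab together with the list of attribute
# columns that were used; adjust the unpacking if your version returns only
# the dataframe described above.
xtab, attr_cols = g.get_crosstabs(df, attr_cols=["race", "sex"])
print(xtab[["attribute_name", "attribute_value", "fpr", "fnr", "ppr", "pprev"]])
```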
src.aequitas.bias

class src.aequitas.bias.Bias(key_columns=('model_id', 'score_threshold', 'attribute_name'), sample_df=None, non_attr_cols=('score', 'model_id', 'as_of_date', 'entity_id', 'rank_abs', 'rank_pct', 'id', 'label_value'), input_group_metrics=('ppr', 'pprev', 'precision', 'fdr', 'for', 'fpr', 'fnr', 'tpr', 'tnr', 'npv'), fill_divbyzero=None)

get_disparity_major_group(df, original_df, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True, label_score_ref='fpr')
Calculates disparities between groups for the predefined list of group metrics, using the majority group within each attribute as the reference group (denominator).
- Parameters
df – output dataframe of the Group class get_crosstabs() method.
original_df – a dataframe of sample features and model results. Must include a 'score' column; may include a 'label_value' column.
key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are 'model_id', 'score_threshold', and 'attribute_name'.
input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.
fill_divbyzero – optional, fill value to use in place of division-by-zero results. Default is None.
check_significance – measures for which to determine statistical significance, beyond label_value and score. Default is all metrics.
alpha – statistical significance level to use in the significance determination. Default is 0.05.
mask_significance – whether to display a True/False mask over the calculated p-values from the statistical significance determination. Default is True.
label_score_ref – default reference group to use for score and label_value statistical significance calculations.
- Returns
A dataframe with the same number of rows as the input (crosstab), with additional disparity metric columns and ref_group_value columns for each metric.
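A brief sketch continuing from the get_crosstabs() example above, with xtab and the original scored dataframe df in scope; on toy data this small, significance calculations may produce NaNs or warnings:

```python
from aequitas.bias import Bias

b = Bias()
# Use the largest group within each attribute as the reference (denominator).
bdf_major = b.get_disparity_major_group(xtab, original_df=df, mask_significance=True)
print(bdf_major[["attribute_name", "attribute_value",
                 "fpr_disparity", "fpr_ref_group_value"]])
```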
get_disparity_min_metric(df, original_df, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True, label_score_ref='fpr')
Calculates disparities between groups for the predefined list of group metrics, using the group with the minimum value for each absolute bias metric as the reference group (denominator).
- Parameters
df – output dataframe of the Group class get_crosstabs() method.
original_df – a dataframe of sample features and model results. Must include a 'score' column; may include a 'label_value' column.
key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are 'model_id', 'score_threshold', and 'attribute_name'.
input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.
fill_divbyzero – optional, fill value to use in place of division-by-zero results. Default is None.
check_significance – measures for which to determine statistical significance, beyond label_value and score. Default is all metrics.
alpha – statistical significance level to use in the significance determination. Default is 0.05.
mask_significance – whether to display a True/False mask over the calculated p-values from the statistical significance determination. Default is True.
label_score_ref – default reference group to use for score and label_value statistical significance calculations.
- Returns
A dataframe with the same number of rows as the input (crosstab), with additional disparity metric columns and ref_group_value columns for each metric.
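The call mirrors get_disparity_major_group(); a sketch continuing the previous example, with b, xtab, and df in scope:

```python
# The reference group per metric is the group with the lowest value of that
# metric, so every resulting disparity ratio is >= 1 by construction.
bdf_min = b.get_disparity_min_metric(xtab, original_df=df)
```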
get_disparity_predefined_groups(df, original_df, ref_groups_dict, key_columns=None, input_group_metrics=None, fill_divbyzero=None, check_significance=None, alpha=0.05, mask_significance=True)
Calculates disparities between groups for the predefined list of group metrics, using a predefined reference group (denominator) value for each attribute.
- Parameters
df – output dataframe of the Group class get_crosstabs() method.
original_df – a dataframe of sample features and model results. Must include a 'score' column; may include a 'label_value' column.
ref_groups_dict – dictionary in the format {'attribute_name': 'attribute_value', …}.
key_columns – optional, key identifying columns for grouping variables and bias metrics in intermediate joins. Defaults are 'model_id', 'score_threshold', and 'attribute_name'.
input_group_metrics – optional, list of columns corresponding to the group metrics for which to calculate disparity values.
fill_divbyzero – optional, fill value to use in place of division-by-zero results. Default is None.
check_significance – measures for which to determine statistical significance, beyond label_value and score. Default is all metrics.
alpha – statistical significance level to use in the significance determination. Default is 0.05.
mask_significance – whether to display a True/False mask over the calculated p-values from the statistical significance determination. Default is True.
- Returns
A dataframe with the same number of rows as the input (crosstab), with additional disparity metric columns and ref_group_value columns for each metric.
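A sketch with explicitly chosen reference groups, again using the hypothetical xtab and df from the get_crosstabs() example (the attribute values are illustrative):

```python
# One reference value per attribute; disparities are ratios against these groups.
ref_groups = {"race": "white", "sex": "M"}
bdf = b.get_disparity_predefined_groups(
    xtab,
    original_df=df,
    ref_groups_dict=ref_groups,
    alpha=0.05,
    mask_significance=True,
)
print(bdf[["attribute_name", "attribute_value", "ppr_disparity", "fdr_disparity"]])
```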
src.aequitas.fairness

class src.aequitas.fairness.Fairness(fair_eval=None, tau=None, fair_measures_depend=None, type_parity_depend=None, high_level_fairness_depend=None)

get_fairness_measures_supported(input_df)
Determines which fairness measures are supported, based on the columns present in the input dataframe.
get_group_attribute_fairness(group_value_df, fair_measures_requested=None)
Determines, for each fairness measure in fair_measures_requested, whether any attribute_value within a group attribute_name has a 'False' determination; if so, the determination for that attribute is False for that fairness measure.
- Parameters
group_value_df – output dataframe of the get_group_value_fairness() method.
- Returns
A dataframe of fairness measures at the attribute level (no attribute_values).
get_group_value_fairness(bias_df, tau=None, fair_measures_requested=None)
Calculates the fairness measures defined in the fair_measures_requested dictionary and adds them as columns to the input bias_df.
- Parameters
bias_df – the output dataframe of the bias/disparity calculation methods.
tau – optional, the threshold for the fair/unfair evaluation.
fair_measures_requested – optional, a dictionary containing fairness measures as keys and the corresponding input bias disparities as values.
- Returns
The bias_df dataframe with additional columns for each of the fairness measures defined in the fair_measures dictionary.
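A sketch that feeds the disparity table from the Bias examples above into the Fairness class (bdf from the predefined-groups example):

```python
from aequitas.fairness import Fairness

f = Fairness()
# Adds boolean parity columns computed against the default tau threshold;
# the exact column names added depend on the library version, so they are
# simply listed here.
fdf = f.get_group_value_fairness(bdf)
print(fdf.columns.tolist())
```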
get_overall_fairness(group_attribute_df)
Calculates overall fairness regardless of the group attributes: searches for 'False' parity determinations across group attributes and outputs a 'True' determination only if all group attributes are fair.
- Parameters
group_attribute_df – the output dataframe of the get_group_attribute_fairness() method.
- Returns
A dictionary of overall, unsupervised, and supervised fairness determinations.
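Continuing the sketch, rolling the group-value results up to attribute-level and overall determinations (fdf and f from the previous example):

```python
# Attribute-level determinations: one row per attribute_name.
gaf = f.get_group_attribute_fairness(fdf)

# Overall determinations: a plain dictionary covering overall, unsupervised,
# and supervised fairness (see the Returns description above).
gof = f.get_overall_fairness(gaf)
print(gof)
```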
src.aequitas.plotting

class src.aequitas.plotting.Plot(key_metrics=('pprev', 'ppr', 'fdr', 'for', 'fpr', 'fnr'), key_disparities=('pprev_disparity', 'ppr_disparity', 'fdr_disparity', 'for_disparity', 'fpr_disparity', 'fnr_disparity'))
Plotting object that allows for visualization of the absolute group bias metrics and relative disparities calculated by Aequitas Group(), Bias(), and Fairness() class instances.

plot_disparity(disparity_table, group_metric, attribute_name, color_mapping=None, ax=None, fig=None, label_dict=None, title=True, highlight_fairness=False, min_group_size=None, significance_alpha=0.05)
Creates a treemap based on a single bias disparity metric across attribute groups.
Adapted from https://plot.ly/python/treemaps/, https://gist.github.com/gVallverdu/0b446d0061a785c808dbe79262a37eea, and https://fcpython.com/visualisation/python-treemaps-squarify-matplotlib
- Parameters
disparity_table – a disparity table. Output of the bias.get_disparity or fairness.get_fairness methods.
group_metric – the metric to plot. Must be a column in the disparity_table.
attribute_name – which attribute to plot group_metric across.
color_mapping – matplotlib color mapping for the treemap value boxes.
ax – a matplotlib Axis. If not passed, a new figure will be created.
fig – a matplotlib Figure. If not passed, a new figure will be created.
label_dict – optional, dictionary of replacement labels for data. Default is None.
title – whether to include a title in visualizations. Default is True.
highlight_fairness – whether to highlight the treemap by the related fairness determination rather than by disparity magnitude. Default is False.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the bias metric visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib axis.
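A sketch of a single-metric treemap, assuming the hypothetical disparity table bdf and the 'race' attribute from the earlier examples:

```python
import matplotlib.pyplot as plt
from aequitas.plotting import Plot

aqp = Plot()
# Treemap of false positive rate disparity across the 'race' groups.
ax = aqp.plot_disparity(bdf, group_metric="fpr_disparity", attribute_name="race")
plt.show()
```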
plot_disparity_all(data_table, attributes=None, metrics=None, fillzeros=True, title=True, label_dict=None, ncols=3, show_figure=True, min_group_size=None, significance_alpha=0.05)
Plot multiple metrics at once from a fairness object table.
- Parameters
data_table – output of the group.get_crosstabs, bias.get_disparity, or fairness.get_fairness methods.
attributes – which attribute(s) to plot metrics for. If None, plots metrics for all attributes.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence Disparity (pprev_disparity),
Predicted Positive Rate Disparity (ppr_disparity),
False Discovery Rate Disparity (fdr_disparity),
False Omission Rate Disparity (for_disparity),
False Positive Rate Disparity (fpr_disparity),
False Negative Rate Disparity (fnr_disparity)
fillzeros – whether to fill null values with zeros. Default is True.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
ncols – number of subplots per row in the figure. Default is 3.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the metric visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib figure.
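A sketch plotting several disparity treemaps in one figure, again using the hypothetical bdf table and the aqp plotting object from above:

```python
# One treemap per attribute/metric combination; passing metrics=None instead
# falls back to the six default disparity metrics listed above.
fig = aqp.plot_disparity_all(bdf, attributes=["race"],
                             metrics=["fpr_disparity", "fnr_disparity"])
```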
plot_fairness_disparity(fairness_table, group_metric, attribute_name, ax=None, fig=None, title=True, min_group_size=None, significance_alpha=0.05)
Plots a single disparity metric for one attribute, colored based on the calculated parity determination.
- Parameters
fairness_table – a fairness table. Output of the fairness.get_fairness method.
group_metric – the metric to plot. Must be a column in the fairness_table.
attribute_name – which attribute to plot group_metric across.
ax – a matplotlib Axis. If not passed, a new figure will be created.
fig – a matplotlib Figure. If not passed, a new figure will be created.
title – whether to include a title in visualizations. Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the bias metric visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib axis.
plot_fairness_disparity_all(fairness_table, attributes=None, metrics=None, fillzeros=True, title=True, label_dict=None, show_figure=True, min_group_size=None, significance_alpha=0.05)
Plot multiple metrics at once from a fairness object table.
- Parameters
fairness_table – output of the fairness.get_fairness method.
attributes – which attribute(s) to plot metrics for. If None, plots metrics for all attributes.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence Disparity (pprev_disparity),
Predicted Positive Rate Disparity (ppr_disparity),
False Discovery Rate Disparity (fdr_disparity),
False Omission Rate Disparity (for_disparity),
False Positive Rate Disparity (fpr_disparity),
False Negative Rate Disparity (fnr_disparity)
fillzeros – whether to fill null values with zeros. Default is True.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the fairness visualization.
significance_alpha – statistical significance level. Used to determine the visual representation of significance (number of asterisks on the treemap).
- Returns
A Matplotlib figure.
plot_fairness_group(fairness_table, group_metric, ax=None, ax_lim=None, title=False, label_dict=None, min_group_size=None)
This function plots absolute group metrics as indicated by the config file, colored based on calculated parity.
- Parameters
fairness_table – a fairness table. Output of the fairness.get_fairness method.
group_metric – the fairness metric to plot. Must be a column in the fairness_table.
ax – a matplotlib Axis. If not passed, a new figure will be created.
ax_lim – maximum value on the x-axis, used to match axes across subplots when plotting multiple metrics. Default is None.
title – whether to include a title in visualizations. Default is False.
label_dict – optional, dictionary of replacement values for data. Default is None.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the fairness visualization.
- Returns
A Matplotlib axis.
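A sketch plotting one absolute metric colored by its parity determination, assuming the fairness table fdf from the Fairness examples and the aqp and plt objects from the plotting sketch above:

```python
# Bar chart of the group-level false positive rate; coloring reflects the
# calculated parity determination for each group.
ax = aqp.plot_fairness_group(fdf, group_metric="fpr", title=True)
plt.show()
```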
plot_fairness_group_all(fairness_table, metrics=None, fillzeros=True, ncols=3, title=True, label_dict=None, show_figure=True, min_group_size=None)
Plot multiple metrics at once from a fairness object table.
- Parameters
fairness_table – output of the fairness.get_fairness method.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence (pprev),
Predicted Positive Rate (ppr),
False Discovery Rate (fdr),
False Omission Rate (for),
False Positive Rate (fpr),
False Negative Rate (fnr)
fillzeros – whether to fill null values with zeros. Default is True.
ncols – number of subplots per row in the figure. Default is 3.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the fairness visualization.
- Returns
A Matplotlib figure.
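A sketch of the grid variant, plotting every supported absolute metric from the hypothetical fdf table at once:

```python
# metrics="all" plots every supported metric; ncols controls the grid width.
fig = aqp.plot_fairness_group_all(fdf, metrics="all", ncols=3)
```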
plot_group_metric(group_table, group_metric, ax=None, ax_lim=None, title=True, label_dict=None, min_group_size=None)
Plot a single group metric across all attribute groups.
- Parameters
group_table – a group table. Output of the group.get_crosstabs() or bias.get_disparity methods.
group_metric – the metric to plot. Must be a column in the group_table.
ax – a matplotlib Axis. If not passed, a new figure will be created.
ax_lim – maximum value on the x-axis, used to match axes across subplots when plotting multiple metrics. Default is None.
title – whether to include a title in visualizations. Default is True.
label_dict – optional, dictionary of replacement labels for data. Default is None.
min_group_size – minimum size a group must meet to be included in the visualization (as a proportion of the total sample).
- Returns
A Matplotlib axis.
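A sketch plotting one absolute group metric straight from the crosstab xtab produced by get_crosstabs() above, using the same aqp and plt objects:

```python
# Plot predicted positive rate for every attribute group; groups smaller than
# 5% of the data are excluded via min_group_size.
ax = aqp.plot_group_metric(xtab, group_metric="ppr", min_group_size=0.05)
plt.show()
```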
plot_group_metric_all(data_table, metrics=None, fillzeros=True, ncols=3, title=True, label_dict=None, show_figure=True, min_group_size=None)
Plot multiple metrics at once from a fairness object table.
- Parameters
data_table – output of the group.get_crosstabs, bias.get_disparity, or fairness.get_fairness methods.
metrics – which metric(s) to plot, or 'all'. If None, plots:
Predicted Prevalence (pprev),
Predicted Positive Rate (ppr),
False Discovery Rate (fdr),
False Omission Rate (for),
False Positive Rate (fpr),
False Negative Rate (fnr)
fillzeros – whether to fill null values with zeros. Default is True.
ncols – number of subplots per row in the figure. Default is 3.
title – whether to display a title on each plot. Default is True.
label_dict – optional, dictionary of label replacements. Default is None.
show_figure – whether to show the figure (plt.show()). Default is True.
min_group_size – minimum proportion of the total group size (all data) a population group must meet in order to be included in the group metric visualization.
- Returns
A Matplotlib figure.
src.aequitas.plotting.assemble_ref_groups(disparities_table, ref_group_flag='_ref_group_value', specific_measures=None, label_score_ref=None)
Creates a dictionary of reference groups for each metric in a data table.
- Parameters
disparities_table – a disparity table. Output of the bias.get_disparity or fairness.get_fairness methods.
ref_group_flag – string suffix identifying the columns that hold reference group values. Default is '_ref_group_value'.
specific_measures – optional, limits the reference dictionary to only the specified metrics in the data table. Default is None.
label_score_ref – optional, a metric, e.g. 'fpr' (false positive rate), whose reference group is mimicked for label_value and score. Used for statistical significance calculations in the Bias() class. Default is None.
- Returns
A dictionary of reference groups for each metric.
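A sketch of this helper on the hypothetical bdf disparity table from the Bias examples; the exact structure of the returned dictionary may vary by version, so it is simply printed here:

```python
from aequitas.plotting import assemble_ref_groups

# Collect, for each metric of interest, the reference groups recorded in the
# *_ref_group_value columns of the disparity table.
ref_groups = assemble_ref_groups(bdf, specific_measures=["fpr", "fnr"])
print(ref_groups)
```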
src.aequitas.preprocessing

src.aequitas.preprocessing.check_required_cols(df, required_cols)
Checks that the dataframe contains the columns required for the selected fairness measures.
- Parameters
df – a dataframe of model results.
required_cols – column names required for the selected fairness measures.
- Returns
None; raises a ValueError if required columns are missing.
src.aequitas.preprocessing.discretize(df, target_cols)
Discretizes the values of the target columns into categorical bins.
- Parameters
df – a dataframe of model results.
target_cols – names of the columns to discretize.
- Returns
A dataframe with the target columns discretized.
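A brief sketch, assuming a hypothetical continuous 'age' column that should be binned before being treated as a group attribute by get_crosstabs():

```python
import pandas as pd
from aequitas.preprocessing import discretize

# Hypothetical frame with a continuous attribute to be binned before use.
raw = pd.DataFrame({"score": [1, 0, 1, 0], "label_value": [1, 0, 0, 1],
                    "age": [23, 35, 47, 61]})
binned = discretize(raw, target_cols=["age"])
print(binned["age"].unique())
```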