Adversarial debiasing is an in-processing technique that learns a classifier to maximize prediction accuracy while simultaneously reducing an adversary's ability to determine the protected attribute from the predictions.
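The technique follows Zhang et al. (2018), "Mitigating Unwanted Biases with Adversarial Learning": a predictor with weights W minimizes its loss L_P while an adversary tries to recover the protected attribute from the predictions, incurring loss L_A. A sketch of the predictor's gradient update as given in that paper (assuming the implementation matches the paper; the weight \(\alpha\) corresponds to the adversary_loss_weight argument below):

\[ \nabla_W L_P \;-\; \operatorname{proj}_{\nabla_W L_A} \nabla_W L_P \;-\; \alpha\, \nabla_W L_A \]

The projection term removes the component of the predictor's gradient that would help the adversary, and the last term pushes the predictor toward increasing the adversary's loss.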

Usage

adversarial_debiasing(
  unprivileged_groups,
  privileged_groups,
  scope_name = "current",
  sess = tf$compat$v1$Session(),
  seed = NULL,
  adversary_loss_weight = 0.1,
  num_epochs = 50L,
  batch_size = 128L,
  classifier_num_hidden_units = 200L,
  debias = TRUE
)

Arguments

unprivileged_groups

A list with two values: the name of the protected attribute column and the value that codes the unprivileged group (see the example after this list).

privileged_groups

A list with two values: the name of the protected attribute column and the value that codes the privileged group.

scope_name

Scope name for the TensorFlow variables.

sess

A TensorFlow session.

seed

Seed to make predict repeatable. If not NULL, must be an integer.

adversary_loss_weight

Hyperparameter that controls the strength of the adversarial loss (the weight \(\alpha\) in the update sketch above).

num_epochs

Number of training epochs. Must be an integer.

batch_size

Batch size. Must be an integer.

classifier_num_hidden_units

Number of hidden units in the classifier model. Must be an integer.

debias

Whether to learn the classifier with debiasing (TRUE) or without it (FALSE).
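For instance, for a dataset whose protected attribute column is named "race", with 1 coding the privileged group and 0 the unprivileged group (the column name and coding here are illustrative and depend on your dataset):

p <- list("race", 1)  # privileged:   race == 1
u <- list("race", 0)  # unprivileged: race == 0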

Examples

if (FALSE) {
load_aif360_lib()
ad <- adult_dataset()

# Protected attribute "race": 1 codes the privileged group, 0 the unprivileged.
p <- list("race", 1)
u <- list("race", 0)

sess <- tf$compat$v1$Session()

# Classifier trained against the adversary (debias = TRUE).
debiased_model <- adversarial_debiasing(privileged_groups = p,
                                        unprivileged_groups = u,
                                        scope_name = "debiased_classifier",
                                        debias = TRUE,
                                        sess = sess)

debiased_model$fit(ad)
ad_debiased <- debiased_model$predict(ad)
}
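A common follow-up, as in AIF360's Python examples, is to also train a baseline classifier without the adversary in the same session and compare its predictions against the debiased ones using a fairness metric. A minimal sketch, reusing the objects from the example above; the scope name "plain_classifier" is an arbitrary choice that merely has to differ from the one used above so the TensorFlow variables do not collide:

if (FALSE) {
# Baseline: same architecture, no adversary.
plain_model <- adversarial_debiasing(privileged_groups = p,
                                     unprivileged_groups = u,
                                     scope_name = "plain_classifier",
                                     debias = FALSE,
                                     sess = sess)

plain_model$fit(ad)
ad_nodebiasing <- plain_model$predict(ad)

# ad_debiased and ad_nodebiasing can now be compared with a
# fairness metric of your choice.
}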