alibi.explainers.cem module¶
-
class alibi.explainers.cem.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]¶
Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin
-
__init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]¶
Initialize the contrastive explanation method (CEM). Paper: https://arxiv.org/abs/1802.07623
- Parameters
  - predict (Union[Callable, tensorflow.keras.Model]) – Keras or TensorFlow model, or any other model's prediction function returning class probabilities.
  - mode (str) – Find pertinent negatives ('PN') or pertinent positives ('PP').
  - shape (tuple) – Shape of input data, starting with batch size.
  - kappa (float) – Confidence parameter for the attack loss term.
  - beta (float) – Regularization constant for the L1 loss term.
  - feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges.
  - gamma (float) – Regularization constant for the optional auto-encoder loss term.
  - ae_model (tensorflow.keras.Model) – Optional auto-encoder model used for loss regularization.
  - learning_rate_init (float) – Initial learning rate of the optimizer.
  - max_iterations (int) – Maximum number of iterations for finding a PN or PP.
  - c_init (float) – Initial value to scale the attack loss term.
  - c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.
  - eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x nb of prediction categories) and for eps[1] it should be (1 x nb of features).
  - clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph.
  - update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations.
  - no_info_val (Union[float, ndarray]) – Global or feature-wise value considered as containing no information.
  - write_dir (str) – Directory to write TensorBoard files to.
  - sess (tensorflow.compat.v1.Session) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally.
- Return type
  None
-
attack(X, Y, verbose=False)[source]¶
Find a pertinent negative or pertinent positive for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).
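FISTA handles the beta-weighted L1 term via a shrinkage-thresholding (soft-thresholding) projection of the perturbation. A minimal numpy sketch of that projection, assuming the paper's formulation; the function name is illustrative, not alibi's internal API:

```python
import numpy as np

def soft_threshold(delta, beta):
    """Elementwise shrinkage-thresholding step used by (F)ISTA for an L1
    penalty: shifts each perturbation component toward zero by beta and
    zeroes out components smaller than beta in magnitude."""
    return np.sign(delta) * np.maximum(np.abs(delta) - beta, 0.0)

# With beta=0.1, components below 0.1 in magnitude vanish; the rest shrink.
delta = np.array([0.5, -0.05, 0.2, -0.3])
sparse_delta = soft_threshold(delta, beta=0.1)
```

This projection is what makes the resulting perturbations sparse, which is why a larger beta yields sparser PNs and PPs.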
-
explain(X, Y=None, verbose=False)[source]¶
Explain an instance and return the PP or PN with metadata.
- Returns
  explanation – Dictionary containing the PP or PN with additional metadata.
-
get_gradients(X, Y)[source]¶
Compute numerical gradients of the attack loss term: dL/dx = (dL/dP) * (dP/dx), with L = loss_attack_s, P = predict and x = adv_s.
- Parameters
  - X (ndarray) – Instance around which the gradient is evaluated.
  - Y (ndarray) – One-hot representation of the instance labels.
- Return type
  ndarray
- Returns
  Array with gradients.
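The chain rule dL/dx = (dL/dP) * (dP/dx) can be approximated with central differences, using eps[0] for the dL/dP factor and eps[1] for dP/dx, as described for the eps parameter above. A minimal numpy sketch under those assumptions; names and signatures are illustrative, not alibi's implementation:

```python
import numpy as np

def num_grad(predict, loss, x, y, eps=(1e-3, 1e-3)):
    """Numerical chain rule dL/dx = (dL/dp) @ (dp/dx) via central differences.
    eps[0] perturbs the prediction probabilities (for dL/dp), eps[1] perturbs
    the features (for dp/dx)."""
    p = predict(x)                       # class probabilities, shape (nb classes,)
    # dL/dp: central difference over each prediction category
    dLdp = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = eps[0]
        dLdp[i] = (loss(p + e, y) - loss(p - e, y)) / (2 * eps[0])
    # dp/dx: central difference over each feature
    dpdx = np.zeros((p.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps[1]
        dpdx[:, j] = (predict(x + e) - predict(x - e)) / (2 * eps[1])
    return dLdp @ dpdx                   # gradient w.r.t. x, shape (nb features,)
```

Each gradient evaluation costs 2 * (nb categories + nb features) calls, which is why update_num_grad lets you recompute it only every few iterations.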
-
loss_fn(pred_proba, Y)[source]¶
Compute the attack loss.
- Parameters
  - pred_proba (ndarray) – Prediction probabilities of an instance.
  - Y (ndarray) – One-hot representation of the instance labels.
- Return type
  ndarray
- Returns
  Loss of the attack.
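For intuition: in the paper, the attack loss is a hinge on the margin between the original class score and the best competing class score, widened by the confidence parameter kappa. A sketch of that single-instance formulation in numpy; illustrative only, as alibi's method operates on batched arrays:

```python
import numpy as np

def attack_loss(pred_proba, y, kappa=0.0, mode='PN'):
    """Hinge-style attack loss from the CEM paper (single instance).
    y is the one-hot label; kappa widens the required confidence margin."""
    p_orig = pred_proba[y.astype(bool)].item()    # score of the original class
    p_other = pred_proba[~y.astype(bool)].max()   # best score among other classes
    if mode == 'PN':
        # pertinent negative: loss is zero once some other class wins by kappa
        return max(p_orig - p_other + kappa, 0.0)
    # 'PP', pertinent positive: loss is zero once the original class wins by kappa
    return max(p_other - p_orig + kappa, 0.0)
```

The constant c (c_init, adjusted over c_steps) scales this term against the L1/L2 and auto-encoder regularizers in the overall objective.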