alibi.explainers.cem module¶
-
class alibi.explainers.cem.CEM(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]¶
Bases: alibi.api.interfaces.Explainer, alibi.api.interfaces.FitMixin
-
__init__(predict, mode, shape, kappa=0.0, beta=0.1, feature_range=(-10000000000.0, 10000000000.0), gamma=0.0, ae_model=None, learning_rate_init=0.01, max_iterations=1000, c_init=10.0, c_steps=10, eps=(0.001, 0.001), clip=(-100.0, 100.0), update_num_grad=1, no_info_val=None, write_dir=None, sess=None)[source]¶
Initialize the contrastive explanation method (CEM). Paper: https://arxiv.org/abs/1802.07623
- Parameters
  - predict (Union[Callable, tensorflow.keras.Model]) – Keras or TensorFlow model, or any other model's prediction function returning class probabilities.
  - mode (str) – Find pertinent negatives ('PN') or pertinent positives ('PP').
  - shape (tuple) – Shape of input data, starting with batch size.
  - kappa (float) – Confidence parameter for the attack loss term.
  - beta (float) – Regularization constant for the L1 loss term.
  - feature_range (tuple) – Tuple with min and max ranges to allow for perturbed instances. Min and max ranges can be floats or numpy arrays with dimension (1 x nb of features) for feature-wise ranges.
  - gamma (float) – Regularization constant for the optional auto-encoder loss term.
  - ae_model (tensorflow.keras.Model) – Optional auto-encoder model used for loss regularization.
  - learning_rate_init (float) – Initial learning rate of the optimizer.
  - max_iterations (int) – Maximum number of iterations for finding a PN or PP.
  - c_init (float) – Initial value to scale the attack loss term.
  - c_steps (int) – Number of iterations to adjust the constant scaling the attack loss term.
  - eps (tuple) – If numerical gradients are used to compute dL/dx = (dL/dp) * (dp/dx), then eps[0] is used to calculate dL/dp and eps[1] is used for dp/dx. eps[0] and eps[1] can be a combination of float values and numpy arrays. For eps[0], the array dimension should be (1 x nb of prediction categories) and for eps[1] it should be (1 x nb of features).
  - clip (tuple) – Tuple with min and max clip ranges for both the numerical gradients and the gradients obtained from the TensorFlow graph.
  - update_num_grad (int) – If numerical gradients are used, they will be updated every update_num_grad iterations.
  - no_info_val (Union[float, ndarray]) – Global or feature-wise value considered as containing no information.
  - write_dir (str) – Directory to write TensorBoard files to.
  - sess (tensorflow.compat.v1.Session) – Optional TensorFlow session that will be used if passed instead of creating or inferring one internally.
- Return type
  None
-
attack(X, Y, verbose=False)[source]¶
Find a pertinent negative or pertinent positive for instance X using a fast iterative shrinkage-thresholding algorithm (FISTA).
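FISTA handles the beta-weighted L1 term via a shrinkage-thresholding (soft-thresholding) projection of the perturbation. A minimal numpy sketch of that projection, assuming the paper's formulation; the function name is illustrative, not alibi's internal API:

```python
import numpy as np

def soft_threshold(delta, beta):
    """Elementwise shrinkage-thresholding step used by (F)ISTA for an L1
    penalty: shifts each perturbation component toward zero by beta and
    zeroes out components smaller than beta in magnitude."""
    return np.sign(delta) * np.maximum(np.abs(delta) - beta, 0.0)

# With beta=0.1, components below 0.1 in magnitude vanish; the rest shrink.
delta = np.array([0.5, -0.05, 0.2, -0.3])
sparse_delta = soft_threshold(delta, beta=0.1)
```

This projection is what makes the resulting perturbations sparse, which is why a larger beta yields sparser PNs and PPs.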
-
explain(X, Y=None, verbose=False)[source]¶
Explain an instance and return the PP or PN with metadata.
- Returns
  explanation – Dictionary containing the PP or PN with additional metadata.
-
get_gradients(X, Y)[source]¶
Compute numerical gradients of the attack loss term: dL/dx = (dL/dP) * (dP/dx), with L = loss_attack_s, P = predict and x = adv_s.
- Parameters
  - X (ndarray) – Instance around which the gradient is evaluated.
  - Y (ndarray) – One-hot representation of the instance labels.
- Return type
  ndarray
- Returns
  Array with gradients.
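The chain rule dL/dx = (dL/dP) * (dP/dx) can be approximated with central differences, using eps[0] for the dL/dP factor and eps[1] for dP/dx, as described for the eps parameter above. A minimal numpy sketch under those assumptions; names and signatures are illustrative, not alibi's implementation:

```python
import numpy as np

def num_grad(predict, loss, x, y, eps=(1e-3, 1e-3)):
    """Numerical chain rule dL/dx = (dL/dp) @ (dp/dx) via central differences.
    eps[0] perturbs the prediction probabilities (for dL/dp), eps[1] perturbs
    the features (for dp/dx)."""
    p = predict(x)                       # class probabilities, shape (nb classes,)
    # dL/dp: central difference over each prediction category
    dLdp = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = eps[0]
        dLdp[i] = (loss(p + e, y) - loss(p - e, y)) / (2 * eps[0])
    # dp/dx: central difference over each feature
    dpdx = np.zeros((p.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps[1]
        dpdx[:, j] = (predict(x + e) - predict(x - e)) / (2 * eps[1])
    return dLdp @ dpdx                   # gradient w.r.t. x, shape (nb features,)
```

Each gradient evaluation costs 2 * (nb categories + nb features) calls, which is why update_num_grad lets you recompute it only every few iterations.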
-
loss_fn(pred_proba, Y)[source]¶
Compute the attack loss.
- Parameters
  - pred_proba (ndarray) – Prediction probabilities of an instance.
  - Y (ndarray) – One-hot representation of the instance labels.
- Return type
  ndarray
- Returns
  Loss of the attack.
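For intuition: in the paper, the attack loss is a hinge on the margin between the original class score and the best competing class score, widened by the confidence parameter kappa. A sketch of that single-instance formulation in numpy; illustrative only, as alibi's method operates on batched arrays:

```python
import numpy as np

def attack_loss(pred_proba, y, kappa=0.0, mode='PN'):
    """Hinge-style attack loss from the CEM paper (single instance).
    y is the one-hot label; kappa widens the required confidence margin."""
    p_orig = pred_proba[y.astype(bool)].item()    # score of the original class
    p_other = pred_proba[~y.astype(bool)].max()   # best score among other classes
    if mode == 'PN':
        # pertinent negative: loss is zero once some other class wins by kappa
        return max(p_orig - p_other + kappa, 0.0)
    # 'PP', pertinent positive: loss is zero once the original class wins by kappa
    return max(p_other - p_orig + kappa, 0.0)
```

The constant c (c_init, adjusted over c_steps) scales this term against the L1/L2 and auto-encoder regularizers in the overall objective.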