Algorithm overview

This page provides a high-level overview of the algorithms currently implemented in Alibi and their capabilities.

Model Explanations

These algorithms provide instance-specific (sometimes also called “local”) explanations of ML model predictions. Given a single instance and a model prediction, they aim to answer the question “Why did my model make this prediction?” The following table summarizes the capabilities of the current algorithms:

| Explainer | Classification | Regression | Categorical features | Tabular | Text | Images | Needs training set |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Anchors | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | For Tabular |
| CEM | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | Optional |
| Counterfactual Instances | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | No |
| Prototype Counterfactuals | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | Optional |

Anchor explanations: produce an “anchor” - a small subset of features and their ranges that will almost always result in the same model prediction. Documentation, tabular example, text classification, image classification.
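A minimal, self-contained sketch of the tabular workflow, using a scikit-learn classifier on the iris data; attribute names on the returned explanation follow recent Alibi releases and may differ slightly in older versions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

# train a simple classifier to explain
data = load_iris()
X, y = data.data, data.target
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
predict_fn = lambda x: clf.predict_proba(x)

explainer = AnchorTabular(predict_fn, feature_names=data.feature_names)
explainer.fit(X, disc_perc=(25, 50, 75))  # discretize numerical features

# request an anchor that holds with at least 95% precision
explanation = explainer.explain(X[0], threshold=0.95)
print(explanation.anchor)     # the feature conditions making up the anchor
print(explanation.precision)  # fraction of perturbed samples keeping the prediction
print(explanation.coverage)   # fraction of instances the anchor applies to
```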

Contrastive explanation method (CEM): produce a pertinent positive (PP) and a pertinent negative (PN) instance. The PP instance finds the features that should be minimally and sufficiently present to predict the same class as the original prediction (a PP acts as the “most compact” representation of the instance to keep the same prediction). The PN instance identifies the features that should be minimally and necessarily absent to maintain the original prediction (a PN acts as the closest instance that would result in a different prediction). Documentation, tabular example, image classification.
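A hedged sketch of requesting a pertinent negative; `predict_fn`, `X_train` and `X_test` are placeholders for your own model and data, and older Alibi releases return a plain dict instead of an object with a `PN` attribute:

```python
from alibi.explainers import CEM

shape = (1,) + X_train.shape[1:]  # instance shape, with leading batch dimension

# mode='PN' searches for a pertinent negative; use mode='PP' for a pertinent positive
cem = CEM(predict_fn, mode='PN', shape=shape,
          kappa=0.,   # confidence margin between the original and other classes
          beta=.1,    # L1 regularization weight, encourages sparse perturbations
          feature_range=(X_train.min(), X_train.max()),
          max_iterations=500, c_init=1., c_steps=5)

# optional: estimate per-feature "no information" values from the training set
cem.fit(X_train, no_info_type='median')

explanation = cem.explain(X_test[0:1])
print(explanation.PN)  # the pertinent negative instance
```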

Counterfactual instances: generate counterfactual examples by optimizing a simple loss function that balances reaching a desired prediction against staying close to the original instance. Documentation, image classification.
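A sketch of the corresponding API call; the class was named `CounterFactual` in early Alibi releases (`Counterfactual` later), and `predict_fn`, `X_train` and `X_test` are placeholders:

```python
from alibi.explainers import CounterFactual

shape = (1,) + X_train.shape[1:]

# 'other' targets any class different from the original prediction;
# an integer would target one specific class instead
cf = CounterFactual(predict_fn, shape=shape,
                    target_proba=0.9, target_class='other',
                    max_iter=1000,
                    lam_init=0.1)  # initial weight on the distance term of the loss

explanation = cf.explain(X_test[0:1])
print(explanation.cf['X'])      # the counterfactual instance found
print(explanation.cf['class'])  # the class it is predicted as
```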

Prototype Counterfactuals: generate counterfactuals guided by the nearest prototype of a class other than the one predicted for the original instance. Prototypes can be defined by either an encoder or k-d trees. This method can speed up the search, especially for black-box models, and create interpretable counterfactuals. Documentation, tabular example, image classification.
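A hedged sketch of the k-d tree variant; the class was named `CounterFactualProto` in early Alibi releases (`CounterfactualProto` later), and `predict_fn`, `X_train` and `X_test` are placeholders:

```python
from alibi.explainers import CounterFactualProto

shape = (1,) + X_train.shape[1:]

# with use_kdtree=True the class prototypes come from k-d trees built on the
# training data, so no trained (auto)encoder is required
cf = CounterFactualProto(predict_fn, shape=shape, use_kdtree=True,
                         theta=10.,  # weight of the prototype loss term
                         feature_range=(X_train.min(), X_train.max()))

cf.fit(X_train)  # builds the per-class k-d trees
explanation = cf.explain(X_test[0:1])
print(explanation.cf['X'])  # counterfactual guided by the nearest other-class prototype
```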

Model Confidence

These algorithms provide instance-specific scores measuring the model's confidence in a particular prediction.

| Algorithm | Classification | Regression | Categorical features | Tabular | Text | Images | Needs training set |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trust Scores | ✔ | ✘ | ✘ | ✔ | ✔ [1] | ✔ [2] | Yes |

Trust scores: produce a “trust score” for a classifier’s prediction. The trust score is the ratio between the distance to the nearest class different from the predicted class and the distance to the predicted class; higher scores correspond to more trustworthy predictions. Documentation, tabular example, image classification.
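A minimal, self-contained sketch on the iris data, assuming a recent Alibi version:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from alibi.confidence import TrustScore

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

ts = TrustScore()
ts.fit(X_train, y_train, classes=3)  # builds one distance index per class

# score = distance to nearest other class / distance to predicted class,
# so scores above 1 indicate the predicted class is the closest one
score, closest_class = ts.score(X_test, y_pred, k=2)
```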

[1] Depending on the model
[2] May require dimensionality reduction