This page was generated from examples/anchor_text_movie.ipynb.

Anchor explanations for movie sentiment

In this example, we will explain why a certain sentence is classified by a logistic regression as having negative or positive sentiment. The logistic regression is trained on negative and positive movie reviews.

[1]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import spacy
from alibi.explainers import AnchorText
from alibi.datasets import fetch_movie_sentiment
from alibi.utils.download import spacy_model

Load movie review dataset

The fetch_movie_sentiment function returns a Bunch object containing the features, the targets and the target names for the dataset.

[2]:
movies = fetch_movie_sentiment()
movies.keys()
[2]:
dict_keys(['data', 'target', 'target_names'])
[3]:
data = movies.data
labels = movies.target
target_names = movies.target_names

Define shuffled training, validation and test sets

[4]:
train, test, train_labels, test_labels = train_test_split(data, labels, test_size=.2, random_state=42)
train, val, train_labels, val_labels = train_test_split(train, train_labels, test_size=.1, random_state=42)
train_labels = np.array(train_labels)
test_labels = np.array(test_labels)
val_labels = np.array(val_labels)
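
The two successive splits above first hold out 20% of the data for testing and then 10% of the remainder for validation. The sketch below works out the resulting set sizes for a hypothetical dataset of 1000 reviews. The function name `split_sizes` and the dataset size are made up for illustration, and the sketch assumes the held-out share is rounded up, as in scikit-learn's `train_test_split`.

```python
import math


def split_sizes(n_samples, test_size=0.2, val_size=0.1):
    """Return (n_train, n_val, n_test) for two successive splits.

    Assumption: the held-out share of each split is rounded up and the
    remainder is kept for the next step.
    """
    n_test = math.ceil(n_samples * test_size)   # first split: test set
    n_rest = n_samples - n_test                 # what is left for train + val
    n_val = math.ceil(n_rest * val_size)        # second split: validation set
    n_train = n_rest - n_val
    return n_train, n_val, n_test


# Hypothetical dataset size, chosen only to make the proportions easy to read.
n_train, n_val, n_test = split_sizes(1000)
print(n_train, n_val, n_test)
```

In other words, the classifier is fitted on roughly 72% of the reviews, with about 8% used for validation and 20% for testing.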

Apply CountVectorizer to training set

[5]:
vectorizer = CountVectorizer(min_df=1)
vectorizer.fit(train)
[5]:
CountVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
                lowercase=True, max_df=1.0, max_features=None, min_df=1,
                ngram_range=(1, 1), preprocessor=None, stop_words=None,
                strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, vocabulary=None)
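
To make the effect of the vectorizer concrete, here is a minimal standard-library sketch of a bag-of-words count representation. It reuses the `token_pattern` and the lowercasing shown in the output above; the helper names and the toy sentences are made up for illustration, and the sketch is not how `CountVectorizer` is implemented internally (for instance, it returns plain lists rather than a sparse matrix).

```python
import re
from collections import Counter

# Same token pattern as in the CountVectorizer repr: words of 2+ characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase the document and extract tokens."""
    return TOKEN_PATTERN.findall(doc.lower())


def build_vocabulary(docs):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vector(doc, vocabulary):
    """Count how often each vocabulary token occurs in the document."""
    counts = Counter(tokenize(doc))
    vector = [0] * len(vocabulary)
    for tok, idx in vocabulary.items():
        vector[idx] = counts.get(tok, 0)
    return vector


# Toy documents, for illustration only.
toy_docs = ["A flashy exercise in style", "style over substance , style again"]
vocab = build_vocabulary(toy_docs)
vectors = [count_vector(doc, vocab) for doc in toy_docs]
print(vocab)
print(vectors)
```

Note that single-character tokens such as "a" and punctuation are dropped by the pattern, and that words unseen during fitting would simply be ignored at transform time.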

Fit model

[6]:
np.random.seed(0)
clf = LogisticRegression(solver='liblinear')
clf.fit(vectorizer.transform(train), train_labels)
[6]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

Define prediction function

[7]:
predict_fn = lambda x: clf.predict(vectorizer.transform(x))

Make predictions on the training, validation and test sets

[8]:
preds_train = predict_fn(train)
preds_val = predict_fn(val)
preds_test = predict_fn(test)
print('Train accuracy', accuracy_score(train_labels, preds_train))
print('Validation accuracy', accuracy_score(val_labels, preds_val))
print('Test accuracy', accuracy_score(test_labels, preds_test))
Train accuracy 0.9801624284382905
Validation accuracy 0.7544910179640718
Test accuracy 0.7589841878294202

Load spaCy model

We use en_core_web_md, an English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. It assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.

[9]:
model = 'en_core_web_md'
spacy_model(model=model)
nlp = spacy.load(model)

Initialize anchor text explainer

[10]:
explainer = AnchorText(nlp, predict_fn)

Explain a prediction

[11]:
class_names = movies.target_names
[12]:
text = data[4]
print(text)
a visually flashy but narratively opaque and emotionally vapid exercise in style and mystification .

Prediction:

[13]:
pred_idx = predict_fn([text])[0]  # predicted class index (0 or 1)
pred = class_names[pred_idx]
alternative = class_names[1 - pred_idx]
print('Prediction: %s' % pred)
Prediction: negative

Explanation:

[14]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_unk=True)

Setting use_unk=True means we perturb examples by replacing words with the UNK token. Let us now take a look at the anchor. In the output below, the single word ‘flashy’ is enough to keep the prediction negative with a precision of 0.99.
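
The idea behind UNK perturbation and anchor precision can be sketched with the standard library alone. The code below is a simplified illustration and not alibi's implementation: `perturb_with_unk`, `estimate_precision` and `toy_predict` are hypothetical names, the toy classifier is a stand-in for the logistic regression, and the real explainer searches over candidate anchors rather than evaluating a fixed one.

```python
import random


def perturb_with_unk(words, anchor_idx, sample_proba, rng):
    """Keep anchor words; replace every other word by 'UNK' with probability sample_proba."""
    perturbed = []
    for i, word in enumerate(words):
        keep = i in anchor_idx or rng.random() >= sample_proba
        perturbed.append(word if keep else "UNK")
    return perturbed


def estimate_precision(words, anchor_idx, predict, n_samples=1000, sample_proba=0.5, seed=0):
    """Fraction of perturbed sentences that receive the same prediction as the original."""
    rng = random.Random(seed)
    target = predict(" ".join(words))
    hits = 0
    for _ in range(n_samples):
        sample = " ".join(perturb_with_unk(words, anchor_idx, sample_proba, rng))
        hits += int(predict(sample) == target)
    return hits / n_samples


def toy_predict(sentence):
    """Hypothetical stand-in classifier: negative whenever 'flashy' is present."""
    return "negative" if "flashy" in sentence.split() else "positive"


words = ("a visually flashy but narratively opaque and emotionally vapid "
         "exercise in style and mystification .").split()

with_anchor = estimate_precision(words, {words.index("flashy")}, toy_predict)
without_anchor = estimate_precision(words, set(), toy_predict)
print("precision with anchor 'flashy':", with_anchor)
print("precision with empty anchor:", without_anchor)
```

An anchor is accepted once its estimated precision reaches the requested threshold (0.95 in the call above); the empty anchor in this toy setting falls well short of that.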

[15]:
print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))
Anchor: flashy
Precision: 0.99

Examples where anchor applies and model predicts negative:
a UNK flashy UNK UNK opaque and emotionally vapid exercise in style UNK mystification .
a UNK flashy UNK UNK UNK and emotionally UNK exercise UNK UNK and UNK UNK
a UNK flashy UNK narratively opaque UNK UNK UNK exercise in style and UNK UNK
UNK visually flashy UNK narratively UNK and emotionally UNK UNK UNK UNK UNK mystification .
UNK UNK flashy UNK UNK opaque and emotionally UNK UNK in UNK and UNK .
a visually flashy but UNK UNK and UNK UNK UNK in style UNK mystification .
a visually flashy but UNK opaque UNK emotionally vapid UNK in UNK and mystification .
a UNK flashy but narratively UNK UNK emotionally vapid exercise in style UNK mystification UNK
a UNK flashy but narratively opaque UNK emotionally vapid exercise in style and mystification .
a visually flashy UNK UNK opaque UNK UNK UNK exercise in UNK UNK UNK .

Examples where anchor applies and model predicts positive:
UNK UNK flashy but narratively UNK and UNK UNK UNK in style and UNK UNK

Changing the perturbation distribution

Let’s try this with another perturbation distribution, namely one that replaces words with similar words instead of UNKs.

Explanation:

[16]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_unk=False, sample_proba=0.5)

With this perturbation distribution a single word is no longer enough; the anchor needs more words to guarantee the negative prediction:

[17]:
print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))
Anchor: exercise AND emotionally
Precision: 0.97

Examples where anchor applies and model predicts negative:
a visually ballsy but philosophically opaque and emotionally vapid exercise if signature and negation .
a instantly flashy but narratively opaque and emotionally vapid exercise than style and mystification .
a visually flashy but narratively opaque and emotionally vapid exercise in style and fear .
some visually quirky but narratively dainty and emotionally patronising exercise until style and hopelessness .
another visually decorate but narratively opaque and emotionally vapid exercise in outfit and mystification .
both curiously unwieldy but wonderfully hollow and emotionally ridiculous exercise in style and mystification .
a visually artsy but gloriously opaque and emotionally vapid exercise in style and mystification .
a visually eclectic but narratively opaque and emotionally vapid exercise in vogue and mystification .
another stunningly flashy but narratively bright and emotionally unscientific exercise on style and falsehood .
a exceptionally whimsical but disturbingly opaque and emotionally vapid exercise about vibe and mystification .

Examples where anchor applies and model predicts positive:
both visually unconventional but socially opaque and emotionally caricature exercise around style and woe .
both wonderfully artsy but similarly opaque and emotionally vapid exercise in style and mystification .
a perfectly groovy but supremely smooth and emotionally vapid exercise towards style and mystification .
a visually stylish but narratively truthful and emotionally moronic exercise in style and oxymoron .
any remarkably inventive but narratively opaque and emotionally babble exercise despite style and naivety .

We can make the perturbation distribution sample words that are more similar to the original word via the top_n argument. Values smaller than the default of 100 should produce more coherent sentences that are closer to the distribution of natural language, which could influence the returned anchor. Setting use_similarity_proba to True makes the sampling probability of each candidate replacement proportional to its similarity score with the original word. We can also put more weight on similar words via the temperature argument: lower values of temperature increase the sampling weight of more similar words. The following example perturbs each token in the original sentence with probability sample_proba and samples its replacement from the top_n most similar words, weighted by similarity.
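
The interplay of top_n and temperature can be illustrated with a small sketch. It assumes a softmax over similarity scores divided by the temperature, which is one common way to realise the behaviour described above; the exact weighting used inside alibi may differ. The candidate words and similarity scores are invented for illustration.

```python
import math
import random


def sampling_weights(similarities, temperature=1.0):
    """Softmax over similarity / temperature (an assumed weighting scheme)."""
    scaled = [s / temperature for s in similarities]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical neighbours of 'flashy', sorted by made-up similarity scores.
candidates = ["showy", "glitzy", "colorful", "loud"]
similarities = [0.9, 0.8, 0.5, 0.3]

top_n = 3  # only the top_n most similar words are eligible replacements
top_candidates = candidates[:top_n]
top_similarities = similarities[:top_n]

flat = sampling_weights(top_similarities, temperature=1.0)
sharp = sampling_weights(top_similarities, temperature=0.2)
print("temperature=1.0:", [round(w, 3) for w in flat])
print("temperature=0.2:", [round(w, 3) for w in sharp])

# Draw a few replacements with the sharper distribution.
rng = random.Random(0)
draws = rng.choices(top_candidates, weights=sharp, k=5)
print(draws)
```

With the lower temperature, most of the probability mass moves to the most similar candidate, so perturbed sentences stay closer to the original wording.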

[18]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_similarity_proba=True, sample_proba=0.5,
                                use_unk=False, top_n=20, temperature=.2)

print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))
Anchor: exercise AND emotionally
Precision: 0.98

Examples where anchor applies and model predicts negative:
a visually exquisite but narratively opaque and emotionally vapid exercise before style and mystification .
each mechanically eccentric but narratively transparent and emotionally unremarkable exercise in style and falsehood .
a incredibly extravagant but artistically bright and emotionally vapid exercise of style and mystification .
any visually shiny but artistically glide and emotionally vapid exercise around temperament and materialism .
another clearly flashy but aesthetically opaque and emotionally vapid exercise whether flair and mystification .
a visually snazzy but narratively opaque and emotionally mindless exercise within style and negation .
a visually ingenious but narratively opaque and emotionally unimaginative exercise in artistry and mystification .
a visually flashy but narratively colorful and emotionally vapid exercise than style and mystification .
a graphically punchy but narratively opaque and emotionally vapid exercise of vibe and insanity .
a artistically flashy but narratively opaque and emotionally vapid exercise in ballroom and mystification .

Examples where anchor applies and model predicts positive:
any vividly outlandish but supremely opaque and emotionally vapid exercise throughout streetwear and mystification .
another precisely elaborate but delightfully realistic and emotionally muddled exercise in brevity and paranoia .