Anchor explanations for movie sentiment¶
In this example, we will explain why a certain sentence is classified by a logistic regression as having negative or positive sentiment. The logistic regression is trained on negative and positive movie reviews.
[1]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import spacy
from alibi.explainers import AnchorText
from alibi.datasets import fetch_movie_sentiment
from alibi.utils.download import spacy_model
Load movie review dataset¶
The fetch_movie_sentiment function returns a Bunch object containing the features, the targets and the target names for the dataset.
[2]:
movies = fetch_movie_sentiment()
movies.keys()
[2]:
dict_keys(['data', 'target', 'target_names'])
[3]:
data = movies.data
labels = movies.target
target_names = movies.target_names
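As a quick sanity check (not part of the original notebook), we can peek at a review and its label:
[ ]:
# Peek at the first review and its label; the exact text shown depends
# on the dataset ordering.
print(target_names[labels[0]], '--', data[0][:100])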
Define shuffled training, validation and test sets¶
[4]:
train, test, train_labels, test_labels = train_test_split(data, labels, test_size=.2, random_state=42)
train, val, train_labels, val_labels = train_test_split(train, train_labels, test_size=.1, random_state=42)
train_labels = np.array(train_labels)
test_labels = np.array(test_labels)
val_labels = np.array(val_labels)
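To confirm the resulting split sizes, a quick check (not in the original notebook):
[ ]:
# Roughly 72% / 8% / 20% of the data respectively.
print(len(train), len(val), len(test))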
Apply CountVectorizer to training set¶
[5]:
vectorizer = CountVectorizer(min_df=1)
vectorizer.fit(train)
[5]:
CountVectorizer(analyzer='word', binary=False, decode_error='strict',
dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
lowercase=True, max_df=1.0, max_features=None, min_df=1,
ngram_range=(1, 1), preprocessor=None, stop_words=None,
strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
tokenizer=None, vocabulary=None)
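As a small check (not in the original notebook), the size of the learned vocabulary is available on the fitted vectorizer:
[ ]:
# Number of distinct tokens learned from the training set.
len(vectorizer.vocabulary_)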
Fit model¶
[6]:
np.random.seed(0)
clf = LogisticRegression(solver='liblinear')
clf.fit(vectorizer.transform(train), train_labels)
[6]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='liblinear', tol=0.0001, verbose=0,
warm_start=False)
Define prediction function¶
[7]:
predict_fn = lambda x: clf.predict(vectorizer.transform(x))
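As a small illustration (the example sentence below is made up), the prediction function can be called on any list of raw strings:
[ ]:
# predict_fn maps a list of raw strings to class indices into
# movies.target_names.
predict_fn(['This movie was a delight !'])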
Make predictions on train and test sets¶
[8]:
preds_train = predict_fn(train)
preds_val = predict_fn(val)
preds_test = predict_fn(test)
print('Train accuracy', accuracy_score(train_labels, preds_train))
print('Validation accuracy', accuracy_score(val_labels, preds_val))
print('Test accuracy', accuracy_score(test_labels, preds_test))
Train accuracy 0.9801624284382905
Validation accuracy 0.7544910179640718
Test accuracy 0.7589841878294202
Load spaCy model¶
English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.
[9]:
model = 'en_core_web_md'
spacy_model(model=model)
nlp = spacy.load(model)
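As a brief illustration (not in the original notebook), we can inspect some of these annotations on the sentence we will explain below:
[ ]:
# Tokenization and part-of-speech tags assigned by the spaCy pipeline.
doc = nlp('This is a good book .')
print([(token.text, token.pos_) for token in doc])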
Initialize anchor text explainer¶
[10]:
explainer = AnchorText(nlp, predict_fn)
Explain a prediction¶
[11]:
class_names = movies.target_names
Prediction:
[12]:
text = 'This is a good book .'
pred = class_names[predict_fn([text])[0]]
alternative = class_names[1 - predict_fn([text])[0]]
print('Prediction: %s' % pred)
Prediction: positive
Explanation:
[13]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_proba=False, use_unk=True)
use_unk=True means that examples are perturbed by replacing words with the UNK token. Let us now take a look at the anchor. The word ‘good’ basically guarantees a positive prediction. This is because the UNK replacements do not take instances like ‘not good’ into account.
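To make this limitation concrete, a quick check (not in the original notebook; the result depends on the trained model) is to query the model on a negated sentence directly:
[ ]:
# UNK perturbations only mask existing words and never insert new ones,
# so sentences containing a negation are never sampled. Whether the
# model actually flips on 'not good' can be checked directly:
predict_fn(['This is not a good book .'])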
[14]:
print('Anchor: %s' % (' AND '.join(explanation['names'])))
print('Precision: %.2f' % explanation['precision'])
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_false']]))
Anchor: good
Precision: 1.00
Examples where anchor applies and model predicts positive:
UNK UNK UNK good book UNK
UNK is a good book .
UNK is a good book UNK
UNK is UNK good book .
UNK UNK UNK good book .
UNK is a good book .
UNK is UNK good UNK UNK
UNK UNK UNK good UNK .
This UNK a good UNK UNK
This is a good UNK .
Examples where anchor applies and model predicts negative:
Changing the perturbation distribution¶
Let’s try this with another perturbation distribution, namely one that replaces words with similar words instead of UNKs.
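The notion of ‘similar words’ here comes from the spaCy model’s word vectors. As a rough illustration (not how AnchorText samples internally), we can compare a few token similarities:
[ ]:
# Token.similarity returns the cosine similarity of the word vectors,
# so 'good' should score higher against 'great' than against 'book'.
tokens = nlp('good great book')
print(tokens[0].similarity(tokens[1]))  # good vs great
print(tokens[0].similarity(tokens[2]))  # good vs book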
Explanation:
[15]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_proba=True, use_unk=False)
The anchor now shows that we need more words to guarantee a positive prediction:
[16]:
print('Anchor: %s' % (' AND '.join(explanation['names'])))
print('Precision: %.2f' % explanation['precision'])
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_false']]))
Anchor: good AND book AND is
Precision: 1.00
Examples where anchor applies and model predicts positive:
ALL is any good book .
THESE is both good book .
Some is every good book .
THis is each good book .
That is another good book .
THis is another good book .
SOME is some good book .
That is the good book .
THOSE is some good book .
Both is every good book .
Examples where anchor applies and model predicts negative: