This page was generated from examples/integrated_gradients_transformers.ipynb.

Integrated gradients for transformers models

In this example, we apply the integrated gradients method to two different sentiment analysis models. The first one is a pretrained sentiment analysis model from the transformers library. The second model is a combination of a pretrained BERT model and a simple feed forward network. The feed forward network is trained on the IMDB dataset using the BERT output embeddings as features.

In text classification models, the integrated gradients method defines an attribution value for each word in the input sentence. The attributions are computed as the integral of the model's gradients with respect to the word embedding layer along a straight path from a baseline instance \(x^\prime\) to the input instance \(x\). A description of the method can be found here. Integrated gradients was originally proposed in Sundararajan et al., “Axiomatic Attribution for Deep Networks”.
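
For reference, the attribution of the \(i\)-th feature of an input \(x\) is defined as

\[
\mathrm{IG}_i(x) = (x_i - x^\prime_i)\int_0^1 \frac{\partial F\big(x^\prime + \alpha\,(x - x^\prime)\big)}{\partial x_i}\, d\alpha,
\]

where \(F\) is the model and \(\alpha\) parametrizes the straight path from the baseline \(x^\prime\) to the input \(x\). In practice the integral is approximated numerically, here with a fixed number of interpolation points (n_steps below).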

[2]:
import re
import os
import tensorflow as tf
import numpy as np
import torch
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
from transformers import BertTokenizerFast, TFBertModel, BertConfig
from alibi.explainers import IntegratedGradients
from tensorflow.keras.datasets import imdb
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.optimizers import Adam

Here we define some functions needed to process the data. For consistency with other text examples in alibi, we will use the IMDB dataset provided by keras. Since the dataset consists of reviews that are already tokenized, we need to decode each sentence back into plain text and then re-tokenize it with the BERT tokenizer.

[3]:
def decode_sentence(x, reverse_index):
    """Decodes the tokenized sentences from keras IMDB dataset into plain text.
    """
    # the `-3` offset is due to the special tokens used by keras
    # see https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset
    return " ".join([reverse_index.get(i - 3, 'UNK') for i in x])

def preprocess_reviews(reviews):
    """Preprocess the text.
    """
    REPLACE_NO_SPACE = re.compile(r"[.;:,!\'?\"()\[\]]")
    REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")

    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]

    return reviews

def process_sentences(sentence,
                      tokenizer,
                      max_len):
    """Tokenize the text sentences.
    """
    z = tokenizer(sentence,
                  add_special_tokens=False,
                  padding='max_length',
                  max_length=max_len,
                  truncation=True,
                  return_token_type_ids=True,
                  return_attention_mask=True,
                  return_tensors='np')
    return z
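
As a small illustration (not in the original notebook), applying the preprocessing to a made-up review strips the punctuation and replaces the HTML line breaks with spaces:

# Illustration on a made-up review: punctuation is removed and the
# <br /><br /> markup is replaced by a space.
print(preprocess_reviews(["I loved it!<br /><br />Best film ever..."]))
# ['i loved it best film ever']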

Automodel

In this section, we will use the TensorFlow auto model for sequence classification provided by the transformers library.

The model is fine-tuned on the Stanford Sentiment Treebank (SST) dataset. The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language.

Each phrase is labelled as either negative, somewhat negative, neutral, somewhat positive or positive. The corpus with all 5 labels is referred to as SST-5 or SST fine-grained. Binary classification experiments on full sentences (negative or somewhat negative vs. somewhat positive or positive, with neutral sentences discarded) refer to the dataset as SST-2 or SST binary. In this example, we use a text classifier fine-tuned on the SST-2 dataset.

[4]:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
auto_model_bert = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.

The automodel output is a custom object containing the output logits. We use a wrapper to transform the output into a tensor and apply a softmax function to the logits.

[5]:
class AutoModelWrapper(tf.keras.Model):

    def __init__(self, model_bert, **kwargs):
        super().__init__()
        self.model_bert = model_bert

    def call(self, inputs, attention_mask=None):
        out = self.model_bert(inputs,
                              attention_mask=attention_mask)
        return tf.nn.softmax(out.logits)

    def get_config(self):
        return {}

    @classmethod
    def from_config(cls, config):
        return cls(**config)
[6]:
auto_model = AutoModelWrapper(auto_model_bert)

Calculate integrated gradients

[7]:
max_features = 10000
max_len = 100

Here we consider two simple example sentences: “I love you, I like you” and “I love you, I like you, but I also kind of dislike you”.

[8]:
z_test_sample = ['I love you, I like you',
                 'I love you, I like you, but I also kind of dislike you']
z_test_sample = preprocess_reviews(z_test_sample)
z_test_sample = process_sentences(z_test_sample,
                                   tokenizer,
                                   max_len)
x_test_sample = z_test_sample['input_ids']
kwargs = {k:v for k,v in z_test_sample.items() if k == 'attention_mask'}
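
As a quick sanity check (not part of the original notebook), we can map the input ids of the first sentence back to BERT tokens, skipping the padding:

# Convert the ids of the first sentence back to tokens; skip_special_tokens
# drops the [PAD] tokens introduced by padding to max_length.
print(tokenizer.convert_ids_to_tokens([int(i) for i in x_test_sample[0]],
                                      skip_special_tokens=True))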

The auto model consists of a main DistilBERT layer (layer 0) followed by two dense layers and a dropout layer.

[9]:
auto_model.layers[0].layers
[9]:
[<transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertMainLayer at 0x7f4979fb8950>,
 <tensorflow.python.keras.layers.core.Dense at 0x7f496c7d4710>,
 <tensorflow.python.keras.layers.core.Dense at 0x7f496c7d4ad0>,
 <tensorflow.python.keras.layers.core.Dropout at 0x7f496c7d4dd0>]

We extract one of the transformer blocks from the main DistilBERT layer (here the block at index 1), with respect to which the attributions will be calculated.

[10]:
# Extracting the transformer block at index 1
bl = auto_model.layers[0].layers[0].transformer.layer[1]
[11]:
n_steps = 5
method = "gausslegendre"
internal_batch_size = 5
ig = IntegratedGradients(auto_model,
                         layer=bl,
                         n_steps=n_steps,
                         method=method,
                         internal_batch_size=internal_batch_size)
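
Here n_steps is the number of points used in the numerical approximation of the integral, method selects the integration method (Gauss-Legendre quadrature in this case) and internal_batch_size is the batch size used internally when evaluating the gradients. Since the integer token ids are not differentiable, passing layer=bl means the attributions are computed with respect to that transformer block (hence the (2, 100, 768) attribution shape seen below) rather than with respect to the raw input ids.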
[12]:
predictions = auto_model(x_test_sample, **kwargs).numpy().argmax(axis=1)
explanation = ig.explain(x_test_sample,
                         forward_kwargs=kwargs,
                         baselines=None,
                         target=predictions)
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x7f4a82ee6670>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
[13]:
# Get attributions values from the explanation object
attrs = explanation.attributions[0]
print('Attributions shape:', attrs.shape)
Attributions shape: (2, 100, 768)
[14]:
attrs = attrs.sum(axis=2)
print('Attributions shape:', attrs.shape)
Attributions shape: (2, 100)
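
Summing over the embedding axis gives a signed score per token. An alternative aggregation, not used in this notebook, is the L2 norm over the embedding axis, which discards the sign of the attributions:

# Alternative (illustrative) aggregation: L2 norm over the embedding axis.
attrs_l2 = np.linalg.norm(explanation.attributions[0], axis=2)
print('Attributions shape:', attrs_l2.shape)  # (2, 100)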
[15]:
i = 1
x_i = x_test_sample[i]
attrs_i = attrs[i]
pred = predictions[i]
pred_dict = {1: 'Positive review', 0: 'Negative review'}
[16]:
from IPython.display import HTML
def hlstr(string, color='white'):
    """
    Return HTML markup highlighting text with the desired color.
    """
    return f"<mark style=background-color:{color}>{string} </mark>"
[17]:
def colorize(attrs, cmap='PiYG'):
    """
    Compute hex colors based on the attributions for a single instance.
    Uses a diverging colorscale by default and normalizes and scales
    the colormap so that colors are consistent with the attributions.
    """
    import matplotlib as mpl
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    cmap = mpl.cm.get_cmap(cmap)

    # now compute hex values of colors
    colors = list(map(lambda x: mpl.colors.rgb2hex(cmap(norm(x))), attrs))
    return colors
[18]:
words = tokenizer.decode(x_i).split()
colors = colorize(attrs_i)
[19]:
print('Predicted label =  {}: {}'.format(pred, pred_dict[pred]))
Predicted label =  1: Positive review
[20]:
HTML("".join(list(map(hlstr, words, colors))))
[20]:
i love you i like you but i also kind of dislike you [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]

Sentiment analysis on IMDB with a fine-tuned model head

We consider a text classifier fine-tuned on the IMDB dataset. We train a feed forward network which uses the concatenated output embeddings of a pretrained BERT model as input features. The BERT model and the trained feed forward network are then combined to obtain an end-to-end text classifier.

It must be noted that training an end-to-end text classifier (i.e. combining the BERT model and the feed forward network before training) instead of training the feed forward network separately is likely to lead to better model performance. However, training the feed forward network on pre-extracted embeddings is considerably faster and lighter, so we use that approach here: performance optimization is beyond the scope of this notebook, and the purpose of this example is to illustrate the integrated gradients method applied to a custom classifier.

[21]:
def get_embeddings(X_train, model, batch_size=50):
    """Compute the BERT output embeddings for the tokenized input in batches.
    """
    args = X_train['input_ids']
    kwargs = {k:v for k, v in X_train.items() if k != 'input_ids'}

    dataset = tf.data.Dataset.from_tensor_slices((args, kwargs)).batch(batch_size)
    dataset = dataset.as_numpy_iterator()

    embeddings = []
    for X_batch in dataset:
        args_b, kwargs_b = X_batch
        batch_embeddings = model(args_b, **kwargs_b)
        embeddings.append(batch_embeddings.last_hidden_state.numpy())

    return np.concatenate(embeddings, axis=0)

Load and process data

Loading the IMDB dataset.

[22]:
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
test_labels = y_test.copy()
train_labels = y_train.copy()
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

index = imdb.get_word_index()
reverse_index = {value: key for (key, value) in index.items()}
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 100)
x_test shape: (25000, 100)
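
As a quick illustration (not part of the original notebook), a review can be decoded back into plain text with the helper defined earlier:

# Decode the first (padded/truncated) training review and print the first 100 characters.
print(decode_sentence(x_train[0], reverse_index)[:100])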

Extract embeddings for training

To speed up training, the BERT embeddings are pre-extracted and used as features by the feed forward network.

[23]:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
config = BertConfig.from_pretrained("bert-base-uncased")
modelBert = TFBertModel.from_pretrained("bert-base-uncased", config=config)

modelBert.trainable = False
Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.

Decoding each sentence in the keras IMDB tokenized dataset to obtain the corresponding plain text.

[24]:
X_train, X_test = [], []
for i in range(len(x_train)):
    tr_sentence = decode_sentence(x_train[i], reverse_index)
    X_train.append(tr_sentence)
    te_sentence = decode_sentence(x_test[i], reverse_index)
    X_test.append(te_sentence)

Re-tokenizing the plain text using the BERT tokenizer.

[25]:
X_train = preprocess_reviews(X_train)
X_train = process_sentences(X_train, tokenizer, max_len)
X_test = preprocess_reviews(X_test)
X_test = process_sentences(X_test, tokenizer, max_len)

Extracting the BERT embeddings.

[26]:
train_embeddings = get_embeddings(X_train,
                                  modelBert,
                                  batch_size=100)
test_embeddings = get_embeddings(X_test,
                                 modelBert,
                                 batch_size=100)

Train model

Here we train the model head using the BERT output embeddings as features. The output embeddings are tensors of dimension 100 × 768, where each 768-dimensional vector represents a word in a sentence of 100 words. The embedding vectors are flattened into a single vector representing the full review. The model head consists of a dense layer with 128 hidden units and dropout, followed by a 2-unit dense layer with softmax activation.

[27]:
dropout = 0.1
hidden_dims = 128
[28]:
class ModelOut(tf.keras.Model):

    def __init__(self,
                 dropout=0.2,
                 hidden_dims=128):
        super().__init__()

        self.dropout = dropout
        self.hidden_dims = hidden_dims

        self.flat = tf.keras.layers.Flatten()
        self.dense_1 =  tf.keras.layers.Dense(hidden_dims,
                                              activation='relu')
        self.dropoutl = tf.keras.layers.Dropout(dropout)
        self.dense_2 = tf.keras.layers.Dense(2,
                                             activation='softmax')

    def call(self, inputs):
        x = self.flat(inputs)
        x = self.dense_1(x)
        x = self.dropoutl(x)
        x = self.dense_2(x)
        return x

    def get_config(self):
        return {"dropout": self.dropout,
                "hidden_dims": self.hidden_dims}

    @classmethod
    def from_config(cls, config):
        return cls(**config)
[29]:
model_out = ModelOut(dropout=dropout, hidden_dims=hidden_dims)

Training the model. If the model has already been trained, it can be loaded from the checkpoint directory by setting load_model=True.

[30]:
load_model = False
batch_size = 128
epochs = 3
[31]:
filepath = './model_transformers/'  # change to desired save directory

model_out.compile(optimizer=Adam(1e-4),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

if not load_model:

    checkpoint_path = os.path.join(filepath, "training/cp-{epoch:04d}.ckpt")
    checkpoint_dir = os.path.dirname(checkpoint_path)

    # Create a callback that saves the model's weights every epoch
    cp_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path,
        verbose=1,
        save_weights_only=True,
        save_freq='epoch')

    model_out.fit(train_embeddings, y_train,
                  validation_data=(test_embeddings, y_test),
                  epochs=epochs,
                  batch_size=batch_size,
                  callbacks=[cp_callback],
                  verbose=1)
else:
    epoch = 3
    load_path = os.path.join(filepath, f"training/cp-{epoch:04d}.ckpt")
    model_out.load_weights(load_path)
Epoch 1/3
196/196 [==============================] - ETA: 0s - loss: 0.6954 - accuracy: 0.6124
Epoch 00001: saving model to ./model_transformers/training/cp-0001.ckpt
196/196 [==============================] - 9s 48ms/step - loss: 0.6954 - accuracy: 0.6124 - val_loss: 0.6214 - val_accuracy: 0.6978
Epoch 2/3
194/196 [============================>.] - ETA: 0s - loss: 0.6044 - accuracy: 0.7097
Epoch 00002: saving model to ./model_transformers/training/cp-0002.ckpt
196/196 [==============================] - 6s 29ms/step - loss: 0.6040 - accuracy: 0.7096 - val_loss: 0.5393 - val_accuracy: 0.7290
Epoch 3/3
195/196 [============================>.] - ETA: 0s - loss: 0.5015 - accuracy: 0.7490
Epoch 00003: saving model to ./model_transformers/training/cp-0003.ckpt
196/196 [==============================] - 6s 29ms/step - loss: 0.5017 - accuracy: 0.7488 - val_loss: 0.4733 - val_accuracy: 0.7712

Combine BERT and feed forward network

Here we combine the BERT model with the model head to obtain an end-to-end text classifier.

[32]:
class TextClassifier(tf.keras.Model):

    def __init__(self, model_bert, model_out):
        super().__init__()
        self.model_bert = model_bert
        self.model_out = model_out

    def call(self, inputs, attention_mask=None):
        out = self.model_bert(inputs, attention_mask=attention_mask)
        out = self.model_out(out.last_hidden_state)
        return out

    def get_config(self):
        return {}

    @classmethod
    def from_config(cls, config):
        return cls(**config)
[33]:
text_classifier = TextClassifier(modelBert, model_out)
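
As an optional check (not part of the original notebook), the end-to-end classifier can be run on a couple of tokenized test reviews; its predicted classes should agree with those of the model head applied to the pre-extracted embeddings:

# Compare the end-to-end classifier with the head applied to the
# pre-extracted BERT embeddings on the first two test reviews.
probs_e2e = text_classifier(X_test['input_ids'][:2],
                            attention_mask=X_test['attention_mask'][:2]).numpy()
probs_head = model_out(test_embeddings[:2]).numpy()
print(probs_e2e.argmax(axis=1), probs_head.argmax(axis=1))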

Calculate integrated gradients

We pick the first 10 sentences from the test set as examples.

[34]:
z_test_sample = [decode_sentence(x_test[i], reverse_index) for i in range(10)]
z_test_sample = preprocess_reviews(z_test_sample)
z_test_sample = process_sentences(z_test_sample, tokenizer, max_len)

x_test_sample = z_test_sample['input_ids']
kwargs = {k:v for k,v in z_test_sample.items() if k == 'attention_mask'}

We calculate the attributions with respect to the first transformer block of the BERT encoder.

[35]:
bl = text_classifier.layers[0].bert.encoder.layer[0]
[36]:
n_steps = 5
method = "gausslegendre"
internal_batch_size = 5
ig = IntegratedGradients(text_classifier,
                         layer=bl,
                         n_steps=n_steps,
                         method=method,
                         internal_batch_size=internal_batch_size)
[37]:
predictions = text_classifier(x_test_sample, **kwargs).numpy().argmax(axis=1)
explanation = ig.explain(x_test_sample,
                         forward_kwargs=kwargs,
                         baselines=None,
                         target=predictions)
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
[38]:
# Get attributions values from the explanation object
attrs = explanation.attributions[0]
print('Attributions shape:', attrs.shape)
Attributions shape: (10, 100, 768)
[39]:
attrs = attrs.sum(axis=2)
print('Attributions shape:', attrs.shape)
Attributions shape: (10, 100)
[40]:
i = 1
x_i = x_test_sample[i]
attrs_i = attrs[i]
pred = predictions[i]
pred_dict = {1: 'Positive review', 0: 'Negative review'}
[41]:
from IPython.display import HTML
def hlstr(string, color='white'):
    """
    Return HTML markup highlighting text with the desired color.
    """
    return f"<mark style=background-color:{color}>{string} </mark>"
[42]:
def colorize(attrs, cmap='PiYG'):
    """
    Compute hex colors based on the attributions for a single instance.
    Uses a diverging colorscale by default and normalizes and scales
    the colormap so that colors are consistent with the attributions.
    """
    import matplotlib as mpl
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    cmap = mpl.cm.get_cmap(cmap)

    # now compute hex values of colors
    colors = list(map(lambda x: mpl.colors.rgb2hex(cmap(norm(x))), attrs))
    return colors
[43]:
words = tokenizer.decode(x_i).split()
colors = colorize(attrs_i)
[44]:
print('Predicted label =  {}: {}'.format(pred, pred_dict[pred]))
Predicted label =  1: Positive review
[45]:
HTML("".join(list(map(hlstr, words, colors))))
[45]:
a powerful study of loneliness sexual unk and desperation be patient unk up the atmosphere and pay attention to the wonderfully written script br br i praise robert altman this is one of his many films that deals with unconventional fascinating subject matter this film is disturbing but its sincere and its sure to unk a strong emotional response from the viewer if you want to see an unusual film some might even say bizarre this is worth the time br br unfortunately its very difficult to find in video stores you may have to