• Attention


    Overview

    In the RNN lesson we were restricted to using only the final representation, but what if we could give contextual weight to each encoded input ($h_i$) when making our prediction? That is also preferable because it helps mitigate the vanishing gradient problem that comes from processing very long sequences. Below, attention is applied to the outputs of an RNN. In theory the outputs could come from anywhere we want to learn how to weight amongst them, but since we're working with the context of an RNN from the previous lesson, we'll continue with that.

    $\alpha = softmax(W_{attn}h)$
    $c_t = \sum_{i=1}^{n} \alpha_{t,i}h_i$

    Variable        Description
    $N$             batch size
    $M$             max sequence length in the batch
    $H$             hidden dim, model dim, etc.
    $h$             RNN outputs (or any group of outputs we want to attend to) $\in \mathbb{R}^{N \times M \times H}$
    $\alpha_{t,i}$  the alignment function's attention on the encoded inputs, used to create the context vector $c_t$
    $W_{attn}$      attention weights to learn $\in \mathbb{R}^{H \times 1}$
    $c_t$           context vector that accounts for the different inputs

    • Objective
      • At its core, attention is about learning how to weigh a group of encoded representations to produce a context-aware representation for downstream tasks. This is done by learning a set of attention weights and then using softmax to create attention values that sum to 1 (a small numeric sketch follows this list).
    • Advantages
      • Learns to account for the appropriate encoded representations regardless of their position.
    • Disadvantages
      • Adds another compute step that involves learning weights.
    • Miscellaneous
      • Several state-of-the-art approaches extend basic attention to deliver highly context-aware representations (e.g. self-attention).
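
    To make that concrete, here is a minimal, self-contained sketch (hypothetical shapes and randomly initialized weights, not the model we train later): we score each encoded input, softmax the scores into attention values that sum to 1, and take the weighted sum to form the context vector.

    import torch
    import torch.nn.functional as F

    # Hypothetical encoded inputs: 1 sample, 4 time steps, hidden dim 8
    h = torch.randn(1, 4, 8)

    # Attention weights to learn (W_attn ∈ R^{H x 1}), randomly initialized here
    W_attn = torch.randn(8, 1)

    # Score every encoded input, then softmax the scores over the sequence dimension
    scores = torch.matmul(h, W_attn).squeeze(-1)  # (1, 4)
    alpha = F.softmax(scores, dim=1)              # (1, 4)

    # Context vector: attention-weighted sum of the encoded inputs
    c = (alpha.unsqueeze(-1) * h).sum(dim=1)      # (1, 8)
    print (alpha, alpha.sum(dim=1))               # the attention values sum to 1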

       

      Set up

      Let's set our seed and device for our main task.

    import numpy as np
    import pandas as pd
    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    SEED = 1234

    def set_seeds(seed=1234):
        """Set seeds for reproducibility."""
        np.random.seed(seed)
        random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed) # multi-GPU

    # Set seeds for reproducibility
    set_seeds(seed=SEED)

    # Set device
    cuda = True
    device = torch.device("cuda" if (
        torch.cuda.is_available() and cuda) else "cpu")
    torch.set_default_tensor_type("torch.FloatTensor")
    if device.type == "cuda":
        torch.set_default_tensor_type("torch.cuda.FloatTensor")
    print (device)

    Load data

    We'll download the AG News dataset, which consists of 120K text samples from 4 unique classes (Business, Sci/Tech, Sports, World).

    # Load data
    url = "https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/news.csv"
    df = pd.read_csv(url, header=0) # load
    df = df.sample(frac=1).reset_index(drop=True) # shuffle
    df.head()
        title                                                  category
    0   Sharon accepts plan to reduce Gaza army operation...  World
    1   Internet key battleground in wildlife crime fight     Sci/Tech
    2   July durable goods orders rise 1.7 percent            Business
    3   Growing signs of a slowing on Wall Street             Business
    4   The new faces of reality TV                           World

    Preprocessing

    We'll start by cleaning up our input data: lowercasing the text, removing stop (filler) words, applying filters with regular expressions, etc.

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer
    import re

    nltk.download("stopwords")
    STOPWORDS = stopwords.words("english")
    print (STOPWORDS[:5])
    porter = PorterStemmer()

    [nltk_data] Downloading package stopwords to /root/nltk_data...
    [nltk_data] Package stopwords is already up-to-date!
    ['i', 'me', 'my', 'myself', 'we']

    def preprocess(text, stopwords=STOPWORDS):
        """Conditional preprocessing on our text unique to our task."""
        # Lower
        text = text.lower()
        # Remove stopwords
        pattern = re.compile(r"\b(" + r"|".join(stopwords) + r")\b\s*")
        text = pattern.sub("", text)
        # Remove words in parenthesis
        text = re.sub(r"\([^)]*\)", "", text)
        # Spacing and filters
        text = re.sub(r"([-;;.,!?<=>])", r" \1 ", text)
        text = re.sub("[^A-Za-z0-9]+", " ", text) # remove non alphanumeric chars
        text = re.sub(" +", " ", text) # remove multiple spaces
        text = text.strip()
        return text

    # Sample
    text = "Great week for the NYSE!"
    preprocess(text=text)

    great week nyse

    # Apply to dataframe
    preprocessed_df = df.copy()
    preprocessed_df.title = preprocessed_df.title.apply(preprocess)
    print (f"{df.title.values[0]}\n\n{preprocessed_df.title.values[0]}")

    Sharon Accepts Plan to Reduce Gaza Army Operation, Haaretz Says

    sharon accepts plan reduce gaza army operation haaretz says

    Split data

    import collections
    from sklearn.model_selection import train_test_split

    TRAIN_SIZE = 0.7
    VAL_SIZE = 0.15
    TEST_SIZE = 0.15

    def train_val_test_split(X, y, train_size):
        """Split dataset into data splits."""
        X_train, X_, y_train, y_ = train_test_split(X, y, train_size=TRAIN_SIZE, stratify=y)
        X_val, X_test, y_val, y_test = train_test_split(X_, y_, train_size=0.5, stratify=y_)
        return X_train, X_val, X_test, y_train, y_val, y_test

    # Data
    X = preprocessed_df["title"].values
    y = preprocessed_df["category"].values

    # Create data splits
    X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(
        X=X, y=y, train_size=TRAIN_SIZE)
    print (f"X_train: {X_train.shape}, y_train: {y_train.shape}")
    print (f"X_val: {X_val.shape}, y_val: {y_val.shape}")
    print (f"X_test: {X_test.shape}, y_test: {y_test.shape}")
    print (f"Sample point: {X_train[0]} → {y_train[0]}")

    X_train: (84000,), y_train: (84000,)
    X_val: (18000,), y_val: (18000,)
    X_test: (18000,), y_test: (18000,)
    Sample point: china battles north korea nuclear talks → World

    Label encoding

    Next we'll define a LabelEncoder to encode our text labels into unique indices.

    import json

    class LabelEncoder(object):
        """Label encoder for tag labels."""
        def __init__(self, class_to_index={}):
            self.class_to_index = class_to_index or {} # mutable defaults ;)
            self.index_to_class = {v: k for k, v in self.class_to_index.items()}
            self.classes = list(self.class_to_index.keys())

        def __len__(self):
            return len(self.class_to_index)

        def __str__(self):
            return f"<LabelEncoder(num_classes={len(self)})>"

        def fit(self, y):
            classes = np.unique(y)
            for i, class_ in enumerate(classes):
                self.class_to_index[class_] = i
            self.index_to_class = {v: k for k, v in self.class_to_index.items()}
            self.classes = list(self.class_to_index.keys())
            return self

        def encode(self, y):
            encoded = np.zeros((len(y)), dtype=int)
            for i, item in enumerate(y):
                encoded[i] = self.class_to_index[item]
            return encoded

        def decode(self, y):
            classes = []
            for i, item in enumerate(y):
                classes.append(self.index_to_class[item])
            return classes

        def save(self, fp):
            with open(fp, "w") as fp:
                contents = {"class_to_index": self.class_to_index}
                json.dump(contents, fp, indent=4, sort_keys=False)

        @classmethod
        def load(cls, fp):
            with open(fp, "r") as fp:
                kwargs = json.load(fp=fp)
            return cls(**kwargs)

    # Encode
    label_encoder = LabelEncoder()
    label_encoder.fit(y_train)
    NUM_CLASSES = len(label_encoder)
    label_encoder.class_to_index

    {'Business': 0, 'Sci/Tech': 1, 'Sports': 2, 'World': 3}

    # Convert labels to tokens
    print (f"y_train[0]: {y_train[0]}")
    y_train = label_encoder.encode(y_train)
    y_val = label_encoder.encode(y_val)
    y_test = label_encoder.encode(y_test)
    print (f"y_train[0]: {y_train[0]}")

    y_train[0]: World
    y_train[0]: 3

    # Class weights
    counts = np.bincount(y_train)
    class_weights = {i: 1.0/count for i, count in enumerate(counts)}
    print (f"counts: {counts}\nweights: {class_weights}")

    counts: [21000 21000 21000 21000]
    weights: {0: 4.761904761904762e-05, 1: 4.761904761904762e-05, 2: 4.761904761904762e-05, 3: 4.761904761904762e-05}

    Tokenizer

    We'll define a Tokenizer to convert our text input data into token indices.

    import json
    from collections import Counter
    from more_itertools import take

    class Tokenizer(object):
        def __init__(self, char_level, num_tokens=None,
                     pad_token="<PAD>", oov_token="<UNK>",
                     token_to_index=None):
            self.char_level = char_level
            self.separator = "" if self.char_level else " "
            if num_tokens: num_tokens -= 2 # pad + unk tokens
            self.num_tokens = num_tokens
            self.pad_token = pad_token
            self.oov_token = oov_token
            if not token_to_index:
                token_to_index = {pad_token: 0, oov_token: 1}
            self.token_to_index = token_to_index
            self.index_to_token = {v: k for k, v in self.token_to_index.items()}

        def __len__(self):
            return len(self.token_to_index)

        def __str__(self):
            return f"<Tokenizer(num_tokens={len(self)})>"

        def fit_on_texts(self, texts):
            if not self.char_level:
                texts = [text.split(" ") for text in texts]
            all_tokens = [token for text in texts for token in text]
            counts = Counter(all_tokens).most_common(self.num_tokens)
            self.min_token_freq = counts[-1][1]
            for token, count in counts:
                index = len(self)
                self.token_to_index[token] = index
                self.index_to_token[index] = token
            return self

        def texts_to_sequences(self, texts):
            sequences = []
            for text in texts:
                if not self.char_level:
                    text = text.split(" ")
                sequence = []
                for token in text:
                    sequence.append(self.token_to_index.get(
                        token, self.token_to_index[self.oov_token]))
                sequences.append(np.asarray(sequence))
            return sequences

        def sequences_to_texts(self, sequences):
            texts = []
            for sequence in sequences:
                text = []
                for index in sequence:
                    text.append(self.index_to_token.get(index, self.oov_token))
                texts.append(self.separator.join([token for token in text]))
            return texts

        def save(self, fp):
            with open(fp, "w") as fp:
                contents = {
                    "char_level": self.char_level,
                    "oov_token": self.oov_token,
                    "token_to_index": self.token_to_index
                }
                json.dump(contents, fp, indent=4, sort_keys=False)

        @classmethod
        def load(cls, fp):
            with open(fp, "r") as fp:
                kwargs = json.load(fp=fp)
            return cls(**kwargs)

    # Tokenize
    tokenizer = Tokenizer(char_level=False, num_tokens=5000)
    tokenizer.fit_on_texts(texts=X_train)
    VOCAB_SIZE = len(tokenizer)
    print (tokenizer)

    # Sample of tokens
    print (take(5, tokenizer.token_to_index.items()))
    print (f"least freq token's freq: {tokenizer.min_token_freq}") # use this to adjust num_tokens

    [('<PAD>', 0), ('<UNK>', 1), ('39', 2), ('b', 3), ('gt', 4)]
    least freq token's freq: 14

    # Convert texts to sequences of indices
    X_train = tokenizer.texts_to_sequences(X_train)
    X_val = tokenizer.texts_to_sequences(X_val)
    X_test = tokenizer.texts_to_sequences(X_test)
    preprocessed_text = tokenizer.sequences_to_texts([X_train[0]])[0]
    print ("Text to indices:\n"
        f"  (preprocessed) → {preprocessed_text}\n"
        f"  (tokenized) → {X_train[0]}")

    Text to indices:
      (preprocessed) → china battles north korea nuclear talks
      (tokenized) → [  16 1491  285  142  114   24]

    Padding

    We'll need to do 2D padding on our tokenized text.

    def pad_sequences(sequences, max_seq_len=0):
        """Pad sequences to max length in sequence."""
        max_seq_len = max(max_seq_len, max(len(sequence) for sequence in sequences))
        padded_sequences = np.zeros((len(sequences), max_seq_len))
        for i, sequence in enumerate(sequences):
            padded_sequences[i][:len(sequence)] = sequence
        return padded_sequences

    # 2D sequences
    padded = pad_sequences(X_train[0:3])
    print (padded.shape)
    print (padded)

    (3, 6)
    [[1.600e+01 1.491e+03 2.850e+02 1.420e+02 1.140e+02 2.400e+01]
     [1.445e+03 2.300e+01 6.560e+02 2.197e+03 1.000e+00 0.000e+00]
     [1.200e+02 1.400e+01 1.955e+03 1.005e+03 1.529e+03 4.014e+03]]

    Dataset

    We're going to create Datasets and DataLoaders so we can efficiently create batches from our data splits.

    class Dataset(torch.utils.data.Dataset):
        def __init__(self, X, y):
            self.X = X
            self.y = y

        def __len__(self):
            return len(self.y)

        def __str__(self):
            return f"<Dataset(N={len(self)})>"

        def __getitem__(self, index):
            X = self.X[index]
            y = self.y[index]
            return [X, len(X), y]

        def collate_fn(self, batch):
            """Processing on a batch."""
            # Get inputs
            batch = np.array(batch)
            X = batch[:, 0]
            seq_lens = batch[:, 1]
            y = batch[:, 2]

            # Pad inputs
            X = pad_sequences(sequences=X)

            # Cast
            X = torch.LongTensor(X.astype(np.int32))
            seq_lens = torch.LongTensor(seq_lens.astype(np.int32))
            y = torch.LongTensor(y.astype(np.int32))

            return X, seq_lens, y

        def create_dataloader(self, batch_size, shuffle=False, drop_last=False):
            return torch.utils.data.DataLoader(
                dataset=self, batch_size=batch_size, collate_fn=self.collate_fn,
                shuffle=shuffle, drop_last=drop_last, pin_memory=True)

    # Create datasets
    train_dataset = Dataset(X=X_train, y=y_train)
    val_dataset = Dataset(X=X_val, y=y_val)
    test_dataset = Dataset(X=X_test, y=y_test)
    print ("Datasets:\n"
        f"  Train dataset: {train_dataset.__str__()}\n"
        f"  Val dataset: {val_dataset.__str__()}\n"
        f"  Test dataset: {test_dataset.__str__()}\n"
        "Sample point:\n"
        f"  X: {train_dataset[0][0]}\n"
        f"  seq_len: {train_dataset[0][1]}\n"
        f"  y: {train_dataset[0][2]}")
    Datasets:
      Train dataset: <Dataset(N=84000)>
      Val dataset: <Dataset(N=18000)>
      Test dataset: <Dataset(N=18000)>
    Sample point:
      X: [  16 1491  285  142  114   24]
      seq_len: 6
      y: 3
    # Create dataloaders
    batch_size = 64
    train_dataloader = train_dataset.create_dataloader(
        batch_size=batch_size)
    val_dataloader = val_dataset.create_dataloader(
        batch_size=batch_size)
    test_dataloader = test_dataset.create_dataloader(
        batch_size=batch_size)
    batch_X, batch_seq_lens, batch_y = next(iter(train_dataloader))
    print ("Sample batch:\n"
        f"  X: {list(batch_X.size())}\n"
        f"  seq_lens: {list(batch_seq_lens.size())}\n"
        f"  y: {list(batch_y.size())}\n"
        "Sample point:\n"
        f"  X: {batch_X[0]}\n"
        f"  seq_len: {batch_seq_lens[0]}\n"
        f"  y: {batch_y[0]}")
    Sample batch:
      X: [64, 14]
      seq_lens: [64]
      y: [64]
    Sample point:
      X: tensor([  16, 1491,  285,  142,  114,   24,    0,    0,    0,    0,    0,    0,
               0,    0])
      seq_len: 6
      y: 3

    Trainer

    Let's create the Trainer class that we'll use to facilitate training for our experiments.

    class Trainer(object):
        def __init__(self, model, device, loss_fn=None, optimizer=None, scheduler=None):
            # Set params
            self.model = model
            self.device = device
            self.loss_fn = loss_fn
            self.optimizer = optimizer
            self.scheduler = scheduler

        def train_step(self, dataloader):
            """Train step."""
            # Set model to train mode
            self.model.train()
            loss = 0.0

            # Iterate over train batches
            for i, batch in enumerate(dataloader):
                # Step
                batch = [item.to(self.device) for item in batch] # Set device
                inputs, targets = batch[:-1], batch[-1]
                self.optimizer.zero_grad() # Reset gradients
                z = self.model(inputs) # Forward pass
                J = self.loss_fn(z, targets) # Define loss
                J.backward() # Backward pass
                self.optimizer.step() # Update weights

                # Cumulative Metrics
                loss += (J.detach().item() - loss) / (i + 1)

            return loss

        def eval_step(self, dataloader):
            """Validation or test step."""
            # Set model to eval mode
            self.model.eval()
            loss = 0.0
            y_trues, y_probs = [], []

            # Iterate over val batches
            with torch.inference_mode():
                for i, batch in enumerate(dataloader):
                    # Step
                    batch = [item.to(self.device) for item in batch] # Set device
                    inputs, y_true = batch[:-1], batch[-1]
                    z = self.model(inputs) # Forward pass
                    J = self.loss_fn(z, y_true).item()

                    # Cumulative Metrics
                    loss += (J - loss) / (i + 1)

                    # Store outputs
                    y_prob = torch.sigmoid(z).cpu().numpy()
                    y_probs.extend(y_prob)
                    y_trues.extend(y_true.cpu().numpy())

            return loss, np.vstack(y_trues), np.vstack(y_probs)

        def predict_step(self, dataloader):
            """Prediction step."""
            # Set model to eval mode
            self.model.eval()
            y_probs = []

            # Iterate over batches
            with torch.inference_mode():
                for i, batch in enumerate(dataloader):
                    # Forward pass w/ inputs
                    batch = [item.to(self.device) for item in batch] # Set device
                    inputs, targets = batch[:-1], batch[-1]
                    y_prob = F.softmax(self.model(inputs), dim=1)

                    # Store outputs
                    y_probs.extend(y_prob.cpu().numpy())

            return np.vstack(y_probs)

        def train(self, num_epochs, patience, train_dataloader, val_dataloader):
            best_val_loss = np.inf
            for epoch in range(num_epochs):
                # Steps
                train_loss = self.train_step(dataloader=train_dataloader)
                val_loss, _, _ = self.eval_step(dataloader=val_dataloader)
                self.scheduler.step(val_loss)

                # Early stopping
                if val_loss < best_val_loss:
                    best_val_loss = val_loss
                    best_model = self.model
                    _patience = patience # reset _patience
                else:
                    _patience -= 1
                if not _patience: # 0
                    print("Stopping early!")
                    break

                # Logging
                print(
                    f"Epoch: {epoch+1} | "
                    f"train_loss: {train_loss:.5f}, "
                    f"val_loss: {val_loss:.5f}, "
                    f"lr: {self.optimizer.param_groups[0]['lr']:.2E}, "
                    f"_patience: {_patience}"
                )
            return best_model

    Attention

    Attention is applied to the outputs of an RNN. In theory the outputs could come from anywhere we want to learn how to weight amongst them, but since we're working with the context of an RNN from the previous lesson, we'll continue with that.

    Variable        Description
    $N$             batch size
    $M$             max sequence length in the batch
    $H$             hidden dim, model dim, etc.
    $h$             RNN outputs (or any group of outputs we want to attend to) $\in \mathbb{R}^{N \times M \times H}$
    $\alpha_{t,i}$  the alignment function's attention on the encoded inputs, used to create the context vector $c_t$
    $W_{attn}$      attention weights to learn $\in \mathbb{R}^{H \times 1}$
    $c_t$           context vector that accounts for the different inputs

    import torch.nn.functional as F

    The RNN will create an encoded representation for each word in our input, resulting in a stacked vector with dimensions $N \times M \times H$, where N is the number of samples in the batch, M is the maximum sequence length in the batch, and H is the number of hidden units in the RNN.

     

    BATCH_SIZE = 64
    SEQ_LEN = 8
    EMBEDDING_DIM = 100
    RNN_HIDDEN_DIM = 128

    # Embed
    x = torch.rand((BATCH_SIZE, SEQ_LEN, EMBEDDING_DIM))

    # Encode
    rnn = nn.RNN(EMBEDDING_DIM, RNN_HIDDEN_DIM, batch_first=True)
    out, h_n = rnn(x) # h_n is the last hidden state
    print ("out: ", out.shape)
    print ("h_n: ", h_n.shape)

    out:  torch.Size([64, 8, 128])
    h_n: torch.Size([1, 64, 128])

    # Attend
    attn = nn.Linear(RNN_HIDDEN_DIM, 1)
    e = attn(out)
    attn_vals = F.softmax(e.squeeze(2), dim=1)
    c = torch.bmm(attn_vals.unsqueeze(1), out).squeeze(1)
    print ("e: ", e.shape)
    print ("attn_vals: ", attn_vals.shape)
    print ("attn_vals[0]: ", attn_vals[0])
    print ("sum(attn_vals[0]): ", sum(attn_vals[0]))
    print ("c: ", c.shape)

    # Predict
    fc1 = nn.Linear(RNN_HIDDEN_DIM, NUM_CLASSES)
    output = F.softmax(fc1(c), dim=1)
    print ("output: ", output.shape)

    output:  torch.Size([64, 4])

    Model

    Now let's create our RNN-based model, with the addition of the attention layer on top of the RNN's outputs.

    RNN_HIDDEN_DIM = 128
    DROPOUT_P = 0.1
    HIDDEN_DIM = 100

    class RNN(nn.Module):
        def __init__(self, embedding_dim, vocab_size, rnn_hidden_dim,
                     hidden_dim, dropout_p, num_classes, padding_idx=0):
            super(RNN, self).__init__()

            # Initialize embeddings
            self.embeddings = nn.Embedding(
                embedding_dim=embedding_dim, num_embeddings=vocab_size,
                padding_idx=padding_idx)

            # RNN
            self.rnn = nn.RNN(embedding_dim, rnn_hidden_dim, batch_first=True)

            # Attention
            self.attn = nn.Linear(rnn_hidden_dim, 1)

            # FC weights
            self.dropout = nn.Dropout(dropout_p)
            self.fc1 = nn.Linear(rnn_hidden_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, num_classes)

        def forward(self, inputs):
            # Embed
            x_in, seq_lens = inputs
            x_in = self.embeddings(x_in)

            # Encode
            out, h_n = self.rnn(x_in)

            # Attend
            e = self.attn(out)
            attn_vals = F.softmax(e.squeeze(2), dim=1)
            c = torch.bmm(attn_vals.unsqueeze(1), out).squeeze(1)

            # Predict
            z = self.fc1(c)
            z = self.dropout(z)
            z = self.fc2(z)
            return z

    # Simple RNN cell
    model = RNN(
        embedding_dim=EMBEDDING_DIM, vocab_size=VOCAB_SIZE,
        rnn_hidden_dim=RNN_HIDDEN_DIM, hidden_dim=HIDDEN_DIM,
        dropout_p=DROPOUT_P, num_classes=NUM_CLASSES)
    model = model.to(device) # set device
    print (model.named_parameters)

    Training

    from torch.optim import Adam

    NUM_LAYERS = 1
    LEARNING_RATE = 1e-4
    PATIENCE = 10
    NUM_EPOCHS = 50

    # Define Loss
    class_weights_tensor = torch.Tensor(list(class_weights.values())).to(device)
    loss_fn = nn.CrossEntropyLoss(weight=class_weights_tensor)

    # Define optimizer & scheduler
    optimizer = Adam(model.parameters(), lr=LEARNING_RATE)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=3)

    # Trainer module
    trainer = Trainer(
        model=model, device=device, loss_fn=loss_fn,
        optimizer=optimizer, scheduler=scheduler)

    # Train
    best_model = trainer.train(
        NUM_EPOCHS, PATIENCE, train_dataloader, val_dataloader)

    Evaluation

    import json
    from sklearn.metrics import precision_recall_fscore_support

    def get_metrics(y_true, y_pred, classes):
        """Per-class performance metrics."""
        # Performance
        performance = {"overall": {}, "class": {}}

        # Overall performance
        metrics = precision_recall_fscore_support(y_true, y_pred, average="weighted")
        performance["overall"]["precision"] = metrics[0]
        performance["overall"]["recall"] = metrics[1]
        performance["overall"]["f1"] = metrics[2]
        performance["overall"]["num_samples"] = np.float64(len(y_true))

        # Per-class performance
        metrics = precision_recall_fscore_support(y_true, y_pred, average=None)
        for i in range(len(classes)):
            performance["class"][classes[i]] = {
                "precision": metrics[0][i],
                "recall": metrics[1][i],
                "f1": metrics[2][i],
                "num_samples": np.float64(metrics[3][i]),
            }

        return performance

    # Get predictions
    test_loss, y_true, y_prob = trainer.eval_step(dataloader=test_dataloader)
    y_pred = np.argmax(y_prob, axis=1)

    # Determine performance
    performance = get_metrics(
        y_true=y_test, y_pred=y_pred, classes=label_encoder.classes)
    print (json.dumps(performance["overall"], indent=2))

    Inference

    from pathlib import Path

    def get_probability_distribution(y_prob, classes):
        """Create a dict of class probabilities from an array."""
        results = {}
        for i, class_ in enumerate(classes):
            results[class_] = np.float64(y_prob[i])
        sorted_results = {k: v for k, v in sorted(
            results.items(), key=lambda item: item[1], reverse=True)}
        return sorted_results

    # Load artifacts (assumes the label encoder, tokenizer and model weights were saved to `dir` after training)
    device = torch.device("cpu")
    label_encoder = LabelEncoder.load(fp=Path(dir, "label_encoder.json"))
    tokenizer = Tokenizer.load(fp=Path(dir, "tokenizer.json"))
    model = RNN(
        embedding_dim=EMBEDDING_DIM, vocab_size=VOCAB_SIZE,
        rnn_hidden_dim=RNN_HIDDEN_DIM, hidden_dim=HIDDEN_DIM,
        dropout_p=DROPOUT_P, num_classes=NUM_CLASSES)
    model.load_state_dict(torch.load(Path(dir, "model.pt"), map_location=device))
    model.to(device)

    # Initialize trainer
    trainer = Trainer(model=model, device=device)

    # Dataloader
    text = "The final tennis tournament starts next week."
    X = tokenizer.texts_to_sequences([preprocess(text)])
    print (tokenizer.sequences_to_texts(X))
    y_filler = label_encoder.encode([label_encoder.classes[0]]*len(X))
    dataset = Dataset(X=X, y=y_filler)
    dataloader = dataset.create_dataloader(batch_size=batch_size)
    ['final tennis tournament starts next week']
    # Inference
    y_prob = trainer.predict_step(dataloader)
    y_pred = np.argmax(y_prob, axis=1)
    label_encoder.decode(y_pred)
    ['Sports']
    # Class distributions
    prob_dist = get_probability_distribution(y_prob=y_prob[0], classes=label_encoder.classes)
    print (json.dumps(prob_dist, indent=2))
    {
      "Sports": 0.9651875495910645,
      "World": 0.03468644618988037,
      "Sci/Tech": 8.490968320984393e-05,
      "Business": 4.112234091735445e-05
    }

    Interpretability

    Let's use the attention values to see which encoded tokens were most useful for predicting the appropriate label.

    import collections
    import seaborn as sns

    class InterpretAttn(nn.Module):
        def __init__(self, embedding_dim, vocab_size, rnn_hidden_dim,
                     hidden_dim, dropout_p, num_classes, padding_idx=0):
            super(InterpretAttn, self).__init__()

            # Initialize embeddings
            self.embeddings = nn.Embedding(
                embedding_dim=embedding_dim, num_embeddings=vocab_size,
                padding_idx=padding_idx)

            # RNN
            self.rnn = nn.RNN(embedding_dim, rnn_hidden_dim, batch_first=True)

            # Attention
            self.attn = nn.Linear(rnn_hidden_dim, 1)

            # FC weights
            self.dropout = nn.Dropout(dropout_p)
            self.fc1 = nn.Linear(rnn_hidden_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, num_classes)

        def forward(self, inputs):
            # Embed
            x_in, seq_lens = inputs
            x_in = self.embeddings(x_in)

            # Encode
            out, h_n = self.rnn(x_in)

            # Attend
            e = self.attn(out) # could add optional activation function (ex. tanh)
            attn_vals = F.softmax(e.squeeze(2), dim=1)

            return attn_vals

    # Initialize model
    interpretable_model = InterpretAttn(
        embedding_dim=EMBEDDING_DIM, vocab_size=VOCAB_SIZE,
        rnn_hidden_dim=RNN_HIDDEN_DIM, hidden_dim=HIDDEN_DIM,
        dropout_p=DROPOUT_P, num_classes=NUM_CLASSES)
    interpretable_model.load_state_dict(torch.load(Path(dir, "model.pt"), map_location=device))
    interpretable_model.to(device)

    # Initialize trainer
    interpretable_trainer = Trainer(model=interpretable_model, device=device)

    # Get attention values
    attn_vals = interpretable_trainer.predict_step(dataloader)
    print (attn_vals.shape) # (N, max_seq_len)

    # Visualize the attention values for each token
    sns.set(rc={"figure.figsize":(10, 1)})
    tokens = tokenizer.sequences_to_texts(X)[0].split(" ")
    sns.heatmap(attn_vals, xticklabels=tokens)

    The word tennis received the most attention for producing the Sports label.

    Types of attention

    We'll briefly look at the different types of attention and when to use each of them.

    Soft (global) attention

    Soft attention is the type of attention we've implemented so far: we attend to all of the encoded inputs when creating our context vector (a rough sketch follows the pros and cons below).

    • Advantages: we always have the ability to attend to all inputs, in case something we saw earlier or later is critical for determining the output.
    • Disadvantages: if our input sequence is very long, this can lead to expensive compute.
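
    As a rough sketch of the idea (hypothetical shapes, and a dot-product score against a query vector rather than the learned $W_{attn}$ used above), global soft attention scores every encoded input and softmaxes over all of them:

    import torch
    import torch.nn.functional as F

    N, M, H = 64, 8, 128                     # batch size, max seq len, hidden dim (assumed)
    encoder_outputs = torch.randn(N, M, H)   # all of the encoded inputs
    query = torch.randn(N, H)                # e.g. a decoder state at the current time step

    # Soft (global) attention: score every position, softmax over all of them
    scores = torch.bmm(encoder_outputs, query.unsqueeze(2)).squeeze(2)    # (N, M)
    alpha = F.softmax(scores, dim=1)                                      # weights over all M inputs
    context = torch.bmm(alpha.unsqueeze(1), encoder_outputs).squeeze(1)   # (N, H)
    print (context.shape)                                                 # torch.Size([64, 128])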

    Hard attention

    Hard attention focuses on a specific set of the encoded inputs at each time step (see the sketch after the pros and cons below).

    • Advantages: we can save a lot of compute on long sequences by only attending to a local patch each time.
    • Disadvantages: not differentiable, so we need more involved techniques (variance reduction, reinforcement learning, etc.) to train.
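
    A minimal sketch of the idea (hypothetical shapes and scores): instead of taking a weighted sum, we sample a single position from the attention distribution, and it is this sampling step that breaks differentiability:

    import torch
    import torch.nn.functional as F

    N, M, H = 64, 8, 128
    encoder_outputs = torch.randn(N, M, H)   # hypothetical encoded inputs
    scores = torch.randn(N, M)               # scores from some alignment function (hypothetical)
    alpha = F.softmax(scores, dim=1)

    # Hard attention: sample ONE position per sample instead of taking a weighted sum
    idx = torch.multinomial(alpha, num_samples=1)                # (N, 1); sampling is not differentiable
    context = encoder_outputs[torch.arange(N), idx.squeeze(1)]   # (N, H) the single attended input
    print (context.shape)                                        # torch.Size([64, 128])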

    Local attention

    Local attention blends the advantages of soft and hard attention. It involves learning an aligned position vector and empirically determining a local window of encoded inputs to attend to (a sketch follows the pros and cons below).

    • Advantages: attends to a local patch of the inputs while remaining differentiable.
    • Disadvantages: we need to determine the aligned position for every output, but determining the right window of inputs to attend to is a worthwhile trade-off for not having to attend to all of them.
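
    Here is a rough sketch of Luong-style local attention (hypothetical shapes; the predicted position and the window half-width D are illustrative assumptions): we predict an aligned position from the query and only softmax over the encoded inputs inside that window:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N, M, H, D = 64, 30, 128, 4                  # D = half-width of the local window (assumed)
    encoder_outputs = torch.randn(N, M, H)       # hypothetical encoded inputs
    query = torch.randn(N, H)                    # e.g. a decoder state

    # Predict an aligned position p_t in [0, M-1] from the query
    w_p = nn.Linear(H, 1)
    p_t = (M - 1) * torch.sigmoid(w_p(query)).squeeze(1)                  # (N,)

    # Score all positions, then mask everything outside the window [p_t - D, p_t + D]
    scores = torch.bmm(encoder_outputs, query.unsqueeze(2)).squeeze(2)    # (N, M)
    positions = torch.arange(M).float().unsqueeze(0).to(scores.device)    # (1, M)
    outside = (positions - p_t.unsqueeze(1)).abs() > D                    # (N, M)
    scores = scores.masked_fill(outside, float("-inf"))
    alpha = F.softmax(scores, dim=1)                                      # weights only on the local window
    context = torch.bmm(alpha.unsqueeze(1), encoder_outputs).squeeze(1)   # (N, H)
    print (context.shape)                                                 # torch.Size([64, 128])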

    Self-attention

    We can also use attention within the encoded input sequence itself to create weighted representations based on the similarity between input pairs. This lets us create rich representations of the input sequence that are aware of the relationships between its elements. For example, in the figure below you can see that when composing the representation of the token "its", this particular attention head incorporates signal from the token "Law" (it has learned that "its" refers to the "Law").
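
    Below is a minimal sketch of (scaled dot-product) self-attention over an input sequence, with hypothetical shapes and randomly initialized projections rather than anything trained above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N, M, H = 64, 8, 128
    x = torch.randn(N, M, H)                     # encoded input sequence (hypothetical)

    # Project the inputs into queries, keys and values
    W_q, W_k, W_v = nn.Linear(H, H), nn.Linear(H, H), nn.Linear(H, H)
    Q, K, V = W_q(x), W_k(x), W_v(x)             # each (N, M, H)

    # Scaled dot-product self-attention: every token attends to every token
    scores = torch.bmm(Q, K.transpose(1, 2)) / (H ** 0.5)    # (N, M, M)
    alpha = F.softmax(scores, dim=-1)                         # each row sums to 1
    z = torch.bmm(alpha, V)                                   # (N, M, H) context-aware representations
    print (z.shape)                                           # torch.Size([64, 8, 128])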

  • Original post: https://blog.csdn.net/sikh_0529/article/details/126784755