speechbrain.lobes.models.transformer.TransformerLM module

An implementation of a Transformer language model.

Authors * Jianyuan Zhong * Samuele Cornell

Summary

Classes:

TransformerLM

This is an implementation of a Transformer language model.

Reference

class speechbrain.lobes.models.transformer.TransformerLM.TransformerLM(vocab, d_model=512, nhead=8, num_encoder_layers=12, num_decoder_layers=0, d_ffn=2048, dropout=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>, positional_encoding='fixed_abs_sine', normalize_before=False, d_embedding=None, max_length=2500, causal=True, attention_type='regularMHA', decoder_use_memory=False)[source]

Bases: TransformerInterface

This is an implementation of a Transformer language model.

The architecture is based on the paper "Attention Is All You Need": https://arxiv.org/pdf/1706.03762.pdf

Parameters:

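vocab (int) – Embedding vocabulary size.
d_model (int) – The number of expected features in the encoder/decoder inputs (default=512).
nhead (int) – The number of heads in the multi-head attention models (default=8).
num_encoder_layers (int) – The number of sub-encoder layers in the encoder (default=12).
num_decoder_layers (int) – The number of sub-decoder layers in the decoder (default=0).
d_ffn (int) – The dimension of the feedforward network model (default=2048).
dropout (float) – The dropout value (default=0.1).
activation (torch class) – The activation function of the encoder/decoder intermediate layer, passed as a class such as torch.nn.ReLU or torch.nn.GELU (default=torch.nn.ReLU).
positional_encoding (str) – Type of positional encoding (default="fixed_abs_sine").
normalize_before (bool) – Whether to normalize before each layer (default=False).
d_embedding (int) – Size of the embedding; if None, d_model is used (default=None).
max_length (int) – Maximum sequence length in tokens (default=2500).
causal (bool) – Whether to mask future positions so that each step attends only to earlier tokens (default=True).
attention_type (str) – Type of attention to use, "regularMHA" or "RelPosMHAXL" (default="regularMHA").
decoder_use_memory (bool) – Whether to use the hidden state in the decoder (default=False).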
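
To make the default positional_encoding value more concrete, here is a minimal sketch of the fixed sinusoidal encoding described in the paper cited above. It assumes that "fixed_abs_sine" refers to that scheme; the function name sinusoidal_positional_encoding is hypothetical, and the sketch uses plain Python lists rather than the library's tensors.

```python
import math


def sinusoidal_positional_encoding(max_length, d_model):
    """Return a max_length x d_model table of fixed sinusoidal encodings.

    Formulas from "Attention Is All You Need":
        PE[pos][2k]     = sin(pos / 10000 ** (2k / d_model))
        PE[pos][2k + 1] = cos(pos / 10000 ** (2k / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    table = []
    for pos in range(max_length):
        row = [0.0] * d_model
        # i walks over the even feature indices, so i plays the role of 2k.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


# Small table for illustration; the class defaults would be 2500 x 512.
pe = sinusoidal_positional_encoding(max_length=4, d_model=8)
print(len(pe), len(pe[0]))  # 4 8
print(pe[0][:2])            # [0.0, 1.0]
```

Because the table depends only on the position and the feature index, it can be computed once up to max_length and reused for every input sequence.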

Example

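>>> import torch
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> # Batch of 8 sequences, each with 120 token ids drawn from a vocabulary of 720.
>>> src = torch.randint(0, 720, [8, 120])
>>> # vocab=720, d_model=512, nhead=8, 1 encoder layer, 0 decoder layers, d_ffn=1024.
>>> net = TransformerLM(720, 512, 8, 1, 0, 1024, activation=torch.nn.GELU)
>>> enc_out = net.forward(src)
>>> print(enc_out.shape)
torch.Size([8, 120, 720])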

forward(src)[source]

Parameters: src (torch.Tensor) – The input sequence to the encoder (required).
Returns: pred – Output of the transformer.
Return type: torch.Tensor
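
As a rough illustration of how the returned pred might be consumed, the sketch below picks the highest-scoring vocabulary entry at the last time step of each sequence. It assumes the [batch, time, vocab] layout shown in the class example and assumes that the last position scores the next token; the helper greedy_next_tokens and the toy scores are hypothetical, and nested Python lists stand in for the tensor.

```python
def greedy_next_tokens(pred):
    """Return, per sequence, the index of the top-scoring token at the last step.

    pred is a nested list shaped [batch][time][vocab].
    """
    next_tokens = []
    for sequence_scores in pred:
        last_step = sequence_scores[-1]
        # Argmax over the vocabulary dimension.
        best = max(range(len(last_step)), key=lambda idx: last_step[idx])
        next_tokens.append(best)
    return next_tokens


# Made-up scores: batch of 2, 2 time steps, vocabulary of 3.
toy_pred = [
    [[0.1, 0.7, 0.2], [0.3, 0.1, 0.6]],
    [[0.5, 0.4, 0.1], [0.8, 0.1, 0.1]],
]
next_ids = greedy_next_tokens(toy_pred)
print(next_ids)  # [2, 0]
```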

make_masks(src, pad_idx=0, look_ahead_mask=True, padding_mask=True)[source]
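
The signature above suggests two kinds of mask: a padding mask derived from pad_idx and a look-ahead mask over time steps. The following sketch shows what such masks usually look like, under the assumption that True marks a position to be hidden. The helper names make_padding_mask and make_look_ahead_mask are hypothetical, and the values make_masks actually returns (for example boolean versus additive float masks) are not stated here.

```python
def make_padding_mask(tokens, pad_idx=0):
    """True marks padded positions that attention should ignore."""
    return [[tok == pad_idx for tok in seq] for seq in tokens]


def make_look_ahead_mask(length):
    """True at [i][j] marks a future position (j > i) hidden from step i."""
    return [[j > i for j in range(length)] for i in range(length)]


# Two sequences padded to length 4; 0 is the assumed padding index.
batch = [[5, 9, 3, 0], [7, 2, 0, 0]]
padding = make_padding_mask(batch, pad_idx=0)
look_ahead = make_look_ahead_mask(len(batch[0]))

print(padding[1])     # [False, False, True, True]
print(look_ahead[0])  # [False, True, True, True]
print(look_ahead[3])  # [False, False, False, False]
```

The padding mask has one row per sequence, while the look-ahead mask is a single length-by-length matrix shared by the whole batch.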