speechbrain.wordemb.transformer module
A convenience wrapper for word embeddings retrieved from HuggingFace transformers (e.g. BERT).
Authors
* Artem Ploujnikov 2021
Summary
Exceptions:
Raised when HuggingFace Transformers is not installed
Classes:
TransformerWordEmbeddings – A wrapper to retrieve word embeddings from pretrained HuggingFace Transformers (e.g. BERT) models
Reference
- class speechbrain.wordemb.transformer.TransformerWordEmbeddings(model, tokenizer=None, layers=None, device=None)[source]
Bases: Module

A wrapper to retrieve word embeddings from a pretrained Transformer model from HuggingFace Transformers (e.g. BERT)
- Parameters:
Example
NOTE: Doctests are disabled here because the dependency on the HuggingFace transformers library is optional.
>>> from transformers import AutoTokenizer, AutoModel
>>> from speechbrain.wordemb.transformer import TransformerWordEmbeddings
>>> model_name = "bert-base-uncased"
>>> tokenizer = AutoTokenizer.from_pretrained(
...     model_name, return_tensors='pt')
>>> model = AutoModel.from_pretrained(
...     model_name,
...     output_hidden_states=True)
>>> word_emb = TransformerWordEmbeddings(
...     model=model,
...     layers=4,
...     tokenizer=tokenizer
... )
>>> embedding = word_emb.embedding(
...     sentence="THIS IS A TEST SENTENCE",
...     word="TEST"
... )
>>> embedding[:8]
tensor([ 3.4332, -3.6702,  0.5152, -1.9301,  0.9197,  2.1628, -0.2841,
        -0.3549])
>>> embeddings = word_emb.embeddings("This is cool")
>>> embeddings.shape
torch.Size([3, 768])
>>> embeddings[:, :3]
tensor([[-2.9078,  1.2496,  0.7269],
        [-0.9940, -0.6960,  1.4350],
        [-1.2401, -3.8237,  0.2739]])
>>> sentences = [
...     "This is the first test sentence",
...     "This is the second test sentence",
...     "A quick brown fox jumped over the lazy dog"
... ]
>>> batch_embeddings = word_emb.batch_embeddings(sentences)
>>> batch_embeddings.shape
torch.Size([3, 9, 768])
>>> batch_embeddings[:, :2, :3]
tensor([[[-5.0935, -1.2838,  0.7868],
         [-4.6889, -2.1488,  2.1380]],

        [[-4.4993, -2.0178,  0.9369],
         [-4.1760, -2.4141,  1.9474]],

        [[-1.0065,  1.4227, -2.6671],
         [-0.3408, -0.6238,  0.1780]]])
- MSG_WORD = "'word' should be either a word or the index of a word"
- DEFAULT_LAYERS = 4
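A minimal sketch of relying on the default layer count, reusing the model and tokenizer objects from the example above; that omitting layers falls back to DEFAULT_LAYERS is an assumption inferred from the constant, not stated explicitly in this reference.

>>> word_emb_default = TransformerWordEmbeddings(
...     model=model,
...     tokenizer=tokenizer
... )  # 'layers' omitted; assumed to fall back to DEFAULT_LAYERS (4)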
- embeddings(sentence)[source]
Returns the model embeddings for all the words in a sentence
- Parameters:
sentence (str) – a sentence
- Returns:
emb – a tensor of all the word embeddings
- Return type:
torch.Tensor
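A short usage sketch for embeddings(), reusing the word_emb instance constructed in the example above; the shape shown is an assumption extrapolated from the "This is cool" example (one 768-dimensional vector per word for bert-base-uncased).

>>> emb = word_emb.embeddings("A quick brown fox")
>>> emb.shape  # assumed: one row per word
torch.Size([4, 768])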