speechbrain.decoders.scorer module
Token scorer abstraction and specifications.
- Authors:
Adel Moumen 2022, 2023
Sung-Lin Yeh 2021
Summary
Classes:
BaseRescorerInterface – A scorer abstraction intended for inheritance by other rescoring approaches used in beam search.
BaseScorerInterface – A scorer abstraction to be inherited by other scoring approaches for beam search.
CTCScorer – A wrapper of CTCPrefixScore based on BaseScorerInterface.
CoverageScorer – A coverage penalty scorer to prevent looping of hypotheses.
HuggingFaceLMRescorer – A wrapper of HuggingFace's TransformerLM based on BaseRescorerInterface.
KenLMScorer – KenLM N-gram scorer.
LengthScorer – A length-reward scorer.
RNNLMRescorer – A wrapper of RNNLM based on BaseRescorerInterface.
RNNLMScorer – A wrapper of RNNLM based on BaseScorerInterface.
RescorerBuilder – Builds a rescorer instance for beamsearch.
ScorerBuilder – Builds a scorer instance for beamsearch.
TransformerLMRescorer – A wrapper of TransformerLM based on BaseRescorerInterface.
TransformerLMScorer – A wrapper of TransformerLM based on BaseScorerInterface.
Reference
- class speechbrain.decoders.scorer.BaseScorerInterface[source]
Bases: object
A scorer abstraction to be inherited by other scoring approaches for beam search.
A scorer is a module that scores tokens in the vocabulary based on the current timestep input and the previous scorer states. It can be used to score on the full vocabulary set (i.e., full scorers) or on a pruned set of tokens (i.e., partial scorers) to avoid computational overhead. In the latter case, the partial scorers are called after the full scorers and only score the top-k candidates (i.e., the pruned set of tokens) extracted from the full scorers. The top-k candidates are extracted based on the beam size and the scorer_beam_scale, such that the number of candidates is int(beam_size * scorer_beam_scale). This is useful when the scorer is computationally expensive (e.g., the KenLM scorer).
Inherit this class to implement your own scorer compatible with speechbrain.decoders.seq2seq.S2SBeamSearcher(); a minimal sketch follows below.
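As an illustration, here is a minimal, hypothetical scorer following this interface. The RepeatPenaltyScorer class below is not part of SpeechBrain: it penalizes emitting the same token twice in a row, and shows how the candidates argument switches between full-vocabulary and pruned scoring. The memory permutation/reset hooks (used by ScorerBuilder, described later on this page) are shown as no-ops under the assumption that this scorer is stateless.

import torch

class RepeatPenaltyScorer:
    """Hypothetical scorer: discourages repeating the previous token."""

    def __init__(self, vocab_size, penalty=-1.0):
        self.vocab_size = vocab_size
        self.penalty = penalty

    def score(self, inp_tokens, memory, candidates, attn):
        n = inp_tokens.size(0)  # batch_size x beam_size
        scores = torch.zeros(n, self.vocab_size, device=inp_tokens.device)
        # penalize re-emitting the token produced at the previous step
        scores[torch.arange(n), inp_tokens] = self.penalty
        if candidates is not None:
            # partial-scoring call: only the pruned top-k tokens matter,
            # so zero out everything outside the candidate columns
            mask = torch.zeros_like(scores)
            mask.scatter_(1, candidates, 1.0)
            scores = scores * mask
        return scores, None  # this scorer keeps no memory

    def permute_mem(self, memory, index):  # stateless: nothing to permute
        return memory

    def reset_mem(self, x, enc_lens):  # stateless: nothing to reset
        return None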
- See:
speechbrain.decoders.scorer.CTCPrefixScorer
speechbrain.decoders.scorer.RNNLMScorer
speechbrain.decoders.scorer.TransformerLMScorer
speechbrain.decoders.scorer.KenLMScorer
speechbrain.decoders.scorer.CoverageScorer
speechbrain.decoders.scorer.LengthScorer
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the information at the current timestep.
The score is a tensor of shape (batch_size x beam_size, vocab_size). It is the log-probability of the next token given the current timestep input and the previous scorer states.
It can be used to score the pruned top-k candidates to avoid computational overhead, or to score the full vocabulary set when candidates is None.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, the scorer will score on the full vocabulary set.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
- Returns:
torch.Tensor – (batch_size x beam_size, vocab_size), the scores for the next tokens.
memory (No limit) – The memory variables input for this timestep.
- class speechbrain.decoders.scorer.CTCScorer(ctc_fc, blank_index, eos_index, ctc_window_size=0)[source]
Bases: BaseScorerInterface
A wrapper of CTCPrefixScore based on BaseScorerInterface.
This scorer is used to provide the CTC label-synchronous scores of the next input tokens. The implementation is based on https://www.merl.com/publications/docs/TR2017-190.pdf.
- See:
speechbrain.decoders.scorer.CTCPrefixScore
- Parameters:
ctc_fc (torch.nn.Module) – An output linear layer for CTC.
blank_index (int) – The index of the blank token.
eos_index (int) – The index of the end-of-sequence (eos) token.
ctc_window_size (int) – Compute the CTC scores over the time frames using windowing based on attention peaks. If 0, no windowing is applied. (default: 0)
Example
>>> import torch
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR
>>> from speechbrain.decoders import S2STransformerBeamSearcher, CTCScorer, ScorerBuilder
>>> batch_size=8
>>> n_channels=6
>>> input_size=40
>>> d_model=128
>>> tgt_vocab=140
>>> src = torch.rand([batch_size, n_channels, input_size])
>>> tgt = torch.randint(0, tgt_vocab, [batch_size, n_channels])
>>> net = TransformerASR(
...     tgt_vocab, input_size, d_model, 8, 1, 1, 1024, activation=torch.nn.GELU
... )
>>> ctc_lin = Linear(input_shape=(1, 40, d_model), n_neurons=tgt_vocab)
>>> lin = Linear(input_shape=(1, 40, d_model), n_neurons=tgt_vocab)
>>> eos_index = 2
>>> ctc_scorer = CTCScorer(
...     ctc_fc=ctc_lin,
...     blank_index=0,
...     eos_index=eos_index,
... )
>>> scorer = ScorerBuilder(
...     full_scorers=[ctc_scorer],
...     weights={'ctc': 1.0}
... )
>>> searcher = S2STransformerBeamSearcher(
...     modules=[net, lin],
...     bos_index=1,
...     eos_index=eos_index,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     using_eos_threshold=False,
...     beam_size=7,
...     temperature=1.15,
...     scorer=scorer
... )
>>> enc, dec = net.forward(src, tgt)
>>> hyps, _, _, _ = searcher(enc, torch.ones(batch_size))
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the CTC scores computed over the time frames.
- See:
speechbrain.decoders.scorer.CTCPrefixScore
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, the scorer will score on the full vocabulary set.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
- Returns:
scores (torch.Tensor)
memory
- class speechbrain.decoders.scorer.RNNLMScorer(language_model, temperature=1.0)[source]
Bases: BaseScorerInterface
A wrapper of RNNLM based on BaseScorerInterface.
The RNNLMScorer is used to provide the RNNLM scores of the next input tokens based on the current timestep input and the previous scorer states.
- Parameters:
language_model (torch.nn.Module) – A RNN-based language model.
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1 (default: 1.0); see the sketch below.
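As a quick illustration of the temperature factor (this shows standard softmax temperature behavior, not SpeechBrain-specific code): the logits are divided by T before the softmax, so T>1 flattens the distribution and T<1 sharpens it.

import torch

logits = torch.tensor([2.0, 1.0, 0.1])
for T in (0.5, 1.0, 2.0):
    # dividing by T>1 softens the distribution; T<1 sharpens it
    print(T, torch.softmax(logits / T, dim=-1))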
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
>>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> emb = torch.nn.Embedding(
...     embedding_dim=input_size,
...     num_embeddings=vocab_size,
... )
>>> d_model=7
>>> dec = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="content",
...     hidden_size=3,
...     attn_dim=3,
...     num_layers=1,
...     enc_dim=d_model,
...     input_size=input_size,
... )
>>> n_channels=3
>>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
>>> lm_weight = 0.4
>>> lm_model = RNNLM(
...     embedding_dim=d_model,
...     output_neurons=vocab_size,
...     dropout=0.0,
...     rnn_neurons=128,
...     dnn_neurons=64,
...     return_hidden=True,
... )
>>> rnnlm_scorer = RNNLMScorer(
...     language_model=lm_model,
...     temperature=1.25,
... )
>>> scorer = ScorerBuilder(
...     full_scorers=[rnnlm_scorer],
...     weights={'rnnlm': lm_weight}
... )
>>> beam_size=5
>>> searcher = S2SRNNBeamSearcher(
...     embedding=emb,
...     decoder=dec,
...     linear=seq_lin,
...     bos_index=1,
...     eos_index=2,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     topk=2,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.25,
...     scorer=scorer
... )
>>> batch_size=2
>>> enc = torch.rand([batch_size, n_channels, d_model])
>>> wav_len = torch.ones([batch_size])
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the RNNLM scores computed over the previous tokens.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, the scorer will score on the full vocabulary set.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
- Returns:
log_probs (torch.Tensor) – The output log-probabilities.
hs (torch.Tensor) – The hidden states of the language model.
- class speechbrain.decoders.scorer.TransformerLMScorer(language_model, temperature=1.0)[source]
Bases: BaseScorerInterface
A wrapper of TransformerLM based on BaseScorerInterface.
The TransformerLMScorer is used to provide the TransformerLM scores of the next input tokens based on the current timestep input and the previous scorer states.
- Parameters:
language_model (torch.nn.Module) – A Transformer-based language model.
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1. (default: 1.0)
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> from speechbrain.decoders import S2STransformerBeamSearcher, TransformerLMScorer, CTCScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> d_model=128
>>> net = TransformerASR(
...     tgt_vocab=vocab_size,
...     input_size=input_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=1,
...     d_ffn=256,
...     activation=torch.nn.GELU
... )
>>> lm_model = TransformerLM(
...     vocab=vocab_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=0,
...     d_ffn=256,
...     activation=torch.nn.GELU,
... )
>>> n_channels=6
>>> ctc_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> seq_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> eos_index = 2
>>> ctc_scorer = CTCScorer(
...     ctc_fc=ctc_lin,
...     blank_index=0,
...     eos_index=eos_index,
... )
>>> transformerlm_scorer = TransformerLMScorer(
...     language_model=lm_model,
...     temperature=1.15,
... )
>>> ctc_weight_decode=0.4
>>> lm_weight=0.6
>>> scorer = ScorerBuilder(
...     full_scorers=[transformerlm_scorer, ctc_scorer],
...     weights={'transformerlm': lm_weight, 'ctc': ctc_weight_decode}
... )
>>> beam_size=5
>>> searcher = S2STransformerBeamSearcher(
...     modules=[net, seq_lin],
...     bos_index=1,
...     eos_index=eos_index,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.15,
...     scorer=scorer
... )
>>> batch_size=2
>>> wav_len = torch.ones([batch_size])
>>> src = torch.rand([batch_size, n_channels, input_size])
>>> tgt = torch.randint(0, vocab_size, [batch_size, n_channels])
>>> enc, dec = net.forward(src, tgt)
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the TransformerLM scores computed over the previous tokens.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, the scorer will score on the full vocabulary set.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
- Returns:
log_probs (torch.Tensor)
memory
- class speechbrain.decoders.scorer.KenLMScorer(lm_path, vocab_size, token_list)[source]
Bases: BaseScorerInterface
KenLM N-gram scorer.
This scorer is based on KenLM, a fast and efficient N-gram language model toolkit. It is used to provide the n-gram scores of the next input tokens.
This scorer depends on the KenLM package. It can be installed with the following command:
> pip install https://github.com/kpu/kenlm/archive/master.zip
Note: the KenLM scorer is computationally expensive. It is recommended to use it as a partial scorer, scoring only the top-k candidates instead of the full vocabulary set; a sketch of this setup follows below.
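Building on the commented example below, here is a hedged sketch of that recommended setup. The variables vocab_size, token_list, and coverage_scorer are assumed to be defined as in the surrounding examples (coverage_scorer stands in for any cheap full scorer), and the lm_path value is a placeholder.

# A sketch only: pass the KenLM scorer as a *partial* scorer so that it is
# called after the full scorers and only scores the pruned top-k candidates,
# i.e. int(beam_size * scorer_beam_scale) tokens per beam.
kenlm_scorer = KenLMScorer(
    lm_path="path/to/kenlm_model.arpa",  # or .bin
    vocab_size=vocab_size,
    token_list=token_list,
)
scorer = ScorerBuilder(
    full_scorers=[coverage_scorer],      # cheap scorer(s), full vocabulary
    partial_scorers=[kenlm_scorer],      # expensive KenLM, top-k only
    weights={"coverage": 1.0, "kenlm": 0.4},
    scorer_beam_scale=1.5,
)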
Example
# >>> from speechbrain.nnet.linear import Linear
# >>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
# >>> from speechbrain.decoders import S2SRNNBeamSearcher, KenLMScorer, ScorerBuilder
# >>> input_size=17
# >>> vocab_size=11
# >>> lm_path='path/to/kenlm_model.arpa' # or .bin
# >>> token_list=['<pad>', '<bos>', '<eos>', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
# >>> emb = torch.nn.Embedding(
# ...     embedding_dim=input_size,
# ...     num_embeddings=vocab_size,
# ... )
# >>> d_model=7
# >>> dec = AttentionalRNNDecoder(
# ...     rnn_type="gru",
# ...     attn_type="content",
# ...     hidden_size=3,
# ...     attn_dim=3,
# ...     num_layers=1,
# ...     enc_dim=d_model,
# ...     input_size=input_size,
# ... )
# >>> n_channels=3
# >>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
# >>> kenlm_weight = 0.4
# >>> kenlm_model = KenLMScorer(
# ...     lm_path=lm_path,
# ...     vocab_size=vocab_size,
# ...     token_list=token_list,
# ... )
# >>> scorer = ScorerBuilder(
# ...     full_scorers=[kenlm_model],
# ...     weights={'kenlm': kenlm_weight}
# ... )
# >>> beam_size=5
# >>> searcher = S2SRNNBeamSearcher(
# ...     embedding=emb,
# ...     decoder=dec,
# ...     linear=seq_lin,
# ...     bos_index=1,
# ...     eos_index=2,
# ...     min_decode_ratio=0.0,
# ...     max_decode_ratio=1.0,
# ...     topk=2,
# ...     using_eos_threshold=False,
# ...     beam_size=beam_size,
# ...     temperature=1.25,
# ...     scorer=scorer
# ... )
# >>> batch_size=2
# >>> enc = torch.rand([batch_size, n_channels, d_model])
# >>> wav_len = torch.ones([batch_size])
# >>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the n-gram scores.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, the scorer will score on the full vocabulary set.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
- Returns:
scores (torch.Tensor)
(new_memory, new_scoring_table) (tuple)
- class speechbrain.decoders.scorer.CoverageScorer(vocab_size, threshold=0.5)[source]
Bases: BaseScorerInterface
A coverage penalty scorer to prevent looping of hypotheses, where `coverage` is the cumulative attention probability vector. Reference: https://arxiv.org/pdf/1612.02695.pdf
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
>>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, CoverageScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> emb = torch.nn.Embedding(
...     num_embeddings=vocab_size,
...     embedding_dim=input_size
... )
>>> d_model=7
>>> dec = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="content",
...     hidden_size=3,
...     attn_dim=3,
...     num_layers=1,
...     enc_dim=d_model,
...     input_size=input_size,
... )
>>> n_channels=3
>>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
>>> lm_weight = 0.4
>>> coverage_penalty = 1.0
>>> lm_model = RNNLM(
...     embedding_dim=d_model,
...     output_neurons=vocab_size,
...     dropout=0.0,
...     rnn_neurons=128,
...     dnn_neurons=64,
...     return_hidden=True,
... )
>>> rnnlm_scorer = RNNLMScorer(
...     language_model=lm_model,
...     temperature=1.25,
... )
>>> coverage_scorer = CoverageScorer(vocab_size=vocab_size)
>>> scorer = ScorerBuilder(
...     full_scorers=[rnnlm_scorer, coverage_scorer],
...     weights={'rnnlm': lm_weight, 'coverage': coverage_penalty}
... )
>>> beam_size=5
>>> searcher = S2SRNNBeamSearcher(
...     embedding=emb,
...     decoder=dec,
...     linear=seq_lin,
...     bos_index=1,
...     eos_index=2,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     topk=2,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.25,
...     scorer=scorer
... )
>>> batch_size=2
>>> enc = torch.rand([batch_size, n_channels, d_model])
>>> wav_len = torch.ones([batch_size])
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, coverage, candidates, attn)[source]
This method scores the new beams based on the coverage scorer.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
coverage (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, the scorer will score on the full vocabulary set.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
- Returns:
score (torch.Tensor)
coverage
- class speechbrain.decoders.scorer.LengthScorer(vocab_size)[source]
Bases: BaseScorerInterface
A length-reward scorer.
The LengthScorer is used to provide a length-reward score. It prevents beam search from favoring short hypotheses.
Note: length_normalization is not compatible with this scorer. Make sure to set it to False when using the LengthScorer.
- Parameters:
vocab_size (int) – The total number of tokens.
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.nnet.RNN import AttentionalRNNDecoder
>>> from speechbrain.decoders import S2SRNNBeamSearcher, RNNLMScorer, LengthScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> emb = torch.nn.Embedding(
...     num_embeddings=vocab_size,
...     embedding_dim=input_size
... )
>>> d_model=7
>>> dec = AttentionalRNNDecoder(
...     rnn_type="gru",
...     attn_type="content",
...     hidden_size=3,
...     attn_dim=3,
...     num_layers=1,
...     enc_dim=d_model,
...     input_size=input_size,
... )
>>> n_channels=3
>>> seq_lin = Linear(input_shape=[d_model, n_channels], n_neurons=vocab_size)
>>> lm_weight = 0.4
>>> length_weight = 1.0
>>> lm_model = RNNLM(
...     embedding_dim=d_model,
...     output_neurons=vocab_size,
...     dropout=0.0,
...     rnn_neurons=128,
...     dnn_neurons=64,
...     return_hidden=True,
... )
>>> rnnlm_scorer = RNNLMScorer(
...     language_model=lm_model,
...     temperature=1.25,
... )
>>> length_scorer = LengthScorer(vocab_size=vocab_size)
>>> scorer = ScorerBuilder(
...     full_scorers=[rnnlm_scorer, length_scorer],
...     weights={'rnnlm': lm_weight, 'length': length_weight}
... )
>>> beam_size=5
>>> searcher = S2SRNNBeamSearcher(
...     embedding=emb,
...     decoder=dec,
...     linear=seq_lin,
...     bos_index=1,
...     eos_index=2,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     topk=2,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     temperature=1.25,
...     length_normalization=False,
...     scorer=scorer
... )
>>> batch_size=2
>>> enc = torch.rand([batch_size, n_channels, d_model])
>>> wav_len = torch.ones([batch_size])
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, candidates, attn)[source]
This method scores the new beams based on the length scorer.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (No limit) – The scorer states for this timestep.
candidates (torch.Tensor) – (batch_size x beam_size, scorer_beam_size). The top-k candidates to be scored after the full scorers. If None, the scorer will score on the full vocabulary set.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
- Returns:
torch.Tensor – The scores.
None
- class speechbrain.decoders.scorer.ScorerBuilder(weights={}, full_scorers=[], partial_scorers=[], scorer_beam_scale=2)[source]
Bases: object
Builds a scorer instance for beamsearch.
The ScorerBuilder class is responsible for building a scorer instance for beam search. It takes weights for full and partial scorers, as well as instances of full and partial scorer classes. It combines the scorers based on the specified weights and provides methods for scoring tokens, permuting scorer memory, and resetting scorer memory.
This is the class to be used for building scorer instances for beam search.
See speechbrain.decoders.seq2seq.S2SBeamSearcher()
- Parameters:
weights (dict) – Weights of the specified full/partial scorers.
full_scorers (list) – Scorers that score on the full vocabulary set.
partial_scorers (list) – Scorers that score on the pruned tokens.
scorer_beam_scale (float) – The scale that determines the number of pruned tokens for the partial scorers: int(beam_size * scorer_beam_scale).
Example
>>> from speechbrain.nnet.linear import Linear
>>> from speechbrain.lobes.models.transformer.TransformerASR import TransformerASR
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> from speechbrain.decoders import S2STransformerBeamSearcher, TransformerLMScorer, CoverageScorer, CTCScorer, ScorerBuilder
>>> input_size=17
>>> vocab_size=11
>>> d_model=128
>>> net = TransformerASR(
...     tgt_vocab=vocab_size,
...     input_size=input_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=1,
...     d_ffn=256,
...     activation=torch.nn.GELU
... )
>>> lm_model = TransformerLM(
...     vocab=vocab_size,
...     d_model=d_model,
...     nhead=8,
...     num_encoder_layers=1,
...     num_decoder_layers=0,
...     d_ffn=256,
...     activation=torch.nn.GELU,
... )
>>> n_channels=6
>>> ctc_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> seq_lin = Linear(input_size=d_model, n_neurons=vocab_size)
>>> eos_index = 2
>>> ctc_scorer = CTCScorer(
...     ctc_fc=ctc_lin,
...     blank_index=0,
...     eos_index=eos_index,
... )
>>> transformerlm_scorer = TransformerLMScorer(
...     language_model=lm_model,
...     temperature=1.15,
... )
>>> coverage_scorer = CoverageScorer(vocab_size=vocab_size)
>>> ctc_weight_decode=0.4
>>> lm_weight=0.6
>>> coverage_penalty = 1.0
>>> scorer = ScorerBuilder(
...     full_scorers=[transformerlm_scorer, coverage_scorer],
...     partial_scorers=[ctc_scorer],
...     weights={'transformerlm': lm_weight, 'ctc': ctc_weight_decode, 'coverage': coverage_penalty}
... )
>>> beam_size=5
>>> searcher = S2STransformerBeamSearcher(
...     modules=[net, seq_lin],
...     bos_index=1,
...     eos_index=eos_index,
...     min_decode_ratio=0.0,
...     max_decode_ratio=1.0,
...     using_eos_threshold=False,
...     beam_size=beam_size,
...     topk=3,
...     temperature=1.15,
...     scorer=scorer
... )
>>> batch_size=2
>>> wav_len = torch.ones([batch_size])
>>> src = torch.rand([batch_size, n_channels, input_size])
>>> tgt = torch.randint(0, vocab_size, [batch_size, n_channels])
>>> enc, dec = net.forward(src, tgt)
>>> hyps, _, _, _ = searcher(enc, wav_len)
- score(inp_tokens, memory, attn, log_probs, beam_size)[source]
This method scores tokens in the vocabulary based on the defined full scorers and partial scorers. The scores are added to the beam-search log-probabilities; a conceptual sketch follows the Returns list below.
- Parameters:
inp_tokens (torch.Tensor) – The input tensor of the current timestep.
memory (dict[str, scorer memory]) – The states of the scorers for this timestep.
attn (torch.Tensor) – The attention weights to be used in CoverageScorer or CTCScorer.
log_probs (torch.Tensor) – (batch_size x beam_size, vocab_size). The log-probabilities at the current timestep.
beam_size (int) – The beam size.
- Returns:
log_probs (torch.Tensor) – (batch_size x beam_size, vocab_size). The log-probabilities updated by the scorers.
new_memory (dict[str, scorer memory]) – The updated states of the scorers.
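To make the full/partial split concrete, here is a conceptual sketch of this scoring step. It is not the actual ScorerBuilder internals, just the flow documented above: full scorers see the whole vocabulary, then the top-k candidates are pruned and handed to the partial scorers. The helper name combined_score and the dict-based scorer containers are illustrative only.

import torch

def combined_score(log_probs, full_scorers, partial_scorers, weights,
                   inp_tokens, memory, attn, beam_size, scorer_beam_scale=2.0):
    # full scorers: score the entire vocabulary (candidates=None)
    for name, scorer in full_scorers.items():
        scores, memory[name] = scorer.score(inp_tokens, memory[name], None, attn)
        log_probs = log_probs + weights[name] * scores
    # prune to the top-k candidates for the expensive partial scorers
    _, candidates = log_probs.topk(int(beam_size * scorer_beam_scale), dim=-1)
    for name, scorer in partial_scorers.items():
        scores, memory[name] = scorer.score(inp_tokens, memory[name], candidates, attn)
        log_probs = log_probs + weights[name] * scores
    return log_probs, memory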
- class speechbrain.decoders.scorer.BaseRescorerInterface[source]
Bases: BaseScorerInterface
A scorer abstraction intended for inheritance by other rescoring approaches used in beam search.
In this approach, a neural network is employed to assign scores to potential text transcripts. The beam search decoding process produces a collection of top-K hypotheses. These candidates are then sent to a language model (LM) for ranking: the LM assigns a score to each candidate.
The score is computed as follows:
score = beam_search_score + lm_weight * rescorer_score
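A small worked example of this formula (the numbers are illustrative only, loosely modeled on the rescorer examples below):

lm_weight = 0.5
beam_search_scores = [-2.0, -2.1, -2.3]  # from beam search, one per hypothesis
rescorer_scores = [-17.9, -25.1, -26.1]  # log-probs assigned by the LM

final_scores = [b + lm_weight * r
                for b, r in zip(beam_search_scores, rescorer_scores)]
# hypotheses are then re-ranked by final_scores (closer to 0 is better)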
- See:
speechbrain.decoders.scorer.RNNLMRescorer
speechbrain.decoders.scorer.TransformerLMRescorer
speechbrain.decoders.scorer.HuggingFaceLMRescorer
- class speechbrain.decoders.scorer.RNNLMRescorer(language_model, tokenizer, device='cuda', temperature=1.0, bos_index=0, eos_index=0, pad_index=0)[source]
Bases: BaseRescorerInterface
A wrapper of RNNLM based on BaseRescorerInterface.
- Parameters:
language_model (torch.nn.Module) – A RNN-based language model.
tokenizer (SentencePieceProcessor) – A SentencePiece tokenizer.
device (str) – The device to move the scorer to.
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1. (default: 1.0)
bos_index (int) – The index of the beginning-of-sequence (bos) token.
eos_index (int) – The index of the end-of-sequence (eos) token.
pad_index (int) – The index of the padding token.
Note
This class is intended to be used with a pretrained RNNLM model. Please see: https://huggingface.co/speechbrain/asr-crdnn-rnnlm-librispeech
By default, this model uses a SentencePiece tokenizer.
Example
>>> import torch
>>> from sentencepiece import SentencePieceProcessor
>>> from speechbrain.lobes.models.RNNLM import RNNLM
>>> from speechbrain.utils.parameter_transfer import Pretrainer
>>> source = "speechbrain/asr-crdnn-rnnlm-librispeech"
>>> lm_model_path = source + "/lm.ckpt"
>>> tokenizer_path = source + "/tokenizer.ckpt"
>>> # define your tokenizer and RNNLM from the HF hub
>>> tokenizer = SentencePieceProcessor()
>>> lm_model = RNNLM(
...     output_neurons = 1000,
...     embedding_dim = 128,
...     activation = torch.nn.LeakyReLU,
...     dropout = 0.0,
...     rnn_layers = 2,
...     rnn_neurons = 2048,
...     dnn_blocks = 1,
...     dnn_neurons = 512,
...     return_hidden = True,
... )
>>> pretrainer = Pretrainer(
...     collect_in = getfixture("tmp_path"),
...     loadables = {
...         "lm" : lm_model,
...         "tokenizer" : tokenizer,
...     },
...     paths = {
...         "lm" : lm_model_path,
...         "tokenizer" : tokenizer_path,
...     })
>>> _ = pretrainer.collect_files()
>>> pretrainer.load_collected()
>>> from speechbrain.decoders.scorer import RNNLMRescorer, RescorerBuilder
>>> rnnlm_rescorer = RNNLMRescorer(
...     language_model = lm_model,
...     tokenizer = tokenizer,
...     temperature = 1.0,
...     bos_index = 0,
...     eos_index = 0,
...     pad_index = 0,
... )
>>> # Define a rescorer builder
>>> rescorer = RescorerBuilder(
...     rescorers=[rnnlm_rescorer],
...     weights={"rnnlm":1.0}
... )
>>> # topk hyps
>>> topk_hyps = [["HELLO", "HE LLO", "H E L L O"]]
>>> topk_scores = [[-2, -2, -2]]
>>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores)
>>> # NOTE: the returned hypotheses are already sorted by score.
>>> rescored_hyps
[['HELLO', 'H E L L O', 'HE LLO']]
>>> # NOTE: as we are returning log-probs, the closer to 0, the better.
>>> rescored_scores
[[-17.863974571228027, -25.12890625, -26.075977325439453]]
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device provided in the constructor.
- Parameters:
device (str) – The device to move the scorer to.
- class speechbrain.decoders.scorer.TransformerLMRescorer(language_model, tokenizer, device='cuda', temperature=1.0, bos_index=0, eos_index=0, pad_index=0)[source]
Bases: BaseRescorerInterface
A wrapper of TransformerLM based on BaseRescorerInterface.
- Parameters:
language_model (torch.nn.Module) – A Transformer-based language model.
tokenizer (SentencePieceProcessor) – A SentencePiece tokenizer.
device (str) – The device to move the scorer to.
temperature (float) – Temperature factor applied to the softmax. It changes the probability distribution, being softer when T>1 and sharper with T<1. (default: 1.0)
bos_index (int) – The index of the beginning-of-sequence (bos) token.
eos_index (int) – The index of the end-of-sequence (eos) token.
pad_index (int) – The index of the padding token.
Note
This class is intended to be used with a pretrained TransformerLM model. Please see: https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech
By default, this model uses a SentencePiece tokenizer.
Example
>>> import torch
>>> from sentencepiece import SentencePieceProcessor
>>> from speechbrain.lobes.models.transformer.TransformerLM import TransformerLM
>>> from speechbrain.utils.parameter_transfer import Pretrainer
>>> source = "speechbrain/asr-transformer-transformerlm-librispeech"
>>> lm_model_path = source + "/lm.ckpt"
>>> tokenizer_path = source + "/tokenizer.ckpt"
>>> tokenizer = SentencePieceProcessor()
>>> lm_model = TransformerLM(
...     vocab=5000,
...     d_model=768,
...     nhead=12,
...     num_encoder_layers=12,
...     num_decoder_layers=0,
...     d_ffn=3072,
...     dropout=0.0,
...     activation=torch.nn.GELU,
...     normalize_before=False,
... )
>>> pretrainer = Pretrainer(
...     collect_in = getfixture("tmp_path"),
...     loadables={
...         "lm": lm_model,
...         "tokenizer": tokenizer,
...     },
...     paths={
...         "lm": lm_model_path,
...         "tokenizer": tokenizer_path,
...     }
... )
>>> _ = pretrainer.collect_files()
>>> pretrainer.load_collected()
>>> from speechbrain.decoders.scorer import TransformerLMRescorer, RescorerBuilder
>>> transformerlm_rescorer = TransformerLMRescorer(
...     language_model=lm_model,
...     tokenizer=tokenizer,
...     temperature=1.0,
...     bos_index=1,
...     eos_index=2,
...     pad_index=0,
... )
>>> rescorer = RescorerBuilder(
...     rescorers=[transformerlm_rescorer],
...     weights={"transformerlm": 1.0}
... )
>>> topk_hyps = [["HELLO", "HE LLO", "H E L L O"]]
>>> topk_scores = [[-2, -2, -2]]
>>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores)
>>> # NOTE: the returned hypotheses are already sorted by score.
>>> rescored_hyps
[['HELLO', 'H E L L O', 'HE LLO']]
>>> # NOTE: as we are returning log-probs, the closer to 0, the better.
>>> rescored_scores
[[-17.863974571228027, -25.12890625, -26.075977325439453]]
- to_device(device=None)[source]
此方法将评分器移动到设备上。
如果设备为None,则评分器将移动到构造函数中提供的默认设备。
当阶段等于TEST时,此方法会在配方中动态调用。
- Parameters:
device (str) – The device to move the scorer to.
- class speechbrain.decoders.scorer.HuggingFaceLMRescorer(model_name, device='cuda')[source]
Bases: BaseRescorerInterface
A wrapper of HuggingFace's TransformerLM based on BaseRescorerInterface.
Example
>>> from speechbrain.decoders.scorer import HuggingFaceLMRescorer, RescorerBuilder
>>> source = "gpt2-medium"
>>> huggingfacelm_rescorer = HuggingFaceLMRescorer(
...     model_name=source,
... )
>>> rescorer = RescorerBuilder(
...     rescorers=[huggingfacelm_rescorer],
...     weights={"huggingfacelm": 1.0}
... )
>>> topk_hyps = [["Hello everyone.", "Hell o every one.", "Hello every one"]]
>>> topk_scores = [[-2, -2, -2]]
>>> rescored_hyps, rescored_scores = rescorer.rescore(topk_hyps, topk_scores)
>>> # NOTE: the returned hypotheses are already sorted by score.
>>> rescored_hyps
[['Hello everyone.', 'Hello every one', 'Hell o every one.']]
>>> # NOTE: as we are returning log-probs, the closer to 0, the better.
>>> rescored_scores
[[-20.03631591796875, -27.615638732910156, -42.662353515625]]
- to_device(device=None)[source]
This method moves the scorer to a device.
If device is None, the scorer is moved to the default device provided in the constructor.
This method is dynamically called in recipes when the stage is equal to TEST.
- Parameters:
device (str) – The device to move the scorer to.