speechbrain.k2_integration.lattice_decoder module

Different decoding graph algorithms for k2, be it HL or HLG (with a G LM and a bigger rescoring LM).

This code was adapted from icefall (https://github.com/k2-fsa/icefall/blob/master/icefall/decode.py).

Authors:

Pierre Champion 2023
Zeyu Zhao 2023
Georgios Karakasidis 2023

Summary

Functions:

`get_decoding`	This function reads a config and creates the decoder for k2 graph compiler decoding.
`get_lattice`	Get the decoding lattice from a decoding graph and neural network output.
`one_best_decoding`	Get the best path from a lattice.
`rescore_with_whole_lattice`	Intersect the lattice with an n-gram LM and use shortest path to decode.

Reference

speechbrain.k2_integration.lattice_decoder.get_decoding(hparams: Dict, graphCompiler: GraphCompiler, device='cpu')

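This function reads a config and creates the decoder for k2 graph compiler decoding. There are the following cases:

HLG is compiled and LM rescoring is used. In that case, compose_HL_with_G and use_G_rescoring are both True, and we will create, for example, G_3_gram.fst.txt and G_4_gram.fst.txt. Note that the 3-gram and 4-gram ARPA LMs need to exist under hparams['lm_dir'].

HLG is compiled but LM rescoring is not used. In that case, compose_HL_with_G is True and use_G_rescoring is False, and we will create, for example, G_3_gram.fst.txt. Note that the 3-gram ARPA LM needs to exist under hparams['lm_dir'].

HLG is not compiled (only the HL graph is used) and LM rescoring is used. In that case, compose_HL_with_G is False and use_G_rescoring is True. Note that the 4-gram ARPA LM needs to exist under hparams['lm_dir'].

HLG is not compiled (only the HL graph is used) and LM rescoring is not used. In that case, compose_HL_with_G is False and use_G_rescoring is False, and we will not convert any LM to an FST.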
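The four cases reduce to two independent flags: compose_HL_with_G decides between an HLG and an HL graph (and whether a first-pass 3-gram is needed), while use_G_rescoring decides whether a larger 4-gram is needed for rescoring. The sketch below encodes that mapping. The helper name plan_decoding is hypothetical and not part of SpeechBrain, and the 3/4 orders simply follow the examples given above.

```python
from typing import List, Tuple


def plan_decoding(compose_HL_with_G: bool, use_G_rescoring: bool) -> Tuple[str, List[int]]:
    """Hypothetical helper (not part of SpeechBrain).

    Returns the decoding graph type and the n-gram orders of the ARPA LMs
    that would need to exist under hparams['lm_dir'] for the given flags.
    """
    graph = "HLG" if compose_HL_with_G else "HL"
    orders: List[int] = []
    if compose_HL_with_G:
        orders.append(3)  # first-pass G composed into HLG, e.g. G_3_gram.fst.txt
    if use_G_rescoring:
        orders.append(4)  # larger rescoring G, e.g. G_4_gram.fst.txt
    return graph, orders
```

For instance, plan_decoding(False, True) gives ("HL", [4]): the HL graph is used for the first pass and only the 4-gram is needed, for rescoring.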

Parameters:

hparams (dict) – The hyperparameters.
graphCompiler (graph_compiler.GraphCompiler) – The graph compiler (H).
device (torch.device) – The device to use.

Returns:

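decoding_graph: k2.Fsa: An HL or HLG decoding graph. Use it together with a neural network output and the function get_lattice to obtain a decoding lattice (k2.Fsa).
decoding_method: Callable[[k2.Fsa], k2.Fsa]: A function to call with a decoding lattice k2.Fsa (obtained by intersecting the neural network output with HL or HLG). It returns FsaVecs containing linear FSAs, stored in a dict; with "onebest" decoding, for example, the result is under the '1best' key.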

Return type:

dict

Example

>>> import torch
>>> from speechbrain.k2_integration.losses import ctc_k2
>>> from speechbrain.k2_integration.utils import lattice_paths_to_text
>>> from speechbrain.k2_integration.graph_compiler import CtcGraphCompiler
>>> from speechbrain.k2_integration.lexicon import Lexicon
>>> from speechbrain.k2_integration.prepare_lang import prepare_lang
>>> from speechbrain.k2_integration.lattice_decoder import get_decoding
>>> from speechbrain.k2_integration.lattice_decoder import get_lattice

>>> batch_size = 1

>>> log_probs = torch.randn(batch_size, 40, 10)
>>> log_probs.requires_grad = True
>>> # Assume all utterances have the same length so no padding was needed.
>>> input_lens = torch.ones(batch_size)
>>> # Create a small lexicon containing only two words and write it to a file.
>>> lang_tmpdir = getfixture('tmpdir')
>>> lexicon_sample = "hello h e l l o\nworld w o r l d\n<UNK> <unk>"
>>> lexicon_file = lang_tmpdir.join("lexicon.txt")
>>> lexicon_file.write(lexicon_sample)
>>> # Create a lang directory with the lexicon and L.pt, L_inv.pt, L_disambig.pt
>>> prepare_lang(lang_tmpdir)
>>> # Create a lexicon object
>>> lexicon = Lexicon(lang_tmpdir)
>>> # Create a CTC graph compiler from the lexicon
>>> graph = CtcGraphCompiler(
...     lexicon,
...     log_probs.device,
... )

>>> decode = get_decoding(
...     {"compose_HL_with_G": False,
...      "decoding_method": "onebest",
...      "lang_dir": lang_tmpdir},
...     graph)
>>> lattice = get_lattice(log_probs, input_lens, decode["decoding_graph"])
>>> path = decode["decoding_method"](lattice)['1best']
>>> text = lattice_paths_to_text(path, lexicon.word_table)

speechbrain.k2_integration.lattice_decoder.get_lattice(log_probs_nnet_output: Tensor, input_lens: Tensor, decoder: k2.Fsa, search_beam: int = 5, output_beam: int = 5, min_active_states: int = 300, max_active_states: int = 1000, ac_scale: float = 1.0, subsampling_factor: int = 1) → k2.Fsa

Get the decoding lattice from a decoding graph and neural network output.

Parameters:

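log_probs_nnet_output (torch.Tensor) – The output of the neural model, of shape (batch, seq_len, num_tokens).
input_lens (torch.Tensor) – An integer tensor of shape (batch,) containing the length of each sequence in log_probs_nnet_output.
decoder (k2.Fsa) – An instance of k2.Fsa that represents the decoding graph.
search_beam (int) – Decoding beam, e.g. 20. Smaller is faster; larger is more exact (less pruning). This is the default value; it may be modified by min_active_states and max_active_states.
output_beam (int) – Beam used to prune the output, similar to lattice-beam in Kaldi. It is relative to the best path of the output.
min_active_states (int) – Minimum number of FSA states that are allowed to be active on any given frame, for any given intersection/composition task. This is advisory: the search tries not to have fewer than this number of states active. Set it to zero if there is no constraint.
max_active_states (int) – Maximum number of FSA states that are allowed to be active on any given frame, for any given intersection/composition task. This is advisory: the search tries not to exceed this number but may not always succeed. Use a very large number if no constraint is needed.
ac_scale (float) – Acoustic scale applied to log_probs_nnet_output.
subsampling_factor (int) – The subsampling factor of the model.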
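To make the interplay of search_beam, min_active_states and max_active_states concrete, here is a deliberately simplified, single-frame illustration. It is not k2's algorithm: k2 adapts the beam dynamically across frames inside its pruned intersection, whereas this toy just keeps the states within the beam of the best score and then clamps the count. The function name prune_frame and the dict-of-scores representation are hypothetical.

```python
from typing import Dict, List


def prune_frame(scores: Dict[int, float], beam: float, min_active: int, max_active: int) -> List[int]:
    """Toy illustration (hypothetical): which states survive pruning on one frame.

    scores maps a state id to its log score; higher is better.
    Returns the surviving state ids, best first.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # best state first
    if not ranked:
        return []
    best = scores[ranked[0]]
    # Beam pruning: keep states whose score is within `beam` of the best one.
    kept = [s for s in ranked if scores[s] >= best - beam]
    if len(kept) > max_active:
        kept = ranked[:max_active]  # too many survivors: keep only the top ones
    elif len(kept) < min_active:
        kept = ranked[:min_active]  # too few survivors: let more states through
    return kept
```

With scores {0: -1.0, 1: -2.0, 2: -10.0, 3: -3.5} and a beam of 5.0, states 0, 1 and 3 survive; lowering max_active to 2 drops state 3, while raising min_active to 4 also lets state 2 through.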

Returns:

lattice – An FsaVec containing the decoding result. It has axes [utt][state][arc].

Return type:

k2.Fsa

speechbrain.k2_integration.lattice_decoder.one_best_decoding(lattice: k2.Fsa, use_double_scores: bool = True) → k2.Fsa

Get the best path from a lattice.

Parameters:

lattice (k2.Fsa) – The decoding lattice returned by get_lattice().
use_double_scores (bool) – If True, use double-precision floating point in the computation. If False, use single precision.

Returns:

best_path – An FsaVec containing linear paths.

Return type:

k2.Fsa
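Finding the best path amounts to a Viterbi (tropical-semiring shortest path) search over the lattice; k2 exposes this operation as k2.shortest_path(fsa, use_double_scores), which this function presumably wraps. The pure-Python sketch below shows the idea on a tiny acyclic FSA given as a list of arcs. The arc-tuple format and the name best_path are hypothetical; it only borrows k2's label conventions (0 for epsilon, -1 on arcs entering the final state).

```python
from typing import List, Tuple

# (source state, destination state, label, score); label 0 is epsilon and
# -1 marks the arc into the final state, following k2's conventions.
Arc = Tuple[int, int, int, float]


def best_path(num_states: int, arcs: List[Arc]) -> Tuple[float, List[int]]:
    """Toy Viterbi search for the highest-scoring path from state 0 to the final state.

    Assumes every arc goes from a lower-numbered to a higher-numbered state
    (topological order) and that the final state is num_states - 1.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * num_states
    back: List[Tuple[int, int]] = [(-1, 0)] * num_states  # (previous state, label)
    best[0] = 0.0
    # Sorting by source state guarantees that all arcs entering a state are
    # relaxed before any arc leaving it.
    for src, dst, label, score in sorted(arcs):
        if best[src] == neg_inf:
            continue  # source state not reachable from the start
        candidate = best[src] + score
        if candidate > best[dst]:
            best[dst] = candidate
            back[dst] = (src, label)

    final = num_states - 1
    if best[final] == neg_inf:
        return neg_inf, []  # no complete path exists
    labels: List[int] = []
    state = final
    while state != 0:  # follow back-pointers from the final state
        prev, label = back[state]
        if label > 0:  # drop epsilon (0) and the final-arc label (-1)
            labels.append(label)
        state = prev
    labels.reverse()
    return best[final], labels
```

For arcs [(0, 1, 5, -1.0), (0, 1, 6, -0.5), (1, 2, 7, -2.0), (1, 2, 0, -0.1), (2, 3, -1, 0.0)], the search picks label 6 followed by the epsilon arc, for a total score of about -0.6 and the label sequence [6].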

speechbrain.k2_integration.lattice_decoder.rescore_with_whole_lattice(lattice: k2.Fsa, G_with_epsilon_loops: k2.Fsa, lm_scale_list: List[float] | None = None, use_double_scores: bool = True) → k2.Fsa | Dict[str, k2.Fsa]

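Intersect the lattice with an n-gram LM and use shortest path to decode. The input lattice is obtained by intersecting HLG with a DenseFsaVec, where the G in HLG is usually a 3-gram LM. The input G_with_epsilon_loops is usually a 4-gram LM. You can consider this function as a second-pass decoding: the first pass uses a small G, and the second pass uses a larger G.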

Parameters:

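lattice (k2.Fsa) – An FsaVec with axes [utt][state][arc]. Its aux_labels are word IDs. It must have an attribute lm_scores.
G_with_epsilon_loops (k2.Fsa) – An FsaVec containing only a single FSA. It contains epsilon self-loops. It is an acceptor whose labels are word IDs.
lm_scale_list (Optional[List[float]]) – If None, return the intersection of lattice and G_with_epsilon_loops. If not None, it contains a list of values used to scale the LM scores; for each scale, the resulting dict contains a corresponding decoding result.
use_double_scores (bool) – If True, use double precision in the computation. If False, use single precision.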

Returns:

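If lm_scale_list is None, a new lattice that is the intersection of lattice and G_with_epsilon_loops.
Otherwise, a dict in which each key corresponds to an entry in lm_scale_list
and each value is the decoding result (i.e., an FsaVec containing linear FSAs).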
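To illustrate what lm_scale_list changes, here is a sketch on a plain n-best list instead of a lattice. It assumes, as we understand icefall's decode.py (from which this module was adapted), that each path's total score splits into an acoustic part and its lm_scores, and that for each scale the acoustic part is divided by lm_scale before the LM scores are added back and the best path is selected. Path and pick_best_per_scale are hypothetical names; the real function operates on k2 lattices and returns FsaVecs.

```python
from typing import Dict, List, Tuple

# (hypothesis text, total score, LM part of the total score)
Path = Tuple[str, float, float]


def pick_best_per_scale(paths: List[Path], lm_scale_list: List[float]) -> Dict[float, str]:
    """Toy n-best version of the per-scale rescoring loop (hypothetical helper)."""
    results: Dict[float, str] = {}
    for lm_scale in lm_scale_list:
        def rescored(path: Path, scale: float = lm_scale) -> float:
            _, total, lm = path
            am = total - lm  # recover the acoustic part of the score
            # Dividing the AM part by the scale gives the LM more relative weight as the scale grows.
            return am / scale + lm
        results[lm_scale] = max(paths, key=rescored)[0]
    return results
```

With paths [("a", -10.0, -6.0), ("b", -9.0, -2.0)], a scale of 0.5 favours "a" (better acoustic score), while a scale of 2.0 favours "b" (better LM score). Sweeping several scales like this in a single call is what makes the dict output convenient for tuning.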