speechbrain.lobes.models.g2p.homograph module
Tools for homograph disambiguation

Authors
 * Artem Ploujnikov 2021
Summary
Classes:

SubsequenceExtractor
    A utility class to help extract subsequences out of a batch of sequences

SubsequenceLoss
    A loss function for a specific word in the output, used in the homograph disambiguation task
Reference
- class speechbrain.lobes.models.g2p.homograph.SubsequenceLoss(seq_cost, word_separator=0, word_separator_base=0)[source]
Bases: Module

A loss function for a specific word in the output, used in the homograph disambiguation task.

The approach is as follows:

1. Arrange only the target words from the original batch into a single tensor
2. Find the word index of each target word
3. Compute the beginnings and endings of words in the predicted sequences. The assumption is that the model has been trained well enough to identify word boundaries with a simple argmax, without having to perform a beam search (a sketch of this argmax step follows the example below).

Important! This loss can be used for fine-tuning only. The model is expected to already be able to predict word boundaries correctly.
- Parameters:
seq_cost (callable) – the loss to be applied to the extracted subsequences
word_separator (int) – the index of the word separator (used in processed targets)
word_separator_base (int) – the index of the word separator used in the base (unprocessed) targets
Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceLoss
>>> from speechbrain.nnet.losses import nll_loss
>>> loss = SubsequenceLoss(
...     seq_cost=nll_loss
... )
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> loss_value = loss(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
>>> loss_value
tensor(-0.8000)
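To make step 3 of the approach concrete: word boundaries are located by taking a plain argmax over the output probabilities and looking for the separator token (index 0 here, matching the default word_separator=0). The tensors below are invented purely for illustration; SubsequenceLoss performs the equivalent lookup internally.

>>> p_out = torch.Tensor(
...     [[[0., 1., 0., 0.],
...       [1., 0., 0., 0.],
...       [0., 0., 0., 1.],
...       [1., 0., 0., 0.]]]
... )
>>> tokens = p_out.argmax(dim=-1)  # greedy decoding, no beam search
>>> tokens
tensor([[1, 0, 3, 0]])
>>> (tokens == 0).nonzero()  # separator positions delimit the words
tensor([[0, 1],
        [0, 3]])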
- property word_separator
The word separator in use
- property word_separator_base
The word separator used in the base (unprocessed) sequences
- forward(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_lens_base=None)[source]
Evaluates the subsequence loss
- Parameters:
phns (torch.Tensor) – the phoneme tensor (batch x length)
phn_lens (torch.Tensor) – the phoneme length tensor
p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)
subsequence_phn_start (torch.Tensor) – the start of the target subsequence (i.e. the homograph)
subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
phns_base (torch.Tensor) – the phoneme tensor (unprocessed); see the sketch below
phn_lens_base (torch.Tensor) – the phoneme lengths (unprocessed)
- Returns:
loss – the loss tensor
- Return type:
torch.Tensor
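The phns_base / phn_lens_base arguments cover the case where the targets in phns have been preprocessed (e.g. with extra tokens inserted): per the parameter descriptions above, the unprocessed targets are then used when locating word boundaries, with word_separator_base identifying the separator in them. A minimal sketch reusing the tensors from the class example above; the assumption, made purely for illustration, is that no preprocessing was applied, so the raw tensors coincide with the processed ones.

>>> # Hypothetical "raw" targets: identical to the processed ones in this toy case
>>> phns_raw, phn_lens_raw = phns, phn_lens
>>> loss_value = loss(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end,
...     phns_base=phns_raw,
...     phn_lens_base=phn_lens_raw
... )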
- class speechbrain.lobes.models.g2p.homograph.SubsequenceExtractor(word_separator=0, word_separator_base=None)[source]
Bases: object

A utility class to help extract subsequences out of a batch of sequences
- Parameters:
word_separator (int) – the token to be used as the word separator
word_separator_base (int) – the word separator used in the base (unprocessed) sequences
Example

>>> import torch
>>> from speechbrain.lobes.models.g2p.homograph import SubsequenceExtractor
>>> extractor = SubsequenceExtractor()
>>> phns = torch.Tensor(
...     [[1, 2, 0, 1, 3, 0, 2, 1, 0],
...      [2, 1, 3, 0, 1, 2, 0, 3, 2]]
... )
>>> phn_lens = torch.IntTensor([8, 9])
>>> subsequence_phn_start = torch.IntTensor([3, 4])
>>> subsequence_phn_end = torch.IntTensor([5, 7])
>>> p_seq = torch.Tensor([
...     [[0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [1., 0., 0., 0.]],
...     [[0., 0., 1., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 0., 1.],
...      [1., 0., 0., 0.],
...      [0., 1., 0., 0.],
...      [0., 0., 1., 0.],
...      [1., 0., 0., 0.],
...      [0., 0., 0., 1.],
...      [0., 0., 1., 0.]]
... ])
>>> extractor.extract_seq(
...     phns,
...     phn_lens,
...     p_seq,
...     subsequence_phn_start,
...     subsequence_phn_end
... )
(tensor([[[0., 1., 0., 0.],
         [0., 0., 0., 1.],
         [0., 0., 0., 0.]],

        [[0., 1., 0., 0.],
         [0., 0., 1., 0.],
         [0., 0., 0., 0.]]]), tensor([[1., 3., 0.],
        [1., 2., 0.]]), tensor([0.6667, 1.0000]))
- extract_seq(phns, phn_lens, p_seq, subsequence_phn_start, subsequence_phn_end, phns_base=None, phn_base_lens=None)[source]
Extracts the subsequence from the complete sequences
- Parameters:
phns (torch.Tensor) – the phoneme tensor (batch x length)
phn_lens (torch.Tensor) – the phoneme length tensor
p_seq (torch.Tensor) – the output phoneme probability tensor (batch x length x phns)
subsequence_phn_start (torch.Tensor) – the start of the target subsequence (i.e. the homograph)
subsequence_phn_end (torch.Tensor) – the end of the target subsequence (i.e. the homograph)
phns_base (torch.Tensor) – the phoneme tensor (unprocessed)
phn_base_lens (torch.Tensor) – the phoneme lengths (unprocessed)
- Returns:
p_seq_subsequence (torch.Tensor) – the output subsequence (of probabilities)
phns_subsequence (torch.Tensor) – the target subsequence
subsequence_lengths (torch.Tensor) – the subsequence lengths, expressed as a fraction of the tensor's last dimension (see the conversion sketch below)
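The fractional lengths follow SpeechBrain's usual relative-length convention: multiplying by the size of the padded sequence dimension recovers the absolute lengths. A quick check using the values from the example output above:

>>> lengths = torch.tensor([0.6667, 1.0000])  # fractions of the padded length 3
>>> (lengths * 3).round().int()               # absolute subsequence lengths
tensor([2, 3], dtype=torch.int32)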