speechbrain.nnet.losses module

Losses for training neural networks.

Authors

Mirco Ravanelli 2020
Samuele Cornell 2020
Hwidong Na 2020
Yan Gao 2020
Titouan Parcollet 2020

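Summary

Classes:

`AdditiveAngularMargin`	An implementation of Additive Angular Margin (AAM) proposed in the following paper: '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)
`AngularMargin`	An implementation of Angular Margin (AM) proposed in the following paper: '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)
`AutoencoderLoss`	An implementation of a standard (non-variational) autoencoder loss.
`AutoencoderLossDetails`
`ContrastiveLoss`	Contrastive loss as used in wav2vec2.
`Laplacian`	Computes the Laplacian for image-like data.
`LaplacianVarianceLoss`	The Laplacian variance loss, used to penalize blurriness in image-like data such as spectrograms.
`LogSoftmaxWrapper`
`PitWrapper`	Permutation Invariant Wrapper to allow Permutation Invariant Training (PIT) with existing losses.
`VariationalAutoencoderLoss`	The Variational Autoencoder loss, with support for length masking.
`VariationalAutoencoderLossDetails`

Functions:

`bce_loss`	Computes binary cross-entropy (BCE) loss.
`cal_si_snr`	Calculates SI-SNR.
`cal_snr`	Calculates binaural channel SNR.
`ce_kd`	Simple version of distillation for cross-entropy loss.
`classification_error`	Computes the classification error at frame or batch level.
`compute_length_mask`	Computes a length mask for the specified data shape.
`compute_masked_loss`	Computes the true average loss of a set of waveforms of unequal length.
`ctc_loss`	CTC loss.
`ctc_loss_kd`	Knowledge distillation for CTC loss.
`distance_diff_loss`	A loss function for cases where a model outputs an arbitrary probability distribution for a discrete variable on an interval scale, such as the length of a sequence, and the ground truth is the precise value of the variable from a data sample.
`get_mask`
`get_si_snr_with_pitwrapper`	Wraps the si_snr calculation with the speechbrain pit-wrapper.
`get_snr_with_pitwrapper`	Wraps the snr calculation with the speechbrain pit-wrapper.
`kldiv_loss`	Computes the KL-divergence error at the batch level.
`l1_loss`	Computes the true l1 loss, accounting for length differences.
`mse_loss`	Computes the true mean squared error, accounting for length differences.
`nll_loss`	Computes negative log likelihood loss.
`nll_loss_kd`	Knowledge distillation for negative log-likelihood loss.
`reduce_loss`	Performs the specified reduction of the raw loss value.
`transducer_loss`	Transducer loss, see `speechbrain/nnet/loss/transducer_loss.py`.
`truncate`	Ensures that predictions and targets are the same length.

Reference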

speechbrain.nnet.losses.transducer_loss(logits, targets, input_lens, target_lens, blank_index, reduction='mean', use_torchaudio=True)[source]

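Transducer loss, see speechbrain/nnet/loss/transducer_loss.py.

Parameters:

logits (torch.Tensor) – Predicted tensor, of shape [batch, maxT, maxU, num_labels].
targets (torch.Tensor) – Target tensor, without any blanks, of shape [batch, target_len].
input_lens (torch.Tensor) – Length of each utterance.
target_lens (torch.Tensor) – Length of each target sequence.
blank_index (int) – The location of the blank symbol among the label indices.
reduction (str) – Specifies the reduction to apply to the output: 'mean' | 'batchmean' | 'sum'.
use_torchaudio (bool) – If True, use the Transducer loss implementation from torchaudio, otherwise use the SpeechBrain Numba implementation.

Return type:

The computed transducer loss.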

class speechbrain.nnet.losses.PitWrapper(base_loss)[source]

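Bases: Module

Permutation Invariant Wrapper to allow Permutation Invariant Training (PIT) with existing losses.

Permutation invariance is calculated over the sources/classes axis, which is assumed to be the rightmost dimension: predictions and targets tensors are assumed to have shape [batch, …, channels, sources].

Parameters: base_loss (function) – Base loss function, e.g. torch.nn.MSELoss. It is assumed to take two arguments, predictions and targets, and to perform no reduction. (If a pytorch loss is used, the user must specify reduction="none".)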

Example

>>> pit_mse = PitWrapper(nn.MSELoss(reduction="none"))
>>> targets = torch.rand((2, 32, 4))
>>> p = (3, 0, 2, 1)
>>> predictions = targets[..., p]
>>> loss, opt_p = pit_mse(predictions, targets)
>>> loss
tensor([0., 0.])

reorder_tensor(tensor, p)[source]

Parameters:

tensor (torch.Tensor) – Tensor to reorder given the optimal permutation, of shape [batch, …, sources].
p (list of tuples) – List of optimal permutations, e.g. for batch=2 and n_sources=3: [(0, 1, 2), (0, 2, 1)].

Returns:

reordered – Reordered tensor given permutation p.

Return type:

torch.Tensor

forward(preds, targets)[source]

Parameters:

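preds (torch.Tensor) – Network predictions tensor, of shape [batch, channels, …, sources].
targets (torch.Tensor) – Target tensor, of shape [batch, channels, …, sources].

Returns:

loss (torch.Tensor) – Permutation invariant loss for the current examples, a tensor of shape [batch].
perms (list) – List of indexes for the optimal permutation of the inputs over sources, e.g. [(0, 1, 2), (2, 1, 0)] for three sources and 2 examples per batch.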
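
The search that PIT performs can be sketched in plain Python for a single example. This is an illustration under stated assumptions, not the SpeechBrain implementation: `pit_loss` and `mse` are hypothetical helpers, sources are plain lists, and the index convention (entry i of the permutation is the prediction assigned to target i) is chosen for the sketch and may differ from what `PitWrapper` returns.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(pred, target, pair_loss):
    """Try every assignment of predictions to targets and keep the cheapest.

    pred, target: lists holding one sequence per source.
    Returns (best_loss, best_perm); best_perm[i] is the index of the
    prediction assigned to target source i (a convention of this sketch).
    """
    n_sources = len(target)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        loss = sum(pair_loss(pred[perm[i]], target[i]) for i in range(n_sources))
        loss /= n_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


targets = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
predictions = [targets[2], targets[0], targets[1]]  # same sources, shuffled
loss, perm = pit_loss(predictions, targets, mse)
reordered = [predictions[j] for j in perm]  # undo the shuffle
```

The exhaustive search costs one loss evaluation per permutation, so it grows factorially with the number of sources; that is usually acceptable for the two or three sources common in separation tasks.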

speechbrain.nnet.losses.ctc_loss(log_probs, targets, input_lens, target_lens, blank_index, reduction='mean')[source]

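CTC loss.

Parameters:

log_probs (torch.Tensor) – Predicted tensor, of shape [batch, time, chars].
targets (torch.Tensor) – Target tensor, without any blanks, of shape [batch, target_len].
input_lens (torch.Tensor) – Length of each utterance.
target_lens (torch.Tensor) – Length of each target sequence.
blank_index (int) – The location of the blank symbol among the character indices.
reduction (str) – The reduction to apply to the output: 'mean', 'sum', 'batch', 'batchmean', 'none'. See pytorch for 'mean', 'sum', 'none'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.

Return type:

The computed CTC loss.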
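
For readers who want to see what the CTC loss measures, the standard forward recursion can be sketched for one utterance in plain Python. This is the textbook algorithm in the probability domain, not the code SpeechBrain runs; `ctc_neg_log_likelihood` is a hypothetical helper, and a practical implementation would stay in the log domain for numerical stability.

```python
import math


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC for one utterance.

    log_probs: list of T frames, each a list of per-label log-probabilities.
    target: list of label indices without blanks.
    """
    probs = [[math.exp(v) for v in frame] for frame in log_probs]
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)


# Two frames, two labels (0 = blank), uniform posteriors, target "1".
uniform = [[math.log(0.5), math.log(0.5)]] * 2
ctc_value = ctc_neg_log_likelihood(uniform, [1])
```

With two uniform frames, three alignments produce the target (blank-1, 1-blank, 1-1), each with probability 0.25, so the likelihood is 0.75.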

speechbrain.nnet.losses.l1_loss(predictions, targets, length=None, allowed_len_diff=3, reduction='mean')[source]

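Computes the true l1 loss, accounting for length differences.

Parameters:

predictions (torch.Tensor) – Predicted tensor, of shape [batch, time, *].
targets (torch.Tensor) – Target tensor with the same size as the predicted tensor.
length (torch.Tensor) – Length of each utterance, used to compute the true error with a mask.
allowed_len_diff (int) – Length difference that will be tolerated before raising an exception.
reduction (str) – Options are 'mean', 'batch', 'batchmean', 'sum'. See pytorch for 'mean', 'sum'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.

Return type:

The computed L1 loss.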

Example

>>> probs = torch.tensor([[0.9, 0.1, 0.1, 0.9]])
>>> l1_loss(probs, torch.tensor([[1., 0., 0., 1.]]))
tensor(0.1000)
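
The effect of `length` and of the four `reduction` options can be sketched in plain Python on nested lists. This is an illustration only: `masked_l1` is a hypothetical helper, lengths are treated as fractions of the padded length (as in the other examples on this page), and the per-item normalization used for 'batch' is an assumption rather than something stated above.

```python
def masked_l1(predictions, targets, rel_lengths, reduction="mean"):
    """L1 loss over [batch][time] lists, ignoring padded positions."""
    per_item, counts = [], []
    for pred, tgt, rel in zip(predictions, targets, rel_lengths):
        max_len = len(pred)
        valid = [t for t in range(max_len) if t < rel * max_len]
        per_item.append(sum(abs(pred[t] - tgt[t]) for t in valid))
        counts.append(len(valid))
    total = sum(per_item)
    if reduction == "mean":  # average over every unmasked element
        return total / sum(counts)
    if reduction == "batch":  # one value per item (assumed: per-item average)
        return [s / c for s, c in zip(per_item, counts)]
    if reduction == "batchmean":  # sum divided by the batch size
        return total / len(predictions)
    if reduction == "sum":
        return total
    raise ValueError(f"unknown reduction: {reduction}")


preds = [[0.9, 0.1, 0.1, 0.9], [0.5, 0.5, 9.0, 9.0]]
tgts = [[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
lengths = [1.0, 0.5]  # the second item is padded after two frames

mean_loss = masked_l1(preds, tgts, lengths, "mean")
batch_loss = masked_l1(preds, tgts, lengths, "batch")
batchmean_loss = masked_l1(preds, tgts, lengths, "batchmean")
```

The large values in the padded half of the second item never reach the result, which is the point of passing `length`.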

speechbrain.nnet.losses.mse_loss(predictions, targets, length=None, allowed_len_diff=3, reduction='mean')[source]

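Computes the true mean squared error, accounting for length differences.

Parameters:

predictions (torch.Tensor) – Predicted tensor, of shape [batch, time, *].
targets (torch.Tensor) – Target tensor with the same size as the predicted tensor.
length (torch.Tensor) – Length of each utterance, used to compute the true error with a mask.
allowed_len_diff (int) – Length difference that will be tolerated before raising an exception.
reduction (str) – Options are 'mean', 'batch', 'batchmean', 'sum'. See pytorch for 'mean', 'sum'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.

Return type:

The computed MSE loss.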

Example

>>> probs = torch.tensor([[0.9, 0.1, 0.1, 0.9]])
>>> mse_loss(probs, torch.tensor([[1., 0., 0., 1.]]))
tensor(0.0100)

speechbrain.nnet.losses.classification_error(probabilities, targets, length=None, allowed_len_diff=3, reduction='mean')[source]

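Computes the classification error at frame or batch level.

Parameters:

probabilities (torch.Tensor) – The posterior probabilities, of shape [batch, prob] or [batch, frames, prob].
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
allowed_len_diff (int) – Length difference that will be tolerated before raising an exception.
reduction (str) – Options are 'mean', 'batch', 'batchmean', 'sum'. See pytorch for 'mean', 'sum'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.

Return type:

The computed classification error.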

Example

>>> probs = torch.tensor([[[0.9, 0.1], [0.1, 0.9]]])
>>> classification_error(probs, torch.tensor([1, 1]))
tensor(0.5000)

speechbrain.nnet.losses.nll_loss(log_probabilities, targets, length=None, label_smoothing=0.0, allowed_len_diff=3, weight=None, reduction='mean')[source]

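Computes negative log likelihood loss.

Parameters:

log_probabilities (torch.Tensor) – The probabilities after log has been applied. Format is [batch, log_p] or [batch, frames, log_p].
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
label_smoothing (float) – The amount of smoothing to apply to labels (default 0.0, no smoothing).
allowed_len_diff (int) – Length difference that will be tolerated before raising an exception.
weight (torch.Tensor) – A manual rescaling weight given to each class. If given, it has to be a tensor of size C.
reduction (str) – Options are 'mean', 'batch', 'batchmean', 'sum'. See pytorch for 'mean', 'sum'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.

Return type:

The computed NLL loss.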

Example

>>> probs = torch.tensor([[0.9, 0.1], [0.1, 0.9]])
>>> nll_loss(torch.log(probs), torch.tensor([1, 1]))
tensor(1.2040)
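
The value in the example can be checked by hand. The sketch below redoes the arithmetic in plain Python; `nll` is a hypothetical helper that covers only the unsmoothed, unweighted, unmasked case with 'mean' reduction.

```python
import math


def nll(log_probs, targets):
    """Average of -log p(target) over the batch."""
    picked = [-row[t] for row, t in zip(log_probs, targets)]
    return sum(picked) / len(picked)


probs = [[0.9, 0.1], [0.1, 0.9]]
log_probs = [[math.log(p) for p in row] for row in probs]
nll_value = nll(log_probs, [1, 1])  # (-log 0.1 - log 0.9) / 2
```

The first item puts only 0.1 on the correct class and contributes about 2.303; the second puts 0.9 on it and contributes about 0.105. Their average is about 1.204.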

speechbrain.nnet.losses.bce_loss(inputs, targets, length=None, weight=None, pos_weight=None, reduction='mean', allowed_len_diff=3, label_smoothing=0.0)[source]

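Computes binary cross-entropy (BCE) loss. It also applies the sigmoid function directly (this improves the numerical stability).

Parameters:

inputs (torch.Tensor) – The output before the final sigmoid is applied. Format is [batch[, 1]?] or [batch, frames[, 1]?]. (Works with or without a singleton dimension at the end.)
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
weight (torch.Tensor) – A manual rescaling weight. If provided, it is repeated to match the input tensor shape.
pos_weight (torch.Tensor) – A weight of positive examples. Must be a vector with length equal to the number of classes.
reduction (str) – Options are 'mean', 'batch', 'batchmean', 'sum'. See pytorch for 'mean', 'sum'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.
allowed_len_diff (int) – Length difference that will be tolerated before raising an exception.
label_smoothing (float) – The amount of smoothing to apply to labels (default 0.0, no smoothing).

Return type:

The computed BCE loss.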

Example

>>> inputs = torch.tensor([10.0, -6.0])
>>> targets = torch.tensor([1, 0])
>>> bce_loss(inputs, targets)
tensor(0.0013)
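
Why folding the sigmoid into the loss helps numerically can be seen from the usual stable formulation of BCE on logits. The sketch below is a plain-Python illustration of that well-known formula, not SpeechBrain code; `bce_with_logits` is a hypothetical helper without weights, masking, or label smoothing.

```python
import math


def bce_with_logits(x, t):
    """Stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))].

    Only exp(-|x|) is evaluated, so large |x| cannot overflow.
    """
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))


inputs = [10.0, -6.0]
targets = [1, 0]
bce_value = sum(bce_with_logits(x, t) for x, t in zip(inputs, targets)) / len(inputs)
```

Both logits agree with their targets, so each term is small (about 4.5e-05 and 2.5e-03) and the mean rounds to the 0.0013 shown in the example.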

speechbrain.nnet.losses.kldiv_loss(log_probabilities, targets, length=None, label_smoothing=0.0, allowed_len_diff=3, pad_idx=0, reduction='mean')[source]

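Computes the KL-divergence error at the batch level. This loss applies label smoothing directly to the targets.

Parameters:

log_probabilities (torch.Tensor) – The posterior probabilities, of shape [batch, prob] or [batch, frames, prob].
targets (torch.Tensor) – The targets, of shape [batch] or [batch, frames].
length (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.
label_smoothing (float) – The amount of smoothing to apply to labels (default 0.0, no smoothing).
allowed_len_diff (int) – Length difference that will be tolerated before raising an exception.
pad_idx (int) – Entries with this value are considered padding.
reduction (str) – Options are 'mean', 'batch', 'batchmean', 'sum'. See pytorch for 'mean', 'sum'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.

Return type:

The computed kldiv loss.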

Example

>>> probs = torch.tensor([[0.9, 0.1], [0.1, 0.9]])
>>> kldiv_loss(torch.log(probs), torch.tensor([1, 1]))
tensor(1.2040)

speechbrain.nnet.losses.distance_diff_loss(predictions, targets, length=None, beta=0.25, max_weight=100.0, reduction='mean')[source]

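A loss function that can be used in cases where a model outputs an arbitrary probability distribution for a discrete variable on an interval scale, such as the length of a sequence, and the ground truth is the precise value of the variable from a data sample.

The loss is defined as loss_i = p_i * (exp(beta * |i - y|) - 1), so probability mass placed exactly at the true position y is not penalized.

The loss can also be used when the outputs are not probabilities, as long as high values close to the ground truth position and low values away from it are desired.

Parameters:

predictions (torch.Tensor) – A (batch x max_len) tensor in which each element is a probability, weight or some other value at that position.
targets (torch.Tensor) – A 1-D tensor in which each element is the ground truth value.
length (torch.Tensor) – Lengths (for masking in padded batches).
beta (torch.Tensor) – A hyperparameter controlling the penalties. With a higher beta, penalties increase faster.
max_weight (torch.Tensor) – The maximum distance weight (for numerical stability in long sequences).
reduction (str) – Options are 'mean', 'batch', 'batchmean', 'sum'. See pytorch for 'mean', 'sum'. The 'batch' option returns one loss per item in the batch, 'batchmean' returns the sum divided by the batch size.

Return type:

The masked loss.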

Example

>>> predictions = torch.tensor(
...    [[0.25, 0.5, 0.25, 0.0],
...     [0.05, 0.05, 0.9, 0.0],
...     [8.0, 0.10, 0.05, 0.05]]
... )
>>> targets = torch.tensor([2., 3., 1.])
>>> length = torch.tensor([.75, .75, 1.])
>>> loss = distance_diff_loss(predictions, targets, length)
>>> loss
tensor(0.2967)
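
The example value can be reproduced with a few lines of plain Python, which also shows how `length`, `beta` and `max_weight` enter the computation. This is a sketch under assumptions: `distance_diff` is a hypothetical helper, the distance weight is taken to be exp(beta * |i - y|) - 1 clamped at `max_weight`, and 'mean' is taken to average over unmasked positions. Those choices were picked because they reproduce the 0.2967 shown above.

```python
import math


def distance_diff(predictions, targets, rel_lengths, beta=0.25, max_weight=100.0):
    """Mean of p_i * (exp(beta * |i - y|) - 1) over unmasked positions."""
    total, count = 0.0, 0
    for row, y, rel in zip(predictions, targets, rel_lengths):
        max_len = len(row)
        for i, p in enumerate(row):
            if i < rel * max_len:  # skip padded positions
                weight = min(math.exp(beta * abs(i - y)) - 1.0, max_weight)
                total += p * weight
                count += 1
    return total / count


predictions = [
    [0.25, 0.5, 0.25, 0.0],
    [0.05, 0.05, 0.9, 0.0],
    [8.0, 0.10, 0.05, 0.05],
]
dd_value = distance_diff(predictions, [2.0, 3.0, 1.0], [0.75, 0.75, 1.0])
```

The third row dominates the result: its 8.0 sits one position away from the target, so it is multiplied by exp(0.25) - 1, roughly 0.284.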

speechbrain.nnet.losses.truncate(predictions, targets, allowed_len_diff=3)[source]

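Ensures that predictions and targets are the same length.

Parameters:

predictions (torch.Tensor) – First tensor for checking length.
targets (torch.Tensor) – Second tensor for checking length.
allowed_len_diff (int) – Length difference that will be tolerated before raising an exception.

Returns:

predictions (torch.Tensor)
targets (torch.Tensor) – Same as the inputs, but with the same shape.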
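
The behaviour described above can be pictured with a small list-based sketch. It is an assumption-laden illustration, not the library code: `truncate_pair` is a hypothetical helper, it compares lengths along the time axis of [batch][time] lists, and the choice of `ValueError` for the exception type is a guess.

```python
def truncate_pair(predictions, targets, allowed_len_diff=3):
    """Cut the longer of two [batch][time] lists down to the shorter one."""
    len_pred, len_tgt = len(predictions[0]), len(targets[0])
    if abs(len_pred - len_tgt) > allowed_len_diff:
        raise ValueError(
            f"lengths {len_pred} and {len_tgt} differ by more than {allowed_len_diff}"
        )
    common = min(len_pred, len_tgt)
    return [row[:common] for row in predictions], [row[:common] for row in targets]


short_pred, short_tgt = truncate_pair([[1, 2, 3, 4, 5]], [[1, 2, 3]])
```

A small mismatch of a frame or two often comes from striding or padding in the model, which is why a tolerance is useful; a large mismatch more likely signals a data bug and is better surfaced as an error.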

speechbrain.nnet.losses.compute_masked_loss(loss_fn, predictions, targets, length=None, label_smoothing=0.0, mask_shape='targets', reduction='mean')[source]

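Computes the true average loss of a set of waveforms of unequal length.

Parameters:

loss_fn (function) – A function for computing the loss, taking just predictions and targets. Should return all the losses, not a reduction (e.g. reduction="none").
predictions (torch.Tensor) – First argument to the loss function.
targets (torch.Tensor) – Second argument to the loss function.
length (torch.Tensor) – Length of each utterance, used to compute the mask. If None, the global average is computed and returned.
label_smoothing (float) – The proportion of label smoothing. Should only be used for NLL loss. Reference: Regularizing Neural Networks by Penalizing Confident Output Distributions. https://arxiv.org/abs/1701.06548
mask_shape (str) –
The shape of the mask. The default is "targets", which gives the mask the same shape as the targets.

Other options are "predictions" and "loss", which use the shape of the predictions and of the unreduced loss, respectively. These are useful for loss functions whose output does not match the shape of the targets.
reduction (str) – One of 'mean', 'batch', 'batchmean', 'none', where 'mean' returns a single value, 'batch' returns one value per item in the batch, 'batchmean' is the sum divided by the batch size, and 'none' returns all values.

Return type:

The masked loss.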

speechbrain.nnet.losses.compute_length_mask(data, length=None, len_dim=1)[source]

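Computes a length mask for the specified data shape.

Parameters:

data (torch.Tensor) – The data shape.
length (torch.Tensor) – The length of the corresponding data samples.
len_dim (int) – The length dimension (defaults to 1).

Returns:

mask – The mask.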

Return type:

torch.Tensor

Example

>>> data = torch.arange(5)[None, :, None].repeat(3, 1, 2)
>>> data += torch.arange(1, 4)[:, None, None]
>>> data *= torch.arange(1, 3)[None, None, :]
>>> data
tensor([[[ 1,  2],
         [ 2,  4],
         [ 3,  6],
         [ 4,  8],
         [ 5, 10]],

        [[ 2,  4],
         [ 3,  6],
         [ 4,  8],
         [ 5, 10],
         [ 6, 12]],

        [[ 3,  6],
         [ 4,  8],
         [ 5, 10],
         [ 6, 12],
         [ 7, 14]]])
>>> compute_length_mask(data, torch.tensor([1., .4, .8]))
tensor([[[1, 1],
         [1, 1],
         [1, 1],
         [1, 1],
         [1, 1]],

        [[1, 1],
         [1, 1],
         [0, 0],
         [0, 0],
         [0, 0]],

        [[1, 1],
         [1, 1],
         [1, 1],
         [1, 1],
         [0, 0]]])
>>> compute_length_mask(data, torch.tensor([.5, 1., .5]), len_dim=2)
tensor([[[1, 0],
         [1, 0],
         [1, 0],
         [1, 0],
         [1, 0]],

        [[1, 1],
         [1, 1],
         [1, 1],
         [1, 1],
         [1, 1]],

        [[1, 0],
         [1, 0],
         [1, 0],
         [1, 0],
         [1, 0]]])
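
The rule behind the first mask above can be sketched for the two-dimensional [batch, time] case. This is a plain-Python illustration with a hypothetical helper, `length_mask`; it assumes that lengths are fractions of the padded length and that position t is kept when t < length * max_len, which matches the doctest output.

```python
def length_mask(rel_lengths, max_len):
    """Return a [batch][max_len] list of 0/1 flags from relative lengths."""
    return [
        [1 if t < rel * max_len else 0 for t in range(max_len)]
        for rel in rel_lengths
    ]


mask = length_mask([1.0, 0.4, 0.8], 5)
```

In the real function the mask is additionally broadcast over the remaining dimensions so that it has the same shape as `data`.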

speechbrain.nnet.losses.reduce_loss(loss, mask, reduction='mean', label_smoothing=0.0, predictions=None, targets=None)[source]

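Performs the specified reduction of the raw loss value.

Parameters:

loss (function) – A function for computing the loss, taking just predictions and targets. Should return all the losses, not a reduction (e.g. reduction="none").
mask (torch.Tensor) – Mask to apply before computing the loss.
reduction (str) – One of 'mean', 'batch', 'batchmean', 'none', where 'mean' returns a single value, 'batch' returns one value per item in the batch, 'batchmean' is the sum divided by the batch size, and 'none' returns all values.
label_smoothing (float) – The proportion of label smoothing. Should only be used for NLL loss. Reference: Regularizing Neural Networks by Penalizing Confident Output Distributions. https://arxiv.org/abs/1701.06548
predictions (torch.Tensor) – First argument to the loss function. Required only if label smoothing is used.
targets (torch.Tensor) – Second argument to the loss function. Required only if label smoothing is used.

Return type:

The reduced loss.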

speechbrain.nnet.losses.get_si_snr_with_pitwrapper(source, estimate_source)[source]

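This function wraps the si_snr calculation with the speechbrain pit-wrapper.

Parameters:

source (torch.Tensor) – Shape is [B, T, C], where B is the batch size, T is the length of the sources, and C is the number of sources. The ordering is chosen so that this loss is compatible with the class PitWrapper.
estimate_source (torch.Tensor) – The estimated source, of shape [B, T, C].

Returns:

loss – The computed SNR.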

Return type:

torch.Tensor

Example

>>> x = torch.arange(600).reshape(3, 100, 2)
>>> xhat = x[:, :, (1, 0)]
>>> si_snr = -get_si_snr_with_pitwrapper(x, xhat)
>>> print(si_snr)
tensor([135.2284, 135.2284, 135.2284])

speechbrain.nnet.losses.get_snr_with_pitwrapper(source, estimate_source)[source]

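This function wraps the snr calculation with the speechbrain pit-wrapper.

Parameters:

source (torch.Tensor) – Shape is [B, T, E, C], where B is the batch size, T is the length of the sources, E is the binaural channels, and C is the number of sources. The ordering is chosen so that this loss is compatible with the class PitWrapper.
estimate_source (torch.Tensor) – The estimated source, of shape [B, T, E, C].

Returns:

loss – The computed SNR.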

Return type:

torch.Tensor

speechbrain.nnet.losses.cal_si_snr(source, estimate_source)[source]

Calculates SI-SNR.

Parameters:

source (torch.Tensor) – Shape is [T, B, C], where B is the batch size, T is the length of the sources, and C is the number of sources. The ordering is chosen so that this loss is compatible with the class PitWrapper.
estimate_source (torch.Tensor) – The estimated source, of shape [T, B, C].

Returns:

The calculated SI-SNR.

Example

>>> import numpy as np
>>> x = torch.Tensor([[1, 0], [123, 45], [34, 5], [2312, 421]])
>>> xhat = x[:, (1, 0)]
>>> x = x.unsqueeze(-1).repeat(1, 1, 2)
>>> xhat = xhat.unsqueeze(1).repeat(1, 2, 1)
>>> si_snr = -cal_si_snr(x, xhat)
>>> print(si_snr)
tensor([[[ 25.2142, 144.1789],
         [130.9283,  25.2142]]])
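
The quantity itself follows the usual SI-SNR definition: remove the means, project the estimate onto the source, and compare the energy of the projection with the energy of the residual. The sketch below applies that definition to one pair of signals in plain Python. `si_snr_db` and its `eps` constant are hypothetical, and the sketch returns the SI-SNR in dB; the examples on this page negate the library's return value, which suggests the library returns the negative SI-SNR for use as a loss.

```python
import math


def si_snr_db(source, estimate, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length signals."""
    n = len(source)
    s_mean, e_mean = sum(source) / n, sum(estimate) / n
    s = [v - s_mean for v in source]
    e = [v - e_mean for v in estimate]
    # Projection of the estimate onto the source direction.
    dot = sum(a * b for a, b in zip(e, s))
    energy = sum(a * a for a in s) + eps
    target = [dot * a / energy for a in s]
    noise = [a - b for a, b in zip(e, target)]
    ratio = sum(a * a for a in target) / (sum(a * a for a in noise) + eps)
    return 10.0 * math.log10(ratio + eps)


clean = [1.0, -2.0, 3.0, 0.5]
noisy = [1.1, -1.8, 2.7, 0.6]
value = si_snr_db(clean, noisy)
value_rescaled = si_snr_db(clean, [3.0 * v for v in noisy])
```

Rescaling the estimate leaves the value essentially unchanged, which is what "scale-invariant" refers to.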

speechbrain.nnet.losses.cal_snr(source, estimate_source)[source]

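Calculates binaural channel SNR.

Parameters:

source (torch.Tensor) – Shape is [T, E, B, C], where B is the batch size, T is the length of the sources, E is the binaural channels, and C is the number of sources. The ordering is chosen so that this loss is compatible with the class PitWrapper.
estimate_source (torch.Tensor) – The estimated source, of shape [T, E, B, C].

Return type:

Binaural channel SNR.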

speechbrain.nnet.losses.get_mask(source, source_lengths)[source]

Parameters:

source (torch.Tensor) – Shape [T, B, C]
source_lengths (torch.Tensor) – Shape [B]

Returns:

mask – Shape [T, B, 1]

Return type:

torch.Tensor

Example

>>> source = torch.randn(4, 3, 2)
>>> source_lengths = torch.Tensor([2, 1, 4]).int()
>>> mask = get_mask(source, source_lengths)
>>> print(mask)
tensor([[[1.],
         [1.],
         [1.]],

        [[1.],
         [0.],
         [1.]],

        [[0.],
         [0.],
         [1.]],

        [[0.],
         [0.],
         [1.]]])

class speechbrain.nnet.losses.AngularMargin(margin=0.0, scale=1.0)[source]

Bases: Module

An implementation of Angular Margin (AM) proposed in the following paper: '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)

Parameters:

margin (float) – The margin for cosine similarity.
scale (float) – The scale for cosine similarity.

Example

>>> pred = AngularMargin()
>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> targets = torch.tensor([ [1., 0.], [0., 1.], [ 1., 0.], [0.,  1.] ])
>>> predictions = pred(outputs, targets)
>>> predictions[:,0] > predictions[:,1]
tensor([ True, False,  True, False])

forward(outputs, targets)[source]

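Computes AM between two tensors.

Parameters:

outputs (torch.Tensor) – The outputs, of shape [N, C]; these are expected to be cosine similarities.
targets (torch.Tensor) – The targets, of shape [N, C], indicating where the margin is applied.

Returns:

predictions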

Return type:

torch.Tensor
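
One common way to realize an angular (cosine) margin is to subtract the margin from the target-class similarity and then rescale. The plain-Python sketch below shows that form on nested lists. It is an assumption about what the class computes, based on the parameter descriptions above, and `angular_margin` is a hypothetical helper.

```python
def angular_margin(outputs, targets, margin=0.0, scale=1.0):
    """Apply scale * (cos - margin) on target entries, scale * cos elsewhere.

    outputs: [N][C] cosine similarities; targets: [N][C] one-hot rows.
    """
    return [
        [scale * (o - margin * t) for o, t in zip(o_row, t_row)]
        for o_row, t_row in zip(outputs, targets)
    ]


outputs = [[1.0, -1.0], [-1.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
plain = angular_margin(outputs, targets)
with_margin = angular_margin(outputs, targets, margin=0.2, scale=32.0)
first_wins = [row[0] > row[1] for row in plain]
```

With a positive margin the correct class has to beat the others by at least that margin before the loss becomes small, which pushes embeddings of the same class closer together.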

class speechbrain.nnet.losses.AdditiveAngularMargin(margin=0.0, scale=1.0, easy_margin=False)[source]

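Bases: AngularMargin

An implementation of Additive Angular Margin (AAM) proposed in the following paper: '''Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition''' (https://arxiv.org/abs/1906.07317)

Parameters:

margin (float) – The margin for cosine similarity.
scale (float) – The scale for cosine similarity.
easy_margin (bool)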

Example

>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> targets = torch.tensor([ [1., 0.], [0., 1.], [ 1., 0.], [0.,  1.] ])
>>> pred = AdditiveAngularMargin()
>>> predictions = pred(outputs, targets)
>>> predictions[:,0] > predictions[:,1]
tensor([ True, False,  True, False])

forward(outputs, targets)[source]

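Computes AAM between two tensors.

Parameters:

outputs (torch.Tensor) – The outputs, of shape [N, C]; these are expected to be cosine similarities.
targets (torch.Tensor) – The targets, of shape [N, C], indicating where the margin is applied.

Returns:

predictions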

Return type:

torch.Tensor
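
In the additive variant the margin is added to the angle rather than subtracted from the cosine, so the target entry becomes cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m). The sketch below applies that identity in plain Python. It is a simplified illustration with a hypothetical helper, `additive_angular_margin`; the special handling that real implementations need when theta + m exceeds pi (which is presumably what `easy_margin` controls) is left out.

```python
import math


def additive_angular_margin(outputs, targets, margin=0.0, scale=1.0):
    """Replace target cosines by cos(theta + margin), then scale."""
    cos_m, sin_m = math.cos(margin), math.sin(margin)
    result = []
    for o_row, t_row in zip(outputs, targets):
        row = []
        for cosine, t in zip(o_row, t_row):
            sine = math.sqrt(max(0.0, 1.0 - cosine * cosine))
            phi = cosine * cos_m - sine * sin_m  # cos(theta + margin)
            row.append(scale * (t * phi + (1.0 - t) * cosine))
        result.append(row)
    return result


outputs = [[1.0, -1.0], [-1.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
no_margin = additive_angular_margin(outputs, targets)
aam = additive_angular_margin(outputs, targets, margin=0.3)
```

With `margin=0.3` the target similarity 0.9 drops to about 0.73, while non-target entries are left untouched.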

class speechbrain.nnet.losses.LogSoftmaxWrapper(loss_fn)[source]

Bases: Module

Parameters: loss_fn (Callable) – The LogSoftmax function to wrap.

Example

>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> outputs = outputs.unsqueeze(1)
>>> targets = torch.tensor([ [0], [1], [0], [1] ])
>>> log_prob = LogSoftmaxWrapper(nn.Identity())
>>> loss = log_prob(outputs, targets)
>>> 0 <= loss < 1
tensor(True)
>>> log_prob = LogSoftmaxWrapper(AngularMargin(margin=0.2, scale=32))
>>> loss = log_prob(outputs, targets)
>>> 0 <= loss < 1
tensor(True)
>>> outputs = torch.tensor([ [1., -1.], [-1., 1.], [0.9, 0.1], [0.1, 0.9] ])
>>> log_prob = LogSoftmaxWrapper(AdditiveAngularMargin(margin=0.3, scale=32))
>>> loss = log_prob(outputs, targets)
>>> 0 <= loss < 1
tensor(True)

forward(outputs, targets, length=None)[source]

Parameters:

outputs (torch.Tensor) – Network output tensor, of shape [batch, 1, outdim].
targets (torch.Tensor) – Target tensor, of shape [batch, 1].
length (torch.Tensor) – The lengths of the corresponding inputs.

Returns:

loss – Loss for the current examples.

Return type:

torch.Tensor
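
A rough picture of what such a wrapper does: build one-hot targets, let the wrapped function adjust the logits (for example by applying a margin), take a log-softmax, and average the negative log-probability of the correct class. The plain-Python sketch below follows that outline for [batch][classes] lists. It is an assumption about the flow, `wrapped_loss` is a hypothetical helper, and the real class may use a different but equivalent criterion.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    peak = max(row)
    lse = peak + math.log(sum(math.exp(v - peak) for v in row))
    return [v - lse for v in row]


def wrapped_loss(outputs, targets, loss_fn=None):
    """Mean negative log-probability of the target class after loss_fn."""
    n_classes = len(outputs[0])
    one_hot = [[1.0 if c == t else 0.0 for c in range(n_classes)] for t in targets]
    logits = loss_fn(outputs, one_hot) if loss_fn is not None else outputs
    total = 0.0
    for row, t in zip(logits, targets):
        total -= log_softmax(row)[t]
    return total / len(targets)


outputs = [[1.0, -1.0], [-1.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
wrapped_value = wrapped_loss(outputs, [0, 1, 0, 1])
```

Passing no `loss_fn` plays the role of `nn.Identity()` in the example above; the result is about 0.249, inside the [0, 1) range that the doctest checks.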

speechbrain.nnet.losses.ctc_loss_kd(log_probs, targets, input_lens, blank_index, device)[source]

Knowledge distillation for CTC loss.

Reference

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition. https://arxiv.org/abs/2005.09310

Parameters:

log_probs (torch.Tensor) – Predicted tensor from the student model, of shape [batch, time, chars].
targets (torch.Tensor) – Predicted tensor from a single teacher model, of shape [batch, time, chars].
input_lens (torch.Tensor) – Length of each utterance.
blank_index (int) – The location of the blank symbol among the character indices.
device (str) – Device used for the computation.

Return type:

The computed CTC loss.

speechbrain.nnet.losses.ce_kd(inp, target)[source]

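Simple version of distillation for cross-entropy loss.

Parameters:

inp (torch.Tensor) – The probabilities from the student model, of shape [batch_size * length, feature].
target (torch.Tensor) – The probabilities from the teacher model, of shape [batch_size * length, feature].

Return type:

The distilled outputs.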

speechbrain.nnet.losses.nll_loss_kd(probabilities, targets, rel_lab_lengths)[source]

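Knowledge distillation for negative log-likelihood loss.

Reference

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition. https://arxiv.org/abs/2005.09310

Parameters:

probabilities (torch.Tensor) – The predicted probabilities from the student model. Format is [batch, frames, p].
targets (torch.Tensor) – The target probabilities from the teacher model. Format is [batch, frames, p].
rel_lab_lengths (torch.Tensor) – Length of each utterance, if a frame-level loss is desired.

Return type:

The computed NLL KD loss.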

Example

>>> probabilities = torch.tensor([[[0.8, 0.2], [0.2, 0.8]]])
>>> targets = torch.tensor([[[0.9, 0.1], [0.1, 0.9]]])
>>> rel_lab_lengths = torch.tensor([1.])
>>> nll_loss_kd(probabilities, targets, rel_lab_lengths)
tensor(-0.7400)
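
The -0.7400 in the example is consistent with taking, for every valid frame, the negative sum over classes of teacher probability times student value, and then averaging over frames. The plain-Python sketch below reproduces the number under that reading. `nll_kd` is a hypothetical helper, and the masking rule (frame f is kept when f < length * frames) is an assumption.

```python
def nll_kd(probabilities, targets, rel_lengths):
    """Average over valid frames of -sum_c teacher[c] * student[c]."""
    total, frames = 0.0, 0
    for p_seq, t_seq, rel in zip(probabilities, targets, rel_lengths):
        max_len = len(p_seq)
        for f in range(max_len):
            if f < rel * max_len:
                total -= sum(t * p for p, t in zip(p_seq[f], t_seq[f]))
                frames += 1
    return total / frames


student = [[[0.8, 0.2], [0.2, 0.8]]]
teacher = [[[0.9, 0.1], [0.1, 0.9]]]
kd_value = nll_kd(student, teacher, [1.0])  # -(0.74 + 0.74) / 2
```

A negative loss only arises because the example feeds raw probabilities; with log-probabilities from the student, the same expression would be a soft-target cross-entropy and therefore non-negative. That reading is an inference from the arithmetic, not something this reference states.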

class speechbrain.nnet.losses.ContrastiveLoss(logit_temp)[source]

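Bases: Module

Contrastive loss as used in wav2vec2.

Reference

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations https://arxiv.org/abs/2006.11477

Parameters:

logit_temp (torch.Float) – A temperature by which the logits are divided.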

forward(x, y, negs)[source]

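Computes the contrastive loss.

Parameters:

x (torch.Tensor) – Encoded embeddings, of shape (B, T, C).
y (torch.Tensor) – Feature extractor target embeddings, of shape (B, T, C).
negs (torch.Tensor) – Negative embeddings from the feature extractor, of shape (N, B, T, C), where N is the number of negatives. Can be obtained with our sample_negatives function (see lobes/wav2vec2).

Returns:

loss (torch.Tensor) – The computed loss.
accuracy (torch.Tensor) – The computed accuracy.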

class speechbrain.nnet.losses.VariationalAutoencoderLoss(rec_loss=None, len_dim=1, dist_loss_weight=0.001)[source]

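Bases: Module

The Variational Autoencoder loss, with support for length masking.

From Autoencoding Variational Bayes: https://arxiv.org/pdf/1312.6114.pdf

Parameters:

rec_loss (callable) – A function or module to compute the reconstruction loss.
len_dim (int) – The dimension to be used for the length, if encoding sequences of variable length.
dist_loss_weight (float) – The relative weight of the distribution loss (K-L divergence).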

Example

>>> from speechbrain.nnet.autoencoders import VariationalAutoencoderOutput
>>> vae_loss = VariationalAutoencoderLoss(dist_loss_weight=0.5)
>>> predictions = VariationalAutoencoderOutput(
...     rec=torch.tensor(
...         [[0.8, 1.0],
...          [1.2, 0.6],
...          [0.4, 1.4]]
...         ),
...     mean=torch.tensor(
...         [[0.5, 1.0],
...          [1.5, 1.0],
...          [1.0, 1.4]],
...         ),
...     log_var=torch.tensor(
...         [[0.0, -0.2],
...          [2.0, -2.0],
...          [0.2,  0.4]],
...         ),
...     latent=torch.randn(3, 1),
...     latent_sample=torch.randn(3, 1),
...     latent_length=torch.tensor([1., 1., 1.]),
... )
>>> targets = torch.tensor(
...     [[0.9, 1.1],
...      [1.4, 0.6],
...      [0.2, 1.4]]
... )
>>> loss = vae_loss(predictions, targets)
>>> loss
tensor(1.1264)
>>> details = vae_loss.details(predictions, targets)
>>> details  
VariationalAutoencoderLossDetails(loss=tensor(1.1264),
                                  rec_loss=tensor(0.0333),
                                  dist_loss=tensor(2.1861),
                                  weighted_dist_loss=tensor(1.0930))
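
The four numbers in `details` can be reproduced by hand, which makes the role of `dist_loss_weight` concrete. The plain-Python sketch below assumes a squared-error reconstruction loss, the standard Gaussian K-L term -0.5 * (1 + log_var - mean^2 - exp(log_var)), and 'batchmean' reduction (sum divided by the batch size). `vae_loss_parts` is a hypothetical helper, and these assumptions were chosen because they match the values shown above.

```python
import math


def vae_loss_parts(rec, mean, log_var, targets, dist_loss_weight=0.001):
    """Return (loss, rec_loss, dist_loss, weighted_dist_loss)."""
    batch = len(rec)
    rec_loss = sum(
        (r - t) ** 2
        for r_row, t_row in zip(rec, targets)
        for r, t in zip(r_row, t_row)
    ) / batch
    dist_loss = sum(
        -0.5 * (1.0 + lv - m * m - math.exp(lv))
        for m_row, lv_row in zip(mean, log_var)
        for m, lv in zip(m_row, lv_row)
    ) / batch
    weighted = dist_loss_weight * dist_loss
    return rec_loss + weighted, rec_loss, dist_loss, weighted


rec = [[0.8, 1.0], [1.2, 0.6], [0.4, 1.4]]
mean = [[0.5, 1.0], [1.5, 1.0], [1.0, 1.4]]
log_var = [[0.0, -0.2], [2.0, -2.0], [0.2, 0.4]]
targets = [[0.9, 1.1], [1.4, 0.6], [0.2, 1.4]]
loss, rec_loss, dist_loss, weighted = vae_loss_parts(
    rec, mean, log_var, targets, dist_loss_weight=0.5
)
```

Here the K-L term (about 2.186) is far larger than the reconstruction term (about 0.033), which illustrates why the default weight of 0.001 is so small.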

forward(predictions, targets, length=None, reduction='batchmean')[source]

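Computes the forward pass.

Parameters:

predictions (speechbrain.nnet.autoencoders.VariationalAutoencoderOutput) – The variational autoencoder output.
targets (torch.Tensor) – The reconstruction targets.
length (torch.Tensor) – Length of each sample, used to compute the true error with a mask.
reduction (str) – The type of reduction to apply, default "batchmean".

Returns:

loss – The VAE loss (reconstruction + K-L divergence).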

Return type:

torch.Tensor

details(predictions, targets, length=None, reduction='batchmean')[source]

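Gets detailed information about the loss (useful for plotting, logs, etc.).

Parameters:

predictions (speechbrain.nnet.autoencoders.VariationalAutoencoderOutput) – The variational autoencoder output (or a tuple of rec, mean, log_var).
targets (torch.Tensor) – Targets for the reconstruction loss.
length (torch.Tensor) – Length of each sample, used to compute the true error with a mask.
reduction (str) – The type of reduction to apply, default "batchmean".

Returns:

details – A namedtuple with the following fields:

loss: torch.Tensor: the combined loss
rec_loss: torch.Tensor: the reconstruction loss
dist_loss: torch.Tensor: the distribution loss (K-L divergence), raw value
weighted_dist_loss: torch.Tensor: the weighted value of the distribution loss, as used in the combined loss

Return type:

VariationalAutoencoderLossDetails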

class speechbrain.nnet.losses.AutoencoderLoss(rec_loss=None, len_dim=1)[source]

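Bases: Module

An implementation of a standard (non-variational) autoencoder loss.

Parameters:

rec_loss (callable) – The callable used to compute the reconstruction loss.
len_dim (int) – The index of the dimension used for the length.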

Example

>>> from speechbrain.nnet.autoencoders import AutoencoderOutput
>>> ae_loss = AutoencoderLoss()
>>> rec = torch.tensor(
...   [[0.8, 1.0],
...    [1.2, 0.6],
...    [0.4, 1.4]]
... )
>>> predictions = AutoencoderOutput(
...     rec=rec,
...     latent=torch.randn(3, 1),
...     latent_length=torch.tensor([1., 1.])
... )
>>> targets = torch.tensor(
...     [[0.9, 1.1],
...      [1.4, 0.6],
...      [0.2, 1.4]]
... )
>>> ae_loss(predictions, targets)
tensor(0.0333)
>>> ae_loss.details(predictions, targets)
AutoencoderLossDetails(loss=tensor(0.0333), rec_loss=tensor(0.0333))

forward(predictions, targets, length=None, reduction='batchmean')[source]

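Computes the autoencoder loss.

Parameters:

predictions (speechbrain.nnet.autoencoders.AutoencoderOutput) – The autoencoder output.
targets (torch.Tensor) – Targets for the reconstruction loss.
length (torch.Tensor) – Length of each sample, used to compute the true error with a mask.
reduction (str) – The type of reduction to apply, default "batchmean".

Return type:

The computed loss.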

details(predictions, targets, length=None, reduction='batchmean')[source]

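Gets detailed information about the loss (useful for plotting, logs, etc.).

This is provided mainly to make the loss interchangeable with more complex autoencoder losses, such as the VAE loss.

Parameters:

predictions (speechbrain.nnet.autoencoders.AutoencoderOutput) – The autoencoder output.
targets (torch.Tensor) – Targets for the reconstruction loss.
length (torch.Tensor) – Length of each sample, used to compute the true error with a mask.
reduction (str) – The type of reduction to apply, default "batchmean".

Returns:

details – A namedtuple with the following fields:

loss: torch.Tensor: the combined loss
rec_loss: torch.Tensor: the reconstruction loss

Return type:

AutoencoderLossDetails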

class speechbrain.nnet.losses.VariationalAutoencoderLossDetails(loss, rec_loss, dist_loss, weighted_dist_loss)

Bases: tuple

dist_loss: Alias for field number 2

loss: Alias for field number 0

rec_loss: Alias for field number 1

weighted_dist_loss: Alias for field number 3

class speechbrain.nnet.losses.AutoencoderLossDetails(loss, rec_loss)

Bases: tuple

loss: Alias for field number 0

rec_loss: Alias for field number 1

class speechbrain.nnet.losses.Laplacian(kernel_size, dtype=torch.float32)[source]

Bases: Module

Computes the Laplacian for image-like data.

Parameters:

kernel_size (int) – The size of the Laplacian kernel.
dtype (torch.dtype) – The data type (optional).

Example

>>> lap = Laplacian(3)
>>> lap.get_kernel()
tensor([[[[-1., -1., -1.],
          [-1.,  8., -1.],
          [-1., -1., -1.]]]])
>>> data = torch.eye(6) + torch.eye(6).flip(0)
>>> data
tensor([[1., 0., 0., 0., 0., 1.],
        [0., 1., 0., 0., 1., 0.],
        [0., 0., 1., 1., 0., 0.],
        [0., 0., 1., 1., 0., 0.],
        [0., 1., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0., 1.]])
>>> lap(data.unsqueeze(0))
tensor([[[ 6., -3., -3.,  6.],
         [-3.,  4.,  4., -3.],
         [-3.,  4.,  4., -3.],
         [ 6., -3., -3.,  6.]]])
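
The output above is a plain "valid" sliding-window sum with the printed kernel, which can be checked in a few lines of Python. The helpers `laplacian_kernel` and `apply_valid` are hypothetical; the kernel rule (all entries -1, centre k*k - 1) is read off the 3x3 kernel shown above and is only assumed to hold for other sizes.

```python
def laplacian_kernel(size):
    """All -1 with a centre of size*size - 1, so the entries sum to zero."""
    kernel = [[-1.0] * size for _ in range(size)]
    kernel[size // 2][size // 2] = float(size * size - 1)
    return kernel


def apply_valid(image, kernel):
    """Slide the kernel over the image without padding."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# The same X-shaped pattern as torch.eye(6) + torch.eye(6).flip(0)
image = [
    [1.0 if (c == r or c == 5 - r) else 0.0 for c in range(6)] for r in range(6)
]
filtered = apply_valid(image, laplacian_kernel(3))
```

Because the kernel entries sum to zero, any constant region maps to zero; only edges and fine detail survive the filter.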

get_kernel()[source]: Computes the Laplacian kernel.

forward(data)[source]

Computes the Laplacian for image-like data.

Parameters: data (torch.Tensor) – A (B x C x W x H) or (B x C x H x W) tensor with image-like data.
Return type: The transformed outputs.

class speechbrain.nnet.losses.LaplacianVarianceLoss(kernel_size=3, len_dim=1)[source]

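Bases: Module

The Laplacian variance loss, used to penalize blurriness in image-like data such as spectrograms.

The loss value is the negative variance, because a higher variance means a sharper image.

Parameters:

kernel_size (int) – The size of the Laplacian kernel.
len_dim (int) – The dimension to be used as the length.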

Example

>>> lap_loss = LaplacianVarianceLoss(3)
>>> data = torch.ones(6, 6).unsqueeze(0)
>>> data
tensor([[[1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1., 1.]]])
>>> lap_loss(data)
tensor(-0.)
>>> data = (
...     torch.eye(6) + torch.eye(6).flip(0)
... ).unsqueeze(0)
>>> data
tensor([[[1., 0., 0., 0., 0., 1.],
         [0., 1., 0., 0., 1., 0.],
         [0., 0., 1., 1., 0., 0.],
         [0., 0., 1., 1., 0., 0.],
         [0., 1., 0., 0., 1., 0.],
         [1., 0., 0., 0., 0., 1.]]])
>>> lap_loss(data)
tensor(-17.6000)
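
The -17.6 follows directly from the Laplacian of the X-shaped image: take the 16 filtered values and negate their sample variance. The sketch below does just that in plain Python; it assumes the unbiased (n - 1) variance, which is the choice that reproduces the value shown, and it hardcodes the filtered values from the Laplacian example rather than recomputing them.

```python
from statistics import variance

# Laplacian of the X-shaped 6x6 image, flattened (values from the Laplacian example).
laplacian_output = [
    6.0, -3.0, -3.0, 6.0,
    -3.0, 4.0, 4.0, -3.0,
    -3.0, 4.0, 4.0, -3.0,
    6.0, -3.0, -3.0, 6.0,
]
sharp_loss = -variance(laplacian_output)  # sample variance, n - 1 denominator

# A constant image has an all-zero Laplacian, hence zero variance.
flat_loss = -variance([0.0] * 16)
```

Minimizing this loss rewards outputs whose Laplacian varies strongly, that is, outputs with crisp edges rather than smooth blur.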

forward(predictions, length=None, reduction=None)[source]

Computes the Laplacian loss.

Parameters:

predictions (torch.Tensor) – A (B x C x W x H) or (B x C x H x W) tensor.
length (torch.Tensor) – The length of the corresponding inputs.
reduction (str) – "batch" or None.

Returns:

loss – The loss value.

Return type:

torch.Tensor