speechbrain.lobes.models.L2I module

This file implements the classes and functions needed for the Listen-to-Interpret (L2I) interpretation method proposed in https://arxiv.org/abs/2202.11479v2.

Authors * Cem Subakan 2022 * Francesco Paissan 2022

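Summary

Classes:

`CNN14PSI_stft`	This class estimates a saliency map on the STFT domain, given classifier representations.
`CNN14PSI_stft_2d`	This class estimates the NMF activations to create a saliency map using the L2I framework.
`NMFDecoderAudio`	This class implements an NMF decoder.
`NMFEncoder`	This class implements an NMF encoder with a convolutional network.
`Psi`	Convolutional layers to estimate NMF activations from classifier representations.
`PsiOptimized`	Convolutional layers to estimate NMF activations from classifier representations, optimized for log-spectra.
`Theta`	This class implements a linear classifier on top of NMF activations.

Functions:

weights_init

Applies Xavier initialization to network weights.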

Reference

class speechbrain.lobes.models.L2I.Psi(n_comp=100, T=431, in_emb_dims=[2048, 1024, 512])[source]

Bases: Module

Convolutional layers to estimate NMF activations from classifier representations.

Parameters:

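n_comp (int) – Number of NMF components (or, equivalently, the number of neurons at the output per timestep).
T (int) – The targeted length along the time dimension.
in_emb_dims (list of int) – A list of length 3 containing the channel dimensions of the input representations. The list must match the number of channels in the input classifier representations, and the last entry should be the smallest.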

Example

>>> import torch
>>> from speechbrain.lobes.models.L2I import Psi
>>> inp = [torch.ones(2, 150, 6, 2), torch.ones(2, 100, 6, 2), torch.ones(2, 50, 12, 5)]
>>> psi = Psi(n_comp=100, T=120, in_emb_dims=[150, 100, 50])
>>> h = psi(inp)
>>> print(h.shape)
torch.Size([2, 100, 120])

forward(inp)[source]

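This forward function returns the NMF time activations given the classifier activations.

Parameters: inp (list) – A list of length 3 containing the classifier input representations.
Returns: The NMF time activations.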

class speechbrain.lobes.models.L2I.NMFDecoderAudio(n_comp=100, n_freq=513, device='cuda')[source]

Bases: Module

This class implements an NMF decoder.

Parameters:

n_comp (int) – Number of NMF components.
n_freq (int) – Number of frequency bins in the NMF dictionary.
device (str) – The device on which to run the model.

Example

>>> import torch
>>> from speechbrain.lobes.models.L2I import NMFDecoderAudio
>>> NMF_dec = NMFDecoderAudio(20, 210, device='cpu')
>>> H = torch.rand(1, 20, 150)
>>> Xhat = NMF_dec.forward(H)
>>> print(Xhat.shape)
torch.Size([1, 210, 150])

forward(H)[source]

The forward pass for NMF given the activations H.

Parameters:

H (torch.Tensor) – The activations tensor of shape B x n_comp x T, where B = batch size, n_comp = number of NMF components, and T = number of time points.

Returns:

output – The NMF outputs.

Return type:

torch.Tensor

return_W()[source]: This function returns the NMF dictionary.
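
As a rough illustration of what an NMF decoder computes, the sketch below reconstructs a spectrogram as the product of a dictionary and the activations. It uses NumPy rather than the SpeechBrain module. The dictionary `W`, its random values, and the plain matrix product are assumptions made for illustration; the actual `NMFDecoderAudio.forward` may differ in detail, for example in how it keeps the dictionary non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes taken from the NMFDecoderAudio example above.
B, n_comp, n_freq, T = 1, 20, 210, 150

# Hypothetical non-negative dictionary of shape n_freq x n_comp.
# The real module holds its own dictionary, exposed through return_W().
W = rng.random((n_freq, n_comp))

# Activations of shape B x n_comp x T, as expected by forward(H).
H = rng.random((B, n_comp, T))

# Batched matrix product: (n_freq, n_comp) @ (B, n_comp, T) -> (B, n_freq, T).
Xhat = W @ H

print(Xhat.shape)
```

Each column of `W` acts as a spectral template, and the corresponding row of `H` says how strongly that template is active at each time point.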

speechbrain.lobes.models.L2I.weights_init(m)[source]

Applies Xavier initialization to network weights.

Parameters: m (nn.Module) – Module to initialize.
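
For readers unfamiliar with Xavier (Glorot) initialization, here is a minimal NumPy sketch of the uniform variant, which draws weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The helper name `xavier_uniform` and the layer sizes are hypothetical. This page does not state whether `weights_init` uses the uniform or the normal variant, nor which layer types it touches, so treat this as an illustration of the scheme and not of the function itself.

```python
import math

import numpy as np


def xavier_uniform(fan_in, fan_out, rng):
    """Draw a (fan_out, fan_in) weight matrix from U(-a, a),
    where a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))


rng = np.random.default_rng(0)

# Hypothetical layer mapping 100 inputs to 50 outputs.
weights = xavier_uniform(fan_in=100, fan_out=50, rng=rng)

# Every sampled weight lies inside the Xavier bound for this layer.
bound = math.sqrt(6.0 / (100 + 50))
print(weights.shape, bool(np.abs(weights).max() <= bound))
```

In PyTorch, an initializer with the signature of `weights_init(m)` is typically passed to `Module.apply`, which calls it on every submodule.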

class speechbrain.lobes.models.L2I.PsiOptimized(dim=128, K=100, numclasses=50, use_adapter=False, adapter_reduce_dim=True)[source]

Bases: Module

Convolutional layers to estimate NMF activations from classifier representations, optimized for log-spectra.

Parameters:

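dim (int) – Dimension of the hidden representations (input to the classifier).
K (int) – Number of NMF components (or, equivalently, the number of neurons at the output per timestep).
numclasses (int) – Number of possible classes.
use_adapter (bool) – True if you wish to learn an adapter for the latent representations.
adapter_reduce_dim (bool) – True if the adapter should compress the latent representations.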

Example

>>> import torch
>>> from speechbrain.lobes.models.L2I import PsiOptimized
>>> inp = torch.randn(1, 256, 26, 32)
>>> psi = PsiOptimized(dim=256, K=100, use_adapter=False, adapter_reduce_dim=False)
>>> h, inp_ad = psi(inp)
>>> print(h.shape, inp_ad.shape)
torch.Size([1, 1, 417, 100]) torch.Size([1, 256, 26, 32])

forward(hs)[source]

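Computes the forward step.

Parameters: hs (torch.Tensor) – Latent representations (input to the classifier). Expected shape `torch.Size([B, C, H, W])`.
Returns: The NMF activations, of shape `torch.Size([B, 1, T, 100])`, and the adapted representations.
Return type: torch.Tensor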

class speechbrain.lobes.models.L2I.Theta(n_comp=100, T=431, num_classes=50)[source]

Bases: Module

This class implements a linear classifier on top of NMF activations.

Parameters:

n_comp (int) – Number of NMF components.
T (int) – Number of time points in the NMF activations.
num_classes (int) – Number of classes that the classifier works with.

Example

>>> import torch
>>> from speechbrain.lobes.models.L2I import Theta
>>> theta = Theta(30, 120, 50)
>>> H = torch.rand(1, 30, 120)
>>> c_hat = theta.forward(H)
>>> print(c_hat.shape)
torch.Size([1, 50])

forward(H)[source]

We first collapse the time axis, and then pass the result through the linear layer.

Parameters:

H (torch.Tensor) – The activations tensor of shape B x n_comp x T, where B = batch size, n_comp = number of NMF components, and T = number of time points.

Returns:

theta_out – Classifier output.

Return type:

torch.Tensor
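
The two steps described for `Theta.forward` (collapse the time axis, then apply a linear layer) can be sketched in NumPy as follows. The pooling weights `pool_w`, the classifier weights `cls_w`, and the choice of a weighted sum over time are all assumptions for illustration. This page does not say how the time axis is collapsed, nor whether any normalization is applied to the output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes taken from the Theta example above.
B, n_comp, T, num_classes = 1, 30, 120, 50

# NMF activations of shape B x n_comp x T.
H = rng.random((B, n_comp, T))

# Step 1: collapse the time axis. A weighted sum over T is assumed here;
# a plain mean, H.mean(axis=-1), would be another option.
pool_w = rng.standard_normal(T)  # hypothetical pooling weights
pooled = H @ pool_w  # shape B x n_comp

# Step 2: linear layer from n_comp features to num_classes scores.
cls_w = rng.standard_normal((num_classes, n_comp))  # hypothetical weights
c_hat = pooled @ cls_w.T  # shape B x num_classes

print(pooled.shape, c_hat.shape)
```

A pooling that depends on the sequence length would be one reason for `T` to be a constructor argument, though that is a guess from the signature.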

class speechbrain.lobes.models.L2I.NMFEncoder(n_freq, n_comp)[source]

Bases: Module

This class implements an NMF encoder with a convolutional network.

Parameters:

n_freq (int) – Number of frequency bins in the NMF dictionary.
n_comp (int) – Number of NMF components.

Example

>>> import torch
>>> from speechbrain.lobes.models.L2I import NMFEncoder
>>> nmfencoder = NMFEncoder(513, 100)
>>> X = torch.rand(1, 513, 240)
>>> Hhat = nmfencoder(X)
>>> print(Hhat.shape)
torch.Size([1, 100, 240])

forward(X)[source]

Parameters:

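X (torch.Tensor) – The input spectrogram tensor of shape B x n_freq x T, where B = batch size, n_freq = number of frequency bins in the input spectrogram, and T = number of time points.

Returns:

NMF encoded outputs.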

class speechbrain.lobes.models.L2I.CNN14PSI_stft(dim=128, K=100)[source]

Bases: Module

This class estimates a saliency map on the STFT domain, given classifier representations.

Parameters:

dim (int) – Dimensionality of the input representations.
K (int) – Defines the number of output channels in the saliency map.

Example

>>> import torch
>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> from speechbrain.lobes.models.L2I import CNN14PSI_stft
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI_stft(2048, 20)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 20, 207])

forward(hs, labels=None)[source]

Forward step. Estimates the NMF activations to be used to get the saliency mask.

Parameters:

hs (torch.Tensor) – Classifier's representations.
labels (torch.Tensor) – Predicted labels for the classifier's representations.

Returns:

xhat – Estimated NMF activation coefficients.

Return type:

torch.Tensor

class speechbrain.lobes.models.L2I.CNN14PSI_stft_2d(dim=128, K=100)[source]

Bases: Module

This class estimates the NMF activations to create a saliency map using the L2I framework.

Parameters:

dim (int) – Dimensionality of the input representations.
K (int) – Defines the number of output channels in the saliency map.

Example

>>> import torch
>>> from speechbrain.lobes.models.Cnn14 import Cnn14
>>> from speechbrain.lobes.models.L2I import CNN14PSI_stft_2d
>>> classifier_embedder = Cnn14(mel_bins=80, emb_dim=2048, return_reps=True)
>>> x = torch.randn(2, 201, 80)
>>> _, hs = classifier_embedder(x)
>>> psimodel = CNN14PSI_stft_2d(2048, 20)
>>> xhat = psimodel.forward(hs)
>>> print(xhat.shape)
torch.Size([2, 20, 207])

forward(hs, labels=None)[source]

Forward step. Estimates the NMF activations to be used to get the saliency mask.

Parameters:

hs (torch.Tensor) – Classifier's representations.
labels (torch.Tensor) – Predicted labels for the classifier's representations.

Returns:

xhat – Estimated NMF activation coefficients.

Return type:

torch.Tensor