speechbrain.lobes.models.conv_tasnet module

Implementation of a popular speech separation model.

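Summary

Classes:

`ChannelwiseLayerNorm`	Channel-wise layer normalization (cLN).
`Chomp1d`	Cuts out a portion of the signal from the end.
`Decoder`	Implements the decoder for ConvTasNet.
`DepthwiseSeparableConv`	Building block for the temporal blocks of MaskNet in ConvTasNet.
`Encoder`	Learns the adaptive front end for the ConvTasNet model.
`GlobalLayerNorm`	Global layer normalization (gLN).
`MaskNet`
`TemporalBlock`	The conv1d compound layers used in MaskNet.
`TemporalBlocksSequential`	A wrapper that replicates temporal block layers.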

Functions:

`choose_norm`	Returns the chosen normalization type.

Reference

class speechbrain.lobes.models.conv_tasnet.Encoder(L, N)

Bases: Module

This class learns the adaptive front end for the ConvTasNet model.

Parameters:

L (int) – The filter kernel size. Must be an odd number.
N (int) – Number of dimensions at the output of the adaptive front end.

Example

>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Encoder
>>> inp = torch.rand(10, 100)
>>> encoder = Encoder(11, 20)
>>> h = encoder(inp)
>>> h.shape
torch.Size([10, 20, 20])

forward(mixture)

Parameters: mixture (torch.Tensor) – Tensor of shape [M, T], where M is the batch size and T is the number of samples.
Returns: mixture_w – Tensor of shape [M, K, N], where K = (T-L)/(L/2)+1 = 2T/L-1.
Return type: torch.Tensor
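
The frame count K can be checked with plain arithmetic. The sketch below first evaluates the formula as documented, which corresponds to a stride of L/2 with no padding. It then evaluates a padded variant that reproduces the example shape [10, 20, 20] for T = 100 and L = 11. The padded variant assumes a stride of L // 2 and L // 2 zeros added on each side; this padding scheme is an assumption chosen to match the example and is not stated in this reference. Both helper names are hypothetical.

```python
def frames_no_padding(T, L):
    """Frame count from the documented formula K = (T - L) / (L / 2) + 1."""
    return (T - L) / (L / 2) + 1


def frames_with_padding(T, L):
    """Frame count assuming stride L // 2 and L // 2 zeros padded per side.

    The padding scheme is an assumption, picked to reproduce the example.
    """
    stride = L // 2
    pad = L // 2
    return (T + 2 * pad - L) // stride + 1


# Even L, no padding: agrees with the closed form 2T/L - 1.
print(frames_no_padding(100, 10))    # 19.0

# The Encoder example above: T = 100, L = 11 gives K = 20.
print(frames_with_padding(100, 11))  # 20
```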

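class speechbrain.lobes.models.conv_tasnet.Decoder(L, N)

Bases: Module

This class implements the decoder for ConvTasNet.

The separated source embeddings are fed to the decoder to reconstruct the estimated sources in the time domain.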

Parameters:

L (int) – Number of bases to use when reconstructing.
N (int) – Input size.

Example

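>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Decoder
>>> L, C, N = 8, 2, 8
>>> mixture_w = torch.randn(10, 100, N)
>>> est_mask = torch.randn(10, 100, C, N)
>>> decoder = Decoder(L, N)  # lowercase name avoids shadowing the class
>>> mixture_hat = decoder(mixture_w, est_mask)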
>>> mixture_hat.shape
torch.Size([10, 404, 2])

forward(mixture_w, est_mask)

Parameters:

mixture_w (torch.Tensor) – Tensor of shape [M, K, N].
est_mask (torch.Tensor) – Tensor of shape [M, K, C, N].

Returns:

est_source – Tensor of shape [M, T, C].

Return type:

torch.Tensor
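
The output length T = 404 in the example follows from K = 100 frames of length L = 8. The sketch below assumes the decoder overlap-adds the frames with a hop of L // 2 (50% overlap), which is an assumption consistent with the example shape. `overlap_add_1d` is a hypothetical pure-Python helper, not a SpeechBrain function.

```python
def overlap_add_1d(frames, hop):
    """Sum equal-length frames placed `hop` samples apart (hypothetical helper)."""
    frame_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out


K, L = 100, 8
frames = [[1.0] * L for _ in range(K)]        # K frames of L ones
signal = overlap_add_1d(frames, hop=L // 2)   # assumed 50% overlap

print(len(signal))            # 404 = (K - 1) * (L // 2) + L
print(signal[0], signal[4])   # 1.0 2.0 (one frame, then two overlapping frames)
```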

class speechbrain.lobes.models.conv_tasnet.TemporalBlocksSequential(input_shape, H, P, R, X, norm_type, causal)

Bases: Sequential

A wrapper that replicates temporal block layers.

Parameters:

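input_shape (tuple) – Expected shape of the input.
H (int) – The number of intermediate channels.
P (int) – The kernel size in the convolutions.
R (int) – The number of times to replicate the multilayer temporal blocks.
X (int) – The number of layers of temporal blocks with different dilations.
norm_type (str) – The type of normalization, in ['gLN', 'cLN'].
causal (bool) – To use causal or non-causal convolutions, in [True, False].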

Example

>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlocksSequential
>>> x = torch.randn(14, 100, 10)
>>> H, P, R, X = 10, 5, 2, 3
>>> temporal_blocks = TemporalBlocksSequential(
...     x.shape, H, P, R, X, 'gLN', False
... )
>>> y = temporal_blocks(x)
>>> y.shape
torch.Size([14, 100, 10])
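
To see how R and X interact, the sketch below lists the dilation of each block and the resulting receptive field. It assumes that block x within each repeat uses dilation 2**x with stride 1, as in the usual Conv-TasNet design. This schedule is an assumption and is not stated in this reference. Both helper names are hypothetical.

```python
def dilation_schedule(R, X):
    """Dilations for R repeats of X blocks, assuming block x uses 2**x."""
    return [2 ** x for _ in range(R) for x in range(X)]


def receptive_field(P, dilations):
    """Frames covered by a stack of stride-1 convolutions with kernel size P."""
    return 1 + sum((P - 1) * d for d in dilations)


# Same hyperparameters as the example above: P = 5, R = 2, X = 3.
dilations = dilation_schedule(R=2, X=3)
print(dilations)                       # [1, 2, 4, 1, 2, 4]
print(receptive_field(5, dilations))   # 57
```

Under this assumption, increasing X grows the receptive field exponentially, while increasing R grows it linearly.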

class speechbrain.lobes.models.conv_tasnet.MaskNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')

Bases: Module

Parameters:

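N (int) – Number of filters in the autoencoder.
B (int) – Number of channels in the bottleneck 1 × 1 convolution block.
H (int) – Number of channels in the convolutional blocks.
P (int) – Kernel size in the convolutional blocks.
X (int) – Number of convolutional blocks in each repeat.
R (int) – Number of repeats.
C (int) – Number of speakers.
norm_type (str) – One of 'BN', 'gLN', 'cLN'.
causal (bool) – Causal or non-causal.
mask_nonlinear (str) – Non-linear function used to generate the mask, in ['softmax', 'relu'].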

Example

>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import MaskNet
>>> N, B, H, P, X, R, C = 11, 12, 2, 5, 3, 1, 2
>>> mask_net = MaskNet(N, B, H, P, X, R, C)
>>> mixture_w = torch.randn(10, 11, 100)
>>> est_mask = mask_net(mixture_w)
>>> est_mask.shape
torch.Size([2, 10, 11, 100])
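
The two mask_nonlinear options produce masks with different ranges. The sketch below applies both to a plain list of scores: 'relu' gives non-negative, unbounded values, while 'softmax' gives values in (0, 1) that sum to 1 along the normalized axis. Which tensor axis MaskNet normalizes over is not stated in this reference, so the sketch makes no claim about it. The helper names and score values are hypothetical.

```python
import math


def relu_masks(scores):
    """Clamp each score at zero, independently of the others."""
    return [max(0.0, s) for s in scores]


def softmax_masks(scores):
    """Normalize the scores so the resulting masks sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.5, -0.5]  # hypothetical raw scores
print(relu_masks(scores))           # [1.5, 0.0]
print(sum(softmax_masks(scores)))   # sums to 1 up to floating-point error
```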

forward(mixture_w)

Keep this API the same as TasNet.

Parameters: mixture_w (torch.Tensor) – Tensor of shape [M, K, N], where M is the batch size.
Returns: est_mask – Tensor of shape [M, K, C, N].
Return type: torch.Tensor

Note: in the MaskNet example, the input has shape [10, 11, 100] and the output has shape [2, 10, 11, 100]. With C = 2, N = 11, and K = 100, these correspond to [M, N, K] and [C, M, N, K] rather than the axis order listed here.

class speechbrain.lobes.models.conv_tasnet.TemporalBlock(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)

Bases: Module

The conv1d compound layers used in MaskNet.

Parameters:

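input_shape (tuple) – The expected shape of the input.
out_channels (int) – The number of intermediate channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in the convolutional layers.
padding (str) – The type of padding in the convolutional layers, one of (same, valid, causal). If "valid", no padding is performed.
dilation (int) – Amount of dilation in the convolutional layers.
norm_type (str) – The type of normalization, in ['gLN', 'cLN'].
causal (bool) – To use causal or non-causal convolutions, in [True, False].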

Example

>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import TemporalBlock
>>> x = torch.randn(14, 100, 10)
>>> temporal_block = TemporalBlock(x.shape, 10, 11, 1, 'same', 1)
>>> y = temporal_block(x)
>>> y.shape
torch.Size([14, 100, 10])

forward(x)

Parameters: x (torch.Tensor) – Tensor of shape [M, K, B].
Returns: x – Tensor of shape [M, K, B].
Return type: torch.Tensor

class speechbrain.lobes.models.conv_tasnet.DepthwiseSeparableConv(input_shape, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)

Bases: Sequential

Building block for the temporal blocks of MaskNet in ConvTasNet.

Parameters:

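input_shape (tuple) – Expected shape of the input.
out_channels (int) – Number of output channels.
kernel_size (int) – The kernel size in the convolutions.
stride (int) – Convolution stride in the convolutional layers.
padding (str) – The type of padding in the convolutional layers, one of (same, valid, causal). If "valid", no padding is performed.
dilation (int) – Amount of dilation in the convolutional layers.
norm_type (str) – The type of normalization, in ['gLN', 'cLN'].
causal (bool) – To use causal or non-causal convolutions, in [True, False].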

Example

>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import DepthwiseSeparableConv
>>> x = torch.randn(14, 100, 10)
>>> DSconv = DepthwiseSeparableConv(x.shape, 10, 11, 1, 'same', 1)
>>> y = DSconv(x)
>>> y.shape
torch.Size([14, 100, 10])
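
A depthwise separable convolution is generally used to reduce the weight count of a dense convolution. The sketch below compares the two for the sizes in the example above (10 input channels, 10 output channels, kernel size 11). It assumes the standard factorization into a depthwise convolution (one filter per input channel) followed by a 1 × 1 pointwise convolution, and it ignores biases, activations, and normalization parameters. The helper names are hypothetical, and the exact layer composition of this class is not listed in this reference.

```python
def standard_conv_weights(c_in, c_out, kernel_size):
    """Weights of one dense 1-D convolution (no bias)."""
    return c_in * c_out * kernel_size


def separable_conv_weights(c_in, c_out, kernel_size):
    """Weights of a depthwise conv plus a 1x1 pointwise conv (no bias)."""
    depthwise = c_in * kernel_size   # one length-k filter per input channel
    pointwise = c_in * c_out         # 1x1 conv mixing the channels
    return depthwise + pointwise


print(standard_conv_weights(10, 10, 11))    # 1100
print(separable_conv_weights(10, 10, 11))   # 210
```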

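class speechbrain.lobes.models.conv_tasnet.Chomp1d(chomp_size)

Bases: Module

This class cuts out a portion of the signal from the end.

It is written as a class so that it can be incorporated into a sequential wrapper.

Parameters: chomp_size (int) – The size of the portion to discard (in samples).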

Example

>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import Chomp1d
>>> x = torch.randn(10, 110, 5)
>>> chomp = Chomp1d(10)
>>> x_chomped = chomp(x)
>>> x_chomped.shape
torch.Size([10, 100, 5])

forward(x)

Parameters: x (torch.Tensor) – Tensor of shape [M, Kpad, H].
Returns: x – Tensor of shape [M, K, H].
Return type: torch.Tensor
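
One common use of a chomp step is to make a symmetrically padded convolution causal. That usage is an assumption here, since this reference does not say where the class is applied. The sketch below pads a 1-D signal by (kernel_size - 1) * dilation on both sides, convolves it, and drops the trailing samples so that each output depends only on current and past inputs. `chomp` and `dilated_conv` are hypothetical pure-Python helpers.

```python
def chomp(frames, chomp_size):
    """Drop the last `chomp_size` entries along the time axis."""
    return frames[:-chomp_size]


def dilated_conv(padded, kernel, dilation):
    """Unpadded ('valid') 1-D correlation of `padded` with `kernel`."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * padded[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(len(padded) - span)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 1.0, 1.0]                       # sums three consecutive samples
pad = (len(kernel) - 1) * 1                    # (kernel_size - 1) * dilation
padded = [0.0] * pad + signal + [0.0] * pad    # symmetric padding

full = dilated_conv(padded, kernel, dilation=1)   # length T + pad
causal = chomp(full, pad)                         # back to length T

print(len(full), len(causal))   # 7 5
print(causal)                   # [1.0, 3.0, 6.0, 9.0, 12.0]
```

Each value in `causal` is the sum of the current sample and the two before it, so no future sample leaks into the output.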

speechbrain.lobes.models.conv_tasnet.choose_norm(norm_type, channel_size)

This function returns the chosen normalization type.

Parameters:

norm_type (str) – One of ['gLN', 'cLN', 'batchnorm'].
channel_size (int) – Number of channels.

Return type:

Constructed layer of the chosen type

Example

>>> from speechbrain.lobes.models.conv_tasnet import choose_norm
>>> choose_norm('gLN', 10)
GlobalLayerNorm()

class speechbrain.lobes.models.conv_tasnet.ChannelwiseLayerNorm(channel_size)

Bases: Module

Channel-wise layer normalization (cLN).

Parameters: channel_size (int) – Number of channels in the normalization dimension (the third dimension).

Example

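>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import ChannelwiseLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = ChannelwiseLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape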
torch.Size([2, 3, 3])

reset_parameters(): Resets the parameters.

forward(y)

Parameters: y – Tensor of shape [M, K, N], where M is the batch size, N is the channel size, and K is the length.
Returns: cLN_y – Tensor of shape [M, K, N].
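
As a rough illustration of cLN, the sketch below normalizes each time step of a single [K, N] item over its N channels. It is a pure-Python sketch under several assumptions: the learnable gain and bias are omitted (equivalent to gain 1 and bias 0), the variance is the biased estimate, and the EPS value is hypothetical. The function name is hypothetical as well.

```python
import math

EPS = 1e-8  # small stabilizing constant (assumed value)


def channelwise_layer_norm(y):
    """Normalize every frame of a [K, N] nested list over its N channels."""
    out = []
    for frame in y:
        mean = sum(frame) / len(frame)
        var = sum((v - mean) ** 2 for v in frame) / len(frame)
        out.append([(v - mean) / math.sqrt(var + EPS) for v in frame])
    return out


# K = 2 frames, N = 3 channels; the second frame is 10x louder.
y = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
cln_y = channelwise_layer_norm(y)

for frame in cln_y:
    print([round(v, 3) for v in frame])
```

Because the statistics are computed per frame, both frames end up with nearly identical normalized values despite their different scales.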

class speechbrain.lobes.models.conv_tasnet.GlobalLayerNorm(channel_size)

Bases: Module

Global layer normalization (gLN).

Parameters: channel_size (int) – Number of channels in the third dimension.

Example

>>> import torch
>>> from speechbrain.lobes.models.conv_tasnet import GlobalLayerNorm
>>> x = torch.randn(2, 3, 3)
>>> norm_func = GlobalLayerNorm(3)
>>> x_normalized = norm_func(x)
>>> x_normalized.shape
torch.Size([2, 3, 3])

reset_parameters(): Resets the parameters.

forward(y)

Parameters: y (torch.Tensor) – Tensor of shape [M, K, N], where M is the batch size, N is the channel size, and K is the length.
Returns: gLN_y – Tensor of shape [M, K, N].
Return type: torch.Tensor
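
As a rough illustration of gLN, the sketch below normalizes a single [K, N] item with one mean and one variance computed over all K × N values. It is a pure-Python sketch under several assumptions: the learnable gain and bias are omitted, the variance is the biased estimate, and the EPS value is hypothetical. The function name is hypothetical as well.

```python
import math

EPS = 1e-8  # small stabilizing constant (assumed value)


def global_layer_norm(y):
    """Normalize a [K, N] nested list with statistics over all K * N values."""
    values = [v for frame in y for v in frame]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = math.sqrt(var + EPS)
    return [[(v - mean) / scale for v in frame] for frame in y]


# K = 2 frames, N = 3 channels; the second frame is 10x louder.
y = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
gln_y = global_layer_norm(y)

for frame in gln_y:
    print([round(v, 3) for v in frame])
```

Because the statistics are shared across time and channels, the louder frame stays larger than the quiet one after normalization.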