speechbrain.lobes.models.dual_path module

Library to support dual-path speech separation.

Authors

Cem Subakan 2020
Mirco Ravanelli 2020
Samuele Cornell 2020
Mirko Bronzi 2020
Jianyuan Zhong 2020

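Summary

Classes:

`CumulativeLayerNorm`	Calculates cumulative layer normalization.
`DPTNetBlock`	The DPT Net block.
`Decoder`	A decoder layer that consists of ConvTranspose1d.
`Dual_Computation_Block`	Computation block for dual-path processing.
`Dual_Path_Model`	The dual-path model, which is the basis for dualpathrnn, sepformer, and dptnet.
`Encoder`	Convolutional encoder layer.
`FastTransformerBlock`	This block is used to implement fast transformer models with efficient attention.
`GlobalLayerNorm`	Calculates global layer normalization.
`IdentityBlock`	This block is used when we want an identity transformation within the Dual_path block.
`PyTorchPositionalEncoding`	Positional encoder for the PyTorch transformer.
`PytorchTransformerBlock`	A wrapper that uses the PyTorch transformer block.
`SBConformerEncoderBlock`	A wrapper for the SpeechBrain implementation of the ConformerEncoder.
`SBRNNBlock`	RNNBlock for the dual-path pipeline.
`SBTransformerBlock`	A wrapper for the SpeechBrain implementation of the transformer encoder.
`SepformerWrapper`	The wrapper for the sepformer model, which combines the encoder, masknet, and decoder https://arxiv.org/abs/2010.13154

Functions:

select_norm

Just a wrapper to select the normalization type.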

Reference

class speechbrain.lobes.models.dual_path.GlobalLayerNorm(dim, shape, eps=1e-08, elementwise_affine=True)[source]

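Bases: Module

Calculates global layer normalization.

Parameters:

dim (int or list or torch.Size) – Input shape, taken from the size of the expected input.
shape (tuple) – Expected shape of the input. The example below passes 3 for a three-dimensional input.
eps (float) – A value added to the denominator for numerical stability.
elementwise_affine (bool) – When set to True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases).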

Example

>>> x = torch.randn(5, 10, 20)
>>> GLN = GlobalLayerNorm(10, 3)
>>> x_norm = GLN(x)

forward(x)[source]

Returns the normalized tensor.

Parameters: x (torch.Tensor) – Tensor of size [N, C, K, S] or [N, C, L].
Returns: out – The normalized outputs.
Return type: torch.Tensor
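For intuition, global layer normalization of a three-dimensional input can be sketched as follows: the mean and variance are computed jointly over the channel and time axes of each example, and an optional per-channel affine transform is applied afterwards. This is a minimal sketch rather than the library's code; the helper name `global_layer_norm` and the [C, 1] shapes of the affine parameters are assumptions made for illustration.

```python
import torch


def global_layer_norm(x, weight=None, bias=None, eps=1e-8):
    """Hypothetical helper: normalize a [N, C, L] tensor over (C, L) per example."""
    mean = x.mean(dim=(1, 2), keepdim=True)                 # [N, 1, 1]
    var = ((x - mean) ** 2).mean(dim=(1, 2), keepdim=True)  # [N, 1, 1]
    out = (x - mean) / torch.sqrt(var + eps)
    if weight is not None:  # assumed shape [C, 1], broadcast over time
        out = out * weight
    if bias is not None:    # assumed shape [C, 1]
        out = out + bias
    return out


x = torch.randn(5, 10, 20)
y = global_layer_norm(x, torch.ones(10, 1), torch.zeros(10, 1))
print(y.shape)
```

With unit weights and zero biases, each example in `y` has approximately zero mean and unit variance across its channel and time axes.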

class speechbrain.lobes.models.dual_path.CumulativeLayerNorm(dim, elementwise_affine=True, eps=1e-08)[source]

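Bases: LayerNorm

Calculates cumulative layer normalization.

Parameters:

dim (int) – Dimension that you want to normalize.
elementwise_affine (bool) – Learnable per-element affine parameters.
eps (float) – A small value added to the denominator for numerical stability.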

Example

>>> x = torch.randn(5, 10, 20)
>>> CLN = CumulativeLayerNorm(10)
>>> x_norm = CLN(x)

forward(x)[source]

Returns the normalized tensor.

Parameters: x (torch.Tensor) – Tensor of size [N, C, K, S] or [N, C, L].
Returns: out – The normalized outputs.
Return type: torch.Tensor
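Because the class derives from LayerNorm and is constructed with the channel count, one plausible reading is that each time step is normalized over the channel axis. The sketch below illustrates that reading with `torch.nn.LayerNorm`; the helper name `channelwise_layer_norm` is hypothetical, and the actual implementation may differ.

```python
import torch
import torch.nn as nn


def channelwise_layer_norm(x, norm):
    """Hypothetical helper: apply LayerNorm over the channel axis C."""
    if x.dim() == 4:
        # [N, C, K, S] -> [N, K, S, C], normalize over C, restore the layout
        return norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
    # [N, C, L] -> [N, L, C], normalize over C, restore the layout
    return norm(x.transpose(1, 2)).transpose(1, 2)


norm = nn.LayerNorm(10, eps=1e-8)
x3 = torch.randn(5, 10, 20)
x4 = torch.randn(5, 10, 20, 4)
y3 = channelwise_layer_norm(x3, norm)
y4 = channelwise_layer_norm(x4, norm)
print(y3.shape, y4.shape)
```

Both outputs keep the layout of their inputs; only the statistics along the channel axis change.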

speechbrain.lobes.models.dual_path.select_norm(norm, dim, shape, eps=1e-08)[source]: Just a wrapper to select the normalization type.

class speechbrain.lobes.models.dual_path.Encoder(kernel_size=2, out_channels=64, in_channels=1)[source]

Bases: Module

Convolutional encoder layer.

Parameters:

kernel_size (int) – Length of the filters.
out_channels (int) – Number of output channels.
in_channels (int) – Number of input channels.

Example

>>> x = torch.randn(2, 1000)
>>> encoder = Encoder(kernel_size=4, out_channels=64)
>>> h = encoder(x)
>>> h.shape
torch.Size([2, 64, 499])

forward(x)[source]

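Returns the encoded output.

Parameters:

x (torch.Tensor) – Input tensor with dimensionality [B, L].

Returns:

x – Encoded tensor with dimensionality [B, N, T_out], where B = batch size, L = number of time points, N = number of filters, T_out = number of time points at the output of the encoder.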

Return type:

torch.Tensor
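The output length in the example above (499 frames for 1000 samples with kernel_size=4) is consistent with an unpadded 1-D convolution whose stride is kernel_size // 2. That stride is an assumption inferred from the example, not something stated in this reference; the helper name `conv1d_out_len` is hypothetical.

```python
def conv1d_out_len(length, kernel_size, stride):
    """Output length of an unpadded 1-D convolution with dilation 1."""
    return (length - kernel_size) // stride + 1


kernel_size = 4
t_out = conv1d_out_len(1000, kernel_size, stride=kernel_size // 2)  # assumed stride
print(t_out)  # 499
```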

class speechbrain.lobes.models.dual_path.Decoder(*args, **kwargs)[source]

Bases: ConvTranspose1d

A decoder layer that consists of ConvTranspose1d.

Parameters:

*args (tuple)
**kwargs (dict) – Arguments passed through to nn.ConvTranspose1d.

Example

>>> x = torch.randn(2, 100, 1000)
>>> decoder = Decoder(kernel_size=4, in_channels=100, out_channels=1)
>>> h = decoder(x)
>>> h.shape
torch.Size([2, 1003])

forward(x)[source]

Returns the decoded output.

Parameters:

x (torch.Tensor) – Input tensor with dimensionality [B, N, L], where B = batch size, N = number of filters, L = number of time points.

Returns:

out – The decoded outputs.

Return type:

torch.Tensor
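The length of 1003 in the example above follows from the standard output-length formula of a transposed 1-D convolution with the default stride of 1. The sketch below evaluates that formula; the helper name is hypothetical, and the second call uses an illustrative stride-8 configuration that is not taken from this reference.

```python
def conv_transpose1d_out_len(
    length, kernel_size, stride=1, padding=0, output_padding=0, dilation=1
):
    """Output length of a 1-D transposed convolution."""
    return (
        (length - 1) * stride
        - 2 * padding
        + dilation * (kernel_size - 1)
        + output_padding
        + 1
    )


dec_len = conv_transpose1d_out_len(1000, kernel_size=4)
print(dec_len)  # 1003

# Illustrative only: 19 frames, kernel 16, stride 8 map back to 160 samples.
strided_len = conv_transpose1d_out_len(19, kernel_size=16, stride=8)
print(strided_len)  # 160
```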

class speechbrain.lobes.models.dual_path.IdentityBlock[source]

Bases: object

This block is used when we want an identity transformation within the Dual_path block.

Parameters: **kwargs (dict) – Arguments are ignored.

Example

>>> x = torch.randn(10, 100)
>>> IB = IdentityBlock()
>>> xhat = IB(x)

class speechbrain.lobes.models.dual_path.FastTransformerBlock(attention_type, out_channels, num_layers=6, nhead=8, d_ffn=1024, dropout=0, activation='relu', reformer_bucket_size=32)[source]

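Bases: Module

This block is used to implement fast transformer models with efficient attention.

The implementation is taken from https://fast-transformers.github.io/

Parameters:

attention_type (str) – Specifies the type of attention. See https://fast-transformers.github.io/ for details.
out_channels (int) – Dimensionality of the representation.
num_layers (int) – Number of layers.
nhead (int) – Number of attention heads.
d_ffn (int) – Dimensionality of the positional feed-forward layer.
dropout (float) – Dropout rate.
activation (str) – Activation function.
reformer_bucket_size (int) – Bucket size for the reformer.

Example

# >>> x = torch.randn(10, 100, 64)
# >>> block = FastTransformerBlock('linear', 64)
# >>> x = block(x)
# >>> x.shape
# torch.Size([10, 100, 64])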

forward(x)[source]

Returns the transformed output.

Parameters:

x (torch.Tensor) – Tensor of shape [B, L, N], where B = batch size, L = number of time points, N = number of filters.

Returns:

out – The transformed output.

Return type:

torch.Tensor

class speechbrain.lobes.models.dual_path.PyTorchPositionalEncoding(d_model, dropout=0.1, max_len=5000)[source]

Bases: Module

Positional encoder for the PyTorch transformer.

Parameters:

d_model (int) – Dimensionality of the representation.
dropout (float) – Dropout probability.
max_len (int) – Maximum sequence length.

Example

>>> x = torch.randn(10, 100, 64)
>>> enc = PyTorchPositionalEncoding(64)
>>> x = enc(x)

forward(x)[source]

Returns the encoded output.

Parameters:

x (torch.Tensor) – Tensor of shape [B, L, N], where B = batch size, L = number of time points, N = number of filters.

Returns:

out – The encoded output.

Return type:

torch.Tensor
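This reference does not say which encoding scheme the class uses. As an illustration only, the sketch below builds the sinusoidal table commonly paired with transformer encoders and adds it along the time axis of a [B, L, N] input; the helper name `sinusoidal_table` is hypothetical, dropout is omitted, and the library's actual indexing may differ.

```python
import math

import torch


def sinusoidal_table(max_len, d_model):
    """Build a [max_len, d_model] table of sinusoidal codes (d_model assumed even)."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # [max_len, 1]
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even feature indices
    pe[:, 1::2] = torch.cos(position * div_term)  # odd feature indices
    return pe


x = torch.randn(10, 100, 64)   # [B, L, N]
pe = sinusoidal_table(5000, 64)
y = x + pe[: x.size(1)]        # broadcast the first L rows over the batch
print(y.shape)
```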

class speechbrain.lobes.models.dual_path.PytorchTransformerBlock(out_channels, num_layers=6, nhead=8, d_ffn=2048, dropout=0.1, activation='relu', use_positional_encoding=True)[source]

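Bases: Module

A wrapper that uses the PyTorch transformer block.

Parameters:

out_channels (int) – Dimensionality of the representation.
num_layers (int) – Number of layers.
nhead (int) – Number of attention heads.
d_ffn (int) – Dimensionality of the positional feed-forward layer.
dropout (float) – Dropout rate.
activation (str) – Activation function.
use_positional_encoding (bool) – If True, positional encoding is used.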

Example

>>> x = torch.randn(10, 100, 64)
>>> block = PytorchTransformerBlock(64)
>>> x = block(x)
>>> x.shape
torch.Size([10, 100, 64])

forward(x)[source]

Returns the transformed output.

Parameters:

x (torch.Tensor) – Tensor of shape [B, L, N], where B = batch size, L = number of time points, N = number of filters.

Returns:

out – The transformed output.

Return type:

torch.Tensor

class speechbrain.lobes.models.dual_path.SBTransformerBlock(num_layers, d_model, nhead, d_ffn=2048, input_shape=None, kdim=None, vdim=None, dropout=0.1, activation='relu', use_positional_encoding=False, norm_before=False, attention_type='regularMHA')[source]

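Bases: Module

A wrapper for the SpeechBrain implementation of the transformer encoder.

Parameters:

num_layers (int) – Number of layers.
d_model (int) – Dimensionality of the representation.
nhead (int) – Number of attention heads.
d_ffn (int) – Dimensionality of the positional feed-forward layer.
input_shape (tuple) – Shape of the input.
kdim (int) – Dimension of the key (optional).
vdim (int) – Dimension of the value (optional).
dropout (float) – Dropout rate.
activation (str) – Activation function.
use_positional_encoding (bool) – If True, positional encoding is used.
norm_before (bool) – Use normalization before the transformations.
attention_type (str) – Type of attention to use; the default is "regularMHA".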

Example

>>> x = torch.randn(10, 100, 64)
>>> block = SBTransformerBlock(1, 64, 8)
>>> x = block(x)
>>> x.shape
torch.Size([10, 100, 64])

forward(x)[source]

Returns the transformed output.

Parameters:

x (torch.Tensor) – Tensor of shape [B, L, N], where B = batch size, L = number of time points, N = number of filters.

Returns:

out – The transformed output.

Return type:

torch.Tensor

class speechbrain.lobes.models.dual_path.SBRNNBlock(input_size, hidden_channels, num_layers, rnn_type='LSTM', dropout=0, bidirectional=True)[source]

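Bases: Module

RNNBlock for the dual-path pipeline.

Parameters:

input_size (int) – Dimensionality of the input features.
hidden_channels (int) – Dimensionality of the latent layer of the RNN.
num_layers (int) – Number of RNN layers.
rnn_type (str) – Type of the RNN cell.
dropout (float) – Dropout rate.
bidirectional (bool) – If True, the RNN is bidirectional.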

Example

>>> x = torch.randn(10, 100, 64)
>>> rnn = SBRNNBlock(64, 100, 1, bidirectional=True)
>>> x = rnn(x)
>>> x.shape
torch.Size([10, 100, 200])

forward(x)[source]

Returns the transformed output.

Parameters:

x (torch.Tensor) – Tensor of shape [B, L, N], where B = batch size, L = number of time points, N = number of filters.

Returns:

out – The transformed output.

Return type:

torch.Tensor

class speechbrain.lobes.models.dual_path.DPTNetBlock(d_model, nhead, dim_feedforward=256, dropout=0, activation='relu')[source]

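Bases: Module

The DPT Net block.

Parameters:

d_model (int) – Number of expected features in the input (required).
nhead (int) – Number of heads in the multi-head attention model (required).
dim_feedforward (int) – Dimension of the feed-forward network model (default=256).
dropout (float) – Dropout value (default=0).
activation (str) – Activation function of the intermediate layer, relu or gelu (default=relu).

Example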

>>> encoder_layer = DPTNetBlock(d_model=512, nhead=8)
>>> src = torch.rand(10, 100, 512)
>>> out = encoder_layer(src)
>>> out.shape
torch.Size([10, 100, 512])

forward(src)[source]

Passes the input through the encoder layer.

Parameters:

src (torch.Tensor) – Tensor of shape [B, L, N], where B = batch size, L = number of time points, N = number of filters.

Returns:

Encoded outputs.

class speechbrain.lobes.models.dual_path.Dual_Computation_Block(intra_mdl, inter_mdl, out_channels, norm='ln', skip_around_intra=True, linear_layer_after_inter_intra=True)[source]

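Bases: Module

Computation block for dual-path processing.

Parameters:

intra_mdl (torch.nn.module) – Model to process within the chunks.
inter_mdl (torch.nn.module) – Model to process across the chunks.
out_channels (int) – Dimensionality of the inter/intra model.
norm (str) – Normalization type.
skip_around_intra (bool) – Whether to use a skip connection around the intra layer.
linear_layer_after_inter_intra (bool) – Whether to use a linear layer after the inter or intra model.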

Example

>>> intra_block = SBTransformerBlock(1, 64, 8)
>>> inter_block = SBTransformerBlock(1, 64, 8)
>>> dual_comp_block = Dual_Computation_Block(intra_block, inter_block, 64)
>>> x = torch.randn(10, 64, 100, 10)
>>> x = dual_comp_block(x)
>>> x.shape
torch.Size([10, 64, 100, 10])

forward(x)[source]

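Returns the output tensor.

Parameters:

x (torch.Tensor) – Input tensor of dimension [B, N, K, S].

Returns:

out – Output tensor of dimension [B, N, K, S], where B = batch size, N = number of filters, K = number of time points in each chunk, S = number of chunks.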

Return type:

torch.Tensor
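The [B, N, K, S] layout lets the intra model run along the K frames inside each chunk and the inter model run along the S chunks. The sketch below shows only the two reshapes, with `nn.Identity` standing in for both models; it omits the normalization, linear layers, and skip connections controlled by the parameters above, and it is an illustration rather than the library's code.

```python
import torch
import torch.nn as nn

B, N, K, S = 10, 64, 100, 10
x = torch.randn(B, N, K, S)
intra_model = nn.Identity()  # stand-in for intra_mdl, fed [batch, time, features]
inter_model = nn.Identity()  # stand-in for inter_mdl

# Intra pass: each of the B*S chunks is treated as a sequence of K frames.
intra = x.permute(0, 3, 2, 1).reshape(B * S, K, N)
intra = intra_model(intra)
intra = intra.reshape(B, S, K, N).permute(0, 3, 2, 1)  # back to [B, N, K, S]

# Inter pass: each of the B*K in-chunk positions is a sequence of S chunks.
inter = intra.permute(0, 2, 3, 1).reshape(B * K, S, N)
inter = inter_model(inter)
out = inter.reshape(B, K, S, N).permute(0, 3, 1, 2)    # back to [B, N, K, S]
print(out.shape)
```

With identity stand-ins the output equals the input, which is a quick way to check that the two permutations invert each other.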

class speechbrain.lobes.models.dual_path.Dual_Path_Model(in_channels, out_channels, intra_model, inter_model, num_layers=1, norm='ln', K=200, num_spks=2, skip_around_intra=True, linear_layer_after_inter_intra=True, use_global_pos_enc=False, max_length=20000)[source]

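Bases: Module

The dual-path model, which is the basis for dualpathrnn, sepformer, and dptnet.

Parameters:

in_channels (int) – Number of channels at the output of the encoder.
out_channels (int) – Number of channels fed to the intra and inter blocks.
intra_model (torch.nn.module) – Model to process within the chunks.
inter_model (torch.nn.module) – Model to process across the chunks.
num_layers (int) – Number of layers of the dual computation block.
norm (str) – Normalization type.
K (int) – Chunk length.
num_spks (int) – Number of sources (speakers).
skip_around_intra (bool) – Whether to use a skip connection around the intra model.
linear_layer_after_inter_intra (bool) – Whether to use a linear layer after the inter and intra models.
use_global_pos_enc (bool) – Whether to use global positional encodings.
max_length (int) – Maximum sequence length.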

Example

>>> intra_block = SBTransformerBlock(1, 64, 8)
>>> inter_block = SBTransformerBlock(1, 64, 8)
>>> dual_path_model = Dual_Path_Model(64, 64, intra_block, inter_block, num_spks=2)
>>> x = torch.randn(10, 64, 2000)
>>> x = dual_path_model(x)
>>> x.shape
torch.Size([2, 10, 64, 2000])

forward(x)[source]

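Returns the output tensor.

Parameters:

x (torch.Tensor) – Input tensor of dimension [B, N, L].

Returns:

out – Output tensor of dimension [spks, B, N, L], where spks = number of speakers, B = batch size, N = number of filters, L = number of time points.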

Return type:

torch.Tensor
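To make the role of the chunk length K concrete, the sketch below splits a [B, N, L] tensor into overlapping chunks with `Tensor.unfold`. The 50% overlap is an assumption that this reference does not state, padding of the sequence ends is omitted, and the library may segment the input differently.

```python
import torch

B, N, L, K = 10, 64, 2000, 200
hop = K // 2                     # assumed 50% overlap between consecutive chunks
x = torch.randn(B, N, L)

chunks = x.unfold(-1, K, hop)    # [B, N, S, K]
chunks = chunks.transpose(2, 3)  # [B, N, K, S]
S = chunks.size(-1)
print(chunks.shape, S)
```

Under these assumptions, chunk s covers samples s * hop through s * hop + K - 1 of the original sequence.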

class speechbrain.lobes.models.dual_path.SepformerWrapper(encoder_kernel_size=16, encoder_in_nchannels=1, encoder_out_nchannels=256, masknet_chunksize=250, masknet_numlayers=2, masknet_norm='ln', masknet_useextralinearlayer=False, masknet_extraskipconnection=True, masknet_numspks=2, intra_numlayers=8, inter_numlayers=8, intra_nhead=8, inter_nhead=8, intra_dffn=1024, inter_dffn=1024, intra_use_positional=True, inter_use_positional=True, intra_norm_before=True, inter_norm_before=True)[source]

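Bases: Module

The wrapper for the sepformer model, which combines the encoder, masknet, and decoder https://arxiv.org/abs/2010.13154

Parameters:

encoder_kernel_size (int) – Kernel size used in the encoder.
encoder_in_nchannels (int) – Number of channels of the input audio.
encoder_out_nchannels (int) – Number of filters used in the encoder. This is also the number of channels fed to the intra and inter blocks.
masknet_chunksize (int) – Chunk length that is processed by the intra blocks.
masknet_numlayers (int) – Number of layers of combined intra and inter blocks.
masknet_norm (str) – Normalization type used in the masknet. Should be one of 'ln' (layer normalization), 'gln' (global layer normalization), 'cln' (cumulative layer normalization), or 'bn' (batch normalization). See the select_norm function above for more details.
masknet_useextralinearlayer (bool) – Whether to use a linear layer at the output of the intra and inter blocks.
masknet_extraskipconnection (bool) – Introduces an extra skip connection around the intra block.
masknet_numspks (int) – Number of speakers to estimate.
intra_numlayers (int) – Number of layers in the intra block.
inter_numlayers (int) – Number of layers in the inter block.
intra_nhead (int) – Number of parallel attention heads in the intra block.
inter_nhead (int) – Number of parallel attention heads in the inter block.
intra_dffn (int) – Number of dimensions of the positional feed-forward model in the intra block.
inter_dffn (int) – Number of dimensions of the positional feed-forward model in the inter block.
intra_use_positional (bool) – Whether to use positional encodings in the intra block.
inter_use_positional (bool) – Whether to use positional encodings in the inter block.
intra_norm_before (bool) – Whether to use normalization before the transformations in the intra block.
inter_norm_before (bool) – Whether to use normalization before the transformations in the inter block.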

Example

>>> model = SepformerWrapper()
>>> inp = torch.rand(1, 160)
>>> result = model.forward(inp)
>>> result.shape
torch.Size([1, 160, 2])

reset_layer_recursively(layer)[source]: Reinitializes the parameters of the network.

forward(mix)[source]: Processes the input tensor mix and returns an output tensor.
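The encoder, masknet, and decoder can be thought of as an encode, mask, decode pipeline. The sketch below reproduces only the tensor shapes of the example above ([1, 160] in, [1, 160, 2] out) using plain PyTorch layers. Random masks stand in for the mask network; the stride of kernel_size // 2 and the ReLU after the encoder are assumptions, so treat this as an illustration rather than the wrapper's implementation.

```python
import torch
import torch.nn as nn

num_spks, n_filters, kernel = 2, 256, 16
stride = kernel // 2  # assumed, not stated in this reference
encoder = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride, bias=False)

mix = torch.rand(1, 160)                       # [B, T]
mix_w = torch.relu(encoder(mix.unsqueeze(1)))  # [B, N, T_enc]

# Stand-in for the mask network output, shaped [spks, B, N, T_enc].
masks = torch.rand(num_spks, *mix_w.shape)
sep_w = mix_w.unsqueeze(0) * masks             # one masked encoding per speaker

# Decode each speaker and stack the estimates on the last axis: [B, T, spks].
est = torch.stack(
    [decoder(sep_w[i]).squeeze(1) for i in range(num_spks)], dim=-1
)
print(est.shape)
```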

class speechbrain.lobes.models.dual_path.SBConformerEncoderBlock(num_layers, d_model, nhead, d_ffn=2048, input_shape=None, kdim=None, vdim=None, dropout=0.1, activation='swish', kernel_size=31, bias=True, use_positional_encoding=True, attention_type='RelPosMHAXL')[source]

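Bases: Module

A wrapper for the SpeechBrain implementation of the ConformerEncoder.

Parameters:

num_layers (int) – Number of layers.
d_model (int) – Dimensionality of the representation.
nhead (int) – Number of attention heads.
d_ffn (int) – Dimensionality of the positional feed-forward layer.
input_shape (tuple) – Shape of the input.
kdim (int) – Dimension of the key (optional).
vdim (int) – Dimension of the value (optional).
dropout (float) – Dropout rate.
activation (str) – Activation function.
kernel_size (int) – Kernel size in the conformer encoder.
bias (bool) – Whether to use a bias in the convolutional part of the conformer encoder.
use_positional_encoding (bool) – If True, positional encoding is used.
attention_type (str) – Type of attention to use; the default is "RelPosMHAXL".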

Example

>>> x = torch.randn(10, 100, 64)
>>> block = SBConformerEncoderBlock(1, 64, 8)
>>> from speechbrain.lobes.models.transformer.Transformer import PositionalEncoding
>>> pos_enc = PositionalEncoding(64)
>>> pos_embs = pos_enc(torch.ones(1, 199, 64))
>>> x = block(x)
>>> x.shape
torch.Size([10, 100, 64])

forward(x)[source]

Returns the transformed output.

Parameters:

x (torch.Tensor) – Tensor of shape [B, L, N], where B = batch size, L = number of time points, N = number of filters.

Returns:

Transformed output.