speechbrain.lobes.models.convolution module

A module to assemble a convolutional (depthwise) encoder with or without residual connections.

Authors

Jianyuan Zhong 2020
Titouan Parcollet 2023

Summary

Classes:

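`ConvBlock`	An implementation of a convolution block with 1D or 2D convolutions (depthwise).
`ConvolutionFrontEnd`	A module to assemble a convolutional (depthwise) encoder with or without residual connections.
`ConvolutionalSpatialGatingUnit`	Implements the CSGU defined in: Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding.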

Reference

class speechbrain.lobes.models.convolution.ConvolutionalSpatialGatingUnit(input_size, kernel_size=31, dropout=0.0, use_linear_after_conv=False, activation=<class 'torch.nn.modules.linear.Identity'>)[source]

Bases: Module

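This module implements the CSGU defined in: Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding.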

The code is heavily inspired by the original ESPnet implementation.

Parameters:

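input_size (int) – Size of the feature (channel) dimension.
kernel_size (int, optional) – Size of the kernel (default 31).
dropout (float, optional) – Dropout rate applied to the output (default 0.0).
use_linear_after_conv (bool, optional) – If True, apply a linear transformation of size input_size//2 after the convolution.
activation (torch class, optional) – Activation function applied to the gate (default Identity).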

Example

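>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvolutionalSpatialGatingUnit
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionalSpatialGatingUnit(input_size=x.shape[-1])
>>> out = conv(x)
>>> out.shape
torch.Size([8, 30, 5])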

forward(x)[source]

Parameters: x (torch.Tensor) – Input tensor of shape (B, T, D).
Returns: out – The processed output, of shape (B, T, D//2).
Return type: torch.Tensor
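A framework-free sketch of the gating that forward performs, assuming it follows the ESPnet-style CSGU: split the D channels into two halves, normalize the second half, filter it depthwise along time, pass it through the activation, and multiply it element-wise with the first half. This is why the output has D//2 channels. The helpers layer_norm, depthwise_conv_time and csgu are hypothetical names; learned LayerNorm parameters, the optional linear layer and dropout are left out.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]  # one sequence laid out as (T, D)


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one time step over its channels (no learned scale or shift)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def depthwise_conv_time(x: Matrix, kernels: Matrix) -> Matrix:
    """Filter each channel independently along time, with zero 'same' padding.

    kernels[c] is the odd-length kernel of channel c (no bias term).
    """
    T, C = len(x), len(x[0])
    out = [[0.0] * C for _ in range(T)]
    for c in range(C):
        half = len(kernels[c]) // 2
        for t in range(T):
            for j, w in enumerate(kernels[c]):
                src = t + j - half
                if 0 <= src < T:
                    out[t][c] += w * x[src][c]
    return out


def csgu(
    x: Matrix,
    kernels: Matrix,
    activation: Callable[[float], float] = lambda v: v,  # Identity by default
) -> Matrix:
    """Gate the first half of the channels with a filtered second half."""
    half = len(x[0]) // 2                         # D is assumed to be even
    content = [row[:half] for row in x]
    gate = [layer_norm(row[half:]) for row in x]  # normalize the gate half
    gate = depthwise_conv_time(gate, kernels)     # mix information across time
    # the optional linear layer (use_linear_after_conv) and dropout are omitted
    return [[a * activation(g) for a, g in zip(c_row, g_row)]
            for c_row, g_row in zip(content, gate)]


x = [[float(t + c) for c in range(4)] for t in range(4)]  # T = 4, D = 4
out = csgu(x, kernels=[[0.0, 1.0, 0.0]] * 2)              # identity kernels
print(len(out), len(out[0]))                               # 4 2
```

With identity kernels, each output row is simply the first half of the input row scaled by the normalized second half.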

class speechbrain.lobes.models.convolution.ConvolutionFrontEnd(input_shape, num_blocks=3, num_layers_per_block=5, out_channels=[128, 256, 512], kernel_sizes=[3, 3, 3], strides=[1, 2, 2], dilations=[1, 1, 1], residuals=[True, True, True], conv_module=<class 'speechbrain.nnet.CNN.Conv2d'>, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, norm=<class 'speechbrain.nnet.normalization.LayerNorm'>, dropout=0.1, conv_bias=True, padding='same', conv_init=None)[source]

Bases: Sequential

A module to assemble a convolutional (depthwise) encoder with or without residual connections.

Parameters:

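input_shape (tuple) – Expected shape of the input tensor.
num_blocks (int) – Number of blocks (default 3).
num_layers_per_block (int) – Number of convolution layers in each block (default 5).
out_channels (Optional(list[int])) – Number of output channels of each block.
kernel_sizes (Optional(list[int])) – Kernel size of the convolutions in each block.
strides (Optional(list[int])) – Stride factor of each block; the stride is applied only to the last convolution layer of the block.
dilations (Optional(list[int])) – Dilation factor of each block.
residuals (Optional(list[bool])) – Whether to apply a residual connection in each block (default [True, True, True]).
conv_module (class) – Class used to build the convolution layers.
activation (Callable) – Activation function of each block (default LeakyReLU).
norm (torch class) – Normalization used to regularize the model (default LayerNorm).
dropout (float) – Dropout rate (default 0.1).
conv_bias (bool) – Whether to add a bias term to the convolution layers.
padding (str) – Type of padding to apply.
conv_init (str) – Type of initialization used for the convolution layers.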
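The per-block lists are expanded layer by layer. A minimal sketch, assuming every layer of block b uses out_channels[b], kernel_sizes[b] and dilations[b], and that only the last layer of the block uses strides[b]. LayerSpec and expand_front_end are hypothetical helpers, not part of SpeechBrain; the defaults mirror the signature above.

```python
from typing import List, NamedTuple, Sequence


class LayerSpec(NamedTuple):
    """Settings of one convolution layer (hypothetical helper type)."""
    block: int
    layer: int
    out_channels: int
    kernel_size: int
    stride: int
    dilation: int


def expand_front_end(
    num_blocks: int = 3,
    num_layers_per_block: int = 5,
    out_channels: Sequence[int] = (128, 256, 512),
    kernel_sizes: Sequence[int] = (3, 3, 3),
    strides: Sequence[int] = (1, 2, 2),
    dilations: Sequence[int] = (1, 1, 1),
) -> List[LayerSpec]:
    """List every convolution layer implied by the per-block settings."""
    specs = []
    for b in range(num_blocks):
        for i in range(num_layers_per_block):
            is_last = i == num_layers_per_block - 1
            specs.append(LayerSpec(
                block=b,
                layer=i,
                out_channels=out_channels[b],
                kernel_size=kernel_sizes[b],
                # the block's stride is used only by its last layer
                stride=strides[b] if is_last else 1,
                dilation=dilations[b],
            ))
    return specs


layers = expand_front_end()
print(len(layers))                 # 15
print([s.stride for s in layers])  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2]
```

Tuples are used for the defaults so the sketch avoids mutable default arguments.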

Example

>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvolutionFrontEnd
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvolutionFrontEnd(input_shape=x.shape)
>>> out = conv(x)
>>> out.shape
torch.Size([8, 8, 3, 512])
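The example's output shape can be checked by hand. Assuming the default Conv2d applies each block's stride to both the time and feature axes, and that 'same' padding keeps ceil(n / s) positions for a stride s, the three blocks map T = 30 to 30, 15, 8 and F = 10 to 10, 5, 3, while the channel axis takes the last block's out_channels. front_end_output_shape is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def front_end_output_shape(
    input_shape: Sequence[int],
    strides: Sequence[int] = (1, 2, 2),
    out_channels: Sequence[int] = (128, 256, 512),
) -> Tuple[int, int, int, int]:
    """Predict the (batch, time, feature, channel) output shape.

    Assumes every block strides both the time and feature axes by the same
    factor and that 'same' padding keeps ceil(n / stride) positions.
    """
    batch, time, feats = input_shape
    for s in strides:
        time = math.ceil(time / s)
        feats = math.ceil(feats / s)
    return batch, time, feats, out_channels[-1]


print(front_end_output_shape((8, 30, 10)))                                    # (8, 8, 3, 512)
print(front_end_output_shape((8, 30, 10), strides=(1,), out_channels=(16,)))  # (8, 30, 10, 16)
```

A 3D input (B, T, F) comes out 4D (B, T', F', C) because the 2D convolutions treat the feature axis as a spatial axis and append a channel axis.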

get_filter_properties() → FilterProperties[source]
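get_filter_properties presumably summarizes the window and stride of the stacked filters. As a rough cross-check, the textbook receptive-field recurrence below (each layer widens the view by (kernel_size - 1) * dilation times the accumulated stride) can be applied to the time axis of the default configuration. It ignores padding and causality and is not the library's FilterProperties computation; stacked_receptive_field is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def stacked_receptive_field(layers: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Return (receptive_field, total_stride) for stacked 1D convolutions.

    Each layer is (kernel_size, stride, dilation), listed from input to output.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump  # widen the view by this layer's span, in input frames
        jump *= s                 # spacing between adjacent outputs, in input frames
    return rf, jump


# Time axis of the default configuration: 3 blocks x 5 layers, kernel 3,
# stride 1/2/2 on the last layer of each block, dilation 1.
layers: List[Tuple[int, int, int]] = []
for block_stride in (1, 2, 2):
    layers += [(3, 1, 1)] * 4 + [(3, block_stride, 1)]
print(stacked_receptive_field(layers))  # (41, 4)
```

Because the later blocks run at a coarser stride, each of their layers widens the receptive field more than an early layer does.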

class speechbrain.lobes.models.convolution.ConvBlock(num_layers, out_channels, input_shape, kernel_size=3, stride=1, dilation=1, residual=False, conv_module=<class 'speechbrain.nnet.CNN.Conv2d'>, activation=<class 'torch.nn.modules.activation.LeakyReLU'>, norm=None, dropout=0.1, conv_bias=True, padding='same', conv_init=None)[source]

Bases: Module

An implementation of a convolution block with 1D or 2D convolutions (depthwise).

Parameters:

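num_layers (int) – Number of depthwise convolution layers in this block.
out_channels (int) – Number of output channels of this block.
input_shape (tuple) – Expected shape of the input tensor.
kernel_size (int) – Kernel size of the convolution layers (default 3).
stride (int) – Stride factor of this block (default 1).
dilation (int) – Dilation factor (default 1).
residual (bool) – If True, add a residual connection (default False).
conv_module (torch class) – Class used to build the convolution layers.
activation (Callable) – Activation function of this block (default LeakyReLU).
norm (torch class) – Normalization used to regularize the model (default None).
dropout (float) – Rate at which outputs are zeroed (default 0.1).
conv_bias (bool) – Whether to add a bias term to the convolution layers.
padding (str) – Type of padding to add.
conv_init (str) – Type of initialization used for the convolution layers.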

Example

>>> import torch
>>> from speechbrain.lobes.models.convolution import ConvBlock
>>> x = torch.rand((8, 30, 10))
>>> conv = ConvBlock(2, 16, input_shape=x.shape)
>>> out = conv(x)
>>> out.shape
torch.Size([8, 30, 10, 16])

forward(x)[source]: Processes the input tensor x and returns an output tensor.
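A one-channel sketch of the data flow in forward, assuming each layer applies its convolution followed by the activation and that the residual branch is added to the output of the stacked layers. Normalization and dropout are omitted and an identity shortcut is assumed; in the library the residual branch may need its own projection when the channel count or stride changes. conv1d_same, leaky_relu and conv_block are hypothetical names.

```python
from typing import Callable, List, Optional

Signal = List[float]
Layer = Callable[[Signal], Signal]


def conv1d_same(x: Signal, kernel: Signal) -> Signal:
    """Single-channel cross-correlation with zero 'same' padding and stride 1."""
    half = len(kernel) // 2
    return [sum(w * x[t + j - half] for j, w in enumerate(kernel)
                if 0 <= t + j - half < len(x))
            for t in range(len(x))]


def leaky_relu(x: Signal, slope: float = 0.01) -> Signal:
    """Same default negative slope as torch.nn.LeakyReLU."""
    return [v if v > 0 else slope * v for v in x]


def conv_block(x: Signal, kernels: List[Signal],
               shortcut: Optional[Layer] = None) -> Signal:
    """Stacked conv + activation layers, plus an optional residual branch."""
    out = x
    for k in kernels:
        out = leaky_relu(conv1d_same(out, k))  # normalization and dropout omitted
    if shortcut is not None:
        out = [a + b for a, b in zip(out, shortcut(x))]  # residual connection
    return out


x = [1.0, -2.0, 3.0, -4.0]
identity = [0.0, 1.0, 0.0]
print(conv_block(x, [identity, identity]))                       # ~[1.0, -0.0002, 3.0, -0.0004]
print(conv_block(x, [identity, identity], shortcut=lambda s: s))  # ~[2.0, -2.0002, 6.0, -4.0004]
```

The residual branch lets the block's input bypass the stacked nonlinearities, so negative values that the activations shrink are still carried to the output.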

get_filter_properties() → FilterProperties[source]