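detectron2.layers

class detectron2.layers.FrozenBatchNorm2d(num_features, eps=1e-05)

Bases: torch.nn.Module

BatchNorm2d where the batch statistics and the affine parameters are fixed.

It contains non-trainable buffers called "weight", "bias", "running_mean", and "running_var", initialized to perform an identity transformation.

The pre-trained backbone models from Caffe2 only contain "weight" and "bias", which are computed from the original four parameters of BN. The affine transform x * weight + bias performs a computation equivalent to (x - running_mean) / sqrt(running_var) * weight + bias. When loading a backbone model from Caffe2, "running_mean" and "running_var" are left unchanged as the identity transformation.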
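The folding of the four BN values into a single weight/bias pair can be illustrated with plain Python scalars. This is a minimal sketch, not the library's implementation: the helper names `fold_batchnorm` and `frozen_bn` and the numeric values are hypothetical, and an `eps` term is included inside the square root as an assumption about how the normalization is stabilized.

```python
import math


def frozen_bn(x, weight, bias, running_mean, running_var, eps=1e-5):
    """Reference form: (x - mean) / sqrt(var + eps) * weight + bias."""
    return (x - running_mean) / math.sqrt(running_var + eps) * weight + bias


def fold_batchnorm(weight, bias, running_mean, running_var, eps=1e-5):
    """Fold the four per-channel values into one (scale, shift) pair."""
    scale = weight / math.sqrt(running_var + eps)
    shift = bias - running_mean * scale
    return scale, shift


# Hypothetical per-channel values, chosen only for illustration.
w, b, mean, var = 1.5, -0.2, 0.3, 4.0
scale, shift = fold_batchnorm(w, b, mean, var)

x = 2.0
folded = x * scale + shift              # the "x * weight + bias" form
reference = frozen_bn(x, w, b, mean, var)  # the four-parameter form
```

With the folded pair stored as "weight" and "bias", leaving the running mean at 0 and the running variance at 1 reproduces the same output, which is why those two buffers can stay at their identity values.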

Other pre-trained backbone models may contain all 4 parameters.

The forward pass is implemented by F.batch_norm(…, training=False).

forward(x)

classmethod convert_frozen_batchnorm(module)

Convert all BatchNorm/SyncBatchNorm in module into FrozenBatchNorm.

Parameters: module (torch.nn.Module) –
Returns: If module is BatchNorm/SyncBatchNorm, returns a new module. Otherwise, converts module in place and returns it.

Similar to convert_sync_batchnorm in https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py

classmethod convert_frozenbatchnorm2d_to_batchnorm2d(module: torch.nn.Module) → torch.nn.Module

Convert all FrozenBatchNorm2d to BatchNorm2d.

Parameters: module (torch.nn.Module) –
Returns: If module is FrozenBatchNorm2d, returns a new module. Otherwise, converts module in place and returns it.

This is needed for quantization: https://fb.workplace.com/groups/1043663463248667/permalink/1296330057982005/

training: bool

detectron2.layers.get_norm(norm, out_channels)

Parameters: norm (str or callable) – either one of BN, SyncBN, FrozenBN, GN; or a callable that takes a channel number and returns the normalization layer as an nn.Module.
Returns: nn.Module or None – the normalization layer

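class detectron2.layers.NaiveSyncBatchNorm(*args, stats_mode='', **kwargs)

Bases: torch.nn.BatchNorm2d

In PyTorch<=1.5, nn.SyncBatchNorm computes incorrect gradients when the batch size on each worker is different (e.g., when scale augmentation is used, or when it is applied to the mask head).

This is a slower but correct alternative to nn.SyncBatchNorm.

Note

There isn't a single definition of Sync BatchNorm.

When stats_mode=="", this module computes overall statistics by using the statistics of each worker with equal weight. The result is the true statistics of all samples (as if they were all on one worker) only when all workers have the same (N, H, W). This mode does not support inputs with zero batch size.

When stats_mode=="N", this module computes overall statistics by weighting the statistics of each worker by its N. The result is the true statistics of all samples (as if they were all on one worker) only when all workers have the same (H, W). It is slower than stats_mode=="".

Even though the result of this module may not be the true statistics of all samples, it may still be reasonable, because it might be preferable to assign equal weights to all workers regardless of their (H, W) dimensions, instead of putting larger weight on larger images. In preliminary experiments, little difference was found between this simplified implementation and an accurate computation of the overall mean and variance.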
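The difference between the two stats_mode settings can be sketched with per-worker means held in plain lists. This is an illustrative sketch only: `combine_means` is a hypothetical helper, and the real module synchronizes tensors across processes rather than combining Python floats.

```python
def combine_means(worker_means, worker_batch_sizes, stats_mode=""):
    """Combine per-worker means the way the two modes are described above."""
    if stats_mode == "":
        # Every worker contributes with equal weight.
        return sum(worker_means) / len(worker_means)
    if stats_mode == "N":
        # Every worker is weighted by its batch size N.
        total = sum(worker_batch_sizes)
        weighted = sum(m * n for m, n in zip(worker_means, worker_batch_sizes))
        return weighted / total
    raise ValueError(f"unsupported stats_mode: {stats_mode!r}")


# Two hypothetical workers with different batch sizes.
means = [1.0, 3.0]
batch_sizes = [1, 3]

equal_weight = combine_means(means, batch_sizes, stats_mode="")
n_weighted = combine_means(means, batch_sizes, stats_mode="N")
```

When the batch sizes match, both modes return the same value; they diverge only when workers see different N, which is the situation the note describes.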

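forward(input)

num_features: int

eps: float

momentum: float

affine: bool

track_running_stats: bool

class detectron2.layers.CycleBatchNormList(length: int, bn_class=<class 'torch.nn.BatchNorm2d'>, **kwargs)

Bases: torch.nn.ModuleList

Implements domain-specific BatchNorm by cycling.

When a BatchNorm layer is used for multiple input domains or input features, it might need to maintain separate test-time statistics for each domain. See Sec 5.2 in Rethinking "Batch" in BatchNorm.

This module implements this by using N separate BN layers and cycling through them every time forward() is called.

Note: The caller of this module MUST guarantee that it is always called a multiple of N times. Otherwise its test-time statistics will be incorrect.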
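The cycling behaviour, and why the number of calls should be a multiple of N, can be sketched without any tensor library. `CycleList` below is a hypothetical stand-in that hands out placeholder strings instead of BatchNorm layers.

```python
class CycleList:
    """Hand out items round-robin, one per call (a stand-in for N BN layers)."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def next_item(self):
        item = self._items[self._pos]
        # Advance and wrap around after the last item.
        self._pos = (self._pos + 1) % len(self._items)
        return item

    @property
    def position(self):
        return self._pos


layers = CycleList(["bn0", "bn1", "bn2"])  # N = 3 placeholder layers
used = [layers.next_item() for _ in range(6)]  # 6 calls, a multiple of N
```

After a multiple of N calls the position is back at 0, so the next pass again pairs the first domain with the first layer. Stopping after, say, 4 calls would leave the position at 1 and shift every later pairing.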

__init__(length: int, bn_class=<class 'torch.nn.BatchNorm2d'>, **kwargs)

Parameters

length – number of BatchNorm layers to cycle.
bn_class – the BatchNorm class to use
kwargs – arguments of the BatchNorm class, such as num_features.

forward(x)

extra_repr()

training: bool

class detectron2.layers.DeformConv(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=False, norm=None, activation=None)

Bases: torch.nn.Module

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=False, norm=None, activation=None)

Deformable convolution from Deformable Convolutional Networks.

Arguments are similar to Conv2D. Extra arguments:

Parameters

deformable_groups (int) – number of groups used in deformable convolution.
norm (nn.Module, optional) – a normalization layer
activation (callable(Tensor) -> Tensor) – a callable activation function

forward(x, offset)

extra_repr()

training: bool

class detectron2.layers.ModulatedDeformConv(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=True, norm=None, activation=None)

Bases: torch.nn.Module

__init__(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deformable_groups=1, bias=True, norm=None, activation=None)

Modulated deformable convolution from Deformable ConvNets v2: More Deformable, Better Results.

Arguments are similar to Conv2D. Extra arguments:

Parameters

deformable_groups (int) – number of groups used in deformable convolution.
norm (nn.Module, optional) – a normalization layer
activation (callable(Tensor) -> Tensor) – a callable activation function

forward(x, offset, mask)

extra_repr()

training: bool

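detectron2.layers.paste_masks_in_image(masks: torch.Tensor, boxes: torch.Tensor, image_shape: Tuple[int, int], threshold: float = 0.5)

Paste a set of masks of a fixed resolution (e.g., 28 x 28) into an image. The location, height, and width used for pasting each mask are determined by its corresponding bounding box in boxes.

Note

This is a complicated but more accurate implementation. In actual deployment, a faster but less accurate implementation is often sufficient. See paste_mask_in_image_old() in this file for an alternative implementation.

Parameters

masks (tensor) – Tensor of shape (Bimg, Hmask, Wmask), where Bimg is the number of detected object instances in the image, and Hmask and Wmask are the height and width of the predicted mask (e.g., Hmask = Wmask = 28). Values are in [0, 1].
boxes (Boxes or Tensor) – A Boxes of length Bimg or a Tensor of shape (Bimg, 4). boxes[i] and masks[i] correspond to the same object instance.
image_shape (tuple) – height, width
threshold (float) – A threshold in [0, 1] for converting the (soft) masks to binary masks.

Returns

img_masks (Tensor) – A tensor of shape (Bimg, Himage, Wimage), where Bimg is the number of detected object instances and Himage and Wimage are the height and width of the image. img_masks[i] is a binary mask for object instance i.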

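detectron2.layers.nms(boxes: torch.Tensor, scores: torch.Tensor, iou_threshold: float) → torch.Tensor

Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

NMS iteratively removes lower-scoring boxes which have an IoU greater than iou_threshold with another (higher-scoring) box.

If multiple boxes have exactly the same score and satisfy the IoU criterion with respect to a reference box, the selected box is not guaranteed to be the same between CPU and GPU. This is similar to the behavior of argsort in PyTorch when repeated values are present.

Parameters

boxes (Tensor[N, 4]) – boxes to perform NMS on. They are expected to be in (x1, y1, x2, y2) format with 0 <= x1 < x2 and 0 <= y1 < y2.
scores (Tensor[N]) – scores for each one of the boxes
iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

keep (Tensor) – int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores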
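The greedy procedure described above can be sketched in pure Python over lists of (x1, y1, x2, y2) tuples. This is an illustrative sketch under that assumption, not the library's kernel; `box_iou` and `greedy_nms` are hypothetical helper names.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, in decreasing order of score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if no already-kept (higher-scoring) box overlaps it
        # with IoU greater than the threshold.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = greedy_nms(boxes, scores, iou_threshold=0.5)
```

Here the first two boxes overlap with IoU 81/119, which exceeds 0.5, so the lower-scoring one is dropped while the distant third box survives.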

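detectron2.layers.batched_nms(boxes: torch.Tensor, scores: torch.Tensor, idxs: torch.Tensor, iou_threshold: float): Same as torchvision.ops.boxes.batched_nms, but with float().

detectron2.layers.batched_nms_rotated(boxes: torch.Tensor, scores: torch.Tensor, idxs: torch.Tensor, iou_threshold: float)

Performs non-maximum suppression in a batched fashion.

Each index value corresponds to a category, and NMS will not be applied between elements of different categories.

Parameters

boxes (Tensor[N, 5]) – boxes where NMS will be performed. They are expected to be in (x_ctr, y_ctr, width, height, angle_degrees) format
scores (Tensor[N]) – scores for each one of the boxes
idxs (Tensor[N]) – indices of the categories for each one of the boxes.
iou_threshold (float) – discards all overlapping boxes with IoU > iou_threshold

Returns

Tensor – an int64 tensor with the indices of the elements that have been kept by NMS, sorted in decreasing order of scores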
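The per-category behaviour can be sketched as running NMS separately inside each category and merging the survivors. For brevity this sketch uses axis-aligned (x1, y1, x2, y2) boxes rather than rotated ones, and all helper names are hypothetical; it only illustrates that boxes in different categories never suppress each other.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms_indices(boxes, scores, candidates, iou_threshold):
    """Greedy NMS restricted to the given candidate indices."""
    keep = []
    for i in sorted(candidates, key=lambda i: scores[i], reverse=True):
        if all(iou_xyxy(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def batched_nms_sketch(boxes, scores, idxs, iou_threshold):
    """Apply NMS within each category, never across categories."""
    keep = []
    for category in sorted(set(idxs)):
        members = [i for i, c in enumerate(idxs) if c == category]
        keep.extend(nms_indices(boxes, scores, members, iou_threshold))
    # Report survivors in decreasing order of score.
    keep.sort(key=lambda i: scores[i], reverse=True)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7]

kept_two_categories = batched_nms_sketch(boxes, scores, [0, 0, 1], 0.5)
kept_one_category = batched_nms_sketch(boxes, scores, [0, 0, 0], 0.5)
```

Box 2 coincides exactly with box 0, yet it survives when it belongs to a different category and is suppressed when all three share one category.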

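detectron2.layers.nms_rotated(boxes: torch.Tensor, scores: torch.Tensor, iou_threshold: float)

Performs non-maximum suppression (NMS) on the rotated boxes according to their intersection-over-union (IoU).

Rotated NMS iteratively removes lower-scoring rotated boxes which have an IoU greater than iou_threshold with another (higher-scoring) rotated box.

Note that RotatedBox (5, 3, 4, 2, -90) covers exactly the same region as RotatedBox (5, 3, 4, 2, 90) does, and their IoU will be 1. However, they can represent completely different objects in certain tasks, e.g., OCR.

Whether rotated NMS should treat them as faraway boxes even though their IoU is 1 depends on the application and/or the ground-truth annotation.

As an extreme example, consider a single character v and the square box around it.

If the angle is 0 degrees, the object (text) would be read as 'v';

If the angle is 90 degrees, the object (text) would become '>';

If the angle is 180 degrees, the object (text) would become '^';

If the angle is 270/-90 degrees, the object (text) would become '<'.

All of these cases have an IoU of 1 with each other, and rotated NMS that uses only IoU as its criterion would keep only the one with the highest score. In most practical cases this is still acceptable, because typically only one of these orientations is correct. It also matters little if the box is only used to classify the object (rather than to transcribe it with a sequential OCR recognition model) later.

On the other hand, when IoU is used to filter proposals that are close to the ground truth during training, the angle should definitely be taken into account if the ground truth is known to be labeled with the strictly correct orientation (e.g., upside-down words are annotated with -180 degrees even though they could be covered by a 0/90/-90 degree box).

The way the original dataset is annotated also matters. For example, if the dataset is a 4-point polygon dataset that does not enforce the ordering of vertices or the orientation, we can estimate a minimum rotated bounding box for the polygon, but there is no way to tell the correct angle with 100% confidence (as shown above, there could be 4 different rotated boxes, with angles differing by 90 degrees from each other, covering exactly the same region). In that case we have to use IoU alone to determine box proximity (as many detection benchmarks, even for text, do) unless other assumptions can be made (e.g., width is always larger than height, or the object is not rotated by more than 90 degrees counterclockwise or clockwise).

In summary, not considering angles in rotated NMS seems to be a good option for now, but we should be aware of its implications.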
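The claim that RotatedBox (5, 3, 4, 2, -90) and RotatedBox (5, 3, 4, 2, 90) cover the same region can be checked by computing their corner points. The sketch below is an illustration with a hypothetical helper, `rotated_box_corners`; it assumes a standard rotation about the box center, and the sign convention for the angle does not affect this particular check.

```python
import math


def rotated_box_corners(x_ctr, y_ctr, width, height, angle_degrees):
    """Return the sorted corner points of a rotated box, rounded for comparison."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    corners = []
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        # Rotate the corner offset about the center, then translate.
        x = x_ctr + dx * cos_t - dy * sin_t
        y = y_ctr + dx * sin_t + dy * cos_t
        corners.append((round(x, 6), round(y, 6)))
    return sorted(corners)


corners_neg90 = rotated_box_corners(5, 3, 4, 2, -90)
corners_pos90 = rotated_box_corners(5, 3, 4, 2, 90)
corners_zero = rotated_box_corners(5, 3, 4, 2, 0)
```

The two 90-degree variants produce the same four corners, so any IoU computed purely from the covered region is 1, while the 0-degree box covers a different region.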

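Parameters

boxes (Tensor[N, 5]) – rotated boxes to perform NMS on. They are expected to be in (x_center, y_center, width, height, angle_degrees) format.
scores (Tensor[N]) – scores for each one of the rotated boxes
iou_threshold (float) – discards all overlapping rotated boxes with IoU > iou_threshold

Returns

keep (Tensor) – int64 tensor with the indices of the elements that have been kept by rotated NMS, sorted in decreasing order of scores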

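detectron2.layers.roi_align(input: torch.Tensor, boxes: torch.Tensor, output_size: None, spatial_scale: float = 1.0, sampling_ratio: int = -1, aligned: bool = False) → torch.Tensor

Performs the Region of Interest (RoI) Align operator described in Mask R-CNN.

Parameters

input (Tensor[N, C, H, W]) – input tensor
boxes (Tensor[K, 5] or List[Tensor[L, 4]]) – the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2. If a single Tensor is passed, then the first column should contain the batch index. If a list of Tensors is passed, then each Tensor will correspond to the boxes for an element i in a batch
output_size (int or Tuple[int, int]) – the size of the output after the cropping is performed, as (height, width)
spatial_scale (float) – a scaling factor that maps the input coordinates to the box coordinates. Default: 1.0
sampling_ratio (int) – number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If <= 0, then an adaptive number of grid points is used (computed as ceil(roi_width / pooled_w), and likewise for height). Default: -1
aligned (bool) – If False, use the legacy implementation. If True, shift the box coordinates by -0.5 pixels for a better alignment with the two neighboring pixel indices. This is the version used in Detectron2

Returns

output (Tensor[K, C, output_size[0], output_size[1]])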

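class detectron2.layers.ROIAlign(output_size, spatial_scale, sampling_ratio, aligned=True)

Bases: torch.nn.Module

__init__(output_size, spatial_scale, sampling_ratio, aligned=True)

Parameters

output_size (tuple) – h, w
spatial_scale (float) – scale the input boxes by this number
sampling_ratio (int) – number of input samples to take for each output sample. 0 to take samples densely.
aligned (bool) – if False, use the legacy implementation in Detectron. If True, align the results more precisely.

Note

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors; see detectron2/tests/test_roi_align.py for verification.

The difference does not affect the model's performance if ROIAlign is used together with conv layers.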
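The neighbor computation in the note can be written out directly. This is a small sketch of the pixel model only, with a hypothetical helper name; it does not perform any interpolation.

```python
import math


def neighbor_indices(c, aligned=True):
    """Return the two pixel indices surrounding continuous coordinate c."""
    # With aligned=True, pixel i is centered at continuous coordinate i + 0.5,
    # so 0.5 is subtracted before taking floor/ceil.
    shift = 0.5 if aligned else 0.0
    return math.floor(c - shift), math.ceil(c - shift)


aligned_neighbors = neighbor_indices(1.3, aligned=True)
legacy_neighbors = neighbor_indices(1.3, aligned=False)
```

For c=1.3 the aligned model picks pixels 0 and 1, whose centers at 0.5 and 1.5 bracket the coordinate, while the legacy computation picks pixels 1 and 2.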

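forward(input, rois)

Parameters

input – NCHW images
rois – Bx5 boxes. The first column is the index into N. The other 4 columns are xyxy.

training: bool

detectron2.layers.roi_align_rotated()

class detectron2.layers.ROIAlignRotated(output_size, spatial_scale, sampling_ratio)

Bases: torch.nn.Module

__init__(output_size, spatial_scale, sampling_ratio)

Parameters

output_size (tuple) – h, w
spatial_scale (float) – scale the input boxes by this number
sampling_ratio (int) – number of input samples to take for each output sample. 0 to take samples densely.

Note

ROIAlignRotated supports continuous coordinates by default: Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5).

forward(input, rois)

Parameters

input – NCHW images
rois – Bx6 boxes. The first column is the index into N. The other 5 columns are (x_ctr, y_ctr, width, height, angle_degrees).

training: bool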

class detectron2.layers.ShapeSpec(channels: Optional[int] = None, height: Optional[int] = None, width: Optional[int] = None, stride: Optional[int] = None)

Bases: object

A simple structure that contains basic shape specification about a tensor. It is often used as the auxiliary input/output of models, to compensate for the lack of shape inference ability among PyTorch modules.

channels: Optional[int] = None

height: Optional[int] = None

width: Optional[int] = None

stride: Optional[int] = None

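class detectron2.layers.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

Bases: torch.nn.modules.batchnorm._BatchNorm

Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The mean and standard deviation are calculated per dimension over the mini-batches, and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the number of input channels). By default, the elements of \(\gamma\) are set to 1 and the elements of \(\beta\) are set to 0. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).

Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.

If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.

Note

This momentum argument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.

Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it is common terminology to call this Spatial Batch Normalization.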
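The running-statistics update rule in the note can be traced by hand with a few scalar observations. The sketch below uses a hypothetical helper, `update_running_stat`, and made-up batch means; it only illustrates the formula.

```python
def update_running_stat(running, observed, momentum=0.1):
    """Apply x_new = (1 - momentum) * x_hat + momentum * x_t."""
    return (1 - momentum) * running + momentum * observed


# Start from the default initial running mean of 0 and observe three batches
# whose mean is 1.0 each (illustrative values).
running_mean = 0.0
history = []
for batch_mean in (1.0, 1.0, 1.0):
    running_mean = update_running_stat(running_mean, batch_mean)
    history.append(running_mean)
```

With momentum 0.1 each new observation contributes only 10%, so the estimate approaches the observed value gradually (0.1, then 0.19, then 0.271), unlike optimizer momentum, which accumulates past gradients.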

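Parameters

num_features – \(C\) from an expected input of size \((N, C, H, W)\)
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True

Shape:

Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)

Examples: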

>>> # With Learnable Parameters
>>> m = nn.BatchNorm2d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm2d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)

num_features: int

eps: float

momentum: float

affine: bool

track_running_stats: bool

class detectron2.layers.Conv2d(*args, **kwargs)

Bases: torch.nn.Conv2d

A wrapper around torch.nn.Conv2d to support empty inputs and more features.

__init__(*args, **kwargs)

Extra keyword arguments supported in addition to those in torch.nn.Conv2d:

Parameters

norm (nn.Module, optional) – a normalization layer
activation (callable(Tensor) -> Tensor) – a callable activation function

It assumes that the norm layer is applied before the activation.

forward(x)

bias: Optional[torch.Tensor]

out_channels: int

kernel_size: Tuple[int, …]

stride: Tuple[int, …]

padding: Tuple[int, …]

dilation: Tuple[int, …]

transposed: bool

output_padding: Tuple[int, …]

groups: int

padding_mode: str

weight: torch.Tensor

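class detectron2.layers.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, output_padding: Union[int, Tuple[int, int]] = 0, groups: int = 1, bias: bool = True, dilation: int = 1, padding_mode: str = 'zeros')

Bases: torch.nn.modules.conv._ConvTransposeNd

Applies a 2D transposed convolution operator over an input image composed of several input planes.

This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

This module supports TensorFloat32.

stride controls the stride for the cross-correlation.
padding controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points. See the note below for details.
output_padding controls the additional size added to one side of the output shape. See the note below for details.
dilation controls the spacing between the kernel points; also known as the "à trous" algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
- At groups=1, all inputs are convolved to all outputs.
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
- At groups=in_channels, each input channel is convolved with its own set of filters (of size \(\frac{\text{out\_channels}}{\text{in\_channels}}\)).

The parameters kernel_size, stride, padding, output_padding can each be either:

a single int – in which case the same value is used for the height and width dimensions

a tuple of two ints – in which case the first int is used for the height dimension and the second int for the width dimension

Note

The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape and does not actually add zero padding to the output.

Note

In some circumstances when using tensors on a CUDA device with CuDNN enabled, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. See /notes/randomness for more information.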

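Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

Shape:

Input: \((N, C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\) where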
\[H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1\]

\[W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1\]
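The output-size formula, and the shape ambiguity that output_padding resolves, can be checked with integer arithmetic. In this sketch `conv_transpose2d_out` transcribes the formula above for one dimension, while `conv2d_out` is the standard Conv2d output-size formula from the PyTorch documentation, which is not stated in this section and is included here only for comparison.

```python
def conv_transpose2d_out(size_in, kernel_size, stride=1, padding=0,
                         output_padding=0, dilation=1):
    """One spatial dimension of the ConvTranspose2d output-size formula."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def conv2d_out(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """One spatial dimension of the Conv2d output-size formula (for comparison)."""
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# With stride 2, two different input sizes collapse to the same Conv2d output.
down_from_12 = conv2d_out(12, kernel_size=3, stride=2, padding=1)
down_from_11 = conv2d_out(11, kernel_size=3, stride=2, padding=1)

# Going back up, output_padding selects which of the two sizes is produced.
up_default = conv_transpose2d_out(6, kernel_size=3, stride=2, padding=1)
up_padded = conv_transpose2d_out(6, kernel_size=3, stride=2, padding=1,
                                 output_padding=1)
```

Both 11 and 12 map down to 6, so the transposed convolution cannot know which to restore: the default yields 11, and output_padding=1 yields 12.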

weight

the learnable weights of the module of shape \((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}},\) \(\text{kernel\_size[0]}, \text{kernel\_size[1]})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)

Type: Tensor

bias

the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)

Type: Tensor

Examples:

>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
>>> # exact output size can be also specified as an argument
>>> input = torch.randn(1, 16, 12, 12)
>>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
>>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
>>> h = downsample(input)
>>> h.size()
torch.Size([1, 16, 6, 6])
>>> output = upsample(h, output_size=input.size())
>>> output.size()
torch.Size([1, 16, 12, 12])

forward(input: torch.Tensor, output_size: Optional[List[int]] = None) → torch.Tensor

bias: Optional[torch.Tensor]

out_channels: int

kernel_size: Tuple[int, …]

stride: Tuple[int, …]

padding: Tuple[int, …]

dilation: Tuple[int, …]

transposed: bool

output_padding: Tuple[int, …]

groups: int

padding_mode: str

weight: torch.Tensor

detectron2.layers.cat(tensors: List[torch.Tensor], dim: int = 0): Efficient version of torch.cat that avoids a copy if there is only a single element in the list

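detectron2.layers.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None, recompute_scale_factor=None)

Down/up samples the input to either the given size or the given scale_factor.

The algorithm used for interpolation is determined by mode.

Currently temporal, spatial and volumetric sampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape.

The input dimensions are interpreted in the form: mini-batch x channels x [optional depth] x [optional height] x width.

The modes available for resizing are: nearest, linear (3D-only), bilinear, bicubic (4D-only), trilinear (5D-only), area

Parameters

input (Tensor) – the input tensor
size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]) – output spatial size.
scale_factor (float or Tuple[float]) – multiplier for spatial size. Has to match the input size if it is a tuple.
mode (str) – algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear' | 'area'. Default: 'nearest'
align_corners (bool, optional) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to False, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation independent of input size when scale_factor is kept the same. This only has an effect when mode is 'linear', 'bilinear', 'bicubic' or 'trilinear'. Default: False
recompute_scale_factor (bool, optional) – recompute the scale_factor for use in the interpolation calculation. When scale_factor is passed as a parameter, it is used to compute the output_size. If recompute_scale_factor is False or not specified, the passed-in scale_factor will be used in the interpolation computation. Otherwise, a new scale_factor will be computed based on the output and input sizes for use in the interpolation computation (i.e. the computation will be identical to the case where the computed output_size is passed in explicitly). Note that when scale_factor is floating-point, the recomputed scale_factor may differ from the one passed in due to rounding and precision issues.

Note

With mode='bicubic', it is possible to cause overshoot; in other words, it can produce negative values or values greater than 255 for images. Explicitly call result.clamp(min=0, max=255) if you want to reduce the overshoot when displaying the image.

Warning

With align_corners = True, the linearly interpolating modes (linear, bilinear, and trilinear) don't proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is align_corners = False. See Upsample for concrete examples of how this affects the outputs.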
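The dependence on input size can be illustrated with the coordinate mapping commonly used for linear interpolation. The sketch below is an assumption about that mapping rather than a quotation of the library's code: `source_coordinate` is a hypothetical helper that returns the continuous input position sampled for a given output index along one dimension, before any clamping at the borders.

```python
def source_coordinate(dst_index, in_size, out_size, align_corners):
    """Continuous input coordinate sampled for output index dst_index."""
    if align_corners:
        if out_size == 1:
            return 0.0
        # Corner pixel centers of input and output coincide.
        return dst_index * (in_size - 1) / (out_size - 1)
    # Pixel areas are aligned; pixel centers sit at half-integer offsets.
    return (dst_index + 0.5) * in_size / out_size - 0.5


def step(in_size, out_size, align_corners):
    """Input distance covered by moving one output pixel."""
    return (source_coordinate(1, in_size, out_size, align_corners)
            - source_coordinate(0, in_size, out_size, align_corners))


# Upsampling by a factor of 2 from two different input sizes.
steps_aligned = (step(4, 8, True), step(8, 16, True))
steps_unaligned = (step(4, 8, False), step(8, 16, False))
```

Under this mapping, align_corners=True gives steps of 3/7 and 7/15 for the two input sizes, whereas align_corners=False gives 0.5 in both cases, matching the statement that only the latter is independent of input size for a fixed scale factor.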

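Warning

When scale_factor is specified and recompute_scale_factor=True, scale_factor is used to compute the output_size, which is then used to infer new scales for the interpolation. The default behavior for recompute_scale_factor changed to False in 1.6.0, in which case scale_factor is used directly in the interpolation calculation.

Note

This operation may produce nondeterministic gradients when given tensors on a CUDA device. See /notes/randomness for more information.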

class detectron2.layers.Linear(in_features: int, out_features: int, bias: bool = True)

Bases: torch.nn.Module

Applies a linear transformation to the incoming data: \(y = xA^T + b\)

This module supports TensorFloat32.

Parameters

in_features – size of each input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: True

Shape:

Input: \((N, *, H_{in})\) where \(*\) means any number of additional dimensions and \(H_{in} = \text{in\_features}\)
Output: \((N, *, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{out\_features}\).

weight: the learnable weights of the module of shape \((\text{out\_features}, \text{in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_features}}\)

bias: the learnable bias of the module of shape \((\text{out\_features})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in\_features}}\)

Examples:

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])

extra_repr() → str

forward(input: torch.Tensor) → torch.Tensor

reset_parameters() → None

in_features: int

out_features: int

weight: torch.Tensor

detectron2.layers.nonzero_tuple(x): An 'as_tuple=True' version of torch.nonzero to support torchscript, because of https://github.com/pytorch/pytorch/issues/38718

detectron2.layers.cross_entropy(input, target, *, reduction='mean', **kwargs): Same as loss_func, but returns 0 (instead of nan) for empty inputs.

detectron2.layers.empty_input_loss_func_wrapper(loss_func)
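The idea of wrapping a loss function so that empty inputs yield 0 can be sketched with plain Python lists. This is not the library's implementation: `empty_input_wrapper` and `mean_squared_error` are hypothetical names, and the list-based loss raises ZeroDivisionError on empty input where a tensor-based mean would produce nan.

```python
import functools


def mean_squared_error(inputs, targets, *, reduction="mean"):
    """A toy list-based loss; 'mean' fails on empty input."""
    errors = [(x - t) ** 2 for x, t in zip(inputs, targets)]
    if reduction == "mean":
        return sum(errors) / len(errors)
    if reduction == "sum":
        return sum(errors)
    return errors


def empty_input_wrapper(loss_func):
    """Wrap loss_func so that an empty target with 'mean' reduction returns 0."""

    @functools.wraps(loss_func)
    def wrapped(inputs, targets, *, reduction="mean", **kwargs):
        if len(targets) == 0 and reduction == "mean":
            return 0.0
        return loss_func(inputs, targets, reduction=reduction, **kwargs)

    return wrapped


safe_mse = empty_input_wrapper(mean_squared_error)
empty_loss = safe_mse([], [])
regular_loss = safe_mse([1.0, 3.0], [0.0, 1.0])
```

Non-empty inputs pass through unchanged, so the wrapper only alters the degenerate case.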

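detectron2.layers.shapes_to_tensor(x: List[int], device: Optional[torch.device] = None) → torch.Tensor

Turn a list of integer scalars or integer Tensor scalars into a vector, in a way that is both traceable and scriptable.

In tracing, x should be a list of scalar Tensors, so the output can trace to the inputs. In scripting or eager mode, x should be a list of ints.

detectron2.layers.move_device_like(src: torch.Tensor, dst: torch.Tensor) → torch.Tensor: Tracing-friendly way to cast a tensor to another tensor's device. The device is treated as a constant during tracing; scripting the casting process as a whole can work around this issue.

class detectron2.layers.CNNBlockBase(in_channels, out_channels, stride)

Bases: torch.nn.Module

A CNN block is assumed to have input channels, output channels and a stride. The input and output of the forward() method must be NCHW tensors. The method can perform arbitrary computation but must match the given channels and stride specification.

Attributes

in_channels (int) –
out_channels (int) –
stride (int) –

__init__(in_channels, out_channels, stride)

The __init__ method of any subclass should also contain these arguments.

Parameters

in_channels (int) –
out_channels (int) –
stride (int) –

freeze()

Make this block not trainable. This method sets all parameters to requires_grad=False, and converts all BatchNorm layers to FrozenBatchNorm

Returns: the block itself

training: bool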

class detectron2.layers.DepthwiseSeparableConv2d(in_channels, out_channels, kernel_size=3, padding=1, dilation=1, *, norm1=None, activation1=None, norm2=None, activation2=None)

Bases: torch.nn.Module

A kxk depthwise convolution + a 1x1 convolution.

In Xception: Deep Learning with Depthwise Separable Convolutions, norm and activation are applied on the second conv. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications uses norm and activation on both convs.

__init__(in_channels, out_channels, kernel_size=3, padding=1, dilation=1, *, norm1=None, activation1=None, norm2=None, activation2=None)

Parameters

norm1 (str or callable) – normalization for the two conv layers.
norm2 (str or callable) – normalization for the two conv layers.
activation1 (callable(Tensor) -> Tensor) – activation function for the two conv layers.
activation2 (callable(Tensor) -> Tensor) – activation function for the two conv layers.

forward(x)

training: bool

class detectron2.layers.ASPP(in_channels, out_channels, dilations, *, norm, activation, pool_kernel_size=None, dropout: float = 0.0, use_depthwise_separable_conv=False)

Bases: torch.nn.Module

Atrous Spatial Pyramid Pooling (ASPP).

__init__(in_channels, out_channels, dilations, *, norm, activation, pool_kernel_size=None, dropout: float = 0.0, use_depthwise_separable_conv=False)

Parameters

in_channels (int) – number of input channels for ASPP.
out_channels (int) – number of output channels.
dilations (list) – a list of 3 dilations in ASPP.
norm (str or callable) – normalization for all conv layers. See layers.get_norm() for supported formats. norm is applied to all conv layers except the conv following global average pooling.
activation (callable) – activation function.
pool_kernel_size (tuple, list) – the average pooling size (kh, kw) for the image pooling layer in ASPP. If set to None, it always performs global average pooling. If not None, it must be divisible by the shape of the inputs in forward(). It is recommended to use a fixed input feature size in training and to set this option to match that size, so that it performs global average pooling in training and the size of the pooling window stays consistent in inference.
dropout (float) – apply dropout on the output of ASPP. It is used in the official DeepLab implementation with a rate of 0.1: https://github.com/tensorflow/models/blob/21b73d22f3ed05b650e85ac50849408dd36de32e/research/deeplab/model.py#L532
use_depthwise_separable_conv (bool) – use DepthwiseSeparableConv2d for the 3x3 convs in ASPP, as proposed in Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.

forward(x)

training: bool

detectron2.layers.ciou_loss(boxes1: torch.Tensor, boxes2: torch.Tensor, reduction: str = 'none', eps: float = 1e-07) → torch.Tensor

Complete Intersection over Union Loss (Zhaohui Zheng et al.) https://arxiv.org/abs/1911.08287

Parameters

boxes1 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
boxes2 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
reduction (str) – 'none' | 'mean' | 'sum'. 'none': no reduction will be applied to the output. 'mean': the output will be averaged. 'sum': the output will be summed.
eps (float) – small number to prevent division by zero

detectron2.layers.diou_loss(boxes1: torch.Tensor, boxes2: torch.Tensor, reduction: str = 'none', eps: float = 1e-07) → torch.Tensor

Distance Intersection over Union Loss (Zhaohui Zheng et al.) https://arxiv.org/abs/1911.08287

Parameters

boxes1 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
boxes2 (Tensor) – box locations in XYXY format, shape (N, 4) or (4,).
reduction (str) – 'none' | 'mean' | 'sum'. 'none': no reduction will be applied to the output. 'mean': the output will be averaged. 'sum': the output will be summed.
eps (float) – small number to prevent division by zero
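For a single pair of XYXY boxes, the two losses can be sketched in pure Python following the definitions in the cited paper: DIoU adds a normalized center-distance penalty to 1 - IoU, and CIoU adds an aspect-ratio term on top. This is a sketch under those definitions, not the library's tensor implementation; the helper names and the exact placement of eps are assumptions.

```python
import math


def _iou_and_distance_penalty(box1, box2, eps=1e-7):
    """Return (IoU, squared center distance / squared enclosing diagonal)."""
    x1, y1, x2, y2 = box1
    x1g, y1g, x2g, y2g = box2
    inter_w = max(0.0, min(x2, x2g) - max(x1, x1g))
    inter_h = max(0.0, min(y2, y2g) - max(y1, y1g))
    inter = inter_w * inter_h
    union = (x2 - x1) * (y2 - y1) + (x2g - x1g) * (y2g - y1g) - inter + eps
    iou = inter / union
    # Smallest box enclosing both inputs.
    enclose_w = max(x2, x2g) - min(x1, x1g)
    enclose_h = max(y2, y2g) - min(y1, y1g)
    diag_sq = enclose_w ** 2 + enclose_h ** 2 + eps
    center_dist_sq = (((x1 + x2) - (x1g + x2g)) / 2) ** 2 \
        + (((y1 + y2) - (y1g + y2g)) / 2) ** 2
    return iou, center_dist_sq / diag_sq


def diou_loss_single(box1, box2, eps=1e-7):
    iou, penalty = _iou_and_distance_penalty(box1, box2, eps)
    return 1 - iou + penalty


def ciou_loss_single(box1, box2, eps=1e-7):
    iou, penalty = _iou_and_distance_penalty(box1, box2, eps)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + penalty + alpha * v


same = diou_loss_single((0, 0, 10, 10), (0, 0, 10, 10))
apart_diou = diou_loss_single((0, 0, 1, 1), (2, 0, 3, 1))
apart_ciou = ciou_loss_single((0, 0, 1, 1), (2, 0, 3, 1))
```

Identical boxes give a loss near 0. For the two disjoint unit boxes the IoU is 0 and the distance penalty is 4/10, so DIoU is about 1.4; CIoU matches it here because both boxes have the same aspect ratio.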