Source code for torch.nn.utils.memory_format
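
The docstrings in the listing below note that these helpers change only the memory format of a weight, not the meaning of its dimensions. As a rough illustration of that point, the following pure-Python sketch computes the element strides a given shape would have in the default contiguous layout and in a channels-last layout. The helper names (`strides_for_order`, `contiguous_strides`, `channels_last_strides`) are hypothetical and not part of PyTorch; the sketch only mirrors what `Tensor.stride()` is expected to report for such layouts.

```python
from typing import Sequence, Tuple


def strides_for_order(shape: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Element strides when dimensions are stored in ``order`` (outermost first).

    Hypothetical helper for illustration only; not a PyTorch API.
    """
    strides = [0] * len(shape)
    step = 1
    for dim in reversed(order):  # walk from the innermost dimension outwards
        strides[dim] = step
        step *= shape[dim]
    return tuple(strides)


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    # Default layout: dimensions stored in their logical order (e.g. NCHW).
    return strides_for_order(shape, list(range(len(shape))))


def channels_last_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    # Channels-last layout: batch, then spatial dims, then channel (e.g. NHWC).
    order = [0] + list(range(2, len(shape))) + [1]
    return strides_for_order(shape, order)


# Weight shape of a Conv2d with 8 input channels, 4 output channels, 3x3 kernel:
# (out_channels, in_channels, kH, kW)
weight_shape = (4, 8, 3, 3)
print(contiguous_strides(weight_shape))     # (72, 9, 3, 1)
print(channels_last_strides(weight_shape))  # (72, 1, 24, 8)
```

The shape is the same in both cases; only the strides differ, with the channel dimension becoming the fastest-varying one in the channels-last layout.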

```python
import torch


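def convert_conv2d_weight_memory_format(module, memory_format):
    r"""Convert the ``memory_format`` of ``nn.Conv2d.weight`` to ``memory_format``.

    The conversion is applied recursively to nested ``nn.Module`` instances, including ``module`` itself.
    Note that it changes only the memory format, not the semantics of each dimension.
    This function helps computation use NHWC kernels, which
    provide a considerable speedup for fp16 data on CUDA devices with compute capability >= 7.0.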

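    .. note::
        Calling ``model.to(memory_format=torch.channels_last)`` is more aggressive
        than the utility function ``convert_conv2d_weight_memory_format``. Any
        layer with a 4d weight is affected by ``model.to``, whether or not it
        benefits from conversion to the specified ``memory_format``.
        One case we are confident about is the NHWC (channels_last) conversion for
        convolutions in cuDNN: running a convolution in NHWC is beneficial
        even when a permutation has to be applied to the input tensors.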

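        Hence the strategy here is to convert only the convolution weights to
        channels_last. This ensures that:

        1. Fast convolution kernels are used, the benefit of which can
           outweigh the overhead of permutation (if the input is not in the same format).
        2. No unnecessary permutations are applied to layers that do not benefit
           from ``memory_format`` conversion.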

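        In the optimal case, the layers between convolution layers are channels
        last compatible. The input tensor is permuted to channels last when it
        reaches the first convolution layer and stays in that memory format,
        so subsequent convolutions do not need to permute their input tensors.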

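        When a channels last incompatible layer sits between convolution layers,
        the input tensor has to be permuted back to contiguous format for that layer.
        It then passes through the remaining layers in contiguous format and is
        permuted to channels last again when it reaches another convolution layer.
        There is no point in propagating that permutation to an earlier layer,
        because most layers are quite agnostic to ``memory_format``.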

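        This claim might change once PyTorch supports fusion of permutations,
        since there may then be a better place to fuse the permutation than
        immediately before a convolution.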

    Args:
        module (nn.Module): ``nn.Conv2d`` & ``nn.ConvTranspose2d`` or container
                            ``nn.Module``
        memory_format: user-specified ``memory_format``,
            e.g. ``torch.channels_last`` or ``torch.contiguous_format``

    Returns:
        The original module, with the weights of its ``nn.Conv2d`` and
        ``nn.ConvTranspose2d`` layers converted in place

    Example:
        >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA)
        >>> # xdoctest: +REQUIRES(env:CUBLAS_WORKSPACE_CONFIG)
        >>> input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float16, device="cuda")
        >>> model = nn.Sequential(
        >>>     nn.Conv2d(8, 4, 3)).cuda().half()
        >>> # This is identical to:
        >>> # nn.utils.convert_conv2d_weight_memory_format(model, torch.channels_last)
        >>> model = nn.utils.convert_conv2d_weight_memory_format(model, torch.channels_last)
        >>> out = model(input)
    """
    # TODO: expand this to `_ConvNd` when channels_last support is extended beyond 4d tensors.
    if isinstance(module, (torch.nn.Conv2d, torch.nn.ConvTranspose2d)):
        weight_data = module.weight.detach().clone().contiguous(memory_format=memory_format)
        module.weight.data = weight_data.resize_(weight_data.size(), memory_format=memory_format)
    for child in module.children():
        convert_conv2d_weight_memory_format(child, memory_format)
    return module


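def convert_conv3d_weight_memory_format(module, memory_format):
    r"""Convert the ``memory_format`` of ``nn.Conv3d.weight`` to ``memory_format``.

    The conversion is applied recursively to nested ``nn.Module`` instances, including ``module`` itself.
    Note that it changes only the memory format, not the semantics of each dimension.
    This function helps computation use NDHWC kernels, which
    provide a considerable speedup for fp16 data on CUDA devices with compute capability >= 7.0.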

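    .. note::
        Calling ``model.to(memory_format=torch.channels_last_3d)`` is more aggressive
        than the utility function ``convert_conv3d_weight_memory_format``. Any
        layer with a 5d weight is affected by ``model.to``, whether or not it
        benefits from conversion to the specified ``memory_format``.
        One case we are confident about is the NDHWC (channels_last_3d) conversion for
        convolutions in cuDNN: running a convolution in NDHWC is beneficial
        even when a permutation has to be applied to the input tensors.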

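        Hence the strategy here is to convert only the convolution weights to
        channels_last_3d. This ensures that:

        1. Fast convolution kernels are used, the benefit of which can
           outweigh the overhead of permutation (if the input is not in the same format).
        2. No unnecessary permutations are applied to layers that do not benefit
           from ``memory_format`` conversion.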

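        In the optimal case, the layers between convolution layers are channels
        last compatible. The input tensor is permuted to channels last when it
        reaches the first convolution layer and stays in that memory format,
        so subsequent convolutions do not need to permute their input tensors.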

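        When a channels last incompatible layer sits between convolution layers,
        the input tensor has to be permuted back to contiguous format for that layer.
        It then passes through the remaining layers in contiguous format and is
        permuted to channels last again when it reaches another convolution layer.
        There is no point in propagating that permutation to an earlier layer,
        because most layers are quite agnostic to ``memory_format``.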

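        This claim might change once PyTorch supports fusion of permutations,
        since there may then be a better place to fuse the permutation than
        immediately before a convolution.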

    Args:
        module (nn.Module): ``nn.Conv3d`` & ``nn.ConvTranspose3d`` or container
                            ``nn.Module``
        memory_format: user-specified ``memory_format``,
            e.g. ``torch.channels_last_3d`` or ``torch.contiguous_format``

    Returns:
        The original module, with the weights of its ``nn.Conv3d`` and
        ``nn.ConvTranspose3d`` layers converted in place

    Example:
        >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA)
        >>> # xdoctest: +REQUIRES(env:CUBLAS_WORKSPACE_CONFIG)
        >>> input = torch.randint(1, 10, (2, 8, 4, 4, 4), dtype=torch.float16, device="cuda")
        >>> model = nn.Sequential(
        >>>     nn.Conv3d(8, 4, 3)).cuda().half()
        >>> # This is identical to:
        >>> # nn.utils.convert_conv3d_weight_memory_format(model, torch.channels_last_3d)
        >>> model = nn.utils.convert_conv3d_weight_memory_format(model, torch.channels_last_3d)
        >>> out = model(input)
    """

    # TODO: expand this to `_ConvNd` when channels_last support is extended beyond 4d tensors.
    if isinstance(module, (torch.nn.Conv3d, torch.nn.ConvTranspose3d)):
        weight_data = module.weight.detach().clone().contiguous(memory_format=memory_format)
        module.weight.data = weight_data.resize_(weight_data.size(), memory_format=memory_format)
    for child in <span class="