CUDA Driver API :: CUDA Toolkit Documentation

6.25. Occupancy

This section describes the occupancy calculation functions of the low-level CUDA driver application programming interface.

Functions

CUresult cuOccupancyAvailableDynamicSMemPerBlock ( size_t* dynamicSmemSize, CUfunction func, int numBlocks, int blockSize ): Returns dynamic shared memory available per block when launching numBlocks blocks on an SM.
CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, CUfunction func, int blockSize, size_t dynamicSMemSize ): Returns occupancy of a function.
CUresult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, CUfunction func, int blockSize, size_t dynamicSMemSize, unsigned int flags ): Returns occupancy of a function.
CUresult cuOccupancyMaxActiveClusters ( int* numClusters, CUfunction func, const CUlaunchConfig* config ): Given the kernel function (func) and launch configuration (config), return the maximum number of clusters that could co-exist on the target device in *numClusters.
CUresult cuOccupancyMaxPotentialBlockSize ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit ): Suggest a launch configuration with reasonable occupancy.
CUresult cuOccupancyMaxPotentialBlockSizeWithFlags ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit, unsigned int flags ): Suggest a launch configuration with reasonable occupancy.
CUresult cuOccupancyMaxPotentialClusterSize ( int* clusterSize, CUfunction func, const CUlaunchConfig* config ): Given the kernel function (func) and launch configuration (config), return the maximum cluster size in *clusterSize.

Functions

CUresult cuOccupancyAvailableDynamicSMemPerBlock ( size_t* dynamicSmemSize, CUfunction func, int numBlocks, int blockSize )

Returns dynamic shared memory available per block when launching numBlocks blocks on an SM.

Parameters

dynamicSmemSize: - Returned maximum dynamic shared memory
func: - Kernel function for which occupancy is calculated
numBlocks: - Number of blocks to fit on SM
blockSize: - Size of the blocks

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

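Description

Returns in *dynamicSmemSize the maximum size of dynamic shared memory per block that still allows numBlocks blocks to run on each SM.

Note that the API can also be used with a context-less kernel (CUkernel) by querying the handle with cuLibraryGetKernel() and then casting it to CUfunction when passing it to the API. In this case, the context used for the calculation is the current context.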

Note:

This function may also return error codes from previous, asynchronous launches.
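The value returned in *dynamicSmemSize is a per-block byte budget. As a hedged sketch, the hypothetical helper below (largest_square_tile is not part of the CUDA API) turns that budget into the widest square tile of float values a kernel could stage in dynamic shared memory. The real driver call appears only in a comment, because it needs cuda.h, a current context, and a loaded CUfunction.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the CUDA API.
 * Typical source of the budget (requires cuda.h and a current context):
 *   size_t avail = 0;
 *   cuOccupancyAvailableDynamicSMemPerBlock(&avail, func, numBlocks, blockSize);
 * Returns the largest w such that a w x w tile of floats fits in
 * dynamicSmemSize bytes, or 0 if not even one float fits. */
int largest_square_tile(size_t dynamicSmemSize)
{
    int w = 0;
    /* Grow the tile while the next size still fits in the budget. */
    while ((size_t)(w + 1) * (size_t)(w + 1) * sizeof(float) <= dynamicSmemSize)
        ++w;
    return w;
}
```

For a 48 KiB budget (49152 bytes) this gives 110, since a 110 × 110 tile needs 48400 bytes while a 111 × 111 tile needs 49284.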

CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, CUfunction func, int blockSize, size_t dynamicSMemSize )

Returns occupancy of a function.

Parameters

numBlocks: - Returned occupancy
func: - Kernel for which occupancy is calculated
blockSize: - Block size the kernel is intended to be launched with
dynamicSMemSize: - Per-block dynamic shared memory usage intended, in bytes

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

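Description

Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor.

Note that the API can also be used with a context-less kernel (CUkernel) by querying the handle with cuLibraryGetKernel() and then casting it to CUfunction when passing it to the API. In this case, the context used for the calculation is the current context.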

Note:

This function may also return error codes from previous, asynchronous launches.

See also:

cudaOccupancyMaxActiveBlocksPerMultiprocessor
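*numBlocks is a block count, whereas occupancy is usually quoted as the fraction of an SM's warp slots in use. A minimal sketch of that conversion follows. occupancy_fraction is a hypothetical helper, and it assumes warpSize and maxThreadsPerSM were queried with cuDeviceGetAttribute using CU_DEVICE_ATTRIBUTE_WARP_SIZE and CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR.

```c
/* Hypothetical helper, not part of the CUDA API.
 * numBlocks: *numBlocks from cuOccupancyMaxActiveBlocksPerMultiprocessor.
 * warpSize, maxThreadsPerSM: assumed to come from cuDeviceGetAttribute.
 * Returns active warps divided by the SM's maximum resident warps. */
double occupancy_fraction(int numBlocks, int blockSize, int warpSize, int maxThreadsPerSM)
{
    /* A partially filled warp still occupies a whole warp slot. */
    int warpsPerBlock = (blockSize + warpSize - 1) / warpSize;
    int activeWarps = numBlocks * warpsPerBlock;
    int maxWarps = maxThreadsPerSM / warpSize;
    return maxWarps > 0 ? (double)activeWarps / (double)maxWarps : 0.0;
}
```

For example, 4 resident blocks of 256 threads on an SM with 2048 thread slots and 32-thread warps use 32 of 64 warp slots, an occupancy of 0.5.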

CUresult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, CUfunction func, int blockSize, size_t dynamicSMemSize, unsigned int flags )

Returns occupancy of a function.

Parameters

numBlocks: - Returned occupancy
func: - Kernel for which occupancy is calculated
blockSize: - Block size the kernel is intended to be launched with
dynamicSMemSize: - Per-block dynamic shared memory usage intended, in bytes
flags: - Requested behavior for the occupancy calculator

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

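Description

Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor.

The flags parameter controls how special cases are handled. The valid flags are:

CU_OCCUPANCY_DEFAULT, which maintains the default behavior of cuOccupancyMaxActiveBlocksPerMultiprocessor;

CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, if caching is enabled but per-block SM resource usage would result in zero occupancy, the occupancy calculator computes the occupancy as if caching were disabled. Setting CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE makes the occupancy calculator return 0 in such cases. More information about this feature can be found in the "Unified L1/Texture Cache" section of the Maxwell Tuning Guide.

Note that the API can also be used with a context-less kernel (CUkernel) by querying the handle with cuLibraryGetKernel() and then casting it to CUfunction when passing it to the API. In this case, the context used for the calculation is the current context.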

Note:

This function may also return error codes from previous, asynchronous launches.

See also:

cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags
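One way to read the two flags together, sketched under the semantics described above: call the function once with CU_OCCUPANCY_DEFAULT and once with CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE for the same func, blockSize and dynamicSMemSize, then compare the results. The helper name is hypothetical, and the driver calls appear only in comments.

```c
/* Hypothetical helper, not part of the CUDA API.
 * defaultBlocks: *numBlocks from
 *   cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(&defaultBlocks, func,
 *       blockSize, dynamicSMemSize, CU_OCCUPANCY_DEFAULT);
 * strictBlocks: *numBlocks from the same call with
 *   CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE.
 * Returns 1 when the default result only holds because the calculator fell
 * back to treating global caching as disabled, 0 otherwise. */
int default_relied_on_caching_fallback(int defaultBlocks, int strictBlocks)
{
    return strictBlocks == 0 && defaultBlocks > 0;
}
```

On platforms where global caching does not affect occupancy, both calls report the same value and the helper returns 0.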

CUresult cuOccupancyMaxActiveClusters ( int* numClusters, CUfunction func, const CUlaunchConfig* config )

Given the kernel function (func) and launch configuration (config), return the maximum number of clusters that could co-exist on the target device in *numClusters.

Parameters

numClusters: - Returned maximum number of clusters that could co-exist on the target device
func: - Kernel function for which maximum number of clusters are calculated
config: - Launch configuration for the given kernel function

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_CLUSTER_SIZE, CUDA_ERROR_UNKNOWN

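Description

If the function has a required cluster size already set (see cudaFuncGetAttributes / cuFuncGetAttribute), the cluster size in config must either be unspecified or match the required size. If no required size is set, the cluster size must be specified in config; otherwise the function returns an error.

Note that various attributes of the kernel function may affect the occupancy calculation. The runtime environment may affect how the hardware schedules the clusters, so the calculated occupancy is not guaranteed to be achievable.

Note that the API can also be used with a context-less kernel (CUkernel) by querying the handle with cuLibraryGetKernel() and then casting it to CUfunction when passing it to the API. In this case, the context used for the calculation is taken from the specified stream config->hStream, or the current context if the stream is NULL.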

Note:

This function may also return error codes from previous, asynchronous launches.

See also:

cudaFuncGetAttributes, cuFuncGetAttribute
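The cluster-size rule in the description, and one use of *numClusters, can be sketched in plain C. Both helpers are hypothetical and work on a single dimension: requiredSize stands for a required cluster size read via cuFuncGetAttribute (0 meaning none is set), and configSize for the cluster size placed in config (0 meaning unspecified). The wave estimate inherits the caveat above that the calculated occupancy is not guaranteed.

```c
/* Hypothetical helpers, not part of the CUDA API. */

/* Mirrors the rule above for one dimension: with a required cluster size
 * (requiredSize > 0), the size in config must be unspecified (0) or equal
 * to it; without one, config must specify a size. Returns 1 if acceptable. */
int cluster_config_ok(int requiredSize, int configSize)
{
    if (requiredSize > 0)
        return configSize == 0 || configSize == requiredSize;
    return configSize > 0;
}

/* Estimated number of "waves" needed to run totalClusters clusters when at
 * most numClusters (from cuOccupancyMaxActiveClusters) can co-exist.
 * Returns -1 if numClusters is not positive. */
int estimated_cluster_waves(int totalClusters, int numClusters)
{
    if (numClusters <= 0)
        return -1;
    return (totalClusters + numClusters - 1) / numClusters;  /* ceiling division */
}
```

For instance, 100 clusters with at most 33 co-resident need an estimated 4 waves, because 3 waves cover only 99.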

CUresult cuOccupancyMaxPotentialBlockSize ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit )

Suggest a launch configuration with reasonable occupancy.

Parameters

minGridSize: - Returned minimum grid size needed to achieve the maximum occupancy
blockSize: - Returned maximum block size that can achieve the maximum occupancy
func: - Kernel for which launch configuration is calculated
blockSizeToDynamicSMemSize: - A function that calculates how much per-block dynamic shared memory func uses based on the block size
dynamicSMemSize: - Dynamic shared memory usage intended, in bytes
blockSizeLimit: - The maximum block size func is designed to handle

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

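Description

Returns in *blockSize a reasonable block size that can achieve the maximum occupancy (that is, the maximum number of active warps with the fewest blocks per multiprocessor), and in *minGridSize the minimum grid size needed to achieve the maximum occupancy.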

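If blockSizeLimit is 0, the configurator uses the maximum block size permitted by the device / function instead.

If per-block dynamic shared memory allocation is not needed, the user should leave both blockSizeToDynamicSMemSize and dynamicSMemSize as 0.

If per-block dynamic shared memory allocation is needed and its size is constant regardless of block size, the size should be passed through dynamicSMemSize, and blockSizeToDynamicSMemSize should be NULL.

Otherwise, if the per-block dynamic shared memory size varies with the block size, the user needs to provide a unary function through blockSizeToDynamicSMemSize that computes the dynamic shared memory needed by func for any given block size. In this case dynamicSMemSize is ignored. An example signature is:

    // Takes the block size, returns the dynamic shared memory needed
    size_t blockToSmem(int blockSize);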
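A concrete function with that signature might look like the sketch below. The shared-memory layout is an assumption for illustration (one float per thread plus one float per 32-thread warp for a hypothetical partial-sum slot); a real kernel would return whatever its own layout needs. The function is then passed as blockSizeToDynamicSMemSize, with dynamicSMemSize set to 0 since it is ignored.

```c
#include <stddef.h>

/* Hypothetical per-block shared-memory requirement: one float per thread plus
 * one float per 32-thread warp. Matches the blockToSmem signature above.
 * Usage (requires cuda.h and a loaded CUfunction func):
 *   int minGridSize = 0, blockSize = 0;
 *   cuOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, func,
 *                                    blockToSmem, 0, 0);
 */
size_t blockToSmem(int blockSize)
{
    size_t warps = (size_t)(blockSize + 31) / 32;   /* round up to whole warps */
    return (size_t)blockSize * sizeof(float) + warps * sizeof(float);
}
```

Passing 0 as blockSizeLimit in the usage comment lets the configurator fall back to the maximum block size permitted by the device / function.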

Note:

This function may also return error codes from previous, asynchronous launches.

See also:

cudaOccupancyMaxPotentialBlockSize
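Once *blockSize and *minGridSize are returned, the launch grid still has to be derived from the problem size. A minimal sketch, assuming one element per thread; choose_grid and its gridStride switch are hypothetical. With a grid-stride loop in the kernel, minGridSize blocks already reach the maximum occupancy, so the helper launches no more than that.

```c
/* Hypothetical helper, not part of the CUDA API.
 * n:           number of elements, one per thread
 * blockSize:   *blockSize from cuOccupancyMaxPotentialBlockSize
 * minGridSize: *minGridSize from the same call
 * gridStride:  nonzero if the kernel loops over elements with a grid stride */
int choose_grid(int n, int blockSize, int minGridSize, int gridStride)
{
    int needed = (n + blockSize - 1) / blockSize;   /* ceil(n / blockSize) */
    if (gridStride && minGridSize < needed)
        return minGridSize;   /* each thread then handles several elements */
    return needed;
}
```

For n = 1000000 and a suggested block size of 256, the one-thread-per-element grid is 3907 blocks; with a grid-stride kernel and a minGridSize of 160, the helper launches 160.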

CUresult cuOccupancyMaxPotentialBlockSizeWithFlags ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit, unsigned int flags )

Suggest a launch configuration with reasonable occupancy.

Parameters

minGridSize: - Returned minimum grid size needed to achieve the maximum occupancy
blockSize: - Returned maximum block size that can achieve the maximum occupancy
func: - Kernel for which launch configuration is calculated
blockSizeToDynamicSMemSize: - A function that calculates how much per-block dynamic shared memory func uses based on the block size
dynamicSMemSize: - Dynamic shared memory usage intended, in bytes
blockSizeLimit: - The maximum block size func is designed to handle
flags: - Options

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

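Description

An extended version of cuOccupancyMaxPotentialBlockSize. In addition to the arguments passed to cuOccupancyMaxPotentialBlockSize, cuOccupancyMaxPotentialBlockSizeWithFlags also takes a flags parameter.

The flags parameter controls how special cases are handled. The valid flags are:

CU_OCCUPANCY_DEFAULT, which maintains the default behavior of cuOccupancyMaxPotentialBlockSize;

CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, the launch configuration that produces maximal occupancy might not support global caching. Setting CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE guarantees that the produced launch configuration is compatible with global caching, at a potential cost in occupancy. More information about this feature can be found in the "Unified L1/Texture Cache" section of the Maxwell Tuning Guide.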

Note:

This function may also return error codes from previous, asynchronous launches.

See also:

cudaOccupancyMaxPotentialBlockSizeWithFlags

CUresult cuOccupancyMaxPotentialClusterSize ( int* clusterSize, CUfunction func, const CUlaunchConfig* config )

Given the kernel function (func) and launch configuration (config), return the maximum cluster size in *clusterSize.

Parameters

clusterSize: - Returned maximum cluster size that can be launched for the given kernel function and launch configuration
func: - Kernel function for which maximum cluster size is calculated
config: - Launch configuration for the given kernel function

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

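Description

The cluster dimensions in config are ignored. If func has a required cluster size set (see cudaFuncGetAttributes / cuFuncGetAttribute), *clusterSize reflects the required cluster size.

By default, this function always returns a value that is portable to future hardware. A higher value may be returned if the kernel function allows non-portable cluster sizes.

This function respects the compile-time launch bounds.

Note that the API can also be used with a context-less kernel (CUkernel) by querying the handle with cuLibraryGetKernel() and then casting it to CUfunction when passing it to the API. In this case, the context used for the calculation is taken from the specified stream config->hStream, or the current context if the stream is NULL.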

Note:

This function may also return error codes from previous, asynchronous launches.

See also:

cudaFuncGetAttributes, cuFuncGetAttribute
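The returned *clusterSize is an upper bound; a launch still needs a cluster size that is compatible with its grid. A minimal sketch follows, assuming a 1-D grid and the CUDA Programming Guide's rule that the grid dimension should be a multiple of the cluster dimension. pick_cluster_x is a hypothetical helper.

```c
/* Hypothetical helper, not part of the CUDA API.
 * maxClusterSize: *clusterSize from cuOccupancyMaxPotentialClusterSize
 * gridDimX:       number of blocks in the 1-D grid
 * Returns the largest cluster size <= maxClusterSize that evenly divides
 * gridDimX, or 1 if no larger size does. */
int pick_cluster_x(int maxClusterSize, int gridDimX)
{
    for (int c = maxClusterSize; c > 1; --c)
        if (gridDimX % c == 0)
            return c;
    return 1;
}
```

For a 132-block grid and a *clusterSize of 8, the helper picks 6, since 132 is divisible by neither 8 nor 7.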