6.10. Occupancy

This section describes the occupancy calculation functions of the CUDA runtime application programming interface.

Besides the occupancy calculator functions (cudaOccupancyMaxActiveBlocksPerMultiprocessor and cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags), there are also C++-only occupancy-based launch configuration functions documented in the C++ API Routines module.

See cudaOccupancyMaxPotentialBlockSize (C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem (C++ API), cudaOccupancyAvailableDynamicSMemPerBlock (C++ API).

Functions

__host__ cudaError_t cudaOccupancyAvailableDynamicSMemPerBlock ( size_t* dynamicSmemSize, const void* func, int numBlocks, int blockSize )
Returns dynamic shared memory available per block when launching numBlocks blocks on an SM.
__host__ __device__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize )
Returns occupancy for a device function.
__host__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize, unsigned int flags )
Returns occupancy for a device function with the specified flags.
__host__ cudaError_t cudaOccupancyMaxActiveClusters ( int* numClusters, const void* func, const cudaLaunchConfig_t* launchConfig )
Given the kernel function (func) and launch configuration (launchConfig), returns in *numClusters the maximum number of clusters that could co-exist on the target device.
__host__ cudaError_t cudaOccupancyMaxPotentialClusterSize ( int* clusterSize, const void* func, const cudaLaunchConfig_t* launchConfig )
Given the kernel function (func) and launch configuration (launchConfig), returns the maximum cluster size in *clusterSize.

Functions

__host__ cudaError_t cudaOccupancyAvailableDynamicSMemPerBlock ( size_t* dynamicSmemSize, const void* func, int numBlocks, int blockSize )
Returns dynamic shared memory available per block when launching numBlocks blocks on an SM.
Parameters
dynamicSmemSize
- Returned maximum dynamic shared memory
func
- Kernel function for which occupancy is calculated
numBlocks
- Number of blocks to fit on SM
blockSize
- Size of the block
Description

Returns in *dynamicSmemSize the maximum size of dynamic shared memory that still allows numBlocks blocks to run on each SM.
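As an illustration of the query described above, the following is a minimal sketch rather than a reference implementation. The kernel stageKernel and the helpers elementsThatFit and maxTileFloats are hypothetical names introduced only for this example; the sketch assumes a CUDA-capable device is present and that the file is compiled with nvcc.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: stages its input in dynamically sized shared memory.
__global__ void stageKernel(const float* in, float* out, int n)
{
    extern __shared__ float tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = in[i];
        out[i] = tile[threadIdx.x] * 2.0f;
    }
}

// Hypothetical helper (pure arithmetic): whole elements that fit in a byte budget.
inline size_t elementsThatFit(size_t bytes, size_t elemSize)
{
    return elemSize == 0 ? 0 : bytes / elemSize;
}

// Asks how much dynamic shared memory each block may use while still letting
// blocksPerSM blocks of blockSize threads be resident on one SM, and converts
// that byte budget into a number of floats.
inline cudaError_t maxTileFloats(int blocksPerSM, int blockSize, size_t* floatsOut)
{
    size_t dynamicSmemSize = 0;
    cudaError_t err = cudaOccupancyAvailableDynamicSMemPerBlock(
        &dynamicSmemSize, (const void*)stageKernel, blocksPerSM, blockSize);
    if (err != cudaSuccess) return err;

    *floatsOut = elementsThatFit(dynamicSmemSize, sizeof(float));
    return cudaSuccess;
}
```

A caller might invoke maxTileFloats(4, 256, &count) to learn how large a per-block float tile can be if four 256-thread blocks should share an SM; the returned value depends on the device and on the kernel's other resource usage.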

See also:

cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, cudaOccupancyMaxPotentialBlockSize (C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags (C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem (C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags (C++ API), cudaOccupancyAvailableDynamicSMemPerBlock

__host__ __device__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize )
Returns occupancy for a device function.
Parameters
numBlocks
- Returned occupancy
func
- Kernel function for which occupancy is calculated
blockSize
- Block size the kernel is intended to be launched with
dynamicSMemSize
- Per-block dynamic shared memory usage intended, in bytes
Description

Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor for the device function.
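The returned block count is often converted into an occupancy fraction by comparing resident threads with the SM's thread capacity. The sketch below shows one way to do that; scaleKernel, occupancyFraction, and queryOccupancy are hypothetical names, and the example assumes a kernel that uses no dynamic shared memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
__global__ void scaleKernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper (pure arithmetic): fraction of an SM's thread slots
// covered by the resident blocks.
inline double occupancyFraction(int activeBlocksPerSM, int blockSize, int maxThreadsPerSM)
{
    if (maxThreadsPerSM <= 0) return 0.0;
    return static_cast<double>(activeBlocksPerSM) * blockSize / maxThreadsPerSM;
}

// Queries the maximum number of active blocks per SM for scaleKernel on the
// current device and converts it into an occupancy fraction.
inline cudaError_t queryOccupancy(int blockSize, double* fractionOut)
{
    int device = 0;
    cudaError_t err = cudaGetDevice(&device);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    int numBlocks = 0;
    err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, (const void*)scaleKernel, blockSize, 0 /* dynamic smem bytes */);
    if (err != cudaSuccess) return err;

    *fractionOut = occupancyFraction(numBlocks, blockSize, prop.maxThreadsPerMultiProcessor);
    return cudaSuccess;
}
```

Calling queryOccupancy for several candidate block sizes gives a quick comparison, though a higher fraction does not by itself guarantee better performance.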

See also:

cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, cudaOccupancyMaxPotentialBlockSize (C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags (C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem (C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags (C++ API), cudaOccupancyAvailableDynamicSMemPerBlock (C++ API), cuOccupancyMaxActiveBlocksPerMultiprocessor

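__host__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize, unsigned int flags )
Returns occupancy for a device function with the specified flags.
Parameters
numBlocks
- Returned occupancy
func
- Kernel function for which occupancy is calculated
blockSize
- Block size the kernel is intended to be launched with
dynamicSMemSize
- Per-block dynamic shared memory usage intended, in bytes
flags
- Requested behavior for the occupancy calculator
Description

Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor for the device function.

The flags parameter controls how special cases are handled. Valid flags include:

  • cudaOccupancyDisableCachingOverride: This flag suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, if caching is enabled but the per-block SM resource usage would result in zero occupancy, the occupancy calculator computes the occupancy as if caching were disabled. Setting this flag makes the occupancy calculator return 0 in such cases. For more information about this feature, see the "Unified L1/Texture Cache" section of the Maxwell tuning guide.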
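To make the effect of the flag concrete, the sketch below queries the same kernel twice, once with the default behavior (cudaOccupancyDefault) and once with cudaOccupancyDisableCachingOverride, and checks whether the default result relied on the caching override. The names copyKernel, OccupancyPair, cachingOverrideApplied, and queryBoth are hypothetical; on platforms where global caching does not affect occupancy the two results would be expected to agree.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
__global__ void copyKernel(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Results of the two queries (hypothetical struct).
struct OccupancyPair {
    int defaultBlocks;  // result with cudaOccupancyDefault
    int strictBlocks;   // result with cudaOccupancyDisableCachingOverride
};

// True when the default query reported a nonzero occupancy only because the
// calculator assumed caching was disabled.
inline bool cachingOverrideApplied(const OccupancyPair& p)
{
    return p.defaultBlocks > 0 && p.strictBlocks == 0;
}

// Runs both queries for copyKernel and stores the results in *out.
inline cudaError_t queryBoth(int blockSize, size_t dynamicSMemSize, OccupancyPair* out)
{
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(
        &out->defaultBlocks, (const void*)copyKernel, blockSize, dynamicSMemSize,
        cudaOccupancyDefault);
    if (err != cudaSuccess) return err;

    return cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(
        &out->strictBlocks, (const void*)copyKernel, blockSize, dynamicSMemSize,
        cudaOccupancyDisableCachingOverride);
}
```

If cachingOverrideApplied returns true for a configuration, the kernel can still run, but only with caching effectively disabled on that platform.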

See also:

cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaOccupancyMaxPotentialBlockSize (C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags (C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem (C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags (C++ API), cudaOccupancyAvailableDynamicSMemPerBlock (C++ API), cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags

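__host__ cudaError_t cudaOccupancyMaxActiveClusters ( int* numClusters, const void* func, const cudaLaunchConfig_t* launchConfig )
Given the kernel function (func) and launch configuration (launchConfig), returns in *numClusters the maximum number of clusters that could co-exist on the target device.
Parameters
numClusters
- Returned maximum number of clusters that could co-exist on the target device
func
- Kernel function for which maximum number of clusters are calculated
launchConfig
Description

If the function has a required cluster size already set (see cudaFuncGetAttributes), the cluster size in launchConfig must either be unspecified or match the required size. If no required size is set, the cluster size must be specified in launchConfig; otherwise the function returns an error.

Note that various attributes of the kernel function may affect the occupancy calculation. The runtime environment may affect how the hardware schedules the clusters, so the calculated occupancy is not guaranteed to be achievable.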
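The sketch below shows one way to build a launch configuration that specifies the cluster size through the cudaLaunchAttributeClusterDimension attribute and pass it to the query. It is an assumption-laden illustration: clusterKernel, isWholeNumberOfClusters, and maxActiveClusters1D are hypothetical names, the kernel is assumed to have no required cluster size of its own, and the call is expected to fail on devices without thread block cluster support.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration; it declares no required
// cluster size, so the size must come from the launch configuration.
__global__ void clusterKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: a cluster launch needs the grid dimension to be a
// whole multiple of the cluster dimension.
inline bool isWholeNumberOfClusters(unsigned gridDimX, unsigned clusterDimX)
{
    return clusterDimX != 0 && gridDimX % clusterDimX == 0;
}

// Queries how many clusters of clusterDimX blocks could co-exist on the
// target device for a one-dimensional launch of clusterKernel.
inline cudaError_t maxActiveClusters1D(unsigned gridDimX, unsigned blockDimX,
                                       unsigned clusterDimX, int* numClusters)
{
    if (!isWholeNumberOfClusters(gridDimX, clusterDimX)) return cudaErrorInvalidValue;

    // The cluster size is carried as a launch attribute.
    cudaLaunchAttribute attr;
    attr.id = cudaLaunchAttributeClusterDimension;
    attr.val.clusterDim.x = clusterDimX;
    attr.val.clusterDim.y = 1;
    attr.val.clusterDim.z = 1;

    cudaLaunchConfig_t config = {0};
    config.gridDim = dim3(gridDimX);
    config.blockDim = dim3(blockDimX);
    config.dynamicSmemBytes = 0;
    config.attrs = &attr;
    config.numAttrs = 1;

    return cudaOccupancyMaxActiveClusters(numClusters, (const void*)clusterKernel, &config);
}
```

Because the attribute is referenced by pointer, it has to stay alive until the query returns, which is why both objects live in the same function here.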

See also:

cudaFuncGetAttributes, cudaOccupancyMaxActiveClusters (C++ API), cuOccupancyMaxActiveClusters

__host__ cudaError_t cudaOccupancyMaxPotentialClusterSize ( int* clusterSize, const void* func, const cudaLaunchConfig_t* launchConfig )
Given the kernel function (func) and launch configuration (launchConfig), returns the maximum cluster size in *clusterSize.
Parameters
clusterSize
- Returned maximum cluster size that can be launched for the given kernel function and launch configuration
func
- Kernel function for which maximum cluster size is calculated
launchConfig
Description

The cluster dimensions in launchConfig are ignored. If func has a required cluster size set (see cudaFuncGetAttributes), *clusterSize reflects the required cluster size.

By default, this function always returns a value that is portable to future hardware. A higher value may be returned if the kernel function allows non-portable cluster sizes.

This function respects the compile-time launch bounds.
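The difference between the portable default and the non-portable case can be explored with a sketch like the one below. The names clusterCandidateKernel, kPortableClusterLimit, isPortableClusterSize, and maxClusterSize are hypothetical. The limit of 8 blocks reflects the portable cluster size stated in the CUDA C++ Programming Guide and is treated here as an assumption; opting in through cudaFuncAttributeNonPortableClusterSizeAllowed is one way a kernel can allow larger sizes.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
__global__ void clusterCandidateKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 0.5f;
}

// Assumption taken from the CUDA C++ Programming Guide: clusters of up to
// 8 thread blocks are considered portable.
constexpr int kPortableClusterLimit = 8;

inline bool isPortableClusterSize(int clusterSize)
{
    return clusterSize >= 1 && clusterSize <= kPortableClusterLimit;
}

// Queries the maximum cluster size for clusterCandidateKernel, optionally
// opting the kernel in to non-portable cluster sizes first.
inline cudaError_t maxClusterSize(unsigned gridDimX, unsigned blockDimX,
                                  bool allowNonPortable, int* clusterSize)
{
    cudaError_t err = cudaFuncSetAttribute(
        (const void*)clusterCandidateKernel,
        cudaFuncAttributeNonPortableClusterSizeAllowed,
        allowNonPortable ? 1 : 0);
    if (err != cudaSuccess) return err;

    // No cluster attribute is supplied: cluster dimensions in the
    // configuration are ignored by this query.
    cudaLaunchConfig_t config = {0};
    config.gridDim = dim3(gridDimX);
    config.blockDim = dim3(blockDimX);
    config.dynamicSmemBytes = 0;

    return cudaOccupancyMaxPotentialClusterSize(
        clusterSize, (const void*)clusterCandidateKernel, &config);
}
```

Comparing the result for allowNonPortable set to false and to true, and passing each through isPortableClusterSize, indicates whether a configuration depends on hardware-specific cluster sizes.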

See also:

cudaFuncGetAttributes, cudaOccupancyMaxPotentialClusterSize (C++ API), cuOccupancyMaxPotentialClusterSize