6.32. Library Management
This section describes the library management functions of the CUDA runtime application programming interface.
Functions
- __host__ cudaError_t cudaKernelSetAttributeForDevice ( cudaKernel_t kernel, cudaFuncAttribute attr, int value, int device )
- Sets information about a kernel.
- __host__ cudaError_t cudaLibraryEnumerateKernels ( cudaKernel_t* kernels, unsigned int numKernels, cudaLibrary_t lib )
- Retrieve the kernel handles within a library.
- __host__ cudaError_t cudaLibraryGetGlobal ( void** dptr, size_t* bytes, cudaLibrary_t library, const char* name )
- Returns a global device pointer.
- __host__ cudaError_t cudaLibraryGetKernel ( cudaKernel_t* pKernel, cudaLibrary_t library, const char* name )
- Returns a kernel handle.
- __host__ cudaError_t cudaLibraryGetKernelCount ( unsigned int* count, cudaLibrary_t lib )
- Returns the number of kernels within a library.
- __host__ cudaError_t cudaLibraryGetManaged ( void** dptr, size_t* bytes, cudaLibrary_t library, const char* name )
- Returns a pointer to managed memory.
- __host__ cudaError_t cudaLibraryGetUnifiedFunction ( void** fptr, cudaLibrary_t library, const char* symbol )
- Returns a pointer to a unified function.
- __host__ cudaError_t cudaLibraryLoadData ( cudaLibrary_t* library, const void* code, cudaJitOption ** jitOptions, void** jitOptionsValues, unsigned int numJitOptions, cudaLibraryOption ** libraryOptions, void** libraryOptionValues, unsigned int numLibraryOptions )
- Load a library with specified code and options.
- __host__ cudaError_t cudaLibraryLoadFromFile ( cudaLibrary_t* library, const char* fileName, cudaJitOption ** jitOptions, void** jitOptionsValues, unsigned int numJitOptions, cudaLibraryOption ** libraryOptions, void** libraryOptionValues, unsigned int numLibraryOptions )
- Load a library with specified file and options.
- __host__ cudaError_t cudaLibraryUnload ( cudaLibrary_t library )
- Unloads a library.
Functions
- __host__ cudaError_t cudaKernelSetAttributeForDevice ( cudaKernel_t kernel, cudaFuncAttribute attr, int value, int device )
-
Sets information about a kernel.
Parameters
- kernel
- - Kernel to set attribute of
- attr
- - Attribute requested
- value
- - Value to set
- device
- - Device to set attribute of
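Description
This call sets the value of the attribute attr for the kernel kernel on the device device to the integer value given by value. The function returns cudaSuccess if the new attribute value was set successfully; otherwise it returns an error. Not all attributes can be set: attempting to set a read-only attribute results in an error (cudaErrorInvalidValue).
Note that attributes set through cudaFuncSetAttribute() override attributes set by this API, regardless of whether the cudaFuncSetAttribute() call happens before or after this call. Because of this, and because of the stricter locking requirements described below, it is recommended to make this call in the initialization path rather than from every thread that accesses kernel, such as on kernel launches or on the critical path.
Valid values for attr are:
- cudaFuncAttributeMaxDynamicSharedMemorySize: The requested maximum size in bytes of dynamically allocated shared memory. The sum of this value and the function attribute sharedSizeBytes cannot exceed the device attribute cudaDevAttrMaxSharedMemoryPerBlockOptin. The maximum amount of dynamic shared memory that can be requested may differ by GPU architecture.
- cudaFuncAttributePreferredSharedMemoryCarveout: On devices where the L1 cache and shared memory use the same hardware resources, this sets the shared memory carveout preference, as a percentage of the total shared memory. See cudaDevAttrMaxSharedMemoryPerMultiprocessor. This is only a hint, and the driver can choose a different ratio if required to execute the function.
- cudaFuncAttributeRequiredClusterWidth: The required cluster width in blocks. The width, height, and depth values must either all be 0 or all be positive. The validity of the cluster dimensions is checked at launch time. If the value is set at compile time, it cannot be set at runtime; setting it at runtime returns cudaErrorNotPermitted.
- cudaFuncAttributeRequiredClusterHeight: The required cluster height in blocks. The width, height, and depth values must either all be 0 or all be positive. The validity of the cluster dimensions is checked at launch time. If the value is set at compile time, it cannot be set at runtime; setting it at runtime returns cudaErrorNotPermitted.
- cudaFuncAttributeRequiredClusterDepth: The required cluster depth in blocks. The width, height, and depth values must either all be 0 or all be positive. The validity of the cluster dimensions is checked at launch time. If the value is set at compile time, it cannot be set at runtime; setting it at runtime returns cudaErrorNotPermitted.
- cudaFuncAttributeNonPortableClusterSizeAllowed: Indicates whether the function can be launched with a non-portable cluster size. 1 is allowed, 0 is disallowed.
- cudaFuncAttributeClusterSchedulingPolicyPreference: The block scheduling policy of a function. The value type is cudaClusterSchedulingPolicy.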
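Two of the constraints in the attribute list can be checked on the host before any attribute is set. The following is a minimal sketch of those two rules in plain C++. The helper names clusterDimsConsistent and dynamicSharedFits are hypothetical and not part of the CUDA runtime, and the caller is assumed to have obtained sharedSizeBytes and the cudaDevAttrMaxSharedMemoryPerBlockOptin value separately.

```cpp
#include <cstddef>

// Hypothetical helper, not a CUDA API: true when the required cluster
// width, height, and depth are either all 0 or all strictly positive.
constexpr bool clusterDimsConsistent(int width, int height, int depth) {
    const bool allZero = (width == 0 && height == 0 && depth == 0);
    const bool allPositive = (width > 0 && height > 0 && depth > 0);
    return allZero || allPositive;
}

// Hypothetical helper, not a CUDA API: true when the requested dynamic
// shared memory plus the kernel's static shared memory (sharedSizeBytes)
// does not exceed the opt-in per-block limit of the device.
constexpr bool dynamicSharedFits(std::size_t requestedDynamicBytes,
                                 std::size_t staticSharedBytes,
                                 std::size_t optinLimitBytes) {
    // Written as a subtraction so the sum cannot overflow.
    return staticSharedBytes <= optinLimitBytes &&
           requestedDynamicBytes <= optinLimitBytes - staticSharedBytes;
}
```

These checks only mirror the documented rules. The runtime still validates cluster dimensions at launch time, so a passing host-side check is not a guarantee that a launch succeeds.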
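Note: The API has stricter locking requirements than its legacy counterpart cudaFuncSetAttribute() because of its device-wide semantics. If multiple threads try to set the same attribute on the same device simultaneously, the resulting attribute setting depends on the interleaving chosen by the OS scheduler and on memory consistency.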
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cudaLibraryGetKernel, cudaLaunchKernel, cudaFuncSetAttribute, cuKernelSetAttribute
- __host__ cudaError_t cudaLibraryEnumerateKernels ( cudaKernel_t* kernels, unsigned int numKernels, cudaLibrary_t lib )
-
Retrieve the kernel handles within a library.
Parameters
- kernels
- - Buffer where the kernel handles are returned to
- numKernels
- - Maximum number of kernel handles that may be returned to the buffer
- lib
- - Library to query from
Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
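Description
Returns in kernels at most numKernels kernel handles from the library lib. The returned kernel handles become invalid when the library is unloaded.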
See also:
- __host__ cudaError_t cudaLibraryGetGlobal ( void** dptr, size_t* bytes, cudaLibrary_t library, const char* name )
-
Returns a global device pointer.
Parameters
- dptr
- - Returned global device pointer for the requested library
- bytes
- - Returned global size in bytes
- library
- - Library to retrieve global from
- name
- - Name of global to retrieve
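Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound, cudaErrorDeviceUninitialized, cudaErrorContextIsDestroyed
Description
Returns in *dptr and *bytes the base pointer and size of the global variable named name in the library library for the current device. If no global variable with the requested name name exists, the call returns cudaErrorSymbolNotFound. Either dptr or bytes (but not both) can be NULL, in which case that parameter is ignored. The returned dptr cannot be passed to the symbol APIs such as cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaGetSymbolAddress, or cudaGetSymbolSize.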
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cudaLibraryGetManaged, cuLibraryGetGlobal
- __host__ cudaError_t cudaLibraryGetKernel ( cudaKernel_t* pKernel, cudaLibrary_t library, const char* name )
-
Returns a kernel handle.
Parameters
- pKernel
- - Returned kernel handle
- library
- - Library to retrieve kernel from
- name
- - Name of kernel to retrieve
Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound
Description
Returns in pKernel the handle of the kernel named name located in the library library. If the kernel handle is not found, the call returns cudaErrorSymbolNotFound.
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cuLibraryGetKernel
- __host__ cudaError_t cudaLibraryGetKernelCount ( unsigned int* count, cudaLibrary_t lib )
-
Returns the number of kernels within a library.
Parameters
- count
- - Number of kernels found within the library
- lib
- - Library to query
Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
Description
Returns in count the number of kernels in lib.
See also:
cudaLibraryEnumerateKernels, cudaLibraryLoadFromFile, cudaLibraryLoadData, cuLibraryGetKernelCount
- __host__ cudaError_t cudaLibraryGetManaged ( void** dptr, size_t* bytes, cudaLibrary_t library, const char* name )
-
Returns a pointer to managed memory.
Parameters
- dptr
- - Returned pointer to the managed memory
- bytes
- - Returned memory size in bytes
- library
- - Library to retrieve managed memory from
- name
- - Name of managed memory to retrieve
Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound
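Description
Returns in *dptr and *bytes the base pointer and size of the managed memory named name in the requested library library. If no managed memory with the requested name name exists, the call returns cudaErrorSymbolNotFound. Either dptr or bytes (but not both) can be NULL, in which case that parameter is ignored. Note that the managed memory of the library library is shared across devices and is registered when the library is loaded. The returned dptr cannot be passed to the symbol APIs such as cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaGetSymbolAddress, or cudaGetSymbolSize.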
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cudaLibraryGetGlobal, cuLibraryGetManaged
- __host__ cudaError_t cudaLibraryGetUnifiedFunction ( void** fptr, cudaLibrary_t library, const char* symbol )
-
Returns a pointer to a unified function.
Parameters
- fptr
- - Returned pointer to a unified function
- library
- - Library to retrieve function pointer memory from
- symbol
- - Name of function pointer to retrieve
Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorSymbolNotFound
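Description
Returns in *fptr the function pointer to the unified function denoted by symbol. If no unified function named symbol exists, the call returns cudaErrorSymbolNotFound. If the system contains no device with the cudaDeviceProp::unifiedFunctionPointers attribute, the call may also return cudaErrorSymbolNotFound.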
See also:
cudaLibraryLoadData, cudaLibraryLoadFromFile, cudaLibraryUnload, cuLibraryGetUnifiedFunction
- __host__ cudaError_t cudaLibraryLoadData ( cudaLibrary_t* library, const void* code, cudaJitOption ** jitOptions, void** jitOptionsValues, unsigned int numJitOptions, cudaLibraryOption ** libraryOptions, void** libraryOptionValues, unsigned int numLibraryOptions )
-
Load a library with specified code and options.
Parameters
- library
- - Returned library
- code
- - Code to load
- jitOptions
- - Options for JIT
- jitOptionsValues
- - Option values for JIT
- numJitOptions
- - Number of options
- libraryOptions
- - Options for loading
- libraryOptionValues
- - Option values for loading
- numLibraryOptions
- - Number of options for loading
Returns
cudaSuccess, cudaErrorInvalidValue, cudaErrorMemoryAllocation, cudaErrorInitializationError, cudaErrorCudartUnloading, cudaErrorInvalidPtx, cudaErrorUnsupportedPtxVersion, cudaErrorNoKernelImageForDevice, cudaErrorSharedObjectSymbolNotFound, cudaErrorSharedObjectInitFailed, cudaErrorJitCompilerNotFound
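Description
Takes a pointer code and loads the corresponding library library based on the application-defined library loading mode:
- If module loading is set to EAGER through the environment variables described in "Module loading", library is loaded into all contexts at the time of the call and into new contexts when they are created, until the library is unloaded with cudaLibraryUnload().
- If the environment variables are set to LAZY, library is not immediately loaded into all existing contexts; it is loaded into a context only when a function is needed for that context, such as a kernel launch.
These environment variables are described in the CUDA Programming Guide under the "CUDA environment variables" section.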
The code may be a cubin or fatbin as output by nvcc, or a NULL-terminated PTX, either as output by nvcc or hand-written. A fatbin should also contain relocatable code when doing separate compilation. Please also see the documentation for nvrtc (https://docs.nvidia.com/cuda/nvrtc/index.html), nvjitlink (https://docs.nvidia.com/cuda/nvjitlink/index.html), and nvfatbin (https://docs.nvidia.com/cuda/nvfatbin/index.html) for more information on generating loadable code at runtime.
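JIT options are passed as an array via jitOptions, and any corresponding parameters are passed in jitOptionsValues. The total number of JIT options is supplied via numJitOptions. Any outputs are returned via jitOptionsValues.
Library load options are passed as an array via libraryOptions, and any corresponding parameters are passed in libraryOptionValues. The total number of library load options is supplied via numLibraryOptions.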
See also:
cudaLibraryLoadFromFile, cudaLibraryUnload, cuLibraryLoadData
- __host__ cudaError_t cudaLibraryLoadFromFile ( cudaLibrary_t* library, const char* fileName, cudaJitOption ** jitOptions, void** jitOptionsValues, unsigned int numJitOptions, cudaLibraryOption ** libraryOptions, void** libraryOptionValues, unsigned int numLibraryOptions )
-
Load a library with specified file and options.
Parameters
- library
- - Returned library
- fileName
- - File to load from
- jitOptions
- - Options for JIT
- jitOptionsValues
- - Option values for JIT
- numJitOptions
- - Number of options
- libraryOptions
- - Options for loading
- libraryOptionValues
- - Option values for loading
- numLibraryOptions
- - Number of options for loading
Returns
cudaSuccess, cudaErrorInvalidValue, cudaErrorMemoryAllocation, cudaErrorInitializationError, cudaErrorCudartUnloading, cudaErrorInvalidPtx, cudaErrorUnsupportedPtxVersion, cudaErrorNoKernelImageForDevice, cudaErrorSharedObjectSymbolNotFound, cudaErrorSharedObjectInitFailed, cudaErrorJitCompilerNotFound
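Description
Takes a file name fileName and loads the corresponding library library based on the application-defined library loading mode:
- If module loading is set to EAGER through the environment variables described in "Module loading", library is loaded into all contexts at the time of the call and into new contexts when they are created, until the library is unloaded with cudaLibraryUnload().
- If the environment variables are set to LAZY, library is not immediately loaded into all existing contexts; it is loaded into a context only when a function is needed for that context, such as a kernel launch.
These environment variables are described in the CUDA Programming Guide under the "CUDA environment variables" section.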
The file should be a cubin file as output by nvcc, or a PTX file either as output by nvcc or handwritten, or a fatbin file as output by nvcc. A fatbin should also contain relocatable code when doing separate compilation. Please also see the documentation for nvrtc (https://docs.nvidia.com/cuda/nvrtc/index.html), nvjitlink (https://docs.nvidia.com/cuda/nvjitlink/index.html), and nvfatbin (https://docs.nvidia.com/cuda/nvfatbin/index.html) for more information on generating loadable code at runtime.
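JIT options are passed as an array via jitOptions, and any corresponding parameters are passed in jitOptionsValues. The total number of JIT options is supplied via numJitOptions. Any outputs are returned via jitOptionsValues.
Library load options are passed as an array via libraryOptions, and any corresponding parameters are passed in libraryOptionValues. The total number of library load options is supplied via numLibraryOptions.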
See also:
cudaLibraryLoadData, cudaLibraryUnload, cuLibraryLoadFromFile
- __host__ cudaError_t cudaLibraryUnload ( cudaLibrary_t library )
-
Unloads a library.
Parameters
- library
- - Library to unload