cuDLA API :: CUDA Toolkit Documentation

1. Modules

Here is a list of all modules:

Data types used by cuDLA driver
cuDLA API

1.1. Data types used by cuDLA driver

Classes

struct CudlaFence
union cudlaDevAttribute
struct cudlaExternalMemoryHandleDesc_t
struct cudlaExternalSemaphoreHandleDesc_t
union cudlaModuleAttribute
struct cudlaModuleTensorDescriptor
struct cudlaSignalEvents
struct cudlaTask
struct cudlaWaitEvents

Typedefs

typedef cudlaDevHandle_t * cudlaDevHandle
typedef cudlaModule_t * cudlaModule

Typedefs

typedef cudlaDevHandle_t * cudlaDevHandle: cuDLA device handle.
typedef cudlaModule_t * cudlaModule: cuDLA module handle.

Enumerations

enum cudlaAccessPermissionFlags

Access permission flags for importing NvSciBuffers.

Values

CUDLA_READ_WRITE_PERM = 0: Flag to import memory with read-write permission
CUDLA_READ_ONLY_PERM = 1: Flag to import memory with read-only permission
CUDLA_TASK_STATISTICS = 1<<1: Flag to indicate buffer as layerwise statistics buffer.

enum cudlaDevAttributeType

Device attribute type.

Values

CUDLA_UNIFIED_ADDRESSING = 0: Flag to check for support for UVA.
CUDLA_DEVICE_VERSION = 1: Flag to check for DLA HW version.

enum cudlaFenceType

Supported fence types.

Values

CUDLA_NVSCISYNC_FENCE = 1: NvSciSync fence type for EOF.
CUDLA_NVSCISYNC_FENCE_SOF = 2: NvSciSync fence type for SOF.

enum cudlaMode

Device creation modes.

Values

CUDLA_CUDA_DLA = 0: Hybrid mode.
CUDLA_STANDALONE = 1: Standalone mode.

enum cudlaModuleAttributeType

Module attribute types.

Values

CUDLA_NUM_INPUT_TENSORS = 0: Flag to retrieve number of input tensors.
CUDLA_NUM_OUTPUT_TENSORS = 1: Flag to retrieve number of output tensors.
CUDLA_INPUT_TENSOR_DESCRIPTORS = 2: Flag to retrieve all the input tensor descriptors.
CUDLA_OUTPUT_TENSOR_DESCRIPTORS = 3: Flag to retrieve all the output tensor descriptors.
CUDLA_NUM_OUTPUT_TASK_STATISTICS = 4: Flag to retrieve total number of output task statistics buffer.
CUDLA_OUTPUT_TASK_STATISTICS_DESCRIPTORS = 5: Flag to retrieve all the output task statistics descriptors.

enum cudlaModuleLoadFlags

Module load flags for cudlaModuleLoadFromMemory.

Values

CUDLA_MODULE_DEFAULT = 0: Default flag.
CUDLA_MODULE_ENABLE_FAULT_DIAGNOSTICS = 1: Flag to load a module that is used to perform permanent fault diagnostics for DLA HW.

enum cudlaNvSciSyncAttributes

cuDLA NvSciSync attributes.

Values

CUDLA_NVSCISYNC_ATTR_WAIT = 1: Wait attribute.
CUDLA_NVSCISYNC_ATTR_SIGNAL = 2: Signal attribute.

enum cudlaStatus

Error codes.

Values

cudlaSuccess = 0: The API call returned with no errors.
cudlaErrorInvalidParam = 1: This indicates that one or more parameters passed to the API is/are incorrect.
cudlaErrorOutOfResources = 2: This indicates that the API call failed due to lack of underlying resources.
cudlaErrorCreationFailed = 3: This indicates that an internal error occurred during creation of device handle.
cudlaErrorInvalidAddress = 4: This indicates that the memory object being passed in the API call has not been registered before.
cudlaErrorOs = 5: This indicates that an OS error occurred.
cudlaErrorCuda = 6: This indicates that there was an error in a CUDA operation as part of the API call.
cudlaErrorUmd = 7: This indicates that there was an error in the DLA runtime for the API call.
cudlaErrorInvalidDevice = 8: This indicates that the device handle passed to the API call is invalid.
cudlaErrorInvalidAttribute = 9: This indicates that an invalid attribute is being requested.
cudlaErrorIncompatibleDlaSWVersion = 10: This indicates that the underlying DLA runtime is incompatible with the current cuDLA version.
cudlaErrorMemoryRegistered = 11: This indicates that the memory object is already registered.
cudlaErrorInvalidModule = 12: This indicates that the module being passed is invalid.
cudlaErrorUnsupportedOperation = 13: This indicates that the operation being requested by the API call is unsupported.
cudlaErrorNvSci = 14: This indicates that the NvSci operation requested by the API call failed.
cudlaErrorDlaErrInvalidInput = 0x40000001: DLA HW Error.
cudlaErrorDlaErrInvalidPreAction = 0x40000002: DLA HW Error.
cudlaErrorDlaErrNoMem = 0x40000003: DLA HW Error.
cudlaErrorDlaErrProcessorBusy = 0x40000004: DLA HW Error.
cudlaErrorDlaErrTaskStatusMismatch = 0x40000005: DLA HW Error.
cudlaErrorDlaErrEngineTimeout = 0x40000006: DLA HW Error.
cudlaErrorDlaErrDataMismatch = 0x40000007: DLA HW Error.
cudlaErrorUnknown = 0x7fffffff: This indicates that an unknown error has occurred.
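
The numeric layout of the table above makes it easy to tell API-level errors from DLA HW errors when logging. The following is a minimal sketch under stated assumptions: the `SKETCH_*` macros and both functions are hypothetical helpers that copy values from the table, not part of cuDLA. Real code would use the `cudlaStatus` enumerators from the cuDLA header instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the cudlaStatus table above (hypothetical local names). */
#define SKETCH_SUCCESS        0u
#define SKETCH_DLA_ERR_FIRST  0x40000001u  /* cudlaErrorDlaErrInvalidInput */
#define SKETCH_DLA_ERR_LAST   0x40000007u  /* cudlaErrorDlaErrDataMismatch */
#define SKETCH_ERROR_UNKNOWN  0x7fffffffu  /* cudlaErrorUnknown */

/* Returns 1 when the code lies in the DLA HW error range listed above. */
int sketch_is_dla_hw_error(uint32_t status)
{
    return status >= SKETCH_DLA_ERR_FIRST && status <= SKETCH_DLA_ERR_LAST;
}

/* Coarse classification of a status code, intended for log messages. */
const char *sketch_status_class(uint32_t status)
{
    if (status == SKETCH_SUCCESS)       return "success";
    if (sketch_is_dla_hw_error(status)) return "DLA HW error";
    if (status == SKETCH_ERROR_UNKNOWN) return "unknown error";
    return "API error";
}
```

For example, `sketch_status_class(0x40000006u)` classifies cudlaErrorDlaErrEngineTimeout as a DLA HW error, while a value such as 5 (cudlaErrorOs) falls into the API error class.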

enum cudlaSubmissionFlags

Task submission flags for cudlaSubmitTask.

Values

CUDLA_SUBMIT_NOOP = 1: Flag to specify that the submitted task must be bypassed for execution.
CUDLA_SUBMIT_SKIP_LOCK_ACQUIRE = 1<<1: Flag to specify that the global lock acquire must be skipped.
CUDLA_SUBMIT_DIAGNOSTICS_TASK = 1<<2: Flag to specify that the submitted task is to run permanent fault diagnostics for DLA HW.

1.2. cuDLA API

This section describes the application programming interface of the cuDLA driver.

Functions

cudlaStatus cudlaCreateDevice ( const uint64_t device, const cudlaDevHandle* devHandle, const uint32_t flags ): Create a device handle.
cudlaStatus cudlaDestroyDevice ( const cudlaDevHandle devHandle ): Destroy device handle.
cudlaStatus cudlaDeviceGetAttribute ( const cudlaDevHandle devHandle, const cudlaDevAttributeType attrib, const cudlaDevAttribute* pAttribute ): Get cuDLA device attributes.
cudlaStatus cudlaDeviceGetCount ( const uint64_t* pNumDevices ): Get device count.
cudlaStatus cudlaGetLastError ( const cudlaDevHandle devHandle ): Gets the last asynchronous error in task execution.
cudlaStatus cudlaGetNvSciSyncAttributes ( uint64_t* attrList, const uint32_t flags ): Get cuDLA's NvSciSync attributes.
cudlaStatus cudlaGetVersion ( const uint64_t* version ): Returns the version number of the library.
cudlaStatus cudlaImportExternalMemory ( const cudlaDevHandle devHandle, const cudlaExternalMemoryHandleDesc* desc, const uint64_t** devPtr, const uint32_t flags ): Imports external memory into cuDLA.
cudlaStatus cudlaImportExternalSemaphore ( const cudlaDevHandle devHandle, const cudlaExternalSemaphoreHandleDesc* desc, const uint64_t** devPtr, const uint32_t flags ): Imports external semaphore into cuDLA.
cudlaStatus cudlaMemRegister ( const cudlaDevHandle devHandle, const uint64_t* ptr, const size_t size, const uint64_t** devPtr, const uint32_t flags ): Registers the CUDA memory to DLA engine.
cudlaStatus cudlaMemUnregister ( const cudlaDevHandle devHandle, const uint64_t* devPtr ): Unregisters the input memory from DLA engine.
cudlaStatus cudlaModuleGetAttributes ( const cudlaModule hModule, const cudlaModuleAttributeType attrType, const cudlaModuleAttribute* attribute ): Get DLA module attributes.
cudlaStatus cudlaModuleLoadFromMemory ( const cudlaDevHandle devHandle, const uint8_t* pModule, const size_t moduleSize, const cudlaModule* hModule, const uint32_t flags ): Load a DLA module.
cudlaStatus cudlaModuleUnload ( const cudlaModule hModule, const uint32_t flags ): Unload a DLA module.
cudlaStatus cudlaSetTaskTimeoutInMs ( const cudlaDevHandle devHandle, const uint32_t timeout ): Set task timeout in milliseconds.
cudlaStatus cudlaSubmitTask ( const cudlaDevHandle devHandle, const cudlaTask* ptrToTasks, const uint32_t numTasks, const void* stream, const uint32_t flags ): Submits the inference operation on DLA.

Functions

cudlaStatus cudlaCreateDevice ( const uint64_t device, const cudlaDevHandle* devHandle, const uint32_t flags )

Create a device handle.

Parameters

device

- Device number (can be 0 or 1).

devHandle

- Pointer to hold the created cuDLA device handle.

flags

- Flags controlling device creation. Valid values for flags are:

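CUDLA_CUDA_DLA - In this mode, cuDLA serves as an extension of the CUDA programming model that allows DLA work to be submitted using CUDA constructs.
CUDLA_STANDALONE - In this mode, cuDLA operates standalone without any interaction with CUDA.

Returns

cudlaSuccess, cudlaErrorOutOfResources, cudlaErrorInvalidParam, cudlaErrorIncompatibleDlaSWVersion, cudlaErrorCreationFailed, cudlaErrorCuda, cudlaErrorUmd, cudlaErrorUnsupportedOperation

Description

Creates an instance of a cuDLA device which can be used to submit DLA operations. The application can create the handle in hybrid or standalone mode. In hybrid mode, this API uses the currently set GPU device to decide the association of the created DLA device handle. If the currently set GPU device is a dGPU, this function returns cudlaErrorUnsupportedOperation, because cuDLA does not presently support dGPU. Each DLA HW instance supports 16 cuDLA device handles.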

cudlaStatus cudlaDestroyDevice ( const cudlaDevHandle devHandle )

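Destroy device handle.

Parameters

devHandle: - A valid device handle.

Returns

cudlaSuccess, cudlaErrorInvalidDevice, cudlaErrorCuda, cudlaErrorUmd

Description

Destroys the instance of the cuDLA device which was created with cudlaCreateDevice. Before destroying the handle, it is important to ensure that all the tasks previously submitted to the device have completed. Failure to do so can lead to application crashes.

In hybrid mode, cuDLA internally performs memory allocations with CUDA using the primary context. As a result, all cuDLA device initializations must be destroyed before a CUDA primary context is destroyed or reset.

Note:

This API can return task execution errors from previous DLA task submissions.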

cudlaStatus cudlaDeviceGetAttribute ( const cudlaDevHandle devHandle, const cudlaDevAttributeType attrib, const cudlaDevAttribute* pAttribute )

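Get cuDLA device attributes.

Parameters

devHandle: - The input cuDLA device handle.
attrib: - The attribute that is being requested.
pAttribute: - The output pointer where the attribute will be available.

Returns

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidDevice, cudlaErrorUmd, cudlaErrorInvalidAttribute

Description

UVA addressing between CUDA and DLA requires special support in the underlying kernel mode drivers. Applications are expected to query the cuDLA runtime to check whether the current version of cuDLA supports UVA addressing.

Note:

This API can return task execution errors from previous DLA task submissions.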

cudlaStatus cudlaDeviceGetCount ( const uint64_t* pNumDevices )

Get device count.

Parameters

pNumDevices: - The number of DLA devices will be available in this variable upon successful completion.

Returns

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorUmd, cudlaErrorIncompatibleDlaSWVersion

Description

Gets the number of DLA devices available for use.

cudlaStatus cudlaGetLastError ( const cudlaDevHandle devHandle )

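Gets the last asynchronous error in task execution.

Parameters

devHandle: - A valid device handle.

Returns

cudlaSuccess, cudlaErrorInvalidDevice, cudlaErrorDlaErrInvalidInput, cudlaErrorDlaErrInvalidPreAction, cudlaErrorDlaErrNoMem, cudlaErrorDlaErrProcessorBusy, cudlaErrorDlaErrTaskStatusMismatch, cudlaErrorDlaErrEngineTimeout, cudlaErrorDlaErrDataMismatch, cudlaErrorUnknown

Description

DLA tasks execute asynchronously on the DLA HW. As a result, the status of the task execution is not known at the time of task submission. The status of the task most recently executed by the DLA HW for a particular device handle can be queried using this interface.

Note that a return code of cudlaSuccess from this function does not guarantee that the most recent task executed successfully. Since this function returns immediately, it can only report the status of the tasks at the moment it is called. To be certain of task completion, applications must synchronize on the submitted tasks in hybrid or standalone mode and then call this API to check for errors.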

cudlaStatus cudlaGetNvSciSyncAttributes ( uint64_t* attrList, const uint32_t flags )

Get cuDLA's NvSciSync attributes.

Parameters

attrList

- Attribute list created by the application.

flags

- Applications can use this flag to specify how they intend to use the NvSciSync object created from the attrList. The valid values of flags can be one of the following (or an OR of these values):

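CUDLA_NVSCISYNC_ATTR_WAIT, specifies that the application intends to use the NvSciSync object created using this attribute list as a waiter in cuDLA and therefore needs cuDLA to fill the waiter-specific NvSciSyncAttr.
CUDLA_NVSCISYNC_ATTR_SIGNAL, specifies that the application intends to use the NvSciSync object created using this attribute list as a signaler in cuDLA and therefore needs cuDLA to fill the signaler-specific NvSciSyncAttr.

Returns

cudlaSuccess, the API call returned with no errors.
cudlaErrorInvalidParam, the API call failed because an invalid attrList parameter was passed.
cudlaErrorUnsupportedOperation, the API call failed because the operation is not supported in hybrid mode.
cudlaErrorInvalidAttribute, the API call failed because the attrList parameter contains an invalid value.
cudlaErrorNvSci, an NvSci operation failed during the API call.
cudlaErrorNotPermittedOperation, the API call is not permitted while DRIVE OS is in the operational state.
cudlaErrorUnknown, an unknown error occurred.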

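Description

Gets the NvSciSync attributes in the attribute list created by the application.

cuDLA supports two types of NvSciSync object primitives:

Sync point
Deterministic semaphore

By default, cuDLA prioritizes the sync point primitive over the deterministic semaphore primitive and sets these priorities in the NvSciSync attribute list.

For a deterministic semaphore, the NvSciSync attribute list used to create the NvSciSync object must have the value of the NvSciSyncAttrKey_RequireDeterministicFences key set to true.

cuDLA also supports the timestamp feature on NvSciSync objects. A waiter can request it by setting the NvSciSync attribute "NvSciSyncAttrKey_WaiterRequireTimestamps" to true.

If NvSci initialization fails, this function returns cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

This API updates the input nvSciSyncAttrList with values equivalent to the following public attribute key-values:

NvSciSyncAttrKey_RequiredPerm is set to

NvSciSyncAccessPerm_SignalOnly, if the value of flags is set to CUDLA_NVSCISYNC_ATTR_SIGNAL.
NvSciSyncAccessPerm_WaitOnly, if the value of flags is set to CUDLA_NVSCISYNC_ATTR_WAIT.
NvSciSyncAccessPerm_WaitSignal, if the value of flags is set to CUDLA_NVSCISYNC_ATTR_SIGNAL | CUDLA_NVSCISYNC_ATTR_WAIT.

Because NvSciSyncAttrKey_RequiredPerm is set internally by cuDLA, the application must not set this value.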

Note:

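Users of cuDLA can only append attributes to the output attrList using the NvSci API. Modifying values that are already populated in the output attrList can result in undefined behavior.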

cudlaStatus cudlaGetVersion ( const uint64_t* version )

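Returns the version number of the library.

Parameters

version: - cuDLA library version will be available in this variable upon successful execution.

Returns

cudlaSuccess, cudlaErrorInvalidParam

Description

cuDLA is semantically versioned. This function returns the version as 1000000*major + 1000*minor + patch.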
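
The packed version can be split back into its components with integer division and remainder. This is a minimal sketch: `SketchVersion` and `sketch_decode_version` are hypothetical helper names, and the code assumes the minor and patch components each stay below 1000, as the packing formula implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the three components of a packed version. */
typedef struct {
    uint64_t major;
    uint64_t minor;
    uint64_t patch;
} SketchVersion;

/* Splits a value of the form 1000000*major + 1000*minor + patch. */
SketchVersion sketch_decode_version(uint64_t packed)
{
    SketchVersion v;
    v.major = packed / 1000000u;
    v.minor = (packed / 1000u) % 1000u;
    v.patch = packed % 1000u;
    return v;
}
```

A packed value of 1002003 decodes to major 1, minor 2, patch 3. The input would normally be the value written by cudlaGetVersion.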

cudlaStatus cudlaImportExternalMemory ( const cudlaDevHandle devHandle, const cudlaExternalMemoryHandleDesc* desc, const uint64_t** devPtr, const uint32_t flags )

Imports external memory into cuDLA.

Parameters

devHandle

- A valid device handle.

desc

- Contains description about allocated external memory.

devPtr

- The output pointer where the mapping will be available.

flags

- Application can use this flag to specify the memory access permissions of the memory that needs to be registered with DLA. The valid values of flags can be one of the following:

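CUDLA_READ_WRITE_PERM, specifies that the external memory needs to be registered with DLA as read-write memory.
CUDLA_READ_ONLY_PERM, specifies that the external memory needs to be registered with DLA as read-only memory.
CUDLA_TASK_STATISTICS, specifies that the external memory needs to be registered with DLA for layerwise statistics.

Returns

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidDevice, cudlaErrorUnsupportedOperation, cudlaErrorNvSci, cudlaErrorInvalidAttribute, cudlaErrorMemoryRegistered, cudlaErrorUmd

Description

Imports the allocated external memory by registering it with DLA. After successful registration, the returned pointer can be used in a task submission.

On Tegra, cuDLA supports importing NvSciBuf objects in standalone mode only. If NvSci initialization fails (either because this API is used in hybrid mode or because of an issue in the NvSci library initialization), this function returns cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

Note:

cuDLA only supports importing NvSciBuf objects of type NvSciBufType_RawBuffer or NvSciBufType_Tensor. Importing an NvSciBuf object of any other type can result in undefined behavior.

Note:

This API can return task execution errors from previous DLA task submissions.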

cudlaStatus cudlaImportExternalSemaphore ( const cudlaDevHandle devHandle, const cudlaExternalSemaphoreHandleDesc* desc, const uint64_t** devPtr, const uint32_t flags )

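Imports external semaphore into cuDLA.

Parameters

devHandle: - A valid device handle.
desc: - Contains semaphore object.
devPtr: - The output pointer where the mapping will be available.
flags: - Reserved for future. Must be set to 0.

Returns

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidDevice, cudlaErrorUnsupportedOperation, cudlaErrorNvSci, cudlaErrorInvalidAttribute, cudlaErrorMemoryRegistered

Description

Imports the allocated external semaphore by registering it with DLA. After successful registration, the returned pointer can be used in a task submission to signal synchronization objects.

On Tegra, cuDLA supports importing NvSciSync objects in standalone mode only. The NvSciSync object primitives that cuDLA supports are sync point and deterministic semaphore.

cuDLA also supports the timestamp feature on NvSciSync objects, with which the user can get a snapshot of the DLA clock at the moment a particular fence is signaled. At any point in time, at most 512 valid timestamp buffers can be associated with fences. For example, if a user has created 513 fences from a single NvSciSync object with timestamps enabled, then the timestamp buffer associated with the 1st fence is the same as the one associated with the 513th fence.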
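
The 513-fence example can be pictured as a fixed pool of 512 slots that is reused in order. The sketch below is only a model of that example: round-robin reuse is an assumption that matches the stated case, and `sketch_timestamp_slot` is a hypothetical helper, not a cuDLA function.

```c
#include <assert.h>
#include <stdint.h>

/* Number of valid timestamp buffers stated in the text above. */
#define SKETCH_TIMESTAMP_BUFFERS 512u

/* Maps the 1-based creation order of a fence to a 0-based buffer slot,
 * assuming the buffers are reused in round-robin order. */
uint32_t sketch_timestamp_slot(uint32_t fenceNumber)
{
    assert(fenceNumber >= 1u);
    return (fenceNumber - 1u) % SKETCH_TIMESTAMP_BUFFERS;
}
```

Under this model fences 1 and 513 map to the same slot, so an application that needs the timestamp of an early fence should read it before creating 512 further fences from the same object.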

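If NvSci initialization fails (either because this API is used in hybrid mode or because of an issue in the NvSci library initialization), this function returns cudlaErrorUnsupportedOperation. This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

Note:

This API can return task execution errors from previous DLA task submissions.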

cudlaStatus cudlaMemRegister ( const cudlaDevHandle devHandle, const uint64_t* ptr, const size_t size, const uint64_t** devPtr, const uint32_t flags )

Registers the CUDA memory to DLA engine.

Parameters

devHandle

- A valid cuDLA device handle created by a previous call to cudlaCreateDevice.

ptr

- The CUDA pointer to be registered.

size

- The size of the mapping i.e the number of bytes from ptr that must be mapped.

devPtr

- The output pointer where the mapping will be available.

flags

- Applications can use this flag to control several aspects of the registration process. The valid values of flags can be one of the following (or an OR of these values):

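0, default
CUDLA_TASK_STATISTICS, specifies that the external memory needs to be registered with DLA for layerwise statistics.

Returns

cudlaSuccess, cudlaErrorInvalidDevice, cudlaErrorInvalidParam, cudlaErrorInvalidAddress, cudlaErrorCuda, cudlaErrorUmd, cudlaErrorOutOfResources, cudlaErrorMemoryRegistered, cudlaErrorUnsupportedOperation

Description

As part of registration, a system mapping is created through which the DLA HW can access the underlying CUDA memory. The resulting mapping is available in devPtr, and applications must use this mapping when referring to this memory in submit operations.

This function returns cudlaErrorInvalidAddress if the pointer or size to be registered is invalid. In addition, if the input pointer was already registered, this function returns cudlaErrorMemoryRegistered. Attempting to re-register memory does not cause any irrecoverable error in cuDLA, and applications can continue to use cuDLA APIs after this error has occurred.

Note:

This API can return task execution errors from previous DLA task submissions.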

cudlaStatus cudlaMemUnregister ( const cudlaDevHandle devHandle, const uint64_t* devPtr )

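Unregisters the input memory from DLA engine.

Parameters

devHandle: - A valid cuDLA device handle created by a previous call to cudlaCreateDevice.
devPtr: - The pointer to be unregistered.

Returns

cudlaSuccess, cudlaErrorInvalidDevice, cudlaErrorInvalidAddress, cudlaErrorUmd

Description

Removes the system mapping that enables the DLA HW to access the memory. This mapping could have been created by a previous call to cudlaMemRegister, cudlaImportExternalMemory, or cudlaImportExternalSemaphore.

Note:

This API can return task execution errors from previous DLA task submissions.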

cudlaStatus cudlaModuleGetAttributes ( const cudlaModule hModule, const cudlaModuleAttributeType attrType, const cudlaModuleAttribute* attribute )

Get DLA module attributes.

Parameters

hModule: - The input DLA module.
attrType: - The attribute type that is being requested.
attribute: - The output pointer where the attribute will be available.

Returns

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidModule, cudlaErrorInvalidDevice, cudlaErrorUmd, cudlaErrorInvalidAttribute, cudlaErrorUnsupportedOperation

Description

Gets module attributes from the loaded module. This API returns cudlaErrorInvalidDevice if the module is not loaded in any device.

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaModuleLoadFromMemory ( const cudlaDevHandle devHandle, const uint8_t* pModule, const size_t moduleSize, const cudlaModule* hModule, const uint32_t flags )

Load a DLA module.

Parameters

devHandle

- The input cuDLA device handle. The module will be loaded in the context of this handle.

pModule

- A pointer to an in-memory module.

moduleSize

- The size of the module.

hModule

- The address in which the loaded module handle will be available upon successful execution.

flags

- Applications can use this flag to specify how the module is going to be used. The valid values of flags can be one of the following:

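CUDLA_MODULE_DEFAULT, the default value, which is 0.
CUDLA_MODULE_ENABLE_FAULT_DIAGNOSTICS, the application can specify this flag to load a module that is used to perform fault diagnostics for DLA HW. With this flag set, the pModule and moduleSize parameters must be NULL and 0, because the diagnostics module is loaded internally.

Returns

cudlaSuccess, cudlaErrorInvalidDevice, cudlaErrorInvalidParam, cudlaErrorOutOfResources, cudlaErrorUnsupportedOperation, cudlaErrorUmd

Description

Loads the module into the current device handle.

Loading multiple loadables onto a single cuDLA device handle is not allowed.
A loadable can be loaded only once during the lifetime of a cuDLA device handle.

Note:

This API can return task execution errors from previous DLA task submissions.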

cudlaStatus cudlaModuleUnload ( const cudlaModule hModule, const uint32_t flags )

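Unload a DLA module.

Parameters

hModule: - Handle to the loaded module.
flags: - Reserved for future. Must be set to 0.

Returns

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidDevice, cudlaErrorInvalidModule, cudlaErrorUmd

Description

Unloads the module from the device handle that it was loaded into. This API returns cudlaErrorInvalidDevice if the module is not loaded into a valid device.

Note:

This API can return task execution errors from previous DLA task submissions.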

cudlaStatus cudlaSetTaskTimeoutInMs ( const cudlaDevHandle devHandle, const uint32_t timeout )

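Set task timeout in milliseconds.

Parameters

devHandle: - A valid device handle.
timeout: - Task timeout value in ms.

Returns

cudlaSuccess, cudlaErrorInvalidParam

Description

Sets the task timeout in milliseconds for each device handle. If the user does not explicitly set a timeout, cuDLA uses 30 seconds as the default timeout value.

This function returns cudlaErrorInvalidParam if the device handle is invalid, if the timeout is 0, or if the timeout is greater than 1000 seconds. Otherwise it returns cudlaSuccess.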
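
An application can check a timeout value against the documented range before calling the API. The following is a minimal sketch: the macro and function names are hypothetical, the limits are the ones stated above converted to milliseconds, and treating exactly 1000 seconds as valid is an assumption based on the wording "greater than 1000 seconds".

```c
#include <assert.h>
#include <stdint.h>

/* Limits from the description above, expressed in milliseconds. */
#define SKETCH_DEFAULT_TIMEOUT_MS 30000u    /* 30 s default */
#define SKETCH_MAX_TIMEOUT_MS     1000000u  /* 1000 s upper bound */

/* Returns 1 if the timeout lies in the documented range, 0 otherwise. */
int sketch_timeout_is_valid(uint32_t timeoutMs)
{
    return timeoutMs != 0u && timeoutMs <= SKETCH_MAX_TIMEOUT_MS;
}
```

Checking locally gives the caller a clearer diagnostic than a bare cudlaErrorInvalidParam, which the API also returns for an invalid device handle.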

Note:

This API can return task execution errors from previous DLA task submissions.

cudlaStatus cudlaSubmitTask ( const cudlaDevHandle devHandle, const cudlaTask* ptrToTasks, const uint32_t numTasks, const void* stream, const uint32_t flags )

Submits the inference operation on DLA.

Parameters

devHandle

- A valid cuDLA device handle.

ptrToTasks

- A list of inferencing tasks.

numTasks

- The number of tasks.

stream

- The stream on which the DLA task has to be submitted.

flags

- Applications can use this flag to control several aspects of the submission process. The valid values of flags can be one of the following (or an OR of these values):

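0, default
CUDLA_SUBMIT_NOOP, specifies that the submitted task must be skipped during execution on the DLA. However, all the waitEvents and signalEvents dependencies must be satisfied. This flag is ignored for NULL data submissions, because in that case only the wait and signal events are stored internally for the next task submission.
CUDLA_SUBMIT_SKIP_LOCK_ACQUIRE, specifies that the submitted task is being enqueued in a device handle and that no other thread is enqueuing another task in that device handle at the same time. Applications can use this flag as an optimization. Ordinarily, the cuDLA APIs acquire a global lock internally to guarantee thread safety. However, this lock causes unwanted serialization when applications submit tasks to different device handles. If an application submits one or more tasks in multiple threads, the submissions target different device handles, and no shared data is provided as part of the task information in the respective submissions, then the application can specify this flag during submission so that the internal lock acquire is skipped. Shared data also includes the input stream in hybrid mode operation. Therefore, if the same stream is used to submit two different tasks, using this flag is invalid even if the two device handles are different.
CUDLA_SUBMIT_DIAGNOSTICS_TASK, specifies that the submitted task runs permanent fault diagnostics for DLA HW. The user can use this task to probe the state of the DLA HW. With this flag set, in standalone mode the user is not allowed to make event-only submissions (wait events, signal events, or both) in which the tensor information is NULL. This is because the task always runs on an internally loaded diagnostic module. This diagnostic module does not expect any input tensors and therefore needs no input tensor memory. However, the user is expected to query the number of output tensors, allocate the output tensor memory, and pass it when submitting the task.

Returns

cudlaSuccess, cudlaErrorInvalidParam, cudlaErrorInvalidDevice, cudlaErrorInvalidModule, cudlaErrorCuda, cudlaErrorUmd, cudlaErrorOutOfResources, cudlaErrorInvalidAddress, cudlaErrorUnsupportedOperation, cudlaErrorInvalidAttribute, cudlaErrorNvSci, cudlaErrorOs

Description

This operation takes a sequence of tasks and submits them to the DLA HW for execution in the order in which they appear in the input task array. The input and output tensors (and the statistics buffers, if used) are assumed to be pre-registered using cudlaMemRegister (in hybrid mode) or cudlaImportExternalMemory (in standalone mode). Failure to do so can result in this function returning cudlaErrorInvalidAddress.

In hybrid mode, the stream parameter must specify the CUDA stream on which the DLA task is submitted for execution. In standalone mode, this parameter must be passed as NULL; otherwise this function returns cudlaErrorInvalidParam.

The cudlaTask structure has a provision to specify wait and signal events that cuDLA must wait on and signal, respectively, as part of cudlaSubmitTask(). Each submitted task waits for all its wait events to be signaled before beginning execution and provides a signal event (if one was requested during cudlaSubmitTask) that the application (or any other entity) can wait on to ensure that the submitted task has completed execution. In cuDLA 1.0, only NvSciSync fences are supported as part of wait events. Furthermore, only NvSciSync objects (registered through cudlaImportExternalSemaphore) can be signaled as part of signal events, and the fence corresponding to the signaled event is returned as part of cudlaSubmitTask.

In standalone mode, if the inputTensor and outputTensor fields are set to NULL inside the cudlaTask structure, the task submission is interpreted as an enqueue of wait and signal events that must be considered for the subsequent task submission. No actual task submission is done. Multiple such submissions with NULL input/outputTensor fields overwrite the list of wait and signal events to be considered. In other words, the wait events and signal events considered for the subsequent actual task submission are the latest non-NULL wait events and/or the latest non-NULL signal events set before a non-NULL submission. During an actual task submission in standalone mode, the effective wait events and signal events are what the application set using NULL data submissions together with what is set for that particular task submission in the waitEvents and signalEvents fields. The wait events set as part of a NULL data submission are considered dependencies for only the first task, and the signal events set as part of a NULL data submission are signaled when the last task of the task list is complete. All constraints that apply to waitEvents and signalEvents individually (as described below) also apply to the combined list.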
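
The overwrite rule for NULL data submissions can be modeled as two pending slots, each replaced only by a non-NULL value. This is a toy model under stated assumptions, not cuDLA's internal implementation: `SketchPendingEvents` and `sketch_null_data_submit` are hypothetical names, and the event lists are represented as opaque pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the events that the next real submission will use. */
typedef struct {
    const void *wait;    /* latest non-NULL wait events seen so far */
    const void *signal;  /* latest non-NULL signal events seen so far */
} SketchPendingEvents;

/* Models one NULL data submission: a non-NULL argument replaces the
 * pending entry, while a NULL argument leaves the earlier entry in place. */
void sketch_null_data_submit(SketchPendingEvents *pending,
                             const void *waitEvents,
                             const void *signalEvents)
{
    assert(pending != NULL);
    if (waitEvents != NULL)   pending->wait = waitEvents;
    if (signalEvents != NULL) pending->signal = signalEvents;
}
```

In this model, submitting wait list A with signal list S, and then wait list B with no signal list, leaves B and S pending for the next actual task submission.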

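cuDLA supports 3 types of fences: preFence, SOF fence, and EOF fence.

preFence is the type of fence that DLA waits on before starting task execution. Use cudlaFenceType CUDLA_NVSCISYNC_FENCE to mark a fence as a preFence.
SOF (Start Of Frame) fence is the type of fence that is signaled before task execution on DLA starts. Use cudlaFenceType CUDLA_NVSCISYNC_FENCE_SOF to mark a fence as an SOF fence.
EOF (End Of Frame) fence is the type of fence that is signaled after task execution on DLA completes. Use cudlaFenceType CUDLA_NVSCISYNC_FENCE to mark a fence as an EOF fence.

For wait events, applications are expected to:

Register their synchronization objects using cudlaImportExternalSemaphore.
Create the required number of preFence placeholders using CudlaFence.
Fill in the placeholders with the relevant fences from the application.
List all the fences in cudlaWaitEvents.

For signal events, applications are expected to:

Register their synchronization objects using cudlaImportExternalSemaphore.
Create the required number of placeholder fences for SOF and EOF fences using CudlaFence.
Place the registered objects and the corresponding fences in cudlaSignalEvents. For a deterministic semaphore, no fence needs to be passed in cudlaSignalEvents.

When cudlaSubmitTask returns successfully, the fences in cudlaSignalEvents can be used to wait for the particular task to complete. cuDLA supports 1 sync point and any number of semaphores as part of cudlaSignalEvents. If more than 1 sync point is specified, cudlaErrorInvalidParam is returned.

cuDLA adheres to DLA's restriction of a maximum of 29 preFences and SOF fences combined, and 29 EOF fences, per DLA task.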
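
The per-task fence limits can be checked before a submission is assembled. The following is a minimal sketch: the macro and function names are hypothetical, and the two limits are the values stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Per-task limits stated above. */
#define SKETCH_MAX_PRE_PLUS_SOF 29u
#define SKETCH_MAX_EOF          29u

/* Returns 1 if the fence counts for one task respect both limits.
 * The sum is computed in 64 bits so that large inputs cannot wrap around. */
int sketch_fence_counts_ok(uint32_t numPreFences,
                           uint32_t numSofFences,
                           uint32_t numEofFences)
{
    const uint64_t preAndSof = (uint64_t)numPreFences + numSofFences;
    return preAndSof <= SKETCH_MAX_PRE_PLUS_SOF &&
           numEofFences <= SKETCH_MAX_EOF;
}
```

For instance, 20 preFences with 9 SOF fences and 29 EOF fences passes, while 20 preFences with 10 SOF fences does not, because the first limit applies to the combined count.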

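During submission, users have the option to enable layerwise statistics profiling for the individual layers of the network. This option is exercised by specifying additional output buffers that will contain the profiling information. Specifically:

"cudlaTask::numOutputTensors" should be the sum of the values returned by cudlaModuleGetAttributes(...,CUDLA_NUM_OUTPUT_TENSORS,...) and cudlaModuleGetAttributes(...,CUDLA_NUM_OUTPUT_TASK_STATISTICS,...).
"cudlaTask::outputTensor" should contain the array of output tensors with the array of statistics output buffers appended.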

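This function can return cudlaErrorUnsupportedOperation if:

The stream being used in hybrid mode is in capturing state.
The application attempts to use NvSci functionality in hybrid mode.
Loading of the NvSci libraries failed for a particular platform.
A fence type other than CUDLA_NVSCISYNC_FENCE is specified.
waitEvents or signalEvents is not NULL in hybrid mode.
inputTensor or outputTensor is NULL in hybrid mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor is NULL and outputTensor is not NULL, or vice versa, in standalone mode and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor and outputTensor are NULL and the number of tasks is not equal to 1 in standalone mode, and the flags are not CUDLA_SUBMIT_DIAGNOSTICS_TASK.
inputTensor is not NULL or outputTensor is NULL, and the flags are CUDLA_SUBMIT_DIAGNOSTICS_TASK.
The effective signal events list has multiple sync points to signal.
The layerwise statistics feature is unsupported.
The limits on preFences, SOF fences, and EOF fences per task are not met.

This function can return cudlaErrorNvSci or cudlaErrorInvalidAttribute in certain cases when the underlying NvSci operation fails.

This function can return cudlaErrorOs if an internal system operation fails.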
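
The tensor and event conditions in the list above can be expressed as a small pre-submission check. The sketch below is an illustration under stated assumptions, not cuDLA's own validation: `SketchSubmitShape` and `sketch_is_unsupported` are hypothetical, "the flags are CUDLA_SUBMIT_DIAGNOSTICS_TASK" is read as "that bit is set", and conditions that depend on runtime state (stream capture, NvSci loading, fence types, fence limits, sync point counts) are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MODE_HYBRID        0u          /* CUDLA_CUDA_DLA */
#define SKETCH_MODE_STANDALONE    1u          /* CUDLA_STANDALONE */
#define SKETCH_SUBMIT_DIAGNOSTICS (1u << 2)   /* CUDLA_SUBMIT_DIAGNOSTICS_TASK */

/* Hypothetical summary of one submission: which pointers are non-NULL. */
typedef struct {
    uint32_t mode;
    int hasInput;    /* inputTensor != NULL */
    int hasOutput;   /* outputTensor != NULL */
    int hasWait;     /* waitEvents != NULL */
    int hasSignal;   /* signalEvents != NULL */
    uint32_t numTasks;
    uint32_t flags;
} SketchSubmitShape;

/* Returns 1 if the shape matches one of the modeled unsupported cases. */
int sketch_is_unsupported(const SketchSubmitShape *s)
{
    const int diag = (s->flags & SKETCH_SUBMIT_DIAGNOSTICS) != 0u;
    const int hybrid = (s->mode == SKETCH_MODE_HYBRID);

    /* Events are never accepted in hybrid mode. */
    if (hybrid && (s->hasWait || s->hasSignal)) return 1;
    /* Diagnostics tasks take no input tensors but require output tensors. */
    if (diag) return s->hasInput || !s->hasOutput;
    /* Hybrid mode needs both tensor arrays. */
    if (hybrid) return !s->hasInput || !s->hasOutput;
    /* Standalone mode: both arrays present, or both absent (event-only). */
    if ((s->hasInput != 0) != (s->hasOutput != 0)) return 1;
    /* Event-only submissions must consist of exactly one task. */
    if (!s->hasInput && !s->hasOutput && s->numTasks != 1u) return 1;
    return 0;
}
```

A standalone event-only submission with one task passes this check, while the same shape with two tasks, or any hybrid submission that carries wait events, is flagged.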

Note:

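This API can return task execution errors from previous DLA task submissions.

2. Data Structures

Here are the data structures with brief descriptions: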

cudlaDevAttribute
cudlaExternalMemoryHandleDesc
cudlaExternalSemaphoreHandleDesc
CudlaFence
cudlaModuleAttribute
cudlaModuleTensorDescriptor
cudlaSignalEvents
cudlaTask
cudlaWaitEvents

2.1. cudlaDevAttribute Union Reference

[Data types used by cuDLA driver]

Device attribute.

Public Variables

uint32_t deviceVersion
uint8_t unifiedAddressingSupported

Variables

uint32_t cudlaDevAttribute::deviceVersion [inherited]: DLA device version. Xavier has 1.0 and Orin has 2.0.
uint8_t cudlaDevAttribute::unifiedAddressingSupported [inherited]: Returns 0 if unified addressing is not supported.

2.2. cudlaExternalMemoryHandleDesc_t Struct Reference

[Data types used by cuDLA driver]

External memory handle descriptor.

Public Variables

const void * extBufObject
unsigned long long size

Variables

const void * cudlaExternalMemoryHandleDesc_t::extBufObject [inherited]: A handle representing an external memory object.
unsigned long long cudlaExternalMemoryHandleDesc_t::size [inherited]: Size of the memory allocation.

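2.3. cudlaExternalSemaphoreHandleDesc_t Struct Reference

[Data types used by cuDLA driver]

External semaphore handle descriptor.

Public Variables

const void * extSyncObject

Variables

const void * cudlaExternalSemaphoreHandleDesc_t::extSyncObject [inherited]: A handle representing an external synchronization object.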

2.4. CudlaFence Struct Reference

[Data types used by cuDLA driver]

Fence description.

Public Variables

void * fence
cudlaFenceType type

Variables

void * CudlaFence::fence [inherited]: Fence.
cudlaFenceType CudlaFence::type [inherited]: Fence type.

2.5. cudlaModuleAttribute Union Reference

[Data types used by cuDLA driver]

Module attribute.

Public Variables

cudlaModuleTensorDescriptor * inputTensorDesc
uint32_t numInputTensors
uint32_t numOutputTensors
cudlaModuleTensorDescriptor * outputTensorDesc

Variables

cudlaModuleTensorDescriptor * cudlaModuleAttribute::inputTensorDesc [inherited]: Returns an array of input tensor descriptors.
uint32_t cudlaModuleAttribute::numInputTensors [inherited]: Returns the number of input tensors.
uint32_t cudlaModuleAttribute::numOutputTensors [inherited]: Returns the number of output tensors.
cudlaModuleTensorDescriptor * cudlaModuleAttribute::outputTensorDesc [inherited]: Returns an array of output tensor descriptors.

2.6. cudlaModuleTensorDescriptor Struct Reference

[Data types used by cuDLA driver]

Tensor descriptor.

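2.7. cudlaSignalEvents Struct Reference

[Data types used by cuDLA driver]

Signal events for cudlaSubmitTask.

Public Variables

const * devPtrs
CudlaFence * eofFences
uint32_t numEvents

Variables

const * cudlaSignalEvents::devPtrs [inherited]: Array of registered synchronization objects (via cudlaImportExternalSemaphore).
CudlaFence * cudlaSignalEvents::eofFences [inherited]: Array of fence pointers for all the signal events corresponding to the synchronization objects.
uint32_t cudlaSignalEvents::numEvents [inherited]: Total number of signal events.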

2.8. cudlaTask Struct Reference

[Data types used by cuDLA driver]

Structure of a task.

Public Variables

const * inputTensor
cudlaModule moduleHandle
uint32_t numInputTensors
uint32_t numOutputTensors
const * outputTensor
cudlaSignalEvents * signalEvents
const cudlaWaitEvents * waitEvents

Variables

const * cudlaTask::inputTensor [inherited]: Array of input tensors.
cudlaModule cudlaTask::moduleHandle [inherited]: cuDLA module handle.
uint32_t cudlaTask::numInputTensors [inherited]: Number of input tensors.
uint32_t cudlaTask::numOutputTensors [inherited]: Number of output tensors.
const * cudlaTask::outputTensor [inherited]: Array of output tensors.
cudlaSignalEvents * cudlaTask::signalEvents [inherited]: Signal events.
const cudlaWaitEvents * cudlaTask::waitEvents [inherited]: Wait events.

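2.9. cudlaWaitEvents Struct Reference

[Data types used by cuDLA driver]

Wait events for cudlaSubmitTask.

Public Variables

uint32_t numEvents
const CudlaFence * preFences

Variables

uint32_t cudlaWaitEvents::numEvents [inherited]: Total number of wait events.
const CudlaFence * cudlaWaitEvents::preFences [inherited]: Array of fence pointers for all the wait events.

3. Data Fields

Here is a list of all documented struct and union fields with links to the struct/union documentation for each field: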

deviceVersion: cudlaDevAttribute
devPtrs: cudlaSignalEvents
eofFences: cudlaSignalEvents
extBufObject: cudlaExternalMemoryHandleDesc
extSyncObject: cudlaExternalSemaphoreHandleDesc
fence: CudlaFence
inputTensor: cudlaTask
inputTensorDesc: cudlaModuleAttribute
moduleHandle: cudlaTask
numEvents: cudlaWaitEvents; cudlaSignalEvents
numInputTensors: cudlaTask; cudlaModuleAttribute
numOutputTensors: cudlaTask; cudlaModuleAttribute
outputTensor: cudlaTask
outputTensorDesc: cudlaModuleAttribute
preFences: cudlaWaitEvents
signalEvents: cudlaTask
size: cudlaExternalMemoryHandleDesc
type: CudlaFence
unifiedAddressingSupported: cudlaDevAttribute
waitEvents: cudlaTask

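Notices

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation ("NVIDIA") makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.

Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer ("Terms of Sale"). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer's own risk.

NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer's sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer's product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.

No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.

Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.

This document and all NVIDIA design specifications, reference boards, files, drawings, diagnostics, lists, and other documents (together and separately, "Materials") are being provided "as is." NVIDIA makes no warranties, expressed or implied, with respect to the Materials, and expressly disclaims all implied warranties of noninfringement, merchantability, and fitness for a particular purpose. To the extent not prohibited by law, in no event will NVIDIA be liable for any damages, including without limitation any direct, indirect, special, incidental, punitive, or consequential damages, however caused and regardless of the theory of liability, arising out of any use of this document, even if NVIDIA has been advised of the possibility of such damages. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA's aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

OpenCL

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.

Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright

This product includes software developed by the Syncro Soft SRL (http://www.sync.ro/).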