Accessing CUDA Functionalities

Streams and Events

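This section covers the basic usage of CUDA streams and events. For the API reference, see Streams and Events. For their roles in the CUDA programming model, see the CUDA Programming Guide.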

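CuPy provides the high-level Python APIs Stream and Event for creating streams and events, respectively. Data copies and kernel launches are enqueued onto the current stream, which can be queried via get_current_stream() and changed either by using a stream as a context manager: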

>>> import cupy as cp
>>> import numpy as np
>>>
>>> a_np = np.arange(10)
>>> s = cp.cuda.Stream()
>>> with s:
...     a_cp = cp.asarray(a_np)  # H2D transfer on stream s
...     b_cp = cp.sum(a_cp)      # kernel launched on stream s
...     assert s == cp.cuda.get_current_stream()
...
>>> # fall back to the previous stream in use (here the default stream)
>>> # when going out of the scope of s

or by calling the use() method:

>>> s = cp.cuda.Stream()
>>> s.use()  # any subsequent operations are done on stream s
<Stream ... (device ...)>
>>> b_np = cp.asnumpy(b_cp)
>>> assert s == cp.cuda.get_current_stream()
>>> cp.cuda.Stream.null.use()  # fall back to the default (null) stream
<Stream 0 (device -1)>
>>> assert cp.cuda.Stream.null == cp.cuda.get_current_stream()

Events can be created either manually or through the record() method. Event objects can be used to time GPU activity (via get_elapsed_time()) or to set up inter-stream dependencies:

>>> e1 = cp.cuda.Event()
>>> e1.record()
>>> a_cp = b_cp * a_cp + 8
>>> e2 = cp.cuda.get_current_stream().record()
>>>
>>> # set up a stream order
>>> s2 = cp.cuda.Stream()
>>> s2.wait_event(e2)
>>> with s2:
...     # a_cp is guaranteed to be updated when this copy (on s2) starts
...     a_np = cp.asnumpy(a_cp)
>>>
>>> # timing
>>> e2.synchronize()
>>> t = cp.cuda.get_elapsed_time(e1, e2)  # only include the compute time, not the copy time

Just like Device objects, Stream and Event objects can also be used for synchronization.
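As a minimal sketch of the three synchronization levels mentioned above, the hypothetical helper below (the name sync_examples is not part of CuPy) blocks the host on an event, then on a stream, then on the whole device. It assumes a CUDA-capable device is available and imports CuPy inside the function so that the definition itself can be loaded on a machine without a GPU.

```python
def sync_examples():
    """Block the host on an event, a stream, and the current device in turn."""
    import cupy as cp  # imported lazily; calling this function requires a GPU

    s = cp.cuda.Stream()
    with s:
        x = cp.arange(1_000_000, dtype=cp.float32)
        y = cp.sqrt(x)  # kernel enqueued on stream s
        e = s.record()  # event marking the work queued on s so far

    e.synchronize()  # wait only for the work recorded before e
    assert e.done    # the event has completed once synchronize() returns

    s.synchronize()                 # wait for everything queued on s
    cp.cuda.Device().synchronize()  # wait for all streams on the current device

    return float(y[-1])  # copies one element back to the host
```

Event.synchronize() is the narrowest wait and Device.synchronize() the broadest, so preferring the event or stream form usually leaves unrelated streams free to keep running.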

Note

In CuPy, Stream objects are managed on a per-thread, per-device basis.
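The per-thread part of this rule can be illustrated with a small sketch. The helper name current_stream_in_worker is hypothetical, and the example assumes a CUDA-capable device. It makes a stream current in the main thread and then checks which stream a freshly started worker thread sees.

```python
import threading


def current_stream_in_worker():
    """Return (main_sees_s, worker_sees_s) for a stream made current in main."""
    import cupy as cp  # imported lazily; calling this function requires a GPU

    s = cp.cuda.Stream()
    seen = {}

    def worker():
        # The current stream is thread-local, so this thread does not
        # inherit s from the thread that started it.
        seen["worker"] = cp.cuda.get_current_stream() == s

    with s:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
        seen["main"] = cp.cuda.get_current_stream() == s

    return seen["main"], seen["worker"]
```

In practice this means each worker thread that should run on a particular stream has to enter that stream itself, for example with its own with block.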

Note

On NVIDIA GPUs, there are two stream singleton objects, null and ptds, referred to as the legacy default stream and the per-thread default stream, respectively. CuPy uses the former as the default when no user-defined stream is in use. To change this behavior, set the environment variable CUPY_CUDA_PER_THREAD_DEFAULT_STREAM to 1 (see Environment variables). This does not apply to AMD GPUs.

To interoperate with streams created by other Python libraries, CuPy provides the ExternalStream API, which wraps an existing stream pointer (given as a Python int). See Interoperability for details.
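A rough sketch of the wrapping step follows. Because no second library is assumed here, a CuPy-owned stream stands in for the foreign one and its ptr attribute supplies the integer handle; in real code that integer would come from the other library. The helper name wrap_foreign_stream is hypothetical, and a CUDA-capable device is assumed.

```python
def wrap_foreign_stream():
    """Wrap a raw stream handle and run work on it; return (same_ptr, total)."""
    import cupy as cp  # imported lazily; calling this function requires a GPU

    # Stand-in for a stream owned by another library (illustration only).
    owner = cp.cuda.Stream(non_blocking=True)
    raw_ptr = owner.ptr  # Python int holding the underlying stream handle

    # ExternalStream does not take ownership: the original stream must stay
    # alive for as long as the wrapper is used.
    ext = cp.cuda.ExternalStream(raw_ptr)
    with ext:
        total = cp.arange(10).sum()  # enqueued on the wrapped stream
    ext.synchronize()

    return ext.ptr == raw_ptr, int(total)
```

The wrapper can be used as a context manager or synchronized like any other stream, but destroying the underlying stream remains the responsibility of whichever library created it.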

CUDA Driver and Runtime API

Under construction. See Runtime API for the API reference.
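Until this section is written, the sketch below gives a flavor of the low-level bindings exposed under cupy.cuda.runtime. It is an illustration rather than a guide: the helper name describe_runtime is hypothetical, only a handful of query functions are shown, and a CUDA-capable device is assumed.

```python
def describe_runtime():
    """Collect a few facts about the CUDA runtime and the current device."""
    import cupy as cp  # imported lazily; calling this function requires a GPU

    rt = cp.cuda.runtime
    free_bytes, total_bytes = rt.memGetInfo()  # memory on the current device

    return {
        "runtime_version": rt.runtimeGetVersion(),  # e.g. 12020 for CUDA 12.2
        "driver_version": rt.driverGetVersion(),
        "device_count": rt.getDeviceCount(),
        "current_device": rt.getDevice(),
        "free_bytes": free_bytes,
        "total_bytes": total_bytes,
    }
```

These functions are thin wrappers that return plain Python values, so for everyday work the higher-level objects such as Device and Stream are usually more convenient.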