Accessing CUDA Functionalities

Streams and Events

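This section covers the basic usage of CUDA streams and events. For the API reference, see Streams and events. For their role in the CUDA programming model, see the CUDA Programming Guide.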

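CuPy provides the high-level Python APIs Stream and Event for creating streams and events, respectively. Data copies and kernel launches are enqueued onto the current stream, which can be queried via get_current_stream() and changed either by using the stream as a context manager: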

>>> import numpy as np
>>> import cupy as cp
>>>
>>> a_np = np.arange(10)
>>> s = cp.cuda.Stream()
>>> with s:
...     a_cp = cp.asarray(a_np)  # H2D transfer on stream s
...     b_cp = cp.sum(a_cp)      # kernel launched on stream s
...     assert s == cp.cuda.get_current_stream()
...
>>> # fall back to the previous stream in use (here the default stream)
>>> # when going out of the scope of s

or by calling the use() method:

>>> s = cp.cuda.Stream()
>>> s.use()  # any subsequent operations are done on stream s
<Stream ... (device ...)>
>>> b_np = cp.asnumpy(b_cp)
>>> assert s == cp.cuda.get_current_stream()
>>> cp.cuda.Stream.null.use()  # fall back to the default (null) stream
<Stream 0 (device -1)>
>>> assert cp.cuda.Stream.null == cp.cuda.get_current_stream()

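Events can be created either manually or through a stream's record() method. Event objects can be used to time GPU activity (via get_elapsed_time()) or to set up inter-stream dependencies: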

>>> e1 = cp.cuda.Event()
>>> e1.record()
>>> a_cp = b_cp * a_cp + 8
>>> e2 = cp.cuda.get_current_stream().record()
>>>
>>> # set up a stream order
>>> s2 = cp.cuda.Stream()
>>> s2.wait_event(e2)
>>> with s2:
...     # the a_cp is guaranteed updated when this copy (on s2) starts
...     a_np = cp.asnumpy(a_cp)
>>>
>>> # timing
>>> e2.synchronize()
>>> t = cp.cuda.get_elapsed_time(e1, e2)  # only include the compute time, not the copy time

Just like Device objects, Stream and Event objects can also be used for synchronization.
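The three levels of synchronization mentioned above can be sketched as follows. This is a minimal illustration, not a recommended pattern: it assumes CuPy is installed and a CUDA device is visible, and the names `gpu_available` and `result` are introduced here only for the example. Without a usable GPU the sketch does nothing.

```python
# Assumption: CuPy is installed and a CUDA device is visible.
# If not, the sketch is skipped and `result` stays None.
try:
    import cupy as cp
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    gpu_available = False

result = None
if gpu_available:
    s = cp.cuda.Stream()
    with s:
        x = cp.arange(10)
        y = cp.sum(x * 2)   # enqueued on stream s
        e = s.record()      # event marking the work queued on s so far

    e.synchronize()                  # wait until the event has completed
    s.synchronize()                  # wait until all work on s is done
    cp.cuda.Device().synchronize()   # wait for the whole current device

    # After synchronizing, both objects report completion.
    assert e.done and s.done
    result = int(y)
```

Synchronizing on an event waits only for the work recorded before it, synchronizing on a stream waits for everything queued on that stream, and synchronizing on the device waits for all streams, so the narrowest option that covers the data you need is usually the cheapest.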

Note

In CuPy, Stream objects are managed on a per-thread, per-device basis.
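A small sketch of what per-thread management means in practice: a stream made current in one thread is not expected to be current in another. The `observed` dictionary and `worker` function are hypothetical names used only for this illustration, and the code assumes CuPy is installed with a visible CUDA device (otherwise it does nothing).

```python
import threading

# Assumption: CuPy is installed and a CUDA device is visible.
try:
    import cupy as cp
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    gpu_available = False

observed = {}
if gpu_available:
    s = cp.cuda.Stream()

    def worker():
        # A freshly started thread has its own current stream,
        # so it should not see s here.
        observed["worker_uses_s"] = (cp.cuda.get_current_stream() == s)

    with s:
        # In the main thread, s is the current stream inside this block.
        observed["main_uses_s"] = (cp.cuda.get_current_stream() == s)
        t = threading.Thread(target=worker)
        t.start()
        t.join()
```

If a worker thread should run on a specific stream, it has to select that stream itself, for example with its own `with` block.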

Note

On NVIDIA GPUs, there are two stream singleton objects, null and ptds, referred to as the legacy default stream and the per-thread default stream, respectively. CuPy uses the former as the default when no user-defined stream is in use. To change this behavior, set the environment variable CUPY_CUDA_PER_THREAD_DEFAULT_STREAM to 1; see Environment variables. This is not applicable to AMD GPUs.

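To interoperate with streams created by other Python libraries, CuPy provides the ExternalStream API (cupy.cuda.ExternalStream), which wraps an existing stream pointer given as a Python int. See Interoperability for details.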
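As a rough sketch of wrapping a stream pointer, the example below uses a second CuPy stream as a stand-in for a stream owned by another library, since no particular third-party library is assumed here. The names `owner`, `raw_ptr`, and `total` are illustrative only, and the code does nothing if CuPy or a CUDA device is unavailable.

```python
# Assumption: CuPy is installed and a CUDA device is visible.
try:
    import cupy as cp
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    gpu_available = False

total = None
if gpu_available:
    # Stand-in for a stream created elsewhere; a real use case would
    # obtain the pointer from the other library instead.
    owner = cp.cuda.Stream()
    raw_ptr = owner.ptr               # stream pointer as a Python int

    ext = cp.cuda.ExternalStream(raw_ptr)
    assert ext.ptr == raw_ptr

    with ext:
        # Work launched here is enqueued on the wrapped stream.
        total = int(cp.arange(5).sum())
    ext.synchronize()
```

ExternalStream only wraps the pointer, so the original owner (here `owner`) must stay alive for as long as the wrapper is in use.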

CUDA Driver and Runtime API

Under construction. See Runtime API for the API reference.