Accessing CUDA Functionalities
Streams and Events
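In this section we discuss the basic usage of CUDA streams and events. For the API reference, see Streams and Events. For their roles in the CUDA programming model, see the CUDA Programming Guide.
CuPy provides the high-level Python APIs Stream and Event for creating streams and events, respectively. Data copies and kernel launches are enqueued onto the current stream, which can be queried via get_current_stream() and changed by using a stream as a context manager: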
>>> import numpy as np
>>> import cupy as cp
>>>
>>> a_np = np.arange(10)
>>> s = cp.cuda.Stream()
>>> with s:
... a_cp = cp.asarray(a_np) # H2D transfer on stream s
... b_cp = cp.sum(a_cp) # kernel launched on stream s
... assert s == cp.cuda.get_current_stream()
...
>>> # fall back to the previous stream in use (here the default stream)
>>> # when going out of the scope of s
Or by using the use() method:
>>> s = cp.cuda.Stream()
>>> s.use() # any subsequent operations are done on stream s
<Stream ... (device ...)>
>>> b_np = cp.asnumpy(b_cp)
>>> assert s == cp.cuda.get_current_stream()
>>> cp.cuda.Stream.null.use() # fall back to the default (null) stream
<Stream 0 (device -1)>
>>> assert cp.cuda.Stream.null == cp.cuda.get_current_stream()
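Events can be created either manually or through the record() method. Event objects can be used for timing GPU activities (via get_elapsed_time()) or for setting up inter-stream dependencies: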
>>> e1 = cp.cuda.Event()
>>> e1.record()
>>> a_cp = b_cp * a_cp + 8
>>> e2 = cp.cuda.get_current_stream().record()
>>>
>>> # set up a stream order
>>> s2 = cp.cuda.Stream()
>>> s2.wait_event(e2)
>>> with s2:
... # a_cp is guaranteed to be up to date when this copy (on s2) starts
... a_np = cp.asnumpy(a_cp)
>>>
>>> # timing
>>> e2.synchronize()
>>> t = cp.cuda.get_elapsed_time(e1, e2) # covers only the compute time, not the copy time
Just like Device objects, Stream and Event objects can also be used for synchronization.
Note
In CuPy, Stream objects are managed on a per-thread, per-device basis.
Note
On NVIDIA GPUs, there are two stream singleton objects null and
ptds, referred to as the legacy default stream and the per-thread default
stream, respectively. CuPy uses the former as default when no user-defined stream is in use. To
change this behavior, set the environment variable CUPY_CUDA_PER_THREAD_DEFAULT_STREAM to 1
(see Environment variables). This is not applicable to AMD GPUs.
To interoperate with streams created by other Python libraries, CuPy provides the ExternalStream API to wrap an existing stream pointer (given as a Python int). See Interoperability for details.
CUDA Driver and Runtime API
Under construction. Please see Runtime API for the API reference.