
CUDAGraph#

class torch.cuda.CUDAGraph(keep_graph=False)[source]#

Wrapper around a CUDA graph.

Parameters

keep_graph (bool, optional) – If keep_graph=False, the cudaGraphExec_t will be instantiated on the GPU at the end of capture_end and the underlying cudaGraph_t will be destroyed. Users who want to query or otherwise modify the underlying cudaGraph_t before instantiation can set keep_graph=True and access it via raw_cuda_graph after capture_end. Note that the cudaGraphExec_t will not be instantiated at the end of capture_end in this case. Instead, it will be instantiated via an explicit call to instantiate, or automatically on the first call to replay if instantiate was not already called. Calling instantiate manually before replay is recommended to prevent increased latency on the first call to replay. It is allowed to modify the raw cudaGraph_t after first calling instantiate, but the user must call instantiate again manually to make sure the instantiated graph reflects these changes; PyTorch has no means of tracking them.

Warning

This API is in beta and may change in future releases.
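
For illustration, a minimal sketch of the keep_graph=True workflow (assuming a CUDA-capable device; the tensor names, shapes and operations are arbitrary, and the capture is driven by the torch.cuda.graph context manager, which calls capture_begin/capture_end internally):

    import torch

    g = torch.cuda.CUDAGraph(keep_graph=True)
    static_input = torch.ones(4, device="cuda")

    with torch.cuda.graph(g):
        static_output = static_input * 2

    # With keep_graph=True the cudaGraph_t survives capture_end and can be inspected.
    raw = g.raw_cuda_graph()
    # ... query or modify `raw` via CUDA graph APIs here ...

    g.instantiate()  # recommended before the first replay to avoid extra first-replay latency
    g.replay()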

capture_begin(pool=None, capture_error_mode='global')[source]#

Begin capturing CUDA work on the current stream.

Typically, you shouldn’t call capture_begin yourself. Use graph or make_graphed_callables(), which call capture_begin internally.

Parameters
  • pool (optional) – Token (returned by graph_pool_handle() or other_Graph_instance.pool()) that hints this graph may share memory with the indicated pool. See Graph memory management.

  • capture_error_mode (str, optional) – specifies the cudaStreamCaptureMode for the graph capture stream. Can be “global”, “thread_local” or “relaxed”. During CUDA graph capture, some actions, such as cudaMalloc, may be unsafe. “global” will error on actions in other threads, “thread_local” will only error for actions in the current thread, and “relaxed” will not error on these actions. Do NOT change this setting unless you’re familiar with cudaStreamCaptureMode.
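
As a hedged illustration of the low-level path (the torch.cuda.graph context manager or make_graphed_callables() is usually preferable), a manual capture might look like the sketch below; capture must run on a non-default stream, and the variable names are arbitrary:

    import torch

    g = torch.cuda.CUDAGraph()
    static_input = torch.zeros(16, device="cuda")

    # Warm up on a side stream, outside of capture.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        static_output = static_input + 1
    torch.cuda.current_stream().wait_stream(s)
    torch.cuda.synchronize()

    # Capture the same work on the side stream.
    with torch.cuda.stream(s):
        g.capture_begin()  # defaults: pool=None, capture_error_mode="global"
        static_output = static_input + 1
        g.capture_end()
    torch.cuda.current_stream().wait_stream(s)

    static_input.fill_(3.0)
    g.replay()  # recomputes static_output from the current contents of static_input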

capture_end()[source]#

End CUDA graph capture on the current stream.

After capture_end, replay may be called on this instance.

Typically, you shouldn’t call capture_end yourself. Use graph or make_graphed_callables(), which call capture_end internally.
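
For example, the higher-level make_graphed_callables() helper performs warm-up, capture_begin and capture_end internally (a sketch only; the module, batch size and feature sizes are arbitrary):

    import torch

    model = torch.nn.Linear(16, 16).cuda()
    sample_input = torch.randn(8, 16, device="cuda")

    # Warm-up, capture_begin and capture_end all happen inside this call.
    graphed_model = torch.cuda.make_graphed_callables(model, (sample_input,))

    out = graphed_model(torch.randn(8, 16, device="cuda"))
    out.sum().backward()  # the graphed backward pass is replayed as well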

debug_dump(debug_path)[source]#

Parameters

debug_path (required) – Path to dump the graph to.

Calls a debugging function to dump the graph if debugging was enabled via CUDAGraph.enable_debug_mode().

enable_debug_mode()[source]#

Enable debugging mode for CUDAGraph.debug_dump.
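
For illustration, a minimal sketch of the debug-dump workflow (the output path and the captured work are arbitrary):

    import torch

    g = torch.cuda.CUDAGraph()
    g.enable_debug_mode()  # enable before capture so debug information is retained

    x = torch.randn(4, device="cuda")
    with torch.cuda.graph(g):
        y = x.sin()

    g.debug_dump("cuda_graph_debug.dot")  # dump the captured graph for offline inspection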

instantiate()[source]#

Instantiate the CUDA graph. Will be called by capture_end if keep_graph=False, or by replay if keep_graph=True and instantiate has not already been explicitly called. Does not destroy the cudaGraph_t returned by raw_cuda_graph.

pool()[source]#

Return an opaque token representing the id of this graph's memory pool.

This id can optionally be passed to another graph's capture_begin, which hints that the other graph may share the same memory pool.
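
A hedged sketch of pool sharing between two graphs captured back to back (tensor names, shapes and operations are arbitrary):

    import torch

    g1 = torch.cuda.CUDAGraph()
    g2 = torch.cuda.CUDAGraph()
    a = torch.zeros(32, device="cuda")

    with torch.cuda.graph(g1):
        b = a * 2

    # Hint that g2 may share g1's memory pool; see "Graph memory management".
    with torch.cuda.graph(g2, pool=g1.pool()):
        c = b + 1

    # Graphs sharing a pool are typically replayed in the order they were captured.
    g1.replay()
    g2.replay()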

raw_cuda_graph()[source]#

Return the underlying cudaGraph_t. keep_graph must be True.

See the following for APIs to manipulate this object: Graph Management and the cuda-python Graph Management bindings.

replay()[source]#

Replay the CUDA work captured by this graph.
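
For example (a minimal sketch; names and shapes are arbitrary), a replay reruns the captured kernels on whatever data the captured tensors currently hold:

    import torch

    g = torch.cuda.CUDAGraph()
    static_in = torch.zeros(8, device="cuda")

    with torch.cuda.graph(g):
        static_out = static_in * 3

    static_in.fill_(2.0)  # update the captured input in place
    g.replay()            # reruns the captured work; static_out is refreshed
    torch.cuda.synchronize()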

reset()[source]#

Delete the graph currently held by this instance.