CUDAGraph#
- class torch.cuda.CUDAGraph(keep_graph=False)[source]#
Wrapper around a CUDA graph.
- Parameters
keep_graph (bool, optional) – If keep_graph=False, the cudaGraphExec_t will be instantiated on the GPU at the end of capture_end and the underlying cudaGraph_t will be destroyed. Users who want to query or otherwise modify the underlying cudaGraph_t before instantiation can set keep_graph=True and access it via raw_cuda_graph after capture_end. Note that in this case the cudaGraphExec_t will not be instantiated at the end of capture_end. Instead, it will be instantiated via an explicit call to instantiate, or automatically on the first call to replay if instantiate was not already called. Calling instantiate manually before replay is recommended to prevent increased latency on the first call to replay. It is allowed to modify the raw cudaGraph_t after first calling instantiate, but the user must call instantiate again manually to make sure the instantiated graph picks up these changes; PyTorch has no means of tracking them.
Warning
This API is in beta and may change in future releases.
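As a minimal sketch of typical usage (the tensor names are illustrative), a CUDAGraph is normally captured through the torch.cuda.graph context manager, which calls capture_begin and capture_end internally, and then re-executed with replay():

```python
import torch

g = torch.cuda.CUDAGraph()

# Replays reuse the same memory addresses, so inputs are "static"
# tensors that are refilled in place before each replay.
static_input = torch.randn(8, device="cuda")

# Warm up on a side stream before capture (recommended).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_output = static_input * 2
torch.cuda.current_stream().wait_stream(s)

# Capture: torch.cuda.graph drives capture_begin/capture_end internally.
with torch.cuda.graph(g):
    static_output = static_input * 2

# Fill the static input with new data and replay the captured work.
static_input.copy_(torch.randn(8, device="cuda"))
g.replay()
torch.cuda.synchronize()
print(static_output)
```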
- capture_begin(pool=None, capture_error_mode='global')[source]#
Begin capturing CUDA work on the current stream.
Typically, you shouldn’t call capture_begin yourself. Use graph or make_graphed_callables(), which call capture_begin internally.
- Parameters
pool (optional) – Token (returned by graph_pool_handle() or other_Graph_instance.pool()) that hints this graph may share memory with the indicated pool. See Graph memory management; a pool-sharing sketch follows this parameter list.
capture_error_mode (str, optional) – Specifies the cudaStreamCaptureMode for the graph capture stream. Can be “global”, “thread_local” or “relaxed”. During CUDA graph capture, some actions, such as cudaMalloc, may be unsafe. “global” will error on actions in other threads, “thread_local” will only error for actions in the current thread, and “relaxed” will not error on these actions. Do NOT change this setting unless you’re familiar with cudaStreamCaptureMode.
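As a hedged sketch of the pool argument (shown here through the torch.cuda.graph context manager, which forwards pool to capture_begin; the tensors are illustrative), a second graph can be captured against the first graph's pool so the two share memory:

```python
import torch

x = torch.randn(8, device="cuda")

g1 = torch.cuda.CUDAGraph()
g2 = torch.cuda.CUDAGraph()

# Warm up both computations on a side stream before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    y = x + 1
    z = x * 2
torch.cuda.current_stream().wait_stream(s)

# Capture the first graph into its own private memory pool.
with torch.cuda.graph(g1):
    y = x + 1

# Hint that the second graph may share g1's pool instead of
# allocating another private pool.
with torch.cuda.graph(g2, pool=g1.pool()):
    z = x * 2

# Replay in capture order.
g1.replay()
g2.replay()
torch.cuda.synchronize()
```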
- capture_end()[source]#
End CUDA graph capture on the current stream.
After capture_end, replay may be called on this instance.
Typically, you shouldn’t call capture_end yourself. Use graph or make_graphed_callables(), which call capture_end internally.
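For completeness, a minimal sketch of driving capture_begin and capture_end by hand (roughly what the graph context manager does for you; it also performs extra safety steps such as garbage collection and cache cleanup that are omitted here):

```python
import torch

g = torch.cuda.CUDAGraph()
static_x = torch.randn(8, device="cuda")

torch.cuda.synchronize()  # make sure prior work is finished before capturing

# Capture on a side stream; capture is not allowed on the legacy default stream.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    g.capture_begin()
    static_y = static_x + 1
    g.capture_end()
torch.cuda.current_stream().wait_stream(s)

# After capture_end, replay may be called on this instance.
static_x.fill_(3.0)
g.replay()
torch.cuda.synchronize()
```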
- debug_dump(debug_path)[source]#
- Parameters
debug_path (required) – Path to dump the graph to.
Calls a debugging function to dump the graph if debugging is enabled via CUDAGraph.enable_debug_mode().
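A hedged sketch of how debug_dump can be combined with enable_debug_mode() (the output filename is illustrative; the underlying CUDA debug dump is typically a Graphviz DOT file):

```python
import torch

g = torch.cuda.CUDAGraph()
g.enable_debug_mode()  # retain debug information so the graph can be dumped later

x = torch.randn(8, device="cuda")
with torch.cuda.graph(g):
    y = x * 2

# Write the captured graph to disk for offline inspection.
g.debug_dump("captured_graph.dot")
```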
- instantiate()[source]#
Instantiate the CUDA graph. Will be called by capture_end if keep_graph=False, or by replay if keep_graph=True and instantiate has not already been explicitly called. Does not destroy the cudaGraph_t returned by raw_cuda_graph.
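A minimal sketch of the keep_graph=True workflow described above, calling instantiate explicitly before the first replay (tensor names are illustrative):

```python
import torch

# Keep the underlying cudaGraph_t alive after capture.
g = torch.cuda.CUDAGraph(keep_graph=True)
x = torch.randn(8, device="cuda")

with torch.cuda.graph(g):
    y = x + 1

# The raw cudaGraph_t is available here for inspection or modification.
raw = g.raw_cuda_graph()

# Instantiate explicitly so the first replay() does not pay the
# instantiation cost. If the raw graph is modified later, call
# instantiate() again; PyTorch does not track such changes.
g.instantiate()
g.replay()
torch.cuda.synchronize()
```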
- raw_cuda_graph()[source]#
Return the underlying cudaGraph_t. keep_graph must be True.
See the following for APIs for how to manipulate this object: Graph Management and cuda-python Graph Management bindings
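As a rough, heavily hedged sketch (it assumes the cuda-python package is installed and that its runtime bindings accept the integer handle returned by raw_cuda_graph() directly; exact signatures and handle conversion may differ between cuda-python versions):

```python
import torch
from cuda import cudart  # assumption: the cuda-python package is installed

g = torch.cuda.CUDAGraph(keep_graph=True)
x = torch.randn(8, device="cuda")
with torch.cuda.graph(g):
    y = x + 1

# Integer cudaGraph_t handle of the captured (not yet instantiated) graph.
raw_graph = g.raw_cuda_graph()

# Example cuda-python call: dump the raw graph as a DOT file.
# Passing the integer handle directly is an assumption; some versions may
# require wrapping it, e.g. cudart.cudaGraph_t(init_value=raw_graph).
err, = cudart.cudaGraphDebugDotPrint(raw_graph, b"raw_graph.dot", 0)
assert err == cudart.cudaError_t.cudaSuccess
```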