LazyMemmapStorage

class torchrl.data.replay_buffers.LazyMemmapStorage(max_size: int, *, scratch_dir=None, device: device = 'cpu', ndim: int = 1, existsok: bool = False, compilable: bool = False)

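A memory-mapped storage for tensors and tensordicts.

Parameters:

max_size (int) – size of the storage, i.e. the maximum number of elements stored in the buffer.

Keyword Arguments:

scratch_dir (str or path) – directory where the memmap-tensors will be written.
device (torch.device, optional) – device where the sampled tensors will be stored and sent. Defaults to torch.device("cpu"). If None is provided, the device is automatically gathered from the first batch of data passed. This is not enabled by default, to avoid OOM issues caused by data being placed on GPU by mistake.
ndim (int, optional) – the number of dimensions to account for when measuring the storage size. For instance, a storage of shape [3, 4] has capacity 3 if ndim=1 and 12 if ndim=2. Defaults to 1.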
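existsok (bool, optional) – whether an error should be raised if any of the tensors already exists on disk. Defaults to False, as in the class signature. If False, the tensors are opened as they are and not overwritten.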
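
The ndim rule above can be illustrated with a few lines of plain Python. This is a minimal sketch, not the library's implementation: the helper name `capacity` is hypothetical, and it only assumes that the capacity is the product of the first ndim dimensions of the storage shape, as the [3, 4] example suggests.

```python
import math


def capacity(shape, ndim=1):
    """Number of indexable elements when the first `ndim` dims are counted.

    Hypothetical helper for illustration; not part of torchrl.
    """
    if not 1 <= ndim <= len(shape):
        raise ValueError("ndim must be between 1 and len(shape)")
    return math.prod(shape[:ndim])


cap_1d = capacity((3, 4), ndim=1)  # counts only the leading dimension
cap_2d = capacity((3, 4), ndim=2)  # counts both dimensions
print(cap_1d, cap_2d)
```

With ndim=1 the trailing dimension is treated as part of each stored element, which is why only the leading dimension contributes to the count.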

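Note

When checkpointing a LazyMemmapStorage, you can provide a path identical to the one where the storage already lives, which avoids a long copy of data that is already on disk. This only works with the default TensorStorageCheckpointer checkpointer. For example: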

>>> from tensordict import TensorDict
>>> from torchrl.data import TensorStorage, LazyMemmapStorage, ReplayBuffer
>>> import tempfile
>>> from pathlib import Path
>>> import time
>>> td = TensorDict(a=0, b=1).expand(1000).clone()
>>> # We pass a path that is <main_ckpt_dir>/storage to LazyMemmapStorage
>>> rb_memmap = ReplayBuffer(storage=LazyMemmapStorage(10_000_000, scratch_dir="dump/storage"))
>>> rb_memmap.extend(td);
>>> # Checkpointing in `dump` is a zero-copy, as the data is already in `dump/storage`
>>> rb_memmap.dumps(Path("./dump"))

Examples

>>> import torch
>>> data = TensorDict({
...     "some data": torch.randn(10, 11),
...     ("some", "nested", "data"): torch.randn(10, 11, 12),
... }, batch_size=[10, 11])
>>> storage = LazyMemmapStorage(100)
>>> storage.set(range(10), data)
>>> len(storage)  # only the first dimension is considered as indexable
10
>>> storage.get(0)
TensorDict(
    fields={
        some data: MemoryMappedTensor(shape=torch.Size([11]), device=cpu, dtype=torch.float32, is_shared=False),
        some: TensorDict(
            fields={
                nested: TensorDict(
                    fields={
                        data: MemoryMappedTensor(shape=torch.Size([11, 12]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([11]),
                    device=cpu,
                    is_shared=False)},
            batch_size=torch.Size([11]),
            device=cpu,
            is_shared=False)},
    batch_size=torch.Size([11]),
    device=cpu,
    is_shared=False)

This class also supports tensorclass data.

Examples

>>> from tensordict import tensorclass
>>> @tensorclass
... class MyClass:
...     foo: torch.Tensor
...     bar: torch.Tensor
>>> data = MyClass(foo=torch.randn(10, 11), bar=torch.randn(10, 11, 12), batch_size=[10, 11])
>>> storage = LazyMemmapStorage(10)
>>> storage.set(range(10), data)
>>> storage.get(0)
MyClass(
    bar=MemoryMappedTensor(shape=torch.Size([11, 12]), device=cpu, dtype=torch.float32, is_shared=False),
    foo=MemoryMappedTensor(shape=torch.Size([11]), device=cpu, dtype=torch.float32, is_shared=False),
    batch_size=torch.Size([11]),
    device=cpu,
    is_shared=False)

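attach(buffer: Any) → None

Attaches a sampler to this storage.

Buffers that read from this storage must be registered as attached entities by calling this method. This guarantees that when data in the storage changes, the components are made aware of the change, even if the storage is shared with other buffers (e.g., Priority Samplers).

Parameters: buffer – the object that reads from this storage.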
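
The notification mechanism described above resembles an observer pattern. The following toy sketch shows the general idea only; `ToyStorage`, `ToySampler`, and the `mark_update` hook are names assumed for this illustration and should not be read as the torchrl implementation.

```python
class ToyStorage:
    """Toy storage that notifies attached readers when data is written."""

    def __init__(self, max_size):
        self._data = [None] * max_size
        self._attached = []

    def attach(self, buffer):
        # Register each reader once so it is notified only once per write.
        if buffer not in self._attached:
            self._attached.append(buffer)

    def set(self, index, value):
        self._data[index] = value
        # Tell every attached reader which index just changed.
        for reader in self._attached:
            reader.mark_update(index)


class ToySampler:
    """Toy reader that records which indices changed since it last looked."""

    def __init__(self):
        self.stale = set()

    def mark_update(self, index):
        self.stale.add(index)


storage = ToyStorage(4)
sampler_a, sampler_b = ToySampler(), ToySampler()
storage.attach(sampler_a)
storage.attach(sampler_b)
storage.attach(sampler_a)  # attaching twice has no extra effect
storage.set(2, "x")
```

Both samplers learn about the write to index 2 even though neither performed it, which is the property that matters when one storage is shared by several buffers.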

dump(*args, **kwargs): Alias for dumps().

load(*args, **kwargs): Alias for loads().

save(*args, **kwargs): Alias for dumps().
